TruSeq Phasing Analysis Methods
The TruSeq Phasing Analysis App works in three steps:
• Identification of Haploid Fragments: The TruSeq Phasing Analysis App separates the sequence reads into their 384 component wells based on the barcode sequence and aligns the reads to the human reference sequence. The algorithm then detects haploid fragments of DNA, or clouds, in each well, and records the heterozygous variants from the input WGS VCF file that each cloud overlaps. Clouds that indicate an overlap of fragments from both maternal and paternal chromosomes are removed.
• Local Phasing: The resulting fragments (up to ~10 kb long) are pooled together, and longer haplotypes are reconstructed by chaining together fragments that share heterozygous SNPs. The resulting blocks, referred to as locally phased blocks, are on average 5–10× longer than the individual clouds and provide highly accurate haplotype blocks derived entirely from the data.
• Global Phasing: In the final step of the algorithm, the locally phased blocks are phased relative to one another by statistical imputation using the 1000 Genomes phased reference panels, which reconstructs longer haplotypes. Final block sizes normally increase another 5–10× relative to the locally phased blocks. Phased blocks and confidence scores are output in the phased VCF file.
This chapter describes these three steps and explains the phase scoring scheme that the TruSeq Phasing Analysis App uses. For further explanation, see the following PDF: www.nature.com/nbt/journal/v32/n3/extref/nbt.2833-S1.pdf.
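As a rough illustration of the chaining idea behind the Local Phasing step, the following Python sketch groups heterozygous SNPs into blocks whenever fragments share a SNP, and tracks which alleles sit on the same haplotype. It is not the app's actual algorithm: the function name `chain_fragments`, the input format (one dictionary per fragment, mapping a SNP position to an observed allele of 0 or 1), and the use of a union-find structure with parity are all assumptions made for this example, and it ignores sequencing error beyond counting contradictory links.

```python
from collections import defaultdict


def chain_fragments(fragments):
    """Chain haploid fragments that share heterozygous SNPs (illustrative only).

    fragments: list of dicts {snp_position: allele}, allele is 0 or 1.
    Returns (blocks, conflicts): blocks is a list of dicts giving, for each
    SNP in a block, its allele on one haplotype; conflicts counts links that
    contradict the phase already established.
    """
    parent = {}  # union-find parent pointer per SNP position
    parity = {}  # allele of a SNP XOR allele of its parent on one haplotype

    def find(pos):
        # Return (root, parity of pos relative to root), compressing the path.
        if parent[pos] == pos:
            return pos, 0
        root, parent_parity = find(parent[pos])
        parent[pos] = root
        parity[pos] ^= parent_parity
        return root, parity[pos]

    conflicts = 0
    for frag in fragments:
        positions = sorted(frag)
        for pos in positions:
            if pos not in parent:
                parent[pos] = pos
                parity[pos] = 0
        # Each pair of neighbouring SNPs on a fragment is one phase link.
        for a, b in zip(positions, positions[1:]):
            rel = frag[a] ^ frag[b]  # 0: same allele code, 1: different
            root_a, par_a = find(a)
            root_b, par_b = find(b)
            if root_a == root_b:
                if par_a ^ par_b != rel:
                    conflicts += 1  # link disagrees with existing phase
            else:
                # Merge the two blocks so that a and b keep relation `rel`.
                parent[root_b] = root_a
                parity[root_b] = par_a ^ par_b ^ rel

    grouped = defaultdict(dict)
    for pos in parent:
        root, par = find(pos)
        grouped[root][pos] = par

    blocks = []
    for members in grouped.values():
        flip = members[min(members)]  # normalise: first SNP carries allele 0
        blocks.append({pos: a ^ flip for pos, a in sorted(members.items())})
    blocks.sort(key=min)
    return blocks, conflicts


# Hypothetical fragments: the first three overlap pairwise, the last is isolated.
example = [
    {100: 0, 250: 1},
    {250: 0, 400: 0},
    {400: 1, 900: 0},
    {5000: 1, 5200: 1},
]
blocks, conflicts = chain_fragments(example)
print(blocks, conflicts)
```

In this toy input, the first three fragments are linked through the shared SNPs at positions 250 and 400, so they merge into a single block that is longer than any individual fragment, while the fragment at positions 5000 and 5200 stays in its own block. Blocks left unconnected in this way are what the Global Phasing step then orders relative to one another.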