Phased.vcf.gz

The principal results of the TruSeq Phasing Analysis are returned in the phased VCF file. This file recapitulates most of the information in the input VCF file, but differs from the input VCF in the following ways:

- The presence of additional markup providing information about the phase of variants, the confidence in the phase assignments, and sets of phased heterozygous variants. Heterozygous variants that could not be phased are output in the phased VCF file unchanged.
- The treatment of variants excluded from analysis, as described in Input VCF Requirements.

Phased variants in the phased VCF file (*phased.vcf.gz) contain a modified genotype (GT) field and five additional INFO fields, which are described in the following table.
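As an illustration of how a downstream script might tell phased records from unphased ones and read these annotations, here is a minimal Python sketch. It assumes a single-sample, tab-separated data line as defined by the VCF specification, with GT among the FORMAT keys, and EL, TL, PS, LPS, and ICF stored in the INFO column as KEY=VALUE pairs, with ICF written as a bare flag. These encoding details are assumptions rather than statements from this guide, so check the ##INFO header lines of your own file. The functions parse_info and parse_record are hypothetical names.

```python
# Minimal sketch: parse one data line of a phased VCF.
# Assumed encoding (not confirmed by this guide): tab-separated columns per the
# VCF spec, one sample column, and EL/TL/PS/LPS/ICF stored in INFO as KEY=VALUE
# pairs, with ICF written as a bare flag when present.


def parse_info(info):
    """Split an INFO string such as 'EL=0.98;TL=0.99' into a dict."""
    fields = {}
    if info == ".":
        return fields  # "." means the INFO column is empty
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # keys without "=" are flags
    return fields


def parse_record(line):
    """Return the phasing-related values for one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    # Map FORMAT keys (column 9) to the first sample's values (column 10).
    sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
    gt = sample.get("GT", ".")

    def opt(key, cast):
        # Convert an INFO value if present; unphased records may lack it.
        return cast(info[key]) if key in info else None

    return {
        "chrom": cols[0],
        "pos": int(cols[1]),
        "gt": gt,
        "phased": "|" in gt,  # "|" separator = phased, "/" = unphased
        "el": opt("EL", float),
        "tl": opt("TL", float),
        "ps": opt("PS", int),
        "lps": opt("LPS", int),
        "icf": "ICF" in info,  # assumption: presence of the key means flagged
    }


# Hypothetical example lines (values invented for illustration only).
phased = parse_record(
    "chr1\t10177\t.\tA\tAC\t50\tPASS\tEL=0.98;TL=0.99;PS=10177;LPS=10177\tGT\t0|1"
)
unphased = parse_record("chr1\t20000\t.\tG\tT\t50\tPASS\t.\tGT\t0/1")
print(phased["phased"], phased["ps"], unphased["phased"])  # prints: True 10177 False
```

Reading the phase status from the GT separator, rather than from the presence of PS, follows the table's rule that a genotype containing / is unphased.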

Table 12   Phased Variants
Field   Description
Genotype (GT)   Phased variants (heterozygous and homozygous) contain a modified GT field that uses a pipe symbol (|) instead of a forward slash (/) to separate alleles, in accordance with the VCF 4.1 specification. Variants whose genotype contains a / in the phased VCF are unphased.
Emission Likelihood (EL)   An indication of the degree of confidence that the phasing of one locus is correct relative to its neighbors. Although evaluation of quantitative correctness is still ongoing, the value can be interpreted as a likelihood. Values range from 0.5 (a random guess) to 1 (complete confidence). See Phase Scoring for a more detailed description.
Transition Likelihood (TL)   The confidence in the relative phasing of the current heterozygous variant locus and the previous heterozygous locus. Although evaluation of quantitative correctness is still ongoing, the value can be interpreted as a likelihood. Values range from 0.5 (a random guess) to 1 (complete confidence). See Phase Scoring for a more detailed description.
Phase Set (PS)   Used in the TruSeq Phasing Analysis App to identify the heterozygous variants that belong to the same globally phased block, defined using TL > 0.95. The integer value is the coordinate of the first locus in a given phase set. PS values are unique only within a chromosome; therefore, variants with the same PS value on different chromosomes are not phased relative to each other.
Local Phase Set (LPS)   Interpreted analogously to PS, except that the phasing indicated by the LPS annotation is based solely on the blocks determined by local phasing. In general, confidence in the relative phasing of variants that share an LPS value is higher than for variants that share a PS value but not an LPS value.
Inconsistency Flag (ICF)   Marks positions where the alleles detected in the long fragment library do not match the alleles expected from the input WGS VCF file.
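Because PS values are unique only within a chromosome, code that collects phase sets needs to key them on both the chromosome and the PS value. The Python sketch below shows one way to do that and to compare two variants using LPS and PS. The dictionary keys (chrom, pos, ps, lps) and the functions group_phase_sets and relative_phasing are hypothetical; the labels returned simply restate the table, where a shared LPS generally carries higher confidence than a shared PS alone.

```python
# Minimal sketch: collect phase sets and compare two phased variants.
# Each variant is assumed to be a dict with "chrom", "pos", "ps" and "lps"
# keys (hypothetical names), with None where a value is absent.
from collections import defaultdict


def group_phase_sets(variants):
    """Group variant positions by (chromosome, PS).

    PS values are unique only within a chromosome, so the chromosome is part
    of the key: the same PS on two chromosomes gives two separate sets.
    """
    sets = defaultdict(list)
    for v in variants:
        if v["ps"] is None:
            continue  # unphased variants belong to no phase set
        sets[(v["chrom"], v["ps"])].append(v["pos"])
    return dict(sets)


def relative_phasing(a, b):
    """Summarize how two variants are phased relative to each other."""
    if a["chrom"] != b["chrom"]:
        return "not phased"  # different chromosomes are never phased together
    if a["lps"] is not None and a["lps"] == b["lps"]:
        return "same LPS"  # generally the higher-confidence case
    if a["ps"] is not None and a["ps"] == b["ps"]:
        return "same PS only"
    return "not phased"


# Hypothetical variants, invented for illustration.
variants = [
    {"chrom": "chr1", "pos": 100, "ps": 100, "lps": 100},
    {"chrom": "chr1", "pos": 250, "ps": 100, "lps": 100},
    {"chrom": "chr1", "pos": 900, "ps": 100, "lps": 900},
    {"chrom": "chr2", "pos": 100, "ps": 100, "lps": 100},
    {"chrom": "chr2", "pos": 500, "ps": None, "lps": None},
]
print(group_phase_sets(variants))
print(relative_phasing(variants[0], variants[2]))  # prints: same PS only
```

Keying on the PS value alone would wrongly merge the chr1 and chr2 sets in this example, since both start at coordinate 100.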


© 2015 Illumina, Inc. All rights reserved.

15055853 Rev. B