VCF File Annotations

Heading

Description

FILTER

If a variant passes all filters, PASS is written in the FILTER column. Otherwise, the column lists the filters that the variant failed:

ForcedReport—Applied to a variant that would normally fail the emit filters. The variant is still written to the VCF because it appears in one of the genotypes-of-interest VCF files.
LowDP—Applied to sites with depth of coverage below a cutoff.
LowGQ—The genotyping quality (GQ) is below a cutoff.
LowVariantFreq—The variant frequency is less than the given threshold.
MultiAllelicSite—Filter if the variant is in a multiallelic site that breaks ploidy assumptions. Only applicable to germline variant calling.
q{threshold}—Quality below {threshold}.
R8—For an indel, the number of adjacent repeats (1-base or 2-base) in the reference is greater than 8.
R{thresholdM}x{thresholdN}—Filter if the variant is in a repeat region, where a repeat is defined as any region where the reference has a motif of length up to thresholdM that repeats thresholdN or more times. Only applicable to indels that contain the repeat motif and that are under the cutoff frequency.
SB—The strand bias is more than the given threshold.
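
As an illustration of how the parameterized filter names above can be decoded, the following Python sketch splits a FILTER value into individual filter IDs and extracts the thresholds from the q{threshold} and R{thresholdM}x{thresholdN} forms. The function name parse_filter and the dictionary keys are hypothetical, and the sketch assumes the standard VCF convention that multiple filters are separated by semicolons and that "." means no filters were applied.

```python
import re

# Patterns for the two parameterized filter names described above.
Q_PATTERN = re.compile(r"^q(\d+)$")            # q{threshold}
REPEAT_PATTERN = re.compile(r"^R(\d+)x(\d+)$")  # R{thresholdM}x{thresholdN}


def parse_filter(filter_value):
    """Hypothetical helper: decode a FILTER column value into a list of dicts.

    Assumes filters are semicolon-separated (VCF convention); PASS and "."
    yield an empty list.
    """
    if filter_value in ("PASS", "."):
        return []
    parsed = []
    for name in filter_value.split(";"):
        q_match = Q_PATTERN.match(name)
        r_match = REPEAT_PATTERN.match(name)
        if q_match:
            parsed.append({"id": name, "kind": "quality",
                           "threshold": int(q_match.group(1))})
        elif r_match:
            parsed.append({"id": name, "kind": "repeat",
                           "max_motif_length": int(r_match.group(1)),
                           "min_repeats": int(r_match.group(2))})
        else:
            # Fixed names such as LowDP, LowGQ, SB, or R8.
            parsed.append({"id": name, "kind": "named"})
    return parsed


# Illustrative values only; actual thresholds depend on the workflow settings.
print(parse_filter("LowDP;q20"))
print(parse_filter("R5x9"))
```

Note that R8 contains no "x" and is therefore treated as a fixed name rather than a parameterized repeat filter.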

INFO

Possible entries in the INFO column include:

AF1000G—The allele frequency from all populations of 1000 genomes data.
AA—The inferred ancestral allele (if determined) of the chimpanzee/human lineage.
clinvar—Clinical significance. Format: GenotypeIndex|Significance.
cosmic—The numeric identifier for the variant in the Catalogue of Somatic Mutations in Cancer (COSMIC) database. Format: GenotypeIndex|Significance.
CSQ—Consequence type as predicted by the Illumina Annotation Engine (IAE). Format: GenotypeIndex|HGNC|Transcript ID|Consequence.
CSQR—Predicted regulatory consequence type. Format: GenotypeIndex|RegulatoryID|Consequence.
DP—The total depth (number of base calls aligned to a position and used in variant calling).
EVS—Allele frequency, coverage, and sample count taken from the Exome Variant Server (EVS). Format: AlleleFreqEVS|EVSCoverage|EVSSamples.
GMAF—Global minor allele frequency (GMAF); technically, the frequency of the second most frequent allele. Format: GlobalMinorAllele|AlleleFreqGlobalMinor.
phyloP—PhyloP conservation score. Denotes how conserved the reference sequence is between species throughout evolution.
RefMinor—Denotes positions where the reference base is a minor allele and is annotated as though it were a variant.
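
The pipe-delimited formats listed above can be unpacked with a few lines of Python. The sketch below is illustrative only: parse_info and INFO_SUBFIELDS are hypothetical names, the subfield labels are copied from the Format descriptions above, and the code assumes the standard VCF conventions that INFO entries are separated by semicolons, that a key without "=" is a flag, and that multiple values for one key are separated by commas.

```python
# Subfield names taken from the Format descriptions in the INFO list.
INFO_SUBFIELDS = {
    "clinvar": ["GenotypeIndex", "Significance"],
    "CSQ": ["GenotypeIndex", "HGNC", "TranscriptID", "Consequence"],
    "CSQR": ["GenotypeIndex", "RegulatoryID", "Consequence"],
    "EVS": ["AlleleFreqEVS", "EVSCoverage", "EVSSamples"],
    "GMAF": ["GlobalMinorAllele", "AlleleFreqGlobalMinor"],
}


def parse_info(info_value):
    """Hypothetical helper: turn an INFO column value into a dict."""
    result = {}
    if info_value == ".":
        return result
    for entry in info_value.split(";"):
        key, sep, value = entry.partition("=")
        if not sep:
            # Flag entry with no value, such as RefMinor.
            result[key] = True
            continue
        names = INFO_SUBFIELDS.get(key)
        if names is None:
            # Plain value such as DP or AF1000G; kept as a string.
            result[key] = value
        else:
            # Assumption: several annotations for one key are comma-separated.
            result[key] = [dict(zip(names, item.split("|")))
                           for item in value.split(",")]
    return result


# Made-up INFO string for illustration; not taken from a real VCF file.
info = parse_info("DP=52;GMAF=A|0.123;RefMinor")
print(info["GMAF"][0]["AlleleFreqGlobalMinor"])
```

Values are left as strings so that the caller can decide how to convert them.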

FORMAT

The FORMAT column lists fields separated by colons, for example, GT:GQ. The list of fields provided depends on the variant caller used. Available fields include:

AD—Allele Depth; if the GT is 0/0, the AD is the reference count. If the GT is 0/1 or 1/1, the AD is of the form X,Y, where X is the reference allele count and Y is the alternative allele count. If the GT is 1/2, the AD is of the form Y,Z, where Y and Z are the alternative allele 1 and 2 counts.
DP—Total depth used for variant calling.
GQ—Genotype quality.
GT—Genotype. 0 corresponds to the reference base, 1 corresponds to the first entry in the ALT column, and so on. The forward slash (/) indicates that no phasing information is available.
NL—Noise level; an estimate of base calling noise at this position.
SB—Strand bias at this position. Larger negative values indicate less bias; values near 0 indicate more bias.
VF—Variant frequency; if the GT is 0/0, the VF is the nonreference frequency. If the GT is 0/1 or 1/1, the VF is the frequency of the variant allele. If the GT is 1/2, the VF is the combined frequency of the two variant alleles.
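
Because the meaning of the AD values depends on the genotype, a small lookup can make the rule in the AD entry explicit. The following Python sketch is a minimal illustration; label_allele_depths is a hypothetical name, and only the unphased genotypes mentioned in the AD description (0/0, 0/1, 1/1, and 1/2) are handled.

```python
def label_allele_depths(gt, ad):
    """Hypothetical helper: attach allele labels to AD counts based on GT."""
    alleles = gt.split("/")
    counts = [int(value) for value in ad.split(",")]
    if alleles == ["0", "0"]:
        # Homozygous reference: AD is the reference count.
        return {"ref": counts[0]}
    if alleles in (["0", "1"], ["1", "1"]):
        # AD has the form X,Y: reference count, then alternative count.
        return {"ref": counts[0], "alt1": counts[1]}
    if alleles == ["1", "2"]:
        # AD has the form Y,Z: counts for alternative alleles 1 and 2.
        return {"alt1": counts[0], "alt2": counts[1]}
    raise ValueError(f"Genotype not covered by this sketch: {gt}")


# Illustrative values only.
print(label_allele_depths("0/1", "20,18"))
print(label_allele_depths("1/2", "7,9"))
```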

SAMPLE

The SAMPLE column gives the values for the fields listed in the FORMAT column, separated by colons and in the same order.
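
The relationship between the FORMAT and SAMPLE columns can be sketched in Python by pairing the two colon-separated lists position by position. The name parse_sample is hypothetical, and the example values are made up for illustration.

```python
def parse_sample(format_value, sample_value):
    """Hypothetical helper: map FORMAT field names to SAMPLE values."""
    keys = format_value.split(":")
    values = sample_value.split(":")
    # zip pairs fields by position and stops at the shorter list, so
    # trailing fields missing from the SAMPLE column are simply omitted.
    return dict(zip(keys, values))


# Illustrative FORMAT and SAMPLE strings; not taken from a real VCF file.
sample = parse_sample("GT:GQ:AD:VF", "0/1:99:20,18:0.474")
print(sample["GT"], sample["AD"])
```

All values are returned as strings; converting GQ or VF to numbers is left to the caller.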