VCF Annotations

The VCF files for Amplicon DS can have the following annotations in the FILTER, FORMAT, and INFO fields.

Table 1   VCF FILTER Entries

Entry

Description

LowGQ

The genotype quality (GQ) is below a cutoff.

LowVariantFreq

The variant frequency is less than the given threshold.

PB

The prevalence of the variant is significantly biased between the two probe pools (forward and reverse).

R8

For an indel, the number of adjacent repeats (1-base or 2-base) in the reference is greater than 8.

SB

The strand bias is more than the given threshold.

LowDP

The depth of coverage at the site is below a cutoff.
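
As a rough illustration of how the FILTER entries in Table 1 might be read programmatically, the following Python sketch splits a FILTER value into its individual entries and looks up a short description for each. It assumes the standard VCF convention that multiple filters are separated by semicolons and that PASS or a period means no filter is listed. The function names and the abbreviated descriptions are hypothetical and are not part of Amplicon DS.

```python
# Short descriptions paraphrased from Table 1 (abbreviated for illustration).
FILTER_DESCRIPTIONS = {
    "LowGQ": "Genotype quality (GQ) is below a cutoff",
    "LowVariantFreq": "Variant frequency is below the threshold",
    "PB": "Variant prevalence is biased between the probe pools",
    "R8": "Indel with more than 8 adjacent 1-base or 2-base repeats in the reference",
    "SB": "Strand bias exceeds the threshold",
    "LowDP": "Depth of coverage is below a cutoff",
}


def parse_filter(field: str) -> list[str]:
    """Return the filter IDs in a FILTER value, or [] if none are listed."""
    # "PASS" means all filters passed; "." means no filtering was applied.
    if field in ("PASS", ".", ""):
        return []
    return field.split(";")


def explain_filters(field: str) -> dict[str, str]:
    """Map each filter ID in a FILTER value to its short description."""
    return {
        name: FILTER_DESCRIPTIONS.get(name, "not listed in Table 1")
        for name in parse_filter(field)
    }


# Hypothetical FILTER value for a variant that failed two filters.
for name, text in explain_filters("LowGQ;SB").items():
    print(f"{name}: {text}")
```

Treating PASS and a period the same way is a simplification. A stricter reader might want to distinguish a site that passed all filters from a site where filtering was not applied.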

Table 2   VCF FORMAT Entries

Entry

Description

AD

Allelic depths for the reference and alternate alleles, in the order listed. For indels, this value includes only the reads that confidently support each allele (posterior probability of 0.999 or higher that the read contains the indicated allele versus all other intersecting indel alleles).

GQ

Genotype Quality.

GQX

Minimum of {Genotype quality assuming variant position, Genotype quality assuming nonvariant position}.

GT

Genotype.

NL

Noise level, as a Q-score.

PB

Probe pool bias.

SB

Strand bias.

VF

Variant frequency in the sample.
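
The FORMAT entries in Table 2 appear in a VCF record as a colon-separated list of keys, with the matching values in the sample column in the same order. The Python sketch below pairs the two and converts the values to convenient types. It is a minimal example under stated assumptions: the function name, the choice to read every numeric entry as a float, and the sample values are all hypothetical and do not come from a real Amplicon DS run.

```python
# FORMAT keys from Table 2 that are treated as single numbers in this sketch.
NUMERIC_KEYS = ("GQ", "GQX", "NL", "PB", "SB", "VF")


def parse_sample(format_field: str, sample_field: str) -> dict:
    """Pair FORMAT keys with sample values and convert known entries."""
    raw = dict(zip(format_field.split(":"), sample_field.split(":")))
    parsed = {}
    for key, value in raw.items():
        if value == ".":
            # A lone period marks a missing value in VCF.
            parsed[key] = None
        elif key == "AD":
            # Allelic depths: one integer per allele, reference first.
            parsed[key] = [int(depth) for depth in value.split(",")]
        elif key in NUMERIC_KEYS:
            parsed[key] = float(value)
        else:
            # GT and any unrecognized keys are kept as text.
            parsed[key] = value
    return parsed


# Hypothetical FORMAT and sample columns (values invented for illustration).
record = parse_sample(
    "GT:GQ:AD:VF:NL:SB:PB:GQX",
    "0/1:100:90,10:0.1:20:-100.0:-100.0:100",
)
print(record["GT"], record["AD"], record["VF"])
```

Keeping AD as a list preserves the order of the reference and alternate alleles, so the depth for a given allele can be looked up by its index in the record.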

Table 3   VCF INFO Entries

Entry

Description

AA

The inferred allele ancestral to the chimpanzee/human lineage.

AF1000G

The allele frequency across all populations in the 1000 Genomes Project data.

clinvar

Clinical significance from the ClinVar database (www.ncbi.nlm.nih.gov/clinvar/).

cosmic

The numeric identifier for the variant in the Catalogue of Somatic Mutations in Cancer (COSMIC) database (cancer.sanger.ac.uk/cosmic).

CSQR

Regulatory consequence as predicted by Variant Effect Predictor (www.ensembl.org/info/docs/tools/vep/index.html) version 72. A comma-separated list for each affected regulatory region (including transcription factor binding sites) is provided using the following delimited format: RegulatoryID|Consequence. The annotations provided in this field come from the Ensembl database of regulatory features even if RefSeq was selected as the annotation source. Many of the RegulatoryIDs begin with ENSR. The consequences are indicated using valid Sequence Ontology (SO) terms (www.ensembl.org/info/genome/variation/predicted_data.html#consequences) and typically are either regulatory_region_variant or TF_binding_site_variant.

CSQT

Transcript consequence as predicted by Variant Effect Predictor (www.ensembl.org/info/docs/tools/vep/index.html) version 72. Only canonical transcripts are included in the VCF file to maintain readability. The ANT file contains consequences for all affected transcripts. This binary file can be loaded into VariantStudio for viewing. See www.illumina.com/informatics/research/biological-data-interpretation/variantstudio.html.

DP

The depth (number of base calls aligned to a position and used in variant calling). In regions of high coverage, GATK down-samples the available reads.

EVS

Allele frequency, coverage, and sample count taken from the Exome Variant Server (EVS). Format: AlleleFreqEVS|EVSCoverage|EVSSamples.

EXON

A comma-separated list of exon regions read from RefGene.

FC

Functional consequence.

GI

A comma-separated list of gene IDs read from RefGene.

GMAF

Global minor allele frequency (GMAF); technically, the frequency of the second most frequent allele. Format: GlobalMinorAllele|AlleleFreqGlobalMinor.

phastCons

Indicates whether the variant lies in a conserved sequence, that is, an identical or similar sequence that occurs across species and has been maintained throughout evolution.

TI

A comma-separated list of transcript IDs read from RefGene.
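
Several INFO entries in Table 3 pack more than one value into a single field, using commas between items and a pipe character between the parts of each item (CSQR, EVS, and GMAF). The Python sketch below shows one way such a field could be unpacked. It assumes the standard VCF layout of semicolon-separated key=value pairs in the INFO column. The function names, the example INFO string, and all values in it are invented for illustration.

```python
# Part names taken from the Format descriptions in Table 3.
CSQR_FIELDS = ("RegulatoryID", "Consequence")
EVS_FIELDS = ("AlleleFreqEVS", "EVSCoverage", "EVSSamples")
GMAF_FIELDS = ("GlobalMinorAllele", "AlleleFreqGlobalMinor")


def parse_info(info: str) -> dict:
    """Split an INFO column into a dictionary of key/value pairs."""
    result = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        # Keys without "=" are flags; record them as True.
        result[key] = value if sep else True
    return result


def split_delimited(value: str, names: tuple) -> list:
    """Split a comma-separated list of pipe-delimited items into dicts."""
    return [dict(zip(names, entry.split("|"))) for entry in value.split(",")]


# Hypothetical INFO column (identifiers and numbers are made up).
example = (
    "DP=1500;TI=NM_000001;GI=GENE1;GMAF=A|0.01;EVS=0.0123|45|6503;"
    "CSQR=ENSR00000000001|regulatory_region_variant,"
    "ENSR00000000002|TF_binding_site_variant"
)

info = parse_info(example)
for region in split_delimited(info["CSQR"], CSQR_FIELDS):
    print(region["RegulatoryID"], region["Consequence"])
print(split_delimited(info["EVS"], EVS_FIELDS)[0])
print(split_delimited(info["GMAF"], GMAF_FIELDS)[0])
```

The values are left as text in this sketch because the table does not state the numeric type of each part. Convert them as needed after splitting.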