VCF Input Requirements

BaseSpace Variant Interpreter imports single nucleotide variants (SNVs), multiple nucleotide variants (MNVs), structural variants (SVs), copy number variants (CNVs), and indels from files in VCF or genome VCF format, v4.1 (*.vcf or *.vcf.gz).

On import, BaseSpace Variant Interpreter truncates file names after the first period. To prevent duplicate Analysis Result names, Illumina recommends that file names be unique up to the first period in the file.
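The naming rule above can be checked before upload. The following is a minimal sketch, not part of the software: it assumes that "truncates after the first period" means keeping only the text before the first period, and the helper names `analysis_name` and `find_name_collisions` are hypothetical.

```python
from collections import defaultdict


def analysis_name(file_name: str) -> str:
    """Return the text before the first period (assumed truncation rule)."""
    return file_name.split(".", 1)[0]


def find_name_collisions(file_names):
    """Group file names that would truncate to the same name."""
    groups = defaultdict(list)
    for name in file_names:
        groups[analysis_name(name)].append(name)
    # Keep only truncated names shared by more than one file.
    return {stem: names for stem, names in groups.items() if len(names) > 1}
```

Any entry returned by `find_name_collisions` identifies files to rename so that they differ before the first period.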

Annotations are applied only to human genomes aligned to the following genome assemblies: hg19, GRCh37, hs37d5, hg38, and GRCh38. Variants identified on chromosome M for hg19, hs37d5, and hg38 are not supported and are not displayed. The genome assembly can be specified in the reference metadata header (eg, ##reference=file:///data/hg38-altaware-cnv-anchor.v8/reference.bin).

CAUTION

If the genome assembly is not specified in the reference header, ingestion can fail.
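A pre-upload check for the reference header might look like the sketch below. How the software derives the assembly from the header value is not documented here, so the substring match in `guess_assembly` is an assumption, and both function names are hypothetical.

```python
# Assembly names listed in this topic.
ASSEMBLIES = ("GRCh37", "GRCh38", "hs37d5", "hg19", "hg38")


def reference_header(lines):
    """Return the value of the first ##reference= line, or None if absent."""
    prefix = "##reference="
    for line in lines:
        if line.startswith(prefix):
            return line.strip()[len(prefix):]
        if not line.startswith("##"):
            break  # metadata lines have ended
    return None


def guess_assembly(reference_value):
    """Guess the assembly by case-insensitive substring match (assumption)."""
    if reference_value is None:
        return None
    lowered = reference_value.lower()
    for name in ASSEMBLIES:
        if name.lower() in lowered:
            return name
    return None
```

The `lines` argument can be any iterable of text lines, for example a file opened with `open(path, "rt")` or, for a compressed file, `gzip.open(path, "rt")`.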

The following table provides an example VCF file format. Columns and values vary by variant caller, and only variations produced by compatible variant callers are supported.

CAUTION

Imported VCF files must be sorted and indexed correctly to ensure valid calculation results in BaseSpace Variant Interpreter.
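The sort order that the software expects is not spelled out in this topic. The sketch below assumes the order that index tools commonly require: each contig appears in one contiguous block, and positions do not decrease within a block. The function name `find_sort_problem` is hypothetical.

```python
def find_sort_problem(lines):
    """Return a description of the first out-of-order record, or None."""
    seen = set()       # contigs whose block has already started
    current = None     # contig of the previous record
    last_pos = 0       # position of the previous record
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        if chrom != current:
            if chrom in seen:
                return f"line {number}: contig {chrom} appears in more than one block"
            seen.add(chrom)
            current, last_pos = chrom, pos
        elif pos < last_pos:
            return f"line {number}: position {pos} comes after {last_pos} on {chrom}"
        else:
            last_pos = pos
    return None
```

A return value of `None` means that no ordering problem was found under these assumptions. It does not confirm that the file is indexed.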

Example VCF Column Headings

VCF Column

Required Value

CHROM

The reference sequence name. Values are # or chr#, where # is one of the following:

The chromosome number, as in 1–22.
The name, as in X or Y; M or MT for mitochondrial.

POS

The position of the variant. Values are numeric with the first base taking position 1 (1-based).

ID

The ID is the rs number for the SNP as recorded in the software. A value must be present. If a dbSNP entry does not exist, a missing value marker '.' is an acceptable value.

Although the ID column and valid values are required, the values are not imported. The software applies dbSNP annotations when available.

REF

The reference allele.

ALT

The alternate allele.

QUAL

The quality score assigned by the variant caller. A value of '.' is acceptable, and is reported as 0.

FILTER

The status of the variant call quality as annotated in the VCF file. PASS indicates that all filters passed. Otherwise, the variant call filter is listed.

INFO

The recognized field is DP (read depth), which is represented in the Read Depth column in the Variants table. A value of '.' (none) is acceptable.

FORMAT

A list of fields that define values in the Sample column. Possible values are '.' (none) and the following:

DP—Represented in the Read Depth column in the Variants table.
DPI—Represented in the Read Depth column in the Variants table for insertion and deletion events called by the Illumina Isaac Alignment and Variant Calling workflow.
AD—Represented in the Alt Read Depth and Allelic Depth columns in the Variants table.
AU—The number of A alleles used in tier 1.
CU—The number of C alleles used in tier 1.
GU—The number of G alleles used in tier 1.
TU—The number of T alleles used in tier 1.
TAR—Reads strongly supporting alternate allele for tier 1.
TIR—Reads strongly supporting indel allele for tier 1.
GQ—The genotype quality.
GQX—The minimum of the GQ value and the QUAL column.

Genotype values (not present in somatic VCF files):

Acceptable GT values are 0/0, 0/1, 1/1, and 1/2. Nonnumeric GT values, or './.' as in a no-call, are not imported.
Hemizygous alt GT values, '1', are accepted. Hemizygous reference calls, '0', are not imported.
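Several of the column rules above can be expressed as small checks. This is an illustrative sketch rather than the software's own validation: all names are hypothetical, the CHROM pattern is case-sensitive by assumption, and the genotype set is taken literally from the values listed above (phased forms such as 0|1 are not mentioned in this topic, so they are not included).

```python
import re

# CHROM: 1-22, X, Y, M, or MT, with an optional chr prefix.
CHROM_PATTERN = re.compile(r"(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)")

# GT values described as imported, including hemizygous alt '1'.
IMPORTED_GT = {"0/0", "0/1", "1/1", "1/2", "1"}


def is_valid_chrom(chrom: str) -> bool:
    """True if the whole CHROM value matches the documented forms."""
    return CHROM_PATTERN.fullmatch(chrom) is not None


def reported_qual(qual: str) -> float:
    """QUAL of '.' is acceptable and is reported as 0."""
    return 0.0 if qual == "." else float(qual)


def is_imported_gt(gt: str) -> bool:
    """True for genotypes listed as imported; './.' and '0' are not."""
    return gt in IMPORTED_GT


def gqx(gq: float, qual: float) -> float:
    """GQX is the minimum of the GQ value and the QUAL column."""
    return min(gq, qual)
```

For example, `is_valid_chrom("chr23")` and `is_imported_gt("./.")` both return `False`, while `reported_qual(".")` returns `0.0`.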