Somatic VCF Fields Reported in the Variants Table

Information reported in VariantStudio for VCF files generated by the Illumina cancer analysis pipeline differs from what is reported for other VCF files.

For these files, there is no genotype (GT) or genotype quality (GQX) field. Instead, allelic depths are listed.

Each VCF file includes two samples: a reference (normal) sample and the cancer sample. All reported values are specific to the cancer sample.

The following table lists the Variants table columns that are populated from fields specific to somatic VCF files. Each entry gives the Variants table column heading, followed by a description of the VCF file column or field it is based on.

Allelic Depth

Based on values in the FORMAT column. Allelic Depth is calculated differently for SNVs and indels:

For SNVs: Based on the four values listed as AU:CU:GU:TU in the FORMAT column, one for each possible allele (A, C, G, and T) in the cancer sample. Each value consists of two numbers separated by a comma. The Allelic Depth column is populated with the full set of numbers, for example 0,0:0,0:10,10:3,4.
For indels: Based on the two values listed as TAR:TIR in the FORMAT column, which represent the Ref Allele and Alt Allele, respectively. Only the first number in each value is used. In the example 0,0:12,12, the Ref Allele is 0 and the Alt Allele is 12, so Allelic Depth is listed as 0,12.
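The following minimal Python sketch shows how the Allelic Depth value could be derived, assuming the FORMAT keys and the cancer sample's values are available as colon-separated strings. The function names (`parse_sample`, `allelic_depth_snv`, `allelic_depth_indel`) are hypothetical and are not part of VariantStudio.

```python
def parse_sample(format_keys: str, sample_values: str) -> dict:
    """Pair each FORMAT key (e.g. "AU:CU:GU:TU") with the sample value at the same position."""
    return dict(zip(format_keys.split(":"), sample_values.split(":")))


def allelic_depth_snv(sample: dict) -> str:
    # SNVs: keep the full AU:CU:GU:TU set; each value is two comma-separated numbers.
    return ":".join(sample[key] for key in ("AU", "CU", "GU", "TU"))


def allelic_depth_indel(sample: dict) -> str:
    # Indels: keep only the first number of TAR (Ref Allele) and TIR (Alt Allele).
    ref_count = sample["TAR"].split(",")[0]
    alt_count = sample["TIR"].split(",")[0]
    return f"{ref_count},{alt_count}"


# Usage with the example values given above.
snv = parse_sample("AU:CU:GU:TU", "0,0:0,0:10,10:3,4")
print(allelic_depth_snv(snv))      # 0,0:0,0:10,10:3,4
indel = parse_sample("TAR:TIR", "0,0:12,12")
print(allelic_depth_indel(indel))  # 0,12
```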

Alt Read Depth

Based on values in the FORMAT column. Alt Read Depth is calculated differently for SNVs and indels:

For SNVs: Based on the first number of the Allelic Depth value (AU, CU, GU, or TU) that corresponds to the Alt Allele. In the example 0,0:0,0:10,10:3,4, the values are 10,10 for GU and 3,4 for TU. If the Ref Allele is G and the Alt Allele is T, the Alt Read Depth is 3.
For indels: Based on the first number of the TIR value, which represents the Alt Allele. In the example 0,0:12,12, the Ref Allele is 0 and the Alt Allele is 12, so Alt Read Depth is listed as 12.
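A minimal sketch of the Alt Read Depth lookup, assuming the Alt Allele base is already known (for example, from the ALT column) and the cancer sample's values have been split into a key-to-value dictionary. The helper names are hypothetical.

```python
def first_count(value: str) -> int:
    # Each AU/CU/GU/TU/TAR/TIR value is "n1,n2"; only the first number is used.
    return int(value.split(",")[0])


def alt_read_depth_snv(sample: dict, alt_allele: str) -> int:
    # Map the Alt Allele base to its FORMAT key, e.g. "T" -> "TU".
    return first_count(sample[alt_allele.upper() + "U"])


def alt_read_depth_indel(sample: dict) -> int:
    # For indels, TIR holds the Alt Allele counts.
    return first_count(sample["TIR"])


snv = dict(zip(["AU", "CU", "GU", "TU"], "0,0:0,0:10,10:3,4".split(":")))
print(alt_read_depth_snv(snv, "T"))  # 3
indel = dict(zip(["TAR", "TIR"], "0,0:12,12".split(":")))
print(alt_read_depth_indel(indel))   # 12
```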

Alt Variant Freq

For somatic VCF files, allele frequency is calculated from values in the VCF file before data are reported in the Variants table.

For SNVs: Using only the first number of each AU:CU:GU:TU value, allele frequency is calculated as (alt allelic depth/(alt allelic depth + ref allelic depth))*100. In the example 0,0:0,0:10,10:3,4, with G as the Ref Allele and T as the Alt Allele, Alt Variant Freq is 23.08%, calculated as (3/(3+10))*100.
For indels: Using only the first number of the TAR and TIR values, allele frequency is calculated as (TIR/(TIR+TAR))*100. In the example 0,0:12,12, Alt Variant Freq is 100%, calculated as (12/(12+0))*100.
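The two formulas above share one shape, alt / (alt + ref) * 100, so a sketch can use a single helper. This is a minimal illustration with hypothetical function names; how VariantStudio handles a position with zero Ref and Alt reads is not stated here, so raising `ValueError` is an assumption.

```python
def first_count(value: str) -> int:
    # Only the first of the two comma-separated numbers is used.
    return int(value.split(",")[0])


def alt_variant_freq(alt_depth: int, ref_depth: int) -> float:
    """Return (alt / (alt + ref)) * 100."""
    total = alt_depth + ref_depth
    if total == 0:
        # Assumption: zero-depth handling is not described in the source.
        raise ValueError("no Ref or Alt reads; frequency is undefined")
    return alt_depth / total * 100


def snv_freq(sample: dict, ref_allele: str, alt_allele: str) -> float:
    # SNVs: first numbers of the Alt and Ref allele keys (e.g. TU and GU).
    return alt_variant_freq(first_count(sample[alt_allele + "U"]),
                            first_count(sample[ref_allele + "U"]))


def indel_freq(sample: dict) -> float:
    # Indels: TIR is the Alt Allele and TAR is the Ref Allele.
    return alt_variant_freq(first_count(sample["TIR"]), first_count(sample["TAR"]))


snv = dict(zip(["AU", "CU", "GU", "TU"], "0,0:0,0:10,10:3,4".split(":")))
print(f"{snv_freq(snv, 'G', 'T'):.2f}%")  # 23.08%
indel = dict(zip(["TAR", "TIR"], "0,0:12,12".split(":")))
print(f"{indel_freq(indel):.0f}%")         # 100%
```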

Genotype

Based on the INFO column. If the SOMATIC flag is present in the INFO column, the genotype is reported as somatic (som) in the Variants table.
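A minimal sketch of the SOMATIC check. INFO entries are separated by semicolons and are either KEY=VALUE pairs or bare flags, so the sketch compares whole keys rather than searching for a substring. The function name is hypothetical, the INFO strings are made-up examples, and returning `None` when the flag is absent is an assumption, since the source does not say what is shown in that case.

```python
from typing import Optional


def genotype_from_info(info: str) -> Optional[str]:
    # Collect the key part of every INFO entry (flags have no "=").
    keys = {entry.split("=", 1)[0] for entry in info.split(";")}
    # "som" when SOMATIC is present; None is an assumed placeholder otherwise.
    return "som" if "SOMATIC" in keys else None


print(genotype_from_info("SOMATIC;QSS_NT=33"))  # som
print(genotype_from_info("QSS_NT=33"))          # None
```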

Quality

Quality is based on different values for SNVs and indels:

For SNVs: Quality is based on the QSS_NT field in the INFO column. This quality score reflects the probability that the SNV exists and is somatic.
For indels: Quality is based on the QSI_NT field in the INFO column. This quality score reflects the probability that the indel exists and is somatic.
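A minimal sketch of selecting the quality field by variant type. It assumes the caller already knows whether the record is an indel (the `is_indel` parameter is hypothetical) and that QSS_NT and QSI_NT hold integer values; the INFO strings are made-up examples.

```python
from typing import Optional


def parse_info(info: str) -> dict:
    # Split INFO into KEY=VALUE pairs; bare flags such as SOMATIC map to True.
    fields = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def quality(info: str, is_indel: bool) -> Optional[int]:
    # SNVs use QSS_NT; indels use QSI_NT. Returns None if the field is missing.
    value = parse_info(info).get("QSI_NT" if is_indel else "QSS_NT")
    return int(value) if isinstance(value, str) else None


print(quality("SOMATIC;QSS_NT=33", is_indel=False))  # 33
print(quality("SOMATIC;QSI_NT=50", is_indel=True))   # 50
```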

Read Depth

For SNVs and indels, Read Depth is taken from the DP value of the cancer sample, as identified by the DP key in the FORMAT column.
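A minimal sketch of reading DP for the cancer sample from a tab-separated VCF data line. VCF data lines have nine fixed columns (CHROM through FORMAT) followed by one column per sample; which sample column holds the cancer sample is an assumption here (the second sample column, index 10), exposed as the hypothetical `cancer_column` parameter. The data line is made up for illustration.

```python
# Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT, then one column per sample.
FORMAT_COLUMN = 8


def read_depth(format_keys: str, cancer_sample: str) -> int:
    # Pair FORMAT keys with the cancer sample's values and return DP.
    values = dict(zip(format_keys.split(":"), cancer_sample.split(":")))
    return int(values["DP"])


def read_depth_from_line(line: str, cancer_column: int = 10) -> int:
    # Assumption: the cancer sample is the second sample column (index 10).
    columns = line.rstrip("\n").split("\t")
    return read_depth(columns[FORMAT_COLUMN], columns[cancer_column])


# Hypothetical data line: reference sample first, cancer sample second.
line = "\t".join([
    "chr1", "100", ".", "G", "T", ".", "PASS", "SOMATIC;QSS_NT=33",
    "DP:AU:CU:GU:TU", "20:0,0:0,0:20,20:0,0", "13:0,0:0,0:10,10:3,4",
])
print(read_depth_from_line(line))  # 13
```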