Somatic CNV VCF Output

The somatic CNV VCF file follows the standard VCF format but differs from the germline CNV VCF output in the ways described below.

The following header lines are specific to somatic CNV calling.

[ModelSource]—The primary basis on which the final tumor model was chosen. The following values are included:
[DEPTH+BAF]—Depth+BAF signal is used to determine tumor model.
[DEPTH+BAF_DOUBLED]—The initial depth+BAF model is additionally duplicated based on VAF signal or excess segments at half the expected depth change.
[DEPTH+BAF_DEDUPLICATED]—The depth+BAF model is deduplicated based on VAF signal or insufficient segments supporting a duplication.
[DEPTH+BAF_WEAK]—Depth+BAF signal is used to determine (lower-confidence) tumor model.
[VAF]—VAF signal is used to determine tumor model due to insufficient depth+BAF signal.
[DEGENERATE_DIPLOID]—Sample is treated as high-purity diploid in the absence of adequate signal from depth+BAF and VAF. The diploid coverage is set to the lowest value observed in a substantial number of bases in segments with BAF=50%. All VCF records have lowModelConfidence added to the FILTER value.
[SAMPLE_MEDIAN]—Sample is treated as high-purity diploid in the absence of adequate signal from depth+BAF and VAF. The diploid coverage is set to the sample median. All VCF records have lowModelConfidence added to the FILTER value.
EstimatedTumorPurity—Estimated fraction of cells in the sample due to tumor. The range of this field is [0, 1].
DiploidCoverage—Expected read count for a target bin in a diploid region. The numeric value is unlimited.
OverallPloidy—Length-weighted average of tumor copy number for PASS events. The numeric value is unlimited.
AlternativeModelDedup—An alternative to the best model corresponding to one less whole-genome duplication is given as a pair of values (tumor purity, diploid coverage). This may be useful for manual investigation where the best model may involve a spurious genome duplication.
AlternativeModelDup—An alternative to the best model corresponding to one more whole-genome duplication is given as a pair of values (tumor purity, diploid coverage). This may be useful for manual investigation where the best model may have missed a true genome duplication.
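
The header fields above can be collected with a few lines of Python. This is a minimal sketch, not part of the product: it assumes each field appears as a simple ##Key=Value header line and that the alternative-model pairs are serialized as two comma-separated numbers, neither of which is confirmed by this section. The function names and the sample values are hypothetical.

```python
def parse_somatic_cnv_header(lines):
    """Collect simple ##Key=Value header lines into a dict of strings."""
    header = {}
    for line in lines:
        if not line.startswith("##"):
            continue  # column header (#CHROM ...) and records are ignored
        key, sep, value = line[2:].strip().partition("=")
        # Skip lines without '=' and structured lines such as ##INFO=<...>.
        if not sep or value.startswith("<"):
            continue
        header[key] = value
    return header


def parse_model_pair(value):
    """Parse a (tumor purity, diploid coverage) pair.

    Assumption: two comma-separated numbers, optionally wrapped in brackets.
    """
    purity, coverage = (float(x) for x in value.strip("()[] ").split(","))
    return purity, coverage


# Made-up header lines, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "##ModelSource=DEPTH+BAF",
    "##EstimatedTumorPurity=0.62",
    "##DiploidCoverage=85.4",
    "##AlternativeModelDedup=0.45,42.7",
]
header = parse_somatic_cnv_header(example)
purity = float(header["EstimatedTumorPurity"])
assert 0.0 <= purity <= 1.0  # documented range of the field
print(header["ModelSource"], parse_model_pair(header["AlternativeModelDedup"]))
```

Values are kept as strings until a field is actually needed, so an unexpected or non-numeric value in one field does not prevent the others from being read.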

The ID column represents the type of event. In addition to representing GAIN, LOSS, and REF events, the CNLOH (copy neutral loss of heterozygosity) and GAINLOH (copy number gain with LOH) entries represent LOH (loss of heterozygosity) events.
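
As an illustration of how the five event types relate to copy number, the sketch below labels a segment from its total and minor copy numbers. It assumes a reference copy number of 2 and treats a minor copy number of 0 (with at least one copy remaining) as LOH. These rules and the function name classify_event are assumptions made for the example and are not necessarily the rules the caller applies.

```python
def classify_event(cn, mcn, reference_cn=2):
    """Map total (cn) and minor-haplotype (mcn) copy number to an event label.

    Assumptions: reference_cn is the expected normal copy number, and
    LOH means the minor haplotype is absent while some copies remain.
    """
    loh = mcn == 0 and cn > 0
    if cn > reference_cn:
        return "GAINLOH" if loh else "GAIN"
    if cn < reference_cn:
        return "LOSS"
    return "CNLOH" if loh else "REF"


print(classify_event(2, 0))  # CNLOH
print(classify_event(4, 0))  # GAINLOH
print(classify_event(3, 1))  # GAIN
```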

The ALT field may have two alleles, such as <DEL>,<DUP>, which allows allele-specific copy numbers to be represented when the two alleles are in different copy number states.
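
One way to picture a two-allele ALT value is to compare each haplotype's copy number with a single expected copy. The sketch below does that, assuming one expected copy per haplotype and "." for a record with no change. The mapping and the function name alt_alleles are hypothetical and are not taken from the caller.

```python
def alt_alleles(cn, mcn):
    """Build an ALT string from total (cn) and minor (mcn) copy number.

    Assumption: each haplotype is expected to have exactly one copy, so a
    haplotype with 0 copies is a deletion and one with >1 copies a duplication.
    """
    major = cn - mcn
    symbols = []
    for allele_cn in (mcn, major):
        if allele_cn < 1:
            symbols.append("<DEL>")
        elif allele_cn > 1:
            symbols.append("<DUP>")
    unique = list(dict.fromkeys(symbols))  # drop repeats, keep order
    return ",".join(unique) if unique else "."


print(alt_alleles(3, 0))  # <DEL>,<DUP>
print(alt_alleles(4, 2))  # <DUP>
```

With cn=3 and mcn=0, one haplotype is lost and the other is present in three copies, which is the situation a value such as <DEL>,<DUP> can express.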

The following additional filters can be applied in the FILTER field.

binCount—CNV events with a bin count lower than the threshold are filtered.
lowModelConfidence—A low confidence in the model estimate marks all records as non-PASSING.
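
Because several filters can apply to one record, the FILTER value is best handled as a set of tags. The sketch below splits the field on semicolons, as the VCF format specifies, and then uses the PASS test to compute a length-weighted average copy number in the spirit of the OverallPloidy header value. Measuring segment length as end minus start, and the choice of which records to pass in, are assumptions. The function names and the sample segments are made up.

```python
def filter_tags(filter_field):
    """Split a VCF FILTER value into a set of tags ('.' means not evaluated)."""
    if filter_field in (".", ""):
        return set()
    return set(filter_field.split(";"))


def is_pass(filter_field):
    """True only when the record carries PASS and no other filter tag."""
    return filter_tags(filter_field) == {"PASS"}


def length_weighted_ploidy(segments):
    """Average copy number over PASS segments, weighted by segment length.

    segments: iterable of (start, end, cn, filter_field) tuples.
    Assumption: length is end - start. Returns None if nothing passes.
    """
    total_length = 0
    weighted_sum = 0
    for start, end, cn, filter_field in segments:
        if not is_pass(filter_field):
            continue
        length = end - start
        total_length += length
        weighted_sum += cn * length
    return weighted_sum / total_length if total_length else None


# Made-up segments: the third one is filtered and does not contribute.
segments = [
    (0, 100, 2, "PASS"),
    (100, 200, 4, "PASS"),
    (200, 1000, 6, "binCount;lowModelConfidence"),
]
print(length_weighted_ploidy(segments))  # 3.0
```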

The FORMAT fields are described in the header section. The following fields are specific to somatic CNV:

AS—Number of allelic read count sites.
BC—Number of read count bins.
CN—Estimated total copy number in tumor fraction of sample.
CNF—Floating point estimate of tumor copy number.
CNQ—Exact total copy number Qscore.
MAF—Maximum a posteriori estimate of minor allele frequency.
MCN—Estimated minor-haplotype copy number.
MCNF—Floating point estimate of tumor minor-haplotype copy number.
MCNQ—Minor copy number Qscore.
NCN—Normal-sample copy number (present only in germline-aware mode).
SCND—Difference between CN and NCN (present only in germline-aware mode).
SD—Best estimate of segment’s bias-corrected read count.
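
The FORMAT keys and the per-sample values are both colon-separated in standard VCF, so they can be paired into a dictionary. The following is a minimal sketch. The numeric types are inferred from the text of each value rather than from the header declarations, and the function names and the sample record are hypothetical.

```python
def parse_value(raw):
    """Convert a FORMAT value to int or float when possible; '.' is missing."""
    if raw in (".", ""):
        return None
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # leave non-numeric values (for example a genotype) as text


def parse_sample(format_field, sample_field):
    """Pair FORMAT keys with one sample's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # VCF allows trailing values to be dropped; treat them as missing.
    values += ["."] * (len(keys) - len(values))
    return {key: parse_value(value) for key, value in zip(keys, values)}


# Made-up FORMAT and sample columns, for illustration only.
sample = parse_sample("SD:BC:CN:CNF:MCN:MAF", "212.5:30:3:2.87:1:0.35")
print(sample["CN"], sample["MCN"], sample["CNF"])  # 3 1 2.87
```

For stricter handling, the Type declared for each field in the VCF header could be used instead of inferring the type from the value.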