DRAGEN Somatic Small Variant Caller (Tumor Only)

The DRAGEN Somatic Small Variant Caller takes mapped and aligned DNA reads as input and calls SNVs and indels through local de novo assembly of haplotypes in an active region.

Callable reference regions, those with sufficient alignment coverage, are identified first. Within these regions, a fast scan of the sorted reads identifies active regions, centered on pileup columns with evidence of a variant in the tumor reads. Each active region is padded with enough context to cover significant nonreference content nearby, and padded further where there is evidence of indels.
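
The scan described above can be illustrated with a minimal Python sketch. It is not the DRAGEN implementation: the function name, the per-column count inputs, and every threshold and padding value (min_depth, min_alt_reads, padding, indel_extra) are hypothetical and chosen only to show the idea of flagging columns, padding them, and merging overlapping windows.

```python
from typing import List, Tuple


def find_active_regions(
    depths: List[int],
    alt_counts: List[int],
    indel_counts: List[int],
    min_depth: int = 10,      # hypothetical callability threshold
    min_alt_reads: int = 2,   # hypothetical variant-evidence threshold
    padding: int = 3,         # hypothetical base padding per side
    indel_extra: int = 2,     # hypothetical extra padding near indels
) -> List[Tuple[int, int]]:
    """Return merged, padded half-open [start, end) active regions.

    depths[i]       -- reads covering pileup column i
    alt_counts[i]   -- tumor reads with any nonreference evidence at column i
    indel_counts[i] -- subset of those reads that show an indel at column i
    """
    n = len(depths)
    regions: List[Tuple[int, int]] = []
    for pos in range(n):
        # Skip columns that are not callable or lack variant evidence.
        if depths[pos] < min_depth or alt_counts[pos] < min_alt_reads:
            continue
        pad = padding + (indel_extra if indel_counts[pos] > 0 else 0)
        start = max(0, pos - pad)
        end = min(n, pos + pad + 1)
        if regions and start <= regions[-1][1]:
            # Overlaps or touches the previous window: extend it.
            prev_start, prev_end = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end))
        else:
            regions.append((start, end))
    return regions
```

In this sketch, two nearby active columns collapse into one region, and a column with indel evidence receives a wider window than a column with only SNV evidence.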

Within each active region, aligned reads are clipped and assembled into a De Bruijn graph. Graph edges are weighted by observation counts, with the reference sequence as a backbone. After some graph cleanup and simplification, all source-to-sink paths are extracted as candidate haplotypes. Each haplotype is Smith-Waterman aligned to the reference genome to identify the variants it represents. For each read-haplotype pair, a pair hidden Markov model (HMM) estimates the probability P(r|H) of observing the read, assuming the haplotype is the true starting sample.
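
The assembly step can be sketched in Python as follows. This is an illustrative toy, not the DRAGEN algorithm: the function name, the k-mer size, and the min_weight pruning rule (standing in for "graph cleanup and simplification") are assumptions, and the sketch assumes every k-mer in the reference window is unique. Edges are counted from the reads, reference edges are always kept as the backbone, and every acyclic path from the first to the last reference k-mer is spelled out as a candidate haplotype.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def candidate_haplotypes(
    reference: str,
    reads: List[str],
    k: int = 4,           # hypothetical k-mer size
    min_weight: int = 2,  # hypothetical pruning threshold for non-reference edges
) -> List[str]:
    """Build a weighted De Bruijn graph and list source-to-sink haplotypes."""
    # Reference edges form the backbone and are never pruned.
    ref_edges: Set[Tuple[str, str]] = {
        (reference[i:i + k], reference[i + 1:i + k + 1])
        for i in range(len(reference) - k)
    }
    # Edge weight = number of times the edge is observed in the reads.
    weights: Dict[Tuple[str, str], int] = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k):
            weights[(read[i:i + k], read[i + 1:i + k + 1])] += 1

    # Simplified cleanup: drop weakly supported non-reference edges.
    adjacency: Dict[str, List[str]] = defaultdict(list)
    for u, v in ref_edges | set(weights):
        if (u, v) in ref_edges or weights.get((u, v), 0) >= min_weight:
            adjacency[u].append(v)

    source, sink = reference[:k], reference[-k:]
    haplotypes: List[str] = []

    def walk(node: str, sequence: str, seen: Set[str]) -> None:
        if node == sink:
            haplotypes.append(sequence)
            return
        for nxt in sorted(adjacency[node]):
            if nxt not in seen:  # avoid cycles
                walk(nxt, sequence + nxt[-1], seen | {nxt})

    walk(source, source, {source})
    return haplotypes
```

With a reference window, several reads carrying the same SNV, and one read carrying a singleton error, the sketch returns the reference haplotype and the SNV haplotype; the singleton path is pruned unless min_weight is lowered to 1. In the pipeline described here, each such haplotype would then be aligned to the reference and scored against the reads with the pair HMM.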

For each candidate somatic event, and for the reference event, the caller scans the active region by reference position and estimates the conditional probability P(r|e) of observing each overlapping read as the maximum P(r|H) over the haplotypes supporting the event. These values are combined into the conditional probability P(r|E) for an event hypothesis, E, which models a mixture of the reference and candidate somatic alleles over a range of possible allele frequencies. The per-read probabilities are then multiplied to yield the conditional probability P(R|E) of observing the whole read pileup. From there, a TLOD score is calculated as the evidence that an ALT allele is present in the tumor sample at a given locus.
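
One way to picture the mixture model is the Python sketch below. It rests on assumptions that this guide does not state: the per-read mixture is taken to be P(r|E) = (1 - f) * P(r|ref) + f * P(r|alt) for allele frequency f, the range of frequencies is handled by taking the best value on a fixed grid, and the score is expressed as a log10 likelihood ratio against the reference-only hypothesis. The function name and the grid are hypothetical, and the actual DRAGEN TLOD computation may differ.

```python
import math
from typing import List, Sequence, Tuple


def mixture_lod(
    read_likelihoods: Sequence[Tuple[float, float]],
    allele_fractions: Sequence[float] = tuple(i / 100 for i in range(1, 100)),
) -> Tuple[float, float]:
    """Return (score, best_fraction) for one candidate event.

    read_likelihoods holds (P(r|ref), P(r|alt)) for each overlapping read;
    all probabilities are assumed to be strictly positive.
    """
    # log10 P(R|reference only): product over reads, taken in log space.
    log_ref = sum(math.log10(p_ref) for p_ref, _ in read_likelihoods)

    best_log_mix = -math.inf
    best_fraction = allele_fractions[0]
    for f in allele_fractions:
        # log10 P(R|E_f): each read is drawn from a ref/alt mixture.
        log_mix = sum(
            math.log10((1.0 - f) * p_ref + f * p_alt)
            for p_ref, p_alt in read_likelihoods
        )
        if log_mix > best_log_mix:
            best_log_mix, best_fraction = log_mix, f

    # Assumed scoring: log10 likelihood ratio of the mixture vs. reference.
    return best_log_mix - log_ref, best_fraction
```

Under these assumptions, a pileup with eight reads that fit the reference and two that fit the candidate allele yields a positive score with a best-fitting frequency near 0.2, while a pileup of reference-only reads yields a slightly negative score.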

The set of filters applied to the output VCF of the Tumor-only pipeline is described in the Illumina DRAGEN Bio-IT Platform User Guide (document # 1000000070494).