Large Indel and Structural Variant Calls

The large indel and structural variant caller uses the series of modules described here, and then generates output files in VCF 4.1 format.

Before ReadBroker

StatsGenerator—Computes summary statistics on insert sizes, read orientation, and alignment scores for each input BAM file.
AnomalousReadFinder—Grouper processes chromosomes in chunks, which enables parallel execution and, therefore, faster performance. AnomalousReadFinder examines all alignments in a chunk and classifies reads and read pairs as follows:
Classifies reads as either shadow (unaligned) or semialigned (partial or clipped alignment).
Classifies read pairs as InsertionPair, DeletionPair, InversionPair, TandemDuplicationPair, or ChimericPair, according to the type of structural variant with which the anomalously mapped read pair is associated.
ClusterFinder—Clusters reads based on their type and the position of their alignment. Only reads of the same type are clustered together at this stage, except shadow and semialigned reads, which can be clustered together.
ClusterMerger—Associates clusters of the various anomalous read pair types with the shadow and semialigned read clusters that the same breakpoint can produce. A breakpoint is a pair of bases that are adjacent in the sample genome but not in the reference. Two clusters are merged if they share a read or if they agree on the position and length of the structural variant, both of which are inferred from read alignment orientation and distance.
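The read pair classification and position-based clustering described above can be illustrated with a minimal Python sketch. It is not Grouper's implementation: the ReadPair fields, the forward-reverse library assumption, the insert size cutoff of mean plus or minus three standard deviations, and the max_gap parameter are all assumptions made for illustration. The mapping from orientation and distance to pair type follows the usual paired-end conventions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical record for one aligned read pair (not a Grouper type)."""
    chrom1: str
    pos1: int        # leftmost aligned position of read 1
    forward1: bool   # True if read 1 aligns to the forward strand
    chrom2: str
    pos2: int
    forward2: bool
    read_len: int = 100


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  n_sd: float = 3.0) -> Optional[str]:
    """Return a pair type name, or None if the pair looks normal.

    Assumes a forward-reverse library; the n_sd cutoff is an assumption.
    """
    if pair.chrom1 != pair.chrom2:
        return "ChimericPair"
    if pair.forward1 == pair.forward2:
        return "InversionPair"          # both reads on the same strand
    # Identify the leftmost read and its strand.
    if pair.pos1 <= pair.pos2:
        left, right, left_forward = pair.pos1, pair.pos2, pair.forward1
    else:
        left, right, left_forward = pair.pos2, pair.pos1, pair.forward2
    if not left_forward:
        return "TandemDuplicationPair"  # reads point away from each other
    insert = right + pair.read_len - left
    if insert > mean_insert + n_sd * sd_insert:
        return "DeletionPair"           # reads map too far apart
    if insert < mean_insert - n_sd * sd_insert:
        return "InsertionPair"          # reads map too close together
    return None


def cluster_positions(positions: List[int], max_gap: int) -> List[List[int]]:
    """Greedily group sorted positions of one pair type into clusters."""
    clusters: List[List[int]] = []
    for p in sorted(positions):
        if clusters and p - clusters[-1][-1] <= max_gap:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters
```

In this sketch, the mean and standard deviation of the insert size would come from the kind of summary statistics that StatsGenerator computes, and cluster_positions would be called once per pair type so that only reads of the same type end up in the same cluster.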

ReadBroker

Interchromosomal translocations yield chimeric read pairs in which one read aligns to one chromosome and its partner aligns to another. Because Grouper examines each chromosome individually, the ReadBroker step is performed to join the information from chimeric read pairs across chromosomes.
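The joining step can be pictured with a short Python sketch. It assumes, purely for illustration, that each per-chromosome pass emits (read name, position) records for the chimeric reads it saw; the function name join_chimeric_reads and the record layout are hypothetical and do not describe ReadBroker's actual interface.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def join_chimeric_reads(
    per_chrom: Dict[str, List[Tuple[str, int]]]
) -> Dict[Tuple[str, str], List[Tuple[str, int, int]]]:
    """Join chimeric read ends that were collected one chromosome at a time.

    per_chrom maps a chromosome name to (read_name, position) records.
    The result maps a sorted chromosome pair to (read_name, pos_a, pos_b).
    """
    # Collect every end seen for each read pair name.
    ends_by_name: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for chrom, reads in per_chrom.items():
        for name, pos in reads:
            ends_by_name[name].append((chrom, pos))

    joined: Dict[Tuple[str, str], List[Tuple[str, int, int]]] = defaultdict(list)
    for name, ends in ends_by_name.items():
        if len(ends) != 2:
            continue  # partner not seen on any chromosome; nothing to join
        (chrom_a, pos_a), (chrom_b, pos_b) = sorted(ends)
        if chrom_a == chrom_b:
            continue  # not an interchromosomal pair
        joined[(chrom_a, chrom_b)].append((name, pos_a, pos_b))
    return dict(joined)
```

Keying the result on the chromosome pair keeps all evidence for a candidate translocation between the same two chromosomes together, which is the information that a single-chromosome pass cannot see on its own.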

After ReadBroker

SmallAssembler—Assembles the reads in each cluster into contigs using a de Bruijn method, iterating until all reads in the cluster are assembled. It also produces a file containing the reads that were used to assemble each contig, with a realignment to the contig sequence.
SpanContigs—Uses the presence of nearby anomalous read pairs to determine whether to extend the search range used by the subsequent AlignContig step from its default.
AlignContig—Computes a dynamic programming alignment of a contig to a region of the reference genome; merges full or partial duplicate calls of the same event into a single call.
VariantFilter—Removes all structural variants that overlap gaps listed in the UCSC gaps file. The UCSC gaps file defines regions of the genome that have not been sequenced.
DeletionGenotyper—Assigns a genotype to all deletions.
SomaticGenotyper—Assigns a quality score (Q-score) to all structural variants. A higher Q-score indicates a higher probability that the structural variant is somatic.
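As an illustration of the VariantFilter step, the following Python sketch drops calls that overlap a gap region. The tuple layouts, the function names, and the use of 0-based half-open coordinates are assumptions made for the example, not a description of the module's actual inputs.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]            # (start, end), 0-based half-open
Variant = Tuple[str, int, int, str]   # (chrom, start, end, type), hypothetical


def overlaps_gap(start: int, end: int, gaps: List[Interval]) -> bool:
    """Return True if [start, end) overlaps any gap interval."""
    return any(start < gap_end and gap_start < end
               for gap_start, gap_end in gaps)


def filter_variants(variants: List[Variant],
                    gaps_by_chrom: Dict[str, List[Interval]]) -> List[Variant]:
    """Keep only the variants that do not overlap a gap on their chromosome."""
    return [
        v for v in variants
        if not overlaps_gap(v[1], v[2], gaps_by_chrom.get(v[0], []))
    ]
```

With half-open coordinates, a variant that starts exactly where a gap ends is kept, because the two intervals only touch and do not share any base.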