Pre-Scoring Alignment

As a first step, candidate variants smaller than minSomaticCallSize are filtered out. Remaining candidate variants that meet the following criteria are then excluded from somatic scoring, because they are considered likely false positives from the outset:

    1. Candidate variants whose variant contig is constructed from an equal or greater number of normal sample reads than tumor sample reads (after coverage-based normalization);
    2. Candidate variants with fewer than 2 tumor sample reads used to construct the variant contig (parameter “minAnomReadSuppCancer”);
    3. Candidate variants with more than 10 normal sample reads used to construct the variant contig (parameter “maxAnomReadSuppNormal”).
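
The pre-scoring filter above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the Candidate class and the function name are hypothetical, and it assumes that any single criterion is enough to exclude a candidate. The text does not spell out the form of the coverage-based normalization, so the sketch assumes the normal read count is simply scaled by the ratio of tumor to normal coverage, and that criteria 2 and 3 use raw read counts. No default is given for minSomaticCallSize in the text, so it is a required argument here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical container for one candidate variant."""
    size: int          # variant size in bases
    tumor_reads: int   # tumor reads used to construct the variant contig
    normal_reads: int  # normal reads used to construct the variant contig


def passes_prefilter(cand, min_somatic_call_size, tumor_cov, normal_cov,
                     min_anom_read_supp_cancer=2,
                     max_anom_read_supp_normal=10):
    """Return True if the candidate should proceed to somatic scoring."""
    # Size filter applied before the three exclusion criteria.
    if cand.size < min_somatic_call_size:
        return False

    # Criterion 1. ASSUMPTION: normalization scales the normal read count
    # to the tumor coverage; the source does not specify the exact form.
    normal_scaled = cand.normal_reads * tumor_cov / normal_cov
    if normal_scaled >= cand.tumor_reads:
        return False

    # Criterion 2: too few tumor reads in the contig (raw count assumed).
    if cand.tumor_reads < min_anom_read_supp_cancer:
        return False

    # Criterion 3: too many normal reads in the contig (raw count assumed).
    if cand.normal_reads > max_anom_read_supp_normal:
        return False

    return True
```

For example, with equal coverage in both samples, a candidate supported by 50 tumor reads and 10 normal reads passes, while one supported by 50 tumor reads and 11 normal reads is excluded by criterion 3.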

After this pre-scoring filter, a pool of breakpoint-associated reads is assembled for each sample. These reads are realigned to both the reference allele and the putative somatic allele to find evidence supporting each allele in each sample. The pool is built from two sources: the reads used to assemble the variant contig, and all reads from the BAM files that align within 10 bases of the predicted breakpoints. Reads flagged as PCR duplicates, unmapped, or anomalous, or with a MAPQ score below 20, are not included.
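
As an illustration of how such a pool might be built, here is a minimal sketch using plain Python objects rather than a real BAM reader. The Read class, its fields, and the function names are hypothetical. The sketch makes two assumptions that the text does not settle: the exclusion flags are applied only to reads drawn from the BAM files (the second source), and "within 10 bases" is read as the read's aligned interval overlapping a window of 10 bases on either side of a breakpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical, simplified view of an aligned read."""
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    mapq: int
    is_duplicate: bool = False
    is_unmapped: bool = False
    is_anomalous: bool = False


def is_usable(read, min_mapq=20):
    """Reject PCR duplicates, unmapped, anomalous, and low-MAPQ reads."""
    return not (read.is_duplicate or read.is_unmapped
                or read.is_anomalous or read.mapq < min_mapq)


def near_breakpoint(read, breakpoints, window=10):
    """True if the read overlaps [pos - window, pos + window] for any
    (chrom, pos) breakpoint. ASSUMPTION: this is what 'aligns within
    10 bases of the predicted breakpoints' means."""
    return any(read.chrom == chrom
               and read.start <= pos + window
               and read.end > pos - window
               for chrom, pos in breakpoints)


def build_read_pool(contig_reads, bam_reads, breakpoints,
                    window=10, min_mapq=20):
    """Combine both read sources into one de-duplicated, ordered pool."""
    # Source 1: reads used to assemble the variant contig.
    pool = dict.fromkeys(contig_reads)
    # Source 2: usable BAM reads close to a predicted breakpoint.
    for read in bam_reads:
        if is_usable(read, min_mapq) and near_breakpoint(read, breakpoints, window):
            pool.setdefault(read)
    return list(pool)
```

The dictionary is used only to drop reads that appear in both sources while preserving order; a real implementation would more likely key on read name and mate number.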

The reads in the breakpoint-associated read pools are then realigned to the reference and putative somatic alleles using Smith-Waterman alignment. To provide sufficient sequence context, the reference and intra-chromosomal somatic alleles are extended to have at least 120 bp on either side of the variant breakpoints. Read alignments that have mismatches in more than 10% of their sequence, or that have more than 3 gaps, are not eligible to count as support for an allele. In addition, the read alignment score must favor the reference or somatic allele by at least (match score - mismatch score) * 10 for the read to count as support for that allele, and the realigned read must cross one of the predicted breakpoints by at least 8 bases.
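
The eligibility rules for a realigned read can be expressed as a short decision function. The sketch below is illustrative only: the Alignment class and function names are hypothetical, the alignments are assumed to have been computed already by a Smith-Waterman implementation, and the match and mismatch scores are passed in because their values are not stated here. It also assumes that "crossing a breakpoint by at least 8 bases" means at least 8 aligned bases on each side of the breakpoint, and that the mismatch fraction is taken relative to the read length.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of one read-to-allele alignment."""
    score: int
    start: int        # allele coordinate, 0-based inclusive
    end: int          # allele coordinate, 0-based exclusive
    mismatches: int
    gaps: int
    read_length: int


def is_eligible(aln, breakpoints, max_mismatch_pct=10, max_gaps=3, min_cross=8):
    """Apply the mismatch, gap, and breakpoint-crossing rules."""
    # More than 10% mismatches disqualifies (integer math avoids rounding).
    if aln.mismatches * 100 > max_mismatch_pct * aln.read_length:
        return False
    if aln.gaps > max_gaps:
        return False
    # ASSUMPTION: 'crosses by at least 8 bases' means >= 8 aligned bases
    # on both sides of at least one breakpoint position.
    return any(bp - aln.start >= min_cross and aln.end - bp >= min_cross
               for bp in breakpoints)


def supported_allele(ref_aln, alt_aln, ref_bps, alt_bps,
                     match_score, mismatch_score):
    """Return 'somatic', 'reference', or None for one realigned read."""
    min_delta = (match_score - mismatch_score) * 10
    delta = alt_aln.score - ref_aln.score
    if delta >= min_delta and is_eligible(alt_aln, alt_bps):
        return "somatic"
    if -delta >= min_delta and is_eligible(ref_aln, ref_bps):
        return "reference"
    return None
```

With hypothetical scores of 2 for a match and -4 for a mismatch, the required margin is (2 - (-4)) * 10 = 60, so a read scoring 190 against the somatic allele and 100 against the reference would count as somatic support, provided its somatic alignment also passes the mismatch, gap, and breakpoint-crossing checks.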