Input Data Filtration

Isaac Somatic Variant Caller uses 2 tiers of input data filtration during somatic small variant calling:

Tier 1: More stringent filtration, to ensure high-quality calls
Tier 2: Less stringent filtration

Initially, candidates are called using only the data that pass the more stringent tier 1 filtering. If this produces a nonzero quality score for an SNV or indel, the potential somatic variant is called again using the data that pass the less stringent tier 2 filtering. The lower of the 2 tier qualities is selected for output. However, if the tier 2 quality is 0, the call is eliminated.
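The two-tier procedure can be sketched as a small function. This is a minimal illustration, not the caller's implementation: the scoring callback `score(candidate, tier)`, the candidate IDs, and the example quality values are all hypothetical stand-ins for the real per-tier evaluation.

```python
from typing import Callable, Optional

# Hypothetical scoring callback: given a candidate ID and a tier (1 or 2),
# return the integer somatic quality computed from that tier's data.
ScoreFn = Callable[[str, int], int]


def two_tier_quality(candidate: str, score: ScoreFn) -> Optional[int]:
    """Return the reported quality for a candidate, or None if no call is made."""
    q_tier1 = score(candidate, 1)  # stringent, high-quality data first
    if q_tier1 == 0:
        return None  # no tier 1 support, so tier 2 is never evaluated
    q_tier2 = score(candidate, 2)  # re-evaluate with the less stringent data
    if q_tier2 == 0:
        return None  # a tier 2 quality of 0 eliminates the call
    return min(q_tier1, q_tier2)  # report the lower of the two qualities


# Made-up example qualities, keyed by (candidate, tier).
scores = {
    ("chr1:100", 1): 45, ("chr1:100", 2): 30,
    ("chr2:200", 1): 0,                       # fails tier 1
    ("chr3:300", 1): 20, ("chr3:300", 2): 0,  # eliminated at tier 2
}


def lookup(candidate: str, tier: int) -> int:
    return scores[(candidate, tier)]


for cand in ("chr1:100", "chr2:200", "chr3:300"):
    print(cand, two_tier_quality(cand, lookup))
```

Running the loop prints `chr1:100 30`, `chr2:200 None`, and `chr3:300 None`. Note that `chr2:200` has no tier 2 entry at all: because its tier 1 quality is 0, the tier 2 evaluation is skipped. Returning `None` rather than 0 is a design choice here to distinguish "not called" from a numeric quality.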

For somatic SNVs and indels, Isaac Somatic Variant Caller produces 2 kinds of quality score: a general somatic quality, Q(ssnv) or Q(somatic indel), which indicates the probability that the variant is somatic, and a joint quality, Q(ssnv+ntype) or Q(somatic indel+ntype), which indicates the joint probability of the somatic variant and a specific normal genotype. The 2-tier evaluation is applied to each of these qualities separately, as follows:

Q(ssnv) = min(Q(ssnv|tier1), Q(ssnv|tier2))
Q(ssnv+ntype) = min(Q(ssnv+ntype|tier1), Q(ssnv+ntype|tier2))
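Because the minimum is taken per quality, the reported values need not come from the same tier. The sketch below applies the rule to a dictionary of qualities; the example numbers are invented for illustration.

```python
def combine_tiers(tier1: dict[str, int], tier2: dict[str, int]) -> dict[str, int]:
    """Apply the min() rule independently to every quality score."""
    return {name: min(tier1[name], tier2[name]) for name in tier1}


# Invented example values for a single candidate SNV.
tier1 = {"Q(ssnv)": 52, "Q(ssnv+ntype)": 40}
tier2 = {"Q(ssnv)": 38, "Q(ssnv+ntype)": 44}
print(combine_tiers(tier1, tier2))  # {'Q(ssnv)': 38, 'Q(ssnv+ntype)': 40}
```

In this example, Q(ssnv) is taken from tier 2 while Q(ssnv+ntype) is taken from tier 1, which is why the output record states the tier used for each quality value.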

The tier used for each quality value is provided in the Isaac Somatic Variant Caller output record for each somatic variant. If the most likely normal genotype is not the same at tier 1 and tier 2, then the normal genotype is reported as a conflict in the output.
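The reporting rule for a single quality value can be sketched as follows. The `TierResult` type, the genotype labels (`"ref"`, `"het"`), and the tie-break that attributes equal qualities to tier 1 are assumptions made for illustration; this section does not specify how ties are reported.

```python
from dataclasses import dataclass


@dataclass
class TierResult:
    quality: int
    normal_genotype: str  # hypothetical label for the most likely normal genotype


def report(tier1: TierResult, tier2: TierResult) -> dict:
    # Record which tier supplied the (lower) reported quality.
    # Assumed tie-break: when the qualities are equal, attribute the value to tier 1.
    tier_used = 1 if tier1.quality <= tier2.quality else 2
    # If the tiers disagree on the most likely normal genotype, report a conflict.
    if tier1.normal_genotype == tier2.normal_genotype:
        genotype = tier1.normal_genotype
    else:
        genotype = "conflict"
    return {
        "quality": min(tier1.quality, tier2.quality),
        "tier": tier_used,
        "normal_genotype": genotype,
    }


print(report(TierResult(50, "ref"), TierResult(35, "het")))
# {'quality': 35, 'tier': 2, 'normal_genotype': 'conflict'}
```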

Using 2 data tiers enables an initial somatic call to be made from high-quality data only. Given a potential call, the tier 2 evaluation then checks whether lower-quality data support the putative somatic allele in the normal sample; such support lowers the tier 2 quality and, through the minimum, the reported quality. The following table lists the primary data filtration parameters that change between tier 1 and tier 2.

| Parameter | Tier 1 value | Tier 2 value |
|---|---|---|
| Min paired-end alignment score | 20 | 0 |
| Min single-end alignment score | 10 | 0 |
| Single-end score rescue? | No | Yes |
| Include unanchored pairs? | No | Yes |
| Include anomalous pairs? | No | Yes |
| Include singleton pairs? | No | Yes |
| Mismatch density filter: maximum mismatches in window | 3 | 10 |
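The table can be expressed as two parameter sets driving one read filter. This is a hypothetical sketch: the `Read` fields, the pair categories, and especially the interpretation of "single-end score rescue" (a read that fails the paired-end threshold is kept if its single-end score passes) are assumptions, since this section does not define how the caller combines these scores. The mismatch window size is also not given here, so the read carries a precomputed window mismatch count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierFilter:
    min_paired_end_score: int
    min_single_end_score: int
    single_end_score_rescue: bool
    include_unanchored_pairs: bool
    include_anomalous_pairs: bool
    include_singleton_pairs: bool
    max_mismatches_in_window: int


# Values transcribed from the table above (field order follows the table rows).
TIER1 = TierFilter(20, 10, False, False, False, False, 3)
TIER2 = TierFilter(0, 0, True, True, True, True, 10)


@dataclass
class Read:
    # Hypothetical read summary; field names are illustrative only.
    pair_class: str  # "proper", "unanchored", "anomalous", or "singleton"
    paired_end_score: int
    single_end_score: int
    max_window_mismatches: int  # most mismatches in any window (window size assumed)


def passes(read: Read, f: TierFilter) -> bool:
    """Hypothetical read filter driven by one tier's parameters."""
    allowed = {
        "proper": True,
        "unanchored": f.include_unanchored_pairs,
        "anomalous": f.include_anomalous_pairs,
        "singleton": f.include_singleton_pairs,
    }
    if not allowed[read.pair_class]:
        return False
    if read.max_window_mismatches > f.max_mismatches_in_window:
        return False  # mismatch density filter
    if read.paired_end_score >= f.min_paired_end_score:
        return True
    # Assumed meaning of "single-end score rescue": fall back to the single-end score.
    return f.single_end_score_rescue and read.single_end_score >= f.min_single_end_score


read = Read("anomalous", paired_end_score=15, single_end_score=12, max_window_mismatches=4)
print(passes(read, TIER1), passes(read, TIER2))  # False True
```

The example read is rejected at tier 1 (anomalous pairs are excluded) but accepted at tier 2, which is how lower-quality evidence reaches the second evaluation. Keeping both tiers as instances of one frozen parameter type means a single filter function serves both passes.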