Input Requirements

The SV Caller runs quality checks on the input sequencing reads for each sample to confirm that the input corresponds to a paired-read assay with the expected FR orientation before it estimates the fragment size distribution. To check the consensus read pair orientation, the SV Caller samples a subset of high-quality read pairs. At least 90% of these pairs must have the expected FR orientation for SV analysis to continue. Otherwise, the SV Caller issues a warning, skips any further analysis, and writes empty results to its output files.
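The orientation check described above can be sketched as follows. This is a minimal illustration, not the SV Caller's implementation: the `ReadPair` record, the function names, and the tie-breaking rule for reads that start at the same position are assumptions made for the example. Only the 90% threshold comes from the text.

```python
from dataclasses import dataclass

MIN_FR_PERCENT = 90  # threshold stated in the text


@dataclass
class ReadPair:
    """Hypothetical minimal record for one sampled high-quality read pair."""
    pos1: int       # leftmost mapped position of read 1
    reverse1: bool  # True if read 1 maps to the reverse strand
    pos2: int       # leftmost mapped position of read 2
    reverse2: bool  # True if read 2 maps to the reverse strand


def is_fr(pair):
    """Return True when the leftmost read is forward and the rightmost is reverse."""
    # Assumption: when both reads start at the same position, read 1 is
    # treated as the leftmost read.
    if pair.pos1 <= pair.pos2:
        left_reverse, right_reverse = pair.reverse1, pair.reverse2
    else:
        left_reverse, right_reverse = pair.reverse2, pair.reverse1
    return (not left_reverse) and right_reverse


def passes_orientation_check(sampled_pairs):
    """Return True when at least 90% of the sampled pairs are FR oriented."""
    total = len(sampled_pairs)
    if total == 0:
        return False  # assumption: an empty sample cannot pass the check
    fr_count = sum(is_fr(pair) for pair in sampled_pairs)
    # Integer comparison avoids floating-point rounding at the threshold.
    return fr_count * 100 >= MIN_FR_PERCENT * total
```

With this sketch, a sample in which 9 of 10 sampled pairs are FR oriented passes, and a sample in which 8 of 10 are FR oriented does not.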

The SV Caller can tolerate nonpaired reads in the input if sufficient paired-end reads exist to estimate the fragment size distribution. To estimate the fragment size distribution, the SV Caller requires at least 100 read pairs that meet the quality requirements of the estimation routine: both reads of the pair must map to the same chromosome with a non-zero mapping quality, must not be filtered or part of a split read mapping, and must not contain indels or soft-clipping. If a sample does not contain a sufficient number of such read pairs, the SV Caller issues a warning, skips any further analysis, and writes empty results to its output files.
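The eligibility rules above can be illustrated with a short sketch. The `Read` record, its `is_filtered` and `is_split` flags, and the function names are hypothetical and chosen for the example. The criteria themselves and the minimum of 100 pairs come from the text.

```python
from dataclasses import dataclass

MIN_USABLE_PAIRS = 100  # minimum stated in the text


@dataclass
class Read:
    """Hypothetical minimal view of one aligned read."""
    chrom: str
    mapq: int
    cigar: str                 # CIGAR string, for example "100M"
    is_filtered: bool = False  # assumption: set when the read is filtered
    is_split: bool = False     # assumption: set for split read mappings


def read_is_usable(read):
    """Apply the per-read requirements of the estimation routine."""
    # I and D mark indels, S marks soft-clipping.
    has_indel_or_clip = any(op in read.cigar for op in "IDS")
    return (
        read.mapq > 0
        and not read.is_filtered
        and not read.is_split
        and not has_indel_or_clip
    )


def pair_is_usable(read1, read2):
    """Both reads must be usable and map to the same chromosome."""
    return (
        read1.chrom == read2.chrom
        and read_is_usable(read1)
        and read_is_usable(read2)
    )


def can_estimate_fragment_sizes(pairs):
    """Return True when at least 100 pairs meet the requirements."""
    usable = sum(pair_is_usable(read1, read2) for read1, read2 in pairs)
    return usable >= MIN_USABLE_PAIRS
```

In this sketch, nonpaired or otherwise unusable reads only reduce the usable count. The sample still qualifies as long as 100 usable pairs remain.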

Read Groups

The SV Caller disregards any read group labels applied to the input sequences. Each input sample is treated as a separate library with a single fragment size distribution.

File Format

In standalone mode, input sequencing reads must be mapped and provided in either BAM or CRAM format. Each input file must be coordinate sorted and indexed, with a samtools/htslib-style index stored in a file named after the input BAM or CRAM file plus an additional '.bai', '.crai', or '.csi' file name extension.
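As a pre-flight check, the index naming rule above can be verified before launching an analysis. The following sketch is a hypothetical helper, not part of the SV Caller. It only looks for a file whose name is the input name plus one of the three extensions listed in the text, and it does not verify that the index is valid or that the input is coordinate sorted.

```python
from pathlib import Path

# Extensions listed in the text.
INDEX_SUFFIXES = (".bai", ".crai", ".csi")


def find_index(alignment_path):
    """Return the first index file found next to the input, or None."""
    path = Path(alignment_path)
    for suffix in INDEX_SUFFIXES:
        # The index name is the full input file name plus the extension,
        # for example sample.bam -> sample.bam.bai.
        candidate = path.with_name(path.name + suffix)
        if candidate.is_file():
            return candidate
    return None
```

For an input named sample.bam, the sketch checks for sample.bam.bai, sample.bam.crai, and sample.bam.csi in that order.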

At least one BAM or CRAM file must be provided for the normal or tumor sample. A matched tumor-normal sample pair can be provided as well. If multiple input files are provided for the normal sample, each file is treated as a separate sample as part of a joint diploid sample analysis.

In standalone mode, input BAM or CRAM files are subject to the following limitations:

• Alignments cannot have an unknown read sequence (SEQ="*").

• Alignments cannot contain the "=" character in the SEQ field.

• Alignments cannot use the sequence match/mismatch ("="/"X") CIGAR notation.

• RG (read group) tags in the alignment records are ignored. Each alignment file is treated as representing one sample.

• Alignments with base call quality values greater than 70 are rejected. These values are not supported on the assumption that they indicate an offset error.
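The limitations listed above can be checked per alignment record before the files are submitted. The sketch below is a hypothetical validator, not the SV Caller's own logic. It assumes that the SEQ and CIGAR fields are available as strings and that base qualities are already decoded to integer Phred scores.

```python
MAX_BASE_QUALITY = 70  # limit stated in the text


def record_problems(seq, cigar, base_qualities):
    """Return a list of limitations that one alignment record violates."""
    problems = []
    if seq == "*":
        problems.append("unknown read sequence (SEQ='*')")
    elif "=" in seq:
        problems.append("'=' character in the SEQ field")
    # '=' and 'X' are the sequence match/mismatch CIGAR operators.
    if "=" in cigar or "X" in cigar:
        problems.append("sequence match/mismatch CIGAR notation")
    if any(quality > MAX_BASE_QUALITY for quality in base_qualities):
        problems.append("base call quality greater than 70")
    return problems
```

An empty list means that the record passes all of the checks in this sketch. RG tags need no check because they are ignored rather than rejected.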