Gene Fusion Detection

The DRAGEN Gene Fusion module uses the DRAGEN RNA spliced aligner for detection of gene fusion events. It performs a split-read analysis on the supplementary (chimeric) alignments to detect potential breakpoints. The putative fusion events then go through various filtering stages to mitigate potential false positives. In addition to the final results, all potential candidates (unfiltered) are output, which can be used to maximize sensitivity.

Gene Fusion Output and Filters

The <outputPrefix>.fusion_candidates.features.csv file lists the detected gene fusion events. The file includes the following columns. Any additional columns describe further features of the fusion candidates.

• #FusionGene: Parent gene names (in 5' to 3' order of transcript) participating in the fusion. If a fusion breakend overlaps multiple genes, all are listed.

• Score: Fusion call confidence score based on the number of supporting split reads and read pairs as well as other fusion features. The score ranges from 0 (low confidence) to 1 (high confidence).

• LeftBreakpoint: Gene 1 breakpoint formatted as <Chromosome>:<Position>:<Strand>.

• RightBreakpoint: Gene 2 breakpoint formatted as <Chromosome>:<Position>:<Strand>.

• Filter: Semicolon-separated list of filters. Each filter is either a confidence filter or an information-only filter. The Filter value is PASS if none of the confidence filters are triggered. Otherwise, the value is FAIL.
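The LeftBreakpoint and RightBreakpoint columns share the <Chromosome>:<Position>:<Strand> layout, so each value can be split into three fields. The following is a minimal sketch, not part of DRAGEN: the names Breakpoint and parse_breakpoint are hypothetical, and the example coordinates are made up for illustration.

```python
from typing import NamedTuple


class Breakpoint(NamedTuple):
    """One breakpoint from the LeftBreakpoint or RightBreakpoint column."""
    chromosome: str
    position: int
    strand: str


def parse_breakpoint(text: str) -> Breakpoint:
    """Parse '<Chromosome>:<Position>:<Strand>' into its three fields."""
    # Split from the right so that a contig name containing ':' stays intact.
    chromosome, position, strand = text.strip().rsplit(":", 2)
    return Breakpoint(chromosome, int(position), strand)


if __name__ == "__main__":
    # Hypothetical value, used only to show the call.
    print(parse_breakpoint("chr1:12345:+"))
```

A value with fewer than three fields, or a non-numeric position, raises ValueError, which is usually preferable to silently accepting a malformed row.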

The following are the available filters.

Filter	Type	Description
DOUBLE_BROKEN_EXON	Confidence	Both breakpoints are ≥ 50 bp from annotated exon boundaries, and the number of supporting reads does not satisfy the high threshold required in that case (≥ 10 supporting reads).
LOW_MAPQ	Confidence	All fusion supporting read alignments at either of the breakpoints have MAPQ < 20.
LOW_UNIQUE_ALIGNMENTS	Confidence	All fusion supporting read alignments map to a unique genomic interval at either of the breakpoints.
MIN_SCORE	Confidence	The fusion candidate has a low probabilistic score, as determined by the features of the candidate.
MIN_SUPPORT	Confidence	The fusion candidate has < 2 fusion supporting read pairs.
READ_THROUGH	Confidence	The breakpoints are cis neighbors (< 200,000 bp) on the reference genome.
ANCHOR_SUPPORT	Information only	Read alignments of fusion supporting reads are not long enough (< 12 bp) at either of the two breakpoints.
HOMOLOGOUS	Information only	The candidate is likely a false candidate generated because the two genes involved have high gene homology.
LOW_ALT_TO_REF	Information only	The number of fusion supporting reads is < 1% of the number of reads supporting the reference transcript at either of the two breakpoints.
LOW_GENE_COVERAGE	Information only	Either of the two breakpoints has less than 125 bp with nonzero read coverage.
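The filter table can be applied when post-processing the candidates file. The sketch below is one way to keep only the candidates that triggered no confidence filter. It rests on several assumptions that should be checked against a real output file: the file is comma-delimited, the header contains a column named Filter, and the Filter field carries the filter names from the table separated by semicolons. The function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, Iterator, Set

# Filter names and types as listed in the table above.
CONFIDENCE_FILTERS = {
    "DOUBLE_BROKEN_EXON", "LOW_MAPQ", "LOW_UNIQUE_ALIGNMENTS",
    "MIN_SCORE", "MIN_SUPPORT", "READ_THROUGH",
}
INFORMATION_ONLY_FILTERS = {
    "ANCHOR_SUPPORT", "HOMOLOGOUS", "LOW_ALT_TO_REF", "LOW_GENE_COVERAGE",
}


def filter_names(filter_value: str) -> Set[str]:
    """Split a semicolon-separated Filter value into a set of names."""
    return {name.strip() for name in filter_value.split(";") if name.strip()}


def triggered_confidence_filters(filter_value: str) -> Set[str]:
    """Return only the confidence filters named in a Filter value."""
    return filter_names(filter_value) & CONFIDENCE_FILTERS


def passing_candidates(lines: Iterable[str],
                       delimiter: str = ",") -> Iterator[Dict[str, str]]:
    """Yield rows that are not marked FAIL and name no confidence filter.

    The delimiter default is an assumption; pass another one if needed.
    """
    for row in csv.DictReader(lines, delimiter=delimiter):
        names = filter_names(row["Filter"])
        if "FAIL" in names or names & CONFIDENCE_FILTERS:
            continue
        yield row
```

With an open file handle, `list(passing_candidates(handle))` returns the retained rows as dictionaries keyed by column name. Information-only filters are left in place on purpose, because they do not change the PASS or FAIL outcome.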

Gene Fusion Options

The following options can be used to configure the fusion caller:

• --rna-gf-blast-pairs

A file listing gene pairs that have a high level of similarity. This list of gene pairs is used as a homology filter to reduce false positives. One way to generate this file is to follow the instructions on the Fusion Filter Wiki and use the ref_annot.cdsplus.fa.allvsall.outfmt6.genesym.gz file produced by CTAT. For human genome runs, a default file is included and used automatically if no other file is specified.

• --rna-gf-enriched-genes

For RNA enrichment assays, a list of targeted genes, specified as one gene name per line. Only fusion calls involving at least one gene on the list are reported.

• --rna-gf-restrict-genes

When parsing the gene annotations file (GTF/GFF) for use in the DRAGEN Gene Fusion module, you can use this option to restrict the entries of interest to protein-coding regions. Restricting the GTF to the protein-coding and lincRNA genes reduces false positive rates in currently studied fusion events. The default value is true.
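As an illustration of how these options fit together, the sketch below writes a targeted gene list in the one-name-per-line layout and assembles the three options into an argument list. It is a hedged example rather than a complete DRAGEN invocation: the helper names are hypothetical, passing each value as a separate token after the option name is an assumption, and the remaining arguments of a real run are deliberately omitted.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def write_gene_list(genes: Iterable[str], path: Path) -> Path:
    """Write one gene name per line, as described for --rna-gf-enriched-genes."""
    names = [gene.strip() for gene in genes if gene.strip()]
    path.write_text("\n".join(names) + "\n")
    return path


def fusion_option_args(blast_pairs: Optional[Path] = None,
                       enriched_genes: Optional[Path] = None,
                       restrict_genes: Optional[bool] = None) -> List[str]:
    """Build the argument fragment for the three fusion caller options.

    Options left as None are omitted, so the documented defaults apply.
    """
    args: List[str] = []
    if blast_pairs is not None:
        args += ["--rna-gf-blast-pairs", str(blast_pairs)]
    if enriched_genes is not None:
        args += ["--rna-gf-enriched-genes", str(enriched_genes)]
    if restrict_genes is not None:
        args += ["--rna-gf-restrict-genes", "true" if restrict_genes else "false"]
    return args
```

The returned list is meant to be appended to the rest of a command line that is built elsewhere. Leaving blast_pairs unset on a human genome run relies on the default file mentioned above.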