Workflow


Filtering—Bowtie filters the input reads against abundant sequences, such as mitochondrial or ribosomal sequences, as defined by iGenomes at
Only reads that do not align to an abundant sequence are passed through to the next phase of the analysis. For paired-end data, Bowtie removes the whole read pair when at least 1 read in the pair aligns to an abundant sequence. Bowtie also trims 2 bases from the 5’ end of each read because these 2 bases have a high mismatch rate in RNA-Seq libraries. See Bowtie.
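The filtering rule can be illustrated with a minimal Python sketch. It is not the app's implementation: the `Read` type, the function names, and the `hits_abundant` callback (a stand-in for a Bowtie alignment against the abundant sequences) are hypothetical. The text does not state whether later steps receive trimmed or untrimmed reads, so the sketch assumes that the trimmed reads are passed on.

```python
from typing import Callable, Iterable, Iterator, NamedTuple, Optional, Tuple

TRIM_5P = 2  # number of bases trimmed from the 5' end, as described in the text


class Read(NamedTuple):
    name: str
    seq: str
    qual: str


def trim_5p(read: Read, n: int = TRIM_5P) -> Read:
    """Drop the first n bases and their quality values from the 5' end."""
    return Read(read.name, read.seq[n:], read.qual[n:])


def filter_pairs(
    pairs: Iterable[Tuple[Read, Optional[Read]]],
    hits_abundant: Callable[[Read], bool],
) -> Iterator[Tuple[Read, Optional[Read]]]:
    """Yield only the pairs in which no read hits an abundant sequence.

    hits_abundant is a hypothetical predicate standing in for the aligner.
    For single-end data, pass None as the second read of each pair.
    """
    for r1, r2 in pairs:
        t1 = trim_5p(r1)
        t2 = trim_5p(r2) if r2 is not None else None
        # One aligned read is enough to discard the whole pair.
        if hits_abundant(t1) or (t2 is not None and hits_abundant(t2)):
            continue
        # Assumption: the trimmed reads are what continue downstream.
        yield t1, t2
```

In this sketch a pair is discarded as soon as either mate matches, which mirrors the "at least 1 read" rule above.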
Alignment—The STAR or TopHat2 aligner performs a spliced alignment of the filtered reads against the genome. For the user-specified genome, STAR or TopHat2 aligns reads against the known transcripts and splice junctions. See STAR.
Alignment to ERCC—If selected, STAR aligns all reads to the ERCC RNA spike-in sequences, independent of alignment to the transcriptome. The aligner counts reads that align to each spike-in sequence, calculates FPKMs, and computes the correlation between FPKMs and the expected spike-in concentrations.
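The spike-in calculation can be sketched in Python as follows. The text does not say which correlation is computed or which read total is used for normalization, so this sketch makes two assumptions: FPKM uses the standard definition (fragments per kilobase of sequence per million aligned fragments), and the correlation is a Pearson correlation on log2-transformed values. All function and parameter names are hypothetical.

```python
import math
from typing import Dict, List


def fpkm(count: int, length_bp: int, total_fragments: int) -> float:
    """Standard FPKM: fragments per kilobase per million aligned fragments."""
    return count * 1e9 / (length_bp * total_fragments)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spike_in_correlation(
    counts: Dict[str, int],
    lengths_bp: Dict[str, int],
    expected_conc: Dict[str, float],
    total_fragments: int,
) -> float:
    """Correlate log2 FPKM with log2 expected concentration.

    Assumption: spike-ins with zero reads are skipped, because log2(0)
    is undefined. The app may handle them differently.
    """
    ids = [i for i in counts if counts[i] > 0 and i in expected_conc]
    log_fpkm = [
        math.log2(fpkm(counts[i], lengths_bp[i], total_fragments)) for i in ids
    ]
    log_conc = [math.log2(expected_conc[i]) for i in ids]
    return pearson(log_fpkm, log_conc)
```

A correlation close to 1 indicates that the measured FPKMs track the expected spike-in concentrations.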
Fusion Calling—If selected, fusions are called with Manta when the STAR aligner is used and with TopHat-Fusion when the TopHat2 aligner is used. With TopHat2, the aligner first detects fused alignments, and then a post-alignment analysis script identifies candidate fusion genes from the fused alignments. See Manta.
Variant Calling—The Isaac Variant Caller performs variant calling, which produces gVCF output files. For stranded library preps, the strand bias filter is disabled.
The Isaac Variant Caller also uses the -bsnp-diploid-het-bias parameter to expand the range for heterozygous variant calls, which accounts for allele-specific expression.
The Isaac Variant Caller scores variants with an RNA-specific random forest model, which was built using Platinum Genomes data as a reference. See Isaac Variant Caller.
Quantification—Cufflinks quantifies reference genes and transcripts. RnaReadCounter counts the number of aligned reads matching each annotated gene. See Cufflinks and RnaReadCounter.
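The read-counting idea can be illustrated with a short Python sketch. It is not RnaReadCounter's implementation. The text does not say how a read that matches more than one gene is handled, so the sketch assumes that a read is counted once for every gene interval it overlaps. The `Interval` type and function names are hypothetical, and genes are simplified to single intervals rather than exon structures.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_reads_per_gene(
    genes: Dict[str, Interval], alignments: Iterable[Interval]
) -> Counter:
    """Count the aligned reads that overlap each annotated gene.

    Assumption: a read overlapping several genes is counted for each one.
    """
    counts = Counter({gene_id: 0 for gene_id in genes})
    for aln in alignments:
        for gene_id, gene_iv in genes.items():
            if overlaps(aln, gene_iv):
                counts[gene_id] += 1
    return counts
```

The nested loop keeps the sketch short. A real counter would typically index the annotation, for example with an interval tree, to avoid comparing every read with every gene.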
Novel Transcript Assembly—If selected, transcripts are assembled and quantified independently for each sample.