
Workflow

▶

Filtering—Bowtie filters the input reads against abundant sequences, such as mitochondrial or ribosomal sequences, as defined by iGenomes at support.illumina.com/sequencing/sequencing_software/igenome.html.

▶

Only sequences that do not align against abundant sequences are passed to the next phase of the analysis. Bowtie filters out a read pair when at least 1 read of the pair aligns to an abundant sequence. Also, Bowtie trims 2 bases from the 5’ end of each read because these 2 bases have a high mismatch rate in RNA-Seq libraries. See Bowtie.
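
The pair-level filtering rule and the 2-base 5' trim can be illustrated with a minimal Python sketch. It assumes each read is a plain string and that an earlier alignment step has already reported, for each mate, whether it hit an abundant sequence. The function names `trim_5prime`, `keep_pair`, and `filter_pair` are hypothetical and are not Bowtie options; the text does not say whether trimming happens before or after filtering, so the sketch simply trims pairs that pass.

```python
from typing import Optional, Tuple

TRIM_5PRIME = 2  # bases removed from the 5' end of each read, per the text above

def trim_5prime(seq: str, n: int = TRIM_5PRIME) -> str:
    """Drop the first n bases, i.e. the 5' end of the read as sequenced."""
    return seq[n:]

def keep_pair(mate1_hits_abundant: bool, mate2_hits_abundant: bool) -> bool:
    """A pair passes only if neither mate aligns to an abundant sequence."""
    return not (mate1_hits_abundant or mate2_hits_abundant)

def filter_pair(r1: str, r2: str, hit1: bool, hit2: bool) -> Optional[Tuple[str, str]]:
    """Return the trimmed pair if it passes the filter, else None.

    Assumption: trimming is applied after filtering (order not stated in the source).
    """
    if not keep_pair(hit1, hit2):
        return None
    return trim_5prime(r1), trim_5prime(r2)

print(filter_pair("NNACGTACGT", "GGTTTTAAAA", False, False))  # ('ACGTACGT', 'TTTTAAAA')
print(filter_pair("NNACGTACGT", "GGTTTTAAAA", True, False))   # None
```

Filtering on the pair rather than on each read keeps the mates of the surviving pairs together, so downstream paired-end alignment never sees orphaned mates.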

▶

Alignment—The STAR or TopHat2 aligner performs a spliced alignment of the filtered reads against the genome. Using the annotation for the user-specified reference genome, STAR or TopHat2 aligns reads against known transcripts and splice junctions. See STAR.
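
In SAM/BAM output, a spliced alignment typically records the gap that spans an intron with the CIGAR `N` (skipped region) operation. The following Python sketch is illustrative only and is not part of this workflow: it derives intron coordinates from a read's 1-based leftmost position and CIGAR string. The function name `splice_junctions` is hypothetical.

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REF = set("MDN=X")  # CIGAR operations that advance along the reference

def splice_junctions(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return (intron_start, intron_end) pairs, 1-based inclusive, one per N operation."""
    junctions = []
    ref = pos  # current 1-based reference coordinate
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        if op in CONSUMES_REF:
            ref += length
    return junctions

print(splice_junctions(100, "50M200N50M"))  # [(150, 349)]
```

Soft clips (`S`) and insertions (`I`) do not move the reference coordinate, which is why they are excluded from `CONSUMES_REF`.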

▶

Alignment to ERCC—If selected, STAR aligns all reads to the ERCC RNA spike-in sequences, independent of alignment to the transcriptome. The aligner counts reads that align to each spike-in sequence, calculates FPKMs, and computes the correlation between FPKMs and the expected spike-in concentrations.
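
The ERCC summary can be sketched in Python under a few stated assumptions. FPKM is computed with the standard formula (read count × 10^9 / (spike-in length in bp × total mapped reads)); which read total the workflow uses as the denominator is not stated, so the caller supplies it. The sketch also assumes the correlation is a Pearson correlation on log2-transformed values, a common choice for spike-ins spanning several orders of magnitude, and it skips spike-ins with zero reads. All function names and the `spike_*` identifiers are hypothetical.

```python
import math
from typing import Dict, List

def fpkm(count: int, length_bp: int, total_mapped: int) -> float:
    """Fragments per kilobase of sequence per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped)

def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; needs at least 2 points with nonzero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ercc_correlation(counts: Dict[str, int], lengths: Dict[str, int],
                     expected_conc: Dict[str, float], total_mapped: int) -> float:
    """Correlate log2 FPKM with log2 expected concentration across spike-ins.

    Assumption: spike-ins with zero reads are skipped because log2(0) is undefined.
    """
    xs, ys = [], []
    for name, conc in expected_conc.items():
        c = counts.get(name, 0)
        if c == 0:
            continue
        xs.append(math.log2(fpkm(c, lengths[name], total_mapped)))
        ys.append(math.log2(conc))
    return pearson(xs, ys)

# Made-up data: counts exactly proportional to concentration give r close to 1.
counts = {"spike_a": 10, "spike_b": 100, "spike_c": 1000}
lengths = {"spike_a": 1000, "spike_b": 1000, "spike_c": 1000}
conc = {"spike_a": 1.0, "spike_b": 10.0, "spike_c": 100.0}
print(ercc_correlation(counts, lengths, conc, total_mapped=1_000_000))
```

Because FPKM normalizes for spike-in length, spike-ins of different lengths can be compared directly against their expected concentrations.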

▶

Fusion Calling—If selected, fusion calling uses Manta-fusion with the STAR aligner and TopHat-Fusion with the TopHat2 aligner. For TopHat-Fusion, TopHat2 first detects fused alignments, and then a post-alignment analysis script identifies candidate fusion genes from those alignments. See Manta.
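
The idea of turning fused alignments into candidate fusion genes can be shown with a simplified Python stand-in. This is not the actual post-alignment script: it assumes each fused alignment has already been reduced to a (read ID, gene A, gene B) record, counts distinct supporting reads per gene pair, and keeps pairs that meet a hypothetical `min_support` threshold. The function name `candidate_fusions` is hypothetical, and the reads below are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def candidate_fusions(fused: Iterable[Tuple[str, str, str]],
                      min_support: int = 2) -> Dict[Tuple[str, str], int]:
    """Count distinct reads supporting each gene pair; keep pairs with enough support.

    Each item in `fused` is (read_id, gene_a, gene_b) for one fused alignment.
    The gene pair is stored in sorted order so A-B and B-A are the same candidate.
    """
    support: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for read_id, gene_a, gene_b in fused:
        if gene_a == gene_b:
            continue  # both segments in the same gene: not a fusion candidate
        pair = tuple(sorted((gene_a, gene_b)))
        support[pair].add(read_id)  # a set, so a read is counted once per pair
    return {pair: len(reads) for pair, reads in support.items() if len(reads) >= min_support}

alignments = [
    ("r1", "BCR", "ABL1"),
    ("r2", "ABL1", "BCR"),
    ("r3", "BCR", "ABL1"),
    ("r4", "GENE_X", "GENE_Y"),
    ("r1", "BCR", "ABL1"),  # duplicate record for r1 is not double-counted
]
print(candidate_fusions(alignments))  # {('ABL1', 'BCR'): 3}
```

Requiring several independent reads is one simple way to suppress chimeric artifacts that appear in only a single read.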

▶

Variant Calling—The Isaac Variant Caller calls variants and writes them to gVCF output files. For stranded library preps, the strand bias filter is disabled.

▶

Also, the Isaac Variant Caller uses the -bsnp-diploid-het-bias parameter to widen the range of allele ratios accepted for a heterozygous variant call, which accounts for allele-specific expression.
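
Why widening the heterozygous allele-ratio range helps with allele-specific expression can be seen in a simplified Python model. This is not the Isaac Variant Caller's implementation: it assumes the heterozygous likelihood is a binomial likelihood averaged over allele ratios spread evenly across [0.5 - het_bias, 0.5 + het_bias], and the names `het_likelihood`, `het_bias`, and `steps` are hypothetical.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def het_likelihood(alt: int, depth: int, het_bias: float = 0.0, steps: int = 9) -> float:
    """Likelihood of `alt` alternate reads out of `depth` under a heterozygous genotype.

    With het_bias = 0 this is the classic model with allele ratio 0.5. Otherwise the
    likelihood is averaged over `steps` (>= 2) evenly spaced ratios in
    [0.5 - het_bias, 0.5 + het_bias], tolerating skewed allele expression.
    """
    if het_bias == 0.0:
        return binom_pmf(alt, depth, 0.5)
    ratios = [0.5 - het_bias + 2 * het_bias * i / (steps - 1) for i in range(steps)]
    return sum(binom_pmf(alt, depth, p) for p in ratios) / len(ratios)

# 4 alternate reads out of 30: strongly skewed, as with allele-specific expression.
print(het_likelihood(4, 30, het_bias=0.0))
print(het_likelihood(4, 30, het_bias=0.4))  # much larger than the unbiased value
```

Under the strict 0.5 model, a heterozygous site whose alleles are expressed unevenly looks unlikely and risks being called homozygous; the widened range keeps such sites plausible as heterozygous.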

▶

The Isaac tool uses an RNA-specific, random-forest-based variant scoring model, which was built using Platinum Genomes data as a reference. See Isaac Variant Caller.

▶

Quantification—Cufflinks quantifies reference genes and transcripts. RnaReadCounter counts the number of aligned reads matching each annotated gene. See Cufflinks and RnaReadCounter.
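
Counting aligned reads per annotated gene can be sketched in Python as an interval-overlap problem. This is an illustration, not RnaReadCounter's implementation: it assumes reads and exons are (chromosome, start, end) intervals with 1-based inclusive coordinates, counts a read toward a gene if it overlaps any of that gene's exons, and skips reads that overlap several genes as ambiguous. The function names and gene names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 1-based inclusive

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def count_reads(reads: List[Interval], genes: Dict[str, List[Interval]]) -> Dict[str, int]:
    """Assign each read to the single gene whose exons it overlaps."""
    counts = {name: 0 for name in genes}
    for read in reads:
        hits = [name for name, exons in genes.items()
                if any(overlaps(read, exon) for exon in exons)]
        if len(hits) == 1:  # assumption: skip reads hitting no gene or several genes
            counts[hits[0]] += 1
    return counts

genes = {"geneA": [("chr1", 100, 200), ("chr1", 300, 400)],
         "geneB": [("chr1", 350, 500)]}
reads = [("chr1", 150, 180), ("chr1", 310, 340), ("chr1", 360, 380),
         ("chr1", 450, 470), ("chr2", 150, 180)]
print(count_reads(reads, genes))  # {'geneA': 2, 'geneB': 1}
```

The linear scan is easy to follow but costs O(reads × exons); a real counter would use an interval index or sorted coordinates instead.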

▶

Novel Transcript Assembly—If selected, transcripts are assembled and quantified independently for each sample.