Amplicon Workflow

Demultiplexing – Demultiplexing is the first step in analysis if the sample sheet lists multiple samples and the run has index reads. The index reads are used to determine from which sample each cluster originates.
Samples are numbered starting from 1 based on the order they are listed in the sample sheet.
Sample number 0 is reserved for clusters which are not successfully assigned to a sample.
Clusters are assigned to a particular sample if they match the index sequence exactly, or if they have up to a single mismatch per index read.
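The assignment rule above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function names and the data layout are hypothetical, and treating a cluster that matches more than one sample as unassigned is an assumption the text does not state.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_reads, sample_indexes, max_mismatches=1):
    """Return the 1-based sample number for a cluster, or 0 if unassigned.

    index_reads: tuple of observed index read sequences for one cluster.
    sample_indexes: list of tuples of expected index sequences,
                    in the order the samples appear in the sample sheet.
    """
    matches = []
    for number, expected in enumerate(sample_indexes, start=1):
        # Every index read must match with at most max_mismatches mismatches.
        if all(hamming(obs, exp) <= max_mismatches
               for obs, exp in zip(index_reads, expected)):
            matches.append(number)
    # Assumption: ambiguous matches are treated as unassigned (sample 0).
    return matches[0] if len(matches) == 1 else 0


samples = [("ATCACG",), ("CGATGT",)]
print(assign_sample(("CGATGA",), samples))  # one mismatch against sample 2
```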

A file called DemultiplexSummary.txt is produced in the Alignment folder after demultiplexing.

Alignment – Clusters are aligned against amplicon sequences from the manifest file. The first read is evaluated against the probe sequence (the reverse complement of the DLSO) for each amplicon in the manifest. If the start of the read matches a probe sequence with at most one mismatch, the read is aligned against the amplicon(s) for that probe sequence. If no such match is found, BaseSpace looks for any probe sequence that matches with fewer than six mismatches and attempts to align the read against the corresponding amplicons. For paired-end data, the second read is compared to the ULSO sequences. (Indels within the ULSO and DLSO are not accepted – they should not be observed given the assay chemistry.)
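The two-tier probe lookup described above might look like the sketch below. The function names and the dictionary layout are hypothetical. Counting bases missing from a read shorter than the probe as mismatches is an assumption, and indels are not considered, in line with the text.

```python
def mismatches_at_start(read, probe):
    """Count mismatches between a probe and the start of a read.

    Assumption: if the read is shorter than the probe, each missing
    base counts as a mismatch.
    """
    prefix = read[:len(probe)]
    return sum(a != b for a, b in zip(prefix, probe)) + (len(probe) - len(prefix))


def candidate_amplicons(read, probes, strict=1, relaxed=5):
    """Return amplicon names to align against, or [] if no probe matches.

    probes: dict mapping probe sequence -> list of amplicon names.
    strict: first-pass limit (at most 1 mismatch).
    relaxed: fallback limit (fewer than six mismatches, i.e. at most 5).
    """
    counts = {probe: mismatches_at_start(read, probe) for probe in probes}
    for limit in (strict, relaxed):
        hits = [probe for probe, n in counts.items() if n <= limit]
        if hits:
            return sorted({name for probe in hits for name in probes[probe]})
    return []


probes = {"ACGTACGTAC": ["amp1"], "TTTTGGGGCC": ["amp2"]}
print(candidate_amplicons("ACGAACGTACGGG", probes))  # one mismatch against amp1's probe
```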

Note

ULSO stands for Upstream Locus-Specific Oligo and DLSO stands for Downstream Locus-Specific Oligo.

Once the probe sequence (ULSO or DLSO) is matched, the alignment itself is performed using banded Smith-Waterman alignment. The banding allows consideration of gaps up to length n/3 in a read of length n.
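To make the banding concrete, here is a minimal score-only sketch of banded Smith-Waterman. The scoring values (match, mismatch, linear gap penalty) are illustrative assumptions, not the values used by the software. The band is centered on the main diagonal, on the assumption that the read is anchored at the start of the amplicon by the probe match.

```python
def banded_smith_waterman(read, ref, match=1, mismatch=-1, gap=-2):
    """Best local alignment score using only cells with |i - j| <= len(read) // 3."""
    n, m = len(read), len(ref)
    band = n // 3  # maximum total gap length the band can accommodate
    # Cells outside the band are never filled and stay at 0, which in local
    # alignment simply means "no alignment passes through here".
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # match or mismatch
                          H[i - 1][j] + gap,     # gap in the reference
                          H[i][j - 1] + gap)     # gap in the read
            best = max(best, H[i][j])
    return best


# The reference carries one extra base relative to the read, so the best
# alignment needs a single gap, which fits inside the band of 12 // 3 = 4.
print(banded_smith_waterman("ACGTAACCGGTT", "ACGTAGACCGGTT"))
```

For clarity the sketch allocates the full matrix; a production implementation would store only the band, which is where the time and memory savings come from.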
Paired-end evaluation – For paired-end runs, alignments are resolved using Kagu.
Bin/Sort – Reads are grouped by sample and chromosome, sorted by chromosome position, and written out to .bam files.
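The grouping and sorting in the Bin/Sort step can be illustrated in memory as below. The tuple layout is hypothetical, and the sketch stops short of writing .bam files, which would require a BAM library.

```python
from collections import defaultdict


def bin_and_sort(alignments):
    """Group alignments by (sample, chromosome) and sort each bin by position.

    alignments: iterable of (sample, chromosome, position, read_name) tuples.
    Returns a dict mapping (sample, chromosome) -> sorted list of (position, read_name).
    """
    bins = defaultdict(list)
    for sample, chrom, pos, name in alignments:
        bins[(sample, chrom)].append((pos, name))
    return {key: sorted(reads) for key, reads in bins.items()}


alignments = [
    ("S1", "chr2", 500, "r1"),
    ("S1", "chr1", 300, "r2"),
    ("S1", "chr1", 100, "r3"),
    ("S2", "chr1", 50, "r4"),
]
for key, reads in sorted(bin_and_sort(alignments).items()):
    print(key, reads)
```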
Variant analysis and annotation – SNPs and short indels are identified by the variant caller. The default variant caller is GATK. The other available options are Pisces (appropriate for somatic variant calling; for example, FFPE samples) and Starling (available for backward compatibility). The variant caller is configured using the sample sheet.
Variants are flagged as homozygous or heterozygous in the .vcf sample column, with 1/1 or 0/1, respectively. During somatic variant calling, Pisces does not determine whether a variant is homozygous, heterozygous, or somatic; it reports 0/1 in all cases.
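A downstream script could read the flag from the sample column roughly as follows. The function name is hypothetical; the sketch assumes the standard VCF layout in which the FORMAT column names the colon-separated fields of the sample column. For Pisces somatic output, a result of "heterozygous" should not be taken as a zygosity call, since 0/1 is reported for every variant.

```python
def zygosity(format_field, sample_field):
    """Interpret the GT entry of a VCF sample column.

    format_field: the FORMAT column, for example "GT:GQ".
    sample_field: the sample column, for example "0/1:99".
    """
    fields = dict(zip(format_field.split(":"), sample_field.split(":")))
    gt = fields.get("GT")
    if gt is None:
        return "unknown"
    alleles = gt.replace("|", "/").split("/")  # treat phased and unphased alike
    if alleles == ["1", "1"]:
        return "homozygous"
    if sorted(alleles) == ["0", "1"]:
        return "heterozygous"
    return "other"


print(zygosity("GT:GQ", "1/1:99"))
print(zygosity("GT:GQ", "0/1:50"))
```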
If a reference gene database is available (refGene.txt in the Annotation subfolder of the reference genome folder), any SNPs or indels that fall within known genes are annotated. Variants are also flagged as filtered if their quality score is low or if they fail other stringent filters.
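Gene annotation amounts to an interval lookup, sketched below. The gene names and coordinates are made up for illustration, and the tuples use 1-based inclusive coordinates as an assumption; this is not a parser for refGene.txt, whose coordinates would need converting to this convention first.

```python
def annotate(variants, genes):
    """Attach the names of overlapping genes to each variant.

    variants: list of (chromosome, position) tuples.
    genes: list of (name, chromosome, start, end) tuples,
           1-based inclusive coordinates (assumption).
    Returns a list of (chromosome, position, [gene names]).
    """
    annotated = []
    for chrom, pos in variants:
        hits = sorted(name for name, g_chrom, start, end in genes
                      if g_chrom == chrom and start <= pos <= end)
        annotated.append((chrom, pos, hits))
    return annotated


# Hypothetical gene intervals, for illustration only.
genes = [("GENE_A", "chr1", 100, 200), ("GENE_B", "chr1", 150, 300)]
print(annotate([("chr1", 175), ("chr1", 50), ("chr2", 175)], genes))
```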
Statistics Reporting – Statistics from the run are summarized and reported, including in the Summary.htm page in the Alignment folder.

© 2011-2012 Illumina, Inc. All rights reserved.

Rev. August 20, 2012