This depends on the size of the genome you are trying to resequence. For whole-genome resequencing, a 25-fold over-sampling should be adequate. For targeted resequencing involving mixes of many PCR products, 75-fold over-sampling will correct for the inability to mix the PCR products at a 1:1 ratio. Illumina sample prep shows no systematic bias. In sequencing the X chromosome, we achieved 16-fold average coverage with all sequenceable bases covered at least twice.
There are no inherent limits in the software. Illumina scientists have pooled 29 BACs of 130 kb each.
Homopolymers do not impact sequencing. The number of uniquely alignable reads is a function of the repeat content, so this will have an impact on productivity. With longer reads and paired-end sequencing, this may be less of an issue.
The new HiSeq v4 reagent kits now support dual-indexing workflows without requiring the purchase of additional SBS reagents. Sample prep for dual-indexed libraries requires that both indexes be present on the library. However, the second index does not need to be read during sequencing. A single-indexing workflow is supported on Illumina sequencing instruments, where only Index 1 is used. See the instrument user guide for more information about setting up an 8-base single-indexed sequencing run.
For dual index paired-end runs, there are 23 additional cycles (index & chemistry only).
For dual-index single-read runs, there are 16 additional cycles of indexing.
For information about the number of SBS kits required on the HiSeq, HiScanSQ, or GAIIx, see the user guide for your instrument.
The terms lane and channel are sometimes used synonymously with regard to the eight lanes of a flow cell. However, the term channel may also refer to a color channel on the Genome Analyzer (four colors corresponding to the four bases A, C, G, and T).
A tile is an image captured by the camera on the Genome Analyzer. A flow cell contains eight lanes. Each lane is imaged in two columns with 60 tiles from each column.
No. These kits are for use on HiSeq and HiScanSQ only.
For runs on the HiSeq, HiScanSQ, or GAIIx, creating and loading a sample sheet at the start of the run is optional. However, using a sample sheet allows you to view data shown on the indexing tab in the Sequencing Analysis Viewer (SAV) during the run. If you do not load a sample sheet at the start of a run in HCS, you will not be able to view indexing data in SAV. When analyzing indexed samples using CASAVA v1.8.2, a sample sheet is required. MiSeq runs require a sample sheet when setting up the run in MCS.
Illumina recommends that you create the sample sheet using the Illumina Experiment Manager (IEM) prior to performing library prep in order to confirm appropriate index combinations.
There are no changes for MiSeq analysis. HiSeq and GA data require an upgrade to CASAVA 1.8.2 to demultiplex dual-indexed libraries. Illumina also recommends upgrading to SAV 1.8.4 or higher to use the new Index tab for real-time demultiplexing information.
Index reads for single-indexed libraries use 7-cycle reads. Illumina does not support 6-cycle index reads for single-indexed libraries.
See the appropriate HiSeq instrument user guide for details on the loading of reagents with different workflows and which primers you need to use for your library type.
Dual-indexed runs on the HiSeq comprise 8 bp of index sequence rather than 6 bp plus a seventh for phasing calculations. For more information, see the user guide for your sequencing instrument.
Flow cells are designed for single use. All eight lanes must be used at the same time, either for the same sample or for different samples. You can run eight samples at a time without multiplexing. With multiplexing, you can increase throughput to up to 12 samples per lane, or up to 96 samples per flow cell.
A quality score (or Q-score) is a prediction of the probability of an incorrect base call. Based on the Phred scale, the Q-score serves as a compact way to communicate very small error probabilities. Given a base call, X, the probability that X is not true, P(~X), is expressed by a quality score, Q(X), according to the relationship:
Q(X) = -10 log10(P(~X))
where P(~X) is the estimated probability of the base call being wrong.
A quality score of 10 indicates an error probability of 0.1, a quality score of 20 indicates an error probability of 0.01, a quality score of 30 indicates an error probability of 0.001, and so on.
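The relationship above can be checked numerically. The following is a minimal Python sketch, not Illumina software; the function names `error_probability` and `quality_score` are hypothetical and chosen only for illustration.

```python
import math


def error_probability(q):
    """Convert a Phred-scaled quality score Q into an error probability."""
    return 10 ** (-q / 10)


def quality_score(p_error):
    """Convert an error probability P(~X) into a quality score: Q = -10 log10(P)."""
    return -10 * math.log10(p_error)


if __name__ == "__main__":
    # Print the error probability for a few common quality scores.
    for q in (10, 20, 30):
        print(q, error_probability(q))
```

Inverting the formula this way makes it easy to see that each increase of 10 in the quality score corresponds to a tenfold reduction in the estimated error probability.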
During analysis, base call quality scores are written to FASTQ files in an encoded compact form, which uses only one byte per quality value. This method represents the quality score with an ASCII code equal to the value + 33.
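As an illustration of the value + 33 encoding described above, the following Python sketch converts between numeric quality scores and the one-character-per-score string stored in a FASTQ file. The function names are hypothetical, and the sketch assumes every score is a non-negative integer small enough to map to a printable ASCII character.

```python
def encode_quality(scores):
    """Encode numeric quality scores as ASCII characters (score + 33)."""
    return "".join(chr(q + 33) for q in scores)


def decode_quality(quality_string):
    """Decode a FASTQ quality string back into numeric quality scores."""
    return [ord(ch) - 33 for ch in quality_string]


if __name__ == "__main__":
    encoded = encode_quality([0, 10, 20, 30, 40])
    print(encoded)                  # !+5?I
    print(decode_quality(encoded))  # [0, 10, 20, 30, 40]
```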
Image analysis occurs in real time, phasing estimates and base calling begin after cycle 12, and base call quality scoring occurs after cycle 25.
It is the ability to distinguish between two or more clusters that are in close proximity to each other.
The matrix file is used for base calling and accounts for cross talk between dyes.
To remove the least reliable data from the analysis results, often derived from overlapping clusters, raw data are filtered to remove any reads that do not meet the overall quality as measured by the Illumina chastity filter. The chastity of a base call is calculated as the ratio of the brightest intensity divided by the sum of the brightest and second brightest intensities.
Clusters passing filter are represented by PF in analysis reports. Clusters pass filter if no more than one base call in the first 25 cycles has a chastity of < 0.6.
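The chastity calculation and the pass-filter rule described above can be sketched in Python as follows. This is an illustrative sketch only, not the Illumina implementation: the input layout (one list of four non-negative channel intensities per cycle) and the function names `chastity` and `passes_filter` are assumptions made for the example.

```python
def chastity(intensities):
    """Brightest intensity divided by the sum of the two brightest intensities."""
    brightest, second = sorted(intensities, reverse=True)[:2]
    total = brightest + second
    # Assumption: a cluster with no signal at all is treated as chastity 0.
    return brightest / total if total > 0 else 0.0


def passes_filter(cycle_intensities, threshold=0.6, window=25, max_failures=1):
    """Return True if no more than one base call in the first 25 cycles
    has a chastity below the threshold."""
    failures = sum(
        1 for cycle in cycle_intensities[:window] if chastity(cycle) < threshold
    )
    return failures <= max_failures


if __name__ == "__main__":
    clean_cycle = [100.0, 20.0, 5.0, 5.0]   # chastity 100 / 120, about 0.83
    mixed_cycle = [50.0, 50.0, 5.0, 5.0]    # chastity 50 / 100 = 0.5
    print(passes_filter([clean_cycle] * 25))                      # True
    print(passes_filter([mixed_cycle] * 2 + [clean_cycle] * 23))  # False
```

A cluster with a single low-chastity cycle in the window still passes, which matches the "no more than one base call" wording of the rule.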