What is nucleotide diversity and why is it important?


What is nucleotide diversity?

A library has high nucleotide diversity when all 4 nucleotides are present in roughly equal proportions at every cycle of the run. The diagram below illustrates the diversity and base balance of well-balanced and unbalanced libraries, and how each is reflected in the % base plot of Sequencing Analysis Viewer (SAV).
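As a rough illustration of what the % base plot summarizes, the sketch below computes the fraction of each base at every cycle from a handful of reads. The function name `per_cycle_base_fractions` and the toy reads are hypothetical and chosen only for illustration. This is not how SAV computes its plot.

```python
from collections import Counter

def per_cycle_base_fractions(reads):
    """Return one dict per cycle mapping A/C/G/T to its fraction of base calls.

    Hypothetical helper: `reads` is a list of plain sequence strings.
    Calls other than A, C, G, or T (for example N) are ignored.
    """
    n_cycles = max(len(r) for r in reads)
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(
            r[cycle] for r in reads if cycle < len(r) and r[cycle] in "ACGT"
        )
        total = sum(counts.values())
        fractions.append(
            {b: (counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return fractions

# Toy data: a balanced library and a low-diversity (amplicon-like) library.
balanced = ["ACGT", "CGTA", "GTAC", "TACG"]
amplicon = ["ACGT", "ACGT", "ACGA", "ACGC"]

for name, reads in (("balanced", balanced), ("amplicon", amplicon)):
    print(name, per_cycle_base_fractions(reads)[0])
```

In the toy balanced library every base accounts for a quarter of the calls at the first cycle, whereas in the amplicon-like library a single base accounts for all of them.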

Why is nucleotide diversity important?

Nucleotide diversity is required for effective template generation on Illumina sequencing platforms and is important for the generation of high-quality data.

Diversity is especially important during the first 4–7 cycles of the first sequencing read for MiniSeq, MiSeq, NextSeq 500/550, and HiSeq 1000–2500 systems. The sequencing software uses images from these early cycles to identify the location of each cluster in a process called template generation.

Nucleotide diversity is also important for the first 25 cycles in the first sequencing read on all sequencing platforms because this is when phasing/pre-phasing, color matrix corrections, and the pass filter calculations occur. These corrections and calculations are used in base calling and quality score calculations for all cycles in a run for the clusters that pass filter.
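Because the early cycles matter most, it can be useful to check base balance within that window specifically. The sketch below flags cycles among the first 25 where any base is poorly represented. The function name `low_diversity_cycles` and the 10% cutoff are assumptions made for illustration. The text above does not define a numeric threshold for low diversity.

```python
def low_diversity_cycles(fractions, n_cycles=25, min_fraction=0.10):
    """Return 1-based cycle numbers, within the first n_cycles, where the
    least-represented base falls below min_fraction.

    `fractions` is a list with one dict per cycle, mapping A/C/G/T to a
    fraction between 0 and 1. The default min_fraction is an illustrative
    assumption, not an Illumina specification.
    """
    flagged = []
    for cycle, frac in enumerate(fractions[:n_cycles], start=1):
        if min(frac[b] for b in "ACGT") < min_fraction:
            flagged.append(cycle)
    return flagged

# Toy per-cycle fractions: cycle 2 contains only A.
example = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.4, "C": 0.3, "G": 0.15, "T": 0.15},
]
print(low_diversity_cycles(example))
```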

The newest versions of Real-Time Analysis (RTA) software on the HiSeq 2500 and MiSeq platforms improve the estimation of the color normalization matrix and of the phasing/pre-phasing rates. These enhancements allow low-diversity and unbalanced libraries to be sequenced with a lower percentage of balanced library spike-in for color balance, resulting in higher-quality sequencing data. When using HiSeq Control Software version 2.2.38 or higher, a minimum of 10% PhiX is required. When using MiSeq Control Software version 2.2 or higher, a minimum of 5% PhiX is required. If your sequencing platform is running a control software version lower than those stated here, contact Illumina Technical Support for further guidance.
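The two version thresholds above can be captured in a small lookup, sketched here under the assumption that version strings are dot-separated integers. The names `MIN_PHIX`, `parse_version`, and `minimum_phix_percent` are hypothetical. The lookup covers only the two control software packages named above and returns `None` for lower versions, where the guidance is to contact Illumina Technical Support.

```python
def parse_version(text):
    """Convert a dot-separated version string such as '2.2.38' to a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

# Minimum control software version and minimum PhiX percentage, as stated in the text.
MIN_PHIX = {
    "HiSeq Control Software": ((2, 2, 38), 10),
    "MiSeq Control Software": ((2, 2), 5),
}

def minimum_phix_percent(software, version):
    """Return the minimum PhiX percentage for the given control software version,
    or None if the version is lower than the one the guidance covers."""
    threshold, percent = MIN_PHIX[software]
    if parse_version(version) >= threshold:
        return percent
    return None

print(minimum_phix_percent("HiSeq Control Software", "2.2.58"))
print(minimum_phix_percent("MiSeq Control Software", "2.1"))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating version 2.10 as lower than 2.2.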

On systems running other versions of RTA, anywhere from 10% to 50% PhiX may be required to achieve the same color balance. For further guidance on the use of PhiX for color balance on other Illumina sequencing platforms, such as the MiniSeq and NextSeq 500/550, refer to the following technical bulletins:

    How much PhiX spike-in is recommended when sequencing low diversity libraries on Illumina platforms?
    Best Practices for Low-Diversity Sequencing on the NextSeq 500/550 and MiniSeq Systems