Illumina FASTQ file generation pipelines include an adapter trimming option for the removal of adapter sequences from the 3’ ends of reads. Adapter sequences should be removed from reads because they interfere with downstream analyses, such as alignment of reads to a reference. The adapters contain the sequencing primer binding sites, the index sequences, and the sites that allow library fragments to attach to the flow cell lawn. Libraries prepared with Illumina library prep kits require adapter trimming only on the 3’ ends of reads, because adapter sequences are not found on the 5’ ends.
Note: Libraries prepared with the Nextera Mate Pair library prep kit are an exception, and guidelines for trimming adapters from these libraries can be found in the Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms technical note.
To understand why adapter sequences are found only on the 3’ ends of reads, it helps to first understand where the sequencing primers anneal to the library template on a flow cell. The diagrams below show the sites of primer annealing at each stage of a sequencing run: Read 1, Index 1, Index 2, and Read 2.
Figure 1. MiSeq, HiSeq 2000/2500 and NovaSeq paired-end flow cell
Figure 2. MiniSeq, NextSeq and HiSeq 3000/4000 paired-end flow cell
As shown in Figures 1 and 2, in both Read 1 and Read 2, the sequencing primer anneals to the adapter, immediately upstream of the DNA insert (in gray). Because sequencing starts at the first base of the DNA insert in Reads 1 and 2, the adapter is not sequenced at the start of the read. However, if the read is longer than the DNA insert, sequencing extends past the end of the insert and into the adapter on the opposite end of the library fragment, and that adapter sequence will be found on the 3’ end of the read. Therefore, reads require adapter trimming only on their 3’ ends.
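The read-through behavior described above can be illustrated with a minimal Python sketch. It is not the trimming algorithm used by Illumina FASTQ generation pipelines. The adapter sequence is a made-up placeholder, and the function names (simulate_read, trim_3prime_adapter) and the min_overlap parameter are hypothetical. The sketch assumes exact matching only, whereas practical trimmers usually tolerate some mismatches.

```python
# Placeholder adapter for illustration only; substitute the sequence
# documented for your library prep kit.
ADAPTER = "GATTACAGATTACA"


def simulate_read(insert: str, adapter: str, read_length: int) -> str:
    """Model a read that starts at the first base of the insert.

    If read_length exceeds the insert length, the read continues into the
    adapter on the opposite end of the fragment (read-through).
    """
    template = insert + adapter
    return template[:read_length]


def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove adapter sequence from the 3' end of a read (exact matches only).

    A full adapter match is cut wherever it starts. Otherwise, the longest
    adapter prefix of at least min_overlap bases that ends the read is cut.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Check for a partial adapter at the very end of the read, longest first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    return read


if __name__ == "__main__":
    short_insert = "ACGTACGTAC"  # 10 bases, shorter than the read length
    read = simulate_read(short_insert, ADAPTER, read_length=15)
    print("raw read:    ", read)
    print("trimmed read:", trim_3prime_adapter(read, ADAPTER))
```

In this model, a 15-base read of a 10-base insert carries 5 adapter bases on its 3’ end and none on its 5’ end, so trimming the 3’ end is enough to recover the insert. A read of an insert longer than the read length contains no adapter and is returned unchanged.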