FASTQ Files

As of 1.8, CASAVA converts *.bcl files into FASTQ files, and uses these FASTQ files as sequence input for configureAlignment. The files are located in the Unaligned/Project_<ProjectName>/Sample_<SampleName> directories.

Note

Reads that were identified as sample prep controls in the control files are not saved in the FASTQ files.

Naming

Illumina FASTQ files use the following naming scheme:

<sample name>_<barcode sequence>_L<lane (0-padded to 3 digits)>_R<read number>_<set number (0-padded to 3 digits)>.fastq.gz

For example, the following is a valid FASTQ file name:

NA10831_ATCACG_L002_R1_001.fastq.gz

For non-multiplexed runs, <sample name> is replaced with the lane number (lane1, lane2, ..., lane8) and <barcode sequence> is replaced with "NoIndex".
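
The naming scheme can be checked mechanically. The following Python sketch splits a file name into its fields; the function name parse_fastq_name and the regular expression are illustrative and not part of CASAVA. The pattern assumes the barcode is either "NoIndex" or a plain run of A, C, G and T, which may not cover every run configuration.

```python
import re

# Pattern assembled from the naming scheme above. Restricting the barcode to
# A/C/G/T or the literal "NoIndex" is an assumption of this sketch.
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[ACGT]+|NoIndex)"
    r"_L(?P<lane>\d{3})_R(?P<read>\d+)_(?P<set>\d{3})\.fastq\.gz$"
)


def parse_fastq_name(name):
    """Split a FASTQ file name into its fields, or return None if it does not match."""
    match = FASTQ_NAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # Lane, read and set numbers are zero-padded strings; convert them to int.
    for key in ("lane", "read", "set"):
        fields[key] = int(fields[key])
    return fields


print(parse_fastq_name("NA10831_ATCACG_L002_R1_001.fastq.gz"))
```

Because the sample name is matched greedily, sample names that themselves contain underscores are still split at the last position where the remaining fields fit.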

Set Size

The FASTQ output is split into multiple files, with the number of clusters per file set by the --fastq-cluster-count command line option of configureBclToFastq.pl. The different files are distinguished by the 0-padded 3-digit set number.

TIP

If you need to generate a single gzipped FASTQ file for use in a third-party tool, set the --fastq-cluster-count option to 0.
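
If a run has already been converted into several sets, the files can also be joined afterwards, because a concatenation of gzip files is itself a valid gzip stream. The Python sketch below joins the sets in set-number order; merge_fastq_sets and its arguments are hypothetical names chosen for this example, not a CASAVA tool.

```python
import os
import re
import shutil


def merge_fastq_sets(sample_dir, prefix, out_path):
    """Concatenate <prefix>_NNN.fastq.gz files from sample_dir in set-number order."""
    name_pattern = re.compile(re.escape(prefix) + r"_(\d{3})\.fastq\.gz")
    parts = []
    for name in os.listdir(sample_dir):
        match = name_pattern.fullmatch(name)
        if match:
            parts.append((int(match.group(1)), os.path.join(sample_dir, name)))
    parts.sort()  # order by the numeric set number
    with open(out_path, "wb") as out:
        for _, path in parts:
            # Copy the compressed bytes unchanged; no decompression is needed.
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)
    return [path for _, path in parts]
```

Here prefix is everything before the set number, for example NA10831_ATCACG_L002_R1. The function returns the list of input files in the order they were written.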

Compression

FASTQ files are saved compressed in the GNU zip (gzip) format, as indicated by the .gz file extension. GNU zip is an open source file compression program. CASAVA automatically unzips the files before using them.

Format

Each entry in a FASTQ file consists of four lines:

Sequence identifier
Sequence
Quality score identifier line (consisting of a +)
Quality score
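
As a minimal sketch of how the four-line layout can be read outside CASAVA, the following Python generator walks a gzipped FASTQ file one entry at a time. The name read_fastq is illustrative, and the sketch assumes each entry occupies exactly four lines with no wrapping.

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (identifier, sequence, quality) tuples from a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            # Take the next four lines; an empty list means end of file.
            entry = [line.rstrip("\n") for line in islice(handle, 4)]
            if not entry:
                break
            if (len(entry) != 4 or not entry[0].startswith("@")
                    or not entry[2].startswith("+")):
                raise ValueError("malformed FASTQ entry: %r" % entry)
            yield entry[0], entry[1], entry[3]
```

Counting the reads in a file is then sum(1 for _ in read_fastq(path)).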

Each sequence identifier, the line that precedes the sequence and describes it, needs to be in the following format:

@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read>:<is filtered>:<control number>:<index sequence>

The elements are described below.

| Element | Requirements | Description |
|---|---|---|
| @ | @ | Each sequence identifier line starts with @ |
| <instrument> | Characters allowed: a-z, A-Z, 0-9 and underscore | Instrument ID |
| <run number> | Numerical | Run number on instrument |
| <flowcell ID> | Characters allowed: a-z, A-Z, 0-9 | Flowcell ID |
| <lane> | Numerical | Lane number |
| <tile> | Numerical | Tile number |
| <x-pos> | Numerical | X coordinate of cluster |
| <y-pos> | Numerical | Y coordinate of cluster |
| <read> | Numerical | Read number: 1 for a single read or read 1 of a paired-end run, 2 for read 2 of a paired-end run |
| <is filtered> | Y or N | Y if the read is filtered (did not pass), N otherwise |
| <control number> | Numerical | 0 when none of the control bits are on, otherwise it is an even number. See Control Values below. |
| <index sequence> | ACTG | Index sequence |

An example of a valid entry is as follows; note the space preceding the read number element:

@EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
BBBBCCCC?<A?BC?7@@???????DBBA@@@@A@@
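
The identifier format lends itself to simple string splitting. The Python sketch below parses one identifier line into named fields; ReadId and parse_identifier are names introduced for this illustration, and the sketch assumes none of the fields contains a colon or a space.

```python
from collections import namedtuple

ReadId = namedtuple(
    "ReadId",
    "instrument run_number flowcell_id lane tile x_pos y_pos "
    "read is_filtered control_number index_sequence",
)


def parse_identifier(line):
    """Parse one sequence identifier line into a ReadId tuple."""
    if not line.startswith("@"):
        raise ValueError("identifier must start with '@'")
    # The single space separates the cluster fields from the read fields.
    cluster, read_part = line[1:].rstrip().split(" ", 1)
    instrument, run, flowcell, lane, tile, x_pos, y_pos = cluster.split(":")
    read, is_filtered, control, index = read_part.split(":")
    if is_filtered not in ("Y", "N"):
        raise ValueError("<is filtered> must be Y or N")
    return ReadId(instrument, int(run), flowcell, int(lane), int(tile),
                  int(x_pos), int(y_pos), int(read), is_filtered == "Y",
                  int(control), index)


print(parse_identifier("@EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG"))
```

An identifier with a missing or extra field fails at the tuple unpacking step with a ValueError, which is usually the desired behavior for a format check.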

Note

CASAVA 1.8.2 FASTQ files contain only reads that passed filtering. If you want all reads in a FASTQ file, use the --with-failed-reads option.

Control Values

The tenth column (<control number>) is zero if the read is not identified as a control. If the read is identified as a control, the number is greater than zero, and the value specifies what kind of control it is. The value is the decimal representation of a bit-wise encoding scheme, with bit 0 having a decimal value of 1, bit 1 a value of 2, bit 2 a value of 4, and so on.

The bits are used as follows:

Bit 0: always empty (0)
Bit 1: was the read identified as a control?
Bit 2: was the match ambiguous?
Bit 3: did the read match the phiX tag?
Bit 4: did the read align to match the phiX tag?
Bit 5: did the read match the control index sequence?
Bits 6,7: reserved for future use
Bits 8..15: the report key for the matched record in the controls.fasta file (specified by the REPORT_KEY metadata)
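
To illustrate the encoding, the following Python sketch unpacks a control number into the bits listed above. The function name and the dictionary keys are made up for this example. For the control number 18 used in the example entry, 18 = 16 + 2, so bits 1 and 4 are set.

```python
def decode_control_number(value):
    """Unpack a <control number> into its flag bits and report key."""
    flags = {
        "is_control": bool(value & (1 << 1)),           # bit 1
        "ambiguous_match": bool(value & (1 << 2)),      # bit 2
        "phix_tag_match": bool(value & (1 << 3)),       # bit 3
        "phix_tag_alignment": bool(value & (1 << 4)),   # bit 4
        "control_index_match": bool(value & (1 << 5)),  # bit 5
    }
    # Bits 8..15 hold the report key of the matched controls.fasta record.
    flags["report_key"] = (value >> 8) & 0xFF
    return flags


print(decode_control_number(18))
```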

CASAVA v1.8.2 DNA Sequencing Analysis Workflow

© 2009-2011 Illumina, Inc. All rights reserved
