DRAGEN BCL Data Conversion

BCL is the native output format of Illumina sequencing systems. It consists of a directory hierarchy containing data files and metadata, with the data files organized according to the flow cell layout of the sequencing system. The software converts this data to sample-separated FASTQ files.

DRAGEN provides BCL conversion software that uses hardware acceleration on the DRAGEN platform, which results in improved run times compared to a pure software execution. To run the conversion software, use the --bcl-input-directory <BCL_ROOT>, --output-directory <DIR>, and --bcl-conversion-only true options.

The DRAGEN BCL conversion is designed to output FASTQ files that match bcl2fastq2 v2.20 output.

DRAGEN BCL conversion supports the following features.

• Demultiplexing of samples by barcode with optional mismatch tolerance.
• Adapter sequence masking or trimming with adjustable matching stringency.
• UMI sequence tagging and optional trimming.
• Output of FASTQ files for index reads.
• Optional combination of all lanes into the same FASTQ output files.
• High sample count support (100,000).
• UMI sequences in index reads.
• Elimination of skew caused by adapter sequence trimming by using the MinimumAdapterOverlap setting.
• Output of metrics to detect index hopping.

Command Line Options

The following example command contains the required BCL conversion options:

dragen --bcl-conversion-only true --bcl-input-directory <...> --output-directory <...>

The following additional options can be specified on the command line:

• --sample-sheet: Specifies the path to the SampleSheet.csv file. --sample-sheet is optional if the SampleSheet.csv file is in the --bcl-input-directory directory.
• --run-info: Sets the path to the RunInfo.xml file. By default, the file is located in the --bcl-input-directory directory.
• --strict-mode: If set to true, DRAGEN aborts if any files are missing. The default is false.
• --first-tile-only: If set to true, DRAGEN only converts the first tile of input (for testing and debugging). The default is false.
• --bcl-only-lane <#>: Convert only the specified lane in this conversion run.
• -f: Convert to the output directory even if the directory exists (force).
• --bcl-use-hw false: Do not use DRAGEN FPGA acceleration during BCL conversion. This allows concurrent execution of BCL conversion with DRAGEN analysis.
• --bcl-sampleproject-subdirectories true: Output FASTQ files to subdirectories based on the sample sheet Sample_Project column.
• --no-lane-splitting true: Output all lanes of a flow cell to the same FASTQ files consecutively.
• --bcl-only-matched-reads true: Disable outputting unmapped reads to files marked as Undetermined.

The following additional options can be used to manually control performance. Use of these options might reduce performance or result in analysis failure. Contact Illumina Technical Support if issues occur.

• --shared-thread-odirect-output true: Switch to an alternate file output method that is optimized for sample counts greater than 100,000. This option is not recommended for lower sample counts and/or if using distributed file system targets such as GPFS or Lustre.
• --bcl-num-parallel-tiles <#>: Number of tiles processed in parallel. The default is determined dynamically.
• --bcl-num-conversion-threads <#>: Number of conversion threads per tile. The default is determined dynamically.
• --bcl-num-compression-threads <#>: Number of CPU threads for compressing FASTQ output. The default is determined dynamically.
• --bcl-num-decompression-threads <#>: Number of CPU threads for decompressing input BCL files. The default is determined dynamically.

The input BCL root directory and the output directory must be specified. The specified input path is three levels higher than the BaseCalls directory and should contain the RunInfo.xml file.

To specify the directory to store output FASTQ files, use the --output-directory option.
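The way these options combine can be sketched in Python. The following is a minimal illustration that only assembles an argument list and does not run DRAGEN. The helper names build_bcl_convert_command and bcl_root_from_basecalls are hypothetical, the example paths are made up, and the Data/Intensities/BaseCalls layout is an assumption used to illustrate the "three levels higher" rule. Only the option names come from this section.

```python
from pathlib import PurePosixPath


def bcl_root_from_basecalls(basecalls_dir):
    """Return the directory three levels above BaseCalls (the assumed BCL root)."""
    return PurePosixPath(basecalls_dir).parents[2]


def build_bcl_convert_command(bcl_root, output_dir, sample_sheet=None,
                              only_lane=None, no_lane_splitting=False,
                              force=False):
    """Assemble (but do not run) a DRAGEN BCL conversion command line."""
    cmd = [
        "dragen",
        "--bcl-conversion-only", "true",
        "--bcl-input-directory", str(bcl_root),
        "--output-directory", str(output_dir),
    ]
    # Optional arguments are appended only when the caller sets them.
    if sample_sheet is not None:
        cmd += ["--sample-sheet", str(sample_sheet)]
    if only_lane is not None:
        cmd += ["--bcl-only-lane", str(only_lane)]
    if no_lane_splitting:
        cmd += ["--no-lane-splitting", "true"]
    if force:
        cmd.append("-f")
    return cmd


# Hypothetical paths, for illustration only.
root = bcl_root_from_basecalls("/data/run1/Data/Intensities/BaseCalls")
command = build_bcl_convert_command(root, "/data/fastq_out", only_lane=1)
print(" ".join(command))
```

Building the command as a list keeps each option and its value as separate arguments, which avoids quoting problems if the list is later passed to a process launcher.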

Sample Sheet Options

In addition to the command line options that control the behavior of BCL conversion, you can use the [Settings] section in the sample sheet configuration file to specify how the samples are processed.

DRAGEN does not support the following sample sheet settings from bcl2fastq:

• FindAdapterWithIndels
• ReverseComplement

The following are the sample sheet settings for BCL conversion.

Option	Default	Value	Description
AdapterBehavior	trim	trim, mask	Whether the adapter should be trimmed or masked.
AdapterRead1	None	Read 1 adapter sequence containing A, C, G, or T	The sequence to trim or mask from the end of Read 1.
AdapterRead2	None	Read 2 adapter sequence containing A, C, G, or T	The sequence to trim or mask from the end of Read 2.
AdapterStringency	0.9	Float between 0.5 and 1.0	The stringency for matching the read to the adapter using the sliding window algorithm.
BarcodeMismatchesIndex1	1	0, 1, or 2	The number of allowed mismatches between the first Index Read and index sequence.
BarcodeMismatchesIndex2	1	0, 1, or 2	The number of allowed mismatches between the second Index Read and index sequence.
MinimumTrimmedReadLength	The minimum of 35 and the shortest non-indexed read length.	0 to the shortest non-indexed read length	Reads trimmed below this point become masked at that point.
MinimumAdapterOverlap	1	1, 2, or 3	Do not trim detected adapter sequences shorter than this value.
MaskShortReads	The minimum of 22 and MinimumTrimmedReadLength.	0 to MinimumTrimmedReadLength	Reads trimmed below this point become masked out.
TrimUMI	1	0 or 1	If set to 0, UMI sequences are not trimmed from output FASTQ reads. The UMI is still placed in sequence header.
CreateFastqForIndexReads	0	0 or 1	If set to 1, output FASTQ files for index reads and genomic reads.
OverrideCycles	None	Y: Specifies a sequencing read; I: Specifies an indexing read; U: Specifies a UMI length to be trimmed from the read; N: Specifies cycles to ignore	String used to specify UMI cycles and mask out cycles of a read.
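The value ranges in the table above can be checked before a run. The following Python sketch is a hypothetical pre-flight validator, not part of DRAGEN. The function name validate_settings and the idea of passing the [Settings] entries as a dictionary are assumptions. The allowed values and defaults are taken from the table, and settings whose ranges depend on read length are not checked.

```python
def validate_settings(settings):
    """Return a list of problems found in a dict of [Settings] values."""
    problems = []

    if settings.get("AdapterBehavior", "trim") not in ("trim", "mask"):
        problems.append("AdapterBehavior must be trim or mask")

    if not 0.5 <= float(settings.get("AdapterStringency", 0.9)) <= 1.0:
        problems.append("AdapterStringency must be between 0.5 and 1.0")

    # Integer settings: (allowed values, default), as listed in the table.
    allowed = {
        "BarcodeMismatchesIndex1": ((0, 1, 2), 1),
        "BarcodeMismatchesIndex2": ((0, 1, 2), 1),
        "MinimumAdapterOverlap": ((1, 2, 3), 1),
        "TrimUMI": ((0, 1), 1),
        "CreateFastqForIndexReads": ((0, 1), 0),
    }
    for key, (values, default) in allowed.items():
        if int(settings.get(key, default)) not in values:
            problems.append(f"{key} must be one of {values}")

    # Adapter sequences, when given, may contain only A, C, G, or T.
    for key in ("AdapterRead1", "AdapterRead2"):
        sequence = settings.get(key)
        if sequence is not None and (not sequence or set(sequence) - set("ACGT")):
            problems.append(f"{key} may contain only A, C, G, or T")

    return problems


print(validate_settings({"AdapterStringency": "0.3",
                         "BarcodeMismatchesIndex1": "3"}))
```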

Override Cycles

The OverrideCycles mask elements are semicolon separated. For example:

OverrideCycles,U7N1Y143;I8;I8;U7N1Y143

DRAGEN supports flexible UMI processing during BCL conversion to support more third-party assays, including UMI sequences in index reads and multiple UMI regions per read. UMI sequences are trimmed from FASTQ read sequences and placed in the sequence identifier for each read, as normal.

The following are examples of OverrideCycles settings using 2x151 reads:

Setting	Description
OverrideCycles,U7N1Y143;I8;I8;U7N1Y143	The UMI consists of the first 7 bp of each genomic read, followed by 1 bp of ignored sequence. This is the format for Illumina nonrandom UMIs, used in the following products: TruSight Oncology 170 RUO, TruSight Oncology 500 RUO, and IDT for Illumina - UMI Index Anchors.
OverrideCycles,Y151;I8;U10;Y151	Index Read 2 is a 10 bp UMI. This is the format for Agilent XT HS.
OverrideCycles,Y151;I8U9;I8;Y151	Index Read 1 contains both an index and a 9 bp UMI. This is the format for IDT Dual Index Adapters with UMIs.
OverrideCycles,U3N2Y146;I8;I8;U3N2Y146	The UMI consists of the first 3 bp of each genomic read, followed by 2 bp of ignored sequence. This is the format for UMIs in SureSelect XT HS 2 and IDT xGen Duplex Seq Adapter.
OverrideCycles,Y151;I8;I8;U10N12Y127	The UMI is at the beginning of Read 2, attached with a linker sequence of length 12.
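To make the mask syntax concrete, the following Python sketch splits an OverrideCycles value into its per-read elements and totals the cycles in each read. It is an illustrative parser only, not the one DRAGEN uses. The function names parse_override_cycles and cycles_per_read are hypothetical, and the sketch assumes that each element is a single letter (Y, I, U, or N) followed by a cycle count.

```python
import re

# One mask element: a type letter followed by a cycle count, such as U7 or Y143.
MASK_ELEMENT = re.compile(r"([YIUN])(\d+)")


def parse_override_cycles(value):
    """Split an OverrideCycles string into per-read lists of (type, cycles)."""
    reads = []
    for mask in value.split(";"):
        elements = MASK_ELEMENT.findall(mask)
        # Reject masks that are empty or contain characters the pattern skipped.
        if not elements or "".join(t + n for t, n in elements) != mask:
            raise ValueError(f"unrecognized mask: {mask!r}")
        reads.append([(t, int(n)) for t, n in elements])
    return reads


def cycles_per_read(value):
    """Return the total number of cycles described by each read's mask."""
    return [sum(n for _, n in read) for read in parse_override_cycles(value)]


print(parse_override_cycles("Y151;I8U9;I8;Y151")[1])   # [('I', 8), ('U', 9)]
print(cycles_per_read("U7N1Y143;I8;I8;U7N1Y143"))      # [151, 8, 8, 151]
```

Comparing the per-read totals against the read lengths reported for the run is one way to catch a mask that does not account for every cycle.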

BCL Metrics

DRAGEN BCL conversion outputs metrics in CSV format to the Reports/ output subfolder. The metrics files cover demultiplexing, adapter sequence trimming, index hopping (for unique dual indexes only), and the top unknown barcodes for each lane. In addition, the sample sheet and RunInfo.xml file used during conversion are copied into the Reports/ subdirectory for reference.

Demultiplex Output File

The following information is included in the Demultiplex_Stats.csv output file.

Column	Description
Lane	The lane for each metric.
SampleID	The contents of Sample_ID in the sample sheet for this sample.
Index	The contents of index in the sample sheet for this sample. For dual-index samples, the value is concatenated with index2.
# Reads	The total number of pass-filter reads mapping to this sample for the lane.
# Perfect Index Reads	The number of mapped reads with barcodes that match the indexes provided in the sample sheet.
# One Mismatch Index Reads	The number of mapped reads with barcodes matched with one base mismatched.
# of >= Q30 Bases (PF)	The total number of bases mapped to this sample with a quality score greater than or equal to 30.
Mean Quality Score (PF)	The mean quality score of all bases mapping to this sample.
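As an example of using these columns, the following Python sketch computes the fraction of reads with a perfect index match for each lane and sample. The rows in SAMPLE_CSV are invented for illustration, including the way the two index values are joined, and the function name perfect_index_fraction is hypothetical. Only the column names come from the table above.

```python
import csv
import io

# Invented rows for illustration; a real file is written to Reports/.
SAMPLE_CSV = """Lane,SampleID,Index,# Reads,# Perfect Index Reads,# One Mismatch Index Reads
1,S1,AACAACCA-ACTGCATA,1000,950,50
1,S2,AATCCGTC-ACTGCATA,3000,2700,300
"""


def perfect_index_fraction(handle):
    """Map (lane, sample ID) to the fraction of reads with a perfect index match."""
    fractions = {}
    for row in csv.DictReader(handle):
        reads = int(row["# Reads"])
        perfect = int(row["# Perfect Index Reads"])
        # Guard against samples that received no reads.
        fractions[(row["Lane"], row["SampleID"])] = perfect / reads if reads else 0.0
    return fractions


fractions = perfect_index_fraction(io.StringIO(SAMPLE_CSV))
for key, value in fractions.items():
    print(key, round(value, 3))
```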

Adapter Metrics Output File

The following information is included in the Adapter_Metrics.csv output file.

Column	Description
Lane	The lane for each metric.
Sample_ID	The contents of Sample_ID in the sample sheet for this sample.
index	The contents of index in the sample sheet for this sample.
index2	The contents of index2 in the sample sheet for this sample.
R1_AdapterBases	The total number of bases trimmed as adapter from read 1 reads.
R1_SampleBases	The total number of bases not trimmed from read 1.
R2_AdapterBases	The total number of bases trimmed as adapter from read 2 reads.
R2_SampleBases	The total number of bases not trimmed from read 2.
# Reads	The total number of pass-filter reads mapping to this sample in this lane.
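One quantity that can be derived from these columns is the share of bases removed as adapter. The following Python sketch works on a single row, represented as a dictionary keyed by the column names above. The function name adapter_fraction and the example numbers are hypothetical.

```python
def adapter_fraction(row, read="R1"):
    """Return the fraction of bases in the given read that were trimmed as adapter."""
    adapter = int(row[f"{read}_AdapterBases"])
    sample = int(row[f"{read}_SampleBases"])
    total = adapter + sample
    # A row with no bases at all yields 0.0 rather than a division error.
    return adapter / total if total else 0.0


# Invented values for illustration.
example_row = {
    "R1_AdapterBases": "1500", "R1_SampleBases": "148500",
    "R2_AdapterBases": "3000", "R2_SampleBases": "147000",
}
print(adapter_fraction(example_row, "R1"))
print(adapter_fraction(example_row, "R2"))
```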

Index Hopping Counts Output File

For unique dual index inputs, the Index_Hopping_Counts.csv file provides the number of reads mapping to every possible combination of provided index and index2 values, including via mismatch tolerance. The metrics provide visibility into any index-hopping behavior that might be occurring. The samples with both index and index2 values present in the sample sheet are present in the index hopping file for reference. The following information is included in the Index_Hopping_Counts.csv output file.

Column	Description
Lane	The lane for each metric.
SampleID	If the index combination corresponds to a sample, the contents of Sample_ID in the sample sheet for this sample.
index	The contents of index in the sample sheet for the sample.
index2	The contents of index2 in the sample sheet for the sample.
# Reads	The total number of pass-filter reads mapping to the index and index2 combination.
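A simple summary of this file is the fraction of reads assigned to index combinations that do not correspond to any sample. The following Python sketch assumes that the SampleID field is empty for such combinations, which this section implies but does not state. The function name hopped_fraction and the rows are hypothetical.

```python
def hopped_fraction(rows):
    """Return the fraction of reads whose index pair matches no sample."""
    total = 0
    hopped = 0
    for row in rows:
        reads = int(row["# Reads"])
        total += reads
        # Assumption: an empty SampleID marks a combination with no sample.
        if not row["SampleID"].strip():
            hopped += reads
    return hopped / total if total else 0.0


# Invented rows for illustration.
example_rows = [
    {"Lane": "1", "SampleID": "S1", "index": "AACAACCA", "index2": "ACTGCATA", "# Reads": "9800"},
    {"Lane": "1", "SampleID": "S2", "index": "CGAACTTA", "index2": "GCGTAAGA", "# Reads": "9900"},
    {"Lane": "1", "SampleID": "", "index": "AACAACCA", "index2": "GCGTAAGA", "# Reads": "200"},
    {"Lane": "1", "SampleID": "", "index": "CGAACTTA", "index2": "ACTGCATA", "# Reads": "100"},
]
print(hopped_fraction(example_rows))
```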

Top Unknown Barcodes Output File

The Top_Unknown_Barcodes.csv file lists the most commonly encountered barcode sequences in the flow cell input that are not listed in the sample sheet. The 100 most common unlisted sequences are listed, along with any other sequences that occur as often as the 100th most commonly encountered sequence. The following information is included in the Top_Unknown_Barcodes.csv output file.

Column	Description
Lane	The lane for each metric.
index	The first index value of this unlisted sequence.
index2	The second index value of this unlisted sequence.
# Reads	The total number of pass-filter reads mapping to the index and index2 combination.
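The selection rule described above (the top 100 sequences, plus any that tie with the 100th) can be sketched as follows. This is an illustration of the rule, not DRAGEN's implementation. The function name top_unknown is hypothetical, and the example uses a cutoff of 2 instead of 100 to keep the data short.

```python
def top_unknown(counts, n=100):
    """Return the n most frequent barcodes, plus any that tie with the nth."""
    # Sort by descending count, then by barcode for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]
    # Keep every barcode whose count is at least the nth barcode's count.
    return [item for item in ranked if item[1] >= cutoff]


# Invented counts for illustration.
counts = {"AAAA": 50, "CCCC": 30, "GGGG": 30, "TTTT": 10}
print(top_unknown(counts, n=2))  # [('AAAA', 50), ('CCCC', 30), ('GGGG', 30)]
```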

FASTQ List Output File

The fastq_list.csv output file is located in the output folder with the FASTQ files. The file provides the associations between the sample indexes, lane, and the output FASTQ file names. The columns of each row are documented below, along with example entries from a test run. For more information on running DRAGEN using fastq_list.csv, see FASTQ CSV File Format.

Column	Description
RGID	Read Group
RGSM	Sample ID
RGLB	Library
Lane	Flow cell lane
Read1File	Full path to a valid FASTQ input file
Read2File	Full path to a valid FASTQ input file. Required for paired-end input. If not using paired-end input, leave empty.

The following is an example fastq_list.csv output file.

RGID,RGSM,RGLB,Lane,Read1File,Read2File
AACAACCA.ACTGCATA.1,1,UnknownLibrary,1,/home/user/dragen_bcl_out/1_S1_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/1_S1_L001_R2_001.fastq.gz
AATCCGTC.ACTGCATA.1,2,UnknownLibrary,1,/home/user/dragen_bcl_out/2_S2_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/2_S2_L001_R2_001.fastq.gz
CGAACTTA.GCGTAAGA.1,3,UnknownLibrary,1,/home/user/dragen_bcl_out/3_S3_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/3_S3_L001_R2_001.fastq.gz
GATAGACA.GCGTAAGA.1,4,UnknownLibrary,1,/home/user/dragen_bcl_out/4_S4_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/4_S4_L001_R2_001.fastq.gz
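For downstream scripting, the file can be read with a standard CSV parser. The following Python sketch groups FASTQ file pairs by sample, using the first two rows of the example above as inline data. The function name fastqs_by_sample is hypothetical, and treating an empty Read2File as single-end input follows the column description above.

```python
import csv
import io
from collections import defaultdict

# First two rows of the example above, embedded so the sketch is self-contained.
FASTQ_LIST = """RGID,RGSM,RGLB,Lane,Read1File,Read2File
AACAACCA.ACTGCATA.1,1,UnknownLibrary,1,/home/user/dragen_bcl_out/1_S1_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/1_S1_L001_R2_001.fastq.gz
AATCCGTC.ACTGCATA.1,2,UnknownLibrary,1,/home/user/dragen_bcl_out/2_S2_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/2_S2_L001_R2_001.fastq.gz
"""


def fastqs_by_sample(handle):
    """Map each RGSM value to a list of (lane, read1 path, read2 path or None)."""
    samples = defaultdict(list)
    for row in csv.DictReader(handle):
        # An empty Read2File means the entry is not paired-end.
        read2 = row["Read2File"] or None
        samples[row["RGSM"]].append((int(row["Lane"]), row["Read1File"], read2))
    return dict(samples)


by_sample = fastqs_by_sample(io.StringIO(FASTQ_LIST))
for sample, entries in by_sample.items():
    print(sample, entries)
```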