DRAGEN BCL Data Conversion
BCL format is the native output format of Illumina sequencing systems and consists of a directory hierarchy containing data files and metadata. The data files are organized according to the flow cell layout of the sequencing system. The software converts this data to sample-separated FASTQ files.
DRAGEN provides BCL conversion software that uses hardware acceleration on the DRAGEN platform, which results in improved run times compared to a pure software execution. To run the conversion software, use the --bcl-input-directory <BCL_ROOT>, --output-directory <DIR>, and --bcl-conversion-only true options.
The DRAGEN BCL conversion is designed to output FASTQ files that match bcl2fastq2 v2.20 output.
DRAGEN BCL conversion supports the following features.
• Demultiplexing of samples by barcode with optional mismatch tolerance.
• Adapter sequence masking or trimming with adjustable matching stringency.
• UMI sequence tagging and optional trimming.
• Output of FASTQ files for index reads.
• [Optional] Combining all lanes into the same FASTQ output files.
• High sample count support (100,000).
• UMI sequences in index reads.
• Elimination of skew caused by adapter sequence trimming, using the MinimumAdapterOverlap setting.
• Output of metrics to detect index hopping.
The following example command contains the required BCL conversion options:
dragen --bcl-conversion-only true --bcl-input-directory <...> --output-directory <...>
The following additional options can be specified on the command line:
• --sample-sheet: Specifies the path to the SampleSheet.csv file. --sample-sheet is optional if the SampleSheet.csv file is in the --bcl-input-directory directory.
• --run-info: Sets the path to the RunInfo.xml file. By default, the file is located in the --bcl-input-directory directory.
• --strict-mode: If set to true, DRAGEN aborts if any files are missing. The default is false.
• --first-tile-only: If set to true, DRAGEN only converts the first tile of input (for testing and debugging). The default is false.
• --bcl-only-lane <#>: Convert only the specified lane in this conversion run.
• -f: Convert to output directory even if the directory exists (force).
• --bcl-use-hw false: Do not use DRAGEN FPGA acceleration during BCL conversion. This allows concurrent execution of BCL conversion with DRAGEN analysis.
• --bcl-sampleproject-subdirectories true: Output FASTQ files to subdirectories based on the sample sheet Sample_Project column.
• --no-lane-splitting true: Output all lanes of a flow cell to the same FASTQ files consecutively.
• --bcl-only-matched-reads true: Disable outputting unmapped reads to files marked as Undetermined.
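The options above can be assembled programmatically, for example when launching conversions from a pipeline script. The following is a minimal Python sketch, not part of DRAGEN: the function name `build_bcl_convert_command` and its parameters are hypothetical, and only option names listed above are used.

```python
import shlex


def build_bcl_convert_command(bcl_root, output_dir, sample_sheet=None,
                              no_lane_splitting=False, only_lane=None,
                              force=False):
    """Assemble a DRAGEN BCL conversion argument list (hypothetical helper)."""
    # The three required options for BCL conversion.
    cmd = [
        "dragen",
        "--bcl-conversion-only", "true",
        "--bcl-input-directory", str(bcl_root),
        "--output-directory", str(output_dir),
    ]
    # Only needed when SampleSheet.csv is not in the input directory.
    if sample_sheet is not None:
        cmd += ["--sample-sheet", str(sample_sheet)]
    if no_lane_splitting:
        cmd += ["--no-lane-splitting", "true"]
    if only_lane is not None:
        cmd += ["--bcl-only-lane", str(only_lane)]
    if force:
        cmd.append("-f")
    return cmd


def as_shell_string(cmd):
    """Render the argument list as a single shell-quoted string for logging."""
    return shlex.join(cmd)
```

The returned list can be passed to `subprocess.run` on a system where `dragen` is installed. Keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.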
The following additional options can be used to manually control performance. Use of these options might reduce performance or result in analysis failure. Contact Illumina Technical Support if issues occur.
• --shared-thread-odirect-output true: Switch to an alternate file output method that is optimized for sample counts greater than 100,000. This option is not recommended for lower sample counts and/or if using distributed file system targets such as GPFS or Lustre.
• --bcl-num-parallel-tiles <#>: Number of tiles processed in parallel. The default is determined dynamically.
• --bcl-num-conversion-threads <#>: Number of conversion threads per tile. The default is determined dynamically.
• --bcl-num-compression-threads <#>: Number of CPU threads for compressing FASTQ output. The default is determined dynamically.
• --bcl-num-decompression-threads <#>: Number of CPU threads for decompressing input BCL files. The default is determined dynamically.
The input BCL root directory and the output directory must be specified. The specified input path is three levels higher than the BaseCalls directory and should contain the RunInfo.xml file.
To specify the directory to store output FASTQ files, use the --output-directory option.
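When only the location of a BaseCalls directory is known, the input root can be derived from it. The sketch below is illustrative only and assumes the common run folder layout `<root>/Data/Intensities/BaseCalls`; the helper names are hypothetical.

```python
from pathlib import Path


def bcl_root_from_basecalls(basecalls_dir):
    """Return the directory three levels above a BaseCalls directory."""
    path = Path(basecalls_dir)
    if path.name != "BaseCalls":
        raise ValueError(f"expected a BaseCalls directory, got: {path}")
    # parents[0] is one level up, so parents[2] is three levels up.
    return path.parents[2]


def has_run_info(bcl_root):
    """Check that RunInfo.xml sits directly in the candidate root."""
    return (Path(bcl_root) / "RunInfo.xml").is_file()
```

For example, `bcl_root_from_basecalls("/runs/r1/Data/Intensities/BaseCalls")` yields `/runs/r1`, which is the value to pass to --bcl-input-directory if `has_run_info` confirms that RunInfo.xml is present there.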
In addition to the command line options that control the behavior of BCL conversion, you can use the [Settings] section in the sample sheet configuration file to specify how the samples are processed. The following are the sample sheet settings for BCL conversion.
DRAGEN does not support the following sample sheet settings from bcl2fastq:
• FindAdapterWithIndels
• ReverseComplement
| Option | Default | Value | Description |
|---|---|---|---|
| AdapterBehavior | trim | trim, mask | Whether adapter should be trimmed or masked. |
| AdapterRead1 | None | Read 1 adapter sequence containing A, C, G, or T | The sequence to trim or mask from the end of Read 1. |
| AdapterRead2 | None | Read 2 adapter sequence containing A, C, G, or T | The sequence to trim or mask from the end of Read 2. |
| AdapterStringency | 0.9 | Float between 0.5 and 1.0 | The stringency for matching the read to the adapter using the sliding window algorithm. |
| BarcodeMismatchesIndex1 | 1 | 0, 1, or 2 | The number of allowed mismatches between the first Index Read and index sequence. |
| BarcodeMismatchesIndex2 | 1 | 0, 1, or 2 | The number of allowed mismatches between the second Index Read and index sequence. |
| MinimumTrimmedReadLength | The minimum of 35 and the shortest non-indexed read length. | 0 to the shortest non-indexed read length | Reads trimmed below this point become masked at that point. |
| MinimumAdapterOverlap | 1 | 1, 2, or 3 | Do not trim detected adapter sequences shorter than this value. |
| MaskShortReads | The minimum of 22 and MinimumTrimmedReadLength. | 0 to MinimumTrimmedReadLength | Reads trimmed below this point become masked out. |
| TrimUMI | 1 | 0 or 1 | If set to 0, UMI sequences are not trimmed from output FASTQ reads. The UMI is still placed in the sequence header. |
| CreateFastqForIndexReads | 0 | 0 or 1 | If set to 1, output FASTQ files for index reads and genomic reads. |
| OverrideCycles | None | Y: specifies a sequencing read; I: specifies an indexing read; U: specifies a UMI length to be trimmed from read | String used to specify UMI cycles and mask out cycles of a read. |
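The allowed values in the table above lend themselves to a quick pre-flight check before starting a conversion. The following Python sketch is an assumption-laden illustration, not DRAGEN behavior: the function `validate_bcl_settings` is hypothetical, settings are assumed to arrive as a dictionary of strings, and adapter sequences are assumed to be uppercase.

```python
def validate_bcl_settings(settings):
    """Return a list of problems found in a {name: value-string} dict."""
    problems = []

    def check(name, ok, expected):
        # Only settings that are present are checked; absent ones use defaults.
        if name in settings and not ok(settings[name]):
            problems.append(f"{name}={settings[name]!r}: expected {expected}")

    def is_float_between(lo, hi):
        def ok(value):
            try:
                return lo <= float(value) <= hi
            except ValueError:
                return False
        return ok

    def is_one_of(*allowed):
        return lambda value: value in allowed

    def is_bases(value):
        # Assumption: uppercase A, C, G, T only, per the Value column.
        return len(value) > 0 and set(value) <= set("ACGT")

    check("AdapterBehavior", is_one_of("trim", "mask"), "trim or mask")
    check("AdapterRead1", is_bases, "A, C, G, or T only")
    check("AdapterRead2", is_bases, "A, C, G, or T only")
    check("AdapterStringency", is_float_between(0.5, 1.0), "0.5 to 1.0")
    check("BarcodeMismatchesIndex1", is_one_of("0", "1", "2"), "0, 1, or 2")
    check("BarcodeMismatchesIndex2", is_one_of("0", "1", "2"), "0, 1, or 2")
    check("MinimumAdapterOverlap", is_one_of("1", "2", "3"), "1, 2, or 3")
    check("TrimUMI", is_one_of("0", "1"), "0 or 1")
    check("CreateFastqForIndexReads", is_one_of("0", "1"), "0 or 1")
    return problems
```

An empty list means none of the checked settings fall outside the documented ranges. Settings whose limits depend on read length (MinimumTrimmedReadLength, MaskShortReads) are deliberately left out because they cannot be checked without the run configuration.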
The OverrideCycles mask elements are semicolon separated. For example:
OverrideCycles,U7N1Y143;I8;I8;U7N1Y143
DRAGEN supports flexible UMI processing during BCL conversion to support more third-party assays, including UMI sequences in index reads and multiple UMI regions per read. UMI sequences are trimmed from FASTQ read sequences and placed in the sequence identifier for each read, as normal.
The following are examples of OverrideCycles settings using 2x151 reads:
| Setting | Description |
|---|---|
| OverrideCycles,U7N1Y143;I8;I8;U7N1Y143 | The UMI consists of the first 7 bp of each genomic read, linked by 1 bp of ignored sequence. This is the format for Illumina nonrandom UMIs. |
| OverrideCycles,Y151;I8;U10;Y151 | Index Read 2 is a 10 bp UMI. This is the format for Agilent XT HS. |
| OverrideCycles,Y151;I8U9;I8;Y151 | Index Read 1 contains both an index and a 9 bp UMI. This is the format for IDT Dual Index Adapters with UMIs. |
| OverrideCycles,U3N2Y146;I8;I8;U3N2Y146 | The UMI consists of the first 3 bp of each genomic read, linked by 2 bp of ignored sequence. This is the format for UMIs in SureSelect XT HS 2 and IDT xGen Duplex Seq Adapter. |
| OverrideCycles,Y151;I8;I8;U10N12Y127 | The UMI is at the beginning of Read 2, attached with a linker sequence of length 12. |
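To make the mask syntax concrete, the following Python sketch splits an OverrideCycles value into its per-read elements and totals the cycles each mask covers. It is an illustration only: the function names are hypothetical, and the handling of N as an ignored-cycle element is inferred from the examples above rather than from a formal grammar.

```python
import re

# One mask element: a letter (Y, I, U, or N) followed by a cycle count.
_ELEMENT = re.compile(r"([YIUN])(\d+)")


def parse_override_cycles(value):
    """Split an OverrideCycles string into per-read lists of (kind, cycles)."""
    reads = []
    for read_mask in value.split(";"):
        elements = _ELEMENT.findall(read_mask)
        # Reject empty masks and masks with characters the pattern skipped.
        rebuilt = "".join(kind + count for kind, count in elements)
        if not elements or rebuilt != read_mask:
            raise ValueError(f"unrecognized mask: {read_mask!r}")
        reads.append([(kind, int(count)) for kind, count in elements])
    return reads


def cycles_per_read(value):
    """Total cycle count covered by each read's mask."""
    return [sum(n for _, n in read) for read in parse_override_cycles(value)]
```

For `U7N1Y143;I8;I8;U7N1Y143`, `cycles_per_read` returns `[151, 8, 8, 151]`, which can be compared against the read lengths of the run as a sanity check on a hand-written mask.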
When using --no-lane-splitting true, DRAGEN FASTQ file name convention and FASTQ contents match bcl2fastq2 for the same feature.
DRAGEN only supports this mode when the Lane column is not specified in the sample sheet to make sure that all samples are present in all lanes in the same order listed. This is generally expected for flow cells with no fluidic boundaries between lanes.
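A sample sheet can be checked for a Lane column before requesting this mode. The sketch below is a rough illustration and makes assumptions not stated in this section: that the sample sheet uses bracketed section names, that per-sample rows live under a `[Data]` section, and that the first non-empty row after `[Data]` is the column header. The function name is hypothetical.

```python
import csv
import io


def data_section_has_lane_column(sample_sheet_text):
    """Return True if the [Data] section header row has a Lane column."""
    in_data = False
    for row in csv.reader(io.StringIO(sample_sheet_text)):
        # Skip blank lines and rows whose first cell is empty.
        if not row or not row[0].strip():
            continue
        first = row[0].strip()
        if first.startswith("["):
            # Track whether we have entered the (assumed) [Data] section.
            in_data = first.lower() == "[data]"
            continue
        if in_data:
            # The first non-empty row after [Data] is treated as the header.
            return "Lane" in [cell.strip() for cell in row]
    return False
```

If the function returns True, the sample sheet specifies lanes per sample and is not suitable for --no-lane-splitting true as described above.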
DRAGEN BCL conversion outputs metrics in CSV format to the Reports/ output subfolder. Information provided includes metrics files for demultiplexing, adapter sequence trimming, index-hopping (for unique-dual indexes only), and the top unknown barcodes for each lane. In addition, the sample sheet and RunInfo.xml file used during conversion are copied into the Reports/ subdirectory for reference.
The following information is included in the Demultiplex_Stats.csv output file.
| Column | Description |
|---|---|
| Lane | The lane for each metric. |
| SampleID | The contents of Sample_ID in the sample sheet for this sample. |
| Index | The contents of index in the sample sheet for this sample. For dual-index samples, the value is concatenated with index2. |
| # Reads | The total number of pass-filter reads mapping to this sample for the lane. |
| # Perfect Index Reads | The number of mapped reads with barcodes that match the indexes provided in the sample sheet. |
| # One Mismatch Index Reads | The number of mapped reads with barcodes matched with one base mismatched. |
| # of >= Q30 Bases (PF) | The total number of bases mapped to this sample with a quality score greater than or equal to 30. |
| Mean Quality Score (PF) | The mean quality score of all bases mapping to this sample. |
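As an example of working with these columns, the following Python sketch computes the share of reads per sample whose barcodes matched perfectly. The rows in `EXAMPLE` are invented for illustration (including the way the two index values are joined), and the function name is hypothetical; only the column names come from the table above.

```python
import csv
import io

# Hypothetical rows for illustration; real files are written to Reports/.
EXAMPLE = """\
Lane,SampleID,Index,# Reads,# Perfect Index Reads,# One Mismatch Index Reads,# of >= Q30 Bases (PF),Mean Quality Score (PF)
1,1,AACAACCA-ACTGCATA,1000,950,50,280000,35.1
1,2,AATCCGTC-ACTGCATA,3000,2700,300,850000,34.8
"""


def perfect_index_fraction(csv_text):
    """Map (lane, sample) to the share of reads with a perfect index match."""
    fractions = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        reads = int(row["# Reads"])
        perfect = int(row["# Perfect Index Reads"])
        key = (row["Lane"], row["SampleID"])
        # Guard against samples with no reads in the lane.
        fractions[key] = perfect / reads if reads else 0.0
    return fractions
```

A low fraction for one sample relative to the others in the same lane can point to a typo in that sample's index sequence in the sample sheet.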
The following information is included in the Adapter_Metrics.csv output file.
| Column | Description |
|---|---|
| Lane | The lane for each metric. |
| Sample_ID | The contents of Sample_ID in the sample sheet for this sample. |
| index | The contents of index in the sample sheet for this sample. |
| index2 | The contents of index2 in the sample sheet for this sample. |
| R1_AdapterBases | The total number of bases trimmed as adapter from read 1 reads. |
| R1_SampleBases | The total number of bases not trimmed from read 1. |
| R2_AdapterBases | The total number of bases trimmed as adapter from read 2 reads. |
| R2_SampleBases | The total number of bases not trimmed from read 2. |
| # Reads | The total number of pass-filter reads mapping to this sample in this lane. |
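The adapter and sample base counts can be combined into a per-read adapter fraction. The following Python sketch shows one way to do that; the function name and the example numbers are hypothetical, and only the column names are taken from the table above.

```python
import csv
import io

# Hypothetical row for illustration only.
EXAMPLE = """\
Lane,Sample_ID,index,index2,R1_AdapterBases,R1_SampleBases,R2_AdapterBases,R2_SampleBases,# Reads
1,1,AACAACCA,ACTGCATA,1000,9000,500,9500,100
"""


def adapter_fraction(row, read="R1"):
    """Share of bases trimmed as adapter for read 'R1' or 'R2' in one row."""
    adapter = int(row[f"{read}_AdapterBases"])
    sample = int(row[f"{read}_SampleBases"])
    total = adapter + sample
    # Avoid dividing by zero for samples with no bases in the lane.
    return adapter / total if total else 0.0


def adapter_fractions(csv_text):
    """Map (lane, sample) to a (read 1 fraction, read 2 fraction) pair."""
    return {
        (row["Lane"], row["Sample_ID"]): (adapter_fraction(row, "R1"),
                                          adapter_fraction(row, "R2"))
        for row in csv.DictReader(io.StringIO(csv_text))
    }
```

A high adapter fraction usually indicates short inserts relative to the read length, which is worth knowing before downstream analysis.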
For unique dual index inputs, the Index_Hopping_Counts.csv file provides the number of reads mapping to every possible combination of provided index and index2 values, including via mismatch tolerance. The metrics provide visibility into any index-hopping behavior that might be occurring. The samples with both index and index2 values present in the sample sheet are present in the index hopping file for reference. The following information is included in the Index_Hopping_Counts.csv output file.
| Column | Description |
|---|---|
| Lane | The lane for each metric. |
| SampleID | If the index combination corresponds to a sample, the contents of Sample_ID in the sample sheet for this sample. |
| index | The contents of index in the sample sheet for the sample. |
| index2 | The contents of index2 in the sample sheet for the sample. |
| # Reads | The total number of pass-filter reads mapping to the index and index2 combination. |
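One way to summarize this file is the fraction of reads that landed on index combinations not assigned to any sample. The sketch below assumes that the SampleID field is empty for such combinations, which this section implies but does not state outright; the example rows and the function name are hypothetical.

```python
import csv
import io

# Hypothetical rows; an empty SampleID is ASSUMED to mark an unassigned pair.
EXAMPLE = """\
Lane,SampleID,index,index2,# Reads
1,1,AACAACCA,ACTGCATA,9000
1,,AACAACCA,GCGTAAGA,40
1,,CGAACTTA,ACTGCATA,60
1,3,CGAACTTA,GCGTAAGA,10900
"""


def hopped_fraction(csv_text):
    """Share of reads on index pairs that do not correspond to a sample."""
    expected = 0
    hopped = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        reads = int(row["# Reads"])
        if row["SampleID"].strip():
            expected += reads
        else:
            hopped += reads
    total = expected + hopped
    return hopped / total if total else 0.0
```

With the example rows, 100 of 20,000 reads fall on unassigned pairs, so the function returns 0.005. What counts as an acceptable level depends on the instrument and library preparation and is not defined here.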
The Top_Unknown_Barcodes.csv file lists the most commonly-encountered barcode sequences in the flow cell input that are not listed in the sample sheet. The 100 most common unlisted sequences are listed, along with any other sequences with a frequency equivalent to the 100th most commonly encountered sequence. The following information is included in the Top_Unknown_Barcodes.csv output file.
| Column | Description |
|---|---|
| Lane | The lane for each metric. |
| index | The first index value of this unlisted sequence. |
| index2 | The second index value of this unlisted sequence. |
| # Reads | The total number of pass-filter reads mapping to the index and index2 combination. |
The fastq_list.csv output file is located in the output folder with the FASTQ files. The file provides the associations between the sample indexes, lane, and the output FASTQ file names. The columns of each row are documented below, along with example entries from a test run. For more information on running DRAGEN using fastq_list.csv, see FASTQ CSV File Format.
| Column | Description |
|---|---|
| RGID | Read Group |
| RGSM | Sample ID |
| RGLB | Library |
| Lane | Flow cell lane |
| Read1File | Full path to a valid FASTQ input file. |
| Read2File | Full path to a valid FASTQ input file. Required for paired-end input. If not using paired-end input, leave empty. |
The following is an example fastq_list.csv output file.
RGID,RGSM,RGLB,Lane,Read1File,Read2File
AACAACCA.ACTGCATA.1,1,UnknownLibrary,1,/home/user/dragen_bcl_out/1_S1_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/1_S1_L001_R2_001.fastq.gz
AATCCGTC.ACTGCATA.1,2,UnknownLibrary,1,/home/user/dragen_bcl_out/2_S2_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/2_S2_L001_R2_001.fastq.gz
CGAACTTA.GCGTAAGA.1,3,UnknownLibrary,1,/home/user/dragen_bcl_out/3_S3_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/3_S3_L001_R2_001.fastq.gz
GATAGACA.GCGTAAGA.1,4,UnknownLibrary,1,/home/user/dragen_bcl_out/4_S4_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/4_S4_L001_R2_001.fastq.gz
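Because fastq_list.csv is plain CSV with a header row, it can be read with standard tooling. The following Python sketch groups FASTQ file pairs by sample, using the first two rows of the example above as input; the function name `fastqs_by_sample` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# First two rows of the example file shown above.
FASTQ_LIST = """\
RGID,RGSM,RGLB,Lane,Read1File,Read2File
AACAACCA.ACTGCATA.1,1,UnknownLibrary,1,/home/user/dragen_bcl_out/1_S1_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/1_S1_L001_R2_001.fastq.gz
AATCCGTC.ACTGCATA.1,2,UnknownLibrary,1,/home/user/dragen_bcl_out/2_S2_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/2_S2_L001_R2_001.fastq.gz
"""


def fastqs_by_sample(csv_text):
    """Map each RGSM value to a list of (lane, (read1, read2)) entries."""
    by_sample = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Read2File is empty for single-end input; represent that as None.
        pair = (row["Read1File"], row["Read2File"] or None)
        by_sample[row["RGSM"]].append((int(row["Lane"]), pair))
    return dict(by_sample)
```

Grouping by RGSM is convenient when a sample spans several lanes, because each lane contributes its own row and FASTQ pair.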