The BSSH DRAGEN apps mimic the pipelines found on the on-site DRAGEN server and should produce the same results as the on-site pipelines of the same software version as the app. However, some pipelines and software versions available on the on-site DRAGEN server may not be available as a BSSH DRAGEN app. Efforts are made to release on-site pipelines as BSSH DRAGEN apps, as needed.
The DRAGEN platform uses highly reconfigurable field-programmable gate arrays (FPGAs) to provide hardware-accelerated implementations of genome analysis algorithms. The algorithms are implemented as logic circuits, which provide almost instantaneous outputs. FPGAs can perform operations in parallel and are thus not node-dependent, as is the case for CPU-based systems.
No. The DRAGEN software can only be run on the DRAGEN Bio-IT Platform server. Customers are also issued independent licenses that tie the usage of their licenses to specific servers.
DRAGEN is available on-site, hosted, or as a hybrid of both through secure data transfer between physical and cloud-based DRAGEN systems. The hybrid solution gives users the flexibility to scale up workflows in the cloud when higher analysis capacity is required.
DRAGEN processes one sample at a time using all the available hardware resources (FPGA + CPU) and does it as fast as possible instead of sharing the resources to process multiple samples. This is true for on-site as well as cloud analysis. Typically, the run times are so fast that running serially is not a problem, and a single DRAGEN server can easily keep up with a NovaSeq™ 6000. For large-scale WGS processing with many NovaSeq 6000 instruments, additional servers may be needed to keep up. It is also possible to schedule many DRAGEN instances in parallel in the cloud, where each instance processes one sample.
Yes. Use the following command line:
dragen --enable-map-align=false --file-conversion=true --cram-input=<file.cram> --output-format=bam --output-directory=<out dir> --output-file-prefix=<file prefix> --ref-dir=<REF>
If you are using the Tumor-Normal BAM input option and your BAM read groups have a shared RGID, DRAGEN cannot determine the read group for the reads. Ideally, you should have different RGIDs for each read group, but you can work around the error by setting the --prepend-filename-to-rgid option to true on the command line.
The output BAM and VCF files are stored in the directory specified with the --output-directory option. By default, temporary and intermediate files are also stored in the output directory, but it is recommended that you save them in the /staging folder on on-site servers or on ephemeral or EBS storage in the cloud. To specify the storage location for intermediate and temporary files, use the --intermediate-results-dir option.
By default, DRAGEN outputs three supplementary alignments and no secondary alignments. The maximum number of supplementary (chimeric) and secondary (suboptimal) alignments per read is 30. For secondary alignments, use the --supp-aligns and --sec-aligns filtering options to specify the maximum score difference or Phred-scaled likelihood difference between the best alignment and a secondary alignment.
The DRAGEN system can only be used by one user at a time. The best way to handle multiple users is to use a job queueing tool that can accept jobs from users and queue them for processing by DRAGEN. The tool would then call the DRAGEN software with one job at a time, and then notify the user when a job is completed. DRAGEN works well with SLURM or LSF as a scheduler to manage multiple users.
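The queueing pattern described above can be sketched in a few lines of Python. This is only an illustration of the one-job-at-a-time idea, not a replacement for SLURM or LSF. The names serve_jobs, run, and notify are hypothetical, and the run callable is assumed to launch DRAGEN and return its exit code.

```python
import queue
from typing import Callable, List


def serve_jobs(jobs: queue.Queue,
               run: Callable[[List[str]], int],
               notify: Callable[[str, int], None]) -> None:
    """Process queued (user, args) jobs strictly one at a time.

    `run` is a caller-supplied function that starts the analysis and
    blocks until it finishes (hypothetical; for example a wrapper around
    subprocess.call). `notify` tells the user how the job ended.
    A None entry in the queue stops the loop.
    """
    while True:
        job = jobs.get()
        if job is None:
            break
        user, args = job
        exit_code = run(args)   # blocks, so the server runs a single job
        notify(user, exit_code)
```

Because run blocks until the analysis finishes, the next job never starts while the server is busy, which is the property a real scheduler would enforce with a single-slot resource.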
Yes, you can provide a .bed file to indicate the intervals where you want calling to be performed using the --vc-target-bed option. For more information, see the Illumina DRAGEN Bio-IT Platform User Guide.
If the following message occurs, the SMT settings on the server are not correct.
SMT settings : SMT=8
The SMT settings on this server are not correct. Please contact email@example.com
To resolve this, run the /opt/edico_driver/bin/edico_bootup.sh script.
For information on the full list of optional filters, see the Illumina DRAGEN Bio-IT Platform User Guide.
DRAGEN supports gzip-compressed and CRAM input formats. DRAGEN also generates compressed outputs.
The methodology for MAPQ calculation is similar to BWA-MEM. Alignment candidate pair scores are formed by summing mate alignment scores and subtracting a penalty representing the likelihood of insert size versus the empirical distribution. MAPQ is primarily proportional to the pair score difference between best and second-best candidates, with corrections for clustered suboptimal scores. Some inconsistencies in BWA’s calculation are corrected, particularly in cases where BWA applies a different scaling from alignment score differences to MAPQ in paired versus unpaired cases.
DRAGEN V3 is a MAP/A, Haplotype Caller with the following stages:
Pipeline-specific metrics are generated during each run (some options need to be enabled). For more information, see the Illumina DRAGEN Bio-IT Platform User Guide. DRAGEN results are not in FastQC format. FastQC only reports information based on the FASTQ file. DRAGEN QC metrics are more informative and include metrics such as coverage and number of unique reads.
DRAGEN primary alignments are soft-clipped, so all the bases and base qualities are present. By default, the supplementary alignments are hard-clipped.
You can disable hard clips using the following option:
This option, ranging from 0–7, is considered a field of 3 bits. Bit 0 is for primary alignments, bit 1 for supplementary alignments, and bit 2 for secondary alignments. Each bit determines whether local alignments of that type are reported with hard clipping (1) or soft clipping (0). The default is 6, meaning only primary alignments use soft clipping.
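As a quick illustration of the bit field described above, the following Python sketch decodes a hard-clips value into the clipping mode for each alignment type. The function name decode_hard_clips is hypothetical and is not part of DRAGEN. It only mirrors the bit layout stated in the text.

```python
# Bit 0 = primary, bit 1 = supplementary, bit 2 = secondary (as described above).
ALIGNMENT_TYPES = ("primary", "supplementary", "secondary")


def decode_hard_clips(value: int) -> dict:
    """Map a hard-clips value (0-7) to 'hard' or 'soft' per alignment type."""
    if not 0 <= value <= 7:
        raise ValueError("hard-clips value must be in the range 0-7")
    return {
        name: "hard" if (value >> bit) & 1 else "soft"
        for bit, name in enumerate(ALIGNMENT_TYPES)
    }


print(decode_hard_clips(6))
# {'primary': 'soft', 'supplementary': 'hard', 'secondary': 'hard'}
```

A value of 0 reports every alignment type with soft clipping, and 7 reports every type with hard clipping.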
If DRAGEN is run with default settings, then no read bases or base qualities are lost. If the hard-clips option is set to 7, then data may be lost. The bases and base qualities in unaligned reads are retained in the DRAGEN output BAM.
For performance, DRAGEN BCL conversion uses lighter compression than bcl2fastq2 on the FASTQ files generated. Even though the compressed files are not the same size, the underlying FASTQ files may be identical. To determine if FASTQ files generated between the two platforms are identical, the md5sum values of the underlying (uncompressed) FASTQ files should be calculated for comparison. If the md5sum values of the underlying FASTQ files are identical, then the FASTQ files are identical regardless of the difference in compressed (fastq.gz) file size.
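The comparison described above can be sketched in Python with the standard gzip and hashlib modules. The function names fastq_md5 and same_fastq are hypothetical. The sketch assumes each input is a single gzip-compressed FASTQ file and hashes the decompressed bytes, so the compression level has no effect on the result.

```python
import gzip
import hashlib


def fastq_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of the uncompressed content of a .fastq.gz file."""
    digest = hashlib.md5()
    with gzip.open(path, "rb") as handle:
        # Read in chunks so large FASTQ files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_fastq(path_a: str, path_b: str) -> bool:
    """True if both compressed files contain byte-identical FASTQ data."""
    return fastq_md5(path_a) == fastq_md5(path_b)
```

The same check can be done on the command line by decompressing each file to standard output and piping it to md5sum.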
Temporary files that an analysis generates and deletes when it finishes should be stored on the local SSD drive. Use the --intermediate-results-dir option to specify the location for temp files. By default, the temp files are written to the output directory. If your output directory is on network storage, writing temp files there slows down the DRAGEN analysis.
Fatal exception: Assertion failed in ../src/host/vc/vcaller.cpp line 358 -- targetBed.verifyContigs(m_refDict) -- Target BED file contains contigs not present in reference dictionary
This message is displayed when a target BED file (--vc-target-bed) is passed to DRAGEN, but the contigs in that file do not match the contigs in the specified reference (--ref-dir). This can happen if the reference used is hg19, which uses the following naming convention for contigs: 'chr1, chr2, chr3...chrX, chrY', but the target BED file uses GRCh37, which names contigs as: '1, 2, 3...X, Y'.
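A mismatch of this kind can be caught before launching an analysis. The following Python sketch lists contigs that appear in a BED file but not in a set of reference contig names. The function names are hypothetical, and the reference contig names are assumed to be collected separately, for example from the first column of the reference .fai index.

```python
def bed_contigs(bed_lines):
    """Collect contig names from the first column of BED records."""
    contigs = set()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        contigs.add(line.split()[0])
    return contigs


def missing_contigs(bed_lines, reference_contigs):
    """Return BED contigs that are absent from the reference, sorted."""
    return sorted(bed_contigs(bed_lines) - set(reference_contigs))


bed = ["1\t100\t200", "X\t500\t900"]
print(missing_contigs(bed, {"chr1", "chr2", "chrX"}))  # ['1', 'X']
```

An empty list means every contig in the BED file is present in the reference naming scheme.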
Yes. The --combine-samples-by-name or --fastq-list options can be used for this purpose. For information on both options, see the Illumina DRAGEN Bio-IT Platform User Guide.
The --fastq-list option is preferred. You can specify all FASTQ files for a sample along with read group IDs. The --fastq-list file is in CSV format and can include all the samples, such as all samples for a flow cell. To process a specific sample from the CSV file, use --fastq-list-sample-id. Only the entries in the fastq-list file with an RGSM value matching the specified SampleID are processed. If you are using DRAGEN BCL conversion, the conversion process generates the fastq-list CSV file.
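The effect of --fastq-list-sample-id can be illustrated with a short Python sketch that keeps only the rows whose RGSM value matches a sample ID. The function name rows_for_sample is hypothetical. The column layout in the example (RGID, RGSM, RGLB, Lane, Read1File, Read2File) and the file names are assumptions for illustration; only the RGSM column is taken from the text above.

```python
import csv
import io


def rows_for_sample(csv_text: str, sample_id: str) -> list:
    """Return the fastq-list rows whose RGSM column equals sample_id."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["RGSM"] == sample_id]


# Example content with an assumed column layout and made-up file names.
fastq_list = """RGID,RGSM,RGLB,Lane,Read1File,Read2File
rg1,SampleA,lib1,1,A_L001_R1.fastq.gz,A_L001_R2.fastq.gz
rg2,SampleA,lib1,2,A_L002_R1.fastq.gz,A_L002_R2.fastq.gz
rg3,SampleB,lib1,1,B_L001_R1.fastq.gz,B_L001_R2.fastq.gz
"""

for row in rows_for_sample(fastq_list, "SampleA"):
    print(row["RGID"], row["Read1File"])
```

Both lanes of SampleA are selected, which is how multiple FASTQ files for one sample end up in a single analysis.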
The DRAGEN WGS and WES pipelines have about 200 options and several major modes. Many options are available for tweaking and optimization to handle specific workflows. For example, inserting stages at the output of the BAM, filtering of reads pre-FASTQ, and adapter trimming are possible.
It is recommended that you save the hash table on the fast SSD drive that is local to the server. For an on-site server, the hash table should be stored under /staging. In the cloud, the hash table should be stored on the local NVMe drive.
Variant calling specificity can be improved by eliminating misaligned reads that match an ALT contig but score poorly against the primary assembly.
If you have loaded a study into Cohort Analyzer, customer administrators can then permission the study to share with collaborators.
Each user of Cohort Analyzer has their own secure, private data center-based domain where their private data is stored. Only users within your domain can access this information.
Users can import private studies into Cohort Analyzer through the Data Uploader. Supported data types are somatic mutation, copy number variation, and RNA-seq data.
Yes. Cohort Analyzer reports can be exported and saved locally in PDF format. You can also download molecular data such as RNA expression, copy number variation, DNA methylation and somatic mutation information as *.csv files.
If the same sample prep is used, there is no need to recreate a panel of normals for each flow cell, assuming the typical QC steps are followed for the flow cell. There is no harm in regenerating the panel of normals for each flow cell, except the additional processing time. The control samples in the panel of normals are used for median normalization in the DRAGEN CNV algorithm. In general, a larger number of samples helps with determining the reference level. This is especially important in regions where there may be reference minor calls.
If the samples are still germline, without any copy number aberrations, then they may be used to increase the size of the panel of normals. However, Illumina does not recommend making calls within or near segmental duplication regions with the current algorithm.
Yes. If you are processing exomes, you need many samples to build an annotation dataset large enough to have statistical significance. Thirty is an arbitrary number, but it works. The DRAGEN software reports an error if it detects no variance.
For WGS processing, Illumina recommends at least 10-15 samples. If you are calling in regions where the population may have reference minor calls, then the use of more samples will help in determining which reference baseline is the most representative in the population.
You can include groups of families or trios in your panel of normals if you wish to use them to increase its size. However, the panel of normals should not be composed solely of family members.
Yes, DRAGEN supports joint calling of trios, large cohorts or populations with the available Combine gVCF and Joint Genotyping capability.
Yes. Use the same command line option that is used for VCF, --vc-hard-filter.
The force genotyping feature was introduced in DRAGEN 3.1. For more information, refer to the DRAGEN Bio-IT Platform User Guide.
You do not have to reanalyze all the samples. The previously generated single-sample gVCFs can be reused. Generate a gVCF for each new sample, and then rerun the joint calling step on the previously joint-called cohort together with the new samples.
No, the DRAGEN pipeline handles systematic base calling errors using a Base Quality Drop Off algorithm. For more information, refer to the DRAGEN Bio-IT Platform User Guide.
Yes, DRAGEN uses the same approach as Mutect2 with regard to the panel of normals. You can provide a panel of normals file as input on the DRAGEN command line. For more information, refer to the DRAGEN Bio-IT Platform User Guide.
DRAGEN outputs both a raw and hard-filtered VCF by default. The filter used in the hard-filtered VCF is based on QUAL. You can configure hard-filtering by using the --vc-hard-filter option.
Currently, the depth is counted as 2, but in the future it will be counted as 1.
When calculating the GT probabilities for each genotyping event, DRAGEN keeps the mate with the strongest evidence and zeroes-out the HMM scores of the other if the mates agree on the best-scoring allele. If they disagree, DRAGEN sums the HMM scores and assigns the combined score to the mate that agrees with the combined result and zeroes-out the other.
While this is a sub-optimal detection algorithm, the goal was to solve the worst problems (double-counting of evidence) without introducing new ones. The optimal detection algorithm is quite complex and our current studies show that it is unlikely to have any significant advantage over the simple algorithm described above.
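The simple rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not DRAGEN's implementation. Each mate's HMM scores are assumed to be log-likelihoods keyed by allele, "strongest evidence" is assumed to mean the largest gap between the best and second-best score, and zeroing out is assumed to mean setting every score to 0. The case where neither mate agrees with the combined result is not covered by the text, so the sketch arbitrarily assigns the combined score to the first mate.

```python
def best_allele(scores):
    """Allele with the highest score."""
    return max(scores, key=scores.get)


def evidence_strength(scores):
    """Gap between the best and second-best score (assumed definition)."""
    ranked = sorted(scores.values(), reverse=True)
    return ranked[0] - ranked[1] if len(ranked) > 1 else 0.0


def resolve_overlap(mate1, mate2):
    """Return adjusted (mate1, mate2) scores so evidence is counted once."""
    zero = {allele: 0.0 for allele in mate1}
    if best_allele(mate1) == best_allele(mate2):
        # Mates agree: keep the mate with the strongest evidence.
        if evidence_strength(mate1) >= evidence_strength(mate2):
            return dict(mate1), zero
        return zero, dict(mate2)
    # Mates disagree: sum the scores and give them to the agreeing mate.
    combined = {allele: mate1[allele] + mate2[allele] for allele in mate1}
    if best_allele(mate2) == best_allele(combined):
        return zero, combined
    # mate1 agrees, or (assumption) neither mate agrees with the combined result.
    return combined, zero
```

In both branches only one mate carries nonzero scores afterward, which is what prevents the double-counting of evidence mentioned above.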
DRAGEN 3.3 only uses dbSNP for annotation. DRAGEN 3.3 does not currently support the --germline_resource option to filter out calls. This may be supported in the future.
To generate a panel of normals VCF file, run DRAGEN somatic in tumor-only mode on several non-somatic samples that use the same sequencer/prep as the somatic sample. To get the panel of normals file, determine the intersection of the calls, while ignoring filters. The purpose of the Panel of Normals file is to filter out artifacts caused by the sequencing process.
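The intersection step can be sketched in Python as shown below. The sketch assumes "intersection" means calls present in every normal sample, identifies a call by chromosome, position, reference allele, and alternate allele, and reads plain-text VCF lines. The FILTER column is never inspected, which corresponds to ignoring filters. The function names are hypothetical, and writing the result back out as a VCF file is left out.

```python
def call_keys(vcf_lines):
    """Return the set of (chrom, pos, ref, alt) keys in one VCF, ignoring FILTER."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Split multi-allelic records so each ALT allele is compared separately.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def panel_sites(per_sample_vcf_lines):
    """Return the calls shared by every sample, sorted by position."""
    key_sets = [call_keys(lines) for lines in per_sample_vcf_lines]
    return sorted(set.intersection(*key_sets)) if key_sets else []
```

Calls that recur in every normal sample are likely to be sequencing or prep artifacts, which is why they are useful for filtering the somatic sample.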
No. Currently, DRAGEN can generate a VCF or a gVCF, but not both in a single run.
The tumor-only pipeline lacks a matching normal sample and produces a VCF file containing both somatic and germline variants. The same set of filters is applied as for the Tumor-Normal pipeline (except those filters that depend on having the normal sample available). More downstream filtering can be applied (via a panel of normals and available databases of germline variants) to remove germline variants.