Enancio is a company, recently acquired by Illumina, with proprietary lossless data compression technology designed specifically for genomics data.
FASTQ files and BAM or CRAM files are typically stored for different purposes. However, fastq.ora files let you store a compressed copy of your raw data that preserves the MD5 sum of the original FASTQ and has a smaller footprint than the corresponding CRAM file.
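What a preserved MD5 sum means in practice can be sketched in a few lines of Python. This is only an illustration: gzip from the Python standard library stands in for the ORA codec (which is not called here), and the FASTQ record and the helper name `md5_hex` are made up for the example.

```python
import gzip
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 checksum of a byte string as a hex digest."""
    return hashlib.md5(data).hexdigest()


# A tiny, made-up FASTQ record used only for illustration.
original = b"@read1\nACGTACGT\n+\nIIIIIIII\n"

# gzip is a stand-in for any lossless codec; it is NOT the ORA format.
compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

# Lossless compression restores the input byte for byte,
# so the checksum of the restored data matches the original.
checksums_match = md5_hex(restored) == md5_hex(original)
print(checksums_match)
```

With a real fastq.ora file, the same check would compare the MD5 sum of the decompressed FASTQ against the MD5 sum recorded for the original file.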
DRAGEN can now compress two different formats: FASTQ files to fastq.ora and BAM files to CRAM.
DRAGEN ORA lossless compression is specifically designed for genomics data. The DNA sequence is compressed using a reference-based method: reads are mapped on a reference genome using an ultra-fast mapping scheme devised for compression. A compact binary format is used to encode reads as positions and a list of differences, followed by an entropy coder. Quality scores are encoded in a lossless way using a range encoder and context models adapted to the different types of quality schemes.
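The idea of encoding a read as a position plus a list of differences can be sketched as follows. This is a toy illustration under strong assumptions, not the DRAGEN ORA format: the mapping position is given rather than found by a mapper, only substitutions are handled (no insertions or deletions), and the binary packing and entropy coding stages are left out. The names `encode_read` and `decode_read` are hypothetical.

```python
from typing import List, Tuple

Diff = Tuple[int, str]  # (offset within the read, base observed in the read)


def encode_read(read: str, reference: str, pos: int) -> Tuple[int, int, List[Diff]]:
    """Encode a read as (position, length, substitutions) against a reference."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read does not fit inside the reference at this position")
    diffs = [(i, b) for i, (b, r) in enumerate(zip(read, window)) if b != r]
    return pos, len(read), diffs


def decode_read(reference: str, pos: int, length: int, diffs: List[Diff]) -> str:
    """Rebuild the read from the reference and the stored substitutions."""
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTTAGCCGATAGGC"  # made-up reference sequence
read = "CGTTAGACGA"                 # matches reference[5:15] except at offset 6

encoded = encode_read(read, reference, 5)
print(encoded)  # (5, 10, [(6, 'A')])
print(decode_read(reference, *encoded) == read)  # True
```

A read that matches the reference closely needs only its position, its length, and a short list of differences, which is far less information than the full base string. That small record is what a subsequent entropy coder would then compress further.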
Use of compression is completely optional. DRAGEN users remain free to adopt the storage strategy they want: enable conversion to the DRAGEN ORA compressed file format (fastq.ora) and store those files, disable the conversion and store fastq.gz files, or store BAM or CRAM files.
Yes. With the DRAGEN 3.8 release, compression is totally seamless and compressed fastq.ora files are directly ingested into the DRAGEN mapper.
Additionally, once the free decompression software is installed, a simple command can pipe the decompressed output on the fly directly into a wide range of popular mapping tools such as BWA [3], STAR [4], and Bowtie [5].
DRAGEN ORA compression technology reduces the data footprint of FASTQ files by up to a factor of 5 compared to gzip [1]. This translates into direct storage cost savings and faster file transfers.
DRAGEN ORA compressed FASTQ files can be shared. The decompression software is freely available. Once it is installed, a simple command can pipe the decompressed output on the fly directly into a wide range of popular mapping tools such as BWA [3], STAR [4], and Bowtie [5].
A 235 GB raw FASTQ file can be compressed to 55 GB via gzip. The data footprint is further reduced to 11 GB with the DRAGEN ORA compression technology [2].
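The ratios implied by these figures can be worked out directly. The short sketch below uses only the three sizes quoted above; the helper name `ratio` is chosen for the example.

```python
def ratio(before_gb: float, after_gb: float) -> float:
    """Return how many times smaller the data is after compression."""
    return before_gb / after_gb


raw_gb, gzip_gb, ora_gb = 235, 55, 11  # sizes quoted in the text

print(f"gzip vs raw: {ratio(raw_gb, gzip_gb):.1f}x")  # gzip vs raw: 4.3x
print(f"ORA vs gzip: {ratio(gzip_gb, ora_gb):.1f}x")  # ORA vs gzip: 5.0x
print(f"ORA vs raw:  {ratio(raw_gb, ora_gb):.1f}x")   # ORA vs raw:  21.4x
```

For this example file, the 55 GB to 11 GB step is the factor of 5 relative to gzip, and the overall reduction from the raw FASTQ is a little over 21x.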
The output of the compression technology is a new compressed FASTQ binary file format: fastq.ora. This file format can be stored and shared to enable significant storage cost savings and reduced file transfer times. All compressed files can be decompressed with the freely available decompression software.
Fastq.ora files can be decompressed on the fly for mapping and downstream analysis and will soon be directly ingested by DRAGEN.
The compression technology will be integrated across the Illumina portfolio in stages and will give users the option to produce compressed FASTQ files that are up to 5x smaller than fastq.gz [1]. Compression is already available on NextSeq 1000/2000. Beginning with the v3.8 release, compression will also be available on DRAGEN servers with native ingestion of compressed FASTQ files into the DRAGEN mapper.
During the NGS workflow, you can optionally enable compression to generate compressed fastq.ora files. With the DRAGEN v3.8 release, fastq.ora files can be ingested directly by the DRAGEN mapper for seamless integration. They can also be decompressed on the fly for other mapping and downstream analyses. The integration of compression within DRAGEN BCL conversion streamlines the workflow, as shown in the figure below: