Pipeline-Specific Hash Tables

When building a hash table, DRAGEN configures the options for DNA-Seq processing by default. To process RNA-Seq data, you must build an RNA-Seq hash table using the --ht-build-rna-hashtable true option. For an RNA-Seq alignment run, refer to the original --output-directory, not to the automatically generated subdirectory.
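As a minimal sketch, an RNA-Seq hash table build might be assembled as follows. The paths are hypothetical placeholders, and the function only prints the command so that it can be reviewed before anything is run. It uses only options named in this section.

```shell
# Hypothetical placeholder paths; set FASTA and REFDIR to your own locations.
FASTA="${FASTA:-/path/to/reference.fa}"
REFDIR="${REFDIR:-/path/to/rna_hash_table}"

# Print the RNA-Seq hash table build command without executing it.
rna_build_cmd() {
  echo dragen --build-hash-table=true \
    --ht-reference "$FASTA" \
    --output-directory "$REFDIR" \
    --ht-build-rna-hashtable true
}

rna_build_cmd
```

To run the build, remove the echo or paste the printed command into the shell on a DRAGEN server.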

The CNV pipeline requires that the hash table be built with --enable-cnv set to true, which generates an additional k-mer hashmap that is used in the CNV algorithm. Illumina recommends that you always use the --enable-cnv option, so that the same hash table can be used for CNV calling as well as for mapping and aligning.
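A build that follows this recommendation could look like the sketch below. The paths are hypothetical placeholders, and the command is printed rather than executed so it can be checked first.

```shell
# Hypothetical placeholder paths; set FASTA and REFDIR to your own locations.
FASTA="${FASTA:-/path/to/reference.fa}"
REFDIR="${REFDIR:-/path/to/hash_table}"

# Print a hash table build command that also generates the CNV k-mer hashmap.
cnv_build_cmd() {
  echo dragen --build-hash-table=true \
    --ht-reference "$FASTA" \
    --output-directory "$REFDIR" \
    --enable-cnv true
}

cnv_build_cmd
```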

DRAGEN methylation runs require building a special pair of hash tables with reference bases converted from C->T for one table, and G->A for the other. When running the hash table generation with the --ht-methylated option, these conversions are done automatically, and the converted hash tables are generated in a pair of subdirectories of the target directory specified with --output-directory. The subdirectories are named CT_converted and GA_converted, corresponding to the automatic base conversions. When using these hash tables for methylated alignment runs, refer to the original --output-directory and not to either of the automatically generated subdirectories.
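Before starting a methylation run, it can help to confirm that the build produced both converted subdirectories. The helper below is a hypothetical convenience function, not part of DRAGEN. It checks only that CT_converted and GA_converted exist under the directory passed to it.

```shell
# Hypothetical helper: succeed only if both converted hash table
# subdirectories exist under the given build output directory.
check_methylation_ht() {
  ht_dir="$1"
  for sub in CT_converted GA_converted; do
    if [ ! -d "$ht_dir/$sub" ]; then
      echo "missing: $ht_dir/$sub" >&2
      return 1
    fi
  done
  echo "ok: $ht_dir"
}
```

Call it with the same directory that was given to --output-directory during the build, for example check_methylation_ht "$REFDIR", and pass that same parent directory to the alignment run.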

These base conversions remove a significant amount of information from the hash tables, so you may need to tune the hash table parameters differently than you would for a conventional hash table build. The following options are recommended for building methylation hash tables for mammalian species:

dragen --build-hash-table=true --output-directory $REFDIR \
--ht-reference $FASTA --ht-max-seed-freq 16 \
--ht-seed-len 27 --ht-num-threads 40 --ht-methylated=true