Reference Seed Interval

The size of the DRAGEN hash table is proportional to the number of seeds populated from the reference genome. The default is to populate a seed starting at every position in the reference genome, that is, roughly 3 billion seeds from a human genome. This default requires at least 32 GB of memory on the DRAGEN PCIe board.

To operate on larger, nonhuman genomes or to reduce hash table congestion, you can populate fewer than all reference seeds by using the --ht-ref-seed-interval option to specify an average reference interval. The default, --ht-ref-seed-interval 1, gives 100% population, and --ht-ref-seed-interval 2 gives 50% population. The population interval does not need to be an integer. For example, --ht-ref-seed-interval 1.2 indicates 83.3% population (1/1.2), with mostly 1-base and some 2-base intervals to achieve a 1.2-base interval on average.
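
The relationship between the interval and the populated fraction can be illustrated with a short Python sketch. The placement rule used here (a seed at the floor of each multiple of the interval) is an assumption chosen only to show how 1-base and 2-base gaps can average to 1.2 bases; it is not taken from DRAGEN and may differ from how DRAGEN actually selects positions. The function names are hypothetical.

```python
import math
from fractions import Fraction


def population_fraction(interval):
    """Fraction of reference positions seeded for a given average interval."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return 1.0 / interval


def seed_positions(ref_length, interval):
    """Illustrative placement (assumed, not DRAGEN's rule): seed k goes at
    floor(k * interval). Fraction avoids floating-point drift."""
    step = Fraction(str(interval))
    positions = []
    k = 0
    while True:
        pos = math.floor(k * step)
        if pos >= ref_length:
            break
        positions.append(pos)
        k += 1
    return positions


if __name__ == "__main__":
    positions = seed_positions(60, 1.2)
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    print("populated fraction:", round(population_fraction(1.2), 3))
    print("seeds placed in 60 bases:", len(positions))
    print("first gaps:", gaps[:10])
```

Under this assumed rule, an interval of 1.2 yields a repeating pattern of four 1-base gaps followed by one 2-base gap, so 50 of every 60 reference positions receive a seed.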