ROH Caller
Regions of homozygosity (ROH) are detected as part of the small variant caller. The caller detects and outputs the runs of homozygosity from whole genome calls on autosomal human chromosomes. Sex chromosomes are ignored. ROH output allows downstream tools to screen for and predict consanguinity between the parents of the proband subject.
A region is defined as consecutive variant calls on the chromosome with no large gap in between these variants. In other words, regions are broken by chromosome or by large gaps with no SNV calls. The gap size is set to 3 Mbases.
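The region definition above can be sketched as follows. This is a minimal illustration, not the DRAGEN implementation: the `(chromosome, position)` tuple layout and the function name `split_into_regions` are hypothetical, and treating a gap of exactly 3 Mbases as not breaking the region is an assumption.

```python
from typing import Iterable, List, Tuple

Call = Tuple[str, int]  # hypothetical layout: (chromosome, position)

MAX_GAP = 3_000_000  # gap size stated in the text (3 Mbases)


def split_into_regions(calls: Iterable[Call], max_gap: int = MAX_GAP) -> List[List[Call]]:
    """Group position-sorted calls into regions.

    A new region starts whenever the chromosome changes or the distance
    to the previous call is larger than max_gap.
    """
    regions: List[List[Call]] = []
    for chrom, pos in calls:
        if regions:
            last_chrom, last_pos = regions[-1][-1]
            if chrom == last_chrom and pos - last_pos <= max_gap:
                regions[-1].append((chrom, pos))
                continue
        regions.append([(chrom, pos)])
    return regions
```

The input is expected to be sorted by chromosome and position, as it would be in a sorted VCF.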

The ROH algorithm runs on the small variant calls. It excludes multiallelic sites, indels, complex variants, non-PASS filtered calls, and homozygous reference sites. The remaining calls are filtered against a blacklist BED file, and depth filtering is then applied. The default value for the fraction of filtered calls is 0.2, which removes the calls with the highest 10% and lowest 10% of DP values. The resulting calls are then used to find regions.
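The order of the filters described above can be illustrated with a short sketch. The call records here are hypothetical dictionaries (keys `chrom`, `pos`, `type`, `n_alt`, `filter`, `gt`, `dp`), not a DRAGEN data structure, and the rounding of the 10% cut-offs is an assumption.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # BED-style: chromosome, 0-based start, exclusive end


def is_candidate(call: Dict) -> bool:
    """Keep biallelic, PASS, non-reference SNV calls only."""
    return (
        call["type"] == "SNV"
        and call["n_alt"] == 1
        and call["filter"] == "PASS"
        and call["gt"] not in ("0/0", "0|0")
    )


def in_blacklist(call: Dict, blacklist: List[Interval]) -> bool:
    """True if the call falls inside any blacklist interval."""
    pos0 = call["pos"] - 1  # VCF positions are 1-based, BED intervals 0-based
    return any(c == call["chrom"] and s <= pos0 < e for c, s, e in blacklist)


def depth_trim(calls: List[Dict], fraction: float = 0.2) -> List[Dict]:
    """Drop the lowest and highest fraction/2 of calls ranked by DP.

    Input order is preserved. Rounding down the cut-off count is an assumption.
    """
    n = len(calls)
    k = int(n * fraction / 2)
    by_depth = sorted(range(n), key=lambda i: calls[i]["dp"])
    keep = set(by_depth[k:n - k])
    return [c for i, c in enumerate(calls) if i in keep]


def filter_calls(calls: List[Dict], blacklist: List[Interval],
                 fraction: float = 0.2) -> List[Dict]:
    """Site filters first, then the blacklist, then depth trimming."""
    kept = [c for c in calls if is_candidate(c) and not in_blacklist(c, blacklist)]
    return depth_trim(kept, fraction)
```

Depth trimming runs last, so the 10% cut-offs are computed only over calls that survived the site and blacklist filters.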
The ROH algorithm first finds seed regions that contain at least 50 consecutive homozygous SNV calls with no heterozygous SNVs and no gaps of 500,000 bases between the variants. The regions can then be extended using the following scoring system.
• The score increases by 0.025 with every additional homozygous variant and decreases by a large penalty (1 - 0.025 = 0.975) for every heterozygous SNV. This provides some tolerance for heterozygous SNVs in the region.
• Each region expands on both ends until it reaches the end of a chromosome, a gap of 500,000 bases between SNVs occurs, or the score becomes too low (0).
Overlapping regions are merged into a single region. Regions can be merged across gaps of 500,000 bases between SNVs if a single region would have been called from the beginning of the first region to the end of the second region without the gap. There is no maximum size for regions, but regions always end at chromosome boundaries.
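The seed-and-extend procedure can be sketched as below for a single chromosome. This is a simplified illustration under several assumptions, not the DRAGEN implementation: calls are hypothetical `(position, is_homozygous)` tuples, the right end is extended before the left, trailing heterozygous calls are trimmed from a region, and the rule for merging across gaps is omitted. Scores are tracked in integer units of 0.025 (a heterozygous call costs 39 units, which is 0.975) to avoid floating-point drift.

```python
from typing import List, Tuple

Call = Tuple[int, bool]  # hypothetical layout: (position, is_homozygous), sorted

MIN_SEED_CALLS = 50
MAX_GAP = 500_000
HOM_UNITS = 1   # +0.025 per homozygous SNV, in units of 0.025
HET_UNITS = 39  # -0.975 per heterozygous SNV, in units of 0.025


def find_seeds(calls: List[Call]) -> List[Tuple[int, int]]:
    """Return (first, last) index pairs of homozygous runs long enough to seed."""
    seeds, start = [], None
    for i, (pos, is_hom) in enumerate(calls):
        gap_break = i > 0 and pos - calls[i - 1][0] >= MAX_GAP
        if start is not None and (not is_hom or gap_break):
            if i - start >= MIN_SEED_CALLS:
                seeds.append((start, i - 1))
            start = None
        if is_hom and start is None:
            start = i
    if start is not None and len(calls) - start >= MIN_SEED_CALLS:
        seeds.append((start, len(calls) - 1))
    return seeds


def region_units(calls: List[Call], first: int, last: int) -> int:
    """Score of calls[first..last] in units of 0.025."""
    return sum(HOM_UNITS if hom else -HET_UNITS for _, hom in calls[first:last + 1])


def extend(calls: List[Call], first: int, last: int, step: int) -> int:
    """Grow one end (step=+1 right, step=-1 left); return the new end index."""
    units = region_units(calls, first, last)
    best = last if step > 0 else first
    i = best + step
    while 0 <= i < len(calls):
        if abs(calls[i][0] - calls[i - step][0]) >= MAX_GAP:
            break  # gap between neighbouring SNVs is too large
        units += HOM_UNITS if calls[i][1] else -HET_UNITS
        if units <= 0:
            break  # score became too low
        if calls[i][1]:
            best = i  # only end a region on a homozygous call (assumption)
        i += step
    return best


def call_roh(calls: List[Call]) -> List[Tuple[int, int]]:
    """Seed, extend, and merge overlapping regions; returns index pairs."""
    regions: List[Tuple[int, int]] = []
    for first, last in find_seeds(calls):
        last = extend(calls, first, last, +1)
        first = extend(calls, first, last, -1)
        while regions and first <= regions[-1][1]:
            prev_first, prev_last = regions.pop()
            first, last = min(first, prev_first), max(last, prev_last)
        regions.append((first, last))
    return regions
```

Multiplying `region_units` by 0.025 gives the score in the same scale as the text. With these weights, a single heterozygous SNV cancels 39 homozygous SNVs, so a seed of 50 calls can absorb one heterozygous call but not two in a row.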

• --vc-enable-roh
Enable or disable the ROH caller by setting this option to true or false. Enabled by default for human autosomes only.
• --vc-roh-blacklist-bed
If provided, the ROH caller ignores variants that are contained in any region in the blacklist BED file. DRAGEN distributes blacklist files for all popular human genomes and automatically selects a blacklist to match the genome in use, unless this option is used to explicitly select a file.

The ROH caller produces an ROH output file named <output-file-prefix>.roh.bed, in which each row represents one region of homozygosity. The BED file contains the following columns:
Chromosome Start End Score #Homozygous #Heterozygous
Where
• Score is a function of the number of homozygous and heterozygous variants, where each homozygous variant increases the score by 0.025, and each heterozygous variant reduces the score by 0.975.
• Start and end positions are a 0-based, half-open interval.
• #Homozygous is the number of homozygous variants in the region.
• #Heterozygous is the number of heterozygous variants in the region.
The caller also produces a metrics file named <output-file-prefix>.roh_metrics.csv that lists the number of large ROH and the percentage of SNPs in large ROH (> 3 Mbases).
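For downstream screening, the .roh.bed columns described above can be read back with a few lines of Python. This is a sketch under assumptions: the file is taken to be whitespace-delimited with the six columns in the listed order, any header or comment line is skipped, and the names `RohRegion`, `parse_roh_bed`, and `score_from_counts` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RohRegion:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    score: float
    n_hom: int
    n_het: int

    @property
    def length(self) -> int:
        # Half-open interval, so the length is simply end - start.
        return self.end - self.start


def parse_roh_bed(lines: Iterable[str]) -> List[RohRegion]:
    """Parse rows of a .roh.bed file; non-data lines are skipped."""
    regions = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or not fields[1].isdigit():
            continue  # header, comment, or malformed line
        chrom, start, end, score, n_hom, n_het = fields[:6]
        regions.append(RohRegion(chrom, int(start), int(end),
                                 float(score), int(n_hom), int(n_het)))
    return regions


def score_from_counts(n_hom: int, n_het: int) -> float:
    """Score implied by the per-variant weights given in the text."""
    return 0.025 * n_hom - 0.975 * n_het
```

A typical use is to open the file, pass the file object to `parse_roh_bed`, and keep the large regions with `[r for r in regions if r.length > 3_000_000]`.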