Spinal Muscular Atrophy Calling

Disruption of all copies of the SMN1 gene in an individual causes spinal muscular atrophy (SMA). SMN1 has a high-identity paralog, SMN2, that differs at only approximately 10 SNVs and small indels. One of these differences (hg19 chr5:70247773 C->T) affects splicing and largely disrupts the production of functional SMN protein from SMN2. Standard WGS analysis does not produce complete variant calling results for SMN because of this high-similarity duplication combined with common copy-number variation. However, approximately 95% of SMA cases can be detected by establishing that no copy of SMN carries the functional C (SMN1) allele.

DRAGEN SMA calling uses sequence-graph realignment to align reads to a single reference representing both SMN1 and SMN2. In addition to the standard diploid genotype call, DRAGEN uses a direct statistical test to check for the presence of any C allele. If no C allele is detected, the sample is called affected; otherwise, it is called unaffected.
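The statistical test itself is not specified in this section. As an illustration only, the following Python sketch shows one way a test for "any C allele present" could be framed as a log10 likelihood ratio over the C and T read counts at the splice-affecting position. The binomial read model, the total SMN copy number of 4, the 1% error rate, the uniform weighting over copy states, and the decision threshold of 2.0 are all assumptions made for this example, not DRAGEN's actual model or parameters. The sign convention matches the RPL field: positive values favor the unaffected model.

```python
import math


def log10_read_likelihood(c_reads, t_reads, p_c):
    """log10 of p_c^c_reads * (1 - p_c)^t_reads.

    The binomial coefficient is omitted because it cancels in the ratio.
    """
    return c_reads * math.log10(p_c) + t_reads * math.log10(1.0 - p_c)


def sma_log10_lr(c_reads, t_reads, total_copies=4, error_rate=0.01):
    """Log10 likelihood ratio of unaffected versus affected (illustrative).

    Affected model: no SMN copy carries C, so C reads arise only from errors.
    Unaffected model: k of total_copies carry C, for k = 1..total_copies,
    averaged with equal weight (an assumption of this sketch).
    """
    affected = log10_read_likelihood(c_reads, t_reads, error_rate)

    terms = []
    for k in range(1, total_copies + 1):
        frac = k / total_copies
        # Expected fraction of C reads, allowing symmetric base errors.
        p_c = frac * (1.0 - error_rate) + (1.0 - frac) * error_rate
        terms.append(log10_read_likelihood(c_reads, t_reads, p_c))

    # Average the likelihoods in a numerically stable way (base-10 log-sum).
    peak = max(terms)
    unaffected = peak + math.log10(
        sum(10.0 ** (t - peak) for t in terms) / len(terms)
    )
    return unaffected - affected


def call_status(log10_lr, threshold=2.0):
    """Map a ratio to a status symbol; the threshold is hypothetical."""
    if log10_lr <= -threshold:
        return "+"  # SMA detected (affected)
    if log10_lr >= threshold:
        return "-"  # SMA not detected (unaffected)
    return "?"      # undetermined
```

In this sketch, a sample with no C-supporting reads at reasonable depth yields a strongly negative ratio, while a sample with a balanced mix of C and T reads yields a strongly positive one.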

SMA calling is only supported for human whole-genome sequencing samples in PCR-free libraries.

Usage

SMA calling is implemented together with repeat expansion detection. For information on graph alignment and the available options, see Repeat Expansion Detection with Expansion Hunter.

Setting the --repeat-genotype-enable option to true enables SMA calling together with repeat expansion detection. SMA calling is activated only if the variant specification catalog file includes a description of the targeted SMN1/2 variant. Example files are available in the /opt/edico/repeat-specs/experimental folder.
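As a rough sketch of how these settings might be added to an existing command line, the Python helper below appends the relevant options to a list of arguments. Only --repeat-genotype-enable is taken from this section. The --repeat-genotype-specs option name is an assumption to confirm against the repeat expansion documentation for your DRAGEN version, and the function name and catalog path are hypothetical placeholders.

```python
import shlex


def add_sma_options(base_args, catalog_path):
    """Return a new argument list with repeat genotyping enabled.

    SMA calling additionally requires that the catalog at catalog_path
    describes the targeted SMN1/2 variant.
    """
    return list(base_args) + [
        "--repeat-genotype-enable", "true",
        # Assumption: option name for the variant specification catalog.
        "--repeat-genotype-specs", catalog_path,
    ]


# Placeholder base command; add your usual reference, input, and output options.
base_args = ["dragen"]

# Hypothetical path; substitute a catalog that includes the SMN1/2 variant.
args = add_sma_options(base_args, "/path/to/smn_catalog.json")
print(shlex.join(args))
```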

SMN output is included along with any targeted repeats in <outputPrefix>.repeat.vcf. SMN output is represented as a single SNV call at the key (splice-affecting) position in SMN1, with SMA status in custom fields:

SMA Result in repeat.vcf Output

Field    Description
VARID    SMN marks the SMN call.
GT       Genotype call at this position using a normal (diploid) genotype model.
DST      SMA status call: + indicates detected, - indicates undetected, ? indicates undetermined.
AD       Total read counts supporting the C and T alleles.
RPL      Log10 likelihood ratio between the affected and unaffected models. Positive scores are in favor of unaffected.
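The fields above can be pulled out of the repeat.vcf file with a few lines of Python. This is a minimal sketch that assumes a single-sample, tab-delimited VCF; because this section does not state whether each field is written to the INFO column or to the per-sample FORMAT column, the parser looks in both. The function name is hypothetical, and the example record is synthetic, made up purely to exercise the parser.

```python
STATUS_TEXT = {"+": "SMA detected", "-": "SMA not detected", "?": "undetermined"}


def parse_smn_record(vcf_lines):
    """Return the SMN call fields as a dict, or None if no SMN record exists."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue  # expect at least one sample column

        # Collect key/value pairs from INFO (column 8).
        info = {}
        for item in cols[7].split(";"):
            key, _, value = item.partition("=")
            info[key] = value

        # Pair FORMAT keys (column 9) with the first sample's values (column 10).
        sample = dict(zip(cols[8].split(":"), cols[9].split(":")))

        fields = {**info, **sample}
        if fields.get("VARID") != "SMN":
            continue

        ad = fields.get("AD")
        rpl = fields.get("RPL")
        return {
            "gt": fields.get("GT"),
            "dst": fields.get("DST"),
            "ad": [int(x) for x in ad.split(",")] if ad else None,
            "rpl": float(rpl) if rpl else None,
        }
    return None


# Synthetic record for illustration only; not real DRAGEN output.
EXAMPLE_LINES = [
    "##fileformat=VCFv4.1",
    "chr5\t70247773\t.\tC\tT\t.\tPASS\tVARID=SMN\tGT:DST:AD:RPL\t1/1:+:0,42:-5.5",
]

call = parse_smn_record(EXAMPLE_LINES)
if call is not None:
    print(STATUS_TEXT.get(call["dst"], "unknown"), call["ad"], call["rpl"])
```

To read a real file, pass an open file handle, for example `parse_smn_record(open(path))`, where `path` points to <outputPrefix>.repeat.vcf.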