Spinal Muscular Atrophy Calling
Disruption of all copies of the SMN1 gene in an individual causes spinal muscular atrophy (SMA). SMN1 has a high-identity paralog, SMN2, which differs from it at only about 10 SNVs and small indels. One of these differences (hg19 chr5:70247773 C->T) affects splicing and largely disrupts the production of functional SMN protein from SMN2. Standard WGS analysis does not produce complete variant calling results for SMN because of this high-similarity duplication combined with common copy-number variation. However, approximately 95% of SMA cases can be detected by determining that the functional C (SMN1) allele is absent from every copy of SMN.
DRAGEN SMA calling uses sequence-graph realignment to align reads to a single reference representing both SMN1 and SMN2. In addition to the standard diploid genotype call, DRAGEN uses a direct statistical test to check for the presence of any C allele. If no C allele is detected, the sample is called affected; otherwise, it is called unaffected.
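The idea behind such a test can be illustrated with a toy likelihood ratio. The sketch below is not DRAGEN's actual model, which this section does not describe. It assumes a simple binomial read model, a hypothetical total SMN copy count (`total_copies`), and a hypothetical per-read error rate (`error_rate`). A positive result favors the unaffected model.

```python
import math


def log10_binom_likelihood(k: int, n: int, p: float) -> float:
    """Return log10 of the binomial probability of k successes in n trials."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_prob = log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)
    return log_prob / math.log(10)


def sma_log10_ratio(c_reads: int, t_reads: int,
                    total_copies: int = 4, error_rate: float = 0.01) -> float:
    """Toy log10 likelihood ratio: unaffected model versus affected model.

    total_copies and error_rate are illustrative assumptions only.
    """
    n = c_reads + t_reads
    # Affected model: no copy carries C, so C reads arise only from errors.
    affected = log10_binom_likelihood(c_reads, n, error_rate)
    # Unaffected model (minimal case): exactly one of total_copies carries C.
    frac = 1.0 / total_copies
    p_c = frac * (1.0 - error_rate) + (1.0 - frac) * error_rate
    unaffected = log10_binom_likelihood(c_reads, n, p_c)
    return unaffected - affected
```

With these assumptions, a sample with no C-supporting reads yields a negative ratio (favoring affected), while a sample with a substantial share of C reads yields a positive one.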
SMA calling is only supported for human whole-genome sequencing samples in PCR-free libraries.
Usage
SMA calling is implemented together with repeat expansion detection. For information on graph alignment and the available options, see Repeat Expansion Detection with Expansion Hunter.
SMA calling is enabled, along with repeat expansion detection, by setting the --repeat-genotype-enable option to true. To activate SMA calling, the variant specification catalog file must include a description of the targeted SMN1/2 variant. Example files are available in the /opt/edico/repeat-specs/experimental folder.
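As a rough illustration, the relevant part of a DRAGEN command line could be assembled as follows. Only `--repeat-genotype-enable` is taken from this section. The `--repeat-genotype-specs` and `--output-file-prefix` options and the helper name `build_sma_args` are assumptions made for this sketch, and the other options a run needs (reference, inputs, output directory) are omitted. Check the option names against your DRAGEN version.

```python
from typing import List


def build_sma_args(specs_path: str, output_prefix: str) -> List[str]:
    """Build a partial DRAGEN argument list that enables SMA calling.

    specs_path should point to a variant specification catalog that
    includes the SMN1/2 variant description.
    """
    return [
        "dragen",
        "--repeat-genotype-enable", "true",      # from this section
        "--repeat-genotype-specs", specs_path,   # assumed option name
        "--output-file-prefix", output_prefix,   # assumed option name
    ]
```

The returned list can be extended with the remaining run options and passed to `subprocess.run`.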
SMN output is included along with any targeted repeats in <outputPrefix>.repeat.vcf. SMN output is represented as a single SNV call at the key (splice-affecting) position in SMN1, with SMA status in custom fields:
Field | Description |
---|---|
VARID | SMN marks the SMN call. |
GT | Genotype call at this position using a normal (diploid) genotype model. |
DST | SMA status call: + indicates detected, - indicates undetected, ? indicates undetermined. |
AD | Total read counts supporting the C and T alleles. |
RPL | Log10 likelihood ratio between the affected and unaffected models. Positive scores are in favor of unaffected. |
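The fields above can be pulled out of the repeat VCF with a few lines of Python. This is a minimal sketch that assumes a standard single-sample, tab-separated VCF record. Because this section does not say whether each custom field sits in the INFO column or in the per-sample FORMAT column, the hypothetical helper `parse_smn_record` looks in both.

```python
from typing import Optional

# Meaning of the DST symbols, as listed in the field table.
DST_MEANING = {"+": "detected", "-": "undetected", "?": "undetermined"}


def parse_smn_record(vcf_line: str) -> Optional[dict]:
    """Return the SMN call from one VCF data line, or None if not an SMN record."""
    cols = vcf_line.rstrip("\n").split("\t")
    if vcf_line.startswith("#") or len(cols) < 10:
        return None
    # Collect key=value pairs from INFO and key/value pairs from FORMAT + sample.
    info = dict(item.split("=", 1) for item in cols[7].split(";") if "=" in item)
    sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
    fields = {**info, **sample}
    if fields.get("VARID") != "SMN":
        return None
    depths = fields.get("AD", "")
    ratio = fields.get("RPL", "")
    return {
        "genotype": fields.get("GT"),
        "sma_status": DST_MEANING.get(fields.get("DST", ""), "unknown"),
        "allele_depths": [int(x) for x in depths.split(",")] if depths else [],
        "log10_ratio": float(ratio) if ratio not in ("", ".") else None,
    }
```

For example, a made-up record with `VARID=SMN` in INFO and `GT:DST:AD:RPL` values of `0/1:-:25,30:12.5` would be reported as SMA undetected, with 25 C reads and 30 T reads.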