QUAL, QD, and GQ Formulation
In single-sample VCF and gVCF files, QUAL follows the definition in the VCF specification (https://samtools.github.io/hts-specs/VCFv4.3.pdf).
• QUAL is the Phred-scaled probability that the site has no variant. It is computed as:
QUAL = -10*log10(posterior genotype probability of a homozygous-reference genotype (GT=0/0))
That is, QUAL = GP(GT=0/0), where GP is the posterior genotype probability on the Phred scale.
QUAL = 20 means there is a 1% probability that the site has no variant, that is, a 99% probability that there is a variant at the site. The GP values in the VCF file are also Phred-scaled.
• GQ is the Phred-scaled probability that the call is incorrect.
GQ = -10*log10(p), where p is the probability that the call is incorrect.
GQ = -10*log10(sum(10^(-GP(i)/10))), where the sum is taken over every genotype i other than the called genotype.
So a GQ of 3 means there is approximately a 50% chance that the call is incorrect, and a GQ of 20 means there is a 1% chance that the call is incorrect.
• QD is the QUAL normalized by the read depth, DP.
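The three definitions above can be sketched in a few lines of Python. This is only an illustration of the formulas, not the caller's implementation: the function names, the dictionary layout for GP, and the example probabilities are all assumptions, and any capping or rounding applied to the reported values is ignored.

```python
import math

def phred_to_prob(q):
    """Convert a Phred-scaled value to a probability."""
    return 10 ** (-q / 10)

def prob_to_phred(p):
    """Convert a probability to the Phred scale."""
    return -10 * math.log10(p)

def site_metrics(gp, called_gt, dp):
    """Return (QUAL, GQ, QD) from Phred-scaled posteriors.

    gp        -- hypothetical dict mapping genotype string to Phred-scaled GP
    called_gt -- the genotype that was called
    dp        -- read depth at the site
    """
    # QUAL is the Phred-scaled posterior of the homozygous-reference genotype.
    qual = gp["0/0"]
    # GQ sums the posterior probabilities of every genotype that was not called.
    p_incorrect = sum(phred_to_prob(v) for gt, v in gp.items() if gt != called_gt)
    gq = prob_to_phred(p_incorrect)
    # QD normalizes QUAL by the read depth.
    qd = qual / dp
    return qual, gq, qd

# Made-up posteriors for a heterozygous call (they sum to 1).
posteriors = {"0/0": 0.01, "0/1": 0.989, "1/1": 0.001}
gp = {gt: prob_to_phred(p) for gt, p in posteriors.items()}

qual, gq, qd = site_metrics(gp, called_gt="0/1", dp=10)
print(f"QUAL={qual:.2f} GQ={gq:.2f} QD={qd:.2f}")
```

With these made-up posteriors, the probability of an incorrect call is 0.01 + 0.001 = 0.011, so GQ lands just under 20 while QUAL is 20.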
Metric | QUAL | GQ | QD |
Description | Probability that the site has no variant | Probability that the call is incorrect | QUAL normalized by depth |
Formulation | QUAL = GP(GT=0/0) | GQ=-10*log10(p) | QUAL/DP |
Scale | Unsigned Phred | Unsigned Phred | Unsigned Phred |
Numerical example | QUAL=20: 1% chance that there is no variant at the site; QUAL=50: 1 in 1e5 chance that there is no variant at the site | GQ=3: 50% chance that the call is incorrect; GQ=20: 1% chance that the call is incorrect | |
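The numerical examples in the table can be checked by converting each Phred value back to a probability with p = 10^(-Q/10). The snippet below is a small sketch for that purpose; the helper name `phred_to_prob` is an assumption and is not part of any tool.

```python
def phred_to_prob(q):
    """Convert a Phred-scaled value Q to the probability 10^(-Q/10)."""
    return 10 ** (-q / 10)

# Phred values taken from the numerical examples in the table.
examples = [("QUAL", 20), ("QUAL", 50), ("GQ", 3), ("GQ", 20)]

for metric, q in examples:
    print(f"{metric}={q}: probability = {phred_to_prob(q):.3g}")
```

A Phred value of 3 corresponds to a probability of about 0.501, which is why the 50% figure for GQ=3 is an approximation.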