Ploidy and Purity Calculation
Following segmentation, SENECA calculates ploidy and purity. The calculation rests on the principle that each combination of ploidy, purity, and copy number implies expected values for the B allele ratio and the read count ratio.
For example, in copy number state 1 (one allele of a diploid genome deleted), the B allele ratio is always near 0 because only one allele is present. However, if the tumor sample is only 70% pure because normal cells contribute background DNA, the heterozygous normal allele raises the B allele ratio. At this purity, the final B allele ratio is 0.15.
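The arithmetic behind the 0.15 figure can be sketched as follows. This is a minimal illustration, not SENECA's implementation: it assumes the observed B allele ratio is a blend of the tumor and normal ratios weighted by purity alone, and that the normal genome is heterozygous with a ratio of 0.5. The function name and parameters are hypothetical.

```python
def mixed_b_allele_ratio(purity, tumor_baf, normal_baf=0.5):
    """Blend tumor and normal B allele ratios, weighted by purity alone (assumed model)."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    return purity * tumor_baf + (1.0 - purity) * normal_baf

# Copy number state 1: the tumor keeps a single allele, so its B allele ratio is 0.
observed = mixed_b_allele_ratio(purity=0.7, tumor_baf=0.0)
print(round(observed, 2))  # prints 0.15
```

Under this assumption, a pure tumor sample (purity 1.0) gives a ratio of 0, and every drop in purity pulls the ratio toward the heterozygous value of 0.5.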
SENECA fits a multivariate Gaussian distribution to copy number data and B allele ratio data on a two-dimensional grid of varying ploidy and purity. Each state on the grid encodes a ploidy value and a purity value. In addition, SENECA uses a separate state encoding copy-neutral LOH and copy-gain LOH to identify loss-of-heterozygosity events.
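A grid fit of this kind can be sketched as a brute-force search. The code below is an illustration under stated assumptions, not SENECA's algorithm: the copy number states, the formulas for the expected read count ratio and B allele ratio, the diagonal covariance, and the standard deviations are all hypothetical choices made for the example. The segments are synthetic, generated from a known model so that the search has a known answer.

```python
import math

# Hypothetical copy number states as (tumor copy number, tumor minor-allele copies).
# (2, 0) stands in for copy-neutral LOH and (3, 0) for copy-gain LOH.
COPY_STATES = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (4, 1), (4, 2)]

def expected_signal(state, ploidy, purity):
    """Expected (read count ratio, B allele ratio) under an assumed mixture model."""
    copies, minor = state
    # Read count ratio: tumor/normal DNA mix relative to the sample-wide average.
    ratio = (purity * copies + (1 - purity) * 2) / (purity * ploidy + (1 - purity) * 2)
    # B allele ratio: purity-weighted blend of tumor and heterozygous normal.
    baf = purity * (minor / copies) + (1 - purity) * 0.5
    return ratio, baf

def log_likelihood(segments, ploidy, purity, sd_ratio=0.05, sd_baf=0.03):
    """Sum over segments of the best state's 2-D Gaussian log density (diagonal covariance)."""
    total = 0.0
    for ratio, baf in segments:
        best = -math.inf
        for state in COPY_STATES:
            mu_r, mu_b = expected_signal(state, ploidy, purity)
            z = ((ratio - mu_r) / sd_ratio) ** 2 + ((baf - mu_b) / sd_baf) ** 2
            best = max(best, -0.5 * z - math.log(2 * math.pi * sd_ratio * sd_baf))
        total += best
    return total

def fit_grid(segments, ploidies, purities):
    """Return the (ploidy, purity) grid point with the highest log-likelihood."""
    return max(((pl, pu) for pl in ploidies for pu in purities),
               key=lambda point: log_likelihood(segments, *point))

# Synthetic segments generated from a known model (ploidy 2.0, purity 0.7).
segments = [expected_signal(s, 2.0, 0.7) for s in [(1, 0), (2, 1), (3, 1)]]
ploidies = [1.5 + 0.5 * i for i in range(6)]   # 1.5 .. 4.0
purities = [i / 10 for i in range(3, 11)]      # 0.3 .. 1.0
best_ploidy, best_purity = fit_grid(segments, ploidies, purities)
```

Because the synthetic segments match the model at ploidy 2.0 and purity 0.7 exactly, that grid point has zero residual and wins the search. Real data would be noisy, and the choice of grid spacing and variances would matter.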
The ploidy and purity of the model with the highest log-likelihood are then used to assign a copy number state to each segment. Once both segments and copy numbers are estimated, a quality score for each copy number assignment is computed using a likelihood ratio test. This test compares the likelihood of the current copy number assignment with the likelihood of assigning one more or one fewer copy. The result is reported as a Q-score field in the VCF file using the transformation 2*log(s1/s2), where s1 is the sum of squares for the selected model and s2 is the sum of squares for the next nearest model. A Q-score threshold of 1.5 provides a good trade-off between sensitivity and specificity.
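The Q-score transformation can be transcribed directly. The sketch below follows the formula as written and makes two assumptions that this section does not state: the logarithm is the natural log, and calls scoring at or above the threshold are the ones kept. The function names are hypothetical.

```python
import math

def q_score(s1, s2):
    """Q-score as written in the text: 2*log(s1/s2), natural log assumed."""
    if s1 <= 0 or s2 <= 0:
        raise ValueError("sums of squares must be positive")
    return 2.0 * math.log(s1 / s2)

def passes_filter(q, threshold=1.5):
    """Assumes calls at or above the threshold are kept (direction is an assumption)."""
    return q >= threshold
```

The default of 1.5 mirrors the threshold recommended above; a stricter or looser cutoff shifts the balance between sensitivity and specificity.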