Ploidy and Purity Calculation

Following segmentation, SENECA calculates ploidy and purity. The calculation relies on the principle that, for any given ploidy, purity, and copy number, the expected B-allele ratio and read count ratio are known a priori. For example, for copy number state 1 (one allele of a diploid genome deleted), the B-allele ratio in a pure tumor sample is always near 0 because only one allele is present. However, if a tumor sample has only 70% purity (because normal cells contribute a background genome), the heterozygous normal alleles raise the B-allele ratio: the 30% normal fraction contributes a ratio of 0.5, giving a final B-allele ratio of 0.30 x 0.5 = 0.15.
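
The arithmetic in this example can be sketched in a few lines of Python. This is an illustration only, not SENECA code: the function name and parameters are hypothetical, and the sketch assumes simple linear mixing of the tumor and normal B-allele ratios weighted by purity, which is the assumption that reproduces the 0.15 figure. A mixing model that also weights by copy number would give a different value.

```python
def expected_b_allele_ratio(purity, tumor_b_ratio, normal_b_ratio=0.5):
    """Linear mix of tumor and normal B-allele ratios, weighted by purity.

    Assumption (not stated in the source): the observed ratio is
    purity * tumor ratio + (1 - purity) * normal ratio.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    return purity * tumor_b_ratio + (1.0 - purity) * normal_b_ratio


# Copy number state 1: one allele deleted, so the tumor B-allele ratio is 0.
ratio = expected_b_allele_ratio(purity=0.70, tumor_b_ratio=0.0)
print(round(ratio, 2))  # 0.15
```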

SENECA fits a multivariate Gaussian distribution to the copy number data and B-allele ratio data over a two-dimensional grid of ploidy and purity values. Each state on the grid encodes one combination of ploidy and purity. In addition, SENECA uses a separate state encoding copy-neutral LOH and copy-gain LOH to identify loss-of-heterozygosity (LOH) events.
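
The grid search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not SENECA's implementation: the state list, the grid ranges, the fixed standard deviations, the diagonal covariance, the hard assignment of each segment to its best state, and the formula for the expected read count ratio are all hypothetical choices made for the example. The B-allele ratio again uses linear mixing by purity.

```python
import math

# Hypothetical copy number states as (total copies, B-allele copies).
# (2, 0) stands in for copy-neutral LOH and (3, 0) for copy-gain LOH.
STATES = [(1, 0), (2, 1), (2, 0), (3, 1), (3, 0), (4, 2), (4, 1)]


def expected_values(ploidy, purity, total, minor):
    """Expected (read count ratio, B-allele ratio) for one state (assumed model)."""
    normal = 1.0 - purity
    # Read depth of the segment relative to the sample-wide average depth.
    ratio = (purity * total + 2.0 * normal) / (purity * ploidy + 2.0 * normal)
    # Linear mixing of tumor and normal B-allele ratios.
    baf = purity * (minor / total) + normal * 0.5
    return ratio, baf


def log_gaussian(x, mean, sigma):
    """Log density of a one-dimensional Gaussian."""
    z = (x - mean) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def model_log_likelihood(segments, ploidy, purity, sigma_ratio=0.05, sigma_baf=0.05):
    """Sum over segments of the log-likelihood of each segment's best state."""
    total_ll = 0.0
    for ratio, baf in segments:
        best = -math.inf
        for total, minor in STATES:
            exp_ratio, exp_baf = expected_values(ploidy, purity, total, minor)
            ll = (log_gaussian(ratio, exp_ratio, sigma_ratio)
                  + log_gaussian(baf, exp_baf, sigma_baf))
            best = max(best, ll)
        total_ll += best
    return total_ll


def fit_ploidy_purity(segments):
    """Return the (ploidy, purity) grid point with the highest log-likelihood."""
    ploidies = [1.5 + 0.5 * j for j in range(6)]  # 1.5, 2.0, ..., 4.0
    purities = [i / 10 for i in range(1, 11)]     # 0.1, 0.2, ..., 1.0
    grid = ((pl, pu) for pl in ploidies for pu in purities)
    return max(grid, key=lambda g: model_log_likelihood(segments, g[0], g[1]))


# Synthetic (read count ratio, B-allele ratio) segments generated from
# ploidy 2.0 and purity 0.7 for states (2, 1), (1, 0), and (3, 1).
segments = [(1.0, 0.5), (0.65, 0.15), (1.35, 0.3833)]
best = fit_ploidy_purity(segments)
print(best)
```

With these synthetic segments, the search should recover the grid point the data were generated from. Real data would call for estimated variances and a finer grid.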

The ploidy and purity of the model with the highest log-likelihood are then used to assign a copy number state to each segment. Once both segments and copy numbers have been estimated, a quality score for each copy number assignment is computed using a likelihood ratio test, which compares the likelihood of the current copy number assignment with the likelihood of assigning one more or one fewer copy. The result of the likelihood ratio test is reported in a Q-score field in the VCF file using the transformation 2*log(s1/s2), where s1 is the sum of squares for the selected model and s2 is the sum of squares for the next nearest model. A Q-score threshold of 1.5 provides a good trade-off between sensitivity and specificity.
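
The Q-score transformation can be written out literally as a short Python function. This is a sketch only: the function names are hypothetical, the natural logarithm is an assumption (the text does not give the base), and the formula is applied exactly as written, without assuming how s1 and s2 are scaled or ordered. Treating a score at or above the threshold as passing is also an assumption.

```python
import math


def q_score(s1, s2):
    """Q-score as written in the text: 2 * log(s1 / s2).

    s1: sum of squares for the selected model.
    s2: sum of squares for the next nearest model.
    Assumption: natural logarithm.
    """
    if s1 <= 0 or s2 <= 0:
        raise ValueError("sums of squares must be positive")
    return 2.0 * math.log(s1 / s2)


def passes_threshold(q, threshold=1.5):
    """Assumed filter: keep calls whose Q-score is at or above the threshold."""
    return q >= threshold


# Hypothetical input values, chosen only to exercise the formula.
q = q_score(math.e, 1.0)
print(passes_threshold(q))
```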


© 2014 Illumina, Inc. All rights reserved.	15050951 Rev. B