Clustering
During the clustering process, all samples and amplicons are jointly clustered to produce a single summary clustering result for each manifest. The clustering characteristics include the following:
▶ Clustering is agglomerative hierarchical.
▶ Clustering uses a correlation distance measure.
▶ The method uses average linkage.
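The three characteristics above can be illustrated with a minimal sketch. This is not the app's implementation. The function names `correlation_distance` and `average_linkage` are hypothetical, and the sketch assumes that "correlation distance" means 1 minus the Pearson correlation between two profiles.

```python
import math
from itertools import combinations


def correlation_distance(x, y):
    """Assumed definition: 1 - Pearson correlation of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant profile has zero variance, so the correlation is undefined.
    return 1.0 - cov / (norm_x * norm_y)


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    n = len(profiles)
    pair_dist = {
        (i, j): correlation_distance(profiles[i], profiles[j])
        for i, j in combinations(range(n), 2)
    }

    def cluster_dist(a, b):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(pair_dist[min(i, j), max(i, j)] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in range(n)]  # start with every item in its own cluster
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters, then repeat until one cluster remains.
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_dist(*pair))
        merges.append((a, b, cluster_dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return merges


# Hypothetical profiles: items 0 and 1 rise together, items 2 and 3 fall together.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [8, 6, 4, 2.5]]
merges = average_linkage(profiles)
```

With these made-up profiles, the two co-varying pairs merge first and the final merge joins them at a distance close to 2, because the two groups are almost perfectly anti-correlated. In practice a library routine is the usual choice, for example SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="correlation"`.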
The clustering process uses the raw hit counts as input instead of the normalized values from differential expression. There is no gene-based normalization. The following steps are performed on the input data:
1. Set a minimum count threshold of 1.
2. Log-transform the counts.
3. Perform median-normalization across all samples.
After these steps are completed, the clustering analysis begins.
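The preprocessing steps can be sketched as follows. The help text leaves several details open, so this sketch makes assumptions: counts below 1 are raised to 1, the logarithm is base 2, and median-normalization subtracts each sample's median log count so that all samples share a median of 0. The function name `preprocess` and the input layout are hypothetical.

```python
import math
from statistics import median


def preprocess(counts):
    """counts maps a sample name to its raw hit counts, one per amplicon."""
    normalized = {}
    for sample, values in counts.items():
        # Step 1 (assumed meaning): raise counts below the threshold of 1 to 1,
        # which also keeps the logarithm defined for zero counts.
        floored = [max(v, 1) for v in values]
        # Step 2 (assumed base): log2-transform the counts.
        logged = [math.log2(v) for v in floored]
        # Step 3 (assumed meaning): center each sample on its own median.
        sample_median = median(logged)
        normalized[sample] = [v - sample_median for v in logged]
    return normalized


# Hypothetical raw hit counts for two samples and three amplicons.
raw = {"sample_1": [0, 4, 16], "sample_2": [2, 8, 32]}
result = preprocess(raw)
```

In this made-up input, the counts in `sample_2` are roughly twice those in `sample_1`, yet both samples end up with the same normalized profile. That is the effect of a per-sample median adjustment.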
Clustering can be performed on both samples and amplicons. To cluster the amplicons, there must be at least 2 amplicons and at least 3 samples. To cluster the samples, there must be at least 2 samples and at least 3 amplicons.
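The minimum requirements can be expressed as a small check. The function name `can_cluster` is hypothetical and is used here only to restate the rule.

```python
def can_cluster(n_samples, n_amplicons):
    """Report which clustering directions meet the stated minimums."""
    return {
        # Amplicon clustering: at least 2 amplicons and at least 3 samples.
        "amplicons": n_amplicons >= 2 and n_samples >= 3,
        # Sample clustering: at least 2 samples and at least 3 amplicons.
        "samples": n_samples >= 2 and n_amplicons >= 3,
    }


# Two samples and three amplicons: only sample clustering is possible.
check = can_cluster(n_samples=2, n_amplicons=3)
```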
After clustering is complete, the data are further normalized for downstream display. Within each amplicon, the data are MAD-normalized using the following formula:
<normalized value> = (<raw value> - <median>)/<median absolute deviation>
This step ensures that the expression values for each gene are approximately on the same scale.
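A minimal sketch of the MAD formula, applied to the values of one amplicon across samples. The function name `mad_normalize` is hypothetical. The sketch uses the unscaled median absolute deviation, as the formula is written, and it assumes the median and the deviation are both computed within the amplicon.

```python
from statistics import median


def mad_normalize(values):
    """Apply (value - median) / MAD to one amplicon's values."""
    med = median(values)
    # Median absolute deviation: the median of the distances from the median.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # How the app handles this case is not documented; the sketch refuses.
        raise ValueError("median absolute deviation is zero")
    return [(v - med) / mad for v in values]


# Hypothetical values for one amplicon across five samples.
scaled = mad_normalize([1, 2, 3, 4, 100])
```

Because both the median and the MAD are robust statistics, the outlying value of 100 does not compress the other four values toward each other, which a mean and standard deviation scaling would do.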
The workflow produces the following output files:
TruSeq Targeted RNA v1.0 App Online Help