Clustering

During the clustering process, all samples and amplicons are jointly clustered to produce a single summary clustering result for each manifest. The clustering characteristics include the following:

Clustering is agglomerative hierarchical.
Clustering uses a correlation distance measure.
The method uses average linkage.
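As an illustration of these three characteristics, the following is a minimal sketch in plain Python, not the workflow's actual implementation. The function names (correlation_distance, average_linkage) and the example profiles are hypothetical, and the sketch assumes that "correlation distance" means 1 minus the Pearson correlation.

```python
import math
from itertools import combinations

def correlation_distance(u, v):
    """Return 1 - Pearson correlation of two equal-length profiles.

    Assumes neither profile is constant (a constant profile has zero
    variance, so the correlation is undefined).
    """
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return 1.0 - num / den

def average_linkage(profiles):
    """Agglomerative hierarchical clustering with average linkage.

    Returns the merges in order as (members_a, members_b, distance),
    where members are tuples of row indices into `profiles`.
    """
    n = len(profiles)
    # Pairwise distances between the individual profiles.
    dist = {(i, j): correlation_distance(profiles[i], profiles[j])
            for i, j in combinations(range(n), 2)}

    def linkage(a, b):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(dist[min(i, j), max(i, j)] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in range(n)]  # start with one cluster per profile
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters, then repeat.
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges

# Hypothetical profiles: rows 0 and 1 rise together, rows 2 and 3 fall together.
profiles = [[1, 2, 3, 4], [2, 4, 6, 9], [4, 3, 2, 1], [9, 6, 4, 2]]
for members_a, members_b, d in average_linkage(profiles):
    print(members_a, members_b, round(d, 4))
```

The sequence of merges is what a dendrogram draws: the two rising profiles join first, the two falling profiles join next, and the two groups are merged last at a large distance.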

The clustering process uses the raw hit counts as input instead of the normalized values from differential expression. There is no gene-based normalization. The following steps are performed on the input data:

1 Set a minimum count threshold of 1.
2 Log-transform the counts.
3 Perform median-normalization across all samples.

After these steps are completed, the clustering analysis begins.
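The three preprocessing steps can be sketched as follows. This is an illustrative reading, not the workflow's code: the function name preprocess is hypothetical, the logarithm base is assumed to be 2 (the base is not stated here), and median-normalization is assumed to mean subtracting each sample's median log count.

```python
import math
from statistics import median

def preprocess(counts):
    """Prepare raw hit counts for clustering.

    counts: list of rows, one per amplicon; each row holds one raw hit
    count per sample.
    """
    # Step 1: apply a minimum count threshold of 1 so the log is defined.
    floored = [[max(c, 1) for c in row] for row in counts]
    # Step 2: log-transform (base 2 is an assumption).
    logged = [[math.log2(c) for c in row] for row in floored]
    # Step 3: median-normalize, assumed here to mean subtracting each
    # sample's (column's) median log count.
    n_samples = len(logged[0])
    col_medians = [median(row[j] for row in logged) for j in range(n_samples)]
    return [[row[j] - col_medians[j] for j in range(n_samples)]
            for row in logged]

# Hypothetical counts: 3 amplicons (rows) by 2 samples (columns).
raw = [[0, 8], [4, 32], [16, 128]]
print(preprocess(raw))
```

In this example the second sample has 8 times the depth of the first, and after step 3 both columns are centered on zero, so the difference in depth no longer dominates the comparison.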

Clustering can be performed on both samples and amplicons. To cluster the amplicons, there must be at least 2 amplicons and at least 3 samples. To cluster the samples, there must be at least 2 samples and at least 3 amplicons.
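The minimum-size requirements can be expressed as a small check. The function name clusterable_axes is hypothetical, and the sketch reads the sample-clustering requirement as at least 3 amplicons.

```python
def clusterable_axes(n_samples, n_amplicons):
    """Return which axes meet the minimum size requirements for clustering."""
    axes = []
    # Amplicon clustering: at least 2 amplicons and at least 3 samples.
    if n_amplicons >= 2 and n_samples >= 3:
        axes.append("amplicons")
    # Sample clustering: at least 2 samples and at least 3 amplicons.
    if n_samples >= 2 and n_amplicons >= 3:
        axes.append("samples")
    return axes

print(clusterable_axes(n_samples=2, n_amplicons=3))
```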

After clustering is completed, the data are further normalized for downstream display. Within each amplicon, the data are MAD-normalized using the following formula: <normalized value> = (<raw value> - <median>)/<median absolute deviation>. This step ensures that the expression values for each gene are approximately on the same scale.
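The MAD-normalization formula can be sketched for a single amplicon as follows. The function name mad_normalize is hypothetical, and the handling of a zero median absolute deviation is an assumption, because that case is not described here.

```python
from statistics import median

def mad_normalize(values):
    """MAD-normalize one amplicon's values across samples.

    Implements (value - median) / (median absolute deviation), with no
    additional scaling constant.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Assumption: treat a zero MAD as an error rather than guessing
        # how it should be handled.
        raise ValueError("median absolute deviation is zero")
    return [(v - med) / mad for v in values]

# Hypothetical values for one amplicon across five samples.
print(mad_normalize([1, 2, 3, 4, 10]))
```

For these values the median is 3 and the median absolute deviation is 1, so each value is expressed as its distance from the median in MAD units. Because both statistics are medians, the single high value (10) does not distort the scaling of the others.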

The workflow produces the following output files:

Clustering heat map
Normalized hits sample
Amplicon dendrogram
Sample dendrogram