Cluster Ligands protocol in Discovery Studio

Q:

What is the principle behind the "Cluster ligands" protocol in Discovery Studio?

Is the clustering of the ligands based on any particular molecular properties?

A:

The Cluster Ligands protocol partitions a set of molecules into subsets (clusters) so that the molecules within each cluster have similar properties.

The clustering method comes from Pipeline Pilot. It is based on the root mean square (RMS) difference for numeric descriptor properties and the Tanimoto distance for fingerprint properties, or a combination of the two if both numeric descriptors and fingerprints are used.
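To make the two distance measures concrete, here is a minimal Python sketch of each. It is only an illustration, not Pipeline Pilot's implementation: the function names are hypothetical, fingerprints are assumed to be represented as sets of "on" bit indices, and any scaling of descriptors is left out. How the two distances are combined when both property types are used is not described above, so the sketch does not attempt it.

```python
import math


def rms_difference(a, b):
    """RMS difference between two equal-length numeric descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    if not a:
        raise ValueError("descriptor vectors must not be empty")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def tanimoto_distance(fp_a, fp_b):
    """Tanimoto distance, 1 - |A & B| / |A | B|, for fingerprints given as
    sets of on-bit indices (an assumed representation)."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Two empty fingerprints are treated as identical (an assumption).
        return 0.0
    return 1.0 - len(fp_a & fp_b) / union
```

For example, the fingerprints {1, 2, 3} and {2, 3, 4} share two of four distinct bits, so their Tanimoto distance is 1 - 2/4 = 0.5.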

The clustering is done by a relocation method based on maximal dissimilarity partitioning, i.e. input molecules are classified into clusters by maximizing the distance between cluster centers while minimizing the distance within a cluster.

In the maximal dissimilarity partitioning method, the algorithm begins by randomly choosing a data record as the first cluster center. The record maximally distant from the first point is selected as the next cluster center. The record maximally distant from both current points is selected after that. The process repeats itself until there is a sufficient number of cluster centers. The objects that have not been selected are then assigned to the nearest cluster center to determine the cluster membership.
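The steps described above can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not the actual Discovery Studio or Pipeline Pilot code: the function name and parameters are hypothetical, "maximally distant from the current centers" is interpreted as maximizing the distance to the nearest already-chosen center (a max-min criterion), the number of clusters is passed in directly, and no later relocation or refinement of centers is performed.

```python
import random


def max_dissimilarity_clusters(records, n_clusters, distance, seed=None):
    """Pick cluster centers by maximal dissimilarity, then assign members.

    Returns (centers, assignment): the record indices chosen as centers,
    and for each record the position in `centers` of its nearest center.
    """
    n = len(records)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and len(records)")

    # Step 1: choose the first center at random.
    rng = random.Random(seed)
    centers = [rng.randrange(n)]

    # min_dist[i] = distance from record i to its nearest chosen center.
    min_dist = [distance(records[i], records[centers[0]]) for i in range(n)]

    # Step 2: repeatedly add the record farthest from all current centers
    # (assumption: "farthest" means largest distance to the nearest center).
    while len(centers) < n_clusters:
        chosen = set(centers)
        candidates = [i for i in range(n) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i])
        centers.append(nxt)
        for i in range(n):
            min_dist[i] = min(min_dist[i], distance(records[i], records[nxt]))

    # Step 3: assign every record to its nearest center.
    assignment = [
        min(range(len(centers)),
            key=lambda c: distance(records[i], records[centers[c]]))
        for i in range(n)
    ]
    return centers, assignment
```

The `distance` argument can be any function of two records, for instance an RMS difference over numeric descriptors or a Tanimoto distance over fingerprints. Because the first center is chosen at random, different seeds can give different centers for the same input.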

For more details, please refer to the DS Help page section, and the reference therein, at: Small Molecules tools > Theory – Small Molecules > Cluster Molecules