Slightly Different MCSS Clustering

JM 2015-02-04

BIOVIA Pipeline Pilot

I need a slightly different MCSS (maximum common substructure) clustering / fuzzy matching algorithm.

I can read in a set of, say, 20 small molecules that I would like to use as query (core) molecules. These structures may be considered fuzzy substructures. They have already been chosen by chemists, so I do not want the software to re-derive any substructures.

I would then like to read in a database of molecules and cluster them according to their substructure similarity to the query molecules.

This is not a strict substructure matching problem, because there may be one or more atoms in my database molecules that don't match my pre-defined query molecules. However, the algorithm should attempt to make the matches as close as possible within defined criteria. If the similarity is not sufficient, the unmatched molecules are all placed together in a single separate group.
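
To make the requirement concrete, the logic described above can be sketched roughly as follows. This is plain Python, not a Pipeline Pilot protocol, and it rests on several assumptions: the names `assign_to_cores`, `coverage`, and `UNMATCHED` are hypothetical, molecules are represented as simple sets of feature labels rather than real structures, and the `coverage` score is only a stand-in for whatever MCSS-based similarity would be used in practice. The 0.7 cutoff is an arbitrary illustrative value for the "defined criteria".

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical stand-in for a molecule: a set of feature labels.
# A real workflow would use structure objects and an MCSS-based score instead.
Features = FrozenSet[str]

UNMATCHED = "unmatched"  # label of the single catch-all group


def coverage(core: Features, mol: Features) -> float:
    """Fraction of the core's features that also occur in the molecule."""
    if not core:
        return 0.0
    return len(core & mol) / len(core)


def assign_to_cores(
    cores: Dict[str, Features],
    database: Dict[str, Features],
    score: Callable[[Features, Features], float] = coverage,
    threshold: float = 0.7,  # arbitrary illustrative cutoff
) -> Dict[str, List[str]]:
    """Group each database molecule under its best-scoring pre-defined core."""
    # The cores are fixed up front; nothing here re-derives substructures.
    clusters: Dict[str, List[str]] = {name: [] for name in cores}
    clusters[UNMATCHED] = []
    for mol_id, mol in database.items():
        best_name, best_score = UNMATCHED, 0.0
        for name, core in cores.items():
            s = score(core, mol)
            if s > best_score:  # on a tie, the earlier core is kept
                best_name, best_score = name, s
        # Molecules whose best match is too weak all go to the same group.
        if best_score < threshold:
            best_name = UNMATCHED
        clusters[best_name].append(mol_id)
    return clusters


# Toy data: two cores and four database molecules.
cores = {
    "core_A": frozenset({"a1", "a2", "a3", "a4"}),
    "core_B": frozenset({"b1", "b2"}),
}
database = {
    "m1": frozenset({"a1", "a2", "a3", "a4", "x1"}),  # core_A plus one extra atom
    "m2": frozenset({"a1", "a2", "a3", "y1"}),        # partial match to core_A
    "m3": frozenset({"b1", "z1"}),                    # weak match to core_B
    "m4": frozenset({"b1", "b2"}),                    # exact match to core_B
}
print(assign_to_cores(cores, database))
```

The scoring function is passed in as a parameter so that the toy `coverage` measure could be swapped for a real MCSS-based similarity without touching the assignment loop.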

I have looked in several places on the PP User Forum for protocols that perform this type of analysis, but I could not find anything that specifically uses a pre-defined set of core molecules.

Does anyone have ideas on how to do this, or a protocol that they are willing to share?

Thank you.

Regards,

Jim Metz