Molecule selections too similar

KB 2015-02-04

Hi,

We often need to select a subset of molecules based on several objectives, including chemical desirability and diversity.

The Pareto subset optimiser is a nice tool for this. For example, maximising the count of unique features, such as fingerprint bits or Murcko scaffolds, can help encourage diversity.
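To illustrate what I mean by a unique feature count, here is a minimal sketch. It assumes each molecule is represented as a set of feature identifiers (fingerprint on-bit indices here, but scaffold strings would work the same way). The function name is my own and is not part of the optimiser.

```python
from typing import Hashable, Iterable, Set


def unique_feature_count(subset: Iterable[Set[Hashable]]) -> int:
    """Count the distinct features covered by all molecules in the subset."""
    covered: Set[Hashable] = set()
    for features in subset:
        covered |= features  # union in this molecule's features
    return len(covered)


# Toy data (made up): each set holds the on-bit indices of one molecule.
toy_subset = [{1, 2, 3}, {3, 4}, {4, 5, 6}]
print(unique_feature_count(toy_subset))
```

A higher count means the subset covers more distinct features overall, which is why maximising it tends to favour diverse selections.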

However, I have found that similar molecules can still be selected, and I think better results could be obtained by minimising the maximum pairwise similarity between compounds within the subset.
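To make the idea concrete, here is a minimal sketch of the objective I have in mind. It assumes each molecule is represented as a set of fingerprint on-bit indices and uses Tanimoto similarity as the measure. Both choices, and the function names, are my own assumptions and are not part of the optimiser.

```python
from itertools import combinations
from typing import Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_pairwise_similarity(subset: Sequence[Set[int]]) -> float:
    """Highest similarity between any two molecules in the subset.

    Returns 0.0 when the subset has fewer than two molecules.
    """
    return max(
        (tanimoto(a, b) for a, b in combinations(subset, 2)),
        default=0.0,
    )


# Toy fingerprints (made up): the first two molecules share 3 of 5 bits.
near_duplicates = [{1, 2, 3, 4}, {1, 2, 3, 5}, {6, 7, 8, 9}]
well_spread = [{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}]

print(max_pairwise_similarity(near_duplicates))  # 0.6
print(max_pairwise_similarity(well_spread))      # 0.0
```

A lower score is better here, so this would be an objective to minimise. Note that it needs every pair in the subset, so the cost grows quadratically with subset size.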

Has anyone tried this, or does anyone know whether it would be easy to implement? I notice there is a "custom score" option available, although I can't quite see whether or how it could be used for this purpose.

Thanks in advance

Kris