Does anyone know of a way to improve the calculated similarity between scaffolds containing R-groups and molecules which do not contain R-groups?
I've experimented with the Tversky coefficient and various fingerprints, but still can't achieve similarities that reflect the scaffold-to-molecule relationship as well as I had hoped.
For example, using the scaffold as the query with beta=0 in the Tversky coefficient applied to ECFP_8 fingerprints:
Example 1a)
I would have hoped that, since the scaffold is a direct substructure of the molecule, the similarity would be much closer to 1. I understand that the scaffold R-groups are the issue: the fingerprint features that include an R atom have no counterpart in the molecule.
Changing [*] to [R] or [H] does not improve the situation:
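To make the arithmetic explicit, here is a minimal sketch of the Tversky coefficient on plain Python sets of "on" bits. The bit sets are invented for illustration (they are not real ECFP_8 output), and alpha=1 is my assumption; only beta=0 matches what I actually ran. With beta=0 the extra bits in the molecule cost nothing, so anything below 1 comes entirely from bits that are set only in the scaffold, which is where the R-atom environments end up.

```python
def tversky(query_bits, target_bits, alpha=1.0, beta=0.0):
    """Tversky similarity between two sets of 'on' fingerprint bits.

    S = c / (c + alpha * |query only| + beta * |target only|)
    """
    common = len(query_bits & target_bits)
    query_only = len(query_bits - target_bits)
    target_only = len(target_bits - query_bits)
    denom = common + alpha * query_only + beta * target_only
    return common / denom if denom else 0.0


# Toy bit sets, made up for illustration only.
# Bits 90 and 91 stand in for environments that contain a [*] atom.
scaffold_bits = {1, 2, 3, 4, 90, 91}
molecule_bits = {1, 2, 3, 4, 5, 6, 7, 8}

# Scaffold is the query; beta=0 ignores bits found only in the molecule.
score = tversky(scaffold_bits, molecule_bits, alpha=1.0, beta=0.0)
print(f"Tversky(scaffold, molecule) = {score:.3f}")
```

In this toy case the two scaffold-only bits are the whole penalty: without them the score would be exactly 1.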
Example 1b)
Example 2)
Example 3)
I have explored some other approaches, such as substructure searching and generating R-capped 'scaffolds' by fragmenting the molecules. I've also considered various schemes for capping the scaffolds to turn them into molecules. Either way, the aim is to compare like with like. However, as I'm sure you can appreciate, there are significant limitations to each of these approaches.
What I really need is a fingerprint in which the scaffold R atoms can somehow match any atom.
Does anyone have thoughts on whether this is possible and, if so, how?
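One rough way to picture what I mean, as a sketch only: if the fingerprint generator can report which atoms contribute to each feature, the scaffold features whose environment touches an R atom could be dropped before comparing. The code below uses plain Python with invented data; `feature_atoms`, `wildcard_atoms`, and both function names are hypothetical, not part of any toolkit, and alpha=1 is assumed.

```python
def mask_wildcard_features(feature_atoms, wildcard_atoms):
    """Return the feature ids whose atom environment avoids every wildcard atom."""
    return {
        feature
        for feature, atoms in feature_atoms.items()
        if not (set(atoms) & wildcard_atoms)
    }


def tversky_beta0(query_bits, target_bits, alpha=1.0):
    """Tversky with beta=0: only query-only bits are penalised."""
    common = len(query_bits & target_bits)
    query_only = len(query_bits - target_bits)
    denom = common + alpha * query_only
    return common / denom if denom else 0.0


# Toy data, made up for illustration: feature id -> atom indices in its environment.
scaffold_feature_atoms = {
    1: {0, 1},
    2: {1, 2},
    3: {2, 3},
    4: {3},
    90: {3, 4},  # environment includes the R atom
    91: {4},     # the R atom itself
}
wildcard_atoms = {4}  # index of the [*] atom in the scaffold
molecule_bits = {1, 2, 3, 4, 5, 6, 7, 8}

raw_score = tversky_beta0(set(scaffold_feature_atoms), molecule_bits)
masked = mask_wildcard_features(scaffold_feature_atoms, wildcard_atoms)
masked_score = tversky_beta0(masked, molecule_bits)
print(f"raw = {raw_score:.3f}, masked = {masked_score:.3f}")
```

A likely weakness is that with large-diameter features such as ECFP_8, many environments in a small scaffold would reach an R atom, so masking could leave very few bits to compare.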