Generate "Fuzzy" Maximum Common Substructures for Clusters etc.

Name: Generate Fuzzy MCS

Author: Christian Herhaus

Version: 0.3

Created: 02/2013

Modified: 14/11/2013

Purpose: This example workflow, with the included component "Generate Fuzzy MCS", introduces fuzziness into Maximum Common Substructures (MCS), e.g. for similarity-based clusters, and proposes a solution for the automated generation of fuzzy MCSs. This gives much more flexibility in defining descriptive scaffold structures for clusters whose intrinsic structural diversity is too high to be characterized by conventional MCS approaches. Flexibility is introduced through standard query features, so the resulting "fuzzy" substructures remain fully database-searchable.

Supported query features are:

  • Element lists for atoms, e.g. [C,N,O]
  • Single/double or Any bonds
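
As a rough illustration of what these two query features mean when searching, the following Python sketch models an element list and a query bond as plain sets. It is toolkit-independent and not taken from the component itself: the function names, the bond vocabulary, and the set-based representation are all assumptions made for this example.

```python
# Hypothetical, toolkit-independent model of the two supported query features.
# All names and the bond vocabulary below are assumptions for illustration.
from typing import FrozenSet, Iterable

# Assumed bond vocabulary; an "Any" bond accepts every type in it.
ANY_BOND: FrozenSet[str] = frozenset({"single", "double", "triple", "aromatic"})
SINGLE_OR_DOUBLE: FrozenSet[str] = frozenset({"single", "double"})


def atom_matches(element_list: FrozenSet[str], element: str) -> bool:
    """An element-list query atom matches any atom whose element is listed."""
    return element in element_list


def bond_matches(query_bond: FrozenSet[str], bond_type: str) -> bool:
    """A query bond matches any bond whose type is in the allowed set."""
    return bond_type in query_bond


def format_element_list(elements: Iterable[str]) -> str:
    """Render an element list in the bracket notation used above."""
    return "[" + ",".join(sorted(elements)) + "]"


cno = frozenset({"C", "N", "O"})
label = format_element_list(cno)
```

Here `atom_matches(cno, "N")` is true while `atom_matches(cno, "S")` is false, which is why a fuzzy substructure built from such features can still be used as an ordinary substructure query.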

fuzzy-MCS-demo.png

The current approach proceeds as follows:

  • Set all atoms and bonds to "any"
  • Generate a conventional MCS
  • With this "Any-MCS", do a substructure search against the cluster molecules
  • Reduce the cluster molecule structures to the found substructures
  • Repeat substructure search against these reduced structures to standardize the mapping of query to reference atom/bond indices
  • Go through the atom/bond index lists of the reduced references and store all index-to-type correlations in lookup tables
  • Remove redundant information from these lookup tables
  • Now take the "Any-MCS" and replace the atom/bond types with the lookup table content:
    • if only one feature remains for an atom/bond in the lookup table: assign it directly
    • if two or more features remain: assign query features
  • Kekulize the result structure to solve remaining mesomeric structure issues
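
The lookup-table steps above can be sketched in a few lines of Python. This is a minimal sketch and not the component's actual implementation: it assumes that the substructure mapping has already been standardized, so that position i in every list refers to the same MCS atom or bond. The function names, the input layout, and the fallback to "any" for bond combinations other than single/double are assumptions made for this example.

```python
# Minimal sketch of the lookup-table steps; not the component's own code.
# Assumption: types[m][i] is the atom/bond type of cluster member m at
# MCS index i, i.e. the index mapping has already been standardized.
from typing import Dict, List, Set


def build_lookup(types: List[List[str]]) -> Dict[int, Set[str]]:
    """Store all index-to-type correlations; sets drop redundant entries."""
    table: Dict[int, Set[str]] = {}
    for member in types:
        for index, type_ in enumerate(member):
            table.setdefault(index, set()).add(type_)
    return table


def assign_atoms(table: Dict[int, Set[str]]) -> Dict[int, str]:
    """One element remaining: assign it directly; otherwise an element list."""
    result = {}
    for index, elements in sorted(table.items()):
        if len(elements) == 1:
            result[index] = next(iter(elements))
        else:
            result[index] = "[" + ",".join(sorted(elements)) + "]"
    return result


def assign_bonds(table: Dict[int, Set[str]]) -> Dict[int, str]:
    """One type remaining: assign it directly; otherwise a query bond."""
    result = {}
    for index, bond_types in sorted(table.items()):
        if len(bond_types) == 1:
            result[index] = next(iter(bond_types))
        elif bond_types == {"single", "double"}:
            result[index] = "single/double"
        else:
            result[index] = "any"  # assumed fallback for all other mixes
    return result


# Three hypothetical cluster members reduced to a four-atom, three-bond MCS.
atom_types = [["C", "C", "N", "O"], ["C", "N", "N", "O"], ["C", "C", "O", "O"]]
bond_types = [["single", "double", "single"],
              ["single", "single", "aromatic"],
              ["single", "double", "single"]]

fuzzy_atoms = assign_atoms(build_lookup(atom_types))
fuzzy_bonds = assign_bonds(build_lookup(bond_types))
```

With this input, atoms 0 and 3 keep their single element, atoms 1 and 2 become the element lists [C,N] and [N,O], and the bonds become single, single/double, and any.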

Feedback or contributions to improve the component are highly appreciated.

Requirements: Pipeline Pilot 8.5 or later

O/S: Windows and Linux

Limitations: This is still an early version and is not completely error-free. Known issues:

  • The algorithm depends on a consistent numbering of atoms/bonds. For symmetric substituents such as carboxylic acids or nitro groups, the numbering is not always consistent, which causes inconsistent results.
  • In some cases aromatic substructures or substructures adjacent to aromatic systems are not perceived well.
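
One plausible way the numbering issue shows up can be illustrated with a small Python sketch. It is an assumption about the failure mode, not a trace of the component: the two C–O bonds of a carboxylic acid are recorded per MCS bond index for two hypothetical cluster members, once with identical numbering and once with the two bonds swapped in the second member.

```python
# Hypothetical illustration of the symmetric-substituent issue.
# The bond indices and mappings are invented for this example.
from typing import Dict, List, Set


def merge_bond_types(mappings: List[Dict[int, str]]) -> Dict[int, Set[str]]:
    """Collect the set of bond types seen at each MCS bond index."""
    merged: Dict[int, Set[str]] = {}
    for mapping in mappings:
        for index, bond_type in mapping.items():
            merged.setdefault(index, set()).add(bond_type)
    return merged


# Index 0 is C=O and index 1 is C-OH in both members.
consistent = [{0: "double", 1: "single"}, {0: "double", 1: "single"}]
# In the second member the two C-O bonds are numbered the other way round.
swapped = [{0: "double", 1: "single"}, {0: "single", 1: "double"}]

consistent_result = merge_bond_types(consistent)
swapped_result = merge_bond_types(swapped)
```

With consistent numbering each index keeps exactly one bond type, whereas the swapped numbering leaves both indices with single and double, so both bonds would be turned into query bonds although every member contains the same acid group.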

Keywords (tags): pipeline_pilot library cluster similarity maximum_common_substructure MCS fuzzy element_list query_bond query_feature

Contents: Fuzzy-MCS-Demo.xml

Installation:

1. Unzip the archive.

2. Either drag and drop the example protocol into the protocol window of your Pipeline Pilot client or import it into your user tab.

3. Run the protocol to explore the functionality.

Component "Generate Fuzzy MCS" updated to version 0.3

14.11.2013: Previous versions contained a minor bug that did not affect results but broke the protocol under Pipeline Pilot v9.1. This has now been corrected.