Matched Molecular Pairs as an automatic idea generator

Attached is an example protocol showing the new matched molecular pairs (MMPs) calculator in Pipeline Pilot. The MMPs are converted to two different reaction objects (RXNs): Minimal transformations, for instance R-Cl-->R-OH, and Full transformations, where the R group in the minimal transformation is fully exemplified. The reaction objects are fully atom-mapped, which allows calculation of reaction fingerprints (RCFP). The reaction fingerprints are used to perform activity modeling, not of molecules but of transformations. The MMP transformations and the reaction fingerprint models are combined in an automatic 'what to make next' idea generator.
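To make the distinction between minimal and full transformations concrete, here is a toy sketch in plain Python. It is not how Pipeline Pilot represents reaction objects: the MatchedPair class and its fields are hypothetical, the "reactions" are plain reaction-SMILES-style strings without atom mapping, and the context is assumed to be written so that the variable group can simply be appended to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchedPair:
    """Hypothetical container for one matched molecular pair."""
    context: str     # shared part of both molecules (assumed appendable)
    from_group: str  # variable group in the reactant
    to_group: str    # variable group in the product

    def minimal_transformation(self) -> str:
        # Only the changing group is kept; [*] stands in for the R group.
        return f"[*]{self.from_group}>>[*]{self.to_group}"

    def full_transformation(self) -> str:
        # The R group is written out in full on both sides.
        return f"{self.context}{self.from_group}>>{self.context}{self.to_group}"


# Chlorobenzene -> phenol as a stand-in for R-Cl --> R-OH
pair = MatchedPair(context="c1ccccc1", from_group="Cl", to_group="O")
print(pair.minimal_transformation())
print(pair.full_transformation())
```

The minimal form is what gets applied to new molecules as an idea generator, while the full form carries the chemical environment that the reaction fingerprints describe.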

The protocol consists of three different parts separated by green sticky notes:
1. Preparation: derivation of the minimal and full reaction objects from the attached example input file
2. A histogram of delta(pIC50) in the MMP set is derived as an illustration. A simple Bayesian model is derived from the full reaction objects. This model predicts, for a given reaction, the likelihood that it will yield a product that is >=10 times more active than the reactant. Various reaction fingerprints are tried, showing that the best results are obtained with RCFP_6. A semi-quantitative multi-category Bayesian model is derived next, which predicts the delta(pIC50) bin (bin size of 0.5) for a given reaction.
3. The dataset is randomly split into 80% training / 20% test. A classic PLS model is derived from the training set. The MMPs are derived from the training set and converted into minimal and full reaction objects. A multi-category Bayesian model is derived from the full reactions. The minimal MMP transformations are applied to the test set molecules to get ~6M ideas of what could be made next. Many of these ideas can be discarded immediately since they are chemically unrealistic. Some ideas are actually molecules from the training set, and these are used to compare predicted activity by classic QSAR vs. multi-categorical Bayesian vs. predicted activity = activity of reactant + delta(activity) of transformation.
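The quantities used in parts 2 and 3 can be sketched in a few lines of Python. This is an illustration only, not the protocol's implementation: the function names are hypothetical, and the choice to label each bin by its lower edge is an assumption, since the post does not say how the 0.5-wide bins are anchored. The one fixed relationship is that a 10-fold gain in activity corresponds to +1 unit of pIC50.

```python
import math


def delta_pic50(reactant_pic50: float, product_pic50: float) -> float:
    """Activity change across a transformation, in log units."""
    return product_pic50 - reactant_pic50


def is_tenfold_improvement(delta: float) -> bool:
    """Two-class label: product >= 10x more active means delta(pIC50) >= 1."""
    return delta >= 1.0


def delta_bin(delta: float, bin_size: float = 0.5) -> float:
    """Multi-category label: lower edge of the bin (assumed convention)."""
    return math.floor(delta / bin_size) * bin_size


def predict_product_pic50(reactant_pic50: float, transformation_delta: float) -> float:
    """Additive estimate: activity of reactant + delta(activity) of transformation."""
    return reactant_pic50 + transformation_delta


d = delta_pic50(6.0, 7.2)
print(is_tenfold_improvement(d), delta_bin(d), predict_product_pic50(6.0, 0.5))
```

In the comparison described in part 3, the additive estimate would use the delta observed (or predicted) for the matching transformation, while the Bayesian model would return a bin label rather than a single number.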

The above was presented at the Accelrys European Science Forum in June 2012 in Brussels. Since then, Torsten Schindler (Roche) has contributed a much more elegant reaction atom-to-atom mapper.

Slides: http://slidesha.re/RpEcXw

Requirements: Pipeline Pilot 8.5 CU2 or later; Chemistry, Reporting & Data Modeling collections

Keywords (tags): Matched Molecular Pair, MMP, reaction fingerprint, RCFP, pipeline_pilot, library

This protocol is fairly resource-intensive, so you might have to run it in parts.