Pareto Optimization with Molecular Mutations

DH 2009-02-25

At last week's Accelrys User Group Meeting, Jim Metz gave a fascinating talk on simultaneous optimization of multiple molecular properties. His approach combined structural transformations ("mutations") of the molecules with the Pareto Subset Optimizer (in the Advanced Data Modeling collection). I refer you to his talk for more details on the problem he was addressing. You can see the abstract at http://accelrys.com/events/ugms/ugm-2009/abstracts.html#metz and I believe his slides will be posted at some point.

Inspired by Jim's talk, I came up with a proof-of-concept protocol for an optimization scheme that is a bit simpler than the one he presented, and does not require that weights be specified for the properties being optimized. (An aside for practitioners: it is important to understand that while the Pareto components in Pipeline Pilot do allow you to specify weights, these weights do not affect the course of the optimization. Rather, they are used to produce an objective function that you can use to sort the Pareto-optimal samples after the optimization is complete.)
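To illustrate the point about weights, here is a minimal Python sketch of the general idea. It is not the Pipeline Pilot implementation. The function names (dominates, pareto_front, rank_by_weights) and the sample numbers are made up for illustration, and both properties are assumed to be maximized. The Pareto front is found without any weights; the weights only change the order in which the front members are listed afterwards.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # one value per property; all properties are maximized here


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates. No weights are involved."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def rank_by_weights(front: List[Point], weights: Sequence[float]) -> List[Point]:
    """Sort an already-computed front by a weighted sum (hypothetical objective)."""
    return sorted(front, key=lambda p: sum(w * x for w, x in zip(weights, p)), reverse=True)


# Illustrative values only, not real property data.
samples = [(1.0, 5.0), (2.0, 4.5), (3.0, 1.0), (1.5, 3.0), (0.5, 0.5)]
front = pareto_front(samples)

print(front)
print(rank_by_weights(front, (1.0, 1.0)))
print(rank_by_weights(front, (3.0, 1.0)))
```

Changing the weights reorders the three front members but never adds or removes one, which is the behavior described in the aside above.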

The protocol does a tradeoff optimization between blood-brain barrier penetration and molecular solubility. (You need the ADMET collection to compute the ADMET_BBB property, but if you don't have this collection you can replace it with any other property that competes with solubility.) Since I don't have a general "molecular mutator", I used the Enumerate Metabolites and Enumerate Bioisosteres components to generate new structures.

I have only run this protocol through a very simple test to establish that it moves the system in the correct direction over a few iterations, so caveat user. There are many ways it could be enhanced, as suggested by the sticky notes inside the subprotocol. And I'm sure it could be made more efficient as well.

Pipeline 1 just grabs the first few compounds from Asinex to use as a starting point. Pipeline 2 does the actual optimization. Pipeline 3 then displays the results: the points on the first Pareto front, with a different color for each iteration. Over the first few iterations, you can see that the front moves in the general direction of increasing both solubility and ADMET_BBB (which corresponds to a decrease in BBB penetration).
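For readers without Pipeline Pilot at hand, the overall shape of such a mutate-score-filter loop can be sketched in Python. This is only a rough analogy under stated assumptions, not the protocol itself: candidates are toy pairs of numbers rather than molecules, toy_mutate stands in for the structure-enumeration components, toy_score stands in for the property calculators, both scores are maximized, and parents are kept in the pool alongside their children. All names here are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[float, float]  # toy stand-in for a molecule
Scores = Tuple[float, float]     # two property values, both maximized


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def first_front(scored: Dict[Candidate, Scores]) -> Dict[Candidate, Scores]:
    """Keep only the candidates whose scores no other candidate dominates."""
    return {c: s for c, s in scored.items()
            if not any(dominates(t, s) for t in scored.values())}


def optimize(seeds: List[Candidate],
             mutate: Callable[[Candidate], List[Candidate]],
             score: Callable[[Candidate], Scores],
             iterations: int) -> List[Dict[Candidate, Scores]]:
    """Return the first Pareto front after each iteration."""
    population = {c: score(c) for c in seeds}
    history = []
    for _ in range(iterations):
        # Generate and score children of every current front member.
        children = {m: score(m) for parent in population for m in mutate(parent)}
        # Assumption: parents compete with their children for a place on the front.
        population = first_front({**population, **children})
        history.append(population)
    return history


rng = random.Random(0)  # fixed seed so repeated runs give the same fronts


def toy_mutate(c: Candidate) -> List[Candidate]:
    """Hypothetical mutator: three random perturbations of the parent."""
    return [(c[0] + rng.uniform(-0.5, 0.5), c[1] + rng.uniform(-0.5, 0.5))
            for _ in range(3)]


def toy_score(c: Candidate) -> Scores:
    """Hypothetical scorer: the two coordinates are the two properties."""
    return (c[0], c[1])


fronts = optimize([(0.0, 1.0), (1.0, 0.0)], toy_mutate, toy_score, iterations=5)
for i, front in enumerate(fronts, start=1):
    print(f"iteration {i}: {len(front)} points on the first front")
```

Because parents stay in the pool in this sketch, a later front never contains a point dominated by an earlier front member, so the front can only hold its ground or advance. Plotting each entry of fronts in its own color would give a picture analogous to the one Pipeline 3 produces.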