Open Source chemistry toolkit integration: RDKit on Server

Name: RDKit on Server

Author: Christian Herhaus

Version: 1.0

Created: 20/02/2012

Modified: 21/02/2012

Purpose: Demonstrates the possibilities of interacting between RDKit and Pipeline Pilot.

The Open Source cheminformatics toolkit "RDKit" (http://www.rdkit.org), developed by Greg Landrum, is becoming more and more popular. While much of its functionality overlaps with what already exists in Pipeline Pilot, it may nevertheless be of interest to combine both tools, e.g. for comparison or for getting the best of both worlds.

While integration of external software can always be achieved the "usual" way, by exporting and importing intermediate structure data files (SDF, SMILES or other formats) or by using SOAP, this causes unnecessary I/O or network traffic and should be avoided where possible. Therefore, in a somewhat experimental contribution, the requirements for and possibilities of a direct integration of Pipeline Pilot and RDKit were explored.

Please note that this demo use case is limited to Windows servers with an ActiveState Python distribution, because Python integration into Pipeline Pilot has so far been achieved only through Windows Scripting Host (WSH) (see the Limitations section). In a first demo, the attached example protocol demonstrates structure manipulation and descriptor calculation in RDKit for structures passed in from Pipeline Pilot. A second demo shows the more critical passing of fingerprint data between the two tools, in both directions. The HTML output files keep all intermediately generated fingerprint properties for demonstration purposes. Make sure to scroll to the far right end of the output tables so as not to miss any result information.
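The protocol itself is not reproduced in this readme, so the property format it uses for fingerprints is not shown here. As a rough illustration of what passing fingerprint data between two tools as a plain text property can look like, the following sketch serializes the set of on-bit indices to a comma-separated string and parses it back. The function names and the comma-separated format are assumptions made for this example only; they are neither taken from the protocol nor part of the RDKit or Pipeline Pilot APIs.

```python
# Hypothetical helpers: illustrate a text round trip for fingerprint on-bits.
# The comma-separated format is an assumption, not the protocol's actual format.

def on_bits_to_text(on_bits):
    """Serialize on-bit indices as a sorted, duplicate-free, comma-separated string."""
    return ",".join(str(i) for i in sorted(set(on_bits)))


def text_to_on_bits(text):
    """Parse the comma-separated string back into a set of integer bit indices."""
    return set(int(tok) for tok in text.split(",") if tok.strip())


def tanimoto(a, b):
    """Tanimoto similarity of two on-bit collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = len(a | b)
    return float(len(a & b)) / union if union else 0.0


if __name__ == "__main__":
    sent = on_bits_to_text([5, 1, 3, 3])      # value written as a text property
    received = text_to_on_bits(sent)          # value parsed by the other tool
    print(sent)
    print(tanimoto(received, [1, 5, 9]))
```

A round trip of this kind makes it easy to check that both tools see the same bits: computing a similarity on each side from the same pair of fingerprints should give the same value.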

I would be interested to get feedback on

  • whether other users are successful with this installation,
  • whether other users think this integration could be of general use,
  • and whether anybody sees possibilities to overcome the current limitations of being bound to WSH, ActiveState and Microsoft Windows.

Requirements:

  • Pipeline Pilot 8.0 or later

O/S:

  • Windows

Limitations:

  • Windows only,
  • ActiveState Python only (WSH interface required for Python on server integration),
  • Accelrys reports a small, unfixed per-record memory leak in ActiveState Python which may cause problems for larger datasets

Keywords (tags): RDKit, Python, scripting, integration, Open Source

Contents: RDKit on Server Example.xml, Readme.txt

Installation:

A) Installation of Python & RDKit

1. Install the most recent ActiveState Python distribution on your Windows server (tested: 2.7.2.5 on WinXP SP3)

2. Install the most recent version of the Python Imaging Library (PIL; tested: 1.1.7)

3. Install the most recent version of the NumPy computing package for Python (tested: 1.6.1)

4. Install the most recent version of RDKit (tested: 2011_06_1.zip, extracted to C:)

5. Set the system environment variable RDBASE (tested: C:\RDKit_2011_06_1)

6. Set the system environment variable PYTHONPATH to %RDBASE%

7. Append ;%RDBASE%\lib to the system environment variable PATH
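Before moving on to part B, it can help to check that steps 5 to 7 took effect. The following is a minimal sketch of such a check, to be run with the ActiveState Python interpreter in a newly opened command prompt. The function names are made up for this example, and the script assumes that the variables are visible to the process in expanded form (i.e. %RDBASE% already replaced by the actual path) and that ";" separates the entries.

```python
import os


def check_rdkit_environment(environ=None):
    """Return a list of problems found; an empty list means the variables look consistent."""
    env = os.environ if environ is None else environ
    problems = []

    rdbase = env.get("RDBASE", "")
    if not rdbase:
        problems.append("RDBASE is not set")
        return problems

    def norm(path):
        # Windows paths are case-insensitive; ignore trailing separators.
        return path.strip().rstrip("\\/").lower()

    # ";" is used explicitly because the setup is Windows-only.
    python_path = [norm(p) for p in env.get("PYTHONPATH", "").split(";") if p.strip()]
    if norm(rdbase) not in python_path:
        problems.append("PYTHONPATH does not contain RDBASE")

    search_path = [norm(p) for p in env.get("PATH", "").split(";") if p.strip()]
    if norm(rdbase + "\\lib") not in search_path:
        problems.append("PATH does not contain RDBASE\\lib")

    return problems


def module_available(name):
    """Return True if the named module can be imported by this interpreter."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    for problem in check_rdkit_environment():
        print("Problem: " + problem)
    print("rdkit importable: " + str(module_available("rdkit")))
```

If the script reports no problems and the import succeeds, the Python side of the installation is probably in order. It does not check the Pipeline Pilot global property set in part B.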

B) Integration into Pipeline Pilot

1. In the Administration Portal, set a global property RDBASE to the RDKit path (tested: C:\RDKit_2011_06_1)

2. Unzip the archive.

3. Drag and drop the Example protocol into the Pipeline Pilot client.

4. Run the protocol.