Alkane Chain Isomers and Enumeration

Name: Alkane Chain Enumeration

Author: Mike Cherry
Created: 11/2012

Purpose: Demonstrate different approaches to the creation of alkane chains

Requirements: Pipeline Pilot 8.5

The enumeration of chains, in this case alkane chains, has been the subject of a number of studies, both from an algorithmic perspective (how to perform the enumeration) and from a more chemical perspective (knowing which isomers exist).

http://www.sadgurupublications.com/ContentPaper/2012/2_144_2%283%292012_ACPI.pdf

http://misterkgb.files.wordpress.com/2010/09/computerized-enumeration-of-the-alkane-series1.pdf

Attached are three different approaches to both finding and enumerating alkane chain isomers.

The first approach (carbonChainsFromChainAssemblies.xml) is based on the idea of finding what already exists: it breaks known molecules down into chain assemblies and manipulates these chain assemblies to give a set of alkane chain isomers. It has the advantage that results are based on known molecules, but it is limited in scope, since results for larger chains are unlikely to be exhaustive. Its run time also depends on reading your original data source, which can be inefficient for smaller chains.

The second approach (carbonChainsFromRG.xml) builds on the first: rather than reading an existing molecular data source, it first generates a source of molecules from a reaction file (RG File). This approach was found to give better coverage of the possible isomers, but scaling to larger chains results in ever-increasing redundancy in what is produced and ever-increasing run times.
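
As a rough illustration of where that redundancy comes from, here is a toy Python sketch (not the Pipeline Pilot protocol itself). It assumes a single central carbon with four equivalent attachment points, each filled from a small, hypothetical substituent library, and compares the number of raw assignments with the number that differ once the order of the attachment points is ignored. The library names and the `redundancy` helper are made up for this example.

```python
from itertools import product

def redundancy(library, slots=4):
    """Return (raw assignments, distinct assignments ignoring slot order)."""
    raw = 0
    distinct = set()
    for assignment in product(library, repeat=slots):
        raw += 1
        # The four positions on the central carbon are interchangeable,
        # so assignments that differ only in order are the same skeleton.
        distinct.add(tuple(sorted(assignment)))
    return raw, len(distinct)

# Hypothetical substituent library, for illustration only.
library = ["H", "methyl", "ethyl", "propyl", "isopropyl", "butyl"]

for k in range(2, len(library) + 1):
    raw, distinct = redundancy(library[:k])
    print(f"{k} substituents: {raw} raw, {distinct} distinct, "
          f"ratio {raw / distinct:.1f}")
```

The ratio of raw to distinct assignments grows as the library grows. This sketch only removes duplicates caused by slot order; the same skeleton can also be reached from different cores (propane is both H,H,H,ethyl and H,H,methyl,methyl), and those duplicates can only be removed by comparing a canonical form of the whole structure.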

The final method (carbonChainsFromScratch.xml) is again reaction based but is more fundamental in its approach: it simply adds atoms one at a time to an existing structure, in an iterative fashion, until a predefined chain limit is reached. This approach is exhaustive and, with carbon chains, was found to give the correct number of isomers up to carbon numbers of 20 (not tested beyond this). It also scales well in time with the number of isomers produced for a given number of carbons (e.g. 14 carbons = 1,858 isomers in 6 seconds, 15 carbons = 4,347 isomers in 15 seconds, 16 carbons = 10,359 isomers in 39 seconds). The approach also has the benefit that, by defining your own additional reactions, you could introduce other elements and bond types into the chains, potentially giving you a means to enumerate different chain assemblies from scratch.
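
The idea of growing structures one atom at a time can be sketched outside Pipeline Pilot. The Python below is a minimal stand-in, not the attached protocol: it assumes carbon skeletons can be treated as trees with at most four neighbours per atom, ignores hydrogens and stereochemistry, and removes duplicates with a canonical string for each tree. The function names (`canonical`, `grow`, `count_isomers`) are invented for this example.

```python
def canonical(adj):
    """Canonical string for an unrooted tree given as adjacency lists."""
    n = len(adj)
    # Find the centre (one or two atoms) by repeatedly stripping leaves.
    degree = [len(nbrs) for nbrs in adj]
    leaves = [i for i in range(n) if degree[i] <= 1]
    remaining = n
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for nbr in adj[leaf]:
                degree[nbr] -= 1
                if degree[nbr] == 1:
                    next_leaves.append(nbr)
        leaves = next_leaves

    def encode(node, parent):
        # Sorting the child encodings makes the string order-independent.
        parts = sorted(encode(c, node) for c in adj[node] if c != parent)
        return "(" + "".join(parts) + ")"

    # With two centres, take the smaller encoding so the result is unique.
    return min(encode(centre, -1) for centre in leaves)


def grow(adj, max_neighbours=4):
    """Yield every skeleton made by bonding one new carbon to an existing one."""
    new_atom = len(adj)
    for atom, nbrs in enumerate(adj):
        if len(nbrs) < max_neighbours:  # carbon forms at most four bonds
            child = [list(n) for n in adj]
            child[atom].append(new_atom)
            child.append([atom])
            yield child


def count_isomers(max_carbons):
    """Return {carbon count: number of distinct skeletons} up to max_carbons."""
    level = {"()": [[]]}  # start from a single carbon with no neighbours
    counts = {1: 1}
    for n_carbons in range(2, max_carbons + 1):
        next_level = {}
        for adj in level.values():
            for child in grow(adj):
                # Keep one representative per canonical form.
                next_level.setdefault(canonical(child), child)
        level = next_level
        counts[n_carbons] = len(level)
    return counts


print(count_isomers(10))
# {1: 1, 2: 1, 3: 1, 4: 2, 5: 3, 6: 5, 7: 9, 8: 18, 9: 35, 10: 75}
```

Because every tree with more than one atom has a terminal atom that can be removed to leave a smaller valid tree, growing from every structure at the previous size reaches every skeleton at the next size. Removing duplicates at each step keeps the work proportional to the number of isomers rather than the number of growth paths. Running `count_isomers(14)` with this sketch gives 1,858 skeletons for 14 carbons, matching the figure quoted above.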

Attachment: ChainEnumeration.zip