Request for Comment - BIOVIA Direct and SMILES with enhanced stereochemistry extensions

Background

Simplified Molecular Input Line Entry Specification (SMILES) is an ASCII-based line notation for describing chemical structures. Standard SMILES does not include descriptors for advanced molecular features such as enhanced stereochemistry, as introduced by BIOVIA.

CXSMILES (ChemAxon Extended SMILES) is an extension of the original SMILES that introduces, among other things, descriptors for enhanced stereochemistry. With the upcoming 2022 release, the Pipeline Pilot Chemistry library and components will be able to create SMILES with the enhanced stereochemistry extension as defined in CXSMILES.
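To make the syntax concrete: in CXSMILES the extension block follows the standard SMILES after a space and is enclosed in vertical bars. Enhanced stereochemistry is written as 'a:' for absolute stereo atoms, 'o<n>:' for OR group n and '&<n>:' for AND group n, each followed by comma-separated, zero-based atom indices, for example 'C[C@H](O)[C@@H](C)N |o1:1,3|'. The Python sketch below only illustrates this syntax and is not part of any BIOVIA product; the function name parse_enhanced_stereo is hypothetical, and the sketch assumes the block contains enhanced stereo fields only (coordinates, atom labels and other CXSMILES fields are not handled).

```python
import re

# Label that opens an enhanced-stereo group, followed by its first atom:
#   "a" = absolute, "o<n>" = OR group n, "&<n>" = AND group n
GROUP_LABEL = re.compile(r"^(a|o\d+|&\d+):(\d+)$")


def parse_enhanced_stereo(cxsmiles):
    """Split a CXSMILES string into (smiles, {group label: [atom indices]}).

    Sketch only: assumes the |...| block holds enhanced-stereo fields and
    nothing else; any other field raises ValueError.
    """
    smiles, sep, rest = cxsmiles.partition(" |")
    groups = {}
    if not sep:
        return smiles, groups            # plain SMILES, no extension block
    block, _, _ = rest.partition("|")    # text up to the closing bar
    current = None
    for token in block.split(","):
        match = GROUP_LABEL.match(token)
        if match:                        # a new group starts here
            current = match.group(1)
            groups.setdefault(current, []).append(int(match.group(2)))
        elif token.isdigit() and current is not None:
            groups[current].append(int(token))   # further atom of same group
        else:
            raise ValueError(f"unsupported CXSMILES field: {token!r}")
    return smiles, groups
```

For example, parse_enhanced_stereo('C[C@H](O)[C@@H](C)N |o1:1,3|') returns the plain SMILES together with {'o1': [1, 3]}, i.e. atoms 1 and 3 form a single OR group.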

Request for Comment

BIOVIA Direct provides the operator 'smiles' and the function 'mdlaux.smiles' to generate SMILES strings from molecules. Currently, neither supports the CXSMILES extensions for enhanced stereochemistry: if you pass a molecule with enhanced stereochemistry to the 'smiles' operator or the 'mdlaux.smiles' function, it returns NULL and writes the error 'MDL-2046: SMILES generation failed: Molecule contains enhanced stereochemistry' to the BIOVIA Direct error buffer.

BIOVIA plans to add the capability to generate SMILES strings with the CXSMILES extensions for enhanced stereochemistry to this operator and function. We are therefore interested in our customers' preferences for the implementation, in particular with regard to backward compatibility with the current behaviour. As a starting point for discussion, we are considering the following implementation.


1) In general, the operator/function will return one of the following three values for molecules with enhanced stereochemistry:

a) NULL, and an additional error in the Direct error buffer - this corresponds to the current behaviour of Direct. This option avoids conflicts with established workflows that use these operators/functions.

b) the standard SMILES string plus the CXSMILES extension for enhanced stereochemistry - the most accurate value, but one that may cause failures in third-party tools or established workflows.

c) the standard SMILES string, calculated after converting the enhanced stereochemistry to absolute stereochemistry - the most compatible value, but not necessarily accurate with regard to stereochemistry. This is the current default for BIOVIA Draw and Pipeline Pilot.
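The three candidate values can be contrasted in a small Python sketch. It does not call BIOVIA Direct: the enum EnhancedStereoOutput and the function smiles_for are hypothetical, a Python list stands in for the Direct error buffer, and option c) is assumed to keep the drawn @/@@ parities while dropping the group labels, which may not match the actual conversion.

```python
from enum import Enum


class EnhancedStereoOutput(Enum):
    """Hypothetical names for options a), b) and c) of this RFC."""
    NULL = "a"        # current behaviour: no SMILES, error reported instead
    CXSMILES = "b"    # standard SMILES plus enhanced-stereo extension
    ABSOLUTE = "c"    # stereo read as absolute, extension dropped


def smiles_for(standard_smiles, stereo_block, mode, errors):
    """Return the SMILES a caller would see under the chosen option.

    standard_smiles -- SMILES with @/@@ parities as drawn
    stereo_block    -- enhanced-stereo fields, e.g. "o1:1,3", or "" if none
    errors          -- list standing in for the Direct error buffer
    """
    if not stereo_block:
        return standard_smiles           # no enhanced stereo: unaffected
    if mode is EnhancedStereoOutput.NULL:
        errors.append("MDL-2046: SMILES generation failed: "
                      "Molecule contains enhanced stereochemistry")
        return None
    if mode is EnhancedStereoOutput.CXSMILES:
        return f"{standard_smiles} |{stereo_block}|"
    # ABSOLUTE (assumed): keep drawn parities, drop the group labels
    return standard_smiles
```

Only option b) changes the returned string itself; any downstream code that compares, stores or parses SMILES will then see the appended block, which is where the compatibility risk of b) comes from.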


2) The default output value

Which of the three options listed in 1) would you prefer as the default output?


3) The output value of the operator/function can be selected in one of the following two ways:

a) We will provide a new Direct flag (valid for the current Oracle session) and a new Direct global property (valid for any Direct session) that set the output format.

b) Both 'smiles' and 'mdlaux.smiles' currently take an optional second argument, 'noncanonical'. This optional argument will be extended to accept an additional string that sets the output format.

Which of the options would you prefer?
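If option 3a) is chosen, or both mechanisms are offered, the setting that applies to a given call has to be resolved. The Python sketch below assumes one plausible precedence (per-call argument, then session flag, then global property, then the built-in default); the option strings 'null', 'cxsmiles' and 'absolute', the separator rules and the function name resolve_output_mode are all hypothetical and not part of the current Direct interface.

```python
# Hypothetical option strings for options a), b) and c); not a Direct API.
MODES = {"null", "cxsmiles", "absolute"}


def resolve_output_mode(call_option=None, session_flag=None,
                        global_property=None, default="null"):
    """Return the enhanced-stereo output mode that applies to one call.

    call_option     -- optional second argument of smiles/mdlaux.smiles
                       (option 3b), e.g. "noncanonical,cxsmiles"
    session_flag    -- value of a Direct flag for this Oracle session (3a)
    global_property -- value of a Direct global property (3a)
    default         -- placeholder; the real default is question 2)
    The most specific setting wins: call, then session, then global.
    """
    if call_option:
        # Accept commas or whitespace between tokens; tokens that are not
        # output modes (such as "noncanonical") are left to other code.
        tokens = call_option.lower().replace(",", " ").split()
        chosen = [t for t in tokens if t in MODES]
        if len(chosen) > 1:
            raise ValueError(f"conflicting output modes: {chosen}")
        if chosen:
            return chosen[0]
    for setting in (session_flag, global_property):
        if setting:
            mode = setting.lower()
            if mode not in MODES:
                raise ValueError(f"unknown output mode: {setting!r}")
            return mode
    return default
```

Under this assumed precedence, a call such as resolve_output_mode("noncanonical", session_flag="cxsmiles") falls back to the session setting, while an explicit mode in the call argument always overrides the flag and the global property.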