BIOVIA Direct: Formula searches require the number of atoms to be specified

Among other things, BIOVIA Direct allows you to search an Oracle database for molecules by exact formula or by substructure formula. For instance, a typical substructure formula search looks like this:

 

select ID from molecules where fmlalike (ctab,'C O') = 1;

 

The string 'C O' in the SQL above is the search term for the formula.

 

A common misconception is that the BIOVIA Direct formula search is a simple substring comparison. Users therefore often expect the search term above to also find molecules with the formulas 'C2H4O2' and 'C6H5NO2'. However, the BIOVIA Direct formula search applies some chemical intelligence when it parses the search term.

 

In particular, in addition to the atom symbol, the atom count for each element, whether implicit or explicit, is always taken into account. The count must therefore be specified unless the desired atom count is 1. In the search term of the above example, no explicit atom count is given for either element, so BIOVIA Direct assumes an implicit atom count of 1 for both. The above search will therefore only find molecules that have exactly 1 carbon atom and 1 oxygen atom, and thus neither of the expected molecules will be hit.

 

Users can enter either a single number or a range as the atom count for each element in the search term. Both are placed directly after the atom symbol, with the range enclosed in parentheses. A relatively narrow substructure formula search that finds both example molecules, 'C2H4O2' and 'C6H5NO2', would therefore look like this:

 

select ID from molecules where fmlalike (ctab,'C(2-6) O2') = 1;
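
The counting rule described above can be illustrated with a small Python sketch. This is a toy model of the behavior as this article describes it, not BIOVIA Direct's actual implementation. The function names (parse_search_term, parse_formula, formula_matches) are hypothetical. The sketch assumes that search term elements are separated by spaces and that formulas are plain strings without parentheses or charges.

```python
import re

# One part of the search term: element symbol, then an optional count ("2")
# or a parenthesized range ("(2-6)"). No count means exactly 1 (assumption
# taken from the article's description, not from BIOVIA's parser).
TERM_TOKEN = re.compile(r"([A-Z][a-z]?)(?:\((\d+)-(\d+)\)|(\d+))?")
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_search_term(term):
    """Map each element in the search term to its (min, max) atom count."""
    ranges = {}
    for part in term.split():
        match = TERM_TOKEN.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognized search term part: {part!r}")
        symbol, low, high, single = match.groups()
        if low is not None:
            ranges[symbol] = (int(low), int(high))
        else:
            count = int(single) if single else 1  # implicit count of 1
            ranges[symbol] = (count, count)
    return ranges


def parse_formula(formula):
    """Count atoms per element in a plain formula such as 'C6H5NO2'."""
    counts = {}
    for symbol, number in FORMULA_TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def formula_matches(formula, term):
    """True if every element named in the term has a count within its range."""
    counts = parse_formula(formula)
    return all(
        low <= counts.get(symbol, 0) <= high
        for symbol, (low, high) in parse_search_term(term).items()
    )


if __name__ == "__main__":
    for term in ("C O", "C(2-6) O2"):
        for formula in ("C2H4O2", "C6H5NO2"):
            print(term, formula, formula_matches(formula, term))
```

In this model, 'C O' rejects both example formulas because each contains more than one carbon and more than one oxygen atom, while 'C(2-6) O2' accepts both. Elements not named in the search term, such as H and N, are ignored, which is what makes it a substructure formula search.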