Chemical Registration: Compounds are not registered sequentially during bulk registration

Chemical Registration users who register compounds in bulk from an SDFile will find that the compounds are not registered sequentially: Corporate IDs do not follow the order of the records in the file.

This is expected behavior. The application does not register bulk submissions in file order. One scenario that can cause gaps in the Corporate IDs is single registrations taking place while bulk jobs are running.

Even when no single registrations take place while bulk jobs are running, users will still see gaps in Corporate IDs. A ChemReg developer explains it as follows:

“Bulk registration records are stored in a staging table, from which they are processed and registered by an independent process. By default, the process doing the registration runs several concurrent threads registering the submissions.

Each thread will process a block of records. For example, the first thread will register the first 20 records and the second thread will register the second block of 20 records. This means the first record from the first block of 20 records will be processed at the same time as the first record from the second block of 20 records. Corporate IDs are assigned as records are registered, so the first record would receive the first corporate identifier, but the 21st record would receive the 2nd corporate identifier.

Records in a large job would therefore not receive sequential identifiers based on the position in the job. Turning off parallel processing would avoid this, but would reduce the overall throughput of records being registered.

The other problem is the requirement to not allow one job to block registration of records from other jobs. So when multiple jobs are queued, records are selected from multiple jobs. This means Corporate IDs are then generated sequentially but used across multiple jobs. So any individual job will not get a contiguous sequence of corporate identifiers assigned to its records.”
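The two effects described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not the application's actual code: the function names are hypothetical, the threads are assumed to advance in perfect lockstep, and records are assumed to be picked from queued jobs in strict round-robin order. In the real system the timing is non-deterministic, so the exact IDs will differ from run to run.

```python
from itertools import count


def assign_ids_in_blocks(num_records, block_size, num_threads):
    """Map each record position (1-based) to the Corporate ID it would receive.

    Assumption: threads run in lockstep, each working through its own
    block of consecutive records.
    """
    next_id = count(1)  # shared Corporate ID sequence
    positions = list(range(1, num_records + 1))
    blocks = [positions[i:i + block_size]
              for i in range(0, num_records, block_size)]
    assigned = {}
    # Blocks are handled in waves of num_threads blocks at a time.
    for wave_start in range(0, len(blocks), num_threads):
        wave = blocks[wave_start:wave_start + num_threads]
        # Every thread registers its k-th record before any thread
        # moves on to record k + 1.
        for k in range(block_size):
            for block in wave:
                if k < len(block):
                    assigned[block[k]] = next(next_id)
    return assigned


def assign_ids_across_jobs(job_sizes):
    """Return the Corporate IDs each job receives when records are
    selected from the queued jobs in turn (assumed round-robin)."""
    next_id = count(1)
    remaining = dict(job_sizes)
    assigned = {job: [] for job in job_sizes}
    while any(remaining.values()):
        for job in job_sizes:
            if remaining[job] > 0:
                assigned[job].append(next(next_id))
                remaining[job] -= 1
    return assigned


# One job of 40 records, two threads, blocks of 20 records.
ids = assign_ids_in_blocks(num_records=40, block_size=20, num_threads=2)
print(ids[1], ids[21], ids[2])  # prints: 1 2 3

# Two queued jobs sharing one ID sequence.
print(assign_ids_across_jobs({"job_a": 3, "job_b": 2}))
# prints: {'job_a': [1, 3, 5], 'job_b': [2, 4]}
```

In the first case, record 21 receives the second Corporate ID, as in the developer's example. In the second case, every ID is issued in sequence, but neither job ends up with a contiguous range.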

The Registration product team is investigating whether sequential registration can be achieved. If it can, the fix could be available in a future release of the Chemical Registration application.