At the beginning of this year, DELMIA Quintiq R&D released a whitepaper that describes various options for ensuring the availability of a DELMIA Quintiq solution. Because I have received several questions about the so-called "warm standby" configuration, I would like to explain in this post how a failover scenario works with a warm standby configuration.
Startup sequence
To better understand the failover steps, it is important to first understand the startup sequence of a DELMIA Quintiq solution. To run any DELMIA Quintiq solution, the following components are required:
- QDBI (aka QDBODB) - the database integrator that manages the Dataset Store (DSS)
- QAE - the application engine that manages data and business logic
- QTCE - the visualization engine that manages user sessions
- TC/WC - user interface that displays data and is used to trigger transactions
These components need to be started in sequence because they depend on each other: the QAE loads datasets via the QDBI, the QAE pushes data to the QTCE, and the QTCE pushes data to the TC/WC. This sounds simple, but there is a lot more to it, as can be seen in the diagram below that describes the startup sequence:
Two steps that are on the critical path and can take relatively long are:
- Parse BL (Business Logic): loading and parsing of the object and UI definitions
- Load datasets: loading of relevant datasets from the DSS
The duration of the first step depends on the size of the model definitions. For example, each Quill library is loaded sequentially, and this can take up to several minutes. The duration of the second step depends on the number and size of the datasets. Datasets can be loaded at very high speed (gigabits per second), but this process can still take minutes when loading large datasets. If propagation is needed, the loading time will be even longer.
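The dependency chain described in this section can be sketched in a few lines of Python. This is only an illustration of why the order matters, not how DELMIA Quintiq itself starts its components; the dictionary below is simply a transcription of the dependencies listed above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each component maps to the components it depends on (taken from the text above).
DEPENDS_ON = {
    "QDBI": set(),
    "QAE": {"QDBI"},    # the QAE loads datasets via the QDBI
    "QTCE": {"QAE"},    # the QAE pushes data to the QTCE
    "TC/WC": {"QTCE"},  # the QTCE pushes data to the TC/WC
}


def startup_order(depends_on):
    """Return the components in an order that starts every dependency first."""
    return list(TopologicalSorter(depends_on).static_order())


print(startup_order(DEPENDS_ON))  # ['QDBI', 'QAE', 'QTCE', 'TC/WC']
```

Because the dependencies form a single chain, there is only one valid order, which is why a slow step in the QAE delays everything after it.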
Warm vs. cold standby configuration
In the cold standby configuration, a secondary environment is fully configured but is not running any DELMIA Quintiq component. In the warm standby configuration, a secondary environment is fully configured and also has all DELMIA Quintiq components running. The only difference from the primary environment is that the QAE and QTCE have loaded the business logic but not the datasets. When switching from the primary to the secondary environment, the datasets are loaded into the QAE and subsequently into the QTCE. Compared to a cold standby configuration, the warm standby configuration saves the time of:
- Starting the VM (if not already running)
- Starting the services
- Loading the BL in QAE
- Loading the BL in QTCE
The time savings may not seem significant, but for business-critical operations, several minutes can make a huge difference.
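To get a feeling for the difference, the comparison can be written down as a small calculation. The durations below are purely hypothetical placeholders, not measurements of any real environment, and the sketch assumes for simplicity that the steps run one after another.

```python
# Hypothetical step durations in seconds; replace them with your own measurements.
STEP_SECONDS = {
    "start VM": 120,
    "start services": 30,
    "load BL in QAE": 180,
    "load BL in QTCE": 60,
    "load datasets": 240,
}

# Steps that a warm standby environment has already completed before the switch.
DONE_IN_WARM_STANDBY = {"start VM", "start services", "load BL in QAE", "load BL in QTCE"}


def switch_time(steps, already_done=frozenset()):
    """Sum the durations of the steps that still have to run at switch time."""
    return sum(seconds for name, seconds in steps.items() if name not in already_done)


cold = switch_time(STEP_SECONDS)
warm = switch_time(STEP_SECONDS, DONE_IN_WARM_STANDBY)
print(f"cold: {cold}s, warm: {warm}s, saved: {cold - warm}s")
```

With these placeholder numbers, only the dataset loading remains on the critical path for the warm standby.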
Warm standby failover scenario
The diagram below explains how the warm standby switchover works.
If the primary QAE fails, the primary QTCE keeps running in read-only mode. This means that users remain connected and can view data but cannot start new transactions. The secondary QAE then receives a notification to unlock and load the datasets. Once the datasets are loaded in the QAE, the secondary QTCE automatically receives the data through the auto-subscribe feature. The primary QTCE is then stopped to trigger the reconnect functionality in the TC/WC. Because the primary QTCE is no longer available, the TC/WC connects to the secondary QTCE. As soon as the session is created, the user can resume normal operations.
It is worth noting that during the switch from the primary to the secondary environment, the TC/WC keeps running and no restart is needed. Also, when connecting to the secondary QTCE, the user does not need to log in again. So in the worst case, the user cannot make any changes for a limited time.
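The reason the primary QTCE has to be stopped can be illustrated with a toy model of the reconnect behavior. This is not the actual TC/WC implementation; the endpoint names and the `pick_endpoint` function are hypothetical and only show the idea that a client falls through to the next endpoint once the preferred one is gone.

```python
def pick_endpoint(endpoints, is_available):
    """Return the first endpoint that accepts connections, or None if none does."""
    for endpoint in endpoints:
        if is_available(endpoint):
            return endpoint
    return None


# Hypothetical endpoint names, listed in order of preference.
ENDPOINTS = ["primary-qtce", "secondary-qtce"]

# While the primary QTCE still runs (even read-only), the client stays on it.
print(pick_endpoint(ENDPOINTS, lambda e: True))                 # primary-qtce

# Once the primary QTCE is stopped, the client ends up on the secondary QTCE.
print(pick_endpoint(ENDPOINTS, lambda e: e != "primary-qtce"))  # secondary-qtce
```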
Prerequisites for warm standby configuration
To implement a warm standby configuration the following features are needed:
1. Quintiq Authentication Server (QAS)
The QAS is needed to manage user authentication independently from the primary and secondary environments.
2. Dataset mirroring
To minimize the dataset loading times, the dataset must be mirrored to the secondary environment. This ensures that the secondary environment is kept up to date and also makes the secondary environment fully independent of the primary environment.
3. Same model version
To avoid propagation after loading the datasets, the model version must be identical in the primary and secondary environments.
4. Orchestrator
Last but not least, a third-party "orchestrator" such as Jenkins is needed to coordinate the switch activities. In particular, the following activities need to be implemented:
- notify the secondary QAE to initiate the loading of the datasets
- monitor the secondary QTCE to detect when it is available
- stop the primary QTCE as soon as the secondary QTCE is available
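The three activities above can be sketched as a small control loop. This is a minimal sketch under assumptions, not a ready-made Jenkins job: `run_switch` and its three callables are hypothetical names, and how you actually notify the QAE, check the QTCE, or stop a service depends on your own environment.

```python
import time


def run_switch(notify_secondary_qae, secondary_qtce_available, stop_primary_qtce,
               poll_seconds=5.0, timeout_seconds=600.0):
    """Run the three switch activities in order; return True on success."""
    # 1. Notify the secondary QAE to start loading the datasets.
    notify_secondary_qae()

    # 2. Poll the secondary QTCE until it reports that it is available.
    deadline = time.monotonic() + timeout_seconds
    while not secondary_qtce_available():
        if time.monotonic() >= deadline:
            return False  # give up and leave the primary QTCE running
        time.sleep(poll_seconds)

    # 3. Stop the primary QTCE so that the clients reconnect to the secondary.
    stop_primary_qtce()
    return True


# Dry run with stand-in callables instead of real environment commands.
events = []
checks = iter([False, False, True])  # the secondary QTCE is available on the third poll
ok = run_switch(
    notify_secondary_qae=lambda: events.append("notify secondary QAE"),
    secondary_qtce_available=lambda: next(checks),
    stop_primary_qtce=lambda: events.append("stop primary QTCE"),
    poll_seconds=0.0,
)
print(ok, events)
```

Passing the three activities in as callables keeps the ordering logic separate from the environment-specific commands. The timeout ensures that the primary QTCE is not stopped if the secondary never becomes available.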
I hope that this post has helped you better understand the warm standby configuration for DELMIA Quintiq. Please don't hesitate to contact me if you have any questions about this or other high availability configurations.
