Lessons learned from a DELMIA Quintiq implementation

At the end of last year, one of our customers encountered severe disruptions in their DELMIA Quintiq solution that had a serious impact on their business. In response, a task force of members from DELMIA Quintiq R&D and Services was formed to address the problems. Thanks to their hard work and dedication, they overcame all issues and regained the customer's trust in both the solution and DS as solution provider. Now that the dust has settled, it's time to look back on the incidents, reflect on the lessons learned from this project, and share them with the DELMIA Quintiq community.

 

1. Keep up to date with the latest patches

The first thing we found was that the customer had encountered several issues that were already solved in later releases of the DELMIA Quintiq software. R&D releases a patch every quarter with fixes for problems found in the software, and staying up to date with the latest patch release is the best way to reduce the risk of running into known software issues. It is therefore recommended to keep up to date with the latest patch release. Because upgrading the software takes some effort, this should be accounted for in the project budget.

 

2. Perform regular Load and Performance tests

Solutions are frequently delivered in phases. In this customer project, load and performance tests (L&PT) were executed during the initial project, but not during the extension projects that followed. As the solution grew over time in functionality and data, the impact on performance was not properly assessed until after the extensions went live. This led to unexpected behavior and failures that could have been found at an earlier stage. The lesson learned is therefore to re-execute the L&PT for every major delivery of the solution.

 

3. Establish a process of continuous improvement

During log file analysis, we found a problem with a knowledge table that caused a full propagate on the planning data sets on every restart of the application engine. As a result, the start-up time of the application engine was much longer than necessary. We also found that the data sets contained a huge amount of debugging information that was never cleaned up. Because of that, memory usage grew over time and loading times increased considerably. A process of regular log file analysis would have detected these issues at an early stage, before they became a problem. Spending half an hour every week scanning the log files and the model overview is key to ensuring the proper functioning of the solution. As a solution partner, we of course need to educate our customers on how to perform such a scan.
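Part of such a weekly scan can be automated. The sketch below is only an illustration under stated assumptions: the log message patterns (`full propagat`, `out of memory`, `ERROR`) and the thresholds are hypothetical, not documented DELMIA Quintiq log messages, so they would need to be replaced with whatever your own engine logs actually contain. It simply counts matching lines per pattern and flags the ones that exceed a threshold.

```python
import re
import sys
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical patterns: adjust these to the messages your own logs contain.
WATCH_PATTERNS = {
    "full_propagate": re.compile(r"full propagat", re.IGNORECASE),
    "out_of_memory": re.compile(r"out of memory", re.IGNORECASE),
    "error": re.compile(r"\bERROR\b"),
}


def scan_log(lines: Iterable[str]) -> Counter:
    """Count how many log lines match each watched pattern."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in WATCH_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def report(counts: Counter, thresholds: Dict[str, int]) -> List[str]:
    """Return the names of patterns whose count exceeds their threshold."""
    return [name for name, limit in thresholds.items() if counts[name] > limit]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py path/to/engine.log (hypothetical file name)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        counts = scan_log(handle)
    # Hypothetical thresholds; tune them to what is normal for your solution.
    for name in report(counts, {"full_propagate": 1, "out_of_memory": 0, "error": 10}):
        print(f"attention: {name} occurred {counts[name]} times")
```

A script like this does not replace reading the logs and the model overview; it only makes recurring symptoms, such as a full propagate on every restart, harder to overlook from week to week.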

 

4. Perform risk-impact analysis

An ad hoc update of a software component on the application server broke the connections to the visualization engines and remote job servers. After the update, all engines terminated and restarted automatically, creating a huge load on the application engine that led to out-of-memory exceptions. This shows that the connection between the application server and its dependent engines is critical, so precautions must be taken to avoid disruptions in this connection. The key to ensuring the availability of the solution is therefore to understand its technical architecture and, based on that, take precautions in the infrastructure to prevent such issues from occurring.
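One generic precaution against many engines reconnecting at once is to spread their restarts out in time. The sketch below shows the well-known "exponential backoff with full jitter" technique. It is an assumption on our side, not a description of how DELMIA Quintiq engines reconnect; whether it can be applied depends on how the engines are started in your infrastructure (for example, by service wrapper scripts). The engine names are hypothetical.

```python
import random
from typing import Dict, List, Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Exponential backoff with full jitter.

    The ceiling doubles with each failed attempt (base * 2**attempt) up to
    `cap` seconds; the actual delay is drawn uniformly from [0, ceiling] so
    that engines which failed at the same moment do not retry in lockstep.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    if rng is None:
        rng = random.Random()
    # Clamp the exponent so very high attempt counts cannot overflow a float.
    ceiling = min(cap, base * 2.0 ** min(attempt, 32))
    return rng.uniform(0.0, ceiling)


def plan_restarts(engine_names: List[str], attempt: int,
                  rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Assign each (hypothetical) engine its own jittered restart delay."""
    if rng is None:
        rng = random.Random()
    return {name: backoff_delay(attempt, rng=rng) for name in engine_names}


if __name__ == "__main__":
    # Hypothetical engine names, for illustration only.
    engines = [f"visualization-{i}" for i in range(5)] + [f"jobserver-{i}" for i in range(3)]
    plan = plan_restarts(engines, attempt=3)
    for name, delay in sorted(plan.items(), key=lambda kv: kv[1]):
        print(f"{name}: restart after {delay:.1f} s")
```

The random jitter is the important part: a fixed delay would only postpone the peak, whereas spreading the restarts gives the application engine room to handle the reconnections one by one.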

 

I hope you enjoyed this post. Please let me know your comments or questions.