Why benchmarking is essential to accept optimizers

VW 2021-12-30

Developing optimizers is a different game than writing quill to derive attributes, set relations, create objects, import data, etc. When writing quill, it is in most cases clear what the end result should be and how it should be tested and accepted. When creating an optimizer, this is not so clear. Usually, when customers start testing an optimizer in an unstructured way - let's call it happy-optimizer-testing - they will start commenting on what they do (not) like about the plan. They might make comments like "OTIF is 90% and we want to have at least 95%", or "this machine should be fully utilized for the first 20 days and it isn't".

When confronted with such 'issues', many of our ORS colleagues will try to make sense of these comments, and even start changing the optimizer to produce something more in line with what the customer wants. In many cases, they only find out that the 'fixed' optimizer invites different, but similar, comments. Soon, the ORS will find that he or she is pushing bubbles out of the carpet - every bubble that is pushed down introduces another bubble at some other spot.

This is why Quintiq introduced the benchmarking process, almost 10 years ago. What are the essential elements of benchmarking?

The benchmark is a static dataset
The data in the benchmark does not have sanity check violations
The quality of the plan in the benchmark is measured by the value of the total KPI
There are no hard constraint violations in the benchmark
There is an agreement with the customer that the optimizer should generate a plan that is within X% of the total KPI value of the benchmark (where X is typically 1 or 2%)
The optimizer gets a maximum running duration Y to get this result
The customer also gets a maximum 'puzzling time' to get the best possible value for the total KPI
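
The elements above can be sketched as a simple acceptance check. This is only an illustration in Python, not Quintiq code: the names (Benchmark, accepts, tolerance_pct, max_seconds) are hypothetical, and it assumes that a higher total KPI means a better plan and that the benchmark KPI is non-zero.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    """A static dataset with a reference plan (hypothetical structure)."""
    name: str
    total_kpi: float        # total KPI of the benchmark plan (assumed: higher is better)
    sanity_violations: int  # must be 0 for a valid benchmark
    hard_violations: int    # must be 0 for a valid benchmark


def is_valid(benchmark: Benchmark) -> bool:
    """A benchmark has no sanity check violations and no hard constraint violations."""
    return benchmark.sanity_violations == 0 and benchmark.hard_violations == 0


def gap_pct(benchmark_kpi: float, optimizer_kpi: float) -> float:
    """Percentage by which the optimizer falls short of the benchmark KPI.

    Negative values mean the optimizer beat the benchmark.
    """
    if benchmark_kpi == 0:
        raise ValueError("benchmark KPI must be non-zero to compute a relative gap")
    return (benchmark_kpi - optimizer_kpi) / abs(benchmark_kpi) * 100.0


def accepts(benchmark: Benchmark, optimizer_kpi: float, run_seconds: float,
            tolerance_pct: float, max_seconds: float) -> bool:
    """True if the optimizer is within X% of the benchmark within duration Y."""
    if not is_valid(benchmark):
        raise ValueError(f"benchmark {benchmark.name!r} has violations")
    if run_seconds > max_seconds:
        return False
    return gap_pct(benchmark.total_kpi, optimizer_kpi) <= tolerance_pct


if __name__ == "__main__":
    bench = Benchmark("week 12", total_kpi=1000.0, sanity_violations=0, hard_violations=0)
    # X = 2%, Y = 600 seconds
    print(accepts(bench, optimizer_kpi=985.0, run_seconds=500, tolerance_pct=2.0, max_seconds=600))
```

With X and Y agreed up front, acceptance becomes a yes/no question about two numbers rather than a discussion about individual features of the plan.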

So, what are the advantages of the above approach? Firstly, the ORS can aim at a fixed target instead of a moving one, as the data is static. Secondly, the total KPI is the ultimate referee that determines whether a plan is good or bad - no subjective interpretations or loose anecdotal comments. Thirdly, the customer is required to 'prove' that the plan can indeed be improved, instead of making unjustified claims. And lastly, we are now all working toward the same objective, which is the total KPI.

Of course, there are also drawbacks to this process:

When the optimizer works great on the benchmark dataset, this is no guarantee that it will work well on future data configurations. Optimizers are sensitive things, and when the data changes, results can go bad. Customers expect some robustness from our software, which is completely understandable. This problem can be mitigated by having multiple benchmarks with different configurations.
The total KPI might not always represent the business objectives of the customer well. As a result, much time will be needed to tune the weights of the objective function. This tuning must be completed before benchmarks can be created.
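
The mitigation for the first drawback, several benchmarks with different configurations, could be tracked with a small report like the one below. This is a hedged sketch: the function names and the benchmark names in the usage example are made up, and it again assumes that a higher total KPI is better.

```python
from typing import Dict, List, Tuple


def gap_pct(benchmark_kpi: float, optimizer_kpi: float) -> float:
    """Percentage by which the optimizer falls short of the benchmark KPI."""
    if benchmark_kpi == 0:
        raise ValueError("benchmark KPI must be non-zero to compute a relative gap")
    return (benchmark_kpi - optimizer_kpi) / abs(benchmark_kpi) * 100.0


def robustness_report(
    results: Dict[str, Tuple[float, float]], tolerance_pct: float
) -> Tuple[Dict[str, float], List[str], str]:
    """Summarize optimizer results over several benchmarks.

    results maps a benchmark name to (benchmark total KPI, optimizer total KPI).
    Returns the gap per benchmark, the sorted names of benchmarks that exceed
    the tolerance, and the name of the benchmark with the largest gap.
    """
    if not results:
        raise ValueError("at least one benchmark result is required")
    gaps = {name: gap_pct(b, o) for name, (b, o) in results.items()}
    failing = sorted(name for name, gap in gaps.items() if gap > tolerance_pct)
    worst = max(gaps, key=lambda name: gaps[name])
    return gaps, failing, worst


if __name__ == "__main__":
    # Hypothetical benchmark names and KPI values, for illustration only.
    results = {
        "peak season": (1000.0, 990.0),
        "low demand": (500.0, 480.0),
        "mixed": (800.0, 796.0),
    }
    gaps, failing, worst = robustness_report(results, tolerance_pct=2.0)
    for name, gap in gaps.items():
        print(f"{name}: {gap:.1f}% below benchmark")
    print("failing:", failing, "| worst:", worst)
```

Reporting the worst gap rather than the average keeps attention on the configuration where the optimizer is least robust.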

Despite these drawbacks, the benchmarking approach is the only method we know of that makes optimizer testing a finite process. It is objective and reproducible. Thanks in part to benchmarking, our delivery of optimization in projects has made a giant leap. It is crucial to explain to the customer from the very beginning how we will approach this part of the project, and why we believe this will lead to a sound deliverable.