Forcing Pipeline Pilot to Release Memory

IK 2015-01-30

Sometimes it is useful to have more control over how memory is allocated and used in Pipeline Pilot. One way to get it is to use a parallel subprotocol that runs on your local server. This feature is generally used to take advantage of multiple CPUs for better performance, but it can also be used to ensure that a subprotocol's memory footprint gets cleaned up. This works because each batch runs as a separate process on the server, and when that batch completes, the process is cleaned up and its memory is released.
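The same principle can be illustrated outside Pipeline Pilot. The sketch below is plain Python, not Pipeline Pilot code or its API, and the worker script and function names are hypothetical. It only shows the general idea: when each batch is handled by a short-lived child process, whatever that process allocated is returned to the operating system as soon as it exits.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for a memory-heavy subprotocol. It runs in its own
# Python process, so everything it allocates is freed when the process exits.
WORKER_SOURCE = """
import json, sys
records = json.load(sys.stdin)
scratch = [[r] * 1000 for r in records]  # simulate a large temporary allocation
json.dump([sum(chunk) for chunk in scratch], sys.stdout)
"""


def batches(records, batch_size):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def run_in_batches(records, batch_size):
    """Process each batch in a separate, short-lived child process."""
    results = []
    for batch in batches(records, batch_size):
        completed = subprocess.run(
            [sys.executable, "-c", WORKER_SOURCE],
            input=json.dumps(batch),  # send the batch to the child on stdin
            capture_output=True,
            text=True,
            check=True,               # raise if the child process fails
        )
        results.extend(json.loads(completed.stdout))
    return results
```

Calling run_in_batches(list(range(10)), 4) would start three child processes in turn (for 4, 4 and 2 records), and the parent process never holds the temporary "scratch" data itself.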

To set this up, you first have to create a subprotocol, which you can do by selecting your memory-heavy components, right-clicking and selecting "Collapse to Subprotocol". Then, make sure the new subprotocol is selected, click on the "Implementation" tab and change "Parallel Processing Options" to true. Next, specify your server name and the number of processes you want to run. If your only problem is memory, I would recommend using a single process. Remember that the more processes you run, the more system memory you will use.

For batch size, pick a number that is low enough to fit within your memory constraints, but high enough that the small per-batch overhead of starting each process doesn't noticeably increase your run time. You may have to experiment to get this just right. If everything is set up correctly, you should now be able to run your protocol and see your records being processed in multiples of your batch size.
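The batch-size trade-off described above can be estimated with some rough arithmetic before you start experimenting. This is only a back-of-the-envelope sketch in Python: the function names are hypothetical, and the memory-per-record and per-batch startup figures are assumptions you would have to measure on your own server.

```python
import math


def largest_batch_size(memory_budget_mb, memory_per_record_mb):
    """Largest batch whose records should fit in the memory budget (at least 1)."""
    return max(1, int(memory_budget_mb // memory_per_record_mb))


def overhead_seconds(total_records, batch_size, startup_seconds_per_batch):
    """Total time spent starting batches, assuming a fixed startup cost per batch."""
    number_of_batches = math.ceil(total_records / batch_size)
    return number_of_batches * startup_seconds_per_batch


# Assumed figures for illustration only: 2 GB budget, 4 MB per record,
# 100,000 records, half a second of startup cost per batch.
batch_size = largest_batch_size(2048, 4)
print(batch_size, overhead_seconds(100_000, batch_size, 0.5))
```

Under this simple model, halving the batch size roughly halves the peak memory of a batch and roughly doubles the total startup overhead, which is why a value near the top of what your memory allows is usually a sensible starting point.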

If you have any other tips or tricks related to this topic, please comment below. Thank you!