How to improve cache performance on a distributed system
Pipeline Pilot relies on the file system to operate. In a distributed environment, the documented implementation uses a Network File System (NFS) share. When configured according to the documentation, the system is robust and allows multiple Pipeline Pilot servers to cooperate.
There are two possible distributed implementations: GRID (Linux) and Load Balanced (Windows and Linux). The approach described here works for both.
How?
First, configure the Local Temp Directory to point to a local folder. This can be done from the Admin Portal under Setup > Folder Locations (details and instructions are at the bottom of this post).
In the enclosed example, I use the hidden parameter CacheCategory on the cache component. When this parameter is set to $(LocalJobTempDirectory) and the CacheID Scope is set to Cache Category, the cache component stores its data locally on the execution server, avoiding the overhead of reading and writing the cache over NFS.
Note that the cache is stored on the execution host that created it and is not accessible from other nodes.
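To illustrate the trade-off described above, here is a minimal, language-neutral sketch in Python. It is not Pipeline Pilot code: the class LocalCategoryCache and its put/get methods are hypothetical, and the "node-local root" simply stands in for the folder that $(LocalJobTempDirectory) resolves to. It shows that each category gets its own folder on local disk, and that a value cached on one node is simply a cache miss on another.

```python
import pickle
import tempfile
from pathlib import Path


class LocalCategoryCache:
    """Hypothetical sketch: a cache whose entries live under a node-local
    directory, grouped by a cache category (mirroring the idea of pointing
    CacheCategory at the job's local temp directory)."""

    def __init__(self, local_root: str, category: str) -> None:
        # Each category gets its own subfolder under the node-local root.
        self.base = Path(local_root) / category
        self.base.mkdir(parents=True, exist_ok=True)

    def _path(self, cache_id: str) -> Path:
        return self.base / f"{cache_id}.pkl"

    def put(self, cache_id: str, value) -> None:
        # Written to local disk only; no traffic to a shared NFS mount.
        with open(self._path(cache_id), "wb") as fh:
            pickle.dump(value, fh)

    def get(self, cache_id: str, default=None):
        path = self._path(cache_id)
        if not path.exists():
            # Entries written on another node are invisible here: a miss.
            return default
        with open(path, "rb") as fh:
            return pickle.load(fh)


if __name__ == "__main__":
    # Two temporary directories simulate the local disks of two nodes.
    with tempfile.TemporaryDirectory() as node_a, tempfile.TemporaryDirectory() as node_b:
        cache_a = LocalCategoryCache(node_a, "fingerprints")
        cache_a.put("mol-42", [1, 0, 1])
        print(cache_a.get("mol-42"))          # [1, 0, 1]
        cache_b = LocalCategoryCache(node_b, "fingerprints")
        print(cache_b.get("mol-42", "miss"))  # miss
```

The design choice is the same one the post makes: you gain local-disk speed and lose visibility across nodes, so a protocol should only rely on such a cache within the job (and host) that filled it.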
[Attachments in the original post: Execution example; Implementation; Example Protocol]
Local Temp Directory: Details and Instructions.
(From the product documentation)
The local temporary directory can be used to place temporary job files on a file system that is local to the running job. This is useful when the server directory is located on a shared resource such as an NFS mount. Using local temporary storage will improve protocol performance and reduce network loads on cluster and grid systems.
When running some applications in this way, you may be advised to reset this folder to a local path that is valid for each node of the cluster. Each job running on the cluster will then create a job-specific subfolder for temporary files, accessible in the protocol by the global variable "LocalJobTempDirectory". Note that job result files should not be written to this location, which is cleaned up at the end of the job.
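The per-job lifecycle described above (a job-specific subfolder under the local path, removed when the job ends) can be sketched as a Python context manager. This is an illustration of the behavior, not how Pipeline Pilot implements it; the function name local_job_temp_dir and the job ID format are hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def local_job_temp_dir(local_root: str, job_id: str) -> Iterator[Path]:
    """Hypothetical sketch: create a job-specific subfolder under a node-local
    root and delete it when the job finishes, whether it succeeds or fails."""
    job_dir = Path(local_root) / job_id
    # exist_ok=False: each job must get its own, fresh folder.
    job_dir.mkdir(parents=True, exist_ok=False)
    try:
        yield job_dir
    finally:
        # Scratch space only: everything here is removed at job end,
        # so result files must be written somewhere else.
        shutil.rmtree(job_dir, ignore_errors=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as node_root:
        with local_job_temp_dir(node_root, "job-0001") as tmp:
            (tmp / "intermediate.dat").write_text("scratch")
            print(tmp.exists())  # True
        print(tmp.exists())      # False
```

This is why the documentation warns against writing job results to this location: anything left in the folder disappears with the job.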
By default, this temporary file folder is the same as the main jobs folder and, with this setting, the temporary files are all written to the standard job temporary file subfolder. (i.e. by default, the global variables "LocalJobTempDirectory" and "TempDir" indicate the same folder location.)
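The default described above can be summarized with a small Python sketch. The folder layout under the jobs root ("<jobs_root>/<job_id>/temp") and the function name resolve_temp_dirs are assumptions made purely for illustration; only the rule itself comes from the documentation: with no local temp directory configured, LocalJobTempDirectory and TempDir point to the same folder.

```python
from pathlib import Path
from typing import Dict, Optional


def resolve_temp_dirs(jobs_root: str, job_id: str,
                      local_temp_root: Optional[str] = None) -> Dict[str, Path]:
    """Hypothetical sketch of the default behavior: without a configured
    local temp root, both variables resolve to the same job temp folder."""
    # Assumed layout, for illustration only.
    temp_dir = Path(jobs_root) / job_id / "temp"
    if local_temp_root is None:
        local_job_temp = temp_dir  # default: same folder as TempDir
    else:
        local_job_temp = Path(local_temp_root) / job_id  # node-local subfolder
    return {"TempDir": temp_dir, "LocalJobTempDirectory": local_job_temp}


if __name__ == "__main__":
    default = resolve_temp_dirs("/shared/jobs", "job-0001")
    print(default["TempDir"] == default["LocalJobTempDirectory"])  # True
    local = resolve_temp_dirs("/shared/jobs", "job-0001", "/scratch/pp")
    print(local["TempDir"] == local["LocalJobTempDirectory"])      # False
```

In other words, the caching technique in this post only moves data off NFS once the Local Temp Directory has actually been changed to a local path.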
