Pipeline Pilot Python Jupyter Notebook

MC 2020-08-17

Pipeline Pilot 2020 introduced a component allowing developers to utilize Jupyter Notebook (https://jupyter.org/index.html) as a means to build Python scripts within the context of Pipeline Pilot.

The component is located at Components\Database and Application Integration\Utilities\Prototypes\Application Integration\Jupyter Notebook\Python Jupyter Notebook (on Server).

The installation comes with an example protocol (Protocols\Examples\Integration\Programming\Python\Jupyter Notebook\Filter Using DataFrame) that demonstrates the Jupyter Notebook component. To see how it works, load the example protocol into the Pipeline Pilot client and run it to load the data into the Jupyter Notebook components. Then select either of the “Python Jupyter Notebook (on Server)” components and click to open the notebook component's script parameter.

Note: your Pipeline Pilot administrator must have enabled the Jupyter Notebook web service. Also note that the cells in the notebook can only be executed after you have run the protocol (the pipe containing the Jupyter Notebook component).

The notebook, which is divided into cells, starts with an explanation of the requirements for use within Pipeline Pilot and then loads some libraries to facilitate that usage. Executing a cell lets you, as a developer, quickly get a handle on the data available to the script. Remember to always start execution from the top so that all dependencies (e.g. loaded libraries) have been run before the current cell, or simply use one of the “Cell” menu options such as “Run All”.

To view the actual data at any time, enter the name of the variable of interest on the last line of a cell and execute the cell.
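As a rough illustration of inspecting data in a cell, here is a minimal sketch. It assumes the incoming records are available as a pandas DataFrame; the variable name df and the column names are hypothetical stand-ins, not names taken from the example protocol, and the data is built by hand so the cell runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the data the component hands to the notebook.
# In the real notebook the DataFrame would already exist after running the protocol.
df = pd.DataFrame({
    "Name": ["aspirin", "caffeine", "ibuprofen"],
    "Molecular_Weight": [180.16, 194.19, 206.28],
})

# Column names and types give a quick overview of what is available.
column_types = df.dtypes

# Ending a cell with a bare variable name (or expression) renders its value
# beneath the cell when the cell is executed in Jupyter.
df.head()
```

In Jupyter only the last expression in a cell is displayed automatically, so use print() or display() for anything earlier in the cell.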

Attached to this post is a simple example using KMeans clustering from the scikit-learn package (https://scikit-learn.org/stable/modules/clustering.html).
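For readers without access to the attachment, the following is a minimal sketch of what a KMeans cell might look like. It is not the attached example: the synthetic data, the column names x and y, and the choice of two clusters are all assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): two well-separated groups of points.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2))
group_b = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(20, 2))
df = pd.DataFrame(np.vstack([group_a, group_b]), columns=["x", "y"])

# Fit KMeans with two clusters and store each row's cluster label in a new column.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = model.fit_predict(df[["x", "y"]])

# Cluster sizes, handy for a quick sanity check in the notebook.
cluster_sizes = df["cluster"].value_counts()
```

In a real protocol the feature columns would come from the data passed into the component, and the number of clusters would need to be chosen to suit that data.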