Pipeline Pilot Spotlight: How I used BIOVIA Pipeline Pilot to Analyze Public COVID-19 Data

Authored by @MH 

The purpose of this post is to show how BIOVIA Pipeline Pilot components can be used to retrieve and process public data using Web Services. It is not intended to communicate any conclusions or inferences about the actual data.

The general approach can be summarized as:

  • Identify your objective
  • Find the appropriate data sources
  • Select the appropriate components to read the data
  • Remove unwanted data
  • Process and display the results

Data Sources

The sources below offer many services that return different types and aggregations of data. It can take some searching to find the ones that best fit your needs.

https://covid19api.com/  

https://data.cdc.gov

Components

A basic HTTP Connector can call REST services. In this case, the Source property is a simple string, https://api.covid19api.com/summary, and the result is stored in a cache for later processing. You can click the link to see the JSON data structure.
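
For readers who want to see the same idea outside Pipeline Pilot, here is a minimal Python sketch of "call a REST service and keep the parsed JSON for later". It is not how the HTTP Connector is implemented. The `CACHE` dictionary and the `fetch_to_cache` function are hypothetical names introduced for illustration, and the `opener` parameter exists only so the call can be swapped out for testing.

```python
import json
from urllib.request import urlopen

# Hypothetical in-memory stand-in for a Pipeline Pilot cache (an assumption,
# not the product's actual cache mechanism).
CACHE = {}


def fetch_to_cache(cache_id, url, opener=urlopen):
    """GET a REST endpoint, parse the JSON body, and store it under cache_id."""
    with opener(url) as response:
        payload = json.loads(response.read().decode("utf-8"))
    CACHE[cache_id] = payload
    return payload
```

Calling `fetch_to_cache("Summary", "https://api.covid19api.com/summary")` would store the parsed response under the key `Summary`, assuming the service is reachable and returns JSON.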

Sometimes the Source is a more complex string that needs to be evaluated at runtime. In this case, a PilotScript component is needed to build the Source string:

 Source = @USALatest
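
The post does not show how the value behind @USALatest is assembled, so the following Python sketch only illustrates the general idea of building a request URL at run time from pieces such as a country name or a date. The function name `build_source`, its parameters, and the example endpoint in the note below are all hypothetical.

```python
from urllib.parse import quote, urlencode


def build_source(base_url, path_parts, query=None):
    """Assemble a request URL from parts that are only known at run time."""
    # Percent-encode each path segment so spaces and slashes are safe.
    path = "/".join(quote(str(part), safe="") for part in path_parts)
    url = f"{base_url.rstrip('/')}/{path}"
    if query:
        # Append an encoded query string such as ?from=2020-03-01
        url += "?" + urlencode(query)
    return url
```

For example, `build_source("https://example.org/api", ["country", "united states"], {"from": "2020-03-01"})` yields a URL with the space encoded as `%20` and the date passed as a query parameter.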


Some data was published on Web pages rather than through a Web service. In those cases, it was easier to copy the text from the page, paste it into a spreadsheet, and read it into a cache with an Excel Reader.

We then get into the meat of the processing:

  • Read the cached JSON data
  • Convert the JSON into data records
  • Remove the Response data (do this after all HTTP calls)
  • Detach the //Countries node from the other data
  • Flatten the Countries data hierarchy
  • Join this Countries data in the Summary cache with the Economic Development data in the CountryEconDev cache by using a Join Data from Cache component and setting these properties:
    • JoinUsing: Country
    • CacheID: CountryEconDev
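
The steps above can be approximated in plain Python as a rough sketch. It assumes the summary JSON has a top-level `Countries` list whose entries carry a `Country` field, and that the economic development records carry `Country` too. The function names are hypothetical, and the inner-join behavior is an assumption, since the post does not say how the Join Data from Cache component treats unmatched records.

```python
import json


def detach_countries(summary_json):
    """Parse the cached JSON text and return only the list under 'Countries'."""
    document = json.loads(summary_json)
    return document.get("Countries", [])


def flatten(record, prefix=""):
    """Collapse nested dicts into one level, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def join_on_country(countries, econ_dev):
    """Join country records with lookup records on 'Country' (inner join assumed)."""
    lookup = {row["Country"]: row for row in econ_dev}
    joined = []
    for row in countries:
        match = lookup.get(row["Country"])
        if match is not None:
            # Fields from the lookup record are added to the country record.
            joined.append({**row, **match})
    return joined
```

A typical call chain would be `join_on_country([flatten(c) for c in detach_countries(text)], econ_dev_rows)`, where `text` is the cached JSON and `econ_dev_rows` is the list read from the spreadsheet.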

One branch of the processing then:

  • Calculates summations of these fields categorized by EconLevel:
    • TotalConfirmed, TotalDeaths, NewConfirmed, NewRecovered, TotalRecovered
  • Uses PilotScript to create new fields for display
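
The summation step in this branch amounts to a group-by on EconLevel. A minimal Python sketch, assuming each joined record is a dictionary with an `EconLevel` field and numeric values for the listed fields, might look like this. The function name `sum_by_econ_level` is hypothetical, and the display fields created in PilotScript are not reproduced here because the post does not list them.

```python
from collections import defaultdict

# Field names taken from the post.
SUM_FIELDS = [
    "TotalConfirmed",
    "TotalDeaths",
    "NewConfirmed",
    "NewRecovered",
    "TotalRecovered",
]


def sum_by_econ_level(records, fields=SUM_FIELDS):
    """Sum the given fields per EconLevel and return one row per level."""
    totals = defaultdict(lambda: dict.fromkeys(fields, 0))
    for record in records:
        bucket = totals[record["EconLevel"]]
        for field in fields:
            # Missing fields are treated as zero (an assumption).
            bucket[field] += record.get(field, 0)
    # Rows come back sorted on EconLevel so they are ready for charting.
    ordered = sorted(totals.items(), key=lambda item: item[0])
    return [{"EconLevel": level, **sums} for level, sums in ordered]
```

The result is a list with one dictionary per economic level, which is the shape a bar chart or table would consume.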

Finally, we:

  • Sort the data on EconLevel
  • Send it to a Bar Chart report
  • Tile it horizontally with a Sortable Table from another branch (chart and table on the same row)
  • View it in a browser page


Other parts of the analysis display XY charts plotting the Death Rate vs. various factors for each country (smoking rate, obesity rate, health care costs, median age and population density) to see if there are any visual trends. There is also an R correlation matrix that calculates and displays correlations between each factor.
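
The protocol uses R for the correlation matrix. As a stand-in, here is a small Python sketch of the same kind of calculation. It assumes Pearson correlation, which the post does not state, and it expects each factor as an equal-length list of numbers. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict of pairwise correlations keyed by column name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Passing a dictionary such as `{"MedianAge": [...], "DeathRate": [...]}` returns a nested dictionary where `matrix["MedianAge"]["DeathRate"]` holds the coefficient for that pair.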

There's much more that can be done with BIOVIA Pipeline Pilot for chemistry, biology, data pipelining and AI/ML.
