Authored by @MH
The purpose of this post is to show how BIOVIA Pipeline Pilot components can be used to retrieve and process public data using Web Services. It is not intended to communicate any conclusions or inferences about the actual data.
The general approach can be summarized as:
- Identify your objective
- Find the appropriate data sources
- Select the appropriate components to read the data
- Remove unwanted data
- Process and display the results
Data Sources
These sources offer many services that return different types and aggregations of data. It can take some searching to find the ones that best fit your needs.
Components
A basic HTTP Connector can call REST services. Here the Source property is a simple string, https://api.covid19api.com/summary, and the result is stored in a cache for later processing. You can open the link to see the JSON data structure.
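For readers who think in code, the same "call a REST service and cache the result" step can be sketched in Python. This is only an analogue, not how the HTTP Connector works internally: the `CACHE` dictionary and the `fetch_json_to_cache` helper are hypothetical names introduced for illustration, and the `opener` parameter exists so the function can be exercised without a live network call.

```python
import json
import urllib.request

# In-memory stand-in for a Pipeline Pilot cache (hypothetical, not a product API).
CACHE = {}


def fetch_json_to_cache(url, cache_id, opener=urllib.request.urlopen, timeout=30):
    """GET `url`, parse the response body as JSON, and store it under `cache_id`.

    `opener` defaults to urllib's urlopen; any callable that returns a
    context manager with a read() method can be substituted.
    """
    with opener(url, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    CACHE[cache_id] = payload
    return payload
```

Calling `fetch_json_to_cache("https://api.covid19api.com/summary", "Summary")` would mirror the Source and cache settings described above, assuming the service is reachable.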
Sometimes the Source is a more complex string that needs to be evaluated at runtime. In this case, a PilotScript component is needed to build the Source string:
Source = @USALatest
Some data sources were Web pages rather than Web services. In those cases it was easier to copy the text from the page, paste it into a spreadsheet, and read it into a cache with an Excel Reader.
We then get into the meat of the processing:
- Read the cached JSON data
- Convert the JSON into data records
- Remove the Response data (do this after all HTTP calls)
- Detach the //Countries node from the other data
- Flatten the Countries data hierarchy
- Join the Countries data in the Summary cache with the Economic Development data in the CountryEconDev cache, using a Join Data from Cache component with these properties:
- JoinUsing: Country
- CacheID: CountryEconDev
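The steps above can be approximated in plain Python to make the data flow concrete. This is a sketch under stated assumptions: the sample JSON is a made-up miniature of a summary response with a top-level Countries list, the country names and EconLevel values are invented, and the join drops unmatched records, which may or may not match how the Join Data from Cache component is configured.

```python
import json

# Made-up sample shaped like a summary response: a top-level object with a
# "Countries" list. All values are placeholders, not real data.
SUMMARY_JSON = """
{
  "Global": {"TotalConfirmed": 30},
  "Countries": [
    {"Country": "Aland", "TotalConfirmed": 10, "TotalDeaths": 1},
    {"Country": "Borduria", "TotalConfirmed": 20, "TotalDeaths": 2}
  ]
}
"""

# Stand-in for the CountryEconDev cache; the EconLevel values are invented.
COUNTRY_ECON_DEV = [
    {"Country": "Aland", "EconLevel": "Developed"},
    {"Country": "Borduria", "EconLevel": "Developing"},
]


def detach_countries(summary):
    """Return the records under the Countries node, one flat dict per country."""
    return [dict(row) for row in summary.get("Countries", [])]


def join_on(records, lookup_rows, key):
    """Inner-join `records` to `lookup_rows` on `key`; unmatched records are dropped."""
    lookup = {row[key]: row for row in lookup_rows}
    joined = []
    for record in records:
        match = lookup.get(record[key])
        if match is not None:
            joined.append({**record, **match})
    return joined


countries = detach_countries(json.loads(SUMMARY_JSON))
joined = join_on(countries, COUNTRY_ECON_DEV, "Country")
```

After the join, each country record carries both its case counts and its EconLevel, which is what the later summation step relies on.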
One branch of the processing then:
- Calculates summations of these fields categorized by EconLevel:
- TotalConfirmed, TotalDeaths, NewConfirmed, NewRecovered, TotalRecovered
- Uses PilotScript to create new fields for display
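The summation step in this branch can be sketched in Python as a simple group-and-sum. This is a rough analogue rather than the protocol's own logic: the `sum_by_level` helper is a hypothetical name, missing fields are assumed to count as zero, and the output is sorted on the group key only to make the result deterministic.

```python
from collections import defaultdict

# Fields summed per group, as listed in the text above.
SUM_FIELDS = [
    "TotalConfirmed",
    "TotalDeaths",
    "NewConfirmed",
    "NewRecovered",
    "TotalRecovered",
]


def sum_by_level(records, group_key="EconLevel", fields=SUM_FIELDS):
    """Sum `fields` across `records`, producing one output row per group value."""
    totals = defaultdict(lambda: {field: 0 for field in fields})
    for record in records:
        bucket = totals[record[group_key]]
        for field in fields:
            # Assumption: a record missing a field contributes zero to that sum.
            bucket[field] += record.get(field, 0)
    # Emit rows sorted on the group key for a stable order.
    return [{group_key: level, **totals[level]} for level in sorted(totals)]
```

Each returned row holds one EconLevel value and the five summed fields, which is the shape a bar chart with one bar group per level would need.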
Finally, we:
- Sort the data on EconLevel
- Send it to a Bar Chart report
- Tile it horizontally with a Sortable Table from another branch (chart and table on the same row)
- View it in a browser page
Other parts of the analysis display XY charts plotting the Death Rate vs. various factors for each country (smoking rate, obesity rate, health care costs, median age and population density) to see if there are any visual trends. There is also an R correlation matrix that calculates and displays correlations between each factor.
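To illustrate what a correlation matrix between factors computes, here is a minimal Python sketch. It assumes Pearson correlation, which the post does not specify, and it is not the R code used in the protocol; the `pearson` and `correlation_matrix` helpers are hypothetical names introduced for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each pair of column names to the Pearson correlation of their values.

    `columns` is a dict of name -> list of numbers, one value per country.
    """
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Passing a dict with one list per factor (for example death rate, median age, and population density, aligned by country) returns a nested dict that can be rendered as a table.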
There's much more that can be done with BIOVIA Pipeline Pilot for chemistry, biology, data pipelining and AI/ML.