R Custom Script for Each Data

If you have the R Statistics collection and have learned enough about R to write simple scripts, you may have used the R Custom Script component. This component places the entire data stream into an R data frame and applies your script to it.

Sometimes you may instead want to apply an R script to each data record separately rather than to the data stream as a whole. Currently you can do this with a run-to-completion subprotocol, but that approach starts up and shuts down R for every data record. When there are many records, this repeated overhead can slow the calculation down considerably.

I have implemented an R Custom Script for Each Data component that applies an R script to each data record while starting R only once. The component is attached, along with a simple example that uses it to perform a logistic regression on each record in a set of dose-response data. To use the component, you must be running Pipeline Pilot version 7.0 or later and have the R Statistics collection installed.
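To make the difference between the two approaches concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: `ScriptSession`, `per_record_subprotocol`, and `for_each_data` are hypothetical names invented for illustration, not Pipeline Pilot or R APIs, and the "session" simply counts how many times it is started instead of launching a real R process.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A data record is modeled as a simple name -> value mapping (assumption).
Record = Dict[str, float]
Script = Callable[[Record], Record]


@dataclass
class ScriptSession:
    """Hypothetical stand-in for an R process; counts expensive startups."""
    startups: int = 0
    running: bool = False

    def start(self) -> None:
        # In a real system this is the costly step (launching R).
        self.startups += 1
        self.running = True

    def stop(self) -> None:
        self.running = False

    def run(self, script: Script, record: Record) -> Record:
        # Apply the user's script to a single record.
        if not self.running:
            raise RuntimeError("session not started")
        return script(record)


def per_record_subprotocol(records: List[Record], script: Script) -> Tuple[List[Record], int]:
    """Run-to-completion style: start and stop the session for every record."""
    session = ScriptSession()
    results = []
    for record in records:
        session.start()
        results.append(session.run(script, record))
        session.stop()
    return results, session.startups


def for_each_data(records: List[Record], script: Script) -> Tuple[List[Record], int]:
    """Start the session once, apply the script to each record, stop once."""
    session = ScriptSession()
    session.start()
    try:
        results = [session.run(script, record) for record in records]
    finally:
        session.stop()
    return results, session.startups


if __name__ == "__main__":
    data = [{"dose": d} for d in (1.0, 2.0, 3.0)]
    add_response = lambda r: {**r, "response": 2 * r["dose"]}
    print(per_record_subprotocol(data, add_response)[1])  # startups: 3
    print(for_each_data(data, add_response)[1])           # startups: 1
```

Both functions return the same per-record results; only the number of startups differs. If one startup costs s and processing one record costs c, the per-record approach takes roughly N(s + c) for N records, versus s + Nc when the session is reused, so the savings grow with the number of records.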

This component may be added to the R Statistics collection in a future Pipeline Pilot release. (Thanks to Sarbanes-Oxley, I can't be more specific than this.) Meanwhile, please let us know if you find it useful or how we might improve it. Nancy Latimer wants me to be sure to mention that the component "makes it really easy to manipulate Gene Expression data in Pipeline Pilot." So if you're a Gene Expression collection customer, you may be especially interested in having a look.