Hello,
With the advent of the NGS collection, we have had a few requests to handle files in the Variant Call Format (VCF) (http://www.1000genomes.org/node/101). The NGS developers are well aware of this request, and it is on their radar. Until a polished, fully tested reader is available in a released version of Pipeline Pilot, I took a stab at it, and the attached protocol is the result. The protocol has the following features:
- Data in the Info column is broken out into separate properties; the names of these properties come from the Info column itself
- Data in the sample column(s) is broken out into separate properties; the names of these properties come from the Format column and the sample name
- Manages the variable number of comment lines
- I tried two methods for this: one uses only PilotScript, the other uses a reader/writer combination. In my testing, the reader/writer combination was faster.
- Takes into account that a sample property does not always have a value for every key listed in the Format property.
- Provides different names for the DP value in the Info property and the Format property, as they represent different values.
- Handles multiple Sample properties.
- Provides a report if any alternative values are indicated in the comment lines.
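For anyone who wants to see the parsing logic outside Pipeline Pilot, here is a minimal Python sketch of the ideas in the list above. It is not the attached protocol and does not use PilotScript. It assumes a tab-delimited file with the usual VCF 4.x column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then one column per sample). The function names and the `INFO_` prefix are my own choices for illustration only.

```python
def parse_info(info):
    """Split an INFO string such as 'NS=3;DP=14;DB' into named properties."""
    props = {}
    if info in ("", "."):
        return props
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Flag entries such as 'DB' carry no value; record them as True.
        # The 'INFO_' prefix (an assumption) keeps INFO DP apart from sample DP.
        props["INFO_" + key] = value if sep else True
    return props


def parse_samples(format_field, sample_fields, sample_names):
    """Name each sample value after the sample and its FORMAT key."""
    props = {}
    keys = format_field.split(":")
    for name, sample in zip(sample_names, sample_fields):
        values = sample.split(":")
        for i, key in enumerate(keys):
            # A sample may list fewer values than FORMAT has keys.
            props[f"{name}_{key}"] = values[i] if i < len(values) else None
    return props


def read_vcf(lines):
    """Return (comment_lines, records) from an iterable of VCF text lines."""
    meta, header, records = [], None, []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            # Variable number of comment (meta-information) lines.
            meta.append(line)
        elif line.startswith("#"):
            # Column header line; drop the leading '#'.
            header = line[1:].split("\t")
        else:
            if header is None:
                raise ValueError("data line found before the #CHROM header")
            fields = line.split("\t")
            record = dict(zip(header[:7], fields[:7]))
            record.update(parse_info(fields[7]))
            if len(fields) > 9:
                record.update(parse_samples(fields[8], fields[9:], header[9:]))
            records.append(record)
    return meta, records
```

With a data line whose Info value is `NS=3;DP=14;DB`, whose Format value is `GT:GQ:DP`, and a sample column named `NA00001`, this yields separate `INFO_DP` and `NA00001_DP` properties, so the two DP values do not collide.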
Hope this is helpful.
Jeannine