Info from text file | Dassault Systèmes®

HP 2015-02-04

BIOVIA Pipeline Pilot

Hello everyone,

I have a text file reporting the drugs for different target proteins, with the following structure:

GENE_CATEGORY[1] Kringle domain

GENE_ID[1][1] PLG

GENE_NAME[1][1] Plasmin

GENE_UNIPROT_ACC[1][1] P00747

GENE_UNIPROT_ID[1][1] PLMN_HUMAN

GENE_DRUGBANK_ID[1][1][1] DB00513

GENE_GENERIC_NAME[1][1][1] aminocaproic acid

GENE_INVESTIGATIONAL[1][1][1] TRUE

GENE_SMALL_MOLECULE[1][1][1] TRUE

GENE_DRUG_DESC[1][1][1] drug

GENE_DRUGBANK_ID[1][1][2] DB0008

[...]

I would like to extract the protein categories (the values under "GENE_CATEGORY") and the DrugBank IDs (the values under "GENE_DRUGBANK_ID") as two properties in Pipeline Pilot (PP). In the end I would like to have a table in PP like the following:

CATEGORY DRUGBANK_ID

Kringle domain DB00513

Kringle domain DB0008

[...] [...]

Note that the list is sorted, so there is probably no need to take into account the numbers in square brackets, which indicate which protein each drug belongs to; i.e. each GENE_DRUGBANK_ID belongs to the last GENE_CATEGORY found before it in the text.
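
To make the "last category seen" rule concrete, here is a minimal sketch in plain Python. It is not a Pipeline Pilot component or PilotScript, only an illustration of the logic. The function name `extract_pairs`, the regular expression and the `SAMPLE` string are assumptions made for this example; the sample lines are copied from the excerpt above.

```python
import re

# Assumed line layout: KEY, one or more bracketed indices, whitespace, value.
# Example: "GENE_DRUGBANK_ID[1][1][2] DB0008"
LINE_RE = re.compile(r"^(?P<key>[A-Z_]+)(?:\[\d+\])+\s+(?P<value>.*\S)\s*$")


def extract_pairs(lines):
    """Return (category, drugbank_id) tuples in file order.

    Relies on the file being sorted: each GENE_DRUGBANK_ID is paired
    with the most recent GENE_CATEGORY seen before it. The bracketed
    indices are matched but otherwise ignored.
    """
    rows = []
    current_category = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # blank lines or anything not in KEY[...] VALUE form
        key = match.group("key")
        value = match.group("value")
        if key == "GENE_CATEGORY":
            current_category = value
        elif key == "GENE_DRUGBANK_ID" and current_category is not None:
            rows.append((current_category, value))
    return rows


# Sample lines taken from the excerpt in this post.
SAMPLE = """\
GENE_CATEGORY[1] Kringle domain
GENE_ID[1][1] PLG
GENE_NAME[1][1] Plasmin
GENE_DRUGBANK_ID[1][1][1] DB00513
GENE_GENERIC_NAME[1][1][1] aminocaproic acid
GENE_DRUGBANK_ID[1][1][2] DB0008
"""

if __name__ == "__main__":
    # Print a tab-separated two-column table: CATEGORY, DRUGBANK_ID.
    print("CATEGORY\tDRUGBANK_ID")
    for category, drugbank_id in extract_pairs(SAMPLE.splitlines()):
        print(f"{category}\t{drugbank_id}")
```

The same idea should carry over to PP: read the file line by line, keep the last category in a variable that persists across records, and emit one record per GENE_DRUGBANK_ID line.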

Thanks for the help!

Matteo