Hello everyone,
I have a text file listing the drugs for different target proteins, with the following structure:
GENE_CATEGORY[1] Kringle domain
GENE_ID[1][1] PLG
GENE_NAME[1][1] Plasmin
GENE_UNIPROT_ACC[1][1] P00747
GENE_UNIPROT_ID[1][1] PLMN_HUMAN
GENE_DRUGBANK_ID[1][1][1] DB00513
GENE_GENERIC_NAME[1][1][1] aminocaproic acid
GENE_INVESTIGATIONAL[1][1][1] TRUE
GENE_SMALL_MOLECULE[1][1][1] TRUE
GENE_DRUG_DESC[1][1][1] drug
GENE_DRUGBANK_ID[1][1][2] DB0008
[...]
I would like to extract the protein categories (GENE_CATEGORY) and the DrugBank IDs (GENE_DRUGBANK_ID) as two properties in PP, so that I end up with a table in PP like the following:
CATEGORY          DRUGBANK_ID
Kringle domain    DB00513
Kringle domain    DB0008
[...]             [...]
Note that the list is sorted, so it is probably not necessary to take into account the numbers in square brackets, which indicate which protein each drug belongs to; that is, each GENE_DRUGBANK_ID belongs to the last GENE_CATEGORY that appears before it in the text.
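In case it helps to see the logic outside PP, here is a minimal Python sketch of what I mean (it is not a PP protocol). The names extract_pairs and LINE_RE are my own, and it assumes every line has the form KEY[n]... followed by whitespace and the value; lines that do not fit, such as the elided "[...]" ones, are skipped.

```python
import re

# Matches a key, its bracketed indices, and the value,
# e.g. "GENE_DRUGBANK_ID[1][1][2] DB0008".
LINE_RE = re.compile(r"^(?P<key>[A-Z_]+)(?:\[\d+\])+\s+(?P<value>.*\S)\s*$")


def extract_pairs(lines):
    """Yield (category, drugbank_id), pairing each ID with the last category seen."""
    category = None
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip elided or malformed lines
        key, value = match.group("key"), match.group("value")
        if key == "GENE_CATEGORY":
            category = value  # remember the most recent category
        elif key == "GENE_DRUGBANK_ID" and category is not None:
            yield category, value


# Sample input copied from the excerpt above (assumed to be representative).
sample = """\
GENE_CATEGORY[1] Kringle domain
GENE_ID[1][1] PLG
GENE_NAME[1][1] Plasmin
GENE_DRUGBANK_ID[1][1][1] DB00513
GENE_GENERIC_NAME[1][1][1] aminocaproic acid
GENE_DRUGBANK_ID[1][1][2] DB0008
[...]
"""

rows = list(extract_pairs(sample.splitlines()))
for category, drugbank_id in rows:
    print(f"{category}\t{drugbank_id}")
```

The bracketed indices are deliberately ignored here, which is only safe as long as the file stays sorted as described.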
Thanks for the help!
Matteo