UniProt release cannot be read with the sequence reader

PG 2009-04-21

A bug at the BioPerl level prevents UniProt files from loading. The error involves records with organism classification (OC) lines such as:

OC Bacteria; Proteobacteria; Alphaproteobacteria; Rhodospirillales;
OC Acetobacteraceae; Acetobacter; Acetobacter subgen. Acetobacter.

The error message reads:

MSG: The lineage 'Bacteria, Proteobacteria, Alphaproteobacteria, Rhodospirillales, Acetobacteraceae, Acetobacter, Acetobacter subgen, Acetobacter, Acetobacter aceti' had two non-consecutive nodes with the same name. Can't cope!

If you come across this error, use the workaround below until a release that addresses the issue is available.

Here is what the example protocol does.

Read your UniProt file with the text reader, setting the end-of-text delimiter to //, so that each record is read individually. The filter then removes the sequences that produce the error. Saving the remaining records back to a file generates a UniProt file without errors that can be read back as sequences.
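
For readers who want to apply the same idea outside the protocol, here is a minimal Python sketch of the split, filter, and save steps. It is not the protocol's actual filter. It assumes that the lineage names are obtained by splitting the OC text on both ';' and '.', which is inferred from the way 'Acetobacter subgen' and 'Acetobacter' appear as separate names in the error message above. The function names are hypothetical.

```python
import re


def split_records(text):
    """Split flat-file text into records, each terminated by a '//' line.

    Trailing text that is not closed by '//' is ignored.
    """
    records, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "//":
            records.append("\n".join(current) + "\n")
            current = []
    return records


def lineage_names(record):
    """Collect taxon names from the OC lines of one record.

    Assumption: names are separated by ';' or '.', as the error
    message suggests.
    """
    oc_text = " ".join(
        line[2:].strip()
        for line in record.splitlines()
        if line.startswith("OC ")
    )
    return [name.strip() for name in re.split(r"[;.]", oc_text) if name.strip()]


def has_nonconsecutive_duplicate(names):
    """Return True if a name reappears after at least one other name."""
    last_seen = {}
    for index, name in enumerate(names):
        if name in last_seen and index - last_seen[name] > 1:
            return True
        last_seen[name] = index
    return False


def filter_records(text):
    """Return the text with the problematic records removed."""
    return "".join(
        record
        for record in split_records(text)
        if not has_nonconsecutive_duplicate(lineage_names(record))
    )
```

To use it, read the whole UniProt file into a string, pass it to filter_records, and write the result to a new file. Note that the removed records are lost from the output, so this is only suitable when those entries are not needed.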