How to filter protein sequence data

Hi friends

1. In a protein dataset, a few records do not contain any sequence information. What code should I use to filter out such records? I tried using the data record type validator component, but couldn't filter them out.

2. How can we remove redundant sequences from the input, and how can we remove fragments, if there are any? I assume this can only be done using BLAST searches.

3. How do we remove fragments (short peptide sequences ranging in length from a few residues to hundreds of residues)?
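
To make the three questions concrete, here is a minimal sketch in plain Python of what I am after, assuming the data is in FASTA format. The function names (parse_fasta, filter_records) and the min_length cutoff are my own placeholders, not part of any existing tool, and this only catches exact duplicates. Near-identical sequences would still need a similarity search or clustering step, such as BLAST.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_records(records: Iterable[Record], min_length: int = 100) -> List[Record]:
    """Drop empty records, short fragments, and exact duplicates.

    min_length is a hypothetical cutoff; choose it for your own dataset.
    """
    seen = set()
    kept: List[Record] = []
    for header, seq in records:
        seq = seq.upper()
        if not seq:
            continue  # question 1: header with no sequence
        if len(seq) < min_length:
            continue  # question 3: fragment shorter than the cutoff
        if seq in seen:
            continue  # question 2: exact duplicate of an earlier sequence
        seen.add(seq)
        kept.append((header, seq))
    return kept


# Small made-up example: p2 is empty, p3 is a fragment, p4 duplicates p1.
example = """>p1
MKTAYIAKQR
>p2

>p3
MKT
>p4
mktayiakqr
""".splitlines()

kept = filter_records(parse_fasta(example), min_length=5)
print(kept)  # [('p1', 'MKTAYIAKQR')]
```

For a real file, the same functions could be fed an open file handle instead of the example lines. Is there a better way to do this, especially for sequences that are redundant but not identical?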