BioDB

Name: BioDB
Author: Pedro Gomez Fabre
Version: 1.0
Created: 08/2007
Modified: 08/2007

Purpose: Retrieves DNA/protein sequences from the reference servers NCBI and ExPASy. Two components must be linked together: any one of the components in "Database Locations Settings" and the FTP retrieval component.

Components:
Folder (Database Locations Settings):

For NCBI, each component retrieves the sequences of one division, identified by its three-letter extension:
PRI - primate sequences
ROD - rodent sequences
MAM - other mammalian sequences
VRT - other vertebrate sequences
INV - invertebrate sequences
PLN - plant, fungal, and algal sequences
BCT - bacterial sequences
VRL - viral sequences
PHG - bacteriophage sequences
SYN - synthetic sequences
UNA - unannotated sequences
EST - EST sequences (expressed sequence tags)
PAT - patent sequences
STS - STS sequences (sequence tagged sites)
GSS - GSS sequences (genome survey sequences)
HTG - HTG sequences (high-throughput genomic sequences)
HTC - unfinished high-throughput cDNA sequencing
ENV - environmental sampling sequences
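
The division list above can be expressed as a lookup table. The following is a minimal Python sketch, not part of the BioDB components. It assumes that GenBank release files are named "gb" + division code + number + ".seq.gz" (for example gbpri1.seq.gz); the names DIVISIONS and division_of are hypothetical.

```python
import re

# Division codes and descriptions copied from the list above.
DIVISIONS = {
    "PRI": "primate sequences",
    "ROD": "rodent sequences",
    "MAM": "other mammalian sequences",
    "VRT": "other vertebrate sequences",
    "INV": "invertebrate sequences",
    "PLN": "plant, fungal, and algal sequences",
    "BCT": "bacterial sequences",
    "VRL": "viral sequences",
    "PHG": "bacteriophage sequences",
    "SYN": "synthetic sequences",
    "UNA": "unannotated sequences",
    "EST": "EST sequences (expressed sequence tags)",
    "PAT": "patent sequences",
    "STS": "STS sequences (sequence tagged sites)",
    "GSS": "GSS sequences (genome survey sequences)",
    "HTG": "HTG sequences (high-throughput genomic sequences)",
    "HTC": "unfinished high-throughput cDNA sequencing",
    "ENV": "environmental sampling sequences",
}

# Assumption: file names look like "gbpri1.seq.gz" or "gbuna1.seq".
_FILE_RE = re.compile(r"^gb([a-z]{3})\d*\.seq(?:\.gz)?$")


def division_of(filename):
    """Return (code, description) for a GenBank-style file name, or None."""
    match = _FILE_RE.match(filename.lower())
    if match is None:
        return None
    code = match.group(1).upper()
    if code not in DIVISIONS:
        return None
    return code, DIVISIONS[code]
```

Calling division_of("gbuna1.seq.gz") returns the UNA entry, and a file name that does not follow the assumed pattern returns None.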

Based on the existing divisions at NCBI, the preset parameters are collected in the following components:

- bacterial sequences from NCBI.xml
- bacteriophage sequences from ncbi.xml
- daily update from ncbi (nt).xml
- daily update from ncbi (proteins).xml
- est sequences from ncbi.xml
- gss sequences from ncbi.xml
- htc - unfinished high-throughput cdna sequencing from ncbi.xml
- htg sequences from ncbi.xml
- invertebrate sequences from ncbi.xml
- other mammalian sequences from ncbi.xml
- other vertebrate sequences from ncbi.xml
- patent sequences from ncbi.xml
- plant, fungal, and algal sequences from ncbi.xml
- primate sequences from ncbi.xml
- rodent sequences from ncbi.xml
- sts sequences from ncbi.xml
- synthetic sequences from ncbi.xml
- unannotated sequences from ncbi.xml
- viral sequences from ncbi.xml

For UniProt sequences there are four components, which retrieve the Swiss-Prot and TrEMBL data sets, each in flat format and in XML format.

- uniprotkb-swiss-prot data set in xml from expasy.xml
- uniprotkb-swiss-prot from expasy (flat format).xml
- uniprotkb-trembl data from expasy (flat format).xml
- uniprotkb-trembl data set in xml from expasy.xml

Folder (FTP retrieval):
- Retrieve from remote location (FTP).xml
This component is a modified version of the FTP component with an added multiple connection feature, which avoids the timeout that the normal FTP component produces while retrieving big files from NCBI.
For small files, select single connection.
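
This README does not say how the multiple connection mode works internally. One common approach, shown here only as an assumption, is to split a large file into byte ranges and fetch each range over its own connection, restarting the transfer at the range's offset (for instance through the rest argument of Python's ftplib.FTP.retrbinary). The helper split_ranges below is hypothetical and only computes the ranges.

```python
def split_ranges(total_size, connections):
    """Split total_size bytes into contiguous (start, end) ranges.

    'end' is exclusive. At most one range per byte is produced, so a
    small file ends up with fewer ranges than requested connections.
    """
    if total_size < 0 or connections < 1:
        raise ValueError("total_size must be >= 0 and connections >= 1")
    # Never create more ranges than there are bytes; keep at least one.
    connections = min(connections, total_size) or 1
    base, extra = divmod(total_size, connections)
    ranges = []
    start = 0
    for index in range(connections):
        # The first 'extra' ranges take one additional byte each.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges
```

With a single connection the function returns one range covering the whole file, which corresponds to the single connection setting recommended for small files.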

WHAT IF YOUR FILE IS ALREADY ON YOUR SYSTEM?
The component compares the size and timestamp of each file on the server with those of the copy at the Pipeline Pilot location. If the file is the same, it is skipped in the download process. If the file is new or its size has changed, the new version is retrieved.
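
The skip rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the component's actual code: the function name needs_download is hypothetical, the remote size and timestamp are assumed to be already known (for example from the FTP SIZE and MDTM commands), and "new" is interpreted as the server timestamp being later than the local one.

```python
import os


def needs_download(remote_size, remote_mtime, local_path):
    """Decide whether a remote file should be retrieved.

    remote_size  -- size of the file on the server, in bytes
    remote_mtime -- server modification time, in seconds since the epoch
    local_path   -- where the file is (or would be) stored locally
    """
    # No local copy yet: always retrieve.
    if not os.path.exists(local_path):
        return True
    # The size has changed: retrieve the new version.
    if os.path.getsize(local_path) != remote_size:
        return True
    # Same size: retrieve only if the server copy is newer.
    return remote_mtime > os.path.getmtime(local_path)
```

A file with the same size and a timestamp that is not newer is treated as unchanged and skipped.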

As output, this component passes on a list of parameters naming the retrieved files.

Protocols:
Retrieve UNA division from NCBI.xml
Sample protocol for sequence retrieval. The first run displays the retrieved file in an HTML report.
If you repeat the process right afterwards, you will see an empty HTML report, showing that
no files were retrieved.

Requirements: Pipeline Pilot 6.1.1
O/S: PP Server Windows and Linux
PP client Windows
Limitations: None
Keywords: NCBI, Uniprot, FTP retrieval
Contents:
Components:
Folder (Database Locations Settings):
- bacterial sequences from NCBI.xml
- bacteriophage sequences from ncbi.xml
- daily update from ncbi (nt).xml
- daily update from ncbi (proteins).xml
- est sequences from ncbi.xml
- gss sequences from ncbi.xml
- htc - unfinished high-throughput cdna sequencing from ncbi.xml
- htg sequences from ncbi.xml
- invertebrate sequences from ncbi.xml
- other mammalian sequences from ncbi.xml
- other vertebrate sequences from ncbi.xml
- patent sequences from ncbi.xml
- plant, fungal, and algal sequences from ncbi.xml
- primate sequences from ncbi.xml
- rodent sequences from ncbi.xml
- sts sequences from ncbi.xml
- synthetic sequences from ncbi.xml
- unannotated sequences from ncbi.xml
- viral sequences from ncbi.xml
- uniprotkb-swiss-prot data set in xml from expasy.xml
- uniprotkb-swiss-prot from expasy (flat format).xml
- uniprotkb-trembl data from expasy (flat format).xml
- uniprotkb-trembl data set in xml from expasy.xml
Folder (FTP retrieval):
- Retrieve from remote location (FTP).xml
Protocols:
- Retrieve UNA division from NCBI.xml

Installation: Drag and drop the components onto the component area. Do the same with the protocols, as appropriate.