3. cPSORTdb
cPSORTdb is a dataset of protein localizations that were predicted by computational methods. Currently, cPSORTdb contains over 73,100 bacterial and archaeal replicons available through the NCBI RefSeq database that have been analyzed by PSORTb version 3.0. New genomes are automatically analyzed and added to the database as they become available from the NCBI database.
The long-format predictions generated by PSORTb version 3.0 are stored in cPSORTdb and are fully browsable and searchable.
PSORTb v.3.0 is designed for proteins from Archaea and from Gram-negative and Gram-positive bacteria. It consists of multiple analytical modules:
SCL-BLAST & SCL-BLASTe, or SubCellular Localization BLAST
Support Vector Machines (SVMs)
Motif & Profile Analysis
Outer Membrane Motif Analysis
ModHMM
Signal Peptide
Each module analyzes one biological feature known to influence or be characteristic of subcellular localization. The modules may act as a binary predictor, classifying a protein as either belonging or not belonging to a particular localization site, or they may be multi-category, able to assign a protein to one of several localization sites.
To generate a final prediction, the results of all modules are combined and assessed. A probabilistic method and 5-fold cross validation were used to assess the likelihood of a protein being at a specific localization given the prediction of a certain module. These likelihoods are used to generate a probability value for each of the possible localization sites (five, four, or three, depending on the organism type) for a user's query protein.
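As a rough illustration of what "the likelihood of a localization given a module's prediction" can mean, the sketch below estimates it as a simple conditional frequency over labeled examples. This is not PSORTb's actual method, which is not detailed here. The function name `site_likelihoods`, the input format, and the labels are hypothetical, and the cross-validation step is omitted.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def site_likelihoods(
    observations: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Estimate P(true site | module prediction) from labeled examples.

    Each observation is a (module_prediction, true_site) pair.
    Hypothetical helper: a plain frequency estimate, not PSORTb's model.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for predicted, actual in observations:
        counts[predicted][actual] += 1

    likelihoods: Dict[str, Dict[str, float]] = {}
    for predicted, site_counts in counts.items():
        total = sum(site_counts.values())
        # Fraction of proteins at each true site among those given this prediction.
        likelihoods[predicted] = {
            site: n / total for site, n in site_counts.items()
        }
    return likelihoods


# Placeholder labels; real module outputs and site names would go here.
example = [
    ("pred_X", "site_A"),
    ("pred_X", "site_A"),
    ("pred_X", "site_B"),
    ("pred_Y", "site_B"),
]
table = site_likelihoods(example)
```

In a cross-validated setting, such frequencies would be estimated on training folds and checked on the held-out fold, so that the likelihoods are not tuned to the same proteins they are evaluated on.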
When analyzing either a Gram-negative organism or an organism that stains Gram-positive but has an outer membrane, the 5 possible localization sites are:
For a Gram-positive or an archaeal organism, the 4 possible localization sites are:
In addition, for organisms that stain Gram-negative but have no outer membrane, 3 localizations are predicted:
PSORTb returns a list of these localization sites and the associated probability value for each, ranked in descending order. If the top-ranked site has a value of 7.5 or above, it is returned as the final prediction; otherwise, a result of "Unknown" is returned.
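The ranking and cutoff rule described above can be sketched as follows. This is a minimal illustration rather than PSORTb's own code. The function names, the dictionary input format, and the placeholder site names are assumptions. Only the 7.5 cutoff and the "Unknown" fallback come from the text.

```python
from typing import Dict, List, Tuple

CUTOFF = 7.5  # threshold stated in the text above


def rank_sites(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (site, value) pairs sorted by value, highest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def final_prediction(scores: Dict[str, float], cutoff: float = CUTOFF) -> str:
    """Return the top-ranked site if it meets the cutoff, else "Unknown"."""
    ranked = rank_sites(scores)
    if not ranked:
        return "Unknown"
    top_site, top_value = ranked[0]
    return top_site if top_value >= cutoff else "Unknown"


# Placeholder site names and values, for illustration only.
confident = {"site_A": 9.2, "site_B": 0.5, "site_C": 0.3}
ambiguous = {"site_A": 4.0, "site_B": 3.5, "site_C": 2.5}

print(rank_sites(confident))
print(final_prediction(confident))
print(final_prediction(ambiguous))
```

With these placeholder values, the first protein receives a final prediction because its top-ranked site exceeds the cutoff, while the second is reported as "Unknown" because no site reaches 7.5.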
In addition, PSORTb returns sub-category localizations (for proteins targeted to bacterial organelles or the host cell), as detected by the SCL-BLAST module. The following secondary localizations are predicted:
See here for information on the database fields associated with PSORTb v.3.0, or read more information about PSORTb v.3.0 modules.
Submit a protein to PSORTb v.3.0.