※ Documentation

1. Content of the Database

    Based on the rationale of "seeing is believing", the MiCroKit database contains proteins that have been experimentally shown to localize to the kinetochore, centrosome and/or midbody. All microkit proteins have been manually curated from the PubMed literature.
    The MiCroKit database holds information on these proteins, including their names, accession identifiers in public databases and host organisms. The localization of each protein and the supporting literature are also provided. Functional annotations, e.g. functional domains and Gene Ontology terms, can be found on the browsing page of each protein, e.g. MCK-CE-00001. The full list of fields and their meanings is provided in the next section of this document.
    Most previous studies have focused on yeast and animals, and only a few microkit proteins have been identified in plants. Therefore, MiCroKit database version 2.0 provides information on microkit proteins from seven organisms: Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, Xenopus laevis, Mus musculus and Homo sapiens. More organisms will be added to the database in the future.

2. Fields of information for each protein in the database

    The following fields can be accessed for each protein in the database:

ID The MiCroKit database ID of a protein. An ID starts with "MCK" (denoting a curated MiCroKit protein), followed by a two-letter abbreviation of the organism name (SC: Saccharomyces cerevisiae, SP: Schizosaccharomyces pombe, CE: Caenorhabditis elegans, DM: Drosophila melanogaster, XL: Xenopus laevis, MM: Mus musculus, HS: Homo sapiens), and ends with five digits giving the accession number of the protein. The three sections are delimited by hyphens ("-"), e.g. MCK-HS-00193.
SP UniProt accession IDs
PI Theoretical isoelectric point (pI), calculated with the ExPASy Compute pI/Mw tool
MW Theoretical molecular weight, calculated with the ExPASy Compute pI/Mw tool
GP GenBank IDs of the protein sequence of each entry
GM GenBank IDs of the nucleotide sequence of each entry
PN Protein name (standard nomenclature)
PA Protein synonyms/aliases
GN Gene names of the protein (usually taken from UniProt)
GA Synonyms/aliases of the gene names (usually taken from UniProt)
DT Creation date of the entry
UD Dates of important updates of the entry
CL Sub-cellular localization of the protein. Only Kinetochore, Centrosome and Midbody are considered in our database.
OS The host organism of the protein
OX NCBI Taxonomy ID of the host organism
RF Primary references reporting the localization of the protein to the Kinetochore, Centrosome and/or Midbody
FN Functional description of the protein (taken from UniProt)
UC User comments. Users can contact us to add useful comments to each entry, making the database more complete
KW Keywords of the protein (taken from UniProt)
SS Sequence source (public database of the given sequence)
NL The length of the nucleotide sequence of a microkit protein
NS The full-length nucleotide (cDNA/mRNA) coding sequence (CDS) of a microkit protein
PL The length of protein sequence
PS Protein sequence of the entry
GO Gene Ontology annotations of the entry (curated from UniProt)
IP Functional domain annotations of the entry (InterPro, curated from UniProt)
PF Functional domain annotations of the entry (Pfam, curated from UniProt)
SM Functional domain annotations of the entry (SMART, curated from UniProt)
PS Functional domain annotations of the entry (PROSITE, curated from UniProt)
PR Functional domain annotations of the entry (PRINTS, curated from UniProt)
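The ASCII download presumably stores each entry under the two-letter codes listed above. Its exact layout is not described in this document, so the sketch below assumes one "<CODE> <value>" pair per line and entries terminated by a line containing only "//" (a convention borrowed from UniProt flat files, not confirmed for MiCroKit). Values are collected into lists because a code may repeat; note that PS is listed twice above (protein sequence and PROSITE domains), so a real parser may need to tell them apart by context. The helper parse_mck_id checks the ID format given for the ID field. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Organism codes used in MiCroKit IDs, as listed in the ID field description.
ORGANISMS = {
    "SC": "Saccharomyces cerevisiae",
    "SP": "Schizosaccharomyces pombe",
    "CE": "Caenorhabditis elegans",
    "DM": "Drosophila melanogaster",
    "XL": "Xenopus laevis",
    "MM": "Mus musculus",
    "HS": "Homo sapiens",
}

# "MCK", a two-letter organism code and five digits, joined by hyphens.
MCK_ID = re.compile(r"^MCK-([A-Z]{2})-(\d{5})$")


def parse_mck_id(mck_id):
    """Split an ID such as 'MCK-HS-00193' into (organism name, number)."""
    match = MCK_ID.match(mck_id)
    if match is None or match.group(1) not in ORGANISMS:
        raise ValueError(f"not a valid MiCroKit ID: {mck_id!r}")
    return ORGANISMS[match.group(1)], int(match.group(2))


def parse_entries(lines):
    """Parse flat-file lines into a list of {code: [values, ...]} dicts.

    ASSUMED layout (not confirmed for MiCroKit): one '<CODE> <value>' pair
    per line, each entry terminated by a line containing only '//'.
    """
    entries, current = [], defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "//":
            if current:
                entries.append(dict(current))
            current = defaultdict(list)
            continue
        parts = line.split(None, 1)
        value = parts[1] if len(parts) > 1 else ""
        # Repeated codes (e.g. several CL or RF lines) keep their order.
        current[parts[0]].append(value)
    if current:  # final entry without a trailing '//'
        entries.append(dict(current))
    return entries


# Toy record for illustration only; not copied from the database.
toy = ["ID MCK-HS-00193", "GN BUB1", "OS Homo sapiens (Human)", "//"]
entry = parse_entries(toy)[0]
print(entry["GN"])                   # ['BUB1']
print(parse_mck_id(entry["ID"][0]))  # ('Homo sapiens', 193)
```

Keeping every value in a list costs a little convenience for single-valued fields such as ID, but it avoids silently overwriting repeated lines.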

3. How to search the MiCroKit database

    Three search options are provided in MiCroKit: Simple search, Advanced search and BLAST search.

Simple search returns all entries containing the input keywords, separated by spaces. For example, the query "bub1 human" returns five records in the MiCroKit database, although only MCK-HS-00193 is the human Bub1 protein.
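Simple search can be approximated as a case-insensitive substring match over all fields of an entry. Whether MiCroKit requires every keyword or any one of them is not stated here; the sketch assumes every keyword must match (AND), which fits a two-word query narrowing the results to a handful of records. Entries are plain dicts keyed by field code, and the function name and toy records are hypothetical.

```python
def simple_search(entries, query):
    """Return IDs of entries whose text contains every space-separated keyword.

    Assumes AND semantics and case-insensitive substring matching.
    """
    keywords = query.lower().split()
    hits = []
    for entry in entries:
        # Join all field values so a keyword may match in any field.
        text = " ".join(str(v) for v in entry.values()).lower()
        if all(k in text for k in keywords):
            hits.append(entry["ID"])
    return hits


# Toy records for illustration only; not real MiCroKit data.
toy = [
    {"ID": "MCK-HS-90001", "GN": "BUB1", "OS": "Homo sapiens (Human)"},
    {"ID": "MCK-MM-90001", "GN": "Bub1", "OS": "Mus musculus (Mouse)"},
    {"ID": "MCK-HS-90002", "GN": "CENPE", "OS": "Homo sapiens (Human)"},
]
print(simple_search(toy, "bub1 human"))  # ['MCK-HS-90001']
```

Because matching is by substring, a short keyword can also hit unrelated text in other fields, which is one reason a simple query may return more records than the single protein of interest.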

Advanced search lets you enter up to three terms to find information more specifically. Query fields can be left empty if fewer terms are needed. The terms are connected by the following operators:

exclude: the term following this operator must not be contained in the specified field(s)
and: the term following this operator must be contained in the specified field(s)
or: either the preceding or the following term must occur in the specified field(s)

For example, to search specifically for the human CENP-E protein, try the query: OS Homo sapiens (Human) and GN CENPE and CL kinetochore (see the example figure).
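The three operators can be modeled as boolean combinations of per-field matches. The sketch below assumes case-insensitive substring matching and strict left-to-right evaluation with no operator precedence; MiCroKit's actual evaluation order is not documented here. The function names and the toy records are hypothetical.

```python
def field_matches(entry, field, term):
    """Case-insensitive substring test of one term against one field."""
    return term.lower() in str(entry.get(field, "")).lower()


def advanced_search(entries, first, *rest):
    """Combine up to three (field, term) conditions.

    `first` is a (field, term) pair; each item in `rest` is an
    (operator, field, term) triple with operator 'and', 'or' or 'exclude'.
    Operators are applied strictly left to right (an assumption).
    """
    if len(rest) > 2:
        raise ValueError("at most three terms are supported")
    hits = []
    for entry in entries:
        result = field_matches(entry, *first)
        for op, field, term in rest:
            found = field_matches(entry, field, term)
            if op == "and":
                result = result and found
            elif op == "or":
                result = result or found
            elif op == "exclude":
                result = result and not found
            else:
                raise ValueError(f"unknown operator: {op!r}")
        if result:
            hits.append(entry["ID"])
    return hits


# Toy records for illustration only; not real MiCroKit data.
toy = [
    {"ID": "MCK-HS-90001", "OS": "Homo sapiens (Human)", "GN": "CENPE", "CL": "Kinetochore"},
    {"ID": "MCK-MM-90001", "OS": "Mus musculus (Mouse)", "GN": "Cenpe", "CL": "Kinetochore"},
]
# The CENP-E query from the example above.
print(advanced_search(toy, ("OS", "Homo sapiens (Human)"),
                      ("and", "GN", "CENPE"), ("and", "CL", "kinetochore")))
# ['MCK-HS-90001']
```

Leaving a query field empty corresponds to passing fewer triples in `rest`.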

BLAST search can be used to find a specific protein and/or its homologues by sequence alignment. This option locates the query protein quickly and accurately.

4. How to browse the MiCroKit database

    You can browse the MiCroKit database instead of searching for a specific protein. On the browse page, microkit proteins can be sorted by MCK ID, SwissProt ID, organism, molecular weight or pI. Clicking a feature title re-sorts the proteins in ascending (U) or descending (D) order. Each page shows up to twenty entries.
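Sorting and paging on the browse page can be sketched as follows, assuming entries are dicts keyed by field code, with MW and pI stored as numbers (stored as strings they would sort lexically, so "100" would come before "20"). The function name and its parameters are hypothetical; `order` mirrors the U/D links and `per_page` defaults to the twenty entries per page described above.

```python
def browse(entries, key="ID", order="U", page=1, per_page=20):
    """Return one page of entries sorted by a field.

    order 'U' sorts ascending and 'D' descending, mirroring the U/D links.
    Pages are 1-based and hold up to `per_page` entries.
    """
    if order not in ("U", "D"):
        raise ValueError("order must be 'U' or 'D'")
    ordered = sorted(entries, key=lambda e: e[key], reverse=(order == "D"))
    start = (page - 1) * per_page
    return ordered[start:start + per_page]


# 45 toy entries with invented MW values; not real MiCroKit data.
toy = [{"ID": f"MCK-HS-{i:05d}", "MW": float(i * 37 % 101)} for i in range(1, 46)]
print(len(browse(toy, page=3)))             # 5
print(browse(toy, order="D")[0]["ID"])      # MCK-HS-00045
```

Zero-padded IDs such as MCK-HS-00045 sort correctly as strings, which is one practical benefit of the fixed five-digit format.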

5. Download the Database

    Users can also download the MiCroKit data for further analysis. The following files are available for download:

    All information of MiCroKit database in ASCII file (.zip or .tar.gz)

    Protein sequences of all microkit proteins in FASTA file format (.zip or .tar.gz)

    Nucleotide sequences of all microkit proteins in FASTA file format (.zip or .tar.gz)
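The FASTA downloads can be read without extra dependencies. A minimal reader is sketched below; the header layout of the MiCroKit FASTA files is not described here, so the sketch keeps each header line whole rather than guessing at its fields. The function name is hypothetical, and Biopython's SeqIO.parse(handle, "fasta") is a well-tested alternative.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Header lines start with '>'; the sequence lines that follow are joined.
    """
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Toy input for illustration only; not real MiCroKit sequences.
toy = [">MCK-XX-00001 toy entry", "MKT", "AYL", ">MCK-XX-00002", "GGS"]
for header, seq in read_fasta(toy):
    print(header.split()[0], len(seq))
# MCK-XX-00001 6
# MCK-XX-00002 3
```

For an unpacked download, an open file handle can be passed directly, e.g. `with open(path) as handle: records = list(read_fasta(handle))`, where `path` is whatever name the archive member has.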


※ CITATION:

For publication of results, please cite the following article:

MiCroKit: An Integrated Database of Midbody, Centrosome and Kinetochore
Yu Xue, Fengfeng Zhou, Chuanhai Fu, Changjiang Jin, Xuebiao Yao and Ying Xu.
(Submitting)


Last update: June 5th, 2006
Copyright © 2005,2006 LCD, USTC, All Rights Reserved