Dataset Information

Current status and new features of the Consensus Coding Sequence database.

ABSTRACT: The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.

SUBMITTER: Farrell CM

PROVIDER: S-EPMC3965069 | biostudies-literature | 2014 Jan

REPOSITORIES: biostudies-literature


Publications

Current status and new features of the Consensus Coding Sequence database.

Catherine M Farrell, Nuala A O'Leary, Rachel A Harte, Jane E Loveland, Laurens G Wilming, Craig Wallin, Mark Diekhans, Daniel Barrell, Stephen M J Searle, Bronwen Aken, Susan M Hiatt, Adam Frankish, Marie-Marthe Suner, Bhanu Rajput, Charles A Steward, Garth R Brown, Ruth Bennett, Michael Murphy, Wendy Wu, Mike P Kay, Jennifer Hart, Jeena Rajan, Janet Weber, Catherine Snow, Lillian D Riddick, Toby Hunt, David Webb, Mark Thomas, Pamela Tamez, Sanjida H Rangwala, Kelly M McGarvey, Shashikant Pujar, Andrei Shkeda, Jonathan M Mudge, Jose M Gonzalez, James G R Gilbert, Stephen J Trevanion, Robert Baertsch, Jennifer L Harrow, Tim Hubbard, James M Ostell, David Haussler, Kim D Pruitt

Nucleic Acids Research, 2013-11-11, Database issue


PMID: 24217909

Similar Datasets

Project description: A comprehensive in silico analysis of 71 species representing the different taxonomic classes and physiological groups of the domain Archaea was performed. These organisms differed in their physiological attributes, particularly oxygen tolerance and energy metabolism. We explored the diversity and similarity of codon usage patterns in the genes and genomes of these organisms, with emphasis on their core cellular pathways. Our aim was to determine whether there is any underlying similarity in the design of core pathways within these organisms. Analyses of codon utilization patterns, hierarchical linear models of codon usage, expression patterns and codon pair preference indicated that archaea tend towards biased use of synonymous codons in the core cellular pathways, and the Nc-plots appeared to reflect the physiological variations among the different species. Our analyses revealed that aerobic species of archaea possessed a larger degree of freedom in regulating expression levels than could be accounted for by codon usage bias alone. This feature might be a consequence of their enhanced metabolic activity, resulting from their adaptation to a relatively O2-rich environment. Species of archaea that are taxonomically related were found to have striking similarities in their ORF structuring patterns. In the anaerobic species of archaea, codon bias was found to be a major determinant of gene expression. We also detected a significant difference in codon pair usage between the whole genome and the genes related to vital cellular pathways; this difference was not only species-specific but also pathway-specific. This hints at ORFs being structured for better decoding accuracy during translation. Finally, a codon-pathway interaction in shaping the codon design of pathways was observed, in which the transcription pathway exhibited a significantly different coding frequency signature.

