Dataset Information

Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.


ABSTRACT: The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.
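As a rough illustration of what "identically annotated" means in the abstract above, the sketch below pairs coding regions from two independent annotation sets when their chromosome, strand and CDS intervals match exactly. It is not the CCDS pipeline: the function names, the transcript labels and the coordinates are all hypothetical, and the real process adds quality assurance checks and manual curation that are not modeled here.

```python
from typing import Dict, List, Tuple

# A CDS structure: chromosome, strand, and the sorted CDS intervals.
# This representation is an assumption made for illustration only.
CdsKey = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def cds_key(chrom: str, strand: str, intervals: List[Tuple[int, int]]) -> CdsKey:
    """Build a comparable key; intervals are sorted so input order is irrelevant."""
    return (chrom, strand, tuple(sorted(intervals)))


def consensus(a: Dict[str, CdsKey], b: Dict[str, CdsKey]) -> List[Tuple[str, str]]:
    """Return (name_in_a, name_in_b) pairs whose CDS structures are identical."""
    by_key: Dict[CdsKey, List[str]] = {}
    for name, key in b.items():
        by_key.setdefault(key, []).append(name)
    pairs = []
    for name, key in a.items():
        for other in by_key.get(key, []):
            pairs.append((name, other))
    return sorted(pairs)


# Hypothetical annotation sets (made-up labels and coordinates).
set_a = {
    "txA1": cds_key("chr1", "+", [(100, 200), (300, 400)]),
    "txA2": cds_key("chr1", "+", [(100, 200), (300, 450)]),
}
set_b = {
    "txB1": cds_key("chr1", "+", [(300, 400), (100, 200)]),
}

matches = consensus(set_a, set_b)
```

Here only `txA1` and `txB1` share every CDS boundary, so they form the single consensus pair; `txA2` differs at one interval end and is left out.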

SUBMITTER: Pujar S 

PROVIDER: S-EPMC5753299 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature

Publications

Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

Shashikant Pujar, Nuala A O'Leary, Catherine M Farrell, Jane E Loveland, Jonathan M Mudge, Craig Wallin, Carlos G Girón, Mark Diekhans, If Barnes, Ruth Bennett, Andrew E Berry, Eric Cox, Claire Davidson, Tamara Goldfarb, Jose M Gonzalez, Toby Hunt, John Jackson, Vinita Joardar, Mike P Kay, Vamsi K Kodali, Fergal J Martin, Monica McAndrews, Kelly M McGarvey, Michael Murphy, Bhanu Rajput, Sanjida H Rangwala, Lillian D Riddick, Ruth L Seal, Marie-Marthe Suner, David Webb, Sophia Zhu, Bronwen L Aken, Elspeth A Bruford, Carol J Bult, Adam Frankish, Terence Murphy, Kim D Pruitt

Nucleic Acids Research, 2018-01-01, issue D1


Similar Datasets

| S-EPMC7055050 | biostudies-literature
| S-EPMC3308164 | biostudies-literature
| S-EPMC6021557 | biostudies-literature
| S-EPMC5502359 | biostudies-literature
| S-EPMC7289301 | biostudies-literature
| S-EPMC3965069 | biostudies-literature
| S-EPMC8728265 | biostudies-literature
| S-EPMC8515502 | biostudies-literature
| S-EPMC4527852 | biostudies-literature
| S-EPMC7048090 | biostudies-literature