Dataset Information

Integrative annotation of 21,037 human genes validated by full-length cDNA clones.


ABSTRACT: The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
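The percentages quoted in the abstract can be cross-checked against its raw counts. The following is a minimal sketch, not part of the original record: the constant names and the `percent` helper are hypothetical, and the only data used are the figures stated in the abstract above.

```python
# Counts taken directly from the abstract; all names here are illustrative.
TOTAL_LOCI = 21_037           # validated human gene candidates
NEW_GENE_CANDIDATES = 5_155   # newly identified gene candidates
NO_GOOD_ORF_LOCI = 1_377      # loci without a good protein-coding ORF
NCRNA_CANDIDATES = 296        # strong non-protein-coding RNA candidates


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of gene candidates lacking a good ORF (the abstract reports 6.5%).
no_orf_pct = percent(NO_GOOD_ORF_LOCI, TOTAL_LOCI)

# Derived shares that the abstract does not state explicitly.
new_gene_pct = percent(NEW_GENE_CANDIDATES, TOTAL_LOCI)
ncrna_pct_of_no_orf = percent(NCRNA_CANDIDATES, NO_GOOD_ORF_LOCI)

print(f"Loci without a good ORF: {no_orf_pct}% of all candidates")
print(f"New gene candidates: {new_gene_pct}% of all candidates")
print(f"Strong ncRNA candidates: {ncrna_pct_of_no_orf}% of loci without a good ORF")
```

The first value reproduces the 6.5% figure given in the abstract; the other two are derived here for orientation only and are not claims made by the authors.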

SUBMITTER: Imanishi T 

PROVIDER: S-EPMC393292 | biostudies-literature | 2004 Jun

REPOSITORIES: biostudies-literature

Publications

Integrative annotation of 21,037 human genes validated by full-length cDNA clones.

Imanishi Tadashi T   Itoh Takeshi T   Suzuki Yutaka Y   O'Donovan Claire C   Fukuchi Satoshi S   Koyanagi Kanako O KO   Barrero Roberto A RA   Tamura Takuro T   Yamaguchi-Kabata Yumi Y   Tanino Motohiko M   Yura Kei K   Miyazaki Satoru S   Ikeo Kazuho K   Homma Keiichi K   Kasprzyk Arek A   Nishikawa Tetsuo T   Hirakawa Mika M   Thierry-Mieg Jean J   Thierry-Mieg Danielle D   Ashurst Jennifer J   Jia Libin L   Nakao Mitsuteru M   Thomas Michael A MA   Mulder Nicola N   Karavidopoulou Youla Y   Jin Lihua L   Kim Sangsoo S   Yasuda Tomohiro T   Lenhard Boris B   Eveno Eric E   Suzuki Yoshiyuki Y   Yamasaki Chisato C   Takeda Jun-ichi J   Gough Craig C   Hilton Phillip P   Fujii Yasuyuki Y   Sakai Hiroaki H   Tanaka Susumu S   Amid Clara C   Bellgard Matthew M   Bonaldo Maria de Fatima Mde F   Bono Hidemasa H   Bromberg Susan K SK   Brookes Anthony J AJ   Bruford Elspeth E   Carninci Piero P   Chelala Claude C   Couillault Christine C   de Souza Sandro J SJ   Debily Marie-Anne MA   Devignes Marie-Dominique MD   Dubchak Inna I   Endo Toshinori T   Estreicher Anne A   Eyras Eduardo E   Fukami-Kobayashi Kaoru K   Gopinath Gopal R GR   Graudens Esther E   Hahn Yoonsoo Y   Han Michael M   Han Ze-Guang ZG   Hanada Kousuke K   Hanaoka Hideki H   Harada Erimi E   Hashimoto Katsuyuki K   Hinz Ursula U   Hirai Momoki M   Hishiki Teruyoshi T   Hopkinson Ian I   Imbeaud Sandrine S   Inoko Hidetoshi H   Kanapin Alexander A   Kaneko Yayoi Y   Kasukawa Takeya T   Kelso Janet J   Kersey Paul P   Kikuno Reiko R   Kimura Kouichi K   Korn Bernhard B   Kuryshev Vladimir V   Makalowska Izabela I   Makino Takashi T   Mano Shuhei S   Mariage-Samson Regine R   Mashima Jun J   Matsuda Hideo H   Mewes Hans-Werner HW   Minoshima Shinsei S   Nagai Keiichi K   Nagasaki Hideki H   Nagata Naoki N   Nigam Rajni R   Ogasawara Osamu O   Ohara Osamu O   Ohtsubo Masafumi M   Okada Norihiro N   Okido Toshihisa T   Oota Satoshi S   Ota Motonori M   Ota Toshio T   Otsuki Tetsuji T   Piatier-Tonneau 
Dominique D   Poustka Annemarie A   Ren Shuang-Xi SX   Saitou Naruya N   Sakai Katsunaga K   Sakamoto Shigetaka S   Sakate Ryuichi R   Schupp Ingo I   Servant Florence F   Sherry Stephen S   Shiba Rie R   Shimizu Nobuyoshi N   Shimoyama Mary M   Simpson Andrew J AJ   Soares Bento B   Steward Charles C   Suwa Makiko M   Suzuki Mami M   Takahashi Aiko A   Tamiya Gen G   Tanaka Hiroshi H   Taylor Todd T   Terwilliger Joseph D JD   Unneberg Per P   Veeramachaneni Vamsi V   Watanabe Shinya S   Wilming Laurens L   Yasuda Norikazu N   Yoo Hyang-Sook HS   Stodolsky Marvin M   Makalowski Wojciech W   Go Mitiko M   Nakai Kenta K   Takagi Toshihisa T   Kanehisa Minoru M   Sakaki Yoshiyuki Y   Quackenbush John J   Okazaki Yasushi Y   Hayashizaki Yoshihide Y   Hide Winston W   Chakraborty Ranajit R   Nishikawa Ken K   Sugawara Hideaki H   Tateno Yoshio Y   Chen Zhu Z   Oishi Michio M   Tonellato Peter P   Apweiler Rolf R   Okubo Kousaku K   Wagner Lukas L   Wiemann Stefan S   Strausberg Robert L RL   Isogai Takao T   Auffray Charles C   Nomura Nobuo N   Gojobori Takashi T   Sugano Sumio S  

PLoS Biology, 2004-04-20, issue 6


Similar Datasets

| S-EPMC2222646 | biostudies-literature
| S-EPMC2608845 | biostudies-literature
| S-EPMC403704 | biostudies-literature
| S-EPMC528924 | biostudies-literature
| S-EPMC1088967 | biostudies-literature
| S-EPMC2866332 | biostudies-literature
| S-EPMC8513462 | biostudies-literature
| S-EPMC6085635 | biostudies-literature
| S-EPMC151182 | biostudies-literature
| S-EPMC9316252 | biostudies-literature