Dataset Information

The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC).


ABSTRACT: The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5'-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline.

SUBMITTER: Gerhard DS 

PROVIDER: S-EPMC528928 | biostudies-literature | 2004 Oct

REPOSITORIES: biostudies-literature

Publications

The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC).

Gerhard Daniela S DS   Wagner Lukas L   Feingold Elise A EA   Shenmen Carolyn M CM   Grouse Lynette H LH   Schuler Greg G   Klein Steven L SL   Old Susan S   Rasooly Rebekah R   Good Peter P   Guyer Mark M   Peck Allison M AM   Derge Jeffery G JG   Lipman David D   Collins Francis S FS   Jang Wonhee W   Sherry Steven S   Feolo Mike M   Misquitta Leonie L   Lee Eduardo E   Rotmistrovsky Kirill K   Greenhut Susan F SF   Schaefer Carl F CF   Buetow Kenneth K   Bonner Tom I TI   Haussler David D   Kent Jim J   Kiekhaus Mark M   Furey Terry T   Brent Michael M   Prange Christa C   Schreiber Kirsten K   Shapiro Nicole N   Bhat Narayan K NK   Hopkins Ralph F RF   Hsie Florence F   Driscoll Tom T   Soares M Bento MB   Casavant Tom L TL   Scheetz Todd E TE   Brownstein Michael J MJ   Usdin Ted B TB   Toshiyuki Shiraki S   Carninci Piero P   Piao Yulan Y   Dudekula Dawood B DB   Ko Minoru S H MS   Kawakami Koichi K   Suzuki Yutaka Y   Sugano Sumio S   Gruber C E CE   Smith M R MR   Simmons Blake B   Moore Troy T   Waterman Richard R   Johnson Stephen L SL   Ruan Yijun Y   Wei Chia Lin CL   Mathavan S S   Gunaratne Preethi H PH   Wu Jiaqian J   Garcia Angela M AM   Hulyk Stephen W SW   Fuh Edwin E   Yuan Ye Y   Sneed Anna A   Kowis Carla C   Hodgson Anne A   Muzny Donna M DM   McPherson John J   Gibbs Richard A RA   Fahey Jessica J   Helton Erin E   Ketteman Mark M   Madan Anuradha A   Rodrigues Stephanie S   Sanchez Amy A   Whiting Michelle M   Madari Anup A   Young Alice C AC   Wetherby Keith D KD   Granite Steven J SJ   Kwong Peggy N PN   Brinkley Charles P CP   Pearson Russell L RL   Bouffard Gerard G GG   Blakesly Robert W RW   Green Eric D ED   Dickson Mark C MC   Rodriguez Alex C AC   Grimwood Jane J   Schmutz Jeremy J   Myers Richard M RM   Butterfield Yaron S N YS   Griffith Malachi M   Griffith Obi L OL   Krzywinski Martin I MI   Liao Nancy N   Morin Ryan R   Palmquist Diana D   Petrescu Anca S AS   Skalska Ursula U   Smailus Duane E DE   Stott Jeff M JM   Schnerch Angelique A   Schein Jacqueline E JE   Jones Steven J M SJ   Holt Robert A RA   Baross Agnes A   Marra Marco A MA   Clifton Sandra S   Makowski Kathryn A KA   Bosak Stephanie S   Malek Joel J

Genome Research, 2004-10-01, issue 10B


Similar Datasets

| S-EPMC186622 | biostudies-literature
| S-EPMC403720 | biostudies-literature
| S-EPMC5397604 | biostudies-literature
| S-EPMC151182 | biostudies-literature
| S-EPMC9618600 | biostudies-literature
| PRJEB37431 | ENA
| S-EPMC5054741 | biostudies-literature
| S-EPMC2222646 | biostudies-literature
| S-EPMC4370113 | biostudies-literature
2015-01-01 | GSE63424 | GEO