
Dataset Information


Whole-genome shotgun assembly and comparison of human genome assemblies.


ABSTRACT: We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304-1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860-921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.
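The coverage figures in the abstract follow the usual fold-coverage definition, c = N·L/G: the number of reads (or inserts) times their mean length, divided by the genome size. The sketch below is a back-of-envelope check, not the authors' method. It makes two loud assumptions. First, it uses a hypothetical genome size of 2.9 Gbp, which the abstract does not state. Second, it uses the idealized Lander-Waterman (Poisson) model, under which the expected fraction of bases left uncovered at coverage c is e^(-c). The helper names are illustrative only.

```python
import math


def coverage(num_fragments: int, mean_length_bp: float, genome_size_bp: float) -> float:
    """Fold coverage c = N * L / G: total sequenced (or spanned) bases over genome size."""
    return num_fragments * mean_length_bp / genome_size_bp


def implied_mean_length(num_fragments: int, fold_coverage: float, genome_size_bp: float) -> float:
    """Invert c = N * L / G to recover the average fragment length L."""
    return fold_coverage * genome_size_bp / num_fragments


def expected_uncovered_fraction(fold_coverage: float) -> float:
    """Idealized Lander-Waterman/Poisson model: P(a base is hit by no read) = exp(-c)."""
    return math.exp(-fold_coverage)


# ASSUMPTION: hypothetical haploid genome size, not taken from the abstract.
GENOME_SIZE_BP = 2.9e9
NUM_READS = 27_000_000  # from the abstract
READ_COVERAGE = 5.3     # from the abstract (quality-trimmed reads)

mean_read_len = implied_mean_length(NUM_READS, READ_COVERAGE, GENOME_SIZE_BP)
print(f"implied mean trimmed read length: {mean_read_len:.0f} bp")
print(f"expected fraction of bases with no read: {expected_uncovered_fraction(READ_COVERAGE):.4f}")
```

The same `coverage` function applies to the 39-fold clone coverage if it is given the number of read pairs and their mean insert length. Real shotgun data deviate from the Poisson idealization because of cloning bias and repeats, so these numbers are only a rough sanity check.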

SUBMITTER: Istrail S 

PROVIDER: S-EPMC357027 | biostudies-literature | 2004 Feb

REPOSITORIES: biostudies-literature


Publications

Whole-genome shotgun assembly and comparison of human genome assemblies.

Sorin Istrail, Granger G. Sutton, Liliana Florea, Aaron L. Halpern, Clark M. Mobarry, Ross Lippert, Brian Walenz, Hagit Shatkay, Ian Dew, Jason R. Miller, Michael J. Flanigan, Nathan J. Edwards, Randall Bolanos, Daniel Fasulo, Bjarni V. Halldorsson, Sridhar Hannenhalli, Russell Turner, Shibu Yooseph, Fu Lu, Deborah R. Nusskern, Bixiong Chris Shue, Xiangqun Holly Zheng, Fei Zhong, Arthur L. Delcher, Daniel H. Huson, Saul A. Kravitz, Laurent Mouchard, Knut Reinert, Karin A. Remington, Andrew G. Clark, Michael S. Waterman, Evan E. Eichler, Mark D. Adams, Michael W. Hunkapiller, Eugene W. Myers, J. Craig Venter

Proceedings of the National Academy of Sciences of the United States of America (2004 Feb 9), issue 7



Similar Datasets

| S-EPMC151187 | biostudies-literature
| S-EPMC311156 | biostudies-literature
| S-EPMC3224119 | biostudies-literature
| S-EPMC4120091 | biostudies-literature
| S-EPMC3953531 | biostudies-literature
| S-EPMC4192377 | biostudies-literature
| S-EPMC4865240 | biostudies-literature
| S-EPMC7045388 | biostudies-literature
| S-EPMC4239345 | biostudies-literature
| S-EPMC2674632 | biostudies-literature