
Dataset Information


Reference-assisted chromosome assembly.


ABSTRACT: One of the most difficult problems in modern genomics is the assembly of full-length chromosomes using next generation sequencing (NGS) data. To address this problem, we developed "reference-assisted chromosome assembly" (RACA), an algorithm to reliably order and orient sequence scaffolds generated by NGS and assemblers into longer chromosomal fragments using comparative genome information and paired-end reads. Evaluation of results using simulated and real genome assemblies indicates that our approach can substantially improve genomes generated by a wide variety of de novo assemblers if a good reference assembly of a closely related species and outgroup genomes are available. We used RACA to reconstruct 60 Tibetan antelope (Pantholops hodgsonii) chromosome fragments from 1,434 SOAPdenovo sequence scaffolds, of which 16 chromosome fragments were homologous to complete cattle chromosomes. Experimental validation by PCR showed that predictions made by RACA are highly accurate. Our results indicate that RACA will significantly facilitate the study of chromosome evolution and genome rearrangements for the large number of genomes being sequenced by NGS that do not have a genetic or physical map.
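The abstract describes RACA only at a high level. To make the general idea of reference-guided scaffolding concrete, here is a minimal Python sketch. It is not RACA's actual algorithm. RACA also uses outgroup genomes and combines comparative and paired-end evidence in its own way, and the sketch ignores rearrangements relative to the reference. All of the following are hypothetical simplifications introduced for illustration: the `Hit` input format, the `order_and_orient` function, the `min_links` threshold, and the practice of keying read-pair counts by an unordered scaffold pair while ignoring which scaffold ends are linked.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical input: one scaffold's best alignment to the reference genome."""
    scaffold: str
    ref_chrom: str
    ref_start: int
    strand: str  # '+' if the scaffold aligns forward, '-' if reverse-complemented


def order_and_orient(hits, pe_links, min_links=2):
    """Order scaffolds by their position on each reference chromosome, orient
    them by alignment strand, and break the chain wherever two neighbouring
    scaffolds are supported by fewer than `min_links` paired-end read pairs.

    `pe_links` maps frozenset({scaffold_a, scaffold_b}) -> number of read pairs
    linking the two scaffolds (a simplification: link direction is ignored).
    Returns a list of fragments, each a list of (scaffold, orientation) tuples.
    """
    by_chrom = defaultdict(list)
    for h in hits:
        by_chrom[h.ref_chrom].append(h)

    fragments = []
    for chrom in sorted(by_chrom):
        # Comparative evidence: the reference proposes an order.
        ordered = sorted(by_chrom[chrom], key=lambda h: h.ref_start)
        current = [(ordered[0].scaffold, ordered[0].strand)]
        for prev, nxt in zip(ordered, ordered[1:]):
            # Paired-end evidence: keep the proposed adjacency only if reads support it.
            support = pe_links.get(frozenset((prev.scaffold, nxt.scaffold)), 0)
            if support >= min_links:
                current.append((nxt.scaffold, nxt.strand))
            else:
                fragments.append(current)
                current = [(nxt.scaffold, nxt.strand)]
        fragments.append(current)
    return fragments


if __name__ == "__main__":
    hits = [
        Hit("s1", "chr1", 100, "+"),
        Hit("s2", "chr1", 5000, "-"),
        Hit("s3", "chr1", 9000, "+"),
        Hit("s4", "chr2", 200, "+"),
    ]
    pe_links = {frozenset(("s1", "s2")): 5, frozenset(("s2", "s3")): 0}
    for frag in order_and_orient(hits, pe_links):
        print(frag)
```

In this toy input, s1 and s2 are joined because five read pairs support their reference-implied adjacency. s3 starts a new fragment because no read pairs confirm that it follows s2. A real tool must also resolve conflicts between the reference order and the read evidence rather than simply splitting.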

SUBMITTER: Kim J 

PROVIDER: S-EPMC3562798 | biostudies-other | 2013 Jan

REPOSITORIES: biostudies-other


Publications

Reference-assisted chromosome assembly.

Jaebum Kim, Denis M. Larkin, Qingle Cai, Asan, Yongfen Zhang, Ri-Li Ge, Loretta Auvil, Boris Capitanu, Guojie Zhang, Harris A. Lewin, Jian Ma

Proceedings of the National Academy of Sciences of the United States of America, 2013, 110(5)



Similar Datasets

| S-EPMC4629036 | biostudies-literature
| S-EPMC6472339 | biostudies-literature
| S-EPMC6807559 | biostudies-literature
| S-EPMC8558581 | biostudies-literature
| S-EPMC7509475 | biostudies-literature
| S-EPMC10387071 | biostudies-literature
| S-EPMC10078159 | biostudies-literature
| S-EPMC7238675 | biostudies-literature
| S-EPMC11373640 | biostudies-literature
| S-EPMC8022726 | biostudies-literature