Dataset Information

Jointly aligning a group of DNA reads improves accuracy of identifying large deletions.


ABSTRACT: Performing sequence alignment to identify structural variants, such as large deletions, from genome sequencing data is a fundamental task, but current methods are far from perfect. The current practice is to align each DNA read to a reference genome independently. We show that the propensity of genomic rearrangements to accumulate in repeat-rich regions imposes severe ambiguities on these alignments, and consequently on the variant calls: with current read lengths, this affects more than one third of known large deletions in the C. Venter genome. We present a method to jointly align reads to a genome, whereby the alignment ambiguity of one read can be resolved by other reads. We show that this leads to a significant improvement in the accuracy of identifying large deletions (≥20 bases), while imposing minimal computational overhead and maintaining an overall running time on par with current tools. A software implementation is available as an open-source Python program called JRA at https://bitbucket.org/jointreadalignment/jra-src.
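
The idea that one read's alignment ambiguity can be resolved by other reads can be illustrated with a toy sketch. This is not JRA's algorithm or API: the function `resolve_jointly`, the read names, and the candidate deletion coordinates are all hypothetical, and simple vote counting stands in for whatever scoring the actual method uses. Each read is assumed to come with a list of equally good deletion placements found when it is aligned on its own.

```python
from collections import Counter


def resolve_jointly(candidates_per_read):
    """Pick one candidate deletion per read, favoring placements shared by many reads.

    candidates_per_read maps a read name to a list of (start, length) tuples:
    the equally good deletion placements found when that read is aligned alone.
    Hypothetical helper for illustration only; not part of JRA.
    """
    # Count how many reads list each candidate placement (once per read).
    support = Counter(
        cand
        for cands in candidates_per_read.values()
        for cand in set(cands)
    )

    chosen = {}
    for read, cands in candidates_per_read.items():
        if not cands:
            continue  # read has no candidate deletion; nothing to resolve
        # Highest support wins; ties fall back to the smallest (start, length)
        # so that the result is deterministic.
        chosen[read] = min(cands, key=lambda c: (-support[c], c))
    return chosen


# Made-up example: a repeat makes read1 and read3 ambiguous on their own.
reads = {
    "read1": [(100, 25), (140, 25)],
    "read2": [(140, 25)],
    "read3": [(140, 25), (180, 25)],
}
consensus = resolve_jointly(reads)
```

In this made-up input, the placement `(140, 25)` is the only one listed by all three reads, so it is selected for each of them, whereas independent alignment would leave `read1` and `read3` undecided between two placements.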

SUBMITTER: Shrestha AMS 

PROVIDER: S-EPMC5815140 | biostudies-literature | 2018 Feb

REPOSITORIES: biostudies-literature

Publications

Jointly aligning a group of DNA reads improves accuracy of identifying large deletions.

Shrestha AMS, Frith MC, Asai K, Richard H

Nucleic Acids Research, 2018-02-01, issue 3


Similar Datasets

| S-EPMC4168712 | biostudies-literature
2017-07-30 | GSE101815 | GEO
| S-EPMC2335322 | biostudies-literature
| S-EPMC8097683 | biostudies-literature
| S-EPMC5221426 | biostudies-literature
| S-EPMC6022640 | biostudies-literature
| S-EPMC8279564 | biostudies-literature
| S-EPMC10576640 | biostudies-literature
| S-EPMC9881981 | biostudies-literature
| S-EPMC9882582 | biostudies-literature