Dataset Information

RepARK--de novo creation of repeat libraries from whole-genome NGS reads.


ABSTRACT: Generation of repeat libraries is a critical step for analysis of complex genomes. In the era of next-generation sequencing (NGS), such libraries are usually produced using a whole-genome shotgun (WGS) derived reference sequence whose completeness greatly influences the quality of derived repeat libraries. We describe here a de novo repeat assembly method--RepARK (Repetitive motif detection by Assembly of Repetitive K-mers)--which avoids potential biases by using abundant k-mers of NGS WGS reads without requiring a reference genome. For validation, repeat consensuses derived from simulated and real Drosophila melanogaster NGS WGS reads were compared to repeat libraries generated by four established methods. RepARK is orders of magnitude faster than the other methods and generates libraries that are: (i) composed almost entirely of repetitive motifs, (ii) more comprehensive and (iii) almost completely annotated by TEclass. Additionally, we show that the RepARK method is applicable to complex genomes like human and can even serve as a diagnostic tool to identify repetitive sequences contaminating NGS datasets.
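The core idea in the abstract, selecting k-mers that occur unusually often in the raw reads, can be illustrated with a toy sketch. This is not the RepARK implementation: the function names `count_kmers` and `abundant_kmers`, the example reads, and the fixed `threshold` are assumptions made for illustration. The sketch also ignores reverse complements and sequencing errors, and it omits the assembly of the selected k-mers into repeat consensuses.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (length-k substring) across all reads."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundant_kmers(counts, threshold):
    """Keep only k-mers observed at least `threshold` times (hypothetical cutoff)."""
    return {kmer: n for kmer, n in counts.items() if n >= threshold}


# Made-up reads: the first two share a repeated ACGT motif, the third does not.
reads = ["ACGTACGTAC", "CGTACGTACG", "TTGACCA"]
counts = count_kmers(reads, k=4)
repeats = abundant_kmers(counts, threshold=3)
```

On real data the cutoff would presumably be chosen relative to the expected single-copy coverage rather than fixed by hand, since k-mers from repetitive sequence appear in proportion to their copy number.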

SUBMITTER: Koch P 

PROVIDER: S-EPMC4027187 | biostudies-literature | 2014 May

REPOSITORIES: biostudies-literature

Publications

RepARK--de novo creation of repeat libraries from whole-genome NGS reads.

Koch P, Platzer M, Downie BR

Nucleic Acids Research. 2014 Mar 14;(9)


Similar Datasets

S-EPMC3726674 | biostudies-literature
S-EPMC4792456 | biostudies-literature
S-EPMC5009518 | biostudies-literature
S-EPMC4120091 | biostudies-literature
S-EPMC4582210 | biostudies-literature
S-EPMC3158087 | biostudies-literature
S-EPMC5411768 | biostudies-literature
S-EPMC5543108 | biostudies-literature
S-EPMC5770995 | biostudies-literature
S-EPMC4161746 | biostudies-literature