Dataset Information

The combination of direct and paired link graphs can boost repetitive genome assembly.


ABSTRACT: Most paired-link-based scaffolding algorithms mask the sequences between two linked contigs and bypass the direct link information embedded in the original de Bruijn assembly graph. This limitation substantially complicates the scaffolding process and leaves these algorithms unable to resolve the assembly of repetitive contigs. Here we present a novel algorithm, inGAP-sf, for generating high-quality, continuous scaffolds. inGAP-sf combines direct link and paired link graphs: direct links increase graph connectivity and decrease graph complexity, while paired links supervise the traversal of the direct link graph. This combination greatly facilitates the assembly of regions enriched in short repeats. Moreover, a new comprehensive decision model is developed to eliminate the noise routes that accompany the introduced direct links. Through extensive evaluations on both simulated and real datasets, we demonstrate that inGAP-sf outperforms most genome scaffolding algorithms by generating more accurate and continuous assemblies, especially for short repetitive regions.
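
The idea of letting paired links supervise a traversal of the direct link graph can be illustrated with a small sketch. This is not the inGAP-sf implementation, and it omits the decision model mentioned in the abstract. It assumes a toy model in which direct links are directed edges between contigs, each contig has a known length, and a paired link between two contigs supplies an estimated gap with a tolerance. The function name `find_supported_paths` and all of its parameters are hypothetical, and overlaps between adjacent contigs are ignored for simplicity.

```python
from collections import defaultdict


def find_supported_paths(direct_links, contig_len, src, dst,
                         gap_estimate, tolerance, max_depth=10):
    """Toy illustration (not the inGAP-sf algorithm).

    Enumerate paths from src to dst in the direct link graph and keep
    those whose intervening contig length agrees with the paired-link
    gap estimate, within +/- tolerance. Contigs may be revisited, so a
    short repeat can appear more than once on a path.
    """
    graph = defaultdict(list)
    for a, b in direct_links:
        graph[a].append(b)

    supported = []
    # Each entry: (path so far, total length of contigs after src on the path,
    # excluding dst).
    stack = [([src], 0)]
    while stack:
        path, between = stack.pop()
        node = path[-1]
        if node == dst and len(path) > 1:
            if abs(between - gap_estimate) <= tolerance:
                supported.append(path)
            continue
        # Prune routes that are too deep or already longer than the
        # paired link allows.
        if len(path) > max_depth or between > gap_estimate + tolerance:
            continue
        for nxt in graph[node]:
            extra = 0 if nxt == dst else contig_len[nxt]
            stack.append((path + [nxt], between + extra))
    return supported


# Hypothetical example: R is a short repeat with a self-link (tandem copies),
# X is a long alternative route between A and B.
direct_links = [("A", "R"), ("R", "R"), ("R", "B"), ("A", "X"), ("X", "B")]
contig_len = {"A": 1000, "R": 50, "X": 500, "B": 800}

two_copies = find_supported_paths(direct_links, contig_len, "A", "B",
                                  gap_estimate=100, tolerance=20)
one_copy = find_supported_paths(direct_links, contig_len, "A", "B",
                                gap_estimate=50, tolerance=20)
print(two_copies)
print(one_copy)
```

In this toy setting the direct link graph alone cannot say how many copies of the repeat R lie between A and B; the gap estimate from the paired link is what selects one route over the others.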

SUBMITTER: Shi W 

PROVIDER: S-EPMC5399794 | biostudies-literature | 2017 Apr

REPOSITORIES: biostudies-literature

Publications

The combination of direct and paired link graphs can boost repetitive genome assembly.

Shi Wenyu, Ji Peifeng, Zhao Fangqing

Nucleic Acids Research, 2017-04-01, issue 6



Similar Datasets

| S-EPMC3619201 | biostudies-literature
| S-EPMC8296540 | biostudies-literature
| S-EPMC9438950 | biostudies-literature
| S-EPMC5069867 | biostudies-literature
| S-EPMC8035996 | biostudies-literature
| S-EPMC3158087 | biostudies-literature
| S-EPMC4015147 | biostudies-literature
| S-EPMC8311964 | biostudies-literature
| S-EPMC6545690 | biostudies-literature
| S-EPMC6719893 | biostudies-literature