Dataset Information

Spongospora subterranea de novo genome assembly project

ABSTRACT: Spongospora genome

PROVIDER: PRJEB25390 | ENA

REPOSITORIES: ENA

Similar Datasets

Project description: There are very few methods for de novo genome assembly based on the overlap graph approach. This approach is considered to give more exact results than the de Bruijn graph approach, but at the cost of much longer running time and much higher memory usage. Assembly methods built on the overlap graph model often cannot process larger data sets at all, mainly because of the memory limits of the computer. This is why the assembly methods developed in recent decades have mostly been de Bruijn-based: they are fast and fairly accurate. However, de Bruijn-based methods can fail for longer or more repetitive genomes, because they decompose reads into shorter fragments and lose part of the information. An efficient assembler that uses the overlap graph model and can process big data sets is still being sought. We propose a new genome-scale de novo assembler based on the overlap graph approach, designed for short-read sequencing data. The method, ALGA, incorporates several new ideas that result in more exact contigs produced in a short time. These ideas include the creation of a sparse but quite informative graph, a reduction of the graph that includes a procedure related to the minimum spanning tree problem on a local subgraph, and a graph traversal combined with simultaneous analysis of the contigs stored so far. Unusually for genome assembly, the algorithm is almost parameter-free, with only one optional parameter to be set by the user. ALGA was compared with nine state-of-the-art assemblers in tests on genome-scale sequencing data obtained from real experiments on six organisms differing in size, coverage, GC content, and repetition rate. ALGA produced the best results in terms of overall quality of genome reconstruction, understood as a good balance between genome coverage, accuracy, and length of the resulting sequences. The algorithm is one of the tools used to process data in the ongoing national project Genomic Map of Poland.
ALGA is available at http://alga.put.poznan.pl. Supplementary material is available at Bioinformatics online.
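
The contrast drawn in the description between the overlap graph model and the de Bruijn model can be illustrated with a minimal Python sketch. This is not ALGA's implementation: the function names (`suffix_prefix_overlap`, `build_overlap_graph`, `kmers`), the `min_len` threshold, and the toy reads are all hypothetical, and the all-pairs comparison is a naive stand-in chosen for clarity rather than the sparse graph construction the description mentions.

```python
from itertools import permutations


def suffix_prefix_overlap(a, b, min_len):
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for length in range(max_len, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def build_overlap_graph(reads, min_len=3):
    """Naive overlap graph: one directed edge per ordered read pair that overlaps.

    Nodes are whole reads, so no read is broken up. The all-pairs loop is
    quadratic in the number of reads, which hints at why overlap-based
    assemblers are costly on large data sets.
    """
    graph = {}
    for a, b in permutations(reads, 2):
        length = suffix_prefix_overlap(a, b, min_len)
        if length:
            graph[(a, b)] = length
    return graph


def kmers(read, k):
    """Decompose a read into its overlapping k-mers, as a de Bruijn approach would."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


# Hypothetical toy reads, for illustration only.
reads = ["ACGTAC", "GTACGG", "ACGGTT"]
overlap_graph = build_overlap_graph(reads, min_len=3)
read_kmers = kmers(reads[0], 3)
```

Here `overlap_graph` keeps two edges, each with an overlap of length 4, linking the three intact reads into a chain, whereas `read_kmers` splits the first read into four 3-mers. Once reads are split this way, the information about which k-mers came from the same read is no longer carried by the graph itself, which is the loss the description refers to.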
