ABSTRACT: Background
The rapid drop in DNA sequencing prices has allowed rapid accumulation of genome data. However, obtaining complete genome sequences is still very time-consuming and labor-intensive. In addition, data produced by different sequencing technologies or alternative assemblies remain underexplored as a means of improving incomplete genome assemblies.
Findings
We have developed FGAP, a tool for closing gaps in draft genome sequences that takes advantage of different datasets. FGAP uses BLAST to align multiple contigs against a draft genome assembly, aiming to find sequences that overlap gaps. The algorithm then selects the best sequence to fill and eliminate each gap.
Conclusions
FGAP reduced the number of gaps by 78% in an E. coli draft genome assembly using data from two different sequencing technologies, Illumina and 454. Using PacBio long reads, 98% of gaps were closed. In human chromosome 14 assemblies, FGAP reduced the number of gaps by 35%. All inserted sequences were validated against a reference genome using QUAST. The source code and a web tool are available at http://www.bioinfo.ufpr.br/fgap/.
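The gap-closing idea summarized in the Findings can be illustrated with a toy sketch. This is not FGAP's implementation: it assumes gaps are marked as runs of `N`, it replaces the BLAST alignment step with exact matching of short flanking sequences, and it picks the "best" candidate as the one whose length is closest to the gap size. The function names, the `flank` parameter, and that selection rule are all hypothetical choices made for illustration.

```python
import re


def find_gaps(seq):
    """Return (start, end) spans of runs of N in seq, end exclusive."""
    return [m.span() for m in re.finditer(r"N+", seq)]


def fill_from_contig(left, right, contig):
    """Return the contig bases between the two flanks, or None if absent.

    Exact string matching stands in for a real alignment (assumption).
    """
    i = contig.find(left)
    if i < 0:
        return None
    j = contig.find(right, i + len(left))
    if j < 0:
        return None
    return contig[i + len(left):j]


def close_gaps(draft, contigs, flank=8):
    """Fill each N-run in draft with a sequence taken from contigs."""
    # Work right to left so earlier gap coordinates stay valid after edits.
    for start, end in reversed(find_gaps(draft)):
        left = draft[max(0, start - flank):start]
        right = draft[end:end + flank]
        if len(left) < flank or len(right) < flank:
            continue  # not enough flanking sequence to anchor a candidate
        fills = []
        for contig in contigs:
            fill = fill_from_contig(left, right, contig)
            if fill is not None:
                fills.append(fill)
        if not fills:
            continue  # no contig spans this gap; leave it open
        gap_len = end - start
        # Hypothetical selection rule: length closest to the gap size.
        best = min(fills, key=lambda f: abs(len(f) - gap_len))
        draft = draft[:start] + best + draft[end:]
    return draft


if __name__ == "__main__":
    draft = "TTGACCGATAGC" + "NNNNN" + "GGCATTCAGTAA"
    contigs = ["GACCGATAGCACGTAGGCATTCAGT", "CCGATAGCACGGCATTCA", "TTTTTTTT"]
    print(close_gaps(draft, contigs))
```

A real pipeline would tolerate mismatches in the flanks, consider both strands, and score candidates by alignment quality, which is what an aligner such as BLAST provides.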
SUBMITTER: Piro VC
PROVIDER: S-EPMC4091766 | biostudies-literature | 2014 Jun
REPOSITORIES: biostudies-literature
AUTHORS: Vitor C Piro, Helisson Faoro, Vinicius A Weiss, Maria B R Steffens, Fabio O Pedrosa, Emanuel M Souza, Roberto T Raittz
BMC Research Notes, 2014-06-18