Dataset Information

PEP_scaffolder: using (homologous) proteins to scaffold genomes.


ABSTRACT: Recovering gene structures is one of the important goals of genome assembly. In low-quality assemblies, and even in some high-quality assemblies, certain gene regions remain incomplete; novel scaffolding approaches are therefore required to complete them. We developed an efficient and fast genome scaffolding method called PEP_scaffolder, which uses proteins to scaffold genomes. The pipeline aims to recover protein-coding gene structures. We tested the method on human contigs: using human UniProt proteins as guides, it increased the N50 size by 17% with an accuracy of ~97%. PEP_scaffolder raised the proportion of fully covered proteins among all proteins to a level close to that in the finished genome. Using orthologs from distant species as guides, the method still achieved a high accuracy of 91%. On simulated fly contigs, PEP_scaffolder outperformed other scaffolders, with the shortest running time and the highest accuracy. The software is freely available at http://www.fishbrowser.org/software/PEP_scaffolder/

CONTACT: lijt@cafs.ac.cn

Supplementary information: Supplementary data are available at Bioinformatics online.
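The abstract does not describe PEP_scaffolder's internal algorithm, so the following is only a toy sketch of the general idea of protein-guided scaffolding and of the N50 metric it reports. It assumes hypothetical alignment records (`ProteinHit`) giving, for each guide protein, which contig each aligned segment falls on. Consecutive segments of one protein that land on different contigs are taken as evidence that those contigs are neighbours. Linked contigs are then merged with a union-find, and an arbitrary fixed 100-bp gap is assumed per join. Orientation, ordering conflicts and realistic gap sizing are all ignored; this is not the published method.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    """Hypothetical record: one segment of a guide protein aligned to a contig."""
    protein: str
    prot_start: int  # first aligned residue on the protein (1-based)
    prot_end: int    # last aligned residue on the protein
    contig: str


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def protein_links(hits):
    """Propose contig adjacencies from proteins whose alignments span contigs.

    Hits of one protein are ordered along the protein; consecutive hits on
    different contigs suggest those contigs are neighbours in the genome.
    Orientation and conflicting links are ignored in this toy version.
    """
    by_protein = {}
    for hit in hits:
        by_protein.setdefault(hit.protein, []).append(hit)
    links = set()
    for prot_hits in by_protein.values():
        prot_hits.sort(key=lambda h: h.prot_start)
        for left, right in zip(prot_hits, prot_hits[1:]):
            if left.contig != right.contig:
                links.add((left.contig, right.contig))
    return links


def scaffold_lengths(contig_lengths, links, gap=100):
    """Merge linked contigs with union-find; add an assumed fixed N-gap per join."""
    parent = {c: c for c in contig_lengths}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    totals, counts = {}, {}
    for contig, length in contig_lengths.items():
        root = find(contig)
        totals[root] = totals.get(root, 0) + length
        counts[root] = counts.get(root, 0) + 1
    # A group of k contigs has k - 1 joins, each filled with `gap` Ns.
    return [totals[r] + gap * (counts[r] - 1) for r in totals]


# Invented toy data, for illustration only.
contigs = {"ctg1": 5000, "ctg2": 3000, "ctg3": 2000, "ctg4": 1000}
hits = [
    ProteinHit("P1", 1, 120, "ctg2"),
    ProteinHit("P1", 121, 300, "ctg3"),
    ProteinHit("P2", 1, 80, "ctg3"),
    ProteinHit("P2", 81, 200, "ctg4"),
]
links = protein_links(hits)
print(n50(contigs.values()))                  # 3000
print(n50(scaffold_lengths(contigs, links)))  # 6200
```

In this made-up example, two proteins that each span two contigs chain ctg2, ctg3 and ctg4 into one scaffold, lifting N50 from 3000 to 6200. The numbers illustrate the metric only and are not results from the paper.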

SUBMITTER: Zhu BH 

PROVIDER: S-EPMC5048069 | biostudies-other | 2016 Oct

REPOSITORIES: biostudies-other

Publications

PEP_scaffolder: using (homologous) proteins to scaffold genomes.

Zhu Bai-Han, Song Ying-Nan, Xue Wei, Xu Gui-Cai, Xiao Jun, Sun Ming-Yuan, Sun Xiao-Wen, Li Jiong-Tang

Bioinformatics (Oxford, England), 2016-06-22, issue 20

Similar Datasets

| S-EPMC22334 | biostudies-literature
| S-EPMC2740814 | biostudies-literature
| S-EPMC4102405 | biostudies-literature
| S-EPMC4486954 | biostudies-literature
| S-EPMC10466382 | biostudies-literature
2016-06-01 | E-GEOD-79095 | biostudies-arrayexpress
| S-EPMC3792089 | biostudies-literature
| S-EPMC6404977 | biostudies-literature
| S-EPMC6317479 | biostudies-literature
| S-EPMC10130707 | biostudies-literature