
Dataset Information


Compleasm: a faster and more accurate reimplementation of BUSCO.


ABSTRACT:

Motivation

Evaluating gene completeness is critical to measuring the quality of a genome assembly. An incomplete assembly can lead to errors in gene prediction, annotation, and other downstream analyses. Benchmarking Universal Single-Copy Orthologs (BUSCO) is a widely used tool for assessing the completeness of a genome assembly by testing for the presence of a set of single-copy orthologs conserved across a wide range of taxa. However, BUSCO is slow, particularly for large genome assemblies, which makes it cumbersome to apply to a large number of assemblies.

Results

Here, we present compleasm, an efficient tool for assessing the completeness of genome assemblies. Compleasm uses the miniprot protein-to-genome aligner together with the conserved orthologous genes from BUSCO. It is 14 times faster than BUSCO on human assemblies. It also reports a more accurate completeness for T2T-CHM13: 99.6%, compared with BUSCO's 95.7%, in close agreement with the annotation completeness of 99.5%.
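Completeness figures like those above are percentages of the lineage's conserved orthologs that are found complete in the assembly. The sketch below assumes the usual BUSCO convention: complete = single-copy + duplicated, divided by the total number of orthologs in the lineage dataset. It uses a simplified four-category scheme. The class and function names are hypothetical (not part of the compleasm or BUSCO API), and the counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class OrthologCounts:
    """Hypothetical per-category ortholog counts, e.g. parsed from a summary file."""
    single: int      # complete, found in exactly one copy
    duplicated: int  # complete, found in more than one copy
    fragmented: int  # only partially recovered
    missing: int     # not found

    @property
    def total(self) -> int:
        # Total number of orthologs in the lineage dataset.
        return self.single + self.duplicated + self.fragmented + self.missing


def completeness(counts: OrthologCounts) -> float:
    """Percentage of orthologs classified as complete (single-copy or duplicated)."""
    if counts.total == 0:
        raise ValueError("lineage dataset contains no orthologs")
    return 100.0 * (counts.single + counts.duplicated) / counts.total


# Hypothetical counts for illustration only; not results from the paper.
example = OrthologCounts(single=980, duplicated=10, fragmented=5, missing=5)
print(f"{completeness(example):.1f}%")  # prints 99.0%
```

Because duplicated orthologs count toward completeness, a high score does not by itself rule out assembly duplication. The single-copy and duplicated categories are usually reported separately for that reason.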

Availability and implementation

https://github.com/huangnengCSU/compleasm.

SUBMITTER: Huang N 

PROVIDER: S-EPMC10558035 | biostudies-literature | 2023 Oct

REPOSITORIES: biostudies-literature


Publications

compleasm: a faster and more accurate reimplementation of BUSCO.

Huang N, Li H

Bioinformatics (Oxford, England), 2023 Oct 1; issue 10



Similar Datasets

| S-EPMC10152795 | biostudies-literature
| S-EPMC9853855 | biostudies-literature
| S-EPMC4908353 | biostudies-literature
| S-EPMC11532793 | biostudies-literature
| S-EPMC4378532 | biostudies-other
| S-EPMC6078290 | biostudies-literature
| S-EPMC9352231 | biostudies-literature
| S-EPMC4872991 | biostudies-literature
| S-EPMC3217585 | biostudies-literature
| S-EPMC5850500 | biostudies-literature