Dataset Information

SigAlign: an alignment algorithm guided by explicit similarity criteria.


ABSTRACT: In biological sequence alignment, prevailing heuristic aligners achieve high throughput through several approximation techniques, but at the cost of sacrificing the clarity of output criteria and creating complex parameter spaces. To surmount these challenges, we introduce 'SigAlign', a novel alignment algorithm that employs two explicit cutoffs for the results: minimum length and maximum penalty per length, alongside three affine gap penalties. Comparative analyses of SigAlign against leading database search tools (BLASTn, MMseqs2) and read mappers (BWA-MEM, bowtie2, HISAT2, minimap2) highlight its performance in read mapping and database searches. Our research demonstrates that SigAlign not only provides high sensitivity with a non-heuristic approach, but also surpasses the throughput of existing heuristic aligners, particularly for high-accuracy reads or genomes with few repetitive regions. As an open-source library, SigAlign is poised to become a foundational component that provides a transparent and customizable alignment process for new analytical algorithms, tools and pipelines in bioinformatics.
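
The two output cutoffs named in the abstract can be illustrated with a minimal sketch. This is not SigAlign's API or its search algorithm; it only checks whether an already-computed alignment would satisfy "minimum length" and "maximum penalty per length". Everything here is an assumption made for illustration: the names `Criteria`, `alignment_penalty` and `passes_cutoffs`, the reading of the three penalties as mismatch, gap-open and gap-extend, the convention that a gap of k positions costs one open plus k extensions, and the definition of length as the number of alignment columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    # Hypothetical parameter names; the abstract only states that there are
    # three affine gap penalties and two cutoffs.
    mismatch: int
    gap_open: int
    gap_extend: int
    min_length: int
    max_penalty_per_length: float


def alignment_penalty(ops: str, c: Criteria) -> int:
    """Sum the penalty of an alignment given as one operation per column.

    'M' = match, 'X' = mismatch, 'I' = insertion, 'D' = deletion.
    Assumed convention: a run of k identical gap operations costs
    gap_open + k * gap_extend.
    """
    penalty = 0
    prev = ""
    for op in ops:
        if op == "X":
            penalty += c.mismatch
        elif op in ("I", "D"):
            if op != prev:  # a new gap run starts here
                penalty += c.gap_open
            penalty += c.gap_extend
        elif op != "M":
            raise ValueError(f"unknown operation: {op!r}")
        prev = op
    return penalty


def passes_cutoffs(ops: str, c: Criteria) -> bool:
    """Return True if the alignment meets both explicit cutoffs."""
    length = len(ops)  # assumed definition: number of alignment columns
    if length < c.min_length:
        return False
    # Compare penalty <= ratio * length to avoid dividing by the length.
    return alignment_penalty(ops, c) <= c.max_penalty_per_length * length


if __name__ == "__main__":
    criteria = Criteria(mismatch=4, gap_open=6, gap_extend=2,
                        min_length=10, max_penalty_per_length=0.2)
    for ops in ("M" * 20 + "X" + "M" * 19, "M" * 10 + "II" + "M" * 10, "MMMMM"):
        print(len(ops), alignment_penalty(ops, criteria),
              passes_cutoffs(ops, criteria))
```

The point of the sketch is that both cutoffs are stated directly in terms of the reported alignment, so a caller can verify any result against them without knowing how the search was performed.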

SUBMITTER: Bahk K 

PROVIDER: S-EPMC11347165 | biostudies-literature | 2024 Aug

REPOSITORIES: biostudies-literature

Publications

SigAlign: an alignment algorithm guided by explicit similarity criteria.

Kunhyung Bahk, Joohon Sung

Nucleic Acids Research, 2024 Aug 1, issue 15


Similar Datasets

| S-EPMC5860613 | biostudies-literature
| S-EPMC3705623 | biostudies-literature
| S-EPMC3774796 | biostudies-literature
| S-EPMC5137889 | biostudies-literature
| S-EPMC4410667 | biostudies-literature
| S-EPMC10262298 | biostudies-literature
| S-EPMC145823 | biostudies-other
| S-EPMC1409777 | biostudies-literature
| S-EPMC3289081 | biostudies-literature
| S-EPMC4528633 | biostudies-literature