
Dataset Information


Gentle masking of low-complexity sequences improves homology search.


ABSTRACT: Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with "gentle" masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is min(0,S), where S is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to "harsh" masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search.
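The scoring rule stated in the abstract, min(0,S) for any match involving a masked letter, can be illustrated with a minimal Python sketch. Several things here are assumptions for illustration only and do not come from the record: masked letters are marked by lowercase, the unmasked score S is a simple +1 match / -1 mismatch scheme, and the function names (`unmasked_score`, `gentle_score`, `ungapped_score`) are hypothetical. It is not the authors' implementation.

```python
def unmasked_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Score S for aligning letters a and b, ignoring masking.

    Assumption: a toy +1/-1 scheme stands in for a real score matrix.
    """
    return match if a.upper() == b.upper() else mismatch


def gentle_score(a: str, b: str) -> int:
    """Gently masked score: min(0, S) if either letter is masked.

    Assumption: lowercase letters denote masked (low-complexity) positions.
    """
    s = unmasked_score(a, b)
    if a.islower() or b.islower():
        # A masked letter can never contribute a positive score,
        # but it still incurs the usual penalty for a mismatch.
        return min(0, s)
    return s


def ungapped_score(x: str, y: str) -> int:
    """Sum of gently masked scores over an ungapped alignment of x and y."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(gentle_score(a, b) for a, b in zip(x, y))


if __name__ == "__main__":
    # The masked tract 'atat' adds nothing; only the unmasked 'ACGT' scores.
    print(ungapped_score("ACGTatat", "ACGTatat"))
    # A mismatch at a masked position is still penalized.
    print(ungapped_score("ACGa", "ACGt"))
```

Under these assumptions, a low-complexity tract cannot by itself create a high-scoring similarity, yet an alignment can still extend through the tract at zero cost when the letters match.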

SUBMITTER: Frith MC 

PROVIDER: S-EPMC3242753 | biostudies-literature | 2011

REPOSITORIES: biostudies-literature


Publications

Gentle masking of low-complexity sequences improves homology search.

Frith, Martin C.

PLoS ONE, 2011-12-19, issue 12



Similar Datasets

| S-EPMC7773278 | biostudies-literature
| S-EPMC4481844 | biostudies-other
2021-07-21 | GSE179646 | GEO
| S-EPMC2873317 | biostudies-literature
| S-EPMC2895107 | biostudies-literature
2024-07-29 | GSE272969 | GEO
2022-03-04 | GSE189259 | GEO
| S-EPMC3208395 | biostudies-literature
2021-07-21 | GSE179638 | GEO
2021-07-21 | GSE179641 | GEO