
Dataset Information


CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection.


ABSTRACT: Genes evolve by point mutations, but also by the shuffling, fusion, and fission of genetic fragments. Therefore, similarity between two sequences can be due to common ancestry producing homology, and/or to partial sharing of component fragments. Disentangling these processes is especially challenging in large molecular data sets because of the computational time required. In this article, we present CompositeSearch, a memory-efficient, fast, and scalable method to detect composite gene families in large data sets (typically in the range of several million sequences). CompositeSearch generalizes the use of similarity networks to detect composite and component gene families with greater recall, accuracy, and precision than recent programs (FusedTriplets and MosaicFinder). Moreover, CompositeSearch provides user-friendly quality descriptions of the distribution and primary sequence conservation of these gene families, allowing critical biological analyses of these data.
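The abstract describes the approach only at a high level. As a rough illustration of what "composite" means in a similarity network, and explicitly not CompositeSearch's actual algorithm, the sketch below flags a sequence C whose alignments to two sequences A and B cover largely non-overlapping regions of C, while A and B show no direct similarity to each other. The input format (tuples of query, subject, and alignment start/end on the query), the function name `find_composite_candidates`, and the `max_overlap` threshold are all hypothetical choices made for this example.

```python
from itertools import combinations


def find_composite_candidates(hits, max_overlap=20):
    """Hypothetical sketch: report (composite, component_a, component_b)
    triplets from a sequence similarity network.

    hits: iterable of (query, subject, q_start, q_end) tuples with 1-based,
    inclusive coordinates on the query (an assumed, simplified format).
    max_overlap: largest overlap (in residues) tolerated between the two
    regions of the composite covered by its two components (assumed value).
    """
    edges = set()   # undirected similarity edges
    regions = {}    # (composite, partner) -> (start, end) on composite
    for q, s, start, end in hits:
        if q == s:
            continue  # ignore self-hits
        edges.add(frozenset((q, s)))
        regions[(q, s)] = (start, end)  # keeps only the last hit per pair

    # Adjacency lists of the similarity network.
    neighbours = {}
    for edge in edges:
        a, b = tuple(edge)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    candidates = []
    for c, nbrs in neighbours.items():
        for a, b in combinations(sorted(nbrs), 2):
            if frozenset((a, b)) in edges:
                continue  # a and b are similar: no evidence of fusion
            ra, rb = regions.get((c, a)), regions.get((c, b))
            if ra is None or rb is None:
                continue  # need alignment coordinates on c itself
            # Length of the shared stretch of c (negative if disjoint).
            overlap = min(ra[1], rb[1]) - max(ra[0], rb[0]) + 1
            if overlap <= max_overlap:
                candidates.append((c, a, b))
    return candidates


hits = [
    ("C", "A", 1, 100),    # A matches the N-terminal part of C
    ("C", "B", 110, 200),  # B matches the C-terminal part of C
    ("A", "C", 1, 100),
    ("B", "C", 1, 90),
]
print(find_composite_candidates(hits))  # [('C', 'A', 'B')]
```

Adding any direct A-B hit to `hits` removes the candidate, because the two putative components would then be similar to each other. A real tool would also need to handle multiple high-scoring segments per pair, gene families rather than single sequences, and the scale of millions of sequences mentioned above, none of which this sketch attempts.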

SUBMITTER: Pathmanathan JS 

PROVIDER: S-EPMC5850286 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature


Publications

CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection.

Jananan Sylvestre Pathmanathan, Philippe Lopez, François-Joseph Lapointe, Eric Bapteste

Molecular Biology and Evolution, 2018-01-01, issue 1



Similar Datasets

| S-EPMC6420648 | biostudies-literature
2008-04-02 | GSE11010 | GEO
| S-EPMC7055778 | biostudies-literature
2008-04-02 | E-GEOD-11010 | biostudies-arrayexpress
2024-10-24 | GSE274751 | GEO
| S-EPMC6452684 | biostudies-literature
| PRJNA107095 | ENA
| S-EPMC8114821 | biostudies-literature
| S-EPMC6110828 | biostudies-other
| S-EPMC6124095 | biostudies-other