
Dataset Information


GenFamClust: an accurate, synteny-aware and reliable homology inference algorithm.


ABSTRACT:

BACKGROUND: Homology inference is pivotal to evolutionary biology and is primarily based on significant sequence similarity, which, in general, is a good indicator of homology. Algorithms have also been designed to utilize conservation in gene order as an indication of homologous regions. We have developed GenFamClust, a method based on quantification of both gene order conservation and sequence similarity.

RESULTS: In this study, we validate GenFamClust by comparing it to well-known homology inference algorithms on a synthetic dataset. We applied several popular clustering algorithms on homologs inferred by GenFamClust and other algorithms on a metazoan dataset and studied the outcomes. Accuracy, similarity, dependence, and other characteristics were investigated for gene families yielded by the clustering algorithms. GenFamClust was also applied to genes from a set of complete fungal genomes and gene families were inferred using clustering. The resulting gene families were compared with a manually curated gold standard of pillars from the Yeast Gene Order Browser. We found that the gene-order component of GenFamClust is simple, yet biologically realistic, and captures local synteny information for homologs.

CONCLUSIONS: The study shows that GenFamClust is a more accurate, informed, and comprehensive pipeline to infer homologs and gene families than other commonly used homology and gene-family inference methods.

SUBMITTER: Ali RH 

PROVIDER: S-EPMC4893229 | biostudies-literature | 2016 Jun

REPOSITORIES: biostudies-literature


Publications

GenFamClust: an accurate, synteny-aware and reliable homology inference algorithm.

Ali RH, Muhammad SA, Arvestad L

BMC Evolutionary Biology, 2016-06-04, 1



Similar Datasets

| S-EPMC3852004 | biostudies-literature
| S-EPMC2647951 | biostudies-literature
| S-EPMC6321874 | biostudies-literature
| S-EPMC5790135 | biostudies-literature
| S-EPMC10187222 | biostudies-literature
| S-EPMC1933213 | biostudies-literature
| S-EPMC5066062 | biostudies-literature
| S-EPMC10016055 | biostudies-literature
| S-EPMC6248979 | biostudies-literature
| S-EPMC9750109 | biostudies-literature