
Dataset Information


The gene family-free median of three.


ABSTRACT:
BACKGROUND: The gene family-free framework for comparative genomics aims to provide methods for gene order analysis that do not require prior gene family assignment and instead work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes.
METHODS: We present a model for constructing a median of three genomes in this family-free setting. The model maximizes an objective function that generalizes the classical breakpoint distance by integrating sequence similarity into the score of a gene adjacency. We study its computational complexity and describe an integer linear program (ILP) for its exact solution. We further discuss a related problem called family-free adjacencies for k genomes for the special case of [Formula: see text] and present an ILP for its solution. However, for this problem, computing exact solutions remains intractable for sufficiently large instances. We therefore describe a heuristic method, FFAdj-AM, which performs well in practice.
RESULTS: On simulated data and on genomic data obtained from the OMA orthology database, the developed methods compute accurate positional orthologs for genomes comparable in size to bacterial genomes. In particular, FFAdj-AM performs as well as or better than the well-established gene family prediction tool MultiMSOAR.
CONCLUSIONS: We study the computational complexity of a new family-free model and present algorithms for its solution. With FFAdj-AM, we propose an appealing alternative to established tools for identifying higher-confidence positional orthologs.
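The classical (family-based) breakpoint median that the family-free model generalizes can be illustrated with a small sketch. This is not the authors' family-free model, objective function, or ILP. It assumes linear, single-chromosome genomes with identical gene content, encoded as lists of signed integers; it ignores telomeres; and it finds a median by exhaustive search over linear, single-chromosome candidates, which is only feasible for a handful of genes. All function names are hypothetical.

```python
from itertools import permutations, product


def extremities(gene):
    """Return the (left, right) extremities of a signed gene read left to right."""
    g = abs(gene)
    tail, head = (g, "t"), (g, "h")
    # A gene in forward orientation reads tail->head; a reversed gene reads head->tail.
    return (tail, head) if gene > 0 else (head, tail)


def adjacencies(genome):
    """Adjacency set of a linear, single-chromosome genome (telomeres ignored by assumption)."""
    return {
        frozenset((extremities(x)[1], extremities(y)[0]))
        for x, y in zip(genome, genome[1:])
    }


def breakpoint_distance(a, b):
    """Number of adjacencies of `a` that are not adjacencies of `b`."""
    return len(adjacencies(a) - adjacencies(b))


def median_score(candidate, genomes):
    """Sum of breakpoint distances from a candidate median to each input genome."""
    return sum(breakpoint_distance(candidate, g) for g in genomes)


def brute_force_median(genomes):
    """Try every signed ordering of the shared gene set (tiny inputs only)."""
    genes = sorted({abs(x) for x in genomes[0]})
    best, best_score = None, None
    for order in permutations(genes):
        for signs in product((1, -1), repeat=len(genes)):
            candidate = [s * g for s, g in zip(signs, order)]
            score = median_score(candidate, genomes)
            if best_score is None or score < best_score:
                best, best_score = candidate, score
    return best, best_score


genomes = [[1, 2, 3, 4], [1, 2, 3, 4], [1, -3, -2, 4]]
print(brute_force_median(genomes))  # → ([1, 2, 3, 4], 2)
```

The search space grows as n! · 2^n for n genes, which is why exact median computation at realistic genome sizes relies on formulations such as ILPs or on heuristics rather than enumeration.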

SUBMITTER: Doerr D 

PROVIDER: S-EPMC5446766 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature


Publications

The gene family-free median of three.

Daniel Doerr, Metin Balaban, Pedro Feijão, Cedric Chauve

Algorithms for Molecular Biology: AMB, 2017-05-26



Similar Datasets

| S-EPMC3526435 | biostudies-literature
| S-EPMC3668173 | biostudies-literature
| S-EPMC2821827 | biostudies-literature
| S-EPMC2874752 | biostudies-literature
| S-EPMC4867544 | biostudies-literature
| S-EPMC8356735 | biostudies-literature
| S-EPMC7232685 | biostudies-literature
| S-EPMC10298685 | biostudies-literature
| S-EPMC8111734 | biostudies-literature
| S-EPMC3854447 | biostudies-literature