
Dataset Information


MafFilter: a highly flexible and extensible multiple genome alignment files processor.


ABSTRACT:

Background

Sequence alignments are the starting point for most evolutionary and comparative analyses. Full genome sequences can be compared to study patterns of within- and between-species variation. Genome sequence alignments are complex structures containing information such as coordinates, quality scores and synteny structure, which are stored in Multiple Alignment Format (MAF) files. Processing these alignments therefore requires efficient parsing and manipulation of MAF files, which are typically large.
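To make the structure of a MAF file concrete, the following is a minimal Python sketch of a block reader. It is not MafFilter code and makes no claim about how MafFilter parses its input. It assumes the commonly documented MAF layout, in which a line starting with `a` opens an alignment block and each `s` line carries the fields source, start, size, strand, source size and aligned text. The names `MafSequence`, `read_maf_blocks` and `EXAMPLE_MAF`, and the sequence names in the example data, are hypothetical.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class MafSequence:
    src: str       # source name, e.g. "species.chromosome"
    start: int     # zero-based start on the given strand
    size: int      # number of non-gap characters in the aligned text
    strand: str    # "+" or "-"
    src_size: int  # total length of the source sequence
    text: str      # aligned text, with "-" marking gaps


def read_maf_blocks(handle: TextIO) -> Iterator[List[MafSequence]]:
    """Yield each alignment block as a list of its 's' lines.

    Assumption: only 'a' and 's' lines are interpreted; comment lines
    ('#') and any other line types are skipped.
    """
    block = None
    for raw in handle:
        line = raw.strip()
        if line == "a" or line.startswith("a "):
            # A new block starts; emit the previous one if it has rows.
            if block:
                yield block
            block = []
        elif line.startswith("s ") and block is not None:
            _, src, start, size, strand, src_size, text = line.split()
            block.append(
                MafSequence(src, int(start), int(size), strand, int(src_size), text)
            )
        elif not line and block:
            # A blank line closes the current block.
            yield block
            block = None
    if block:
        yield block


# Hypothetical example data, made up for illustration only.
EXAMPLE_MAF = """##maf version=1
a score=10.0
s spA.chr1 100 8 + 1000 ACGT-ACGT
s spB.chr1 200 9 + 2000 ACGTTACGT

a score=5.0
s spA.chr1 300 4 + 1000 ACGT
s spB.chr1 400 4 - 2000 ACGA
"""

blocks = list(read_maf_blocks(io.StringIO(EXAMPLE_MAF)))
```

Because the reader is a generator, blocks are handled one at a time, so a large file never has to be held in memory as a whole.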

Results

MafFilter is a command-line-driven program written in C++ that enables the processing of genome alignments stored in the Multiple Alignment Format in an efficient and extensible manner. It provides an extensive set of tools which can be parametrized and combined by the user via option files. We demonstrate the software's functionality and performance on several biological examples covering primate genomics and fungal population genomics. Example analyses involve window-based alignment filtering, feature extraction, and various statistical, phylogenetic and population genomic calculations.
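As an illustration of what window-based alignment filtering can mean, here is a short Python sketch that slides a fixed-size window along the columns of one alignment block and flags windows containing too many gaps. It is a generic example under stated assumptions, not MafFilter's algorithm or option syntax; the function name `gappy_windows` and its parameters `window`, `step` and `max_gap_fraction` are hypothetical.

```python
from typing import List, Tuple


def gappy_windows(
    rows: List[str], window: int, step: int, max_gap_fraction: float
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges whose gap fraction
    exceeds max_gap_fraction.

    Assumption: rows are the aligned texts of one block, all of equal
    length, with "-" marking gaps. Trailing columns that do not fill a
    complete window are not examined.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if not rows:
        return []
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all rows must have the same aligned length")

    flagged = []
    for start in range(0, length - window + 1, step):
        end = start + window
        # Count gap characters across all rows inside this window.
        gaps = sum(row[start:end].count("-") for row in rows)
        if gaps / (window * len(rows)) > max_gap_fraction:
            flagged.append((start, end))
    return flagged


# Hypothetical two-row block: the middle window is half gaps.
example_rows = ["ACGT----ACGT", "ACGTACGTACGT"]
flagged = gappy_windows(example_rows, window=4, step=4, max_gap_fraction=0.25)
```

A filter built on this would then drop or split the block at the flagged ranges; which of the two is appropriate depends on the downstream analysis.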

Conclusions

MafFilter is a highly efficient and flexible tool for analysing multiple genome alignments. By allowing users to combine a large set of available methods, and to design their own, it enables the construction of custom data filtering and analysis pipelines for genomic studies. MafFilter is open source software available at http://bioweb.me/maffilter.

SUBMITTER: Dutheil JY 

PROVIDER: S-EPMC3904536 | biostudies-literature | 2014 Jan

REPOSITORIES: biostudies-literature


Publications

MafFilter: a highly flexible and extensible multiple genome alignment files processor.

Julien Y. Dutheil, Sylvain Gaillard, Eva H. Stukenbrock

BMC Genomics, 2014 Jan 22



Similar Datasets

| S-EPMC2699263 | biostudies-literature
| S-EPMC6054264 | biostudies-literature
| S-EPMC9571520 | biostudies-literature
| S-EPMC3229529 | biostudies-literature
| S-EPMC3142524 | biostudies-literature
| S-EPMC8100226 | biostudies-literature
| S-EPMC4504710 | biostudies-literature
| S-EPMC10663985 | biostudies-literature
| S-EPMC9421119 | biostudies-literature
| S-EPMC2892488 | biostudies-literature