Dataset Information

PosiGene: automated and easy-to-use pipeline for genome-wide detection of positively selected genes.

ABSTRACT: Many comparative genomics studies aim to find the genetic basis of species-specific phenotypic traits. A prevailing strategy is to search genome-wide for genes that evolved under positive selection based on the non-synonymous to synonymous substitution ratio. However, incongruent results largely due to high false positive rates indicate the need for standardization of quality criteria and software tools. Main challenges are the ortholog and isoform assignment, the high sensitivity of the statistical models to alignment errors and the imperative to parallelize large parts of the software. We developed the software tool PosiGene that (i) detects positively selected genes (PSGs) on genome-scale, (ii) allows analysis of specific evolutionary branches, (iii) can be used in arbitrary species contexts and (iv) offers visualization of the results for further manual validation and biological interpretation. We exemplify PosiGene's performance using simulated and real data. In the simulated data approach, we determined a false positive rate <1%. With real data, we found that 68.4% of the PSGs detected by PosiGene, were shared by at least one previous study that used the same set of species. PosiGene is a user-friendly, reliable tool for reproducible genome-wide identification of PSGs and freely available at https://github.com/gengit/PosiGene.

SUBMITTER: Sahm A

PROVIDER: S-EPMC5499814 | biostudies-other | 2017 Jun

REPOSITORIES: biostudies-other

ACCESS DATA

Publications

PosiGene: automated and easy-to-use pipeline for genome-wide detection of positively selected genes.

Sahm Arne A Bens Martin M Platzer Matthias M Szafranski Karol K

Nucleic acids research 20170601 11

Many comparative genomics studies aim to find the genetic basis of species-specific phenotypic traits. A prevailing strategy is to search genome-wide for genes that evolved under positive selection based on the non-synonymous to synonymous substitution ratio. However, incongruent results largely due to high false positive rates indicate the need for standardization of quality criteria and software tools. Main challenges are the ortholog and isoform assignment, the high sensitivity of the statist ...[more]

PMID: 28334822

Similar Datasets

Project description:BackgroundGenome imputation, admixture resolution and genome-wide association analyses are timely and computationally intensive processes with many composite and requisite steps. Analysis time increases further when building and installing the run programs required for these analyses. For scientists that may not be as versed in programing language, but want to perform these operations hands on, there is a lengthy learning curve to utilize the vast number of programs available for these analyses.ResultsIn an effort to streamline the entire process with easy-to-use steps for scientists working with big data, the Odyssey pipeline was developed. Odyssey is a simplified, efficient, semi-automated genome-wide imputation and analysis pipeline, which prepares raw genetic data, performs pre-imputation quality control, phasing, imputation, post-imputation quality control, population stratification analysis, and genome-wide association with statistical data analysis, including result visualization. Odyssey is a pipeline that integrates programs such as PLINK, SHAPEIT, Eagle, IMPUTE, Minimac, and several R packages, to create a seamless, easy-to-use, and modular workflow controlled via a single user-friendly configuration file. Odyssey was built with compatibility in mind, and thus utilizes the Singularity container solution, which can be run on Linux, MacOS, and Windows platforms. It is also easily scalable from a simple desktop to a High-Performance System (HPS).ConclusionOdyssey facilitates efficient and fast genome-wide association analysis automation and can go from raw genetic data to genome: phenome association visualization and analyses results in 3-8 h on average, depending on the input data, choice of programs within the pipeline and available computer resources. Odyssey was built to be flexible, portable, compatible, scalable, and easy to setup. Biologists less familiar with programing can now work hands on with their own big data using this easy-to-use pipeline.

Dataset Information

PosiGene: automated and easy-to-use pipeline for genome-wide detection of positively selected genes.

Publications

PosiGene: automated and easy-to-use pipeline for genome-wide detection of positively selected genes.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets