Dataset Information

Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates.

ABSTRACT: BACKGROUND: Model selection is a vital part of most phylogenetic analyses, and accounting for the heterogeneity in evolutionary patterns across sites is particularly important. Mixture models and partitioning are commonly used to account for this variation, and partitioning is the most popular approach. Most current partitioning methods require some a priori partitioning scheme to be defined, typically guided by known structural features of the sequences, such as gene boundaries or codon positions. Recent evidence suggests that these a priori boundaries often fail to adequately account for variation in rates and patterns of evolution among sites. Furthermore, new phylogenomic datasets such as those assembled from ultra-conserved elements lack obvious structural features on which to define a priori partitioning schemes. The upshot is that, for many phylogenetic datasets, partitioned models of molecular evolution may be inadequate, thus limiting the accuracy of downstream phylogenetic analyses. RESULTS: We present a new algorithm that automatically selects a partitioning scheme via the iterative division of the alignment into subsets of similar sites based on their rates of evolution. We compare this method to existing approaches using a wide range of empirical datasets, and show that it consistently leads to large increases in the fit of partitioned models of molecular evolution when measured using AICc and BIC scores. In doing so, we demonstrate that some related approaches to solving this problem may have been associated with a small but important bias. CONCLUSIONS: Our method provides an alternative to traditional approaches to partitioning, such as dividing alignments by gene and codon position. Because our method is data-driven, it can be used to estimate partitioned models for all types of alignments, including those that are not amenable to traditional approaches to partitioning.

SUBMITTER: Frandsen PB

PROVIDER: S-EPMC4327964 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates.

Frandsen Paul B PB Calcott Brett B Mayer Christoph C Lanfear Robert R

BMC evolutionary biology 20150210

<h4>Background</h4>Model selection is a vital part of most phylogenetic analyses, and accounting for the heterogeneity in evolutionary patterns across sites is particularly important. Mixture models and partitioning are commonly used to account for this variation, and partitioning is the most popular approach. Most current partitioning methods require some a priori partitioning scheme to be defined, typically guided by known structural features of the sequences, such as gene boundaries or codon ...[more]

PMID: 25887041

Similar Datasets

Project description:Crop yield is an essential measure for breeders, researchers, and farmers and is composed of and may be calculated by the number of ears per square meter, grains per ear, and thousand grain weight. Manual wheat ear counting, required in breeding programs to evaluate crop yield potential, is labor-intensive and expensive; thus, the development of a real-time wheat head counting system would be a significant advancement. In this paper, we propose a computationally efficient system called DeepCount to automatically identify and count the number of wheat spikes in digital images taken under natural field conditions. The proposed method tackles wheat spike quantification by segmenting an image into superpixels using simple linear iterative clustering (SLIC), deriving canopy relevant features, and then constructing a rational feature model fed into the deep convolutional neural network (CNN) classification for semantic segmentation of wheat spikes. As the method is based on a deep learning model, it replaces hand-engineered features required for traditional machine learning methods with more efficient algorithms. The method is tested on digital images taken directly in the field at different stages of ear emergence/maturity (using visually different wheat varieties), with different canopy complexities (achieved through varying nitrogen inputs) and different heights above the canopy under varying environmental conditions. In addition, the proposed technique is compared with a wheat ear counting method based on a previously developed edge detection technique and morphological analysis. The proposed approach is validated with image-based ear counting and ground-based measurements. The results demonstrate that the DeepCount technique has a high level of robustness regardless of variables, such as growth stage and weather conditions, hence demonstrating the feasibility of the approach in real scenarios. The system is a leap toward a portable and smartphone-assisted wheat ear counting systems, results in reducing the labor involved, and is suitable for high-throughput analysis. It may also be adapted to work on Red; Green; Blue (RGB) images acquired from unmanned aerial vehicle (UAVs).

Dataset Information

Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates.

Publications

Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets