Dataset Information

A feature selection strategy for gene expression time series experiments with hidden Markov models.

ABSTRACT: Time series studies can be far more informative than those that capture only a single moment in time. However, in transcriptomic data time points are sparse, which creates a continuing need for methods capable of extracting information from experiments of this kind. We propose a feature selection algorithm embedded in a hidden Markov model and applied to gene expression time course data from either a single biological condition or multiple conditions. For the latter, in a simple case-control study, features (genes) are selected under the assumption of no change over time for the control samples, while the case group must show at least one change. The proposed model reduces the feature space according to a two-state hidden Markov model, in which the two states define change/no-change in gene expression. Features are ranked according to three scores: the number of changes across time, the magnitude of those changes, and the quality of replicates, measured by how much they deviate from the mean. Importantly, this strategy overcomes the small-sample limitation common in transcriptome experiments through a process of data transformation and rearrangement. To test the method, we applied it to three publicly available data sets. Results show that the feature domain is reduced by up to 90%, leaving only a few relevant features, with findings consistent with those previously reported. Moreover, the strategy proved robust and stable, and it works on studies where sample size would otherwise be an issue: the method performs well even with two biological replicates and/or three time points.
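The change/no-change idea in the abstract can be illustrated with a generic two-state hidden Markov model decoded by the Viterbi algorithm. This is a minimal sketch, not the authors' implementation: the Gaussian emissions on absolute differences between consecutive time points, the transition probabilities, and every name and number below are assumptions chosen only for illustration.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi_two_state(obs, log_start, log_trans, emit_params):
    """Most likely state path; state 0 = no-change, state 1 = change."""
    states = range(2)
    # score[s] = best log-probability of any path ending in state s
    score = [log_start[s] + gaussian_logpdf(obs[0], *emit_params[s]) for s in states]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in obs[1:]:
        new_score, pointers = [], []
        for s in states:
            prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s]
                             + gaussian_logpdf(x, *emit_params[s]))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical expression values of one gene over five time points
expression = [5.0, 5.1, 7.9, 8.0, 5.2]
abs_diffs = [abs(b - a) for a, b in zip(expression, expression[1:])]

# Assumed parameters (mean, sd) per state; not taken from the paper
emit_params = [(0.0, 0.5), (3.0, 1.0)]
log_start = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.7), math.log(0.3)],
             [math.log(0.3), math.log(0.7)]]

path = viterbi_two_state(abs_diffs, log_start, log_trans, emit_params)
n_changes = sum(path)  # number of intervals decoded as "change"
magnitude = sum(d for d, s in zip(abs_diffs, path) if s == 1)
print(path, n_changes)  # [0, 1, 0, 1] 2
```

Here `n_changes` and `magnitude` loosely mirror the first two ranking scores named in the abstract; the third score (replicate quality) would need replicate-level data and is left out of this sketch.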

SUBMITTER: Cardenas-Ovando RA

PROVIDER: S-EPMC6786538 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature


Publications

A feature selection strategy for gene expression time series experiments with hidden Markov models.

Cárdenas-Ovando RA, Fernández-Figueroa EA, Rueda-Zárate HA, Noguez J, Rangel-Escareño C

PLoS One, 2019-10-10, issue 10


PMID: 31600242

Similar Datasets

Project description:
Background: There is strong incentive to model behaviour-dependent habitat selection, as this can help delineate critical habitats for important life processes and reduce bias in model parameters. For this purpose, a two-stage modelling approach is often taken: (i) classify behaviours with a hidden Markov model (HMM), and (ii) fit a step selection function (SSF) to each subset of data. However, this approach does not properly account for the uncertainty in behavioural classification, nor does it allow states to depend on habitat selection. An alternative approach is to estimate both state switching and habitat selection in a single, integrated model called an HMM-SSF.
Methods: We build on this recent methodological work to make the HMM-SSF approach more efficient and general. We focus on writing the model as an HMM where the observation process is defined by an SSF, such that well-known inferential methods for HMMs can be used directly for parameter estimation and state classification. We extend the model to include covariates on the HMM transition probabilities, allowing for inferences into the temporal and individual-specific drivers of state switching. We demonstrate the method through an illustrative example of plains zebra (Equus quagga), including state estimation, and simulations to estimate a utilisation distribution.
Results: In the zebra analysis, we identified two behavioural states, with clearly distinct patterns of movement and habitat selection ("encamped" and "exploratory"). In particular, although the zebra tended to prefer areas higher in grassland across both behavioural states, this selection was much stronger in the fast, directed exploratory state. We also found a clear diel cycle in behaviour, which indicated that zebras were more likely to be exploring in the morning and encamped in the evening.
Conclusions: This method can be used to analyse behaviour-specific habitat selection in a wide range of species and systems. A large suite of statistical extensions and tools developed for HMMs and SSFs can be applied directly to this integrated model, making it a very versatile framework to jointly learn about animal behaviour, habitat selection, and space use.
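
Covariates on HMM transition probabilities, as mentioned in the project description, are commonly expressed through a logistic link. The sketch below shows one way a time-of-day covariate could drive switching in a two-state model. It is an illustration under stated assumptions, not the HMM-SSF authors' code: the function name, the cosine/sine encoding of the hour, and all coefficient values are hypothetical.

```python
import math

def transition_matrix(hour, coef):
    """2x2 transition matrix whose switching probabilities depend on time of day.

    coef[i] = (intercept, cos_coef, sin_coef) for leaving state i.
    State 0 = "encamped", state 1 = "exploratory" (labels from the description).
    """
    angle = 2 * math.pi * hour / 24  # encode the 24-hour cycle
    rows = []
    for i, (b0, b_cos, b_sin) in enumerate(coef):
        eta = b0 + b_cos * math.cos(angle) + b_sin * math.sin(angle)
        p_switch = 1 / (1 + math.exp(-eta))  # logistic link
        row = [0.0, 0.0]
        row[i] = 1 - p_switch      # stay in state i
        row[1 - i] = p_switch      # switch to the other state
        rows.append(row)
    return rows

# Hypothetical coefficients: leaving "encamped" is favoured around 06:00,
# leaving "exploratory" is favoured around 18:00.
coef = [(-2.0, 0.0, 1.5), (-2.0, 0.0, -1.5)]

morning = transition_matrix(6, coef)
evening = transition_matrix(18, coef)
print(morning[0][1] > evening[0][1])  # True
```

With these assumed coefficients the probability of moving from encamped to exploratory is higher in the morning than in the evening, which matches the direction of the diel pattern described above; real coefficients would be estimated from data.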

OmicsDI is part of the ELIXIR infrastructure
