Dataset Information

Some Statistical Strategies for DAE-seq Data Analysis: Variable Selection and Modeling Dependencies among Observations.


ABSTRACT: In DAE (DNA After Enrichment)-seq experiments, genomic regions related to certain biological processes are enriched or isolated by an assay and then sequenced on a high-throughput sequencing platform to determine their genomic positions. Statistical analysis of DAE-seq data aims to detect genomic regions with significant aggregations of isolated DNA fragments ("enriched regions") versus all other regions ("background"). However, many confounding factors may influence DAE-seq signals. In addition, the signals in adjacent genomic regions may exhibit strong correlations, which invalidate the independence assumption employed by many existing methods. To mitigate these issues, we develop a novel Autoregressive Hidden Markov Model (AR-HMM) to account for covariate effects and violations of the independence assumption. We demonstrate that our AR-HMM leads to improved performance in identifying enriched regions in both simulated and real datasets, especially in epigenetic datasets with broader regions of DAE-seq signal enrichment. We also introduce a variable selection procedure in the context of the HMM/AR-HMM, where the observations are not independent and the mean of each state-specific emission distribution is modeled by covariates. We study the theoretical properties of this variable selection procedure and demonstrate its efficacy in simulated and real DAE-seq data. In summary, we develop several practical approaches for DAE-seq data analysis that are also applicable to more general problems in statistics.
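
To make the idea of an autoregressive HMM with covariate-dependent emissions concrete, the following is a minimal sketch of its log-likelihood computed with the forward algorithm. It is not the authors' implementation. It assumes Gaussian emissions, a single covariate per genomic window, and a first-order autoregressive term; the function and parameter names (`ar_hmm_log_likelihood`, `beta0`, `beta1`, `phi`, `sd`) are hypothetical and chosen only for illustration.

```python
import math


def log_normal_pdf(value, mean, sd):
    """Log density of Normal(mean, sd) evaluated at value."""
    z = (value - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def ar_hmm_log_likelihood(y, x, init, trans, beta0, beta1, phi, sd):
    """Forward-algorithm log-likelihood of an illustrative AR(1) HMM.

    Assumed emission model (an illustration, not the published one):
    in state k at window t,
        y[t] ~ Normal(beta0[k] + beta1[k] * x[t] + phi[k] * y[t - 1], sd[k]),
    with the autoregressive term dropped at t = 0.
    init[k] is the initial state probability and trans[j][k] is the
    probability of moving from state j to state k.
    """
    n_states = len(init)

    def log_emission(t, k):
        mean = beta0[k] + beta1[k] * x[t]
        if t > 0:
            mean += phi[k] * y[t - 1]  # dependence on the previous window
        return log_normal_pdf(y[t], mean, sd[k])

    # log_alpha[k] = log P(y[0..t], state at t = k)
    log_alpha = [math.log(init[k]) + log_emission(0, k) for k in range(n_states)]
    for t in range(1, len(y)):
        log_alpha = [
            log_sum_exp(
                [log_alpha[j] + math.log(trans[j][k]) for j in range(n_states)]
            )
            + log_emission(t, k)
            for k in range(n_states)
        ]
    return log_sum_exp(log_alpha)
```

In this sketch, state 0 could stand for "background" and state 1 for "enriched"; setting every `phi[k]` to zero recovers an ordinary HMM with covariate-dependent means. All transition and initial probabilities must be strictly positive, since the code takes their logarithms directly.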

SUBMITTER: Rashid NU 

PROVIDER: S-EPMC3963211 | biostudies-literature | 2014 Jan

REPOSITORIES: biostudies-literature

Publications

Some Statistical Strategies for DAE-seq Data Analysis: Variable Selection and Modeling Dependencies among Observations.

Rashid NU, Sun W, Ibrahim JG

Journal of the American Statistical Association, 2014-01-01, 505


Similar Datasets

| S-EPMC9660265 | biostudies-literature
| S-EPMC6765121 | biostudies-literature
| S-EPMC4692967 | biostudies-literature
| S-EPMC11316085 | biostudies-literature
| S-EPMC7671404 | biostudies-literature
| S-EPMC7032893 | biostudies-literature
| S-EPMC5499643 | biostudies-literature
| S-EPMC7944974 | biostudies-literature
| S-EPMC2844735 | biostudies-literature
| S-EPMC2712761 | biostudies-literature