Dataset Information

Distributed gene expression modelling for exploring variability in epigenetic function.


ABSTRACT: Predictive gene expression modelling is an important tool in computational biology due to the volume of high-throughput sequencing data generated by recent consortia. However, the scope of previous studies has been restricted to a small set of cell-lines or experimental conditions due to an inability to leverage distributed processing architectures for large, sharded data-sets. We present a distributed implementation of gene expression modelling using the MapReduce paradigm and prove that performance improves as a linear function of available processor cores. We then leverage the computational efficiency of this framework to explore the variability of epigenetic function across fifty histone modification data-sets from a variety of cancerous and non-cancerous cell-lines. We demonstrate that the genome-wide relationships between histone modifications and mRNA transcription are lineage-, tissue- and karyotype-invariant, and that models trained on matched -omics data from non-cancerous cell-lines are able to predict cancerous expression with equivalent genome-wide fidelity.

SUBMITTER: Budden DM 

PROVIDER: S-EPMC5097851 | biostudies-literature | 2016 Nov

REPOSITORIES: biostudies-literature

Publications

Distributed gene expression modelling for exploring variability in epigenetic function.

Budden David M, Crampin Edmund J

BMC Bioinformatics, 2016 Nov 5, issue 1


Similar Datasets

| S-EPMC6210221 | biostudies-literature
| S-EPMC6766365 | biostudies-literature
| S-EPMC8759562 | biostudies-literature
| S-EPMC7017299 | biostudies-literature
| S-EPMC2556388 | biostudies-other
| S-EPMC3063692 | biostudies-literature
| S-EPMC4274012 | biostudies-literature
| S-EPMC2836165 | biostudies-literature
| S-EPMC4376354 | biostudies-literature
| S-EPMC3076844 | biostudies-other