Dataset Information

A novel method for predicting activity of cis-regulatory modules, based on a diverse training set.


ABSTRACT:

Motivation

With the rapid emergence of technologies for locating cis-regulatory modules (CRMs) genome-wide, the next pressing challenge is to assign precise functions to each CRM, i.e. to determine the spatiotemporal domains or cell-types where it drives expression. A popular approach to this task is to model the typical k-mer composition of a set of CRMs known to drive a common expression pattern, and assign that pattern to other CRMs exhibiting a similar k-mer composition. This approach does not rely on prior knowledge of transcription factors relevant to the CRM or their binding motifs, and is thus more widely applicable than motif-based methods for predicting CRM activity, but is also prone to false positive predictions.
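The k-mer composition idea can be illustrated with a minimal sketch. It assumes cosine similarity between k-mer count vectors, which is only a simple stand-in chosen for illustration; the function names and the value of k are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a DNA sequence, skipping non-ACGT windows."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if set(w) <= set("ACGT"))


def pooled_profile(seqs, k=3):
    """Pool k-mer counts over a set of CRMs that share an expression pattern."""
    total = Counter()
    for s in seqs:
        total.update(kmer_counts(s, k))
    return total


def cosine(a, b):
    """Cosine similarity between two k-mer count vectors (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy usage: score a candidate sequence against one pooled training profile.
profile = pooled_profile(["ATATTATA", "TTATAATA"], k=2)
score = cosine(kmer_counts("TATATTAT", k=2), profile)
```

A candidate CRM would be assigned the pattern when its score is high. Because only one training set is consulted, a sequence that resembles many kinds of CRMs also scores high, which is one way false positives can arise.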

Results

We present a novel strategy to improve the above-mentioned approach: to predict whether a CRM drives a specific gene expression pattern, assess how similar the CRM is not only to other CRMs with similar activity but also to CRMs with distinct activities. We use a state-of-the-art statistical method to quantify a CRM's sequence similarity to many different training sets of CRMs, and employ a classification algorithm to integrate these similarity scores into a single prediction of the CRM's activity. This strategy is shown to significantly improve CRM activity prediction over current approaches.
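The two-stage strategy can be sketched as follows: one similarity score per training set forms a feature vector, and a classifier combines those features into one prediction. This is an illustrative sketch under stated assumptions, not the IMMBoost implementation: the abstract does not name the statistical method or the classifier, so cosine similarity over k-mer counts and a small logistic regression are used here as stand-ins, and all names and toy sequences are hypothetical.

```python
from collections import Counter
from math import exp, sqrt


def kmer_counts(seq, k=2):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def similarity(seq, profile, k=2):
    """Stand-in similarity score: cosine between k-mer count vectors."""
    a = kmer_counts(seq, k)
    dot = sum(a[key] * profile[key] for key in a.keys() & profile.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in profile.values()))
    return dot / (na * nb) if na and nb else 0.0


def featurize(seq, profiles, k=2):
    """One similarity score per training set, including sets with other activities."""
    return [similarity(seq, p, k) for p in profiles]


def train_logistic(X, y, epochs=300, lr=0.5):
    """Stand-in classifier: logistic regression fit by stochastic gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the bias term
    for _ in range(epochs):
        for x, target in zip(X, y):
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            grad = 1.0 / (1.0 + exp(-z)) - target
            w[0] -= lr * grad
            for j, xj in enumerate(x):
                w[j + 1] -= lr * grad * xj
    return w


def predict_proba(w, x):
    """Probability that the CRM drives the pattern of interest."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + exp(-z))


# Toy training sets: one for the pattern of interest, one for a distinct pattern.
profiles = [
    sum((kmer_counts(s) for s in ["ATATTATA", "TTATAATA"]), Counter()),
    sum((kmer_counts(s) for s in ["GCGCCGCG", "CCGCGGCG"]), Counter()),
]
labeled = [("ATTATATA", 1), ("TATAATAT", 1), ("GCCGCGCG", 0), ("CGCGGCGC", 0)]
X = [featurize(s, profiles) for s, _ in labeled]
y = [label for _, label in labeled]
weights = train_logistic(X, y)
p_pos = predict_proba(weights, featurize("TATATTAT", profiles))
p_neg = predict_proba(weights, featurize("CGCGCCGC", profiles))
```

In this toy setup the classifier learns a negative weight for similarity to the distinct-activity set, so resemblance to the wrong kind of CRM lowers the predicted probability rather than being ignored.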

Availability and implementation

Our implementation of the new method, called IMMBoost, is freely available as source code: https://github.com/weiyangedward/IMMBoost

CONTACT: sinhas@illinois.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

SUBMITTER: Yang W 

PROVIDER: S-EPMC6075022 | biostudies-literature | 2017 Jan

REPOSITORIES: biostudies-literature

Publications

A novel method for predicting activity of cis-regulatory modules, based on a diverse training set.

Wei Yang, Saurabh Sinha

Bioinformatics (Oxford, England), 2016-09-07, issue 1


Similar Datasets

| S-EPMC2669485 | biostudies-literature
| S-EPMC3424583 | biostudies-literature
| S-EPMC1796902 | biostudies-literature
| S-EPMC2490743 | biostudies-literature
2016-12-14 | GSE81358 | GEO
| S-EPMC4143197 | biostudies-literature
| S-EPMC3359238 | biostudies-literature
| S-EPMC3694643 | biostudies-literature
| S-EPMC2882937 | biostudies-literature
| S-EPMC1665632 | biostudies-literature