
Dataset Information


Discriminative motif optimization based on perceptron training.


ABSTRACT: Generating accurate transcription factor (TF) binding site motifs from next-generation sequencing data, especially ChIP-seq data, is challenging. The challenge arises because a typical experiment reports a large number of sequences bound by a TF, and each sequence is relatively long. Most traditional motif finders are slow at handling such an enormous amount of data. To overcome this limitation, tools have been developed that trade accuracy for speed by using heuristic discrete search strategies or limited optimization of identified seed motifs. However, such strategies may not fully use the information in the input sequences to generate motifs. Such motifs often form good seeds and can be further improved with appropriate scoring functions and rapid optimization. We report a tool named discriminative motif optimizer (DiMO). DiMO takes a seed motif along with a positive and a negative database and improves the motif using a discriminative strategy. We use the area under the receiver operating characteristic curve (AUC) as a measure of the discriminating power of motifs, together with a strategy based on perceptron training that maximizes AUC rapidly in a discriminative manner. Using DiMO on a large test set of 87 TFs from human, Drosophila and yeast, we show that it is possible to significantly improve motifs identified by nine motif finders. The motifs are generated/optimized using training sets and evaluated on test sets. The AUC is improved for almost 90% of the TFs on test sets, and the magnitude of increase is up to 39%. DiMO is available at http://stormo.wustl.edu/DiMO
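The abstract names two ingredients, AUC as the discrimination score and a perceptron-style optimizer, without giving DiMO's exact update rule. The sketch below is a generic illustration under stated assumptions, not DiMO's implementation: a motif is assumed to be a position weight matrix (PWM), each sequence is assumed to be scored by its single best forward-strand window, AUC is computed as the fraction of (positive, negative) pairs ranked correctly (ties count one half), and a pairwise perceptron nudges the PWM whenever a positive sequence fails to outscore a negative one. The function names `best_site`, `auc`, `motif_auc` and `perceptron_refine` and the parameters `epochs` and `lr` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

BASES = "ACGT"

# Hypothetical representation: a motif is a PWM with one row per motif
# position, each row holding four weights in A, C, G, T order.
PWM = List[List[float]]


def best_site(pwm: PWM, seq: str) -> Tuple[float, int]:
    """Return (score, offset) of the highest-scoring window (forward strand only)."""
    width = len(pwm)
    best_score, best_off = float("-inf"), 0
    for off in range(len(seq) - width + 1):
        s = sum(pwm[i][BASES.index(seq[off + i])] for i in range(width))
        if s > best_score:
            best_score, best_off = s, off
    return best_score, best_off


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += 1.0 if p > n else (0.5 if p == n else 0.0)
    return total / (len(pos_scores) * len(neg_scores))


def motif_auc(pwm: PWM, positives: Sequence[str], negatives: Sequence[str]) -> float:
    """Score every sequence by its best site, then compute the pairwise AUC."""
    return auc([best_site(pwm, s)[0] for s in positives],
               [best_site(pwm, s)[0] for s in negatives])


def perceptron_refine(pwm: PWM, positives: Sequence[str], negatives: Sequence[str],
                      epochs: int = 20, lr: float = 0.1, seed: int = 0) -> PWM:
    """Assumed pairwise perceptron: for each misranked (positive, negative) pair,
    raise the weights of the positive's best site and lower those of the
    negative's best site. Stops early once an epoch makes no updates."""
    rng = random.Random(seed)
    pwm = [row[:] for row in pwm]  # copy so the seed motif is not modified
    pairs = [(p, n) for p in positives for n in negatives]
    for _ in range(epochs):
        rng.shuffle(pairs)
        updated = False
        for p, n in pairs:
            sp, op = best_site(pwm, p)
            sn, on = best_site(pwm, n)
            if sp <= sn:  # pair is misranked (or tied): apply a perceptron update
                for i in range(len(pwm)):
                    pwm[i][BASES.index(p[op + i])] += lr
                    pwm[i][BASES.index(n[on + i])] -= lr
                updated = True
        if not updated:
            break
    return pwm


if __name__ == "__main__":
    seed_motif = [[0.0] * 4 for _ in range(4)]  # uninformative seed motif
    pos = ["TTGACGTT", "CCGACGAA", "AGACGTCA"]   # toy data, not from the paper
    neg = ["TTTTTTTT", "CCCCAAAA", "GGGGTTTT"]
    print("seed AUC:", motif_auc(seed_motif, pos, neg))
    print("refined AUC:", motif_auc(perceptron_refine(seed_motif, pos, neg), pos, neg))
```

Ranking pairs directly is one simple way to make a perceptron target AUC, because AUC counts exactly the correctly ordered (positive, negative) pairs; the quadratic pair loop is only suitable for small toy inputs, and how DiMO scales this to ChIP-seq-sized data is not described here.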

SUBMITTER: Patel RY 

PROVIDER: S-EPMC3967114 | biostudies-literature | 2014 Apr

REPOSITORIES: biostudies-literature


Publications

Discriminative motif optimization based on perceptron training.

Patel Ronak Y, Stormo Gary D

Bioinformatics (Oxford, England), 2013 Dec 24; issue 7



Similar Datasets

| S-EPMC5840810 | biostudies-literature
| S-EPMC5728525 | biostudies-literature
| S-EPMC2562012 | biostudies-literature
| S-EPMC3957073 | biostudies-literature
| S-EPMC3050600 | biostudies-literature
| S-EPMC3157928 | biostudies-literature
| S-EPMC3923751 | biostudies-literature
| S-EPMC2194741 | biostudies-literature
| S-EPMC8588560 | biostudies-literature
| S-EPMC3834837 | biostudies-other