
Dataset Information


Ensemble approach combining multiple methods improves human transcription start site prediction.


ABSTRACT:

Background

The computational prediction of transcription start sites is an important unsolved problem. Some recent progress has been made, but many promoters, particularly those not associated with CpG islands, are still difficult to locate using current methods. These methods use different features, training sets and machine learning techniques, and consequently produce different prediction sets.
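One simple way to quantify how different two prediction sets are is to measure how many predictions from one program are recovered by another. The sketch below is illustrative only. It assumes predictions are plain TSS coordinates on a single chromosome and strand, and it counts two predictions as agreeing when they lie within a hypothetical 500 bp window. This is not the benchmark criterion used in the paper, and the program names and coordinates are made up.

```python
from bisect import bisect_left
from itertools import combinations
from typing import Dict, List, Tuple


def matched_fraction(a: List[int], b: List[int], window: int = 500) -> float:
    """Fraction of positions in `a` with at least one position in `b`
    no more than `window` bp away (window size is an assumption)."""
    if not a:
        return 0.0
    sorted_b = sorted(b)
    hits = 0
    for pos in a:
        # First position in b that is >= pos - window; it is a match
        # if it is also <= pos + window.
        i = bisect_left(sorted_b, pos - window)
        if i < len(sorted_b) and sorted_b[i] <= pos + window:
            hits += 1
    return hits / len(a)


def pairwise_agreement(
    predictions: Dict[str, List[int]], window: int = 500
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """For every pair of programs, report how much of each set the other recovers."""
    table = {}
    for (name_a, a), (name_b, b) in combinations(sorted(predictions.items()), 2):
        table[(name_a, name_b)] = (
            matched_fraction(a, b, window),
            matched_fraction(b, a, window),
        )
    return table


# Hypothetical predicted TSS coordinates from three programs.
preds = {
    "prog_A": [1000, 5000, 20000],
    "prog_B": [1200, 30000],
    "prog_C": [5100, 19800, 45000],
}
for (a, b), (fa, fb) in pairwise_agreement(preds).items():
    print(f"{a} vs {b}: {fa:.2f} of {a} matched, {fb:.2f} of {b} matched")
```

The measure is deliberately asymmetric: a program with few, precise predictions can be largely contained in a program with many predictions without the reverse being true.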

Results

We demonstrate the heterogeneity of current prediction sets and take advantage of it to construct a two-level classifier ('Profisi Ensemble') using predictions from seven programs along with two other data sources. Support vector machines using 'full' and 'reduced' data sets are combined in an either/or approach. We achieve a 14% increase in performance over the current state of the art, as benchmarked by a third-party tool.
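The abstract does not specify how the two levels or the either/or rule are wired together, so the following is only a minimal sketch of one plausible reading, not the Profisi Ensemble implementation. It assumes that the first level is the output of the base programs, encoded as one feature per source for each candidate position, and that the second level is a linear SVM. The SVM here is hand-written (full-batch sub-gradient descent on the hinge loss) to keep the example dependency-free. The either/or rule is a hypothetical interpretation: use the 'full' SVM when every feature is present and fall back to a 'reduced' SVM over the features assumed to be always available (`always_available`). The toy data use three sources instead of nine, and all names are illustrative.

```python
from typing import List, Optional, Sequence


class LinearSVM:
    """Linear SVM fitted by full-batch sub-gradient descent on the
    L2-regularised hinge loss (a small stand-in for a library SVM)."""

    def __init__(self, lam: float = 0.01, lr: float = 0.1, epochs: int = 500):
        self.lam, self.lr, self.epochs = lam, lr, epochs
        self.w: List[float] = []
        self.b = 0.0

    def decision(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "LinearSVM":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            # Gradient of (lam/2)*||w||^2 plus the mean hinge loss.
            grad_w = [self.lam * wi for wi in self.w]
            grad_b = 0.0
            for xi, yi in zip(X, y):
                if yi * self.decision(xi) < 1:  # margin violated: hinge term active
                    for j in range(d):
                        grad_w[j] -= yi * xi[j] / n
                    grad_b -= yi / n
            self.w = [wi - self.lr * g for wi, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b
        return self

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.decision(x) >= 0 else -1


class EitherOrEnsemble:
    """Second level: a 'full' SVM over every first-level feature and a
    'reduced' SVM over the features assumed to be always available."""

    def __init__(self, always_available: Sequence[int]):
        self.keep = list(always_available)  # hypothetical: indices never missing
        self.full = LinearSVM()
        self.reduced = LinearSVM()

    def _reduce(self, x: Sequence[Optional[float]]) -> List[float]:
        return [x[j] for j in self.keep]

    def fit(self, X, y) -> "EitherOrEnsemble":
        # The full model only sees rows where every source produced a value.
        complete = [(x, t) for x, t in zip(X, y) if None not in x]
        self.full.fit([x for x, _ in complete], [t for _, t in complete])
        self.reduced.fit([self._reduce(x) for x in X], y)
        return self

    def predict(self, x: Sequence[Optional[float]]) -> int:
        # Either/or: fall back to the reduced model when any feature is missing.
        if None in x:
            return self.reduced.predict(self._reduce(x))
        return self.full.predict(x)


# Toy first-level output: +1/-1 votes from three hypothetical base predictors.
X = [[1, 1, 1], [1, 1, -1], [1, -1, 1], [-1, 1, 1],
     [-1, -1, -1], [-1, -1, 1], [1, -1, -1], [-1, 1, -1]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
model = EitherOrEnsemble(always_available=[0, 1]).fit(X, y)
print(model.predict([1, -1, 1]))       # all three votes present: full SVM
print(model.predict([-1, -1, None]))   # third vote missing: reduced SVM
```

In practice a library implementation such as scikit-learn's `sklearn.svm.SVC` would replace the hand-written `LinearSVM`; the point of the sketch is the structure, in which base-program outputs become features for a supervised second-level model.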

Conclusions

Supervised learning methods are a useful way to combine predictions from diverse sources.

SUBMITTER: Dineen DG 

PROVIDER: S-EPMC3053590 | biostudies-literature | 2010 Nov

REPOSITORIES: biostudies-literature


Publications

Ensemble approach combining multiple methods improves human transcription start site prediction.

Dineen DG, Schröder M, Higgins DG, Cunningham P

BMC Genomics, 2010 Nov 30



Similar Datasets

| S-EPMC148603 | biostudies-other
| S-EPMC2709924 | biostudies-literature
| S-EPMC2374378 | biostudies-literature
| S-EPMC3708499 | biostudies-literature
| S-EPMC3160847 | biostudies-literature
| S-EPMC3377991 | biostudies-literature
| S-EPMC2845628 | biostudies-literature
| S-EPMC3651085 | biostudies-literature
2010-07-01 | GSE22511 | GEO
| S-EPMC3730108 | biostudies-literature