Dataset Information

A systematic evaluation of pattern discovery algorithms


ABSTRACT: Pattern discovery algorithms, which find recurrent, non-random motifs, are widely used in the analysis of biological sequences. Many algorithms exist, but few comparisons have been made among them. We systematically profile eight representative methods at multiple parameter settings across 174 diverse experimental datasets, including ten novel ChIP-on-chip datasets. We executed 16,777 pattern discovery analyses to assess prediction accuracy, CPU usage, and memory consumption. For 144 datasets we developed a gold standard using machine-learning algorithms; cross-validation was used for the remaining datasets. Performance was highly disparate, with median accuracy ranging from 32% to 96%. Importantly, we were unable to replicate previously reported algorithm rankings, emphasizing the need to use many and diverse experimental datasets. We found that deterministic algorithms like Projection and Oligo/Dyad had the highest prediction accuracy. Computational efficiency was not linearly related to dataset size and becomes critical on large datasets, where some algorithms are intractably slow. This work provides the first combined assessment of the CPU usage, memory consumption, and prediction accuracy of pattern discovery algorithms on real experimental datasets.

HL60-Mnt-ChIP: ChIP-Chip with 10 biological replicates
HL60-Trrap-ChIP: ChIP-Chip with 13 biological replicates

ORGANISM(S): Homo sapiens

SUBMITTER: Igor Jurisica 

PROVIDER: E-GEOD-15370 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Similar Datasets

2016-08-06 | E-GEOD-71860 | biostudies-arrayexpress
2010-05-17 | E-GEOD-8447 | biostudies-arrayexpress
2010-05-17 | E-GEOD-8449 | biostudies-arrayexpress
2016-08-06 | E-GEOD-71961 | biostudies-arrayexpress
2016-07-24 | E-GEOD-77854 | biostudies-arrayexpress
2016-07-24 | E-GEOD-77856 | biostudies-arrayexpress
2010-05-18 | E-GEOD-16562 | biostudies-arrayexpress
2016-07-06 | E-GEOD-84052 | biostudies-arrayexpress
2010-05-14 | E-GEOD-13938 | biostudies-arrayexpress
2016-08-11 | E-GEOD-72143 | biostudies-arrayexpress