
Dataset Information


YAMDA: thousandfold speedup of EM-based motif discovery using deep learning libraries and GPU.


ABSTRACT: Motivation: Motif discovery in large biopolymer sequence datasets can be computationally demanding, presenting significant challenges for discovery in omics research. MEME, arguably one of the most popular motif discovery tools, takes quadratic time with respect to dataset size, leading to excessively long runtimes for large datasets. There is therefore a demand for fast programs that generate results of the same quality as MEME. Results: Here we describe YAMDA, a highly scalable motif discovery software package. It is built on PyTorch, a deep learning library with strong GPU acceleration whose highly optimized tensor operations are also useful for motif discovery. YAMDA takes linear time to find motifs as accurately as MEME, completing in seconds or minutes, which translates to speedups of over a thousandfold. Availability and implementation: YAMDA is freely available on GitHub (https://github.com/daquang/YAMDA). Supplementary information: Supplementary data are available at Bioinformatics online.
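
The abstract notes that tensor operations are also useful for motifs. One way to read that: scoring a position weight matrix against every window of a one-hot encoded sequence is a 1D cross-correlation, which tensor libraries run in bulk on a GPU. The sketch below illustrates only that scoring step in plain Python. It is not YAMDA's code and omits the EM updates entirely; the names `one_hot`, `log_odds`, `scan`, and `toy_pwm`, the uniform background of 0.25, and the toy motif are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 0/1 values in A, C, G, T order."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]


def log_odds(pwm, background=0.25):
    """Turn per-position base probabilities into log-odds scores (assumed uniform background)."""
    return [[math.log(p / background) for p in row] for row in pwm]


def scan(seq, pwm):
    """Score every window of len(pwm) in seq; this is a 1D cross-correlation."""
    x = one_hot(seq)
    w = log_odds(pwm)
    width = len(w)
    scores = []
    for start in range(len(x) - width + 1):
        total = 0.0
        for j in range(width):
            for k in range(len(ALPHABET)):
                total += x[start + j][k] * w[j][k]
        scores.append(total)
    return scores


# Hypothetical 3-column motif that favours "TAG" (illustrative values only).
toy_pwm = [
    [0.1, 0.1, 0.1, 0.7],
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]

scores = scan("CCTAGCC", toy_pwm)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, round(scores[best], 3))
```

For a fixed motif width, the work per sequence grows linearly with sequence length. In a tensor library the nested loops would typically collapse into a single batched convolution call over all sequences.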

SUBMITTER: Quang D 

PROVIDER: S-EPMC6184538 | biostudies-literature | 2018 Oct

REPOSITORIES: biostudies-literature


Publications

YAMDA: thousandfold speedup of EM-based motif discovery using deep learning libraries and GPU.

Quang D, Guan Y, Parker SCJ

Bioinformatics (Oxford, England). 2018 Oct 1; issue 20



Similar Datasets

2023-03-31 | GSE165175 | GEO
| S-EPMC5537091 | biostudies-other
2023-03-31 | GSE165173 | GEO
2023-03-31 | GSE165174 | GEO
2023-03-31 | GSE165171 | GEO
| S-EPMC10026561 | biostudies-literature
| S-EPMC6356839 | biostudies-literature
| S-EPMC10357458 | biostudies-literature
| S-EPMC6717539 | biostudies-literature
| S-EPMC9528019 | biostudies-literature