Dataset Information

Interactive Exploration on Large Genomic Datasets.

ABSTRACT: The prevalence of large genomic datasets has made the need to explore this data more important. Large sequencing projects like the 1000 Genomes Project [1], which reconstructed the genomes of 2,504 individuals sampled from 26 populations, have produced over 200 TB of publicly available data. Meanwhile, existing genomic visualization tools have been unable to scale with the growing volume and complexity of this data. This difficulty is acute when viewing large regions (over 1 megabase, or 1,000,000 bases of DNA) or when viewing multiple samples concurrently. While genomic processing pipelines have shifted toward distributed computing techniques, such as ADAM [4], genomic visualization tools have not. In this work we present Mango, a scalable genome browser built on top of ADAM that can run both locally and on a cluster. Mango combines several optimizations in a single application to drive novel genomic visualization techniques over terabytes of genomic data. By building visualization on top of a distributed processing pipeline, we can perform visualization queries over large regions that are not possible with current tools, and reduce the time needed to view large datasets. Mango is part of the Big Data Genomics project at the University of California, Berkeley [25] and is published under the Apache 2 license. Mango is available at https://github.com/bigdatagenomics/mango.

SUBMITTER: Tu E

PROVIDER: S-EPMC5754031 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature

Publications

Interactive Exploration on Large Genomic Datasets.

Eric Tu

EECS technical report series 20160516

PMID: 29308454

Similar Datasets

Interactive exploration of integrated biological datasets using context-sensitive workflows.
| S-EPMC3929842 | biostudies-literature

Interactive Exploration, Analysis, and Visualization of Complex Phenome-Genome Datasets with ASPIREdb.
| S-EPMC4940263 | biostudies-literature

OmicLoupe: facilitating biological discovery by interactive exploration of multiple omic datasets and statistical comparisons.
| S-EPMC7931979 | biostudies-literature

EpiExplorer: live exploration and global analysis of large epigenomic datasets.
| S-EPMC3491424 | biostudies-literature

GPAT: retrieval of genomic annotation from large genomic position datasets.
| S-EPMC2654044 | biostudies-literature

MetaOmGraph: a workbench for interactive exploratory data analysis of large expression datasets.
| S-EPMC7039010 | biostudies-literature

PAD2: interactive exploration of transcription factor genomic colocalization using ChIP-seq data.
| S-EPMC10090434 | biostudies-literature

HiPiler: Visual Exploration of Large Genome Interaction Matrices with Interactive Small Multiples.
| S-EPMC6038708 | biostudies-literature

Interactive exploration of a global clinical network from a large breast cancer cohort.
| S-EPMC9365762 | biostudies-literature

CellProfiler Analyst: interactive data exploration, analysis and classification of large biological image sets.
| S-EPMC5048071 | biostudies-literature