Dataset Information

SPLASH: a statistical, reference-free genomic algorithm unifies biological discovery.

ABSTRACT: Today's genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a new unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), an approach that directly analyzes raw sequencing data to detect a signature of regulation: sample-specific sequence variation. The approach, which includes a new statistical test, is computationally efficient and can be run at scale. SPLASH unifies detection of myriad forms of sequence variation. We demonstrate that SPLASH identifies complex mutation patterns in SARS-CoV-2 strains, discovers regulated RNA isoforms at the single cell level, documents the vast sequence diversity of adaptive immune receptors, and uncovers biology in non-model organisms undocumented in their reference genomes: geographic and seasonal variation and diatom association in eelgrass, an oceanic plant impacted by climate change, and tissue-specific transcripts in octopus. SPLASH is a new unifying approach to genomic analysis that enables an expansive scope of discovery without metadata or references.

SUBMITTER: Chaung K

PROVIDER: S-EPMC9258296 | biostudies-literature | 2023 Jul

REPOSITORIES: biostudies-literature

Publications

SPLASH: a statistical, reference-free genomic algorithm unifies biological discovery.

Kaitlin Chaung, Tavor Z. Baharav, George Henderson, Ivan N. Zheludev, Peter L. Wang, Julia Salzman

bioRxiv: the preprint server for biology, 2023-07-31

PMID: 35794890

Similar Datasets

SPLASH: A statistical, reference-free genomic algorithm unifies biological discovery.
| S-EPMC10861363 | biostudies-literature

A lossless reference-free sequence compression algorithm leveraging grammatical, statistical, and substitution rules.
| S-EPMC11735755 | biostudies-literature

DIVE: a reference-free statistical approach to diversity-generating and mobile genetic element discovery.
| S-EPMC10589994 | biostudies-literature

Statistical algorithm enabled high precision tumor biomarker discovery for circulating extracellular vesicle-based cancer liquid biopsy
2024-01-31 | GSE246925 | GEO

ECHO: a reference-free short-read error correction algorithm.
| S-EPMC3129260 | biostudies-literature

A statistical model for reference-free inference of archaic local ancestry.
| S-EPMC6555542 | biostudies-literature

Predictive Power Estimation Algorithm (PPEA)--a new algorithm to reduce overfitting for genomic biomarker discovery.
| S-EPMC3174148 | biostudies-literature

Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons.
| S-EPMC5037627 | biostudies-literature

Speeding genomic island discovery through systematic design of reference database composition.
| S-EPMC10936790 | biostudies-literature

Discovery of biological networks from diverse functional genomic data.
| S-EPMC1414113 | biostudies-literature