Dataset Information

A community resource for paired genomic and metabolomic data mining.

SUBMITTER: Schorn MA

PROVIDER: S-EPMC7987574 | biostudies-literature | 2021 Apr

REPOSITORIES: biostudies-literature

Publications

A community resource for paired genomic and metabolomic data mining.

Schorn Michelle A, Verhoeven Stefan, Ridder Lars, Huber Florian, Acharya Deepa D, Aksenov Alexander A, Aleti Gajender, Moghaddam Jamshid Amiri, Aron Allegra T, Aziz Saefuddin, Bauermeister Anelize, Bauman Katherine D, Baunach Martin, Beemelmanns Christine, Beman J Michael, Berlanga-Clavero María Victoria, Blacutt Alex A, Bode Helge B, Boullie Anne, Brejnrod Asker, Bugni Tim S, Calteau Alexandra, Cao Liu, Carrión Víctor J, Castelo-Branco Raquel, Chanana Shaurya, Chase Alexander B, Chevrette Marc G, Costa-Lotufo Leticia V, Crawford Jason M, Currie Cameron R, Cuypers Bart, Dang Tam, de Rond Tristan, Demko Alyssa M, Dittmann Elke, Du Chao, Drozd Christopher, Dujardin Jean-Claude, Dutton Rachel J, Edlund Anna, Fewer David P, Garg Neha, Gauglitz Julia M, Gentry Emily C, Gerwick Lena, Glukhov Evgenia, Gross Harald, Gugger Muriel, Guillén Matus Dulce G, Helfrich Eric J N, Hempel Benjamin-Florian, Hur Jae-Seoun, Iorio Marianna, Jensen Paul R, Kang Kyo Bin, Kaysser Leonard, Kelleher Neil L, Kim Chung Sub, Kim Ki Hyun, Koester Irina, König Gabriele M, Leao Tiago, Lee Seoung Rak, Lee Yi-Yuan, Li Xuanji, Little Jessica C, Maloney Katherine N, Männle Daniel, Martin H Christian, McAvoy Andrew C, Metcalf Willam W, Mohimani Hosein, Molina-Santiago Carlos, Moore Bradley S, Mullowney Michael W, Muskat Mitchell, Nothias Louis-Félix, O'Neill Ellis C, Parkinson Elizabeth I, Petras Daniel, Piel Jörn, Pierce Emily C, Pires Karine, Reher Raphael, Romero Diego, Roper M Caroline, Rust Michael, Saad Hamada, Saenz Carmen, Sanchez Laura M, Sørensen Søren Johannes, Sosio Margherita, Süssmuth Roderich D, Sweeney Douglas, Tahlan Kapil, Thomson Regan J, Tobias Nicholas J, Trindade-Silva Amaro E, van Wezel Gilles P, Wang Mingxun, Weldon Kelly C, Zhang Fan, Ziemert Nadine, Duncan Katherine R, Crüsemann Max, Rogers Simon, Dorrestein Pieter C, Medema Marnix H, van der Hooft Justin J J

Nature Chemical Biology, 2021-04-01, issue 4

PMID: 33589842

Similar Datasets

Project description: Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations, including the oceans, humans and industrial fermenters. Despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps is to detect viral signal in microbial genomic data. Multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), but new types of microbial genomic data are emerging that are more fragmented and larger in scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter's prophage prediction capability is comparable to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequences (contigs) as short as 3 kb, while providing near-perfect identification (>95% recall and 100% precision) on contigs of at least 10 kb. Because VirSorter scales to large datasets, it can also be used in "reverse" to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA, whether derived from gene transfer agents, generalized transduction or contamination. Finally, VirSorter is made available through the iPlant Cyberinfrastructure, which provides a web-based user interface connected to the required computing resources. VirSorter thus complements existing prophage prediction software to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems.

Project description: Single-cell sequencing has shed light on previously inaccessible biological questions in different fields of research, including organism development, immune function, and disease progression. The number of single-cell-based studies has increased dramatically over the past decade. New methods and tools are continuously being developed, making it difficult to navigate this research landscape and build an up-to-date workflow for analyzing single-cell sequencing data, particularly for researchers entering the field without computational experience. Moreover, choosing appropriate tools and optimal parameters represents a major challenge in processing single-cell sequencing data. However, a dedicated resource providing easy access to detailed information on single-cell sequencing methods and data processing pipelines is still lacking. In the present study, an online resource called SingleScan was developed to curate up-to-date single-cell transcriptome and genome analysis tools and pipelines. All the available tools were categorized according to their main tasks, and several typical workflows for single-cell data analysis were summarized. In addition, spatial transcriptomics, a molecular analysis method that enables researchers to measure all gene activity in tissue samples and map the site of that activity, was included along with a portion of single-cell and spatial analysis solutions. For each processing step, the resource provides the available tools and the specific parameters used in published articles, and shows how these parameters affect the results. All information in the resource was manually extracted from the related literature. An interactive website was designed for data retrieval, visualization, and download. By analyzing the included tools and literature, users can gain insights into trends in single-cell studies and readily learn how a specific tool is used. SingleScan will facilitate the analysis of single-cell sequencing data and promote the development of new tools to meet the growing and diverse needs of the research community. The SingleScan database is publicly accessible at http://cailab.labshare.cn/SingleScan .
