Project description:The Tara Oceans expedition (2009-2013) sampled contrasting ecosystems of the world oceans, collecting environmental data and plankton, from viruses to metazoans, for later analysis using modern sequencing and state-of-the-art imaging technologies. It surveyed 210 ecosystems in 20 biogeographic provinces, collecting over 35,000 samples of seawater and plankton. The interpretation of such an extensive collection of samples in their ecological context requires means to explore, assess and access raw and validated data sets. To address this challenge, the Tara Oceans Consortium offers open science resources, including the use of open access archives for nucleotides (ENA) and for environmental, biogeochemical, taxonomic and morphological data (PANGAEA), and the development of online discovery tools and collaborative annotation tools for sequences and images. Here, we present an overview of Tara Oceans Data, and we provide detailed registries (data sets) of all campaigns (from port to port), stations and sampling events.
Project description:A recent study proposed five new RNA virus phyla, two of which, 'Taraviricota' and 'Arctiviricota', were stated to be 'dominant in the oceans'. However, the study's assignments classify 28,353 putative RdRp-containing contigs to known phyla but only 886 (2.8%) to the five proposed new phyla combined. I re-mapped the reads to the contigs, finding that known phyla also account for a large majority (93.8%) of reads according to the study's classifications, and that contigs originally assigned to 'Arctiviricota' accounted for only a tiny fraction (0.01%) of reads from Arctic Ocean samples. Performing my own virus identification and classifications, I found that 99.95 per cent of reads could be assigned to known phyla. The most abundant species was Beihai picorna-like virus 34 (15% of reads), and the most abundant order-like cluster was classified as Picornavirales (45% of reads). Sequences in the claimed new phylum 'Pomiviricota' were placed inside a phylogenetic tree for established order Durnavirales with 100 per cent confidence. Moreover, two contigs assigned to the proposed phylum 'Taraviricota' were found to have high-identity alignments to dinoflagellate proteins, tentatively identifying this group of RdRp-like sequences as deriving from non-viral transcripts. Together, these results comprehensively contradict the claim that new phyla dominate the data.
Project description:Mixed microbial communities are complex, dynamic and heterogeneous. It is therefore essential that biomolecular fractions obtained for high-throughput omic analyses are representative of single samples to facilitate meaningful data integration, analysis and modeling. We have developed a new methodological framework for the reproducible isolation of high-quality genomic DNA, large and small RNA, proteins, and polar and non-polar metabolites from single unique mixed microbial community samples. The methodology is based around reproducible cryogenic sample preservation and cell lysis. Metabolites are extracted first using organic solvents, followed by the sequential isolation of nucleic acids and proteins using chromatographic spin columns. The methodology was validated by comparison to traditional dedicated and simultaneous biomolecular isolation methods. To prove the broad applicability of the methodology, we applied it to microbial consortia of biotechnological, environmental and biomedical research interest. The developed methodological framework lays the foundation for standardized molecular eco-systematic studies on a range of different microbial communities in the future.
Project description:There is a critical need for improving the level of chemistry awareness in systems biology. The data and information related to modulation of genes and proteins by small molecules continue to accumulate at the same time as simulation tools in systems biology and whole body physiologically based pharmacokinetics (PBPK) continue to evolve. We called this emerging area at the interface between chemical biology and systems biology "systems chemical biology" (SCB) (Nat Chem Biol 3: 447-450, 2007). The overarching goal of computational SCB is to develop tools for integrated chemical-biological data acquisition, filtering and processing, by taking into account relevant information related to interactions between proteins and small molecules, possible metabolic transformations of small molecules, as well as associated information related to genes, networks, small molecules, and, where applicable, mutants and variants of those proteins. There remains an unmet need to develop an integrated in silico pharmacology/systems biology continuum that embeds drug-target-clinical outcome (DTCO) triplets, a capability that is vital to the future of chemical biology, pharmacology, and systems biology. Through the development of the SCB approach, scientists will be able to start addressing, in an integrated simulation environment, questions that make the best use of our ever-growing chemical and biological data repositories at the system-wide level. This chapter reviews some of the major research concepts and describes key components that constitute the emerging area of computational systems chemical biology.
Project description:Nucleo-cytoplasmic large DNA viruses (NCLDVs) constitute a group of eukaryotic viruses that can have crucial ecological roles in the sea by accelerating the turnover of their unicellular hosts or by causing diseases in animals. To better characterize the diversity, abundance and biogeography of marine NCLDVs, we analyzed 17 metagenomes derived from microbial samples (0.2-1.6 µm size range) collected during the Tara Oceans Expedition. The sample set includes ecosystems under-represented in previous studies, such as the Arabian Sea oxygen minimum zone (OMZ) and Indian Ocean lagoons. By combining computationally derived relative abundance and direct prokaryote cell counts, the abundance of NCLDVs was found to be in the order of 10^4-10^5 genomes ml^-1 for the samples from the photic zone and 10^2-10^3 genomes ml^-1 for the OMZ. The Megaviridae and Phycodnaviridae dominated the NCLDV populations in the metagenomes, although most of the reads classified in these families showed large divergence from known viral genomes. Our taxon co-occurrence analysis revealed a potential association between viruses of the Megaviridae family and eukaryotes related to oomycetes. In support of this predicted association, we identified six cases of lateral gene transfer between Megaviridae and oomycetes. Our results suggest that marine NCLDVs probably outnumber eukaryotic organisms in the photic layer (per given water mass) and that metagenomic sequence analyses promise to shed new light on the biodiversity of marine viruses and their interactions with potential hosts.
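The NCLDV abstract above describes turning a relative abundance into an absolute one by combining it with direct prokaryote cell counts. The abstract does not give the formula, so the following is only a minimal sketch of one plausible reading: the viral-to-prokaryote genome ratio from the metagenome is multiplied by the counted cells per millilitre, assuming roughly one genome per prokaryote cell. The function names and the example numbers are hypothetical and are not taken from the study.

```python
import math


def estimate_viral_genomes_per_ml(viral_to_prokaryote_ratio, prokaryote_cells_per_ml):
    """Scale a relative abundance by an absolute cell count.

    Assumption (not from the abstract): the ratio is viral genomes per
    prokaryote genome, and each counted cell carries about one genome.
    """
    if viral_to_prokaryote_ratio < 0 or prokaryote_cells_per_ml < 0:
        raise ValueError("inputs must be non-negative")
    return viral_to_prokaryote_ratio * prokaryote_cells_per_ml


def order_of_magnitude(value):
    """Return the base-10 exponent of a positive value (e.g. 4 for 5e4)."""
    if value <= 0:
        raise ValueError("value must be positive")
    return math.floor(math.log10(value))


# Illustrative numbers only: a 5% genome ratio and 1e6 cells per ml.
example = estimate_viral_genomes_per_ml(0.05, 1e6)
print(f"about 10^{order_of_magnitude(example)} genomes per ml")
```

Reporting only the order of magnitude mirrors how the abstract states its result, and it keeps the estimate from implying more precision than the two inputs support.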
Project description:A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009-2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world's planktonic ecosystems.
Project description:Several studies in Bioinformatics, Computational Biology and Systems Biology rely on the definition of physico-chemical or mathematical models of biological systems at different scales and levels of complexity, ranging from the interaction of atoms in single molecules up to genome-wide interaction networks. Traditional computational methods and software tools developed in these research fields share a common trait: they can be computationally demanding on Central Processing Units (CPUs), which limits their applicability in many circumstances. To overcome this issue, general-purpose Graphics Processing Units (GPUs) are gaining increasing attention from the scientific community, as they can considerably reduce the running time required by standard CPU-based software and allow more intensive investigations of biological systems. In this review, we present a collection of GPU tools recently developed to perform computational analyses in life science disciplines, emphasizing the advantages and the drawbacks of using these parallel architectures. The complete list of GPU-powered tools reviewed here is available at http://bit.ly/gputools.
Project description:Illumina reads of the SSU-rDNA-V9 region obtained from the circumglobal Tara Oceans expedition allow the investigation of protistan plankton diversity patterns on a global scale. We analyzed 6,137,350 V9-amplicons from ocean surface waters and the deep chlorophyll maximum, which were taxonomically assigned to the phylum Ciliophora. For open ocean samples, global planktonic ciliate diversity is relatively low (ca. 1,300 observed and predicted ciliate OTUs). We found that 17% of all detected ciliate OTUs occurred in all oceanic regions under study. On average, local ciliate OTU richness represented 27% of the global ciliate OTU richness, indicating that a large proportion of ciliates is widely distributed. Yet, more than half of these OTUs shared <90% sequence similarity with reference sequences of described ciliates. While alpha-diversity measures (richness and exp(Shannon H)) are hardly affected by contemporary environmental conditions, species (OTU) turnover and community similarity (β-diversity) across taxonomic groups showed strong correlation with environmental parameters. Logistic regression models predicted significant correlations between the occurrence of specific ciliate genera and individual nutrients, the oceanic carbonate system and temperature. Planktonic ciliates displayed distinct vertical distributions relative to chlorophyll a. In contrast, the Tara Oceans dataset did not reveal any evidence that latitude structures ciliate communities.
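The ciliate abstract above names two alpha-diversity measures, richness and exp(Shannon H). As a minimal sketch of the standard definitions, the code below computes both from a list of per-OTU read counts for one sample. The function names and the example counts are hypothetical; the abstract does not say how the authors computed or normalized these values (for example, whether samples were rarefied first).

```python
import math


def richness(otu_counts):
    """Number of OTUs observed at least once in the sample."""
    return sum(1 for count in otu_counts if count > 0)


def shannon_h(otu_counts):
    """Shannon entropy H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(otu_counts)
    if total == 0:
        return 0.0
    h = 0.0
    for count in otu_counts:
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


def exp_shannon(otu_counts):
    """exp(H): the number of equally abundant OTUs that would give the same H."""
    return math.exp(shannon_h(otu_counts))


# Illustrative counts only: one dominant OTU and three rare ones.
sample = [90, 5, 3, 2, 0]
print(richness(sample), round(exp_shannon(sample), 2))
```

exp(H) equals the richness when all observed OTUs are equally abundant and falls toward 1 as a single OTU dominates, which is why it is often reported alongside raw richness.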