Project description: Virophages, e.g., Sputnik, Mavirus, and Organic Lake virophage (OLV), are unusual parasites of giant double-stranded DNA (dsDNA) viruses, yet little is known about their diversity. Here, we describe the global distribution, abundance, and genetic diversity of virophages based on analyzing and mapping comprehensive metagenomic databases. The results reveal that virophages are abundant and distributed worldwide, occurring in almost all geographical zones and in a variety of unique environments. These environments ranged from deep ocean to inland, from iced to hydrothermal lakes, and from human gut- to animal-associated habitats. Four complete virophage genomic sequences (Yellowstone Lake virophages [YSLVs]) were obtained, as was one nearly complete sequence (Ace Lake Mavirus [ALM]). The genomes obtained were 27,849 bp long with 26 predicted open reading frames (ORFs) (YSLV1), 23,184 bp with 21 ORFs (YSLV2), 27,050 bp with 23 ORFs (YSLV3), 28,306 bp with 34 ORFs (YSLV4), and 17,767 bp with 22 ORFs (ALM). Homologs of five genes, encoding a putative FtsK-HerA family DNA packaging ATPase, DNA helicase/primase, cysteine protease, major capsid protein (MCP), and minor capsid protein (mCP), were present in all virophages studied thus far. The virophages also shared a conserved gene cluster comprising the two core genes encoding MCP and mCP. Comparative genomic and phylogenetic analyses showed that the YSLVs, which are more closely related to each other than to the other virophages, were more closely related to OLV than to Sputnik and only distantly related to Mavirus and ALM. These findings suggest that virophages are widespread and genetically diverse, with at least three major lineages.
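The description reports predicted ORF counts but does not name the gene-prediction tool. As a rough illustration of what "predicted ORF" minimally means, the sketch below scans all six reading frames for ATG-to-stop stretches above a length cutoff. The function names and the 100-codon default are assumptions for illustration only; real annotation would normally use dedicated gene-prediction software that also handles alternative start codons and overlapping genes.

```python
# Naive six-frame ORF scanner (illustrative sketch; not the tool used in the study).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) for ATG-to-stop ORFs on both strands.

    Coordinates are 0-based and end-exclusive, relative to the scanned strand.
    Only the first ATG after the previous stop in each frame opens an ORF,
    so nested start codons are not reported separately.
    min_codons (hypothetical default) counts codons including the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # close it and keep scanning
    return orfs
```

For example, `find_orfs("ATGAAATAG", min_codons=3)` reports one three-codon ORF on the forward strand; with the default cutoff the same sequence yields nothing.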
Project description: Bacteriophages are the most abundant biological entities on the planet, yet, because of their small genome sizes, they account for only a small fraction of the genetic material isolated from most environments. They also show great genetic diversity and have mosaic genomes, which makes them challenging to analyze and understand. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e., contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole-genome bacteriophage sequences and integrates hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages. The method outperforms both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder.
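To show why integrating hits to several genomes helps with mosaic phage contigs, here is a minimal sketch that scores a contig by the union of query intervals covered by hits to any reference phage, multiplied by a length-weighted identity. The `Hit` record, the scoring formula, and all function names are assumptions for illustration; this is not MetaPhinder's actual code, and no classification threshold is implied. Overlapping hits are counted twice in the identity average, which is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One local alignment of a contig against a reference phage genome (hypothetical record)."""
    q_start: int     # 0-based start on the contig, inclusive
    q_end: int       # end on the contig, exclusive
    identity: float  # fraction of identical positions, 0..1
    subject: str     # name of the reference phage genome


def merged_coverage(hits: list[Hit], contig_len: int) -> float:
    """Fraction of the contig covered by the union of all hit intervals."""
    covered = 0
    cur_s = cur_e = None
    for s, e in sorted((h.q_start, h.q_end) for h in hits):
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                covered += cur_e - cur_s  # flush the previous merged block
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)  # extend the current merged block
    if cur_e is not None:
        covered += cur_e - cur_s
    return covered / contig_len


def weighted_identity(hits: list[Hit]) -> float:
    """Alignment-length-weighted mean identity across all hits."""
    total = sum(h.q_end - h.q_start for h in hits)
    if total == 0:
        return 0.0
    return sum(h.identity * (h.q_end - h.q_start) for h in hits) / total


def phage_score(hits: list[Hit], contig_len: int) -> float:
    """Hypothetical multi-genome score: identity times merged coverage."""
    return weighted_identity(hits) * merged_coverage(hits, contig_len)
```

With hits spanning positions 0-400 (90% identity, genome A) and 300-700 (80% identity, genome B) on a 1,000 bp contig, the combined score uses 70% coverage, whereas the best single hit alone would cover only 40%; this is the intuition behind integrating hits across genomes rather than relying on the top hit.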
Project description: Although plasmids are important for bacterial survival and adaptation, plasmid detection and assembly from genomic, let alone metagenomic, samples remain challenging. The recently developed plasmidSPAdes assembler addressed some of these challenges for isolate genomes but stopped short of detecting plasmids in metagenomic assemblies, an untapped source of yet-to-be-discovered plasmids. We present the metaplasmidSPAdes tool for plasmid assembly in metagenomic data sets, which reduces the false positive rate of plasmid detection compared with state-of-the-art approaches. We assembled plasmids in diverse data sets and show that thousands of plasmids have remained under the radar in already completed genomic and metagenomic studies. Our analysis revealed the extreme variability of plasmids and led to the discovery of many novel plasmids (including many carrying antibiotic-resistance genes) without significant similarity to currently known ones.
Project description: Ethnic differences in dementia are increasingly recognized in epidemiological measures and diagnostic biomarkers. Nonetheless, ethnic diversity remains limited in many study populations. Here, we provide insights into ethnic diversity in open-access neuroimaging dementia data sets. Data sets comprising dementia populations with available data on ethnicity were included. Statistical analyses of sample and effect sizes were based on the Cochrane Handbook. Nineteen databases were included, with 17 studies of healthy groups (or of a combination of diagnostic groups where no breakdown was available) and 12 of mild cognitive impairment and dementia groups. Combining all studies on dementia patients, the largest ethnic group was Caucasian (20,547 participants), followed by Afro-Caribbean (1,958) and Asian (1,211). The smallest effect size detectable within the Caucasian group was 0.03, compared with 0.1 for the Afro-Caribbean group and 0.13 for the Asian group. Our findings quantify the lack of ethnic diversity in openly available dementia data sets. More representative data would facilitate the development and validation of biomarkers relevant across ethnicities.
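The description does not state the exact formula, significance level, power, or comparison design behind the reported detectable effect sizes. As a hedged illustration of why the smallest detectable effect shrinks as a group grows, the sketch below uses the standard normal-approximation formula for a two-sided, two-sample comparison of means, d = (z₁₋α/₂ + z_power) · √(1/n₁ + 1/n₂). The α = 0.05 and power = 0.8 defaults are assumptions, and the sketch is not expected to reproduce the reported values exactly.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n1: int, n2: int, alpha: float = 0.05, power: float = 0.8) -> float:
    """Smallest standardized mean difference (Cohen's d) detectable by a
    two-sided two-sample test at the given alpha and power, using the
    normal approximation. Defaults are illustrative assumptions."""
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile corresponding to the desired power
    return (z_alpha + z_power) * sqrt(1 / n1 + 1 / n2)
```

Because the result scales with 1/√n, quadrupling both group sizes halves the detectable effect. This is consistent with the pattern above, where the much larger Caucasian group supports detection of far smaller effects than the Afro-Caribbean or Asian groups.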
Project description: EMG-produced TPA metagenomics assembly of the "Bacterial diversity in various environments such as cultivated soil, alpine soil, semi-arid soil" (Soil diversity) data set.
Project description: There are very few studies exploring the genetic diversity of tick-borne encephalitis complex viruses. Most of these viruses have been sequenced using capillary electrophoresis; very few have been analyzed by deep sequencing to examine the genotypes within each virus population. In this study, different viruses and strains belonging to the tick-borne encephalitis complex were sequenced, and their genetic diversity was analyzed. Shannon entropy and single nucleotide variants were used to compare the viruses. Genetic diversity was then compared with the phylogenetic relationships among the viruses.
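The two diversity measures named above can be illustrated on per-position base counts from deep-sequencing reads. The sketch below computes Shannon entropy at a single alignment column and calls single nucleotide variants above a frequency cutoff. The input layout (one list of read bases per reference position), the natural-log base, and the 1% default cutoff are assumptions for illustration; the study's actual pipeline and thresholds are not described here.

```python
from collections import Counter
from math import log

NUCLEOTIDES = set("ACGT")


def shannon_entropy(bases) -> float:
    """Shannon entropy (natural log) of the nucleotide distribution at one position.
    Non-ACGT symbols (e.g., N or gaps) are ignored."""
    counts = Counter(b for b in bases if b in NUCLEOTIDES)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log(c / total) for c in counts.values())


def call_snvs(columns, reference: str, min_freq: float = 0.01):
    """Return (position, ref_base, alt_base, frequency) for non-reference bases
    whose frequency is at least min_freq (hypothetical default).
    columns[i] holds the read bases observed at reference position i."""
    snvs = []
    for pos, (ref, bases) in enumerate(zip(reference, columns)):
        counts = Counter(b for b in bases if b in NUCLEOTIDES)
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage at this position
        for alt, c in sorted(counts.items()):
            if alt != ref and c / depth >= min_freq:
                snvs.append((pos, ref, alt, c / depth))
    return snvs
```

Averaging `shannon_entropy` across all positions gives one per-genome diversity value that can be set against a phylogeny. Some studies use log base 2 instead, which rescales entropies without changing their ranking.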
Project description: Background: Fluoroquinolones are broad-spectrum antibiotics used to prevent and treat a wide range of bacterial infections. Plasmid-mediated qnr genes provide resistance to fluoroquinolones in many bacterial species and are increasingly encountered in clinical settings. Over the last decade, several families of qnr genes have been discovered and characterized, but their true prevalence and diversity remain unclear. In particular, environmental and host-associated bacterial communities have been hypothesized to maintain a large and unknown collection of qnr genes that could be mobilized into pathogens. Results: In this study, we used computational methods to screen genomes and metagenomes for novel qnr genes. Compared with previous studies, we analyzed a dataset almost 20-fold larger, comprising nearly 13 terabases of sequence data. In total, 362,843 potential qnr gene fragments were identified, from which 611 putative qnr genes were reconstructed. These gene sequences included all previously described plasmid-mediated qnr gene families. Fifty-two of the 611 identified qnr genes were reconstructed from metagenomes, and 20 of these were previously undescribed. All of the novel qnr genes were assembled from metagenomes associated with aquatic environments. Nine of the novel genes were selected for validation, and six of them consistently decreased susceptibility to ciprofloxacin when expressed in Escherichia coli. Conclusions: The results provide additional evidence for the ubiquitous presence of qnr genes in environmental microbial communities, expand the number of known qnr gene variants, and further elucidate the diversity of this class of resistance genes. This study also strengthens the hypothesis that environmental bacterial communities act as sources of previously uncharacterized qnr genes.
Project description: These data describe the microbial diversity of a geothermal spring located on the banks of the acidic creek of Kunashir Island. The data were obtained by 16S rRNA gene amplicon sequencing on an Illumina MiSeq. The raw sequence data used for the analysis are available in the NCBI Sequence Read Archive (SRA) under BioProject numbers PRJNA637298 and PRJNA637447 and SRA accession numbers SRP265942 and SRP266050. The 16S rRNA gene sequences are available in the NCBI GenBank database under accession numbers MT604934-MT604967 and MT604911-MT604921.
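The GenBank accessions above are given as ranges. Retrieval tools usually expect individual accession numbers, for example as the ID list of an NCBI E-utilities efetch request. The small helper below expands such a range. The function name and the assumption that both ends share the same letter prefix and digit width are illustrative choices and are not part of the original data description.

```python
import re

ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def expand_accession_range(range_str: str) -> list[str]:
    """Expand a range such as 'MT604911-MT604921' into individual accessions.

    Assumes (hypothetically) that both ends share the same letter prefix and
    zero-padded digit width; raises ValueError otherwise.
    """
    start, end = (part.strip() for part in re.split(r"[-\u2013]", range_str))
    m1, m2 = ACCESSION.fullmatch(start), ACCESSION.fullmatch(end)
    if not (m1 and m2) or m1.group(1) != m2.group(1):
        raise ValueError(f"unsupported accession range: {range_str!r}")
    prefix, width = m1.group(1), len(m1.group(2))
    lo, hi = int(m1.group(2)), int(m2.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(lo, hi + 1)]
```

Expanding the two ranges listed above gives 34 and 11 accessions respectively, 45 GenBank records in total.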
Project description: k-SLAM is a highly efficient algorithm for the characterization of metagenomic data. Unlike other ultra-fast metagenomic classifiers, it performs full sequence alignment, allowing gene identification and variant calling in addition to accurate taxonomic classification. Its k-mer-based method provides greater taxonomic accuracy than other classifiers and a three-orders-of-magnitude speed increase over alignment-based approaches. Using alignments to find variants and genes, along with their taxonomic origins, enables novel strains to be characterized. k-SLAM's speed makes full taxonomic classification and gene identification tractable on modern large data sets. A pseudo-assembly method increases classification accuracy by up to 40% for species with high sequence homology within their genus.
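To illustrate how k-mers can make alignment fast, here is a generic seed-and-vote sketch. Reference k-mers are indexed, and each read k-mer hit votes for a (reference, diagonal) pair, where the diagonal is reference position minus read position. The best-supported diagonal marks where a full alignment (for example, Smith-Waterman) could then be run. This is a simplified illustration of k-mer seeding under stated assumptions, not k-SLAM's actual implementation; all names and the toy sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(references: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map every k-mer to the (reference name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def seed_votes(read: str, index, k: int) -> dict[tuple[str, int], int]:
    """Count shared k-mers per (reference, diagonal); diagonal = ref_pos - read_pos."""
    votes = defaultdict(int)
    for j in range(len(read) - k + 1):
        for name, i in index.get(read[j:j + k], ()):
            votes[(name, i - j)] += 1
    return dict(votes)


def best_candidate(read: str, index, k: int):
    """Return ((reference, diagonal), votes) with the most support, or None.
    A full alignment would then be computed around this diagonal."""
    votes = seed_votes(read, index, k)
    if not votes:
        return None
    return max(votes.items(), key=lambda kv: kv[1])
```

Only the few candidate diagonals with strong k-mer support are passed to the expensive alignment step. This is the general reason k-mer seeding can combine alignment-level detail (genes, variants) with speed far beyond aligning every read against every reference.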