Project description: BACKGROUND: Nowadays, not only are single genomes commonly analyzed, but also metagenomes, which are sets of DNA fragments (reads) derived from microbes living in a given environment. Metagenome analysis aims to extract crucial information on the organisms that have left their traces in an investigated environmental sample. In this study we focus on the MetaSUB Forensics Challenge (organized within the CAMDA 2018 conference), which consists of predicting the geographical origin of metagenomic samples. In contrast to existing methods for environmental classification, which are based on taxonomic or functional classification, we rely on the similarity between a sample and the reference database computed at the read level. RESULTS: We report the results of an extensive experimental study investigating the behavior of our method and its sensitivity to different parameters. In our tests, we followed the protocol of the MetaSUB Challenge, which allowed us to compare the obtained results with solutions based on taxonomic and functional classification. CONCLUSIONS: The results reported in the paper indicate that our method is competitive with those based on taxonomic classification. Importantly, by measuring the similarity at the read level, we avoid the need for large databases with annotated gene sequences. Hence our main finding is that environmental classification of metagenomic data can be performed without the large databases required for taxonomic or functional classification. REVIEWERS: This article was reviewed by Eran Elhaik, Alexandra Bettina Graf, Chengsheng Zhu, and Andre Kahles.
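The read-level similarity idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that each reference class (e.g. a city) is represented by a set of k-mers and that a read counts as a match when a chosen fraction of its k-mers occur in that set. The function names `kmers`, `read_matches`, `class_similarity`, and `predict_origin`, and the parameters `k` and `threshold`, are hypothetical.

```python
from typing import Dict, Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def read_matches(read: str, reference: Set[str], k: int, threshold: float) -> bool:
    """Assumed rule: a read matches when at least `threshold` of its
    distinct k-mers are present in the reference k-mer set."""
    read_kmers = kmers(read, k)
    if not read_kmers:  # read shorter than k
        return False
    shared = len(read_kmers & reference)
    return shared / len(read_kmers) >= threshold


def class_similarity(
    reads: Iterable[str],
    references: Dict[str, Set[str]],
    k: int = 4,
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Fraction of sample reads that match each reference class."""
    read_list = list(reads)
    scores: Dict[str, float] = {}
    for name, reference in references.items():
        hits = sum(read_matches(r, reference, k, threshold) for r in read_list)
        scores[name] = hits / len(read_list) if read_list else 0.0
    return scores


def predict_origin(
    reads: Iterable[str],
    references: Dict[str, Set[str]],
    k: int = 4,
    threshold: float = 0.5,
) -> str:
    """Pick the reference class with the highest read-level similarity."""
    scores = class_similarity(reads, references, k, threshold)
    return max(scores, key=scores.get)
```

In this toy form, no annotated gene or taxonomy database is consulted; the only inputs are the sample reads and k-mer sets built from reference samples, which is the property the abstract emphasizes.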
Project description: Aims: The aim of this study was to develop and demonstrate an approach for describing the diversity of human pathogenic viruses in an environmentally isolated viral metagenome. Methods and results: In silico bioinformatic experiments were used to select an optimum annotation strategy for discovering human viruses in virome data sets, and this strategy was applied to annotate a class B biosolid virome. Results from the in silico study indicated that error rates of <1% in virus identification could be achieved when nucleotide-based search programs (BLASTn or tBLASTx), viral-genome-only databases and sequence reads >200 nt were considered. Within the 51,925 annotated sequences, 94 DNA and 19 RNA sequences were identified as human viruses. Virus diversity included environmentally transmitted agents such as parechovirus, coronavirus, adenovirus and aichi virus, as well as viruses associated with chronic human infections such as human herpes and hepatitis C viruses. Conclusions: This study provided a bioinformatic approach for identifying pathogens in a virome data set and demonstrated the human virus diversity in a relevant environmental sample. Significance and impact of the study: As the costs of next-generation sequencing decrease, the pathogen diversity described by virus metagenomes will provide an unbiased guide for subsequent cell culture and quantitative pathogen analyses and ensure that highly enriched and relevant pathogens are not neglected in exposure and risk assessments.
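The read-length criterion mentioned above (reads >200 nt) could be applied as a pre-filter before any similarity search. The sketch below is only an illustration under stated assumptions: it assumes the reads are available as FASTA-formatted text lines, and the helper names `parse_fasta` and `filter_by_length` are hypothetical. It does not run BLASTn or tBLASTx and does not reproduce the study's annotation pipeline.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.
    Sequences split over several lines are joined."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]], min_length: int = 200
) -> List[Tuple[str, str]]:
    """Keep only reads strictly longer than min_length nucleotides,
    mirroring the '>200 nt' criterion stated in the abstract."""
    return [(h, s) for h, s in records if len(s) > min_length]
```

The retained reads would then be passed to whichever search program and viral genome database the analysis uses.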