Project description:In metagenomic analyses of microbiomes, one of the first steps is usually the taxonomic classification of reads by comparison to a database of previously taxonomically classified genomes. While different studies comparing metagenomic taxonomic classification methods have determined that different tools are 'best', there are two tools that have been used the most to date: Kraken (k-mer-based classification against a user-constructed database) and MetaPhlAn (classification by alignment to clade-specific marker genes), the latest versions of which are Kraken2 and MetaPhlAn 3, respectively. We found large discrepancies in both the proportion of reads that were classified and the number of species that were identified when we used both Kraken2 and MetaPhlAn 3 to classify reads within metagenomes from human-associated or environmental datasets. We then investigated which of these tools would give classifications closest to the real composition of metagenomic samples using a range of simulated and mock samples and examined the combined impact of tool-parameter-database choice on the taxonomic classifications given. This revealed that there may not be a one-size-fits-all 'best' choice. While Kraken2 can achieve better overall performance, with higher precision, recall and F1 scores, as well as alpha- and beta-diversity measures closer to the known composition than MetaPhlAn 3, the computational resources required for this may be prohibitive for many researchers, and the default database and parameters should not be used. We therefore conclude that the best tool-parameter-database choice for a particular application depends on the scientific question of interest, which performance metric is most important for this question and the limit of available computational resources.
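The precision, recall and F1 scores named in this description can be illustrated with a minimal sketch. It assumes a presence/absence comparison between the set of species a classifier reports and the known species of a mock sample; the abstract does not state how the authors computed the scores, and the function name and taxon labels here are hypothetical.

```python
def precision_recall_f1(predicted, truth):
    """Presence/absence scores for one sample (an assumed definition).

    predicted: species names reported by a classifier
    truth:     species names known to be in the mock or simulated sample
    """
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)   # species correctly reported
    fp = len(predicted - truth)   # species reported but not present
    fn = len(truth - predicted)   # species present but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical taxon labels, for illustration only.
truth = {"species_A", "species_B", "species_C", "species_D"}
predicted = {"species_A", "species_B", "species_C", "species_E", "species_F"}
precision, recall, f1 = precision_recall_f1(predicted, truth)
```

In this made-up example three of the five reported species are correct and one true species is missed, so precision is 3/5 and recall is 3/4. A tool that reports many low-abundance false positives would lose precision while keeping recall, which is one reason the choice of metric matters for the comparison.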
Project description:Metagenomic sequencing is revolutionizing the detection and characterization of microbial species, and a wide variety of software tools are available to perform taxonomic classification of these data. The fast pace of development of these tools and the complexity of metagenomic data make it important that researchers are able to benchmark their performance. Here, we review current approaches for metagenomic analysis and evaluate the performance of 20 metagenomic classifiers using simulated and experimental datasets. We describe the key metrics used to assess performance, offer a framework for the comparison of additional classifiers, and discuss the future of metagenomic data analysis.
Project description:Peroxidases (EC 1.11.1.x), which are encoded by small or large multigenic families, are involved in several important physiological and developmental processes. They use various peroxides as electron acceptors to catalyse a number of oxidative reactions and are present in almost all living organisms. We have created a peroxidase database (http://peroxibase.isb-sib.ch) that contains all identified peroxidase-encoding sequences (about 6000 sequences in 940 organisms). They are distributed between 11 superfamilies and about 60 subfamilies. All the sequences have been individually annotated and checked. PeroxiBase can be consulted using six major interlink sections 'Classes', 'Organisms', 'Cellular localisations', 'Inducers', 'Repressors' and 'Tissue types'. General documentation on peroxidases and PeroxiBase is accessible in the 'Documents' section containing 'Introduction', 'Class description', 'Publications' and 'Links'. In addition to the database, we have developed a tool to classify peroxidases based on the PROSITE profile methodology. To improve their specificity and to prevent overlaps between closely related subfamilies the profiles were built using a new strategy based on the silencing of residues. This new profile construction method and its discriminatory capacity have been tested and validated using the different peroxidase families and subfamilies present in the database. The peroxidase classification tool called PeroxiScan is accessible at the following address: http://peroxibase.isb-sib.ch/peroxiscan.php.
Project description:Elucidating the role of gut microbiota in physiological and pathological processes has recently emerged as a key research aim in life sciences. In this respect, metaproteomics (the study of the whole protein complement of a microbial community) can provide a unique contribution by revealing which functions are actually being expressed by specific microbial taxa. However, its wide application to gut microbiota research has been hindered by challenges in data analysis, especially related to the choice of the proper sequence databases for protein identification. Here we present a systematic investigation of variables concerning database construction and annotation, and evaluate their impact on human and mouse gut metaproteomic results. We found that both publicly available and experimental metagenomic databases lead to the identification of unique peptide assortments, suggesting parallel database searches as a means to gain more complete information. Taxonomic and functional results were revealed to be strongly database-dependent, especially when dealing with mouse samples. As a striking example, in mouse the Firmicutes/Bacteroidetes ratio varied up to 10-fold depending on the database used. Finally, we provide recommendations regarding metagenomic sequence processing aimed at maximizing gut metaproteome characterization, and contribute to identifying an optimized pipeline for metaproteomic data analysis.
Project description:Shotgun metagenomic sequencing comprehensively samples the DNA of a microbial sample. Choosing the best bioinformatics processing package can be daunting due to the wide variety of tools available. Here, we assessed publicly available shotgun metagenomics processing packages/pipelines including bioBakery, Just a Microbiology System (JAMS), Whole metaGenome Sequence Assembly V2 (WGSA2), and Woltka using 19 publicly available mock community samples and a set of five constructed pathogenic gut microbiome samples. Also included is a workflow for labelling bacterial scientific names with NCBI taxonomy identifiers for better resolution in assessing results. The Aitchison distance, a sensitivity metric, and total False Positive Relative Abundance were used for accuracy assessments for all pipelines and mock samples. Overall, bioBakery4 performed best on most of the accuracy metrics, while JAMS and WGSA2 had the highest sensitivities. Furthermore, bioBakery is commonly used and only requires a basic knowledge of command line usage. This work provides an unbiased assessment of shotgun metagenomics packages and presents results assessing the performance of the packages using mock community sequence data.
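The Aitchison distance mentioned in this description is the Euclidean distance between centred log-ratio (CLR) transformed compositions. The sketch below shows that standard definition in plain Python. The pseudocount used to handle zero abundances is an assumption made here for illustration, not a detail taken from the study, and the function names are hypothetical.

```python
import math


def clr(composition, pseudocount=1e-6):
    """Centred log-ratio transform of one abundance vector.

    The pseudocount (an assumption of this sketch) avoids log(0)
    for taxa that are absent from a profile.
    """
    shifted = [x + pseudocount for x in composition]
    total = sum(shifted)
    logs = [math.log(x / total) for x in shifted]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(a, b, pseudocount=1e-6):
    """Euclidean distance between the CLR transforms of a and b.

    Both vectors must list the same taxa in the same order.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same taxa")
    clr_a = clr(a, pseudocount)
    clr_b = clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clr_a, clr_b)))


# Hypothetical expected and observed relative abundances for four taxa.
expected = [0.25, 0.25, 0.25, 0.25]
observed = [0.40, 0.30, 0.30, 0.00]
distance = aitchison_distance(expected, observed)
```

Because the CLR transform works on log ratios, the distance is unchanged when a profile is rescaled (for example, counts versus proportions), which suits compositional data such as relative abundances. The pseudocount strongly influences the result when a taxon is entirely missing, so it should be chosen and reported deliberately.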
Project description:Human emotion recognition has been a major field of research in recent decades owing to its noteworthy academic and industrial applications. Most state-of-the-art methods identify emotions by analyzing facial images, whereas emotion recognition using electroencephalogram (EEG) signals has received less attention. The advantage of EEG signals is that they can capture real emotion, but very few EEG signal databases are publicly available for affective computing. In this work, we present a database consisting of EEG signals from 44 volunteers, 23 of whom are female. A 32-channel CLARITY EEG traveler sensor is used to record four emotional states of the subjects, namely happy, fear, sad, and neutral, by showing them 12 videos, so that 3 videos are devoted to each emotion. Participants are mapped to the emotion that they felt after watching each video. The recorded EEG signals are then used to classify the four types of emotions based on discrete wavelet transform and extreme learning machine (ELM), in order to report an initial benchmark classification performance. The ELM algorithm is used for channel selection followed by subband selection. The proposed method performs best when features are captured from the gamma subband of the FP1-F7 channel, with 94.72% accuracy. The presented database will be made available to researchers for affective recognition applications.
| S-EPMC7422491 | biostudies-literature
Project description:Benchmarking Metagenomics Tools for Taxonomic Classification