Dataset Information

Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform

ABSTRACT:

Summary

Obtaining a standardized benchmark of computational methods is a major issue in data-science communities. Dedicated frameworks enabling fair benchmarking in a unified environment are yet to be developed. Here, we introduce Codabench, a meta-benchmark platform that is open sourced and community driven for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench is open to everyone free of charge and allows benchmark organizers to fairly compare submissions under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features facilitating easy organization of flexible and reproducible benchmarks, such as the possibility of reusing templates of benchmarks and supplying compute resources on demand. Codabench has been used internally and externally on various applications, receiving more than 130 users and 2,500 submissions. As illustrative use cases, we introduce four diverse benchmarks covering graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning.

Highlights

• Codabench facilitates flexible, easy, and reproducible benchmarking
• Organizers can customize benchmark design and submission format
• Organizers may host their own platform instance or use the public instance
• Four use cases in diverse domains are introduced to demonstrate the key features

The bigger picture

In almost all communities working on data science, researchers face increasingly severe issues of reproducibility and fair comparison. Researchers work on their own version of hardware/software environment, code, and data, and consequently, the published results are hardly comparable. We introduce Codabench, a meta-benchmark platform, that is capable of flexible and easy benchmarking and supports reproducibility. Codabench is an important step toward benchmarking and reproducible research. It has been used in various communities including graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning. Codabench is ready to help trendy research, e.g., artificial intelligence (AI) for science and data-centric AI.

Fair and flexible benchmarking is a common issue in data-science communities. We develop the Codabench platform for flexible, easy, and reproducible benchmarking. It is open sourced and community driven. With Codabench, we are able to fairly and easily compare algorithms as well as datasets under diverse protocols. The reproducibility is also guaranteed.

SUBMITTER: Xu Z

PROVIDER: S-EPMC9278500 | biostudies-literature |

REPOSITORIES: biostudies-literature


Similar Datasets

Project description:

Background: The rapid advances in next-generation sequencing technologies have revolutionized microbiome research by greatly increasing our ability to understand the diversity of microbes in a given sample. Over the past decade, several computational pipelines have been developed to efficiently process and annotate these microbiome data. However, most of these pipelines require an implementation of additional tools for downstream analyses as well as advanced programming skills.

Results: Here we introduce a user-friendly microbiome analysis platform, EzMAP (Easy Microbiome Analysis Platform), which was developed using Java Swing, JavaScript, and the R programming language. EzMAP is a standalone package providing a graphical user interface, enabling easy access to all the functionalities of QIIME2 (Quantitative Insights Into Microbial Ecology) as well as streamlined downstream analyses using QIIME2 output as input. This platform is designed to give users the detailed reports and the intermediate output files that are generated progressively. Users can download the feature/OTU table (.biom, .tsv, .xls), representative sequences (.fasta), phylogenetic tree (.nwk), and, optionally, the taxonomy assignment file. For downstream analyses, users can analyze relative abundances (at all taxonomic levels), community comparison (alpha and beta diversity, core microbiome), differential abundances (DESeq2 and linear discriminant analysis), and functional prediction (PICRUSt, Tax4Fun, and FUNGuild). Our case study using a published rice microbiome dataset demonstrates the intuitive user interface and accessibility of EzMAP.

Conclusions: EzMAP allows users to consolidate the microbiome analysis processes, from raw sequence processing to downstream analyses specific to individual projects. We believe that this will be an invaluable tool for beginners in their microbiome data analysis. This platform is freely available at https://github.com/gnanibioinfo/EzMAP and will be continually updated for adoption of changes in methods and approaches.


