Dataset Information

A community effort to identify and correct mislabeled samples in proteogenomic studies.


ABSTRACT: Sample mislabeling or misannotation is a long-standing problem in scientific research and is particularly prevalent in large-scale multi-omic studies because of the complexity of multi-omic workflows. There is an urgent need for quality controls that automatically screen for and correct sample mislabels or misannotations in such studies. Here, we describe the crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, and performance varied widely across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers produced an open-source software tool, COSMO, which demonstrated high accuracy and robustness in identifying and correcting mislabels in simulated and real multi-omic datasets.

SUBMITTER: Yoo S

PROVIDER: S-EPMC8134945 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

S-EPMC5305221 | biostudies-literature
S-EPMC6072548 | biostudies-literature
S-EPMC6292093 | biostudies-literature
S-EPMC4431839 | biostudies-literature
S-EPMC8458405 | biostudies-literature
S-EPMC5707065 | biostudies-literature
S-EPMC4410931 | biostudies-literature
S-EPMC7064602 | biostudies-literature
S-EPMC10078209 | biostudies-literature
S-EPMC6816447 | biostudies-literature