Dataset Information

Comparison of methods for the detection of outliers and associated biomarkers in mislabeled omics data.

ABSTRACT: BACKGROUND: Previous studies have reported that labeling errors are not uncommon in omics data. Potential outliers may severely undermine the correct classification of patients and the identification of reliable biomarkers for a particular disease. Three methods have been proposed to address the problem: sparse label-noise-robust logistic regression (Rlogreg), robust elastic net based on the least trimmed square (enetLTS), and Ensemble. Ensemble is an ensemble classification approach based on distinct feature selection and modeling strategies. The accuracy of biomarker selection and outlier detection of these methods needs to be evaluated and compared so that the appropriate method can be chosen. RESULTS: The accuracy of variable selection, outlier identification, and prediction of the three methods (Ensemble, enetLTS, Rlogreg) was compared on simulated datasets and an RNA-seq dataset. On simulated datasets, Ensemble had the highest variable selection accuracy, as measured by a comprehensive index, and the lowest false discovery rate among the three methods. When the sample size was large and the proportion of outliers was ≤5%, the positive selection rate of Ensemble was similar to that of enetLTS. However, when the proportion of outliers was 10% or 15%, Ensemble missed some variables that affected the response variables. Overall, enetLTS had the best outlier detection accuracy, with false positive rates ≤5%. Ensemble can be used for variable selection on a subset after removing outliers identified by enetLTS. For outlier identification, enetLTS is the recommended method. In practice, the proportion of outliers can be estimated according to the inaccuracy of the diagnostic methods used.
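The evaluation measures named in the abstract can be illustrated with a short sketch. It assumes the usual definitions, which the abstract itself does not spell out: the positive selection rate is the share of truly informative variables that a method selects, the false discovery rate is the share of selected variables that are not informative, and the outlier false positive rate is the share of correctly labeled samples that get flagged. The function names and toy inputs are hypothetical and are not taken from the study.

```python
def selection_metrics(true_vars, selected_vars):
    """Return (positive selection rate, false discovery rate).

    Assumed definitions (not stated in the abstract):
      PSR = selected informative variables / all informative variables
      FDR = selected non-informative variables / all selected variables
    """
    true_set, selected = set(true_vars), set(selected_vars)
    hits = len(true_set & selected)
    psr = hits / len(true_set) if true_set else 0.0
    fdr = (len(selected) - hits) / len(selected) if selected else 0.0
    return psr, fdr


def outlier_metrics(true_outliers, flagged, n_samples):
    """Return (sensitivity, false positive rate) for flagged samples.

    Assumed definitions:
      sensitivity = flagged true outliers / all true outliers
      FPR         = flagged clean samples / all clean samples
    """
    true_set, flagged_set = set(true_outliers), set(flagged)
    hits = len(true_set & flagged_set)
    sensitivity = hits / len(true_set) if true_set else 0.0
    n_clean = n_samples - len(true_set)
    fpr = (len(flagged_set) - hits) / n_clean if n_clean else 0.0
    return sensitivity, fpr


# Toy example: 4 informative variables, 4 selected, 3 of them correct.
psr, fdr = selection_metrics(["g1", "g2", "g3", "g4"], ["g1", "g2", "g3", "g9"])

# Toy example: 22 samples, samples 0 and 1 mislabeled, samples 0 and 5 flagged.
sens, fpr = outlier_metrics([0, 1], [0, 5], n_samples=22)

print(psr, fdr, sens, fpr)
```

In a simulation the truly informative variables and the mislabeled samples are known by construction, so these measures can be computed directly. On real RNA-seq data they are generally unknown, which is one reason comparisons of this kind lean on simulated datasets.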

SUBMITTER: Sun H

PROVIDER: S-EPMC7646480 | biostudies-literature | 2020 Aug

REPOSITORIES: biostudies-literature

Publications

Comparison of methods for the detection of outliers and associated biomarkers in mislabeled omics data.

Hongwei Sun, Yuehua Cui, Hui Wang, Haixia Liu, Tong Wang

BMC Bioinformatics, 2020-08-14, issue 1

PMID: 32795265

Similar Datasets

Evaluation and comparison of multi-omics data integration methods for cancer subtyping. | S-EPMC8384175 | biostudies-literature

Evaluation and Comparison of Multi-Omics Data Integration Methods for Subtyping of Cutaneous Melanoma | S-EPMC9775581 | biostudies-literature

Explore Biomarkers Associated With Prognosis of Recurrent and Metastatic CRC After Surgery by Multi-omics Methods | 2745213 | ecrin-mdr-crc

reGenotyper: Detecting mislabeled samples in genetic data. | S-EPMC5305221 | biostudies-literature

Performance Comparison of Deep Learning Autoencoders for Cancer Subtype Detection Using Multi-Omics Data. | S-EPMC8122584 | biostudies-literature

Comparison of methods for blood pathogens detection | PRJEB42541 | ENA

Detection of patient subgroups with differential expression in omics data: a comprehensive comparison of univariate measures. | S-EPMC3838370 | biostudies-literature

Unsupervised Multi-Omics Data Integration Methods: A Comprehensive Review. | S-EPMC8981526 | biostudies-literature

Multi-Omics Data Analysis Identifies Prognostic Biomarkers across Cancers. | S-EPMC10366886 | biostudies-literature

A Unified Approach for Outliers and Influential Data Detection - The Value of Information in Retrospect. | S-EPMC10617639 | biostudies-literature