
Dataset Information


Systematic auditing is essential to debiasing machine learning in biology.


ABSTRACT: Biases in data used to train machine learning (ML) models can inflate their prediction performance and confound our understanding of how and what they learn. Although biases are common in biological data, systematic auditing of ML models to identify and eliminate these biases is not a common practice when applying ML in the life sciences. Here we devise a systematic, principled, and general approach to audit ML models in the life sciences. We use this auditing framework to examine biases in three ML applications of therapeutic interest and identify unrecognized biases that hinder the ML process and result in substantially reduced model performance on new datasets. Ultimately, we show that ML models tend to learn primarily from data biases when there is insufficient signal in the data to learn from. We provide detailed protocols, guidelines, and examples of code to enable tailoring of the auditing framework to other biomedical applications.

SUBMITTER: Eid FE 

PROVIDER: S-EPMC7876113 | biostudies-literature | 2021 Feb

REPOSITORIES: biostudies-literature


Publications

Systematic auditing is essential to debiasing machine learning in biology.

Fatma-Elzahraa Eid, Haitham A. Elmarakeby, Yujia Alina Chan, Nadine Fornelos, Mahmoud ElHefnawi, Eliezer M. Van Allen, Lenwood S. Heath, Kasper Lage

Communications Biology, 2021-02-10, issue 1



Similar Datasets

| S-EPMC8562935 | biostudies-literature
| S-EPMC7519645 | biostudies-literature
| S-EPMC9112473 | biostudies-literature
| S-EPMC6307915 | biostudies-literature
| S-EPMC8634067 | biostudies-literature
| S-EPMC9259560 | biostudies-literature
2013-01-01 | E-GEOD-29210 | biostudies-arrayexpress
| S-EPMC5870730 | biostudies-literature
| S-EPMC8148342 | biostudies-literature
| S-EPMC8648918 | biostudies-literature