Unknown

Dataset Information

0

Multiple similarly effective solutions exist for biomedical feature selection and classification problems.


ABSTRACT: Binary classification is a widely employed problem to facilitate the decisions on various biomedical big data questions, such as clinical drug trials between treated participants and controls, and genome-wide association studies (GWASs) between participants with or without a phenotype. A machine learning model is trained for this purpose by optimizing the power of discriminating samples from two groups. However, most of the classification algorithms tend to generate one locally optimal solution according to the input dataset and the mathematical presumptions of the dataset. Here we demonstrated from the aspects of both disease classification and feature selection that multiple different solutions may have similar classification performances. So the existing machine learning algorithms may have ignored a horde of fishes by catching only a good one. Since most of the existing machine learning algorithms generate a solution by optimizing a mathematical goal, it may be essential for understanding the biological mechanisms for the investigated classification question, by considering both the generated solution and the ignored ones.

SUBMITTER: Liu J 

PROVIDER: S-EPMC5634418 | biostudies-literature | 2017 Oct

REPOSITORIES: biostudies-literature

altmetric image

Publications

Multiple similarly effective solutions exist for biomedical feature selection and classification problems.

Liu Jiamei J   Xu Cheng C   Yang Weifeng W   Shu Yayun Y   Zheng Weiwei W   Zhou Fengfeng F  

Scientific reports 20171009 1


Binary classification is a widely employed problem to facilitate the decisions on various biomedical big data questions, such as clinical drug trials between treated participants and controls, and genome-wide association studies (GWASs) between participants with or without a phenotype. A machine learning model is trained for this purpose by optimizing the power of discriminating samples from two groups. However, most of the classification algorithms tend to generate one locally optimal solution  ...[more]

Similar Datasets

| S-EPMC6235481 | biostudies-literature
| S-EPMC5158321 | biostudies-literature
| S-EPMC4102475 | biostudies-literature
| S-EPMC6101392 | biostudies-literature
| S-EPMC2789385 | biostudies-literature
| S-EPMC1854845 | biostudies-other
| S-EPMC3206593 | biostudies-literature
| S-EPMC115205 | biostudies-literature
| S-EPMC5050509 | biostudies-literature
| S-EPMC4127204 | biostudies-other