Dataset Information

Systematic Comparison of the Influence of Different Data Preprocessing Methods on the Performance of Gait Classifications Using Machine Learning.

ABSTRACT: Human movements are characterized by highly non-linear and multi-dimensional interactions within the motor system. Therefore, the future of human movement analysis requires procedures that enhance the classification of movement patterns into relevant groups and support practitioners in their decisions. In this regard, the use of data-driven techniques seems to be particularly suitable to generate classification models. Recently, an increasing emphasis on machine-learning applications has led to a significant contribution, e.g., in increasing the classification performance. In order to ensure the generalizability of the machine-learning models, different data preprocessing steps are usually carried out to process the measured raw data before the classifications. In the past, various methods have been used for each of these preprocessing steps. However, there are hardly any standard procedures or rather systematic comparisons of these different methods and their impact on the classification performance. Therefore, the aim of this analysis is to compare different combinations of commonly applied data preprocessing steps and test their effects on the classification performance of gait patterns. A publicly available dataset on intra-individual changes of gait patterns was used for this analysis. Forty-two healthy participants performed 6 sessions of 15 gait trials for 1 day. For each trial, two force plates recorded the three-dimensional ground reaction forces (GRFs). The data was preprocessed with the following steps: GRF filtering, time derivative, time normalization, data reduction, weight normalization and data scaling. Subsequently, combinations of all methods from each preprocessing step were analyzed by comparing their prediction performance in a six-session classification using Support Vector Machines, Random Forest Classifiers, Multi-Layer Perceptrons, and Convolutional Neural Networks. 
The results indicate that filtering GRF data and a supervised data reduction (e.g., using Principal Components Analysis) lead to increased prediction performance of the machine-learning classifiers. Interestingly, the weight normalization and the number of data points (above a certain minimum) in the time normalization do not have a substantial effect. In conclusion, the present results provide first domain-specific recommendations for commonly applied data preprocessing methods and might help to build more comparable and more robust classification models based on machine learning that are suitable for practical application.
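To make the preprocessing steps named in the abstract concrete, the following is a minimal sketch of four of them (time normalization, weight normalization, time derivative, and data scaling) applied to a synthetic force curve. It is not the authors' implementation: the function names, the 101-point target length, the 700 N body weight, the half-sine test signal, and the per-signal min-max scaling are all assumptions chosen for illustration. GRF filtering and data reduction are left out because they would normally rely on a signal-processing or machine-learning library.

```python
import math


def time_normalize(signal, n_points=101):
    """Linearly resample a 1-D signal to n_points equally spaced samples."""
    if len(signal) < 2 or n_points < 2:
        raise ValueError("need at least two samples and two output points")
    last = len(signal) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)  # fractional index into the input
        lo = min(int(pos), last - 1)     # left neighbour, clamped at the end
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[lo + 1] * frac)
    return out


def weight_normalize(signal, body_weight_n):
    """Express forces as multiples of body weight (given in newtons)."""
    return [v / body_weight_n for v in signal]


def time_derivative(signal):
    """Forward difference between consecutive samples."""
    return [b - a for a, b in zip(signal, signal[1:])]


def min_max_scale(signal, lo=0.0, hi=1.0):
    """Rescale one signal to the range [lo, hi] (assumed scaling scheme)."""
    s_min, s_max = min(signal), max(signal)
    span = s_max - s_min
    if span == 0:
        return [lo for _ in signal]
    return [lo + (v - s_min) * (hi - lo) / span for v in signal]


# Synthetic stand-in for one vertical GRF stance phase: a half sine, 250 samples.
raw = [700.0 * math.sin(math.pi * k / 249) for k in range(250)]

resampled = time_normalize(raw, n_points=101)               # fixed-length curve
normalized = weight_normalize(resampled, body_weight_n=700.0)
scaled = min_max_scale(normalized)                          # values in [0, 1]
velocity = time_derivative(resampled)                       # one sample shorter
```

In a classification setting, scaling parameters would usually be estimated on the training data only and then applied to the test data; the per-signal scaling above is a simplification.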

SUBMITTER: Burdack J

PROVIDER: S-EPMC7174559 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature


Publications

Systematic Comparison of the Influence of Different Data Preprocessing Methods on the Performance of Gait Classifications Using Machine Learning.

Burdack J, Horst F, Giesselbach S, Hassan I, Daffner S, Schöllhorn WI

Frontiers in Bioengineering and Biotechnology, 2020-04-15


PMID: 32351945

Similar Datasets

Project description: Background: Sepsis is one of the most life-threatening circumstances for critically ill patients in the United States, while diagnosis of sepsis is challenging because standardized criteria for sepsis identification are still under development. Disparities in social determinants of sepsis patients can interfere with the risk prediction performances using machine learning. Methods: We analyzed a cohort of critical care patients from the Medical Information Mart for Intensive Care (MIMIC)-III database. Disparities in social determinants, including race, sex, marital status, insurance types and languages, among patients identified by six available sepsis criteria were revealed by forest plots with 95% confidence intervals. Sepsis patients were then identified by the Sepsis-3 criteria. Sixteen machine learning classifiers were trained to predict in-hospital mortality for sepsis patients on a training set constructed by random selection. The performance was measured by area under the receiver operating characteristic curve (AUC). The performance of the trained model was tested on the entire randomly conducted test set and each sub-population built based on each of the following social determinants: race, sex, marital status, insurance type, and language. The fluctuations in performances were further examined by permutation tests. Results: We analyzed a total of 11,791 critical care patients from the MIMIC-III database. Within the population identified by each sepsis identification method, significant differences were observed among sub-populations regarding race, marital status, insurance type, and language. On the 5783 sepsis patients identified by the Sepsis-3 criteria, statistically significant performance decreases for mortality prediction were observed when applying the trained machine learning model on Asian and Hispanic patients, as well as the Spanish-speaking patients.
With pairwise comparison, we detected performance discrepancies in mortality prediction between Asian and White patients, Asians and patients of other races, as well as English-speaking and Spanish-speaking patients. Conclusions: Disparities in proportions of patients identified by various sepsis criteria were detected among the different social determinant groups. The performances of mortality prediction for sepsis patients can be compromised when applying a universally trained model for each subpopulation. To achieve accurate diagnosis, a versatile diagnostic system for sepsis is needed to overcome the social determinant disparities of patients.

Project description: Gait speed is a measure of general fitness. Changing from usual (UGS) to maximum (MGS) gait speed requires coordinated action of many body systems. Gait speed reserve (GSR) is defined as MGS - UGS. From a shortlist of 88 features across five categories including sociodemographic, cognitive, and physiological, we aimed to find and compare the sets of predictors that best describe UGS, MGS, and GSR. For this, we leveraged data from 3,925 adults aged 50+ from Wave 3 of The Irish Longitudinal Study on Ageing (TILDA). Features were selected by a histogram gradient boosting regression-based stepwise feature selection pipeline. Each model's feature importance and input-output relationships were explored using TreeExplainer from the SHapley Additive exPlanations (SHAP) explainable machine learning package. The mean adjusted R² (SD) from fivefold cross-validation on training data and the adjusted R² score on test data were 0.38 (0.04) and 0.41 for UGS, 0.45 (0.04) and 0.46 for MGS, and 0.19 (0.02) and 0.21 for GSR. Each model selected features across all categories. Features common to all models were age, grip strength, chair stands time, mean motor reaction time, and height. Exclusive to UGS and MGS were educational attainment, fear of falling, Montreal cognitive assessment errors, and orthostatic intolerance. Exclusive to MGS and GSR were body mass index (BMI), and number of medications. No features were selected exclusively for UGS and GSR. Features unique to UGS were resting-state pulse interval, Center for Epidemiologic Studies Depression Scale (CESD) depression, sit-to-stand difference in diastolic blood pressure, and left visual acuity. Unique to MGS were standard deviation in sustained attention to response task times, resting-state heart rate, smoking status, total heartbeat power during paced breathing, and visual acuity.
Unique to GSR were accuracy proportion in a sound-induced flash illusion test, Mini-mental State Examination errors, and number of cardiovascular conditions. No interactions were present in the GSR model. The four features that overall gave the most impactful interactions in the UGS and MGS models were age, chair stands time, grip strength, and BMI. These findings may help provide new insights into the multisystem predictors of gait speed and gait speed reserve in older adults and support a network physiology approach to their study.
