Dataset Information

Can Deep Learning Recognize Subtle Human Activities?

ABSTRACT: Deep Learning has driven recent and exciting progress in computer vision, instilling the belief that these algorithms could solve any visual task. Yet, datasets commonly used to train and test computer vision algorithms have pervasive confounding factors. Such biases make it difficult to truly estimate the performance of those algorithms and how well computer vision models can extrapolate outside the distribution in which they were trained. In this work, we propose a new action classification challenge that is performed well by humans, but poorly by state-of-the-art Deep Learning models. As a proof-of-principle, we consider three exemplary tasks: drinking, reading, and sitting. The best accuracies reached using state-of-the-art computer vision models were 61.7%, 62.8%, and 76.8%, respectively, while human participants scored above 90% accuracy on the three tasks. We propose a rigorous method to reduce confounds when creating datasets, and when comparing human versus computer vision performance. Source code and datasets are publicly available.

SUBMITTER: Jacquot V

PROVIDER: S-EPMC8291217 | biostudies-literature | 2020 Jun

REPOSITORIES: biostudies-literature

Publications

Can Deep Learning Recognize Subtle Human Activities?

Vincent Jacquot, Zhuofan Ying, Gabriel Kreiman

Conference on Computer Vision and Pattern Recognition Workshops. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Workshops, 2020-06-01

PMID: 34290902

Similar Datasets

Project description: Background: The proportion of overweight and obese people has increased tremendously in a short period, culminating in a worldwide trend of obesity that is reaching epidemic proportions. Overweight and obesity are serious issues, especially in children, because obese children have twice the risk of becoming obese adults compared with non-obese children. Many methods for maintaining a caloric balance exist; however, these methods are not applicable to children. In this study, a new approach for helping children monitor their activities using a convolutional neural network (CNN) is proposed, which is applicable to real-time scenarios requiring high accuracy.

Methods: A total of 136 participants (86 boys and 50 girls), aged between 8.5 and 12.5 years (mean 10.5, standard deviation 1.1), took part in this study. The participants performed various movements while wearing custom-made three-axis accelerometer modules around their waists. The data acquired by the accelerometer module were preprocessed by dividing them into small sets (128 sample points for 2.8 s). Approximately 183,600 data samples were used by the developed CNN for learning to classify ten physical activities: slow walking, fast walking, slow running, fast running, walking up the stairs, walking down the stairs, jumping rope, standing up, sitting down, and remaining still.

Results: The developed CNN classified the ten activities with an overall accuracy of 81.2%. When similar activities were merged, leading to seven merged activities, the CNN classified activities with an overall accuracy of 91.1%. Activity merging also improved performance indicators, for the maximum case of 66.4% in recall, 48.5% in precision, and 57.4% in F1 score. The developed CNN classifier was compared to conventional machine learning algorithms such as the support vector machine (SVM), decision tree (DT), and k-nearest neighbor (kNN) algorithms, and the proposed CNN classifier performed the best: CNN (81.2%) > SVM (64.8%) > DT (63.9%) > kNN (55.4%) for ten activities; CNN (91.1%) > SVM (74.4%) > DT (73.2%) > kNN (65.3%) for the merged seven activities.

Discussion: The developed algorithm distinguished physical activities with improved time resolution using short-time acceleration signals from the physical activities performed by children. This study involved algorithm development, participant recruitment, IRB approval, custom design of a data acquisition module, and data collection. The self-selected moving speeds for walking and running (slow and fast) and the structure of the staircase degraded the performance of the algorithm. However, after similar activities were merged, the effects caused by the self-selection of speed were reduced. The experimental results show that the proposed algorithm performed better than conventional algorithms. Owing to its simplicity, the proposed algorithm could be applied to real-time applications.
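The windowing step described in the Methods of the project description above (128 sample points per 2.8 s segment) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_into_windows`, the choice of non-overlapping windows, and the dropping of a trailing partial window are all assumptions, since the abstract does not specify them. The stated figures imply a sampling rate of roughly 46 Hz (128 / 2.8).

```python
from typing import List, Sequence, Tuple

# One three-axis accelerometer reading (x, y, z).
Sample = Tuple[float, float, float]

WINDOW_SIZE = 128        # sample points per window, as stated in the abstract
WINDOW_SECONDS = 2.8     # window duration, as stated in the abstract

# Sampling rate implied by the two figures above (about 45.7 Hz).
IMPLIED_RATE_HZ = WINDOW_SIZE / WINDOW_SECONDS


def split_into_windows(
    samples: Sequence[Sample], window_size: int = WINDOW_SIZE
) -> List[List[Sample]]:
    """Split a recording into consecutive fixed-length windows.

    Assumptions (not stated in the source): windows do not overlap,
    and a trailing partial window is discarded.
    """
    n_windows = len(samples) // window_size
    return [
        list(samples[i * window_size:(i + 1) * window_size])
        for i in range(n_windows)
    ]


# Usage: a hypothetical recording of 300 readings yields two full windows.
recording = [(0.0, 0.0, 1.0)] * 300
windows = split_into_windows(recording)
```

Each resulting window would then be one input example for a classifier such as the CNN described above.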

Project description: Reliable estimation of desired motion trajectories plays a crucial part in the continuous control of lower extremity assistance devices such as prostheses and orthoses. Moreover, reliable estimation methods are also required to predict hard-to-measure biomechanical quantities (e.g., joint contact moment/force) for use in sports injury science. Recognising that human locomotion is an inherently time-sequential and limb-synergetic behaviour, this study investigates models and learning algorithms for predicting the motion of a subject's leg from the motion of complementary limbs. The novel deep learning model architectures proposed are based on the Long Short-Term Memory (LSTM) approach with the addition of an attention mechanism. A dataset comprising Inertial Measurement Unit signals from 21 subjects traversing varied terrains was used, including stair ascent/descent, ramp ascent/descent, stopped, level-ground walking and the transitions between these conditions. Fourier analysis is deployed to evaluate the model robustness, in addition to assessing time-based prediction errors. The experiment on three unseen test participants suggests that the branched neural network structure is preferred for tackling the multi-output problem, and that the inclusion of an attention mechanism improves performance in terms of accuracy, robustness and network size. An experimental comparison found that 57% of the model parameters were not needed after adding attention layers, while the prediction error was lower than that of the LSTM model without an attention mechanism. The attention model has errors of 9.06% and 7.64% (normalised root mean square error) for ankle and hip acceleration prediction, respectively. Also, less high-frequency noise is present in the attention model predictions. We conclude that the internal structure of the proposed deep learning model is justified, principally by the benefit of using an attention mechanism.
Experimental results for biomechanical motion estimation are obtained, showing greater accuracy than with the LSTM alone. The trained attention model can be used throughout, despite transitions between terrain types. Such a model will be useful in, for example, the control of lower-limb prostheses, removing the need to identify and switch between different trajectory generators for different walking modes.
