

Sensing and Vision Technologies for Human Activity Recognition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (24 December 2023) | Viewed by 21639

Special Issue Editors

German Research Center for Artificial Intelligence (DFKI GmbH), Kaiserslautern, Germany
Interests: multimodal sensing; human activity recognition; deep learning
German Research Center for Artificial Intelligence (DFKI GmbH), Kaiserslautern, Germany
Interests: human activity recognition; multimodal feature representation; machine learning-based industrial applications

Special Issue Information

Dear Colleagues,

Human activity recognition (HAR) leverages machine learning on data from ubiquitous sensing or computer vision to understand the activity context and predict the intention of humans. In the past few decades, HAR has helped to seamlessly integrate technology into daily life by providing computing services appropriate to the situational context. Recent developments have seen a convergence of heterogeneous modalities, especially the combination of complementary modalities such as sensing, imaging, and vision technologies, paving the way towards holistic HAR approaches.

This Special Issue aims to highlight state-of-the-art research in HAR, especially trans-domain methodologies combining different sensing and vision technologies. The topics in focus include, but are not limited to:

  • HAR systems and studies based on sensing or vision technologies;
  • Sensor and vision fusion methods;
  • Wearable and pervasive sensing;
  • Common representations shared among heterogeneous modalities;
  • Cross-modality deep learning methods;
  • Multi-modal simulations;
  • Imaging techniques for sensing (e.g., tomography) in HAR;
  • Reviews and studies regarding ethical aspects of sensing and vision in HAR;
  • Sensor or vision data generation methods.

Dr. Bo Zhou
Dr. Sungho Suh
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (8 papers)


Research


17 pages, 30293 KiB  
Article
HDPose: Post-Hierarchical Diffusion with Conditioning for 3D Human Pose Estimation
by Donghoon Lee and Jaeho Kim
Sensors 2024, 24(3), 829; https://doi.org/10.3390/s24030829 - 26 Jan 2024
Viewed by 698
Abstract
Recently, monocular 3D human pose estimation (HPE) methods have been used to predict 3D poses accurately by solving the ill-posed problem caused by 3D-to-2D projection. However, monocular 3D HPE still remains challenging owing to inherent depth ambiguity and occlusions. To address this issue, previous studies have proposed diffusion model-based approaches (DDPM) that learn to reconstruct a correct 3D pose from a noisy initial 3D pose. In addition, these approaches use 2D keypoints or context encoders that encode spatial and temporal information to condition the model. However, they often fall short of peak performance or require an extended period to converge to the target pose. In this paper, we propose HDPose, which converges rapidly and predicts 3D poses accurately. Our approach aggregates spatial and temporal information from the condition into the denoising model in a hierarchical structure. We observed that the post-hierarchical structure achieved the best performance among the various condition structures considered. We evaluated our model on the widely used Human3.6M and MPI-INF-3DHP datasets. The proposed model demonstrated competitive performance with state-of-the-art models, achieving high accuracy with faster convergence while being considerably more lightweight.
(This article belongs to the Special Issue Sensing and Vision Technologies for Human Activity Recognition)
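The abstract describes conditioning a diffusion-style denoiser on 2D keypoints so that a noisy 3D pose is iteratively refined toward the correct one. As a rough, hedged illustration of that general idea only (not the HDPose architecture itself), the following PyTorch sketch runs a DDPM-style reverse loop in which a small, hypothetical MLP denoiser predicts the noise in a flattened 3D pose given flattened 2D keypoints; the joint count, step count, and layer sizes are all assumptions.

```python
# Minimal, hypothetical sketch of conditional diffusion-based 3D pose refinement.
# This is NOT the HDPose model; it only illustrates a DDPM-style reverse loop
# in which the denoiser is conditioned on 2D keypoints.
import torch
import torch.nn as nn

J = 17                      # number of body joints (assumed)
T = 50                      # diffusion steps (assumed)

class PoseDenoiser(nn.Module):
    """Predicts the noise added to a flattened 3D pose, given 2D keypoints."""
    def __init__(self, joints=J, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joints * 3 + joints * 2 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, joints * 3),
        )

    def forward(self, noisy_pose3d, kp2d, t):
        x = torch.cat([noisy_pose3d, kp2d, t.float().unsqueeze(-1) / T], dim=-1)
        return self.net(x)

# Linear beta schedule and derived quantities.
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def refine_pose(model, kp2d):
    """Reverse diffusion loop: start from Gaussian noise, denoise step by step."""
    x = torch.randn(kp2d.shape[0], J * 3)          # initial noisy 3D pose
    for t in reversed(range(T)):
        t_batch = torch.full((kp2d.shape[0],), t)
        eps = model(x, kp2d, t_batch)              # predicted noise, conditioned on 2D keypoints
        coef = (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:                                  # add noise except at the final step
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x.view(-1, J, 3)

model = PoseDenoiser()
pose3d = refine_pose(model, torch.randn(2, J * 2))  # untrained demo call
print(pose3d.shape)  # torch.Size([2, 17, 3])
```

In a trained system the denoiser would be learned from pose data; here it is left untrained purely to show the control flow of the conditional reverse process.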

22 pages, 9579 KiB  
Article
Dynamic Japanese Sign Language Recognition Throw Hand Pose Estimation Using Effective Feature Extraction and Classification Approach
by Manato Kakizaki, Abu Saleh Musa Miah, Koki Hirooka and Jungpil Shin
Sensors 2024, 24(3), 826; https://doi.org/10.3390/s24030826 - 26 Jan 2024
Viewed by 1329
Abstract
Japanese Sign Language (JSL) is vital for communication in Japan’s deaf and hard-of-hearing community. However, probably because of the large number of patterns (46 types) and the mixture of static and dynamic gestures, the dynamic ones have been excluded in most studies. Few researchers have worked on a dynamic JSL alphabet, and the reported accuracy remains unsatisfactory. To overcome these challenges, we propose a dynamic JSL recognition system based on effective feature extraction and feature selection. The procedure combines hand pose estimation, effective feature extraction, and machine learning techniques. We collected a video dataset capturing JSL gestures with standard RGB cameras and employed MediaPipe for hand pose estimation. Four types of features were proposed; their significance is that the same feature generation method can be used regardless of the number of frames or whether the gestures are dynamic or static. We employed a Random Forest (RF)-based feature selection approach to select the most informative features. Finally, we fed the reduced features into a kernel-based Support Vector Machine (SVM) classifier. Evaluations conducted on our newly created dynamic Japanese Sign Language alphabet dataset and on the LSA64 dynamic dataset yielded recognition accuracies of 97.20% and 98.40%, respectively. This approach not only addresses the complexities of JSL but also holds the potential to bridge communication gaps, offering effective communication for the deaf and hard-of-hearing, with broader implications for sign language recognition systems globally.
(This article belongs to the Special Issue Sensing and Vision Technologies for Human Activity Recognition)
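The recognition pipeline summarized above (MediaPipe hand landmarks → feature extraction → Random-Forest-based feature selection → kernel SVM) can be illustrated at a high level with scikit-learn. This is a hedged sketch with synthetic stand-in features, not the authors' implementation; it assumes the per-gesture feature vectors have already been computed from the landmarks.

```python
# Hypothetical sketch of the landmark -> feature selection -> SVM pipeline.
# Assumes per-sample feature vectors have already been derived from MediaPipe
# hand landmarks (e.g., joint distances/angles aggregated over a gesture's frames).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 120))          # stand-in for extracted gesture features
y = rng.integers(0, 46, size=500)        # 46 JSL alphabet classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Random-Forest importance scores pick a reduced feature subset,
# which then feeds a kernel (RBF) Support Vector Machine classifier.
pipeline = make_pipeline(
    SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0)),
    SVC(kernel="rbf", C=10.0, gamma="scale"),
)
pipeline.fit(X_tr, y_tr)
print("held-out accuracy:", pipeline.score(X_te, y_te))
```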

27 pages, 9818 KiB  
Article
Robust Feature Representation Using Multi-Task Learning for Human Activity Recognition
by Behrooz Azadi, Michael Haslgrübler, Bernhard Anzengruber-Tanase, Georgios Sopidis and Alois Ferscha
Sensors 2024, 24(2), 681; https://doi.org/10.3390/s24020681 - 21 Jan 2024
Viewed by 973
Abstract
Learning underlying patterns from sensory data is crucial in the Human Activity Recognition (HAR) task to avoid poor generalization when coping with unseen data. A key solution to this issue is representation learning, which becomes essential when input signals contain activities with similar patterns or when the patterns generated by different subjects for the same activity vary. To address these issues, we seek to increase generalization by learning the underlying factors of each sensor signal. We develop a novel multi-channel asymmetric auto-encoder that reconstructs input signals precisely and extracts indicative unsupervised features. Further, we investigate the role of various activation functions in signal reconstruction to ensure that the model preserves the patterns of each activity in the output. Our main contribution is a multi-task learning model that enhances representation learning through layers shared between signal reconstruction and the HAR task, improving the model's robustness to users not included in the training phase. The proposed model learns shared features between the different tasks that are indeed the underlying factors of each input signal. We validate our multi-task learning model on several publicly available HAR datasets, UCI-HAR, MHealth, PAMAP2, and USC-HAD, and on an in-house alpine skiing dataset collected in the wild, where our model achieved 99%, 99%, 95%, 88%, and 92% accuracy, respectively. Our proposed method shows consistent performance and good generalization across all the datasets compared to the state of the art.
(This article belongs to the Special Issue Sensing and Vision Technologies for Human Activity Recognition)
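As a rough illustration of the multi-task idea described above, a shared encoder can feed both a signal-reconstruction decoder and an activity-classification head, so the shared layers must capture factors useful to both tasks. The following PyTorch sketch is a hedged, minimal version with assumed layer sizes and loss weighting, not the paper's asymmetric auto-encoder.

```python
# Hypothetical multi-task sketch: one shared encoder feeds both a signal-reconstruction
# decoder and an activity-classification head, so the encoder must learn features
# useful for both tasks. Sizes and loss weights are illustrative assumptions.
import torch
import torch.nn as nn

class MultiTaskHAR(nn.Module):
    def __init__(self, in_dim=128 * 6, latent=64, n_classes=6):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))          # reconstruction task
        self.classifier = nn.Linear(latent, n_classes)                # HAR task

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)

model = MultiTaskHAR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 128 * 6)              # 32 windows of 128 samples x 6 IMU channels, flattened
y = torch.randint(0, 6, (32,))            # activity labels

opt.zero_grad()
recon, logits = model(x)
loss = nn.functional.mse_loss(recon, x) + 0.5 * nn.functional.cross_entropy(logits, y)
loss.backward()
opt.step()
print(float(loss))
```

Sharing the encoder between the two losses is what ties representation learning to the HAR objective in this sketch; the reconstruction term discourages the encoder from discarding signal structure that distinguishes unseen users.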

20 pages, 9462 KiB  
Article
Climbing Technique Evaluation by Means of Skeleton Video Stream Analysis
by Raul Beltrán Beltrán, Julia Richter, Guido Köstermeyer and Ulrich Heinkel
Sensors 2023, 23(19), 8216; https://doi.org/10.3390/s23198216 - 01 Oct 2023
Viewed by 1331
Abstract
Due to the growing interest in climbing, increasing importance has been given to research in the field of non-invasive, camera-based motion analysis. While existing work uses invasive technologies such as wearables or modified walls and holds, or focuses on competitive sports, we present, for the first time, a system that uses video analysis to automatically recognize six movement errors that are typical for novices with limited climbing experience. Climbing a complete route consists of three repetitive climbing phases. Therefore, a characteristic joint arrangement may be detected as an error in a specific climbing phase, while the exact same arrangement may not be considered an error in another phase. That is why we introduce a finite state machine to determine the current phase and to check for the errors that commonly occur in that phase. The transition between phases depends on which joints are being used. To capture joint movements, we use a fourth-generation iPad Pro with LiDAR to record climbing sequences, converting the climber’s 2D skeleton provided by Apple’s Vision framework into 3D joints using the LiDAR depth information. We then introduce a method that determines whether a joint is moving or not, which in turn determines the current phase. Finally, the 3D joints are analyzed with respect to defined characteristic joint arrangements to identify possible motion errors. To present feedback to the climber, we imitate a virtual mentor with an iPad application that creates an analysis immediately after the climber has finished the route, pointing out the detected errors and giving suggestions for improvement. Quantitative tests with three experienced climbers, who were able to climb reference routes both without errors and with intentional errors, resulted in precision–recall curves evaluating the error detection performance. The results demonstrate that, while the number of false positives is still in an acceptable range, the number of detected errors is sufficient to provide climbing novices with adequate suggestions for improvement. Moreover, our study reveals limitations that mainly originate from incorrect joint localizations caused by the LiDAR sensor range. With human pose estimation becoming increasingly reliable and with advances in sensor capabilities, these limitations will have a decreasing impact on our system’s performance.
(This article belongs to the Special Issue Sensing and Vision Technologies for Human Activity Recognition)
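The phase logic described above (a finite state machine that advances between repeating climbing phases depending on which joints are moving, and only evaluates the error checks relevant to the current phase) might look roughly like the following sketch. The phase names, motion thresholds, and the single example error check are hypothetical placeholders, not the paper's actual rules.

```python
# Hypothetical sketch of a phase state machine for climbing analysis: the current
# phase advances based on which joints are moving, and only the error checks
# registered for that phase are applied. Phase names and thresholds are assumptions.
import numpy as np

def joint_moving(positions, joint, window=5, thresh=0.01):
    """Crude motion test: mean frame-to-frame displacement of a joint over a window."""
    recent = positions[joint][-window:]
    return np.mean(np.linalg.norm(np.diff(recent, axis=0), axis=1)) > thresh

def next_phase(phase, positions):
    # Illustrative transition rules keyed on which joints are moving.
    if phase == "reach" and not joint_moving(positions, "wrist_r"):
        return "pull"
    if phase == "pull" and joint_moving(positions, "hip"):
        return "stabilize"
    if phase == "stabilize" and joint_moving(positions, "wrist_r"):
        return "reach"
    return phase

# Error checks are only evaluated in the phase where the arrangement counts as an error.
ERROR_CHECKS = {
    "reach": [],
    "pull": [lambda p: p["elbow_r"][-1][2] > p["shoulder_r"][-1][2]],  # e.g., raised elbow
    "stabilize": [],
}

def analyze(frames):
    """frames: list of dicts mapping joint name -> 3-D position for each video frame."""
    phase, errors = "reach", []
    positions = {j: [] for j in ["wrist_r", "elbow_r", "shoulder_r", "hip"]}
    for t, frame in enumerate(frames):
        for j in positions:
            positions[j].append(np.asarray(frame[j]))
        if len(positions["wrist_r"]) > 5:
            phase = next_phase(phase, positions)
            for check in ERROR_CHECKS[phase]:
                if check(positions):
                    errors.append((t, phase))
    return errors

# Example: ten synthetic frames with all joints static produce no errors.
frame = {j: [0.0, 0.0, 0.0] for j in ["wrist_r", "elbow_r", "shoulder_r", "hip"]}
print(analyze([frame] * 10))   # -> []
```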

21 pages, 8196 KiB  
Article
Fall Recognition Based on an IMU Wearable Device and Fall Verification through a Smart Speaker and the IoT
by Hsin-Chang Lin, Ming-Jen Chen, Chao-Hsiung Lee, Lu-Chih Kung and Jung-Tang Huang
Sensors 2023, 23(12), 5472; https://doi.org/10.3390/s23125472 - 09 Jun 2023
Cited by 2 | Viewed by 2009
Abstract
A fall is one of the most devastating events that aging people can experience. Fall-related physical injuries, hospital admissions, and even mortality among the elderly are all critical health issues. As the population continues to age worldwide, there is an imperative need to develop fall detection systems. We propose a system for the recognition and verification of falls based on a chest-worn wearable device, which can be used in elderly health institutions or home care. The wearable device uses the built-in three-axis accelerometer and gyroscope of a nine-axis inertial sensor to determine the user’s posture, such as standing, sitting, and lying down. The resultant force is obtained by calculation from the three-axis acceleration. Fusing the three-axis acceleration with the three-axis gyroscope yields a pitch angle through a gradient descent algorithm, and a height value is derived from a barometer. Combining the pitch angle with the height value determines the behavior state, including sitting down, standing up, walking, lying down, and falling. In our study, we can clearly determine the direction of a fall, and the acceleration changes during the fall indicate the force of the impact. Furthermore, with the IoT (Internet of Things) and smart speakers, we can verify whether the user has fallen by asking them through the smart speaker. In this study, posture determination is performed directly on the wearable device through a state machine. The ability to recognize and report a fall event in real time can help to shorten a caregiver’s response time. Family members or care providers can monitor the user’s current posture in real time via a mobile app or web page. All collected data support subsequent medical evaluation and further intervention.
(This article belongs to the Special Issue Sensing and Vision Technologies for Human Activity Recognition)
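To illustrate the kind of on-device posture logic described above (pitch angle, barometric height, and impact magnitude driving a small state machine), here is a hedged sketch. For brevity it estimates pitch from the accelerometer alone rather than with the gradient-descent fusion mentioned in the abstract, and all thresholds are invented placeholders.

```python
# Hypothetical sketch of threshold-based posture/fall logic on a chest-worn IMU.
# Pitch is estimated from the accelerometer alone here (the paper fuses the
# gyroscope via gradient descent); impact and height thresholds are assumptions.
import math

IMPACT_G = 2.5        # resultant acceleration threshold for an impact, in g (assumed)
LYING_PITCH_DEG = 60  # pitch beyond which the trunk is considered horizontal (assumed)
HEIGHT_DROP_M = 0.4   # barometric height drop suggesting the body went down (assumed)

def pitch_deg(ax, ay, az):
    """Trunk pitch from accelerometer only (valid when the device is quasi-static)."""
    return math.degrees(math.atan2(ax, math.sqrt(ay * ay + az * az)))

def classify(sample, prev_height):
    """Return ('fall' | 'lying' | 'upright', height) for one sample dict."""
    ax, ay, az = sample["ax"], sample["ay"], sample["az"]
    resultant = math.sqrt(ax * ax + ay * ay + az * az)
    pitch = pitch_deg(ax, ay, az)
    height = sample["baro_height"]
    dropped = (prev_height - height) > HEIGHT_DROP_M
    if resultant > IMPACT_G and abs(pitch) > LYING_PITCH_DEG and dropped:
        return "fall", height       # impact + horizontal trunk + height drop
    if abs(pitch) > LYING_PITCH_DEG:
        return "lying", height
    return "upright", height

state, h = "upright", 1.3
sample = {"ax": 0.1, "ay": 0.0, "az": 1.0, "baro_height": 1.3}
state, h = classify(sample, h)
print(state)   # 'upright' for this quasi-static sample
```

A deployed system would evaluate these rules over a sliding window and, as the abstract notes, hand off suspected falls to the smart speaker for verbal verification before alerting a caregiver.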

13 pages, 3898 KiB  
Article
Forward and Backward Walking: Multifactorial Characterization of Gait Parameters
by Lucia Donno, Cecilia Monoli, Carlo Albino Frigo and Manuela Galli
Sensors 2023, 23(10), 4671; https://doi.org/10.3390/s23104671 - 11 May 2023
Viewed by 2819
Abstract
Although extensive literature exists on forward and backward walking, a comprehensive assessment of gait parameters in a wide and homogeneous population is missing. Thus, the purpose of this study was to analyse the differences between the two gait typologies in a relatively large sample. Twenty-four healthy young adults participated in this study. By means of a marker-based optoelectronic system and force platforms, differences between forward and backward walking were outlined in terms of kinematics and kinetics. Statistically significant differences were observed in most of the spatio-temporal parameters, evidencing adaptation mechanisms in backward walking. Unlike the ankle joint, the hip and knee ranges of motion were significantly reduced when switching from forward to backward walking. In terms of kinetics, the hip and ankle moment patterns for forward and backward walking were approximately mirror images of each other. Moreover, joint powers appeared drastically reduced during reversed gait; specifically, valuable differences in produced and absorbed joint powers between forward and backward walking were pointed out. The outcomes of this study could represent useful reference data for future investigations evaluating the efficacy of backward walking as a rehabilitation tool for pathological subjects.
(This article belongs to the Special Issue Sensing and Vision Technologies for Human Activity Recognition)
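As a small, hedged example of the kind of kinematic comparison reported above, the snippet below computes the range of motion (ROM) of a joint-angle trace for a forward and a backward trial and reports the reduction; the sinusoidal angle traces are synthetic placeholders for real marker-based joint-angle data.

```python
# Hypothetical sketch: compare joint range of motion (ROM) between forward and
# backward walking trials. The angle traces here are synthetic placeholders for
# marker-based joint-angle time series.
import numpy as np

def range_of_motion(angles_deg):
    """ROM = max - min of a joint-angle trace over the gait cycles analysed."""
    angles_deg = np.asarray(angles_deg)
    return float(angles_deg.max() - angles_deg.min())

t = np.linspace(0, 4 * np.pi, 400)
knee_forward = 35 + 30 * np.sin(t)       # synthetic forward-walking knee flexion (deg)
knee_backward = 30 + 20 * np.sin(t)      # synthetic backward-walking knee flexion (deg)

rom_fw = range_of_motion(knee_forward)
rom_bw = range_of_motion(knee_backward)
print(f"knee ROM forward: {rom_fw:.1f} deg, backward: {rom_bw:.1f} deg, "
      f"reduction: {rom_fw - rom_bw:.1f} deg")
```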

17 pages, 1182 KiB  
Article
An Automatic Calibration Method for Kappa Angle Based on a Binocular Gaze Constraint
by Jiahui Liu, Jiannan Chi and Hang Sun
Sensors 2023, 23(8), 3929; https://doi.org/10.3390/s23083929 - 12 Apr 2023
Cited by 1 | Viewed by 1747
Abstract
Kappa-angle calibration is important in gaze tracking due to the special structure of the eyeball. In a 3D gaze-tracking system, after the optical axis of the eyeball is reconstructed, the kappa angle is needed to convert the optical axis into the real gaze direction. At present, most kappa-angle-calibration methods use explicit user calibration: before eye-gaze tracking, the user needs to look at some pre-defined calibration points on the screen, thereby providing corresponding optical and visual axes of the eyeball with which to calculate the kappa angle. Especially when multi-point user calibration is required, the calibration process is relatively complicated. In this paper, a method that can automatically calibrate the kappa angle during screen browsing is proposed. Based on the 3D corneal centers and optical axes of both eyes, an optimal objective function for the kappa angle is established according to the coplanarity constraint on the visual axes of the left and right eyes, and a differential evolution algorithm iterates over kappa angles within the theoretical angular constraint. The experiments show that the proposed method achieves a gaze accuracy of 1.3° in the horizontal plane and 1.34° in the vertical plane, both of which are within acceptable margins of gaze-estimation error. Eliminating explicit kappa-angle calibration is of great significance to the realization of instant-use gaze-tracking systems.
(This article belongs to the Special Issue Sensing and Vision Technologies for Human Activity Recognition)
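As a hedged illustration of the optimization described above, the sketch below uses SciPy's differential evolution to search for four kappa components (horizontal and vertical, for each eye) that minimize a coplanarity residual between the two candidate visual axes and the line joining the corneal centers. The rotation parameterization, angular bounds, and simulated samples are assumptions for demonstration, not the paper's exact formulation.

```python
# Hypothetical sketch of kappa-angle search via differential evolution: rotate each
# eye's optical axis by candidate kappa angles and penalize how far the two resulting
# visual axes are from being coplanar with the line joining the corneal centers.
# Geometry, bounds, and the toy data below are illustrative assumptions.
import numpy as np
from scipy.optimize import differential_evolution

def rotate(axis, h_deg, v_deg):
    """Apply small horizontal/vertical rotations (about y, then x) to a unit axis."""
    h, v = np.radians(h_deg), np.radians(v_deg)
    ry = np.array([[np.cos(h), 0, np.sin(h)], [0, 1, 0], [-np.sin(h), 0, np.cos(h)]])
    rx = np.array([[1, 0, 0], [0, np.cos(v), -np.sin(v)], [0, np.sin(v), np.cos(v)]])
    out = rx @ ry @ axis
    return out / np.linalg.norm(out)

def coplanarity_residual(kappa, samples):
    """Sum of |triple product| of left visual axis, right visual axis, and baseline."""
    hl, vl, hr, vr = kappa
    total = 0.0
    for c_l, o_l, c_r, o_r in samples:       # corneal centers and optical axes per frame
        v_left = rotate(o_l, hl, vl)
        v_right = rotate(o_r, hr, vr)
        baseline = (c_r - c_l) / np.linalg.norm(c_r - c_l)
        total += abs(np.dot(np.cross(v_left, v_right), baseline))
    return total

# Toy "recorded" data: a handful of frames while the user browses the screen.
rng = np.random.default_rng(1)
samples = []
for _ in range(20):
    c_l, c_r = np.array([-0.03, 0.0, 0.0]), np.array([0.03, 0.0, 0.0])
    target = np.array([rng.uniform(-0.2, 0.2), rng.uniform(-0.1, 0.1), 0.6])
    o_l = rotate((target - c_l) / np.linalg.norm(target - c_l), -2.0, 1.0)  # simulated kappa offset
    o_r = rotate((target - c_r) / np.linalg.norm(target - c_r), 2.0, 1.0)
    samples.append((c_l, o_l, c_r, o_r))

bounds = [(-6, 6)] * 4                        # assumed theoretical kappa range in degrees
result = differential_evolution(coplanarity_residual, bounds, args=(samples,),
                                seed=0, maxiter=200)
print("estimated kappa angles (deg):", result.x)
```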

Review


25 pages, 8814 KiB  
Review
Deep Learning-Based Anomaly Detection in Video Surveillance: A Survey
by Huu-Thanh Duong, Viet-Tuan Le and Vinh Truong Hoang
Sensors 2023, 23(11), 5024; https://doi.org/10.3390/s23115024 - 24 May 2023
Cited by 13 | Viewed by 9516
Abstract
Anomaly detection in video surveillance is a highly developed subject that is attracting increasing attention from the research community. There is great demand for intelligent systems with the capacity to automatically detect anomalous events in streaming videos. Because of this, a wide variety of approaches have been proposed to build effective models that ensure public security. There have been various surveys of anomaly detection, covering network anomaly detection, financial fraud detection, human behavioral analysis, and more. Deep learning has been successfully applied to many aspects of computer vision; in particular, the strong growth of generative models means that these are the main techniques used in the proposed methods. This paper aims to provide a comprehensive review of the deep learning-based techniques used in the field of video anomaly detection. Specifically, deep learning-based approaches are categorized into different methods by their objectives and learning metrics. Additionally, preprocessing and feature engineering techniques are discussed thoroughly for the vision-based domain. This paper also describes the benchmark datasets used for training and for detecting abnormal human behavior. Finally, the common challenges in video surveillance are discussed, offering possible solutions and directions for future research.
(This article belongs to the Special Issue Sensing and Vision Technologies for Human Activity Recognition)
