Human Activity Recognition Based on Image Sensors and Deep Learning

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (31 October 2021) | Viewed by 20819

Special Issue Editors


Prof. Dr. Fakhreddine Ababsa
Guest Editor
LISPEN EA 7515, Arts et Métiers, Institut Image, Chalon-sur-Saône, Burgundy, France
Interests: virtual and augmented reality; computer vision; image processing

Dr. Cyrille Migniot
Guest Editor
ImViA EA 7535, Dijon, Burgundy, France
Interests: computer vision; motion analysis; human monitoring

Special Issue Information

Dear Colleagues,

Video-based human activity recognition (HAR) has made considerable progress in recent years due to its applications in various fields, such as surveillance, entertainment, smart homes, sports analysis, human–computer interaction, virtual reality, enhanced manufacturing, and healthcare systems. Its purpose is to automatically detect, track, and describe human activities in a sequence of image frames.

Deep learning (DL) techniques have become popular for video-based HAR, thanks in particular to their accuracy and their ability to handle large, well-annotated video databases. Nonetheless, their application to this field is still relatively new, so exploring the use of DL in video-based HAR leaves room for significant contributions. For example, common DL approaches automatically extract hierarchical features from static images and do not take motion into account, even though motion is a key feature for describing human activity. Techniques such as long short-term memory (LSTM) networks, which have proven their power in motion modeling, could provide more efficient solutions. Moreover, multistream networks would allow modeling of the temporal dependencies between motion and appearance. To deal with the challenging conditions introduced by motion in images (such as background clutter and illumination changes) and to improve classification performance, advanced DL architectures and strategies could be considered, such as transfer learning, generative adversarial networks (GANs), and multitask learning.
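As a concrete illustration of the kind of architecture this call alludes to, the following is a minimal sketch, not taken from any paper in this issue, of a CNN–LSTM pipeline in PyTorch: per-frame appearance features are extracted by a CNN backbone and an LSTM models their temporal evolution. The backbone, feature size, clip shape, and class count are illustrative assumptions.

```python
# Minimal sketch (illustrative only): CNN appearance features per frame,
# LSTM over time -- the basic pattern behind many video-based HAR models.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CNNLSTMActivityClassifier(nn.Module):
    def __init__(self, num_classes: int, hidden_size: int = 256):
        super().__init__()
        backbone = resnet18(weights=None)   # appearance stream (untrained here)
        backbone.fc = nn.Identity()         # keep the 512-d pooled features
        self.backbone = backbone
        self.lstm = nn.LSTM(512, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1))   # (b*t, 512)
        feats = feats.view(b, t, -1)                 # (b, t, 512)
        _, (h_n, _) = self.lstm(feats)               # last hidden state
        return self.head(h_n[-1])                    # class logits

# Example: 2 clips of 8 frames at 112x112, 10 activity classes (all assumed)
logits = CNNLSTMActivityClassifier(num_classes=10)(torch.randn(2, 8, 3, 112, 112))
```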

The aim of this Special Issue is to report on recent research works on video-based human activity recognition using advanced deep learning techniques. We encourage submissions of conceptual, empirical, and literature review papers focusing on this field, regardless of the application area.

Prof. Dr. Fakhreddine Ababsa
Dr. Cyrille Migniot
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (7 papers)


Research

17 pages, 5693 KiB  
Article
Consciousness Detection on Injured Simulated Patients Using Manual and Automatic Classification via Visible and Infrared Imaging
by Diana Queirós Pokee, Carina Barbosa Pereira, Lucas Mösch, Andreas Follmann and Michael Czaplik
Sensors 2021, 21(24), 8455; https://doi.org/10.3390/s21248455 - 18 Dec 2021
Cited by 2 | Viewed by 2913
Abstract
In a disaster scene, triage is a key principle for effectively rescuing injured people according to severity level. One main parameter of the triage algorithm used is the patient’s consciousness. Unmanned aerial vehicles (UAVs) have been investigated for (semi-)automatic triage. In addition to vital parameters, such as heart and respiratory rate, UAVs should detect victims’ mobility and consciousness from the video data. This paper presents an algorithm combining deep learning with image processing techniques to detect human bodies for further (un)consciousness classification. The algorithm was tested on a group of 20 subjects in an outdoor environment with static (RGB and thermal) cameras, where participants performed different limb movements in different body positions and at different angles between the cameras and the bodies’ longitudinal axis. The results verified that the algorithm performed better on RGB data. For the most probable case of 0 degrees, the RGB data obtained the following results: a Matthews correlation coefficient (MCC) of 0.943, an F1-score of 0.951, and an area under the precision–recall curve (PR-AUC) of 0.968. For the thermal data, the MCC was 0.913, the F1-score averaged 0.923, and the PR-AUC was 0.960. Overall, the algorithm, together with others, is promising for a complete contactless triage assessment in disaster events during day and night. Full article
(This article belongs to the Special Issue Human Activity Recognition Based on Image Sensors and Deep Learning)
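For readers who want to reproduce the reported frame-level metrics, a minimal scikit-learn sketch is given below; the labels and scores are placeholder values, and average precision is used as a common approximation of the area under the precision–recall curve.

```python
# Sketch of the reported metrics (MCC, F1, PR-AUC) with placeholder data.
from sklearn.metrics import matthews_corrcoef, f1_score, average_precision_score

y_true  = [1, 0, 1, 1, 0, 1]                 # 1 = conscious (illustrative labels)
y_pred  = [1, 0, 1, 0, 0, 1]                 # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7]     # classifier confidence scores

print("MCC:   ", matthews_corrcoef(y_true, y_pred))
print("F1:    ", f1_score(y_true, y_pred))
print("PR-AUC:", average_precision_score(y_true, y_score))  # approximates AUC of PR curve
```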

20 pages, 2542 KiB  
Article
Adaptive Attention Memory Graph Convolutional Networks for Skeleton-Based Action Recognition
by Di Liu, Hui Xu, Jianzhong Wang, Yinghua Lu, Jun Kong and Miao Qi
Sensors 2021, 21(20), 6761; https://doi.org/10.3390/s21206761 - 12 Oct 2021
Cited by 4 | Viewed by 2263
Abstract
Graph Convolutional Networks (GCNs) have attracted a lot of attention and shown remarkable performance for action recognition in recent years. To improve recognition accuracy, the key problems for this kind of method are how to build the graph structure adaptively, select key frames, and extract discriminative features. In this work, we propose a novel Adaptive Attention Memory Graph Convolutional Network (AAM-GCN) for human action recognition using skeleton data. We adopt GCNs to adaptively model the spatial configuration of skeletons and employ a Gated Recurrent Unit (GRU) to construct an attention-enhanced memory for capturing temporal features. With the memory module, our model can not only remember what happened in the past but also exploit future information using multiple bidirectional GRU layers. Furthermore, in order to extract discriminative temporal features, an attention mechanism is employed to select key frames from the skeleton sequence. Extensive experiments on the Kinetics, NTU RGB+D, and HDM05 datasets show that the proposed network achieves better performance than some state-of-the-art methods. Full article
(This article belongs to the Special Issue Human Activity Recognition Based on Image Sensors and Deep Learning)
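A minimal sketch, not the authors' AAM-GCN code, of the two ingredients the paper combines: a graph convolution over skeleton joints with a learnable adjacency matrix, followed by a bidirectional GRU over time. The joint count, feature sizes, and class count are illustrative assumptions.

```python
# Illustrative skeleton GCN + GRU sketch (shapes and sizes are assumptions).
import torch
import torch.nn as nn

class SkeletonGCNGRU(nn.Module):
    def __init__(self, num_joints: int, in_dim: int = 3, hid: int = 64, classes: int = 60):
        super().__init__()
        # learnable adjacency lets the graph structure adapt during training
        self.adj = nn.Parameter(torch.eye(num_joints) + 0.01 * torch.randn(num_joints, num_joints))
        self.proj = nn.Linear(in_dim, hid)
        self.gru = nn.GRU(hid * num_joints, hid, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hid, classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, joints, 3) joint coordinates
        a = torch.softmax(self.adj, dim=-1)            # normalized adjacency
        x = torch.einsum("ij,btjc->btic", a, x)        # aggregate neighboring joints
        x = torch.relu(self.proj(x)).flatten(2)        # (batch, time, joints*hid)
        _, h = self.gru(x)                             # bidirectional temporal memory
        return self.head(torch.cat([h[0], h[1]], dim=-1))

# Example: 2 clips, 30 frames, 25 joints with 3D coordinates (all assumed)
out = SkeletonGCNGRU(num_joints=25)(torch.randn(2, 30, 25, 3))
```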

16 pages, 1852 KiB  
Article
Multi-Channel Generative Framework and Supervised Learning for Anomaly Detection in Surveillance Videos
by Tuan-Hung Vu, Jacques Boonaert, Sebastien Ambellouis and Abdelmalik Taleb-Ahmed
Sensors 2021, 21(9), 3179; https://doi.org/10.3390/s21093179 - 03 May 2021
Cited by 10 | Viewed by 2731
Abstract
Recently, most state-of-the-art anomaly detection methods have been based on apparent motion and appearance reconstruction networks, using the estimation error between generated and real information as detection features. These approaches achieve promising results using only normal samples for training. In this paper, our contributions are two-fold. On the one hand, we propose a flexible multi-channel framework to generate multi-type frame-level features. On the other hand, we study how the detection performance can be improved by supervised learning. The multi-channel framework is based on four Conditional GANs (CGANs) taking various types of appearance and motion information as input and producing prediction information as output. These CGANs provide a better feature space for representing the distinction between normal and abnormal events. The difference between the generated and ground-truth information is then encoded by the Peak Signal-to-Noise Ratio (PSNR). We propose to classify those features in a classical supervised scenario by building a small training set from some abnormal samples of the original test set of each dataset. A binary Support Vector Machine (SVM) is applied for frame-level anomaly detection. Finally, we use Mask R-CNN as a detector to perform object-centric anomaly localization. Our solution is extensively evaluated on the Avenue, Ped1, Ped2, and ShanghaiTech datasets. Our experimental results demonstrate that PSNR features combined with a supervised SVM are better than the error maps computed by previous methods. We achieve state-of-the-art performance for frame-level AUC on Ped1 and ShanghaiTech. In particular, on the most challenging ShanghaiTech dataset, the supervised model outperforms the state-of-the-art unsupervised strategy by up to 9%. Full article
(This article belongs to the Special Issue Human Activity Recognition Based on Image Sensors and Deep Learning)
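The following sketch, under assumed shapes and with placeholder data rather than the authors' pipeline, shows the core idea of turning the difference between a generated frame and the real frame into a PSNR feature and classifying per-frame features with a binary SVM.

```python
# Sketch: PSNR features from generated vs. real frames, classified by an SVM.
import numpy as np
from sklearn.svm import SVC

def psnr(real: np.ndarray, generated: np.ndarray, max_val: float = 1.0) -> float:
    mse = np.mean((real - generated) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

# One PSNR value per generative channel, per frame (placeholder random data,
# standing in for the four CGAN channels assumed here).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))       # 200 frames x 4 channels
labels = rng.integers(0, 2, size=200)      # 1 = abnormal frame (illustrative)

clf = SVC(kernel="rbf").fit(features, labels)
print(clf.predict(features[:5]))           # frame-level normal/abnormal decisions
```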

16 pages, 1797 KiB  
Article
Using a Deep Learning Method and Data from Two-Dimensional (2D) Marker-Less Video-Based Images for Walking Speed Classification
by Tasriva Sikandar, Mohammad F. Rabbi, Kamarul H. Ghazali, Omar Altwijri, Mahdi Alqahtani, Mohammed Almijalli, Saleh Altayyar and Nizam U. Ahamed
Sensors 2021, 21(8), 2836; https://doi.org/10.3390/s21082836 - 17 Apr 2021
Cited by 6 | Viewed by 2760
Abstract
Human body measurement data related to walking can characterize functional movement and thereby become an important tool for health assessment. Single-camera-captured two-dimensional (2D) image sequences of marker-less walking individuals might be a simple approach for estimating human body measurement data which could be used in walking speed-related health assessment. Conventional body measurement data of 2D images are dependent on body-worn garments (used as segmental markers) and are susceptible to changes in the distance between the participant and camera in indoor and outdoor settings. In this study, we propose five ratio-based body measurement data that can be extracted from 2D images and can be used to classify three walking speeds (i.e., slow, normal, and fast) using a deep learning-based bidirectional long short-term memory classification model. The results showed that average classification accuracies of 88.08% and 79.18% could be achieved in indoor and outdoor environments, respectively. Additionally, the proposed ratio-based body measurement data are independent of body-worn garments and not susceptible to changes in the distance between the walking individual and camera. As a simple but efficient technique, the proposed walking speed classification has great potential to be employed in clinics and aged care homes. Full article
(This article belongs to the Special Issue Human Activity Recognition Based on Image Sensors and Deep Learning)
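A minimal sketch of the classification stage described above, assuming five ratio features per frame and illustrative tensor shapes; it is not the paper's implementation.

```python
# Illustrative BiLSTM classifier: per-frame ratio features -> walking speed class.
import torch
import torch.nn as nn

class WalkingSpeedBiLSTM(nn.Module):
    def __init__(self, num_ratios: int = 5, hidden: int = 64, num_speeds: int = 3):
        super().__init__()
        self.lstm = nn.LSTM(num_ratios, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_speeds)  # slow / normal / fast

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, frames, num_ratios) sequence of ratio-based measurements
        _, (h, _) = self.lstm(seq)
        return self.head(torch.cat([h[0], h[1]], dim=-1))  # speed-class logits

# Example: 4 walking clips of 60 frames each (shapes are assumptions)
logits = WalkingSpeedBiLSTM()(torch.randn(4, 60, 5))
```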

17 pages, 3592 KiB  
Article
3D Human Pose Estimation with a Catadioptric Sensor in Unconstrained Environments Using an Annealed Particle Filter
by Fakhreddine Ababsa, Hicham Hadj-Abdelkader and Marouane Boui
Sensors 2020, 20(23), 6985; https://doi.org/10.3390/s20236985 - 07 Dec 2020
Cited by 3 | Viewed by 2777
Abstract
The purpose of this paper is to investigate the problem of 3D human tracking in complex environments using a particle filter with images captured by a catadioptric vision system. This issue has been widely studied in the literature for RGB images acquired from conventional perspective cameras, while omnidirectional images have seldom been used and published research in this field remains limited. In this study, Riemannian manifolds were considered in order to compute the gradient on spherical images and generate a robust descriptor used along with an SVM classifier for human detection. Original likelihood functions associated with the particle filter are proposed, using both geodesic distances and overlapping regions between the silhouette detected in the images and the projected 3D human model. Our approach was experimentally evaluated on real data and showed favorable results compared with machine-learning-based techniques in terms of 3D pose accuracy. The Root Mean Square Error (RMSE) was measured by comparing the estimated 3D poses with ground-truth data, resulting in a mean error of 0.065 m for the walking action. Full article
(This article belongs to the Special Issue Human Activity Recognition Based on Image Sensors and Deep Learning)
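The reported error measure can be reproduced as below. This sketch uses one common convention for pose RMSE, the root of the mean squared per-joint Euclidean distance, with illustrative array shapes and random data.

```python
# Sketch: RMSE between estimated and ground-truth 3D joint positions.
import numpy as np

def pose_rmse(estimated: np.ndarray, ground_truth: np.ndarray) -> float:
    # estimated, ground_truth: (frames, joints, 3) positions in metres
    return float(np.sqrt(np.mean(np.sum((estimated - ground_truth) ** 2, axis=-1))))

rng = np.random.default_rng(0)
gt = rng.normal(size=(100, 15, 3))                    # placeholder ground truth
est = gt + rng.normal(scale=0.05, size=gt.shape)      # placeholder estimates
print(f"RMSE: {pose_rmse(est, gt):.3f} m")
```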

21 pages, 3590 KiB  
Article
Recognition of Non-Manual Content in Continuous Japanese Sign Language
by Heike Brock, Iva Farag and Kazuhiro Nakadai
Sensors 2020, 20(19), 5621; https://doi.org/10.3390/s20195621 - 01 Oct 2020
Cited by 15 | Viewed by 2504
Abstract
The quality of recognition systems for continuous utterances in signed languages has advanced considerably in recent years. However, research efforts often do not address specific linguistic features of signed languages, such as non-manual expressions. In this work, we evaluate the potential of a single-video-camera-based recognition system with respect to the latter. For this, we introduce a two-stage pipeline based on two-dimensional body joint positions extracted from RGB camera data. The system first separates the data flow of a signed expression into meaningful word segments on the basis of a frame-wise binary Random Forest. Next, every segment is transformed into an image-like shape and classified with a Convolutional Neural Network. The proposed system is then evaluated on a dataset of continuous sentence expressions in Japanese Sign Language with a variation of non-manual expressions. Exploring multiple variations of data representations and network parameters, we are able to distinguish word segments of specific non-manual intonations with 86% accuracy from the underlying body joint movement data. Full sentence predictions achieve a total Word Error Rate of 15.75%. This marks an improvement of 13.22% compared with predictions obtained from ground-truth labeling that is insensitive to non-manual content. Consequently, our analysis constitutes an important contribution to a better understanding of mixed manual and non-manual content in signed communication. Full article
(This article belongs to the Special Issue Human Activity Recognition Based on Image Sensors and Deep Learning)
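The Word Error Rate used to score full-sentence predictions can be computed with a standard edit-distance routine. The sketch below is a generic implementation, not the authors' evaluation code.

```python
# Sketch: Word Error Rate = (substitutions + insertions + deletions) / reference length.
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    # dynamic-programming edit distance between the two word sequences
    d = [[0] * (len(hypothesis) + 1) for _ in range(len(reference) + 1)]
    for i in range(len(reference) + 1):
        d[i][0] = i
    for j in range(len(hypothesis) + 1):
        d[0][j] = j
    for i in range(1, len(reference) + 1):
        for j in range(1, len(hypothesis) + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(reference)

print(word_error_rate("HELLO HOW ARE YOU".split(), "HELLO HOW YOU".split()))  # 0.25
```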

15 pages, 4550 KiB  
Article
Focus on the Visible Regions: Semantic-Guided Alignment Model for Occluded Person Re-Identification
by Qin Yang, Peizhi Wang, Zihan Fang and Qiyong Lu
Sensors 2020, 20(16), 4431; https://doi.org/10.3390/s20164431 - 08 Aug 2020
Cited by 26 | Viewed by 3632
Abstract
The occlusion problem is very common in pedestrian retrieval scenarios. When persons are occluded by various obstacles, the noise caused by the occluded area greatly affects the retrieval results. However, many previous pedestrian re-identification (Re-ID) methods ignore this problem. To solve it, we propose a semantic-guided alignment model that uses image semantic information to separate useful information from occlusion noise. In the image preprocessing phase, we use a human semantic parsing network to generate probability maps. These maps show which regions of images are occluded, and the model automatically crops images to preserve the visible parts. In the construction phase, we fuse the probability maps with the global features of the image, and semantic information guides the model to focus on visible human regions and extract local features. During the matching process, we propose a measurement strategy that only calculates the distance of public areas (visible human areas on both images) between images, thereby suppressing the spatial misalignment caused by non-public areas. Experimental results on a series of public datasets confirm that our method outperforms previous occluded Re-ID methods, and it achieves top performance in the holistic Re-ID problem. Full article
(This article belongs to the Special Issue Human Activity Recognition Based on Image Sensors and Deep Learning)
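A minimal sketch of the matching idea described above: computing a distance only over body parts visible in both images. The part-feature layout and visibility encoding are illustrative assumptions, not the authors' code.

```python
# Sketch: distance between two pedestrian descriptors restricted to shared visible parts.
import numpy as np

def shared_region_distance(feat_a: np.ndarray, feat_b: np.ndarray,
                           vis_a: np.ndarray, vis_b: np.ndarray) -> float:
    # feat_*: (parts, dim) part features; vis_*: (parts,) visibility flags in {0, 1}
    shared = (vis_a * vis_b).astype(bool)
    if not shared.any():
        return float("inf")                     # no common visible area to compare
    diff = feat_a[shared] - feat_b[shared]
    return float(np.mean(np.linalg.norm(diff, axis=1)))

rng = np.random.default_rng(0)
fa, fb = rng.normal(size=(6, 128)), rng.normal(size=(6, 128))   # 6 body parts (assumed)
print(shared_region_distance(fa, fb,
                             np.array([1, 1, 1, 0, 0, 1]),
                             np.array([1, 0, 1, 1, 0, 1])))
```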
