Advances of Artificial Intelligence and Vision Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 29 June 2024 | Viewed by 25,101

Special Issue Editors

Guest Editor
School of Electronics and Information Technology, Sun Yat-Sen University, Guangzhou 510006, China
Interests: computer vision and pattern recognition

Guest Editor
Department of Electrical and Computer Engineering, Brigham Young University, 450 Engineering Building, Provo, UT 84602-4099, USA
Interests: artificial intelligence; high-performance visual computing; robotic vision; real-time visual inspection automation

Special Issue Information

Dear Colleagues,

Artificial intelligence technologies, represented by deep learning and convolutional neural networks, have greatly advanced research and development in computer vision over the last decade. Simultaneously, advances in software and hardware have enabled engineers to implement elaborate computer vision algorithms on powerful platforms. These advancements have brought computer vision enormous success across many sectors of modern society, including agriculture, retail, insurance, manufacturing, logistics, smart cities, healthcare, pharmaceuticals, and construction. Nevertheless, the performance of an AI-based computer vision system is still constrained by the quality and quantity of training data, as well as by the computing power and processing speed of the hardware platform. This Special Issue aims to collect advances and contributions related to the design, optimization, and implementation of artificial intelligence and computer vision applications.

General topics covered in this Special Issue include, but are not limited to, the following:

  • Image interpretation;
  • Object recognition and tracking;
  • Shape analysis, monitoring, and surveillance;
  • Biologically inspired computer vision;
  • Motion analysis;
  • Document image understanding;
  • Face and gesture recognition;
  • Vision-based human–computer interaction;
  • Human activity and behavior understanding;
  • Emotion recognition.

Dr. Dong Zhang
Prof. Dr. Dah-Jye Lee
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • computer vision
  • deep learning
  • convolutional neural networks
  • affective computing

Published Papers (13 papers)

Research

13 pages, 1897 KiB  
Article
Driver Abnormal Expression Detection Method Based on Improved Lightweight YOLOv5
by Keming Yao, Zhongzhou Wang, Fuao Guo and Feng Li
Electronics 2024, 13(6), 1138; https://doi.org/10.3390/electronics13061138 - 20 Mar 2024
Viewed by 521
Abstract
The rapid advancement of intelligent assisted driving technology has significantly enhanced transportation convenience in society and contributed to the mitigation of traffic safety hazards. Addressing the potential for drivers to experience abnormal physical conditions during the driving process, an enhanced lightweight network model based on YOLOv5 for detecting abnormal facial expressions of drivers is proposed in this paper. Initially, the lightweighting of the YOLOv5 backbone network is achieved by integrating the FasterNet Block, a lightweight module from the FasterNet network, with the C3 module in the main network. This combination forms the C3-faster module. Subsequently, the original convolutional modules in the YOLOv5 model are replaced with the improved GSConvns module to reduce computational load. Building upon the GSConvns module, the VoV-GSCSP module is constructed to ensure the lightweighting of the neck network while maintaining detection accuracy. Finally, channel pruning and fine-tuning operations are applied to the entire model. Channel pruning involves removing channels with minimal impact on output results, further reducing the model’s computational load, parameters, and size. The fine-tuning operation compensates for any potential loss in detection accuracy. Experimental results demonstrate that the proposed model achieves a substantial reduction in both parameter count and computational load while maintaining a high detection accuracy of 84.5%. The improved model has a compact size of only 4.6 MB, making it more conducive to the efficient operation of onboard computers. Full article
(This article belongs to the Special Issue Advances of Artificial Intelligence and Vision Applications)
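The channel-pruning step described in this abstract, removing channels with minimal impact on the output, can be sketched in plain Python. This toy ranks channels by the L1 norm of their weights and keeps the top fraction; it is illustrative only, not the authors' implementation, and the function name and data layout are hypothetical.

```python
def prune_channels(filters, keep_ratio=0.5):
    """Rank convolutional channels by the L1 norm of their weights and
    keep only the most important ones (simplified channel pruning).

    filters: list of per-channel weight lists.
    Returns the indices of the channels to keep, sorted ascending.
    """
    l1_norms = [sum(abs(w) for w in channel) for channel in filters]
    n_keep = max(1, int(len(filters) * keep_ratio))
    # Channels with the largest L1 norm are assumed most influential.
    ranked = sorted(range(len(filters)), key=lambda i: l1_norms[i], reverse=True)
    return sorted(ranked[:n_keep])

# Example: four channels with clearly different weight magnitudes.
filters = [[0.01, -0.02], [1.5, -2.0], [0.8, 0.9], [0.001, 0.003]]
print(prune_channels(filters, keep_ratio=0.5))  # [1, 2]
```

In practice the surviving channels would then be fine-tuned, as the abstract notes, to recover any accuracy lost by pruning.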

13 pages, 6704 KiB  
Article
A Multi-Object Tracking Approach Combining Contextual Features and Trajectory Prediction
by Peng Zhang, Qingyang Jing, Xinlei Zhao, Lijia Dong, Weimin Lei, Wei Zhang and Zhaonan Lin
Electronics 2023, 12(23), 4720; https://doi.org/10.3390/electronics12234720 - 21 Nov 2023
Viewed by 822
Abstract
Aiming to solve the problem of the identity switching of objects with similar appearances in real scenarios, a multi-object tracking approach combining contextual features and trajectory prediction is proposed. This approach integrates the motion and appearance features of objects. The motion features are mainly used for trajectory prediction, and the appearance features are divided into contextual features and individual features, which are mainly used for trajectory matching. In order to accurately distinguish the identities of objects with similar appearances, a context graph is constructed by taking the specified object as the master node and its neighboring objects as the branch nodes. A preprocessing module is applied to exclude unnecessary connections in the graph model based on the speed of the historical trajectory of the object, and to distinguish the features of objects with similar appearances. Feature matching is performed using the Hungarian algorithm, based on the similarity matrix obtained from the features. Post-processing is performed for the temporarily unmatched frames to obtain the final object matching results. The experimental results show that the approach proposed in this paper can achieve the highest MOTA. Full article
(This article belongs to the Special Issue Advances of Artificial Intelligence and Vision Applications)
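Feature matching via the Hungarian algorithm, as used in this paper, pairs tracks with detections so that total similarity is maximized. The brute-force sketch below only illustrates the objective on a tiny matrix; a real tracker would use a proper O(n^3) Hungarian solver such as scipy.optimize.linear_sum_assignment rather than enumerating permutations.

```python
from itertools import permutations

def best_assignment(similarity):
    """Exhaustively find the track-to-detection assignment that
    maximizes total similarity (brute-force stand-in for the
    Hungarian algorithm; only practical for small square matrices)."""
    n = len(similarity)
    best, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(similarity[i][perm[i]] for i in range(n))
        if score > best_score:
            best, best_score = perm, score
    return list(best), best_score

# Similarity between 3 tracks (rows) and 3 detections (columns).
sim = [[0.9, 0.1, 0.2],
       [0.2, 0.8, 0.3],
       [0.1, 0.4, 0.7]]
matches, total = best_assignment(sim)
print(matches, round(total, 1))  # [0, 1, 2] 2.4
```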

18 pages, 4763 KiB  
Article
Automated Facial Emotion Recognition Using the Pelican Optimization Algorithm with a Deep Convolutional Neural Network
by Mohammed Alonazi, Hala J. Alshahrani, Faiz Abdullah Alotaibi, Mohammed Maray, Mohammed Alghamdi and Ahmed Sayed
Electronics 2023, 12(22), 4608; https://doi.org/10.3390/electronics12224608 - 11 Nov 2023
Cited by 2 | Viewed by 1225
Abstract
Facial emotion recognition (FER) stands as a pivotal artificial intelligence (AI)-driven technology that exploits the capabilities of computer-vision techniques for decoding and comprehending emotional expressions displayed on human faces. With the use of machine-learning (ML) models, specifically deep neural networks (DNN), FER empowers the automatic detection and classification of a broad spectrum of emotions, encompassing surprise, happiness, sadness, anger, and more. Challenges in FER include handling variations in lighting, poses, and facial expressions, as well as ensuring that the model generalizes well to various emotions and populations. This study introduces an automated facial emotion recognition using the pelican optimization algorithm with a deep convolutional neural network (AFER-POADCNN) model. The primary objective of the AFER-POADCNN model lies in the automatic recognition and classification of facial emotions. To accomplish this, the AFER-POADCNN model exploits the median-filtering (MF) approach to remove the noise present in it. Furthermore, the capsule-network (CapsNet) approach can be applied to the feature-extraction process, allowing the model to capture intricate facial expressions and nuances. To optimize the CapsNet model’s performance, hyperparameter tuning is undertaken with the aid of the pelican optimization algorithm (POA). This ensures that the model is finely tuned to detect a wide array of emotions and generalizes effectively across diverse populations and scenarios. Finally, the detection and classification of different kinds of facial emotions take place using a bidirectional long short-term memory (BiLSTM) network. The simulation analysis of the AFER-POADCNN system is tested on a benchmark FER dataset. The comparative result analysis showed the better performance of the AFER-POADCNN algorithm over existing models, with a maximum accuracy of 99.05%. Full article
(This article belongs to the Special Issue Advances of Artificial Intelligence and Vision Applications)
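The median-filtering preprocessing step mentioned in this abstract replaces each pixel with the median of its neighborhood, which suppresses salt-and-pepper noise before feature extraction. A minimal pure-Python sketch (illustrative; real pipelines use optimized library routines):

```python
def median_filter(img, k=3):
    """Apply a k x k median filter to a 2D grayscale image (list of
    lists), replicating edge pixels at the border."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp at borders
                    xx = min(max(x + dx, 0), w - 1)
                    window.append(img[yy][xx])
            window.sort()
            out[y][x] = window[len(window) // 2]
    return out

# A single salt-noise pixel (255) is removed by the filter.
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
print(median_filter(noisy)[1][1])  # 10
```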

15 pages, 2716 KiB  
Article
Fault Detection in Solar Energy Systems: A Deep Learning Approach
by Zeynep Bala Duranay
Electronics 2023, 12(21), 4397; https://doi.org/10.3390/electronics12214397 - 24 Oct 2023
Cited by 1 | Viewed by 2506
Abstract
While solar energy holds great significance as a clean and sustainable energy source, photovoltaic panels serve as the linchpin of this energy conversion process. However, defects in these panels can adversely impact energy production, necessitating the rapid and effective detection of such faults. This study explores the potential of using infrared solar module images for the detection of photovoltaic panel defects through deep learning, which represents a crucial step toward enhancing the efficiency and sustainability of solar energy systems. A dataset comprising 20,000 images, derived from infrared solar modules, was utilized in this study, consisting of 12 classes: cell, cell-multi, cracking, diode, diode-multi, hot spot, hot spot-multi, no-anomaly, offline-module, shadowing, soiling, and vegetation. The methodology employed the exemplar Efficientb0 model. From the exemplar model, 17,000 features were selected using the NCA feature selector. Subsequently, classification was performed using an SVM classifier. The proposed method applied to a dataset consisting of 12 classes has yielded successful results in terms of accuracy, F1-score, precision, and sensitivity metrics. These results indicate average values of 93.93% accuracy, 89.82% F1-score, 91.50% precision, and 88.28% sensitivity, respectively. The proposed method in this study accurately classifies photovoltaic panel defects based on images of infrared solar modules. Full article
(This article belongs to the Special Issue Advances of Artificial Intelligence and Vision Applications)

12 pages, 652 KiB  
Article
Deform2NeRF: Non-Rigid Deformation and 2D–3D Feature Fusion with Cross-Attention for Dynamic Human Reconstruction
by Xiaolong Xie, Xusheng Guo, Wei Li, Jie Liu and Jianfeng Xu
Electronics 2023, 12(21), 4382; https://doi.org/10.3390/electronics12214382 - 24 Oct 2023
Cited by 1 | Viewed by 1086
Abstract
Reconstructing dynamic human body models from multi-view videos poses a substantial challenge in the field of 3D computer vision. Currently, the Animatable NeRF method addresses this challenge by mapping observed points from the viewing space to a canonical space. However, this mapping introduces positional shifts in predicted points, resulting in artifacts, particularly in intricate areas. In this paper, we propose an innovative approach called Deform2NeRF that incorporates non-rigid deformation correction and image feature fusion modules into the Animatable NeRF framework to enhance the reconstruction of animatable human models. Firstly, we introduce a non-rigid deformation field network to address the issue of point position shift effectively. This network adeptly corrects positional discrepancies caused by non-rigid deformations. Secondly, we introduce a 2D–3D feature fusion learning module with cross-attention and integrate it with the NeRF network to mitigate artifacts in specific detailed regions. Our experimental results demonstrate that our method significantly improves the PSNR index by approximately 5% compared to representative methods in the field. This remarkable advancement underscores the profound importance of our approach in the domains of new view synthesis and digital human reconstruction. Full article
(This article belongs to the Special Issue Advances of Artificial Intelligence and Vision Applications)

19 pages, 4827 KiB  
Article
Global Individual Interaction Network Based on Consistency for Group Activity Recognition
by Cheng Huang, Dong Zhang, Bing Li, Yun Xian and Dah-Jye Lee
Electronics 2023, 12(19), 4104; https://doi.org/10.3390/electronics12194104 - 30 Sep 2023
Viewed by 601
Abstract
Modeling the interactions among individuals in a group is essential for group activity recognition (GAR). Various graph neural networks (GNNs) are regarded as popular modeling methods for GAR, as they can characterize the interaction among individuals at a low computational cost. The performance of the current GNN-based modeling methods is affected by two factors. Firstly, their local receptive field in the mapping layer limits their ability to characterize the global interactions among individuals in spatial–temporal dimensions. Secondly, GNN-based GAR methods do not have an efficient mechanism to use global activity consistency and individual action consistency. In this paper, we argue that the global interactions among individuals, as well as the constraints of global activity and individual action consistencies, are critical to group activity recognition. We propose new convolutional operations to capture the interactions among individuals from a global perspective. We use contrastive learning to maximize the global activity consistency and individual action consistency for more efficient recognition. Comprehensive experiments show that our method achieved better GAR performance than the state-of-the-art methods on two popular GAR benchmark datasets. Full article
(This article belongs to the Special Issue Advances of Artificial Intelligence and Vision Applications)

19 pages, 6756 KiB  
Article
An AIoT-Based Assistance System for Visually Impaired People
by Jiawen Li, Lianglu Xie, Zhe Chen, Liang Shi, Rongjun Chen, Yongqi Ren, Leijun Wang and Xu Lu
Electronics 2023, 12(18), 3760; https://doi.org/10.3390/electronics12183760 - 06 Sep 2023
Cited by 3 | Viewed by 2280
Abstract
In this work, an assistance system based on the Artificial Intelligence of Things (AIoT) framework was designed and implemented to provide convenience for visually impaired people. This system aims to be low-cost and multi-functional with object detection, obstacle distance measurement, and text recognition achieved by wearable smart glasses, heart rate detection, fall detection, body temperature measurement, and humidity-temperature monitoring offered by an intelligent walking stick. The total hardware cost is approximately $66.8, as diverse low-cost sensors and modules are embedded. Meanwhile, a voice assistant is adopted, which helps to convey detection results to users. As for the performance evaluation, the accuracies of object detection and text recognition in the wearable smart glasses experiments are 92.16% and 99.91%, respectively, and the maximum deviation rate compared to the mobile app on obstacle distance measurement is 6.32%. In addition, the intelligent walking stick experiments indicate that the maximum deviation rates compared to the commercial devices on heart rate detection, body temperature measurement, and humidity-temperature monitoring are 3.52%, 0.19%, and 3.13%, respectively, and the fall detection accuracy is 87.33%. Such results demonstrate that the proposed assistance system yields reliable performances similar to commercial devices and is impressive when considering the total cost as a primary concern. Consequently, it satisfies the fundamental requirements of daily life, benefiting the safety and well-being of visually impaired people. Full article
(This article belongs to the Special Issue Advances of Artificial Intelligence and Vision Applications)

13 pages, 10750 KiB  
Article
AI-Driven High-Precision Model for Blockage Detection in Urban Wastewater Systems
by Ravindra R. Patil, Rajnish Kaur Calay, Mohamad Y. Mustafa and Saniya M. Ansari
Electronics 2023, 12(17), 3606; https://doi.org/10.3390/electronics12173606 - 26 Aug 2023
Cited by 1 | Viewed by 967
Abstract
In artificial intelligence (AI), computer vision consists of intelligent models to interpret and recognize the visual world, similar to human vision. This technology relies on a synergy of extensive data and human expertise, meticulously structured to yield accurate results. Tackling the intricate task of locating and resolving blockages within sewer systems is a significant challenge due to their diverse nature and lack of robust technique. This research utilizes the previously introduced “S-BIRD” dataset, a collection of frames depicting sewer blockages, as the foundational training data for a deep neural network model. To enhance the model’s performance and attain optimal results, transfer learning and fine-tuning techniques are strategically implemented on the YOLOv5 architecture, using the corresponding dataset. The outcomes of the trained model exhibit a remarkable accuracy rate in sewer blockage detection, thereby boosting the reliability and efficacy of the associated robotic framework for proficient removal of various blockages. Particularly noteworthy is the achieved mean average precision (mAP) score of 96.30% at a confidence threshold of 0.5, maintaining a consistently high-performance level of 79.20% across Intersection over Union (IoU) thresholds ranging from 0.5 to 0.95. It is expected that this work contributes to advancing the applications of AI-driven solutions for modern urban sanitation systems. Full article
(This article belongs to the Special Issue Advances of Artificial Intelligence and Vision Applications)
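The mAP figures quoted in this abstract are computed over Intersection-over-Union (IoU) thresholds from 0.5 to 0.95. IoU itself is a simple overlap ratio between a predicted and a ground-truth box; a minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as
    (x1, y1, x2, y2); the overlap criterion behind mAP@0.5 and
    mAP@0.5:0.95 detection metrics."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two 10x10 boxes offset by 5 pixels in x overlap by a third of their union.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # ~0.333
```

A detection counts as a true positive at threshold t only when its IoU with a ground-truth box is at least t, which is why mAP drops as the threshold range tightens toward 0.95.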

14 pages, 3989 KiB  
Article
Efficient Reversible Data Hiding Using Two-Dimensional Pixel Clustering
by Junying Yuan, Huicheng Zheng and Jiangqun Ni
Electronics 2023, 12(7), 1645; https://doi.org/10.3390/electronics12071645 - 30 Mar 2023
Cited by 3 | Viewed by 871
Abstract
Pixel clustering is a technique of content-adaptive data embedding in the area of high-performance reversible data hiding (RDH). Using pixel clustering, the pixels in a cover image can be classified into different groups based on a single factor, which is usually the local complexity. Since finer pixel clustering seems to improve the embedding performance, in this manuscript, we propose using two factors for two-dimensional pixel clustering to develop high-performance RDH. Firstly, in addition to the local complexity, a novel factor was designed as the second factor for pixel clustering. Specifically, the proposed factor was defined using the rotation-invariant code derived from pixel relationships in the four-neighborhood. Then, pixels were allocated to the two-dimensional clusters based on the two clustering factors, and cluster-based pixel prediction was realized. As a result, two-dimensional prediction-error histograms (2D-PEHs) were constructed, and performance optimization was based on the selection of expansion bins from the 2D-PEHs. Next, an algorithm for fast expansion-bin selection was introduced to reduce the time complexity. Lastly, data embedding was realized using the technique of prediction-error expansion according to the optimally selected expansion bins. Extensive experiments show that the embedding performance was significantly enhanced, particularly in terms of improved image quality and reduced time complexity, and embedding capacity also moderately improved. Full article
(This article belongs to the Special Issue Advances of Artificial Intelligence and Vision Applications)
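Prediction-error expansion, the embedding technique named in this abstract, predicts each pixel, expands small prediction errors to carry payload bits, and shifts the remaining errors outward so the process stays reversible. The 1-D sketch below is a simplified illustration (a single causal predictor and one expandable bin pair), not the paper's two-dimensional clustering scheme; a decoder would also need the payload length to invert it.

```python
def pee_embed(pixels, bits):
    """Embed bits via prediction-error expansion on a 1-D pixel
    sequence. Predictor: the previous (already processed) pixel.
    Errors of 0 or -1 are expanded to carry one bit each; other
    errors are shifted outward so embedding stays reversible."""
    out, payload = [pixels[0]], list(bits)
    for p in pixels[1:]:
        pred = out[-1]                    # causal predictor
        e = p - pred
        if e == 0 and payload:
            e = e + payload.pop(0)        # expand bin 0 -> {0, 1}
        elif e == -1 and payload:
            e = 2 * e + payload.pop(0)    # expand bin -1 -> {-2, -1}
        elif e >= 1:
            e += 1                        # shift positive errors
        elif e <= -2:
            e -= 1                        # shift negative errors
        out.append(pred + e)
    return out

# Two payload bits ride on the two zero-valued prediction errors.
print(pee_embed([100, 100, 101, 99], [1, 0]))  # [100, 101, 101, 98]
```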

12 pages, 2843 KiB  
Article
Applying Monte Carlo Dropout to Quantify the Uncertainty of Skip Connection-Based Convolutional Neural Networks Optimized by Big Data
by Abouzar Choubineh, Jie Chen, Frans Coenen and Fei Ma
Electronics 2023, 12(6), 1453; https://doi.org/10.3390/electronics12061453 - 19 Mar 2023
Cited by 3 | Viewed by 3053
Abstract
Although Deep Learning (DL) models have been introduced in various fields as effective prediction tools, they often do not care about uncertainty. This can be a barrier to their adoption in real-world applications. The current paper aims to apply and evaluate Monte Carlo (MC) dropout, a computationally efficient approach, to investigate the reliability of several skip connection-based Convolutional Neural Network (CNN) models while keeping their high accuracy. To do so, a high-dimensional regression problem is considered in the context of subterranean fluid flow modeling using 376,250 generated samples. The results demonstrate the effectiveness of MC dropout in terms of reliability with a Standard Deviation (SD) of 0.012–0.174, and of accuracy with a coefficient of determination (R2) of 0.7881–0.9584 and Mean Squared Error (MSE) of 0.0113–0.0508, respectively. The findings of this study may contribute to the distribution of pressure in the development of oil/gas fields. Full article
(This article belongs to the Special Issue Advances of Artificial Intelligence and Vision Applications)
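Monte Carlo dropout, the approach evaluated in this paper, keeps dropout active at inference and treats the spread of repeated stochastic forward passes as an uncertainty estimate. A toy sketch on a linear model (the model, weights, and inputs here are illustrative assumptions, not the paper's CNN):

```python
import random
import statistics

def mc_dropout_predict(weights, x, p_drop=0.5, n_samples=200, seed=42):
    """Monte Carlo dropout for a toy linear model y = sum(w * x):
    keep dropout active at inference and run many stochastic forward
    passes; the standard deviation of the outputs estimates model
    uncertainty, as in the MC dropout approach."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_samples):
        # Each pass randomly drops weights, rescaling the survivors
        # (inverted dropout) so the expected output is unchanged.
        y = sum(0.0 if rng.random() < p_drop else w / (1 - p_drop) * xi
                for w, xi in zip(weights, x))
        preds.append(y)
    return statistics.mean(preds), statistics.stdev(preds)

mean, sd = mc_dropout_predict([0.5, -1.0, 2.0], [1.0, 2.0, 3.0])
print(round(mean, 2), round(sd, 2))  # mean near the full model's output 4.5
```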

20 pages, 1551 KiB  
Article
CAE-Net: Cross-Modal Attention Enhancement Network for RGB-T Salient Object Detection
by Chengtao Lv, Bin Wan, Xiaofei Zhou, Yaoqi Sun, Ji Hu, Jiyong Zhang and Chenggang Yan
Electronics 2023, 12(4), 953; https://doi.org/10.3390/electronics12040953 - 14 Feb 2023
Viewed by 1496
Abstract
RGB salient object detection (SOD) performs poorly in low-contrast and complex background scenes. Fortunately, the thermal infrared image can capture the heat distribution of scenes as complementary information to the RGB image, so the RGB-T SOD has recently attracted more and more attention. Many researchers have committed to accelerating the development of RGB-T SOD, but some problems still remain to be solved. For example, the defective sample and interfering information contained in the RGB or thermal image hinder the model from learning proper saliency features, meanwhile the low-level features with noisy information result in incomplete salient objects or false positive detection. To solve these problems, we design a cross-modal attention enhancement network (CAE-Net). First, we concretely design a cross-modal fusion (CMF) module to fuse cross-modal features, where the cross-attention unit (CAU) is employed to enhance the two modal features, and channel attention is used to dynamically weigh and fuse the two modal features. Then, we design the joint-modality decoder (JMD) to fuse cross-level features, where the low-level features are purified by higher level features, and multi-scale features are sufficiently integrated. Besides, we add two single-modality decoder (SMD) branches to preserve more modality-specific information. Finally, we employ a multi-stream fusion (MSF) module to fuse three decoders’ features. Comprehensive experiments are conducted on three RGB-T datasets, and the results show that our CAE-Net is comparable to the other methods. Full article
(This article belongs to the Special Issue Advances of Artificial Intelligence and Vision Applications)

30 pages, 19008 KiB  
Article
Automated Pre-Play Analysis of American Football Formations Using Deep Learning
by Jacob Newman, Andrew Sumsion, Shad Torrie and Dah-Jye Lee
Electronics 2023, 12(3), 726; https://doi.org/10.3390/electronics12030726 - 01 Feb 2023
Cited by 2 | Viewed by 5017
Abstract
Annotation and analysis of sports videos is a time-consuming task that, once automated, will provide benefits to coaches, players, and spectators. American football, as the most watched sport in the United States, could especially benefit from this automation. Manual annotation and analysis of recorded videos of American football games is an inefficient and tedious process. Currently, most college football programs focus on annotating offensive formations to help them develop game plans for their upcoming games. As a first step to further research for this unique application, we use computer vision and deep learning to analyze an overhead image of a football play immediately before the play begins. This analysis consists of locating individual football players and labeling their position or roles, as well as identifying the formation of the offensive team. We obtain greater than 90% accuracy on both player detection and labeling, and 84.8% accuracy on formation identification. These results prove the feasibility of building a complete American football strategy analysis system using artificial intelligence. Collecting a larger dataset in real-world situations will enable further improvements. This would likewise enable American football teams to analyze game footage quickly. Full article
(This article belongs to the Special Issue Advances of Artificial Intelligence and Vision Applications)

Other

44 pages, 1279 KiB  
Systematic Review
Intelligent Robotics—A Systematic Review of Emerging Technologies and Trends
by Josip Tomo Licardo, Mihael Domjan and Tihomir Orehovački
Electronics 2024, 13(3), 542; https://doi.org/10.3390/electronics13030542 - 29 Jan 2024
Viewed by 3257
Abstract
Intelligent robotics has the potential to revolutionize various industries by amplifying output, streamlining operations, and enriching customer interactions. This systematic literature review aims to analyze emerging technologies and trends in intelligent robotics, addressing key research questions, identifying challenges and opportunities, and proposing the best practices for responsible and beneficial integration into various sectors. Our research uncovers the significant improvements brought by intelligent robotics across industries such as manufacturing, logistics, tourism, agriculture, healthcare, and construction. The main results indicate the importance of focusing on human–robot collaboration, ethical considerations, sustainable practices, and addressing industry-specific challenges to harness the opportunities presented by intelligent robotics fully. The implications and future directions of intelligent robotics involve addressing both challenges and potential risks, maximizing benefits, and ensuring responsible implementation. The continuous improvement and refinement of existing technology will shape human life and industries, driving innovation and advancements in intelligent robotics. Full article
(This article belongs to the Special Issue Advances of Artificial Intelligence and Vision Applications)
