Advances in Image Recognition and Processing Technologies

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 30 April 2024 | Viewed by 3422

Special Issue Editors


Guest Editor
College of Information Science and Technology, Beijing University of Chemical Technology (BUCT) and Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
Interests: pattern recognition; detection and tracking; visual intelligence

Guest Editor
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Interests: computer vision; pattern recognition; image processing; edge intelligence

Special Issue Information

Dear Colleagues,

Image recognition and processing technologies have driven significant advances across many fields in recent years. However, the inherent complexity of computer vision still leaves many challenges unaddressed, limiting performance in a range of applications. This Special Issue has therefore been assembled to share in-depth research on image recognition and processing methods, including, but not limited to, object detection, object tracking, image super-resolution, depth estimation, and semantic segmentation. We hope that these advanced methods can accelerate the adoption of such technologies in the real world.

It is our pleasure to invite you to join this Special Issue, entitled “Advances in Image Recognition and Processing Technologies”, by contributing a manuscript presenting your valuable research progress. Thank you very much.

Dr. Yang Zhang
Dr. Shuai Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • image recognition
  • image processing
  • image super-resolution
  • depth estimation
  • semantic segmentation
  • object detection
  • object tracking

Published Papers (5 papers)


Research

16 pages, 8740 KiB  
Article
Dynamic Downsampling Algorithm for 3D Point Cloud Map Based on Voxel Filtering
by Wenqi Lyu, Wei Ke, Hao Sheng, Xiao Ma and Huayun Zhang
Appl. Sci. 2024, 14(8), 3160; https://doi.org/10.3390/app14083160 - 09 Apr 2024
Viewed by 451
Abstract
In response to the challenge of handling large-scale 3D point cloud data, downsampling is a common approach, yet it often leads to the loss of features. We present a dynamic downsampling algorithm for 3D point cloud maps based on an improved voxel filtering approach. The algorithm consists of two modules: dynamic downsampling and point cloud edge extraction. The former adapts voxel downsampling to the features of the point cloud, while the latter preserves edge information within the 3D point cloud map. Comparative experiments were conducted against voxel, grid, clustering-based, random, uniform, and farthest-point downsampling. The proposed algorithm achieved favorable simplification results, with a processing time of 0.01289 s and a simplification rate of 91.89%, and demonstrated faster downsampling speed and improved overall performance.
(This article belongs to the Special Issue Advances in Image Recognition and Processing Technologies)
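As a quick reference for the baseline that the paper improves upon, plain voxel-grid downsampling, in which each occupied voxel is replaced by the centroid of its points, can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code; their dynamic variant additionally adapts the voxel size to local point cloud features:

```python
import numpy as np

def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Reduce a point cloud by replacing the points in each voxel with their centroid.

    points: (N, 3) array of XYZ coordinates; voxel_size: cubic voxel edge length.
    """
    # Map each point to an integer voxel index.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    # Group points by voxel; `inverse` maps each point to its voxel's row.
    _, inverse, counts = np.unique(
        voxel_idx, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()  # guard against NumPy versions that keep extra dims
    centroids = np.zeros((counts.size, 3))
    np.add.at(centroids, inverse, points)  # scatter-add points into their voxels
    return centroids / counts[:, None]

# Example: 100,000 random points reduced to one centroid per 0.1-unit voxel.
cloud = np.random.rand(100_000, 3)
print(cloud.shape, "->", voxel_downsample(cloud, voxel_size=0.1).shape)
```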

12 pages, 3423 KiB  
Article
AI Somatotype System Using 3D Body Images: Based on Deep-Learning and Transfer Learning
by Jiwun Yoon, Sang-Yong Lee and Ji-Yong Lee
Appl. Sci. 2024, 14(6), 2608; https://doi.org/10.3390/app14062608 - 20 Mar 2024
Viewed by 548
Abstract
Humans share a similar body structure, but each individual possesses unique characteristics, which we define as one’s body type. Various classification methods have been devised to understand and assess these body types. Recent research has applied artificial intelligence with noninvasive measurement tools, such as 3D body scanners, which minimize physical contact. The purpose of this study was to develop an artificial intelligence somatotype system capable of predicting the three body types proposed by Heath-Carter’s somatotype theory from 3D body images collected with a 3D body scanner. To classify body types, measurements were taken to determine the three somatotype components (endomorphy, mesomorphy, and ectomorphy). MobileNetV2 was utilized as the transfer learning model. The results are as follows: first, the AI somatotype model showed good performance, with a training accuracy of around 91% and a validation accuracy of around 72%; the respective loss values were 0.26 for the training set and 0.69 for the validation set. Second, validating the model on test data yielded accurate predictions for 18 of 21 new data points, with prediction errors in three cases, i.e., approximately 85% classification accuracy. This study provides foundational data for subsequent research aiming to predict 13 detailed body types across the three body types. It is hoped that these outcomes can be applied in practical settings, enabling anyone with a smartphone camera to identify body types from captured images and predict obesity and disease.
(This article belongs to the Special Issue Advances in Image Recognition and Processing Technologies)
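As an illustration of the transfer-learning setup the abstract describes, a minimal Keras configuration might look as follows. The input size, classification head, and optimizer are assumptions for the sketch, not the authors' exact settings:

```python
import tensorflow as tf

# MobileNetV2 pretrained on ImageNet as a frozen feature extractor, with a new
# head that maps a body image to one of the three somatotype components.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pretrained backbone; only the head is trained

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3, activation="softmax"),  # endomorphy / mesomorphy / ectomorphy
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```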

15 pages, 563 KiB  
Article
Camouflaged Object Detection Based on Deep Learning with Attention-Guided Edge Detection and Multi-Scale Context Fusion
by Yalin Wen, Wei Ke and Hao Sheng
Appl. Sci. 2024, 14(6), 2494; https://doi.org/10.3390/app14062494 - 15 Mar 2024
Viewed by 611
Abstract
In nature, camouflaged objects have features, such as colors and textures, that closely resemble their background. This creates visual illusions that help them hide from predators, and the same similarity makes detecting camouflaged objects very challenging. Camouflaged object detection (COD) methods based on deep neural networks are gaining increasing attention; they improve model performance and computational efficiency by extracting edge information and fusing multi-layer features. Our improvement targets the efficiency of the encode–decode process. We develop a variant model that combines the Swin Transformer (Swin-T) and EfficientNet-B7, integrating the strengths of both, and employ an attention-guided tracking module to efficiently extract edge information and identify objects in camouflaged environments. We also incorporate dense skip links to enhance the aggregation of deep-level feature information. A boundary-aware attention module is added to the final layer of the initial shallow-information recognition phase; it uses the Fourier transform to quickly relay specific edge information from the initially obtained shallow semantics to subsequent stages, thereby achieving feature recognition and edge extraction more effectively. In the later stage of deep semantic extraction, we employ a dense skip joint attention module to improve the decoder’s performance and efficiency in capturing precise deep-level information, feature recognition, and edge extraction; this module identifies the details and edge information of undetected camouflaged objects across channels and spatial locations. Unlike previous methods, we introduce an adaptive pixel-strength loss function for handling key captured information. Our method shows strong competitive performance on three current benchmark datasets (CHAMELEON, CAMO, COD10K); compared with 26 previously proposed methods over 4 evaluation metrics, it exhibits favorable competitiveness.
(This article belongs to the Special Issue Advances in Image Recognition and Processing Technologies)
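The abstract's use of the Fourier transform to relay edge information can be illustrated generically: suppressing low frequencies in a feature map leaves the high-frequency content that tends to follow object boundaries. The PyTorch fragment below is a hypothetical illustration of this idea, not the paper's boundary-aware attention module:

```python
import torch

def fourier_edge_cue(feat: torch.Tensor, cutoff: int = 8) -> torch.Tensor:
    """Derive a high-frequency (edge-like) attention cue from a feature map.

    feat: (B, C, H, W) feature tensor. A small block of low frequencies around
    the spectrum centre is zeroed out, keeping only fine structure.
    """
    spec = torch.fft.fftshift(torch.fft.fft2(feat), dim=(-2, -1))
    _, _, H, W = feat.shape
    cy, cx = H // 2, W // 2
    spec[..., cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = 0  # high-pass
    high = torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real
    return torch.sigmoid(high)  # squash to an attention-style map in (0, 1)

edge_map = fourier_edge_cue(torch.randn(1, 64, 128, 128))
print(edge_map.shape)  # torch.Size([1, 64, 128, 128])
```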

20 pages, 6853 KiB  
Article
GLBRF: Group-Based Lightweight Human Behavior Recognition Framework in Video Camera
by Young-Chan Lee, So-Yeon Lee, Byeongchang Kim and Dae-Young Kim
Appl. Sci. 2024, 14(6), 2424; https://doi.org/10.3390/app14062424 - 13 Mar 2024
Viewed by 406
Abstract
Behavior recognition, i.e., recognizing actions by analyzing human movement, is used in various fields, such as anomaly detection and health estimation. For this purpose, deep learning models are used to recognize and classify the features and patterns of each behavior. However, video-based behavior recognition models require substantial computational power because they are trained on large datasets, so a lightweight learning framework that can efficiently recognize various behaviors is needed. In this paper, we propose a group-based lightweight human behavior recognition framework (GLBRF) that achieves both low computational burden and high accuracy in video-based behavior recognition. GLBRF reduces computational cost by using a 2D CNN model with a relatively small dataset, and improves recognition accuracy by applying location-based grouping to recognize interaction behaviors between people. This enables efficient recognition of multiple behaviors across various services. With grouping, accuracy reached 98%; without grouping, it was a relatively low 68%.
(This article belongs to the Special Issue Advances in Image Recognition and Processing Technologies)
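The abstract does not spell out the grouping rule, but location-based grouping of person detections can be illustrated with a simple greedy scheme that merges detections whose centroids lie within a given radius of an existing group member (a hypothetical sketch, not the GLBRF implementation):

```python
import numpy as np

def group_by_location(centroids: np.ndarray, radius: float) -> list[list[int]]:
    """Greedy location-based grouping of person detections.

    Detections whose centroids lie within `radius` of any member of an
    existing group join that group, so interacting people are analysed
    together rather than one by one.

    centroids: (N, 2) array of (x, y) bounding-box centres.
    """
    groups: list[list[int]] = []
    for i, c in enumerate(centroids):
        placed = False
        for g in groups:
            # Join the first group containing a close-enough member.
            if any(np.linalg.norm(c - centroids[j]) <= radius for j in g):
                g.append(i)
                placed = True
                break
        if not placed:
            groups.append([i])  # start a new group for an isolated person
    return groups

# Example: four detections forming two close pairs -> two groups.
boxes = np.array([[100, 200], [120, 210], [400, 300], [415, 290]])
print(group_by_location(boxes, radius=50.0))  # [[0, 1], [2, 3]]
```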

20 pages, 7751 KiB  
Article
SCGFormer: Semantic Chebyshev Graph Convolution Transformer for 3D Human Pose Estimation
by Jiayao Liang and Mengxiao Yin
Appl. Sci. 2024, 14(4), 1646; https://doi.org/10.3390/app14041646 - 18 Feb 2024
Viewed by 663
Abstract
With the rapid advancement of deep learning, 3D human pose estimation has largely freed itself from reliance on manual annotation, and the effective utilization of joint features has become central: leveraging 2D joint information well can substantially improve the accuracy of 3D human skeleton prediction. In this paper, we propose the SCGFormer model to reduce the error in predicting human skeletal poses in three-dimensional space. The network architecture of SCGFormer comprises a Transformer and two distinct types of graph convolution, organized into two interconnected modules: SGraAttention and AcChebGconv. SGraAttention extracts global feature information from each 2D human joint, augmenting local feature learning by integrating prior knowledge of human joint relationships. Simultaneously, AcChebGconv broadens the receptive field for graph-structure information and constructs implicit joint relationships to aggregate more valuable adjacent features. SCGFormer is evaluated on the widely recognized benchmark datasets Human3.6M and MPI-INF-3DHP and achieves excellent results. In particular, on Human3.6M, our method achieves the best results on 9 of 15 actions, with an overall average error reduction of about 1.5 points compared with state-of-the-art methods, demonstrating the excellent performance of SCGFormer.
(This article belongs to the Special Issue Advances in Image Recognition and Processing Technologies)
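As background for the AcChebGconv module, the standard K-order Chebyshev graph convolution on which such layers build can be written compactly in PyTorch. This is a generic sketch of the textbook formulation, not the paper's module, which additionally constructs implicit joint relationships:

```python
import torch
import torch.nn as nn

class ChebGraphConv(nn.Module):
    """K-order Chebyshev graph convolution.

    Filters are polynomials T_k(L_hat) of the scaled graph Laplacian, so a
    K-order layer aggregates features from up to (K-1)-hop joint neighbourhoods.
    """
    def __init__(self, in_dim: int, out_dim: int, K: int):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(K, in_dim, out_dim) * 0.01)

    def forward(self, x: torch.Tensor, L_hat: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) joint features; L_hat: (N, N) scaled Laplacian,
        # L_hat = 2L / lambda_max - I, with eigenvalues in [-1, 1].
        Tx_prev, Tx = x, L_hat @ x          # T_0(L)x = x, T_1(L)x = Lx
        out = Tx_prev @ self.weights[0]
        if self.weights.shape[0] > 1:
            out = out + Tx @ self.weights[1]
        for k in range(2, self.weights.shape[0]):
            Tx_next = 2 * (L_hat @ Tx) - Tx_prev   # Chebyshev recurrence
            out = out + Tx_next @ self.weights[k]
            Tx_prev, Tx = Tx, Tx_next
        return out

# Example: 17 joints with 2D coordinates, 3rd-order filter, 64-dim output.
x = torch.randn(17, 2)
L_hat = torch.eye(17)  # placeholder Laplacian, for shape checking only
print(ChebGraphConv(2, 64, K=3)(x, L_hat).shape)  # torch.Size([17, 64])
```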
