Current Trends and Future Perspectives on Computer Vision and Pattern Recognition

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 July 2023) | Viewed by 17701

Special Issue Editors

Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
Interests: pattern recognition; computer vision
1. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
2. Center of Materials Science and Optoelectronics Engineering School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing 100049, China
Interests: pattern recognition; image classification; neural network; convolutional network; computer vision; object detection

Guest Editor
School of Computer Science, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland
Interests: IoT; social computing; intelligent transportation systems; social networks analysis; mobile edge computing

Special Issue Information

Dear Colleagues,

Recent advances in Computer Vision and Pattern Recognition have accelerated the development of intelligent applications across numerous industries and domains. Such solutions are not only seamlessly integrated into their environment but also typically adapt well to unexpected conditions, which increases their usefulness for real-world problems. Despite many successes, these methods still have several limitations, and their inner workings remain poorly understood. This remains a major challenge for deploying Computer Vision and Pattern Recognition algorithms in real-world scenarios. Therefore, this Special Issue, "Current Trends and Future Perspectives on Computer Vision and Pattern Recognition", seeks to collect the most recent approaches and findings and to discuss the current challenges of Computer Vision and Pattern Recognition solutions for a wide variety of applications. We expect this Special Issue to address research concerns in Computer Vision and Pattern Recognition and in closely linked fields such as Machine Learning, Data Mining and Image Processing. We encourage interdisciplinary study and application in these fields.

High-quality submissions presenting important new theories, methods, applications and systems in emerging areas of Computer Vision and Pattern Recognition are welcome. The topics of interest include, but are not limited to:

  • Interpretable Machine Learning for Computer Vision;
  • Computer vision theory;
  • Semi-supervised, weakly supervised and unsupervised learning frameworks for Pattern Recognition systems;
  • Embodied vision: active agents, simulation;
  • Automated Deep Learning, including one or multiple stages of the machine learning process (e.g., data pre-processing, network architecture selection, hyper-parameter optimisation);
  • 3D from multi-view, sensors and single images;
  • Multimodal learning;
  • Ethics/Privacy issues in deploying Pattern Recognition-based systems;
  • Virtual and augmented reality content and systems;
  • Benchmarks of current Pattern-Recognition-based solutions for real-world problems.

Dr. Weijun Li
Dr. Xin Ning
Dr. Sahraoui Dhelim
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • pattern-recognition systems
  • machine learning
  • computer vision
  • virtual reality
  • object detection and classification

Published Papers (9 papers)


Research

12 pages, 1861 KiB  
Article
Face Keypoint Detection Method Based on Blaze_ghost Network
by Ning Yu, Yongping Tian, Xiaochuan Zhang and Xiaofeng Yin
Appl. Sci. 2023, 13(18), 10385; https://doi.org/10.3390/app131810385 - 17 Sep 2023
Cited by 1 | Viewed by 956
Abstract
The accuracy and speed of facial keypoint detection are crucial factors for effectively extracting fatigue features, such as eye blinking and yawning. This paper focuses on the improvement and optimization of facial keypoint detection algorithms, presenting a facial keypoint detection method based on the Blaze_ghost network and providing more reliable support for facial fatigue analysis. Firstly, the Blaze_ghost network is designed as the backbone network with a deeper structure and more parameters to better capture facial detail features, improving the accuracy of keypoint localization. Secondly, HuberWingloss is designed as the loss function to further reduce the training difficulty of the model and enhance its generalization ability. Compared to traditional loss functions, HuberWingloss can reduce the interference of outliers (such as noise and occlusion) in model training, improve the model’s robustness to complex situations, and further enhance the accuracy of keypoint detection. Experimental results show that the proposed method achieves significant improvements in both the NME (Normal Mean Error) and FR (Failure Rate) evaluation metrics. Compared to traditional methods, the proposed model demonstrates a considerable improvement in keypoint localization accuracy while still maintaining high detection efficiency.

21 pages, 9265 KiB  
Article
Handwriting-Based Text Line Segmentation from Malayalam Documents
by Pearlsy P V and Deepa Sankar
Appl. Sci. 2023, 13(17), 9712; https://doi.org/10.3390/app13179712 - 28 Aug 2023
Viewed by 1221
Abstract
Optical character recognition systems for Malayalam handwritten documents have become an open research area. A major hindrance in this research is the unavailability of a benchmark database. Therefore, a new database of 402 Malayalam handwritten document images and ground truth images of 7535 text lines is developed for the implementation of the proposed technique. This paper proposes a technique for the extraction of text lines from handwritten documents in the Malayalam language, specifically based on the handwriting of the writer. Text lines are extracted based on horizontal and vertical projection values, the size of the handwritten characters, the height of the text lines and the curved nature of the Malayalam alphabet. The proposed technique is able to overcome incorrect segmentation due to the presence of characters written with spaces above or below other characters and the overlapping of lines because of ascenders and descenders. The performance of the proposed method for text line extraction is quantitatively evaluated using the MatchScore value metric and is found to be 85.507%. The recognition accuracy, detection rate and F-measure of the proposed method are found to be 99.39%, 85.5% and 91.92%, respectively. It is experimentally verified that the proposed method outperforms some of the existing language-independent text line extraction algorithms.

18 pages, 9092 KiB  
Article
Motion Capture for Sporting Events Based on Graph Convolutional Neural Networks and Single Target Pose Estimation Algorithms
by Chengpeng Duan, Bingliang Hu, Wei Liu and Jie Song
Appl. Sci. 2023, 13(13), 7611; https://doi.org/10.3390/app13137611 - 27 Jun 2023
Cited by 5 | Viewed by 1705
Abstract
Human pose estimation refers to accurately estimating the position of the human body from a single RGB image and detecting the location of the body. It serves as the basis for several computer vision tasks, such as human tracking, 3D reconstruction, and autonomous driving. Improving the accuracy of pose estimation has significant implications for the advancement of computer vision. This paper addresses the limitations of single-branch networks in pose estimation. It presents a top-down single-target pose estimation approach based on multi-branch self-calibrating networks combined with graph convolutional neural networks. The study focuses on two aspects: human body detection and human body pose estimation. The human body detection is for athletes appearing in sports competitions, followed by human body pose estimation, which is divided into two methods: coordinate regression-based and heatmap test-based. To improve the accuracy of the heatmap test, the high-resolution feature map output from HRNet is used for deconvolution to improve the accuracy of single-target pose estimation recognition.

17 pages, 4144 KiB  
Article
High Speed and Accuracy of Animation 3D Pose Recognition Based on an Improved Deep Convolution Neural Network
by Wei Ding and Wenfa Li
Appl. Sci. 2023, 13(13), 7566; https://doi.org/10.3390/app13137566 - 27 Jun 2023
Cited by 4 | Viewed by 1249
Abstract
Pose recognition in character animations is an important avenue of research in computer graphics. However, the current use of traditional artificial intelligence algorithms to recognize animation gestures faces hurdles such as low accuracy and speed. Therefore, to overcome the above problems, this paper proposes a real-time 3D pose recognition system, which includes both facial and body poses, based on deep convolutional neural networks and further designs a single-purpose 3D pose estimation system. First, we transformed the human pose extracted from the input image to an abstract pose data structure. Subsequently, we generated the required character animation at runtime based on the transformed dataset. This challenges the conventional concept of monocular 3D pose estimation, which is extremely difficult to achieve. It can also achieve real-time running speed at a resolution of 384 fps. The proposed method was used to identify multiple-character animation using multiple datasets (Microsoft COCO 2014, CMU Panoptic, Human3.6M, and JTA). The results indicated that the improved algorithm improved the recognition accuracy and performance by approximately 3.5% and 8–10 times, respectively, which is significantly superior to other classic algorithms. Furthermore, we tested the proposed system on multiple pose-recognition datasets. The 3D attitude estimation system speed can reach 24 fps with an error of 100 mm, which is considerably less than that of the 2D attitude estimation system with a speed of 60 fps. The pose recognition based on deep learning proposed in this study yielded surprisingly superior performance, proving that the use of deep-learning technology for image recognition has great potential.

17 pages, 1000 KiB  
Article
A Financial Time-Series Prediction Model Based on Multiplex Attention and Linear Transformer Structure
by Caosen Xu, Jingyuan Li, Bing Feng and Baoli Lu
Appl. Sci. 2023, 13(8), 5175; https://doi.org/10.3390/app13085175 - 21 Apr 2023
Cited by 5 | Viewed by 4056
Abstract
Financial time-series prediction has been an important topic in deep learning, and the prediction of financial time series is of great importance to investors, commercial banks and regulators. This paper proposes a model based on multiplexed attention mechanisms and linear transformers to predict financial time series. The linear transformer model has a faster model training efficiency and a long-time forecasting capability. Using a linear transformer reduces the original transformer’s complexity and preserves the decoder’s multiplexed attention mechanism. The results show that the proposed method can effectively improve the prediction accuracy of the model, increase the inference speed of the model and reduce the number of operations, which has new implications for the prediction of financial time series.

19 pages, 5181 KiB  
Article
Deep Clustering Efficient Learning Network for Motion Recognition Based on Self-Attention Mechanism
by Tielin Ru and Ziheng Zhu
Appl. Sci. 2023, 13(5), 2996; https://doi.org/10.3390/app13052996 - 26 Feb 2023
Cited by 1 | Viewed by 1071
Abstract
Multi-person behavior event recognition has become an increasingly challenging research field in human–computer interaction. With the rapid development of deep learning and computer vision, it plays an important role in the inference and analysis of real sports events, that is, given the video frequency of sports events, when letting it analyze and judge the behavior trend of athletes, often faced with the limitations of large-scale data sets and hardware, it takes a lot of time, and the accuracy of the results is not high. Therefore, we propose a deep clustering learning network for motion recognition under the self-attention mechanism, which can efficiently solve the accuracy and efficiency problems of sports event analysis and judgment. This method can not only solve the problem of gradient disappearance and explosion in the recurrent neural network (RNN), but also capture the internal correlation between multiple people on the sports field for identification, etc., by using the long and short-term memory network (LSTM), and combine the motion coding information in the key frames with the deep embedded clustering (DEC) to better analyze and judge the complex behavior change types of athletes. In addition, by using the self-attention mechanism, we can not only analyze the whole process of the sports video macroscopically, but also focus on the specific attributes of the movement, extract the key posture features of the athletes, further enhance the features, effectively reduce the amount of parameters in the calculation process of self-attention, reduce the computational complexity, and maintain the ability to capture details. The accuracy and efficiency of reasoning and judgment are improved. Through verification on large video datasets of mainstream sports, we achieved high accuracy and improved the efficiency of inference and prediction. It is proved that the method is effective and feasible in the analysis and reasoning of sports videos.

16 pages, 3975 KiB  
Article
An Attention-Based Method for Remaining Useful Life Prediction of Rotating Machinery
by Yaohua Deng, Chengwang Guo, Zilin Zhang, Linfeng Zou, Xiali Liu and Shengyu Lin
Appl. Sci. 2023, 13(4), 2622; https://doi.org/10.3390/app13042622 - 17 Feb 2023
Cited by 4 | Viewed by 1510
Abstract
Data imbalance and large data probability distribution discrepancies are major factors that reduce the accuracy of remaining useful life (RUL) prediction of high-reliability rotating machinery. In feature extraction, most deep transfer learning models consider the overall features but rarely attend to the local target features that are useful for RUL prediction; insufficient attention paid to local features reduces the accuracy and reliability of prediction. By considering the contribution of input data to the modeling output, a deep learning model that incorporates the attention mechanism in feature selection and extraction is proposed in our work; an unsupervised clustering method for classification of rotating machinery performance state evolution is put forward, and a similarity function is used to calculate the expected attention of input data to build an input data extraction attention module; the module is then fused with a gated recurrent unit (GRU), a variant of a recurrent neural network, to construct an attention-GRU model that combines prediction calculation and weight calculation for RUL prediction. Tests on public datasets show that the attention-GRU model outperforms traditional GRU and LSTM in RUL prediction, achieves less prediction error, and improves the performance and stability of the model.

12 pages, 5813 KiB  
Article
Point Cloud Repair Method via Convex Set Theory
by Tianzhen Dong, Yi Zhang, Mengying Li and Yuntao Bai
Appl. Sci. 2023, 13(3), 1830; https://doi.org/10.3390/app13031830 - 31 Jan 2023
Cited by 2 | Viewed by 1510
Abstract
The point cloud is the basis for 3D object surface reconstruction. An incomplete point cloud significantly reduces the accuracy of downstream work such as 3D object reconstruction and recognition. Therefore, point-cloud repair is indispensable work. However, the original shape of the point cloud is difficult to restore due to the uncertainty of the position of the new filling point. Considering the advantages of the convex set in dealing with uncertainty problems, we propose a point-cloud repair method via a convex set that transforms a point-cloud repair problem into a construction problem of the convex set. The core idea of the proposed method is to discretize the hole boundary area into multiple subunits and add new 3D points to the specific subunit according to the construction properties of the convex set. Specific subunits must be located in the hole area. For the selection of the specific subunit, we introduced Markov random fields (MRF) to transform them into the maximal a posteriori (MAP) estimation problem of random field labels. Variational inference was used to approximate MAP and calculate the specific subunit that needed to add new points. Our method iteratively selects specific subunits and adds new filling points. With the increasing number of iterations, the specific subunits gradually move to the center of the hole region until the hole is completely repaired. The quantitative and qualitative results of the experiments demonstrate that our method was superior to the compared method.

16 pages, 3411 KiB  
Article
Monocular 3D Object Detection Based on Pseudo Multimodal Information Extraction and Keypoint Estimation
by Dan Zhao, Chaofeng Ji and Guizhong Liu
Appl. Sci. 2023, 13(3), 1731; https://doi.org/10.3390/app13031731 - 29 Jan 2023
Cited by 2 | Viewed by 1876
Abstract
Three-dimensional object detection is an essential and fundamental task in the field of computer vision which can be widely used in various scenarios such as autonomous driving and visual navigation. In view of the current insufficient utilization of image information in current monocular camera-based 3D object detection algorithms, we propose a monocular 3D object detection algorithm based on pseudo-multimodal information extraction and keypoint estimation. We utilize the original image to generate pseudo-lidar and a bird’s-eye view, and then feed the fused data of the original image and pseudo-lidar to the keypoint-based network for an initial 3D box estimation, finally using the bird’s-eye view to refine the initial 3D box. The experimental performance of our method exceeds state-of-the-art algorithms under the evaluation criteria of 3D object detection and localization on the KITTI dataset, achieving the best experimental performance so far.
