State-of-the-Art of Computer Vision and Pattern Recognition

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 August 2024 | Viewed by 4770

Special Issue Editors


Prof. Dr. Hyeonjoon Moon
Guest Editor
Department of Computer Science and Engineering, Sejong University, Seoul, Republic of Korea
Interests: big data; computer vision; pattern recognition; biometrics; deep learning

Dr. Lien Minh Dang
Guest Editor
Department of Information and Communication Engineering, and Convergence Engineering for Intelligent Drone, Sejong University, Seoul 05006, Republic of Korea
Interests: deep learning; object detection; NLP; pattern recognition; computer vision

Special Issue Information

Dear Colleagues,

In the rapidly evolving field of computer vision and pattern recognition, continuous advancements are reshaping the way we perceive and interact with visual data. This Special Issue presents the latest breakthroughs and innovations in these domains, offering a comprehensive snapshot of the cutting-edge research that is pushing the boundaries of what is possible.

The Special Issue will cover a wide spectrum of topics, including, but not limited to, image classification, object detection, image segmentation, video analysis, deep learning, feature extraction, face recognition, and gesture recognition. Contributions will explore novel algorithms, architectures, methodologies, and applications that contribute to the enhanced understanding and interpretation of visual data. Additionally, the issue will delve into the fusion of computer vision and pattern recognition, highlighting the synergies between these two fields and their combined potential to revolutionize various industries.

We invite researchers, practitioners, and experts in computer vision and pattern recognition to submit their original research, reviews, and case studies. The Special Issue aims to foster interdisciplinary collaboration, enabling researchers to share their insights, experiences, and challenges. By addressing both theoretical and practical aspects, this collection of articles will not only provide a comprehensive overview of recent advances but also serve as a valuable resource for researchers, practitioners, and educators in the field.

Prof. Dr. Hyeonjoon Moon
Dr. Lien Minh Dang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image classification
  • object detection
  • image segmentation
  • video analysis
  • deep learning
  • feature extraction
  • gesture recognition
  • pattern recognition
  • computer vision
  • face recognition

Published Papers (5 papers)


Research

14 pages, 6250 KiB  
Article
Emotion Recognition beyond Pixels: Leveraging Facial Point Landmark Meshes
by Herag Arabian, Tamer Abdulbaki Alshirbaji, J. Geoffrey Chase and Knut Moeller
Appl. Sci. 2024, 14(8), 3358; https://doi.org/10.3390/app14083358 - 16 Apr 2024
Viewed by 322
Abstract
Digital health apps have become a staple in daily life, promoting awareness and providing motivation for a healthier lifestyle. With an already overwhelmed healthcare system, digital therapies offer relief to patient and physician alike. One such planned digital therapy application is the incorporation of an emotion recognition model as a tool for therapeutic interventions for people with autism spectrum disorder (ASD), diagnoses of which have risen rapidly in recent years. To ensure effective recognition of expressions, a system is designed to analyze and classify different emotions from facial landmarks. Facial landmarks combined with a corresponding mesh have the potential to bypass the robustness hurdles that commonly affect emotion recognition from images. Landmarks are extracted from facial images using the Mediapipe framework, after which a custom mesh is constructed from the detected landmarks and used as input to a graph convolution network (GCN) model for emotion classification. The GCN makes use of the relations formed by the mesh along with the extracted spatial distance features. A weighted loss approach is also utilized to reduce the effects of an imbalanced dataset. The model was trained and evaluated with the Aff-Wild2 database. The results yielded a 58.76% mean accuracy on the selected validation set. The proposed approach shows the potential and limitations of using GCNs for emotion recognition in real-world scenarios.
(This article belongs to the Special Issue State-of-the-Art of Computer Vision and Pattern Recognition)
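To make the pipeline concrete, here is a minimal sketch of the landmark-to-GCN idea: Mediapipe FaceMesh yields 468 (x, y, z) landmarks per face, a k-nearest-neighbour graph stands in for the paper's custom mesh (whose exact construction is not given here), and a small two-layer graph convolution with mean pooling produces class logits. Layer sizes, k, and the 7-class head are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

def knn_adjacency(pts, k=8):
    """Symmetric, normalized k-NN graph over landmark coordinates
    (a stand-in for the paper's custom mesh, which is not specified here)."""
    d = torch.cdist(pts, pts)                          # (N, N) pairwise distances
    idx = d.topk(k + 1, largest=False).indices[:, 1:]  # nearest neighbours, skip self
    adj = torch.zeros(pts.size(0), pts.size(0))
    adj.scatter_(1, idx, 1.0)
    adj = ((adj + adj.t()) > 0).float()                # symmetrize
    adj = adj + torch.eye(pts.size(0))                 # add self-loops
    d_inv = adj.sum(1).pow(-0.5)
    return d_inv.unsqueeze(1) * adj * d_inv.unsqueeze(0)  # D^-1/2 (A+I) D^-1/2

class LandmarkGCN(nn.Module):
    def __init__(self, in_dim=3, hidden=64, num_classes=7):  # 7 classes assumed
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x, adj):
        x = torch.relu(self.fc1(adj @ x))   # graph convolution: aggregate, then transform
        x = torch.relu(self.fc2(adj @ x))
        return self.head(x.mean(dim=0))     # mean-pool nodes -> emotion logits

# 468 Mediapipe FaceMesh landmarks; random values stand in for real detector output.
pts = torch.rand(468, 3)
logits = LandmarkGCN()(pts, knn_adjacency(pts))
```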

19 pages, 2204 KiB  
Article
Polyp Generalization via Diversifying Style at Feature-Level Space
by Sahadev Poudel and Sang-Woong Lee
Appl. Sci. 2024, 14(7), 2780; https://doi.org/10.3390/app14072780 - 26 Mar 2024
Viewed by 237
Abstract
In polyp segmentation, the latest notable topic revolves around polyp generalization, which aims to develop deep learning-based models capable of learning from single or multiple source domains and applying this knowledge to unseen datasets. A significant challenge in real-world clinical settings is the suboptimal performance of generalized models due to domain shift. Convolutional neural networks (CNNs) are often biased towards low-level features, such as style features, impacting generalization. Despite attempts to mitigate this bias using data augmentation techniques, learning model-agnostic and class-specific feature representations remains complex. Previous methods have employed image-level transformations with styles to supplement training data diversity. However, these approaches face limitations in ensuring style diversity due to restricted style sources, limiting the utilization of the potential style space. To address this, we propose a straightforward yet effective style conversion and generation module integrated into the UNet model. This module transfers diverse yet plausible style features to the original training data at the feature-level space, ensuring that generated styles align closely with the original data. Our method demonstrates superior performance in single-domain generalization tasks across five datasets compared to prior methods.
(This article belongs to the Special Issue State-of-the-Art of Computer Vision and Pattern Recognition)
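The feature-level style idea can be illustrated with a short sketch in the spirit of AdaIN/MixStyle-type augmentation: channel-wise feature statistics are treated as "style", normalized away, and replaced with perturbed statistics during training. This is a generic stand-in, not the authors' exact style conversion and generation module; the noise model and scale are assumptions.

```python
import torch
import torch.nn as nn

class FeatureStyleRandomizer(nn.Module):
    """Perturbs channel-wise feature statistics during training so the network
    sees diverse but plausible 'styles' (a generic stand-in for the paper's
    style conversion/generation module)."""

    def __init__(self, noise_scale=0.1, eps=1e-6):
        super().__init__()
        self.noise_scale = noise_scale
        self.eps = eps

    def forward(self, x):                         # x: (B, C, H, W) feature map
        if not self.training:
            return x
        mu = x.mean(dim=(2, 3), keepdim=True)     # per-channel style statistics
        sig = x.std(dim=(2, 3), keepdim=True) + self.eps
        x_norm = (x - mu) / sig                   # strip the original style
        # Sample new statistics close to the originals (plausible styles).
        new_mu = mu * (1 + self.noise_scale * torch.randn_like(mu))
        new_sig = sig * (1 + self.noise_scale * torch.randn_like(sig)).clamp(min=0.1)
        return x_norm * new_sig + new_mu          # re-style the features

# Typical use: drop it between UNet encoder blocks during training.
layer = FeatureStyleRandomizer().train()
out = layer(torch.randn(4, 64, 32, 32))          # same shape, perturbed style
```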

14 pages, 5220 KiB  
Article
Land-Cover Classification Using Deep Learning with High-Resolution Remote-Sensing Imagery
by Muhammad Fayaz, Junyoung Nam, L. Minh Dang, Hyoung-Kyu Song and Hyeonjoon Moon
Appl. Sci. 2024, 14(5), 1844; https://doi.org/10.3390/app14051844 - 23 Feb 2024
Viewed by 842
Abstract
Land-area classification (LAC) research offers a promising avenue to address the intricacies of urban planning, agricultural zoning, and environmental monitoring, with a specific focus on urban areas and their complex land-usage patterns. Advances in high-resolution satellite imagery and machine learning strategies, particularly convolutional neural networks (CNNs), have significantly propelled LAC research. Accurate LAC is paramount for informed urban development and effective land management, yet traditional remote-sensing methods encounter limitations in precisely classifying dynamic and complex urban land areas. In this study, we therefore investigated the application of transfer learning with fine-tuned Inception-v3 and DenseNet121 architectures to establish a reliable LAC system for identifying urban land-use classes. Transfer learning allows the LAC system to benefit from features pre-trained on large datasets, enhancing generalization and performance compared to training from scratch, and makes effective use of limited labeled data during fine-tuning, which is valuable for complex urban land classification tasks. Through experiments conducted on the UC-Merced_LandUse dataset, we demonstrate the effectiveness of our approach, achieving 92% accuracy, 93% recall, 92% precision, and a 92% F1-score. Heatmap analysis further elucidates the decision-making process of the models, providing insights into the classification mechanism. The successful application of CNNs in LAC, coupled with heatmap analysis, opens promising avenues for enhanced urban planning, agricultural zoning, and environmental monitoring through more accurate and automated land-area classification.
(This article belongs to the Special Issue State-of-the-Art of Computer Vision and Pattern Recognition)
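As a rough illustration of this transfer-learning setup, the torchvision sketch below loads an ImageNet-pre-trained DenseNet121, optionally freezes the convolutional backbone, and swaps in a new head for the 21 UC-Merced land-use classes. The freezing policy, optimizer, and learning rate are illustrative assumptions (the paper fine-tunes Inception-v3 as well).

```python
import torch
import torch.nn as nn
import torchvision.models as models

def build_landcover_model(num_classes=21, freeze_backbone=True):
    """DenseNet121 fine-tuning sketch for UC-Merced (21 land-use classes)."""
    model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
    if freeze_backbone:
        for p in model.features.parameters():   # freeze pre-trained conv features
            p.requires_grad = False
    # Replace the ImageNet head with a 21-way land-use classifier.
    model.classifier = nn.Linear(model.classifier.in_features, num_classes)
    return model

model = build_landcover_model()
# Fine-tune only the parameters that still require gradients.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```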

18 pages, 1398 KiB  
Article
A Deep Bidirectional LSTM Model Enhanced by Transfer-Learning-Based Feature Extraction for Dynamic Human Activity Recognition
by Najmul Hassan, Abu Saleh Musa Miah and Jungpil Shin
Appl. Sci. 2024, 14(2), 603; https://doi.org/10.3390/app14020603 - 10 Jan 2024
Cited by 3 | Viewed by 1866
Abstract
Dynamic human activity recognition (HAR) is a domain of study that is currently receiving considerable attention within the fields of computer vision and pattern recognition. The growing need for artificial-intelligence (AI)-driven systems to evaluate human behaviour and bolster security underscores the timeliness of this research. Despite the strides made by numerous researchers in developing dynamic HAR frameworks utilizing diverse pre-trained architectures for feature extraction and classification, persisting challenges include suboptimal accuracy and the computational intricacies inherent in existing systems. These challenges arise from the vast video-based datasets and the inherent similarity of the data. To address them, we propose an innovative dynamic HAR technique employing a deep bidirectional long short-term memory (Deep BiLSTM) model facilitated by a pre-trained, transfer-learning-based feature-extraction approach. Our approach begins with a convolutional neural network (CNN), specifically MobileNetV2, for extracting deep-level features from video frames. These features are then fed into an optimized Deep BiLSTM network to discern dependencies and process the data, enabling optimal predictions. During the testing phase, an iterative fine-tuning procedure is introduced to update the hyperparameters of the trained model, ensuring adaptability to varying scenarios. The proposed model's efficacy was rigorously evaluated using three benchmark datasets, namely UCF11, UCF Sport, and JHMDB, achieving notable accuracies of 99.20%, 93.3%, and 76.30%, respectively. This high accuracy substantiates the superiority of our proposed model, signaling a promising advancement in the domain of activity recognition.
(This article belongs to the Special Issue State-of-the-Art of Computer Vision and Pattern Recognition)
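A minimal sketch of this frame-feature-plus-sequence pipeline: a frozen ImageNet-pre-trained MobileNetV2 produces one 1280-dimensional vector per frame, and a two-layer bidirectional LSTM classifies the sequence. The hidden size, pooling, and the 11-class head (matching UCF11) are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CNNBiLSTM(nn.Module):
    def __init__(self, hidden=256, num_layers=2, num_classes=11):
        super().__init__()
        mnet = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
        self.backbone = mnet.features           # frozen frame-feature extractor
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.pool = nn.AdaptiveAvgPool2d(1)     # -> 1280-d vector per frame
        self.bilstm = nn.LSTM(1280, hidden, num_layers=num_layers,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, clips):                   # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        frames = clips.flatten(0, 1)            # (B*T, 3, H, W)
        feats = self.pool(self.backbone(frames)).flatten(1)  # (B*T, 1280)
        out, _ = self.bilstm(feats.view(b, t, -1))           # (B, T, 2*hidden)
        return self.head(out[:, -1])            # classify from the last time step

model = CNNBiLSTM().eval()
logits = model(torch.randn(2, 8, 3, 224, 224))  # two 8-frame clips
```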

16 pages, 10304 KiB  
Article
BWLM: A Balanced Weight Learning Mechanism for Long-Tailed Image Recognition
by Baoyu Fan, Han Ma, Yue Liu and Xiaochen Yuan
Appl. Sci. 2024, 14(1), 454; https://doi.org/10.3390/app14010454 - 4 Jan 2024
Viewed by 774
Abstract
With the growth of data in the real world, datasets often exhibit a long-tailed distribution of class sample sizes. Existing solutions for long-tailed image recognition usually adopt a class rebalancing strategy, such as reweighting based on the effective sample size of each class, which leans towards common classes in terms of higher accuracy. However, the key to long-tailed image recognition is to increase the accuracy of rare classes while maintaining the accuracy of common classes. This research explores a direction that balances the accuracy of common and rare classes simultaneously. Firstly, two-stage training is adopted, motivated by the use of transfer learning to balance features of common and rare classes. Secondly, a balanced weight function called the Balanced Focal Softmax (BFS) loss is proposed, which combines a balanced softmax loss focusing on common classes with a balanced focal loss focusing on rare classes to achieve dual balance in long-tailed image recognition. Subsequently, a Balanced Weight Learning Mechanism (BWLM) is proposed to further exploit weight decay: used as the weight-balancing technique for the BFS loss, weight decay encourages the model to learn smaller, balanced weights by penalizing larger ones. Extensive experiments on five long-tailed image datasets show that transferring the weights from the first stage to the second alleviates the bias of naive models toward common classes. The proposed BWLM not only balances the weights of common and rare classes but also greatly improves the accuracy of long-tailed image recognition, outperforming many state-of-the-art algorithms.
(This article belongs to the Special Issue State-of-the-Art of Computer Vision and Pattern Recognition)
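The BFS idea can be sketched as follows, with the caveat that only the ingredients are taken from the abstract: a balanced-softmax adjustment shifts the logits by the log class priors, and a focal factor down-weights easy examples. The simple weighted sum used to combine the two terms here is an assumption; the paper's exact formulation may differ. The second-stage weight balancing via decay would then be applied through the optimizer.

```python
import torch
import torch.nn.functional as F

def balanced_focal_softmax_loss(logits, targets, class_counts,
                                gamma=2.0, mix=0.5):
    """Sketch of a Balanced Focal Softmax (BFS)-style loss: balanced softmax
    (logits shifted by log class priors) combined with a focal term.
    The 50/50 mix is an illustrative assumption."""
    log_prior = torch.log(class_counts.float() / class_counts.sum())
    adj = logits + log_prior                        # balanced-softmax adjustment
    log_p = F.log_softmax(adj, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
    balanced_ce = -log_pt                           # balanced softmax loss
    focal = -((1 - log_pt.exp()) ** gamma) * log_pt # focal modulation of the same term
    return (mix * balanced_ce + (1 - mix) * focal).mean()

# Example: 3 classes with a long-tailed count vector.
counts = torch.tensor([1000, 100, 10])
loss = balanced_focal_softmax_loss(torch.randn(8, 3),
                                   torch.randint(0, 3, (8,)), counts)
# Stage-two weight balancing via decay could then use, e.g.:
# torch.optim.AdamW(model.parameters(), weight_decay=5e-3)
```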
