AI-Driven Sensing for Image Processing and Recognition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 20 May 2024 | Viewed by 13563

Special Issue Editors


Prof. Dr. Huafeng Li
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
Interests: computer vision; image processing; pattern recognition; machine learning

Dr. Yu Liu
Department of Biomedical Engineering, Hefei University of Technology, Hefei 230009, China
Interests: image fusion; image super-resolution; visual recognition; biomedical image analysis; machine learning; computer vision

Dr. Guanqiu Qi
Department of Computer Information Systems, State University of New York at Buffalo State, Buffalo, NY 14222, USA
Interests: computer vision; image processing; pattern recognition; machine learning

Dr. Zhiqin Zhu
College of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Interests: image color analysis; image enhancement; image fusion; image restoration

Special Issue Information

Dear Colleagues,

Image processing and recognition have attracted much attention due to the recent rapid development of artificial intelligence and the rising quality requirements of practical applications. Although existing methods have achieved remarkable success on previously intractable problems, more challenging problems continue to emerge. For example, learning-based image-fusion methods have recently made great progress on pre-registered multimodal data; however, they face severe issues with misaligned multimodal data because of spatial deformation and the difficulty of narrowing the cross-modality discrepancy. Overcoming this limitation calls for new algorithms, architectures, and techniques. From an architectural point of view, a two-stage design that registers the data before applying the fusion technique is an attractive solution, and similar designs may also benefit other image processing and recognition problems. Innovative architectures and design processes of this kind are already being explored in practical applications and are injecting new vitality into the industry.
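
To make the two-stage idea above concrete, the sketch below registers a misaligned infrared image to a visible reference with OpenCV's ECC alignment and then applies a plain weighted average as a stand-in for a learned fusion network. The file names, warp model, and fusion rule are illustrative assumptions, not a specific published method.

```python
# Minimal sketch of a "register, then fuse" pipeline for misaligned multimodal images.
# Illustrative only; file names and the fusion rule are placeholders.
import cv2
import numpy as np

visible = cv2.imread("visible.png", cv2.IMREAD_GRAYSCALE)    # reference modality
infrared = cv2.imread("infrared.png", cv2.IMREAD_GRAYSCALE)  # modality to align

# Stage 1: registration -- estimate an affine warp that maps the infrared image onto the visible one.
warp = np.eye(2, 3, dtype=np.float32)
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)
_, warp = cv2.findTransformECC(visible, infrared, warp, cv2.MOTION_AFFINE, criteria)
aligned_ir = cv2.warpAffine(infrared, warp, (visible.shape[1], visible.shape[0]),
                            flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)

# Stage 2: fusion -- a plain weighted average stands in for a learned fusion network.
fused = cv2.addWeighted(visible, 0.5, aligned_ir, 0.5, 0.0)
cv2.imwrite("fused.png", fused)
```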

This Special Issue aims to cover the latest progress and future trends in image processing and recognition based on artificial intelligence. Topics of interest include, but are not limited to:

  • Image fusion
  • Image registration
  • Image inpainting and restoration
  • Structural pattern recognition
  • Performance evaluation and benchmark datasets
  • Machine learning
  • Neural networks and deep learning
  • Action recognition
  • Pattern classification and clustering
  • Object detection, tracking and recognition
  • Medical imaging
  • Medical image analysis or segmentation
  • High dynamic range (HDR) imaging
  • Cross-modal retrieval and identification
  • Multimodal feature fusion
  • Remote sensing images
  • Remote sensing image fusion

For more information or advice, please contact the Special Issue Editor, Penelope Wang, directly at <penelope.wang@mdpi.com>.

Prof. Dr. Huafeng Li
Dr. Yu Liu
Dr. Guanqiu Qi
Dr. Zhiqin Zhu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • image processing
  • pattern recognition
  • machine learning

Published Papers (12 papers)


Research

14 pages, 1984 KiB  
Article
Image Filtering to Improve Maize Tassel Detection Accuracy Using Machine Learning Algorithms
by Eric Rodene, Gayara Demini Fernando, Ved Piyush, Yufeng Ge, James C. Schnable, Souparno Ghosh and Jinliang Yang
Sensors 2024, 24(7), 2172; https://doi.org/10.3390/s24072172 - 28 Mar 2024
Viewed by 569
Abstract
Unmanned aerial vehicle (UAV)-based imagery has become widely used to collect time-series agronomic data, which are then incorporated into plant breeding programs to enhance crop improvements. To make efficient analysis possible, in this study, by leveraging an aerial photography dataset for a field trial of 233 different inbred lines from the maize diversity panel, we developed machine learning methods for obtaining automated tassel counts at the plot level. We employed both an object-based counting-by-detection (CBD) approach and a density-based counting-by-regression (CBR) approach. Using an image segmentation method that removes most of the pixels not associated with the plant tassels, the results showed a dramatic improvement in the accuracy of object-based (CBD) detection, with the cross-validation prediction accuracy (r2) peaking at 0.7033 on a detector trained with images with a filter threshold of 90. The CBR approach showed the greatest accuracy when using unfiltered images, with a mean absolute error (MAE) of 7.99. However, when using bootstrapping, images filtered at a threshold of 90 showed a slightly better MAE (8.65) than the unfiltered images (8.90). These methods will allow for accurate estimates of flowering-related traits and help to make breeding decisions for crop improvement. Full article
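
As a rough illustration of the filtering step described above, the snippet below zeroes out pixels whose grayscale intensity falls below a threshold (90, the value reported in the abstract) before the image is passed to a detector. The exact segmentation rule used in the study is not reproduced here; this is a generic stand-in, and the file names are placeholders.

```python
# Generic pre-detection filter: keep pixels above an intensity threshold, black out the rest.
import cv2
import numpy as np

def filter_image(bgr: np.ndarray, threshold: int = 90) -> np.ndarray:
    """Keep pixels whose grayscale intensity exceeds `threshold`; zero out the rest."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    mask = (gray >= threshold).astype(np.uint8)   # 1 where a pixel may belong to a tassel
    return bgr * mask[:, :, None]                 # suppress non-tassel pixels

plot_image = cv2.imread("plot_uav_image.png")     # hypothetical UAV plot image
filtered = filter_image(plot_image, threshold=90)
cv2.imwrite("plot_filtered.png", filtered)        # feed this to the CBD/CBR model
```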

14 pages, 4337 KiB  
Article
Double-Branch Multi-Scale Contextual Network: A Model for Multi-Scale Street Tree Segmentation in High-Resolution Remote Sensing Images
by Hongyang Zhang and Shuo Liu
Sensors 2024, 24(4), 1110; https://doi.org/10.3390/s24041110 - 08 Feb 2024
Viewed by 620
Abstract
Street trees are of great importance to urban green spaces. Quick and accurate segmentation of street trees from high-resolution remote sensing images is of great significance in urban green space management. However, traditional segmentation methods can easily miss some targets because of the different sizes of street trees. To solve this problem, we propose the Double-Branch Multi-Scale Contextual Network (DB-MSC Net), which has two branches and a Multi-Scale Contextual (MSC) block in the encoder. The MSC block combines parallel dilated convolutional layers and transformer blocks to enhance the network’s multi-scale feature extraction ability. A channel attention mechanism (CAM) is added to the decoder to assign weights to features from RGB images and the normalized difference vegetation index (NDVI). We proposed a benchmark dataset to test the improvement of our network. Experimental research showed that the DB-MSC Net demonstrated good performance compared with typical methods like Unet, HRnet, SETR and recent methods. The overall accuracy (OA) was improved by at least 0.16% and the mean intersection over union was improved by at least 1.13%. The model’s segmentation accuracy meets the requirements of urban green space management. Full article
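
The parallel-dilated-convolution part of an MSC-style block can be sketched as below in PyTorch; the dilation rates and channel sizes are assumptions, and the transformer branch mentioned in the abstract is omitted.

```python
# Parallel dilated convolutions fused by a 1x1 convolution (multi-scale context sketch).
import torch
import torch.nn as nn

class ParallelDilatedBlock(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):
        # Each branch sees a different receptive field; concatenation mixes the scales.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 64, 128, 128)
print(ParallelDilatedBlock(64)(x).shape)  # torch.Size([1, 64, 128, 128])
```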

24 pages, 11963 KiB  
Article
A Ship Detection Model Based on Dynamic Convolution and an Adaptive Fusion Network for Complex Maritime Conditions
by Zhisheng Li, Zhihui Deng, Kun Hao, Xiaofang Zhao and Zhigang Jin
Sensors 2024, 24(3), 859; https://doi.org/10.3390/s24030859 - 28 Jan 2024
Viewed by 943
Abstract
Ship detection is vital for maritime safety and vessel monitoring, but challenges like false and missed detections persist, particularly in complex backgrounds, multiple scales, and adverse weather conditions. This paper presents YOLO-Vessel, a ship detection model built upon YOLOv7, which incorporates several innovations to improve its performance. First, we devised a novel backbone network structure called Efficient Layer Aggregation Networks and Omni-Dimensional Dynamic Convolution (ELAN-ODConv). This architecture effectively addresses the complex background interference commonly encountered in maritime ship images, thereby improving the model’s feature extraction capabilities. Additionally, we introduce the space-to-depth structure in the head network, which can solve the problem of small ship targets in images that are difficult to detect. Furthermore, we introduced ASFFPredict, a predictive network structure addressing scale variation among ship types, bolstering multiscale ship target detection. Experimental results demonstrate YOLO-Vessel’s effectiveness, achieving a 78.3% mean average precision (mAP), surpassing YOLOv7 by 2.3% and Faster R-CNN by 11.6%. It maintains real-time detection at 8.0 ms/frame, meeting real-time ship detection needs. Evaluation in adverse weather conditions confirms YOLO-Vessel’s superiority in ship detection, offering a robust solution to maritime challenges and enhancing marine safety and vessel monitoring. Full article
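
The space-to-depth idea referenced above trades spatial resolution for channels so small targets are not lost to strided downsampling; PyTorch's pixel_unshuffle performs the rearrangement, as in this minimal sketch (tensor sizes are arbitrary).

```python
# Space-to-depth via pixel_unshuffle: 2x2 spatial blocks become extra channels.
import torch
import torch.nn.functional as F

x = torch.randn(1, 32, 80, 80)               # feature map carrying small-ship detail
y = F.pixel_unshuffle(x, downscale_factor=2)  # -> (1, 128, 40, 40)
print(y.shape)
```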

19 pages, 10500 KiB  
Article
A Novel CNN Model for Classification of Chinese Historical Calligraphy Styles in Regular Script Font
by Qing Huang, Michael Li, Dan Agustin, Lily Li and Meena Jha
Sensors 2024, 24(1), 197; https://doi.org/10.3390/s24010197 - 29 Dec 2023
Viewed by 957
Abstract
Chinese calligraphy, revered globally for its therapeutic and mindfulness benefits, encompasses styles such as regular (Kai Shu), running (Xing Shu), official (Li Shu), and cursive (Cao Shu) scripts. Beginners often start with the regular script, advancing to more intricate styles like cursive. Each style, marked by unique historical calligraphy contributions, requires learners to discern distinct nuances. The integration of AI in calligraphy analysis, collection, recognition, and classification is pivotal. This study introduces an innovative convolutional neural network (CNN) architecture, pioneering the application of CNN in the classification of Chinese calligraphy. Focusing on the four principal calligraphy styles from the Tang dynasty (690–907 A.D.), this research spotlights the era when the traditional regular script font (Kai Shu) was refined. A comprehensive dataset of 8282 samples from these calligraphers, representing the zenith of regular style, was compiled for CNN training and testing. The model distinguishes personal styles for classification, showing superior performance over existing networks. Achieving 89.5–96.2% accuracy in calligraphy classification, our approach underscores the significance of CNN in the categorization of both font and artistic styles. This research paves the way for advanced studies in Chinese calligraphy and its cultural implications. Full article
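
A generic sketch of a small CNN classifier of the kind described above is shown below; the layer sizes, input resolution, and four-class setup are assumptions, not the architecture proposed in the paper.

```python
# Toy CNN for classifying calligraphy style from grayscale character crops.
import torch
import torch.nn as nn

class CalligraphyCNN(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

scans = torch.randn(8, 1, 96, 96)      # placeholder batch of character crops
print(CalligraphyCNN()(scans).shape)   # torch.Size([8, 4])
```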

17 pages, 3949 KiB  
Article
A Finger Vein Liveness Detection System Based on Multi-Scale Spatial-Temporal Map and Light-ViT Model
by Liukui Chen, Tengwen Guo, Li Li, Haiyang Jiang, Wenfu Luo and Zuojin Li
Sensors 2023, 23(24), 9637; https://doi.org/10.3390/s23249637 - 05 Dec 2023
Cited by 1 | Viewed by 790
Abstract
Prosthetic attack is a problem that must be prevented in current finger vein recognition applications. To solve this problem, a finger vein liveness detection system was established in this study. The system begins by capturing short-term static finger vein videos using uniform near-infrared lighting. Subsequently, it employs Gabor filters without a direct-current (DC) component for vein area segmentation. The vein area is then divided into blocks to compute a multi-scale spatial–temporal map (MSTmap), which facilitates the extraction of coarse liveness features. Finally, these features are trained for refinement and used to predict liveness detection results with the proposed Light Vision Transformer (Light-ViT) model, which is equipped with an enhanced Light-ViT backbone, meticulously designed by interleaving multiple MN blocks and Light-ViT blocks, ensuring improved performance in the task. This architecture effectively balances the learning of local image features, controls network parameter complexity, and substantially improves the accuracy of liveness detection. The accuracy of the Light-ViT model was verified to be 99.63% on a self-made living/prosthetic finger vein video dataset. This proposed system can also be directly applied to the finger vein recognition terminal after the model is made lightweight. Full article
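
The zero-DC Gabor filtering step can be sketched as follows: the kernel mean is subtracted so flat regions give zero response, and the strongest response across orientations yields a crude vein map. The kernel parameters and file name are placeholders, not the values used in the paper.

```python
# Vein-area filtering with zero-DC Gabor kernels over several orientations.
import cv2
import numpy as np

frame = cv2.imread("nir_finger_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical NIR frame

responses = []
for theta in np.arange(0, np.pi, np.pi / 8):                       # 8 orientations
    kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                lambd=10.0, gamma=0.5, psi=0)
    kernel -= kernel.mean()                                         # drop the DC component
    responses.append(cv2.filter2D(frame, cv2.CV_32F, kernel))

vein_map = np.max(np.stack(responses), axis=0)                      # strongest orientation response
vein_mask = (vein_map > vein_map.mean()).astype(np.uint8) * 255     # crude vein-area segmentation
cv2.imwrite("vein_mask.png", vein_mask)
```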

16 pages, 42676 KiB  
Article
BézierCE: Low-Light Image Enhancement via Zero-Reference Bézier Curve Estimation
by Xianjie Gao, Kai Zhao, Lei Han and Jinming Luo
Sensors 2023, 23(23), 9593; https://doi.org/10.3390/s23239593 - 03 Dec 2023
Cited by 1 | Viewed by 1065
Abstract
Due to problems such as the shooting light, viewing angle, and camera equipment, low-light images with low contrast, color distortion, high noise, and unclear details can be seen regularly in real scenes. These low-light images will not only affect our observation but will also greatly affect the performance of computer vision processing algorithms. Low-light image enhancement technology can help to improve the quality of images and make them more applicable to fields such as computer vision, machine learning, and artificial intelligence. In this paper, we propose a novel method to enhance images through Bézier curve estimation. We estimate the pixel-level Bézier curve by training a deep neural network (BCE-Net) to adjust the dynamic range of a given image. Based on the good properties of the Bézier curve, in that it is smooth, continuous, and differentiable everywhere, low-light image enhancement through Bézier curve mapping is effective. The advantages of BCE-Net’s brevity and zero-reference make it generalizable to other low-light conditions. Extensive experiments show that our method outperforms existing methods both qualitatively and quantitatively. Full article
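
The Bézier intensity mapping itself can be illustrated with a few lines of NumPy: a Bernstein-form curve maps normalized input intensities to enhanced ones. In BézierCE the control points are predicted per pixel by BCE-Net; here a single hand-picked control vector is used purely to show the mapping.

```python
# Bézier (Bernstein-basis) curve applied as an intensity mapping on a [0, 1] image.
import numpy as np
from math import comb

def bezier_map(x: np.ndarray, control: np.ndarray) -> np.ndarray:
    """Evaluate a Bernstein-basis Bézier curve at x in [0, 1] with the given control points."""
    n = len(control) - 1
    return sum(comb(n, k) * (x ** k) * ((1 - x) ** (n - k)) * control[k] for k in range(n + 1))

low = np.random.rand(256, 256).astype(np.float32)    # stand-in low-light image in [0, 1]
control_points = np.array([0.0, 0.45, 0.85, 1.0])    # brightening curve (assumed values)
enhanced = bezier_map(low, control_points)
print(enhanced.min(), enhanced.max())                 # output stays within [0, 1]
```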

15 pages, 3365 KiB  
Article
DeepVision: Enhanced Drone Detection and Recognition in Visible Imagery through Deep Learning Networks
by Hassan J. Al Dawasari, Muhammad Bilal, Muhammad Moinuddin, Kamran Arshad and Khaled Assaleh
Sensors 2023, 23(21), 8711; https://doi.org/10.3390/s23218711 - 25 Oct 2023
Cited by 1 | Viewed by 1074
Abstract
Drones are increasingly capturing the world’s attention, transcending mere hobbies to revolutionize areas such as engineering, disaster aid, logistics, and airport protection, among myriad other fascinating applications. However, there is growing concern about the risks that they pose to physical infrastructure, particularly at airports, due to potential misuse. In recent times, numerous incidents involving unauthorized drones at airports disrupting flights have been reported. To solve this issue, this article introduces an innovative deep learning method proposed to effectively distinguish between drones and birds. Evaluating the suggested approach with a carefully assembled image dataset demonstrates exceptional performance, surpassing established detection systems previously proposed in the literature. Since drones can appear extremely small compared to other aerial objects, we developed a robust image-tiling technique with overlaps, which showed improved performance in the presence of very small drones. Moreover, drones are frequently mistaken for birds due to their resemblances in appearance and movement patterns. Among the various models tested, including SqueezeNet, MobileNetV2, ResNet18, and ResNet50, the SqueezeNet model exhibited superior performance for medium area ratios, achieving higher average precision (AP) of 0.770. In addition, SqueezeNet’s superior AP scores, faster detection times, and more stable precision-recall dynamics make it more suitable for real-time, accurate drone detection than the other existing CNN methods. The proposed approach has the ability to not only detect the presence or absence of drones in a particular area but also to accurately identify and differentiate between drones and birds. The dataset utilized in this research was obtained from a real-world dataset made available by a group of universities and research institutions as part of the 2020 Drone vs. Bird Detection Challenge. We have also tested the performance of the proposed model on an unseen dataset, further validating its better performance. Full article
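
The overlapping-tiling technique mentioned above can be sketched as below: the frame is cut into fixed-size tiles with an overlap so a tiny drone near a tile border still appears whole in at least one tile. Tile size and overlap are assumptions, not the paper's settings.

```python
# Overlapping image tiling for small-object detection; detections from each tile
# are later shifted back by the tile's (x0, y0) offset.
import numpy as np

def tile_image(image: np.ndarray, tile: int = 640, overlap: int = 128):
    """Yield (x0, y0, patch) tiles covering the image with the given overlap."""
    h, w = image.shape[:2]
    step = tile - overlap
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            y0, x0 = min(y, max(h - tile, 0)), min(x, max(w - tile, 0))
            yield x0, y0, image[y0:y0 + tile, x0:x0 + tile]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)   # placeholder video frame
tiles = list(tile_image(frame))
print(len(tiles))                                    # 8 tiles for a 1080x1920 frame
```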

18 pages, 1842 KiB  
Article
Dataset Condensation via Expert Subspace Projection
by Zhiheng Ma, Dezheng Gao, Shaolei Yang, Xing Wei and Yihong Gong
Sensors 2023, 23(19), 8148; https://doi.org/10.3390/s23198148 - 28 Sep 2023
Viewed by 886
Abstract
The rapid growth in dataset sizes in modern deep learning has significantly increased data storage costs. Furthermore, the training and time costs for deep neural networks are generally proportional to the dataset size. Therefore, reducing the dataset size while maintaining model performance is an urgent research problem that needs to be addressed. Dataset condensation is a technique that aims to distill the original dataset into a much smaller synthetic dataset while maintaining downstream training performance on any agnostic neural network. Previous work has demonstrated that matching the training trajectory between the synthetic dataset and the original dataset is more effective than matching the instantaneous gradient, as it incorporates long-range information. Despite the effectiveness of trajectory matching, it suffers from complex gradient unrolling across iterations, which leads to significant memory and computation overhead. To address this issue, this paper proposes a novel approach called Expert Subspace Projection (ESP), which leverages long-range information while avoiding gradient unrolling. Instead of strictly enforcing the synthetic dataset’s training trajectory to mimic that of the real dataset, ESP only constrains it to lie within the subspace spanned by the training trajectory of the real dataset. The memory-saving advantage offered by our method facilitates unbiased training on the complete set of synthetic images and seamless integration with other dataset condensation techniques. Through extensive experiments, we have demonstrated the effectiveness of our approach. Our method outperforms the trajectory matching method on CIFAR10 by 16.7% in the setting of 1 Image/Class, surpassing the previous state-of-the-art method by 3.2%. Full article
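
A toy sketch of the subspace-projection idea follows: an update direction computed on the synthetic data is projected onto the subspace spanned by directions taken from the expert (real-data) trajectory. The dimensions are illustrative, and the full ESP training loop is not shown.

```python
# Project a synthetic-data update onto the subspace spanned by expert trajectory directions.
import numpy as np

rng = np.random.default_rng(0)
expert_directions = rng.standard_normal((1000, 10))   # 10 expert trajectory directions in R^1000
synthetic_update = rng.standard_normal(1000)           # update direction from the synthetic set

# Orthonormal basis of the expert subspace, then orthogonal projection onto it.
basis, _ = np.linalg.qr(expert_directions)              # (1000, 10), columns orthonormal
projected = basis @ (basis.T @ synthetic_update)

# The residual lies outside the expert subspace and is what a constraint would suppress.
print(np.linalg.norm(synthetic_update - projected))
```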

20 pages, 9236 KiB  
Article
SFPFusion: An Improved Vision Transformer Combining Super Feature Attention and Wavelet-Guided Pooling for Infrared and Visible Images Fusion
by Hui Li, Yongbiao Xiao, Chunyang Cheng and Xiaoning Song
Sensors 2023, 23(18), 7870; https://doi.org/10.3390/s23187870 - 13 Sep 2023
Cited by 2 | Viewed by 1503
Abstract
The infrared and visible image fusion task aims to generate a single image that preserves complementary features and reduces redundant information from different modalities. Although convolutional neural networks (CNNs) can effectively extract local features and obtain better fusion performance, the size of the receptive field limits its feature extraction ability. Thus, the Transformer architecture has gradually become mainstream to extract global features. However, current Transformer-based fusion methods ignore the enhancement of details, which is important to image fusion tasks and other downstream vision tasks. To this end, a new super feature attention mechanism and the wavelet-guided pooling operation are applied to the fusion network to form a novel fusion network, termed SFPFusion. Specifically, super feature attention is able to establish long-range dependencies of images and to fully extract global features. The extracted global features are processed by wavelet-guided pooling to fully extract multi-scale base information and to enhance the detail features. With the powerful representation ability, only simple fusion strategies are utilized to achieve better fusion performance. The superiority of our method compared with other state-of-the-art methods is demonstrated in qualitative and quantitative experiments on multiple image fusion benchmarks. Full article
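
The wavelet step underlying wavelet-guided pooling can be sketched with PyWavelets: a 2D discrete wavelet transform splits a feature map into a low-frequency approximation (a 2x downsampled "pooled" map) and detail bands that can be used to enhance edges. The paper's exact pooling operator is not reproduced here.

```python
# DWT-based pooling sketch: the approximation band acts as the pooled map,
# the detail bands carry the edge information to be re-injected.
import numpy as np
import pywt

feature = np.random.rand(128, 128).astype(np.float32)        # single-channel feature map
approx, (horiz, vert, diag) = pywt.dwt2(feature, "haar")      # each band is 64 x 64

pooled = approx                                               # low-frequency band ~ average pooling
detail_energy = np.abs(horiz) + np.abs(vert) + np.abs(diag)   # detail cue for enhancement
print(pooled.shape, detail_energy.shape)
```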

16 pages, 3210 KiB  
Article
A Zero-Shot Low Light Image Enhancement Method Integrating Gating Mechanism
by Junhao Tian and Jianwei Zhang
Sensors 2023, 23(16), 7306; https://doi.org/10.3390/s23167306 - 21 Aug 2023
Viewed by 920
Abstract
Photographs taken under harsh ambient lighting can suffer from a number of image quality degradation phenomena due to insufficient exposure. These include reduced brightness, loss of transfer information, noise, and color distortion. In order to solve the above problems, researchers have proposed many deep learning-based methods to improve the illumination of images. However, most existing methods face the problem of difficulty in obtaining paired training data. In this context, a zero-reference image enhancement network for low light conditions is proposed in this paper. First, the improved Encoder-Decoder structure is used to extract image features to generate feature maps and generate the parameter matrix of the enhancement factor from the feature maps. Then, the enhancement curve is constructed using the parameter matrix. The image is iteratively enhanced using the enhancement curve and the enhancement parameters. Second, the unsupervised algorithm needs to design an image non-reference loss function in training. Four non-reference loss functions are introduced to train the parameter estimation network. Experiments on several datasets with only low-light images show that the proposed network has improved performance compared with other methods in NIQE, PIQE, and BRISQUE non-reference evaluation index, and ablation experiments are carried out for key parts, which proves the effectiveness of this method. At the same time, the performance data of the method on PC devices and mobile devices are investigated, and the experimental analysis is given. This proves the feasibility of the method in this paper in practical application. Full article
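
The iterative enhancement-curve step can be sketched as below: a per-pixel parameter map (predicted by the network in the paper, constant here) repeatedly adjusts the image through a quadratic curve that keeps values in [0, 1]. This follows the common zero-reference curve form; the paper's exact curve may differ.

```python
# Iterative quadratic enhancement curve applied with a per-pixel parameter map.
import numpy as np

def enhance(image: np.ndarray, alpha: np.ndarray, iterations: int = 8) -> np.ndarray:
    x = image.copy()
    for _ in range(iterations):
        x = x + alpha * x * (1.0 - x)   # brightens dark pixels; 0 and 1 stay fixed
    return x

low = np.clip(np.random.rand(256, 256, 3) * 0.3, 0, 1)   # stand-in under-exposed image
alpha = np.full_like(low, 0.6)                            # per-pixel enhancement factors
bright = enhance(low, alpha)
print(low.mean(), bright.mean())
```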

16 pages, 5620 KiB  
Article
Lightweight Model for Pavement Defect Detection Based on Improved YOLOv7
by Peile Huang, Shenghuai Wang, Jianyu Chen, Weijie Li and Xing Peng
Sensors 2023, 23(16), 7112; https://doi.org/10.3390/s23167112 - 11 Aug 2023
Cited by 3 | Viewed by 1499
Abstract
Existing pavement defect detection models face challenges in balancing detection accuracy and speed while being constrained by large parameter sizes, hindering deployment on edge terminal devices with limited computing resources. To address these issues, this paper proposes a lightweight pavement defect detection model based on an improved YOLOv7 architecture. The model introduces four key enhancements: first, the incorporation of the SPPCSPC_Group grouped space pyramid pooling module to reduce the parameter load and computational complexity; second, the utilization of the K-means clustering algorithm for generating anchors, accelerating model convergence; third, the integration of the Ghost Conv module, enhancing feature extraction while minimizing the parameters and calculations; fourth, introduction of the CBAM convolution module to enrich the semantic information in the last layer of the backbone network. The experimental results demonstrate that the improved model achieved an average accuracy of 91%, and the accuracy in detecting broken plates and repaired models increased by 9% and 8%, respectively, compared to the original model. Moreover, the improved model exhibited reductions of 14.4% and 29.3% in the calculations and parameters, respectively, and a 29.1% decrease in the model size, resulting in an impressive 80 FPS (frames per second). The enhanced YOLOv7 successfully balances parameter reduction and computation while maintaining high accuracy, making it a more suitable choice for pavement defect detection compared with other algorithms. Full article
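
K-means anchor generation, the second enhancement listed above, follows a standard recipe that can be sketched as below: cluster the widths and heights of the training boxes and use the cluster centres as anchors. The box data here is a random stand-in, and this is not the paper's code.

```python
# K-means over (width, height) of training boxes to obtain anchor sizes.
import numpy as np
from sklearn.cluster import KMeans

boxes_wh = np.abs(np.random.randn(5000, 2)) * 80 + 20   # placeholder (width, height) pairs in pixels
kmeans = KMeans(n_clusters=9, n_init=10, random_state=0).fit(boxes_wh)
anchors = kmeans.cluster_centers_[np.argsort(kmeans.cluster_centers_.prod(axis=1))]
print(np.round(anchors))   # 9 anchors sorted by area, to be split across detection scales
```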

19 pages, 1149 KiB  
Article
A Long-Tailed Image Classification Method Based on Enhanced Contrastive Visual Language
by Ying Song, Mengxing Li and Bo Wang
Sensors 2023, 23(15), 6694; https://doi.org/10.3390/s23156694 - 26 Jul 2023
Viewed by 1322
Abstract
To solve the problem that the common long-tailed classification method does not use the semantic features of the original label text of the image, and the difference between the classification accuracy of most classes and minority classes are large, the long-tailed image classification method based on enhanced contrast visual language trains the head class and tail class samples separately, uses text image to pre-train the information, and uses the enhanced momentum contrastive loss function and RandAugment enhancement to improve the learning of tail class samples. On the ImageNet-LT long-tailed dataset, the enhanced contrasting visual language-based long-tailed image classification method has improved all class accuracy, tail class accuracy, middle class accuracy, and the F1 value by 3.4%, 7.6%, 3.5%, and 11.2%, respectively, compared to the BALLAD method. The difference in accuracy between the head class and tail class is reduced by 1.6% compared to the BALLAD method. The results of three comparative experiments indicate that the long-tailed image classification method based on enhanced contrastive visual language has improved the performance of tail classes and reduced the accuracy difference between the majority and minority classes. Full article
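
One ingredient mentioned above, applying RandAugment more aggressively to tail-class samples, can be sketched as below with torchvision; the class split and magnitude are assumptions, and the contrastive visual-language training itself is not shown.

```python
# Heavier RandAugment pipeline for tail classes, plain pipeline for head classes.
from torchvision import transforms

head_tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
tail_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandAugment(num_ops=2, magnitude=12),   # stronger augmentation for rare classes
    transforms.ToTensor(),
])

def pick_transform(label: int, tail_classes: set):
    """Return the augmentation pipeline to use for a sample of class `label`."""
    return tail_tf if label in tail_classes else head_tf
```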
