Advanced Research and Applications of Deep Learning and Neural Network in Image Recognition

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (15 July 2023) | Viewed by 17981

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editors


Dr. Ganggang Dong
Guest Editor
National Laboratory of Radar Signal Processing, Xidian University, Xi’an 710071, China
Interests: target detection and recognition; deep learning; synthetic aperture radar

Dr. Yuanxin Ye
Guest Editor
Faculty of Geosciences and Engineering, Southwest Jiaotong University, Chengdu 610031, China
Interests: image registration; image matching; image fusion; image segmentation

Dr. Zhongling Huang
Guest Editor
School of Automation, Northwestern Polytechnical University, Xi'an 710021, China
Interests: SAR target recognition; transfer learning; unsupervised learning

Special Issue Information

Dear Colleagues,

Over the past two decades, few developments have been more astounding than the rapid progress achieved in image recognition. Object detection performance skyrocketed from approximately 30 percent mean average precision to more than 90 percent on the PASCAL VOC benchmark. Likewise, state-of-the-art learning algorithms now even exceed human performance for image classification on the ImageNet dataset. These improvements have significant impacts on a wide range of practical applications, including video surveillance, autonomous driving, intelligent healthcare, remote sensing image interpretation, and artificial intelligence.

The major driving force behind the recent advances in image classification lies in deep learning algorithms. The success of deep learning is powered by two crucial factors: large-scale training datasets and powerful computational platforms. In most cases, the performance obtained by deep neural networks is much better than that of delicate hand-crafted image features across various image classification tasks. Yet, despite the great success of deep learning in image recognition so far, numerous challenges remain to be overcome. The aim of this Special Issue is to present new solutions to these challenging problems. Topics of interest include, but are not limited to, the following:

  • The improvement of model generalization ability, i.e., how to train deep models that generalize well to real-world scenarios not seen during training.
  • The improvement of learning efficiency in small-data environments. Current techniques generally break down when few labeled examples are available, which should be carefully considered in practical applications.
  • Lightweight neural networks tailored to image recognition tasks.
  • The integration of prior knowledge in deep learning.
  • Multitask deep learning techniques.
  • Meta deep learning in specific application contexts.

Dr. Ganggang Dong
Dr. Yuanxin Ye
Dr. Zhongling Huang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website; once registered, you can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • model generalization
  • multitask learning
  • small sample size
  • meta-learning
  • specific data augmentation

Published Papers (13 papers)

Editorial

3 pages, 180 KiB  
Editorial
Guest Editorial: Foreword to the Special Issue on Advanced Research and Applications of Deep Learning and Neural Network in Image Recognition
by Ganggang Dong, Yuanxin Ye and Zhongling Huang
Electronics 2024, 13(3), 557; https://doi.org/10.3390/electronics13030557 - 30 Jan 2024
Viewed by 493
Abstract
Over the last two decades, the realm of image recognition has undergone a remarkable transformation, characterized by an astonishing pace of advancement [...]

Research

14 pages, 3227 KiB  
Article
An Image Unmixing and Stitching Deep Learning Algorithm for In-Screen Fingerprint Recognition Application
by Xiaochuan Chen, Xuan Feng, Yapeng Li, Ran Duan, Lei Wang, Yangbing Li, Minghua Xuan, Qiaofeng Tan and Xue Dong
Electronics 2023, 12(18), 3768; https://doi.org/10.3390/electronics12183768 - 06 Sep 2023
Viewed by 826
Abstract
The market share of organic light-emitting diode (OLED) screens in consumer electronics has grown rapidly in recent years. In order to increase the screen-to-body ratio of OLED phones, under-screen or in-screen fingerprint recognition is a must-have option. Current commercial hardware schemes include adhesive, ultrasonic, and under-screen optical ones; no mature in-screen solution has been proposed. In this work, we designed and manufactured an OLED panel with an in-screen fingerprint recognition system for the first time, by integrating an active sensor array into the OLED panel. The sensor and display module share the same set of fabrication processes. Compared with the widely available commercial under-screen schemes, the proposed in-screen solution achieves a much larger functional area, better flexibility, and smaller thickness, while significantly reducing module cost. Because the integration leaves an insufficient optical distance, a point light source scheme, implemented by lighting up a single OLED pixel or several adjacent ones, has to be adopted instead of the conventional area source scheme used in CMOS image sensor (CIS)-based solutions. We designed a pattern for the point light sources and developed an optical unmixing network model to unmix and stitch the images obtained by each point light source within the same exposure time. After training, data verification shows that this deep learning algorithm outputs a stitched image of large area and high quality, with FRR = 0.7% at FAR = 1:50 k. Despite the poorer quality of the raw images and a much more complex algorithm compared with current commercial solutions, the proposed algorithm still obtains results comparable to peer studies, demonstrating its effectiveness. The time required for fingerprint capture in our in-screen scheme is thus greatly reduced, overcoming one of the main obstacles to commercial application.
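
As a rough, hypothetical sketch of the unmixing-and-stitching idea (not the authors' actual network: the layer sizes, the 2 × 2 source layout, and all names below are assumptions), a small convolutional model can map one mixed exposure to per-source sub-images that are then tiled back together:

```python
import torch
import torch.nn as nn

class UnmixNet(nn.Module):
    """Toy unmixing model: one mixed exposure in, K per-source sub-images out."""
    def __init__(self, k_sources=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, k_sources, 3, padding=1),  # one output channel per point source
        )

    def forward(self, mixed):    # mixed: (B, 1, H, W)
        return self.body(mixed)  # (B, K, H, W)

net = UnmixNet()
sub = net(torch.rand(1, 1, 128, 128))[0]     # (4, 128, 128) unmixed patches
top = torch.cat([sub[0], sub[1]], dim=1)     # naive stitching: tile the patches
bottom = torch.cat([sub[2], sub[3]], dim=1)  # in an assumed 2 x 2 source layout
stitched = torch.cat([top, bottom], dim=0)   # (256, 256) composite fingerprint image
```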

17 pages, 9453 KiB  
Article
An Improved CNN for Polarization Direction Measurement
by Hao Han, Jin Liu, Wei Wang, Chao Gao and Jianhua Shi
Electronics 2023, 12(17), 3723; https://doi.org/10.3390/electronics12173723 - 04 Sep 2023
Viewed by 859
Abstract
Spatial polarization modulation has been proven to be an efficient and simple method for polarization measurement. Since the polarization information is encoded in the intensity distribution of the modulated light, the task of polarization measurement can be treated as an image processing problem once the pattern of the light is captured by a camera. However, classical image processing methods cannot meet the increasing demands of practical applications due to their poor computational efficiency. To address this issue, an improved Convolutional Neural Network is proposed in this paper to extract the Stokes parameters of the light from the irradiance image. In our algorithm, residual blocks are adopted and different layers are connected to ensure that the underlying features retain more details of the image. Furthermore, a refined residual block and Global Average Pooling are introduced to avoid overfitting and vanishing gradients. Finally, our algorithm is tested on massive synthetic and real data, and the mean square error (MSE) between the extracted values and the true values of the normalized Stokes parameters is computed. Compared to VGG and FAM, the experimental results demonstrate that our algorithm has outstanding performance.
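
A minimal sketch of the kind of architecture described, assuming PyTorch and illustrative layer sizes (the paper's refined residual block and layer connections will differ): residual blocks feed a Global Average Pooling head that regresses the three normalized Stokes parameters under an MSE loss.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):  # identity shortcut preserves low-level detail
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class StokesNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(1, 32, 3, padding=1)
        self.blocks = nn.Sequential(*[ResidualBlock(32) for _ in range(4)])
        self.gap = nn.AdaptiveAvgPool2d(1)  # Global Average Pooling
        self.fc = nn.Linear(32, 3)          # normalized Stokes parameters s1, s2, s3

    def forward(self, x):  # x: (B, 1, H, W) irradiance image
        return self.fc(self.gap(self.blocks(self.stem(x))).flatten(1))

model = StokesNet()
pred = model(torch.rand(8, 1, 64, 64))
loss = nn.MSELoss()(pred, torch.rand(8, 3))  # MSE against ground-truth parameters
```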

20 pages, 22744 KiB  
Article
A Railway Track Extraction Method Based on Improved DeepLabV3+
by Yanbin Weng, Zuochuang Li, Xiahu Chen, Jing He, Fengnian Liu, Xiaobin Huang and Hua Yang
Electronics 2023, 12(16), 3500; https://doi.org/10.3390/electronics12163500 - 18 Aug 2023
Cited by 2 | Viewed by 1021
Abstract
Extracting railway tracks is crucial for creating electronic railway maps. Traditional methods require significant manual labor and resources, while existing neural networks have limitations in efficiency and precision. To address these challenges, a railway track extraction method using an improved DeepLabV3+ model is proposed, which incorporates several key enhancements. Firstly, the encoder uses the lightweight network MobileNetV3 as the backbone extraction network for DeepLabV3+. Secondly, the decoder adopts the lightweight, universal upsampling operator CARAFE. Lastly, to address potential extraction errors, morphological algorithms are applied to optimize the extraction results. A dedicated railway track segmentation dataset is also created to train and evaluate the proposed method. The experimental results demonstrate that the model performs well on both the railway track segmentation dataset and the DeepGlobe dataset: the MIoU scores are 88.93% and 84.72%, the Recall values are 89.02% and 86.96%, and the overall accuracy is 97.69% and 94.84%, respectively. The algorithm's runtime is about 5% lower than that of the original network. Furthermore, the morphological post-processing effectively eliminates errors such as holes and spots. These findings confirm the model's accuracy and efficiency, as well as the error-elimination benefit of the morphological step.
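
The morphological post-processing can be illustrated with standard closing and opening operations in OpenCV (a generic sketch; the paper's exact operators and kernel sizes are assumptions here):

```python
import cv2
import numpy as np

# Toy binary mask standing in for a raw track-segmentation output.
mask = np.zeros((256, 256), np.uint8)
cv2.rectangle(mask, (60, 0), (90, 255), 255, -1)  # a vertical "track" band
mask[120:124, 70:80] = 0                          # a small hole inside the track
mask[10:13, 200:203] = 255                        # an isolated false-positive spot

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7, 7))
closed = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)    # closing fills small holes
cleaned = cv2.morphologyEx(closed, cv2.MORPH_OPEN, kernel)  # opening removes small spots
```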

13 pages, 2755 KiB  
Article
Transformer-Based Global PointPillars 3D Object Detection Method
by Lin Zhang, Hua Meng, Yunbing Yan and Xiaowei Xu
Electronics 2023, 12(14), 3092; https://doi.org/10.3390/electronics12143092 - 16 Jul 2023
Cited by 3 | Viewed by 1953
Abstract
The PointPillars algorithm can detect vehicles, pedestrians, and cyclists on the road, and is widely used for environmental perception in autonomous driving. However, its feature encoding network uses only a minimalist PointNet for feature extraction from the point cloud, which neither considers the global context of the point cloud nor sufficiently extracts local structure features; these losses can seriously degrade the performance of the object detection network. To address this problem, this paper proposes an improved PointPillars algorithm named TGPP: Transformer-based Global PointPillars. After the point cloud is divided into pillars, global context features and local structure features are extracted through a multi-head attention mechanism, so that the encoded point cloud carries both kinds of features; the resulting two-dimensional pseudo-image is then processed by a two-dimensional convolutional neural network for feature learning. Finally, the SSD detection head is used to perform 3D object detection. TGPP is demonstrated to achieve an average accuracy improvement of 2.64% on the KITTI test set.
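
In broad strokes, multi-head self-attention over per-pillar feature tokens might look as follows (tensor sizes and the residual fusion are illustrative assumptions, not the paper's exact design):

```python
import torch
import torch.nn as nn

# Per-pillar feature tokens: (batch, num_pillars, channels); sizes are illustrative.
pillars = torch.rand(2, 1024, 64)

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
global_ctx, _ = attn(pillars, pillars, pillars)  # each pillar attends to all others
enriched = pillars + global_ctx                  # residual fusion keeps local structure

# 'enriched' would then be scattered back into a 2D pseudo-image for the CNN backbone.
```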

17 pages, 1378 KiB  
Article
Generalized Zero-Shot Image Classification via Partially-Shared Multi-Task Representation Learning
by Gerui Wang and Sheng Tang
Electronics 2023, 12(9), 2085; https://doi.org/10.3390/electronics12092085 - 03 May 2023
Viewed by 1143
Abstract
Generalized Zero-Shot Learning (GZSL) holds significant research importance as it enables the classification of samples from both seen and unseen classes. A prevailing approach for GZSL is learning transferable representations that generalize well to both seen and unseen classes at test time. This approach encompasses two key concepts: discriminative representations and semantic-relevant representations. “Semantic-relevant” facilitates the transfer of semantic knowledge using pre-defined semantic descriptors, while “discriminative” is crucial for accurate category discrimination. However, these two concepts are arguably inherently conflicting, as semantic descriptors are not specifically designed for image classification. Existing methods often struggle to balance these two aspects and neglect the conflict between them, leading to suboptimal representation generalization and transferability to unseen classes. To address this issue, we propose a novel partially-shared multi-task representation learning method, termed PS-GZSL, which jointly preserves complementary and sharable knowledge between these two concepts. Specifically, we first propose a novel perspective that treats the learning of discriminative and semantic-relevant representations as optimizing a discrimination task and a visual-semantic alignment task, respectively. Then, to learn more complete and generalizable representations, PS-GZSL explicitly factorizes visual features into task-shared and task-specific representations and introduces two advanced tasks: an instance-level contrastive discrimination task and a relation-based visual-semantic alignment task. Furthermore, PS-GZSL employs Mixture-of-Experts (MoE) with a dropout mechanism to prevent representation degeneration and integrates a conditional GAN (cGAN) to synthesize visual features for unseen classes. Extensive experiments with competitive results on five widely used GZSL benchmark datasets validate the effectiveness of our PS-GZSL.
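
A toy sketch of the partially-shared factorization idea, assuming plain linear projections (the actual PS-GZSL modules, including the MoE experts and the cGAN, are far more elaborate):

```python
import torch
import torch.nn as nn

class PartiallyShared(nn.Module):
    """Factorizes a visual feature into task-shared and task-specific parts."""
    def __init__(self, d_in=2048, d_rep=512):
        super().__init__()
        self.shared = nn.Linear(d_in, d_rep)      # knowledge usable by both tasks
        self.disc_spec = nn.Linear(d_in, d_rep)   # discrimination-specific part
        self.align_spec = nn.Linear(d_in, d_rep)  # visual-semantic-alignment part

    def forward(self, v):
        s = self.shared(v)
        disc_rep = torch.cat([s, self.disc_spec(v)], dim=-1)    # for discrimination
        align_rep = torch.cat([s, self.align_spec(v)], dim=-1)  # for alignment
        return disc_rep, align_rep

disc_rep, align_rep = PartiallyShared()(torch.rand(4, 2048))
```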

14 pages, 610 KiB  
Article
Deep Learning Architecture Improvement Based on Dynamic Pruning and Layer Fusion
by Qi Li, Hengyi Li and Lin Meng
Electronics 2023, 12(5), 1208; https://doi.org/10.3390/electronics12051208 - 02 Mar 2023
Cited by 2 | Viewed by 1221
Abstract
The heavy workload of current deep learning architectures significantly impedes the application of deep learning, especially on resource-constrained devices. Pruning has provided a promising solution for compressing bloated deep learning models by removing the redundancies of the networks. However, existing pruning methods mainly focus on compressing superfluous channels without considering layer-level redundancies, so the channel-pruned models still suffer from serious redundancy. To mitigate this problem, we propose an effective compression algorithm that applies both channel-level and layer-level techniques to optimize enormous deep learning models: the channels are dynamically pruned first, and then the model is further optimized by fusing the redundant layers, with only a minor performance loss. The experimental results show that the computations of ResNet-110 are reduced by 80.05%, while the accuracy decreases by only 0.72%. Forty-eight convolutional layers could be discarded from ResNet-110 with no loss of performance, which fully demonstrates the efficiency of the proposed method.
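
As one standard illustration of channel-level compression (the paper's dynamic pruning criterion may differ), output channels can be scored by the L1 norm of their filters and the weakest ones dropped:

```python
import torch

def channel_l1_scores(conv_weight):
    """Score each output channel of a conv layer by the L1 norm of its filter."""
    return conv_weight.abs().sum(dim=(1, 2, 3))

w = torch.rand(64, 32, 3, 3)           # (out_channels, in_channels, kH, kW)
scores = channel_l1_scores(w)
keep = scores >= scores.quantile(0.5)  # keep the strongest half of the channels
pruned_w = w[keep]                     # channel pruning; layer fusion would further
                                       # discard whole redundant layers afterwards
```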

16 pages, 2349 KiB  
Article
Siamese PointNet: 3D Head Pose Estimation with Local Feature Descriptor
by Qi Wang, Hang Lei and Weizhong Qian
Electronics 2023, 12(5), 1194; https://doi.org/10.3390/electronics12051194 - 01 Mar 2023
Cited by 3 | Viewed by 1482
Abstract
Head pose estimation is an important part of the field of face analysis technology. It can be applied to driver attention monitoring, passenger monitoring, effective information screening, etc. However, illumination changes and partial occlusion interfere with the task, and due to the non-stationary nature of head pose changes, ordinary regression networks are unable to achieve very accurate results on large-scale synthetic training data. To address these problems, a Siamese network based on 3D point clouds is proposed, which adopts a shared-weight network fed with similar-pose samples to constrain the regression of the pose angles; meanwhile, a local feature descriptor is introduced to describe the local geometric features of the objects. To verify the performance of our method, we conducted experiments on two public datasets: the Biwi Kinect Head Pose dataset and Pandora. The results show that compared with the latest methods, our standard deviation was reduced by 0.4 and the mean error was reduced by 0.1, while our network also maintained good real-time performance.
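
A minimal sketch of the shared-weight (Siamese) constraint, assuming a toy PointNet-style encoder (names and sizes are illustrative; the paper's local feature descriptor is not shown):

```python
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Tiny PointNet-style encoder: per-point MLP followed by max pooling."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 128))
        self.head = nn.Linear(128, 3)  # yaw, pitch, roll

    def forward(self, pts):  # pts: (B, N, 3)
        return self.head(self.mlp(pts).max(dim=1).values)

enc = PointEncoder()                  # one set of weights shared by both branches
pose_a = enc(torch.rand(2, 1024, 3))
pose_b = enc(torch.rand(2, 1024, 3))  # a similar-pose sample
pair_loss = (pose_a - pose_b).abs().mean()  # constrains the two regressions to agree
```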

17 pages, 3739 KiB  
Article
Siamese Neural Pointnet: 3D Face Verification under Pose Interference and Partial Occlusion
by Qi Wang, Wei-Zhong Qian, Hang Lei and Lu Chen
Electronics 2023, 12(3), 620; https://doi.org/10.3390/electronics12030620 - 26 Jan 2023
Cited by 2 | Viewed by 1623
Abstract
Face verification based on ordinary 2D RGB images has been widely used in daily life. However, the quality of ordinary 2D RGB images is limited by illumination, and they lack stereoscopic features, which makes them difficult to apply in poor lighting conditions and leaves them susceptible to interference from head pose and partial occlusion. Considering that point clouds are not affected by illumination and can easily represent geometric information, this paper constructs a novel Siamese network for 3D face verification based on Pointnet. To reduce the influence of the self-generated point clouds, the chamfer distance is adopted to constrain them against the original point clouds, and a new energy function is explored to distinguish features. The experimental results on the Pandora and Curtin Faces datasets show that the accuracy of the proposed method is improved by 0.6% compared with the latest methods; under large pose interference and partial occlusion, the accuracy is improved by 4% and 5%, respectively. The results verify that our method outperforms the latest methods and can be applied to a variety of complex scenarios while maintaining real-time performance.
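
The chamfer distance between two point sets has a standard symmetric form, sketched below in PyTorch (how the paper weights this constraint against the energy function is not reproduced here):

```python
import torch

def chamfer_distance(p, q):
    """Symmetric chamfer distance between point sets p: (B, N, 3) and q: (B, M, 3)."""
    d = torch.cdist(p, q)  # (B, N, M) pairwise Euclidean distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

loss = chamfer_distance(torch.rand(2, 512, 3), torch.rand(2, 512, 3))
```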

17 pages, 882 KiB  
Article
Cyclic Federated Learning Method Based on Distribution Information Sharing and Knowledge Distillation for Medical Data
by Liang Yu and Jianjun Huang
Electronics 2022, 11(23), 4039; https://doi.org/10.3390/electronics11234039 - 05 Dec 2022
Cited by 1 | Viewed by 1367
Abstract
Federated learning has been attracting increasing attention for its potential applications to disease diagnosis in the medical field, owing to its privacy preservation and its ability to solve data silo problems. However, inconsistent distributions of client-side data significantly degrade the performance of traditional federated learning. To eliminate the adverse effects of non-IID data on federated learning across multiple medical institution datasets, this paper proposes a cyclic federated learning method based on distribution information sharing and knowledge distillation for medical data (CFL_DS_KD). The method has two main phases. The first stage is an offline preparation process in which all clients train a generator model on their local datasets and pass the generator to neighbouring clients to generate virtual shared data. The second stage is an online process, again in two steps. In the first step, a knowledge distillation learning process, all clients first initialise the task model on their local datasets and share it with neighbouring clients; the clients then use the shared task model to guide the updating of their local task models on the virtual shared data. In the second step, the clients simply re-update the task models on their local datasets and share them with neighbouring clients. Our experiments on non-IID datasets demonstrated the superior performance of our proposed method compared to existing federated learning algorithms.
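
The guidance step, in which a shared task model teaches a local model on virtual shared data, is commonly realized with a temperature-softened KL divergence; a generic sketch (model shapes and the temperature are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student predictions."""
    s = F.log_softmax(student_logits / T, dim=1)
    t = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(s, t, reduction="batchmean") * T * T

local_model = nn.Linear(16, 4)   # this client's task model
shared_model = nn.Linear(16, 4)  # task model received from a neighbouring client
virtual = torch.rand(8, 16)      # generator-produced virtual shared data
loss = distill_loss(local_model(virtual), shared_model(virtual).detach())
```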

25 pages, 6463 KiB  
Article
Vision-Based Quadruped Pose Estimation and Gait Parameter Extraction Method
by Zewu Gong, Yunwei Zhang, Dongfeng Lu and Tiannan Wu
Electronics 2022, 11(22), 3702; https://doi.org/10.3390/electronics11223702 - 11 Nov 2022
Cited by 3 | Viewed by 1664
Abstract
In the study of animal behavior, the prevention of sickness, and the gait planning of legged robots, pose estimation and gait parameter extraction of quadrupeds are of tremendous importance. However, there are many varieties of quadrupeds, and distinct species frequently have radically different body types, limb configurations, and gaits, so it is currently challenging to estimate animal poses with consistent accuracy. This research developed a quadruped pose estimation and gait parameter extraction method to address this problem. Its core is a computational framework with three components: target screening, an animal pose estimation model, and gait parameter extraction, which together solve the problem completely and efficiently. On the basis of the HRNet network, an improved quadruped keypoint extraction network, RFB-HRNet, is proposed to enhance pose estimation. The basic idea is to combine a DyConv (dynamic convolution) module and an RFB (receptive field block) module into a special receptive field module, DyC-RFB, which optimizes the feature extraction capability of HRNet at stage 1 and thereby enhances the whole network. The public AP10K dataset was used to validate the model's performance, showing the proposed method to be superior to alternatives. Second, a two-stage cascade network was created by adding an object detection network to the front end of the pose estimation network to filter animal objects in input images, which improved pose estimation for small and multiple targets. The acquired keypoint data were then used to extract the gait parameters of the experimental subjects. Experimental results showed that the proposed gait parameter extraction model could effectively extract the gait frequency, gait sequence, gait duty cycle, and gait trajectory of quadrupeds, and obtain real-time, accurate gait trajectories.
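
Gait frequency extraction from a keypoint trajectory can be illustrated with simple spectral analysis (a toy sketch on synthetic data; the paper's actual gait parameter extraction is more involved):

```python
import numpy as np

# Toy vertical trajectory of one paw keypoint sampled at 30 fps for 4 s.
fps = 30
t = np.arange(0, 4, 1 / fps)
y = np.sin(2 * np.pi * 2.0 * t) + 0.05 * np.random.randn(t.size)  # ~2 Hz gait

spectrum = np.abs(np.fft.rfft(y - y.mean()))  # remove DC, take magnitude spectrum
freqs = np.fft.rfftfreq(y.size, d=1 / fps)
gait_frequency = freqs[spectrum.argmax()]     # dominant stride frequency, in Hz
```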

17 pages, 4727 KiB  
Article
RERB: A Dataset for Residential Area Extraction with Regularized Boundary in Remote Sensing Imagery for Mapping Application
by Songlin Liu, Li Zhang, Wei Liu, Jun Hu, Hui Gong, Xin Zhou and Danchao Gong
Electronics 2022, 11(17), 2790; https://doi.org/10.3390/electronics11172790 - 05 Sep 2022
Viewed by 1162
Abstract
Due to its high automation and efficiency, image-based residential area extraction has become a research hotspot in surveying, mapping, computer vision, and related fields. For mapping applications, the extracted contours are required to be regular. However, existing deep-learning-based residential area extraction methods delineate contours according to the actual extent of residential areas in the imagery, and the resulting extractions are messy and irregular, making them difficult to apply directly to mapping. Moreover, most existing ground object extraction datasets based on optical satellite images mainly support research on semantic segmentation, ignoring the requirements of mapping applications. In this paper, we introduce an optical satellite image dataset named RERB (Residential area Extraction with Regularized Boundary) to support and advance end-to-end learning of residential area mapping. The distinguishing characteristic of RERB is that it embeds prior knowledge of regularized contours. In detail, the RERB dataset contains 13,892 high-quality satellite images with a spatial resolution of 2 m acquired from different cities in China; each image is approximately 256 × 256 pixels, and the dataset covers an area of more than 3640 square kilometers. The dataset has four strengths: (1) large scale and high resolution; (2) well-annotated, regular label contours; (3) rich backgrounds; and (4) class imbalance. The RERB dataset is therefore suitable for both semantic segmentation and mapping application tasks. Furthermore, to validate the effectiveness of RERB, a novel end-to-end regularized extraction algorithm for residential areas based on a contour cross-entropy constraint is designed and implemented, which significantly improves the regularity of the extracted contours for mapping. The comparative experimental results demonstrate the advantages and practicability of our public dataset and can further facilitate future research.
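
One plausible form of a contour-focused cross-entropy term is to up-weight pixels near label boundaries; the sketch below derives the boundary map morphologically and is an assumption for illustration, not the paper's exact constraint:

```python
import torch
import torch.nn.functional as F

def contour_weighted_ce(logits, target, w_contour=5.0):
    """Cross-entropy with extra weight on pixels near label boundaries."""
    t = target.float().unsqueeze(1)                     # (B, 1, H, W)
    dilated = F.max_pool2d(t, 3, stride=1, padding=1)   # morphological dilation
    eroded = -F.max_pool2d(-t, 3, stride=1, padding=1)  # morphological erosion
    contour = (dilated != eroded).squeeze(1).float()    # 1 where the label changes
    weights = 1.0 + w_contour * contour
    ce = F.cross_entropy(logits, target, reduction="none")
    return (weights * ce).mean()

loss = contour_weighted_ce(torch.randn(2, 2, 64, 64),
                           torch.randint(0, 2, (2, 64, 64)))
```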

11 pages, 4391 KiB  
Article
Multi-Scale Semantic Segmentation for Fire Smoke Image Based on Global Information and U-Net
by Yuanpan Zheng, Zhenyu Wang, Boyang Xu and Yiqing Niu
Electronics 2022, 11(17), 2718; https://doi.org/10.3390/electronics11172718 - 30 Aug 2022
Cited by 6 | Viewed by 1572
Abstract
Smoke is translucent and irregular, resulting in a very complex mix between background and smoke. Thin or small smoke is visually inconspicuous, and its boundary is often blurred. Therefore, completely segmenting smoke from images is a very difficult task. To solve these issues, a multi-scale semantic segmentation algorithm for fire smoke based on global information and U-Net is proposed. The algorithm uses multi-scale residual group attention (MRGA) combined with U-Net to extract multi-scale smoke features and enhance the perception of small-scale smoke. A Transformer encoder is used to extract global information and improve accuracy for thin smoke at the edges of images. Finally, the proposed algorithm was tested on a smoke dataset, achieving 91.83% mIoU. Compared with existing segmentation algorithms, mIoU is improved by 2.87% and mPA by 3.42%, yielding a higher-accuracy segmentation algorithm for fire smoke.
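
The reported mIoU metric has a standard definition, sketched here for integer label maps (a generic implementation, not code from the paper):

```python
import numpy as np

def mean_iou(pred, gt, num_classes=2):
    """Mean intersection-over-union across classes for integer label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 2, (256, 256))  # predicted smoke/background labels
gt = np.random.randint(0, 2, (256, 256))    # ground-truth labels
print(mean_iou(pred, gt))
```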
