Advanced Machine Learning Methods for Image Processing, Perception and Understanding

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Computational and Applied Mathematics".

Deadline for manuscript submissions: closed (10 October 2023) | Viewed by 15212

Special Issue Editors

School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
Interests: image processing; machine learning; video analysis
School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
Interests: image processing; machine learning; pattern recognition

Special Issue Information

Dear Colleagues,

Image processing, perception and understanding underpin much of the recent progress in a wide range of computer vision applications, including space exploration, military surveillance, environmental protection, precision agriculture, intelligent manufacturing, intelligent transportation, etc. However, due to complicated imaging environments as well as varied application requirements in practice, image processing, perception and understanding still face many challenges, including image quality degradation, multi-source image fusion, few-shot learning, cross-domain generalization, unsupervised learning, fast inference, embedded deployment, etc. Thus, it is necessary to investigate advanced machine learning methods that mitigate these challenges and fully exploit the potential of image processing, perception and understanding in more real-world applications.

This Special Issue calls for innovative works that explore recent advances, prospects, and challenges in machine learning methods and applications that produce high-quality images; improve generalization across different perception and understanding tasks under degraded images, few-shot annotations, cross-domain data, etc.; and accelerate inference, especially on embedded devices with limited computational resources. Note that in this Special Issue the keyword ‘image’ is understood in a wide sense: optical imagery, infrared imagery, multispectral imagery, hyperspectral imagery, SAR imagery, medical imagery, and so on. The purpose is to provide a platform that fosters interdisciplinary research and collaboration and shares the most innovative ideas across the related fields.

Prof. Dr. Lei Zhang
Prof. Dr. Wei Wei
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image restoration and fusion methods
  • image classification and segmentation methods
  • object detection and recognition methods
  • conventional machine learning methods
  • deep learning methods

Published Papers (7 papers)


Research

45 pages, 10738 KiB  
Article
CAM-FRN: Class Attention Map-Based Flare Removal Network in Frontal-Viewing Camera Images of Vehicles
by Seon Jong Kang, Kyung Bong Ryu, Min Su Jeong, Seong In Jeong and Kang Ryoung Park
Mathematics 2023, 11(17), 3644; https://doi.org/10.3390/math11173644 - 23 Aug 2023
Viewed by 1657
Abstract
In recent years, active research on computer vision and artificial intelligence (AI) for autonomous driving has highlighted the importance of object detection technology using a frontal-viewing camera. However, using an RGB camera as a frontal-viewing camera can generate lens flare artifacts due to strong light sources, components of the camera lens, and foreign substances, which damage the images and make the shapes of objects in them unrecognizable. Furthermore, lens flare significantly reduces object detection performance during semantic segmentation performed for autonomous driving. Flare artifacts are challenging to remove because they arise from various scattering and reflection effects. State-of-the-art methods designed for general scene images retain artifactual noise and fail to eliminate flare entirely when the input image contains severe flare. In addition, no study has addressed these problems in the field of semantic segmentation for autonomous driving. Therefore, this study proposed a novel lens flare removal technique based on a class attention map-based flare removal network (CAM-FRN) and a semantic segmentation method using the images from which the lens flare has been removed. CAM-FRN is a generative flare removal network that estimates flare regions, generates highlighted images as input, and incorporates the estimated regions into the loss function for successful artifact reconstruction and comprehensive flare removal. We synthesized lens flare on the Cambridge-driving Labeled Video Database (CamVid) and the Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago (KITTI) dataset, both open road-scene datasets. The experimental results showed that semantic segmentation on images whose lens flare had been removed by CAM-FRN achieved 71.26% and 60.27% mean intersection over union (mIoU) on the CamVid and KITTI databases, respectively, indicating that the proposed method significantly outperforms state-of-the-art methods.
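
To make the reported numbers concrete, the following minimal NumPy sketch computes mean intersection over union (mIoU) as it is conventionally defined for semantic segmentation; the toy label maps and the ignore index are illustrative assumptions, not values from the paper.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Compute mean intersection over union for a segmentation result.

    pred, target: integer arrays of per-pixel class labels with the same shape.
    """
    ious = []
    valid = target != ignore_index
    for c in range(num_classes):
        pred_c = (pred == c) & valid
        target_c = (target == c) & valid
        intersection = np.logical_and(pred_c, target_c).sum()
        union = np.logical_or(pred_c, target_c).sum()
        if union > 0:                      # skip classes absent from both maps
            ious.append(intersection / union)
    return float(np.mean(ious))

# Toy example on a 2x3 label map with 3 classes.
pred   = np.array([[0, 1, 1], [2, 2, 0]])
target = np.array([[0, 1, 2], [2, 2, 0]])
print(mean_iou(pred, target, num_classes=3))  # ~0.72
```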

22 pages, 7673 KiB  
Article
Enhanced Night-to-Day Image Conversion Using CycleGAN-Based Base-Detail Paired Training
by Dong-Min Son, Hyuk-Ju Kwon and Sung-Hak Lee
Mathematics 2023, 11(14), 3102; https://doi.org/10.3390/math11143102 - 13 Jul 2023
Cited by 3 | Viewed by 2348
Abstract
Numerous studies are underway to enhance the identification of surroundings in nighttime environments. These studies explore methods such as utilizing infrared images to improve night image visibility or converting night images into day-like representations for enhanced visibility. This research presents a technique for converting the road conditions depicted in night images to resemble daytime scenes. To facilitate this, a paired dataset is created by augmenting limited day and night image data using CycleGAN. The model is trained using both original night images and single-scale luminance transform (SLAT) day images to enhance the level of detail in the converted daytime images. However, the generated daytime images may exhibit sharpness and noise issues. To address these concerns, an image processing approach inspired by the Stevens effect and local blurring, which aligns with human visual characteristics, is employed to reduce noise and enhance image details. Consequently, this study improves the visibility of night images through day image conversion and subsequent image processing. The proposed night-to-day image translation has a processing time of 0.81 s, including image processing, and is therefore considered valuable as a module for daytime image translation. Additionally, the image quality assessment metric BRISQUE yielded a score of 19.8, indicating better performance than conventional methods. The outcomes of this research hold potential applications in fields such as CCTV surveillance systems and self-driving cars.
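
The title's "base-detail paired training" suggests separating low-frequency tone from high-frequency structure. The sketch below shows one conventional way such a split can be made, using a Gaussian base layer and a residual detail layer; the filter choice, the synthetic stand-in image, and the way the two layers are used downstream are assumptions for illustration, not the authors' exact SLAT pipeline.

```python
import cv2
import numpy as np

def base_detail_split(img, sigma=5.0):
    """Split an image into a smooth base layer and a residual detail layer
    such that base + detail reconstructs the input."""
    img = img.astype(np.float32)
    base = cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=sigma)  # low-frequency tone/colour
    detail = img - base                                       # high-frequency structure
    return base, detail

# Stand-in for a night-time frame; in practice this would be a loaded image.
night = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)
base, detail = base_detail_split(night)
# The base layer would go through the tone/colour translation step, while the
# detail layer is added back afterwards to preserve edges and textures.
day_like = np.clip(base + detail, 0, 255).astype(np.uint8)
```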

18 pages, 6274 KiB  
Article
Combining the Taguchi Method and Convolutional Neural Networks for Arrhythmia Classification by Using ECG Images with Single Heartbeats
by Shu-Fen Li, Mei-Ling Huang and Yan-Sheng Wu
Mathematics 2023, 11(13), 2841; https://doi.org/10.3390/math11132841 - 24 Jun 2023
Viewed by 1262
Abstract
In recent years, deep learning has been applied in numerous fields and has yielded excellent results. Convolutional neural networks (CNNs) have been used to analyze electrocardiography (ECG) data in biomedical engineering. This study combines the Taguchi method and CNNs to classify ECG images of single heartbeats without feature extraction or signal conversion. All fifteen types (five classes) in the MIT-BIH Arrhythmia Dataset were included in this study. The classification accuracy reached 96.79%, which is comparable to the state-of-the-art literature. The proposed model demonstrates effective and efficient performance in identifying heartbeat diseases while minimizing misdiagnosis.
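
As a rough illustration of the kind of model being tuned, the following PyTorch sketch defines a small CNN for single-heartbeat ECG images with five output classes; the architecture and hyperparameter values are placeholders standing in for the factors a Taguchi orthogonal array would vary, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

# Kernel size, filter count, and dropout rate are the kind of factors a Taguchi
# design would vary; the values below are illustrative placeholders.
class HeartbeatCNN(nn.Module):
    def __init__(self, num_classes=5, filters=32, kernel=3, dropout=0.3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, filters, kernel, padding=kernel // 2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(filters, filters * 2, kernel, padding=kernel // 2), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),
            nn.LazyLinear(num_classes),   # infers input size from the feature map
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = HeartbeatCNN()
logits = model(torch.randn(8, 1, 128, 128))  # batch of 8 grayscale heartbeat images
print(logits.shape)                          # torch.Size([8, 5])
```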

33 pages, 75637 KiB  
Article
LCA-GAN: Low-Complexity Attention-Generative Adversarial Network for Age Estimation with Mask-Occluded Facial Images
by Se Hyun Nam, Yu Hwan Kim, Jiho Choi, Chanhum Park and Kang Ryoung Park
Mathematics 2023, 11(8), 1926; https://doi.org/10.3390/math11081926 - 19 Apr 2023
Cited by 2 | Viewed by 1343
Abstract
Facial-image-based age estimation is being increasingly used in various fields. Examples include statistical marketing analysis based on age-specific product preferences, medical applications such as beauty products and telemedicine, and age-based suspect tracking in intelligent surveillance camera systems. Masks are increasingly worn for hygiene, personal privacy, and fashion. In particular, the acquisition of mask-occluded facial images has become more frequent due to the COVID-19 pandemic. These images lose important features and information for age estimation, which reduces its accuracy. Existing de-occlusion studies have investigated masquerade masks that do not completely occlude the eyes, nose, and mouth; however, no studies have investigated the de-occlusion of masks that completely occlude the nose and mouth and its use for age estimation, which is the goal of this study. Accordingly, this study proposes a novel low-complexity attention-generative adversarial network (LCA-GAN) for facial age estimation that combines an attention architecture and a conditional generative adversarial network (conditional GAN) to de-occlude mask-occluded human facial images. The open databases MORPH and PAL were used to conduct experiments. According to the results, the mean absolute error (MAE) of age estimation with the de-occluded facial images reconstructed using the proposed LCA-GAN is 6.64 and 6.12 years on MORPH and PAL, respectively. Thus, the proposed method yielded higher age estimation accuracy than using occluded images or images reconstructed with the state-of-the-art method.
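
The reported accuracy metric is straightforward to reproduce; a minimal sketch of mean absolute error (MAE) over predicted and ground-truth ages is shown below, with made-up ages used purely for illustration.

```python
import numpy as np

def mean_absolute_error(pred_ages, true_ages):
    """MAE in years between predicted and ground-truth ages."""
    pred_ages = np.asarray(pred_ages, dtype=float)
    true_ages = np.asarray(true_ages, dtype=float)
    return float(np.mean(np.abs(pred_ages - true_ages)))

print(mean_absolute_error([25, 40, 33], [22, 45, 30]))  # (3 + 5 + 3) / 3 = 3.67
```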

26 pages, 5242 KiB  
Article
Remote Sensing Imagery Object Detection Model Compression via Tucker Decomposition
by Lang Huyan, Ying Li, Dongmei Jiang, Yanning Zhang, Quan Zhou, Bo Li, Jiayuan Wei, Juanni Liu, Yi Zhang, Peng Wang and Hai Fang
Mathematics 2023, 11(4), 856; https://doi.org/10.3390/math11040856 - 07 Feb 2023
Cited by 2 | Viewed by 1116
Abstract
Although convolutional neural networks (CNNs) have made significant progress, their onboard deployment is still challenging because of their complexity and high processing cost. Tensors provide a natural and compact representation of CNN weights via suitable low-rank approximations. A novel decomposed module called DecomResnet, based on Tucker decomposition, was proposed to deploy a CNN object detection model on a satellite. We proposed a remote sensing image object detection model compression framework based on low-rank decomposition which consists of four steps, namely (1) model initialization, (2) initial training, (3) decomposition of the trained model and reconstruction of the decomposed model, and (4) fine-tuning. To validate the performance of the decomposed model in our real mission, we constructed a dataset containing only two classes of objects based on the DOTA and HRSC2016 datasets. The proposed method was comprehensively evaluated on the NWPU VHR-10 dataset and the CAST-RS2 dataset created in this work. The experimental results demonstrated that the proposed method, based on Resnet-50, achieves up to a 4.44x compression ratio and a 5.71x speedup with merely a 1.9% decrease in mAP (mean average precision) on the CAST-RS2 dataset and a 5.3% decrease in mAP on the NWPU VHR-10 dataset.
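
For readers unfamiliar with Tucker-style compression of convolutions, the following PyTorch sketch factorizes a dense 3x3 convolution into a 1x1 projection, a small core convolution, and a 1x1 expansion, and compares parameter counts; the layer sizes and ranks are hypothetical, and the block is a generic Tucker-2 factorization rather than the paper's DecomResnet module.

```python
import torch
import torch.nn as nn

def tucker2_conv(in_ch, out_ch, kernel, rank_in, rank_out, stride=1):
    """Factorize a kernel x kernel convolution in Tucker-2 style:
    a 1x1 projection to rank_in channels, a small core convolution,
    and a 1x1 expansion back to out_ch channels."""
    return nn.Sequential(
        nn.Conv2d(in_ch, rank_in, kernel_size=1, bias=False),
        nn.Conv2d(rank_in, rank_out, kernel_size=kernel, stride=stride,
                  padding=kernel // 2, bias=False),
        nn.Conv2d(rank_out, out_ch, kernel_size=1, bias=False),
    )

def n_params(module):
    return sum(p.numel() for p in module.parameters())

dense = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)
compact = tucker2_conv(256, 256, kernel=3, rank_in=64, rank_out=64)

x = torch.randn(1, 256, 32, 32)
assert dense(x).shape == compact(x).shape      # same input/output geometry
print(n_params(dense) / n_params(compact))     # roughly 8x fewer parameters here
```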

13 pages, 1047 KiB  
Article
MLA-Net: Feature Pyramid Network with Multi-Level Local Attention for Object Detection
by Xiaobao Yang, Wentao Wang, Junsheng Wu, Chen Ding, Sugang Ma and Zhiqiang Hou
Mathematics 2022, 10(24), 4789; https://doi.org/10.3390/math10244789 - 16 Dec 2022
Cited by 2 | Viewed by 1355
Abstract
Feature pyramid networks and attention mechanisms are mainstream methods for improving the detection performance of many current models. However, when they are learned jointly, there is a lack of information association between multi-level features. Therefore, this paper proposes a feature pyramid with multi-level local attention, dubbed MLA-Net (Feature Pyramid Network with Multi-Level Local Attention for Object Detection), which aims to establish a correlation mechanism for multi-level local information. First, the original multi-level features are deformed and rectified using the local pixel-rectification module, and global semantic enhancement is achieved through the multi-level spatial-attention module. The original features are then further fused through a residual connection to achieve contextual feature fusion and enhance the feature representation. Extensive ablation experiments were conducted on the MS COCO (Microsoft Common Objects in Context) dataset, and the results demonstrate the effectiveness of the proposed method with a 0.5% improvement. An improvement of 1.2% was obtained on the PASCAL VOC (Pattern Analysis, Statistical Modelling and Computational Learning, Visual Object Classes) dataset, reaching 81.8%, thereby indicating that the proposed method is robust and can compete with other advanced detection models.
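
The spatial-attention-with-residual-fusion idea can be illustrated with a generic attention gate applied to a single pyramid level, as in the PyTorch sketch below; the 7x7 attention convolution, the residual fusion, and the feature sizes are illustrative assumptions, not the MLA-Net modules themselves.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Generic spatial-attention gate: a 1-channel attention map is predicted
    from the feature map and used to reweight it, with a residual connection
    so the original features are preserved."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x + x * self.attn(x)     # residual fusion of attended features

p3 = torch.randn(1, 256, 80, 80)        # one pyramid level (e.g. FPN P3)
print(SpatialAttention(256)(p3).shape)  # torch.Size([1, 256, 80, 80])
```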

Review

32 pages, 941 KiB  
Review
Artificial Intelligence in Business-to-Customer Fashion Retail: A Literature Review
by Aitor Goti, Leire Querejeta-Lomas, Aitor Almeida, José Gaviria de la Puerta and Diego López-de-Ipiña
Mathematics 2023, 11(13), 2943; https://doi.org/10.3390/math11132943 - 30 Jun 2023
Cited by 3 | Viewed by 5239
Abstract
Many industries, including healthcare, banking, the auto industry, education, and retail, have already undergone significant changes because of artificial intelligence (AI). Business-to-Customer (B2C) e-commerce has considerably increased the use of AI in recent years. The purpose of this research is to examine the significance and impact of AI in the realm of fashion e-commerce. To that end, a systematic review of the literature was carried out, in which data from the Web of Science and Scopus databases were used to analyze 219 publications on the subject. The articles were first categorized by the AI techniques used and then, within the realm of fashion e-commerce, divided into two categories. These categorizations allowed for the identification of research gaps in the use of AI, which offer potential and possibilities for further research.
