Topic Editors

Prof. Dr. Bin Fan
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Dr. Wenqi Ren
School of Cyber Science and Technology, Sun Yat-Sen University, Guangzhou 510275, China

Applications in Image Analysis and Pattern Recognition

Abstract submission deadline
31 May 2024
Manuscript submission deadline
31 August 2024

Topic Information

Dear Colleagues,

It is estimated that up to ~80% of the neurons in the human brain are involved in processing visual information and cognition. Image analysis and pattern recognition therefore sit at the core of artificial intelligence, which aims to design computer programs that achieve or mimic human-like intelligence in perceiving and reasoning about the real world. With the rapid development of visual sensors and imaging technologies, image analysis and pattern recognition techniques have been applied extensively across artificial intelligence-related areas, from industry and agriculture to surveillance and social security.

Despite the significant progress made in image analysis and pattern recognition methods over the past decade, their application to real-world problems remains unsatisfactory. This status indicates a non-negligible gap between theoretical progress and practical applications in the related areas. This Topic aims to help narrow that gap, and we therefore invite papers on both theoretical and applied issues related to image analysis and pattern recognition.

All interested authors are invited to submit their innovative methods on the following (but not limited to) aspects:

  • Deep learning based methods for image analysis;
  • Deep learning based methods for video analysis;
  • Image fusion methods and applications;
  • Multimedia systems and applications;
  • Image enhancement and restoration methods and their applications;
  • Image analysis and pattern recognition for robotics and unmanned systems;
  • Document image analysis and applications;
  • Structural pattern recognition methods and applications;
  • Biomedical image analysis and applications;
  • Advances in pattern recognition theories.

Prof. Dr. Bin Fan
Dr. Wenqi Ren
Topic Editors

Keywords

  • image analysis
  • pattern recognition
  • structural pattern recognition
  • computer vision
  • multimedia analysis
  • deep learning
  • document image analysis
  • image enhancement
  • image restoration
  • biomedical image analysis
  • robotics
  • unmanned systems
  • image retrieval
  • image understanding
  • feature extraction
  • image segmentation
  • semantic segmentation
  • object detection
  • image classification
  • image acquisition techniques

Participating Journals

Journal Name                                       Impact Factor   CiteScore   Launched Year   First Decision (median)   APC
Applied Sciences (applsci)                         2.7             4.5         2011            16.9 Days                 CHF 2400
Sensors (sensors)                                  3.9             6.8         2001            17 Days                   CHF 2600
Journal of Imaging (jimaging)                      3.2             4.4         2015            21.7 Days                 CHF 1800
Machine Learning and Knowledge Extraction (make)   3.9             8.5         2019            19.9 Days                 CHF 1800

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to take advantage of this by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea with a time-stamped preprint record;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (88 papers)

19 pages, 5712 KiB  
Article
Soil Sampling Map Optimization with a Dual Deep Learning Framework
by Tan-Hanh Pham and Kim-Doang Nguyen
Mach. Learn. Knowl. Extr. 2024, 6(2), 751-769; https://doi.org/10.3390/make6020035 - 29 Mar 2024
Abstract
Soil sampling constitutes a fundamental process in agriculture, enabling precise soil analysis and optimal fertilization. The automated selection of accurate soil sampling locations representative of a given field is critical for informed soil treatment decisions. This study leverages recent advancements in deep learning to develop efficient tools for generating soil sampling maps. We proposed two models, namely UDL and UFN, which are the results of innovations in machine learning architecture design and integration. The models are meticulously trained on a comprehensive soil sampling dataset collected from local farms in South Dakota. The data include five key attributes: aspect, flow accumulation, slope, normalized difference vegetation index, and yield. The inputs to the models consist of multispectral images, and the ground truths are highly unbalanced binary images. To address this challenge, we innovate a feature extraction technique to find patterns and characteristics from the data before using these refined features for further processing and generating soil sampling maps. Our approach is centered around building a refiner that extracts fine features and a selector that utilizes these features to produce prediction maps containing the selected optimal soil sampling locations. Our experimental results demonstrate the superiority of our tools compared to existing methods. During testing, our proposed models exhibit outstanding performance, achieving the highest mean Intersection over Union of 60.82% and mean Dice Coefficient of 73.74%. The research not only introduces an innovative tool for soil sampling but also lays the foundation for the integration of traditional and modern soil sampling methods. This work provides a promising solution for precision agriculture and soil management. Full article
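As a quick illustration of the metrics reported above, the following sketch computes the Intersection over Union and Dice coefficient for binary maps; it is a generic formulation with hypothetical toy arrays, not the authors' implementation.

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Compute IoU and Dice for binary masks (values in {0, 1})."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = inter / (union + eps)
    dice = 2 * inter / (pred.sum() + target.sum() + eps)
    return iou, dice

# Hypothetical 2x2 sampling maps: 1 marks a selected sampling location.
pred = np.array([[0, 1], [1, 0]])
target = np.array([[0, 1], [0, 1]])
print(iou_and_dice(pred, target))  # IoU ~0.33, Dice ~0.5
```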

19 pages, 2824 KiB  
Article
Time of Flight Distance Sensor–Based Construction Equipment Activity Detection Method
by Young-Jun Park and Chang-Yong Yi
Appl. Sci. 2024, 14(7), 2859; https://doi.org/10.3390/app14072859 - 28 Mar 2024
Abstract
In this study, we delve into a novel approach by employing a sensor-based pattern recognition model to address the automation of construction equipment activity analysis. The model integrates time of flight (ToF) sensors with deep convolutional neural networks (DCNNs) to accurately classify the operational activities of construction equipment, focusing on piston movements. The research utilized a one-twelfth-scale excavator model, processing the displacement ratios of its pistons into a unified dataset for analysis. Methodologically, the study outlines the setup of the sensor modules and their integration with a controller, emphasizing the precision in capturing equipment dynamics. The DCNN model, characterized by its four-layered convolutional blocks, was meticulously tuned within the MATLAB environment, demonstrating the model’s learning capabilities through hyperparameter optimization. An analysis of 2070 samples representing six distinct excavator activities yielded an impressive average precision of 95.51% and a recall of 95.31%, with an overall model accuracy of 95.19%. When compared against other vision-based and accelerometer-based methods, the proposed model showcases enhanced performance and reliability under controlled experimental conditions. This substantiates its potential for practical application in real-world construction scenarios, marking a significant advancement in the field of construction equipment monitoring. Full article
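The abstract describes a four-block DCNN tuned in MATLAB; the sketch below shows what such a classifier could look like in PyTorch for six activity classes. The channel counts, input shape, and layer sizes are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # One convolutional block: conv -> batch norm -> ReLU -> downsample.
    return nn.Sequential(
        nn.Conv1d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm1d(c_out),
        nn.ReLU(inplace=True),
        nn.MaxPool1d(2),
    )

class ActivityDCNN(nn.Module):
    """Four-block 1D CNN over piston-displacement sequences (assumed shape)."""
    def __init__(self, n_channels=4, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(n_channels, 16),
            conv_block(16, 32),
            conv_block(32, 64),
            conv_block(64, 128),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                  nn.Linear(128, n_classes))

    def forward(self, x):  # x: (batch, channels, time)
        return self.head(self.features(x))

logits = ActivityDCNN()(torch.randn(8, 4, 128))  # -> (8, 6)
```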

19 pages, 7570 KiB  
Article
Semantic Segmentation of Remote Sensing Images Depicting Environmental Hazards in High-Speed Rail Network Based on Large-Model Pre-Classification
by Qi Dong, Xiaomei Chen, Lili Jiang, Lin Wang, Jiachong Chen and Ying Zhao
Sensors 2024, 24(6), 1876; https://doi.org/10.3390/s24061876 - 14 Mar 2024
Abstract
With the rapid development of China’s railways, ensuring the safety of the operating environment of high-speed railways faces daunting challenges. In response to safety hazards posed by light and heavy floating objects during the operation of trains, we propose a dual-branch semantic segmentation network with the fusion of large models (SAMUnet). The encoder part of this network uses a dual-branch structure, in which the backbone branch uses a residual network for feature extraction and the large-model branch leverages the results of feature extraction generated by the segment anything model (SAM). Moreover, a decoding attention module is fused with the results of prediction of the SAM in the decoder part to enhance the performance of the network. We conducted experiments on the Inria Aerial Image Labeling (IAIL), Massachusetts, and high-speed railway hazards datasets to verify the effectiveness and applicability of the proposed SAMUnet network in comparison with commonly used semantic segmentation networks. The results demonstrated its superiority in terms of both the accuracies of segmentation and feature extraction. It was able to precisely extract hazards in the environment of high-speed railways to significantly improve the accuracy of semantic segmentation. Full article

14 pages, 3336 KiB  
Article
Dazzling Evaluation of the Impact of a High-Repetition-Rate CO2 Pulsed Laser on Infrared Imaging Systems
by Hanyu Zheng, Yunzhe Wang, Yang Liu, Tao Sun and Junfeng Shao
Sensors 2024, 24(6), 1827; https://doi.org/10.3390/s24061827 - 12 Mar 2024
Abstract
This article utilizes the Canny edge extraction algorithm based on contour curvature and the cross-correlation template matching algorithm to extensively study the impact of a high-repetition-rate CO2 pulsed laser on the target extraction and tracking performance of an infrared imaging detector. It establishes a quantified dazzling pattern for lasers on infrared imaging systems. By conducting laser dazzling and damage experiments, a detailed analysis of the normalized correlation between the target and the dazzling images is performed to quantitatively describe the laser dazzling effects. Simultaneously, an evaluation system, including target distance and laser power evaluation factors, is established to determine the dazzling level and whether the target is recognizable. The research results reveal that the laser power and target position are crucial factors affecting the detection performance of infrared imaging detector systems under laser dazzling. Different laser powers are required to successfully interfere with the recognition algorithm of the infrared imaging detector at different distances. And laser dazzling produces a considerable quantity of false edge information, which seriously affects the performance of the pattern recognition algorithm. In laser damage experiments, the detector experienced functional damage, with a quarter of the image displaying as completely black. The energy density threshold required for the functional damage of the detector is approximately 3 J/cm2. The dazzling assessment conclusions also apply to the evaluation of the damage results. Finally, the proposed evaluation formula aligns with the experimental results, objectively reflecting the actual impact of laser dazzling on the target extraction and the tracking performance of infrared imaging systems. This study provides an in-depth and accurate analysis for understanding the influence of lasers on the performance of infrared imaging detectors. Full article
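The normalized correlation analysis described here can be illustrated with OpenCV's template matching; the sketch below is a minimal example with hypothetical file names and an assumed decision threshold, not the authors' evaluation pipeline.

```python
import cv2

# Hypothetical file names; any pair of same-depth grayscale images works.
scene = cv2.imread("dazzled_frame.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("target_template.png", cv2.IMREAD_GRAYSCALE)

# Normalized cross-correlation map; values near 1 mean a strong match.
ncc = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(ncc)

print(f"peak correlation {max_val:.3f} at {max_loc}")
# A threshold (e.g., 0.5, an assumed value) could decide whether the
# target is still recognizable under laser dazzling.
```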

23 pages, 3795 KiB  
Article
Classifying Breast Tumors in Digital Tomosynthesis by Combining Image Quality-Aware Features and Tumor Texture Descriptors
by Loay Hassan, Mohamed Abdel-Nasser, Adel Saleh and Domenec Puig
Mach. Learn. Knowl. Extr. 2024, 6(1), 619-641; https://doi.org/10.3390/make6010029 - 11 Mar 2024
Abstract
Digital breast tomosynthesis (DBT) is a 3D breast cancer screening technique that can overcome the limitations of standard 2D digital mammography. However, DBT images often suffer from artifacts stemming from acquisition conditions, a limited angular range, and low radiation doses. These artifacts have the potential to degrade the performance of automated breast tumor classification tools. Notably, most existing automated breast tumor classification methods do not consider the effect of DBT image quality when designing the classification models. In contrast, this paper introduces a novel deep learning-based framework for classifying breast tumors in DBT images. This framework combines global image quality-aware features with tumor texture descriptors. The proposed approach employs a two-branch model: in the top branch, a deep convolutional neural network (CNN) model is trained to extract robust features from the region of interest that includes the tumor. In the bottom branch, a deep learning model named TomoQA is trained to extract global image quality-aware features from input DBT images. The quality-aware features and the tumor descriptors are then combined and fed into a fully-connected layer to classify breast tumors as benign or malignant. The unique advantage of this model is the combination of DBT image quality-aware features with tumor texture descriptors, which helps accurately classify breast tumors as benign or malignant. Experimental results on a publicly available DBT image dataset demonstrate that the proposed framework achieves superior breast tumor classification results, outperforming all existing deep learning-based methods. Full article
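A minimal sketch of the two-branch fusion idea, assuming hypothetical feature dimensions: features from the tumor-texture branch and the quality-aware branch are concatenated and fed to a fully-connected classification layer.

```python
import torch
import torch.nn as nn

class TwoBranchClassifier(nn.Module):
    """Fuse tumor-texture features with image-quality features (dims assumed)."""
    def __init__(self, texture_dim=512, quality_dim=128, n_classes=2):
        super().__init__()
        self.fc = nn.Linear(texture_dim + quality_dim, n_classes)

    def forward(self, texture_feat, quality_feat):
        fused = torch.cat([texture_feat, quality_feat], dim=1)
        return self.fc(fused)

model = TwoBranchClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 128))  # benign vs. malignant
```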

18 pages, 5765 KiB  
Article
Real-Time Cucumber Target Recognition in Greenhouse Environments Using Color Segmentation and Shape Matching
by Wenbo Liu, Haonan Sun, Yu Xia and Jie Kang
Appl. Sci. 2024, 14(5), 1884; https://doi.org/10.3390/app14051884 - 25 Feb 2024
Abstract
Accurate identification of fruits in greenhouse environments is an essential need for the precise functioning of agricultural robots. This study presents a solution to the problem of distinguishing cucumber fruits from their stems and leaves, which often have similar colors in their natural environment. The proposed algorithm for cucumber fruit identification relies on color segmentation and shape matching. First, we extract the boundary details from the acquired image of the cucumber sample. The edge information is described and reconstructed using a shape descriptor known as the Fourier descriptor in order to acquire a matching template image. Subsequently, we generate a multi-scale template by amalgamating computational and real-world data. The target image is subjected to color conditioning in order to enhance the segmentation of the target region inside the HSV color space. Then, the segmented target region is compared to the multi-scale template based on its shape. The method of color segmentation decreases the presence of unwanted information in the target image, hence improving the effectiveness of shape matching. An analysis was performed on a set of 200 cucumber photos obtained from the field. The findings indicate that the method presented in this study surpasses conventional recognition algorithms in terms of accuracy and efficiency, with a recognition rate of up to 86%. Moreover, the system is exceptionally proficient at identifying cucumber targets within greenhouses. This attribute makes it a valuable resource for offering technical assistance to agricultural robots that operate with accuracy. Full article
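A minimal sketch of the color segmentation stage, using OpenCV and an assumed HSV range for cucumber green; the exact thresholds and post-processing in the paper may differ.

```python
import cv2
import numpy as np

bgr = cv2.imread("cucumber.jpg")                # hypothetical image path
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

# Assumed green range for cucumbers in OpenCV's HSV space (H in [0, 179]).
lower, upper = np.array([35, 40, 40]), np.array([85, 255, 255])
mask = cv2.inRange(hsv, lower, upper)

# Clean up small speckles before shape matching.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Each contour is a candidate region to compare against the multi-scale template.
```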

19 pages, 587 KiB  
Article
CRAS: Curriculum Regularization and Adaptive Semi-Supervised Learning with Noisy Labels
by Ryota Higashimoto, Soh Yoshida and Mitsuji Muneyasu
Appl. Sci. 2024, 14(3), 1208; https://doi.org/10.3390/app14031208 - 31 Jan 2024
Abstract
This paper addresses the performance degradation of deep neural networks caused by learning with noisy labels. Recent research on this topic has exploited the memorization effect: networks fit data with clean labels during the early stages of learning and eventually memorize data with noisy labels. This property allows for the separation of clean and noisy samples from a loss distribution. In recent years, semi-supervised learning, which divides training data into a set of labeled clean samples and a set of unlabeled noisy samples, has achieved impressive results. However, this strategy has two significant problems: (1) the accuracy of dividing the data into clean and noisy samples depends strongly on the network’s performance, and (2) if the divided data are biased towards the unlabeled samples, there are few labeled samples, causing the network to overfit to the labels and leading to a poor generalization performance. To solve these problems, we propose the curriculum regularization and adaptive semi-supervised learning (CRAS) method. Its key ideas are (1) to train the network with robust regularization techniques as a warm-up before dividing the data, and (2) to control the strength of the regularization using loss weights that adaptively respond to data bias, which varies with each split at each training epoch. We evaluated the performance of CRAS on benchmark image classification datasets, CIFAR-10 and CIFAR-100, and real-world datasets, mini-WebVision and Clothing1M. The findings demonstrate that CRAS excels in handling noisy labels, resulting in a superior generalization and robustness to a range of noise rates, compared with the existing method. Full article
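One common way to realize the loss-based clean/noisy split described above (used, e.g., in DivideMix-style pipelines) is to fit a two-component Gaussian mixture to per-sample losses; the sketch below uses hypothetical losses and is not necessarily CRAS's exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical per-example losses from a warmed-up model (clean samples tend
# to have smaller losses thanks to the memorization effect).
per_sample_loss = np.abs(np.random.randn(1000, 1))

gmm = GaussianMixture(n_components=2, reg_covar=5e-4).fit(per_sample_loss)
# Posterior probability of belonging to the low-mean (clean) component.
clean_component = int(np.argmin(gmm.means_))
p_clean = gmm.predict_proba(per_sample_loss)[:, clean_component]

labeled_idx = np.where(p_clean > 0.5)[0]     # treated as clean, labels kept
unlabeled_idx = np.where(p_clean <= 0.5)[0]  # treated as noisy, labels discarded
```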

16 pages, 3585 KiB  
Article
Enhancement of GUI Display Error Detection Using Improved Faster R-CNN and Multi-Scale Attention Mechanism
by Xi Pan, Zhan Huan, Yimang Li and Yingying Cao
Appl. Sci. 2024, 14(3), 1144; https://doi.org/10.3390/app14031144 - 30 Jan 2024
Abstract
Graphical user interfaces (GUIs) hold an irreplaceable position in modern software and applications. Users can interact through them. Due to different terminal devices, there are sometimes display errors, such as component occlusion, image loss, text overlap, and empty values during software rendering. To address the aforementioned common four GUI display errors, a target detection algorithm based on the improved Faster R-CNN is proposed. Specifically, ResNet-50 is used instead of the traditional VGG-16 as the feature extraction network. The feature pyramid network (FPN) and the enhanced multi-scale attention (EMA) algorithm are introduced to improve accuracy. ROI-Align is used instead of ROI-Pooling to enhance the generalization capability of the network. Since training models require a large number of labeled screenshots of errors, there is currently no publicly available dataset with GUI display problems. Therefore, a training data generation algorithm has been developed, which can automatically generate screenshots with GUI display problems based on the Rico dataset. Experimental results show that the improved Faster R-CNN achieves a detection accuracy of 87.3% in the generated GUI problem dataset, which is a 7% improvement compared to the previous version. Full article
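A baseline resembling the described detector (ResNet-50 backbone, FPN, RoIAlign) is available off the shelf in torchvision; the sketch below sets up such a model for the four GUI error classes plus background, though it omits the paper's EMA attention module.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Four GUI display errors + background.
num_classes = 5  # occlusion, image loss, text overlap, empty value, background

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

model.eval()
with torch.no_grad():
    detections = model([torch.rand(3, 800, 600)])  # list of dicts: boxes/labels/scores
```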

16 pages, 3370 KiB  
Article
Deep Learning-Based Technique for Remote Sensing Image Enhancement Using Multiscale Feature Fusion
by Ming Zhao, Rui Yang, Min Hu and Botao Liu
Sensors 2024, 24(2), 673; https://doi.org/10.3390/s24020673 - 21 Jan 2024
Abstract
The present study proposes a novel deep-learning model for remote sensing image enhancement. It maintains image details while enhancing brightness in the feature extraction module. An improved hierarchical model named Global Spatial Attention Network (GSA-Net), based on U-Net for image enhancement, is proposed to improve the model’s performance. To circumvent the issue of insufficient sample data, gamma correction is applied to create low-light images, which are then used as training examples. A loss function is constructed using the Structural Similarity (SSIM) and Peak Signal-to-Noise Ratio (PSNR) indices. The GSA-Net network and loss function are utilized to restore images obtained via low-light remote sensing. This proposed method was tested on the Northwestern Polytechnical University Very-High-Resolution 10 (NWPU VHR-10) dataset, and its overall superiority was demonstrated in comparison with other state-of-the-art algorithms using various objective assessment indicators, such as PSNR, SSIM, and Learned Perceptual Image Patch Similarity (LPIPS). Furthermore, in high-level visual tasks such as object detection, this novel method provides better remote sensing images with distinct details and higher contrast than the competing methods. Full article
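A minimal sketch of the described low-light synthesis step: gamma correction with gamma > 1 darkens a well-exposed image, yielding (low-light, reference) training pairs. The gamma range is an assumption.

```python
import numpy as np

def synthesize_low_light(img: np.ndarray, gamma: float = 3.0) -> np.ndarray:
    """Darken a well-exposed uint8 image via gamma correction (gamma > 1)."""
    normalized = img.astype(np.float32) / 255.0
    return (normalized ** gamma * 255.0).astype(np.uint8)

# Hypothetical remote sensing tile; pairing (dark, img) yields training examples.
img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
dark = synthesize_low_light(img, gamma=np.random.uniform(2.0, 4.0))
```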

19 pages, 20196 KiB  
Article
Synthetic Document Images with Diverse Shadows for Deep Shadow Removal Networks
by Yuhi Matsuo and Yoshimitsu Aoki
Sensors 2024, 24(2), 654; https://doi.org/10.3390/s24020654 - 19 Jan 2024
Abstract
Shadow removal for document images is an essential task for digitized document applications. Recent shadow removal models have been trained on pairs of shadow images and shadow-free images. However, obtaining a large, diverse dataset for document shadow removal takes time and effort. Thus, only small real datasets are available. Graphic renderers have been used to synthesize shadows to create relatively large datasets. However, the limited number of unique documents and the limited lighting environments adversely affect the network performance. This paper presents a large-scale, diverse dataset called the Synthetic Document with Diverse Shadows (SynDocDS) dataset. The SynDocDS comprises rendered images with diverse shadows augmented by a physics-based illumination model, which can be utilized to obtain a more robust and high-performance deep shadow removal network. In this paper, we further propose a Dual Shadow Fusion Network (DSFN). Unlike natural images, document images often have constant background colors requiring a high understanding of global color features for training a deep shadow removal network. The DSFN has a high global color comprehension and understanding of shadow regions and merges shadow attentions and features efficiently. We conduct experiments on three publicly available datasets, the OSR, Kligler’s, and Jung’s datasets, to validate our proposed method’s effectiveness. In comparison to training on existing synthetic datasets, our model training on the SynDocDS dataset achieves an enhancement in the PSNR and SSIM, increasing them from 23.00 dB to 25.70 dB and 0.959 to 0.971 on average. In addition, the experiments demonstrated that our DSFN clearly outperformed other networks across multiple metrics, including the PSNR, the SSIM, and its impact on OCR performance. Full article
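For reference, the reported PSNR/SSIM metrics can be computed with scikit-image; the sketch below uses hypothetical arrays and assumes a recent scikit-image version (with the channel_axis argument).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Hypothetical shadow-free ground truth and network output (uint8 RGB).
gt = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
noise = np.random.randint(-10, 10, gt.shape)
out = np.clip(gt.astype(int) + noise, 0, 255).astype(np.uint8)

psnr = peak_signal_noise_ratio(gt, out, data_range=255)
ssim = structural_similarity(gt, out, channel_axis=-1, data_range=255)
print(f"PSNR {psnr:.2f} dB, SSIM {ssim:.3f}")
```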

15 pages, 11956 KiB  
Article
Decomposition Technique for Bio-Transmittance Imaging Based on Attenuation Coefficient Matrix Inverse
by Purnomo Sidi Priambodo, Toto Aminoto and Basari Basari
J. Imaging 2024, 10(1), 22; https://doi.org/10.3390/jimaging10010022 - 15 Jan 2024
Abstract
Human body tissue disease diagnosis will become more accurate if transmittance images, such as X-ray images, are separated according to each constituent tissue. This research proposes a new image decomposition technique based on the matrix inverse method for biological tissue images. The fundamental idea of this research is based on the fact that when k different monochromatic lights penetrate a biological tissue, they will experience different attenuation coefficients. Furthermore, the same happens when monochromatic light penetrates k different biological tissues, as they will also experience different attenuation coefficients. The various attenuation coefficients are arranged into a unique k×k-dimensional square matrix. The k images taken under the k different monochromatic lights are then merged into an image vector entity; further, a matrix inverse operation is performed on the merged image, producing k tissue thickness images of the constituent tissues. This research demonstrates that the proposed method effectively decomposes images of biological objects into separate images, each showing the thickness distributions of different constituent tissues. In the future, this proposed new technique is expected to contribute to supporting medical imaging analysis. Full article
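The decomposition maps naturally onto the Beer–Lambert law: log-attenuation is linear in the tissue thicknesses, so a per-pixel k×k solve recovers them. The sketch below is a noise-free toy with assumed coefficients, not the authors' code.

```python
import numpy as np

k, H, W = 2, 64, 64
# Assumed attenuation coefficients: A[i, j] = coefficient of tissue j at wavelength i.
A = np.array([[0.9, 0.2],
              [0.3, 0.8]])

# Hypothetical thickness maps and simulated transmittance I = I0 * exp(-A @ t).
t_true = np.random.rand(k, H * W)
measurements = np.exp(-A @ t_true)          # normalized so I0 = 1

# Decomposition: recover thicknesses from log-attenuation with a matrix inverse.
t_est = np.linalg.solve(A, -np.log(measurements))
print(np.allclose(t_est, t_true))           # True in the noise-free case
```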

28 pages, 22846 KiB  
Article
Predicting Wind Comfort in an Urban Area: A Comparison of a Regression- with a Classification-CNN for General Wind Rose Statistics
by Jennifer Werner, Dimitri Nowak, Franziska Hunger, Tomas Johnson, Andreas Mark, Alexander Gösta and Fredrik Edelvik
Mach. Learn. Knowl. Extr. 2024, 6(1), 98-125; https://doi.org/10.3390/make6010006 - 04 Jan 2024
Cited by 1
Abstract
Wind comfort is an important factor when new buildings in existing urban areas are planned. It is common practice to use computational fluid dynamics (CFD) simulations to model wind comfort. These simulations are usually time-consuming, making it impossible to explore a high number of different design choices for a new urban development with wind simulations. Data-driven approaches based on simulations have shown great promise, and have recently been used to predict wind comfort in urban areas. These surrogate models could be used in generative design software and would enable the planner to explore a large number of options for a new design. In this paper, we propose a novel machine learning workflow (MLW) for direct wind comfort prediction. The MLW incorporates a regression and a classification U-Net, trained based on CFD simulations. Furthermore, we present an augmentation strategy focusing on generating more training data independent of the underlying wind statistics needed to calculate the wind comfort criterion. We train the models based on different sets of training data and compare the results. All trained models (regression and classification) yield an F1-score greater than 80% and can be combined with any wind rose statistic. Full article

13 pages, 1658 KiB  
Article
A Robust Machine Learning Model for Diabetic Retinopathy Classification
by Gigi Tăbăcaru, Simona Moldovanu, Elena Răducan and Marian Barbu
J. Imaging 2024, 10(1), 8; https://doi.org/10.3390/jimaging10010008 - 28 Dec 2023
Cited by 1
Abstract
Ensemble learning is a process that belongs to the artificial intelligence (AI) field. It helps to choose a robust machine learning (ML) model, usually used for data classification. AI has a large connection with image processing and feature classification, and it can also be successfully applied to analyzing fundus eye images. Diabetic retinopathy (DR) is a disease that can cause vision loss and blindness, which, from an imaging point of view, can be shown when screening the eyes. Image processing tools can analyze and extract the features from fundus eye images, and these are combined with ML classifiers that can perform their classification among different disease classes. The outcomes integrated into automated diagnostic systems can be a real success for physicians and patients. In this study, in the image processing area, contrast manipulation with the gamma correction parameter was applied because DR affects the blood vessels and the structure of the eyes becomes disordered. Therefore, the analysis of the texture with two types of entropies was necessary. Shannon and fuzzy entropies and contrast manipulation led to ten original features used in the classification process. The machine learning library PyCaret performs complex tasks, and the empirical process shows that of the fifteen classifiers, the gradient boosting classifier (GBC) provides the best results. Indeed, the proposed model can classify the DR degrees as normal or severe, achieving an accuracy of 0.929, an F1 score of 0.902, and an area under the curve (AUC) of 0.941. The validation of the selected model with a bootstrap statistical technique was performed. The novelty of the study consists of the extraction of features from preprocessed fundus eye images, their classification, and the manipulation of the contrast in a controlled way. Full article
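A minimal sketch of two of the described feature ingredients, Shannon entropy combined with gamma-based contrast manipulation; the gamma values are assumptions, and the fuzzy entropy and PyCaret model selection steps are omitted.

```python
import numpy as np

def shannon_entropy(gray: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy of a grayscale image's intensity histogram."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256), density=True)
    p = hist[hist > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical fundus channel; entropy is recomputed after each gamma setting.
gray = np.random.randint(0, 256, (512, 512)).astype(np.float32)
features = []
for gamma in (0.5, 1.0, 1.5):                       # assumed contrast settings
    adjusted = 255.0 * (gray / 255.0) ** gamma
    features.append(shannon_entropy(adjusted))
```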

17 pages, 3275 KiB  
Article
A Dual-Tree–Complex Wavelet Transform-Based Infrared and Visible Image Fusion Technique and Its Application in Tunnel Crack Detection
by Feng Wang and Tielin Chen
Appl. Sci. 2024, 14(1), 114; https://doi.org/10.3390/app14010114 - 22 Dec 2023
Abstract
Computer vision methods have been widely used in recent years for the detection of structural cracks. To address the issues of poor image quality and the inadequate performance of semantic segmentation networks under low-light conditions in tunnels, in this paper, infrared images are used, and a preprocessing method based on image fusion technology is developed. First, the DAISY descriptor and the perspective transform are applied for image alignment. Then, the source image is decomposed into high- and low-frequency components of different scales and directions using DT-CWT, and high- and low-frequency subband fusion rules are designed according to the characteristics of infrared and visible images. Finally, a fused image is reconstructed from the processed coefficients, and the fusion results are evaluated using the improved semantic segmentation network. The results show that using the proposed fusion method to preprocess images leads to a low false alarm rate and low missed detection rate in comparison to those using the source image directly or using the classical fusion algorithm. Full article
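A toy version of the fusion stage, assuming the open-source dtcwt Python package: both images are decomposed, the low- and high-frequency subbands are merged with simple placeholder rules, and the fused image is reconstructed. The paper's actual fusion rules are more elaborate.

```python
import numpy as np
import dtcwt  # pip install dtcwt (assumed dependency)

# Hypothetical aligned grayscale infrared and visible images of the same scene.
ir = np.random.rand(256, 256)
vis = np.random.rand(256, 256)

transform = dtcwt.Transform2d()
p_ir = transform.forward(ir, nlevels=4)
p_vis = transform.forward(vis, nlevels=4)

# Toy fusion rules: average the low-frequency subbands; keep the
# larger-magnitude high-frequency coefficients at each location.
p_ir.lowpass = (p_ir.lowpass + p_vis.lowpass) / 2.0
p_ir.highpasses = tuple(
    np.where(np.abs(h_ir) >= np.abs(h_vis), h_ir, h_vis)
    for h_ir, h_vis in zip(p_ir.highpasses, p_vis.highpasses)
)
fused = transform.inverse(p_ir)
```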

17 pages, 2628 KiB  
Article
High-Precision Carton Detection Based on Adaptive Image Augmentation for Unmanned Cargo Handling Tasks
by Bing Liang, Xin Wang, Wenhao Zhao and Xiaobang Wang
Sensors 2024, 24(1), 12; https://doi.org/10.3390/s24010012 - 19 Dec 2023
Abstract
Unattended intelligent cargo handling is an important means to improve the efficiency and safety of port cargo trans-shipment, where high-precision carton detection is an unquestioned prerequisite. Therefore, this paper introduces an adaptive image augmentation method for high-precision carton detection. First, the imaging parameters of the images are clustered into various scenarios, and the imaging parameters and perspectives are adaptively adjusted to achieve the automatic augmenting and balancing of the carton dataset in each scenario, which reduces the interference of the scenarios on the carton detection precision. Then, the carton boundary features are extracted and stochastically sampled to synthesize new images, thus enhancing the detection performance of the trained model for dense cargo boundaries. Moreover, the weight function of the hyperparameters of the trained model is constructed to achieve their preferential crossover during genetic evolution to ensure the training efficiency of the augmented dataset. Finally, an intelligent cargo handling platform is developed and field experiments are conducted. The outcomes of the experiments reveal that the method attains a detection precision of 0.828. This technique significantly enhances the detection precision by 18.1% and 4.4% when compared to the baseline and other methods, which provides a reliable guarantee for intelligent cargo handling processes. Full article

18 pages, 10650 KiB  
Article
Zero-Shot Traffic Sign Recognition Based on Midlevel Feature Matching
by Yaozong Gan, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa and Miki Haseyama
Sensors 2023, 23(23), 9607; https://doi.org/10.3390/s23239607 - 04 Dec 2023
Abstract
Traffic sign recognition is a complex and challenging yet popular problem that can assist drivers on the road and reduce traffic accidents. Most existing methods for traffic sign recognition use convolutional neural networks (CNNs) and can achieve high recognition accuracy. However, these methods first require a large number of carefully crafted traffic sign datasets for the training process. Moreover, since traffic signs differ in each country and there is a variety of traffic signs, these methods need to be fine-tuned when recognizing new traffic sign categories. To address these issues, we propose a traffic sign matching method for zero-shot recognition. Our proposed method can perform traffic sign recognition without training data by directly matching the similarity of target and template traffic sign images. Our method uses the midlevel features of CNNs to obtain robust feature representations of traffic signs without additional training or fine-tuning. We discovered that midlevel features improve the accuracy of zero-shot traffic sign recognition. The proposed method achieves promising recognition results on the German Traffic Sign Recognition Benchmark open dataset and a real-world dataset taken from Sapporo City, Japan. Full article
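A minimal sketch of midlevel-feature matching, assuming a torchvision ResNet-50 tapped at layer3 (the layer choice is an assumption): each sign image is embedded by pooling the midlevel feature map, and classes are assigned by cosine similarity to template embeddings, with no training or fine-tuning.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights
from torchvision.models.feature_extraction import create_feature_extractor

weights = ResNet50_Weights.DEFAULT
extractor = create_feature_extractor(resnet50(weights=weights), {"layer3": "mid"})
extractor.eval()
preprocess = weights.transforms()

def embed(img):
    """Pool a midlevel feature map into a single L2-normalized descriptor."""
    with torch.no_grad():
        fmap = extractor(preprocess(img).unsqueeze(0))["mid"]
    return F.normalize(fmap.mean(dim=(2, 3)), dim=1)  # global average pool

# Zero-shot matching: assign the template class with highest cosine similarity.
# `target_img` and `templates` (PIL images per sign class) are hypothetical.
# sims = {name: (embed(target_img) @ embed(t).T).item() for name, t in templates.items()}
# predicted = max(sims, key=sims.get)
```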

23 pages, 6299 KiB  
Article
Two-Stage Pedestrian Detection Model Using a New Classification Head for Domain Generalization
by Daniel Schulz and Claudio A. Perez
Sensors 2023, 23(23), 9380; https://doi.org/10.3390/s23239380 - 24 Nov 2023
Abstract
Pedestrian detection based on deep learning methods have reached great success in the past few years with several possible real-world applications including autonomous driving, robotic navigation, and video surveillance. In this work, a new neural network two-stage pedestrian detector with a new custom classification head, adding the triplet loss function to the standard bounding box regression and classification losses, is presented. This aims to improve the domain generalization capabilities of existing pedestrian detectors, by explicitly maximizing inter-class distance and minimizing intra-class distance. Triplet loss is applied to the features generated by the region proposal network, aimed at clustering together pedestrian samples in the features space. We used Faster R-CNN and Cascade R-CNN with the HRNet backbone pre-trained on ImageNet, changing the standard classification head for Faster R-CNN, and changing one of the three heads for Cascade R-CNN. The best results were obtained using a progressive training pipeline, starting from a dataset that is further away from the target domain, and progressively fine-tuning on datasets closer to the target domain. We obtained state-of-the-art results, MR2 of 9.9, 11.0, and 36.2 for the reasonable, small, and heavy subsets on the CityPersons benchmark with outstanding performance on the heavy subset, the most difficult one. Full article
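A minimal sketch of adding a triplet term to the standard detection losses, with hypothetical RoI feature tensors and an assumed loss weight; the paper's head integrates this inside Faster/Cascade R-CNN.

```python
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=1.0)

# Hypothetical RoI features from the region proposal stage: for each anchor
# pedestrian embedding, a positive (pedestrian) and a negative (background).
anchor = torch.randn(16, 256, requires_grad=True)
positive = torch.randn(16, 256)
negative = torch.randn(16, 256)

# Standard detection losses (classification + box regression) stand in here
# as precomputed scalars; the triplet term is added with an assumed weight.
cls_loss, box_loss = torch.tensor(0.7), torch.tensor(0.4)
total_loss = cls_loss + box_loss + 0.5 * triplet(anchor, positive, negative)
total_loss.backward()
```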

19 pages, 47409 KiB  
Article
Ore Rock Fragmentation Calculation Based on Multi-Modal Fusion of Point Clouds and Images
by Jianjun Peng, Yunhao Cui, Zhidan Zhong and Yi An
Appl. Sci. 2023, 13(23), 12558; https://doi.org/10.3390/app132312558 - 21 Nov 2023
Abstract
The accurate calculation of ore rock fragmentation is important for achieving the autonomous mining operation of mine excavators. However, a single mode cannot accurately calculate the ore rock fragmentation due to the low resolution of the point cloud mode and the lack of spatial position information of the image mode. To solve this problem, we propose an ore rock fragmentation calculation method (ORFCM) based on the multi-modal fusion of point clouds and images. The ORFCM makes full use of the advantages of multi-modal data, including the fine-grained object segmentation of images and spatial location information of point clouds. To solve the problem of image under-segmentation, we propose a multiscale adaptive edge-detection method based on an innovative standard deviation map to enhance the weak edges. Furthermore, an improved marked watershed segmentation algorithm is proposed to solve the problem of low segmentation accuracy caused by excessive noise of the gradient map and weak edges submerged. Experiments demonstrate that ORFCM can accurately calculate ore rock fragmentation in the local excavation area without relying on external markers for pixel calibration. The average error of the equivalent diameter of ore rock blocks is 0.66 cm, the average error of the elliptical long diameter is 1.42 cm, and the average error of the elliptical short diameter is 1.06 cm, which can effectively meet practical engineering needs. Full article
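A classical marker-controlled watershed over a distance transform, as a baseline sketch of the segmentation stage; the paper's edge-enhanced, improved variant is not reproduced here.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# Hypothetical binary mask of ore-rock regions (two touching blocks).
mask = np.zeros((128, 128), dtype=bool)
mask[20:70, 20:70] = True
mask[50:110, 50:110] = True

# Markers from distance-transform peaks, then marker-controlled watershed.
distance = ndi.distance_transform_edt(mask)
coords = peak_local_max(distance, min_distance=20, labels=mask)
marker_mask = np.zeros(mask.shape, dtype=bool)
marker_mask[tuple(coords.T)] = True
markers, _ = ndi.label(marker_mask)
labels = watershed(-distance, markers, mask=mask)  # one label per rock block
```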

13 pages, 867 KiB  
Article
A Four-Stage Mahalanobis-Distance-Based Method for Hand Posture Recognition
by Dawid Warchoł and Tomasz Kapuściński
Appl. Sci. 2023, 13(22), 12347; https://doi.org/10.3390/app132212347 - 15 Nov 2023
Abstract
Automatic recognition of hand postures is an important research topic with many applications, e.g., communication support for deaf people. In this paper, we present a novel four-stage, Mahalanobis-distance-based method for hand posture recognition using skeletal data. The proposed method is based on a two-stage classification algorithm with two additional stages related to joint preprocessing (normalization) and a rule-based system, specific to hand shapes that the algorithm is meant to classify. The method achieves superior effectiveness on two benchmark datasets, the first of which was created by us for the purpose of this work, while the second is a well-known and publicly available dataset. The method’s recognition rate measured by leave-one-subject-out cross-validation tests is 94.69% on the first dataset and 97.44% on the second. Experiments, including comparison with other state-of-the-art methods and ablation studies related to classification accuracy and time, confirm the effectiveness of our approach. Full article
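A minimal sketch of the Mahalanobis-distance classification stage, with per-class means and regularized covariances estimated from hypothetical feature vectors; the paper's normalization and rule-based stages are omitted.

```python
import numpy as np

class MahalanobisClassifier:
    """Assign a feature vector to the class with the smallest Mahalanobis distance."""
    def fit(self, X, y):
        self.stats = {}
        for c in np.unique(y):
            Xc = X[y == c]
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # regularized
            self.stats[c] = (Xc.mean(axis=0), np.linalg.inv(cov))
        return self

    def predict(self, x):
        def dist(c):
            mu, inv_cov = self.stats[c]
            d = x - mu
            return float(d @ inv_cov @ d)
        return min(self.stats, key=dist)

# Hypothetical normalized joint features for two hand postures.
X = np.vstack([np.random.randn(50, 6), np.random.randn(50, 6) + 3])
y = np.array([0] * 50 + [1] * 50)
print(MahalanobisClassifier().fit(X, y).predict(np.full(6, 3.0)))  # -> 1
```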

23 pages, 4942 KiB  
Article
Anomaly Detection in the Production Process of Stamping Progressive Dies Using the Shape- and Size-Adaptive Descriptors
by Liang Ma and Fanwu Meng
Sensors 2023, 23(21), 8904; https://doi.org/10.3390/s23218904 - 01 Nov 2023
Abstract
In the production process of progressive die stamping, anomaly detection is essential for ensuring the safety of expensive dies and the continuous stability of the production process. Early monitoring processes involve manually inspecting the quality of post-production products to infer whether there are anomalies in the production process, or using sensors to monitor state signals during production. However, the former is an extremely tedious and time-consuming task, and the latter cannot provide warnings before anomalies occur. Both methods can only detect anomalies after they have occurred, which usually means that damage to the die has already been caused. In this paper, we propose a machine-vision-based method for real-time anomaly detection in the production of progressive die stamping. This method can detect anomalies before they cause actual damage to the die, thereby stopping the machine and protecting the die and machine. In the proposed method, a whole continuous motion scene cycle is decomposed into a standard background template library, and the potential anomaly regions in the image to be detected are determined according to the difference from the background template library. Finally, the shape- and size-adaptive descriptors of these regions and corresponding reference regions are extracted and compared to determine the actual anomaly regions. The experimental results demonstrate that this method not only achieves satisfactory accuracy in anomaly detection during the production of progressive die stamping, but also attains competitive performance when compared with methods based on deep learning. Furthermore, it requires simpler preliminary preparations and does not necessitate the adoption of the deep learning paradigm. Full article

12 pages, 3017 KiB  
Article
Target Localization and Grasping of Parallel Robots with Multi-Vision Based on Improved RANSAC Algorithm
by Ruizhen Gao, Yang Li, Zhiqiang Liu and Shuai Zhang
Appl. Sci. 2023, 13(20), 11302; https://doi.org/10.3390/app132011302 - 14 Oct 2023
Abstract
Some traditional robots rely on offline programming of reciprocal motions, and with continuous upgrades in vision technology, more and more of these tasks are being replaced with machine vision. At present, the main method of target recognition used in palletizers is the traditional SURF algorithm, but this grasping approach suffers from low accuracy due to the influence of too many mismatched points. Because the accuracy of robot target localization with binocular vision is low, an improved random sample consensus (RANSAC) algorithm is proposed for complete parallel robot target localization and grasping under the guidance of multi-vision. First, the improved RANSAC algorithm was built on top of the SURF algorithm; next, the parallax gradient method was applied to iterate over the matched point pairs several times to further optimize the data; then, the 3D reconstruction was completed using the improved algorithm; finally, the obtained data were input into the robot arm, and the camera's internal and external parameters were obtained using the calibration method so that the robot could accurately locate and grasp objects. The experiments show that the improved algorithm achieves better recognition accuracy and grasping success with the multi-vision approach. Full article
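A minimal sketch of feature matching with RANSAC-based outlier rejection in OpenCV; SIFT stands in for SURF here (SURF requires a non-free opencv-contrib build), and the file names are hypothetical. The paper's parallax-gradient refinement is not included.

```python
import cv2
import numpy as np

img1 = cv2.imread("view_left.png", cv2.IMREAD_GRAYSCALE)   # hypothetical paths
img2 = cv2.imread("view_right.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test to discard ambiguous matches.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# RANSAC rejects the mismatched pairs that hurt localization accuracy.
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
print(f"{int(inlier_mask.sum())}/{len(good)} matches kept as inliers")
```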

20 pages, 65655 KiB  
Article
A Spatially Guided Machine-Learning Method to Classify and Quantify Glomerular Patterns of Injury in Histology Images
by Justinas Besusparis, Mindaugas Morkunas and Arvydas Laurinavicius
J. Imaging 2023, 9(10), 220; https://doi.org/10.3390/jimaging9100220 - 11 Oct 2023
Abstract
Introduction: The diagnosis of glomerular diseases is primarily based on visual assessment of histologic patterns. Semi-quantitative scoring of active and chronic lesions is often required to assess individual characteristics of the disease. Reproducibility of the visual scoring systems remains debatable, while digital and machine-learning technologies present opportunities to detect, classify and quantify glomerular lesions, also considering their inter- and intraglomerular heterogeneity. Materials and methods: We performed a cross-validated comparison of three modifications of a convolutional neural network (CNN)-based approach for recognition and intraglomerular quantification of nine main glomerular patterns of injury. Reference values provided by two nephropathologists were used for validation. For each glomerular image, visual attention heatmaps were generated with a probability of class attribution for further intraglomerular quantification. The quality of classifier-produced heatmaps was evaluated by the intersection over union (IoU) metric between predicted and ground truth localization heatmaps. Results: The proposed spatially guided modification of the CNN classifier achieved the highest glomerular pattern classification accuracies, with area under curve (AUC) values up to 0.981. With regard to heatmap overlap area and intraglomerular pattern quantification, the spatially guided classifier achieved a significantly higher generalized mean IoU value compared to single-multiclass and multiple-binary classifiers. Conclusions: We propose a spatially guided CNN classifier that, in our experiments, reveals the potential to achieve high accuracy for the localization of intraglomerular patterns. Full article

16 pages, 3617 KiB  
Article
KD-Net: Continuous-Keystroke-Dynamics-Based Human Identification from RGB-D Image Sequences
by Xinxin Dai, Ran Zhao, Pengpeng Hu and Adrian Munteanu
Sensors 2023, 23(20), 8370; https://doi.org/10.3390/s23208370 - 10 Oct 2023
Abstract
Keystroke dynamics is a soft biometric based on the assumption that humans always type in uniquely characteristic manners. Previous works mainly focused on analyzing the key press or release events. Unlike these methods, we explored a novel visual modality of keystroke dynamics for human identification using a single RGB-D sensor. In order to verify this idea, we created a dataset dubbed KD-MultiModal, which contains 243.2 K frames of RGB images and depth images, obtained by recording a video of hand typing with a single RGB-D sensor. The dataset comprises RGB-D image sequences of 20 subjects (10 males and 10 females) typing sentences, and each subject typed around 20 sentences. In the task, only the hand and keyboard region contributed to the person identification, so we also propose methods of extracting Regions of Interest (RoIs) for each type of data. Unlike the data of the key press or release, our dataset not only captures the velocity of pressing and releasing different keys and the typing style of specific keys or combinations of keys, but also contains rich information on the hand shape and posture. To verify the validity of our proposed data, we adopted deep neural networks to learn distinguishing features from different data representations, including RGB-KD-Net, D-KD-Net, and RGBD-KD-Net. Simultaneously, the sequence of point clouds also can be obtained from depth images given the intrinsic parameters of the RGB-D sensor, so we also studied the performance of human identification based on the point clouds. Extensive experimental results showed that our idea works and the performance of the proposed method based on RGB-D images is the best, which achieved 99.44% accuracy based on the unseen real-world data. To inspire more researchers and facilitate relevant studies, the proposed dataset will be publicly accessible together with the publication of this paper. Full article

15 pages, 3822 KiB  
Article
Foreground Segmentation-Based Density Grading Networks for Crowd Counting
by Zelong Liu, Xin Zhou, Tao Zhou and Yuanyuan Chen
Sensors 2023, 23(19), 8177; https://doi.org/10.3390/s23198177 - 29 Sep 2023
Abstract
Estimating object counts within a single image or video frame represents a challenging yet pivotal task in the field of computer vision. Its increasing significance arises from its versatile applications across various domains, including public safety and urban planning. Among the various object counting tasks, crowd counting is particularly notable for its critical role in social security and urban planning. However, intricate backgrounds in images often lead to misidentifications, wherein the complex background is mistaken for the foreground, thereby inflating forecasting errors. Additionally, the uneven distribution of crowd density within the foreground further exacerbates the predictive errors of the network. This paper introduces a novel architecture with a three-branch structure aimed at synergistically incorporating hierarchical foreground information and global scale information into density map estimation, thereby achieving more precise counting results. Hierarchical foreground information guides the network to perform distinct operations on regions with varying densities, while global scale information evaluates the overall density level of the image and adjusts the model’s global predictions accordingly. We also systematically investigate and compare three potential locations for integrating hierarchical foreground information into the density estimation network, ultimately determining the most effective placement. Through extensive comparative experiments across three datasets, we demonstrate the superior performance of our proposed method. Full article
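For context, counting networks of this kind typically regress a Gaussian-smoothed density map whose integral equals the object count; the sketch below shows that standard construction with hypothetical head annotations.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(points, shape, sigma=4.0):
    """Gaussian-smoothed density map; its sum equals the head count."""
    dm = np.zeros(shape, dtype=np.float32)
    for x, y in points:                 # (x, y) head annotations, hypothetical
        dm[int(y), int(x)] += 1.0
    return gaussian_filter(dm, sigma)

heads = [(40, 30), (42, 33), (200, 120)]
dm = density_map(heads, (240, 320))
print(round(dm.sum()))  # -> 3, the crowd count
```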

27 pages, 3661 KiB  
Article
Efficient Extraction of Deep Image Features Using a Convolutional Neural Network (CNN) for Detecting Ventricular Fibrillation and Tachycardia
by Azeddine Mjahad, Mohamed Saban, Hossein Azarmdel and Alfredo Rosado-Muñoz
J. Imaging 2023, 9(9), 190; https://doi.org/10.3390/jimaging9090190 - 18 Sep 2023
Cited by 2 | Viewed by 1359
Abstract
To safely select the proper therapy for ventricular fibrillation (VF), it is essential to distinguish it correctly from ventricular tachycardia (VT) and other rhythms. Since the required therapy is not the same, an erroneous detection might lead to serious injury to the patient or even cause ventricular fibrillation (VF). The primary innovation of this study lies in employing a CNN to create new features. These features exhibit the capacity and precision to detect and classify cardiac arrhythmias, including VF and VT. The electrocardiographic (ECG) signals used for this assessment were sourced from the established MIT-BIH and AHA databases. The input data to be classified are time–frequency (tf) representation images, specifically Pseudo Wigner–Ville (PWV) representations. Prior to PWV calculation, preprocessing for denoising, signal alignment, and segmentation is necessary. To check the validity of the method independently of the classifier, four different CNNs are used: InceptionV3, MobileNet, VGGNet, and AlexNet. The classification results reveal the following values: for VF detection, a sensitivity (Sens) of 98.16%, a specificity (Spe) of 99.07%, and an accuracy (Acc) of 98.91%; for ventricular tachycardia (VT), a sensitivity of 90.45%, a specificity of 99.73%, and an accuracy of 99.09%; for normal sinus rhythm, a sensitivity of 99.34%, a specificity of 98.35%, and an accuracy of 98.89%; finally, for other rhythms, a sensitivity of 96.98%, a specificity of 99.68%, and an accuracy of 99.11%. Furthermore, distinguishing between shockable (VF/VT) and non-shockable rhythms yielded a sensitivity of 99.23%, a specificity of 99.74%, and an accuracy of 99.61%. The results show that using tf representations as images, combined in this case with a CNN classifier, raises classification performance above the results of previous works. Considering that these results were achieved without the preselection of ECG episodes, these features may be successfully introduced into Automated External Defibrillation (AED) and Implantable Cardioverter Defibrillation (ICD) therapies, also opening the door to their use in other ECG rhythm detection applications. Full article
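For readers unfamiliar with the time–frequency representation used here: the discrete pseudo Wigner–Ville distribution windows the instantaneous autocorrelation in the lag direction before taking a Fourier transform. A from-scratch sketch (window length and the test signal are illustrative; the paper's exact preprocessing and parameters are not reproduced):

```python
import numpy as np

def pseudo_wigner_ville(x, win_len=127):
    """Minimal pseudo Wigner-Ville distribution of a 1-D analytic signal x."""
    n_pts = len(x)
    half = win_len // 2
    window = np.hamming(win_len)           # smoothing window in the lag direction
    tfr = np.zeros((win_len, n_pts))
    for n in range(n_pts):
        taumax = min(n, n_pts - 1 - n, half)
        tau = np.arange(-taumax, taumax + 1)
        # instantaneous autocorrelation, windowed in lag (the "pseudo" part)
        r = window[half + tau] * x[n + tau] * np.conj(x[n - tau])
        kernel = np.zeros(win_len, dtype=complex)
        kernel[tau % win_len] = r
        tfr[:, n] = np.real(np.fft.fft(kernel))
    return tfr

# tf image of a 2-second, 5 Hz test tone sampled at 250 Hz (illustrative only)
sig = np.exp(1j * 2 * np.pi * 5 * np.linspace(0, 2, 500))
image = pseudo_wigner_ville(sig)
```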

16 pages, 2875 KiB  
Article
Using Different Types of Artificial Neural Networks to Classify 2D Matrix Codes and Their Rotations—A Comparative Study
by Ladislav Karrach and Elena Pivarčiová
J. Imaging 2023, 9(9), 188; https://doi.org/10.3390/jimaging9090188 - 18 Sep 2023
Viewed by 1344
Abstract
Artificial neural networks can solve various tasks in computer vision, such as image classification, object detection, and general recognition. Our comparative study deals with four types of artificial neural networks—multilayer perceptrons, probabilistic neural networks, radial basis function neural networks, and convolutional neural networks—and investigates their ability to classify 2D matrix codes (Data Matrix codes, QR codes, and Aztec codes) as well as their rotation. The paper presents the basic building blocks of these artificial neural networks and their architecture and compares the classification accuracy of 2D matrix codes under different configurations of these neural networks. A dataset of 3000 synthetic code samples was used to train and test the neural networks. When the neural networks were trained on the full dataset, the convolutional neural network showed its superiority, followed by the RBF neural network and the multilayer perceptron. Full article

13 pages, 2949 KiB  
Article
Decoding Algorithm of Motor Imagery Electroencephalogram Signal Based on CLRNet Network Model
by Chaozhu Zhang, Hongxing Chu and Mingyuan Ma
Sensors 2023, 23(18), 7694; https://doi.org/10.3390/s23187694 - 06 Sep 2023
Viewed by 804
Abstract
EEG decoding based on motor imagery is a key component of brain–computer interface (BCI) technology and a major determinant of overall BCI performance. Due to the complexity of motor imagery EEG feature analysis, traditional classification models rely heavily on the signal preprocessing and feature design stages. End-to-end deep neural networks have been applied to motor imagery EEG classification and have shown good results. This study uses a combination of a convolutional neural network (CNN) and a long short-term memory (LSTM) network to obtain spatial information and temporal correlations from EEG signals. The use of cross-layer connectivity reduces gradient dispersion and enhances the overall stability of the network model. The effectiveness of this model, which integrates CNN, BiLSTM, and ResNet (and is called CLRNet in this study), is demonstrated on BCI Competition IV dataset 2a for decoding motor imagery EEG. The network combining CNN and BiLSTM achieved 87.0% accuracy in classifying four classes of motor imagery patterns. Adding ResNet-style cross-layer connectivity enhanced network stability and further improved the accuracy by 2.0%, reaching 89.0% classification accuracy. The experimental results show that CLRNet performs well in decoding the motor imagery EEG dataset. This study provides a better solution for motor imagery EEG decoding in brain–computer interface research. Full article
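A compact sketch of the CNN + BiLSTM + cross-layer-connection pattern described above. The 22 channels and four classes follow the BCI Competition IV-2a convention; all layer sizes are illustrative assumptions, not the actual CLRNet architecture:

```python
import torch
import torch.nn as nn

class CNNBiLSTMRes(nn.Module):
    """Sketch of a CNN + BiLSTM decoder with a ResNet-style cross-layer connection."""
    def __init__(self, n_channels=22, n_classes=4, hidden=64):
        super().__init__()
        # temporal then spatial convolution over raw EEG: (batch, 1, channels, time)
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 25), padding=(0, 12)),
            nn.Conv2d(16, 32, kernel_size=(n_channels, 1)),
            nn.BatchNorm2d(32), nn.ELU(), nn.AvgPool2d((1, 4)),
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True, bidirectional=True)
        self.skip = nn.Linear(32, 2 * hidden)        # projects CNN features for the skip sum
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                            # x: (batch, 1, channels, time)
        f = self.conv(x).squeeze(2).transpose(1, 2)  # -> (batch, steps, 32)
        h, _ = self.lstm(f)                          # temporal correlations (BiLSTM)
        h = h + self.skip(f)                         # cross-layer (residual) connection
        return self.head(h.mean(dim=1))

logits = CNNBiLSTMRes()(torch.randn(8, 1, 22, 1000))  # 8 trials, 22 channels, 1000 samples
```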

24 pages, 10442 KiB  
Article
CRABR-Net: A Contextual Relational Attention-Based Recognition Network for Remote Sensing Scene Objective
by Ningbo Guo, Mingyong Jiang, Lijing Gao, Yizhuo Tang, Jinwei Han and Xiangning Chen
Sensors 2023, 23(17), 7514; https://doi.org/10.3390/s23177514 - 29 Aug 2023
Cited by 3 | Viewed by 796
Abstract
Remote sensing scene objective recognition (RSSOR) has significant application value in both military and civilian fields. Convolutional neural networks (CNNs) have greatly advanced intelligent objective recognition for remote sensing scenes, but most CNN-based methods for high-resolution RSSOR either use only the feature map of the last layer or directly fuse feature maps from various layers by summation, which not only ignores the useful relationships between adjacent layers but also leads to feature-map redundancy and information loss, hindering improvements in recognition accuracy. In this study, a contextual, relational attention-based recognition network (CRABR-Net) is presented. It extracts convolutional feature maps from different CNN layers, emphasizes important feature content using a simple, parameter-free attention module (SimAM), fuses adjacent feature maps using a complementary relationship feature map calculation, improves feature learning using an enhanced relationship feature map calculation, and finally uses the concatenated feature maps from different layers for RSSOR. Experimental results show that CRABR-Net exploits the relationships between different CNN layers to improve recognition performance and achieves better results than several state-of-the-art algorithms, with average accuracies on AID, UC-Merced, and RSSCN7 of up to 96.46%, 99.20%, and 95.43% under generic training ratios. Full article
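The SimAM module referenced above is parameter-free: it scores every neuron with a closed-form energy term computed from the per-channel mean and variance, then gates the feature map with a sigmoid. A minimal sketch following the published SimAM formulation (the lambda value is the usual default, not necessarily the paper's setting):

```python
import torch

def simam(x, e_lambda=1e-4):
    """Parameter-free SimAM attention over a feature map x of shape (B, C, H, W)."""
    b, c, h, w = x.shape
    n = h * w - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)      # (x - mu)^2 per position
    v = d.sum(dim=(2, 3), keepdim=True) / n                # per-channel variance estimate
    e_inv = d / (4 * (v + e_lambda)) + 0.5                 # inverse of the neuron energy
    return x * torch.sigmoid(e_inv)                        # gate every spatial position

out = simam(torch.randn(2, 64, 32, 32))                    # same shape as the input
```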

24 pages, 3972 KiB  
Article
Automatic Facial Aesthetic Prediction Based on Deep Learning with Loss Ensembles
by Jwan Najeeb Saeed, Adnan Mohsin Abdulazeez and Dheyaa Ahmed Ibrahim
Appl. Sci. 2023, 13(17), 9728; https://doi.org/10.3390/app13179728 - 28 Aug 2023
Viewed by 1238
Abstract
Deep data-driven methodologies have significantly enhanced automatic facial beauty prediction (FBP), particularly convolutional neural networks (CNNs). However, despite their wide utilization in classification-based applications, the adoption of CNNs in regression research is still constrained. In addition, biases in the beauty scores assigned to facial images, such as preferences for specific ethnicities or age groups, present challenges to the effective generalization of models and may not be appropriately addressed by conventional individual loss functions. Furthermore, regression problems commonly employ the L2 loss to measure error, and this function is sensitive to outliers, making generalization difficult depending on the number of outliers in the training phase. The L1 loss is another regression loss that penalizes errors linearly and is less sensitive to outliers. The Log-cosh loss is a flexible and robust loss function for regression that provides a good compromise between the L1 and L2 losses. Ensembles of multiple loss functions have been shown to improve the performance of deep learning models in various tasks. In this work, we propose to ensemble three regression losses, namely L1, L2, and Log-cosh, averaging them to create a new composite cost function. This strategy capitalizes on the unique traits of each loss function, constructing a unified framework that harmonizes outlier tolerance, precision, and adaptability. The effectiveness of the proposed loss function was demonstrated by incorporating it into three pretrained CNNs (AlexNet, VGG16-Net, and FIAC-Net) and evaluating it on three FBP benchmarks (SCUT-FBP, SCUT-FBP5500, and MEBeauty). Integrating FIAC-Net with the proposed loss function yields remarkable outcomes across datasets owing to its pretraining on facial-attractiveness classification. The efficacy is evident in managing uncertain noise distributions, resulting in a strong correlation between machine- and human-rated aesthetic scores, along with low error rates. Full article
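The composite cost function described above is straightforward to express directly. A sketch of the three-way average (the plain mean of the three losses follows the abstract; everything else is illustrative):

```python
import torch

def composite_loss(pred, target):
    """Plain average of the L1, L2 and Log-cosh regression losses."""
    diff = pred - target
    l1 = diff.abs().mean()                         # linear penalty, robust to outliers
    l2 = diff.pow(2).mean()                        # quadratic penalty, precise near zero
    logcosh = torch.log(torch.cosh(diff)).mean()   # smooth compromise between the two
    return (l1 + l2 + logcosh) / 3.0

loss = composite_loss(torch.tensor([3.1, 4.2]), torch.tensor([3.0, 4.5]))
```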

28 pages, 21211 KiB  
Article
An Intelligent Sorting Method of Film in Cotton Combining Hyperspectral Imaging and the AlexNet-PCA Algorithm
by Quang Li, Ling Zhao, Xin Yu, Zongbin Liu and Yiqing Zhang
Sensors 2023, 23(16), 7041; https://doi.org/10.3390/s23167041 - 09 Aug 2023
Viewed by 845
Abstract
Long-staple cotton from Xinjiang is renowned for its exceptional quality. However, it is susceptible to contamination with plastic film during mechanical picking. To address the difficulty of removing film from seed cotton, a technique based on hyperspectral imaging and AlexNet-PCA is proposed to identify the colorless, transparent film in seed cotton. The method consists of black-and-white correction of the hyperspectral images, dimensionality reduction of the hyperspectral data, and training and testing of convolutional neural network (CNN) models. The key challenge is to find the optimal way to reduce the dimensionality of the hyperspectral data, thus reducing the computational cost. The main innovation of the paper is the combination of CNNs and dimensionality reduction methods to achieve high-precision intelligent recognition of transparent plastic films. Experiments with three dimensionality reduction methods and three CNN architectures were conducted to seek the optimal model for plastic film recognition. The results demonstrate that AlexNet-PCA-12 achieves the highest recognition accuracy and the best cost-performance ratio for dimensionality reduction. In practical sorting tests, the proposed method achieved a 97.02% removal rate of plastic film, providing a modern theoretical model and effective method for high-precision identification of heteropolymers in seed cotton. Full article
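To illustrate the PCA stage: a hyperspectral cube is reshaped so that each pixel's spectrum is one sample, reduced to a handful of components, and reshaped back before being fed to the CNN. A sketch with illustrative dimensions (the paper's band count and scene size are not reproduced):

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative cube: a 256 x 256 scene with 128 spectral bands (placeholder data)
cube = np.random.rand(256, 256, 128)
h, w, bands = cube.shape

# PCA along the spectral axis: every pixel spectrum is one sample
pca = PCA(n_components=12)                  # e.g. the 12 components of AlexNet-PCA-12
flat = pca.fit_transform(cube.reshape(-1, bands))
reduced = flat.reshape(h, w, 12)            # 12-channel image passed on to the CNN
print(pca.explained_variance_ratio_.sum())  # fraction of variance the CNN still sees
```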

21 pages, 1997 KiB  
Article
Shot Boundary Detection with 3D Depthwise Convolutions and Visual Attention
by Miguel Jose Esteve Brotons, Francisco Javier Lucendo, Rodriguez-Juan Javier and Jose Garcia-Rodriguez
Sensors 2023, 23(16), 7022; https://doi.org/10.3390/s23167022 - 08 Aug 2023
Viewed by 805
Abstract
Shot boundary detection is the process of identifying and locating the boundaries between individual shots in a video sequence. A shot is a continuous sequence of frames captured by a single camera, without any cuts or edits. Recent investigations have shown the effectiveness of 3D convolutional networks for this task due to their high capacity to extract spatiotemporal features of the video and determine in which frame a transition or shot change occurs. When this task is part of a scene segmentation use case aimed at improving the experience of viewing content on streaming platforms, segmentation speed is very important for live and near-live use cases such as start-over. The problem with models based on 3D convolutions is the large number of parameters they entail. Standard 3D convolutions impose much higher CPU and memory requirements than the same 2D operations. In this paper, we rely on depthwise separable convolutions to address this problem, with a scheme that significantly reduces the number of parameters. To compensate for the slight loss of performance, we analyze and propose the use of visual self-attention as a mechanism for improvement. Full article
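The parameter saving from depthwise separable 3D convolutions is easy to verify: a per-channel 3D convolution followed by a 1x1x1 pointwise convolution replaces one dense 3D kernel. A sketch (channel counts are illustrative, not the paper's architecture):

```python
import torch.nn as nn

class DepthwiseSeparable3d(nn.Module):
    """3D depthwise separable conv: per-channel 3D filter + 1x1x1 pointwise mix."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.depthwise = nn.Conv3d(c_in, c_in, k, padding=k // 2, groups=c_in)
        self.pointwise = nn.Conv3d(c_in, c_out, kernel_size=1)

    def forward(self, x):                     # x: (batch, channels, frames, H, W)
        return self.pointwise(self.depthwise(x))

std = nn.Conv3d(64, 128, 3, padding=1)
sep = DepthwiseSeparable3d(64, 128)
print(sum(p.numel() for p in std.parameters()))  # 221,312 parameters
print(sum(p.numel() for p in sep.parameters()))  # 10,112 parameters, ~22x fewer
```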

13 pages, 1536 KiB  
Article
Open-Set Recognition of Wood Species Based on Deep Learning Feature Extraction Using Leaves
by Tianyu Fang, Zhenyu Li, Jialin Zhang, Dawei Qi and Lei Zhang
J. Imaging 2023, 9(8), 154; https://doi.org/10.3390/jimaging9080154 - 30 Jul 2023
Viewed by 1251
Abstract
An open-set recognition scheme for tree leaves based on deep learning feature extraction is presented in this study. Deep learning algorithms are used to extract leaf features for different wood species, and the leaves are divided into two datasets: leaves of known wood species and leaves of unknown species. A deep convolutional neural network (CNN) is trained on the leaves of selected known wood species, and the features of the remaining known wood species and all unknown wood species are extracted using the trained CNN. Single-class classification is then performed using the weighted SVDD algorithm to distinguish leaves of known species from those of unknown species. The features of leaves recognized as belonging to known wood species are fed back to the trained CNN to identify the specific species. The recognition results of the single-class classifier for known and unknown wood species are combined with the recognition results of the multi-class CNN to complete the open-set recognition of wood species. We tested the proposed method on the publicly available Swedish Leaf Dataset, which includes 15 wood species (5 used as known and 10 as unknown). The test results showed that, with F1 scores of 0.7797 and 0.8644, mixed recognition rates of 95.15% and 93.14%, and Kappa coefficients of 0.7674 and 0.8644 under two different data distributions, the proposed method outperformed state-of-the-art open-set recognition algorithms in all three respects. Moreover, the more wood species that are known, the better the recognition. This approach can extract effective features from tree leaf images for open-set recognition and achieve wood species recognition without compromising tree material. Full article
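sklearn ships no weighted SVDD, but a one-class SVM with an RBF kernel solves a closely related boundary-description problem and illustrates the known/unknown split described above. A sketch with placeholder features (feature dimensions and the nu parameter are assumptions; this is a stand-in, not the paper's classifier):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Placeholder CNN features: rows are leaves of known wood species
feats_known = np.random.rand(200, 512)
feats_query = np.random.rand(50, 512)

# One-class boundary around the known-species features; the RBF one-class SVM
# is used here only as an analogue of the paper's weighted SVDD
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(feats_known)
is_known = clf.predict(feats_query) == 1   # +1 -> known species, -1 -> unknown
```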

18 pages, 16622 KiB  
Article
Tomato Maturity Detection and Counting Model Based on MHSA-YOLOv8
by Ping Li, Jishu Zheng, Peiyuan Li, Hanwei Long, Mai Li and Lihong Gao
Sensors 2023, 23(15), 6701; https://doi.org/10.3390/s23156701 - 26 Jul 2023
Cited by 6 | Viewed by 3744
Abstract
The online automated maturity grading and counting of tomato fruits helps promote the digital supervision of fruit growth status and unmanned precision operations during planting. Traditionally, the grading and counting of tomato fruit maturity are mostly done manually, which is time-consuming and laborious, and the precision depends on the accuracy of human observation. The combination of artificial intelligence and machine vision has, to some extent, solved this problem. In this work, firstly, a digital camera is used to obtain tomato fruit image datasets, taking into account factors such as occlusion and external light interference. Secondly, based on the requirements of the tomato maturity grading task, the MHSA attention mechanism is adopted to improve the YOLOv8 backbone and enhance the network's ability to extract diverse features. The Precision, Recall, F1-score, and mAP50 of the tomato fruit maturity grading model built on MHSA-YOLOv8 were 0.806, 0.807, 0.806, and 0.864, respectively, improving performance with only a slight increase in model size. Finally, thanks to the excellent performance of MHSA-YOLOv8, the Precision, Recall, F1-score, and mAP50 of the counting models were 0.990, 0.960, 0.975, and 0.916, respectively. The tomato maturity grading and counting model constructed in this study is suitable for both online and offline detection, which greatly helps to improve the harvesting and grading efficiency of tomato growers. The main innovations of this study are as follows: (1) a tomato maturity grading and counting dataset collected from actual production scenarios was constructed; (2) considering the complexity of the environment, a new object detection method, MHSA-YOLOv8, is proposed, and tomato maturity grading models and counting models are constructed; (3) the constructed models are suitable for both online and offline grading and counting. Full article
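The MHSA block mentioned above can be sketched as standard multi-head self-attention applied to a flattened CNN feature map; where exactly it is inserted in the YOLOv8 backbone is the paper's design choice and is not reproduced here (head count and channel width are illustrative):

```python
import torch
import torch.nn as nn

class MHSABlock(nn.Module):
    """Multi-head self-attention over a CNN feature map of shape (B, C, H, W)."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)   # (B, H*W, C): every position is a token
        out, _ = self.attn(seq, seq, seq)    # global pairwise interactions
        seq = self.norm(seq + out)           # residual connection + normalisation
        return seq.transpose(1, 2).reshape(b, c, h, w)

y = MHSABlock(256)(torch.randn(1, 256, 20, 20))
```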

10 pages, 1048 KiB  
Article
Quantitative CT Metrics Associated with Variability in the Diffusion Capacity of the Lung of Post-COVID-19 Patients with Minimal Residual Lung Lesions
by Han Wen, Julio A. Huapaya, Shreya M. Kanth, Junfeng Sun, Brianna P. Matthew, Simone C. Lee, Michael Do, Marcus Y. Chen, Ashkan A. Malayeri and Anthony F. Suffredini
J. Imaging 2023, 9(8), 150; https://doi.org/10.3390/jimaging9080150 - 26 Jul 2023
Cited by 4 | Viewed by 1020
Abstract
(1) Background: A reduction in the diffusion capacity of the lung for carbon monoxide is a prevalent longer-term consequence of COVID-19 infection. In patients who have zero or minimal residual radiological abnormalities in the lungs, it has been debated whether the cause was mainly due to a reduced alveolar volume or involved diffuse interstitial or vascular abnormalities. (2) Methods: We performed a cross-sectional study of 45 patients with either zero or minimal residual lesions in the lungs (total volume < 7 cc) at two months to one year post COVID-19 infection. There was considerable variability in the diffusion capacity of the lung for carbon monoxide, with 27% of the patients at less than 80% of the predicted reference. We investigated a set of independent variables that may affect the diffusion capacity of the lung, including demographic, pulmonary physiology and CT (computed tomography)-derived variables of vascular volume, parenchymal density and residual lesion volume. (3) Results: The leading three variables that contributed to the variability in the diffusion capacity of the lung for carbon monoxide were the alveolar volume, determined via pulmonary function tests, the blood vessel volume fraction, determined via CT, and the parenchymal radiodensity, also determined via CT. These factors explained 49% of the variance of the diffusion capacity, with p values of 0.031, 0.005 and 0.018, respectively, after adjusting for confounders. A multiple-regression model combining these three variables fit the measured values of the diffusion capacity, with R = 0.70 and p < 0.001. (4) Conclusions: The results are consistent with the notion that in some post-COVID-19 patients, after their pulmonary lesions resolve, diffuse changes in the vascular and parenchymal structures, in addition to a low alveolar volume, could be contributors to a lingering low diffusion capacity. Full article
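The three-variable model reported above corresponds to an ordinary least-squares multiple regression. A sketch with placeholder data (the study of course used the measured clinical and CT variables, not random values):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: alveolar volume (PFT), CT blood vessel volume fraction, CT parenchymal
# radiodensity; y: measured diffusion capacity. Random placeholders stand in for
# the study's clinical data.
X = np.random.rand(45, 3)
y = np.random.rand(45)

model = LinearRegression().fit(X, y)
r = np.corrcoef(model.predict(X), y)[0, 1]   # analogue of the reported R = 0.70
```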

16 pages, 3403 KiB  
Article
Varroa Destructor Classification Using Legendre–Fourier Moments with Different Color Spaces
by Alicia Noriega-Escamilla, César J. Camacho-Bello, Rosa M. Ortega-Mendoza, José H. Arroyo-Núñez and Lucia Gutiérrez-Lazcano
J. Imaging 2023, 9(7), 144; https://doi.org/10.3390/jimaging9070144 - 14 Jul 2023
Cited by 1 | Viewed by 1505
Abstract
Bees play a critical role in pollination and food production, so their preservation is essential, which particularly highlights the importance of detecting bee diseases early. The Varroa destructor mite is the primary factor contributing to increased viral infections that can lead to hive mortality. This study presents an innovative method for identifying Varroa destructor mites on honey bees using multichannel Legendre–Fourier moments. The descriptors derived from this approach possess distinctive characteristics, such as rotation and scale invariance and noise resistance, allowing digital images to be represented with a minimal number of descriptors. This characteristic is advantageous when analyzing images of living organisms that are not in a static posture. We evaluate the algorithm's efficiency using different color models and, to enhance its capacity, use a subdivision of the VarroaDataset. This enhancement allows the algorithm to process additional information about the color and shape of the bee's legs, wings, eyes, and mouth. To demonstrate the advantages of our approach, we compare it with deep learning methods for semantic segmentation, such as DeepLabV3, and for object detection, such as YOLOv5. The results suggest that our proposal offers a promising means for early detection of the Varroa destructor mite, which could be an essential pillar in the preservation of bees and, therefore, in food production. Full article

20 pages, 9805 KiB  
Article
Augmented Reality in Maintenance—History and Perspectives
by Ana Malta, Torres Farinha and Mateus Mendes
J. Imaging 2023, 9(7), 142; https://doi.org/10.3390/jimaging9070142 - 10 Jul 2023
Viewed by 1888
Abstract
Augmented Reality (AR) is a technology that allows virtual elements to be superimposed over images of real contexts, whether these are text elements, graphics, or other types of objects. Smart AR glasses are increasingly optimized, and modern ones have features such as a Global Positioning System (GPS) receiver, a microphone, and gesture recognition, among others. These devices allow users to keep their hands free to perform tasks while receiving instructions in real time through the glasses. This allows maintenance professionals to carry out interventions more efficiently and in a shorter time than would be possible without the support of this technology. In the present work, a timeline of important achievements is established, including important findings in object recognition, real-time operation, and integration of technologies for shop floor use. Perspectives on future research and related recommendations are proposed as well. Full article

11 pages, 2637 KiB  
Article
Fast and Efficient Evaluation of the Mass Composition of Shredded Electrodes from Lithium-Ion Batteries Using 2D Imaging
by Peter Bischoff, Alexandra Kaas, Christiane Schuster, Thomas Härtling and Urs Peuker
J. Imaging 2023, 9(7), 135; https://doi.org/10.3390/jimaging9070135 - 05 Jul 2023
Cited by 2 | Viewed by 1403
Abstract
With the increasing number of electrical devices, especially electric vehicles, the need for efficient recycling processes of electric components is on the rise. Mechanical recycling of lithium-ion batteries includes the comminution of the electrodes and sorting the particle mixtures to achieve the highest possible purities of the individual material components (e.g., copper and aluminum). An important part of recycling is the quantitative determination of the yield and recovery rate, which is required to adapt the processes to different feed materials. Since this is usually done by sorting individual particles manually before determining the mass of each material, we developed a novel method for automating this evaluation process. The method is based on detecting the different material particles in images based on simple thresholding techniques and analyzing the correlation of the area of each material in the field of view to the mass in the previously prepared samples. This can then be applied to further samples to determine their mass composition. Using this automated method, the process is accelerated, the accuracy is improved compared to a human operator, and the cost of the evaluation process is reduced. Full article
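The core of the method, thresholding each material's color range and converting pixel area to mass via a pre-calibrated factor, fits in a few lines. A sketch (the HSV bounds and grams-per-pixel factor are invented placeholders; the paper calibrates such factors on manually sorted reference samples):

```python
import numpy as np

def mass_composition(image_hsv, ranges, grams_per_pixel):
    """Estimate per-material mass from one image by colour thresholding.

    ranges: {material: (low, high)} HSV bounds; grams_per_pixel: calibration
    factors fitted beforehand on reference samples of known mass.
    """
    masses = {}
    for name, (low, high) in ranges.items():
        mask = np.all((image_hsv >= low) & (image_hsv <= high), axis=-1)
        masses[name] = mask.sum() * grams_per_pixel[name]   # pixel area -> grams
    return masses

img = np.random.randint(0, 255, (480, 640, 3))              # placeholder image
print(mass_composition(img, {"copper": ((5, 80, 80), (25, 255, 255))}, {"copper": 0.002}))
```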

15 pages, 5804 KiB  
Article
MineSDS: A Unified Framework for Small Object Detection and Drivable Area Segmentation for Open-Pit Mining Scenario
by Yong Liu, Cheng Li, Jiade Huang and Ming Gao
Sensors 2023, 23(13), 5977; https://doi.org/10.3390/s23135977 - 27 Jun 2023
Cited by 1 | Viewed by 1231
Abstract
To tackle the challenges posed by dense small objects and fuzzy boundaries on unstructured roads in mining scenarios, we propose an end-to-end small object detection and drivable area segmentation framework for open-pit mining. We employ a convolutional network backbone as a feature extractor for both tasks, as multi-task learning has yielded promising results in autonomous driving perception. To address small object detection, we introduce a lightweight attention module that allows our network to focus more on the spatial and channel dimensions of small objects without impeding inference time. We also use a convolutional block attention module in the drivable area segmentation subnetwork, which assigns more weight to road boundaries to improve feature mapping capabilities. Furthermore, to improve the network's perception accuracy on both tasks, we use a weighted summation when designing the loss function. We validated the effectiveness of our approach by testing it on a pre-collected mining dataset called Minescape. Our detection results on the Minescape dataset show 87.8% mAP, 9.3% higher than state-of-the-art algorithms, and our segmentation results surpass the comparison algorithm by 1 percentage point in MIoU. These experimental results demonstrate that our approach achieves competitive performance. Full article

23 pages, 29094 KiB  
Article
An Approach for 3D Modeling of the Regular Relief Surface Topography Formed by a Ball Burnishing Process Using 2D Images and Measured Profilograms
by Stoyan Slavov, Lyubomir Si Bao Van, Diyan Dimitrov and Boris Nikolov
Sensors 2023, 23(13), 5801; https://doi.org/10.3390/s23135801 - 21 Jun 2023
Viewed by 939
Abstract
This paper advances an innovative approach for three-dimensional modeling of the regular relief topography formed via a ball burnishing process. The proposed methodology involves capturing a greyscale image of the surface topography and measuring its profile in two perpendicular directions using a stylus method. A specially developed algorithm then identifies the best match between the measured profile segment and a row or column of the captured topography image by carrying out a signal correlation assessment based on an appropriate similarity metric. To ensure accurate scaling, the image pixel grey levels are scaled by a factor calculated as the larger of the ratios between the ultimate heights of the measured profilograms and those of the best-matched image rows/columns. Nine different similarity metrics were tested to determine the best-performing model. The developed approach was evaluated on eight distinct types of fully and partially regular reliefs, and the results reveal that the best-scaled 3D topography models are produced for the fully regular reliefs with much greater heights. Following a thorough analysis of the results obtained, at the end of the paper we draw some conclusions and discuss potential future work. Full article
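A minimal sketch of the profile-to-image matching step: a normalized cross-correlation (only one of the several similarity metrics the paper compares) scores each image row against the measured profilogram, and the grey-level scale factor follows from the height ratio. Array contents here are placeholders:

```python
import numpy as np

def best_matching_row(image, profile):
    """Find the image row most similar to a measured profilogram segment."""
    p = (profile - profile.mean()) / profile.std()
    scores = []
    for row in image.astype(float):
        r = (row - row.mean()) / (row.std() + 1e-9)
        scores.append(np.mean(p * r))        # Pearson-style normalised correlation
    best = int(np.argmax(scores))
    # grey level -> height scale factor from the ratio of the two ultimate heights
    scale = np.ptp(profile) / (np.ptp(image[best]) + 1e-9)
    return best, scale

row, scale = best_matching_row(np.random.rand(200, 300), np.random.rand(300))
```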

20 pages, 2936 KiB  
Article
Analysis of the Asymmetry between Both Eyes in Early Diagnosis of Glaucoma Combining Features Extracted from Retinal Images and OCTs into Classification Models
by Francisco Rodríguez-Robles, Rafael Verdú-Monedero, Rafael Berenguer-Vidal, Juan Morales-Sánchez and Inmaculada Sellés-Navarro
Sensors 2023, 23(10), 4737; https://doi.org/10.3390/s23104737 - 14 May 2023
Viewed by 1355
Abstract
This study aims to analyze the asymmetry between both eyes of the same patient for the early diagnosis of glaucoma. Two imaging modalities, retinal fundus images and optical coherence tomographies (OCTs), have been considered in order to compare their different capabilities for glaucoma detection. From retinal fundus images, the difference between cup/disc ratio and the width of the optic rim has been extracted. Analogously, the thickness of the retinal nerve fiber layer has been measured in spectral-domain optical coherence tomographies. These measurements have been considered as asymmetry characteristics between eyes in the modeling of decision trees and support vector machines for the classification of healthy and glaucoma patients. The main contribution of this work is indeed the use of different classification models with both imaging modalities to jointly exploit the strengths of each of these modalities for the same diagnostic purpose based on the asymmetry characteristics between the eyes of the patient. The results show that the optimized classification models provide better performance with OCT asymmetry features between both eyes (sensitivity 80.9%, specificity 88.2%, precision 66.7%, accuracy 86.5%) than with those extracted from retinographies, although a linear relationship has been found between certain asymmetry features extracted from both imaging modalities. Therefore, the resulting performance of the models based on asymmetry features proves their ability to differentiate healthy from glaucoma patients using those metrics. Models trained from fundus characteristics are a useful option as a glaucoma screening method in the healthy population, although with lower performance than those trained from the thickness of the peripapillary retinal nerve fiber layer. In both imaging modalities, the asymmetry of morphological characteristics can be used as a glaucoma indicator, as detailed in this work. Full article

17 pages, 16376 KiB  
Article
Gaze-Dependent Image Re-Ranking Technique for Enhancing Content-Based Image Retrieval
by Yuhu Feng, Keisuke Maeda, Takahiro Ogawa and Miki Haseyama
Appl. Sci. 2023, 13(10), 5948; https://doi.org/10.3390/app13105948 - 11 May 2023
Viewed by 1491
Abstract
Content-based image retrieval (CBIR) aims to find desired images similar to an image input by the user, and it is extensively used in the real world. Conventional CBIR methods do not consider user preferences since they determine retrieval results only by referring to the degree of resemblance between the query and potential candidate images. For this reason, a "semantic gap" appears, as the model may not accurately understand the intention that a user has embedded in the query image. In this article, we propose a re-ranking method for CBIR that considers a user's gaze trace as interactive information to help the model predict the user's inherent attention. The proposed method uses the user's gaze trace corresponding to the images obtained from the initial retrieval as the user's preference information. We introduce image captioning to effectively express the relationship between images and gaze information by generating image captions based on the gaze trace. As a result, we can transform the coordinate data into a text format and explicitly express the semantic information of the images. Finally, image retrieval is performed again using the generated gaze-dependent image captions to obtain images that align more accurately with the user's preferences or interests. The experimental results on an open image dataset with corresponding gaze traces and human-generated descriptions demonstrate the efficacy of the proposed method. Our method uses visual information as the user's feedback to achieve user-oriented image retrieval. Full article

18 pages, 2165 KiB  
Article
Real-Time Machine Learning-Based Driver Drowsiness Detection Using Visual Features
by Yaman Albadawi, Aneesa AlRedhaei and Maen Takruri
J. Imaging 2023, 9(5), 91; https://doi.org/10.3390/jimaging9050091 - 29 Apr 2023
Cited by 9 | Viewed by 10402
Abstract
Drowsiness-related car accidents continue to have a significant effect on road safety. Many of these accidents can be prevented by alerting drivers once they start feeling drowsy. This work presents a non-invasive system for real-time driver drowsiness detection using visual features. These features are extracted from videos obtained by a camera installed on the dashboard. The proposed system uses facial landmark and face mesh detectors to locate the regions of interest, where mouth aspect ratio, eye aspect ratio, and head pose features are extracted and fed to three different classifiers: random forest, sequential neural network, and linear support vector machine classifiers. Evaluations of the proposed system on the National Tsing Hua University driver drowsiness detection dataset showed that it can successfully detect and alert drowsy drivers with up to 99% accuracy. Full article
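The eye aspect ratio feature mentioned above is conventionally computed from six eye landmarks. A sketch using the standard definition (the drowsiness threshold and landmark coordinates are illustrative values, not the paper's tuned ones):

```python
import numpy as np

def eye_aspect_ratio(p):
    """EAR from six eye landmarks p1..p6 given as rows of a 6 x 2 array."""
    vert1 = np.linalg.norm(p[1] - p[5])      # upper/lower lid distance (inner)
    vert2 = np.linalg.norm(p[2] - p[4])      # upper/lower lid distance (outer)
    horiz = np.linalg.norm(p[0] - p[3])      # eye corner to eye corner
    return (vert1 + vert2) / (2.0 * horiz)   # drops sharply when the eye closes

# EAR below a tuned threshold (e.g. ~0.2) for several consecutive frames -> drowsy
ear = eye_aspect_ratio(np.array([[0, 2], [2, 0], [4, 0], [6, 2], [4, 4], [2, 4]]))
```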

16 pages, 1669 KiB  
Article
Progressively Hybrid Transformer for Multi-Modal Vehicle Re-Identification
by Wenjie Pan, Linhan Huang, Jianbao Liang, Lan Hong and Jianqing Zhu
Sensors 2023, 23(9), 4206; https://doi.org/10.3390/s23094206 - 23 Apr 2023
Cited by 3 | Viewed by 1747
Abstract
Multi-modal (i.e., visible, near-infrared, and thermal-infrared) vehicle re-identification has good potential for finding vehicles of interest in low illumination. However, because different modalities have varying imaging characteristics, proper fusion of multi-modal complementary information is crucial to multi-modal vehicle re-identification. To that end, this paper proposes a progressively hybrid transformer (PHT). The PHT method consists of two parts: random hybrid augmentation (RHA) and a feature hybrid mechanism (FHM). For RHA, an image random cropper and a local region hybrider are designed. The image random cropper simultaneously crops multi-modal images at random positions with random numbers, sizes, and aspect ratios to generate local regions. The local region hybrider fuses the cropped regions so that regions of each modality carry local structural characteristics of all modalities, mitigating modal differences at the beginning of feature learning. For the FHM, a modal-specific controller and a modal information embedding are designed to effectively fuse multi-modal information at the feature level. Experimental results show that the proposed method outperforms the state-of-the-art method by 2.7% mAP on RGBNT100 and by 6.6% mAP on RGBN300, demonstrating that it can learn multi-modal complementary information effectively. Full article

10 pages, 1165 KiB  
Brief Report
Invariant Pattern Recognition with Log-Polar Transform and Dual-Tree Complex Wavelet-Fourier Features
by Guangyi Chen and Adam Krzyzak
Sensors 2023, 23(8), 3842; https://doi.org/10.3390/s23083842 - 09 Apr 2023
Viewed by 1157
Abstract
In this paper, we propose a novel method for 2D pattern recognition that extracts features with the log-polar transform, the dual-tree complex wavelet transform (DTCWT), and the 2D fast Fourier transform (FFT2). Our new method is invariant to translation, rotation, and scaling of the input 2D pattern images in a multiresolution way, which is very important for invariant pattern recognition. Very low-resolution sub-bands lose important features of the pattern images, while very high-resolution sub-bands contain significant amounts of noise; intermediate-resolution sub-bands are therefore well suited for invariant pattern recognition. Experiments on a printed Chinese character dataset and a 2D aircraft dataset show that our new method outperforms two existing methods across combinations of rotation angles, scaling factors, and noise levels in the input pattern images in most test cases. Full article
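The invariance chain rests on a classical fact: the log-polar transform turns rotation and scaling into translations, and the FFT magnitude is translation-invariant. A sketch of those two stages with OpenCV (the DTCWT stage between them is omitted here; it could be supplied by a dedicated wavelet package, and the image is a placeholder):

```python
import cv2
import numpy as np

img = np.random.rand(128, 128).astype(np.float32)   # placeholder pattern image

# Log-polar transform: rotation and scaling of the input become translations
center = (img.shape[1] / 2, img.shape[0] / 2)
lp = cv2.warpPolar(img, (128, 128), center, maxRadius=64, flags=cv2.WARP_POLAR_LOG)

# The FFT magnitude is translation-invariant, hence invariant to the original
# rotation and scale
feat = np.abs(np.fft.fft2(lp))
```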

14 pages, 1674 KiB  
Article
YOLOv5s-CA: A Modified YOLOv5s Network with Coordinate Attention for Underwater Target Detection
by Ge Wen, Shaobao Li, Fucai Liu, Xiaoyuan Luo, Meng-Joo Er, Mufti Mahmud and Tao Wu
Sensors 2023, 23(7), 3367; https://doi.org/10.3390/s23073367 - 23 Mar 2023
Cited by 20 | Viewed by 3552
Abstract
Underwater target detection techniques have been extensively applied to underwater vehicles for marine surveillance, aquaculture, and rescue applications. However, due to complex underwater environments and insufficient training samples, the accuracy of existing underwater target recognition algorithms is still unsatisfactory, and a long-term effort is essential to improve it. To achieve this goal, in this work we propose a modified YOLOv5s network, called the YOLOv5s-CA network, which embeds a Coordinate Attention (CA) module and a Squeeze-and-Excitation (SE) module, aiming to concentrate more computing power on the target to improve detection accuracy. Based on the existing YOLOv5s network, the number of bottlenecks in the first C3 module was increased from one to three to improve shallow feature extraction. The CA module was embedded into the C3 modules to strengthen the attention focused on the target, and the SE layer was added to the output of the C3 modules to further strengthen model attention. Experiments on the data of the 2019 China Underwater Robot Competition were conducted, and the results demonstrate that the mean Average Precision (mAP) of the modified YOLOv5s network increased by 2.4%. Full article

22 pages, 8315 KiB  
Article
Real-Time Fire Smoke Detection Method Combining a Self-Attention Mechanism and Radial Multi-Scale Feature Connection
by Chuan Jin, Anqi Zheng, Zhaoying Wu and Changqing Tong
Sensors 2023, 23(6), 3358; https://doi.org/10.3390/s23063358 - 22 Mar 2023
Cited by 3 | Viewed by 2145
Abstract
Fire remains a pressing issue that requires urgent attention. Due to its uncontrollable and unpredictable nature, it can easily trigger chain reactions and increase the difficulty of extinguishing, posing a significant threat to people's lives and property. The effectiveness of traditional photoelectric- or ionization-based detectors is limited when detecting fire smoke due to the variable shape, characteristics, and scale of the detected objects and the small size of the fire source in the early stages. Additionally, the uneven distribution of fire and smoke and the complexity and variety of their surroundings lead to inconspicuous pixel-level feature information, making identification difficult. We propose a real-time fire smoke detection algorithm based on multi-scale feature information and an attention mechanism. Firstly, the feature layers extracted from the network are fused into a radial connection to enhance the semantic and location information of the features. Secondly, to address the challenge of recognizing harsh fire sources, we designed a permutation self-attention mechanism that concentrates on features along the channel and spatial directions to gather contextual information as accurately as possible. Thirdly, we constructed a new feature extraction module to increase the detection efficiency of the network while retaining feature information. Finally, we propose a cross-grid sample matching approach and a weighted decay loss function to handle the issue of imbalanced samples. Our model achieves the best detection results compared to standard detection methods on a hand-crafted fire smoke detection dataset, with AP(val) reaching 62.5%, AP_S(val) reaching 58.5%, and FPS reaching 113.6. Full article

21 pages, 13516 KiB  
Article
Real-Time Target Detection System for Animals Based on Self-Attention Improvement and Feature Extraction Optimization
by Mingyu Zhang, Fei Gao, Wuping Yang and Haoran Zhang
Appl. Sci. 2023, 13(6), 3987; https://doi.org/10.3390/app13063987 - 21 Mar 2023
Cited by 5 | Viewed by 2553
Abstract
In this paper, we propose a wildlife detection algorithm based on an improved YOLOv5s, using real wildlife images of six species of different sizes and forms as the dataset. Firstly, we use the RepVGG model, which integrates the ideas of VGG and ResNet, to simplify the network structure. RepVGG introduces a structural reparameterization approach that ensures model flexibility while reducing computational effort, which not only enhances feature extraction but also speeds up computation, further improving real-time performance. Secondly, we use the sliding window method of the Swin Transformer module to divide the feature map, speeding up model convergence and improving real-time performance. We then introduce the C3TR module to segment the feature map, expand its receptive field, mitigate vanishing and exploding gradients in backpropagation, and enhance the model's feature extraction and fusion abilities. Finally, the model is improved with SimOTA, a positive and negative sample matching strategy that introduces a cost matrix to obtain the highest accuracy at minimum cost. The experimental results show that the improved YOLOv5s algorithm proposed in this paper improves mAP by 3.2% and FPS by 11.9 compared with the original YOLOv5s algorithm. In addition, the improved YOLOv5s model holds clear advantages in detection accuracy and speed over other common target detection algorithms on our animal dataset, demonstrating the effectiveness and superiority of the improved YOLOv5s algorithm for animal target detection. Full article

17 pages, 2573 KiB  
Article
Left Ventricle Detection from Cardiac Magnetic Resonance Relaxometry Images Using Visual Transformer
by Lisa Anita De Santi, Antonella Meloni, Maria Filomena Santarelli, Laura Pistoia, Anna Spasiano, Tommaso Casini, Maria Caterina Putti, Liana Cuccia, Filippo Cademartiri and Vincenzo Positano
Sensors 2023, 23(6), 3321; https://doi.org/10.3390/s23063321 - 21 Mar 2023
Cited by 1 | Viewed by 1776
Abstract
Left Ventricle (LV) detection from Cardiac Magnetic Resonance (CMR) imaging is a fundamental step, preliminary to myocardium segmentation and characterization. This paper focuses on the application of a Visual Transformer (ViT), a novel neural network architecture, to automatically detect the LV in CMR relaxometry sequences. We implemented an object detector based on the ViT model to identify the LV in CMR multi-echo T2* sequences. We evaluated performance, differentiated by slice location according to the American Heart Association model, using 5-fold cross-validation and an independent dataset of CMR T2*, T2, and T1 acquisitions. To the best of our knowledge, this is the first attempt to localize the LV in relaxometry sequences and the first application of ViT to LV detection. We obtained an Intersection over Union (IoU) index of 0.68 and a Correct Identification Rate (CIR) of the blood pool centroid of 0.99, comparable with other state-of-the-art methods. IoU and CIR values were significantly lower in apical slices. No significant differences in performance were observed on the independent T2* dataset (IoU = 0.68, p = 0.405; CIR = 0.94, p = 0.066). Performance was significantly worse on the T2 and T1 independent datasets (T2: IoU = 0.62, CIR = 0.95; T1: IoU = 0.67, CIR = 0.98), but still encouraging considering the different types of acquisition. This study confirms the feasibility of applying ViT architectures to LV detection and defines a benchmark for relaxometry imaging. Full article

17 pages, 912 KiB  
Article
Skew Class-Balanced Re-Weighting for Unbiased Scene Graph Generation
by Haeyong Kang and Chang D. Yoo
Mach. Learn. Knowl. Extr. 2023, 5(1), 287-303; https://doi.org/10.3390/make5010018 - 10 Mar 2023
Cited by 2 | Viewed by 2050
Abstract
An unbiased scene graph generation (SGG) algorithm referred to as Skew Class-Balanced Re-Weighting (SCR) is proposed to address the biased predicate predictions caused by long-tailed distributions. Prior works focus mainly on alleviating the deteriorating performance of minority predicate predictions, but in doing so they show drastically dropping recall scores on the majority predicates. The trade-off between majority and minority predicate performance on the limited SGG datasets has not yet been properly analyzed. In this paper, to alleviate this issue, the Skew Class-Balanced Re-Weighting (SCR) loss function is proposed for unbiased SGG models. Leveraging the skewness of biased predicate predictions, SCR estimates the target predicate weight coefficients and then assigns larger weights to the biased predicates, better trading off the majority predicates against the minority ones. Extensive experiments conducted on the standard Visual Genome dataset and Open Images V4 and V6 show the performance and generality of SCR with traditional SGG models. Full article

15 pages, 4932 KiB  
Article
Research on Crack Width Measurement Based on Binocular Vision and Improved DeeplabV3+
by Chaoxin Chen and Peng Shen
Appl. Sci. 2023, 13(5), 2752; https://doi.org/10.3390/app13052752 - 21 Feb 2023
Cited by 6 | Viewed by 1956
Abstract
Crack width is the main manifestation of concrete material deterioration. To measure crack information quickly and conveniently, a non-contact method for measuring cracks in planar concrete structures based on binocular vision is proposed. Firstly, an improved DeeplabV3+ semantic segmentation model is proposed, which uses L-MobileNetV2 as the backbone feature extraction network, adopts the IDAM structure to extract high-level semantic information, introduces the ECA attention mechanism, and optimizes the loss function of the model to achieve high-precision segmentation of crack areas. Secondly, the plane space coordinate equation of the concrete structure is constructed based on the principle of binocular vision and SIFT feature point matching, and the crack width is calculated by combining it with the segmented image. Finally, to verify the performance of the above method, a measurement test platform was built. The experimental results show that the RMSE of crack measurements obtained with the algorithm is less than 0.2 mm and the error rate is less than 4%, with stable accuracy across different measurement angles. The method enables fast and convenient measurement of crack width on planar concrete structures in outdoor environments. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
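The geometric core of such a pipeline, converting a pixel-level crack width into millimetres from the stereo geometry, can be sketched as follows. This is a simplified fronto-parallel pinhole model under assumed camera parameters, not the authors' plane-equation formulation.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_mm):
    """Pinhole stereo: Z = f * B / d (depth in mm)."""
    return focal_px * baseline_mm / disparity_px

def crack_width_mm(width_px, disparity_px, focal_px, baseline_mm):
    """Convert a crack width measured in pixels on the segmentation
    mask into millimetres, assuming the crack lies on a surface
    roughly fronto-parallel at the recovered depth."""
    z_mm = depth_from_disparity(disparity_px, focal_px, baseline_mm)
    return width_px * z_mm / focal_px

# Toy numbers: f = 1400 px, baseline = 120 mm, matched disparity = 70 px.
print(f"{crack_width_mm(width_px=3.2, disparity_px=70, focal_px=1400, baseline_mm=120):.2f} mm")
```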

14 pages, 7867 KiB  
Article
Hyperspectral Imaging Sorting of Refurbishment Plasterboard Waste
by Miguel Castro-Díaz, Mohamed Osmani, Sergio Cavalaro, Íñigo Cacho, Iratxe Uria, Paul Needham, Jeremy Thompson, Bill Parker and Tatiana Lovato
Appl. Sci. 2023, 13(4), 2413; https://doi.org/10.3390/app13042413 - 13 Feb 2023
Viewed by 1657
Abstract
Post-consumer plasterboard waste sorting is carried out manually by operators, which is time-consuming and costly. In this work, a laboratory-scale hyperspectral imaging (HSI) system was evaluated for automatic refurbishment plasterboard waste sorting. The HSI system was trained to differentiate between plasterboard (gypsum core between two lining papers) and contaminants (e.g., wood, plastics, mortar or ceramics). Segregated plasterboard samples were crushed and sieved to obtain gypsum particles of less than 250 microns, which were characterized through X-ray fluorescence to determine their chemical purity levels. Refurbishment plasterboard waste particles <10 mm in size were not processed with the HSI-based sorting system because the manual processing of these particles at a laboratory scale would have been very time-consuming. Gypsum from refurbishment plasterboard waste particles <10 mm in size contained very small amounts of undesirable chemical impurities for plasterboard manufacturing (chloride, magnesium, sodium, potassium and phosphorus salts), and its chemical purity was similar to that of the gypsum from HSI-sorted plasterboard (96 wt%). The combination of unprocessed refurbishment plasterboard waste <10 mm with HSI-sorted plasterboard ≥10 mm in size led to a plasterboard recovery yield >98 wt%. These findings underpin the potential implementation of an industrial-scale HSI system for plasterboard waste sorting. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
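The abstract does not specify the classifier behind the HSI system, so the sketch below is only a plausible minimal stand-in: a per-pixel classifier trained on labelled spectra and applied to every pixel of a (here synthetic) hyperspectral cube to produce a plasterboard/contaminant sorting mask.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy stand-in for a hyperspectral cube: H x W pixels, B spectral bands.
rng = np.random.default_rng(0)
H, W, B = 32, 32, 96
cube = rng.normal(size=(H, W, B))

# Labelled training spectra (0 = plasterboard, 1 = contaminant); in
# practice these would come from annotated regions of training scans.
X_train = rng.normal(size=(200, B))
y_train = rng.integers(0, 2, size=200)

# Per-pixel classifier over the spectral signature of each pixel.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)

# Classify every pixel and reshape back into a sorting mask.
mask = clf.predict(cube.reshape(-1, B)).reshape(H, W)
print("contaminant pixels:", int(mask.sum()))
```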

13 pages, 1506 KiB  
Article
Infrared Macrothermoscopy Patterns—A New Category of Dermoscopy
by Flavio Leme Ferrari, Marcos Leal Brioschi, Carlos Dalmaso Neto and Carlos Roberto de Medeiros
J. Imaging 2023, 9(2), 36; https://doi.org/10.3390/jimaging9020036 - 06 Feb 2023
Cited by 2 | Viewed by 1692
Abstract
(1) Background: The authors developed a new non-invasive dermatological infrared macroimaging analysis technique (MacroIR) that evaluates microvascular, inflammatory, and metabolic changes and may be complementary to dermoscopy. Different skin and mucosal lesions were analyzed in a combined way—naked eye, polarized light dermatoscopy (PLD), and MacroIR—and the results were compared; (2) Methods: ten cases were evaluated using a smartphone coupled with a dermatoscope and a far-infrared transducer integrated with a macro lens, together with specific software to capture and organize high-resolution images across different electromagnetic spectra, which were then analyzed by a dermatologist; (3) Results: It was possible to identify and compare structures found in the two dermoscopic modalities. Visual anatomical changes correlated with MacroIR and aided dermatological analysis of the skin surface, providing microvascular, inflammatory, and metabolic data for the studied area. All MacroIR images correlated with PLD, naked-eye examination, and histopathological findings; (4) Conclusion: MacroIR and clinical dermatologist concordance rates were comparable for all dermatological conditions in this study. MacroIR imaging is a promising method that can improve the diagnosis of dermatological diseases. These observations are preliminary and require further evaluation in larger studies. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

25 pages, 14005 KiB  
Article
On Deceiving Malware Classification with Section Injection
by Adeilson Antonio da Silva and Mauricio Pamplona Segundo
Mach. Learn. Knowl. Extr. 2023, 5(1), 144-168; https://doi.org/10.3390/make5010009 - 16 Jan 2023
Cited by 3 | Viewed by 2374
Abstract
We investigate how to modify executable files to deceive malware classification systems. This work's main contribution is a methodology to inject bytes randomly across a malware file and use it both as an attack to decrease classification accuracy and as a defensive method that augments the data available for training. It respects the operating system's file format to ensure that the malware still executes after injection and does not change its behavior. We reproduced five state-of-the-art malware classification approaches to evaluate our injection scheme: one based on a Global Image Descriptor (GIST) + K-Nearest Neighbors (KNN), three Convolutional Neural Network (CNN) variations, and one Gated CNN. We performed our experiments on a public dataset with 9339 malware samples from 25 different families. Our results show that a mere 7% increase in malware size causes an accuracy drop of between 25% and 40% in malware family classification, indicating that an automatic malware classification system may not be as trustworthy as initially reported in the literature. We also evaluated training on modified malware alongside the original samples to increase the networks' robustness against the mentioned attacks. The results show that a combination of reordering malware sections and injecting random data can improve the overall performance of the classification. All the code is publicly available. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
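The image-based classifiers reproduced in this work operate on byte-level visualisations of executables. A common conversion from the malware-imaging literature, sketched below under assumed conventions (fixed row width, zero padding), maps a file's raw bytes to a grayscale image that a CNN or GIST descriptor can consume.

```python
import numpy as np

def bytes_to_grayscale(raw: bytes, width: int = 256) -> np.ndarray:
    """Map a file's raw bytes to a 2-D uint8 image, one byte per pixel,
    padding the last row with zeros; a common malware-visualisation step."""
    data = np.frombuffer(raw, dtype=np.uint8)
    rows = int(np.ceil(len(data) / width))
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[: len(data)] = data
    return padded.reshape(rows, width)

# Toy example on synthetic bytes standing in for an executable.
img = bytes_to_grayscale(bytes(range(256)) * 40)
print(img.shape, img.dtype)
```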

21 pages, 5736 KiB  
Article
Fuzzy Model for the Automatic Recognition of Human Dendritic Cells
by Marwa Braiki, Kamal Nasreddine, Abdesslam Benzinou and Nolwenn Hymery
J. Imaging 2023, 9(1), 13; https://doi.org/10.3390/jimaging9010013 - 06 Jan 2023
Viewed by 1803
Abstract
Background and objective: Foodborne illness is nowadays considered one of the fastest-growing diseases in the world, and studies show that its rate increases sharply each year. Foodborne illness is a public health problem caused by numerous factors, such as food intoxications, allergies, and intolerances. Mycotoxins are food contaminants produced by various species of molds (or fungi) and cause intoxications that can be chronic or acute; even low concentrations of mycotoxins can severely harm human health. It is therefore necessary to develop an assessment tool for evaluating their impact on the immune response. Recently, researchers have proposed a new method of investigation using human dendritic cells, yet the analysis of the geometric properties of these cells is still visual. Moreover, this type of analysis is subjective, time-consuming, and difficult to perform manually. In this paper, we address the automation of this evaluation using image-processing techniques. Methods: Automatic classification approaches for microscopic dendritic cell images are developed to provide a fast and objective evaluation. The first proposed classifier is based on support vector machines (SVM) and Fisher's linear discriminant analysis (FLD). The FLD–SVM classifier does not provide satisfactory results due to the significant confusion between the inhibited cells on the one hand and the other two cell types (mature and immature) on the other. A second strategy is therefore suggested to enhance dendritic cell recognition from microscopic images, based mainly on fuzzy logic, which allows us to account for the uncertainties and inaccuracies of the given data. Results: The proposed methods were tested on a real dataset consisting of 421 microscopic dendritic cell images, where the fuzzy classification scheme markedly improved the results, correctly classifying 96.77% of the dendritic cells. Conclusions: The fuzzy classification-based tools provide cell maturity and inhibition rates, which help biologists evaluate the severe health impacts caused by food contaminants. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
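As a toy illustration of the fuzzy-logic idea, the sketch below assigns a cell to the class with the highest membership degree computed from triangular membership functions over a single shape feature. The feature, the breakpoints, and the three-class setup are hypothetical simplifications, not the paper's model.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function peaking at b on support [a, c]."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def classify_cell(elongation):
    """Fuzzy membership degrees of a cell in three illustrative classes,
    derived from a single (hypothetical) shape feature; the class with
    the highest degree wins."""
    memberships = {
        "immature":  tri(elongation, 0.0, 0.2, 0.5),
        "mature":    tri(elongation, 0.3, 0.6, 0.9),
        "inhibited": tri(elongation, 0.7, 1.0, 1.3),
    }
    label = max(memberships, key=memberships.get)
    return label, memberships

label, degrees = classify_cell(0.55)
print(label, {k: round(float(v), 2) for k, v in degrees.items()})
```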

15 pages, 3276 KiB  
Article
Synthetic Data Generation for Visual Detection of Flattened PET Bottles
by Vitālijs Feščenko, Jānis Ārents and Roberts Kadiķis
Mach. Learn. Knowl. Extr. 2023, 5(1), 14-28; https://doi.org/10.3390/make5010002 - 29 Dec 2022
Viewed by 2526
Abstract
Polyethylene terephthalate (PET) bottle recycling is a highly automated task; however, manual quality control is still required due to inefficiencies of the process. In this paper, we explore automation of the quality control sub-task of visual bottle detection using convolutional neural network (CNN)-based methods and synthetic generation of labelled training data. We propose a synthetic generation pipeline tailored to transparent and crushed PET bottle detection; it can also be applied to undeformed bottles if the viewpoint is set from above. We conduct various experiments on CNNs to compare the quality of real and synthetic data, show that synthetic data can reduce the amount of real data required, and combine both datasets in multiple ways to obtain the best performance. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
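A minimal form of such a synthetic-generation pipeline is cut-and-paste compositing: alpha-blending bottle cut-outs onto conveyor backgrounds at random positions and emitting bounding-box labels for free. The sketch below uses a random patch as a stand-in for a real cut-out; the authors' pipeline for transparent, crushed bottles is necessarily more involved.

```python
import numpy as np

rng = np.random.default_rng(42)

def composite(background, fg_rgba):
    """Alpha-blend one foreground cut-out onto the background at a
    random position and return the image plus its bounding-box label."""
    bh, bw, _ = background.shape
    fh, fw, _ = fg_rgba.shape
    y = rng.integers(0, bh - fh)
    x = rng.integers(0, bw - fw)
    alpha = fg_rgba[..., 3:4] / 255.0
    region = background[y:y + fh, x:x + fw, :]
    background[y:y + fh, x:x + fw, :] = (
        alpha * fg_rgba[..., :3] + (1 - alpha) * region
    ).astype(np.uint8)
    return background, (x, y, x + fw, y + fh)  # (x1, y1, x2, y2) label

# Toy data: grey conveyor background and a random "bottle" patch with alpha.
bg = np.full((480, 640, 3), 90, dtype=np.uint8)
bottle = rng.integers(0, 255, size=(60, 140, 4), dtype=np.uint8)
img, box = composite(bg, bottle)
print("generated label:", box)
```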

12 pages, 2255 KiB  
Communication
Prediction of Carlson Trophic State Index of Small Inland Water from UAV-Based Multispectral Image Modeling
by Cheng-Yun Lin, Ming-Shiun Tsai, Jeff T. H. Tsai and Chih-Cheng Lu
Appl. Sci. 2023, 13(1), 451; https://doi.org/10.3390/app13010451 - 29 Dec 2022
Cited by 1 | Viewed by 1303
Abstract
This paper demonstrates a predictive method for spatially explicit and periodic in situ monitoring of surface water quality in a small lake using an unmanned aerial vehicle (UAV) equipped with a multispectrometer. Based on the reflectance of different substances in different spectral bands, multiple regression analyses are used to determine the models comprising the most relevant band combinations from the multispectral images for the eutrophication assessment of lake water. The relevant eutrophication parameters, such as chlorophyll a, total phosphorus, transparency, and dissolved oxygen, are thus evaluated and expressed by these regression models. Our experiments show that the eutrophication parameters predicted by the corresponding regression models generally exhibit good linear agreement, with coefficients of determination (R²) ranging from 0.7339 to 0.9406. In addition, the Carlson trophic state index (CTSI) determined from the on-site water quality sampling data is found to be highly consistent with the results predicted using the regression models proposed in this research. The maximal error in CTSI accuracy is as low as 1.4% and the root mean square error (RMSE) is only 0.6624, which reveals the great potential of low-altitude drones equipped with multispectrometers for real-time monitoring and evaluation of the trophic status of a surface water body in an ecosystem. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
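Once the regression models have produced the eutrophication parameters, the CTSI itself follows from the standard Carlson sub-indices. The sketch below computes it from hypothetical predicted values; the formulas are the standard Carlson ones, while the input values merely stand in for the band-combination regression outputs.

```python
import numpy as np

def carlson_tsi(chl_a_ugL, sd_m, tp_ugL):
    """Standard Carlson trophic state sub-indices and their mean (CTSI).
    Units: chlorophyll a and total phosphorus in ug/L, transparency in m."""
    tsi_chl = 9.81 * np.log(chl_a_ugL) + 30.6
    tsi_sd = 60.0 - 14.41 * np.log(sd_m)
    tsi_tp = 14.42 * np.log(tp_ugL) + 4.15
    return (tsi_chl + tsi_sd + tsi_tp) / 3.0

# Hypothetical values, as if predicted by the band-combination regressions.
print(f"CTSI = {carlson_tsi(chl_a_ugL=12.0, sd_m=1.5, tp_ugL=35.0):.1f}")
```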