Topic Editors

Prof. Dr. Bin Fan
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Dr. Wenqi Ren
School of Cyber Science and Technology, Sun Yat-Sen University, Guangzhou 510275, China

Applications in Image Analysis and Pattern Recognition

Abstract submission deadline
31 May 2024
Manuscript submission deadline
31 August 2024
Viewed by
79605

Topic Information

Dear Colleagues,

It has been estimated that up to ~80% of the neurons in the human brain are involved in processing visual information and cognition. Image analysis and pattern recognition are therefore at the core of artificial intelligence, which aims to design computer programs that achieve or mimic human-like intelligence in perceiving and reasoning about the real world. With the rapid development of visual sensors and imaging technologies, image analysis and pattern recognition techniques have been applied extensively in various artificial intelligence-related areas, from industry and agriculture to surveillance and social security.

Despite the significant success of image analysis and pattern recognition methods over the past decade, their application to real-world problems remains unsatisfactory. This status indicates a non-negligible gap between theoretical progress and practical applications in the related areas. This Topic aims to narrow that gap, and we therefore invite papers on both theoretical and applied issues related to image analysis and pattern recognition.

All interested authors are invited to submit their innovative methods on the following (but not limited to) aspects:

  • Deep learning based methods for image analysis;
  • Deep learning based methods for video analysis;
  • Image fusion methods and applications;
  • Multimedia systems and applications;
  • Image enhancement and restoration methods and their applications;
  • Image analysis and pattern recognition for robotics and unmanned systems;
  • Document image analysis and applications;
  • Structural pattern recognition methods and applications;
  • Biomedical image analysis and applications;
  • Advances in pattern recognition theories.

Prof. Dr. Bin Fan
Dr. Wenqi Ren
Topic Editors

Keywords

  • image analysis
  • pattern recognition
  • structural pattern recognition
  • computer vision
  • multimedia analysis
  • deep learning
  • document image analysis
  • image enhancement
  • image restoration
  • biomedical image analysis
  • robotics
  • unmanned systems
  • image retrieval
  • image understanding
  • feature extraction
  • image segmentation
  • semantic segmentation
  • object detection
  • image classification
  • image acquiring techniques

Participating Journals

| Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC |
| --- | --- | --- | --- | --- | --- |
| Applied Sciences (applsci) | 2.7 | 4.5 | 2011 | 15.8 days | CHF 2300 |
| Sensors (sensors) | 3.9 | 6.8 | 2001 | 16.4 days | CHF 2600 |
| Journal of Imaging (jimaging) | 3.2 | 4.4 | 2015 | 21.9 days | CHF 1600 |
| Machine Learning and Knowledge Extraction (make) | 3.9 | 8.5 | 2019 | 19.2 days | CHF 1400 |

Preprints is a platform dedicated to making early versions of research outputs permanently available and citable. MDPI journals allow posting on preprint servers such as Preprints.org prior to publication. For more details about preprints, please visit https://www.preprints.org.

Published Papers (64 papers)

Article
Efficient Extraction of Deep Image Features Using a Convolutional Neural Network (CNN) for Detecting Ventricular Fibrillation and Tachycardia
J. Imaging 2023, 9(9), 190; https://doi.org/10.3390/jimaging9090190 - 18 Sep 2023
Viewed by 285
Abstract
To safely select the proper therapy for ventricular fibrillation (VF), it is essential to distinguish it correctly from ventricular tachycardia (VT) and other rhythms. Given that the required therapy is not the same, an erroneous detection might lead to serious injuries to the patient or even induce VF. The primary innovation of this study lies in employing a CNN to create new features. These features exhibit the capacity and precision to detect and classify cardiac arrhythmias, including VF and VT. The electrocardiographic (ECG) signals utilized for this assessment were sourced from the established MIT-BIH and AHA databases. The input data to be classified are time–frequency (tf) representation images, specifically Pseudo Wigner–Ville (PWV) distributions. Prior to PWV calculation, preprocessing for denoising, signal alignment, and segmentation is necessary. In order to check the validity of the method independently of the classifier, four different CNNs are used: InceptionV3, MobileNet, VGGNet, and AlexNet. The classification results reveal the following values: for VF detection, there is a sensitivity (Sens) of 98.16%, a specificity (Spe) of 99.07%, and an accuracy (Acc) of 98.91%; for ventricular tachycardia (VT), the sensitivity is 90.45%, the specificity is 99.73%, and the accuracy is 99.09%; for normal sinus rhythms, sensitivity stands at 99.34%, specificity is 98.35%, and accuracy is 98.89%; finally, for other rhythms, the sensitivity is 96.98%, the specificity is 99.68%, and the accuracy is 99.11%. Furthermore, distinguishing between shockable (VF/VT) and non-shockable rhythms yielded a sensitivity of 99.23%, a specificity of 99.74%, and an accuracy of 99.61%. The results show that using tf representations as a form of image, combined in this case with a CNN classifier, raises the classification performance above the results of previous works. Considering that these results were achieved without the preselection of ECG episodes, it can be concluded that these features may be successfully introduced in Automated External Defibrillator (AED) and Implantable Cardioverter Defibrillator (ICD) therapies, also opening the door to their use in other ECG rhythm detection applications. Full article
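For reference, the sensitivity, specificity, and accuracy figures quoted above follow the standard confusion-matrix definitions. Below is a generic sketch of these metrics (not the authors' evaluation code), assuming binary NumPy label arrays; for the shockable/non-shockable split, y_true would mark VF/VT episodes.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)      # true positives
    tn = np.sum(~y_true & ~y_pred)    # true negatives
    fp = np.sum(~y_true & y_pred)     # false positives
    fn = np.sum(y_true & ~y_pred)     # false negatives
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / y_true.size
    return sens, spec, acc
```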

Article
Using Different Types of Artificial Neural Networks to Classify 2D Matrix Codes and Their Rotations—A Comparative Study
J. Imaging 2023, 9(9), 188; https://doi.org/10.3390/jimaging9090188 - 18 Sep 2023
Viewed by 196
Abstract
Artificial neural networks can solve various tasks in computer vision, such as image classification, object detection, and general recognition. Our comparative study deals with four types of artificial neural networks—multilayer perceptrons, probabilistic neural networks, radial basis function neural networks, and convolutional neural networks—and investigates their ability to classify 2D matrix codes (Data Matrix codes, QR codes, and Aztec codes) as well as their rotation. The paper presents the basic building blocks of these artificial neural networks and their architecture and compares the classification accuracy of 2D matrix codes under different configurations of these neural networks. A dataset of 3000 synthetic code samples was used to train and test the neural networks. When the neural networks were trained on the full dataset, the convolutional neural network showed its superiority, followed by the RBF neural network and the multilayer perceptron. Full article

Article
Decoding Algorithm of Motor Imagery Electroencephalogram Signal Based on CLRNet Network Model
Sensors 2023, 23(18), 7694; https://doi.org/10.3390/s23187694 - 06 Sep 2023
Viewed by 212
Abstract
EEG decoding based on motor imagery is an important part of brain–computer interface technology and is an important indicator that determines the overall performance of the brain–computer interface. Due to the complexity of motor imagery EEG feature analysis, traditional classification models rely heavily on the signal preprocessing and feature design stages. End-to-end neural networks in deep learning have been applied to the classification task processing of motor imagery EEG and have shown good results. This study uses a combination of a convolutional neural network (CNN) and a long short-term memory (LSTM) network to obtain spatial information and temporal correlation from EEG signals. The use of cross-layer connectivity reduces the network gradient dispersion problem and enhances the overall network model stability. The effectiveness of this network model is demonstrated on the BCI Competition IV dataset 2a by integrating CNN, BiLSTM and ResNet (called CLRNet in this study) to decode motor imagery EEG. The network model combining CNN and BiLSTM achieved 87.0% accuracy in classifying motor imagery patterns in four classes. The network stability is enhanced by adding ResNet for cross-layer connectivity, which further improved the accuracy by 2.0% to achieve 89.0% classification accuracy. The experimental results show that CLRNet has good performance in decoding the motor imagery EEG dataset. This study provides a better solution for motor imagery EEG decoding in brain–computer interface technology research. Full article
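To make the CNN-plus-BiLSTM idea concrete, here is a toy PyTorch sketch in the spirit of the pipeline described above; the layer sizes, the 22-channel input, and the single conv stage are illustrative assumptions, not the actual CLRNet architecture (whose ResNet-style cross-layer connections are omitted).

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Toy EEG classifier: 1D conv for spatial features, BiLSTM for
    temporal correlation (all layer sizes are assumptions)."""
    def __init__(self, n_channels=22, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),
            nn.BatchNorm1d(32), nn.ReLU(), nn.MaxPool1d(4))
        self.lstm = nn.LSTM(32, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 64, n_classes)

    def forward(self, x):                  # x: (batch, channels, time)
        h = self.conv(x).transpose(1, 2)   # (batch, time, 32)
        out, _ = self.lstm(h)              # (batch, time, 128)
        return self.fc(out[:, -1])         # classify on the last step
```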

Article
CRABR-Net: A Contextual Relational Attention-Based Recognition Network for Remote Sensing Scene Objective
Sensors 2023, 23(17), 7514; https://doi.org/10.3390/s23177514 - 29 Aug 2023
Viewed by 301
Abstract
Remote sensing scene objective recognition (RSSOR) has significant application value in both military and civilian fields. Convolutional neural networks (CNNs) have greatly advanced intelligent objective recognition technology for remote sensing scenes, but most methods using CNNs for high-resolution RSSOR either use only the feature map of the last layer or directly fuse the feature maps from various layers by “summation”, which not only ignores the favorable relationship information between adjacent layers but also leads to feature-map redundancy and loss, hindering the improvement of recognition accuracy. In this study, a contextual relational attention-based recognition network (CRABR-Net) is presented, which extracts different convolutional feature maps from a CNN, emphasizes important feature content by using a simple, parameter-free attention module (SimAM), fuses adjacent feature maps by using complementary relationship feature-map calculation, improves feature learning by using enhanced relationship feature-map calculation, and finally uses the concatenated feature maps from different layers for RSSOR. Experimental results show that CRABR-Net exploits the relationship between different CNN layers to improve recognition performance and achieves better results than several state-of-the-art algorithms, with average accuracies on AID, UC-Merced, and RSSCN7 of up to 96.46%, 99.20%, and 95.43%, respectively, with generic training ratios. Full article

Article
Automatic Facial Aesthetic Prediction Based on Deep Learning with Loss Ensembles
Appl. Sci. 2023, 13(17), 9728; https://doi.org/10.3390/app13179728 - 28 Aug 2023
Viewed by 282
Abstract
Deep data-driven methodologies, particularly convolutional neural networks (CNNs), have significantly enhanced automatic facial beauty prediction (FBP). However, despite their wide utilization in classification-based applications, the adoption of CNNs in regression research is still constrained. In addition, biases in beauty scores assigned to facial images, such as preferences for specific ethnicities or age groups, present challenges to the effective generalization of models, which may not be appropriately addressed within conventional individual loss functions. Furthermore, regression problems commonly employ the L2 loss to measure error rate, and this function is sensitive to outliers, making it difficult to generalize depending on the number of outliers in the training phase. Meanwhile, the L1 loss is another regression-loss function that penalizes errors linearly and is less sensitive to outliers. The Log-cosh loss function is a flexible and robust loss function for regression problems that provides a good compromise between the L1 and L2 loss functions. Ensembles of multiple loss functions have been proven to improve the performance of deep-learning models in various tasks. In this work, we propose ensembling three regression-loss functions, namely L1, L2, and Log-cosh, averaging them to create a new composite cost function. This strategy capitalizes on the unique traits of each loss function, constructing a unified framework that harmonizes outlier tolerance, precision, and adaptability. The proposed loss function’s effectiveness was demonstrated by incorporating it with three pretrained CNNs (AlexNet, VGG16-Net, and FIAC-Net) and evaluating it on three FBP benchmarks (SCUT-FBP, SCUT-FBP5500, and MEBeauty). Integrating FIAC-Net with the proposed loss function yields remarkable outcomes across datasets due to its pretraining task of facial-attractiveness classification. The efficacy is evident in managing uncertain noise distributions, resulting in a strong correlation between machine- and human-rated aesthetic scores, along with low error rates. Full article
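A minimal sketch of the averaged composite loss described above, assuming PyTorch; the equal 1/3 weighting follows the paper's statement that the three losses are averaged.

```python
import torch
import torch.nn.functional as F

def composite_loss(pred, target):
    """Mean of the L1, L2 (MSE), and Log-cosh regression losses."""
    l1 = F.l1_loss(pred, target)
    l2 = F.mse_loss(pred, target)
    # For very large residuals, a numerically stable log-cosh variant
    # (|x| + softplus(-2|x|) - log 2) avoids cosh overflow.
    log_cosh = torch.mean(torch.log(torch.cosh(pred - target)))
    return (l1 + l2 + log_cosh) / 3.0
```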

Article
An Intelligent Sorting Method of Film in Cotton Combining Hyperspectral Imaging and the AlexNet-PCA Algorithm
Sensors 2023, 23(16), 7041; https://doi.org/10.3390/s23167041 - 09 Aug 2023
Viewed by 400
Abstract
Long-staple cotton from Xinjiang is renowned for its exceptional quality. However, it is susceptible to contamination with plastic film during mechanical picking. To address the issue of tricky removal of film in seed cotton, a technique based on hyperspectral images and AlexNet-PCA is proposed to identify the colorless and transparent film of the seed cotton. The method consists of black and white correction of hyperspectral images, dimensionality reduction of hyperspectral data, and training and testing of convolutional neural network (CNN) models. The key technique is to find the optimal way to reduce the dimensionality of the hyperspectral data, thus reducing the computational cost. The biggest innovation of the paper is the combination of CNNs and dimensionality reduction methods to achieve high-precision intelligent recognition of transparent plastic films. Experiments with three dimensionality reduction methods and three CNN architectures are conducted to seek the optimal model for plastic film recognition. The results demonstrate that AlexNet-PCA-12 achieves the highest recognition accuracy and cost performance in dimensionality reduction. In the practical application sorting tests, the method proposed in this paper achieved a 97.02% removal rate of plastic film, which provides a modern theoretical model and effective method for high-precision identification of heteropolymers in seed cotton. Full article
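The spectral dimensionality-reduction step can be illustrated with scikit-learn's PCA, as below; the cube shape is a made-up example, and only the 12-component setting is taken from the "AlexNet-PCA-12" result above.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical hyperspectral cube: height x width x spectral bands.
cube = np.random.rand(64, 64, 224)
h, w, bands = cube.shape

# Keep 12 principal components per pixel spectrum before CNN training.
pca = PCA(n_components=12)
flat = cube.reshape(-1, bands)                  # one spectrum per pixel
reduced = pca.fit_transform(flat).reshape(h, w, 12)
```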

Article
Shot Boundary Detection with 3D Depthwise Convolutions and Visual Attention
Sensors 2023, 23(16), 7022; https://doi.org/10.3390/s23167022 - 08 Aug 2023
Viewed by 347
Abstract
Shot boundary detection is the process of identifying and locating the boundaries between individual shots in a video sequence. A shot is a continuous sequence of frames that are captured by a single camera, without any cuts or edits. Recent investigations have shown the effectiveness of the use of 3D convolutional networks to solve this task due to its high capacity to extract spatiotemporal features of the video and determine in which frame a transition or shot change occurs. When this task is used as part of a scene segmentation use case with the aim of improving the experience of viewing content from streaming platforms, the speed of segmentation is very important for live and near-live use cases such as start-over. The problem with models based on 3D convolutions is the large number of parameters that they entail. Standard 3D convolutions impose much higher CPU and memory requirements than do the same 2D operations. In this paper, we rely on depthwise separable convolutions to address the problem but with a scheme that significantly reduces the number of parameters. To compensate for the slight loss of performance, we analyze and propose the use of visual self-attention as a mechanism of improvement. Full article

Article
Open-Set Recognition of Wood Species Based on Deep Learning Feature Extraction Using Leaves
J. Imaging 2023, 9(8), 154; https://doi.org/10.3390/jimaging9080154 - 30 Jul 2023
Viewed by 676
Abstract
An open-set recognition scheme for tree leaves based on deep learning feature extraction is presented in this study. Deep learning algorithms are used to extract leaf features for different wood species, and the leaf set of a wood species is divided into two datasets: the leaf set of a known wood species and the leaf set of an unknown species. The deep learning network (CNN) is trained on the leaves of selected known wood species, and the features of the remaining known wood species and all unknown wood species are extracted using the trained CNN. Then, the single-class classification is performed using the weighted SVDD algorithm to recognize the leaves of known and unknown wood species. The features of leaves recognized as known wood species are fed back to the trained CNN to recognize the leaves of known wood species. The recognition results of a single-class classifier for known and unknown wood species are combined with the recognition results of a multi-class CNN to finally complete the open recognition of wood species. We tested the proposed method on the publicly available Swedish Leaf Dataset, which includes 15 wood species (5 species used as known and 10 species used as unknown). The test results showed that, with F1 scores of 0.7797 and 0.8644, mixed recognition rates of 95.15% and 93.14%, and Kappa coefficients of 0.7674 and 0.8644 under two different data distributions, the proposed method outperformed the state-of-the-art open-set recognition algorithms in all three aspects. And, the more wood species that are known, the better the recognition. This approach can extract effective features from tree leaf images for open-set recognition and achieve wood species recognition without compromising tree material. Full article
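Since SVDD with an RBF kernel is closely related to the one-class SVM, the single-class stage can be sketched with scikit-learn as below; this is an illustrative stand-in for the paper's weighted SVDD, using synthetic features.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
known_feats = rng.normal(size=(200, 512))   # CNN features of known species
query_feats = rng.normal(size=(10, 512))    # features of query leaves

clf = OneClassSVM(kernel="rbf", nu=0.05).fit(known_feats)
is_known = clf.predict(query_feats) == 1    # True -> treat as known species
```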

Article
Tomato Maturity Detection and Counting Model Based on MHSA-YOLOv8
Sensors 2023, 23(15), 6701; https://doi.org/10.3390/s23156701 - 26 Jul 2023
Viewed by 869
Abstract
The online automated maturity grading and counting of tomato fruits has a certain promoting effect on digital supervision of fruit growth status and unmanned precision operations during the planting process. The traditional grading and counting of tomato fruit maturity is mostly done manually, which is time-consuming and laborious work, and its precision depends on the accuracy of human eye observation. The combination of artificial intelligence and machine vision has to some extent solved this problem. In this work, firstly, a digital camera is used to obtain tomato fruit image datasets, taking into account factors such as occlusion and external light interference. Secondly, based on the tomato maturity grading task requirements, the MHSA attention mechanism is adopted to improve YOLOv8’s backbone to enhance the network’s ability to extract diverse features. The Precision, Recall, F1-score, and mAP50 of the tomato fruit maturity grading model constructed based on MHSA-YOLOv8 were 0.806, 0.807, 0.806, and 0.864, respectively, which improved the performance of the model with a slight increase in model size. Finally, thanks to the excellent performance of MHSA-YOLOv8, the Precision, Recall, F1-score, and mAP50 of the constructed counting models were 0.990, 0.960, 0.975, and 0.916, respectively. The tomato maturity grading and counting model constructed in this study is not only suitable for online detection but also for offline detection, which greatly helps to improve the harvesting and grading efficiency of tomato growers. The main innovations of this study are summarized as follows: (1) a tomato maturity grading and counting dataset collected from actual production scenarios was constructed; (2) considering the complexity of the environment, this study proposes a new object detection method, MHSA-YOLOv8, and constructs tomato maturity grading models and counting models, respectively; (3) the models constructed in this study are not only suitable for online grading and counting but also for offline grading and counting. Full article

Article
Quantitative CT Metrics Associated with Variability in the Diffusion Capacity of the Lung of Post-COVID-19 Patients with Minimal Residual Lung Lesions
J. Imaging 2023, 9(8), 150; https://doi.org/10.3390/jimaging9080150 - 26 Jul 2023
Cited by 1 | Viewed by 490
Abstract
(1) Background: A reduction in the diffusion capacity of the lung for carbon monoxide is a prevalent longer-term consequence of COVID-19 infection. In patients who have zero or minimal residual radiological abnormalities in the lungs, it has been debated whether the cause was mainly due to a reduced alveolar volume or involved diffuse interstitial or vascular abnormalities. (2) Methods: We performed a cross-sectional study of 45 patients with either zero or minimal residual lesions in the lungs (total volume < 7 cc) at two months to one year post COVID-19 infection. There was considerable variability in the diffusion capacity of the lung for carbon monoxide, with 27% of the patients at less than 80% of the predicted reference. We investigated a set of independent variables that may affect the diffusion capacity of the lung, including demographic, pulmonary physiology and CT (computed tomography)-derived variables of vascular volume, parenchymal density and residual lesion volume. (3) Results: The leading three variables that contributed to the variability in the diffusion capacity of the lung for carbon monoxide were the alveolar volume, determined via pulmonary function tests, the blood vessel volume fraction, determined via CT, and the parenchymal radiodensity, also determined via CT. These factors explained 49% of the variance of the diffusion capacity, with p values of 0.031, 0.005 and 0.018, respectively, after adjusting for confounders. A multiple-regression model combining these three variables fit the measured values of the diffusion capacity, with R = 0.70 and p < 0.001. (4) Conclusions: The results are consistent with the notion that in some post-COVID-19 patients, after their pulmonary lesions resolve, diffuse changes in the vascular and parenchymal structures, in addition to a low alveolar volume, could be contributors to a lingering low diffusion capacity. Full article

Article
Varroa Destructor Classification Using Legendre–Fourier Moments with Different Color Spaces
J. Imaging 2023, 9(7), 144; https://doi.org/10.3390/jimaging9070144 - 14 Jul 2023
Viewed by 858
Abstract
Bees play a critical role in pollination and food production, so their preservation is essential, particularly highlighting the importance of detecting diseases in bees early. The Varroa destructor mite is the primary factor contributing to increased viral infections that can lead to hive mortality. This study presents an innovative method for identifying Varroa destructors in honey bees using multichannel Legendre–Fourier moments. The descriptors derived from this approach possess distinctive characteristics, such as rotation and scale invariance, and noise resistance, allowing the representation of digital images with minimal descriptors. This characteristic is advantageous when analyzing images of living organisms that are not in a static posture. The proposal evaluates the algorithm’s efficiency using different color models, and to enhance its capacity, a subdivision of the VarroaDataset is used. This enhancement allows the algorithm to process additional information about the color and shape of the bee’s legs, wings, eyes, and mouth. To demonstrate the advantages of our approach, we compare it with other deep learning methods, in semantic segmentation techniques, such as DeepLabV3, and object detection techniques, such as YOLOv5. The results suggest that our proposal offers a promising means for the early detection of the Varroa destructor mite, which could be an essential pillar in the preservation of bees and, therefore, in food production. Full article

Article
Augmented Reality in Maintenance—History and Perspectives
J. Imaging 2023, 9(7), 142; https://doi.org/10.3390/jimaging9070142 - 10 Jul 2023
Viewed by 891
Abstract
Augmented Reality (AR) is a technology that allows virtual elements to be superimposed over images of real contexts, whether these are text elements, graphics, or other types of objects. Smart AR glasses are increasingly optimized, and modern ones have features such as the Global Positioning System (GPS), a microphone, and gesture recognition, among others. These devices allow users to keep their hands free to perform tasks while they receive instructions in real time through the glasses. This allows maintenance professionals to carry out interventions more efficiently and in a shorter time than would be necessary without the support of this technology. In the present work, a timeline of important achievements is established, including important findings in object recognition, real-time operation, and integration of technologies for shop-floor use. Perspectives on future research and related recommendations are proposed as well. Full article

Article
Fast and Efficient Evaluation of the Mass Composition of Shredded Electrodes from Lithium-Ion Batteries Using 2D Imaging
J. Imaging 2023, 9(7), 135; https://doi.org/10.3390/jimaging9070135 - 05 Jul 2023
Viewed by 834
Abstract
With the increasing number of electrical devices, especially electric vehicles, the need for efficient recycling processes of electric components is on the rise. Mechanical recycling of lithium-ion batteries includes the comminution of the electrodes and sorting the particle mixtures to achieve the highest possible purities of the individual material components (e.g., copper and aluminum). An important part of recycling is the quantitative determination of the yield and recovery rate, which is required to adapt the processes to different feed materials. Since this is usually done by sorting individual particles manually before determining the mass of each material, we developed a novel method for automating this evaluation process. The method is based on detecting the different material particles in images based on simple thresholding techniques and analyzing the correlation of the area of each material in the field of view to the mass in the previously prepared samples. This can then be applied to further samples to determine their mass composition. Using this automated method, the process is accelerated, the accuracy is improved compared to a human operator, and the cost of the evaluation process is reduced. Full article
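The core of the method, thresholding the image into per-material masks and mapping pixel areas to masses through calibration factors fitted on manually sorted reference samples, can be sketched as follows; the gray-level thresholds and the gram-per-pixel factors are placeholder assumptions.

```python
import numpy as np

def material_masses(img, k_cu=0.012, k_al=0.007):
    """Estimate copper and aluminum mass (g) from a grayscale image.
    Thresholds and the k_* calibration factors are hypothetical; in
    practice they are fitted on reference samples of known mass."""
    copper = (img >= 60) & (img < 140)   # mid-gray particle pixels
    aluminum = img >= 140                # bright particle pixels
    return k_cu * copper.sum(), k_al * aluminum.sum()
```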

Article
MineSDS: A Unified Framework for Small Object Detection and Drivable Area Segmentation for Open-Pit Mining Scenario
Sensors 2023, 23(13), 5977; https://doi.org/10.3390/s23135977 - 27 Jun 2023
Viewed by 572
Abstract
To tackle the challenges posed by dense small objects and fuzzy boundaries on unstructured roads in the mining scenario, we propose an end-to-end small object detection and drivable area segmentation framework for open-pit mining. We employ a convolutional network backbone as a feature extractor for both tasks, as multi-task learning has yielded promising results in autonomous driving perception. To address small object detection, we introduce a lightweight attention module that allows our network to focus more on the spatial and channel dimensions of small objects without impeding inference time. We also use a convolutional block attention module in the drivable area segmentation subnetwork, which assigns more weight to road boundaries to improve feature mapping capabilities. Furthermore, to improve the network’s perception accuracy on both tasks, we use weighted summation when designing the loss function. We validated the effectiveness of our approach by testing it on a pre-collected mining dataset called Minescape. Our detection results on the Minescape dataset show an mAP of 87.8%, 9.3% higher than state-of-the-art algorithms, and our segmentation results surpass the comparison algorithm by 1 percent in MIoU. These experimental results demonstrate that our approach achieves competitive performance. Full article

Article
An Approach for 3D Modeling of the Regular Relief Surface Topography Formed by a Ball Burnishing Process Using 2D Images and Measured Profilograms
Sensors 2023, 23(13), 5801; https://doi.org/10.3390/s23135801 - 21 Jun 2023
Viewed by 493
Abstract
Advanced in the present paper is an innovative approach for three-dimensional modeling of the regular relief topography formed via a ball burnishing process. The proposed methodology involves capturing a greyscale image of the surface topography and measuring its profile in two perpendicular directions using a stylus method. A specially developed algorithm then identifies the best match between the measured profile segment and a row or column of the captured topography image by carrying out a signal correlation assessment based on an appropriate similarity metric. To ensure accurate scaling, the image pixel grey levels are scaled with a factor calculated as the larger ratio between the ultimate heights of the measured profilograms and the best-matched image row/column. Nine different similarity metrics were tested to determine the best-performing model. The developed approach was evaluated on eight distinct types of fully and partially regular reliefs, and the results reveal that the best-scaled 3D topography models are produced for the fully regular reliefs with much greater heights. Following a thorough analysis of the results obtained, at the end of the paper we draw some conclusions and discuss potential future work. Full article
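The matching step can be sketched with a Pearson-correlation similarity, one of the nine metrics the authors compare; equal lengths of image rows and profile segment are assumed, and alignment details are omitted.

```python
import numpy as np

def best_matching_row(image, profile):
    """Index and score of the image row most correlated with the
    measured profile segment (rows and profile of equal length)."""
    scores = [np.corrcoef(row, profile)[0, 1] for row in image]
    best = int(np.argmax(scores))
    return best, scores[best]
```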

Article
Analysis of the Asymmetry between Both Eyes in Early Diagnosis of Glaucoma Combining Features Extracted from Retinal Images and OCTs into Classification Models
Sensors 2023, 23(10), 4737; https://doi.org/10.3390/s23104737 - 14 May 2023
Viewed by 768
Abstract
This study aims to analyze the asymmetry between both eyes of the same patient for the early diagnosis of glaucoma. Two imaging modalities, retinal fundus images and optical coherence tomographies (OCTs), have been considered in order to compare their different capabilities for glaucoma detection. From retinal fundus images, the difference between cup/disc ratio and the width of the optic rim has been extracted. Analogously, the thickness of the retinal nerve fiber layer has been measured in spectral-domain optical coherence tomographies. These measurements have been considered as asymmetry characteristics between eyes in the modeling of decision trees and support vector machines for the classification of healthy and glaucoma patients. The main contribution of this work is indeed the use of different classification models with both imaging modalities to jointly exploit the strengths of each of these modalities for the same diagnostic purpose based on the asymmetry characteristics between the eyes of the patient. The results show that the optimized classification models provide better performance with OCT asymmetry features between both eyes (sensitivity 80.9%, specificity 88.2%, precision 66.7%, accuracy 86.5%) than with those extracted from retinographies, although a linear relationship has been found between certain asymmetry features extracted from both imaging modalities. Therefore, the resulting performance of the models based on asymmetry features proves their ability to differentiate healthy from glaucoma patients using those metrics. Models trained from fundus characteristics are a useful option as a glaucoma screening method in the healthy population, although with lower performance than those trained from the thickness of the peripapillary retinal nerve fiber layer. In both imaging modalities, the asymmetry of morphological characteristics can be used as a glaucoma indicator, as detailed in this work. Full article

Article
Gaze-Dependent Image Re-Ranking Technique for Enhancing Content-Based Image Retrieval
Appl. Sci. 2023, 13(10), 5948; https://doi.org/10.3390/app13105948 - 11 May 2023
Viewed by 883
Abstract
Content-based image retrieval (CBIR) aims to find desired images similar to the image input by the user, and it is extensively used in the real world. Conventional CBIR methods do not consider user preferences since they determine retrieval results only by the degree of resemblance between the query and potential candidate images. For this reason, a “semantic gap” appears, as the model may not accurately understand the intention that a user has embedded in the query image. In this article, we propose a re-ranking method for CBIR that considers a user’s gaze trace as interactive information to help the model predict the user’s inherent attention. The proposed method uses the user’s gaze trace corresponding to the images obtained from the initial retrieval as the user’s preference information. We introduce image captioning to effectively express the relationship between images and gaze information by generating image captions based on the gaze trace. As a result, we can transform the coordinate data into a text format and explicitly express the semantic information of the images. Finally, image retrieval is performed again using the generated gaze-dependent image captions to obtain images that align more accurately with the user’s preferences or interests. The experimental results on an open image dataset with corresponding gaze traces and human-generated descriptions demonstrate the efficacy of the proposed method. Our method treats visual information as the user’s feedback to achieve user-oriented image retrieval. Full article

Article
Real-Time Machine Learning-Based Driver Drowsiness Detection Using Visual Features
J. Imaging 2023, 9(5), 91; https://doi.org/10.3390/jimaging9050091 - 29 Apr 2023
Cited by 1 | Viewed by 2994
Abstract
Drowsiness-related car accidents continue to have a significant effect on road safety. Many of these accidents can be eliminated by alerting the drivers once they start feeling drowsy. This work presents a non-invasive system for real-time driver drowsiness detection using visual features. These features are extracted from videos obtained from a camera installed on the dashboard. The proposed system uses facial landmarks and face mesh detectors to locate the regions of interest where mouth aspect ratio, eye aspect ratio, and head pose features are extracted and fed to three different classifiers: random forest, sequential neural network, and linear support vector machine classifiers. Evaluations of the proposed system over the National Tsing Hua University driver drowsiness detection dataset showed that it can successfully detect and alarm drowsy drivers with an accuracy up to 99%. Full article
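The eye-aspect-ratio feature mentioned above is commonly computed from six eye-contour landmarks (the Soukupová–Čech formulation); the sketch below assumes that standard landmark ordering, which may differ from the detectors the authors used.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR from six (x, y) landmarks: p0/p3 are the horizontal corners,
    p1/p2 the upper lid, p4/p5 the lower lid. EAR drops toward zero as
    the eye closes, signaling drowsiness when low over many frames."""
    v1 = np.linalg.norm(eye[1] - eye[5])
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])
    return (v1 + v2) / (2.0 * h)
```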

Article
Progressively Hybrid Transformer for Multi-Modal Vehicle Re-Identification
Sensors 2023, 23(9), 4206; https://doi.org/10.3390/s23094206 - 23 Apr 2023
Cited by 1 | Viewed by 1131
Abstract
Multi-modal (i.e., visible, near-infrared, and thermal-infrared) vehicle re-identification has good potential for searching vehicles of interest in low illumination. However, because different modalities have varying imaging characteristics, proper fusion of multi-modal complementary information is crucial to multi-modal vehicle re-identification. To this end, this paper proposes a progressively hybrid transformer (PHT). The PHT method consists of two aspects: random hybrid augmentation (RHA) and a feature hybrid mechanism (FHM). Regarding RHA, an image random cropper and a local region hybrider are designed. The image random cropper simultaneously crops multi-modal images at random positions, in random numbers, and with random sizes and aspect ratios to generate local regions. The local region hybrider fuses the cropped regions so that regions of each modality carry local structural characteristics of all modalities, mitigating modal differences at the beginning of feature learning. Regarding the FHM, a modal-specific controller and a modal information embedding are designed to effectively fuse multi-modal information at the feature level. Experimental results show the proposed method outperforms the state-of-the-art method by 2.7% mAP on RGBNT100 and by 6.6% mAP on RGBN300, demonstrating that the proposed method can learn multi-modal complementary information effectively. Full article

Brief Report
Invariant Pattern Recognition with Log-Polar Transform and Dual-Tree Complex Wavelet-Fourier Features
Sensors 2023, 23(8), 3842; https://doi.org/10.3390/s23083842 - 09 Apr 2023
Viewed by 754
Abstract
In this paper, we propose a novel method for 2D pattern recognition by extracting features with the log-polar transform, the dual-tree complex wavelet transform (DTCWT), and the 2D fast Fourier transform (FFT2). Our new method is invariant to translation, rotation, and scaling of the input 2D pattern images in a multiresolution way, which is very important for invariant pattern recognition. We know that very low-resolution sub-bands lose important features in the pattern images, and very high-resolution sub-bands contain significant amounts of noise. Therefore, intermediate-resolution sub-bands are good for invariant pattern recognition. Experiments on one printed Chinese character dataset and one 2D aircraft dataset show that our new method is better than two existing methods for a combination of rotation angles, scaling factors, and different noise levels in the input pattern images in most testing cases. Full article
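The invariance argument is that rotation and scaling of the input become translations in log-polar coordinates, and the 2D FFT magnitude is translation-invariant. A simplified OpenCV sketch of that chain follows; the DTCWT sub-band analysis used in the paper is omitted here.

```python
import numpy as np
import cv2

def logpolar_fft_features(img):
    """Log-polar remap (rotation/scale -> shifts) followed by the
    shift-invariant 2D FFT magnitude."""
    h, w = img.shape
    center = (w / 2.0, h / 2.0)
    lp = cv2.warpPolar(img.astype(np.float32), (w, h), center,
                       min(center), cv2.WARP_POLAR_LOG)
    return np.abs(np.fft.fft2(lp))
```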

Article
YOLOv5s-CA: A Modified YOLOv5s Network with Coordinate Attention for Underwater Target Detection
Sensors 2023, 23(7), 3367; https://doi.org/10.3390/s23073367 - 23 Mar 2023
Cited by 6 | Viewed by 1791
Abstract
Underwater target detection techniques have been extensively applied to underwater vehicles for marine surveillance, aquaculture, and rescue applications. However, due to complex underwater environments and insufficient training samples, the existing underwater target recognition algorithm accuracy is still unsatisfactory. A long-term effort is essential to improving underwater target detection accuracy. To achieve this goal, in this work, we propose a modified YOLOv5s network, called YOLOv5s-CA network, by embedding a Coordinate Attention (CA) module and a Squeeze-and-Excitation (SE) module, aiming to concentrate more computing power on the target to improve detection accuracy. Based on the existing YOLOv5s network, the number of bottlenecks in the first C3 module was increased from one to three to improve the performance of shallow feature extraction. The CA module was embedded into the C3 modules to improve the attention power focused on the target. The SE layer was added to the output of the C3 modules to strengthen model attention. Experiments on the data of the 2019 China Underwater Robot Competition were conducted, and the results demonstrate that the mean Average Precision (mAP) of the modified YOLOv5s network was increased by 2.4%. Full article
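The Squeeze-and-Excitation layer added after the C3 modules is a standard component; a PyTorch sketch is given below, with the customary reduction ratio of 16 assumed since the abstract does not state it.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard Squeeze-and-Excitation: channel-wise re-weighting
    learned from globally pooled features (reduction=16 assumed)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x).view(x.size(0), x.size(1), 1, 1)
        return x * w  # scale each channel by its learned weight
```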

Article
Real-Time Fire Smoke Detection Method Combining a Self-Attention Mechanism and Radial Multi-Scale Feature Connection
Sensors 2023, 23(6), 3358; https://doi.org/10.3390/s23063358 - 22 Mar 2023
Viewed by 1210
Abstract
Fire remains a pressing issue that requires urgent attention. Due to its uncontrollable and unpredictable nature, it can easily trigger chain reactions and increase the difficulty of extinguishing, posing a significant threat to people’s lives and property. The effectiveness of traditional photoelectric- or ionization-based detectors is inhibited when detecting fire smoke due to the variable shape, characteristics, and scale of the detected objects and the small size of the fire source in the early stages. Additionally, the uneven distribution of fire and smoke and the complexity and variety of the surroundings in which they occur contribute to inconspicuous pixel-level-based feature information, making identification difficult. We propose a real-time fire smoke detection algorithm based on multi-scale feature information and an attention mechanism. Firstly, the feature information layers extracted from the network are fused into a radial connection to enhance the semantic and location information of the features. Secondly, to address the challenge of recognizing harsh fire sources, we designed a permutation self-attention mechanism to concentrate on features in channel and spatial directions to gather contextual information as accurately as possible. Thirdly, we constructed a new feature extraction module to increase the detection efficiency of the network while retaining feature information. Finally, we propose a cross-grid sample matching approach and a weighted decay loss function to handle the issue of imbalanced samples. Our model achieves the best detection results compared to standard detection methods using a handcrafted fire smoke detection dataset, with APval reaching 62.5%, APSval reaching 58.5%, and FPS reaching 113.6. Full article

Article
Real-Time Target Detection System for Animals Based on Self-Attention Improvement and Feature Extraction Optimization
Appl. Sci. 2023, 13(6), 3987; https://doi.org/10.3390/app13063987 - 21 Mar 2023
Cited by 2 | Viewed by 1503
Abstract
In this paper, we propose a wildlife detection algorithm based on an improved YOLOv5s, using datasets combining six kinds of real wildlife images of different sizes and forms. Firstly, we use the RepVGG model, which integrates the ideas of VGG and ResNet, to simplify the network structure. RepVGG introduces a structural reparameterization approach to ensure model flexibility while reducing the computational effort. This not only enhances the model’s feature extraction ability but also speeds up model computation, further improving the model’s real-time performance. Secondly, we use the sliding window method of the Swin Transformer module to divide the feature map, speeding up the convergence of the model and improving its real-time performance. We then introduce the C3TR module to segment the feature map, expand the receptive field, address the problems of vanishing and exploding backpropagation gradients, and enhance the feature extraction and feature fusion abilities of the model. Finally, the model is improved using SimOTA, a positive and negative sample matching strategy that introduces a cost matrix to obtain the highest accuracy at the minimum cost. The experimental results show that the improved YOLOv5s algorithm proposed in this paper improves mAP by 3.2% and FPS by 11.9 compared with the original YOLOv5s algorithm. In addition, the detection accuracy and speed of the improved YOLOv5s model show clear advantages over other common target detection algorithms on the animal dataset used in this paper, demonstrating the effectiveness and superiority of the improved YOLOv5s algorithm for animal target detection. Full article

Article
Left Ventricle Detection from Cardiac Magnetic Resonance Relaxometry Images Using Visual Transformer
Sensors 2023, 23(6), 3321; https://doi.org/10.3390/s23063321 - 21 Mar 2023
Cited by 1 | Viewed by 1300
Abstract
Left Ventricle (LV) detection from Cardiac Magnetic Resonance (CMR) imaging is a fundamental step, preliminary to myocardium segmentation and characterization. This paper focuses on the application of a Visual Transformer (ViT), a novel neural network architecture, to automatically detect LV from CMR relaxometry sequences. We implemented an object detector based on the ViT model to identify LV from CMR multi-echo T2* sequences. We evaluated performances differentiated by slice location according to the American Heart Association model using 5-fold cross-validation and on an independent dataset of CMR T2*, T2, and T1 acquisitions. To the best of our knowledge, this is the first attempt to localize LV from relaxometry sequences and the first application of ViT for LV detection. We collected an Intersection over Union (IoU) index of 0.68 and a Correct Identification Rate (CIR) of blood pool centroid of 0.99, comparable with other state-of-the-art methods. IoU and CIR values were significantly lower in apical slices. No significant differences in performances were assessed on independent T2* dataset (IoU = 0.68, p = 0.405; CIR = 0.94, p = 0.066). Performances were significantly worse on the T2 and T1 independent datasets (T2: IoU = 0.62, CIR = 0.95; T1: IoU = 0.67, CIR = 0.98), but still encouraging considering the different types of acquisition. This study confirms the feasibility of the application of ViT architectures in LV detection and defines a benchmark for relaxometry imaging. Full article
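The Intersection-over-Union index reported above is the standard box-overlap measure; a generic implementation for corner-format boxes is sketched below (not the authors' code).

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```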

Article
Skew Class-Balanced Re-Weighting for Unbiased Scene Graph Generation
Mach. Learn. Knowl. Extr. 2023, 5(1), 287-303; https://doi.org/10.3390/make5010018 - 10 Mar 2023
Viewed by 1476
Abstract
An unbiased scene graph generation (SGG) algorithm referred to as Skew Class-Balanced Re-Weighting (SCR) is proposed for considering the unbiased predicate prediction caused by the long-tailed distribution. The prior works focus mainly on alleviating the deteriorating performances of the minority predicate predictions, showing drastic dropping recall scores, i.e., losing the majority predicate performances. It has not yet correctly analyzed the trade-off between majority and minority predicate performances in the limited SGG datasets. In this paper, to alleviate the issue, the Skew Class-Balanced Re-Weighting (SCR) loss function is considered for the unbiased SGG models. Leveraged by the skewness of biased predicate predictions, the SCR estimates the target predicate weight coefficient and then re-weights more to the biased predicates for better trading-off between the majority predicates and the minority ones. Extensive experiments conducted on the standard Visual Genome dataset and Open Image V4 and V6 show the performances and generality of the SCR with the traditional SGG models. Full article
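The re-weighting idea can be illustrated with a frequency-weighted cross-entropy over predicate classes, assuming PyTorch; the paper's skew-based weight estimation is more elaborate, so the inverse-square-root weighting here is only an illustrative stand-in.

```python
import torch
import torch.nn.functional as F

def reweighted_predicate_loss(logits, targets, class_counts):
    """Cross-entropy that up-weights rare (minority) predicates;
    class_counts holds per-predicate training sample counts."""
    w = 1.0 / torch.sqrt(class_counts.float().clamp(min=1))
    w = w * (len(w) / w.sum())      # normalize to a mean weight of 1
    return F.cross_entropy(logits, targets, weight=w)
```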
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
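The paper's SCR loss estimates per-predicate weights from the skewness of biased predictions; as a rough illustration of the general re-weighting idea (not the SCR estimator itself), the sketch below applies effective-number class-balanced weights (Cui et al.) to a cross-entropy loss in PyTorch, with hypothetical predicate counts.

```python
import torch
import torch.nn.functional as F

def class_balanced_weights(class_counts, beta=0.999):
    """Effective-number class weights; a stand-in for SCR's skew-based
    weight estimation, which the paper derives from prediction skewness."""
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    effective = 1.0 - torch.pow(beta, counts)
    weights = (1.0 - beta) / effective
    return weights / weights.sum() * len(class_counts)

# Hypothetical long-tailed predicate counts and a toy batch of logits/labels
counts = [5000, 800, 60, 12]
logits = torch.randn(8, 4)
labels = torch.randint(0, 4, (8,))
loss = F.cross_entropy(logits, labels, weight=class_balanced_weights(counts))
print(loss.item())
```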

Article
Research on Crack Width Measurement Based on Binocular Vision and Improved DeeplabV3+
Appl. Sci. 2023, 13(5), 2752; https://doi.org/10.3390/app13052752 - 21 Feb 2023
Cited by 1 | Viewed by 1468
Abstract
Crack width is the main manifestation of concrete material deterioration. To measure crack information quickly and conveniently, a non-contact measurement method for cracks in planar concrete structures based on binocular vision is proposed. Firstly, an improved DeeplabV3+ semantic segmentation model is proposed, which uses L-MobileNetV2 as the backbone feature extraction network, adopts an IDAM structure to extract high-level semantic information, introduces an ECA attention mechanism, and optimizes the loss function of the model to achieve high-precision segmentation of crack areas. Secondly, the plane space coordinate equation of the concrete structure is constructed based on the principle of binocular vision and SIFT feature point matching, and the crack width is calculated by combining it with the segmented image. Finally, to verify the performance of the above method, a measurement test platform was built. The experimental results show that the RMSE of crack measurements obtained with the algorithm is less than 0.2 mm and the error rate is less than 4%, with stable accuracy at different measurement angles. This solves the problem of fast and convenient measurement of crack width in planar concrete structures in outdoor environments. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
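Once cracks are segmented and the viewing geometry is recovered, a pixel width converts to a metric width via the pinhole model; the sketch below assumes a fronto-parallel surface and hypothetical camera numbers, and is not the paper's full binocular pipeline.

```python
def crack_width_mm(width_px, depth_mm, focal_px):
    """Convert a crack width in pixels to millimetres with the pinhole model:
    real width = pixel width * depth / focal length. depth_mm would come from
    binocular triangulation (Z = f * B / d, with baseline B and disparity d)."""
    return width_px * depth_mm / focal_px

# Hypothetical numbers: a 6 px wide crack, 1.5 m away, 2800 px focal length
print(crack_width_mm(6, 1500.0, 2800.0))  # ~3.2 mm
```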

Article
Hyperspectral Imaging Sorting of Refurbishment Plasterboard Waste
Appl. Sci. 2023, 13(4), 2413; https://doi.org/10.3390/app13042413 - 13 Feb 2023
Viewed by 1083
Abstract
Post-consumer plasterboard waste sorting is carried out manually by operators, which is time-consuming and costly. In this work, a laboratory-scale hyperspectral imaging (HSI) system was evaluated for automatic refurbishment plasterboard waste sorting. The HSI system was trained to differentiate between plasterboard (gypsum core between two lining papers) and contaminants (e.g., wood, plastics, mortar or ceramics). Segregated plasterboard samples were crushed and sieved to obtain gypsum particles of less than 250 microns, which were characterized through X-ray fluorescence to determine their chemical purity levels. Refurbishment plasterboard waste particles <10 mm in size were not processed with the HSI-based sorting system because the manual processing of these particles at a laboratory scale would have been very time-consuming. Gypsum from refurbishment plasterboard waste particles <10 mm in size contained very small amounts of undesirable chemical impurities for plasterboard manufacturing (chloride, magnesium, sodium, potassium and phosphorus salts), and its chemical purity was similar to that of the gypsum from HSI-sorted plasterboard (96 wt%). The combination of unprocessed refurbishment plasterboard waste <10 mm with HSI-sorted plasterboard ≥10 mm in size led to a plasterboard recovery yield >98 wt%. These findings underpin the potential implementation of an industrial-scale HSI system for plasterboard waste sorting. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
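A minimal sketch of the per-pixel spectral classification step an HSI sorter performs, using a generic SVM on synthetic spectra; the band count, cube size, and labels are all hypothetical stand-ins for the trained system described above.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical hyperspectral cube: 64 x 64 pixels, 120 spectral bands
cube = np.random.rand(64, 64, 120)
pixels = cube.reshape(-1, 120)

# Hypothetical labelled training spectra: 0 = plasterboard, 1 = contaminant
X_train = np.random.rand(200, 120)
y_train = np.random.randint(0, 2, 200)

clf = SVC(kernel="rbf").fit(X_train, y_train)
mask = clf.predict(pixels).reshape(64, 64)   # per-pixel material map
print(mask.shape, mask.mean())
```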

Article
Infrared Macrothermoscopy Patterns—A New Category of Dermoscopy
J. Imaging 2023, 9(2), 36; https://doi.org/10.3390/jimaging9020036 - 06 Feb 2023
Viewed by 1278
Abstract
(1) Background: The authors developed a new non-invasive dermatological infrared macroimaging analysis technique (MacroIR) that evaluates microvascular, inflammatory, and metabolic changes and may be complementary to dermoscopy. Different skin and mucosal lesions were analyzed in a combined way—naked eye, polarized light dermatoscopy (PLD), and MacroIR—and the results were compared; (2) Methods: ten cases were evaluated using a smartphone coupled with a dermatoscope and a macro lens with an integrated far-infrared transducer, with specific software to capture and organize high-resolution images in different electromagnetic spectra, which were then analyzed by a dermatologist; (3) Results: It was possible to identify and compare structures found in the two dermoscopic forms. Visual anatomical changes were correlated with MacroIR and aided dermatological analysis of the skin surface, providing microvascular, inflammatory, and metabolic data on the studied areas. All MacroIR images correlated with PLD, naked eye examination, and histopathological findings; (4) Conclusion: MacroIR and clinical dermatologist concordance rates were comparable for all dermatological conditions in this study. MacroIR imaging is a promising method that can improve the diagnosis of dermatological diseases. The observations are preliminary and require further evaluation in larger studies. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

Article
On Deceiving Malware Classification with Section Injection
Mach. Learn. Knowl. Extr. 2023, 5(1), 144-168; https://doi.org/10.3390/make5010009 - 16 Jan 2023
Cited by 1 | Viewed by 1762
Abstract
We investigate how to modify executable files to deceive malware classification systems. This work’s main contribution is a methodology to inject bytes randomly across a malware file and use it both as an attack to decrease classification accuracy and as a defensive method, augmenting the data available for training. It respects the operating system file format to ensure the malware still executes after the injection and does not change its behavior. We reproduced five state-of-the-art malware classification approaches to evaluate our injection scheme: one based on a Global Image Descriptor (GIST) + K-Nearest-Neighbors (KNN), three Convolutional Neural Network (CNN) variations, and one Gated CNN. We performed our experiments on a public dataset with 9339 malware samples from 25 different families. Our results show that a mere 7% increase in malware size causes an accuracy drop between 25% and 40% for malware family classification, indicating that automatic malware classification systems may not be as trustworthy as initially reported in the literature. We also evaluated using modified malware alongside the original samples to increase network robustness against the mentioned attacks. The results show that a combination of reordering malware sections and injecting random data can improve the overall classification performance. All the code is publicly available. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
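As a toy illustration of byte injection, the sketch below only appends overlay padding, which the OS loader ignores; the paper's method is more careful, injecting whole sections at random positions while keeping the PE headers consistent. File paths are hypothetical.

```python
import os

def append_random_bytes(path_in, path_out, ratio=0.07):
    """Toy sketch: grow a binary by ~7% with random bytes appended as
    overlay data, mimicking the size increase studied in the paper."""
    data = open(path_in, "rb").read()
    padding = os.urandom(int(len(data) * ratio))
    with open(path_out, "wb") as f:
        f.write(data + padding)

# append_random_bytes("sample.exe", "sample_padded.exe")  # hypothetical paths
```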

Article
Fuzzy Model for the Automatic Recognition of Human Dendritic Cells
J. Imaging 2023, 9(1), 13; https://doi.org/10.3390/jimaging9010013 - 06 Jan 2023
Viewed by 1415
Abstract
Background and objective: Nowadays, foodborne illness is considered one of the fastest-growing diseases in the world, and studies show that its rate increases sharply each year. Foodborne illness is a public health problem caused by numerous factors, such as food intoxications, allergies, intolerances, etc. Mycotoxins are food contaminants produced by various species of molds (or fungi) that cause intoxications, which can be chronic or acute. Thus, even low concentrations of mycotoxins have a severely harmful impact on human health. It is, therefore, necessary to develop an assessment tool for evaluating their impact on the immune response. Recently, researchers have validated a new method of investigation using human dendritic cells, yet the analysis of the geometric properties of these cells is still visual. Moreover, this type of analysis is subjective, time-consuming, and difficult to perform manually. In this paper, we address the automation of this evaluation using image-processing techniques. Methods: Automatic classification approaches for microscopic dendritic cell images are developed to provide a fast and objective evaluation. The first proposed classifier is based on support vector machines (SVM) and Fisher’s linear discriminant analysis (FLD). The FLD–SVM classifier does not provide satisfactory results due to the significant confusion between the inhibited cells on one hand, and the other two cell types (mature and immature) on the other hand. Another strategy was therefore suggested to enhance the recognition of dendritic cells in microscopic images. This strategy is mainly based on fuzzy logic, which allows us to account for the uncertainties and inaccuracies of the given data. Results: The proposed methods were tested on a real dataset consisting of 421 microscopic dendritic cell images, where the fuzzy classification scheme efficiently improved the classification results, successfully classifying 96.77% of the dendritic cells. Conclusions: The fuzzy classification-based tools provide cell maturity and inhibition rates that help biologists evaluate the severe health impacts caused by food contaminants. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
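A minimal sketch of the fuzzy-logic idea: triangular membership functions over a geometric feature vote for a cell class. The feature, breakpoints, and rule base here are invented for illustration; the paper's actual descriptors and rules differ.

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def classify_cell(elongation):
    """Toy fuzzy rule base over one hypothetical geometric feature;
    the paper fuses several shape descriptors with fuzzy rules."""
    memberships = {
        "inhibited": triangular(elongation, 0.0, 0.2, 0.5),
        "immature":  triangular(elongation, 0.3, 0.6, 0.9),
        "mature":    triangular(elongation, 0.7, 1.0, 1.3),
    }
    return max(memberships, key=memberships.get), memberships

print(classify_cell(0.55))   # -> ('immature', {...})
```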

Article
Synthetic Data Generation for Visual Detection of Flattened PET Bottles
Mach. Learn. Knowl. Extr. 2023, 5(1), 14-28; https://doi.org/10.3390/make5010002 - 29 Dec 2022
Viewed by 1854
Abstract
Polyethylene terephthalate (PET) bottle recycling is a highly automated task; however, manual quality control is required due to inefficiencies of the process. In this paper, we explore automation of the quality control sub-task, namely visual bottle detection, using convolutional neural network (CNN)-based methods and synthetic generation of labelled training data. We propose a synthetic generation pipeline tailored for transparent and crushed PET bottle detection; however, it can also be applied to undeformed bottles if the viewpoint is set from above. We conduct various experiments on CNNs to compare the quality of real and synthetic data, show that synthetic data can reduce the amount of real data required and experiment with the combination of both datasets in multiple ways to obtain the best performance. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

Communication
Prediction of Carlson Trophic State Index of Small Inland Water from UAV-Based Multispectral Image Modeling
Appl. Sci. 2023, 13(1), 451; https://doi.org/10.3390/app13010451 - 29 Dec 2022
Cited by 1 | Viewed by 958
Abstract
This paper demonstrates a predictive method for the spatially explicit and periodic in situ monitoring of surface water quality in a small lake using an unmanned aerial vehicle (UAV) equipped with a multispectrometer. Based on the reflectance of different substances in different spectral bands, multiple regression analyses are used to determine the models comprising the most relevant band combinations from the multispectral images for the eutrophication assessment of lake water. The relevant eutrophication parameters, such as chlorophyll a, total phosphorus, transparency and dissolved oxygen, are thus evaluated and expressed by these regression models. Our experiments find that the eutrophication parameters predicted by the corresponding regression models generally exhibit good linear agreement, with coefficients of determination (R2) ranging from 0.7339 to 0.9406. In addition, the Carlson trophic state index (CTSI) determined from the on-site water quality sampling data is found to be rather consistent with the results predicted using the regression models proposed in this research. The maximal error in CTSI accuracy is as low as 1.4% and the root mean square error (RMSE) is only 0.6624, which reveals the great potential of low-altitude drones equipped with multispectrometers for the real-time monitoring and evaluation of the trophic status of a surface water body in an ecosystem. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
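The core modeling step, a multiple linear regression from band reflectances to a eutrophication parameter, might look like the following sketch with hypothetical reflectance samples; with so few toy points the fit is of course not meaningful.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical samples: reflectance in three multispectral bands
# (e.g., green, red, near-infrared) at in situ sampling points
bands = np.array([[0.12, 0.08, 0.30],
                  [0.10, 0.07, 0.22],
                  [0.15, 0.11, 0.35],
                  [0.09, 0.05, 0.18]])
chla = np.array([18.2, 12.5, 24.9, 9.1])   # measured chlorophyll a (ug/L)

model = LinearRegression().fit(bands, chla)
print(model.coef_, model.score(bands, chla))  # R^2 of the band-combination model
# The fitted model can then predict chlorophyll a for every image pixel,
# feeding the Carlson trophic state index calculation.
```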

Article
Multiscale Cascaded Attention Network for Saliency Detection Based on ResNet
Sensors 2022, 22(24), 9950; https://doi.org/10.3390/s22249950 - 16 Dec 2022
Cited by 2 | Viewed by 2218
Abstract
Saliency detection is a key research topic in the field of computer vision. Through the visual perception area of the brain, humans can accurately and quickly focus on regions of interest in complex and changing scenes. Although existing saliency-detection methods can achieve competent performance, they have deficiencies such as unclear margins of salient objects and interference from background information in the saliency map. In this study, to remedy these defects, a multiscale cascaded attention network was designed based on ResNet34. Departing from the typical U-shaped encoding–decoding architecture, we devised a contextual feature extraction module to enhance advanced semantic feature extraction. Specifically, a multiscale cascade block (MCB) and a lightweight channel attention (CA) module were added between the encoding and decoding networks for optimization. To address the blurred-edge issue, which is neglected by many previous approaches, we adopted an edge-thinning module to carry out a deeper edge-thinning process on the output layer image. The experimental results illustrate that this method achieves competitive saliency-detection performance, with improved accuracy and recall rates compared with those of other representative methods. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
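The lightweight channel attention (CA) module is plausibly similar to the standard ECA block, sketched below in PyTorch; this is a generic implementation, not the authors' exact module.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention (Wang et al., 2020): global average
    pooling followed by a 1-D convolution across channels; a plausible
    stand-in for the paper's lightweight CA module."""
    def __init__(self, k=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                  # x: (B, C, H, W)
        y = self.pool(x).squeeze(-1).transpose(1, 2)       # (B, 1, C)
        y = self.sigmoid(self.conv(y)).transpose(1, 2).unsqueeze(-1)
        return x * y                                       # channel re-weighting

print(ECA()(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```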

Article
Analysis of Edge Method Accuracy and Practical Multidirectional Modulation Transfer Function Measurement
Appl. Sci. 2022, 12(24), 12748; https://doi.org/10.3390/app122412748 - 12 Dec 2022
Cited by 1 | Viewed by 1508
Abstract
The modulation transfer function (MTF) is commonly used as an imaging quality criterion reflecting the spatial resolution capability of imaging systems. The modified edge methods based on ISO Standard 12233 are widely used for MTF measurement in various imaging fields with high confidence. However, two problems in the existing edge methods limit their application in the remote sensing (RS) field, where image quality is complicated and the edge angle is usually uncontrollable: a near-horizontal or near-vertical “small tilt angle straight (STAS)” edge is required, and the MTF measurement results show low robustness and non-uniqueness. In this study, the influence of edge angle, oversampling rate (OSR), region of interest (ROI), edge contrast, and random noise on edge method accuracy is quantitatively analyzed, and a practical multidirectional MTF measurement edge method is proposed based on the analysis results. The modified edge method adaptively determines the optimal OSR according to the edge angle and combines multiple measurement states, such as multi-ROI extraction and multi-phase binning, to improve the robustness, accuracy, and practicality of the edge method. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
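For orientation, the standard slanted-edge computation behind such methods, from edge spread function (ESF) to line spread function (LSF) to normalized FFT magnitude, can be sketched as follows (synthetic edge; the paper's oversampling and multi-ROI logic is not reproduced):

```python
import numpy as np

def mtf_from_esf(esf):
    """Slanted-edge pipeline sketch: differentiate the oversampled ESF
    into the LSF, window it, and take the normalized FFT magnitude."""
    lsf = np.gradient(esf)
    lsf *= np.hanning(lsf.size)          # suppress noise at the tails
    mtf = np.abs(np.fft.rfft(lsf))
    return mtf / mtf[0]                  # normalize so MTF(0) = 1

# Hypothetical edge profile blurred by the imaging system
x = np.linspace(-5, 5, 400)
esf = 0.5 * (1 + np.tanh(x / 0.8))
print(mtf_from_esf(esf)[:5])
```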

Article
MAGNet: A Camouflaged Object Detection Network Simulating the Observation Effect of a Magnifier
Entropy 2022, 24(12), 1804; https://doi.org/10.3390/e24121804 - 09 Dec 2022
Cited by 1 | Viewed by 2513
Abstract
In recent years, protecting important objects by simulating animal camouflage has been widely employed in many fields. As a result, camouflaged object detection (COD) technology has emerged. COD is more difficult to achieve than traditional object detection because camouflaged objects are highly fused with the background. In this paper, we strive to identify camouflaged objects more accurately and efficiently. Inspired by the use of magnifiers to search for hidden objects in pictures, we propose a COD network that simulates the observation effect of a magnifier, called the MAGnifier Network (MAGNet). Specifically, MAGNet contains two parallel modules: the ergodic magnification module (EMM) and the attention focus module (AFM). The EMM is designed to mimic the process of a magnifier enlarging an image, and the AFM is used to simulate the observation process in which human attention is highly focused on a particular region. The two sets of output camouflaged object maps are merged to simulate the observation of an object through a magnifier. In addition, a weighted key point area perception loss function, which is more applicable to COD, was designed based on the two modules to give greater attention to the camouflaged object. Extensive experiments demonstrate that, compared with 19 cutting-edge detection models, MAGNet achieves the best comprehensive effect on eight evaluation metrics in the public COD dataset. Additionally, compared to other COD methods, MAGNet has lower computational complexity and faster segmentation. We also validated the model’s generalization ability on a military camouflaged object dataset constructed in-house. Finally, we experimentally explored some extended applications of COD. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

Article
Real-Time Video Synopsis via Dynamic and Adaptive Online Tube Resizing
Sensors 2022, 22(23), 9046; https://doi.org/10.3390/s22239046 - 22 Nov 2022
Viewed by 886
Abstract
Nowadays, with the increasing number of video cameras, the amount of recorded video is growing. Efficient video browsing and retrieval are critical issues when considering the amount of raw video data to be condensed. Activity-based video synopsis is a popular approach to solving the video condensation problem. However, conventional synopsis methods always consist of complicated pairwise energy terms that involve a time-consuming optimization problem. In this paper, we propose a simple online video synopsis framework in which the number of object collisions is classified first. Different optimization strategies are applied to different collision situations to maintain a balance among computational cost, condensation ratio, and collision cost. Secondly, tube-resizing coefficients that vary across frames are adaptively assigned to a newly generated tube. Therefore, a suitable mapping result can be obtained to represent the proper size of the activity in each frame of the synopsis video. The maximum number of activities can be displayed in one frame with minimal collisions. Finally, in order to remove motion artifacts and improve the visual quality of the condensed video, a smoothness term is introduced to constrain the resizing coefficients. Experimental results on extensive videos validate the efficiency of the proposed method. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

Article
Recognition of Continuous Face Occlusion Based on Block Permutation by Using Linear Regression Classification
Appl. Sci. 2022, 12(23), 11885; https://doi.org/10.3390/app122311885 - 22 Nov 2022
Viewed by 942
Abstract
Face occlusion is still a key issue in the study of face recognition. Continuous occlusion affects the overall features and contour structure of a face, which brings significant challenges to face recognition. Although the Representation-Based Classification Method (RBCM) in previous studies can better capture the differences between face categories and accurately identify face images under changes in lighting and facial expression, it is easily affected by continuous occlusion: the model tends to learn the occlusion’s characteristics rather than the face’s, leading to misrecognition. Therefore, eliminating occlusion information from the image is necessary to improve the robustness of such models. The Block Permutation Linear Regression Classification (BPLRC) method proposed in this paper combines image block permutation with Linear Regression Classification (LRC). The LRC algorithm belongs to the category of nearest subspace classification and uses the Euclidean distance as a metric to classify images; it is, however, susceptible to outliers. Therefore, block permutation is used to establish an image set that contains little occlusion information and to construct a robust linear regression model. The BPLRC method first partitions all images into blocks, then enumerates the permutation schemes over the segments, feeds the image features of each scheme into the linear model, and classifies the result according to the minimum residual between the face image and its reconstruction. Compared to several state-of-the-art algorithms, the proposed method effectively solves the continuous occlusion problem on the Extended Yale B, ORL, and AR datasets. It recognizes scarf-occluded face images in the AR dataset with an accuracy of 93.67%, at a recognition speed of 0.094 s per image. The permutation strategy can be combined not only with the LRC algorithm but also with other algorithms of weak robustness. Since the computational cost of the block permutation method grows with the number of blocks, future work should explore reasonable iteration methods that quickly find the optimal or near-optimal permutation scheme and reduce the computation of the proposed method. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
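The LRC building block is easy to state: classify a probe by the smallest reconstruction residual over per-class least-squares fits. Below is a sketch with synthetic gallery data; the block permutation layer of BPLRC is omitted.

```python
import numpy as np

def lrc_predict(y, class_galleries):
    """Linear Regression Classification: project the probe image y onto each
    class subspace spanned by that class's gallery images (columns of X_c)
    and pick the class with the smallest reconstruction residual."""
    residuals = {}
    for label, X_c in class_galleries.items():
        beta = np.linalg.pinv(X_c) @ y          # least-squares coefficients
        residuals[label] = np.linalg.norm(y - X_c @ beta)
    return min(residuals, key=residuals.get), residuals

# Hypothetical data: 100-dim vectorized face images, 5 gallery images per class
rng = np.random.default_rng(0)
galleries = {c: rng.normal(size=(100, 5)) for c in ("A", "B")}
probe = galleries["A"] @ rng.normal(size=5)     # lies in class A's subspace
print(lrc_predict(probe, galleries)[0])          # -> "A"
```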

Article
Pattern Classification Using Quantized Neural Networks for FPGA-Based Low-Power IoT Devices
Sensors 2022, 22(22), 8694; https://doi.org/10.3390/s22228694 - 10 Nov 2022
Cited by 1 | Viewed by 1409
Abstract
With the recent growth of the Internet of Things (IoT) and the demand for faster computation, quantized neural networks (QNNs) or QNN-enabled IoT can offer better performance than conventional convolutional neural networks (CNNs). With the aim of reducing memory access costs and increasing computation efficiency, QNN-enabled devices are expected to transform numerous industrial applications with lower processing latency and power consumption. A special form of QNN is the binarized neural network (BNN), which quantizes weights to just two levels (1 bit). In this paper, CNN-, QNN-, and BNN-based pattern recognition techniques are implemented and analyzed on an FPGA. The FPGA hardware acts as an IoT device due to its connectivity with the cloud, and QNNs and BNNs are considered to offer better performance in terms of low power and low resource use on hardware platforms. The CNN and QNN implementations are compared based on their accuracy, weight bit error, ROC curve, and execution speed. The paper also discusses various approaches that can be deployed for optimizing CNN and QNN models with additionally available tools. The work is performed on the Xilinx Zynq 7020 series Pynq Z2 board, which serves as our FPGA-based low-power IoT device. The MNIST and CIFAR-10 databases are considered for simulation and experimentation. At full precision (32-bit), the accuracy is 95.5% and 79.22%, and the execution time is 5.8 ms and 18 ms, for the MNIST and CIFAR-10 databases, respectively. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
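A one-function sketch of what 1-bit binarization does to a weight tensor, in the XNOR-Net style (sign plus a mean-magnitude scale); the quantizer details in the actual FPGA toolchain may differ.

```python
import numpy as np

def binarize(weights):
    """1-bit weight quantization used in binarized neural networks:
    keep only the sign, with a per-tensor scale that preserves the
    mean magnitude (XNOR-Net-style binarization)."""
    alpha = np.abs(weights).mean()
    return alpha * np.sign(weights)

w = np.random.randn(4, 4).astype(np.float32)
print(binarize(w))   # every entry is +alpha or -alpha -> storable in 1 bit
```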

Article
Transformer-Based Model with Dynamic Attention Pyramid Head for Semantic Segmentation of VHR Remote Sensing Imagery
Entropy 2022, 24(11), 1619; https://doi.org/10.3390/e24111619 - 06 Nov 2022
Cited by 1 | Viewed by 1411
Abstract
Convolutional neural networks have long dominated semantic segmentation of very-high-resolution (VHR) remote sensing (RS) images. However, restricted by the fixed receptive field of the convolution operation, convolution-based models cannot directly obtain contextual information. Meanwhile, the Swin Transformer possesses great potential in modeling long-range dependencies. Nevertheless, the Swin Transformer breaks images into patches that become one-dimensional sequences, without considering the loss of positional information inside patches. Therefore, inspired by the Swin Transformer and Unet, we propose SUD-Net (Swin transformer-based Unet-like with Dynamic attention pyramid head Network), a new U-shaped architecture composed of Swin Transformer blocks and convolution layers simultaneously, through a dual encoder and an upsampling decoder with a Dynamic Attention Pyramid Head (DAPH) attached to the backbone. First, we propose a dual encoder structure combining Swin Transformer blocks and reslayers in reverse order to complement global semantics with detailed representations. Second, aiming at the spatial loss problem inside each patch, we design a Multi-Path Fusion Model (MPFM) with specially devised Patch Attention (PA) to encode the position information of patches and adaptively fuse features of different scales through attention mechanisms. Third, a Dynamic Attention Pyramid Head is constructed with deformable convolution to dynamically aggregate effective and important semantic information. SUD-Net achieves exceptional results on the ISPRS Potsdam and Vaihingen datasets, with 92.51% mF1, 86.4% mIoU, and 92.98% OA, and 89.49% mF1, 81.26% mIoU, and 90.95% OA, respectively. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

Article
Monocular Camera Viewpoint-Invariant Vehicular Traffic Segmentation and Classification Utilizing Small Datasets
Sensors 2022, 22(21), 8121; https://doi.org/10.3390/s22218121 - 24 Oct 2022
Cited by 2 | Viewed by 1370
Abstract
The work presented here develops a computer vision framework that is view-angle independent for vehicle segmentation and classification from roadway traffic systems installed by the Virginia Department of Transportation (VDOT). An automated technique for extracting a region of interest is discussed to speed up the processing. The VDOT traffic videos are analyzed for vehicle segmentation using an improved robust low-rank matrix decomposition technique. A new and effective thresholding method is presented that improves segmentation accuracy and simultaneously speeds up the segmentation processing. Size and shape physical descriptors from morphological properties and textural features from the Histogram of Oriented Gradients (HOG) are extracted from the segmented traffic. Furthermore, a multi-class support vector machine classifier is employed to categorize different traffic vehicle types, including passenger cars, passenger trucks, motorcycles, buses, and small and large utility trucks. It handles multiple vehicle detections through an iterative k-means clustering over-segmentation process. The proposed algorithm reduced the processed data by an average of 40%. Compared to recent techniques, it showed an average improvement of 15% in segmentation accuracy, and it is on average 55% faster than the compared segmentation techniques. Moreover, a comparative analysis of 23 different deep learning architectures is presented. The resulting algorithm outperformed the compared deep learning algorithms in vehicle classification accuracy. Furthermore, the timing analysis showed that it could operate in real-time scenarios. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

Article
A Hyper-Chaotically Encrypted Robust Digital Image Watermarking Method with Large Capacity Using Compress Sensing on a Hybrid Domain
Entropy 2022, 24(10), 1486; https://doi.org/10.3390/e24101486 - 18 Oct 2022
Cited by 3 | Viewed by 1421
Abstract
The digital watermarking technique is quite promising for both image copyright protection and secure transmission. However, many existing techniques do not achieve the robustness and capacity one might expect simultaneously. In this paper, we propose a robust semi-blind image watermarking scheme with a high capacity. Firstly, we perform a discrete wavelet transformation (DWT) on the carrier image. Then, the watermark images are compressed via a compressive sampling technique to save storage space. Thirdly, a Combination of One- and Two-Dimensional Chaotic Maps based on the Tent and Logistic maps (TL-COTDCM) is used to scramble the compressed watermark image with high security, dramatically reducing the false positive problem (FPP). Finally, a singular value decomposition (SVD) component is embedded into the decomposed carrier image to finish the embedding process. With this scheme, eight 256×256 grayscale watermark images are perfectly embedded into a 512×512 carrier image, a capacity eight times that of existing watermarking techniques on average. The scheme has been tested under several common high-strength attacks, and the experimental results show the superiority of our method via the two most used evaluation indicators, the normalized correlation coefficient (NCC) and the peak signal-to-noise ratio (PSNR). Our method outperforms the state-of-the-art in terms of robustness, security, and capacity of digital watermarking, and thus exhibits great potential for multimedia applications in the immediate future. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
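A bare-bones sketch of the DWT + SVD embedding step follows; it omits the compressive sampling and TL-COTDCM scrambling described above, and the embedding strength alpha is illustrative.

```python
import numpy as np
import pywt

def embed(carrier, watermark, alpha=0.05):
    """DWT + SVD embedding sketch: transform the carrier, then blend the
    watermark's singular values into the LL sub-band's singular values."""
    cA, (cH, cV, cD) = pywt.dwt2(carrier, "haar")
    U, S, Vt = np.linalg.svd(cA, full_matrices=False)
    Sw = np.linalg.svd(watermark, compute_uv=False)
    S_marked = S + alpha * Sw[: S.size]
    cA_marked = U @ np.diag(S_marked) @ Vt
    return pywt.idwt2((cA_marked, (cH, cV, cD)), "haar")

carrier = np.random.rand(512, 512)   # hypothetical carrier image
wm = np.random.rand(256, 256)        # hypothetical watermark image
print(embed(carrier, wm).shape)      # (512, 512)
```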

Article
Decoupled Early Time Series Classification Using Varied-Length Feature Augmentation and Gradient Projection Technique
Entropy 2022, 24(10), 1477; https://doi.org/10.3390/e24101477 - 17 Oct 2022
Viewed by 1147
Abstract
Early time series classification (ETSC) is crucial for real-world time-sensitive applications. This task aims to classify time series data with the fewest timestamps possible at the desired accuracy. Early methods used fixed-length time series to train deep models and then quit the classification process according to specific exiting rules. However, these methods may not adapt to the length variation of flow data in ETSC. Recent advances have proposed end-to-end frameworks, which leverage Recurrent Neural Networks to handle the varied-length problem and exiting subnets for early quitting. Unfortunately, the conflict between the classification and early exiting objectives has not been fully considered. To handle these problems, we decouple the ETSC task into the varied-length TSC task and the early exiting task. First, to enhance the adaptive capacity of the classification subnets to data length variation, a feature augmentation module based on random-length truncation is proposed. Then, to handle the conflict between classification and early exiting, the gradients of these two tasks are projected into a unified direction. Experimental results on 12 public datasets demonstrate the promising performance of our proposed method. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
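The gradient projection step is in the spirit of PCGrad: when the two task gradients conflict, drop the conflicting component. A minimal sketch on toy vectors (the paper's exact projection rule may differ):

```python
import numpy as np

def project_conflicting(g_cls, g_exit):
    """PCGrad-style projection sketch: if the classification and early-exiting
    gradients conflict (negative dot product), remove from one the component
    along the other so the combined update points in a unified direction."""
    dot = g_cls @ g_exit
    if dot < 0:
        g_cls = g_cls - dot / (g_exit @ g_exit) * g_exit
    return g_cls + g_exit

g1 = np.array([1.0, -2.0])   # hypothetical classification gradient
g2 = np.array([0.5, 1.0])    # hypothetical early-exiting gradient
print(project_conflicting(g1, g2))
```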

Article
A Fast Adaptive Multi-Scale Kernel Correlation Filter Tracker for Rigid Object
Sensors 2022, 22(20), 7812; https://doi.org/10.3390/s22207812 - 14 Oct 2022
Cited by 1 | Viewed by 1119
Abstract
The efficient and accurate tracking of a target in complex scenes has always been one of the challenges to tackle. At present, the most effective tracking algorithms are basically neural network models based on deep learning. Although such algorithms have high tracking accuracy, the huge number of parameters and computations in the network models makes it difficult for them to meet real-time requirements under limited hardware conditions, such as embedded platforms with small size, low power consumption, and limited computing power. Tracking algorithms based on a kernel correlation filter are well known and widely applied because of their high performance and speed, but when the target is in a complex background, they still cannot adapt to target scale changes and occlusion, which leads to template drift. In this paper, a fast multi-scale kernel correlation filter tracker based on adaptive template updating is proposed for common rigid targets. We introduce a simple scale pyramid on the basis of Kernel Correlation Filtering (KCF), which can adapt to changes in target size while maintaining operating speed. We propose an adaptive template updater based on the Mean of Cumulative Maximum Response Values (MCMRV) to effectively alleviate the problem of template drift when occlusion occurs. Extensive experiments have demonstrated the effectiveness of our method on various datasets, where it significantly outperformed other state-of-the-art methods based on a kernel correlation filter. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
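A guess at the shape of the MCMRV update rule, hedged: track the running mean of per-frame maximum responses and suspend template updates when the current peak falls well below it (likely occlusion). The threshold here is illustrative, not the paper's value.

```python
class MCMRVUpdater:
    """Sketch of occlusion-aware template updating around a running Mean of
    Cumulative Maximum Response Values (MCMRV); illustrative only."""
    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.max_responses = []

    def should_update(self, peak_response):
        self.max_responses.append(peak_response)
        mcmrv = sum(self.max_responses) / len(self.max_responses)
        return peak_response >= self.threshold * mcmrv

updater = MCMRVUpdater()
for r in (0.9, 0.85, 0.88, 0.31):       # hypothetical per-frame KCF peaks
    print(updater.should_update(r))     # last frame flagged as occlusion
```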

Article
Instrument Pointer Recognition Scheme Based on Improved CSL Algorithm
Sensors 2022, 22(20), 7800; https://doi.org/10.3390/s22207800 - 14 Oct 2022
Viewed by 1024
Abstract
The traditional pointer instrument recognition scheme is implemented in three steps, which is cumbersome and inefficient, making it difficult to apply to real-time monitoring in industrial production. Based on an improved CSL coding method and a pre-caching mechanism, an intelligent YOLOv5-based pointer instrument reading recognition technique is proposed in this paper, which realizes the rapid positioning and reading recognition of pointer instruments. This strategy eliminates the angle interaction problem in rotating target detection, avoids the complexity of image preprocessing, and solves the poor adaptability of Hough detection. The experimental results show that, compared with the traditional algorithm, the proposed algorithm can effectively identify the angle of the instrument pointer, has high detection efficiency and strong adaptability, and has broad application prospects. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

Article
Transformer Based Binocular Disparity Prediction with Occlusion Predict and Novel Full Connection Layers
Sensors 2022, 22(19), 7577; https://doi.org/10.3390/s22197577 - 06 Oct 2022
Cited by 2 | Viewed by 1111
Abstract
Depth estimation algorithms based on convolutional neural networks have many limitations and defects when constructing a matching cost volume to calculate the disparity: with a limited disparity range, authentic disparities beyond the predetermined range cannot be acquired; the matching process lacks constraints on occlusion and matching uniqueness; and, as a local feature extractor, a convolutional neural network lacks the ability to perceive global context information. Aiming at these problems in matching methods that construct a matching cost volume, we propose a disparity prediction algorithm based on the Transformer, which specifically comprises a Swin-SPP module for feature extraction based on the Swin Transformer, a Transformer disparity matching network based on self-attention and cross-attention mechanisms, and an occlusion prediction sub-network. In addition, we propose a double-skip-connection fully connected layer to address gradient vanishing and explosion during the training of the Transformer model, further enhancing inference accuracy. The proposed model achieved an EPE (absolute error) of 0.57 and 0.61 and a 3PE (percentage of errors greater than 3 px) of 1.74% and 1.56% on the KITTI 2012 and KITTI 2015 datasets, respectively, with an inference time of 0.46 s and as few as 2.6 M parameters, showing great advantages over other algorithms on various evaluation metrics. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

Article
An Improved Adaptive Median Filtering Algorithm for Radar Image Co-Channel Interference Suppression
Sensors 2022, 22(19), 7573; https://doi.org/10.3390/s22197573 - 06 Oct 2022
Cited by 1 | Viewed by 1322
Abstract
In order to increase the accuracy of ocean monitoring, this paper proposes an improved adaptive median filtering algorithm based on the tangential interference ratio to better suppress marine radar co-channel interference. To solve the problem that co-channel interference reduces the accuracy of parameter extraction from radar images, this paper constructs a tangential interference ratio model based on an improved Laplace operator, which describes the ratio of co-channel interference along the antenna rotation direction in the original radar image. Based on the idea of between-class variance, a tangential interference ratio threshold is selected to divide co-channel interference into high-ratio and low-ratio regions. An improved adaptive median filter based on the median of sub-windows is then used to process high-ratio regions, while low-ratio regions are processed by an adaptive median filter based on the median of the current window. Radar-measured data from Bohai Bay, China are used for algorithm validation, and the experimental results show that the proposed filtering algorithm performs better than the standard adaptive median filtering algorithm. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
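For context, the classic adaptive median filter that both region branches build on can be sketched at a single pixel as follows; the tangential-interference-ratio switching described above is not reproduced here.

```python
import numpy as np

def adaptive_median(img, y, x, max_win=7):
    """Classic adaptive median filtering at one pixel: grow the window until
    the median is not an impulse, then replace the pixel only if it is one."""
    for w in range(1, max_win // 2 + 1):
        win = img[max(0, y - w): y + w + 1, max(0, x - w): x + w + 1]
        med, lo, hi = np.median(win), win.min(), win.max()
        if lo < med < hi:                          # median is not an impulse
            return med if not lo < img[y, x] < hi else img[y, x]
    return med

img = np.random.randint(0, 256, (32, 32)).astype(float)
img[10, 10] = 255.0                                # simulated interference spike
print(img[10, 10], adaptive_median(img, 10, 10))
```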

Article
Detection of Pits by Conjugate Lines: An Algorithm for Segmentation of Overlapping and Adhesion Targets in DE-XRT Sorting Images of Coal and Gangue
Appl. Sci. 2022, 12(19), 9850; https://doi.org/10.3390/app12199850 - 30 Sep 2022
Cited by 1 | Viewed by 1050
Abstract
In lump coal and gangue separation based on photoelectric technology, the prerequisite for using a dual-energy X-ray to locate and identify coal and gangue is obtaining independent target areas. However, as the input to the sorting system increases, the actual collected images contain adhering and overlapping targets. This paper proposes a pit-point detection and segmentation algorithm to solve the problem of overlapping and adhering targets. The adhesion forms are divided into open-loop and closed-loop adhesion (OLA and CLA), and an open- and closed-loop crossing algorithm (OLCA and CLCA) is proposed accordingly. We used conjugate lines to detect pits, judging the position and distance of pixel points relative to the conjugate lines, and set constraints on the distance and relative straight-line position of the pixel points to complete the pit detection. Finally, the minimum distance search method was used to obtain the dividing line corresponding to the pit and complete the image segmentation. The experimental results demonstrate that the segmentation accuracy for overlapping targets was 90.73%, and the acceptable segmentation accuracy was 94.15%. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

Article
The Study of the Effectiveness of Advanced Algorithms for Learning Neural Networks Based on FPGA in the Musical Notation Classification Task
Appl. Sci. 2022, 12(19), 9829; https://doi.org/10.3390/app12199829 - 29 Sep 2022
Cited by 2 | Viewed by 1013
Abstract
The work contains an original comparison of selected algorithms using artificial neural network models, such as RBF neural networks, with classic algorithms based on structured programming in an image identification task. Existing studies exploring methods for the problem of classifying musical notation, as addressed in this work, are still scarce. The neural-network-based and classical image recognition methods were studied on the basis of their effectiveness in recognizing notes presented on the treble staff. To carry out the research, the density of the data distribution was modeled by means of probabilistic principal component analysis, and a simple regression was performed with a radial neural network. The methods of image acquisition and analysis are presented. The obtained results were successively tested against selected quality criteria. The development of this research may contribute to supporting the learning of musical notation by both beginners and blind people. Further development of the experiments can provide convenient reading of musical notation with the help of a classification system. The research also introduces new algorithms for further tests and projects in the field of music notation classification. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

Article
An Efficient Retrieval System Framework for Fabrics Based on Fine-Grained Similarity
Entropy 2022, 24(9), 1319; https://doi.org/10.3390/e24091319 - 19 Sep 2022
Viewed by 1228
Abstract
In the context of “double carbon”, as a traditional high-energy-consumption industry, the textile industry is facing the severe challenges of energy saving and emission reduction. To improve production efficiency in the textile industry, we propose the use of content-based image retrieval technology to shorten the fabric production cycle. However, fabric retrieval has high requirements for results, which makes it difficult to apply common retrieval methods directly. This paper presents a novel method for fabric image retrieval. Firstly, we define a fine-grained similarity to measure the similarity between two fabric images. Then, a convolutional neural network with a compact structure and cross-domain connections is designed to narrow the gap between fabric images and similarities. To overcome the problems of missing probability information and difficult training in classical hashing, we introduce a variational network module and a structural module into the hashing model, which is called DVSH. We employ list-wise learning to perform the similarity embedding. The experimental results demonstrate the superiority and efficiency of the proposed hashing model, DVSH. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

Article
Supervised Contrastive Learning and Intra-Dataset Adversarial Adaptation for Iris Segmentation
Entropy 2022, 24(9), 1276; https://doi.org/10.3390/e24091276 - 10 Sep 2022
Cited by 4 | Viewed by 1342
Abstract
Precise iris segmentation is a very important part of accurate iris recognition. Traditional iris segmentation methods require complex prior knowledge and pre- and post-processing and have limited accuracy under non-ideal conditions. Deep learning approaches outperform traditional methods. However, the small number of labeled datasets degrades their performance drastically because of the difficulty of collecting and labeling irises. Furthermore, previous approaches ignore the large distribution gap within non-ideal iris datasets due to illumination, motion blur, squinting eyes, etc. To address these issues, we propose a three-stage training strategy. Firstly, supervised contrastive pretraining is proposed to increase intra-class compactness and inter-class separability, obtaining a good pixel classifier under a limited amount of data. Secondly, the entire network is fine-tuned using cross-entropy loss. Thirdly, an intra-dataset adversarial adaptation is proposed, which reduces the intra-dataset gap in non-ideal situations by aligning the distributions of the hard and easy samples at the pixel-class level. Our experiments show that our method improved the segmentation performance and achieved the following encouraging results: Nice1 scores of 0.44%, 1.03%, 0.66%, 0.41%, and 0.37% and F1 scores of 96.66%, 98.72%, 93.21%, 94.28%, and 97.41% on UBIRIS.V2, IITD, MICHE-I, CASIA-D, and CASIA-T, respectively. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)