Application of Machine Vision and Deep Learning Technology

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 June 2024 | Viewed by 22401

Special Issue Editors


Guest Editor
School of Mechanical Engineering, Xi'an Jiaotong University, Xi'an 710049, China
Interests: machine vision; optical engineering; deep learning

Guest Editor
Department of Electronic Information Engineering, Zhengzhou University, Zhengzhou 450000, China
Interests: image processing; visual inspection; photoelectric measurement

Guest Editor
School of Mechanical Engineering, Xi'an Jiaotong University, Xi'an 710049, China
Interests: photoelectric measurement; optical information processing; optical precision instrument; digital image processing; manufacturing systems; quality engineering

Special Issue Information

Dear Colleagues,

Machine vision is a rapidly developing branch of artificial intelligence. Machine vision technology aims to use machines, rather than human eyes, to measure and assess objects. In recent years, with the continuous development of deep learning, its end-to-end learning concept and outstanding data analysis ability have helped machine vision achieve higher accuracy in image classification, target recognition, and semantic segmentation, increasing its use in security, driverless cars, smart homes, medical imaging, and other fields. This Special Issue discusses recent efforts and advances in machine vision and deep learning.

Dr. Junhui Huang
Dr. Qi Xue
Prof. Dr. Zhao Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine vision
  • deep learning
  • 3D sensing
  • optical imaging and measurement
  • super-resolution imaging
  • image processing
  • artificial intelligence and photonic neural network
  • target recognition
  • neural network and optimization
  • semantic segmentation and understanding
  • automatic optical inspection, industrial product testing, driverless cars, character recognition, tracking and positioning, etc.
  • hardware, algorithms, and techniques relating to machine vision

Published Papers (14 papers)


Research


14 pages, 2302 KiB  
Article
A Multi-Scale Attention Fusion Network for Retinal Vessel Segmentation
by Shubin Wang, Yuanyuan Chen and Zhang Yi
Appl. Sci. 2024, 14(7), 2955; https://doi.org/10.3390/app14072955 - 31 Mar 2024
Viewed by 447
Abstract
The structure and function of retinal vessels play a crucial role in diagnosing and treating various ocular and systemic diseases. Therefore, the accurate segmentation of retinal vessels is of paramount importance to assist clinical diagnosis. U-Net has been highly praised for its outstanding performance in the field of medical image segmentation. However, as network depth increases, multiple pooling operations may lead to the loss of crucial information. Additionally, the insufficient processing of local context features caused by skip connections can affect the accurate segmentation of retinal vessels. To address these problems, we proposed a novel model for retinal vessel segmentation. The proposed model is based on the U-Net architecture, with two blocks, an MsFE block and an MsAF block, added between the encoder and decoder at each layer of the U-Net backbone. The MsFE block extracts low-level features from different scales, while the MsAF block performs feature fusion across various scales. Finally, the output of the MsAF block replaces the skip connection in the U-Net backbone. Experimental evaluations on the DRIVE, CHASE_DB1, and STARE datasets demonstrated that MsAF-UNet exhibited excellent segmentation performance compared with state-of-the-art methods.
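
A minimal PyTorch sketch of multi-scale feature extraction and attention-based fusion blocks in the spirit of the MsFE/MsAF design described above; the abstract does not give the exact layer configuration, so the branch structure, attention form, and channel sizes below are assumptions.

```python
# Hypothetical sketch of MsFE-like multi-scale extraction and MsAF-like fusion;
# the paper's exact block designs are not reproduced here.
import torch
import torch.nn as nn

class MsFE(nn.Module):
    """Extract features at several scales with parallel dilated convolutions."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, x):
        return [torch.relu(b(x)) for b in self.branches]

class MsAF(nn.Module):
    """Fuse multi-scale features with channel attention; the output replaces the skip connection."""
    def __init__(self, channels, n_scales=3):
        super().__init__()
        self.project = nn.Conv2d(channels * n_scales, channels, 1)
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, scales):
        fused = self.project(torch.cat(scales, dim=1))
        return fused * self.attn(fused)

# Toy usage: a 64-channel encoder feature map at one U-Net level.
x = torch.randn(1, 64, 128, 128)
skip = MsAF(64)(MsFE(64)(x))
print(skip.shape)  # torch.Size([1, 64, 128, 128])
```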

23 pages, 21756 KiB  
Article
Segmenting Urban Scene Imagery in Real Time Using an Efficient UNet-like Transformer
by Haiqing Xu, Mingyang Yu, Fangliang Zhou and Hongling Yin
Appl. Sci. 2024, 14(5), 1986; https://doi.org/10.3390/app14051986 - 28 Feb 2024
Viewed by 488
Abstract
Semantic segmentation of high-resolution remote sensing urban images is widely used in many fields, such as environmental protection, urban management, and sustainable development. For many years, convolutional neural networks (CNNs) have been the prevalent method in the field, but convolution operations are deficient in modeling global information due to their local nature. In recent years, Transformer-based methods have demonstrated their advantages in many domains, such as semantic segmentation, instance segmentation, and object detection, owing to their powerful ability to model global information. Despite these advantages, Transformer-based architectures tend to incur significant computational costs, limiting their real-time application potential. To address this problem, we propose a U-shaped network with a Transformer as the decoder and a CNN as the encoder to segment remote sensing urban scene images. For efficient segmentation, we design a window-based, multi-head, focused linear self-attention (WMFSA) mechanism and further propose a global–local information modeling module (GLIM), which captures both global and local contexts through a dual-branch structure. Experiments on four challenging datasets demonstrate that our model not only achieves higher segmentation accuracy than other methods but also obtains competitive speeds, enhancing its real-time application potential. Specifically, the mIoU of our method is 68.2% and 52.8% on the UAVid and LoveDA datasets, respectively, while the speed is 114 FPS with a 1024 × 1024 input on a single 3090 GPU.
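
To make the efficiency idea concrete, here is a simplified sketch of window-based linear self-attention in the spirit of the WMFSA mechanism described above; the paper's actual focusing function, multi-head layout, and window size are not reproduced, so the positive kernel feature map and single-head structure below are assumptions.

```python
# Simplified window-based linear attention: attention is computed inside
# non-overlapping windows with linear (kernelized) attention, which is
# O(N * C^2) per window instead of O(N^2 * C).
import torch
import torch.nn as nn

class WindowLinearAttention(nn.Module):
    def __init__(self, dim, window=8):
        super().__init__()
        self.window = window
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):            # x: (B, H, W, C); H, W divisible by window
        B, H, W, C = x.shape
        w = self.window
        # Partition into non-overlapping windows -> (num_windows*B, w*w, C)
        x = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(-1, w * w, C)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = torch.relu(q) + 1e-6, torch.relu(k) + 1e-6   # positive kernel feature map
        kv = torch.einsum("bnc,bnd->bcd", k, v)             # aggregate keys and values
        z = 1.0 / torch.einsum("bnc,bc->bn", q, k.sum(dim=1))
        out = torch.einsum("bnc,bcd,bn->bnd", q, kv, z)
        out = self.proj(out)
        # Reverse the window partition
        out = out.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return out.reshape(B, H, W, C)

feats = torch.randn(1, 64, 64, 96)
print(WindowLinearAttention(96)(feats).shape)  # torch.Size([1, 64, 64, 96])
```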

12 pages, 22729 KiB  
Article
nmODE-Unet: A Novel Network for Semantic Segmentation of Medical Images
by Shubin Wang, Yuanyuan Chen and Zhang Yi
Appl. Sci. 2024, 14(1), 411; https://doi.org/10.3390/app14010411 - 02 Jan 2024
Viewed by 880
Abstract
Diabetic retinopathy is a prevalent eye disease that poses a potential risk of blindness. Nevertheless, due to the small size of diabetic retinopathy lesions and the high inter-class similarity in location, color, and shape among different lesions, the segmentation task is highly challenging. To address these issues, we proposed a novel framework named nmODE-Unet, which is based on the nmODE (neural memory Ordinary Differential Equation) block and the U-net backbone. In nmODE-Unet, shallow features serve as input to the nmODE block, and the output of the nmODE block is fused with the corresponding deep features. Extensive experiments were conducted on the IDRiD, e_ophtha, and LGG segmentation datasets, and the results demonstrate that, in comparison to other competing models, nmODE-Unet achieves superior performance.
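
A minimal sketch of an nmODE-style block integrated with fixed-step Euler; the dynamics below (dy/dt = -y + sin²(y + γ(x))) follow the published nmODE formulation as commonly stated, and the learnable input mapping, step count, and additive fusion with deep features are assumptions rather than the paper's exact design.

```python
# Toy nmODE-style block: the memory state y is driven by an external input
# gamma(x) derived from shallow features and integrated with explicit Euler.
import torch
import torch.nn as nn

class nmODEBlock(nn.Module):
    def __init__(self, channels, steps=10, dt=0.1):
        super().__init__()
        self.gamma = nn.Conv2d(channels, channels, 3, padding=1)  # learnable input mapping (assumed)
        self.steps, self.dt = steps, dt

    def forward(self, x):
        g = self.gamma(x)                  # external input to the memory neurons
        y = torch.zeros_like(g)            # memory state starts at zero
        for _ in range(self.steps):        # explicit Euler integration of the ODE
            y = y + self.dt * (-y + torch.sin(y + g) ** 2)
        return y

shallow = torch.randn(1, 32, 64, 64)       # shallow encoder features
deep = torch.randn(1, 32, 64, 64)          # corresponding deep decoder features
fused = deep + nmODEBlock(32)(shallow)     # fusion by addition (assumed)
print(fused.shape)
```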

12 pages, 2615 KiB  
Article
Automatic Recognition of Blood Cell Images with Dense Distributions Based on a Faster Region-Based Convolutional Neural Network
by Yun Liu, Yumeng Liu, Menglu Chen, Haoxing Xue, Xiaoqiang Wu, Linqi Shui, Junhong Xing, Xian Wang, Hequn Li and Mingxing Jiao
Appl. Sci. 2023, 13(22), 12412; https://doi.org/10.3390/app132212412 - 16 Nov 2023
Viewed by 629
Abstract
In modern clinical medicine, important information about red blood cells, such as their shape and number, is used to detect blood diseases. However, the automatic recognition of single cells and adherent cells in densely distributed medical scenes remains difficult, both for traditional detection algorithms with lower recognition rates and for conventional networks with weaker feature extraction capabilities. In this paper, an automatic recognition method for adherent blood cells with dense distribution is proposed. Based on Faster R-CNN, the balanced feature pyramid structure, deformable convolution network, and efficient pyramid split attention mechanism are adopted to automatically recognize blood cells under conditions of dense distribution, extrusion deformation, adhesion, and overlap. In addition, the RoI Align algorithm for regions of interest also contributes to improving the accuracy of the recognition results. The experimental results show that the mean average precision of cell detection is 0.895, which is 24.5% higher than that of the original network model. Compared with one-stage mainstream networks, the presented network has a stronger feature extraction capability. The proposed method is suitable for identifying single cells and adherent cells with dense distribution in actual medical scenes.
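
For illustration, the following is a minimal sketch of the RoI Align pooling step used in such two-stage detectors, via torchvision; the feature stride, box coordinates, and output size are toy values, not the paper's settings.

```python
# RoI Align pools a fixed-size feature tensor for each candidate cell box,
# avoiding the quantization error of RoI Pooling.
import torch
from torchvision.ops import roi_align

features = torch.randn(1, 256, 50, 50)                 # backbone feature map (stride 16 assumed)
boxes = torch.tensor([[0., 120., 140., 260., 300.],    # [batch_idx, x1, y1, x2, y2] in image coords
                      [0., 300., 310., 420., 450.]])
pooled = roi_align(features, boxes, output_size=(7, 7),
                   spatial_scale=1.0 / 16, sampling_ratio=2, aligned=True)
print(pooled.shape)   # torch.Size([2, 256, 7, 7]) -> fed to the detection head
```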

16 pages, 3003 KiB  
Article
A Visual-Based Approach for Driver’s Environment Perception and Quantification in Different Weather Conditions
by Longxi Luo, Minghao Liu, Jiahao Mei, Yu Chen and Luzheng Bi
Appl. Sci. 2023, 13(22), 12176; https://doi.org/10.3390/app132212176 - 09 Nov 2023
Cited by 1 | Viewed by 773
Abstract
The decision-making behavior of drivers during the driving process is influenced by various factors, including road conditions, traffic situations, and weather conditions. However, our understanding and quantification of the driving environment are still very limited, which not only increases driving risk but also hinders the deployment of autonomous vehicles. To address this issue, this study attempts to transform drivers' visual perception into machine vision perception. Specifically, the study provides a detailed decomposition of the elements constituting weather and proposes three environmental quantification indicators: visibility brightness, visibility clarity, and visibility obstruction rate. These indicators help to describe and quantify the driving environment more accurately. Based on these indicators, a visual-based environmental quantification method is further proposed to better understand and interpret the driving environment. Additionally, based on drivers' visual perception, this study extensively analyzes the impact of environmental factors on driver behavior. A cognitive assessment model is established to evaluate drivers' cognitive abilities in different environments. The effectiveness and accuracy of the model are validated through driving simulation experiments, thereby establishing a communication bridge between the driving environment and driver behavior. These results enable us to better understand the decision-making behavior of drivers in specific environments and provide a reference for the development of intelligent driving technology.
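
An illustrative sketch of how the three quantification indicators could be computed from a dashcam frame; the paper's exact definitions are not given in the abstract, so brightness, clarity, and obstruction rate are approximated here with common image statistics (mean luminance, Laplacian variance, and a dark-pixel fraction), which are assumptions.

```python
# Assumed formulations of the three visibility indicators, for illustration only.
import cv2
import numpy as np

def visibility_indicators(image_bgr, obstruction_thresh=40):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    brightness = float(gray.mean()) / 255.0                 # mean luminance, normalised
    clarity = float(cv2.Laplacian(gray, cv2.CV_64F).var())  # sharpness proxy
    # Fraction of the scene too dark/low-contrast to resolve (assumed definition)
    obstruction_rate = float((gray < obstruction_thresh).mean())
    return brightness, clarity, obstruction_rate

frame = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)  # stand-in for a dashcam frame
print(visibility_indicators(frame))
```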

19 pages, 2158 KiB  
Article
U2-Net: A Very-Deep Convolutional Neural Network for Detecting Distracted Drivers
by Nawaf O. Alsrehin, Mohit Gupta, Izzat Alsmadi and Saif Addeen Alrababah
Appl. Sci. 2023, 13(21), 11898; https://doi.org/10.3390/app132111898 - 31 Oct 2023
Cited by 1 | Viewed by 1186
Abstract
In recent years, the number of deaths and injuries resulting from traffic accidents has been increasing dramatically all over the world due to distracted drivers. Thus, a key element in developing intelligent vehicles and safe roads is monitoring driver behaviors. In this paper, we modify and extend the U-net convolutional neural network so that it provides deeper layers to represent image features and yields more precise classification results. This extension forms the basis of a very deep convolutional neural network, called U2-net, for detecting distracted drivers. The U2-net model has two paths (contracting and expanding) in addition to a fully connected dense layer. The contracting path extracts the context around objects to provide better object representation, while the symmetric expanding path enables precise localization. The motivation behind this model is that precise object features yield better object representation and classification. We used two public datasets, MI-AUC and State Farm, to evaluate U2-net in detecting distracted driving. The accuracy of U2-net on MI-AUC and State Farm is 98.34% and 99.64%, respectively. These evaluation results show higher accuracy than that achieved by many other state-of-the-art methods.
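
A heavily reduced sketch of a U-shaped classifier of the kind described above: a contracting path, an expanding path with a skip connection, and a dense classification head (e.g., the ten State Farm driver-behavior classes). The depth and layer sizes are illustrative, not the paper's.

```python
# Toy U-shaped encoder/decoder followed by a fully connected classification head.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class TinyUClassifier(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.enc1, self.enc2 = conv_block(3, 32), conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)       # takes upsampled + skip features
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(32, n_classes))

    def forward(self, x):
        e1 = self.enc1(x)                    # contracting path
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # expanding path with skip
        return self.head(d1)                 # logits over driver-behavior classes

logits = TinyUClassifier()(torch.randn(2, 3, 224, 224))
print(logits.shape)   # torch.Size([2, 10])
```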

17 pages, 4439 KiB  
Article
Exploring the ViDiDetect Tool for Automated Defect Detection in Manufacturing with Machine Vision
by Mateusz Dziubek, Jacek Rysiński and Daniel Jancarczyk
Appl. Sci. 2023, 13(19), 11098; https://doi.org/10.3390/app131911098 - 09 Oct 2023
Viewed by 1119
Abstract
Automated monitoring of cutting tool wear is of paramount importance in the manufacturing industry, as it directly impacts production efficiency and product quality. Traditional manual inspection methods are time-consuming and prone to human error, necessitating the adoption of more advanced techniques. This study explores the application of ViDiDetect, a deep learning-based defect detection solution, in the context of machine vision for assessing cutting tool wear. By capturing high-resolution images of machining tools and analyzing wear patterns, machine vision systems offer a non-contact and non-destructive approach to tool wear assessment, enabling continuous monitoring without disrupting the machining process. In this research, a smart camera and an illuminator were used to capture images of a car suspension knuckle's machined surface, with a focus on detecting burrs, chips, and tool wear. The study also employed a mask to narrow the region of interest and enhance classification accuracy. This investigation demonstrates the potential of machine vision and ViDiDetect in automating cutting tool wear assessment, ultimately enhancing the efficiency of manufacturing processes and product quality. The project is at the implementation stage in an automotive production plant in southern Poland.

15 pages, 1482 KiB  
Article
LightSeg: Local Spatial Perception Convolution for Real-Time Semantic Segmentation
by Xiaochun Lei, Jiaming Liang, Zhaoting Gong and Zetao Jiang
Appl. Sci. 2023, 13(14), 8130; https://doi.org/10.3390/app13148130 - 12 Jul 2023
Cited by 1 | Viewed by 1018
Abstract
Semantic segmentation is increasingly being applied on mobile devices due to advancements in mobile chipsets, particularly in low-power scenarios. However, the lightweight design of mobile devices imposes limitations on the receptive field, which is crucial for dense prediction problems. Existing approaches have attempted to balance lightweight design and high accuracy by downsampling features in the backbone. However, this downsampling may result in the loss of local detail at each network stage. To address this challenge, this paper presents LightSeg, a compact and efficient convolutional neural network (CNN) for real-time applications, built on our proposed local spatial perception convolution (LSPConv). The effectiveness of our architecture is demonstrated on the Cityscapes dataset. The results show that our model achieves an impressive balance between accuracy and inference speed. Specifically, LightSeg, which does not rely on ImageNet pretraining, achieves an mIoU of 76.1 at a speed of 61 FPS on the Cityscapes validation set, utilizing an RTX 2080Ti GPU with mixed precision. Additionally, it achieves a speed of 115.7 FPS on the Jetson NX with int8 precision.
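
The exact structure of LSPConv is not given in the abstract; the sketch below is a generic lightweight block in the same spirit (depthwise convolutions at several dilation rates to widen the receptive field cheaply), offered as an assumption rather than the paper's design.

```python
# Generic lightweight large-receptive-field block: parallel depthwise dilated
# convolutions summed with a residual path and mixed by a pointwise convolution.
import torch
import torch.nn as nn

class LightweightLocalPerception(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        self.dw = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d, groups=channels)
            for d in dilations                      # depthwise branches: cheap, wider context
        )
        self.pw = nn.Conv2d(channels, channels, 1)  # pointwise channel mixing

    def forward(self, x):
        return torch.relu(self.pw(sum(dw(x) for dw in self.dw) + x))

x = torch.randn(1, 48, 128, 256)
print(LightweightLocalPerception(48)(x).shape)  # torch.Size([1, 48, 128, 256])
```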

17 pages, 5162 KiB  
Article
Semantic Segmentation of Packaged and Unpackaged Fresh-Cut Apples Using Deep Learning
by Udith Krishnan Vadakkum Vadukkal, Michela Palumbo and Giovanni Attolico
Appl. Sci. 2023, 13(12), 6969; https://doi.org/10.3390/app13126969 - 09 Jun 2023
Cited by 2 | Viewed by 1550
Abstract
Computer vision systems are often used in industrial quality control to offer fast, objective, non-destructive, and contactless evaluation of fruit. The senescence of fresh-cut apples is strongly related to the browning of the pulp rather than to the properties of the peel. This work addresses the identification and selection of pulp in images of fresh-cut apples, both packaged and unpackaged; this is a critical step towards a computer vision system able to evaluate their quality and internal properties. A DeepLabV3+-based convolutional neural network (CNN) model has been developed for this semantic segmentation task. It has proved to be robust with respect to the similarity of colours between the peel and pulp. Its ability to separate the pulp from the peel and background has been verified on four varieties of apples: Granny Smith (greenish peel), Golden (yellowish peel), Fuji, and Pink Lady (reddish peel). The semantic segmentation achieved an accuracy greater than 99% on all these varieties. The developed approach was able to isolate regions significantly affected by the browning process on both packaged and unpackaged pieces; colour analysis of these regions will then be used to evaluate the internal quality and senescence of packaged and unpackaged products.
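
One way to set up a DeepLabV3+ model for the segmentation task the abstract implies, here via the segmentation_models_pytorch package; the backbone choice and the three-class label set (pulp, peel, background) are assumptions, and the paper's actual training setup may differ.

```python
# Minimal DeepLabV3+ setup sketch for pulp/peel/background segmentation.
import torch
import segmentation_models_pytorch as smp

model = smp.DeepLabV3Plus(
    encoder_name="resnet34",        # assumed backbone
    encoder_weights="imagenet",
    in_channels=3,
    classes=3,                      # pulp, peel, background (assumed label set)
)
x = torch.randn(2, 3, 512, 512)     # batch of fresh-cut apple images
masks = model(x).argmax(dim=1)      # per-pixel class prediction
print(masks.shape)                  # torch.Size([2, 512, 512])
```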

20 pages, 6660 KiB  
Article
Real-Time Defect Detection for Metal Components: A Fusion of Enhanced Canny–Devernay and YOLOv6 Algorithms
by Hongjun Wang, Xiujin Xu, Yuping Liu, Deda Lu, Bingqiang Liang and Yunchao Tang
Appl. Sci. 2023, 13(12), 6898; https://doi.org/10.3390/app13126898 - 07 Jun 2023
Cited by 2 | Viewed by 1941
Abstract
Due to the presence of numerous surface defects, the inadequate contrast between defective and non-defective regions, and the resemblance between noise and subtle defects, edge detection poses a significant challenge in dimensional error detection, leading to increased dimensional measurement inaccuracies. These issues are major bottlenecks in the automatic detection of high-precision metal parts. To address these challenges, this research proposes a combined approach that uses the YOLOv6 deep learning network for the rapid and accurate detection of surface flaws on metal lock body parts, together with an enhanced Canny–Devernay sub-pixel edge detection algorithm to determine the size of the lock core bead hole. The methodology is as follows: the dataset for surface defect detection is annotated using the labeling software labelImg and subsequently used to train the YOLOv6 model and obtain the model weights. For size measurement, the region of interest (ROI) corresponding to the lock cylinder bead hole is first extracted. Subsequently, Gaussian filtering is applied to the ROI, followed by sub-pixel edge detection using the improved Canny–Devernay algorithm. Finally, the edges are fitted using the least squares method to determine the radius of the fitted circle, and the measured value is obtained through size conversion. In the experiments, the YOLOv6 method identifies surface defects in the lock body workpiece with a mean Average Precision (mAP) of 0.911, and the size of the lock core bead hole is measured using the upgraded Canny–Devernay sub-pixel edge detection technique with an average error of less than 0.03 mm. The findings demonstrate a practical method for applying machine vision to the automatic detection of metal parts, developed by exploring identification methods and size measurement techniques for common defects found in metal parts. Consequently, the study establishes a valuable framework for effectively utilizing machine vision in metal parts inspection and defect detection.
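
The final measurement step fits a circle to the sub-pixel edge points by least squares; a minimal sketch of the standard algebraic (Kåsa) fit is shown below, with a made-up pixel-to-millimetre scale factor standing in for the system calibration.

```python
# Least-squares circle fit: solve x^2 + y^2 = 2*cx*x + 2*cy*y + c linearly,
# then recover the radius from r^2 = c + cx^2 + cy^2.
import numpy as np

def fit_circle_least_squares(pts):
    """pts: (N, 2) array of edge points; returns centre (cx, cy) and radius."""
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    b = x ** 2 + y ** 2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    r = np.sqrt(c + cx ** 2 + cy ** 2)
    return cx, cy, r

# Synthetic edge points on a circle of radius 85 px with sub-pixel noise.
theta = np.linspace(0, 2 * np.pi, 200)
pts = np.column_stack([120 + 85 * np.cos(theta), 140 + 85 * np.sin(theta)])
pts += np.random.normal(0, 0.05, pts.shape)

cx, cy, r_px = fit_circle_least_squares(pts)
mm_per_px = 0.02                       # hypothetical calibration factor
print(f"bead-hole radius ~ {r_px * mm_per_px:.3f} mm")
```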

14 pages, 4371 KiB  
Article
Lossless Compression of Large Aperture Static Imaging Spectrometer Data
by Lu Yu, Hongbo Li, Jing Li and Wei Li
Appl. Sci. 2023, 13(9), 5632; https://doi.org/10.3390/app13095632 - 03 May 2023
Viewed by 941
Abstract
The large-aperture static imaging spectrometer (LASIS) is an interference spectrometer with high device stability, high throughput, a wide spectral range, and high spectral resolution. One frame of the original data cube acquired by the LASIS shows the image superimposed with interference fringes, which is distinctly different from traditional hyperspectral images. For compression of this new type of data, a lossless compression scheme is presented that combines a novel data rearrangement method with the lossless multispectral and hyperspectral image compression standard CCSDS-123. In the rearrangement approach, the LASIS data cube is rearranged so that the interference information overlapped on the image can be separated, and the results are then processed using the CCSDS-123 standard. Several experiments are conducted to investigate the performance of the rearrangement method and examine the impact of different CCSDS-123 parameter settings for LASIS data. The experimental results indicate that the proposed scheme provides a 32.9% higher compression ratio than traditional rearrangement methods. Moreover, an adequate parameter combination for this compression scheme for LASIS is presented, yielding a 19.6% improvement over the default settings suggested by the standard.
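
A toy illustration of the kind of rearrangement such a scheme relies on: in a push-broom LASIS the scene advances across the fringe pattern frame by frame, so the interferogram of a ground line is scattered along a diagonal of the (frame, row) plane, and collecting those diagonals separates the interference axis from the spatial image before encoding. The one-row-per-frame geometry below is an assumption for illustration, not the paper's rearrangement.

```python
# Assumed push-broom geometry: ground line i appears in frame i+opd at row opd.
import numpy as np

def rearrange_lasis(cube):
    """cube: (n_frames, n_rows, n_cols) raw frames -> (n_lines, n_rows, n_cols),
    where axis 1 of the output is the optical-path-difference (OPD) axis."""
    n_frames, n_rows, n_cols = cube.shape
    n_lines = n_frames - n_rows + 1          # ground lines fully sampled in OPD
    out = np.empty((n_lines, n_rows, n_cols), dtype=cube.dtype)
    for opd in range(n_rows):
        out[:, opd, :] = cube[opd:opd + n_lines, opd, :]
    return out

raw = np.random.randint(0, 4096, size=(300, 64, 256), dtype=np.uint16)
print(rearrange_lasis(raw).shape)            # (237, 64, 256)
```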

18 pages, 13737 KiB  
Article
TSDNet: A New Multiscale Texture Surface Defect Detection Model
by Min Dong, Dezhen Li, Kaixiang Li and Junpeng Xu
Appl. Sci. 2023, 13(5), 3289; https://doi.org/10.3390/app13053289 - 04 Mar 2023
Viewed by 1635
Abstract
Industrial defect detection methods based on deep learning can reduce the cost of traditional manual quality inspection, improve the accuracy and efficiency of detection, and are widely used in industrial fields. Traditional computer vision defect detection methods rely on hand-crafted features and require a large amount of defect data, which limits their applicability. This paper proposes TSDNet, a texture surface defect detection method based on convolutional neural networks and wavelet analysis. The approach combines wavelet analysis with patch extraction, which can detect and locate many defects against complex texture backgrounds; a patch extraction method based on random windows is proposed, which can quickly and effectively extract defective patches; and a judgment strategy based on a sliding window is proposed to improve the robustness of the CNN (see the sketch below). Our method achieves excellent detection accuracy on DAGM 2007, a micro-surface defect database and KolektorSDD dataset, and can locate defects accurately. The results show that, against complex texture backgrounds, the method obtains high defect detection accuracy with only a small amount of training data and can accurately locate defect positions.
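
A compact sketch of the two sampling strategies mentioned above: random windows to harvest training patches, and a sliding window whose per-patch scores are aggregated at test time. The patch size, stride, and max-score voting rule are illustrative choices, not the paper's exact settings.

```python
# Random-window patch extraction for training and sliding-window scoring for inference.
import numpy as np

def random_patches(image, n_patches=16, size=64, rng=None):
    rng = rng or np.random.default_rng()
    H, W = image.shape[:2]
    for _ in range(n_patches):
        y, x = rng.integers(0, H - size), rng.integers(0, W - size)
        yield image[y:y + size, x:x + size]

def sliding_window_defect_score(image, score_fn, size=64, stride=32):
    """Aggregate per-patch defect scores; flag a defect if any window is confident."""
    H, W = image.shape[:2]
    scores = [score_fn(image[y:y + size, x:x + size])
              for y in range(0, H - size + 1, stride)
              for x in range(0, W - size + 1, stride)]
    return max(scores)

texture = np.random.rand(512, 512).astype(np.float32)
dummy_cnn = lambda patch: float(patch.mean())        # stand-in for the trained CNN
print(len(list(random_patches(texture))), sliding_window_defect_score(texture, dummy_cnn))
```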

20 pages, 8014 KiB  
Article
Research on Automatic Error Data Recognition Method for Structured Light System Based on Residual Neural Network
by Aozhuo Ding, Qi Xue, Xulong Ding, Xiaohong Sun, Xiaonan Yang and Huiying Ye
Appl. Sci. 2023, 13(5), 2920; https://doi.org/10.3390/app13052920 - 24 Feb 2023
Viewed by 1122
Abstract
In a structured light system, the positioning accuracy of the stripe is one of the determinants of measurement accuracy. However, the quality of the structured light stripe is reduced by noise, object shape, color, and other factors. The positioning accuracy of low-quality stripe centers decreases, and large errors are introduced into the measurement results, which previously could only be recognized by a human. To address this problem, this paper proposes a method to identify data with relatively large errors in 3D measurement results by evaluating the quality of the grayscale distribution of the stripes. In this method, undegraded and degraded stripe images are captured. Then, a residual neural network is trained on the grayscale distributions of the two types of stripes. The captured stripes are classified by the trained model. Finally, the data corresponding to degraded stripes, and thus to large errors, can be identified according to the classification results. The experiments show that the algorithm proposed in this paper can effectively and automatically identify data with large errors.
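
A minimal sketch of the classification step: a small residual network that labels the grey-level distribution of a stripe cross-section as undegraded or degraded, so that measurements from degraded stripes can be flagged. The 1-D profile input, profile length, and network depth are assumptions for illustration, not the paper's configuration.

```python
# Tiny 1-D residual classifier over stripe grey-level cross-sections.
import torch
import torch.nn as nn

class ResBlock1d(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv1d(ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv1d(ch, ch, 3, padding=1))

    def forward(self, x):
        return torch.relu(x + self.body(x))   # residual connection

classifier = nn.Sequential(
    nn.Conv1d(1, 16, 3, padding=1), nn.ReLU(),
    ResBlock1d(16), ResBlock1d(16),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(16, 2),                      # undegraded vs. degraded stripe
)
profiles = torch.randn(8, 1, 64)           # batch of grey-level cross-sections
print(classifier(profiles).shape)          # torch.Size([8, 2])
```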

Review


17 pages, 1044 KiB  
Review
Quick Overview of Face Swap Deep Fakes
by Tomasz Walczyna and Zbigniew Piotrowski
Appl. Sci. 2023, 13(11), 6711; https://doi.org/10.3390/app13116711 - 31 May 2023
Cited by 2 | Viewed by 7128
Abstract
Deep fake generation and detection technologies have developed rapidly in recent years, with researchers in the two fields constantly trying to outpace each other's achievements. These works use, among other methods, autoencoders and generative adversarial networks to create fake content that resists detection by algorithms or the human eye. Among the ever-increasing number of emerging works, a few can be singled out that contribute significantly to the field through their solutions and detection robustness. Despite the advancement of generative algorithms, much in the field is still open for further research. This paper briefly introduces the fundamentals of some of the latest Face Swap Deep Fake algorithms.
