
Object Detection Based on Vision Sensors and Neural Network

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 31 August 2024

Special Issue Editors

Dr. Man Qi
Guest Editor
Senior Lecturer in Computing, Canterbury Christ Church University, Canterbury, UK
Interests: Internet of Things; cyber security; intelligent computing and applications; HCI

Dr. Matteo Dunnhofer
Guest Editor
Machine Learning and Perception Lab, University of Udine, Via delle Scienze, 206, 33100 Udine, Italy
Interests: ADAS development; driver’s stopping behavior; computer vision; machine learning; deep learning; data modelling

Special Issue Information

Dear Colleagues,

Object detection has long been a research hotspot in computer vision. It is now attracting growing attention from both the research community and industry, driven by the rapid development and deployment of enabling technologies such as deep neural networks (DNNs) and high-resolution vision sensors. The past few years have witnessed many successful, high-performing applications of DNNs. However, mainstream DNNs tend to be ever more computationally complex, deeper in structure, and trained on ever-larger datasets. This places a barrier in the way of deploying such data- and compute-intensive DNNs on resource-limited vision sensors for applications such as object detection, especially when results are needed in a timely manner.

This Special Issue looks at object detection from another angle, soliciting state-of-the-art research that enables object detection in a more lightweight fashion, taking into account the resource constraints of vision sensors. Topics of interest include, but are not limited to, the following:

  • Computation-efficient lightweight DNNs;
  • Object detection in data streams;
  • One-shot object detection;
  • Object detection on the move;
  • Edge computing in support of object detection on sensors;
  • Neural network compression techniques;
  • Federated learning for object detection;
  • Bio-inspired sensing technologies;
  • Real-time object detection techniques;
  • New object representation techniques;
  • Swarm learning for object detection in a collective manner;
  • High-performance sensing systems.

Dr. Man Qi
Dr. Matteo Dunnhofer
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • object detection
  • vision sensing
  • lightweight neural network

Published Papers (9 papers)


Research

19 pages, 2306 KiB  
Article
Enhanced Knowledge Distillation for Advanced Recognition of Chinese Herbal Medicine
by Lu Zheng, Wenhan Long, Junchao Yi, Lu Liu and Ke Xu
Sensors 2024, 24(5), 1559; https://doi.org/10.3390/s24051559 - 28 Feb 2024
Abstract
The identification and classification of traditional Chinese herbal medicines demand significant time and expertise. We propose the dual-teacher supervised decay (DTSD) approach, an enhancement for Chinese herbal medicine recognition utilizing a refined knowledge distillation model. The DTSD method refines the output soft labels, adapts the attenuation parameters, and employs a dynamic combination loss in the teacher model. Implemented on the lightweight MobileNet_v3 network, the methodology has been deployed successfully in a mobile application. Experimental results reveal that incorporating an exponential warmup learning rate reduction strategy during training optimizes the knowledge distillation model, achieving an average classification accuracy of 98.60% for 10 types of Chinese herbal medicine images. The model achieves an average detection time of 0.0172 s per image, with a compressed size of 10 MB. Comparative experiments demonstrate the superior performance of the refined model over DenseNet121, ResNet50_vd, Xception65, and EfficientNetB1. The refined model not only introduces an approach to Chinese herbal medicine image recognition but also provides a practical solution for lightweight models in mobile applications.
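
The abstract lists the ingredients of DTSD (two teachers, refined soft labels, an attenuation parameter, a combination loss) without giving formulas. As a rough illustration, the PyTorch sketch below blends two teachers' temperature-softened outputs into a single soft target and attenuates the distillation term; the function name, temperature T, mixing weight alpha, and decay factor are our assumptions, not the paper's actual DTSD.

```python
import torch
import torch.nn.functional as F

def dual_teacher_distill_loss(student_logits, t1_logits, t2_logits, labels,
                              T=4.0, alpha=0.5, decay=1.0):
    # Hard-label loss against the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    # Blend the two teachers' temperature-softened outputs into one
    # soft-label target (a stand-in for DTSD's refined soft labels).
    soft_target = (alpha * F.softmax(t1_logits / T, dim=1)
                   + (1 - alpha) * F.softmax(t2_logits / T, dim=1))
    # KL divergence between the student and the blended teacher distribution.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    soft_target, reduction="batchmean") * (T * T)
    # `decay` imitates an attenuation parameter that down-weights the
    # distillation term as training progresses (assumption).
    return hard + decay * soft
```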

20 pages, 868 KiB  
Article
Simple Conditional Spatial Query Mask Deformable Detection Transformer: A Detection Approach for Multi-Style Strokes of Chinese Characters
by Tian Zhou, Wu Xie, Huimin Zhang and Yong Fan
Sensors 2024, 24(3), 931; https://doi.org/10.3390/s24030931 - 31 Jan 2024
Abstract
In the Chinese character writing task performed by robotic arms, stroke category and position information must be extracted through object detection. Detection algorithms based on predefined anchor frames have difficulty accommodating the differences among the many styles of Chinese character strokes. Deformable detection transformer (deformable DETR) algorithms, which use no predefined anchor frames, instead suffer from invalid sampling points that contribute nothing to the feature update of the current reference point, owing to the random sampling in the deformable attention module; this slows the rate at which the query vectors learn stroke features in the detection head. In view of this problem, a new detection method for multi-style strokes of Chinese characters, called the simple conditional spatial query mask deformable DETR (SCSQ-MDD), is proposed in this paper. Firstly, a mask prediction layer is jointly determined using the shallow feature map of the Chinese character image and the query vector of the transformer encoder; it is used to filter the points with actual contributions and to resample the points without contributions, addressing the randomness of the correlation calculation among the reference points. Secondly, by separating the content query and spatial query of the transformer decoder, the dependence of the prediction task on the content embedding is relaxed. Finally, a detection model without predefined anchor frames based on the SCSQ-MDD is constructed. Experiments are conducted using a multi-style Chinese character stroke dataset to evaluate the performance of the SCSQ-MDD. Compared with the deformable DETR, the mean average precision (mAP) improves by 3.8% and the mean average recall (mAR) by 1.1% in the testing stage, illustrating the effectiveness of the proposed method.
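
The mask prediction layer is described only at a high level; one plausible reading is that it yields a per-point validity mask that suppresses non-contributing sampling points before the attention weights are renormalized. The sketch below illustrates just that filtering step in PyTorch; it is our simplification, not the SCSQ-MDD architecture.

```python
import torch

def mask_filtered_weights(attn_weights, validity_mask, eps=1e-6):
    # attn_weights: (batch, queries, points) deformable-attention weights
    # validity_mask: same shape; 1.0 marks sampling points predicted useful
    w = attn_weights * validity_mask                 # suppress invalid points
    return w / (w.sum(dim=-1, keepdim=True) + eps)   # renormalize the rest

# Toy usage: 2 queries, 4 sampling points each; one point masked per query.
w = torch.softmax(torch.randn(1, 2, 4), dim=-1)
m = torch.tensor([[[1., 1., 0., 1.], [1., 0., 1., 1.]]])
print(mask_filtered_weights(w, m))
```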

21 pages, 8098 KiB  
Article
The Impact of Noise and Brightness on Object Detection Methods
by José A. Rodríguez-Rodríguez, Ezequiel López-Rubio, Juan A. Ángel-Ruiz and Miguel A. Molina-Cabello
Sensors 2024, 24(3), 821; https://doi.org/10.3390/s24030821 - 26 Jan 2024
Abstract
The application of deep learning to image and video processing has become increasingly popular. Employing well-known pre-trained neural networks for detecting and classifying objects in images is beneficial in a wide range of application fields. However, diverse impediments may degrade the performance achieved by these neural networks. In particular, Gaussian noise and brightness alterations, among others, may be present in images as sensor noise due to the limitations of image acquisition devices. In this work, we study the effect of the most representative noise types and brightness alterations on the performance of several state-of-the-art object detectors, such as YOLO and Faster R-CNN. Different experiments were carried out, and the results demonstrate how these adversities deteriorate detector performance. Moreover, it is found that the size of the objects to be detected is a factor that, together with noise and brightness, has a considerable impact on performance.
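
The corruptions studied here are straightforward to reproduce. A minimal NumPy sketch, assuming 8-bit images, is shown below; sweeping sigma and delta over increasing severities and re-evaluating a detector on each corrupted copy reproduces the general shape of such an experiment.

```python
import numpy as np

def add_gaussian_noise(img, sigma=10.0, rng=None):
    # Additive zero-mean Gaussian noise, clipped back to the valid pixel range.
    rng = rng or np.random.default_rng()
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def shift_brightness(img, delta=40):
    # Uniform brightness offset; a negative delta darkens the image.
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

# e.g. for sigma in (5, 10, 20, 40): evaluate the detector on
# add_gaussian_noise(img, sigma) and record mAP per severity level.
```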

15 pages, 4821 KiB  
Article
Lightweight Detection Methods for Insulator Self-Explosion Defects
by Yanping Chen, Chong Deng, Qiang Sun, Zhize Wu, Le Zou, Guanhong Zhang and Wenbo Li
Sensors 2024, 24(1), 290; https://doi.org/10.3390/s24010290 - 03 Jan 2024
Abstract
The accurate and efficient detection of defective insulators is an essential prerequisite for ensuring the safety of the power grid in the new generation of intelligent electrical system inspections. Currently, traditional object detection algorithms for detecting defective insulators in images face issues such as excessive parameter size, low accuracy, and slow detection speed. To address these issues, this article proposes an insulator defect detection model based on a lightweight Faster R-CNN (Faster Region-based Convolutional Network) model (Faster R-CNN-tiny). First, the Faster R-CNN backbone is made lightweight by substituting EfficientNet for ResNet (Residual Network), greatly decreasing the model parameters while increasing detection accuracy. Second, a feature pyramid is employed to build feature maps at various resolutions for feature fusion, enabling the detection of objects at various scales. In addition, replacing ordinary convolutions in the network with more efficient depth-wise separable convolutions increases detection speed while slightly reducing detection accuracy. Transfer learning is introduced, and a training method involving freezing and unfreezing the model is employed to enhance the network's ability to detect small target defects. The proposed model is validated using the insulator self-explosion defect dataset. The experimental results show that Faster R-CNN-tiny significantly outperforms the Faster R-CNN (ResNet) model in terms of mean average precision (mAP), frames per second (FPS), and number of parameters.
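
A depth-wise separable convolution is a standard, well-documented building block, so its structure can be shown concretely. Below is a minimal PyTorch version; the layer names and the BN/ReLU arrangement are conventional choices, not necessarily those of Faster R-CNN-tiny.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """k x k depthwise conv (one filter per channel) followed by a 1x1
    pointwise conv, cutting parameters and FLOPs versus a standard conv."""
    def __init__(self, in_ch, out_ch, k=3, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, stride,
                                   padding=k // 2, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# A 3x3 standard conv with 256 in/out channels has 256*256*9 ≈ 590k weights;
# the separable version above has 256*9 + 256*256 ≈ 68k.
print(DepthwiseSeparableConv(256, 256)(torch.randn(1, 256, 32, 32)).shape)
```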

13 pages, 4703 KiB  
Article
Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection
by Shuaihui Wang, Fengyi Jiang and Boqian Xu
Sensors 2023, 23(21), 8802; https://doi.org/10.3390/s23218802 - 29 Oct 2023
Abstract
Salient object detection (SOD), which identifies the most distinctive objects in a given scene, plays an important role in computer vision tasks. Most existing RGB-D SOD methods employ a CNN-based network as the backbone to extract features from RGB and depth images; however, the inherent locality of a CNN limits the performance of such methods. To tackle this issue, we propose a novel Swin Transformer-based edge guidance network (SwinEGNet) for RGB-D SOD, in which the Swin Transformer is employed as a powerful feature extractor to capture the global context. An edge-guided cross-modal interaction module is proposed to effectively enhance and fuse features. In particular, we employ the Swin Transformer as the backbone to extract features from RGB images and depth maps. We then introduce an edge extraction module (EEM) to extract edge features and a depth enhancement module (DEM) to enhance depth features. Additionally, a cross-modal interaction module (CIM) is used to integrate cross-modal features from global and local contexts. Finally, we employ a cascaded decoder to refine the prediction map in a coarse-to-fine manner. Extensive experiments demonstrate that SwinEGNet achieves the best performance on the LFSD, NLPR, DES, and NJU2K datasets and comparable performance on the STEREO dataset relative to 14 state-of-the-art methods. Our model outperforms SwinNet while using 88.4% of its parameters and 77.2% of its FLOPs. Our code will be made publicly available.
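
The EEM's internals are not given in the abstract. As a loose stand-in for edge-feature extraction, the sketch below computes a fixed Sobel gradient magnitude in PyTorch; a learned module would replace these fixed kernels.

```python
import torch
import torch.nn.functional as F

def sobel_edges(gray):
    # gray: (B, 1, H, W) tensor; returns the per-pixel gradient magnitude.
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    k = torch.stack([kx, kx.t()]).unsqueeze(1)   # (2, 1, 3, 3): Gx and Gy
    g = F.conv2d(gray, k.to(gray.dtype), padding=1)
    return g.pow(2).sum(dim=1, keepdim=True).sqrt()

edges = sobel_edges(torch.rand(1, 1, 64, 64))    # shape (1, 1, 64, 64)
```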

13 pages, 3783 KiB  
Article
A Novel Approach for Apple Freshness Prediction Based on Gas Sensor Array and Optimized Neural Network
by Wei Wang, Weizhen Yang, Maozhen Li, Zipeng Zhang and Wenbin Du
Sensors 2023, 23(14), 6476; https://doi.org/10.3390/s23146476 - 17 Jul 2023
Abstract
The apple is an important cash crop in China, and predicting its freshness can effectively reduce storage risk and avoid economic loss. Changes in the concentrations of odor compounds such as ethylene, carbon dioxide, and ethanol emitted during apple storage are important features for characterizing apple freshness. To accurately predict the freshness level of apples, an electronic nose system based on a gas sensor array and a wireless transmission module is designed, and a neural network prediction model is proposed that uses a sparrow search algorithm (SSA), improved with a Tent chaotic sequence, to optimize a back-propagation (BP) network. The odor information emitted by apples is studied to complete the apple freshness prediction. Furthermore, by fitting the relationship between the prediction coefficient and the input vector, an accuracy benchmark for the prediction model is set, which further improves the prediction accuracy on apple odor information. Compared with traditional prediction methods, the system is simple to operate, low in cost, reliable, and portable, and it avoids damaging apples during freshness prediction, thereby realizing non-destructive testing.
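
Tent-chaotic initialization of a population is a common, well-defined recipe, even though the paper's exact SSA variant is not spelled out here. A minimal NumPy sketch, with illustrative bounds and parameters, could look as follows.

```python
import numpy as np

def tent_chaotic_population(n_agents, dim, low, high, beta=0.7, z0=0.37):
    # Iterate the Tent map: z <- z/beta if z < beta, else (1-z)/(1-beta),
    # scaling each chaotic value into the search bounds [low, high].
    pop = np.empty((n_agents, dim))
    z = z0
    for i in range(n_agents):
        for j in range(dim):
            z = z / beta if z < beta else (1.0 - z) / (1.0 - beta)
            pop[i, j] = low + z * (high - low)
    return pop

# e.g. initial candidate weight vectors for the BP network, bounded in [-1, 1]:
init = tent_chaotic_population(n_agents=30, dim=50, low=-1.0, high=1.0)
```

Compared with uniform random initialization, the chaotic sequence spreads the initial sparrows more evenly over the search space, which is the usual motivation for this scheme.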

28 pages, 3115 KiB  
Article
CoSOV1Net: A Cone- and Spatial-Opponent Primary Visual Cortex-Inspired Neural Network for Lightweight Salient Object Detection
by Didier Ndayikengurukiye and Max Mignotte
Sensors 2023, 23(14), 6450; https://doi.org/10.3390/s23146450 - 17 Jul 2023
Abstract
Salient object-detection models attempt to mimic the human visual system's ability to select relevant objects in images. To this end, the development of deep neural networks on high-end computers has recently achieved high performance. However, developing deep neural network models with the same performance for resource-limited vision sensors or mobile devices remains a challenge. In this work, we propose CoSOV1Net, a novel lightweight salient object-detection neural network model inspired by the cone- and spatial-opponent processes of the primary visual cortex (V1), which inextricably link color and shape in human color perception. Our proposed model is trained from scratch, without using backbones from image classification or other tasks. Experiments on the most widely used and challenging datasets for salient object detection show that CoSOV1Net achieves competitive performance (e.g., Fβ = 0.931 on the ECSSD dataset) relative to state-of-the-art salient object-detection models, while having few parameters (1.14 M), low FLOPS (1.4 G), and high FPS (211.2) on a GPU (Nvidia GeForce RTX 3090 Ti) compared with the state of the art in lightweight and non-lightweight salient object detection. CoSOV1Net is thus a lightweight salient object-detection model that can be adapted to mobile environments and resource-constrained devices.
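
The cone-opponent idea can be illustrated with the classic opponent-color transform, although CoSOV1Net's actual layers are learned rather than fixed. A toy NumPy version:

```python
import numpy as np

def opponent_channels(rgb):
    # rgb: (H, W, 3) array in [0, 1].
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = r - g                     # red-green opponency
    by = b - (r + g) / 2.0         # blue-yellow opponency
    lum = (r + g + b) / 3.0        # achromatic (luminance) channel
    return np.stack([rg, by, lum], axis=-1)
```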

17 pages, 41360 KiB  
Article
Traversable Region Detection and Tracking for a Sparse 3D Laser Scanner for Off-Road Environments Using Range Images
by Jhonghyun An
Sensors 2023, 23(13), 5898; https://doi.org/10.3390/s23135898 - 25 Jun 2023
Cited by 1
Abstract
This study proposes a method for detecting and tracking traversable regions in off-road conditions for unmanned ground vehicles (UGVs). Off-road conditions, such as rough terrain or fields, present significant challenges for UGV navigation, and detecting and tracking traversable regions is essential to ensure safe and efficient operation. Using a 3D laser scanner and a range-image-based approach, a method is proposed for detecting traversable regions under off-road conditions; this is followed by a Bayesian fusion algorithm for tracking the traversable regions across consecutive frames. The range-image-based traversable-region-detection approach enables efficient processing of point cloud data from a 3D laser scanner, allowing the identification of traversable areas that are safe for the UGV to drive on. The effectiveness of the proposed method was demonstrated using real-world data collected during UGV operations on rough terrain, highlighting its potential for improving UGV navigation capabilities in challenging environments.
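
Projecting a LiDAR point cloud into a range image is the well-known preprocessing step this pipeline relies on. The NumPy sketch below does a spherical projection; the resolution and vertical field of view are illustrative assumptions, not the paper's sensor parameters.

```python
import numpy as np

def to_range_image(points, h=64, w=1024, v_fov=(-25.0, 3.0)):
    # points: (N, 3) array of x, y, z in the sensor frame.
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                          # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-6))      # elevation angle
    u = ((yaw + np.pi) / (2 * np.pi) * w).astype(int) % w
    lo, hi = np.radians(v_fov[0]), np.radians(v_fov[1])
    v = ((hi - pitch) / (hi - lo) * h).astype(int)
    img = np.full((h, w), np.inf)
    ok = (v >= 0) & (v < h)
    np.minimum.at(img, (v[ok], u[ok]), r[ok])       # keep the nearest return
    return img
```

Traversability cues (ground slope, height discontinuities) can then be computed per pixel over this compact 2D grid instead of over the raw, sparse point cloud.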

14 pages, 2189 KiB  
Article
DenseTextPVT: Pyramid Vision Transformer with Deep Multi-Scale Feature Refinement Network for Dense Text Detection
by My-Tham Dinh, Deok-Jai Choi and Guee-Sang Lee
Sensors 2023, 23(13), 5889; https://doi.org/10.3390/s23135889 - 25 Jun 2023
Abstract
Detecting dense text in scene images is a challenging task due to the high variability, complexity, and overlap of text areas. To adequately distinguish text instances with high density in scenes, we propose an efficient approach called DenseTextPVT. We first generate high-resolution features at different levels to enable accurate dense text detection, which is essential for dense prediction tasks. Additionally, to enhance the feature representation, we design a Deep Multi-Scale Feature Refinement Network (DMFRN), which effectively detects texts of varying sizes, shapes, and fonts, including small-scale texts. DenseTextPVT then draws on the pixel aggregation (PA) similarity-vector algorithm to cluster text pixels into the correct text kernels in the post-processing step. In this way, our proposed method enhances the precision of text detection and effectively reduces overlap between text regions in dense, adjacent text in natural images. Comprehensive experiments indicate the effectiveness of our method on the TotalText, CTW1500, and ICDAR-2015 benchmark datasets in comparison with existing methods.
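
Pixel aggregation is described in the PA literature as clustering text pixels toward kernel embeddings. The following NumPy sketch captures that post-processing idea in its simplest form (nearest kernel mean under a distance threshold); the actual PA algorithm grows regions iteratively and differs in detail.

```python
import numpy as np

def aggregate_pixels(sim_vecs, text_mask, kernel_labels, dist_thr=0.8):
    # sim_vecs: (H, W, D) per-pixel similarity vectors
    # text_mask: (H, W) bool, pixels predicted as text
    # kernel_labels: (H, W) int, 0 = background, k > 0 = kernel id
    out = np.zeros_like(kernel_labels)
    kernel_ids = [k for k in np.unique(kernel_labels) if k > 0]
    # Mean similarity vector of each shrunken text kernel.
    means = {k: sim_vecs[kernel_labels == k].mean(axis=0) for k in kernel_ids}
    ys, xs = np.nonzero(text_mask)
    for y, x in zip(ys, xs):
        dists = {k: np.linalg.norm(sim_vecs[y, x] - m) for k, m in means.items()}
        if dists:
            k = min(dists, key=dists.get)
            if dists[k] < dist_thr:   # assign only if close enough to a kernel
                out[y, x] = k
    return out
```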
