Deep Learning for Object Detection, Classification and Tracking in Industry Applications

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (31 May 2021) | Viewed by 29585

Special Issue Editors


Guest Editor
Imaging and Computer Vision Research Group, Data61, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW 2122, Australia
Interests: image analysis; computer vision; machine learning; artificial intelligence

Guest Editor
Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #21-01, Connexis (South Tower), Singapore 138632, Singapore
Interests: computer vision; biometrics; autonomous vehicles

Guest Editor
Collaborative Innovation Center of Steel Technology, University of Science and Technology Beijing, Beijing 100083, China
Interests: machine learning; machine vision; surface inspection

Special Issue Information

Dear Colleagues,

With the advancement of computing hardware and imaging sensors, images and videos are becoming ubiquitous, and deep learning-powered computer vision has become the core of the artificial intelligence (AI) revolution. While this is changing the way we see the world, many challenges remain in real-world applications, such as object detection, classification, and tracking under challenging conditions, the availability of labeled data, and uncertainty quantification in deep learning. Researchers are looking into new technologies to address these challenges, such as deep active learning, synthetic data, and edge AI. As a result, applications of deep learning-equipped computer vision have grown exponentially across different industries.

This Special Issue calls for innovative solutions addressing the challenges in solving real-world problems using advanced machine learning and computer vision technologies. The topics of interest include, but are not limited to:

  • Deep learning for object detection, classification, and tracking
  • Object detection under challenging conditions such as weather, illumination, and background
  • Image classification
  • Image segmentation
  • Scene understanding
  • 3D computer vision
  • Biomedical imaging
  • Deep video analytics
  • Multispectral and Hyperspectral Imaging
  • Image and video synthesis
  • Active learning
  • Multiview learning
  • Federated learning
  • Edge AI

Dr. Dadong Wang
Dr. Jian-Gang Wang
Prof. Dr. Ke Xu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • artificial intelligence
  • computer vision
  • image analysis

Published Papers (9 papers)


Editorial

3 pages, 155 KiB  
Editorial
Deep Learning for Object Detection, Classification and Tracking in Industry Applications
by Dadong Wang, Jian-Gang Wang and Ke Xu
Sensors 2021, 21(21), 7349; https://doi.org/10.3390/s21217349 - 05 Nov 2021
Cited by 16 | Viewed by 2948
Abstract
Object detection, classification and tracking are three important computer vision techniques [...] Full article

Research

12 pages, 2734 KiB  
Communication
Multi-Directional Scene Text Detection Based on Improved YOLOv3
by Liyun Xiao, Peng Zhou, Ke Xu and Xiaofang Zhao
Sensors 2021, 21(14), 4870; https://doi.org/10.3390/s21144870 - 16 Jul 2021
Cited by 11 | Viewed by 2104
Abstract
To address the problem of low detection rate caused by the close alignment and multi-directional position of text words in practical application and the need to improve the detection speed of the algorithm, this paper proposes a multi-directional text detection algorithm based on improved YOLOv3, and applies it to natural text detection. To detect text in multiple directions, this paper introduces a method of box definition based on sliding vertices. Then, a new rotating box loss function MD-Closs based on CIOU is proposed to improve the detection accuracy. In addition, a step-by-step NMS method is used to further reduce the amount of calculation. Experimental results show that on the ICDAR 2015 data set, the accuracy rate is 86.2%, the recall rate is 81.9%, and the timeliness is 21.3 fps, which shows that the proposed algorithm has a good detection effect on text detection in natural scenes. Full article
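The abstract above mentions an NMS step for pruning overlapping detections. As a rough illustration only, the following is the standard greedy non-maximum suppression on axis-aligned boxes; it is not the paper's step-by-step variant for rotated boxes, and the function names `iou` and `nms` as well as the 0.5 threshold are assumptions chosen for this sketch.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is dropped
```

A rotated-box detector would need a polygon-based overlap measure in place of `iou`; the greedy loop itself would stay the same.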

13 pages, 4627 KiB  
Communication
An Improved Character Recognition Framework for Containers Based on DETR Algorithm
by Xiaofang Zhao, Peng Zhou, Ke Xu and Liyun Xiao
Sensors 2021, 21(13), 4612; https://doi.org/10.3390/s21134612 - 05 Jul 2021
Cited by 6 | Viewed by 2656
Abstract
An improved DETR (detection with transformers) object detection framework is proposed to realize accurate detection and recognition of characters on shipping containers. ResNeSt is used as a backbone network with split attention to extract features of different dimensions by multi-channel weight convolution operation, thus increasing the overall feature acquisition ability of the backbone. In addition, multi-scale location encoding is introduced on the basis of the original sinusoidal position encoding model, improving the sensitivity of input position information for the transformer structure. Compared with the original DETR framework, our model has higher confidence regarding accurate detection, with detection accuracy being improved by 2.6%. In a test of character detection and recognition with a self-built dataset, the overall accuracy can reach 98.6%, which meets the requirements of logistics information identification acquisition. Full article

13 pages, 2184 KiB  
Article
Interp-SUM: Unsupervised Video Summarization with Piecewise Linear Interpolation
by Ui-Nyoung Yoon, Myung-Duk Hong and Geun-Sik Jo
Sensors 2021, 21(13), 4562; https://doi.org/10.3390/s21134562 - 02 Jul 2021
Cited by 16 | Viewed by 2519
Abstract
This paper addresses the problem of unsupervised video summarization. Video summarization helps people browse large-scale videos easily with a summary from the selected frames of the video. In this paper, we propose an unsupervised video summarization method with piecewise linear interpolation (Interp-SUM). Our method aims to improve summarization performance and generate a natural sequence of keyframes by predicting the importance score of each frame using the interpolation method. To train the video summarization network, we exploit a reinforcement learning-based framework with an explicit reward function. We employ the objective function of the exploring under-appreciated reward method for efficient training. In addition, we present a modified reconstruction loss to promote the representativeness of the summary. We evaluate the proposed method on two datasets, SumMe and TVSum. The experimental results show that Interp-SUM generates a more natural sequence of summary frames than any other state-of-the-art method. In addition, Interp-SUM shows comparable performance with state-of-the-art research on unsupervised video summarization methods, which is shown and analyzed in the experiments of this paper. Full article
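To make the idea of piecewise linear interpolation of importance scores concrete, here is a minimal sketch that stretches a short list of candidate scores to one score per frame. The function name `interpolate_scores` and the assumption that the candidates are evenly spaced over the video are illustrative choices, not details taken from the paper.

```python
def interpolate_scores(candidates, num_frames):
    """Stretch a short list of scores to num_frames values.

    Candidates are assumed to be evenly spaced over the video (an
    assumption of this sketch); values in between are filled in by
    linear interpolation between the two nearest candidates.
    """
    if num_frames <= 0 or not candidates:
        return []
    if len(candidates) == 1 or num_frames == 1:
        return [float(candidates[0])] * num_frames
    last = len(candidates) - 1
    out = []
    for t in range(num_frames):
        pos = t * last / (num_frames - 1)  # position on the candidate axis
        lo = min(int(pos), last - 1)       # index of the left neighbour
        frac = pos - lo                    # distance from the left neighbour
        out.append(candidates[lo] * (1 - frac) + candidates[lo + 1] * frac)
    return out


frame_scores = interpolate_scores([0.0, 1.0, 0.0], 5)
```

Because neighbouring frames receive smoothly varying scores, a keyframe selector working on the interpolated values tends to pick contiguous runs of frames rather than isolated ones.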

27 pages, 11887 KiB  
Article
SAVSDN: A Scene-Aware Video Spark Detection Network for Aero Engine Intelligent Test
by Jie Kou, Xinman Zhang, Yuxuan Huang and Cong Zhang
Sensors 2021, 21(13), 4453; https://doi.org/10.3390/s21134453 - 29 Jun 2021
Cited by 1 | Viewed by 1673
Abstract
Due to carbon deposits, lean flames, or damaged metal parts, sparks can occur in aero engine chambers. At present, the detection of such sparks deeply depends on laborious manual work. Considering that interference has the same features as sparks, almost all existing object detectors cannot replace humans in carrying out high-precision spark detection. In this paper, we propose a scene-aware spark detection network, consisting of an information fusion-based cascading video codec-image object detector structure, which we name SAVSDN. Unlike video object detectors utilizing candidate boxes from adjacent frames to assist in the current prediction, we find that efforts should be made to extract the spatio-temporal features of adjacent frames to reduce over-detection. Visualization experiments show that SAVSDN can learn the difference in spatio-temporal features between sparks and interference. To solve the problem of a lack of aero engine anomalous spark data, we introduce a method to generate simulated spark images based on the Gaussian function. In addition, we publish the first simulated aero engine spark data set, which we name SAES. In our experiments, SAVSDN far outperformed state-of-the-art detection models for spark detection in terms of five metrics. Full article
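The abstract says simulated spark images are generated from a Gaussian function but gives no further detail. Purely as an illustration of that general idea, the sketch below renders a single bright blob whose intensity falls off as a 2D isotropic Gaussian; the function name `gaussian_blob`, its parameters, and the isotropic shape are all assumptions and may differ from the authors' procedure.

```python
import math


def gaussian_blob(height, width, cy, cx, sigma, peak=1.0):
    """Return a height x width grid with a Gaussian bright spot at (cy, cx).

    Intensity at (y, x) is peak * exp(-((y - cy)^2 + (x - cx)^2) / (2 * sigma^2)).
    """
    return [
        [
            peak * math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


spark = gaussian_blob(5, 5, cy=2, cx=2, sigma=1.0)
```

In a data-generation pipeline, such a blob could be added to a background frame at a random position and scale to create a labeled positive sample.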

14 pages, 2506 KiB  
Article
A Sawn Timber Tree Species Recognition Method Based on AM-SPPResNet
by Fenglong Ding, Ying Liu, Zilong Zhuang and Zhengguang Wang
Sensors 2021, 21(11), 3699; https://doi.org/10.3390/s21113699 - 26 May 2021
Cited by 7 | Viewed by 2224
Abstract
Sawn timber is an important component material in furniture manufacturing, decoration, construction and other industries. The mechanical properties, surface colors, textures, uses and other properties of sawn timber differ between tree species. To meet the needs of reasonable timber use and product quality, sawn timber must be identified according to tree species to ensure the best use of materials. In this study, an optimized convolutional neural network was proposed to process sawn timber image data to identify the tree species of the sawn timber. Spatial pyramid pooling and an attention mechanism were used to improve the convolution layer of ResNet101 to extract the feature vector of sawn timber images. The optimized ResNet (simply called “AM-SPPResNet”) was used to identify the sawn timber image, and the basic recognition model was obtained. Then, the weight parameters of the feature extraction layer of the basic model were frozen, the fully connected layer was removed, and support vector machine (SVM) and XGBoost classifiers, which are commonly used in machine learning, were trained on the 21 × 1024 dimension feature vectors extracted by the feature extraction layer. Through a number of comparative experiments, it was found that the prediction model using a linear kernel function for the support vector machine, trained on the feature vectors extracted from the improved convolution layer, performed best, and the F1 score and overall accuracy for all kinds of samples were above 99%. Compared with the traditional methods, the accuracy was improved by up to 12%. Full article
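For readers unfamiliar with spatial pyramid pooling, the following sketch max-pools one 2D feature map over a pyramid of grids and concatenates the results into a fixed-length vector. The pyramid levels 1, 2 and 4 are an assumption: they produce 1 + 4 + 16 = 21 bins per channel, which would be consistent with the 21 × 1024 feature size quoted above, but the abstract does not state the levels used.

```python
def spp_max(feature_map, levels=(1, 2, 4)):
    """Spatial pyramid max pooling over a single 2D feature map.

    For each level n, the map is split into an n x n grid and the maximum
    of each cell is taken, so the output length is sum(n * n for n in levels)
    regardless of the input size (assumed at least 1 x 1).
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            y0 = (i * h) // n                 # floor of the cell's top edge
            y1 = ((i + 1) * h + n - 1) // n   # ceil of the cell's bottom edge
            for j in range(n):
                x0 = (j * w) // n
                x1 = ((j + 1) * w + n - 1) // n
                pooled.append(
                    max(feature_map[y][x] for y in range(y0, y1) for x in range(x0, x1))
                )
    return pooled


fmap = [[4 * r + c for c in range(4)] for r in range(4)]  # values 0..15
vector = spp_max(fmap)  # 21 values for this single channel
```

Applying the same pooling to every channel of the last convolutional layer yields a fixed-size descriptor that a separate classifier such as an SVM can consume.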

13 pages, 5261 KiB  
Communication
FPGA-Based Acceleration on Additive Manufacturing Defects Inspection
by Yawen Luo and Yuhua Chen
Sensors 2021, 21(6), 2123; https://doi.org/10.3390/s21062123 - 18 Mar 2021
Cited by 10 | Viewed by 3298
Abstract
Additive manufacturing (AM) has gained increasing attention over the past years due to its fast prototyping, easier modification, and capacity for devices with complex internal textures when compared to traditional manufacturing processes. However, internal defects can occur during AM processes, and real-time inspection is required to minimize costs by either aborting the process or repairing the defect. To perform defect inspection, the defect database NEU-DET is first used for training. Then, a convolutional neural network (CNN) is applied to perform defect classification. For real-time purposes, Field Programmable Gate Arrays (FPGAs) are utilized for acceleration. A binarized neural network (BNN) is proposed to best fit the FPGA bit operations. Finally, for images labeled as defective, the selective search and non-maximum suppression algorithms are implemented to help locate the coordinates of defects. Experiments show that the BNN model on NEU-DET can achieve 97.9% accuracy in identifying whether the image is defective or defect-free. As for the image classification speed, the FPGA-based BNN module can process one image within 0.5 s. The BNN design is modularized and can be duplicated in parallel to fully utilize logic gates and memory resources in FPGAs. The proposed FPGA-based BNN can perform real-time defect inspection with high accuracy and can easily scale up to larger FPGA implementations. Full article
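One reason binarized networks map well onto bit-level hardware is that a dot product of +1/-1 vectors reduces to an XNOR followed by a bit count. The sketch below shows that commonly used formulation in software; it is an illustration of the general BNN technique, not the authors' FPGA design, and the helper names `pack_bits` and `binary_dot` are hypothetical.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is set when values[i] is +1."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element +1/-1 vectors stored as packed bits.

    XNOR marks positions where the signs agree; each agreement contributes
    +1 and each disagreement -1, so the result is 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask
    return 2 * bin(agree).count("1") - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
```

On an FPGA the XNOR and the bit count are cheap combinational logic, which is why many such units can be replicated in parallel.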

15 pages, 15343 KiB  
Article
AgriPest: A Large-Scale Domain-Specific Benchmark Dataset for Practical Agricultural Pest Detection in the Wild
by Rujing Wang, Liu Liu, Chengjun Xie, Po Yang, Rui Li and Man Zhou
Sensors 2021, 21(5), 1601; https://doi.org/10.3390/s21051601 - 25 Feb 2021
Cited by 44 | Viewed by 7054
Abstract
The recent explosion of large volumes of standard datasets of annotated images has offered promising opportunities for deep learning techniques in effective and efficient object detection applications. However, due to the large difference in quality between these standardized datasets and practical raw data, how to make the most of deep learning techniques in practical agriculture applications remains a critical problem. Here, we introduce a domain-specific benchmark dataset, called AgriPest, for tiny wild pest recognition and detection, providing researchers and communities with a standard large-scale dataset of practical wild pest images and annotations, as well as evaluation procedures. Over the past seven years, AgriPest has captured 49.7K images of four crops containing 14 species of pests using our designed image collection equipment in the field environment. All of the images are manually annotated by agricultural experts with up to 264.7K bounding boxes locating pests. This paper also offers a detailed analysis of AgriPest where the validation set is split into four types of scenes that are common in practical pest monitoring applications. We explore and evaluate the performance of state-of-the-art deep learning techniques over AgriPest. We believe that the scale, accuracy, and diversity of AgriPest can offer great opportunities to researchers in computer vision as well as pest monitoring applications. Full article

25 pages, 6767 KiB  
Article
Meta-Transfer Learning Driven Tensor-Shot Detector for the Autonomous Localization and Recognition of Concealed Baggage Threats
by Taimur Hassan, Muhammad Shafay, Samet Akçay, Salman Khan, Mohammed Bennamoun, Ernesto Damiani and Naoufel Werghi
Sensors 2020, 20(22), 6450; https://doi.org/10.3390/s20226450 - 12 Nov 2020
Cited by 38 | Viewed by 3633
Abstract
Screening baggage against potential threats has become one of the prime aviation security concerns all over the world, where manual detection of prohibited items is a time-consuming and hectic process. Many researchers have developed autonomous systems to recognize baggage threats using security X-ray scans. However, all of these frameworks are vulnerable when screening cluttered and concealed contraband items. Furthermore, to the best of our knowledge, no framework possesses the capacity to recognize baggage threats across multiple scanner specifications without an explicit retraining process. To overcome this, we present a novel meta-transfer learning-driven tensor-shot detector that decomposes the candidate scan into dual-energy tensors and employs a meta-one-shot classification backbone to recognize and localize the cluttered baggage threats. In addition, the proposed detection framework generalizes well to multiple scanner specifications due to its capacity to generate object proposals from the unified tensor maps rather than diversified raw scans. We have rigorously evaluated the proposed tensor-shot detector on the publicly available SIXray and GDXray datasets (containing a cumulative total of 1,067,381 grayscale and colored baggage X-ray scans). On the SIXray dataset, the proposed framework achieved a mean average precision (mAP) of 0.6457, and on the GDXray dataset, it achieved a precision and F1 score of 0.9441 and 0.9598, respectively. Furthermore, it outperforms state-of-the-art frameworks by 8.03% in terms of mAP, 1.49% in terms of precision, and 0.573% in terms of F1 on the SIXray and GDXray datasets, respectively. Full article
