
Machine Learning in Robust Object Detection and Tracking

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (20 March 2023) | Viewed by 29288

Special Issue Editor

Dr. Qing Guo
Nanyang Technological University, Singapore 639798, Singapore
Interests: video object tracking; image denoising; image segmentation; related vision problems

Special Issue Information

Dear Colleagues,

The rapid development of deep learning techniques has led to significant progress in the accuracy of visual detection and tracking on diverse benchmarks. However, due to complex real-world conditions (e.g., degradations caused by scene variations and sensor noise), existing detection and tracking methods often run into problems and cannot reach their benchmark accuracies in practice. As a result, a series of works has been developed to alleviate the robustness issues of state-of-the-art detection and tracking methods.

This Special Issue aims to gather recent developments in machine learning techniques that address robustness issues in the real world, and to provide researchers around the world with an opportunity to present state-of-the-art results as well as literature reviews.

Dr. Qing Guo
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image/video object detection
  • image/video saliency detection
  • image/video semantic segmentation
  • robust visual object tracking
  • robust benchmark construction
  • adversarial attack techniques against detection, tracking, and segmentation
  • robust learning or optimization algorithms
  • data augmentation techniques for robustness enhancement
  • data restoration against degradations
  • explainable methods for deep learning

Published Papers (12 papers)


Research

19 pages, 869 KiB  
Article
ssFPN: Scale Sequence (S2) Feature-Based Feature Pyramid Network for Object Detection
by Hye-Jin Park, Ji-Woo Kang and Byung-Gyu Kim
Sensors 2023, 23(9), 4432; https://doi.org/10.3390/s23094432 - 30 Apr 2023
Cited by 14 | Viewed by 2064
Abstract
Object detection is a fundamental task in computer vision. Over the past several years, convolutional neural network (CNN)-based object detection models have significantly improved detection accuracy in terms of average precision (AP). Furthermore, feature pyramid networks (FPNs) are essential modules for object detection models to consider various object scales. However, the AP for small objects is lower than the AP for medium and large objects. It is difficult to recognize small objects because they do not have sufficient information, and information is lost in deeper CNN layers. This paper proposes a new FPN model named ssFPN (scale sequence (S2) feature-based feature pyramid network) to detect multi-scale objects, especially small objects. We propose a new scale sequence (S2) feature that is extracted by 3D convolution on the level axis of the FPN. It is defined and extracted from the FPN to strengthen the information on small objects based on scale-space theory. Motivated by this theory, the FPN is regarded as a scale space, and the scale sequence (S2) feature is extracted by three-dimensional convolution on the level axis of the FPN. The defined feature is basically scale-invariant and is built on a high-resolution pyramid feature map for small objects. Additionally, the designed S2 feature can be extended to most object detection models based on FPNs. We also designed a feature-level super-resolution approach to show the efficiency of the scale sequence (S2) feature. We verified that the scale sequence (S2) feature could improve the classification accuracy for low-resolution images by training a feature-level super-resolution model. To demonstrate the effect of the scale sequence (S2) feature, experiments on the scale sequence (S2) feature built-in object detection approach, including both one-stage and two-stage models, were conducted on the MS COCO dataset. For the two-stage object detection models Faster R-CNN and Mask R-CNN with the S2 feature, AP improvements of up to 1.6% and 1.4%, respectively, were achieved. Additionally, the APS of each model was improved by 1.2% and 1.1%, respectively. Furthermore, the one-stage object detection models in the YOLO series were improved. For YOLOv4-P5, YOLOv4-P6, YOLOR-P6, YOLOR-W6, and YOLOR-D6 with the S2 feature, 0.9%, 0.5%, 0.5%, 0.1%, and 0.1% AP improvements were observed. For small object detection, the APS increased by 1.1%, 1.1%, 0.9%, 0.4%, and 0.1%, respectively. Experiments using the feature-level super-resolution approach with the proposed scale sequence (S2) feature were conducted on the CIFAR-100 dataset. By training the feature-level super-resolution model, we verified that ResNet-101 with the S2 feature trained on LR images achieved a 55.2% classification accuracy, which was 1.6% higher than for ResNet-101 trained on HR images.
(This article belongs to the Special Issue Machine Learning in Robust Object Detection and Tracking)
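To make the scale-sequence idea concrete, here is a minimal sketch (not the authors' implementation) of treating the FPN levels as a scale axis and convolving across it with a 3D kernel; the channel width, level count, and nearest-neighbor upsampling are assumptions for illustration.

```python
# Sketch of the scale-sequence (S2) idea: treat FPN levels as a "scale" axis
# and run a 3D convolution across it. Not the authors' code; layer sizes and
# upsampling choices are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleSequenceFeature(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        # 3D conv slides along (level, H, W); padding keeps all sizes intact.
        self.conv3d = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, pyramid):  # pyramid: list of (B, C, H_i, W_i), fine -> coarse
        target = pyramid[0].shape[-2:]  # highest-resolution level
        # Resize every level to the finest resolution, then stack on a new
        # "level" axis to get (B, C, L, H, W).
        stacked = torch.stack(
            [F.interpolate(p, size=target, mode="nearest") for p in pyramid], dim=2
        )
        s2 = self.conv3d(stacked)   # convolve across the scale axis
        s2 = s2.mean(dim=2)         # collapse levels -> (B, C, H, W)
        return pyramid[0] + s2      # strengthen the fine level for small objects

pyramid = [torch.randn(1, 256, 64 // 2**i, 64 // 2**i) for i in range(4)]
out = ScaleSequenceFeature()(pyramid)
print(out.shape)  # torch.Size([1, 256, 64, 64])
```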

16 pages, 5387 KiB  
Article
Run Your 3D Object Detector on NVIDIA Jetson Platforms: A Benchmark Analysis
by Chungjae Choe, Minjae Choe and Sungwook Jung
Sensors 2023, 23(8), 4005; https://doi.org/10.3390/s23084005 - 15 Apr 2023
Cited by 5 | Viewed by 3447
Abstract
This paper presents a benchmark analysis of NVIDIA Jetson platforms when operating deep learning-based 3D object detection frameworks. Three-dimensional (3D) object detection could be highly beneficial for the autonomous navigation of robotic platforms, such as autonomous vehicles, robots, and drones. Since the function provides one-shot inference that extracts 3D positions with depth information and the heading direction of neighboring objects, robots can generate a reliable path to navigate without collision. To enable the smooth functioning of 3D object detection, several approaches have been developed to build detectors using deep learning for fast and accurate inference. In this paper, we investigate 3D object detectors and analyze their performance on the NVIDIA Jetson series, which contain an onboard graphical processing unit (GPU) for deep learning computation. Since robotic platforms often require real-time control to avoid dynamic obstacles, onboard processing with a built-in computer is an emerging trend. The Jetson series satisfies such requirements with a compact board size and suitable computational performance for autonomous navigation. However, a proper benchmark that analyzes the Jetson for computationally expensive tasks, such as point cloud processing, has not yet been extensively studied. In order to examine the Jetson series for such expensive tasks, we tested the performance of all commercially available boards (i.e., Nano, TX2, NX, and AGX) with state-of-the-art 3D object detectors. We also evaluated the effect of the TensorRT library in optimizing a deep learning model for faster inference and lower resource utilization on the Jetson platforms. We present benchmark results in terms of three metrics: detection accuracy, frames per second (FPS), and resource usage with power consumption. From the experiments, we observe that all Jetson boards, on average, consume over 80% of GPU resources. Moreover, TensorRT could remarkably increase inference speed (i.e., four times faster) and reduce the central processing unit (CPU) and memory consumption by half. By analyzing such metrics in detail, we establish research foundations on edge device-based 3D object detection for the efficient operation of various robotic applications.
(This article belongs to the Special Issue Machine Learning in Robust Object Detection and Tracking)
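As a concrete illustration of the FPS metric used in such benchmarks, below is a minimal timing harness of the kind one might run on a Jetson board; the model and input shape are placeholders, and the TensorRT conversion itself is assumed to happen separately.

```python
# Minimal latency/FPS harness of the kind used to compare Jetson boards.
# Not the paper's benchmark code; the model and input are stand-ins.
import time
import torch

def benchmark(model, example_input, warmup: int = 10, iters: int = 100) -> float:
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):           # warm up kernels and caches
            model(example_input)
        if torch.cuda.is_available():
            torch.cuda.synchronize()      # don't time queued async GPU work
        start = time.perf_counter()
        for _ in range(iters):
            model(example_input)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return iters / elapsed                # frames per second

model = torch.nn.Conv2d(3, 16, 3)         # stand-in for a 3D detector
fps = benchmark(model, torch.randn(1, 3, 224, 224))
print(f"{fps:.1f} FPS")
```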

14 pages, 12352 KiB  
Article
MFTR-Net: A Multi-Level Features Network with Targeted Regularization for Large-Scale Point Cloud Classification
by Ruyu Liu, Zhiyong Zhang, Liting Dai, Guodao Zhang and Bo Sun
Sensors 2023, 23(8), 3869; https://doi.org/10.3390/s23083869 - 10 Apr 2023
Viewed by 1126
Abstract
There are some irregular and disordered noise points in large-scale point clouds, and the accuracy of existing large-scale point cloud classification methods still needs further improvement. This paper proposes a network named MFTR-Net, which considers the local point cloud's eigenvalue calculation. The eigenvalues of 3D point cloud data and the 2D eigenvalues of projected point clouds on different planes are calculated to express the local feature relationship between adjacent point clouds. A regular point cloud feature image is constructed and input into the designed convolutional neural network. TargetDrop is added to make the network more robust. The experimental results show that our method can learn more high-dimensional feature information, further improving point cloud classification, and our approach achieves 98.0% accuracy on the Oakland 3D dataset.
(This article belongs to the Special Issue Machine Learning in Robust Object Detection and Tracking)
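A hedged sketch of the kind of per-point eigenvalue feature the abstract describes, computed from the covariance of each point's local neighborhood; the neighborhood size and feature layout are assumptions, not the paper's values.

```python
# Per-point eigenvalue features from local neighborhoods: the hand-crafted
# cue that a feature image can be built from. Illustrative only.
import numpy as np
from scipy.spatial import cKDTree

def local_eigenvalues(points: np.ndarray, k: int = 16) -> np.ndarray:
    """points: (N, 3). Returns (N, 3) descending eigenvalues per point."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)          # k nearest neighbors per point
    feats = np.empty((len(points), 3))
    for i, nbrs in enumerate(idx):
        cov = np.cov(points[nbrs], rowvar=False)           # 3x3 local covariance
        feats[i] = np.sort(np.linalg.eigvalsh(cov))[::-1]  # sorted descending
    return feats

pts = np.random.rand(1000, 3)
print(local_eigenvalues(pts)[:2])
```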

33 pages, 16682 KiB  
Article
A Two-Stage Automatic Color Thresholding Technique
by Shamna Pootheri, Daniel Ellam, Thomas Grübl and Yang Liu
Sensors 2023, 23(6), 3361; https://doi.org/10.3390/s23063361 - 22 Mar 2023
Viewed by 2964
Abstract
Thresholding is a prerequisite for many computer vision algorithms. By suppressing the background in an image, one can remove unnecessary information and shift one's focus to the object of inspection. We propose a two-stage histogram-based background suppression technique based on the chromaticity of the image pixels. The method is unsupervised, fully automated, and does not need any training or ground-truth data. The performance of the proposed method was evaluated using a printed circuit assembly (PCA) board dataset and the University of Waterloo skin cancer dataset. Accurately performing background suppression in PCA boards facilitates the inspection of digital images with small objects of interest, such as text or microcontrollers on a PCA board. The segmentation of skin cancer lesions will help doctors to automate skin cancer detection. The results showed a clear and robust background–foreground separation across various sample images under different camera and lighting conditions, which direct application of existing state-of-the-art thresholding methods could not achieve.
(This article belongs to the Special Issue Machine Learning in Robust Object Detection and Tracking)
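To illustrate the general idea of histogram-based background suppression on chromaticity (not the paper's two-stage method), here is a minimal sketch that thresholds the r-chromaticity channel with Otsu's criterion; both the channel choice and the use of Otsu are assumptions.

```python
# Chromaticity + Otsu background suppression sketch. Illustrative only; the
# paper's actual two-stage procedure differs.
import numpy as np

def otsu_threshold(values: np.ndarray, bins: int = 256) -> float:
    hist, edges = np.histogram(values, bins=bins)
    hist = hist.astype(float)
    total = hist.sum()
    cum = np.cumsum(hist)
    cum_mean = np.cumsum(hist * edges[:-1])
    best_t, best_var = 0.0, -1.0
    for i in range(1, bins):
        w0, w1 = cum[i - 1], total - cum[i - 1]
        if w0 == 0 or w1 == 0:
            continue
        m0 = cum_mean[i - 1] / w0
        m1 = (cum_mean[-1] - cum_mean[i - 1]) / w1
        var = w0 * w1 * (m0 - m1) ** 2        # between-class variance
        if var > best_var:
            best_var, best_t = var, edges[i]
    return best_t

def suppress_background(rgb: np.ndarray) -> np.ndarray:
    """rgb: (H, W, 3) float image. Returns a boolean foreground mask."""
    s = rgb.sum(axis=2) + 1e-8
    r = rgb[..., 0] / s                       # r-chromaticity is lighting-insensitive
    return r > otsu_threshold(r.ravel())

img = np.random.rand(64, 64, 3)
print(suppress_background(img).mean())
```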

18 pages, 4134 KiB  
Article
Multiple Attention Mechanism Enhanced YOLOX for Remote Sensing Object Detection
by Chao Shen, Caiwen Ma and Wei Gao
Sensors 2023, 23(3), 1261; https://doi.org/10.3390/s23031261 - 22 Jan 2023
Cited by 7 | Viewed by 2511
Abstract
The object detection technologies of remote sensing are widely used in various fields, such as environmental monitoring, geological disaster investigation, urban planning, and military defense. However, existing detection algorithms lack the robustness to detect tiny objects against complex backgrounds. In this paper, we propose a Multiple Attention Mechanism Enhanced YOLOX (MAME-YOLOX) algorithm to address this problem. Firstly, the CBAM attention mechanism is introduced into the backbone of YOLOX so that the detection network can focus on saliency information. Secondly, to identify high-level semantic information and enhance the perception of local geometric features, the Swin Transformer is integrated into YOLOX's neck module. Finally, CIoU loss is adopted instead of GIoU loss to measure the bounding box regression loss, which prevents the GIoU from degenerating into the IoU. Experimental results on three publicly available remote sensing datasets, namely AIBD, HRRSD, and DIOR, show that the proposed algorithm performs better in both quantitative and qualitative terms.
(This article belongs to the Special Issue Machine Learning in Robust Object Detection and Tracking)
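The CIoU loss mentioned in the abstract has a standard published form: one minus the IoU, plus a normalized center-distance term and an aspect-ratio consistency term. A sketch for axis-aligned boxes follows; this is the generic formulation, not the authors' exact code.

```python
# Generic CIoU loss for (x1, y1, x2, y2) boxes: 1 - IoU + rho^2/c^2 + alpha*v.
import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2)."""
    # Intersection over union
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)
    # Squared distance between box centers
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((cp - ct) ** 2).sum(dim=1)
    # Squared diagonal of the smallest enclosing box
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    c2 = ((enc_rb - enc_lt) ** 2).sum(dim=1) + 1e-7
    # Aspect-ratio consistency term
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi**2) * (torch.atan(wt / (ht + 1e-7))
                            - torch.atan(wp / (hp + 1e-7))) ** 2
    alpha = v / (1 - iou + v + 1e-7)
    return (1 - iou + rho2 / c2 + alpha * v).mean()

p = torch.tensor([[0., 0., 10., 10.]])
t = torch.tensor([[2., 2., 12., 12.]])
print(ciou_loss(p, t))
```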

12 pages, 945 KiB  
Article
Regressing Image Sub-Population Distributions with Deep Learning
by Magdeleine Airiau, Adrien Chan-Hon-Tong, Robin W. Devillers and Guy Le Besnerais
Sensors 2022, 22(23), 9218; https://doi.org/10.3390/s22239218 - 27 Nov 2022
Viewed by 985
Abstract
Regressing the distribution of different sub-populations from a batch of images with learning algorithms is not a trivial task, as models tend to make errors that are unequally distributed across the different sub-populations. Obviously, the baseline is forming a histogram from the batch after having characterized each image independently. However, we show that this approach can be strongly improved by making the model aware of the ultimate task, thanks to a density loss, both for sub-populations related to classes (on three public datasets for image classification) and for sub-populations related to size (on two public datasets for object detection in images). For example, class distribution was improved two-fold on the EUROSAT dataset, and size distribution was improved by 10% on the PASCAL VOC dataset with both RESNET and VGG backbones. The code is released in the GitHub archive at achanhon/AdversarialModel/tree/master/proportion.
(This article belongs to the Special Issue Machine Learning in Robust Object Detection and Tracking)
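One way to make a model "aware of the ultimate task" is a differentiable histogram whose mismatch with the target distribution is penalized directly. The sketch below illustrates this with soft binning; the binning scheme and L1 comparison are assumptions, not necessarily the authors' density loss.

```python
# Differentiable batch-level histogram via soft binning, so a distribution
# mismatch can be used as a training loss. Illustrative formulation only.
import torch

def soft_histogram(scores: torch.Tensor, centers: torch.Tensor,
                   tau: float = 10.0) -> torch.Tensor:
    """scores: (B,), centers: (K,). Returns a (K,) normalized soft histogram."""
    # Each sample contributes softmax weights over bins instead of a hard count.
    logits = -tau * (scores.unsqueeze(1) - centers.unsqueeze(0)) ** 2
    hist = torch.softmax(logits, dim=1).sum(dim=0)  # (K,)
    return hist / hist.sum()

def density_loss(scores, target_dist, centers):
    return (soft_histogram(scores, centers) - target_dist).abs().sum()

centers = torch.linspace(0, 1, 5)
scores = torch.rand(32, requires_grad=True)
target = torch.full((5,), 0.2)
loss = density_loss(scores, target, centers)
loss.backward()            # gradients flow through the histogram
print(loss.item())
```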

20 pages, 4079 KiB  
Article
Robust 6-DoF Pose Estimation under Hybrid Constraints
by Hong Ren, Lin Lin, Yanjie Wang and Xin Dong
Sensors 2022, 22(22), 8758; https://doi.org/10.3390/s22228758 - 12 Nov 2022
Cited by 2 | Viewed by 1564
Abstract
To solve the problem of the insufficient accuracy and stability of two-stage, heatmap-based pose estimation algorithms for occluded objects, a new robust 6-DoF pose estimation algorithm under hybrid constraints is proposed in this paper. First, a new loss function suitable for heatmap regression is formulated to improve the quality of the predicted heatmaps and increase keypoint accuracy in complex scenes. Second, the heatmap regression network is expanded, and a translation regression branch is added to further constrain the pose. Finally, a robust pose optimization module is used to fuse the heatmap and translation estimates and improve the pose estimation accuracy. The proposed algorithm achieves ADD(-S) accuracy rates of 93.5% and 46.2% on the LINEMOD dataset and the Occlusion LINEMOD dataset, respectively, which are better than other state-of-the-art algorithms. Compared with conventional two-stage heatmap-based pose estimation algorithms, the mean estimation error is greatly reduced and the stability of pose estimation is improved. The proposed algorithm can run at a maximum speed of 22 FPS, making it both performant and efficient.
(This article belongs to the Special Issue Machine Learning in Robust Object Detection and Tracking)
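A common building block in heatmap-based pipelines like this one is decoding sub-pixel keypoint locations from the regressed heatmaps. Below is a generic soft-argmax sketch; the temperature and decoding choice are assumptions, not the paper's design.

```python
# Generic soft-argmax decoding: expected (x, y) under the softmaxed heatmap.
import torch

def soft_argmax_2d(heatmaps: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """heatmaps: (B, K, H, W) -> (B, K, 2) sub-pixel (x, y) coordinates."""
    b, k, h, w = heatmaps.shape
    probs = torch.softmax(heatmaps.reshape(b, k, -1) / tau, dim=-1)
    probs = probs.reshape(b, k, h, w)
    ys = torch.arange(h, dtype=probs.dtype).view(1, 1, h, 1)
    xs = torch.arange(w, dtype=probs.dtype).view(1, 1, 1, w)
    x = (probs * xs).sum(dim=(2, 3))   # expected x coordinate
    y = (probs * ys).sum(dim=(2, 3))   # expected y coordinate
    return torch.stack([x, y], dim=-1)

hm = torch.randn(2, 8, 64, 64)
print(soft_argmax_2d(hm).shape)  # torch.Size([2, 8, 2])
```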

28 pages, 44447 KiB  
Article
Unified DeepLabV3+ for Semi-Dark Image Semantic Segmentation
by Mehak Maqbool Memon, Manzoor Ahmed Hashmani, Aisha Zahid Junejo, Syed Sajjad Rizvi and Kamran Raza
Sensors 2022, 22(14), 5312; https://doi.org/10.3390/s22145312 - 15 Jul 2022
Cited by 4 | Viewed by 2395
Abstract
Semantic segmentation for accurate visual perception is a critical task in computer vision. In principle, the automatic classification of dynamic visual scenes using predefined object classes remains unresolved. The challenging problems of learning deep convolutional neural networks, specifically ResNet-based DeepLabV3+ (the most recent version), are threefold. The problems arise due to (1) biased centric exploitation of filter masks, (2) the lower representational power of residual networks due to identity shortcuts, and (3) a loss of spatial relationships caused by using per-pixel primitives. To solve these problems, we present a proficient approach based on DeepLabV3+, along with an added evaluation metric: Unified DeepLabV3+ and S3core, respectively. The presented unified version reduces the effect of biased exploitation via additional dilated convolution layers with customized dilation rates. We further tackle the problem of representational power by introducing non-linear group normalization shortcuts to address the focused problem of semi-dark images. Meanwhile, to keep track of spatial relationships in terms of the global and local contexts, geometrically bunched pixel cues are used. We accumulate all the proposed variants of DeepLabV3+ to propose Unified DeepLabV3+ for accurate visual decisions. Finally, the proposed S3core evaluation metric is based on the weighted combination of three different accuracy measures, i.e., the pixel accuracy, IoU (intersection over union), and mean BFScore, as robust identification criteria. Extensive experimental analysis performed on the CamVid dataset confirmed the applicability of the proposed solution for autonomous vehicles and robotics in outdoor settings. The experimental analysis showed that the proposed Unified DeepLabV3+ outperformed DeepLabV3+ by a margin of 3% in terms of class-wise pixel accuracy, along with a higher S3core, depicting the effectiveness of the proposed approach.
(This article belongs to the Special Issue Machine Learning in Robust Object Detection and Tracking)
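As a worked illustration of a weighted composite metric in the spirit of S3core, the sketch below combines pixel accuracy, IoU, and BFScore into one number; the equal weights are an assumption, since the paper defines its own weighting.

```python
# Weighted composite score over three segmentation metrics. The weights here
# are illustrative; the paper's S3core specifies its own combination.
def s3core(pixel_acc: float, iou: float, bfscore: float,
           weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    w1, w2, w3 = weights
    assert abs(w1 + w2 + w3 - 1.0) < 1e-9, "weights should sum to 1"
    return w1 * pixel_acc + w2 * iou + w3 * bfscore

print(s3core(pixel_acc=0.91, iou=0.74, bfscore=0.68))  # ~0.777
```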

12 pages, 24405 KiB  
Article
A Novel Memory and Time-Efficient ALPR System Based on YOLOv5
by Piyush Batra, Imran Hussain, Mohd Abdul Ahad, Gabriella Casalino, Mohammad Afshar Alam, Aqeel Khalique and Syed Imtiyaz Hassan
Sensors 2022, 22(14), 5283; https://doi.org/10.3390/s22145283 - 14 Jul 2022
Cited by 8 | Viewed by 3259
Abstract
With the rapid development of deep learning techniques, new innovative license plate recognition systems have gained considerable attention from researchers all over the world. These systems have numerous applications, such as law enforcement, parking lot management, toll terminals, traffic regulation, etc. At present, most of these systems rely heavily on high-end computing resources. This paper proposes a novel memory- and time-efficient automatic license plate recognition (ALPR) system developed using YOLOv5. This approach is ideal for IoT devices, which usually have little memory and processing power. Our approach incorporates two stages, i.e., a custom transfer-learned model for license plate detection and an LSTM-based OCR engine for recognition. The dataset used for this research was our own, consisting of images from the Google Open Images dataset and the Indian license plate dataset. Along with training YOLOv5 models, we also trained YOLOv4 models on the same dataset to illustrate a size- and performance-wise comparison. Our proposed ALPR system results in a 14-megabyte model with a mean average precision of 87.2% and a 4.8 ms testing time on still images using an Nvidia T4 GPU. The complete system, with detection and recognition, takes about 85 milliseconds.
(This article belongs to the Special Issue Machine Learning in Robust Object Detection and Tracking)
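A hedged sketch of the two-stage flow described here: a YOLOv5 detector proposes plate boxes, and each crop goes to an OCR stage. The torch.hub model below is a generic pretrained YOLOv5 (not the paper's transfer-learned weights), and read_plate() is a hypothetical stand-in for their LSTM-based OCR engine.

```python
# Two-stage ALPR sketch: generic YOLOv5 detection + a placeholder OCR stage.
import numpy as np
from PIL import Image
import torch

def read_plate(crop: np.ndarray) -> str:
    # Hypothetical OCR stage; the paper uses an LSTM-based engine here.
    raise NotImplementedError

def recognize_plates(image_path: str):
    img = np.array(Image.open(image_path))
    # Generic pretrained weights (downloads on first use), not the paper's model.
    model = torch.hub.load("ultralytics/yolov5", "yolov5s")
    results = model(img)
    plates = []
    # results.xyxy[0]: (num_det, 6) rows of x1, y1, x2, y2, conf, class
    for *xyxy, conf, _cls in results.xyxy[0].tolist():
        if conf < 0.5:                 # confidence gate (assumed value)
            continue
        x1, y1, x2, y2 = map(int, xyxy)
        plates.append(read_plate(img[y1:y2, x1:x2]))
    return plates
```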

19 pages, 3862 KiB  
Article
Transformer Feature Enhancement Network with Template Update for Object Tracking
by Xiuhua Hu, Huan Liu, Yan Hui, Xi Wu and Jing Zhao
Sensors 2022, 22(14), 5219; https://doi.org/10.3390/s22145219 - 12 Jul 2022
Cited by 6 | Viewed by 1682
Abstract
This paper proposes a tracking method combining feature enhancement and template update, aiming to solve the problems of existing trackers: lack of attention to global information, weak feature characterization ability, and poor adaptation to the changing appearance of the target. Pre-extracted features are enhanced in context and on channels through a feature enhancement network consisting of channel attention and transformer architectures. The enhanced feature information is input into classification and regression networks to achieve the final target state estimation. At the same time, a template update strategy is introduced to update the sample template judiciously. Experimental results show that the proposed tracking method exhibits good tracking performance on the OTB100, LaSOT, and GOT-10k benchmark datasets.
(This article belongs to the Special Issue Machine Learning in Robust Object Detection and Tracking)
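The "judicious" template update can be illustrated with a confidence-gated running average: only blend in the new target feature when tracking confidence is high, to avoid drifting onto the background. The threshold and learning rate below are illustrative assumptions, not the paper's settings.

```python
# Confidence-gated template update sketch: skip low-confidence frames to
# avoid drift. Threshold and learning rate are illustrative.
import torch

class TemplateUpdater:
    def __init__(self, init_template: torch.Tensor,
                 conf_thresh: float = 0.7, lr: float = 0.1):
        self.template = init_template
        self.conf_thresh = conf_thresh
        self.lr = lr

    def update(self, new_feature: torch.Tensor, confidence: float) -> torch.Tensor:
        # Blend only when the tracker is confident about the current target.
        if confidence >= self.conf_thresh:
            self.template = (1 - self.lr) * self.template + self.lr * new_feature
        return self.template

updater = TemplateUpdater(torch.zeros(256))
updater.update(torch.ones(256), confidence=0.9)           # accepted: template moves
updater.update(torch.full((256,), 5.0), confidence=0.3)   # rejected: kept as-is
print(updater.template[:3])
```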

21 pages, 5209 KiB  
Article
Part-Based Obstacle Detection Using a Multiple Output Neural Network
by Razvan Itu and Radu Danescu
Sensors 2022, 22(12), 4312; https://doi.org/10.3390/s22124312 - 07 Jun 2022
Cited by 3 | Viewed by 2002
Abstract
Detecting the objects surrounding a moving vehicle is essential for autonomous driving and for any kind of advanced driving assistance system; such a system can also be used for analyzing the surrounding traffic as the vehicle moves. The most popular techniques for object detection are based on image processing; in recent years, they have become increasingly focused on artificial intelligence. Systems using monocular vision are increasingly popular for driving assistance, as they do not require complex calibration and setup. The lack of three-dimensional data is compensated for by the efficient and accurate classification of the input image pixels. The detected objects are usually identified as cuboids in the 3D space, or as rectangles in the image space. Recently, instance segmentation techniques have been developed that are able to identify the freeform set of pixels that form an individual object, using complex convolutional neural networks (CNNs). This paper presents an alternative to these instance segmentation networks, combining much simpler semantic segmentation networks with light, geometrical post-processing techniques, to achieve instance segmentation results. The semantic segmentation network produces four semantic labels that identify the quarters of the individual objects: top left, top right, bottom left, and bottom right. These pixels are grouped into connected regions, based on their proximity and their position with respect to the whole object. Each quarter is used to generate a complete object hypothesis, which is then scored according to object pixel fitness. The individual homogeneous regions extracted from the labeled pixels are then assigned to the best-fitted rectangles, leading to complete and freeform identification of the pixels of individual objects. The accuracy is similar to instance segmentation-based methods but with reduced complexity in terms of trainable parameters, which leads to a reduced demand for computational resources.
(This article belongs to the Special Issue Machine Learning in Robust Object Detection and Tracking)
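To make the quarter-based grouping concrete, here is a sketch in which each connected quarter region votes for a full-object rectangle by extending its own bounding box toward the rest of the implied object; the paper's hypothesis scoring by pixel fitness is omitted, and the mirroring rule is an illustrative assumption.

```python
# Quarter labels -> full-object box hypotheses. Each quarter's bounding box is
# extended in the two directions the rest of the object must lie in.
import numpy as np
from scipy import ndimage

def quarter_to_object_boxes(label_map: np.ndarray):
    """label_map: (H, W) ints, 0=bg, 1=TL, 2=TR, 3=BL, 4=BR.
    Returns candidate full-object boxes as (x1, y1, x2, y2)."""
    candidates = []
    for quarter in (1, 2, 3, 4):
        regions, n = ndimage.label(label_map == quarter)   # connected regions
        for i in range(1, n + 1):
            ys, xs = np.nonzero(regions == i)
            x1, x2, y1, y2 = xs.min(), xs.max(), ys.min(), ys.max()
            w, h = x2 - x1, y2 - y1
            # A top-left quarter implies the object extends right and down, etc.
            if quarter == 1:
                box = (x1, y1, x2 + w, y2 + h)
            elif quarter == 2:
                box = (x1 - w, y1, x2, y2 + h)
            elif quarter == 3:
                box = (x1, y1 - h, x2 + w, y2)
            else:
                box = (x1 - w, y1 - h, x2, y2)
            candidates.append(box)
    return candidates

lm = np.zeros((20, 20), dtype=int)
lm[2:6, 2:6] = 1                     # a lone top-left quarter
print(quarter_to_object_boxes(lm))   # one hypothesis covering the full object
```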

20 pages, 6110 KiB  
Article
Rapid Post-Earthquake Structural Damage Assessment Using Convolutional Neural Networks and Transfer Learning
by Peter Damilola Ogunjinmi, Sung-Sik Park, Bubryur Kim and Dong-Eun Lee
Sensors 2022, 22(9), 3471; https://doi.org/10.3390/s22093471 - 03 May 2022
Cited by 14 | Viewed by 3225
Abstract
The adoption of artificial intelligence in post-earthquake inspections and reconnaissance has received considerable attention in recent years, owing to the exponential increase in computational capabilities and its inherent potential for addressing the disadvantages associated with manual inspections. Herein, we present the effectiveness of automated deep learning in enhancing the assessment of damage caused by the 2017 Pohang earthquake. Six classical pre-trained convolutional neural network (CNN) models are implemented through transfer learning (TL) on a small dataset comprising 1780 manually labeled images of structural damage. Feature-extraction and fine-tuning TL methods are trained on the image datasets. The performances of the various CNN models are compared on a testing image dataset. The results confirm that the fine-tuned MobileNet model offers the best performance. Therefore, the model is further developed as a web-based application for classifying earthquake damage. The severity of damage is quantified by assigning damage assessment values derived using the CNN model and gradient-weighted class activation mapping. The web-based application can effectively and automatically classify structural damage resulting from earthquakes, rendering it suitable for decision making, such as in resource allocation, policy development, and emergency response.
(This article belongs to the Special Issue Machine Learning in Robust Object Detection and Tracking)
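The feature-extraction variant of transfer learning compared in the paper can be sketched as freezing a pretrained backbone and training only a new classification head; the torchvision MobileNetV2 backbone and the class count below are assumptions, not the paper's exact setup.

```python
# Feature-extraction transfer learning sketch: frozen pretrained backbone,
# trainable head for the damage classes. Backbone/class count are assumed.
import torch.nn as nn
from torchvision import models

def build_damage_classifier(num_classes: int = 4) -> nn.Module:
    model = models.mobilenet_v2(weights="IMAGENET1K_V1")  # pretrained backbone
    for p in model.features.parameters():
        p.requires_grad = False        # feature extraction: freeze the backbone
    # Replace the final linear layer with a head for the damage classes.
    # For the fine-tuning variant, one would instead unfreeze the last few
    # feature blocks and train them with a smaller learning rate.
    model.classifier[1] = nn.Linear(model.last_channel, num_classes)
    return model

model = build_damage_classifier()
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```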