Deep Perception in Autonomous Driving

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Electrical and Autonomous Vehicles".

Deadline for manuscript submissions: 15 December 2024 | Viewed by 32109

Special Issue Editors


Guest Editor
School of Software, Shandong University, Ji'nan 250100, China
Interests: autonomous driving; computer vision; deep learning

Guest Editor
Departments of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zürich, Switzerland
Interests: autonomous driving; deep learning; image/video segmentation

Guest Editor
Departments of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zürich, Switzerland
Interests: autonomous driving; deep learning; embodied AI; human-centric visual understanding; vision-language reasoning

Special Issue Information

Dear Colleagues,

The perception of the physical environment plays an essential role in autonomous driving. Starting with the technical equipment within vehicles, autonomous driving is ushering in fundamental changes: cameras and various other sensors are fitted so that autonomous driving systems can better recognize their environment. This opens remarkable opportunities for innovative autonomous driving functions, but it also poses challenges for the perception system and the associated multimodal data processing and understanding modules. With this Special Issue, we aim to showcase the latest advances and trends in deep learning-based techniques for building ‘autonomous driving friendly’ perception models.

This Special Issue will feature original research papers related to models and algorithms for perception tasks in autonomous driving. The main topics of interest include (but are not limited to):

  • Visual, LiDAR and radar perception
  • 2D/3D object detection, 2D/3D object tracking
  • Domain adaptation for classification/detection/segmentation
  • Scene parsing, semantic segmentation, instance segmentation and panoptic segmentation
  • Human-centric visual understanding, human–human/object interaction understanding
  • Human activity understanding, human intention modeling
  • Person re-identification, pose estimation and part parsing
  • Vehicle detection, pedestrian detection and road detection
  • New benchmark datasets and survey papers related to the topics

Prof. Dr. Xiankai Lu
Dr. Tianfei Zhou
Dr. Wenguan Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • autonomous vehicles
  • artificial intelligence
  • visual perception
  • deep learning

Published Papers (18 papers)


Research

14 pages, 5924 KiB  
Article
Radar Perception of Multi-Object Collision Risk Neural Domains during Autonomous Driving
by Józef Lisowski
Electronics 2024, 13(6), 1065; https://doi.org/10.3390/electronics13061065 - 13 Mar 2024
Viewed by 608
Abstract
A review of the literature on methods for the perception and control of autonomous vehicle motion shows that they can be improved by using an artificial neural network to generate domains of prohibited passing manoeuvres, increasing the safety of autonomous driving in a variety of real environmental conditions. This article concerns radar perception: receiving information about the movement of many autonomous objects, identifying them, assigning each a collision risk, and preparing a manoeuvring response. During identification, each object is assigned a domain generated by a previously trained neural network; the size of the domain is proportional to the collision risk and to distance changes during autonomous driving. An optimal trajectory is then determined from among the possible safe paths, ensuring minimum-time control. The presented solution to the radar perception task is illustrated with a computer simulation of autonomous driving while passing many objects. The main contributions of this article are the synthesis of a radar perception algorithm that maps the neural domains of autonomous objects characterizing their collision risk, and an assessment of the degree of radar perception on the example of a multi-object autonomous driving simulation.
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
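As background to the abstract above: a standard proxy for the collision risk of a passing object is its closest point of approach (DCPA/TCPA), computed from relative position and velocity. The sketch below is illustrative only, not the paper's method; `domain_radius` and its constants are hypothetical, showing only how a domain could grow with risk.

```python
import math

def cpa(own_pos, own_vel, obj_pos, obj_vel):
    """Closest point of approach between own vehicle and one object.

    Returns (dcpa, tcpa): distance at, and time to, the closest
    approach. Inputs are 2-D (x, y) tuples; a negative tcpa means the
    closest approach already happened."""
    rx, ry = obj_pos[0] - own_pos[0], obj_pos[1] - own_pos[1]
    vx, vy = obj_vel[0] - own_vel[0], obj_vel[1] - own_vel[1]
    v2 = vx * vx + vy * vy
    if v2 == 0.0:                      # no relative motion
        return math.hypot(rx, ry), 0.0
    tcpa = -(rx * vx + ry * vy) / v2
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return dcpa, tcpa

def domain_radius(dcpa, tcpa, base=5.0, k=50.0):
    """Hypothetical risk-proportional domain radius: a small DCPA at a
    small positive TCPA means high risk, hence a larger domain."""
    if tcpa <= 0:
        return base
    return base + k / ((1.0 + dcpa) * (1.0 + tcpa))
```

For a head-on encounter (own vehicle at the origin moving right, object 10 m ahead moving left), `cpa` yields DCPA 0 at TCPA 5 s, so the object gets a much larger domain than a distant crossing one.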

17 pages, 10836 KiB  
Article
HAR-Net: An Hourglass Attention ResNet Network for Dangerous Driving Behavior Detection
by Zhe Qu, Lizhen Cui and Xiaohui Yang
Electronics 2024, 13(6), 1019; https://doi.org/10.3390/electronics13061019 - 08 Mar 2024
Viewed by 481
Abstract
Ensuring safety while driving relies heavily on normal driving behavior, making the timely detection of dangerous driving patterns crucial. In this paper, an Hourglass Attention ResNet Network (HAR-Net) is proposed to detect dangerous driving behavior. Uniquely, we separately input optical flow data, RGB data, and RGBD data into the network for spatial–temporal fusion. In the spatial fusion part, we combine ResNet-50 and the hourglass network as the backbone of CenterNet. To improve the accuracy, we add the attention mechanism to the network and integrate center loss into the original Softmax loss. Additionally, a dangerous driving behavior dataset is constructed to evaluate the proposed model. Through ablation and comparative studies, we demonstrate the efficacy of each HAR-Net component. Notably, HAR-Net achieves a mean average precision of 98.84% on our dataset, surpassing other state-of-the-art networks for detecting distracted driving behaviors.
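The center loss mentioned in the abstract (Wen et al.) pulls each feature vector toward a running center of its class, and is added to the softmax loss. A minimal sketch of the center term and the between-batch center update (the softmax part and all network details are omitted; `alpha` is a hypothetical update rate):

```python
def center_loss(features, labels, centers, alpha=0.5):
    """Center-loss term: 0.5 * ||x - c_y||^2 averaged over the batch.

    centers is a dict {class_id: center vector}; each center is nudged
    a step of size alpha toward the features of its class, as is done
    between mini-batches during training."""
    loss = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        diff = [xi - ci for xi, ci in zip(x, c)]
        loss += 0.5 * sum(d * d for d in diff)
        # move the class center toward the observed feature
        centers[y] = [ci + alpha * d for ci, d in zip(c, diff)]
    return loss / len(features)
```

In HAR-Net-style training this term would be weighted and summed with the ordinary softmax classification loss.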

19 pages, 2449 KiB  
Article
Driving Behaviour Estimation System Considering the Effect of Road Geometry by Means of Deep NN and Hotelling Transform
by Felipe Barreno, Matilde Santos and Manuel Romana
Electronics 2024, 13(3), 637; https://doi.org/10.3390/electronics13030637 - 02 Feb 2024
Viewed by 500
Abstract
In this work, an intelligent hybrid model is proposed to identify hazardous or inattentive driving manoeuvres on roads, with the final goal of increasing and ensuring travellers’ safety and comfort. The estimation is based on the effects that road geometry may have on vehicle accelerations, displacements and dynamics. The outputs of the proposed intelligent systems characterize the type of driving as normal, careless or distracted. The intelligent system consists of an LSTM (Long Short-Term Memory) neural network that first distinguishes between normal and abnormal driving behaviour, followed by a second module that classifies abnormal driving as aggressive or inattentive; the latter is implemented with another LSTM, a CNN (convolutional neural network) or the Hotelling transform. These modules are applied to characteristics of the vehicle dynamics to estimate the driving behaviour. Smartphone inertial sensors such as GPS, accelerometers and gyroscopes are used to measure these vehicle characteristics and to identify driving events in manoeuvres. Specifically, the critical acceleration due to the influence of the road geometry can be measured with inertial sensors, and this road acceleration together with the lateral acceleration allows the driver’s perceived acceleration to be estimated. This perceived acceleration affects the driving style and, consequently, the estimation of the appropriate speed to travel on that road. Both a traditional two-lane road and a motorway route located in the Madrid region of Spain are used. Driving behaviour is determined by considering how changes in road geometry may affect driving style and, consequently, the estimation of the proper speed. Some of the proposed configurations of the intelligent hybrid system reach an accuracy of 97.21% in detecting dangerous or risky driving. This could allow real-time alerts to be generated for potentially dangerous or inattentive manoeuvres, leading to safer and more appropriate driving.
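The road-geometry-induced lateral acceleration that the abstract relies on follows the basic curve relation a = v²/R. A minimal illustrative sketch (the comfort threshold is a hypothetical value, not taken from the paper):

```python
def curve_lateral_acc(speed_mps, radius_m):
    """Lateral acceleration imposed by road geometry on a curve of
    radius R (m) at speed v (m/s): a = v^2 / R."""
    return speed_mps ** 2 / radius_m

def comfortable_speed(radius_m, a_max=1.8):
    """Highest speed keeping lateral acceleration below a comfort
    threshold a_max (m/s^2, hypothetical): v = sqrt(a_max * R)."""
    return (a_max * radius_m) ** 0.5
```

For example, 20 m/s (72 km/h) on a 200 m curve gives 2.0 m/s² of lateral acceleration; comparing the measured value against such a geometry-implied baseline is one way to flag speeds inappropriate for the road.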

13 pages, 541 KiB  
Article
A Neural Multi-Objective Capacitated Vehicle Routing Optimization Algorithm Based on Preference Adjustment
by Liting Wang, Chao Song, Yu Sun, Cuihua Lu and Qinghua Chen
Electronics 2023, 12(19), 4167; https://doi.org/10.3390/electronics12194167 - 07 Oct 2023
Cited by 1 | Viewed by 1275
Abstract
The vehicle routing problem (VRP) is a common problem in logistics and transportation with high application value. Many methods have been proposed to solve it with good results, but with the development of neural network technology, solving the VRP through neural combinatorial optimization has attracted increasing attention from researchers because of its short inference time and high parallelism. PMOCO is a state-of-the-art multi-objective vehicle routing optimization algorithm. However, in PMOCO, preferences are often uniformly selected, which may lead to uneven Pareto sets and reduce the quality of solutions. To solve this problem, we propose a multi-objective vehicle routing optimization algorithm based on preference adjustment, improved from PMOCO. We incorporate into PMOCO a weight adjustment method that adapts to different approximate Pareto fronts and finds solutions of better quality. We treat weight adjustment as a sequential decision process and train it through deep reinforcement learning. Our method adaptively searches for a better combination of preferences and shows strong robustness. Experiments on multi-objective vehicle routing problems give good results (about a 6% improvement over PMOCO with 20 preferences).
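To make the "uniformly selected preferences" issue concrete: preference-conditioned methods scalarize the objective vector with a weight (preference) vector, and the uniform grid of weights below is the baseline that the paper's learned adjustment would replace. This is an illustrative sketch of the scalarization only, not the paper's algorithm.

```python
def weighted_sum(objectives, preference):
    """Scalarize a multi-objective cost vector with a preference
    (weight) vector: sum_i w_i * f_i."""
    return sum(w * f for w, f in zip(preference, objectives))

def uniform_preferences(n):
    """n evenly spaced preferences over two objectives, the common
    default sampling; preference *adjustment* would instead move these
    weights toward under-covered parts of the Pareto front."""
    return [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]
```

Each preference vector selects one trade-off point; if the front is unevenly curved, uniform weights yield an uneven spread of solutions, which is exactly what adaptive preference selection tries to fix.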

11 pages, 7159 KiB  
Article
Interactive Efficient Multi-Task Network for RGB-D Semantic Segmentation
by Xinhua Xu, Jinfu Liu and Hong Liu
Electronics 2023, 12(18), 3943; https://doi.org/10.3390/electronics12183943 - 19 Sep 2023
Viewed by 832
Abstract
Semantic segmentation is significant for robotic indoor activities. However, relying solely on RGB modality often leads to poor results due to limited information. Introducing other modalities can improve performance but also increases complexity and cost, making it unsuitable for real-time robotic applications. To address the balance issue of performance and speed in robotic indoor scenarios, we propose an interactive efficient multitask RGB-D semantic segmentation network (IEMNet) that utilizes both RGB and depth modalities. On the premise of ensuring rapid inference speed, we introduce a cross-modal feature rectification module, which calibrates the noise of RGB and depth modalities and achieves comprehensive cross-modal feature interaction. Furthermore, we propose a coordinate attention fusion module to achieve more effective feature fusion. Finally, an instance segmentation task is added to the decoder to assist in enhancing the performance of semantic segmentation. Experiments on two indoor scene datasets, NYUv2 and SUNRGB-D, demonstrate the superior performance of the proposed method, especially on NYUv2, achieving 54.5% mIoU and striking an excellent balance between performance and inference speed at 42 frames per second.
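The mIoU figure quoted above is the standard semantic-segmentation metric: per-class intersection-over-union averaged over classes, computed from a confusion matrix. A minimal reference implementation:

```python
def miou(conf):
    """Mean IoU from a confusion matrix conf[gt_class][pred_class]:
    IoU_c = TP / (TP + FP + FN), averaged over classes that appear."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                      # missed pixels of c
        fp = sum(conf[r][c] for r in range(n)) - tp # wrongly labelled c
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious)
```

For a two-class matrix [[3, 1], [1, 3]], each class has IoU 3/5, so mIoU is 0.6.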

15 pages, 2236 KiB  
Article
Anomaly Detection Model of Network Dataflow Based on an Improved Grey Wolf Algorithm and CNN
by Liting Wang, Qinghua Chen and Chao Song
Electronics 2023, 12(18), 3787; https://doi.org/10.3390/electronics12183787 - 07 Sep 2023
Viewed by 922
Abstract
With the popularization of the network and the expansion of its application scope, the problem of abnormal network traffic caused by network attacks, malicious software, traffic peaks, or network device failures is becoming increasingly prominent. This problem not only leads to a decline in network performance and service quality but may also pose a serious threat to network security. This paper proposes a hybrid data processing model based on deep learning for network anomaly detection to improve anomaly detection performance. First, the Grey Wolf optimization algorithm is improved to select high-quality data features, which are then converted to RGB images and input into an anomaly detection model. An anomaly detection model of network dataflow based on a convolutional neural network is designed to recognize network anomalies, including DoS (Denial of Service), R2L (Remote to Local), U2R (User to Root), and Probe (Probing). To verify the effectiveness of the improved Grey Wolf algorithm and the anomaly detection model, we conducted experiments on the KDD99 and UNSW-NB15 datasets. The proposed method achieves an average detection rate of 0.986, much higher than all counterparts. Experimental results show that the accuracy and detection rate of our method are improved while the false alarm rate is reduced, proving the effectiveness of our approach in network anomaly classification tasks.
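For context on the optimizer named above: in the vanilla Grey Wolf Optimizer, candidate solutions (wolves) move toward the three best solutions found so far (alpha, beta, delta), with a coefficient `a` decaying from 2 to 0 to shift from exploration to exploitation. The sketch below is the standard continuous GWO on a toy function; the paper's *improved* variant for feature selection is not reproduced here.

```python
import random

def gwo_minimize(f, dim, iters=100, pack=12, lo=-5.0, hi=5.0, seed=0):
    """Minimal continuous Grey Wolf Optimizer (illustrative)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)]
              for _ in range(pack)]
    for t in range(iters):
        wolves.sort(key=f)                      # leaders first
        alpha, beta, delta = wolves[0], wolves[1], wolves[2]
        a = 2.0 * (1 - t / iters)               # 2 -> 0 over the run
        for i in range(3, pack):
            new = []
            for d in range(dim):
                estimates = []
                for leader in (alpha, beta, delta):
                    A = a * (2 * rng.random() - 1)
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - wolves[i][d])
                    estimates.append(leader[d] - A * D)
                new.append(sum(estimates) / 3)  # average of the pulls
            wolves[i] = new
    return min(wolves, key=f)

# toy use: minimize the sphere function in 2-D
best = gwo_minimize(lambda x: sum(v * v for v in x), dim=2)
```

A feature-selection variant would instead score binary feature masks (e.g. by a classifier's validation accuracy) rather than a continuous function.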

22 pages, 2251 KiB  
Article
Deep Reinforcement Learning for Dynamic Twin Automated Stacking Cranes Scheduling Problem
by Xin Jin, Nan Mi, Wen Song and Qiqiang Li
Electronics 2023, 12(15), 3288; https://doi.org/10.3390/electronics12153288 - 31 Jul 2023
Viewed by 1204
Abstract
Effective dynamic scheduling of twin Automated Stacking Cranes (ASCs) is essential for improving the efficiency of automated storage yards. While Deep Reinforcement Learning (DRL) has shown promise in a variety of scheduling problems, the dynamic twin ASCs scheduling problem is challenging owing to its unique attributes, including the dynamic arrival of containers, sequence-dependent setup and potential ASC interference. A novel DRL method is proposed in this paper to minimize the ASC run time and traffic congestion in the yard. Considering the information interference from ineligible containers, dynamic masked self-attention (DMA) is designed to capture the location-related relationship between containers. Additionally, we propose local information complementary attention (LICA) to supplement congestion-related information for decision making. The embeddings grasped by the LICA-DMA neural architecture can effectively represent the system state. Extensive experiments show that the agent can learn high-quality scheduling policies. Compared with rule-based heuristics, the learned policies have significantly better performance with reasonable time costs. The policies also exhibit impressive generalization ability in unseen scenarios with various scales or distributions.
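The masking idea behind DMA is the standard one in attention mechanisms: ineligible positions (here, ineligible containers) are set to minus infinity before the softmax, so they receive exactly zero attention weight. A one-row sketch of that mechanism, not the paper's full architecture:

```python
import math

def masked_attention(scores, mask):
    """One row of masked self-attention weights: positions whose mask
    is False get -inf before the softmax and thus weight 0."""
    masked = [s if keep else float("-inf")
              for s, keep in zip(scores, mask)]
    mx = max(masked)                      # subtract max for stability
    exps = [math.exp(s - mx) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

The remaining (eligible) positions still receive a proper probability distribution, so downstream layers never see information from masked-out containers.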

22 pages, 9253 KiB  
Article
Image-Based Pothole Detection Using Multi-Scale Feature Network and Risk Assessment
by Dong-Hoe Heo, Ji-Yoon Choi, Sang-Baeg Kim, Tae-Oh Tak and Sheng-Peng Zhang
Electronics 2023, 12(4), 826; https://doi.org/10.3390/electronics12040826 - 06 Feb 2023
Cited by 4 | Viewed by 3965
Abstract
Potholes on road surfaces pose a serious hazard to vehicles and passengers due to the difficulty of detecting them and the short response time. Therefore, many government agencies are applying various pothole-detection algorithms for road maintenance. However, current methods based on object detection struggle to achieve real-time detection on low-spec hardware systems. In this study, SPFPN-YOLOv4 tiny was developed by combining spatial pyramid pooling and a feature pyramid network with CSPDarknet53-tiny. A total of 2665 images were obtained via data augmentation, such as gamma regulation, horizontal flipping, and scaling, to compensate for the lack of data, and were divided into training, validation, and test sets at ratios of 70%, 20%, and 10%, respectively. In a comparison of YOLOv2, YOLOv3, YOLOv4 tiny, and SPFPN-YOLOv4 tiny, SPFPN-YOLOv4 tiny showed an approximately 2–5% improvement in mean average precision (intersection over union = 0.5). In addition, a risk assessment based on the proposed SPFPN-YOLOv4 tiny was calculated by comparing the tire contact patch size with the pothole size, applying the pinhole camera model and a distance estimation equation. In conclusion, we developed an end-to-end algorithm that can detect potholes and classify their risk in real time using 2D pothole images.
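The pinhole-camera size estimate used in the risk assessment above follows the basic projection relation: real width = pixel width × distance / focal length (with focal length expressed in pixels). A minimal sketch; the risk rule and the 0.2 m tire-patch value are hypothetical illustrations, not the paper's calibrated numbers.

```python
def object_width_m(width_px, distance_m, focal_px):
    """Pinhole model: W = w_px * Z / f, with f in pixels."""
    return width_px * distance_m / focal_px

def is_risky(pothole_width_m, tire_patch_width_m=0.2):
    """Hypothetical rule in the spirit of the paper: a pothole wider
    than the tire contact patch is treated as high risk."""
    return pothole_width_m > tire_patch_width_m
```

For example, a 100-pixel-wide detection at 7 m with a 700-pixel focal length corresponds to a 1 m pothole, well beyond a typical tire contact patch.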

16 pages, 6528 KiB  
Article
3D Point Cloud Stitching for Object Detection with Wide FoV Using Roadside LiDAR
by Xiaowei Lan, Chuan Wang, Bin Lv, Jian Li, Mei Zhang and Ziyi Zhang
Electronics 2023, 12(3), 703; https://doi.org/10.3390/electronics12030703 - 31 Jan 2023
Cited by 3 | Viewed by 2966
Abstract
Light Detection and Ranging (LiDAR) is widely used in the perception of the physical environment for object detection and tracking tasks. Current methods and datasets are mainly developed for autonomous vehicles and cannot be directly used for roadside perception. This paper presents a 3D point cloud stitching method for object detection with a wide horizontal field of view (FoV) using roadside LiDAR. First, the base detection model is trained on the KITTI dataset and achieves a detection accuracy of 88.94. Then, a new detection range of 180° can be inferred to break the limitation of the camera’s FoV. Finally, multiple sets of detection results from a single LiDAR are stitched together to build a 360° detection range and solve the problem of overlapping objects. The effectiveness of the proposed approach was evaluated using the KITTI dataset and collected point clouds. The experimental results show that the point cloud stitching method offers a cost-effective solution to achieving a larger FoV, and the number of output objects increased by 77.15% over the base model, improving the detection performance of roadside LiDAR.
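The geometric core of stitching per-sector detections into a 360° result is rotating each sector's points about the vertical (yaw) axis into a common frame before merging. An illustrative sketch, not the paper's pipeline:

```python
import math

def rotate_yaw(points, angle_deg):
    """Rotate (x, y, z) points about the vertical axis by angle_deg."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def stitch(sectors):
    """Merge per-sector detections, given as (sector_angle_deg, points)
    pairs, into one point set in a common 360-degree frame."""
    merged = []
    for angle, pts in sectors:
        merged.extend(rotate_yaw(pts, angle))
    return merged
```

In practice a deduplication step (e.g. non-maximum suppression across sector boundaries) would follow, which is where the overlapping-object problem mentioned in the abstract is handled.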

15 pages, 3456 KiB  
Article
D-STGCN: Dynamic Pedestrian Trajectory Prediction Using Spatio-Temporal Graph Convolutional Networks
by Bogdan Ilie Sighencea, Ion Rareș Stanciu and Cătălin Daniel Căleanu
Electronics 2023, 12(3), 611; https://doi.org/10.3390/electronics12030611 - 26 Jan 2023
Cited by 3 | Viewed by 2806
Abstract
Predicting pedestrian trajectories in urban scenarios is a challenging task with a wide range of applications, from video surveillance to autonomous driving. The task is difficult because pedestrian behaviour is affected by the individual path’s history, interactions with other pedestrians, and interactions with the environment. For predicting pedestrian trajectories, an attention-based interaction-aware spatio-temporal graph neural network is introduced. This paper introduces an approach based on two components: a spatial graph neural network (SGNN) for interaction modelling and a temporal graph neural network (TGNN) for motion feature extraction. The SGNN uses an attention method to periodically collect spatial interactions between all pedestrians. The TGNN also employs an attention method, this time to collect each pedestrian’s temporal motion pattern. Finally, over the graph’s temporal dimension, a time-extrapolator convolutional neural network (CNN) is employed to predict the trajectories. With a smaller data and model size and better accuracy, the proposed method is more compact and efficient than Social-STGCNN. Moreover, on three video surveillance datasets (ETH, UCY, and SDD), D-STGCN achieves better experimental results on the average displacement error (ADE) and final displacement error (FDE) metrics, in addition to predicting more social trajectories.
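The ADE and FDE metrics cited above are standard in trajectory prediction: ADE is the mean Euclidean error over all predicted timesteps, and FDE is the error at the final timestep. A minimal reference implementation:

```python
import math

def ade_fde(pred, gt):
    """ADE and FDE between a predicted and a ground-truth trajectory,
    each a list of (x, y) points of equal length."""
    dists = [math.hypot(px - gx, py - gy)
             for (px, py), (gx, gy) in zip(pred, gt)]
    ade = sum(dists) / len(dists)   # mean error over all timesteps
    fde = dists[-1]                 # error at the final timestep
    return ade, fde
```

Benchmark numbers are usually the minimum over several sampled futures, averaged over all pedestrians in the dataset.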

14 pages, 4997 KiB  
Article
Time Synchronization and Space Registration of Roadside LiDAR and Camera
by Chuan Wang, Shijie Liu, Xiaoyan Wang and Xiaowei Lan
Electronics 2023, 12(3), 537; https://doi.org/10.3390/electronics12030537 - 20 Jan 2023
Cited by 2 | Viewed by 2530
Abstract
The sensing system consisting of Light Detection and Ranging (LiDAR) and a camera provides complementary information about the surrounding environment. To take full advantage of multi-source data provided by different sensors, an accurate fusion of multi-source sensor information is needed. Time synchronization and space registration are the key technologies that affect the fusion accuracy of multi-source sensors. Due to the difference in data acquisition frequency and the deviation in startup time between LiDAR and the camera, asynchronous data acquisition can easily occur, which has a significant influence on subsequent data fusion. Therefore, a time synchronization method for multi-source sensors based on frequency self-matching is developed in this paper. Without changing the sensor frequency, the sensor data are processed to obtain the same number of data frames with the same ID numbers, so that the LiDAR and camera data correspond one to one. Finally, data frames are merged into new data packets to realize time synchronization between LiDAR and camera. Building on time synchronization, spatial synchronization is achieved with a nonlinear optimization algorithm for the joint calibration parameters, which effectively reduces the reprojection error during sensor spatial registration. The accuracy of the proposed time synchronization method is 99.86% and the space registration accuracy is 99.79%, better than the calibration method of the MATLAB calibration toolbox.
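One simple way to realize the "same number of frames, same ID" pairing the abstract describes is nearest-timestamp matching: for each frame of the slower sensor, pick the closest-in-time frame of the faster one and give both the same ID. This is an illustrative sketch of that idea, not the paper's exact procedure:

```python
def synchronize(lidar_ts, camera_ts):
    """Pair each frame of the slower stream with the nearest-in-time
    frame of the faster stream; both get the same frame ID.

    Returns (frame_id, slow_timestamp, matched_fast_timestamp) tuples,
    one per slow-stream frame."""
    if len(lidar_ts) <= len(camera_ts):
        slow, fast = lidar_ts, camera_ts
    else:
        slow, fast = camera_ts, lidar_ts
    pairs = []
    for fid, t in enumerate(slow):
        j = min(range(len(fast)), key=lambda k: abs(fast[k] - t))
        pairs.append((fid, t, fast[j]))
    return pairs
```

With a 10 Hz LiDAR and a 30 Hz camera, each LiDAR frame is matched to the camera frame within at most half a camera period, without changing either sensor's native rate.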

12 pages, 1364 KiB  
Article
VMLH: Efficient Video Moment Location via Hashing
by Zhifang Tan, Fei Dong, Xinfang Liu, Chenglong Li and Xiushan Nie
Electronics 2023, 12(2), 420; https://doi.org/10.3390/electronics12020420 - 13 Jan 2023
Cited by 1 | Viewed by 1284
Abstract
Video moment location by query is a hot topic in video understanding. However, most existing methods ignore the importance of location efficiency in practical application scenarios: video and query sentences have to be fed into the network at the same time during retrieval, which leads to low efficiency. To address this issue, we propose efficient video moment location via hashing (VMLH). In the proposed method, query sentences and video clips are converted into hash codes and hash code sets, respectively, preserving the semantic similarity between query sentences and video clips. A location prediction network is designed to predict the corresponding timestamp according to the similarity among hash codes, and the videos do not need to be fed into the network during retrieval and location. Furthermore, unlike existing methods, which require complex interactions and fusion between video and query sentences, the proposed VMLH method needs only a simple XOR operation among codes to locate the video moment with high efficiency. This work lays the foundation for fast video clip positioning and makes large-scale video clip positioning practical. Experimental results on two public datasets demonstrate the effectiveness of the method.
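The "simple XOR operation among codes" boils down to Hamming similarity: XOR two binary codes and count the differing bits. A minimal sketch of that comparison (the code length and ranking step are illustrative, not the paper's full system):

```python
def hamming_similarity(code_a, code_b, n_bits=64):
    """Bits in common between two hash codes, via a single XOR:
    n_bits - popcount(a ^ b)."""
    return n_bits - bin(code_a ^ code_b).count("1")

def best_clip(query_code, clip_codes, n_bits=64):
    """Index of the clip whose hash code is closest to the query."""
    return max(range(len(clip_codes)),
               key=lambda i: hamming_similarity(query_code,
                                                clip_codes[i], n_bits))
```

Because XOR and popcount are single machine instructions on packed codes, ranking millions of clips this way is dramatically cheaper than running a cross-modal network per video.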

16 pages, 3532 KiB  
Article
Few-Shot Learning Based on Double Pooling Squeeze and Excitation Attention
by Qiuyu Xu, Jie Su, Ying Wang, Jing Zhang and Yixin Zhong
Electronics 2023, 12(1), 27; https://doi.org/10.3390/electronics12010027 - 21 Dec 2022
Cited by 2 | Viewed by 1899
Abstract
Training a generalized, reliable model is a great challenge, since sufficient labeled data are unavailable in some open application scenarios. Few-shot learning (FSL), which aims to learn new problems from only a few examples, can tackle this problem and has attracted extensive attention. This paper proposes a novel few-shot learning method based on double pooling squeeze and excitation attention (dSE), improving the discriminative ability of the model through a novel feature expression. Specifically, the proposed dSE module adopts two types of pooling to emphasize features responding to foreground object channels. We employ both a pixel descriptor and a channel descriptor to capture locally identifiable channel features and pixel features of an image, as opposed to traditional few-shot learning methods. Additionally, to improve the robustness of the model, we designed a new loss function. To verify the performance of the method, extensive experiments were performed on multiple standard few-shot image benchmark datasets, showing that our framework outperforms several existing approaches. Moreover, on three more challenging fine-grained few-shot datasets, the experimental results demonstrate that the proposed method achieves state-of-the-art performance. In particular, this work achieves 92.36% accuracy under the 5-way 5-shot classification setting of the Stanford Cars dataset.
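The "two types of pooling" in the dSE module follow the squeeze-and-excitation pattern: squeeze each channel to scalars with both average and max pooling, then gate the channel with an excitation function. The sketch below collapses the excitation MLP to a bare sigmoid for brevity, so it only illustrates the double-pooling squeeze, not the paper's exact module:

```python
import math

def dse_weights(channels):
    """Channel-attention weights from a double-pooling squeeze.

    channels is a list of 2-D feature maps (lists of rows); each
    channel is squeezed by average AND max pooling, the two are
    summed, and a sigmoid gate (standing in for the excitation MLP)
    yields the per-channel weight in (0, 1)."""
    weights = []
    for fmap in channels:
        flat = [v for row in fmap for v in row]
        avg = sum(flat) / len(flat)
        mx = max(flat)
        weights.append(1.0 / (1.0 + math.exp(-(avg + mx))))
    return weights
```

Channels with strong foreground responses (high average and max activations) receive weights near 1, which is the intended emphasis on foreground object channels.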

22 pages, 13072 KiB  
Article
Selection of Relevant Geometric Features Using Filter-Based Algorithms for Point Cloud Semantic Segmentation
by Muhammed Enes Atik and Zaide Duran
Electronics 2022, 11(20), 3310; https://doi.org/10.3390/electronics11203310 - 14 Oct 2022
Cited by 2 | Viewed by 1660
Abstract
Semantic segmentation of mobile LiDAR point clouds is an essential task in many fields such as road network management, mapping, urban planning, and 3D High Definition (HD) city maps for autonomous vehicles. This study presents an approach to improve the evaluation metrics of [...] Read more.
Semantic segmentation of mobile LiDAR point clouds is an essential task in many fields, such as road network management, mapping, urban planning, and 3D High-Definition (HD) city maps for autonomous vehicles. This study presents an approach to improving the evaluation metrics of deep-learning-based point cloud semantic segmentation using 3D geometric features and filter-based feature selection. The information gain (IG), Chi-square (Chi2), and ReliefF algorithms are used to select relevant features. RandLA-Net and Superpoint Graph (SPG), two current and effective deep learning networks, were chosen for semantic segmentation. RandLA-Net and SPG were fed the geometric features in addition to the 3D coordinates (x, y, z) directly, without any change to the structure of the point clouds. Experiments were carried out on three challenging mobile LiDAR datasets: Toronto3D, SZTAKI-CityMLS, and Paris. The study demonstrated that selecting relevant features improved accuracy on all datasets. For RandLA-Net, mean Intersection-over-Union (mIoU) was 70.1% with the features selected by Chi2 on the Toronto3D dataset, 84.1% with the features selected by IG on the SZTAKI-CityMLS dataset, and 55.2% with the features selected by IG and ReliefF on the Paris dataset. For SPG, 69.8% mIoU was obtained with Chi2 on Toronto3D, 77.5% with IG on SZTAKI-CityMLS, and 59.0% with IG and ReliefF on Paris. Full article
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
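Filter-based ranking by information gain can be illustrated with a small histogram-based scorer over candidate geometric features; the binning scheme and the helper names (`information_gain`, `select_features`) are assumptions for illustration, not the paper's code.

```python
import numpy as np

def information_gain(x, y, bins=10):
    """Information gain of one feature x w.r.t. labels y (histogram estimate)."""
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    h_y = entropy(y)
    edges = np.histogram_bin_edges(x, bins=bins)
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, bins - 1)
    h_cond = 0.0
    for b in range(bins):                 # conditional entropy given the bin
        mask = idx == b
        if mask.any():
            h_cond += mask.mean() * entropy(y[mask])
    return h_y - h_cond

def select_features(X, y, k):
    """Keep the k features (columns of X) with the highest information gain."""
    scores = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]
```

The selected column indices can then be used to subset the per-point geometric features that are appended to the (x, y, z) coordinates before they are fed to the network.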

15 pages, 2286 KiB  
Article
Anchor-Free Object Detection with Scale-Aware Networks for Autonomous Driving
by Zhengquan Piao, Junbo Wang, Linbo Tang, Baojun Zhao and Shichao Zhou
Electronics 2022, 11(20), 3303; https://doi.org/10.3390/electronics11203303 - 13 Oct 2022
Cited by 3 | Viewed by 1268
Abstract
Current anchor-free object detectors do not rely on anchors and achieve accuracy comparable to anchor-based detectors. However, anchor-free detectors that adopt a single-level feature map and omit a feature pyramid network (FPN) lack prior information about an object’s scale; thus, they adapt poorly to large variations in object scale, especially for autonomous driving in complex road scenes. To address this problem, we propose a divide-and-conquer solution and introduce prior information about object scale variation into the model while maintaining a streamlined network structure. Specifically, for small-scale objects, we add dense skip connections between the shallow high-resolution feature layers and the deep high-semantic feature layers. For large-scale objects, dilated convolution is used to cover their features. On this basis, a scale adaptation module is proposed, in which different dilation rates are utilized to change the network’s receptive field size so that it can adapt to objects from small to large scale. The experimental results show that the proposed model detects objects of different scales better than existing detectors. Full article
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
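How different dilation rates change the receptive field can be checked with a short calculation: a k×k convolution with dilation d behaves like an effective kernel of size k + (k−1)(d−1), and stacked stride-1 layers grow the receptive field additively. The helper names below are illustrative, not from the paper.

```python
def effective_kernel(k, d):
    """Effective kernel size of a k x k convolution with dilation rate d."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers):
    """Receptive field of a stack of stride-1 convs given as (kernel, dilation)."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1   # each layer widens the field
    return rf
```

For example, three 3×3 layers with dilation rates 1, 2, and 4 cover a 15-pixel receptive field at the same parameter cost as three plain 3×3 layers, which is what makes parallel dilation branches attractive for covering both small and large objects.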

10 pages, 2352 KiB  
Article
Monoscopic Phase Measuring Deflectometry Simulation and Verification
by Zhiming Li, Dayi Yin, Quan Zhang and Huixing Gong
Electronics 2022, 11(10), 1634; https://doi.org/10.3390/electronics11101634 - 20 May 2022
Cited by 2 | Viewed by 1756
Abstract
The three-dimensional (3D) shape of specular surfaces is important in aerospace, precision instrumentation, and automotive manufacturing. The phase measuring deflectometry (PMD) method is an efficient and highly accurate technique for measuring specular surfaces. A novel simulation model with simulated fringe patterns for monoscopic PMD is developed in this study. Based on the pre-calibration and the ray-tracing model of the monoscopic PMD system, a comprehensive model from deformed pattern generation to shape reconstruction was constructed. Experimental results showed that this model achieves high measuring accuracy for both planar and concave surface measurements. In planar surface measurement, the peak-to-valley (PV) and root-mean-square (RMS) values of the reconstructed shape reach 26.93 nm and 10.32 nm, respectively. In addition, the accuracy of the reconstructed concave surface reaches the micrometre scale. This work potentially fills critical gaps in monoscopic PMD simulation and provides a cost-effective method of PMD study. Full article
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
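PMD pipelines typically recover a wrapped phase from phase-shifted sinusoidal fringe images before ray tracing and shape reconstruction. As a point of reference, the standard four-step phase-shifting formula (not necessarily the exact variant used in this paper) can be sketched as:

```python
import numpy as np

def four_step_phase(i1, i2, i3, i4):
    """Wrapped phase from four fringe images with pi/2 phase shifts.

    For I_n = A + B*cos(phi + n*pi/2), n = 0..3:
    i4 - i2 = 2B*sin(phi) and i1 - i3 = 2B*cos(phi),
    so arctan2 recovers phi in (-pi, pi] independent of A and B.
    """
    return np.arctan2(i4 - i2, i1 - i3)
```

Because the background intensity A and the modulation B cancel out, the formula is robust to uneven screen brightness, which is one reason phase-shifting is preferred over single-pattern decoding.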

13 pages, 2235 KiB  
Article
Few-Shot Object Detection Method Based on Knowledge Reasoning
by Jianwei Wang and Deyun Chen
Electronics 2022, 11(9), 1327; https://doi.org/10.3390/electronics11091327 - 22 Apr 2022
Cited by 4 | Viewed by 1797
Abstract
Human beings have the ability to quickly recognize novel concepts with the help of scene semantics. Reproducing this ability is meaningful and challenging for the field of machine learning. At present, object recognition methods based on deep learning achieve excellent results using large-scale labeled data. However, the scarcity of data on novel objects significantly degrades the performance of these recognition methods. In this work, we investigated utilizing knowledge reasoning together with visual information when training a novel object detector. We trained a detector to project the image representations of objects into an embedding space. Knowledge subgraphs were extracted to describe the semantic relations of the specified visual scenes, and spatial relationships, functional relationships, and attribute descriptions were defined to enable reasoning about novel classes. The designed few-shot detector, named KR-FSD, is robust and stable to variation in the number of shots of novel objects, and it also has advantages when detecting objects in complex environments due to the flexible extensibility of knowledge graphs (KGs). Experiments on the VOC and COCO datasets showed that the detector’s performance increased significantly when the novel class was strongly associated with some of the base classes, owing to better knowledge propagation between the novel class and related groups of classes. Full article
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
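One plausible reading of combining embedding-space similarity with knowledge propagation is sketched below: class scores from visual similarity are smoothed over a knowledge-subgraph adjacency, so a data-scarce novel class borrows evidence from related base classes. The scoring scheme, names, and mixing weight `alpha` are assumptions for illustration, not the KR-FSD implementation.

```python
import numpy as np

def l2norm(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def classify_with_kg(feat, class_emb, adj, alpha=0.5):
    """Score classes by embedding similarity, smoothed over a knowledge graph.

    feat: (D,) projected image feature; class_emb: (K, D) class embeddings;
    adj: (K, K) row-normalized adjacency of the knowledge subgraph.
    """
    sims = l2norm(class_emb) @ l2norm(feat)             # cosine similarity per class
    scores = (1 - alpha) * sims + alpha * (adj @ sims)  # one propagation step
    return int(np.argmax(scores))
```

With an identity adjacency this reduces to plain nearest-embedding classification; off-diagonal edges are what let a strongly connected novel class benefit from its base-class neighbors.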

14 pages, 4257 KiB  
Article
Multi-Task Learning Using Gradient Balance and Clipping with an Application in Joint Disparity Estimation and Semantic Segmentation
by Yiyou Guo and Chao Wei
Electronics 2022, 11(8), 1217; https://doi.org/10.3390/electronics11081217 - 12 Apr 2022
Viewed by 1706
Abstract
In this paper, we propose a novel multi-task learning (MTL) strategy from the gradient optimization view that automatically learns the optimal gradient from different tasks. In contrast with current multi-task learning methods, which rely on careful network architecture adjustment or elaborate loss function optimization, the proposed gradient-based MTL is simple and flexible. Specifically, we introduce multi-task stochastic gradient descent optimization (MTSGD) to learn task-specific and shared representations in a deep neural network. In MTSGD, we decompose the total gradient into multiple task-specific sub-gradients and find the optimal sub-gradient via gradient balance and clipping operations. In this way, the learned network can satisfy task-specific optimization objectives while maintaining the shared representation. We take the joint learning of semantic segmentation and disparity estimation as an exemplar to verify the effectiveness of the proposed method. Extensive experimental results on a large-scale dataset show that our algorithm outperforms the baseline methods by a large margin. Meanwhile, we perform a series of ablation studies to analyze gradient descent for MTL in depth. Full article
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
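A simple interpretation of balancing and clipping task gradients for a shared parameter tensor can be sketched as follows; the rescale-to-mean-norm balancing rule and the function name are illustrative assumptions, not the exact MTSGD operations.

```python
import numpy as np

def balance_and_clip(task_grads, max_norm=1.0):
    """Combine per-task gradients for a shared parameter tensor.

    Balance: rescale each task gradient to the mean task-gradient norm,
    so no single task dominates the shared representation.
    Clip: bound the norm of the summed gradient for stable updates.
    """
    norms = [np.linalg.norm(g) for g in task_grads]
    target = np.mean(norms)
    balanced = [g * (target / (n + 1e-12)) for g, n in zip(task_grads, norms)]
    total = np.sum(balanced, axis=0)
    tn = np.linalg.norm(total)
    if tn > max_norm:
        total = total * (max_norm / tn)   # clip the combined gradient
    return total
```

For example, when the disparity loss produces gradients an order of magnitude larger than the segmentation loss, balancing equalizes their influence before the shared layers are updated.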

Planned Papers

The list below represents only planned manuscripts. Some of these manuscripts have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.

Title: End-to-End Aerial Image Segmentation via Deformable Attention Network
Authors: Yiyou Guo
Affiliation: Tongji University

Title: Hierarchical Bayesian Network for Video Instance Segmentation
Authors: Zheyun Qin
Affiliation: Hebei University
