Multimodal Data Fusion and Computational Optimization for Intelligent Perception

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (15 April 2024) | Viewed by 18040

Special Issue Editors


Prof. Dr. Tao Shen
Guest Editor
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
Interests: intelligent perception and computation; artificial intelligence; blockchain and industrial internet

Prof. Dr. Lei Zhang
Guest Editor
Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai 201804, China
Interests: artificial intelligence; wireless communications; multi-dimensional information processing

Prof. Dr. John Wang
Guest Editor
Feliciano School of Business, Montclair State University, Montclair, NJ 07043, USA
Interests: business analytics; data mining; supply chain management; decision support systems

Special Issue Information

Dear Colleagues, 

The concept of multimodal data fusion was introduced to break down the barriers of single-modality intelligent perception, and it offers great application prospects. Studies on multimodality have attracted much attention from both academia and industry. Multisource sensors play a complementary role, supplying information that single-sensor systems lack in complex environments. By enabling perception, interaction, and intelligent collaboration, multimodal fusion technology has raised the intelligence level of a wide range of applications, achieving higher precision in dynamic scenes on the strength of modern accelerator platforms. Multimodal data fusion is now applied across intelligent perception, providing a new path for further improvements in perception and understanding, and it has driven the adoption of related algorithms and techniques in areas such as autonomous driving, remote sensing, and the industrial internet. Nevertheless, multimodal data fusion still faces many problems and challenges, including the limited range of data types that can be fused, highly customized fusion methods, and time-consuming fusion algorithms; both the methodology of multimodal fusion and its applications need further optimization. We look forward to the latest research findings offering theories and practical solutions for multimodal data fusion and its applications.

Suggested Topics:

The topics of interest include but are not limited to:

  • Methodology of multimodal fusion;
  • Visible and infrared image fusion;
  • Image and point cloud fusion;
  • Radar–camera fusion;
  • Multimodal sensing for autonomous driving;
  • Multimodal remote sensing data fusion and application;
  • Object detection and semantic segmentation;
  • Anomaly detection;
  • Design of lightweight multimodal fusion methods;
  • Applications of multimodal data fusion.

Prof. Dr. Tao Shen
Prof. Dr. Lei Zhang
Prof. Dr. John Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • methodology of multimodal fusion
  • visible and infrared image fusion
  • image and point cloud fusion
  • radar–camera fusion
  • multimodal sensing for autonomous driving
  • multimodal remote sensing data fusion and application
  • object detection and semantic segmentation
  • anomaly detection
  • design of lightweight multimodal fusion methods
  • applications of multimodal data fusion

Published Papers (14 papers)


Research

20 pages, 928 KiB  
Article
Text-Centric Multimodal Contrastive Learning for Sentiment Analysis
by Heng Peng, Xue Gu, Jian Li, Zhaodan Wang and Hao Xu
Electronics 2024, 13(6), 1149; https://doi.org/10.3390/electronics13061149 - 21 Mar 2024
Viewed by 560
Abstract
Multimodal sentiment analysis aims to acquire and integrate sentimental cues from different modalities to identify the sentiment expressed in multimodal data. Despite the widespread adoption of pre-trained language models in recent years to enhance model performance, current research in multimodal sentiment analysis still faces several challenges. First, although pre-trained language models have significantly elevated the density and quality of text features, present models adhere to a balanced design strategy that lacks a concentrated focus on textual content. Second, prevalent feature fusion methods often hinge on spatial-consistency assumptions, neglecting essential information about modality interactions and sample relationships within the feature space. To surmount these challenges, we propose a text-centric multimodal contrastive learning framework (TCMCL). This framework centers around text and augments text features separately from audio and visual perspectives. To effectively learn feature space information from the different cross-modal augmented text features, we devised two contrastive learning tasks based on instance prediction and sentiment polarity; this promotes implicit multimodal fusion and yields more abstract and stable sentiment representations. Our model surpasses the current state-of-the-art methods on both the CMU-MOSI and CMU-MOSEI datasets.
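
As a concrete illustration of the instance-prediction contrastive task, the sketch below implements a symmetric InfoNCE-style loss in which the audio-augmented and visual-augmented versions of the same text feature form a positive pair, and all other samples in the batch serve as negatives. The shapes, temperature, and function names are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def info_nce(text_audio: torch.Tensor, text_visual: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Inputs: (batch, dim) audio- and visual-augmented text features."""
    a = F.normalize(text_audio, dim=-1)
    v = F.normalize(text_visual, dim=-1)
    logits = a @ v.t() / temperature             # pairwise cosine similarities
    targets = torch.arange(a.size(0))            # i-th row matches i-th column
    # symmetric objective: each augmented view should retrieve its counterpart
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = info_nce(torch.randn(8, 256), torch.randn(8, 256))
```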

19 pages, 917 KiB  
Article
Hybrid Uncertainty Calibration for Multimodal Sentiment Analysis
by Qiuyu Pan and Zuqiang Meng
Electronics 2024, 13(3), 662; https://doi.org/10.3390/electronics13030662 - 05 Feb 2024
Viewed by 592
Abstract
In open environments, multimodal sentiment analysis (MSA) often suffers from low-quality data and can be disrupted by noise, inherent defects, and outliers. In some cases, unreasonable multimodal fusion methods can perform worse than unimodal methods. Another challenge of MSA is effectively enabling the model to provide accurate predictions when it is confident and to indicate high uncertainty when its prediction is likely to be inaccurate. In this paper, we propose an uncertain-aware late fusion based on hybrid uncertainty calibration (ULF-HUC). Firstly, we conduct in-depth research on the issue of sentiment polarity distribution in MSA datasets, establishing a foundation for an uncertain-aware late fusion method, which facilitates organic fusion of modalities. Then, we propose a hybrid uncertainty calibration method based on evidential deep learning (EDL) that balances accuracy and uncertainty, supporting the reduction of uncertainty in each modality of the model. Finally, we add two common types of noise to validate the effectiveness of our proposed method. We evaluate our model on three publicly available MSA datasets (MVSA-Single, MVSA-Multiple, and MVSA-Single-Small). Our method outperforms state-of-the-art approaches in terms of accuracy, weighted F1 score, and expected uncertainty calibration error (UCE), proving its effectiveness.
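
For readers unfamiliar with evidential deep learning, the following minimal sketch shows the Dirichlet-based quantities such a calibration method builds on: per-class evidence determines both the expected class probabilities and a scalar uncertainty for each modality, which a late-fusion rule can then use to down-weight unreliable modalities. The classifier and shapes are placeholders, not the paper's model.

```python
import torch
import torch.nn.functional as F

def edl_outputs(logits: torch.Tensor):
    """logits: (batch, num_classes) raw outputs of one modality's classifier."""
    evidence = F.softplus(logits)                # non-negative class evidence
    alpha = evidence + 1.0                       # Dirichlet concentration
    strength = alpha.sum(dim=-1, keepdim=True)   # total evidence S
    prob = alpha / strength                      # expected class probabilities
    uncertainty = logits.size(-1) / strength     # u = K / S, in (0, 1]
    return prob, uncertainty

prob, u = edl_outputs(torch.randn(4, 3))         # e.g., 3 sentiment polarities
# a late-fusion rule can then weight each modality's vote by (1 - u)
```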

14 pages, 4464 KiB  
Article
Intelligent Visual Representation for Java Code Data in the Field of Software Engineering Based on Remote Sensing Techniques
by Dian Li, Weidong Wang and Yang Zhao
Electronics 2023, 12(24), 5009; https://doi.org/10.3390/electronics12245009 - 14 Dec 2023
Cited by 1 | Viewed by 881
Abstract
In the field of software engineering, large and complex code bases can burden developers trying to understand their structure and meaning. To reduce this burden, we consider a code base visualization method that expresses the meaning of code bases visually. Inspired by remote sensing imagery, we employ graphical representations to illustrate the semantic connections within Java code bases, aiming to help developers understand their meaning and logic. The approach is segmented into three distinct levels of analysis. First, at the project level, we visualize Java projects by portraying each file as an element within a code forest, offering a broad overview of the project's structure; this macro-view perspective aids in swiftly grasping the project's layout and hierarchy. Second, at the file level, we concentrate on individual files, using visualization techniques to highlight their unique attributes and complexities; this perspective enables a deeper understanding of each file's structure and its role within the larger project. Finally, at the component level, our focus shifts to the detailed analysis of Java methods and classes, which we examine for complexity and other specific characteristics, providing insights that are crucial for optimizing code and enhancing software quality. By integrating remote sensing techniques, our method offers software engineers deeper insights into code quality, enhancing the software development lifecycle and its outcomes.

14 pages, 3240 KiB  
Article
Research on Lightweight-Based Algorithm for Detecting Distracted Driving Behaviour
by Chengcheng Lou and Xin Nie
Electronics 2023, 12(22), 4640; https://doi.org/10.3390/electronics12224640 - 14 Nov 2023
Viewed by 764
Abstract
To address the shortcomings of existing distracted-driving detection algorithms, such as low recognition accuracy, high missed-detection and false-recognition rates, and poor real-time performance, and to achieve high-precision real-time detection of common distracted driving behaviours (mobile phone use, smoking, and drinking), this paper proposes a driver distracted driving behaviour recognition algorithm based on YOLOv5. Firstly, to improve real-time performance, the computational load and parameter count of the network are reduced by introducing the lightweight GhostNet. Secondly, GSConv is used to reduce the complexity of the algorithm and to balance its recognition speed and accuracy. Then, to tackle missed and misidentified cigarettes during detection, the Soft-NMS algorithm is used to reduce missed and false detections of cigarettes without changing the computational complexity. Finally, to better detect the target of interest, the CBAM attention module is utilised to strengthen the algorithm's focus on that target. Experiments show that, on a self-built distracted driving behaviour dataset, the improved model raises the mAP@0.5 of YOLOv5s by 1.5 percentage points while reducing the computational volume by 7.6 GFLOPs, improving the accuracy of distracted driving behaviour recognition while preserving real-time detection speed.
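
The sketch below shows plain linear Soft-NMS, the rescoring idea the paper applies to cigarette detections: boxes that overlap the current top detection have their confidence decayed in proportion to the overlap instead of being deleted outright. The thresholds and decay schedule are generic defaults, not the paper's settings.

```python
import numpy as np

def iou(box, boxes):
    """IoU of one box (4,) against an array of boxes (N, 4), xyxy format."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, iou_thresh=0.3, score_thresh=0.001):
    """Linear Soft-NMS. boxes: (N, 4) xyxy; scores: (N,). Returns kept indices."""
    scores = scores.astype(float).copy()
    candidates = list(range(len(scores)))
    keep = []
    while candidates:
        best = max(candidates, key=lambda i: scores[i])
        if scores[best] < score_thresh:
            break
        keep.append(best)
        candidates.remove(best)
        for i in candidates:
            overlap = iou(boxes[best], boxes[i][None])[0]
            if overlap > iou_thresh:
                scores[i] *= 1.0 - overlap   # decay instead of hard suppression
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
print(soft_nms(boxes, np.array([0.9, 0.8, 0.7])))   # box 1 is decayed, not dropped
```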

15 pages, 2400 KiB  
Article
Multimodal Sentiment Analysis in Realistic Environments Based on Cross-Modal Hierarchical Fusion Network
by Ju Huang, Pengtao Lu, Shuifa Sun and Fangyi Wang
Electronics 2023, 12(16), 3504; https://doi.org/10.3390/electronics12163504 - 18 Aug 2023
Cited by 3 | Viewed by 1890
Abstract
In the real world, multimodal sentiment analysis (MSA) enables the capture and analysis of sentiments by fusing multimodal information, thereby enhancing the understanding of real-world environments. The key challenges lie in handling the noise in the acquired data and achieving effective multimodal fusion. When processing the noise in data, existing methods combine multimodal features to mitigate errors in sentiment word recognition caused by the performance limitations of automatic speech recognition (ASR) models; however, the question of how to utilize and combine different modalities more efficiently to address data noise remains open. In multimodal fusion, most existing methods adapt poorly to the feature differences between modalities, making it difficult to capture the potentially complex nonlinear interactions between them. To overcome these issues, this paper proposes a new framework named multimodal-word-refinement and cross-modal-hierarchy (MWRCMH) fusion. Specifically, we utilized a multimodal word correction module to reduce sentiment word recognition errors caused by ASR. During multimodal fusion, we designed a cross-modal hierarchical fusion module that employs cross-modal attention mechanisms to fuse features between pairs of modalities, yielding fused bimodal feature information. The obtained bimodal information and the unimodal information are then fused through a nonlinear layer to obtain the final multimodal sentiment features. Experimental results on the MOSI-SpeechBrain, MOSI-IBM, and MOSI-iFlytek datasets demonstrate that the proposed approach outperforms multiple baselines and other comparative methods, achieving Has0-F1 scores of 76.43%, 80.15%, and 81.93%, respectively.
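
A minimal sketch of the kind of cross-modal attention fusion described, assuming 128-dimensional text and audio feature sequences: one modality queries the other to form bimodal features, and pooled unimodal and bimodal vectors then pass through a nonlinear layer. This is a generic pairwise building block, not the authors' MWRCMH module.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mix = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, text, audio):
        """text/audio: (batch, seq, dim) unimodal feature sequences."""
        # text queries attend over audio keys/values -> bimodal features
        bimodal, _ = self.attn(query=text, key=audio, value=audio)
        pooled = [x.mean(dim=1) for x in (text, audio, bimodal)]
        # nonlinear layer fuses unimodal and bimodal information
        return self.mix(torch.cat(pooled, dim=-1))

fused = CrossModalFusion()(torch.randn(2, 10, 128), torch.randn(2, 20, 128))
```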

19 pages, 9015 KiB  
Article
A Deep Learning-Enhanced Multi-Modal Sensing Platform for Robust Human Object Detection and Tracking in Challenging Environments
by Peng Cheng, Zinan Xiong, Yajie Bao, Ping Zhuang, Yunqi Zhang, Erik Blasch and Genshe Chen
Electronics 2023, 12(16), 3423; https://doi.org/10.3390/electronics12163423 - 12 Aug 2023
Cited by 1 | Viewed by 1769
Abstract
In modern security situations, tracking multiple human objects in real-time within challenging urban environments is a critical capability for enhancing situational awareness, minimizing response time, and increasing overall operational effectiveness. Tracking multiple entities enables informed decision-making, risk mitigation, and the safeguarding of civil-military operations to ensure safety and mission success. This paper presents a multi-modal electro-optical/infrared (EO/IR) and radio frequency (RF) fused sensing (MEIRFS) platform for real-time human object detection, recognition, classification, and tracking in challenging environments. By utilizing different sensors in a complementary manner, the robustness of the sensing system is enhanced, enabling reliable detection and recognition results across various situations. Specifically designed radar tags and thermal tags can be used to discriminate between friendly and non-friendly objects. The system incorporates deep learning-based image fusion and human object recognition and tracking (HORT) algorithms to ensure accurate situation assessment. After integrating into an all-terrain robot, multiple ground tests were conducted to verify the consistency of the HORT in various environments. The MEIRFS sensor system has been designed to meet the Size, Weight, Power, and Cost (SWaP-C) requirements for installation on autonomous ground and aerial vehicles.

15 pages, 2857 KiB  
Article
MRI Image Fusion Based on Sparse Representation with Measurement of Patch-Based Multiple Salient Features
by Qiu Hu, Weiming Cai, Shuwen Xu and Shaohai Hu
Electronics 2023, 12(14), 3058; https://doi.org/10.3390/electronics12143058 - 12 Jul 2023
Cited by 1 | Viewed by 807
Abstract
Multimodal medical image fusion is a fundamental but challenging problem in brain science research and brain disease diagnosis, since it is difficult for sparse representation (SR)-based fusion to characterize activity levels with a single measurement without losing effective information. In this study, the Kronecker-criterion-based SR framework was applied to medical image fusion with a patch-based activity level that integrates salient features from multiple domains. Inspired by the formation process of vision systems, the spatial saliency was characterized by textural contrast (TC), composed of luminance and orientation contrasts, to promote the participation of more highlighted textural information in the fusion process. As a substitute for the conventional l1-norm-based sparse saliency, the sum of sparse salient features (SSSF) was used as a metric to promote the participation of more significant coefficients in the activity level measurement. The designed activity level measurement was verified to be more conducive to maintaining the integrity and sharpness of detailed information. Experiments on multiple groups of clinical medical images verified the effectiveness of the proposed fusion method in terms of both visual quality and objective assessment, and this study should also be helpful for the further detection and segmentation of medical images.
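
To make the activity-level idea concrete, here is a toy choose-max fusion rule over patch sparse codes in which "activity" sums only the salient coefficients rather than taking a plain l1-norm over all of them. The salience threshold and patch handling are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def activity(coeffs: np.ndarray, salience_quantile: float = 0.5) -> float:
    """Sum of sparse salient features: only coefficients above a magnitude
    threshold contribute, unlike a plain l1-norm over all coefficients."""
    mags = np.abs(coeffs)
    if not mags.any():
        return 0.0
    thresh = np.quantile(mags[mags > 0], salience_quantile)
    return float(mags[mags >= thresh].sum())

def fuse_patch(coeffs_a: np.ndarray, coeffs_b: np.ndarray) -> np.ndarray:
    """Choose-max rule: keep the sparse code of the more active patch."""
    return coeffs_a if activity(coeffs_a) >= activity(coeffs_b) else coeffs_b

fused = fuse_patch(np.array([0.0, 0.9, 0.1]), np.array([0.2, 0.0, 0.3]))
```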

13 pages, 12279 KiB  
Article
A Complex Empirical Mode Decomposition for Multivariant Traffic Time Series
by Guochen Shen and Lei Zhang
Electronics 2023, 12(11), 2476; https://doi.org/10.3390/electronics12112476 - 31 May 2023
Viewed by 999
Abstract
Data-driven modeling methods have been widely used in many applications and studies of traffic systems exhibiting complexity and chaos. The empirical mode decomposition (EMD) family provides a lightweight analytical method for non-stationary and non-linear data. However, much traffic data in practice is multidimensional, so the EMD family cannot be applied to it directly. In this paper, a method for calculating the extremum points and an envelope-like function (series) of a complex-valued function (series) is proposed, so that the EMD family can be applied to bivariate traffic time-series data. Compared to existing multivariate EMD, the proposed method has advantages in computational burden, flexibility, and adaptivity. Two-dimensional trajectory data were used to test the method, and its oscillatory characteristics were extracted. The decomposed features can be used for data-driven traffic analysis and modeling. The proposed method also extends the use of EMD to multivariate traffic data for applications such as traffic data denoising, pattern recognition, traffic flow dynamic evaluation, and traffic prediction.
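
As a rough illustration, the sketch below runs one EMD-style sifting step on a complex (bivariate) toy trajectory, locating extrema on the modulus and interpolating envelope-like curves through the corresponding complex samples. The paper's actual extremum and envelope constructions differ in detail; this only conveys the flavor of applying EMD machinery to two-variate data.

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.interpolate import CubicSpline

def cspline(x, y, xq):
    """Cubic spline for complex-valued samples (real/imag splined separately)."""
    return CubicSpline(x, y.real)(xq) + 1j * CubicSpline(x, y.imag)(xq)

t = np.linspace(0.0, 10.0, 500)
z = np.exp(2j * np.pi * t) + 0.3 * np.exp(9j * np.pi * t)   # toy 2-D trajectory

mod = np.abs(z)
hi = argrelextrema(mod, np.greater)[0]      # extremum points on the modulus
lo = argrelextrema(mod, np.less)[0]
upper = cspline(t[hi], z[hi], t)            # envelope-like complex curves
lower = cspline(t[lo], z[lo], t)
imf_candidate = z - 0.5 * (upper + lower)   # one sifting iteration
```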

11 pages, 3399 KiB  
Article
Recognition of Lane Changing Maneuvers for Vehicle Driving Safety
by Yuming Wu, Lei Zhang, Ren Lou and Xinghua Li
Electronics 2023, 12(6), 1456; https://doi.org/10.3390/electronics12061456 - 19 Mar 2023
Cited by 3 | Viewed by 1758
Abstract
The increasing number of vehicles has made traffic conditions increasingly complicated in terms of safety. Emerging autonomous vehicles (AVs) have the potential to significantly reduce crashes, and the advanced driver assistance system (ADAS) has received widespread attention. Lane keeping and lane changing are two basic driving maneuvers on highways, and identifying them effectively is very important for ADAS technology. Lane changing maneuver recognition has been studied for traffic safety for many years, and different models have been proposed. With the development of technology, machine learning has been introduced in this field with effective results. However, models that require large amounts of physical data as input, along with unaffordable sensors, drive up the cost of AV platforms and impede the development of AVs. This study proposes a lane changing maneuver recognition model based on a reduced set of physical data. Driving scenarios from a naturalistic vehicle trajectory dataset (HighD) are used for machine learning: acceleration and velocity are extracted and labeled as physical data, and the normalized features are input into a k-nearest neighbor (KNN) classification model. The trained model was applied to another set of data with good results. Based on the acceleration features, the classification accuracy for lane keeping (LK), lane changing to the left (LCL), and lane changing to the right (LCR) is 100%, 97.89%, and 96.19%, respectively.
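
The classification setup lends itself to a very short sketch: normalized acceleration/velocity features feed a k-nearest-neighbor classifier over the three maneuver labels. The synthetic arrays below stand in for features extracted from HighD trajectories, and k = 5 is an arbitrary choice rather than the paper's setting.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))         # stand-ins for accel/velocity statistics
y = rng.integers(0, 3, size=300)      # 0 = LK, 1 = LCL, 2 = LCR (placeholder)

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X[:200], y[:200])
print(model.score(X[200:], y[200:]))  # held-out accuracy
```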

15 pages, 6094 KiB  
Article
Lightweight Pedestrian Detection Based on Feature Multiplexed Residual Network
by Mengzhou Sha, Kai Zeng, Zhimin Tao, Zhifeng Wang and Quanjun Liu
Electronics 2023, 12(4), 918; https://doi.org/10.3390/electronics12040918 - 11 Feb 2023
Cited by 1 | Viewed by 1307
Abstract
As an important part of intelligent perception for autonomous driving, pedestrian detection places high demands on parameter size, real-time performance, and model accuracy. Firstly, a novel multiplexed-connection residual block is proposed to construct a lightweight network with an improved ability to extract pedestrian features. Secondly, a lightweight scalable attention module based on dilated convolution is investigated, which expands the model's local receptive field while retaining the most important feature channels. Finally, we verify the proposed model on the Caltech pedestrian dataset and the BDD100K dataset. The results show that the proposed method is superior to existing lightweight pedestrian detection methods in terms of both model size and detection performance.
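
The following PyTorch module is one plausible reading of the two ingredients named above: a depthwise dilated convolution to widen the local receptive field, plus a channel gate that re-weights feature channels so the most important ones dominate. It is a guess at the flavor of the design, not the authors' module.

```python
import torch
import torch.nn as nn

class DilatedChannelAttention(nn.Module):
    def __init__(self, channels: int, dilation: int = 2, reduction: int = 4):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, kernel_size=3,
                                 padding=dilation, dilation=dilation,
                                 groups=channels)        # depthwise keeps it light
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):
        x = self.spatial(x)              # wider receptive field via dilation
        return x * self.gate(x)          # emphasize the most important channels

out = DilatedChannelAttention(32)(torch.randn(1, 32, 64, 64))
```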

13 pages, 1324 KiB  
Article
Unsupervised Domain Adaptive Person Re-Identification Method Based on Transformer
by Xiai Yan, Shengkai Ding, Wei Zhou, Weiqi Shi and Hua Tian
Electronics 2022, 11(19), 3082; https://doi.org/10.3390/electronics11193082 - 27 Sep 2022
Viewed by 1131
Abstract
Person re-identification (ReID) is the problem of cross-camera target retrieval, and the extraction of robust and discriminant features is the key factor in correctly associating targets. A model based on convolutional neural networks (CNNs) can extract robust image features, but it moves from local information to global information only by continuously stacking convolutional layers. In contrast, a vision transformer (ViT) captures global information from the very beginning, allowing it to extract more powerful features. This paper proposes an unsupervised domain adaptive person re-identification model (ViTReID) based on the vision transformer, taking a ViT model trained on ImageNet as the pre-training weights and a transformer encoder as the feature extraction network, which makes up for some defects of CNN models. A combined loss, cross-entropy and triplet loss together with a center loss, is used to optimize the network; in addition, the person's head is evaluated and trained as a local feature combined with the global feature of the whole body, focusing attention on the head to enhance head feature information. The experimental results show that ViTReID exceeds the baseline method (SSG) by 14% (Market1501 → MSMT17) in mean average precision (mAP). In MSMT17 → Market1501, ViTReID is 1.2% higher in rank-1 (R1) accuracy than a state-of-the-art method (SPCL); in PersonX → MSMT17, the mAP is 3.1% higher than that of the MMT-dbscan method; and in PersonX → Market1501, the mAP is 1.5% higher than that of MMT-dbscan.
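
A compact sketch of the combined objective described: cross-entropy over identities plus a triplet margin loss, with a center-loss term pulling features toward learnable per-identity centers. The loss weights, feature dimension, and identity count are assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
triplet = nn.TripletMarginLoss(margin=0.3)

def center_loss(feats, labels, centers):
    """Pull each feature toward its identity's learnable center (num_ids, dim)."""
    return ((feats - centers[labels]) ** 2).sum(dim=1).mean()

def reid_loss(logits, feats, labels, anchor, positive, negative, centers,
              w_tri=1.0, w_center=5e-4):
    return (ce(logits, labels)
            + w_tri * triplet(anchor, positive, negative)
            + w_center * center_loss(feats, labels, centers))

centers = torch.randn(100, 256, requires_grad=True)   # 100 identities, dim 256
loss = reid_loss(torch.randn(8, 100), torch.randn(8, 256),
                 torch.randint(0, 100, (8,)),
                 torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256),
                 centers)
```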

12 pages, 1696 KiB  
Article
Research on Adaptive Exponential Droop Control Strategy for VSC-MTDC System
by Jianying Li, Minsheng Yang, Jianqi Li, Yunchang Xiao and Jingying Wan
Electronics 2022, 11(17), 2788; https://doi.org/10.3390/electronics11172788 - 04 Sep 2022
Cited by 3 | Viewed by 1235
Abstract
To solve the problems of large DC voltage deviations caused by power fluctuations and the poor power distribution characteristics of converters in a voltage source converter multi-terminal DC (VSC-MTDC) system under traditional droop control, this paper proposes an adaptive exponential droop control strategy. The strategy introduces a relative power deviation factor for each converter and replaces the traditional linear droop curve with a nonlinear exponential curve. Under different working conditions, each converter adaptively adjusts its droop coefficient according to the relative power deviation factor, stabilizing the DC voltage and achieving a reasonable power distribution in the MTDC system. A simulation model of a three-terminal VSC-MTDC system was established in MATLAB/Simulink, and the feasibility and effectiveness of the proposed strategy were verified.
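
A small numeric sketch of the control idea, under the assumption of a simple exponential law and the droop form V = V_ref - k(P) * (P - P_ref): the droop coefficient grows with the converter's relative power deviation, so a heavily loaded converter presents a stiffer characteristic and picks up less additional power, leaving lightly loaded converters to absorb more of the imbalance. The exact curve and sign conventions in the paper may differ.

```python
import numpy as np

def adaptive_droop_coeff(p: float, p_rated: float, k0: float = 1.0,
                         gamma: float = 3.0) -> float:
    """Droop coefficient that grows exponentially with relative loading."""
    deviation = abs(p) / p_rated                 # relative power deviation factor
    return k0 * np.exp(gamma * deviation)

# assumed droop law: V = V_ref - k(P) * (P - P_ref); a large k near the power
# limit means little extra power uptake for a given DC voltage error
for p in (0.1, 0.5, 0.9):
    print(f"P = {p:.1f} p.u. -> k = {adaptive_droop_coeff(p, 1.0):.2f}")
```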

14 pages, 6061 KiB  
Article
Clustering-Based Decision Tree for Vehicle Routing Spatio-Temporal Selection
by Yixiao Liu, Lei Zhang, Yixuan Zhou, Qin Xu, Wen Fu and Tao Shen
Electronics 2022, 11(15), 2379; https://doi.org/10.3390/electronics11152379 - 29 Jul 2022
Cited by 1 | Viewed by 1154
Abstract
The clustering-based decision tree, a methodology of multimodal fusion, has seen many achievements across many fields, yet it remains uncommon in transportation, especially in automobile navigation. Meanwhile, the concept of spatio-temporal data is now widely used. We therefore propose a vehicle routing spatio-temporal selection system based on a clustering-based decision tree. By screening and clustering spatio-temporal data, a collection of individual point data drawn from historical driving records, we can identify routes and many other features. By building a decision tree over the state information of the spatio-temporal data, which includes the features of the historical data and route selections, we can obtain an optimal result, that is, the route selected by the system. Moreover, all of the above computation is done on the edge, unlike the vast majority of current cloud-based vehicle navigation. We experimented with the system using real vehicle data: it outputs route decisions for a given situation in little time, matching the routes of networked navigation in the approximated cases, and the results were satisfactory. Our system could save a great deal of cloud computing power, which might change current navigation systems.
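
A toy sketch of the pipeline outlined above: historical spatio-temporal points are clustered into candidate routes, and a decision tree trained on trip-state features then selects among them on the edge device. All data and feature names are illustrative placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# step 1: cluster historical point data into candidate routes
points = rng.normal(size=(500, 2))                   # (x, y)-like samples
route_labels = KMeans(n_clusters=3, n_init=10).fit_predict(points)

# step 2: learn route choice from trip-state features (placeholders for,
# e.g., hour of day, day of week, congestion level at departure)
trip_state = rng.normal(size=(500, 3))
tree = DecisionTreeClassifier(max_depth=4).fit(trip_state, route_labels)

print(tree.predict(trip_state[:5]))                  # route selection on the edge
```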

13 pages, 1901 KiB  
Article
A Lightweight Method for Vehicle Classification Based on Improved Binarized Convolutional Neural Network
by Bangyuan Zhang and Kai Zeng
Electronics 2022, 11(12), 1852; https://doi.org/10.3390/electronics11121852 - 10 Jun 2022
Cited by 1 | Viewed by 1585
Abstract
Vehicle classification is an important part of intelligent transportation. Owing to the development of deep learning, better vehicle classification can be achieved than with traditional methods, but contemporary deep network models have huge computational scales and require large numbers of parameters. Binarized convolutional neural networks (CNNs) can effectively reduce a model's computational size and parameter count; however, most contemporary lightweight networks binarize a full-precision model directly, leading to shortcomings such as gradient mismatch or serious accuracy degradation. To address the inherent defects of binarized networks, we adjust and improve the residual blocks and propose a new pooling method called absolute value maximum pooling (Abs-MaxPooling). Using the information entropy of the binarized weights, we further propose a weight-distribution-based binary quantization method. A binarized CNN-based vehicle classification model is constructed, with both weights and activation values quantized to 1 bit, which saves data storage space and improves classification accuracy. The proposed binarized model performs well on the BIT-Vehicle dataset and outperforms some full-precision models.
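
The sketch below gives one reading of the two named ingredients: sign binarization to 1-bit values, and an "absolute value maximum pooling" that keeps, with its sign, the element of largest magnitude in each window. This is an interpretation of the names, not the authors' reference code.

```python
import torch
import torch.nn.functional as F

def binarize(x: torch.Tensor) -> torch.Tensor:
    """Quantize to {-1, +1} (zero maps to +1)."""
    return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

def abs_max_pool2d(x: torch.Tensor, kernel: int = 2) -> torch.Tensor:
    """Keep the signed entry with the largest |value| in each window."""
    _, idx = F.max_pool2d(x.abs(), kernel_size=kernel, return_indices=True)
    out = x.flatten(2).gather(2, idx.flatten(2))   # fetch original signed values
    return out.view_as(idx)

feats = torch.randn(1, 8, 4, 4)
pooled = abs_max_pool2d(feats)                # (1, 8, 2, 2), signs preserved
w_bin = binarize(torch.randn(16, 8, 3, 3))    # 1-bit convolution weights
```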
