Computer Vision, Image Processing Technologies and Artificial Intelligence

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (30 November 2023) | Viewed by 20150

Special Issue Editors


Guest Editor
Institute of Computing Technology, University of Chinese Academy of Sciences, Beijing 100049, China
Interests: video coding; computer vision; deep learning

Guest Editor
School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
Interests: image processing; signal processing; artificial intelligence

Guest Editor
School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China
Interests: neural networks; machine learning; computer vision and developmental robotics

Special Issue Information

Dear Colleagues,

Computer vision has expanded into a wide range of research fields in which information is extracted from visual data, including images and video. Computer vision technology now pervades modern life, with billions of people using applications built on image recognition, image processing, object detection, and related techniques, demonstrating both the necessity and the potential of research in computer vision and its applications. Advances in artificial intelligence have enabled these techniques to match or even surpass human performance on certain tasks. Nevertheless, many valuable open problems remain in the research and application of computer vision, image processing, and artificial intelligence.

This Special Issue on “Computer Vision, Image Processing Technologies and Artificial Intelligence” aims to gather original articles that advance theoretical and practical research in computer vision, image processing, and artificial intelligence, including but not limited to the following aspects and tasks:

  • image augmentation;
  • image restoration;
  • image encoding;
  • image segmentation;
  • image recognition;
  • image classification;
  • image and video retrieval;
  • image and video synthesis;
  • object detection;
  • image depiction;
  • image-to-image translation;
  • image forensics;
  • artificial intelligence applied in information security.

Dr. Honggang Qi
Dr. Yan Liu
Dr. Jun Miao
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • artificial intelligence
  • deep learning
  • machine learning
  • neural networks
  • image processing
  • vision information

Published Papers (14 papers)

Research

22 pages, 4089 KiB  
Article
General Image Manipulation Detection Using Feature Engineering and a Deep Feed-Forward Neural Network
by Sajjad Ahmed, Byungun Yoon, Sparsh Sharma, Saurabh Singh and Saiful Islam
Mathematics 2023, 11(21), 4537; https://doi.org/10.3390/math11214537 - 03 Nov 2023
Viewed by 743
Abstract
Within digital forensics, a notable emphasis is placed on the detection of the application of fundamental image-editing operators, including but not limited to median filters, average filters, contrast enhancement, resampling, and various other operations closely associated with these techniques. When conducting a historical analysis of an image that has potentially undergone various modifications in the past, it is a logical initial approach to search for alterations made by fundamental operators. This paper presents the development of a deep-learning-based system designed for the purpose of detecting fundamental manipulation operations. The research involved training a multilayer perceptron using a feature set of 36 dimensions derived from the gray-level co-occurrence matrix, gray-level run-length matrix, and normalized streak area. The system detected median filtering, mean filtering, the introduction of additive white Gaussian noise, and the application of JPEG compression in digital images. Our system, which utilizes a multilayer perceptron trained with a 36-feature set, achieved an accuracy of 99.46% and outperformed state-of-the-art deep-learning-based solutions, which achieved an accuracy of 97.89%. Full article
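
As a rough illustration of the pipeline this abstract describes (hand-crafted texture statistics feeding a feed-forward classifier), the sketch below extracts a few gray-level co-occurrence matrix (GLCM) statistics with scikit-image and trains a small multilayer perceptron with scikit-learn. It does not reproduce the paper's 36-dimensional feature set or network configuration; the feature subset and hyperparameters are illustrative assumptions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.neural_network import MLPClassifier

def glcm_features(gray_u8):
    """A handful of GLCM statistics (an illustrative subset of the paper's 36-D feature set)."""
    glcm = graycomatrix(gray_u8, distances=[1, 2], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])

def train_detector(patches, labels):
    """patches: grayscale uint8 arrays; labels: e.g. {original, median, mean, AWGN, JPEG}."""
    X = np.stack([glcm_features(p) for p in patches])
    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
    clf.fit(X, labels)
    return clf
```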

19 pages, 5930 KiB  
Article
EHFP-GAN: Edge-Enhanced Hierarchical Feature Pyramid Network for Damaged QR Code Reconstruction
by Jianhua Zheng, Ruolin Zhao, Zhongju Lin, Shuangyin Liu, Rong Zhu, Zihao Zhang, Yusha Fu and Junde Lu
Mathematics 2023, 11(20), 4349; https://doi.org/10.3390/math11204349 - 19 Oct 2023
Viewed by 1022
Abstract
In practical usage, QR codes often become difficult to recognize due to damage. Traditional restoration methods exhibit a limited effectiveness for severely damaged or densely encoded QR codes, are time-consuming, and have limitations in addressing extensive information loss. To tackle these challenges, we propose a two-stage restoration model named the EHFP-GAN, comprising an edge restoration module and a QR code reconstruction module. The edge restoration module guides subsequent restoration by repairing the edge images, resulting in finer edge details. The hierarchical feature pyramid within the QR code reconstruction module enhances the model’s global image perception. Using our custom dataset, we compare the EHFP-GAN against several mainstream image processing models. The results demonstrate the exceptional restoration performance of the EHFP-GAN model. Specifically, across various levels of contamination, the EHFP-GAN achieves significant improvements in the recognition rate and image quality metrics, surpassing the comparative models. For instance, under mild contamination, the EHFP-GAN achieves a recognition rate of 95.35%, while under a random contamination, it reaches 31.94%, both outperforming the comparative models. In conclusion, the EHFP-GAN model demonstrates remarkable efficacy in the restoration of damaged QR codes. Full article
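
To make the two-stage idea concrete (repair an edge map first, then let the repaired edges guide full reconstruction), the sketch below wires two placeholder generators together in PyTorch. The TinyGenerator bodies and the channel layout are assumptions for illustration only and are not the EHFP-GAN architecture.

```python
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Placeholder stand-in for a U-Net-style generator (not the EHFP-GAN backbone)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1), nn.Sigmoid(),
        )
    def forward(self, x):
        return self.net(x)

edge_net = TinyGenerator(in_ch=2, out_ch=1)   # damaged QR + damaged edge map -> repaired edges
recon_net = TinyGenerator(in_ch=2, out_ch=1)  # damaged QR + repaired edges -> restored QR

def restore(damaged_qr, damaged_edges):
    """Stage 1 restores the edge image; stage 2 reconstructs the QR code guided by it."""
    edges = edge_net(torch.cat([damaged_qr, damaged_edges], dim=1))
    return recon_net(torch.cat([damaged_qr, edges], dim=1))
```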

22 pages, 63373 KiB  
Article
Plant Image Classification with Nonlinear Motion Deblurring Based on Deep Learning
by Ganbayar Batchuluun, Jin Seong Hong, Abdul Wahid and Kang Ryoung Park
Mathematics 2023, 11(18), 4011; https://doi.org/10.3390/math11184011 - 21 Sep 2023
Viewed by 840
Abstract
Despite the significant number of classification studies conducted using plant images, studies on nonlinear motion blur are limited. In general, motion blur results from movements of the hands of a person holding a camera for capturing plant images, or when the plant moves owing to wind while the camera is stationary. When these two cases occur simultaneously, nonlinear motion blur is highly probable. Therefore, a novel deep learning-based classification method applied on plant images with various nonlinear motion blurs is proposed. In addition, this study proposes a generative adversarial network-based method to reduce nonlinear motion blur; accordingly, the method is explored for improving classification performance. Herein, experiments are conducted using a self-collected visible light images dataset. Evidently, nonlinear motion deblurring results in a structural similarity index measure (SSIM) of 73.1 and a peak signal-to-noise ratio (PSNR) of 21.55, whereas plant classification results in a top-1 accuracy of 90.09% and F1-score of 84.84%. In addition, the experiment conducted using two types of open datasets resulted in PSNRs of 20.84 and 21.02 and SSIMs of 72.96 and 72.86, respectively. The proposed method of plant classification results in top-1 accuracies of 89.79% and 82.21% and F1-scores of 84% and 76.52%, respectively. Thus, the proposed network produces higher accuracies than the existing state-of-the-art methods. Full article
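
The reported image-quality figures (PSNR and SSIM) are standard full-reference metrics; a minimal sketch of how such scores can be computed with scikit-image, assuming 8-bit images, is given below.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def deblur_quality(reference_u8, restored_u8):
    """PSNR and SSIM between a sharp reference plant image and a deblurred result (both uint8)."""
    psnr = peak_signal_noise_ratio(reference_u8, restored_u8, data_range=255)
    ssim = structural_similarity(reference_u8, restored_u8, data_range=255,
                                 channel_axis=-1 if reference_u8.ndim == 3 else None)
    return psnr, ssim
```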

23 pages, 11979 KiB  
Article
Multi-Focus Image Fusion via PAPCNN and Fractal Dimension in NSST Domain
by Ming Lv, Zhenhong Jia, Liangliang Li and Hongbing Ma
Mathematics 2023, 11(18), 3803; https://doi.org/10.3390/math11183803 - 05 Sep 2023
Viewed by 670
Abstract
Multi-focus image fusion is a popular technique for generating a full-focus image, where all objects in the scene are clear. In order to achieve a clearer and fully focused fusion effect, in this paper, the multi-focus image fusion method based on the parameter-adaptive pulse-coupled neural network and fractal dimension in the nonsubsampled shearlet transform domain was developed. The parameter-adaptive pulse coupled neural network-based fusion rule was used to merge the low-frequency sub-bands, and the fractal dimension-based fusion rule via the multi-scale morphological gradient was used to merge the high-frequency sub-bands. The inverse nonsubsampled shearlet transform was used to reconstruct the fused coefficients, and the final fused multi-focus image was generated. We conducted comprehensive evaluations of our algorithm using the public Lytro dataset. The proposed method was compared with state-of-the-art fusion algorithms, including traditional and deep-learning-based approaches. The quantitative and qualitative evaluations demonstrated that our method outperformed other fusion algorithms, as evidenced by the metrics data such as QAB/F, QE, QFMI, QG, QNCIE, QP, QMI, QNMI, QY, QAG, QPSNR, and QMSE. These results highlight the clear advantages of our proposed technique in multi-focus image fusion, providing a significant contribution to the field. Full article
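
A heavily simplified view of this kind of fusion scheme (one rule for the low-frequency band, another for the high-frequency bands) can be illustrated with a single-level discrete wavelet transform standing in for the nonsubsampled shearlet transform, and averaging / maximum-absolute rules standing in for the PAPCNN and fractal-dimension rules. The sketch below is a conceptual stand-in, not the proposed method.

```python
import numpy as np
import pywt

def fuse_multifocus(img_a, img_b, wavelet="db2"):
    """Toy multi-focus fusion: average the low-pass band, keep the larger-magnitude detail coefficients."""
    low_a, high_a = pywt.dwt2(img_a, wavelet)
    low_b, high_b = pywt.dwt2(img_b, wavelet)
    low = 0.5 * (low_a + low_b)                               # stand-in for the PAPCNN low-frequency rule
    high = tuple(np.where(np.abs(ha) >= np.abs(hb), ha, hb)   # stand-in for the fractal-dimension rule
                 for ha, hb in zip(high_a, high_b))
    return pywt.idwt2((low, high), wavelet)
```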

20 pages, 12456 KiB  
Article
ATC-YOLOv5: Fruit Appearance Quality Classification Algorithm Based on the Improved YOLOv5 Model for Passion Fruits
by Changhong Liu, Weiren Lin, Yifeng Feng, Ziqing Guo and Zewen Xie
Mathematics 2023, 11(16), 3615; https://doi.org/10.3390/math11163615 - 21 Aug 2023
Cited by 1 | Viewed by 1773
Abstract
Passion fruit, renowned for its significant nutritional, medicinal, and economic value, is extensively cultivated in subtropical regions such as China, India, and Vietnam. In the production and processing industry, the quality grading of passion fruit plays a crucial role in the supply chain. However, the current process relies heavily on manual labor, resulting in inefficiency and high costs, which reflects the importance of expanding the application of fruit appearance quality classification mechanisms based on computer vision. Moreover, the existing passion fruit detection algorithms mainly focus on real-time detection and overlook the quality-classification aspect. This paper proposes the ATC-YOLOv5 model based on deep learning for passion fruit detection and quality classification. First, an improved Asymptotic Feature Pyramid Network (AFPN) is utilized as the feature-extraction network, which we modify in this study by adding weighted feature concatenation pathways. This optimization enhances the feature flow between different levels and nodes, allowing for the adaptive and asymptotic fusion of richer feature information related to passion fruit quality. Second, the Transformer Cross Stage Partial (TRCSP) layer is constructed based on the introduction of the Multi-Head Self-Attention (MHSA) layer in the Cross Stage Partial (CSP) layer, enabling the network to achieve better performance in modeling long-range dependencies. In addition, the Coordinate Attention (CA) mechanism is introduced to enhance the network’s learning capacity for both local and non-local information, as well as the fine-grained features of passion fruit. Moreover, to validate the performance of the proposed model, a self-made passion fruit dataset is constructed to classify passion fruit into four quality grades. The original YOLOv5 serves as the baseline model. According to the experimental results, the mean average precision (mAP) of ATC-YOLOv5 reaches 95.36%, and the mean detection time (mDT) is 3.2 ms, which improves the mAP by 4.83% and the detection speed by 11.1%, and the number of parameters is reduced by 10.54% compared to the baseline, maintaining the lightweight characteristics while improving the accuracy. These experimental results validate the high detection efficiency of the proposed model for fruit quality classification, contributing to the realization of intelligent agriculture and fruit industries. Full article
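
Coordinate Attention, one of the components mentioned above, factorizes global pooling into two directional poolings so the attention weights retain positional information along height and width. The PyTorch sketch below follows the general published CA design in simplified form and is not the exact ATC-YOLOv5 configuration.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Simplified coordinate attention: pool along H and W separately, then re-weight the feature map."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        pooled_h = x.mean(dim=3, keepdim=True)                      # N x C x H x 1
        pooled_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # N x C x W x 1
        y = torch.cat([pooled_h, pooled_w], dim=2)                  # N x C x (H+W) x 1
        y = self.act(self.bn(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                       # attention along height
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))   # attention along width
        return x * a_h * a_w
```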

21 pages, 9369 KiB  
Article
Raindrop-Removal Image Translation Using Target-Mask Network with Attention Module
by Hyuk-Ju Kwon and Sung-Hak Lee
Mathematics 2023, 11(15), 3318; https://doi.org/10.3390/math11153318 - 28 Jul 2023
Cited by 2 | Viewed by 1121
Abstract
Image processing plays a crucial role in improving the performance of models in various fields such as autonomous driving, surveillance cameras, and multimedia. However, capturing ideal images under favorable lighting conditions is not always feasible, particularly in challenging weather conditions such as rain, fog, or snow, which can impede object recognition. This study aims to address this issue by focusing on generating clean images by restoring raindrop-deteriorated images. Our proposed model comprises a raindrop-mask network and a raindrop-removal network. The raindrop-mask network is based on a U-Net architecture, which learns the location, shape, and brightness of raindrops. The rain-removal network is a generative adversarial network based on U-Net and comprises two attention modules: the raindrop-mask module and the residual convolution block module. These modules are employed to locate raindrop areas and restore the affected regions. Multiple loss functions are utilized to enhance model performance. The image-quality assessment metrics of the proposed method, such as SSIM, PSNR, CEIQ, NIQE, FID, and LPIPS scores, are 0.832, 26.165, 3.351, 2.224, 20.837, and 0.059, respectively. Comparative evaluations against state-of-the-art models demonstrate the superiority of our proposed model based on qualitative and quantitative results. Full article
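
One plausible way to couple the predicted raindrop mask with restoration is to weight the reconstruction loss more heavily inside predicted raindrop regions. The sketch below illustrates such a composite loss; it is an assumption for illustration, not the paper's exact loss design.

```python
import torch.nn.functional as F

def raindrop_restoration_loss(pred, target, raindrop_mask, mask_weight=5.0):
    """L1 reconstruction loss with extra weight on predicted raindrop regions.

    pred, target: N x 3 x H x W restored / clean images; raindrop_mask: N x 1 x H x W in [0, 1].
    """
    per_pixel = F.l1_loss(pred, target, reduction="none")
    weights = 1.0 + mask_weight * raindrop_mask   # emphasize raindrop areas
    return (weights * per_pixel).mean()
```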

18 pages, 8052 KiB  
Article
Neural Rendering-Based 3D Scene Style Transfer Method via Semantic Understanding Using a Single Style Image
by Jisun Park and Kyungeun Cho
Mathematics 2023, 11(14), 3243; https://doi.org/10.3390/math11143243 - 24 Jul 2023
Viewed by 1659
Abstract
In the rapidly emerging era of untact (“contact-free”) technologies, the requirement for three-dimensional (3D) virtual environments utilized in virtual reality (VR)/augmented reality (AR) and the metaverse has seen significant growth, owing to their extensive application across various domains. Current research focuses on the automatic transfer of the style of rendering images within a 3D virtual environment using artificial intelligence, which aims to minimize human intervention. However, the prevalent studies on rendering-based 3D environment-style transfers have certain inherent limitations. First, the training of a style transfer network dedicated to 3D virtual environments demands considerable style image data. These data must align with viewpoints that closely resemble those of the virtual environment. Second, there is noticeable inconsistency within the 3D structures: predominant studies often neglect 3D scene geometry information, relying solely on 2D input image features. Finally, style adaptation fails to accommodate the unique characteristics inherent in each object. To address these issues, we propose a novel approach: a neural rendering-based 3D scene-style conversion technique. This methodology employs semantic nearest-neighbor feature matching, thereby facilitating the transfer of style within a 3D scene while considering the distinctive characteristics of each object, even when employing a single style image. The neural radiance field enables the network to comprehend the geometric information of a 3D scene in relation to its viewpoint. Subsequently, it transfers style features by employing the unique features of a single style image via semantic nearest-neighbor feature matching. In an empirical context, our proposed semantic 3D scene style transfer method was applied to 3D scene style transfers for both interior and exterior environments. This application utilizes the Replica, 3DFront, and Tanks and Temples datasets for testing. The results illustrate that the proposed methodology surpasses existing style transfer techniques in terms of maintaining 3D viewpoint consistency, style uniformity, and semantic coherence. Full article
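
The core matching step (for each content feature, pick the most similar style feature, optionally restricted to the same semantic class) can be sketched as cosine-similarity nearest-neighbor lookup. Shapes and the class-masking behavior below are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def nearest_neighbor_style_features(content_feats, style_feats,
                                    content_labels=None, style_labels=None):
    """For each content feature vector, return its closest style feature by cosine similarity.

    content_feats: Nc x D, style_feats: Ns x D; optional per-feature semantic labels (Nc, Ns).
    """
    c = F.normalize(content_feats, dim=1)
    s = F.normalize(style_feats, dim=1)
    sim = c @ s.t()                                        # Nc x Ns cosine similarities
    if content_labels is not None and style_labels is not None:
        same_class = content_labels[:, None] == style_labels[None, :]
        sim = sim.masked_fill(~same_class, float("-inf"))  # only match within the same semantic class
    idx = sim.argmax(dim=1)
    return style_feats[idx]
```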

17 pages, 10993 KiB  
Article
An End-to-End Framework Based on Vision-Language Fusion for Remote Sensing Cross-Modal Text-Image Retrieval
by Liu He, Shuyan Liu, Ran An, Yudong Zhuo and Jian Tao
Mathematics 2023, 11(10), 2279; https://doi.org/10.3390/math11102279 - 13 May 2023
Cited by 4 | Viewed by 1536
Abstract
Remote sensing cross-modal text-image retrieval (RSCTIR) has recently attracted extensive attention due to its advantages of fast extraction of remote sensing image information and flexible human–computer interaction. Traditional RSCTIR methods mainly focus on improving the performance of uni-modal feature extraction separately, and most rely on pre-trained object detectors to obtain better local feature representation, which not only lack multi-modal interaction information, but also cause a training gap between the pre-trained object detector and the retrieval task. In this paper, we propose an end-to-end RSCTIR framework based on vision-language fusion (EnVLF) consisting of two uni-modal (vision and language) encoders and a multi-modal encoder which can be optimized by multitask training. Specifically, to achieve an end-to-end training process, we introduce a vision transformer module for image local features instead of a pre-trained object detector. By semantic alignment of visual and text features, the vision transformer module achieves the same performance as pre-trained object detectors for image local features. In addition, the trained multi-modal encoder can improve the top-one and top-five ranking performances after retrieval processing. Experiments on common RSICD and RSITMD datasets demonstrate that our EnVLF can obtain state-of-the-art retrieval performance. Full article
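
Top-one and top-five ranking performance is commonly reported as recall@K computed from a similarity matrix between query and gallery embeddings; a minimal evaluation sketch, assuming one matching gallery item per query, is shown below.

```python
import numpy as np

def recall_at_k(similarity, k):
    """similarity: Nq x Ng matrix; query i's ground-truth gallery item is assumed to be index i."""
    ranks = np.argsort(-similarity, axis=1)   # gallery indices sorted by decreasing similarity
    hits = (ranks[:, :k] == np.arange(similarity.shape[0])[:, None]).any(axis=1)
    return hits.mean()

# Example: sim = text_embeddings @ image_embeddings.T (L2-normalized), then
# r1, r5 = recall_at_k(sim, 1), recall_at_k(sim, 5)
```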

21 pages, 8810 KiB  
Article
MBDM: Multinational Banknote Detecting Model for Assisting Visually Impaired People
by Chanhum Park and Kang Ryoung Park
Mathematics 2023, 11(6), 1392; https://doi.org/10.3390/math11061392 - 13 Mar 2023
Viewed by 1248
Abstract
With the proliferation of smartphones and advancements in deep learning technologies, object recognition using built-in smartphone cameras has become possible. One application of this technology is to assist visually impaired individuals through the banknote detection of multiple national currencies. Previous studies have focused on single-national banknote detection; in contrast, this study addressed the practical need for the detection of banknotes of any nationality. To this end, we propose a multinational banknote detection model (MBDM) and a method for multinational banknote detection based on mosaic data augmentation. The effectiveness of the MBDM is demonstrated through evaluation on a Korean won (KRW) banknote and coin database built using a smartphone camera, a US dollar (USD) and Euro banknote database, and a Jordanian dinar (JOD) database that is an open database. The results show that the MBDM achieves an accuracy of 0.8396, a recall value of 0.9334, and an F1 score of 0.8840, outperforming state-of-the-art methods. Full article
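
Mosaic data augmentation stitches four training images into one composite around a random center point so that each sample mixes scales and backgrounds. The image-only sketch below (bounding-box remapping omitted) illustrates the idea; it is not the MBDM training pipeline.

```python
import numpy as np

def mosaic(images, out_size=640, seed=None):
    """Stitch four images (each at least out_size x out_size, HxWx3 uint8) around a random center."""
    assert len(images) == 4
    rng = np.random.default_rng(seed)
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # gray padding, a common choice
    cx, cy = rng.integers(out_size // 4, 3 * out_size // 4, size=2)
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x0, y0, x1, y1) in zip(images, regions):
        canvas[y0:y1, x0:x1] = img[: y1 - y0, : x1 - x0]  # naive crop; real pipelines resize and remap boxes
    return canvas
```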

16 pages, 11259 KiB  
Article
A Fuzzy Plug-and-Play Neural Network-Based Convex Shape Image Segmentation Method
by Xuyuan Zhang, Yu Han, Sien Lin and Chen Xu
Mathematics 2023, 11(5), 1101; https://doi.org/10.3390/math11051101 - 22 Feb 2023
Cited by 2 | Viewed by 1093
Abstract
The task of partitioning convex shape objects from images is a hot research topic, since this kind of object can be widely found in natural images. The difficulties in achieving this task lie in the fact that these objects are usually partly interrupted by undesired background scenes. To estimate the whole boundaries of these objects, different neural networks are designed to ensure the convexity of corresponding image segmentation results. To make use of well-trained neural networks to promote the performances of convex shape image segmentation tasks, in this paper a new image segmentation model is proposed in the variational framework. In this model, a fuzzy membership function, instead of a classical binary label function, is employed to indicate image regions. To ensure fuzzy membership functions can approximate to binary label functions well, an edge-preserving smoothness regularizer is constructed from an off-the-shelf plug-and-play network denoiser, since an image denoising process can also be seen as an edge-preserving smoothing process. From the numerical results, our proposed method could generate better segmentation results on real images, and our image segmentation results were less affected by the initialization of our method than the results obtained from classical methods. Full article
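
The plug-and-play idea is to alternate a data-fidelity update on the fuzzy membership function with an off-the-shelf denoiser acting as the edge-preserving smoothness regularizer. The toy iteration below uses a Gaussian filter as a stand-in denoiser and a simple two-phase region term; it omits the convex-shape constraint and the learned network denoiser used in the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pnp_fuzzy_segmentation(image, c_fg, c_bg, steps=50, tau=0.2, sigma=1.5):
    """Toy plug-and-play segmentation: gradient step on a two-phase data term, then smooth the membership."""
    u = np.full(image.shape, 0.5)                              # fuzzy membership in [0, 1]
    for _ in range(steps):
        data_grad = (image - c_fg) ** 2 - (image - c_bg) ** 2  # Chan-Vese-style region term
        u = u - tau * data_grad
        u = gaussian_filter(u, sigma)                          # stand-in for the plug-and-play network denoiser
        u = np.clip(u, 0.0, 1.0)
    return u > 0.5                                             # binarize the fuzzy membership
```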

15 pages, 7152 KiB  
Article
COCM: Co-Occurrence-Based Consistency Matching in Domain-Adaptive Segmentation
by Siyu Zhu, Yingjie Tian, Fenfen Zhou, Kunlong Bai and Xiaoyu Song
Mathematics 2022, 10(23), 4468; https://doi.org/10.3390/math10234468 - 26 Nov 2022
Viewed by 973
Abstract
This paper focuses on domain adaptation in a semantic segmentation task. Traditional methods regard the source domain and the target domain as a whole, and the image matching is determined by random seeds, leading to a low degree of consistency matching between domains and interfering with the reduction in the domain gap. Therefore, we designed a two-step, three-level cascaded domain consistency matching strategy—co-occurrence-based consistency matching (COCM)—in which the two steps are: Step 1, in which we design a matching strategy from the perspective of category existence and filter the sub-image set with the highest degree of matching from the image of the whole source domain, and Step 2, in which, from the perspective of spatial existence, we propose a method of measuring the PIOU score to quantitatively evaluate the spatial matching of co-occurring categories in the sub-image set and select the best-matching source image. The three levels mean that in order to improve the importance of low-frequency categories in the matching process, we divide the categories into three levels according to the frequency of co-occurrences between domains; these three levels are the head, middle, and tail levels, and priority is given to matching tail categories. The proposed COCM maximizes the category-level consistency between the domains and has been proven to be effective in reducing the domain gap while being lightweight. The experimental results on general datasets can be compared with those of state-of-the-art (SOTA) methods. Full article
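
The first matching step, scoring how well the category sets of a source image and a target image overlap, can be expressed as a weighted set-overlap score in which rarer (tail) categories receive larger weights. The sketch below uses a Jaccard-style score with hypothetical weights, not the paper's exact formulation or its PIOU spatial measure.

```python
def category_match_score(source_classes, target_classes, class_weights=None):
    """Weighted Jaccard overlap between the category sets present in two images."""
    src, tgt = set(source_classes), set(target_classes)
    if not src and not tgt:
        return 0.0
    w = class_weights or {}
    inter = sum(w.get(c, 1.0) for c in src & tgt)
    union = sum(w.get(c, 1.0) for c in src | tgt)
    return inter / union

def best_matching_source(source_label_sets, target_classes, class_weights=None):
    """Index of the source image whose co-occurring categories best match the target image."""
    scores = [category_match_score(s, target_classes, class_weights) for s in source_label_sets]
    return max(range(len(scores)), key=scores.__getitem__)
```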

17 pages, 4546 KiB  
Article
Reprojection-Based Numerical Measure of Robustness for CT Reconstruction Neural Network Algorithms
by Aleksandr Smolin, Andrei Yamaev, Anastasia Ingacheva, Tatyana Shevtsova, Dmitriy Polevoy, Marina Chukalina, Dmitry Nikolaev and Vladimir Arlazarov
Mathematics 2022, 10(22), 4210; https://doi.org/10.3390/math10224210 - 11 Nov 2022
Cited by 1 | Viewed by 1692
Abstract
In computed tomography, state-of-the-art reconstruction is based on neural network (NN) algorithms. However, NN reconstruction algorithms may not be robust to small noise-like perturbations in the input signal, and a non-robust NN algorithm can produce an inaccurate reconstruction with plausible artifacts that cannot be detected. Hence, the robustness of NN algorithms should be investigated and evaluated. There have been several attempts to construct numerical metrics of the robustness of NN reconstruction algorithms; however, these metrics estimate only the probability of easily distinguishable artifacts occurring in the reconstruction, artifacts which cannot lead to misdiagnosis in clinical applications. In this work, we propose a new method for the numerical estimation of the robustness of NN reconstruction algorithms. This method is based on evaluating the probability that the NN forms selected additional structures during reconstruction which may lead to an incorrect diagnosis. The method outputs a numerical score from 0 to 1 that can be used when benchmarking the robustness of different reconstruction algorithms. We employed the proposed method to perform a comparative study of seven reconstruction algorithms, five NN-based and two classical. The ResUNet network had the best robustness score (0.65) among the investigated NN algorithms, but its robustness score is still lower than that of the classical algorithm SIRT (0.989). The investigated NN models demonstrated a wide range of robustness scores (0.38–0.65). Thus, the robustness of seven reconstruction algorithms was measured using the newly proposed score, and it was shown that some of the neural algorithms are not robust. Full article
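
At a high level, the proposed score is a probability estimate over perturbed inputs: how often do small perturbations cause the reconstruction to contain a misleading structure that is not actually present? The schematic loop below leaves the reconstruction, perturbation, and structure-detection functions abstract, since they are specific to the study; it illustrates the form of such a score rather than the paper's exact definition.

```python
import numpy as np

def robustness_score(reconstruct, detect_structure, sinograms, perturb, n_trials=100, seed=0):
    """Fraction of perturbed reconstructions that do NOT show the spurious structure (1.0 = fully robust).

    reconstruct(sinogram) -> image; perturb(sinogram, rng) -> perturbed sinogram;
    detect_structure(image) -> bool, True if the misleading structure appears in the reconstruction.
    """
    rng = np.random.default_rng(seed)
    failures, trials = 0, 0
    for sino in sinograms:
        for _ in range(n_trials):
            failures += int(detect_structure(reconstruct(perturb(sino, rng))))
            trials += 1
    return 1.0 - failures / trials
```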

17 pages, 10586 KiB  
Article
Multibranch Attention Mechanism Based on Channel and Spatial Attention Fusion
by Guojun Mao, Guanyi Liao, Hengliang Zhu and Bo Sun
Mathematics 2022, 10(21), 4150; https://doi.org/10.3390/math10214150 - 06 Nov 2022
Cited by 4 | Viewed by 2028
Abstract
Recently, it has been demonstrated that the performance of an object detection network can be improved by embedding an attention module into it. In this work, we propose a lightweight and effective attention mechanism named multibranch attention (M3Att). For the input feature map, our M3Att first uses a grouped convolutional layer with a pyramid structure for feature extraction, and then calculates channel attention and spatial attention simultaneously and fuses them to obtain more complementary features. It is a “plug and play” module that can be easily added to an object detection network and significantly improves its performance with only a small increase in parameters. We demonstrate the effectiveness of M3Att on various challenging object detection tasks, including PASCAL VOC2007, PASCAL VOC2012, KITTI, and the Zhanjiang Underwater Robot Competition. The experimental results show that the method markedly improves object detection performance; on PASCAL VOC2007 in particular, the mAP of the original network increased by 4.93% when M3Att was embedded in the YOLOv4 (You Only Look Once v4) network. Full article
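
The channel-attention plus spatial-attention fusion idea can be illustrated with a compact CBAM-style module: squeeze the feature map spatially for channel weights, squeeze it channel-wise for a spatial weight map, and fuse the two multiplicatively. This is a generic illustration, not the M3Att design, which additionally uses grouped pyramid convolutions.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Generic channel + spatial attention fusion (CBAM-like), used here only to illustrate the concept."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        n, c, _, _ = x.shape
        # channel attention from global average pooling
        ca = torch.sigmoid(self.channel_mlp(x.mean(dim=(2, 3)))).view(n, c, 1, 1)
        # spatial attention from channel-wise mean and max maps
        smap = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        sa = torch.sigmoid(self.spatial_conv(smap))
        return x * ca * sa   # fuse both attentions multiplicatively
```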

15 pages, 4764 KiB  
Article
An Attention and Wavelet Based Spatial-Temporal Graph Neural Network for Traffic Flow and Speed Prediction
by Shihao Zhao, Shuli Xing and Guojun Mao
Mathematics 2022, 10(19), 3507; https://doi.org/10.3390/math10193507 - 26 Sep 2022
Cited by 2 | Viewed by 2221
Abstract
Traffic flow prediction is essential to the intelligent transportation system (ITS). However, due to the complex spatial-temporal dependence of traffic flow data, previous approaches to road network and traffic flow modeling have been insufficient in extracting local and global spatial-temporal correlations. This paper proposes an attention and wavelet-based spatial-temporal graph neural network for traffic flow and speed prediction (STAGWNN). It integrates attention and graph wavelet neural networks to capture local and global spatial information. Meanwhile, we stack a gated temporal convolutional network (gated TCN) with a temporal attention mechanism to extract time series information. The experiment was carried out on real public transportation datasets: PEMS-BAY and PEMSD7(M). The comparison results showed that our proposed model outperformed baseline networks on these datasets, which indicated that STAGWNN could better capture spatial-temporal correlation information. Full article
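
A gated temporal convolution, the building block mentioned for the temporal branch, combines a tanh filter with a sigmoid gate over causal 1-D convolutions. The minimal PyTorch sketch below is generic and not the exact STAGWNN block.

```python
import torch
import torch.nn as nn

class GatedTCNBlock(nn.Module):
    """Gated temporal convolution over sequences shaped (batch, channels, time)."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation        # left padding keeps the convolution causal
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):
        x_padded = nn.functional.pad(x, (self.pad, 0))  # pad only on the past side
        out = torch.tanh(self.filter_conv(x_padded)) * torch.sigmoid(self.gate_conv(x_padded))
        return out + x                                   # residual connection
```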
