Electronics

Research

16 pages, 5861 KiB

Open AccessArticle

NRPerson: A Non-Registered Multi-Modal Benchmark for Tiny Person Detection and Localization

by Yi Yang, Xumeng Han, Kuiran Wang, Xuehui Yu, Wenwen Yu, Zipeng Wang, Guorong Li, Zhenjun Han and Jianbin Jiao

Electronics 2024, 13(9), 1697; https://doi.org/10.3390/electronics13091697 - 27 Apr 2024

Viewed by 186

Abstract

In recent years, the detection and localization of tiny persons have garnered significant attention due to their critical applications in various surveillance and security scenarios. Traditional multi-modal methods predominantly rely on well-registered image pairs, necessitating the use of sophisticated sensors and extensive manual [...] Read more.

In recent years, the detection and localization of tiny persons have garnered significant attention due to their critical applications in various surveillance and security scenarios. Traditional multi-modal methods predominantly rely on well-registered image pairs, necessitating the use of sophisticated sensors and extensive manual effort for registration, which restricts their practical utility in dynamic, real-world environments. Addressing this gap, this paper introduces a novel non-registered multi-modal benchmark named NRPerson, specifically designed to advance the field of tiny person detection and localization by accommodating the complexities of real-world scenarios. The NRPerson dataset comprises 8548 RGB-IR image pairs, meticulously collected and filtered from 22 video sequences, enriched with 889,207 high-quality annotations that have been manually verified for accuracy. Utilizing NRPerson, we evaluate several leading detection and localization models across both mono-modal and non-registered multi-modal frameworks. Furthermore, we develop a comprehensive set of natural multi-modal baselines for the innovative non-registered track, aiming to enhance the detection and localization of unregistered multi-modal data using a cohesive and generalized approach. This benchmark is poised to facilitate significant strides in the practical deployment of detection and localization technologies by mitigating the reliance on stringent registration requirements. Full article

(This article belongs to the Special Issue Big Model Techniques for Image Processing)

► Show Figures

Figure 1

14 pages, 6912 KiB

Open AccessArticle

A Dynamic Network with Transformer for Image Denoising

by Mingjian Song, Wenbo Wang and Yue Zhao

Electronics 2024, 13(9), 1676; https://doi.org/10.3390/electronics13091676 - 26 Apr 2024

Viewed by 273

Abstract

Deep convolutional neural networks (CNNs) can achieve good performance in image denoising due to their superiority in the extraction of structural information. However, they may ignore the relationships between pixels to limit effects for image denoising. Transformer, focusing on pixel to pixel relationships [...] Read more.

Deep convolutional neural networks (CNNs) can achieve good performance in image denoising due to their superiority in the extraction of structural information. However, they may ignore the relationships between pixels to limit effects for image denoising. Transformer, focusing on pixel to pixel relationships can effectively solve this problem. This article aims to make a CNN and Transformer complement each other in image denoising. In this study, we propose a dynamic network with Transformer for image denoising (DTNet), with a residual block (RB), a multi-head self-attention block (MSAB), and a multidimensional dynamic enhancement block (MDEB). Firstly, the RB not only utilizes a CNN but also lays the foundation for the combination with Transformer. Then, the MSAB adds positional encoding and applies multi-head self-attention, which enables the preservation of sequential positional information while employing the Transformer to obtain global information. Finally, the MDEB uses dimension enhancement and dynamic convolution to improve the adaptive ability. The experiments show that our DTNet is superior to some existing methods for image denoising. Full article

(This article belongs to the Special Issue Big Model Techniques for Image Processing)

► Show Figures

Figure 1

14 pages, 3268 KiB

Open AccessArticle

An Adaptive Atrous Spatial Pyramid Pooling Network for Hyperspectral Classification

by Tianxing Zhu, Qin Liu and Lixiang Zhang

Electronics 2023, 12(24), 5013; https://doi.org/10.3390/electronics12245013 - 15 Dec 2023

Cited by 1 | Viewed by 797

Abstract

Hyperspectral imaging (HSI) offers rich spectral and spatial data, beneficial for a variety of applications. However, challenges persist in HSI classification due to spectral variability, non-linearity, limited samples, and a dearth of spatial information in conventional spectral classifiers. While various spectral–spatial classifiers and [...] Read more.

Hyperspectral imaging (HSI) offers rich spectral and spatial data, beneficial for a variety of applications. However, challenges persist in HSI classification due to spectral variability, non-linearity, limited samples, and a dearth of spatial information in conventional spectral classifiers. While various spectral–spatial classifiers and dimension reduction techniques have been developed to mitigate these issues, they are often constrained by the utilization of handcrafted features. Deep learning has been introduced to HSI classification, with pixel- and patch-level deep learning (DL) classifiers gaining substantial attention. Yet, existing patch-level DL classifiers encounter difficulties in concentrating on long-distance dependencies and managing category areas of diverse sizes. The proposed Self-Adaptive 3D atrous spatial pyramid pooling (ASPP) Multi-Scale Feature Fusion Network (SAAFN) addresses these challenges by simultaneously preserving high-resolution spatial detail data and high-level semantic information. This method integrates a modified hyperspectral superpixel segmentation technique, a multi-scale 3D ASPP convolution block, and an end-to-end framework to extract and fuse multi-scale features at a self-adaptive rate for HSI classification. This method significantly enhances the classification accuracy of HSI with limited samples. Full article

(This article belongs to the Special Issue Big Model Techniques for Image Processing)

► Show Figures

Figure 1

18 pages, 5422 KiB

Open AccessArticle

D2StarGAN: A Near-Far End Noise Adaptive StarGAN for Speech Intelligibility Enhancement

by Dengshi Li, Chenyi Zhu and Lanxin Zhao

Electronics 2023, 12(17), 3620; https://doi.org/10.3390/electronics12173620 - 27 Aug 2023

Viewed by 1240

Abstract

When using mobile communication, the voice output from the device is already relatively clear, but in a noisy environment, it is difficult for the listener to obtain the information expressed by the speaker with clarity. Consequently, speech intelligibility enhancement technology has emerged to [...] Read more.

When using mobile communication, the voice output from the device is already relatively clear, but in a noisy environment, it is difficult for the listener to obtain the information expressed by the speaker with clarity. Consequently, speech intelligibility enhancement technology has emerged to help alleviate this problem. Speech intelligibility enhancement (IENH) is a technique that enhances speech intelligibility during the reception phase. Previous research has focused on IENH through normal versus different levels of Lombardic speech conversion, inspired by a well-known acoustic mechanism called the Lombard effect. However, these methods often lead to speech distortion and impair the overall speech quality. To address the speech quality degradation problem, we propose an improved (StarGAN)-based IENH framework by combining StarGAN networks with the dual discriminator idea to construct the conversion framework. This approach offers two main advantages: (1) Addition of a speech metric discriminator on top of StarGAN to optimize multiple intelligibility and quality-related metrics simultaneously; (2) a framework that is adaptive to different distal and proximal noise levels with different noise types. Experimental results from objective experiments and subjective preference tests show that our approach outperforms the baseline approach, and these enable IENH to be more widely used. Full article

(This article belongs to the Special Issue Big Model Techniques for Image Processing)

► Show Figures

Figure 1

26 pages, 7083 KiB

Open AccessArticle

MemoryGAN: GAN Generator as Heterogeneous Memory for Compositional Image Synthesis

by Zongtao Wang, Jiajie Peng and Zhiming Liu

Electronics 2023, 12(13), 2927; https://doi.org/10.3390/electronics12132927 - 03 Jul 2023

Viewed by 967

Abstract

The Generative Adversarial Network (GAN) has recently experienced great progress in compositional image synthesis. Unfortunately, the models proposed in the literature usually require a set of pre-defined local generators and use a separate generator to model each part object. This makes the model [...] Read more.

The Generative Adversarial Network (GAN) has recently experienced great progress in compositional image synthesis. Unfortunately, the models proposed in the literature usually require a set of pre-defined local generators and use a separate generator to model each part object. This makes the model inflexible and also limits its scalability. Inspired by humans’ structured memory system, we propose MemoryGAN to eliminate these disadvantages. MemoryGAN uses a single generator as a shared memory to hold the heterogeneous information of the parts, and it uses a recurrent neural network to model the dependency between the parts and provide the query code for the memory. The shared memory structure and the query and feedback mechanism make MemoryGAN flexible and scalable. Our experiment shows that although MemoryGAN only uses a single generator for all the parts, it achieves comparable performance with the state-of-the-art, which uses multiple generators, in terms of synthesized image quality, compositional ability and disentanglement property. We believe that our result of using the generator of the GAN as a memory model will inspire future work of both bio-friendly models and memory-augmented models. Full article

(This article belongs to the Special Issue Big Model Techniques for Image Processing)

► Show Figures

Figure 1

17 pages, 813 KiB

Open AccessArticle

NFSP-PLT: Solving Games with a Weighted NFSP-PER-Based Method

by Huale Li, Shuhan Qi, Jiajia Zhang, Dandan Zhang, Lin Yao, Xuan Wang, Qi Li and Jing Xiao

Electronics 2023, 12(11), 2396; https://doi.org/10.3390/electronics12112396 - 25 May 2023

Viewed by 881

Abstract

Nash equilibrium strategy is a typical goal when solving two-player imperfect-information games (IIGs). Neural fictitious self-play (NFSP) is a popular method to find the Nash equilibrium in IIGs, which is the first end-to-end method used to compute the Nash equilibrium strategy. However, the [...] Read more.

Nash equilibrium strategy is a typical goal when solving two-player imperfect-information games (IIGs). Neural fictitious self-play (NFSP) is a popular method to find the Nash equilibrium in IIGs, which is the first end-to-end method used to compute the Nash equilibrium strategy. However, the training of NFSP requires a large number of sample data and the interactive cost of obtaining such data is often very high. Realizing the efficient training of network under limited samples is an urgent problem. In this paper, we first proposed a new NFSP-based method, NFSP with prioritized experience replay (NFSP-PER), to improve the sample training efficiency. Then, a weighted NFSP-PER with learning time (NFSP-PLT) was proposed to control the utilization degree of priority-weighted samples. Furthermore, based on the NFSP-PLT, an adaptive upper-confidence-bound applied to tree (UCT) is used to solve the optimal response strategy, which makes the solving strategy more accurate. Extensive experimental results show that the proposed NFSP-PLT effectively improves the sample learning efficiency compared with the existing works. Full article

(This article belongs to the Special Issue Big Model Techniques for Image Processing)

► Show Figures

Figure 1

12 pages, 20154 KiB

Open AccessArticle

Denoising and Reducing Inner Disorder in Point Clouds for Improved 3D Object Detection in Autonomous Driving

by Weifan Xu, Jin Jin, Fenglei Xu, Ze Li and Chongben Tao

Electronics 2023, 12(11), 2364; https://doi.org/10.3390/electronics12112364 - 23 May 2023

Viewed by 1016

Abstract

In the field of autonomous driving, precise spatial positioning and 3D object detection have become increasingly critical due to advancements in LiDAR technology and its extensive applications. Traditional detection models for RGB images face challenges in handling the intrinsic disorder present in LiDAR [...] Read more.

In the field of autonomous driving, precise spatial positioning and 3D object detection have become increasingly critical due to advancements in LiDAR technology and its extensive applications. Traditional detection models for RGB images face challenges in handling the intrinsic disorder present in LiDAR point clouds. Although point clouds are typically perceived as irregular and disordered, an implicit order actually exists, owing to laser arrangement and sequential scanning. Therefore, we propose Frustumformer, a novel framework that leverages the inherent order of LiDAR point clouds, reducing disorder and enhancing representation. Our approach consists of a frustum-based method that relies on the results of a 2D image detector, a frustum patch embedding that exploits the new data representation format, and a single-stride transformer network for original resolution feature fusion. By incorporating these components, Frustumformer effectively exploits the intrinsic order of point clouds and models long-range dependencies to further improve performance. Ablation studies verify the efficacy of the single-stride transformer component and the overall model architecture. We conduct experiments on the KITTI dataset, and Frustumformer outperforms existing methods. Full article

(This article belongs to the Special Issue Big Model Techniques for Image Processing)

► Show Figures

Figure 1

19 pages, 4845 KiB

Open AccessArticle

Using CNN with Multi-Level Information Fusion for Image Denoising

by Shaodong Xie, Jiagang Song, Yuxuan Hu, Chengyuan Zhang and Shichao Zhang

Electronics 2023, 12(9), 2146; https://doi.org/10.3390/electronics12092146 - 08 May 2023

Cited by 1 | Viewed by 1954

Abstract

Deep convolutional neural networks (CNN) with hierarchical architectures have obtained good results for image denoising. However, in some cases where the noise level is unknown and the image background is complex, it is challenging to obtain robust information through CNN. In this paper, [...] Read more.

Deep convolutional neural networks (CNN) with hierarchical architectures have obtained good results for image denoising. However, in some cases where the noise level is unknown and the image background is complex, it is challenging to obtain robust information through CNN. In this paper, we present a multi-level information fusion CNN (MLIFCNN) in image denoising containing a fine information extraction block (FIEB), a multi-level information interaction block (MIIB), a coarse information refinement block (CIRB), and a reconstruction block (RB). In order to adapt to more complex image backgrounds, FIEB uses parallel group convolution to extract wide-channel information. To enhance the robustness of the obtained information, a MIIB uses residual operations to act in two sub-networks for implementing the interaction of wide and deep information to adapt to the distribution of different noise levels. To enhance the stability of the training denoiser, CIRB stacks common and group convolutions to refine the obtained information. Finally, RB uses a residual operation to act in a single convolution in order to obtain the resultant clean image. Experimental results show that our method is better than many other excellent methods, both in terms of quantitative and qualitative aspects. Full article

(This article belongs to the Special Issue Big Model Techniques for Image Processing)

► Show Figures

Figure 1

Journal Menu

Journal Browser

Big Model Techniques for Image Processing

Share This Special Issue

Special Issue Editors

Special Issue Information

Keywords

Published Papers (8 papers)

Research

Further Information

Guidelines

MDPI Initiatives

Follow MDPI