
Sensing Technologies for Image/Video Analysis

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (31 December 2023) | Viewed by 33440

Special Issue Editor


Dr. Gwanggil Jeon
Guest Editor
Department of Embedded Systems Engineering, Incheon National University, Incheon, Republic of Korea
Interests: deep learning; AI; digital image; vision sensor; video coding; sensor for video; 3D vision sensor

Special Issue Information

Dear Colleagues,

Sensing technologies for image and video analysis are developing swiftly into reliable and efficient visual detection systems. Image analysis is the extraction of meaningful information from images, mainly digital images, by means of digital image processing techniques. Image analysis tasks can be as simple as reading bar-coded tags or as sophisticated as interpreting sensor data.

This Special Issue aims to show how sensing technologies are used in image/video analysis and how they can be applied in practice in real industrial scenarios. We intend to focus, therefore, not only on theoretical contributions but also on applications using sensing technologies for image/video analysis, vision sensing, visual sensing technology, SIFT, digital sensors, denoising and colour processing, and artificial intelligence, such as deep learning, high-speed processing, and optical response.

Experiments performed in research centres, as well as in industry and universities, are welcome. Such applications will mostly be based on prototypes, perhaps developed within research projects, but larger-scale experiments will, of course, also be accepted. Survey/tutorial manuscripts will also be considered.

Dr. Gwanggil Jeon
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • digital image
  • video processing
  • artificial intelligence
  • image processing
  • vision sensor
  • sensor deep learning
  • sensor for video
  • 3D vision sensor

Published Papers (12 papers)


Research


18 pages, 3978 KiB  
Article
Siamese Transformer-Based Building Change Detection in Remote Sensing Images
by Jiawei Xiong, Feng Liu, Xingyuan Wang and Chaozhong Yang
Sensors 2024, 24(4), 1268; https://doi.org/10.3390/s24041268 - 16 Feb 2024
Viewed by 527
Abstract
To address the challenges of handling imprecise building boundary information and reducing false-positive outcomes when detecting building changes in remote sensing images, this paper proposes a Siamese transformer architecture based on a difference module. The method introduces a layered transformer to provide global context modeling and multiscale features for better handling of building boundary information, while a difference module better captures the difference features of a building before and after a change. These difference features are then fused, and the fused features are used to generate a change map, which reduces the false-positive problem to a certain extent. Experiments were conducted on two publicly available building change detection datasets, LEVIR-CD and WHU-CD, where the F1 scores reached 89.58% and 84.51%, respectively. The experimental results demonstrate that the proposed method exhibits improved robustness and detection performance for building change detection in remote sensing images, and it also serves as a valuable technical reference for identifying building damage in remote sensing images.
(This article belongs to the Special Issue Sensing Technologies for Image/Video Analysis)
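For readers unfamiliar with the difference-module idea, here is a minimal PyTorch sketch of a Siamese change detector: a shared-weight encoder (a plain CNN standing in for the paper's layered transformer), a difference module that contrasts the bi-temporal features, and a small head that produces the change map. All module names and sizes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferenceModule(nn.Module):
    """Fuses bi-temporal features via their absolute difference."""
    def __init__(self, channels):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat_a, feat_b):
        return self.refine(torch.abs(feat_a - feat_b))

class SiameseChangeDetector(nn.Module):
    def __init__(self, in_ch=3, width=32):
        super().__init__()
        # Shared-weight CNN encoder; the paper uses a layered transformer here.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.diff = DifferenceModule(width)
        self.head = nn.Conv2d(width, 1, 1)   # per-pixel change logits

    def forward(self, img_t1, img_t2):
        d = self.diff(self.encoder(img_t1), self.encoder(img_t2))
        return F.interpolate(self.head(d), scale_factor=4,
                             mode="bilinear", align_corners=False)

x1 = torch.rand(1, 3, 256, 256)   # image at time 1
x2 = torch.rand(1, 3, 256, 256)   # image at time 2
print(SiameseChangeDetector()(x1, x2).shape)  # torch.Size([1, 1, 256, 256])
```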

19 pages, 25163 KiB  
Article
DTFusion: Infrared and Visible Image Fusion Based on Dense Residual PConv-ConvNeXt and Texture-Contrast Compensation
by Xinzhi Zhou, Min He, Dongming Zhou, Feifei Xu and Seunggil Jeon
Sensors 2024, 24(1), 203; https://doi.org/10.3390/s24010203 - 29 Dec 2023
Viewed by 561
Abstract
Infrared and visible image fusion aims to produce an informative fused image of the same scene by integrating the complementary information from two source images. Most deep-learning-based fusion networks utilize small-kernel convolutions to extract features from a local receptive field or design unlearnable fusion strategies to fuse features, which limits the feature representation capability and fusion performance of the network. Therefore, a novel end-to-end infrared and visible image fusion framework called DTFusion is proposed to address these problems. A residual PConv-ConvNeXt module (RPCM) and dense connections are introduced into the encoder network to efficiently extract features with larger receptive fields. In addition, a texture-contrast compensation module (TCCM) with gradient residuals and an attention mechanism is designed to compensate for the texture details and contrast of the features. The fused features are reconstructed through four convolutional layers to generate a fused image with rich scene information. Experiments on public datasets show that DTFusion outperforms other state-of-the-art fusion methods in both subjective visual quality and objective metrics.
(This article belongs to the Special Issue Sensing Technologies for Image/Video Analysis)
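As a rough illustration of the gradient-residual idea behind texture compensation, the sketch below applies a fixed depthwise Sobel operator to a feature map and adds the extracted texture back as a residual. This is a hedged approximation of the concept only; the actual TCCM design (including its attention mechanism) is described in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradientResidual(nn.Module):
    """Adds Sobel-extracted texture back onto the input features."""
    def __init__(self, channels):
        super().__init__()
        sobel = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        # One fixed depthwise kernel per channel, for x- and y-gradients.
        self.register_buffer("kx", sobel.reshape(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.register_buffer("ky", sobel.t().reshape(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.mix = nn.Conv2d(2 * channels, channels, 1)   # learnable 1x1 mixing

    def forward(self, feat):
        c = feat.shape[1]
        gx = F.conv2d(feat, self.kx, padding=1, groups=c)
        gy = F.conv2d(feat, self.ky, padding=1, groups=c)
        return feat + self.mix(torch.cat([gx, gy], dim=1))  # texture residual

feats = torch.rand(1, 16, 64, 64)
print(GradientResidual(16)(feats).shape)  # torch.Size([1, 16, 64, 64])
```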

22 pages, 7955 KiB  
Article
SpectralMAE: Spectral Masked Autoencoder for Hyperspectral Remote Sensing Image Reconstruction
by Lingxuan Zhu, Jiaji Wu, Wang Biao, Yi Liao and Dandan Gu
Sensors 2023, 23(7), 3728; https://doi.org/10.3390/s23073728 - 4 Apr 2023
Cited by 1 | Viewed by 2652
Abstract
Accurate hyperspectral remote sensing information is essential for feature identification and detection. Nevertheless, the hyperspectral imaging mechanism poses challenges in balancing the trade-off between spatial and spectral resolution. Hardware improvements are cost-intensive and depend on strict environmental conditions and extra equipment. Recent spectral imaging methods have attempted to reconstruct hyperspectral information directly from widely available multispectral images. However, the fixed mapping approaches used in previous spectral reconstruction models limit their reconstruction quality and generalizability, especially when dealing with missing or contaminated bands. Moreover, data-hungry issues plague increasingly complex data-driven spectral reconstruction methods. This paper proposes SpectralMAE, a novel spectral reconstruction model that can take arbitrary combinations of bands as input and improves the utilization of data sources. In contrast to previous spectral reconstruction techniques, SpectralMAE explores a self-supervised learning paradigm and proposes a masked autoencoder architecture for the spectral dimension. To further enhance the performance for specific sensor inputs, we propose a training strategy that combines random-masking pre-training with fixed-masking fine-tuning. Empirical evaluations on five remote sensing datasets demonstrate that SpectralMAE outperforms state-of-the-art methods in both qualitative and quantitative metrics.
(This article belongs to the Special Issue Sensing Technologies for Image/Video Analysis)
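The masking idea can be stated compactly. Below is a hedged PyTorch sketch, assuming band-wise tokens: each spectral band of a pixel patch is embedded as one token, a random subset is replaced with a learned mask token, and a small transformer reconstructs all bands. Dimensions and module names are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class SpectralMAESketch(nn.Module):
    def __init__(self, n_bands=31, patch_pixels=64, dim=128):
        super().__init__()
        self.embed = nn.Linear(patch_pixels, dim)        # one token per band
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_bands, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decode = nn.Linear(dim, patch_pixels)

    def forward(self, bands, mask):
        # bands: (B, n_bands, patch_pixels); mask: (n_bands,) bool, True = hidden
        tok = self.embed(bands) + self.pos
        tok[:, mask] = self.mask_token                   # hide the masked bands
        return self.decode(self.encoder(tok))            # reconstruct all bands

x = torch.rand(2, 31, 64)
mask = torch.rand(31) < 0.75        # random masking (pre-training regime)
recon = SpectralMAESketch()(x, mask)
loss = nn.functional.mse_loss(recon[:, mask], x[:, mask])
print(recon.shape, loss.item())
```

Fixed-masking fine-tuning then amounts to replacing the random `mask` with the band pattern of the target sensor.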

15 pages, 2824 KiB  
Article
Random Matrix Transformation and Its Application in Image Hiding
by Jijun Wang, Fun Soo Tan and Yi Yuan
Sensors 2023, 23(2), 1017; https://doi.org/10.3390/s23021017 - 16 Jan 2023
Cited by 2 | Viewed by 1832
Abstract
Image coding technology has become indispensable in the modern information field. With the vigorous development of the big data era, information security has received more attention, and image steganography, an important method of image encoding and hiding, is worth studying as a way to protect information security. On a basis of mathematical modeling, this paper makes innovations not only in improving the theoretical system of the kernel function but also in constructing a random matrix to establish an information-hiding scheme. By using the random matrix as the reference matrix for secret-information steganography, and owing to the characteristics of the random matrix, the set of secret information to be retrieved is very small, which reduces the modification range of the stego image and improves the stego-image quality and embedding efficiency. The scheme can maintain a stego-image quality with a PSNR of 49.95 dB at an embedding rate of 1.5 bits per pixel and can ensure that the embedding efficiency is improved by reducing the retrieval set. In order to adapt to different steganographic requirements and improve the hiding capability of such schemes, this paper also proposes an adaptive large-capacity information-hiding scheme based on the random matrix, in which a method of expanding the random matrix is proposed so that a corresponding random matrix can be generated for different capacity requirements. Both schemes are demonstrated through simulation experiments, together with an analysis of embedding efficiency, stego-image quality, capacity, and security. The experimental results show that the latter two schemes are better than the first two in terms of steganographic capacity and stego-image quality.
(This article belongs to the Special Issue Sensing Technologies for Image/Video Analysis)
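To make the reference-matrix mechanism concrete, here is a toy Python illustration (not the paper's construction): a shared pseudo-random matrix maps a pixel pair to a base-8 digit, and embedding moves the pair to the nearest cell holding the desired digit, so distortion stays small and extraction is a single lookup. All parameters are assumptions for illustration.

```python
import numpy as np

B = 8                                    # digits per embedded symbol (3 bits)
rng = np.random.default_rng(seed=1)
M = rng.integers(0, B, size=(256, 256))  # shared random reference matrix

def embed_pair(p1, p2, digit, radius=4):
    """Return the pixel pair closest to (p1, p2) with M[p1', p2'] == digit."""
    best, best_d = None, None
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            q1, q2 = p1 + dx, p2 + dy
            if 0 <= q1 < 256 and 0 <= q2 < 256 and M[q1, q2] == digit:
                d = dx * dx + dy * dy
                if best_d is None or d < best_d:
                    best, best_d = (q1, q2), d
    # Because every digit is dense in a random matrix, a small search
    # radius almost always finds a match, keeping distortion low.
    return best

def extract_pair(p1, p2):
    return M[p1, p2]                     # extraction is a plain lookup

stego = embed_pair(120, 87, digit=5)
print(stego, extract_pair(*stego))       # recovered digit == 5
```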

19 pages, 1166 KiB  
Article
Emotion Detection Using Deep Normalized Attention-Based Neural Network and Modified-Random Forest
by Shtwai Alsubai
Sensors 2023, 23(1), 225; https://doi.org/10.3390/s23010225 - 26 Dec 2022
Cited by 9 | Viewed by 2002
Abstract
In the contemporary world, human emotion detection is finding application in extensive domains such as biometric security, HCI (human-computer interaction), etc. Such emotions can be detected by various means, such as integrating information from facial expressions, gestures, speech, etc. Although such physical cues contribute to emotion detection, EEG (electroencephalogram) signals have gained significant attention in emotion detection due to their sensitivity to alterations in emotional states, and such signals can therefore reveal significant emotional-state features. However, manual detection from EEG signals is a time-consuming process. With the evolution of artificial intelligence, researchers have attempted to use different data mining algorithms for emotion detection from EEG signals, but these have shown limited accuracy. To resolve this, the present study proposes a DNA-RCNN (Deep Normalized Attention-based Residual Convolutional Neural Network) to extract the appropriate features based on a discriminative representation of the features. The proposed network also extracts salient features with the proposed attention modules, leading to consistent performance. Finally, classification is performed by the proposed M-RF (modified random forest) with an empirical loss function. In this process, the learning weights on the data subset reduce the loss between the predicted values and the ground truth, which assists in precise classification. Performance and comparative analyses confirm the effectiveness of the proposed system in detecting emotions from EEG signals.
(This article belongs to the Special Issue Sensing Technologies for Image/Video Analysis)
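The abstract does not spell out the DNA-RCNN internals, but a generic attention-gated residual convolution block of the kind it alludes to can be sketched as follows; this is purely illustrative, with a squeeze-and-excitation-style channel attention standing in for the paper's attention modules.

```python
import torch
import torch.nn as nn

class AttentionResidualBlock(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, channels, 3, padding=1),
            nn.BatchNorm1d(channels),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, 3, padding=1),
            nn.BatchNorm1d(channels),
        )
        self.attn = nn.Sequential(               # squeeze-and-excitation style
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv1d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (batch, channels, time)
        y = self.conv(x)
        return torch.relu(x + y * self.attn(y))  # attention-gated residual

eeg = torch.rand(8, 32, 256)                     # 32 EEG channels, 256 samples
print(AttentionResidualBlock(32)(eeg).shape)     # torch.Size([8, 32, 256])
```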

18 pages, 20323 KiB  
Article
End-to-End Network for Pedestrian Detection, Tracking and Re-Identification in Real-Time Surveillance System
by Mingwei Lei, Yongchao Song, Jindong Zhao, Xuan Wang, Jun Lyu, Jindong Xu and Weiqing Yan
Sensors 2022, 22(22), 8693; https://doi.org/10.3390/s22228693 - 10 Nov 2022
Cited by 2 | Viewed by 1986
Abstract
Surveillance video has been widely used in business, security, search, and other fields. Identifying and locating specific pedestrians in surveillance video has important application value in criminal investigation, search and rescue, etc. However, these applications place high demands on real-time capture and accuracy, so it is essential to build a complete and smooth system that combines pedestrian detection, tracking, and re-identification to maximize efficiency by balancing real-time performance and accuracy. This paper combines the detector and Re-ID models into a single end-to-end network by introducing a new track branch into the YOLOv5 architecture for tracking. For pedestrian detection, we employ the weighted bi-directional feature pyramid network (BiFPN) to enhance the network structure based on YOLOv5-Lite, which further improves its feature extraction ability. For tracking, building on Deepsort, the tracker is enhanced with a Noise Scale Adaptive (NSA) Kalman filter, which adds adaptive noise to strengthen the anti-interference capability of the tracking model; the matching strategy is also updated. For pedestrian re-identification, the network structure of Fastreid is modified, which substantially increases the feature-extraction speed of the improved algorithm. Using the proposed unified network, the parameters of the entire model can be trained end-to-end with a multi-loss function, which has been demonstrated to be quite valuable in other recent works. Experimental results demonstrate that pedestrian detection obtains a 97% mean Average Precision (mAP), that tracking achieves a 98.3% MOTA and a 99.8% MOTP on the MOT16 dataset, and that high pedestrian re-identification performance is achieved on the VERI-Wild dataset with a 77.3% mAP. The overall framework has remarkable performance in the precise localization and real-time detection of specific pedestrians across time, regions, and cameras.
(This article belongs to the Special Issue Sensing Technologies for Image/Video Analysis)
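The NSA Kalman filter mentioned here scales the measurement-noise covariance by the detection confidence, so uncertain detections move the track state less. A one-dimensional toy version, assuming the commonly cited NSA formulation R' = (1 - c) * R rather than this paper's exact implementation, looks like this:

```python
import numpy as np  # kept for symmetry with real multi-dimensional filters

def nsa_kalman_update(x, P, z, confidence, R_base=1.0, H=1.0):
    """One scalar Kalman update with confidence-adaptive measurement noise."""
    R = (1.0 - confidence) * R_base          # NSA: low confidence => high noise
    S = H * P * H + R                        # innovation covariance
    K = P * H / S                            # Kalman gain
    x_new = x + K * (z - H * x)              # corrected state
    P_new = (1.0 - K * H) * P                # corrected variance
    return x_new, P_new

x, P = 0.0, 1.0
for z, conf in [(1.0, 0.95), (5.0, 0.30), (1.2, 0.90)]:
    x, P = nsa_kalman_update(x, P, z, conf)
    print(f"z={z:4.1f} conf={conf:.2f} -> state={x:5.2f} var={P:.3f}")
```

Note how the low-confidence outlier (z = 5.0) pulls the state far less than the two confident measurements.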

12 pages, 8627 KiB  
Article
Real-Time and Efficient Multi-Scale Traffic Sign Detection Method for Driverless Cars
by Xuan Wang, Jian Guo, Jinglei Yi, Yongchao Song, Jindong Xu, Weiqing Yan and Xin Fu
Sensors 2022, 22(18), 6930; https://doi.org/10.3390/s22186930 - 13 Sep 2022
Cited by 10 | Viewed by 2123
Abstract
Traffic sign detection and recognition is an essential and challenging task for driverless cars. However, traffic sign detection in most scenarios is a small-target detection problem, and most existing object detection methods perform poorly in these cases, which increases the difficulty of detection. To further improve the accuracy of small-object detection for traffic signs, this paper proposes an optimization strategy based on the YOLOv4 network. Firstly, an improved triplet attention mechanism is added to the backbone network; combined with optimized weights, it makes the network focus more on the acquisition of channel and spatial features. Secondly, a bidirectional feature pyramid network (BiFPN) is used in the neck network to enhance feature fusion, which effectively enlarges the feature perception field for small objects. The improved model and several state-of-the-art (SOTA) methods were compared on the joint dataset TT100K-COCO. Experimental results show that the enhanced network achieves 60.4% mAP (mean average precision), surpassing YOLOv4 by 8% at the same input size; with a larger input size, it achieves a best performance of 66.4% mAP. This work provides a reference for research on achieving higher accuracy in traffic sign detection for autonomous driving.
(This article belongs to the Special Issue Sensing Technologies for Image/Video Analysis)
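Both this paper and the pedestrian-tracking paper above rely on BiFPN, whose core is the "fast normalized fusion" introduced in the EfficientDet paper: each input feature map receives a learnable non-negative weight, normalized to sum to one without a softmax. A minimal sketch follows, with inputs assumed to be pre-resized to a common resolution.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, feats):                    # list of same-shape tensors
        w = torch.relu(self.w)                   # keep weights non-negative
        w = w / (w.sum() + self.eps)             # fast normalization (no softmax)
        return sum(wi * f for wi, f in zip(w, feats))

p3 = torch.rand(1, 64, 32, 32)                   # e.g. upsampled deeper level
p4 = torch.rand(1, 64, 32, 32)                   # same-resolution lateral level
print(WeightedFusion(2)([p3, p4]).shape)         # torch.Size([1, 64, 32, 32])
```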

17 pages, 19902 KiB  
Article
Low-Light Image Enhancement Based on Constraint Low-Rank Approximation Retinex Model
by Xuesong Li, Jianrun Shang, Wenhao Song, Jinyong Chen, Guisheng Zhang and Jinfeng Pan
Sensors 2022, 22(16), 6126; https://doi.org/10.3390/s22166126 - 16 Aug 2022
Cited by 4 | Viewed by 1892
Abstract
Images captured in low-light environments are strongly affected by noise and low contrast, which is detrimental to tasks such as image recognition and object detection. Retinex-based approaches have been continuously explored for low-light enhancement. Nevertheless, Retinex decomposition is a highly ill-posed problem, so the estimation of the decomposed components must be combined with proper constraints; meanwhile, the noise mixed into the low-light image causes unpleasant visual effects. To address these problems, we propose a Constraint Low-Rank Approximation Retinex model (CLAR). In this model, two exponential relative total variation constraints are imposed to ensure that the illumination is piece-wise smooth and that the reflectance component is piece-wise continuous. In addition, a low-rank prior is introduced to suppress the noise in the reflectance component. With a tailored separated alternating direction method of multipliers (ADMM) algorithm, the illumination and reflectance components are updated accurately. Experimental results on several public datasets verify the effectiveness of the proposed model both subjectively and objectively.
(This article belongs to the Special Issue Sensing Technologies for Image/Video Analysis)
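The low-rank prior on the reflectance component can be illustrated in isolation with a truncated SVD. The paper embeds it inside a tailored ADMM loop; this standalone NumPy sketch only shows why discarding small singular values suppresses noise, using a synthetic rank-1 "reflectance".

```python
import numpy as np

def low_rank_approx(R, rank):
    """Best rank-k approximation of matrix R via truncated SVD."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    s[rank:] = 0.0                       # small singular values carry the noise
    return (U * s) @ Vt

clean = np.outer(np.linspace(0, 1, 64), np.linspace(1, 2, 64))  # rank-1 signal
noisy = clean + 0.05 * np.random.default_rng(0).normal(size=clean.shape)
denoised = low_rank_approx(noisy, rank=1)
# The low-rank approximation is closer to the clean signal than the input.
print(np.abs(denoised - clean).mean() < np.abs(noisy - clean).mean())  # True
```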

Review


27 pages, 1321 KiB  
Review
Image-Compression Techniques: Classical and “Region-of-Interest-Based” Approaches Presented in Recent Papers
by Vlad-Ilie Ungureanu, Paul Negirla and Adrian Korodi
Sensors 2024, 24(3), 791; https://doi.org/10.3390/s24030791 - 25 Jan 2024
Cited by 1 | Viewed by 1316
Abstract
Image compression is a vital component of domains in which computational resources are usually scarce, such as the automotive and telemedicine fields. Also, in real-time systems, the large amount of data that must flow through the system can become a bottleneck. Therefore, the storage of images, alongside the compression, transmission, and decompression procedures, becomes vital. In recent years, many compression techniques have been developed that preserve the quality of only the region of interest of an image, the other parts being either discarded or compressed with major quality loss. This paper presents a study of relevant papers from the last decade that focus on the selection of a region of interest of an image and on the compression techniques that can be applied to that area. To better highlight the novelty of the hybrid methods, classical state-of-the-art approaches are also analyzed. The work provides an overview of classical and hybrid compression methods alongside a categorization based on compression ratio and other quality factors, such as mean-square error, peak signal-to-noise ratio, structural similarity index measure, and so on. This overview can help researchers develop a better idea of which compression algorithms are used in certain domains and whether the reported performance parameters suit their intended purpose.
(This article belongs to the Special Issue Sensing Technologies for Image/Video Analysis)
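A minimal sketch of the ROI-based idea the review surveys, using Pillow: encode the whole frame at low JPEG quality, then paste the untouched region of interest back before the final save. File paths and the ROI box are illustrative assumptions.

```python
import io
from PIL import Image

def compress_with_roi(src_path, dst_path, roi_box, roi_quality=95, bg_quality=20):
    """Keep roi_box crisp while the background absorbs the quality loss."""
    img = Image.open(src_path).convert("RGB")
    roi = img.crop(roi_box)                       # set the ROI pixels aside
    buf = io.BytesIO()
    img.save(buf, "JPEG", quality=bg_quality)     # heavy background compression
    buf.seek(0)
    degraded = Image.open(buf).convert("RGB")
    degraded.paste(roi, roi_box[:2])              # restore the pristine ROI
    degraded.save(dst_path, "JPEG", quality=roi_quality)

# Hypothetical usage (box is (left, upper, right, lower)):
# compress_with_roi("frame.png", "frame_roi.jpg", roi_box=(100, 80, 300, 240))
```

The background detail is discarded in the first encode, so the final file stays small even though it is re-saved at high quality.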

28 pages, 38720 KiB  
Review
Multi-Object Multi-Camera Tracking Based on Deep Learning for Intelligent Transportation: A Review
by Lunlin Fei and Bing Han
Sensors 2023, 23(8), 3852; https://doi.org/10.3390/s23083852 - 10 Apr 2023
Cited by 4 | Viewed by 8431
Abstract
Multi-Object Multi-Camera Tracking (MOMCT) aims at locating and identifying multiple objects in video captured by multiple cameras. With the advancement of technology in recent years, it has received a lot of attention from researchers in applications such as intelligent transportation, public safety, and self-driving technology, and a large number of excellent research results have emerged in the field. To facilitate the rapid development of intelligent transportation, researchers need to keep abreast of the latest research and current challenges in related fields. Therefore, this paper provides a comprehensive review of deep-learning-based multi-object multi-camera tracking for intelligent transportation. Specifically, we first introduce the main object detectors for MOMCT in detail. Secondly, we give an in-depth analysis of deep-learning-based MOMCT and evaluate advanced methods through visualisation. Thirdly, we summarize the popular benchmark datasets and metrics to provide quantitative and comprehensive comparisons. Finally, we point out the challenges faced by MOMCT in intelligent transportation and present practical suggestions for future directions.
(This article belongs to the Special Issue Sensing Technologies for Image/Video Analysis)
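At the core of most MOMCT pipelines is a cross-camera association step. The sketch below, with random vectors standing in for the embeddings a re-ID network would produce, matches tracks between two cameras by cosine distance using the Hungarian algorithm (SciPy's `linear_sum_assignment`):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(42)
cam_a = rng.normal(size=(4, 128))        # 4 track embeddings from camera A
cam_b = rng.normal(size=(5, 128))        # 5 track embeddings from camera B

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

cost = 1.0 - normalize(cam_a) @ normalize(cam_b).T   # cosine distance matrix
rows, cols = linear_sum_assignment(cost)             # optimal one-to-one matching
for r, c in zip(rows, cols):
    # Real systems would also gate matches on a distance threshold and on
    # spatio-temporal plausibility (camera topology, travel time).
    print(f"camera-A track {r} <-> camera-B track {c} (dist {cost[r, c]:.2f})")
```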

21 pages, 2643 KiB  
Review
Convolutional Neural Networks or Vision Transformers: Who Will Win the Race for Action Recognitions in Visual Data?
by Oumaima Moutik, Hiba Sekkat, Smail Tigani, Abdellah Chehri, Rachid Saadane, Taha Ait Tchakoucht and Anand Paul
Sensors 2023, 23(2), 734; https://doi.org/10.3390/s23020734 - 9 Jan 2023
Cited by 29 | Viewed by 6906
Abstract
Understanding actions in videos remains a significant challenge in computer vision and has been the subject of extensive research over the last decades. Convolutional neural networks (CNNs) are a significant component of this topic and have played a crucial role in the rise of deep learning. Inspired by the human vision system, CNNs have been applied to visual data exploitation and have solved various challenges in computer vision tasks and video/image analysis, including action recognition (AR). However, following the recent success of the transformer in natural language processing (NLP), transformers have begun to set new trends in vision tasks, creating a debate over whether Vision Transformer models (ViT) will replace CNNs for action recognition in video clips. This paper treats this trending topic in detail, studying CNNs and transformers for action recognition separately and providing a comparative study of the accuracy-complexity trade-off. Finally, based on the outcome of the performance analysis, the question of whether CNNs or Vision Transformers will win the race is discussed.
(This article belongs to the Special Issue Sensing Technologies for Image/Video Analysis)
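The accuracy-complexity trade-off can be probed very roughly on any machine by comparing the parameter counts and single-image CPU latency of a representative CNN and ViT from torchvision; untrained weights are used and the timings are indicative only (no warm-up or averaging).

```python
import time
import torch
from torchvision.models import resnet18, vit_b_16

def profile(model, name, size=224):
    model.eval()
    x = torch.rand(1, 3, size, size)
    params = sum(p.numel() for p in model.parameters()) / 1e6
    with torch.no_grad():
        t0 = time.perf_counter()
        model(x)                             # single forward pass
        ms = (time.perf_counter() - t0) * 1e3
    print(f"{name:10s} {params:6.1f}M params, {ms:7.1f} ms / image (CPU)")

profile(resnet18(weights=None), "ResNet-18")
profile(vit_b_16(weights=None), "ViT-B/16")
```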

17 pages, 852 KiB  
Review
Convolutional Neural Networks and Heuristic Methods for Crowd Counting: A Systematic Review
by Khouloud Ben Ali Hassen, José J. M. Machado and João Manuel R. S. Tavares
Sensors 2022, 22(14), 5286; https://doi.org/10.3390/s22145286 - 15 Jul 2022
Cited by 5 | Viewed by 1917
Abstract
The crowd counting task has become a pillar of crowd control, as it provides information about the number of people in a scene. It is helpful in many scenarios, such as video surveillance, public safety, and future event planning. To solve such tasks, researchers have proposed different solutions: early work relied on more traditional, heuristic methods, while the recent focus is on deep learning methods and, more specifically, on Convolutional Neural Networks (CNNs), because of their efficiency. This review explores these methods by focusing on their key differences, advantages, and disadvantages. We systematically analyze algorithms and works based on the different models suggested and the problems they are trying to solve. The main focus is on the shift made in the history of crowd counting methods, moving from heuristic models to CNN models, by identifying each category and discussing its different methods and architectures. After a deep study of the crowd counting literature, the survey partitions current datasets into sparse and crowded ones and discusses the reviewed methods by comparing their results on the different datasets. The findings suggest that heuristic models can be even more effective than CNN models in sparse scenarios.
(This article belongs to the Special Issue Sensing Technologies for Image/Video Analysis)
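The CNN methods the review covers almost universally regress a density map whose integral equals the crowd count. A hedged sketch of how such ground truth is commonly generated from head annotations (the coordinates here are made up):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(shape, head_points, sigma=4.0):
    """Place one unit of 'person mass' per head, then spread it."""
    dmap = np.zeros(shape, dtype=np.float64)
    for y, x in head_points:
        dmap[y, x] = 1.0
    # Gaussian smoothing redistributes mass without changing the total,
    # so summing the map recovers the count.
    return gaussian_filter(dmap, sigma)

heads = [(40, 50), (42, 55), (100, 120)]         # hypothetical annotations
dmap = density_map((128, 160), heads)
print(f"estimated count = {dmap.sum():.2f}")     # ~3.00
```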
