Image/Video Coding and Processing Techniques for Intelligent Sensor Nodes

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 5 May 2024 | Viewed by 11776

Special Issue Editors


Dr. Jinjia Zhou
Guest Editor
Intelligent Media Processing Lab, Hosei University, Tokyo 102-8160, Japan
Interests: image sensors; computer vision; image processing; video coding

Dr. Ittetsu Taniguchi
Guest Editor
Graduate School of Information Science and Technology, Osaka University, Osaka 565-0871, Japan
Interests: image/video processing for embedded systems; design methodology for embedded systems

Prof. Dr. Xin Jin
Guest Editor
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518071, China
Interests: computational photography; image/video processing and coding

Special Issue Information

Dear Colleagues,

There is increasing interest in developing intelligent sensor nodes that enable on-board processing for Internet of Things (IoT) surveillance, remote sensing, and smart city applications. The data are processed on board by embedded signal processing and machine learning-based analysis algorithms. Such machine learning-driven sensors can transmit key information instead of raw sensing data, lowering the data volume traveling through the network.

Due to the explosion of image and video data in IoT systems, specifically designed image and video codecs have been preferred in recent years. With a focus on reducing the data burden and improving reconstructed image quality, image/video coding and processing techniques that allow low-cost implementations, reduce power consumption, and extend battery lifetime are needed to meet the design requirements of sensor nodes. Moreover, intelligent sensors are replacing traditional intuition-driven sensors by supporting machine learning algorithms and delivering high-resolution images and videos for the 5G revolution.

In line with the mission of Sensors, the organizers of this Special Issue endeavor to bring together the most recent advancements in image/video coding and processing techniques for intelligent sensor nodes, from both academic and industrial perspectives.

Dr. Jinjia Zhou
Dr. Ittetsu Taniguchi
Prof. Dr. Xin Jin
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image/video coding
  • image sensing
  • image/video processing
  • wireless communication
  • wireless sensor network
  • computational imaging

Published Papers (10 papers)


Research

15 pages, 6423 KiB  
Article
Block Partitioning Information-Based CNN Post-Filtering for EVC Baseline Profile
by Kiho Choi
Sensors 2024, 24(4), 1336; https://doi.org/10.3390/s24041336 - 19 Feb 2024
Viewed by 574
Abstract
Efficient video coding technology is more important than ever as video applications proliferate worldwide and Internet of Things (IoT) devices become widespread. In this context, the recently completed MPEG-5 Essential Video Coding (EVC) standard deserves careful attention, because the EVC Baseline profile is tailored to the low-complexity requirements of processing IoT video data. Nevertheless, the EVC Baseline profile has a notable disadvantage: since it is composed only of simple tools developed more than 20 years ago, it tends to exhibit numerous coding artifacts. In particular, blocking artifacts at block boundaries are a critical issue that must be addressed. To this end, this paper proposes a post-filter based on a block partitioning information-based Convolutional Neural Network (CNN). In the experiments, the proposed method objectively improves PSNR by approximately 0.57 dB in the All-Intra (AI) configuration and 0.37 dB in the Low-Delay (LD) configuration compared with the video before post-filtering, and the enhanced PSNR corresponds to overall bitrate reductions of 11.62% (AI) and 10.91% (LD) across the Luma and Chroma components. Owing to the large PSNR gains, the proposed method also substantially improves subjective visual quality, particularly around coding block boundaries.
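To make the idea concrete, here is a minimal PyTorch-style sketch of a partition-aware post-filter: the decoded luma frame is stacked with a binary map of coding-block boundaries and passed through a small residual CNN. The class name, layer sizes, and the 16x16 boundary grid are illustrative assumptions, not the network described in the paper.

```python
# Minimal sketch (not the paper's network): a residual CNN post-filter whose
# input is the decoded luma frame stacked with a binary map of coding-block
# boundaries, so the filter can see where blocking artifacts occur.
import torch
import torch.nn as nn

class PartitionAwarePostFilter(nn.Module):
    def __init__(self, channels: int = 32, num_layers: int = 6):
        super().__init__()
        layers = [nn.Conv2d(2, channels, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(num_layers - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, decoded_luma: torch.Tensor, partition_map: torch.Tensor) -> torch.Tensor:
        # partition_map: 1 at coding-block boundaries, 0 elsewhere (same H x W as the frame)
        x = torch.cat([decoded_luma, partition_map], dim=1)
        return decoded_luma + self.body(x)                 # predict a correction residual

# usage with dummy data
frame = torch.rand(1, 1, 64, 64)                           # decoded luma, normalized to [0, 1]
boundaries = torch.zeros(1, 1, 64, 64)
boundaries[..., ::16, :] = 1.0                             # hypothetical 16x16 partition grid
boundaries[..., :, ::16] = 1.0
restored = PartitionAwarePostFilter()(frame, boundaries)
```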

16 pages, 2721 KiB  
Article
Multi-Frame Content-Aware Mapping Network for Standard-Dynamic-Range to High-Dynamic-Range Television Artifact Removal
by Zheng Wang and Gang He
Sensors 2024, 24(1), 299; https://doi.org/10.3390/s24010299 - 04 Jan 2024
Viewed by 686
Abstract
Recently, advancements in image sensor technology have paved the way for the proliferation of high-dynamic-range television (HDRTV). Consequently, there has been a surge in demand for the conversion of standard-dynamic-range television (SDRTV) to HDRTV, especially due to the dearth of native HDRTV content. However, since SDRTV content often carries video encoding artifacts, SDRTV-to-HDRTV conversion tends to amplify them, thereby reducing the visual quality of the output video. To solve this problem, this paper proposes a multi-frame content-aware mapping network (MCMN), aiming to improve the performance of conversion from low-quality SDRTV to high-quality HDRTV. Specifically, we utilize the temporal-spatial characteristics of videos to design a content-aware temporal-spatial alignment module for the initial alignment of video features. In the feature prior extraction stage, we propose a hybrid prior extraction module covering cross-temporal priors as well as local and global spatial priors. Finally, we design a temporal-spatial transformation module to generate an improved tone mapping result. From time to space and from local to global, our method makes full use of multi-frame information to perform inverse tone mapping of single-frame images while also better repairing coding artifacts.
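As a rough illustration of content-aware mapping (not the MCMN architecture), the sketch below pools a global prior from a stack of already-aligned SDR frames and uses it to modulate a per-pixel convolutional mapping to HDR; all module names and sizes are invented for the example.

```python
# Rough illustration of content-aware mapping (not the MCMN architecture):
# a global prior pooled from already-aligned SDR frames modulates a per-pixel
# convolutional mapping (scale and shift) that produces the HDR centre frame.
import torch
import torch.nn as nn

class ContentAwareMapping(nn.Module):
    def __init__(self, feat: int = 32):
        super().__init__()
        self.extract = nn.Conv2d(3, feat, 3, padding=1)
        self.prior = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                   nn.Conv2d(feat, 2 * feat, 1))   # -> (scale, shift)
        self.to_hdr = nn.Conv2d(feat, 3, 3, padding=1)

    def forward(self, aligned_frames: torch.Tensor) -> torch.Tensor:
        # aligned_frames: (T, 3, H, W) SDR frames already aligned to the centre frame
        f = self.extract(aligned_frames)
        scale, shift = self.prior(f.mean(dim=0, keepdim=True)).chunk(2, dim=1)
        center = f[f.shape[0] // 2 : f.shape[0] // 2 + 1]          # centre-frame features
        return self.to_hdr(center * (1 + scale) + shift)           # content-aware tone mapping

hdr = ContentAwareMapping()(torch.rand(3, 3, 64, 64))              # 3 aligned frames -> HDR frame
```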

16 pages, 33769 KiB  
Article
Transformer-Based Multiple-Object Tracking via Anchor-Based-Query and Template Matching
by Qinyu Wang, Chenxu Lu, Long Gao and Gang He
Sensors 2024, 24(1), 229; https://doi.org/10.3390/s24010229 - 30 Dec 2023
Cited by 2 | Viewed by 837
Abstract
Multiple object tracking (MOT) plays an important role in intelligent video-processing tasks, aiming to detect and track all moving objects in a scene. Joint-detection-and-tracking (JDT) methods are thriving in MOT because they accomplish detection and data association in a single stage. However, slow training convergence and insufficient data association limit the performance of JDT methods. In this paper, the anchor-based query (ABQ) is proposed to improve the design of JDT methods for faster training convergence. By augmenting the learnable queries of the decoder with the coordinates of anchor boxes, the ABQ introduces explicit prior spatial knowledge into the queries, focusing query-to-feature learning on local regions and leading to faster training and better performance. Moreover, a new template matching (TM) module is designed that enables JDT methods to associate detection results and trajectories with historical features. Finally, a new transformer-based MOT method, ABQ-Track, is proposed. Extensive experiments verify the effectiveness of the two modules, and ABQ-Track surpasses the baseline JDT method, TransTrack. Specifically, ABQ-Track needs only 50 training epochs to converge, compared with 150 epochs for TransTrack.
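The anchor-based-query idea can be sketched as follows: each decoder query is the sum of a learnable content embedding and an embedding of explicit anchor-box coordinates. This simplified PyTorch snippet assumes sigmoid-normalized (cx, cy, w, h) anchors and a two-layer MLP box embedding; it is not the ABQ-Track implementation.

```python
# Simplified sketch of anchor-based queries (not the ABQ-Track implementation):
# each decoder query is a learnable content embedding plus an embedding of
# explicit anchor-box coordinates, so the query carries a spatial prior.
import torch
import torch.nn as nn

class AnchorBasedQueries(nn.Module):
    def __init__(self, num_queries: int = 100, d_model: int = 256):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(num_queries, 4))   # unconstrained (cx, cy, w, h)
        self.content = nn.Parameter(torch.zeros(num_queries, d_model))
        self.box_embed = nn.Sequential(nn.Linear(4, d_model), nn.ReLU(),
                                       nn.Linear(d_model, d_model))

    def forward(self) -> torch.Tensor:
        # query = content embedding + spatial prior from the normalized anchor box
        return self.content + self.box_embed(self.anchors.sigmoid())

queries = AnchorBasedQueries()()    # (100, 256), to be fed to a transformer decoder
```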

15 pages, 5389 KiB  
Article
Edge-Oriented Compressed Video Super-Resolution
by Zheng Wang, Guancheng Quan and Gang He
Sensors 2024, 24(1), 170; https://doi.org/10.3390/s24010170 - 28 Dec 2023
Viewed by 508
Abstract
Due to the proliferation of video data in Internet of Things (IoT) systems, most social media platforms employ downsampling to reduce the resolution of high-resolution (HR) videos before video coding in order to reduce the data burden. Consequently, the loss of detail and the introduction of additional artifacts seriously compromise the quality of experience (QoE). Recently, the task of compressed video super-resolution (CVSR) has garnered significant attention, aiming to simultaneously eliminate compression artifacts and enhance the resolution of compressed videos. In this paper, we propose an edge-oriented compressed video super-resolution network (EOCVSR), which focuses on reconstructing higher-quality details, to effectively address the CVSR task. First, we devise a motion-guided alignment module (MGAM) to achieve precise bi-directional motion compensation in a multi-scale manner. Second, we introduce an edge-oriented recurrent block (EORB) to reconstruct edge information by combining the merits of explicit and implicit edge extraction. In addition, benefiting from the recurrent structure, the receptive field of EOCVSR is enlarged and the features are effectively refined without introducing additional parameters. Extensive experiments on benchmark datasets demonstrate that our method surpasses state-of-the-art (SOTA) approaches in both quantitative and qualitative evaluations. Our approach can provide users with high-quality and cost-effective HR videos when integrated with sensors and codecs.
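A minimal sketch of combining explicit and implicit edge extraction is shown below: a fixed Sobel operator provides explicit edge maps, a learned convolution provides implicit edge features, and both are fused into a residual refinement. The block structure and sizes are illustrative and simpler than the EORB described in the paper.

```python
# Minimal sketch of combining explicit and implicit edge extraction (simpler
# than the EORB): a fixed Sobel operator supplies explicit edge maps, a learned
# convolution supplies implicit edge features, and both are fused residually.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeOrientedBlock(nn.Module):
    def __init__(self, feat: int = 32):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        sobel_y = sobel_x.t()
        # one Sobel pair applied depthwise to every feature channel
        kernel = torch.stack([sobel_x, sobel_y]).unsqueeze(1).repeat(feat, 1, 1, 1)
        self.register_buffer("sobel", kernel)                      # (2*feat, 1, 3, 3), fixed
        self.feat = feat
        self.implicit = nn.Conv2d(feat, feat, 3, padding=1)
        self.fuse = nn.Conv2d(3 * feat, feat, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        explicit = F.conv2d(x, self.sobel, padding=1, groups=self.feat)   # explicit edges
        implicit = F.relu(self.implicit(x))                               # learned edge features
        return x + self.fuse(torch.cat([explicit, implicit], dim=1))      # residual refinement

refined = EdgeOrientedBlock()(torch.rand(1, 32, 64, 64))
```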

20 pages, 5006 KiB  
Article
Inpainting with Separable Mask Update Convolution Network
by Jun Gong, Senlin Luo, Wenxin Yu and Liang Nie
Sensors 2023, 23(15), 6689; https://doi.org/10.3390/s23156689 - 26 Jul 2023
Viewed by 926
Abstract
Image inpainting is an active area of research in image processing that focuses on reconstructing damaged or missing parts of an image. The advent of deep learning has greatly advanced the field of image restoration in recent years. While many existing methods can produce high-quality restoration results, they often struggle with images that have large missing areas, resulting in blurry, artifact-filled outcomes. This is primarily because invalid information in the inpainting region interferes with the inpainting process. To tackle this challenge, this paper proposes a novel approach called separable mask update convolution. This technique automatically learns and updates the mask, which represents the missing area, to better control the influence of invalid information within the mask area on the restoration results. Furthermore, this convolution method reduces the number of network parameters and the size of the model. The paper also introduces a regional normalization technique that collaborates with the separable mask update convolution layers for improved feature extraction, thereby enhancing the quality of the restored image. Experimental results demonstrate that the proposed method performs well in restoring images with large missing areas and significantly outperforms state-of-the-art image inpainting methods in terms of image quality.
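The following sketch illustrates one plausible form of a mask-aware separable convolution: a depthwise-plus-pointwise convolution applied to masked features, with the mask updated by a small learned convolution. The exact update rule and layer shapes are assumptions for illustration, not the paper's layer.

```python
# Sketch of one plausible form of a mask-aware separable convolution (not the
# paper's exact layer): features are filtered with a depthwise + pointwise
# (separable) convolution after hole pixels are zeroed out, and the mask is
# updated by a small learned convolution. The update rule is an assumption.
import torch
import torch.nn as nn

class SeparableMaskUpdateConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.mask_update = nn.Conv2d(1, 1, 3, padding=1, bias=False)

    def forward(self, x: torch.Tensor, mask: torch.Tensor):
        # mask: 1 for valid pixels, 0 for holes
        feat = self.pointwise(self.depthwise(x * mask))    # suppress invalid (hole) information
        new_mask = torch.sigmoid(self.mask_update(mask))   # learned, soft mask update
        return feat * new_mask, new_mask

layer = SeparableMaskUpdateConv(3, 16)
img = torch.rand(1, 3, 64, 64)
mask = torch.ones(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 0                                # square hole in the centre
out, updated_mask = layer(img, mask)
```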

18 pages, 24519 KiB  
Article
Adapting Single-Image Super-Resolution Models to Video Super-Resolution: A Plug-and-Play Approach
by Wenhao Wang, Zhenbing Liu, Haoxiang Lu, Rushi Lan and Yingxin Huang
Sensors 2023, 23(11), 5030; https://doi.org/10.3390/s23115030 - 24 May 2023
Viewed by 1649
Abstract
The quality of videos varies due to the different capabilities of sensors. Video super-resolution (VSR) is a technology that improves the quality of captured video. However, developing a VSR model is very costly. In this paper, we present a novel approach for adapting single-image super-resolution (SISR) models to the VSR task. To achieve this, we first summarize a common architecture of SISR models and perform a formal analysis of the adaptation. Then, we propose an adaptation method that incorporates a plug-and-play temporal feature extraction module into existing SISR models. The proposed temporal feature extraction module consists of three submodules: offset estimation, spatial aggregation, and temporal aggregation. In the spatial aggregation submodule, the features obtained from the SISR model are aligned to the center frame based on the offset estimation results. The aligned features are fused in the temporal aggregation submodule. Finally, the fused temporal feature is fed to the SISR model for reconstruction. To evaluate the effectiveness of our method, we adapt five representative SISR models and evaluate them on two popular benchmarks. The experimental results show that the proposed method is effective for different SISR models. In particular, on the Vid4 benchmark, the VSR-adapted models achieve improvements of at least 1.26 dB in PSNR and 0.067 in SSIM over the original SISR models. Additionally, these VSR-adapted models achieve better performance than state-of-the-art VSR models.
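A simplified sketch of such a plug-and-play temporal module is given below: a small convolution estimates per-pixel offsets, neighboring-frame features are warped to the center frame with grid sampling, and a 1x1 convolution fuses them before they are handed back to the SISR model. The offset estimator, warping scheme, and sizes are illustrative assumptions, not the paper's exact design.

```python
# Simplified sketch of a plug-and-play temporal module in front of an SISR
# model: offset estimation -> warping-based spatial aggregation -> temporal
# aggregation by channel-wise fusion. Sizes and modules are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    # warp features toward the centre frame using a per-pixel offset field
    _, _, h, w = feat.shape
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xx, yy), dim=-1).float().to(feat)          # (H, W, 2) pixel coordinates
    coords = base + flow.permute(0, 2, 3, 1)                       # add predicted offsets
    norm_x = 2 * coords[..., 0] / (w - 1) - 1                      # normalize for grid_sample
    norm_y = 2 * coords[..., 1] / (h - 1) - 1
    return F.grid_sample(feat, torch.stack((norm_x, norm_y), dim=-1), align_corners=True)

class TemporalFeatureModule(nn.Module):
    def __init__(self, feat: int = 64):
        super().__init__()
        self.offset = nn.Conv2d(2 * feat, 2, 3, padding=1)         # offset estimation
        self.fuse = nn.Conv2d(3 * feat, feat, 1)                   # temporal aggregation

    def forward(self, prev_f, center_f, next_f):
        aligned_prev = warp(prev_f, self.offset(torch.cat([prev_f, center_f], dim=1)))
        aligned_next = warp(next_f, self.offset(torch.cat([next_f, center_f], dim=1)))
        fused = self.fuse(torch.cat([aligned_prev, center_f, aligned_next], dim=1))
        return center_f + fused      # handed back to the SISR model's reconstruction stage

feats = [torch.rand(1, 64, 32, 32) for _ in range(3)]              # prev, centre, next features
out = TemporalFeatureModule()(*feats)
```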

19 pages, 1269 KiB  
Article
A Highly Pipelined and Highly Parallel VLSI Architecture of CABAC Encoder for UHDTV Applications
by Chen Fu, Heming Sun, Zhiqiang Zhang and Jinjia Zhou
Sensors 2023, 23(9), 4293; https://doi.org/10.3390/s23094293 - 26 Apr 2023
Cited by 1 | Viewed by 1376
Abstract
Recently, specifically designed video codecs have been preferred due to the expansion of video data in Internet of Things (IoT) devices. Context Adaptive Binary Arithmetic Coding (CABAC) is the entropy coding module widely used in recent video coding standards such as HEVC/H.265 and VVC/H.266. CABAC is a well-known throughput bottleneck due to its strong data dependencies: because the context model required by the current bin often depends on the result of the previous bin, the context model cannot be prefetched early enough, which results in pipeline stalls. To solve this problem, we propose a prediction-based context model prefetching strategy that effectively eliminates the clock cycles spent fetching context models from memory. We also propose a multi-result context model update (MCMU) to reduce the critical path delay of context model updates in a multi-bin/clock architecture. Furthermore, we apply pre-range-update and pre-renormalization techniques to reduce the path delay of the multiplexed binary arithmetic encoder (BAE) caused by the incomplete reliance on the encoding process. To further speed up processing, we propose handling four regular bins and several bypass bins in parallel with a variable bypass bin incorporation (VBBI) technique. Finally, a quad-loop cache is developed to improve the compatibility of data interactions between the entropy encoder and other video encoder modules. As a result, the pipeline architecture based on the context model prefetching strategy removes up to 45.66% of the coding time caused by regular-bin stalls, and the parallel architecture saves 29.25% of the coding time caused by model updates on average when the Quantization Parameter (QP) equals 22. At the same time, the throughput of the proposed parallel architecture reaches 2191 Mbin/s, which is sufficient for 8K Ultra High Definition Television (UHDTV). Additionally, the hardware efficiency (Mbins/s per k gates) of the proposed architecture is higher than that of existing advanced pipeline and parallel architectures.
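The prefetching idea can be modeled behaviorally (in software, not RTL): while one bin is encoded, the context index of the next bin is predicted and fetched speculatively, and only a misprediction pays the memory-access stall. The predictor, memory model, and state update below are placeholder assumptions, not the paper's design.

```python
# Behavioral sketch (software model, not RTL) of prediction-based context-model
# prefetching: predict the next bin's context, fetch it speculatively, and
# count a stall only when the prediction misses.
def update_state(state, bin_val):
    # stand-in for the CABAC probability-state transition
    return state

def encode_bins(bins, predict_next_ctx, context_memory):
    stalls = 0
    prefetched = None                            # (predicted context index, its state)
    for i, (ctx_index, bin_val) in enumerate(bins):
        if prefetched is not None and prefetched[0] == ctx_index:
            state = prefetched[1]                # prediction hit: context already on chip
        else:
            state = context_memory[ctx_index]
            stalls += 1                          # miss: wait for the memory read
        context_memory[ctx_index] = update_state(state, bin_val)
        if i + 1 < len(bins):                    # speculatively fetch the next context
            guess = predict_next_ctx(ctx_index, bin_val)
            prefetched = (guess, context_memory[guess])
    return stalls

# toy run: contexts repeat, so only the first bin and the switch to context 7 stall
bins = [(3, 1), (3, 0), (7, 1), (7, 1)]
memory = {i: 0 for i in range(16)}
print(encode_bins(bins, lambda c, b: c, memory))   # -> 2
```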

20 pages, 8135 KiB  
Article
Learning-Based Rate Control for High Efficiency Video Coding
by Sovann Chen, Supavadee Aramvith and Yoshikazu Miyanaga
Sensors 2023, 23(7), 3607; https://doi.org/10.3390/s23073607 - 30 Mar 2023
Viewed by 1252
Abstract
High efficiency video coding (HEVC) has dramatically enhanced coding efficiency compared to the previous video coding standard, H.264/AVC. However, the existing rate control updates its parameters according to a fixed initialization, which can cause errors in predicting the bit allocation for each coding tree unit (CTU) in a frame. This paper proposes a learning-based mapping between rate control parameters and video content to achieve an accurate target bit rate and good video quality. The proposed framework contains two main coding structures, spatial and temporal coding. We introduce an effective learning-based particle swarm optimization for spatial and temporal coding to determine the optimal parameters at the CTU level. For temporal coding at the picture level, we introduce semantic residual information into the parameter updating process to allocate bits correctly to the actual picture. Experimental results indicate that the proposed algorithm is effective for HEVC and outperforms the state-of-the-art rate control in the HEVC reference software (HM-16.10) by 0.19 dB on average and up to 0.41 dB for the low-delay P coding structure.
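As background, a generic particle swarm optimization loop for searching rate-control parameters might look like the sketch below, here scoring candidate (alpha, beta) pairs of the R-lambda model against a target bits-per-pixel; the cost function, bounds, and constants are stand-ins, not the paper's learning-based formulation.

```python
# Generic particle swarm optimization (PSO) sketch for rate-control parameter
# search: candidates are (alpha, beta) pairs of the R-lambda model,
# lambda = alpha * bpp ** beta (beta is typically negative).
import numpy as np

def pso(cost, dim=2, particles=20, iters=50, lo=-10.0, hi=10.0, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    x = rng.uniform(lo, hi, size=(particles, dim))       # positions: candidate parameters
    v = np.zeros_like(x)                                 # velocities
    pbest, pbest_cost = x.copy(), np.array([cost(p) for p in x])
    gbest = pbest[pbest_cost.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        costs = np.array([cost(p) for p in x])
        better = costs < pbest_cost
        pbest[better], pbest_cost[better] = x[better], costs[better]
        gbest = pbest[pbest_cost.argmin()].copy()
    return gbest

def ctu_cost(params, target_bpp=0.05, lam=30.0):
    # stand-in cost: distance between the bpp implied by (alpha, beta) and a target bpp
    alpha, beta = params
    if alpha <= 0 or beta > -0.1:                        # keep the model in a sensible regime
        return 1e9
    bpp = (lam / alpha) ** (1.0 / beta)                  # invert lambda = alpha * bpp ** beta
    return abs(bpp - target_bpp)

print(pso(ctu_cost))                                     # optimized (alpha, beta) pair
```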

20 pages, 1671 KiB  
Article
Low-Complexity Lossless Coding of Asynchronous Event Sequences for Low-Power Chip Integration
by Ionut Schiopu and Radu Ciprian Bilcu
Sensors 2022, 22(24), 10014; https://doi.org/10.3390/s222410014 - 19 Dec 2022
Cited by 3 | Viewed by 1590
Abstract
Event sensors provide high temporal resolution and generate large amounts of raw event data. Efficient low-complexity coding solutions are required for integration into low-power event-processing chips with limited memory. In this paper, a novel lossless compression method is proposed for encoding event data represented as asynchronous event sequences. The proposed method employs only low-complexity coding techniques, making it suitable for hardware implementation in low-power event-processing chips. A first novel contribution is a low-complexity coding scheme that uses a decision tree to reduce the representation range of the residual error. The decision tree is formed using a triplet threshold parameter that divides the input data range into several coding ranges arranged at concentric distances from an initial prediction, so that the residual error of the true value is represented with a reduced number of bits. Another novel contribution is an improved representation that divides the input sequence into same-timestamp subsequences, where each subsequence collects the events sharing a timestamp in ascending order of the largest dimension of the event spatial information. The proposed same-timestamp representation replaces the event timestamp information with the subsequence length and encodes it, together with the event spatial and polarity information, into a separate bitstream. A further contribution is random access to any time window via additional header information. Experimental evaluation on a dataset with highly variable event density demonstrates that the proposed low-complexity lossless coding method provides average improvements of 5.49%, 11.45%, and 35.57% over the state-of-the-art performance-oriented lossless data compression codecs Bzip2, LZMA, and ZLIB, respectively. To our knowledge, this is the first low-complexity lossless compression method for encoding asynchronous event sequences that is suitable for hardware implementation in low-power chips.
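The range-splitting and same-timestamp ideas can be illustrated as follows, with invented thresholds, range tags, bit widths, and sort order (the paper's actual parameters and bitstream layout differ):

```python
# Illustrative sketch of the two representation ideas: (1) a triplet of
# thresholds splits the prediction residual into concentric ranges, each coded
# as a short range tag plus just enough offset bits; (2) events sharing a
# timestamp are grouped so the timestamp is replaced by a subsequence length.
def encode_residual(residual, thresholds=(2, 16, 128)):
    t0, t1, t2 = thresholds
    mag, sign = abs(residual), 0 if residual >= 0 else 1
    if mag <= t0:
        return ("R0", sign, mag, t0.bit_length())              # near the prediction: few bits
    if mag <= t1:
        return ("R1", sign, mag - t0 - 1, (t1 - t0).bit_length())
    if mag <= t2:
        return ("R2", sign, mag - t1 - 1, (t2 - t1).bit_length())
    return ("R3", sign, mag, 32)                               # escape range: raw value

def group_by_timestamp(events):                                # events: (t, x, y, polarity)
    groups = {}
    for t, x, y, p in events:
        groups.setdefault(t, []).append((x, y, p))
    # each group is emitted as (subsequence length, events sorted by spatial coordinate)
    return [(len(evs), sorted(evs)) for _, evs in sorted(groups.items())]

print(encode_residual(-37))                                    # ('R2', 1, 20, 7)
print(group_by_timestamp([(5, 2, 7, 1), (5, 1, 3, 0), (6, 9, 9, 1)]))
```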

19 pages, 6097 KiB  
Article
Vision-Based Structural Modal Identification Using Hybrid Motion Magnification
by Dashan Zhang, Andong Zhu, Wenhui Hou, Lu Liu and Yuwei Wang
Sensors 2022, 22(23), 9287; https://doi.org/10.3390/s22239287 - 29 Nov 2022
Cited by 1 | Viewed by 1444
Abstract
As a promising alternative to conventional contact sensors, vision-based technologies for structural dynamic response measurement and health monitoring have attracted much attention from the research community. Among these technologies, Eulerian video magnification has a unique capability for analyzing modal responses and visualizing modal shapes. To reduce noise interference and improve the quality and stability of modal shape visualization, this study proposes a hybrid motion magnification framework that combines linear and phase-based motion processing. Based on the assumption that temporal variations can represent spatial motions, the linear motion processing extracts and manipulates the temporal intensity variations related to modal responses through matrix decomposition and underdetermined blind source separation (BSS) techniques, while Fourier transform profilometry (FTP) is utilized to reduce spatial high-frequency noise. Since all spatial motions in a video are then linearly controllable, the subsequent phase-based motion processing highlights the motions and visualizes the modal shapes with higher quality. The proposed method is validated by two laboratory experiments and a field test on a large-scale truss bridge. The quantitative evaluation with high-speed cameras demonstrates that the hybrid method outperforms the single-step phase-based motion magnification method in visualizing sound-induced subtle motions. In the field test, the vibration characteristics of the truss bridge while a train crosses it are studied with a commercial camera located over 400 m from the bridge, and four full-field modal shapes of the bridge are successfully observed.
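For orientation, the linear (Eulerian) part of such a pipeline can be sketched as a temporal band-pass filter around the modal frequency band, amplified and added back to the video; the FFT-based filter below is a simplified stand-in for the matrix decomposition, BSS, and FTP steps described in the abstract, and all parameter values are placeholders.

```python
# Minimal sketch of the linear (Eulerian) magnification step: a temporal
# band-pass filter around the modal frequency band is applied per pixel, and
# the filtered signal is amplified and added back to the original frames.
import numpy as np

def magnify(video, fs, f_lo, f_hi, alpha):
    # video: (T, H, W) grayscale frames, fs: frame rate in Hz, alpha: amplification factor
    spec = np.fft.rfft(video, axis=0)                     # temporal FFT per pixel
    freqs = np.fft.rfftfreq(video.shape[0], d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)              # keep only the band of interest
    spec[~band] = 0
    subtle = np.fft.irfft(spec, n=video.shape[0], axis=0) # band-passed subtle motions
    return video + alpha * subtle                         # amplified motions added back

frames = np.random.rand(128, 32, 32)                      # placeholder video
out = magnify(frames, fs=240.0, f_lo=10.0, f_hi=20.0, alpha=20.0)
```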
