Advances in Image and Video Encoding Algorithm and H/W Design

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (30 April 2023) | Viewed by 19762

Special Issue Editors


Guest Editor
Department of Computer Engineering, Dong-A University, Busan 602760, Korea
Interests: deep learning; image and video signal processing; pattern recognition; video coding standards

Special Issue Information

Dear Colleagues,

Over the past several years, impressive progress in image/video coding schemes, including statistical signal processing and deep learning approaches, has been reported in MPEG and JVET. Despite the short history of the field, end-to-end trained coding models based on deep learning have evolved into a promising nonlinear signal processing framework, and recent works have shown that some learned models can achieve significant compression gains over traditional methods. Many traditional signal processing approaches also exist. However, in terms of complexity and structure, unsolved problems and compelling research challenges remain. First, the computational burden is still too high for real-time applications; energy-efficient and fast architectures for low-complexity image/video encoding and decoding are therefore essential for practical use. In addition, improved compression for different target applications, such as object detection for machine-to-machine communication and high-quality video streaming, as well as variable-rate control and efficient bit allocation within deep learning frameworks, has not yet been fully investigated.

This Special Issue invites original research addressing important, innovative, and timely challenges of the community in deep learning and statistical signal processing for image/video coding, such as the JPEG, HEVC, and VVC standards, in energy-aware, real-world environments.

Specific topics of interest include, but are not limited to, the following:

  • Advanced signal compression modules for image and video coding;
  • Machine-learning-based efficient decoding structures;
  • Efficient and fast hardware/network architectures for learning-based image/video coding;
  • Fast schemes using lightweight DNN structures;
  • Automated machine learning for image/video coding;
  • End-to-end learning framework for image/video coding;
  • Quality assessment models reflecting the human perception of quality and its applications in image/video coding;
  • Deep learning techniques for optimizing traditional image/video codecs;
  • Bandwidth control mechanisms for network optimization;
  • Video encoding schemes for networked distributed camera systems;
  • Image/video coding for machine vision applications.

Prof. Dr. Byung-Gyu Kim
Dr. Dongsan Jun
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image/video signal processing
  • compression algorithm
  • video coding standards
  • deep neural network for video coding
  • camera sensor network

Published Papers (11 papers)


Research

14 pages, 3629 KiB  
Article
A Fast Algorithm for Intra-Frame Versatile Video Coding Based on Edge Features
by Shuai Zhao, Xiwu Shang, Guozhong Wang and Haiwu Zhao
Sensors 2023, 23(13), 6244; https://doi.org/10.3390/s23136244 - 07 Jul 2023
Cited by 1 | Viewed by 1287
Abstract
Versatile Video Coding (VVC) introduces many new coding technologies, such as the quadtree with nested multi-type tree (QTMT), which greatly improves the efficiency of VVC coding. However, its computational complexity is higher, which limits the application of VVC in real-time scenarios. To address the high complexity of VVC intra coding, we propose a low-complexity partition algorithm based on edge features. Firstly, the Laplacian of Gaussian (LOG) operator is used to extract the edges in the coding frame, and the edges are divided into vertical and horizontal edges. Then, the coding unit (CU) is equally divided into four sub-blocks in the horizontal and vertical directions to calculate the feature values of the horizontal and vertical edges, respectively. Based on the feature values, unnecessary partition patterns are skipped in advance. Finally, for CUs without edges, the partition process is terminated according to the depth information of neighboring CUs. Experimental results show that, compared with VTM-13.0, the proposed algorithm saves 54.08% of the encoding time on average, while the BDBR (Bjøntegaard delta bit rate) increases by only 1.61%.
(This article belongs to the Special Issue Advances in Image and Video Encoding Algorithm and H/W Design)
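To make the partition-skipping idea concrete, here is a minimal sketch of LOG-based edge-feature extraction for a CU, assuming a grayscale frame stored as a NumPy array. The sigma value, the gradient-based separation into horizontal and vertical edge strengths, and all function names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of LOG edge features per CU; not the paper's code.
import numpy as np
from scipy.ndimage import gaussian_laplace

def edge_features(cu, sigma=1.0):
    """Return (horizontal, vertical) edge-strength values for one block."""
    log = gaussian_laplace(cu.astype(np.float64), sigma=sigma)  # LOG edges
    gy, gx = np.gradient(log)
    # Horizontal edges vary along y; vertical edges vary along x.
    return np.abs(gy).mean(), np.abs(gx).mean()

def directional_features(cu):
    """Edge features of the four horizontal and four vertical sub-blocks,
    mirroring the per-direction statistics described in the abstract."""
    h, w = cu.shape
    horiz = [edge_features(cu[i * h // 4:(i + 1) * h // 4, :]) for i in range(4)]
    vert = [edge_features(cu[:, j * w // 4:(j + 1) * w // 4]) for j in range(4)]
    return horiz, vert  # thresholds on these would gate the partition modes
```

In a real encoder, thresholds learned or tuned on these statistics would decide which QTMT split modes to skip before any RD evaluation.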

19 pages, 2605 KiB  
Article
Integrating Visual and Network Data with Deep Learning for Streaming Video Quality Assessment
by George Margetis, Grigorios Tsagkatakis, Stefania Stamou and Constantine Stephanidis
Sensors 2023, 23(8), 3998; https://doi.org/10.3390/s23083998 - 14 Apr 2023
Cited by 2 | Viewed by 1726
Abstract
Existing video Quality-of-Experience (QoE) metrics rely on the decoded video for the estimation. In this work, we explore how the overall viewer experience, quantified via the QoE score, can be automatically derived using only information available before and during the transmission of videos, on the server side. To validate the merits of the proposed scheme, we consider a dataset of videos encoded and streamed under different conditions and train a novel deep learning architecture for estimating the QoE of the decoded video. The major novelty of our work is the exploitation and demonstration of cutting-edge deep learning techniques in automatically estimating video QoE scores. Our work significantly extends the existing approach for estimating the QoE in video streaming services by combining visual information and network conditions.
(This article belongs to the Special Issue Advances in Image and Video Encoding Algorithm and H/W Design)
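As a rough illustration of the server-side idea, the sketch below fuses a visual-feature branch with a network-statistics branch into a single QoE regressor in PyTorch. The feature dimensions, layer sizes, and class name are assumptions; the paper's actual architecture is not reproduced here.

```python
# Hypothetical two-branch QoE regressor; dimensions are illustrative.
import torch
import torch.nn as nn

class QoEFusionNet(nn.Module):
    def __init__(self, visual_dim=512, net_dim=8):
        super().__init__()
        self.visual = nn.Sequential(nn.Linear(visual_dim, 128), nn.ReLU())
        self.network = nn.Sequential(nn.Linear(net_dim, 32), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(160, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, visual_feats, net_stats):
        # Concatenate both modalities and regress a scalar QoE score.
        fused = torch.cat([self.visual(visual_feats), self.network(net_stats)], dim=1)
        return self.head(fused)

model = QoEFusionNet()
scores = model(torch.randn(4, 512), torch.randn(4, 8))  # (4, 1) QoE estimates
```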

19 pages, 1716 KiB  
Article
Attention-Based Bi-Prediction Network for Versatile Video Coding (VVC) over 5G Network
by Young-Ju Choi, Young-Woon Lee, Jongho Kim, Se Yoon Jeong, Jin Soo Choi and Byung-Gyu Kim
Sensors 2023, 23(5), 2631; https://doi.org/10.3390/s23052631 - 27 Feb 2023
Cited by 1 | Viewed by 1559
Abstract
As the demands of various network-dependent services such as Internet of Things (IoT) applications, autonomous driving, and augmented and virtual reality (AR/VR) increase, the fifth-generation (5G) network is expected to become a key communication technology. The latest video coding standard, versatile video coding (VVC), can contribute to providing high-quality services by achieving superior compression performance. In video coding, inter bi-prediction serves to improve the coding efficiency significantly by producing a precise fused prediction block. Although block-wise methods, such as bi-prediction with CU-level weight (BCW), are applied in VVC, it is still difficult for the linear fusion-based strategy to represent diverse pixel variations inside a block. In addition, a pixel-wise method called bi-directional optical flow (BDOF) has been proposed to refine the bi-prediction block. However, the non-linear optical flow equation in BDOF mode is applied under assumptions, so this method is still unable to accurately compensate for various kinds of bi-prediction blocks. In this paper, we propose an attention-based bi-prediction network (ABPN) to substitute for the existing bi-prediction methods. The proposed ABPN is designed to learn efficient representations of the fused features by utilizing an attention mechanism. Furthermore, a knowledge distillation (KD)-based approach is employed to compress the size of the proposed network while keeping its output comparable to that of the large model. The proposed ABPN is integrated into the VTM-11.0 NNVC-1.0 standard reference software. When compared with the VTM anchor, it is verified that the BD-rate reduction of the lightweight ABPN can be up to 5.89% and 4.91% on the Y component under random access (RA) and low delay B (LDB) configurations, respectively.
(This article belongs to the Special Issue Advances in Image and Video Encoding Algorithm and H/W Design)
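The core fusion idea can be sketched as a small attention module that predicts a pixel-wise weight map for blending the two motion-compensated prediction blocks. This is a simplified stand-in with assumed channel counts, not the full ABPN, which is deeper and trained with knowledge distillation.

```python
# Simplified attention-weighted bi-prediction fusion; not the full ABPN.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.embed = nn.Conv2d(2, channels, 3, padding=1)
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, pred0, pred1):
        # pred0, pred1: (N, 1, H, W) L0/L1 prediction blocks.
        w = self.attn(self.embed(torch.cat([pred0, pred1], dim=1)))
        return w * pred0 + (1.0 - w) * pred1  # pixel-wise weighted fusion
```

Unlike BCW's single block-level weight, the learned map w varies per pixel, which is what lets such a network model diverse variations inside a block.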

16 pages, 6039 KiB  
Article
Intra Prediction Method for Depth Video Coding by Block Clustering through Deep Learning
by Dong-seok Lee and Soon-kak Kwon
Sensors 2022, 22(24), 9656; https://doi.org/10.3390/s22249656 - 09 Dec 2022
Viewed by 975
Abstract
In this paper, we propose an intra-picture prediction method for depth video based on block clustering through a neural network. The proposed method addresses the problem that a block containing two or more clusters degrades the performance of intra prediction for depth video. The proposed neural network consists of a spatial feature prediction network and a clustering network. The spatial feature prediction network utilizes spatial features in the vertical and horizontal directions. It contains a 1D CNN layer and a fully connected layer. The 1D CNN layer extracts the spatial features for the vertical and horizontal directions from the top block and the left block of the reference pixels, respectively. A 1D CNN is designed to handle time-series data, but it can also be applied to find spatial features by regarding the pixel order in a given direction as a time axis. The fully connected layer predicts the spatial features of the block to be coded from the extracted features. The clustering network finds clusters from the spatial features output by the spatial feature prediction network. It consists of four CNN layers. The first three CNN layers combine the two spatial features in the vertical and horizontal directions. The last layer outputs the probabilities that pixels belong to the clusters. The pixels of the block are predicted by the representative values of the clusters, i.e., the averages of the reference pixels belonging to each cluster. For intra prediction with various block sizes, the block is scaled to the size of the network input, and the prediction result is scaled back to the original size. In network training, the mean square error between the original and predicted blocks is used as the loss function, and a penalty for output values far from both ends is added to encourage clear clustering. In simulation results, up to 12.45% of the bit rate is saved under the same distortion condition compared with the latest video coding standard.
(This article belongs to the Special Issue Advances in Image and Video Encoding Algorithm and H/W Design)
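A minimal sketch of the spatial-feature branch is given below: a shared 1D convolution runs over the top and left reference-pixel lines, treating pixel order as the sequence axis, and a fully connected layer maps the result to block-level features. Block size, channel counts, and names are assumptions for illustration.

```python
# Hypothetical sketch of the 1D-CNN spatial feature predictor.
import torch
import torch.nn as nn

class SpatialFeatureNet(nn.Module):
    def __init__(self, block_size=16, feat=32):
        super().__init__()
        self.conv = nn.Conv1d(1, feat, kernel_size=3, padding=1)
        self.fc = nn.Linear(2 * feat * block_size, feat)

    def forward(self, top, left):
        # top, left: (N, 1, block_size) reference-pixel lines.
        f_vert = self.conv(top).flatten(1)    # vertical-direction features
        f_horz = self.conv(left).flatten(1)   # horizontal-direction features
        return self.fc(torch.cat([f_vert, f_horz], dim=1))
```

A clustering head as described in the abstract would then map these features to per-pixel cluster probabilities.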

21 pages, 10574 KiB  
Article
Video Super-Resolution Method Using Deformable Convolution-Based Alignment Network
by Yooho Lee, Sukhee Cho and Dongsan Jun
Sensors 2022, 22(21), 8476; https://doi.org/10.3390/s22218476 - 03 Nov 2022
Cited by 2 | Viewed by 1975
Abstract
With the advancement of sensors, image and video processing have developed for use in visual sensing applications. Among them, video super-resolution (VSR) aims to reconstruct high-resolution sequences from low-resolution ones. To exploit consecutive contexts within a low-resolution sequence, VSR learns the spatial and temporal characteristics of multiple frames. As a convolutional neural network-based VSR method, we propose a deformable convolution-based alignment network (DCAN) to generate high-resolution sequences at quadruple the size of the low-resolution input. The proposed method consists of a feature extraction block, two different alignment blocks that use deformable convolution, and an up-sampling block. Experimental results show that the proposed DCAN achieves better performance in both the peak signal-to-noise ratio and the structural similarity index measure than the compared methods. The proposed DCAN also significantly reduces network complexity, including the number of network parameters, the total memory, and the inference time, compared with the latest method.
(This article belongs to the Special Issue Advances in Image and Video Encoding Algorithm and H/W Design)
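The alignment step can be sketched with torchvision's DeformConv2d: offsets are predicted from the concatenated features of a neighboring frame and the reference frame, then used to sample the neighbor's features into alignment. The channel counts and the offset-prediction layer below are assumptions, not the published DCAN design.

```python
# Hypothetical deformable-convolution alignment block.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformAlign(nn.Module):
    def __init__(self, channels=64, k=3):
        super().__init__()
        # Two offsets (dy, dx) per kernel sampling position.
        self.offset = nn.Conv2d(2 * channels, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(channels, channels, k, padding=k // 2)

    def forward(self, neighbor, reference):
        # Predict sampling offsets from both feature maps, then warp the
        # neighbor frame's features toward the reference frame.
        off = self.offset(torch.cat([neighbor, reference], dim=1))
        return self.deform(neighbor, off)
```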

15 pages, 524 KiB  
Article
A Method of Deep Learning Model Optimization for Image Classification on Edge Device
by Hyungkeuk Lee, NamKyung Lee and Sungjin Lee
Sensors 2022, 22(19), 7344; https://doi.org/10.3390/s22197344 - 27 Sep 2022
Cited by 4 | Viewed by 2133
Abstract
Due to the recent increasing utilization of deep learning models on edge devices, industry demand for Deep Learning Model Optimization (DLMO) is also increasing. This paper derives a usage strategy for DLMO based on performance evaluation of light convolution, quantization, pruning, and knowledge distillation, techniques known to be excellent at reducing memory size and operation delay with a minimal accuracy drop. Through experiments on image classification, we derive feasible and optimal strategies for applying deep learning to Internet of Things (IoT) or tiny embedded devices. In particular, the DLMO strategies most suitable for each on-device Artificial Intelligence (AI) service are proposed in terms of performance factors. In this paper, we suggest the most rational algorithmic choices under very limited resource environments by utilizing mature deep learning methodologies.
(This article belongs to the Special Issue Advances in Image and Video Encoding Algorithm and H/W Design)
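Two of the DLMO techniques discussed above are directly available as standard PyTorch utilities; the sketch below applies L1 unstructured pruning followed by post-training dynamic quantization to a toy classifier. The model, sparsity level, and layer choice are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal pruning + dynamic quantization example with stock PyTorch APIs.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Zero out 40% of the smallest-magnitude weights in each Linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.4)
        prune.remove(module, "weight")  # bake the pruning mask in

# Store weights as int8 and quantize activations on the fly at inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```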

18 pages, 52184 KiB  
Article
Object-Cooperated Ternary Tree Partitioning Decision Method for Versatile Video Coding
by Sujin Lee, Sang-hyo Park and Dongsan Jun
Sensors 2022, 22(17), 6328; https://doi.org/10.3390/s22176328 - 23 Aug 2022
Viewed by 1331
Abstract
In this paper, we propose an object-cooperated decision method for efficient ternary tree (TT) partitioning that reduces the encoding complexity of versatile video coding (VVC). In most previous studies, VVC complexity was reduced using decision schemes based on the encoding context, which do not employ object detection models. We assume that high-level objects are important for deciding whether complex TT partitioning is required, because they can provide hints about the characteristics of a video. Herein, we apply an object detection model that discovers and extracts high-level object features: the number and area ratio of objects in the frames of a video sequence. Using the extracted features, we propose machine learning (ML)-based classifiers for each TT-split direction that decide whether the TT-split process can be skipped in the vertical or horizontal direction, efficiently reducing the encoding complexity of VVC. The TT-split decision of the classifiers is formulated as a binary classification problem. Experimental results show that the proposed method decreases the encoding complexity of VVC more effectively than a state-of-the-art ML-based model.
(This article belongs to the Special Issue Advances in Image and Video Encoding Algorithm and H/W Design)
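The decision step can be sketched as one binary classifier per TT-split direction over simple object-level features. The feature vector (object count and object-area ratio), the random-forest choice, and the toy data are assumptions used only to illustrate the formulation.

```python
# Illustrative per-direction TT-split skip classifier; toy data only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Features per CU: [number of objects, object-area ratio].
X_train = np.array([[0, 0.00], [2, 0.15], [5, 0.60], [1, 0.05]])
y_train = np.array([0, 1, 1, 0])  # 1 = evaluate the TT split, 0 = skip it

clf_vertical = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)

if clf_vertical.predict([[3, 0.4]])[0] == 0:
    pass  # skip the vertical TT-split RD check for this CU
```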

19 pages, 7417 KiB  
Article
A Novel Grayscale Image Encryption Scheme Based on the Block-Level Swapping of Pixels and the Chaotic System
by Muhammad Hanif, Nadeem Iqbal, Fida Ur Rahman, Muhammad Adnan Khan, Taher M. Ghazal, Sagheer Abbas, Munir Ahmad, Hussam Al Hamadi and Chan Yeob Yeun
Sensors 2022, 22(16), 6243; https://doi.org/10.3390/s22166243 - 19 Aug 2022
Cited by 11 | Viewed by 2399
Abstract
Hundreds of image encryption schemes have been proposed, as the literature review indicates. The majority of these schemes use pixels as building blocks for the confusion and diffusion operations. Pixel-level operations are time-consuming and thus not suitable for many critical applications (e.g., telesurgery), while security is of the utmost importance in designing these schemes. This study provides a scheme based on block-level scrambling with increased speed. Three streams of chaotic data are obtained through the intertwining logistic map (ILM). For a given image, the algorithm creates blocks of eight pixels. Two blocks, randomly selected from the long array of blocks, are swapped an arbitrary number of times; two streams of random numbers facilitate this process. The scrambled image is further XORed with a key image generated through the third stream of random numbers to obtain the final cipher image. Plaintext sensitivity is incorporated through SHA-256 hash codes of the given image. The suggested cipher is subjected to a comprehensive set of security evaluations, covering the key space, histogram, correlation coefficient, information entropy, differential attacks, peak signal-to-noise ratio (PSNR), noise and data loss attacks, time complexity, and encryption throughput. In particular, the computational time of 0.1842 s and the throughput of 3.3488 Mbps of this scheme outperform many published works, which bears immense promise for real-world application.
(This article belongs to the Special Issue Advances in Image and Video Encoding Algorithm and H/W Design)
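The overall pipeline, chaotic streams driving block swaps followed by an XOR with a key image, can be sketched as below. For brevity, a plain logistic map stands in for the intertwining logistic map (ILM), the SHA-256-based seeding is omitted, and the image is assumed to be a uint8 array whose size is a multiple of eight; all parameters are illustrative.

```python
# Simplified block-swap + XOR cipher; a plain logistic map replaces the ILM.
import numpy as np

def logistic_stream(x, r, n):
    """Generate n chaotic values in (0, 1) from the logistic map."""
    out = np.empty(n)
    for i in range(n):
        x = r * x * (1.0 - x)
        out[i] = x
    return out

def encrypt(img, x0=0.3141, r=3.99, swaps=1000):
    blocks = img.reshape(-1, 8).copy()                     # 8-pixel blocks
    idx = (logistic_stream(x0, r, 2 * swaps) * len(blocks)).astype(int)
    for a, b in idx.reshape(-1, 2):                        # confusion: swaps
        blocks[[a, b]] = blocks[[b, a]]
    key = (logistic_stream(x0 / 2, r, img.size) * 256).astype(np.uint8)
    return blocks.reshape(img.shape) ^ key.reshape(img.shape)  # diffusion
```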

15 pages, 3744 KiB  
Article
Fusion-Based Versatile Video Coding Intra Prediction Algorithm with Template Matching and Linear Prediction
by Dan Luo, Shuhua Xiong, Chao Ren, Raymond Edward Sheriff and Xiaohai He
Sensors 2022, 22(16), 5977; https://doi.org/10.3390/s22165977 - 10 Aug 2022
Cited by 1 | Viewed by 1560
Abstract
The new-generation video coding standard Versatile Video Coding (VVC) has adopted many novel technologies to improve compression performance, and remarkable results have been achieved. In practical applications, less data, in terms of bitrate, reduces the burden on the sensors and improves their performance. Hence, to further enhance the intra compression performance of VVC, we propose a fusion-based intra prediction algorithm in this paper. Specifically, to better predict areas with similar texture information, we propose a fusion-based adaptive template matching method, which directly takes the error between the reference and objective templates into account. Furthermore, to better utilize the correlation between the reference pixels and the pixels to be predicted, we propose a fusion-based linear prediction method, which can compensate for the deficiency of a single linear prediction. We implemented our algorithm on top of the VVC Test Model (VTM) 9.1. Compared with VVC, our fusion-based algorithm saves 0.89%, 0.84%, and 0.90% of the bitrate on average for the Y, Cb, and Cr components, respectively. In addition, compared with other existing works, our algorithm shows superior bitrate savings.
(This article belongs to the Special Issue Advances in Image and Video Encoding Algorithm and H/W Design)
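The template matching component can be sketched as a SAD search over the reconstructed area: the candidate whose template best matches the current block's template supplies the prediction. The rectangular top-only template, the search window, and the bounds handling below are simplifications and assumptions, and the search assumes enough reconstructed rows above the block.

```python
# Simplified SAD-based template matching over the reconstructed area.
import numpy as np

def tm_predict(recon, y, x, size, t=2, search=32):
    """Predict the size x size block at (y, x) from the best template match."""
    target = recon[y - t:y, x:x + size]  # template row(s) above the block
    best_sad, best_pred = np.inf, None
    for cy in range(max(t, y - search), y - size + 1):
        for cx in range(max(0, x - search), min(recon.shape[1] - size, x + search)):
            cand = recon[cy - t:cy, cx:cx + size]
            sad = np.abs(target.astype(int) - cand.astype(int)).sum()
            if sad < best_sad:  # keep the candidate with the lowest SAD
                best_sad, best_pred = sad, recon[cy:cy + size, cx:cx + size]
    return best_pred
```

A fusion-based variant, as proposed in the paper, would weight several such candidates by their template errors rather than keeping only the single best match.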

13 pages, 778 KiB  
Article
Low-Complexity Multiple Transform Selection Combining Multi-Type Tree Partition Algorithm for Versatile Video Coding
by Liqiang He, Shuhua Xiong, Ruolan Yang, Xiaohai He and Honggang Chen
Sensors 2022, 22(15), 5523; https://doi.org/10.3390/s22155523 - 25 Jul 2022
Cited by 5 | Viewed by 1483
Abstract
Despite the fact that Versatile Video Coding (VVC) achieves superior coding performance to High-Efficiency Video Coding (HEVC), it takes a long time to encode video sequences due to the high computational complexity of its tools. Among these tools, Multiple Transform Selection (MTS) requires the best of several transforms to be obtained using the Rate-Distortion Optimization (RDO) process, which increases the time spent on video encoding and makes VVC ill-suited to real-time sensor application networks. In this paper, a low-complexity multiple transform selection combined with a multi-type tree partition algorithm is proposed to address this issue. First, to skip the MTS process, we introduce a method to estimate the Rate-Distortion (RD) cost of the last Coding Unit (CU) based on the relationship between the RD costs of transform candidates and the correlation between sub-Coding Units' (sub-CUs') information entropy under binary splitting. When the sum of the RD costs of the sub-CUs is greater than or equal to that of their parent CU, the RD checking of MTS is skipped. Second, we make full use of the coding information of neighboring CUs to terminate MTS early. Experimental results show that, compared with VVC, the proposed method achieves a 26.40% reduction in encoding time with only a 0.13% increase in Bjøntegaard Delta Bitrate (BDBR).
(This article belongs to the Special Issue Advances in Image and Video Encoding Algorithm and H/W Design)
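The first skip rule reduces to a one-line comparison once the sub-CU RD costs from binary splitting are available; a sketch with an illustrative numeric example follows. The function and variable names are assumptions.

```python
# Sketch of the MTS skip rule: if the binary split is already no better
# than the parent CU, skip the remaining MTS RD checks for the parent.
def should_skip_mts(parent_rd_cost, sub_cu_rd_costs):
    return sum(sub_cu_rd_costs) >= parent_rd_cost

# Example: 5200 + 4900 = 10100 >= 9800, so MTS RD checking is skipped.
if should_skip_mts(9800.0, [5200.0, 4900.0]):
    pass  # bypass Rate-Distortion checks of the extra MTS transforms
```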

25 pages, 28793 KiB  
Article
Deep Learning Post-Filtering Using Multi-Head Attention and Multiresolution Feature Fusion for Image and Intra-Video Quality Enhancement
by Ionut Schiopu and Adrian Munteanu
Sensors 2022, 22(4), 1353; https://doi.org/10.3390/s22041353 - 10 Feb 2022
Cited by 2 | Viewed by 2334
Abstract
The paper proposes a novel post-filtering method based on convolutional neural networks (CNNs) for quality enhancement of RGB/grayscale images and video sequences. The lossy images are encoded using common image codecs, such as JPEG and JPEG2000. The video sequences are encoded using previous and ongoing video coding standards, high-efficiency video coding (HEVC) and versatile video coding (VVC), respectively. A novel deep neural network architecture is proposed to estimate fine refinement details for full-, half-, and quarter-patch resolutions. The proposed architecture is built using a set of efficient processing blocks designed based on the following concepts: (i) the multi-head attention mechanism for refining the feature maps, (ii) the weight sharing concept for reducing the network complexity, and (iii) novel block designs of layer structures for multiresolution feature fusion. The proposed method provides substantial performance improvements compared with both common image codecs and video coding standards. Experimental results on high-resolution images and standard video sequences show that the proposed post-filtering method provides average BD-rate savings of 31.44% over JPEG and 54.61% over HEVC (x265) for RGB images, Y-BD-rate savings of 26.21% over JPEG and 15.28% over VVC (VTM) for grayscale images, and 15.47% over HEVC and 14.66% over VVC for video sequences.
(This article belongs to the Special Issue Advances in Image and Video Encoding Algorithm and H/W Design)
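The attention-based refinement concept can be sketched by treating the spatial positions of a CNN feature map as a token sequence and refining them with multi-head self-attention plus a residual connection. The channel count, head count, and residual design are assumptions; the published post-filter is a much larger multiresolution network.

```python
# Minimal multi-head self-attention refinement of a CNN feature map.
import torch
import torch.nn as nn

class AttnRefine(nn.Module):
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, feat):
        # feat: (N, C, H, W) feature map from the post-filter backbone.
        n, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)        # (N, H*W, C)
        refined, _ = self.attn(tokens, tokens, tokens)  # self-attention
        return feat + refined.transpose(1, 2).reshape(n, c, h, w)
```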
