Article

A Novel Online Correlation Noise Model Based on Band Coefficients Mean to Achieve Low Computational and Coding-Efficient Distributed Video Codec

by Shahzad Khursheed 1,*, Nasreen Badruddin 1,*, Varun Jeoti 2 and Manzoor Ahmed Hashmani 3

1 Department of Electrical and Electronic Engineering, Institute of Health and Analytics, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia
2 Faculty of Technical Sciences, University of Novi Sad, 21000 Novi Sad, Serbia
3 High Performance Cloud Computing Center (HPC3), Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia
* Authors to whom correspondence should be addressed.
Appl. Sci. 2022, 12(13), 6505; https://doi.org/10.3390/app12136505
Submission received: 29 April 2022 / Revised: 8 June 2022 / Accepted: 8 June 2022 / Published: 27 June 2022
(This article belongs to the Special Issue Advances on Image, Video and Signal Processing)

Abstract: Distributed video coding (DVC) is a novel coding paradigm that offers low-complexity encoding relative to conventional video-coding frameworks, at the expense of high decoding computational complexity. The challenging part of this video-coding framework is achieving a rate-distortion (RD) performance comparable to that of conventional codecs. A suitable and accurate correlation noise model (CNM) is crucial in improving the RD performance by achieving high coding efficiency and making decoding less computationally demanding. Since the correlation noise is nonstationary and time-variant and can vary from frame to frame, offline CNM estimation is not feasible for practical applications and real-time decoding. An online CNM may be the solution to this problem. In DVC, the Wyner–Ziv frame (WZF) is not available at the decoder, and the estimated side information (SI) of the corresponding WZF is not available at the encoder. Therefore, online estimation of the CNM and its parameters can be quite challenging. The contribution of this research work is a novel online CNM, computed by taking the mean of each transformed coefficient band and deployed in two different codecs. Our proposed codec, DIVCOM, which stands for "Distributed Video Coding with Online Band Mean Correlation Noise Model", outperforms the existing baseline codec, DISCOVER (DIS), in both coding efficiency and peak signal-to-noise ratio (PSNR): DIVCOM achieves bit-rate savings of up to 8.05 kbps, with PSNR gains ranging from 0.0245 dB to 0.18 dB. An extended version of DIVCOM incorporating phase-based side information, called PDIVCOM, achieves bit-rate savings of up to 10.9 kbps, with PSNR gains ranging from 0.019 dB to 0.17 dB compared to DIS.

1. Introduction

Many upstream applications of wireless video sensor networks (WVSN), such as multimedia sensor networks, real-time wireless video surveillance, environment monitoring, medical monitoring, and the Internet of Things (IoT), have emerged in recent years [1,2,3,4]. Battery size, battery life, and other limitations of the transmitting device are of paramount importance in such applications [5]. For such low-resource upstream applications, low computational complexity at the encoder is desirable [6]. One of the video coding paradigms that supports these requirements is distributed video coding (DVC) [7]. This coding paradigm redistributes the coding complexity, shifting most of the computation to the decoder by exploiting the source statistics at the decoder [8] and making the encoder computationally light [9]. However, this coding design is not yet fully mature, and it is receiving growing attention from the research community aimed at improving its performance relative to conventional codecs for real-time and resource-constrained applications [10,11].
DVC is based on the Slepian–Wolf (SW) [12] and Wyner–Ziv (WZ) [13] theorems of information theory. According to the SW theorem, if two correlated sources are encoded separately and decoded jointly, the minimum achievable coding rate is the same as that of joint encoding and decoding. Wyner and Ziv obtained similar bounds for lossy coding in the presence of decoder side information (SI). The two major categories of DVC are block-based and frame-based frameworks. Most DVC codecs in the literature are based on the frame-based Stanford architecture [14], which can be divided into two types: pixel-domain WZ (PDWZ) coding and transform-domain WZ (TDWZ) coding. TDWZ coding is widely used in the literature due to its coding efficiency, and the DISCOVER codec (DIS) [10] is considered the state-of-the-art codec based on this architecture [14].
In TDWZ DVC schemes, the video is split into frames, and these frames are organized into groups of pictures (GOPs) of size 2, 4, or 8. For every GOP, the first frame is the key frame (KF) and is encoded with intra-predictive coding (H.264 intra-encoder). The remaining GOP frames, called Wyner–Ziv frames (WZFs), are encoded with WZ coding. Each WZF is first split into 4 × 4 blocks, and the discrete cosine transform (DCT) is applied to each block. The corresponding coefficients from each block are organized into 16 coefficient bands, and each coefficient band is quantized to the desired quality. For each band, the bit-planes are extracted after quantization and channel encoded (turbo or LDPC) to generate parity bits. These parity bits are stored in a buffer and transmitted on the decoder's request through a feedback channel. At the decoder, the KF is decoded first. Then the SI, which is an approximate replica of the current WZF, is estimated with motion-compensated interpolation or extrapolation of the previous and subsequent decoded frames. One challenge in DVC is to find a correlation noise model (CNM) that accurately models the statistical dependency, in the form of virtual-channel correlation noise, between the current WZF and its corresponding SI. A more accurate model leads to improved coding efficiency and rate-distortion (RD) performance.
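To make the band organization concrete, the following minimal Python sketch (an illustrative assumption, not the DISCOVER implementation; the zig-zag band ordering is omitted) applies a 4 × 4 block DCT to a frame and groups the corresponding coefficients into 16 bands:

```python
import numpy as np
from scipy.fft import dctn

def dct_coefficient_bands(frame):
    # Split the frame into 4x4 blocks, apply a 2D DCT to each block, and
    # group corresponding coefficients into 16 bands (band 0 collects the
    # DC coefficient of every block, band 15 the highest-frequency one).
    h, w = frame.shape
    assert h % 4 == 0 and w % 4 == 0
    bands = [[] for _ in range(16)]
    for i in range(0, h, 4):
        for j in range(0, w, 4):
            coeffs = dctn(frame[i:i + 4, j:j + 4].astype(float), norm='ortho')
            for k, c in enumerate(coeffs.flatten()):
                bands[k].append(c)
    return np.array(bands)  # shape: (16, number of blocks)

# Example: 16 bands of 1584 coefficients each for a QCIF luminance frame
bands = dct_coefficient_bands(np.random.randint(0, 256, (144, 176)))
print(bands.shape)  # (16, 1584)
```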
In our previous research work [15], we developed a phase-based frame interpolation algorithm that generates the SI faster and with lower computational complexity than the baseline codec, DIS. In this paper, however, our main objective is to design an online CNM framework that can be used to develop a full DVC codec with lower decoding computational complexity that can compete with DIS in coding efficiency and RD performance. In most DVC codecs, computing the error distribution between the original WZF and its corresponding SI is challenging due to the unavailability of the SI at the encoder and of the original WZF at the decoder. Therefore, in this paper, we design a framework for the online estimation of the CNM's Laplacian distribution parameter based on the means of the coefficient bands. The main contributions of our research work are the following:
  • Designed a framework for an online correlation noise model (CNM) which was implemented in the DIS codec. This implementation is called DIVCOM, which stands for Distributed Video Coding with Online Band Mean Correlation Noise Model.
  • Implemented the online CNM in the codec developed in a previous work of the authors and presented in [15]. This implementation is called PDIVCOM, which stands for Phase-based Distributed Video Coding with Online Band Mean Correlation Noise Model.
  • Evaluated and compared the performance of the codecs with the proposed online CNM.
The rest of the paper is organized as follows: related work is covered in Section 2, followed by a comprehensive explanation of the DVC framework with the proposed online correlation noise model (CNM) estimation, and its mathematical formulation in Section 3. The results are presented and elaborated in Section 4, followed by a conclusion and future research work in Section 5.

2. Related Work

The estimation of an accurate CNM between the correlated sources, at either the encoder or the decoder, with fully unknown or partially known data [16,17], is a key problem in DVC. Consequently, a large body of literature is devoted to CNM estimation, but most of it relies on an unrealistic offline assumption [11]. In DVC, the overall coding efficiency and RD performance of a codec improve with an accurate estimation of the CNM parameters.
In a DVC codec, the original WZF is not available at the decoder, and the estimated SI of the corresponding WZF is not available at the encoder. Therefore, offline approaches calculate the CNM parameter of the distribution model by assuming that either the SI is available at the encoder or the actual WZF is available at the decoder. In this approach, the parameter is calculated offline for the whole video sequence and later used in the decoder. This is one of the biggest hurdles in the practical implementation of DVC, because offline estimation of the CNM does not perform well for all types of motion content. Furthermore, the correlation noise is not stationary, and its statistics vary from frame to frame. Offline CNM estimation therefore cannot deliver high coding efficiency and RD performance across different sequences because, when implemented at the decoder, it does not exploit the frame-to-frame variability of the correlation noise for the actual motion. Estimating the parameters is also a complex task, since the original information is not available at the decoder and the SI quality varies throughout the sequence. If the model accurately describes the relationship between the WZF and the SI, the coding performance is high, and vice versa. A Laplacian distribution model is applied in most architectures because of its excellent trade-off between model accuracy and complexity.
Hence, the focus has shifted to online CNM parameter estimation without access to the original WZF. The authors of [18] proposed several online correlation noise schemes for pixel-based coding at different granularity levels (frame, block, and pixel) by exploiting the temporal correlation between the decoded KF and the estimated SI. In an extended work [19], the online CNM was adapted to the transform domain at the band level and the coefficient level. Enhancements were made in [20] to the codec in [19] with the introduction of cross-band correlation, computed from residuals between bands, and a band classification map, which is updated after the successful decoding of each band. With cross-band correlation, the classification map of the current band is used to estimate the classification map of the next band, and the CNM parameter is calculated from this estimated cross-band classification. In [21], the correlation noise parameter is estimated at the DCT band level, where the SI is refined after the decoding of each bit-plane; the authors therefore attempted to refine the CNM as well. The authors of [22] sought to control the rate without a feedback channel: based on the motion intensity, their algorithm adjusts the rate by switching the Laplacian distribution parameters of the CNM between the frame and block levels. The authors of [23] proposed an adaptive low-complexity DVC that estimates the correlation by using expectation propagation during the channel-decoding process, carrying out the correlation estimation jointly with the decoding of the factor graph.
In [24], the authors proposed a CNM that is executed independently at both the encoder and the decoder at no extra computational cost to the encoder. At the encoder, the CNM calculates the number of least significant bits that must be sent to the decoder, assuming that the remainder can be determined at the decoder. In [25], the authors proposed a hash-based DVC in which the correlation noise is statistically dependent on the SI. Their algorithm performs online estimation of the SI-dependent correlation noise parameter at the transform-coefficient band level and successively refines the correlation parameter after decoding each bit-plane.
The authors of [26] attempted to reduce the deviation between the Laplacian statistical distribution model and the small and large residual coefficients by proposing a hybrid-distribution correlation noise model. This hybrid model is based on K-Medoids clustering and the Cauchy distribution: all residual coefficients are clustered into small- and large-coefficient clusters, where the small-coefficient cluster is modeled with K-Medoids to improve on the Laplacian distribution, while the large residual coefficients are modeled with a Cauchy distribution.
On the other hand, the codec in [27] used parallel LDPC decoding to estimate the correlation noise parameters and decode on a factor graph, estimating one parameter for each band. The codec in [11] extended the work of [27] and proposed a recursive variational Bayes factor graph. Furthermore, it deployed a new message-passing algorithm that decodes the bit-planes corresponding to each band while simultaneously estimating and refining the correlation noise parameter.
Correlation noise parameter estimation plays an important role in achieving coding efficiency. In most codecs, highly computational motion-compensated temporal interpolation (MCTI) [10] is performed for SI generation, and the resulting motion vectors are utilized for correlation noise parameter estimation. However, in the codec presented in [15], an empirical study of a phase-based fast frame interpolation algorithm was conducted to generate SI in DVC quickly and with low computational cost when motion vectors are not available. For such codecs, an online CNM framework is needed so that the low-complexity SI generation can be paired with a coding-efficient codec framework. In this paper, our main focus is to design an online CNM framework for such codecs, leading to a full DVC codec with lower decoding computational complexity that can compete with DIS in terms of coding efficiency and RD performance. Therefore, in this study, an attempt has been made to design a framework for the online estimation of the CNM's Laplacian distribution parameter based on the means of the coefficient bands.

3. Proposed DVC Framework for Online Correlation Noise Model

This section addresses the basic concept and implementation of the proposed codec and presents the proposed online correlation noise model methodology. We first present the general concept of the proposed online CNM based on the mean of the band coefficients, before moving on to its mathematical modeling.

3.1. General Concept of Proposed Online Correlation Noise Model

Figure 1 depicts the proposed DVC framework for calculating the online correlation noise model (CNM) for the DVC frameworks of [10,15]. In [15], motion estimation is not exploited at the decoder, and the actual frame is also not available there; online CNM computation is therefore quite a challenging task in such a DVC framework. With this in mind, we propose a novel online CNM based on the means of the band coefficients. In this framework, a Band Mean Calculator (BMC) is deployed at both the encoder and the decoder to perform the calculations used in the CNM.
At the encoder, the BMC calculates the mean of the coefficients of each band of the WZF and sends these means to the decoder. The BMC deployed at the decoder calculates the mean of each coefficient band of the side information (SI) in the same way as at the encoder. The Laplacian distribution is then used to model, for each band, the residual between the means of the original WZF and the corresponding SI frame. The CNM calculation details and mathematical representation are given in Section 3.2. The SI generation process from [15] and the conventional interpolation in [10] were used in this study; a brief overview of the phase-based SI generation algorithm from [15] is presented in Algorithm 1. We deployed and verified the performance of this novel online CNM concept for the DIS codec and named the result Distributed Video Coding with Online Band Mean Correlation Noise Model (DIVCOM). The same online CNM concept was then deployed for the DVC codec with the phase-based SI [15], and we call that implementation Phase-based Distributed Video Coding with Online Band Mean Correlation Noise Model (PDIVCOM).
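As a minimal sketch of the BMC step (illustrative Python; the variable names and the stand-in random data are assumptions), the band means of Equations (23) and (24) below and their residual, Equation (22), can be computed as:

```python
import numpy as np

def band_means(bands):
    # Band Mean Calculator (BMC): mean of the M coefficients in each of
    # the 16 bands; run on the WZF bands at the encoder and on the SI
    # bands at the decoder.
    return bands.mean(axis=1)  # bands: (16, M) -> 16 band means

rng = np.random.default_rng(0)
wz_bands = rng.normal(size=(16, 1584))                        # stand-in WZF DCT bands
si_bands = wz_bands + rng.normal(scale=0.5, size=(16, 1584))  # stand-in noisy SI bands

mu_wz = band_means(wz_bands)  # computed at the encoder and sent to the decoder
mu_si = band_means(si_bands)  # computed at the decoder
mu_r = mu_wz - mu_si          # per-band residual means fed to the Laplacian CNM
```

Only the 16 values of mu_wz need to be transmitted, which keeps the extra encoder computation and rate overhead small.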

3.2. Mathematical Representation of Phase Interpolation for Side Information Generation

Standard interpolation approaches, e.g., optical flow, require accurate pixel correspondences between images to interpolate the in-between frames. In DVC, motion-compensated temporal interpolation (MCTI) is usually used for SI generation. In this paper, however, we implement phase-based interpolation for SI at the decoder. This technique represents motion as phase shifts of individual pixels, and its computational cost is a fraction of that of traditional SI interpolation techniques. In particular, the phase-based technique decomposes the frames into local phase and amplitude parameters using the complex-valued steerable pyramid, which decomposes a one-dimensional (1D) or two-dimensional (2D) signal into spatial scale, orientation, and position [28]. In videos, the phase-based interpolation method manipulates motion by analyzing the local phase signal over time at different spatial scales and orientations, mitigating the need for highly complex global computation. Computational efficiency comes from the phase-based representation, which only performs per-pixel modifications, representing the motion of a pixel by shifting its phase [29]. Further efficiency is achieved through a phase-shift correction method that combines phase information across the levels of a multi-scale pyramid. Successful image transitions are achieved by a correction algorithm that adapts both the amplitude and the phase shift of the image. Before turning to the actual image interpolation, we first explain the working concept for a basic one-dimensional (1D) signal.
The Fourier shift theorem motivates the assumption that motion can be encoded using phase differences. Let us consider the 1D case, in which any function f(x) can be represented in the Fourier domain as a sum of complex sinusoids over all frequencies ω by Equation (1):

f(x) = \sum_{\omega=-\infty}^{+\infty} A_\omega e^{i\omega x} = \sum_{\omega=-\infty}^{+\infty} A_\omega e^{i\phi_\omega}        (1)
where φ_ω and A_ω represent the phase and amplitude of the complex sinusoid, respectively. The version of f(x) shifted by a spatial displacement δ(t) is given by Equation (2):

f(x + \delta(t)) = \sum_{\omega=-\infty}^{+\infty} A_\omega e^{i\omega (x + \delta(t))}        (2)
The phase difference, φ_diff(ω), between the original and shifted functions can be represented by Equation (3):

\phi_{diff}(\omega) = \omega (x + \delta(t)) - \omega x = \omega\,\delta(t)        (3)
The phase shift φ_shift, which corresponds to the actual spatial displacement between the translated functions, is defined as the phase difference φ_diff(ω) between the two phase curves scaled by the angular frequency ω, as given in Equation (4):

\phi_{shift} = \frac{\phi_{diff}(\omega)}{\omega}        (4)
In the Fourier transform domain, the shifted function is defined as the sum of complex sinusoids over all frequencies ω, as given by Equation (5):

f(x + \delta(t)) = \sum_{\omega=-\infty}^{+\infty} R_\omega(x, t)        (5)
where each sinusoid represents one band, i.e., R_\omega(x,t) = A_\omega e^{i\omega(x+\delta(t))}. For the intermediate sinusoids representing the translational functions, the phase difference is modified according to the intermediate position between the functions, defined by a weight β ∈ (0, 1). The modified bands Ř_ω(x,t), with corresponding modified phase φ_ω = ω(x + βδ(t)), are given by Equation (6):

\check{R}_\omega(x, t) = A_\omega e^{i\omega (x + \beta\,\delta(t))}        (6)
The in-between functions are then obtained by integrating the modified bands following Equation (5).
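The 1D concept of Equations (1)–(6) can be demonstrated in a few lines of Python. The sketch below is an illustration under the assumption that the second signal is a pure circular shift of the first; it interpolates the per-frequency phase by β and blends the amplitudes:

```python
import numpy as np

def phase_interpolate_1d(f1, f2, beta):
    # Decompose into frequency bands (Equation (1)) via the FFT.
    F1, F2 = np.fft.fft(f1), np.fft.fft(f2)
    # Per-frequency phase difference, wrapped to [-pi, pi] (Equation (3)).
    phi_diff = np.angle(F2 * np.conj(F1))
    # Modified band phase phi_1 + beta * phi_diff (Equation (6)), with a
    # linearly blended amplitude.
    amp = (1 - beta) * np.abs(F1) + beta * np.abs(F2)
    F_beta = amp * np.exp(1j * (np.angle(F1) + beta * phi_diff))
    # Integrate the modified bands (Equation (5)).
    return np.real(np.fft.ifft(F_beta))

x = np.linspace(0, 2 * np.pi, 256, endpoint=False)
f1, f2 = np.sin(3 * x), np.sin(3 * (x + 0.1))        # f2 is f1 shifted by 0.1
mid = phase_interpolate_1d(f1, f2, 0.5)              # ~ sin(3*(x + 0.05))
print(np.max(np.abs(mid - np.sin(3 * (x + 0.05)))))  # small interpolation error
```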

3.2.1. Two-Dimensional Functions—General Mathematical Representation

For two-dimensional (2D) functions, the sinusoids are separated into bands according to frequency ω as well as spatial orientation ϑ (i.e., the even and odd phase orientations), using the complex-valued steerable pyramid filter [30,31,32]. This filter decomposes the input images into several oriented frequency bands R_{ω,ϑ}, and thus allows meaningful phase measurements from the pyramid coefficients. The real part of each coefficient has a cosine form and is known as the even-symmetric filter response, whereas the imaginary part has a sine form and is known as the odd-symmetric filter response. The non-oriented content that is not captured at the pyramid levels is summarized in real-valued high-pass and low-pass residual signal components.

3.2.1.1. Phase Computation Steps

During the phase computation, the complex-valued response R_{ω,ϑ} is obtained by applying the steerable filters ψ_{ω,ϑ} [30] to the image I, as represented in Equations (7)–(9):

R_{\omega,\vartheta}(x, y) = (I * \psi_{\omega,\vartheta})(x, y)        (7)

= A_{\omega,\vartheta}(x, y)\, e^{i\phi_{\omega,\vartheta}(x, y)}        (8)

= C_{\omega,\vartheta}(x, y) + i\, S_{\omega,\vartheta}(x, y)        (9)
where C_{ω,ϑ} and S_{ω,ϑ} denote the cosine and sine parts, respectively. As stated before, the cosine part C_{ω,ϑ} and the sine part S_{ω,ϑ} represent the even- and odd-symmetric filter responses, respectively. The amplitude and phase components can therefore be computed by Equations (10) and (11), respectively:

A_{\omega,\vartheta}(x, y) = \sqrt{C_{\omega,\vartheta}(x, y)^2 + S_{\omega,\vartheta}(x, y)^2}        (10)

\phi_{\omega,\vartheta}(x, y) = \arctan\left(\frac{S_{\omega,\vartheta}(x, y)}{C_{\omega,\vartheta}(x, y)}\right)        (11)
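A small 1D example may help to illustrate Equations (7)–(11). The complex Gabor kernel below is an assumption for illustration only; the paper uses the complex steerable pyramid filters of [30]:

```python
import numpy as np

# Quadrature (even/odd) filter pair packed into one complex kernel: the real
# part is the cosine (even-symmetric) filter and the imaginary part the sine
# (odd-symmetric) filter.
n = np.arange(-8, 9)
omega0, sigma = 0.8, 3.0
psi = np.exp(-n**2 / (2 * sigma**2)) * np.exp(1j * omega0 * n)

signal = np.cos(0.8 * np.arange(128) + 1.0)
R = np.convolve(signal, psi, mode='same')  # R = C + iS (Equation (9))
C, S = R.real, R.imag
A = np.sqrt(C**2 + S**2)                   # amplitude (Equation (10))
phi = np.arctan2(S, C)                     # phase (Equation (11))
```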

3.2.1.2. Phase Difference Calculation Steps

As stated earlier, in phase interpolation the motion is represented by a phase shift; interpolating the phase shift therefore requires the phase difference φ_diff between the phases of the two input images, φ_1 and φ_2, computed by Equation (12):

\phi_{diff} = \operatorname{atan2}\left(\sin(\phi_1 - \phi_2),\ \cos(\phi_1 - \phi_2)\right)        (12)

where atan2(·) is the four-quadrant inverse tangent, yielding angular values in [−π, π] that correspond to the angular difference between the two input image phases. It also determines the limit of the representable motion, which is bounded by Equation (13):

|\phi_{shift}| = \frac{|\phi_{diff}|}{\omega} \le \frac{\pi}{\omega}        (13)
where ω = 2πυ and υ denotes the spatial frequency of the pyramid level. In a multi-scale pyramid, each level represents a particular band of spatial frequencies υ ∈ [υ_min, υ_max]. If υ_max corresponds to the highest frequency of a level, then a phase difference of π represents a shift of one pixel. This is a reasonable shift at coarser pyramid levels for low-frequency content; for high-frequency content, however, it is too limiting to achieve realistic interpolation.

3.2.1.3. Phase Shifting and Correction

To avoid the phase ambiguity caused by large displacements corresponding to a phase difference of more than π, and since the phase is periodic, the phase difference is normalized to the range [−π, π].
Interpolation works accurately as long as the shift computed at a particular level mainly captures the frequency content corresponding to the correct motion. Therefore, a shift correction based on confidence estimation is deployed to correctly estimate the motion even for high-frequency content. This approach robustly interpolates the correct motion for high frequencies by taking all available shift information into account. It assumes that the phase differences of two adjacent resolution levels do not differ arbitrarily, i.e., the phase differences between levels can be used as a confidence measure that quantifies whether the estimated phase shift is admissible.
The phase shift correction is performed at level l if the shift computed at level l differs from that of the coarser level l + 1 by more than a threshold. The correction first adds multiples of ±2π to φ_diff, so that the absolute difference between the phase values of consecutive levels never exceeds a given tolerance. A tolerance of π is used, which modifies the phase values such that the phase difference of a pixel between two levels can never be larger than π. This step allows a meaningful range extension, because the original phase differences are truncated to the range [−π, π]. The actual shift correction depends on the difference between the two levels, which is used for the confidence estimation and is calculated using Equation (14):

\varphi = \operatorname{atan2}\left(\sin(\phi_{diff}^{\,l} - \phi_{diff}^{\,l+1}),\ \cos(\phi_{diff}^{\,l} - \phi_{diff}^{\,l+1})\right)        (14)
where the phase value of the coarser level is scaled according to the pyramid scale factor λ > 1. When |φ| > π/2, the shift correction is performed to obtain the corrected phase difference using Equation (15), which leads to a considerably better interpolation result:

\tilde{\phi}_{diff}^{\,l} = \lambda\, \phi_{diff}^{\,l+1}        (15)
Although the phase shift correction can model large motion, it can still introduce blur artifacts for some motions. Therefore, an additional enhancement step limits even admissible phase shifts to well-representable motions, bounding the phase difference by a constant, φ_limit, defined by Equation (16):

\phi_{limit} = \tau\, \frac{\pi}{\lambda^{\,L-l}}        (16)
where τ ∈ (0, 1) defines the percentage of limitation, L is the total number of levels, λ is the scale factor, and l is the current level. At the coarser level, if the magnitude of the phase difference exceeds φ_limit, the corrected phase difference is set to zero.
The next step is a smooth phase interpolation between the phases of input image 1, φ_1, and input image 2, φ_2. For convenience, from here onwards we drop the superscript l denoting the pyramid level. As a result of the shift correction, there is no guarantee that φ_1 + φ̃_diff matches φ_2 or any of the equivalent multiples φ_2 + γ·2π with γ ≠ 0. For smooth interpolation, the original phases φ_1 and φ_2 are preserved along with the shift-corrected phase difference φ̃_diff. The phase adjustment is therefore performed using Equation (17):

\check{\phi}_{diff} = \phi_{diff} + \gamma^{*} \cdot 2\pi        (17)

which gives the adjusted phase difference φ̌_diff, where γ* is calculated using Equation (18):

\gamma^{*} = \operatorname*{argmin}_{\gamma}\left\{\left(\tilde{\phi}_{diff} - (\phi_{diff} + \gamma \cdot 2\pi)\right)^{2}\right\}        (18)

After the phase adjustment, the phase of the interpolated image, φ_β, is computed using Equation (19) from the adjusted phase difference φ̌_diff and the phase of one of the images, e.g., φ_1:

\phi_{\beta} = \phi_{1} + \beta\, \check{\phi}_{diff}        (19)
The final step is the reconstruction of the interpolated image, which requires both the interpolated phase and the interpolated amplitude for smooth interpolation [33].
Algorithm 1 summarizes the steps involved in the interpolation of the SI using the phase interpolation method. The inputs are the previous and next keyframes (KFs), represented by I_1 and I_2, respectively, and the interpolation parameter β; the interpolated SI is denoted by I_SI. The phase interpolation process is initialized with the steerable pyramid decompositions of both keyframes (I_1 and I_2) and the calculation of their respective amplitudes (A_1 and A_2). Next, the corresponding phases, φ_1 and φ_2, and the phase differences are calculated. For a smooth interpolation of I_SI, level-by-level shift correction and phase adjustment of the phase difference are performed. The phase of the interpolated image, φ_β, and its amplitude, A_β, are interpolated in steps 7 and 8, respectively, and then recombined to generate the interpolated pyramid. Finally, the interpolated pyramid is used to reconstruct the interpolated image, I_SI.
Algorithm 1 Side Information Generation with Phase Interpolation Process
INPUTS. Two input images: I_1 and I_2, the previous and next keyframes (KFs), respectively.
Interpolation parameter: β
INITIALIZATION. Steerable pyramid decompositions: P_1 and P_2
Amplitude calculation: A_1 and A_2
OUTPUT. Interpolated side information: I_SI
Step 1.  (P_1, P_2) ← Decompose(I_1, I_2)            Equations (7)–(9), refer to [30]
Step 2.  (A_1, A_2) ← Amplitude(I_1, I_2)            Equation (10)
Step 3.  (φ_1, φ_2) ← Phase(P_1, P_2)                Equation (11)
Step 4.  φ_diff ← PhaseDifference(φ_1, φ_2)          Equation (12)
Step 5.  for all l = L−1 : 1 do
             φ̃_diff^l ← ShiftCorrection(φ̃_diff^{l+1})    Section 3.2.1.3
         end for
Step 6.  φ̌_diff ← AdjustPhase(φ_diff, φ̃_diff)       Equation (17)
Step 7.  φ_β ← Interpolate(φ_1, φ̌_diff, β)           Equation (19)
Step 8.  A_β ← Blend(A_1, A_2, β)                    Section 3.2.1.3, refer to [33]
Step 9.  P_β ← Recombine(φ_β, A_β)                   Refer to [30]
Step 10. I_SI ← Reconstruct(P_β)
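A compact Python sketch of steps 5–7 of Algorithm 1 is given below. It is an illustration under stated assumptions: the per-level phase differences are stored fine (l = 0) to coarse (l = L − 1) as equal-sized arrays (the upsampling between pyramid levels is omitted), the limiting of Equation (16) is applied at every level, and the parameter values are placeholders:

```python
import numpy as np

def wrap(p):
    # Wrap phase values to [-pi, pi] via atan2 (Equations (12) and (14)).
    return np.arctan2(np.sin(p), np.cos(p))

def shift_correction(phi_diff_levels, lam=2.0, tau=0.5):
    # Coarse-to-fine shift correction (Section 3.2.1.3) of per-level phase
    # differences; lam is the pyramid scale factor, tau the limiting percentage.
    L = len(phi_diff_levels)
    corrected = [p.copy() for p in phi_diff_levels]
    for l in range(L - 2, -1, -1):
        scaled_coarse = lam * corrected[l + 1]             # Equation (15)
        disagreement = wrap(corrected[l] - scaled_coarse)  # Equation (14)
        bad = np.abs(disagreement) > np.pi / 2
        corrected[l] = np.where(bad, scaled_coarse, corrected[l])
        limit = tau * np.pi / lam ** (L - l)               # Equation (16)
        corrected[l] = np.where(np.abs(corrected[l]) > limit, 0.0, corrected[l])
    return corrected

def adjust_and_interpolate(phi1, phi_diff, phi_diff_corr, beta):
    # Phase adjustment (Equations (17) and (18)): the integer gamma* that
    # minimizes the squared distance is the nearest-integer solution.
    gamma = np.round((phi_diff_corr - phi_diff) / (2 * np.pi))
    phi_check = phi_diff + gamma * 2 * np.pi
    return phi1 + beta * phi_check                         # Equation (19)
```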

3.3. Mathematical Representation of Proposed Online Correlation Noise Model Estimation

In conventional video coding, the distribution of the motion-compensated residual coefficients is modeled using the Laplacian distribution [34]. Different distribution models such as Gaussian distribution are found in the literature; however, the Laplacian distribution is often chosen to obtain a good trade-off between model complexity and accuracy [35]. Due to this trade-off feature, the Laplacian distribution is widely adopted in DVC for modeling the CNM.
Ideally, the calculation of the correlation noise parameter would be done with the SI at the encoder or with the actual WZF at the decoder. Since this is not practical, various codecs perform online correlation noise parameter estimation by exploiting the temporal correlation at the decoder [18], at the cost of extra computation. In this work, we propose a novel frame-level online CNM parameter calculation method using the transformed coefficient bands. For an offline CNM, the Laplacian distribution of the residual between the WZF and SI transformed coefficients at position (x, y), with Laplacian distribution parameter α, is given by Equation (20):

p\left[WZ(x, y) - SI(x, y)\right] = \frac{\alpha}{2} \exp\left[-\alpha\, |WZ(x, y) - SI(x, y)|\right]        (20)

where α = \sqrt{2/\sigma^2} determines the error distribution between WZ(x, y) and SI(x, y), and σ² denotes the variance [19].
To compute the online CNM at the decoder using the means of the transformed coefficient bands, the Laplacian distribution parameter that determines the error distribution between the original frame and the SI must be estimated. For the proposed frame-level Laplacian distribution model, this estimation is a crucial step. Since, in our proposed method, the Laplacian distribution model is computed at the frame level, the Laplacian distribution parameter is denoted as α_f. The steps involved in estimating α_f are elaborated next.
Let μ^R be the set of the means of the residuals of the coefficient bands, μ_b^R. For a total of 16 coefficient bands in a frame,

\{\, \mu_b^R \mid \mu_b^R \in \mu^R,\; b = 1, 2, \ldots, 16 \,\}        (21)

and

\mu_b^R = \mu_b^{WZ} - \mu_b^{SI}, \quad b = 1, 2, \ldots, 16        (22)
where μ_b^WZ and μ_b^SI are the means of the WZF and SI transformed coefficient bands, respectively. The mean μ_b^WZ is calculated at the encoder using Equation (23):

\mu_b^{WZ} = \frac{1}{M} \sum_{j=1}^{M} X_{DCT,b}(j)        (23)
In (23), X_{DCT,b}(j) is the transformed coefficient of the WZF for the j-th coefficient in band b. Correspondingly, μ_b^SI is the mean of the transformed coefficients of the SI band, calculated at the decoder using Equation (24):

\mu_b^{SI} = \frac{1}{M} \sum_{j=1}^{M} Y_{DCT,b}(j)        (24)
In (24), Y_{DCT,b}(j) is the transformed coefficient of the SI for the j-th coefficient in band b. In both (23) and (24), j = 1, 2, …, M, where M is the total number of coefficients in a band.
Next, the variance σ_f² of the proposed residual means μ^R is calculated using Equation (25):

\sigma_f^2 = E\left[(\mu^R)^2\right] - \left(E[\mu^R]\right)^2        (25)
where E[μ^R] and E[(μ^R)²] are computed using Equations (26) and (27), respectively:

E[\mu^R] = \sum_{b=1}^{16} \mu_b^R\, P_b        (26)

E\left[(\mu^R)^2\right] = \sum_{b=1}^{16} (\mu_b^R)^2\, P_b        (27)
In (26) and (27), P_b denotes the probability of occurrence of μ_b^R in μ^R.
Finally, the Laplacian distribution parameter α_f is calculated using Equation (28):

\alpha_f = \begin{cases} \dfrac{1}{2 \times 10^2 \times \sigma_f^2}, & 0 < \sigma_f^2 \le 1 \\ \dfrac{1}{10 \times \mu^R_{sum}}, & 1 < \sigma_f^2 < 100 \ \text{and} \ \mu^R_{sum} \le 50 \\ \dfrac{1}{20 \times \mu^R_{sum}}, & 1 < \sigma_f^2 < 100 \ \text{and} \ \mu^R_{sum} > 50 \\ \dfrac{1}{\mu^R_{sum}}, & \sigma_f^2 \ge 100 \end{cases}        (28)
where μ^R_sum is the sum of the absolute values of all residuals in μ^R, given by Equation (29):

\mu^R_{sum} = \sum_{b=1}^{16} |\mu_b^R|        (29)
In (29), |μ_b^R| represents the absolute mean value of the residual between the WZF and its corresponding SI for coefficient band b, where b = 1, 2, …, 16. Therefore, for the frame-level Laplacian distribution, Equation (20) becomes Equation (30):

p\left[|\mu_b^R|\right] = \frac{\alpha_f}{2} \exp\left[-\alpha_f\, |\mu_b^R|\right]        (30)
Algorithm 2 performs the step-by-step calculation of the Laplacian distribution parameter α_f under the proposed CNM methodology. The process starts by computing μ_b^WZ and μ_b^SI and, subsequently, μ_b^R for all b. All the values of μ_b^R are then used to compute the variance σ_f² in step 4. Next, the Laplacian distribution parameter α_f is calculated in step 5. Finally, the procedure is repeated for subsequent frames.
Algorithm 2 Steps for the Calculation of the Laplacian Distribution Parameter α_f under the Proposed Online Correlation Noise Model Methodology
INPUTS. Transformed coefficients of each band: X_DCT(j), Y_DCT(j)
INITIALIZATION. Means of the coefficient bands of the WZF
OUTPUT. α_f
Step 1.  for all b = 1 : 16 do
             Calculate μ_b^WZ                          Equation (23)
         end for
Step 2.  for all b = 1 : 16 do
             Calculate μ_b^SI                          Equation (24)
         end for
Step 3.  Calculate the residual of the corresponding μ_b^WZ and μ_b^SI
         for all b = 1 : 16 do
             μ_b^R ← μ_b^WZ − μ_b^SI                   Equation (22)
         end for
Step 4.  Calculate σ_f²
         σ_f² ← E[(μ^R)²] − (E[μ^R])²                  Equations (25)–(27)
Step 5.  Calculate α_f                                 Equation (28)
         μ^R_sum ← Σ_{b=1}^{16} |μ_b^R|                Equation (29)
         if 0 < σ_f² ≤ 1 then
             α_f ← 1 / (2 × 10² × σ_f²)
         else if 1 < σ_f² < 100 and μ^R_sum ≤ 50 then
             α_f ← 1 / (10 × μ^R_sum)
         else if 1 < σ_f² < 100 and μ^R_sum > 50 then
             α_f ← 1 / (20 × μ^R_sum)
         else (σ_f² ≥ 100)
             α_f ← 1 / μ^R_sum
         end if
Step 6.  Repeat steps 1–5 for the next frame.
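For reference, a direct Python transcription of Algorithm 2 is sketched below. The uniform band probability P_b = 1/16 and the stand-in band means are illustrative assumptions:

```python
import numpy as np

def laplacian_alpha_f(mu_wz, mu_si):
    # Frame-level Laplacian parameter alpha_f from the 16 WZF and SI band
    # means, following Equations (22) and (25)-(29).
    mu_r = mu_wz - mu_si                # Equation (22)
    p_b = np.full(16, 1.0 / 16.0)       # P_b, assumed uniform here
    mean_r = np.sum(mu_r * p_b)         # E[mu^R], Equation (26)
    mean_r2 = np.sum(mu_r ** 2 * p_b)   # E[(mu^R)^2], Equation (27)
    var_f = mean_r2 - mean_r ** 2       # sigma_f^2, Equation (25)
    mu_r_sum = np.sum(np.abs(mu_r))     # Equation (29)
    if 0 < var_f <= 1:                  # Equation (28), case by case
        return 1.0 / (200.0 * var_f)    # 1 / (2 x 10^2 x sigma_f^2)
    elif 1 < var_f < 100 and mu_r_sum <= 50:
        return 1.0 / (10.0 * mu_r_sum)
    elif 1 < var_f < 100 and mu_r_sum > 50:
        return 1.0 / (20.0 * mu_r_sum)
    else:                               # sigma_f^2 >= 100
        return 1.0 / mu_r_sum

rng = np.random.default_rng(1)
mu_wz = rng.normal(scale=5.0, size=16)          # stand-in WZF band means
mu_si = mu_wz + rng.normal(scale=1.0, size=16)  # stand-in SI band means
print(laplacian_alpha_f(mu_wz, mu_si))
```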

4. Results and Discussion

4.1. Experimental Setup

The experiments were carried out on a system with an Intel Core i7-7820HQ CPU at 2.90 GHz, 32 GB of RAM, and a 64-bit OS. The results were compiled for different test video sequences. The performance was analyzed by running the experiments on full video sequences with a frame rate of 15 Hz and a group of pictures (GOP) of size 2. The experiments were performed using six different values of the quantization metric, Qm, for quantizing the WZF to achieve different output qualities. The definitions and values of Qm used in the proposed codecs are the same as those used by DIS in [10]. A list of abbreviations for the different codecs evaluated is given in Table 1. Several parameters are involved in the phase-based interpolation for SI generation, and parameter tuning is required to obtain the best SI quality. The following parameter settings were adopted for SI generation: the number of pyramid levels is 17, the phase shift is 0.4 radians, the number of orientations is 12, and the pyramid scale is (0.4)^{1/4}.

4.2. Performance Evaluation

Four video sequences, i.e., "Coastguard", "Akiyo", "Foreman" and "Hall", were used to compare the performance of the proposed codecs against the DIS and Intra H.264 codecs. For each video sequence, the average PSNRs and coding rates were determined for different values of the quantization metric, Qm. The average PSNR and coding rate of Intra H.264, DIS, DIVCOM, and PDIVCOM were computed for a frame rate of 15 Hz. Next, the performance of the Wyner–Ziv coder is analyzed by evaluating the average channel-decoding rate required to correct the errors of the SI for successfully decoding the WZF. Finally, the rate-distortion (RD) performance is compared between Intra H.264, DIS, DIVCOM, and PDIVCOM.

4.2.1. “Coastguard” Video Sequence

From Table 2, we can observe that in most cases DIVCOM gives a better average PSNR than DIS, while PDIVCOM gives a better average PSNR than DIS in all cases. PDIVCOM outperforms DIVCOM for lower values of Qm, while DIVCOM outperforms PDIVCOM at higher values of Qm. In terms of coding efficiency, the coding rate of DIVCOM is better than that of DIS by between 0.687 kbps and 8.05 kbps, while the coding rate of PDIVCOM is better than that of DIS by between 1.49 kbps and 8.955 kbps. However, closer analysis at the frame-to-frame level reveals that the RD performance of the proposed codecs is reduced in some frames of this test sequence. Even so, the decrease in PSNR is small considering the improvement in the coding rate.
Table 3 presents the results for the Wyner–Ziv (WZ) coder part for decoding the WZF. For DIVCOM, the average PSNR gain per WZF over DIS ranges from 0.0521 dB to 0.36 dB, while for PDIVCOM, the gain ranges from 0.039 dB to 0.3323 dB. The average channel-decoding rate results indicate that DIVCOM achieves average rate savings of between 101 bits and 1.151 kbits, whereas the average savings of PDIVCOM range from 219 bits to 1.28 kbits. At the individual frame level, the RD performance of the proposed codecs is degraded in some frames; overall, however, these codecs perform well for most of the frames in the test sequence. From Table 3, we can see that DIVCOM gave a negative average PSNR gain for quantization point 3; this reduction in performance is due to the CNM parameter used in the reconstruction process. Even so, the average decoding rate indicates that DIVCOM achieves better coding efficiency. For the higher quantization points, the average coding rates of DIVCOM and PDIVCOM are close to that of DIS because, at these points, a varying coding-rate behavior is observed for the proposed codecs: for some frames of the sequence, a per-frame bit rate close to or higher than that of DIS occurs, although higher coding efficiency is attained for most of the frames.
Figure 2 depicts the RD performance of Intra H.264, DIS, DIVCOM, and PDIVCOM on the "Coastguard" video. It can be seen that for all quantization points, the performance of DIVCOM and PDIVCOM is comparable to DIS. For low quantization points, DIVCOM gives a bit-rate saving of roughly 1 to 2.5 kbps along with a noticeable improvement in PSNR, and PDIVCOM shows almost the same trend in both coding efficiency and PSNR. At higher quantization points, both DIVCOM and PDIVCOM outperform DIS: DIVCOM achieves a bit-rate saving of up to 8 kbps and an improvement in PSNR of 0.18 dB over DIS, while PDIVCOM performs even better, achieving a bit-rate saving of up to 9 kbps and a PSNR gain of up to 0.17 dB. The small improvement in coding efficiency at lower quantization points is due to fewer bands undergoing the decoding process; the improvement becomes significant at higher quantization points because more bands undergo the decoding process. Upon further evaluation of the frames for each quantization point, a higher coding rate is observed for a few frames of the test sequence. This is due to a miscalculation of the CNM parameter by the proposed model, and the PSNR quality is also compromised for those frames. The comparison of DIVCOM with Intra H.264 shows that DIVCOM achieves better coding efficiency for all quantization points, with a bit-rate saving of up to 41.92 kbps. For low to intermediate quantization points, DIVCOM achieves a PSNR gain of up to 0.69 dB; at the highest quantization point, however, the PSNR degrades by up to 0.28 dB. PDIVCOM shows better coding efficiency than Intra H.264 for all quantization points, with bit-rate savings ranging from 30.23 kbps to 42.83 kbps, and achieves PSNR gains of up to 0.71 dB. Similar to DIVCOM, its PSNR performance degrades at the highest quantization point, by 0.29 dB.

4.2.2. “Akiyo” Video Sequence

Table 4 presents the experimental results of the "Akiyo" video sequence. From the table, it can be seen that the improvement in PSNR of DIVCOM over the DIS codec is between 0.007 dB and 0.021 dB, while PDIVCOM gives an improvement in PSNR of between 0.037 dB and 0.12 dB over the DIS codec. The coding efficiency of DIVCOM improves by between 0.38 kbps and 1.7 kbps over the DIS codec; however, in some cases, when the variance is less than 0.1, the coding rate of DIVCOM is worse than that of DIS. The coding efficiency of PDIVCOM is better than that of DIS by between 1.53 kbps and 10.938 kbps. On a frame-by-frame analysis, PDIVCOM is better than DIVCOM because it has fewer frames with worse PSNR and coding rates than DIS. Overall, PDIVCOM outperforms the other codecs investigated for this test video sequence.
Table 5 presents the results for the Wyner–Ziv (WZ) coder part for decoding the WZF. For DIVCOM, the average PSNR gain per WZF over DIS ranges from 0.0137 dB to 0.413 dB, while for PDIVCOM, the gain ranges from 0.0654 dB to 0.24 dB. The average channel-decoding rate results indicate that DIVCOM achieves average rate savings of between 55 bits and 243 kbits. However, in one case, where a few frames had a variance of less than 0.1, additional bits were requested several times to mitigate the errors, resulting in DIVCOM having a worse coding rate than DIS. PDIVCOM achieves average rate savings of between 219 bits and 1.28 kbits. It can also be observed that, for this video sequence, PDIVCOM outperforms both DIS and DIVCOM in terms of coding efficiency. The gain comparison shows that only slight PSNR gains are acquired by both proposed codecs. This is because the estimated CNM parameter used in the decoding process is not suitable for some of the frames with a variance of around 0.1, which results in inaccurately reconstructed frames.
The graph in Figure 3 illustrates the RD performance of the codecs on the "Akiyo" test video sequence at all quantization points. Both DIVCOM and PDIVCOM perform better than DIS, with better coding efficiency and PSNR at all quantization points. From the RD graph, it can be seen that for lower quantization points, DIVCOM achieves a bit-rate saving of up to 1.31 kbps and a PSNR gain of up to 0.017 dB. The coding efficiency of DIVCOM reduces slightly at higher quantization points, but it still manages a PSNR gain of 0.021 dB at these points. PDIVCOM achieves a higher coding efficiency than both DIS and DIVCOM: at low quantization points, its bit-rate saving reaches 3.71 kbps, and up to 11 kbps at higher quantization points. The PSNR gain achieved by PDIVCOM is up to 0.05 dB at the lower quantization points and up to 0.12 dB at the higher points. Compared to Intra H.264, DIVCOM achieves bit-rate savings ranging from 1 kbps to 40 kbps and PSNR gains ranging from 0.06 dB to 0.29 dB for low and intermediate quantization points. At high quantization points, however, the coding rate of DIVCOM is worse by 2.8 kbps to 14.4 kbps, and its PSNR is worse than Intra H.264 by 0.10 dB. The comparison between PDIVCOM and Intra H.264 shows that PDIVCOM achieves bit-rate savings ranging from 4 kbps to 48 kbps for all quantization points and PSNR gains ranging from 0.02 dB to 0.32 dB.

4.2.3. “Foreman” Video Sequence

Table 6 presents the experimental results of the "Foreman" video sequence. It can be seen that DIVCOM can achieve a PSNR improvement of between 0.0043 dB and 0.03 dB over DIS; in some cases, however, DIVCOM gives a worse PSNR than DIS. DIVCOM also shows an improvement in coding efficiency of between 1.19 kbps and 6.4 kbps over DIS. On the other hand, even though PDIVCOM can achieve a PSNR gain of up to 0.021 dB over DIS, in most cases it performs worse than DIS due to the reconstruction procedure. The coding efficiency of PDIVCOM is better than that of DIS by between 0.34 kbps and 4.1 kbps, but PDIVCOM does not perform as well as DIVCOM.
Table 7 presents the results for the Wyner–Ziv (WZ) coder part for decoding the WZF of the "Foreman" video sequence. For DIVCOM, the average PSNR gain per WZF over DIS ranges from 0.0089 dB to 0.0627 dB, while for PDIVCOM, the gain reaches up to 0.0434 dB. The results also indicate that DIVCOM achieves average channel-rate savings of between 174 bits and 934 bits, while for PDIVCOM, the savings range from 45 bits to 594 bits. However, the visual quality of both DIVCOM and PDIVCOM is worse than that of DIS due to the degradation of a few frames during the reconstruction process.
Figure 4 shows the RD performance for the "Foreman" test sequence. The graph indicates that both DIVCOM and PDIVCOM achieve better coding efficiency at all quantization points. In terms of visual quality, however, both codecs generally performed worse than DIS, as can be seen from the PSNR values: the PSNR degradation of DIVCOM is up to 0.1 dB, while for PDIVCOM, the maximum loss is 0.21 dB. Evaluating the plot for DIVCOM, it can be seen that for lower to middle quantization points, the coding performance of the codec is better than DIS by between 1.19 kbps and 2.36 kbps; this saving reaches up to 6.38 kbps at higher quantization points. Similarly, PDIVCOM achieves a bit-rate saving of between 0.314 kbps and 1.2168 kbps at lower and middle quantization points, and up to 4.0576 kbps at higher quantization points. Therefore, both DIVCOM and PDIVCOM outperformed DIS in terms of coding efficiency but performed worse in terms of visual quality. At low and intermediate quantization points, the bit-rate saving achieved by DIVCOM over Intra H.264 ranges from 3.6 kbps to 8.7 kbps; at higher quantization points, however, DIVCOM shows a higher bit rate, by 3.6 kbps to 24.5 kbps. DIVCOM achieves a PSNR gain of around 0.45 dB to 0.63 dB over Intra H.264 for all quantization points. The comparative evaluation of PDIVCOM and Intra H.264 shows that PDIVCOM achieves a bit-rate saving ranging from 7.9 kbps to 10.4 kbps for low to intermediate quantization points; at high quantization points, however, the coding rate of PDIVCOM is higher than that of Intra H.264 by up to 30 kbps. The PSNR gain of PDIVCOM over Intra H.264 ranges from 0.46 dB to 0.69 dB for all quantization points.

4.2.4. “Hall” Video Sequence

Table 8 presents the experimental results of the "Hall" video sequence. It can be seen that both DIVCOM and PDIVCOM achieved a higher average PSNR than DIS. On a frame-by-frame basis, a slight PSNR gain is achieved by PDIVCOM for most frames, but for some frames the PSNR is reduced. Overall, the average PSNR gain of PDIVCOM over DIS is between 0.005 dB and 0.026 dB. PDIVCOM also achieved an improvement in average coding efficiency of between 0.32 kbps and 4.129 kbps over DIS. Similarly, DIVCOM shows variation in its PSNR gain over DIS on a frame-to-frame basis, but its overall average PSNR gain is 0.026 dB. The improvement in coding efficiency of DIVCOM over DIS is in the range of 1.17 kbps to 6.354 kbps. Both DIVCOM and PDIVCOM surpass DIS in terms of coding efficiency, with DIVCOM performing relatively better than PDIVCOM.
Table 9 presents the results for the Wyner–Ziv (WZ) coder part for decoding the WZF. For the "Hall" video sequence, DIVCOM achieved an average PSNR gain of up to 0.053 dB over DIS for the reconstructed WZF, while PDIVCOM achieved an average PSNR gain in the range of 0.0102 dB to 0.0545 dB. The average channel-decoding rate results indicate that DIVCOM achieved average channel-rate savings of between 46 bits and 929 bits, while the savings of PDIVCOM range from 171 bits to 604 bits. Although both DIVCOM and PDIVCOM produced poorer visual quality for some of the quantization points, due to degradation of some frames during reconstruction, they achieved higher coding efficiency than DIS.
Figure 5 shows the RD performance for the "Hall" test sequence. This visual representation shows that both DIVCOM and PDIVCOM outperform DIS in coding efficiency. Further analysis shows that DIVCOM outperforms DIS in coding efficiency by 1.17 kbps to 1.87 kbps for lower quantization points and by up to 6.354 kbps at higher quantization points. The graph also indicates that PDIVCOM achieves better coding efficiency than DIS: for lower and middle quantization points, bit-rate savings of 0.317 kbps to 1.231 kbps are recorded for this test sequence, while for higher quantization points, the saving increases to 4.13 kbps. In terms of PSNR gain, the performance of DIVCOM and PDIVCOM is mixed. DIVCOM outperforms DIS by up to 0.026 dB but can also suffer a PSNR loss of up to 0.03 dB; similarly, PDIVCOM outperforms DIS by up to 0.0139 dB but can also suffer a PSNR loss of up to 0.06 dB. The WZF reconstruction process plays a key role in the PSNR performance: the loss in performance of DIVCOM and PDIVCOM for the "Hall" and "Foreman" video sequences is mainly due to degradation during the reconstruction process. Figure 5 also shows that DIVCOM and PDIVCOM outperform the Intra H.264 codec in coding efficiency and PSNR gain for all quantization points. DIVCOM achieves bit-rate savings ranging from 62 kbps to 94.4 kbps, while its PSNR improves by around 0.06 dB. The graph further indicates that PDIVCOM achieves bit-rate savings ranging from 61.16 kbps to 91.73 kbps over Intra H.264 for all quantization points; in terms of PSNR gain, PDIVCOM outperforms Intra H.264 by 0.04 dB to 0.12 dB.

4.2.5. Bjøntegaard Delta Performance Evaluation

Table 10 presents the Bjøntegaard metric performance of DIVCOM, PDIVCOM, and DISCOVER, in terms of BD-Rate and BD-PSNR. As reported in Table 10, DIVCOM and PDIVCOM give bit-rate improvements ranging from 2.46% to 6.13% and from 3.47% to 11.37%, respectively, compared to DIS across the different video sequences. For the "Foreman" video sequence, however, the bit-rate saving of PDIVCOM degrades by up to 1.33%. The BD-PSNR analysis shows that DIVCOM and PDIVCOM gain 0.05 dB to 0.18 dB and 0.14 dB to 0.35 dB, respectively, over DIS for the different video sequences; the performance of PDIVCOM on "Foreman", however, is degraded by 0.026 dB compared to DIS.
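For readers reproducing Table 10, the sketch below outlines the standard Bjøntegaard delta computation (cubic fit of each RD curve followed by integration over the overlapping interval). It is a generic reference implementation with hypothetical example values, not the authors' evaluation script:

```python
import numpy as np

def bd_psnr(rate_ref, psnr_ref, rate_test, psnr_test):
    # Average vertical (PSNR) gap between two RD curves fitted as cubic
    # polynomials of log10(rate).
    lr_ref, lr_test = np.log10(rate_ref), np.log10(rate_test)
    p_ref = np.polyfit(lr_ref, psnr_ref, 3)
    p_test = np.polyfit(lr_test, psnr_test, 3)
    lo = max(lr_ref.min(), lr_test.min())
    hi = min(lr_ref.max(), lr_test.max())
    P_ref, P_test = np.polyint(p_ref), np.polyint(p_test)
    area_ref = np.polyval(P_ref, hi) - np.polyval(P_ref, lo)
    area_test = np.polyval(P_test, hi) - np.polyval(P_test, lo)
    return (area_test - area_ref) / (hi - lo)          # BD-PSNR in dB

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    # Same idea with the axes swapped: average horizontal (log-rate) gap,
    # converted to a percentage bit-rate difference.
    lr_ref, lr_test = np.log10(rate_ref), np.log10(rate_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    lo = max(psnr_ref.min(), psnr_test.min())
    hi = min(psnr_ref.max(), psnr_test.max())
    P_ref, P_test = np.polyint(p_ref), np.polyint(p_test)
    avg = ((np.polyval(P_test, hi) - np.polyval(P_test, lo)) -
           (np.polyval(P_ref, hi) - np.polyval(P_ref, lo))) / (hi - lo)
    return (10 ** avg - 1) * 100.0                     # BD-Rate in percent

# Example with four (rate, PSNR) points per codec (hypothetical values):
r1 = np.array([100.0, 200.0, 400.0, 800.0]); p1 = np.array([30.0, 33.0, 36.0, 39.0])
r2 = np.array([95.0, 190.0, 380.0, 760.0]);  p2 = np.array([30.2, 33.2, 36.2, 39.2])
print(bd_rate(r1, p1, r2, p2), bd_psnr(r1, p1, r2, p2))
```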

5. Conclusions and Future Research Work

In DVC, a suitable and accurate correlation noise model plays a crucial role in improving the RD performance and coding efficiency. In DVC, neither the WZF is available at the decoder, nor is the estimated SI of the corresponding WZF available at the encoder. Online estimation of the CNM and its parameters is therefore quite challenging, especially when the motion vectors are not estimated at the decoder, as in [15]. The proposed novel online CNM approach is suitable for such a scenario: it accurately calculates the error distribution and makes the codec highly coding-efficient, and the resulting gains in coding efficiency and PSNR lead to better RD performance. The DIVCOM codec achieves bit-rate savings of up to 8.05 kbps, with PSNR gains ranging from 0.0245 dB to 0.18 dB compared to the DIS codec. PDIVCOM achieves bit-rate savings of up to 10.9 kbps, with PSNR gains ranging from 0.019 dB to 0.17 dB compared to DIS.
Even so, there is still room for improvement. During the frame-by-frame analysis, a higher coding rate was observed for some frames of every video sequence. Closer analysis determined that for some frames the variance was around 0.1, and the channel-coding rate achieved under such conditions was too high; moreover, the errors in the bands of such frames were not completely corrected. Investigating those frames at the band level also revealed a higher bit rate for most of their bands. The CNM parameter (α_f) estimation process could be further enhanced by improving the mathematical formulation with respect to the variance, which could potentially improve the coding efficiency. Rectifying the CNM parameter at the band level would thus enhance the coding rate and the overall RD performance. For a band-level CNM, more than one sample of the coefficient values of a band could be sent instead of only the mean of the coefficients of the band.

Author Contributions

Conceptualization, S.K., N.B. and V.J.; methodology, S.K., N.B. and V.J.; validation, N.B., V.J. and M.A.H.; writing—original draft preparation, S.K.; writing—review and editing, N.B., V.J. and M.A.H.; supervision, N.B., V.J. and M.A.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Institute of Health and Analytics, Universiti Teknologi PETRONAS.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sehairi, K.; Benbouchama, C.; El Houari, K.; Fatima, C. A Real-Time Implementation of Moving Object Action Recognition System Based on Motion Analysis. Indones. J. Electr. Eng. Inform. 2017, 5, 44–58.
  2. Ukrit, M.F.; Suresh, G. Super-Spatial Structure Prediction Compression of Medical. Indones. J. Electr. Eng. Inform. 2016, 4, 126–133.
  3. Deligiannis, N.; Munteanu, A.; Wang, S.; Cheng, S.; Schelkens, P. Maximum likelihood Laplacian correlation channel estimation in layered Wyner-Ziv coding. IEEE Trans. Signal Process. 2013, 62, 892–904.
  4. Wang, W.; Zhu, J.; Zhang, S.; Zhou, W. Tradeoff between compression ratio and decoding delay of distributed source coding for uplink transmissions in machine-type communication. Int. J. Distrib. Sens. Netw. 2018, 14, 1550147718787109.
  5. Yang, J.; Qing, L.; He, X.; Hua, L.; Rong, S. A Fast DVC to HEVC Transcoding for Mobile Video Communication. In Proceedings of the 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chengdu, China, 15–17 March 2019; pp. 505–509.
  6. Dufaux, F.; Gao, W.; Tubaro, S.; Vetro, A. Distributed video coding: Trends and perspectives. EURASIP J. Image Video Process. 2010, 2009, 508167.
  7. Guo, M.; Lu, Y.; Wu, F.; Li, S.; Gao, W. Distributed video coding with spatial correlation exploited only at the decoder. In Proceedings of the 2007 IEEE International Symposium on Circuits and Systems, New Orleans, LA, USA, 27–30 May 2007; pp. 41–44.
  8. Jun, D. Distributed video coding with adaptive two-step side information generation for smart and interactive media. Displays 2019, 59, 21–27.
  9. Van, X.H.; Ascenso, J.; Pereira, F. HEVC backward compatible scalability: A low encoding complexity distributed video coding based approach. Signal Process. Image Commun. 2015, 33, 51–70.
  10. Artigas, X.; Ascenso, J.; Dalai, M.; Klomp, S.; Kubasov, D.; Ouaret, M. The DISCOVER codec: Architecture, techniques and evaluation. In Proceedings of the Picture Coding Symposium (PCS'07), Lisboa, Portugal, 7–9 November 2007.
  11. Taheri, Y.M.; Ahmad, M.O.; Swamy, M. A joint correlation noise estimation and decoding algorithm for distributed video coding. Multimed. Tools Appl. 2018, 77, 7327–7355.
  12. Slepian, D.; Wolf, J. Noiseless coding of correlated information sources. IEEE Trans. Inf. Theory 1973, 19, 471–480.
  13. Wyner, A.; Ziv, J. The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inf. Theory 1976, 22, 1–10.
  14. Aaron, A.; Zhang, R.; Girod, B. Wyner-Ziv coding of motion video. In Proceedings of the Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 3–6 November 2002; pp. 240–244.
  15. Khursheed, S.; Jeoti, V.; Badruddin, N.; Hashmani, M.A. Low complexity Phase-based Interpolation for side information generation for Wyner-Ziv coding at DVC decoder. In Proceedings of the 2020 12th International Symposium on Communication Systems, Networks and Digital Signal Processing (CSNDSP), Porto, Portugal, 20–22 July 2020.
  16. Guillemot, C.; Pereira, F.; Torres, L.; Ebrahimi, T.; Leonardi, R.; Ostermann, J. Distributed monoview and multiview video coding. IEEE Signal Process. Mag. 2007, 24, 67–76.
  17. Fang, Y. Crossover probability estimation using mean-intrinsic-LLR of LDPC syndrome. IEEE Commun. Lett. 2009, 13, 679–681.
  18. Brites, C.; Ascenso, J.; Pereira, F. Studying temporal correlation noise modeling for pixel based Wyner-Ziv video coding. In Proceedings of the 2006 International Conference on Image Processing, Atlanta, GA, USA, 8–11 October 2006; pp. 273–276.
  19. Brites, C.; Pereira, F. Correlation noise modeling for efficient pixel and transform domain Wyner–Ziv video coding. IEEE Trans. Circuits Syst. Video Technol. 2008, 18, 1177–1190.
  20. Huang, X.; Forchhammer, S. Improved virtual channel noise model for transform domain Wyner-Ziv video coding. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009; pp. 921–924.
  21. Huu, T.V.; Huong, T.N.T.; Ngoc, M.N.; HoangVan, X. Improving performance of distributed video coding by consecutively refining of side information and correlation noise model. In Proceedings of the 2019 19th International Symposium on Communications and Information Technologies (ISCIT), Ho Chi Minh City, Vietnam, 25–27 September 2019; pp. 502–506.
  22. Wu, Y.-Y.; Cai, R.; Zhang, D.-Y. Improved Rate Allocation Algorithm for DVC without Feedback Channel. In Proceedings of the 2015 International Symposium on Computers & Informatics, Beijing, China, 17–18 January 2015; Atlantis Press: Amsterdam, The Netherlands, 2015; pp. 1081–1088.
  23. Cui, L.; Wang, S.; Jiang, X.; Cheng, S. Adaptive distributed video coding with correlation estimation using expectation propagation. Proc. SPIE Int. Soc. Opt. Eng. 2012, 8499, 1380075.
  24. Van, X.H.; Ascenso, J.; Pereira, F. Correlation modeling for a distributed scalable video codec based on the HEVC standard. In Proceedings of the 2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP), Jakarta, Indonesia, 22–24 September 2014.
  25. Deligiannis, N.; Barbarien, J.; Jacobs, M.; Munteanu, A.; Skodras, A.; Schelkens, P. Side-information-dependent correlation channel estimation in hash-based distributed video coding. IEEE Trans. Image Process. 2011, 21, 1934–1949. [Google Scholar] [CrossRef]
  26. Cai, R.; Zhang, D.Y. Hybrid Distributed Correlation Noise Model and Parameter Estimation. Appl. Mech. Mater. 2015, 752–753, 1110–1115. [Google Scholar] [CrossRef]
  27. Van Luong, H.; Huang, X.; Forchhammer, S. Parallel iterative decoding of transform domain Wyner-Ziv video using cross bitplane correlation. In Proceedings of the 2011 18th IEEE International Conference on Image Processing, Brussels, Belgium, 11–14 September 2011; pp. 2633–2636. [Google Scholar]
  28. Wadhwa, N.; Rubinstein, M.; Durand, F.; Freeman, W.T. Phase-based video motion processing. ACM Trans. Graph. TOG 2013, 32, 80. [Google Scholar] [CrossRef] [Green Version]
  29. Meyer, S.; Sorkine-Hornung, A.; Gross, M. Phase-based modification transfer for video. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 633–648. [Google Scholar]
  30. Portilla, J.; Simoncelli, E.P. A parametric texture model based on joint statistics of complex wavelet coefficients. Int. J. Comput. Vis. 2000, 40, 49–70. [Google Scholar] [CrossRef]
  31. Simoncelli, E.P.; Freeman, W.T. The steerable pyramid: A flexible architecture for multi-scale derivative computation. In Proceedings of the Proceedings International Conference on Image Processing, Washington, DC, USA, 23–26 October 1995; pp. 444–447. [Google Scholar]
  32. Simoncelli, E.P.; Freeman, W.T.; Adelson, E.H.; Heeger, D.J. Shiftable multiscale transforms. IEEE Trans. Inf. Theory 1992, 38, 587–607. [Google Scholar] [CrossRef] [Green Version]
  33. Meyer, S.; Wang, O.; Zimmer, H.; Grosse, M.; Sorkine-Hornung, A. Phase-based frame interpolation for video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1410–1418. [Google Scholar]
  34. Varodayan, D.; Aaron, A.; Girod, B. Rate-adaptive codes for distributed source coding. Signal Process. 2006, 86, 3123–3130. [Google Scholar] [CrossRef]
  35. Lam, E.Y.; Goodman, J.W. A mathematical analysis of the DCT coefficient distributions for images. IEEE Trans. Image Process. 2000, 9, 1661–1666. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Proposed DVC framework with novel online correlation noise model.
Figure 2. Rate-distortion (RD) performance graphical evaluation of “Coastguard” video sequence with frame rate of 15 Hz and GOP 2.
Figure 3. Rate-distortion (RD) performance graphical evaluation of “Akiyo” test video sequence with frame rate of 15 Hz and GOP 2.
Figure 4. Rate-distortion (RD) performance graphical evaluation of “Foreman” test video sequence with frame rate of 15 Hz and GOP 2.
Figure 5. Rate-distortion (RD) performance of “Hall” video sequence with frame rate of 15 Hz and GOP 2.
Table 1. Abbreviations of different video codecs.
Abbreviation | Description
DIS | DISCOVER Codec
DIVCOM | Distributed Video Coding with Online Band Mean Correlation Noise Model
PDIVCOM | Phase-based Distributed Video Coding with Online Band Mean Correlation Noise Model
Table 2. Visual quality and rate performance evaluation of different codecs for “Coastguard” video sequence at frame rate of 15 Hz.
Video Sequence | Quantization Matrix (Qm) | Avg. PSNR, DIS (dB) | Avg. PSNR, DIVCOM (dB) | Avg. PSNR, PDIVCOM (dB) | Coding Rate, DIS (kbps) | Coding Rate, DIVCOM (kbps) | Coding Rate, PDIVCOM (kbps)
Coastguard | 1 | 32.0616 | 32.0867 | 32.0978 | 417.431 | 416.744 | 415.934
Coastguard | 2 | 32.97 | 32.9945 | 33.008 | 488.7012 | 487.3013 | 487.1401
Coastguard | 3 | 34.455 | 34.4520 | 34.4740 | 614.1899 | 612.8306 | 612.5757
Coastguard | 4 | 34.9146 | 35.009 | 34.974 | 722.3970 | 718 | 717.7109
Coastguard | 5 | 35.8749 | 35.9899 | 35.9942 | 825.1416 | 818.6396 | 817.7002
Coastguard | 6 | 36.35 | 36.5285 | 36.5175 | 937.7612 | 929.7104 | 928.8057
Table 3. Coding efficiency evaluation of proposed novel online CNM for “Coastguard” video sequence.
Video Sequence | Qm | Decoding Rate/WZF, DIS (bits) | Decoding Rate/WZF, DIVCOM (bits) | Decoding Rate/WZF, PDIVCOM (bits) | PSNR Gain/WZF vs. DIS, DIVCOM (dB) | PSNR Gain/WZF vs. DIS, PDIVCOM (dB)
Coastguard | 1 | 4545 | 4444 | 4326 | 0.0521 | 0.0751
Coastguard | 2 | 5404 | 5199 | 5176 | 0.0731 | 0.0449
Coastguard | 3 | 7722 | 7486 | 7523 | −0.006 | 0.0396
Coastguard | 4 | 18,070 | 17,425 | 17,384 | 0.1957 | 0.1231
Coastguard | 5 | 22,979 | 22,028 | 21,891 | 0.2417 | 0.2507
Coastguard | 6 | 33,856 | 32,678 | 32,546 | 0.3549 | 0.3323
Table 4. Visual quality and rate performance evaluation of different codecs for “Akiyo” video sequence at frame rate of 15 Hz.
Video Sequence | Qm | Avg. PSNR, DIS (dB) | Avg. PSNR, DIVCOM (dB) | Avg. PSNR, PDIVCOM (dB) | Coding Rate, DIS (kbps) | Coding Rate, DIVCOM (kbps) | Coding Rate, PDIVCOM (kbps)
Akiyo | 1 | 36.5622 | 36.5688 | 36.5988 | 188.5562 | 187.2031 | 187.0215
Akiyo | 2 | 37.9728 | 37.9896 | 38.0227 | 235.6357 | 234.3213 | 233.8198
Akiyo | 3 | 38.8950 | 38.9045 | 38.9394 | 266.1543 | 264.8397 | 262.4297
Akiyo | 4 | 39.7082 | 39.7128 | 39.6677 | 329.874 | 329.4976 | 322.6880
Akiyo | 5 | 40.6535 | 40.6747 | 40.7688 | 387.9673 | 386.311 | 379.0576
Akiyo | 6 | 40.9639 | 40.9827 | 41.0656 | 424.4033 | 439.0259 | 413.4653
Table 5. Coding efficiency evaluation of proposed novel online CNM for “Akiyo” video sequence.
Video Sequence | Qm | Decoding Rate/WZF, DIS (bits) | Decoding Rate/WZF, DIVCOM (bits) | Decoding Rate/WZF, PDIVCOM (bits) | PSNR Gain/WZF vs. DIS, DIVCOM (dB) | PSNR Gain/WZF vs. DIS, PDIVCOM (dB)
Akiyo | 1 | 2733 | 2535 | 2509 | 0.0137 | 0.0654
Akiyo | 2 | 3436 | 3244 | 3170 | 0.035 | 0.1034
Akiyo | 3 | 4309 | 4116 | 3765 | 0.0197 | 0.0921
Akiyo | 4 | 8674 | 8619 | 7623 | 0.0094 | −0.084
Akiyo | 5 | 10,417 | 10,174 | 9113 | 0.044 | 0.2389
Akiyo | 6 | 13,732 | 15,871 | 12,132 | 0.0413 | 0.2097
Table 6. Visual quality and rate performance evaluation of different codecs for “Foreman” video sequence at frame rate of 15 Hz.
Video Sequence | Qm | Avg. PSNR, DIS (dB) | Avg. PSNR, DIVCOM (dB) | Avg. PSNR, PDIVCOM (dB) | Coding Rate, DIS (kbps) | Coding Rate, DIVCOM (kbps) | Coding Rate, PDIVCOM (kbps)
Foreman | 1 | 29.467 | 29.4149 | 29.4215 | 182.0371 | 180.8467 | 181.7236
Foreman | 2 | 31.111 | 31.1412 | 31.0124 | 243.3101 | 241.8110 | 242.8442
Foreman | 3 | 31.8513 | 31.7571 | 31.8224 | 299.8809 | 297.5244 | 298.6641
Foreman | 4 | 33.5509 | 33.5095 | 33.4341 | 466.7505 | 460.9893 | 464.5913
Foreman | 5 | 34.2864 | 34.1826 | 34.1587 | 539.6836 | 533.3032 | 535.6260
Foreman | 6 | 36.1753 | 36.1796 | 36.1962 | 739.2827 | 733.3574 | 739.5210
Table 7. Coding efficiency analysis of proposed novel online CNM for “Foreman” video sequence.
Video Sequence | Qm | Decoding Rate/WZF, DIS (bits) | Decoding Rate/WZF, DIVCOM (bits) | Decoding Rate/WZF, PDIVCOM (bits) | PSNR Gain/WZF vs. DIS, DIVCOM (dB) | PSNR Gain/WZF vs. DIS, PDIVCOM (dB)
Foreman | 1 | 7452 | 7278 | 7407 | −0.1003 | −0.0922
Foreman | 2 | 8229 | 8010 | 8161 | 0.0627 | −0.2043
Foreman | 3 | 12,448 | 12,103 | 12,270 | −0.1939 | −0.06
Foreman | 4 | 23,925 | 23,083 | 23,610 | −0.0857 | −0.2419
Foreman | 5 | 28,260 | 27,326 | 27,666 | −0.2149 | −0.02645
Foreman | 6 | 38,148 | 37,281 | 38,183 | 0.0089 | 0.0434
Table 8. Visual quality and rate performance evaluation of different codecs for “Hall” video sequence at frame rate of 15 Hz.
Video Sequence | Qm | Avg. PSNR, DIS (dB) | Avg. PSNR, DIVCOM (dB) | Avg. PSNR, PDIVCOM (dB) | Coding Rate, DIS (kbps) | Coding Rate, DIVCOM (kbps) | Coding Rate, PDIVCOM (kbps)
Hall | 1 | 30.1501 | 30.1438 | 30.1489 | 170.1616 | 168.9883 | 169.845
Hall | 2 | 32.87 | 32.8393 | 32.8254 | 243.5151 | 242.0322 | 242.8735
Hall | 3 | 34.5575 | 34.5309 | 34.5463 | 320.8701 | 318.9966 | 319.6396
Hall | 4 | 35.3398 | 35.3654 | 35.3581 | 389.0669 | 386.2036 | 387.0825
Hall | 5 | 36.1409 | 36.1250 | 36.1458 | 445.9565 | 441.4385 | 442.5024
Hall | 6 | 36.9788 | 36.9446 | 37.0051 | 533.2949 | 526.9409 | 529.1656
Table 9. Coding efficiency evaluation of proposed novel online CNM for “Hall” video sequence.
Video Sequence | Qm | Decoding Rate/WZF, DIS (bits) | Decoding Rate/WZF, DIVCOM (bits) | Decoding Rate/WZF, PDIVCOM (bits) | PSNR Gain/WZF vs. DIS, DIVCOM (dB) | PSNR Gain/WZF vs. DIS, PDIVCOM (dB)
Hall | 1 | 4467 | 4421 | 4296 | −0.0131 | −0.0023
Hall | 2 | 5123 | 5029 | 4906 | −0.0109 | 0.0179
Hall | 3 | 6252 | 5978 | 6072 | −0.0549 | −0.023
Hall | 4 | 11,448 | 11,029 | 11,158 | 0.0531 | 0.0381
Hall | 5 | 13,081 | 12,420 | 12,576 | −0.0331 | 0.0102
Hall | 6 | 17,003 | 16,074 | 16,399 | −0.0708 | 0.0545
Table 10. Comparison of BD-Rate and BD-PSNR between DIVCOM and PDIVCOM with DIS.
Video | BD-Rate (%), DIVCOM vs. DIS | BD-Rate (%), PDIVCOM vs. DIS | BD-PSNR (dB), DIVCOM vs. DIS | BD-PSNR (dB), PDIVCOM vs. DIS
Coastguard | −6.13 | −6.27 | 0.14 | 0.14
Akiyo | −3.46 | −11.37 | 0.113 | 0.35
Foreman | −1.49 | 1.33 | 0.05 | −0.026
Hall | −2.46 | −3.47 | 0.18 | 0.17
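BD-Rate and BD-PSNR figures such as those in Table 10 are conventionally obtained with the Bjøntegaard metric, which fits third-order polynomials to the two RD curves in the log-rate domain and averages their gap over the overlapping interval. The Python sketch below is a generic implementation of that metric with hypothetical names, not the authors' exact evaluation script; the example uses the “Coastguard” DIS and DIVCOM points from Table 2.

```python
import numpy as np

def bd_psnr(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average PSNR difference (dB) between two RD curves (Bjontegaard metric)."""
    lr_ref, lr_test = np.log10(rate_ref), np.log10(rate_test)
    # Third-order polynomial fits of PSNR as a function of log10(rate).
    p_ref = np.polyfit(lr_ref, psnr_ref, 3)
    p_test = np.polyfit(lr_test, psnr_test, 3)
    # Integrate both fits over the overlapping log-rate interval.
    lo = max(lr_ref.min(), lr_test.min())
    hi = min(lr_ref.max(), lr_test.max())
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    return (int_test - int_ref) / (hi - lo)

# "Coastguard" DIS vs. DIVCOM points from Table 2 (Qm = 1..6):
rate_dis = np.array([417.431, 488.7012, 614.1899, 722.3970, 825.1416, 937.7612])
psnr_dis = np.array([32.0616, 32.97, 34.455, 34.9146, 35.8749, 36.35])
rate_div = np.array([416.744, 487.3013, 612.8306, 718.0, 818.6396, 929.7104])
psnr_div = np.array([32.0867, 32.9945, 34.4520, 35.009, 35.9899, 36.5285])
print(round(bd_psnr(rate_dis, psnr_dis, rate_div, psnr_div), 2))
# Expected close to the 0.14 dB reported in Table 10.
```

A bd_rate() variant is analogous, with the roles of rate and PSNR swapped: log10(rate) is fitted as a function of PSNR, and the integrated difference is reported as a percentage rate saving.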
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
