1. Introduction
The growing number of visual communication systems results in more video data to transmit. On the other hand, wireless communication is limited by the throughput of available radio bands; similar limitations are encountered in wired transmission when cable connections are long. Despite progress in video compression, channel coding, and modulation techniques, transmitting high-resolution video signals over band-limited channels remains challenging.
Analog circuits are inherent parts of communication systems [1,2]; they generate and detect complex electric signals in the transmitter and receiver. Analog video signals used in popular low-cost CCTV cameras usually comply with old standard-definition specifications such as PAL and NTSC. The latest cameras also support custom formats developed for higher resolutions, such as HD-TVI (high-definition transport video interface) [3], HD-CVI (high-definition composite video interface) [4], and AHD (analog high definition) [5]. The required bandwidth for these analog video systems is closely proportional to the number of pixels transmitted in a given period. For standard-definition video, the bandwidth is about 6–8 MHz; high-definition custom formats increase it to tens of MHz. The bandwidth requirements grow accordingly if more video signals are transmitted in a common channel.
Several works have focused on video coding schemes for transmission over wireless analog channels. They exploit the redundancy in the source to improve video quality in wireless networks. In particular, the schemes are designed to achieve graceful quality degradation in wireless channels whose conditions may vary unpredictably and drastically. Schemes such as SoftCast [6], ParCast [7], LineCast [8], OmniCast [9], and FeatureCast [10] provide signals that can be transmitted in an analog form with a small amount of digital metadata. The metadata are used to scale power in visual data chunks according to their impact on quality. An effective chunk-based blind estimation method allows the metadata to be eliminated [11]. Other schemes, such as WaveCast [12], WSVC [13], SK-Cast [14], and a "practical" one [15], apply hybrid digital–analog transmission to enable scalable coding. The hybrid schemes utilize digital video coding at a reduced quality/resolution, which is improved by differential data from the analog transmission. The digital part provides a low-quality reconstruction but is strongly protected against transmission errors, so each receiver can decode videos at a guaranteed low quality. The analog part enables improvement depending on interference in the transmission channel.
The schemes described above minimize the impact of channel interference on video quality, while bandwidth requirements remain unchanged or are reduced only to a small extent. In some communication systems (e.g., short-range wireless surveillance, video transmitted over long cables), it is possible to preserve sufficient signal quality, and the main problem is limited bandwidth. This problem is solved to some extent by video coding algorithms working fully in the digital domain. Digital codecs applying advanced standards such as H.264/AVC [16], H.265/HEVC [17], and H.266/VVC [18] achieve excellent coding performance by using many coding tools with a large set of coding modes, together with a rate–distortion optimization (RDO) procedure to choose the optimal coding mode or parameters. Although these codecs have achieved great success in video communications, their algorithms are computationally intensive and utilize a significant amount of hardware resources [19,20]. Moreover, they usually do not work well with highly dynamic wireless channels due to their sensitivity to bit errors. This drawback is mitigated to some extent by error protection coding, which increases the system complexity and adds extra payload to compressed bit streams.
Bandwidth requirements on video transmission can also be reduced by selecting essential information in the analog domain at the cost of quality losses. Such an approach is applied in compressed sensing [21,22,23]. However, the complexity of algorithms based on compressed sensing is significant, as they involve transformations on the many pixels included in whole frames. Moreover, their reconstruction quality is much worse than that of digital codecs.
The main goal of this study is to develop a compression algorithm suitable for analog transmission with reduced complexity compared to digital video codecs. The goal is achieved by dividing the input video into three-dimensional (3D) blocks subsampled with different factors according to rate–distortion cost. The subsampling reduces the number of pixel samples, which can then be transmitted using different signal modulations, including those used in the abovementioned schemes. A rate–distortion analysis selects the subsampling factors. Since subsampling is a simple operation, evaluating each factor exhibits low complexity compared to digital codecs. The study evaluates two different reconstruction methods for the encoder and the decoder to demonstrate possible implementation alternatives.
The rest of the paper is organized as follows: Section 2 describes the main steps of the encoding and decoding algorithm. Implementation details are given in Section 3. Section 4 presents the evaluation results, and Section 5 concludes the paper.
2. Algorithm
Subsampling is widely applied in video compression to reduce the size of chroma components. In particular, removing every second chroma sample in each row, known as the 4:2:2 format, reduces the bit rate from 24 bits per pixel (bpp) to 16 bpp for typical 8-bit sample representations. Additionally, when every second chroma row is eliminated (the 4:2:0 format), the bit rate drops to 12 bpp. This subsampling of chroma components involves subjectively negligible quality losses due to the properties of the human visual system. The subsampling of all components (luma and chroma) can be much stronger, allowing much higher bit rate and bandwidth reductions. Owing to its regular pattern, this method can be applied to reconstruct a digital representation from analog signals.
On the other hand, information density is usually differentiated between frames (motion) and frame regions (texture). Therefore, a constant subsampling pattern can lead to significant information/quality losses in the case of regions with complex texture or high-motion activity. Simultaneously, samples with significant information redundancy can represent flat or static regions.
In video transmission, the goal of compression is to simultaneously reduce the bit rate to a desired level and maximize reconstruction quality. Quality maximization is achieved by allocating different numbers of bits to regions of various content complexity. While operating on bits is a feature of digital techniques, samples are much more suitable and efficient for forming analog signals due to their ability to transfer multi-valued variables more compactly in successive time instants or frequency carriers (in orthogonal frequency division multiplexing). Therefore, compression for analog transmission should allocate samples rather than bits to pixel regions. Since pixels are correlated both within the two-dimensional (2D) frame space and across successive time steps (between frames), operating on 3D blocks (cubes) provides higher compression ratios and more flexibility in sample allocation. For each 3D block, selecting a subsampling factor for each dimension allows different reconstruction qualities and sample numbers. The general block diagram of the proposed compression scheme is shown in Figure 1. The encoder and decoder modules are described in the following subsections.
2.1. Configurations and Subsampling Modes
The size of the 3D blocks used to select different subsampling factors can be fixed in the configuration or selected by the algorithm for each pixel region. For design simplicity, this study applied fixed sizes. The 3D size is specified by three numbers corresponding to each dimension and labeled as [X, Y, Z], where X, Y, and Z denote the width, the height, and the number of frames, respectively. Subsampling factors for blocks are limited by their size, i.e., at least one sample of each component must remain along a given dimension. A combination of subsampling factors for all three dimensions specifies a mode, labeled as [x, y, z], where x, y, and z denote luma subsampling factors along the horizontal, vertical, and frame dimensions, respectively. Chroma subsampling is twice as strong as luma subsampling in each dimension; when the luma subsampling factor is maximal, the chroma factor is the same due to the abovementioned limitation. For simplicity, sizes and subsampling factors were restricted to powers of 2. Subsampling divides the 3D block of each component into cubes, where each cube is represented by one sample; higher subsampling factors form larger cubes.
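As a sketch of the mode space described above, the following Python snippet (illustrative only, not the paper's code) enumerates all subsampling modes allowed for a given block size; a dimension of size S, a power of 2, admits the factors 1, 2, ..., S:

```python
import itertools

def subsampling_modes(block_size):
    """Enumerate all [x, y, z] luma subsampling modes for a 3D block.

    Factors are restricted to powers of 2 and limited so that at least one
    luma sample remains along each dimension.  The function name and data
    layout are illustrative assumptions, not the paper's implementation.
    """
    factors_per_dim = [
        [2 ** k for k in range(s.bit_length())]  # 1, 2, ..., s for s a power of 2
        for s in block_size
    ]
    return list(itertools.product(*factors_per_dim))

modes = subsampling_modes((16, 16, 16))
print(len(modes))  # 125 modes: 5 factors (1, 2, 4, 8, 16) per dimension
```

For a [16, 16, 16] block this yields the 125 combinations mentioned later in Section 2.4, and 64 for an [8, 8, 8] block.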
Apart from the 3D block size, the codec configuration is specified by the encoding and decoding methods. There are two ways to encode: selecting samples at regular intervals and calculating the average value of the samples. The first approach is more straightforward, as it picks the bottom-right (regarding horizontal and vertical frame dimensions), last-frame sample from each 3D cube resulting from subsampling. Locations of the picked samples are shown in Figure 2b and Figure 3b. The second approach computes the average of all samples included in a cube. There are also two ways of decoding. The first is more straightforward, as it duplicates the sample representative into all samples of the corresponding cube. The second applies trilinear or higher-order interpolation based on samples taken from previously decoded cubes and the representative of the current one.
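The two encoding methods and the duplication-based decoding can be sketched with NumPy as follows; the function names, and the assumption that the block size is divisible by the mode factors, are illustrative rather than taken from the paper's code:

```python
import numpy as np

def encode_block(block, mode, method="average"):
    """Reduce a 3D luma block to one representative sample per cube.

    block: ndarray of shape (X, Y, Z); mode: (x, y, z) subsampling factors
    dividing the block into cubes.  Assumes divisible dimensions (a sketch).
    """
    x, y, z = mode
    if method == "average":
        X, Y, Z = block.shape
        # Average all samples inside each x*y*z cube.
        return block.reshape(X // x, x, Y // y, y, Z // z, z).mean(axis=(1, 3, 5))
    # "pick": take the bottom-right, last-frame sample of each cube.
    return block[x - 1::x, y - 1::y, z - 1::z]

def decode_duplicate(samples, mode):
    """Simplest reconstruction: replicate each representative into its cube."""
    return samples.repeat(mode[0], axis=0).repeat(mode[1], axis=1).repeat(mode[2], axis=2)

block = np.arange(16 * 16 * 16, dtype=float).reshape(16, 16, 16)
rep = encode_block(block, (4, 4, 2))
print(rep.shape)                               # (4, 4, 8)
print(decode_duplicate(rep, (4, 4, 2)).shape)  # (16, 16, 16)
```

For flat regions the two encoding methods coincide; they differ in textured cubes, where the average acts as a crude low-pass filter while picking keeps one raw sample.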
One of the two decoding methods can be applied at the encoder side to estimate the reconstruction distortions required to select the best mode. If duplication is used in the decoder, it makes no sense to estimate distortion based on the interpolation, due to its complexity and the mismatch between the estimated (encoder) and actual (decoder) distortions. On the other hand, using the cube average reduces the encoder complexity; therefore, this method is beneficial even when the decoder exploits the interpolation. Although LDI/ADI configurations involve some mismatch between the qualities estimated in the encoder and obtained in the decoder, the quality losses are small (see Section 4) since samples in a cube are usually strongly correlated.
Table 1 summarizes the configurations regarding the computation methods used in the coding algorithm. The last sample denotes the bottom-right, last-frame sample within a cube. Figure 2, Figure 3, Figure 4 and Figure 5 depict examples of subsampling and reconstruction for four configurations: LDD, LDI/LII, ADD, and ADI/AII.
2.2. Rate–Distortion Optimization
The selection of subsampling modes uses a rate–distortion optimization (RDO) algorithm. The RDO takes the estimated sample rate (R) and reconstruction distortion (D) of each candidate mode. The two values are combined with a weight, known as the Lagrange multiplier λ, to provide a joint cost. Costs computed for all modes are compared to each other to find the best mode for a given 3D block according to the following formula [24]:

m* = arg min_m (D_m + λ · R_m),

where R_m and D_m denote the rate and distortion of candidate mode m.
The multiplier value decides the impact of rates and distortions on costs. Its selection depends on the sum of the rates of all blocks: larger values yield smaller rates through stronger subsampling. Since the rates contributed by all blocks depend on λ, several iterations must be performed to meet the requirements on the total rate while minimizing the total distortion. Each iteration evaluates one λ value. To limit the number of iterations, the bisection method was applied. The method halves the range of the λ multiplier in each iteration: the evaluated λ value is set in the middle of the range, and if the total rate exceeds the target, the upper subrange becomes the new range; otherwise, the bottom subrange is selected. The method effectively determines one bit of the multiplier in each iteration. Modes selected in the last iteration are used to subsample each block.
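The bisection on λ can be sketched as follows; the initial search range for λ, the iteration count, and the data layout (one array of (rate, distortion) rows per block) are illustrative assumptions, not values from the paper:

```python
import numpy as np

def select_modes(rd_points, target_rate, iters=20):
    """Bisection on the Lagrange multiplier λ (sketch of the method in the text).

    rd_points: list of per-block arrays with rows (rate, distortion), one row
    per candidate mode.  Returns the mode indices chosen in the last iteration.
    The λ range [0, 1e6] and iters=20 are arbitrary illustrative choices.
    """
    lam_lo, lam_hi = 0.0, 1e6
    best = None
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        # For each block, pick the mode minimizing J = D + λ·R.
        best = [int(np.argmin(p[:, 1] + lam * p[:, 0])) for p in rd_points]
        total_rate = sum(p[i, 0] for p, i in zip(rd_points, best))
        if total_rate > target_rate:
            lam_lo = lam   # too many samples: raise λ for stronger subsampling
        else:
            lam_hi = lam   # rate fits: try a smaller λ to reduce distortion
    return best
```

Each iteration halves the λ range, so a modest number of iterations suffices to pin the multiplier down to the precision needed for mode selection.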
Figure 6 shows an example of the mode selection for the [32, 32, 1] configuration.
2.3. Smoothing Filter
The quality of reconstructed videos is improved by a smoothing filter operating on block edges within each frame. It reduces the blocking artifacts introduced by subsampling. This study used a simple filter with the configurations generating average samples (ADD, ADI, and AII). The filter modifies edge luma samples in a given dimension when the corresponding subsampling factor equals or exceeds 4; this condition applies separately to each side of the edge. The filtering is performed according to the following formula:
where p and q denote edge samples from the current and neighboring blocks, and p′ is the modified sample.
2.4. Metadata
In order to recreate the video from the samples, the decoder needs information about the subsampling modes used in the 3D blocks into which the source video was divided. In the target framework, this information can be sent over a separate low-bit-rate digital channel or embedded into the analog signal with redundancy and/or protection bits. For one 3D block, the mode identifier combines three parts related to each dimension. The size of each part depends on the block size in the corresponding dimension. Generally, the number of possible subsampling factors increases with the size, and the factor value is limited by the case when only one luma sample is present along a given dimension. For example, the [16, 16, 16] block needs a 7-bit identifier, as each dimension has five allowed subsampling factors, giving 125 (5 × 5 × 5) combinations in total, which fit into the 128-valued range of 7 bits. The general formula for the number of bits B is as follows:

B = ⌈log2((log2 X + 1) · (log2 Y + 1) · (log2 Z + 1))⌉.
The impact of the identifier on the required throughput depends on the selected redundancy and protection rates. Provided that each bit is transmitted as two samples, the 7-bit identifier of a [16, 16, 16] block occupies 14 samples, which corresponds to 0.003418 samples per pixel (spp) of digital side information.
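Under the assumption stated above (sizes are powers of 2, so a dimension of size S admits log2(S) + 1 factors), the identifier length can be computed as in this illustrative snippet:

```python
from math import ceil, log2

def mode_id_bits(block_size):
    """Bits needed to index all subsampling-factor combinations of a block.

    Assumes each size is a power of 2, so a dimension of size S allows
    log2(S) + 1 factors (1, 2, ..., S).  Sketch of the rule in the text.
    """
    combos = 1
    for s in block_size:
        combos *= int(log2(s)) + 1
    return ceil(log2(combos))

print(mode_id_bits((16, 16, 16)))  # 7 (125 combinations fit into 7 bits)
print(mode_id_bits((8, 8, 8)))     # 6 (64 combinations)
```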
3. Implementation
The encoder and the decoder were implemented in the Python programming language using popular libraries such as NumPy, SciPy, and scikit-image. To divide the source sequence into 3D blocks, a group of pictures consisting of Z successive frames was stored in memory. This group was then divided into [X, Y, Z] blocks. The blocks were processed in raster order, i.e., from the left block to the right one in each row, and from the top row to the bottom one. If the frame width or height was not divisible by X or Y, the last horizontal or vertical blocks contained fewer columns or rows, respectively.
All possible modes were generated for each block. For example, if the codec operated in the [16, 16, 16] configuration, there were 125 possible modes (from [1, 1, 1] to [16, 16, 16] for the luma and from [1, 1, 1] to [8, 8, 8] for the chroma components). For each mode in a given block, encoding and decoding were performed to calculate the rate and the mean square error (MSE). Using the ConvexHull class from the SciPy library, the convex hull (RD curve) was found for each block and stored in memory. Then the bisection method found the weight λ, based on which the target subsampling mode for each block was selected.
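The per-block RD curve, i.e., the lower convex hull of the (rate, MSE) points of all modes, can also be obtained without SciPy's ConvexHull class. The following pure-Python sketch uses Andrew's monotone-chain algorithm and is illustrative rather than the paper's implementation:

```python
def rd_convex_hull(points):
    """Lower convex hull of (rate, distortion) points (monotone chain).

    Keeps only RD-optimal modes: those on the lower-left envelope, where
    distortion is a convex, decreasing function of rate.  Illustrative sketch.
    """
    pts = sorted(set(points))  # sort by rate, then by distortion
    hull = []
    for p in pts:
        # Pop the last point while it lies on or above the chord hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

print(rd_convex_hull([(0, 10), (1, 5), (2, 4), (3, 0)]))
# (2, 4) is dropped: it lies above the segment from (1, 5) to (3, 0)
```

Restricting the λ search to hull points is safe because a mode above the hull can never minimize D + λ·R for any λ ≥ 0.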
The zoom function from the SciPy library was used for interpolation when the bottom-right, last-frame sample was selected as the representative (LDI and LII configurations). This function enlarges a multidimensional array so that the values in new cells are obtained by spline interpolation. It should be noted that samples from neighboring blocks were also needed to reconstruct a block; in particular, the interpolation based on the last sample in a cube refers to left/top/previous-frame-group blocks. The input to the zoom function was an array of samples and a zoom factor for each axis. For example, from the [16, 16, 16] block at the [4, 4, 2] subsampling mode, 32 samples were taken and formed into a 4 × 4 × 2 array. After adding samples from adjacent blocks, a 5 × 5 × 3 array was created. The zoom function enlarged this array to the size of 17 × 17 × 17, and the 16 × 16 × 16 slice corresponding to the current block was taken as the reconstruction.
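A minimal sketch of this zoom-based reconstruction, using the array sizes from the example above with random data standing in for real samples (which slice is kept is an illustrative assumption following the text's description of left/top/previous neighbors):

```python
import numpy as np
from scipy.ndimage import zoom

# 4 x 4 x 2 representatives of the current block, padded with one plane of
# neighbor samples along each axis, as in the text's example.
padded = np.random.rand(5, 5, 3)
target = (17, 17, 17)
factors = [t / s for t, s in zip(target, padded.shape)]

enlarged = zoom(padded, factors, order=1)  # order=1: trilinear interpolation
block = enlarged[1:, 1:, 1:]               # drop the neighbor planes
print(block.shape)  # (16, 16, 16)
```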
When the interpolation was performed using average samples from cubes, the map_coordinates function from the SciPy library was used (the previously mentioned zoom function uses it indirectly). The input to the map_coordinates function is an array of samples and an array of coordinates at which interpolated values are computed with fractional accuracy. The function supports extrapolation, which is useful for the average-sample configurations; in particular, the nearest-neighbor method calculates pixels between the extreme samples and the block edges. This extrapolation can lead to blocking artifacts, which can be reduced by the smoothing filter described in Section 2.3. The filter requires references to the four horizontally and vertically neighboring blocks.
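A possible sketch of the map_coordinates-based reconstruction for average-sample configurations; placing each average at its cube's center and the resulting coordinate mapping are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def decode_interpolate(averages, mode, block_size, order=1):
    """Reconstruct a block from cube-average samples with map_coordinates.

    Each average is treated as the sample at its cube's center; positions
    between the outermost centers and the block edges are extrapolated by
    the nearest neighbor (mode='nearest').  Illustrative sketch.
    """
    # Pixel i of a cube of size f maps to coordinate (i - (f - 1) / 2) / f
    # in the grid of cube centers.
    grids = [
        (np.arange(s) - (f - 1) / 2.0) / f
        for s, f in zip(block_size, mode)
    ]
    coords = np.meshgrid(*grids, indexing="ij")
    return map_coordinates(averages, coords, order=order, mode="nearest")

averages = np.random.rand(4, 4, 8)  # a [16, 16, 16] block at mode [4, 4, 2]
rec = decode_interpolate(averages, (4, 4, 2), (16, 16, 16))
print(rec.shape)  # (16, 16, 16)
```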
The interpolation functions can be executed with a specified order. For an order equal to 1, linear interpolation is performed; higher orders enable polynomial (spline) interpolation. Increasing the order should improve reconstruction quality at the cost of higher algorithmic complexity. Therefore, it is reasonable to use higher orders only in the decoder while keeping the encoder as simple as possible.
4. Results
The developed codec was evaluated using nine test sequences, as listed in Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 [25]. For each sequence, the first 64 frames were coded. Four sample rates of 0.125, 0.1875, 0.25, and 0.3125 spp were tested, with the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) as distortion metrics. These sample rates corresponded to bit rates of 1, 1.5, 2, and 2.5 bits per pixel, respectively, and were selected to allow medium-quality reconstructions. Since the bandwidth is proportional to the number of samples transmitted in a period, the sample rate is a measure of the bandwidth reduction. If compression is not used, the bandwidth requirements are the highest; assuming the 4:2:0 chroma format, the sample rate of uncompressed analog video is 1.5 spp.
In the first test stage, the effectiveness of the codec was checked in various configurations while maintaining a constant block size of [16, 16, 16]. The interpolation order was set to 1, and the smoothing filter was not used. The obtained rate–distortion (RD) curves are shown in Figure 7 and Figure 8. As can be seen, the bandwidth reduction can be traded for reconstruction quality, which is typical of lossy compression schemes. Bjontegaard Delta (BD) metrics were computed for these curves and are listed in Table 2 and Table 3. The reference for the metrics was the LDD configuration, since it turned out to be the worst in terms of RD efficiency, as shown in Figure 7 and Figure 8. The best results were obtained for the LII configuration, with slightly worse results for the AII and ADI configurations. Although the LDI and ADD configurations were much better than LDD, they were worse than the three best configurations. In the case of LDI, the reason was the mismatch between the distortion estimated in the encoder and the actual one after decoding; ADD, on the other hand, did not take advantage of the interpolation.
The reconstruction quality strongly depends on motion activity. High motion activity decreases the efficiency of the subsampling in the time/frame dimension (sequences Football, Mobile, and Crowd_run). On the other hand, sequences with a fixed camera view (Hall_monitor and News) achieve good compression efficiency owing to strong temporal redundancy.
In the second test stage, the impact of the interpolation order and the smoothing filter on the quality of the reconstructed videos was evaluated. The three best configurations from the first stage were selected (LII, ADI, and AII). The results are summarized in Table 4 and Table 5 in terms of BD metrics. As can be seen, the order-2 interpolation outperformed the order-1 interpolation on average by 0.729 and 0.757 dB for the ADI/ADI2 and AII/AII2 configurations, respectively. The improvement of the order-2 interpolation was about 0.01 in terms of SSIM. On the other hand, the evaluation with the order equal to 3 provided slightly worse results. Thus, it is beneficial to use the order-2 interpolation.
The smoothing filter (denoted by the -F suffix in ADI2-F and AII2-F) provided additional quality improvements. Although the average PSNR improvement was below 0.1 dB, the subjective quality is much better due to the reduced blocking artifacts. This improvement can be observed in Figure 9.
A larger 3D block size means more available modes and the ability to use stronger subsampling. On the other hand, a larger block means less flexibility in the choice of modes because it imposes the same mode on a larger portion of the video, which can degrade results. To verify these claims, the third stage of experiments compared the performance of the codec in the ADI configuration using different block sizes. The results are summarized in Table 6 and Table 7. They show that the best efficiency is achieved for the [8, 8, 8] block size. Compared to the [16, 16, 16] configuration, the average gain is about 0.5 dB in PSNR and 0.0026 in SSIM. However, smaller block sizes have higher bandwidth requirements on the digital channel. In particular, the [8, 8, 8] configuration involves six mode bits per block, i.e., 0.0117 bits per pixel. Assuming that each bit occupies two samples (see Section 2.4), the increase is 0.0234 spp, almost seven times more than in the case of the [16, 16, 16] configuration. This is almost 19% of the bandwidth of the main analog stream at the lowest rate considered in this study (0.125 spp).
The best block size differed between sequences, as the bolded numbers in the tables indicate. In particular, it depended on the temporal and spatial correlation of texture. Smaller blocks adapted better to changing video content; on the other hand, they increased the number of pixels on block edges, leading to more blocking artifacts. For high motion activity (Football, Pedestrian_area, and Tractor), better results were achieved when the frame dimension of the block was decreased. For a fixed camera view with low motion activity (Hall_monitor, News), the frame dimension should be longer to exploit background redundancy between frames. An increase in spatial correlation within each frame favored blocks larger horizontally and vertically. These observations suggest that the block-size configuration should be selected considering the application and the expected video content. It is also possible to extend this study by enabling the encoder to select between several block sizes, which would allow adaptation to the best size based on local correlation.
Figure 10 depicts RD curves for the ADI2-F configuration and all tested video sequences. As can be seen, quality differed significantly between sequences, which mainly stems from their motion activities. Adaptation to the best block size should decrease these differences.
The implementation in Python was far from real-time performance; nevertheless, it allowed the evaluation of relative complexities between configurations. The target implementation should ultimately be in hardware. Using one thread of an Intel i5-6300HQ processor clocked at 2.3 GHz, the LDD [16, 16, 16] configuration required 525.3 and 3.91 s to encode and decode, respectively, 64 frames of Full High Definition (1920 × 1080) video. In this simplest configuration, the complexity of the decoder is smaller by two orders of magnitude compared to the encoder. This relationship stems from the fact that the encoder must evaluate all allowable modes to select the best one for each 3D block, whereas the decoder processes only one mode per block.
The average encoding and decoding times are summarized in Table 8 for the remaining [8, 8, 8] and [16, 16, 16] configurations. The LII and AII configurations were much more computationally complex, as they had to perform interpolation operations for every allowable subsampling mode. With this in mind, ADI was the best configuration because it provided good compression quality with relatively simple calculations. Although the [8, 8, 8] block size had fewer subsampling modes to evaluate (64 vs. 125), it involved a much longer execution time. This stems from the fact that Python is an interpreted language, and most of the execution time was spent on source code interpretation. Since the source code was executed eight times more often for the [8, 8, 8] configurations, the execution time increased significantly. In the target implementation, these ratios should be different, and execution times should be reduced by several orders of magnitude.
The smoothing filter and the order-2 interpolation applied at the decoder did not impact the encoder complexity. The increase in the decoder's execution time was small when the filter was active; on the other hand, the order-2 interpolation was much more complex than the order-1 interpolation.
To the best of our knowledge, there are no previous works on video compression for analog/hybrid communication; therefore, a direct comparison was not possible.
5. Conclusions
The compression based on subsampling in 3D blocks proved suitable for hybrid digital–analog transmission. It reduced the utilized bandwidth several times. In particular, medium-quality reconstructions can be obtained when transmitting Full High Definition (1920 × 1080) videos in a channel dedicated to standard-definition ones, i.e., at a sample rate of about 0.25–0.3125 spp. Moreover, one such channel can transmit more videos when low motion activity is expected (e.g., in surveillance systems).
The worst codec configuration regarding reconstruction quality was the one in which interpolation was not used and samples were picked from the original video (LDD). Subsampling based on interpolation applied at both the encoder and the decoder achieved the best results: for LII and AII, the average improvement was 3.733 and 3.388 dB, respectively. Limiting the interpolation to the decoder allowed a significant complexity reduction at the cost of a slightly smaller quality improvement, i.e., 3.350 dB for ADI. The order-2 interpolation increased the quality by about 0.7 dB for the subsampling based on averaging (ADI–ADI2); on the other hand, the increase was slight for the subsampling based on picking (LII–LII2). The smoothing filter gave an average improvement of about 0.1 dB and significantly reduced blocking artifacts. Changing the 3D block size from [16, 16, 16] to [8, 8, 8] increased the quality by about 0.5 dB. However, the best block size depends on motion activity, and further studies would be beneficial to develop better selection algorithms.
The reconstruction methods evaluated in this study are relatively simple. The author believes it is possible to improve reconstruction quality by leveraging more sophisticated approaches, such as variable 3D block sizes, transforms, and neural networks (on the decoder side). Developing a rate control algorithm more suitable for one-pass coding, i.e., one that does not store the results of all modes for a group of frames, would also be beneficial.