High-Speed Wavelet Image Processing Using the Winograd Method with Downsampling

Lyakhov, Pavel; Semyonova, Nataliya; Nagornov, Nikolay; Bergerman, Maxim; Abdulsalyamova, Albina

doi:10.3390/math11224644

by Pavel Lyakhov, Nataliya Semyonova, Nikolay Nagornov, Maxim Bergerman and Albina Abdulsalyamova *

Department of Mathematical Modelling, North-Caucasus Federal University, 355009 Stavropol, Russia

* Author to whom correspondence should be addressed.

Mathematics 2023, 11(22), 4644; https://doi.org/10.3390/math11224644

Submission received: 16 October 2023 / Revised: 9 November 2023 / Accepted: 13 November 2023 / Published: 14 November 2023

(This article belongs to the Topic Theory and Applications of High Performance Computing)

Abstract:

Wavelets are actively used to solve a wide range of image processing problems in various fields of science and technology. Modern image processing systems cannot keep up with the rapid growth in digital visual information. Various approaches are used to reduce the computational complexity and increase computational speeds. The Winograd method (WM) is one of the most promising. However, this method is used to obtain sequential values. Its use for wavelet image processing requires expanding the calculation methodology to cases of downsampling. This paper proposes a new approach to reduce the computational complexity of wavelet image processing based on the WM with decimation. Calculations have been carried out and formulas have been derived that implement digital filtering using the WM with downsampling. The derived formulas can be used for 1D filtering with an arbitrary downsampling stride. Hardware modeling of wavelet image filtering on an FPGA showed that the WM reduces the computational time by up to 66%, with increases in the hardware costs and power consumption of 95% and 344%, respectively, compared to the direct method. A promising direction for further research is the implementation of the developed approach on ASIC and the use of modular computing for more efficient parallelization of calculations and an even greater increase in the device speed.

Keywords:

computational complexity; delay reduction; image filtering; decimation; hardware implementation; parallel computing

MSC:

15A09; 42C40; 94A08

1. Introduction

Wavelets are actively used to solve a wide range of image processing problems in various fields of science and technology, e.g., image denoising [1], reconstruction [2], analysis [3], and video analysis and processing [4]. Wavelet processing methods are based on the discrete wavelet transform using 1D digital filtering. Digital filtering is performed through repeated additions and multiplications and has a high computational complexity. Modern image processing systems cannot keep up with the rapid growth in the volume of digital visual information that needs to be processed, stored, and transmitted. Improving microelectronic devices is one of the state-of-the-art approaches to increase computational efficiency [5]. Many approaches reduce the computational complexity of wavelet image processing, including various hardware architectures [6]. The authors of [7] developed a multidimensional wavelet construction method that constructs multidimensional inseparable wavelet filter banks from two 1D low-pass filters, one of which is an interpolation filter, to improve image processing speed. In paper [8], the authors developed a new 2D transform called the asymmetric 2D Haar transform and extended it to wavelet packets with an exponentially larger number of bases. The authors of [9] developed an algorithm for the 2D discrete wavelet transform of high-resolution images on Internet of Things nodes. All of these methods are based on pixel-by-pixel image processing. The Winograd method (WM) is based on matrix multiplication and is used as a modern alternative to the classic direct method (DM). The WM reduces the computational complexity of image processing due to the simultaneous calculation of several output values, in contrast to the above methods. The processed image is assembled not from a collection of individual pixels, but from fragments of a certain size. This approach reduces the number of computationally complex multiplications by increasing the number of additions. 
The WM is used to increase the speed of neural network image processing algorithms [10]. In [11], digital filtering algorithms based on the WM were proposed for convolutional layers of neural networks (NNs), which are superior to the fast Fourier transform in terms of the performance of deep NNs when processing large arrays of visual data. Based on this work, architectures [12] and hardware accelerators [13] have been developed to implement WM-based NN image processing algorithms. In [14], a digital filter architecture based on the WM in a residue number system was developed, which accelerated image processing while increasing hardware costs. However, the WM is designed to obtain groups of adjacent values, while wavelet filtering reduces the sampling rate of the digital signal. In this regard, there is a need to generalize the WM to cases of signal downsampling with a stride of two to implement wavelet filtering of images.

The purpose of this paper is to generalize the Winograd method to the case of convolution with downsampling and to increase the computational speed of wavelet image processing using the WM with decimation.

The rest of the paper is organized as follows. Section 2 describes approaches to wavelet image processing based on the DM and the WM. Section 3 presents a high-speed implementation of the discrete wavelet transform using the proposed approach. Section 4 discusses the results obtained. Section 5 contains the conclusions.

2. Approaches to Wavelet Image Processing on Modern Hardware Architectures

A. Wavelet filtering using the direct method

Wavelet filtering with decimation of a 2D image along rows using the DM can be represented as:

Z (u, y) = \sum_{i = 1}^{r} N (u, 2 y + 1 - i) R (i),

where $Z$ and $N$ are the processed and original 2D images, respectively, and $u$ and $y$ are the row and column numbers of pixels processed by the wavelet filter $R$ of order $r$. Wavelet image processing using the DM is performed through two computational channels corresponding to low-frequency and high-frequency wavelet filters. The scheme of 1D wavelet filtering of an image fragment using the DM is shown in Figure 1, where $R_{L}$ and $R_{H}$ are low-pass and high-pass filters and $Z_{L} (u, y)$ and $Z_{H} (u, y)$ are the processed images containing low- and high-frequency information about the original image, respectively.
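The DM formula above can be sketched in a few lines of Python for a single image row. This is only an illustration under stated assumptions: the function names are hypothetical, samples outside the row are treated as zero (the text does not specify the boundary extension), and the toy filters are not wavelet filters.

```python
def dm_filter_row(row, taps):
    """Direct-method filtering of one image row with decimation by 2.

    Implements Z(y) = sum_{i=1..r} N(2y + 1 - i) * R(i) with 1-based indices.
    Samples outside the row count as zero (an assumption, see lead-in).
    """
    r = len(taps)
    out = []
    for y in range(1, len(row) // 2 + 1):      # 1-based output index
        acc = 0
        for i in range(1, r + 1):              # 1-based tap index
            k = 2 * y + 1 - i                  # 1-based input index
            if 1 <= k <= len(row):
                acc += row[k - 1] * taps[i - 1]
        out.append(acc)
    return out


def dm_wavelet_row(row, low, high):
    """Two computational channels: low-pass and high-pass outputs."""
    return dm_filter_row(row, low), dm_filter_row(row, high)


row = [1, 2, 3, 4, 5, 6]
z_low, z_high = dm_wavelet_row(row, low=[1, 1], high=[1, -1])
print(z_low)    # [3, 7, 11]
print(z_high)   # [1, 1, 1]
```

Each output sample costs $r$ multiplications per channel, which is the cost the WM is meant to reduce.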

All pixels are processed by a pair of wavelet filters of order $r$ and require $2 r$ multiplications and $2 (r - 1)$ additions in the DM of wavelet processing. Multiplications have a higher computational complexity than additions and require significant resource costs when digital filtering is implemented on modern microelectronic devices. The WM is one of the main alternatives to the classic DM and is discussed below.

B. Digital filtering using the Winograd method

The WM reduces the computational complexity of image processing by simultaneously computing multiple output values through matrix calculations. The WM formula for 1D image filtering has the following general form [15]:

Z = A^{T} ((G R) ⊙ (B^{T} N)),

(1)

where $Z$ is the fragment of the processed image of size $z \times 1$; $R$ is the wavelet filter mask of size $r \times 1$; $N$ is the fragment of the original image of size $n \times 1$, where $n = z + r - 1$; $A^{T}$, $G$, $B^{T}$ are the transformation matrices of sizes $z \times n$, $n \times r$, $n \times n$, respectively; and $⊙$ is the operator of element-wise matrix multiplication. Algorithms for compiling transformation matrices are described in detail in paper [16]. The notation WM $F (n, r)$ specifies the size of the processed image fragments (the first argument, which corresponds to $z$ in Formula (1)) and the order $r$ of the wavelet filter used. The sizes of the transformation matrices and the original image fragments depend on them. For example, WM $F (4, 2)$ uses matrices

A^{T} = (\begin{matrix} 1 & 1 & 1 & 1 & 0 \\ 0 & 1 & - 1 & 2 & 0 \\ 0 & 1 & 1 & 4 & 0 \\ 0 & 1 & - 1 & 8 & 1 \end{matrix}), G = (\begin{matrix} \frac{1}{2} & 0 \\ \frac{1}{2} & \frac{1}{2} \\ \frac{1}{6} & - \frac{1}{6} \\ \frac{1}{6} & \frac{1}{3} \\ 0 & 1 \end{matrix}), B^{T} = (\begin{matrix} 2 & - 1 & - 2 & 1 & 0 \\ 0 & 2 & 1 & - 1 & 0 \\ 0 & - 2 & 3 & - 1 & 0 \\ 0 & - 1 & 0 & 1 & 0 \\ 0 & 2 & - 1 & - 2 & 1 \end{matrix}),

constructed at points $0, \pm 1, 2, \infty$.
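As a sanity check of Formula (1), the following sketch evaluates the $F (4, 2)$ matrices listed above with exact rational arithmetic and compares the result with a plain sliding dot product. The helper names (`matvec`, `winograd_f42`, `direct`) and the sample values are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction as Fr

# Transformation matrices for F(4, 2), copied from the text.
AT = [[1, 1, 1, 1, 0],
      [0, 1, -1, 2, 0],
      [0, 1, 1, 4, 0],
      [0, 1, -1, 8, 1]]
G = [[Fr(1, 2), 0],
     [Fr(1, 2), Fr(1, 2)],
     [Fr(1, 6), Fr(-1, 6)],
     [Fr(1, 6), Fr(1, 3)],
     [0, 1]]
BT = [[2, -1, -2, 1, 0],
      [0, 2, 1, -1, 0],
      [0, -2, 3, -1, 0],
      [0, -1, 0, 1, 0],
      [0, 2, -1, -2, 1]]


def matvec(m, v):
    """Matrix-vector product for small list-of-lists matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def winograd_f42(x, taps):
    """Z = A^T((G R) ⊙ (B^T N)) for 5 input samples and a 2-tap filter."""
    gr = matvec(G, taps)                       # filter transform (precomputable)
    bn = matvec(BT, x)                         # input transform
    prod = [a * b for a, b in zip(gr, bn)]     # 5 general multiplications
    return matvec(AT, prod)                    # output transform


def direct(x, taps):
    """Reference: z_k = sum_i x_{k+i} * r_i for consecutive k."""
    r = len(taps)
    return [sum(x[k + i] * taps[i] for i in range(r))
            for k in range(len(x) - r + 1)]


x = [7, -3, 12, 5, 9]
taps = [3, -2]
print(winograd_f42(x, taps) == direct(x, taps))   # True
```

The element-wise step uses five products for four outputs, whereas the direct computation of the same four outputs uses 4 × 2 = 8.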

However, the WM in its classic representation is a method based on processing groups of consecutive pixels, while image filtering through two computational channels during wavelet processing is performed with decimation to reduce the signal sampling frequency and reduce computational redundancy. Thus, there is a need to generalize and expand the WM to a more general case, in which a fragment of the processed image may consist of non-consecutive pixels.

C. Filtering using the Winograd method with downsampling

We expand the WM to the case of filtering with decimation for use in wavelet image processing. Let $x_{1}, x_{2}, \dots, x_{n}, \dots, x_{n + r - 1}$ be the pixel brightness values of a certain fragment of a line of the original image $N$, and $r_{1}, r_{2}, \dots, r_{r}$ be the coefficients of the wavelet filter $R$. Then, the calculation of the values $z_{1}, z_{2}, \dots, z_{n}$ of the fragment of the processed image $Z$ can be represented in matrix form:

(\begin{matrix} z_{1} \\ z_{2} \\ \dots \\ z_{n} \end{matrix}) = (\begin{matrix} x_{1} & x_{2} & \dots & x_{r} \\ x_{2} & x_{3} & \dots & x_{r + 1} \\ \dots & \dots & \dots & \dots \\ x_{n} & x_{n + 1} & \dots & x_{n + r - 1} \end{matrix}) \cdot (\begin{matrix} r_{1} \\ r_{2} \\ \dots \\ r_{r} \end{matrix}) .

Consider, in expanded form, the processing of a fragment of 10 pixels ($n + r - 1 = 10$) with a fifth-order filter ($r = 5$) to obtain six values ($n = 6$) of a fragment of the processed image $Z$ at the output:

(\begin{matrix} z_{1} \\ z_{2} \\ z_{3} \\ z_{4} \\ z_{5} \\ z_{6} \end{matrix}) = (\begin{matrix} x_{1} & x_{2} & x_{3} & x_{4} & x_{5} \\ x_{2} & x_{3} & x_{4} & x_{5} & x_{6} \\ x_{3} & x_{4} & x_{5} & x_{6} & x_{7} \\ x_{4} & x_{5} & x_{6} & x_{7} & x_{8} \\ x_{5} & x_{6} & x_{7} & x_{8} & x_{9} \\ x_{6} & x_{7} & x_{8} & x_{9} & x_{10} \end{matrix}) \cdot (\begin{matrix} r_{1} \\ r_{2} \\ r_{3} \\ r_{4} \\ r_{5} \end{matrix}) .

In wavelet filtering with decimation, only the values $z_{1}, z_{3}, z_{5}$ are calculated:

(\begin{matrix} z_{1} \\ z_{3} \\ z_{5} \end{matrix}) = (\begin{matrix} x_{1} & x_{2} & x_{3} & x_{4} & x_{5} \\ x_{3} & x_{4} & x_{5} & x_{6} & x_{7} \\ x_{5} & x_{6} & x_{7} & x_{8} & x_{9} \end{matrix}) \cdot (\begin{matrix} r_{1} \\ r_{2} \\ r_{3} \\ r_{4} \\ r_{5} \end{matrix}) = (\begin{matrix} x_{1} r_{1} + x_{2} r_{2} + x_{3} r_{3} + x_{4} r_{4} + x_{5} r_{5} \\ x_{3} r_{1} + x_{4} r_{2} + x_{5} r_{3} + x_{6} r_{4} + x_{7} r_{5} \\ x_{5} r_{1} + x_{6} r_{2} + x_{7} r_{3} + x_{8} r_{4} + x_{9} r_{5} \end{matrix}) = (\begin{matrix} x_{1} r_{1} + x_{3} r_{3} + x_{5} r_{5} \\ x_{3} r_{1} + x_{5} r_{3} + x_{7} r_{5} \\ x_{5} r_{1} + x_{7} r_{3} + x_{9} r_{5} \end{matrix}) + (\begin{matrix} x_{2} r_{2} + x_{4} r_{4} \\ x_{4} r_{2} + x_{6} r_{4} \\ x_{6} r_{2} + x_{8} r_{4} \end{matrix}) = (\begin{matrix} x_{1} & x_{3} & x_{5} \\ x_{3} & x_{5} & x_{7} \\ x_{5} & x_{7} & x_{9} \end{matrix}) \cdot (\begin{matrix} r_{1} \\ r_{3} \\ r_{5} \end{matrix}) + (\begin{matrix} x_{2} & x_{4} \\ x_{4} & x_{6} \\ x_{6} & x_{8} \end{matrix}) \cdot (\begin{matrix} r_{2} \\ r_{4} \end{matrix}) .

The resulting calculations can be implemented by a combination of the WM $F (3, 3)$ using pixel brightness values $x_{1}, x_{3}, x_{5}, x_{7}, x_{9}$ and filter coefficients $r_{1}, r_{3}, r_{5}$, and the WM $F (3, 2)$ using pixel brightness values $x_{2}, x_{4}, x_{6}, x_{8}$ and filter coefficients $r_{2}, r_{4}$. Thus, in this case, instead of the $F (6, 5)$ method, wavelet filtering with decimation can use the $F (3, 5, 2) = F (3, 3) + F (3, 2)$ method, where the third number in $F (3, 5, 2)$ denotes the factor by which the sampling rate is reduced. In the general case of filtering by the WM $F (n, r, d)$ with the signal sampling frequency reduced by a factor of $d$, calculations are organized using a combination of methods according to the formula:

F (n, r, d) = s_{2} \cdot F (n, s_{1} + 1) + (d - s_{2}) \cdot F (n, s_{1}),

(2)

where $s_{1}$ and $s_{2}$ are the partial quotient and the remainder of dividing $r$ by $d$, respectively. The notation of the WM $F (n, r, d)$ contains the size $n$ of the processed image fragments, the order $r$ of the wavelet used, and the decimation stride $d$.
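The decomposition in Formula (2) can be checked numerically. The sketch below splits the input fragment and the filter into $d$ phases, filters each phase with stride 1 (the part a plain WM $F (n, s)$ would compute), and sums the results. All function names are hypothetical. The multiplication count relies on the standard property that a minimal 1D Winograd algorithm $F (n, s)$ uses $n + s - 1$ general multiplications, which the text does not state explicitly.

```python
def sliding(x, taps):
    """Stride-1 sliding dot product (the result of a plain F(n, s))."""
    s = len(taps)
    return [sum(x[k + i] * taps[i] for i in range(s))
            for k in range(len(x) - s + 1)]


def direct_decimated(x, taps, d):
    """Reference: keep every d-th output z_1, z_{1+d}, z_{1+2d}, ..."""
    return sliding(x, taps)[::d]


def split_decimated(x, taps, d):
    """Same outputs as a sum of d stride-1 filterings over the phases."""
    n = (len(x) - len(taps)) // d + 1          # number of decimated outputs
    total = [0] * n
    for p in range(d):
        sub_x, sub_t = x[p::d], taps[p::d]     # p-th phase of input and filter
        if not sub_t:                          # happens only if r < d
            continue
        part = sliding(sub_x, sub_t)[:n]
        total = [a + b for a, b in zip(total, part)]
    return total


def decomposition(r, d):
    """Formula (2): {sub-filter length: number of such sub-filters}."""
    s1, s2 = divmod(r, d)
    parts = {s1 + 1: s2, s1: d - s2}
    return {length: count for length, count in parts.items() if count}


def multiplications(n, r, d):
    """General multiplications of F(n, r, d), assuming n + s - 1 per F(n, s)."""
    return sum(count * (n + length - 1)
               for length, count in decomposition(r, d).items())


x = list(range(1, 11))                         # x_1 .. x_10
taps = [1, 2, 3, 4, 5]                         # r_1 .. r_5
print(split_decimated(x, taps, 2))             # [55, 85, 115]
print(decomposition(5, 2))                     # {3: 1, 2: 1}
print(multiplications(3, 5, 2))                # 9
```

For $F (3, 5, 2)$ this gives 9 multiplications for three outputs of one filter, compared with 3 × 5 = 15 for the direct computation.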

Consider, in expanded form, the processing of a fragment of 14 pixels ($n + r - 1 = 14$) with a six-tap filter ($r = 6$) to obtain nine values ($n = 9$) of a fragment of the processed image $Z$ at the output. For $d = 3$, the values $z_{1}, z_{4}, z_{7}$ are calculated:

(\begin{matrix} z_{1} \\ z_{4} \\ z_{7} \end{matrix}) = (\begin{matrix} x_{1} & x_{2} & x_{3} & x_{4} & x_{5} & x_{6} \\ x_{4} & x_{5} & x_{6} & x_{7} & x_{8} & x_{9} \\ x_{7} & x_{8} & x_{9} & x_{10} & x_{11} & x_{12} \end{matrix}) \cdot (\begin{matrix} r_{1} \\ r_{2} \\ r_{3} \\ r_{4} \\ r_{5} \\ r_{6} \end{matrix}) = (\begin{matrix} x_{1} r_{1} + x_{2} r_{2} + x_{3} r_{3} + x_{4} r_{4} + x_{5} r_{5} + x_{6} r_{6} \\ x_{4} r_{1} + x_{5} r_{2} + x_{6} r_{3} + x_{7} r_{4} + x_{8} r_{5} + x_{9} r_{6} \\ x_{7} r_{1} + x_{8} r_{2} + x_{9} r_{3} + x_{10} r_{4} + x_{11} r_{5} + x_{12} r_{6} \end{matrix}) = (\begin{matrix} x_{1} r_{1} + x_{4} r_{4} \\ x_{4} r_{1} + x_{7} r_{4} \\ x_{7} r_{1} + x_{10} r_{4} \end{matrix}) + (\begin{matrix} x_{2} r_{2} + x_{5} r_{5} \\ x_{5} r_{2} + x_{8} r_{5} \\ x_{8} r_{2} + x_{11} r_{5} \end{matrix}) + (\begin{matrix} x_{3} r_{3} + x_{6} r_{6} \\ x_{6} r_{3} + x_{9} r_{6} \\ x_{9} r_{3} + x_{12} r_{6} \end{matrix}) = (\begin{matrix} x_{1} & x_{4} \\ x_{4} & x_{7} \\ x_{7} & x_{10} \end{matrix}) \cdot (\begin{matrix} r_{1} \\ r_{4} \end{matrix}) + (\begin{matrix} x_{2} & x_{5} \\ x_{5} & x_{8} \\ x_{8} & x_{11} \end{matrix}) \cdot (\begin{matrix} r_{2} \\ r_{5} \end{matrix}) + (\begin{matrix} x_{3} & x_{6} \\ x_{6} & x_{9} \\ x_{9} & x_{12} \end{matrix}) \cdot (\begin{matrix} r_{3} \\ r_{6} \end{matrix})

The resulting calculations can be implemented by three applications of the WM $F (3, 2)$: one using pixel brightness values $x_{1}, x_{4}, x_{7}, x_{10}$ and filter coefficients $r_{1}, r_{4}$, one using values $x_{2}, x_{5}, x_{8}, x_{11}$ and coefficients $r_{2}, r_{5}$, and one using values $x_{3}, x_{6}, x_{9}, x_{12}$ and coefficients $r_{3}, r_{6}$. Thus, in this case, instead of using the $F (9, 6)$ WM when filtering with a stride of 3, one can use the $F (3, 6, 3) = 3 F (3, 2)$ method. In the special case when $d$ divides $r$, the remainder $s_{2}$ is zero and Formula (2) takes the form:

F (n, r, d) = d \cdot F (n, s_{1}) .

(3)

The original signal is divided into two groups of samples (even and odd) during wavelet image processing by the WM $F (n, r, d)$. In this case, the calculations are divided into two computational channels corresponding to even and odd signal samples. The products $G R_{L}$ and $G R_{H}$ are computed in advance, once for each filter used, and do not require additional computational costs during processing. The product $B^{T} N$ is calculated before the calculations are divided into two channels, so it is the same for both wavelet filters used. Thus, the number of operations in wavelet image processing can be reduced by calculating $B^{T} N$ once and then dividing the calculations into two channels. The elements of the transformation matrices $A^{T}$ and $B^{T}$ are also known in advance and consist of zeros, powers of two, and numbers whose binary representation consists only of ones. Multiplications by these elements can be represented as shifts and additions. For example, multiplying a number by $3 = 11_{2}$ can be executed by shifting the number one bit to the left (moving its binary point one position to the right) and adding the result to the original number. Thus, all calculations according to Formula (1) are implemented using additions, with the exception of the element-wise multiplication $⊙$. Element-wise multiplication is performed once for two $n \times 1$ matrices. The scheme of wavelet image processing with decimation using the WM is presented in Figure 2, where $Z_{L}$ and $Z_{H}$ are fragments of the image processed by the WM using wavelet filters $R_{L}$ and $R_{H}$, respectively.
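The reuse of $B^{T} N$ by both filter channels can be illustrated with the smallest decimated case, $F (2, 4, 2) = 2 \cdot F (2, 2)$. The sketch below is an assumption-laden illustration: the $F (2, 2)$ matrices are a common textbook choice and are not taken from the paper, the function names are hypothetical, and the toy filters are not Daubechies filters.

```python
# F(2, 2) Winograd matrices (a standard choice, NOT taken from the text).
BT = [[1, -1, 0],
      [0, 1, 0],
      [0, -1, 1]]
G = [[1, 0],
     [1, 1],
     [0, 1]]
AT = [[1, 1, 0],
      [0, 1, 1]]


def matvec(m, v):
    """Matrix-vector product for small list-of-lists matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def precompute(taps):
    """G R for the two phases of a 4-tap filter; done once per filter."""
    return [matvec(G, taps[p::2]) for p in range(2)]


def f242_two_channels(x, gr_low, gr_high):
    """F(2, 4, 2) for a low-pass and a high-pass filter on six samples.

    B^T N is computed once per phase and shared by both channels.
    """
    bn = [matvec(BT, x[p::2]) for p in range(2)]       # shared input transform

    def channel(gr):
        parts = [matvec(AT, [g * b for g, b in zip(gr[p], bn[p])])
                 for p in range(2)]                    # 2 x 3 multiplications
        return [a + b for a, b in zip(*parts)]

    return channel(gr_low), channel(gr_high)


x = [1, 2, 3, 4, 5, 6]
z_low, z_high = f242_two_channels(x, precompute([1, 2, 3, 4]),
                                  precompute([1, -1, 1, -1]))
print(z_low)    # [30, 50]
print(z_high)   # [-2, -2]
```

In this sketch each channel needs six general multiplications for two outputs, compared with eight for the direct computation, while the input transform is paid for only once.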

D. Computationally Efficient Data Representation

All data in digital devices are stored with limited accuracy. Thus, it is necessary to quantize the coefficients of wavelet filters. For experiments, we use Daubechies wavelets db2 and db3. During wavelet processing of 8-bit images, the coefficients of the filters used are represented with an accuracy selected in accordance with the formula [17]:

f = 10 + ⌊ \sqrt{\frac{r}{4}} ⌋,

(4)

where $f$ is the bit width of the quantized wavelet filter coefficients without taking into account the sign bit and $⌊ \cdot ⌋$ is the rounding down operator. The original filter coefficients $F$ are scaled by $f$ bits and rounded up:

F^{*} = ⌈ F \cdot 2^{f} ⌉,

(5)

where $F^{*}$ is a quantized filter and $⌈ \cdot ⌉$ is the rounding up operator.

For example, the initial coefficients of the high-frequency wavelet filter db2 are:

HD = (\begin{matrix} \frac{- 1 - \sqrt{3}}{4 \sqrt{2}} & \frac{3 + \sqrt{3}}{4 \sqrt{2}} & \frac{- 3 + \sqrt{3}}{4 \sqrt{2}} & \frac{1 - \sqrt{3}}{4 \sqrt{2}} \end{matrix}) .

According to Formula (4), for the considered wavelet filter, $f = 11$. The coefficients of the wavelet filter are quantized using Formula (5):

HD^{*} = (\begin{matrix} - 699 & 1212 & - 324 & - 187 \end{matrix}) .

The resulting values are scaled and rounded down after WM wavelet filtering to compensate for the rounding error. Thus, the limited accuracy of data representation in the device memory will not have a significant impact on the quality of wavelet image processing.
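The quantization step can be reproduced with a short sketch of Formulas (4) and (5). The function names are hypothetical. One observation from the arithmetic, not a statement by the authors: the quantized values quoted in the text are obtained when the coefficient denominators are taken as 8, which is treated below as an assumed alternative normalization of the printed $4 \sqrt{2}$.

```python
import math


def coefficient_bits(r):
    """Formula (4): f = 10 + floor(sqrt(r / 4))."""
    return 10 + math.floor(math.sqrt(r / 4))


def quantize(coeffs, f):
    """Formula (5): scale each coefficient by 2**f and round up."""
    return [math.ceil(c * 2 ** f) for c in coeffs]


s3 = math.sqrt(3)
numerators = [-1 - s3, 3 + s3, -3 + s3, 1 - s3]

f = coefficient_bits(4)                        # db2 has four taps
hd_as_printed = [a / (4 * math.sqrt(2)) for a in numerators]
hd_div_8 = [a / 8 for a in numerators]         # assumed normalization

print(f)                                       # 11
print(quantize(hd_div_8, f))                   # [-699, 1212, -324, -187]
print(quantize(hd_as_printed, f))              # differs from the quoted values
```

Since `math.ceil` rounds toward positive infinity, negative coefficients shrink in magnitude, which matches the signs and values quoted above for the assumed normalization.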

Below, we present the results of implementing both considered approaches to wavelet image processing on FPGAs and an analysis of the results obtained.

3. High-Speed Implementation of the Discrete Wavelet Transform Using the Direct Method and the Winograd Method

Hardware modeling of the discrete wavelet transform using the DM and the WM with decimation was carried out in Verilog in the Xilinx Vivado 2018.2 environment for a Virtex-7 family FPGA (target device “xc7vx485tffg1157-1”) with the synthesis parameters “Vivado Synthesis Defaults” and implementation parameters “Vivado Implementation Defaults”, without using DSP blocks. Wallace tree structures were used to implement addition and multiplication, with further addition performed using carry-save adders [18] and Kogge–Stone adders [19]. These adders perform calculations in the least amount of time [20]. The 8-bit pixels of the original image were used as input. Four-tap and six-tap Daubechies wavelets were selected, the filter coefficients of which are quantized to 11 bits. The output data are 8-bit pixels of the processed image. Since the WM produces several processed pixel values in one iteration, the methods were compared using the average resource costs per processed pixel. The simulation results are presented in Table 1. The graphs in Figures 3–6 show the hardware, time, and power consumption costs separately, as well as the area-delay product (ADP).

4. Discussion

The following conclusions can be drawn according to the results presented in Table 1 and Figure 3, Figure 4, Figure 5 and Figure 6:

The hardware costs of WM wavelet image processing compared to the DM increase by 12% (|692.5 − 617|/617 × 100% ≈ 12%) to 95% (|1205.2 − 617|/617 × 100% ≈ 95%), where |·| denotes the absolute value, and by 1.5% (|916.5 − 903|/903 × 100% ≈ 1.5%) to 30% (|1175.5 − 903|/903 × 100% ≈ 30%) using four-tap and six-tap wavelets, respectively.
The computational delay of WM wavelet image processing compared to the DM is reduced by 34% (|9.732 − 14.815|/14.815 × 100% ≈ 34%) to 63% (|5.415 − 14.815|/14.815 × 100% ≈ 63%) and by 39% (|10.222 − 16.730|/16.730 × 100% ≈ 39%) to 66% (|5.706 − 16.730|/16.730 × 100% ≈ 66%) using four-tap and six-tap wavelets, respectively.
The power consumption of WM wavelet image processing compared to the DM increases by 35% (|62.25 − 46.26|/46.26 × 100% ≈ 35%) to 344% (|205.62 − 46.26|/46.26 × 100% ≈ 344%) and by 2% (|80.19 − 78.77|/78.77 × 100% ≈ 2%) to 125% (|176.98 − 78.77|/78.77 × 100% ≈ 125%) using four-tap and six-tap wavelets, respectively.
The best efficiency in terms of the ADP is observed for the F(3,4,2) and F(5,6,2) WMs using four-tap and six-tap wavelets, respectively.
In general, the WM significantly speeds up calculations at the cost of a moderate increase in hardware costs and power consumption.
With a further increase in the size of the processed image fragments using the WM, the gain in the speed of wavelet image processing will be insignificant compared to the increases in hardware costs and power consumption. This will also significantly increase the calculation error and degrade the quality of image processing.
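The percentages and ADP rankings above follow directly from the Table 1 entries. A minimal sketch of that arithmetic is given here; the dictionary layout and function names are illustrative assumptions, and the numbers are copied from Table 1.

```python
# Per-pixel (area in LUTs, delay in ns, power) values copied from Table 1.
TABLE_1 = {
    4: {"Direct": (617.0, 14.815, 46.26),
        "F(2,4,2)": (692.5, 9.732, 62.25),
        "F(3,4,2)": (869.0, 7.263, 86.61),
        "F(4,4,2)": (1053.3, 6.908, 148.74),
        "F(5,4,2)": (1205.2, 5.415, 205.62)},
    6: {"Direct": (903.0, 16.730, 78.77),
        "F(2,6,2)": (916.5, 10.222, 80.19),
        "F(3,6,2)": (1062.0, 7.786, 126.07),
        "F(4,6,2)": (1175.5, 6.747, 179.33),
        "F(5,6,2)": (1125.0, 5.706, 176.98)},
}


def percent_change(value, baseline):
    """Relative change |value - baseline| / baseline in percent."""
    return abs(value - baseline) / baseline * 100


def best_adp(taps):
    """Method with the smallest area-delay product for a given filter length."""
    rows = TABLE_1[taps]
    return min(rows, key=lambda m: rows[m][0] * rows[m][1])


def max_delay_reduction(taps):
    """Largest delay reduction of any WM variant relative to the DM."""
    rows = TABLE_1[taps]
    base = rows["Direct"][1]
    return max(percent_change(v[1], base)
               for m, v in rows.items() if m != "Direct")


print(best_adp(4), best_adp(6))                # F(3,4,2) F(5,6,2)
print(round(max_delay_reduction(6)))           # 66
```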

Table 2 presents a comparison of wavelet transform methods. A standard wavelet transform method was presented in [21]; hardware simulations of this method were carried out and compared with the proposed method. The proposed method showed the best results in terms of the delay and overall device performance. Paper [22] describes a wavelet transform method called the lifting scheme, which consists of a sequence of three steps: split, predict, and update. The lifting scheme reduces the hardware costs of the device but increases the computation time, and it is applicable only to a limited set of wavelets. In [23], a fast continuous wavelet transform was implemented; it requires large time and hardware costs. The method developed in this paper is free of these disadvantages and is a promising digital signal processing tool for practical use in problems of noise removal, compression, and fusion of signals.

5. Conclusions

In this paper, calculations were carried out and Formulas (2) and (3) were derived, which implement wavelet filtering using the WM with decimation. The scope of application of the WM has been expanded to the case of downsampling a signal with an arbitrary stride during processing. The main result from hardware simulations of the developed approach on an FPGA is a reduction in the computational delay by up to 66%, with increases in hardware costs and power consumption of 95% and 344%, respectively, compared to the DM depending on the selected method parameters. The best efficiency in terms of the ADP was found for the $F (3, 4, 2)$ and $F (5, 6, 2)$ WMs using four-tap and six-tap wavelets, respectively. The developed approach can be used in image processing systems to improve the performance of modern microelectronic devices for image denoising and compression, as well as pattern recognition. A promising direction for further research is the implementation of the developed approach on ASIC and the use of modular computing for more efficient parallelization of calculations and an even greater increase in the device speed.

Author Contributions

Conceptualization, N.N. and P.L.; methodology, N.S.; software, M.B.; formal analysis, N.S.; investigation, P.L.; resources, M.B.; data curation, N.N.; writing—original draft preparation, A.A.; writing—review and editing, A.A.; visualization, M.B.; supervision, N.N.; project administration, P.L.; funding acquisition, N.N. All authors have read and agreed to the published version of the manuscript.

Funding

The research in Section 2 was supported by the Russian Science Foundation (project no. 22-71-00009). The research in the remaining sections was supported by the Council for grants of President of Russian Federation (project no. MK-371.2022.4).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors thank the North-Caucasus Center for Mathematical Research for providing the material and technical base.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wu, Y.; Gao, G.; Cui, C. Improved Wavelet Denoising by Non-Convex Sparse Regularization under Double Wavelet Domains. IEEE Access 2019, 7, 30659–30671.
2. Qin, Q.; Dou, J.; Tu, Z. Deep ResNet Based Remote Sensing Image Super-Resolution Reconstruction in Discrete Wavelet Domain. Pattern Recognit. Image Anal. 2020, 30, 541–550.
3. Soulard, R.; Carré, P. Elliptical Monogenic Wavelets for the Analysis and Processing of Color Images. IEEE Trans. Signal Process. 2016, 64, 1535–1549.
4. Chen, Y.; Li, D.; Zhang, Q.J. Complementary Color Wavelet: A Novel Tool for the Color Image/Video Analysis and Processing. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 12–27.
5. Rossinelli, D.; Fourestey, G.; Schmidt, F.; Busse, B.; Kurtcuoglu, V. High-Throughput Lossy-To-Lossless 3D Image Compression. IEEE Trans. Med. Imaging 2021, 40, 607–620.
6. Alcaín, E.; Fernández, P.R.; Nieto, R.; Montemayor, A.S.; Vilas, J.; Galiana-Bordera, A.; Martinez-Girones, P.M.; Prieto-de-la-Lastra, C.; Rodriguez-Vila, B.; Bonet, M.; et al. Hardware Architectures for Real-Time Medical Imaging. Electronics 2021, 10, 3118.
7. Escande, P.; Weiss, P. Fast wavelet decomposition of linear operators through product-convolution expansions. IMA J. Numer. Anal. 2022, 42, 569–596.
8. Ouyang, W.; Zhao, T.; Cham, W.K.; Wei, L. Fast Full-Search-Equivalent Pattern Matching Using Asymmetric Haar Wavelet Packets. IEEE Trans. Circuits Syst. Video Technol. 2018, 28, 819–833.
9. Tausif, M.; Khan, E.; Hasan, M.; Reisslein, M. SMFrWF: Segmented modified fractional wavelet filter: Fast low-memory discrete wavelet transform (DWT). IEEE Access 2019, 7, 84448–84467.
10. Mittal, S.; Vibhu. A survey of accelerator architectures for 3D convolution neural networks. J. Syst. Arch. 2021, 115, 102041.
11. Lavin, A.; Gray, S. Fast Algorithms for Convolutional Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4013–4021.
12. Mehrabian, A.; Miscuglio, M.; Alkabani, Y.; Sorger, V.J.; El-Ghazawi, T. A Winograd-Based Integrated Photonics Accelerator for Convolutional Neural Networks. IEEE J. Sel. Top. Quantum Electron. 2020, 26, 610031.
13. Shen, J.; Huang, Y.; Wen, M.; Zhang, C. Toward an Efficient Deep Pipelined Template-Based Architecture for Accelerating the Entire 2-D and 3-D CNNs on FPGA. IEEE Trans. Comput. Des. Integr. Circuits Syst. 2020, 7, 1442–1455.
14. Valueva, M.; Lyakhov, P.; Valuev, G.; Nagornov, N. Digital Filter Architecture With Calculations in the Residue Number System by DM F(2 × 2, 2 × 2). IEEE Access 2021, 9, 143331–143340.
15. Winograd, S. Arithmetic Complexity of Computations; SIAM: Philadelphia, PA, USA, 1980; Volume 33.
16. Lyakhov, P.; Abdulsalyamova, A.; Semyonova, N.; Nagornov, N. On the Computational Complexity of 2D Filtering by Winograd method. In Proceedings of the 2022 11th Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro, 7–10 June 2022; pp. 1–4.
17. Chervyakov, N.; Lyakhov, P.; Kaplun, D.; Butusov, D.; Nagornov, N. Analysis of the Quantization Noise in Discrete Wavelet Transform Filters for Image Processing. Electronics 2018, 7, 135.
18. Parhami, B. Computer Arithmetic: Algorithms and Hardware Designs; Oxford University Press: Oxford, UK, 2010.
19. Kogge, P.M.; Stone, H.S. A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations. IEEE Trans. Comput. 1973, C-22, 786–793.
20. Zimmerman, R. Binary Adder Architectures for Cell-Based VLSI and Their Synthesis; Konstanz Hartung-Gorre: Konstanz, Germany, 1998.
21. Daubechies, I. Ten Lectures on Wavelets; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1992.
22. Sweldens, W. Lifting scheme: A new philosophy in biorthogonal wavelet constructions. In Wavelet Applications in Signal and Image Processing III; SPIE: Bellingham, WA, USA, 1995; pp. 68–79.
23. Arts, L.; Broek, E.L. The fast continuous wavelet transformation (fCWT) for real-time, high-quality, noise-resistant time–frequency analysis. Nat. Comput. Sci. 2022, 2, 47–58.

Figure 1. Wavelet filtering scheme with decimation of a fragment of the original image using the direct method.

Figure 2. Wavelet filtering scheme with decimation of a fragment of the original image using the Winograd method.

Figure 3. Hardware costs for wavelet image processing using the direct method (one pixel) and the Winograd method (two–five pixels) with averaged values for each pixel.

Figure 4. Delay results for wavelet image processing using the direct method (one pixel) and the Winograd method (two–five pixels) with averaged values for each pixel.

Figure 5. Results of power consumption during wavelet image processing using the direct method (one pixel) and the Winograd method (two–five pixels) with averaged values for each pixel.

Figure 6. Area-delay product with wavelet image processing using the direct method (one pixel) and the Winograd method (two–five pixels) with averaged values for each pixel.

Table 1. Results of modeling a wavelet image processing device using the direct method and the Winograd method with averaged values for each pixel.

Tap	Method	Processed Image Fragment Size	Area, LUTs	Delay, ns	Power, W	Area-Delay Product
4	Direct	1	617.0	14.815	46.26	9140.86
	$F (2, 4, 2)$	2	692.5	9.732	62.25	6739.07
	$F (3, 4, 2)$	3	869.0	7.263	86.61	6311.55
	$F (4, 4, 2)$	4	1053.3	6.908	148.74	7275.85
	$F (5, 4, 2)$	5	1205.2	5.415	205.62	6525.92
6	Direct	1	903.0	16.730	78.77	15,107.19
	$F (2, 6, 2)$	2	916.5	10.222	80.19	9368.01
	$F (3, 6, 2)$	3	1062.0	7.786	126.07	8268.73
	$F (4, 6, 2)$	4	1175.5	6.747	179.33	7931.10
	$F (5, 6, 2)$	5	1125.0	5.706	176.98	6419.03

Table 2. Comparison of wavelet transform methods.

Method	Year	Key Idea
Daubechies [21]	1992	Convolution-based direct method
Sweldens [22]	1995	Lifting scheme
Arts et al. [23]	2022	Algorithm with scale-independent and scale-dependent operation separation
Proposed	2023	Matrix computations using the Winograd method

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
