Article

Learning the Frequency Domain Aliasing for Real-World Super-Resolution

College of Optical Science and Engineering, Zhejiang University, Hangzhou 310027, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(2), 250; https://doi.org/10.3390/electronics13020250
Submission received: 22 October 2023 / Revised: 23 December 2023 / Accepted: 2 January 2024 / Published: 5 January 2024
(This article belongs to the Special Issue Machine Learning Techniques for Image Processing)

Abstract

Most real-world super-resolution methods require synthetic image pairs for training. However, the frequency domain gap between synthetic images and real-world images leads to artifacts and blurred reconstructions. This work points out that the main reason for the frequency domain gap is that aliasing exists in real-world images, but the degradation model used to generate synthetic images ignores the impact of aliasing on images. Therefore, a method is proposed in this work to assess aliasing in images undergoing unknown degradation by measuring the distance to their alias-free counterparts. Leveraging this assessment, a domain-translation framework is introduced to learn degradation from high-resolution to low-resolution images. The proposed framework employs a frequency-domain branch and loss function to generate synthetic images with aliasing features. Experiments validate that the proposed domain-translation framework enhances the visual quality and quantitative results compared to existing super-resolution models across diverse real-world image benchmarks. In summary, this work offers a practical solution to the real-world super-resolution problem by minimizing the frequency domain gap between synthetic and real-world images.

1. Introduction

Single-image super-resolution (SR) aims to reconstruct high-resolution images from their low-resolution counterparts. Learning-based methods [1,2,3,4,5,6,7,8,9,10], such as SRCNN [1], VDSR [2], LapSRN [3], RCAN [4], SRGAN [5], and ESRGAN [6], have achieved impressive results. Typically, these methods require pairs of high-resolution (HR) and low-resolution (LR) images for training. However, acquiring such HR–LR image pairs in real-world scenarios proves challenging. Even when pairs are obtained through methods like optical zoom [11,12,13,14,15,16], they still suffer from differences in depth of field, illumination, and perspective [17]. These disparities impact the performance of end-to-end learning models and result in blurry SR outcomes [17]. Moreover, such datasets cover only a limited range of degradations, making it difficult to apply the trained models to scenarios with different degradation distributions. As a result, their adoption remains limited, primarily due to misalignment and a lack of adaptability.
Considering that many widely used HR datasets are available [18,19,20,21,22], a practical approach for real-world SR is to create LR images from HR images using a degradation model, forming HR–LR image pairs that serve as the training dataset. Some methods [1,2,3,4,5,6] train SR models with HR images and LR images generated by blurring and downsampling. When real-world images are taken as input, the SR results are disturbed by severe artifacts and noise, owing to the mismatch between the degradation model and the real-world degradation process. Some recent methods extend the classical degradation model through random strategies or by recombining steps such as blur, downsampling, and noise [23,24,25,26]. These methods greatly expand the range of degradation distributions covered and can deal with more complex degradation scenarios. However, the improved generalization comes at a cost: these methods lack detail-recovery capability and tend to produce over-smoothed results. Furthermore, these predefined, constrained degradation steps often prove inadequate for covering the complexity of degradation in real-world images.
Since the degradation processes in the real world are complex and diverse, domain-translation-based methods are proposed to adaptively learn degradation models for images with different degradation distributions [27]. The reference HR image is considered to be from the source domain, while the real-world LR image is considered to be from the target domain. The degradation model is considered to be the domain-translation process from the source domain to the target domain. Since the image contents in the reference HR and real-world LR are not directly consistent, the translation process from the source domain to the target domain is learned under an adversarial training framework [28] to generate synthetic LR images that conform to the degradation distribution of the target domain. Then, image pairs consisting of HR and synthetic LR images can be used to train the SR model in a supervised learning manner. KernelGAN [29] first generates LR images from HR using a deep linear generator. Bulat et al. [30] used a generator with an encoder–decoder structure for the domain-translation task. The CycleGAN [31] architecture for image-to-image translation is also used to learn domain translation from HR to LR images, as well as LR to HR images [32,33,34]. In this framework, the degradation process and the SR model are jointly trained, and cycle consistency loss ensures that the content of the image does not change during the degradation and SR processes. To facilitate integration with more advanced SR methods, more methods train the degradation and SR processes separately [17,30,35,36,37,38,39,40]. Zhou et al. [37] proposed a color-guided domain-mapping network to alleviate color shift in domain-translation processes. Luo et al. [38] proposed a probabilistic degradation model, which studies the degradation process by modeling it as a random variable and covers more diverse degradation distributions. Son et al. [40] proposed an adaptive data loss to allow the downsampler to adaptively learn the degradation process of real-world images. These learning-based methods can learn the degradation process of real-world images to generate LR images with smaller gaps from the real-world domain, thereby training SR models that can restore better image reconstruction quality.
However, these methods are all based on classical degradation models and focus on establishing accurate spatial blur kernels and noise, without paying attention to the frequency-domain gap between synthetic LR images and real-world LR images. The frequency distributions of synthetic LR images and real-world LR images show obvious differences [27]. Ji et al. [39] focused on the frequency-domain features of synthetic images and used a frequency-consistent adaptation to construct images consistent with real-world images. However, when the image contents are inconsistent, it is inappropriate for the frequency-consistent adaptation to directly compare their frequency spectral densities. Furthermore, this frequency-consistent adaptation does not identify the cause of the frequency-domain gap between synthetic and real-world images, and it compares densities indiscriminately at all frequencies.
This work points out that the main reason for the frequency-domain gap between synthetic LR images and real-world LR images is aliasing, which is widely present in real-world images [41,42,43]. However, existing degradation models [17,25,26,29,30,35,36,37] ignore the impact of aliasing on images [39,44]. Aliasing is an inherent property of sampling equipment [45,46,47]: it occurs when the spatial frequency of the scene surpasses the Nyquist limit defined by the imaging device’s resolution [48,49]. Therefore, aliasing widely exists in images captured by acquisition devices such as cameras and mobile phones in real scenes [41,42,43]. Aliasing reduction [50,51,52] has long been a classical problem in signal processing and image processing. However, in the field of real-world image super-resolution, few works [39] focus on the impact of aliasing on images.
Therefore, this paper proposes a method to evaluate aliasing in LR images that have undergone unknown degradation. The method generates an alias-free copy of the LR image under evaluation and then computes the $L_1$ distance between this copy and the LR image; the greater the distance, the more severe the aliasing. Compared with the previously discussed frequency-consistent adaptation, this approach provides a more intuitive and accurate means of quantifying the degree of aliasing, and the consistency of image contents between the copy and the LR image improves the precision of the results.
On the basis of this measurement method, this work proposes a domain-translation-based method that generates frequency features similar to real-world images. On the one hand, a frequency-domain branch extracts the aliasing features in the degradation process. On the other hand, a loss function is proposed to guide the generation of synthetic images with aliasing features. Both the domain-translation generator and the frequency-domain loss function reduce the frequency-domain gap between synthetic LR images and real-world LR images, so the proposed method can help the SR model reconstruct more realistic image details.
The main contributions of this work are as follows:
  • This work points out the frequency-domain gap between synthetic and real-world images. A method is proposed to measure the degree of frequency-domain aliasing in images that undergo unknown degradation;
  • A domain-translation framework is proposed to generate frequency-domain features that are similar to real-world images, including a branch to extract aliasing features and a loss function related to the degree of aliasing;
  • The proposed domain-translation framework is proven to help the SR model achieve better reconstruction quality on real-world images.

2. Related Work

2.1. Image Pair-Based Methods

The challenge in real-world SR primarily stems from the shortage of real-world HR–LR image pairs in most scenarios [27]. To address this issue, various approaches have been proposed [11,12,13,14,15,16]. These methods involve capturing images of the same scene at different resolutions, effectively creating datasets containing HR and LR images derived directly from real-world sources. Some methods [11,12,13,14] involve using zoom lenses to acquire HR and LR images with long and short focal lengths, respectively. SR models trained with these datasets have the capacity to learn the degradation distribution present in real-world scenarios, resulting in improved performance on LR images characterized by the same degradation distribution. Nonetheless, challenges persist due to disparities in depth of field, lighting conditions, and perspective, leading to inevitable misalignment between HR and LR images [17]. These alignment discrepancies significantly impact the efficacy of SR models. Moreover, existing datasets often lack diversity in terms of scenes and degradation distributions, making it challenging to generalize to real-world images that exhibit varying degradation distributions.
To address complex and diverse real-world SR challenges, degradation models that are consistent with real-world degradation processes and have generalization capabilities are necessary. Alternatively, domain-translation methods for unpaired HR and LR images offer a viable solution.

2.2. Degradation Modeling-Based Methods

In many cases, only reference HR images and real-world LR images are available due to the absence of HR–LR image pairs. There is no direct content correspondence between reference HR and real-world LR images. To reconstruct SR images using reconstruction-based SR methods, it is necessary to determine the parameter set for the degradation model between the HR and LR images. Efrat et al. [53] proposed that an accurate degradation model is more important than using advanced image priors. Classical degradation models typically involve steps like blurring, downsampling, and noise. Most current research efforts [54,55,56,57,58,59,60,61] concentrate on the precise recovery of blur kernels. Several approaches [57,58,59,60,61] optimize degradation parameters and intermediate super-resolved images through iterative prediction and correction techniques. However, when compared to the vast and diverse degradation space in the real world, these methods still exhibit limitations in terms of diversity and face the risk of failure in realistic scenarios.
For learning-based SR methods, it is crucial to determine the degradation model that transforms reference HR images into their corresponding LR counterparts in order to train the SR model. One intuitive approach [62] is to synthesize LR images with multiple degradation distributions to train the SR model, aiming to enhance its generalization performance. Several methods extend the classical degradation model by introducing new degradation frameworks [23,24,25,26]. For instance, Zhou et al. [23] employed a Generative Adversarial Network (GAN) [28] to construct a blur kernel pool and used these kernels in conjunction with HR images to generate LR images. Ji et al. [24] proposed RealSR, a degradation framework that employs blind kernel estimation to construct blur kernel and noise pools; the degradation process randomly samples blur kernels and noise from these pools to degrade HR images. Zhang et al. [25] used a random shuffling strategy to extend the classical degradation model and proposed BSRGAN, a practical degradation model. Wang et al. [26] proposed Real-ESRGAN, a high-order degradation model that simulates ringing artifacts through a sinc filter.
Nevertheless, in their pursuit of improved generalization, these methods often produce over-smoothed results and require extensive datasets and training resources.

2.3. Domain-Translation-Based Methods

Consider the reference HR image and the real-world LR image as residing in different domains: HR images in the source domain and LR images in the target domain. In this context, both the degradation process and the SR process can be treated as domain-translation problems. One approach is to leverage the well-established CycleGAN architecture [31], typically used in style transfer. This strategy [32,33,34] involves the joint training of the degradation and SR stages: cycle-consistency loss enforces content correspondence within the cycle, while style consistency between domains is assessed through adversarial loss. These methods can adaptively learn the degradation and SR processes for images with varying distributions and can recover image details more accurately than degradation models that rely on broad generalization. However, they exhibit limited flexibility, making it challenging to integrate them with advanced SR methods. Some methods [17,35] leverage the cycle structure solely within the degradation process and train the degradation and SR processes separately. However, unlike the style transfer problem, which usually assumes a one-to-one correspondence between elements in the two domains, SR is an ill-posed problem: a single LR image corresponds to countless HR images. The deterministic degradation relationship provided by the cycle structure struggles to account for the random variables in real-world degradation.
Another approach [29,30,36,37,38] is to directly employ GANs [28] for unsupervised learning of the domain translation within the degradation process, resulting in the generation of HR–LR image pairs. The subsequent SR process is then carried out by deep convolutional neural network (CNN)-based SR methods through supervised learning. In contrast to the explicit modeling of degradation found in degradation modeling-based methods, this strategy constitutes implicit modeling of degradation. Fritsche et al. [36] proposed DSGAN to learn the domain translation from HR images to unpaired LR images in an unsupervised manner, training two models for different datasets, SDSR [36] and TDSR [36]. Zhou et al. [37] proposed a color-guided domain-mapping network, named CARB [37], to alleviate color shift during domain translation. Luo et al. [38] proposed a probabilistic degradation model (PDM), which treats the degradation process as a random variable and models the mapping from a prior random variable to the degradation process. Son et al. [40] simulated the distribution of LR images in the target domain by generalizing the low-frequency loss and designed an adaptive data loss (ADL). These methods can cover a wider range of unknown degradation distributions and are more robust when dealing with degradations that lack definition or are affected by random factors. Compared with methods based on the CycleGAN [31] framework, this strategy of separately modeling degradation can be easily integrated with advanced SR methods. Without a cycle-consistency constraint, these methods rely on content consistency measured by the distance between the low-frequency components of HR images and their corresponding LR images, while style consistency is still enforced by adversarial loss.
However, a domain gap exists in the frequency domain between synthetic LR images and real-world LR images, which is overlooked by existing methods [39]. Real-world images exhibit frequency-domain aliasing, while synthetic images tend to lack aliasing features due to low-pass filtering [40]. When applied to real-world images, SR models trained on such synthetic images often mistake aliasing artifacts for image content and amplify them [44].

3. Method

3.1. Classical Degradation Model

Generally, an LR image $I_{LR}$ can be obtained from an HR image $I_{HR}$ by the classical degradation model [63,64]:
$$I_{LR} = (I_{HR} \otimes k)\downarrow_s + n \tag{1}$$
where $k$ denotes a degradation kernel, $\otimes$ is a spatial convolution, $\downarrow_s$ indicates downsampling and decimation by a scale factor $s$, and $n$ is a noise term. The order of these steps is not fixed; the classical degradation model can also be expressed as
$$I_{LR} = (I_{HR}\downarrow_s) \otimes k + n \tag{2}$$
The classical degradation model is widely used to synthesize HR–LR image pairs by degradation modeling-based methods. Furthermore, the classical degradation model is also combined with the probabilistic model in domain-translation methods [38].
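As a concrete illustration, the following minimal NumPy/SciPy sketch synthesizes an LR image following Equation (1), assuming a Gaussian blur kernel, pixel decimation, and additive white Gaussian noise; the kernel width and noise level are illustrative choices, not values from this work.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def classical_degradation(hr, scale=2, blur_sigma=1.2, noise_sigma=0.01):
    """Synthesize an LR image from an HR image via Eq. (1):
    blur (convolution with k), downsample by s, then add noise n."""
    # I_HR (x) k : Gaussian low-pass filtering applied per channel
    blurred = gaussian_filter(hr, sigma=(blur_sigma, blur_sigma, 0))
    # down-arrow s : decimation by the scale factor
    lr = blurred[::scale, ::scale, :]
    # + n : additive white Gaussian noise
    lr = lr + np.random.normal(0.0, noise_sigma, lr.shape)
    return np.clip(lr, 0.0, 1.0)

# hr: float32 array in [0, 1] with shape (H, W, 3)
```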

3.2. Frequency-Domain Aliasing

There is a frequency-domain gap between real-world LR images and LR images synthesized by Equation (1) or Equation (2). The image downsampling process $\downarrow_s$ can be understood as sampling and decimation, which causes aliasing. Figure 1 shows the impact of several degradation processes on signals in the frequency domain: Figure 1a represents direct downsampling, while Figure 1b and Figure 1c represent the degradation processes of Equations (1) and (2), respectively. The influence of noise is not considered. Here, $\omega_M$ is the highest inherent frequency of the image $I_{HR}$, $\omega_S$ is the sampling frequency, and $\omega_K$ is the cutoff frequency of the low-pass filter $k$. Compared with direct downsampling, the degradation processes represented by Equations (1) and (2) limit the frequency content of $I_{LR}$ to low-pass windows, reducing aliasing. Assume that $I_{HR}$ generates $I_{LR}$ through an unknown degradation process $D_{\times s}(\cdot)$ with scale factor $s$ between $I_{HR}$ and $I_{LR}$, which can be expressed as
$$I_{LR} = D_{\times s}(I_{HR}) \tag{3}$$
Figure 2 shows three sets of image samples and the frequency-domain images of $I_{HR}$ and $I_{LR}$ after the above-mentioned degradations. Direct downsampling is denoted as $D_{B\times s}(\cdot)$, and the degradation processes in Equations (1) and (2) are denoted as $D_{1\times s}(\cdot)$ and $D_{2\times s}(\cdot)$, respectively (with $k$ a Gaussian low-pass filter). The samples come from the real-world dataset City100 [11], so an LR image that has experienced the real-world degradation process $D_{R\times s}(\cdot)$ can be used as a reference. The first row of each group is the spatial-domain image, and the second row is the corresponding frequency-domain image. The samples are consistent with the previous analysis: compared with $D_{B\times s}(\cdot)$, $D_{1\times s}(\cdot)$ and $D_{2\times s}(\cdot)$ confine the frequency-domain information to low-pass windows. Real-world LR images $D_{R\times s}(I_{HR})$ exhibit frequency-domain characteristics that are closer to $D_{B\times s}(I_{HR})$ than to $D_{1\times s}(I_{HR})$ or $D_{2\times s}(I_{HR})$. In the spatial domain, $D_{R\times s}(I_{HR})$ is not as smooth as $D_{1\times s}(I_{HR})$ and $D_{2\times s}(I_{HR})$, but has jagged edges and background stripes similar to $D_{B\times s}(I_{HR})$. These phenomena indicate that the classical degradation model cannot simulate the aliasing present in real-world images, and that step-by-step degradation models do not accurately represent complex real-world degradation processes.
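To inspect this gap numerically, the short sketch below (an illustration, not the authors' code) compares the centered log-magnitude spectra of a directly decimated image and of a blurred-then-decimated image; the former retains folded high-frequency energy (aliasing), while the latter is confined to a low-pass window.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def log_spectrum(img_gray):
    """Centered log-magnitude spectrum used to compare degradations."""
    spec = np.fft.fftshift(np.fft.fft2(img_gray))
    return np.log1p(np.abs(spec))

def compare_degradations(hr_gray, scale=2, blur_sigma=1.2):
    # D_B: direct decimation -> frequencies above the new Nyquist limit fold back (aliasing)
    d_b = hr_gray[::scale, ::scale]
    # D_1: low-pass filtering before decimation suppresses the folded components
    d_1 = gaussian_filter(hr_gray, blur_sigma)[::scale, ::scale]
    return log_spectrum(d_b), log_spectrum(d_1)
```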
In order to more precisely describe the aliasing features of LR images generated through different degradation processes, this work proposes a method to evaluate aliasing in LR images. Interpolate $I_{LR}$ in Equation (3) to the size of $I_{HR}$, expressed as
$$I_{LR,up} = \big(D_{\times s}(I_{HR})\big)\uparrow_s \tag{4}$$
where $\uparrow_s$ indicates upsampling by a scale factor $s$. In order to measure the degree of aliasing caused by the degradation process $D_{\times s}(\cdot)$ on $I_{HR}$, assume that there is an alias-free copy of $I_{LR}$, which is the solution to the following optimization problem:
$$\hat{I}_F = \mathop{\mathrm{argmin}}_{I_F} \left\| I_F - I_{LR,up} \right\| + \lambda H(I_F) \tag{5}$$
where the first term enforces that the contents of $I_F$ and $I_{LR,up}$ are consistent, $H(\cdot)$ is a constraint that ensures that there is no aliasing in $I_F$, and $\lambda$ is a regularization constant. As shown in the last row of Figure 1, $I_F$ can be obtained from $I_{HR}$ through a low-pass filter $k$ in the absence of aliasing. Therefore, $H(\cdot)$ can be defined as
$$H(I_F) = C(I_F, k) = \left\| I_{HR} \otimes k - I_F \right\| + \mu G(k) \tag{6}$$
where $C(\cdot)$ is the non-aliasing constraint on $I_F$ that introduces the variable $k$, $G(\cdot)$ is a prior constraint on the filter $k$ (usually a gradient-related regularization term), and $\mu$ is a regularization constant. Equation (5) is rewritten as
$$(\hat{I}_F, \hat{k}) = \mathop{\mathrm{argmin}}_{I_F, k} \left\| I_F - I_{LR,up} \right\| + \lambda \left\| I_{HR} \otimes k - I_F \right\| + \mu G(k) \tag{7}$$
When a signal is aliased, its high-frequency components additively spill into the low-frequency components [65]. Treating aliasing as additive noise [66,67,68] and denoting it as $\eta$ gives
$$I_{LR,up} = I_F + \eta \tag{8}$$
Then, $k$ can be obtained by solving
$$\hat{k} = \mathop{\mathrm{argmin}}_{k} \left\| I_{HR} \otimes k - I_{LR,up} + \eta \right\| + \mu G(k) \tag{9}$$
During the downsampling process, aliasing mostly occurs near areas containing high-frequency information, that is, areas where gradients change dramatically; $\eta$ has less impact on areas where gradients change gently. Divide the images $I_{LR,up}$ and $I_{HR}$ into $N$ sample blocks. The $i$-th samples are denoted as $I_{LR,up}^{i}$ and $I_{HR}^{i}$, and $x_i$ is the center pixel of this window. The metric measuring the usefulness of gradients in each block [69] is defined as
$$r(x_i) = \frac{\left\| \sum_{y \in I_{HR}^{i}} \nabla I_{HR}(y) \right\|}{\sum_{y \in I_{HR}^{i}} \left\| \nabla I_{HR}(y) \right\| + \beta} \tag{10}$$
where $\nabla$ is the operator of first-order spatial derivatives and $\beta$ is a constant coefficient. A small $r$ implies a flat region, in which many gradient components cancel each other out; a large $r$ implies strong image structures in the local region [69]. Next, define a weight $w$ that is negatively related to $r$:
$$w(x_i) = \exp\!\left(-\frac{r(x_i)}{\alpha}\right) \tag{11}$$
where $\alpha$ is a constant coefficient. Use $w$ as the weight to find the average blur kernel over the $N$ blocks. Since $\eta$ in a flat area is very small and has varying directions, it is averaged out by this process, so the weight $w$ of such a block is set large. The $\eta$ of blocks containing high-frequency areas interferes with the solution of the blur kernel as noise, so the weight $w$ of such blocks is set small. The average blur kernel can be approximated as
$$\bar{k} = \mathop{\mathrm{argmin}}_{k} \sum_{i=1}^{N} w(x_i) \left\| I_{HR}^{i} \otimes k - I_{LR,up}^{i} \right\| + \mu G(k) \tag{12}$$
In the case of $\lambda > 1$, $\hat{I}_F$ can be obtained from Equation (7) as
$$\hat{I}_F = I_{HR} \otimes \bar{k} \tag{13}$$
and $\eta$ can be obtained from Equation (8) as
$$\eta = I_{LR,up} - \hat{I}_F \tag{14}$$
This work proposes to use the $L_1$ distance $\|\eta\|_1$ to quantify the degree of aliasing of $I_{LR}$ obtained from $I_{HR}$ through the unknown degradation $D_{\times s}(\cdot)$. Figure 2 shows the visualization of the results $I_{LR}$ and $\eta$ produced by different degradation processes. There are three sets of samples. The first row of each set is $I_{HR}$ and the $I_{LR}$ generated by the corresponding degradation processes, the second row is the corresponding images in the frequency domain, and the third row is the visualized $\eta$ calculated by the proposed method, with the corresponding $L_1$ distance $\|\eta\|_1$ marked below. Both the visualization and the $L_1$ distance of $\eta$ show that the degree of aliasing caused by the bicubic downsampling degradation $D_{B\times s}(\cdot)$ is similar to that of the real-world degradation $D_{R\times s}(\cdot)$. In contrast, the commonly used degradation models $D_{1\times s}(\cdot)$ and $D_{2\times s}(\cdot)$ show obvious gaps with respect to the real-world degradation $D_{R\times s}(\cdot)$.
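A simplified sketch of this aliasing measure is given below. It assumes the average blur kernel $\bar{k}$ has already been estimated (Section 3.4 notes that, for a deterministic degradation, it can be precomputed once), implements the block weights of Equations (10) and (11) and the residual of Equation (14), and omits the weighted kernel estimation of Equation (12) itself; the function names, block size, and the constants $\alpha$ and $\beta$ are illustrative.

```python
import numpy as np
from scipy.ndimage import convolve, sobel, zoom

def block_weights(hr_gray, block=32, alpha=0.1, beta=1.0):
    """Gradient-usefulness weights of Eqs. (10)-(11): flat blocks receive
    large weights, strongly structured blocks receive small weights."""
    gx, gy = sobel(hr_gray, axis=1), sobel(hr_gray, axis=0)
    h, w = hr_gray.shape
    weights = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            bx, by = gx[i:i + block, j:j + block], gy[i:i + block, j:j + block]
            num = np.hypot(bx.sum(), by.sum())      # norm of the summed gradient vectors
            den = np.hypot(bx, by).sum() + beta     # sum of gradient magnitudes + beta
            weights.append(np.exp(-(num / den) / alpha))
    return np.asarray(weights)

def aliasing_degree(hr_gray, lr_gray, k_bar):
    """||eta||_1 of Eq. (14): distance between the upsampled LR image and its
    alias-free copy I_HR (x) k_bar, with k_bar assumed to be precomputed."""
    zoom_factors = (hr_gray.shape[0] / lr_gray.shape[0],
                    hr_gray.shape[1] / lr_gray.shape[1])
    lr_up = zoom(lr_gray, zoom_factors, order=3)        # I_LR,up via cubic-spline upsampling
    i_f_hat = convolve(hr_gray, k_bar, mode='reflect')  # alias-free copy of Eq. (13)
    eta = lr_up - i_f_hat
    return np.abs(eta).mean(), eta                      # mean absolute residual and eta itself
```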

3.3. Downsampling with Domain Translation

To reduce the frequency-domain gap between synthetic and real-world images, this work proposes a domain-translation-based downsampling framework to generate aliasing features through the frequency-domain branch. The framework is based on the probabilistic degradation model [38]. Since the step-by-step degradation process cannot simulate real-world aliasing, this paper proposes to separate the aliasing information from the reference image through the frequency-domain branch, and then compensate the image before downsampling to generate LR images with aliasing. As shown in Figure 3, the degradation process of the framework involves three steps:
$$\begin{aligned} I_F &= I_{HR} \otimes k \\ I_C &= \mathrm{netD}\big(\mathrm{netE1}(I_F) + \mathrm{netE2}(\eta)\big)\downarrow_s \\ I_{LR} &= I_C + n \end{aligned} \tag{15}$$
where $k$ is the blur kernel generated by the Kernel Generator, $n$ is the spatial-domain noise generated by the Noise Generator, netE1 and netE2 are two encoders, and netD is a decoder. $\eta$ is the aliasing feature produced by the frequency-domain branch, with which $I_F$ is compensated before downsampling.
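The following PyTorch sketch outlines a possible forward pass for the three steps in Equation (15). The encoder/decoder definitions, feature widths, and use of plain convolution stacks are assumptions made for illustration; the blur kernel $k$, aliasing feature $\eta$, and noise $n$ are produced elsewhere (Kernel Generator, frequency-domain branch, and Noise Generator) and passed in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.LeakyReLU(0.2, True))

class DegradationGenerator(nn.Module):
    """Three-step degradation of Eq. (15): blur, encode/compensate/decode with
    downsampling, then add noise. Layer sizes are illustrative, not the authors'."""
    def __init__(self, channels=3, feat=64, scale=2):
        super().__init__()
        self.scale = scale
        self.netE1 = conv_block(channels, feat)   # encodes the blurred image I_F
        self.netE2 = conv_block(channels, feat)   # encodes the aliasing feature eta
        self.netD = nn.Sequential(conv_block(feat, feat),
                                  nn.Conv2d(feat, channels, 3, padding=1))

    def forward(self, i_hr, kernel, eta, noise):
        # kernel: (1, 1, ks, ks) blur kernel; eta: (B, C, H, W); noise: LR-sized tensor
        # Step 1: I_F = I_HR (x) k (depthwise convolution with the generated kernel)
        pad = kernel.shape[-1] // 2
        k = kernel.expand(i_hr.shape[1], 1, -1, -1)
        i_f = F.conv2d(F.pad(i_hr, [pad] * 4, mode='reflect'), k, groups=i_hr.shape[1])
        # Step 2: I_C = (netD(netE1(I_F) + netE2(eta))) downsampled by s
        i_c = self.netD(self.netE1(i_f) + self.netE2(eta))
        i_c = F.interpolate(i_c, scale_factor=1 / self.scale,
                            mode='bicubic', align_corners=False)
        # Step 3: I_LR = I_C + n
        return i_c + noise
```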
The main function of the frequency-domain branch is to generate the aliasing feature $\eta$. Compared with the method of obtaining $\eta$ described in Section 3.2, a more intuitive method is to directly use a frequency-domain filter $K_H$ to separate the aliasing information from an aliased image:
$$\eta = \mathrm{IFFT}\big\{ K_H \cdot \mathrm{FFT}\big\{ (D_{B\times s}(I_{HR}))\uparrow_s \big\} \big\} \tag{16}$$
where the image that undergoes the degradation process $D_{B\times s}(\cdot)$ is used as a reference image to provide aliasing information.
To model the distribution of the frequency-domain filter $K_H$, a prior random variable $z_m$ is defined, and a generator netM is used to learn the mapping from $z_m$ to the frequency-domain filter:
$$K_H = \mathrm{netM}(z_m), \quad z_m \sim \mathcal{N}(0, 1) \tag{17}$$
If the height, width, and number of channels of $I_{HR}$ are $h$, $w$, and $c$, then
$$z_m \in \mathbb{R}^{f_m \times h \times w}, \quad K_H \in \mathbb{R}^{h \times w \times c} \tag{18}$$
where $f_m$ is the dimension of the normal distribution $z_m$. Finally, the aliasing information $\eta$ is integrated into $I_F$ for compensation before downsampling. In this way, compared with classical degradation processes, the $I_{LR}$ generated by the proposed framework contains aliasing information and has a smaller domain gap with respect to real-world images.
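A possible sketch of the frequency-domain branch (Equations (16)–(18)) is shown below. Mapping the prior code $z_m$ to the filter $K_H$ with 1×1 convolutions and bounding $K_H$ with a sigmoid are assumptions made for illustration; the paper does not specify the exact form of netM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrequencyBranch(nn.Module):
    """Sketch of the frequency-domain branch: netM maps a random code z_m to a
    frequency filter K_H, which extracts the aliasing feature eta from a
    directly downsampled reference (Eqs. (16)-(18))."""
    def __init__(self, f_m=128, channels=3):
        super().__init__()
        self.f_m = f_m
        # netM: per-pixel mapping from the f_m-dim code to one filter value per channel
        self.netM = nn.Sequential(
            nn.Conv2d(f_m, 64, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(64, channels, 1), nn.Sigmoid())

    def forward(self, i_hr, scale=2):
        b, c, h, w = i_hr.shape
        # z_m ~ N(0, 1) with shape (f_m, h, w) per image
        z_m = torch.randn(b, self.f_m, h, w, device=i_hr.device)
        k_h = self.netM(z_m)                                   # K_H in R^{h x w x c}
        # reference with aliasing: direct decimation followed by upsampling
        ref = F.interpolate(i_hr[:, :, ::scale, ::scale], size=(h, w),
                            mode='bicubic', align_corners=False)
        # eta = IFFT{ K_H * FFT{ (D_B(I_HR)) upsampled by s } }
        spec = torch.fft.fft2(ref)
        eta = torch.fft.ifft2(k_h * spec).real
        return eta
```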

3.4. Frequency-Domain Loss

In this work, a frequency-domain loss is proposed to guide the domain-translation network to generate frequency-domain features that are consistent with real-world images. In order to guide the frequency-domain branch to generate $\eta$ that conforms to the aliasing distribution, the feature $\eta_B$ corresponding to the $D_{B\times s}(\cdot)$ degradation is used as prior knowledge. The optimization of netM is formulated as
$$\mathop{\mathrm{argmin}}_{\mathrm{netM}} \Big\| \mathrm{IFFT}\big\{ \mathrm{netM}(z_m) \cdot \mathrm{FFT}\big\{ (D_{B\times s}(I_{HR}))\uparrow_s \big\} \big\} - \gamma\,\eta_B \Big\|_1 \tag{19}$$
where $\gamma$ is a constant coefficient. Equation (19) can be simply written as
$$\mathop{\mathrm{argmin}}_{\eta} \big\| \eta - \gamma\,\eta_B \big\|_1 \tag{20}$$
In order to ensure that the generated image $I_{LR}$ contains aliasing features, the aliasing degree $\|\eta_{LR}\|_1$ related to $I_{LR}$ is measured through the method proposed in Section 3.2, and $\eta_{B,down}$, a downsampled version of $\eta_B$, serves as a prior. In this way, the overall frequency-domain loss function is defined as
$$\mathcal{L}_{fre} = \big\| \eta - \gamma\,\eta_B \big\|_1 + \upsilon \, \Big| \, \big\| \eta_{B,down} \big\|_1 - \tau \, \big\| \eta_{LR} \big\|_1 \Big| \tag{21}$$
where $\upsilon$ and $\tau$ are weights for the regularizer terms. The frequency-domain loss consists of two parts: the first part uses $\eta_B$ to guide the generator to produce $\eta$ that conforms to the aliasing distribution, and the second part uses the downsampled version $\eta_{B,down}$ to constrain the final generated $I_{LR}$ to satisfy the expected aliasing degree.
Since the degradation process represented by $D_{B\times s}(\cdot)$ is deterministic, the kernel $\bar{k}_B$ of the process $D_{B\times s}(\cdot)$ is independent of the image and is only related to the downsampling factor $s$. Therefore, $\bar{k}_B$ can be replaced with a fixed blur kernel computed in advance from a large number of samples, which avoids recalculating $\bar{k}_B$ every time the loss is evaluated.
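The sketch below assembles the frequency-domain loss following the reconstruction of Equation (21) above: an $L_1$ term that pulls $\eta$ toward the prior $\gamma\eta_B$, plus a term that matches the aliasing degree of the synthetic $I_{LR}$ to that of the downsampled prior. The inputs $\eta_B$, $\eta_{LR}$, and $\eta_{B,down}$ are assumed to be computed with the measure of Section 3.2 (using the fixed $\bar{k}_B$), and the weight values are placeholders.

```python
import torch

def frequency_domain_loss(eta, eta_b, eta_lr, eta_b_down,
                          gamma=1.0, upsilon=1.0, tau=1.0):
    """Sketch of L_fre (Eq. (21)).
    eta:        aliasing feature produced by the frequency-domain branch
    eta_b:      prior aliasing feature of the D_B-degraded reference
    eta_lr:     aliasing residual of the synthetic LR image (Section 3.2)
    eta_b_down: downsampled eta_b serving as a prior on the aliasing degree"""
    # first part: guide eta toward the aliasing distribution of the prior
    dist_term = torch.mean(torch.abs(eta - gamma * eta_b))
    # second part: constrain the aliasing degree of the generated I_LR
    degree_term = torch.abs(torch.mean(torch.abs(eta_b_down))
                            - tau * torch.mean(torch.abs(eta_lr)))
    return dist_term + upsilon * degree_term
```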

3.5. Overall Loss

The synthetic image $I_{LR}$ needs to meet two requirements: its content must be consistent with $I_{HR}$, and its distribution must be consistent with the target-domain images. In order to preserve the image content across different scales, the data loss $\mathcal{L}_{data}$ is defined to measure the distance between the low-frequency components of $I_{HR}$ and the synthetic image $I_{LR}$:
$$\mathcal{L}_{data} = \big\| P_{\times s}(I_{HR}) - P(I_{LR}) \big\|_1 \tag{22}$$
where $P(\cdot)$ is a low-pass filter and $P_{\times s}(\cdot)$ is a combination of low-pass filtering and downsampling by $s$. The adversarial loss $\mathcal{L}_{adv}$ is used to enforce that $I_{LR}$ follows the distribution of the target-domain images:
$$\mathcal{L}_{adv}^{g} = -\,\mathbb{E}\big[\log\big(F(I_{LR})\big)\big], \qquad \mathcal{L}_{adv}^{d} = -\Big(\mathbb{E}\big[\log\big(F(I_{LR}^{real})\big)\big] + \mathbb{E}\big[\log\big(1 - F(I_{LR})\big)\big]\Big) \tag{23}$$
where $\mathcal{L}_{adv}^{g}$ and $\mathcal{L}_{adv}^{d}$ are the adversarial losses of the generator and discriminator, respectively, $F(\cdot)$ is the discriminator, and $I_{LR}^{real}$ is a real-world image of the target domain.
The total loss functions of the degradation model are
$$\mathcal{L}_g = \mathcal{L}_{data} + \mathcal{L}_{adv}^{g} + \mathcal{L}_{fre}, \qquad \mathcal{L}_d = \mathcal{L}_{adv}^{d} \tag{24}$$
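For completeness, a sketch of the data loss of Equation (22) and the adversarial losses of Equation (23) follows; the fixed low-pass kernel, the bicubic downsampling inside $P_{\times s}(\cdot)$, and the sigmoid-output discriminator are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def low_pass(x, lp_kernel):
    """P(.): depthwise convolution with a fixed low-pass kernel of shape (1, 1, ks, ks)."""
    pad = lp_kernel.shape[-1] // 2
    k = lp_kernel.expand(x.shape[1], 1, -1, -1)
    return F.conv2d(F.pad(x, [pad] * 4, mode='reflect'), k, groups=x.shape[1])

def data_loss(i_hr, i_lr, lp_kernel, scale=2):
    """L_data (Eq. (22)): L1 distance between low-frequency components."""
    hr_lf = F.interpolate(low_pass(i_hr, lp_kernel), scale_factor=1 / scale,
                          mode='bicubic', align_corners=False)   # P_xs(I_HR)
    lr_lf = low_pass(i_lr, lp_kernel)                             # P(I_LR)
    return torch.mean(torch.abs(hr_lf - lr_lf))

def adversarial_losses(disc, i_lr_fake, i_lr_real, eps=1e-8):
    """L_adv (Eq. (23)) for a discriminator F(.) with outputs in (0, 1)."""
    g_loss = -torch.log(disc(i_lr_fake) + eps).mean()
    d_loss = -(torch.log(disc(i_lr_real) + eps).mean()
               + torch.log(1 - disc(i_lr_fake.detach()) + eps).mean())
    return g_loss, d_loss

# total generator/discriminator losses of Eq. (24):
#   l_g = data_loss(...) + g_loss + frequency_domain_loss(...)
#   l_d = d_loss
```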

4. Experiments

4.1. Datasets and Training Details

Datasets. In order to test the effect of the proposed method on real-world images, 400 images were selected from the HR dataset DIV2K [18] as the source domain HR images. Real-world LR datasets or datasets with unknown and complex degradation were selected as target domain images, including DRealSR [12] and NTIRE2020 [70] Track1 and Track2. The source domain HR image and the target domain LR image were unpaired, which means that their contents were inconsistent. A portion of the target domain LR images were selected for training, and another portion of the target domain images were chosen for testing and validation. The proposed domain-translation-based downsampling method was used to learn the translation from the source domain to the target domain, and the loss function proposed in Equation (24) was used.
Evaluation metrics. For validation sets with ground-truth references, such as NTIRE2020 Track1, we used PSNR, SSIM [71], and LPIPS [72] as evaluation metrics. For the validation set without ground-truth references, we used NIQE [73], NRQM [74], and PI [75] as evaluation metrics.
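For the full-reference metrics, a sketch such as the following can be used (illustrative only; it relies on the scikit-image and lpips packages, assumes a recent scikit-image with the channel_axis argument, and the no-reference metrics NIQE, NRQM, and PI require separate toolboxes):

```python
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='alex')  # perceptual metric of [72]

def full_reference_metrics(sr, gt):
    """PSNR / SSIM / LPIPS for validation sets with ground truth.
    sr, gt: float arrays in [0, 1] with shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(gt, sr, data_range=1.0)
    ssim = structural_similarity(gt, sr, channel_axis=-1, data_range=1.0)
    # lpips expects (N, 3, H, W) tensors scaled to [-1, 1]
    to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    lp = lpips_fn(to_tensor(sr), to_tensor(gt)).item()
    return psnr, ssim, lp
```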
Implementation and training details. During training, the dimension of $z_m$ was set to $f_m = 128$. The HR images were cropped to $128 \times 128$, and the LR images were cropped to $64 \times 64$ for scale factor 2 and $32 \times 32$ for scale factor 4. In the comparative and ablation experiments, the network was trained for 40 epochs with a learning rate of $1 \times 10^{-4}$ and a batch size of 24.
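For reference, the reported hyperparameters can be collected in a configuration object such as the following (the dictionary keys are illustrative names; the values are those listed above):

```python
# Hyperparameters reported in Section 4.1, collected for convenience
train_config = {
    "f_m": 128,                  # dimension of the prior code z_m
    "hr_crop": 128,              # HR patch size (128 x 128)
    "lr_crop": {2: 64, 4: 32},   # LR patch size per scale factor
    "epochs": 40,
    "learning_rate": 1e-4,
    "batch_size": 24,
}
```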

4.2. Comparison with Other Domain-Translation Based Methods

The proposed method is compared with several state-of-the-art real-world SR methods based on domain translation, including SDSR [36], TDSR [36], CARB [37], PDM [38], and ADL [40]. For fair comparison, the SR model used by the proposed method is consistent with the SR model used by these methods, including EDSR [76] and ESRGAN [6]. EDSR is PSNR-oriented and supervised by L1 and L2 losses. ESRGAN is perceptual-oriented and supervised by perceptual loss. Therefore, when EDSR is the SR model, PSNR and SSIM are used as the main evaluation metrics. When ESRGAN is the SR model, LPIPS (based on perceptual similarity) is used as the main evaluation metric.
The results on NTIRE2020 Track 1 [70] are presented in Table 1. When combined with the PSNR-oriented network EDSR, the proposed method attains the best performance in terms of PSNR and SSIM. Similarly, when combined with the perceptual-oriented network ESRGAN, it achieves the best LPIPS results, indicating superior perceptual quality. This underscores that, whether the SR model is PSNR-oriented or focused on perceptual quality, incorporating the proposed method leads to the best results. The LR images in NTIRE2020 Track 1 are obtained by applying an undisclosed degradation operator to the HR dataset Flickr2K [19]; this operator [70] generates structured artifacts of the kind produced by image-processing pipelines found on very low-end devices. Figure 4 provides a visual comparison of three sets of samples. The SR model in the first row of each group is EDSR, and the SR model in the second row is ESRGAN; LR images upscaled by bicubic interpolation and ground-truth HR images are used as references. In the results with EDSR as the SR model, the method proposed in this work achieves the sharpest edges while maintaining denoising ability. As a perception-oriented method, ESRGAN has clearly stronger detail-recovery capabilities when used as the SR model, but it also carries the risk of amplifying complex noise. In each set of samples, “SDSR + ESRGAN” and “TDSR + ESRGAN” are plagued by complex noise, “ADL + ESRGAN” over-enhances some details that do not exist in the ground-truth HR image, and “CARB + ESRGAN” is too blurry to take advantage of the perception-oriented strategy. “Ours + ESRGAN” is cleaner than the other methods, indicating that the proposed method better models complex noise in both the spatial and frequency domains.
Table 2 shows the results of these methods on NTIRE2020 Track 2 [70]. The method proposed in this work still achieves the best results in NRQM and PI. The LR images in NTIRE2020 Track 2 are real-world images taken by an iPhone3 and come from DPED [22]. NTIRE2020 Track 2 contains more unknown and complex real-world degradation processes. Since the methods used for comparison all provide the perceptual-oriented SR model ESRGAN, the method proposed in this work also uses ESRGAN as the SR model. Figure 5 shows the visual comparison of three sets of samples. LR images amplified by bicubic interpolation are used as references. The results of “SDSR”, “TDSR”, and “ADL” are all affected by severe noise, while the results of “CARB” are still blurry. “PDM” has a better ability to remove noise in the spatial domain, but there is still some complex noise related to the image structure, as shown in the first sample. The method proposed in this work obtains the cleanest results, and there is no structural noise like in the results of “PDM”.

4.3. Comparison with Other Degradation Modeling-Based Methods

In order to provide an intuitive comparison of the effects of different methods, comparisons with several degradation modeling-based methods are also provided, including RealSR [24], BSRGAN [25], and Real-ESRGAN [26]. These methods are all based on predefined degradation steps and have strong generalization capabilities for real-world LR images. Since these methods are perceptual-oriented and use ESRGAN or improved ESRGAN [26] as SR models, the method proposed in this work is also combined with improved ESRGAN [26] for comparison. The real-world dataset DRealSR [12] is selected as the validation set; its LR images are derived from the real world and have ground-truth images as references. PSNR, SSIM, and LPIPS are used as evaluation metrics, and given the perceptual-quality orientation, the LPIPS results are the most relevant.
Table 3 shows the results of these methods on DRealSR. The method proposed in this work achieves the best LPIPS and SSIM results. Figure 6 shows the visual comparison of three samples, with LR images upscaled by bicubic interpolation and ground-truth HR images used as references. The results show that, in exchange for their generalization ability, “BSRGAN” and “Real-ESRGAN” tend to produce over-smoothed results that ignore texture details, such as the shape of the petals in the first sample. Since the predefined degradation model may not match the real degradation model, there is a gap between the results of “RealSR” and the ground truth. The method proposed in this work can learn the specific degradation process, so it achieves the visual results that are closest to the ground truth.

4.4. Ablation Studies

In order to verify the effect of the proposed frequency-domain branch and frequency-domain loss function, comparative experiments were designed. The probabilistic degradation model without the frequency-domain branch and frequency-domain loss function is used as a baseline, and the effects of the frequency-domain branch and the frequency-domain loss function are verified independently on top of it. The results are shown in Table 4: using either the frequency-domain branch or the frequency-domain loss function alone performs better than the baseline, and using both yields the best SR results.
In order to verify the role of the random variable, $z_m$ is fixed to zero in a comparative experiment. Table 5 shows that, when $z_m$ is randomly sampled, a better SR effect is achieved than when $z_m$ is fixed to zero. The speculated reason is that a randomly sampled $z_m$ can simulate the influence of random factors and avoid overfitting.

5. Conclusions

This work proposes a method to evaluate frequency-domain aliasing in images suffering from unknown degradation. This is the first time that the effect of aliasing on images has been considered in a degradation model. Furthermore, a domain-translation framework is proposed that leverages unpaired HR and LR images to generate synthetic datasets for training. The framework combines a frequency-domain branch and a frequency-domain loss function to generate synthetic images with real-world aliasing characteristics, thereby reducing the domain gap between synthetic and real-world images. Experimental results demonstrate that the proposed method can improve the visual quality and quantitative results of super-resolution models on real-world images, which illustrates its practicality and marks the possibility of improving super-resolution technology for real-world scenarios. In the future, the proposed method will be extended to estimate degradation models with real-world noise. Another direction is to combine domain-translation methods with reconstruction-based methods, with the goal of accurately estimating the degradation model and reconstructing high-resolution images even when only a single low-resolution image or real-world images from different sources are available.

Author Contributions

Methodology, Y.H.; software, Y.H.; validation, Y.H.; formal analysis, Y.H.; investigation, Y.H.; resources, F.Y.; data curation, Y.H.; writing—original draft preparation, Y.H.; writing—review and editing, F.Y.; visualization, Y.H.; supervision, F.Y.; project administration, F.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data generated or analyzed during this study are included in this manuscript, and the associated code required to generate the data will be published as soon as possible.

Conflicts of Interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

References

  1. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 184–199. [Google Scholar]
  2. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
  3. Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632. [Google Scholar]
  4. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
  5. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
  6. Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Change Loy, C. Esrgan: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
  7. Yu, C.; Hong, L.; Pan, T.; Li, Y.; Li, T. ESTUGAN: Enhanced Swin Transformer with U-Net Discriminator for Remote Sensing Image Super-Resolution. Electronics 2023, 12, 4235. [Google Scholar] [CrossRef]
  8. Shao, G.; Sun, Q.; Gao, Y.; Zhu, Q.; Gao, F.; Zhang, J. Sub-Pixel Convolutional Neural Network for Image Super-Resolution Reconstruction. Electronics 2023, 12, 3572. [Google Scholar] [CrossRef]
  9. Shi, Y.; Jiang, C.; Liu, C.; Li, W.; Wu, Z. A Super-Resolution Reconstruction Network of Space Target Images Based on Dual Regression and Deformable Convolutional Attention Mechanism. Electronics 2023, 12, 2995. [Google Scholar] [CrossRef]
  10. Ye, S.; Zhao, S.; Hu, Y.; Xie, C. Single-Image Super-Resolution Challenges: A Brief Review. Electronics 2023, 12, 2975. [Google Scholar] [CrossRef]
  11. Chen, C.; Xiong, Z.; Tian, X.; Zha, Z.J.; Wu, F. Camera lens super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 1652–1660. [Google Scholar]
  12. Wei, P.; Xie, Z.; Lu, H.; Zhan, Z.; Ye, Q.; Zuo, W.; Lin, L. Component divide-and-conquer for real-world image super-resolution. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 101–117. [Google Scholar]
  13. Cai, J.; Zeng, H.; Yong, H.; Cao, Z.; Zhang, L. Toward real-world single image super-resolution: A new benchmark and a new model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3086–3095. [Google Scholar]
  14. Zhang, X.; Chen, Q.; Ng, R.; Koltun, V. Zoom to learn, learn to zoom. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 3762–3770. [Google Scholar]
  15. Köhler, T.; Bätz, M.; Naderi, F.; Kaup, A.; Maier, A.; Riess, C. Toward bridging the simulated-to-real gap: Benchmarking super-resolution on real data. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 2944–2959. [Google Scholar] [CrossRef]
  16. Joze, H.R.V.; Zharkov, I.; Powell, K.; Ringler, C.; Liang, L.; Roulston, A.; Lutz, M.; Pradeep, V. Imagepairs: Realistic super resolution dataset via beam splitter camera rig. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 518–519. [Google Scholar]
  17. Sun, W.; Gong, D.; Shi, Q.; van den Hengel, A.; Zhang, Y. Learning to zoom-in via learning to zoom-out: Real-world super-resolution by generating and adapting degradation. IEEE Trans. Image Process. 2021, 30, 2947–2962. [Google Scholar] [CrossRef]
  18. Agustsson, E.; Timofte, R. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 126–135. [Google Scholar]
  19. Timofte, R.; Agustsson, E.; Van Gool, L.; Yang, M.H.; Zhang, L. Ntire 2017 challenge on single image super-resolution: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 114–125. [Google Scholar]
  20. Wang, X.; Yu, K.; Dong, C.; Loy, C.C. Recovering realistic texture in image super-resolution by deep spatial feature transform. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 606–615. [Google Scholar]
  21. Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4401–4410. [Google Scholar]
  22. Ignatov, A.; Kobyshev, N.; Timofte, R.; Vanhoey, K.; Van Gool, L. Dslr-quality photos on mobile devices with deep convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3277–3285. [Google Scholar]
  23. Zhou, R.; Susstrunk, S. Kernel modeling super-resolution on real low-resolution images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2433–2443. [Google Scholar]
  24. Ji, X.; Cao, Y.; Tai, Y.; Wang, C.; Li, J.; Huang, F. Real-world super-resolution via kernel estimation and noise injection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 466–467. [Google Scholar]
  25. Zhang, K.; Liang, J.; Van Gool, L.; Timofte, R. Designing a practical degradation model for deep blind image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 4791–4800. [Google Scholar]
  26. Wang, X.; Xie, L.; Dong, C.; Shan, Y. Real-esrgan: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 1905–1914. [Google Scholar]
  27. Chen, H.; He, X.; Qing, L.; Wu, Y.; Ren, C.; Sheriff, R.E.; Zhu, C. Real-world single image super-resolution: A brief review. Inf. Fusion 2022, 79, 124–145. [Google Scholar] [CrossRef]
  28. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
  29. Bell-Kligler, S.; Shocher, A.; Irani, M. Blind super-resolution kernel estimation using an internal-gan. In Proceedings of the 33rd Conference on Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
  30. Bulat, A.; Yang, J.; Tzimiropoulos, G. To learn image super-resolution, use a gan to learn how to do image degradation first. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 185–200. [Google Scholar]
  31. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
  32. Chen, S.; Han, Z.; Dai, E.; Jia, X.; Liu, Z.; Xing, L.; Zou, X.; Xu, C.; Liu, J.; Tian, Q. Unsupervised image super-resolution with an indirect supervised path. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 468–469. [Google Scholar]
  33. Kim, G.; Park, J.; Lee, K.; Lee, J.; Min, J.; Lee, B.; Han, D.K.; Ko, H. Unsupervised real-world super resolution with cycle generative adversarial network and domain discriminator. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 456–457. [Google Scholar]
  34. Wang, W.; Zhang, H.; Yuan, Z.; Wang, C. Unsupervised real-world super-resolution: A domain adaptation perspective. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 11–17 October 2021; pp. 4318–4327. [Google Scholar]
  35. Lugmayr, A.; Danelljan, M.; Timofte, R. Unsupervised learning for real-world super-resolution. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3408–3416. [Google Scholar]
  36. Fritsche, M.; Gu, S.; Timofte, R. Frequency separation for real-world super-resolution. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3599–3608. [Google Scholar]
  37. Zhou, Y.; Deng, W.; Tong, T.; Gao, Q. Guided frequency separation network for real-world super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 428–429. [Google Scholar]
  38. Luo, Z.; Huang, Y.; Li, S.; Wang, L.; Tan, T. Learning the degradation distribution for blind image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 6063–6072. [Google Scholar]
  39. Ji, X.; Tao, G.; Cao, Y.; Tai, Y.; Lu, T.; Wang, C.; Li, J.; Huang, F. Frequency consistent adaptation for real world super resolution. Aaai Conf. Artif. Intell. 2021, 35, 1664–1672. [Google Scholar] [CrossRef]
  40. Son, S.; Kim, J.; Lai, W.S.; Yang, M.H.; Lee, K.M. Toward real-world super-resolution via adaptive downsampling models. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 8657–8670. [Google Scholar] [CrossRef] [PubMed]
  41. Lee, W.; Son, S.; Lee, K.M. Ap-bsn: Self-supervised denoising for real-world images via asymmetric pd and blind-spot network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 17725–17734. [Google Scholar]
  42. Glotzbach, J.W.; Schafer, R.W.; Illgner, K. A method of color filter array interpolation with alias cancellation properties. In Proceedings of the 2001 International Conference on Image Processing (Cat. No. 01CH37205), Thessaloniki, Greece, 7–10 October 2001; IEEE: Piscataway, NJ, USA, 2001; Volume 1, pp. 141–144. [Google Scholar]
  43. Forsey, A.; Gungor, S. Demosaicing images from colour cameras for digital image correlation. Opt. Lasers Eng. 2016, 86, 20–28. [Google Scholar] [CrossRef]
  44. Hao, Y.; Yu, F. Super-Resolution Degradation Model: Converting High-Resolution Datasets to Optical Zoom Datasets. IEEE Trans. Circuits Syst. Video Technol. 2023, 6374–6389. [Google Scholar]
  45. Lettington, A.H.; Hong, Q.H.; Tzimopoulou, S. Superresolution by spatial-frequency aliasing. In Proceedings of the Infrared Technology and Applications XXII, Orlando, FL, USA, 8–12 April 1996; SPIE: St Bellingham, WA, USA, 1996; Volume 2744, pp. 583–590. [Google Scholar]
  46. Hoshino, H.; Okano, F.; Yuyama, I. A study on resolution and aliasing for multi-viewpoint image acquisition. IEEE Trans. Circuits Syst. Video Technol. 2000, 10, 366–375. [Google Scholar] [CrossRef]
  47. Pusey, E.; Yoon, C.; Anselmo, M.L.; Lufkin, R.B. Aliasing artifacts in MR imaging. Comput. Med Imaging Graph. 1988, 12, 219–224. [Google Scholar] [CrossRef]
  48. Schöberl, M.; Schnurrer, W.; Oberdörster, A.; Fössel, S.; Kaup, A. Dimensioning of optical birefringent anti-alias filters for digital cameras. In Proceedings of the 2010 IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 4305–4308. [Google Scholar]
  49. Muammar, H.; Dragotti, P.L. An investigation into aliasing in images recaptured from an LCD monitor using a digital camera. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 2242–2246. [Google Scholar]
  50. Esqueda, F.; Bilbao, S.; Välimäki, V. Aliasing reduction in clipped signals. IEEE Trans. Signal Process. 2016, 64, 5255–5267. [Google Scholar] [CrossRef]
  51. Bilinskis, I. Digital Alias-Free Signal Processing; John Wiley & Sons: Hoboken, NJ, USA, 2007. [Google Scholar]
  52. Vandewalle, P.; Süsstrunk, S.; Vetterli, M. A frequency domain approach to registration of aliased images with application to super-resolution. Eurasip J. Adv. Signal Process. 2006, 2006, 1–14. [Google Scholar] [CrossRef]
  53. Efrat, N.; Glasner, D.; Apartsin, A.; Nadler, B.; Levin, A. Accurate blur models vs. image priors in single image super-resolution. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2832–2839. [Google Scholar]
  54. Luo, Z.; Huang, H.; Yu, L.; Li, Y.; Fan, H.; Liu, S. Deep constrained least squares for blind image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 17642–17652. [Google Scholar]
  55. Tao, G.; Ji, X.; Wang, W.; Chen, S.; Lin, C.; Cao, Y.; Lu, T.; Luo, D.; Tai, Y. Spectrum-to-kernel translation for accurate blind image super-resolution. Adv. Neural Inf. Process. Syst. 2021, 34, 22643–22654. [Google Scholar]
  56. Wang, L.; Wang, Y.; Dong, X.; Xu, Q.; Yang, J.; An, W.; Guo, Y. Unsupervised degradation representation learning for blind super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 10581–10590. [Google Scholar]
  57. Gu, J.; Lu, H.; Zuo, W.; Dong, C. Blind super-resolution with iterative kernel correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 1604–1613. [Google Scholar]
  58. Zhang, K.; Gool, L.V.; Timofte, R. Deep unfolding network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 3217–3226. [Google Scholar]
  59. Huang, Y.; Li, S.; Wang, L.; Tan, T. Unfolding the alternating optimization for blind super resolution. Adv. Neural Inf. Process. Syst. 2020, 33, 5632–5643. [Google Scholar]
  60. Yue, Z.; Zhao, Q.; Xie, J.; Zhang, L.; Meng, D.; Wong, K.Y.K. Blind image super-resolution with elaborate degradation modeling on noise and kernel. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 2128–2138. [Google Scholar]
  61. Hussein, S.A.; Tirer, T.; Giryes, R. Correction filter for single image super-resolution: Robustifying off-the-shelf deep super-resolvers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1428–1437. [Google Scholar]
  62. Zhang, K.; Zuo, W.; Zhang, L. Learning a single convolutional super-resolution network for multiple degradations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3262–3271. [Google Scholar]
  63. Elad, M.; Feuer, A. Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images. IEEE Trans. Image Process. 1997, 6, 1646–1658. [Google Scholar] [CrossRef]
  64. Liu, C.; Sun, D. On Bayesian adaptive video super resolution. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 346–360. [Google Scholar] [CrossRef]
  65. Vasconcelos, C.; Larochelle, H.; Dumoulin, V.; Roux, N.L.; Goroshin, R. An effective anti-aliasing approach for residual networks. arXiv 2020, arXiv:2011.10675. [Google Scholar]
  66. Xiaoxi, W.; Yingjie, Y.; Jianbin, H. Aliasing fringe pattern denoising based on deep learning. In Proceedings of the AOPC 2021: Novel Technologies and Instruments for Astronomical Multi-Band Observations, Beijing, China, 23–25 July 2021; SPIE: Bellingham, WA, USA, 2021; Volume 12069, pp. 178–183. [Google Scholar]
  67. Vollmerhausen, R.H.; Driggers, R.G.; Wilson, D.L. Predicting range performance of sampled imagers by treating aliased signal as target-dependent noise. JOSA A 2008, 25, 2055–2065. [Google Scholar] [CrossRef]
  68. Hennenfent, G.; Herrmann, F.J. Irregular sampling–from aliasing to noise. In Proceedings of the 69th EAGE Conference and Exhibition Incorporating SPE EUROPEC 2007, London, UK, 11–14 June 2007; p. cp–27. [Google Scholar]
  69. Xu, L.; Jia, J. Two-phase kernel estimation for robust motion deblurring. In Proceedings of the Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; Proceedings, Part I 11. Springer: Berlin/Heidelberg, Germany, 2010; pp. 157–170. [Google Scholar]
  70. Lugmayr, A.; Danelljan, M.; Timofte, R. Ntire 2020 challenge on real-world image super-resolution: Methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 494–495. [Google Scholar]
  71. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  72. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595. [Google Scholar]
  73. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212. [Google Scholar] [CrossRef]
  74. Ma, C.; Yang, C.Y.; Yang, X.; Yang, M.H. Learning a no-reference quality metric for single-image super-resolution. Comput. Vis. Image Underst. 2017, 158, 1–16. [Google Scholar] [CrossRef]
  75. Blau, Y.; Mechrez, R.; Timofte, R.; Michaeli, T.; Zelnik-Manor, L. The 2018 PIRM challenge on perceptual image super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
  76. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
Figure 1. Overview of the impact of several degradation processes on signals, including (a) direct downsampling, (b) the degradation process represented by Equation (1), and (c) the degradation process represented by Equation (2). The last row shows the result of upsampling the LR image $I_{LR}$ again, denoted $I_{LR,up}$.
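For intuition, the following Python sketch (not the authors' implementation) reproduces the qualitative comparison of Figure 1 on a synthetic high-frequency pattern: direct subsampling versus a Gaussian-blur-then-subsample pipeline standing in for the degradation operators of Equations (1) and (2), followed by cubic re-upsampling analogous to $I_{LR,up}$. The blur strength and interpolation order are illustrative choices only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def degrade(hr, scale=4, blur_sigma=None):
    """Toy degradation: optional Gaussian blur followed by direct subsampling."""
    img = gaussian_filter(hr, blur_sigma) if blur_sigma else hr
    return img[::scale, ::scale]

def upsample(lr, scale=4):
    """Re-upsample the LR image, analogous to I_LR,up in Figure 1."""
    return zoom(lr, scale, order=3)  # cubic interpolation

# Synthetic HR pattern with strong high-frequency content (concentric rings).
x, y = np.meshgrid(np.linspace(-1, 1, 256), np.linspace(-1, 1, 256))
hr = 0.5 + 0.5 * np.cos(80.0 * np.sqrt(x ** 2 + y ** 2))

lr_direct = degrade(hr)                    # (a) direct downsampling: aliasing expected
lr_blurred = degrade(hr, blur_sigma=1.5)   # blur, then downsample: high frequencies suppressed

# Spectra of the re-upsampled results reveal folded (aliased) frequency components.
spec_direct = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(upsample(lr_direct)))))
spec_blurred = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(upsample(lr_blurred)))))
```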
Figure 2. Visualization of the results $I_{LR}$ and $\eta$ produced by different degradation processes: the real-world degradation process $D_{R}^{\times s}(\cdot)$, bicubic downsampling $D_{B}^{\times s}(\cdot)$, and the degradation processes of Equations (1) and (2), denoted $D_{1}^{\times s}(\cdot)$ and $D_{2}^{\times s}(\cdot)$, respectively (where $k$ is a Gaussian blur kernel). Three sets of samples are presented. In each set, the first row shows $I_{HR}$ and $I_{LR}$, the second row shows the corresponding frequency-domain images, and the third row visualizes $\eta$ computed by the proposed method, with the corresponding L1 distance $\|\eta\|_1$ marked below.
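The sketch below illustrates one plausible way to compute an aliasing map such as $\eta$: build an alias-free LR reference by low-pass filtering the HR image before subsampling, then take the residual against the LR image under test. The Gaussian pre-filter and the mean absolute (L1) distance are assumptions for illustration, not necessarily the exact construction used in the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def aliasing_assessment(hr, lr, scale=4, sigma=None):
    """Illustrative aliasing assessment: compare an LR image produced by an unknown
    degradation with an alias-free LR reference built from the HR image.
    `lr` must have the shape of `hr` subsampled by `scale`."""
    sigma = 0.5 * scale if sigma is None else sigma   # assumed stand-in for an ideal low-pass filter
    lr_alias_free = gaussian_filter(hr.astype(np.float64), sigma)[::scale, ::scale]
    eta = lr.astype(np.float64) - lr_alias_free       # residual attributed to aliasing
    return eta, np.abs(eta).mean()                    # aliasing map and its mean L1 distance
```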
Figure 3. Overview of the proposed domain-translation-based downsampling framework.
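The pipeline of Figure 3 is defined in the main text; as a rough, hypothetical sketch of how such a domain-translation downsampler can be trained adversarially against unpaired real-world LR images, consider the step below. The networks G and D, the optimizers, the loss choice, and the absence of additional content or frequency terms are all placeholders rather than the paper's configuration.

```python
import torch
import torch.nn as nn

def translation_step(G, D, hr, real_lr, opt_g, opt_d):
    """Hypothetical single training step: a generator G maps HR crops to synthetic LR
    images, and a discriminator D tries to distinguish them from unpaired real-world
    LR crops of the same size."""
    adv = nn.BCEWithLogitsLoss()
    fake_lr = G(hr)                                   # learned HR -> LR degradation

    # Discriminator update: real LR vs. generated LR.
    opt_d.zero_grad()
    d_real, d_fake = D(real_lr), D(fake_lr.detach())
    d_loss = adv(d_real, torch.ones_like(d_real)) + adv(d_fake, torch.zeros_like(d_fake))
    d_loss.backward()
    opt_d.step()

    # Generator update: make synthetic LR images indistinguishable from real LR images.
    opt_g.zero_grad()
    d_fake_for_g = D(fake_lr)
    g_loss = adv(d_fake_for_g, torch.ones_like(d_fake_for_g))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```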
Figure 4. Comparison of our method with state-of-the-art real-world SR methods based on domain translation. Test images are from the NTIRE2020 Track 1 dataset. The red box marks the enlarged area.
Figure 5. Comparison of our method with state-of-the-art real-world SR methods based on domain translation. Test images are from the NTIRE2020 Track 2 dataset. The red box marks the enlarged area.
Figure 6. Comparison of our method with state-of-the-art real-world SR methods based on degradation modeling. Test images are from the DRealSR dataset. The red box marks the enlarged area.
Table 1. Quantitative comparison with domain-translation-based methods on NTIRE2020 Track 1 with ground-truth references. ↑ denotes the larger the better. ↓ denotes the smaller the better. The best results are denoted in red, and the second best are denoted in blue.

Methods          PSNR ↑    SSIM ↑    LPIPS ↓
PDM + EDSR       21.099    0.6044    0.3794
ADL + EDSR       28.942    0.8004    0.3248
SDSR + ESRGAN    23.096    0.4479    0.5619
TDSR + ESRGAN    21.949    0.3901    0.6024
CARB + ESRGAN    28.483    0.7968    0.3285
ADL + ESRGAN     24.688    0.6437    0.3063
Ours + EDSR      29.366    0.8033    0.3107
Ours + ESRGAN    25.661    0.7636    0.2930
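Full-reference scores of this kind can be computed with off-the-shelf tools. The sketch below uses the third-party scikit-image and lpips packages with common default settings (RGB inputs, AlexNet LPIPS backbone); the exact evaluation protocol behind Table 1 (for example, Y-channel conversion or border cropping) may differ, so this is a reproduction aid, not the paper's evaluation code.

```python
import torch
import lpips                                    # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='alex')              # AlexNet backbone, a common choice for SR evaluation

def evaluate_pair(sr, hr):
    """sr, hr: uint8 RGB arrays of identical shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, channel_axis=2, data_range=255)
    # LPIPS expects NCHW float tensors scaled to [-1, 1].
    to_tensor = lambda im: torch.from_numpy(im).permute(2, 0, 1)[None].float() / 127.5 - 1.0
    lpips_val = lpips_fn(to_tensor(sr), to_tensor(hr)).item()
    return psnr, ssim, lpips_val
```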
Table 2. Quantitative comparison with domain-translation-based methods on NTIRE2020 Track 2 without ground-truth references. ↑ denotes the larger the better. ↓ denotes the smaller the better. The best results are denoted in red, and the second best are denoted in blue.

Methods          NIQE ↓    NRQM ↑    PI ↓
SDSR + ESRGAN    6.744     4.630     6.057
TDSR + ESRGAN    4.365     4.985     4.690
CARB + ESRGAN    8.459     2.256     8.101
PDM + ESRGAN     6.714     4.231     6.241
ADL + ESRGAN     5.229     3.352     5.938
Ours + ESRGAN    4.423     5.158     4.632
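The perceptual index (PI) in Table 2 follows the PIRM 2018 definition [75], combining NIQE and NRQM (Ma's score) as $\mathrm{PI} = \frac{1}{2}\big((10 - \mathrm{NRQM}) + \mathrm{NIQE}\big)$. The short check below verifies this relationship against the tabulated values.

```python
def perceptual_index(niqe, nrqm):
    """Perceptual Index as defined in the PIRM 2018 challenge [75]; lower is better."""
    return 0.5 * ((10.0 - nrqm) + niqe)

# Consistency check against two rows of Table 2.
assert abs(perceptual_index(4.365, 4.985) - 4.690) < 5e-3   # TDSR + ESRGAN
assert abs(perceptual_index(4.423, 5.158) - 4.632) < 5e-3   # Ours + ESRGAN
```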
Table 3. Quantitative comparison with degradation modeling-based methods on DRealSR with ground-truth references. ↑ denotes the larger the better. ↓ denotes the smaller the better. The best results are denoted in red, and the second best are denoted in blue.

Methods          PSNR ↑    SSIM ↑    LPIPS ↓
RealSR           23.088    0.7122    0.2438
BSRGAN           28.147    0.8128    0.1824
Real-ESRGAN      26.656    0.8013    0.1875
Ours + ESRGAN    26.799    0.8188    0.1779
Table 4. Ablation studies on frequency-domain branch and frequency-domain loss. The test set is NTIRE2020 Track 1 with ground-truth references. ↑ denotes the larger the better. ↓ denotes the smaller the better. The best results are denoted in red.

Frequency Branch    Frequency Loss    PSNR ↑    SSIM ↑    LPIPS ↓
                                      26.397    0.7335    0.3919
                                      26.719    0.7416    0.4005
                                      28.842    0.7909    0.3242
                                      29.366    0.8033    0.3107
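The paper's frequency-domain loss is defined in the main text; as a rough illustration of what such a term can look like, the sketch below penalizes the L1 distance between the amplitude spectra of generated and real LR images. This is one common formulation and should not be taken as the paper's exact definition.

```python
import torch

def frequency_l1_loss(fake_lr: torch.Tensor, real_lr: torch.Tensor) -> torch.Tensor:
    """One plausible frequency-domain loss (illustrative only):
    L1 distance between amplitude spectra of generated and reference LR images."""
    fake_amp = torch.fft.fft2(fake_lr, norm='ortho').abs()
    real_amp = torch.fft.fft2(real_lr, norm='ortho').abs()
    return torch.mean(torch.abs(fake_amp - real_amp))
```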
Table 5. Ablation studies on the random variable $z_m$. The test set is NTIRE2020 Track 1 with ground-truth references. ↑ denotes the larger the better. ↓ denotes the smaller the better. The best results are denoted in red.

$z_m$    PSNR ↑    SSIM ↑    LPIPS ↓
✗        28.909    0.7996    0.3456
✓        29.366    0.8033    0.3107
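Since $z_m$ is described as a random variable, it presumably introduces stochasticity into the learned degradation so that one HR input can map to diverse LR outputs. The sketch below shows one hypothetical way such a latent could be injected into a degradation generator; the module name, layer sizes, and injection scheme are all assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class StochasticDegradationHead(nn.Module):
    """Hypothetical module that conditions degradation features on a random latent z_m."""
    def __init__(self, channels=64, z_dim=16):
        super().__init__()
        self.z_dim = z_dim
        self.fuse = nn.Conv2d(channels + z_dim, channels, kernel_size=3, padding=1)

    def forward(self, features, z_m=None):
        n, _, h, w = features.shape
        if z_m is None:
            z_m = torch.randn(n, self.z_dim, device=features.device)  # sample a degradation code
        z_map = z_m[:, :, None, None].expand(-1, -1, h, w)            # broadcast z_m spatially
        return self.fuse(torch.cat([features, z_map], dim=1))
```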
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
