Article

Online Learning for Reference-Based Super-Resolution

1 Department of Electronics Engineering, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Korea
2 School of Computer Science and Engineering, Pusan National University, 2 Busandaehak-ro 63beon-gil, Geumjeong-gu, Busan 46241, Korea
3 Department of Computer Science, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul 04763, Korea
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2022, 11(7), 1064; https://doi.org/10.3390/electronics11071064
Submission received: 28 February 2022 / Revised: 21 March 2022 / Accepted: 24 March 2022 / Published: 28 March 2022
(This article belongs to the Collection Computer Vision and Pattern Recognition Techniques)

Abstract

Online learning is a method for exploiting input data to update deep networks in the test stage to derive potential performance improvement. Existing online learning methods for single-image super-resolution (SISR) utilize an input low-resolution (LR) image for the online adaptation of deep networks. Unlike SISR approaches, reference-based super-resolution (RefSR) algorithms benefit from an additional high-resolution (HR) reference image containing plenty of useful features for enhancing the input LR image. Therefore, we introduce a new online learning algorithm, using several reference images, which is applicable to not only RefSR but also SISR networks. Experimental results show that our online learning method is seamlessly applicable to many existing RefSR and SISR models and consistently improves their performance. We further demonstrate the robustness of our method to non-bicubic degradation kernels with in-depth analyses.

1. Introduction

Deep learning-based single-image super-resolution (SISR) algorithms [1,2,3,4,5,6,7,8,9] have shown remarkable progress in recent years. However, these algorithms still suffer from blurry output images because they are generally trained to minimize the mean squared error (MSE) or mean absolute error (MAE) between the network output and ground truth images. This problem has led to various efforts to generate high-frequency details with a generative adversarial network (GAN) and/or perceptual losses [10,11,12]. However, these methods often reduce reconstruction performance and introduce unexpected visual artifacts, because GAN-based deep networks tend to generate visually pleasing images but fail to recover the genuine information lost during the downsampling (degradation) process. In order to reconstruct the lost information, reference-based super-resolution (RefSR) methods have been proposed. RefSR algorithms aim to benefit from the rich high-frequency details of an external high-quality reference image, such as video frames [13,14] or similar web images [15], during reconstruction, and many RefSR methods attempt to align and combine information from a low-resolution (LR) input image and a high-resolution (HR) reference image to synthesize a HR output. To this end, most studies so far have explored how to find and accurately match similar features [16,17,18] between the LR image and the reference image; for instance, patch matching [19], deformable convolution [20], and attention [21] techniques have been utilized. These methods have succeeded in transferring the high-frequency details of a reference image; however, they show performance degradation when irrelevant high-resolution images are given as references. To address this, we present an online learning technique inspired by zero-shot super-resolution (ZSSR) [22]. In ZSSR, a LR input image $I_{LR}$ and its downsampled version $I_{LR}{\downarrow}$ are used for supervision during the inference phase. However, ZSSR has difficulties when dealing with a large scaling factor (e.g., ×3, ×4).
Therefore, in this paper, we propose a method to effectively exploit both LR and HR reference images for online learning to update not only RefSR but also SISR models. Furthermore, using a pre-trained SR model, we create a pseudo-HR image from a LR input image and then use the pair of the pseudo-HR image $\bar{I}_{HR}$ and its downsampled version $\bar{I}_{HR}{\downarrow}$ as another datum for the online learning of the SR model. In summary, we perform online learning for both SISR and RefSR models by utilizing three types of supervision: $I_{LR}$, $I_{Ref}$, and $\bar{I}_{HR}$. We use each type of supervision not only individually but also in combination for online learning. As a result, the proposed method can benefit from more images during online adaptation than ZSSR [22].
Our key contributions can be summarized as follows:
  • We propose an online learning method for reference-based super-resolution with various data pairs for supervision. To this end, we present three methods for SISR models and four methods for RefSR models;
  • Our method is simple yet effective and can be seamlessly combined with both SISR and RefSR models;
  • Our method shows consistent performance improvements without being significantly affected by the degree of similarity between the reference and input images.

2. Related Works

In this section, we review deep learning-based SISR and RefSR methods. Then, we introduce recent SR approaches using online adaptation.
Single-image super-resolution (SISR) restores the high-frequency details of a LR image using only the input LR image itself. Traditional SISR approaches [23,24] usually exploit the self-similarity or self-recurrence of the input LR image. With deep learning, Dong et al. [25] introduced a SISR model with just three convolutional layers and outperformed traditional SISR methods by large margins. Kim et al. [3,4] increased SR performance in terms of PSNR and SSIM by using very deep convolutional layers. Lai et al. [26] suggested LapSRN, which progressively restores high-frequency details with a Laplacian pyramid. Lim et al. [5] removed unnecessary modules, such as batch normalization layers, from residual networks and achieved improved performance. Zhang et al. [7] introduced a channel attention model to capture the inter-dependency across different channels. The aforementioned methods have significantly increased SR performance in terms of PSNR and SSIM; however, their results may be blurry or visually unpleasing to human eyes. To enhance visual quality, Johnson et al. [12] proposed a perceptual loss that minimizes errors on high-level features. Ledig et al. [11] adopted the GAN framework to generate photo-realistic images. Furthermore, Wang et al. [27] introduced a relativistic adversarial loss based on a residual-in-residual dense block to produce more realistic images. Perception-based methods have succeeded in producing visually pleasing results, but they have a limited ability to recover information lost during the downsampling process.
Unlike SISR, reference-based super-resolution (RefSR) uses an additional HR reference image as an input to restore high-frequency details, so information lost in the downsampling process can be obtained from the reference image. Zheng et al. [19] proposed RefSR-Net, which combines information from both LR and reference images based on patch matching. Specifically, RefSR-Net extracts local patches from both the LR and reference images and then searches for correspondences between them; the resulting matches are used to synthesize a HR image. However, the patch-match-based approach has difficulty handling non-rigid deformations and, thus, suffers from blur or grid artifacts. Using optical flow [28,29] for pixel-wise matching, Zheng et al. [30] presented CrossNet, which combines a warping process with image synthesis. CrossNet can effectively handle non-rigid deformations between the input and reference images; however, it is vulnerable to large displacements. With the recent progress in neural style transfer [31,32], Zhang et al. [33] proposed SRNTT, which performs texture transfer from the reference image and is particularly robust when an unrelated reference image is paired with an input image. Shim et al. [20] proposed SSEN, which aligns features of the input and reference images using non-local blocks and deformable convolutions: intra-image global similarities extracted from non-local blocks are used to estimate relative offsets to relevant reference features, and deformable convolution operations then align the reference features to those of the input low-resolution image. It is an end-to-end trainable network that requires neither optical flow estimation nor explicit patch matching. Yang et al. [21] introduced an attention-based RefSR method called TTSR and achieved significant performance improvements. Recently, Jiang et al. [34] presented a knowledge-distillation technique to address the matching difficulties caused by the scale difference between the reference image and the LR input image, and Lu et al. [35] introduced the MASA network to reduce the computational burden of matching the LR and reference images. However, these methods remain sensitive to the similarity of the reference image.
Recently, Shocher et al. [22] proposed zero-shot super-resolution (ZSSR), which allows a CNN model to flexibly adapt to the test image: the parameters of the CNN are updated during the test phase, yielding a model optimized for the input image. Furthermore, meta-learning techniques have been applied in [36,37] to reduce the number of updates. In this paper, we present the first study that applies online learning to the RefSR problem and achieves robust RefSR regardless of the similarity between the input and reference images.

3. Methods

In this section, we introduce our online learning methods for the RefSR problem using both SISR and RefSR models. We then describe the inference process of our method.

3.1. Online Learning

The most important point in online learning is how to exploit the input data given at the test phase. Conventional online learning exploits self-similarity within the test image to learn its internal features. Building on this characteristic, we additionally exploit high-resolution reference images with similar characteristics and use them to restore the test image and improve performance.
For the RefSR problem, this is even more crucial, because two kinds of input data are available: a LR image $I_{LR}$ and the reference image $I_{Ref}$. Therefore, we develop various methods to construct pairs of a train-input $X$ and a train-target (supervision) $Y$ from multiple images (i.e., $I_{LR}$, $I_{Ref}$). Although our main goal in this work is to solve the RefSR problem, we present methods to construct data pairs $D$ that can be used to train not only RefSR but also SISR models at test time.
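For clarity, the data pairs in the following subsections are all built with a simple ×4 bicubic downsampling operator that turns a train-target image into its train-input. A minimal PyTorch sketch is given below; the helper name and the NCHW tensor layout are our own choices for illustration, not details from the original implementation.

```python
import torch
import torch.nn.functional as F

def downsample(img: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """Bicubic 1/scale downsampling used to turn a train-target Y into its train-input X.

    img: (N, C, H, W) tensor with values in [0, 1].
    """
    out = F.interpolate(img, scale_factor=1.0 / scale, mode="bicubic", align_corners=False)
    return out.clamp(0.0, 1.0)

# For example, the train-input of D_s^LR is the downsampled LR image itself:
# x = downsample(i_lr)   # X: I_LR (downsampled), Y: I_LR
```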

3.1.1. SISR Model

Existing SISR models require data pairs $D_s$ consisting of an input $X$ and supervision $Y$ (i.e., $D_s = \{X, Y\}$) for training. To be specific, we present three strategies, $D_s^{LR}$, $D_s^{Pse}$, and $D_s^{Ref}$, to construct $D_s$. First, $D_s^{LR}$ consists of a downsampled LR image and the input LR image, and is denoted by $D_s^{LR} = \{X\!: I_{LR}{\downarrow},\ Y\!: I_{LR}\}$. Note that this is the configuration commonly used to exploit self-similarity in SISR, as in ZSSR [22]. Next, $D_s^{Pse}$ is constructed with a pseudo-HR image $\bar{I}_{HR}$ obtained from a pre-trained SR model $P_{\phi}(\cdot)$ as follows:
$\bar{I}_{HR} = P_{\phi}(I_{LR}),$
where $\phi$ denotes the pre-trained network parameters. Then, we downsample $\bar{I}_{HR}$ to construct a pair of training samples, and the set is defined as $D_s^{Pse} = \{X\!: \bar{I}_{HR}{\downarrow},\ Y\!: \bar{I}_{HR}\}$. Finally, we utilize a reference image for $D_s^{Ref}$. Similar to $D_s^{LR}$ and $D_s^{Pse}$, a downsampled reference image and the original reference image are paired as $D_s^{Ref} = \{X\!: I_{Ref}{\downarrow},\ Y\!: I_{Ref}\}$. Using these three data pairs acquired in the test phase, the pre-trained parameters $\theta_s$ of a SISR model $\mathrm{SISR}_{\theta_s}(\cdot)$ are updated by minimizing the following loss function:
$\mathcal{L}_{\theta_s}(x, y) = \mathbb{E}\,\|\mathrm{SISR}_{\theta_s}(x) - y\|,$
where $x$ and $y$ are patches extracted from $X$ and $Y$ in $D_s$. Note that the aforementioned data pairs can be used individually or in combination.
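As an illustration, one online-learning iteration over these SISR data pairs can be sketched as follows. This is a simplified version under our own assumptions: the `downsample` helper above, random 32 × 32 input patches, and an ℓ1 form of the norm in the loss; names such as `sisr_online_step` are hypothetical.

```python
import random
import torch.nn.functional as F

def random_patch_pair(x, y, scale=4, patch=32):
    """Aligned random crops: a (patch x patch) region from the train-input x and the
    corresponding (patch*scale x patch*scale) region from the train-target y."""
    _, _, h, w = x.shape
    top, left = random.randint(0, h - patch), random.randint(0, w - patch)
    x_p = x[:, :, top:top + patch, left:left + patch]
    y_p = y[:, :, scale * top:scale * (top + patch), scale * left:scale * (left + patch)]
    return x_p, y_p

def sisr_online_step(model, optimizer, pairs, scale=4):
    """One online-learning update of a pre-trained SISR model on test-time pairs.

    pairs: list of (X, Y) tensors, e.g.
        [(downsample(i_lr), i_lr),            # D_s^LR
         (downsample(pseudo_hr), pseudo_hr),  # D_s^Pse, pseudo_hr = P(i_lr).detach()
         (downsample(i_ref), i_ref)]          # D_s^Ref
    """
    optimizer.zero_grad()
    loss = 0.0
    for x, y in pairs:
        x_p, y_p = random_patch_pair(x, y, scale)
        loss = loss + F.l1_loss(model(x_p), y_p)  # || SISR(x) - y ||, assuming an l1 norm
    loss.backward()
    optimizer.step()
    return float(loss)
```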

3.1.2. RefSR Model

In order to train RefSR models, data pairs $D_r$ that comprise an input $X$, a reference $R$, and supervision $Y$ (i.e., $D_r = \{X, R, Y\}$) are required. Thanks to the additional input $R$, it is possible to construct more diverse data pairs than for SISR models, and we propose four methods to construct $D_r$: $D_r^{LR}$, $D_r^{Pse}$, $D_r^{Ref1}$, and $D_r^{Ref2}$. First, $D_r^{LR}$ is composed of a downsampled LR image, a reference image, and the input LR image, and is denoted by $D_r^{LR} = \{X\!: I_{LR}{\downarrow},\ R\!: I_{Ref},\ Y\!: I_{LR}\}$ (cf. $D_s^{LR}$). Similarly, we define the second data pair $D_r^{Pse} = \{X\!: \bar{I}_{HR}{\downarrow},\ R\!: I_{Ref},\ Y\!: \bar{I}_{HR}\}$ by adding $I_{Ref}$ to $D_s^{Pse}$ as the reference $R$. In addition, we can utilize a downsampled reference image $I_{Ref}{\downarrow}$ and the original one $I_{Ref}$ as the input $X$ and supervision $Y$, respectively, with the input LR image as the reference $R$, to make the third data pair $D_r^{Ref1} = \{X\!: I_{Ref}{\downarrow},\ R\!: I_{LR},\ Y\!: I_{Ref}\}$. Finally, we replace $I_{LR}$ in $D_r^{Ref1}$ with $\bar{I}_{HR}$ to make the last data pair $D_r^{Ref2} = \{X\!: I_{Ref}{\downarrow},\ R\!: \bar{I}_{HR},\ Y\!: I_{Ref}\}$. Note that $D_r^{Ref1}$ and $D_r^{Ref2}$ extend $D_s^{Ref}$ by adding $I_{LR}$ and $\bar{I}_{HR}$ as the reference, respectively. With these data pairs, we can update the network parameters $\theta_r$ of the pre-trained RefSR model $\mathrm{RefSR}_{\theta_r}(\cdot)$ with the following loss function:
$\mathcal{L}_{\theta_r}(x, r, y) = \mathbb{E}\,\|\mathrm{RefSR}_{\theta_r}(x, r) - y\|,$
where $x$, $r$, and $y$ are patches extracted from $X$, $R$, and $Y$ in $D_r$. As with SISR models, the data pairs can be used individually or in combination for online learning. The data pairs for online learning of SISR and RefSR models are summarized in Table 1.
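Analogously, a RefSR model can be adapted on triplets $(X, R, Y)$. The sketch below follows the same assumptions as the SISR sketch above, with a generic `model(x, r)` call signature and full-image rather than patch-based updates for brevity.

```python
import torch.nn.functional as F

def refsr_online_step(model, optimizer, triplets):
    """One online-learning update of a pre-trained RefSR model on test-time triplets.

    triplets: list of (X, R, Y) tensors, e.g.
        (downsample(i_lr),  i_ref, i_lr)    # D_r^LR
        (downsample(p_hr),  i_ref, p_hr)    # D_r^Pse
        (downsample(i_ref), i_lr,  i_ref)   # D_r^Ref1
        (downsample(i_ref), p_hr,  i_ref)   # D_r^Ref2
    """
    optimizer.zero_grad()
    loss = 0.0
    for x, r, y in triplets:
        loss = loss + F.l1_loss(model(x, r), y)  # || RefSR(x, r) - y ||
    loss.backward()
    optimizer.step()
    return float(loss)
```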

3.2. Inference

With the updated parameters of SISR or RefSR models at the test stage, we estimate the final super-resolved output image as follows:
$\bar{O}_s = \mathrm{SISR}_{\theta_s}(I_{LR}), \qquad \bar{O}_r = \mathrm{RefSR}_{\theta_r}(I_{LR}, I_{Ref}).$
Notably, unlike RefSR models, SISR models use the reference image only during the online learning phase; the reference image is not used for the final inference.
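In code, the final inference step is a single forward pass with the adapted parameters (a sketch; as noted above, the reference image is passed only to the RefSR model):

```python
import torch

@torch.no_grad()
def infer(model, i_lr, i_ref=None):
    """Final super-resolution with the online-updated parameters.
    SISR: infer(sisr_model, i_lr); RefSR: infer(refsr_model, i_lr, i_ref)."""
    model.eval()
    return model(i_lr) if i_ref is None else model(i_lr, i_ref)
```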

4. Experiments

In this section, we describe implementation details and present both quantitative and qualitative comparisons with existing methods. We also provide various empirical analyses, including experiments on the effect of the similarity between the reference and input LR images and experiments with LR images produced by non-bicubic degradation.

4.1. Implementation Details

For both SISR and RefSR models, we used the CUFED dataset [33], which consists of 11,871 pairs of input and reference images, to pre-train the models for ×4 upscaling. As baseline SISR models, we adopted light-weight versions of SimpleNet [22], RCAN [7], and EDSR [5] for fast execution: the number of residual blocks is reduced from 20 to 6 for RCAN and from 32 to 16 for EDSR, with 64 feature dimensions. Each model is trained for 100 epochs with a batch size of 32, using all training data, including both HR and reference images. For RefSR models, SSEN [20] and TTSR [21] are adopted; SSEN is trained for 200 epochs with a batch size of 32, and TTSR is trained for 200 epochs with a batch size of 9. In the online learning phase, the CUFED5 dataset [33] is used, which consists of 126 groups of images, each containing a HR image and five reference images with different levels of similarity. Images are augmented with random cropping (128 × 128), rotation, and flipping. The initial learning rate for Adam is set to $1 \times 10^{-4}$, and we multiply it by 0.1 when the loss stops decreasing [22]. Our method is implemented in PyTorch on Ubuntu 16.04 with a single RTX 2080 GPU.
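For reference, the optimizer and learning-rate schedule described above can be set up as in the sketch below (Adam at $1 \times 10^{-4}$ with a 0.1 decay once the online-learning loss stops decreasing); the plateau patience value is an assumption, as the exact criterion is not specified here.

```python
import torch

def make_online_optimizer(model, lr=1e-4):
    """Adam with an initial learning rate of 1e-4; the rate is multiplied by 0.1
    once the monitored online-learning loss stops decreasing."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1, patience=5)  # patience value is illustrative
    return optimizer, scheduler

# Inside the online-learning loop:
# loss = sisr_online_step(model, optimizer, pairs)   # or refsr_online_step(...)
# scheduler.step(loss)
```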

4.2. Experimental Results

For all experiments, we trained the SISR and RefSR models following their original configurations to obtain the baseline models, and then applied the proposed online learning to verify the effectiveness of our algorithm. Note that our method does not introduce any additional modules to the baseline models. All models are evaluated on the CUFED5 test set in terms of PSNR, SSIM, and LPIPS [12], where the LPIPS value is measured with the VGG model.
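For completeness, the PSNR metric can be computed as in the minimal sketch below; SSIM and LPIPS are computed with their standard implementations, LPIPS with VGG features as noted above.

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio (dB) between two images with values in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return float(10.0 * torch.log10(max_val ** 2 / mse))
```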
Table 2 shows SISR online learning results over the baseline models. Online learning with only $D_s^{LR}$ degrades performance in all models because the size of $D_s^{LR}$ is too small (30 × 20) to exploit abundant information. In contrast, $D_s^{Pse}$ contains plenty of HR details useful for the inference and, thus, performance is consistently improved with $D_s^{Pse}$, as in [38]. Notably, we see further improvement by combining $D_s^{LR}$ with $D_s^{Pse}$. Different from the results using $D_s^{LR}$ only, $D_s^{LR} + D_s^{Pse}$ can effectively benefit from $D_s^{LR}$ for self-similarity while keeping knowledge of HR information from $D_s^{Pse}$.
RefSR online learning results on the SISR baseline models are shown in Table 3. In RefSR online learning, baseline models always show performance improvements with $D_s^{Ref}$ because it contains real high-frequency details not available in $D_s^{Pse}$. We achieve the best results by using both $D_s^{LR}$ and $D_s^{Ref}$, rather than using either $D_s^{LR}$ or $D_s^{Ref}$ alone. Similar results are observed with RefSR online learning on the RefSR models, where the best performance is mostly achieved with $D_r^{Pse} + D_r^{Ref2}$, as shown in Table 4. Figure 1 shows qualitative comparisons between existing methods and ours. Note that Figure 1f,k show superior performance over their baseline counterparts Figure 1e,j.

4.3. Empirical Analyses

  • Reference Similarity
We first analyze the effect of similarity of reference images on online learning. The CUFED5 dataset [33] provides five similarity levels, from the lowest (i.e., XL) to the highest (i.e., XH), depending on the content similarity between the reference and LR images. For both SISR and RefSR models, performance improvement is proportional to the level of similarity, and the best performance is obtained with the reference with the highest similarity XH. This result is expected, because XH reference images contain a large amount of real high-frequency details closely related to the lost details of LR images. Therefore, online learning with XH reference images can train baseline models with strong and relevant HR guidance. On the contrary, the amount of relevant information is reduced with decreasing similarity of the reference images; therefore, performance improvement also decreases, as shown in Table 3 and Table 4.
  • Pseudo HR vs. LR for Supervision
For RefSR online learning on SISR models in Table 3, we can compare the results of $D_s^{LR} + D_s^{Ref}$ and $D_s^{Pse} + D_s^{Ref}$. With high similarity levels (i.e., XH and H), $D_s^{LR} + D_s^{Ref}$ shows better performance, while $D_s^{Pse} + D_s^{Ref}$ works better for low similarity levels (i.e., M, L, and XL). For XH and H reference images, baseline networks can exploit highly relevant information from them, as discussed above. However, the role of $D_s^{Ref}$ is weakened with irrelevant reference images, while that of the combined data, $D_s^{LR}$ or $D_s^{Pse}$, is relatively emphasized. Therefore, the performance improvement from $D_s^{Ref}$ combined with $D_s^{Pse}$ is superior, thanks to the knowledge from the pre-trained model (i.e., $D_s^{Pse}$), compared to self-supervision (i.e., $D_s^{LR}$). Meanwhile, for RefSR online learning on RefSR models, fine-tuning with $\bar{I}_{HR}$ achieves better overall performance than fine-tuning with $I_{LR}$; in other words, for all similarity levels, $D_r^{Pse} + D_r^{Ref2}$ shows better performance than $D_r^{LR} + D_r^{Ref1}$, as reported in Table 4. The reason is that $\bar{I}_{HR}$ has a resolution more similar to that of $I_{Ref}$ than $I_{LR}$ does; thus, it is much easier to align and combine with the information in the reference image.
  • Non-Bicubic Degradation
We further validate our algorithm on LR images with non-bicubic degradation for both the RCAN (SISR) and TTSR (RefSR) models. For non-bicubic ×4 degradation, we utilized isotropic ($g_w$) and anisotropic ($g_{ani}$) Gaussian kernels of width $w$ with direct ($g^d$) and bicubic ($g^b$) subsampling, as presented in MZSR [36]. Table 5 shows that RCAN and TTSR pre-trained with bicubic degradation produce inferior SR results because they cannot handle non-bicubic degradation. However, both models achieve substantial performance gains with the proposed online learning if the non-bicubic degradation model is given during online learning (i.e., the non-blind setting). Moreover, RCAN and TTSR can also be improved by online learning in a blind manner, conducted using inputs obtained by downsampling with a random kernel [36]; a sketch of this degradation procedure is given after this paragraph. Figure 2 shows qualitative non-bicubic comparisons between existing methods and ours. Therefore, we conclude that the proposed method can handle any type of degradation (i.e., bicubic and non-bicubic), regardless of the awareness of the degradation kernel (i.e., blind and non-blind).
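To make the degradation setting concrete, the following sketch builds an isotropic or anisotropic Gaussian kernel and applies a blur-then-direct-subsampling degradation. It is our own simplified implementation in the spirit of MZSR [36]; the kernel size and the blind-case sampling strategy are assumptions.

```python
import math
import torch
import torch.nn.functional as F

def gaussian_kernel(size=15, sigma_x=2.0, sigma_y=None, theta=0.0):
    """Isotropic (sigma_y=None) or anisotropic rotated 2D Gaussian blur kernel."""
    sigma_y = sigma_x if sigma_y is None else sigma_y
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2.0
    yy, xx = torch.meshgrid(ax, ax, indexing="ij")
    xr = xx * math.cos(theta) + yy * math.sin(theta)    # rotate coordinates by theta
    yr = -xx * math.sin(theta) + yy * math.cos(theta)
    k = torch.exp(-0.5 * ((xr / sigma_x) ** 2 + (yr / sigma_y) ** 2))
    return k / k.sum()

def degrade(img, kernel, scale=4):
    """Blur with `kernel`, then directly subsample by `scale` (a g^d-style degradation)."""
    c = img.shape[1]
    weight = kernel.expand(c, 1, *kernel.shape).contiguous()    # depthwise blur kernel
    pad = kernel.shape[-1] // 2
    blurred = F.conv2d(F.pad(img, [pad] * 4, mode="reflect"), weight, groups=c)
    return blurred[:, :, ::scale, ::scale]                      # direct subsampling

# Non-blind: degrade with the known test kernel; blind: sample sigma_x, sigma_y, and
# theta at random for each online-learning update.
```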

5. Conclusions

We have proposed an online learning algorithm for RefSR to exploit various types of data for network adaptation in the test stage. The proposed method has brought significant performance improvements to both SISR and RefSR models without introducing any additional network parameters. Specifically, various types of data pairs are proposed using input LR, pseudo-HR, and reference HR images, and the role of each data pair is verified with different similarity levels of the reference images. Extensive experimental results demonstrate the validity, efficiency, and versatility of the proposed algorithm.

Author Contributions

Conceptualization, D.C.; formal analysis, J.P.; investigation, T.-H.K. and D.C.; data curation, B.C.; writing—original draft preparation, D.C.; writing—review and editing, T.-H.K. and D.C.; supervision, D.C. and J.P.; funding acquisition, D.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by BK21 FOUR Program by Chungnam National University Research Grant, 2021–2022.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 391–407. [Google Scholar]
  2. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883. [Google Scholar]
  3. Kim, J.; Lee, J.K.; Lee, K.M. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
  4. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-Recursive Convolutional Network for Image Super-Resolution. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1637–1645. [Google Scholar]
  5. Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced Deep Residual Networks for Single Image Super-Resolution. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1132–1140. [Google Scholar]
  6. Dai, T.; Cai, J.; Zhang, Y.; Xia, S.T.; Zhang, L. Second-Order Attention Network for Single Image Super-Resolution. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11065–11074. [Google Scholar]
  7. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. In Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
  8. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual Dense Network for Image Super-Resolution. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 2472–2481. [Google Scholar]
  9. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 2016, 38, 295–307. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  10. Sajjadi, M.S.M.; Scholkopf, B.; Hirsch, M. EnhanceNet: Single Image Super-Resolution through Automated Texture Synthesis. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4501–4510. [Google Scholar]
  11. Ledig, C.; Theis, L.; Huszar, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
  12. Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 694–711. [Google Scholar]
  13. Liu, C.; Sun, D. A Bayesian approach to adaptive video super resolution. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011; pp. 209–216. [Google Scholar]
  14. Caballero, J.; Ledig, C.; Aitken, A.P.; Acosta, A.; Totz, J.; Wang, Z.; Shi, W. Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2848–2857. [Google Scholar]
  15. Yue, H.; Sun, X.; Yang, J.; Wu, F. Landmark Image Super-Resolution by Retrieving Web Images. IEEE Trans. Image Process. (TIP) 2013, 22, 4865–4878. [Google Scholar]
  16. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; pp. 1150–1157. [Google Scholar]
  17. Wu, Y.; Ma, W.; Gong, M.; Su, L.; Jiao, L. A Novel Point-Matching Algorithm Based on Fast Sample Consensus for Image Registration. IEEE Geosci. Remote Sens. Lett. 2014, 12, 43–47. [Google Scholar] [CrossRef]
  18. Sarlin, P.E.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperGlue: Learning Feature Matching With Graph Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4938–4947. [Google Scholar]
  19. Zheng, H.; Ji, M.; Han, L.; Xu, Z.; Wang, H.; Liu, Y.; Fang, L. Learning Cross-scale Correspondence and Patch-based Synthesis for Reference-based Super-Resolution. In Proceedings of the 28th British Machine Vision Conference (BMVC), Imperial College, London, 4–7 September 2017; pp. 138.1–138.13. [Google Scholar]
  20. Shim, G.; Park, J.; Kweon, I.S. Robust Reference-Based Super-Resolution With Similarity-Aware Deformable Convolution. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8425–8434. [Google Scholar]
  21. Yang, F.; Yang, H.; Fu, J.; Lu, H.; Guo, B. Learning Texture Transformer Network for Image Super-Resolution. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5790–5799. [Google Scholar]
  22. Shocher, A.; Cohen, N.; Irani, M. “Zero-Shot” Super-Resolution using Deep Internal Learning. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3118–3126. [Google Scholar]
  23. Glasner, D.; Bagon, S.; Irani, M. Super-resolution from a single image. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 349–356. [Google Scholar]
  24. Zontak, M.; Irani, M. Internal statistics of a single natural image. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011; pp. 977–984. [Google Scholar]
  25. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a Deep Convolutional Network for Image Super-Resolution. In Proceedings of the 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 184–199. [Google Scholar]
  26. Lai, W.; Huang, J.; Ahuja, N.; Yang, M. Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5835–5843. [Google Scholar]
  27. Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Loy, C.C. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In Proceedings of the 2018 European Conference on Computer Vision Workshops (ECCVW), Munich, Germany, 8–14 September 2018; pp. 63–79. [Google Scholar]
  28. Dosovitskiy, A.; Fischer, P.; Ilg, E.; Häusser, P.; Hazirbas, C.; Golkov, V.; van der Smagt, P.; Cremers, D.; Brox, T. FlowNet: Learning Optical Flow with Convolutional Networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 2758–2766. [Google Scholar]
  29. Ilg, E.; Mayer, N.; Saikia, T.; Keuper, M.; Dosovitskiy, A.; Brox, T. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1647–1655. [Google Scholar]
  30. Zheng, H.; Ji, M.; Wang, H.; Liu, Y.; Fang, L. CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping. In Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 88–104. [Google Scholar]
  31. Gatys, L.; Ecker, A.S.; Bethge, M. Texture Synthesis Using Convolutional Neural Networks. In Proceedings of the Twenty-ninth Conference on Neural Information Processing Systems (NeurIPS), Montreal, Quebec, Canada, 7–12 December 2015; pp. 262–270. [Google Scholar]
  32. Gatys, L.A.; Ecker, A.S.; Bethge, M. Image Style Transfer Using Convolutional Neural Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423. [Google Scholar]
  33. Zhang, Z.; Wang, Z.; Lin, Z.; Qi, H. Image Super-Resolution by Neural Texture Transfer. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7974–7983. [Google Scholar]
  34. Jiang, Y.; Chan, K.C.; Wang, X.; Loy, C.C.; Liu, Z. Robust Reference-based Super-Resolution via C2-Matching. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2103–2112. [Google Scholar]
  35. Lu, L.; Li, W.; Tao, X.; Lu, J.; Jia, J. MASA-SR: Matching Acceleration and Spatial Adaptation for Reference-Based Image Super-Resolution. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 6368–6377. [Google Scholar]
  36. Soh, J.W.; Cho, S.; Cho, N.I. Meta-Transfer Learning for Zero-Shot Super-Resolution. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 3516–3525. [Google Scholar]
  37. Park, S.; Yoo, J.; Kim, J.; Cho, D.; Kim, T.H. Fast Adaptation to Super-Resolution Networks via Meta-learning. In Proceedings of the 16th European Conference on Computer Vision (ECCV), Glasgow, United Kingdom, 23–28 August 2020; pp. 754–769. [Google Scholar]
  38. Yoo, J.; Kim, T.H. Self-Supervised Adaptation for Video Super-Resolution. arXiv 2021, arXiv:2103.10081. [Google Scholar]
Figure 1. Qualitative comparison of RefSR methods on the CUFED5 dataset. (a) GT images. (b–k) Results of Bicubic, SimpleNet [22], EDSR [5], RCAN [7], Ours + RCAN [7], SRNTT [33], SRNTT-ℓ2 [33], SSEN [20], TTSR-rec [21], and Ours + TTSR-rec [21], respectively.
Figure 2. Qualitative comparison of RefSR methods on the non-bicubic CUFED5 dataset. From the top, the kernels $g_{0.2}^d$, $g_{2.0}^d$, $g_{ani}^d$, and $g_{1.3}^b$ were used. (a) GT images. (b) RCAN [7]. (c) Ours + RCAN [7]. (d) GT images. (e) TTSR [21]. (f) Ours + TTSR [21] using $D_r^{Ref1}$. (g) Ours + TTSR [21] using $D_r^{Ref2}$.
Table 1. Online learning data pairs for SISR and RefSR models.

Model | SISR | SISR | SISR | RefSR | RefSR | RefSR | RefSR
Data pair | $D_s^{LR}$ | $D_s^{Pse}$ | $D_s^{Ref}$ | $D_r^{LR}$ | $D_r^{Pse}$ | $D_r^{Ref1}$ | $D_r^{Ref2}$
$X$ | $I_{LR}{\downarrow}$ | $\bar{I}_{HR}{\downarrow}$ | $I_{Ref}{\downarrow}$ | $I_{LR}{\downarrow}$ | $\bar{I}_{HR}{\downarrow}$ | $I_{Ref}{\downarrow}$ | $I_{Ref}{\downarrow}$
$R$ | – | – | – | $I_{Ref}$ | $I_{Ref}$ | $I_{LR}$ | $\bar{I}_{HR}$
$Y$ | $I_{LR}$ | $\bar{I}_{HR}$ | $I_{Ref}$ | $I_{LR}$ | $\bar{I}_{HR}$ | $I_{Ref}$ | $I_{Ref}$
Table 2. SISR online learning results on SISR models.

Model | Method | PSNR | SSIM | LPIPS
SRCNN [9] | Pre-trained | 25.475 | 0.737 | 0.3369
SRCNN [9] | $D_s^{LR}$ | 25.379 | 0.732 | 0.3388
SRCNN [9] | $D_s^{Pse}$ | 25.563 | 0.741 | 0.3273
SRCNN [9] | $D_s^{LR} + D_s^{Pse}$ | 25.559 | 0.741 | 0.3275
VDSR [3] | Pre-trained | 25.660 | 0.746 | 0.3332
VDSR [3] | $D_s^{LR}$ | 25.500 | 0.740 | 0.3229
VDSR [3] | $D_s^{Pse}$ | 25.709 | 0.748 | 0.3256
VDSR [3] | $D_s^{LR} + D_s^{Pse}$ | 25.734 | 0.749 | 0.3245
SimpleNet [22] | Pre-trained | 25.800 | 0.753 | 0.3267
SimpleNet [22] | $D_s^{LR}$ | 25.727 | 0.750 | 0.3128
SimpleNet [22] | $D_s^{Pse}$ | 25.941 | 0.757 | 0.3152
SimpleNet [22] | $D_s^{LR} + D_s^{Pse}$ | 25.958 | 0.757 | 0.3136
EDSR [5] | Pre-trained | 26.198 | 0.771 | 0.2955
EDSR [5] | $D_s^{LR}$ | 26.132 | 0.765 | 0.2897
EDSR [5] | $D_s^{Pse}$ | 26.422 | 0.774 | 0.2956
EDSR [5] | $D_s^{LR} + D_s^{Pse}$ | 26.440 | 0.775 | 0.2932
RCAN [7] | Pre-trained | 26.243 | 0.774 | 0.2906
RCAN [7] | $D_s^{LR}$ | 26.147 | 0.767 | 0.2883
RCAN [7] | $D_s^{Pse}$ | 26.500 | 0.777 | 0.2912
RCAN [7] | $D_s^{LR} + D_s^{Pse}$ | 26.512 | 0.778 | 0.2892
Table 3. RefSR online learning results on SISR models. Each cell reports PSNR/SSIM/LPIPS for the reference similarity levels XL, L, M, H, and XH.

Model | Method | XL | L | M | H | XH
Ours + SimpleNet [22] | $D_s^{Ref}$ | 25.888/0.755/0.3159 | 25.932/0.756/0.3153 | 25.925/0.757/0.3148 | 25.990/0.758/0.3140 | 26.046/0.760/0.3127
Ours + SimpleNet [22] | $D_s^{LR} + D_s^{Ref}$ | 25.894/0.755/0.3134 | 25.960/0.757/0.3119 | 25.950/0.758/0.3115 | 26.003/0.758/0.3107 | 26.058/0.761/0.3093
Ours + SimpleNet [22] | $D_s^{Pse} + D_s^{Ref}$ | 25.936/0.757/0.3147 | 25.963/0.758/0.3145 | 25.979/0.758/0.3143 | 25.985/0.758/0.3140 | 26.018/0.759/0.3134
Ours + SimpleNet [22] | $D_s^{LR} + D_s^{Pse} + D_s^{Ref}$ | 25.973/0.758/0.3131 | 25.991/0.758/0.3130 | 25.997/0.759/0.3130 | 26.010/0.759/0.3129 | 26.049/0.760/0.3122
Ours + EDSR [5] | $D_s^{Ref}$ | 26.354/0.772/0.2959 | 26.418/0.773/0.2937 | 26.417/0.774/0.2932 | 26.512/0.776/0.2922 | 26.645/0.780/0.2888
Ours + EDSR [5] | $D_s^{LR} + D_s^{Ref}$ | 26.385/0.773/0.2889 | 26.438/0.775/0.2875 | 26.444/0.775/0.2875 | 26.553/0.777/0.2861 | 26.699/0.782/0.2833
Ours + EDSR [5] | $D_s^{Pse} + D_s^{Ref}$ | 26.452/0.775/0.2949 | 26.467/0.775/0.2944 | 26.500/0.776/0.2935 | 26.497/0.776/0.2938 | 26.559/0.778/0.2926
Ours + EDSR [5] | $D_s^{LR} + D_s^{Pse} + D_s^{Ref}$ | 26.462/0.775/0.2925 | 26.484/0.776/0.2922 | 26.508/0.776/0.2916 | 26.522/0.776/0.2917 | 26.577/0.778/0.2902
Ours + RCAN [7] | $D_s^{Ref}$ | 26.402/0.773/0.2919 | 26.465/0.775/0.2906 | 26.465/0.775/0.2900 | 26.581/0.778/0.2886 | 26.703/0.782/0.2856
Ours + RCAN [7] | $D_s^{LR} + D_s^{Ref}$ | 26.418/0.774/0.2862 | 26.499/0.777/0.2853 | 26.505/0.777/0.2845 | 26.635/0.780/0.2828 | 26.810/0.785/0.2796
Ours + RCAN [7] | $D_s^{Pse} + D_s^{Ref}$ | 26.511/0.777/0.2908 | 26.547/0.778/0.2901 | 26.567/0.778/0.2901 | 26.589/0.779/0.2895 | 26.634/0.781/0.2887
Ours + RCAN [7] | $D_s^{LR} + D_s^{Pse} + D_s^{Ref}$ | 26.543/0.778/0.2890 | 26.562/0.779/0.2885 | 26.574/0.779/0.2884 | 26.607/0.780/0.2877 | 26.681/0.782/0.2862
Table 4. RefSR online learning results on RefSR models. Each cell reports PSNR/SSIM/LPIPS for the reference similarity levels XL, L, M, H, and XH.

Model | Method | XL | L | M | H | XH
SRNTT [33] | Pre-trained | 25.14/0.729/0.2476 | 25.07/0.720/0.2410 | 25.06/0.728/0.2354 | 25.13/0.734/0.2294 | 25.17/0.734/0.2099
SRNTT-ℓ2 [33] | Pre-trained | 25.87/0.757/0.2949 | 25.88/0.758/0.2916 | 25.90/0.758/0.2893 | 25.97/0.760/0.2856 | 26.06/0.765/0.2758
SSEN [20] | Pre-trained | 26.156/0.768/0.2979 | 26.151/0.768/0.2980 | 26.149/0.768/0.2979 | 26.154/0.768/0.2977 | 26.152/0.769/0.2976
Ours + SSEN [20] | $D_r^{LR}$ | 26.109/0.764/0.2879 | 26.107/0.764/0.2881 | 26.116/0.764/0.2889 | 26.108/0.764/0.2883 | 26.112/0.764/0.2884
Ours + SSEN [20] | $D_r^{Pse}$ | 26.434/0.774/0.2951 | 26.459/0.775/0.2946 | 26.480/0.775/0.2944 | 26.480/0.775/0.2940 | 26.527/0.777/0.2930
Ours + SSEN [20] | $D_r^{Ref1}$ | 26.226/0.767/0.2931 | 26.206/0.768/0.2925 | 26.241/0.768/0.2921 | 26.284/0.769/0.2903 | 26.276/0.770/0.2895
Ours + SSEN [20] | $D_r^{Ref2}$ | 26.343/0.771/0.2946 | 26.383/0.772/0.2936 | 26.475/0.774/0.2920 | 26.509/0.775/0.2911 | 26.675/0.780/0.2874
Ours + SSEN [20] | $D_r^{LR} + D_r^{Ref1}$ | 26.205/0.767/0.2852 | 26.206/0.767/0.2856 | 26.221/0.767/0.2854 | 26.261/0.768/0.2843 | 26.257/0.769/0.2946
Ours + SSEN [20] | $D_r^{Pse} + D_r^{Ref2}$ | 26.392/0.773/0.2955 | 26.460/0.774/0.2942 | 26.475/0.774/0.2946 | 26.505/0.775/0.2935 | 26.568/0.777/0.2924
TTSR-rec [21] | Pre-trained | 26.586/0.783/0.2825 | 26.623/0.785/0.2800 | 26.685/0.787/0.2782 | 26.787/0.789/0.2759 | 27.039/0.799/0.2653
Ours + TTSR-rec [21] | $D_r^{LR}$ | 26.407/0.775/0.2711 | 26.455/0.776/0.2689 | 26.502/0.778/0.2675 | 26.579/0.780/0.2643 | 26.812/0.788/0.2545
Ours + TTSR-rec [21] | $D_r^{Pse}$ | 26.822/0.786/0.2815 | 26.866/0.788/0.2792 | 26.937/0.790/0.2782 | 27.027/0.791/0.2760 | 27.337/0.801/0.2663
Ours + TTSR-rec [21] | $D_r^{Ref1}$ | 26.540/0.778/0.2791 | 26.563/0.781/0.2757 | 26.622/0.782/0.2750 | 26.769/0.785/0.2712 | 26.986/0.794/0.2614
Ours + TTSR-rec [21] | $D_r^{Ref2}$ | 26.658/0.782/0.2818 | 26.717/0.785/0.2788 | 26.836/0.787/0.2757 | 26.959/0.790/0.2730 | 27.383/0.802/0.2578
Ours + TTSR-rec [21] | $D_r^{LR} + D_r^{Ref1}$ | 26.497/0.777/0.2696 | 26.522/0.779/0.2668 | 26.592/0.780/0.2660 | 26.698/0.782/0.2635 | 26.900/0.790/0.2529
Ours + TTSR-rec [21] | $D_r^{Pse} + D_r^{Ref2}$ | 26.845/0.786/0.2816 | 26.877/0.788/0.2796 | 26.980/0.790/0.2780 | 27.056/0.792/0.2760 | 27.400/0.801/0.2663
Table 5. Online learning results with non-bicubic degradation. Rows with kernel "–" correspond to the blind setting with randomly sampled kernels described in the text.

Model | Kernel | Blind | Method | PSNR | SSIM | LPIPS
Ours + EDSR [5] | $g_{0.2}^d$ | – | Pre-trained | 18.754 | 0.534 | 0.4068
Ours + EDSR [5] | $g_{0.2}^d$ | Non-blind | $D_s^{LR} + D_s^{Ref}$ | 24.335 | 0.726 | 0.3035
Ours + EDSR [5] | $g_{2.0}^d$ | – | Pre-trained | 21.387 | 0.606 | 0.3771
Ours + EDSR [5] | $g_{2.0}^d$ | Non-blind | $D_s^{LR} + D_s^{Ref}$ | 26.263 | 0.772 | 0.2665
Ours + EDSR [5] | $g_{ani}^d$ | – | Pre-trained | 21.364 | 0.593 | 0.3639
Ours + EDSR [5] | $g_{ani}^d$ | Non-blind | $D_s^{LR} + D_s^{Ref}$ | 26.164 | 0.765 | 0.2764
Ours + EDSR [5] | $g_{1.3}^b$ | – | Pre-trained | 25.595 | 0.741 | 0.3288
Ours + EDSR [5] | $g_{1.3}^b$ | Non-blind | $D_s^{LR} + D_s^{Ref}$ | 26.655 | 0.780 | 0.2663
Ours + EDSR [5] | – | – | Pre-trained | 21.896 | 0.608 | 0.3892
Ours + EDSR [5] | – | Blind | $D_s^{LR} + D_s^{Ref}$ | 24.354 | 0.697 | 0.3459
Ours + RCAN [7] | $g_{0.2}^d$ | – | Pre-trained | 17.938 | 0.497 | 0.4111
Ours + RCAN [7] | $g_{0.2}^d$ | Non-blind | $D_s^{LR} + D_s^{Ref}$ | 24.532 | 0.737 | 0.2910
Ours + RCAN [7] | $g_{2.0}^d$ | – | Pre-trained | 21.131 | 0.597 | 0.3335
Ours + RCAN [7] | $g_{2.0}^d$ | Non-blind | $D_s^{LR} + D_s^{Ref}$ | 26.545 | 0.783 | 0.2586
Ours + RCAN [7] | $g_{ani}^d$ | – | Pre-trained | 21.198 | 0.587 | 0.3609
Ours + RCAN [7] | $g_{ani}^d$ | Non-blind | $D_s^{LR} + D_s^{Ref}$ | 26.414 | 0.775 | 0.2679
Ours + RCAN [7] | $g_{1.3}^b$ | – | Pre-trained | 25.484 | 0.738 | 0.3314
Ours + RCAN [7] | $g_{1.3}^b$ | Non-blind | $D_s^{LR} + D_s^{Ref}$ | 26.909 | 0.790 | 0.2597
Ours + RCAN [7] | – | – | Pre-trained | 21.798 | 0.606 | 0.3914
Ours + RCAN [7] | – | Blind | $D_s^{LR} + D_s^{Ref}$ | 24.277 | 0.692 | 0.3480
Ours + SSEN [20] | $g_{0.2}^d$ | – | Pre-trained | 18.538 | 0.521 | 0.4142
Ours + SSEN [20] | $g_{0.2}^d$ | Non-blind | $D_r^{Ref1}$ | 23.565 | 0.694 | 0.3431
Ours + SSEN [20] | $g_{0.2}^d$ | Non-blind | $D_r^{Ref2}$ | 24.155 | 0.720 | 0.3239
Ours + SSEN [20] | $g_{2.0}^d$ | – | Pre-trained | 20.706 | 0.586 | 0.3541
Ours + SSEN [20] | $g_{2.0}^d$ | Non-blind | $D_r^{Ref1}$ | 25.436 | 0.741 | 0.2891
Ours + SSEN [20] | $g_{2.0}^d$ | Non-blind | $D_r^{Ref2}$ | 26.105 | 0.765 | 0.2838
Ours + SSEN [20] | $g_{ani}^d$ | – | Pre-trained | 21.269 | 0.590 | 0.3633
Ours + SSEN [20] | $g_{ani}^d$ | Non-blind | $D_r^{Ref1}$ | 25.213 | 0.728 | 0.3062
Ours + SSEN [20] | $g_{ani}^d$ | Non-blind | $D_r^{Ref2}$ | 25.882 | 0.753 | 0.2953
Ours + SSEN [20] | $g_{1.3}^b$ | – | Pre-trained | 25.522 | 0.740 | 0.3273
Ours + SSEN [20] | $g_{1.3}^b$ | Non-blind | $D_r^{Ref1}$ | 26.010 | 0.758 | 0.2773
Ours + SSEN [20] | $g_{1.3}^b$ | Non-blind | $D_r^{Ref2}$ | 26.569 | 0.778 | 0.2789
Ours + SSEN [20] | – | – | Pre-trained | 21.836 | 0.606 | 0.3881
Ours + SSEN [20] | – | Blind | $D_r^{Ref1}$ | 23.953 | 0.676 | 0.3654
Ours + SSEN [20] | – | Blind | $D_r^{Ref2}$ | 24.201 | 0.685 | 0.3608
Ours + TTSR-rec [21] | $g_{0.2}^d$ | – | Pre-trained | 18.415 | 0.524 | 0.4039
Ours + TTSR-rec [21] | $g_{0.2}^d$ | Non-blind | $D_r^{Ref1}$ | 23.489 | 0.688 | 0.3423
Ours + TTSR-rec [21] | $g_{0.2}^d$ | Non-blind | $D_r^{Ref2}$ | 24.168 | 0.717 | 0.3232
Ours + TTSR-rec [21] | $g_{2.0}^d$ | – | Pre-trained | 21.211 | 0.609 | 0.3127
Ours + TTSR-rec [21] | $g_{2.0}^d$ | Non-blind | $D_r^{Ref1}$ | 25.911 | 0.760 | 0.2647
Ours + TTSR-rec [21] | $g_{2.0}^d$ | Non-blind | $D_r^{Ref2}$ | 26.561 | 0.784 | 0.2624
Ours + TTSR-rec [21] | $g_{ani}^d$ | – | Pre-trained | 21.199 | 0.596 | 0.3367
Ours + TTSR-rec [21] | $g_{ani}^d$ | Non-blind | $D_r^{Ref1}$ | 25.512 | 0.741 | 0.2841
Ours + TTSR-rec [21] | $g_{ani}^d$ | Non-blind | $D_r^{Ref2}$ | 26.199 | 0.768 | 0.2754
Ours + TTSR-rec [21] | $g_{1.3}^b$ | – | Pre-trained | 26.147 | 0.767 | 0.2912
Ours + TTSR-rec [21] | $g_{1.3}^b$ | Non-blind | $D_r^{Ref1}$ | 26.599 | 0.781 | 0.2471
Ours + TTSR-rec [21] | $g_{1.3}^b$ | Non-blind | $D_r^{Ref2}$ | 26.989 | 0.796 | 0.2471
Ours + TTSR-rec [21] | – | – | Pre-trained | 21.820 | 0.615 | 0.3603
Ours + TTSR-rec [21] | – | Blind | $D_r^{Ref1}$ | 23.928 | 0.672 | 0.3535
Ours + TTSR-rec [21] | – | Blind | $D_r^{Ref2}$ | 24.010 | 0.684 | 0.3461
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

Chae, B.; Park, J.; Kim, T.-H.; Cho, D. Online Learning for Reference-Based Super-Resolution. Electronics 2022, 11, 1064. https://doi.org/10.3390/electronics11071064