Article

High-Magnification Super-Resolution Reconstruction of Image with Multi-Task Learning

1 Faculty of Automation and Information Engineering, Xi’an University of Technology, Xi’an 710048, China
2 Institute of Electronic and Information Engineering, Ankang University, Ankang 725000, China
* Author to whom correspondence should be addressed.
Electronics 2022, 11(9), 1412; https://doi.org/10.3390/electronics11091412
Submission received: 2 April 2022 / Revised: 25 April 2022 / Accepted: 26 April 2022 / Published: 28 April 2022
(This article belongs to the Section Computer Science & Engineering)

Abstract

Single-image super-resolution has made great progress with the development of convolutional neural networks, but most current super-resolution methods do not attempt high-magnification reconstruction; they only perform ×2, ×3 and ×4 reconstruction on low-magnification down-sampled images that are not seriously degraded. Motivated by this, this paper proposes a single-image high-magnification super-resolution method that extends the scale factor of image super-resolution to high magnifications. By introducing the idea of multi-task learning, the high-magnification super-resolution process is decomposed into different super-resolution tasks. Each task is trained with its own data, yielding a network model per task. By cascading these task-specific models, a low-resolution image accumulates reconstruction advantages layer by layer, producing the final high-magnification reconstruction. The proposed method outperforms other super-resolution methods in quantitative and qualitative comparisons on benchmark datasets.

1. Introduction

Single-image super-resolution is a classical problem in computer vision. In essence, it is an ill-posed problem, and a better solution can usually be obtained by constraining the solution space with prior information. Many current studies assume that the low-resolution image is obtained by bicubic down-sampling of the label image. However, in practical applications, blur, noise and other degradation factors are inevitable.
Currently, research on single-image super-resolution mainly uses convolutional neural networks to reconstruct images. This kind of method establishes the correspondence between the low-resolution image and the high-resolution image through the feature mapping of a neural network in order to reconstruct the image. Supervised learning methods include [1,2,3,4,5]. These methods take the low-resolution image as the input and the labeled image as the output, learn the mapping from input to output through a neural network, and then apply this mapping to the unknown image to be reconstructed. Supervised learning is the mainstream approach to super-resolution because of its relatively simple learning process and the high quality of its reconstruction results. Methods [6,7,8,9] constructed different super-resolution network models based on the convolutional neural network, methods [3,4] adopted recursive neural network structures, and method [10] introduced a recurrent neural network to predict the final results. The methods [11,12,13] based on the generative adversarial network [14] can generate perceptually convincing super-resolution reconstructions. Weakly supervised learning performs super-resolution reconstruction with unmatched low-resolution and high-resolution images, without defining the degradation conditions. Methods [15,16] realize super-resolution reconstruction without using paired low-resolution and high-resolution images. Unsupervised learning likewise uses only unmatched image pairs for network training. Method [17] achieved super-resolution by training the network on the internal recurrence of patches within the image itself. Method [18] utilizes the prior information of the network itself to perform super-resolution after initializing the network. Method [19] proposes a two-stage multi-task network that focuses on image contrast and local brightness during training. Method [20] proposed a three-layer residual attention network to solve the interaction problem of different dimensional features in the middle layers of the network during training.
All the above methods can achieve good image super-resolution results, but their reconstruction scale reaches at most ×4 magnification. Some methods [21,22,23,24] achieve ×8 magnification, but these are only tentative extensions of super-resolution beyond ×4 magnification; there has been no systematic study of high-magnification super-resolution. For images down-sampled by more than ×8, the loss of texture and detail information is very serious. Current methods are not specially designed for high-magnification super-resolution, so their ability to reconstruct high-magnification images is limited. From the perspective of information content, the information contained in an image down-sampled by more than ×8 drops precipitously, and very little information is available during reconstruction, leading to unsatisfactory results. The high-magnification super-resolution problem is extremely challenging, because hundreds or thousands of pixels must be estimated from a single pixel, and with few prior constraints the result is not unique. Currently, few methods address the high-magnification super-resolution reconstruction of images.
Based on the EDSR [25] method, this paper introduces the idea of multi-task learning and proposes a high-magnification image super-resolution reconstruction method. Firstly, we decompose the high-magnification super-resolution problem into different tasks. Within the multi-task learning framework, each task trains its own network model on specific training data, and each network model performs ×2 magnification. Because the training data used for each model contain different texture characteristics, each model learns a reconstruction ability adapted to those textures. Secondly, by cascading the trained network models, the advantages of the reconstruction results are accumulated step by step, and high-quality, high-magnification super-resolution reconstruction is finally realized. Figure 1 compares the ×32 super-resolution reconstruction results of our method with those of other methods. We choose the image named img072 from the Urban100 dataset and show each version of the full image in Figure 2, which compares the original image and the ×32 down-sampled image with the reconstruction result of our ×32 super-resolution method. The ×32 down-sampled image is stretched to the same size as the original image, which intuitively shows that its texture and detail information are seriously lost.
We test the proposed method on different benchmark datasets, compare it with the current mainstream super-resolution work, and use the peak signal-to-noise ratio (PSNR) as the evaluation index. The experimental results show that our method is better than other methods on the problem of high-magnification super-resolution of images.

2. Related Work

In this section, we mainly introduce the general methods of super-resolution reconstruction and high-magnification super-resolution reconstruction methods.

2.1. General Methods of Super-Resolution Reconstruction

As a pioneering work of convolutional neural network in single image super-resolution, the SRCNN [1] method uses a three-layer network structure to learn the mapping relationship between the input image and output image. Compared with the traditional method, this method improves the calculation accuracy. However, this method uses an interpolated image as input, which increases the calculation amount of the model. Method [2] proposed a sub-pixel convolution layer to enlarge the image through pixel rearrangement, which reduced the computational complexity of the network and achieved better reconstruction quality. These methods are based on the convolutional neural network. Compared with the traditional interpolation methods, they have greatly improved the computational accuracy and reconstruction quality.
The residual network ResNet was proposed by He et al. [26]. It alleviates the vanishing-gradient problem caused by increasing the number of network layers; by stacking residual blocks, the network can reach a much greater depth. Based on the residual network, Kim et al. [7] proposed a very deep network structure, VDSR, which avoids the repeated learning of low-frequency information and reduces the computational complexity of the network model. Lim et al. [25] proposed the EDSR method, which removes the batch normalization layer in the residual block, reduces memory usage, and improves the image reconstruction effect. Ahn et al. [27] proposed a residual cascading method, which uses fewer parameters and operations yet achieves performance comparable to current advanced methods. In the RCAN method proposed by Zhang et al. [28], an attention mechanism is added to residual channels and different weights are given to different channels according to their importance; the network converges better than one that only stacks residual blocks. Liu et al. [29] proposed RFANet to effectively extract hierarchical features on residual branches, and used spatial attention blocks to focus on key spatial image features. Because of the introduction of residual learning, the network can adopt a deeper structure and incorporate other, more refined computational modules, giving it the ability to reconstruct high-frequency details. However, as the number of network layers increases, so does the computational complexity.
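For context, the following is a minimal sketch of the kind of residual block used after EDSR removed batch normalization: two convolutions with a ReLU in between and a skip connection. The channel count and residual scaling factor are illustrative assumptions, not the exact configuration of any cited method.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv-ReLU-Conv residual block without batch normalization."""
    def __init__(self, channels: int = 64, res_scale: float = 1.0):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.res_scale = res_scale

    def forward(self, x):
        # Skip connection: add the (optionally scaled) residual to the input.
        return x + self.res_scale * self.body(x)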
In order to generate super-resolution results that are more consistent with visual perception, Ledig et al. [11] proposed the SRGAN model based on the generative adversarial network, taking the sum of adversarial loss and perceptual loss as the loss function to generate clearer, more natural, and visually consistent reconstructed images. Park et al. [30] proposed the SRFeat method, which adds a discriminator in the feature domain so that adversarial training is carried out between the extracted features and the high-resolution image features, producing a more realistic reconstruction. Sajjadi et al. [31] proposed EnhanceNet and used a feed-forward fully convolutional neural network in adversarial training to increase the high-frequency texture of the reconstructed image, achieving a significant improvement in image quality. The method proposed by Maeda [32] adopts a noise-correction network and a pseudo-paired super-resolution network to achieve super-resolution without paired training datasets. The residual network, as the basis of the generator in the generative adversarial network, usually needs to be pre-trained to stabilize the adversarial training. The generator generates the detailed information of the image through adversarial training, and the discriminator makes the details of the generated image conform to visual experience.
In addition, Zhang et al. [33] use densely connected convolutional base layers in a dense residual structure to extract local features, which makes the training of the network more stable. Zheng et al. [34] efficiently extracted local long and short path features by incorporating augmentation and compression units into the distillation block of the network. Yang et al. [35] proposed a super-resolution texture transformation network, which discovered the deep feature response of low-resolution images and reference images through an attention mechanism, accurately transferred texture features, and achieved texture recovery of reconstructed images.

2.2. High-Magnification Super-Resolution Reconstruction Method

The MSRN method proposed by Li et al. [36] uses multi-scale residual blocks to realize image reconstruction at different magnifications. The Meta-SR method proposed by Hu et al. [37] realizes image super-resolution at arbitrary scales. Guo et al. [21] proposed a dual regression scheme that restricts the space of possible solutions and learns an additional regression mapping to provide closed-loop supervision for the reconstruction process. Lai et al. [22] proposed the LapSRN method using a Laplacian pyramid structure, which enlarges the reconstruction results step by step; this network also abandons interpolating the input images, which improves computational efficiency. Dai et al. [23] proposed a second-order attention network for feature correlation learning and feature expression. Niu et al. [24] proposed a holistic attention network to model the overall relationships among convolution layers, channels, and locations. The deep back-projection network proposed by Haris et al. [38] utilizes iterative up-sampling and down-sampling layers to provide an error feedback mechanism that improves the quality of the reconstruction results. These methods perform super-resolution reconstruction at more than ×4 magnification, or at arbitrary magnifications, which provides references and ideas for high-magnification super-resolution work.
With the application and development of neural networks in super-resolution research, new network models are constantly proposed. Several methods introduced here are based on different network structures and solve the problems existing in current super-resolution research from different perspectives.

2.3. Other Image Enhancement Work

In addition to image super-resolution, image enhancement also includes image denoising, deblurring, and restoration. Image denoising refers to the process of reducing noise in the image. Yu et al. [39] proposed a deep iterative down-up convolutional neural network for image denoising, which repeatedly reduces and increases the resolution of the feature maps. Jin et al. [40] proposed an unsupervised deraining generative adversarial network with self-supervised constraints to address the impact of unsupervised restoration of low-quality images on the final result. Chen et al. [41] proposed a non-blind deblurring network to restore blurred night images. Image restoration is the process of reconstructing or restoring degraded images. Li et al. [42] introduced the concept of disentangled feature learning, realized the feature-level separation of mixed distortions, eliminated the interference between mixed features, and achieved the restoration of images with hybrid distortions.

2.4. Image Quality Assessment

PSNR and SSIM (structural similarity) are usually used for image quality assessment, which can well reflect the difference between the result image and the real image. However, there are two problems in the practical application. Firstly, the original reference image is not available in the practical application. Secondly, they are proposed based on the difference between the distorted image and the original image, ignoring the evaluation of image quality by human vision. Zhou et al. [43] proposed a method combining structural fidelity and statistical naturalness to evaluate image quality. This method not only measures the quantitative index of image quality, but also takes into account the visual experience of the image. Zhou et al. [44] proposed a deep two-stream convolution network, which does not need the original reference image for image quality assessment. Fang et al. [45] designed a convolution neural network including a convolution layer, pooling layer, full connection layer, and regression layer for blind image quality assessment.

3. Proposed Method

The method in this paper is mainly proposed for the high-magnification super-resolution reconstruction of a single image. In this section, we introduce the implementation process of the high-magnification super-resolution reconstruction of a single image in detail. We first introduce the concept of multi-task learning into the problem of high-magnification image super-resolution, and then construct the network structure of different models cascaded under the multi-task learning strategy. Finally, we compare the information amount contained in different magnification down-sampled images, and show the suitability of the method to the high-magnification super-resolution reconstruction of a single image.

3.1. Multi-Task Learning and Different Model Training

Single-task learning refers to designing a network structure for a specific task, and obtaining a deep learning model through dataset training. This process is generally carried out independently. Compared with single-task learning, multi-task learning [46] puts multiple related tasks together for learning, so as to use the useful information contained in multiple learning tasks to help each learning task obtain a more accurate learner. Multi-task learning can share a network model and reduce memory usage. The result can be obtained after one forward calculation, and the model has a fast inference speed. For related tasks, information can be shared to complement each other and improve the performance of different tasks.
The internal implicit relationship among multiple tasks is used by different tasks through parameter sharing, which is the main application of multi-task learning in deep learning. Hard parameter sharing and soft parameter sharing are the two main parameter sharing methods for multi-task learning in deep learning.
The hard parameter sharing method is commonly used in neural networks. The parameters of shared layers are shared by various tasks. Each task has its own independent parameters in a specific task layer, as shown in Figure 3. Through hard parameter sharing, the network model greatly reduces the risk of overfitting. Because multiple tasks are being learned simultaneously by the network model, the hidden layer parameters will capture the common characteristics of each task as much as possible in the optimization process, and the risk of overfitting of a single task in the network training process is greatly reduced.
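As an illustration, the following is a minimal sketch of hard parameter sharing, assuming a small PyTorch model: one shared trunk whose parameters serve all tasks, plus one independent head per task. It only illustrates the sharing pattern and is not the architecture used in this paper.

import torch
import torch.nn as nn

class HardSharingNet(nn.Module):
    def __init__(self, num_tasks: int = 4, channels: int = 64):
        super().__init__()
        # Shared layers: parameters are common to all tasks.
        self.shared = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Task-specific layers: each task keeps its own parameters.
        self.heads = nn.ModuleList(
            [nn.Conv2d(channels, 3, 3, padding=1) for _ in range(num_tasks)]
        )

    def forward(self, x, task_id: int):
        # All tasks reuse the shared trunk, then branch into their own head.
        return self.heads[task_id](self.shared(x))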
Soft parameter sharing is another means of multi-task learning. As shown in Figure 4, different tasks have their own model structures and parameters, and regularization terms are added among model parameters of different tasks through constrained layers to guide parameter optimization in a similar direction.
The high-magnification image super-resolution problem is a relatively complex reconstruction problem: after an image is down-sampled at high magnification, its high-frequency information and texture details are seriously lost. In this paper, by introducing the idea of multi-task learning on the basis of the EDSR [25] network, the high-magnification super-resolution problem is decomposed into different tasks that address different degradation levels, and the network models of these tasks are cascaded to realize high-magnification image super-resolution reconstruction, which effectively improves the detail and quality of the reconstruction results.
As shown in Figure 5, the left side shows the training process of the EDSR benchmark model with a reconstruction magnification of ×2. The low-resolution training images are obtained from the original images (the corresponding label images) by ×2 bicubic down-sampling. This model is the benchmark model, which we call Net0. The other network models are trained on the basis of the Net0 model. Because the parameters of the Net0 model are fixed, the other models, although trained independently, all take the Net0 model as their pre-training model and share its parameters. This process is similar to the hard parameter sharing method in multi-task learning: the Net0 model acts like the shared layers and provides common parameters for each specific task. The training of each network model is a specific task in multi-task learning, each task has its own independent parameters, and different tasks correspond to different network models. The right side of Figure 5 shows this process.
The difference from multi-task learning here is that, in addition to the independence of parameters for each task, the training data involved in training different tasks are also different. The training data of Task4 are an image down-sampled by 32 times of the original image, denoted as LR32, and the corresponding label data are an image down-sampled by 16 times of the original image, denoted as LR16, and the trained network model is Net4. The training data of Task3 are LR16, and the corresponding label data are an image down-sampled by 8 times of the original image, denoted as LR8, and the trained network model is denoted as Net3. The training data of Task2 are LR8, and the corresponding label data are an image down-sampled by 4 times of the original image, which is denoted as LR4, and the trained network model is denoted as Net2. The training data of Task1 are LR4, and the corresponding label data are an image down-sampled by 2 times of the original image, denoted as LR2, and the trained network model is denoted as Net1. The above models are all trained for the texture characteristics of the images at their respective levels, so the models at each level can better adapt to the super-resolution reconstruction task of down-sampling to the images at this level.
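The pairing of training data and labels for each task can be summarized as in the following sketch; the dictionary layout follows the LRxx/Net naming in the description above, while the helper function itself is an illustrative assumption.

# Each task maps an image down-sampled by one factor to the image
# down-sampled by half that factor.
TASK_PAIRS = {
    "Task1": ("LR4",  "LR2"),   # trained model: Net1
    "Task2": ("LR8",  "LR4"),   # trained model: Net2
    "Task3": ("LR16", "LR8"),   # trained model: Net3
    "Task4": ("LR32", "LR16"),  # trained model: Net4
}

def training_pair(downsampled: dict, task: str):
    """Return the (input, label) images for one task from a dict of
    pre-computed down-sampled versions of a training image."""
    lr_key, hr_key = TASK_PAIRS[task]
    return downsampled[lr_key], downsampled[hr_key]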

3.2. Network Model Cascade

In order to achieve the high-magnification super-resolution reconstruction of images, this paper adopts a cascade of network models. Based on the Net0 model, we train the task-specific layers of multiple tasks, feeding specific training data into each task to obtain different network models. These network models perform image super-resolution reconstruction in a cascaded manner. The magnification factor of each network model is ×2. Reconstruction at ×4 magnification therefore requires cascading two network models, ×8 reconstruction requires three, and ×16 and ×32 reconstruction require four and five network models, respectively. Figure 6 shows the network structure for super-resolution reconstruction at ×32 magnification. The parameters of each network model are different, and each network model is trained with data for its own level.
The low-resolution images are obtained from the label image by bicubic down-sampling. The image down-sampled 32 times, denoted as LR32, is enlarged by 2 times through the first-level Net4 network model. After the image is down-sampled 32 times, its texture and edge information is very seriously lost, and the Net4 model is trained specifically to deal with this severe loss of image information. In the reconstruction at this level, the Net4 model can adapt to the severe information loss, and the resulting intermediate image has the same size as the image down-sampled 16 times, which is denoted as LR16. The intermediate result LR16 is sent to the next-level network model, Net3, and the generated intermediate result is recorded as LR8. The training of Net3 also uses training data suited to this level, so the network model can better perceive the image texture of this level and achieve better reconstruction results. When the original image is down-sampled only 16 times, the cascade reconstruction starts from this level: the Net3 model is used first, and its result is sent to the next-level network model. The model at each level only performs super-resolution reconstruction for the data of this level, whether the image to be reconstructed is the output of the upper-level model or a low-resolution image down-sampled directly to this level from the label image. The intermediate result LR8 is reconstructed by the next-level model, Net2, and its output is recorded as LR4. This result is sent to the next-level network Net1, and its output is recorded as LR2. Net2 and Net1 are likewise trained with appropriate training data, so these two models are also better suited to the output images of their respective upper-level models and to the texture characteristics of images down-sampled to their levels. Finally, LR2 is reconstructed by the network model Net0 to obtain the final result, which is recorded as SR.
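A minimal sketch of this ×32 cascade at inference time is given below, assuming the five trained ×2 models are available as callable PyTorch modules; the function name and the torch.no_grad() wrapper are assumptions for illustration.

import torch

def cascade_sr_x32(lr32, net4, net3, net2, net1, net0):
    """lr32: ×32 down-sampled image tensor; each net enlarges its input by 2."""
    x = lr32
    with torch.no_grad():
        for net in (net4, net3, net2, net1, net0):
            x = net(x)   # LR32 -> LR16 -> LR8 -> LR4 -> LR2 -> SR
    return x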
The benchmark model Net0 is obtained through EDSR network training. For reconstructing an image down-sampled 2 times back to the original size, the benchmark model is trained with the ×2 bicubic down-sampled images as training data and the label images as the corresponding high-resolution images. The benchmark model is already well adapted to reconstructing ×2 down-sampled images, so there is no need to replace Net0 with another network model at this level, and a good reconstruction effect is achieved. In the cascade network for high-magnification super-resolution reconstruction, the reconstruction advantages of each network model are accumulated layer by layer in the cascade process until the final reconstructed image is generated.

3.3. Down-Sampled Images at Different Levels

Different levels of down-sampled images contain different amounts of image information because of their different down-sampling ratios. Taking the Set5 dataset as an example, Table 1 shows the average information entropy of all images in the Set5 dataset at different down-sampling levels. It can be seen from the table that the information entropy changes only slightly for ×2, ×4 and ×8 down-sampling, which indicates that the amount of information contained in these images does not differ much. The information entropy of the ×16 down-sampled image drops more sharply than that of the ×8 down-sampled image, so its information loss is more serious, and the entropy of the ×32 down-sampled image drops even further, so its information loss is the most serious.
This is also why existing super-resolution methods reach at most ×8 reconstruction: the information loss of down-sampled images beyond ×8 is severe. As the reconstruction magnification increases, the number of pixels that must be estimated from a single pixel also increases, which inevitably introduces more uncontrollable factors.
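For reference, the information entropy reported in Table 1 can be computed as the Shannon entropy of an image's grayscale histogram; the sketch below assumes 8-bit images and is a standard definition rather than code from the paper.

import numpy as np

def image_entropy(gray):
    """gray: 2-D uint8 array; returns the Shannon entropy in bits."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins to avoid log(0)
    return float(-np.sum(p * np.log2(p)))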
The content of the down-sampled images at different levels of the same dataset is the same, but the pixel distribution probabilities at different levels differ. Table 2 lists the cosine angle and cosine similarity between the pixel distributions of adjacent down-sampling levels in the Set5 dataset. It can be seen from Table 2 that the pixel distribution probabilities of low-magnification down-sampled images are basically similar, because these images lose little texture and edge information. As the down-sampling magnification increases, the similarity between the pixel distributions of adjacent levels gradually decreases; texture and edge information is seriously lost, and the pixel distributions of adjacent high-magnification levels differ greatly. Therefore, different network models are used to reconstruct images at the corresponding levels, and each model can better adapt to the texture characteristics of its level, thereby producing better results.
In order to intuitively show the difference in the pixel distribution probability of down-sampled images at different levels, the cosine angles between the pixel distributions of adjacent down-sampling levels are plotted on coordinate axes in Figure 7. The larger the angle, the greater the difference in pixel distribution. The cosine angle between the pixel distributions of adjacent levels thus reflects more intuitively the difference in texture detail contained in adjacent levels of the down-sampled images.
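The cosine similarity and cosine angle in Table 2 and Figure 7 can be computed from the pixel-distribution histograms of two adjacent levels, as in the following sketch (standard cosine-similarity formula; the exact histogram settings used in the paper are an assumption).

import numpy as np

def pixel_distribution(gray):
    """Normalized 256-bin histogram of a 2-D uint8 image."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    return hist / hist.sum()

def cosine_similarity_and_angle(img_a, img_b):
    pa, pb = pixel_distribution(img_a), pixel_distribution(img_b)
    cos = float(np.dot(pa, pb) / (np.linalg.norm(pa) * np.linalg.norm(pb)))
    angle_deg = float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return cos, angle_deg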

4. Experiments

4.1. Training Dataset

The training data are from the DIV2K [47] dataset, which contains 800 training images, 100 validation images and 100 test images. We validate the model during training using the 10 images with indexes 0801–0810. Set5 [48], Set14 [49], B100 [50] and Urban100 [51] are selected as benchmark datasets for testing. We crop the edges of all images so that each edge length is a multiple of 32, which keeps the edge lengths consistent during down-sampling and super-resolution. The cropped label images are down-sampled step by step through bicubic interpolation to obtain low-resolution images at different down-sampling magnifications, each step down-sampling by a factor of 2. The ×32 down-sampled low-resolution image is obtained after five successive down-sampling steps, and a single step yields the ×2 down-sampled image.
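A sketch of this data preparation, assuming PIL-based processing, is shown below: the label image is cropped so that both edge lengths are multiples of 32 and then bicubically down-sampled by 2 five times to obtain LR2 through LR32 (file handling and variable names are illustrative).

from PIL import Image

def build_pyramid(path, levels=5):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    img = img.crop((0, 0, w - w % 32, h - h % 32))   # edge lengths become multiples of 32
    pyramid = {"HR": img}
    cur = img
    for i in range(1, levels + 1):                   # LR2, LR4, ..., LR32
        cur = cur.resize((cur.width // 2, cur.height // 2), Image.BICUBIC)
        pyramid[f"LR{2 ** i}"] = cur
    return pyramid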

4.2. Training Details

During the training of the cascaded network models, the training data of each network level are different. Net0 is the ×2 reconstruction model of the EDSR method and serves as the base model for the cascaded network. In the training of the Net1 model, the low-resolution input is the image obtained by down-sampling the original image 4 times, and the corresponding label image is the image down-sampled 2 times. The training patch size is 48 × 48 and the mini-batch size is 16; Net0 is used as the pre-training model, and Net1 is trained on the basis of Net0. The training of Net2 is similar to that of Net1, with Net0 again used as the pre-training model; the difference is that the training images for Net2 are the ×8 down-sampled versions of the original images, the corresponding label images are the ×4 down-sampled versions, and the other settings are the same.
The training of Net3 also takes Net0 as the pre-training model and proceeds on that basis. The training images are the ×16 down-sampled versions of the original images, and the corresponding label images are the ×8 down-sampled versions. The training patch size here is 40 × 40; because the low-resolution images used for training are already small, the patch size is reduced in order to obtain enough patches for training. Net4 has the same issue, so its patch size is set to 20 × 20; its low-resolution training images are the ×32 down-sampled versions of the original images, and the corresponding label images are the ×16 down-sampled versions.
In the training process, we set the mini-batch size to 16 and use the ADAM optimizer with β₁ = 0.9, β₂ = 0.999 and ε = 10⁻⁸, and the learning rate is 10⁻⁴. The initial parameters of the models at different levels are fixed and identical. When training models at different levels, only the size of the training patches is adjusted in order to obtain enough patches for model training. In the cascade process of the models, there is no additional training or fine-tuning of network parameters.
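The optimizer settings above correspond to the following sketch, assuming PyTorch's Adam implementation; the helper function is only illustrative.

import torch
import torch.nn as nn

def make_optimizer(model: nn.Module) -> torch.optim.Adam:
    # ADAM with beta1 = 0.9, beta2 = 0.999, eps = 1e-8 and learning rate 1e-4,
    # as stated in this section (no learning-rate schedule shown here).
    return torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8)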
After the network model training of each level is completed, these network models are cascaded step by step, the images are sent to the network model step by step, and finally the purpose of high-magnification super-resolution reconstruction of images is realized.

4.3. Cascade Model and End-to-End Model

This section mainly compares the cascade model and the end-to-end model. The end-to-end model directly outputs the results after calculating the data input to the network without pre-processing or post-processing. This paper trains an end-to-end model with a reconstruction magnification of ×32 and an end-to-end model with a reconstruction magnification of ×16 on the basis of the EDSR model with a reconstruction magnification of ×2. The training process is consistent with the training of the network model with a reconstruction magnification of ×2. The end-to-end model and different model cascade reconstruction methods are compared on datasets such as Set5, Set14, B100 and Urban100, respectively.
Table 3 shows the PSNR and SSIM values of the reconstruction results of the ×32 and ×16 down-sampled images obtained with the cascade of different models and with the end-to-end methods. It can be seen intuitively from Table 3 that, in the high-magnification super-resolution reconstruction of images, the cascade of different models produces higher-quality reconstruction results than the end-to-end method. The cascade model achieves super-resolution by enlarging the down-sampled image step by step: each time the image passes through a network model it is enlarged by 2 times, and after all the network models it is reconstructed to the original size. The end-to-end network model, once the image to be processed is fed in, directly outputs the final result after a series of computations, without any intermediate outputs.
For the cascaded network model method, in the process of step-by-step image reconstruction, there is a certain difference between the reconstruction results of each network model and the real results, but this difference will be partially corrected after entering the lower-level network, so that the error accumulation of reconstruction images gradually decreases. When the end-to-end network model reconstructs the image, the loss of texture information of the high-magnification down-sampled image itself is very serious. The image is directly reconstructed end-to-end, and the difference between the image and the label in the reconstruction process cannot be corrected step by step. In addition to partial texture reconstruction, the regions with serious loss of texture information are also enlarged by the end-to-end model, resulting in a decline in the quality of the reconstructed image of the end-to-end model.

4.4. Comparison of Intermediate Results between the Same Model Cascade and Different Model Cascades

The high-magnification image reconstruction can be achieved by model cascade, which can obtain better image high-magnification super-resolution results. However, the results of the same model cascades and different model cascades are different; the super-resolution reconstruction results using different model cascades are better than the super-resolution reconstruction results using the same model cascades. This is because different network models can reconstruct images according to the texture characteristics of low-resolution images at different levels. In addition to the final reconstructed image, each level will also produce its own reconstruction results. These results are the intermediate results of different network models. The reconstruction results of the superior network model are used as the input data of the subordinate network model until the final reconstruction results are obtained.
Table 4 compares, at the same down-sampling level, the intermediate results obtained when cascading the same model structure and when cascading different model structures. The EDSR method repeatedly cascades the model Net0 to obtain the final reconstruction result, whereas our method cascades the different models trained by the different tasks in multi-task learning. Here, we only reconstruct the down-sampled image once, i.e., at ×2 magnification; this is an intermediate result generated by the cascade network, not the final reconstruction result, and it is used to compare the super-resolution ability of different models at the same down-sampling magnification. The reconstruction level “×32 to ×16” means that both the Set5 dataset and the B100 dataset are down-sampled 32 times and then reconstructed with ×2 super-resolution, either by the network model Net0 or by the network model Net4 trained for this level. The size of the resulting image equals that of the image obtained by direct ×16 down-sampling of the dataset, and the PSNR values of the reconstruction results of the different models are calculated. The reconstruction levels “×16 to ×8” and “×8 to ×4” follow the same procedure as “×32 to ×16”, only with different down-sampling magnifications.
It can be seen from Table 4 that the intermediate results generated by the network model trained with specific data in the process of image high-magnification super-resolution reconstruction have higher PSNR values than the intermediate results generated by the EDSR network model. The training data of the Net0 model are the data obtained by down-sampling 2 times of the original image. These data are not degraded seriously, and contain rich texture details, while the training data of the Net4 model are the data obtained by down-sampling 32 times of the original image, and these data are degraded seriously. These two kinds of data have different degrees of degradation and different details. The network models trained with these two kinds of data have different abilities to reconstruct image details, resulting in different super-resolution results of the models for the same down-sampled image. The reconstruction results of the Net3 and Net2 model differ from those of the Net0 model for the same reason. Therefore, cascading different network models has more advantages in super-resolution reconstruction at each level.

4.5. Comparative Experiment

We tested the proposed network model cascade method on different datasets. To calculate the PSNR value of an image, we convert the image from RGB to YCbCr and compute on the luminance (Y) channel of the converted image. To compare the reconstruction results of different methods fairly and keep them consistent with the method in this paper, the public code of the other methods is run in the same environment as our method. The different methods shown in Table 5 all use cascading to perform super-resolution reconstruction at different magnifications. In our method, different models are cascaded, and each model is trained according to the texture distribution characteristics of the down-sampled images at its level. The other methods cascade the same ×2 model repeatedly; that model is trained with a single kind of data and cannot account for the texture distribution characteristics of images at different levels. For example, at the ×32 reconstruction scale, our method cascades five different ×2 models, each of which performs super-resolution reconstruction for a different level of image; by accumulating the advantages of the different levels layer by layer, we finally achieve better reconstruction results. The other methods cascade the same ×2 model five times; because the images at each level cannot be magnified in a targeted way, the quality of their reconstruction results is lower than that of our proposed method.
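A sketch of this evaluation step is given below: the RGB image is converted to the Y (luma) channel with ITU-R BT.601 weights and the PSNR is computed on that channel; the exact YCbCr convention used in the paper's code is an assumption.

import numpy as np

def rgb_to_y(rgb):
    """rgb: H x W x 3 float array with values in [0, 255]; returns the Y channel."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(sr_rgb, hr_rgb, peak=255.0):
    mse = np.mean((rgb_to_y(sr_rgb) - rgb_to_y(hr_rgb)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))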
Table 5 quantitatively shows the PSNR and SSIM values of the reconstruction results of Bicubic, SRCNN [1], FSRCNN [5], VDSR [7], LapSRN [22], CARN [27], EDSR [25] and our method under different scale factors on the benchmark test datasets Set5, Set14, B100 and Urban100. It can be seen that our method not only obtains better results at high-magnification scale factors, but also at ×8 and ×4 magnifications, indicating that the proposed method has good generalization ability.
In order to visually compare the high-magnification super-resolution reconstruction results of different methods, we selected representative images from the Set14 and B100 datasets, shown in Figure 8. From the different images in Figure 8, it can be seen that the high-magnification reconstruction results of our method have clearer boundary textures and are more similar to the label image, while the reconstruction results of other methods suffer from blurred edges.
We also selected representative images from the Urban100 dataset, as shown in Figure 9. It can be seen from Figure 9 that, at ×32 magnification, the reconstruction results of some methods have blurred edges and contours that are difficult to distinguish, and some textures are even lost. When the method in this paper is used for high-magnification super-resolution reconstruction, the image texture of the reconstruction result is largely restored, and it gives a better visual experience. The above quantitative and qualitative comparisons demonstrate the advantages of our method in the high-magnification super-resolution reconstruction of images, and it achieves better reconstruction results on different datasets.

5. Conclusions

In this paper, we extend the magnification of super-resolution reconstruction to high magnifications. Based on the idea of multi-task learning, we propose a cascade super-resolution method with different models, which achieves high-magnification image super-resolution reconstruction by cascading different models with the same ×2 reconstruction magnification. Compared with other super-resolution methods, our method takes into account the different texture and edge information contained in low-resolution images at different down-sampling magnifications, and trains the network model at each level of the cascade in a targeted manner so that it is better suited to the texture characteristics of the low-resolution images at that level. The low-resolution image is enlarged step by step through the cascade model, finally achieving high-magnification super-resolution reconstruction. In tests on different benchmark datasets, the high-magnification super-resolution reconstruction results of our method are better than those of other methods in the evaluation indices, and they provide better detail and visual experience.

Author Contributions

Y.L. and H.Z. conceived and designed the experiments; Y.L. performed the experiments and wrote the original draft; H.Z. contributed to the review of the paper; S.Y. participated in the editing of the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China (Grant No. 61801005) and Natural Science Basic Research Program of Shaanxi (Grant No. 2020JQ-903).

Data Availability Statement

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant No. 61801005) and the Natural Science Basic Research Program of Shaanxi (Grant No. 2020JQ-903). We thank the anonymous reviewers for their constructive feedback.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307.
2. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883.
3. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1637–1645.
4. Tai, Y.; Yang, J.; Liu, X. Image super-resolution via deep recursive residual network. In Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3147–3155.
5. Dong, C.; Loy, C.C.; Tang, X. Accelerating the Super-Resolution Convolutional Neural Network. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 391–407.
6. Tong, T.; Li, G.; Liu, X.; Gao, Q. Image Super-Resolution Using Dense Skip Connections. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4809–4817.
7. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654.
8. Tai, Y.; Yang, J.; Liu, X.; Xu, C. MemNet: A Persistent Memory Network for Image Restoration. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4549–4557.
9. Qiu, Y.; Wang, R.; Tao, D.; Cheng, J. Embedded Block Residual Network: A Recursive Restoration Model for Single-Image Super-Resolution. In Proceedings of the International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019.
10. Han, W.; Chang, S.; Liu, D.; Yu, M.; Witbrock, M.; Huang, T.S. Image Super-Resolution via Dual-State Recurrent Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1654–1663.
11. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.P.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 105–114.
12. Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Change Loy, C. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018; pp. 63–79.
13. Wang, X.; Yu, K.; Dong, C.; Loy, C.C. Recovering Realistic Texture in Image Super-Resolution by Deep Spatial Feature Transform. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 606–615.
14. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
15. Bulat, A.; Yang, J.; Tzimiropoulos, G. To Learn Image Super-Resolution, Use a GAN to Learn How to Do Image Degradation First. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018; pp. 187–202.
16. Yuan, Y.; Liu, S.; Zhang, J.; Zhang, Y.; Dong, C.; Lin, L. Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 814–823.
17. Shocher, A.; Cohen, N.; Irani, M. Zero-Shot Super-Resolution Using Deep Internal Learning. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3118–3126.
18. Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Deep Image Prior. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9446–9454.
19. Yang, J.; Wei, F.; Bai, Y.; Zuo, M.; Sun, X.; Chen, Y. An Effective Multi-Task Two-Stage Network with the Cross-Scale Training Strategy for Multi-Scale Image Super Resolution. Electronics 2021, 10, 2434.
20. Huang, F.; Wang, Z.; Wu, J.; Shen, Y.; Chen, L. Residual Triplet Attention Network for Single-Image Super-Resolution. Electronics 2021, 10, 2072.
21. Guo, Y.; Chen, J.; Wang, J.; Chen, Q.; Cao, J.; Deng, Z.; Xu, Y.; Tan, M. Closed-loop matters: Dual regression networks for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5406–5415.
22. Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5835–5843.
23. Dai, T.; Cai, J.; Zhang, Y.; Xia, S.-T.; Zhang, L. Second-order attention network for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11057–11066.
24. Niu, B.; Wen, W.; Ren, W.; Zhang, X.; Yang, L.; Wang, S.; Zhang, K.; Cao, X.; Shen, H. Single Image Super-Resolution via a Holistic Attention Network. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 191–207.
25. Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 136–144.
26. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
27. Ahn, N.; Kang, B.; Sohn, K.A. Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 252–268.
28. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 286–301.
29. Liu, J.; Zhang, W.; Tang, Y.; Tang, J.; Wu, G. Residual Feature Aggregation Network for Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2356–2365.
30. Park, S.J.; Son, H.; Cho, S.; Hong, K.S.; Lee, S. SRFeat: Single Image Super-Resolution with Feature Discrimination. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Volume 11220, pp. 455–471.
31. Sajjadi, M.S.; Scholkopf, B.; Hirsch, M. EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4501–4510.
32. Maeda, S. Unpaired Image Super-Resolution using Pseudo-Supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 288–297.
33. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual Dense Network for Image Super-Resolution. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2472–2481.
34. Zheng, H.; Wang, X.; Gao, X. Fast and Accurate Single Image Super-Resolution via Information Distillation Network. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 723–731.
35. Yang, F.; Yang, H.; Fu, J.; Lu, H.; Guo, B. Learning texture transformer network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5790–5799.
36. Li, J.; Fang, F.; Mei, K.; Zhang, G. Multi-scale residual network for image super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018; pp. 517–532.
37. Hu, X.; Mu, H.; Zhang, X.; Wang, Z.; Tan, T.; Sun, J. Meta-SR: A magnification-arbitrary network for super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 1575–1584.
38. Haris, M.; Shakhnarovich, G.; Ukita, N. Deep Back-Projection Networks for Super-Resolution. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; Volume 2, pp. 1664–1673.
39. Yu, S.; Park, B.; Jeong, J. Deep iterative down-up CNN for image denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 15–21 June 2019.
40. Jin, X.; Chen, Z.; Lin, J.; Chen, Z.; Zhou, W. Unsupervised single image deraining with self-supervised constraints. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2761–2765.
41. Chen, L.; Zhang, J.; Pan, J.; Lin, S.; Fang, F.; Ren, J.S. Learning a non-blind deblurring network for night blurry images. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10542–10550.
42. Li, X.; Jin, X.; Lin, J.; Liu, S.; Wu, Y.; Yu, T.; Zhou, W. Learning disentangled feature representation for hybrid-distorted image restoration. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 313–329.
43. Zhou, W.; Wang, Z.; Chen, Z. Image super-resolution quality assessment: Structural fidelity versus statistical naturalness. In Proceedings of the 2021 13th International Conference on Quality of Multimedia Experience (QoMEX), Montreal, QC, Canada, 14–17 June 2021; pp. 61–64.
44. Zhou, W.; Jiang, Q.; Wang, Y.; Chen, Z.; Li, W. Blind quality assessment for image superresolution using deep two-stream convolutional networks. Inf. Sci. 2020, 528, 205–218.
45. Fang, Y.; Zhang, C.; Yang, W.; Liu, J.; Guo, Z. Blind visual quality assessment for image super-resolution by convolutional neural network. Multimed. Tools Appl. 2018, 77, 29829–29846.
46. Ruder, S. An Overview of Multi-Task Learning in Deep Neural Networks. arXiv 2017, arXiv:1706.05098.
47. Timofte, R.; Agustsson, E.; Van Gool, L.; Yang, M.-H.; Zhang, L.; Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M.; et al. NTIRE 2017 Challenge on Single Image Super-Resolution: Methods and Results. In Proceedings of the CVPR 2017 Workshops, Honolulu, HI, USA, 21–26 July 2017.
48. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi Morel, M.-L. Low-Complexity Single-Image Super-Resolution Based on Nonnegative Neighbor Embedding. In Proceedings of the British Machine Vision Conference, Guildford, UK, 3–7 September 2012; pp. 1–12.
49. Zeyde, R.; Elad, M.; Protter, M. On Single Image Scale-Up Using Sparse-Representations; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; Volume 6920, pp. 711–730.
50. Martin, D.R.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 416–423.
51. Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5197–5206.
Figure 1. The super-resolution reconstruction results of different methods for ×32 magnification.
Figure 2. Comparison of the original and down-sampled versions of a full image, and the super-resolution results obtained with our method.
Figure 3. Hard parameter sharing in multi-task learning.
Figure 4. Soft parameter sharing in multi-task learning.
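As a complement to Figures 3 and 4, the snippet below is a minimal PyTorch-style sketch of hard parameter sharing: a single shared feature extractor feeds several task-specific output branches. The module structure and layer sizes are illustrative assumptions for exposition, not the network used in this paper.

```python
import torch
import torch.nn as nn

class HardSharingSR(nn.Module):
    """Hard parameter sharing: one shared trunk, one head per task."""
    def __init__(self, num_tasks: int = 2, channels: int = 64):
        super().__init__()
        # Shared layers: every task updates these weights during training.
        self.shared = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Task-specific heads: each task keeps its own output layer.
        self.heads = nn.ModuleList(
            [nn.Conv2d(channels, 3, 3, padding=1) for _ in range(num_tasks)]
        )

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        return self.heads[task_id](self.shared(x))

# Example: the same trunk serves two different reconstruction tasks.
model = HardSharingSR(num_tasks=2)
lr_patch = torch.randn(1, 3, 48, 48)
out_task0 = model(lr_patch, task_id=0)
out_task1 = model(lr_patch, task_id=1)
```

In soft parameter sharing (Figure 4), by contrast, each task keeps its own full network and the parameters are only encouraged to stay close, for example through a regularization term.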
Figure 5. Left: the training process of the Net0 model; right: the training process of the different tasks.
Figure 6. Cascaded network structure of different models.
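Figure 6 depicts the reconstruction used at inference time: the output of one task model becomes the input of the next, so the magnification accumulates multiplicatively from stage to stage. The sketch below illustrates this cascade idea under the assumption that each stage is a separately trained ×2 model; the model list and its ordering are placeholders rather than the exact models released with the paper.

```python
import torch

def cascade_super_resolve(lr_image: torch.Tensor, stage_models) -> torch.Tensor:
    """Apply a sequence of x2 super-resolution models to one input.

    Five x2 stages give an overall x32 magnification; the intermediate
    outputs (x4, x8, x16) can also be kept for evaluation, as in Table 4.
    """
    sr = lr_image
    with torch.no_grad():
        for net in stage_models:   # e.g., the task-specific models in order
            sr = net(sr)           # each stage doubles the resolution
    return sr
```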
Figure 7. Cosine angle between adjacent down-sampled levels of the Set5 dataset.
Figure 8. Qualitative comparison of image reconstruction results for ×32 magnification in the Set14 and B100 datasets.
Figure 9. Qualitative comparison of image reconstruction results for ×32 magnification in the Urban100 dataset.
Table 1. The average information entropy of the Set5 dataset images down-sampled to different levels (unit: bits/pixel).

            ×2       ×4       ×8       ×16      ×32
Set5        7.0385   7.0738   7.0364   6.7135   5.8306
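The values in Table 1 are Shannon entropies of the image intensity distributions. As a reference for how such a value can be obtained, the snippet below is a small sketch that estimates the entropy of an 8-bit grayscale image from its normalized histogram; it is an illustrative reimplementation, not the authors' evaluation script.

```python
import numpy as np

def image_entropy(gray: np.ndarray) -> float:
    """Shannon entropy (bits/pixel) of an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()        # pixel intensity distribution probability
    p = p[p > 0]                 # drop empty bins to avoid log2(0)
    return float(-(p * np.log2(p)).sum())

# Example with random data; in the paper the average is taken over all
# images of a dataset at each down-sampling level.
img = np.random.randint(0, 256, size=(128, 128), dtype=np.uint8)
print(image_entropy(img))
```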
Table 2. Cosine angle and cosine similarity of pixel distribution probability between adjacent levels of down-sampled images in the Set5 dataset.

                              ×2 and ×4   ×4 and ×8   ×8 and ×16   ×16 and ×32
Set5   Cosine angle           11.07°      20.56°      38.80°       56.11°
       Cosine similarity      0.9786      0.9283      0.7746       0.5507
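Table 2 compares the pixel intensity distributions of adjacent down-sampling levels through their cosine similarity and the corresponding angle. The following sketch shows one way to compute these two quantities from two intensity histograms; the histogram-based measurement shown here is a simplifying assumption kept deliberately minimal.

```python
import numpy as np

def intensity_distribution(gray: np.ndarray) -> np.ndarray:
    """Normalized 256-bin intensity histogram of an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

def cosine_similarity_and_angle(p: np.ndarray, q: np.ndarray):
    """Cosine similarity of two distributions and the angle in degrees."""
    cos_sim = float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))
    angle_deg = float(np.degrees(np.arccos(np.clip(cos_sim, -1.0, 1.0))))
    return cos_sim, angle_deg

# Example: compare a x8 and a x16 down-sampled version of the same image.
img_x8 = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
img_x16 = np.random.randint(0, 256, size=(32, 32), dtype=np.uint8)
print(cosine_similarity_and_angle(intensity_distribution(img_x8),
                                  intensity_distribution(img_x16)))
```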
Table 3. PSNR/SSIM values of the reconstruction results obtained by the high-magnification end-to-end super-resolution method and by the proposed cascade of different task models.

Dataset    Scale   EDSR (End-to-End)   Ours
Set5       ×16     22.91/0.8857        23.03/0.8868
           ×32     20.28/0.8107        20.35/0.8086
Set14      ×16     21.87/0.8168        22.01/0.8175
           ×32     19.56/0.7569        19.69/0.7570
B100       ×16     22.82/0.8504        22.90/0.8497
           ×32     20.97/0.7998        21.05/0.7979
Urban100   ×16     20.08/0.8033        20.21/0.8040
           ×32     18.28/0.7288        18.44/0.7315
Table 4. The PSNR values of the intermediate results generated at different reconstruction levels by the EDSR method and our method.

Dataset   Reconstruction Level   EDSR           Ours
Set5      ×8 to ×4               32.08 (Net0)   32.10 (Net2)
          ×16 to ×8              29.25 (Net0)   29.26 (Net3)
          ×32 to ×16             27.74 (Net0)   27.81 (Net4)
B100      ×8 to ×4               32.27 (Net0)   32.31 (Net2)
          ×16 to ×8              31.08 (Net0)   31.15 (Net3)
          ×32 to ×16             29.49 (Net0)   29.54 (Net4)
Table 5. PSNR and SSIM values of reconstruction results of different methods in different datasets.

Method        Scale   Set5           Set14          B100           Urban100
Bicubic       ×4      28.17/0.9628   25.62/0.8937   25.88/0.9140   23.02/0.8870
SRCNN [1]             30.27/0.9773   26.96/0.9124   26.88/0.9275   24.46/0.9123
FSRCNN [5]            30.35/0.9784   27.11/0.9131   26.88/0.9281   24.54/0.9140
VDSR [7]              31.19/0.9815   27.66/0.9175   27.24/0.9310   25.13/0.9219
LapSRN [22]           30.77/0.9813   27.47/0.9106   27.14/0.9299   24.90/0.9184
CARN [27]             31.81/0.9843   28.05/0.9216   27.55/0.9342   25.87/0.9321
EDSR [25]             32.28/0.9856   28.38/0.9245   27.74/0.9361   26.41/0.9391
Ours                  32.30/0.9857   28.40/0.9247   27.77/0.9362   26.66/0.9409
Bicubic       ×8      24.17/0.9078   22.75/0.8343   23.66/0.8689   20.67/0.8198
SRCNN [1]             25.43/0.9312   23.65/0.8530   24.34/0.8822   21.51/0.8454
FSRCNN [5]            25.43/0.9331   23.67/0.8531   24.28/0.8826   21.52/0.8459
VDSR [7]              25.94/0.9400   24.05/0.8584   24.54/0.8861   21.87/0.8547
LapSRN [22]           25.69/0.9357   23.96/0.8541   24.48/0.8850   21.72/0.8506
CARN [27]             26.59/0.9508   24.47/0.8658   24.79/0.8911   22.34/0.8673
EDSR [25]             26.82/0.9537   24.70/0.8691   24.94/0.8932   22.69/0.8757
Ours                  26.95/0.9553   24.78/0.8706   24.98/0.8935   22.84/0.8774
Bicubic       ×16     21.26/0.8357   20.52/0.7774   21.88/0.8230   18.91/0.7503
SRCNN [1]             22.05/0.8564   21.13/0.7953   22.42/0.8370   19.42/0.7738
FSRCNN [5]             22.07/0.8581   21.16/0.7950   22.34/0.8371   19.40/0.7731
VDSR [7]              22.35/0.8637   21.44/0.8016   22.55/0.8412   19.63/0.7815
LapSRN [22]           22.28/0.8610   21.39/0.7993   22.51/0.8401   19.54/0.7777
CARN [27]             20.83/0.8778   21.80/0.8123   22.75/0.8474   19.96/0.7954
EDSR [25]             22.99/0.8832   22.00/0.8617   22.87/0.8494   20.20/0.8038
Ours                  23.03/0.8868   22.01/0.8175   22.90/0.8497   20.21/0.8040
Bicubic       ×32     19.02/0.7702   18.81/0.7187   20.27/0.7692   17.56/0.6819
SRCNN [1]             19.10/0.7808   19.16/0.7352   20.67/0.7831   17.88/0.7013
FSRCNN [5]            19.54/0.7867   19.14/0.7326   20.59/0.7841   17.86/0.7006
VDSR [7]              19.58/0.7894   19.30/0.7390   20.77/0.7881   18.02/0.7083
LapSRN [22]           19.63/0.7899   19.26/0.7379   20.76/0.7877   17.98/0.7058
CARN [27]             20.17/0.8018   19.54/0.7509   20.95/0.7960   18.23/0.7216
EDSR [25]             20.29/0.8063   19.68/0.7563   21.03/0.7982   18.42/0.7306
Ours                  20.35/0.8086   19.69/0.7570   21.05/0.7979   18.44/0.7315
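The quantitative entries in Tables 3 and 5 are PSNR/SSIM pairs, and Table 4 reports PSNR only. For reference, the snippet below sketches how such scores are commonly computed with scikit-image; it assumes single-channel (e.g., luminance) uint8 inputs and is not the exact evaluation protocol used to produce the tables.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(sr: np.ndarray, hr: np.ndarray):
    """PSNR (dB) and SSIM between a super-resolved image and its label.

    Both inputs are expected as uint8 arrays of identical shape, e.g. the
    Y channel of the reconstructed image and of the ground-truth image.
    """
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, data_range=255)
    return psnr, ssim

# Example with toy data; real evaluation averages over a whole dataset.
hr = np.random.randint(0, 256, size=(128, 128), dtype=np.uint8)
noise = np.random.randint(-5, 6, hr.shape)
sr = np.clip(hr.astype(np.int16) + noise, 0, 255).astype(np.uint8)
print(evaluate_pair(sr, hr))
```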
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
