# PercepPan: Towards Unsupervised Pan-Sharpening Based on Perceptual Loss


## Abstract


## 1. Introduction

- How can the framework and loss be designed to train pan-sharpening model G directly?
- Could supervised pre-training offer gains in the SPUF training paradigm?
- Could the unsupervised perspective outperform its supervised counterpart?

- A novel unsupervised learning framework “perceptual pan-sharpening (PercepPan)” is proposed, which does not need the degradation step anymore. The framework consists of a generator, a reconstructor, and a discriminator. The generator takes responsibility for generating HRMS images, the reconstructor takes advantage of prior knowledge to imitate the observation model from HRMS images to LRMS-PAN image pairs, and the discriminator extracts features from LRMS-PAN image pairs to compute feature loss and GAN loss.
- A perceptual loss is adopted as the objective function. It consists of three parts: one computed in pixel space, one in feature space, and one in GAN space. The hybrid loss is beneficial for improving the perceptual quality of generated HRMS images.
- A novel training paradigm, called SPUF, is adopted to train the proposed PercepPan. Experiments show that SPUF usually outperforms random initialization.
- Experiments show that PercepPan could cooperate with several different generators. Experiments on the QuickBird dataset show that the unsupervised results are comparable to the supervised ones. When generalizing to the IKONOS dataset, similar conclusions still hold.

## 2. Perceptual Loss

## 3. Methodology

#### 3.1. Pan-Sharpening Formula

#### 3.2. Network Architecture

- A generator G, which takes as input an LRMS-PAN image pair $(x,p)$ and generates an HRMS image $\widehat{y}$;
- A reconstructor R, which takes as input a generated HRMS image $\widehat{y}$ and reconstructs the corresponding LRMS-PAN image pair, with the outputs denoted as $\widehat{x}$ and $\widehat{p}$, respectively;
- A discriminator D, which takes as input real/reconstructed LRMS-PAN image pairs to compute the feature loss and the GAN loss.

**Generator**. The generator G needs to fuse spectral details from LRMS images with spatial details from PAN images. Existing generators, which either feed LRMS-PAN image pairs into networks directly to extract those details [14,17] or learn residual details with respect to LRMS images [15,16], can play the role of G. We also try an ESRGAN-style generator that learns residual details with respect to PAN images.

**Reconstructor**. The reconstructor $R=({R}_{x},{R}_{p})$ aims to reconstruct LRMS-PAN image pairs from generated HRMS images. It could be implemented by a neural network. Inspired by [58], we design a shallow architecture for R to simulate the observation process by which satellites acquire LRMS-PAN image pairs.
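Under this observation-model view, a minimal NumPy sketch of R might treat ${R}_{x}$ as average-blur downsampling by the resolution ratio and ${R}_{p}$ as a per-band weighted sum. The specific operations and the QuickBird band weights (taken from Table 1) are illustrative assumptions, not the paper's exact trainable layers:

```python
import numpy as np

def reconstruct_lrms(hrms, ratio=4):
    """R_x: simulate the MS sensor by block-averaging (blur + downsample)
    a channel-first HRMS image of shape (C, H, W) by the resolution ratio."""
    c, h, w = hrms.shape
    return hrms.reshape(c, h // ratio, ratio, w // ratio, ratio).mean(axis=(2, 4))

def reconstruct_pan(hrms, weights=(0.1139, 0.2315, 0.2308, 0.4239)):
    """R_p: simulate the PAN sensor as a per-band weighted sum
    (QuickBird linear weights from Table 1)."""
    w = np.asarray(weights).reshape(-1, 1, 1)
    return (hrms * w).sum(axis=0)
```

In the actual framework these operators would be implemented as (possibly fixed or lightly trainable) convolutional layers, but the sketch captures the prior knowledge they encode.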

**Discriminator**. The discriminator D is responsible for computing feature loss and GAN loss.

#### 3.3. Initialization

#### 3.4. Training Strategy

#### 3.5. Datasets and Algorithms

**Algorithm 1** Full-scale SPUF training algorithm.

**Require:** full-scale training dataset, number of training iterations $Iter$, batch size $bs$, learning rates ${\eta}_{G}$ and ${\eta}_{D}$, hyper-parameters $\alpha$, $\beta$, and $\gamma$.

1. Initialize G, R, and D
2. **for** $i\leftarrow 1,2,\dots ,Iter$ **do**
3. Sample a mini-batch of image pairs, ${\{{x}^{(n)},{p}^{(n)}\}}_{n=1}^{bs}$, from the full-scale training dataset
4. Compute the loss ${l}_{G}={\sum}_{n=1}^{bs}\alpha {l}_{\mathrm{pixel}}^{(n)}+\beta {l}_{\mathrm{feat}}^{(n)}+\gamma {l}_{\mathrm{GAN}}^{(n)}$
5. Compute the gradient ${g}_{G}={\nabla}_{G}{l}_{G}$
6. Update the weights ${w}_{G}\leftarrow {w}_{G}-{\eta}_{G}\cdot \mathrm{Adam}({w}_{G},{g}_{G})$
7. Sample a mini-batch of image pairs, ${\{{x}^{(n)},{p}^{(n)}\}}_{n=1}^{bs}$, from the full-scale training dataset
8. Compute the loss ${l}_{D}={\sum}_{n=1}^{bs}{l}_{\mathrm{GAN}}^{(n)}$
9. Compute the gradient ${g}_{D}={\nabla}_{D}{l}_{D}$
10. Update the weights ${w}_{D}\leftarrow {w}_{D}-{\eta}_{D}\cdot \mathrm{Adam}({w}_{D},{g}_{D})$
11. **end for**
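Line 4 of the algorithm combines the three loss terms over a mini-batch before the generator update. A minimal sketch of that step (the function name and plain-list inputs are illustrative; in practice the losses come from G, R, and D):

```python
import numpy as np

def generator_loss(pixel_losses, feat_losses, gan_losses,
                   alpha=1.0, beta=1.0, gamma=0.01):
    """l_G = sum_n alpha*l_pixel^(n) + beta*l_feat^(n) + gamma*l_GAN^(n),
    summed over a mini-batch (line 4 of Algorithms 1 and 2)."""
    p, f, g = (np.asarray(a, dtype=float)
               for a in (pixel_losses, feat_losses, gan_losses))
    return float((alpha * p + beta * f + gamma * g).sum())
```

Setting $(\alpha,\beta,\gamma)$ to $(1,0,0)$, $(0,1,0.01)$, or $(1,1,0.01)$ switches between the pixel-only, perceptual-only, and hybrid objectives discussed later in Section 4.1.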

**Algorithm 2** Reduced-scale SPSF training algorithm.

**Require:** reduced-scale training dataset, number of training iterations $Iter$, batch size $bs$, learning rates ${\eta}_{G}$ and ${\eta}_{D}$, hyper-parameters $\alpha$, $\beta$, and $\gamma$.

1. Initialize G and D
2. **for** $i\leftarrow 1,2,\dots ,Iter$ **do**
3. Sample a mini-batch of image pairs, ${\{{x}^{(n)},{p}^{(n)}\}}_{n=1}^{bs}$, from the reduced-scale training dataset
4. Compute the loss ${l}_{G}={\sum}_{n=1}^{bs}\alpha {l}_{\mathrm{pixel}}^{(n)}+\beta {l}_{\mathrm{feat}}^{(n)}+\gamma {l}_{\mathrm{GAN}}^{(n)}$
5. Compute the gradient ${g}_{G}={\nabla}_{G}{l}_{G}$
6. Update the weights ${w}_{G}\leftarrow {w}_{G}-{\eta}_{G}\cdot \mathrm{Adam}({w}_{G},{g}_{G})$
7. Sample a mini-batch of image pairs, ${\{{x}^{(n)},{p}^{(n)}\}}_{n=1}^{bs}$, from the reduced-scale training dataset
8. Compute the loss ${l}_{D}={\sum}_{n=1}^{bs}{l}_{\mathrm{GAN}}^{(n)}$
9. Compute the gradient ${g}_{D}={\nabla}_{D}{l}_{D}$
10. Update the weights ${w}_{D}\leftarrow {w}_{D}-{\eta}_{D}\cdot \mathrm{Adam}({w}_{D},{g}_{D})$
11. **end for**

## 4. Experiments

#### 4.1. Experiment Settings

**Datasets**. Images come from two different satellites, QuickBird and IKONOS. Table 3 summarizes spectral and spatial information about these two satellites [68,69]:

- The QuickBird bundle product is composed of two kinds of images, with one MS image at $2.8$ m resolution and one PAN image at $0.7$ m resolution; pixels are recorded with 11 bits. The images used in this paper come from the area of Sha Tin, Hong Kong, China, with geographic coordinates N(${22}^{\circ}{19}^{\prime}{58}^{\prime \prime}$, ${22}^{\circ}{36}^{\prime}{0}^{\prime \prime}$) E(${114}^{\circ}{6}^{\prime}{1}^{\prime \prime}$, ${114}^{\circ}{16}^{\prime}{19}^{\prime \prime}$), and the image of this area is shown in Figure 4. The whole size is 7364 × 7713 for the MS image, and 29,456 × 30,852 for the PAN image.
- The IKONOS bundle product is composed of two kinds of images as well, with one MS image at 4 m resolution and one PAN image at 1 m resolution; pixels are again recorded with 11 bits. The images used in this paper come from the area of Wenchuan, Sichuan, China, with geographic coordinates N(${30}^{\circ}{59}^{\prime}{0}^{\prime \prime}$, ${31}^{\circ}{6}^{\prime}{0}^{\prime \prime}$) E(${103}^{\circ}{12}^{\prime}{36}^{\prime \prime}$, ${103}^{\circ}{17}^{\prime}{48}^{\prime \prime}$). Due to license issues, the image of this area is not shown here. The whole size is $2066\times 3236$ for the MS image, and 8264 × 12,944 for the PAN image.

**Other Generators**. Only neural-network-based methods are used as the generator G: PNN [14], RSIFNN [15], PanNet [16], and PSGAN [17]. These methods are trained in a supervised manner with the settings recommended in the corresponding papers, but on our reduced-scale dataset, and are then generalized to the full-scale dataset directly. Classical methods, such as [1,4,7], are not taken into consideration.

**Hyper-parameters**. As stated in Section 3.3, the ESRGAN-style generator G is initialized in one of three styles: random, PSNR, or ESRGAN. The hyper-parameters in Equation (11) are fixed in advance, $(\alpha ,\beta ,\gamma )\in \{(1,0,0),(0,1,0.01),(1,1,0.01)\}$, where $(1,0,0)$ means only the pixel loss takes effect, $(0,1,0.01)$ means the feature loss and GAN loss take effect, and $(1,1,0.01)$ means all three losses take effect simultaneously. Note that the factor $0.01$ keeps the GAN loss at the same order of magnitude as the other two losses in the early training stage. The batch size is 4, and the number of training iterations is 5000. The whole network is trained with Adam [70]. Inspired by the two time-scale update rule [67], the learning rates ${\eta}_{G}$ and ${\eta}_{D}$ are chosen individually from $1\times {10}^{-4}$ and $1\times {10}^{-5}$.

#### 4.2. Image Quality Assessment

- SAM measures spectral distortion. Denote ${\widehat{I}}_{i,j},{I}_{i,j}\in {\mathbb{R}}^{C}$ as the spectral vectors at pixel position $(i,j)$ of $\widehat{I}$ and I, respectively; then$$\mathrm{SAM}(\widehat{I},I)=\frac{1}{HW}\sum _{i=1}^{H}\sum _{j=1}^{W}\arccos \frac{\langle {\widehat{I}}_{i,j},{I}_{i,j}\rangle}{\|{\widehat{I}}_{i,j}\|\,\|{I}_{i,j}\|},$$
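A direct NumPy transcription of the SAM definition, assuming channel-first arrays of shape $(C, H, W)$ and a small epsilon added for numerical safety:

```python
import numpy as np

def sam(pred, ref, eps=1e-12):
    """Mean spectral angle (radians) between per-pixel spectral vectors
    of channel-first images pred and ref of shape (C, H, W)."""
    dot = (pred * ref).sum(axis=0)
    norms = np.linalg.norm(pred, axis=0) * np.linalg.norm(ref, axis=0)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)  # guard against rounding past ±1
    return float(np.arccos(cos).mean())
```

Because the angle depends only on spectral direction, rescaling every band by the same factor leaves SAM unchanged, which is why it isolates spectral (rather than radiometric) distortion.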
- PSNR is a commonly used image quality assessment method,$$\mathrm{PSNR}(\widehat{I},I)=10\log _{10}{\left(\frac{{\mathrm{MAX}}_{I}}{\mathrm{RMSE}(\widehat{I},I)}\right)}^{2},$$
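A sketch of PSNR for 11-bit imagery, taking ${\mathrm{MAX}}_{I}={2}^{11}-1=2047$ to match the bit depth stated in Section 4.1:

```python
import numpy as np

def psnr(pred, ref, max_val=2047.0):
    """PSNR in dB; max_val = 2**11 - 1 because both satellites record
    pixels with 11 bits."""
    rmse = np.sqrt(np.mean((np.asarray(pred, dtype=float) - ref) ** 2))
    return float(10.0 * np.log10((max_val / rmse) ** 2))
```

For a unit RMSE this reduces to $20\log _{10}(2047)\approx 66.2$ dB, the ceiling against which fusion errors are measured.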
- SCC is a spatial quality index. Denote ${\widehat{I}}_{c},{I}_{c}\in {\mathbb{R}}^{H\times W}$ as the c-th band of $\widehat{I}$ and I, respectively. Then,$$\mathrm{SCC}(\widehat{I},I)=\frac{1}{C}\sum _{c=1}^{C}\frac{\mathrm{Cov}({\widehat{I}}_{c},{I}_{c})}{\sigma \left({\widehat{I}}_{c}\right)\sigma \left({I}_{c}\right)},$$
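A band-by-band correlation sketch matching the formula above (note that SCC is often computed on high-pass-filtered images to emphasize spatial detail; the plain correlation version is shown here):

```python
import numpy as np

def scc(pred, ref):
    """Band-averaged correlation coefficient between channel-first
    images pred and ref of shape (C, H, W)."""
    vals = []
    for pc, rc in zip(pred, ref):
        pc = pc - pc.mean()  # center each band
        rc = rc - rc.mean()
        vals.append((pc * rc).mean() / (pc.std() * rc.std()))
    return float(np.mean(vals))
```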
- Q-index combines image luminance, contrast, and structure for quality assessment. After dividing ${\widehat{I}}_{c}$ and ${I}_{c}$ into B patch pairs ${\left\{({\widehat{p}}_{c}^{\left(i\right)},{p}_{c}^{\left(i\right)})\right\}}_{i=1}^{B}$, the Q-index is computed as follows:$$\mathrm{Q}(\widehat{I},I)=\frac{1}{C}\sum _{c=1}^{C}\frac{1}{B}\sum _{i=1}^{B}\frac{2\mu \left({\widehat{p}}_{c}^{\left(i\right)}\right)\mu \left({p}_{c}^{\left(i\right)}\right)}{{\mu}^{2}\left({\widehat{p}}_{c}^{\left(i\right)}\right)+{\mu}^{2}\left({p}_{c}^{\left(i\right)}\right)}\frac{2\sigma \left({\widehat{p}}_{c}^{\left(i\right)}\right)\sigma \left({p}_{c}^{\left(i\right)}\right)}{{\sigma}^{2}\left({\widehat{p}}_{c}^{\left(i\right)}\right)+{\sigma}^{2}\left({p}_{c}^{\left(i\right)}\right)}\frac{\mathrm{Cov}({\widehat{p}}_{c}^{\left(i\right)},{p}_{c}^{\left(i\right)})}{\sigma \left({\widehat{p}}_{c}^{\left(i\right)}\right)\sigma \left({p}_{c}^{\left(i\right)}\right)},$$
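A simplified sketch with one global patch per band ($B=1$); the sliding-window division into patches is omitted for brevity:

```python
import numpy as np

def q_index(pred, ref):
    """Q-index with one global patch per band (B = 1): the product of the
    luminance, contrast, and structure terms, averaged over bands."""
    vals = []
    for pc, rc in zip(pred, ref):
        mu1, mu2 = pc.mean(), rc.mean()
        s1, s2 = pc.std(), rc.std()
        cov = ((pc - mu1) * (rc - mu2)).mean()
        lum = 2 * mu1 * mu2 / (mu1 ** 2 + mu2 ** 2)   # luminance term
        con = 2 * s1 * s2 / (s1 ** 2 + s2 ** 2)       # contrast term
        stru = cov / (s1 * s2)                        # structure term
        vals.append(lum * con * stru)
    return float(np.mean(vals))
```

Each of the three factors attains its maximum of 1 only when the corresponding statistic matches, so $\mathrm{Q}=1$ if and only if the two images agree in mean, variance, and structure.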
- SSIM is a well-known image quality assessment method that extends the Q-index,$$\mathrm{SSIM}(\widehat{I},I)=\frac{1}{C}\sum _{c=1}^{C}\frac{1}{B}\sum _{i=1}^{B}\frac{2\mu \left({\widehat{p}}_{c}^{\left(i\right)}\right)\mu \left({p}_{c}^{\left(i\right)}\right)+{c}_{1}}{{\mu}^{2}\left({\widehat{p}}_{c}^{\left(i\right)}\right)+{\mu}^{2}\left({p}_{c}^{\left(i\right)}\right)+{c}_{1}}\frac{2\sigma \left({\widehat{p}}_{c}^{\left(i\right)}\right)\sigma \left({p}_{c}^{\left(i\right)}\right)+{c}_{2}}{{\sigma}^{2}\left({\widehat{p}}_{c}^{\left(i\right)}\right)+{\sigma}^{2}\left({p}_{c}^{\left(i\right)}\right)+{c}_{2}}\frac{\mathrm{Cov}({\widehat{p}}_{c}^{\left(i\right)},{p}_{c}^{\left(i\right)})+{c}_{3}}{\sigma \left({\widehat{p}}_{c}^{\left(i\right)}\right)\sigma \left({p}_{c}^{\left(i\right)}\right)+{c}_{3}},$$
- ERGAS is another common method of image quality assessment. Denote the spatial resolution ratio between MS images and the corresponding PAN images by r. Then,$$\mathrm{ERGAS}(\widehat{I},I)=100\times r\times \sqrt{\frac{1}{C}\sum _{c=1}^{C}{\left(\frac{\mathrm{RMSE}({\widehat{I}}_{c},{I}_{c})}{\mu \left({I}_{c}\right)}\right)}^{2}}.$$
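A sketch of ERGAS; the default $r=1/4$ is the MS-to-PAN resolution ratio for both QuickBird and IKONOS in this paper:

```python
import numpy as np

def ergas(pred, ref, r=0.25):
    """ERGAS = 100 * r * sqrt(mean_c (RMSE_c / mean(ref_c))^2) for
    channel-first images of shape (C, H, W)."""
    terms = []
    for pc, rc in zip(pred, ref):
        rmse = np.sqrt(np.mean((pc - rc) ** 2))
        terms.append((rmse / rc.mean()) ** 2)
    return float(100.0 * r * np.sqrt(np.mean(terms)))
```

Normalizing each band's RMSE by its mean makes the index insensitive to inter-band radiometric differences; lower values indicate better fusion, with 0 for a perfect match.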
- QNR is a no-reference method for image quality assessment. It consists of a spectral distortion index ${D}_{\lambda}$, and a spatial distortion index ${D}_{s}$. Here, denote an LRMS image with C spectral bands as ${I}^{\mathrm{LRMS}}$, the corresponding generated HRMS image as ${I}^{\mathrm{HRMS}}$, PAN image with only one spectral band as ${I}^{\mathrm{PAN}}$, and its degraded counterpart as ${I}^{\mathrm{LRPAN}}$, then$$\begin{array}{cc}\hfill {D}_{\lambda}=& {\left(\frac{2}{C(C-1)}\sum _{c=1}^{C}\sum _{{c}^{\prime}>c}^{C}{|Q({I}_{c}^{\mathrm{HRMS}},{I}_{{c}^{\prime}}^{\mathrm{HRMS}})-Q({I}_{c}^{\mathrm{LRMS}},{I}_{{c}^{\prime}}^{\mathrm{LRMS}})|}^{u}\right)}^{\frac{1}{u}},\hfill \end{array}$$$$\begin{array}{cc}\hfill {D}_{s}=& {\left(\frac{1}{C}\sum _{c=1}^{C}{|Q({I}_{c}^{\mathrm{HRMS}},{I}^{\mathrm{PAN}})-Q({I}_{c}^{\mathrm{LRMS}},{I}^{\mathrm{LRPAN}})|}^{v}\right)}^{\frac{1}{v}},\hfill \end{array}$$$$\begin{array}{cc}\hfill \mathrm{QNR}=& {(1-{D}_{\lambda})}^{a}{(1-{D}_{s})}^{b},\hfill \end{array}$$
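A sketch of QNR with $u=v=a=b=1$, using the equivalent single-factor form of the Q-index and one global patch per band (patch-wise computation is omitted for brevity):

```python
import numpy as np

def q_band(a, b):
    """Single-band universal quality index Q, one global patch:
    Q = 4*Cov*mu_a*mu_b / ((var_a + var_b) * (mu_a^2 + mu_b^2))."""
    mu1, mu2 = a.mean(), b.mean()
    cov = ((a - mu1) * (b - mu2)).mean()
    return (4 * cov * mu1 * mu2) / ((a.var() + b.var()) * (mu1 ** 2 + mu2 ** 2))

def qnr(hrms, lrms, pan, lrpan, u=1, v=1, a=1, b=1):
    """QNR = (1 - D_lambda)^a * (1 - D_s)^b, all images channel-first."""
    C = hrms.shape[0]
    # D_lambda: inter-band Q consistency between the two scales
    d_lam = sum(abs(q_band(hrms[c], hrms[cp]) - q_band(lrms[c], lrms[cp])) ** u
                for c in range(C) for cp in range(c + 1, C))
    d_lam = (2.0 / (C * (C - 1)) * d_lam) ** (1.0 / u)
    # D_s: MS-to-PAN Q consistency between the two scales
    d_s = (sum(abs(q_band(hrms[c], pan) - q_band(lrms[c], lrpan)) ** v
               for c in range(C)) / C) ** (1.0 / v)
    return (1 - d_lam) ** a * (1 - d_s) ** b
```

QNR needs no HRMS reference, which is exactly why it suits the full-scale, unsupervised setting of Table 2: a value near 1 means the fused product preserves both the spectral relations of the LRMS input and the spatial relations of the PAN input.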

#### 4.3. Model Evaluation with Different Settings

#### 4.4. Generalization: Generator

#### 4.5. Generalization: Dataset

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A

**Figure A1.** Noisy images and the corresponding image quality assessment scores. A greater level value means stronger noise.

## References

- Carper, W.; Lillesand, T.; Kiefer, R. The use of intensity-hue-saturation transformations for merging SPOT panchromatic and multispectral image data. Photogramm. Eng. Remote Sens.
**1990**, 56, 459–467. [Google Scholar] - Gillespie, A.R.; Kahle, A.B.; Walker, R.E. Color enhancement of highly correlated images. II. Channel ratio and “chromaticity” transformation techniques. Remote Sens. Environ.
**1987**, 22, 343–365. [Google Scholar] [CrossRef] - Zhang, Y.; Hong, G. An IHS and wavelet integrated approach to improve pan-sharpening visual quality of natural colour IKONOS and QuickBird images. Inf. Fusion
**2005**, 6, 225–234. [Google Scholar] [CrossRef] - Khan, M.M.; Chanussot, J.; Condat, L.; Montanvert, A. Indusion: Fusion of Multispectral and Panchromatic Images Using the Induction Scaling Technique. IEEE Geosci. Remote Sens. Lett.
**2008**, 5, 98–102. [Google Scholar] [CrossRef] [Green Version] - Otazu, X.; González-Audícana, M.; Fors, O.; Núñez, J. Introduction of sensor spectral response into image fusion methods. Application to wavelet-based methods. IEEE Trans. Geosci. Remote Sens.
**2005**, 43, 2376–2385. [Google Scholar] [CrossRef] [Green Version] - Guo, M.; Zhang, H.; Li, J.; Zhang, L.; Shen, H. An Online Coupled Dictionary Learning Approach for Remote Sensing Image Fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.
**2014**, 7, 1284–1294. [Google Scholar] [CrossRef] - Zhu, X.; Bamler, R. A Sparse Image Fusion Algorithm With Application to Pan-Sharpening. IEEE Trans. Geosci. Remote Sens.
**2013**, 51, 2827–2836. [Google Scholar] [CrossRef] - Ghassemian, H. A review of remote sensing image fusion methods. Inf. Fusion
**2016**, 32, 75–89. [Google Scholar] [CrossRef] - Meng, X.; Shen, H.; Li, H.; Zhang, L.; Fu, R. Review of the pansharpening methods for remote sensing images based on the idea of meta-analysis: Practical discussion and challenges. Inf. Fusion
**2019**, 46, 102–113. [Google Scholar] [CrossRef] - Dong, C.; Loy, C.C.; He, K.; Tang, X. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell.
**2016**, 38, 295–307. [Google Scholar] [CrossRef] [PubMed] [Green Version] - Garzelli, A. A Review of Image Fusion Algorithms Based on the Super-Resolution Paradigm. Remote Sens.
**2016**, 8, 797. [Google Scholar] [CrossRef] [Green Version] - Ledig, C.; Theis, L.; Huszar, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.P.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 105–114. [Google Scholar] [CrossRef] [Green Version]
- Lu, T.; Wang, J.; Zhang, Y.; Wang, Z.; Jiang, J. Satellite Image Super-Resolution via Multi-Scale Residual Deep Neural Network. Remote Sens.
**2019**, 11, 1588. [Google Scholar] [CrossRef] [Green Version] - Masi, G.; Cozzolino, D.; Verdoliva, L.; Scarpa, G. Pansharpening by Convolutional Neural Networks. Remote Sens.
**2016**, 8, 594. [Google Scholar] [CrossRef] [Green Version] - Shao, Z.; Cai, J. Remote Sensing Image Fusion With Deep Convolutional Neural Network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.
**2018**, 11, 1656–1669. [Google Scholar] [CrossRef] - Yang, J.; Fu, X.; Hu, Y.; Huang, Y.; Ding, X.; Paisley, J.W. PanNet: A Deep Network Architecture for Pan-Sharpening. In Proceedings of the IEEE International Conference on Computer Vision ICCV 2017, Venice, Italy, 22–29 October 2017; pp. 1753–1761. [Google Scholar] [CrossRef]
- Liu, X.; Wang, Y.; Liu, Q. Psgan: A Generative Adversarial Network for Remote Sensing Image Pan-Sharpening. In Proceedings of the 2018 IEEE International Conference on Image Processing, ICIP 2018, Athens, Greece, 7–10 October 2018; pp. 873–877. [Google Scholar] [CrossRef] [Green Version]
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.C.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
- Hong, D.; Yokoya, N.; Chanussot, J.; Zhu, X.X. CoSpace: Common Subspace Learning from Hyperspectral-Multispectral Correspondences. IEEE Trans. Geosci. Remote Sens.
**2019**, 57, 4349–4359. [Google Scholar] [CrossRef] [Green Version] - Yao, J.; Meng, D.; Zhao, Q.; Cao, W.; Xu, Z. Nonconvex-Sparsity and Nonlocal-Smoothness-Based Blind Hyperspectral Unmixing. IEEE Trans. Image Process.
**2019**, 28, 2991–3006. [Google Scholar] [CrossRef] - Hong, D.; Yokoya, N.; Chanussot, J.; Zhu, X.X. An Augmented Linear Mixing Model to Address Spectral Variability for Hyperspectral Unmixing. IEEE Trans. Image Process.
**2019**, 28, 1923–1938. [Google Scholar] [CrossRef] [Green Version] - LeCun, Y.; Bengio, Y.; Hinton, G.E. Deep learning. Nature
**2015**, 521, 436–444. [Google Scholar] [CrossRef] [PubMed] - Hong, D.; Yokoya, N.; Xia, G.; Chanussot, J.; Zhu, X. X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data. arXiv
**2020**, arXiv:2006.13806. [Google Scholar] - Hong, D.; Yokoya, N.; Ge, N.; Chanussot, J.; Zhu, X.X. Learnable manifold alignment (LeMA): A semi-supervised cross-modality learning framework for land cover and land use classification. ISPRS J. Photogramm. Remote Sens.
**2019**, 147, 193–205. [Google Scholar] [CrossRef] - Hong, D.; Wu, X.; Ghamisi, P.; Chanussot, J.; Yokoya, N.; Zhu, X.X. Invariant attribute profiles: A spatial-frequency joint feature extractor for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens.
**2020**, 58, 3791–3808. [Google Scholar] [CrossRef] [Green Version] - Wald, L.; Ranchin, T.; Mangolini, M. Fusion of satellite images of different spatial resolutions: Assessing the quality of resulting images. Photogramm. Eng. Remote Sens.
**1997**, 63, 691–699. [Google Scholar] - Hinton, G.E.; Osindero, S.; Teh, Y.W. A Fast Learning Algorithm for Deep Belief Nets. Neural Comput.
**2006**, 18, 1527–1554. [Google Scholar] [CrossRef] - Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. J. Mach. Learn. Res.
**2010**, 11, 3371–3408. [Google Scholar] - Bengio, Y.; Lamblin, P.; Popovici, D.; Larochelle, H. Greedy Layer-Wise Training of Deep Networks. In Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 4–7 December 2006; pp. 153–160. [Google Scholar]
- Erhan, D.; Bengio, Y.; Courville, A.C.; Manzagol, P.; Vincent, P.; Bengio, S. Why Does Unsupervised Pre-training Help Deep Learning? J. Mach. Learn. Res.
**2010**, 11, 625–660. [Google Scholar] - Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Li, F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar] [CrossRef] [Green Version]
- Lin, T.; Maire, M.; Belongie, S.J.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014—13th European Conference, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar] [CrossRef] [Green Version]
- Zhou, B.; Lapedriza, À.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 Million Image Database for Scene Recognition. IEEE Trans. Pattern Anal. Mach. Intell.
**2018**, 40, 1452–1464. [Google Scholar] [CrossRef] [Green Version] - Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems 2012, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1106–1114. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef] [Green Version]
- Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar] [CrossRef] [Green Version]
- Rasti, B.; Hong, D.; Hang, R.; Ghamisi, P.; Kang, X.; Chanussot, J.; Benediktsson, J.A. Feature Extraction for Hyperspectral Imagery: The Evolution from Shallow to Deep (Overview and Toolbox). IEEE Geosci. Remote Sens. Mag.
**2020**. [Google Scholar] [CrossRef] - Girshick, R.B.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef] [Green Version]
- Redmon, J.; Divvala, S.K.; Girshick, R.B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef] [Green Version]
- Wu, X.; Hong, D.; Chanussot, J.; Tao, R.; Wang, Y. Fourier-Based Rotation-Invariant Feature Boosting: An Efficient Framework for Geospatial Object Detection. IEEE Geosci. Remote Sens. Lett.
**2020**, 17, 302–306. [Google Scholar] [CrossRef] [Green Version] - Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell.
**2017**, 39, 640–651. [Google Scholar] [CrossRef] - Chen, L.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell.
**2018**, 40, 834–848. [Google Scholar] [CrossRef] - Gao, L.; Hong, D.; Yao, J.; Zhang, B.; Gamba, P.; Chanussot, J. Spectral Superresolution of Multispectral Imagery with Joint Sparse and Low-Rank Learning. IEEE Trans. Geosci. Remote Sens.
**2020**. [Google Scholar] [CrossRef] - He, K.; Girshick, R.B.; Dollár, P. Rethinking ImageNet Pre-training. arXiv
**2018**, arXiv:1811.08883. [Google Scholar] - Kornblith, S.; Shlens, J.; Le, Q.V. Do Better ImageNet Models Transfer Better? arXiv
**2018**, arXiv:1805.08974. [Google Scholar] - Hendrycks, D.; Lee, K.; Mazeika, M. Using Pre-Training Can Improve Model Robustness and Uncertainty. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA, 9–15 June 2019; pp. 2712–2721. [Google Scholar]
- Yosinski, J.; Clune, J.; Nguyen, A.M.; Fuchs, T.J.; Lipson, H. Understanding Neural Networks Through Deep Visualization. arXiv
**2015**, arXiv:1506.06579. [Google Scholar] - Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. In Proceedings of the Computer Vision—ECCV 2016—14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Berlin, Germany, 2016; Volume 9906, pp. 694–711. [Google Scholar] [CrossRef] [Green Version]
- Larsen, A.B.L.; Sønderby, S.K.; Larochelle, H.; Winther, O. Autoencoding beyond pixels using a learned similarity metric. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York, NY, USA, 19–24 June 2016; pp. 1558–1566. [Google Scholar]
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Loy, C.C. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In Proceedings of the Computer Vision—ECCV 2018 Workshops, Munich, Germany, 8–14 September 2018; pp. 63–79. [Google Scholar] [CrossRef] [Green Version]
- Jolicoeur-Martineau, A. The relativistic discriminator: a key element missing from standard GAN. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Blau, Y.; Mechrez, R.; Timofte, R.; Michaeli, T.; Zelnik-Manor, L. The 2018 PIRM Challenge on Perceptual Image Super-Resolution. In Proceedings of the Computer Vision—ECCV 2018 Workshops, Munich, Germany, 8–14 September 2018; pp. 334–355. [Google Scholar] [CrossRef] [Green Version]
- Ulyanov, D.; Vedaldi, A.; Lempitsky, V.S. Instance Normalization: The Missing Ingredient for Fast Stylization. arXiv
**2016**, arXiv:1607.08022. [Google Scholar] - Huang, X.; Belongie, S.J. Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. In Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29 October 2017; pp. 1510–1519. [Google Scholar] [CrossRef] [Green Version]
- Goodfellow, I.J.; Bengio, Y.; Courville, A.C. Deep Learning; Adaptive Computation and Machine Learning; MIT Press: Cambridge, MA, USA, 2016; pp. 499–523. [Google Scholar]
- Isola, P.; Zhu, J.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 5967–5976. [Google Scholar] [CrossRef] [Green Version]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Li, S.; Yin, H.; Fang, L. Remote Sensing Image Fusion via Sparse Representations Over Learned Dictionaries. IEEE Trans. Geosci. Remote Sens.
**2013**, 51, 4779–4789. [Google Scholar] [CrossRef] - Hong, D.; Zhu, X. SULoRA: Subspace Unmixing with Low-Rank Attribute Embedding for Hyperspectral Data Analysis. IEEE J. Sel. Top. Signal Process.
**2018**, 12, 1351–1363. [Google Scholar] [CrossRef] - Mishkin, D.; Matas, J. All you need is a good init. In Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar] [CrossRef] [Green Version]
- Aiazzi, B.; Alparone, L.; Baronti, S.; Garzelli, A.; Selva, M. MTF-tailored multiscale fusion of high-resolution MS and Pan imagery. Photogramm. Eng. Remote Sens.
**2006**, 72, 591–596. [Google Scholar] [CrossRef] - Vivone, G.; Alparone, L.; Chanussot, J.; Mura, M.D.; Garzelli, A.; Licciardi, G.; Restaino, R.; Wald, L. A Critical Comparison Among Pansharpening Algorithms. IEEE Trans. Geosci. Remote Sens.
**2015**, 53, 2565–2586. [Google Scholar] [CrossRef] - Li, Z.; Leung, H. Fusion of Multispectral and Panchromatic Images Using a Restoration-Based Method. IEEE Trans. Geosci. Remote Sens.
**2009**, 47, 1482–1491. [Google Scholar] [CrossRef] - Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. In Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
- Salimans, T.; Goodfellow, I.J.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved Techniques for Training GANs. In Proceedings of the Annual Conference on Neural Information Processing Systems 2016, Barcelona, Spain, 5–10 December 2016; pp. 2226–2234. [Google Scholar]
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Proceedings of the Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017; pp. 6626–6637. [Google Scholar]
- Wang, L.; Sousa, W.P.; Gong, P.; Biging, G.S. Comparison of IKONOS and QuickBird images for mapping mangrove species on the Caribbean coast of Panama. Remote Sens. Environ.
**2004**, 91, 432–440. [Google Scholar] [CrossRef] - Parente, C.; Santamaria, R. Increasing geometric resolution of data supplied by quickbird multispectral sensors. Sens. Transducers
**2013**, 156, 111. [Google Scholar] - Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019; Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R., Eds.; pp. 8024–8035. [Google Scholar]
- Yuhas, R.H.; Goetz, A.F.; Boardman, J.W. Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. In Proceedings of the Summaries 3rd Annual JPL Airborne Geoscience Workshop, 1992, Pasadena, CA, USA, 1–5 June 1992; Jet Propulsion Laboratory Publication: Pasadena, CA, USA, 1992; Volume 1, pp. 147–149. [Google Scholar]
- Zhou, J.; Civco, D.; Silander, J. A wavelet transform method to merge Landsat TM and SPOT panchromatic data. Int. J. Remote Sens.
**1998**, 19, 743–757. [Google Scholar] [CrossRef] - Wang, Z.; Bovik, A.C. A universal image quality index. IEEE Signal Process. Lett.
**2002**, 9, 81–84. [Google Scholar] [CrossRef] - Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process.
**2004**, 13, 600–612. [Google Scholar] [CrossRef] [Green Version] - Ranchin, T.; Wald, L. Fusion of high spatial and spectral resolution images: the ARSIS concept and its implementation. Photogramm. Eng. Remote Sens.
**2000**, 66, 49–61. [Google Scholar] - Alparone, L.; Aiazzi, B.; Baronti, S.; Garzelli, A.; Nencini, F.; Selva, M. Multispectral and panchromatic data fusion assessment without reference. Photogramm. Eng. Remote Sens.
**2008**, 74, 193–200. [Google Scholar] [CrossRef] [Green Version]

**Figure 1.** Different perspectives to train pan-sharpening models. Left: traditional supervised perspective; Right: proposed unsupervised perspective.

**Figure 2.** The generator of ESRGAN. Content in the red boxes shows an example of the adaptations made when taking as input an MS image with four bands.

**Figure 3.** The structure of the proposed PercepPan. G, R, and D denote Generator, Reconstructor, and Discriminator, respectively.

**Figure 5.** The score trend of different indexes with respect to the level of noise. On the x-axis, a greater number means a higher noise level.

**Figure 6.** Fused results of two randomly selected samples from the QuickBird test set. From left to right are the original LRMS/PAN images, results of PNN, RSIFNN, PanNet, PSGAN, and PercepPan with the ESRGAN generator, respectively.

**Figure 7.** Fused results of two randomly selected samples from the IKONOS test set. From left to right are the original LRMS/PAN images, results of PNN, RSIFNN, PanNet, PSGAN, and PercepPan with the ESRGAN generator, respectively.

**Table 1.** Nyquist cutoff frequencies (Nyquist) and linear weights (Weight) of different satellites for each spectral band.

| Satellite | Item | Blue | Green | Red | Near Infrared |
|---|---|---|---|---|---|
| QuickBird | Nyquist | 0.34 | 0.32 | 0.30 | 0.22 |
| QuickBird | Weight | 0.1139 | 0.2315 | 0.2308 | 0.4239 |
| IKONOS | Nyquist | 0.26 | 0.28 | 0.29 | 0.28 |
| IKONOS | Weight | 0.1071 | 0.2646 | 0.2696 | 0.3587 |
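The linear weights in Table 1 model the PAN band as a weighted combination of the MS bands, the kind of spectral observation model a reconstructor can imitate when mapping an HRMS image back to a PAN image. A hedged sketch using the QuickBird weights (the helper name and `(H, W, 4)` band layout are assumptions for illustration):

```python
import numpy as np

# Per-band linear weights for QuickBird from Table 1
# (blue, green, red, near-infrared).
QB_WEIGHTS = np.array([0.1139, 0.2315, 0.2308, 0.4239])

def synthesize_pan(ms, weights=QB_WEIGHTS):
    """Approximate a PAN image as a weighted sum of the MS bands.

    ms: array of shape (H, W, 4); returns an (H, W) array."""
    return ms @ weights  # matmul over the last (band) axis
```

Note that the weights sum to roughly 1, so a spatially flat MS patch maps to a PAN patch of about the same intensity.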

**Table 2.** Data configuration, training type, and quality assessment under the two training paradigms.

| Dataset | Training Paradigm | HRMS Patches | LRMS Patches | PAN Patches | Training Type | Quality Assessment |
|---|---|---|---|---|---|---|
| Full-scale | SPUF | None | Original MS patches | Original PAN patches | Unsupervised | No-reference |
| Reduced-scale | SPSF | Original MS patches | Degraded MS patches | Degraded PAN patches | Supervised | Full-reference |

**Table 3.** Spectral and spatial characteristics of the QuickBird and IKONOS satellites.

| Satellite | Blue (nm) | Green (nm) | Red (nm) | Near Infrared (nm) | Panchromatic (nm) | MS Resolution (m) | PAN Resolution (m) |
|---|---|---|---|---|---|---|---|
| QuickBird | 450–520 | 520–600 | 630–690 | 780–900 | 450–900 | 2.8 | 0.7 |
| IKONOS | 445–516 | 506–595 | 632–698 | 757–853 | 450–900 | 4 | 1 |

**Table 4.** Patch sizes and split statistics of the full-scale and reduced-scale datasets.

| Satellite | Dataset Type | MS Patch Size | PAN Patch Size | #Training | #Validation | #Test |
|---|---|---|---|---|---|---|
| QuickBird | Full-scale | 64 × 64 | 256 × 256 | 13,494 | 4498 | 4499 |
| QuickBird | Reduced-scale | 64 × 64 | 256 × 256 | 820 | 274 | 274 |
| IKONOS | Full-scale | 64 × 64 | 256 × 256 | 1574 | 525 | 525 |
| IKONOS | Reduced-scale | 64 × 64 | 256 × 256 | 90 | 30 | 30 |
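Reduced-scale data are obtained by degrading the original MS and PAN patches by the MS/PAN resolution ratio (4 for both satellites), so that the original MS patches can serve as HRMS references. A naive block-averaging sketch of such a degradation (the actual pipeline may instead use MTF-matched filters built from the Nyquist gains in Table 1; the function name is illustrative):

```python
import numpy as np

def degrade(img, ratio=4):
    """Downsample an (H, W) or (H, W, B) image by block averaging.

    Crops to a multiple of `ratio`, then averages each
    ratio-by-ratio block. A crude stand-in for MTF filtering."""
    h, w = img.shape[:2]
    h, w = h - h % ratio, w - w % ratio
    img = img[:h, :w]
    blocks = img.reshape(h // ratio, ratio, w // ratio, ratio, -1)
    return blocks.mean(axis=(1, 3)).squeeze()
```

Under this protocol a 256 × 256 PAN patch degrades to 64 × 64, matching the original MS patch size in Table 4.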

**Table 5.** Quality assessment of the proposed PercepPan under different settings on the QuickBird dataset. “—” means the corresponding entry is invalid; the ideal value of each index is shown in parentheses, and the best results are in bold. For ${\mathbf{D}}_{\mathit{\lambda}}$, ${\mathbf{D}}_{\mathbf{s}}$, and QNR, the two slash-separated values correspond to the reduced-scale and full-scale settings, respectively.

| Initialization | $\alpha$ | $\beta$ | $\gamma$ | ${\eta}_{G}$ | ${\eta}_{D}$ | SAM (0) | PSNR (∞) | SCC (1) | Q-index (1) | SSIM (1) | ERGAS (0) | ${\mathbf{D}}_{\mathit{\lambda}}$ (0) | ${\mathbf{D}}_{\mathbf{s}}$ (0) | QNR (1) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random | 1 | 0 | 0 | 1 × 10^{−4} | — | 0.121 | 33.494 | 0.656 | 0.381 | 0.891 | 4.938 | 0.148/0.160 | 0.225/0.151 | 0.660/0.714 |
| Random | 1 | 0 | 0 | 1 × 10^{−5} | — | 0.125 | 33.051 | 0.631 | 0.347 | 0.871 | 5.113 | 0.136/0.151 | 0.299/0.180 | 0.605/0.697 |
| Random | 0 | 1 | 0.01 | 1 × 10^{−4} | 1 × 10^{−4} | 0.137 | 31.505 | 0.731 | 0.441 | 0.888 | 10.035 | 0.167/0.135 | 0.176/0.145 | 0.686/0.740 |
| Random | 0 | 1 | 0.01 | 1 × 10^{−4} | 1 × 10^{−5} | 0.118 | 35.023 | 0.745 | **0.495** | **0.919** | 6.641 | 0.156/0.158 | 0.164/**0.101** | 0.705/**0.756** |
| Random | 0 | 1 | 0.01 | 1 × 10^{−5} | 1 × 10^{−4} | 0.131 | 32.830 | 0.674 | 0.424 | 0.904 | 6.034 | 0.191/0.137 | 0.162/0.144 | 0.678/0.738 |
| Random | 1 | 1 | 0.01 | 1 × 10^{−4} | 1 × 10^{−4} | **0.098** | 35.628 | 0.704 | 0.424 | 0.909 | 4.179 | 0.192/0.152 | 0.186/0.141 | 0.658/0.729 |
| Random | 1 | 1 | 0.01 | 1 × 10^{−4} | 1 × 10^{−5} | 0.107 | **35.651** | **0.748** | 0.453 | 0.912 | **4.009** | **0.131**/**0.118** | **0.129**/0.168 | **0.756**/0.734 |
| Random | 1 | 1 | 0.01 | 1 × 10^{−5} | 1 × 10^{−4} | 0.124 | 34.128 | 0.701 | 0.425 | 0.902 | 5.197 | 0.173/0.141 | 0.174/0.153 | 0.684/0.727 |
| PSNR | 1 | 0 | 0 | 1 × 10^{−4} | — | 0.116 | 34.505 | 0.806 | 0.459 | 0.902 | 4.400 | 0.155/0.142 | 0.237/0.148 | 0.644/0.731 |
| PSNR | 1 | 0 | 0 | 1 × 10^{−5} | — | 0.213 | 29.682 | 0.607 | 0.371 | 0.872 | 8.544 | 0.246/0.147 | 0.262/0.129 | 0.556/0.743 |
| PSNR | 0 | 1 | 0.01 | 1 × 10^{−4} | 1 × 10^{−4} | 0.112 | 34.213 | 0.755 | 0.467 | 0.921 | 8.800 | 0.146/0.147 | 0.139/**0.114** | 0.735/0.756 |
| PSNR | 0 | 1 | 0.01 | 1 × 10^{−4} | 1 × 10^{−5} | **0.087** | **36.491** | 0.783 | **0.504** | 0.925 | 6.471 | 0.132/0.131 | **0.133**/0.130 | **0.752**/0.756 |
| PSNR | 0 | 1 | 0.01 | 1 × 10^{−5} | 1 × 10^{−4} | 0.122 | 34.336 | 0.748 | 0.464 | 0.919 | 8.066 | 0.149/0.157 | 0.152/0.164 | 0.721/0.704 |
| PSNR | 1 | 1 | 0.01 | 1 × 10^{−4} | 1 × 10^{−4} | 0.109 | 35.969 | 0.765 | 0.471 | 0.917 | 4.309 | 0.170/0.131 | 0.177/0.116 | 0.683/**0.768** |
| PSNR | 1 | 1 | 0.01 | 1 × 10^{−4} | 1 × 10^{−5} | 0.108 | 36.080 | **0.814** | 0.473 | **0.935** | 4.190 | **0.128**/0.138 | 0.140/0.117 | 0.750/0.762 |
| PSNR | 1 | 1 | 0.01 | 1 × 10^{−5} | 1 × 10^{−4} | 0.120 | 35.444 | 0.771 | 0.461 | 0.916 | **4.090** | 0.143/**0.125** | 0.152/0.127 | 0.727/0.764 |
| ESRGAN | 1 | 0 | 0 | 1 × 10^{−4} | — | 0.118 | 33.949 | 0.676 | 0.400 | 0.897 | 4.596 | 0.151/0.162 | 0.230/0.161 | 0.654/0.703 |
| ESRGAN | 1 | 0 | 0 | 1 × 10^{−5} | — | 0.121 | 33.036 | 0.619 | 0.328 | 0.872 | 5.084 | **0.123**/0.144 | 0.262/0.148 | 0.647/0.730 |
| ESRGAN | 0 | 1 | 0.01 | 1 × 10^{−4} | 1 × 10^{−4} | 0.128 | 30.959 | 0.744 | 0.452 | 0.900 | 9.092 | 0.143/0.130 | 0.143/0.127 | 0.734/0.760 |
| ESRGAN | 0 | 1 | 0.01 | 1 × 10^{−4} | 1 × 10^{−5} | 0.104 | 35.753 | 0.762 | **0.487** | 0.922 | 6.866 | 0.174/**0.119** | 0.168/0.129 | 0.687/**0.768** |
| ESRGAN | 0 | 1 | 0.01 | 1 × 10^{−5} | 1 × 10^{−4} | 0.130 | 33.509 | 0.686 | 0.419 | 0.907 | 5.613 | 0.179/0.152 | 0.139/**0.110** | 0.707/0.754 |
| ESRGAN | 1 | 1 | 0.01 | 1 × 10^{−4} | 1 × 10^{−4} | 0.108 | 35.107 | 0.684 | 0.429 | 0.912 | 5.003 | 0.148/0.122 | 0.149/0.139 | 0.725/0.756 |
| ESRGAN | 1 | 1 | 0.01 | 1 × 10^{−4} | 1 × 10^{−5} | **0.081** | **36.525** | **0.761** | 0.451 | **0.926** | **3.479** | 0.132/0.151 | **0.128**/0.111 | **0.757**/0.754 |
| ESRGAN | 1 | 1 | 0.01 | 1 × 10^{−5} | 1 × 10^{−4} | 0.117 | 34.322 | 0.656 | 0.416 | 0.915 | 4.907 | 0.186/0.123 | 0.188/0.127 | 0.660/0.766 |
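The no-reference QNR index reported above combines the spectral distortion ${\mathbf{D}}_{\mathit{\lambda}}$ and spatial distortion ${\mathbf{D}}_{\mathbf{s}}$ of Alparone et al.; with the usual exponents it is simply the product of their complements. A one-line sketch (exponents assumed to be 1, as is common):

```python
def qnr(d_lambda, d_s, alpha=1.0, beta=1.0):
    """Quality with No Reference: combines spectral distortion
    D_lambda and spatial distortion D_s. 1 is ideal, 0 is worst."""
    return (1.0 - d_lambda) ** alpha * (1.0 - d_s) ** beta
```

As a sanity check against the tables, `qnr(0.132, 0.128)` gives about 0.757, which matches the corresponding PercepPan QNR entry.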

**Table 6.** Quality assessment of different methods under the supervised/unsupervised settings on the QuickBird dataset. The ideal value of each index is shown in parentheses, and the best results are in bold.

| Methods | SAM (0) | PSNR (∞) | SCC (1) | Q-index (1) | SSIM (1) | ERGAS (0) | ${\mathbf{D}}_{\mathit{\lambda}}$ (0) | ${\mathbf{D}}_{\mathbf{s}}$ (0) | QNR (1) |
|---|---|---|---|---|---|---|---|---|---|
| PNN [14] | 0.108 | 35.225 | 0.814 | 0.217 | 0.871 | 3.861 | 0.158/**0.122** | 0.183/**0.149** | 0.688/**0.747** |
| RSIFNN [15] | 0.081 | 37.898 | 0.835 | 0.445 | 0.913 | 6.282 | 0.081/**0.068** | **0.125**/0.129 | 0.805/**0.812** |
| PanNet [16] | 0.081 | 37.910 | 0.835 | 0.444 | 0.912 | 6.279 | 0.104/**0.072** | **0.112**/0.130 | 0.794/**0.808** |
| PSGAN [17] | 0.105 | 35.458 | 0.740 | 0.463 | 0.922 | 4.127 | **0.123**/0.131 | 0.112/**0.095** | 0.779/**0.787** |
| PercepPan | 0.081 | 36.525 | 0.761 | 0.451 | 0.926 | 3.479 | **0.132**/0.151 | 0.128/**0.111** | **0.757**/0.754 |

**Table 7.** Generalization performance of different methods on the IKONOS dataset. The ideal value of each index is shown in parentheses, and the best results are in bold.

| Methods | SAM (0) | PSNR (∞) | SCC (1) | Q-index (1) | SSIM (1) | ERGAS (0) | ${\mathbf{D}}_{\mathit{\lambda}}$ (0) | ${\mathbf{D}}_{\mathbf{s}}$ (0) | QNR (1) |
|---|---|---|---|---|---|---|---|---|---|
| PNN [14] | 0.437 | 28.804 | 0.676 | 0.223 | 0.784 | 8.939 | 0.182/**0.112** | 0.284/**0.234** | 0.585/**0.679** |
| RSIFNN [15] | 0.352 | 31.898 | 0.721 | 0.366 | 0.919 | 11.758 | **0.055**/0.097 | 0.192/**0.136** | 0.764/**0.781** |
| PanNet [16] | 0.357 | 30.969 | 0.701 | 0.326 | 0.901 | 11.088 | **0.096**/0.109 | 0.218/**0.130** | 0.708/**0.774** |
| PSGAN [17] | 0.292 | 28.065 | 0.609 | 0.455 | 0.751 | 9.813 | **0.146**/0.153 | 0.224/**0.162** | 0.663/**0.710** |
| PercepPan | 0.314 | 30.836 | 0.696 | 0.252 | 0.797 | 8.639 | **0.171**/0.185 | 0.202/**0.188** | 0.660/**0.662** |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Zhou, C.; Zhang, J.; Liu, J.; Zhang, C.; Fei, R.; Xu, S.
PercepPan: Towards Unsupervised Pan-Sharpening Based on Perceptual Loss. *Remote Sens.* **2020**, *12*, 2318.
https://doi.org/10.3390/rs12142318
