Article
Peer-Review Record

Shallow-to-Deep Spatial–Spectral Feature Enhancement for Hyperspectral Image Classification

Remote Sens. 2023, 15(1), 261; https://doi.org/10.3390/rs15010261
by Lijian Zhou 1, Xiaoyu Ma 1, Xiliang Wang 1, Siyuan Hao 1, Yuanxin Ye 2 and Kun Zhao 1,*
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 12 December 2022 / Revised: 24 December 2022 / Accepted: 28 December 2022 / Published: 1 January 2023
(This article belongs to the Section Remote Sensing Image Processing)

Round 1

Reviewer 1 Report (Previous Reviewer 1)

Review of

 

Shallow-to-Deep Spatial-Spectral Feature Enhancement for Hyperspectral Image Classification

 

by Lijian Zhou, Xiaoyu Ma, Siyuan Hao, Yuanxin Ye, Kun Zhao

 

Overview

 

This paper presents a hyperspectral image classification algorithm based on a so-called shallow-to-deep feature enhancement model that integrates several modules built on convolutional neural networks and a vision transformer. The authors present their algorithm step by step, from principal component analysis as a preprocessing step, through a 2-layer 3D-CNN-based Shallow Spatial-Spectral Feature Extraction module and a Res-SEConv module, to a vision transformer-based module that extracts the spatial-spectral features. The performance of the presented algorithm is tested on three well-known hyperspectral datasets: Indian Pines, Pavia University, and Salinas. According to the presented results, the algorithm outperforms the standard algorithms and neural networks used for hyperspectral classification. The disadvantage of the presented approach is its time cost and memory requirements.

 

 

Comments

 

The authors have done a decent job revising the paper. The previous suggestions and questions have been fully addressed and answered. I have only a few additional, mostly minor, comments on the current state of the paper. 

 

  1. There are several newly included references on the topic, but many other state-of-the-art works involving CNNs and other deep networks could still be added.
  2. Lines 32-34: “Since the PCA method can extract the most important spectral components for classification, it is chosen to reduce the spectral information redundancy and dimension in this paper.” In my view, this sentence should be moved farther into the introduction, to where the authors’ method is described (e.g., line 85).
  3. Line 306: The P_e expression should be placed in a separate equation rather than within the text, as should the N expression.
  4. Tables 10-12: In what units, and how, are the classification accuracies assessed for the specific classes?
  5. Tables 10-12: The units should be added. Moreover, consider separating the OA, AA, and Kappa results, at least by a horizontal line, because they do not correspond to the class-name header of the table.
  6. Fig. 15: Please consider placing the legend in a different position to uncover the curves hidden below it.
  7. Table 13: For clearer presentation, the word “Proposed” could be placed in parentheses, preceded by the name of your method, “SDFE”.
  8. Table 14: The unit (M) should be explained or described more clearly within the table.
  9. Table 14: Why is only the IP dataset included?
  10. Line 447: The future aims regarding the “lightweight” version of the network could be described in more detail, with a few more sentences. It would definitely increase the impact of the discussion section.
  11. Because the number of abbreviations has increased, I recommend adding an Abbreviations section at the end of the manuscript for easier reading.
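For reference regarding comments 3-5, the metrics in question are typically computed from a confusion matrix as sketched below (an illustrative example, not the authors' code): OA and AA are usually reported in percent, while Kappa is dimensionless in [-1, 1], which is why a separating line in the tables would help.

```python
# Illustrative sketch: OA, AA and Cohen's kappa from a confusion matrix C,
# where C[i][j] counts samples of true class i predicted as class j.

def classification_metrics(C):
    n_cls = len(C)
    total = sum(sum(row) for row in C)
    # Overall accuracy (OA): fraction of correctly classified samples.
    p_o = sum(C[i][i] for i in range(n_cls)) / total
    # Average accuracy (AA): mean of the per-class recalls.
    recalls = [C[i][i] / sum(C[i]) for i in range(n_cls)]
    aa = sum(recalls) / n_cls
    # Expected agreement P_e from the row and column marginals.
    p_e = sum(sum(C[i]) * sum(C[j][i] for j in range(n_cls))
              for i in range(n_cls)) / total ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return 100 * p_o, 100 * aa, kappa

# Toy 2-class example: 45 + 40 correct out of 100 samples.
oa, aa, kappa = classification_metrics([[45, 5], [10, 40]])
```

Here P_e is exactly the expression referred to in comment 3, so placing it in its own numbered equation would make the Kappa definition self-contained.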

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report (Previous Reviewer 2)

 

This paper develops a Shallow-to-Deep Feature Enhancement (SDFE) approach to HSIC, designed to capture joint spatial-spectral features with local and global information at different scales. Unfortunately, the paper did not discuss or compare against spectral-spatial classification methods that have been shown to perform very well, such as [1]. For example, this paper used 10% of the total data samples as training samples for the IP dataset and 5% of the total data samples of the PU and SA datasets. Ref. [1] used only 80 data samples as training samples for all three datasets and produced over 99.7% OA / 99.6% AA, which is better than the results obtained in this paper, while using far fewer training samples and without appealing to PCA. So, this reviewer cannot see what advantage is gained from the proposed method, SDFE, which must tune several parameters: learning rate, patch size, number of principal components, number of kernels, size of depth, etc.

Another major issue with this paper is that it reviews only CNN-based classification methods, while ignoring many spectral-spatial classification methods available in the literature that can outperform the proposed method, such as ref. [1] and refs. [2-3] along with their variants. As a matter of fact, spectral-spatial classification methods published in the past 2-3 years in TGRS, JSTARS, GRSL, and RS have performed better than the proposed method. This paper simply failed to conduct a decent literature review. This reviewer strongly encourages the authors to go back and survey the literature on spectral-spatial classification methods, which attempt the same task as this paper does by capturing spectral and spatial features.

1.     K. Ma and C.-I Chang, “Iterative training sampling with active learning for semi-supervised spectral-spatial hyperspectral image classification,” IEEE Trans. on Geoscience and Remote Sensing, vol. 59, no. 10, pp. 8672-8692, October 2021.

2.     X. Kang, S. Li and J. A. Benediktsson, “Spectral-spatial hyperspectral image classification with edge-preserving filtering,” IEEE Trans. on Geoscience and Remote Sensing, vol. 52, pp. 2666-2677, May 2014.

3.     P. Ghamisi, E. Maggiori, S. Li, R. Souza, Y. Tarabalka, G. Moser, A. D. Giorgi, L. Fang, Y. Chen, M. Chi, S. B. Serpico, and J. A. Benediktsson, “New frontiers in spectral-spatial hyperspectral image classification,” IEEE Geoscience and Remote Sensing Magazine, pp. 10-43, September 2018.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

Review of

 

Shallow-to-Deep Spatial-Spectral Feature Enhancement for Hyperspectral Image Classification

 

Lijian Zhou, Xiaoyu Ma, Siyuan Hao, Yuanxin Ye, Kun Zhao

 

Overview

This paper presents a hyperspectral image classification algorithm based on a so-called shallow-to-deep feature enhancement model that integrates several modules built on convolutional neural networks and a vision transformer. The authors present their algorithm step by step, from principal component analysis as a preprocessing step, through a 2-layer 3D-CNN-based Shallow Spatial-Spectral Feature Extraction module and a Res-SEConv module, to a vision transformer-based module that extracts the spatial-spectral features. The performance of the presented algorithm is tested on three well-known hyperspectral datasets: Indian Pines, Pavia University, and Salinas. According to the presented results, the algorithm outperforms the standard algorithms and neural networks used for hyperspectral classification. The disadvantage of the presented approach is its time cost and memory requirements.

 

Comments

Hyperspectral classification is nowadays a very popular task among researchers focused on hyperspectral imaging, and deep neural networks for classification have become highly popular in recent years. The authors have introduced an interesting advanced approach based on their own classification network with several modules. The overall aim of the paper is quite clear. However, the introduction needs substantial improvement to better capture the state of the art, the work the authors build upon, and the need for their approach. The authors should also address other issues within the paper to improve the presentation, the clarity of the paper, and the comparison to other methods. The following lines contain my comments and recommendations for the authors.

 

  1. The introduction is really vast, but the main aim, and why your algorithm is of interest to the field of hyperspectral image classification, remains somewhat unclear. Please state how your method builds upon previous research and what new it brings to the hyperspectral image classification research community.

  2. Several contributions are mentioned in the introduction. However, how are these connected to the contribution of this paper? From my point of view, the contribution points describe only the presented algorithm.

  3. From the presented introduction, it is not clear why exactly you selected the shallow-to-deep approach and the other methods used.

  4. There are only a few very recent references regarding hyperspectral imaging applications, and some applications are missing (especially in lines 21-22). Please add some recent references.

  5. The division of sections into enumerated points is often unnecessary and distracting.

 

The following section contains my comments and suggestions in a more detailed manner.

 

  • Line 20: What do you mean by “rich spatial information”?

  • Line 24: What do you mean by “increasingly high”? This expression is a bit ambiguous. 

  • Line 25-26: “The number of spectra in the currently published datasets is more than 100, but the actual categories of objects are generally less than 20.” To what database do you refer? 

  • Lines 31-33: On what basis did you select the PCA method for your algorithm? Why are the other methods unsuitable?

  • Why did you cite reference [9] in the section on machine learning? 

  • Is the paragraph in lines 34-39 essential for the introduction of your paper? Or should it not be incorporated into the CNN comparison (Lines 49-52)? 

  • Please check that all abbreviations, such as PCA, HSI, etc., are explained in the text. They should be explained in the text as well as in the abstract.

  • Line 174: Why did you select 30 bands from the PCA method? Why was the initial number of bands 103? What if the number of bands was 700?  

  • The PU abbreviation needs to be also explained in the text. 

  • In my view, Figure 4 is not necessary. It does not explain anything new to the potential reader. The PCA method is a standard tool in hyperspectral imaging, as are the images from the PU dataset.

  • Line 187: What does the “S” symbol represent? 

  • Line 190: What do the “T_0-T_3” symbols represent? 

  • Line 200: “SENet” should be referenced.

  • Section 3.4: In my view, the division into the enumerated paragraphs/points is unnecessary. 

  • Section 4.1: The URL reference to the datasets should be added to the paper. 

  • Line 273: What do you mean by train and test samples? Pixels? Areas of a defined class?

  • Section 4.2: How do you compare the results of your method against the ground truth? On a pixel level? 

  • Section 4.2: From my point of view, the division into the enumerated paragraphs/points is unnecessary. 

  • Section 4.3: How long does the training last? 

  • Section 4.3: Why have you selected these parameters: cross-entropy loss, the Adam optimizer, the learning rate, 30 PCA components, and batch size 64? Why did you select these values, and how did you optimize them?

  • The discussion section should be merged into the preceding results section. It mostly describes the results of the algorithm with different portions of training data; only the last sentence reads as a “discussion”.

  • Line 408: You mention “memory consumption” and “time cost”, but you do not report the time cost and memory requirements of your method. How time-consuming is your approach compared to the other methods presented in the results section? This should be added to the manuscript, for example in the form of a table.
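Regarding the question of why 30 PCA components were retained from 103 (or a hypothetical 700) bands: a common, data-driven alternative to a fixed choice is to pick the smallest number of components reaching a target cumulative explained variance. The sketch below illustrates the principle on a hypothetical eigenvalue spectrum; it is not the authors' procedure.

```python
# Illustrative sketch (assumed, not from the paper): choose the number of
# PCA components k by cumulative explained variance of the band covariance
# eigenvalues, rather than fixing k = 30 a priori.

def components_for_variance(eigenvalues, target=0.999):
    # Eigenvalues are assumed sorted in descending order.
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, ev in enumerate(eigenvalues, start=1):
        cumulative += ev
        if cumulative / total >= target:
            return k
    return len(eigenvalues)

# Hypothetical, fast-decaying variance spectrum over 103 bands, mimicking
# the strong inter-band correlation typical of hyperspectral data.
spectrum = [0.5 ** i for i in range(103)]
k = components_for_variance(spectrum, target=0.999)
```

With such a criterion, the retained dimensionality would adapt automatically to sensors with 103 or 700 bands, which would directly address the comment above.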

 

Additional questions

  • What are the limits of your method? How can it be improved in the future?

Reviewer 2 Report

This paper suffers from many major serious issues.

1.       The abstract of this paper is poorly written. After reading the abstract, this reviewer still does not know what SAOCNN will do. How is the deep learning model trained? Is it trained to learn the background or the anomalies? How are training samples selected? What hyperspectral features does the deep feature extraction neural network try to learn?

2.       This paper conducted a very poor literature review and missed many recently published papers, particularly those published in TGRS in the past 2 years. Many of the most recently published HAD methods have not been reviewed. The authors seem to limit themselves to CNNs and AEs. There are many other model-based approaches, such as GoDec-based methods, component decomposition analysis, and tensor decomposition, that have been shown to perform better than the compared methods CRD and LSAD. The authors missed many references, such as:

·       S. Chang, B. Du and L. Zhang, “BASO: a background-anomaly component projection and separation optimized filter for anomaly detection in hyperspectral images,” IEEE Trans. on Geoscience and Remote Sensing, vol. 56, no. 7, pp. 3747-3761, July 2018.

·       J. Zhong, W. Xie, Y. Li, J. Lei and Q. Du, “Characterization of background-anomaly separability with generative adversarial network for hyperspectral anomaly detection,” IEEE Trans. on Geoscience and Remote Sensing, vol. 59, no. 7, pp. 6017-6028, July 2021.

·       L. Li, W. Li, Q. Du and R. Tao, “Low-rank and sparse decomposition with mixture of Gaussian for hyperspectral anomaly detection,” IEEE Trans. on Cybernetics, vol. 51, no. 9, pp. 4363-4372, Sept. 2021.

·       L. Li, W. Li, Y. Qu, C. Zhao, R. Tao and Q. Du, “Prior-based tensor approximation for anomaly detection in hyperspectral imagery,” IEEE Trans. on Neural Network and Learning Systems, vol. 33, no. 3, pp. 1037-1050, March 2022.

·       C.-I Chang, “Hyperspectral anomaly detection: a dual theory of hyperspectral target detection,” IEEE Trans. Geoscience Remote Sensing vol. 60, 2022.

·       S. Wang, X. Wang, L. Zhang, Y. Zhong, “Deep low-rank prior for hyperspectral anomaly detection,” IEEE Trans. on Geoscience and Remote Sensing, vol. 60, 2022.

·       W. Xie, B. Liu, Y. Li, J. Lei, Q. Du, “Autoencoder and adversarial-learning-based semisupervised background estimation for hyperspectral anomaly detection,” IEEE Trans. on Geoscience and Remote Sensing, vol. 58, no. 8, pp. 5416-5427, August 2020.

3.       Your ROC curve analysis is insufficient and not reliable. The box plot is not quantitative, since you need visual inspection to determine performance. How did you generate your ROC curves of (PD, PF), (PD, tau) and (PF, tau)? There are recent references on how to use these three curves to evaluate performance. You missed them all.

4.       Anomaly detection performance is determined not only by detection but also by background suppression.

5.       The San Diego airport data is not suitable for HAD. This is because the airplanes are so large and visible that they cannot be considered anomalies. If you look at your results for LSAD, CRD and GRX in Fig. 6(f), these anomaly detectors did not detect the three airplanes, which makes sense. So, if these three detectors are considered anomaly detectors, then your SAOCNN is not; if your SAOCNN is an anomaly detector, then these three detectors are not. As a matter of fact, in the San Diego data, the anomalies should be those located in the upper right corner, not the airplanes. Therefore, using the airplanes as ground truth to evaluate anomaly detection performance is misleading. Similar conclusions also apply to the cost image in Fig. 6(a).

6.       Your Fig. 6 does not provide any useful information.

7.       As mentioned, the three anomaly detectors LSAD, CRD and GRX have been shown not to perform as well as many recently developed anomaly detectors. Using them as a comparison does not justify your method.

8.       Most importantly, how can you fairly compare your training-sample-based HAD to HAD methods that do not use training samples, such as GRX? HAD is called anomaly detection precisely because it does not require any prior knowledge. If we know the training samples, we can simply do target detection, which will perform better than anomaly detection.

Finally, the main issues in this paper are: (1) an insufficient literature survey, in which case you are not aware that many recently developed anomaly detection methods may indeed perform better than your method; (2) insufficient performance measures, in which case you cannot effectively evaluate background suppression, nor joint anomaly detection and background suppression; (3) inappropriate use of datasets, which may lead to incorrect conclusions.
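For clarity on point 3 above: all three curves mentioned, (PD, PF), (PD, tau) and (PF, tau), can be derived by sweeping a threshold tau over the detector's per-pixel output scores, as in the illustrative sketch below (toy data, not the paper's experiment).

```python
# Illustrative sketch: ROC points from detector scores by threshold sweep.
# PD = detected anomalies / total anomalies (detection probability),
# PF = false alarms / total background pixels (false-alarm probability).

def roc_points(scores, labels, thresholds):
    n_anom = sum(labels)
    n_bkg = len(labels) - n_anom
    points = []
    for tau in thresholds:
        detected = [s >= tau for s in scores]
        pd = sum(1 for d, l in zip(detected, labels) if d and l) / n_anom
        pf = sum(1 for d, l in zip(detected, labels) if d and not l) / n_bkg
        points.append((tau, pd, pf))
    return points

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]   # toy detector output per pixel
labels = [1, 1, 0, 1, 0, 0]               # toy ground truth, 1 = anomaly
curve = roc_points(scores, labels, [0.0, 0.5, 1.0])
```

Plotting PD against PF over the sweep gives the (PD, PF) curve, while plotting either quantity against tau gives the other two; a low PF at all tau is what indicates good background suppression.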

Reviewer 3 Report

The paper proposes a new model for hyperspectral image classification. The model consists of three blocks: SSSFE, Res-SEConv, and VTFE. The SSSFE consists of two 3D-CNN layers, the Res-SEConv utilizes squeeze-and-excitation networks to enhance band-wise features, and the VTFE is used to enhance the feature maps produced by the previous layer. The performance of the proposed approach was compared against state-of-the-art methods and shows superiority in terms of overall accuracy, average accuracy, and Kappa.

I have a few questions:
1. The authors decided to use 30 principal components for dimensionality reduction; is there a specific reason? I suggest the authors add a part in Section 4.4 and try different values.
2. In the Res-SEConv module, GeLU activation is used; does it have any advantage over ReLU? The authors need to elaborate more on this part.
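For context on question 2: GELU weights its input by the standard normal CDF, giving a smooth curve that passes small negative values, whereas ReLU hard-gates at zero. A minimal sketch of the two activations (standard definitions, not the authors' implementation):

```python
# Illustrative sketch: exact GELU vs ReLU. GELU(x) = x * Phi(x), where Phi
# is the standard normal CDF, computable exactly via math.erf.
import math

def relu(x):
    return max(0.0, x)

def gelu(x):
    # Smooth everywhere; slightly negative for small negative inputs.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# For clearly positive inputs the two nearly coincide; the difference is
# concentrated around zero, where GELU is differentiable and ReLU is not.
```

The commonly cited motivation is that this smoothness near zero can ease optimization, which is the kind of justification the authors could add to this part.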

 
