Article

A Novel Fully Convolutional Auto-Encoder Based on Dual Clustering and Latent Feature Adversarial Consistency for Hyperspectral Anomaly Detection

Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(4), 717; https://doi.org/10.3390/rs16040717
Submission received: 22 January 2024 / Revised: 12 February 2024 / Accepted: 13 February 2024 / Published: 18 February 2024

Abstract: With the development of artificial intelligence, the ability to capture the background characteristics of hyperspectral imagery (HSI) has improved, showing promising performance in hyperspectral anomaly detection (HAD) tasks. However, existing methods proposed in recent years still suffer from certain limitations: (1) Because prior background and anomaly information is absent, the deep feature learning process lacks effective constraints. (2) Hyperspectral anomaly detectors based on traditional self-supervised deep learning fail to ensure prioritized reconstruction of the background. (3) The fully connected architectures of many detectors underutilize spatial information, destroy the original spatial relationships in the imagery, and disregard the spectral correlation between adjacent pixels. (4) Hypotheses or assumptions about the background and anomaly distributions restrict the performance of many detectors, because the distributions of background land covers in real-world hyperspectral imagery are usually complex and cannot be assumed. In consideration of the above problems, in this paper, we propose a novel fully convolutional auto-encoder based on dual clustering and latent feature adversarial consistency (FCAE-DCAC) for HAD, which is trained in a self-supervised manner. Firstly, the density-based spatial clustering of applications with noise (DBSCAN) algorithm and connected component analysis are utilized for successive spectral and spatial clustering to obtain more precise prior background and anomaly information, which facilitates the separation between background and anomaly samples during training. Subsequently, a novel fully convolutional auto-encoder (FCAE) integrated with a spatial–spectral joint attention (SSJA) mechanism is proposed to enhance the utilization of spatial information and augment feature expression. In addition, a latent feature adversarial consistency network is proposed to learn the actual background distribution in the hyperspectral imagery and achieve pure background reconstruction. Finally, a triplet loss is introduced to enhance the separability between background and anomalies, and the reconstruction residual serves as the anomaly detection result. We evaluate the proposed method on seven groups of real-world hyperspectral datasets, and the experimental results confirm its effectiveness and superior performance versus nine state-of-the-art methods.

1. Introduction

Hyperspectral imagery (HSI) contains abundant spatial and spectral information [1,2,3]. Hyperspectral remote sensors collect hyperspectral images by augmenting the two spatial dimensions of the image with an additional spectral dimension that comprises hundreds or thousands of approximately continuous spectral measurements of land cover, forming a 3-D hyperspectral image cube. In hyperspectral imagery, the spectral information of each pixel corresponds to a distinct spectral curve [4]. The high spectral resolution of the hyperspectral image makes it possible to distinguish different ground objects by obtaining reliable spectral characteristics [5,6]. Extensive applications can be carried out with hyperspectral imagery, such as target detection [7], classification [8], change detection [9], etc. For hyperspectral anomaly detection (HAD), pixels that have distinct spectral curves and occupy a very small spatial proportion of the hyperspectral imagery are recognized as anomaly targets. The abundant spatial and spectral information in hyperspectral imagery benefits the detection of anomaly targets, even without any prior information about their spectral characteristics [10,11]. The practical application of HAD does not require prior spectral information, which alleviates the need for the extensive allocation of labor and material resources to acquire background and anomaly spectra in advance [12]. Consequently, the inherent advantage of HAD lies in its independence from prior spectral information, making it highly suitable for real-world scenarios. HAD is currently extensively utilized in military reconnaissance, environmental monitoring, and search and rescue missions [13,14,15].
In the past two decades, there has been a continuous emergence of models and methods in the research field of anomaly detection for hyperspectral remote sensing imagery [1]. There are two main categories of hyperspectral detection methods: classical methods and deep learning-based methods.
The earliest classical method is Reed–Xiaoli (RX) [16], which assumes a multivariate Gaussian distribution to model the background and quantifies anomaly targets by calculating the Mahalanobis distance between the measured pixel and the estimated background; it serves as a benchmark for HAD. Subsequently, various extended versions of the RX algorithm emerged, such as local RX (LRX) [17], which uses an inner and outer double-window strategy to model the local background, and subspace RX (SSRX) [18], which reduces the impact of anomaly contamination on background estimation by projecting into a subspace. To estimate the background model more accurately, weighted RX (WRX) and linear filter-based RX (LF-RX) [19] have been proposed. The aforementioned methods, however, are only suitable for simple application scenarios and often perform poorly in complex scenarios, which implies that not all backgrounds conform to the assumption of a multivariate Gaussian distribution [20]. Kernel RX (KRX) [21], a nonlinear variant of the RX algorithm, was proposed to address this issue by mapping each pixel into a high-dimensional feature space to estimate the background more accurately. The subsequent emergence of a series of advanced related methods, such as clustering KRX (CKRX) [22] and Robust Nonlinear Anomaly Detection (RNAD) [23], has significantly enriched this field. Additionally, the fractional Fourier entropy (FRFE) method [24] maps all pixels from the original spectral domain to the fractional Fourier domain to enhance the distinction between background and anomalies and thereby improve detection accuracy. To further address the issue of unreasonable assumptions regarding the statistical distribution of the background and to enhance the suitability of models for complex application scenarios, representation-based methods have been developed. Representation-based methods are categorized into sparse representation (SR), collaborative representation (CR), and low-rank representation (LRR) depending on the type of regularization constraints [1]. Typical methods include the CR-based detector (CRD) [25], the LRaSMD-based Mahalanobis distance method for anomaly detection (LSMAD) [26], the abundance- and dictionary-based low-rank decomposition (ADLR) method [27], and the anomaly detection method based on low-rank and sparse representation (LRASR) [28]. These methods avoid unfounded assumptions regarding the background distribution. However, dictionary optimization requires regularization parameters and, due to the lack of prior information, determining specific values for these parameters is rather difficult [29]. Moreover, the aforementioned methods primarily employ spectral discrimination for hyperspectral anomaly detection and neglect the utilization of spatial information. Consequently, another branch uses spatial discrimination to detect anomalies. The recently proposed attribute- and edge-preserving filtering (AED) method [30] and the structure tensor- and guided filtering-based HAD (STGD) algorithm [31] exhibit excellent performance in detecting anomalies through local filtering operations. However, these methods tend to overlook the significance of spectral information.
In recent years, research has proved the remarkable power of deep neural networks in modeling complex datasets and mining high-dimensional information, which enables them to extract more representative features than conventional methods while exhibiting exceptional feature expression capabilities [32,33,34,35]. The utilization of deep learning techniques has progressively gained prominence for HAD [36]. A host of HAD methodologies rooted in deep learning have emerged, which can be broadly categorized into two distinct groups: supervised learning (SL) and unsupervised learning (UL) methods. The most common supervised HAD method is CNND [37], which requires a reference image scene containing labeled samples (captured using the same sensor) to generate training pairs and train a CNN capable of outputting the similarity between the center pixel and its surrounding pixels. A new Siamese network is proposed in [38] as the backbone of the CNND network, and it computes the similarity score between the pixel under test and the surrounding pixels at the hidden-layer level. Song et al. [39] combined a CNN with low-rank representation (LRR) for HAD: they employed the CNN to generate robust abundance maps and then input these maps into the LRR model to construct a dictionary. Supervised learning methods are constrained by the availability of annotated labels and training samples, which contradicts the premise of lacking spectral prior knowledge and compromises their flexibility and generalization in practical applications. The unsupervised learning methods for HAD, in contrast, offer a significant advantage by eliminating the need for labeled training samples and solely relying on the original HSI as training data. These methods typically employ the auto-encoder (AE) model [40] or a generative adversarial network (GAN) to extract the deep intrinsic spectral characteristics of HSI. Bati et al. [41] and Arisoy et al. [42] were the pioneers who introduced the AE model and the GAN to HAD, respectively. They assume that the anomaly pixels are sparsely distributed compared with the background pixels. Consequently, reconstructing the background pixels is easier than reconstructing the anomaly pixels, which yields significantly smaller reconstruction errors for background pixels. Therefore, these reconstruction errors can effectively indicate the degree of anomaly of each pixel. Additionally, some methods reconstruct an HSI with stronger discrimination between the background and anomalies, or apply traditional methods such as RX to detect anomalies in the enhanced residual image. However, due to the robust reconstruction capability of AE models and GANs, it is impossible to ensure that an anomaly is not reconstructed during the actual reconstruction process. In other words, the learning direction of the deep network during training remains indeterminate [43]. To alleviate this problem, HADGAN was proposed [44], which employs a GAN to enable the latent feature layer to acquire knowledge from a multivariate normal background distribution, so that the deep network focuses on generating the background. A guided auto-encoder (GAED) was proposed [45] to incorporate a guidance module based on guided images into the deep network; it leverages feedback information to effectively reduce the feature representation of anomaly targets.
However, the aforementioned methods solely focus on pixel-level reconstruction in the spectral domain, which loses the spatial structure of HSI and hinders the ability of deep networks to capture spatial context information. In order to enhance the utilization of spatial information, RGAE [46] incorporates graph regularization into the hidden layer of the auto-encoder. Moreover, a residual self-attention-based auto-encoder for HAD (RSAAE) is proposed in [47], which employs residual attention to concentrate on the spatial characteristics of HSI. Wang et al. [48] were the pioneers who proposed a fully convolutional auto-encoder for HAD (Auto-AD) that employs adaptive learning to suppress the reconstruction of anomaly targets. However, it is still constrained by an underlying assumption about the background distribution because it uses multivariate normal distribution noise as the input for training. Additionally, Wang et al. [49,50] proposed a blind-spot reconstruction network that utilizes surrounding pixel features to reconstruct the blind-spot pixels, which establishes a novel paradigm for HSI reconstruction.
As mentioned above, deep learning-based hyperspectral anomaly detection approaches meet the following limitations and challenges:
(1)
The deep network for hyperspectral anomaly detection lacks a clear learning direction and merely relies on the assumption of high reconstruction errors to identify anomalies, which fails to meet the requirements of diverse hyperspectral anomaly detection scenarios. It is urgent to develop a method that provides the deep network with a learning direction and guidance during its training phase.
(2)
Current state-of-the-art deep learning-based methods for hyperspectral anomaly detection primarily rely on pixel-level spectral reconstruction, which sacrifices the spatial structure of HSI and interferes with the deep network's ability to learn spatial features. In reality, spatial information plays a crucial role in hyperspectral anomaly detection, and the lack of spatial structure analysis limits the detection performance of certain existing approaches.
(3)
The background of HSI is inherently multivariate and complex. However, most traditional and deep learning-based methods still assume a multivariate normal distribution for the hyperspectral background. This assumption does not always hold true for the complex backgrounds of real-world HSI, and existing algorithms are therefore inadequate for adapting to such scenes. As a result, the applications of these hyperspectral anomaly detection methods are mostly limited to simple scenarios.
In this study, we take the aforementioned three challenges as our original inspiration. Regarding the first challenge, it is necessary to give the deep network an interpretable learning methodology and a preliminary understanding that distinguishes anomalies from background elements. To this end, we introduce a dual clustering module for prior knowledge extraction, which establishes a clear learning direction for the network and provides a rough understanding of the anomalies and background. For the second challenge, we propose a fully convolutional encoder that integrates a spatial–spectral joint attention mechanism to enhance the cooperation between spatial and spectral features, which improves the utilization of spatial information. Simultaneously, we employ a fully convolutional architecture that reconstructs the central pixel by leveraging surrounding pixel features instead of relying on isolated pixel-by-pixel reconstruction. The third challenge lies in the difficulty of explaining a complex background with a simple distribution, which may blur the real-world distribution. To address this issue, we directly employ the extracted real prior distribution, which encompasses most characteristics of the background, instead of an assumed distribution. GANs excel at fitting two distributions to achieve maximum similarity, a technique widely employed in style transfer. Consequently, a latent feature adversarial consistency network is designed based on this approach to learn the real distribution of the background.
The final objective of the proposed method is to reconstruct the proper background and identify anomalies by generating reconstruction errors. Therefore, we propose a novel fully convolutional auto-encoder based on dual clustering and latent feature adversarial consistency for hyperspectral anomaly detection (FCAE-DCAC). The proposed FCAE-DCAC method makes contributions in the following aspects:
(1)
A novel fully convolutional auto-encoder is proposed to make full use of spatial information in hyperspectral anomaly detection and achieve a joint detection process that exploits the spatial structure.
(2)
A novel module for extracting prior knowledge that combines DBSCAN and connected component analysis clustering is designed to guide deep network learning by extracting background and anomaly samples. This ensures that the proposed deep network has a clear learning direction. Additionally, the introduction of a triplet loss helps enlarge the distance between the background and anomalies, which enhances their separability.
(3)
To overcome the limitations of assuming a specific distribution for the background and achieve a more accurate reconstruction for the pure background, we propose a latent feature adversarial consistency network. This network aims to learn the true distribution of the real background and employs an adversarial consistency enhancement loss to strengthen the constraints for reconstructing a purer background.
The rest of this article is organized as follows. In Section 2, we present a comprehensive overview of the implementation details for the proposed FCAE-DCAC method. In Section 3, the extensive experimental results of the proposed method are presented and compared with state-of-the-art approaches to evaluate the performance of the FCAE-DCAC. Finally, our conclusions are drawn in Section 4.

2. Proposed Method

2.1. Overview

Here, we present the flowchart of the proposed method, which is built around an unsupervised fully convolutional auto-encoder (FCAE), as illustrated in Figure 1. Our proposed approach consists of three distinct stages:
(1)
Extracting Prior Knowledge with Dual Clustering: The purpose of dual clustering is to obtain coarse labels for supervised network learning and provide the network with a clear learning direction to enhance its performance. Dual clustering (i.e., unsupervised DBSCAN and connected component analysis clustering) is employed to cluster the HSI from the spectral domain to the spatial domain, which yields a preliminary separation between the background and anomaly regions. Subsequently, prior samples representing the background and anomaly regions are obtained through this processing, which effectively purifies the supervision information provided to the deep network by conveying more background-related as well as anomaly-related information. The anomaly features are then utilized to suppress anomaly generation, while the background features contribute toward reconstructing most of the background.
(2)
Training for the Fully Convolutional Auto-Encoder: The prior background and anomaly samples extracted in the first stage are used as training data for the fully convolutional auto-encoder model. During the training phase, the original hyperspectral information is input into the fully convolutional deep network using a mask strategy, while an adversarial consistency network is employed to learn the true background distribution and suppress anomaly generation. Building on self-supervised learning, the whole deep network is guided by incorporating the triplet loss and the adversarial consistency loss. Additionally, a spatial–spectral joint attention mechanism is utilized in both the encoder and decoder stages to enable adaptive learning of spatial and spectral focus.
(3)
Testing with the Original Hyperspectral Imagery: The parameters of the proposed deep network are fixed, and the original hyperspectral imagery is fed into the trained network for reconstructing the expected background for hyperspectral imagery. At this stage, the deep network only consists of an encoder and a decoder. The reconstruction error serves as the final detection result of the proposed hyperspectral anomaly detection method.

2.2. Extracting Prior Knowledge with Dual Clustering

Dual clustering consists of DBSCAN in the spectral domain and connected component clustering (CCC) in the spatial domain. Firstly, the DBSCAN algorithm is employed to cluster the HSI based on its spectral information; DBSCAN is capable of clustering with arbitrary shapes and yields clustering results with specific spatial attributes, which builds the foundation for the subsequent CCC spatial clustering. The components of Figure 2 have the same meanings as those in Figure 1. Figure 2 shows an input HSI $X \in \mathbb{R}^{H \times W \times B}$, where $H$, $W$, and $B$ are the row number, column number, and spectral dimension (the number of spectral channels) of the HSI, respectively. Under the condition (Eps, MinPts), DBSCAN randomly selects a pixel as the starting point. It then searches for all pixels within a spectral Euclidean distance radius Eps around the starting point. If the number of pixels in this range is not less than MinPts, the starting point is marked as a core point, and a new cluster is created. All the core points and their density-reachable data points are added to this cluster. By iterating through all the core pixels, DBSCAN obtains the category label map $M_1 = \{m_i^1\}_{i=1}^{H \times W} \in \mathbb{R}^{H \times W}$. The probability of picking a background pixel greatly exceeds that of picking an anomaly pixel. Our experiments also revealed that the clustering can yield up to 312 categories, but class 1 typically accounts for over 94% of the pixels, which led us to roughly divide the results into two categories: the majority class 1 is considered to be background (marked as 1), while the remaining minority classes are identified as anomalies (marked as 0). Finally, the binary classification map $P_1 = \{p_i^1\}_{i=1}^{H \times W} \in \mathbb{R}^{H \times W}$ is obtained using the following equation:
$$P_1 = \begin{cases} p_i^1 = 1, & m_i^1 = 1 \\ p_i^1 = 0, & m_i^1 \neq 1 \end{cases} \quad (1)$$
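As a concrete illustration, this spectral clustering step can be sketched as follows. This is a minimal sketch using scikit-learn's DBSCAN as a stand-in for the paper's implementation; the function name, the [0, 1] scaling of the spectra, and the majority-cluster rule are illustrative assumptions.

```python
# A minimal sketch of the spectral-domain DBSCAN step (Equation (1)), assuming
# an HSI cube X of shape (H, W, B) with spectra scaled to [0, 1]; eps and
# min_pts correspond to (Eps, MinPts) in the text.
import numpy as np
from sklearn.cluster import DBSCAN

def dbscan_binary_map(X, eps=0.12, min_pts=1):
    H, W, B = X.shape
    pixels = X.reshape(-1, B)                     # one spectrum per row
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(pixels)
    M1 = labels.reshape(H, W)                     # category label map M1
    # The dominant cluster (over 94% of pixels in our experiments) is taken
    # as the background; all minority clusters become anomaly candidates.
    majority = np.bincount(labels[labels >= 0]).argmax()
    P1 = (M1 == majority).astype(np.uint8)        # 1 = background, 0 = anomaly
    return P1
```

Note that clustering every pixel of a large scene in one pass can be memory-intensive; the paper does not specify such implementation details.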
Through Equation (1), a binary classification map $P_1$ with certain spatial attributes is obtained. However, due to the complexity and diversity of the background, not all of the background exhibits the same spectral characteristics. Consequently, in the binary classification map, isolated noise pixels and large background objects that differ significantly from the rest of the background might be mistakenly identified as anomaly targets. To address this issue, we perform spatial clustering using connected component analysis. By labeling the eight-connected components of the binary classification map, a connected-component labeling map $M_2 = \{m_i^2\}_{i=1}^{H \times W} \in \mathbb{R}^{H \times W}$, which represents the spatial relationship between background and anomalies, is obtained. By analyzing this spatial relationship, the clustering filters out isolated noise pixels and misclassified large background objects. Large background objects are defined as connected components with more than $D$ pixels, where $D$ is set to 50 in this article. The connected components are then categorized into three groups based on their pixel counts: category $L_1$ contains connected components with fewer than 5 pixels, category $L_2$ contains connected components with at least 5 but no more than $D$ pixels, and category $L_3$ contains connected components with more than $D$ pixels. We perform the following operation to obtain the coarse label $P_2 = \{p_i^2\}_{i=1}^{H \times W} \in \mathbb{R}^{H \times W}$:
$$P_2 = \begin{cases} p_i^2 = 0, & m_i^2 \in L_2 \\ p_i^2 = 1, & m_i^2 \in L_1 \ \mathrm{and} \ |L_1|/|L| < 0.8 \\ p_i^2 = 0, & m_i^2 \in L_1 \ \mathrm{and} \ |L_1|/|L| \geq 0.8 \\ p_i^2 = 1, & m_i^2 \in L_3 \end{cases} \quad (2)$$
where $|L_1|$, $|L_2|$, and $|L_3|$ are the numbers of connected components in the three classes, respectively, and $|L| = |L_1| + |L_2| + |L_3|$. These three types of connected components are analyzed as follows. Connected components with more than $D$ = 50 pixels are considered to be part of the background and marked as 1; this filters out the large background objects misjudged by the DBSCAN algorithm. Connected components with at least 5 but no more than 50 pixels are considered anomalies and marked as 0. If a connected component has fewer than 5 pixels and such components constitute less than 80% of all connected components, it is considered isolated noise (i.e., background) and marked as 1; otherwise, it is considered an anomaly and marked as 0. In fact, because the connected components in $L_1$ occupy only a small proportion of the entire HSI, they incur a significant reconstruction error during reconstruction regardless of whether they are labeled as background. In short, filtering the $L_1$ connected components yields slightly better detection performance, but this filter has little overall impact.
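The spatial filtering of Equation (2) can be sketched with SciPy's connected component labeling; the helper name and the use of `scipy.ndimage` are assumptions, since the paper does not name its implementation.

```python
# A sketch of the connected component clustering (CCC) of Equation (2),
# assuming the binary map P1 from the DBSCAN step; 8-connectivity and D = 50
# follow the text, while the scipy.ndimage implementation is an assumption.
import numpy as np
from scipy import ndimage

def coarse_label_map(P1, D=50):
    # Label the 8-connected components among the anomaly candidates (P1 == 0).
    comp, n_comp = ndimage.label(P1 == 0, structure=np.ones((3, 3), dtype=int))
    sizes = ndimage.sum(P1 == 0, comp, index=range(1, n_comp + 1))
    small = np.flatnonzero(sizes < 5) + 1                     # category L1
    medium = np.flatnonzero((sizes >= 5) & (sizes <= D)) + 1  # category L2
    # Components in L3 (more than D pixels) stay background.
    P2 = np.ones_like(P1)                    # start from all background (1)
    P2[np.isin(comp, medium)] = 0            # L2 components are anomalies
    # Tiny components are isolated noise (background) unless they make up
    # at least 80% of all connected components.
    if n_comp > 0 and len(small) / n_comp >= 0.8:
        P2[np.isin(comp, small)] = 0
    return P2
```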
In order to better supervise the training of the deep network, the original HSI is partitioned according to the coarse label $P_2$ into coarse-classified background samples and coarse-classified anomaly samples, which are represented as follows:
$$X_B = P_2 \odot X, \qquad X_A = (1 - P_2) \odot X \quad (3)$$
where $\odot$ denotes the band-wise element-wise (Hadamard) product, $X_A \cup X_B = X$, and $X_A \cap X_B = \varnothing$. Although it cannot be guaranteed that the coarse-classified background samples entirely represent the background, it is certain that they contain the majority of the background's characteristics.

2.3. Training for Fully Convolutional Auto-Encoder

With prior knowledge extracted via dual clustering, the FCAE-DCAC employs a random mask strategy to augment the training samples. A novel fully convolutional auto-encoder combined with an adversarial consistency network is proposed to obtain robust background features, and adversarial consistency enhancement constraints and a triplet loss are introduced to enhance the distinguishability between the background and anomalies. The entire training phase is described in the following sections in three parts: (1) Data Augmentation, (2) Network Architecture, and (3) Learning Procedure.

2.3.1. Data Augmentation

Deep learning-based methods for hyperspectral anomaly detection have always been troubled by insufficient training samples, which leads to overfitting of the trained deep network. To address this issue, we employ the mask-learning strategy, a widely adopted tool in the computer vision community [51] that has also found extensive application in hyperspectral anomaly detection tasks [52]. The training samples can be expanded by randomly masking the HSI in each input batch, which generates multiple batches of diverse training samples. The random masking can be implemented in two ways: (a) utilizing a binary mask consisting of 0 s and 1 s, where the pixel values within the masked region are directly set to 0, or (b) employing Gaussian noise to fill the masked area. However, method (a) covers a significant portion of the background, which reduces the background features available for learning. Hence, this article adopts the second approach, which employs Gaussian noise to simulate the statistical characteristics of most of the background and thus promotes the extraction of more informative features.
Specifically, the candidate patch sizes are adapted to the image dimensions $W$ and $H$: a patch size is randomly selected from 2 to 10 such that both $W$ and $H$ are divisible by it. For instance, for an input HSI $X \in \mathbb{R}^{H \times W \times B}$, the values between 3 and 7 that divide both $W$ and $H$ form the candidate set, and one of them is randomly chosen as the patch size. The original hyperspectral image is partitioned into $K$ distinct patches of this size. Then, $N$ patches are randomly selected from the $K$ patches, where $N < K$ and $0.3 < N/K < 1$. The locations of these $N$ patches are mapped onto a mask map $S = \{s_i\}_{i=1}^{H \times W} \in \mathbb{R}^{H \times W}$ using binary values (0 or 1), with 0 representing the masked area and 1 representing the other pixels. A cube $I \in \mathbb{R}^{H \times W \times B}$ generated with Gaussian noise is employed to fill the masked regions, in consideration of the predominantly multivariate Gaussian distribution observed in the background. The final training sample $X_M \in \mathbb{R}^{H \times W \times B}$ input to the deep network can be mathematically expressed as follows:
$$X_M = X \odot S + I \odot \bar{S} \quad (4)$$
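A minimal sketch of this masking step follows, assuming an HSI array of shape (H, W, B); the helper name and the noise statistics (matching the image mean and standard deviation) are illustrative assumptions not fixed by the paper.

```python
# A minimal sketch of the random-mask augmentation of Equation (4); the
# Gaussian-noise statistics and helper name are assumptions for illustration.
import numpy as np

def random_mask(X, rng=None):
    rng = rng or np.random.default_rng()
    H, W, B = X.shape
    # Pick a patch size in 2..10 that divides both H and W (assumes one exists).
    candidates = [p for p in range(2, 11) if H % p == 0 and W % p == 0]
    p = int(rng.choice(candidates))
    grid_h, grid_w = H // p, W // p
    K = grid_h * grid_w                            # total number of patches
    N = int(rng.uniform(0.3, 1.0) * K)             # number of masked patches
    S = np.ones((H, W), dtype=X.dtype)             # mask map: 1 = kept, 0 = masked
    for i in rng.choice(K, size=N, replace=False):
        r, c = divmod(int(i), grid_w)
        S[r * p:(r + 1) * p, c * p:(c + 1) * p] = 0
    I = rng.normal(X.mean(), X.std(), size=X.shape)   # Gaussian-noise fill cube
    return X * S[..., None] + I * (1 - S)[..., None]  # Equation (4)
```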

2.3.2. Network Architecture

The architecture of the FCAE-DCAC, as illustrated in Figure 1, comprises a fully convolutional encoder, a fully convolutional decoder, and a latent feature adversarial consistency network.
(1)
Fully Convolutional Auto-Encoder (FCAE): Previous deep learning-based hyperspectral anomaly detection methods, such as GAED [45], employ fully connected layers for pixel-wise self-supervised learning of HSI on the spectral dimension. However, these methods result in the degradation of the spatial structure within HSI, which leads to a significant loss of spatial information and underutilization of the spatial characteristics of the original HSI. Additionally, dealing with input hyperspectral images in pixel-by-pixel mode prevents the deep network from capturing spectral correlations between adjacent pixels, which results in isolated features and limited information acquisition. A straightforward improvement can be observed in Auto-AD [48], in which a convolutional auto-encoder (CAE) is utilized for the self-supervised learning of the HSI cube. By incorporating convolution operations, pooling operations, and sampling operations into AE architecture, the CAE not only extracts spatial features effectively but also enhances spectral feature correlation.
The distinction between our proposed FCAE and a simple CAE, as illustrated in Figure 3a, lies in employing a spectral–spatial joint attention mechanism within both the encoder and decoder. Moreover, we utilize a combination of residual and skip connections in the proposed deep network architecture to enhance the diversity of the learned features. Specifically, the FCAE incorporates the spectral–spatial joint attention mechanism at the initial stage of the network to acquire crucial spatial and spectral features. These features are then fused to obtain key features carrying both spatial and spectral information, which are subsequently fed into the fully convolutional encoder for feature encoding. The fully convolutional encoder consists of four EResConvBlocks, which transform the input cube $X_M \in \mathbb{R}^{H \times W \times B}$ from size $H \times W$ to $\frac{H}{16} \times \frac{W}{16}$; each EResConvBlock halves the spatial size. To preserve sufficient spectral information, we maintain a channel count of 128 and obtain the latent feature $Z \in \mathbb{R}^{\frac{H}{16} \times \frac{W}{16} \times 128}$ after passing through all four EResConvBlocks. The encoding process can be expressed as follows:
$$\begin{aligned} F_1 &= SSJA(Conv_1(X_M)) \\ F_2 &= EResConvBlock(F_1) \\ F_3 &= EResConvBlock(F_2) \\ F_4 &= EResConvBlock(F_3) \\ Z &= EResConvBlock(F_4) \end{aligned} \quad (5)$$
where $F_1 \in \mathbb{R}^{H \times W \times 128}$, $F_2 \in \mathbb{R}^{\frac{H}{2} \times \frac{W}{2} \times 128}$, $F_3 \in \mathbb{R}^{\frac{H}{4} \times \frac{W}{4} \times 128}$, and $F_4 \in \mathbb{R}^{\frac{H}{8} \times \frac{W}{8} \times 128}$ denote the different levels of features in the encoder, $SSJA$ denotes the spectral–spatial joint attention mechanism, and $Conv_1$ denotes a 1 × 1 convolution with a stride of 1, which changes the channel count from $B$ to 128.
The latent feature $Z$ first undergoes the spectral–spatial joint attention process to further enhance its spatial and spectral features in preparation for decoding. The fully convolutional decoder consists of four DEResConvBlocks that fuse the encoder features from different levels through skip connections, gradually restoring the latent feature $Z$ from size $\frac{H}{16} \times \frac{W}{16}$ to $H \times W$ via stepwise upsampling. In the final decoder layer, the channel count is restored from 128 to $B$. The decoding process can be expressed as follows:
$$\begin{aligned} F_5 &= SSJA(Z) \\ F_6 &= DEResConvBlock(Concat(Upsampling(F_5), Conv_1(F_4))) \\ F_7 &= DEResConvBlock(Concat(Upsampling(F_6), Conv_1(F_3))) \\ F_8 &= DEResConvBlock(Concat(Upsampling(F_7), Conv_1(F_2))) \\ \tilde{X} &= Conv_1(DEResConvBlock(Concat(Upsampling(F_8), Conv_1(F_1)))) \end{aligned} \quad (6)$$
where $F_5 \in \mathbb{R}^{\frac{H}{16} \times \frac{W}{16} \times 128}$, $F_6 \in \mathbb{R}^{\frac{H}{8} \times \frac{W}{8} \times 128}$, $F_7 \in \mathbb{R}^{\frac{H}{4} \times \frac{W}{4} \times 128}$, and $F_8 \in \mathbb{R}^{\frac{H}{2} \times \frac{W}{2} \times 128}$ denote the different levels of features in the decoder, and $Concat$ is the concatenation of features from corresponding levels of the encoder and decoder. $Conv_1$ denotes a 1 × 1 convolution with a stride of 1; it maintains a channel count of 128 to smooth the features, except for the last $Conv_1$, which changes the channel count from 128 to $B$.
The most important property of the FCAE lies in the incorporation of a spectral–spatial joint attention mechanism at the beginning of both the encoding and decoding stages, which enhances feature expression and optimizes spatial and spectral utilization to extract better spatial and spectral features. Additionally, the introduction of skip connections and residual connections facilitates the cross-layer feature interaction while preserving intricate details and semantic information to reconstruct a purer background.
Spectral–Spatial Joint Attention: As shown in Figure 3d,e, the spectral–spatial joint attention mechanism learns important spatial and spectral features through global max-pooling and global average-pooling along both the spatial and spectral dimensions of the input feature map of the hyperspectral image cube. The important features learned by the two pooling methods are then decision-fused, and the spatial and spectral importance weight coefficients are obtained via a sigmoid activation function. These weight coefficients are applied to the input hyperspectral image cube features to obtain the key spatial features and key spectral features. Ultimately, the fusion of these two key features yields the joint key feature, as sketched below.
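This is a PyTorch sketch of the attention under stated assumptions: the 128-channel input, the shared 1 × 1 convolution in the spectral branch, the 7 × 7 spatial convolution, and the additive fusion are illustrative choices the paper does not specify.

```python
# A sketch of the spectral-spatial joint attention (SSJA); layer sizes and
# the fusion-by-addition are assumptions made for illustration.
import torch
import torch.nn as nn

class SSJA(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        # Spectral branch: pool over space, weight each of the channels.
        self.spec_fc = nn.Conv2d(channels, channels, kernel_size=1)
        # Spatial branch: pool over channels, weight each pixel.
        self.spat_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                          # x: (N, C, H, W)
        # Spectral attention from global average- and max-pooling over H, W.
        avg_c = x.mean(dim=(2, 3), keepdim=True)
        max_c = x.amax(dim=(2, 3), keepdim=True)
        w_spec = torch.sigmoid(self.spec_fc(avg_c) + self.spec_fc(max_c))
        # Spatial attention from average- and max-pooling over the channels.
        avg_s = x.mean(dim=1, keepdim=True)
        max_s = x.amax(dim=1, keepdim=True)
        w_spat = torch.sigmoid(self.spat_conv(torch.cat([avg_s, max_s], dim=1)))
        # Fuse the key spectral and key spatial features into the joint feature.
        return x * w_spec + x * w_spat
```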
EResConvBlock: The EResConvBlock is composed of three convolutional layers: one 3 × 3 convolution with a stride of 2, one 3 × 3 convolution with a stride of 1, and one 1 × 1 convolution with a stride of 1. The number of convolution kernels in each layer is fixed at 128, and the layers are connected following the residual connection paradigm, as illustrated in Figure 3b. Firstly, instead of using a pooling operation, the 3 × 3 convolution with a stride of 2 is employed to halve the input feature map size. Then, features are extracted through the 3 × 3 convolution with a stride of 1. Finally, a residual connection incorporates the features from the other branch (i.e., those extracted via a 1 × 1 convolution with a stride of 2), which enables the fusion and interaction of pre- and post-features and accelerates the fitting of the deep network. Each convolutional layer is followed by batch normalization and the LeakyReLU activation function. A sketch of this block is given below.
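This sketch follows the stated configuration; the exact placement of batch normalization and LeakyReLU within the residual branches is an assumption.

```python
# A sketch of the EResConvBlock as described, assuming 128 channels throughout.
import torch.nn as nn

class EResConvBlock(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        def conv_bn(cin, cout, k, s):
            return nn.Sequential(
                nn.Conv2d(cin, cout, k, stride=s, padding=k // 2),
                nn.BatchNorm2d(cout),
                nn.LeakyReLU(inplace=True),
            )
        self.main = nn.Sequential(
            conv_bn(channels, channels, 3, 2),     # halves H and W (no pooling)
            conv_bn(channels, channels, 3, 1),     # feature extraction
            conv_bn(channels, channels, 1, 1),     # smoothing
        )
        self.shortcut = conv_bn(channels, channels, 1, 2)  # residual branch

    def forward(self, x):
        return self.main(x) + self.shortcut(x)
```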
DEResConvBlock: The DEResConvBlock is composed of three 1 × 1 convolutions with a stride of 1 and one 3 × 3 convolution with a stride of 1. The number of channels remains fixed at 128, except for the initial 1 × 1 convolution, which reduces the input feature map from 256 to 128 dimensions. The entire process is illustrated in Figure 3c. Firstly, the input 256-dimensional features are reduced to 128 dimensions through a 1 × 1 convolution with a stride of 1. Subsequently, the features are decoded using a 3 × 3 convolution with a stride of 1 and smoothed via a 1 × 1 convolution with a stride of 1. Finally, a residual connection incorporates the features from the other branch (i.e., those extracted via a 1 × 1 convolution with a stride of 1) to enrich and enhance the decoded feature representation. Each convolutional layer is followed by batch normalization and the LeakyReLU activation function. A companion sketch is given below.
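In this companion sketch, the stride-1 shortcut is an assumption needed to keep the residual sum shape-consistent with the stride-1 main path.

```python
# A sketch of the DEResConvBlock; channel counts follow the text, the
# stride-1 shortcut is an assumption for shape consistency.
import torch.nn as nn

class DEResConvBlock(nn.Module):
    def __init__(self, cin=256, cout=128):
        super().__init__()
        def conv_bn(ci, co, k):
            return nn.Sequential(
                nn.Conv2d(ci, co, k, stride=1, padding=k // 2),
                nn.BatchNorm2d(co),
                nn.LeakyReLU(inplace=True),
            )
        self.main = nn.Sequential(
            conv_bn(cin, cout, 1),                 # 256 -> 128 reduction
            conv_bn(cout, cout, 3),                # decoding
            conv_bn(cout, cout, 1),                # smoothing
        )
        self.shortcut = conv_bn(cin, cout, 1)      # residual branch

    def forward(self, x):
        return self.main(x) + self.shortcut(x)
```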
Skip Connection: By establishing skip connections, the features corresponding to different layers in both the encoding and decoding processes are interconnected, which facilitates cross-layer feature interaction and preserves intricate details as well as semantic information. This approach enhances the capacity of the proposed deep network for learning robust features and improves its fitting ability.
(2)
Latent Feature Adversarial Consistency Network (LFACN): The latent feature adversarial consistency network, as illustrated in Figure 4a, comprises an encoder and a discriminator for the latent features. The input samples $X_M \in \mathbb{R}^{H \times W \times B}$ and the prior background samples $X_B \in \mathbb{R}^{H \times W \times B}$ are mapped to the latent features $Z_1$ and $Z_2$, respectively, through an encoder $E$ with shared weights. To ensure that the latent features share a similar background distribution, we employ a latent feature discriminator $D_Z$ to oppose the encoder, which makes the latent feature $Z_1$ of the input hyperspectral image resemble the latent feature $Z_2$ as closely as possible under adversarial training. In this way, the encoder directly learns the true distribution of the background, and all inputs can be effectively mapped to similar background latent features, which enables accurate decoding of the corresponding pure background. Moreover, the latent feature $Z_3$, obtained by mapping the reconstructed background $\tilde{X} \in \mathbb{R}^{H \times W \times B}$ through the encoder $E$, should also exhibit a strong similarity to the latent feature $Z_2$ of the prior background samples $X_B$. However, because the deep network cannot guarantee this on its own, a latent feature consistency loss $L_Z$ is employed to strengthen the constraint.
Latent Discriminator $D_Z$: The latent discriminator $D_Z$, as illustrated in Figure 4b, comprises three 1 × 1 convolutions with a stride of 1, followed by a fully connected layer and a sigmoid layer. The sequence of three 1 × 1 convolutions progressively reduces the dimension of the input latent feature from 128 to 64, then to 32, and finally to 1. The result is then transformed into a single value through the fully connected layer and ultimately mapped to a confidence score for the latent feature via the sigmoid activation. A sketch of this discriminator follows.
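In the sketch below, the flattening before the fully connected layer and the LeakyReLU between the 1 × 1 convolutions are assumptions, since the figure is the only specification.

```python
# A sketch of the latent discriminator D_Z; latent_hw = (H//16) * (W//16)
# ties the fully connected layer to the image size, an assumption here.
import torch
import torch.nn as nn

class LatentDiscriminator(nn.Module):
    def __init__(self, latent_hw):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(128, 64, 1), nn.LeakyReLU(inplace=True),  # 128 -> 64
            nn.Conv2d(64, 32, 1), nn.LeakyReLU(inplace=True),   # 64 -> 32
            nn.Conv2d(32, 1, 1), nn.LeakyReLU(inplace=True),    # 32 -> 1
        )
        self.fc = nn.Linear(latent_hw, 1)

    def forward(self, z):                          # z: (N, 128, H/16, W/16)
        score = self.convs(z).flatten(1)           # (N, latent_hw)
        return torch.sigmoid(self.fc(score))       # confidence score in [0, 1]
```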

2.3.3. Learning Procedure

The proposed deep network architecture primarily consists of an encoder $E$, a decoder $DE$, and a latent feature discriminator $D_Z$. Accordingly, the loss function encompasses four components: the adversarial loss $L_{D_Z}$ between the encoder $E$ and the latent feature discriminator $D_Z$; the triplet loss $L_T$; the adversarial consistency loss $L_Z$; and the reconstruction loss $L_R$. Throughout the learning process of the proposed deep network model, gradient backpropagation is utilized to iteratively optimize its parameters based on these four losses.
The original purpose of the reconstruction loss in an AE is to minimize the discrepancy between the reconstructed image and the original HSI. In the proposed deep network, however, the reconstruction loss also guards against the case in which the prior anomaly samples mistakenly include a few background pixels, which would otherwise lead to the extreme situation of those parts of the background not being reconstructed. The following mean squared error (MSE) is employed to calculate the reconstruction loss:
$$L_R = \| X - \tilde{X} \|_2^2 \quad (7)$$
where $\| \cdot \|_2^2$ represents the MSE loss. The objective of the triplet loss is to enhance the discrimination between the background and anomaly targets by minimizing the distance between the reconstructed image and the background samples while maximizing the distance between the reconstructed image and the anomaly samples. Consequently, the triplet loss employs two mean squared errors, as shown in the following equation:
$$L_T = \| X_B - \tilde{X} \odot P_2 \|_2^2 - \| X_A - \tilde{X} \odot (1 - P_2) \|_2^2 \quad (8)$$
The latent feature adversarial consistency network matches the latent feature $Z_1$, extracted by the encoder $E$ from the input, with the latent feature $Z_2$ obtained from the prior background samples, while reinforcing the constraint on the latent feature $Z_3$ of the reconstructed image through the adversarial consistency loss. Consequently, the adversarial loss between the encoder $E$ and the latent feature discriminator $D_Z$ and the adversarial consistency loss can be expressed as follows:
$$L_{D_Z} = \mathbb{E}[\log D_Z(Z_2)] + \mathbb{E}[\log(1 - D_Z(Z_1))] \quad (9)$$
$$L_Z = \| Z_3 - Z_2 \|_2^2 \quad (10)$$
where $Z_1 = E(X_M)$, $Z_2 = E(X_B)$, and $Z_3 = E(\tilde{X})$ are the latent features extracted by the encoder $E$ from the input samples, the prior background samples, and the reconstructed HSI, respectively. By minimizing $L_{D_Z}$ and $L_Z$, the deep network can learn a more realistic background distribution.
Finally, the total loss of the whole network can be expressed as follows:
$$L_{all} = \alpha L_T + \beta L_Z + \mu L_R \quad (11)$$
where $\alpha$, $\beta$, and $\mu$ are set to 0.9, 0.1, and 0.1, respectively, according to the needs of the task. The network was optimized by minimizing the loss function with a learning rate of $lr = 0.001$. After training, the parameters of the deep network were fixed and utilized to reconstruct the original HSI.
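The four loss terms can be sketched directly from Equations (7)–(11); the (N, B, H, W) tensor layout and the mean reduction in `mse_loss` are implementation assumptions.

```python
# A sketch of the loss terms of Equations (7)-(11); P2 is the coarse label
# broadcast over the band dimension, and the 0.9/0.1/0.1 weights follow the text.
import torch
import torch.nn.functional as F

def total_loss(X, X_rec, X_B, X_A, P2, Z2, Z3, alpha=0.9, beta=0.1, mu=0.1):
    L_R = F.mse_loss(X_rec, X)                              # Equation (7)
    L_T = (F.mse_loss(X_rec * P2, X_B)                      # Equation (8)
           - F.mse_loss(X_rec * (1 - P2), X_A))
    L_Z = F.mse_loss(Z3, Z2)                                # Equation (10)
    return alpha * L_T + beta * L_Z + mu * L_R              # Equation (11)

def discriminator_objective(D_Z, Z1, Z2):                   # Equation (9)
    # The discriminator maximizes Eq. (9); here it is implemented as
    # minimizing the negative objective (eps added for log stability).
    real = torch.log(D_Z(Z2) + 1e-8).mean()
    fake = torch.log(1 - D_Z(Z1) + 1e-8).mean()
    return -(real + fake)
```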
Algorithm 1 Algorithm Flow Diagram of the FCAE-DCAC
Input: The original HSI $X \in \mathbb{R}^{H \times W \times B}$
Parameters: Epoch, learning rate $lr$, (Eps, MinPts), $D$, $\alpha$, $\beta$, and $\mu$
Output: Final detection result $G = \{G_{i,j}\}_{i=1,j=1}^{i=H,j=W} \in \mathbb{R}^{H \times W}$
Stage 1: Extracting Prior Knowledge with Dual Clustering
  Obtain the prior anomaly samples $X_A \in \mathbb{R}^{H \times W \times B}$, the prior background samples $X_B \in \mathbb{R}^{H \times W \times B}$, and the coarse label $P_2$ via Equations (1)–(3)
Stage 2: Training for the Fully Convolutional Auto-Encoder
  Acquire the training samples $X_M \in \mathbb{R}^{H \times W \times B}$ via Equation (4)
  Initialize the network with random weights
  for each epoch do:
    FCAE update: $E$, $DE$ by $L_{all} = \alpha L_T + \beta L_Z + \mu L_R$
    Latent Feature Adversarial Consistency Network update: $E$, $D_Z$ by $L_{D_Z}$
    Back-propagate $L_{all}$ and $L_{D_Z}$ to update $E$, $DE$, and $D_Z$
  end
Stage 3: Testing with the Original HSI
  Obtain the reconstructed HSI using the original HSI as input via Equation (12)
  Calculate the anomaly degree $G_{i,j}$ for each pixel in $X$ via Equation (13)

2.4. Testing with the Original HSI

After optimizing and fixing the parameters $\hat{\theta}$ of the deep network, we eliminate the discriminator $D_Z$ and solely retain the encoder $E$ and decoder $DE$ for reconstructing the HSI. We use the original HSI $X$ for detection instead of a masked training image $X_M$, which comes closer to real-world scenarios. In practical applications, the image under detection can be directly input into the deep network. The trained model then reconstructs the background image $\tilde{X}$ in an end-to-end manner, as represented by the following equation:
$$\tilde{X} = \mathrm{FCAE\text{-}DCAC}(X, \hat{\theta}) \quad (12)$$
After the guided learning provided by dual clustering and the triplet loss, and the learning of the real background distribution by the adversarial consistency network, the proposed FCAE-DCAC deep network becomes a robust background reconstruction network. It effectively maps anomaly pixels of the original HSI to latent features consistent with the proper background and then reconstructs them as pixels similar to the surrounding background. Anomalies therefore exhibit significantly higher reconstruction errors than the background. Finally, based on the reconstruction error of the proposed model, we utilize Equation (13) to obtain the hyperspectral anomaly detection result:
$$G_{i,j} = \| x_{i,j} - \tilde{x}_{i,j} \|_2 \quad (13)$$
where $x_{i,j} \in \mathbb{R}^{B \times 1}$ and $\tilde{x}_{i,j} \in \mathbb{R}^{B \times 1}$ represent the pixels of the original HSI $X \in \mathbb{R}^{H \times W \times B}$ and the reconstructed HSI $\tilde{X} \in \mathbb{R}^{H \times W \times B}$, respectively, at position $(i, j)$, and $G_{i,j}$ denotes the anomaly degree score of the pixel at this position, which ultimately forms the final detection map $G = \{G_{i,j}\}_{i=1,j=1}^{i=H,j=W} \in \mathbb{R}^{H \times W}$. Algorithm 1 provides a detailed description of the main steps involved in our proposed method. A minimal sketch of this test stage follows.
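The sketch assumes `model` is the trained encoder–decoder pair with $D_Z$ removed and `X` is the original HSI as a (1, B, H, W) tensor.

```python
# A sketch of the test stage: feed the original HSI through the trained
# encoder/decoder and score each pixel by its reconstruction error, Eq. (13).
import torch

@torch.no_grad()
def detect(model, X):                       # X: (1, B, H, W) original HSI
    model.eval()
    X_rec = model(X)                        # reconstructed background
    # Per-pixel L2 norm of the spectral residual -> anomaly degree map G.
    G = torch.linalg.norm(X - X_rec, dim=1).squeeze(0)   # (H, W)
    return G
```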

3. Experiments and Analysis

For experimental validation and analysis, extensive experiments were conducted on seven hyperspectral datasets captured by various hyperspectral remote sensors to assess the effectiveness and superiority of the proposed FCAE-DCAC method. Qualitative and quantitative comparisons were conducted with nine state-of-the-art hyperspectral anomaly detection methods. All experiments were executed on a computer equipped with an Intel Core i7-12700H CPU, 16 GB RAM, and a GeForce RTX 3090 GPU. Eight of the compared methods were run in MATLAB 2018a; the proposed method and Auto-AD were implemented with Python 3.8.18, PyTorch 1.7.1, and CUDA 11.0. For fairness, all compared methods were implemented based on their open-source codes.

3.1. Data Description

We employed three distinct hyperspectral sensors to capture seven hyperspectral datasets in diverse scenarios for the anomaly detection task. These datasets contain both sparsely and densely distributed anomaly targets, consisting of individual pixels or specific spatial structures, and the anomaly targets appear at different spatial scales.
(1)
San Diego Dataset: This dataset was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) hyperspectral sensor over the San Diego airport area, CA, USA. The spatial resolution is 3.5 m, and the original size is 400 × 400 pixels, as depicted in Figure 5a. The image consists of 224 spectral bands within the range of 370–2510 nm, of which 189 remain after excluding bands affected by water absorption and a low signal-to-noise ratio. Within this dataset, three regions named San Diego-1, San Diego-2, and San Diego-3 were selected. Figure 5b–d display the pseudocolor images and ground-truth maps of these datasets. The image size of San Diego-1 is 100 × 100, and it contains three aircraft of different sizes that are considered anomaly targets; these targets comprise a total of 58 pixels, which account for 0.58% of the entire image. The image size of San Diego-2 is 60 × 60, with tarp, buildings, and shadow as the background land covers. Within this image, there are 22 densely distributed targets identified as anomalies, comprising a total of 214 pixels, which account for 5.94% of the entire image. Similarly, the image size of San Diego-3 is 40 × 90, with tarp, buildings, and shadow as the background. In this image, there are 21 densely distributed anomaly targets, comprising a total of 423 pixels, which account for 11.75% of the entire image. It should be noted that the spectral curves of the building in the upper right corner of the San Diego-2 image differ significantly from those of the other background features, and the proportion occupied by this building is not as substantial as that of the other two types of background. Consequently, modeling and analyzing the background features of these datasets poses certain challenges.
(2)
Hyperspectral Digital Imagery Collection Experiment (HYDICE) Dataset: This dataset was acquired by the HYDICE sensor over a suburban residential area in Michigan, USA. The spatial resolution is 3 m, and the image size is 80 × 100. There are 210 spectral bands within the range of 400–2500 nm, with 175 remaining after eliminating noise and water vapor absorption bands. This hyperspectral dataset includes background land covers such as parking lots, soil, water bodies, and roads. Figure 5e displays the pseudocolor image and the ground-truth map of this dataset. Ten vehicles are considered anomaly targets and they comprise 17 pixels, which account for 0.21% of the entire image.
(3)
Pavia Dataset: This dataset was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) in the center of Pavia, northern Italy. The spatial resolution is 1.3 m and the image size is 150 × 150. This dataset consists of 102 spectral bands within the range of 430–860 nm. Figure 5f displays the pseudocolor image and the ground-truth map of this dataset. The background land covers captured in this dataset include bridges, water bodies, and bare soil, while the anomaly targets are vehicles on the bridge. These anomaly targets comprise a total of 63 pixels, which account for 0.28% of the entire image.
(4)
Los Angeles-1 (LA-1) Dataset: This dataset was acquired by the AVIRIS sensor over the Los Angeles area. The spatial resolution is 7.5 m, and the image size is 100 × 100. It encompasses a total of 205 spectral bands within the range of 430–860 nm. Figure 5g displays the pseudocolor image and the ground-truth map of this dataset. Notably, a few houses in this image are considered anomaly targets; they comprise a total of 232 pixels, accounting for 2.32% of the entire image.
(5)
Gulfport Dataset: This dataset was acquired by the AVIRIS sensor over Gulfport, southern Mississippi, USA, in 2010. The spatial resolution is 3.4 m, and the image size is 100 × 100. After eliminating bands with a low signal-to-noise ratio (SNR), a total of 191 bands remain, spanning 400 to 2500 nm. Figure 5h displays the pseudocolor image and the ground-truth map of this dataset. Three airplanes of various sizes are identified as anomaly targets, comprising a total of 60 pixels and accounting for 0.60% of the entire image.

3.2. Evaluation Metrics

We quantitatively investigated the detection performance of the proposed method and the comparative approaches using three widely adopted evaluation metrics for anomaly detection in hyperspectral remote sensing imagery: background–anomaly separation analysis (boxplot) [53], the receiver operating characteristic (ROC) [54], and the area under the ROC curve (AUC) [55]. If the ROC curve of an anomaly detector exhibits a higher true positive rate (TPR, $P_d$) at a lower false alarm rate (FAR, $P_f$), meaning that the ROC curve is closer to the top left corner, the detector offers superior detection performance. However, if the ROC curves of two detectors have interleaved TPRs under different FARs, it becomes difficult to judge their performance solely from the visual ROC curves. In such a case, an alternative quantitative criterion, the AUC, should be employed: the closer the AUC score is to 1, the better the detection performance. The boxplot can be utilized to assess the degree of separation between the background and anomalies for different anomaly detectors; a detector with a higher degree of separation exhibits superior detection performance. A sketch of this evaluation is given below.
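As an illustration, the AUC of ($P_d$, $P_f$) and the ROC curve can be computed with scikit-learn; the helper below assumes a detection map `G` and a binary ground-truth map `gt` of the same spatial size.

```python
# A sketch of the ROC/AUC evaluation for a detection map G (H, W) against a
# binary ground-truth map gt (H, W); helper name is an illustrative choice.
from sklearn.metrics import roc_auc_score, roc_curve

def evaluate(G, gt):
    scores, labels = G.ravel(), gt.ravel().astype(int)
    auc = roc_auc_score(labels, scores)            # AUC of (Pd, Pf)
    pf, pd_, _ = roc_curve(labels, scores)         # false-alarm vs detection rate
    return auc, pf, pd_
```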

3.3. Detection Performance

Subsequently, we conducted a comprehensive evaluation of the detection performance of the various detectors based on four key aspects: heat map analysis, ROC curve assessment, AUC calculation, and separability boxplot examination. The heat maps in Figure 6, Figure 7, Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12 illustrate the hyperspectral anomaly detection results of ten different detectors on the seven real HSI datasets. Pixels with higher anomaly degree scores are closer to yellow, whereas background pixels are closer to blue. This visualization allows us to intuitively assess the anomaly-highlighting and background-suppression capabilities of the detectors. The ROC curves for all the methods are presented in Figure 13; the closer the curve is to the top left corner, the lower the false alarm rate of the detector and the smaller the probability of misjudgment. The AUC scores of ($P_d$, $P_f$) presented in Table 1 serve as a further evaluation of the detection performance; a higher AUC of ($P_d$, $P_f$) indicates a superior anomaly detection capability. The separability boxplots in Figure 14 illustrate the separability of anomaly targets and background in the detection results, which represents the statistical distribution distance between anomalies and background. A larger gap between the background and anomaly boxes indicates a stronger ability to highlight anomalies and suppress the background, resulting in greater separability between anomalies and background.
As depicted in Figure 6 and Figure 12, the GRX, LRX, and FRFE hardly detected the anomalous aircraft in the San Diego-1 and Gulfport datasets; on the contrary, the scattered background displays notable anomaly degree scores. Although the CRD, LRASR, GAED, and RGAE can detect most anomalies, the salience of these anomalies is not readily apparent, and certain types of background exhibit higher anomaly degree scores than the anomalous aircraft. The AED and Auto-AD effectively highlighted the anomalies, yet they still retain a significant amount of background information, such as background contours, which results in a persistently high false alarm rate. In contrast, the detection results of our proposed method highlight the anomalies more effectively while suppressing background interference. It is evident that the proposed FCAE-DCAC achieves an exceptionally low false alarm rate and strong background suppression on the San Diego-1 dataset. Especially for the San Diego-2 and San Diego-3 datasets, which contain dense anomaly targets, the FCAE-DCAC demonstrated superior detection performance: it accurately identifies dense anomaly targets while ensuring a minimal miss rate and robust background suppression. Owing to its effective utilization of spatial information, the FCAE-DCAC excels in detecting anomalous structures and contours. The results depicted in Figure 7 and Figure 8 demonstrate that the alternative approaches either fail to suppress the prominent building background in the upper right corner of San Diego-2, which is prone to misdetection, or exhibit a mixture of excessive noise and background, resulting in frequent missed detections. Notably, representation-based methods such as the CRD and the LRASR are particularly vulnerable to noise interference due to their linear or nonlinear representations. While the Auto-AD and the AED successfully mitigate the background interference, they suffer from a high miss rate, fail to preserve the spatial structure details of the anomaly targets, and only provide approximate identification. The Pavia dataset also includes a bare soil background that is highly susceptible to misdetection. As depicted in Figure 10, the GRX, LRX, and FRFE almost fail to detect the anomalous vehicles but retain the soil background in the lower-left region of the Pavia image. The CRD, LRASR, GAED, and AED can effectively identify the anomalies; however, they cannot completely suppress the soil background in the lower-left region, while the RGAE and Auto-AD exhibit strong background suppression but suffer from significant missed detections. The proposed FCAE-DCAC method effectively suppresses the soil background at the bottom left of the Pavia image and yields a detection result that closely resembles the ground truth, with very few missed or false detections. The proposed method also achieves superior performance in extracting the dense small anomaly targets of the LA-1 dataset, and its extraction of these targets is the most complete, as illustrated in Figure 11. In contrast, the GRX, LRX, FRFE, CRD, and AED failed to detect all the anomaly targets, while the LRASR, GAED, RGAE, and Auto-AD yielded detection results with excessive background information and noise; for the LRASR in particular, the suppression of the background is almost negligible.
The experiment conducted on the HYDICE dataset further validates the efficacy of our proposed method in accurately reconstructing a pure background. From Figure 9, it is evident that our method yields the purest detection results. However, given that the anomalies in HYDICE are limited to a few pixels and exhibit simple, regular shapes, all methods obtain relatively satisfactory performance on this dataset.
Additionally, we evaluated the performance of the different algorithms qualitatively and quantitatively using the ROC curves and the AUC scores of $(P_d, P_f)$ on the experimental datasets. Figure 13 shows the ROC curves for the seven experimental datasets. In most cases, the ROC curve of the FCAE-DCAC lies at the top and closest to the upper-left corner, indicating the best detection performance: high detection accuracy at a low false alarm rate. As expected, the FCAE-DCAC consistently outperforms the other nine methods, even on datasets containing dense anomalies such as San Diego-2, San Diego-3, and LA-1, which demonstrates the remarkable competitiveness of the proposed method.
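For reference, the sketch below illustrates one way such $(P_d, P_f)$ ROC curves and AUC scores can be computed from a detection map and a ground-truth mask. It is a minimal illustration using scikit-learn rather than the evaluation code used in our experiments, and all variable names and the random stand-in data are placeholders.

```python
# Minimal sketch: ROC curve and AUC(Pd, Pf) for a 2-D anomaly score map.
import numpy as np
from sklearn.metrics import roc_curve, auc

def evaluate_detection(detection_map, ground_truth):
    """Return (Pf, Pd, AUC) for a score map against a binary anomaly mask."""
    scores = detection_map.ravel()
    labels = (ground_truth.ravel() > 0).astype(np.uint8)
    pf, pd, _ = roc_curve(labels, scores)  # Pf: false alarm rate, Pd: detection rate
    return pf, pd, auc(pf, pd)

# Example with random stand-in data:
rng = np.random.default_rng(0)
gt = rng.random((100, 100)) > 0.99        # sparse hypothetical "anomalies"
det = rng.random((100, 100)) + 0.5 * gt   # scores slightly higher on anomalies
pf, pd, score = evaluate_detection(det, gt)
print(f"AUC(Pd, Pf) = {score:.4f}")
```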
To validate the capability of the proposed method to distinguish between background and anomalies, we analyzed the separability boxplots. As depicted in Figure 14, the green background box of the FCAE-DCAC spans a narrow range, which indicates effective background suppression and the reconstruction of a relatively pure background. Additionally, there is a large distance between the red anomaly box and the green background box of the FCAE-DCAC, with almost no overlap. The separability of the AED exceeds that of the proposed method only on the HYDICE, LA-1, and Gulfport datasets. This can be attributed to the presence of extremely small anomalies (i.e., anomalies of fewer than 5 pixels) in these datasets, which are filtered out as noise during dual clustering. Nevertheless, our method exhibits a higher anomaly box, indicating that it detects more anomalies than the other methods, which either show significant overlap between the background and anomaly boxes or detect fewer anomalies. For the remaining four datasets, the proposed FCAE-DCAC method effectively separates the background from the anomalies by increasing their separability.
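A separability boxplot of this kind can be reproduced with a short script such as the following sketch, which splits each detector's scores by the ground-truth mask into a background box and an anomaly box; the plotting details are illustrative assumptions, not the figure-generation code behind Figure 14.

```python
# Hedged sketch: background (green) vs. anomaly (red) score boxes per detector.
import numpy as np
import matplotlib.pyplot as plt

def separability_boxplot(detection_maps: dict, ground_truth: np.ndarray):
    is_anomaly = ground_truth.ravel() > 0
    data, positions, names = [], [], []
    for k, (name, det) in enumerate(detection_maps.items()):
        scores = det.ravel()
        data += [scores[~is_anomaly], scores[is_anomaly]]  # background, anomaly
        positions += [3 * k, 3 * k + 1]
        names.append(name)
    fig, ax = plt.subplots(figsize=(8, 4))
    bp = ax.boxplot(data, positions=positions, showfliers=False, patch_artist=True)
    for i, box in enumerate(bp["boxes"]):
        box.set_facecolor("green" if i % 2 == 0 else "red")
    ax.set_xticks([3 * k + 0.5 for k in range(len(names))])
    ax.set_xticklabels(names, rotation=45)
    ax.set_ylabel("Normalized detection score")
    plt.tight_layout()
    plt.show()
```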
The evaluation results demonstrate that the proposed FCAE-DCAC method achieves superior detection performance: it effectively detects anomalies of various sizes and diverse structures while preserving their fundamental shape, and it achieves a lower false alarm rate, a lower miss rate, and a higher detection rate, thereby balancing background suppression and anomaly detection. The detection results reported in Table 1 are satisfactory, particularly for dense anomaly target identification. The average AUC score of $(P_d, P_f)$ of the FCAE-DCAC on the seven experimental datasets is the highest, reaching 0.9903, which is 0.0127 higher than that of the second-place AED method. In general, the FCAE-DCAC is extremely competitive.

3.4. Parametric Analysis

In this section, we examine the impact of four parameters of the proposed method: the clustering radius Eps of the dual clustering, the minimum number of neighborhood points MinPts, the filtering threshold $D$, and the weight parameters $(\alpha, \beta)$ of the triplet loss and the adversarial consistency loss.
To assess the impact of Eps on the performance of the proposed method, we set MinPts to 1 and kept $D$ fixed at 50. The weights of the loss function were held constant ($\alpha$ = 0.9, $\beta$ = 0.1), and the reconstruction penalty coefficient $\mu$ was set to 0.1. Figure 15a illustrates the optimal Eps values across the different datasets within the range of 0.01 to 0.25: the optimal Eps for the San Diego-1, San Diego-2, and San Diego-3 datasets is 0.20, 0.11, and 0.12, respectively. For the HYDICE dataset, varying Eps has minimal impact on the AUC score of $(P_d, P_f)$, and Eps = 0.12 was selected for its relatively superior detection performance in subsequent experiments. The AUC score of $(P_d, P_f)$ on the Pavia dataset reaches its optimum when Eps is set to 0.14. For the LA-1 and Gulfport datasets, Eps values of 0.09 and 0.14, respectively, yield the highest AUC scores of $(P_d, P_f)$.
Since the connected-component clustering adopts an eight-neighborhood approach, we analyzed only the influence of varying MinPts from 1 to 8 on the detection performance of the proposed method, as depicted in Figure 15b. Note that the optimal Eps value determined in Figure 15a was used for the MinPts analysis, while the other parameters were kept consistent with the Eps experiments. The results reveal that, except for the LA-1 dataset, all datasets achieve their best AUC scores of $(P_d, P_f)$ when MinPts = 1. For the LA-1 dataset, varying MinPts from 1 to 4 has minimal effect on the AUC score of $(P_d, P_f)$, with a maximum fluctuation of 0.001921. Therefore, to simplify the procedure and minimize human intervention, MinPts was set to 1 for all datasets in the subsequent experiments.
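To make the roles of Eps and MinPts concrete, the following simplified sketch shows how a dual clustering step of this kind can be parameterized with scikit-learn's DBSCAN followed by eight-neighborhood connected-component labeling. It is an illustration under stated assumptions, not the authors' exact implementation, and a practical version would subsample pixels or use spatial indexing for speed.

```python
# Simplified sketch: spectral DBSCAN (Eps, MinPts) + 8-connected components.
import numpy as np
from scipy import ndimage
from sklearn.cluster import DBSCAN

def dual_clustering(hsi: np.ndarray, eps: float = 0.12, min_pts: int = 1):
    H, W, B = hsi.shape
    pixels = hsi.reshape(-1, B).astype(np.float64)
    pixels = (pixels - pixels.min()) / (pixels.max() - pixels.min() + 1e-12)

    # Stage 1: spectral clustering of all pixels with radius Eps.
    spectral_labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(pixels)
    label_map = spectral_labels.reshape(H, W)

    # Stage 2: 8-neighborhood connected components within each spectral cluster.
    structure = np.ones((3, 3), dtype=int)   # eight-neighborhood connectivity
    components = np.zeros((H, W), dtype=int)
    next_id = 1
    for c in np.unique(label_map):
        mask = label_map == c
        comp, n = ndimage.label(mask, structure=structure)
        components[mask] = comp[mask] + next_id - 1
        next_id += n
    return label_map, components
```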
The value of the filtering threshold $D$ was initially determined through expert visual inspection to identify potential large background objects and to filter misjudged large background targets out of the initial clustering results. In this experiment, the sensitivity of the different datasets to $D$ was tested in the range of 30 to 140, and the results are presented in Figure 15c. For the San Diego-1, San Diego-3, and Gulfport datasets, the AUC scores of $(P_d, P_f)$ improve significantly when $D$ increases from 40 to 50, which indicates that, when $D$ is smaller than 50, many anomaly targets are misclassified as background targets. The detection performance on the HYDICE and LA-1 datasets is insensitive to changes in $D$ and remains consistently stable, which indicates that their clustering results contain no misjudged large background land covers. Conversely, the San Diego-2 and Pavia datasets exhibit a notable decline in detection performance once $D$ exceeds a certain threshold, which indicates that high values of $D$ fail to filter out the misclassified large background land covers. Ultimately, after careful analysis, $D$ = 50 was chosen to ensure stable and satisfactory performance across all datasets.
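The effect of the threshold $D$ can be illustrated by a size filter over the connected components, as in the hypothetical helper below: components larger than $D$ pixels are treated as misjudged background and excluded from the prior anomaly set. The function name and interface are illustrative; `components` is the labeling map from the previous sketch.

```python
# Hedged sketch: discard connected components larger than D pixels.
import numpy as np

def filter_large_components(components: np.ndarray, d_threshold: int = 50) -> np.ndarray:
    sizes = np.bincount(components.ravel())          # pixel count per component id
    large = np.flatnonzero(sizes > d_threshold)      # component ids exceeding D
    anomaly_mask = ~np.isin(components, large)       # keep only small components
    return anomaly_mask                              # candidate prior anomalies
```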
To verify the influence of different loss-function weights on the performance of the deep network, the other parameters were fixed at their best values, and the performance of the proposed method on the different datasets was analyzed for weight allocations ranging from (0.1, 0.9) to (0.9, 0.1), as shown in Figure 16. Changing $(\alpha, \beta)$ has almost no effect on the AUC scores of $(P_d, P_f)$ for the San Diego-1 and Gulfport datasets; dual clustering works particularly well on these two datasets, so the adversarial consistency loss can guide the proposed method to fully learn the features of the real background. However, as the adversarial consistency constraint strengthens, the performance on the other datasets degrades. This is because the prior background samples produced by dual clustering are not all true background but only capture most of the background characteristics, so imposing a strong constraint leads to a significant decline in detection performance. We therefore selected the weight allocation (0.9, 0.1): with the background–anomaly distance fully separated, a weak background adversarial consistency constraint directs the proposed method to focus on learning background features.
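As a rough illustration of how $(\alpha, \beta)$ and $\mu$ enter the objective, the following PyTorch sketch assembles a weighted sum of a triplet term, an adversarial consistency term, and a reconstruction penalty. The individual loss formulations here are simplified stand-ins and may differ from the exact definitions used in our method; all argument names are placeholders.

```python
# Hedged sketch of the weighted objective: alpha*L_T + beta*L_adv + mu*L_rec.
import torch
import torch.nn.functional as F

def total_loss(x, x_rec, anchor, positive, negative, d_fake,
               alpha: float = 0.9, beta: float = 0.1, mu: float = 0.1):
    # Triplet term: pull prior-background features together, push
    # prior-anomaly features away (stand-in for L_T).
    l_triplet = F.triplet_margin_loss(anchor, positive, negative, margin=1.0)
    # Simplified adversarial consistency term: encourage latent codes to be
    # judged as real background by the latent discriminator output d_fake.
    l_adv = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    # Reconstruction penalty weighted by mu.
    l_rec = F.mse_loss(x_rec, x)
    return alpha * l_triplet + beta * l_adv + mu * l_rec
```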
Finally, based on the parameter analysis, we selected Eps values of 0.20, 0.11, 0.12, 0.12, 0.14, 0.09, and 0.14 for the San Diego-1, San Diego-2, San Diego-3, HYDICE, Pavia, LA-1, and Gulfport datasets, respectively. For all datasets, $D$ was set to 50, $(\alpha, \beta)$ to (0.9, 0.1), and the weight of the reconstruction penalty term $\mu$ was fixed at 0.1.

3.5. Ablation Study

This section analyzes the effectiveness of the proposed fully convolutional auto-encoder (FCAE), the latent feature adversarial consistency network (LFACN), the dual clustering (DC), the triplet loss $L_T$, and the other components. Ablation experiments were conducted for four cases: the first case uses the FCAE without SSJA; the second case uses the full FCAE with SSJA; the third case adds the triplet loss $L_T$; and the fourth case additionally incorporates the LFACN. Since the results of the DC mainly affect the triplet loss $L_T$ and the LFACN, demonstrating the effectiveness of these two components also indirectly validates the efficacy of the dual clustering. The experimental results are presented in Table 2.
It is evident that the detection performance improves steadily as components are added. Except for a less pronounced enhancement on the HYDICE dataset, significant improvements are observed on the other datasets, particularly San Diego-2, San Diego-3, and LA-1. From the first case to the second, the AUC values on these three datasets increase by 0.00788, 0.0063, and 0.0061, respectively, which shows that SSJA effectively improves the utilization of spatial information and enables the deep network to achieve better detection results within the same learning time. From the second case to the third, they further increase by 0.03568, 0.0706, and 0.04405, respectively, and from the third case to the fourth by 0.0542, 0.03853, and 0.01211. The overall AUC improvements from the first case to the fourth on these three datasets are thus approximately 0.098, 0.115, and 0.062, respectively.
The ablation experiments demonstrate that integrating $L_T$ and the LFACN significantly enhances the detection performance of the deep network. This substantiates the effectiveness of the triplet loss in widening the anomaly–background distance and shows that the LFACN comprehensively learns the real background distribution, further enhancing the purity of the reconstructed background. All the other datasets also show improvements of various degrees, which clearly demonstrates that each component of the reconstruction process contributes positively to HAD effectiveness.

3.6. Comparison of Inference Times

The inference times of the different detectors are shown in Table 3. Because we introduce a prior knowledge extraction stage based on dual clustering, coupled with the training of the deep network, the whole process takes about 5 min; in Table 3, we therefore compare only the post-training inference time of HAD, in seconds. As Table 3 shows, the FCAE-DCAC does not provide the fastest inference, but on average it is faster than all the other methods except Auto-AD. Once the deep network has been trained, it is highly practical and retains a strong background reconstruction capability for the corresponding dataset. Compared with Auto-AD, our deep network is more complex and the prior knowledge extraction is time-consuming; we thus improve detection accuracy at the expense of preparation time. Lightweight HAD algorithms will be the focus of our future research.
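For reproducibility, inference times of the kind reported in Table 3 can be measured with a small routine such as the sketch below; `model` and `hsi_tensor` are placeholders, and the explicit CUDA synchronization matters because GPU kernels execute asynchronously.

```python
# Hedged sketch: average post-training inference time in seconds.
import time
import torch

@torch.no_grad()
def time_inference(model: torch.nn.Module, hsi_tensor: torch.Tensor,
                   repeats: int = 10) -> float:
    model.eval()
    model(hsi_tensor)                 # warm-up pass, excluded from timing
    if torch.cuda.is_available():
        torch.cuda.synchronize()      # flush pending GPU work before timing
    start = time.perf_counter()
    for _ in range(repeats):
        model(hsi_tensor)
    if torch.cuda.is_available():
        torch.cuda.synchronize()      # wait for all timed kernels to finish
    return (time.perf_counter() - start) / repeats
```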

4. Conclusions

In this article, we proposed a novel fully convolutional auto-encoder for hyperspectral anomaly detection based on dual clustering and a latent feature adversarial consistency network (FCAE-DCAC). Specifically, we proposed a spatial–spectral joint attention mechanism to enhance the utilization of spatial information in the fully convolutional auto-encoder. We incorporated a dual clustering prior extraction module that accurately extracts prior knowledge to guide the learning of the deep network, and we introduced a triplet loss to increase the separation between background and anomalies. Furthermore, we equipped the model with a latent feature adversarial consistency network that learns the true distribution of the background samples and strengthens the consistency constraint for improved learning guidance, which enables the deep network to reconstruct a pure background; since the anomalies in the HSI are reconstructed incompletely, they ultimately exhibit a significantly increased reconstruction error. The experiments conducted on seven datasets demonstrate that the FCAE-DCAC method achieves superior and comprehensive detection performance across various scenarios, and it particularly excels in scenes with dense anomaly targets and prominent background land covers that are prone to misjudgment. The detection results show that the proposed FCAE-DCAC method outperforms the compared state-of-the-art hyperspectral anomaly detection methods, and the effectiveness experiments further validate its reliability and feasibility.

Author Contributions

Conceptualization, Z.Y.; methodology, R.Z.; investigation, F.S.; supervision, X.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, 42301376; the National Natural Science Foundation of China, 42171326; and the Ningbo Natural Science Foundation, 2022J076.

Data Availability Statement

No new data were created.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

$X \in \mathbb{R}^{H \times W \times B}$ — the original HSI.
$\tilde{X} \in \mathbb{R}^{H \times W \times B}$ — the reconstructed HSI.
$X_A \in \mathbb{R}^{H \times W \times B}$ — the coarsely classified anomaly samples (i.e., the prior anomaly samples).
$X_B \in \mathbb{R}^{H \times W \times B}$ — the coarsely classified background samples (i.e., the prior background samples).
$X_M \in \mathbb{R}^{H \times W \times B}$ — the input training samples.
$\odot$ — the multiplication of corresponding elements (Hadamard product).
$P_1 = \{p_i^1\}_{i=1}^{H \times W} \in \mathbb{R}^{H \times W}$ — the binary classification map.
$P_2 = \{p_i^2\}_{i=1}^{H \times W} \in \mathbb{R}^{H \times W}$ — the coarse labels.
$M_2 = \{m_i^2\}_{i=1}^{H \times W} \in \mathbb{R}^{H \times W}$ — the connected-components labeling map.
$M_1 = \{m_i^1\}_{i=1}^{H \times W} \in \mathbb{R}^{H \times W}$ — the category label graph.
$S = \{s_i\}_{i=1}^{H \times W} \in \mathbb{R}^{H \times W}$ — the mask map.
$\bar{S} = \{\bar{s}_i\}_{i=1}^{H \times W} \in \mathbb{R}^{H \times W}$ — the inverse mask map of $S$.
$G = \{G_{i,j}\}_{i=1,j=1}^{H,W} \in \mathbb{R}^{H \times W}$ — the final detection map.

Figure 1. Flowchart of the proposed FCAE-DCAC method.
Figure 2. Flowchart of the dual clustering method.
Figure 3. The detailed deep network architecture of the FCAE for hyperspectral anomaly detection: (a) fully convolutional auto-encoder; (b) EResConvBlock; (c) DEResConvBlock; (d) spatial attention; and (e) spectral attention.
Figure 4. The detailed deep network architecture of the LFACN: (a) latent feature adversarial consistency network; (b) latent discriminator $D_Z$.
Figure 5. Pseudocolor images and ground-truth maps of the seven experimental hyperspectral datasets: (a) San Diego airport; (b) San Diego-1; (c) San Diego-2; (d) San Diego-3; (e) HYDICE; (f) Pavia; (g) LA-1; and (h) Gulfport.
Figure 6. Heat maps obtained by using different algorithms on the San Diego-1 image: (a) ground truth; (b) GRX; (c) LRX; (d) FRFE; (e) CRD; (f) AED; (g) LRASR; (h) GAED; (i) RGAE; (j) Auto-AD; and (k) ours.
Figure 7. Heat maps obtained by using different algorithms on the San Diego-2 image: (a) ground truth; (b) GRX; (c) LRX; (d) FRFE; (e) CRD; (f) AED; (g) LRASR; (h) GAED; (i) RGAE; (j) Auto-AD; and (k) ours.
Figure 8. Heat maps obtained by using different algorithms on the San Diego-3 image: (a) ground truth; (b) GRX; (c) LRX; (d) FRFE; (e) CRD; (f) AED; (g) LRASR; (h) GAED; (i) RGAE; (j) Auto-AD; and (k) ours.
Figure 9. Heat maps obtained by using different algorithms on the HYDICE dataset image: (a) ground truth; (b) GRX; (c) LRX; (d) FRFE; (e) CRD; (f) AED; (g) LRASR; (h) GAED; (i) RGAE; (j) Auto-AD; and (k) ours.
Figure 10. Heat maps obtained by using different algorithms on the Pavia dataset image: (a) ground truth; (b) GRX; (c) LRX; (d) FRFE; (e) CRD; (f) AED; (g) LRASR; (h) GAED; (i) RGAE; (j) Auto-AD; and (k) ours.
Figure 11. Heat maps obtained by using different algorithms on the LA-1 dataset image: (a) ground truth; (b) GRX; (c) LRX; (d) FRFE; (e) CRD; (f) AED; (g) LRASR; (h) GAED; (i) RGAE; (j) Auto-AD; and (k) ours.
Figure 12. Heat maps obtained by using different algorithms on the Gulfport dataset image: (a) ground truth; (b) GRX; (c) LRX; (d) FRFE; (e) CRD; (f) AED; (g) LRASR; (h) GAED; (i) RGAE; (j) Auto-AD; and (k) ours.
Figure 13. ROC curves for different detectors on the seven considered datasets: (a) San Diego-1; (b) San Diego-2; (c) San Diego-3; (d) HYDICE; (e) Pavia; (f) LA-1; and (g) Gulfport.
Figure 14. Separability boxplots for different detectors on the seven considered datasets: (a) San Diego-1; (b) San Diego-2; (c) San Diego-3; (d) HYDICE; (e) Pavia; (f) LA-1; and (g) Gulfport.
Figure 15. Analysis of the parameters with all the experimental datasets: (a) clustering radius Eps; (b) MinPts; and (c) filtering threshold $D$.
Figure 16. The effects of the parameters $(\alpha, \beta)$ on the AUC scores of $(P_d, P_f)$ for each dataset.
Table 1. The AUC $(P_d, P_f)$ values of the 10 considered detectors on different datasets.

| Dataset | GRX | LRX | FRFE | CRD | AED | LRASR | GAED | RGAE | Auto-AD | Ours |
|---|---|---|---|---|---|---|---|---|---|---|
| San Diego-1 | 0.8736 | 0.8570 | 0.9787 | 0.9768 | 0.9900 ² | 0.9824 | 0.9861 | 0.9854 | 0.9895 | 0.9994 ¹ |
| San Diego-2 | 0.7499 | 0.7211 | 0.7821 | 0.9290 | 0.9399 | 0.8065 | 0.8905 | 0.8819 | 0.9466 ² | 0.9773 ¹ |
| San Diego-3 | 0.7125 | 0.7540 | 0.7694 | 0.9485 ² | 0.9659 | 0.7214 | 0.7811 | 0.8341 | 0.9163 | 0.9815 ¹ |
| HYDICE | 0.9857 | 0.9911 | 0.9933 | 0.9976 | 0.9951 ² | 0.9744 | 0.9843 | 0.9646 | 0.9951 ² | 0.9980 ¹ |
| Pavia | 0.9538 | 0.9525 | 0.9457 | 0.9510 | 0.9793 | 0.9380 | 0.9398 | 0.9688 | 0.9914 ² | 0.9979 ¹ |
| LA-1 | 0.9692 | 0.9492 | 0.9655 | 0.9229 | 0.9780 ² | 0.9440 | 0.9424 | 0.9569 | 0.9406 | 0.9808 ¹ |
| Gulfport | 0.9526 | 0.9532 | 0.9722 | 0.9342 | 0.9953 | 0.9120 | 0.9705 | 0.9842 | 0.9968 ² | 0.9975 ¹ |
| Average | 0.8853 | 0.8826 | 0.9153 | 0.9514 | 0.9777 ² | 0.8970 | 0.9278 | 0.9394 | 0.9680 | 0.9903 ¹ |

¹ The optimal AUC value for each dataset (bold red in the original publication). ² The suboptimal AUC value for each dataset (bold blue in the original publication).
Table 2. The AUC $(P_d, P_f)$ values of the ablation study on different datasets.

| Component | San Diego-1 | San Diego-2 | San Diego-3 | HYDICE | Pavia | LA-1 | Gulfport |
|---|---|---|---|---|---|---|---|
| FCAE without SSJA | 0.9732 | 0.8785 | 0.8567 | 0.9887 | 0.9600 | 0.9168 | 0.9679 |
| FCAE | 0.9786 | 0.8864 | 0.8630 | 0.9920 | 0.9686 | 0.9229 | 0.9763 |
| FCAE + $L_T$ | 0.9975 ² | 0.9221 ² | 0.9336 ² | 0.9961 ² | 0.9881 ² | 0.9669 ² | 0.9822 ² |
| FCAE + $L_T$ + LFACN | 0.9996 ¹ | 0.9763 ¹ | 0.9722 ¹ | 0.9979 ¹ | 0.9932 ¹ | 0.9791 ¹ | 0.9957 ¹ |

¹ The optimal AUC value for each dataset (bold red in the original publication). ² The suboptimal AUC value for each dataset (bold blue in the original publication).
Table 3. The inference time (in seconds) of different detectors.

| Dataset | GRX | LRX | FRFE | CRD | AED | LRASR | GAED | RGAE | Auto-AD | Ours |
|---|---|---|---|---|---|---|---|---|---|---|
| San Diego-1 | 0.2146 | 9.1735 | 9.8865 | 3.9145 | 0.2107 | 46.3339 | 0.0305 ² | 0.0335 | 0.0210 ¹ | 0.0390 |
| San Diego-2 | 0.3168 | 25.6535 | 14.9289 | 5.0192 | 0.2456 | 57.3001 | 0.0394 | 0.0570 | 0.0275 ² | 0.0185 ¹ |
| San Diego-3 | 0.0998 | 18.2074 | 5.4494 | 2.1902 | 0.1884 | 19.2353 | 0.0150 ¹ | 0.0157 ² | 0.0160 | 0.0210 |
| HYDICE | 0.2146 | 9.1735 | 9.8865 | 3.9145 | 0.2107 | 46.3339 | 0.0305 | 0.0335 | 0.0210 ¹ | 0.0235 ² |
| Pavia | 0.9823 | 16.8106 | 33.5833 | 5.3146 | 0.3625 | 61.1938 | 0.1072 | 0.0476 | 0.0305 ¹ | 0.0355 ² |
| LA-1 | 0.3173 | 14.2751 | 21.3859 | 5.3762 | 0.2242 | 72.0277 | 0.0345 | 0.0459 | 0.0240 ¹ | 0.0330 ² |
| Gulfport | 0.5988 | 13.7447 | 13.6096 | 5.0287 | 0.2652 | 63.4349 | 0.0620 | 0.0373 | 0.0220 ¹ | 0.0275 ² |
| Average | 0.3771 | 14.6755 | 14.9511 | 4.1165 | 0.2563 | 48.4835 | 0.0436 | 0.0366 | 0.0229 ¹ | 0.0283 ² |

¹ The fastest inference time for each dataset (bold red in the original publication). ² The second-fastest inference time for each dataset (bold blue in the original publication).