# An Optimization Technique for Linear Manifold Learning-Based Dimensionality Reduction: Evaluations on Hyperspectral Images


## Abstract


## 1. Introduction

## 2. Linear Projections for Manifold Learning

#### 2.1. LPP and NPE

#### 2.2. OLPP and ONPE

## 3. The Proposed Method

First, the within-class and between-class weight matrices, W_w and W_b, are constructed from the adjacency graphs. At these steps, the heat kernel function is constructed with a preselected constant heat kernel parameter, t. The next step is the optimization of two objective functions, i.e., maximizing $\sum_{i,j}(y_i - y_j)^2 W_{b,ij}$ and minimizing $\sum_{i,j}(y_i - y_j)^2 W_{w,ij}$. Finally, these are combined to give a generalized eigenvalue solution.
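As an illustration of how such an objective reduces to a generalized eigenvalue problem, the following minimal sketch solves the standard LPP-style minimization (this is not the authors' code; the ridge term and all variable names are assumptions for the sake of a runnable example):

```python
import numpy as np
from scipy.linalg import eigh

def lpp_projection(X, W, d):
    """Sketch of the LPP-style generalized eigenvalue step.

    X : (n_features, m) data matrix; W : (m, m) symmetric affinity
    matrix; d : target dimensionality. Minimizing
    sum_ij (y_i - y_j)^2 W_ij under y = a^T x leads to
    X L X^T a = lambda X D X^T a with L = D - W.
    """
    D = np.diag(W.sum(axis=1))            # degree matrix
    L = D - W                             # graph Laplacian
    A = X @ L @ X.T
    B = X @ D @ X.T
    # small ridge keeps B positive definite (added here for stability)
    vals, vecs = eigh(A, B + 1e-9 * np.eye(B.shape[0]))
    return vecs[:, :d]                    # directions of smallest eigenvalues

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 30))                       # 5 bands, 30 samples
dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
W = np.exp(-dist**2 / (2 * 1.0**2))                # heat kernel, t = 1
P = lpp_projection(X, W, d=2)
Y = P.T @ X                                        # 2-D embedding
print(Y.shape)                                     # (2, 30)
```

The orthogonal variants (OLPP/ONPE) differ only in how the successive eigenvectors are constrained, not in the objective itself.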

Let µ_c be the coefficient that keeps the specific value calculated for every class, with c in 1, 2, …, n, where n is the total number of classes, and let D be the m × m distance matrix, where m is the number of training samples. Then, µ_c is found by applying Equations (8) and (9) sequentially, based on the information in D, as follows:

Equation (8) yields avg_i, which corresponds to the i-th row. In the equation, k_i keeps the total number of intra-class samples for every row, and I_{ij} keeps the intra-class sample indices for the i-th row; the maximum value of j depends on the intra-class sample count of the i-th sample, k_i. The average of the distances between the i-th sample and its intra-class samples is denoted by avg_i.

After avg_i is calculated for every row of D, the values that belong to the same classes in S are grouped by holding the indices of the S elements. This information is kept in ${s}_{{I}_{grouped,c}}$ and allows the average of each grouped class to be calculated, as given by Equation (9). The indices of the S elements belonging to the same class are represented by ${I}_{grouped,c}$, where c is the class id. The total sample number of every class is denoted by t_c; the maximum value of j depends on the number of samples belonging to the specific class, t_c. Equation (9) then yields the class coefficient µ_c. Due to the symmetric structure of D, the explanations given for these notations are also valid from the column-wise perspective.
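The row-wise intra-class averaging and class-wise grouping described above can be sketched as follows. This is one plausible reading of Equations (8) and (9), which are not reproduced here; the function and variable names are mine:

```python
import numpy as np

def class_coefficients(D, labels):
    """Compute a per-class coefficient mu_c from a distance matrix.

    For each row i, average the distances to the other samples of the
    same class (avg_i); then average these row statistics per class.
    D : (m, m) symmetric distance matrix; labels : (m,) class ids.
    """
    m = D.shape[0]
    avg = np.empty(m)
    for i in range(m):
        same = labels == labels[i]
        same[i] = False                 # exclude the sample itself
        avg[i] = D[i, same].mean() if same.any() else 0.0
    return {c: avg[labels == c].mean() for c in np.unique(labels)}

# toy example: the tighter class gets the smaller coefficient
pts = np.array([[0.0], [0.1], [0.2], [5.0], [6.0], [7.0]])
D = np.abs(pts - pts.T)
labels = np.array([0, 0, 0, 1, 1, 1])
mu = class_coefficients(D, labels)
print(mu[0] < mu[1])  # True: class 0 scatters less
```

The resulting µ_c is small for compact classes and large for scattered ones, which is the property the weighting scheme below exploits.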

So far, the calculation of the class-exclusive coefficient µ_c has been explained. This parameter is exclusive to each class of training samples and is used in the formulation to find W. The main steps of the application of the proposed method to LPP and OLPP are described as follows:

- The adjacency graph is constructed.
- Equation (10) is applied to the samples among the k-nearest neighbors: new weights are calculated for them. The class-exclusive parameter, µ_c, is used when two samples belong to the same class, i.e., when c(x_i) = c(x_j) holds. The parameter p is a constant determined manually to scale W. If the classes of the two samples are not the same, the heat kernel function is applied directly.
- The samples that are not connected, i.e., not among the k-nearest neighbors, are marked with 0 in W (or this must be guaranteed in Step 1).
- The intra-class samples that satisfy ${W}_{ij}=0$ are determined and their weights are updated to the new values given by the first part of Equation (10); correspondingly, the maximum-weight sample in the row of W that is among the k-nearest neighbors but not in the same class is marked as 0, i.e., its value is replaced with 0.
- The generalized eigenvector solution and the projection are implemented as in LPP or OLPP.

**Algorithm 1.** Proposed method.

Given: D_{m×m}, the distance matrix; K_{m×m}, the neighborhood information matrix (1, neighbor; 0, not); L_{m×m}, the non-intra-class information matrix (1, non-intra; 0, not); Y_{1×m}, the class labels; and W_{m×m}, J_{m×m}, N_{c×m}, S_{1×m}, and µ_{c×1}, zero matrices needed during the implementation. Here, c is the total number of classes (also used for class enumeration), m is the total number of training samples, u is an integer and p is a float (used in Equation (10)); both are auxiliary variables required by the algorithm.

    for i = 1 to m do
        for j = 1 to m do
            if Y(1,i) == Y(1,j) then J(i,j) = D(i,j)
        end
    end
    for i = 1 to m do
        S(1,i) ← standard deviation of row i of J (excluding zeros)
    end
    for i = 1 to c do
        for j = 1 to m do
            if Y(1,j) == i then N(i,j) = S(1,j)
        end
        µ(i,1) ← average of row i of N (excluding zeros)
    end
    for i = 1 to m do
        for j = 1 to m do
            if Y(1,i) == Y(1,j) and K(i,j) == 1 then
                W(i,j) = exp(−D(i,j)² / (2·s²) · µ(Y(1,j),1) · p)
            else if Y(1,i) ≠ Y(1,j) and K(i,j) == 1 then
                W(i,j) = exp(−D(i,j)² / (2·s²))
            end
        end
    end
    for i = 1 to m do
        for j = 1 to m do
            if Y(1,i) == Y(1,j) and K(i,j) == 0 then
                u ← index of the maximum of W(i,:) restricted to L(i,:) == 1
                W(i,u) = 0; L(i,u) = 0
                W(i,j) = exp(−D(i,j)² / (2·s²) · µ(Y(1,j),1) · p)
            end
        end
    end
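A compact sketch of the weight construction in Algorithm 1 might look as follows. The grouping of µ_c and p inside the heat kernel exponent is an assumption, since the flattened pseudocode is ambiguous on this point, and all names are illustrative:

```python
import numpy as np

def build_weights(D, K, labels, mu, s, p):
    """Sketch of the weight construction in Algorithm 1.

    D : (m, m) distance matrix; K : (m, m) 0/1 k-NN indicator;
    labels : (m,) class ids; mu : dict mapping class id -> mu_c;
    s : heat kernel parameter; p : manual scaling constant.
    """
    m = D.shape[0]
    W = np.zeros((m, m))
    intra = labels[:, None] == labels[None, :]
    # non-intra-class indicator, consumed as entries get zeroed out
    L = (~intra).astype(float)
    # Pass 1: heat-kernel weights for connected pairs; intra-class
    # pairs get the exponent scaled by mu_c * p (grouping assumed).
    for i in range(m):
        for j in range(m):
            if K[i, j]:
                e = D[i, j] ** 2 / (2 * s ** 2)
                W[i, j] = np.exp(-e * mu[labels[j]] * p) if intra[i, j] else np.exp(-e)
    # Pass 2: intra-class pairs that are not k-NN connected receive a
    # weight, and the strongest remaining non-intra neighbor is zeroed.
    for i in range(m):
        for j in range(m):
            if intra[i, j] and i != j and not K[i, j]:
                cand = W[i] * L[i]
                if cand.max() > 0:
                    u = int(cand.argmax())
                    W[i, u] = 0.0
                    L[i, u] = 0.0
                e = D[i, j] ** 2 / (2 * s ** 2)
                W[i, j] = np.exp(-e * mu[labels[j]] * p)
    return W

rng = np.random.default_rng(1)
pts = rng.normal(size=(6, 2))
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
labels = np.array([0, 0, 0, 1, 1, 1])
K = np.zeros((6, 6), dtype=int)
for i, order in enumerate(D.argsort(axis=1)):
    K[i, order[1:3]] = 1          # two nearest neighbors, self excluded
W = build_weights(D, K, labels, mu={0: 0.5, 1: 0.8}, s=1.0, p=0.1)
print(W.shape)                    # (6, 6)
```

The resulting W then replaces the plain heat-kernel weight matrix in the usual LPP/OLPP eigenvalue step.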

The weight construction keeps the same asymptotic cost as the baseline algorithms (O(m^{2}) over the training samples), but extra memory is required to keep the 11 different variables used to initialize the algorithm.

## 4. Experiments

#### 4.1. Datasets

#### 4.2. Experimental Settings

The classifier's penalty parameter is searched in the interval [2^{−4}, 2^{8}] and the kernel width parameter, γ, in [2^{−4}, 2^{4}]. A grid search is applied over the indicated closed intervals, and cross-validation is used to select the parameters.
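As a sketch of this setting, assuming an RBF-kernel SVM classifier with log2-spaced grids (both the classifier type and the step size are assumptions not stated in the excerpt), the search could be set up as:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# log2-spaced grids over the closed intervals given in the text
param_grid = {
    "C": list(2.0 ** np.arange(-4, 9)),      # [2^-4, 2^8]
    "gamma": list(2.0 ** np.arange(-4, 5)),  # [2^-4, 2^4]
}

# toy two-class data standing in for the embedded HSI features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 5)), rng.normal(3.0, 1.0, (40, 5))])
y = np.repeat([0, 1], 40)

search = GridSearchCV(SVC(kernel="rbf"), param_grid,
                      cv=StratifiedKFold(n_splits=5))
search.fit(X, y)
print(sorted(search.best_params_))           # ['C', 'gamma']
```

In the experiments, the same search would be run on the embedded training features rather than on synthetic data.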

The class-based coefficients, µ_c, proposed in this study are reported visually. The other analytical elements are the evaluation of the band information, weight, and correlation matrices together with their significance. The analytical experiments are carried out on randomly chosen data using the sample rates shown in Table 1. The parameters remain the same as previously determined in the classification phase.

#### 4.3. Experimental Results

The µ_c values and enhancement rates for every dataset are reported in Figure 3. According to this figure, only two of the 29 classes show a decrease in accuracy. Furthermore, a correlation pattern can be noted in Figure 3a. Another remarkable point is the classes that appear to coincide with zero µ_c in Figure 3b. In fact, these values are not zero but are very low compared with those of the other classes. This occurs because these classes have very similar band information, which causes low scatter values.

## 5. Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References


**Figure 1.** The flowchart used for classification. (The red rectangle contains the process on a selected training/test dataset for feature learning. This part is repeated five times with randomly selected training/test datasets; the classification results are kept and then averaged to find the optimal k and s. Cross-validation is applied on the training embeddings to determine the hyperparameters. Then the test data are evaluated; the test data are completely unseen and used only for prediction.)

**Figure 2.** Visualizations of subspaces obtained on the "3d_clusters" toy dataset: (**a**) original data; (**b**) LPP; (**c**) Proposed + LPP. (x, y, and z represent the features; LPP: Locality Preserving Projection.)

**Figure 3.** Enhancement vs. µ_c graphs for the best-performing algorithms reported in Table 2: (**a**) Indian P.; (**b**) KSC. (Enhancement: class-based classification accuracy difference in percent between the proposed method + existing manifold learning algorithm and the existing algorithm alone; coeff: µ_c, the class-based coefficients proposed in this study. Every circle represents a class. The figure reports the class-based accuracy enhancements and the µ_c coefficients calculated for every class. The coefficient µ_c is related to the scattering level of the intra-class samples and is specific to every class. The class-based success of the proposed method can thus be evaluated for every dataset and compared with the proposed coefficients.)

**Figure 4.** Weight matrices obtained for the HSI datasets: (**a**) LPP for Indian Pines; (**b**) Proposed + LPP for Indian Pines; (**c**) OLPP for KSC; (**d**) Proposed + OLPP for KSC. (LPP: Locality Preserving Projection; OLPP: Orthogonal Locality Preserving Projection.)

**Figure 5.** Information of Bands 1 and 2 for "difficult classes": (**a**) LPP for Indian Pines; (**b**) Proposed + LPP for Indian Pines; (**c**) OLPP for KSC; (**d**) Proposed + OLPP for KSC. Information of Bands 11 and 12 for empirically chosen classes: (**e**) LPP for Indian Pines; (**f**) Proposed + LPP for Indian Pines; (**g**) OLPP for KSC; (**h**) Proposed + OLPP for KSC. (LPP: Locality Preserving Projection; OLPP: Orthogonal Locality Preserving Projection.)

**Figure 6.** Correlation matrices obtained on the face and HSI datasets: (**a**) LPP for Indian Pines; (**b**) Proposed + LPP for Indian Pines; (**c**) OLPP for KSC; (**d**) Proposed + OLPP for KSC. Significance matrices (p-values) for the correlations: (**e**) LPP for Indian Pines; (**f**) Proposed + LPP for Indian Pines; (**g**) OLPP for KSC; (**h**) Proposed + OLPP for KSC. (LPP: Locality Preserving Projection; OLPP: Orthogonal Locality Preserving Projection.)

**Table 1.** Numbers of total, training, and test samples for the datasets.

| Dataset | Indian P. | KSC |
|---|---|---|
| Total | 10,249 | 5211 |
| Training | 1031 | 528 |
| Test | 9218 | 4683 |

**Table 2.** Classification performance of linear manifold learning methods on the HSI datasets. (OA: overall classification accuracy in percent; std: standard deviation; k: neighborhood size; s: heat kernel parameter. The best-performing algorithm is bolded for every dataset.)

| Dataset | Linear Manifold Learning Method | OA (%) ± std | Parameters |
|---|---|---|---|
| Indian Pines | LPP | 69.2 ± 1.25 | k = 50, s = 1600 |
| | NPE | 73.2 ± 0.25 | k = 70 |
| | OLPP | 76.6 ± 0.72 | k = 90, s = 800 |
| | ONPE | 74.7 ± 0.67 | k = 40 |
| | **Proposed + LPP** | **78.2 ± 0.77** | k = 110, s = 400, p = 10^{−4} |
| | Proposed + OLPP | 75.2 ± 1.25 | k = 90, s = 1000, p = 10^{−4} |
| KSC | LPP | 82.9 ± 2.05 | k = 2, s = 19,500 |
| | NPE | 90.0 ± 0.82 | k = 32 |
| | OLPP | 87.4 ± 2.10 | k = 18, s = 36,000 |
| | ONPE | 87.0 ± 1.00 | k = 60 |
| | Proposed + LPP | 85.0 ± 0.87 | k = 50, s = 19,500, p = 10^{−6} |
| | **Proposed + OLPP** | **91.4 ± 0.83** | k = 20, s = 20,000, p = 10^{−6} |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Öztürk, Ü.; Yılmaz, A.
An Optimization Technique for Linear Manifold Learning-Based Dimensionality Reduction: Evaluations on Hyperspectral Images. *Appl. Sci.* **2021**, *11*, 9063.
https://doi.org/10.3390/app11199063

**AMA Style**

Öztürk Ü, Yılmaz A.
An Optimization Technique for Linear Manifold Learning-Based Dimensionality Reduction: Evaluations on Hyperspectral Images. *Applied Sciences*. 2021; 11(19):9063.
https://doi.org/10.3390/app11199063

**Chicago/Turabian Style**

Öztürk, Ümit, and Atınç Yılmaz.
2021. "An Optimization Technique for Linear Manifold Learning-Based Dimensionality Reduction: Evaluations on Hyperspectral Images" *Applied Sciences* 11, no. 19: 9063.
https://doi.org/10.3390/app11199063