Article

A Classification Method for Airborne Full-Waveform LiDAR Systems Based on a Gramian Angular Field and Convolution Neural Networks

1 School of Microelectronics, Tianjin University, Tianjin 300072, China
2 Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, Tianjin 300072, China
3 Technical College for the Deaf, Tianjin University of Technology, Tianjin 300382, China
* Authors to whom correspondence should be addressed.
Electronics 2022, 11(24), 4114; https://doi.org/10.3390/electronics11244114
Submission received: 8 November 2022 / Revised: 30 November 2022 / Accepted: 8 December 2022 / Published: 9 December 2022
(This article belongs to the Section Computer Science & Engineering)

Abstract

The data processing of airborne full-waveform light detection and ranging (LiDAR) systems has become a research hotspot in the LiDAR field in recent years. However, the accuracy and reliability of full-waveform classification remain a challenge. The manual features and deep learning techniques used in existing methods cannot fully exploit the temporal features and spatial information in the full waveform. While preserving temporal dependencies, we convert full waveforms into Gramian angular summation field (GASF) images using the polar coordinate method. By introducing spatial attention modules into the neural network, we emphasize the importance of the location of texture information in the GASF images. Finally, we use open-source and simulated data to evaluate the impact of different network architectures and transformation methods. Compared with state-of-the-art methods, the proposed method achieves higher precision and F1 scores. The results suggest that transforming the full waveform into GASF images and introducing a spatial attention module outperforms other classification methods.


1. Introduction

An airborne platform has the advantages of flexibility, high accuracy, and low cost; thus, an airborne LiDAR system can be used for rapid laser scanning of large areas. The distance between the target and the LiDAR is calculated from the time difference between the transmitted and received pulses, which in turn yields the three-dimensional coordinates of the target [1]. Airborne full-waveform LiDAR can record target backscatter echo waveforms with very small sampling intervals. As shown in Figure 1, additional information about the target is reflected in echo morphological characteristics such as pulse width, the number of echoes, echo shape, and reflected power. Therefore, a combination of artificial features and machine learning can be used for land cover classification [2] and ocean engineering [3,4,5]. However, such algorithms depend heavily on large numbers of manual features and annotated data [6], so their universality is limited.
With the development of deep learning, the full waveform can be fed directly into a neural network as a one-dimensional signal [7]. End-to-end classifier training based on deep learning can automatically extract features from signals [8]. Although complex propagation processes increase the difficulty of classification, deep learning can still achieve good results [3]. However, the accuracy and stability of echo classification are reduced by complex terrain and random noise.
To overcome the above problems, we propose a framework for full-waveform data classification. First, we add random Gaussian noise and abrupt noise to the original data for data augmentation. Then, we transform the full waveform into an image using the Gramian angular summation field (GASF) method [9]. Finally, a convolutional neural network with a Spatial Attention Module (SAM) is used to classify the generated images. The main contributions of this paper are as follows:
  • Polar coordinates are used to preserve the time dependence of the LiDAR waveform by transforming it into a GASF image. Therefore, image classification algorithms can be used to improve the performance of full-waveform classification.
  • A spatial attention mechanism is introduced to emphasize the importance of the location of texture information in the GASF images.
  • A general architecture for full-waveform classifications of LiDAR data is proposed for the first time. Urban and bathymetric data are used to verify that the architecture can be applied to different types of full-waveform classifications.
The rest of this paper is arranged as follows. Section 2 describes previous relevant works. Section 3 analyzes the structure of full-waveform data and introduces the experimental dataset. Section 4 introduces the proposed method. Section 5 compares the performance of the proposed method with other state-of-the-art algorithms. Section 6 summarizes the conclusions.

2. Related Works

Full-waveform classification methods are mainly divided into two categories. One is based on manual features followed by classical machine learning and the other is based on deep learning with learned features.
Full-waveform morphological features are used for island feature classification because the shape of the main echo and backscatter can reflect the surface and geometry characteristics [10]. As the number of features increases, principal component analysis (PCA) is also used to reduce the dimensionality of the model [4,11]. Before extracting manual features, it is necessary to remove noise with dynamic thresholds or filtering algorithms. Close distances between multiple targets cause echo overlap, so deconvolution and multi-Gaussian fitting need to be applied before feature extraction [12,13]. After the decomposition of echoes into multiple Gaussian components, manual features combined with support vector machines (SVM) [6] or random forests (RF) [14,15] can be used for classification. However, the accuracy mainly depends on the quality of the manual features, which requires specialized knowledge. By combining airborne lidar bathymetry (ALB) bottom waveforms and bathymetric characteristics, the accuracy of coral reef detection can be greatly improved [16]. ALB bottom returns can also provide information about seafloor features, but environmental and ALB hardware uncertainties result in poor classification accuracy [17].
A 1D CNN can obtain a compact representation of LiDAR waveforms by regarding them as one-dimensional time series [7]. Deep learning classification methods are widely used in mechanical anomaly detection and seismic detection [18]. Because the structural characteristics of LiDAR full-waveform echoes differ from ECG and seismic signals, those architectures cannot be directly applied to LiDAR echo classification [19,20]. The processing of full waveforms is affected by the SNR and sampling rate, so denoising and super-resolution methods based on encoder–decoder architectures have also been proposed [21,22]. Although the above end-to-end methods can reduce the dependence on manual features, their ability to extract time-sequence dependencies needs to be improved [5,23]. Neural networks based on LSTM have long-term memory and can extract the dependent features of time series, but are insufficient for extracting spatial features [8]. To overcome these problems, time series can be transformed into higher dimensions. The Gramian angular summation field (GASF), Gramian angular difference field (GADF), and Markov transition field (MTF) methods can transform a signal into a time-series image, after which deep learning techniques from computer vision can extract high-dimensional features from the transformed images [24,25].

3. Materials

3.1. Analysis of the Signal Model

The urban full-waveform echo can be considered the superposition of scattering from multiple targets at different distances within a laser footprint, as in Equation (1). According to the reflected energy at different elevation points, it can not only reveal the vertical distribution of targets along the laser propagation path but also reflect the scattering characteristics of the ground objects. As shown in Figure 1, the laser pulse may encounter different targets on its propagation path, for example passing through leaves and forming multiple echoes. If the laser footprint on the ground covers multiple targets, such as vegetation and buildings, they are recorded in chronological order. When the footprint falls on an inclined roof, the echo width is stretched to a certain extent. Let the ideal waveform model be a Gaussian mixture model, expressed as the superposition of k Gaussian components:
f(t) = \sum_{i=1}^{k} A_i \exp\!\left( -\frac{(t - \mu_i)^2}{F_i^2 / (4 \ln 2)} \right) + \delta    (1)
where A_i, μ_i, and F_i represent the pulse amplitude, center position, and half-width of the i-th Gaussian component, respectively, and δ denotes noise.
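As a concrete illustration of Equation (1), the following minimal Python sketch synthesizes a multi-echo waveform from a sum of Gaussian components plus noise; the echo parameters, record length, and noise level are arbitrary choices for illustration and are not taken from the datasets used in this paper.

```python
import numpy as np

def gaussian_mixture_waveform(t, amps, centers, fwhms, noise_std=0.01, rng=None):
    """Synthesize a full waveform as the sum of k Gaussian echoes plus noise (Eq. (1)).

    amps, centers, fwhms: per-echo amplitude A_i, center position mu_i, and
    half-width F_i; noise_std controls the noise term delta.
    """
    rng = np.random.default_rng() if rng is None else rng
    f = np.zeros_like(t, dtype=float)
    for A, mu, F in zip(amps, centers, fwhms):
        f += A * np.exp(-(t - mu) ** 2 / (F ** 2 / (4.0 * np.log(2.0))))
    return f + rng.normal(0.0, noise_std, size=t.shape)

# Example: a two-echo waveform (e.g., canopy followed by ground) over 160 samples.
t = np.arange(160, dtype=float)
waveform = gaussian_mixture_waveform(t, amps=[0.6, 1.0], centers=[60, 95], fwhms=[8, 5])
```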
As shown in Figure 2 and Equation (2), the full-waveform data of airborne bathymetry can be regarded as the superposition of sea-surface echo, backscatter, sea-bottom echo, and noise. The signals of the sea surface and seabed can be seen as the superposition of two different Gaussian echoes.
f_s(t) = f_{k=2}(t) + f_b(t)    (2)
Light scattering in seawater includes forward scattering and backward scattering. Before reaching the seafloor, the laser zigzags downward through multiple forward-scattering events, and the backscattered part directly produces one or more strong noise-like peaks in the echo waveform, forming the water backscattered echo. Due to the scattering of light by the water, the sea-bottom echo signal finally received by the LiDAR is delayed and broadened. This non-ideal effect is more obvious when the water is deeper. As shown in Equation (3), the backscattered component can be modeled as a quadrilateral:
f_b(t) = \begin{cases} 0, & t \le a \\ \dfrac{e (t - a)}{b - a}, & a < t \le b \\ \dfrac{e c - b g + t (g - e)}{c - b}, & b < t \le c \\ \dfrac{g (t - d)}{c - d}, & c < t \le d \\ 0, & t > d \end{cases}    (3)
where a, b, c, and d are the times corresponding to the four vertices A, B, C, and D of the quadrilateral, respectively, and e and g are the corresponding strengths of points B and C, respectively.
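For reference, a small sketch of the quadrilateral backscatter model in Equation (3) is given below; the vertex times and intensities chosen in the example are purely illustrative.

```python
import numpy as np

def quadrilateral_backscatter(t, a, b, c, d, e, g):
    """Piecewise-linear water-column backscatter model of Eq. (3).

    (a, 0), (b, e), (c, g), (d, 0) are the four vertices of the quadrilateral:
    a, b, c, d are times, while e and g are the intensities at vertices B and C.
    """
    t = np.asarray(t, dtype=float)
    fb = np.zeros_like(t)
    rise = (t > a) & (t <= b)        # segment A -> B
    middle = (t > b) & (t <= c)      # segment B -> C
    fall = (t > c) & (t <= d)        # segment C -> D
    fb[rise] = e * (t[rise] - a) / (b - a)
    fb[middle] = (e * c - b * g + t[middle] * (g - e)) / (c - b)
    fb[fall] = g * (t[fall] - d) / (c - d)
    return fb

# Example: backscatter component between samples 40 and 120.
t = np.arange(200, dtype=float)
fb = quadrilateral_backscatter(t, a=40, b=55, c=100, d=120, e=0.30, g=0.05)
```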

3.2. Datasets

This dataset was collected by the Riegl LMS-Q780 full-waveform airborne LiDAR system. The authors made joint annotations according to GPS, LiDAR, aircraft attitude, and camera data [26]. Due to the high working altitude of the aircraft, the LiDAR data are sparse. Therefore, the label of a full waveform is mainly based on the meaning of the image at the corresponding position. As shown in Figure 3, the echo signal received by the laser receiver is a one-dimensional time series, and the shape of the target surface has a great influence on the echo shape. For flat, bare ground, the echo waveform is almost unchanged compared with the transmitted waveform. For sloping objects, the width of the returned waveform is stretched to some extent. There are also differences in the peak position and amplitude of building echoes. For stacked vegetation or powerlines, the returned waveform is a curve with several overlapping peaks, which reflects the vertical height changes of different targets.
We used ocean echo data generated with a Monte Carlo model [3,27]. As shown in Figure 4, when the water is shallow, the influence of backscattering is not obvious and two discrete Gaussian echoes can be observed. When the water is very shallow, the sea-surface and sea-bottom echoes merge and the received signal appears as a single echo. In terms of echo energy, as the water depth increases, the echo energy decreases due to the enhanced scattering and absorption of seawater, and the water backscattering becomes obvious. We removed the highest Gaussian echoes in the deep-water and shallow-water echoes to generate the echoes with a missing sea surface.

4. Proposed Method

4.1. GASF Transformation

As shown in Figure 5, the Gramian angular summation field method can be used to encode time series as images, retaining the characteristics of a LiDAR waveform in a polar coordinate system. The method converts an echo into a matrix in two steps.
First, we convert the signal from the space (time × amplitude) to polar coordinates [28] using the following operations:
X = (x_1, x_2, \ldots, x_N), \qquad \phi_i = \arccos(X_{norm,i}),\ \phi_i \in [0, \pi], \qquad r_i = \frac{n_i}{L},\ r_i \in \mathbb{R}^{+}    (4)
Then, the Gramian matrix can be calculated in the new space, where L is the total length of the echo signal. This conversion is bijective and the absolute temporal relationship is preserved. Because Equation (4) is only defined for X ∈ [−1, 1], the data first need to be scaled. The normalized form of the input X is as follows:
X_{norm,i} = \frac{(x_i - UB) + (x_i - LB)}{UB - LB}    (5)
where UB refers to the upper-bound parameters and LB refers to the lower-bound parameters.
GASF = \begin{bmatrix} \cos(\phi_1 + \phi_1) & \cos(\phi_1 + \phi_2) & \cdots & \cos(\phi_1 + \phi_N) \\ \cos(\phi_2 + \phi_1) & \cos(\phi_2 + \phi_2) & \cdots & \cos(\phi_2 + \phi_N) \\ \vdots & \vdots & \ddots & \vdots \\ \cos(\phi_N + \phi_1) & \cos(\phi_N + \phi_2) & \cdots & \cos(\phi_N + \phi_N) \end{bmatrix}    (6)
The final Gramian matrix can be computed as in Equation (6), where each element in the GAF matrix is the cosine of the summation of angles that can preserve the relative correlation. Then, the LiDAR echoes can be transformed into heatmaps, whose values range from 0 to 1.
Each element represents the relative correlation within a certain interval, so the GASF image preserves the temporal dependency. To illustrate the advantage of transforming full waveforms into two-dimensional images, Figure 6 and Figure 7 show that the GASF images of different waveforms can be easily recognized: they exhibit different textures and different locations of the crossing lines. The red areas in the GASF images are meaningless for classification and represent Gaussian noise with a low signal-to-noise ratio.
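The transformation in Equations (4)–(6) can be expressed in a few lines of NumPy, as sketched below. The comment at the end points to the equivalent GramianAngularField class of the pyts library, which Section 5 reports was used to generate the dataset images; resizing to 64 × 64 pixels is handled there by the image_size argument.

```python
import numpy as np

def waveform_to_gasf(x):
    """Encode a 1D LiDAR waveform as a GASF image following Eqs. (4)-(6).

    The waveform is rescaled to [-1, 1], mapped to polar angles with arccos,
    and entry (i, j) of the image is cos(phi_i + phi_j).
    """
    x = np.asarray(x, dtype=float)
    ub, lb = x.max(), x.min()                       # upper and lower bounds (Eq. (5))
    x_norm = ((x - ub) + (x - lb)) / (ub - lb)      # values now lie in [-1, 1]
    phi = np.arccos(np.clip(x_norm, -1.0, 1.0))     # polar angles in [0, pi] (Eq. (4))
    return np.cos(phi[:, None] + phi[None, :])      # GASF matrix (Eq. (6))

# Equivalent encoding (with resizing to 64 x 64) using the pyts library:
# from pyts.image import GramianAngularField
# gasf = GramianAngularField(image_size=64, method='summation')
# images = gasf.fit_transform(waveforms)   # waveforms: (n_samples, n_timestamps)
```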

4.2. Data Augmentation

Due to many factors, such as terrain fluctuation, noise, and multiple diffuse reflections from objects, some overlapping echoes are easily missed, which reduces classification accuracy. To improve the generalization performance and robustness of the convolutional neural network, we refer to the contrast experiment on airborne echo denoising [29] and increase the diversity of the samples by adding abrupt noise and Gaussian white noise to the original data.
As shown in Figure 8, the echo after adding noise is converted into a GASF image. As the noise increases, the dark lines representing the target echo exhibit few changes. However, the red areas in the figure change significantly, and the noise signal also leads to additional intersections. Abrupt noise refers to pulse signals caused by random targets and random reflections. We use the pulse percentage to represent the proportion of impulse-noise points among the total number of sampling points. By adjusting the pulse percentage to 1, 2, 4, or 8 and increasing the energy of the Gaussian noise by 1.5, 2.5, 3.5, and 4.5 times, we expand the number and diversity of the samples.
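A minimal augmentation sketch is given below. The baseline noise level and the way the impulse spikes are drawn are assumptions for illustration, and the paper does not specify how the pulse percentages and Gaussian scaling factors are combined, so the loop simply pairs them one to one.

```python
import numpy as np

def augment_waveform(x, pulse_percent, gauss_scale, base_sigma=0.01, rng=None):
    """Add abrupt (impulse) noise and scaled Gaussian white noise to one waveform.

    pulse_percent: percentage of samples hit by random impulse spikes.
    gauss_scale:   multiplier on the assumed baseline Gaussian noise level.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = x + rng.normal(0.0, gauss_scale * base_sigma, size=x.shape)
    n_impulse = int(round(len(x) * pulse_percent / 100.0))
    idx = rng.choice(len(x), size=n_impulse, replace=False)
    y[idx] += rng.uniform(0.2, 1.0, size=n_impulse) * x.max()   # abrupt spikes
    return y

# 'waveform' is any 1D NumPy echo, e.g., the synthetic example from Section 3.1.
# One augmented copy per (pulse percentage, Gaussian scale) setting.
augmented = [augment_waveform(waveform, p, s)
             for p, s in zip((1, 2, 4, 8), (1.5, 2.5, 3.5, 4.5))]
```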

4.3. Spatial Attention Module

By introducing an attention mechanism, a certain weight is given to the effective echo when the background noise is strong. In most cases, the network structure focuses on extracting the characteristics of the main echo.
The original sequence is T = {T_1, T_2, …, T_N}; hence, a subsequence S_{i:j} = {T_i, T_{i+1}, …, T_j} can be defined. Extracting features from key subsequences is very important for full-waveform classification because the maximum number of effective echoes is related to the laser power and the flight altitude of the aircraft [6]. When the multi-target echo is accompanied by random noise and occlusions, key subsequence features extracted based on morphology cannot work effectively. As shown in Figure 9, we introduce a Spatial Attention Module (SAM) to optimize the feature weight allocation of the key parts of the GASF image.
M_s(F) = \sigma\!\left( f^{7 \times 7}\!\left( [\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)] \right) \right) = \sigma\!\left( f^{7 \times 7}\!\left( [F^{S}_{avg}; F^{S}_{max}] \right) \right)    (7)
where F is a feature map with dimensions H × W × C, σ represents the sigmoid activation function, and f^{7×7} indicates a convolution layer with a 7 × 7 kernel. Maximum pooling and average pooling along the channel dimension produce two H × W × 1 channel descriptions. After the two descriptions are concatenated along the channel dimension, the weight coefficient M_s is obtained through the convolution layer and the sigmoid activation layer. Finally, M_s is multiplied by the input feature F to obtain the rescaled new feature.
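A minimal sketch of the module described by Equation (7) is shown below. The paper does not state which deep learning framework was used, so PyTorch is assumed here, and the class name is illustrative.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial Attention Module of Eq. (7): channel-wise average and max pooling,
    a 7 x 7 convolution, and a sigmoid produce an H x W attention map."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                  # x: (B, C, H, W)
        avg_pool = torch.mean(x, dim=1, keepdim=True)      # F_avg: (B, 1, H, W)
        max_pool, _ = torch.max(x, dim=1, keepdim=True)    # F_max: (B, 1, H, W)
        m_s = self.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return x * m_s                                     # rescale the input features
```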

4.4. Convolution Neural Network

The proposed method is divided into three stages. First, we add Gaussian noise and impulse noise to the full waveforms offline to enhance their diversity. Then, we transform the waveforms into GASF images and input them into the neural network. Finally, we introduce a spatial attention mechanism to implement an end-to-end GASF image classification network.
The network architecture is shown in Figure 10. Our CNN model contains five 2D convolution layers and two SAM layers. The inputs of the model are 64 × 64 × 3 images. The numbers of convolution kernels are, in order, 16, 32, 32, 64, and 64. Batch normalization layers and rectified linear unit (ReLU) activation layers are used to reduce overfitting and guarantee the network's steady convergence. The convolution kernels are 3 × 3, and 2 × 2 max pooling is applied to reduce the size of the feature maps and ease the computational demands on the CNN. By introducing two spatial attention modules, spatio-temporal features that reveal the internal structure of the LiDAR echo are extracted from the high-dimensional features. All the features are brought together via a flatten layer. The last two layers are fully connected, with 500 and 10 nodes, respectively. Finally, the classification result for each waveform is obtained from the softmax layer.
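The sketch below assembles the layers listed above into a model, reusing the SpatialAttention class sketched in Section 4.3. The exact positions of the pooling and attention layers within the stack are not given in the text, so their placement here is an assumption made for illustration.

```python
import torch.nn as nn

class SamCnn(nn.Module):
    """Sketch of the 2D SAM-CNN: five 3 x 3 convolution blocks (16, 32, 32, 64, 64
    kernels) with batch normalization, ReLU, and 2 x 2 max pooling, two
    SpatialAttention modules (from the Section 4.3 sketch), and fully connected
    layers with 500 and 10 nodes."""

    def __init__(self, num_classes=10):  # set to the number of waveform classes in use
        super().__init__()

        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(
            block(3, 16), block(16, 32), SpatialAttention(),
            block(32, 32), block(32, 64), SpatialAttention(),
            block(64, 64),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 2 * 2, 500),   # 64 x 64 input halved by five pooling layers
            nn.ReLU(inplace=True),
            nn.Linear(500, num_classes),  # softmax applied by the loss or at inference
        )

    def forward(self, x):                 # x: (B, 3, 64, 64)
        return self.classifier(self.features(x))
```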

5. Experiment Results and Discussion

All programs were run on an Ubuntu 20.04 system with a Xeon 5 × 4 CPU server, 64 GB of memory, and an NVIDIA RTX 3090 graphics card. To compare and analyze the classification results, we implemented the deep learning part of the experiment in Python and used the sklearn and libsvm libraries for the machine learning algorithms. The classification precision, recall, F1 score, and confusion matrix were analyzed.
The performance of the classification network is impacted by imbalance in the training data. Therefore, we selected 10,000 samples of each type. After data augmentation, we obtained 40,000 samples for each category. Preparing the dataset took about three days using the Python-based pyts library to convert the full waveforms into time-series images. We then divided the dataset at a ratio of 4:1 to form the training and test sets. The maximum number of iterations in the network training was 100 and the batch size was 32. The loss was calculated with the cross-entropy loss function, and the Adam optimizer was applied with a learning rate of 0.001.
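A minimal training sketch under these settings follows; the images and labels tensors are placeholders for the GASF dataset, and SamCnn refers to the architecture sketched in Section 4.4.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# images: (N, 3, 64, 64) float tensor of GASF images; labels: (N,) integer classes.
dataset = TensorDataset(images, labels)
n_train = int(0.8 * len(dataset))                          # 4:1 training/test split
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = SamCnn(num_classes=4)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(100):                                   # at most 100 training iterations
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```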

5.1. Comparison with Different Waveform Transform Methods

The concept of classifying time-series data using computer vision technology was motivated by the rapid growth of this technology. To uncover features and patterns not visible in the one-dimensional sequence of the original time series, many transform methods were developed to encode time series as input images for computer vision.
As shown in Figure 11, the raw waveforms of ground, vegetation, building, and powerline echoes can be converted into time-series images. Bluer colors denote lower values, whereas redder colors denote higher values. Random background noise and complex terrain lead to changes in the amplitude, position, broadening, and noise of the LiDAR echo. Different echoes can be clearly distinguished in the Gramian angular summation field (GASF), Gramian angular difference field (GADF), and Markov transition field (MTF) images [28]. The pixel values of the GASF and GADF images are defined as follows:
GASF_{i,j} = \cos(\phi_i + \phi_j), \qquad GADF_{i,j} = \sin(\phi_i - \phi_j)    (8)
The scale at which a waveform is converted into an image is an important parameter. In our experiments, each waveform was converted into an image of 64 × 64 pixels. Although there are similarities between the GASF and MTF methods, it is necessary to evaluate which method performs better in the classification results.
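The three encodings compared in Tables 1 and 2 can be generated with the pyts library used for dataset preparation; a short sketch is given below, where waveforms is a placeholder array of shape (n_samples, n_timestamps).

```python
from pyts.image import GramianAngularField, MarkovTransitionField

# waveforms: (n_samples, n_timestamps) array of LiDAR echoes (placeholder).
gasf = GramianAngularField(image_size=64, method='summation').fit_transform(waveforms)
gadf = GramianAngularField(image_size=64, method='difference').fit_transform(waveforms)
mtf = MarkovTransitionField(image_size=64).fit_transform(waveforms)
```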
As shown in Table 1 and Table 2, precision, recall, and F1 score were used to measure the classification algorithms. Three image-encoding methods, GADF, GASF, and MTF, were applied to measure the impact of the image transform on the classification results. Although MTF can preserve details in the temporal range, it is usually less stable than GASF and GADF due to the uncertainty of its inverse mapping. Unlike GAF, the MTF technique cannot be mapped back to the original time-series data because the transformed matrix is produced from the transition probabilities of the elements. Therefore, we chose GAF as the preferred conversion method [28]. When using GASF and GADF, the precision difference between each subcategory of the urban and bathymetric data was less than 0.4%. In short, the choice of image transformation method did not greatly affect the classification.

5.2. Comparison of Different Classification Methods

We used SVM and RF [6,14], which are effective in this field, as baselines combined with handcrafted features. The handcrafted features used for this comparison were the number of echoes, echo intensity, full width at half maximum (FWHM), backscattering cross-section, skewness of the first echo, kurtosis of the first echo, width difference between the first and last echoes, amplitude difference between the first and last echoes, area difference between the first and last echoes, overlapping area of the first and last echoes, overlapping width of the first and last echoes, and time difference between the first and last echoes. As shown in Table 3 and Table 4, the handcrafted features combined with SVM and RF achieved higher precision for vegetation and powerlines, but the recall decreased significantly, indicating that the robustness of handcrafted features was poor. This is probably because when multiple echoes overlapped, the echo pulse width or amplitude could not be accurately extracted. The precision and recall of RF were slightly higher than those of SVM because of its good generalization ability in multivariate classification. The F1 scores of the deep learning-based methods were higher than those of SVM and RF, especially for powerlines and missed surfaces; the data-driven one-dimensional signal classification networks showed stronger robustness. The rightmost column in each table reports the time required to identify 2000 waveforms. The SVM and RF methods spent most of this time on feature extraction and normalization. After introducing the LSTM unit or attention mechanism, the recognition speed slowed down slightly. Although our proposed method was slightly slower than the other deep learning methods, the classification precision and F1 score were improved to a certain extent. We obtained F1 scores of 88.4%, 97.8%, 90.8%, and 92.6%, respectively, for ground, vegetation, buildings, and powerlines with the proposed architecture.
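The SVM and RF baselines can be reproduced along the lines of the sketch below, where X_train/X_test are placeholder matrices of the handcrafted features listed above and y_train/y_test the class labels; the kernel and hyperparameter choices are assumptions rather than the exact settings of [6,14].

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.svm import SVC

# X_train, X_test: handcrafted feature matrices (n_waveforms, n_features);
# y_train, y_test: class labels. All four are placeholders for the extracted data.
for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("RF", RandomForestClassifier(n_estimators=100))]:
    clf.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, clf.predict(X_test), digits=3))  # precision/recall/F1
```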
LiDARNet [7] is a neural network composed of multi-layer 1D convolutions. Local features can be extracted using the various filters in its convolution, pooling, normalization, and fully connected layers. LiDARNet cannot retain the memory of previous time-series patterns, so its F1 score on the bathymetry dataset was lower than those of 1D SEM [3] and FCN-LSTM [8]. By introducing an attention mechanism, 1D SEM optimized the weights of the weak echo features in the time series. FCN-LSTM was sufficient for capturing the time-history relationship of the LiDAR signal at different sampling points, which is very effective for multi-echo sequences, but it is insufficient for spatial feature extraction. This is because each LSTM unit contains a memory cell and three gates, which can learn the time dependency in a sequence but struggle to learn long-term dependencies in long sequences. As shown in Table 4, we obtained precision values of 99.6%, 99.5%, 99.6%, and 99.1%, respectively, for land, shallow water, deep water, and missed surfaces with the proposed architecture. This shows that the spatio-temporal features extracted by the two-dimensional convolution network were more effective than the local and time-series dependency features of the one-dimensional convolution networks. A 2D CNN and a 2D SAM-CNN were used to verify the impact of the spatial attention modules on the classification results. The 2D SAM-CNN had higher precision and F1 scores than the 2D CNN, which indicates that the spatial attention mechanism optimized the feature weights of key positions. Therefore, the proposed method achieves higher accuracy and a lower leak-detection rate.

5.3. Analysis by Feature Visualization

In order to further analyze the characteristics of the proposed method, we used class activation mapping [30] to visualize the areas in the original data that affected the echo classification. As shown in Figure 12 and Figure 13, the colors represent the weight of the network structure to the region of interest; the darker the color, the greater the weight.
As shown in Figure 12a, the 1D FCN-LSTM model focused too much on the noise and ignored the extraction of the main echo features. As shown in Figure 12b, when there were multiple targets in the echo, the effective signal was not well weighted, which may have led to the low classification accuracy of some subclasses.
As shown in Figure 13, each pixel represents the relative correlation within a certain interval [31]. By introducing the SAM, the relationships between the dark lines receive more attention. The intersections of the dark lines in Figure 13a represent the main echo, and the area representing the weak echo in the upper-right corner of Figure 13b is given more weight. As shown in Figure 13b, the area between the dark lines has a greater impact on the classification of multiple-echo time-series images, although weak signals and random noise affect classification accuracy. Two-dimensional convolution combined with a spatial attention mechanism can extract high-dimensional features and thus improve classification performance.
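The attention visualizations of Figures 12 and 13 follow the class activation mapping idea of [30,31]. A compact Grad-CAM-style sketch in PyTorch is given below; the layer that is hooked and the normalization of the heatmap are illustrative choices, not the authors' exact settings.

```python
import torch

def grad_cam(model, image, target_layer, class_idx=None):
    """Compute a Grad-CAM heatmap for one input image of shape (1, 3, 64, 64).

    target_layer: a convolutional layer whose activations are weighted by the
    gradient of the selected class score, then summed over channels.
    """
    store = {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: store.update(act=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: store.update(grad=go[0]))

    scores = model(image)
    if class_idx is None:
        class_idx = int(scores.argmax(dim=1))
    model.zero_grad()
    scores[0, class_idx].backward()
    h1.remove(); h2.remove()

    weights = store["grad"].mean(dim=(2, 3), keepdim=True)     # global-average gradients
    cam = torch.relu((weights * store["act"]).sum(dim=1)).squeeze(0)
    return cam / (cam.max() + 1e-8)                            # normalized H x W heatmap
```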

6. Conclusions

In this paper, we propose a general method for full-waveform classification, which enhances the stability of the classification results based on high-dimensional features. By converting the full waveform into a GASF image and introducing a spatial attention module, the feature extraction of the full waveform is optimized. The classification performance of the machine learning-based methods is lower than that of the deep learning methods, mainly because the echo features extracted with fixed parameters sometimes fail. In the future, we will try to introduce more time-series classification features into full-waveform classification.

Author Contributions

Literature Search, Figures, Study Design, and Data Analysis, B.H.; writing—review, Y.Z.; editing, J.H.; writing—review, Q.L.; Study Design, R.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Guangxi Innovative Development Grand Grant (No. 2018AA13005).

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Guiotte, F.; Rao, M.; Lefèvre, S.; Tang, P.; Corpetti, T. Relation network for full-waveforms lidar classification. ISPRS-Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 43, 515–520.
  2. Qin, H.; Zhou, W.; Zhao, W. Airborne small-footprint full-waveform LiDAR data for urban land cover classification. Front. Environ. Sci. 2022, 10, 2960.
  3. Zhao, Y.; Yu, X.; Hu, B.; Chen, R. A Multi-Source Convolutional Neural Network for Lidar Bathymetry Data Classification. Mar. Geod. 2022, 45, 232–250.
  4. Ji, X.; Tang, Q.; Xu, W.; Li, J. Island feature classification for single-wavelength airborne lidar bathymetry based on full-waveform parameters. Appl. Opt. 2021, 60, 3055–3061.
  5. Shanjiang, H.; Yan, H.; Bangyi, T.; Jiayong, Y.; Weibiao, C. Classification of sea and land waveforms based on deep learning for airborne laser bathymetry. Infrared Laser Eng. 2019, 48, 1113004.
  6. Zhou, M.; Li, C.R.; Ma, L.; Guan, H.C. Land cover classification from full-waveform lidar data based on support vector machines. In Proceedings of the ISPRS International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Prague, Czech Republic, 12–19 July 2016.
  7. Aßmann, A.; Stewart, B.; Wallace, A.M. Deep Learning for LiDAR Waveforms with Multiple Returns. In Proceedings of the 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 18–21 January 2021; pp. 1571–1575.
  8. Karim, F.; Majumdar, S.; Darabi, H.; Chen, S. LSTM Fully Convolutional Networks for Time Series Classification. IEEE Access 2018, 6, 1662–1669.
  9. Liu, L.; Wang, Z. Encoding temporal Markov dynamics in graph for time series visualization. arXiv 2018, arXiv:1610.07273.
  10. Schwarz, R.; Pfeifer, N.; Pfennigbauer, M.; Ullrich, A. Exponential decomposition with implicit deconvolution of lidar backscatter from the water column. PFG-Photogramm. Remote Sens. Geoinf. Sci. 2017, 85, 159–167.
  11. Deng, X.; Yang, G.; Zhang, H.; Chen, G. Accurate quantification of alkalinity of sintered ore by random forest model based on PCA and variable importance (PCA-VI-RF). Appl. Opt. 2020, 59, 2042–2049.
  12. Cui, Z.; Chen, W.; Chen, Y. Multi-scale convolutional neural networks for time series classification. arXiv 2016, arXiv:1603.06995.
  13. Xing, S.; Wang, D.; Xu, Q.; Lin, Y.; Li, P.; Jiao, L.; Zhang, X.; Liu, C. A depth-adaptive waveform decomposition method for airborne LiDAR bathymetry. Sensors 2019, 19, 5065.
  14. Ma, L.; Zhou, M.; Li, C. Land covers classification based on Random Forest method using features from full-waveform LiDAR data. In Proceedings of the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, ISPRS Geospatial Week 2017, Wuhan, China, 18–22 September 2017; Volume XLII-2/W7.
  15. Ji, X.; Yang, B.; Tang, Q. Seabed sediment classification using multibeam backscatter data based on the selecting optimal random forest model. Appl. Acoust. 2020, 167, 107387.
  16. Su, D.; Yang, F.; Ma, Y.; Zhang, K.; Wang, M. Classification of Coral Reefs in the South China Sea by Combining Airborne LiDAR Bathymetry Bottom Waveforms and Bathymetric Features. IEEE Trans. Geosci. Remote Sens. 2018, 57, 815–828.
  17. Eren, F.; Pe'eri, S.; Rzhanov, Y.; Ward, L. Bottom characterization by using airborne lidar bathymetry (ALB) waveform features obtained from bottom return residual analysis. Remote Sens. Environ. 2018, 206, 260–274.
  18. Zhao, M.; Zhong, S.; Fu, X.; Tang, B.; Pecht, M. Deep residual shrinkage networks for fault diagnosis. IEEE Trans. Ind. Inform. 2019, 16, 4681–4690.
  19. Shinohara, T.; Xiu, H.; Matsuoka, M. FWNet: Semantic Segmentation for Full-Waveform LiDAR Data Using Deep Learning. Sensors 2020, 20, 3568.
  20. Zhao, M.; Chen, S.; Yuen, D. Waveform classification and seismic recognition by convolution neural network. Chin. J. Geophys. 2019, 62, 374–382.
  21. Gangping, L.; Jun, K. Deep-learning for super-resolution full-waveform lidar. In Optoelectronic Imaging and Multimedia Technology VI; Proceedings of SPIE Volume 11187; Qionghai, D., Shimura, T., Zheng, Z., Eds.; SPIE/COS Photonics Asia: Hangzhou, China, 2019.
  22. Hu, M.; Mao, J.; Li, J.; Wang, Q.; Zhang, Y. A Novel Lidar Signal Denoising Method Based on Convolutional Autoencoding Deep Learning Neural Network. Atmosphere 2021, 12, 1403.
  23. Dai, W.; Dai, C.; Qu, S.; Li, J.; Das, S. Very Deep Convolutional Neural Networks for Raw Waveforms. In Proceedings of the IEEE International Conference on Acoustics, New Orleans, LA, USA, 5–9 March 2017; pp. 421–425.
  24. Wang, Z.; Yan, W.; Oates, T. Time series classification from scratch with deep neural networks: A strong baseline. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 1578–1585.
  25. Geng, Z.; Wang, Y. Automated design of a convolutional neural network with multi-scale filters for cost-efficient seismic data classification. Nat. Commun. 2020, 11, 3311.
  26. Zorzi, S.; Maset, E.; Fusiello, A.; Crosilla, F. Full-waveform airborne LiDAR data classification using convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8255–8261.
  27. Li, J.; Ma, Y.; Zhou, Q.; Zhou, B.; Wang, H. Monte Carlo study on pulse response of underwater optical channel. Opt. Eng. 2012, 51, 6001.
  28. Wang, Z.; Oates, T. Imaging time-series to improve classification and imputation. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015.
  29. Hu, B.; Zhao, Y.; Chen, R.; Liu, Q.; Wang, P.; Zhang, Q. A denoising method for lidar bathymetry system based on low-rank recovery of non-local data structures. Appl. Opt. 2021, 61.
  30. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning Deep Features for Discriminative Localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
  31. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Int. J. Comput. Vis. 2020, 128, 336–359.
Figure 1. Schematic diagram of echo under urban classification.
Figure 2. Schematic diagram of bathymetry data.
Figure 3. Urban full waveform.
Figure 4. Bathymetry full waveform.
Figure 5. Workflow of full-waveform transformation.
Figure 6. Diagram of different transform methods used for the urban waveform.
Figure 7. Diagram of different transform methods used for the bathymetry waveform.
Figure 8. Workflow of data augmentation.
Figure 9. Spatial attention module.
Figure 10. Flowchart of proposed convolutional neural network.
Figure 11. Results of different transform methods used on the urban waveform.
Figure 12. 1D FCN-LSTM feature visualization.
Figure 13. 2D SAM-CNN feature visualization.
Table 1. Comparison of different methods used on the urban full waveform.

Method | Metrics   | Ground | Vegetation | Buildings | Powerlines
MTF    | Precision | 92.1%  | 96.9%      | 87.9%     | 90.6%
       | Recall    | 81.3%  | 94.2%      | 82.1%     | 93.6%
       | F1 score  | 86.3%  | 95.3%      | 84.9%     | 92.1%
GADF   | Precision | 91.5%  | 96.5%      | 87.8%     | 90.9%
       | Recall    | 84.3%  | 97.9%      | 93.3%     | 94.1%
       | F1 score  | 87.7%  | 97.2%      | 90.5%     | 92.5%
GASF   | Precision | 91.7%  | 96.8%      | 88.3%     | 91.3%
       | Recall    | 85.5%  | 98.9%      | 93.5%     | 93.9%
       | F1 score  | 88.4%  | 97.8%      | 90.8%     | 92.6%
Table 2. Comparison of different methods used on the bathymetry waveform.

Method | Metrics   | Land  | Shallow | Deep  | Missed Surfaces
MTF    | Precision | 98.6% | 98.5%   | 99.3% | 98.9%
       | Recall    | 99.2% | 99.1%   | 97.5% | 98.1%
       | F1 score  | 98.8% | 98.7%   | 98.4% | 98.5%
GADF   | Precision | 99.8% | 99.3%   | 99.5% | 99.0%
       | Recall    | 99.2% | 99.1%   | 99.3% | 98.6%
       | F1 score  | 99.5% | 99.2%   | 99.4% | 98.8%
GASF   | Precision | 99.6% | 99.5%   | 99.6% | 99.1%
       | Recall    | 99.3% | 99.1%   | 99.2% | 99.8%
       | F1 score  | 99.4% | 99.3%   | 99.4% | 99.4%
Table 3. Comparison of different methods of urban data classification.

Method        | Metrics   | Ground | Vegetation | Buildings | Powerlines | Time
SVM [6]       | Precision | 81.5%  | 89.3%      | 80.4%     | 86.2%      | 80 s
              | Recall    | 75.3%  | 82.4%      | 75.2%     | 81.9%      |
              | F1 score  | 78.3%  | 85.7%      | 77.7%     | 83.9%      |
RF [14]       | Precision | 84.3%  | 92.1%      | 79.2%     | 89.3%      | 68 s
              | Recall    | 78.4%  | 87.4%      | 77.3%     | 85.2%      |
              | F1 score  | 81.2%  | 89.6%      | 78.2%     | 87.2%      |
LiDARNet [7]  | Precision | 85.6%  | 91.8%      | 80.9%     | 89.9%      | 7 s
              | Recall    | 82.1%  | 89.6%      | 83.1%     | 88.2%      |
              | F1 score  | 83.8%  | 90.7%      | 82.0%     | 89.1%      |
1D SEM [3]    | Precision | 86.2%  | 92.1%      | 81.2%     | 90.1%      | 9 s
              | Recall    | 83.5%  | 91.4%      | 88.4%     | 87.4%      |
              | F1 score  | 84.8%  | 91.7%      | 84.6%     | 88.7%      |
FCN-LSTM [8]  | Precision | 86.9%  | 93.2%      | 82.1%     | 91.3%      | 9 s
              | Recall    | 83.8%  | 92.7%      | 88.1%     | 88.6%      |
              | F1 score  | 85.3%  | 92.9%      | 84.9%     | 89.9%      |
2D CNN        | Precision | 90.3%  | 95.1%      | 86.4%     | 90.8%      | 12 s
              | Recall    | 81.8%  | 95.3%      | 90.3%     | 93.3%      |
              | F1 score  | 85.8%  | 95.2%      | 88.3%     | 92.1%      |
2D SAM-CNN    | Precision | 91.7%  | 96.8%      | 88.3%     | 91.3%      | 16 s
              | Recall    | 85.5%  | 98.9%      | 93.5%     | 93.9%      |
              | F1 score  | 88.4%  | 97.8%      | 90.8%     | 92.6%      |
Table 4. Comparison of different methods of bathymetry data classification.

Method        | Metrics   | Land  | Shallow Water | Deep Water | Missed Surfaces | Time
SVM [6]       | Precision | 89.3% | 82.9%         | 91.9%      | 62.3%           | 83 s
              | Recall    | 86.1% | 89.9%         | 87.8%      | 80.9%           |
              | F1 score  | 87.7% | 86.3%         | 89.8%      | 70.4%           |
RF [14]       | Precision | 96.7% | 93.4%         | 96.8%      | 86.2%           | 71 s
              | Recall    | 97.3% | 94.7%         | 94.9%      | 91.3%           |
              | F1 score  | 96.9% | 90.1%         | 95.8%      | 93.7%           |
LiDARNet [7]  | Precision | 95.7% | 94.2%         | 94.4%      | 90.5%           | 7 s
              | Recall    | 96.2% | 94.6%         | 93.8%      | 91.1%           |
              | F1 score  | 95.9% | 94.4%         | 94.1%      | 90.8%           |
1D SEM [3]    | Precision | 96.5% | 94.8%         | 95.3%      | 91.3%           | 9 s
              | Recall    | 98.1% | 95.2%         | 94.6%      | 91.7%           |
              | F1 score  | 97.3% | 95.1%         | 94.9%      | 91.5%           |
FCN-LSTM [8]  | Precision | 96.3% | 95.1%         | 96.4%      | 91.1%           | 9 s
              | Recall    | 93.5% | 96.3%         | 96.7%      | 90.2%           |
              | F1 score  | 94.9% | 95.7%         | 96.5%      | 90.6%           |
2D CNN        | Precision | 99.1% | 98.3%         | 98.7%      | 98.4%           | 12 s
              | Recall    | 97.5% | 98.8%         | 96.2%      | 97.6%           |
              | F1 score  | 98.3% | 98.5%         | 97.4%      | 97.9%           |
2D SAM-CNN    | Precision | 99.6% | 99.5%         | 99.6%      | 99.1%           | 16 s
              | Recall    | 99.3% | 99.1%         | 99.2%      | 99.8%           |
              | F1 score  | 99.4% | 99.3%         | 99.4%      | 99.4%           |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
