Article

Point Cloud Deep Learning Network Based on Local Domain Multi-Level Feature
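Xianquan Han, Xijiang Chen, Hui Deng, Peng Wan and Jianzhou Li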

1 Changjiang River Scientific Research Institute, Wuhan 430010, China
2 School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan 430079, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(19), 10804; https://doi.org/10.3390/app131910804
Submission received: 12 September 2023 / Revised: 26 September 2023 / Accepted: 27 September 2023 / Published: 28 September 2023
(This article belongs to the Special Issue Novel Approaches for Remote Sensing Image Processing)

Abstract

Point cloud deep learning networks have been widely applied to point cloud classification, part segmentation and semantic segmentation. However, current point cloud deep learning networks extract insufficient local features from the point cloud, which affects the accuracy of point cloud classification and segmentation. To address this issue, this paper proposes a local domain multi-level feature fusion point cloud deep learning network. First, a dynamic graph convolution operation is used to obtain the local neighborhood features of the point cloud. Then, relation-shape convolution is used to extract deeper edge features of the point cloud, and max pooling is adopted to aggregate the edge features. Finally, point cloud classification and segmentation are realized based on the global and local features. We conducted comparison experiments on ModelNet40, a large-scale 3D CAD model dataset, and ShapeNet, a richly annotated, large-scale dataset of 3D shapes. For ModelNet40, the overall accuracy (OA) of the proposed method is similar to that of DGCNN, RS-CNN, PointConv and GAPNet, all exceeding 92%. Compared with PointNet, PointNet++, SO-Net and MSHANet, the OA of the proposed method is improved by 5%, 2%, 3% and 2.6%, respectively. For the ShapeNet dataset, the mean Intersection over Union (mIoU) of part segmentation achieved by the proposed method is 86.3%, which is 2.9%, 1.4%, 1.7%, 1.7%, 1.2%, 0.1% and 1.0% higher than that of PointNet, RS-Net, SCN, SPLATNet, DGCNN, RS-CNN and LRC-NET, respectively.

1. Introduction

Laser point clouds are now widely used in many fields, such as autonomous driving [1] and augmented reality [2]. Point cloud classification and segmentation are key to scene understanding, and many scholars have therefore studied them systematically. Machine learning approaches [3] are the most commonly used data analysis methods, and in point cloud analysis, deep learning is used for classification and segmentation. To improve the accuracy of point cloud classification and segmentation, some researchers have drawn on image-processing methods to handle point clouds: they project the point cloud onto a 2D plane to transform it into images so that point cloud features can be learned with 2D image feature learning methods [4]. The advantage of these methods is their simplicity. Their limitation is that converting point clouds to images can easily lose feature information, and the selection of projection views requires considerable prior knowledge. To address this, other researchers have begun to implement point cloud segmentation and classification directly on point clouds, for example with voxel-based methods [5,6,7]. These methods express the entire point cloud as individual voxels, thereby retaining the geometric information of objects. Their disadvantage is that they cannot finely delineate the geometric information of object boundaries. In addition, they are generally subject to severe memory limitations: high resolution incurs enormous computational and storage costs, while low resolution results in information loss. In view of this, sparse convolution [8] has been used to reduce memory usage. For low-resolution point clouds, some researchers have used point-based methods [9,10] to implement segmentation, which can obtain local geometric information. However, relying solely on local geometric information to determine the overall structure may neglect global topological information, and these methods require a significant amount of time. Methods that directly process point cloud data can fully learn all features of the point cloud without requiring additional transformation operations. However, raw point clouds are irregular, sparse and unordered, so it is necessary to organize them into local neighborhood graph structures.
To address the above issues, this paper proposes a point cloud deep learning network based on multi-level feature fusion in a local domain. First, the local neighborhoods of the point cloud are constructed using graph convolution to extract features. Then, the feature points are used as input again to reconstruct local neighborhoods. In each neighborhood, a central point is found, and the dependencies between the central point and other feature points within the neighborhood are learned. Finally, point cloud classification and segmentation are achieved through pooling processing based on different features.

2. Related Works

Currently, there are three main categories of point cloud segmentation methods: projection-based methods, voxel-based methods and methods that directly operate on point clouds. Projection-based methods project the point cloud onto a 2D plane to form an image and use 2D CNNs to extract image features. Voxel-based methods divide the point cloud into blocks, and each block of points represents a voxel. Features are extracted from each voxel point cloud and fused to segment the entire point cloud. Point-cloud-based methods directly perform convolutional operations on point clouds, which can effectively obtain feature information from point cloud data.

2.1. Projection-Based Methods

The key to projection-based methods is constructing multi-view projections of the point cloud and then learning the 2D features of these projections through 2D convolution; finally, the features from the multiple views are fused. Su et al. [11] proposed a novel CNN architecture that combines information from multiple views of a 3D shape into a single, compact shape descriptor, offering better recognition performance: the features of the individual views are integrated by convolution and pooling layers, and the aggregated descriptor is fed into a fully connected layer to obtain classification results. Qi et al. [12] improved multi-view CNN performance through a multi-resolution extension with improved data augmentation, which introduces multi-resolution 3D filtering to capture information at multiple scales and thereby enhances the classification model. Xu et al. [13] designed a spatially adaptive convolution based on the structure of the SqueezeSeg model [14]; the model achieves spatial adaptation and content awareness, which solves the problem of network performance degradation caused by applying traditional convolution to LiDAR images.

2.2. Voxel-Based Methods

Point cloud voxelization [15] converts disordered point clouds into regular voxel structures, after which a neural network architecture is used for feature learning to achieve semantic segmentation. Considering the sparsity of point clouds, Graham [16] designed a sparse convolutional network and applied it to 3D segmentation [17]. Tchapmi et al. [18] used the SEGCloud method to subdivide a large point cloud into voxel grids and post-processed them using trilinear interpolation and conditional random fields. Li et al. [19] attempted to sample sparse 3D data and then feed the sampled results into a network for processing to reduce the computational effort. Le et al. [20] proposed the PointGrid method, which employs a 3D CNN to learn grid cells containing a fixed number of points to obtain local geometric details. In addition, Wang et al. [21] proposed a lightweight, fully convolutional network based on an attention mechanism and a sparse tensor for the semantic segmentation of point clouds.

2.3. Point-Cloud-Based Methods

The above two kinds of methods have high computational complexity. In view of this, network models that operate directly on the raw point cloud have gradually been proposed. PointNet [22] was among the earliest approaches to apply a deep learning architecture to raw point cloud data. A limitation of this method is that it only extracts global features of the point cloud for classification or semantic segmentation without accounting for local features, leading to less-than-ideal performance for semantic segmentation in large scenes. In order to improve segmentation performance, Qi et al. [23] proposed PointNet++ by improving the PointNet framework. PointNet++ addresses the limitation of the original PointNet, which only considers global features of the point cloud; the addition of a local region partitioning module allows PointNet++ to learn and utilize local geometric features at different scales, thereby improving its performance on large-scene semantic segmentation. Jiang et al. [24] proposed a hierarchical point-edge interaction network for point cloud semantic segmentation, which introduces an edge convolution operation that lets point features and edge features interact to capture local geometric structures. Unbalanced point cloud scenes affect the semantic segmentation accuracy of PointNet++. In view of this problem, Deng et al. [25] proposed a weighted sampling method based on farthest point sampling (FPS), which makes the sampling process more balanced and reduces the influence of scene imbalance. Dang et al. [26] proposed hierarchical parallel group convolution, which captures both single-point features and local geometric features of point clouds, improving the ability of the network to recognize complex classes. He et al. [27] proposed a multiscale multi-feature PointNet (MSMF-PointNet) deep learning model that extracts multiscale, multi-neighborhood features for classification. Liu et al. [28] introduced a relation module to learn the relations between points and aggregate neighbor information, together with a shape module that encodes the absolute geometric features of each point; the two modules are combined to extract comprehensive shape representations, so the network can reason about the spatial layout of the point cloud and realize shape context awareness. The global features are then obtained through a fully connected layer and classified. Hu et al. [29] proposed the efficient and lightweight RandLA-Net. The model uses a local feature aggregation module to expand the range of k-nearest neighbor points and reduce information loss, and random sampling to reduce storage cost and improve computational efficiency. Landrieu et al. [30] proposed a method for large-scale point cloud semantic segmentation using superpoint graphs: the point cloud is first oversegmented into a set of superpoints, a graph is constructed to represent the relationships between superpoints, and graph convolutional networks are applied to segment the superpoint graph. This greatly reduces the number of points and allows the network to be applied to large-scale point cloud datasets.

3. Proposed Method

Previous research shows that some point cloud information is lost if the max pooling operation is performed directly. In this paper, a local domain relational convolution is designed to further extract features from the input point cloud data and suppress useless features. RS-CNN randomly samples the raw point cloud to construct local neighborhoods and designs a convolution that can be applied directly to the point cloud. In this paper, we first use graph convolution to extract features and obtain feature points containing richer information; we then construct the local domain and extract features from these feature points again through relational convolution.

3.1. Proposed Network Architecture

The flowchart of the proposed method is described in Figure 1. First, the input point cloud data are locally partitioned using three different scales of k-nearest neighbors (KNN), and a directed graph is constructed for the extraction of features between points.
Simultaneously, we utilize EdgeConv (edge convolution) layers to extract and aggregate local features of the point cloud. The regular network is then extended to irregular structures by using relation convolution to further extract point features. Finally, global features are output through max pooling. The point cloud deep learning network proposed in this paper is constructed as shown in Figure 2. The input of the network is the point set P = {p1, p2, …, pN} ⊆ R^D, where N is the number of sampled points and D is the feature dimension of each input point.
When D = 3, the input points have only three dimensions, which means that the input point cloud data are the (x, y, z) coordinates. The N × D point cloud is fed into the spatial transformer network (STN); the spatial transformation matrix learned by this module aligns the coordinates of the input point cloud. The aligned data are processed by a dynamic graph convolution module to obtain local neighborhood features, which are mapped to 64, 128 and 1024 dimensions in sequence by a Multi-Layer Perceptron (MLP). The local features of the point cloud are extracted and aggregated using EdgeConv, as shown in Figure 3a. Deeper feature extraction is then performed using a relation-shape convolution module, as shown in Figure 3b, which captures local geometric structures by modeling pairwise spatial relations between points. Global features are obtained through max pooling of the local features.
Point cloud classification is achieved by applying max pooling and an MLP to the global features. Point cloud segmentation is realized according to the combination of global and local features. It concatenates global and local features and outputs the score per point.
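As a rough illustration of the local partitioning step, the sketch below builds the directed KNN graph and gathers neighbor features in PyTorch. This is a minimal sketch written for this description rather than the authors' released code; the function names knn_indices and gather_neighbors, the (B, N, D) tensor layout and the value of k are assumptions.

```python
import torch

def knn_indices(xyz: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k nearest neighbors of every point.
    xyz: (B, N, 3) coordinates -> (B, N, k) integer indices."""
    dist = torch.cdist(xyz, xyz, p=2.0)                 # pairwise distances (B, N, N)
    # Keep the k smallest distances, skipping index 0 (the point itself).
    return dist.topk(k + 1, dim=-1, largest=False).indices[..., 1:]

def gather_neighbors(feat: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    """Gather neighbor features. feat: (B, N, D), idx: (B, N, k) -> (B, N, k, D)."""
    B, N, _ = feat.shape
    k = idx.shape[-1]
    batch = torch.arange(B, device=feat.device).view(B, 1, 1).expand(B, N, k)
    return feat[batch, idx]

# Usage: a batch of two clouds with 1024 points, k = 20 neighbors each.
points = torch.rand(2, 1024, 3)
idx = knn_indices(points, k=20)            # (2, 1024, 20)
neighbors = gather_neighbors(points, idx)  # (2, 1024, 20, 3)
```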

3.2. Feature Fusion Based on Dynamic Graph Convolution

We use a k-nearest neighbor directed graph G = (V, E) to represent the local structure of the point cloud, where V = {1, …, n} is the set of vertices and E ⊆ V × V is the set of edges. In the local directed graph, each point xi is the central node of its local neighborhood; the edge feature eij between the central node and the other points in the neighborhood is calculated, and the edge features are then aggregated to represent the feature information of this central node, as shown in Figure 4. Assume the input point cloud is X = {xi ∈ R^F, i = 1, …, n}. For an object point xi, its k neighboring points are searched and the local graph is constructed; the edges can be expressed by the indices (i, j1), (i, j2), …, (i, jk). According to the local graph of xi, the edge features between xi and its neighboring points are obtained and expressed as eij = h(xi, xj; θ), where h(·) is the custom edge feature extraction function, eij is an F′-dimensional edge feature vector and θ denotes the learnable parameters of h(·).
We define the edge feature as eij = hΘ(xi, xj), where hΘ is a nonlinear function with learnable parameters Θ. The complete feature of the ith central node is obtained by aggregating the edge features over its neighborhood with a pooling operation, i.e., x'i = pool{hΘ(xi, xj), (i, j) ∈ E}.
The choice of hΘ and feature aggregation operation has a very significant impact on the performance of the network. The first choice is hΘ(xi,xj) = hΘ(xi), which is used in the PointNet, and it only considers global information and ignores local information; the second choice is hΘ(xi,xj) = hΘ(xj-xi), which encodes only local information and will lose global information. The DGCNN method chooses hΘ(xi,xj) = hΘ(xi,xj-xi), which combines the advantages of the first and second choice and considers both local and global features. Generally, a Multi-Layer Perceptron is chosen for the learning of the parameter Θ, and max pooling is used for the aggregation operation of the edge features. Then, the final edge features can be specifically represented as
x_i' = \max_{j:(i,j) \in E} \mathrm{ReLU}\left( h_\Theta(x_i, x_j) \right)    (1)
where ReLU is an activation function.
Compared with the sigmoid and tanh activation function, it is easy to obtain the derivative of ReLU, and this can improve the training speed of the network, increase the nonlinearity of the network and make the network sparse.
In this paper, the network structure is optimized so that the input of each layer of edge convolution is a fusion of all previous output features, and the edge features of each layer can be expressed as
e_{ij}^{(l+1)} = h_\Theta\left( \left[ x_i^{(l+1)}, x_j^{(l+1)} \right] \right),    (2)
The final concatenation result is
x_i^{(l+1)} = \left[ x^{0}, x^{1}, \ldots, x^{l} \right],    (3)
where [x0, x1, …, xl] denotes the output features of the network from layer 0 to l.
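As a concrete illustration of the edge feature hΘ(xi, xj − xi) with max-pooling aggregation (Equation (1)), a minimal EdgeConv-style layer might look as follows. This is a PyTorch sketch under our own assumptions (class name, shared-MLP width and batch-norm placement); it is not the authors' implementation.

```python
import torch
import torch.nn as nn

class EdgeConv(nn.Module):
    """Edge convolution: h_theta(x_i, x_j - x_i) followed by max aggregation."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        # Shared MLP applied to every (centre, neighbour-offset) pair.
        self.mlp = nn.Sequential(
            nn.Conv2d(2 * in_dim, out_dim, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
        # x: (B, N, D) point features, idx: (B, N, k) neighbour indices.
        B, N, D = x.shape
        k = idx.shape[-1]
        batch = torch.arange(B, device=x.device).view(B, 1, 1).expand(B, N, k)
        neigh = x[batch, idx]                               # (B, N, k, D)
        centre = x.unsqueeze(2).expand(B, N, k, D)          # (B, N, k, D)
        edge = torch.cat([centre, neigh - centre], dim=-1)  # input of h_theta
        edge = edge.permute(0, 3, 1, 2)                     # (B, 2D, N, k) for Conv2d
        out = self.mlp(edge)                                # (B, out_dim, N, k)
        return out.max(dim=-1).values.permute(0, 2, 1)      # max over k -> (B, N, out_dim)
```

The layer-wise fusion of Equations (2) and (3) would then concatenate the outputs of successive EdgeConv layers along the feature dimension (e.g., torch.cat([x0, x1, x2], dim=-1)) before the next layer; the variable names here are placeholders.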

3.3. Local Domain Relational Convolution

We construct a spherical neighborhood for the points output by the edge convolution, taking the sampling point xi as the center point and xj ∈ N(xi) as the other points in the neighborhood. Our aim is to learn the potential relational information between the points in this neighborhood, which can be learned by the following convolutional operation:
f_P = \sigma\left( \varphi\left( \left\{ T(f_{x_j}),\ \forall x_j \in N(x_i) \right\} \right) \right),    (4)
where f is the feature vector and x is the 3D point.
We first transform the features of all points in the neighborhood by the function T and then aggregate them with the function φ to obtain f_P. Max pooling is chosen as the aggregation function φ, and the nonlinear activation function σ is chosen as ReLU.
In a 3D spatial neighborhood, the geometric relation between xi and its neighbors is an explicit expression of the spatial layout of the points, which further discriminatively reflects the underlying shape. To capture this relationship, the learnable weight wi is replaced by wij, which is defined as M(rij). The role of M is to abstract the high-level relationship between two points so as to encode their spatial layout; M is implemented with a Multi-Layer Perceptron. rij is the low-level relation between the two points, which can be defined as the Euclidean distance. Finally, f_P can be expressed as
f_P = \sigma\left( \varphi\left( \left\{ M(r_{ij}) \cdot f_{x_j},\ \forall x_j \in N(x_i) \right\} \right) \right),    (5)
According to Equation (5), the convolution aggregates the relationships between all points within the neighborhood and the central point, thereby enabling inferential understanding of the spatial layout of points and generating effective shape perception.
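A rough sketch of the relation-weighted aggregation in Equation (5) is given below. The relation vector rij is taken here as the Euclidean distance together with the coordinate offset, and the module name, MLP widths and neighbor indexing are illustrative assumptions rather than the exact layer used in the paper.

```python
import torch
import torch.nn as nn

class RelationConv(nn.Module):
    """Relation-shape style convolution: weights M(r_ij) are learned from
    low-level geometric relations and applied to neighbour features."""
    def __init__(self, feat_dim: int, rel_dim: int = 4):
        super().__init__()
        # M: maps the geometric relation r_ij to a per-channel weight.
        self.M = nn.Sequential(
            nn.Linear(rel_dim, feat_dim), nn.ReLU(inplace=True),
            nn.Linear(feat_dim, feat_dim),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, xyz, feat, idx):
        # xyz: (B, N, 3), feat: (B, N, feat_dim), idx: (B, N, k) neighbourhood indices.
        B, N, _ = xyz.shape
        k = idx.shape[-1]
        batch = torch.arange(B, device=xyz.device).view(B, 1, 1).expand(B, N, k)
        n_xyz, n_feat = xyz[batch, idx], feat[batch, idx]   # neighbour coords and features
        offset = n_xyz - xyz.unsqueeze(2)                   # (B, N, k, 3)
        dist = offset.norm(dim=-1, keepdim=True)            # Euclidean distance r_ij
        rel = torch.cat([dist, offset], dim=-1)             # (B, N, k, 4) low-level relation
        weighted = self.M(rel) * n_feat                     # M(r_ij) * f_xj
        return self.act(weighted.max(dim=2).values)         # max-pool aggregation over k
```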

3.4. Evaluation Metrics

The performance metrics for classification are mean class accuracy (mAcc) and overall accuracy (OA), and the performance metrics for segmentation are overall accuracy (OA) and mean Intersection over Union (mIoU). Assuming a total of k categories, TP_i, FP_i and FN_i denote the numbers of true positives, false positives and false negatives for category i, respectively.
mAcc is defined as
\mathrm{mAcc} = \frac{1}{k} \sum_{i=0}^{k-1} \frac{TP_i}{TP_i + FN_i},    (6)
OA is defined as
\mathrm{OA} = \frac{\sum_{i=0}^{k-1} TP_i}{\sum_{i=0}^{k-1} \left( TP_i + FN_i \right)},    (7)
mIoU is defined as
\mathrm{mIoU} = \frac{1}{k} \sum_{i=0}^{k-1} \frac{TP_i}{TP_i + FN_i + FP_i},    (8)
For the point cloud classification, in order to describe the classification performance more intuitively, we use the classification result of each object to calculate the classification accuracy of each object, as described in
\mathrm{CA} = \frac{1}{m} \sum_{i=1}^{m} \delta(r_i, s_i)    (9)
where r_i and s_i are the actual and predicted labels of the ith point of an object, respectively, m is the number of points of the actual object and δ is the indicator function
\delta(r_i, s_i) = \begin{cases} 1, & r_i = s_i \\ 0, & r_i \neq s_i \end{cases}    (10)
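The metrics above can all be computed from a single confusion matrix. The following NumPy utility is a minimal sketch (our own helper, assuming integer labels in [0, k)), not code accompanying the paper.

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, k: int):
    """pred, gt: 1-D integer arrays of per-point labels in [0, k)."""
    # Confusion matrix: rows = ground truth, columns = prediction.
    cm = np.zeros((k, k), dtype=np.int64)
    np.add.at(cm, (gt, pred), 1)

    tp = np.diag(cm).astype(np.float64)
    fn = cm.sum(axis=1) - tp          # points of the class that were missed
    fp = cm.sum(axis=0) - tp          # points wrongly assigned to the class

    oa = tp.sum() / cm.sum()                          # overall accuracy
    macc = np.mean(tp / np.maximum(tp + fn, 1))       # mean class accuracy
    miou = np.mean(tp / np.maximum(tp + fn + fp, 1))  # mean IoU
    return oa, macc, miou

# Example: three classes, six points.
pred = np.array([0, 1, 1, 2, 2, 0])
gt   = np.array([0, 1, 2, 2, 2, 1])
print(segmentation_metrics(pred, gt, k=3))
```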

4. Experiments

4.1. Experimental Setup

The experimental environment and network parameter settings are as follows: Intel Xeon Silver 4210 CPU, GeForce RTX 2080Ti graphics card, 11 GB video memory, 31 GB RAM, Ubuntu 20.04.3 LTS operating system, Python 3.7.0, PyTorch 1.11.0 and CUDA 11.6. The initial learning rate was set to 0.001, the batch size to 16 and the momentum to 0.9; an Adam optimizer was used, the learning rate decay rate was 0.5 and the number of training epochs was experimentally set to 200.
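The training configuration listed above could be expressed roughly as follows in PyTorch; the interpretation of the momentum value as Adam's beta1 and the use of a step schedule for the 0.5 decay rate are assumptions, since the exact scheduler is not stated in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train(model: nn.Module, train_loader, epochs: int = 200):
    # Adam with the stated initial learning rate; beta1 = 0.9 matches the
    # reported momentum value (assumption: "momentum" refers to Adam's beta1).
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
    # Halve the learning rate periodically (gamma = 0.5 is the stated decay rate;
    # the 20-epoch step interval is an assumption).
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
    for _ in range(epochs):
        for points, labels in train_loader:   # batch size 16 in the paper
            optimizer.zero_grad()
            loss = F.cross_entropy(model(points), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
```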
In this paper, we conduct point cloud classification and segmentation experiments to evaluate the performance of the proposed method. For point cloud classification, the ModelNet40 dataset [31] provided by Princeton University was used. It contains 12,311 CAD models represented by triangular meshes across 40 object categories (such as aircraft, tables, plants, etc.). The data were divided into 9843 training samples and 2468 test samples, and 1024 points were sampled from each model as the initial data points for the experiment. For part segmentation, the ShapeNet dataset [32] was used. This dataset contains 16,881 shapes from 16 object classes with 50 part category labels. For a given input 3D point cloud, the objective of part segmentation is to assign a category label to each point; e.g., for chair shapes, components such as the chair legs and chair back need to be segmented. For point cloud semantic segmentation, the experiment was conducted on the S3DIS dataset [33]. The S3DIS dataset contains 6 areas of 3 different buildings divided into 271 individual rooms; each point in the scene corresponds to a fixed label, and the labels belong to 13 categories (ceiling, floor, wall, door, etc.). In this paper, the spatial coordinates of the scene points and their RGB information were used as the input features of the network, and the rooms were partitioned into 1 m × 1 m × height (m) cubes during training. A total of 4096 points were randomly selected from each cube to generate the training data, and a standard 6-fold cross-validation was used in the experiments.
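The S3DIS preprocessing (1 m × 1 m blocks over the full room height, 4096 randomly selected points per block) can be sketched as below; the helper name and the resampling behaviour for small blocks are assumptions made for illustration.

```python
import numpy as np

def room_to_blocks(points: np.ndarray, block_size: float = 1.0, num_points: int = 4096):
    """Split a room into block_size x block_size columns (full height) and
    randomly sample num_points points from each block.
    points: (N, 6) array of x, y, z, r, g, b."""
    xy = points[:, :2] - points[:, :2].min(axis=0)       # shift the room to the origin
    cell = np.floor(xy / block_size).astype(np.int64)    # block index of every point
    blocks = []
    for key in np.unique(cell, axis=0):
        mask = np.all(cell == key, axis=1)
        block = points[mask]
        # Sample exactly num_points (with replacement if the block is small).
        choice = np.random.choice(len(block), num_points, replace=len(block) < num_points)
        blocks.append(block[choice])
    return np.stack(blocks)                              # (num_blocks, 4096, 6)
```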

4.2. Performance Analysis of Point Cloud Classification

In order to verify the classification performance of the proposed network model, we used the proposed method to conduct point cloud classification and compared it with other network models, as shown in Table 1. From Table 1, it can be seen that the mAcc and OA of the proposed method are 91.4% and 93.8%, respectively, and are higher than those of the other networks. Among them, the mAcc of the proposed method is slightly higher than that of DGCNN, GAPNet and PointNet++, and obviously higher than that of PointNet and SO-Net. The OA of the proposed method is similar to that of DGCNN, RS-CNN, PointConv and GAPNet, which are all greater than 92%, although the OA of the proposed method is still slightly higher than these four methods. Specifically, compared with PointNet, PointNet++, SO-Net and MSHANet, the OA of the proposed method is improved by 5%, 2%, 3% and 2.6%, respectively.
According to Table 1, the classification accuracies of the different networks for individual objects are obtained, as shown in Figure 5. From Figure 5, it is clearly visible that the classification accuracies of most objects obtained by the proposed method are higher than those of the other methods. For a few objects, such as the bathtub, guitar, tent and Xbox, the classification accuracy of the proposed method is slightly lower than that of the RS-CNN method. The possible reason is that RS-CNN proposes a convolutional operator, RS-Conv, that can learn the geometric topological relations between points, allowing RS-CNN to reason about the spatial layout of points and achieve very powerful shape context awareness, whereas the network in this paper is weaker than RS-CNN in inferring the spatial layout of points. For the bed, bench, bowl, cup, door, glass box, TV stand and Xbox, the classification accuracy of the proposed method is slightly lower than that of the GAPNet method. The reason for this is that GAPNet learns the local semantic information of the original input point cloud by embedding a graph attention mechanism in the stacked MLP layers. Moreover, the network uses a parallel mechanism (multi-head attention) to aggregate attention features from different GAP-Layer layers, enabling it to efficiently extract local geometric features of disordered point clouds. In contrast, the proposed method does not use an attention mechanism module and does not acquire the multi-attention and multi-graph features of the point cloud, which leads to poorer classification accuracy for some objects. Although most of the objects are successfully classified by these methods, a few objects cannot be accurately classified, such as the flower pot, radio and wardrobe. From Figure 5, it can be seen that all these methods have their lowest classification accuracies for the flower pot, all less than 40%. The reason is that there are many styles of flower pots, some with very complex structures, so the network is not able to effectively identify them; moreover, the structure of the flower pot is similar to that of the bottle, vase and lamp, so flower pots are sometimes classified as these similar structures. Although the classification accuracy of the radio obtained in this paper is obviously higher than that of the other methods, all of them are lower than 70%. By the same token, the classification accuracies of the wardrobe by these methods are low. Although the classification accuracies of the flower pot, radio and wardrobe are lower than 65%, the classification accuracies of the other objects are greater than 75%. Simultaneously, the classification accuracies of more than half of the objects by the proposed method are higher than those of the other methods; especially for the curtain, desk, dresser, lamp, mantel, night stand and so on, the classification performance of the proposed method is superior to the other methods.
We also compared the shape retrieval performance of the proposed method with other networks on the ModelNet40 dataset. We selected a sample from the test dataset as a query and then retrieved the most similar shapes from the test dataset, using the mean Average Precision (mAP) as the evaluation metric. Table 2 shows that the mAP of the proposed method reaches 88.7, which is higher than that of the other networks. In addition, we used the shape retrieval results to conduct a qualitative assessment of the classification results. The shape retrieval results for three different categories are shown in Figure 6: the left side of Figure 6 shows the query shape, and the right side shows two rows of retrieval results, where the first row is the retrieval result of PointNet and the second row is the retrieval result of the proposed method. Wrong classification results are marked by the red squares in Figure 6.
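For context, shape retrieval here amounts to ranking the test shapes by the distance between their global feature vectors and scoring the ranking with average precision. The sketch below is our own minimal utility under those assumptions, not the evaluation code used in the paper.

```python
import numpy as np

def retrieve(query_feat: np.ndarray, gallery_feats: np.ndarray, topk: int = 10):
    """Rank gallery shapes by Euclidean distance to the query feature vector."""
    d = np.linalg.norm(gallery_feats - query_feat, axis=1)
    return np.argsort(d)[:topk]

def average_precision(ranked_labels, query_label):
    """AP for a single query, given the class labels of the ranked gallery shapes."""
    hits, precisions = 0, []
    for i, lab in enumerate(ranked_labels, start=1):
        if lab == query_label:
            hits += 1
            precisions.append(hits / i)
    return float(np.mean(precisions)) if precisions else 0.0

# mAP is the mean of average_precision over all query shapes in the test set.
```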

4.3. Performance Analysis of Point Cloud Part Segmentation

In order to evaluate the effectiveness of the proposed method in the field of part segmentation, we used the ShapeNet dataset to conduct the part segmentation, and the part of the point cloud dataset and its corresponding training samples are shown in Figure 7.
The segmentation results of the proposed method on the ShapeNet dataset were compared with those of other networks. Table 3 gives the per-category Intersection over Union (IoU) and the mean IoU of each network for the 16 categories.
From Table 3, it is visible that the proposed method achieves the highest IoU for six categories. Among them, the IoU values of the chair and pistol obtained by the proposed method are obviously higher than those of the other methods. The possible reason is that the structures of the chair and pistol are relatively simple, and the proposed method enhances the contextual information of their local features by further constructing the local domain, capturing higher-level semantic shape information. Although the IoU values of the other categories obtained by the proposed method are not the highest, they show only small differences from the highest IoU; for example, the IoU values of the knife and rocket obtained by the proposed method are almost the same as the highest ones. According to the IoU values of the different categories, we obtained the mIoU of the different methods. The comparison results show that the mIoU of part segmentation by the proposed method is 86.3%, which is 2.9%, 1.4%, 1.7%, 1.7%, 1.2%, 0.1% and 1.0% higher than PointNet, RS-Net, SCN, SPLATNet, DGCNN, RS-CNN and LRC-NET, respectively. The mIoU of the proposed method is almost the same as that of RS-CNN; the reason for this is that the network in this paper uses a local domain relational convolution similar to that in RS-CNN. The categories with higher IoU values are cap, chair, guitar, lamp, knife, laptop and mug, and the overall part segmentation of these categories is better. The proposed method obtains the highest IoU in the categories of aero, chair, lamp, pistol, table and laptop, as shown in Figure 8.
From Figure 8, it is clearly visible that the different parts of the six categories are accurately segmented by the proposed method. For the aero category, the airfoils are segmented exactly and are displayed in colors different from the fuselage; for the second aero in particular, the engines, fuselage, empennage and airfoils are displayed in different colors. For the chair category, although there are very many styles of chairs, these chairs are accurately segmented; the first chair displays different colors for the back, seat and legs. For the lamp, the bulb, holder, base and shade are accurately segmented by the proposed method. For the pistol, the barrel, grip and trigger are displayed in different colors, although some points of the trigger are misidentified as the barrel or grip; a possible reason is that the trigger part has fewer points and its features are difficult to extract. The table top and legs are segmented exactly by the proposed method. Since the structure of laptops is very simple, they are easy to segment, and this category has the highest segmentation accuracy among the 16 categories.
From Figure 9, it is clearly visible that the features learned by the edge convolution layer are mainly about simple features such as edges, corners and radians, while the relation convolution layer captures more complex semantic shape features such as chair backrests, table legs, knife handles and hat brims. The proposed method combines these two convolutional layers and has better performance in point cloud part segmentation by feature extraction.
We used the proposed method to conduct the part segmentation of the airplane, chair, guitar and pistol, and compared the segmentation results with those of the PointNet, DGCNN and RS-CNN methods, as shown in Figure 10. Each category is labeled with several different colors, which represent the different components of each category. For the airplane, the four methods can accurately segment the engines, fuselage, empennage and airfoils. However, the PointNet, DGCNN and RS-CNN methods misidentify a small number of points on the airfoil as engine points, as shown in the ellipse of Figure 10a. The chair legs are segmented by all four methods; however, the PointNet method misidentifies points of the back of the chair as points of the seat, and for the RS-CNN and DGCNN methods, a few points on the back of the chair are mistakenly recognized as seat points, as shown in the rectangle of Figure 10. The guitar mainly includes the head, neck and case, and they are successfully segmented by the four methods, although the segmentation quality differs; in particular, a few points on the neck are mistaken for the head of the guitar by the PointNet and DGCNN methods. Although the RS-CNN and the proposed method keep a few points on the head that do not belong to this part, their segmentation of the guitar head is superior to that of the PointNet and DGCNN methods, as shown in the circle of Figure 10. For the pistol, it is clearly visible that it is segmented into three parts. The square in Figure 10 shows that some trigger points segmented by PointNet, DGCNN and RS-CNN belong to other parts, whereas the trigger points segmented by the proposed method are relatively pure and do not mix in points of other parts.

4.4. Performance Analysis of Point Cloud Semantic Segmentation

In order to verify the effectiveness of the proposed method, the semantic segmentation results of the S3DIS dataset were obtained by the proposed method and compared with other networks, as shown in Table 4.
From Table 4, it is visible that the DGCNN is much less effective than the proposed method at segmenting objects such as the wall, window, door and bookcase. A possible reason is that the DGCNN constructs the local neighborhoods only once by using graph convolution networks. The RS-CNN achieved an IoU of 68.7% for windows, which is the highest among the compared networks; however, its segmentation accuracies for the sofa, board and clutter are lower than those of the proposed method. The possible reason is that the RS-CNN method ignores the dependence of the shape features between neighborhood points when describing local shape features. The overall segmentation performance of the proposed method is superior to the other networks, and it achieves the best segmentation results on 8 of the 13 categories. In particular, the proposed method obtains better segmentation accuracy for categories such as the ceiling, floor and wall, because it is able to fully extract point cloud features. From Table 4, it is clearly visible that the OA and mIoU of the proposed method are 5.4% and 8.2% higher than those of the DGCNN and 1.7% and 2.3% higher than those of the RS-CNN. The OA and mIoU of the proposed method are slightly lower than those of Point Transformer. The reason is that the Point Transformer network has a strong semantic segmentation ability for the door and chair, and its semantic segmentation accuracy for these two categories is much higher than that of the other methods.
Some of the segmentation results of the proposed method on the S3DIS dataset were obtained, as shown in Figure 11. The proposed method can more accurately segment the objects such as chairs, tables, etc., as shown in the dashed rectangle of Figure 11. It illustrates that the proposed method can identify the location range of the objects and mitigate the interference of incorrect classification with the correct results. Simultaneously, the leg part of a chair that connects to the floor is also accurately segmented by the proposed method. The reason for this result is that the proposed method enhances the ability to identify the detailed feature information of the sampled points and is able to determine the boundary range of an object more accurately. However, the proposed method cannot accurately segment a small number of objects that are close to each other, as shown in the ellipse of Figure 11. A possible reason for this phenomenon is that the local features of these two types of objects are very similar.

4.5. Robustness Analysis

In order to verify the robustness of the proposed method on sparse point clouds, 1024, 768, 512, 256 and 128 points were selected as the input of the training model, and the experimental parameters were the same as those in Section 4.1. From Figure 12, it is clearly visible that the classification accuracy of the proposed method is always higher than that of the other networks as the number of sampling points is reduced. The classification accuracy of the DGCNN decreases rapidly when the number of points is less than 512, and the classification accuracy of PointNet++, GAPNet and MSHANet decreases rapidly when the number of points is less than 256. This indicates that the proposed method has strong robustness in the classification of sparse point clouds.
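The robustness test amounts to re-evaluating the trained classifier on progressively sparser versions of each test shape. A minimal sketch of that evaluation loop is given below; random subsampling is assumed here, although farthest point sampling would serve equally well.

```python
import numpy as np
import torch

@torch.no_grad()
def accuracy_vs_density(model, test_points, test_labels, sizes=(1024, 768, 512, 256, 128)):
    """test_points: (M, 1024, 3) array of test shapes; test_labels: (M,) class ids."""
    results = {}
    for n in sizes:
        correct = 0
        for cloud, label in zip(test_points, test_labels):
            keep = np.random.choice(cloud.shape[0], n, replace=False)  # subsample to n points
            x = torch.as_tensor(cloud[keep], dtype=torch.float32).unsqueeze(0)
            correct += int(model(x).argmax(dim=1).item() == label)
        results[n] = correct / len(test_labels)
    return results
```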

5. Application of Semantic Segmentation in Outdoor Scenes

We used the proposed method to conduct semantic segmentation in outdoor scenes, and different objects are displayed in different colors after segmentation, as shown in Figure 13.
From Figure 13, it is clearly visible that the proposed method can accurately identify the spatial extent of various objects and precisely segment the target objects. For example, the proposed method divides outdoor scenes into man-made terrain, natural terrain, high vegetation, low vegetation, buildings, cars and other objects. The man-made terrain and natural terrain are accurately segmented by the proposed method. However, for some adjacent or structurally similar objects, such as low vegetation placed on buildings, segmentation errors can easily occur. The proposed method may also struggle to accurately extract features and identify objects with blurry boundaries. For example, segmentation errors may occur at the junctions between tall vegetation and landscapes.

6. Conclusions

This paper proposes a point cloud deep learning network based on multi-level feature fusion in a local domain, which is used in the field of point cloud classification and segmentation. We first used the dynamic graph convolutional network to extract feature information. Then, the feature points were used as input in reconstructing the local domain to obtain the low-dimensional relationship information between the feature points, which was mapped to higher dimensions by MLP to further extract the point cloud features. The proposed method was applied to classification and segmentation on the ModelNet40 and ShapeNet datasets and compared with other networks. For the ModelNet40 dataset, the proposed method can accurately classify most of the objects, and its classification OA is 93.8%. The classification accuracy of the proposed method is slightly higher than that of DGCNN, RS-CNN and PointConv, and better than that of the more recent MSHANet. Furthermore, the mIoU of part segmentation by the proposed method is 86.3%, which is similar to that of RS-CNN; the mIoU of these two networks is superior to the other networks. The proposed method also performs well in semantic segmentation: it achieves the best segmentation results on 8 of the 13 categories of the S3DIS dataset, and the OA and mIoU of its semantic segmentation are 89.5% and 73.0%, respectively. The segmentation result of the proposed method is better than that of DGCNN and RS-CNN, and only slightly lower than that of Point Transformer.
However, there are still some shortcomings of the proposed method, and the network is deficient in analyzing object classes in complex environments. The proposed method cannot accurately judge the edge points of some objects that are close to each other and have similar geometric features. The structure around an object will affect the judgment of the total structure of an object, and then affect the semantic segmentation of the object. Therefore, our future work will focus on the application of the proposed method in complex scenes and the ability to handle the edge of an object.

Author Contributions

Conceptualization, X.H.; methodology, X.C.; validation, H.D.; software, P.W.; formal analysis, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant Nos. 42271447, 42171428 and 42001374) and in part by the Fundamental Research Funds for Central Public Welfare Research Institutes (Grant Nos. CKSF2021449/GC and CKSF2021448/GC).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data are public data and are referenced in the paper.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum pointnets for 3d object detection from rgb-d data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 918–927.
2. Shi, G.; Zheng, L.; Wang, W.; Lu, K. Non-Scanning Three-Dimensional Imaging System with a Single-Pixel Detector: Simulation and Experimental Study. Appl. Sci. 2020, 10, 3100.
3. Brahmanandam, P.S. Prediction of Atmospheric Particulate Matter (PM2.5) Over Beijing, China using Machine Learning Approaches. Int. J. Eng. Res. Technol. 2021, 5, 443.
4. Lawin, F.J.; Danelljan, M.; Tosteberg, P.; Bhat, G.; Khan, F.S.; Felsberg, M. Deep projective 3D semantic segmentation. In Proceedings of the International Conference on Computer Analysis of Images and Patterns, Ystad, Sweden, 22–24 August 2017; Springer: Cham, Switzerland, 2017; pp. 95–107.
5. Maturana, D.; Scherer, S. Voxnet: A 3d convolutional neural network for real-time object recognition. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 922–928.
6. Gadelha, M.; Wang, R.; Maji, S. Multiresolution tree networks for 3d point cloud processing. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 103–118.
7. Huang, J.; You, S. Point cloud labeling using 3d convolutional neural network. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 2670–2675.
8. Graham, B.; Engelcke, M.; Van Der Maaten, L. 3d semantic segmentation with submanifold sparse convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9224–9232.
9. Liu, Z.; Tang, H.; Lin, Y.; Han, S. Point-voxel cnn for efficient 3d deep learning. Adv. Neural Inf. Process. Syst. 2019, 32, 1–11.
10. Jiang, M.; Wu, Y.; Zhao, T.; Zhao, Z.; Lu, C. Pointsift: A sift-like network module for 3d point cloud semantic segmentation. arXiv 2018, arXiv:1807.00652.
11. Su, H.; Maji, S.; Kalogerakis, E.; Learned-Miller, E. Multi-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 945–953.
12. Qi, C.R.; Su, H.; Nießner, M.; Dai, A.; Yan, M.; Guibas, L.J. Volumetric and multi-view cnns for object classification on 3d data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5648–5656.
13. Xu, C.; Wu, B.; Wang, Z.; Zhan, W.; Vajda, P.; Keutzer, K.; Tomizuka, M. Squeezesegv3: Spatially-adaptive convolution for efficient point-cloud segmentation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 1–19.
14. Wu, B.; Wan, A.; Yue, X.; Keutzer, K. Squeezeseg: Convolutional neural nets with recurrent crf for real-time road-object segmentation from 3d lidar point cloud. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 1887–1893.
15. Lee, M.-y.; Lee, S.-h.; Jung, K.-d.; Lee, S.-h.; Kwon, S.-c. A Novel Preprocessing Method for Dynamic Point-Cloud Compression. Appl. Sci. 2021, 11, 5941.
16. Graham, B. Spatially-sparse convolutional neural networks. arXiv 2014, arXiv:1409.6070.
17. Verdoja, F.; Thomas, D.; Sugimoto, A. Fast 3D point cloud segmentation using supervoxels with geometry and color for 3D scene understanding. In Proceedings of the 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, China, 10–14 July 2017; pp. 1285–1290.
18. Tchapmi, L.; Choy, C.; Armeni, I.; Gwak, J.; Savarese, S. Segcloud: Semantic segmentation of 3d point clouds. In Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October 2017; pp. 537–547.
19. Li, Y.; Pirk, S.; Su, H.; Qi, C.R.; Guibas, L.J. Fpnn: Field probing neural networks for 3d data. Adv. Neural Inf. Process. Syst. 2016, 29, 1–10.
20. Le, T.; Duan, Y. Pointgrid: A deep network for 3d shape understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 9204–9214.
21. Wang, F.; Yang, Y.; Wu, Z.; Zhou, J.; Zhang, W. Real-Time Semantic Segmentation of Point Clouds Based on an Attention Mechanism and a Sparse Tensor. Appl. Sci. 2023, 13, 3256.
22. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
23. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30, 1–14.
24. Jiang, L.; Zhao, H.; Liu, S.; Shen, X.; Fu, C.W.; Jia, J. Hierarchical point-edge interaction network for point cloud semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 10433–10441.
25. Deng, C.; Peng, Z.; Chen, Z.; Chen, R. Point Cloud Deep Learning Network Based on Balanced Sampling and Hybrid Pooling. Sensors 2023, 23, 981.
26. Dang, J.; Yang, J. HPGCNN: Hierarchical Parallel Group Convolutional Neural Networks for Point Clouds Processing. In Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020.
27. He, P.; Ma, Z.; Fei, M.; Liu, W.; Guo, G.; Wang, M. A Multiscale Multi-Feature Deep Learning Model for Airborne Point-Cloud Semantic Segmentation. Appl. Sci. 2022, 12, 11801.
28. Liu, Y.; Fan, B.; Xiang, S.; Pan, C. Relation-shape convolutional neural network for point cloud analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8895–8904.
29. Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. Randla-net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11108–11117.
30. Landrieu, L.; Simonovsky, M. Large-scale point cloud semantic segmentation with superpoint graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4558–4567.
31. Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1912–1920.
32. Yi, L.; Kim, V.G.; Ceylan, D.; Shen, I.C.; Yan, M.; Su, H.; Lu, C.; Huang, Q.; Sheffer, A.; Guibas, L. A scalable active framework for region annotation in 3d shape collections. ACM Trans. Graph. (ToG) 2016, 35, 1–12.
33. Armeni, I.; Sener, O.; Zamir, A.R.; Jiang, H.; Brilakis, I.; Fischer, M.; Savarese, S. 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1534–1543.
34. Li, J.; Chen, B.M.; Lee, G.H. So-net: Self-organizing network for point cloud analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9397–9406.
35. Wu, W.; Qi, Z.; Fuxin, L. Pointconv: Deep convolutional networks on 3d point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9621–9630.
36. Ju, M.; Ryu, H.; Moon, S.; Yoo, C.D. GAPNet: Generic-Attribute-Pose Network For Fine-Grained Visual Categorization Using Multi-Attribute Attention Module. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 703–707.
37. Gao, X.Y.; Wang, Y.Z.; Zhang, C.X.; Lu, J.Q. Multi-head self-attention for 3D point Cloud classification. IEEE Access 2021, 9, 18137–18147.
38. Fu, K.; Peng, J.; He, Q.; Zhang, H. Single image 3D object reconstruction based on deep learning: A review. Multimed. Tools Appl. 2021, 80, 463–498.
39. Xie, S.; Liu, S.; Chen, Z.; Tu, Z. Attentional shapecontextnet for point cloud recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4606–4615.
40. Su, H.; Jampani, V.; Sun, D.; Maji, S.; Kalogerakis, E.; Yang, M.H.; Kautz, J. Splatnet: Sparse lattice networks for point cloud processing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2530–2539.
41. Liu, X.; Han, Z.; Hong, F.; Liu, Y.S.; Zwicker, M. LRC-Net: Learning discriminative features on point clouds by encoding local region contexts. Comput. Aided Geom. Des. 2020, 79, 101859.
42. Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. Pointcnn: Convolution on x-transformed points. Adv. Neural Inf. Process. Syst. 2018, 31, 1–11.
43. Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6411–6420.
44. Zhao, H.; Jiang, L.; Jia, J.; Torr, P.H.; Koltun, V. Point transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 16259–16268.
Figure 1. Flowchart of the proposed method.
Figure 2. Proposed network architecture.
Figure 3. Structure of edge and relation convolution. (a) Edge Convolution Module. (b) Relation Convolution Module.
Figure 4. Extraction of feature center point in edge convolution.
Figure 5. Classification accuracy of different methods.
Figure 6. Shape retrieval of the proposed method and PointNet on the ModelNet40 dataset; red squares indicate wrong classification results.
Figure 7. Part of the point cloud dataset and its corresponding training samples; different colors represent different parts of objects.
Figure 8. Part segmentation of six categories by the proposed method; different colors represent different parts of objects.
Figure 9. Point cloud feature learning at the edge and relational convolution layers; different colors represent different parts of objects.
Figure 10. Part segmentation of different methods, where different colors represent different parts of objects: (a) PointNet; (b) DGCNN; (c) RS-CNN; (d) proposed method.
Figure 11. Semantic segmentation results of the proposed method on the S3DIS dataset; different colors represent different objects.
Figure 12. (a) Sparse point cloud; (b) classification accuracy of different sampling points by different methods.
Figure 13. The semantic segmentation results in outdoor scenes.
Table 1. Point cloud classification of different methods on the ModelNet40 dataset.

Methods            Points   mAcc (%)   OA (%)
PointNet           1024     86.2       89.2
PointNet++         1024     89.4       91.9
DGCNN              1024     90.2       92.9
RS-CNN             1024     –          92.9
SO-Net [34]        5000     87.3       90.9
PointConv [35]     1024     –          92.5
GAPNet [36]        1024     89.7       92.4
MSHANet [37]       1024     –          91.3
Proposed method    1024     91.4       93.8
Table 2. Shape retrieval results of different methods on the ModelNet40 dataset.

Method             mAP
PointNet           70.5
PointNet++         81.3
DGCNN              83.2
RS-CNN             85.3
Proposed method    88.7
Table 3. Segmentation results of different methods on the ShapeNet dataset.

Class         Number of Shapes   PointNet   RS-Net [38]   SCN [39]   SPLATNet [40]   DGCNN   RS-CNN   LRC-NET [41]   Proposed Method
aero          2690               83.4       82.7          83.8       81.9            84.0    83.5     82.6           84.5
bag           76                 78.7       86.4          80.8       83.9            83.7    84.8     85.2           86.1
cap           55                 82.5       84.1          83.5       88.6            84.4    88.8     87.4           88.6
car           898                74.9       78.2          79.3       79.5            77.8    79.6     79.0           79.3
chair         3758               89.6       90.4          90.5       90.1            90.6    91.2     90.7           92.4
ear phone     69                 73.0       69.3          69.8       73.5            74.4    81.1     80.2           80.9
guitar        787                91.5       91.4          91.7       91.3            91.0    91.6     91.3           91.5
knife         392                85.9       87.0          86.5       84.7            88.1    88.4     86.9           88.3
lamp          1547               80.8       83.5          82.9       84.5            83.4    86.0     84.5           86.6
laptop        451                95.3       95.4          96.0       96.3            95.8    96.0     95.5           96.9
motor         202                65.2       66.0          69.2       69.7            67.8    73.7     71.4           73.2
mug           184                93.0       92.6          93.8       95.0            93.3    94.1     93.8           94.7
pistol        283                81.2       81.8          82.5       81.7            82.3    83.4     79.4           84.9
rocket        66                 57.9       56.1          62.9       59.2            59.2    60.5     51.7           62.8
skate board   152                72.8       75.8          74.4       70.4            76.0    77.7     75.5           77.5
table         5271               80.6       82.2          80.8       81.3            81.9    83.6     82.6           83.9
mIoU          –                  83.4       84.9          84.6       84.6            85.1    86.2     85.3           86.3
Table 4. Semantic segmentation of different methods on the S3DIS dataset.

Category    PointNet   PointCNN [42]   DGCNN   RS-CNN   KPConv [43]   Point Transformer [44]   Proposed Method
ceiling     88.0       94.8            92.9    93.4     93.6          94.3                     94.9
floor       88.7       97.3            93.8    95.5     92.4          97.5                     97.6
wall        69.3       75.8            73.1    84.0     83.1          84.7                     84.9
beam        42.4       63.3            62.5    61.9     63.9          55.6                     63.8
column      23.1       51.7            55.9    57.6     54.3          58.1                     58.9
window      47.5       58.4            57.6    68.7     66.1          66.1                     68.6
door        51.6       57.2            59.2    66.5     76.6          78.2                     68.8
chair       42.0       69.1            66.7    65.2     57.8          74.1                     67.7
table       54.1       71.6            60.4    75.9     64.0          77.6                     77.9
bookcase    38.2       61.2            57.0    68.6     69.3          71.2                     71.5
sofa        9.6        39.1            54.8    62.9     74.9          67.3                     64.5
board       29.4       52.2            56.7    60.1     61.3          65.7                     65.8
clutter     35.2       58.6            51.6    59.1     60.3          64.8                     64.9
OA          78.5       88.1            84.1    87.8     –             90.2                     89.5
mIoU        47.6       65.4            64.8    70.7     70.6          73.5                     73.0