Article

Sharp Feature-Preserving 3D Mesh Reconstruction from Point Clouds Based on Primitive Detection

1 School of Artificial Intelligence, University of Chinese Academy of Sciences, No. 19 Yuquan Road, Shijingshan District, Beijing 100049, China
2 School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(12), 3155; https://doi.org/10.3390/rs15123155
Submission received: 28 April 2023 / Revised: 24 May 2023 / Accepted: 15 June 2023 / Published: 16 June 2023

Abstract

High-fidelity mesh reconstruction from point clouds has long been a fundamental research topic in computer vision and computer graphics. Traditional methods require dense triangle meshes to achieve high fidelity, but excessively dense triangles may lead to unnecessary storage and computational burdens, while also struggling to capture clear, sharp, and continuous edges. This paper argues that the key to high-fidelity reconstruction lies in preserving sharp features. Therefore, we introduce a novel sharp-feature-preserving reconstruction framework based on primitive detection. It includes an improved deep-learning-based primitive detection module and two novel modules that we propose for mesh splitting and selection. Our framework can accurately and reasonably segment primitive patches, fit meshes in each patch, and split overlapping meshes at the triangle level to ensure true sharpness while obtaining lightweight mesh models. Quantitative and visual experimental results demonstrate that our framework outperforms both the state-of-the-art learning-based primitive detection methods and traditional reconstruction methods. Moreover, our designed modules are plug-and-play: they not only apply to learning-based primitive detectors but can also be combined with other point cloud processing tasks, such as edge extraction or random sample consensus (RANSAC), to achieve high-fidelity results.

1. Introduction

Reconstructing 3D mesh surfaces from point clouds is a key research topic in computer vision and computer graphics, as it enables subsequent computer applications such as calculation, rendering, lighting, deformation, and physical simulation. In particular, achieving high-fidelity results has been a major focus in this field. Mesh reconstruction has significant value in various domains, such as reverse engineering, game animation, virtual and augmented reality, and robotics vision.
Traditional reconstruction algorithms [1,2] achieve high fidelity by producing dense, high-resolution triangle meshes, but this approach can create unnecessary storage or computational burden due to the large number of triangles.
For common manmade 3D objects, such as CAD models, urban buildings, and indoor furniture, the key to achieving high-fidelity reconstruction is to preserve their sharp edge features during reconstruction. Therefore, reconstruction methods based on geometric primitive detection may be the best way to achieve high fidelity [3]. This is because manmade objects can be represented as a combination of primitive surface patches such as planes, spheres, cylinders, and cones, among others.
Despite the many related studies in this area, there is currently no complete framework that allows users to easily obtain high-fidelity mesh models from point clouds. For example, traditional methods use algorithms such as random sample consensus (RANSAC) [4,5] or cluster segmentation [6,7] to segment point clouds and obtain primitive patches, but these methods are limited by the tedious tuning required for each model. Recent methods [8,9,10] leverage deep learning to learn semantics from large amounts of data, achieving better generalization performance and higher extraction accuracy. However, as shown in Figure 1, the recent state-of-the-art methods such as Parsenet [9] and HPNet [10] extract primitives by training neural networks in a supervised manner, but they are more focused on the accuracy of extraction and prediction and have not considered reconstruction results in detail. For a more detailed introduction of related work, please refer to Section 2.
In this paper, we propose a full-process reconstruction framework based on primitive detection. Our goal is to provide a complete and user-friendly solution that reconstructs high-fidelity mesh models from point clouds while preserving sharp features. It includes (1) an improved deep-learning-based primitive extraction module, equipped with a necessary post-processing refine submodule, which achieves the best performance to date in extracting primitive patches from point clouds; (2) an efficient mesh fitting and splitting module that preserves sharp features; and (3) a brand-new selection module based on linear optimization that ensures a watertight final result. For technical details of each module, please refer to Section 3.
Visual and quantitative experimental results show that our reconstruction framework has the following advantages: (1) High fidelity. As shown in Figure 1, our framework produces reconstruction models that are visually close to ground truth, with clear and sharp boundaries. (2) Lightweight representation. Unlike previous dense mesh reconstruction methods, our approach requires fewer triangles to capture object boundary details, resulting in lightweight models. (3) Flexibility. Our proposed modules are versatile and can be combined with other tasks, such as automatic or semiautomatic edge extraction, to produce high-quality reconstruction models. Moreover, our method can generate sharp and lightweight mesh models from real-scanned point clouds, as shown in Figure 2, which is the result of our modules combined with RANSAC applied to point clouds of urban buildings. For more experimental details, please refer to Section 4.
In conclusion, this work presents the following contributions:
  • We propose a novel complete framework for reconstructing meshes from point clouds based on primitive detection. Our framework can accurately preserve sharp and clear boundary features and generate high-fidelity reconstruction models.
  • The framework includes an improved learning-based primitive detection module. Experiments show that it outperforms previous methods, with Seg-IoU and Type-IoU scores improving from 85.24/91.04 to 88.42/92.85. In addition, we specifically designed a refine submodule to optimize the detected segmentation, obtaining more reasonable segmentation patches.
  • The framework also includes an efficient module for mesh splitting that can separate overlapping meshes at the triangle level, producing clear and continuous segmentation blocks. This module helps our framework reconstruct high-quality sharp edges, and it can be well parallelized.
  • Our framework also features a novel optimization selection module, which treats the reconstruction task as a minimum subset selection problem. In our framework, this module is responsible for selecting the optimal subset from the already split mesh collection, to obtain the optimal reconstruction result. The design of this module considers both the local and global information of the input model.
Next, in Section 2, we introduce related work, and in Section 3, we detail our framework and each module’s specifics. Then, in Section 4, we present visualization and quantitative experimental results, and explore the flexibility and robustness of our proposed framework for transfer to other tasks through a series of exploratory experiments.

2. Related Work

The technologies most relevant to this article are surface reconstruction and primitive detection from point clouds, which we briefly describe in this section. For a more comprehensive discussion, please refer to the recent surveys [1,2,3].

2.1. Surface Reconstruction

2.1.1. Non-Learning-Based Methods

Reconstructing the mesh surface from point clouds is an ill-posed problem in the absence of reasonable priors [1,2]. Early classic reconstruction methods employed smoothness as a prior to infer the implicit surface from the point cloud globally or locally, after which isosurface extraction algorithms such as marching cubes [11] are used to obtain the explicit mesh. Representative works include methods based on moving least squares (MLS) [12,13], radial basis functions (RBFs) or their Hermite variant (HRBF) [14,15,16], and the currently most commonly used Poisson reconstruction [17,18]. The reconstruction quality of these methods is limited by the resolution of the triangles, often requiring dense triangles to ensure high fidelity, especially for sharp edges, which causes unnecessary computational and storage pressure.
Therefore, some subsequent methods use piecewise smoothness as a prior to preserve sharp edges while improving the reconstruction performance. For example, Robust MLS (RMLS) [19] fits feature boundaries through two different smoothing components, and points on different smoothing components can be regarded as outliers of each other. Robust implicit MLS (RIMLS) [20] assumes that the points on two different smooth components can be regarded as outliers in both the space and the gradient domains; however, they are only processed locally and are prone to jagged edges at the border. Edge-aware resampling (EAR) [21] uses a bilateral mechanism to smooth normals and separates the sharp feature boundaries from the smooth area. The locally optimal projector (LOP) [22] smooths and resamples the area far from feature boundaries, whereas interpolation projection upsamples and enhances sharp feature areas. Though these methods preserve or enhance sharp edges, they still produce jagged edges or artifacts at the boundaries and do not obtain complete and continuous boundaries compared to our method.

2.1.2. Learning-Based Methods

Recently, deep-learning-based reconstruction techniques have gained popularity. These methods propose the use of neural networks to learn the continuous surface of objects, also known as neural implicit representation, and were first introduced in pioneering works such as IM-Net [23], DeepSDF [24], and Occupancy Networks [25]. Unlike commonly used explicit representations such as point clouds, voxel grids, and meshes, neural implicit representation trains a discriminator to output the occupancy or the distance to the surface (signed distance function, SDF) based on the query points in space during training. During inference, a discrete grid is utilized, with each grid cell being queried to obtain the occupancy or SDF value, and marching cubes [11] are used to obtain an explicit mesh.
Subsequent follow-up research proposed several important observations. The first is that the network should pay more attention to local information, which is conducive to better generalization. Points2Surf [26] proposes to sample two patches, local and global, based on query points in the input point cloud, and to learn two encoders as two separate branches, combining local and global information to learn the discriminator. Local Implicit Grid (LIG) [27] proposes not to learn the entire object directly, but rather to learn local parts; during reconstruction, the learned patches are combined using an optimization algorithm to obtain the optimal combination. This is similar to our approach, but we learn primitive patches with geometric meaning, while they cut patches randomly. The second observation is that convolutional networks have stronger representation abilities than the fully connected networks used in the early stages, which can bring better reconstruction results. ConvONet [28] projects features extracted from the point cloud onto a feature grid, further aggregates features using a 3D CNN on the feature grid, and finally hands them over to a discriminator to generate the model. Overall, existing methods based on neural implicit representation have not paid special attention to boundaries and high fidelity and therefore cannot obtain high-quality mesh models comparable to ours.
In Section 4.2, we provide a detailed comparison of our method with representative works mentioned above, and we give a more detailed analysis.

2.2. Primitive Detection

In recent decades, higher-level priors, such as semantic and geometric priors, have received increasing emphasis in reconstruction research. Primitive-based approaches assume that objects can be represented by a combination of simple, standard geometric shapes (such as planes, spheres, cylinders, and cones).

2.2.1. Non-Learning-Based Methods

Popular non-learning primitive detection methods are mainly based on random sample consensus (RANSAC) [4] and its variants [5,29], which are widely used due to their robustness to outliers. Among them, Schnabel et al. [5] first proposed extracting primitives such as planes, spheres, cylinders, and cones from point clouds based on RANSAC and reconstructing models through their combination. Schnabel et al. [29] extrapolated all primitives detected by RANSAC and calculated the intersections between them, interpreting the extrapolation of the primitives as a graph cut problem.
A limitation of RANSAC-based methods is that if some parts of the point cloud model cannot be well represented by the defined primitives, no reconstruction result can be obtained. Lafarge et al. [30] attempted to solve this problem through a hybrid approach: part of the regular structure is represented by planes and their combinations, while the areas without primitives are reconstructed using a graph cut method based on Delaunay triangulation. Nan et al. [31] used planes to approximate all surfaces: they obtained a set of candidate faces through the intersection of planes detected by RANSAC and then selected the optimal subset through binary linear optimization to generate lightweight reconstruction results. However, this approach is only suitable for objects composed entirely of planes.
In general, non-learning primitive-based methods suffer from the trouble of parameter tuning, which requires manual adjustments for each model and primitive. However, the idea of combining the primitives to generate models by selection and optimization inspired our work.

2.2.2. Learning-Based Methods

Recent works extract features from point clouds by supervised training of a neural network to detect primitives. CSGNet [32] and CAPRI-Net [33] reconstruct objects by combining the detected quadric surface primitives via constructive solid geometry (CSG) operations. However, they require a labeled hierarchical structure for primitives, which may not always be readily available.
The most relevant works to our framework are SPFN [8], ParSeNet [9], and HPNet [10]. SPFN is the pioneering work among them; it predicts pointwise properties (segmentation labels, type labels, and normals) and fits the primitive parameters through a differentiable model estimation module. ParSeNet adds the prediction of B-spline patches on the basis of SPFN, which increases the expressive ability of surface models. Both SPFN and ParSeNet use only high-dimensional semantic supervision information for prediction. HPNet improves the prediction accuracy by introducing additional geometric features (sharp edges).
Learning-based methods avoid trivial manual tuning of parameters, but the supervised methods still suffer from the problem of data dependence. It is foreseeable that as larger models and larger datasets are continuously proposed, learning-based methods will have more room for improvement.

3. Method

The input of our framework is a 3D point cloud $\mathcal{P} = \{p_i \mid 1 \le i \le N\}$, where each point $p_i$ contains a position $\mathbf{p}_i \in \mathbb{R}^3$ and a normal $\mathbf{n}_i \in \mathbb{R}^3$; therefore, $p_i \in \mathbb{R}^6$. Our goal is to obtain a high-fidelity, watertight mesh model.
Figure 3 shows the pipeline of our framework; this section details the implementation of the primitive detection module, the mesh fitting and splitting module, and the selection module. More experiments are detailed in Section 4.

3.1. Primitive Detection Module

3.1.1. Coarse Primitive Detection Based on Supervised Learning

As shown in Figure 3, our primitive detection module adopts a two-step strategy, which first applies an improved supervised-learning-based coarse primitive detector. Here, we first briefly introduce the previous state-of-the-art primitive detector HPNet [10], and then describe our improvements based on it.
HPNet employs a two-stage hybrid model that combines neural networks with geometric constraints. The trainable part employs a three-layer DGCNN [34] as the backbone encoder, which encodes each point $p_i \in \mathcal{P}$ into an $\mathbb{R}^{256}$ feature space and then outputs through multiple classification heads: an embedding descriptor $e_i \in \mathbb{R}^{128}$, a binary primitive type prediction vector $t_i \in \{0,1\}^6$ (corresponding to six primitive types: plane, sphere, cylinder, cone, B-spline-open, and B-spline-closed), and a shape parameter prediction vector $s_i \in \mathbb{R}^{22}$. For the nontrainable post-processing part, HPNet constructs two constraints, named geometric consistency and smoothness consistency, according to the primitive parameters and the known normals. Guided by these two constraints, the embedding descriptors $e \in \mathbb{R}^{N \times 128}$ are clustered by mean-shift [35] to obtain the final $K$ patches $\mathcal{P}_k$, $1 \le k \le K$, with $\mathcal{P} = \mathcal{P}_1 \cup \dots \cup \mathcal{P}_K$.
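As an illustration, the clustering step above can be sketched in a few lines of Python. This is a minimal sketch, not HPNet's actual implementation: the bandwidth value and the helper name are our own placeholders, and the consistency-constraint weighting is omitted for brevity.

```python
# Hypothetical sketch: cluster per-point embedding descriptors into
# primitive patches with mean-shift (the constraint-based weighting
# used by HPNet is omitted here).
import numpy as np
from sklearn.cluster import MeanShift

def cluster_embeddings(e: np.ndarray, bandwidth: float = 0.8) -> np.ndarray:
    """e: (N, 128) per-point embeddings -> (N,) patch labels 0..K-1."""
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    return ms.fit_predict(e)  # labels define the patches P_1, ..., P_K
```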
HPNet achieves the best primitive detection performance due to its use of geometric constraints, but its main limitation comes from the DGCNN backbone, which makes the throughput during training significantly lower than that of other learning-based point cloud processing methods, resulting in higher training costs. In addition, the recent work PointNeXt [36] shows that even the most classic and widely used backbone, PointNet++ [37], can outperform some recent, complex backbones (such as PointMLP [38] and Point Transformer [39]) after simply adopting improved training strategies.
Therefore, as per Figure 4, we keep the same two-stage design as HPNet and make the following enhancements: (1) we replaced DGCNN with PointNeXt-b (a classic PointNet++ equipped with the bottleneck structure [40]), resulting in a significant improvement in throughput and detection performance; (2) we switched from the Adam optimizer [53] to AdamW; (3) we implemented cosine learning rate decay; and (4) we incorporated label smoothing. As a result, our primitive detection network achieved higher segmentation mean IoU and primitive type mean IoU scores of 88.42/92.85 on the ABCParts [9] benchmark, compared to the original scores of 85.24/91.04. Additionally, we improved throughput from 8 ins/sec to 28 ins/sec. For more detailed experimental information, please refer to Section 4.
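Enhancements (2)-(4) correspond to standard PyTorch components. The following minimal sketch shows how they fit together; the backbone stand-in, weight decay, and smoothing values are illustrative assumptions, not our exact experimental settings.

```python
# Sketch of the improved training strategies: AdamW, cosine learning rate
# decay, and label smoothing. The model is a stand-in, not PointNeXt-b.
import torch

model = torch.nn.Linear(6, 256)  # placeholder for the PointNeXt-b backbone
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150)
type_loss = torch.nn.CrossEntropyLoss(label_smoothing=0.1)  # e.g., type head
```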

3.1.2. Refine Clustering via Normal Angle

The accuracy of the segmentation results obtained from the primitive detection network depends heavily on the quality of the labels used in its supervised training. For instance, the ABCParts [9] benchmark dataset is automatically labeled and oversegmented, which can negatively impact the segmentation results. To address this issue, we introduce a refine clustering submodule that can optionally be used to mitigate oversegmentation by enforcing normal angle constraints. This submodule can be skipped if the upstream task already produces accurate segmentation results.
Specifically, our submodule first eliminates patches with fewer than $N_{min}$ points and computes the convex hull for the remaining patches. Next, taking two adjacent patches, $\mathcal{P}_p$ and $\mathcal{P}_q$, as an example, we detect their adjacency relationship and determine whether to merge them. Here, we follow the clustering approach of P-linkage [41] to automatically cluster each patch into a collection of smaller slices, resulting in cluster sets $C_p = \{c_p^i \mid 1 \le i \le N_p\}$ and $C_q = \{c_q^j \mid 1 \le j \le N_q\}$. Then, we apply principal component analysis (PCA) [42] locally to each $c_p$ and $c_q$ to compute their normals, representing each patch as multiple slices with normals. Finally, if more than $N_c$ slice pairs satisfy Equation (1), the two patches $\mathcal{P}_p$ and $\mathcal{P}_q$ have a good adjacent smooth transition, and they are merged:

$$\arccos\left(\left| n(c_p) \cdot n(c_q) \right|\right) \le \theta_t \qquad (1)$$

where $\theta_t$ is the angle threshold that determines the curvature of the surface to be merged.
In summary, our refine clustering submodule effectively removes disturbances caused by trivial patches and merges oversegmented patches.
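A minimal sketch of the merge test is given below. It assumes the slice normals have already been computed by PCA; testing all slice pairs and the default threshold values are our own simplifications.

```python
# Sketch of the Equation (1) merge test between two adjacent patches,
# each represented as a set of slice normals.
import numpy as np

def should_merge(normals_p, normals_q, theta_t=np.deg2rad(15.0), n_c=5):
    """normals_p, normals_q: (Np, 3), (Nq, 3) unit slice normals."""
    count = 0
    for n_p in normals_p:
        for n_q in normals_q:
            angle = np.arccos(np.clip(abs(np.dot(n_p, n_q)), 0.0, 1.0))
            if angle <= theta_t:     # Equation (1): smooth transition
                count += 1
    return count >= n_c              # enough smooth slice pairs -> merge
```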

3.2. Mesh Fitting and Splitting Module

After the point cloud patches have been extracted by the primitive detection module, this section introduces our mesh fitting and splitting module, which fits a mesh to each patch and splits intersecting meshes. The module consists of four steps: mesh fitting, intersection line detection, pairwise splitting, and partitioning triangles in nonintersecting areas.
For mesh fitting, we use the method of Huang et al. [16] (other meshing methods can also be used, such as screened Poisson reconstruction [18], which handles most cases faster but may not be able to fill holes) to produce triangular meshes $S = \{S_i \mid 1 \le i \le N_s\}$; this method can handle cases with holes or missing data. Note that a slightly wider grid or scale should be set to ensure overlapping and intersection between meshes (as in Figure 3).
For intersection line detection, we iterate through the surfaces and check pairwise for intersections. For a pair of surfaces $S_a$ and $S_b$ in $S$, we first calculate the axis-aligned bounding boxes [43] of all triangles on each surface separately. Collision detection on the bounding boxes then determines all intersecting triangle pairs on the two surfaces, that is, where the intersection lines are. This is more efficient than traversing all triangles to compute the intersection line.
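The idea can be sketched as follows. For clarity, this brute-forces all box pairs, whereas in practice the box-intersection machinery of [43] (or any sweep/BVH structure) avoids the quadratic pass.

```python
# Sketch: find candidate intersecting triangle pairs between two surfaces
# by overlapping axis-aligned bounding boxes.
import numpy as np

def aabb(tri):
    """tri: (3, 3) vertex array -> (min corner, max corner)."""
    return tri.min(axis=0), tri.max(axis=0)

def boxes_overlap(a, b):
    (amin, amax), (bmin, bmax) = a, b
    return bool(np.all(amin <= bmax) and np.all(bmin <= amax))

def candidate_pairs(tris_a, tris_b):
    """Lists of (3, 3) triangles -> index pairs near the intersection line."""
    boxes_a = [aabb(t) for t in tris_a]
    boxes_b = [aabb(t) for t in tris_b]
    return [(i, j) for i, ba in enumerate(boxes_a)
            for j, bb in enumerate(boxes_b) if boxes_overlap(ba, bb)]
```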
For pairwise splitting, the intersection line detection has already identified pairs of intersecting triangles. Take $\triangle a \in S_a$ and $\triangle b \in S_b$ in Figure 5 as an example; we fit triangle $\triangle b$ to plane $\bar{b}$ and divide triangle $\triangle a$ according to the relationship between the intersection points and the following vector:

$$v = \vec{p_1 p_2} \times u \qquad (2)$$

$$\triangle p_1 a_3 p_2 \in \begin{cases} S_a(A), & \vec{p_1 a_3} \cdot v \ge 0 \\ S_a(B), & \vec{p_1 a_3} \cdot v < 0 \end{cases} \qquad (3)$$

where $p_1, p_2$ are the intersection points of plane $\bar{b}$ and triangle $\triangle a$, and $u$ is the known normal vector pointing either inward or outward of the model. Vector $v$ lies on the plane of triangle $\triangle a$ and is perpendicular to $\vec{p_1 p_2}$ (Equation (2)). If $\vec{p_1 a_3} \cdot v \ge 0$, triangle $\triangle p_1 a_3 p_2$ is included in set $S_a(A)$ (marked in red in the figure), and triangles $\triangle p_1 p_2 a_2$ and $\triangle p_1 a_2 a_1$ are included in set $S_a(B)$ (marked in green in the figure). Otherwise, triangle $\triangle p_1 a_3 p_2$ is included in set $S_a(B)$, and triangles $\triangle p_1 p_2 a_2$ and $\triangle p_1 a_2 a_1$ are included in set $S_a(A)$. Note that the vectors $\vec{p_1 p_2}$ of adjacent triangles should point in the same direction.
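The classification in Equations (2) and (3) reduces to one cross product and one sign test per sub-triangle, as the following sketch (with helper names of our own) shows.

```python
# Sketch of the side test from Equations (2) and (3): decide whether the
# sub-triangle (p1, a3, p2) belongs to S_a(A) or S_a(B).
import numpy as np

def side_of(a3, p1, p2, u):
    """a3, p1, p2, u: (3,) arrays; u is the known reference normal."""
    v = np.cross(p2 - p1, u)                        # Equation (2)
    return 'A' if np.dot(a3 - p1, v) >= 0 else 'B'  # Equation (3)
```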
For partitioning triangles in a nonintersecting area, such as $\triangle o$ in Figure 6, we locate the intersection point $p$ on the intersection line closest to $\triangle o$, then categorize $\triangle o$ into either $S_a(A)$ or $S_a(B)$ based on the already divided triangles near $p$, according to the following inequality:

$$\frac{\sum_{i=1}^{k} \mathrm{dis}(o, a_i)}{\sum_{i=1}^{k} \mathrm{dis}(p, a_i)} \le \frac{\sum_{i=1}^{k} \mathrm{dis}(o, b_i)}{\sum_{i=1}^{k} \mathrm{dis}(p, b_i)} \qquad (4)$$

where $k$ is the size of the K-neighborhood of the two divided triangle sets $S_a(A)$ and $S_a(B)$ taken at point $p$, and $\mathrm{dis}(a, b)$ represents the Euclidean distance from $\triangle a$ to $\triangle b$. If the inequality holds, $\triangle o$ is assigned to $S_a(A)$; otherwise, it is assigned to $S_a(B)$. In this way, we split $S_a$ into two subsurfaces, $S_a = S_a(A) \cup S_a(B)$.
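A sketch of this assignment follows; approximating triangle-to-triangle distances by centroid distances is a simplification of ours.

```python
# Sketch of Equation (4): assign a nonintersecting triangle o to the side
# whose k nearest already-split triangles (around the closest intersection
# point p) it is relatively closer to.
import numpy as np

def assign_side(o, p, neigh_a, neigh_b):
    """o, p: (3,) centroids; neigh_a, neigh_b: (k, 3) neighbor centroids."""
    ratio_a = (np.linalg.norm(neigh_a - o, axis=1).sum()
               / np.linalg.norm(neigh_a - p, axis=1).sum())
    ratio_b = (np.linalg.norm(neigh_b - o, axis=1).sum()
               / np.linalg.norm(neigh_b - p, axis=1).sum())
    return 'A' if ratio_a <= ratio_b else 'B'   # Equation (4)
```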
We apply these operations to all surfaces, splitting them into a set of candidate faces $S_{candi}$ that are segmented along intersection lines. This approach is crucial for producing high-quality boundaries in the final model. Note that this meshing and splitting algorithm operates discretely on each surface and triangle and can be well parallelized. In practice, the parallelized C++ implementation yields an efficiency improvement of approximately $10\times$, as discussed in Section 4.

3.3. Selection Module

We treat the reconstruction problem as an optimal subset selection problem, and for this we introduce the selection module to choose appropriate meshes from the candidate surfaces $S_{candi}$ and combine them into a reasonable reconstruction model. We define the following data fitting and 3D structural similarity energy terms to constitute the optimization objective function.

3.3.1. Energy Terms

The set $S_{candi} = \{s_i \mid 1 \le i \le N_{sc}\}$ of $N_{sc}$ candidate surfaces is known from the mesh fitting and splitting module; we define the binary variable $x_i$ to represent whether $s_i$ is chosen ($x_i = 1$) or not ($x_i = 0$).
Data fitting. This term assesses the degree of alignment between the generated surfaces and the point sets while taking into consideration the points’ confidence [31,44]. Its definition is as follows:
$$E_f = 1 - \frac{1}{N} \sum_{i=1}^{N_{sc}} x_i \cdot \mathrm{supp}(s_i) \qquad (5)$$
where $N$ represents the total number of points in the point cloud $\mathcal{P}$, while $\mathrm{supp}(s_i)$ factors in the distance between the point and the surface, the point distribution, and the local sampling uniformity, as follows:
$$\mathrm{supp}(s) = \sum_{\{p, s \,\mid\, \mathrm{dist}(p, s) < \epsilon\}} \left(1 - \frac{\mathrm{dist}(p, s)}{\epsilon}\right) \cdot \mathrm{conf}(p) \qquad (6)$$

$$\mathrm{dist}(p, s) = \min\left\{\mathrm{dist}(p, f_j) \mid f_j \in s,\ 1 \le j \le N_f\right\} \qquad (7)$$

$$\mathrm{conf}(p) = \frac{1}{3} \sum_{i=1}^{3} \left(1 - \frac{3\lambda_i^1}{\lambda_i^1 + \lambda_i^2 + \lambda_i^3}\right) \cdot \frac{\lambda_i^2}{\lambda_i^3} \qquad (8)$$
where $\mathrm{dist}(p, s)$ in Equation (7) represents the Euclidean distance from a point $p \in \mathcal{P}$ to the candidate surface $s$. Only points whose distance from the surface is less than $\epsilon$ are considered. Mesh $s$ contains $N_f$ faces, denoted by $f$.
$\lambda_i^1 \le \lambda_i^2 \le \lambda_i^3$ in Equation (8) are the three eigenvalues of the covariance matrix at scale $i$. The term $1 - 3\lambda^1 / (\lambda^1 + \lambda^2 + \lambda^3)$ in $\mathrm{conf}(p)$ measures the quality of fitting a tangent plane in the local neighborhood: a value closer to 0 indicates a poor point distribution, whereas a value of 1 implies a perfectly fitting plane. The term $\lambda^2 / \lambda^3$ in $\mathrm{conf}(p)$ gauges the uniformity of point sampling in the local neighborhood; its ratio ranges from 0 to 1, with 0 indicating a perfect line distribution and 1 representing a uniform disk distribution.
In essence, the data fitting term biases the final result towards selecting candidate faces that are proximal to the input points and have a dense and uniform point distribution.
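For illustration, $\mathrm{conf}(p)$ in Equation (8) can be sketched as below; the three neighborhood sizes are placeholders of ours.

```python
# Sketch of Equation (8): per-point confidence from the eigenvalues of
# local covariance matrices at three neighborhood scales.
import numpy as np

def conf(p_idx, points, scales=(16, 32, 64)):
    """points: (N, 3) array; returns conf(p) for the point at p_idx."""
    total = 0.0
    for k in scales:
        d = np.linalg.norm(points - points[p_idx], axis=1)
        nbrs = points[np.argsort(d)[:k]]            # k nearest neighbors
        lam = np.linalg.eigvalsh(np.cov(nbrs.T))    # ascending: l1 <= l2 <= l3
        planarity = 1.0 - 3.0 * lam[0] / max(lam.sum(), 1e-12)
        uniformity = lam[1] / max(lam[2], 1e-12)
        total += planarity * uniformity
    return total / 3.0
```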
3D structural similarity. To ensure the reliability of the reconstruction results, we cannot rely solely on data fitting, because it may stubbornly select surfaces around data points, and the input point cloud may contain defects such as noise or missing data. Additionally, there may be gaps in the boundary area after the primitive detection module, and mesh splitting may lead to nonunique intersections in the missing or gap areas (shown in Figure 7a). These factors make data fitting alone unable to choose a reasonable result. Hence, we add a 3D structural similarity term, which considers the global structure information of the model, to the objective function.
The input point set $\mathcal{P}$ contains structural information, and the final selected surface set $S_{out} \subseteq S_{candi}$ should be structurally similar to the input point cloud. The term is defined as

$$E_{ss} = 1 - \mathrm{similarity}(S_{out}, \mathcal{P}) \qquad (9)$$

$$S_{out} = \bigcup_{i=1}^{N_{sc}} x_i \cdot s_i \qquad (10)$$

where $\mathcal{P}$ is the known set of segmented patches, and the range of $\mathrm{similarity}(S_{out}, \mathcal{P})$ is $(0, 1]$, where the closer the value is to 1, the higher the similarity. It is defined as

$$\mathrm{similarity}(S_{out}, \mathcal{P}) = \frac{2\mu_S \cdot \mu_P}{\mu_S^2 + \mu_P^2 + \eta} \cdot \frac{2\sigma_S \cdot \sigma_P}{\sigma_S^2 + \sigma_P^2 + \eta} \cdot \frac{\sigma_{SP}}{\sigma_S \cdot \sigma_P + \eta} \qquad (11)$$

where $\eta$ is a small number to avoid division by zero. $\mu_P$ and $\sigma_P$ represent the mean and variance of $\mathcal{P}$, $\mu_S$ and $\sigma_S$ represent the mean and variance of $S_{out}$, and $\sigma_{SP}$ represents the covariance between $\mathcal{P}$ and $S_{out}$. These values are computed by randomly sampling the same number of points from both the point cloud and the mesh and calculating them based on the coordinates of the sampled points. According to Equation (10), they can be written as follows:

$$\mu_S = \sum_{i=1}^{N_{sc}} x_i \cdot \mu_{s_i}, \quad \sigma_S = \sigma\!\left(\sum_{i=1}^{N_{sc}} x_i \cdot s_i\right), \quad \sigma_{SP} = \sigma\!\left(\sum_{i=1}^{N_{sc}} x_i \cdot s_i,\ \mathcal{P}\right) \qquad (12)$$
Similarly, a fixed number of sampling points is used to calculate them. This transforms the problem into a binary linear combination optimization problem, which is optimized together with the data fitting term.
The 3D structural similarity term aims to minimize the distribution difference between the surface set $S_{out}$ and the point set $\mathcal{P}$, resulting in a reconstruction whose global structure is similar to $\mathcal{P}$.
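A minimal sketch of Equation (11) is shown below, treating the flattened coordinates of the sampled points as one-dimensional signals (one possible reading of the computation described above).

```python
# Sketch of Equation (11): SSIM-style similarity between point samples
# drawn from the selected mesh union and from the input point cloud.
import numpy as np

def similarity(samples_s, samples_p, eta=1e-8):
    """samples_s, samples_p: equal-length 1D arrays of sampled coordinates."""
    mu_s, mu_p = samples_s.mean(), samples_p.mean()
    sd_s, sd_p = samples_s.std(), samples_p.std()
    cov_sp = np.mean((samples_s - mu_s) * (samples_p - mu_p))
    return ((2 * mu_s * mu_p) / (mu_s**2 + mu_p**2 + eta)
            * (2 * sd_s * sd_p) / (sd_s**2 + sd_p**2 + eta)
            * cov_sp / (sd_s * sd_p + eta))
```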
Figure 7 illustrates the effect of this term. When there are gaps between patches and nonunique intersections, the data fitting term alone may not result in a reasonable reconstruction (Figure 7b). However, after adding the 3D structural similarity term (Figure 7c), the optimizer can achieve the desired result.

3.3.2. Optimization

We use the energy terms defined above to formulate the following optimization model, which selects the best set of candidate surfaces and ensures the watertightness of the final model through hard constraints:
$$\min_{x} \ \lambda_f \cdot E_f + \lambda_{ss} \cdot E_{ss} \quad \mathrm{s.t.} \quad \sum_{j \in N(e_i)} x_j = 2 \ \mathrm{or} \ 0, \ 1 \le i \le |E|; \quad x_i \in \{0, 1\}, \ 1 \le i \le N_{sc} \qquad (13)$$

The constraint $\sum_{j \in N(e_i)} x_j = 2 \ \mathrm{or} \ 0$ ensures that an intersecting boundary $e_i$ is adjacent to either exactly two surfaces or none in the final result, thereby ensuring the watertightness of the final model. Here, $E$ denotes the set of intersecting boundaries and $N(e_i)$ the candidate surfaces adjacent to $e_i$. We solve this binary optimization model using the Gurobi [45] optimizer.
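For illustration, the model can be set up with gurobipy roughly as follows. This is a simplified sketch under two assumptions of ours: the energy terms have been linearized into per-candidate coefficients, and the "= 2 or 0" disjunction is encoded with one auxiliary binary variable per boundary.

```python
# Sketch of the binary program in Equation (13) using gurobipy.
import gurobipy as gp
from gurobipy import GRB

def select_surfaces(n_sc, fit_cost, ss_cost, edge_neighbors,
                    lam_f=0.5, lam_ss=0.5):
    """fit_cost, ss_cost: per-candidate coefficients (assumed precomputed);
    edge_neighbors: for each boundary e_i, indices of adjacent candidates."""
    m = gp.Model("selection")
    x = m.addVars(n_sc, vtype=GRB.BINARY)
    for nbrs in edge_neighbors:
        y = m.addVar(vtype=GRB.BINARY)   # y = 1 -> boundary used by 2 faces
        m.addConstr(gp.quicksum(x[j] for j in nbrs) == 2 * y)
    m.setObjective(gp.quicksum((lam_f * fit_cost[i] + lam_ss * ss_cost[i]) * x[i]
                               for i in range(n_sc)), GRB.MINIMIZE)
    m.optimize()
    return [i for i in range(n_sc) if x[i].X > 0.5]
```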

4. Results

4.1. Datasets

We present experimental results on two widely used datasets of manmade objects, namely, ABCParts [9] and Thingi10K [46]. ABCParts is a subset of the ABC dataset [47] and has recently become a standard benchmark for learning-based primitive detection methods [9,10,48]. It comprises point clouds of 30 k CAD models, where each point cloud has 10,000 points and at least one curved surface. We trained and evaluated our primitive detection network on this dataset.
For the nonlearnable modules, Thingi10K is more challenging for demonstrating algorithm performance. It consists of over 10 k objects uploaded by users for 3D printing. We selected models containing curved surfaces and sharp edges to demonstrate the framework's efficacy.

4.2. Experiment Details and Results Analysis

Training strategies. For the convenience of comparison, except for the improved parts, we adopt the same training strategies as HPNet, including the same loss function and weights; refer to Section 3.1 for the improvements. The shared strategies include a 24 k/4 k/4 k dataset division, downsampling each point cloud to 7000 points, an input channel containing point + normal, the same data augmentation, and a learning rate of $lr = 0.001$ for 150 epochs. The throughput is measured with an Nvidia GeForce RTX 3090 24 GB GPU and a 16-core Intel i7 @ 2.8 GHz CPU.
Evaluation metrics. We use the following metrics for evaluating the segmentation and primitive labeling.
  • Seg-IoU: this metric measures the similarity between the predicted patches and the ground truth segments: $\frac{1}{K} \sum_{k=1}^{K} \mathrm{IoU}(W[:,k], \hat{W}[:,k])$, where $W$ is the predicted segmentation membership for each point cloud, $\hat{W}$ is the ground truth, and $K$ is the number of ground truth segments.
  • Type-IoU: this metric measures the classification accuracy of primitive type prediction: $\frac{1}{K} \sum_{k=1}^{K} \mathbb{I}[t_k = \hat{t}_k]$, where $t_k$ is the predicted primitive type for the $k$th segment patch, $\hat{t}_k$ is the ground truth, and $\mathbb{I}$ is an indicator function.
  • Throughput: this metric measures the efficiency of the network in ins/sec, i.e., the maximum number of instances the network can handle per second.
We use Hungarian matching [49] to find correspondences between predicted segments and ground-truth segments; a minimal sketch of this matching-based Seg-IoU computation is given below.
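The sketch (with helper names of our own) matches predicted segments to ground-truth segments with the Hungarian algorithm before averaging the per-segment IoU.

```python
# Sketch: Seg-IoU with Hungarian matching between predicted and
# ground-truth segment labels.
import numpy as np
from scipy.optimize import linear_sum_assignment

def seg_iou(pred, gt):
    """pred, gt: (N,) integer segment labels for each point."""
    pk, gk = np.unique(pred), np.unique(gt)
    iou = np.zeros((len(pk), len(gk)))
    for i, a in enumerate(pk):
        for j, b in enumerate(gk):
            inter = np.sum((pred == a) & (gt == b))
            union = np.sum((pred == a) | (gt == b))
            iou[i, j] = inter / union
    rows, cols = linear_sum_assignment(-iou)  # maximize total matched IoU
    return iou[rows, cols].sum() / len(gk)    # average over GT segments
```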
Analysis of results. Our framework’s reconstruction results, both quantitative and qualitative, are presented in Table 1 and Figure 8, respectively, for the ABCParts benchmark. The detection performance of our primitive detection module, as shown in Table 1, outperforms other state-of-the-art methods, particularly in terms of throughput efficiency, which can be attributed to the PointNet++ backbone. Additionally, Figure 8 shows that our method produces high-quality watertight models that are close to the ground truth and surpass all previous methods. This is thanks to our proposed mesh fitting and splitting module and selection module.
In contrast, the primitive blocks detected by ParSeNet [9] may contain breaks or errors, particularly near boundaries. Although it incorporates a learnable fitting module to reconstruct mesh models, it may generate gaps or overlaps (such as (a), (f), (h), and (j) in the third row). This highlights the difficulty of training the surface fitting task compared to primitive detection. HPNet produces more precise primitive detection results than ParSeNet, but it may encounter oversegmentation issues ((c), (d), (e), (i), and (k) in the fifth row). As HPNet does not consider mesh model reconstruction, for comparison, we used the classical mesh reconstruction algorithm ball pivoting [50] to obtain its mesh models (sixth row).
Ablation analysis. Table 2 displays the enhancements of our primitive detection module compared to the baseline. The most significant improvement comes from the backbone, which not only boosts the throughput from 8 to 28 instances per second but also enhances the detection score. Moreover, incorporating more advanced training strategies such as AdamW [51], cosine learning rate decay, and label smoothing [52] can also lead to improved performance.
Comparison with learning-based reconstruction methods. Figure 9 provides a comparison of our method with recently popular deep-learning-based surface reconstruction methods. Please refer to Section 2.1.2 for more information on these methods. We utilized the authors’ open-source code and pretrained models while following the recommended settings in their papers.
Overall, most learning-based methods train networks to predict occupancy or SDF values, making them unsuitable for representing hollow shapes such as those indicated by the blue boxes in the figure. The reconstruction results of Points2Surf [26] have surface artifacts and poor boundary quality, possibly because of its design of separate local and global branches. Local Implicit Grid (LIG) [27] performs better, since it trains the network to learn parts rather than the entire object, making it more local and giving it better generalization in surface reconstruction tasks. However, as mentioned by the authors, it may suffer from the "back-faces" problem, as shown by the yellow boxes. Convolutional Occupancy Networks (ConvONet) [28] proposes two networks that use a 2D CNN and a 3D CNN, respectively, named ConvONet-3plane (shown as ConvOnet-3p in the figure) and ConvONet-grid (shown as ConvOnet-grid32 in the figure, with a default resolution of $32^3$). We conducted comparative tests on both models. The three-plane network is more efficient but has poorer representation capabilities, which may result in artifacts in the reconstruction results (as shown by the red boxes in the figure). The grid model has better representation capabilities but is relatively less efficient. Moreover, since ConvONet uses one latent vector to represent one object, it may make semantic recognition errors and not be as local as Points2Surf or LIG, as shown by the green box in the figure.
In contrast, our method achieved the best reconstruction results, with a clean surface and sharp edges, with only a small amount of distortion at the boundary of the last model (shown by the green box in the last row).
Comparison with traditional reconstruction methods. Figure 10 illustrates a comprehensive comparison between our method and five other well-known reconstruction techniques. These include the classical screened Poisson reconstruction (SCP) [18], as well as four methods specifically designed to preserve sharp features: APSS [54], RIMLS [20], EAR [21], and PolyFit [31], which are detailed in Section 2. Two models of varying complexity are used: a Vase with only four surfaces (top of Figure 10) and a Fandisk with approximately 20 faces (bottom of Figure 10). Both models use point clouds with 10 k points and moderate Gaussian noise ($\sigma = 0.02d$, where $d$ represents the length of the diagonal of the bounding box).
SCP faithfully reconstructs the surface, including its noise, but applies no specific smoothing to the surface or sharpening to the boundaries. APSS and RIMLS, with their piecewise smoothing design, partially smooth the surface noise but introduce jagged edges at the boundaries. EAR effectively smooths the surface noise and enhances sharp edges; however, its extensive upsampling leads to deformations near the boundaries and an excessive number of triangles in the reconstruction. PolyFit represents all surfaces as planes, rendering it unsuitable for models with curved surfaces. In contrast, our method excels at accurate shape reconstruction while preserving clear and sharp boundaries, resulting in superior visual outcomes compared to the other techniques.
Timeliness analysis. Table 3 provides statistics on the reconstruction results in Figure 10, including the number of triangles in the reconstructed meshes and the runtime of the algorithms. For SCP, APSS, and RIMLS, the open-source versions available in MeshLab [55] with their default settings were utilized. Regarding EAR and PolyFit, we employed the executable programs provided by their respective authors and followed the guidelines outlined in their papers for parameter tuning and usage.
The comparison reveals that traditional methods such as SCP, APSS, and EAR, which do not incorporate segmentation, exhibit relatively fast algorithm speeds. However, when dealing with more complex models, they necessitate dense triangle representations for accurate reconstruction. Specifically, EAR, with its extensive upsampling for edge enhancement, produces an excessive number of triangles in the reconstruction, leading to significantly longer runtime. Conversely, PolyFit represents the entire model using planes for lightweight representation, but it struggles to accurately capture shapes containing curved surfaces.
In our reconstruction framework, explicit segmentation is incorporated, yet the algorithm's runtime remains within an acceptable range. We also implemented a parallel accelerated version utilizing OpenMP [56]. Since our algorithm operates on each face independently, it can be effectively parallelized, further reducing the runtime. Furthermore, our method does not require dense triangles to preserve sharp features, allowing for lightweight representations. Users can adjust the number of triangles according to their needs, which is discussed in detail in Section 4.3.
Geometric error analysis. To analyze the errors of various reconstruction methods, we visualized the noise-free original point cloud and computed the shortest distances to the reconstructed meshes. The resulting heatmap, colored based on the shortest distance, is presented in Figure 11. Additionally, we provide statistical results for relevant error metrics in Table 4.
It can be observed that SCP exhibits a global error distribution, primarily due to the presence of noise. APSS and RIMLS can only smooth relatively low-level surface noise and may increase errors at the boundaries. EAR can effectively smooth the surface and enhance the boundaries, but it also introduces increased boundary errors. In comparison, our method achieves the overall minimum geometric error and accurately reconstructs the boundaries, resulting in the most accurate reconstruction.

4.3. Exploratory Experiments

In this subsection, we present exploratory experiments that demonstrate the extensible possibilities of our proposed framework and modules. These experiments include integrating with other tasks, conducting low-density mesh representation performance testing, and performing more robustness testing.
Combined with edge extraction. Our proposed framework can not only be applied to primitive detection but can also be flexibly combined with other point cloud processing tasks, such as edge extraction, to achieve high-fidelity reconstruction models.
To demonstrate this, we selected three state-of-the-art and representative works and tested them on the Thingi10K dataset. They respectively represent heuristic traversal algorithms, learning-based methods, and semiautomatic methods with manual assistance: Mérigot et al. [57] detect edges by thresholding the Voronoi covariance measure (VCM), Wang et al. [48] detect edges by supervised training of a neural network named PIE-Net, and Zhuang et al. [58] extract feature edges semiautomatically, combining geodesic distance and hand-marked labels in a method named Live-Wire.
Given the boundaries output by these methods, we segmented the original point cloud into patches using a fixed small neighborhood and performed reclustering. Then, we combined our proposed mesh fitting and splitting module and selection module to produce reconstruction models. Figure 12 shows the visualization results. It is evident that our framework can still achieve high-fidelity and watertight mesh models from the extracted boundaries.
Combined with RANSAC. Our method can also be combined with the classical RANSAC [4,5] method for more application scenarios. Figure 2 shows the results on a real scanned urban building point cloud, where it can be seen that our proposed module still produces high-quality models with preserved sharp features. It should be noted that RANSAC [5] requires careful tuning for each input and needs to be combined with our proposed refine submodule to obtain satisfactory segmentation patches.
Low-density mesh representation. Our method is distinct from common reconstruction algorithms that rely on dense triangle meshes to achieve high fidelity. To demonstrate its ability to handle low-density meshes, we designed the following experiment. As shown in Figure 13, we gradually reduced the triangle density during mesh fitting for the same input point cloud and tested our method’s reconstruction effect. As the number of triangle faces decreases, the surface becomes more distorted, but the intersecting boundary remains sharp. Therefore, users can adjust the appropriate number of triangular faces according to their needs to obtain lightweight models and avoid poor boundaries. This is beneficial for subsequent efficient storage and computation.
Robustness evaluation. To test the robustness of our algorithm, we examined the reconstruction effect of our framework on noisy and missing data.
As depicted in Figure 14, we added random Gaussian noise during the training of the primitive detection module as a data augmentation technique. During testing, we added varying degrees of Gaussian noise to the same point cloud, with d representing the length of an edge of the object. Our algorithm demonstrated robustness to moderate Gaussian noise.
For missing data, as shown in Figure 15, we artificially cropped the input point cloud. Despite this, the refine submodule (Section 3.1.2) was still able to detect all patches and merge them via normal angles. The mesh fitting and splitting module is responsible for filling in the missing parts based on the merged patches. It is worth noting that our default HRBF-based meshing method [16] is required here, as screened Poisson will not work. The reconstruction framework was still able to produce the desired result.

5. Conclusions

Our work presents a novel framework for 3D mesh reconstruction from point clouds, which is based on primitive detection and is designed to preserve sharp features with high fidelity. Unlike previous methods, our approach emphasizes achieving high-quality overall reconstruction, particularly on sharp boundaries. To achieve this goal, we developed multiple modules that cover the entire reconstruction process and result in watertight and high-fidelity models. Our method outperforms the state-of-the-art on most metrics, and produces models that are closer to the ground truth and have smaller errors than recent learning-based reconstruction and classic mesh reconstruction methods. Additionally, we demonstrate the versatility of our designed modules by applying them to other tasks, such as edge extraction and RANSAC, resulting in high-quality models.
As larger networks and datasets become available, we expect further improvements in feature extraction and primitive detection for point clouds, making our framework even more valuable for various applications in the future.
Future prospects. We believe that, compared to 2D datasets, the current availability and quality of 3D segmentation datasets are still limited. This particularly applies to primitive segmentation data, which is why our method currently performs well only on CAD data. However, the modules within our framework are designed to be flexible, enabling them to adapt to future developments. We anticipate that as larger networks and higher-quality datasets, including urban buildings and objects in indoor scenes, become available, or with further advancements in unsupervised segmentation methods, the reconstruction performance of our method will continue to improve, making our framework even more valuable for various applications in the future.
Limitations. It is worth noting that primitive representation is a strong prior, and our method may not be applicable to all types of objects that are not suitable for representation using primitives. Additionally, in the selection module, we treat the reconstruction problem as a binary linear combination optimization problem, which may limit our method’s ability to complete large missing areas in the input point cloud, such as in the case of extensive missing data. In these cases, our method may not be able to achieve the correct reconstruction result, unlike recent deep-learning-based methods [23,24,28].

Author Contributions

Conceptualization, Q.L.; funding acquisition, J.X.; project administration, J.X.; supervision, J.X., S.X. and Y.W.; validation, Q.L., J.X., S.X. and Y.W.; visualization, Q.L.; writing—original draft, Q.L.; writing—review and editing, J.X., S.X. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (U2003109, U21A20515, 62102393, 62206263, 62271467), the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA23090304), the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Y201935), the State Key Laboratory of Robotics and Systems (HIT) (SKLRS-2022-KF-11), and the Fundamental Research Funds for the Central Universities and China Postdoctoral Science Foundation (2022T150639, 2021M703162).

Data Availability Statement

This work uses the following datasets, all of which can be obtained from the internet. ABC at https://deep-geometry.github.io/abc-dataset/ (accessed on 14 October 2022); Thingi10K at https://ten-thousand-models.appspot.com/ (accessed on 14 October 2022).

Acknowledgments

The authors would like to thank the reviewers for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Berger, M.; Tagliasacchi, A.; Seversky, L.M.; Alliez, P.; Guennebaud, G.; Levine, J.A.; Sharf, A.; Silva, C.T. A Survey of Surface Reconstruction from Point Clouds. Comput. Graph. Forum 2017, 36, 301–329.
  2. Berger, M.; Tagliasacchi, A.; Seversky, L.M.; Alliez, P.; Levine, J.A.; Sharf, A.; Silva, C.T. State of the Art in Surface Reconstruction from Point Clouds. In Proceedings of the 35th Annual Conference of the European Association for Computer Graphics, Eurographics 2014—State of the Art Reports, Strasbourg, France, 7–11 April 2014; Volume 1, pp. 161–185.
  3. Kaiser, A.; Ybanez Zepeda, J.A.; Boubekeur, T. A Survey of Simple Geometric Primitives Detection Methods for Captured 3D Data. Comput. Graph. Forum 2019, 38, 167–196.
  4. Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. In Readings in Computer Vision; Fischler, M.A., Firschein, O., Eds.; Morgan Kaufmann: San Francisco, CA, USA, 1987; pp. 726–740.
  5. Schnabel, R.; Wahl, R.; Klein, R. Efficient RANSAC for point-cloud shape detection. Comput. Graph. Forum 2007, 26, 214–226.
  6. Yan, D.M.; Wang, W.; Liu, Y.; Yang, Z. Variational mesh segmentation via quadric surface fitting. CAD Comput. Aided Des. 2012, 44, 1072–1082.
  7. Lafarge, F.; Mallet, C. Creating large-scale city models from 3D-point clouds: A robust approach with hybrid representation. Int. J. Comput. Vis. 2012, 99, 69–85.
  8. Li, L.; Sung, M.; Dubrovina, A.; Yi, L.; Guibas, L.J. Supervised fitting of geometric primitives to 3D point clouds. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 2647–2655.
  9. Sharma, G.; Liu, D.; Maji, S.; Kalogerakis, E.; Chaudhuri, S.; Měch, R. ParSeNet: A Parametric Surface Fitting Network for 3D Point Clouds. Lect. Notes Comput. Sci. 2020, 12352, 261–276.
  10. Yan, S.; Yang, Z.; Ma, C.; Huang, H.; Vouga, E.; Huang, Q. HPNet: Deep Primitive Segmentation Using Hybrid Representations. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021.
  11. Lorensen, W.E.; Cline, H.E. Marching Cubes: A High Resolution 3D Surface Construction Algorithm. SIGGRAPH Comput. Graph. 1987, 21, 163–169.
  12. Alexa, M.; Behr, J.; Cohen-Or, D.; Fleishman, S.; Levin, D.; Silva, C.T. Computing and rendering point set surfaces. IEEE Trans. Vis. Comput. Graph. 2003, 9, 3–15.
  13. Wang, H.; Scheidegger, C.E.; Silva, C.T. Bandwidth selection and reconstruction quality in point-based surfaces. IEEE Trans. Vis. Comput. Graph. 2009, 15, 572–582.
  14. Carr, J.C.; Beatson, R.K.; Cherrie, J.B.; Mitchell, T.J.; Fright, W.R.; McCallum, B.C.; Evans, T.R. Reconstruction and representation of 3D objects with radial basis functions. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 2001, Los Angeles, CA, USA, 12–17 August 2001; pp. 67–76.
  15. Brazil, E.V.; Macedo, I.; Sousa, M.C.; de Figueiredo, L.H.; Velho, L. Sketching Variational Hermite-RBF Implicits. In Proceedings of the Seventh Sketch-Based Interfaces and Modeling Symposium, Annecy, France, 7–10 June 2010; SBIM '10; pp. 1–8.
  16. Huang, Z.; Carr, N.; Ju, T. Variational implicit point set surfaces. ACM Trans. Graph. 2019, 38, 124.
  17. Kazhdan, M.; Bolitho, M.; Hoppe, H. Poisson Surface Reconstruction. In Proceedings of the Fourth Eurographics Symposium on Geometry Processing, Sardinia, Italy, 26–28 June 2006; SGP '06; pp. 61–70.
  18. Kazhdan, M.; Hoppe, H. Screened Poisson surface reconstruction. ACM Trans. Graph. 2013, 32, 29.
  19. Fleishman, S.; Cohen-Or, D.; Silva, C.T. Robust moving least-squares fitting with sharp features. ACM Trans. Graph. 2005, 24, 544–552.
  20. Öztireli, A.C.; Guennebaud, G.; Gross, M. Feature Preserving Point Set Surfaces based on Non-Linear Kernel Regression. Comput. Graph. Forum 2009, 28, 493–501.
  21. Huang, H.; Wu, S.; Gong, M.; Cohen-Or, D.; Ascher, U.; Zhang, H.R. Edge-aware point set resampling. ACM Trans. Graph. 2013, 32, 9.
  22. Lipman, Y.; Cohen-Or, D.; Levin, D.; Tal-Ezer, H. Parameterization-free projection for geometry reconstruction. In Proceedings of the ACM SIGGRAPH Conference on Computer Graphics, San Diego, CA, USA, 5–9 August 2007.
  23. Chen, Z.; Zhang, H. Learning Implicit Fields for Generative Shape Modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
  24. Park, J.J.; Florence, P.; Straub, J.; Newcombe, R.; Lovegrove, S. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
  25. Mescheder, L.; Oechsle, M.; Niemeyer, M.; Nowozin, S.; Geiger, A. Occupancy Networks: Learning 3D Reconstruction in Function Space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
  26. Erler, P.; Guerrero, P.; Ohrhallinger, S.; Mitra, N.J.; Wimmer, M. Points2Surf: Learning Implicit Surfaces from Point Clouds. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; pp. 108–124.
  27. Jiang, C.M.; Sud, A.; Makadia, A.; Huang, J.; Nießner, M.; Funkhouser, T. Local Implicit Grid Representations for 3D Scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
  28. Peng, S.; Niemeyer, M.; Mescheder, L.; Pollefeys, M.; Geiger, A. Convolutional Occupancy Networks. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020.
  29. Schnabel, R.; Degener, P.; Klein, R. Completion and reconstruction with primitive shapes. Comput. Graph. Forum 2009, 28, 503–512.
  30. Lafarge, F.; Alliez, P. Surface reconstruction through point set structuring. Comput. Graph. Forum 2013, 32, 225–234.
  31. Nan, L.; Wonka, P. PolyFit: Polygonal Surface Reconstruction from Point Clouds. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2372–2380.
  32. Sharma, G.; Goyal, R.; Liu, D.; Kalogerakis, E.; Maji, S. CSGNet: Neural Shape Parser for Constructive Solid Geometry. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5515–5523.
  33. Yu, F.; Chen, Z.; Li, M.; Sanghi, A.; Shayani, H.; Mahdavi-Amiri, A.; Zhang, H. CAPRI-Net: Learning Compact CAD Shapes with Adaptive Primitive Assembly. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021.
  34. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph. 2019, 38, 146.
  35. Comaniciu, D.; Meer, P. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 603–619.
  36. Qian, G.; Li, Y.; Peng, H.; Mai, J.; Hammoud, H.; Elhoseiny, M.; Ghanem, B. PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies. In Advances in Neural Information Processing Systems; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2022; Volume 35, pp. 23192–23204.
  37. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; NIPS'17; pp. 5105–5114.
  38. Ma, X.; Qin, C.; You, H.; Ran, H.; Fu, Y. Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework. In Proceedings of the International Conference on Learning Representations, Virtual Event, 25–29 April 2022.
  39. Zhao, H.; Jiang, L.; Jia, J.; Torr, P.H.; Koltun, V. Point Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 16259–16268.
  40. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
  41. Lu, X.; Yao, J.; Tu, J.; Li, K.; Li, L.; Liu, Y. Pairwise linkage for point cloud segmentation. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 3, 201–208.
  42. Alliez, P.; Giraudot, S.; Jamin, C.; Lafarge, F.; Mérigot, Q.; Meyron, J.; Saboret, L.; Salman, N.; Wu, S.; Yildiran, N.F. Point Set Processing. In CGAL User and Reference Manual, 4th ed.; CGAL Editorial Board: New York, NY, USA, 2022.
  43. Kettner, L.; Meyer, A.; Zomorodian, A. Intersecting Sequences of dD Iso-oriented Boxes. In CGAL User and Reference Manual, 3rd ed.; CGAL Editorial Board: New York, NY, USA, 2021.
  44. Nan, L.; Sharf, A.; Zhang, H.; Cohen-Or, D.; Chen, B. SmartBoxes for interactive urban reconstruction. In ACM SIGGRAPH 2010 Papers, SIGGRAPH 2010; ACM: New York, NY, USA, 2010.
  45. Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual. 2021. Available online: https://www.gurobi.com (accessed on 27 April 2023).
  46. Zhou, Q.; Jacobson, A. Thingi10K: A Dataset of 10,000 3D-Printing Models. arXiv 2016, arXiv:1605.04797.
  47. Koch, S.; Matveev, A.; Williams, F.; Alexa, M.; Zorin, D.; Panozzo, D. ABC: A Big CAD Model Dataset For Geometric Deep Learning. arXiv 2019, arXiv:1812.06216v2.
  48. Wang, X.; Xu, Y.; Xu, K.; Tagliasacchi, A.; Zhou, B.; Mahdavi-Amiri, A.; Zhang, H. PIE-NET: Parametric Inference of Point Cloud Edges. Adv. Neural Inf. Process. Syst. 2020, 33, 20167–20178.
  49. Kuhn, H.W. The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 1955, 2, 83–97.
  50. Bernardini, F.; Mittleman, J.; Rushmeier, H.; Silva, C.; Taubin, G. The Ball-Pivoting Algorithm for Surface Reconstruction. IEEE Trans. Vis. Comput. Graph. 1999, 5, 349–359.
  51. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the ICLR, New Orleans, LA, USA, 6–9 May 2019.
  52. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
  53. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
  54. Guennebaud, G.; Gross, M. Algebraic point set surfaces. In Proceedings of the ACM SIGGRAPH Conference on Computer Graphics, San Diego, CA, USA, 5–9 August 2007.
  55. Cignoni, P.; Callieri, M.; Corsini, M.; Dellepiane, M.; Ganovelli, F.; Ranzuglia, G. MeshLab: An Open-Source Mesh Processing Tool. In Proceedings of the Eurographics Italian Chapter Conference, Salerno, Italy, 2–4 July 2008; Scarano, V., Chiara, R.D., Erra, U., Eds.; The Eurographics Association: Crete, Greece, 2008.
  56. Chandra, R.; Dagum, L.; Kohr, D.; Menon, R.; Maydan, D.; McDonald, J. Parallel Programming in OpenMP; Morgan Kaufmann: Burlington, MA, USA, 2001. [Google Scholar]
  57. Mérigot, Q.; Ovsjanikov, M.; Guibas, L.J. Voronoi-based curvature and feature estimation from point clouds. IEEE Trans. Vis. Comput. Graph. 2011, 17, 743–756. [Google Scholar] [CrossRef] [Green Version]
  58. Zhuang, Y.; Zou, M.; Carr, N.; Ju, T. Anisotropic geodesics for live-wire mesh segmentation. Comput. Graph. Forum 2014, 33, 111–120. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Comparison results between our method and the previous state-of-the-art methods (ParSeNet [9] and HPNet [10]) for primitive detection and reconstruction. Our method aims to preserve sharp boundary features and ultimately produce high-fidelity mesh models that are close to the ground truth.
Figure 2. Results of our method combined with RANSAC [5] applied to real-scanned urban building point clouds. Our proposed modules can be used as plug-and-play post-processing modules to produce sharp and lightweight high-quality mesh models. (a) Input points, (b) RANSAC segments, (c) our refined submodule outputs, (d) our selection module outputs, (e) reconstructed mesh model.
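For readers who want to reproduce the plug-and-play pipeline of Figure 2, the entry point is a standard RANSAC primitive detector. The following is a minimal, plane-only RANSAC sketch in Python; it is not the efficient RANSAC implementation of [5] used in the paper, and `iters` and `tol` are illustrative values rather than the experimental settings.

```python
import numpy as np

def ransac_plane(points, iters=1000, tol=0.01, rng=None):
    """Detect the single best-supported plane (n, d) with n.x + d ~ 0.

    points: (N, 3) array. Returns the plane parameters and inlier mask.
    """
    rng = np.random.default_rng() if rng is None else rng
    best_plane, best_inliers, best_count = None, None, 0
    for _ in range(iters):
        # Sample three distinct points and fit the plane through them.
        p = points[rng.choice(len(points), size=3, replace=False)]
        n = np.cross(p[1] - p[0], p[2] - p[0])
        norm = np.linalg.norm(n)
        if norm < 1e-12:
            continue  # degenerate (near-collinear) sample
        n /= norm
        d = -n @ p[0]
        # Count points within the distance threshold of the plane.
        inliers = np.abs(points @ n + d) < tol
        count = int(inliers.sum())
        if count > best_count:
            best_plane, best_inliers, best_count = (n, d), inliers, count
    return best_plane, best_inliers
```

Running this repeatedly, removing the inliers after each detection, yields plane segments of the kind shown in Figure 2b, which the refinement and selection modules then consume.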
Figure 3. The pipeline of our framework.
Figure 4. Primitive detection network. The learnable encoder predicts embedding descriptors, primitive-type prediction vectors, and shape-parameter prediction vectors from the input point cloud. The mean-shift module then clusters the embedding descriptors under geometric and smoothness consistency to produce primitive segments.
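As a rough illustration of the clustering stage in Figure 4, the sketch below runs a flat-kernel mean shift [35] over per-point embedding descriptors and returns segment labels. It omits the geometric- and smoothness-consistency weighting of the actual module, uses a naive O(N²) neighborhood computation, and the `bandwidth` value is an assumption.

```python
import numpy as np

def mean_shift_labels(embeddings, bandwidth=0.25, iters=10):
    """Flat-kernel mean shift over (N, D) per-point embeddings.

    Returns an integer label per point; points that converge to the
    same mode form one primitive segment.
    """
    X = embeddings.copy()
    for _ in range(iters):
        # Shift every point to the mean of its bandwidth neighborhood.
        dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        w = (dist < bandwidth).astype(X.dtype)          # (N, N) kernel
        X = (w @ X) / w.sum(axis=1, keepdims=True)
    # Merge converged modes that lie within one bandwidth of each other.
    labels = -np.ones(len(X), dtype=int)
    modes = []
    for i, x in enumerate(X):
        for k, m in enumerate(modes):
            if np.linalg.norm(x - m) < bandwidth:
                labels[i] = k
                break
        else:
            modes.append(x)
            labels[i] = len(modes) - 1
    return labels
```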
Figure 5. Pairwise splitting. (a) A pair of intersecting triangles Δa and Δb. (b) Fit Δb to a plane b̄. (c) Split Δa into three small triangles (Δp1a3p2, Δp1p2a2, and Δp1a2a1) according to the intersection points p1 and p2, then divide them according to the vectors v and u. (d) Process the triangles on the intersection line.
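The core geometric operation behind Figure 5c can be sketched as a triangle–plane split: classify the vertices of Δa by signed distance to the fitted plane b̄, compute the two edge intersection points p1 and p2, and emit three sub-triangles. The Python below is a simplified version that assumes no vertex lies exactly on the plane; it does not reproduce the subsequent division along v and u or the on-line handling in (d).

```python
import numpy as np

def split_triangle_by_plane(tri, n, d, eps=1e-9):
    """Split a triangle by the plane n.x + d = 0.

    tri: (3, 3) array of vertices; n: unit plane normal; d: offset.
    Returns (above, below): lists of (3, 3) sub-triangle arrays.
    """
    s = tri @ n + d                     # signed distance of each vertex
    if np.all(s >= -eps):
        return [tri], []                # entirely on the positive side
    if np.all(s <= eps):
        return [], [tri]                # entirely on the negative side
    pos = s > 0
    # Exactly one vertex sits alone on its side of the plane.
    lone = 0 if pos[1] == pos[2] else (1 if pos[0] == pos[2] else 2)
    b, c = (lone + 1) % 3, (lone + 2) % 3
    # Plane/edge intersection points p1 and p2 (cf. Figure 5c).
    p1 = tri[lone] + s[lone] / (s[lone] - s[b]) * (tri[b] - tri[lone])
    p2 = tri[lone] + s[lone] / (s[lone] - s[c]) * (tri[c] - tri[lone])
    one = [np.array([tri[lone], p1, p2])]
    two = [np.array([p1, tri[b], tri[c]]),
           np.array([p1, tri[c], p2])]
    return (one, two) if pos[lone] else (two, one)
```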
Figure 6. Partitioning triangles in the nonintersecting area.
Figure 7. (a) Data may be missing (self-defective or discarded by the primitive detection module), and intersections at boundaries may be nonunique. (b) Reconstruction result obtained solely with the data-fitting term. (c) Reconstruction result with the 3D structural similarity term added.
Figure 8. (a–k) Primitive segmentation and surface reconstruction results. From top to bottom: input, ground truth, segments produced by ParSeNet, surfaces fitted by ParSeNet, segments produced by HPNet, surfaces produced by HPNet and ball pivoting [50], segments produced by our primitive detection module, and our final reconstructed models.
Figure 9. Comparison with deep-learning-based reconstruction methods.
Figure 10. Comparison with traditional meshing methods (top: Vase; bottom: Fandisk).
Figure 11. Heatmaps of geometric error (top: Vase; bottom: Fandisk). To visualize the reconstruction error, the point cloud is colored by the distance between each point and the surface of the reconstructed model (Figure 10). The color scale ranges from blue (shorter distance) to red (longer distance).
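Heatmaps like those in Figure 11 can be reproduced with any point-to-mesh distance query. A minimal sketch, assuming the `trimesh` library and a simple blue-to-red ramp rather than the exact colormap used for the figures:

```python
import numpy as np
import trimesh

def error_heatmap(points, mesh_path):
    """Color each input point by its distance to the reconstructed mesh
    surface (blue = near, red = far).

    points: (N, 3) array; mesh_path: path to the reconstructed model.
    Returns (distances, RGB colors in [0, 1]).
    """
    mesh = trimesh.load(mesh_path, force='mesh')
    # Closest point on the mesh surface for every input point.
    _, dist, _ = trimesh.proximity.closest_point(mesh, points)
    t = dist / (dist.max() + 1e-12)              # normalize to [0, 1]
    colors = np.stack([t, np.zeros_like(t), 1.0 - t], axis=1)
    return dist, colors
```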
Figure 12. Demonstration of the effectiveness of our framework combined with edge extraction. From top to bottom: input, results produced by three different types of edge detection methods, results after clustering and segmentation, candidate surfaces produced by the proposed mesh splitting module, and final outputs of selection module. Panels (a,b) are examples of VCM [57], (c,d) are examples of the learning-based method PIE-Net [48], and (e,f) are examples of the semiautomatic method Live-Wire [58].
Figure 13. Our method easily handles meshes represented by low-density triangles without sacrificing boundary quality, because we do not rely on dense triangles to ensure high fidelity. Users can choose the triangle density according to their needs.
Figure 14. Reconstruction results of the CAD component with increasing Gaussian noise. Top: input. Bottom: reconstruction. (a) Input without noise. (b) σ = 0.01d. (c) σ = 0.04d. (d) σ = 0.06d.
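For the robustness experiment of Figure 14, the noise levels are given as fractions of d. A sketch of the perturbation, under our assumption that d denotes the bounding-box diagonal of the input cloud (the caption does not restate its definition):

```python
import numpy as np

def add_gaussian_noise(points, sigma_ratio, rng=None):
    """Perturb each coordinate with zero-mean Gaussian noise of standard
    deviation sigma = sigma_ratio * d, where d is taken here to be the
    bounding-box diagonal of the point cloud (an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    d = np.linalg.norm(points.max(axis=0) - points.min(axis=0))
    return points + rng.normal(scale=sigma_ratio * d, size=points.shape)
```

For example, `add_gaussian_noise(pts, 0.01)` corresponds to the σ = 0.01d setting of Figure 14b.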
Figure 15. Reconstruction of missing data. (a) Point cloud with missing data. (b) Candidate surfaces. (c) Reconstruction result.
Figure 15. Reconstruction of missing data. (a) Point cloud with missing data. (b) Candidate surfaces. (c) Reconstruction result.
Remotesensing 15 03155 g015
Table 1. Benchmark evaluation on primitive detection module and baseline approaches.
Method       | Seg-IoU (%) | Type-IoU (%) | Throughput (instances/s)
SPFN [8]     | 73.41       | 80.04        | 21
ParSeNet [9] | 82.14       | 88.68        | n/a
HPNet [10]   | 85.24       | 91.04        | 8
Ours         | 88.42       | 92.85        | 28
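Seg-IoU in Tables 1 and 2 is a matching-based segmentation score. A plausible reference implementation, assuming the common protocol of Hungarian matching [49] between predicted and ground-truth segments followed by averaging (the benchmark's exact normalization may differ):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def seg_iou(pred, gt):
    """Segmentation IoU under an optimal one-to-one matching of predicted
    and ground-truth segments.

    pred, gt: (N,) integer label arrays, one label per point.
    """
    P, G = np.unique(pred), np.unique(gt)
    iou = np.zeros((len(P), len(G)))
    for i, p in enumerate(P):
        for j, g in enumerate(G):
            inter = np.sum((pred == p) & (gt == g))
            union = np.sum((pred == p) | (gt == g))
            iou[i, j] = inter / union
    rows, cols = linear_sum_assignment(-iou)   # maximize total IoU
    return iou[rows, cols].sum() / max(len(P), len(G))
```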
Table 2. Ablation analysis of improvements.
Improvements                    | Seg-IoU (%) | Type-IoU (%) | Throughput (instances/s)
Baseline (HPNet [10])           | 85.24       | 91.04        | 8
+ DGCNN [34] → PointNeXt-b [36] | 86.88       | 92.23        | 28
+ Adam [53] → AdamW [51]        | 87.34       | 92.74        | 28
+ Step → Cosine                 | 87.50       | 92.67        | 28
+ Label Smoothing [52]          | 88.42       | 92.85        | 28
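Two of the training tweaks in Table 2 are drop-in: AdamW [51] replaces Adam's L2 penalty with decoupled weight decay, and label smoothing [52] mixes the one-hot target with a uniform distribution. A sketch of the label-smoothed cross-entropy, assuming PyTorch and an illustrative smoothing factor ε = 0.1 (the value used in our experiments is not restated here):

```python
import torch.nn.functional as F

def smoothed_cross_entropy(logits, target, eps=0.1):
    """Cross-entropy with label smoothing: the target distribution is
    q_c = (1 - eps) * one_hot_c + eps / K over K classes."""
    log_p = F.log_softmax(logits, dim=-1)
    nll = -log_p.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    return ((1.0 - eps) * nll - eps * log_p.mean(dim=-1)).mean()
```

Recent PyTorch versions also expose this directly as `F.cross_entropy(logits, target, label_smoothing=eps)`.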
Table 3. Statistics on the examples presented in Figure 10.
Method                | Vase: Faces | Vase: Time (s) | Fandisk: Faces | Fandisk: Time (s)
Screened Poisson [18] | 9996        | 1.98           | 40,028         | 3.08
APSS [54]             | 9996        | 1.59           | 17,815         | 4.34
RIMLS [20]            | 9996        | 2.41           | 17,801         | 6.63
EAR [21]              | 181,170     | 128.98         | 272,593        | 202.39
PolyFit [31]          | 38          | 1.81           | 17             | 3.39
Ours                  | 4068        | 7.92           | 6403           | 69.09
Ours + OpenMP [56]    | 4068        | 2.26           | 6403           | 7.35
Table 4. Statistics on the examples presented in Figure 11.
Vase:
Method                | Shortest dist. (10⁻⁹ mm) | Hausdorff dist. (10⁻³ mm) | Mean dist. (10⁻³ mm) | Median dist. (10⁻³ mm)
Screened Poisson [18] | 27.35                    | 8.014                     | 1.766                | 1.390
APSS [54]             | 149.0                    | 5.579                     | 1.120                | 0.897
RIMLS [20]            | 31.28                    | 5.663                     | 1.296                | 1.022
EAR [21]              | 8.227                    | 10.57                     | 1.647                | 1.152
PolyFit [31]          | 3525                     | 210.7                     | 53.52                | 36.32
Ours                  | 0.003                    | 8.132                     | 1.499                | 1.343

Fandisk:
Method                | Shortest dist. (10⁻⁹ mm) | Hausdorff dist. (10⁻³ mm) | Mean dist. (10⁻³ mm) | Median dist. (10⁻³ mm)
Screened Poisson [18] | 117.8                    | 20.24                     | 2.983                | 2.572
APSS [54]             | 26.28                    | 19.32                     | 3.051                | 2.806
RIMLS [20]            | 97.25                    | 19.86                     | 3.515                | 3.303
EAR [21]              | 197.7                    | 19.95                     | 3.843                | 3.575
PolyFit [31]          | 1586                     | 260.4                     | 45.34                | 31.94
Ours                  | 15.82                    | 18.85                     | 1.162                | 1.008
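The distance statistics in Table 4 (shortest, Hausdorff, mean, and median) can be approximated by densely sampling the reconstructed surface and taking nearest-neighbor distances from the input points; this point-sampled sketch, assuming SciPy's KD-tree, is an approximation of the true point-to-surface distances visualized in Figure 11.

```python
import numpy as np
from scipy.spatial import cKDTree

def distance_stats(points, surface_samples):
    """One-sided distance statistics from the input points to a dense
    point sampling of the reconstructed surface.

    Returns (shortest, Hausdorff-style max, mean, median) distances.
    """
    d, _ = cKDTree(surface_samples).query(points)
    return d.min(), d.max(), d.mean(), np.median(d)
```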