1. Introduction
With the acceleration of urbanization and the development of BIM technology, it has become increasingly important to efficiently and accurately acquire building information models (BIMs) for existing buildings. As a digital building model that integrates a large amount of information, BIM supports the entire life cycle management of buildings using digital technology. In addition to simple geometric information, the building components contained in BIM also include the building’s engineering information. This ensures that BIM covers every aspect of the building, from design to construction to operation and maintenance. However, many existing buildings do not have corresponding digital models that were developed during design and construction due to their age and thus cannot be integrated into modern construction management systems. Therefore, it is necessary to collect and analyze the data for existing buildings and then reconstruct BIMs to realize the digitization of existing buildings.
During the BIM reconstruction of existing buildings, several problems are encountered when using traditional methods. Manual mapping or reconstruction based on point clouds obtained from 3D laser scanning [
1] can accurately restore building data but is often time-consuming and costly, especially when performing large-scale BIM reconstruction [
In recent years, with the continuous advancement of high-precision drone and satellite technologies, the technique of BIM reconstruction with the help of easy-to-acquire orthophotos and digital surface models (DSMs) has emerged, not only because it can significantly reduce the time and cost of data acquisition [
3], but also because it meets the urgent demand for large-scale BIM reconstruction of existing buildings [
4]. Orthophotos provide distortion-free top-view images of buildings [
5], which are essential for accurately determining building boundaries. Meanwhile, a DSM provides elevation information on a building and its surroundings, which can be used to determine the building’s height. Combining these two data sources enables the automated BIM reconstruction of buildings. This process eliminates the need for costly ground surveys or direct physical contact, significantly improving the efficiency of data collection and reducing acquisition costs [
6].
The greatest challenge in automatic BIM reconstruction using orthophotos and DSMs is the need to extract accurate and regularized building boundaries from the orthophotos and DSMs. Although previous research [
7,
8] employed deep-learning and image-processing methods to achieve regularized building boundary extraction, most of these studies rely on either orthophotos or DSMs as a single data source. The surface texture and color information of buildings as determined from orthophotos, as well as the surface elevation changes provided by DSMs, are valuable data. However, to extract accurate and regular building boundaries from two data sources, a feature extraction method that can integrate different data types is needed.
Moreover, in past building reconstruction research, the City Geography Markup Language (CityGML) standard was commonly used to construct three-dimensional building models [
9,
10,
11]. The three-dimensional models reconstructed based on the CityGML standard contained geometric information about building components. However, these models offered only limited coverage of a building’s intrinsic engineering properties and performance information and could not easily be extended or supplemented with subsequent information, which limited the use of CityGML in engineering applications. As a new and constantly developing digital technology, BIM is more widely used in the design, construction, and operation and maintenance management stages of actual construction projects because of its more advanced information organization and interactivity [
12]. Nonetheless, because the automated interaction between BIMs and external data is more complex, relatively few studies have addressed the automatic reconstruction of BIMs from orthophotos and DSMs.
In view of these challenges, this study proposes a new method for automatically reconstructing BIMs from orthophotos and DSM data. First, a deep-learning network fuses orthophotos and DSM data to extract building footprints more accurately. Then, polygon optimization with empirical rules is used to accurately extract polygonal building boundaries. Finally, the elevation information in the DSM is utilized to obtain the building height, and the building boundary and height information is used by Dynamo to reconstruct the BIM in Autodesk Revit.
2. Literature Review
Three-dimensional building reconstruction techniques based on orthophotos and DSMs have attracted extensive academic attention. Partovi et al. [
13] used methods combining DSMs with multispectral satellite orthophotos to achieve the extraction, decomposition, and connection of building boundaries, culminating in the reconstruction of three-dimensional building models. Mao et al. [
14] employed deep-learning techniques to predict potential DSMs from orthophotos, aiming to reconstruct three-dimensional building models using the input orthophotos. Gui et al. [
8] adopted a model-driven strategy to extract features from DSMs and ultra-high-resolution satellite orthophotos for the reconstruction of building models. Yu et al. [
15] replaced traditional multi-view stereo (MVS) methods with deep-learning strategies, transforming multi-angle drone images into DSMs and orthophotos for the purposes of three-dimensional building reconstruction. These studies generate building models that conform to the CityGML standard and are widely applied in the field of geographic information systems (GISs), but there is considerably less research on BIM reconstruction.
Mainstream building reconstruction studies have usually incorporated a data-driven framework with three key steps: semantic segmentation of buildings, boundary extraction, and 3D model reconstruction. In the semantic segmentation phase, deep-learning methods, especially classical convolutional neural networks (CNNs) [
16,
17,
18] and the latest Transformer architecture [
19] have become the preferred techniques due to their excellent classification performance in complex contexts. These techniques are able to improve the accuracy of building extraction compared with that of traditional image-processing methods. At the same time, multi-modal semantic segmentation techniques that integrate orthophotos and DSMs continue to emerge. SA-Gate [
20] introduced a unified and efficient cross-modal guided encoder to improve the performance of RGB-D semantic segmentation. Zhou et al. [
21] developed an end-to-end feature extraction and gate fusion network, CEGFNet, to achieve high-precision semantic segmentation from DSMs and orthophotos.
Subsequently, semantically segmented images of buildings are usually postprocessed during a boundary extraction session. The simplest approach is to apply the Marching Cubes algorithm [
22] and Douglas–Peucker algorithm [
23] to simplify the building boundaries or to use algorithms to regularize them. There have been studies dedicated to combining the semantic segmentation of buildings with the regularized extraction of contours with the aim of simplifying the process through an end-to-end deep-learning approach. By learning to build a segmentation map that is aligned with the proposed frame field vectors, Frame-Field [
24] obtains building polygons through corner-aware contour simplification. Zorzi et al. [
25] proposed PolyWorld, which uses CNNs to detect building boundary vertices and employs graph neural networks to predict the connection strength between vertices, ultimately realizing regularized boundary extraction by solving a differentiable optimal transport problem. Xu et al. [
26] used a hierarchical supervision mechanism to complete polygonal building extraction using deep learning. Although these approaches show potential for automation, they rely on large-scale datasets and require long training times. This has led to studies suggesting that combining deep learning with postprocessing may be more practical [
27]. However, there are few studies on how to fully utilize the features of orthophoto and DSM data and apply both data types to the two phases of deep learning and postprocessing.
4. Experiments
4.1. Datasets
To evaluate the performance of the BIM reconstruction method proposed in this article on datasets with different scales and ground sample distances (GSDs), two open large-scale datasets were selected: the Tianjin dataset and the Urban 3D dataset. Specifically, the Tianjin dataset contains comprehensive urban characteristics and a large number of complex buildings, making it suitable for evaluating the accuracy of our method. The Urban 3D dataset contains satellite images of multiple cities, providing a reference for large-scale urban reconstruction. These two datasets were collected by drones and satellites, respectively, and have different GSDs, which allows the robustness of the proposed method to be verified.
4.1.1. Tianjin Dataset
The Tianjin dataset [
15] was collected in Tianjin, China, over an area consisting of two typical architectural styles: residential townhouses and industrial buildings. High-resolution images were captured using a drone-mounted camera, and the captured photographs were processed with Smart3D 2019 software using the MVS technique to obtain orthophotos and a DSM with a GSD of 0.1 m. The building boundaries were delineated in the orthophotos through manual annotation. The base and top elevations of each individual building were initially measured based on a statistical analysis of the building’s neighborhood elevation values and were then checked and edited manually to determine the building’s height. In this study, based on the classification method proposed in the original paper, images of Area 1 and Area 3 were used as the training set, which contained a total of 539 individual buildings; Area 2 was used for testing and validation and consisted of 243 buildings.
Figure 9 shows the orthophotos and the corresponding 3D reconstruction model for the Tianjin dataset.
4.1.2. Urban 3D Dataset
The Urban 3D dataset [
37] was generated by the WorldView-3 satellite and contains RGB orthophotos and corresponding DSMs and digital elevation models (DEMs). The data cover over 360 square kilometers of terrain and contain approximately 157,000 buildings, all with a GSD of 0.5 m. The dataset provides ground-truth building footprints. In this study, the DEM was subtracted from the DSM to obtain the normalized DSM (nDSM); for each ground-truth building footprint, the maximum nDSM value within that footprint was taken as the verification value of the building height.
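The nDSM normalization and footprint-based height lookup described above can be sketched as follows. This is a minimal illustration using NumPy; the function and argument names are our own and are not part of the dataset’s tooling:

```python
import numpy as np

def building_height_from_ndsm(dsm, dem, footprint_mask):
    """Subtract the terrain (DEM) from the surface model (DSM) to get the
    nDSM, then take the maximum nDSM value inside the ground-truth
    footprint as the reference building height."""
    ndsm = dsm - dem
    return float(ndsm[footprint_mask].max())
```

For example, a small patch whose terrain sits at 10 m and whose roof pixels reach 15 m in the DSM yields a reference height of 5 m.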
Figure 10 shows an image of the Urban 3D dataset. The source images have a resolution of 2048 × 2048 pixels; 174 images were used for training and 62 images were used for verification and testing.
4.2. Evaluation Metrics
4.2.1. Building Footprint and Boundary Extraction
The semantic segmentation task is essentially a per-pixel classification task. The confusion matrix, a standard tool for evaluating classification performance, contains four primary classification results: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The relationships among these concepts are shown in
Figure 11.
In order to evaluate the effectiveness of building footprint and boundary extraction, this study adopted three performance metrics that are widely recognized in the industry: intersection over union (IoU), recall, and precision.
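For reference, all three metrics follow directly from the confusion-matrix counts; a minimal sketch (the function and argument names are ours, not from the paper):

```python
def segmentation_metrics(tp, fp, fn):
    """IoU, recall, and precision from per-pixel confusion-matrix counts.
    TN is not needed: none of the three metrics uses it."""
    iou = tp / (tp + fp + fn)    # overlap over union of prediction and ground truth
    recall = tp / (tp + fn)      # share of true building pixels recovered
    precision = tp / (tp + fp)   # share of predicted building pixels that are correct
    return iou, recall, precision
```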
4.2.2. BIM Reconstruction
In order to evaluate the performance of the automated BIM reconstruction method that was proposed, the same evaluation metrics were used as in previous studies [
15]. The accuracy of individual building reconstruction was first evaluated using the 3D IoU, which was obtained by multiplying the IoU of the semantic segmentation of the building footprint by the corresponding height IoU. Specifically, the intersection over union between each extracted individual building footprint and its closest ground-truth building footprint was calculated to obtain the footprint IoU. The height IoU was the quotient of the predicted building height within a single building footprint and its closest ground-truth building height. After assessing the reconstruction accuracy for individual buildings, the effectiveness of the BIM reconstruction over the overall area was assessed using object-based completion metrics.
Here, TP represents the number of buildings with a 3D IoU > 0.5 among all reconstructed buildings, and FP represents the number of buildings with a 3D IoU ≤ 0.5 among the reconstructed buildings.
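A compact sketch of this evaluation logic (the naming is ours; the height term is written as a min/max ratio so that it stays within [0, 1], which matches the described quotient whenever the predicted height does not exceed the ground truth):

```python
def height_iou(h_pred, h_true):
    # Ratio of the smaller height to the larger one, bounded to [0, 1].
    return min(h_pred, h_true) / max(h_pred, h_true)

def iou_3d(footprint_iou, h_pred, h_true):
    # 3D IoU = footprint IoU x height IoU, as defined in the text.
    return footprint_iou * height_iou(h_pred, h_true)

def object_completion(iou3d_scores, threshold=0.5):
    """Object-based completion: the share of reconstructed buildings whose
    3D IoU exceeds the threshold (TP) among all reconstructed buildings."""
    tp = sum(1 for s in iou3d_scores if s > threshold)
    return tp / len(iou3d_scores)
```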
4.3. Experimental Setup
In the building footprint extraction phase, the performance of the model was highly dependent on the selection of hyperparameters, and the optimal configuration was determined through comparative analysis. We adopted the SegFormer model as the backbone of the semantic segmentation network, pre-trained it on the PASCAL VOC dataset, and used the MiT-B2 pre-trained weights. The other specific hyperparameter settings are shown in
Table 2.
In the boundary extraction stage, the empirical rule parameters needed to be set manually. Because the GSDs of the Tianjin and Urban 3D datasets were 0.1 m and 0.5 m, respectively, area thresholds of Td = 2000 and Td = 80 pixels were defined to exclude buildings with an area of less than 20 square meters, and length thresholds of Ts = 10 and Ts = 2 pixels were set to exclude walls with a boundary length of less than 1 m. In addition, α = π/6 was set to remove overly sharp nodes, whereas β = 17π/18 was used to remove overly smooth nodes. During the height extraction stage, the maximum value of the nDSM within the building boundary was used as the building height.
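The scaling of these thresholds with the GSD, and the angle-based node filtering, can be illustrated as follows. This is a sketch under our own naming; the polygon is assumed to be a closed vertex list without a repeated endpoint:

```python
import math

def pixel_thresholds(gsd, min_area_m2=20.0, min_len_m=1.0):
    """Convert the metric thresholds to pixel units for a given GSD (m/pixel)."""
    td = round(min_area_m2 / gsd ** 2)   # area threshold in pixels
    ts = round(min_len_m / gsd)          # boundary-length threshold in pixels
    return td, ts

def interior_angle(p_prev, p, p_next):
    """Angle at vertex p formed by its two neighbouring edges, in [0, pi]."""
    ax, ay = p_prev[0] - p[0], p_prev[1] - p[1]
    bx, by = p_next[0] - p[0], p_next[1] - p[1]
    cos = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.acos(max(-1.0, min(1.0, cos)))

def filter_nodes(poly, alpha=math.pi / 6, beta=17 * math.pi / 18):
    """Drop nodes whose angle is sharper than alpha or smoother than beta."""
    n = len(poly)
    return [p for i, p in enumerate(poly)
            if alpha <= interior_angle(poly[i - 1], p, poly[(i + 1) % n]) <= beta]
```

With these defaults, `pixel_thresholds(0.1)` gives (2000, 10) and `pixel_thresholds(0.5)` gives (80, 2), matching the values stated above, and a collinear node (angle π) is removed by the β test.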
6. Discussion
Figure 14 and
Figure 15 demonstrate the results of this research method when applied to large-scale BIM reconstruction. As can be seen in these figures, the BIMs reconstructed by applying this study’s method exhibit neat building boundaries. Although there were errors in the reconstruction of some buildings, these could be corrected with a small amount of manual postprocessing. Compared with completely manual modeling methods, this method significantly reduces the manual correction effort and time required and improves overall work efficiency.
The proposed method can be effectively used to accomplish BIM reconstruction for most buildings, but it still has limitations in some cases as shown in
Figure 16 and
Figure 17. Due to the limitations of semantic segmentation, two closely adjacent buildings cannot be correctly separated in the initial stage and may be recognized as a single connected building, which causes errors in the final BIM reconstruction; this could be addressed by using instance segmentation in the future. In addition, the proposed regularized boundary extraction method is only applicable to regular, conventional buildings and will oversimplify buildings with circular boundaries.
In summary, the method proposed in this paper provides a feasible and highly accurate solution for large-scale BIM reconstruction, although it has limitations in the reconstruction of complex and irregular structures. The theoretical significance of this research lies in its regularized extraction of building contours from orthophotos and DSMs through the CMX network and a postprocessing-based approach, which helps to expand the ideas available for automated BIM reconstruction. Meanwhile, this research has important implications for the urban planning, facility management, and construction industries, where this automated, accurate BIM reconstruction approach can significantly increase efficiency and reduce costs.
7. Conclusions
In this study, an innovative approach to automated BIM reconstruction that utilized orthophotos with DSMs to achieve large-scale BIM reconstruction was proposed. The main conclusions are as follows.
The method proposed in this study is capable of obtaining regularized contour and elevation information of buildings from orthophotos and DSMs and of realizing automated batch reconstruction work by using Dynamo 2.1. The application to the Tianjin and Urban 3D datasets proved the effectiveness of the method, and the rate of correct reconstruction reached 85.61% and 82.93%, respectively. These results show that the method proposed in this paper represents a powerful tool for achieving high efficiency and high correctness during large-scale BIM reconstruction.
In this study, it was verified that the CMX network can fully integrate the multimodal features of orthophotos and DSMs for the semantic segmentation of building footprints, and the segmentation results were better than those of methods using RGB data alone. The IoU in building semantic segmentation with the CMX network reached 96.71% and 92.78% on the Tianjin and Urban 3D datasets, respectively. The significant improvement in this indicator sets a new benchmark for subsequent research.
The two-stage regularized building boundary extraction method proposed in this study, which uses polygon optimization and empirical rules to postprocess semantically segmented images, was able to accurately extract regularized building boundaries. It provides a new direction for subsequent regularized building boundary extraction.
The significance of this study is that it proposes a novel approach to large-scale BIM reconstruction. The method proposed in this paper can greatly reduce the time and labor required for traditional BIM reconstruction, thereby enabling more frequent updates of the BIM database and supporting the rapid development of smart city initiatives. Furthermore, the high accuracy of our reconstruction method can greatly improve the reliability of BIM data, which are crucial for urban planning, disaster management, and renovation projects.
The method proposed in this paper can complete building extraction and boundary optimization tasks. In addition, various advanced optimization algorithms [
39,
40] have previously been used as solutions in many fields such as scheduling [
41,
42] and optimization [
43]. For the decision problem involved in this study, the method proposed in this paper can be compared with other advanced optimization algorithms in future research.