Data Descriptor

Labelled Indoor Point Cloud Dataset for BIM Related Applications

1 INESC TEC, 4200-465 Porto, Portugal
2 Faculty of Engineering, University of Porto, 4200-465 Porto, Portugal
3 Grupo Casais, 4700-565 Braga, Portugal
* Author to whom correspondence should be addressed.
Data 2023, 8(6), 101; https://doi.org/10.3390/data8060101
Submission received: 18 May 2023 / Revised: 29 May 2023 / Accepted: 29 May 2023 / Published: 1 June 2023
(This article belongs to the Section Spatial Data Science and Digital Earth)

Abstract
BIM (building information modelling) has gained wide acceptance in the AEC (architecture, engineering, and construction) industry. The conversion of 3D point cloud data to vector BIM data remains a challenging and labour-intensive process, yet a particularly relevant one during various stages of a project lifecycle. While the challenges associated with processing very large 3D point cloud datasets are widely known, there is a pressing need for intelligent geometric feature extraction and reconstruction algorithms for automated point cloud processing. Compared to outdoor scene reconstruction, indoor scenes are challenging since they usually contain high amounts of clutter. This dataset comprises the indoor point cloud obtained by scanning five different enclosed spaces: two office workspaces, a workshop, a laboratory including a water tank, and a hallway. The scanned space is located at the Electrical and Computer Engineering department of the Faculty of Engineering of the University of Porto. The dataset is fully labelled, containing major structural elements like walls, floor, ceiling, windows, and doors, as well as furniture, movable objects, clutter, and scanning noise. The dataset also contains an as-built BIM that can be used as a reference, making it suitable for Scan-to-BIM and Scan-vs-BIM applications. For demonstration purposes, a Scan-vs-BIM change detection application is described, detailing each of the main data processing steps.
Dataset License: Creative Commons Attribution 4.0 International License (CC BY 4.0).

1. Summary

With the development of building information modelling (BIM) and reality capture technologies, such as laser scanning and photogrammetry, in the architecture, engineering, and construction (AEC) domain, activities such as construction progress monitoring, quality control, or even full 3D model reconstruction are becoming increasingly automated [1]. There is an ongoing effort to increase the energy efficiency levels of the building stock across the European Union [2], but unfortunately there is a general lack of information on its current state, as as-built information is frequently nonexistent. Producing this documentation is a time-consuming and costly process when using traditional manual labour. Modern data acquisition methods, such as the laser scanner, are being used to obtain the 3D geometric features of buildings, aiming to improve the accuracy of the data obtained and to reduce the acquisition time. The acquired data, a point cloud, are then used to generate the as-built model of the building. Currently, the BIM format is favoured due to the possibilities of collaboration and building lifecycle management that it enables. This process is known as Scan-to-BIM, and is an actively researched topic in the AEC and computer vision (CV) domains. The Scan-vs-BIM process is another active topic closely related to progress monitoring and quality control, and consists of automatically identifying 3D model objects by aligning a point cloud of a construction site with an as-designed 3D BIM model and comparing the two according to some metrics [3].
A topic gaining the interest of the research community in the last decade is the 3D reconstruction of indoor environments. It is a challenging process in terms of its ability to automatically provide a geometrically accurate and semantically rich indoor model. Furthermore, it also constitutes an essential process for many applications, such as indoor navigation [4], disaster management [5], and facility maintenance [6].
Indoor environment 3D reconstruction presents specific challenges due to the particularly irregular layout of the buildings and the existence of clutter and occlusions [7]. For example, inefficient scan location planning causing deficient coverage may lead to the acquisition of insufficient data about elements such as walls, floors, and windows during data collection, which can greatly affect the performance of the reconstruction algorithms. Additionally, this lack of data may hinder the determination of the topological relationships between the indoor spaces.
To address these challenges, several techniques have been proposed in recent years, as this has been an active research topic in both the AEC and CV domains. A search for related keywords in the Web of Science platform allows for the retrieval of pertinent information regarding the related scientific work carried out. Figure 1 summarizes the publication statistics for the time frame from 2010 to 2022. These keywords included permutations of “indoor reconstruction”, “layout estimation”, “modelling”, and “point cloud”. From 2012 onwards, a substantial increase in the number of publications can be seen, suggesting the growing importance of this research topic. This demonstrates the impact that datasets such as the one provided here can have on the related upcoming research.
Furthermore, in the robotics domain, the BIM data and acquired point clouds can be used to determine the position and pose of the robot by applying similar geometry extraction algorithms to the ones used in the above processes [8].
Despite this growing awareness, few publicly accessible and well-documented datasets adequate for the research and development of Scan-to-BIM and Scan-vs-BIM methodologies can be found. The EU FP7-funded Durable Architectural Knowledge (DURAARK) project was a three-year project that aimed to develop methods for the long-term preservation of building data, including 3D point clouds and BIM models as well as the related metadata, knowledge, and Web data. The project covered a wide range of processes and methods related to the domain of architectural 3D content. As of today, although these processes and methods are still documented online via OpenAIRE [9], the project's data repository, which contained datasets similar to the ones presented here, is no longer accessible [10]. It is reportedly offline, and one reason that certainly contributed to this was the decision to host the data on a non-specialized, self-hosted repository even after the end of the project.
An interesting 3D reconstruction benchmarking initiative held since 2017, named the ISPRS benchmark on indoor modelling, aimed at enabling a direct comparison of different methods for generating 3D indoor models from point cloud data by providing a public benchmark dataset and an evaluation framework. After the initiative ended, the analysis of the results reported by the authors [11] indicated that the evaluated reconstruction methods displayed different performances across the datasets, demonstrating the importance of having distinct datasets, with different features and degrees of complexity, to assess performance. Interestingly, they also state that the generated models displayed higher correctness if points corresponding to clutter were classified and filtered in a pre-processing step before geometric feature extraction. Additionally, the presence of clutter in the point cloud was identified as one of the main challenges in automated indoor modelling, as clutter (1) can easily be confused with structural elements and (2) occludes the mentioned elements, causing missing data in the point cloud. Unfortunately, after these results were published, the benchmark dataset's webpage went offline [12]. This is understandable given that the purpose of the dataset was to benchmark the reconstruction algorithms and the study with the results had already been published. Another reason that contributed to this was, as in the previous case, the decision to host the data on a non-specialized, self-hosted repository.
Another dataset is hosted in a non-specialized repository for data storage (e.g., GitHub) and presents a complete dataset description, but does not provide a labelled point cloud [13].
Overall, even when the aforementioned datasets were available, they were not labelled, and thus did not contribute to the development of CV techniques enabling object classification and efficient clutter removal. The complex interactions between objects, clutter, and occlusions create difficulties for feature extraction algorithms, especially in indoor environments, making this dataset valuable to the CV community.
The data made available here were acquired within the Digital Construction Revolution—REV@CONSTRUCTION—project, and served as one of the use cases for validating a Scan-vs-BIM application developed within the same project (see Section 6).
This data descriptor contributes to the goal of having a reference dataset by presenting an indoor mapping dataset and information about the major elements present in the scene, such as walls, floor, ceiling, windows, and doors, as well as furniture, movable objects, and clutter. In addition, the diverse features of the sampled spaces (e.g., designed use of the spaces, type of objects, excessive clutter, type of windows, and size) further add to this value.
Datasets such as the one provided here are particularly useful for:
  1. Validating Scan-to-BIM and Scan-vs-BIM algorithms:
These algorithms involve processing the acquired point clouds in order to extract structural features for modelling purposes. As mentioned, there is a growing interest in automated reconstruction techniques, and datasets including high-quality, high-resolution point clouds and corresponding BIM models are required for validating and benchmarking them.
  2. Tackling one of the main challenges in indoor reconstruction—clutter and occlusions:
A distinctive aspect in indoor modelling is the large amount of clutter and occlusion that may be present in the scene. Recent literature [14,15,16] identifies this as one of the main issues faced by indoor reconstruction algorithms. This is the only dataset found online with labelled elements and clutter, which allows testing of new techniques to deal with this issue.
  3. Development of BIM-based path planning and localization algorithms, as well as perception systems for mobile robots:
No standard format to transfer structural knowledge to a robot is found in the literature, with BIM being a good candidate. BIM models enable the complete transfer of detailed information about the structure of a building in a structured manner. Robotic systems can use this information for evaluating path planning techniques, generating optimal paths that enable them to navigate through the building space. Additionally, through simulation, these models can be used as a reference to enable a robot to localize itself, accurately determining its position and orientation within the building, and to validate procedures that enable it to perform its tasks [17]. Furthermore, the labelled high-resolution point cloud enables the creation of a semantically rich Digital Twin which can be used to test perception systems [18,19].
The rest of this document is organized as follows. In Section 2 a brief background on Scan-to-BIM and Scan-vs-BIM applications is given. Section 3 provides the description of the created dataset. Then, Section 4 describes the executed methodology to generate and pre-process the dataset, mentioning the sensor and software used. Next, Section 5 presents a discussion on the challenges faced to obtain the dataset and provides a few insights into scanning different types of buildings. Section 6 presents a common use case for this dataset, automated change detection. Finally, Section 7 discusses the potential research applications.

2. Background

BIM can be regarded as a process responsible for generating and managing building information during its entire lifecycle. The diverse information contained in a BIM model, together with its 3D representation ability, makes it possible to enhance the design, construction, and maintenance phases of the corresponding building. One key factor that enables a successful BIM implementation is the existence of accurate information in the BIM model. However, the contained information is often inaccurate, out of date, or even missing. Modern sensing technologies, such as 3D laser scanning, have been widely adopted in the AEC domain as a means of capturing this information. In essence, two BIM-related processes use the acquired point clouds to tackle these issues:
  • Scan-to-BIM: the process of transforming point cloud data into actual BIM models;
  • Scan-vs-BIM: the process of comparing point cloud data representing an as-built building to its as-designed BIM model in order to identify differences.
Currently, both processes are predominantly manual. While the acquired point clouds are used as accurate references for creating models instead of unstructured manual measurements, improving modelling efficiency, the actual modelling phase remains a monotonous, error-prone, and time-consuming manual process. Such processes are significantly influenced by the expertise of the modellers, which cannot always be guaranteed. Hence, current research has been focusing on automating the steps involved in the modelling phase [20].
The current Scan-to-BIM process adopted by the community involves manually feeding the point cloud data into BIM development applications such as Autodesk Revit, Bentley, or Graphisoft ArchiCAD, and then designing the BIM models according to those data. Although manageable for simple projects, it becomes time-consuming and error-prone for large-scale projects with many structural elements and complex geometries. This led the research community to come up with semi-automated or automated Scan-to-BIM methodologies to overcome these limitations. In particular, as-built BIM reconstruction involves two distinct steps, namely the modelling of building elements and the modelling of non-geometric attributes. In the first step, geometric modelling is performed and objects in the scene are recognized. While geometric modelling models individual structural elements from the point cloud data (e.g., plane segmentation), object recognition algorithms label these data into specific classes such as wall, ceiling, floor, window, etc. Thorough reviews of the relevant techniques for automated Scan-to-BIM can be found in [21,22]. Bosché et al. [3] introduced the term “Scan-vs-BIM”, naming a methodology that compares an as-built point cloud to an as-designed BIM model. The generated outputs are a group of BIM elements covered by the point cloud (therefore classified as existing) and another group which is not covered (therefore classified as missing). If the results are accurate, the management of the construction operation is facilitated, as deviations from the original construction design and delays in the schedule plan can be automatically recognized and communicated. Since then, other researchers have demonstrated the applicability of Scan-vs-BIM processes for tracking construction progress and quality control [23,24,25]. A state-of-the-art review of point cloud-based change detection techniques was recently presented by Stilla [26].
In summary, the use cases of both the Scan-to-BIM and Scan-vs-BIM processes collectively illustrate the importance of datasets containing both 3D point clouds and corresponding BIM models for algorithm benchmarking purposes. The dataset described next thus constitutes a valuable contribution to the field, enabling additional tests to be performed in order to demonstrate the efficiency and robustness of the mentioned algorithms.

3. Data Description

The data collected were obtained by scanning five different enclosed spaces using a terrestrial laser scanner (TLS). The floor plan is represented in Figure 2, showing the layout of the scanned area. The dataset includes the registered and labelled point cloud and the as-built BIM model of these spaces. The different data subsets are described in detail below. Table 1 gives a brief description of the files in the dataset.

3.1. Point Cloud

A fully registered point cloud (seen in Figure 3), combining the individual point clouds obtained from 21 scans, is provided as an ASCII file in the PLY format. This format is very versatile for storing point cloud data, as it registers individual point coordinates, colour (RGB), intensity, and other attributes such as the individual point labels. Table 2 describes the column variables in the provided file.
ASCII files can be easily read using a simple text reader, and further visualized and processed with common software programs for point cloud processing such as CloudCompare [27] or the Autodesk Software Suite (loaded using Recap and exported to Revit or Civil 3D). The main advantage of this format is that it is widely supported and can be easily integrated into point cloud processing pipelines, although at the cost of a larger file size, since the data are not compressed. Table 3 describes the number of scans taken and 3D points for each space, while Table 4 describes the labelled point data contained in the file.
Unsurprisingly, the building’s structural elements were sampled far more extensively than the rest due to their larger visible surface area.
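As an illustration, such an ASCII file can be parsed with a few lines of code. The sketch below assumes NumPy is available and uses a hypothetical column order (`x y z r g b intensity label`); the actual layout of the provided file is the one given in Table 2.

```python
import io
import numpy as np

# Hypothetical column order; the actual layout is described in Table 2.
COLUMNS = ["x", "y", "z", "r", "g", "b", "intensity", "label"]

def load_ascii_cloud(source):
    """Load a whitespace-separated ASCII point cloud into a column dict."""
    data = np.loadtxt(source)
    return {name: data[:, i] for i, name in enumerate(COLUMNS)}

# Tiny in-memory example standing in for the real (much larger) file:
sample = io.StringIO("0.0 0.0 0.0 255 0 0 0.5 1\n"
                     "1.0 0.0 0.0 0 255 0 0.7 2\n")
cloud = load_ascii_cloud(sample)
print(cloud["label"])  # per-point labels
```

The same routine loads the real file by passing its path instead of the `StringIO` object, at the cost of memory proportional to the file size.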

3.2. BIM Model

To carry out the BIM modelling of the INESC TEC laboratories, manual measurement techniques were adopted, using a tape measure and a laser measurer to determine the dimensions of the spaces and existing fixed furniture. In the initial stage, the metric survey of the floor plan of the space was carried out in the Autodesk AutoCAD software, in order to simplify the information and make the measurements of the existing floor plan of the project compatible with the actual measurements of the building. Then, the 3D BIM model was produced in the Autodesk Revit software, using the floor plan information from AutoCAD as the basis for the modelling, considering all the measurements made of the structural elements (floor, lining, and wall) and all the openings and frames (windows and doors). In the final stage, the existing furniture was modelled individually in order to represent the space as reliably as possible for its subsequent analysis. Finally, the model was exported to the IFC 2x3 format for distribution. Figure 4 represents the as-built BIM model provided in the dataset.

4. Methods

The laser scans were taken in March 2023 in the Electrical and Computer Engineering Department of the Faculty of Engineering of the University of Porto, Portugal (where one of the INESC TEC robotic laboratories is located). They were obtained using a Leica BLK360 G2 laser scanner, seen in Figure 5, using the highest resolution available (5 mm at 10 m). The scanning positions were chosen to capture the complete geometric features of the spaces, although the many objects present made this difficult by causing occlusions.
By guaranteeing some overlap between pairs of point clouds, the registration error was minimized. The 21 acquired point clouds were registered using Leica’s Cyclone Register 360, and the average error reported was 3 mm. Some cropping had to be done before registration due to the laser light distortion caused by the windows, as the original point cloud erroneously represented parts of reflected objects on both sides of them. Figure 6 illustrates both the scanning positions and the registration links defined within the mentioned software. The following step involved the pre-processing of the point cloud using CloudCompare, and consisted of further cropping and segmentation. CloudCompare was used for two main reasons: (1) it is a popular open-source software for processing point clouds and (2) it allows point labelling (meaning that it attributes a label to each point in the cloud). Cropping was performed to remove outliers and other erroneous data, probably acquired due to the high reflectivity of some of the surfaces, which caused the laser light to scatter. Point cloud labelling is a highly time-consuming and meticulous process which involves segmenting the point cloud around the features of interest and labelling all the points within each segmented set. In this case, we defined labels for the main structural elements of the building and for the main objects found in the scene. All the points that did not fit within one category were left unassigned. Due to the added labels, the point cloud was exported in the ASCII format (instead of the original E57 format delivered by Leica’s Cyclone Register 360), which unfortunately increases its size considerably.

5. Discussion

The acquisition of 3D point clouds presents many challenges related to the requirements imposed by the applications that will consume them. Changes in unconstrained environments and in surface materials can have a considerable impact on the appearance of objects. While all sensor measurements are affected by noise, laser scanner data are additionally degraded by artifacts that manifest themselves as outlying points in the point cloud. Such artifacts may be caused by surface materials with non-diffuse reflection properties, causing low-energy backscattering to be returned to the sensor, which may be insufficient to trigger a distance measurement (e.g., windows, water). An issue may arise when other adjacent scans with a high degree of overlap acquire different data in these regions. During our data acquisition activities, similar artifacts were present in the point cloud, forcing us to execute a pre-processing analysis to remove them before effectively using it within the proposed use case scenario. As a consequence of this data pruning, additional scans needed to be taken in order to enable good registration performance (sub-centimetre error) between adjacent point clouds taken in those regions.
Another challenge faced during data acquisition was dealing with occlusions, especially considering the nature of this indoor space: constrained and filled with equipment and furniture. Occlusions cause gaps in the point cloud, making it incomplete in the sense that not every element in that space is conveniently sampled. Given the limited size of the described indoor spaces, the occlusions were managed (when possible) by performing additional scans around the objects or structural elements that caused them. Considering the limited energy capacity of the laser scanner, the total data acquisition time, and the resources required to process the resulting point cloud (which includes redundant data), such a tactic may not be adequate for scanning larger spaces. By executing a scan planning procedure to determine the most suitable scanning locations, occlusions can be minimized and subsequent point cloud registration performance can be improved by ensuring optimal coverage overlap.
Planning sensor locations and views based on 3D data, also known in the literature as next-best-view (NBV) planning, is an active research topic. NBV approaches can be distinguished as model based or non-model based [28]:
  • Model-based approaches take advantage of a pre-existing model of the structures to be sampled, with some level of fidelity, to plan the views (also known as offline approaches).
  • Non-model-based algorithms select views in real time (also known as online approaches) while 3D information is being acquired, since no a priori information is given about the structures.
Thus, considering that prior information about the environment is incomplete, these approaches can be complementary to each other: plan the scan locations and views with the information available beforehand, and then adjust the scan plan as new information is collected and processed. Online planning involves incrementally incorporating collected information into a model (volume or surface based), and evaluating it according to some performance metric (e.g., point density). Candidate views can be evaluated using visibility and quality criteria, such as the angle of incidence with the surface normal [29] or the range and overlap with previous scans [30].
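As a toy illustration of one such quality criterion, the sketch below scores a candidate scanner position by the mean cosine of the angle of incidence over the surface points it would observe. It assumes NumPy and known surface normals; the function name and geometry are ours, not taken from the cited works.

```python
import numpy as np

def incidence_score(scanner_pos, points, normals):
    """Mean cosine of the angle of incidence between the scanner's rays
    and the surface normals at the sampled points (1.0 = head-on)."""
    rays = points - scanner_pos                       # scanner -> surface
    rays /= np.linalg.norm(rays, axis=1, keepdims=True)
    cosines = np.abs(np.sum(rays * normals, axis=1))  # |ray . normal|
    return float(cosines.mean())

# Toy wall patch lying in the y-z plane (normals along +x)
points = np.array([[2.0, 0.0, 0.0], [2.0, 1.0, 0.0]])
normals = np.tile([1.0, 0.0, 0.0], (2, 1))
head_on = incidence_score(np.array([0.0, 0.0, 0.0]), points, normals)
oblique = incidence_score(np.array([2.0, 5.0, 0.0]), points, normals)
print(head_on > oblique)  # frontal candidate views score higher
```

In a full planner, such a score would be combined with visibility, range, and overlap criteria before selecting the next scan location.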
Online scan planning approaches offer convenient ways to address the challenges brought by existing clutter and occlusions, increasing the efficiency of the data acquisition process by minimizing the acquisition of redundant data. For example, Quintana et al. [31] presented a scan planning methodology that used voxelization to reduce data redundancy and labelled voxels according to the contained data. First, a binary support vector machine classifier was used to identify points belonging to structural elements, with all the remaining points being considered clutter. Occupied voxels contained either structural points or clutter, while non-occupied voxels could be genuinely empty (sensed but containing no data) or occluded by clutter or structural elements. Next, their algorithm estimates how many of the occluded-structure voxels would become structure voxels if a candidate scan were effectively taken, thereby optimizing the scan plan.
The scan planning problem has specific requirements when applied to heritage sites. Initially, the adequate sensor resolution (or point density) and measurement accuracy should be established taking into account the required level of detail to fully capture the target surface features that should be preserved [32]. The measurement accuracy is affected not only by the scanner’s specifications, but also by the scanning geometry which is the position of the scanner relative to the target surfaces to be sampled. Coverage issues easily arise when scanning these types of buildings due to their features and geometry (existence of balconies, recessed windows, or steep angles), and in some cases laser scanning systems may be deemed inadequate [32]. Vilariño et al. [33] proposed a scan planning algorithm that considered the requested level of detail, overlap ratio, coverage, and angle of incidence. Initially, the space occupied by the building is delineated by analysing aerial images. Then, the space is discretized to facilitate visibility analysis from candidate scan positions. To avoid time-consuming analysis of high-density grids required for this type of building, the authors used a triangulation-based space subdivision considering the actual delimiting geometry of the building. Candidate scan positions are evaluated by performing visibility analysis using ray tracing. For larger sites, the authors suggest evaluating a combination of terrestrial and aerial data acquisition systems.

6. Use Case: Scan-vs-BIM Application for Change Detection

With the arrival of BIM, construction quality assessment has evolved from traditional methods, based on periodic site visits by architects or engineers who validate the finished structure by comparing it with the 2D design plans, towards a digital and automated process. Construction quality can typically be assessed by comparing the as-planned BIM model to the acquired point cloud, obtained by scanning the target structure. In the literature, this comparison is performed by registering the point cloud to the BIM model (since they do not share the same coordinate reference frame), and detecting differences such as the existence of certain elements in the BIM which are not covered by the point cloud or the existence of geometric features in the point cloud which are not present in the BIM. To more accurately describe and demonstrate this procedure, an application for automated change monitoring is discussed here. Each phase of the procedure is detailed in order to highlight the main challenges faced in this type of application. Figure 7 illustrates the proposed approach, while Figure 8 gives a graphical representation of the BIM model after it is loaded by this application.
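The comparison step just described can be sketched minimally as follows, assuming NumPy, an already-registered point cloud, and reference points sampled on each BIM element's surface. The function name, thresholds, and brute-force nearest-neighbour search are illustrative only, not the application's actual implementation.

```python
import numpy as np

def element_covered(element_samples, scan_points, tol=0.02, min_ratio=0.5):
    """Classify a BIM element as 'existing' if at least `min_ratio` of the
    points sampled on its surface lie within `tol` metres of a scan point.
    Brute-force nearest neighbour; fine for small illustrative inputs."""
    d = np.linalg.norm(element_samples[:, None, :] - scan_points[None, :, :],
                       axis=2)
    covered = d.min(axis=1) <= tol
    return bool(covered.mean() >= min_ratio)

scan = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 2.0, 0.0]])
wall = np.array([[0.0, 0.5, 0.0], [0.0, 1.5, 0.0]])   # near the scan
beam = np.array([[5.0, 0.0, 0.0], [5.0, 1.0, 0.0]])   # far from any point

print(element_covered(wall, scan, tol=0.6))   # True  -> classified as existing
print(element_covered(beam, scan, tol=0.6))   # False -> classified as missing
```

Real applications replace the brute-force search with a spatial index (e.g., a k-d tree) to handle millions of points.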

6.1. Data Pre-Processing

Point clouds acquired from laser scanners by sampling structures or buildings can easily contain millions of data points and occupy several gigabytes of memory, depending on the number of scans taken and the range and resolution of the scanning sensor. When processing point cloud data, there is a clear trade-off in terms of sampling accuracy and required processing time when considering all of the acquired data points versus only a small subset of the data. The challenge here is to reduce the amount of data as much as possible, while retaining representative data that allow subsequent feature extraction with minimal loss of detail. To address this challenge, multiple solutions exist aiming to select the most representative data, minimizing the memory requirements and computational load. Three main metrics can be used to describe the information in point cloud data [34]:
  • point density, which defines the number of points within a unit area;
  • measurement uncertainty, which refers to the standard deviation of the shortest distance between points and the target surfaces; and
  • occlusions, which are the unsampled regions of the target surface.
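The first of these metrics can be estimated, for a roughly planar surface, by projecting the points onto the surface plane and counting them per grid cell. A rough sketch assuming NumPy (the cell size and function name are illustrative):

```python
import numpy as np

def point_density(points_2d, cell=0.1):
    """Mean number of points per unit area, estimated over the occupied
    cells of a 2D grid laid on the projected (planar) points."""
    ij = np.floor(points_2d / cell).astype(int)
    _, counts = np.unique(ij, axis=0, return_counts=True)
    return counts.mean() / cell**2   # points per occupied cell -> per m^2

# 100 points uniformly filling a 1 m x 1 m patch on a 0.1 m lattice
xs, ys = np.meshgrid(np.arange(0.05, 1.0, 0.1), np.arange(0.05, 1.0, 0.1))
patch = np.column_stack([xs.ravel(), ys.ravel()])
print(point_density(patch, cell=0.1))  # ~100 points per square metre
```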
When using static sensors like a TLS, commonly used in the AEC domain, the sensor position is usually accurately known, and the measurement uncertainty can be disregarded knowing that the laser scanners on the market today have extremely low measurement errors [35]. On the other hand, when using mobile platforms such as robots, then the localization uncertainty needs to be accounted for together with the sensor measurement error if a low-cost sensor is used (mobile robots have limited onboard energy resources which usually leads to the integration of low energy and low-cost sensors with higher measurement error).
Nevertheless, point clouds can be very large and one may not be able to load them into the system’s memory all at once. A simple strategy would be to increase swap memory to account for the point cloud and related data structures. Another would be to use space partitioning techniques to process segments of the point cloud in isolation when possible.
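The latter strategy can be sketched as a simple horizontal grid partition, again assuming NumPy (the tile size and function name are arbitrary choices for illustration):

```python
import numpy as np

def grid_partition(points, cell=2.0):
    """Split a cloud into spatial tiles so each tile can be processed
    (and released) independently instead of holding everything at once."""
    keys = np.floor(points[:, :2] / cell).astype(int)   # partition in x-y
    uniq, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.reshape(-1)
    return [points[inv == i] for i in range(len(uniq))]

pts = np.array([[0.1, 0.1, 0.0], [0.2, 0.3, 0.0], [3.5, 0.1, 0.0]])
tiles = grid_partition(pts, cell=2.0)
print([len(t) for t in tiles])  # two tiles: one with 2 points, one with 1
```

In practice, each tile would be streamed from disk in turn rather than partitioned from an in-memory array.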

6.1.1. Density Reduction

Point density determines the sampling level, and therefore the amount of detail that a point cloud is able to hold from a target surface. Several methods exist for point cloud density reduction, with two of the most common being:
  • Voxelization: divides the point cloud into small cubes (known as voxels), each containing a subset of the points. Density reduction is achieved by keeping only one point per voxel. This point can be chosen according to different principles, such as the centroid of the voxel or the proximity to the centre of the voxel.
  • Minimum distance between points: removes points that are below a given minimum distance from each other. This value can be chosen based on the desired point density and level of detail. It can also be used with voxelization to further reduce density.
Voxelization is usually the faster method, as it involves a simple spatial partitioning operation. Nevertheless, it can lead to loss of detail and accuracy, especially if the voxel size is too large [36]. On the other hand, the minimum distance method is able to better preserve the original shape of the surface, but it can be slower and more computationally intensive. Hence, a decision on which method to use should be based on the target type of structure to be scanned. For this demonstration, the minimum distance method was adopted.
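Both methods can be sketched in a few lines, assuming NumPy. The voxel size, minimum distance, and function names below are illustrative, and the greedy filter is quadratic — it only conveys the idea, not a production implementation.

```python
import numpy as np

def voxel_downsample(points, voxel=0.05):
    """Keep one representative point per occupied voxel (here, the centroid)."""
    keys = np.floor(points / voxel).astype(int)
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.reshape(-1)
    sums = np.zeros((inv.max() + 1, 3))
    np.add.at(sums, inv, points)          # accumulate points per voxel
    counts = np.bincount(inv)[:, None]
    return sums / counts                  # per-voxel centroids

def min_distance_filter(points, d_min=0.05):
    """Greedily drop points closer than `d_min` to an already-kept point."""
    kept = []
    for p in points:
        if all(np.linalg.norm(p - q) >= d_min for q in kept):
            kept.append(p)
    return np.array(kept)

dense = np.array([[0.0, 0.0, 0.0], [0.001, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(len(voxel_downsample(dense, voxel=0.05)))     # 2: first two share a voxel
print(len(min_distance_filter(dense, d_min=0.05)))  # 2: second point is dropped
```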

6.1.2. Geometry Extraction

Data reduction can also be performed by removing redundant sections of the point cloud. In a Scan-vs-BIM application, where the objective is to align the point cloud with a BIM model, it makes sense to remove the unmodelled elements normally present in cluttered environments, such as people, vegetation, and furniture. In addition to simplifying the point cloud, this also improves point cloud registration performance, as automated registration in heavily cluttered environments remains a challenge [16]. The segmentation of redundant elements can be performed directly or indirectly:
  • The direct method actually detects and identifies the clutter in the scene, segmenting it from the data.
  • The indirect method detects and segments the structural elements in the scene, removing everything else which does not fit in that category.
As the description implies, the direct method requires accurate isolation of objects in the point cloud, which is a time-consuming process. Several approaches exist in the literature for finding 3D objects automatically. Sharif et al. [37] implemented a three-step methodology: first, an offline model library is created from available models; then, an online searching and matching step is performed, followed by match refinement and isolation using the iterative closest point (ICP) algorithm. The last step was included to reduce the number of false positives (objects detected erroneously), which the authors found to increase with the amount of clutter present in the environment. Following the current trend in the CV domain, Chen et al. [38] proposed a deep learning framework to automatically detect and classify both building elements and other objects; they use the PointNet neural network architecture to predict the class label of each object segment in the point cloud. Guo et al. [39] presented a state-of-the-art review of existing local surface feature-based 3D object recognition methods, providing a general description of each step of the process.
The indirect method, on the other hand, consists of detecting structural elements such as walls, ceiling, and floor, segmenting them from the point cloud and treating everything else as clutter. In non-heritage buildings, these elements are typically planar (in fact, they should correspond to the major planes in the scene), so plane extraction algorithms can be employed to extract the corresponding points from the cloud. The random sample consensus (RANSAC) algorithm [40] is commonly used for this purpose. One disadvantage of this methodology, compared to the direct method, is that RANSAC performance degrades with excessive clutter in the scene, so structural elements might not be accurately detected. One way to mitigate this is to only allow RANSAC to accept candidate planes whose normal deviates by less than a given angular threshold from one of the Manhattan-world directions.
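A minimal RANSAC plane extractor with the Manhattan-world normal constraint described above might look as follows. This is a sketch using NumPy; the thresholds, the reference axis, and the iteration count are illustrative assumptions, not values used in this demonstration:

```python
import numpy as np

def ransac_plane(points, dist_thresh=0.02, max_angle_deg=10.0,
                 axis=(0.0, 0.0, 1.0), iters=500, rng=None):
    """Fit a plane with RANSAC, accepting only candidates whose normal
    deviates less than max_angle_deg from a Manhattan-world axis."""
    rng = np.random.default_rng(rng)
    axis = np.asarray(axis) / np.linalg.norm(axis)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p2 - p1, p3 - p1)
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue                      # degenerate (collinear) sample
        n /= norm
        # Manhattan-world constraint on the candidate normal
        if np.degrees(np.arccos(min(abs(n @ axis), 1.0))) > max_angle_deg:
            continue
        inliers = np.abs((points - p1) @ n) < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

# Synthetic scene: a noisy floor plane plus uniform clutter
rng = np.random.default_rng(1)
floor = np.column_stack([rng.uniform(0, 5, (2000, 2)),
                         rng.normal(0, 0.005, 2000)])
clutter = rng.uniform(0, 5, (500, 3))
pts = np.vstack([floor, clutter])
inliers = ransac_plane(pts)
```

Rejecting candidate planes before the (costly) inlier count is what makes the angular constraint cheap: misoriented samples drawn from clutter are discarded without scoring.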
Naturally, if the model contains curved surfaces, as is typical in heritage buildings, then other geometric feature extraction methodologies need to be adopted.

6.2. Point Cloud Registration

To match the input point cloud to the BIM model, a process known as registration is performed. The ICP algorithm, one of the most widely used registration algorithms [41], iteratively adjusts the alignment of a point cloud until the distances between its points and their closest points in the target point cloud are minimized. Although robust, its convergence basin is narrow, and it can easily fall into a local optimum. Since point-to-point correspondences need to be evaluated in each iteration, the computation time is proportional to the number of points involved. To reduce the probability of converging to a local optimum, a hierarchical ICP approach can be adopted: the initial iterations are performed on a coarse (down-sampled) point cloud to obtain an initial alignment, which is then used to promote convergence on a denser point cloud. Figure 9 displays the point cloud after coarse registration to the BIM model, while Figure 10 displays a point cloud registered with ICP (a coarse-to-fine registration procedure was adopted).
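The coarse-to-fine ICP idea can be sketched as follows, using a standard point-to-point ICP with the Kabsch/SVD rigid alignment step and a SciPy Kd-tree for correspondence search. This is a simplified illustration under the assumption of overlapping clouds, not the exact implementation used in this demonstration:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iters=30):
    """Point-to-point ICP; returns R, t such that source @ R.T + t
    aligns with target."""
    R_tot, t_tot = np.eye(3), np.zeros(3)
    src = source.copy()
    tree = cKDTree(target)
    for _ in range(iters):
        _, nn = tree.query(src)               # closest target point per point
        corr = target[nn]
        # Best-fit rigid transform between current src and correspondences
        mu_s, mu_t = src.mean(0), corr.mean(0)
        H = (src - mu_s).T @ (corr - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:              # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t
        R_tot, t_tot = R @ R_tot, R @ t_tot + t
    return R_tot, t_tot

def hierarchical_icp(source, target):
    """Coarse alignment on a down-sampled cloud, then dense refinement."""
    R, t = icp(source[::10], target, iters=15)
    R2, t2 = icp(source @ R.T + t, target, iters=15)
    return R2 @ R, R2 @ t + t2

# Demo: recover a small rigid misalignment between identical clouds
rng = np.random.default_rng(2)
target = rng.uniform(0, 1, (1500, 3))
a = 0.03                                      # small rotation about z (rad)
Rz = np.array([[np.cos(a), -np.sin(a), 0],
               [np.sin(a),  np.cos(a), 0],
               [0, 0, 1]])
source = target @ Rz.T + np.array([0.02, -0.01, 0.01])
R, t = hierarchical_icp(source, target)
aligned = source @ R.T + t
```

Running the first iterations on every tenth point gives a cheap initial estimate, so the dense pass starts close enough to the optimum to avoid many local minima.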

6.3. Classification

After aligning the point cloud with the model, point classification can be performed. This step determines the correspondence of each point to an element in the model and is based on the Euclidean distance metric: for each point, the closest element in the model is determined, with a maximum distance threshold used to discard unmodelled features. Due to the size of the input point cloud, this process can be computationally heavy. The fundamental problem is finding the nearest neighbour (NN) of a query point in 3D space given a reference point cloud (in this case synthesized from the model). Since this involves determining the NN of every point in the original point cloud, Kd-trees can be used to speed up the search. These perform a hierarchical decomposition of the space along each dimension; for low dimensions, they support NN queries in logarithmic expected time and linear space [42].
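A minimal version of this Kd-tree-based classification step might look as follows (a sketch; the function name, the label values, and the distance threshold are illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

def classify_points(cloud, model_points, model_labels, max_dist=0.05):
    """Label each scan point with the label of the nearest model point,
    or -1 when no model point lies within max_dist (unmodelled feature)."""
    tree = cKDTree(model_points)              # O(n log n) build
    dist, nn = tree.query(cloud, distance_upper_bound=max_dist)
    labels = np.full(len(cloud), -1, dtype=np.int64)
    hit = np.isfinite(dist)                   # query returns inf beyond bound
    labels[hit] = model_labels[nn[hit]]
    return labels

# Tiny demo: two model points standing in for sampled BIM elements
model_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
model_labels = np.array([3, 4])               # e.g. 3 = wall, 4 = door
cloud = np.array([[0.01, 0.0, 0.0],           # near the first model point
                  [0.98, 0.01, 0.0],          # near the second
                  [5.0, 5.0, 5.0]])           # unmodelled feature
labels = classify_points(cloud, model_points, model_labels, max_dist=0.1)
```

The `distance_upper_bound` argument folds the maximum distance threshold into the tree query itself, so out-of-range points are rejected during the search rather than in a separate pass.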

6.4. Change Detection

Unmodelled features are detected by analysing the unclassified points in the cloud. These can be clustered and presented to the user as unidentified objects. Additionally, CV algorithms can be used to classify the objects themselves.
Furthermore, the degree of correspondence between each BIM element and associated point cloud subset (classified in the previous phase) can be measured by estimating the ratio between the area of the sampled surfaces of the BIM elements and the total visible surface area of those elements. This enables the calculation of a global metric which informs the user on how close the built structure is to the as-designed one. Figure 11 illustrates how this information can be presented to the user.
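One possible way to approximate the sampled-surface-to-visible-surface ratio for a near-planar BIM element is to project its classified points onto the element's dominant plane and count occupied grid cells. This is a sketch under the assumption of near-planar elements; the cell size and the PCA-based projection are illustrative choices, not the metric computation used in this demonstration:

```python
import numpy as np

def surface_coverage(element_points, element_area, cell=0.05):
    """Approximate the sampled area of a near-planar BIM element by
    counting occupied grid cells in its dominant plane, then report
    the ratio to the element's total visible area."""
    if len(element_points) == 0:
        return 0.0
    # Project onto the two principal directions of the point set (PCA)
    centred = element_points - element_points.mean(0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    uv = centred @ Vt[:2].T                   # in-plane coordinates
    cells = np.unique(np.floor(uv / cell).astype(np.int64), axis=0)
    sampled = len(cells) * cell * cell        # occupied cells -> sampled area
    return min(sampled / element_area, 1.0)

# Demo: a fully sampled 1 m x 1 m wall patch at 1 cm spacing
g = np.arange(0.0, 1.0, 0.01)
xx, yy = np.meshgrid(g, g)
wall = np.column_stack([xx.ravel(), yy.ravel(), np.zeros(xx.size)])
cov_full = surface_coverage(wall, element_area=1.0)
cov_half = surface_coverage(wall[wall[:, 0] < 0.5], element_area=1.0)
```

Averaging such per-element ratios, weighted by element area, would yield the kind of global completeness metric described above.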

7. Potential Research Applications

A labelled point cloud dataset of an indoor environment, such as the one provided here, has several research applications in various fields, including CV, robotics, machine learning, and architecture:
  • Developing and training object recognition and tracking systems [43]. For example, it can be used to train machine learning algorithms to recognize and track furniture, appliances, and other objects.
  • Creating detailed maps of indoor environments [44] and developing algorithms for indoor navigation [45]. This can be useful for applications such as robotics, indoor localization systems, and virtual reality.
  • Developing automatic 3D reconstruction methodologies for creating 3D models of indoor spaces, which can be used for architectural design for example [46].
Furthermore, the combination of labelled point cloud data with a BIM model is of particular interest in the AEC domain:
  • Developing new construction progress monitoring techniques, matching the acquired point cloud to the BIM model, and estimating the degree of completeness of the construction [47].
  • Developing change detection algorithms [48], which can be used for construction quality assessment. These can be used to track the evolution of a construction through time, or to detect differences between the as-planned design and the actual as-built state.

8. Conclusions

This paper introduced a new dataset comprising a fully labelled point cloud and BIM model of an indoor space at the INESC TEC laboratories located in the Faculty of Engineering of the University of Porto, Portugal. The data collection was performed using a Leica BLK360 G2 TLS, and a total of 584,701,977 points were acquired from 21 laser scans. These scans were carefully distributed across the indoor environment, guaranteeing line of sight between pairs of scans to maximize point cloud overlap and, as a consequence, enhance registration performance. The data were pre-processed to remove unwanted noise (e.g., distortion introduced by the windows), avoiding the need for further processing prior to use. All the relevant points corresponding to structural elements were labelled. In addition, points corresponding to typical objects found in similar environments were also labelled accordingly, as these can play an important role in assessing the performance of CV algorithms for clutter detection and object segmentation in BIM related applications. In total, 33 types of objects were identified in the dataset. The dataset also includes a complete BIM model of the space, including relevant furniture.
The collected dataset has pertinent real-world applications in the AEC and CV domains, namely Scan-to-BIM and Scan-vs-BIM applications. While the former involves extracting geometric features from the point cloud to model BIM elements, the latter involves comparing the acquired point cloud to the as-designed BIM model to identify differences from the original design plan or construction schedule. As this description implies, representative datasets containing BIM models and point clouds are essential for the development and validation of such approaches, yet they are not widely available. This is the main reason why the dataset provided here will be of great use to the research community in the aforementioned domains. Furthermore, some insights are given into specific potential research applications, ranging from object recognition techniques to 3D reconstruction and change detection algorithms.

Author Contributions

Laser scanner data acquisition: N.A.; point cloud processing: N.A.; point cloud labelling: N.A.; local distance measuring: R.S.; BIM modelling: R.S.; writing—N.A. and R.S.; review: A.P., A.M. and M.P.; Funding acquisition: A.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) and Lisbon Regional Operational Programme (ROP Lisbon) [Project No. 046123; Funding Reference: POCI-01-0247-FEDER-046123 and LISBOA-01-0247-FEDER-046123].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available at the Zenodo research data repository at https://doi.org/10.5281/zenodo.7948116 (accessed on 31 May 2023).

Acknowledgments

We thank the support given by Grupo Casais during the execution of the mentioned research project.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Omar, T.; Nehdi, M. Data acquisition technologies for construction progress tracking. Autom. Constr. 2016, 70, 143–155. [Google Scholar] [CrossRef]
  2. European Commission. Directive (EU) 2018/844 of the European Parliament and of the Council; 30 May 2018. Available online: https://eur-lex.europa.eu/legal-content/en/TXT/?uri=CELEX%3A32018L0844 (accessed on 31 March 2023).
  3. Bosché, F.; Guillemet, A.; Turkan, Y.; Haas, C.T. Tracking the built status of MEP works: Assessing the value of a Scan-vs-BIM system. J. Comput. Civ. Eng. 2014, 28, 05014004. [Google Scholar] [CrossRef]
  4. Isikdag, U.; Zlatanova, S.; Underwood, J. A BIM-Oriented Model for supporting indoor navigation requirements. Comput. Environ. Urban Syst. 2013, 41, 112–123. [Google Scholar] [CrossRef]
  5. Nikoohemat, S.; Diakité, A.; Zlatanova, S.; Vosselman, G. Indoor 3D reconstruction from point clouds for optimal routing in complex buildings to support disaster management. Autom. Constr. 2020, 113, 103109. [Google Scholar] [CrossRef]
  6. Chen, W.; Chen, K.; Cheng, J.C.; Wang, Q.; Gan, V.J. BIM-based framework for automatic scheduling of facility maintenance work orders. Autom. Constr. 2018, 91, 15–30. [Google Scholar] [CrossRef]
  7. Naseer, M.; Khan, S.; Porikli, F. Indoor scene understanding in 2.5/3d for autonomous agents: A survey. IEEE Access 2018, 7, 1859–1887. [Google Scholar] [CrossRef]
  8. Moura, M.S.; Rizzo, C.; Serrano, D. Bim-based localization and mapping for mobile robots in construction. In Proceedings of the 2021 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), Santa Maria da Feira, Portugal, 28–29 April 2021. [Google Scholar]
  9. DuraARK EU Project Results. 2016. Available online: https://cordis.europa.eu/project/id/600908/results (accessed on 14 May 2023).
  10. DURAARK Datasets. Available online: http://data.duraark.eu/ (accessed on 31 March 2023).
  11. Khoshelham, K.; Tran, H.; Acharya, D.; Vilariño, L.; Kang, Z.; Dalyot, S. Results of the ISPRS benchmark on indoor modelling. ISPRS Open J. Photogramm. Remote Sens. 2021, 2, 100008. [Google Scholar] [CrossRef]
  12. Khoshelham, L.; Vilariño, L.D.; Peter, M.; Kang, Z. The Isprs Benchmark on Indoor Modelling. 2017. Available online: https://www2.isprs.org/commissions/comm4/wg5/benchmark-on-indoor-modelling/ (accessed on 14 May 2023).
  13. Thomson, C.; Boehm, J. Indoor Modelling Benchmark for 3D Geometry Extraction. In ISPRS Technical Commission V Symposium; International Society for Photogrammetry and Remote Sensing (ISPRS): Riva del Garda, Italy, 2014; Volume XL-5, pp. 581–587. [Google Scholar]
  14. Nan, L.; Xie, K.; Sharf, A. A search-classify approach for cluttered indoor scene understanding. ACM Trans. Graph. (TOG) 2012, 31, 137. [Google Scholar] [CrossRef]
  15. Previtali, M.; Barazzetti, L.; Brumana, R.; Scaioni, M. Towards automatic indoor reconstruction of cluttered building rooms from point clouds. In Proceedings of the ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Riva del Garda, Italy, 23–25 June 2014; Volume II-5. [Google Scholar]
  16. Czerniawski, T.; Nahangi, M.; Haas, C.; Walbridge, S. Pipe spool recognition in cluttered point clouds using a curvature-based shape descriptor. Autom. Constr. 2016, 71, 346–358. [Google Scholar] [CrossRef]
  17. Huang, B.; Liao, H.; Ge, Y.; Zhang, W.; Kang, H.; Wang, Z.; Wu, J. Development of BIM Semantic Robot Autonomous Inspection and Simulation System. In Proceedings of the 2023 9th International Conference on Mechatronics and Robotics Engineering (ICMRE), Shenzhen, China, 10–12 February 2023. [Google Scholar]
  18. Kong, F.; Liu, X.; Tang, B.; Lin, J.; Ren, Y.; Cai, Y.; Zhu, F.; Chen, N.; Zhang, F. MARSIM: A light-weight point-realistic simulator for LiDAR-based UAVs. IEEE Robot. Autom. Lett. 2023, 8, 2954–2961. [Google Scholar] [CrossRef]
  19. Manivasagam, S.; Wang, S.; Wong, K.; Zeng, W.; Sazanovich, M.; Tan, S.; Urtasun, R. Lidarsim: Realistic lidar simulation by leveraging the real world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  20. Valero, E.; Bosché, F.; Bueno, M. Laser scanning for BIM. J. Inf. Technol. Constr. (ITcon) 2022, 27, 486–495. [Google Scholar] [CrossRef]
  21. Tang, P.; Huber, D.; Akinci, B.; Lipman, R.; Lytle, A. Automatic reconstruction of as-built building information models from laser-scanned point clouds: A review of related techniques. Autom. Constr. 2010, 19, 829–843. [Google Scholar] [CrossRef]
  22. Pătrăucean, V.; Armeni, I.; Nahangi, M.; Yeung, J.; Brilakis, I.; Haas, C. State of research in automatic as-built modelling. Adv. Eng. Inform. 2015, 29, 162–171. [Google Scholar] [CrossRef]
  23. Braun, A.; Tuttas, S.; Borrmann, A.; Stilla, U. Improving progress monitoring by fusing point clouds, semantic data and computer vision. Autom. Constr. 2020, 116, 103210. [Google Scholar] [CrossRef]
  24. Tsige, G.Z. Scan-vs-BIM Automated Registration Using Columns Segmented by Deep Learning for Construction Progress Monitoring. Master’s Thesis, University of Twente, Enschede, The Netherlands, 2022. [Google Scholar]
  25. Chuang, T.Y.; Yang, M.J. Change component identification of BIM models for facility management based on time-variant BIMs or point clouds. Autom. Constr. 2023, 147, 104731. [Google Scholar] [CrossRef]
  26. Stilla, U.; Xu, Y. Change detection of urban objects using 3D point clouds: A review. ISPRS J. Photogramm. Remote Sens. 2023, 197, 228–255. [Google Scholar] [CrossRef]
  27. Girardeau-Montaut, D.C. Cloud compare—3D Point Cloud and Mesh Processing Software. Available online: https://www.danielgm.net/cc/ (accessed on 31 May 2023).
  28. Scott, W.R.; Roth, G.; Rivest, J.F. View planning for automated three-dimensional object reconstruction and inspection. ACM Comput. Surv. (CSUR) 2003, 35, 64–96. [Google Scholar] [CrossRef]
  29. Massios, N.A.; Fisher, R.B. A best next view selection algorithm incorporating a quality criterion. In Proceedings of the British Machine Vision Conference, Southampton, UK, 14–17 September 1998; Volume 2. [Google Scholar]
  30. Vásquez-Gómez, J.I.; López-Damián, E.; Sucar, L.E. View planning for 3D object reconstruction. In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, MO, USA, 10–15 October 2009. [Google Scholar]
  31. Quintana, B.; Prieto, S.A.; Adán, A.; Vázquez, A.S. Semantic scan planning for indoor structural elements of buildings. Adv. Eng. Inform. 2016, 30, 643–659. [Google Scholar] [CrossRef]
  32. Historic England. 3D Laser Scanning for Heritage: Advice and Guidance; Historic England: Swindon, UK, 2018. [Google Scholar]
  33. Vilariño, L.D.; Frías, N.L.; Previtali, M.; Scaioni, M.; Frías, J.B. Scan planning optimization for outdoor archaeological sites. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-2/W11, 489–494. [Google Scholar] [CrossRef]
  34. Elberink, S.O.; Vosselman, G. Quality analysis on 3D building models reconstructed from airborne laser scanning data. ISPRS J. Photogramm. Remote Sens. 2011, 66, 157–165. [Google Scholar] [CrossRef]
  35. Muralikrishnan, B. Performance evaluation of terrestrial laser scanners—A review. Meas. Sci. Technol. 2021, 32, 072001. [Google Scholar] [CrossRef] [PubMed]
  36. Xu, Y.; Tong, X.; Stilla, U. Voxel-based representation of 3D point clouds: Methods, applications, and its potential use in the construction industry. Autom. Constr. 2021, 126, 103675. [Google Scholar] [CrossRef]
  37. Sharif, M.M.; Nahangi, M.; Haas, C.; West, J. Automated Model-Based Finding of 3D Objects in Cluttered Construction Point Cloud Models. Comput.-Aided Civ. Infrastruct. Eng. 2017, 32, 893–908. [Google Scholar] [CrossRef]
  38. Chen, J.; Kira, Z.; Cho, Y.K. Deep learning approach to point cloud scene understanding for automated scan to 3D reconstruction. J. Comput. Civ. Eng. 2019, 33, 04019027. [Google Scholar] [CrossRef]
  39. Guo, Y.; Bennamoun, M.; Sohel, F.; Lu, M.; Wan, J. 3D object recognition in cluttered scenes with local surface features: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 2270–2287. [Google Scholar] [CrossRef]
  40. Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
  41. Besl, P.J.; McKay, N.D. Method for registration of 3-D shapes. Sens. Fusion IV Control Paradig. Data Struct. 1992, 1611, 586–606. [Google Scholar] [CrossRef]
  42. Bentley, J.L.; Friedman, J.H.; Finkel, R.A. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 1977, 3, 209–226. [Google Scholar]
  43. Hagelskjaer, F.; Buch, A.G. Pointvotenet: Accurate Object Detection and 6 DOF Pose Estimation in Point Clouds. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020. [Google Scholar]
  44. Nüchter, A.; Hertzberg, J. Towards semantic maps for mobile robots. Robot. Auton. Syst. 2008, 56, 915–926. [Google Scholar] [CrossRef]
  45. Pfreundschuh, P.; Hendrikx, H.F.; Reijgwart, V.; Siegwart, R.; Cramariuc, A. Dynamic object aware lidar slam based on automatic generation of training data. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021. [Google Scholar]
  46. Liu, G.; Wei, S.; Zhong, S.; Huang, S.; Zhong, R. Reconstruction of Indoor Navigation Elements for Point Cloud of Buildings with Occlusions and Openings by Wall Segment Restoration from Indoor Context Labeling. Remote Sens. 2022, 14, 4275. [Google Scholar] [CrossRef]
  47. Han, K.; Degol, J.; Golparvar-Fard, M. Geometry- and appearance-based reasoning of construction progress monitoring. J. Constr. Eng. Manag. 2018, 144, 04017110. [Google Scholar] [CrossRef]
  48. Park, S.; Ju, S.; Yoon, S.; Nguyen, M.H.; Heo, J. An efficient data structure approach for BIM-to-point-cloud change detection using modifiable nested octree. Autom. Constr. 2021, 132, 103922. [Google Scholar] [CrossRef]
Figure 1. Number of publications mentioning the keywords since 2010.
Figure 2. Floor plan. Room-108 is a dedicated testing facility with a water tank. Room-109 is a workshop where robots are assembled. Rooms-110 and -111 are offices for researchers and students. The hallway is the space-175.
Figure 3. The point cloud acquired in one of INESC TEC robotics research laboratories.
Figure 4. BIM model of the spaces.
Figure 5. Leica BLK360 G2 laser scanner.
Figure 6. Top view of the point cloud, with scan positions in red and registration links in green.
Figure 7. Change detection procedure overview.
Figure 8. Hallway view after loading the model.
Figure 9. Hallway view after coarse registration of the pre-processed point cloud.
Figure 10. The point cloud is aligned with the building model after registration.
Figure 11. After point cloud classification, a comparison can be made with what was supposed to have been built. For this demonstration the points labelled as belonging to the tank were removed, and the consequent lack of correspondence with the models is displayed using a red overlay. Although many points were classified as belonging to the wall at the back, occlusions caused by clutter were responsible for considerable gaps in coverage which led to only roughly 50% of the wall being actually detected. All the other BIM elements correctly identified in the point cloud are highlighted with a green overlay.
Table 1. Description of the dataset.
File Name              Format  Files  Size     Description
CRASLAB_annotated.zip  ASCII   1      4.3 GB   Registered point cloud
CRASLAB_BIM.zip        IFC     1      10.0 MB  BIM model
Table 2. Point cloud column variables.
Variable                Description
Point X coordinate (m)  Coordinate of a point in the X axis
Point Y coordinate (m)  Coordinate of a point in the Y axis
Point Z coordinate (m)  Coordinate of a point in the Z axis
Point colour (R)        Red colour intensity (0–255)
Point colour (G)        Green colour intensity (0–255)
Point colour (B)        Blue colour intensity (0–255)
Intensity               Return strength of the laser beam (0–1)
Label                   Point label
Table 3. Spatial scan distribution.
Space  Number of Scans  Number of Points
-108   6                183,787,505
-109   3                89,906,532
-110   5                132,629,460
-111   3                89,001,431
-175   4                89,377,049
Total  21               584,701,977
Table 4. Point cloud labels.
Number  Object             Number of Points
0       unassigned         62,412,722
1       ceiling            288,024,661
2       floor              73,215,411
3       wall               89,783,056
4       door               16,114,181
5       window             12,037,086
6       desk               20,302,532
7       chair              6,960,430
8       cabinet            20,399,593
9       mobile cabinet     453,602
10      shelf              2,834,452
11      vents              4,663,309
12      water tank         4,858,587
13      bin                869,702
14      box                18,243,987
15      board              5,266,606
16      computer           341,661
17      screen             1,864,965
18      printer            1,180,765
19      vest               67,128
20      switch             45,149
21      paper dispenser    397,825
22      alcohol dispenser  192,923
23      cable              281,862
24      phone              17,305
25      robot              1,617,877
26      water kettle       17,205
27      stairs             653,957
28      ladder             156,425
29      oil heater         45,711
30      divider            8,779,277
31      hanger             1,792,555
32      fan                100,189
33      water dispenser    709,281

Share and Cite

MDPI and ACS Style

Abreu, N.; Souza, R.; Pinto, A.; Matos, A.; Pires, M. Labelled Indoor Point Cloud Dataset for BIM Related Applications. Data 2023, 8, 101. https://doi.org/10.3390/data8060101

