Positional Accuracy Assessment of Digital Elevation Models and 3D Vector Datasets Using Check-Surfaces

Mozas-Calvache, Antonio Tomás

doi:10.3390/ijgi12090348

Open AccessArticle

Positional Accuracy Assessment of Digital Elevation Models and 3D Vector Datasets Using Check-Surfaces

by

Antonio Tomás Mozas-Calvache

Department of Cartographic, Geodetic and Photogrammetric Engineering, University of Jaen, 23071 Jaen, Spain

ISPRS Int. J. Geo-Inf. 2023, 12(9), 348; https://doi.org/10.3390/ijgi12090348

Submission received: 17 May 2023 / Revised: 15 August 2023 / Accepted: 21 August 2023 / Published: 23 August 2023

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

This study focuses on the positional accuracy of Digital Elevation Models (DEMs) and 3D vector features by considering that both datasets can be used as a product to assess or as a reference. The main objective is to provide an alternative method to the traditional use of checkpoints by using check-surfaces in order to avoid identification issues. The methodology includes the determination of a set of polygons with a significant height in relation to the surrounding area (elevated or depressed) and those cells extracted from the DEM that match these elements. The check-surfaces are obtained after a triangulation of these polygons. The methodology uses procedures based on buffers to provide several results in the form of distribution functions of accuracies (2D, vertical and 3D). The trial has been carried out using a large set of data representing buildings obtained from official institutions. The results show consistent 2D, vertical and 3D accuracy values related to commonly used confidence levels. The application has demonstrated the viability of this approach for obtaining horizontal and vertical accuracies individually and jointly at any confidence level. In addition, the study includes the analysis of the results of specific zones, considering several characteristics.

Keywords:

DSM; buildings; grid; polygon; surfaces; quality

1. Introduction

Spatial or positional accuracy is one of the six elements of data quality defined by ISO 19157 [1]. It is the one related to the accuracy of the location of features, considering a feature as an abstraction of real world phenomena [2]. The ISO 19157 standard defines spatial accuracy considering three sub-elements: absolute accuracy, relative accuracy and gridded data position accuracy. These are related to the closeness of reported coordinate values, relative positions and gridded data position values to values accepted as or being true, respectively [1]. This aspect implies that methodologies developed to assess positional accuracy must include a comparison of data (absolute or relative positions) with respect to other independent datasets that are considered as true values. In the case of absolute position, positional accuracy is related to the closeness of the coordinates of each entity to its true position. Since this true position is unknown, we use data from a more accurate source (reference) to assess their positional discrepancies. Traditionally, positional assessment has been carried out by comparing a sample of well-defined checkpoints extracted from the dataset to be assessed and the homologous elements obtained from a given reference. Obviously, the reference data must be obtained from an independent and more accurate source. As an example, the ratio between accuracies of reference and the product to be assessed should be at least three [3]. The comparison of checkpoints has been implemented using several metrics which are mainly based on the calculation of the difference of coordinates, such as standard deviation and Root Mean Squared Error (RMSE). During the last decades, several approaches and standards have been published and applied to spatial databases to assess their positional quality, such as the NMAS (National Map Accuracy Standard) [4], the EMAS (Engineering Map Accuracy Standard) [5], the NSSDA (National Standard for Spatial Data Accuracy) [6], the ASPRS accuracy standards for large-scale maps [7] and the ASPRS Positional Accuracy Standards for Digital Geospatial Data [8]. Considering 3D data, most studies and standards have considered a distinction between horizontal and vertical accuracies, although we must consider that both are related (e.g., a planimetric displacement of a DEM will cause a height error of all points regardless of the vertical accuracy of the source). In this sense, we must consider that the horizontal accuracy has great influence on the vertical accuracy, mainly when the slope of the terrain is high. Thus, some metrics provide an error value considering the terrain slope, such as the Koppe’s formula [9,10]. Similarly, some institutions such as ISPRS have established tolerances considering terrain slope and map scale [11].

1.1. Positional Accuracy of Vector Databases

Positional discrepancies of spatial data are mainly caused by inaccuracies during acquisition and processing. These discrepancies cause a positional uncertainty of data related to their spatial position (absolute or relative). Positional uncertainty can be considered as an area (2D) or volume (3D) surrounding each entity where the most probable position is located. This most probable position is statistically related to the true position. In the case of the simplest geometry (a 2D point), error theory describes an elliptical error around the most probable position which is simplified, assuming equal variances of X and Y, by a circular error [12]. These errors are calculated considering a specific confidence (e.g., 95%). In the case of linear entities, several studies have proposed the idea of an uncertainty band surrounding the most probable position. This band is also extended to polygons considering their boundaries as a collection of points and lines (geometrically). The band model has experienced an evolution from an initial definition based on the epsilon band (proposed by Perkal [13]), where the true position of each segment is assumed within a constant distance (epsilon) from the measured segment. This basic band is composed of the union of the rectangle centered and aligned to the segment (width of two epsilon) and two circles centered at each endpoint (epsilon radius). Subsequently, this band model has been modified by several authors [14,15,16,17,18,19]. In addition, other models have been described, such as those based on the stochastic process theory [20,21], probability distribution functions [22,23], covariance-based error band [24], etc. These uncertainty models have been extensively described in the literature [25,26] and extended from 2D to 3D [27].

The assessment of positional accuracy has been widely implemented using checkpoints. As mentioned previously, several standards have been proposed and applied using metrics based on standard deviation, RMSE, etc. These standards include some recommendations about the sample of points to be used. However, the use of lines and polygons to perform positional quality assessments is more limited. Some metrics have been proposed but, to the best of our knowledge, no standard has been implemented until this moment. Metrics applied to lines and polygons are based on models related to the uncertainty band. As an example, McMaster [28] described several metrics for analyzing the displacement between a simplified line and the original, such as vector displacement and aerial displacement. The latter was subsequently adapted by Skidmore and Turner [29] to obtain a mean displacement between the line to be assessed and the reference. Goodchild and Hunter [30] proposed a metric based on generating a buffer around the reference line and quantifying the percentage of the line to be assessed that lies within it. Increasing the width of the buffer allows obtaining a distribution function of the line inclusion. Tveite and Langaas [31] proposed several metrics based on the analysis of the possible situations caused after generating two buffers (around the line to be assessed and the reference line). They obtained a distribution function of these metrics after increasing the width of the buffers. Mozas and Ariza [32] described a metric for obtaining a mean displacement between two lines considering the length of the adjacent segments. Finally, Mozas and Ariza [33] proposed an adaptation of some of these metrics from 2D to 3D.

1.2. Positional Accuracy of DEMs

DEMs are widely used by various disciplines and institutions to represent and analyze the topography of the terrain (Digital Terrain Models [DTMs]), a surface that can be defined as the boundary between the lithosphere and the atmosphere (without biosphere and anthroposphere) [34], and the topography of the lower boundary of the atmosphere (Digital Surface Models [DSMs]), a surface that can include the lithosphere, hydrosphere, cryosphere, biosphere and anthroposphere [34], although they may present other configurations considering particular characteristics (e.g., a DSM of buildings contains a surface of the boundary of the atmosphere and buildings). DEMs can be categorized into two data models: raster and vector. In the first case, DEMs are based on grids, which are composed of a regular array of cells containing an attribute value (elevations) arranged by row and column. This model is commonly referred to as 2.5D data (e.g., ArcGIS Pro Software) because it only supports a single z-value for each planimetric location (in a specific column and row). Grid models allow us to record average values based on the entire area covered by the cell or a discrete value of a specific point located within this area. On the other hand, vector DEMs are composed of 3D points. As with any product based on measurements, DEMs are not exempt from errors, which are mainly categorized into gross errors or blunders, systematic errors and random errors. Errors in DEMs have been extensively analyzed including their causes and consequences [35] and the assessment methods [36,37].

In the case of DEMs, there are a large number of studies that have focused on the positional assessment of this type of spatial data. These studies cover different aspects such as the acquisition of data (photogrammetry, LiDAR, etc.), data models (grids, TIN, etc.), type of accuracy (absolute and relative), accuracy measures (normal distribution and robust parameters [38]), etc. However, many of these studies focused on the vertical accuracy of DEMs without considering the effect of the horizontal accuracy. It is not usual to inform about horizontal accuracy [35,36], although there are some studies that analyzed the impact of horizontal errors on DEMs accuracy, achieving an improvement in the final vertical accuracy after correcting them [39]. Most of the studies published until this moment use checkpoints, obtained mainly from GNSS surveying, topographic maps, more accurate DEMs, etc. They include recommendations about the sample of points to be used. There are some examples of using linear elements [40] or plane surfaces [41] to obtain the sample of checkpoints needed to assess the positional accuracy of the product. In addition to these methodologies based on discrepancies between checkpoints, other authors have proposed alternative approaches to assessing positional accuracy. Ariza-López and Reinoso-Gordo [42] compared two DEMs using two buffering methods (based on a simple and double buffer) applied to homologous cells of each DEM (reference and product) by analyzing the inclusion of cells within the simple buffer and the intersection of voxels generated in each pair of cells (double buffer). However, this comparison is carried out on the vertical component without considering horizontal uncertainty. Another interesting approach, which considered buffers, is that described by Mozas et al. [10] for analyzing contour lines. They generated a 3D buffer around the maximum slope line defined between contour lines and counted the number of points, extracted from a DEM, which were inside. More detailed reviews of practices related to the accuracy assessments of DEMs are shown by Mesa-Mingorance and Ariza-López [36] and Polidori and El Hage [37].

1.3. Objectives and Structure

As described previously, the assessment of the positional accuracy of DEMs has traditionally been undertaken using checkpoints. However, points are usually not well defined in DEMs. Depending on the grid resolution or vertex density of the TIN, the location of points is more or less estimated within a cell (grid model) or a triangle (TIN model). Most studies use an associated image to determine the position of these points but this solution is not always possible. Recently, some interesting approaches have described methodologies for facilitating the identification and position of points to be used in positional assessment. For example, Höhle [41] used well-defined points obtained from plane intersections to assess LiDAR data. These planes were previously determined from building roofs surfaces. In this sense, a surface of an elevated/depressed geometry is better defined in DEMs than specific points. Considering this premise, why not take advantage of using surfaces to assess the positional accuracy of a DEM rather than checkpoints? The use of 3D vector data related to areas (polygons) can be a good alternative for checking DEM accuracy due to the good definition of borders that represent these elevated/depressed surfaces. In this sense, uncertainty models of vector data can be used to develop a new methodology for assessing DEMs. In addition to borders, we also must consider the level of detail of (LoD) of the internal structure that represents the element (e.g., a building) due to the possible lack of information (e.g., height changes) of simplified models. On the other hand, height data extracted from DEMs can also be used as a reference to assess the positional accuracy of vector databases due to the current high availability and updating level of DEMs.

The main objective of this study consists of the development of a new methodology for assessing DEMs using surfaces from a 3D vector source and vice versa. This approach must consider horizontal and vertical accuracies independently and jointly. In this sense, the method must be based on 3D uncertainty models.

To achieve this goal, in this study, we develop a new methodology based on the application of the uncertainty band model to assess the positional accuracy of DEMs using 3D vector data of elevated/depressed surfaces as reference and vice versa. Therefore, our approach can be applied in both ways: to assess DEMs or 3D vector datasets using the other dataset as reference. The only limitation is related to the need for the reference to be obtained from a more accurate and independent source. We have applied this method using two datasets, a DSM of buildings and polygonal surfaces modelling building footprints extracted from 3D vector datasets.

As a secondary objective, we have analyzed the possible influence of the average area of elements and slope by applying this method to different zones categorized according to these variables.

This paper is structured as follows: First, a description of the proposed methodology, and subsequently, its application using official datasets obtained from two institutions. Finally, we describe the main results obtained after this application, their impact on this topic and the main conclusions of this study.

2. Methodology

The assessment of the positional accuracy of DEMs and vector databases has been underpinned by metrics based on a sample of checkpoints (except some recent research applications that use linear elements). As described previously, the use of points requires a good identification of both sources (to be assessed and reference). In the case of DEMs, this identification is not always clear because the characteristics of the data model itself depend mostly on an annexed image of the terrain (e.g., orthoimage) that is related to the DEM. However, this image is not always available. The spatial resolution of the DEM supposes a clear limitation to the estimation of the position of checkpoints. In this context, the use of other elements such as surfaces [41] and linestrings [43] can help to improve the identification. For example, the use of the roof of a building, which is represented by several cells in a grid (DSM), can improve the identification of the element (depending on the resolution) because the borders are defined by several cells instead of within one cell and, in addition, the internal surface is clearly identified. In this sense, problems of identification are reduced to a certain part of the element, instead of the entire element. On the other hand, the use of checkpoints supposes a reduced sample of data, considering the dimension of this type of element, in contrast to the complete area to be assessed. In this way, the use of the same number of elements but considering surfaces will suppose an evaluation of a larger area of terrain (although we must consider that adjacent height values are correlated). Nevertheless, this aspect could be used to reduce the number of elements to be used in the assessment. Considering these premises, we propose a new methodology for assessing the positional accuracy of DEMs and vector datasets based on check-surfaces instead of checkpoints. Although these check-surfaces are obtained from the vector dataset, the methodology can be applied using the DEM as the dataset to be assessed and the 3D vector data as reference and vice versa. The unique premise is that the independent reference source must be more accurate than the other. To simplify, we assume that the DEM is based on grid model although other models, such as the TIN, could be used with a simple adaptation of the procedure (considering the vertex positions instead of cells).

Figure 1 shows the methodology proposed in this study, where two data sources are used: a 3D vector dataset containing polygons and a DEM that includes height values of elements (e.g., DSM of buildings; see Figure 2a). Regardless of the sources to be assessed and for reference, the methodology is developed in the same way. Firstly, there is an initial step to prepare both datasets prior to the positional assessment. Obviously, both sources must refer to the same Coordinate Reference System (CRS), including the same heights reference. In the case of some DSMs, this aspect could involve the use of a DTM obtained from the same source to obtain absolute heights (e.g., DSM of buildings). Considering the 3D vector dataset, the procedure includes a selection of features taking data included in the other source into account. At this stage, a set of cells related to the sampled features of the vector source must be selected. In this sense, a matching procedure is developed in order to obtain unique correspondences between elements from both sources. We recommend the application of automatic matching methodologies that have been recently developed to improve processing efficiency, such as those based on boundary boxes, centroids, buffers, etc., which are described in more detail by several authors [44,45,46,47], although semiautomatic or visual matching can also be developed. The sample of elements must consider those requirements established by standards when using checkpoints. As an example, we recommend those sample sizes based on project area [8]. Data preparation ends with two datasets, a set of polygons composed of 3D vertexes and their corresponding cells selected from the DEM containing height data.

The assessment considers that each cell of the DEM dataset represents a position in space defined by its XY coordinates and the height data (included as an attribute). Following the uncertainty model described previously, we assume that each element presents an uncertainty band surrounding its more probable position. This band can be simplified by using a buffer which surrounds the element. In this sense, we adapt the procedure described by Goodchild and Hunter [30] for linear features and expanded to 3D by Mozas-Calvache and Ariza-Lopez [33]. They considered several buffer distances to estimate the percentage of the total length of the low-accurate lines that lie within a specific distance of the reference. As a result, a distribution function of the percentages of inclusion inside buffers is obtained. In our approach, we consider three buffers of a certain distance to the vector element (d), a 2D buffer around the vector element to determine the planimetric accuracy, a height buffer above and below the element to determine the vertical accuracy and a 3D buffer around the 3D vector element to determine a complete value of 3D accuracy. Therefore, the 3D vector element is considered as a surface where several buffers are applied. The determination of the percentage of inclusion of cells (of each element) within each buffer (considering several distances d) is obtained by considering the spatial behavior of their positions (XY coordinates of the center of the cell and height attribute) and the polygon that defines the vector element. The method follows this procedure:

2D inclusion: This involves the determination of the cells that lie planimetrically within the polygon (Figure 2b). The point-in-polygon algorithm is used to determine if the XY position of the cell is inside the vector element. Considering all data, we obtained a percentage of inclusion, which is named 2DInc.
External 2D buffer (d) of polygons: Cells which are not included in the polygon are selected to determine if they are included within a 2D buffer of a certain distance (d) (Figure 2c). In this case, we calculate the 2D Euclidean distance from a point (cell position) to those segments which compose the polygon. The lower value is used as minimum distance to the element and this value will determine if the cell is considered inside the buffer of a certain distance (d) or not. An increment of the buffer distance could increase the percentage of cells included in them. The percentage of inclusion (named 2DBd) includes cells which are included in the polygon (2DInc) and those which are included in the external 2D buffer of a certain distance (d).
Height buffer: This buffer implies a certain layer located above and below the 3D vector element. To define this layer and calculate the distance from cell position (XY-height) to the vector element, we determine a 3D triangulation based on the polygon which represents the element (Figure 2d). In our approach, we suggest the use of Delaunay triangulation algorithm. Once this procedure is completed, the vertical distance is calculated from the cell position to the triangle located above or below it. This value will determine whether the cell position is within the height buffer of distance d (Figure 2e). Finally, a percentage of inclusion (HBd) of cells within the height buffer of a determined distance (d) is obtained.
External 3D buffer (d) of polygons: This stage is developed in a similar way to the external 2D buffer stage but considering 3D distances from the position of cells to the segments that compose the polygon. The set of cells evaluated in this case are those which are not considered in the previous stage. Thus, they are those cells whose positions are not included planimetrically inside the polygon. After the calculation of the 3D distance, we determine the cells which are included inside the 3D buffer (Figure 2f) (3DBd) by adding those cells considered in HBd.

The development of this methodology supposes that we can obtain four main results considering the inclusion of cell positions within polygons (2DInc) or buffers (2DBd, HBd and 3DBd). Each result must be considered because this allows us to assess different aspects of the positional accuracy of the source to be checked. First, the 2DInc indicates which cells are inside the vector polygon showing some possible problems related to XY displacements or completeness of the DEM with respect to the vector source. The 2DBd results show us a distribution function of inclusions when we increase the distance (d) of the 2D buffer surrounding the polygon. Thus, we can obtain a certain value of 2D accuracy with respect to the most probable position (assumed from a more accurate data) for a specific percentage of inclusion (confidence level). The HBd results also show a distribution function of inclusions related to the vertical accuracy of data when we increase the distance (d). As happened with the previous results, this distribution allows us to estimate accuracy (vertical accuracy in this case) related to a specific percentage of inclusion (confidence level). Finally, the 3DBd results show a distribution function (by increasing d) of the 3D accuracy of the source to be assessed with respect to the reference. Using these results, we obtained values of 3D positional accuracy related to specific percentages of inclusion (confidence level).

In addition to the general methodology described previously, this study includes an analysis of the possible influence of the average area of elements (related to urban and rural zones) and the slope of the terrain. In this case, the method includes a previous categorization of the zones considering these variables and the application of the general procedure to the selected elements obtaining the distribution functions of the results (2DInc, 2DBd, HBd and 3DBd) for each category.

3. Application

The application of the methodology proposed in this study has been carried out using two official datasets published by Spanish institutions. On one hand, we selected a 3D vector dataset of buildings extracted from the BCA10 database of the Instituto de Estadística y Cartografía de Andalucía [48], a regional cartographic institution in southern Spain. This 3D vector dataset includes 3D data which represent the roofs of the buildings at scale 1:10.000 with a horizontal positional accuracy better than 1 m. On the other hand, we also used a DEM of 2.5 m of spatial resolution containing a DSM of buildings (Modelo Digital de Superficies de Edificación, MDSnE2.5) published by the Instituto Geográfico Nacional of Spain [49]. This product was obtained from a LiDAR project of this institution (PNOA-LiDAR) and shows a horizontal positional accuracy better than 0.5 m. Attribute data stored in this DEM are related to the heights of buildings. Therefore, a previous procedure was implemented to add terrain height values to buildings height. The terrain height was obtained from a DTM also published by the same institution and with the same lineage (PNOA-LiDAR project) [49]. Subsequently, both sources (BCA10 and MDSnE2.5) were referred to the same CRS (EPSG:25829 and EPSG:25830) and more specifically, height data were referenced to the same origin. Considering the nominal accuracy of the sources (1 m and 0.5 m), in this application, we selected the vector dataset as the elements to be assessed and the DSM as reference. Although the selected dataset does not meet the 3x-more-accurate level recommended for reference data, we must consider that this study is not focused on performing a quality assessment of specific datasets, but only on testing the proposed methodology. In this sense, we only considered it necessary that the reference dataset be more accurate (2x) than the other one.

After taking into account the spatial extent of the data (Andalucía covers more than 87,000 square kilometers) and the distribution in cartographic sheets (zones) of the products used in this study, we selected 34 sheets of about 7.3 per 4.7 km (area of 34.3 square kilometers) (Figure 3a) containing data related to buildings. Therefore, we considered a large area of more than 1100 square kilometers. The selection of zones (sheets) considered several aspects in order to provide a complete study. Thus, the selected zones were well distributed in the territory (eight provinces), including rural and urban areas and a great diversity of heights and slopes (Figure 3a).

Following the methodology proposed in this study, we selected a set of features from the vector dataset (polygons) in each zone (step 1 Figure 3b). The selection considered a good spatial distribution and the great diversity of buildings (step 2 in Figure 3b). Depending on the characteristics of the zone, we selected from 50 to 100 polygons. These sets of elements are significantly larger than the sample size recommended by ASPRS [8] for areas lower than 500 square kilometers (25 checkpoints). The polygons were matched to sets of cells selected from the DEM using a semi-automated procedure. More specifically, we used a buffer of 10 m around the selected polygons to select cells with a certain proximity to the vector data used (step 3 in Figure 3b). This procedure facilitated the matching between polygons and cells but also contributed to reduce the presence of outliers. After that, the matching between polygons and DSM was performed manually, removing those cells which did not show correspondence with the selected polygons and identifying possible outliers. This procedure was developed using several raster-editing tools (QGIS plugin). Finally, we obtained a set of polygons and a set of cells with height information related to those elements (step 4 in Figure 3b).

Table 1 summarizes the main aspects of the data used in this study. We highlight the large set of elements used (2839 polygons), which represent about 1.8 square kilometers of data implicated in the assessment.

Once all the data were prepared for the assessment procedure, the rest of the proposed methodology was applied using a software developed in Java JDK 1.8 for this purpose. This includes obtaining the triangulation and the results of the assessment. In this sense, the application allowed us to obtain percentages of the inclusion of cells inside matched polygons and different buffers generated around them (results of percentage of 2DInc, 2DBd, HBd and 3DBd). The buffer distances (d) selected were 0.5 m, and 1 to 10 m with a step of 1 m. This meant 11 values of buffer distance applied to the three cases (2D buffer, height buffer and 3D buffer).

4. Results

The main results obtained after applying the methodology are shown in Figure 4. Figure 4a shows the percentages of inclusion considering buffer distances (d) from 0.5 to 10 m. This range of distances was selected taking into account the accuracy of the sources. The percentages were calculated with respect to the total number of cells used in this study. The percentage of 2DInc is always fixed because it depends on the cells included planimetrically in the polygons. In this case, the percentage of cells is about 82%. Considering the 2DBd, the percentages of inclusion obtained increase from 86% (0.5 m) to about 100% (5 m). The 95% percentage is achieved using a distance of about 2.25 m. On the other hand, the analysis considering heights shows lower percentages of inclusion. The results of HBd show values from 16% (0.5 m) to the value achieved by 2DInc (82%) because the percentage of inclusion of HBd is calculated with respect to the total number of cells. However, the maximum number of cells included in HBd is the same as those included in 2DInc (cells included planimetrically within polygons). Therefore, if we consider the relation with respect to the number of cells of 2DInc (instead of all cells) the result is that shown in Figure 4b, where 90% is achieved with 4.4 m and 95% with 5.6 m approximately. Finally, the results of 3DBd (Figure 4a) show percentages from 17% (0.5 m) to 99% (10 m). 90% is achieved with a distance of about 4.6 m and 95% with 6 m. In summary, the results show values of 2.25, 5.6 and 6 m for 2D, vertical and 3D accuracies (95%), respectively.

Figure 4 also shows a relative comparison of the previous results. Thus, Figure 4c shows the difference of percentages of inclusion between 2DInc and HBd. In this case, the graphic shows a decrease in the difference between both results, achieving 5% of difference at 5.5 m. Figure 4d shows the difference of percentages of inclusion between 2DBd and 3DBd. In this case the graph is quite similar to the previous one, achieving 5% of difference at 6 m. Both graphs represent the difference between 2D and 3D cases, considering (2DBd-3DBd) and not considering (2DInc-HBd) the planimetric uncertainty of data.

In addition to the total results, we have also analyzed the percentages of inclusion, considering two aspects that characterize each zone such as average area of the polygons (size of the buildings) and the slope. The goal is to analyze the possible influence of the size of polygons and terrain topography. The results are shown in Figure 5.

First, Figure 5a shows the results categorized by the average area of the polygons. In this case, we have divided the range of cases into three intervals: below 500, between 500 and 1000, and above 1000 square meters. The purpose of this division is to analyze cases related to urban and rural zones, considering that lower values of average area are related to rural zones with isolated buildings, while the highest values are related to urban zones with large blocks of buildings. The result of 2DBd shows lower percentages of inclusion in the first category considering distances from 0.5 to 3 m. This result contrasts with that obtained using HBd because the lowest percentages are obtained with the third category considering distances from 0.5 to 3.5 m. From this distance (3.5 m) to 10 m, the behavior changes, showing the highest values those zones included in the third category (the highest average area of polygons). Finally, the 3DBd results show lower values of inclusion in the third category. These results confirm that polygons from urban areas have better percentages of inclusion in 2D, but in contrast, these polygons show lower percentages of inclusion in 3D. As a hypothesis, we consider that this issue is related to the presence of more complexity in building structures in urban landscapes. This aspect supposes more variability in height. However, the vector dataset used in this study only includes basic structures of buildings and does not represent this complexity as in the case of the DSM. Second, Figure 5b shows the results categorized by the average slope of each zone. The purpose of this analysis is to determine the effect of terrain on the results. The mean values of slopes are calculated using a DTM of each zone. The range of values is divided into three intervals: less than 5%, from 5% to 11%, and greater than 11%. The results of 2DBd show similar tendencies, while the HBd and the 3DBd show higher percentages of inclusion in cases with lower slope values (below 11%), and while the zones with higher mean slope values have lower percentages of inclusion in 3D results (HBd and 3DBd). Therefore, slope appears to show a great influence on vertical accuracy, which confirms the results described in other studies [9,10].

In addition to the analysis of results from the set of zones categorized by the type of scene and the terrain, we also analyzed some specific zones which are representative of the categories previously studied in order to detect particular behaviors. Figure 6 shows the results of these specific zones compared to the total results: 0923-14 and 0947-12 are urban zones including two of the main cities of Andalusia (about 5500 and 3200 polygons, respectively), while 1047-33 and 1027-22 are rural zones (about 360 and 270 polygons, respectively). On the other hand, 0923-14 and 1047-33 cover flat terrain (average slope of 3.9% and 2.5%, respectively) while 0947-12 and 1027-22 cover undulating and mountain terrain (average slope of 14.3% and 24.7%, respectively). The results of zone 0923-14 show higher values of inclusion in 2DInc and 2DBd with respect to the total results while HBd and 3DBd show lower values. Zone 0947-12 shows similar behaviors to the previous one. In contrast, zone 1047-33 shows higher values of HBd and 3DBd with respect to the total although the 2D results show lower values. Finally, 1027-22 shows percentages lower than the total in HBd and 3Dd. These results indicate the importance of the type of scene (urban or rural) and terrain in these results. Urban zones show lower values of 3D inclusion, while 2D percentages are higher than the total. On the other hand, rural zones show lower values of 2D results. Considering the type of terrain, all cases of 3D results in undulating and mountain terrain show lower values. This also occurs in zone 0923-14 (urban and flat), while the other flat zone (1047-33) show the best percentage of inclusion in these 3D results. Thus, the influence of the type of scene (urban vs. rural) is clear in 3D results. The influence of terrain slope is not clear considering these zones because it can be affected to a greater extent by the type of scene.

To test the behavior of these metrics in urban scenes, we selected several cases of buildings. In this sense, Figure 7 shows the results obtained for specific buildings located in an urban zone (0947-12). The three examples (Figure 7(a1,b1,c1)) represent those cells included considering the results of inclusion obtained in the 2DInc (Figure 7(a2,b2,c2)), 2DBd (Figure 7(a3,b3,c3)), HBd (Figure 7(a4,b4,c4)) and 3DBd (Figure 7(a5,b5,c5)) by taking into account a buffer distance of 3 m. The first example (case a) shows a building where a large number of cells are included in both 2D and 3D. The second example (case b) shows a building with a large amount of inclusion in 2D but with some gaps in 3D due to the height discrepancies of some internal cells. The third example (case c) shows a building with a large amount of cells included (2D results), while the 3D values of inclusion are null (HBd) and some cells are included within the 3D buffer. Clearly, this building has great discrepancies in height probably due to its complex structure that is not well defined by vector data.

5. Discussion

The results described in this study demonstrate the feasibility of our approach for assessing the positional accuracy of data. The metrics related to 2D accuracy show consistent results considering the characteristics of the dataset assessed (a priori accuracy indicated by the producer). Thus, 90% of cells are included in the 2D buffer (2DBd) of 1 m and 95% in that of 2 m. On the other hand, vertical accuracy is related to the HBd results (cells included in 2DInc). In this case, 90% is achieved at a distance of 4.4 m and 95% at 5.6 m. These results represent higher values with respect to 2D accuracy. The 3D accuracy shows 4.6 m (90%) and 6 m (95%). This is coherent with those values obtained for horizontal and vertical accuracies. We highlight that the vertical accuracy has the greatest influence on the 3D results.

The analysis of specific cases, considering type of scene and slope, has evidenced their influence on the results. Type of scene (urban and rural zones based on the average area of polygons) has shown greater influence with respect to terrain conditions, even in undulating and mountainous zones. We assume that this is due to two main factors: First, there is a high 2D accuracy, so the influence of the 2D uncertainty on vertical accuracy is lower even when the slope is higher. Second, the level of detail of the 3D vector dataset is not adequate enough to represent complex buildings considering their internal height variation. This has a direct effect on large structures of buildings typically located in urban zones. These hypotheses have been evidenced by analyzing specific cases of buildings located in an urban zone. However, we suggest developing a specific study using synthetic data to demonstrate them.

Therefore, the methodology proposed in this study provides us with a wide variety of results in the form of distribution functions describing the 2D accuracy, the vertical accuracy and the 3D accuracy of data. Obtaining distribution functions to describe accuracies allows us to determine three results of positional accuracy (horizontal or 2D, vertical and 3D) considering any specific confidence level. We consider this approach to be a great improvement with respect to point-based methods (published in several standards [4,5,6,7,8]) due to, among other causes, the increase of the area checked. For instance, if we consider the same sample size of points and check-surfaces, the number of assessed cells is obviously higher when applying our approach. Moreover the use of highlighted elements, such as buildings, allows us a direct assessment of horizontal and vertical accuracy, considering both planimetric and vertical uncertainties of data in contrast to other methods based on surfaces [42].

The proposed procedure uses the position of the center of the cells and the height attribute to determine the inclusion in a polygon and buffers. An extension of this approach can include consideration of the entire cell (voxels) in the calculation of the four results (2DInc, 2DBd, HDb and 3DBd). The idea is that a voxel can be considered completely or partially included in a polygon and buffers. Therefore, the inclusion of a voxel will consider a decimal value related to the percentage of inclusion (from 0 to 1) instead of selecting a binary value (0 or 1—not included or included), as in the case of cells.

6. Conclusions

The methodology proposed in this study is based on the application of the band uncertainty model to a set of check-surfaces in order to assess the positional accuracy of DEMs or 3D vector data using the other dataset as reference. The selected check-surfaces are obtained from the 3D vector dataset by a triangulation procedure applied to polygons containing height data. These polygons must represent features that are well defined both planimetrically and in height. This implies the use of elevated or depressed features with respect to the surrounding area. The methodology provides four results (2DInc, 2DBd, HDb and 3DBd), described in the form of distribution functions, which are related to the 2D or horizontal accuracy, the vertical accuracy and the 3D accuracy. The representation of these distribution functions provides us with a complete overview of the behavior of these accuracies considering any confidence level.

The application of this methodology has been carried out on a large volume of data distributed over an extended area obtained from two official datasets published in Spain. The results have demonstrated the feasibility of the methodology for assessing the positional accuracy (2D, vertical and 3D) of the dataset to be assessed (in this case the 3D vector dataset) based on the reference source (DEM). Therefore, the main objective of this study has been widely achieved. However, we have detected some limitations related to the definition and resolution of data. In particular, the 3D vector dataset used in this application contains polygons representing the 3D boundaries of buildings, but does not contain information about their internal structures. This aspect has caused some discrepancies in the results of height accuracy in certain cases of large buildings with complex roof structures (typical of urban zones). In these cases, lower values of accuracy are more related to this lack of height information in internal areas. This problem can also affect DEMs, taking into account their spatial resolution. However, this issue can be considered when selecting the sample of data to be used in the assessment, thus avoiding these elements. In any case, the level of detail (LoD in 3D city models) and the spatial resolution of both datasets should be considered when applying this methodology to buildings. In this sense, when using buildings, we recommend a certain level of detail (LoD2) of the vector dataset and a sufficient spatial resolution of the DEM considering the accuracy to be assessed.

Without question, the methodology proposed in this study has demonstrated its feasibility, consistency and validity in assessing the positional accuracy of 3D vector datasets and DEMs, providing several values of accuracy (2D, vertical and 3D). This allows us to analyze positional accuracy individually and jointly. The field of application is enormous. On one hand, assessing any DEMs that contain significant heights with respect to the surrounding data (e.g., DSM of buildings, infrastructures, etc.), and on the other hand, evaluating 3D vector datasets including cartographic databases, city models, 3D cadastre, etc.

Future work will focus on the sampling procedure when using these types of elements. In addition, we will adapt this methodology to other datasets (e.g., raw LiDAR data) and test it using other elements (e.g., roads). The adaptation of this approach to TIN models should also be studied. The influence of the scene and terrain type could also be studied in depth, considering the preliminary results obtained in this application.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

ISO 19157:2013; Geographic Information–Quality Principles. ISO: Geneva, Switzerland, 2013.
ISO 19101-1:2014; Geographic Information–Reference Model–Part 1: Fundamentals. ISO: Geneva, Switzerland, 2014.
Höhle, J.; Potuckova, M. Assessment of the Quality of Digital Terrain Models; European Spatial Data Research: Amsterdam, The Netherlands, 2011. [Google Scholar]
U.S. Bureau of the Budget. United States National Map Accuracy Standards; U.S. Bureau of the Budget: Washington, DC, USA, 1947.
American Society of Civil Engineers. Map Uses, Scales and Accuracies for Engineering and Associated Purposes; American Society of Civil Engineers, Committee on Cartographic Surveying, Surveying and Mapping Division: New York, NY, USA, 1983. [Google Scholar]
FGDC-STD-007; Geospatial Positioning Accuracy Standards, Part 3. National Standard for Spatial Data Accuracy. Federal Geographic Data Committee: Reston, VA, USA, 1998.
American Society for Photogrammetry and Remote Sensing. ASPRS accuracy standards for large-scale maps. Photogramm. Eng. Remote Sens. 1990, 56, 1068–1070. [Google Scholar]
American Society for Photogrammetry and Remote Sensing. ASPRS Positional Accuracy Standards for Digital Geospatial Data. Photogramm. Eng. Remote Sens. 2015, 81, 277. [Google Scholar]
Hodgson, M.E.; Jensen, J.; Raber, G.; Tullis, J.; Davis, B.A.; Thompson, G.; Schuckman, K. An evaluation of Lidar-derived elevation and terrain slope in leaf-off conditions. Photogramm. Eng. Remote Sens. 2005, 71, 817–823. [Google Scholar] [CrossRef]
Mozas-Calvache, A.T.; Ureña-Cámara, M.A.; Pérez-García, J.L. Accuracy of contour lines using 3D bands. Int. J. Geogr. Inf. Sci. 2013, 27, 2362–2374. [Google Scholar] [CrossRef]
Imhof, E. Cartographic Relief Presentation; ESRI Press: Redlands, CA, USA, 2007. [Google Scholar]
Greenwalt, C.R.; Shultz, M.E. Principles of Error Theory and Cartographic Applications; Aeronautical Chart and Information Center: St. Louis, MO, USA, 1962. [Google Scholar]
Perkal, J. On epsilon length. Bull. Académie Pol. Sci. 1956, 4, 399–403. [Google Scholar]
Chrisman, N.R. A theory of cartographic error and its measurement in digital bases. Proc. AutoCarto 1982, 5, 159–3168. [Google Scholar]
Blakemore, M. Generalisation and error in spatial data bases. Cartographica 1984, 21, 131–139. [Google Scholar] [CrossRef]
Goodchild, M.F. A model of error for choropleth maps, with applications to geographic information systems. Proc. AutoCarto 1987, 8, 165–174. [Google Scholar]
Caspary, W.; Scheuring, R. Positional accuracy in spatial databases. Comput. Environ. Urban Syst. 1993, 17, 103–110. [Google Scholar] [CrossRef]
Hunter, G.J.; Goodchild, M.F. Dealing with error in Spatial Databases: A simple case of study. Photogramm. Eng. Remote Sens. 1995, 61, 529–537. [Google Scholar]
Kronenfeld, B.J. Beyond the epsilon band: Polygonal modeling of gradation/uncertainty in area-class maps. Int. J. Geogr. Inf. Sci. 2011, 25, 1749–1771. [Google Scholar] [CrossRef]
Leung, Y.; Yan, J. A locational error model for spatial features. Int. J. Geogr. Inf. Sci. 1998, 12, 607–620. [Google Scholar] [CrossRef]
Shi, W.; Liu, W. A stochastic process based model for the positional error of line segments in GIS. Int. J. Geogr. Inf. Sci. 2000, 14, 51–66. [Google Scholar] [CrossRef]
Heuvelink, G.B.M.; Brown, J.D.; Van Loon, E.E. A probabilistic framework for representing and simulating uncertain environmental variables. Int. J. Geogr. Inf. Sci. 2007, 21, 497–513. [Google Scholar] [CrossRef]
Wu, H.S.; Liu, Z.L. Simulation and model validation of positional uncertainty of line feature on manual digitizing a map. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2008, XXXVII-B2, 843–848. [Google Scholar]
Leung, Y.; Jian-Hong, M.; Goodchild, M.F. A general framework for error analysis in measurement-based GIS: Part 1, The basic measurement error model and related concepts. J. Geogr. Syst. 2004, 6, 325–354. [Google Scholar] [CrossRef]
Li, D.; Zhang, J.; Wu, H. Spatial data quality and beyond. Int. J. Geogr. Inf. Sci. 2012, 26, 2277–2290. [Google Scholar] [CrossRef]
Foy, A.S.; Carstensen, L.W.; Prisley, S.P.; Campbell, J.B.; Dymond, R.L. A Review and Evaluation of Uncertainty Classification and the Error-Band Geometry Model. Trans. GIS 2015, 19, 604–618. [Google Scholar] [CrossRef]
Gil de la Vega, P.; Ariza-López, F.J.; Mozas-Calvache, A.T. Models for positional accuracy assessment of linear features: 2D and 3D cases. Surv. Rev. 2016, 48, 347–360. [Google Scholar] [CrossRef]
McMaster, R.B. A statistical analysis of mathematical measures for linear simplification. Am. Cartogr. 1986, 13, 103–116. [Google Scholar] [CrossRef]
Skidmore, A.; Turner, B. Map accuracy assessment using line intersect sampling. Photogramm. Eng. Remote Sens. 1992, 58, 1453–1457. [Google Scholar]
Goodchild, M.F.; Hunter, G.J. A simple positional accuracy measure for linear features. Int. J. Geogr. Inf. Sci. 1997, 11, 299–306. [Google Scholar] [CrossRef]
Tveite, H.; Langaas, S. An accuracy assessment meted for geographical line data sets based on buffering. Int. J. Geogr. Inf. Sci. 1999, 13, 27–47. [Google Scholar] [CrossRef]
Mozas, A.T.; Ariza, F.J. New method for positional quality control in cartography based on lines. A comparative study of methodologies. Int. J. Geogr. Inf. Sci. 2011, 25, 1681–1695. [Google Scholar] [CrossRef]
Mozas-Calvache, A.T.; Ariza-López, F.J. Adapting 2D positional control methodologies based on linear elements to 3D. Surv. Rev. 2015, 47, 195–201. [Google Scholar] [CrossRef]
Guth, P.L.; Van Niekerk, A.; Grohmann, C.H.; Muller, J.P.; Hawker, L.; Florinsky, I.V.; Gesch, D.; Reuter, H.I.; Herrera-Cruz, V.; Riazanoff, S.; et al. Digital elevation models: Terminology and definitions. Remote Sens. 2021, 13, 3581. [Google Scholar] [CrossRef]
Fisher, P.F.; Tate, N.J. Causes and consequences of error in digital elevation models. Prog. Phys. Geogr. 2006, 30, 467–489. [Google Scholar] [CrossRef]
Mesa-Mingorance, J.L.; Ariza-López, F.J. Accuracy assessment of digital elevation models (DEMs): A critical review of practices of the past three decades. Remote Sens. 2020, 12, 2630. [Google Scholar] [CrossRef]
Polidori, L.; El Hage, M. Digital elevation model quality assessment methods: A critical review. Remote Sens. 2020, 12, 3522. [Google Scholar] [CrossRef]
Höhle, J.; Höhle, M. Accuracy assessment of digital elevation models by means of robust statistical methods. ISPRS J. Photogramm. Remote Sens. 2009, 64, 398–406. [Google Scholar] [CrossRef]
Guan, L.; Pan, H.; Zou, S.; Hu, J.; Zhu, X.; Zhou, P. The impact of horizontal errors on the accuracy of freely available Digital Elevation Models (DEMs). Int. J. Remote Sens. 2020, 41, 7383–7399. [Google Scholar] [CrossRef]
Becek, K. Assessing global digital elevation models using the runway method: The advanced spaceborne thermal emission and reflection radiometer versus the shuttle radar topography mission case. IEEE Trans. Geosci. Remote Sens. 2013, 52, 4823–4831. [Google Scholar] [CrossRef]
Höhle, J. Assessing the positional accuracy of airborne laser scanning in urban areas. Photogramm. Rec. 2013, 28, 196–210. [Google Scholar] [CrossRef]
Ariza-López, F.J.; Reinoso-Gordo, J.F. Comparison of Gridded DEMs by Buffering. Remote Sens. 2021, 13, 3002. [Google Scholar] [CrossRef]
Santos, A.D.P.D.; Medeiros, N.D.G.; Poz, A.P.D.; Santos, G.R.D.; Rodrigues, D.D.; Emiliano, P.C. Methodology for the extraction of homologous points from a DEM/DSM to evaluate the relative positional accuracy. Bull. Geod. Sci. 2020, 26, e2020007. [Google Scholar] [CrossRef]
Ruiz, J.J.; Ariza, F.J.; Ureña, M.A.; Blázquez, E.B. Digital map conflation: A review of the process and a proposal for classification. Int. J. Geogr. Inf. Sci. 2011, 25, 1439–1466. [Google Scholar] [CrossRef]
Xavier, E.M.; Ariza-López, F.J.; Ureña-Cámara, M.A. A survey of measures and methods for matching geospatial vector datasets. ACM Comput. Surv. 2016, 49, 1–34. [Google Scholar] [CrossRef]
Lei, T.; Lei, Z. Optimal spatial data matching for conflation: A network flow-based approach. Trans. GIS 2019, 23, 1152–1176. [Google Scholar] [CrossRef]
Mozas-Calvache, A.T. A geometrical approach to matching raster & vector databases of buildings. J. Spat. Sci. 2022, 1–16. [Google Scholar] [CrossRef]
Instituto de Estadística y Cartografía de Andalucía. Base Cartografica de Andalucia. 1:10.000. Available online: https://www.juntadeandalucia.es/institutodeestadisticaycartografia/prodCartografia/bc/bca.htm (accessed on 20 January 2023).
Instituto Geográfico Nacional. Modelo Digitales de Elevaciones. Available online: http://centrodedescargas.cnig.es/CentroDescargas/catalogo.do?Serie=MDT02 (accessed on 20 January 2023).

Figure 1. Proposed methodology.

Figure 2. Determination of buffers: (a) 3D vector (polygons) and DSM; (b) 2D inclusion; (c) 2D buffer; (d) 3D triangulation of polygon (check-surface); (e) height buffer; (f) 3D buffer.

Figure 3. Example of the application developed in this study: (a) sheets selected (in red); (b) procedure to select polygons and matched cells.

Figure 4. Main results obtained in this study: (a) percentages of inclusion with respect to all cells (all metrics); (b) percentages of HBd with respect to the number of cells of 2DInc; (c) differences between 2DInc and HDb; (d) differences between 2DBd and 3DBd.

Figure 5. Results categorized by: (a) average area of polygons; (b) slope.

Figure 6. Results of specific zones.

Figure 7. Examples of results obtained from specific buildings.

Table 1. Summary of data used in this study.

		Total	Average (by Zone)	Average (by Element)
Zones	Number	34
	Area (km²)	1166.5	34.3
	Maximum height (m)	2770	707
	Average height (m)		486
	Minimum height (m)	0	330
	Standard deviation (m)		77
	Slope (%)		10
Total polygons	Number (N)	56,566	1664
Total polygons	Density (N/km²)		49
Selected polygons	Number (n)	2839	84
	Area (m²)	1,793,764	52,758	631
	Perimeter (m)	282,240	8301	99
Selected cells	Number	311,794
Selected cells	Area (m²)	1,948,712	57,315

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Mozas-Calvache, A.T. Positional Accuracy Assessment of Digital Elevation Models and 3D Vector Datasets Using Check-Surfaces. ISPRS Int. J. Geo-Inf. 2023, 12, 348. https://doi.org/10.3390/ijgi12090348

AMA Style

Mozas-Calvache AT. Positional Accuracy Assessment of Digital Elevation Models and 3D Vector Datasets Using Check-Surfaces. ISPRS International Journal of Geo-Information. 2023; 12(9):348. https://doi.org/10.3390/ijgi12090348

Chicago/Turabian Style

Mozas-Calvache, Antonio Tomás. 2023. "Positional Accuracy Assessment of Digital Elevation Models and 3D Vector Datasets Using Check-Surfaces" ISPRS International Journal of Geo-Information 12, no. 9: 348. https://doi.org/10.3390/ijgi12090348

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Positional Accuracy Assessment of Digital Elevation Models and 3D Vector Datasets Using Check-Surfaces

Abstract

1. Introduction

1.1. Positional Accuracy of Vector Databases

1.2. Positional Accuracy of DEMs

1.3. Objectives and Structure

2. Methodology

3. Application

4. Results

5. Discussion

6. Conclusions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI