Article

A Multi-Scale Residential Areas Matching Method Considering Spatial Neighborhood Features

1 Institute of Geospatial Information, Information Engineering University, Zhengzhou 450000, China
2 Collaborative Innovation Center of Geo-Information Technology for Smart Central Plains, Zhengzhou 450000, China
3 Key Laboratory of Spatiotemporal Perception and Intelligent Processing, Ministry of Natural Resources, Zhengzhou 450000, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2022, 11(6), 331; https://doi.org/10.3390/ijgi11060331
Submission received: 6 April 2022 / Revised: 30 May 2022 / Accepted: 30 May 2022 / Published: 31 May 2022

Abstract

Residential areas are one of the basic geographical elements on a map and an important part of map content. Multi-scale residential area matching is the process of identifying and associating same-name entities in different data sources; it is widely used in map compilation, data fusion, change detection, and updating. This paper proposes a matching method that considers spatial neighborhood features to solve the complex matching problem of multi-scale residential areas. The method uses Delaunay triangulation to divide complex matching entities at different scales into closed domains through spatial neighborhood clusters, which yields many-to-many matching candidate feature sets. The geometric and topological features of the residential areas are also considered, and the Relief-F algorithm is used to determine the weights of the different similarity features. The self-similarity and spatial neighborhood similarity of the polygon residential areas are then calculated to obtain the final matching results. The experimental results show that the accuracy, recall, and F value of the matching method are all above 90%. The method can identify a variety of matching relationships and overcome the influence of certain positional deviations on the matching results. It thus accounts for the spatial neighborhood characteristics of residential areas and accurately identifies complex matching relationships in multi-scale residential areas.

1. Introduction

Spatial object matching is an important prerequisite for multi-source spatial information fusion, spatial object change detection, and dynamic updating; its purpose is to establish correspondences between same-name objects in different spatial data sources. Residential area matching is an important foundation of spatial data processing and applications. However, data from various sources differ in many ways, including geometry, attributes, and spatial relationships. This poses significant challenges for spatial data matching, especially at different spatial scales. Under the effects of cartographic generalization, the spatial distribution of residential areas is more diverse, and matching relationships between data are more complex [1,2,3,4,5,6]. As such, research on matching residential areas, especially multi-scale residential areas, is of great significance.
Many scholars have conducted studies on areal entity matching that have focused on two main aspects: measures of similarity and matching strategies. Studies on measures of similarity mainly use geometric, topological, and semantic features of areal elements as similarity features, among which geometric features are the most commonly used. In the calculation of spatial similarity, the location, area, direction, and shape of entities are generally used as indicators of the similarity of elements from different sources. Each indicator is assigned a weight according to the features of the entities, and the total similarity is then obtained. Masuyama et al. [7] calculated the possibility of matching according to the degree of overlap of areas. Hao et al. [8] generalized similarities in the shape, position, and size of areal entities and identified matching entities based on their degree of total similarity. An et al. [9] proposed a multilevel description method for measuring the shape similarity of multi-scale areal entities. Luo et al. [10] combined spatial and semantic features of areal entities to identify the best matching objects. Some studies [11,12] have also calculated the similarity of contextual environmental features to match polygon residential areas, mainly using the area and perimeter of the triangles around the target building to calculate contextual similarity.
Among the studies that have been conducted on matching strategies, Wang et al. [4] proposed a new method to match residential areas using the skeleton line mesh of black region, in which the matching of objects is transformed into skeleton line mesh matching. Tong et al. [13] proposed a matching model based on probability theory that matches targets by calculating the probability of candidate targets. Wu et al. [14] constructed a Voronoi diagram of polygon residential areas and used convex hull similarities to match areal entities. Wang et al. [15] used geometric features of sample data to train a neural network model and determine final matching results. Other studies [16,17] have used the probabilistic relaxation method to solve the matching matrix and obtain globally optimal matching results.
Although the above studies have addressed issues with areal entity matching to some extent, most involved matching entities at the same or similar scales, and their methods do not apply to complex matching at multiple scales. This study was conducted to develop a multi-scale residential area matching method that takes into account spatial neighborhood features. The method divides complex matching entities at multiple scales into closed domains using spatial neighborhood clusters. It also considers the influence of both the features of the residential areas themselves and their spatial neighborhood features on the matching results, and it handles each type of matching relationship to obtain the final matching results.

2. Complex Matching Candidate Determination Method for Multi-Scale Polygon Residential Areas

2.1. Complex Matching Relationship Analysis of Multi-Scale Polygon Residential Areas

Because of factors such as data sources, spatial accuracy, and cartographic generalization, there are significant differences in the geometric, attribute, and topological features of residential elements in multi-scale spatial data. According to the entity correspondence of residential areas in data at different scales and the number of entities involved, the matching modes of multi-scale residential areas can be subdivided into 1:0, 0:1, 1:1, 1:N, M:1, and M:N, as shown in Table 1.
(1) 1:0 mode: residential areas in the large-scale data have no corresponding targets in the small-scale data. This matching relationship may be caused by the selection algorithm. When large-scale data are reduced to small-scale data, the reduction in map size forces trade-offs, so the corresponding targets are omitted from the small-scale data.
(2) 0:1 mode: residential areas in the small-scale data have no corresponding targets in the large-scale data. This matching relationship may be caused by the use of multiple spatial data sources or by temporal differences between data at different scales. For example, if the large-scale data are more recent, residential areas that still appear in the small-scale data may already have been deleted from the large-scale data, while the small-scale data have not been updated.
(3) 1:1 mode: one residential area in the large-scale data corresponds to one polygon in the small-scale data. This situation involves map simplification, where the geometry (size, shape, etc.) may change even though the polygon objects correspond one-to-one across scales.
(4) 1:N mode: one residential area in the large-scale data corresponds to N polygon residential areas in the small-scale data. This may be caused by multi-source or multi-temporal spatial data at different scales, or by data errors.
(5) M:1 mode: M polygon residential areas in the large-scale data correspond to one residential area in the small-scale data. This situation is mainly caused by the mergence operator. During scale transformation, multiple residential areas are merged into one, which produces a many-to-one matching relationship.
(6) M:N mode: multiple polygon residential areas in the large-scale data correspond to multiple polygon residential areas in the small-scale data. This situation is mainly caused by typification operators, which maintain the consistency of spatial features even though the number of targets differs between scales.
Compared with spatial data matching at the same scale, multi-scale residential areas have more complex matching relationships, and determining the matching targets is more complicated. First, most data at smaller scales are generalized from data at larger scales; therefore, there are corresponding relationships between the data in terms of cartographic generalization. For example, the mergence operator merges multiple objects into one object, resulting in a many-to-one matching relationship. Many-to-many matching relationships are mainly derived from the typification operator in cartographic generalization [18]. Although the quantities before and after generalization are inconsistent, the typical features of elements can be maintained, as shown in Figure 1a. In addition, combinations of different cartographic generalization operators (including selection, simplification, and displacement) are usually used. The combined use of simplification, mergence, and displacement operators changes the shapes of elements and causes certain displacements, so it becomes more difficult to identify matching relationships between elements, as shown in Figure 1b.
In the present study, a two-way matching strategy was used to solve complex matching relationships. The main idea of the two-way matching strategy is to discover the possible matching relationships between two data sets by exchanging the roles of the matched object and the object to be matched. Traditional two-way matching methods mainly use buffer zone or Minimum Bounding Rectangle (MBR) candidate matching, exchanging the matching data and the reference data to find matching relationships in multi-scale data. Traditional two-way matching is effective for one-to-one and one-to-many matching, but it often leads to false and missing matches in many-to-many matching generated by a typification operator. Methods such as the incremental convex hull, group object detection, and target clustering are also not very effective for many-to-many matching [19,20].

2.2. Determining Many-to-Many Matching Candidates Based on Spatial Neighborhood Clusters

For complex matching of multi-scale polygon residential areas, especially many-to-many matching relationships, it is necessary to obtain corresponding candidate elements in the data at different scales, that is, to determine candidate elements with many-to-many matching relationships. To this end, this study introduces the concept of spatial neighborhood clusters, which adheres to the basic principles of Gestalt. A spatial neighborhood cluster aggregates adjacent elements with consistent spatial distribution patterns so that the integrity of the aggregated element set is not undermined [21]. In many-to-many matching, the spatial elements are aggregated elements with neighboring relationships, so they can be divided by spatial neighborhood clusters. In this method, a Delaunay triangulation network is used to construct the spatial adjacency relations of residential areas, and the candidate elements are divided into closed spatial neighborhoods. The spatial neighborhood cluster composed of several elements is then obtained by further screening by spatial distance; this cluster is called the matching aggregation element set.
Let the two data sets to be matched be small-scale data $S = \{s_1, s_2, \ldots, s_m\}$ and large-scale data $C = \{c_1, c_2, \ldots, c_n\}$, where $s_i$ and $c_j$ are the polygons to be matched in $S$ and $C$, and $m$ and $n$ are the corresponding numbers of polygons. First, based on the initial matching results, $S$ and $C$ are each divided into two types: the matched entities are labeled $S_1$ and $C_1$, and the unmatched entities are labeled $S_2$ and $C_2$. $S_1$ and $C_1$ contain the 1:1 and 1:N matches, and $S_2$ and $C_2$ contain the 1:0 and M:N matches. Then, the geometric centers of the first type of entity ($S_1$ and $C_1$) are obtained, and Delaunay triangulation is used to construct the spatial neighborhood relationship of $S_1$ and $C_1$ from these center points. Finally, the triangular spaces that cover the second type of entity ($S_2$ and $C_2$) are merged. Entities within these merged triangular spaces form spatial neighborhood clusters with M:N matching features.
Because the first type of entity does not contain an M:N match, the M:N matches are distributed within the triangular spaces of the Delaunay triangulation constructed from the geometric center points of the first type of entity ($S_1$ and $C_1$). Merging the triangular spaces that cover $S_2$ and $C_2$ prevents the aggregate element set of an M:N match from being split, thus preserving its continuity and edges. In Figure 2, $D$ denotes a triangle. Triangles $D_1$, $D_2$, and $D_3$ intersect $s_1$, $s_2$, and $s_3$ from the second type of entity, so they are merged; $D_3$ and $D_4$ intersect $s_4$, so $D_3$ and $D_4$ are merged. As a result, $D_1$, $D_2$, $D_3$, and $D_4$ are merged into one triangular neighborhood space. In addition, $s_5$ intersects only $D_6$, so $D_6$ forms a triangular space on its own.
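The merging rule can be illustrated with a small union-find sketch. It assumes the intersection test between unmatched entities and triangles has already been run; the intersection lists below are hypothetical values chosen to mimic the Figure 2 narrative, and `merge_triangles` is an illustrative name, not part of the original method.

```python
def merge_triangles(intersections):
    """Group triangles that are linked through shared unmatched entities.

    `intersections` maps an entity id to the ids of the triangles it intersects.
    Triangles touched by the same entity end up in the same group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for triangles in intersections.values():
        for t in triangles:
            find(t)  # register every triangle, even one touched by a single entity
        for t in triangles[1:]:
            union(triangles[0], t)

    groups = {}
    for t in list(parent):
        groups.setdefault(find(t), set()).add(t)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical intersection lists (assumed, not taken from the paper's data).
example = {
    "s1": ["D1", "D2"],
    "s2": ["D2"],
    "s3": ["D2", "D3"],
    "s4": ["D3", "D4"],
    "s5": ["D6"],
}
clusters = merge_triangles(example)
print(clusters)
```

With these assumed inputs, the four triangles D1 to D4 fall into one neighborhood space and D6 stays on its own, which matches the outcome described for Figure 2.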
After the neighborhood spaces of the triangles are determined, the areal entities ($S_2$ and $C_2$) located in the triangular spaces can be found by spatial analysis. There may be multiple areal entities in the same triangular space, but the spatial neighborhood clusters at this point are not necessarily the final many-to-many matching candidate element set. Some areal elements are far apart, so they do not meet the spatial proximity rule for M:N matches. Therefore, it is necessary to further narrow the scope of the spatial neighborhood clusters and determine the final M:N matching element set. Drawing on the existing constrained Delaunay triangulation method, the method employed was as follows: (1) The edge nodes of polygon residential areas were extracted. Given that there are few residential area nodes, long, narrow triangles are likely to affect the construction of the Delaunay triangulation. Following the literature [4], the residential nodes were densified. (2) A Delaunay triangulation was constructed that was constrained by the edges of the residential areas, as shown in Figure 3a. (3) Based on the constrained Delaunay triangulation, the triangles that connect different polygons and have a length smaller than $d_t$ were retained ($d_t = G \cdot \mu$, where $G$ is the denominator of the scale and $\mu$ is 0.4 mm [22]). Based on the connections of the triangle edges, the spatial neighborhood clusters of neighboring residential areas, that is, the final M:N matching candidate element set, were obtained, as shown in Figure 3b. Finally, the spatial domain needs to be treated as a whole, so the convex hull method is introduced. A convex hull is the minimal convex polygon that contains all points of a cluster and represents the basic structure for describing the shape of a spatial object with small variable values. Convex hull processing is carried out for each spatial domain, and the similarity is then calculated to determine the M:N matching relationships.
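As a worked illustration of the threshold and the convex hull step, the sketch below converts $d_t = G \cdot \mu$ to ground metres and computes the hull of a cluster with Andrew's monotone chain algorithm. The hull algorithm is a standard choice made here for illustration (the paper does not say which one was used), and the two building footprints are hypothetical.

```python
def distance_threshold_m(scale_denominator, mu_mm=0.4):
    """d_t = G * mu, converted from millimetres on the map to metres on the ground."""
    return scale_denominator * mu_mm / 1000.0


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


# Thresholds for the two map scales used in the experiment.
d_t_10k = distance_threshold_m(10_000)
d_t_50k = distance_threshold_m(50_000)

# Two hypothetical building footprints that belong to one cluster.
cluster_vertices = [(0, 0), (2, 0), (2, 2), (0, 2), (3, 1), (5, 1), (5, 3), (3, 3)]
hull = convex_hull(cluster_vertices)
print(d_t_10k, d_t_50k, hull)
```

The hull turns a group of polygons into a single polygon, so the same similarity measures used for 1:1 candidates can be applied to an M:N candidate set.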
According to the above steps, the pseudo-code of the many-to-many matching candidate determination method based on spatial neighborhood clusters is given in Algorithm 1:
Algorithm 1 Spatial Neighborhood Clusters Algorithm
Input: small-scale residential data S, large-scale residential data C, divided into S1, S2 and C1, C2;
Output: candidate matching clusters Cluster_final
/*start*/
/*1. Construct Delaunay triangulation */
Centroids = GetCentroid (S1 and C1)
Delaunay_triangulation = Delaunay (Centroids)
/*2. Merge triangulation space */
For each entity e in S2 and C2:
  For each triangle D in Delaunay_triangulation:
    If e.Intersection (D):
      Cluster_tri.append (D)
Cluster_origin = Spatial_Analysis (Cluster_tri, S2 and C2)
/*3. The final spatial cluster */
Delaunay_constraint (Cluster_origin)
Calculate (dt)
Cluster_final = Get (Cluster_origin, dt)
Return Cluster_final
/*end*/

3. Calculating Similarity Taking into Account Spatial Neighborhood Features

In the real world, geographic elements have their own spatial features, and they are related to and mutually influence their surrounding neighborhood elements. The spatial features of residential entities that can be matched with each other have certain similarities, but their neighborhood features should also be similar. Sometimes, relying on their own features alone will not lead to accurate matching results. In Figure 1b, for example, $p_2$ and $p_3$ overlap spatially with $q_1$, but if neighborhood features are considered, it is relatively obvious that $p_2$ and $p_3$ match $q_2$. Therefore, in the process of residential matching, it is necessary to consider the spatial similarities of the entities themselves as well as the similarities of their spatial neighborhood features, and to combine the geometric and topological features of elements to obtain accurate matching results.

3.1. Similarities in Features of Residential Areas

Traditional similarity measures mainly include location, direction, area and shape, etc. The following is a brief introduction.

3.1.1. Location Similarity

Spatial distance reflects the locational proximity of geographic entities, and it is an important measure of geometric similarity. The location of a residential area is mainly reflected by its centroid, so location similarity can be measured by the distance between centroids. If $P_a(x_1, y_1)$ and $P_b(x_2, y_2)$ are the geometric centroids of two polygon residential areas, the Euclidean distance is used to calculate position similarity as follows:
$$sim_{self\_pos} = 1 - \frac{\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}}{U} \tag{1}$$
where $U$ is the maximum distance between the edge vertices of the two polygon residential areas.

3.1.2. Direction Similarity

Direction is an important distinguishing feature of polygon residential areas; it is generally taken as the direction of the long axis of the Minimum Bounding Rectangle (MBR). The similarity of the directions of two residential areas can therefore be measured by comparing the long-axis directions of their MBRs. If $\theta_1$ and $\theta_2$ are the direction values of two polygon residential areas, their direction similarity is calculated as follows:
$$sim_{self\_dir} = 1 - \frac{|\theta_1 - \theta_2|}{\theta_\tau} \tag{2}$$
where $\theta_\tau$ is the direction threshold value, which is usually $\pi/2$.

3.1.3. Area Similarity

Area size is an important feature of a residential area. The smaller the difference in the areas of the residential areas being matched, the more likely they are to be similar entities. If $A_1$ and $A_2$ are the areas of two polygon residential areas, their area similarity is defined as follows:
$$sim_{self\_area} = 1 - \frac{|A_1 - A_2|}{Max(A_1, A_2)} \tag{3}$$
where $Max(A_1, A_2)$ is the larger of $A_1$ and $A_2$.
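The position, direction, and area similarities can be written as three small functions. This is a minimal sketch that applies the formulas literally; the function names and sample values are illustrative, and the inputs are assumed to be pre-computed (centroids, MBR long-axis directions in radians, and polygon areas).

```python
import math


def sim_position(pa, pb, u):
    """Position similarity: 1 minus centroid distance divided by U."""
    return 1.0 - math.hypot(pa[0] - pb[0], pa[1] - pb[1]) / u


def sim_direction(theta1, theta2, theta_tau=math.pi / 2):
    """Direction similarity; assumes |theta1 - theta2| does not exceed theta_tau."""
    return 1.0 - abs(theta1 - theta2) / theta_tau


def sim_area(a1, a2):
    """Area similarity: 1 minus the area difference relative to the larger area."""
    return 1.0 - abs(a1 - a2) / max(a1, a2)


# Hypothetical pair of residential polygons.
pos = sim_position((0.0, 0.0), (3.0, 4.0), u=50.0)   # centroid distance 5, U = 50
direction = sim_direction(0.0, math.pi / 4)          # 45 degrees apart
area = sim_area(80.0, 100.0)
print(pos, direction, area)
```

Each value is 1 for identical inputs and decreases linearly as the difference grows, so the three measures can be combined by a weighted sum without further rescaling.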

3.1.4. Shape Similarities

Shape is another important visual feature of polygon residential areas, and it is widely used in the detection, recognition, and matching of polygon objects. In this study, the turning function is used to describe the shape features of polygon residential areas. A polygon can be represented using a list of angle-length pairs, whereby the angle at a vertex is the accumulated tangent angle at this point, while the corresponding length is the normalized accumulated length of the polygon at this point [23]. Figure 4 shows the change of the tangent angle (y-axis) along the normalized accumulated length of the polygon sides (x-axis), traversed clockwise from the starting point (black hollow circle). From that point of view, the tangent angle can be regarded as a function of the normalized accumulated length. This kind of representation is invariant to rotation because it contains no orientation information. Furthermore, it is invariant to scaling, since the normalized length makes it independent of the polygon size. The shape similarity of the turning function is calculated as follows:
$$sim_{self\_shape} = 1 - \frac{\int_0^1 |e_a(l) - e_b(l)|\,dl}{\max\left(\int_0^1 e_a(l)\,dl,\ \int_0^1 e_b(l)\,dl\right)} \tag{4}$$
where $e_a(l)$ and $e_b(l)$ are the accumulated values of the corners of the two polygon residential areas.
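The shape similarity can be sketched by treating each turning function as a step function over the normalized perimeter. The sketch rests on several assumptions that are not stated in the text: vertices are listed counter-clockwise so that left turns accumulate positive angles, the first edge is taken as the zero-angle reference, and both polygons start at corresponding vertices. All function names are illustrative.

```python
import math


def turning_function(polygon):
    """Return (breaks, angles): accumulated turning angle per edge and the
    normalized arc length at which each edge ends."""
    n = len(polygon)
    edges = [(polygon[(i + 1) % n][0] - polygon[i][0],
              polygon[(i + 1) % n][1] - polygon[i][1]) for i in range(n)]
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    perimeter = sum(lengths)
    breaks, angles = [], []
    travelled, turned = 0.0, 0.0
    for i in range(n):
        if i > 0:
            (ax, ay), (bx, by) = edges[i - 1], edges[i]
            # Signed exterior angle between consecutive edges.
            turned += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
        travelled += lengths[i] / perimeter
        breaks.append(travelled)
        angles.append(turned)
    return breaks, angles


def integrate(breaks, values):
    """Integral of a step function over [0, 1]."""
    total, prev = 0.0, 0.0
    for b, v in zip(breaks, values):
        total += v * (b - prev)
        prev = b
    return total


def integrate_abs_diff(fa, fb):
    """Integral of |e_a(l) - e_b(l)| for two step functions."""
    (ba, va), (bb, vb) = fa, fb
    i = j = 0
    prev = total = 0.0
    while i < len(ba) and j < len(bb):
        nxt = min(ba[i], bb[j])
        total += abs(va[i] - vb[j]) * (nxt - prev)
        prev = nxt
        advance_a, advance_b = ba[i] <= nxt, bb[j] <= nxt
        if advance_a:
            i += 1
        if advance_b:
            j += 1
    return total


def sim_shape(poly_a, poly_b):
    fa, fb = turning_function(poly_a), turning_function(poly_b)
    denominator = max(integrate(*fa), integrate(*fb))
    return 1.0 - integrate_abs_diff(fa, fb) / denominator


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
big_square = [(0, 0), (3, 0), (3, 3), (0, 3)]
rectangle = [(0, 0), (2, 0), (2, 1), (0, 1)]
same_shape = sim_shape(square, big_square)
square_vs_rectangle = sim_shape(square, rectangle)
print(same_shape, square_vs_rectangle)
```

The scaled square scores 1 because arc length is normalized, while the 2:1 rectangle scores lower because its corners occur at different positions along the perimeter.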
In summary, the total similarity of the features of the residential area itself is calculated as follows:
$$sim_{self} = \sum_{i=1}^{4} w_i \, sim_{self\_X} \tag{5}$$
where $X$ denotes $pos$, $dir$, $area$, $shape$, i.e., the position, direction, area, and shape features, and $\sum_{i=1}^{4} w_i = 1$.

3.2. Similarity of Spatial Neighborhood Features of Residential Areas

3.2.1. Identifying Spatial Neighbors

To calculate the similarity of spatial neighborhood features in addition to the similarity of the residential areas' own features, it is first necessary to determine the spatial neighbors of each residential element. In this study, we used a Voronoi diagram to determine the space neighbors of the small-scale polygon residential data $S$. A Voronoi diagram is a spatial neighborhood analysis tool that effectively conveys the spatial proximity of geographic entities. It partitions space according to the nearest neighbor principle, so that each point is associated with its nearest element, which ensures that each element has a certain number of space neighbors. First, the geometric centers of the polygon residential areas in $S$ were obtained, and then the Voronoi diagram was constructed. The space neighbors of polygonal elements can be determined according to their distribution relationships. Figure 5 shows that, according to the spatial distribution of the Voronoi diagram, the space neighbors of residential area $s_0$ are $s_1$ to $s_6$. For a polygon residential area $s_i$ with space neighbor $s_h$, where $Vor$ represents the Voronoi diagram spatial distribution, the space neighbor set is defined as $N_i = \{ s_h : (s_i, s_h) \in Vor \}$.
Since the large-scale data set $C$ contains more numerous and more detailed geographic elements, its spatial distribution is quite different from that of the small-scale data set $S$, and the same method does not guarantee consistent space neighbors. Therefore, for the large-scale residential area data set $C$, the space neighbors of $c_j$ are defined as the other areal targets in $C$ whose distance from $c_j$ is smaller than $d_\tau$: $N_j = \{ c_k : d(c_j, c_k) < d_\tau \}$, where $c_j$ is the candidate matching entity of $s_i$, $c_k$ is a space neighbor of $c_j$, and $d_\tau$ is the distance threshold, which is determined according to the accuracy of the data.
Using the Voronoi diagram and the distance threshold method, it is possible to determine the space neighbors of the multi-scale polygon residential area data and then calculate the space neighbor similarity of polygon residential areas at multiple scales.
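The distance-threshold rule for the large-scale data can be sketched as follows. The sketch assumes that $d(c_j, c_k)$ is measured between geometric centers, which the text does not state explicitly, and the centroid coordinates are hypothetical; the 50 m threshold is the value reported in the experiment.

```python
import math


def neighbors_within(centroids, d_tau):
    """Return, for every entity, the other entities closer than d_tau."""
    result = {}
    for j, (xj, yj) in centroids.items():
        result[j] = sorted(
            k for k, (xk, yk) in centroids.items()
            if k != j and math.hypot(xj - xk, yj - yk) < d_tau
        )
    return result


# Hypothetical centroids (metres) of large-scale residential polygons.
centroids_c = {
    "c1": (0.0, 0.0),
    "c2": (30.0, 0.0),
    "c3": (30.0, 30.0),
    "c4": (200.0, 0.0),
}
neighbors = neighbors_within(centroids_c, d_tau=50.0)
print(neighbors)
```

An isolated polygon such as `c4` ends up with an empty neighbor set under this rule, which is one reason the small-scale data use the Voronoi construction instead.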

3.2.2. Spatial Neighborhood Similarity

Once spatial neighbors are determined, it is necessary to determine specific matching relationships by calculating the spatial neighborhood similarity of polygon residential areas. If $s_h$ and $c_k$ are space neighbors of $s_i$ and $c_j$, respectively, then $(s_h, c_k)$ is a neighbor candidate match of $(s_i, c_j)$, and the relative position, relative direction, relative area, and relative shape similarities of their spatial neighborhood environments can be calculated as shown in Figure 6.
(1) Relative position similarity
Relative position is determined primarily from the distance and direction features of $(s_i, c_j)$ and $(s_h, c_k)$, as shown in Figure 6. The calculation formula is as follows:
$$\begin{aligned} r_{dis} &= 1 - \frac{|d(s_i, s_h) - d(c_j, c_k)|}{\max_{s_m \in N_i,\, c_n \in N_j}\left(d(s_i, s_m),\, d(c_j, c_n)\right)} \\ r_{dir} &= 1 - \frac{\alpha(s_i s_h,\, c_j c_k)}{\pi/2} \\ sim_{rel\_pos} &= r_{dis} \times r_{dir} \end{aligned} \tag{6}$$
where $d(s_i, s_h)$ and $d(c_j, c_k)$ are the geometric center distances of the areal targets; $N_i$ and $N_j$ are the space neighbor sets of $s_i$ and $c_j$; and $r_{dis}$ is the relative distance relationship of the neighbor candidate match. $\alpha(s_i s_h, c_j c_k)$ represents the relative direction of the centroid connections of $(s_i, s_h)$ and $(c_j, c_k)$, with a threshold value of $\pi/2$. The relative position similarity $sim_{rel\_pos}$ is obtained by multiplying $r_{dis}$ and $r_{dir}$, with $sim_{rel\_pos} \in [0, 1]$; the greater $r_{dis}$ and $r_{dir}$ are, the more similar the relative positions of $(s_i, s_h)$ and $(c_j, c_k)$ are, and vice versa.
(2) Relative direction similarity
The main directions of the areal targets are shown in Figure 6. Relative direction similarity is based on the difference between the main directions of $(s_i, s_h)$ and $(c_j, c_k)$:
$$sim_{rel\_dir} = 1 - \frac{|\beta(s_i, s_h) - \beta(c_j, c_k)|}{\pi/2} \tag{7}$$
where $\beta$ is the difference between the main directions of the two areal targets, $\beta \in [0, \pi/2]$, and $sim_{rel\_dir}$ represents the direction similarity of $(s_i, s_h)$ and $(c_j, c_k)$; the greater the value is, the more similar the relative directions of the target pairs are, and vice versa.
(3) Relative area similarity
Relative area similarity is determined mainly by the sizes of the areal targets and is calculated as follows:
$$sim_{rel\_area} = \frac{1}{1 + \left(\phi(s_i, s_h) - \phi(c_j, c_k)\right)^2}, \qquad \phi(s_i, s_h) = 1/\phi(s_i, s_h),\ \ \phi(c_j, c_k) = 1/\phi(c_j, c_k) \ \ \text{if } \phi(s_i, s_h) > 1 \tag{8}$$
where $\phi(s_i, s_h) = Area(s_i)/Area(s_h)$, $\phi(c_j, c_k) = Area(c_j)/Area(c_k)$, $Area(\cdot)$ is the area of the corresponding target, and $sim_{rel\_area}$ represents the relative area similarity between $(s_i, s_h)$ and $(c_j, c_k)$. To ensure that $\phi(s_i, s_h) \in [0, 1]$, if $\phi(s_i, s_h) > 1$, both $\phi(s_i, s_h)$ and $\phi(c_j, c_k)$ are replaced by their reciprocals.
(4) Relative shape similarity
$$sim_{rel\_shape} = \frac{1}{1 + \left(\delta(s_i, s_h) - \delta(c_j, c_k)\right)^2}, \qquad \delta(s_i, s_h) = 1/\delta(s_i, s_h),\ \ \delta(c_j, c_k) = 1/\delta(c_j, c_k) \ \ \text{if } \delta(s_i, s_h) > 1 \tag{9}$$
where $\delta(s_i, s_h) = e(s_i)/e(s_h)$, $\delta(c_j, c_k) = e(c_j)/e(c_k)$, $e(\cdot)$ is the shape value of an areal target calculated using the target function, and $sim_{rel\_shape}$ represents the relative shape similarity between $(s_i, s_h)$ and $(c_j, c_k)$. To ensure that $\delta(s_i, s_h) \in [0, 1]$, if $\delta(s_i, s_h) > 1$, both $\delta(s_i, s_h)$ and $\delta(c_j, c_k)$ are replaced by their reciprocals.
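The four relative similarities above can be sketched with three helpers, since the area and shape measures share the same ratio form. The sketch assumes centroids, main-direction differences, and area or shape values are already available; clamping $r_{dir}$ at zero is an assumption added here to keep the result in $[0, 1]$, and all names and sample values are illustrative.

```python
import math


def _angle_between(u, v):
    """Unsigned angle in [0, pi] between two 2D vectors."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return abs(math.atan2(cross, dot))


def sim_rel_pos(si, sh, cj, ck, d_max):
    """Relative position similarity from the centroids of the two target pairs.

    d_max is the largest centroid distance from s_i or c_j to any of their neighbors.
    """
    d_s = math.hypot(sh[0] - si[0], sh[1] - si[1])
    d_c = math.hypot(ck[0] - cj[0], ck[1] - cj[1])
    r_dis = 1.0 - abs(d_s - d_c) / d_max
    alpha = _angle_between((sh[0] - si[0], sh[1] - si[1]),
                           (ck[0] - cj[0], ck[1] - cj[1]))
    r_dir = max(0.0, 1.0 - alpha / (math.pi / 2))  # clamp is an assumption
    return r_dis * r_dir


def sim_rel_dir(beta_s, beta_c):
    """Relative direction similarity from the two main-direction differences."""
    return 1.0 - abs(beta_s - beta_c) / (math.pi / 2)


def sim_rel_ratio(value_si, value_sh, value_cj, value_ck):
    """Shared form of the relative area and relative shape similarities."""
    ratio_s = value_si / value_sh
    ratio_c = value_cj / value_ck
    if ratio_s > 1:
        # Use reciprocals so that the small-scale ratio lies in [0, 1].
        ratio_s, ratio_c = 1.0 / ratio_s, 1.0 / ratio_c
    return 1.0 / (1.0 + (ratio_s - ratio_c) ** 2)


rel_pos = sim_rel_pos((0, 0), (10, 0), (0, 0), (8, 0), d_max=20.0)
rel_pos_perpendicular = sim_rel_pos((0, 0), (10, 0), (0, 0), (0, 10), d_max=20.0)
rel_dir = sim_rel_dir(0.2, 0.2)
rel_area = sim_rel_ratio(200.0, 100.0, 180.0, 100.0)
print(rel_pos, rel_pos_perpendicular, rel_dir, rel_area)
```

Because `sim_rel_pos` is a product, a neighbor pair that lies in a perpendicular direction scores zero even when the distances agree.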
Based on Equations (6)–(9), the overall neighborhood similarity of $(s_i, s_h)$ and $(c_j, c_k)$ is expressed as follows:
$$sim_{rel}(s_i, c_j; s_h, c_k) = \sum_{i=1}^{4} w_i \, sim_{rel\_Y} \tag{10}$$
where $Y$ represents $pos$, $dir$, $area$, $shape$, that is, the position, direction, area, and shape features, and $\sum_{i=1}^{4} w_i = 1$.
Because there is more than one neighbor matching candidate for $s_i$ and $c_j$, it is necessary to combine all the neighbor matching targets of $s_i$ and $c_j$ and calculate the overall neighborhood similarity as follows:
$$sim_{rel} = \sum_{s_h \in N_i} \max_{k \in N_j}\left(sim_{rel}(s_i, c_j; s_h, c_k)\right) / Num \tag{11}$$
where $Num$ represents the number of space neighbors of $s_i$ in $N_i$.
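The aggregation over neighbors can be sketched as a best-match average. The pairwise similarity table is hypothetical, and returning 0 when either neighbor set is empty is an assumption made here, since the text does not cover that case.

```python
def neighborhood_similarity(pair_sim, neighbors_s, neighbors_c):
    """Average, over the neighbors of s_i, of the best pairwise similarity
    to any neighbor of c_j."""
    if not neighbors_s or not neighbors_c:
        return 0.0  # assumption: no neighbors means no neighborhood evidence
    total = sum(
        max(pair_sim[(s_h, c_k)] for c_k in neighbors_c)
        for s_h in neighbors_s
    )
    return total / len(neighbors_s)


# Hypothetical pairwise neighborhood similarities for one candidate (s_i, c_j).
pair_sim = {
    ("s1", "c1"): 0.9, ("s1", "c2"): 0.4,
    ("s2", "c1"): 0.3, ("s2", "c2"): 0.7,
}
overall = neighborhood_similarity(pair_sim, ["s1", "s2"], ["c1", "c2"])
print(overall)
```

Taking the maximum per small-scale neighbor lets each $s_h$ pair with its most plausible counterpart before the average is formed.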

3.3. Using the Relief-F Algorithm to Determine Similarity Weight Values

In previous studies, weight values have been determined largely using empirical values, which is subjective and does not guarantee optimal weight values. The Relief-F algorithm is a multi-class feature selection algorithm that is an update of the Relief algorithm. The Relief-F algorithm is a method for calculating the weights of features based on sample learning. It has been used in many fields because of its simplicity, intuitiveness, and computational efficiency [24,25]. The algorithm assigns initial weights to each feature and then refines the weights by updating the formula, finally obtaining the weights of the various features. The Relief-F algorithm was used in this study to determine the weights of the various similarity features.
Let the matching entity feature set be $R = \{R_1, R_2, \ldots, R_N\}$, where $R_i = \{R_{i1}, R_{i2}, \ldots, R_{iM}\}$ represents the features of the $i$th matching entity (the features in this study are distance, area, direction, and shape, so $M = 4$), and let $Class = \{Class_1, Class_2, \ldots, Class_p\}$ represent the groups of matching entity features. The $k$ nearest neighbor samples are obtained from the sample sets of the same group and of the different groups as $R_i$, and the following formula is used to update the weight value of the $j$th feature of $R_i$:
$$\begin{aligned} w_j^{i} &= w_j^{i-1} + \frac{1}{k \times N}\left[diff_{miss}(R_i, k) - diff_{hit}(R_i, k)\right] \\ diff_{miss}(R_i, k) &= \sum_{m=1}^{k} \frac{\left|R_{ij} - R_{hit_m j}\right|}{\max(R_{*j}) - \min(R_{*j})} \\ diff_{hit}(R_i, k) &= \sum_{C \neq Class(R_i)} \left[\frac{P(C)}{1 - P(Class(R_i))} \sum_{m=1}^{k} \frac{\left|R_{ij} - R_{miss_m j}\right|}{\max(R_{*j}) - \min(R_{*j})}\right] \end{aligned} \tag{12}$$
where $diff_{miss}(R_i, k)$ and $diff_{hit}(R_i, k)$ represent the differences between $R_i$ and the same group and the different group, respectively; $\max(R_{*j})$ and $\min(R_{*j})$ are the largest and smallest values of the $j$th feature, respectively; $P(C)$ is the probability of class $C$, that is, the number of samples in class $C$ as a proportion of the total samples; and $R_{hit_m j}$ and $R_{miss_m j}$ represent the $j$th feature values of samples in the same group and in a different group from $R_i$.
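A Relief-F style weight update can be sketched in plain Python. This sketch follows the conventional Relief-F formulation rather than reproducing the notation above: a "hit" is a neighbor from the same group, a "miss" is a neighbor from a different group, and differences to misses raise a weight while differences to hits lower it. It visits every sample once instead of sampling at random, uses a Manhattan distance on range-normalized features, and runs on hypothetical two-feature data; none of these choices are taken from the paper.

```python
from collections import Counter


def relieff_weights(samples, labels, k=1):
    """Deterministic Relief-F style feature weights (one pass over all samples)."""
    n, m = len(samples), len(samples[0])
    spans = []
    for j in range(m):
        column = [row[j] for row in samples]
        spans.append((max(column) - min(column)) or 1.0)  # guard constant features

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(a, b, j) for j in range(m))

    prior = {c: count / n for c, count in Counter(labels).items()}
    weights = [0.0] * m
    for i, (x, cls) in enumerate(zip(samples, labels)):
        by_class = {}
        for q, (y, c) in enumerate(zip(samples, labels)):
            if q != i:
                by_class.setdefault(c, []).append(y)
        # k nearest neighbors of x within each class.
        nearest = {c: sorted(rows, key=lambda y: distance(x, y))[:k]
                   for c, rows in by_class.items()}
        for j in range(m):
            hit_term = sum(diff(x, y, j) for y in nearest.get(cls, []))
            miss_term = sum(
                prior[c] / (1.0 - prior[cls]) * sum(diff(x, y, j) for y in rows)
                for c, rows in nearest.items() if c != cls
            )
            weights[j] += (miss_term - hit_term) / (k * n)
    return weights


# Hypothetical samples: feature 0 separates the groups, feature 1 does not.
samples = [(0.0, 0.0), (0.1, 1.0), (0.0, 0.5), (1.0, 0.0), (0.9, 1.0), (1.0, 0.5)]
labels = ["A", "A", "A", "B", "B", "B"]
weights = relieff_weights(samples, labels, k=1)
print(weights)
```

The discriminative feature receives a clearly larger weight than the uninformative one. How raw weights are rescaled so that they sum to 1 is not described in this section, so the sketch leaves them unnormalized.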
Equation (12) was used to determine the weight values in Equations (5) and (10). Given the calculated feature similarity $sim_{self}$ of the residential area itself and the spatial neighborhood similarity $sim_{rel}$, the average of the two, $sim = (sim_{self} + sim_{rel})/2$, was taken as the final similarity value of the entities to be matched, $s_i$ and $c_j$, and the final matching relationships were determined according to the set threshold.
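The final decision can be sketched as a weighted sum for each similarity type, followed by the average and a threshold test. The equal weights and feature values below are hypothetical placeholders rather than the Relief-F results, and treating a similarity equal to the threshold as a match is an assumption; the 0.6 threshold is the value reported in the experiment.

```python
def weighted_similarity(features, weights):
    """Weighted sum of the four sub-similarities; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * features[name] for name in weights)


def final_similarity(sim_self, sim_rel):
    """Average of self similarity and neighborhood similarity."""
    return (sim_self + sim_rel) / 2.0


def is_match(sim, threshold=0.6):
    """Assumption: a similarity equal to the threshold counts as a match."""
    return sim >= threshold


# Hypothetical equal weights and sub-similarities for one candidate pair.
equal_weights = {"pos": 0.25, "dir": 0.25, "area": 0.25, "shape": 0.25}
self_features = {"pos": 0.9, "dir": 0.8, "area": 0.7, "shape": 0.6}
rel_features = {"pos": 0.5, "dir": 0.5, "area": 0.5, "shape": 0.5}

sim_self = weighted_similarity(self_features, equal_weights)
sim_rel = weighted_similarity(rel_features, equal_weights)
sim = final_similarity(sim_self, sim_rel)
print(sim_self, sim_rel, sim, is_match(sim))
```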

4. Test and Analysis

4.1. Matching Process and Test

The following is the testing process for the multi-scale residential area matching method proposed in this study (Figure 7):
  • Data preprocessing is conducted, which mainly includes data format conversion, coordinate system conversion, projection alteration, and topology checking;
  • Weight values are determined by selecting a certain number of positive samples and calculating the sub-features of similarity of a residential area and spatial neighborhood similarity, using the Relief-F algorithm to determine the weight values of the various features;
  • Initial matching is undertaken by using the Minimum Bounding Rectangle to screen candidate matching entities and then calculating spatial similarity values to determine the 1:1 and 1:N candidate matches;
  • Spatial neighborhood clusters are determined using the method described in Section 2.2, based on the initial matching. The matched entities are labeled $C_1$ and $S_1$, and the unmatched entities are labeled $C_2$ and $S_2$, which mainly include M:N and 1:0 matches. The Delaunay triangulation network is then created to divide the many-to-many matching spatial domain;
  • M:N matching is conducted by performing convex hull processing on the aggregated element sets of the spatial neighborhood clusters, converting them into single entities for matching, and calculating spatial similarity to determine the M:N matching relationships;
  • All matching results are obtained and evaluated, and the matching relation is mainly determined by the spatial similarity value.
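The MBR screening used in the initial matching step can be sketched as follows. This is only an illustration under stated assumptions: polygons are plain lists of (x, y) vertices, two entities are candidates when their MBRs overlap after one of them is enlarged by a tolerance, and the tolerance is set to the 50 m distance threshold from Section 4.2. The paper does not state the exact screening rule, and all function names are hypothetical.

```python
def mbr(polygon):
    """Minimum bounding rectangle of a polygon as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def mbrs_overlap(a, b, tolerance=0.0):
    """True when rectangle b overlaps rectangle a enlarged by `tolerance`."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return (ax0 - tolerance <= bx1 and bx0 <= ax1 + tolerance
            and ay0 - tolerance <= by1 and by0 <= ay1 + tolerance)


def candidate_pairs(small_scale, large_scale, tolerance=50.0):
    """Map each small-scale entity id to the large-scale ids that pass screening."""
    large_boxes = {key: mbr(poly) for key, poly in large_scale.items()}
    candidates = {}
    for s_key, s_poly in small_scale.items():
        s_box = mbr(s_poly)
        candidates[s_key] = [
            c_key for c_key, c_box in large_boxes.items()
            if mbrs_overlap(s_box, c_box, tolerance)
        ]
    return candidates


if __name__ == "__main__":
    small = {"s1": [(0, 0), (100, 0), (100, 100), (0, 100)]}
    large = {
        "c1": [(10, 10), (40, 10), (40, 40), (10, 40)],
        "c2": [(120, 0), (140, 0), (140, 20), (120, 20)],
        "c3": [(500, 500), (520, 500), (520, 520), (500, 520)],
    }
    print(candidate_pairs(small, large))
```

The screening only reduces the number of pairs for which the more expensive similarity values have to be computed; it does not decide the matching relationship itself. For datasets of the size used here, a spatial index would likely replace the nested loop.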
To verify the effectiveness of the algorithm proposed in this study for matching multi-scale residential areas, the ArcGIS Engine 10.2 software was used to design and implement the algorithm using the Python programming language. We used a computer with the Microsoft Windows 10 operating system, an Intel i7 processor, 8 GB of memory, and a 512-GB hard drive. The test used residential data at scales of 1:50,000 and 1:10,000 for an area of the city of Ningbo, Zhejiang Province in eastern China, which included 689 and 2016 polygon entities, respectively. The selected area has both densely populated urban areas and more sparsely populated suburban areas, which are suitable for matching. The test data are shown in Figure 8.

4.2. Test Results and Analysis

Five groups of positive samples were selected from typical areas, such as urban, rural, and suburban areas, of the experimental data, with 20 samples in each group. The method proposed in this paper was used to calculate the similarity values for location, direction, area, and shape features of the residential areas themselves and their spatial neighborhoods. The Relief-F algorithm was used to refine the weight values of the features. During the training process, a value of 8 for $k$ was obtained after running the algorithm 20 times. The similarity feature weight values in Equations (5) and (10) were obtained and are shown in Table 2. In addition, in accordance with the experience and knowledge of experts [14,17,26], and after much feedback, the distance threshold was set to $d_\tau = 50$ m, and the similarity threshold was set to $sim_0 = 0.6$.
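For readers unfamiliar with Relief-F, the weighting step can be illustrated with a compact sketch of the standard multi-class formulation by Kononenko [24]. It is not the authors' implementation: the function names, the Manhattan distance on range-normalized features, the number of sampling iterations, and the final clipping and normalization of the weights so that they sum to 1 are all assumptions. Only the default of $k = 8$ nearest neighbors is taken from the text.

```python
import random
from collections import Counter


def relief_f(samples, labels, k=8, iterations=100, seed=0):
    """Estimate one Relief-F weight per feature (standard formulation)."""
    n, n_features = len(samples), len(samples[0])
    rng = random.Random(seed)
    # Range of each feature, used to normalize differences to [0, 1].
    spans = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        spans.append((max(column) - min(column)) or 1.0)
    prior = {c: count / n for c, count in Counter(labels).items()}

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(n_features))

    weights = [0.0] * n_features
    for _ in range(iterations):
        i = rng.randrange(n)
        r, r_class = samples[i], labels[i]
        for c, p_c in prior.items():
            pool = [samples[t] for t in range(n) if t != i and labels[t] == c]
            nearest = sorted(pool, key=lambda s: distance(r, s))[:k]
            if not nearest:
                continue
            # Nearest hits lower a weight; nearest misses raise it,
            # scaled by the prior probability of the miss class.
            scale = -1.0 if c == r_class else p_c / (1.0 - prior[r_class])
            for j in range(n_features):
                mean_diff = sum(diff(j, r, s) for s in nearest) / len(nearest)
                weights[j] += scale * mean_diff / iterations
    return weights


def normalise(weights):
    """Assumed post-processing: clip negatives and rescale to sum to 1."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        raise ValueError("no feature received a positive weight")
    return [w / total for w in clipped]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 1.0), (0.0, 0.5), (0.1, 0.2),
            (1.0, 0.1), (0.9, 0.9), (1.0, 0.6), (0.9, 0.4)]
    classes = ["A"] * 4 + ["B"] * 4
    print(normalise(relief_f(data, classes, k=2, iterations=50)))
```

In the toy data, the first feature separates the two classes and the second does not, so the first feature receives the larger weight. Averaging over the neighbors actually found, rather than dividing by $k$, is a small deviation that keeps the estimate defined when a class has fewer than $k$ members.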
To evaluate the matching efficacy of the algorithm proposed in this study, values of the matching accuracy $P$, recall $R$, and $F$ were calculated as follows by comparison with manual matching results (through visual observation):

$$P = \frac{f(C)}{f(C) + f(W) + f(M)}, \qquad R = \frac{f(C)}{f(C) + f(U)}, \qquad F = \frac{2PR}{P + R}$$

where $f(C)$ is the number of correct matches, $f(W)$ is the number of mismatches, $f(M)$ is the number of matches that could not be determined manually, and $f(U)$ is the number of missed matches.
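As a worked check, the three measures can be computed directly from the four counts. The sketch below uses a hypothetical helper name, `evaluate`, and applies it to the counts reported for the proposed method in Table 3.

```python
def evaluate(correct, wrong, undetermined, missed):
    """Return (P, R, F) from the counts f(C), f(W), f(M), and f(U)."""
    precision = correct / (correct + wrong + undetermined)
    recall = correct / (correct + missed)
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


if __name__ == "__main__":
    # Counts for the proposed method as listed in Table 3.
    p, r, f = evaluate(correct=579, wrong=40, undetermined=18, missed=52)
    print(f"P = {p:.1%}, R = {r:.1%}, F = {f:.1%}")
```

With these counts, the values round to 90.9%, 91.8%, and 91.3%, which agrees with the percentages listed in Table 3.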
We used the test data to compare the proposed method with the area overlap rate [7] and own feature similarity [8] matching methods from other studies, with the area overlap rate threshold set to 0.5 and the similarity threshold set to 0.6. The match result statistics are shown in Table 3. The accuracy, recall, and F value of the proposed method were all above 90%, whereas the accuracy and recall of the other two methods [7,8] ranged from about 71% to 81%. Because the experimental data were preprocessed and therefore contained no large geometric deviations, the area overlap rate method performed relatively well, but some 1:N matches could not be fully matched and were prone to mismatches, and individual entities with geometric deviations were prone to missed matches. In multi-scale matching, significant differences in spatial features between some matching entities produced many 1:N matches, for which the method based on own feature similarity is prone to mismatches and missed matches. Furthermore, neither of the other methods [7,8] could effectively identify M:N matching relationships.
However, the two methods from the other studies are less complex and therefore faster than the proposed method (13 s and 21 s versus 85 s; Table 3). The proposed method has to determine candidate targets repeatedly, which makes the traversal time-consuming, and it also builds a Voronoi diagram and a Delaunay triangulation network, so its overall running time is longer.
Figure 9 shows some of the detailed matching results for the data from the test area; matching relationships are drawn by connecting the centroids of the matched residential areas. It can be seen that the method employed in this study can effectively handle multi-scale polygon residential matching, can identify various types of matching relationships, and has good matching efficacy. Figure 9a,b show that where the distribution of polygon residential areas is relatively regular, with no significant positional deviations and mostly 1:1 matching and simpler 1:N matching, the polygon residential areas are easier to match and the matching results are better.
Where the spatial distribution is complex, the spatial similarity features of the residential areas themselves are not sufficiently distinctive, so matching these entities depends more on spatial neighborhood features. The method used in this study takes this into account, which improves the matching results somewhat. Figure 9c,d show that the overall matching results for polygon residential areas were good, and the method proposed in this study, which is based on spatial neighborhood clusters, can accurately identify complex M:N matches, as indicated by the areas outlined with red dotted lines in the figure. However, where the features of individual elements differed too much or the matching relationships were relatively fuzzy, mismatches and missed matches also occurred, as indicated by the areas outlined with blue solid lines in the figure.
Figure 10 shows a matching sample selected from the results set. The similarities of the different entities were obtained from the calculations, as shown in Table 4. In Figure 10a, $s_1$ and $t_1$ are in a 1:1 matching relationship. The own feature similarity and spatial neighborhood similarity of the two areal entities were both very high, so a matching relationship could be accurately determined between the two. In Figure 10b, $s_2$ and $t_2$, $t_3$ are a 1:N match. The similarities between $s_2$ and each of $t_2$ and $t_3$ are relatively high (0.676 and 0.724), and the matching relationship between the three can also be accurately determined through two-way matching. In Figure 10c, $s_3$ and $s_4$, together with $t_4$, $t_5$, and $t_6$, are an M:N match caused by typification. After initial matching, a specific matching relationship could not be determined because all pairwise similarity values fell below the threshold of 0.6, but an M:N candidate matching relationship could be determined using the method based on spatial neighborhood clusters. Convex hull processing was then performed. As Figure 10d shows, this matching relationship was converted into a simple 1:1 match between the hulls $h_1$ and $h_2$, and the similarity value (0.795) was calculated to determine the matching relationship between the elements.
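The convex hull step that turns an M:N candidate set into a single entity can be sketched as follows. The example uses Andrew's monotone chain algorithm on the pooled vertices of a cluster; the paper does not state which hull algorithm was used, and the function names and sample coordinates are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def merge_cluster(polygons):
    """Replace a cluster of polygons by the convex hull of all their vertices."""
    return convex_hull([pt for poly in polygons for pt in poly])


def polygon_area(polygon):
    """Unsigned polygon area from the shoelace formula."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


if __name__ == "__main__":
    cluster = [
        [(0, 0), (1, 0), (1, 1), (0, 1)],
        [(2, 0), (3, 0), (3, 1), (2, 1)],
    ]
    hull = merge_cluster(cluster)
    print(hull, polygon_area(hull))
```

Once both sides of an M:N candidate have been merged in this way, the two hulls can be compared with the same similarity measures used for 1:1 pairs, which is how $h_1$ and $h_2$ are treated in Figure 10d.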

5. Conclusions

This paper introduced a multi-scale residential area matching method that takes into account spatial neighborhood features. This method analyzes the complex matching relationships of multi-scale polygon residential areas and proposes a many-to-many candidate matching determination method that is based on spatial neighborhood clusters to achieve complex many-to-many matching relationships. It also combines similarities of polygon residential areas themselves and spatial neighborhood similarities, taking into account geometric and topological features of residential areas, to obtain accurate matching results. The experimental results show that this method can identify complex one-to-many and many-to-many matching relationships in multi-scale residential areas and overcome the influence of positional deviations on matching results, with a high degree of matching accuracy.
Although the method is believed to be useful, some issues concerning its practical application need further investigation. First, the primary match is found correctly, but other matches can be lost in special cases, which usually happens when several objects have been aggregated but one of them is more similar to the target object than the others. Second, when the scale difference between the datasets to be matched is large, the residential areas are strongly affected by cartographic generalization, and both their own features and their spatial neighborhood features change, which can easily cause mismatches. Future studies should focus on adapting this method to matching polygon residential areas across wider scale spans and on extending it to other areal elements.

Author Contributions

Conceptualization, Jingzhen Ma; methodology, Jingzhen Ma, Qun Sun and Shaomei Li; software, Jingzhen Ma; validation, Bowei Wen; formal analysis, Jingzhen Ma; investigation, Zhao Zhou; resources, Qun Sun; data curation, Bowei Wen; writing—original draft preparation, Jingzhen Ma; writing—review and editing, Jingzhen Ma, Qun Sun and Shaomei Li; visualization, Jingzhen Ma; supervision, Shaomei Li; project administration, Zhao Zhou; funding acquisition, Jingzhen Ma and Qun Sun. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (42101454, 41901397), the Fund Project of Zhongyuan Scholar of Henan Province (202101510001) and the Joint Fund of Collaborative Innovation Center of Geo-Information Technology for Smart Central Plains, Henan Province and Key Laboratory of Spatiotemporal Perception and Intelligent Processing, Ministry of Natural Resources (212102).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data and codes presented in this study are available on request from the authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ruiz, J.; Ariza, F.; Ureña, M.; Elidia, B. Digital Map Conflation: A Review of the Process and a Proposal for Classification. Int. J. Geogr. Inf. Sci. 2011, 25, 1439–1466.
  2. Xavier, E.; Ariza, F.; Ureña, M. A Survey of Measures and Methods for Matching Geospatial Vector Datasets. ACM Comput. Surv. 2016, 49, 1–34.
  3. Fu, Z.; Yang, Y.; Gao, X.; Zhao, X.; Fan, L. An Optimization Algorithm for Multi-characteristics Road Network Matching. Acta Geod. Cartogr. Sin. 2016, 45, 608–615.
  4. Wang, X.; Qian, H.; He, H.; Chen, J.; Hu, H. Matching Multi-source Areal Habitations with Skeleton Line Mesh of Blank Region. Acta Geod. Cartogr. Sin. 2015, 44, 927–935.
  5. Guo, Q.; Xie, Y.; Liu, J.; Wang, L.; Zhou, L. Algorithms for Road Networks Matching Considering Scale Variation and Data Update. Acta Geod. Cartogr. Sin. 2017, 46, 381–388.
  6. Kim, J.; Vasardani, M.; Winter, S. Similarity Matching for Integrating Spatial Information Extracted from Place Descriptions. Int. J. Geogr. Inf. Sci. 2016, 31, 56–80.
  7. Masuyama, A. Methods for Detecting Apparent Differences Between Spatial Tessellations at Different Time Points. Int. J. Geogr. Inf. Sci. 2006, 20, 633–648.
  8. Hao, Y.; Tang, W.; Zhao, Y.; Li, N. Areal Feature Matching Algorithm Based on Spatial Similarity. Acta Geod. Cartogr. Sin. 2008, 37, 501–506.
  9. An, X.; Sun, Q.; Xiao, Q.; Yan, W. A shape multilevel description method and application in measuring geometry similarity of multi-scale spatial data. Acta Geod. Cartogr. Sin. 2011, 40, 495–501.
  10. Luo, G.; Zhang, X.; Qi, L.; Guo, T. The Fast Positioning and Optimal Combination Method of Change Vector Object. Acta Geod. Cartogr. Sin. 2014, 43, 1285–1292.
  11. Samal, A.; Seth, S.; Cueto, K. A Feature-based Approach to Conflation of Geospatial Sources. Int. J. Geogr. Inf. Sci. 2004, 18, 459–489.
  12. Kim, J.; Yu, K.; Heo, J.; Lee, H. A New Method for Matching Objects in Two Different Geospatial Datasets Based on the Geographic Context. Comput. Geosci. 2010, 36, 1115–1122.
  13. Tong, X.; Deng, S.; Shi, W. A Probabilistic Theory-based Matching Method. Acta Geod. Cartogr. Sin. 2007, 36, 210–217.
  14. Wu, J.; Wan, Y.; Chiang, Y.; Fu, Z.; Deng, M. A Matching Algorithm Based on Voronoi Diagram for Multi-Scale Polygonal Residential Areas. IEEE Access 2018, 6, 4904–4915.
  15. Wang, Y.; Chen, D.; Zhao, Z.; Ren, F.; Du, Q. A Back-Propagation Neural Network-Based Approach for Multi-Represented Feature Matching in Update Propagation. Trans. GIS 2015, 19, 964–993.
  16. Zhang, Y.; Huang, J.; Deng, M.; Fang, X.; Hu, J. Relaxation Labelling Matching for Multi-scale Residential Datasets Based on Neighboring Patterns. Geomat. Inf. Sci. Wuhan Univ. 2018, 43, 1098–1105.
  17. Zhang, X.; Ai, T.; Stoter, J.; Zhao, X. Data Matching of Building Polygons at Multiple Map Scales Improved by Contextual Information and Relaxation. ISPRS J. Photogramm. Remote Sens. 2014, 92, 147–163.
  18. Xu, J.; Wu, F.; Zhu, J.; Qian, H. A Multi-to-Multi Matching Algorithm Between Neighborhood Scale Settlement Data. Geomat. Inf. Sci. Wuhan Univ. 2014, 39, 340–345.
  19. Huh, Y.; Kim, J.; Lee, J.; Yu, K.; Shi, W. Identification of Multi-scale Corresponding Object-set Pairs Between Two Polygon Datasets with Hierarchical Co-clustering. ISPRS J. Photogramm. Remote Sens. 2014, 88, 60–68.
  20. Zhang, L.; Guo, Q.; Sun, Y. The Method of Matching Residential Features in Topographic Maps at Neighboring Scales. Geomat. Inf. Sci. Wuhan Univ. 2008, 33, 604–607.
  21. Liu, L.; Zhu, D.; Zhu, X.; Ding, X.; Guo, W. A Multi-scale Polygonal Object Matching Method Based on MBR Combinatorial Optimization Algorithm. Acta Geod. Cartogr. Sin. 2018, 47, 652–662.
  22. Yang, M.; Ai, T.; Yan, X.; Chen, Y.; Zhang, X. A Map-algebra-based Method for Automatic Change Detection and Spatial Data Updating across Multiple Scales. Trans. GIS 2018, 22, 435–454.
  23. Fan, H.; Zipf, A.; Fu, Q.; Neis, P. Quality assessment for building footprints data on OpenStreetMap. Int. J. Geogr. Inf. Sci. 2014, 28, 700–719.
  24. Kononenko, I. Estimating attributes: Analysis and extension of RELIEF. In European Conference on Machine Learning; Springer: Berlin/Heidelberg, Germany, 1994; pp. 171–182.
  25. Du, S.; Luo, L.; Cao, K.; Shu, M. Extracting Building Patterns with Multilevel Graph Partition and Building Grouping. ISPRS J. Photogramm. Remote Sens. 2016, 122, 81–96.
  26. Chen, L.; Zhang, X.; Lin, H.; Yang, M. Change Analysis and Decision Tree Based Detection Model for Residential Objects across Multiple Scales. Acta Geod. Cartogr. Sin. 2018, 47, 403–412.
Figure 1. Effects of Cartographic Generalization on Multi-scale Spatial Data. (a) Polygon Residential Areas Many-to-many Corresponding Relationships; (b) Displaced Data.
Figure 2. Adjacent Triangles Merging Graph.
Figure 3. Obtaining Spatial Neighborhood Clusters. (a) Constrained Delaunay Triangulation; (b) Spatial Neighborhood Clusters.
Figure 4. Turning Function Representation.
Figure 5. Spatial Neighbors of Residential Areas.
Figure 6. Relative Relationships of Polygon Residential Areas.
Figure 7. The Matching Process of Multi-scale Residential Areas.
Figure 8. Test Data. (a) 1:10,000 Residential Area Data; (b) 1:50,000 Residential Area Data.
Figure 9. Matching Results of the Proposed Method.
Figure 10. Matching Samples Selected from the Results.
Table 1. Matching Mode of Multi-scale Residential Areas.

| Matching Mode | Large Scale Data | Small Scale Data |
| --- | --- | --- |
| 1:0 | [illustration] | [illustration] |
| 0:1 | [illustration] | [illustration] |
| 1:1 | [illustration] | [illustration] |
| 1:N | [illustration] | [illustration] |
| M:1 | [illustration] | [illustration] |
| M:N | [illustration] | [illustration] |
Table 2. Weight Values of Similarity Features.

| Weight Value | Position | Direction | Area | Shape |
| --- | --- | --- | --- | --- |
| Equation (5) | 0.396 | 0.238 | 0.176 | 0.190 |
| Equation (10) | 0.317 | 0.262 | 0.193 | 0.228 |
Table 3. Match Results Statistics.

| Matching Method | $f(C)$ | $f(W)$ | $f(M)$ | $f(U)$ | $P$/% | $R$/% | $F$/% | Running Time/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| This study | 579 | 40 | 18 | 52 | 90.9 | 91.8 | 91.3 | 85 |
| Previous study [7] | 465 | 89 | 18 | 117 | 81.3 | 79.9 | 80.6 | 13 |
| Previous study [8] | 402 | 105 | 18 | 164 | 76.6 | 71.0 | 73.7 | 21 |
Table 4. Matching Results for Figure 10.

| Matching Sample | Entity Pair | Similarity | Match |
| --- | --- | --- | --- |
| Figure 10a | $s_1 : t_1$ | 0.883 | Y |
| Figure 10b | $s_2 : t_2$ | 0.676 | Y |
| | $s_2 : t_3$ | 0.724 | Y |
| Figure 10c | $s_3 : t_4$ | 0.504 | N |
| | $s_3 : t_5$ | 0.541 | N |
| | $s_4 : t_4$ | 0.492 | N |
| | $s_4 : t_6$ | 0.518 | N |
| Figure 10d | $h_1 : h_2$ | 0.795 | Y |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ma, J.; Sun, Q.; Zhou, Z.; Wen, B.; Li, S. A Multi-Scale Residential Areas Matching Method Considering Spatial Neighborhood Features. ISPRS Int. J. Geo-Inf. 2022, 11, 331. https://doi.org/10.3390/ijgi11060331
