Context-Aware Matrix Factorization for the Identification of Urban Functional Regions with POI and Taxi OD Data

Jing, Changfeng; Hu, Yanru; Zhang, Hongyang; Du, Mingyi; Xu, Shishuo; Guo, Xian; Jiang, Jie

doi:10.3390/ijgi11060351


by Changfeng Jing ^*, Yanru Hu, Hongyang Zhang, Mingyi Du, Shishuo Xu, Xian Guo and Jie Jiang

School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 100044, China

^* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2022, 11(6), 351; https://doi.org/10.3390/ijgi11060351

Submission received: 1 April 2022 / Revised: 3 June 2022 / Accepted: 14 June 2022 / Published: 16 June 2022

(This article belongs to the Special Issue New Geospatial Science: Analytics and Management for Large Geospatial Datasets)


Abstract:

The identification of urban functional regions (UFRs) is important for urban planning and sustainable development. Because this involves a set of interrelated processes, it is difficult to identify UFRs using only single data sources. Data fusion methods have the potential to improve the identification accuracy. However, the use of existing fusion methods remains challenging when mining shared semantic information among multiple data sources. In order to address this issue, we propose a context-coupling matrix factorization (CCMF) method which considers contextual relationships. This method was designed based on the fact that the contextual relationships embedded in all of the data are shared and complementary to one another. An empirical study was carried out by fusing point-of-interest (POI) data and taxi origin–destination (OD) data in Beijing, China. There are three steps in CCMF. First, contextual information is extracted from POI and taxi OD trajectory data. Second, fusion is performed using contextual information. Finally, spectral clustering is used to identify the functional regions. The results show that the proposed method achieved an overall accuracy (OA) of 90% and a kappa of 0.88 in the study area. The results were compared with the results obtained using single sources of non-fused data and other fusion methods in order to validate the effectiveness of our method. The results demonstrate that an improvement in the OA of about 5% in comparison to a similar method in the literature could be achieved using this method.

Keywords:

urban functional regions; identification; matrix factorization; POI data; OD trajectory data

1. Introduction

The identification of urban functional regions (UFRs) is the basis of urban planning. Accurate UFR identification is crucial to support urban infrastructure planning and improve quality of life. Many types of data have been used for the identification of UFRs, from early questionnaires and land use status maps [1] to emerging data such as points of interest (POIs) [2], social media data [3], and trajectory data [4]. However, using only one type of data has inevitable disadvantages, both spatially and semantically. For example, POI data cannot represent the actual size of a space. The sparsity of social media data may lead to a low identification accuracy. Therefore, fusing data to improve identification accuracy has become a hot research topic.

Significant progress has been made in the use of data fusion methods for the identification of UFRs. Traditional platform-based fusion methods [5,6] have been used widely in loose coupling mode. The advantage of this method is that it is easy to understand. Nevertheless, its ability to portray urban functional features is weak. Feature-based fusion methods [7,8,9,10] can capture feature vectors to represent the main features of each data point. These methods can remove redundant features and retain principal features depending on the correlated information. Instead of focusing on the data themselves, decision-based fusion methods [11,12] were proposed to integrate the derived decisions from data. The complementarity of multiple prior decisions was considered, but the heterogeneity among decisions may lead to a low accuracy of UFR identification [13]. Semantically based methods identify functional regions based on a holistic understanding of the information behind the data [14,15,16]. Therefore, all of these methods can ascertain what each piece of data represents, why different data can be fused, and how they can mutually enhance each other’s features. However, they often ignore shared or common information indicating similarity among different data. Moreover, information sparsity is another challenge in accurate identification. Therefore, mining and engaging the shared information among multiple data sources in order to improve UFR identification accuracy has become an interesting issue and a hot research topic.

In order to address the sparsity issue and mine the common information, we propose a data fusion method named context-coupling matrix factorization (CCMF), which compensates for information sparsity by considering the contextual relationships among multiple data sources. First, information extraction is performed based on POI data and taxi origin–destination (OD) trajectory data. Semantic information is derived from the POI data, while spatial interaction and time–frequency information are extracted from the taxi OD trajectory data. Second, the CCMF method is used to mine common information among the data sources, which is fused into integrated data for UFR identification. Finally, spectral clustering is adopted to identify UFRs.

The contributions of this paper involve two main aspects:

(1): The CCMF method was proposed to fuse data to achieve a higher identification accuracy. The contextual relationships among the data sources were considered in order to mine the shared information for data fusion.
(2): Empirical work was conducted to validate the usability of the proposed method in the study area. POI and taxi OD trajectory data were fused for use in UFR identification. Comparing the accuracy between fused data and single data, the results demonstrate that the proposed method achieved a higher accuracy in the identification of UFRs.

The remainder of this study is structured as follows. Section 2 reviews the existing literature. Section 3 presents a definition of the case study area and data. Section 4 introduces the methodology and workflow for urban functional region identification with the CCMF method. Section 5 describes the experiments and results. Section 6 provides a discussion of other fusion methods and the impact of parameters. Finally, Section 7 draws some conclusions.

2. Related Work

2.1. The Urban Functional Region Identification Approach

Many UFR identification methods have been developed, from initial field surveys to statistical methods to machine learning methods. Field surveys [17] are more subjective, depending on the investigator’s knowledge. Statistical methods provide new opportunities to quantitatively measure the metrics of UFRs, for example the kernel density analysis method [18], k-means [19], and latent Dirichlet allocation (LDA) [20]. Their accuracy depends on their capacity to extract features from data. Meanwhile, they are weak in handling nonlinear relationships between data sources. Machine learning methods provide the opportunity to mine information and partly improve identification accuracy for nonlinear relationships by a black-box approach [21,22,23]. Some widely used machine learning methods include Gaussian mixture models (GMM) [5], fuzzy clustering [24], and random forest classifiers [25]. Recently, non-negative matrix factorization (NMF), which can be used to detect spatial–temporal interactions in cities from multiple perspectives, has attracted increasing attention [26,27]. However, most of the works described above relied on only one data type. Therefore, research on data fusion methods deserves more attention.

2.2. Data Fusion Methods for Urban Functional Region Identification

The basic principle of data fusion is to combine various data sources according to specific criteria [28]. Data fusion methods for the identification of UFRs can be categorized into platform-based, feature-based, decision-based, and semantic-based methods [29].

Platform-based methods are the lowest level of data fusion according to Luo and Kay’s architecture [30]. Geospatial data and remote sensing images have been integrated to evaluate urban land use change [31]. Information on both the space and time dimensions was mined in order to discover urban functional regions in a loose model, but it did not consider social attributes [32]. In recent work, an approach was used that integrates social properties in POI and natural properties in high-spatial-resolution remote sensing images to delineate UFRs [33]. However, data fusion in platform-based methods engages different datasets at different stages. Therefore, there is no requirement for data consistency [34].

Feature-based methods can derive comprehensive information. The basic principle of feature-based methods is equality. Feature vectors are extracted from the various data sources independently, and their weights are set according to their importance in the identification of UFRs. Therefore, weighted fusion methods are widely used. For example, a weighted fusion method was applied to street images and social media data in order to mitigate the sparsity of social media data [7]. A framework to fuse landscape metrics and human activity metrics was developed to identify UFRs [35]. In recent years, deep learning methods have increasingly been involved in feature-based fusion. An improved CNN model [36], which can automatically perform feature extraction and handle high-dimensional data, was used to recognize UFRs. Considering the limitations of the CNN model in accuracy and uncertainty, more attention should be paid to CNN network models [37].

Decision-based methods perform a synthetic estimation for each decision derived from various data sources. In contrast to other fusion methods, decision-based methods focus on various synthetic combinations of decisions, instead of data types [38]. The decision-based fusion method generates a fused attribute descriptor combining all of the decisions. Therefore, the descriptor has powerful interpretability for UFR identification results. Weights have been widely used to describe the contributions of different decisions [39]. Machine learning methods can be helpful in the determination of weights in order to avoid subjective bias [40]. Apart from the weight mechanism, artificial intelligence (AI) has been widely used as an end-to-end approach [41].

Semantic-based fusion is a high-level data fusion method based on the semantic information of the data. Simple AND-OR operations cannot connect them essentially. Semantic-based methods engage the intrinsic semantic information to present the data features. The increasing amount of spatiotemporal data requires more research to extract and describe the semantic information. A non-negative tensor factorization method was proposed to extend the results achieved for static networks to time-varying networks, which enriches the semantic information based on spatiotemporal information [42]. In a multilevel model, one can use an individual’s socioeconomic characteristics to estimate their impact on the functional determination of urban regions. However, the interrelationship between semantic information is ignored [43]. The cross-correlation mechanism was introduced to mine the semantic features. The CC-FLU model was proposed to identify UFRs by integrating semantic features and the inter-relationships between them [44].

3. Study Area and Datasets

3.1. Study Area

Beijing, the capital of the People’s Republic of China, was selected as the research area. It is a modern, world-famous, international city, and it has a complicated spatial structure. The study area is within the Fifth Ring Road of Beijing, which contains various developed urban functional regions. A map of the study area is shown in Figure 1.

3.2. Datasets and Processing

The main data sources in our work were OpenStreetMap (OSM), taxi OD trajectory, and POI data.

(1) OpenStreetMap (OSM) is a well-known source of volunteered geographic information (VGI) data, representing an important source of citizen science data. In this paper, road network data within Beijing’s Fifth Ring Road for 2018 were obtained from the OpenStreetMap website (https://www.openstreetmap.org/, 18 March 2022). The irregular grid generated by the road network data is the basic unit that expresses socioeconomic functions in urban management and urban planning, and there is a consensus regarding its impact on urban functions [45,46].

The scale and shape of geographical units may exert impacts on research results. The division method of homogeneous grids cannot guarantee the continuity of land-use types in cells. The land-use types in the cells at the junction of different land-use types are highly mixed.

The OSM data often serve to generate basic spatial units for UFR identification. These data must be preprocessed in order to ensure data quality. The spatial units used in this study were produced by a combination of ArcGIS software operations and manual editing: overhanging points in the road network were removed, independent road lines were extended to connect them to the adjacent roads, and unnecessary internal community roads and the middle lines of roads were removed manually. The study area was divided into 1260 units, considering four levels of roads: highway, primary, secondary, and urban trunk roads.

(2) Taxi OD trajectory data consist of the paths of taxis, as recorded by Global Navigation Satellite System (GNSS) tracking devices. Residents’ daily activities can be portrayed by taxi OD trajectory data. In our work, one week of taxi OD trajectory data from the Beijing Fifth Ring Road area in 2018 was employed. The dataset contains about 25–35 million trajectory points for a single day. Each record mainly contains the taxi ID, time, latitude, longitude, speed, direction, and status (see Table 1).

Due to the characteristics of taxi operation, some remote locations have no OD points, while some popular regions have many. Therefore, the obtained data are not evenly distributed, and the data matrix contains missing or zero values. In addition, the data used in this paper cover only one week of OD records in Beijing, and such a short data cycle also makes the data sparse.

(3) POI data consist of point data with geographical information portraying real-world objects, whether 2D or 3D, indoor or outdoor [47]. POI data usually have fine spatial granularity and are widely used to delineate people’s activities. POI properties comprise names, categories, latitudes, and longitudes. Approximately 147,158 POI records for 2018 were collected from the Gaode map API (http://lbs.amap.com/, 29 March 2022), containing 22 categories and sub-categories. Following Chinese technological specifications (GB50137-2011), the POI data were re-classified as residential, public service, commercial and financial, industrial, green square, and transportation data (Table 2). In our work, the POI data were used to determine the types of UFRs.

4. Methodology

Inspired by the work of Zheng and Yi [34,48], CCMF was proposed to fuse POI and taxi OD trajectory data for UFR identification. As shown in Figure 2, this framework mainly consists of three steps. In the first step, feature information is extracted. Secondly, the feature matrices are decomposed and a fused matrix is derived. Finally, the spectral clustering method is employed to identify the functional regions. The methods in the first step are more common. Therefore, the following sections focus on describing the methods of steps 2 and 3.

4.1. Feature Information Extraction Workflow

In our work, trajectory OD and POI data were fused in order to identify UFRs. The information extraction was the basis of the proposed CCMF method, and the results were stored as a matrix. The process was as follows:

(1): The number of POIs of a particular type in a spatial unit can partly reflect the functional type of that unit. A spatial overlay operation was carried out on the POI data and the spatial units in order to generate a spatial unit POI matrix, P. The rows of matrix P represent the POI types in a region, and the columns represent the unit location. This matrix provides strong semantic information for annotating function types.
(2): The location information of the OD data reflects interactions between regions. That is, origin and destination points falling into spatial units denote an interaction between those units. By counting the OD points within each spatial unit, an OD interaction matrix, Q, is obtained. Rows of matrix Q represent passenger pick-up origin points, and columns denote drop-off destination points. The matrix element value represents the frequency of travel from origin to destination points. The matrix is used as the original data, mainly using the frequency of residents’ travel interactions within each unit to explore the potential functions of the region.
(3): The time–frequency sequences extracted from taxi OD data represent taxi flows or residents’ travel patterns over time in a region. An ordinary statistical method was employed to calculate the time–frequency sequences. The similarity of spatial units is crucial to UFR identification. The dynamic time warping (DTW) algorithm is a widely used time regularization method to measure the similarity of two sets of time series data [49], and this algorithm was also adopted in our work to obtain the time–frequency similarity matrix, R. At the same time, there may be situations in which the number of OD interactions is the same but the functions are different. For example, if a region peaks at 6–8 a.m. and 6–8 p.m. during weekdays, it is likely to be an Industrial or Residential region, while if it peaks at weekends it is likely to be a Commercial or a Green region. Therefore, the hourly features are extracted in order to compensate for that shortcoming.
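The three extraction steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the representation of POIs and trips as index pairs, and the use of the absolute difference as the DTW local cost are all assumptions. The source also does not state how DTW distances are converted into the similarity matrix R, so the sketch stops at the pairwise distance matrix.

```python
def poi_count_matrix(poi_records, n_types, n_units):
    """Build P with one row per POI type and one column per spatial unit.

    poi_records: iterable of (type_index, unit_index) pairs (hypothetical input
    format, e.g. the result of a spatial overlay of POIs on the units).
    """
    P = [[0] * n_units for _ in range(n_types)]
    for poi_type, unit in poi_records:
        P[poi_type][unit] += 1
    return P


def od_interaction_matrix(trips, n_units):
    """Build Q where Q[o][d] counts trips from origin unit o to destination unit d."""
    Q = [[0] * n_units for _ in range(n_units)]
    for origin, destination in trips:
        Q[origin][destination] += 1
    return Q


def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as local cost (assumption)."""
    inf = float("inf")
    D = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[len(a)][len(b)]


def pairwise_dtw(series):
    """Symmetric matrix of DTW distances between the hourly sequences of all units."""
    n = len(series)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = dtw_distance(series[i], series[j])
    return dist
```

In this sketch each unit contributes one hourly frequency sequence to `pairwise_dtw`; units with similar daily rhythms obtain small distances even when their peaks are slightly shifted in time.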

4.2. Context-Coupled Matrix Factorization for Data Fusion

A single data source can only show an urban functional property from one perspective. Different urban regions share functional similarities, and when two regions are known to be similar in some metrics, that similarity can serve as contextual information to strengthen the similarity between regions in another dataset that describes them only incompletely. We describe the nature or function of the regions jointly by learning multiple corresponding similarities, each calculated from its own data, which can reinforce one another. Therefore, fusing the data should be a better approach for achieving high accuracy in UFR identification. As determined by the literature review, the existing works focus mainly on different fusion methods, but ignore the contextual information between data sources.

To address this gap, the CCMF method, which considers contextual relationships among data sources and fully engages them in measuring associations, was proposed. The core concept of CCMF is the collaborative factorization of feature matrices extracted from multiple data sources. The contextual information of geospatial data is essential in order to improve the identification accuracy for spatial and semantic understanding. These data involve shared hidden factors that can complement each other.

The data sparsity limitation can be reduced mainly because the datasets share a common dimension (the 1260 spatial units). The similarity of different regions learned from one dataset is transferred to another dataset by context-aware matrix factorization in order to compensate for the missing values. Therefore, high identification accuracy can be obtained. In this paper, this was achieved by using the inter-regional similarity obtained from the dense datasets (the POI matrix and OD time–frequency matrix) to enhance the similarity obtained from the OD interaction matrix. That is, the similarity of the POI types obtained from the POI matrix compensates for the semantic missingness of the OD interaction matrix. The similarity of residents’ travel times obtained from the OD time–frequency matrix compensates for the temporal missingness in the OD interaction matrix. A schematic diagram is given in Figure 3. There is a shared hidden factor M for matrices P and Q. Likewise, there is a shared hidden factor N for matrices Q and R. M contains common information from the POI data and interaction information, and N includes the similarity of spatial units. The features of multiple data sources are fused to compensate for data sparsity, which is helpful to solve the problem of data bias. Therefore, the identification accuracy can be improved.

The correlation between two spatially close objects is stronger than that between two remote objects. The OD interaction, POI, and OD time–frequency matrices are decomposed simultaneously. Then, their shared interaction matrix M for feature semantics and interaction matrix N for feature similarity are formed. The shared information M between the OD interaction matrix Q and the POI matrix P is the POI type similarity between different regions. POI types can assist in the determination of urban function. The rows of matrix P represent the POI type in this region, and the columns represent the unit location. Comparing two columns therefore gives the similarity of the POI types in different regions. Through fusion, POI types are shared in order to compensate for semantic missingness. The shared information N between the OD interaction matrix Q and the OD time–frequency matrix R is the similarity of the interaction time between cells. One row of matrix R represents a unit, and one column represents the change in OD frequency of this unit over time. Comparing two columns in matrix R gives the similarity of people’s activity times between the units. Through fusion, the similarity of the interaction times between units is shared in order to compensate for the difference in residents’ activity times. The formula is as follows [50]:

L(H, M, N) = \frac{1}{2} \| W (Q - M N^{T}) \|_{F}^{2} + \frac{α}{2} \| P - M H^{T} \|_{F}^{2} + \frac{β}{2} \| R - N N^{T} \|_{F}^{2} + \frac{γ}{2} \left( \| H \|_{F}^{2} + \| M \|_{F}^{2} + \| N \|_{F}^{2} \right)

(1)

where L denotes the loss function of the data fusion method, W denotes the weight matrix with entries in the range [0, 1], α is used to regulate the influence of the OD interaction information and POI category on recognition, and M is the common matrix after the factorization of Q and P (i.e., Q and P interact through M). Similarly, β is used to regulate the influence of OD interaction and time–frequency information on recognition, and N is the common matrix after the decomposition of Q and R (i.e., Q and R interact through N). The first three terms of Formula (1) are used to control the matrix factorization, and the last term is a regularization term used to prevent over-fitting. Inspired by the literature [51], the stochastic gradient descent (SGD) method was used to optimize Formula (1) for higher accuracy based on contextual information in the similarity between the time–frequency and POI information. The proposed matrix factorization algorithm is shown in Algorithm 1.

Algorithm 1: Context-coupled matrix factorization algorithm.
Input: OD interaction matrix Q, POI matrix P, OD time–frequency matrix R, error threshold ε, iteration number T
Output: $Q' = M N^{T}$, the strength of association between user travel and POI
1	Randomly initialize $H, M, N$; set the gradient descent rate $γ$
2	set $t = 1$
3	while ($t < T$ and $L_{t} - L_{t+1} > ε$) do
4	use SGD method to calculate $\frac{\partial L}{\partial H}, \frac{\partial L}{\partial M}, \frac{\partial L}{\partial N}$
5	set $γ = 1$
6	while ($L(H_{t} - γ \frac{\partial L}{\partial H}, M_{t} - γ \frac{\partial L}{\partial M}, N_{t} - γ \frac{\partial L}{\partial N}) > L(H_{t}, M_{t}, N_{t})$) do
7	set $γ = γ / 2$
8	end while
9	$H_{t+1} = H_{t} - γ \frac{\partial L}{\partial H}, M_{t+1} = M_{t} - γ \frac{\partial L}{\partial M}, N_{t+1} = N_{t} - γ \frac{\partial L}{\partial N}$
10	set $t = t + 1$
11	end while
12	Return $Q' = M N^{T}$
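To make the optimization concrete, the following NumPy sketch minimizes Formula (1) with the step-halving line search of Algorithm 1. It is a sketch under stated assumptions rather than the authors' code: W is read as an element-wise weight on the residual of Q, P is arranged with one row per spatial unit so that its shape matches M H^T, full-batch gradients replace stochastic ones for brevity, and the regularization weight of Formula (1) is named `lam` to keep it apart from the step size. The default hyperparameter values are arbitrary.

```python
import numpy as np


def ccmf_loss(Q, P, R, W, H, M, N, alpha, beta, lam):
    """Formula (1), with W applied element-wise to the residual of Q (assumption)."""
    fit_q = np.sum((W * (Q - M @ N.T)) ** 2)
    fit_p = np.sum((P - M @ H.T) ** 2)
    fit_r = np.sum((R - N @ N.T) ** 2)
    reg = np.sum(H ** 2) + np.sum(M ** 2) + np.sum(N ** 2)
    return 0.5 * fit_q + 0.5 * alpha * fit_p + 0.5 * beta * fit_r + 0.5 * lam * reg


def ccmf_gradients(Q, P, R, W, H, M, N, alpha, beta, lam):
    """Partial derivatives of the loss with respect to H, M and N."""
    EQ = (W * W) * (Q - M @ N.T)   # weighted residual of the OD interaction matrix
    EP = P - M @ H.T               # residual of the POI matrix
    ER = R - N @ N.T               # residual of the time-frequency matrix
    gH = -alpha * EP.T @ M + lam * H
    gM = -EQ @ N - alpha * EP @ H + lam * M
    gN = -EQ.T @ M - beta * (ER + ER.T) @ N + lam * N
    return gH, gM, gN


def ccmf(Q, P, R, W, k=4, alpha=0.1, beta=0.1, lam=0.01,
         eps=1e-6, max_iter=200, seed=0):
    """Return the fused matrix Q' = M N^T and the loss history.

    Shapes: Q, R, W are (n, n); P is (n, p); M, N are (n, k); H is (p, k).
    """
    rng = np.random.default_rng(seed)
    n, p = P.shape
    H = 0.1 * rng.random((p, k))
    M = 0.1 * rng.random((n, k))
    N = 0.1 * rng.random((n, k))
    history = [ccmf_loss(Q, P, R, W, H, M, N, alpha, beta, lam)]
    for _ in range(max_iter):
        gH, gM, gN = ccmf_gradients(Q, P, R, W, H, M, N, alpha, beta, lam)
        step = 1.0
        # Halve the step until the loss no longer increases (lines 5-8 of Algorithm 1).
        while step > 1e-12:
            H_new, M_new, N_new = H - step * gH, M - step * gM, N - step * gN
            new_loss = ccmf_loss(Q, P, R, W, H_new, M_new, N_new, alpha, beta, lam)
            if new_loss <= history[-1]:
                break
            step /= 2.0
        else:
            break  # no acceptable step was found
        H, M, N = H_new, M_new, N_new
        history.append(new_loss)
        if history[-2] - history[-1] <= eps:
            break  # improvement fell below the error threshold
    return M @ N.T, history
```

Because a step is accepted only when it does not increase the loss, the recorded history is non-increasing by construction; zero entries of Q that receive zero weight in W are then filled in by the product M N^T.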

4.3. Spectral Clustering to Identify Urban Functional Regions

Spectral clustering is an algorithm that evolved from graph theory [52]; it has many applications in machine learning, computer vision, and data analysis. Spectral clustering transforms the clustering problem into a spectral decomposition of the similarity matrix [53]. The basic idea is to classify the feature vectors obtained from feature decomposition. In clustering, the similarity represents the characteristics of the data. The task of spectral clustering is to divide objects into different subgraphs by a similarity matrix, where the objects within the subgraphs are similar to each other.

When we have similarity matrix W, we can perform graph partitioning via the following steps:

(1): Calculate diagonal matrix T. The values of matrix T on the diagonal are the sums of the corresponding rows or columns of matrix W.
(2): Compute the Laplacian matrix L. This is a symmetric matrix formed by subtracting similarity matrix W from diagonal matrix T.
(3): Perform undirected graph division. The matrix is first transformed into the form of a graph by considering regions as objects of the graph, and considering similarity as weights to connect them (Figure 4). In the figure, “Min cut” denotes the segmentation with the smallest cut edge, and “Normalized cut” denotes the segmentation of approximately the same size with the smallest cut edge. According to the graph cut idea, the first K eigenvalues of the similarity matrix are found, and then the corresponding eigenvectors are found.
(4): Apply the k-means clustering method to cluster the feature vectors. Many studies have employed spatial-based clustering methods to determine regions, such as k-means and DBSCAN. These methods focus on the spatial proximity of entities, and ignore the similarities and semantic information among entities. However, the spectral clustering method can address this gap. In addition, spectral clustering is better for sparse data because it only focuses on a similarity matrix.
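The four steps above can be sketched with NumPy as follows. This is a minimal sketch under assumptions: it uses the unnormalized Laplacian L = T − W and the eigenvectors belonging to its K smallest eigenvalues, which is one common reading of step (3), and it replaces a library k-means with a short Lloyd iteration that uses a deterministic farthest-point initialization. The function name and parameters are hypothetical.

```python
import numpy as np


def spectral_clustering(W, k, n_iter=50):
    """Cluster the objects described by a symmetric similarity matrix W into k groups."""
    W = np.asarray(W, dtype=float)
    T = np.diag(W.sum(axis=1))        # step (1): diagonal matrix of row sums
    L = T - W                         # step (2): Laplacian matrix
    _, eigvecs = np.linalg.eigh(L)    # eigenvalues are returned in ascending order
    X = eigvecs[:, :k]                # step (3): one k-dimensional row per object

    # Step (4): k-means on the rows of X, started from mutually distant points.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)

    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels
```

For a similarity matrix with two tightly connected groups and only weak links between them, the second eigenvector takes opposite signs on the two groups, so the k-means step separates them.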

In spectral clustering, matrix Q’ from the CCMF method is used as input data. The spatial distribution of urban functional areas is identified. However, the semantic property of each functional region is still unknown. In our work, the widely used frequency density (FD) and category proportion (CP) methods [5] were employed. The formula for FD is as follows:

FD_{i} = \frac{n_{i}}{N_{i}}, \quad i = 1, 2, 3, \dots, 6

(2)

where i refers to the type of POI, n_{i} refers to the number of POIs of type i in the unit, N_{i} refers to the total number of POIs of type i, and FD_{i} refers to the frequency density of type i POIs in the total number of POIs of that type.

The formula for the calculation of the proportions of different types is shown in Equation (3):

CP_{i} = \frac{FD_{i}}{\sum_{j = 1}^{6} FD_{j}} \times 100\%, \quad i = 1, 2, 3, \dots, k

(3)

where CP_{i} refers to the proportion of the frequency density of POI type i in this region, FD_{i} is the frequency density from Equation (2), and k refers to the type of cluster.

POI data have rich semantic information for the identification of UFR types. In our work, the maximum proportion rule is used to determine the functional type. When the proportion of one type of POI is larger than that of the others, the function of the unit is defined as that POI type. In contrast with the traditional determination method based on human experience, the powerful semantic information of POI data was used quantitatively. Therefore, better accuracy can be achieved.
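The FD and CP calculations and the maximum proportion rule can be illustrated with a short Python sketch. The function names and the list-based inputs are hypothetical, and the zero-division guards are an added assumption for units or types with no POIs.

```python
def frequency_density(unit_counts, total_counts):
    """Equation (2): share of each POI type's citywide total that falls in this unit."""
    return [n / N if N else 0.0 for n, N in zip(unit_counts, total_counts)]


def category_proportion(fd):
    """Equation (3): each frequency density as a percentage of their sum."""
    total = sum(fd)
    return [100.0 * f / total if total else 0.0 for f in fd]


def dominant_type(cp, type_names):
    """Maximum proportion rule: the type with the largest CP labels the unit."""
    best = max(range(len(cp)), key=cp.__getitem__)
    return type_names[best]


# Toy example with three POI types; the counts are invented for illustration.
names = ["residential", "commercial", "green"]
fd = frequency_density([10, 0, 5], [100, 50, 25])
cp = category_proportion(fd)
label = dominant_type(cp, names)
```

In the toy example the unit holds 10% of all residential POIs but 20% of all green POIs, so it is labeled green even though residential POIs are more numerous in absolute terms; this is the effect of normalizing by the citywide totals in Equation (2).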

4.4. Evaluation Metrics

The identification results were evaluated both quantitatively and qualitatively. The kappa coefficient and overall accuracy (OA) were employed to quantify the identification accuracy [54,55,56]. The OA value refers to the ratio of the number of correctly classified units to the total number of units. Although the OA value is a good indicator of classification accuracy, for multi-class features with highly unbalanced class sizes its value is influenced mostly by the classes with many elements, and it does not characterize each class well. The kappa coefficient represents the proportion of error reduction achieved by the classification relative to a completely random classification. Together, the two metrics provide a consistent measurement of the accuracy of UFR identification methods. Inspired by existing work [57], the ground truth was manually identified with Gaode Map and Gaode Image. The comparison shows whether the identification results are consistent with the ground truth types.
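As an illustration of these metrics, the sketch below computes OA, kappa, and the per-class user's and producer's accuracies from a confusion matrix. The layout is an assumption: rows index the reference (ground truth) class and columns index the identified class, and the function name is hypothetical.

```python
def accuracy_metrics(confusion):
    """Return (OA, kappa, UA list, PA list) for a square confusion matrix.

    confusion[i][j] = number of units whose reference class is i and whose
    identified class is j (assumed layout).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    diag = [confusion[i][i] for i in range(n)]
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]

    oa = sum(diag) / total
    # Agreement expected from a random assignment with the same marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - expected) / (1 - expected)

    pa = [d / r if r else 0.0 for d, r in zip(diag, row_sums)]  # producer's accuracy
    ua = [d / c if c else 0.0 for d, c in zip(diag, col_sums)]  # user's accuracy
    return oa, kappa, ua, pa
```

For the two-class matrix `[[2, 0], [1, 1]]`, three of four units are correct (OA = 0.75), chance agreement is 0.5, and kappa is therefore (0.75 − 0.5) / (1 − 0.5) = 0.5.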

5. Results

5.1. Data Fusion with the Proposed CCMF Method

The proposed CCMF method was used to fuse the semantic information of POI data, and to show the similarity between spatial units. Figure 5 gives a schematic diagram of the data fusion. Matrix Q’ is the fusion result of transformation from the original matrix Q. M’ represents the shared factors, considering POI semantic information in the matrix factorization. N’ denotes the shared factors, taking into account the similarity of spatial units. After fusing, some zero values of the original matrix Q were filled. Therefore, our work compensates well for the sparsity of OD interaction data.

5.2. Results of the UFR Identification

The proposed CCMF method was used to fuse POI and taxi OD data. Spectral clustering was applied to the fused data, and the CP method was used to annotate the resulting clusters. Then, following the urban land use field investigation in China and the Chinese technological specifications (GB50137-2011), the urban land was divided into six clusters: residential, public service, commercial and financial, industrial, green square, and transportation. The distribution of functional regions is mapped in Figure 6. Table 3 shows the CP values of POIs in different clusters.

Cluster 1: This region cluster shows a high CP value for the green square category (0.219440). More importantly, this cluster contains typical tourism spots. Therefore, cluster 1 was judged to be green square (Green).

Cluster 2: Companies in this functional region have a high CP value (0.172115). In addition, the spatial units covered in this cluster include Peking University and the Great Hall of the People. Therefore, cluster 2 was judged to be public service (Pub).

Cluster 3: The CP value of the transportation category (0.217904) is higher in cluster 3 than in other regions. This cluster area covers the most important transportation hubs in the study area, including Beijing Train Station and Beijingnan Railway Station. Therefore, cluster 3 was judged to be transportation (Trans).

Cluster 4: The value of the residential category (0.202814) is higher in cluster 4 than in other regions. The spatial units covered in this cluster include the Ganjiakou Community and the Fangcheng Yuan Community. Therefore, cluster 4 was judged to be residential (Res).

Cluster 5: The value of the industrial category (0.175169) is high in this area. This cluster area contains landmarks such as Zhongguancun and Wangjing. Therefore, cluster 5 was judged to be industrial (Ind).

Cluster 6: This region cluster shows a higher CP value for the commercial and financial category (0.198671), and has landmarks including Xidan and Xihongmen. Therefore, cluster 6 was judged to be commercial and financial (Comm).

5.3. Accuracy Comparison Analysis Results

In order to thoroughly validate the UFR identification accuracy, three experiments were designed. The first one was an accuracy analysis for our proposed CCMF method from quantitative and qualitative perspectives. In the second experiment, an accuracy comparison was conducted between fused CCMF data and non-fused single data. Finally, considering the importance of clustering methods in UFR identification, a comparison of different clustering methods was carried out.

5.3.1. Accuracy Analysis with the Proposed CCMF Method

In order to verify the recognition accuracy, we conducted quantitative and qualitative analyses. The quantitative analysis involved calculating the OA and kappa values. These were calculated by randomly selecting 30 units for each type of functional region and comparing them with the ground truth obtained from Gaode Map. Table 4 shows the identification accuracy of the fused data according to the proposed CCMF method. The OA reached 90% and the kappa value was 0.88. The user’s accuracy (UA) indicates the percentage of correct identifications in this category, while the producer’s accuracy (PA) indicates the probability that the real reference data of this category will be correctly identified. The UA value for residential was 90%, that for public service was 90%, that for commercial and financial was 93%, and that for green square was 93%, while the value was 87% for both industrial and transportation. Regarding PA, industrial had a value of 96%, commercial and financial had a value of 97%, transportation had a value of 96%, residential had a value of 82%, public service had a value of 84%, and green square had a value of 85%.
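The accuracy measures can be recomputed directly from the confusion matrix. The sketch below assumes the layout of Table 4, in which rows are classification results and columns are reference classes; the function name `accuracy_metrics` is illustrative.

```python
def accuracy_metrics(matrix):
    """Return (OA, kappa, UA list, PA list) for a square confusion matrix.

    Rows are assumed to be classification results and columns reference classes.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diag = [matrix[i][i] for i in range(n)]
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    oa = sum(diag) / total
    # Chance agreement expected from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - expected) / (1 - expected)
    ua = [d / r for d, r in zip(diag, row_sums)]   # user's accuracy per class
    pa = [d / c for d, c in zip(diag, col_sums)]   # producer's accuracy per class
    return oa, kappa, ua, pa

# Counts copied from Table 4 (order: Res., Ind., Pub., Comm., Trans., Green).
table4 = [
    [27, 0, 1, 0, 1, 1],
    [1, 26, 1, 0, 0, 2],
    [2, 0, 27, 0, 0, 1],
    [0, 0, 1, 28, 0, 1],
    [2, 1, 1, 0, 26, 0],
    [1, 0, 1, 0, 0, 28],
]
oa, kappa, ua, pa = accuracy_metrics(table4)
print(round(oa, 2), round(kappa, 2))
```

With these counts, the OA, kappa, and UA values agree with Table 4. The recomputed PA for commercial and financial is 28/28 = 1.00 rather than the listed 0.97, so the printed counts or that PA entry may contain a small inconsistency.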

The qualitative analysis was validated using an interactive visualization approach with Gaode Map and Gaode Image. Table 5 shows some typical spatial units. The results indicate that the CCMF method could effectively identify UFRs in Beijing.

The validation results show that the Context-Aware Matrix Factorization model identifies UFRs accurately, although some errors remain. The main reasons for the erroneous results are explained below.

(1): One reason is the multi-functionality of regions. First, commercial and financial POIs are usually abundant near residential regions. Likewise, there are usually parks or squares near high-end neighborhoods. Such units may be misclassified, resulting in a much smaller number of identified residential areas than in the real data. Second, in Beijing, many subway stations contain large shopping malls, such that the purpose of a trip cannot be determined reliably from POI data and OD data.
(2): Another reason is that the area of an entity is not captured. POI data abstract spatial entities as area-free points, whereas in reality the area of an entity is an important factor in the identification of urban functional regions.

5.3.2. Accuracy Analysis of Single Data without Fusing

An experiment was designed to compare the identification accuracy between fused data and non-fused single data (POI and taxi OD trajectory data, separately). For the POI data, kernel density estimation was employed to detect the urban functional spatial areas, and DBSCAN clustering was used to determine their semantic functional type. For the taxi OD data, the DTW distance method and spectral clustering were used for UFR identification. For the fusing experiment, the method was our proposed CCMF, and the results are shown in the above subsection.
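The DTW distance used for the taxi OD data can be sketched with the classic dynamic-programming recurrence. The code below is a minimal illustration, assuming each spatial unit is described by a one-dimensional series such as hourly pick-up counts; the series values and the absolute-difference local cost are hypothetical choices, not taken from the paper.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical activity curves for three spatial units.
peak = [1, 5, 9, 5, 1, 1]
shifted_peak = [1, 1, 5, 9, 5, 1]   # same shape, one step later
flat = [4, 4, 4, 4, 4, 4]

print(dtw_distance(peak, shifted_peak), dtw_distance(peak, flat))
```

Because DTW tolerates shifts along the time axis, the two peaked curves are treated as identical, while the flat curve remains distant. Pairwise distances of this kind can then be converted into the similarity matrix that spectral clustering requires.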

The results are shown in Figure 7. Six typical areas, labeled with circles and numbered in Figure 7, were chosen for validation. Table 6 presents the comparison results and accuracy with the OA and kappa values. The ground truth was obtained from Gaode Map. The results show that the identification accuracy based on a single data source was lower than that based on fused data.

5.3.3. Accuracy Analysis of Different Clustering Methods

A comparison experiment was designed to verify the effectiveness of spectral clustering. The reference methods used were k-means and DBSCAN. The results are shown in Figure 8; six typical regions, labelled with circles and numbers, were selected for qualitative validation. The identification results and accuracy are presented in Table 7. The results indicate that the spectral clustering method has better performance than the other two methods.

6. Discussion

6.1. Identification Accuracy Comparison

(1) Comparison of the accuracy with non-fused single data.

As shown in Table 6, the accuracy was higher for fused data than non-fused single data. This was the expected conclusion, as the prepared fused data would lead to a better identification result.

Moreover, this can also be understood from the data characteristics. A single data source can only express one perspective of urban functional areas, such that the accuracy will be low. POI data are static data with rich semantic information, while OD data are dynamic and can delineate the daily activities of city residents. For POI data, each POI can only represent a point, and they do not consider the influence of the nearby area and the spatial scale (e.g., area and length). The same is true for OD data, which can reflect residents’ activities to a certain extent, though without semantic information. The fusion of these two types of data can not only enhance the comprehensiveness of data but also fill in missing values. Therefore, high accuracy can be achieved.

(2) Comparison of the accuracy with other clustering methods.

In terms of the clustering method principles, spectral clustering is a more suitable method for our data. K-means, DBSCAN, and spectral clustering are all unsupervised classification methods which are used to classify similar data into groups. The k-means algorithm has good clustering performance for data with a Gaussian distribution, so its requirements on the data are strict. DBSCAN is a density-based clustering method that, in contrast to the k-means method, supports clusters of arbitrary shape in dense data. However, if the data density is not uniform, the DBSCAN clustering result is poor. Spectral clustering focuses on the intrinsic similarity of the data instead of their spatial distribution. Therefore, it is less sensitive to non-Gaussian and unevenly distributed data.

For our work, the POI and taxi trajectory OD data are non-Gaussian, and are stored in matrix form. The uneven spatial distribution is also significant. The data density is higher in some commercial and financial and green square areas, while the data volume is smaller in some residential areas. Therefore, the k-means and DBSCAN clustering methods have lower recognition accuracy for our data. On the other hand, spectral clustering only requires a similarity matrix between data sources; thus, this was selected as the best approach for clustering regions.
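The point that spectral clustering needs only a similarity matrix can be shown with a two-way split. The sketch below is a simplification: it uses the unnormalized graph Laplacian and the sign of its Fiedler vector, found by power iteration, whereas a six-class result would need several eigenvectors and a final grouping step. The similarity values are hypothetical.

```python
def fiedler_partition(W, iters=500):
    """Two-way spectral split of a symmetric similarity matrix W.

    Uses the sign of the Fiedler vector of the unnormalized Laplacian L = D - W.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    shift = 2.0 * max(deg)            # upper bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]     # project out the constant eigenvector
        # Multiply by (shift * I - L); its dominant eigenvector in the
        # zero-mean subspace is the Fiedler vector of L.
        lv = [deg[i] * v[i] - sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [shift * v[i] - lv[i] for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return [1 if x >= 0 else 0 for x in v]

# Hypothetical similarities: units 0-2 resemble each other, as do units 3-5.
strong, weak = 0.9, 0.1
W = [[0.0 if i == j else (strong if (i < 3) == (j < 3) else weak) for j in range(6)]
     for i in range(6)]
labels = fiedler_partition(W)
print(labels)
```

No coordinates or density estimates enter the computation; only pairwise similarities are used, which is why uneven spatial density affects this approach less than it affects k-means or DBSCAN.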

6.2. Comparison with Other Fusion Methods

As in other studies [58,59], comparison with the accuracy reported for similar work is a widely used accuracy validation method. Therefore, we selected some similar studies, as shown in Table 8, according to case area and methodology.

Based on the assumption that these methods achieve similar accuracy with various types of data, we adopted their reported accuracies as comparison variables. Table 8 shows the selected methods and accuracy results. In the comparison experiments, the parameters of Formula (1) of our proposed method were set as α = 0.1, β = 0.5, and γ = 0.01. The authors of [13] proposed a decision fusion strategy within the Fifth Ring Road of Beijing (first row of Table 8), which is the same study area as in our work. However, our proposed method achieved higher accuracy. The authors of [44] proposed a CC-FLU model (second row of Table 8) based on the same principle as our proposed method, but the accuracy of our work was about 5% higher. The other three methods in Table 8 represent platform-based [33], feature-based [56], and decision-based [40] fusion methods. The results presented in Table 8 indicate that our method had a slightly lower OA value than the SOE learning framework-based method (90.0% versus 90.9%), but a higher kappa coefficient (0.88 versus 0.85), which indicates higher classification consistency. In this paper, the distribution of the number of functional regions is unbalanced: the number of transportation areas is much smaller than that of other functional areas, while the number of residential areas is significantly higher than that of other regions. Therefore, the OA value alone is not a good indication of the superiority of a method. In addition, the slightly lower OA value may be due to data processing. For example, when approaching the destination, some drivers operate the metering device in advance, artificially changing the vehicle’s occupied status to empty. Some of the taxi stops are also on the road and not in a specific area. As a result, the processing may produce OD points that do not match the real region where they are located. In the future, we will use different data and more advanced data processing methods in order to improve the recognition accuracy. Taken together, these results indicate the good performance of our proposed method.

6.3. Influence of the Parameters on the Accuracy

The advantage of our method is that it can incorporate POI semantic, interaction, and spatial unit similarity information. Parameters α and β balance the feature information from the OD interaction, POI, and OD time–frequency matrices. If α = 0 and β = 0, the method only mines the OD interaction data. If α = inf and β = 0, this indicates the fusion of the POI and OD interaction data. If α = inf and β = inf, the three types of data are fused by matrix factorization. Therefore, the values of α and β have a significant impact on the identification accuracy.

We used the control variable method and the iterative validation method to determine the parameters. α regulates the effect of Q and P on recognition, and β regulates the effect of Q and R on recognition, so the two parameters can be determined separately. First, β is set to 0, so that only the fusion of Q and P is studied; iterative validation shows that the fusion effect is best when α is 0.1. Then, α is fixed at 0.1 and the fusion of Q and R is studied in order to determine the optimal value of β. The order is exchangeable: it is likewise possible to determine β first and then α. A method based on neural network training may be able to determine both parameters simultaneously, which is a direction for our future research.

The confusion matrix for the calculation of the OA and kappa values was used to validate the proposed method. Figure 9 provides the accuracy trends with different values of these parameters. As shown in Figure 9a, the kappa and OA values increased with increasing α; when α surpassed 0.1, the accuracy decreased. Therefore, 0.1 was chosen as the value of α. Using this as a basis, we determined the value of β. Similarly, Figure 9b shows that OA and kappa increased and then decreased with increasing values of β; they reached the highest value when β = 0.5. Thus, α = 0.1 and β = 0.5 were chosen as the parameter values for the experiments in this paper.
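The control-variable procedure amounts to a one-parameter-at-a-time search. The sketch below is illustrative only: `evaluate` stands for a full run of fusion, clustering, and accuracy assessment, and `toy_score` is a hypothetical stand-in whose shape merely mimics a curve with a single peak, not the measured trends in Figure 9. The candidate grids are assumptions as well.

```python
def one_at_a_time_search(evaluate, alpha_grid, beta_grid):
    """Pick alpha with beta fixed at 0, then pick beta with alpha fixed at its best value."""
    best_alpha = max(alpha_grid, key=lambda a: evaluate(a, 0.0))
    best_beta = max(beta_grid, key=lambda b: evaluate(best_alpha, b))
    return best_alpha, best_beta

def toy_score(alpha, beta):
    """Hypothetical accuracy surface with a single peak (not the paper's data)."""
    return 0.9 - (alpha - 0.1) ** 2 - 0.1 * (beta - 0.5) ** 2

alpha_grid = [0.001, 0.01, 0.1, 1.0, 10.0]   # assumed candidate values
beta_grid = [0.1, 0.3, 0.5, 0.7, 0.9]        # assumed candidate values
best = one_at_a_time_search(toy_score, alpha_grid, beta_grid)
print(best)
```

This search evaluates len(alpha_grid) + len(beta_grid) settings instead of the full grid, at the cost of possibly missing an optimum that depends on interactions between the two parameters.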

7. Conclusions

In this paper, a context-coupled matrix factorization (CCMF) method was proposed to fuse data considering the contextual relationships between various data sources. The goal of our research was to provide a more accurate method for the delineation of urban functional regions. The principle of our method is the use of matrix factorization to identify UFRs. This idea has been widely used in recommender system research, but seldom in urban functional identification. The proposed method was validated by fusing POI and taxi OD data in Beijing, China. The results indicate superior performance in OA and kappa compared to single non-fused data sources, and a higher kappa than the other fusion methods compared. The CCMF method improved the accuracy by about 5% when compared to a similar method in the case area. However, our work focused on the methodology, and there is still room for improvement. For example, our next work will examine mixed functional regions with fine granularity.

Author Contributions

Conceptualization, Changfeng Jing and Yanru Hu; methodology, Yanru Hu and Hongyang Zhang; validation, Yanru Hu and Hongyang Zhang; formal analysis, Mingyi Du and Shishuo Xu; investigation, Changfeng Jing; data curation, Yanru Hu and Hongyang Zhang; writing—original draft preparation, Changfeng Jing and Yanru Hu; writing—review and editing, Xian Guo and Jie Jiang; supervision, Changfeng Jing; project administration, Changfeng Jing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Beijing Natural Science Foundation (grant #8222009); the Training Program for Talents by Xicheng, Beijing (grant #202137); and the Pyramid Talent Training Project of BUCEA (grant #JDJQ20200306).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Liu, Y.; Wang, F.; Xiao, Y.; Gao, S. Urban land uses and traffic “source-sink areas”: Evidence from GPS-enabled taxi data in Shanghai. Landsc. Urban Plan. 2012, 106, 73–87.
2. Hu, Y.; Han, Y. Identification of urban functional areas based on POI Data: A case study of the Guangzhou economic and technological development zone. Sustainability 2019, 11, 1385.
3. Frias-Martinez, V.; Frias-Martinez, E. Spectral clustering for sensing urban land use using Twitter activity. Eng. Appl. Artif. Intell. 2014, 35, 237–245.
4. Ge, P.; He, J.; Zhang, S.; Zhang, L.; She, J. An Integrated Framework Combining Multiple Human Activity Features for Land Use Classification. ISPRS Int. J. Geo-Inf. 2019, 8, 90.
5. Gao, Q.; Fu, J.; Yu, Y.; Tang, X. Identification of urban regions’ functions in Chengdu, China, based on vehicle trajectory data. PLoS ONE 2019, 14, e0215656.
6. Xiao, D.; Zhang, X.; Hu, Y. Urban Functional Area Identification Method Based on Mobile Big Data. Xitong Fangzhen Xuebao/J. Syst. Simul. 2019, 31, 2281–2288.
7. Ye, C.; Zhang, F.; Mu, L.; Gao, Y.; Liu, Y. Urban function recognition by integrating social media and street-level imagery. Environ. Plan. B Urban Anal. City Sci. 2021, 48, 1430–1444.
8. Yuan, J.; Zheng, Y.; Xie, X. Discovering regions of different functions in a city using human mobility and POIs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 186–194.
9. Yu, M.; Li, J.; Lv, Y.; Xing, H.; Wang, H. Functional Area Recognition and Use-Intensity Analysis Based on Multi-Source Data: A Case Study of Jinan, China. ISPRS Int. J. Geo-Inf. 2021, 10, 640.
10. Dai, P.; Jing, C.; Du, M.; Zhou, W. A method based on spatial analyst to detect hot spot of urban component management events. In Proceedings of the 2015 2nd IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM), Fuzhou, China, 8–10 July 2015; pp. 55–59.
11. Qian, Z.; Liu, X.; Tao, F.; Zhou, T. Identification of Urban Functional Areas by Coupling Satellite Images and Taxi GPS Trajectories. Remote Sens. 2020, 12, 2449.
12. Meng, T.; Jing, X.; Yan, Z.; Pedrycz, W. A survey on machine learning for data fusion. Inf. Fusion 2020, 57, 115–129.
13. Jia, Y.; Ge, Y.; Ling, F.; Guo, X.; Wang, J.; Wang, L.; Chen, Y.; Li, X. Urban land use mapping by combining remote sensing imagery and mobile phone positioning data. Remote Sens. 2018, 10, 446.
14. Xing, H.; Meng, Y. Integrating landscape metrics and socioeconomic features for urban functional region classification. Comput. Environ. Urban Syst. 2018, 72, 134–145.
15. Zhu, J.; Tao, C.; Lin, X.; Peng, J.; Huang, H.; Chen, L.; Wang, Q. A multiple subspaces-based model: Interpreting urban functional regions with big geospatial data. ISPRS Int. J. Geo-Inf. 2021, 10, 66.
16. Jing, C.; Dong, M.; Du, M.; Zhu, Y.; Fu, J. Fine-Grained Spatiotemporal Dynamics of Inbound Tourists Based on Geotagged Photos: A Case Study in Beijing, China. IEEE Access 2020, 8, 28735–28745.
17. Herold, M.; Liu, X.H.; Clarke, K.C. Spatial metrics and image texture for mapping urban land use. Photogramm. Eng. Remote Sens. 2003, 69, 991–1001.
18. Wang, Y.; Wang, T.; Tsou, M.H.; Li, H.; Jiang, W.; Guo, F. Mapping dynamic urban land use patterns with crowdsourced geo-tagged social media (Sina-Weibo) and commercial points of interest collections in Beijing, China. Sustainability 2016, 8, 1202.
19. Papadakis, E.; Gao, S.; Baryannis, G. Combining design patterns and topic modeling to discover regions that support particular functionality. ISPRS Int. J. Geo-Inf. 2019, 8, 385.
20. Zhang, X.; Du, S.; Wang, Q. Hierarchical semantic cognition for urban functional zones with VHR satellite images and POI data. ISPRS J. Photogramm. Remote Sens. 2017, 132, 170–184.
21. Cao, R.; Tu, W.; Yang, C.; Li, Q.; Liu, J.; Zhu, J.; Zhang, Q.; Li, Q.; Qiu, G. Deep learning-based remote and social sensing data fusion for urban region function recognition. ISPRS J. Photogramm. Remote Sens. 2020, 163, 82–97.
22. Bao, H.; Ming, D.; Guo, Y.; Zhang, K.; Zhou, K.; Du, S. DFCNN-based semantic recognition of urban functional zones by integrating remote sensing data and POI data. Remote Sens. 2020, 12, 1088.
23. Zhang, C.; Sargent, I.; Pan, X.; Li, H.; Gardiner, A.; Hare, J.; Atkinson, P.M. An object-based convolutional neural network (OCNN) for urban land use classification. Remote Sens. Environ. 2018, 216, 57–70.
24. Pei, T.; Sobolevsky, S.; Ratti, C.; Shaw, S.L.; Li, T.; Zhou, C. A new insight into land use classification based on aggregated mobile phone data. Int. J. Geogr. Inf. Sci. 2014, 28, 1988–2007.
25. Toole, J.L.; Ulm, M.; González, M.C.; Bauer, D. Inferring land use from mobile phone activity. In Proceedings of the ACM SIGKDD International Workshop on Urban Computing, Beijing, China, 12 August 2012; pp. 1–8.
26. Kang, C.; Shi, L.; Wang, F.; Liu, Y. How urban places are visited by social groups? Evidence from matrix factorization on mobile phone data. Trans. GIS 2020, 24, 1504–1525.
27. Kang, C.; Qin, K. Understanding operation behaviors of taxicabs in cities by matrix factorization. Comput. Environ. Urban Syst. 2016, 60, 79–88.
28. Zhang, P.; Li, T.; Wang, G.; Luo, C.; Chen, H.; Zhang, J.; Wang, D.; Yu, Z. Multi-source information fusion based on rough set theory: A review. Inf. Fusion 2021, 68, 85–117.
29. Bayoudh, K.; Knani, R.; Hamdaoui, F.; Mtibaa, A. A survey on deep multimodal learning for computer vision: Advances, trends, applications, and datasets. Vis. Comput. 2021, 1–32.
30. Esteban, J.; Starr, A.; Willetts, R.; Hannah, P.; Bryanston-Cross, P. A review of data fusion models and architectures: Towards engineering guidelines. Neural Comput. Appl. 2005, 14, 273–281.
31. Xiao, J.; Shen, Y.; Ge, J.; Tateishi, R.; Tang, C.; Liang, Y.; Huang, Z. Evaluating urban expansion and land use change in Shijiazhuang, China, by using GIS and remote sensing. Landsc. Urban Plan. 2006, 75, 69–80.
32. Assem, H.; Xu, L.; Buda, T.S.; O’Sullivan, D. Spatio-Temporal clustering approach for detecting functional regions in cities. In Proceedings of the 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), San Jose, CA, USA, 6–8 November 2016; pp. 370–377.
33. Song, J.; Lin, T.; Li, X.; Prishchepov, A.V. Mapping Urban Functional Zones by Integrating Very High Spatial Resolution Remote Sensing Imagery and Points of Interest: A Case Study of Xiamen, China. Remote Sens. 2018, 10, 1737.
34. Zheng, Y. Methodologies for Cross-Domain Data Fusion: An Overview. IEEE Trans. Big Data 2015, 1, 16–34.
35. Tu, W.; Hu, Z.; Li, L.; Cao, J.; Jiang, J.; Li, Q.; Li, Q. Portraying urban functional zones by coupling remote sensing imagery and human sensing data. Remote Sens. 2018, 10, 141.
36. Zhou, W.; Ming, D.; Lv, X.; Zhou, K.; Bao, H.; Hong, Z. SO–CNN based urban functional zone fine division with VHR remote sensing image. Remote Sens. Environ. 2020, 236, 111458.
37. Zhai, Y.; Yao, Y.; Guan, Q.; Liang, X.; Li, X.; Pan, Y.; Yue, H.; Yuan, Z.; Zhou, J. Simulating urban land use change by integrating a convolutional neural network with vector-based cellular automata. Int. J. Geogr. Inf. Sci. 2020, 34, 1475–1499.
38. Cao, K.; Guo, H.; Zhang, Y. Comparison of approaches for urban functional zones classification based on multi-source geospatial data: A case study in Yuzhong District, Chongqing, China. Sustainability 2019, 11, 660.
39. Yu, B.; Wang, Z.; Mu, H.; Sun, L.; Hu, F. Identification of Urban Functional Regions Based on Floating Car Track Data and POI Data. Sustainability 2019, 11, 6541.
40. Feng, Y.; Huang, Z.; Wang, Y.; Wan, L.; Liu, Y.; Zhang, Y.; Shan, X. An SOE-Based Learning Framework Using Multisource Big Data for Identifying Urban Functional Zones. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 7336–7348.
41. Srivastava, S.; Vargas-Muñoz, J.E.; Tuia, D. Understanding urban landuse from the above and ground perspectives: A deep learning, multimodal solution. Remote Sens. Environ. 2019, 228, 129–143.
42. Gauvin, L.; Panisson, A.; Cattuto, C. Detecting the community structure and activity patterns of temporal networks: A non-negative tensor factorization approach. PLoS ONE 2014, 9, e86028.
43. Zhang, W.; Ji, C.; Yu, H.; Zhao, Y.; Chai, Y. Interpersonal and intrapersonal variabilities in daily activity-travel patterns: A Networked spatiotemporal analysis. ISPRS Int. J. Geo-Inf. 2021, 10, 148.
44. Zhang, Y.; Li, Q.; Tu, W.; Mai, K.; Yao, Y.; Chen, Y. Functional urban land use recognition integrating multi-source geospatial data and cross-correlations. Comput. Environ. Urban Syst. 2019, 78, 101374.
45. Wang, J.; Ye, Y.; Fang, F. Study of Urban Functional Zoning Based on Nuclear Density and Fusion Data. Geogr. Geo-Inf. Sci. 2019, 35, 66–71.
46. Xue, B.; Zhao, B.; Xiao, X.; Li, J.; Xie, X.; Ren, W. A POI Data-Based Study on Urban Functional Areas of the Resources-based City: A Case Study of Benxi, Liaoning. Hum. Geogr. 2020, 35, 81–90.
47. Kim, K.; Lee, K. Handling points of interest (POIs) on a mobile web map service linked to indoor geospatial objects: A case study. ISPRS Int. J. Geo-Inf. 2018, 7, 216.
48. Yi, B.; Shen, X.; Liu, H.; Zhang, Z.; Zhang, W.; Liu, S.; Xiong, N. Deep Matrix Factorization with Implicit Feedback Embedding for Recommendation System. IEEE Trans. Ind. Inform. 2019, 15, 4591–4601.
49. Shokoohi-Yekta, M.; Hu, B.; Jin, H.; Wang, J.; Keogh, E. Generalizing DTW to the multi-dimensional case requires an adaptive approach. Data Min. Knowl. Discov. 2016, 31, 1–31.
50. Ma, H.; Yang, H.; Lyu, M.R.; King, I. SoRec: Social recommendation using probabilistic matrix factorization. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, Napa Valley, CA, USA, 26–30 October 2008; pp. 931–940.
51. Du, R.; Lu, J.; Cai, H. Double Regularization Matrix Factorization Recommendation Algorithm. IEEE Access 2019, 7, 139668–139677.
52. Fiedler, M. Algebraic connectivity of graphs. Czechoslov. Math. J. 1973, 23, 298–305.
53. Shi, J.; Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 888–905.
54. Zhou, M.; Lu, L.; Guo, H.; Weng, Q.; Cao, S.; Zhang, S.; Li, Q. Urban sprawl and changes in land-use efficiency in the Beijing–Tianjin–Hebei region, China from 2000 to 2020: A spatiotemporal analysis using earth observation data. Remote Sens. 2021, 13, 2850.
55. Sun, Z.; Jiao, H.; Wu, H.; Peng, Z.; Liu, L. Block2vec: An approach for identifying urban functional regions by integrating sentence embedding model and points of interest. ISPRS Int. J. Geo-Inf. 2021, 10, 339.
56. He, X.; Yuan, X.; Zhang, D.; Zhang, R.; Li, M.; Zhou, C. Delineation of Urban Agglomeration Boundary Based on Multisource Big Data Fusion—A Case Study of Guangdong–Hong Kong–Macao Greater Bay Area (GBA). Remote Sens. 2021, 13, 1801.
57. Liu, X.; Tian, Y.; Zhang, X.; Wan, Z. Identification of urban functional regions in Chengdu based on taxi trajectory time series data. ISPRS Int. J. Geo-Inf. 2020, 9, 158.
58. Du, S.; Du, S.; Liu, B.; Zhang, X.; Zheng, Z. Large-scale urban functional zone mapping by integrating remote sensing images and open social data. GIScience Remote Sens. 2020, 57, 411–430.
59. Zhong, Y.; Su, Y.; Wu, S.; Zheng, Z.; Zhao, J.; Ma, A.; Zhu, Q.; Ye, R.; Li, X.; Pellikka, P.; et al. Open-source data-driven urban land-use mapping integrating point-line-polygon semantic objects: A case study of Chinese cities. Remote Sens. Environ. 2020, 247, 111838.

Figure 1. Study area within the Fifth Ring Road of Beijing, China.

Figure 2. Flowchart for the identification of urban functional regions.

Figure 3. Context-coupled matrix factorization.

Figure 4. Undirected graph division of spectral clustering: the min cut and normalized cut.

Figure 5. Results of the data fusion.

Figure 6. Map of UFR identification results.

Figure 7. Identification results obtained using single data sources (i.e., without data fusion): (a) POI results and (b) taxi results; and (c) results obtained using fused data. (The circles denote the 6 typical regions used for qualitative validation. The results are shown in Table 6.)

Figure 8. Identification results of the clustering methods: (a) k-means, (b) DBSCAN, and (c) spectral clustering. (The circles denote the 6 typical regions used for qualitative validation. The results are shown in Table 7.)

Figure 9. Influence of the parameters on the overall accuracy and the kappa value: (a) parameter α; (b) parameter β.


Table 1. Vehicle trajectory data.

ID	Time	Lon	Lat	Speed	Direction	Status
577745	20181115001526	116.0343475	39.7697029	0	44	1
77451	20181115001524	116.2787704	39.9216766	0	142	1
…	…	…	…	…	…	…
164881	20181115001522	116.5705872	39.8961868	0	0	1
77420	20181115001527	116.4042664	39.9476395	60	88	1

Status indicates whether the taxi is carrying a passenger: 1 indicates that it is occupied; 0 indicates that it is empty.

Table 2. POI reclassification.

Category	Sub-Categories
Residential	Hotel, guesthouse, community, dormitory, house
Public service	Government, museum, hospital, school, library, post office
Commercial and financial	Movie theater, entertainment, bank, restaurant, supermarket, cafe
Industrial	Company, enterprise, building
Green square	Scenic spot, park, natural scenery
Transportation	Subway station, bus station, airport, railway station

Table 3. CP values of different clustering regions (clusters 1–6).

	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Cluster 5	Cluster 6
Residential	0.143812	0.177729	0.181027	0.202814	0.167457	0.187726
Public service	0.149206	0.193115	0.168945	0.170677	0.160588	0.173410
Commercial and financial	0.144601	0.163830	0.213594	0.196002	0.172417	0.198671
Industrial	0.177717	0.160707	0.140391	0.174287	0.175169	0.175980
Green square	0.219440	0.150754	0.078140	0.079752	0.170572	0.091786
Transportation	0.165223	0.174866	0.217904	0.176468	0.153796	0.181427

Table 4. Fused data identification confusion matrix with 30 spatial units.

		Res.	Ind.	Pub.	Comm.	Trans.	Green	UA
Classification Result	Res.	27	0	1	0	1	1	0.90
	Ind.	1	26	1	0	0	2	0.87
	Pub.	2	0	27	0	0	1	0.90
	Comm.	0	0	1	28	0	1	0.93
	Trans.	2	1	1	0	26	0	0.87
	Green	1	0	1	0	0	28	0.93
	PA	0.82	0.96	0.84	0.97	0.96	0.85

OA = 0.9, kappa = 0.88.

Table 5. Comparison of the functional area identification results.

Functional Area	Place	Identification Results	Gaode Image	Gaode Map
Green Square	Palace Museum (Lat: 39.9237871 Lon: 116.4034318)
Green Square	Temple of Heaven Park (Lat: 39.888243 Lon: 116.417246)
Public Service	Peking University (Lat: 39.998877 Lon: 116.316833)
Public Service	Great Hall of the People (Lat: 39.911394 Lon: 116.400238)
Transportation	Beijing Train Station (Lat: 39.909462 Lon: 116.433547)
Transportation	Beijingnan Railway Station (Lat: 39.871232 Lon: 116.384901)
Residential	Ganjiakou Community (Lat: 39.926322 Lon: 116.327689)
Residential	Fangcheng Yuan Community (Lat: 39.8733 Lon: 116.442967)
Industrial	Zhongguancun (Lat: 39.983633 Lon: 116.322892)
Industrial	Wangjing (Lat: 39.992024 Lon: 116.476471)
Commercial and Financial	Xidan (Lat: 39.91748 Lon: 116.381024)
Commercial and Financial	Xihongmen (Lat: 39.792674 Lon: 116.333765)

Table 6. Comparison of the results and accuracy in different spatial units.

Regions	POI		OD		POI + OD		Ground Truth
A	Res.		Pub.		Green		Green
B	Ind.		Comm.		Pub.		Pub.
C	Comm.		Comm.		Trans.		Trans.
D	Pub.		Green		Ind.		Ind.
E	Pub.		Trans.		Res.		Res.
F	Comm.		Green		Comm.		Comm.
Accuracy	OA	kappa	OA	kappa	OA	kappa
Accuracy	84.1%	0.82	82.6%	0.79	90.0%	0.88

Table 7. Comparison results and accuracy of the different clustering methods.

Region	K-Means		DBSCAN		Spectral Clustering		Ground Truth
A	Ind.		Green		Green		Green
B	Green		Res.		Pub.		Pub.
C	Ind.		Res.		Trans.		Trans.
D	Comm.		Comm.		Ind.		Ind.
E	Res.		Ind.		Res.		Res.
F	Comm.		Comm.		Comm.		Comm.
Accuracy	OA	kappa	OA	kappa	OA	kappa
Accuracy	84.4%	0.81	85.6%	0.82	90.0%	0.88

Table 8. Overall accuracy of the proposed method compared with other fusion methods.

Method	OA (%)	Kappa
Decision fusion strategy of RSI and MPPD [13]	83.5	NA
CC-FLU model [44]	85.1	0.81
Fusion method of POI and RSI [33]	78.5	0.75
Fusion using spatial metrics [56]	81.3	NA
SOE-based learning framework [40]	90.9	0.85
Our proposed method	90.0	0.88

NA denotes that the cited work did not report this metric.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

