Article
Peer-Review Record

A Self-Attention Model for Next Location Prediction Based on Semantic Mining

ISPRS Int. J. Geo-Inf. 2023, 12(10), 420; https://doi.org/10.3390/ijgi12100420
by Eric Hsueh-Chan Lu * and You-Ru Lin
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 4: Anonymous
Submission received: 23 August 2023 / Revised: 3 October 2023 / Accepted: 9 October 2023 / Published: 13 October 2023

Round 1

Reviewer 1 Report

A self-attention model for next location prediction based on semantic mining

 

This paper introduces a self-attention model for location prediction. The paper is well structured, and the study contributes to location prediction, which is important for geographers and urban planners. However, some parts of the paper need to be improved or clarified. My main point of confusion concerns the semantic understanding of POIs. Since the algorithm is titled as being based on “semantic mining”, I took semantic mining to be a main focus of this paper, but the sections on semantic mining need more clarification.

 

1.     Can you explain more about the POI dataset? I saw “poi.na” and “poi.ty”. What exactly do they mean? Page 14, line 467 mentioned “Rental data: POI data: ”. What does the “rental data” mean?

 

2.     Figure 9 POI categories also look confusing to me. What does “address” mean? Every POI has an address, so I am not sure what does this mean as a standalone POI category. Also not clear are: real estate, infrastructure, organization, domestic services.

 

3.     For the POI categorization: if a shopping mall contains a food court or restaurants, how is this reflected in the POI data? Is it categorized as a shopping mall or a restaurant? In this case, how does it affect the nearest neighborhood for the semantic mining?

 

4.     On page 6, line 254, you mentioned that the users’ locations were recorded every 1 to 5 seconds, which generates a massive number of stay points for each user. On page 7, you mentioned that you used a clustering method to generate clusters of stay points, but in later sections you switched to grids. I am confused by this part and am not sure which method you used to get the clusters, or whether it is based on point density in each grid.

 

Minor:

1.     Some in-line references were not generated correctly and show “Error” in the text.

2.     In Figure 15, the background map is hard to see. There seem to be some water bodies and green spaces, but the map is too light and I cannot tell the semantic background.

Some parts need more clarification.

Author Response

This paper introduces a self-attention model for location prediction. The paper is well structured, and the study contributes to location prediction, which is important for geographers and urban planners. However, some parts of the paper need to be improved or clarified. My main point of confusion concerns the semantic understanding of POIs. Since the algorithm is titled as being based on “semantic mining”, I took semantic mining to be a main focus of this paper, but the sections on semantic mining need more clarification.

Response: Thank you very much for recognizing our efforts and giving us many valuable comments and suggestions. Yes, “semantic mining” is a main focus of this paper. To convey our idea more clearly, we have carefully revised the paper and responded point by point to the questions below.

 

Point 1: Can you explain more about the POI dataset? I saw “poi.na” and “poi.ty”. What exactly do they mean? Page 14, line 467 mentioned “Rental data: POI data:”. What does the “rental data” mean?

Response: We apologize for these typos. As noted in Table 1, 'poi.ty' represents the type of a POI; we have removed the typos 'poi.na' and 'rental data' at lines 256 and 468, respectively.

 

Point 2: Figure 9 POI categories also look confusing to me. What does “address” mean? Every POI has an address, so I am not sure what does this mean as a standalone POI category. Also not clear are: real estate, infrastructure, organization, domestic services.

Response: Thanks for your question. The type of Points of Interest (POIs) was referenced from Tencent Maps. "Address" represents natural place names, road names, administrative place names, and similar categories. "Real estate" represents POIs related to residential areas, including residential zones, office buildings, and subsidiary facilities within residential complexes. "Infrastructure" includes fundamental facilities such as airports, train stations, subways, and public toilets. "Organization" includes public law organization, foreign organization, government organization, and scientific research organization. "Domestic services" represents POIs related to daily life activities, such as hair salons, nursing homes, pet services, and laundromats. We have added this information as a supplement in lines 472-474.

 

Point 3: For the POI categorization: if a shopping mall contains a food court or restaurants, how is this reflected in the POI data? Is it categorized as a shopping mall or a restaurant? In this case, how does it affect the nearest neighborhood for the semantic mining?

Response: This is a good question, and it touches on the core idea of our paper. In line 60, we mention that “semantic” is calculated based on distance and category. A stay point can represent either “shopping” or “restaurants”; we use a probability distribution to represent semantic features without losing other important semantic information.

 

Point 4: On page 6, line 254, you mentioned that the users’ locations were recorded every 1 to 5 seconds, which generates a massive number of stay points for each user. On page 7, you mentioned that you used a clustering method to generate clusters of stay points, but in later sections you switched to grids. I am confused by this part and am not sure which method you used to get the clusters, or whether it is based on point density in each grid.

Response: Thanks for your question. As mentioned in line 288, clustering is used to mine homes and workplaces. The stay-point sequence cannot be applied directly to mining the home and workplace because no two stay points have exactly the same latitude and longitude. To solve this problem, we apply a clustering algorithm to find where the stay points are concentrated. In the prediction phase, we adopt a common approach from related studies, which converts stay points into grids.
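The grid conversion mentioned in this response can be sketched as follows; the cell size and function name here are hypothetical illustrations, not the authors' actual implementation:

```python
import math

def to_grid_cell(lat, lon, cell_deg=0.01):
    """Map a (lat, lon) stay point to a discrete grid-cell id.

    cell_deg is a hypothetical cell size in degrees (~1.1 km in
    latitude); the paper's actual grid resolution may differ.
    """
    row = math.floor(lat / cell_deg)
    col = math.floor(lon / cell_deg)
    return (row, col)

# Two nearby stay points fall into the same cell even though their
# exact coordinates differ, which is why grid cells (unlike raw
# points) can serve as discrete prediction targets.
a = to_grid_cell(39.9812, 116.3064)
b = to_grid_cell(39.9815, 116.3069)
assert a == b
```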

 

Minor:

Point 1: Some in-line references were not generated correctly and show “Error” in the text.

Response: Thank you for bringing this to our attention. We have corrected this error.

 

Point 2: In Figure 15, the background map is hard to see. There seem to be some water bodies and green spaces, but the map is too light and I cannot tell the semantic background.

Response: Thank you for the reminder. We have updated the image in Figure 15, improving the visibility of the semantic background.

 

Reviewer 2 Report

The paper “A Self-Attention Model for Next Location Prediction Based on Semantic Mining”  introduces a method to enhance the accuracy of predicting a user's next location based on semantic information derived from trajectory data and contextual features.

Overall, the article is well structured and the writing is clear. Below are some comments to be addressed.

There are several missing reference numbers in the text.

At line 278 "We When we calculate", please revise.

At line 444 the authors should use "section" instead of "chapter".

From lines 461 to 465, the text needs rewriting. By “there are 41,080 grids”, do the authors mean grid cells?

Line 467 reads “Rental data: POI data:”. What is “rental data”? Please also fix the double colon.

In Section 5.3 the authors should better explain why SERM and MSSRM have been chosen for comparison with their model.

Section 5.4 would be more readable if distinctive letters were added to the four sub-figures and referred to in the text.

Author Response

The paper “A Self-Attention Model for Next Location Prediction Based on Semantic Mining” introduces a method to enhance the accuracy of predicting a user's next location based on semantic information derived from trajectory data and contextual features. Overall, the article is well structured and the writing is clear. Below are some comments to be addressed.

Response: Thank you very much for recognizing our efforts and giving us many valuable comments and suggestions. We have carefully revised the paper and responded point by point to the questions below.

 

Point 1: There are several missing reference numbers in the text.

Response: Thank you for pointing out the missing reference numbers; this has been corrected.

 

Point 2: At line 278 "We When we calculate", please revise.

Response: Thank you for pointing out the phrasing issue at line 278; we have revised it.

 

Point 3: At line 444 the authors should use "section" instead of "chapter".

Response: We acknowledge the suggestion to use "section" instead of "chapter" at line 444, and this adjustment has been made.

 

Point 4: From lines 461 to 465, the text needs rewriting. By “there are 41,080 grids”, do the authors mean grid cells?

Response: In response to the query about "grid cells" from line 461 to 465, we confirm this interpretation and have clarified the text accordingly.

 

Point 5: Line 467 reads “Rental data: POI data:”. What is “rental data”? Please also fix the double colon.

Response: We are sorry; this was a typo. The “rental data” at line 468 has been removed.

 

Point 6: In Section 5.3 the authors should better explain why SERM and MSSRM have been chosen for comparison with their model.

Response: The SERM and MSSRM models share similarities with our model's architecture. SERM jointly learns several embeddings (location, time, semantics, and user) and uses an LSTM to predict the next location; we are interested in understanding the performance of an LSTM-based model without semantic mining. MSSRM, on the other hand, jointly learns user, location, and time features and introduces Self-Attention to distinguish each location in different contexts, but it does not consider semantics. We therefore want to compare models with and without semantic features. We have supplemented this information in lines 611 to 614.

 

Point 7: Section 5.4 would be more readable if distinctive letters were added to the four sub-figures and referred to in the text.

Response: Thank you for your reminder. We have addressed this suggestion in Figure 15.

Reviewer 3 Report

1.      This research is informative and interesting.

2.      The manuscript is good, written according to correct scientific principles.

3.      Please relate the findings to previously reported studies!

4.      What is the novelty of this research? Please add substantiation of the contribution to the theory, and practice, then show the scientific novelty of the study conducted compared to the already known technique/approach, and why readers need to read the present study!

5.      Typo in line 399: “Error! Reference source not found”

6.      Because it involves statistics, i.e., probability, what distribution does the data follow?

Comments for author File: Comments.pdf

Author Response

Point 1: This research is informative and interesting.

Point 2: The manuscript is good, written according to correct scientific principles.

Point 3: Please relate the findings to previously reported studies!

Response: Thank you very much for recognizing our efforts and giving us many valuable comments and suggestions. In line 56, we mention that, to understand a user's semantic trajectory behavior, related studies match each trajectory stay point to the nearest Point of Interest (POI). However, location error may lead to incorrect semantic matching, which causes the model to learn wrong semantic behavior patterns. Motivated by this, we propose a Self-Attention model for Next Location Prediction based on Semantic Mining. Our model uses location, temporal, semantic, and user features, with a focus on how the semantic features are extracted. For the semantic features, each stay point of a trajectory is matched to its k nearest POIs; we then use the reciprocal of the distance from the stay point to each of the k nearest POIs, together with the number of categories, as weights. Finally, we use a probability distribution to express the semantic features without losing other important semantic information.
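The inverse-distance weighting described in this response can be sketched roughly as follows. The input format, category names, and distances are hypothetical, and the authors' actual weighting also incorporates category counts; this is only a simplified illustration of the idea:

```python
from collections import defaultdict

def semantic_feature(nearest_pois):
    """Turn the k nearest POIs of a stay point into a probability
    vector over categories, weighting each POI by 1/distance.

    nearest_pois: list of (category, distance_in_meters) pairs,
    a hypothetical input format for illustration.
    """
    weights = defaultdict(float)
    for category, dist in nearest_pois:
        weights[category] += 1.0 / dist
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

# A stay point near both a mall and its restaurants keeps both
# semantics instead of being forced into a single label.
feat = semantic_feature([("shopping", 20.0), ("restaurant", 40.0),
                         ("restaurant", 80.0)])
# closer POIs contribute more: 1/20 vs. 1/40 + 1/80
```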

 

Point 4: What is the novelty of this research? Please add substantiation of the contribution to the theory, and practice, then show the scientific novelty of the study conducted compared to the already known technique/approach, and why readers need to read the present study!

Response: Thanks for your question. We designed semantic matching to effectively extract semantic features for each stay point. Subsequently, we integrated semantic matching with sequence pattern mining to enrich the semantic features. Our experiments were conducted using the Geolife dataset. According to the experimental results, our approach outperformed state-of-the-art methods.

 

Point 5: Typo in line 399, “Error! Reference source not found”

Response: Thank you for the reminder. We have addressed and rectified the error.

 

Point 6: Because it involves statistics i.e. probability, what distribution is the data used?

Response: We are sorry that we are not sure which data you are referring to. If you are asking about the POI data, based on the distribution of POI data, we consider that it should follow an exponential distribution.

Reviewer 4 Report

Managing a large number of semantic categories or characteristics might complicate semantic matching. The semantic representations of various types of stay points may differ. How do you handle such issues?

Stay points may have ambiguous semantics. It is challenging to assign a single semantic label accurately.

Mixing semantic matching and sequential pattern mining can produce complicated patterns and relationships, making effective interpretation and understanding of the underlying semantics challenging.

In some circumstances, semantic information for stay points may be minimal, making it difficult to capture the semantics properly and generate meaningful patterns. What approach do you employ then?

When dealing with location prediction, the sheer size of the model and the computing cost might become a serious concern, especially in real-time or large-scale applications.

Line 138 The semantic information about the places is not captured by one-hot encoding. It sees all sites as being equally distinct from one another, regardless of their physical closeness or similarities. 

Line 249 It is critical to guarantee that data from many sources (user, geographical, temporal, and semantic) are appropriately and consistently aligned. Model mistakes and inconsistencies can result from mismatched or misaligned data.

It might be difficult to determine which qualities are most relevant and useful for a certain activity. To prevent overloading the model with unnecessary or duplicate information, feature selection approaches may be required.

Line 286: OPTICS necessitates the establishment of precise parameters, such as the number of points required for a cluster (MinPts) and a distance threshold (epsilon). Selecting proper parameter values can be difficult and may need subject knowledge.

The embeddings for new or unknown nodes are not precomputed, so Node2Vec may struggle to handle scenarios where new or unknown nodes appear during inference. How do you handle the "cold start" problem?

Time2Vec is especially developed to detect periodicity in time series data. Time2Vec may not be the ideal solution if your data has complicated temporal patterns that go beyond periodicity.

Author Response

Point 1: Managing a large number of semantic categories or characteristics might complicate semantic matching. The semantic representations of various types of stay points may differ. How do you handle such issues?

Stay points may have ambiguous semantics. It is challenging to assign a single semantic label accurately.

Mixing semantic matching and sequential pattern mining can produce complicated patterns and relationships, making effective interpretation and understanding of the underlying semantics challenging.

In some circumstances, semantic information for stay points may be minimal, making it difficult to capture the semantics properly and generate meaningful patterns. What approach do you employ then?

When dealing with location prediction, the sheer size of the model and the computing cost might become a serious concern, especially in real-time or large-scale applications.

Response: Thank you very much for your comments. We fully agree that it is challenging to assign a single accurate semantic label to a stay point. Hence, we decided to assign multiple semantic labels, each with a probability, to a stay point. In certain cases, stay points may carry limited semantic information, which requires sufficient data to compensate. To capture the semantics of various types of stay points effectively, we designed a Semantic Matching approach for semantic feature extraction: we use probabilities to represent the semantics of a stay point without losing other important semantic information.

 

Point 2: Line 138 The semantic information about the places is not captured by one-hot encoding. It sees all sites as being equally distinct from one another, regardless of their physical closeness or similarities. 

Response: Thanks for your comment. We consider that using one-hot encoding cannot effectively reflect physical closeness or similarities. In our proposed Semantic Matching, we use probabilities to represent semantic behaviors. Therefore, it can reflect semantic similarity, i.e., vector similarity.
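The advantage of probability vectors over one-hot encoding described here can be illustrated with a small sketch; the category order and probability values are made up for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# One-hot encoding: any two distinct locations are orthogonal,
# so similarity is always zero regardless of semantics.
mall, cafe = [1, 0, 0], [0, 1, 0]
assert cosine(mall, cafe) == 0.0

# Probability vectors over (shopping, food, transport): stay points
# with overlapping semantics now have a high vector similarity.
p1 = [0.6, 0.4, 0.0]  # e.g. a mall with a food court (made-up values)
p2 = [0.5, 0.5, 0.0]  # e.g. restaurants in a shopping district
assert cosine(p1, p2) > 0.9
```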

 

Point 3: Line 249 It is critical to guarantee that data from many sources (user, geographical, temporal, and semantic) are appropriately and consistently aligned. Model mistakes and inconsistencies can result from mismatched or misaligned data.

Response 3: It is a good point. We agree that data mismatch or inconsistency can potentially lead to model errors and inconsistencies. To mitigate this concern, we integrated a set of features including user features, location features, time features, and semantic features. The first three features were derived from trajectory data, while the last feature (semantic feature) was obtained by matching stay points with the k-nearest POIs. By utilizing these two primary sources of data, we aimed to minimize the risk of data mismatch issues.

 

Point 4:  It might be difficult to determine which qualities are most relevant and useful for a certain activity. To prevent overloading the model with unnecessary or duplicate information, feature selection approaches may be required.

Response 4: Feature selection is indeed a crucial aspect of deep learning. To ensure that each feature contributes to accuracy, we conducted extensive experiments specifically focusing on these features to validate their impact on accuracy.

 

Point 5: Line 286: OPTICS necessitates the establishment of precise parameters, such as the number of points required for a cluster (MinPts) and a distance threshold (epsilon). Selecting proper parameter values can be difficult and may need subject knowledge.

Response 5: It is true that OPTICS requires precise parameters. Our parameter settings are based on our own testing and reference [22]. Selecting suitable values for MinPts and the distance threshold (epsilon) can be challenging; parameter tuning is critical for OPTICS, and this issue will be incorporated into our future work (line 699).

 

Point 6: The embeddings for new or unknown nodes are not precomputed, so Node2Vec may struggle to handle scenarios where new or unknown nodes appear during inference. How do you handle the "cold start" problem?

Response 6: Thank you for your insightful comment on the "cold start" problem. In our study, we have not yet addressed it. If new or unknown nodes are added to the location prediction system, Node2Vec requires re-computation for every new set of unknown nodes. We fully agree that the "cold start" problem is a critical issue, and it will be considered in our future work (line 701).

 

Point 7: Time2Vec is especially developed to detect periodicity in time series data. Time2Vec may not be the ideal solution if your data has complicated temporal patterns that go beyond periodicity.

Response 7: Again, this is an insightful comment; thank you. Time2Vec is designed to capture periodicity in time-series data. If the data exhibits complex temporal patterns beyond periodicity, we consider that the model may require a large amount of data to learn those patterns.
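For reference, a minimal sketch of the Time2Vec encoding discussed here, using fixed illustrative parameters rather than learned ones: the first component is linear, and the remaining components are periodic sine terms.

```python
import math

def time2vec(tau, omega, phi):
    """Time2Vec encoding of a scalar time value tau.

    omega, phi: frequency and phase parameters (learned in the
    actual model; fixed here with illustrative values).
    """
    out = [omega[0] * tau + phi[0]]  # linear component
    out += [math.sin(w * tau + p)    # periodic components
            for w, p in zip(omega[1:], phi[1:])]
    return out

# With a frequency of 2*pi/24 (time measured in hours), the sine
# component repeats every 24 hours, so 9:00 today and 9:00 tomorrow
# get the same periodic encoding while the linear term still differs.
omega = [0.1, 2 * math.pi / 24]
phi = [0.0, 0.0]
v_9am = time2vec(9, omega, phi)
v_9am_next_day = time2vec(33, omega, phi)
assert abs(v_9am[1] - v_9am_next_day[1]) < 1e-9
```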
