Article

Privacy-Preserving Method for Trajectory Data Publication Based on Local Preferential Anonymity

1 School of Computer and Information, Anhui Normal University, Wuhu 241003, China
2 Anhui Provincial Key Laboratory of Network and Information Security, Wuhu 241003, China
* Author to whom correspondence should be addressed.
Information 2023, 14(3), 157; https://doi.org/10.3390/info14030157
Submission received: 15 December 2022 / Revised: 9 February 2023 / Accepted: 27 February 2023 / Published: 2 March 2023

Abstract

With the rapid development of mobile positioning technologies, location-based services (LBSs) have become more widely used. The amount of user location information collected and applied has increased, and if these datasets are directly released, attackers may infer other unknown locations through partial background knowledge in their possession. To solve this problem, a privacy-preserving method for trajectory data publication based on local preferential anonymity (LPA) is proposed. First, the method considers suppression, splitting, and dummy trajectory adding as candidate techniques. Second, a local preferential (LP) function based on the analysis of location loss and anonymity gain is designed to effectively select an anonymity technique for each anonymous operation. Theoretical analysis and experimental results show that the proposed method can effectively protect the privacy of trajectory data and improve the utility of anonymous datasets.

1. Introduction

With the rapid development of modern device awareness systems and mobile positioning technologies, such as the global positioning system (GPS) and radio frequency identification (RFID), location-based services (LBSs) are becoming increasingly widely used [1,2]. While enjoying convenient services, users leave behind location records, which constitute trajectory data. Many trajectories are released and used for various applications, such as urban planning, advertisement placement, and shop bidding [3,4,5]. However, the direct publication and application of trajectory data may lead to the leakage of users' information [6,7]. Attackers can infer other unknown locations or identify a user's full trajectory based on background knowledge they own [8].
For example, an electronic card company provides payment services for two chain stores A and B and stores user transaction records. Chain stores A and B store the transaction records of users consuming at their stores, and transaction records generated at physical locations may contain important spatiotemporal information (e.g., the user was in a specific location at a certain time). After preprocessing, the transaction records of some chain stores can be presented as the trajectories in Table 1. As shown in Table 1, assume that Alice generates the record $a_5 b_4 a_1$, which represents that she consumes at stores $a_1$ and $a_5$ of A and store $b_4$ of B. Considering a publication of the transaction records by the company for analysis purposes, A can be regarded as an attacker who owns and tracks all the locations $a_i$; he owns the partial record $a_5 a_1$ of Alice and can infer that Alice has visited $b_4$ in addition to $a_5$ and $a_1$. This is because only the record $t_1$ goes through $a_5$ and then through $a_1$, so it can be determined that the complete record of Alice is $t_1$. In this case, the user's consumption preferences and other information may be leaked, resulting in a breach of user privacy.
To achieve the safe publication of trajectory data in the above scenarios, existing works usually anonymize the trajectory dataset before publication [8,9,10,11]. Specifically, when the background knowledge mastered by attackers is known, the data holder processes the trajectory dataset to be published to prevent the attackers from inferring user information. Under the premise of meeting users' privacy requirements, such methods continuously process trajectory sequences to reduce the probability of unknown location information being inferred. This data processing can be divided into two steps. The first step is to identify the privacy threats; in other words, all the problematic sequences in the dataset should be found. The second step is to eliminate those privacy threats by processing some trajectory sequences. The algorithms [8] based on suppression and splitting can effectively achieve privacy preservation, but the time consumed identifying privacy threats is relatively large, and the utility of the anonymous dataset after suppression is limited. Two tree-based structures [9,10] were proposed to optimize time performance. However, like the above methods, they are also implemented based on suppression, and due to the implementation characteristics of suppression, a large amount of information loss often occurs.
The above methods can effectively achieve safe publication of trajectory data while meeting users’ privacy requirements. However, the methods are implemented based on suppression to eliminate privacy threats, which may lead to large information loss [9,10,11]. Specifically, the global suppression and local suppression algorithms are implemented to calculate the anonymity gain value and then delete some specific locations directly. Therefore, anonymous datasets usually lose many locations, and the number of lost locations gradually increases with an increasing dataset size. This characteristic may result in the excessive loss of information in the anonymous dataset, which is not conducive to analysis and application of the data after publication.
To address the above problem, in other words, to realize the safe publication of trajectory data and improve the utility of trajectory data after publication, we propose a privacy-preserving method for trajectory data publication based on local preferential anonymity (LPA). We adopt a tree-based storage structure [10] and a simplified calculation method to extract problematic nodes; consider suppression, splitting, and dummy trajectory adding as candidate techniques; and select a specific technique based on the analysis of location loss and anonymity gain.
The main contributions of this paper are as follows:
(1)
We propose a privacy-preserving method based on LPA, which implements suppression, splitting, and dummy trajectory adding based on the trajectory projection tree.
(2)
We design a local preferential (LP) function based on the analysis of location loss and anonymity gain to select the final technique. Selecting an appropriate technique for problematic nodes can effectively achieve privacy preservation and reduce information loss in the process of privacy preservation.
(3)
We conduct theoretical analysis and a set of experiments to show the feasibility of privacy preservation. Experimental results show that compared with existing methods, the LPA algorithm can effectively achieve privacy preservation while reducing the information loss of anonymous datasets.

2. Related Work

When trajectory datasets are published, user privacy attracts increasing attention. Researchers have done much work on the privacy preservation of trajectory data and have proposed many privacy-preserving methods [1,7]. In this section, we survey the existing work in this field. The methods for anonymizing locations can be divided into the following categories.

2.1. Dummy Trajectory

The method based on dummy trajectories adds synthetic trajectories according to the characteristics of the original trajectories to reduce the probability that a real trajectory is inferred. Luper et al. [12] applied the dummy trajectory technique to trajectory data publication. Effectively generating dummy trajectories is a main problem. On the one hand, a dummy trajectory should be similar to the motion pattern of the real trajectory. The trajectory synthesis method proposed by Lei et al. [13] is widely used: when synthesizing spatiotemporal trajectory data, some spatiotemporal points are selected from the real trajectory and rotated around a centre point. Wu et al. [14] considered the gravity movement model to generate dummy trajectories and proposed a method to protect trajectory data in continuous LBS and trajectory publication. Considering the spatiotemporal constraints of the users' geographical environment, Lu et al. [15] proposed a trajectory privacy protection method based on dummy trajectories; this method hides the real trajectories among dummy trajectories and can effectively protect user privacy. On the other hand, it is necessary to avoid adding too many dummy trajectories.
However, existing works rarely apply dummy trajectories to privacy protection for location sequences, as our work does. The method proposed in this paper applies to environments where the attackers have already mastered some background knowledge, and the dummy trajectory is used to reduce the availability of that background knowledge.

2.2. Clustering and Partition

The method based on clustering and partitioning divides the original trajectory dataset into small groups and then performs anonymization in each group to make the trajectories within each group indistinguishable. These methods are mostly used for spatiotemporal trajectories. Samarati [16] and Sweeney [17] implemented privacy protection based on k-anonymity, which processes the data so that each record has exactly the same quasi-identifier attribute values as at least k−1 other records in the same group. Abul et al. [18] proposed (k, δ)-anonymity based on k-anonymity, proposed the never walk alone (NWA) algorithm, and achieved k-anonymity through trajectory clustering. Domingo-Ferrer et al. [19,20] pointed out that the NWA algorithm may still leak privacy, and they proposed the SwapLocations algorithm to classify trajectories by general microaggregation and then replace the locations in the clusters; this method may also lose much information. Dong et al. [21] proposed a trajectory privacy preservation method based on frequent paths, which applied the concept of frequent paths to trajectory privacy preservation for the first time. They first found frequent paths to divide candidate groups and then selected representative trajectories in the candidate groups. This method can effectively achieve privacy protection but does not consider the different privacy requirements of users. Considering users' privacy requirements, Kopanaki et al. [22] proposed personalized privacy parameters based on the research of Domingo-Ferrer et al. [19] and proposed trajectory division in the preprocessing stage. To further improve the utility of anonymous datasets, Chen et al. [23] considered multiple factors of spatiotemporal trajectories and proposed a method of privacy preservation for trajectory data publication based on 3D cell division to mitigate excessive information loss. The method divides cells in the trajectory preprocessing stage and performs suppression or location exchange within each cell for anonymity, which improves data utility.
Unlike our work, the above methods are often used for spatiotemporal trajectory data publication, and the partitioning of trajectories is implemented in the preprocessing stage. The advantage of trajectory division is that no locations are lost. Terrovitis et al. [8] proposed a method of splitting trajectories: the trajectories are split in the anonymization stage, and a trajectory may be split into two trajectories, which can achieve effective anonymity and reduce the loss of locations. However, splitting alone may not work very well for trajectory anonymity, because splitting trajectories may affect the behavioral patterns of the users.

2.3. Generalization and Suppression

The privacy-preserving methods for location sequences are mostly based on generalization and suppression. Generalization is utilized to replace the content of information with a more ambiguous concept and generalize the locations on the trajectory to an enlarged area, which causes the locations in this area to be indistinguishable. Suppression selectively deletes specific sensitive locations before releasing trajectory data to achieve trajectory privacy preservation. Nergiz et al. [24] first proposed a generalization-based privacy preservation method for trajectory data publication, which introduced k-anonymity into the generalization-based method to achieve anonymity. Then, Yarovoy et al. [25] proposed a spatial generalization-based k-anonymity approach to anonymously publish moving object databases. Terrovitis et al. [26] defined a km-anonymity model, which is a new version of k-anonymity and is mainly used to protect privacy in the publication of set-valued data. To effectively protect data utility, Poulis et al. [27] applied km-anonymity to trajectory data and developed three anonymity algorithms based on a priori principles. Terrovitis et al. [28] proposed a method to suppress partial location information to achieve privacy preservation. To solve the problem of inferring unknown locations through partial information by attackers, they used partial trajectory knowledge as the quasi-identifier of the remaining locations and transformed the trajectory database into a safe counterpart.
For the above problem, Terrovitis et al. [8] continued applying suppression and splitting to develop four algorithms. Among them, they proposed two algorithms based on suppression, $G_{SUP}$ and $L_{SUP}$, which delete specific locations according to different selection methods. The proposed algorithms can effectively achieve privacy protection, but there are still some problems in the processing. One is the linear storage of the original trajectories and the idea of the greedy algorithm, which result in a large amount of computation time. The other is the large information loss of the anonymous dataset. Lin et al. [10] proposed four algorithms based on suppression, which were implemented on tree structures. They proposed a method of anonymity gain measurement, which was optimized in terms of computation time. Although the computation time is significantly improved, the information loss is basically the same as that of the algorithms in [8]. In addition, the global suppression algorithm based on the tree structure cannot perform the correct update operation in some cases.
To reduce information loss of the anonymous dataset, we adopt a tree-based storage structure [10] to store the original data and implement suppression, splitting, and dummy trajectory adding based on the tree structure, which transforms processing of the privacy problems into problematic nodes. We consider suppression, splitting, and dummy trajectory adding as candidate techniques and design an LP function based on location loss and anonymity gain to select the final anonymity technique.

3. Preliminaries

To address the problem of privacy preservation for trajectory data publication, some specific locations in the dataset to be published should be processed, and a corresponding safe dataset with minimal information loss and good privacy preservation will be obtained. In this section, we introduce some basic definitions and formulas that will be used throughout this paper, and the storage structure used in our method will also be described.

3.1. Problem Definition

We propose a privacy protection method for trajectory data publication in this paper. It should be noted that the trajectories studied in our work are semantic trajectories and are unidirectional; in other words, there are no repeated locations in a trajectory. The relevant definitions of the trajectory data are introduced as follows.
Let $l_i$ be a location of special interest on a map (e.g., a hospital, bank, or store). Then, $L$ is a set of locations denoted by $L = \{l_1, l_2, \ldots, l_n\}$.
Definition 1 (Trajectory).
A trajectory $t$ is defined as an ordered list of locations, $t = l_1 l_2 \cdots l_n$, where $l_i \in L$, $1 \le i \le n$. The length of a trajectory $t$ is denoted as $|t|$ and represents the number of locations in $t$. A set of $m$ trajectories is denoted as $T = \{t_1, t_2, \ldots, t_m\}$.
Example 1.
Table 1 shows eight trajectories, which are defined on the set of locations $L = \{a_1, a_2, a_3, a_4, a_5, b_1, b_2, b_3, b_4\}$, and $T = \{t_1, t_2, t_3, t_4, t_5, t_6, t_7, t_8\}$. The length of trajectory $t_1 = a_5 b_4 a_1$ is $|t_1| = 3$.
Definition 2 (Subtrajectory).
Given two trajectories defined on the set of locations $L$, denoted as $t_1 = l_1^1 l_2^1 \cdots l_m^1$ and $t_2 = l_1^2 l_2^2 \cdots l_k^2$, we say that $t_1$ is a subtrajectory of $t_2$, denoted as $t_1 \preceq t_2$, if and only if $|t_1| < |t_2|$ and there is a mapping $f$ such that $l_1^1 = l_{f(1)}^2, l_2^1 = l_{f(2)}^2, \ldots, l_m^1 = l_{f(m)}^2$ with $f(1) < f(2) < \cdots < f(m)$.
Example 2.
For trajectory $t_1 = a_5 b_4 a_1$ in Table 1, the trajectories $a_5$, $b_4$, $a_1$, $a_5 b_4$, $b_4 a_1$, and $a_5 a_1$ are the subtrajectories of $t_1$.
To achieve the safe publication of trajectory data, we anonymize the trajectory dataset before publication [8,9,10,11]. Therefore, we need to model the partial trajectory information owned by the potential attackers Adv, which is called the background knowledge.
Definition 3 (Background Knowledge).
For a set of trajectories $T$ defined on $L$, an attacker A in Adv owns a set of locations $L_A \subseteq L$, and $L_A$ is called the background knowledge owned by A. Attacker A can track any user visiting the locations in $L_A$.
In this paper, we assume that every attacker owns background knowledge and that the background knowledge mastered by any two attackers is not shared. In other words, for any attackers A, B $\in$ Adv with A $\neq$ B, $L_A \cap L_B = \emptyset$. If our method is applied to a situation where attackers share background knowledge, the trajectory data publisher needs to regroup the attackers according to disjoint background knowledge.
Example 3.
In Table 1, attackers A and B own the sets of locations $L_A = \{a_1, a_2, a_3, a_4, a_5\}$ and $L_B = \{b_1, b_2, b_3, b_4\}$, respectively, and $L_A \cap L_B = \emptyset$.
Definition 4 (Trajectory Projection).
Let A be an attacker owning $L_A$, and let $t = l_1 l_2 \cdots l_n$ be a trajectory. The trajectory projection of attacker A is denoted as $PT_A(t)$ and is a subtrajectory of trajectory $t$; all locations in $PT_A(t)$ belong to both $L_A$ and trajectory $t$.
The set of trajectory projections of all trajectories $t \in T$ with respect to attacker A is denoted as $T_A = \{PT_A(t) : t \in T\}$.
Example 4.
In Table 1, the trajectory projections of $t_1$ to attackers A and B are $PT_A(t_1) = a_5 a_1$ and $PT_B(t_1) = b_4$, respectively. The trajectory projection set of all trajectories $t \in T$ is $T_A$, as shown in Table 2.
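To make the projection operation concrete, the following is a minimal Python sketch of Definition 4 (the function name and the list-of-labels data layout are our own illustration, not the paper's implementation):

```python
def project(trajectory, attacker_locations):
    """Trajectory projection PT_A(t): the subsequence of t whose
    locations belong to the attacker's background knowledge L_A."""
    return [loc for loc in trajectory if loc in attacker_locations]

# Example 4: t1 = a5 b4 a1; A owns {a1..a5}, B owns {b1..b4}
t1 = ["a5", "b4", "a1"]
print(project(t1, {"a1", "a2", "a3", "a4", "a5"}))  # ['a5', 'a1'] = PT_A(t1)
print(project(t1, {"b1", "b2", "b3", "b4"}))        # ['b4']       = PT_B(t1)
```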
The attacker A may be able to infer the trajectories that contain the trajectory projection; we say that the set of these trajectories is a trajectory projection support set.
Definition 5 (Trajectory Projection Support Set).
Let $T^*$ be a trajectory dataset to be published, and let $t_A$ be a trajectory in $T_A$. If $t_A = PT_A(t)$, we say that $t$ supports the trajectory projection $t_A$. The trajectory projection support set of $t_A$ in $T_A$ with respect to $T^*$ is denoted as $S_{T^*}(t_A) = \{t \in T^* : PT_A(t) = t_A\}$. $S_{T^*}(t_A)$ is a set of trajectories, each of which supports the trajectory projection $t_A$.
Example 5.
In Table 1, if $t_A = a_3$, the trajectory projection support set of $a_3$ is $S_{T^*}(a_3) = \{t_2, t_8\}$. This also means that attacker A may be able to infer that $a_3$ is the trajectory projection of either $t_2$ or $t_8$.
In these scenarios, A can also infer that $a_3$ may be associated with $b_4$. Let $S_{T^*}(t_A, l)$ be the set of trajectories in $S_{T^*}(t_A)$ that contain a location $l \notin L_A$, denoted as $S_{T^*}(t_A, l) = \{t : t \in S_{T^*}(t_A) \wedge l \in t\}$.
Definition 6 (Inference Probability).
The inference probability is denoted by $Pr(t_A, l)$, which represents the probability of associating a location $l \notin L_A$ with the trajectory projection $t_A$, and is calculated as Equation (1):
$$Pr(t_A, l) = \frac{|S_{T^*}(t_A, l)|}{|S_{T^*}(t_A)|} \qquad (1)$$
Example 6.
Following Example 5, $S_{T^*}(a_3, b_4) = \{t_2\}$. The inference probability of associating $b_4$ with the trajectory projection $a_3$ is $Pr(a_3, b_4) = \frac{|S_{T^*}(a_3, b_4)|}{|S_{T^*}(a_3)|} = \frac{|\{t_2\}|}{|\{t_2, t_8\}|} = \frac{1}{2}$.
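Definitions 5 and 6 can be computed directly from the projections; a minimal Python sketch follows (the stand-in contents of $t_2$ and $t_8$ are hypothetical, chosen only to reproduce the ratio of Example 6):

```python
def project(t, L_atk):
    # projection PT_A(t) onto an attacker's location set (Definition 4)
    return [loc for loc in t if loc in L_atk]

def support_set(T, t_A, L_A):
    """S_{T*}(t_A): all trajectories whose projection onto L_A equals t_A."""
    return [t for t in T if project(t, L_A) == t_A]

def inference_probability(T, t_A, l, L_A):
    """Pr(t_A, l) = |S_{T*}(t_A, l)| / |S_{T*}(t_A)|  (Equation (1))."""
    support = support_set(T, t_A, L_A)
    return len([t for t in support if l in t]) / len(support)

# Hypothetical stand-ins for t2 and t8: both project to a3 for attacker A,
# and only t2 contains b4, so the probability is 1/2 as in Example 6.
L_A = {"a1", "a2", "a3", "a4", "a5"}
T = [["a3", "b4"], ["a3", "b1"]]
print(inference_probability(T, ["a3"], "b4", L_A))  # 0.5
```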
Definition 7 (Problematic Pair).
Let $P_t$ be a probability threshold defined by the user, which represents the user's privacy requirements. If $Pr(t_A, l) > P_t$, then the pair $(t_A, l)$ is a problematic pair, and $t_A$ is defined as problematic. Moreover, the number of problems of $(t_A, l)$ is $|S_{T^*}(t_A, l)|$, and the total number of problems is denoted as N.
A trajectory dataset is called unsafe if it has at least one problematic pair. The aim of our work is to eliminate all problems.
Definition 8 (Problem Definition).
Given a trajectory dataset T, a user's privacy requirements threshold $P_t$, and a set of potential attackers Adv, construct a corresponding safe dataset $T^*$ of T with minimum information loss.

3.2. Trajectory Projection Tree

In the existing trajectory privacy preservation methods, some storage structures were proposed in the trajectory preprocessing stage, such as linear structures [8], graph structures [21,29] and tree structures [9,10]. In this paper, we choose the tree structure (TP-tree) [10] to store the original trajectory dataset T. The reason is that the tree-based storage structure can simplify the calculation of inference probability and reduce the computational cost.
Example 7.
With Table 1 as the original trajectory dataset T and the attacker set $Adv = \{A, B\}$ as the inputs, let us execute the TP-treeConstruction algorithm [10] to construct a TP-tree corresponding to T. The TP-tree of Table 1 is shown in Figure 1. The details of the TP-tree are introduced as follows:
Firstly, obtain the trajectory projections of the different attackers; the trajectory projections are inserted as 2nd-level nodes. For trajectory $t_1 = a_5 b_4 a_1$, the trajectory projections of A and B are $PT_A(t_1) = a_5 a_1$ and $PT_B(t_1) = b_4$, respectively. The 2nd-level nodes $(a_5 a_1 : t_1, 13)$ and $(b_4 : t_1, 2)$ should be inserted into the TP-tree.
Secondly, for each attacker's trajectory projection, all locations in the trajectory but not in the trajectory projection are the locations that may be inferred. For the tree node $PT_A(t_1) = a_5 a_1$, the location $b_4$ in $t_1$ but not in $PT_A(t_1)$ may be inferred by A, so the 3rd-level node $(b_4 : t_1, 2)$ of the 2nd-level node $(a_5 a_1 : t_1, 13)$ should be generated. Similarly, for the tree node $PT_B(t_1) = b_4$, the locations $a_5$ and $a_1$ in $t_1$ but not in $PT_B(t_1)$ may be inferred by B, and the 3rd-level nodes $(a_5 : t_1, 1)$ and $(a_1 : t_1, 3)$ of the 2nd-level node $(b_4 : t_1, 2)$ should be generated.
It should be noted that, for ease of writing, only the trajectory projection or location contained in a node is written in subsequent descriptions of that node. For example, the 2nd-level node $(a_5 a_1 : t_1, 13)$ can be written as $a_5 a_1$.
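A small nested structure suffices to hold the two levels of the TP-tree used later; the following is a simplified dict-based Python sketch, not the authors' exact TPtreeConstruction implementation [10]:

```python
from collections import defaultdict

def build_tp_tree(T, attackers):
    """Simplified stand-in for TPtreeConstruction [10]: 2nd level = one
    node per (attacker, projection) holding supporting trajectory ids;
    3rd level = the locations that projection may leak."""
    tree = {}
    for tid, t in enumerate(T):
        for name, L_atk in attackers.items():
            proj = tuple(loc for loc in t if loc in L_atk)
            if not proj:
                continue
            node = tree.setdefault((name, proj),
                                   {"tra": set(), "children": defaultdict(set)})
            node["tra"].add(tid)
            for loc in t:
                if loc not in L_atk:
                    node["children"][loc].add(tid)   # 3rd-level node
    return tree

# Trajectory t1 = a5 b4 a1 yields node (A, a5a1) with child b4, and
# node (B, b4) with children a5 and a1, mirroring Example 7.
attackers = {"A": {"a1", "a2", "a3", "a4", "a5"},
             "B": {"b1", "b2", "b3", "b4"}}
tree = build_tp_tree([["a5", "b4", "a1"]], attackers)
print(sorted(tree))   # [('A', ('a5', 'a1')), ('B', ('b4',))]
```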

3.3. Anonymity Gain Measurement

When processing location sequences, the trajectories are changed, and the total number of problems N may also change. Taking both aspects into consideration, a set of gain measurement formulas for suppression and splitting was proposed [8]. To measure the change in N, $gain_N$ is defined as Equation (2):
$$gain_N = \frac{N - N'}{N} \qquad (2)$$
where $N$ ($N'$) represents the number of problems before (after) anonymization.
Let $t_m^A$ and $t_n^A$ ($t_n^A \preceq t_m^A$) be trajectory projections owned by an attacker A. When at least one of $t_m^A$ and $t_n^A$ is problematic, we use $t_n^A$ to unify $t_m^A$; in other words, the locations in $t_m^A$ but not in $t_n^A$ must be deleted from $t_m^A$ and from the original trajectories that contain $t_m^A$. To measure the information loss, the ploss of suppression is defined as Equation (3):
$$ploss(t, t') = 1 - \frac{|t'|(|t'| - 1)}{|t|(|t| - 1)} \qquad (3)$$
where $t$ ($t'$) represents the trajectory before (after) suppression and $|t|$ ($|t'|$) represents the length of trajectory $t$ ($t'$). The anonymity gain measurement of suppression is defined as $G_{gain}$, shown as Equation (4):
$$G_{gain}(t_m^A, t_n^A) = \frac{N - N'}{N} \times \frac{1}{\sum_{t \in S} ploss(t, t')} \qquad (4)$$
where S represents the set of trajectories that are affected by the suppression, $S \subseteq T$.
Let $l$ be the splitting location in a trajectory $t$. Trajectory $t$ is split into two subtrajectories ($t'$, $t''$) at location $l$. The ploss of splitting is defined as Equation (5):
$$ploss(t, t', t'') = 1 - \frac{|t'|(|t'| - 1) + |t''|(|t''| - 1)}{|t|(|t| - 1)} \qquad (5)$$
where $t$ ($t'$, $t''$) represents the trajectory before (after) splitting, and $|t|$ ($|t'|$, $|t''|$) represents the length of trajectory $t$ ($t'$, $t''$). Therefore, the anonymity gain measurement of splitting is defined as $S_{gain}$, shown as Equation (6):
$$S_{gain}(l, t) = \frac{N - N'}{N} \times \frac{1}{\sum_{t \in S} ploss(t, t', t'')} \qquad (6)$$
where S represents the set of trajectories affected by the splitting, $S \subseteq T$.
For a problematic trajectory projection $t_A$, a dummy trajectory identical to $t_A$ is generated and defined as $t''$. Therefore, the ploss of the dummy trajectory is defined as Equation (7):
$$ploss(t, t'') = 1 - \frac{|t''|(|t''| - 1)}{|t|(|t| - 1)} \qquad (7)$$
where $t$ ($t''$) represents the trajectory before (after) adding the dummy trajectory, and $|t|$ ($|t''|$) represents the length of trajectory $t$ ($t''$). However, there is no original trajectory that corresponds to $t''$, so the value of $ploss(t, t'')$ must be 1. The anonymity gain measurement of the dummy trajectory is defined as $F_{gain}$, shown as Equation (8):
$$F_{gain}(t'') = \frac{N - N'}{N} \qquad (8)$$
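The gain measures reduce to a little arithmetic over pair counts; the following hedged Python sketch reproduces Equations (2) to (8) and checks them against the worked values of Examples 9 to 11 below (the helper names are ours):

```python
def pairs(n):
    """Information content of a trajectory of n locations, measured as
    ordered location pairs n(n-1), following Equations (3), (5), (7)."""
    return n * (n - 1)

def ploss_suppress(n_before, n_after):
    return 1 - pairs(n_after) / pairs(n_before)             # Equation (3)

def ploss_split(n_before, n_left, n_right):
    return 1 - (pairs(n_left) + pairs(n_right)) / pairs(n_before)  # Eq. (5)

def gain(N, N_new, plosses):
    """Equations (4) and (6): (N - N') / N divided by the total ploss."""
    return (N - N_new) / N / sum(plosses)

def f_gain(N, N_new):
    return (N - N_new) / N      # Equation (8): the ploss of a dummy is 1

# Example 9: N goes 16 -> 10 and t7 shrinks from 6 to 5 locations.
print(round(gain(16, 10, [ploss_suppress(6, 5)]), 2))   # 1.12
# Example 10: N goes 16 -> 12 and t7 splits into lengths 2 and 4.
print(round(gain(16, 12, [ploss_split(6, 2, 4)]), 2))   # 0.47
# Example 11: N goes 16 -> 12 by adding a dummy trajectory.
print(f_gain(16, 12))                                   # 0.25
```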

4. Methods

In this section, we propose a privacy-preserving method based on LPA to protect the privacy of trajectory publication. An overview of the proposed algorithm is shown in Figure 2. First, we construct the TP-tree [10] to store the original trajectory dataset T. Then, we use the problematic nodes finding (PNF) algorithm to obtain the set of problematic nodes P in the TP-tree and the total number of problems N. Afterwards, we measure the anonymity gain of each anonymity technique and obtain the anonymity gains $G_{gain}(t_m^A, t_n^A)$, $S_{gain}(l, t)$, and $F_{gain}(t'')$. Then, the LP algorithm is used to select a specific technique to process the problematic nodes. The proposed method is described in detail in the rest of this section.
Definition 9 (Problematic Node).
A problematic node is a 2nd-level node of the TP-tree containing a problem, which needs to be eliminated. The tree node of the TP-tree containing $t_A$ is a problematic node if and only if $t_A$ is problematic.

4.1. Problematic Nodes Finding

Based on the TP-tree constructed in the preprocessing stage, we use the PNF algorithm to identify the privacy threats of the trajectory dataset. Then, we find all problematic nodes in the tree under the user’s privacy requirements. The PNF algorithm is depicted as Algorithm 1.
Algorithm 1: Problematic_Nodes_Finding (PNF)
Input: TP-tree, user's privacy requirements threshold $P_t$
Output: Problematic pairs set Q, problematic nodes set P, the total number of problems N
1: Q = ∅, P = ∅, N = 0
2: for each Sec in the 2nd level of TP-tree do
3:   for each Thi node of Sec do
4:     Get the tra.num of Sec, named Sec.num
5:     Get the tra.num of Thi, named Thi.num
6:     Calculate Pr = Thi.num / Sec.num
7:     if Pr > $P_t$ then
8:       Insert pair (Sec, Thi) into Q
9:       if Sec not in P then
10:        Insert Sec into P
11:      N = N + Thi.num
12: return Q, P, N
Algorithm 1 takes the TP-tree and the user's privacy requirements threshold $P_t$ as the inputs and returns the set of problematic pairs Q, the set of problematic nodes P, and the total number of problems N. After initialization, the algorithm scans all the 2nd-level nodes Sec and the corresponding child nodes Thi of each Sec and obtains the trajectory numbers of Sec and Thi, denoted as Sec.num and Thi.num, respectively (Lines 2 to 5). Then, the algorithm calculates $Pr = Thi.num / Sec.num$ and checks whether $Pr$ is greater than $P_t$. If so, (Sec, Thi) is inserted into set Q. Moreover, if Sec is not in set P, Sec is inserted into set P. Finally, the Thi.num of Thi is accumulated into the total number of problems N (Lines 6 to 11).
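On the dict-based tree sketched in Section 3.2, the PNF scan is a double loop; the following is a minimal Python rendering of Algorithm 1 under that assumed layout (N is accumulated once per problematic pair, following the prose description above):

```python
def problematic_nodes_finding(tree, P_t):
    """Algorithm 1 over the dict-based TP-tree sketch: return the
    problematic pairs Q, the problematic node set P, and the count N."""
    Q, P, N = [], set(), 0
    for sec_key, sec in tree.items():                # 2nd-level nodes
        sec_num = len(sec["tra"])                    # Sec.num
        for loc, tids in sec["children"].items():    # 3rd-level nodes
            thi_num = len(tids)                      # Thi.num
            if thi_num / sec_num > P_t:              # Pr > P_t: a problem
                Q.append((sec_key, loc))
                P.add(sec_key)
                N += thi_num
    return Q, P, N
```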
Example 8.
After constructing the TP-tree of T, Algorithm 1 is executed to find the problematic nodes of the TP-tree. In our example, the user's privacy requirements threshold $P_t$ is defined as 0.5. After executing Algorithm 1, the problematic nodes are shown in the dotted box in Figure 3; the problematic node set is $P = \{a_5 a_1, a_1 a_2, a_1 a_3, a_1 a_5, a_2, a_1 a_5 a_4 a_2, b_3 b_2, b_1 b_2, b_2\}$, and the total number of problems N is 16.

4.2. Anonymity Gain Measurement

For a problematic node, we calculate the anonymity gain when applying suppression ($G_{gain}(t_m^A, t_n^A)$), splitting ($S_{gain}(l, t)$), and the dummy trajectory ($F_{gain}(t'')$). Then, we select the final technique and obtain an array denoted as FinGain, which is one row of the PreGain array. The PreGain array is shown in Table 3, where the columns correspond to the trajectory projection contained in the problematic node that needs to be processed, the trajectory IDs contained in the node, the final operation, the relevant operation information, and the anonymity gain.

4.2.1. Suppression

We introduce the operation of suppression based on the TP-tree. For a problematic node, we first find two nodes of the same attacker in the 2nd level of the TP-tree to form a node pair $(t_m^A, t_n^A)$. Notably, $t_n^A$ is a subtrajectory of $t_m^A$. The idea of suppression is to use $t_n^A$ to unify $t_m^A$; in other words, the locations that exist in $t_m^A$ but not in $t_n^A$ are deleted. The details of the tree updating are described in Example 9.
Example 9.
Considering the problematic node $b_1 b_2$, suppression based on the TP-tree is used, and the anonymity gain is obtained. The first step is to find the node pairs that include $b_1 b_2$, of which $(b_1 b_2, b_2)$ is the only one. Therefore, we use $b_2$ to unify $b_1 b_2$, and the location $b_1$ must be removed from $t_7$. As shown in the first step of Figure 4, the location $b_1$ and its location.id are deleted from the 2nd-level node $b_1 b_2$, and then the remaining part of $b_1 b_2$ and its child nodes are merged into the 2nd-level node $b_2$ and its child nodes, respectively. We can see from the second step in Figure 4 that node $a_1 a_5 a_4 a_2$ also includes $t_7$; therefore, its child node $b_1$ and its location.id must also be deleted. Afterwards, the problem number N is 10, the length of trajectory $t_7$ becomes 5, and the anonymity gain $G_{gain}(b_1 b_2, b_2)$ is 1.12.

4.2.2. Splitting

We design the splitting operation based on the TP-tree. The idea of the intuitive splitting technique is to directly divide a trajectory into two trajectories at the splitting location. The splitting location is selected based on the trajectory projection contained in the problematic node, and any location in that trajectory projection can be defined as the splitting location $l$. A location $l$ is an indivisible location if and only if $l$ is the last location of the original trajectory. For a problematic node, there may be several cases of splitting, so selecting the final splitting location $l$ requires calculating the splitting anonymity gain of all potential locations except the indivisible ones and then selecting the best one as the final splitting anonymity gain. The details of the TP-tree changes are described in Example 10.
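At the trajectory level, splitting is a single cut; a minimal Python sketch follows. The exact contents and cut convention of $t_7$ are not listed in this section, so the layout below is a hypothetical one consistent with its projections $a_1 a_5 a_4 a_2$ and $b_1 b_2$, and the tree-merging bookkeeping of Example 10 is omitted:

```python
def split_trajectory(t, l):
    """Split t into (t', t'') immediately after the splitting location l.
    l must not be the last location of t (an indivisible location).
    The convention that l stays in the first piece is our assumption."""
    i = t.index(l)
    assert i < len(t) - 1, "the last location is indivisible"
    return t[: i + 1], t[i + 1 :]

# Hypothetical layout of t7 with projections a1a5a4a2 (A) and b1b2 (B).
t7 = ["a1", "b1", "a5", "a4", "b2", "a2"]
print(split_trajectory(t7, "b1"))
# (['a1', 'b1'], ['a5', 'a4', 'b2', 'a2']) -> lengths 2 and 4, as in Example 10
```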
Example 10.
Considering the problematic node $b_1 b_2$, splitting based on the TP-tree is used, and the anonymity gain is obtained. The local change in the tree is shown in Figure 5. First, the trajectories involved in the problematic node are found; the trajectory involved is $t_7$, and the splitting location is $b_1$ or $b_2$, where $b_2$ is an indivisible location. Thus, $t_7$ is split at $b_1$ to obtain $t_7'$ and $t_7''$. Then, we construct a new TP-tree* for the trajectories $t_7'$ and $t_7''$, delete all node information of $t_7$ from the original TP-tree, and merge the TP-tree* of the trajectories $t_7'$ and $t_7''$ into the original TP-tree. After splitting, the number of problems N is 12; the length of $t_7$ is 6; the lengths of the trajectories $t_7'$ and $t_7''$ are 2 and 4, respectively; and $S_{gain}(b_1, t_7)$ is 0.47.

4.2.3. Dummy Trajectory

The key to the dummy trajectory technique based on the TP-tree is adding dummy trajectories; in this paper, we add a dummy trajectory based on the trajectory projection contained in the problematic node. In other words, a dummy trajectory identical to the trajectory projection contained in the problematic node is added to the TP-tree. The details of the TP-tree changes are described in Example 11.
Example 11.
Considering the problematic node $b_1 b_2$, the dummy trajectory based on the TP-tree is used, and the anonymity gain is obtained. A dummy trajectory $t'' = b_1 b_2$ is added to the TP-tree, and Figure 6 shows the local change of the tree after adding the dummy trajectory. Afterwards, the problem number N is 12, and $F_{gain}(b_1 b_2)$ is 0.25.

4.3. Local Preferential Selection

We consider suppression, splitting, and dummy trajectory adding as candidate techniques to achieve privacy preservation based on the tree structure, solving problematic nodes step by step. To select the final technique for a specific problematic node, we propose an LP function based on the analysis of location loss and the anonymity gain.
Let $P_{gain}$ and $R_{gain}$ be the highest and second-highest anonymity gains, respectively. Suppression is the final selection if one of the following conditions holds:
(1)
$P_{gain} = G_{gain}(t_m^A, t_n^A)$ and Del.loc = 1;
(2)
$P_{gain} = G_{gain}(t_m^A, t_n^A)$ and $G_{gain}(t_m^A, t_n^A) - R_{gain} > P_t$.
In other cases where $P_{gain} = G_{gain}(t_m^A, t_n^A)$, the operation that corresponds to $R_{gain}$ is finally selected; otherwise, the operation that corresponds to $P_{gain}$ is selected.
Analysis. In this paper, we consider suppression, splitting, and dummy trajectory adding as candidate techniques. However, suppression can delete several locations at once to reduce the number of problems, which biases the algorithm towards suppression, and many locations may consequently be deleted. Therefore, we construct an LP function to change this tendency, protecting privacy while reducing information loss. We regard suppression as the best operation if and only if the number of locations lost is 1 and the anonymity gain of suppression is the largest. We set the value of $P_t$ in the above conditions equal to the user's privacy requirements threshold.
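The selection rule itself is a short comparison; the following is a Python sketch of the LP decision under an assumed candidate format of (operation, gain, locations deleted) tuples, which is our own illustration:

```python
def lp_select(candidates, P_t):
    """Pick the final operation per Section 4.3. candidates is a list of
    (operation, gain, del_loc); del_loc only matters for 'suppress'."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    (op1, p_gain, del_loc), (_, r_gain, _) = ranked[0], ranked[1]
    if op1 == "suppress" and (del_loc == 1 or p_gain - r_gain > P_t):
        return ranked[0]      # suppression meets condition (1) or (2)
    if op1 == "suppress":
        return ranked[1]      # otherwise fall back to the runner-up
    return ranked[0]          # highest gain was not suppression

# For node b1b2 (Examples 9-11): suppression deletes one location,
# so it is selected, matching Example 12 below.
cands = [("suppress", 1.12, 1), ("split", 0.47, 0), ("dummy", 0.25, 0)]
print(lp_select(cands, 0.5))  # ('suppress', 1.12, 1)
```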
The LP algorithm is shown as Algorithm 2. The LP algorithm takes a problematic node c, the user's privacy requirements threshold $P_t$, and the TP-tree as the inputs and returns an array FinGain, shown as Table 3. Initially, for the problematic node c, the LP algorithm calculates the anonymity gain of suppression $G_{gain}(t_m^A, t_n^A)$ and sets the node pair $(mt_m^A, mt_n^A)$ to the one with the maximum $G_{gain}(t_m^A, t_n^A)$ (Lines 2 to 4). Then, the LP algorithm scans all the 2nd-level nodes of the TP-tree, finds all the trajectories contained in the problematic node, and calculates the splitting anonymity gain $S_{gain}(l, t)$ of all trajectories at each splitting location, letting $(ml, mt)$ be the pair with the largest $S_{gain}(l, t)$ (Lines 5 to 7). Finally, the LP algorithm adds a dummy trajectory identical to the trajectory projection $t_c^A$ and calculates the anonymity gain of the dummy trajectory $F_{gain}(t'')$ (Line 8). Afterwards, the LP algorithm selects the final operation for the problematic node and returns FinGain (Lines 9 to 17).
Algorithm 2: Local_Preferential (LP)
Input: Pending problematic node c, user's privacy requirements threshold $P_t$, TP-tree
Output: Array of final operation FinGain
1: FinGain = ∅
2: for every $(t_m^A, t_n^A)$ where one of $t_m^A$, $t_n^A$ is equal to $t_c^A$ do // $t_c^A$ is the trajectory projection contained in node c
3:   Calculate $G_{gain}(t_m^A, t_n^A)$
4:   Let $(mt_m^A, mt_n^A)$ be the node pair with maximum $G_{gain}(t_m^A, t_n^A)$
   // Problematic Node $t_c^A$ & Final Operation suppress & Operation Information $(mt_m^A, mt_n^A)$ & Anonymity Gain $G_{gain}(mt_m^A, mt_n^A)$
5: for each $l$ in $t_c^A$ do
6:   Calculate $S_{gain}(l, t)$
7:   Let $(ml, mt)$ be the pair with maximum $S_{gain}(l, t)$
   // Problematic Node $t_c^A$ & Final Operation split & Operation Information $l$ & Anonymity Gain $S_{gain}(ml, mt)$
8: Calculate $F_{gain}(t'')$, where $t'' = t_c^A$
   // Problematic Node $t_c^A$ & Final Operation dummy & Operation Information $t''$ & Anonymity Gain $F_{gain}(t'')$
9: Let the highest gain be $P_{gain}$ and the second-highest gain be $R_{gain}$
10: if $G_{gain}(t_m^A, t_n^A) = P_{gain}$ then
11:   if del.loc = 1 or $G_{gain}(t_m^A, t_n^A) - R_{gain} > P_t$ then
12:     Insert pair $(t_m^A, t_n^A)$ and $G_{gain}(t_m^A, t_n^A)$ into FinGain
13:   else
14:     Insert the corresponding information about $R_{gain}$ into FinGain
     // the corresponding information: Problematic Node & Final Operation & Operation Information & Anonymity Gain
15: else
16:   Insert the corresponding information about $P_{gain}$ into FinGain
     // the corresponding information: Problematic Node & Final Operation & Operation Information & Anonymity Gain
17: return FinGain

4.4. Trajectory Anonymization

In this subsection, we introduce our algorithm of privacy preservation for trajectory data publication based on LPA. The algorithm iteratively solves the problematic nodes of the TP-tree until the trajectory dataset is safe to release. The LPA algorithm is shown as Algorithm 3.
Algorithm 3 takes an original trajectory dataset T, a set of attackers Adv, and a user's privacy requirements threshold $P_t$ as the inputs and returns a safe trajectory dataset $T^*$. Initially, the LPA algorithm constructs the TP-tree [10] (Line 1) and then calls the PNF algorithm to obtain the set of problematic nodes P and the total number of problems N (Line 2). If there are problematic nodes in the TP-tree (N > 0), c is set to be a pending problematic node, and the LPA algorithm selects the final operation according to the LP algorithm (Line 5). Afterwards, the returned final operation and related information are used to solve the problematic node c and update the TP-tree. Specifically, if the final operation is suppression, the corresponding node updates of $t_m^A$ are merged into $t_n^A$ (Lines 6 to 11). If the final operation is splitting, the LPA algorithm finds all the trajectories involved in the problematic node and constructs a dataset $T_s$. All node information about the trajectories $t \in T_s$ must be deleted from the original TP-tree. After that, each trajectory $t \in T_s$ is split at the splitting location l, and a new dataset $T_m$ is reconstructed. Finally, the TP-tree* of the dataset $T_m$ is merged into the TP-tree (Lines 13 to 18). If the final operation is to add a dummy trajectory, then a dummy trajectory $t''$ identical to the problematic node is added and merged into the TP-tree (Lines 20 to 21). After the above steps, the new number of problems N is calculated, and the LPA algorithm iteratively eliminates problems until N is 0. Then, the corresponding safe trajectory dataset $T^*$ is obtained.
Algorithm 3: Local_Preferential_Anonymity (LPA)
Input: Original trajectory dataset T, attackers set Adv, user's privacy requirements threshold $P_t$
Output: A corresponding safe dataset $T^*$
1: TP-tree = TPtreeConstruction (T, Adv)
2: (Q, P, N) = Problematic_Nodes_Finding (TP-tree, $P_t$)
3: while N > 0 do
4:   Let a pending problematic node in P be c
5:   FinGain = Local_Preferential (TP-tree, c, $P_t$)
6:   if FinOp is suppress then // FinOp: Final Operation in FinGain
7:     Let Sec′ (Sec) be the node at the 2nd level of the TP-tree having projection $t_m^A$ ($t_n^A$)
8:     Update Sec.tra as Sec.tra ∪ Sec′.tra and update Sec.loc
9:     Insert each child node h of Sec′ into Sec
10:    Delete Sec′ from TP-tree
11:    Delete the corresponding child node h of Sec″ // Sec″.tra ∩ Sec′.tra ≠ ∅ & h is the location in $t_m^A$ but not in $t_n^A$
12:  else
13:    if FinOp is split then // FinOp: Final Operation in FinGain
14:      Reconstruct the trajectory dataset as $T_s$
15:      Remove all node information about $t \in T_s$ from TP-tree
16:      Split each t in $T_s$ and reconstruct dataset $T_m$
17:      TP-tree* = TPtreeConstruction ($T_m$, Adv)
18:      Update TP-tree as TP-tree ∪ TP-tree*
19:    else
20:      FinOp is dummy // FinOp: Final Operation in FinGain
21:      Update TP-tree as TP-tree ∪ $t''$
22:  (Q, P, N) = Problematic_Nodes_Finding (TP-tree, $P_t$)
23: return $T^*$
Example 12.
According to the LP function, the final operation for the problematic node $b_1 b_2$ is suppression.
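Putting the pieces together, the anonymization loop of Algorithm 3 has the following shape. This is a high-level Python sketch reusing the helpers sketched earlier; measure_gains, apply_operation, and rebuild_dataset are passed in as hypothetical placeholders for the gain measurement (Algorithm 2) and the tree-update steps, which are elided here:

```python
def lpa(T, attackers, P_t, measure_gains, apply_operation, rebuild_dataset):
    """Shape of Algorithm 3 over the dict-based sketches above. The three
    callables are placeholders, not the authors' implementation."""
    tree = build_tp_tree(T, attackers)
    _, P, N = problematic_nodes_finding(tree, P_t)
    while N > 0:                       # iterate until the dataset is safe
        c = next(iter(P))              # pick a pending problematic node
        op = lp_select(measure_gains(tree, c), P_t)   # Algorithm 2
        apply_operation(tree, c, op)   # suppress / split / add dummy
        _, P, N = problematic_nodes_finding(tree, P_t)
    return rebuild_dataset(tree)       # the safe counterpart T*
```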

4.5. Algorithm Analysis

4.5.1. Trajectory Privacy Preservation Capability

Consider an original trajectory dataset T, and let $T^*$ be the anonymized trajectory dataset of T.
Theorem 1.
The LPA algorithm can derive a corresponding safe dataset $T^*$ of the original trajectory dataset T.
Proof. 
To derive a corresponding safe dataset $T^*$ of the trajectory dataset T, there must be no problematic pairs in $T^*$, no problematic nodes in the final TP-tree, and the number of problems N must be 0. In our method, the LPA algorithm first constructs a TP-tree of T, so each entire trajectory is stored in the tree. Then, LPA calls the PNF algorithm to find the problematic nodes P in the TP-tree. Afterwards, the LPA algorithm selects the anonymity operation for each problematic node based on the LP algorithm and then solves the problematic nodes one by one until P is empty and N is 0. □
For example, for the original dataset in Table 1, after executing the LPA algorithm, we obtain a dataset $T^*$, as shown in Table 4. Then, we execute the PNF algorithm for verification; the problematic pairs set Q and the problematic nodes set P are empty, and the number of problems N is 0. Therefore, $T^*$ is a safe dataset.

4.5.2. Complexity Analysis

To present the results of the algorithm complexity analysis more clearly, let Adv be the set of potential attackers, $N_{ta}$ be the maximum number of different projections of the trajectory dataset T for an attacker, and $N_{loc}$ be the maximum number of different locations contained in the trajectories for a projection. N is the total number of problems.
The cost of the TPtreeConstruction algorithm depends on inserting the trajectories in T into the TP-tree. For one attacker and one trajectory, the cost of inserting all locations into the tree is $O(N_{ta} \times N_{loc})$. Therefore, the cost of inserting all locations of all trajectories for all attackers is $O(|T| \times |Adv| \times N_{ta} \times N_{loc})$. The most expensive part of the PNF algorithm is scanning the 2nd-level nodes and the corresponding 3rd-level nodes of each 2nd-level node; in the worst case of the maximum number of 2nd-level projection nodes, the cost of the PNF algorithm is $O(|Adv| \times N_{ta} \times N_{loc})$.
The LP algorithm mainly measures the anonymity gains of suppression, splitting, and dummy trajectory adding for a problematic node in the problematic node set of the TP-tree and selects the best value in the FinGain array. The most expensive cost of suppression involves scanning the 2nd-level nodes of the TP-tree once to find all the trajectory projection pairs and then scanning the TP-tree again for the calculation; the cost is $O(2 \times |Adv| \times N_{ta} \times N_{loc})$.
The cost of splitting involves scanning the trajectories contained in the problematic node and the corresponding trajectory projection of the problematic node and then analyzing the different locations of this trajectory projection, with a cost of $O(N_{loc} \times |Adv| \times N_{ta})$. The cost of dummy trajectory adding involves scanning the TP-tree for the calculation, with a time requirement of $O(|Adv| \times N_{loc} \times N_{ta})$. In total, the cost of the LP algorithm is $O(|Adv| \times N_{loc} \times N_{ta})$.
The time cost of the LPA algorithm mainly depends on the total number of problems N in the TP-tree and the calls to the above algorithms. In the worst case, all N problems need to be iterated, and the cost is $O(N \times |Adv| \times N_{ta} \times N_{loc})$.
Discussion. 
Time performance analysis of the proposed LPA algorithm and the compared algorithms [8,10]:
The storage structures of $G_{SUP}$ and $L_{SUP}$ [8] are arrays storing the trajectory dataset and the set of trajectory projections, so each calculation of the inference probability needs to traverse the trajectory projection set first and then the trajectory dataset, which results in lower time performance. To solve this problem, the tree-based storage structure is adopted in the algorithms $IGSUP^*$ and $ILSUP^*$ [10], which avoids wasting time traversing the trajectory dataset when calculating the inference probability; a number of experiments verified the significant improvement in time performance. In our method, the tree-based storage structure [10] is also applied to store the trajectory dataset, so the time performance of problematic node finding is better than that of $G_{SUP}$ and $L_{SUP}$ and similar to that of $IGSUP^*$ and $ILSUP^*$. In the stage of solving the problematic nodes, we consider three candidate techniques for a specific node, which takes more time than $IGSUP^*$ and $ILSUP^*$, but the traversal performance is still better than that of $G_{SUP}$ and $L_{SUP}$ because of the storage structure. Although the time performance of the LPA algorithm is worse than that of $IGSUP^*$ and $ILSUP^*$, the data utility of the anonymous dataset constructed by the LPA algorithm is significantly better than that of the comparison algorithms. The detailed analysis of data utility is presented in Section 5.

5. Experiment

The LPA algorithm was implemented in Python 3, and the other algorithms were implemented in Java. All experiments were run on a PC with an AMD Ryzen 7 at 2.90 GHz and 16 GB of RAM. We compared our algorithm with the algorithms $G_{SUP}$ and $L_{SUP}$ [8], which apply global and local suppression, respectively. $IGSUP^*$ and $ILSUP^*$ [10], which are based on suppression and pruning techniques, were also compared. Our experiment is mainly aimed at measuring the data utility of the anonymous dataset; the programming language of the algorithms does not affect the final data results.

5.1. Dataset

We use the City80K dataset [30,31] in our experiments. City80K contains 80,000 trajectories. After trajectory preprocessing, the data take the form shown in Table 1. In this paper, only the users' check-in location information was used.

5.2. Metrics

The key to measuring the performance of the trajectory privacy preservation algorithm is to evaluate the quality of anonymous trajectory datasets, which mainly include information that remains from anonymous datasets and the utility of anonymous datasets. We focus on the changes in the trajectory dataset before and after anonymity. The measuring utilities are as follows.
Average trajectory remaining ratio. The first metric measures the remaining ratio of the locations of each trajectory. We define $TR_i$ as the trajectory remaining ratio of trajectory $t_i$ and calculate it as Equation (9):
$$TR_i = \frac{\text{locations retained with order in } t_i}{\text{locations in } t_i} \qquad (9)$$
We compare each trajectory step by step and find the average trajectory remaining ratio over the anonymous dataset, denoted as TR(avg); the higher the value, the better the effect. TR(avg) is calculated as Equation (10):
$$TR(avg) = \frac{1}{|T|} \times \sum_{i \in T} TR_i \qquad (10)$$
Average location appearance ratio. We use the average location appearance ratio, denoted as AR(avg), to measure changes in the number of locations before and after anonymization; we calculate the change in the number of appearances of each location between the original trajectory dataset and the anonymous dataset. For each location $l \in L$ in the trajectory dataset, we define AR(l) as the ratio of the number of appearances of $l$ in the anonymous dataset ($n_l'$) to that in the original trajectory dataset ($n_l$). The formulas are shown in Equations (11) and (12):
$$AR(l) = \frac{n_l'}{n_l} \qquad (11)$$
$$AR(avg) = \frac{1}{|L|} \times \sum_{l \in L} AR(l) \qquad (12)$$
where the value of AR(avg) is between 0 and 1; the higher the value, the better the effect.
Frequent sequential pattern remaining ratio. In general, the more frequent sequential patterns are preserved after anonymization, the better the data utility. We use the PrefixSpan algorithm to obtain the frequent sequential patterns in the original dataset and in the anonymous dataset. Then, we calculate the average percentage of the frequent sequential patterns of the original dataset that are observed in the safe counterpart, denoted as FSP(avg).
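The first two metrics are straightforward to compute; a Python sketch follows, using the longest common subsequence for "locations retained with order" and assuming that the anonymized dataset is aligned with the original by trajectory id and that every location appears in the original dataset (both are our assumptions, not details stated in the paper):

```python
def lcs_len(a, b):
    """Longest common subsequence length: locations of the original
    trajectory retained, in order, in its anonymized counterpart."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else \
                max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def tr_avg(original, anonymized):
    """Equations (9)-(10): mean per-trajectory remaining ratio."""
    ratios = [lcs_len(t, t2) / len(t) for t, t2 in zip(original, anonymized)]
    return sum(ratios) / len(ratios)

def ar_avg(original, anonymized, locations):
    """Equations (11)-(12): mean per-location appearance ratio."""
    count = lambda T, l: sum(t.count(l) for t in T)
    return sum(count(anonymized, l) / count(original, l)
               for l in locations) / len(locations)
```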

5.3. Result and Analysis

Considering the parameter settings of the compared algorithms, we set the experimental parameters as follows:
(1)
The user's privacy requirements threshold $P_t$: $P_t$ is 0.5 by default, and the range varies from 0.4 to 0.7.
(2)
The average length of a trajectory, denoted by $|t|$: $|t|$ is 6 by default, and the range varies from 4 to 7.
(3)
The size of the trajectory dataset, denoted by $|T|$: $|T|$ is 300 by default, and the range varies from 150 to 400.
Since our algorithm has a certain randomness in the processing order and the selection of the problematic node c, each group of experiments was run ten times and the results were averaged.

5.3.1. Average Trajectory Remaining Ratio

Figure 7 shows the average trajectory remaining ratio TR(avg) as a function of the user's privacy requirements threshold $P_t$, the average length of a trajectory $|t|$, and the size of the trajectory dataset $|T|$.
As shown in Figure 7a, the highest TR(avg) value is always achieved by the LPA algorithm, which is much better than $G_{SUP}$, $L_{SUP}$, $IGSUP^*$, and $ILSUP^*$. The trend of the LPA algorithm initially decreases and then increases because the LP function used in the LPA algorithm balances the use of the three techniques with a certain randomness. In Figure 7b, the TR(avg) values decrease as $|t|$ increases. This occurs because the longer the trajectory is, the more of its locations are owned by the attackers, which increases the number of problems N. We can see that the TR(avg) values of the LPA algorithm are on average 0.32 and 0.25 higher than those of $G_{SUP}$ and $IGSUP^*$, respectively, and 0.16 higher than those of $L_{SUP}$ and $ILSUP^*$. Figure 7c presents the result as the value of $|T|$ increases. The TR(avg) values of the LPA algorithm fluctuate within the range of approximately 0.88 to 0.93 and are always much higher than those of the other algorithms, which indicates that the scalability of our algorithm is better. The main reason is that the LP function used in LPA balances the selection of the three techniques.

5.3.2. Average Location Appearance Ratio

Figure 8 shows the average location appearance ratio AR(avg) as a function of the user's privacy requirements threshold $P_t$, the average length of a trajectory $|t|$, and the size of the trajectory dataset $|T|$.
As shown in Figure 8a, with the increase of $P_t$, the AR(avg) values of all algorithms show an upwards trend, and the AR(avg) values of the LPA algorithm are always significantly higher than those of the others. In Figure 8b, with the increase of $|t|$, the AR(avg) values of the four compared algorithms gradually decrease, but the AR(avg) values of the LPA algorithm take 6 as a critical point and first decrease and then increase. Moreover, the AR(avg) value of the LPA algorithm is always higher than that of the other algorithms. Figure 8c shows that as $|T|$ increases, the AR(avg) values of the LPA algorithm fluctuate around approximately 0.80 and are always higher than those of the other four algorithms. The reason for this phenomenon is that the LPA algorithm considers three candidate techniques, and the application of the LP function reduces the frequency of suppression and thus the information loss of the anonymization processing.

5.3.3. Frequent Sequential Pattern Mining

We present the average frequent sequential pattern remaining ratio FSP(avg) as a function of the user's privacy requirements threshold $P_t$, the average length of a trajectory $|t|$, and the size of the trajectory dataset $|T|$. In our experiments, the frequent pattern threshold is set to 2. The results are shown in Figure 9. For each parameter change, the LPA algorithm has the highest FSP(avg) values.
As shown in Figure 9a, for LPA, $L_{SUP}$, and $ILSUP^*$, the general trend is that the FSP(avg) values increase as $P_t$ increases, while the trends of $G_{SUP}$ and $IGSUP^*$ show no obvious upwards motion. Furthermore, the FSP(avg) values of the LPA algorithm are significantly better than those of the other algorithms because the LP function of the LPA algorithm gradually processes the problematic nodes and merges the nodes, which improves the FSP(avg). In Figure 9b, the FSP(avg) values gradually increase with increasing $|t|$, and the FSP(avg) value of the LPA algorithm is always higher than that of the other algorithms. The reason is that the longer the average trajectory length is, the higher the number of problems. Similar to the previous two experimental results, Figure 9c shows that as $|T|$ increases, the FSP(avg) values show an overall upwards trend, and the values of LPA are overall higher than those of the other algorithms. It can be concluded that the overall performance of FSP(avg) in LPA is better than that of the other algorithms.
To summarize, compared with the other algorithms [8,10], the performance of the LPA algorithm is better in terms of data utility. Additionally, this result indicates that the LPA algorithm achieves higher data utility in anonymity and effectively balances the data utility and privacy preservation.

6. Discussion

We mainly focus on privacy protection methods for trajectory data publishing through trajectory data processing. The data type is semantic locations (such as user transaction data). Considering potential attackers, we identify the privacy threats to determine the unknown user locations that attackers may infer. A new privacy protection method is proposed to solve the problem of privacy disclosure in the analysis and reuse of such data. We conducted an experimental evaluation on a public dataset, and the experimental results show that our method can effectively achieve trajectory privacy protection, reduce information loss, and improve data utility. The preprocessing part and the tree storage structure in the first stage reduce the running time of identifying privacy threats, which has been proven in other work [10]. In the second stage, we consider the mixed use of multiple techniques to improve the utility of the anonymous data. This method can be further extended to privacy protection for data with different types of sensitive attributes in semantic locations, which is our next research topic.

7. Conclusions

In this paper, we proposed a novel privacy-preserving method for trajectory data publication based on LPA, which is used to defend against attackers who infer other unknown locations through partial background knowledge in their possession. We designed the operations of splitting and dummy trajectory adding based on the tree structure and considered suppression, splitting, and dummy trajectory adding as candidate techniques. Then, we proposed an LP function based on the analysis of location loss and anonymity gain to select the final operation for each problematic node. The empirical results illustrate that the LPA algorithm reduces information loss and improves data utility to a certain extent. There is still room for improvement in the selection function, and we will continue to study this problem and consider the relevance of different types of sensitive attributes in semantic locations in our ongoing work.

8. Patent

A patent of the same name covering the relevant research content of this manuscript has been applied for (No. CN202210178099.9); it entered the substantive examination stage on 27 May 2022.

Author Contributions

Conceptualization, X.Z. and Y.L.; methodology, X.Z.; validation, X.Z., L.X. and Z.L.; investigation, L.X. and Z.L.; resources, Q.Y.; data curation, X.Z.; writing—original draft preparation, X.Z.; writing—review and editing, Y.L. and Q.Y.; visualization, Z.L.; supervision, Y.L.; project administration, Y.L.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 62272006, and the University Collaborative Innovation Project of Anhui Province, grant number GXXT-2019-040. The APC was funded by the National Natural Science Foundation of China, grant number 62272006.

Data Availability Statement

The raw/processed data required to reproduce these findings cannot be shared at this time because the data also form part of an ongoing study. However, the data and code used in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Mendes, R.; Vilela, J.P. Privacy-preserving Data Mining: Methods, Metrics, and Applications. IEEE Access 2017, 5, 10562–10582. [Google Scholar] [CrossRef]
  2. Naji, H.A.; Wu, C.; Zhang, H. Understanding the impact of human mobility patterns on taxi drivers’ profitability using clustering techniques: A case study in Wuhan, China. Information 2017, 8, 67. [Google Scholar] [CrossRef]
  3. Wang, W.; Mu, Q.; Pu, Y.; Man, D.; Yang, W.; Du, X. Sensitive Labels Matching Privacy Protection in Multi-Social Networks. In Proceedings of the ICC 2020–2020 IEEE International Conference on Communications, Dublin, Ireland, 7–11 June 2020; pp. 1–7. [Google Scholar]
  4. Chen, L.; Xu, Y.; Xie, F.; Huang, M.; Zheng, Z. Data Poisoning Attacks on Neighborhood-Based Recommender Systems. Trans. Emerg. Telecommun. Technol. 2021, 32, e3872. [Google Scholar] [CrossRef]
  5. Yang, Z.; Wang, R.; Wu, D.; Wang, H.; Song, H.; Ma, X. Local Trajectory Privacy Protection in 5G Enabled Industrial Intelligent Logistics. IEEE Trans. Ind. Inform. 2021, 18, 2868–2876. [Google Scholar] [CrossRef]
  6. Fung, B.C.; Wang, K.; Chen, R.; Yu, P.S. Privacy-preserving Data Publishing: A Survey of Recent Developments. ACM Comput. Surv. 2010, 42, 1–53. [Google Scholar] [CrossRef]
  7. Jin, F.; Hua, W.; Francia, M.; Chao, P.; Orlowska, M.; Zhou, X. A Survey and Experimental Study on Privacy-Preserving Trajectory Data Publishing. IEEE Trans. Knowl. Data Eng. 2022, 1. [Google Scholar] [CrossRef]
  8. Terrovitis, M.; Poulis, G.; Mamoulis, N.; Skiadopoulos, S. Local Suppression and Splitting Techniques for Privacy Preserving Publication of Trajectories. IEEE Trans. Knowl. Data Eng. 2017, 29, 1466–1479. [Google Scholar] [CrossRef]
  9. Hassan, M.Y.; Saha, U.; Mohammed, N.; Durocher, S.; Miller, A. Efficient Privacy-Preserving Approaches for Trajectory Datasets. In Proceedings of the IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing, International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress, Calgary, AB, Canada, 17–22 August 2020; pp. 612–619. [Google Scholar]
  10. Lin, C.Y. Suppression Techniques for Privacy-preserving Trajectory Data Publishing. Knowl.-Based Syst. 2020, 206, 106354. [Google Scholar] [CrossRef]
  11. Lin, C.Y.; Wang, Y.C.; Fu, W.T.; Chen, Y.S.; Chien, K.C.; Lin, B.Y. Efficiently Preserving Privacy on Large Trajectory Datasets. In Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18–21 June 2018; pp. 358–364. [Google Scholar]
  12. Luper, D.; Cameron, D.; Miller, J.; Arabnia, H.R. Spatial and Temporal Target Association Through Semantic Analysis and GPS Data Mining. In Proceedings of the 2007 International Conference on Information & Knowledge Engineering, Las Vegas, NV, USA, 25–28 June 2007; pp. 25–28. [Google Scholar]
  13. Lei, P.R.; Peng, W.C.; Su, I.J.; Chang, C.P. Dummy-based Schemes for Protecting Movement Trajectories. J. Inf. Sci. Eng. 2012, 28, 335–350. [Google Scholar]
  14. Wu, Q.; Liu, H.; Zhang, C.; Fan, Q.; Li, Z.; Wang, K. Trajectory Protection Schemes Based on A Gravity Mobility Model in IoT. Electronics 2019, 8, 148. [Google Scholar] [CrossRef]
  15. Liu, X.; Chen, J.; Xia, X.; Zong, C.; Zhu, R.; Li, J. Dummy-Based Trajectory Privacy Protection Against Exposure Location Attacks. In Proceedings of the International Conference on Web Information Systems and Applications, Qingdao, China, 20–22 September 2019; pp. 368–381. [Google Scholar]
  16. Samarati, P. Protecting Respondents Identities in Microdata Release. IEEE Trans. Knowl. Data Eng. 2001, 13, 1010–1027. [Google Scholar] [CrossRef]
  17. Sweeney, L. k-anonymity: A Model for Protecting Privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 2002, 10, 557–570. [Google Scholar] [CrossRef]
  18. Abul, O.; Bonchi, F.; Nanni, M. Never walk alone: Uncertainty for Anonymity in Moving Objects Databases. In Proceedings of the IEEE 24th International Conference on Data Engineering, Cancun, Mexico, 7–12 April 2008; pp. 376–385. [Google Scholar]
  19. Domingo-Ferrer, J.; Trujillo-Rasua, R. Microaggregation-and Permutation-Based Anonymization of Movement Data. Inf. Sci. 2012, 208, 55–80. [Google Scholar] [CrossRef]
  20. Trujillo-Rasua, R.; Domingo-Ferrer, J. On the Privacy Offered by (k, δ)-Anonymity. Inf. Syst. 2013, 38, 491–494. [Google Scholar] [CrossRef]
  21. Dong, Y.; Pi, D. Novel Privacy-Preserving Algorithm Based on Frequent Path for Trajectory Data Publishing. Knowl.-Based Syst. 2018, 148, 55–65. [Google Scholar] [CrossRef]
  22. Kopanaki, D.; Pelekis, N.; Kopanakis, I.; Theodoridis, Y. Who Cares about Others’ privacy: Personalized Anonymization of Moving Object Trajectories. In Proceedings of the 19th International Conference on Extending Database Technology: Advances in Database Technology, Bordeaux, France, 15–18 March 2016. [Google Scholar]
  23. Chen, C.; Luo, Y.; Yu, Q.; Hu, G. TPPG: Privacy-Preserving Trajectory Data Publication Based on 3D-Grid Partition. Intell. Data Anal. 2019, 23, 503–533. [Google Scholar] [CrossRef]
  24. Nergiz, M.E.; Atzori, M.; Saygin, Y. Towards Trajectory Anonymization: A Generalization-Based Approach. In Proceedings of the SIGSPATIAL ACM GIS 2008 International Workshop on Security and Privacy in GIS and LBS, Irvine, CA, USA, 4–7 November 2008; Volume 2, pp. 52–61. [Google Scholar] [CrossRef]
  25. Yarovoy, R.; Bonchi, F.; Lakshmanan, L.V.; Wang, W.H. Anonymizing Moving Objects: How to Hide a Mob in A Crowd? In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, Saint Petersburg, Russia, 24–26 March 2009; pp. 72–83. [Google Scholar]
  26. Terrovitis, M.; Mamoulis, N.; Kalnis, P. Local and Global Recoding Methods for Anonymizing Set-valued Data. VLDB J. 2011, 20, 83–106. [Google Scholar] [CrossRef]
  27. Poulis, G.; Skiadopoulos, S.; Loukidis, G.; Gkoulalas Divanis, A. Apriori-based Algorithms for km-anonymizing Trajectory Data. Trans. Data Priv. 2014, 7, 165–194. [Google Scholar] [CrossRef]
  28. Terrovitis, M.; Mamoulis, N. Privacy preservation in the publication of trajectories. In Proceedings of the 9th International Conference on Mobile Data Management, Beijing, China, 27–30 April 2008; pp. 65–72. [Google Scholar]
  29. Yao, L.; Chen, Z.; Hu, H.; Wu, G.; Wu, B. Privacy Preservation for Trajectory Publication Based on Differential Privacy. ACM Trans. Intell. Syst. Technol. 2022, 13, 1–21. [Google Scholar] [CrossRef]
  30. Chen, R.; Fung, B.C.; Mohammed, N.; Desai, B.C.; Wang, K. Privacy-preserving Trajectory Data Publishing by Local Suppression. Inf. Sci. 2013, 231, 83–97. [Google Scholar] [CrossRef]
  31. Komishani, E.G.; Abadi, M.; Deldar, F. PPTD: Preserving Personalized Privacy in Trajectory Data Publishing by Sensitive Attribute Generalization and Trajectory Local Suppression. Knowl.-Based Syst. 2016, 94, 43–59. [Google Scholar] [CrossRef]
Figure 1. The TP-tree for Table 1. The 2nd-level nodes of the TP-tree correspond to projection trajectories, and the 3rd-level nodes correspond to locations that may be inferred.
Figure 2. Overview of the proposed method (LPA). N is the total number of problems.
Figure 3. Problematic nodes for Table 1. Problematic nodes are marked with dotted boxes.
Figure 4. Updating of the TP-tree by using suppression. Step ① indicates that the location b1 is removed from the 2nd-level node b1b2 and the remaining nodes are updated; step ② indicates that the 3rd-level node b1 is also deleted.
Figure 5. Updating of the TP-tree by using splitting. Step ① indicates that t7 is split at b1 to obtain t7′ and t7″; step ② indicates the construction of a new TP-tree* for the trajectories t7′ and t7″.
Figure 6. Updating of the TP-tree by using a dummy trajectory. Step ① indicates that a dummy trajectory t7′ = b1b2 is added.
Figure 7. Average trajectory remaining ratio TR(avg) (higher is better). (a) Varying P_t. (b) Varying |t|(avg). (c) Varying |T|.
Figure 8. Average location appearance ratio AR(avg) (higher is better). (a) Varying P_t. (b) Varying |t|(avg). (c) Varying |T|.
Figure 9. Average published frequent sequential pattern remaining ratio FSP(avg) (higher is better). (a) Varying P_t. (b) Varying |t|(avg). (c) Varying |T|.
Table 1. An example of trajectory set T.

ID    Trajectory
t1    a5 b4 a1
t2    b4 a3
t3    b3 a1 a2 b2
t4    a1 a3 b3 b2
t5    b4 a1 a5
t6    b4 a2
t7    a1 b1 a5 a4 a2 b2
t8    b2 a3
Table 2. Trajectory projection set T_A of T.

ID           P_TA
P_TA(t1)     a5 a1
P_TA(t2)     a3
P_TA(t3)     a1 a2
P_TA(t4)     a1 a3
P_TA(t5)     a1 a5
P_TA(t6)     a2
P_TA(t7)     a1 a5 a4 a2
P_TA(t8)     a3
Table 3. Array of PreGain.

Problematic Node   Trajectory ID   Final Operation   Operation Information   Anonymity Gain
t^A                t_i             suppress          t_m^A, t_n^A            Ggain(t_m^A, t_n^A)
t^A                t_i             split             l                       Sgain(l, t)
t^A                t_i             dummy             t                       Fgain(t)
Table 4. A corresponding safe dataset T* of T.

ID     Trajectory
t1     a5 b4 a1
t2     b4 a3
t3     b3 a2 b2
t4     b4 a1 a5
t5     b4 a2
t6     b2 a3
t7     a5 a1
t8     a3
t9     b3 a2
t10    a1 a5 b4 a2
t11    b2
t12    a1 a5
