Article

GeoSDVA: A Semi-Supervised Dirichlet Variational Autoencoder Model for Transportation Mode Identification

1 School of Information Science and Technology, Northwest University, Xi’an 710127, China
2 School of Economics and Management, Northwest University, Xi’an 710127, China
3 Department of Geomatics Engineering, University of Calgary, Calgary, AB T2N 1N4, Canada
4 School of Foreign Languages, Northwest University, Xi’an 710127, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2022, 11(5), 290; https://doi.org/10.3390/ijgi11050290
Submission received: 11 March 2022 / Revised: 23 April 2022 / Accepted: 26 April 2022 / Published: 29 April 2022

Abstract

Inferring the transportation modes of travelers is an essential part of intelligent transportation systems. With the development of mobile services, it is easy to obtain massive location readings of travelers with GPS-enabled smart devices, such as smartphones, and these readings make it convenient to understand human activities. Therefore, how to automatically infer transportation modes from these massive readings has come into the spotlight. The existing methods for transportation mode identification are usually based on supervised learning. However, the raw GPS readings do not contain any labels, and it is expensive and time-consuming to annotate sufficient samples for training supervised learning-based models. In addition, not enough attention is paid to the fact that GPS readings collected in urban areas are affected by surrounding geographic information (e.g., the level of the road network or the distribution of stations). To address these problems, a geographic information-fused semi-supervised method based on a Dirichlet variational autoencoder, named GeoSDVA, is proposed in this paper for transportation mode identification. GeoSDVA first fuses the motion features of the GPS trajectories with the nearby geographic information. Then, both labeled and unlabeled trajectories are used to train the semi-supervised model based on the Dirichlet variational autoencoder architecture for transportation mode identification. Experiments on three real GPS trajectory datasets demonstrate that GeoSDVA can train an excellent transportation mode identification model with only a few labeled trajectories.

1. Introduction

Inferring transportation modes is a crucial component of intelligent transportation systems. The transportation mode selected by the traveler is a fundamental behavior characteristic. Inferring transportation mode distributions can help city management agencies understand residents’ behavior [1] and control the urban transport system [2,3,4]. In addition, transportation mode information plays a vital role in the construction of an activity-based user model [5] and the provision of task-centered supports [6]. For example, transportation mode identification helps with customizing mobile services [7] and with advertising recommendations [8]. In recent years, an increasing number of cities have begun to deploy Mobility as a Service (MaaS). This new demand-oriented transportation service will change the travel behaviors of urban residents. In order to predict and meet the travel needs of urban residents in a reasonable manner, the basic problem of identifying the transportation mode of residents must be solved.
In the past few decades, the use of GPS devices represented by smartphones has become widespread, and has produced massive amounts of resident data with spatio-temporal information, which offers an excellent opportunity to analyze urban transportation modes at low cost and with high efficiency. However, inferring transportation modes from the raw trajectory is not easy, and still faces several challenges.
To begin with, raw GPS records only contain timestamps and location information, which do not directly reflect trajectory characteristics. Unlike with image data, it is difficult to use deep learning models to extract features directly and automatically from raw GPS trajectories. Therefore, the first problem in transportation mode identification is how to construct an effective representation of the raw trajectories that can also be used by the model. The main approach in existing research is to extract a series of motion features. However, in real-life scenarios, the users’ GPS trajectories are influenced by complex geographic information [9]. Previous studies have demonstrated excellent traffic forecasting capabilities by exploiting the spatial topological relationship in road networks to learn the spatio-temporal dependency of traffic flows [10,11]. Moreover, the geographic context supplements valuable information for predictions, and has made a decisive contribution to predicting air quality using limited sensor data [12,13] and to mining moving behavior based on trajectory data [14].
In recent years, studies have shown that the relationship between GPS readings and geographic information contributes to transportation mode identification. For example, the motion characteristics of the same type of vehicle may vary because vehicles must obey different speed limits on different city roads [15]. As shown in Figure 1, we use 1 and 0 to indicate whether there is a bus station within 100 m of each GPS reading. The velocity characteristics of this trajectory are significantly affected as it passes through the bus station. It can be seen that the motion characteristics of trajectories in the city are affected by the surrounding facilities. It is therefore desirable to construct the representation of the trajectories by combining their own characteristics with the surrounding geographical context [9,16,17]. Accordingly, we encode geographic information (e.g., the level of the road network, the distribution of public stations) and fuse it with the motion features of each GPS reading of the trajectories to build the trajectory embedding.
Second, the raw GPS trajectories collected from GPS-enabled smart devices do not directly contain the exact transportation mode labels. Therefore, to construct the sample for training the supervised learning model, users or researchers often assign transportation mode labels to the trajectories manually, which makes the studies based on the supervised learning model face the same limitations as traditional travel survey research. Unlike images and natural language texts, the trajectory composed of GPS readings is difficult for researchers to understand, making annotating labels a very time-consuming task. Furthermore, it is often costly for a large number of users to annotate their trajectories. For example, as introduced by Stopher and Greaves [18], it costs USD 750 to collect a week’s worth of user data. Compared with the high-cost label annotation, a large number of unlabeled GPS trajectories are easy to obtain. Therefore, it is necessary to design a model that can help us automatically extract information that is helpful for transportation mode identification from the easily obtained unlabeled data.
Based on previous studies [15,17,18], there are two research questions for transportation mode identification on GPS trajectories: (1) How can heterogeneous features, such as motion features and geographic information, be integrated to represent a trajectory? (2) How can a semi-supervised transportation mode identification model be built with only a small number of labeled trajectories and a large number of unlabeled trajectories?
In order to solve the above problems, we propose a novel transportation mode identification framework, named GeoSDVA, in which we first generate the embedding of trajectory with the combination of the motion features and geographic information of each GPS reading of a raw trajectory, and then construct a semi-supervised transportation mode identification model based on a Dirichlet variational autoencoder. Accordingly, the main contributions of the study are summarized as follows:
  • We propose a new sample design method to fuse the heterogeneous features together, including the geographic information and motion features from the raw trajectories. This method integrates the motion features and geographic contexts to which the GPS trajectory is related, including the bus stations, road intersections, and the road network level, to form an embedded representation of the raw trajectory;
  • We construct a semi-supervised Dirichlet variational autoencoder (DirVAE)-based transportation mode identification model, named GeoSDVA, which consists of three modules: an encoding module to encode trajectories into latent vectors, a classifier module with multi-scale convolution to extract multi-scale features and predict the transportation modes of trajectories, and a decoding module that reconstructs the input trajectories;
  • We conduct extensive experiments on three real-world datasets, namely, Geolife v1.3, MTL Trajet 2017, and MTL Trajet 2016. The experimental results show that GeoSDVA outperforms other state-of-the-art frameworks. Furthermore, we evaluate the identification metrics of GeoSDVA under different amounts of labeled trajectories. The evaluation results show that GeoSDVA is practical for identifying the transportation mode of trajectories, especially when a small number of labeled trajectories are included in the training datasets.

2. Related Work

Most previous works have defined transportation mode identification based on GPS trajectories as a multi-class identification problem [19] that contains two sub-problems, i.e., feature or sample design and identification model design.

2.1. Feature or Sample Design

Feature design is a fundamental step in transportation mode identification. In pioneering studies [19,20,21,22], statistical features extracted from raw trajectories were widely used to identify transportation modes. In the work of Zheng et al. [19], statistical characteristics such as trajectory length, velocity, acceleration, heading change rate (HCR), stop rate (SR), and velocity change rate (VCR) were used. Sun et al. utilized the proportion of GPS readings in a trajectory whose acceleration exceeds a threshold as a feature for distinguishing between trucks and buses [20]. In another study, Nitsche et al. selected statistical features such as the 5th and 50th percentiles of velocity, acceleration, deceleration, and heading to classify transportation modes [21]. Compared with the above-mentioned studies, Zhu et al. adopted statistical characteristics such as time slice type (TS), acceleration change rate (ACR), 85th percentile velocity, and acceleration [22]. However, most of these feature selection methods are subjective and time-consuming.
GPS readings collected from residents reflect complex urban environments and traffic conditions [19]. Therefore, to improve the accuracy of transport mode identification, an external GIS context, such as road traffic network, public infrastructure, or POI distribution, has been gradually introduced in some studies. Semanjski et al. pointed out that the spatial distribution represents the different traffic modes in cities [17]. Stenneth et al. extracted statistical characteristics of urban GIS features, such as the distance between bus stations and the residents’ locations [9]. Kasahara et al. used urban GIS features, such as bus lines, subway lines, expressways, and pedestrian areas [16]. However, these studies rely on artificial feature extraction from geographic information, resulting in their under-utilization.
An increasing number of studies have adopted deep learning models to automatically learn potential features from raw trajectories [23,24]. However, it is still necessary to perform proper data preprocessing on the raw trajectory to generate samples before the raw trajectory is fed to the deep learning model.
Some studies turn the trajectory into image samples. Endo et al. proposed a sample design method that matches raw trajectories to grid images [23]. Zhang et al. adopted the same method, turning the trajectories into multi-view images [25]. However, only the raw trajectory’s spatial information, with no motion characteristics, can be extracted using this method. Moreover, the process for constructing the trajectory images relies on many hyper-parameters, such as geographic span, grid size, and fixed sampling rate. This kind of method maintains the shape of the trajectory but cannot effectively preserve its motion features. Dabiri et al. processed the raw trajectories as segments with a fixed number of GPS readings, and then attached four classic motion features, namely velocity, acceleration, jerk, and heading change rate, to each segment [24,26]. Nawaz et al. used the motion feature vector and social information [27]. James et al. merged the motion feature vectors and wavelet transform features to identify transport modes [28]. However, these methods do not consider the geographic information related to the raw trajectory.
Currently, deep learning models have been used in a few studies to fuse the motion characteristics of raw trajectories with geographic information for identification tasks. For example, Xiao et al. fused land-use type, transportation modes, and personal information to identify the purpose of a user’s trip through an artificial neural network [29]. Servizi et al. combined the raw GPS trajectory with land-use type and public facilities to detect stop points through a deep learning model [30]. Thus, using deep learning models to automatically extract and fuse potential features from geographic information and GPS trajectories for transportation mode identification is a promising approach.

2.2. Identification Model Design

The identification model is another critical component of urban transportation mode identification. Previous works mainly involve traditional machine learning models and deep learning models.
Traditional machine learning models have been based mostly on statistical features for transportation mode identification, such as the prior probability-based heuristic method [19], the tree-based ensemble model [31], Bayesian networks [32], and the LightGBM classifier [33]. Those methods achieved good identification results based on GPS data in some applications. However, they are highly dependent on features extracted in the previous process, which are subjective and time-consuming.
Supervised deep learning models have been widely applied to transportation mode identification in recent years. Previous studies showed that deep learning models make it possible to learn potential features from samples, usually from trajectory images or motion feature vectors. For example, Endo et al. extracted deep features from trajectory images by using a fully connected deep neural network [23]. Zhang et al. introduced a complex multi-scale convolutional neural network to extract deep features at different time and space scales from multi-view trajectory images [25]. However, the motion features of trajectories cannot be extracted with those models, resulting in poor identification accuracy. Yazdizadeh et al. designed a large ensemble library consisting of a series of CNN (convolutional neural network) models with different hyper-parameter values and architectures to identify transportation modes [22]. James et al. learned deep features from the motion features with a stacked LSTM network, combined with wavelet transform features, to infer the travel mode [28]. The limitation of their studies is that they did not take full advantage of latent spatial features. Nawaz et al. proposed a convolutional LSTM-based method to identify transport modes with motion feature vectors and social information [27]. However, all of these are supervised learning methods, which struggle with datasets that contain few labeled GPS trajectories.
To overcome the above shortcomings, a few studies have explored the application of semi-supervised models in transportation mode identification. Among these, Dabiri et al. constructed a convolutional autoencoder-based semi-supervised model to identify transportation modes [26]. However, in their model, the classifier’s output fails to participate in the reconstruction process of the decoder. Li et al. expanded the labeled trajectory samples using generative adversarial networks [34]. After that, they adopted the expanded dataset to train a CNN-based transportation mode identification model. Similarly, Li et al. treated transportation mode identification as a dense classification task and constructed a similarity entropy-based semi-supervised encoder–decoder network [35]. In general, the semi-supervised learning of transportation mode identification remains an understudied topic.

2.3. Semi-Supervised Variational Autoencoder

The semi-supervised variational autoencoder (semi-VAE) was first introduced to image classification by Kingma et al. [36]. In addition to computer vision, the semi-supervised VAE is also widely used in natural language processing. The traditional semi-supervised deep learning model based on an encoder–decoder network generally makes the encoder and classifier share weights to take advantage of the potential features from the unlabeled datasets [26]. Different from the traditional model mentioned above, in the semi-supervised VAE model, the actual class label or the label predicted by the classifier is added to the input of the encoder module and to the input of the decoder module, so that the model can learn from both labeled and unlabeled data. In recent years, some studies have stressed the potential of the VAE model for spatio-temporal data mining [37,38]. Due to the generation and inference abilities of VAEs, the problem of a lack of samples or labels in spatio-temporal data can be alleviated. The classical VAE assumes that the prior distribution is a Gaussian distribution, from which it learns the approximate μ (mean) and σ (standard deviation) [36]. Compared with the Gaussian distribution, the Dirichlet distribution can better model multi-modal distributions. In order to make use of the advantages of the Dirichlet distribution, the DirVAE [39] model is obtained by replacing the prior distribution of the classical VAE with the Dirichlet distribution. Inspired by the work available so far, we designed a novel semi-supervised DirVAE-based framework named GeoSDVA to identify transportation modes on datasets with a large number of unlabeled and a small number of labeled trajectories. In summary, this paper proposes a novel framework named GeoSDVA to close the research gap in transportation mode identification. GeoSDVA mainly consists of the two following steps: first, extracting motion features from raw GPS trajectories and fusing them with nearby geographic information; second, training a semi-supervised DirVAE model designed for multiple trajectory features.

3. Methodology

As is shown in Figure 2, the overall framework of GeoSDVA is composed of two main components: trajectory embedding and the semi-supervised DirVAE model. In the first part, typical motion features, such as velocity, acceleration, and the heading rate of each GPS reading from the raw GPS trajectory, are computed. Then, the geographic information near the raw trajectory is encoded and fused with the motion features sequence. In the second part, we build a semi-supervised DirVAE model designed for trajectory features, including the encoding, decoding, and classifier modules. While the encoder module extracts potential temporal features from continuous trajectory feature sequences through a GRU, the decoder module adopts a dense network to reconstruct the trajectory feature sequence. The classifier module adopts a multi-scale one-dimensional convolution to extract the multi-scale potential features of the trajectory feature sequence for better classification. Before introducing GeoSDVA in detail, we first list the important mathematical symbols and the corresponding explanations in Table 1.

3.1. Trajectory Embedding

Definition 1.
GPS trajectory: A GPS trajectory is a sequence composed of n GPS readings, $T = \{p_1, p_2, p_3, \ldots, p_n\}$. Each reading is a triplet consisting of latitude, longitude, and timestamp, where $p_i = (lat_i, lng_i, timestamp_i)$ represents the i-th GPS reading of a GPS trajectory.
GPS readings are often affected by the environment during collection, such as urban canyons or human interruptions, resulting in missing movement readings. In order to identify the transportation mode of the trajectories, we need to extract a series of features from the GPS readings. If the time interval between two GPS readings is too long, abnormal features will be extracted. In order to ensure the continuity of the GPS trajectory, we use a time interval threshold to segment the GPS trajectory. Therefore, in data preprocessing, we divide the raw trajectory into sub-trajectories called trips when the time interval between two adjacent GPS readings is greater than the threshold, set as 20 min, as in the studies mentioned in [19,24].
Definition 2.
Trip: A trip is a sub-trajectory consisting of m consecutive GPS readings. For any two adjacent GPS readings in the trip, the time interval between them is less than a certain time threshold.
For those already-labeled trips in the datasets, we divide each of them into segments according to transportation mode; following that, each segment is labeled as a single transportation mode. For unlabeled trips, we adopt a heuristic algorithm [19] to segment them. Then, we divide the preprocessed segments using a fixed-length window. Subsequently, each segment is utilized to generate a feature vector of two parts: the motion feature and the geographic feature. The motion feature contains the segment’s motion characteristics, and the geographic feature composes the geographic information in the vicinity of a segment. The design of the two parts will be described separately in the following process.
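As a minimal illustration of this preprocessing, the sketch below splits a raw trajectory into trips at 20-minute gaps and cuts each trip into fixed-length windows. It assumes readings are (lat, lng, timestamp) tuples with datetime timestamps; the function names and the window length L = 100 (used later in Section 4.3) are illustrative, not the paper's exact implementation.

```python
from datetime import timedelta

# Sketch of the trip segmentation described above; names are illustrative.
GAP_THRESHOLD = timedelta(minutes=20)  # split threshold between adjacent readings
L = 100                                # fixed window length (number of GPS readings)

def split_into_trips(trajectory):
    """Split a trajectory (list of (lat, lng, timestamp)) at gaps > 20 minutes."""
    trips, current = [], [trajectory[0]]
    for prev, curr in zip(trajectory, trajectory[1:]):
        if curr[2] - prev[2] > GAP_THRESHOLD:
            trips.append(current)
            current = []
        current.append(curr)
    trips.append(current)
    return trips

def split_into_windows(trip, length=L):
    """Cut a trip into consecutive fixed-length windows of `length` readings."""
    return [trip[i:i + length] for i in range(0, len(trip), length)]
```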

3.1.1. Motion Feature

Definition 3.
Motion feature sequence: For a segment that contains L readings, the motion feature sequence is denoted as $Motion = \{motion_1, motion_2, \ldots, motion_L\}$, where $motion_i$ represents the motion characteristics of the GPS reading $p_i$.
This paper extracts three typical physical features, namely, velocity, acceleration, and heading change rate, to represent the segments’ motion features, following the previous literature [24,27,28]. The motion feature vector $motion_i$ of the GPS reading includes velocity $V_i$, acceleration/deceleration $A_i$, and bearing rate $B_i$.
To obtain the value of velocity $V_i$, we calculate the geographical distance between two GPS readings by Vincenty’s formula [40]. Then, the velocity $V_i$ at the GPS reading $p_i$ can be determined by the distance and time interval between $p_i$ and the next GPS reading $p_{i+1}$, as shown in Equation (1). Furthermore, the acceleration/deceleration of the GPS reading $p_i$ can be computed according to Equation (2). The heading change rate $B_i$ is the change of the angle between the forward trajectory direction and the due-north direction, which is calculated by Equations (3)–(5).
$V_i = \dfrac{\mathrm{Vincenty}(p_{i+1}, p_i)}{t_{i+1} - t_i}$  (1)
$A_i = \dfrac{V_{i+1} - V_i}{t_{i+1} - t_i}$  (2)
$y_i = \sin(lng_{i+1} - lng_i)\cos(lat_{i+1})$  (3)
$x_i = \cos(lat_i)\sin(lat_{i+1}) - \sin(lat_i)\cos(lat_{i+1})\cos(lng_{i+1} - lng_i)$  (4)
$B_i = \operatorname{arctan2}(y_{i+1}, x_{i+1}) - \operatorname{arctan2}(y_i, x_i)$  (5)
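For concreteness, the following sketch computes the three motion features of Equations (1)–(5) for a list of (lat, lng, t) readings with t in seconds. It substitutes the haversine formula for Vincenty's formula used in the paper, so the distances are approximate; the helper names are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine(p1, p2):
    """Approximate ground distance in metres between two (lat, lng, ...) readings."""
    lat1, lng1, lat2, lng2 = map(math.radians, (p1[0], p1[1], p2[0], p2[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def bearing(p1, p2):
    """Bearing from p1 to p2 relative to due north, Equations (3)-(4)."""
    lat1, lng1, lat2, lng2 = map(math.radians, (p1[0], p1[1], p2[0], p2[1]))
    y = math.sin(lng2 - lng1) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(lng2 - lng1))
    return math.atan2(y, x)

def motion_features(points):
    """points: list of (lat, lng, t); returns velocity, acceleration, bearing rate."""
    V = [haversine(points[i], points[i + 1]) / (points[i + 1][2] - points[i][2])
         for i in range(len(points) - 1)]                       # Equation (1)
    A = [(V[i + 1] - V[i]) / (points[i + 1][2] - points[i][2])
         for i in range(len(V) - 1)]                            # Equation (2)
    B = [bearing(points[i + 1], points[i + 2]) - bearing(points[i], points[i + 1])
         for i in range(len(points) - 2)]                       # Equation (5)
    return V, A, B
```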

3.1.2. Geographic Features

As discussed before, transportation modes in urban areas are also decided by the geographic environment to a certain extent. Therefore, we introduce geographic features as critical contextual information to represent the geographic environment related to the GPS trajectory and the trajectory embedding.
Definition 4.
Geographic feature sequence: For a segment that contains L readings, the geographic feature sequence is denoted as $Geo = \{geo_1, geo_2, \ldots, geo_L\}$, where $geo_i$ is a feature vector corresponding to the geographic information of the GPS reading $p_i$.
In this study, the geographic information of a GPS reading $p_i$ is denoted as $geo_i = [R_i, C_i, S_i]$. These features are derived from the road level, road intersections, and bus stations near the GPS readings.
Among them, the road level is considered to be a helpful feature for classifying vehicles on the road [15]. For example, many bike and walk trajectories appear on low-level roads, while cars and public transit are mostly distributed on high-level roads. Thus, we use $R_i$ to represent the road level where the GPS reading $p_i$ is located.
The structure of the road network affects the movement of both vehicles and pedestrians. For example, a vehicle passing through a road intersection could be controlled by a traffic signal, or it may need to give way, which will change its motion characteristics. According to Figure 3a, the trajectory of a car slows down when passing through an intersection. Therefore, we use the binary variable $C_i$ to represent the existence of road intersections within 100 m of the GPS reading $p_i$.
Bus stations on the road network can help distinguish between cars and public transit. The movement characteristics of cars and buses are similar, but the significant difference is that buses usually stop at stations periodically. For example, the bus trajectory shown in Figure 3b slows down near the bus stations. Therefore, we use the binary variable $S_i$ to represent the existence of bus stations within 100 m of the GPS reading $p_i$.
By combining the motion features with the geographic features, each GPS reading $p_i$ of the segment is expressed as a feature vector $e_i = [V_i, A_i, B_i, R_i, C_i, S_i]$. Each segment with a single transportation mode label is represented as a $6 \times L$ embedding sequence $Emb = \{emb_1, emb_2, emb_3, \ldots, emb_L\}$, where L is the number of GPS readings.
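A minimal sketch of this fusion step is given below. It assumes a ground-distance helper (e.g. the haversine function from the previous sketch) and pre-extracted GIS inputs (road levels per reading, intersection coordinates, bus station coordinates); these names are illustrative, not the paper's code.

```python
import numpy as np

RADIUS_M = 100.0  # proximity threshold used for the binary indicators C_i and S_i

def near_any(point, locations, distance_m, radius=RADIUS_M):
    """Return 1 if any location lies within `radius` metres of `point`, else 0."""
    return int(any(distance_m(point, loc) <= radius for loc in locations))

def embed_segment(points, V, A, B, road_levels, intersections, bus_stations, distance_m):
    """Fuse motion and geographic features into an (L, 6) embedding matrix."""
    length = min(len(points), len(V), len(A), len(B))  # align feature lists
    rows = []
    for i in range(length):
        C_i = near_any(points[i], intersections, distance_m)   # road intersection nearby?
        S_i = near_any(points[i], bus_stations, distance_m)    # bus station nearby?
        rows.append([V[i], A[i], B[i], road_levels[i], C_i, S_i])
    return np.asarray(rows, dtype=np.float32)
```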

3.2. Semi-Supervised DirVAE Model

Based on the semi-supervised VAE model proposed in [36] and on the DirVAE model proposed in [41], we design a novel semi-supervised DirVAE model tailored to trajectory features, consisting of an encoding module, a classifier module, and a decoding module for transportation mode identification; each module is introduced in detail below.

3.2.1. Encoding Module

The encoding module has two layers. The first layer is an encoding GRU (gated recurrent unit), which effectively captures potential temporal information and is widely used in trajectory mining. It takes the embedding sequence of a trajectory segment $Emb = \{emb_1, emb_2, emb_3, \ldots, emb_L\}$ as input, and outputs the encoded hidden state $h_i$ of each time step, $H = \{h_1, h_2, h_3, \ldots, h_L\}$. The process by which the GRU extracts potential temporal features is shown in Equations (6)–(9). The second layer is a dense network that takes H and the transportation mode label y as inputs and outputs the encoded latent vectors of this trajectory segment.
$z_t = \mathrm{sigmoid}(W_z \cdot [h_{t-1}, emb_t])$  (6)
$r_t = \mathrm{sigmoid}(W_r \cdot [h_{t-1}, emb_t])$  (7)
$\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, emb_t])$  (8)
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$  (9)
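As an illustration only, a modern Keras sketch of such an encoding module is shown below (the paper's implementation uses TensorFlow v1.8, and the layer sizes here are assumed, not the authors' hyper-parameters).

```python
import tensorflow as tf

L, FEATURES, NUM_CLASSES, LATENT_DIM = 100, 6, 4, 32  # illustrative sizes

emb_in = tf.keras.Input(shape=(L, FEATURES), name="trajectory_embedding")
label_in = tf.keras.Input(shape=(NUM_CLASSES,), name="transport_mode_label")

# GRU over the embedding sequence, returning H = {h_1, ..., h_L} (Equations (6)-(9)).
h_seq = tf.keras.layers.GRU(64, return_sequences=True)(emb_in)
h_flat = tf.keras.layers.Flatten()(h_seq)

# Dense layer that consumes H together with the (one-hot) mode label y.
latent = tf.keras.layers.Dense(LATENT_DIM, activation="relu")(
    tf.keras.layers.Concatenate()([h_flat, label_in]))

encoder = tf.keras.Model([emb_in, label_in], latent, name="encoder")
```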

3.2.2. Classifier Module

The classifier module is used to identify the transportation mode y of a trajectory segment from multi-scale features, and it parameterizes the distribution $q_\varphi(y \mid x)$. It consists of four parts. The first part is a multi-scale one-dimensional convolutional layer, which is made up of multiple parallel one-dimensional convolution networks with different kernel receptive fields. It is designed to extract the multi-scale features of trajectory segments over both the long and short term. The one-dimensional convolution process is shown in Equation (10):
$\mathrm{conv1D}(x_u^{l-1}) = \mathrm{ReLU}\left(\sum_{c=1}^{p} W_{u+c-1}^{l-1} x_{u+c-1}^{l-1} + b\right)$  (10)
where l is the order number of the one-dimensional convolution layer, p denotes the receptive kernel size, u is the order number of the units in the layer, and W is the trainable weight parameter. We use the ReLU activation function in this layer. A max-pooling layer is applied to retain essential features in the feature map after each one-dimensional convolution. Specifically, we use four one-dimensional convolutional networks with different kernel receptive field sizes to extract features at different scales.
The second part is a fusion layer. It is applied to fuse multiple potential feature maps with different scales, which were output from the previous layer. It connects all multi-scale feature maps in the dimension of the channel using a ReLU activation function, as shown in Equation (11):
$F = \mathrm{ReLU}([f_1, f_2, f_3, f_4])$  (11)
where F is the output of this layer and $f_i$ denotes the output of a convolutional network.
The third part is a GRU layer. The GRU can further extract potential temporal features from F for transportation mode identification. The hidden state $h_t$ of the last time step of the GRU is considered to represent the potential temporal features of F.
The last part is a dense network. It generates the probability that the current segment belongs to each transportation mode y from $h_t$ using a softmax function.
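The sketch below wires these four parts together in Keras; the kernel sizes, filter counts, and GRU width are assumptions for illustration rather than the paper's settings.

```python
import tensorflow as tf

L, FEATURES, NUM_CLASSES = 100, 6, 4  # illustrative sizes

x_in = tf.keras.Input(shape=(L, FEATURES))

# Part 1: four parallel 1D convolutions with different receptive fields (Eq. (10)),
# each followed by max-pooling to retain the essential features.
branches = []
for kernel_size in (3, 5, 7, 9):
    f = tf.keras.layers.Conv1D(32, kernel_size, padding="same", activation="relu")(x_in)
    f = tf.keras.layers.MaxPooling1D(pool_size=2)(f)
    branches.append(f)

# Part 2: channel-wise fusion of the multi-scale feature maps (Eq. (11)).
fused = tf.keras.layers.Activation("relu")(
    tf.keras.layers.Concatenate(axis=-1)(branches))

# Part 3: GRU whose last hidden state summarizes the fused sequence.
h_last = tf.keras.layers.GRU(64)(fused)

# Part 4: dense softmax layer producing the mode probabilities, i.e. q_phi(y|x).
y_prob = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(h_last)

classifier = tf.keras.Model(x_in, y_prob, name="classifier")
```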

3.2.3. Decoding Module

The decoding module is used to reconstruct the input trajectory segment. The module contains three parts. The first part is a dense network that learns the vector α (the parameters of the Dirichlet distribution) by taking the outputs of both the encoding module and the classifier module as inputs. Following [41], we use a softmax Gaussian distribution to approximate the Dirichlet distribution. The parameters μ and σ of the Gaussian distribution can be calculated from the parameters α of the Dirichlet distribution using Equations (12) and (13) below:
$\mu_i = \log \alpha_i - \dfrac{1}{K} \sum_{j} \log \alpha_j$  (12)
$\sigma_i^2 = \dfrac{1}{\alpha_i}\left(1 - \dfrac{2}{K}\right) + \dfrac{1}{K^2} \sum_{j} \dfrac{1}{\alpha_j}$  (13)
where K is the dimension of the parameter tensor, i is the index of the parameter, μ and σ are the mean and standard deviation of the Gaussian distribution, and α is the parameter of the Dirichlet distribution.
The second part is a sampling layer used to generate the latent variables z by sampling from the Dirichlet distribution. Approximately, we sample z from the softmax Gaussian distribution through a reparameterization trick. The process is shown in Equation (14):
$z = \epsilon \odot \sigma + \mu, \quad \epsilon \sim \mathcal{N}(0, 1)$  (14)
where z is the latent vector extracted by the variational encoder and ϵ is the random error tensor subject to the standard normal distribution.
The third part is a dense decoding network that is used to reconstruct the input trajectory segment and parameterize the following distribution, which can be formulated as Equation (15):
$p_\theta(x \mid z, y) = D(x \mid f_d(z, y))$  (15)
where $D(\cdot)$ denotes the distribution of the input trajectory segments and $f_d(\cdot)$ denotes the function of the dense decoding network.
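The following sketch shows how Equations (12)–(14) can be implemented: the Dirichlet parameters α are mapped to the mean and standard deviation of the softmax Gaussian approximation, and z is drawn with the reparameterization trick. Tensor shapes and function names are assumptions.

```python
import tensorflow as tf

def dirichlet_to_gaussian(alpha):
    """alpha: (batch, K) positive Dirichlet parameters -> (mu, sigma) per Eqs. (12)-(13)."""
    K = tf.cast(tf.shape(alpha)[-1], alpha.dtype)
    log_alpha = tf.math.log(alpha)
    mu = log_alpha - tf.reduce_mean(log_alpha, axis=-1, keepdims=True)
    var = ((1.0 / alpha) * (1.0 - 2.0 / K)
           + (1.0 / K ** 2) * tf.reduce_sum(1.0 / alpha, axis=-1, keepdims=True))
    return mu, tf.sqrt(var)

def sample_latent(alpha):
    """Reparameterized sample z = eps * sigma + mu with eps ~ N(0, 1), Eq. (14)."""
    mu, sigma = dirichlet_to_gaussian(alpha)
    eps = tf.random.normal(tf.shape(mu))
    return eps * sigma + mu
```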

3.2.4. Model Objective

Transportation mode identification is a typical multi-classification problem. For labeled trajectory segments, the distribution $q_\varphi(y \mid x)$ of the classifier module can be optimized by the cross-entropy loss, which is defined as Equation (16):
$L_c = \dfrac{1}{N} \sum_{i} L_i = -\dfrac{1}{N} \sum_{i} \sum_{c=1}^{M} y_{ic} \log p_{ic}$  (16)
where M is the number of labels, $y_{ic}$ indicates whether sample i belongs to label c, and $p_{ic}$ represents the predicted probability that sample i belongs to label c.
The objective function of our model is derived from what has been referred to by [36]. For trajectory segments with transportation mode labels in the dataset, the objective function is defined as Equation (17):
$L_l = \mathbb{E}_{q_\varphi(z \mid x, y)}\left[\log p_\theta(x \mid y, z) + \log p_\theta(y) + \log p(z) - \log q_\varphi(z \mid x, y)\right]$  (17)
where q φ represents the encoder network and p θ represents the decoder network.
Compared with the above-mentioned function, the objective function for trajectory segments without transportation mode labels in the dataset is defined as Equation (18):
$L_u = \mathbb{E}_{q_\varphi(y, z \mid x)}\left[\log p_\theta(x \mid y, z) + \log p_\theta(y) + \log p(z) - \log q_\varphi(y, z \mid x)\right]$  (18)
where p θ and q φ still represent the decoder and encoder networks, respectively.
Finally, the model objective function is obtained by the combination of objective functions corresponding to the three parts, which is defined as Equation (19):
$L_{total} = L_c + \gamma (L_l + L_u)$  (19)
where γ is the hyper-parameter used to control the relative importance of classifier loss and autoencoder loss.
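A minimal sketch of Equation (19) is shown below. It assumes the labeled and unlabeled objective terms of Equations (17) and (18) have already been evaluated (as losses) elsewhere; only the weighting with γ and the cross-entropy term of Equation (16) are illustrated.

```python
import tensorflow as tf

GAMMA = 0.1  # value used in the experiments (Section 4.3)

def total_loss(y_true, y_pred, loss_labeled, loss_unlabeled, gamma=GAMMA):
    """L_total = L_c + gamma * (L_l + L_u), Equation (19)."""
    # Cross-entropy over labeled segments, Equation (16).
    l_c = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true, y_pred))
    return l_c + gamma * (loss_labeled + loss_unlabeled)
```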

4. Experiment Results

In this section, we explain and discuss the results of GeoSDVA through a series of experiments. All experiments are performed on the same PC with a Core i7 3.20 GHz processor and 32.0 GB memory. All models and experiments are implemented via TensorFlow v1.8 and Scikit-learn.

4.1. GPS Data

In this paper, three real-life datasets, Geolife V1.3, MTL Trajet 2017, and MTL Trajet 2016, are used to evaluate the proposed approach. An overview of these three datasets is shown in Table 2.
The Geolife v1.3 dataset collected five years’ worth of GPS data from 182 users between April 2007 and August 2012. Among these users, 73 of them have marked their transportation modes. The transportation mode labels of trajectories in the Geolife dataset mainly include walk, bike, bus, car, taxi, train, subway, airplane, boat, run, etc.
The MTL Trajet 2017 dataset collected the GPS data of 4425 users from 18 September to 15 October 2017. In the MTL Trajet 2017 dataset, user-labeled transportation modes include public transportation, walk, car/motorcycle, bike, car sharing, taxi, etc.
The MTL Trajet 2016 dataset collected GPS data from September to November 2016. In this dataset, user-labeled transportation modes include public transportation, walk, automobile, bike, etc.
There are three main reasons why we use these datasets to explore the performance of the GeoSDVA model. First, most of the trajectories in these datasets are collected in an urban environment, so we can obtain the corresponding GIS data through open data sources. Second, there are more than three transportation modes in these datasets, which enables us to verify the transportation mode identification performance of GeoSDVA. Third, the datasets are composed of some labeled and some unlabeled trajectories, which enables us to explore the semi-supervised learning ability of GeoSDVA.
In order to ensure that the GPS trajectories can be associated with GIS data, we filtered the data according to geographical areas. For the Geolife dataset, we selected the GPS readings in Beijing (116.15° E, 39.75° N to 116.6° E, 40.1° N), accounting for about 70% of the total dataset. For the MTL Trajet 2017 and the MTL Trajet 2016 datasets, we selected the GPS readings in Montreal (73.942° W, 45.415° N to 73.479° W, 45.701° N), accounting for about 90% of each dataset.
We selected only ground transportation modes for the experiment. In the Geolife dataset, we selected the trajectories labeled as taxi, car, bus, walk, and bike, with car and taxi merged into a single category labeled as “car”. Finally, the experimental dataset covers four transportation modes: bike, public transit (bus), car, and walk. The trajectories with other labels or without labels are considered unlabeled trajectories. In the MTL Trajet 2017 dataset, we selected the trajectories labeled as public transit, taxi, car-sharing, car/motorcycle, bicycle, and walking for the experiment. The data labeled as car-sharing, taxi, and car/motorcycle are combined as “car”. Ultimately, the dataset contains four labels: walk, public transit, car, and bike. The trajectories with multiple labels, with other labels, and without labels are regarded as unlabeled trajectories. In the MTL Trajet 2016 dataset, we selected the trajectories labeled as walk, bike, public transit, and automobile. The data labeled as automobile are relabeled as “car”. Finally, the dataset contains walk, bike, public transit, and car. The distribution of trajectories with various labels in each dataset is shown in Table 2.

4.2. GIS Data

The GIS data we use mainly comes from the road network data of OSMnx [42]. The road networks of Beijing and Montreal are shown in Figure 4. Road network data consists of road network nodes and road network edges. Taking the Beijing road network data as an example, Table 3 shows partial data of the road network edges, and Table 4 shows partial data of the road network nodes. For the road network edges, we determine the level of the road network based on the value of the highway attribute, which mainly includes “motorway”, “trunk”, “primary”, “secondary”, “tertiary”, “unclassified”, and “residential”. For the road network nodes, we filter the nodes with the highway attribute values of “crossing”, “traffic_signals”, and “motorway_junction”. For the bus station data, we use public data sources to extract their coordinates: the data of bus stations in Beijing comes from public SHP files, and the data of bus stations in Montreal comes from the Google Transit API.
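As a sketch of this GIS preprocessing, the snippet below downloads the drivable road network for the Beijing bounding box with OSMnx and keeps the attributes used in the text (the edge highway tag as the road level, and nodes tagged as crossings, traffic signals, or motorway junctions). Note that the argument format of graph_from_bbox differs between OSMnx versions, so this call may need adjusting.

```python
import osmnx as ox

# Beijing extent from Section 4.1; argument order follows older OSMnx versions
# (north, south, east, west) and may need to be passed as a bbox tuple in newer ones.
G = ox.graph_from_bbox(40.1, 39.75, 116.6, 116.15, network_type="drive")
nodes, edges = ox.graph_to_gdfs(G)

# Road levels used as the R_i feature.
ROAD_LEVELS = ["motorway", "trunk", "primary", "secondary",
               "tertiary", "unclassified", "residential"]
edges = edges[edges["highway"].astype(str).isin(ROAD_LEVELS)]

# Nodes treated as road intersections for the C_i feature.
INTERSECTION_TAGS = {"crossing", "traffic_signals", "motorway_junction"}
intersections = nodes[nodes["highway"].astype(str).isin(INTERSECTION_TAGS)]
```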

4.3. Experiment Setup

In the experiment, several typical evaluation metrics, such as accuracy, precision, recall, and macro F1-score, are employed to evaluate the performance of the proposed model. The calculation methods in this paper are shown in Equations (20)–(24). In the confusion matrix of the multi-class identification, when these evaluation metrics are calculated for each category, the samples that belong to this category are considered positive, and the samples that do not belong to this category are negative. Accuracy is used to evaluate the overall identification of the model on the test set. Since the transportation modes in the datasets are unbalanced, the macro F1-score is used to evaluate the overall performance of the model in identifying each transportation mode on the test set.
$Accuracy = \dfrac{TP + TN}{TP + FP + FN + TN}$  (20)
$Precision = \dfrac{TP}{TP + FP}$  (21)
$Recall = \dfrac{TP}{TP + FN}$  (22)
$F1\text{-}Score = \dfrac{2 \cdot Precision \cdot Recall}{Precision + Recall}$  (23)
$MacroF1 = \dfrac{1}{N} \sum_{i=1}^{N} F1\text{-}Score_i$  (24)
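In practice, these metrics can be computed directly with Scikit-learn (which the experiments already use); the short sketch below is one way to do so, with hypothetical label arrays.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """Overall accuracy, per-mode precision/recall/F1, and the macro F1-score."""
    accuracy = accuracy_score(y_true, y_pred)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=None, zero_division=0)  # one score per transportation mode
    macro_f1 = f1_score(y_true, y_pred, average="macro")
    return accuracy, precision, recall, f1, macro_f1
```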
In all experiments, L is set to 100. If the number of GPS readings in a segment is less than 100, we use linear interpolation to fill it. At the beginning of model training, the loss of the encoder is much greater than the loss of the classifier, which would lead the optimizer to ignore the classifier loss. In order to alleviate this problem, we set γ in the loss function to 0.1 to weigh the relative importance of the classifier loss and the encoder loss.
In the following subsections, we evaluate the performance of GeoSDVA on semi-supervised transportation mode identification. First, the proposed GeoSDVA is compared with other baseline methods to explore its ability to use information from unlabeled trajectories under the guidance of different numbers of labeled trajectories, obtained by adjusting their ratios. Second, we analyze how the knowledge learned by DirVAE from unlabeled trajectories helps the task of transportation mode identification. Finally, the transportation modes of unlabeled trajectories in a specific area are identified using the trained GeoSDVA, and the results are analyzed.

4.4. Benchmarks

To illustrate the advantages of GeoSDVA, we compare the proposed model with several baseline methods, listed below:
  • DT: This method, developed by Zheng et al. [19], extracts some statistical features from the GPS trajectory segment and identifies the transportation mode through a C4.5 decision tree;
  • RF: This method, developed by Xiao et al. [31], utilizes tree-based ensemble models, such as random forests, to infer the transportation modes;
  • SVM: This method, developed by Bolbol et al. [43], uses segment-level statistical features and an SVM to identify transportation modes;
  • DNN: This method, developed by Endo et al. [23], uses a dense network to extract features from the images composed of meshed trajectories and to identify transportation modes;
  • CNN-GAN: This method, developed by Li et al. [34], uses generative adversarial networks to expand the labeled trajectory dataset, and adopts the enhanced dataset to train a six-layer CNN network for transportation mode identification;
  • SECA: This model, developed by Dabiri et al. [44], follows the autoencoder framework for semi-supervised learning and uses a six-layer CNN network as both an encoder and a classifier to share weight between the labeled and unlabeled trajectories.
  • Pseudo-label: This model, developed by Dabiri et al. [44], uses a self-training strategy for semi-supervised learning;
  • Two-step: This model, developed by Dabiri et al. [44], features a two-step training model which first trains the encoder–decoder network and then trains the classifier based on the encoder’s latent vector.
It should be noted that the sample forms required by these methods are different. For experimental fairness, all samples are generated from the same dataset composed of raw GPS trajectory segments. During the experiment, labeled trajectories are divided into training and test datasets at a ratio of 8:2, and all unlabeled trajectories participate in the training. To vary the number of labeled trajectories, we randomly sample the training dataset according to the label proportion, while the unlabeled dataset remains unchanged.

4.5. Identification Performance

First, we validate the advantage of our proposed GeoSDVA model over its counterparts. We build GeoSDVA and the other baseline methods separately with varying amounts of labeled trajectories, and then evaluate them on the same test dataset. The evaluation results of GeoSDVA and the other baseline methods on the three real datasets are listed in Table 5, Table 6 and Table 7, with the best performances shown in bold. The percentage in the first row of the tables represents the sampling rate of the labeled training dataset. For example, 10% means that only 10% of the labeled trajectories participated in the training.
According to Table 5, Table 6 and Table 7, GeoSDVA performs better in identifying transportation modes than the baseline methods under any sampling rate of labeled trajectories. The SVM, DT, and DNN methods, which are based on supervised learning, generally show disadvantages compared with the other methods. When the sampling rate of labeled data is low, RF shows some advantages over the other supervised learning methods. However, its identification performance is not good enough when the sampling rate of labeled data is high. Moreover, it is worth noting that the two-step model performs poorly on both datasets, especially on Geolife, as can be seen from Table 5. In the two-step model, the classifier does not participate in the first training step, which may lead to the encoder’s failure to learn information that is helpful for transportation mode identification. Compared with SECA, pseudo-label, and CNN-GAN, the macro F1-score of GeoSDVA improves by 4% to 7% on the Geolife dataset, by 0.4% to 3% on the MTL Trajet 2017 dataset, and by over 0.8% on the MTL Trajet 2016 dataset when the sampling rate is 10%. SECA uses an autoencoder and classifiers with shared weights for semi-supervised learning. Due to the lack of constraints on the latent vector learned by the autoencoder, SECA cannot effectively use the unlabeled trajectories when there is less labeled data. The pseudo-label method continuously labels the unlabeled trajectories during training, but the pseudo-labels of the unlabeled trajectories may be wrong. The CNN-GAN method uses labeled trajectories to expand the dataset through a GAN network, rather than using unlabeled trajectories for learning. When all labeled data are used to train the model, GeoSDVA can still achieve the best performance. Compared with the other methods, its macro F1-score improves by about 2% on the Geolife dataset, by 0.6% to 1.6% on the MTL Trajet 2017 dataset, and by at least 2% on the MTL Trajet 2016 dataset. This is because GeoSDVA uses GIS features that are not used by the other models, which makes it easier to distinguish between transportation modes.
Overall, there are two main reasons why GeoSDVA has advantages over the baseline models. First, GeoSDVA combines the geographic information around the trajectories, which makes it easier to distinguish between different transportation modes, resulting in a higher macro F1-score. Second, compared with the semi-supervised models based on deep learning, such as the SECA, two-step, and CNN-GAN methods, the semi-supervised DirVAE in GeoSDVA can better extract features from the unlabeled trajectories, enabling a better performance when there are few labeled trajectories. We analyze the role of these two factors in the following subsections.

4.6. Advantage of Semi-Supervised DirVAE

In this subsection, we analyze the advantage of the semi-supervised DirVAE in the model. First, with varying amounts of labeled trajectories, we separately build GeoSDVA and supervised-GeoSDVA, a variant model that only retains the classifier module. We test the two models on the same test dataset and analyze the impact of the semi-supervised DirVAE on transportation mode identification. Table 8 shows the comparison between GeoSDVA and supervised-GeoSDVA. The comparison results indicate that GeoSDVA performs significantly better than supervised-GeoSDVA, especially in the case of fewer labeled trajectories. It can be seen that the macro F1-score of GeoSDVA improves by 7% to 8% when 10% of the labeled trajectories are used in training. When all labeled trajectories are involved in training, the macro F1-score of GeoSDVA is about 2–6% higher than that of supervised-GeoSDVA. The experiment results show that, no matter how many labeled trajectories participate in the training, the transportation mode identification performance of the model is improved when DirVAE is added. When the proportion of labeled trajectories is relatively low, the improvement brought by DirVAE is very obvious, especially in terms of the macro F1-score. Take the identification results on the MTL Trajet 2016 dataset with a labeled data proportion of 25% as an example: Figure 5a,b shows the confusion matrices of GeoSDVA and supervised-GeoSDVA, respectively. Due to the small number of labeled trajectories, supervised-GeoSDVA incorrectly identifies a large number of trajectories as car and public transit. In comparison, the identification result of GeoSDVA does not have this problem. This proves that the addition of DirVAE enables the model to obtain better identification results when there are fewer labeled trajectories.
To further analyze how the semi-supervised DirVAE uses unlabeled data to help with transportation mode identification, we extract the latent vectors of the DirVAE encoder module from the pre-trained GeoSDVA. For each of the three datasets, a random sample of 20,000 trajectory segments is taken from the labeled and unlabeled trajectories. These trajectories are then input into the pre-trained GeoSDVA to obtain their latent vectors. Figure 6 shows the distribution of the latent vectors after t-SNE dimensionality reduction.
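The projection can be reproduced along the lines of the sketch below, where encoder stands for the trained encoding module (as in the earlier Keras sketch) and segments / labels for the sampled arrays; these names are assumptions.

```python
from sklearn.manifold import TSNE

def latent_tsne(encoder, segments, labels):
    """Return 2-D t-SNE coordinates of the DirVAE latent vectors (cf. Figure 6)."""
    latents = encoder.predict([segments, labels])  # (n_samples, latent_dim)
    return TSNE(n_components=2, init="pca", random_state=0).fit_transform(latents)
```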
Overall, the latent vectors of trajectories with different transportation modes extracted by DirVAE have clear boundaries. From the visualization of the latent vectors of Geolife, it is found that the latent vectors of the car (green scatters) and public transit (pink scatters) labels have clear boundaries, but the latent vectors of the bike (blue scatters) and walk (yellow scatters) labels are intertwined in some regions. This is because some walk and bike trajectories in the Geolife dataset are in similar geographical environments and have similar motion features. Moreover, a clear boundary between the walk (blue scatters) label and the other trajectories appears in the visualization of the latent vectors of the MTL Trajet 2017 dataset. By comparison, the boundary between the latent vectors of the public transit (pink scatters), bike (blue scatters), and car (green scatters) labels is not very obvious. On the MTL Trajet 2016 dataset, we also observed that car (green scatters) and public transit (pink scatters) are intertwined. There are two reasons for this result. First, car trajectories dominate: in the MTL Trajet 2017 dataset, car trajectories are much more common than those of other transportation modes, and in the MTL Trajet 2016 dataset, car trajectories account for more than 60% of the total. Second, some trajectories labeled as public transit in the MTL Trajet 2017 and MTL Trajet 2016 datasets do not stop near bus stations periodically during movement. These trajectories may not be recorded under bus or railway mode, which makes it difficult to distinguish them from car trajectories.

4.7. Influence of Geographic Information

In this subsection, we analyze the role of geographic information in transportation mode identification. We speculate that, in urban areas, the geographic information around the trajectory is helpful for distinguishing transportation modes. We delete the three features related to geographic information from the trajectory embedding, and then train the GeoSDVA model on each dataset. In Table 9, we use “Motion+GIS” and “Motion” to distinguish between the models with different features.
According to the experimental results, on the Geolife dataset, the F1-scores of public transit, car, and walk are improved with the supplementary geographic information. Among them, the F1-score of the car trajectories increases most significantly, by 6%. At the same time, the F1-score of the bike trajectories decreases by about 0.5%. On the MTL Trajet 2017 dataset, the supplement of geographic information improves the F1-scores of all transportation modes, as demonstrated in Table 9, with increases for public transit, bike, car, and walk of 0.6%, 0.4%, 0.1%, and 1%, respectively. On the MTL Trajet 2016 dataset, the supplement of geographic information improves the F1-scores of public transit, bike, car, and walk by 6%, 0.6%, 1.2%, and 4.8%, respectively. From a macro point of view, the identification performance of GeoSDVA is improved with the supplement of geographic information on all three datasets.
We visualize the confusion matrices of the three datasets with different features, as shown in Figure 7, Figure 8 and Figure 9. On the Geolife dataset, the typical misclassification occurs between bike and walk, as well as between public transit and car. The addition of GIS features improves the identification accuracy of public transit, bike, and car, but reduces the identification accuracy of walk. According to the confusion matrix, the misclassification of public transit as car is reduced by 0.8%, and that of car as public transit is reduced by about 1.4%. Similarly, the misclassification of bike as walk is reduced by about 1.2%, while the misclassification of walk as bike increases by 0.6%. The reason for this may be that walk and bike have similar GIS features. On the MTL Trajet 2017 dataset, it is difficult to correctly identify public transit trajectories, which are incorrectly identified as car and walk. The addition of GIS features improves the identification accuracy of all transportation modes. However, the addition of GIS features cannot effectively distinguish public transit from car: the proportion of misclassifications of public transit as car has hardly changed, and the improvement for public transit is mainly due to fewer misclassifications as walk and bike. The public transit trajectories in this dataset may include other modes, such as ferry cars, that do not stop at bus stations, which makes it difficult to distinguish between public transit and car. On the MTL Trajet 2016 dataset, the results are similar to those on the MTL Trajet 2017 dataset. GIS features improve the identification accuracy of all transportation modes. The addition of GIS features does not increase the proportion of public transit trajectories misclassified as car trajectories, but significantly reduces the proportion of public transit trajectories misclassified as walk trajectories. Moreover, the proportion of bike misidentified as car is reduced by 2%, and the proportion of car misidentified as public transit is reduced by 2.3%; the walk identification is not improved significantly. In general, the addition of GIS features improves the accuracy of transportation mode identification on each dataset.

4.8. Case Study

In this subsection, we use GeoSDVA to analyze the distribution of transportation modes in a specific area of Beijing, located between (116.300831° E, 39.971985° N) and (116.333889° E, 40.026152° N), as shown in Figure 10a. In this area, there is a university campus, around which there are some roads. In order to better explain our results, we visualize the road levels and bus stations in this area, as shown in Figure 10b. It can be seen that the university campus is surrounded by “Primary” and “Secondary” level roads, while the road levels inside the university campus are “Unclassified” and “Residential”. In addition, the cyan diamond symbols represent the locations of bus stations. In this area, bus stations are mainly distributed on high-grade roads, such as “Primary” and “Secondary” level roads. Obviously, there are no bus stations inside the university campus. We selected all trajectories located in this area that were not marked by the user from the Geolife dataset, and utilized the pre-trained GeoSDVA model to identify the transportation modes of these trajectories. Since the GeoSDVA model requires an input with fixed length, we use a sliding window and a majority vote to determine the transportation mode of a trajectory.
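A minimal sketch of this sliding-window voting step is shown below; predict_mode is a hypothetical wrapper that returns the mode predicted by the trained GeoSDVA for one fixed-length window, and the window and stride values are illustrative.

```python
from collections import Counter

def trip_mode(trip_embedding, predict_mode, window=100, stride=50):
    """Predict a mode per sliding window and return the majority vote for the trip."""
    votes = [predict_mode(trip_embedding[i:i + window])
             for i in range(0, max(1, len(trip_embedding) - window + 1), stride)]
    return Counter(votes).most_common(1)[0][0]
```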
Figure 11 shows the trajectories of the four transportation modes identified by GeoSDVA in this area. The identification results indicate that a large number of walk and bike trajectories exist inside the university campus, while public transit and car trajectories rarely appear there. Inside the university campus, people tend to walk and use bikes, while the use of cars in this area may be limited. For a better explanation, we provide statistics on the proportion of various transportation modes inside and outside the university campus, as shown in Figure 12. Inside the university campus, about 14.3% of the trajectories are walk trajectories and 42.8% are bike trajectories. In contrast, only 2% of the trajectories found inside the university campus are public transit ones, and 3% of them are car trajectories. In addition, we also found some other phenomena. In the northeast section of the area, there is a “Motorway” level road. Walk and bike trajectories are not identified on this road, because these two modes are not allowed to use it. We identified fewer public transit trajectories than car trajectories inside the university campus. This is because there is no bus station inside the university campus, so the bus station feature becomes strong support for distinguishing between the trajectories of cars and public transit in this area. Moreover, we also observed that, compared with the car trajectories, the public transit trajectories appear on roads with a higher road level, which is also determined by the location of bus stations. In general, the identification results of GeoSDVA are consistent with common sense in reality, and the results have reference value.

5. Discussion and Conclusions

This paper proposes a novel semi-supervised model named GeoSDVA to identify residents’ transportation modes. First, the input of GeoSDVA is designed for GPS readings that combine motion features with geographic information. Second, a semi-supervised model based on DirVAE is built to learn features that are helpful for transportation mode identification from unlabeled trajectories. Then, with the evaluation of GeoSDVA on three real GPS trajectory datasets, it is shown that GeoSDVA has the ability of semi-supervised learning when only a few trajectories are labeled, and its performance in transportation mode identification is better than that of the baseline methods. After that, we analyze how the DirVAE module obtains helpful information from unlabeled trajectories for transportation mode identification. Moreover, we analyze the impact of geographic information on the identification of specific transportation modes. Finally, we use a pre-trained GeoSDVA to label and analyze the unlabeled trajectories in an area, which shows that the identified results have practical significance.
GeoSDVA can be applied to practical decision making by urban management departments and transport companies in view of the following advantages. First, a large number of unlabeled GPS trajectories is easy to collect, while GeoSDVA only needs a few expensive labeled trajectories in the training, which brings a cost advantage. Second, low-carbon travel has now become a hotspot in transportation research areas. There are significant differences in the carbon emissions of different transportation modes. GeoSDVA can be used to identify the carbon emissions of individual travelers, so as to help the government formulate relevant policies to encourage green travel. Third, the transportation mode identification of GPS trajectories in the city by GeoSDVA can help formulate other urban planning schemes. For example, clustered car trajectories in a city can help find frequent paths, so as to set up appropriate public transit routes to meet residents’ travel needs and reduce environmental pollution.
In addition, GeoSDVA has the following advantages over existing transportation mode identification models. Compared with models that only use motion features, GeoSDVA considers the effect of GIS information on trajectories and uses deep networks to extract the relevant features. Compared with transportation mode identification models based on traditional machine learning, GeoSDVA does not require manually designed statistical features and can automatically extract features from the raw trajectories. Compared with transportation mode identification models based on supervised learning, GeoSDVA can learn from unlabeled trajectories and achieve accurate identification with only a small number of labeled trajectories, which alleviates the high cost of labeling trajectories. Compared with semi-supervised transportation mode identification models based on an autoencoder, GeoSDVA utilizes a variational autoencoder structure to build a more constrained latent space, which improves its semi-supervised learning ability.
In the future, we will improve and extend our work in the following three directions. First, the unlabeled dataset may contain transportation modes that do not appear in the labeled dataset; the trajectories of these unknown modes may introduce noise into the semi-supervised learning model, so making the model robust to such noise is worth investigating. Second, privacy protection in location-based services has increasingly become a research hotspot, and transportation mode identification under privacy-protection constraints is a problem worth studying. Finally, it would be attractive to introduce more GIS features for transportation mode identification; features such as urban land-use types may help distinguish between trajectories of different modes.

Author Contributions

Conceptualization, Xiaoxi Zhang and Yuan Gao; methodology, Xiaoxi Zhang; software, Xiaoxi Zhang; validation, Xiaoxi Zhang; formal analysis, Xiaoxi Zhang; resources, Yuan Gao and Jun Feng; data curation, Xiaoxi Zhang; writing—original draft preparation, Xiaoxi Zhang; writing—review and editing, Yuan Gao, Yan Shi and Xin Wang; visualization, Xiaoxi Zhang; supervision, Yuan Gao, Xin Wang and Jun Feng; project administration, Yuan Gao; funding acquisition, Yuan Gao. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been supported by the National Social Science Foundation of China, “Research on the construction and application of spatio-temporal model of museum tourism in the Yellow River basin based on deep learning” (20BTJ047).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Geolife V1.3: https://www.microsoft.com/en-us/download/details.aspx?id=52367 (accessed on 10 March 2022). MTL Trajet 2016 and MTL Trajet 2017 datasets: https://donnees.montreal.ca/ville-de-montreal/mtl-trajet (accessed on 10 March 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Khosroshahi, A.; Ohn-Bar, E.; Trivedi, M.M. Surround vehicles trajectory analysis with recurrent neural networks. In Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016. [Google Scholar]
  2. Eluru, N.; Chakour, V.; El-Geneidy, A.M. Travel mode choice and transit route choice behavior in Montreal: Insights from McGill University members commute patterns. Public Transp. 2012, 4, 129–149. [Google Scholar] [CrossRef]
  3. Ding, N.; He, Q.; Wu, C.; Fetzer, J. Modeling Traffic Control Agency Decision Behavior for Multimodal Manual Signal Control Under Event Occurrences. IEEE Trans. Intell. Transp. Syst. 2015, 16, 2467–2478. [Google Scholar] [CrossRef]
  4. Shewmake, S.; Jarvis, L. Hybrid cars and HOV lanes. Transp. Res. Part A 2014, 67, 304–319. [Google Scholar] [CrossRef]
  5. Wang, B.; Gao, L.; Juan, Z. Travel Mode Detection Using GPS Data and Socioeconomic Attributes Based on a Random Forest Classifier. IEEE Trans. Intell. Transp. Syst. 2017, 19, 1547–1558. [Google Scholar] [CrossRef]
  6. Zheng, Y.; Liu, L.; Wang, L.; Xie, X. Learning transportation mode from raw gps data for geographic applications on the web. In Proceedings of the 17th International Conference on World Wide Web, Beijing, China, 21–25 April 2008. [Google Scholar]
  7. Zhu, X.; Li, J.; Liu, Z.; Wang, S.; Yang, F. Learning Transportation Annotated Mobility Profiles from GPS Data for Context-Aware Mobile Services. In Proceedings of the 2016 IEEE International Conference on Services Computing (SCC), San Francisco, CA, USA, 27 June–2 July 2016. [Google Scholar]
  8. Gomez, L.P.; Szybalski, A.T.; Thrun, S.; Nemec, P.; Urmson, C.P. Transportation-aware physical advertising conversions. U.S. Patent No. 8,630,897, 14 January 2014. [Google Scholar]
  9. Stenneth, L.; Wolfson, O.; Yu, P.S.; Xu, B. Transportation Mode Detection using Mobile Phones and GIS Information. In Proceedings of the 19th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2011, Chicago, IL, USA, 1–4 November 2011. [Google Scholar]
  10. Yu, B.; Yin, H.; Zhu, Z. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI), 2018; arXiv:1709.04875. [Google Scholar] [CrossRef] [Green Version]
  11. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Li, H. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858. [Google Scholar] [CrossRef] [Green Version]
  12. Lin, Y.; Chiang, Y.Y.; Franklin, M.; Eckel, S.P.; Ambite, J.L. Building Autocorrelation-Aware Representations for Fine-Scale Spatiotemporal Prediction. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; pp. 352–361. [Google Scholar] [CrossRef]
  13. Lin, Y.; Mago, N.; Gao, Y.; Li, Y.; Chiang, Y.Y.; Shahabi, C.; Ambite, J.L. Exploiting Spatiotemporal Patterns for Accurate Air Quality Forecasting Using Deep Learning. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems; Association for Computing Machinery: New York, NY, USA, 2018; pp. 359–368. [Google Scholar] [CrossRef]
  14. Yue, M.; Li, Y.; Yang, H.; Ahuja, R.; Chiang, Y.Y.; Shahabi, C. DETECT: Deep Trajectory Clustering for Mobility-Behavior Analysis. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019. [Google Scholar]
  15. Dabiri, S.; Marković, N.; Heaslip, K.; Reddy, C.K. A deep convolutional neural network based approach for vehicle classification using large-scale GPS trajectory data. Transp. Res. Part C Emerg. Technol. 2020, 116, 102644. [Google Scholar] [CrossRef]
  16. Kasahara, H.; Iiyama, M.; Minoh, M. Transportation mode inference using environmental constraints. In Proceedings of the 11th International Conference on Ubiquitous Information Management and Communication (IMCOM '17), Beppu, Japan, 5–7 January 2017; pp. 1–8. [Google Scholar]
  17. Semanjski, I.; Gautama, S.; Ahas, R.; Witlox, F. Spatial context mining approach for transport mode recognition from mobile sensed big data. Comput. Environ. Urban Syst. 2017, 66, 38–52. [Google Scholar] [CrossRef]
  18. Stopher, P.R.; Greaves, S.P. Household travel surveys: Where are we going? Transp. Res. Part A Policy Pract. 2007, 41, 367–381. [Google Scholar] [CrossRef]
  19. Zheng, Y.; Li, Q.; Chen, Y.; Xie, X.; Ma, W.Y. Understanding mobility based on GPS data. In Proceedings of the 10th International Conference on Ubiquitous Computing, Seoul, Korea, 21–24 September 2008; pp. 312–321. [Google Scholar]
  20. Sun, Z.; Ban, X. Vehicle classification using GPS data. Transp. Res. Part C 2013, 37, 102–117. [Google Scholar] [CrossRef]
  21. Nitsche, P.; Widhalm, P.; Breuss, S.; Braendle, N.; Maurer, P. Supporting large-scale travel surveys with smartphones—A practical approach. Transp. Res. Part C Emerg. Technol. 2014, 43, 212–221. [Google Scholar] [CrossRef]
  22. Zhu, Q.; Min, Z.; Li, M.; Min, F.; Huang, Z.; Gan, Q.; Zhou, Z. Identifying Transportation Modes from Raw GPS Data. In International Conference of Young Computer Scientists; Springer: Singapore, 2016; pp. 100–102. [Google Scholar]
  23. Endo, Y.; Toda, H.; Nishida, K.; Kawanobe, A. Deep Feature Extraction from Trajectories for Transportation Mode Estimation. In Pacific-Asia Conference on Knowledge Discovery and Data Mining; Springer: Cham, Switzerland, 2016. [Google Scholar]
  24. Dabiri, S.; Heaslip, K. Inferring transportation modes from GPS trajectories using a convolutional neural network. Transp. Res. Part C Emerg. Technol. 2018, 86, 360–371. [Google Scholar] [CrossRef] [Green Version]
  25. Zhang, R.; Xie, P.; Wang, C.; Liu, G.; Wan, S. Classifying transportation mode and speed from trajectory data via deep multi-scale learning. Comput. Netw. 2019, 162, 106861.1–106861.13. [Google Scholar] [CrossRef]
  26. Yu, J. Semi-supervised deep ensemble learning for travel mode identification. Transp. Res. Part C Emerg. Technol. 2020, 112, 120–135. [Google Scholar] [CrossRef]
  27. Nawaz, A.; Huang, Z.; Wang, S.; Hussain, Y.; Khan, Z. Convolutional LSTM based transportation mode learning from raw GPS trajectories. IET Intell. Transp. Syst. 2020, 14, 570–577. [Google Scholar] [CrossRef]
  28. Yu, J.J.Q. Travel Mode Identification With GPS Trajectories Using Wavelet Transform and Deep Learning. IEEE Trans. Intell. Transp. Syst. 2020, 22, 1093–1103. [Google Scholar] [CrossRef]
  29. Xiao, G.; Juan, Z.; Zhang, C. Detecting trip purposes from smartphone-based travel surveys with artificial neural networks and particle swarm optimization. Transp. Res. Part C Emerg. Technol. 2016, 71, 447–463. [Google Scholar] [CrossRef]
  30. Servizi, V.; Petersen, N.C.; Pereira, F.C.; Nielsen, O.A. Stop detection for smartphone-based travel surveys using geo-spatial context and artificial neural networks. Transp. Res. Part C Emerg. Technol. 2020, 121, 102834. [Google Scholar] [CrossRef]
  31. Xiao, Z.; Wang, Y.; Fu, K.; Wu, F. Identifying Different Transportation Modes from Trajectory Data Using Tree-Based Ensemble Classifiers. ISPRS Int. J. Geo-Inf. 2017, 6, 57. [Google Scholar] [CrossRef] [Green Version]
  32. Xiao, G.; Juan, Z.; Zhang, C. Travel mode detection based on GPS track data and Bayesian networks. Comput. Environ. Urban Syst. 2015, 54, 14–22. [Google Scholar] [CrossRef]
  33. Wang, B.; Wang, Y.; Qin, K.; Xia, Q. Detecting Transportation Modes Based on LightGBM Classifier from GPS Trajectory Data. In Proceedings of the 2018 26th International Conference on Geoinformatics, Kunming, China, 28–30 June 2018. [Google Scholar]
  34. Li, L.; Zhu, J.; Zhang, H.; Tan, H.; Du, B.; Ran, B. Coupled application of generative adversarial networks and conventional neural networks for travel mode detection using GPS data. Transp. Res. Part A Policy Pract. 2020, 136, 282–292. [Google Scholar] [CrossRef]
  35. Li, Z.; Xiong, G.; Wei, Z.; Lv, Y.; Anwar, N.; Wang, F.Y. A Semi-supervised End-to-end Framework for Transportation Mode Detection by Using GPS-enabled Sensing Devices. IEEE Internet Things J. 2021, 1. [Google Scholar] [CrossRef]
  36. Kingma, D.P.; Mohamed, S.; Rezende, D.J.; Welling, M. Semi-supervised learning with deep generative models. Adv. Neural Inf. Process. Syst. 2014, 27, 3581–3589. [Google Scholar]
  37. Chen, X.; Xu, J.; Zhou, R.; Chen, W.; Fang, J.; Liu, C. TrajVAE: A Variational AutoEncoder model for trajectory generation. Neurocomputing 2021, 428, 332–339. [Google Scholar] [CrossRef]
  38. Zhou, F.; Gao, Q.; Trajcevski, G.; Zhang, K.; Zhong, T.; Zhang, F. Trajectory-User Linking via Variational AutoEncoder. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI), 2018; pp. 3212–3218. [Google Scholar] [CrossRef] [Green Version]
  39. Jahangiri, A.; Rakha, H. Developing a support vector machine (SVM) classifier for transportation mode identification by using mobile phone sensor data. In Proceedings of the Transportation Research Board 93rd Annual Meeting, Washington, DC, USA, 12–16 January 2014; Volume 14, p. 1442. [Google Scholar]
  40. Vincenty, T. Direct and Inverse Solutions of Geodesics on the Ellipsoid with Application of Nested Equations. Surv. Rev. 1975, 23, 88–93. [Google Scholar] [CrossRef]
  41. Joo, W.; Lee, W.; Park, S.; Moon, I.C. Dirichlet variational autoencoder. Pattern Recognit. 2020, 107, 107514. [Google Scholar] [CrossRef]
  42. Boeing, G. OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks. Comput. Environ. Urban Syst. 2017, 65, 126–139. [Google Scholar] [CrossRef] [Green Version]
  43. Bolbol, A.; Cheng, T.; Tsapakis, I.; Haworth, J. Inferring hybrid transportation modes from sparse GPS data using a moving window SVM classification. Comput. Environ. Urban Syst. 2012, 36, 526–537. [Google Scholar] [CrossRef] [Green Version]
  44. Dabiri, S.; Lu, C.T.; Heaslip, K.; Reddy, C.K. Semi-Supervised Deep Learning Approach for Transportation Mode Identification Using GPS Trajectory Data. IEEE Trans. Knowl. Data Eng. 2020, 32, 1010–1023. [Google Scholar] [CrossRef] [Green Version]
Figure 1. The effect caused by the location of a bus station on a trajectory.
Figure 2. The GeoSDVA framework.
Figure 3. The influence of road intersections and bus stations. (a) A car trajectory affected by a road intersection. (b) A bus trajectory affected by a bus station.
Figure 4. The road networks of Beijing and Montreal. (a) Beijing. (b) Montreal.
Figure 5. The comparison of the identification results. (a) GeoSDVA. (b) Supervised-GeoSDVA.
Figure 6. The visualization of the latent vectors. (a) Geolife dataset. (b) MTL Trajet 2017 dataset. (c) MTL Trajet 2016 dataset.
Figure 7. The confusion matrix of the Geolife dataset. (a) Motion+GIS. (b) Motion.
Figure 8. The confusion matrix of the MTL Trajet 2017 dataset. (a) Motion+GIS. (b) Motion.
Figure 9. The confusion matrix of the MTL Trajet 2016 dataset. (a) Motion+GIS. (b) Motion.
Figure 10. Case study area. (a) Map. (b) Road network and bus stations.
Figure 11. Transportation mode identification by GeoSDVA.
Figure 12. Transportation mode distribution.
Table 1. Notation summary.

| Symbol | Description |
| --- | --- |
| p_i | GPS reading, including longitude, latitude, and timestamp. |
| T | GPS trajectory, consisting of a series of GPS readings. |
| V_i | Velocity of the GPS reading p_i. |
| A_i | Acceleration of the GPS reading p_i. |
| B_i | Heading change rate of the GPS reading p_i. |
| R_i | Road level of the GPS reading p_i. |
| C_i | Binary variable reflecting the existence of road intersections near the GPS reading p_i. |
| S_i | Binary variable reflecting the existence of bus stations near the GPS reading p_i. |
| L | Number of GPS readings in the GPS segment. |
| W | Trainable parameters in the model. |
| α | Parameter of the Dirichlet distribution. |
| μ | Mean of the Gaussian distribution. |
| σ | Standard deviation of the Gaussian distribution. |
| z | Latent vector in the variational encoder. |
| x | Input of the model or layer. |
| y | Transportation mode label of the trajectory. |
| γ | Hyper-parameter in the loss function. |
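The motion features listed in Table 1 can be derived point by point from consecutive GPS readings. The sketch below illustrates one way to do this, using geopy's geodesic distance as a stand-in for the geodesic formulas cited in [40]; the function and field names are illustrative and are not taken from the GeoSDVA implementation.

```python
# Minimal sketch: deriving velocity V_i, acceleration A_i, and heading change
# rate B_i from consecutive GPS readings (lat, lon, timestamp). Names and data
# layout are illustrative assumptions.
import math
from geopy.distance import geodesic

def bearing(p, q):
    """Initial bearing (degrees) from reading p to reading q."""
    lat1, lat2 = math.radians(p["lat"]), math.radians(q["lat"])
    dlon = math.radians(q["lon"] - p["lon"])
    x = math.sin(dlon) * math.cos(lat2)
    y = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlon)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0

def motion_features(readings):
    """readings: list of dicts with 'lat', 'lon', and 'ts' (seconds)."""
    velocity, accel, heading_change = [], [], []
    for i in range(1, len(readings)):
        p, q = readings[i - 1], readings[i]
        dt = max(q["ts"] - p["ts"], 1e-6)
        d = geodesic((p["lat"], p["lon"]), (q["lat"], q["lon"])).meters
        velocity.append(d / dt)  # V_i in m/s
        # A_i: change in velocity between consecutive point pairs.
        accel.append((velocity[-1] - velocity[-2]) / dt if len(velocity) > 1 else 0.0)
        # B_i: change in bearing between consecutive point pairs, per second.
        if i > 1:
            db = abs(bearing(p, q) - bearing(readings[i - 2], p))
            heading_change.append(min(db, 360.0 - db) / dt)
        else:
            heading_change.append(0.0)
    return velocity, accel, heading_change
```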
Table 2. GPS data description.

| Dataset | City | Time | Labeled Trajectories | Unlabeled Trajectories | Public Transit | Bike | Car | Walk |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Geolife V1.3 | Beijing | April 2007 to August 2012 | 9592 | 9078 | 26.9% | 14.2% | 14.7% | 43.9% |
| MTL Trajet 2017 | Montreal | September 2017 to October 2017 | 74,218 | 111,067 | 20.7% | 21.8% | 39.6% | 17.7% |
| MTL Trajet 2016 | Montreal | September 2016 to November 2016 | 60,526 | 232,804 | 23.4% | 6.9% | 62.2% | 7.3% |
Table 3. Road network edge.

| Osmid | One-Way | Highway | Length (m) |
| --- | --- | --- | --- |
| 323214282 | False | residential | 22.085 |
| 219439624 | True | tertiary | 172.687 |
| 169715232 | True | tertiary_link | 19.29 |
| 711919172 | False | pedestrian | 94.279 |
Table 4. Road network node.

| x | y | Osmid | Highway |
| --- | --- | --- | --- |
| 39.9964585 | 116.3869809 | 1732871775 | traffic_signals |
| 39.9964821 | 116.3854167 | 1732871776 | traffic_signals |
| 39.9966241 | 116.3940338 | 1732871792 | crossing |
| 39.8164954 | 116.4937731 | 289114266 | motorway_junction |
| 40.0800631 | 116.5271858 | 7825231285 | crossing |
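Road network attributes such as those shown in Tables 3 and 4 can be retrieved from OpenStreetMap with OSMnx [42]. The sketch below is a minimal illustration; the place name and the selection of attributes are assumptions rather than the exact extraction pipeline used in this paper.

```python
# Minimal sketch: downloading a road network with OSMnx and inspecting the edge
# and node attributes shown in Tables 3 and 4 (osmid, oneway, highway, length
# for edges; coordinates and a "highway" tag for nodes). The place name is an
# illustrative assumption.
import osmnx as ox

# Build a drivable street network graph for a place of interest.
graph = ox.graph_from_place("Haidian District, Beijing, China", network_type="drive")

# Convert the graph to GeoDataFrames of nodes and edges.
nodes, edges = ox.graph_to_gdfs(graph)

# Edge attributes: OSM way id, one-way flag, road level ("highway"), length in meters.
print(edges[["osmid", "oneway", "highway", "length"]].head())

# Node attributes: coordinates and a "highway" tag marking, e.g., traffic signals
# or crossings; the OSM node id is the index of the nodes GeoDataFrame.
print(nodes[["x", "y", "highway"]].head())
```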
Table 5. The performance of different models on the Geolife dataset.

| Method | Metric | 10% | 25% | 50% | 75% | 100% |
| --- | --- | --- | --- | --- | --- | --- |
| GeoSDVA | Accuracy | 0.751 | 0.777 | 0.782 | 0.787 | 0.801 |
| GeoSDVA | Macro F1 | 0.731 | 0.760 | 0.758 | 0.766 | 0.789 |
| RF | Accuracy | 0.745 | 0.757 | 0.759 | 0.760 | 0.765 |
| RF | Macro F1 | 0.717 | 0.732 | 0.735 | 0.735 | 0.741 |
| SVM | Accuracy | 0.720 | 0.729 | 0.732 | 0.737 | 0.737 |
| SVM | Macro F1 | 0.674 | 0.688 | 0.695 | 0.700 | 0.700 |
| DT | Accuracy | 0.636 | 0.647 | 0.635 | 0.642 | 0.651 |
| DT | Macro F1 | 0.617 | 0.629 | 0.617 | 0.622 | 0.633 |
| DNN | Accuracy | 0.728 | 0.739 | 0.744 | 0.750 | 0.755 |
| DNN | Macro F1 | 0.690 | 0.707 | 0.715 | 0.721 | 0.728 |
| SECA | Accuracy | 0.729 | 0.760 | 0.778 | 0.786 | 0.787 |
| SECA | Macro F1 | 0.693 | 0.722 | 0.757 | 0.765 | 0.768 |
| Pseudo-Label | Accuracy | 0.734 | 0.770 | 0.778 | 0.785 | 0.783 |
| Pseudo-Label | Macro F1 | 0.698 | 0.748 | 0.756 | 0.763 | 0.769 |
| Two-Step | Accuracy | 0.535 | 0.550 | 0.591 | 0.694 | 0.696 |
| Two-Step | Macro F1 | 0.437 | 0.459 | 0.536 | 0.652 | 0.661 |
| CNN-GAN | Accuracy | 0.714 | 0.742 | 0.767 | 0.769 | 0.785 |
| CNN-GAN | Macro F1 | 0.662 | 0.708 | 0.735 | 0.730 | 0.766 |
Table 6. The performance of different models on the MTL Trajet 2017 dataset.

| Method | Metric | 10% | 25% | 50% | 75% | 100% |
| --- | --- | --- | --- | --- | --- | --- |
| GeoSDVA | Accuracy | 0.821 | 0.824 | 0.845 | 0.847 | 0.852 |
| GeoSDVA | Macro F1 | 0.740 | 0.767 | 0.772 | 0.783 | 0.791 |
| RF | Accuracy | 0.790 | 0.796 | 0.801 | 0.801 | 0.803 |
| RF | Macro F1 | 0.707 | 0.717 | 0.726 | 0.724 | 0.727 |
| SVM | Accuracy | 0.781 | 0.788 | 0.792 | 0.792 | 0.794 |
| SVM | Macro F1 | 0.683 | 0.693 | 0.699 | 0.699 | 0.701 |
| DT | Accuracy | 0.712 | 0.713 | 0.713 | 0.718 | 0.718 |
| DT | Macro F1 | 0.643 | 0.644 | 0.642 | 0.648 | 0.647 |
| DNN | Accuracy | 0.791 | 0.796 | 0.804 | 0.806 | 0.806 |
| DNN | Macro F1 | 0.704 | 0.711 | 0.728 | 0.726 | 0.724 |
| SECA | Accuracy | 0.805 | 0.819 | 0.829 | 0.841 | 0.846 |
| SECA | Macro F1 | 0.736 | 0.740 | 0.755 | 0.779 | 0.785 |
| Pseudo-Label | Accuracy | 0.800 | 0.821 | 0.824 | 0.834 | 0.834 |
| Pseudo-Label | Macro F1 | 0.714 | 0.752 | 0.760 | 0.767 | 0.776 |
| Two-Step | Accuracy | 0.773 | 0.781 | 0.781 | 0.787 | 0.793 |
| Two-Step | Macro F1 | 0.678 | 0.688 | 0.688 | 0.704 | 0.705 |
| CNN-GAN | Accuracy | 0.799 | 0.807 | 0.821 | 0.835 | 0.843 |
| CNN-GAN | Macro F1 | 0.695 | 0.707 | 0.737 | 0.778 | 0.775 |
Table 7. The performance of different models on the MTL Trajet 2016 dataset.

| Method | Metric | 10% | 25% | 50% | 75% | 100% |
| --- | --- | --- | --- | --- | --- | --- |
| GeoSDVA | Accuracy | 0.852 | 0.864 | 0.874 | 0.881 | 0.887 |
| GeoSDVA | Macro F1 | 0.721 | 0.758 | 0.765 | 0.770 | 0.798 |
| RF | Accuracy | 0.829 | 0.861 | 0.862 | 0.863 | 0.864 |
| RF | Macro F1 | 0.713 | 0.747 | 0.749 | 0.754 | 0.754 |
| SVM | Accuracy | 0.834 | 0.838 | 0.840 | 0.844 | 0.845 |
| SVM | Macro F1 | 0.641 | 0.655 | 0.664 | 0.679 | 0.683 |
| DT | Accuracy | 0.787 | 0.786 | 0.783 | 0.792 | 0.787 |
| DT | Macro F1 | 0.651 | 0.651 | 0.655 | 0.669 | 0.670 |
| DNN | Accuracy | 0.845 | 0.857 | 0.859 | 0.860 | 0.861 |
| DNN | Macro F1 | 0.708 | 0.727 | 0.736 | 0.738 | 0.737 |
| SECA | Accuracy | 0.815 | 0.856 | 0.871 | 0.879 | 0.884 |
| SECA | Macro F1 | 0.677 | 0.714 | 0.751 | 0.768 | 0.778 |
| Pseudo-Label | Accuracy | 0.842 | 0.852 | 0.870 | 0.875 | 0.874 |
| Pseudo-Label | Macro F1 | 0.697 | 0.705 | 0.741 | 0.751 | 0.753 |
| Two-Step | Accuracy | 0.834 | 0.850 | 0.851 | 0.851 | 0.852 |
| Two-Step | Macro F1 | 0.646 | 0.695 | 0.686 | 0.687 | 0.696 |
| CNN-GAN | Accuracy | 0.846 | 0.857 | 0.868 | 0.870 | 0.872 |
| CNN-GAN | Macro F1 | 0.684 | 0.721 | 0.739 | 0.739 | 0.775 |
Table 8. The performance of the supervised and semi-supervised models.

| Dataset | Method | Metric | 10% | 25% | 50% | 75% | 100% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Geolife | GeoSDVA | Accuracy | 0.751 | 0.777 | 0.782 | 0.787 | 0.801 |
| Geolife | GeoSDVA | Macro F1 | 0.731 | 0.760 | 0.758 | 0.766 | 0.789 |
| Geolife | Supervised | Accuracy | 0.718 | 0.753 | 0.762 | 0.777 | 0.788 |
| Geolife | Supervised | Macro F1 | 0.650 | 0.710 | 0.724 | 0.754 | 0.772 |
| MTL Trajet 2017 | GeoSDVA | Accuracy | 0.821 | 0.824 | 0.845 | 0.847 | 0.852 |
| MTL Trajet 2017 | GeoSDVA | Macro F1 | 0.740 | 0.767 | 0.772 | 0.783 | 0.791 |
| MTL Trajet 2017 | Supervised | Accuracy | 0.795 | 0.811 | 0.824 | 0.830 | 0.839 |
| MTL Trajet 2017 | Supervised | Macro F1 | 0.676 | 0.701 | 0.729 | 0.751 | 0.764 |
| MTL Trajet 2016 | GeoSDVA | Accuracy | 0.852 | 0.864 | 0.874 | 0.881 | 0.887 |
| MTL Trajet 2016 | GeoSDVA | Macro F1 | 0.721 | 0.758 | 0.765 | 0.770 | 0.798 |
| MTL Trajet 2016 | Supervised | Accuracy | 0.834 | 0.842 | 0.843 | 0.861 | 0.862 |
| MTL Trajet 2016 | Supervised | Macro F1 | 0.648 | 0.631 | 0.680 | 0.726 | 0.733 |
Table 9. The influence of geographic information.

| Dataset | Feature | Accuracy | Macro F1 | Public Transit | Bike | Car | Walk |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Geolife | Motion+GIS | 0.801 | 0.789 | 0.766 | 0.822 | 0.726 | 0.842 |
| Geolife | Motion | 0.795 | 0.781 | 0.761 | 0.827 | 0.700 | 0.837 |
| MTL Trajet 2017 | Motion+GIS | 0.852 | 0.791 | 0.550 | 0.930 | 0.901 | 0.784 |
| MTL Trajet 2017 | Motion | 0.848 | 0.786 | 0.544 | 0.926 | 0.900 | 0.774 |
| MTL Trajet 2016 | Motion+GIS | 0.887 | 0.798 | 0.594 | 0.885 | 0.937 | 0.778 |
| MTL Trajet 2016 | Motion | 0.865 | 0.767 | 0.536 | 0.879 | 0.925 | 0.730 |