Article

Exploring the Individual Travel Patterns Utilizing Large-Scale Highway Transaction Dataset

1
Department of Transportation Engineering, Shandong Jianzhu University, Jinan 250101, China
2
Department of Computer Science, Shandong Jianzhu University, Jinan 250101, China
3
Shandong Hi-Speed Company Limited, Jinan 250014, China
*
Authors to whom correspondence should be addressed.
Sustainability 2022, 14(21), 14196; https://doi.org/10.3390/su142114196
Submission received: 21 September 2022 / Revised: 17 October 2022 / Accepted: 28 October 2022 / Published: 31 October 2022
(This article belongs to the Section Sustainable Transportation)

Abstract

Despite the spread of electronic toll collection (ETC) and electronic payment, developing a systematic approach to investigating highway travel patterns remains challenging. This paper explores spatial–temporal travel patterns to support traffic management. Travel patterns were extracted from a highway transaction dataset, which provides a wealth of individual information. An analysis framework covering individual, temporal, and spatial attributes was constructed on the basis of the RFM (Recency, Frequency, Monetary) model. In addition to the traditional factors, the weekday trip and repeated rate were introduced. Subsequently, three clustering models, K-means, Fuzzy C-means, and SOM (Self-Organizing Map), were employed to investigate travel patterns. According to the performance evaluation, the SOM model performed best and was used in the final analysis. The results indicated six groups with significant differences. Further investigation showed that random travelers accounted for over 40% of the samples, while commuting travelers and long-range freight travelers presented relatively fixed spatial and temporal patterns. The results are also meaningful for highway authority management, and the implications of integrating travel patterns with a dynamic pricing strategy are discussed.

1. Introduction

Full understanding of traveler characteristics is necessary for highway authorities to provide better service to customers. With the development of information technology, electronic toll collection (ETC) and electronic payment have been widely utilized in highway management to record travel information. Compared to a manual tollbooth, ETC technology is able to improve traffic capacity and reduce energy consumption at the same time [1]. In particular, fuel consumption and carbon monoxide emission can be reduced by 50% and 29%, respectively. By the end of 2020, the number of ETC customers had reached 204 million in China, which demonstrates a huge market for further highway system development [2]. Thus, it is meaningful to examine travel patterns to support the highway management and operation.
One of the dominant factors in characterizing travelers is data availability. Previous literature on traveler mobility patterns focused solely on lumped travel demand [3], which cannot reveal individual characteristics. However, electronic transaction technology collects travel information bound to “one traveler, one vehicle, one ID”, which provides the opportunity to investigate individual travel patterns. According to statistics from the Ministry of Transport of China, the market penetration of ETC exceeds 65%. In terms of vehicle type, ETC usage among freight and passenger vehicles is 53% and 70%, respectively. With its fast development and wide spread, ETC technology is expected to be integrated with the large parking and refueling markets, which can promote the development of ITS. Moreover, the information related to ETC can be further examined in market and customer research, as each unique ID is bound to a unique person and a unique vehicle. However, the individual behavior of highway travelers has rarely been investigated in existing research.
On the other hand, travelers can also be regarded as customers of the highway. To support customer-oriented management, Customer Relationship Management (CRM), first proposed in the 1990s, has been applied around the world in various areas [3,4]. Customer segmentation is one of the critical CRM techniques for analyzing customer characteristics and allocating service resources properly. For instance, Tsai et al. developed two clustering techniques to conduct customer segmentation and derive strategies for an automobile dealership [5]. Cheng et al. investigated the channel preference of high-speed railway passengers using the segmentation method [6]. In recent years, the application of big data mining technology in the CRM domain has been an emerging trend, and the development of ETC technology also provides the opportunity for further customer research.
Thus, it is still meaningful to build comprehensive models in terms of the individual traveler’s characteristics to uncover travel patterns. This paper proposed the approach to examine the travel characteristics of highway travelers and provide the model to partition them into groups in terms of their mobility patterns. In addition to the traditional customer segmentation model, spatial and temporal factors were also examined. To construct the clustering model, several models were selected and compared.
The rest of the paper is depicted as follows. Section 2 introduces the literature review. Data sources are described in Section 3. Section 4 and Section 5 provide the methodology and discussion parts, respectively. The last section addresses the conclusion and implications.

2. Literature Review

Travel pattern analysis is essential for highway management and operation of traffic authorities. It is still a challenging issue to propose a systematic approach to recognize, analyze, and forecast traffic characteristics, since an extensive amount of literature was constrained by the data size and quality [7]. Generally, the utilized traffic datasets can be divided into two categories, including stationary data and probe data.
The most widely used stationary data are sensor data, which can continuously capture highway traffic trends. Cao et al. reconstructed the traffic state index, including the ratio of driving speed and traffic density, using fuzzy logic inference [8]. Through spatial–temporal analysis with loop detector data, Wen et al. proposed a short-term convolutional neural network (CNN) deep learning framework for traffic flow prediction [9]. The results demonstrated the good performance of the model in capturing traffic features. Other research also explored multimode data fusion methods, for instance, homogeneous and heterogeneous traffic data fusion involving fixed traffic cameras, location information, and vehicular sensors [10]. However, the stationary data are anonymous and cannot provide continuous information on travelers for individual analysis.
Compared to stationary data, the probe data that were investigated in the studies were GPS or location data from the floating vehicles. The GPS dataset derived from the taxi, collecting the taxi ID, coordinate information, and time stamp from 10 s to 5 min based on the quality of GPS device, is the most used dataset in previous research [11,12,13]. In this research, Kong et al. [11] and Yang et al. [12] built the traffic congestion forecasting model through reconstructing the floating vehicle trajectory. Considering the correlation between traffic activities and resident behavior, Fu et al. examined the spatial heterogeneity and migration characteristics of congestion through taxi trajectory data [13]. Other probe datasets were also introduced for travel-pattern analysis. Nadeem et al. employed transit GPS data to build five different prediction algorithms for the prediction of real-time congestion [14]. Through proposing the prediction method in terms of bus travel time, Huang et al. collected over 60,000 predictions of bus trajectory [15]. In recent years, cellular movements were also considered in the research of traffic simulation and identification of traffic patterns [16,17]. Nevertheless, this research was limited to part of the samples, as GPS devices cannot be mounted on all types of vehicles.
On the other hand, as aforementioned, customer segmentation is an efficient approach for CRM to evaluate customer features and customer behavior based on a large amount of individual information. Currently, CRM research mainly involves retail customers, energy consumption customers, and telecom customers [18,19,20,21], and it has been introduced to the transportation area with extensive data mining methods, such as K-means, K-means++, decision tree, fuzzy logic, and so on [22,23]. Smith defined the segmentation concept as the process of clustering similar customers with common characteristics, such as purchase behavior and individual property [24]. Chiang developed a model to assess the value of airline travelers and suggested several rules to optimize the target market for the CRM system [25]. Additionally, data mining tools were utilized in the research of Ngai et al. to analyze customer data within the CRM framework [26]. By constructing the widely used customer segmentation model, that is, the RFM (Recency, Frequency, Monetary) model, Qian et al. explored the segmentation of highway ETC (Electronic Toll Collection) customers [27]. However, travel pattern analysis from the customer perspective has rarely been conducted, and spatial–temporal features have not been involved in existing CRM studies.
Therefore, this paper investigated the highway transaction data from Shandong Hi-speed Company Limited, involving over 8 million vehicles. Various models, such as the traditional K-means clustering algorithm and the SOM (self-organizing map) machine learning method, were employed in the study. The dataset and methodology are described in detail in the following sections.

3. Data Source

The dataset covers one month, from 1 March 2021 to 31 March 2021. The highway transaction dataset captures a wealth of individual information, including transaction ID, vehicle plate, time stamp, and trading detail; the standard format is shown in Table 1. Moreover, the geo-information of highway toll stations was also collected to examine the spatial distribution of traffic.
Specifically, due to the system issue, there was missing or incorrect information in the transaction records, which were supplemented or removed. In the processed transaction dataset, over 8.9 million vehicles with 45.3 million records were included. In this paper, the transaction information of one vehicle was assumed to be bound to one traveler, even though other travelers might drive the same vehicle in various periods, such as taxis and freight vehicles.
As aforementioned, the RFM model, first proposed in 1994 [28], is efficient to investigate the behavior of customers. For highway travelers, R refers to the time interval when the traveler entered the highway, F represents the frequency of traveler use on the highway, and M shows average payment when using the highway. However, the highway transaction dataset captures not only recency, frequency and monetary information, but also temporal and spatial distribution of travel patterns. Thus, this paper proposes the analysis framework for highway traveler characterization. As Figure 1 presents, the individual, temporal, and spatial attributes are considered.
The individual attribute contains Travel Day, Trip Frequency, and Total Fee. Travel Day is the number of days on which the traveler enters the highway in one month. Trip Frequency refers to the total number of highway trips for one traveler, while Total Fee represents the aggregated trip payment. The temporal attribute includes Weekday Trip and Peak-hour Trip. Weekday Trip represents the number of highway trips on weekdays, while Peak-hour Trip represents the number of trips during peak hours on weekdays. The peak hours comprise the morning peak, 7:00–9:00 am, and the night peak, 5:00–7:00 pm. The temporal attribute reflects travel habits related to trip purpose. For instance, commuting trips have a relatively fixed travel time in a day. The spatial attribute contains the Repeated Rate, which reveals travel habits in space. Repeated Rate is the share of a traveler's trips made on the most repeated origin–destination pair. It can be expressed as Equation (1).
$$\text{Repeated Rate} = \frac{\max\{F_1, \ldots, F_n\}}{F} \tag{1}$$
where $F_n$ represents the number of trips on origin–destination pair $n$, while $F$ represents the total number of highway trips.
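To make the six attributes concrete, the following is a minimal sketch of how they could be derived for one vehicle. The record layout `(entry_time, origin_station, destination_station, fee_cny)` and the function name `traveler_attributes` are hypothetical and are not taken from the dataset described in Table 1. Origin–destination pairs are assumed to be directional, and the peak hours follow the definition in the text.

```python
from collections import Counter
from datetime import datetime

# Hypothetical record layout: (entry_time, origin_station, destination_station, fee_cny)
def traveler_attributes(trips):
    """Compute the six framework attributes for one vehicle's list of trips."""
    n = len(trips)
    days = {t[0].date() for t in trips}                        # distinct travel days
    weekday_trips = [t for t in trips if t[0].weekday() < 5]   # Monday to Friday
    # Peak hours as defined in the text: 7:00-9:00 and 17:00-19:00 on weekdays
    peak_trips = [t for t in weekday_trips
                  if 7 <= t[0].hour < 9 or 17 <= t[0].hour < 19]
    # Assumption: an origin-destination pair is directional, so A->B differs from B->A
    od_counts = Counter((t[1], t[2]) for t in trips)
    return {
        "travel_day": len(days),
        "trip_frequency": n,
        "total_fee": sum(t[3] for t in trips),
        "weekday_trip": len(weekday_trips),
        "peak_hour_trip": len(peak_trips),
        "repeated_rate": max(od_counts.values()) / n,          # Equation (1)
    }

trips = [
    (datetime(2021, 3, 1, 7, 30), "A", "B", 20.0),   # Monday, morning peak
    (datetime(2021, 3, 1, 18, 0), "B", "A", 20.0),   # Monday, night peak
    (datetime(2021, 3, 2, 7, 45), "A", "B", 20.0),   # Tuesday, morning peak
    (datetime(2021, 3, 6, 11, 0), "A", "C", 35.0),   # Saturday, off-peak
]
attrs = traveler_attributes(trips)
```

In this toy example the pair A to B occurs twice out of four trips, so the Repeated Rate is 0.5. A traveler with a single trip always has a Repeated Rate of 1.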
The statistics of the traveler attributes are summarized in Table 2. According to the dataset, the three vehicle types are passenger vehicles, freight vehicles, and special vehicles, which account for 83.33%, 16.53%, and 0.14%, respectively. The special vehicles refer to the highway patrol, fire rescue, and road maintenance vehicles used for special tasks. On average, the trip frequency and total fee are 5.15 and 431.6 CNY, respectively. Due to the frequent entrance and exit of special vehicles, such as highway patrol, the maximum trip frequency is up to 678, but these trips are free of charge. Interestingly, the travel day, weekday trip, and peak-hour trip values are relatively low, which indicates that a majority of the travelers make fewer than 3 highway trips in one month. In contrast, the commuting travelers contribute to the high frequency of weekday and peak-hour trips, as they travel regularly. Moreover, the average repeated rate is 0.55, which is affected by the travelers with only one highway trip. The repeated rate for this type of traveler is 1.

4. Methodology

It is generally an unsupervised clustering issue to distinguish the highway traveler in terms of travel patterns. In order to investigate the traveler characterization, various models, involving traditional K-means, fuzzy C-means, and artificial SOM models, were employed. Additionally, the comparison and evaluation of clustering models were also addressed.

4.1. K-Means Model

The K-means algorithm is one of the traditional unsupervised learning algorithms based on the Euclidean distance [28]. It handles spherically distributed data well, but the number of clusters must be specified in advance. The basic procedure of the K-means algorithm is as follows.
  1. Randomly select K points from the whole sample as the initial cluster centroids. K is determined in advance.
  2. Calculate the Euclidean distance between each sample and the cluster centroids, then assign the sample to the cluster with the closest centroid. The Euclidean distance can be expressed as Equation (2).
$$d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2} \tag{2}$$
  3. Recalculate the positions of the K cluster centroids once the new clusters are generated.
  4. Repeat Steps 2 and 3 until the cluster centroids no longer change. The objective function of the K-means algorithm is expressed as Equation (3).
$$J_c = \sum_{i=1}^{K} \sum_{X \in C_i} \lVert X - M_i \rVert^2 \tag{3}$$
where $X$ is a sample in cluster $C_i$, while $M_i$ is the centroid of cluster $C_i$.
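The four steps can be sketched in plain Python as follows. This is an illustrative implementation rather than the code used in the study. The function names, the `seed` and `max_iter` parameters, and the toy data are assumptions, and an empty cluster simply keeps its previous centroid.

```python
import math
import random

def euclidean(x, y):
    """Euclidean distance, Equation (2)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def kmeans(samples, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick K distinct samples as the initial centroids
    centroids = [list(c) for c in rng.sample(samples, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every sample to its closest centroid
        new_labels = [min(range(k), key=lambda i: euclidean(x, centroids[i]))
                      for x in samples]
        if new_labels == labels:      # Step 4: assignments (and centroids) are stable
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its members
        for i in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == i]
            if members:               # assumption: an empty cluster keeps its centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    # Objective value, Equation (3)
    sse = sum(euclidean(x, centroids[lab]) ** 2 for x, lab in zip(samples, labels))
    return labels, centroids, sse

samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids, sse = kmeans(samples, k=2)
```

On this toy data the two well-separated groups end up in different clusters. On real attribute data the result depends on the random initial centroids, so several restarts are commonly compared.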

4.2. Fuzzy C-Means Model

The Fuzzy C-means model allows a sample to be assigned to more than one group through fuzzy logic [29]. This is its advantage over the K-means method, in which each sample must belong entirely to a single cluster. Similar to the K-means method, the Fuzzy C-means algorithm also requires the cluster number K as an initial input. The objective function and membership constraint can be expressed as Equations (4) and (5).
$$J_z = \sum_{i=1}^{K} \sum_{j=1}^{n} \mu_{ij}^{z} \lVert X_j - M_i \rVert^2 \tag{4}$$
$$\sum_{i=1}^{K} \mu_{ij} = 1 \tag{5}$$
where $z$ represents the fuzzifier fixed by the user, and $\mu_{ij}$ represents the degree of membership of observation $j$ in cluster $i$.
The detailed procedures are addressed as follows:
  1. Randomly select K points as the initial cluster centroids from the whole sample.
  2. Calculate the cluster centroid $M_i$ using Equation (6).
$$M_i = \frac{\sum_{j=1}^{n} \mu_{ij}^{z} X_j}{\sum_{j=1}^{n} \mu_{ij}^{z}} \tag{6}$$
  3. Update the membership $\mu_{ij}$ using Equation (7).
$$\mu_{ij} = \frac{\left(1 / d^2(X_j, M_i)\right)^{\frac{1}{z-1}}}{\sum_{k=1}^{K} \left(1 / d^2(X_j, M_k)\right)^{\frac{1}{z-1}}} \tag{7}$$
  4. Repeat Steps 2 and 3 until the value of $J_z$ no longer decreases.
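A minimal sketch of this alternating update is given below, assuming a fuzzifier of z = 2 and a small tolerance `tol` as the stopping rule. Because the initial centroids are sampled points, the sketch computes the memberships of Equation (7) first and the centroids of Equation (6) second within each pass. All names and the toy data are illustrative assumptions.

```python
import random

def sq_dist(x, m):
    """Squared Euclidean distance d^2(X_j, M_i)."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, m))

def fuzzy_c_means(samples, k, z=2.0, max_iter=200, tol=1e-9, seed=0):
    rng = random.Random(seed)
    n, dim = len(samples), len(samples[0])
    centroids = [list(c) for c in rng.sample(samples, k)]   # Step 1
    u = [[0.0] * n for _ in range(k)]                        # u[i][j] = membership of j in i
    prev_obj = float("inf")
    for _ in range(max_iter):
        # Equation (7): memberships from inverse squared distances
        for j, x in enumerate(samples):
            # the floor of 1e-12 avoids division by zero when x equals a centroid
            inv = [(1.0 / max(sq_dist(x, m), 1e-12)) ** (1.0 / (z - 1.0))
                   for m in centroids]
            total = sum(inv)
            for i in range(k):
                u[i][j] = inv[i] / total
        # Equation (6): centroids as membership-weighted means
        for i in range(k):
            w = [u[i][j] ** z for j in range(n)]
            centroids[i] = [sum(wj * x[d] for wj, x in zip(w, samples)) / sum(w)
                            for d in range(dim)]
        # Objective J_z; stop once it no longer decreases noticeably
        obj = sum(u[i][j] ** z * sq_dist(samples[j], centroids[i])
                  for i in range(k) for j in range(n))
        if prev_obj - obj < tol:
            break
        prev_obj = obj
    return u, centroids

samples = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.2),
           (10.0, 10.0), (10.4, 11.0), (11.0, 9.7)]
u, centroids = fuzzy_c_means(samples, k=2)
hard = [max(range(2), key=lambda i: u[i][j]) for j in range(len(samples))]
```

Each column of `u` sums to 1, as required by Equation (5). A hard assignment such as `hard` can be read off by taking the cluster with the largest membership for each sample.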

4.3. Self-Organizing Map (SOM) Model

The self-organizing map (SOM) is efficient in exploring the data feature, while the framework is presented in Figure 2 [30]. Different from the K-means and Fuzzy C-means approaches, the SOM model does not need the initial number of clusters. Like most artificial neural networks, the SOM has training and mapping modes. After constructing the map using input samples with a competitive process, also called vector quantization, the SOM model automatically classifies the input vector. Specifically, it is able to describe a mapping from a higher-dimensional input space to a lower dimensional map space. The detailed procedures are listed below.
  1. The individual, temporal, and spatial attributes are extracted as the input layer. The weights of each node are initialized randomly.
  2. Randomly select a sample $X = \{x_i\}$ and compute its distance to each output node; the node with the shortest distance is defined as the winning node $I(x)$. The distance is expressed as
$$d_j(x) = \sum_{i=1}^{D} (x_i - w_{ij})^2 \tag{8}$$
where $w_{ij}$ is the weight between input dimension $i$ and node $j$, while $D$ represents the dimension of the input sample.
  3. The nodes adjacent to the winning node $I(x)$ are activated. Calculate the neighborhood weight between $I(x)$ and each adjacent node $j$ using Equation (9).
$$W_{j,I(x)} = \exp\left(-\frac{S_{j,I(x)}^2}{2\sigma^2}\right) \tag{9}$$
where $S_{i,j}$ represents the Euclidean distance between nodes $i$ and $j$, and $\sigma$ is the width of the neighborhood.
  4. The weights of each node are updated by gradient descent, as Equation (10) shows.
$$\Delta w_{ij} = \eta(t)\, W_{j,I(x)}(t)\, (x_i - w_{ij}) \tag{10}$$
where $\eta(t)$ represents the learning rate.
  5. Repeat from Step 2 until the iterations converge.
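The competitive update can be sketched as below. The map size, the linear decay of the learning rate and neighborhood width, the fixed iteration count used in place of a convergence test, and all function names are assumptions made for illustration; the paper does not state these settings.

```python
import math
import random

def best_matching_unit(x, weights):
    """Index of the node whose weight vector is closest to x (squared distance)."""
    return min(range(len(weights)),
               key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[j])))

def som_step(x, weights, grid, eta, sigma):
    """One update: find the winning node, then pull every node toward x."""
    winner = best_matching_unit(x, weights)
    wr, wc = grid[winner]
    for j, (r, c) in enumerate(grid):
        s2 = (r - wr) ** 2 + (c - wc) ** 2           # squared map distance to the winner
        h = math.exp(-s2 / (2.0 * sigma ** 2))       # Gaussian neighborhood weight
        weights[j] = [w + eta * h * (xi - w) for xi, w in zip(x, weights[j])]
    return winner

def train_som(samples, rows, cols, n_iter=2000, eta0=0.5, sigma0=1.0, seed=0):
    rng = random.Random(seed)
    dim = len(samples[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in grid]   # random initialization
    for t in range(n_iter):
        frac = t / n_iter
        eta = eta0 * (1.0 - frac)                    # assumed linear learning-rate decay
        sigma = max(sigma0 * (1.0 - frac), 0.1)      # assumed decay with a lower bound
        som_step(rng.choice(samples), weights, grid, eta, sigma)
    return weights, grid

# Single step on a 1 x 2 map with hand-picked weights
w = [[0.0, 0.0], [1.0, 1.0]]
g = [(0, 0), (0, 1)]
winner = som_step([0.0, 0.2], w, g, eta=0.5, sigma=1.0)

# Short training run on inputs already scaled to [0, 1]
samples = [(0.1, 0.1), (0.15, 0.05), (0.9, 0.95), (0.85, 0.9)]
trained, grid = train_som(samples, rows=1, cols=2)
bmus = [best_matching_unit(x, trained) for x in samples]
```

After training, each sample is mapped to its best-matching node, and the node index (here `bmus`) serves as the cluster label.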

4.4. Model Performance Evaluation

In order to assess the performance of the various models, different metrics were employed in the paper. The most widely used metric is the Sum of Squared Errors (SSE), which is computed from the squared distances between each sample and its cluster center. In addition, other measures have been introduced in recent research [31]. Generally, the measurements can be categorized into two types: external cluster validation and internal cluster validation. External cluster validation refers to the assessment between the assigned clusters and external clusters, while internal cluster validation refers to the assessment among samples within the assigned clusters.
The Silhouette Coefficient was selected in the paper. It measures how well the samples are grouped and estimates the distance of an object to its own cluster compared to other clusters. The range of Silhouette Coefficient is [−1, 1], while the higher value means the better performance. The Silhouette Coefficient can be computed as below:
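$$S = \frac{1}{K} \sum_{i=1}^{K} \frac{1}{N(i)} \sum_{x \in C_i} \frac{b(x) - a(x)}{\max\{a(x), b(x)\}} \tag{11}$$
where $K$ represents the number of clusters, $N(i)$ represents the number of samples in cluster $i$, and $C_i$ represents the set of samples in cluster $i$. Following the standard definition of the Silhouette Coefficient, $a(x)$ is the average distance from $x$ to the other samples in its own cluster, and $b(x)$ is the average distance from $x$ to the samples of the nearest other cluster.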
Moreover, the Davies–Bouldin index (DBI) was also selected as the internal evaluation metric for assessment. It can be expressed as Equation (12).
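$$DB = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \left( \frac{\sigma_i + \sigma_j}{d(C_i, C_j)} \right) \tag{12}$$
where $\sigma_i$ represents the average distance between the centroid of cluster $C_i$ and the points in that cluster, and $d(C_i, C_j)$ is the distance between the centroids of clusters $C_i$ and $C_j$. A lower DBI indicates better-separated clusters.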
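Both internal metrics can be computed directly from the samples and their cluster labels. The sketch below uses the standard definitions, averaging the silhouette within each cluster and then across clusters. The function names are illustrative, at least two clusters are assumed, and a singleton cluster is given a silhouette of 0 by convention.

```python
import math

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def group(samples, labels):
    """Map each cluster label to the list of its samples."""
    clusters = {}
    for x, lab in zip(samples, labels):
        clusters.setdefault(lab, []).append(x)
    return clusters

def silhouette(samples, labels):
    clusters = group(samples, labels)
    cluster_means = []
    for lab, members in clusters.items():
        scores = []
        for x in members:
            if len(members) == 1:
                scores.append(0.0)      # convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(dist(x, y) for y in members) / (len(members) - 1)
            # b: mean distance to the members of the nearest other cluster
            b = min(sum(dist(x, y) for y in other) / len(other)
                    for m, other in clusters.items() if m != lab)
            scores.append((b - a) / max(a, b))
        cluster_means.append(sum(scores) / len(scores))
    return sum(cluster_means) / len(cluster_means)

def davies_bouldin(samples, labels):
    clusters = group(samples, labels)
    cent = {lab: [sum(col) / len(m) for col in zip(*m)] for lab, m in clusters.items()}
    # sigma: mean distance of a cluster's points to its own centroid
    sig = {lab: sum(dist(x, cent[lab]) for x in m) / len(m) for lab, m in clusters.items()}
    return sum(max((sig[i] + sig[j]) / dist(cent[i], cent[j])
                   for j in clusters if j != i)
               for i in clusters) / len(clusters)

samples = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
s = silhouette(samples, labels)
db = davies_bouldin(samples, labels)
```

For this toy layout, each cluster has a spread of 1 around its centroid and the centroids are 10 apart, so the Davies–Bouldin index is 0.2, while the silhouette is close to 0.8.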

5. Results and Discussion

According to the statistics for traveler attributes in Table 2, the average trip frequency was as low as 5.15, which indicates the majority of the highway travelers conduct infrequent and random trips. Additionally, contrary to the commuting trips on weekday, some travelers enter the highway on weekends only. In order to distinguish the travelers with low frequency and random travel pattern, this paper proposed to filter these travelers in preliminary analysis and provides the definition as follows.
  • Cluster 1: Travelers enter the highway only once.
  • Cluster 2: Travelers enter the highway only on weekends.
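The two filter rules can be expressed as a short function over a traveler's trip dates. This is only a sketch: the function name is hypothetical, and the choice that Cluster 1 takes precedence when a traveler's single trip falls on a weekend is an assumption not stated in the text.

```python
from datetime import date

def prefilter_cluster(trip_dates):
    """Return 'C1', 'C2', or None for one traveler's list of trip dates."""
    if len(trip_dates) == 1:
        return "C1"                      # entered the highway only once
    if all(d.weekday() >= 5 for d in trip_dates):
        return "C2"                      # Saturday or Sunday trips only
    return None                          # passed on to the clustering models

once = prefilter_cluster([date(2021, 3, 6)])                        # one Saturday trip
weekend = prefilter_cluster([date(2021, 3, 6), date(2021, 3, 7)])   # Saturday and Sunday
mixed = prefilter_cluster([date(2021, 3, 7), date(2021, 3, 8)])     # Sunday and Monday
```

Travelers for whom the function returns `None` form the filtered dataset used by the three clustering models.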
Subsequently, the traveler attributes have various forms and units, which must be transformed to a common scale. Thus, this paper utilized the Z-score standardization approach, which rescales each attribute to zero mean and unit standard deviation. The formula is expressed as follows:
$$z_{score} = \frac{X - \mu}{\sigma} \tag{13}$$
where $\mu$ and $\sigma$ represent the mean and standard deviation of the samples, respectively.
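As a small illustration, the standardization can be applied column by column to a table of attributes. The sketch assumes the population standard deviation (dividing by the number of samples) and maps a constant column to zeros; neither choice is specified in the text, and the function name is hypothetical.

```python
import math

def zscore_columns(rows):
    """Standardize each column of a list of equal-length numeric rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # population standard deviation (assumption)
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, stds)]
            for row in rows]

# Two attributes on very different scales, e.g. trip frequency and total fee
rows = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
z = zscore_columns(rows)
```

After the transformation both columns contain the same values, which shows that the difference in units no longer influences distance-based clustering.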
On the other hand, the K-means and Fuzzy C-means models require the cluster number to be set before classification. This paper employed the elbow method to determine the number of clusters, considering the SSE and the average Silhouette Coefficient. According to the elbow principle [32], the metric drops sharply as the cluster number K increases and then levels off, forming an “elbow”. As Figure 3 presents, the SSE and average Silhouette Coefficient decrease significantly as the cluster number ranges from 2 to 6. Consequently, cluster numbers 2, 3, 4, 5, and 6 could be set as the alternative “elbow” numbers. However, the metrics reach a plateau and remain stable when the cluster number exceeds 6. Therefore, the initial cluster number for the K-means and Fuzzy C-means models was set as 6. The SOM does not require the initial number, as it obtains the optimized cluster automatically.
In terms of the filtered highway transaction dataset, three models, K-means, Fuzzy C-means, and SOM, were utilized to obtain the clustering results. Through the comparison and assessment in Figure 4 and Table 3, the SOM model demonstrated better performance than other models. As Figure 4 shows, the silhouette coefficients for SOM were all over 0, and the mean value was up to 0.45, which is presented by the red dotted line. In addition, Table 3 provides the comparison of multiple evaluation metrics. We found that the SSE of SOM obtained was 20% lower than other models. The silhouette coefficient and DB Index of SOM also presented better performance. Therefore, to explore the travel pattern with a large-scale dataset, the SOM had better performance and stability.
As described in the above section, the SOM model was employed for the travel pattern analysis. Through the clustering, six groups of travelers were obtained and named C3 to C8, while C1 and C2 were defined as the travelers who enter the highway only once and only on weekends, respectively. The results of the traffic pattern analysis are shown in Table 4. The results indicate that the majority of travelers conducted fewer than 5 highway trips, while C1, C2, and C3 contribute about 70% of all travelers. Especially for C3, the spatial repeated rate of 0.421 indicated that the trips are fixed, but no obvious spatial–temporal pattern was found. Thus, this group can be defined as the infrequent and random travelers. Additionally, groups C4 and C7 not only had the highest trip frequency and weekday trips, but also had a high repeated rate, which means the highway trips of these travelers were mainly commuting trips with regular patterns. Interestingly, cluster C5 had the highest freight vehicle proportion, 87.89%, and the highest average total fee, which indicates that this group of travelers was mainly engaged in long-range freight transportation. Finally, the highway trips made by groups C6 and C8 were mainly in weekday non-peak periods with a low repeated rate, and no fixed patterns were found. These travelers are inferred to be related to short- or medium-range trip activity.
Moreover, the temporal and individual traffic patterns are further examined in Figure 5 and Figure 6. Figure 5 presents the temporal travel patterns for the various clusters of travelers. For groups without a fixed travel time, such as C2, C3, C6, and C8, the afternoon peak in travel demand differed little from the morning and night peaks. Group C5 mainly consisted of freight vehicles, and no peak in travel demand was found due to the flexible timing of long-range transportation. In addition, groups C4 and C7, defined as commuting travelers, show obvious morning and night peaks. In Figure 6, vehicle type and payment type are presented. Except for the high proportion of freight vehicles in C5, passenger vehicles made up the majority. For payment type, Electrical Pay was mainly completed by cellphone, while Unknown means that no payment type was recorded due to system errors. We found that ETC and Electrical Pay contributed almost 80% of the highway transactions, but there is no obvious difference in payment type between traveler clusters.
The results are also meaningful for highway authority management. Currently, the dynamic pricing strategy is one of the significant means of promoting the ITS (Intelligent Transportation System) on highways, with consideration of traffic demand, time, weather conditions, and historical prices. This paper proposes a novel approach to improve dynamic pricing management in terms of travel patterns. For instance, the commuting travelers, inferred from groups C4 and C7, can be offered a time-based and trip-based strategy due to their regular trip time and trajectory. For group C5, mainly consisting of freight vehicles, a vehicle-type-based and range-based strategy can be offered due to the long-range trips and vehicle type. A detailed individual dynamic pricing strategy can be explored when more individual information is provided.

6. Conclusions

Electronic toll collection (ETC) and electronic payment have been considered an approach to reduce congestion and pollutant emissions within the highway tollbooth area. Through mining the highway transaction dataset, this paper explored the traffic patterns of individual travelers to support the traffic management of highway authorities. Firstly, 8.9 million travelers with 45.3 million records were kept in the highway transaction dataset after the data cleansing process. In addition, in order to reveal the individual travel pattern, the individual, spatial, and temporal attributes of highway travelers were extracted, and six factors were determined to build the analysis framework.
Subsequently, various models, involving the K-means, Fuzzy C-means and SOM models, were employed to investigate the travel patterns. Through the comparison, the SOM outperformed the other two models with the clustering evaluation metrics. Thus, it was utilized for the feature analysis. The results showed well-distinguished groups of highway travelers with significant spatial–temporal and individual characteristics. The random and infrequent travelers, commuting travelers, and long-range freight travelers were found through the analysis.
From a policy perspective, this paper signifies the individual and spatial–temporal attributes of highway travelers. The proposed approach is meaningful for highway authorities to understand travel patterns and manage highway transportation. In the discussion section, the classification results were integrated with the current dynamic pricing strategy, which is meaningful to improve the individual traffic service in the Intelligent Transportation System.
Due to the lack of information, it still takes considerable effort to explore a more detailed traveler classification. For instance, the demographic and household information from an SP or RP survey could help in understanding the trip purpose. Furthermore, more research is required to investigate long-term highway transaction datasets and the dynamic segmentation issue with increased data.

Author Contributions

In this paper, R.C. conducted the project administration and provided data resources; M.S. conducted data curation and validation; J.J. developed the methodology part on factor analysis and formal analysis; X.C. provided and arranged data resources; H.Z. completed the software test and operation; B.S. completed the writing—review and editing. X.W. completed the original writing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Funding (CN), grant number 41901396, 42001396 and Youth Innovations Science and technology support project in Colleges of ShanDong Province, grant 2021KJ058, Natural Science Funding of Shandong Province, grant number ZR2021MG032. The APC was funded by National Natural Science Funding (CN).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The Highway transaction dataset utilized to support the findings of this paper was derived from the Shandong Hi-speed Company Limited. However, the data are not available.

Acknowledgments

The authors would like to thank the support from the National Natural Science Foundation of China and the data support from the Shandong Hi-speed Company Limited.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. China Business Intelligence. The market analysis of ETC in China 2021. 2021. Available online: https://www.163.com/dy/article/GAP7JV57051481OF.html (accessed on 4 September 2022).
  2. EastMoney. The Spread of ETC in China. 2021. Available online: http://finance.eastmoney.com/a/202103281861734043.html (accessed on 4 September 2022).
  3. Richards, K.A.; Jones, E. Customer relationship management: Finding value drivers. Ind. Mark. Manag. 2008, 37, 120–130.
  4. Soltani, Z.; Navimipour, N.J. Customer relationship management mechanisms: A systematic review of the state of the art literature and recommendations for future research. Comput. Hum. Behav. 2016, 61, 667–688.
  5. Tsai, C.F.; Hu, Y.H.; Lu, Y.H. Customer segmentation issues and strategies for an automobile dealership with two clustering techniques. Expert Syst. 2015, 32, 65–76.
  6. Cheng, Y.H.; Huang, T.Y. High speed rail passenger segmentation and ticketing channel preference. Transp. Res. Part A Policy Pract. 2014, 66, 127–143.
  7. Akhtar, M.; Moridpour, S. A review of traffic congestion prediction using artificial intelligence. J. Adv. Transp. 2021, 2021, 8878011.
  8. Cao, W.; Wang, J. Research on traffic flow congestion based on Mamdani fuzzy system. AIP Conf. Proc. 2019, 2073, 020101.
  9. Wen, F.; Zhang, G.; Sun, L.; Wang, X.; Xu, X. A hybrid temporal association rules mining method for traffic congestion prediction. Comput. Eng. 2019, 130, 779–787.
  10. Adetiloye, T.; Awasthi, A. Multimodal big data fusion for traffic congestion prediction. In Multimodal Analytics for Next-Generation Big Data Technologies and Applications; Seng, K.P., Ang, L., Liew, A.W.-C., Gao, J., Eds.; Springer: Berlin, Germany, 2019; Volume 2022, pp. 319–335.
  11. Kong, X.; Xu, Z.; Shen, G.; Wang, J.; Yang, Q.; Zhang, B. Urban traffic congestion estimation and prediction based on floating car trajectory data. Future Gener. Comput. Syst. 2016, 61, 97–107.
  12. Yang, Q.; Wang, J.; Song, X.; Kong, X.; Xu, Z.; Zhang, B. Urban traffic congestion prediction using floating car trajectory data. In Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing, Zhangjiajie, China, 18–20 November 2015; pp. 18–30.
  13. Fu, X.; Xu, C.; Liu, Y.; Chen, C.H.; Hwang, F.J.; Wang, J. Spatial heterogeneity and migration characteristics of traffic congestion—A quantitative identification method based on taxi trajectory data. Phys. A Stat. Mech. Its Appl. 2022, 588, 126482.
  14. Nadeem, K.M.; Fowdur, T.P. Performance analysis of a real-time adaptive prediction algorithm for traffic congestion. J. Inf. Commun. Technol. 2018, 17, 493–511.
  15. Huang, Z.; Xia, J.; Li, F.; Li, Z.; Li, Q. A peak traffic congestion prediction method based on bus driving time. Entropy 2019, 21, 709.
  16. Li, S.; Zhang, J.; Zhong, G.; Ran, B. A Simulation Approach to Detect Arterial Traffic Congestion Using Cellular Data. J. Adv. Transp. 2022, 2022, 8811139.
  17. Yan, X.; Song, C.; Pei, T.; Wang, X.; Wu, M.; Liu, T.; Shu, H.; Chen, J. Revealing spatiotemporal matching patterns between traffic flux and road resources using big geodata-A case study of Beijing. Cities 2022, 2022, 103754.
  18. Han, S.H.; Lu, S.X.; Leung, S.C.H. Segmentation of telecom customers based on customer value by decision tree model. Expert Syst. Appl. 2012, 39, 3964–3973.
  19. Kim, S.-Y.; Jung, T.-S.; Suh, E.-H.; Hwang, H.-S. Customer segmentation and strategy development based on customer lifetime value: A case study. Expert Syst. Appl. 2006, 31, 101–107.
  20. Benitez, I.; Quijano, A.; Diez, J.L.; Delgado, I. Dynamic clustering segmentation applied to load profiles of energy consumption from Spanish customers. Int. J. Electr. Power Energy Syst. 2014, 55, 437–448.
  21. Wu, R.-S.; Chou, P.-H. Customer segmentation of multiple category data in e-commerce using a soft-clustering approach. Electron. Commer. Res. Appl. 2011, 10, 331–341.
  22. Ernawati, E.; Baharin, S.S.K.; Kasmin, F. A review of data mining methods in RFM-based customer segmentation. J. Phys. Conf. Ser. 2021, 1869, 012085.
  23. Lajimi, H.F.; Majidi, S. Supplier segmentation: A systematic literature review. J. Supply Chain. Manag. Sci. 2021, 2, 138–158.
  24. Smith, W.R. Product differentiation and market segmentation as alternative marketing strategies. Mark. Manag. 1995, 4, 63. [Google Scholar] [CrossRef] [Green Version]
  25. Chiang, W.-Y. Discovering customer value for marketing systems: An empirical case study. Int. J. Prod. Res. 2017, 55, 5157–5167. [Google Scholar] [CrossRef]
  26. Ngai, E.W.; Xiu, L.; Chau, D.C. Application of data mining techniques in customer relationship management: A literature review and classification. Expert Syst. Appl. 2009, 36, 2592–2602. [Google Scholar] [CrossRef]
  27. Qian, C.; Yang, M.; Li, P.; Li, S. Application of customer segmentation for electronic toll collection: A case study. J. Adv. Transp. 2018, 2018, 3635107. [Google Scholar] [CrossRef] [Green Version]
  28. Tabianan, K.; Velu, S.; Ravi, V. K-Means Clustering Approach for Intelligent Customer Segmentation Using Customer Purchase Behavior Data. Sustainability 2022, 14, 7243. [Google Scholar] [CrossRef]
  29. Christy, A.J.; Umamakeswari, A.; Priyatharsini, L.; Neyaa, A. RFM ranking—An effective approach to customer segmentation. J. King Saud Univ.-Comput. Inf. Sci. 2021, 33, 1251–1257. [Google Scholar] [CrossRef]
  30. Zong, Y.; Pan, E. A SOM-Based Customer Stratification Model. Wirel. Commun. Mob. Comput. 2022, 2022, 7479110. [Google Scholar] [CrossRef]
  31. Alkhayrat, M.; Aljnidi, M.; Aljoumaa, K. A comparative dimensionality reduction study in telecom customer segmentation using deep learning and PCA. J. Big Data 2020, 7, 1–23. [Google Scholar] [CrossRef]
  32. Kodinariya, T.M.; Makwana, P.R. Review on determining number of Cluster in K-Means Clustering. Int. J. 2013, 1, 995. [Google Scholar]
Figure 1. The framework for travel pattern analysis.
Figure 2. The framework of SOM model.
Figure 3. Elbow method for selecting the initial cluster number: (left) variation of SSE; (right) variation of the average Silhouette Coefficient.
Figure 4. Comparison of Silhouette Coefficient for various models.
Figure 5. Hourly travel demand for various clusters.
Figure 6. Clustering results for vehicle and payment type.
Table 1. Highway transaction data format.

| Field | Data type | Description |
| --- | --- | --- |
| Transaction ID | Int | Transaction ID of the record |
| Vehicle plate | Var | Vehicle plate number, unique |
| Entry Station ID | Var | ID of the station where the vehicle enters the highway |
| Entry Time | Date | Datetime when the vehicle enters the highway |
| Exit Station ID | Var | ID of the station where the vehicle exits the highway |
| Exit Time | Date | Datetime when the vehicle exits the highway |
| Transaction Fee | Float | Highway toll fee |
| Other fields | | |
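
As an illustration of the record layout in Table 1, the following minimal sketch maps one transaction onto a typed Python record. The column names, the timestamp layout, and the sample values are assumptions made for this example and do not come from the dataset; "Var" fields are treated as strings.

```python
from dataclasses import dataclass
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout, not from the paper


@dataclass(frozen=True)
class Transaction:
    """One highway transaction record, following the fields of Table 1."""
    transaction_id: int
    vehicle_plate: str
    entry_station_id: str
    entry_time: datetime
    exit_station_id: str
    exit_time: datetime
    transaction_fee: float  # toll in CNY

    @property
    def travel_minutes(self) -> float:
        """Time spent between entry and exit stations, in minutes."""
        return (self.exit_time - self.entry_time).total_seconds() / 60.0


def parse_row(row: dict) -> Transaction:
    """Build a Transaction from a dict of strings (hypothetical column names)."""
    return Transaction(
        transaction_id=int(row["transaction_id"]),
        vehicle_plate=row["vehicle_plate"],
        entry_station_id=row["entry_station_id"],
        entry_time=datetime.strptime(row["entry_time"], TIME_FORMAT),
        exit_station_id=row["exit_station_id"],
        exit_time=datetime.strptime(row["exit_time"], TIME_FORMAT),
        transaction_fee=float(row["transaction_fee"]),
    )


# Made-up sample row, for illustration only.
example = parse_row({
    "transaction_id": "1",
    "vehicle_plate": "PLATE-0001",
    "entry_station_id": "S001",
    "entry_time": "2022-03-07 08:10:00",
    "exit_station_id": "S002",
    "exit_time": "2022-03-07 09:40:00",
    "transaction_fee": "45.5",
})
```

Keeping entry and exit as separate station and time fields is what allows both the spatial (origin and destination) and temporal attributes to be derived per vehicle.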
Table 2. Statistics of the traveler attributes.

| Attribute | Mean | St. Dev. | Min | Max |
| --- | --- | --- | --- | --- |
| Trip Frequency | 5.15 | 8.32 | 1 | 678 |
| Total Fee (CNY) | 431.61 | 471.87 | 0 | 33,554.4 |
| Travel Day | 3.35 | 4.17 | 1 | 31 |
| Weekday Trip | 3.78 | 6.57 | 0 | 50.5 |
| Peak-hour Trip | 1.47 | 3.24 | 0 | 26.7 |
| Repeated Rate | 0.55 | 0.30 | 0.05 | 1 |
| Sample number | 8,904,690 | | | |
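
The attributes in Table 2 are aggregated per vehicle from the transaction records. The sketch below shows one possible way to derive them; it is not the authors' implementation. The peak-hour window, the counting of weekday and peak-hour trips as simple totals, and the definition of the repeated rate as the share of trips on the most frequent entry-exit pair are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from datetime import datetime

PEAK_HOURS = {7, 8, 17, 18}  # assumed peak window (07:00-09:00, 17:00-19:00)


def traveler_attributes(records):
    """Aggregate per-vehicle attributes.

    records: iterable of (plate, entry_station, exit_station, entry_time, fee).
    """
    by_plate = defaultdict(list)
    for rec in records:
        by_plate[rec[0]].append(rec)

    result = {}
    for plate, trips in by_plate.items():
        # Count trips per (entry station, exit station) pair.
        od_counts = Counter((t[1], t[2]) for t in trips)
        result[plate] = {
            "trip_frequency": len(trips),
            "total_fee": sum(t[4] for t in trips),
            "travel_day": len({t[3].date() for t in trips}),
            # Monday is 0 and Friday is 4 in datetime.weekday().
            "weekday_trip": sum(1 for t in trips if t[3].weekday() < 5),
            "peak_hour_trip": sum(1 for t in trips if t[3].hour in PEAK_HOURS),
            # Assumed definition: share of trips on the most frequent pair.
            "repeated_rate": max(od_counts.values()) / len(trips),
        }
    return result


# Made-up records for two vehicles, for illustration only.
sample = [
    ("A", "S1", "S2", datetime(2022, 3, 7, 8, 10), 20.0),   # Monday morning
    ("A", "S2", "S1", datetime(2022, 3, 7, 18, 5), 20.0),   # Monday evening
    ("A", "S1", "S2", datetime(2022, 3, 8, 8, 15), 20.0),   # Tuesday morning
    ("A", "S1", "S3", datetime(2022, 3, 12, 13, 0), 35.5),  # Saturday
    ("B", "S4", "S5", datetime(2022, 3, 12, 10, 0), 12.0),  # Saturday
]
attrs = traveler_attributes(sample)
```

Under this assumed definition, a vehicle with a single trip has a repeated rate of 1, which is consistent with the maximum reported in Table 2.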
Table 3. Evaluation metrics.

| Evaluation Metric | K-Means | Fuzzy C-Means | SOM |
| --- | --- | --- | --- |
| SSE | 5493.71 | 6336.51 | 4990.15 |
| Silhouette Coefficient | 0.38 | 0.42 | 0.45 |
| DB Index | 0.91 | 0.90 | 0.77 |
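
For reference, the three metrics in Table 3 can be computed from a hard cluster assignment as sketched below. This is a plain-Python illustration using the standard textbook definitions with Euclidean distance; whether the paper used the same distance measure or any feature scaling is an assumption. A lower SSE and DB (Davies-Bouldin) Index and a higher Silhouette Coefficient indicate more compact, better separated clusters.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def sse(clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum(math.dist(p, c) ** 2 for p in pts)
    return total


def silhouette(clusters):
    """Average silhouette coefficient over all points (needs >= 2 clusters)."""
    scores = []
    for i, pts in enumerate(clusters):
        for p in pts:
            if len(pts) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other points of the same cluster.
            a = sum(math.dist(p, q) for q in pts) / (len(pts) - 1)
            # b: smallest mean distance to the points of any other cluster.
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for j, other in enumerate(clusters) if j != i
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(clusters):
    """Davies-Bouldin index; assumes cluster centroids are distinct."""
    cents = [centroid(pts) for pts in clusters]
    scatter = [
        sum(math.dist(p, c) for p in pts) / len(pts)
        for pts, c in zip(clusters, cents)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


# Toy example: two tight, well-separated clusters in two dimensions.
toy = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
toy_sse = sse(toy)
toy_sil = silhouette(toy)
toy_db = davies_bouldin(toy)
```

Read this way, the SOM column of Table 3 has the lowest SSE, the highest Silhouette Coefficient, and the lowest DB Index of the three models.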
Table 4. Mean value of travel pattern attribute for various clusters.

| Mean Value | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Trip Frequency | 2.535 | 1 | 3.283 | 36.501 | 15.06 | 9.304 | 67.246 | 19.739 |
| Total Fee (CNY) | 113.78 | 137.17 | 218.75 | 1349.88 | 11903.14 | 670.71 | 1991.95 | 1009.36 |
| Travel Day | 1.466 | 1 | 2.253 | 20.269 | 11.332 | 6.299 | 26.903 | 12.405 |
| Weekday Trip | 0 | 0.725 | 2.541 | 28.844 | 11.078 | 7.033 | 53.02 | 15.33 |
| Peak-hour Trip | 0 | 0.184 | 0.535 | 8.706 | 1.784 | 1.724 | 15.772 | 3.964 |
| Repeated Rate | 0.467 | 1 | 0.421 | 0.396 | 0.301 | 0.315 | 0.415 | 0.32 |
| Passenger Vehicle (%) | 94.86 | 84.95 | 87.37 | 72.78 | 12.10 | 70.87 | 77.75 | 71.21 |
| Freight Vehicle (%) | 5.09 | 14.78 | 12.53 | 27.13 | 87.89 | 29.05 | 22.13 | 28.71 |
| Vehicle Number | 809,527 | 2,276,295 | 3,884,026 | 134,857 | 90,752 | 1,169,870 | 50,265 | 389,098 |
| Vehicle Proportion (%) | 9.19 | 25.85 | 44.11 | 1.53 | 1.03 | 13.29 | 0.57 | 4.42 |
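
The vehicle proportions in the last row of Table 4 follow from the vehicle numbers in the row above. The short sketch below reproduces that arithmetic, assuming each proportion is the cluster count divided by the sum of the eight listed counts and rounded to two decimals; the variable names are illustrative.

```python
# Vehicle numbers per cluster, as listed in Table 4.
vehicle_number = {
    "C1": 809_527,
    "C2": 2_276_295,
    "C3": 3_884_026,
    "C4": 134_857,
    "C5": 90_752,
    "C6": 1_169_870,
    "C7": 50_265,
    "C8": 389_098,
}

# Total over the eight clusters.
total = sum(vehicle_number.values())

# Share of each cluster in percent, rounded to two decimals.
proportion = {
    cluster: round(100.0 * count / total, 2)
    for cluster, count in vehicle_number.items()
}

# Cluster with the largest share of vehicles.
largest = max(proportion, key=proportion.get)
```

With these counts, C3 is the largest cluster at about 44% of vehicles.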
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Jia, J.; Shao, M.; Cao, R.; Chen, X.; Zhang, H.; Shi, B.; Wang, X. Exploring the Individual Travel Patterns Utilizing Large-Scale Highway Transaction Dataset. Sustainability 2022, 14, 14196. https://doi.org/10.3390/su142114196

AMA Style

Jia J, Shao M, Cao R, Chen X, Zhang H, Shi B, Wang X. Exploring the Individual Travel Patterns Utilizing Large-Scale Highway Transaction Dataset. Sustainability. 2022; 14(21):14196. https://doi.org/10.3390/su142114196

Chicago/Turabian Style

Jia, Jianmin, Mingyu Shao, Rong Cao, Xuehui Chen, Hui Zhang, Baiying Shi, and Xiaohan Wang. 2022. "Exploring the Individual Travel Patterns Utilizing Large-Scale Highway Transaction Dataset" Sustainability 14, no. 21: 14196. https://doi.org/10.3390/su142114196

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
