Comprehensive—Model Based on Time Series for the Generation of Traffic Knowledge for Bus Transit Rapid Line 6 of México City

Díaz-Casco, Manuel A.; Carvajal-Gámez, Blanca E.; Gutiérrez-Frías, Octavio; Osorio-Zúñiga, Fernando S.

doi:10.3390/electronics11193036

by Manuel A. Díaz-Casco ¹, Blanca E. Carvajal-Gámez ²,*, Octavio Gutiérrez-Frías ² and Fernando S. Osorio-Zúñiga ³

¹ Instituto Politécnico Nacional, ESCOM, Juan de Dios Bátiz, Nueva Industrial Vallejo, Ciudad de México 07320, Mexico

² Instituto Politécnico Nacional, SEPI-UPIITA, Av. Instituto Politécnico Nacional, La Escalera, Ciudad de México 07340, Mexico

³ División de Estudios de Posgrado de la Facultad de Ingeniería (DEPFI), Universidad Nacional Autónoma de México UNAM, Coyoacán 04510, Mexico

* Author to whom correspondence should be addressed.

Electronics 2022, 11(19), 3036; https://doi.org/10.3390/electronics11193036

Submission received: 19 August 2022 / Revised: 18 September 2022 / Accepted: 20 September 2022 / Published: 24 September 2022

(This article belongs to the Section Computer Science & Engineering)

Abstract: Mobile sensor networks consist of different types of integrated devices that collect, disseminate, process, and store information from the environments in which they are deployed. This type of network allows for the development of applications and systems in different areas for the generation of knowledge. In this paper, we propose the Metrobus Arrival Prediction (MAP) model for predicting the arrival times of Line 6 buses of the bus rapid transit (BRT) system, known as the Metrobus, in Mexico City (CDMX). The network is composed of mobile and static nodes that collect data on the speed and position of each Metrobus bus. These data are sent to the proposed time series model, which yields the Metrobus arrival time estimate. MAP also allows the density of users projected during the day to be estimated with a time series model that uses the collected data and the historical data of each station. The model results are compared with the arrival times obtained from real-time traffic monitoring applications, such as Moovit and Google Maps. The proposed model takes historical data (trajectory times) as the reference for its first arrival time estimates and is then fed with the data collected through the sensor network. As these data were collected, the mean absolute error (MAE) of the expected time was less than 0.2 s and the root mean square error (RMSE) of the expected value was below 1 for the proposed model. Compared to real-time traffic platforms, it presents a value of 0.1650 for the average dispersion obtained in travel times. These values indicate that the model yields results as accurate as those of a real-time platform, which requires the data at the moments in which the traffic variations occur. Moreover, unlike other state-of-the-art models that rarely interact on the site, MAP requires a reduced number of variables, making it an accessible tool for the implementation and scaling of real-time traffic monitoring.

Keywords: knowledge model; bus travel prediction; mobile wireless sensor networks; time series

1. Introduction

To help manage and estimate traffic, various technologies have been integrated, such as sensors, processing cards, wireless or wired communications, video cameras, and global positioning systems (GPS). With the objective of collecting and processing data, a traffic monitoring mechanism, known as an intelligent transportation system (ITS), is employed. ITSs are used to monitor and control traffic in cities; these systems mainly collect traffic data and increase the efficiency with which existing resources are used while providing information feedback between the nodes [1].

These types of systems were originally created for use on highways with computerized urban traffic signal control systems. Pioneers of this technology are the United States (US), Europe, and Japan. In the US, the first ITS application appeared in 1967 with the Electronic Route Guidance System (ERGS) project. Subsequently, the development of vehicle infrastructure integration was proposed to provide communication channels between vehicles on the road and their physical environment and improve safety [2].

ITS implementation emerged in Japan in 1960 for traffic control. By 1990, all the densely congested metropolitan areas, including the Tokyo and Osaka expressways, already had their own central control systems [1].

In Europe, several ITS-related projects have been developed. In 1970, the development of the road transport system (RTI) began [1]. ITSs have since been perfected and modernized through programs such as the Dedicated Road Infrastructure for Vehicle Safety in Europe (DRIVE) [2]. In 1980, China launched its first ITS project in response to the advancements in the US, Europe, and Japan. In 2008, China also established its own ITS society, the China Intelligent Transportation System Association [2]. These systems mainly perform traffic data collection and increase the effective use of existing resources while providing information feedback [3]. The most complex ITSs collect and manage data from sensors, global positioning systems (GPSs), surveillance cameras, and data acquisition and processing cards, in conjunction with data storage centers [2,3]. The collection and interpretation of these data directly impact the quality, safety, and efficiency of an ITS. The efficient and intelligent use of the data collected can establish a context for a trip planner.

In the case of Mexico, few attempts have been made to apply ITSs to traffic monitoring. The main applications of ITSs are focused on commercial vehicle operations (CVO), which use the satellite location of trailers to reduce vehicle theft [4]. The closest example of an ITS is the transport system known as bus rapid transit (BRT), which was created in 2005. It has an integrated prepayment system that facilitates entry and transfer at each station [5]. However, even with the advantages that this type of transport system can have, Mexico does not have a comprehensive ITS for estimating traffic and user density.

According to the TomTom Traffic Index, in 2019 Mexico City ranked 13th in the world in terms of traffic congestion, and citizens spent approximately 195 h per year stopped in traffic. Although traffic congestion decreased in 2020 to 135 h spent stopped in traffic, it remains a persistent problem in large cities: even during the COVID-19 pandemic, this value was only 60 h below the record set in 2019 [6]. The time lost causes other problems for the general public related to health, the environment, and economic losses.

The main goal is to develop a traffic information system for the BRT in Mexico City (CDMX) based on dynamic time series and capable of modeling different traffic situations. Through an architecture of mobile and static nodes, the system collects data on the speed and position of the Metrobus buses. The model initially generates its time estimates from the historical data and is then updated as data are collected through the mobile and static nodes. These data feed the proposed time series model, called Metrobus Arrival Prediction (MAP), which generates estimates of arrival times for each of the stations that make up Metrobus Line 6. The estimated arrival times are compared with those obtained from real-time traffic monitoring applications (apps), such as Moovit and Google Maps. The mean absolute error (MAE) of the expected time is less than 0.2 s, and the root mean square error (RMSE) of the expected value is below 1.

The main contributions are as follows:

(i): Development of a descriptive model of traffic for the BRT in Mexico City (CDMX) based on dynamic time series, capable of adapting to the weather and the dynamics present at the time of year.
(ii): An architecture that combines static and mobile sensor networks with a simple infrastructure that allows position and velocity data to be acquired, which are used for a time series to estimate arrival times at each station and determine the density of users.
(iii): A model based on the proposed time series that provides a prediction that can be used to help in making decisions regarding travel schedule selection, in the 35 stations that make up Metrobus Line 6.

Finally, in this work, we propose a model based on dynamic time series that adapts to the various scenarios present in CDMX. The model is designed to work in conjunction with a mobile sensor network architecture. The sensor nodes acquire data from real scenarios, which are used to estimate the arrival times of the buses at each station on Line 6. As the time series model is fed with more data, its results approach those estimated by real-time traffic monitoring applications, such as Google Maps or Moovit.

This paper is divided into the following sections. In Section 2, the work related to the investigated topic is briefly reviewed, and Section 3 provides the current panorama of public transport in CDMX. In Section 4 and Section 5, the design of the proposed model is presented, as well as the technologies and theories that were used as the basis of the research. In addition, a scenario where we implement the proposed model is described in Section 6. A discussion of the results obtained and their scope are provided in Section 7. Finally, Section 8 presents the conclusions obtained from the research and describes the advantages of the proposed model.

Problem Description

The mobility situation in CDMX is complex due to its geography, since the city occupies only 0.1% of Mexico's territory [7]. According to the United Nations (UN), the population of CDMX is 20,116,842 [8], with a density of 5741 inhabitants per square kilometer, in addition to the people who enter daily from neighboring states, such as Hidalgo, Estado de México, and Morelos. This generates a greater demand for public transport and concessionaires, making efficient traffic management a requirement. CDMX is a strategic point for work centers, tourism, schools, and shops, generating high concentrations of cars, buses, motorcycles, bicycles, and pedestrians, which increase during commuting hours. Approximately 2,437,414.33 people used transportation daily in CDMX in December 2020 [9]. Currently, CDMX has the following three types of transport services: public, private, and concessionary. In the public transportation network, the Metro accounts for 38.2% of users, followed by the Passenger Transport Network (RTP), with 8.84% of users, and the Electric Transport Service (Trolleybus), with 1.49% of users [9]. Private transportation consists of 51.47% minibuses, followed by 11.2% taxis, and finally, 8.8% Metrobus buses, which is a concessionary transport [9]. The latter transported 34,251.6 users in October 2019 [9].

The information produced by a monitoring system can be used by automated trip planners to alert passengers or propose alternative trip routes when any change in activity occurs [10]. This is the main challenge of creating a system for collecting and processing information on trips made by the Metrobus buses on Line 6. Metrobus Line 6 has 35 stations along a length of 20 km, located in the Gustavo A. Madero and Azcapotzalco boroughs (see Figure 1). It has connections with Metrobus Lines 1, 3, and 5 and with Metro Lines 3, 4, 5, 6, 7, and B, along with the Central Camionera del Norte. It operates from 04:30 to 24:00 h and transports around 150,000 passengers a day [11]. Line 6 was selected mainly because it is one of the lines with the greatest variability of possible test scenarios. It covers school areas (National Polytechnic Institute, Metropolitan Autonomous University-Azcapotzalco, Bachilleres, etc.), as well as hospital areas, commercial areas, industrial areas (Vallejo), etc. For this reason, it is important to generate a trip planner that provides security and reliability to users who need to get to their jobs, hospitals, or schools, which is considered one of the main objectives of CDMX. The proposal contributes the following: (i) a sensor network for traffic monitoring using the minimum required infrastructure, which will collect data to feed an arrival time prediction model, (ii) a model based on time series that adapts to the heterogeneous scenarios present in CDMX, and (iii) a model based on the proposed time series that provides a prediction that can be used to help in making decisions regarding travel schedule selection.

In addition to these ideas, we present the following research questions:

How adaptable will the proposed time series be in conjunction with the network architecture for Metrobus arrival forecasting on Line 6?
How efficient is the data collection with the minimum expenditure of resources with the proposed architecture?
Will the proposed time series model adapt to the various scenarios that occur in the CDMX?

The rest of this document shows in more detail the main components proposed for the implementation of this investigation on Metrobus Line 6.

2. Related Work

Estimating the bus arrival time is difficult due to multiple external factors, such as daily traffic variations with the season, date, and time of day, as well as random events [13]. Actual travel times sometimes differ from the schedule. For this reason, ITS technologies have been supported by traffic estimation models to provide better service; these models are divided into the following categories [14]: (a) historical average models; (b) linear regression models; (c) machine learning models; and (d) Kalman filter models.

Historical average models are developed based on historical data collected from bus travel times or arrival times. They are among the most reliable model types when the traffic flow is small and stable [14]. Some of these models date back to 1999, with the proposal in [15]. Linear regression models use multivariate statistical analysis to examine the linear correlations between a set of independent variables and a single dependent variable [14,15,16]. Some authors have proposed models based on machine learning, as shown in [13,17], as well as [15,18,19,20,21,22]. One drawback is the number of parameters that must be considered, as well as the number of hidden layers that a network must have to produce the best result. Kalman filter models are commonly used for the prediction of arrival times. The first works, which combine GPS with historical data, date from 1999 [14,22]. Later, the proposal was improved with a model for heterogeneous traffic to predict travel times in Chennai, India [22]. A hybrid method has also been proposed for the prediction of bus rapid transit (BRT) travel times using GPS on Line 2 in Chaoyang in the Beijing district [17].

Other related research on traffic estimation includes the work of Maa et al. (2019) [23], who propose a route segment-based approach for travel time prediction using data extracted from buses and taxis through GPS. The estimates are obtained under normal and abnormal traffic conditions, and two main base models are presented. The first model, based on time series estimates, uses the historical times of buses and taxis. The second model takes the following three methods as a reference: a support vector machine (SVM), an artificial neural network (ANN), and the nearest neighbor (K-NN), together with dynamic time warping (DTW). Yuan Y. et al. (2020) [24] proposed a mechanism for dynamically predicting the arrival time of a bus. The authors combine a recurrent neural network (RNN) with wide and deep insertion (WDI). In their research, they process the historical GPS sensor data of bus and taxi trips from public databases.

Pettersen N. et al. (2019) [14] proposed a deep neural network model with multiple outputs, multiple time steps, and a combination of convolutional layers and long short-term memory (LSTM). The data used belong to a nonpublic database, and the validation of the results obtained is reported empirically.

Several studies related to bus arrival time prediction systems have been presented; most of them are carried out using data already collected (static tables) through GPSs owned by transport concessionaires [14]. Subsequently, these data are processed to predict the bus arrival times. Most of these works are based on neural networks, and the persistent disadvantage of these models remains their complexity and computational cost, which is due to the training required for the different study scenarios. Consequently, there is still an open problem to be addressed: the need to model a bus arrival prediction system using dynamic data (dynamic tables during the working day). Together with historical data, these provide the possible scenarios present in certain periods of time. Such a model allows transport users to view updated results that are close to those presented by real-time traffic monitoring apps.

In Table 1, we show a summary of the characteristics of these pieces of work related to the simulation model implemented, test scenario, type of data (static or dynamic), and input variables. Additionally, in Table 1, we show the proposed architecture (implemented hardware) and whether the prediction is made by segment or route.

3. General Proposed Architecture for Traffic Analysis

The proposed model consists of the following five layers: the sensor network layer, the connectivity layer, the interpretation-storage layer, the inference layer, and the application layer (see Figure 2). In each of the layers, a particular function is performed for the collection, sending, storage, estimation, and visualization of the proposed model based on the time series. The data collected will allow the speed, time, and location data from trips made to be fed to the proposed model for arrival prediction.

3.1. Sensor Network Layer

The sensor network layer consists of static and mobile nodes for data collection. Through the mobile nodes installed in Metrobus buses, the position of the bus on the journey is recorded. Static nodes are located at each station and are responsible for collecting data from the mobile node and sending it to the storage and interpretation layer. In Figure 3, we describe the general architecture of the sensor network.

3.1.1. Mobile Sensor Networks for Monitoring, Generation of Itineraries, and Route Analysis

To validate the Metrobus Arrival Prediction (MAP) model and the proposed sensor network architecture, a Galaxy J7 Prime smartphone was placed in each Metrobus bus to be used as a mobile node. The GPS sensor and accelerometer embedded in the smartphone allowed us to collect the position and speed information of the bus (see Figure 4). One of the requirements was that the internal infrastructure of the buses was not to be modified. A mobile device was chosen as the sensor because of its portability, cost, and the features it provides, such as a gyroscope, accelerometer, wireless connectivity, battery, storage, and processing capacity. The mobile node connects wirelessly with the static nodes (stations): a static node accesses the information from a mobile node through a wireless connection available at each station, as shown in Figure 3. For the static node, a Raspberry Pi 3 B with a Wi-Fi communication module was used, which is responsible for sending the collected information to a remote server (see Figure 4).

3.1.2. Connection Layer

In this layer, communication is established between the static nodes, mobile nodes, and the remote server. The communication of the mobile node can switch depending on the availability of the Wi-Fi network. If connectivity with the Wi-Fi network is lacking, communication is carried out through cellular communication under the General Packet Radio Service (GPRS) protocol or with the Global System for Mobile communications (GSM). The switching between communication protocols is due to the lack of connectivity in various geographical areas covered by Line 6. Regarding the static node that is placed in the station, the Wi-Fi network is used to connect the devices. This is because to obtain the information on the location of a vehicle unit, it is necessary to send the data to the server located in the trunk station and to the unit, which is usually moving during a trip.

3.1.3. Storage and Interpretation Layer

In the third layer, or the information storage layer, a general structure is established to manage the storage of the data, information, and knowledge that are used. In this layer, the data acquired by the sensor network are updated.

3.1.4. Presentation Layer

In the interpretation layer and in the information presentation layer, the results of queries made to the system are displayed to estimate the arrival of a bus at the desired station. This information feeds back to the prediction system together with the measurements made by the sensor network. These data are entered into a time series model, thus allowing the generation of arrival time knowledge. To visualize the results obtained by the time series model and the sensor network, we developed a mobile application, as shown in Figure 5.

To verify the results, they were displayed in the developed application while a traffic visualization application was used simultaneously to make a subjective comparison. Figure 6 shows the results obtained for the days on which the bus was monitored. The main screen first shows the coordinates (latitude and longitude) and the vehicle speed obtained from the GPS. In Figure 5a, we observe that the speed estimated by the GPS sensor of the mobile device has a coincidence value of 90%, compared to a real-time traffic application. In Figure 5b, we observe the estimate obtained by the sensor of the mobile device, which presents a coincidence of 90%.

4. Metrobus Arrival Prediction Model

In this section, we provide a more detailed description of the proposed model based on time series for the prediction of Line 6 Metrobus arrival.

4.1. Scenario Trip Test

For the validation of the proposed architecture, the Metrobus Arrival Prediction (MAP) model was applied to a test scenario. The following different test scenarios were developed: (a) the first scenario was obtained using the trip matrix, which represents trip planning; (b) the second test scenario was the implementation of the sensor network proposed on Metrobus Line 6 and double-monitoring of the sensor network at the San Bartolo, Instituto Politécnico Nacional, Río Bamba, and Deportivo 18 de Marzo stations, which are central transfer points to other transit routes; and (c) finally, the results are compared to those of real-time traffic apps, such as Google Maps and Moovit.

4.1.1. Data Collected

To start the training, the predictive model was divided into the following two phases: (i) the time series starts working with historical data from previous Metrobus trip planning, which were provided by Metrobus to produce the first results of the series (see Table 2); and (ii) data are collected through the network of mobile and static sensors installed on Line 6 of the Metrobus. The data request is made by querying the mobile node (Galaxy J7). The query uses the TYPE_ACCELEROMETER and TYPE_ORIENTATION sensor types. With these parameters, the speed (ID_VELOCITY) and the current position (ID_POSITION) of the vehicle are obtained.

The test scenario is Line 6 of the Metrobus. From the data obtained using the sensor network, a user’s estimated time of arrival at the stations is shown according to traffic conditions. The data were collected in the time windows 06:00–10:00, 12:00–16:00, and 18:00–22:00 h during November and December 2018 and January and February 2019. Seventy-six vehicles were tracked, each with an approximate travel distance of 280 km per day. The mobile nodes collected the data continuously during this period. Static nodes, like the mobile ones, established communication with the remote server. Finally, the collected data are used to generate information about the context, from which knowledge is generated for decision making according to the needs.

4.1.2. MAP Description

In this section, we will describe each of the elements that make up MAP. The variables associated with the data collected by the mobile and static nodes are mentioned to obtain the prediction values.

The MAP model applies a specific type of knowledge from the variables that can be monitored by the sensor network. For the estimation of the arrival times of the units, a model based on linear prediction was built, and for the density of users, a model based on the distance between the units, the number of units in service, and the period of the day was used. Both models were adapted to the conditions of the Metrobus [20,24,27]. The variables may vary due to different conditions, such as vacation periods, school entry-exit times, marches or demonstrations, natural phenomena (rain, earthquakes, etc.), and other incidents that may occur outside the planning of a trip [34]. Next, we explain how these models were adapted and the nature of each of them.

4.1.3. Segment Routes

Metrobus Line 6 covers a distance of 20 km and is divided into stations. The average distance between each station is approximately 600 m (this may vary depending on the area and the topology of the line), and this portion of the route between stations represents a segment of the route, which is used as a reference to obtain the data (see Figure 6).
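As a rough consistency check of the figures above, and assuming (this is not stated explicitly in the text) that the 35 stations delimit 34 consecutive segments along the 20 km line, the mean segment length would be:

$$\bar{S} = \frac{20\,000\ \text{m}}{35 - 1} \approx 588\ \text{m}$$

which agrees with the approximately 600 m quoted for the average distance between stations.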

The MAP model used the traffic conditions of each section of the route, considering the information established for each unit. Each of the segments may present different traffic conditions (Figure 7). The conditions depend on the period of the day and the local characteristics.

MAP is a model based on dynamic time series, which allows the estimation of the arrival times of the Metrobus buses from the data collected by the network of mobile and static sensors. The main objective of MAP is to obtain the estimated time of arrival of the vehicle. For the test scenario (Line 6), we must know the distance between each segment, which is obtained from the ID_POSITION variable. From the data collected by the mobile node, the speed of the vehicle identified with the ID_VELOCITY variable is determined.

4.1.4. Average Speed

The average speed is obtained from the data collected through ID_POSITION. The average speed of a unit is obtained as follows:

$$\bar{v}_{i} = \frac{S_{if}\, v + S_{ib}\, v_{a_{i}}}{S_{if} + S_{ib}} \tag{1}$$

where $S_{if}$ is the distance traveled by the unit in the segment, $S_{ib}$ is the distance (ID_POSITION) still to travel in the segment, $v$ is the current speed (ID_VELOCITY), and $v_{a_{i}}$ is the average speed in the current segment according to the most recent trips in the record. This parameter gives the average speed of a Metrobus bus between segments.
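Equation (1) can be read as a distance-weighted blend of the current speed and the recent historical speed. The following minimal Python sketch illustrates this reading; the function name, argument names, the choice of metres and metres per second as units, and the sample values are assumptions made for illustration and do not come from the original implementation.

```python
def average_speed(s_traveled: float, s_remaining: float,
                  v_current: float, v_history: float) -> float:
    """Equation (1): weight the current speed by the distance already covered
    and the recent historical speed by the distance still to cover."""
    total = s_traveled + s_remaining
    if total <= 0:
        raise ValueError("segment length must be positive")
    return (s_traveled * v_current + s_remaining * v_history) / total


# Hypothetical values: 200 m covered at 8 m/s, 400 m left, recent average 5 m/s.
# (200 * 8 + 400 * 5) / 600 = 6.0 m/s
print(average_speed(200.0, 400.0, 8.0, 5.0))
```

Under this reading, the estimate moves from the historical average toward the measured speed as the bus advances through the segment.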

4.1.5. Travel Time between Segments

This value allows the estimation of travel times between segments to obtain the final arrival estimate, which is obtained from Equation (2):

$$t_{j} = \left[t_{d}\left(1 + \rho_{s}(j)\right)\right] + t_{lp}(j) + \Delta k\, t_{lw} \tag{2}$$

where $j$ represents the segment traveled with the current conditions modified by the traffic conditions, $t_{d}$ is the base delay time of approximately 15 s per station, and $\rho_{s}(j)$ is the density of the users in the station. These data are extracted directly from the people who enter and leave the station, which is provided by the history of ascents and descents of Metrobus. $t_{lp}(j)$ is the estimation time using linear prediction, $\Delta k$ is the average number of traffic lights included in the route, and $t_{lw}$ is the duration time of each red light.

The estimated time by linear prediction is obtained by dividing the distance of segment $j$ by the estimated speed calculated with the linear prediction. This relationship is shown in Equation (3).

$$t_{lp}(j) = \frac{S_{j}}{\hat{v}_{j}} \tag{3}$$

where $S_{j}$ is the distance of segment $j$, and $\hat{v}_{j}$ is the calculated estimated speed used in the linear prediction shown in the next section. This prediction changes according to the traffic conditions and the changes in the traffic light signals during the trip.
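To make the roles of the terms in Equations (2) and (3) concrete, a minimal Python sketch is given below. The function and parameter names, the units (metres, metres per second, seconds), and the numerical values other than the 15 s base delay are assumptions chosen for illustration only.

```python
def linear_prediction_time(segment_length_m: float, predicted_speed_mps: float) -> float:
    """Equation (3): t_lp(j) = S_j / v_hat_j."""
    if predicted_speed_mps <= 0:
        raise ValueError("predicted speed must be positive")
    return segment_length_m / predicted_speed_mps


def segment_travel_time(base_delay_s: float, station_density: float,
                        segment_length_m: float, predicted_speed_mps: float,
                        traffic_lights: float, red_light_s: float) -> float:
    """Equation (2): t_j = t_d (1 + rho_s(j)) + t_lp(j) + delta_k * t_lw."""
    dwell = base_delay_s * (1.0 + station_density)      # station delay scaled by user density
    running = linear_prediction_time(segment_length_m, predicted_speed_mps)
    signals = traffic_lights * red_light_s              # time lost at red lights
    return dwell + running + signals


# Hypothetical segment: 15 s base delay, density 0.5, 600 m at 6 m/s,
# two traffic lights with 30 s of red each.
# 22.5 s dwell + 100 s running + 60 s signals = 182.5 s
print(segment_travel_time(15.0, 0.5, 600.0, 6.0, 2, 30.0))
```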

4.1.6. Linear Prediction

To estimate the future values related to Section 4.1.4 and Section 4.1.5, the linear prediction estimates the value of the next element according to current traffic conditions from a set of previous values collected through historical trips and data collected from the trips that are currently in progress. In this case, when applying the linear prediction to the speed history of the units that have traveled segment j, it is possible to obtain an estimated speed according to the local conditions of the segment. The reconstruction of the series related to the velocities is obtained according to Equation (4).

$$\hat{v}_{j}(n) = \sum_{i = 1}^{p} a_{i}\, \bar{v}_{l}(n - i) \tag{4}$$

where $\hat{v}_{j}(n)$ is the estimated series of velocities of segment $j$, $\bar{v}_{l}(n - i)$ are the recorded speeds, $p$ is the number of previous samples considered within the segment, and $a_{i}$ are the prediction coefficients.
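The weighted sum in Equation (4) can be sketched in a few lines of Python. The function name, the convention that recorded speeds are stored oldest first, and the sample coefficients are assumptions made for this illustration; they are not taken from the original implementation.

```python
from typing import Sequence


def predict_speed(coefficients: Sequence[float], recorded_speeds: Sequence[float]) -> float:
    """Equation (4): v_hat_j(n) = sum over i = 1..p of a_i * v_bar(n - i).

    recorded_speeds is ordered oldest first, so recorded_speeds[-i] is v_bar(n - i).
    """
    p = len(coefficients)
    if len(recorded_speeds) < p:
        raise ValueError("need at least p recorded speeds")
    return sum(coefficients[i - 1] * recorded_speeds[-i] for i in range(1, p + 1))


# Hypothetical coefficients for p = 3 and the three most recent recorded speeds (m/s).
# 0.5 * 6.0 + 0.3 * 5.0 + 0.2 * 4.0, i.e. about 5.3 m/s
print(predict_speed([0.5, 0.3, 0.2], [4.0, 5.0, 6.0]))
```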

4.1.7. Velocity Prediction Coefficient

According to the data obtained from Equation (1), and as the data are collected through the mobile nodes, it will be possible to predict the speeds at which the buses are traveling during the travel day.

To obtain the prediction coefficients of each series of velocities, an autocorrelation matrix is used between the values of the $p$ previous samples considered (see Equation (4)), as shown in Equation (5).

$$\vec{a} = \left(Q^{T} Q\right)^{-1} Q^{T}\, \vec{v}_{j} \tag{5}$$

where $Q$ is the autocorrelation matrix, and $Q^{T}$ represents the transposed autocorrelation matrix.
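Equation (5) has the form of an ordinary least-squares solution. The sketch below assumes, as an interpretation not spelled out in the text, that each row of $Q$ holds the $p$ recorded speeds preceding one target speed in $\vec{v}_{j}$. It solves the normal equations by Gaussian elimination instead of forming the inverse explicitly; all function names and the sample series are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_lag_matrix(speeds: Sequence[float], p: int) -> Tuple[List[List[float]], List[float]]:
    """Row k of Q holds the p speeds preceding target v[k], most recent first."""
    if p < 1 or len(speeds) <= p:
        raise ValueError("need more than p speed samples")
    q = [[speeds[n - i] for i in range(1, p + 1)] for n in range(p, len(speeds))]
    v = [speeds[n] for n in range(p, len(speeds))]
    return q, v


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def prediction_coefficients(q: List[List[float]], v: List[float]) -> List[float]:
    """Equation (5): a = (Q^T Q)^{-1} Q^T v, solved as (Q^T Q) a = Q^T v."""
    p = len(q[0])
    qtq = [[sum(row[i] * row[j] for row in q) for j in range(p)] for i in range(p)]
    qtv = [sum(row[i] * target for row, target in zip(q, v)) for i in range(p)]
    return solve_linear_system(qtq, qtv)


# Hypothetical speed series generated by v(n) = 0.6 v(n-1) + 0.4 v(n-2).
history = [4.0, 6.0]
for _ in range(6):
    history.append(0.6 * history[-1] + 0.4 * history[-2])
q_matrix, targets = build_lag_matrix(history, p=2)
coeffs = prediction_coefficients(q_matrix, targets)
print(coeffs)  # should be close to the generating coefficients 0.6 and 0.4
```

Solving the normal equations directly avoids an explicit matrix inverse; with real speed records, a numerical library routine for least squares would normally be preferable.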

4.1.8. Predicted Time

Finally, based on the values obtained in Section 4.1.4 and Section 4.1.5, the arrival times to the various segments are used to obtain the estimated time of the entire journey.

Equation (6) represents the general model for the estimation of the arrival time ($T_{predicted}$) of a unit along its route.

$$T_{predicted} = \frac{S_{NSi}}{\bar{v}_{i}} + \sum_{j = i + 1}^{i + n} t_{j} \tag{6}$$

where $t_{j}$ is the time it takes the unit to travel the segment identified as $j$, $S_{NSi}$ is the remaining distance to the next station, $n$ is the number of the destination station (the station to which you want to know the arrival time), and $\bar{v}_{i}$ is the average speed of the unit in segment $i$.

From Equations (5) and (6), it is possible to predict the Metrobus speed between segments, considering the different traffic conditions. We obtain this parameter to calculate the average speed that a Metrobus bus is traveling between segments, either for normal traffic conditions or some other situation that may cause a change in the trip plan, to generate the possible speed values for any traffic condition.
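Equation (6) adds the time needed to finish the current segment to the travel times of the segments that remain before the destination. A minimal Python sketch follows, in which the function name, units, and sample values are assumptions for illustration; the segment times would come from Equation (2).

```python
from typing import Sequence


def predicted_arrival_time(remaining_distance_m: float, average_speed_mps: float,
                           segment_times_s: Sequence[float]) -> float:
    """Equation (6): T_predicted = S_NSi / v_bar_i + sum of t_j for the segments ahead."""
    if average_speed_mps <= 0:
        raise ValueError("average speed must be positive")
    return remaining_distance_m / average_speed_mps + sum(segment_times_s)


# Hypothetical case: 300 m left in the current segment at 6 m/s (50 s),
# then two further segments estimated at 182.5 s and 170 s.
# 50 + 182.5 + 170 = 402.5 s
print(predicted_arrival_time(300.0, 6.0, [182.5, 170.0]))
```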

4.1.9. User Density Estimation

To corroborate that the density of users is related to traffic during peak hours on Metrobus Line 6, user density estimates were made at different times and days of the week according to the proposed sampling schedules. The data referring to the number of users in the stations are taken directly from the entry records of each station. The density of users is relevant information to corroborate the Metrobus demand with respect to the available units, and thus to achieve adequate planning. Next, we describe the proposed equations corresponding to the user density calculation.

4.1.10. User Density

The function N (m, n) represents the number of passengers in the unit, as in Equation (7).

N (m, n) = \sum_{i = 1}^{n - 1} (F (m, i) - O (m, i))

(7)

where m is the identifier of the transport unit, n is the station (ID_Station), i indexes the stations before n, F (m, i) is the number of passengers boarding the unit at station i, and O (m, i) is the number of passengers exiting the unit at station i. The number of users is obtained from the entry records of each station.
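
Equation (7) is a running balance of boardings and exits over the stations before n. A short sketch, assuming the per-station counts for one unit are available as two lists ordered by station (the function and argument names are illustrative):

```python
def passengers_on_board(boardings, exits, n):
    """N(m, n): passengers on unit m when it reaches station n (1-based).

    boardings[i - 1] is F(m, i) and exits[i - 1] is O(m, i) for station i.
    Only stations 1 .. n-1 contribute, as in Equation (7).
    """
    return sum(f - o for f, o in zip(boardings[:n - 1], exits[:n - 1]))
```

With boardings `[30, 20, 10, 5]` and exits `[0, 5, 15, 10]`, the unit carries 45 passengers on arrival at station 3 and 40 on arrival at station 4.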

4.1.11. User Density Prediction

The estimate of the user density per station, ρ_{s} (n), is given in Equation (8):

ρ_{s} (n) = \frac{S (n) - \sum_{i = 1}^{m} F (i, n)}{C (n)}

(8)

where S (n) is the number of users that enter station n, F (i, n) is the number of passengers boarding unit i at that station (as in Equation (7)), and C (n) is the user capacity of the station.
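
A minimal sketch of Equation (8): the users who entered the station and have not yet boarded any unit, relative to the station capacity. The function and argument names are assumptions for illustration; the capacity of 320 used in the example is the station capacity reported for Line 6.

```python
def station_density(users_entered, boarded_per_unit, station_capacity):
    """rho_s(n): waiting users at station n as a fraction of its capacity.

    users_entered: S(n)
    boarded_per_unit: F(i, n) for each unit i that has served the station
    station_capacity: C(n)
    """
    if station_capacity <= 0:
        raise ValueError("station_capacity must be positive")
    return (users_entered - sum(boarded_per_unit)) / station_capacity
```

If 500 users entered and three units took 120, 100, and 80 of them, then 200 remain and the density is 200 / 320 = 0.625.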

4.1.12. Unit User Density Prediction

Equation (9) shows the procedure for estimating the density of users in a unit according to the distance traveled, ρ_{B} (m).

ρ_{B} (m) = \frac{\sum_{n \in S_{m}} N (m, n)}{L (m)}

(9)

where N (m, n) denotes the number of users calculated to be on board the unit at each station n \in S_{m}, obtained from Equation (7), and L (m) is the user capacity of the unit. This result allows the number of passengers in each unit to be kept between 160 and 240 users.
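
Equation (9) can be sketched as follows, taking the on-board counts N (m, n) as already computed. The names are illustrative. Note that, as printed, the sum over stations is not divided by the number of stations, so the value can exceed 1 when several stations are included.

```python
def unit_density(on_board_per_station, unit_capacity):
    """rho_B(m): sum of N(m, n) over the stations in S_m, divided by L(m)."""
    if unit_capacity <= 0:
        raise ValueError("unit_capacity must be positive")
    return sum(on_board_per_station) / unit_capacity
```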

4.1.13. User Density Prediction for the Day Period

The user density is predicted to estimate the demand in each period of the day, which allows the number of buses that must be available during the day to be organized. For the density of users according to the period of the day, ρ_{B} (p), Equation (10) is used.

ρ_{B} (p) = \frac{\sum_{m \in p} ρ_{B} (m)}{B (p)}

(10)

where the density of users according to the period of the day is estimated from the sum of the densities of users in the units that have circulated in period p, according to Equation (9), and B (p) is the number of units that were scheduled in period p. In this case, p comprises a period of one hour. ρ_{B} (p) returns the user prediction for a given time or period of the day. This allows an advance planning scheme for Metrobus units to be created according to the period. The density of users offers a broader scenario based on the demand conditions and trip delays. Due to the greater demand for buses, traffic conditions change during the day.
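
A sketch of Equation (10) with one-hour periods, assuming each circulating unit is recorded as an (hour, density) pair and the schedule is a mapping from hour to the number of scheduled units. The data layout and names are assumptions for illustration.

```python
from collections import defaultdict


def period_density(unit_records, scheduled_units):
    """rho_B(p) for each one-hour period p.

    unit_records: iterable of (hour, rho_B_m) pairs, one per unit that circulated
    scheduled_units: dict mapping hour -> B(p), the units scheduled in that hour
    Periods with no scheduled units are skipped.
    """
    totals = defaultdict(float)
    for hour, density in unit_records:
        totals[hour] += density
    return {
        hour: totals[hour] / count
        for hour, count in scheduled_units.items()
        if count > 0
    }
```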

4.1.14. Average User Density Prediction

Equation (11) estimates the average density of users in units when passing through a specific station.

ρ_{B} (n) = \frac{\sum_{m} N (m, n)}{\sum_{m} L (m)}

(11)

where \sum_{m} L (m) is the sum of the load capacity of the units that have passed through station n. Equation (11) gives management statistical data for future actions. This analysis shows how the situation evolves during the day, which allows the segment and period of time in which the most complex traffic conditions occur to be identified. A flow diagram for obtaining the predicted time is shown in Figure 8.
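
Equation (11) can be sketched as the ratio of total passengers to total capacity over the units that passed through a station. The function and argument names are illustrative assumptions.

```python
def average_density_at_station(on_board, capacities):
    """rho_B(n): sum of N(m, n) over units m divided by the sum of L(m)."""
    if len(on_board) != len(capacities):
        raise ValueError("on_board and capacities must have the same length")
    total_capacity = sum(capacities)
    if total_capacity <= 0:
        raise ValueError("total capacity must be positive")
    return sum(on_board) / total_capacity
```

Dividing the two sums, instead of averaging per-unit ratios, weights larger units more heavily, which matches the way Equation (11) is written.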

5. Study Case: Line 6 of Metrobus CDMX

5.1. Scenario Description

Line 6 of the Metrobus, where each transport unit is operated by an assigned driver, was used as a real test scenario. Line 6 has a total of 35 stations along the El Rosario-Villa de Aragón and Villa de Aragón-El Rosario routes. Units wait 15 s at each station and the time spent waiting for traffic lights to change is approximately 10 min on average. The average speed of the units is 40 km/h, which changes according to the traffic conditions in the city and is also measured by a GPS (Galaxy J7) sensor installed in each unit. The Metrobus buses travel in dedicated lanes. In these lanes, there are no disturbances, such as intersections with other buses or private vehicles. The greatest conflict occurs at transfer points that are isolated from these dedicated lanes and from the station in particular. Each unit has a maximum capacity of 160 users, while each station has a capacity of 4 people per square meter of surface. The stations of Line 6 average 80 square meters, giving a capacity of 320 users waiting at the station, or 160 persons waiting to travel in each direction.

5.2. Collected Data Preparation

As a first step for the analysis of the route, measurements were made of the distances between the stations of the selected route. To perform this task, the distances between each station where the units pass were mapped. Since the units travel the same distance, the distance traveled is considered the same for all the units on the route. The data collected include time, latitude, and longitude. To obtain the first distances from the time series model, the data were recorded manually (people in the work area) during a round trip. Together with the information obtained from the matrix of scheduled trips (eight complete trips), the station arrival times, the intervals that exist between stations, and the arrival times at each station are estimated. This information is the initial data entered for training the time series model, and it will be updated as the information is collected through the sensor network. To validate the MAP model, a network of sensors is implemented to estimate tracking, generate itineraries, and analyze public transport route data. We implement the mobile sensor network as a comprehensive traffic data collection system. The system consists of a hardware module (sensors, acquisition card), a mobile application for viewing the results, and remote servers as data managers. The system obtains the travel times between each station of Metrobus Line 6. Once the distances between each station (segment) of the route are registered, the times of the different routes are recorded to later sort the information according to the time of day. Subsequently, the sensor network collects the times of the different routes. The information collected was categorized into the following three different groups: normal traffic conditions, no traffic, and traffic. Figure 9 shows a graph of the routes according to the traffic conditions, and the results are obtained from Equation (1).

According to Figure 9, the blue line corresponds to a trip made under normal traffic conditions. The red line in Figure 9 corresponds to a trip made under abnormal traffic conditions. We observe that as a unit advances along the route, the user demand increases as a result of peak traffic hours present in CDMX. That is, we can deduce that as time progresses, traffic conditions change, increasing passenger demand. The aqua blue line corresponds to a trip made in conditions without traffic. We can see that the number of users remains below the blue and red lines, as measured by user density. In conditions without traffic, the demand for units decreases.

As Figure 9 shows, the more complex the traffic conditions are, the longer the units take to complete the route. Although the local traffic conditions should not affect the Metrobus transport service, due to its dedicated lanes, it was observed that the journey times change drastically according to traffic conditions. This is mainly due to the increase in the number of passengers at the stations, since more passengers boarding and alighting the units causes delays at the stations. Likewise, an increase in vehicles in the city changes the operating conditions of the traffic lights. This can be controlled manually by modifying the times of the lights. On some occasions, the dedicated lane may be invaded by vehicles that impede the circulation of the unit, causing the unit to stop more times than expected. The identified intervals correspond to the traffic schedules that occur during the day. On a business day on the analyzed route, an average of 390 trips are scheduled with an interval of 3 min between each unit.

In Table 3, we show the initial sampling interval in the first column, followed by the initial time of the trip, the final time of the trip, the usual traffic conditions during that time, and, finally, the number of trips made by the Metrobus unit. The estimated times obtained from Equation (1) are shown in the start time and end time columns. From these results, it is concluded that with higher traffic, the demand for buses increases as a result of the user demand at the stations. The hours shown in Table 3 coincide with the peak traffic hours, which occur between 6:00 a.m. and 10:00 a.m. and from 6:00 p.m. to 10:00 p.m. in CDMX [6]. In addition, the times with the least traffic are from 4:30 a.m. to 6:00 a.m. and from 10:00 p.m. to 12:00 a.m. according to [6]. According to Table 3, in the hours with less traffic, the number of trips is reduced because of the lower demand for units, unlike during peak hours, when a greater number of trips and units are required to meet passenger demand.

As shown in Table 3, at times without traffic conditions, 70 trips are made, corresponding to 17.95% of the total trips, while at times under normal conditions, 80 trips are made, corresponding to 20.51% of the total. Finally, at the times under traffic conditions, 240 trips are made, corresponding to 61.54% of the total. It is observed that as traffic conditions become more difficult, the demand for transport increases. For this reason, there is a greater number of buses in operation. This information is considered in the estimation of arrival times because it describes the local traffic conditions that could occur during each interval of the route, as shown in Table 3. With the results presented in Table 3, an a priori estimate of the minimum number of trips that a Metrobus unit must make is obtained. This allows managers to decrease the number of buses on the line during the hours from 4:30 a.m. to 6:00 a.m. and from 10:00 p.m. to 12:00 a.m. on a particular day. Additionally, from Table 3, we can deduce the need for a greater number of buses to be managed during the hours of 6:00–10:00 and 18:00–22:00.
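
The percentages quoted from Table 3 can be reproduced from the trip counts alone. The sketch below uses the three counts given in the text; the function name and dictionary keys are illustrative.

```python
def trip_shares(trips_by_condition):
    """Percentage of the daily trips made under each traffic condition."""
    total = sum(trips_by_condition.values())
    return {
        condition: round(100 * count / total, 2)
        for condition, count in trips_by_condition.items()
    }


# Trip counts reported in the text for a business day (390 trips in total).
shares = trip_shares({"no traffic": 70, "normal": 80, "traffic": 240})
```

Here `shares` holds 17.95, 20.51, and 61.54 for the three conditions, matching the values stated above.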

6. MAP Model Analysis Results

From the above considerations, we analyzed the data collected until this point and proceeded to verify the MAP model. As a result of data collection and storage, a detailed record of each variable to be monitored was obtained. The units were monitored through each node. With the times recorded and stored in the database, tests were carried out with a different number of trips to consider the construction of the autocorrelation matrix of the prediction model. For each number of routes considered, we performed tests with a different number of N samples to obtain the arrival times. The combinations considered and the results obtained are presented in Table 4. In Table 4, the first column represents the test number. The second column is the trip number considered in the autocorrelation matrix. The third column is the number of samples. Subsequently, the next column represents the mean square error (MSE) value obtained, and, finally, the last column is the standard deviation (SD) of each test performed. These last measurements obtained represent the general variation of the training.

From Table 4, we observe that according to the MAP model, as we perform training, information is generated together with the number of samples. We observe from Table 4 that the historical data introduced in MAP in the first tests present a high dispersion with respect to the average value of the duration of the trips of a journey. In addition, as the data are updated, the dispersion of these tends to approach the average value of the travel times. This would be interpreted as the average of the approximations to the values shown by the traffic platforms in real time.

The best result is obtained when trip = 300 for the autocorrelation matrix, with N = 50, and we obtain an MSE value of 0.0272 s and SD = 0.1650. This shows that as the MAP model is fed the data collected by the sensor network, the estimated arrival approximation of each Metrobus unit at each station will present a result close to that obtained by the real-time apps Moovit and Google Maps. We can also observe from Table 4 that the value obtained from the MSE improves as the sample and trip numbers increase. Graphically, the results obtained are shown in Figure 10 and Figure 11. The results obtained from the estimation of speed are obtained from Equations (3), (5) and (6).

From these graphs, we can observe the behavior of the results obtained by the MAP model. For the cases studied, 50 trips and 50 previous samples are considered, and 300 trips and 50 samples are considered. From Figure 10, we observe that using the results obtained in the first 50 trips, the MAP model tends to slowly learn the behavior of the speed variable. The value of this variable will be adjusted as the data are entered into the MAP model. In this way, the MAP model will begin to generate knowledge of the behavior of the units in that segment of the route. As data are fed into the MAP model, more accurate knowledge is generated. This knowledge will support the decision making of both the users and the administrative managers of the Metrobus units, as shown in Figure 11. Figure 10a shows the real speed vs. the estimated speed from the linear prediction made. Figure 10b shows the real time vs. the estimated time, also obtained by linear prediction with 50 samples used. Figure 11a shows the real speed vs. estimated speed based on the linear prediction made. Figure 11b shows the real time vs. the estimated time, also obtained by the linear prediction with 300 recorded samples.

Figure 10a and Figure 11a show the suggested average speed of 40 km/h for the Metrobus units in the dedicated lanes.

From Figure 11, we observe that the results obtained from the model (red curve) are similar to the values obtained in the field (blue curve). This observation is verified from the MSE value, which is 0.02724 with an SD of 0.1650. As mentioned above, the linear prediction allows us to reflect the behavior of the analyzed phenomenon. In this case, the delay behavior that can occur on the routes due to the traffic conditions present in CDMX is reflected. Once the combination of values to be used in the linear prediction model was selected, tests were continued in the route model. To carry out the linear prediction implemented in the MAP model, 390 complete trips are considered. These trips are those scheduled on a normal business day. The complete route includes Villa de Aragón as a starting point and El Rosario station as a destination; see Table 3. The data obtained with the MAP model are compared with the estimates of the Google Maps and Moovit applications, which are commonly used by drivers in CDMX. The data and traffic points presented in these applications are subjective because they are obtained from the opinions and experiences of users on their trips. Table 5 shows the results obtained from the real-time traffic apps, such as Google Maps and Moovit, the result programmed by Metrobus and the MAP model result. To quantitatively evaluate the results obtained from the proposed architecture, we obtained the MAE, the mean absolute percentage error (MAPE), and the root mean square error (RMSE) [7].

MAE = \frac{\sum_{l = 1}^{N} | t_{observed} - t_{predicted} |}{N}

(12)

MAPE = 100 * \frac{1}{N} \sum_{l = 1}^{N} \frac{| t_{observed} - t_{predicted} |}{t_{observed}}

(13)

RMSE = \sqrt{\frac{\sum_{l = 1}^{N - 1} {(t_{observed} - t_{predicted})}^{2}}{N - 1}}

(14)

where t_{observed} is the observed bus travel time, t_{predicted} is the predicted bus travel time, and N is the number of bus trips observed [7]. The results are shown in Table 5.
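
The three error measures can be sketched as below. Equation (14) as printed uses N − 1 both as the upper limit of the sum and as the divisor; this sketch sums over all N trips and divides by N − `ddof`, with `ddof = 1` by default, which is one reading of that equation and is stated here as an assumption. Setting `ddof = 0` gives the conventional RMSE. The function names are illustrative.

```python
import math


def mae(observed, predicted):
    """Mean absolute error, following Equation (12)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error, following Equation (13)."""
    ratios = (abs(o - p) / o for o, p in zip(observed, predicted))
    return 100 * sum(ratios) / len(observed)


def rmse(observed, predicted, ddof=1):
    """Root mean square error with divisor N - ddof (assumed reading of Eq. (14))."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / (len(observed) - ddof))
```

All three functions expect two sequences of equal length with travel times in the same unit; `mape` additionally requires every observed time to be nonzero.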

From Table 5, we obtain that the time deviation with respect to what is programmed for the Moovit application is ±12.188 min of travel; for Google Maps, a deviation of ±12.149 min is obtained; and, finally, for MAP, based on 390 trips obtained from the autocorrelation matrix, a time deviation of ±10.85 min was obtained with respect to the programmed result. From Table 5 and as shown in Figure 10a,b and Figure 11a,b, it can be concluded that as data are entered through the sensor network, MAP feeds on the dynamic events that occurred in the days, hours, and months that were monitored. The result obtained reflects the behavior of what happened that day in relation to traffic, similar to what is observed with the results obtained with the Moovit and Google Maps platforms.

During some hours, the MAP results differ from those of the real-time traffic monitoring platforms. However, MAP provided an approximation of what happened on the monitored days that is comparable to the real-time results of these platforms. These platforms are fed by users as the traffic conditions occur. MAP, by contrast, yields its result ahead of the actual time, so it can be compared with these platforms and can also provide a forecast overview.

We observe that the RMSE results obtained by the MAP model compared to these platforms decrease as the number of trips per unit increases. These platforms collect data entered by users at certain points and times in situ, unlike the MAP model, which shows information related to the day, time, and season using data collected through the sensor network to make a prediction. Subsequently, the density estimation model was evaluated. The tests were carried out with data acquired from the sensor network.

The first scenario evaluated involved obtaining user density information from San Bartolo station; the results obtained from Equation (8) are shown in Figure 12. For the estimation of user density, the following three scenarios were considered: no-traffic hours between 04:30–06:00 and 22:00–24:00, normal traffic hours between 10:00–12:00 and 16:00–18:00, and abnormal traffic hours between 06:00–10:00 and 18:00–22:00. In Figure 12a, we observe the user density behavior in a station under no-traffic conditions. The distribution is uniform, with few users, and the station is at less than 50% of its capacity. Figure 12b shows the user density behavior under normal traffic conditions. In this scenario, an increase in users is observed, and the highest points of the graph show stations close to reaching 100% of their capacity. Finally, Figure 12c shows the graph obtained under abnormal traffic conditions. We can observe that the density of users per station is 50% over its allowed capacity. From these results, it can be concluded that the user density values in the Metrobus stations correspond directly to the vehicular traffic present, which may be due to the peak hours in CDMX. These data can be tied to the results shown in Table 5 and Figure 13 and Figure 14, which provide a clearer picture of the possible scenarios present at peak traffic hours, such as user density and bus speed. To obtain the user density in each unit, the three previous scenarios of no traffic, normal traffic, and abnormal traffic were considered. From Figure 13, we observe that the user density per unit increases depending on the conditions present. This result coincides with that obtained for the user density in the station, as observed in Figure 12.

From Equation (9), we obtain Figure 13. Figure 13a shows the density of users in a unit under no-traffic conditions, which is at 30% of its capacity on average. Under normal traffic conditions, it reaches 65% of the average capacity (Figure 13b). The data obtained reflect the behavior observed throughout the operation of the service in CDMX. The demand depends on the traffic conditions of the day. Continuing with the MAP model analysis, the user density model was evaluated by the period of the day. As Figure 14 demonstrates, the data reflect the behavior analyzed on the routes. The results allow the periods of the day and the traffic conditions where there is a greater density of users, that is, a greater demand for service, to be identified. Within the hours of 06:00–10:00, 12:00–16:00, and 18:00–22:00, the highest user density on the route was identified, which was greater than 80% of the service capacity. Finally, the density model was evaluated, through which it is possible to identify the stations on the route where users board and exit more frequently. In Figure 14, the station with the greatest boarding is the Villa de Aragón station, while the station with the greatest exiting is the El Rosario station. To validate the results obtained from the MAP model, tests were performed simultaneously with the mobile sensor network installed in the San Bartolo, Politécnico, Rio Bamba, and 18 de Marzo stations of Metrobus Line 6.

To validate the performance of the proposed work, the results obtained in terms of the MAPE, MAE, and RMSE are compared. To obtain a comparison of the results obtained with a real scenario, a performance analysis of MAP vs. programmed task, Moovit vs. programmed task and Google Maps vs. programmed task was carried out. These are the latest real-time traffic monitoring applications. To carry out the validations, samples were taken during the morning and evening hours. The results obtained in these pieces of work and those obtained in this research are shown in Table 5.

From the results shown in this section, we observed that the collaborative work between MAP and the sensor network generates knowledge to support the programming of the Metrobus itinerary at different times, seasons, and days. For Metrobus Line 6, the closest match of the arrival times obtained from MAP shows an error of 0.57% at 7:58 a.m., compared to the scheduled arrivals for the day. This result is obtained by training the model with the data collected during 300 trips with 50 samples, against the 2% error obtained at 5:27 a.m. with a 1-trip training of the model. The proposed architecture provides a tool to optimize Metrobus resources: with prior knowledge of the traffic prediction scheme according to the schedule, season, and day, the number of vehicles available on the route can be controlled, and knowledge of the user demand due to the traffic generated in the area can be obtained.

The results obtained show that as the samples and the number of trips made by the buses increase, the model is fed, and the results become closer to the information obtained by traffic monitoring applications, such as Google Maps and Moovit. With MAP, an MSE of 0.05% was obtained for the prediction of the arrival time and 10.5% for the user density; these error ranges are due to the great variety of scenarios that occur in CDMX. We also observe that as we increase the number of trips in the autocorrelation matrix, the MAE and MAPE values decrease. The lowest MAP performance occurred at 4:30 a.m., with MAE = 0.2800 and MAPE = 0.6500%. These results were obtained with p = 50 and trip = 1, compared with MAE = 0.0800% and MAPE = 0.1300% obtained by the Moovit platform at 4:35 a.m. At the end of the test day, at 23:57, the results obtained from MAP are as follows: MAE = 0.1800%, MAPE = 0.2900%, and RMSE = 1.2857. Additionally, the predictions coincide with the value obtained by the Moovit platform. We also observed that the result with the lowest performance in terms of % error occurred at 4:30 for MAP, with RMSE = 2, and the best performance in terms of % error was RMSE = 0.0057 at 6:57 h, with trip = 50 and p = 50. Finally, it is observed that from the beginning of the samples obtained, the MAP approximations improve and the error decreases according to the traffic conditions monitored by the mobile sensor network, due to the data collected and processed. Table 2 also shows the details related to the MAP model proposed in this work. MAP is a model based on time series; although its training begins with historical data, these are updated in situ through the hardware architecture proposed in this work.
The advantage of MAP over the pieces of work presented lies in the minimal use of variables for its training (ID_POSITION and ID_VELOCITY) to obtain the arrival times between line segments, unlike the other algorithms that use RNN, FK, and prediction rules. Another important feature to highlight is the amount of input data that these algorithms require to obtain the estimate in situ, restricting their application in the field.

7. Discussion

Based on the tests carried out using the MAP model, we observe that with the development of a mobile sensor network, it is possible to establish communication between information sources (mobile and static nodes) for the monitoring of the Metrobus network. Additionally, with the MAP model, the monitoring network can be managed and exploited to generate time, resource consumption, and user density predictions. The results show that the analyses describe the phenomenon studied by the MAP model, offering information for decision-making from the same source of information. From Table 4, we were able to verify that MAP can start its arrival estimation from the historical data obtained by field observations. As the data are collected through the sensor network, the estimates approach those produced by real-time traffic monitoring platforms. In the first observations made, the dispersion with respect to the trip averages is SD = 0.2313 in the worst case, corresponding to the first trip made during the day. As more trips are made, as observed in trip 16 with 300 samples, the dispersion value with respect to the average value of arrival times is SD = 0.1650. This represents an improvement of almost 30% over the initial samples (from 0.2313 to 0.1650, about 28.7%), since the data are collected in the field. MAP also reduces the number of variables needed to obtain arrival estimates, unlike the approaches in the state of the art, which implement complex computational methods.

MAP estimates Metrobus arrival times, and these data are relevant for trip planning aimed at both the user and the service provider. With the estimation of arrival times, a transfer relationship can be established, allowing a possible service demand scenario during the working day to be modeled. With this parameter, the number of buses available can be made more flexible by hour, day, and month of the year. Additionally, MAP can obtain the estimated speed from the Metrobus data, which correlates with the current traffic in CDMX. This can be used in calculations of delays in transfers, as well as waiting times for users between each station on Line 6. Unlike the work presented in the literature, the contribution of MAP lies in the fact that it simultaneously shows the delay times using the smallest number of variables, which speeds up the response times in situ, thanks to the dynamic data collected by the sensor network.

Based on the proposal made to estimate the arrival times of the buses at the Line 6 stations, the following relevant points were observed: (i) using the proposed time series with the variables of bus speed and the distance between segments, it is possible to obtain a time prediction. The lowest MAP performance occurred at 4:30 a.m., with MAE = 0.2800 and MAPE = 0.6500%. These results were obtained with p = 50 and trip = 1, compared with MAE = 0.0800% and MAPE = 0.1300% obtained by the Moovit platform at 4:35 a.m. At the end of the test day, at 23:57, the results obtained from MAP are as follows: MAE = 0.1800%, MAPE = 0.2900%, and RMSE = 1.2857. We also observed that the result with the lowest performance in terms of % error occurred at 4:30 for MAP, with RMSE = 2, and the best performance in terms of % error was RMSE = 0.0057 at 6:57 h, with trip = 50 and p = 50.

At this point, it is important to highlight that as the historical data are processed in conjunction with data collected through the network of sensors on Line 6, the results are closer to those presented by the traffic apps operating in real time, such as Moovit and Google Maps.

(ii) With the basic elements implemented in the architecture of the mobile sensor network in this work, the necessary data are acquired to feed the time series proposed for the prediction of arrival time.

(iii) We can validate that the time series adapts to the traffic conditions present on the day, at the time, and during the season of the year.

It should be noted that some of the example cases use databases with the history of the trips to obtain the arrival predictions. In other work, they make comparisons of the results shown with traffic applications, such as Google traffic. These pieces of work present results in times that are lower than the historical ones. In the particular case of this proposal, we present an in situ sensor network that collects and sends the data to the MAP model to estimate the time of arrival.

The MAP model yields results by combining sensor data with the time series data, presenting encouraging results for this class of travel planning applications in CDMX. This is unlike more complex models that require robust computer equipment and imply a higher computational cost in complex operations, as well as the condition of having a greater number of variables for their respective training.

With the prediction obtained, information is provided for various profiles of end users, such as Metrobus transport users, people who plan trips to be made on a specific day, and resource and bus managers, because with the schedule, day, and season, it is possible to improve the management of the units. We observe that the implemented time series yields useful results for resource management, decision-making for bus boarding, and trip planning. Compared with existing research, it gives approximations worth exploiting because of its precision and the minimal errors that it presents. These time series can be adapted naturally to the traffic conditions according to the season without performing new training, as required by neural networks. Finally, in CDMX, there is no free or low-cost tool that helps users and concessionaires of this type of transport with travel management. In the daily life of CDMX, where traffic, natural conditions, and school and work schedules are all highly variable, a tool of this type is very useful.

8. Conclusions and Future Research

In this work, the implementation of an architecture for a comprehensive management system to support the decision making of Line 6 of the Metrobus is proposed. The proposed work consists of the following two main components: the architecture of a mobile sensor network and the MAP model, which is adaptable to the different traffic situations on Metrobus Line 6 in CDMX. The model is able to capture the environment in which it operates during the travel days, at different times, in holiday seasons, and during the entry and exit of workers and students at different times of the day. As can be seen, the best result was obtained at the end of the day, with MAE = 0.1800%, MAPE = 0.2900%, and RMSE = 1.2857. The worst result was obtained at the beginning of the day, with MAE = 0.2800 and MAPE = 0.6500%. The results obtained from these components allow comparison with real-time traffic monitoring platforms, such as Moovit and Google Maps. These results are close to those of the monitoring platforms, which indicates that as data are collected and the time series is trained, values close to those measured in real time will be obtained, allowing the generation of knowledge close to the traffic of the day during a particular period. Based on the proposed mobile sensor network, it is possible to identify how the information components are integrated according to the interests of the users. From these data, the MAP model generates a general panorama of traffic behavior in the area through which Line 6 of the Metrobus runs in CDMX. With these estimates, users and operators will be able to establish transfer schemes, trajectories, and the number of units that provide service, among other possible benefits. Moreover, the MAP model generates greater knowledge as more trips are made and as the number of samples increases.
The reduced use of external variables to obtain the Metrobus arrival times makes the application feasible for on-site use by users. In addition, continuous updating through dynamic data generates relevant information for decision-making. The approaches in the literature require multiple input data, such as routes, ID_stop, stay times, and transit times, which can delay the arrival estimates in applications that require a timely response. MAP requires only the longitude, latitude, and speed values to estimate the arrival times of the buses at each station on Line 6. As an area of opportunity in the proposed work, estimates can be made of the zones and times that present the highest rate of accidents or delays derived from unforeseen situations. An estimation model that yields this type of information would support the corresponding incident-response areas by locating more quickly the points that need support. Another area of opportunity is that MAP can gradually integrate additional variables, such as people counting and the numbers of passengers boarding and alighting the buses, to provide a greater diversity of transportation alternatives. A further area of opportunity observed in the development of this research concerns the capacity of the data-processing equipment: Metrobus generates a large amount of data and therefore requires considerable computing capacity.
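To illustrate how little input such an estimate needs, the following minimal sketch computes a naive arrival time from a bus position, a station position, and the current speed. It is an illustration under stated assumptions, not the MAP model itself: it uses the great-circle (haversine) distance instead of the distance along the Line 6 segments, it assumes that the speed stays constant, and the function names `haversine_m` and `eta_seconds` as well as the sample coordinates and speed are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def eta_seconds(bus_lat, bus_lon, station_lat, station_lon, speed_m_s):
    """Naive arrival estimate: straight-line distance divided by a constant speed."""
    if speed_m_s <= 0:
        raise ValueError("speed must be positive to estimate an arrival time")
    return haversine_m(bus_lat, bus_lon, station_lat, station_lon) / speed_m_s


# Hypothetical bus and station positions and a hypothetical speed of 8 m/s.
eta = eta_seconds(19.464422, -99.149284, 19.489239, -99.160000, speed_m_s=8.0)
print(f"Estimated arrival in {eta / 60:.1f} min")
```

A route-aware estimate would replace the straight-line distance with the remaining distance along the route segments and the constant speed with a predicted speed per segment.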

Author Contributions

M.A.D.-C.: formal analysis and investigation; B.E.C.-G.: conceptualization, investigation, supervision, and project administration; O.G.-F.: investigation, review, and editing; and F.S.O.-Z.: investigation, review, editing, and validation. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Secretaría de Educación, Ciencia, Tecnología e Innovación of México City under grants SECITI/072/2016, SECTEI/226/2019, and SECTEI/249/2021, and by projects SIP 20210178, SIP 20220542, and SIP 20220632 of the Instituto Politécnico Nacional. The APC was funded by the Secretaría de Educación, Ciencia, Tecnología e Innovación of México City under grants SECITI/072/2016, SECTEI/226/2019, and SECTEI/249/2021, and by projects SIP 20210178, SIP 20220542, and SIP 20220632 of the Instituto Politécnico Nacional.

Acknowledgments

We thank the Instituto Politécnico Nacional for the SIP Project 20210178-20220542, the Secretaría de Educación, Ciencia, Tecnología e Innovación of the CDMX for the projects SECITI/072/2016, SECTEI/226/2019, and SECTEI/249/2021, and the Metrobus of the CDMX for help and support during the work. The working group also thanks all the anonymous authors who made available the icons presented in this document, such as the Freepik page for the bus image https://www.freepik.es/vectores/autobus and station image https://www.freepik.es/fotos/publicidad-calle (accessed on 13 August 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Miles, J. Intelligent Transport Systems: Overview and Structure (History, Applications, and Architectures). In Encyclopedia of Automotive Engineering, Intelligent Transport System; Chapter 3; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2014; pp. 1–16.
2. Xiong, Z.; Sheng, H.; Rong, W.; Cooper, D. Intelligent transportation systems for smart cities: A progress review. Sci. China Inform. Sci. 2012, 55, 2908–2914.
3. Veres, M.; Moussa, M. Deep Learning for Intelligent Transportation Systems: A Survey of Emerging trends. IEEE Trans. Intell. Transp. Syst. 2020, 21, 3152–3168.
4. Mendoza, A.; García, A. Potential Applications of Intelligent Transportation Systems to Road Freight Transport in Mexico. Transp. Res. Rec. 2000, 1707, 81–85.
5. Hidalgo, D.; Graftieaux, P. Bus Rapid Transit Systems in Latin America and Asia Results and Difficulties in 11 Cities. Transp. Res. Rec. 2008, 2072, 77–88.
6. TomTom. México City Traffic. Available online: https://www.tomtom.com/en_gb/traffic-index/mexico-city-traffic/ (accessed on 19 July 2022).
7. Parliamentary Research Institute. 2020. Available online: http://aldf.gob.mx/archivo-9f6f5328e0f0853d4453d481cbffa2b6.pdf (accessed on 15 July 2022).
8. United Nations. Available online: https://www.un.org/en/events/citiesday/assets/pdf/the_worlds_cities_in_2018_data_booklet.pdf (accessed on 6 July 2022).
9. National Institute of Statistic and Geography (INEGI). Transporte de Pasajeros. 2021. Available online: https://www.inegi.org.mx/temas/transporteurb/ (accessed on 22 July 2022).
10. Demuynck, T. Bounding average treatment effects: A linear programming approach. Econ. Lett. 2015, 137, 75–77.
11. Global BTR DATA. Available online: https://brtdata.org/location/latin_america/mexico/mexico_city (accessed on 4 March 2022).
12. Metrobus. Available online: https://www.metrobus.cdmx.gob.mx/mapas-de-sistema/mapa-linea-6/L6-mapasbarrio (accessed on 14 September 2022).
13. Pang, J.; Huang, J.; Du, Y.; Yu, H.; Huang, Q.; Yin, B. Learning to Predict Bus Arrival Time from Heterogeneous Measurements via Recurrent Neural Network. IEEE Trans. Intell. Transp. Syst. 2018, 20, 3283–3293.
14. Petersen, N.C.; Rodrigues, F.; Camara Pereira, F. Multi-output bus travel time prediction with convolutional LSTM neural network. Expert Syst. Appl. 2019, 120, 426–435.
15. Wall, Z.; Dailey, D.J. An algorithm for predicting the arrival time of mass transit vehicles using automatic vehicle location data. In Proceedings of the 78th Annual Meeting of the Transportation Research Board, Washington, DC, USA, 10–14 January 1999; pp. 279–288.
16. Jeong, R.; Rilett, L.R. Bus arrival time prediction using artificial neural network model. In Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems, Washington, DC, USA, 3–6 October 2004; pp. 988–993.
17. He, P.; Jiang, G.; Lam, S.; Tang, D. Travel-time prediction of bus journey with multiple bus trips. IEEE Trans. Intell. Transp. Syst. 2018, 20, 4192–4205.
18. Hua, X.; Wang, W.; Wang, Y.; Ren, M. Bus Arrival Time Prediction Using Mixed Multi-route. Transport 2018, 33, 543–554.
19. Fan, Y.; Guthrie, A.; Levinson, D. Waiting time perceptions at transit stops and stations: Effects of basic amenities, gender, and security. Transp. Res. Part A 2016, 88, 251–264.
20. Yu, B.; Yang, Z.Z.; Chen, K.; Yu, B. Hybrid model for prediction of bus arrival times at next station. J. Adv. Transp. 2010, 44, 193–204.
21. Yu, Z.; Gao, W.; Zuo, X. Design of Novel Intelligent Transportation System based on Wireless Sensor Network and ZigBee Technology. Sens. Transducers 2013, 156, 95–102.
22. Vanajakshi, L.; Subramanian, S.C.; Sivanandan, R. Travel time prediction under heterogeneous traffic conditions using global positioning system data from buses. IET Intell. Transp. Syst. 2009, 3, 1–9.
23. Maa, J.; Chana, J.; Ristanoski, G.; Rajasegarar, S.; Leckied, C. Bus travel time prediction with real-time traffic information. Transp. Res. Part C 2019, 105, 536–549.
24. Yuan, Y.; Shao, C.; Cao, Z.; He, Z.; Zhu, C.; Wang, Y.; Jang, V. Bus Dynamic Travel Time Prediction: Using a Deep Feature Extraction Framework Based on RNN and DNN. Electronics 2020, 9, 1876.
25. Kumar, B.; Jairam, R.; Arkatkar, S.; Vanajakshi, L. Real time bus travel time prediction using k-NN classifier. Transp. Lett. 2017, 11, 362–372.
26. Kumar, B.; Vanajakshi, L.; Subramanian, S. Bus travel time prediction using a time-space discretization approach. Transp. Res. Part C 2017, 79, 308–332.
27. Chien, S.I.-J.; Ding, Y.; Wei, C. Dynamic Bus Arrival Time Prediction with Artificial Neural Networks. J. Transp. Eng. 2002, 128, 429–438.
28. Cats, O.; Loutos, G. Real-Time Bus Arrival Information System: An Empirical Evaluation. J. Intell. Transp. Syst. 2015, 2, 138–151.
29. He, P.; Jiang, G.; Lam, S.; Sun, Y. Learning heterogeneous traffic patterns for travel time prediction of bus journeys. Inf. Sci. 2020, 512, 1394–1406.
30. Jisha, R.C.; Aiswa, R.; Sajitha, K. IoT based school bus tracking and arrival time prediction. In Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India, 13–16 September 2017; pp. 1–6.
31. Zhou, Y.; Yao, L.; Chen, Y.; Gong, Y.; Lai, J. Bus arrival time calculation model based on smart card data. Transp. Res. Part C 2017, 74, 81–96.
32. Gurmu, Z.; Fan, W. Artificial Neural Network Travel Time Prediction Model for Buses Using Only GPS Data. J. Public Transp. 2014, 17, 45–65.
33. Xu, H.; Ying, J. Bus arrival time prediction with real-time and historic data. Cluster Comput. 2017, 20, 3099–3106.
34. Zhang, J.; Yu, X.; Tian, C.; Zhang, F.; Tu, L.; Xu, C. Analyzing passenger density for public bus: Inference of crowdedness and evaluation of scheduling choices. In Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), Qingdao, China, 8–11 October 2014.

Figure 1. Line 6 Metrobus. Reprinted with permission from Ref. [12].

Figure 2. Proposed architecture of the developed model.

Figure 3. Network architecture for two stations.

Figure 4. Interconnection diagram of the nodes, including the mobile node placed in the Metrobus bus and the static node placed in each station.

Figure 5. Main screen for presenting information: (a) arrival forecast selection screen, (b) location selection screen, (c) current station display, and (d) obtained forecast display screen.

Figure 6. Speed comparison: (a) first trip; (b) second trip.

Figure 7. Segments of Metrobus Line 6.

Figure 8. MAP prediction arrival time model flow diagram.

Figure 9. Relationship of the distance traveled vs. the traffic density of each segment that makes up Line 6.

Figure 10. Linear prediction considering 50 trips and 50 previous samples: (a) real speed vs. estimated speed; (b) real time vs. estimated time.

Figure 11. Linear prediction considering 300 trips and 50 previous samples: (a) estimated speed vs. real speed on a segment; (b) estimated time vs. real time on a segment.

Figure 12. Density of users in a station: (a) without traffic conditions, (b) normal traffic conditions, and (c) with traffic conditions. See Table 3.

Figure 13. Density of users in a unit: (a) units without traffic conditions and (b) units under normal traffic conditions. See Table 3.

Figure 14. Load density of users on the Villa de Aragón-El Rosario route.

Table 1. Brief summary of state-of-the-art features and MAP.

Author	Prediction Estimation	Architecture Implemented	Model	Dataset	Case Scenario	Input Variables	Prediction to Line or Segment	Type Data Collected
Kumar A. et al. [25]	Bus arrival prediction	GPS-GPRS	K-nearest neighbors (KNN)–Kalman filtering	Metropolitan Transportation Corporation. Chennai, India	Route 19B and Route 5C	Latitude, longitude and timestamp data from the database	Segment	Static
Kumar A. et al. [26]	Bus arrival prediction	GPS-Advanced Public Transportation System (APTS)	Kalman filtering	Metropolitan Transportation Corporation. Chennai, India	Route 19B and Route 5C	Similar longitude and latitude patterns from previous trips	Segment	Static
Chien S. et al. [27]	Bus arrival prediction	GPS-Automatic Vehicle Location (AVL)	Artificial neural network (ANN)–error adaptive algorithm	Not available; CORSIM-based simulation	Route 39 of New Jersey Transit Corporation	Data based on links, stops and arrival-departure times	Segment	Static
Cats O. et al. [28]	Bus arrival prediction	GPS-AVL	Prediction rules based on instantaneous position data	Public transport in Stockholm, Sweden	-	trip_ID, stop_ID, arrival-stop times (actual, scheduled)	Segment	Static
Peilan H. et al. [29]	Bus arrival prediction	GPS (latitude, longitude, timestamp, bus line ID)	Non-negative matrix factorization (NMF)–long short-term memory (LSTM)	Land Transport Authority, Singapore	1–3 lines in Singapore. Not defined	latitude, longitude, timestamp, bus line ID, travel distance	Segment	Static
Zhou Y. et al. [30]	Bus arrival prediction	Smart card historical database	Analysis of card-swiping time distribution	Beijing Transport Operation Coordinate Center. Beijing, China	Line 486, Beijing, China	Card entry	Line	Static
Pang J. [13]	Bus arrival prediction	GPS installed on buses	Recurrent neural network (RNN)-LSTM-gated recurrent unit (GRU)	No information available	Beijing, China	GPS data from buses (longitude, latitude and timestamp) and the static information about the road network	Segment	Static–Dynamic
Jisha R.C. [23]	Bus arrival tracking and prediction	GPS installed on buses	Kalman filtering	GPS receiver in the bus	Haripad to Vallikavu, India	GPS (position, speed, time, distance)	Segment	Static–Dynamic
Maa J. [31]	Bus arrival prediction	GPS installed on the buses/taxis and smart card records	Time series	Historical data GPS installed on buses/taxis and smart card records	Xi’an, China	GPS data from buses (longitude, latitude and timestamp), bus route, bus ID and smart card demand	Line	Static
Kebede Gurmu Z. et al. [32]	Bus arrival prediction	Data collected AVL	ANN—gradient with moment weight and bias learning function	Historical GPS data collected	LT 11. Macae, Brazil.	GPS data	Segment	Static
Xu H. [33]	Bus arrival prediction	GPS on buses	Time series	Historical GPS data collected	Hangzhou, Zhejiang	GPS data	Segment	Static
Yuan Y. [24]	Bus arrival prediction	GPS-AVL	Recurrent neural network (RNN) and deep neural network (DNN)	Historical GPS data collected	Guangzhou 226 bus and Shenzhen 113 bus	GPS data	Segment	Static
Proposed model MAP	Bus arrival prediction	GPS (ID_posición, ID_velocidad)	Time series	Historical and collected data GPS in situ	Line 6 MB, Mexico City	GPS data	Segment	Static–Dynamic

Table 2. Example of the format of the processed data.

Station	Timestamp Start	Date Start	Timestamp Finish	Date Finish	Latitude	Longitude
1	1543296842	26 November 2018, 23:34:02	1543299112	27 November 2018, 00:11:52	19.342571	−98.99567
2	1543354955	27 November 2018, 15:42:35	1543356515	27 November 2018, 16:08:35	19.385864	−99.20839
3	1543356755	27 November 2018, 16:12:35	1543357659	27 November 2018, 16:27:39	19.53792	−99.19847
4	1543358195	27 November 2018, 16:36:35	1543358859	27 November 2018, 16:47:39	19.464422	−99.149284
5	1543359099	27 November 2018, 16:51:39	1543359688	27 November 2018, 17:01:28	19.539156	−99.19683
6	1543360270	27 November 2018, 17:11:10	1543360895	27 November 2018, 17:21:35	19.384357	−98.98343
7	1543361157	27 November 2018, 17:25:57	1543361589	27 November 2018, 17:33:09	19.538795	−99.19727
8	1543363007	27 November 2018, 17:56:47	1543363907	27 November 2018, 18:11:47	19.397495	−98.9951
9	1543364868	27 November 2018, 18:27:48	1543365408	27 November 2018, 18:36:48	19.489239	−99.489296
10	1543365677	27 November 2018, 18:41:17	1543367169	27 November 2018, 19:06:09	19.538876	−99.19711
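The Timestamp and Date columns of Table 2 carry the same information in two forms. As a minimal sketch of that relationship, the snippet below converts the Unix timestamps of the first row to local time and derives the elapsed time of the record. The fixed UTC−6 offset is an assumption that matches the dates listed in the table for November 2018, and the function name `describe_record` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-6 offset, consistent with the dates listed in Table 2.
CDMX_TZ = timezone(timedelta(hours=-6))


def describe_record(ts_start, ts_finish):
    """Return (start, finish, duration in seconds) for one record of Table 2."""
    start = datetime.fromtimestamp(ts_start, tz=CDMX_TZ)
    finish = datetime.fromtimestamp(ts_finish, tz=CDMX_TZ)
    return start, finish, ts_finish - ts_start


# First row of Table 2.
start, finish, duration_s = describe_record(1543296842, 1543299112)
print(start.strftime("%Y-%m-%d %H:%M:%S"))   # 2018-11-26 23:34:02
print(finish.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-11-27 00:11:52
print(duration_s)                            # 2270 (37 min 50 s)
```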

Table 3. List of route schedules with traffic conditions.

Interval Number	Initial Time	Final Time	Traffic Conditions	Number of Trips
1	04:30	06:00	Without traffic	30
2	06:00	10:00	With traffic	80
3	10:00	12:00	Normal conditions	40
4	12:00	16:00	With traffic	80
5	16:00	18:00	Normal conditions	40
6	18:00	22:00	With traffic	80
7	22:00	00:00	Without traffic	40

Table 4. Calculation of MSE and SD for different combinations of routes and considering previous samples.

Test Number	Trip Number Considered for Autocorrelation Matrix	Number N of Previous Samples	MSE (s)	SD
1	50	10	0.05352	0.2313
2	50	20	0.05145	0.2268
3	50	30	0.04946	0.2223
4	50	50	0.03966	0.1991
5	100	10	0.03676	0.1917
6	100	20	0.03555	0.1885
7	100	30	0.03448	0.1856
8	100	50	0.02973	0.1724
9	200	10	0.03143	0.1772
10	200	20	0.03056	0.1748
11	200	30	0.02959	0.1720
12	200	50	0.02876	0.1695
13	300	10	0.03025	0.1739
14	300	20	0.02997	0.1731
15	300	30	0.02888	0.1699
16	300	50	0.02724	0.1650
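Table 4 varies the number of trips used to build the autocorrelation matrix and the number N of previous samples used by the linear predictor. As a hedged illustration of this kind of predictor (a standard textbook formulation, not necessarily the authors' exact implementation), the sketch below estimates the autocorrelation of a mean-removed speed series, solves the Toeplitz normal equations with the Levinson-Durbin recursion, and predicts the next sample. The function names and the speed values are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation estimate r[0..max_lag] of a sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for the predictor coefficients.

    Returns (a, err), where x_hat[n] = sum(a[k] * x[n - 1 - k]) and err is
    the residual prediction-error power. Requires r[0] > 0.
    """
    a, err = [], r[0]
    for m in range(order):
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        k_m = acc / err  # reflection coefficient of stage m + 1
        a = [a[k] - k_m * a[m - 1 - k] for k in range(m)] + [k_m]
        err *= 1.0 - k_m * k_m
    return a, err


def predict_next(history, order):
    """Predict the next sample from the last `order` mean-removed samples."""
    mean = sum(history) / len(history)
    centered = [v - mean for v in history]
    a, _ = levinson_durbin(autocorrelation(centered, order), order)
    return mean + sum(c * centered[-1 - k] for k, c in enumerate(a))


# Hypothetical segment speeds (km/h) from previous samples.
speeds_kmh = [32.0, 30.5, 28.0, 27.5, 29.0, 31.0, 33.5, 32.5, 30.0, 28.5]
print(predict_next(speeds_kmh, order=3))
```

In this formulation, `order` plays the role of the number N of previous samples, and a longer history gives a more stable autocorrelation estimate, which is in line with the decreasing MSE that Table 4 reports as more trips are considered.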

Table 5. Comparison of the prediction errors for MAP vs. Programmed Task, Moovit vs. Programmed Task, and Google Maps vs. Programmed Task.

Trip Matrix	Samples	Time Departure	Time Arrival	Platform	MAE (s)	MAPE (%)	RMSE
	-	04:30:00	05:13:45	Programmed task	-	-	-
	50	04:35:00	05:36:00	Moovit	0.0800	0.1300	0.5714
	50	04:30:00	05:39:00	Google maps	0.2400	0.3400	1.7142
1	50	04:30:00	05:27:09	MAP	0.2800	0.6500	2
	-	05:27:00	06:29:45	Programmed task	-	-	-
	50	05:35:00	06:38:00	Moovit	0.0600	0.0900	0.4285
	50	05:27:00	06:36:00	Google maps	0.1800	0.2600	1.2857
20	50	05:27:00	06:27:36	MAP	0.0400	0.0600	0.2857
	-	06:57:00	08:06:35	Programmed task	-	-	-
	50	07:05:00	08:26:00	Moovit	0.3200	0.3900	2.2857
	50	06:57:00	08:06:00	Google maps	0.0800	0.1100	0.0571
50	50	06:57:00	07:58:32	MAP	0.0800	0.1100	0.0571
	-	14:27:00	15:36:35	Programmed task	-	-	-
	50	14:35:00	15:51:00	Moovit	0.0400	0.0500	0.2857
	50	14:27:00	15:38:00	Google maps	0.1400	0.1900	1
200	50	14:27:00	15:45:10	MAP	0.2200	0.2600	1.2857
	-	23:57:00	00:59:45	Programmed task	-	-	-
	50	00:05:00	01:07:00	Moovit	0.1800	0.2900	1.2857
	50	23:57:00	01:12:00	Google maps	0.1200	0.1800	0.8571
390	50	23:57:00	01:08:04	MAP	0.1800	0.2900	1.2857
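For reference, the three error measures reported in Table 5 follow the standard definitions sketched below. This is a minimal illustration with hypothetical travel times in seconds; it does not reproduce the values in Table 5, whose underlying per-sample data are not listed here.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the units of the inputs."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error (%); actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the units of the inputs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical scheduled vs. predicted segment travel times (s).
scheduled = [100.0, 200.0]
predicted = [110.0, 190.0]
print(mae(scheduled, predicted), mape(scheduled, predicted), rmse(scheduled, predicted))
```

MAE and RMSE share the units of the measured quantity, whereas MAPE is dimensionless, so values from the three columns are not directly comparable with one another.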

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

