# Comparative Analysis of Machine Learning-Based Approaches for Anomaly Detection in Vehicular Data


## Abstract


## 1. Introduction

## 2. Outlier Detection Theory and Related Work

Every neural network receives a set of inputs and, through its training process, adjusts its weight values so as to produce the desired output(s) [11,12]. Neural networks are directly affected by outliers, which can cause a considerable decrease in their efficiency and accuracy [11]. In many cases, this sensitivity of a network's performance to outliers can itself be exploited for outlier detection.

Deep learning can be considered an evolution of conventional machine learning algorithms that has brought about breakthroughs in several tasks and applications, such as speech and text recognition, image and video analysis, outlier detection, and more [13]. A notable study on driving behavior and outlier detection was performed by Kieu et al. [14], who proposed a framework for outlier detection in time series that can be used to identify dangerous driving and hazardous road conditions. They used autoencoders to process the time series and reduce their dimensionality. An autoencoder is a special type of neural network that consists of multiple layers and learns a reduced-dimension representation of the input dataset. Autoencoders can also support outlier detection because outlying values are far more difficult to reconstruct than normal ones, so a high reconstruction error singles out such values [15].
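The reconstruction-error idea can be illustrated with a minimal sketch. It is not the framework of Kieu et al. [14] nor the ensemble method of [15]. It is a tiny linear autoencoder written in plain NumPy and trained by gradient descent on synthetic data. The data, the 3-1-3 layer sizes, the learning rate, and the mean-plus-three-standard-deviations threshold are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "normal" samples: three signals driven by one hidden factor,
# so they lie close to a line in 3-D space.
factor = rng.normal(size=(550, 1))
normal = factor @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(550, 3))
# Hypothetical outliers that break the usual relationship between the signals.
outliers = np.array([[3.0, -3.0, 3.0], [-2.0, 4.0, 2.0], [2.5, 0.0, 2.5]])

# Standardize using statistics of the training (normal) data only.
train_raw = normal[:500]
mean, std = train_raw.mean(axis=0), train_raw.std(axis=0)
X_train = (train_raw - mean) / std
# Test set: 50 unseen normal samples followed by the 3 outliers.
X_test = (np.vstack([normal[500:], outliers]) - mean) / std

# Linear autoencoder: 3 inputs -> 1 hidden unit (bottleneck) -> 3 outputs.
W_enc = rng.normal(scale=0.1, size=(3, 1))
W_dec = rng.normal(scale=0.1, size=(1, 3))

learning_rate = 0.01  # assumed value, small enough for stable training here
for _ in range(2000):
    hidden = X_train @ W_enc              # encode
    error = hidden @ W_dec - X_train      # reconstruction residual
    # Gradients of the mean (over samples) squared reconstruction error.
    grad_dec = 2.0 * hidden.T @ error / len(X_train)
    grad_enc = 2.0 * X_train.T @ (error @ W_dec.T) / len(X_train)
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc


def reconstruction_error(X):
    """Mean squared reconstruction error of each sample."""
    return np.mean((X @ W_enc @ W_dec - X) ** 2, axis=1)


# Flag samples whose error is unusually high compared to the training data.
train_error = reconstruction_error(X_train)
threshold = train_error.mean() + 3.0 * train_error.std()
test_error = reconstruction_error(X_test)
is_outlier = test_error > threshold
print("flagged test indices:", np.flatnonzero(is_outlier))
```

Because the bottleneck can only represent the single direction along which normal samples vary, points that violate that relationship cannot be reconstructed well and receive a large error. A deep, non-linear autoencoder follows the same scoring principle but can capture more complex structure.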

## 3. Hybrid Data Management and Analysis Platform

## 4. Dataset and Outlier Detection Methods

## 5. Results and Evaluation

#### 5.1. Univariate Outlier Detection Analysis Results

#### 5.2. Multivariate Outlier Detection Analysis Results

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. United Nations Revision of World Urbanization Prospects; United Nations: New York, NY, USA, 2018.
2. World Health Organization (WHO). World Health Statistics 2020: Monitoring Health for the SDGs, Sustainable Development Goals; World Health Statistics; World Health Organization: Geneva, Switzerland, 2020.
3. International Traffic Safety Data and Analysis Group (IRTAD). Road Safety Annual Report 2019; International Transport Forum: Paris, France, 2019.
4. Aggarwal, C.C. Outlier Analysis; Springer International Publishing: Cham, Switzerland, 2017; ISBN 978-3-319-47577-6.
5. Rousseeuw, P.; Hubert, M. Robust Statistics for Outlier Detection. Wiley Interdisc. Rev. Data Min. Knowl. Discov. **2011**, 1, 73–79.
6. Santoyo, S. A Brief Overview of Outlier Detection Techniques. Available online: https://towardsdatascience.com/a-brief-overview-of-outlier-detection-techniques-1e0b2c19e561 (accessed on 17 March 2021).
7. Maesschalck, R.D.; Jouan-Rimbaud, D.; Massart, D.L. The Mahalanobis Distance. Chemom. Intell. Lab. Syst. **2000**, 50, 1–18.
8. Leys, C.; Delacre, M.; Mora, Y.; Lakens, D.; Ley, C. How to Classify, Detect, and Manage Univariate and Multivariate Outliers, With Emphasis on Pre-Registration. Int. Rev. Soc. Psychol. **2019**, 32.
9. Cohen, I. Outliers Analysis: A Quick Guide to the Different Types of Outliers. Available online: https://towardsdatascience.com/outliers-analysis-a-quick-guide-to-the-different-types-of-outliers-e41de37e6bf6 (accessed on 17 March 2021).
10. Wilcox, R.R. Fundamentals of Modern Statistical Methods, 2nd ed.; Springer: New York, NY, USA, 2010; ISBN 978-1-4419-5525-8.
11. Khamis, A.; Ismail, Z.; Khalid, H.; Mohammed, A. The Effects of Outliers Data on Neural Network Performance. J. Appl. Sci. **2005**, 5, 1394–1398.
12. Patterson, D.W. Artificial Neural Networks: Theory and Applications, 1st ed.; Prentice Hall PTR: Upper Saddle River, NJ, USA, 1998; ISBN 0-13-295353-6.
13. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature **2015**, 521, 436–444.
14. Kieu, T.; Yang, B.; Jensen, C.S. Outlier Detection for Multidimensional Time Series Using Deep Neural Networks. In Proceedings of the 2018 19th IEEE International Conference on Mobile Data Management (MDM), Aalborg, Denmark, 25–28 June 2018; pp. 125–134.
15. Chen, J.; Sathe, S.; Aggarwal, C.; Turaga, D. Outlier Detection with Autoencoder Ensembles. In Proceedings of the 2017 SIAM International Conference on Data Mining, Houston, TX, USA, 27–29 April 2017; pp. 90–98.
16. Knorr, E.M.; Ng, R.T. Algorithms for Mining Distance-Based Outliers in Large Datasets. In Proceedings of the 24th International Conference on Very Large Data Bases, San Francisco, CA, USA, 24–27 August 1998; pp. 392–403.
17. Peterson, L.E. K-Nearest Neighbor. Scholarpedia **2009**, 4, 1883.
18. Aggarwal, C.C. Proximity-Based Outlier Detection. In Outlier Analysis; Aggarwal, C.C., Ed.; Springer: New York, NY, USA, 2013; pp. 101–133. ISBN 978-1-4614-6396-2.
19. Breunig, M.; Kriegel, H.-P.; Ng, R.; Sander, J. LOF: Identifying Density-Based Local Outliers. In Proceedings of the ACM Sigmod Record, Dallas, TX, USA, 16–18 May 2000; Volume 29, pp. 93–104.
20. Thang, T.M.; Kim, J. The Anomaly Detection by Using DBSCAN Clustering with Multiple Parameters. In Proceedings of the 2011 International Conference on Information Science and Applications, Jeju, Korea, 26–29 April 2011; pp. 1–5.
21. Hadi, A.S.; Simonoff, J.S. Procedures for the Identification of Multiple Outliers in Linear Models. J. Am. Stat. Assoc. **1993**, 88, 1264–1272.
22. Xu, H.; Caramanis, C.; Sanghavi, S. Robust PCA via Outlier Pursuit. IEEE Trans. Inf. Theory **2012**, 58, 3047–3064.
23. Rousseeuw, P.J. Least Median of Squares Regression. J. Am. Stat. Assoc. **1984**, 79, 871–880.
24. Rousseeuw, P.J.; Driessen, K.V. A Fast Algorithm for the Minimum Covariance Determinant Estimator. Technometrics **1999**, 41, 212–223.
25. Thomas, R.; Judith, J.E. Voting-Based Ensemble of Unsupervised Outlier Detectors. In Advances in Communication Systems and Networks; Jayakumari, J., Karagiannidis, G.K., Ma, M., Hossain, S.A., Eds.; Springer: Singapore, 2020; pp. 501–511.
26. Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support Vector Machines. IEEE Intell. Syst. Appl. **1998**, 13, 18–28.
27. Li, K.-L.; Huang, H.-K.; Tian, S.-F.; Xu, W. Improving One-Class SVM for Anomaly Detection. In Proceedings of the 2003 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.03EX693), Xi’an, China, 5 November 2003; Volume 5, pp. 3077–3081.
28. Liu, F.T.; Ting, K.M.; Zhou, Z. Isolation Forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422.
29. Vasconcelos, I.; Vasconcelos, R.O.; Olivieri, B.; Roriz, M.; Endler, M.; Junior, M.C. Smartphone-Based Outlier Detection: A Complex Event Processing Approach for Driving Behavior Detection. J. Internet Serv. Appl. **2017**, 8, 13.
30. Vasconcelos, I.; Vasconcelos, R.O.; Olivieri, B.; Endler, M.; Júnior, M.C. Smart Driving Behavior Analysis Based on Online Outlier Detection: Insights from a Controlled Case Study. In Proceedings of the ICAS 2017: The Thirteenth International Conference on Autonomic and Autonomous Systems, Barcelona, Spain, 21–25 May 2017; p. 82.
31. Zheng, Y.; Hansen, J.H.L. Unsupervised Driving Performance Assessment Using Free-Positioned Smartphones in Vehicles. In Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016; pp. 1598–1603.
32. Zhang, M.; Chen, C.; Wo, T.; Xie, T.; Bhuiyan, M.Z.A.; Lin, X. SafeDrive: Online Driving Anomaly Detection From Large-Scale Vehicle Data. IEEE Trans. Ind. Inform. **2017**, 13, 2087–2096.
33. Dang, T.T.; Ngan, H.Y.T.; Liu, W. Distance-Based k-Nearest Neighbors Outlier Detection Method in Large-Scale Traffic Data. In Proceedings of the 2015 IEEE International Conference on Digital Signal Processing (DSP), Singapore, 21–24 July 2015; pp. 507–510.
34. Djenouri, Y.; Belhadi, A.; Lin, J.C.; Cano, A. Adapted K-Nearest Neighbors for Detecting Anomalies on Spatio–Temporal Traffic Flow. IEEE Access **2019**, 7, 10015–10027.
35. Wu, H.; Sun, W.; Zheng, B. A Fast Trajectory Outlier Detection Approach via Driving Behavior Modeling. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 837–846.
36. McAfee, A.; Brynjolfsson, E.; Davenport, T.H.; Patil, D.; Barton, D. Big Data: The Management Revolution. Harv. Bus. Rev. **2012**, 90, 60–68.
37. Grinberg, M. Flask Web Development: Developing Web Applications with Python; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2018.
38. MySQL. Available online: https://www.mysql.com (accessed on 22 March 2021).
39. MongoDB. Available online: https://www.mongodb.com (accessed on 22 March 2021).
40. Battle, R.; Benson, E. Bridging the Semantic Web and Web 2.0 with Representational State Transfer (REST). J. Web Semant. **2008**, 6, 61–69.
41. Apache Software Foundation Hadoop. Available online: https://hadoop.apache.org (accessed on 22 March 2021).
42. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
43. Mishra, A. Metrics to Evaluate Your Machine Learning Algorithm. Available online: https://towardsdatascience.com/metrics-to-evaluate-your-machine-learning-algorithm-f10ba6e38234 (accessed on 22 March 2021).
44. Chen, S.; Wang, W.; van Zuylen, H. A Comparison of Outlier Detection Algorithms for ITS Data. Expert Syst. Appl. **2010**, 37, 1169–1178.
45. Chen, Z.; Yu, J.; Zhu, Y.; Chen, Y.; Li, M. D³: Abnormal Driving Behaviors Detection and Identification Using Smartphone Sensors. In Proceedings of the 2015 12th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), Seattle, WA, USA, 22–25 June 2015; pp. 524–532.
46. Goix, N. How to Evaluate the Quality of Unsupervised Anomaly Detection Algorithms? arXiv **2016**, arXiv:1607.01152.
47. Zhao, Y.; Nasrullah, Z.; Li, Z. PyOD: A Python Toolbox for Scalable Outlier Detection. arXiv **2019**, arXiv:1901.01588.

| Variable | Unit | Note |
|---|---|---|
| timestamp | date and time | the timestamp of the measurement |
| lat | degrees | the latitude of the vehicle’s position |
| lon | degrees | the longitude of the vehicle’s position |
| altitude | meters | the altitude relative to sea level |
| accuracy | number | the accuracy of the vehicle’s geolocation |
| bearing | degrees | the direction of the vehicle |
| speedGPS | km/h | the speed of the vehicle according to the GPS |
| speedOBD | km/h | the speed of the vehicle |
| vinNumber | number | the vehicle’s unique identifier |
| rpm | rev/min | the engine’s revolutions per minute |
| relThrottle | percentage | the throttle’s position percentage |
| intakeTemp | °C | the environmental temperature |
| engineTemp | °C | the engine temperature |
| fuelType | list | the type of the fuel used by the vehicle |
| fuelLevel | percentage | the level of the fuel in the vehicle’s tank |
| engineRuntime | hours | the hours since the engine started |
| pendingTrouble | binary | indicates whether there is an emergency situation |
| carPlate | string of characters | the license plate of the car |

**Table 2.** Final data structure for each vehicle considering the univariate analysis for outlier detection.

| Variable | Unit | Type |
|---|---|---|
| altitude | meters | integer |
| accuracy | number | double |
| bearing | degrees | double |
| speedGPS | km/h | integer |
| speedOBD | km/h | integer |
| rpm | rev/min | integer |
| relThrottle | percentage | integer |
| intakeTemp | °C | integer |
| engineTemp | °C | integer |
| fuelType | list | integer |
| fuelLevel | percentage | integer |
| engineRuntime | hours | integer |

| Attribute | Value |
|---|---|
| Samples | 4,000,000 |
| Mean speed | 53.073 |
| Standard Deviation | 24.726 |
| Min speed value (km/h) | 1 |
| Max speed value (km/h) | 169 |
| Mean speed on 25% of the dataset (km/h) | 33 |
| Mean speed on 50% of the dataset (km/h) | 62 |
| Mean speed on 75% of the dataset (km/h) | 73 |

| Algorithm | Accuracy | F1-Score | MAE | Time (Seconds) | Time (Hours) |
|---|---|---|---|---|---|
| Kernel SVM | 0.96129 | 0.35600 | 0.39000 | 49,500 | 13.75000 |
| i-Forest | 0.98339 | 0.20000 | 0.43200 | 362 | 0.10056 |
| MCD | 0.98851 | 0.99400 | 0.00300 | 13 | 0.00361 |
| KNN | 0.97919 | 0.24200 | 0.41400 | 5340 | 1.48333 |
| LOF | 0.99844 | 0.14700 | 0.45900 | 12,900 | 3.58333 |
| Z-score | 0.00001 | 0.00023 | 0.86900 | 5 | 0.00139 |

| Algorithm | Outliers (Percentage) | Time (Seconds) | Time (Hours) |
|---|---|---|---|
| Mahalanobis Distance | 10.20060% | 9.70240 | 0.00270 |

| Algorithm | Outliers (Percentage) | Time (Seconds) | Time (Hours) |
|---|---|---|---|
| i-Forest | 9.99040% | 153.50870 | 0.04264 |
| LOF | 2.15010% | 1707.43240 | 0.47429 |
| DBSCAN | 7.51540% | 1431.20880 | 0.39756 |
| Elliptic Envelope | 10.00010% | 258.92080 | 0.07192 |


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Demestichas, K.; Alexakis, T.; Peppes, N.; Adamopoulou, E.
Comparative Analysis of Machine Learning-Based Approaches for Anomaly Detection in Vehicular Data. *Vehicles* **2021**, *3*, 171-186.
https://doi.org/10.3390/vehicles3020011
