Privacy Preserved Video Summarization of Road Traffic Events for IoT Smart Cities
1.1. Research Questions
- RQ1: Can an object detection model (e.g., YOLO) trained on synthetic data produce optimal event detection results on CCTV footage captured in real time?
- RQ2: For an event alert, can a CCTV video segment be extracted and stored in an encrypted format with foolproof security of encryption/decryption keys?
1.2. Research Contributions
- RC1: YOLO is trained for event detection (rather than object detection) using a customised synthetic video dataset (accident and non-accident video frames).
- RC2: The trained model is tested on real-time CCTV footage to classify accident and non-accident video frames under different environmental conditions, such as videos recorded at night and during rain.
- RC3: Event-based video summaries of CCTV footage are stored in an encrypted format using Diffie–Hellman key exchange and the Fernet cipher. The SHA-256 hash is applied to address key-level and data-level security.
- RC4: An annotated customised synthetic dataset (accident and non-accident video frames) for YOLOv5 model training is provided for future researchers.
2. Related Work
2.1. Video Summarization through YOLO
2.2. Cryptography for Visual Data Security
3.1. Dataset Preparation
- For the training of YOLO, high-quality road traffic videos (synthetic dataset) were taken from BeamNG.drive.
- YOLO requires annotated data for training; therefore, frames were extracted from the videos for annotation.
- In the last step of the dataset preparation phase, annotation was applied to every selected frame. In total, 600 frames were selected and manually annotated with an annotation tool. The bounding boxes for accident and non-accident regions were annotated in the frames as 0 and 1, respectively, as shown in Figure 3. After annotation, the frames, along with their respective text files, were saved to the computer drive. The text files contain the annotation information, i.e., class name and coordinates for each bounding box. After this, YOLO was trained on the customised annotated dataset. The annotated dataset is made publicly available in a Kaggle repository for future researchers.
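The annotation text files described above can be read with a few lines of Python. The following is a minimal sketch, assuming the usual YOLOv5 label layout of one box per line (class index, then box centre x, centre y, width and height, all normalised to the image size). The names `Box`, `parse_label_text` and `CLASS_NAMES`, as well as the sample values, are hypothetical and are not taken from the published dataset.

```python
from typing import List, NamedTuple, Tuple

# Class indices as stated in the text: 0 = accident, 1 = non-accident.
CLASS_NAMES = {0: "accident", 1: "non-accident"}


class Box(NamedTuple):
    """One annotated bounding box in normalised YOLO coordinates."""
    class_id: int
    x_center: float  # fraction of image width
    y_center: float  # fraction of image height
    width: float     # fraction of image width
    height: float    # fraction of image height

    def to_pixels(self, img_w: int, img_h: int) -> Tuple[int, int, int, int]:
        """Return (left, top, right, bottom) in pixel coordinates."""
        left = (self.x_center - self.width / 2) * img_w
        top = (self.y_center - self.height / 2) * img_h
        right = (self.x_center + self.width / 2) * img_w
        bottom = (self.y_center + self.height / 2) * img_h
        return round(left), round(top), round(right), round(bottom)


def parse_label_text(text: str) -> List[Box]:
    """Parse the contents of one label file; blank lines are skipped."""
    boxes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue
        if len(parts) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(parts)}")
        class_id = int(parts[0])
        x, y, w, h = (float(value) for value in parts[1:])
        boxes.append(Box(class_id, x, y, w, h))
    return boxes


# Made-up label content for a single 1920 x 1080 frame.
sample = "0 0.5 0.5 0.25 0.5\n1 0.1 0.2 0.1 0.1\n"
for box in parse_label_text(sample):
    print(CLASS_NAMES[box.class_id], box.to_pixels(1920, 1080))
```

Converting back to pixel coordinates in this way is a quick check that the boxes drawn during annotation match what the label files contain.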
3.2. YOLO Training
- YOLOv5 is a model pre-trained on the COCO dataset. In this research, YOLOv5 is trained on a customised synthetic dataset, as this version is lighter and faster than previous versions. It can easily be adapted to custom datasets by making the necessary modifications for the classification task at hand. As this research entailed the classification of only two classes, we re-trained it on our dataset with the existing base layers. The head of YOLOv5 consists of three convolutional layers that predict the location of the bounding boxes (x, y, height, width) where an event has occurred, the scores (certainty of the predicted event), and the class of the event. YOLOv5 uses the sigmoid linear unit (SiLU) and sigmoid activation functions. SiLU, also known as the swish activation function, is used in the hidden layers with convolutional operations, whereas sigmoid is used in the output layer with convolutional operations.
- YOLOv5 model training was done with Google Colab. The dataset was split into training (80%) and validation (20%) data. The model summary, comprising the number of layers, parameters, gradients, and GFLOPs, is shown in Table 1. YOLO keeps the aspect ratio of the images; therefore, all the training images (1920 × 1080 resolution) were resized to 416 × 234 resolution for use in a 416 × 416 network. It took 3.939 h to train the model for 30 epochs with batch size 16. Training started from pre-trained weights, and on completion two new weight files were created: one containing the best weights and one holding the weights of the last training epoch. We selected the file containing the best weights to test real-time videos. The training results for one batch are shown in Figure 3.
- The model was evaluated on synthetic data during the training phase. Sample results on the validation data along with ground truth labels are shown in Figure 4a, and the predicted classes in the validation phase are shown in Figure 4b, which shows that the model has learned well from the training dataset. However, testing is performed on real-time video footage rather than on the synthetic dataset, as discussed in Section 3.3.
- The PR curve shown in Figure 6a indicates that the model can predict the accident and non-accident classes with scores of 0.984 and 0.924, respectively, at mAP@0.5, where mAP is the mean average precision and 0.5 is the intersection-over-union (IoU) threshold. The graphs contain precision (P) values on the y-axis and recall (R) values on the x-axis. Equations (1) and (2) were used for the calculation of P and R values, $P = \frac{TP}{TP + FP}$ (1) and $R = \frac{TP}{TP + FN}$ (2), where $TP$ is true positive, $FN$ is false negative, and $FP$ is false positive.
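- The F1-confidence curve shows the F1 score, which combines the P and R values, at each confidence threshold; here, the F1 score is 0.91 for both classes at a confidence threshold of 0.430, as shown in Figure 6b. The F1-confidence curve value is near 1, which shows that the model is trained well. The F1 score is calculated as: $F1 = \frac{2 \times P \times R}{P + R}$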
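Two pieces of arithmetic from this subsection can be checked with a short sketch: the aspect-preserving resize of a 1920 × 1080 frame for a 416 × 416 network, and the precision, recall and F1 calculations. The function names and the detection counts below are hypothetical and only illustrate the formulas; the actual YOLOv5 preprocessing also pads the resized image, which is not shown here.

```python
def resized_shape(src_w: int, src_h: int, target: int = 416) -> tuple:
    """Scale the longer side to `target` while keeping the aspect ratio."""
    longest = max(src_w, src_h)
    return round(src_w * target / longest), round(src_h * target / longest)


def precision(tp: int, fp: int) -> float:
    """P = TP / (TP + FP); defined as 0.0 when there are no predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """R = TP / (TP + FN); defined as 0.0 when there are no ground truths."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """F1 = 2PR / (P + R), the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# A 1920 x 1080 training frame scaled for a 416 x 416 network.
print(resized_shape(1920, 1080))

# Hypothetical counts, not results from the paper.
p = precision(tp=90, fp=10)
r = recall(tp=90, fn=30)
print(p, r, f1_score(p, r))
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, which is why a curve value near 1 is read as a sign of a well-trained model.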
3.3. Testing on Real-Time Videos Footage
3.4. Storage and Retrieval
Cryptography for Summarized Videos
- Firstly, two password keys (one for encryption and one for decryption) were generated by the DH key exchange algorithm.
- A hash value (h) was calculated for the camera ID and stored as a digest.
- The encryption password key was used along with the stored digest to derive an encryption key, whereas the decryption password key was stored in a hardware wallet for generating the decryption key at the time of decryption.
- The encryption key was further used for preserving the privacy of a summarized video.
- The fully encrypted video, or token, was stored on the server.
- The key used for encryption was deleted.
|Algorithm 1: Pseudo-code for applied cryptography on summarized videos|
|/* Calculate password keys for encryption and decryption key generation */|
|Initialization: p ← any_random_number, g ← any_random_number;|
|Output: encryption password key, decryption password key;|
|/* Calculate both password keys through Diffie–Hellman key exchange algorithm */|
|s ← select any random number;|
|r ← select any random number;|
|x ← calculate exchange key (g^s mod p);|
|y ← calculate exchange key (g^r mod p);|
|encryption password key ← calculate password key for encryption (y^s mod p);|
|decryption password key ← calculate password key for decryption (x^r mod p);|
|Store decryption password key on wallet;|
|/* Calculation of the digest */|
|h ← calculate hash value for camera ID;|
|digest ← store digest of h;|
|/* Generate encryption key and apply Fernet encryption */|
|key derivation function ← define key derivative function PBKDF2HMAC with calculated digest;|
|encryption key ← convert encryption password key into byte array;|
|Input: summarized video;|
|IV ← generate randomly;|
|token ← apply Fernet encryption on summarized video by using IV and encryption key;|
|Output: fully encrypted video or token;|
|/* Generate decryption key and apply Fernet decryption */|
|Read: camera ID, decryption password key;|
|key derivation function ← define key derivative function PBKDF2HMAC with calculated digest;|
|decryption key ← convert decryption password key into byte array;|
|Input: fully encrypted video or token;|
|summarized video ← apply Fernet decryption on fully encrypted video by using IV and decryption key;|
|Output: summarized video;|
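The key-handling part of Algorithm 1 can be sketched with the Python standard library alone. This is a hedged illustration rather than the authors' implementation. It assumes that the SHA-256 digest of the camera ID serves as the PBKDF2 salt and that the derived 32 bytes are URL-safe base64 encoded, which is the key format Fernet expects. The modulus, base, iteration count, camera ID and all function names are hypothetical, and the 64-bit modulus is far too small for real use.

```python
import base64
import hashlib
import secrets

# Toy public parameters (assumption): 2**64 - 59 is prime but much too small
# for real deployments, which use standardised groups of 2048 bits or more.
P = 2**64 - 59
G = 5


def dh_password_keys(p: int, g: int) -> tuple:
    """Run both sides of a Diffie-Hellman exchange and return both shared keys."""
    s = secrets.randbelow(p - 3) + 2   # private value of one side, in [2, p - 2]
    r = secrets.randbelow(p - 3) + 2   # private value of the other side
    x = pow(g, s, p)                   # exchange key sent by the first side
    y = pow(g, r, p)                   # exchange key sent by the second side
    key_enc = pow(y, s, p)             # password key for encryption
    key_dec = pow(x, r, p)             # password key for decryption (same value)
    return key_enc, key_dec


def camera_digest(camera_id: str) -> bytes:
    """SHA-256 digest of the camera ID, used here as the PBKDF2 salt (assumption)."""
    return hashlib.sha256(camera_id.encode("utf-8")).digest()


def derive_fernet_key(password_key: int, salt: bytes, iterations: int = 100_000) -> bytes:
    """Convert the password key to bytes and stretch it into a Fernet-format key."""
    n_bytes = max(1, (password_key.bit_length() + 7) // 8)
    password = password_key.to_bytes(n_bytes, "big")
    raw = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)
    return base64.urlsafe_b64encode(raw)


key_enc, key_dec = dh_password_keys(P, G)
salt = camera_digest("CAM-001")        # hypothetical camera ID
encryption_key = derive_fernet_key(key_enc, salt)
decryption_key = derive_fernet_key(key_dec, salt)
print(encryption_key == decryption_key)
```

Since both sides of the exchange arrive at the same shared value, the decryption key regenerated later from the wallet-stored password key matches the encryption key, so the encryption key itself does not need to be kept. Under these assumptions, the resulting key could be passed to a Fernet implementation such as the one in the `cryptography` package.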
- An authorised stakeholder accessed the fully encrypted video along with the meta-data stored on the server.
- The digest was calculated using the camera ID.
- The DH-generated key was read from the wallet, and the same key generation process was repeated to generate the decryption key.
- The IV was read from the Fernet token to decrypt the video with the IV and the decryption key.
- The summarized video was decrypted for the authorised stakeholder.
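The step in which the IV is read from the Fernet token can be illustrated by splitting a token into its fields. The sketch below follows the published Fernet token layout (version byte 0x80, 8-byte big-endian timestamp, 16-byte IV, ciphertext, 32-byte HMAC, all URL-safe base64 encoded). The function name and the synthetic token are hypothetical, and the token is only structurally valid: it was not produced by a real encryption and would fail HMAC verification.

```python
import base64
import struct

VERSION = 0x80          # the only version defined by the Fernet specification
MIN_LENGTH = 1 + 8 + 16 + 16 + 32   # header + at least one AES block + HMAC


def split_fernet_token(token: bytes) -> dict:
    """Split a Fernet token into its fields without decrypting or verifying it."""
    raw = base64.urlsafe_b64decode(token)
    if len(raw) < MIN_LENGTH or raw[0] != VERSION:
        raise ValueError("not a structurally valid Fernet token")
    (timestamp,) = struct.unpack(">Q", raw[1:9])
    return {
        "version": raw[0],
        "timestamp": timestamp,      # seconds since the Unix epoch
        "iv": raw[9:25],             # 128-bit AES-CBC initialisation vector
        "ciphertext": raw[25:-32],
        "hmac": raw[-32:],
    }


# Synthetic token for illustration only (placeholder ciphertext and HMAC).
synthetic = base64.urlsafe_b64encode(
    bytes([VERSION])
    + struct.pack(">Q", 1_700_000_000)
    + bytes(range(16))               # stands in for a random IV
    + b"\x00" * 32                   # stands in for the encrypted video bytes
    + b"\xff" * 32                   # stands in for the HMAC
)
fields = split_fernet_token(synthetic)
print(fields["iv"].hex(), len(fields["ciphertext"]))
```

In practice, a Fernet library performs this parsing internally during decryption and also verifies the HMAC before returning any plaintext, so reading the IV by hand is mainly useful for inspection.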
4. Results and Discussion
5. Comparative Analysis
6. Limitations and Future Work
- Synthetic data for training on multiple road events, rather than just accidents/non-accidents, is unavailable.
- The model is tested only on video footage recorded by fixed-position CCTV cameras.
- This research focuses only on vehicular accidents; pedestrian crashes are not considered.
Data Availability Statement
Conflicts of Interest
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tahir, M.; Qiao, Y.; Kanwal, N.; Lee, B.; Asghar, M.N. Privacy Preserved Video Summarization of Road Traffic Events for IoT Smart Cities. Cryptography 2023, 7, 7. https://doi.org/10.3390/cryptography7010007