Comparing Video Activity Classifiers within a Novel Framework
Abstract
1. Introduction
- We propose a new video activity recognition framework that is flexible, modular, and encompasses object detection, object tracking, and event classification.
- We demonstrate the efficacy of the tracking and classifier modules (the object detection module is not evaluated here) in the proposed framework using the well-known VIRAT dataset. The results are very encouraging.
- We demonstrate the use of logical rules to improve the efficacy of deep learning classifiers.
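The modular detection–tracking–classification pipeline summarized above can be sketched as follows. This is an illustrative skeleton only: the class and function names are assumptions for exposition, not the authors' implementation, and the detector, tracker, and classifier are stand-ins for components such as YOLO, DAN, and VideoGraph.

```python
from dataclasses import dataclass, field

@dataclass
class Detection:
    frame: int
    label: str   # e.g., "person" or "vehicle"
    box: tuple   # (x, y, w, h)

@dataclass
class Track:
    track_id: int
    label: str
    boxes: list = field(default_factory=list)  # one box per frame

class ActivityPipeline:
    """Hypothetical sketch of the three-module framework: the modules are
    injected as callables, which is what makes the design flexible/modular."""

    def __init__(self, detector, tracker, classifier, rules):
        self.detector = detector      # frame -> list[Detection]
        self.tracker = tracker        # list[Detection] -> list[Track]
        self.classifier = classifier  # Track -> dict[event name, probability]
        self.rules = rules            # logical rules that refine the scores

    def run(self, frames):
        detections = [d for f in frames for d in self.detector(f)]
        tracks = self.tracker(detections)
        events = []
        for track in tracks:
            scores = self.classifier(track)
            for rule in self.rules:   # rules adjust or veto classifier scores
                scores = rule(track, scores)
            events.append(max(scores, key=scores.get))
        return events
```

Because each stage is a plain callable, any detector, tracker, or classifier with a compatible signature can be swapped in without touching the rest of the pipeline.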
2. Proposed Video Event Classification Framework and Its Critical Modules
2.1. VideoGraph
2.2. DAN
2.3. Proposed Video Activity Recognition Pipeline
2.3.1. General Architecture
2.3.2. Rule-Based System
- Rule 1: Exiting a facility
- Rule 2: Entering a facility
- Rule 3: Entering or exiting a vehicle
- Rule 4: Special case
- Rule 5: Non-event determination based on distance from points of interest
- Rule 6: Elimination of certain events based on the scene contents
- Rule 7: VideoGraph probability thresholding
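One natural reading of Rules 1–5 is as geometric tests on where a person's track starts and ends relative to annotated points of interest (facility doors, parked vehicles). The sketch below is an illustrative interpretation under that assumption; the function name, thresholds, and rule order are hypothetical and not taken from the paper.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def classify_track(start, end, doors, vehicles, near=50.0, far=200.0):
    """Toy rule-based event decision for one person track.

    start/end: (x, y) of the track's first and last observed positions.
    doors/vehicles: (x, y) points of interest in the scene.
    near/far: pixel thresholds (illustrative values, not from the paper).
    """
    pois = doors + vehicles
    if pois:
        start_far = min(dist(start, p) for p in pois) > far
        end_far = min(dist(end, p) for p in pois) > far
        # Rule 5: far from every point of interest -> non-event.
        if start_far and end_far:
            return "non-event"
    # Rule 1: track first appears near a door -> exiting a facility.
    if any(dist(start, d) <= near for d in doors):
        return "exiting facility"
    # Rule 2: track disappears near a door -> entering a facility.
    if any(dist(end, d) <= near for d in doors):
        return "entering facility"
    # Rule 3: track starts/ends near a vehicle -> exiting/entering a vehicle.
    if any(dist(start, v) <= near for v in vehicles):
        return "exiting vehicle"
    if any(dist(end, v) <= near for v in vehicles):
        return "entering vehicle"
    return "non-event"
```

Rules 4, 6, and 7 (the special case, scene-content elimination, and probability thresholding) would act as additional filters on top of these geometric tests.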
3. Results
3.1. Dataset
3.2. Comparison of Three Event Recognition Approaches
3.2.1. VideoGraph Only
3.2.2. Rule-Based Approach
3.2.3. Hybrid Approach
3.3. Three Detailed Videos
3.3.1. Building Entrance Scene: VIRAT_S_010000_04_000530_000605
3.3.2. Parking Lot Scene: VIRAT_S_010111_08_000920_000954
3.3.3. Mixed Parking Lot and Building Entrance Scene: VIRAT_S_000102
4. Discussions and Future Work
4.1. Discussions
4.2. Limitations
4.3. Future Work
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Rautaray, S.S.; Agrawal, A. Vision based hand gesture recognition for human computer interaction: A survey. Artif. Intell. Rev. 2012, 43, 1–54.
- Paul, S.N.; Singh, Y.J. Survey on Video Analysis of Human Walking Motion. Int. J. Signal Process. Image Process. Pattern Recognit. 2014, 7, 99–122.
- Vishwakarma, S.; Agrawal, A. A survey on activity recognition and behavior understanding in video surveillance. Vis. Comput. 2012, 29, 983–1009.
- Lin, W.; Sun, M.-T.; Poovendran, R.; Zhang, Z. Human activity recognition for video surveillance. In Proceedings of the 2008 IEEE International Symposium on Circuits and Systems, Seattle, WA, USA, 18–21 May 2008.
- Duong, T.V.; Bui, H.H.; Phung, D.; Venkatesh, S. Activity Recognition and Abnormality Detection with the Switching Hidden Semi-Markov Model. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 838–845.
- Kuo, Y.-M.; Lee, J.-S.; Chung, P.-C. A Visual Context-Awareness-Based Sleeping-Respiration Measurement System. IEEE Trans. Inf. Technol. Biomed. 2009, 14, 255–265.
- Huynh, H.H.; Meunier, J.; Sequeira, J.; Daniel, M. Real time detection, tracking and recognition of medication intake. World Acad. Sci. Eng. Technol. 2009, 60, 280–287.
- Foroughi, H.; Aski, B.S.; Pourreza, H. Intelligent Video Surveillance for Monitoring Fall Detection of Elderly in Home Environments. In Proceedings of the IEEE 11th International Conference on Computer and Information Technology (ICCIT), Khulna, Bangladesh, 24–27 December 2008; pp. 219–224.
- Liu, C.D.; Chung, P.C.; Chung, Y.N.; Thonnat, M. Understanding of human behaviors from videos in nursing care monitoring systems. J. High Speed Netw. 2007, 16, 91–103.
- Kwan, C.; Zhou, J. Anomaly detection in low quality traffic monitoring videos using optical flow. In Proceedings of the Pattern Recognition and Tracking XXIX, Orlando, FL, USA, 30 April 2018; Volume 10649.
- Kwan, C.; Zhou, J.; Wang, Z.; Li, B. Efficient anomaly detection algorithms for summarizing low quality videos. In Proceedings of the Pattern Recognition and Tracking XXIX, Orlando, FL, USA, 27 April 2018; Volume 10649.
- Kwan, C.; Zhou, J.; Yin, J. The development of a video browsing and video summary review tool. In Proceedings of the Pattern Recognition and Tracking XXIX, Orlando, FL, USA, 27 April 2018; Volume 10649.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. 2018. Available online: https://arxiv.org/abs/1804.02767 (accessed on 8 April 2018).
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Sun, S.; Akhtar, N.; Song, H.; Mian, A.S.; Shah, M. Deep affinity network for multiple object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 1.
- Hussein, N.; Gavves, E.; Smeulders, A.W.M. VideoGraph: Recognizing Minutes-Long Human Activities in Videos. arXiv 2019, arXiv:1905.05143.
- Five Video Classification Methods Implemented in Keras and TensorFlow. 2017. Available online: https://blog.coast.ai/five-video-classification-methods-implemented-in-keras-and-tensorflow-99cad29cc0b5? (accessed on 29 April 2020).
- Donahue, J.; Hendricks, L.A.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Darrell, T.; Saenko, K. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 2625–2634.
- Li, Y.; Yu, T.; Li, B. Recognizing Video Events with Varying Rhythms. arXiv 2020, arXiv:2001.05060.
- Ullah, A.; Muhammad, K.; Hussain, T.; Baik, S.W. Conflux LSTMs Network: A Novel Approach for Multi-View Action Recognition. Available online: https://www.researchgate.net/profile/Amin_Ullah3/publication/339133153_Conflux_LSTMs_Network_A_Novel_Approach_for_Multi-View_Action_Recognition/links/5e3f79ce299bf1cdb918f8e4/Conflux-LSTMs-Network-A-Novel-Approach-for-Multi-View-Action-Recognition.pdf (accessed on 20 September 2020).
- Ullah, A.; Muhammad, K.; Haq, I.U.; Baik, S.W. Action recognition using optimized deep autoencoder and CNN for surveillance data streams of non-stationary environments. Future Gener. Comput. Syst. 2019, 96, 386–397.
- Ullah, A.; Muhammad, K.; Del Ser, J.; Baik, S.W.; De Albuquerque, V.H.C. Activity Recognition Using Temporal Optical Flow Convolutional Features and Multilayer LSTM. IEEE Trans. Ind. Electron. 2019, 66, 9692–9702.
- The VIRAT Video Dataset. Available online: https://viratdata.org/ (accessed on 11 March 2020).
| Event ID | Event Type | Number of Videos |
|---|---|---|
| 1 | Person loading an Object to a Vehicle | 21 |
| 2 | Person Unloading an Object from a Car/Vehicle | 59 |
| 3 | Person Opening a Vehicle/Car Trunk | 42 |
| 4 | Person Closing a Vehicle/Car Trunk | 41 |
| 5 | Person getting into a Vehicle | 111 |
| 6 | Person getting out of a Vehicle | 97 |
| 7 | Person gesturing | 51 |
| 8 | Person digging | 0 |
| 9 | Person carrying an object | 822 |
| 10 | Person running | 22 |
| 11 | Person entering a facility | 156 |
| 12 | Person exiting a facility | 133 |
| Ground Truth (GT) \ Classified | Event 5 | Event 6 | Event 11 | Event 12 | Non-Event | Total | Accuracy | OA |
|---|---|---|---|---|---|---|---|---|
| Event 5 | 9 | 1 | 0 | 0 | 0 | 10 | 0.9 | 0.56 |
| Event 6 | 0 | 10 | 0 | 0 | 0 | 10 | 1 | - |
| Event 11 | 0 | 0 | 10 | 0 | 0 | 10 | 1 | - |
| Event 12 | 0 | 0 | 1 | 9 | 0 | 10 | 0.9 | - |
| Non-Event | 13 | 10 | 2 | 3 | 0 | 28 | 0 | - |
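The Accuracy and OA columns in these confusion matrices follow the usual definitions: per-class accuracy is the diagonal entry divided by the row (ground-truth) total, and overall accuracy is the sum of the diagonal divided by the grand total. A minimal sketch, checked against the VideoGraph-only matrix above:

```python
def confusion_stats(matrix):
    """Per-class accuracy (diagonal / row total) and overall accuracy
    (trace / grand total) for a square confusion matrix, rows = ground truth."""
    per_class = [row[i] / sum(row) if sum(row) else 0.0
                 for i, row in enumerate(matrix)]
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return per_class, correct / total

# Rows: GT Event 5, Event 6, Event 11, Event 12, Non-Event
# (counts from the VideoGraph-only table above).
m = [
    [9, 1, 0, 0, 0],
    [0, 10, 0, 0, 0],
    [0, 0, 10, 0, 0],
    [0, 0, 1, 9, 0],
    [13, 10, 2, 3, 0],
]
per_class, oa = confusion_stats(m)
print([round(a, 2) for a in per_class], round(oa, 2))
# → [0.9, 1.0, 1.0, 0.9, 0.0] 0.56
```

Note that OA weights classes by their sample counts, so the 28 non-events (all misclassified here) pull the 0.56 well below the near-perfect per-event accuracies.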
| GT \ Classified | Event 5 | Event 6 | Event 11 | Event 12 | Non-Event | Total | Accuracy | OA |
|---|---|---|---|---|---|---|---|---|
| Event 5 | 8 | 1 | 0 | 0 | 1 | 10 | 0.8 | 0.74 |
| Event 6 | 0 | 8 | 0 | 0 | 2 | 10 | 0.8 | - |
| Event 11 | 0 | 0 | 7 | 1 | 2 | 10 | 0.7 | - |
| Event 12 | 0 | 0 | 1 | 8 | 1 | 10 | 0.8 | - |
| Non-Event | 3 | 2 | 3 | 1 | 19 | 28 | 0.68 | - |
| GT \ Classified | Event 5 | Event 6 | Event 11 | Event 12 | Non-Event | Total | Accuracy | OA |
|---|---|---|---|---|---|---|---|---|
| Event 5 | 9 | 1 | 0 | 0 | 0 | 10 | 0.9 | 0.81 |
| Event 6 | 0 | 10 | 0 | 0 | 0 | 10 | 1 | - |
| Event 11 | 0 | 0 | 10 | 0 | 0 | 10 | 1 | - |
| Event 12 | 0 | 0 | 1 | 9 | 0 | 10 | 0.9 | - |
| Non-Event | 5 | 2 | 2 | 2 | 17 | 28 | 0.61 | - |
| GT \ Classified | Event 5 | Event 6 | Event 11 | Event 12 | Non-Event | Accuracy | OA |
|---|---|---|---|---|---|---|---|
| Event 11 | 0 | 0 | 2 | 0 | 0 | 1 | 0.31 |
| Event 12 | 0 | 0 | 0 | 2 | 0 | 1 | - |
| Non-Event | 1 | 6 | 1 | 1 | 0 | 0 | - |
| GT \ Classified | Event 5 | Event 6 | Event 11 | Event 12 | Non-Event | Accuracy | OA |
|---|---|---|---|---|---|---|---|
| Event 11 | 0 | 0 | 2 | 0 | 0 | 1 | 0.92 |
| Event 12 | 0 | 0 | 0 | 2 | 0 | 1 | - |
| Non-Event | 0 | 0 | 1 | 0 | 8 | 0.89 | - |
| GT \ Classified | Event 5 | Event 6 | Event 11 | Event 12 | Non-Event | Accuracy | OA |
|---|---|---|---|---|---|---|---|
| Event 11 | 0 | 0 | 2 | 0 | 0 | 1 | 0.85 |
| Event 12 | 0 | 0 | 0 | 2 | 0 | 1 | - |
| Non-Event | 0 | 0 | 1 | 1 | 7 | 0.78 | - |
| GT \ Classified | Event 5 | Event 6 | Event 11 | Event 12 | Non-Event | Accuracy | OA |
|---|---|---|---|---|---|---|---|
| Event 5 | 1 | 0 | 0 | 0 | 0 | 1 | 0.4 |
| Event 6 | 0 | 1 | 0 | 0 | 0 | 1 | - |
| Non-Event | 1 | 1 | 1 | 0 | 0 | 0 | - |
| GT \ Classified | Event 5 | Event 6 | Event 11 | Event 12 | Non-Event | Accuracy | OA |
|---|---|---|---|---|---|---|---|
| Event 5 | 1 | 0 | 0 | 0 | 0 | 1 | 0.8 |
| Event 6 | 0 | 1 | 0 | 0 | 0 | 1 | - |
| Non-Event | 0 | 1 | 0 | 0 | 2 | 0.67 | - |
| GT \ Classified | Event 5 | Event 6 | Event 11 | Event 12 | Non-Event | Accuracy | OA |
|---|---|---|---|---|---|---|---|
| Event 5 | 1 | 0 | 0 | 0 | 0 | 1 | 0.6 |
| Event 6 | 0 | 1 | 0 | 0 | 0 | 1 | - |
| Non-Event | 1 | 1 | 0 | 0 | 1 | 0.33 | - |
| GT \ Classified | Event 5 | Event 6 | Event 11 | Event 12 | Non-Event | Accuracy | OA |
|---|---|---|---|---|---|---|---|
| Event 5 | 1 | 0 | 0 | 0 | 0 | 1 | 0.6 |
| Event 6 | 0 | 2 | 0 | 0 | 0 | 1 | - |
| Event 11 | 0 | 0 | 3 | 1 | 0 | 0.75 | - |
| Event 12 | 0 | 0 | 0 | 3 | 0 | 1 | - |
| Non-Event | 4 | 0 | 1 | 0 | 0 | 0 | - |
| GT \ Classified | Event 5 | Event 6 | Event 11 | Event 12 | Non-Event | Accuracy | OA |
|---|---|---|---|---|---|---|---|
| Event 5 | 1 | 0 | 0 | 0 | 0 | 1 | 0.93 |
| Event 6 | 0 | 1 | 0 | 0 | 0 | 0.5 | - |
| Event 11 | 0 | 0 | 4 | 0 | 0 | 1 | - |
| Event 12 | 0 | 0 | 0 | 3 | 0 | 1 | - |
| Non-Event | 0 | 0 | 0 | 0 | 6 | 1 | - |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Kwan, C.; Budavari, B.; Ayhan, B. Comparing Video Activity Classifiers within a Novel Framework. Electronics 2020, 9, 1545. https://doi.org/10.3390/electronics9091545