Article
Peer-Review Record

Research on Pedestrian Detection and DeepSort Tracking in Front of Intelligent Vehicle Based on Deep Learning

Sustainability 2022, 14(15), 9281; https://doi.org/10.3390/su14159281
by Xuewen Chen *, Yuanpeng Jia, Xiaoqi Tong and Zirou Li
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 16 June 2022 / Revised: 21 July 2022 / Accepted: 26 July 2022 / Published: 28 July 2022

Round 1

Reviewer 1 Report

 

There are a few doubts or problems to answer in this paper and my comments are listed as follows.

First, please adjust the sequence number of the form “Tab.1 Statistical results of pedestrian tracking evaluation indicators” to Tab.2.

Second, it is mentioned in section 4.1 of the manuscript that the training and testing curves of the P-R network have achieved convergence and the model accuracy has been improved compared with the model before improvement. Then, what is the training or testing accuracy of the network? These precision values should be explained in the paper.

Author Response

First, please adjust the sequence number of the form “Tab.1 Statistical results of pedestrian tracking evaluation indicators” to Tab.2.

Reply: Yes, I agree with you. The table number has been changed from Tab.1 to Tab.2 and highlighted in the revised manuscript.

Second, it is mentioned in section 4.1 of the manuscript that the training and testing curves of the P-R network have achieved convergence and the model accuracy has been improved compared with the model before improvement. Then, what is the training or testing accuracy of the network? These precision values should be explained in the paper.

Reply: Yes, I agree with you. The training accuracy and testing accuracy of the improved YOLO model are 82.21% and 72.02%, respectively. Compared with the model before improvement, the training accuracy and testing accuracy of the improved model are higher by 1.57 and 2.57 percentage points, respectively. The training and testing accuracy of the improved YOLO model are shown in the following figure.

Author Response File: Author Response.pdf

Reviewer 2 Report

Give the contributions of the paper more clearly in the introduction section.

Provide the preliminaries to understand the notations much better.

The literature review is not comprehensive enough and the authors have overlooked many of the recent papers. The review needs to be extended. The following paper should be considered to give an opportunity to readers about recent real-life applications.

M. Asha Jerlin M. Abirami, "A smart parking system using IoT", World Review of Entrepreneurship, Management and Sustainable Development, Volume 15, Issue 3, 2019.

R. Saravanan, "Selfish node detection based on evidence by trust authority and selfish replica allocation in DANET", International Journal of Information and Communication Technology(Inderscience),  Volume 9, Issue 04, January 2016.

Srinivasan Palanisamy, S Sankar, Ramasubbareddy Somula, "Communication Trust and Energy-Aware Routing Protocol for WSN Using DS Theory", International Journal of Grid and High Performance Computing (IJGHPC -IGI),  Volume 13, Issue 4, 2021, Pages 24-36, Publisher IGI Global. DOI: 10.4018/IJGHPC.2021100102.

Senthil Murugan Nagarajan, Puspita Chatterjee, Waleed Alnumay, V Muthukumaran, "Integration of IoT based routing process for food supply chain management in sustainable smart cities", Sustainable Cities and Society(Elsevier), Volume 76, January 2022.

Senthil Murugan Nagarajan, Puspita Chatterjee, Waleed Alnumay, Uttam Ghosh, "Effective task scheduling algorithm with deep learning for Internet of Health Things (IoHT) in sustainable smart cities", Sustainable Cities and Society(Elsevier), Volume 71, August 2021.

What will be the frame size realized by Kalman filter estimation in the Hungarian algorithm?

When the crowd is so dynamic and ad-hoc in nature how the targets will be tracked and the trace is matched?

In section 3.3 It has been said that "tracking target with the shortest matching time has a higher priority, which ensures the continuity of tracking," How the priority is set with here?

The section number is wrongly given as "2.4. Optimal matching of pedestrian targets based on Hungarian algorithm". Check once.

In Table 1 Mention the filter size for the Fully connected layer and the L2 Normalized layer

Comparison with more recent studies and methods would be appreciated.

Give the article a mild language revision to get rid of a few complex sentences that hinder readability and eradicate typo errors.

I would like to see the manuscript again after revisions.

 

Author Response

Give the contributions of the paper more clearly in the introduction section.

Reply: Yes, I agree with you. The article is structured as follows. Section “Improved YOLO network design based on deep learning” analyses the pedestrian detection process based on deep learning and adds a CBAM (Convolutional Block Attention Module), comprising the Spatial Attention Module (SAM) and the Channel Attention Module (CAM), to the general YOLO model architecture. Section “DeepSort Pedestrian tracking based on Improved Network” designs the flow of the DeepSort pedestrian tracking method and the Kalman filter used to estimate the pedestrian motion state. Appearance features and the Mahalanobis distance are used as the measures for data association matching between the detection frame and the predicted pedestrian trajectory, and the Hungarian algorithm is used to achieve the optimal matching of pedestrian targets. Section “Pedestrian detection and tracking evaluation indicators” discusses the evaluation indicators for pedestrian detection and tracking. Section “Pedestrian detection and tracking verification” lists verification results for pedestrian detection and tracking based on the PD-2022 dataset, the COCO dataset, VOC 2017 and other datasets. Section “Conclusions” summarizes the research results and remarks of the article. The main innovation of the paper is that the improved pedestrian detection model has a stronger ability to deal with occlusion and correctly handles images that were previously missed or misdetected, which resolves the tracking failures caused by occlusion before the improvement.

Provide the preliminaries to understand the notations much better.

Reply: Yes, I agree with you. I have compiled the notations used in the manuscript and listed them in the table below.

Appendix table: description of symbols

| Symbol | Description of symbol | Symbol | Description of symbol |
|---|---|---|---|
| xk | Pedestrian motion state variable | ri | Appearance descriptor |
| A | State transition matrix | mAP | Average detection accuracy |
| ω | Prediction noise | M | Number of images to be recognized |
| Q | Covariance matrix of noise | Ng | Number of positive samples |
| H | Measurement matrix | dti | Distance deviation |
| vk | Observation noise | Ct | Number of successfully matched pairs |
| p | Horizontal coordinate of the target center point | mt | Number of real targets that are not matched successfully in frame t |
| q | Vertical coordinate of the target center point | FPT | Number of unsuccessful trackers in frame t |
|  | The prior estimate of the pedestrian state | mmet | Number of ID switches occurring in frame t |
|  | The prior estimate of the covariance | gt | Total number of real targets contained in the image of frame t |
| γ | The length-width ratio of the boundary frame | MOTA | Pedestrian tracking accuracy |
| h | The height of the boundary frame | MOTP | Pedestrian tracking precision |
| vp, vq, vγ, vh | The corresponding velocities of each point of the boundary frame | MT | Mostly tracked: proportion of trajectories tracked for more than 80% of their length (higher is better) |
| dj | The position of the jth detection frame | ML | Mostly lost: proportion of trajectories tracked for less than 20% of their length |
| yi | The target position predicted by the ith tracking track | FM | Number of times a target trajectory is interrupted |
| dj | The position of the jth detection frame | IDSW | Number of identity switches |
| yi | The target position predicted by the ith tracking track | FPS | Processing speed for the frames of a video (frames per second) |
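
The prediction step that these symbols describe can be sketched as follows. This is only an illustration: it assumes a constant-velocity transition matrix A over the state (p, q, γ, h, vp, vq, vγ, vh), a unit time step, and diagonal placeholder values for Q. None of these matrices or numbers are taken from the manuscript, and the function names are hypothetical.

```python
# Sketch of a Kalman prediction step for an 8-dimensional pedestrian state
# (p, q, gamma, h, vp, vq, vgamma, vh). The constant-velocity matrix A and
# the diagonal noise covariance Q are assumptions, not the paper's values.

def constant_velocity_matrix(dt=1.0, dim=4):
    """Build A so that each position component advances by dt * velocity."""
    n = 2 * dim
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(dim):
        A[i][i + dim] = dt
    return A

def matmul(X, Y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def predict(x, P, A, Q):
    """Prior estimates: x_prior = A x and P_prior = A P A^T + Q."""
    x_prior = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    APAt = matmul(matmul(A, P), transpose(A))
    P_prior = [[APAt[i][j] + Q[i][j] for j in range(len(Q))]
               for i in range(len(Q))]
    return x_prior, P_prior

# Hypothetical track state: box centre (100, 50), aspect ratio 0.5, height 80.
A = constant_velocity_matrix(dt=1.0)
x = [100.0, 50.0, 0.5, 80.0, 2.0, -1.0, 0.0, 0.5]
P = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
Q = [[0.01 if i == j else 0.0 for j in range(8)] for i in range(8)]
x_prior, P_prior = predict(x, P, A, Q)
```

With these placeholder values the predicted centre moves to (102, 49) and the position uncertainty grows, which is the behaviour the prior estimates in the table are meant to capture.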

The literature review is not comprehensive enough and the authors have overlooked many of the recent papers. The review needs to be extended. The following paper should be considered to give an opportunity to readers about recent real-life applications.

Reply: Yes.

What will be the frame size realized by Kalman filter estimation in the Hungarian algorithm?

Reply: 0.02 seconds per frame

When the crowd is so dynamic and ad-hoc in nature how the targets will be tracked and the trace is matched?

Reply: The detections from the pedestrian detector are cascade-matched with the pedestrian tracks that the Kalman filter predicts for the current moment from the previous moment. Appearance features and the Mahalanobis distance are used as the measures for data association matching, as shown in formulas 9 to 11.
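
Formulas 9 to 11 are not reproduced in this record, so the following is only a sketch of how a motion measure and an appearance measure are commonly combined in DeepSORT-style association. It assumes a diagonal covariance for the Mahalanobis distance, a cosine distance between appearance descriptors, a weight `lam`, and gate thresholds chosen for illustration; all names and values are hypothetical rather than the authors' settings.

```python
import math

def mahalanobis_sq(d, y, var):
    """Squared Mahalanobis distance between detection d and predicted
    position y, assuming a diagonal covariance given as a variance list."""
    return sum((di - yi) ** 2 / vi for di, yi, vi in zip(d, y, var))

def cosine_distance(a, b):
    """1 - cosine similarity between two appearance descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def association_cost(det_box, det_feat, track_box, track_var, track_feats,
                     lam=0.5, gate_motion=9.4877, gate_appearance=0.2):
    """Weighted cost of pairing one detection with one track.

    Returns math.inf when either measure exceeds its gate, so the pair
    cannot be selected by the assignment step. The motion gate is the 0.95
    chi-square quantile for 4 degrees of freedom; the other defaults are
    assumptions.
    """
    d_motion = mahalanobis_sq(det_box, track_box, track_var)
    # Smallest distance to any descriptor stored for the track.
    d_app = min(cosine_distance(det_feat, f) for f in track_feats)
    if d_motion > gate_motion or d_app > gate_appearance:
        return math.inf
    return lam * d_motion + (1.0 - lam) * d_app

# Hypothetical track predicted at (100, 50) with two stored descriptors.
track_box = [100.0, 50.0, 0.5, 80.0]
track_var = [4.0, 4.0, 0.01, 4.0]
track_feats = [[1.0, 0.0], [0.0, 1.0]]

near = association_cost([102.0, 49.0, 0.5, 80.0], [1.0, 0.0],
                        track_box, track_var, track_feats)
far = association_cost([200.0, 50.0, 0.5, 80.0], [1.0, 0.0],
                       track_box, track_var, track_feats)
```

The resulting cost matrix over all track and detection pairs would then be passed to the Hungarian algorithm, with gated pairs excluded from the assignment.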

In section 3.3 It has been said that "tracking target with the shortest matching time has a higher priority, which ensures the continuity of tracking," How the priority is set with here?

Reply: During cascade matching, a time update parameter (time_since_update) is set for each track. The tracks are sorted by this parameter value, and the track with the smallest value is matched against the detections first. This gives the tracking target that was matched most recently a higher priority, which ensures the continuity of tracking and reduces the number of identity switches.
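
The ordering described in this reply can be sketched as below. The `Track` class, the one-dimensional positions, and the greedy nearest-detection step are simplifications introduced here for illustration; the manuscript uses the Hungarian algorithm for the assignment within each level, and `max_age` and `max_cost` are assumed parameters.

```python
from dataclasses import dataclass

@dataclass
class Track:
    track_id: int
    time_since_update: int  # frames since the track last matched a detection
    position: float         # hypothetical 1-D predicted position

def cascade_levels(tracks, max_age=30):
    """Group tracks by time_since_update, smallest value first."""
    levels = {}
    for t in tracks:
        if t.time_since_update <= max_age:
            levels.setdefault(t.time_since_update, []).append(t)
    return [levels[k] for k in sorted(levels)]

def cascade_match(tracks, detections, cost, max_cost, max_age=30):
    """Match recently updated tracks first; later levels only see the
    detections that earlier levels left unmatched.

    A greedy nearest-detection choice stands in for the per-level
    assignment step to keep the sketch short.
    """
    unmatched = list(range(len(detections)))
    matches = {}
    for level in cascade_levels(tracks, max_age):
        for track in level:
            if not unmatched:
                return matches
            best = min(unmatched, key=lambda j: cost(track, detections[j]))
            if cost(track, detections[best]) <= max_cost:
                matches[track.track_id] = best
                unmatched.remove(best)
    return matches

tracks = [Track(1, 3, 10.5), Track(2, 1, 10.0), Track(3, 1, 21.0)]
detections = [10.2, 20.0]
matches = cascade_match(tracks, detections,
                        cost=lambda t, d: abs(t.position - d), max_cost=5.0)
```

In this example tracks 2 and 3 (updated one frame ago) claim both detections before track 1 (not updated for three frames) is considered, so a stale track cannot take a detection away from a track that was matched recently.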

The section number is wrongly given as "2.4. Optimal matching of pedestrian targets based on Hungarian algorithm". Check once.

Reply: Yes. The section number "2.4. Optimal matching of pedestrian targets based on Hungarian algorithm" has been deleted.

In Table 1 Mention the filter size for the Fully connected layer and the L2 Normalized layer

Reply: No filter size needs to be set for the fully connected layer or the L2 normalization layer.

Comparison with more recent studies and methods would be appreciated.

Reply: When evaluating the basic YOLO model, we compared it with the object detection test accuracy officially released for YOLO. At a detection speed of 45 frames per second, the training accuracy and testing accuracy of the YOLO model before improvement in this paper are 80.64% and 69.45%, respectively.

Comparison of average detection precision (AP) values

| Model | AP (%) | FPS (f/s) |
|---|---|---|
| The YOLO official model | 55.3 | 35 |
| The YOLO model constructed in this paper | 69.45 | 45 |

Give the article a mild language revision to get rid of a few complex sentences that hinder readability and eradicate typo errors.

Reply: Yes. Thank you for your careful review. We will check and proofread grammar and other errors carefully.

Author Response File: Author Response.pdf

Reviewer 3 Report

The subject of the article is interesting, and the problem is worth exploring, but it is recommended to complement the discussion and conclusion section with a clear indication of what is innovative in the presented solution in comparison to other such solutions (authors' contribution to the development of science). It would also be necessary to characterize the barriers that the authors faced in their experiments and to point out the advantages of the presented solution compared to other research in this area.

In my opinion, the purpose of conducting the research and research methods are rather not entirely clear. The authors describe the modules in general terms, but it is difficult to refer to the method because there is no information about what pedestrian characteristics are analyzed? The description of the methodology should be more detailed. In addition to the description of the modules, there should be a description of the data that was used in each module and more detailed analysis diagrams. The figures presented showing the architecture of the solutions used should be described in more detail.

The discussion could be developed by comparing the research with others in the area presented by the authors (the discussion should refer to the aim of the research and include more reference sources).

 

The conclusions should state in which scientific and practical areas the presented research can be useful. 

Author Response

The subject of the article is interesting, and the problem is worth exploring, but it is recommended to complement the discussion and conclusion section with a clear indication of what is innovative in the presented solution in comparison to other such solutions (authors' contribution to the development of science). It would also be necessary to characterize the barriers that the authors faced in their experiments and to point out the advantages of the presented solution compared to other research in this area.

Reply: According to official statistics, the test accuracy of the YOLO model is only 55.3%. The training accuracy and testing accuracy of the YOLO model before improvement are 80.64% and 69.45%, respectively. The improved YOLOv3 pedestrian detection model and the DeepSort pedestrian tracking method were compared and verified in the same experimental environment. The training accuracy and testing accuracy of the improved YOLO model are 82.21% and 72.02%, respectively. Compared with the model before improvement, the training accuracy and testing accuracy of the improved model are higher by 1.57 and 2.57 percentage points, respectively. The verification results show that the improved pedestrian detection model has a stronger ability to deal with occlusion and correctly handles images that were previously missed or misdetected, which resolves the tracking failures caused by occlusion before the improvement. The training and testing accuracy of the improved YOLO model are shown in the following figure.

In my opinion, the purpose of conducting the research and research methods are rather not entirely clear. The authors describe the modules in general terms, but it is difficult to refer to the method because there is no information about what pedestrian characteristics are analyzed? The description of the methodology should be more detailed. In addition to the description of the modules, there should be a description of the data that was used in each module and more detailed analysis diagrams. The figures presented showing the architecture of the solutions used should be described in more detail.

Reply: Fig.4 and Tab.1 in the manuscript describe the overall structure of the YOLO+CBAM model, and Fig.6 has been added to further explain the DeepSort pedestrian tracking process.

The discussion could be developed by comparing the research with others in the area presented by the authors (the discussion should refer to the aim of the research and include more reference sources).

Reply: Yes, I agree with you. To compare the pedestrian tracking evaluation results, some evaluation parameters have been added to the original ones in Tab.2. The evaluation index IDSW is the number of identity switches in track matching, which measures the ability of the model to deal with occlusion. If the ID of a pedestrian remains unchanged after occlusion, the target before and after occlusion is judged to be the same, indicating accurate tracking. The table shows that the IDSW value of the improved model is lower, which indicates that the model has a better ability to deal with occlusion. The evaluation index FPS measures the processing speed and real-time performance of the model for each frame of video. Its value changes little, indicating that the model can meet the real-time requirement.

Tab.2 Statistical results of pedestrian tracking evaluation indicators

| Model | MOTA | MOTP | MT | ML | FM | IDSW | FPS |
|---|---|---|---|---|---|---|---|
| YOLO+DeepSort | 49 | 60.11 | 24 | 22.6 | 986 | 611 | 36 |
| Improved YOLO+DeepSort | 53.8 | 62.12 | 28.2 | 21.8 | 241 | 120 | 33 |
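
For readers unfamiliar with the indicators, MOTA and MOTP are conventionally computed from the per-frame counts listed in the symbol table (mt, FPT, mmet, gt, dti, Ct). The sketch below follows the standard CLEAR MOT definitions, which are assumed here to match the manuscript's formulas; the dictionary keys and the two example frames are hypothetical and are not the paper's data.

```python
def mota(frames):
    """MOTA = 1 - sum(misses + false positives + ID switches) / sum(ground truth)."""
    errors = sum(f["misses"] + f["false_positives"] + f["id_switches"]
                 for f in frames)
    total = sum(f["ground_truth"] for f in frames)
    return 1.0 - errors / total

def motp(frames):
    """MOTP = summed deviation of matched pairs / number of matched pairs.

    Whether the deviation is a distance or an overlap score depends on the
    evaluation protocol; the formula has the same shape in both cases.
    """
    return (sum(f["deviation_sum"] for f in frames)
            / sum(f["matches"] for f in frames))

# Two hypothetical frames with five real targets each.
frames = [
    {"misses": 1, "false_positives": 0, "id_switches": 0,
     "ground_truth": 5, "matches": 4, "deviation_sum": 1.0},
    {"misses": 0, "false_positives": 1, "id_switches": 1,
     "ground_truth": 5, "matches": 5, "deviation_sum": 0.8},
]
mota_value = mota(frames)
motp_value = motp(frames)
```

Because identity switches enter the MOTA numerator directly, a large drop in IDSW such as the one in Tab.2 would be expected to raise MOTA if the other error counts stay similar.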

The conclusions should state in which scientific and practical areas the presented research can be useful.

Reply: Yes, I agree with your opinion. This paper mainly focuses on the missed detections, misdetections and tracking failures caused by small pedestrians among multiple targets and by partially occluded pedestrians in dim driving environments, and it addresses the multi-target tracking failures caused by occlusion. The research results have important reference value for the theoretical research and development of human-vehicle collision avoidance systems, such as ADAS, intelligent vehicle systems and AEB systems.

Round 2

Reviewer 3 Report

Thank you for the improvements.
