Article
Peer-Review Record

An Effective YOLO-Based Proactive Blind Spot Warning System for Motorcycles

Electronics 2023, 12(15), 3310; https://doi.org/10.3390/electronics12153310
by Ing-Chau Chang 1, Chin-En Yen 2, Ya-Jing Song 1, Wei-Rong Chen 1, Xun-Mei Kuo 1, Ping-Hao Liao 1, Chunghui Kuo 3 and Yung-Fa Huang 4,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4:
Submission received: 28 June 2023 / Revised: 24 July 2023 / Accepted: 31 July 2023 / Published: 2 August 2023
(This article belongs to the Special Issue Advances and Challenges in Future Networks)

Round 1

Reviewer 1 Report

Very interesting work and nicely presented. It is particularly interesting that the authors also presented Figure 17 with the proposed setup. I do have some minor comments:

- There are related works (the authors mention plenty) that deal with this problem. It is not clearly highlighted how the proposed work compares to related works.

- It is unclear whether this system was actually tested in real life. Was it? If so, make sure it is highlighted in the paper since it is significant.

Minor editing and typos throughout the text.

Author Response

Point 1: Very interesting work and nicely presented. It is particularly interesting that the authors also presented Figure 17 with the proposed setup. I do have some minor comments:

- There are related works (the authors mention plenty) that deal with this problem. It is not clearly highlighted how the proposed work compares to related works.

Response 1: Thank you for this comment. We have added the third contribution of this study at the end of Section 1 to highlight that this system has been tested in real life. A new Section 4.3.1 presents the real-world evaluation of the implemented system. It includes a video clip and five snapshots recorded on the road [19], together with six key metrics, i.e., the average image classification and processing time in the cloud server, the image transmission rate of the Android phone, the round-trip delay between the Android phone and the cloud server, the YOLOv4 classification accuracy, and the error rate of the estimated distance, to provide concrete evidence of the PBSW system's efficiency and reliability.

Point 2: It is unclear whether this system was actually tested in real life. Was it? If so, make sure it is highlighted in the paper since it is significant.

Response 2: Thank you for this comment. Yes, the real system has been implemented and tested, as described in the revised Section 4.3.1. This new section presents the real-world evaluation of the implemented system. It includes a video clip and five snapshots recorded on the road [19], together with six key metrics, i.e., the average image classification and processing time in the cloud server, the image transmission rate of the Android phone, the round-trip delay between the Android phone and the cloud server, the YOLOv4 classification accuracy, and the error rate of the estimated distance, to provide concrete evidence of the PBSW system's efficiency and reliability.

Reviewer 2 Report

In the manuscript, the authors introduced a proactive bus blind spot warning system to enable motorcyclists to be notified when entering a bus blind spot. The authors provided a detailed pipeline for the system including software and hardware. The manuscript is easy to follow and the proposed system sounds practical. However, there are some suggestions:

1. It would be better if the authors could explain in detail the reasons for choosing YOLOv4 as the baseline model. If the authors want to balance the performance and inference speed of the detection model, it would be better if the authors could also discuss the existing works on efficient object detection [a][b][c].

[a] "Efficientdet: Scalable and efficient object detection." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.

[b] "Dynamic proposals for efficient object detection." arXiv preprint arXiv:2207.05252 (2022).

[c] "Learning efficient object detection models with knowledge distillation." Advances in neural information processing systems 30 (2017).

2. In the manuscript, the authors mentioned the model's inference speed (FPS < 10). I wonder whether it will be too slow for the real-world application. It would be better if the authors could briefly explain that.

3. It would be better if the authors could compare their system with the existing blind-spot detection products or some other object detection models to show the motivations and contributions since blind-spot detection is well-investigated and has plenty of products in use.

Good 

Author Response

Point 1: In the manuscript, the authors introduced a proactive bus blind spot warning system to enable motorcyclists to be notified when entering a bus blind spot. The authors provided a detailed pipeline for the system including software and hardware. The manuscript is easy to follow and the proposed system sounds practical.

Response 1: Thank you for this comment.

Point 2: However, there are some suggestions: It would be better if the authors could explain the reasons to choose YOLOv4 as the baseline model in detail. If the authors want to balance the performance and inference speeds of the detection model, it would be better if the authors could also discuss the existing works about efficient object detection [a][b][c].

[a] "Efficientdet: Scalable and efficient object detection." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.

[b] "Dynamic proposals for efficient object detection." arXiv preprint arXiv:2207.05252 (2022).

[c] "Learning efficient object detection models with knowledge distillation." Advances in neural information processing systems 30 (2017).

Response 2: Thank you for this comment. Section 4.3.4 presents the YOLOv4 classification results of the PBSW system. The YOLOv4 classification results, i.e., Precision, Recall and Average Precision (AP), are shown in Table 3, with TP=392, FP=24 and FN=58. Overall, the AP is 92.82% and the F1-Score is 0.91 in this real-world evaluation. Based on these results, this PBSW system, which runs the YOLOv4 model in the cloud server, can accurately identify the rear-view mirror when the motorcycle moves into the proximity of the bus.

As discussed in Section 4.3.5, if a more efficient image board is used at the motorcycle side instead of the Raspberry Pi 3B+, it may execute the YOLOv4 model or another advanced image recognition model locally to accurately detect the bus's rear-view mirror, while also avoiding the fluctuating RTTs incurred when transmitting images to the cloud server through an unstable 4G mobile network. The recommended object detection references are discussed as follows.

[a]: lines 167-171

“In [23], the authors proposed a vehicle detection algorithm based on the YOLOv3 model trained with a great volume of traffic data. The model is pruned to ensure its efficiency on edge equipment. Then, a real-time vehicle tracking counter that combines the vehicle detection and vehicle tracking algorithms was proposed to realize the detection of traffic flow.”

[b]: lines 194-197

“ In [31], the authors propose a simple yet effective method which is adaptive to different computational resources by generating dynamic proposals for object detection. Moreover, a dynamic model is extended to choose the number of proposals according to the input image, greatly reducing computational costs.”

[c]: lines 197-200

“In [32], the authors introduce several innovations, such as a weighted cross-entropy loss to address class imbalance, a teacher bounded loss to handle the regression component, and adaptation layers. A comprehensive empirical evaluation with different distillation configurations was conducted over multiple datasets.”

“4.3.5. Discussions

In the following, four issues are addressed to improve this PBSW system in the future.

  • Section 4.3.1 presents the real-world evaluation of the implemented system and the results of six performance metrics in our test scenario, demonstrating its profound ability to reduce accidents and significantly enhance motorcycle safety. However, there are many different scenarios on the road, and testing this system across them requires significant time. We will continue working on it to collect data that demonstrates the usability of this PBSW system across different scenarios.
  • Section 4.3.3 discusses the average round-trip delay (RTT) measured between the Android phone and the cloud server in the real-world evaluation. The RTT includes the one-way transmission delay for the Android phone to send images to the cloud server, the YOLOv4 image classification and processing time in the cloud server, and the one-way transmission delay for the cloud server to send the results back to the Android phone. As a result, the average one-way transmission delay between the Android phone and the cloud server is 0.245 sec. Please note that these RTTs are measured by the Android phone when it communicates with the cloud server through a stable 4G mobile network. If the 4G signal suffers from significant interference or blockage by obstacles, these RTTs may fluctuate considerably.
  • This system uses a Raspberry Pi 3B+ board as the main body, connecting two USB cameras to obtain road images with a resolution of 640×480 pixels per frame at seven frames per second (fps). The Android phone also creates a connectionless UDP socket to transmit the captured images to the cloud server through the mobile network for training the YOLOv4 model. Due to the load of the Android phone and the fluctuating transmission rate of the 4G mobile network, the average image transmission rate of this PBSW system is further limited to 4 fps, which becomes the performance bottleneck of this system. Hence, the predicted motorcycle location is not continuous, as shown in the video clip recorded in the real-world evaluation. In the future, we will replace the Raspberry Pi 3B+ board with a more efficient one, like the Jetson Nano board [20], to raise the image capture rate, image transmission rate and image inference speed accordingly.”

Point 3: In the manuscript, the authors mentioned the model's inference speed (FPS < 10). I wonder whether it will be too slow for the real-world application. It would be better if the authors could briefly explain that.

Response 3: Thank you for this comment. This system uses a Raspberry Pi 3B+ board as the main body, connecting two USB cameras to obtain road images with a resolution of 640×480 pixels per frame at seven frames per second (fps). The Android phone also creates a connectionless UDP socket to transmit the captured images to the cloud server through the mobile network for training the YOLOv4 model. Due to the load of the Android phone and the fluctuating transmission rate of the 4G mobile network, the average image transmission rate of this PBSW system is further limited to 4 fps, which becomes the performance bottleneck of this system. Hence, the predicted motorcycle location is not continuous, as shown in the video clip recorded in the real-world evaluation. In the future, we will replace the Raspberry Pi 3B+ board with a more efficient one, like the Jetson Nano board [20], to raise the image capture rate, image transmission rate and image inference speed accordingly. Further explanation is given on lines 299-313.

“This system uses a Raspberry Pi 3B+ board as the main body, connecting two USB cameras to obtain road images with a resolution of 640×480 pixels per frame at seven frames per second (fps). It also connects a Wi-Fi module to transmit images to the Android phone using a connectionless UDP socket. The hardware equipment of the motorcycle is fixed on its steering handle, as shown in Figure 17. The part on the left is the dual-lens device, and the part on the right is the Android phone. As mentioned above, the Android phone also creates a connectionless UDP socket to transmit the captured images to the cloud server through the mobile network for training the YOLOv4 model. The cloud server used in this system is equipped with an Intel i7-8700 CPU and an NVIDIA GeForce RTX 2080 Ti GPU, and is used to identify the rear-view mirror in the captured images and to calculate the distance between the rear-view mirror and the motorcycle. When it detects that the motorcycle has entered the blind spot or the inner wheel difference area of the bus, it immediately sends a warning to the PBSW APP of the motorcyclist. This PBSW system uses C++ to implement the YOLOv4 model and the SURF image matching algorithm, Java to implement the Android APP, and Python and opencv-python to implement the dual-lens image acquisition.”
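For readers who want a concrete picture of the capture-and-transmit loop described above, here is a minimal Python/opencv-python sketch in the same spirit. The camera indices, server address, port, and JPEG quality are illustrative assumptions, not values from the paper; a real deployment would also pair left/right frames explicitly and handle datagram fragmentation.

```python
# Minimal sketch of the dual-lens capture and UDP transmission loop.
# Camera indices, server endpoint, and JPEG quality are assumptions.
import socket
import cv2

SERVER = ("203.0.113.10", 5005)  # hypothetical cloud-server endpoint

left = cv2.VideoCapture(0)
right = cv2.VideoCapture(1)
for cam in (left, right):
    cam.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # connectionless UDP

while True:
    ok_l, frame_l = left.read()
    ok_r, frame_r = right.read()
    if not (ok_l and ok_r):
        break
    # JPEG-compress each 640x480 frame so a pair fits in UDP datagrams.
    for frame in (frame_l, frame_r):
        ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 70])
        if ok:
            sock.sendto(buf.tobytes(), SERVER)
```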

Point 4: It would be better if the authors could compare their system with the existing blind-spot detection products or some other object detection models to show the motivations and contributions since blind-spot detection is well-investigated and has plenty of products in use.

Response 4: Thank you for this comment. We have discussed the existing blind-spot detection products on lines 120-124.

“In current blind spot detection system research, most detection methods are like [12], using images or signals received by sensors to detect dangerous factors (such as pedestrians, motorcycles or small vehicles) in the blind spot area of the vehicle. When dangerous factors are detected around the vehicle, the system reminds the driver, and may even further report the surrounding dangerous factors, to reduce accidents.”

Reviewer 3 Report

From my understanding, your work here extends the article that you published at an IEEE conference in 2020:

Chang, C., Chen, W. R., Kuo, X. M., Song, Y. J., Liao, P. H., & Kuo, C. (2020, November). An Artificial Intelligence-based Proactive Blind Spot Warning System for Motorcycles. In 2020 International Symposium on Computer, Consumer and Control (IS3C) (pp. 404-407). IEEE.

One of the first observations is that the results in this article are identical to the ones from the conference. Usually, the extended article needs to undergo more work and, based on the input received from the conference, to present improved results. In your case, there is no improvement, because there is no additional work undergone.

Secondly, there are new authors added to the manuscript submitted here. For example, let's take Y.-F.H. You state that he was involved in the conceptualization. However, he was not an author in the first, original article, which presented the entire concept. You need to address this type of issue, and to present a real Author Contributions section. As it is, it may seem misleading.

Moving on, your manuscript has moderate-to-severe English problems. Let's take e.g. the first 2 sentences from your article: 1. "The goal of our proposed system in this paper is to design a proactive bus blind spot warning (PBSW) system that will immediately notify motorcyclists when they enter the blind spot of a target vehicle, i.e., a bus." 2. "The hardware devices of our proposed system on motorcycle side, consisting of a Raspberry Pi 3B+ and a dual-lens stereo camera."

You can see they both have problems. For sentence 1, you don't need to repeat the fact that you proposed the system, the words "in this paper" are redundant, and you repeat the word "system" twice. Here's a rewrite: "The goal of this research is to design a proactive bus blind spot warning (PBSW) system that will immediately notify motorcyclists when they enter the blind spot of a target vehicle, i.e., a bus." Sentence 2 is grammatically incorrect. Here's a rewrite: "The proposed hardware is placed on the motorcycle and consists of a Raspberry Pi 3B+ and a dual-lens stereo camera." You have problems like this throughout your entire manuscript. I would expect a complete rewrite.

Moving on, 18 references is not nearly enough. ADAS work is booming. Blind spot detection is effervescent. You should have at least 40-50 good references. Your state of the art is weak.

Moving on to the technical part, everything looks interesting; however, I am having a problem with the Round-Trip Delay. You state that it is 0.5 s on average. Could you split this time further? How long does it take for you to send the data, to process it, and to send it back? I am having a hard time imagining this.

The English needs to be improved.

Author Response

Point 1: From my understanding, your work here extends the article that you published at an IEEE conference in 2020: Chang, C., Chen, W. R., Kuo, X. M., Song, Y. J., Liao, P. H., & Kuo, C. (2020, November). An Artificial Intelligence-based Proactive Blind Spot Warning System for Motorcycles. In 2020 International Symposium on Computer, Consumer and Control (IS3C) (pp. 404-407). IEEE. One of the first observations is that the results in this article are identical to the ones from the conference. Usually, the extended article needs to undergo more work and, based on the input received from the conference, to present improved results. In your case, there is no improvement, because there is no additional work undergone.

Response 1: Thank you for this comment. This version of the manuscript has been significantly extended from the conference version, as follows.

In the Introduction section, we add sentences that first mention the importance of ADAS and BSD systems. Related BSD systems fail to notify nearby vehicle drivers in this blind spot of a possible collision. We further explain the definition of the difference of radius between inner wheels in Figure 2, which is a new function of this version. Finally, we add three major contributions of this study.

In the Related Work section, we add four paragraphs to discuss the operations and defects of traditional blind spot detection systems [8-12], which strengthens the motivation for this proactive PBSW system. In Section 2, Refs. [14]-[18] and [20]-[32] are added to discuss the recent development of YOLOv4. Section 2.5 and Figure 4 are added to discuss related work [16, 17] on the Inner Wheel Difference Calculation.

In Section 3, a new Section 3.2 is first added to describe the three-dimensional and two-dimensional relationship in Figure 7 and the camera calibration process of this system in Figure 8. Second, detailed steps to find and classify the target in the cloud server are given in Section 3.3 and shown in Figure 10. Third, Figure 11 shows how the SURF feature point sets of the left and right images are used to match them in the matching process. Finally, two steps to calculate the relative angle between the bus and the motorcycle, illustrated in Figures 14 and 15, are added.

In Section 4, we first add four figures, i.e., Figures 18 to 21 in Section 4.2, to show the map mode and three cases of the warning mode of the Android PBSW APP. Second, we add a new Section 4.3.1 to present the real-world evaluation of the implemented system with the video clip shot on the road [19]. Five figures, i.e., Figure 22 and Figure 23(a), (b), (c) and (d), show snapshots of the PBSW system from this video clip, emphasizing its unique advantages, i.e., its real-time functionality and significant impact on enhancing motorcyclist safety. Third, we add a paragraph in Section 4.3.3 to discuss how RTTs are measured. Table 2 is added to present the average image processing time in the cloud server, the image transmission rate of the Android phone, and the round-trip delay between the Android phone and the cloud server in the real-world evaluation. Finally, four issues are addressed in the new Section 4.3.5 to improve this PBSW system in the future.

In summary, we have added 17 new figures, 1 new table, 4 new sections and 33 new references in this revised version, compared to the conference version. The word count has increased from 3166 in the conference version to 8100 in this revised version. Consequently, this work has undergone significant improvements, as described above.

Point 2: Secondly, there are new authors added to the manuscript submitted here. For example, let's take Y.-F.H. You state that he was involved in the conceptualization. However, he was not an author in the first, original article, which presented the entire concept. You need to address this type of issue, and to present a real Author Contributions section. As it is, it may seem misleading.

Response 2: Thank you for this comment. We have updated the author contribution section to present their real contributions.

Point 3: Moving on, your manuscript has moderate-to-severe English problems. Let's take e.g. the first 2 sentences from your article: 1. "The goal of our proposed system in this paper is to design a proactive bus blind spot warning (PBSW) system that will immediately notify motorcyclists when they enter the blind spot of a target vehicle, i.e., a bus." 2. "The hardware devices of our proposed system on motorcycle side, consisting of a Raspberry Pi 3B+ and a dual-lens stereo camera."

You can see they both have problems. For sentence 1, you don't need to repeat the fact that you proposed the system, the words "in this paper" are redundant, and you repeat the word "system" twice. Here's a rewrite: "The goal of this research is to design a proactive bus blind spot warning (PBSW) system that will immediately notify motorcyclists when they enter the blind spot of a target vehicle, i.e., a bus." Sentence 2 is grammatically incorrect. Here's a rewrite: "The proposed hardware is placed on the motorcycle and consists of a Raspberry Pi 3B+ and a dual-lens stereo camera." You have problems like this throughout your entire manuscript. I would expect a complete rewrite.

Response 3: Thank you for this comment. We have rewritten this manuscript.

“4.3.1. Real-World Evaluation of the Implemented System

To emphasize the unique advantages of this proposed PBSW system, i.e., its real-time functionality and significant impact on enhancing motorcyclist safety, it has been tested in real life. Please refer to the video clip shot on the road [19]. Figure 22 presents the map-mode snapshot of the PBSW system from this video clip when the motorcycle finds no bus in front. The images taken by the left and right lenses, the actual hardware layout of the motorcycle, and the Android PBSW APP are shown in the upper-left, lower-left and right parts of Figure 22, respectively. Whenever the cloud server has recognized the rear mirrors of the bus and has identified that the motorcycle has entered the visible area, i.e., the green zone, or the blind spot, i.e., the red zone, the Android PBSW APP illustrates the location of the motorcycle as it receives the relative angle and distances from the cloud server. Figures 23(a) and (b) present the PBSW warning-mode snapshots as soon as the motorcycle enters the visible area of the bus, when it is behind the bus and when it reaches the rear end of the bus, respectively. Figures 23(c) and (d) present the PBSW warning-mode snapshots as soon as the motorcycle enters the blind spot of the bus, when it is beside the bus and when it reaches the front end of the bus, respectively.

In the following, six key metrics, i.e., the average image classification and processing time in the cloud server, the image transmission rate of the Android phone, the round-trip delay between the Android phone and the cloud server, the YOLOv4 classification accuracy, and the error rate of the estimated distance, are measured in the real-world evaluation to provide concrete evidence of the PBSW system's efficiency and reliability.”

Point 4: Moving on, 18 references is not nearly enough. ADAS work is booming. Blind spot detection is effervescent. You should have at least 40-50 good references. Your state of the art is weak.

Response 4: Thank you for this comment. Advanced ADAS functionalities are designed to reduce road accidents by continuously monitoring inside and outside activities through multiple sensor arrangements. ADAS functionalities can minimize risks by reducing driver error, predicting trajectories, continuously alerting drivers, and taking vehicle control when the driver is incapacitated [3]. Due to the frequent occurrence of traffic accidents, safe driving assistance systems have become the most popular and critical technology on the market. In the future, safe driving assistance systems will play an important role in protecting the safety of drivers and passengers and preventing traffic accidents. Moreover, the reference list has been extended to 41 references.

“Further feature extraction is performed on each candidate area, and a classifier is used to determine whether the region contains a target object and to precisely locate the target. Representative models of this method include (1) R-CNN, (2) Fast R-CNN, (3) Faster R-CNN, and (4) Mask R-CNN [14]. On the other hand, the "one-stage" model combines the feature extraction network and the target detection network to extract image features. The model can then predict the location and category of the target, and can directly perform object recognition on the entire image, eliminating the process of generating candidate regions [15].

Two-stage models often use selective search techniques to reduce the amount of calculation and improve detection accuracy. One-stage models have faster speed and lower computational cost, but their detection accuracy is usually lower. In contrast, two-stage models can achieve higher detection accuracy, but they are slower and more computationally expensive [16]. To overcome these difficulties, researchers have proposed various methods and techniques, such as introducing attention mechanisms, multi-scale feature fusion and contextual information, to enhance the feature representation ability of small targets and improve detection accuracy [17].

In [18], the GoogLeNet Inception structure and the Rainbow series are adopted on the basis of the SSD detection network, and the SSD Inception v3 Rainbow (INAR-SSD) model architecture is proposed. The performance of INAR-SSD, Faster R-CNN and SSD is compared, and the experimental results show that the accuracy of the INAR-SSD model on the ALDD dataset is 78.80%. Since the rear-view mirror of the bus is a small object in the image relative to the vehicle, just like a license plate, we adopt the YOLO object detector method used in the Automatic License Plate Recognition (ALPR) system [19] as our identification method to help us accurately locate the position of the rear-view mirror of the bus.

YOLOv1 [20] used a deep convolutional neural network as the feature extractor. The input image is resized to 448×448 to ensure that all objects can be effectively detected, and the resized input image is then split into a 7×7 grid [20]. YOLOv2 [21] uses a deep convolutional neural network consisting of 19 convolutional layers and 5 max-pooling layers as the feature extractor, with a small model size and low computation while maintaining good feature extraction capability. YOLOv3 [22] uses the Darknet-53 deep convolutional neural network as the feature extractor; Darknet-53 has 53 convolutional layers and can better capture features in images.

In [23], the authors proposed a vehicle detection algorithm based on the YOLOv3 model trained with a great volume of traffic data. The model is pruned to ensure its efficiency on edge equipment. Then, a real-time vehicle tracking counter that combines the vehicle detection and vehicle tracking algorithms was proposed to realize the detection of traffic flow.

YOLOv4 [24] uses CSPDarkNet53 as the feature extractor. CSPDarkNet53 combines the Cross Stage Partial (CSP) module with the Darknet-53 module to improve feature representation and learning ability by branching and fusing the network. It is trained for object detection using the Complete IoU (CIoU) loss function. CIoU considers the complete intersection relationship between the predicted box and the ground-truth box, which can more accurately measure the accuracy of the predicted box and further improve detection accuracy. YOLOv5 [25] has five detection models in its public repository.

Adding a small target detection layer to the overall model considerably improves accuracy [26]. In addition, model compression and acceleration are applied so the model can run on small image computing units with limited resources; the network is further streamlined by channel pruning, and finally a lightweight detection model is obtained. In [27], the GSC-YOLOv5 object detection algorithm is proposed, which lightens the YOLOv5 structure to reduce the number of parameters and FLOPs of the model. The CBAM attention module is added to replace the SE attention module to increase the ability to extract spatial information [28]. In [29], the Ghost module replaces the original convolution in the backbone of YOLOv5, and a transformer added at the end improves the feature expression ability. In [30], the R-YOLOv5 object detection algorithm is proposed, adopting the Swin Transformer module, a Feature Enhancement and Attention Mechanism (FEAM) and Adaptively Spatial Feature Fusion (ASFF). On the basis of YOLOv5, adding the Swin Transformer module to the end of the backbone enhances the features of small objects and reduces the interference of complex backgrounds on small objects.

In [31], the authors propose a simple yet effective method that adapts to different computational resources by generating dynamic proposals for object detection. Moreover, a dynamic model is extended to choose the number of proposals according to the input image, greatly reducing computational costs. In [32], the authors introduce several innovations, such as a weighted cross-entropy loss to address class imbalance, a teacher bounded loss to handle the regression component, and adaptation layers. A comprehensive empirical evaluation with different distillation configurations was conducted over multiple datasets.”
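Since the excerpt above names the Complete IoU (CIoU) loss used to train YOLOv4, a short sketch of its standard published formulation may be helpful. This is the generic formula, not code from the PBSW system; the box representation (cx, cy, w, h) and function name are our own illustrative choices.

```python
# Sketch of the standard Complete IoU (CIoU) loss referenced above for
# YOLOv4: L = 1 - IoU + rho^2/c^2 + alpha*v.
import math

def ciou_loss(pred, gt, eps=1e-9):
    cx1, cy1, w1, h1 = pred
    cx2, cy2, w2, h2 = gt

    # Corner coordinates for the plain IoU term.
    x1min, x1max = cx1 - w1 / 2, cx1 + w1 / 2
    y1min, y1max = cy1 - h1 / 2, cy1 + h1 / 2
    x2min, x2max = cx2 - w2 / 2, cx2 + w2 / 2
    y2min, y2max = cy2 - h2 / 2, cy2 + h2 / 2

    inter_w = max(0.0, min(x1max, x2max) - max(x1min, x2min))
    inter_h = max(0.0, min(y1max, y2max) - max(y1min, y2min))
    inter = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - inter
    iou = inter / (union + eps)

    # Penalty 1: squared center distance over the squared diagonal of
    # the smallest box enclosing both boxes.
    rho2 = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    cw = max(x1max, x2max) - min(x1min, x2min)
    ch = max(y1max, y2max) - min(y1min, y2min)
    c2 = cw ** 2 + ch ** 2 + eps

    # Penalty 2: aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v

print(ciou_loss((50, 50, 20, 10), (50, 50, 20, 10)))  # ~0.0 for identical boxes
print(ciou_loss((60, 50, 20, 10), (50, 50, 20, 10)))  # > 0 for a shifted box
```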

Point 5: Moving on to the technical part, everything looks interesting; however, I am having a problem with the Round-Trip Delay. You state that it is 0.5 s on average. Could you split this time further? How long does it take for you to send the data, to process it, and to send it back? I am having a hard time imagining this.

Response 5: Thank you for this comment. In Section 4.3.3, we have discussed the average round-trip delay (RTT) measured between the Android phone and the cloud server in the real-world evaluation. The RTT includes the one-way transmission delay for the Android phone to send images to the cloud server, the YOLOv4 image classification and processing time in the cloud server, and the one-way transmission delay for the cloud server to send the results back to the Android phone. First, the average image classification time is 0.01 sec in the real-world evaluation, as mentioned in Section 4.3.2. Second, the image processing steps in the cloud server include step 5 in Section 3.3, which matches each feature point of the rear-view mirrors with the SURF algorithm, and step 6, which estimates the precise distance between the lens and the rear-view mirror; these take 0.04 ms. Finally, the average one-way transmission delay between the Android phone and the cloud server is 0.245 sec. These values are listed in Table 2. Please note that these RTTs are measured by the Android phone when it communicates with the cloud server through a stable 4G mobile network. If the 4G signal suffers from significant interference or blockage by obstacles, these RTTs may fluctuate considerably.

“4.3.3. Average Image Transmission Rate, Image Processing Time and Round-Trip Delay

To calculate the average image transmission rate of the Android phone and the average round-trip delay (RTT) between the Android phone and the cloud server in the real-world evaluation, the following steps are executed. When the Android phone is ready to send pairs of images to the cloud server, it records the timestamps t1 and tn at which it sends the first and the nth pair of images, respectively. Hence, the average image transmission rate is defined as (n - 1)/(tn - t1) fps. To calculate the RTT for the ith pair of images, the Android phone records the timestamp si at which it sends this pair of images and the timestamp ri at which it receives the recognition results of this pair from the cloud server. Hence, the average RTT for n pairs of images is defined as (1/n) Σi (ri - si). In the real-world evaluation, where n is 20000 and the 4G mobile network is stable, the average image transmission rate and the average RTT of this PBSW system are 4 fps and 0.5 seconds, respectively.

The RTT includes the one-way transmission delay for the Android phone to send images to the cloud server, the YOLOv4 image classification and processing time in the cloud server, and the one-way transmission delay for the cloud server to send the results back to the Android phone. The average image classification time is 0.01 sec in the real-world evaluation, as mentioned above. The image processing steps in the cloud server include step 5 in Section 3.3, which matches each feature point of the rear-view mirrors with the SURF algorithm, and step 6, which estimates the precise distance between the lens and the rear-view mirror; these take 0.04 ms. Finally, the average one-way transmission delay between the Android phone and the cloud server is 0.245 sec. Please note that these RTTs are measured by the Android phone when it communicates with the cloud server through a stable 4G mobile network. If the 4G signal suffers from significant interference or blockage by obstacles, these RTTs may fluctuate considerably.”
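The bookkeeping in the quoted Section 4.3.3 can be illustrated with a small Python sketch. The timestamp lists below are invented for illustration only; they are chosen so the resulting averages match the reported 4 fps and 0.5 s figures.

```python
# Sketch of the transmission-rate and RTT bookkeeping described above.
# send_ts[i]/recv_ts[i] are the times (in seconds) the Android phone sent
# the i-th image pair and received its recognition result; values are
# illustrative, not measurements from the paper.
send_ts = [0.00, 0.25, 0.50, 0.75, 1.00]
recv_ts = [0.50, 0.74, 1.01, 1.26, 1.49]

n = len(send_ts)
# Average image transmission rate: (n - 1) pairs sent over the interval
# between the first and the n-th send timestamp.
rate_fps = (n - 1) / (send_ts[-1] - send_ts[0])

# Average round-trip time over the n pairs.
avg_rtt = sum(r - s for s, r in zip(send_ts, recv_ts)) / n

print(f"avg transmission rate: {rate_fps:.1f} fps")  # 4.0 fps
print(f"avg RTT: {avg_rtt:.2f} s")                   # ~0.50 s

# Consistency with the reported breakdown: two one-way delays of
# 0.245 s plus ~0.01 s of server-side classification give ~0.5 s RTT.
print(f"breakdown check: {2 * 0.245 + 0.01:.2f} s")
```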

Reviewer 4 Report

The authors have proposed a model, “An Effective YOLO-based Proactive Blind Spot Warning System for Motorcycles.” I have the following major comments for this manuscript.

 

  1. The keywords fail to highlight the main contribution of this paper.
  2. Related work should be better motivated with the issues and problems associated with existing approaches, and with how the proposed approach advances beyond existing models.
  3. Clearly highlight the unique advantages of the proposed proactive bus blind spot warning (PBSW) system, emphasizing its real-time functionality and significant impact on enhancing motorcyclist safety.
  4. Provide a detailed and concise explanation of the technical implementation process, specifically outlining the seamless transmission of images from the Raspberry Pi board to the Android phone and subsequently to the cloud server, ensuring clarity and understanding for readers.
  5. Thoroughly address the performance and reliability of the YOLOv4 image recognition model in accurately detecting the bus's rear-view mirror, while also acknowledging any potential limitations or challenges encountered during its implementation.
  6. Elaborate on the methodology and algorithms utilized for estimating the precise distance between the motorcycle and the bus, showcasing the effectiveness of the lens imaging principle and its ability to deliver accurate distance calculations.
  7. Present the results of the performance evaluation in a clear and concise manner, including key metrics such as image classification speed, round-trip delay, and distance prediction accuracy, thereby providing concrete evidence of the PBSW system's efficiency and reliability.
  8. For a fair comparison, the authors should also compare their study with the most recent works with the same objective.
  9. Discuss the practical implications and potential real-world applications of the PBSW system, underscoring its usability across different scenarios and its profound ability to reduce accidents and significantly enhance motorcycle safety.
  10. List of references should be improved with recent work.

 

English can be improved 

Author Response

Point 1: The authors have proposed a model, “An Effective YOLO-based Proactive Blind Spot Warning System for Motorcycles.” I have the following major comments for this manuscript. The keywords fail to highlight the main contribution of this paper.

Response 1: Thank you for this comment. We have revised this manuscript and added a subsection to present the on-road evaluation of the implemented system. Hence, the keywords have been modified to “proactive warning system; blind spot; area of the inner wheel difference; motorcycle; real-time; Raspberry Pi; dual-lens camera; Android App; YOLO; distance estimation” to highlight the main contribution of this paper.

Point 2: Related work should be better motivated with the issues and problems associated with existing approaches, and with how the proposed approach advances beyond existing models.

Response 2: Thank you for this comment. In Section 1, we have clarified the contributions of this study. Moreover, motorcycle riders who use this proactive blind spot warning system will receive an alarm when they enter the blind spot of the bus, shown in Figure 1, or the area of the difference of radius between inner wheels, shown in Figure 2.

Point 3: Clearly highlight the unique advantages of the proposed proactive bus blind spot warning (PBSW) system, emphasizing its real-time functionality and significant impact on enhancing motorcyclist safety.

Response 3: Thank you for this comment. We have highlighted the unique advantages of the proposed proactive bus blind spot warning (PBSW) system in the fourth paragraph of Section 1 and have added the third contribution of this study at the end of Section 1 to show that this system has been tested in real life. A new Section 4.3.1 presents the real-world evaluation of the implemented system. It includes a video clip and five snapshots recorded on the road [19], together with six key metrics, i.e., the average image classification and processing time in the cloud server, the image transmission rate of the Android phone, the round-trip delay between the Android phone and the cloud server, the YOLOv4 classification accuracy, and the error rate of the estimated distance, to emphasize its real-time functionality and significant impact on enhancing motorcyclist safety.

Point 4: Provide a detailed and concise explanation of the technical implementation process, specifically outlining the seamless transmission of images from the Raspberry Pi board to the Android phone and subsequently to the cloud server, ensuring clarity and understanding for readers.

Response 4: Thank you for this comment. In Section 4.1, we present the hardware and software of the implemented PBSW system. This system uses a Raspberry Pi 3B+ board as the main body, connecting two USB cameras to obtain road images with a resolution of 640×480 at 7 frames per second (fps), and a Wi-Fi module to transmit images to the Android phone using a connectionless UDP socket. The Android phone also creates a connectionless UDP socket to transmit the captured images to the cloud server through the mobile network for training the YOLOv4 model.

Point 5: Thoroughly address the performance and reliability of the YOLOv4 image recognition model in accurately detecting the bus's rear-view mirror, while also acknowledging any potential limitations or challenges encountered during its implementation.

Response 5: Thank you for this comment. Section 4.3.4 presents the YOLOv4 classification results of the PBSW system. The YOLOv4 classification results, i.e., Precision, Recall and Average Precision (AP), are shown in Table 3, with TP=392, FP=24 and FN=58. Overall, the AP is 92.82% and the F1-Score is 0.91 in this real-world evaluation. Based on these results, this PBSW system can accurately identify the rear-view mirror when the motorcycle moves into the proximity of the bus. Consequently, this PBSW system works well in real life.
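As a sanity check, the Precision, Recall and F1-Score reported above follow directly from the stated counts using the standard definitions (the AP of 92.82% cannot be recomputed from these counts alone, since it requires the full precision-recall curve). A minimal Python verification:

```python
# Verify the reported YOLOv4 metrics from the raw counts in the response
# (TP=392, FP=24, FN=58), using the standard definitions.
tp, fp, fn = 392, 24, 58

precision = tp / (tp + fp)  # 392/416 ~= 0.9423
recall = tp / (tp + fn)     # 392/450 ~= 0.8711
f1 = 2 * precision * recall / (precision + recall)  # ~0.905 -> 0.91

print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1-Score:  {f1:.2f}")
```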

Point 6: Elaborate on the methodology and algorithms utilized for estimating the precise distance between the motorcycle and the bus, showcasing the effectiveness of the lens imaging principle and its ability to deliver accurate distance calculations.

Response 6: Thank you for this comment. Section 3.3 presents the methodology of the blind spot warning process of the PBSW system. Step 6 of the complete blind spot warning process explains the algorithm used for estimating the precise distance between the motorcycle and the bus, according to the principle of lens imaging [18]. Figure 22 shows the error rate at each location beside the bus in the 5 m × 20 m area, which is 1 m behind the ocular points of the bus driver. These experimental data show that the proposed distance estimation approach in the PBSW system achieves different error rates for different locations of the motorcycle. The maximum error rate of the dual-lens ranging method used in this PBSW system is 0.20%, showcasing the effectiveness of the lens imaging principle and its ability to deliver accurate distance calculations.
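For illustration, dual-lens ranging of the kind described above reduces to classic stereo triangulation under the pinhole lens-imaging principle: depth Z = f·B/d for focal length f, lens baseline B and disparity d. The following sketch uses invented numbers, not calibration values from the paper.

```python
# Minimal sketch of dual-lens (stereo) distance estimation under the
# pinhole lens-imaging principle. All numbers below are illustrative
# assumptions, not calibration values from the PBSW system.

def stereo_distance(focal_px: float, baseline_m: float,
                    x_left_px: float, x_right_px: float) -> float:
    """Depth Z = f * B / d, where d is the horizontal disparity of the
    same feature (e.g., a matched SURF point on the rear-view mirror)
    between the left and right images."""
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("feature must appear right-shifted in the left image")
    return focal_px * baseline_m / disparity

# Example: 700 px focal length, 12 cm baseline, 14 px disparity
# -> estimated distance of 6.0 m to the rear-view mirror.
print(stereo_distance(700.0, 0.12, 320.0, 306.0))
```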

Point 7: Present the results of the performance evaluation in a clear and concise manner, including key metrics such as image classification speed, round-trip delay, and distance prediction accuracy, thereby providing concrete evidence of the PBSW system's efficiency and reliability.

Response 7: Thank you for this comment. A new Section 4.3.1 has been written for the real-world evaluation of the implemented system. It includes a video clip and five snapshots recorded on the road [19], together with six key metrics, i.e., the average image classification and processing time in the cloud server, the image transmission rate of the Android phone, the round-trip delay between the Android phone and the cloud server, the YOLOv4 classification accuracy, and the error rate of the estimated distance, to provide concrete evidence of the PBSW system's efficiency and reliability.

Point 8: For a fair comparison, the authors should also compare their study with the most recent works with the same objective.

Response 8: Thank you for this comment. We have presented the experimental results in Tables 1-3. Among the improved results, the average one-way transmission delay between the Android phone and the cloud server is 0.245 sec.

Point 9: Discuss the practical implications and potential real-world applications of the PBSW system, underscoring its usability across different scenarios and its profound ability to reduce accidents and significantly enhance motorcycle safety.

Response 9: Thank you for this comment. A new Section 4.3.1 has been written for the real-world evaluation of the implemented system and the results of six performance metrics in our test scenario, demonstrating its profound ability to reduce accidents and significantly enhance motorcycle safety. However, there are many different scenarios on the road, and testing this system across them requires significant time. We will continue working on it to collect data that demonstrates the usability of this PBSW system across different scenarios.

Point 10: List of references should be improved with recent work.

Response 10: Thank you for this comment. In the Related Work section, we add four paragraphs to discuss the operations and defects of traditional blind spot detection systems [8-12], which strengthens the motivation for this proactive PBSW system. In Section 2, Refs. [14]-[18] and [20]-[32] are added to discuss the recent development of YOLOv4. Section 2.5 and Figure 4 are added to discuss related work [16, 17] on the Inner Wheel Difference Calculation.

Round 2

Reviewer 2 Report

This is a manuscript after revision. I am grateful for the authors' responses, which have addressed all my concerns. I do not have any extra concerns.

Good

Reviewer 3 Report

The original paper was improved.

 

Still minor problems, but they should be solved during the publishing phase.

 

Reviewer 4 Report

The authors have thoroughly revised the manuscript according to the comments raised in the previous review; therefore, the quality of the manuscript seems better to me. I think this paper should be published.

Minor check is required
