Article
Peer-Review Record

An Efficient Motion Registration Method Based on Self-Coordination and Self-Referential Normalization

Electronics 2022, 11(19), 3051; https://doi.org/10.3390/electronics11193051
by Yuhao Ren 1,2, Bochao Zhang 1,2, Jing Chen 1,2, Liquan Guo 1,2,* and Jiping Wang 1,*
Reviewer 1: Anonymous
Submission received: 29 August 2022 / Revised: 20 September 2022 / Accepted: 22 September 2022 / Published: 24 September 2022
(This article belongs to the Collection Computer Vision and Pattern Recognition Techniques)

Round 1

Reviewer 1 Report

The manuscript proposes a neural network-based approach to human activity classification, applied to a set of features derived from human body self-coordinates by means of self-referential normalization. In terms of topic, the manuscript is suitable for the Journal. In terms of research design and presentation, it suffers from certain drawbacks. I suggest that the authors consider addressing the following remarks.

Remarks:

1. The authors announce two novel aspects of their approach. The first aspect concerns feature extraction rather than the methodology, which is based on predefined neural networks. In the authors’ words: “this special action feature fusion method is another innovation of this paper” (p. 5, l. 232). The authors should clearly explain the novelty in the feature extraction. While the self-coordinates and self-referential normalization can be discussed as novel, the temporal ordering is certainly not novel. Some recent literature on self-referential normalization in the context of human activity analysis should be added to the reference list.

2. Another announced novel aspect is the claim that the proposed feature extraction process allows for smaller datasets and simpler neural networks without loss of performance. In the authors’ words: “No more data, more features or more complex networks are needed, which is one of the innovations of this paper” (p. 3, l. 135-136). This statement is not fully supported in the manuscript. The reported study only demonstrates that the self-coordinates and self-referential normalization improve performance when applied to the considered feature set. It does not consider the following questions:
(i) Would a larger dataset underpinning the same methodology improve the performance of the system?
(ii) Would more complex neural networks applied to the same datasets improve the performance of the system?
(iii) How does the proposed approach compare with other approaches in the literature?
The authors should consider at least some of these questions to support the stated novelty aspects.

3. The discrimination capacity of the proposed approach is not fully demonstrated. The proposed approach was evaluated on the classification of fundamental human behaviors with relatively low mutual similarity (i.e., walking, boxing, handclapping, handwaving and jogging). However, the manuscript does not consider how the proposed approach would perform on task-oriented human behaviors with significant inter-class similarity. The authors are advised to evaluate their approach on a dataset containing recordings of task-oriented human activities (e.g., the Carnegie Mellon University Multimodal Activity Database, or any other appropriate dataset).

Additional remarks:

4. It is not clear why Action Quality Assessment (AQA) is relevant to this study and why it is mentioned in the abstract and the introduction.

5. Some unnecessary hyphenation: “hu-man” (p. 1, l. 33), “im-proving” (p. 2, l. 59), “maxi-mum” (p. 5, l. 203), “it-self” (p. 5, l. 209), etc.

Author Response

Response to Reviewer 1 Comments

Thank you very much for your valuable comments; they have helped me a lot. Below are the changes I made in response to each of your comments, point by point.

1. The authors announce two novel aspects of their approach. The first aspect concerns feature extraction rather than the methodology, which is based on predefined neural networks. In the authors’ words: “this special action feature fusion method is another innovation of this paper” (p. 5, l. 232). The authors should clearly explain the novelty in the feature extraction. While the self-coordinates and self-referential normalization can be discussed as novel, the temporal ordering is certainly not novel. Some recent literature on self-referential normalization in the context of human activity analysis should be added to the reference list.

Response 1: Thank you very much for your reminder. I apologize for not describing the innovation of the article clearly; the introduction was also somewhat inconsistent with the content of the article. After considering your comments, I found my mistake and made adjustments. The innovation of this article is not feature fusion or feature extraction, but the use of self-coordinates and self-referential normalization to eliminate the inevitable errors in motion registration during AQA. This is a new direction for improving AQA accuracy, distinct from the traditional methods.

In the introduction, the background of the relevant issues in AQA is presented. With the same goal of improving accuracy, researchers have worked in different directions: improving the existing network structure, increasing the amount of data, or increasing the number of features. A new direction is then introduced: improving accuracy by eliminating the errors caused by differences in human body size and by unnecessary displacements during AQA. This is the central idea behind the introduction and related work.
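The general idea described in this response, expressing each skeleton relative to the body itself and scaling it by a body measurement so that displacement and body size drop out, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the mid-hip origin, the neck-to-mid-hip distance as reference length, and the joint indices `NECK` and `MID_HIP` (borrowed from the OpenPose BODY_25 layout) are hypothetical assumptions, and the keypoint layout produced by 'light-openpose' is not stated in this record.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical joint indices (OpenPose BODY_25 layout); the layout used by
# the skeleton extractor in the paper is not stated and may differ.
NECK = 1
MID_HIP = 8


def self_normalize(
    keypoints: Sequence[Point],
    origin: int = MID_HIP,
    ref_a: int = NECK,
    ref_b: int = MID_HIP,
) -> List[Point]:
    """Map 2D keypoints into a body-centred, body-scaled frame (assumed form).

    Self-coordinates: subtract the origin joint, so the result no longer
    depends on where the person stands in the image (displacement).
    Self-referential normalization: divide by the distance between two
    reference joints, so the result no longer depends on apparent body size.
    """
    ox, oy = keypoints[origin]
    ax, ay = keypoints[ref_a]
    bx, by = keypoints[ref_b]
    scale = math.hypot(ax - bx, ay - by)
    if scale == 0:
        raise ValueError("reference joints coincide; cannot normalize")
    return [((x - ox) / scale, (y - oy) / scale) for x, y in keypoints]


if __name__ == "__main__":
    # A toy 9-joint pose (only indices 1 and 8 carry meaning here).
    pose = [(10, 0), (10, 20), (-10, 20), (-20, 45), (-25, 70),
            (30, 20), (40, 45), (45, 70), (10, 70)]
    # The same pose, twice as large and shifted within the frame.
    moved = [(2 * x + 100, 2 * y + 50) for x, y in pose]
    print(self_normalize(pose) == self_normalize(moved))  # True
```

Because `pose` and `moved` differ only by a translation and a uniform scale, both map to the same normalized coordinates. Whether the paper uses the same origin joint and reference length is not stated in this record.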

2. Another announced novel aspect is the claim that the proposed feature extraction process allows for smaller datasets and simpler neural networks without loss of performance. In the authors’ words: “No more data, more features or more complex networks are needed, which is one of the innovations of this paper” (p. 3, l. 135-136). This statement is not fully supported in the manuscript. The reported study only demonstrates that the self-coordinates and self-referential normalization improve performance when applied to the considered feature set. It does not consider the following questions:
(i) Would a larger dataset underpinning the same methodology improve the performance of the system?
(ii) Would more complex neural networks applied to the same datasets improve the performance of the system?
(iii) How does the proposed approach compare with other approaches in the literature?
The authors should consider at least some of these questions to support the stated novelty aspects.

Response 2: Thank you very much for your suggestion. Indeed, as you said, such a conclusion cannot be drawn from the available experimental results. The statement was not rigorous, and I apologize for this mistake. Because this conclusion was not the focus of the article, I have removed it and revised the entire article accordingly.

3. The discrimination capacity of the proposed approach is not fully demonstrated. The proposed approach was evaluated on the classification of fundamental human behaviors with relatively low mutual similarity (i.e., walking, boxing, handclapping, handwaving and jogging). However, the manuscript does not consider how the proposed approach would perform on task-oriented human behaviors with significant inter-class similarity. The authors are advised to evaluate their approach on a dataset containing recordings of task-oriented human activities (e.g., the Carnegie Mellon University Multimodal Activity Database, or any other appropriate dataset).

Response 3: I strongly agree with this suggestion. However, the purpose of our experiment is to demonstrate that our method further improves the classification accuracy of existing classification networks. That is, our method does not by itself determine the classification accuracy; what it can do is raise the original upper limit of the classification network. Even a small improvement is meaningful. The experimental results show that our method significantly improves the performance of both classification networks, which I believe demonstrates its effectiveness. I apologize for not expressing this clearly and have revised the text accordingly.

4. It is not clear why Action Quality Assessment (AQA) is relevant to this study and why it is mentioned in the abstract and the introduction.

Response 4: Perhaps I did not express myself clearly. Our method aims to eliminate the effects of body size differences and unnecessary displacement during AQA. In response, I have adjusted the abstract and the introduction.

5. Some unnecessary hyphenation: “hu-man” (p. 1, l. 33), “im-proving” (p. 2, l. 59), “maxi-mum” (p. 5, l. 203), “it-self” (p. 5, l. 209), etc.

Response 5: I apologize for this careless error. I have removed the unnecessary hyphens throughout the article.

Author Response File: Author Response.docx

Reviewer 2 Report

The proposed approach has sufficient methodological novelty. A minor revision is needed in terms of technical details. Some comments are given below.

1. It is suggested that the authors discuss the runtime of the retrieval process. (Comparison results are not needed.)

2. Discuss the skeleton extraction process in more detail.

3. Is "accuracy" an action category? What is the meaning of "accuracy" in Tables 5, 6 and 7? Did you mean "average"?

Author Response

Response to Reviewer 2 Comments

Thank you very much for your valuable comments; they have helped me a lot. Below are the changes I made in response to each of your comments, point by point.

1. It is suggested that the authors discuss the runtime of the retrieval process. (Comparison results are not needed.)

Response 1: On page 6, lines 240-243, I added the average time required to fuse the features for each action: 12.08 ms on an AMD R7 CPU, which is less than the running time of the network itself. This indicates that the additional time cost introduced by our method is acceptable, while the improvement in classification accuracy that our method brings to the classification network is significant.

2. Discuss the skeleton extraction process in more detail.

Response 2: On page 6, line 216, I added a skeleton extraction section (3.3 Skeleton extraction). It briefly introduces the model we use for skeleton extraction, 'light-openpose', a lightweight skeleton extraction model developed by members of our group on top of the existing open-source model 'openpose'. With this model, the key points of the human skeleton can be obtained easily and quickly.

3. Is "accuracy" an action category? What is the meaning of "accuracy" in Tables 5, 6 and 7? Did you mean "average"?

Response 3: I apologize for the unclear wording. "Accuracy" is not an action category; as you said, it refers to the average accuracy of the action classification. I have corrected the description on page 7, line 281 and page 8, line 324. Thank you for your guidance.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors have adequately responded to the remarks from my previous review report and I believe the revised manuscript has been sufficiently improved.

Remark:

1. In several places in the manuscript, the authors state that the proposed approach “eliminates” errors in standard AQA approaches (cf. p. 1, l. 20; p. 2, l. 60 and 62, 71 and 73; p. 3, l. 147; p. 5, l. 195; p. 6, l. 199 and 211; p. 12, l. 415). The study demonstrates an improvement in performance, which indicates that the errors occurring in standard AQA approaches have been reduced, but whether these errors were completely eliminated was not examined. Thus, the authors are advised to consider replacing the verb “eliminate” in this context with a more appropriate verb, e.g., “reduce” (which the authors used on p. 1, l. 49).

Author Response

Response to Reviewer 1 Comments

Thank you very much for your valuable comments; they have helped me a lot. Below are the changes I made in response to your comment.

1. In several places in the manuscript, the authors state that the proposed approach “eliminates” errors in standard AQA approaches (cf. p. 1, l. 20; p. 2, l. 60 and 62, 71 and 73; p. 3, l. 147; p. 5, l. 195; p. 6, l. 199 and 211; p. 12, l. 415). The study demonstrates an improvement in performance, which indicates that the errors occurring in standard AQA approaches have been reduced, but whether these errors were completely eliminated was not examined. Thus, the authors are advised to consider replacing the verb “eliminate” in this context with a more appropriate verb, e.g., “reduce” (which the authors used on p. 1, l. 49).

Response 1: 

Thank you very much for checking my manuscript so carefully and finding this error. I completely agree. As you said, this study did not examine whether the errors mentioned in the article were completely eliminated, so I should have used 'reduce' instead of 'eliminate'. I have therefore changed every 'eliminate' to 'reduce' in the article (cf. p. 1, l. 20; p. 2, l. 52, 53, 55, 58 and 60; p. 5, l. 165, 169 and 181; p. 9, l. 320; p. 11, l. 362 and 372). Thanks again for your guidance!

Author Response File: Author Response.docx
