Article
Peer-Review Record

A Hybrid Attention Network for Malware Detection Based on Multi-Feature Aligned and Fusion

Electronics 2023, 12(3), 713; https://doi.org/10.3390/electronics12030713
by Xing Yang, Denghui Yang and Yizhou Li *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 22 December 2022 / Revised: 19 January 2023 / Accepted: 20 January 2023 / Published: 1 February 2023
(This article belongs to the Special Issue Big Data Analytics and Artificial Intelligence in Electronics)

Round 1

Reviewer 1 Report

In this paper, the authors propose a hybrid attention model for malware detection that integrates binary files and assembly code. The following suggestions and remarks should be addressed to improve the paper's quality:

1) English proofreading is required, as many phrases contain language errors.

2) Please check and correct the in-text citations, since many were not written properly; for example, the citation at line 439, and some citations placed after the ends of sentences.

3) In Section 2, Related Work:

- In malware detection, many previous works applied machine learning using static, dynamic, and hybrid features, so the authors are expected to discuss more previous works, especially those that applied fusion methods. Furthermore, at the end of this section, the authors should explain how the proposed work differs from the previous works mentioned there.

4) I found many mistakes in the numbering of figures and tables, both in captions and in the text. Furthermore, please rewrite the captions of figures and tables more clearly.

5) In order to show the performance improvement of the proposed fusion method, I suggest also adding comparison results using the binary file and assembly code separately, before the proposed fusion and then after it.

 

6) I believe the proposed fusion method is time-consuming, as it is based on CNNs and deep learning. I therefore suggest that the limitations of the proposed method be discussed in Section 6, Conclusion.

Author Response

We are very grateful for the opportunity to revise our manuscript, and we appreciate your positive and constructive comments and suggestions on our manuscript entitled "A hybrid attention network for malware detection based on multi-feature aligned and fusion" (ID: electronics-2146166).

We have studied the reviewers' comments carefully and revised our manuscript point by point. The responses and the revisions we have made in reply to the reviewers' questions and suggestions are given below on an item-by-item basis. Thanks again for the hard work of the editor and reviewers!

Q.1: English proofreading is required, as many phrases contain language errors.

Response: We double-checked our manuscript and carefully revised it according to your suggestions. In addition, we asked an English-language professional to polish the manuscript.

 

Q.2: Please check and correct the in-text citations, since many were not written properly; for example, the citation at line 439, and some citations placed after the ends of sentences.

Response: Thanks for the reminder; we have carefully checked and corrected the citations in our manuscript.

 

Q.3: In Section 2, Related Work:

- In malware detection, many previous works applied machine learning using static, dynamic, and hybrid features, so the authors are expected to discuss more previous works, especially those that applied fusion methods. Furthermore, at the end of this section, the authors should explain how the proposed work differs from the previous works mentioned there.

Response: Following your comment, we added a table in Section 2 summarizing the related work, covering the feature-extraction strategies and classification methods of static, dynamic, and hybrid-feature malware detection in previous studies. Furthermore, our method performs automatic feature extraction rather than manual extraction, and, considering both the local and global information of a sample, we focus on approaches for multi-feature fusion. We added an explanation of the differences between the proposed work and previous works at the end of Section 2.

 

Q.4: I found many mistakes in the numbering of figures and tables, both in captions and in the text. Furthermore, please rewrite the captions of figures and tables more clearly.

Response: We apologize for our negligence in mislabeling figures and tables; we have carefully corrected the figures and tables in the manuscript according to the format required by the journal.

 

Q.5: In order to show the performance improvement of the proposed fusion method, I suggest also adding comparison results using the binary file and assembly code separately, before the proposed fusion and then after it.

Response: Thanks for your suggestion. We built models with binary sequences and with opcodes separately, and added two more experiments comparing them against our fusion method. The results show that the proposed fusion method indeed improves performance. As a supplement, we added a description of these ablation studies in Section 5.1.
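As a rough illustration of the kind of two-branch fusion being compared in such an ablation, cross attention between a binary-sequence branch and an opcode branch can be sketched as follows. This is a minimal NumPy sketch with toy shapes; the function, dimensions, and single-head attention form are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def cross_attention(q_feats, kv_feats):
    """Fuse two feature branches: queries from one modality attend over the other."""
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)      # (Lq, Lk) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the other branch
    return weights @ kv_feats                       # (Lq, d) fused representation

rng = np.random.default_rng(0)
binary_feats = rng.standard_normal((8, 16))   # toy binary-sequence branch features
opcode_feats = rng.standard_normal((12, 16))  # toy opcode branch features
fused = cross_attention(binary_feats, opcode_feats)
print(fused.shape)  # (8, 16)
```

In an ablation like the one described, each branch would first be scored alone and then again with a fusion step of this kind, so that any accuracy gain can be attributed to the fusion.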

 

Q.6: I believe the proposed fusion method is time-consuming, as it is based on CNNs and deep learning. I therefore suggest that the limitations of the proposed method be discussed in Section 6, Conclusion.

Response: In response to your inquiry, we have provided information about the time consumption of model training and malware detection in Section 4.4. Because several attention modules are applied for feature fusion, our model needs more training time than other deep-learning-based models: training took about 25 hours, while detecting one piece of data took only 0.23 seconds, which is expected to meet production requirements. In Section 5.3, we added a discussion of our method's limitations in terms of time consumption.

Reviewer 2 Report

A multi-feature extraction and fusion method is proposed in this paper to detect malware variants. The proposed method features (1) a stacked convolutional network to capture the temporal information and discontinuities in function calls, (2) triangular attention to extract code-level features from assembly code, and (3) cross attention to enhance the stability of the feature representation. The proposed method is compared with a baseline deep learning model and five malware detection models on the Kaggle Malware Classification dataset and a large real-world dataset. This paper is generally well prepared, with sufficient contributions for a journal publication. The reviewer's comments are primarily minor, requesting additional information and flagging formatting issues, as follows.

- Introduction: Please briefly explain malware, the different types of malware, and how malware differs from other cyberattacks.

- I would suggest having a table summary for Section 2.

- Some abbreviations are not explained. Although they are commonly used abbreviations in this field, they should be explained to readers who are not experts in the field, for example, MD5, SHA1, API, IDA, 1D CNN, LSTM, AVG, and others. Check the entire draft.

- Section 3 is good except for numbering-sequence errors in the figures and tables. In Figure 1, "Nerual network" is a typo for "Neural network". Check the draft for other typos. Line 199 should refer to Fig 2. Please update the subsequent figure caption numbers and in-text figure citations.

- Line 315 should refer to Table 2. It mentions the dimension change of each layer. What triggers this dimension change, and does it change on every run? Please explain or add more details.

- Ri in Eq. 15 and ri in Eq. 2 are similar, although they appear in different contexts. Please consider changing the notation.

- Please explain the comparison methods in Table 2 (line 375) and Table 3 (line 415). What are their key advantages, and why were they selected to be compared with the proposed method? Additionally, Tables 2 and 3 do not compare the processing time of the proposed method. Please include some information about the processing time. Is processing time sacrificed for accuracy and precision?

- Section 5.3: please also identify the disadvantages of the proposed method.

- Formatting

o Check references and citations. Line 81 should be [10-12]. Please edit the rest of the paper.

o Check bullet-point errors, for example, at lines 94, 97, and 100.

o Inconsistent capitalization in section and subsection headings, in figures, and in tables; for example, Sections 2.1 and 2.2, and text in tables.

Author Response

Dear Editor and Reviewers:

We are very grateful for the opportunity to revise our manuscript, and we appreciate your positive and constructive comments and suggestions on our manuscript entitled "A hybrid attention network for malware detection based on multi-feature aligned and fusion" (ID: electronics-2146166).

We have studied the reviewers' comments carefully and revised our manuscript point by point. The responses and the revisions we have made in reply to the reviewers' questions and suggestions are given below on an item-by-item basis. Thanks again for the hard work of the editor and reviewers!

 

Q.1: Introduction: Please briefly explain malware, the different types of malware, and how malware differs from other cyberattacks.

Response: As the reviewer suggested, we added a definition of malware and a description of the difference between malware and other cyberattacks at the beginning of the second paragraph of Section 1. Furthermore, we listed prevalent malware families in the third paragraph.

 

Q.2: I would suggest having a table summary for Section 2.

Response: Considering the reviewer's suggestion, we added a table at the end of Section 2 that summarizes the feature-extraction strategies and classification methods of static, dynamic, and hybrid-feature malware detection in previous studies.

 

Q.3: Some abbreviations are not explained. Although they are commonly used abbreviations in this field, they should be explained to readers who are not experts in the field, for example, MD5, SHA1, API, IDA, 1D CNN, LSTM, AVG, and others. Check the entire draft.

Response: Thanks for the reminder; we have added the complete expansion of each abbreviation where it first appears in the manuscript.

 

Q.4: Section 3 is good except for numbering-sequence errors in the figures and tables. In Figure 1, "Nerual network" is a typo for "Neural network". Check the draft for other typos. Line 199 should refer to Fig 2. Please update the subsequent figure caption numbers and in-text figure citations.

Response: Thanks for catching this mistake; we double-checked and corrected the numbering-sequence errors in the figures and tables, and corrected the spelling mistake in the figures.

 

Q.5: Line 315 should refer to Table 2. It mentions the dimension change of each layer. What triggers this dimension change, and does it change on every run? Please explain or add more details.

Response: We are sorry that we may not have expressed this clearly. In the table, the dimension transformation is determined by the modules of the network: a sample with a length of 1000 characters is fed into the network to show how the dimensions change after each module. When the shape of the input changes, the dimensions after convolution, pooling, and other operations differ accordingly. To avoid confusion, we changed the title of the table to "The dimension change of each module in the detection model".
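For readers tracing such a table, the per-module dimension bookkeeping follows the standard 1-D convolution and pooling output-length formulas, which can be sketched as follows. The kernel sizes and padding here are illustrative assumptions, not the paper's actual configuration:

```python
def conv1d_len(length, kernel, stride=1, pad=0):
    # Standard output-length formula for a 1-D convolution.
    return (length + 2 * pad - kernel) // stride + 1

def pool1d_len(length, kernel, stride=None):
    # Pooling uses the same formula, with stride defaulting to the kernel size.
    stride = stride or kernel
    return (length - kernel) // stride + 1

length = 1000                                     # input length used in the example
after_conv = conv1d_len(length, kernel=3, pad=1)  # "same" padding keeps the length
after_pool = pool1d_len(after_conv, kernel=2)     # pooling halves the length
print(after_conv, after_pool)  # 1000 500
```

With a fixed input length, the sequence of shapes is fully determined by the module hyperparameters, which is why the dimensions do not change from run to run.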

 

Q.6: Ri in Eq. 15 and ri in Eq. 2 are similar, although they appear in different contexts. Please consider changing the notation.

Response: Thanks for the reminder; we have changed the abbreviation of Recall in Eq. 15 to

 

Q.7: Please explain the comparison methods in Table 2 (line 375) and Table 3 (line 415). What are their key advantages, and why were they selected to be compared with the proposed method? Additionally, Tables 2 and 3 do not compare the processing time of the proposed method. Please include some information about the processing time. Is processing time sacrificed for accuracy and precision?

Response: (1) The comparison methods in the manuscript were selected mainly for the following reasons. First, we chose the classic model MalConv, which is widely acknowledged and has served as the foundation for numerous subsequent studies in this field. Second, we chose two gray-image-based approaches, which have received much attention in malware detection recently. Finally, we compared our method with notable studies on multi-feature fusion.

(2) We described the time consumption of model training and malware detection in Section 4.4. Training the model took around 25 hours, and detecting one piece of data took 0.23 seconds. Compared with other deep-learning-based models, ours takes more time to train; however, the detection time remains well below one second.

 

 

Q.8: Section 5.3: please also identify the disadvantages of the proposed method.

Response: Thanks for the valuable comment; we added some descriptions of our model's disadvantages in Section 5.3. As indicated in Section 4.4, our model requires 25 hours for training, although it achieves higher accuracy than previous methods. In the future, we will try to optimize the model's algorithm to reduce its computational cost.

 

Q.9: Formatting

o Check references and citations. Line 81 should be [10-12]. Please edit the rest of the paper.

o Check bullet-point errors, for example, at lines 94, 97, and 100.

o Inconsistent capitalization in section and subsection headings, in figures, and in tables; for example, Sections 2.1 and 2.2, and text in tables.

Response: We apologize for the formatting errors in our manuscript. We have reviewed and rectified the citations, removed the spaces before lines 94, 97, and 100, and fixed the inconsistent capitalization of sections, subsections, figures, and tables in the manuscript.

Round 2

Reviewer 1 Report

The authors have to cite the related works listed in Table 1 (the summary of related works). I think the revised paper looks good.

Author Response

Q.1: The authors have to cite the related works listed in Table 1 (the summary of related works). I think the revised paper looks good.

Response: Following your comment, we have cited the related works listed in Table 1.
