Article
Peer-Review Record

Reducing False Negatives in Ransomware Detection: A Critical Evaluation of Machine Learning Algorithms

Appl. Sci. 2022, 12(24), 12941; https://doi.org/10.3390/app122412941
by Robert Bold 1, Haider Al-Khateeb 2,* and Nikolaos Ersotelos 3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 28 October 2022 / Revised: 8 December 2022 / Accepted: 10 December 2022 / Published: 16 December 2022

Round 1

Reviewer 1 Report

I would like to commend the authors for this sound and interesting topic. The tools utilized reflect current trends, with study objectives that are relevant and timely. I only have a few comments on the paper.

1. I believe that the introduction needs to be highlighted separately, since the third paragraph is quite long, encompassing the literature and gap, purpose, and tools. I think, for clarity, these can be separated into two or three paragraphs to avoid confusion.

2. The inclusion of research questions would benefit the paper by indicating the probable output readers can expect upon going through the paper.

3. The methodology may opt to consider pseudocode alongside the calculation explanation so readers may benefit from it, especially those who plan to extend/cite the paper.

4. The evaluation I feel needs proper representation like utilizing the Taylor Diagram to encompass all possible accuracy, standard deviation, RMSEA, and correlation among different machine learning tools. You may want to consider https://doi.org/110.3390/su141811329 for reference.

5. The formatting and references of the paper, together with the grammar and sentence construction should be checked.

I hope these comments would help the authors to have their paper ready for publication. 

Author Response

“I would like to commend the authors for this sound and interesting topic. The tools utilized reflect current trends, with study objectives that are relevant and timely. I only have a few comments on the paper.

  1. I believe that the introduction needs to be highlighted separately, since the third paragraph is quite long, encompassing the literature and gap, purpose, and tools. I think, for clarity, these can be separated into two or three paragraphs to avoid confusion.”

The authors thank the reviewer for their valuable comments to improve the current state of the paper. Please note the following revision in response to the review:

The introduction has been revised to split the long paragraph into separate paragraphs, each focusing on one of the following topics:

  • the ransomware-as-a-service phenomenon
  • the spread of ransomware via phishing emails
  • other delivery methods, such as unpatched vulnerabilities
  • the challenges posed by new malware trends
  • finally, the opportunities and challenges offered by ML.

“2. The inclusion of research questions would benefit the paper by indicating the probable output readers can expect upon going through the paper.”

Two key research questions have been included in the introduction.

“3. The methodology may opt to consider pseudocode alongside the calculation explanation so readers may benefit from it, especially those who plan to extend/cite the paper.”

For further clarity, sample code has been included, e.g., showing how to aggregate the API calls made by the processes spawned by each sample file run within our experiment, with the results output to a CSV file.
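As an illustration of what such aggregation code might look like, here is a minimal sketch. The log layout (tuples of sample ID, process ID, API name), the example API names, and the function names are assumptions for illustration only, not the authors' actual implementation:

```python
import csv
from collections import Counter

def aggregate_api_calls(log_rows):
    """Aggregate API-call counts per sample across every process it spawned.

    `log_rows` is an iterable of (sample_id, process_id, api_name) tuples;
    this layout is assumed for illustration, not taken from the paper.
    """
    counts = {}
    for sample_id, _pid, api_name in log_rows:
        counts.setdefault(sample_id, Counter())[api_name] += 1
    return counts

def feature_rows(counts):
    """Yield CSV rows: a header of API names, then one count row per sample."""
    apis = sorted({api for c in counts.values() for api in c})
    yield ["sample"] + apis
    for sample_id, c in sorted(counts.items()):
        yield [sample_id] + [c.get(api, 0) for api in apis]

# Hypothetical monitoring log: two processes spawned by sample "s1".
log = [
    ("s1", 1001, "CreateFileW"),
    ("s1", 1002, "CreateFileW"),
    ("s1", 1001, "WriteFile"),
    ("s2", 2001, "RegOpenKeyExW"),
]
counts = aggregate_api_calls(log)
with open("api_features.csv", "w", newline="") as f:
    csv.writer(f).writerows(feature_rows(counts))
```

Aggregating per sample (rather than per process) is what makes the counts usable as a single feature vector per file, regardless of how many child processes the sample spawned.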

“4. The evaluation I feel needs proper representation like utilizing the Taylor Diagram to encompass all possible accuracy, standard deviation, RMSEA, and correlation among different machine learning tools. You may want to consider https://doi.org/110.3390/su141811329 for reference.”

Thank you for your constructive feedback. The link https://doi.org/110.3390/su141811329 was not accessible online (we received a “DOI Prefix [110.3390] Not Found” error). However, we appreciate the feedback and confirm there can be several metrics for comparing ML algorithms. In our study we have:

  • Discussed and utilised the contextual evaluation metrics used in related work, as explained in Section 4.1
  • Discussed and utilised novel metrics proposed in the literature, such as the Positive Likelihood Ratio, Negative Likelihood Ratio, Diagnostic Odds Ratio, Youden's index, Number Needed to Diagnose, Number Needed to Misdiagnose, and Net Benefit.
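All of these diagnostic metrics can be derived from the four confusion-matrix counts. The following is an illustrative sketch only (the function name and the default Net Benefit threshold are assumptions, not the authors' code):

```python
def diagnostic_metrics(tp, fn, tn, fp, threshold=0.5):
    """Compute diagnostic evaluation metrics from confusion-matrix counts.

    `threshold` is the probability threshold p_t used for Net Benefit.
    """
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)               # true positive rate
    specificity = tn / (tn + fp)               # true negative rate
    accuracy = (tp + tn) / n
    lr_pos = sensitivity / (1 - specificity)   # Positive Likelihood Ratio
    lr_neg = (1 - sensitivity) / specificity   # Negative Likelihood Ratio
    dor = lr_pos / lr_neg                      # Diagnostic Odds Ratio
    youden = sensitivity + specificity - 1     # Youden's index (J)
    nnd = 1 / youden                           # Number Needed to Diagnose
    nnm = 1 / (1 - accuracy)                   # Number Needed to Misdiagnose
    # Net Benefit at threshold p_t: TP/n - FP/n * (p_t / (1 - p_t))
    net_benefit = tp / n - (fp / n) * (threshold / (1 - threshold))
    return {
        "lr_pos": lr_pos, "lr_neg": lr_neg, "dor": dor, "youden": youden,
        "nnd": nnd, "nnm": nnm, "net_benefit": net_benefit,
    }
```

For example, a classifier with TP=90, FN=10, TN=80, FP=20 has sensitivity 0.9 and specificity 0.8, giving LR+ of 4.5, LR− of 0.125, a Diagnostic Odds Ratio of 36, and Youden's J of 0.7.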

“5. The formatting and references of the paper, together with the grammar and sentence construction should be checked.

I hope these comments would help the authors to have their paper ready for publication.”

We have revisited the manuscript for additional proof-reading.

Reviewer 2 Report

This paper presents an empirical comparison of the performance of ML algorithms for the task of detecting ransomware, by monitoring API calls. Special focus is put on the False Negatives that each algorithm produces, which is a "red flag" for an algorithm performing in such a domain.

Language is good, the existing work that is analyzed is sufficient, and the experimental setup used to obtain the results seems sound and realistic.

However, regarding the design, I would expect to see a more thorough evaluation, e.g. k-fold validation, more datasets, more state-of-the-art classification algorithms (and perhaps all composed in an extensible open-source tool that can be used for evaluating algorithms applied in the security sector).

Minor comments:

Including a future work part might improve the overall presentation of the work.

The third paragraph of the introduction is too long, and perhaps disrupts the focus of the reader. For example, some historical background could have been placed in Section 2. 

l. 94: "has been used": Perhaps this should be rephrased (e.g. is used), because the way it is currently written implies use in other (past) studies. 

l. 124: a missing r in "behaviour"

l. 133: but if this is not certain, then perhaps the following discussion should have been based upon the assumption of a different approach, other than that of Netto et al.

l. 454: The title of 4.2 should change, since "novel" implies that the metrics are introduced in the current paper, but this is not the case, as it is mentioned in l. 455. This is somehow inconsistent.

Table 3: The first row could be omitted, since it contains the same value in all columns.

l. 567-574: I would prefer to see the notes for Table 3 in a paragraph instead of in a numbered list.

l.576-onwards: Quite a few whitespaces are missing.

 

Author Response

“This paper presents an empirical comparison of the performance of ML algorithms for the task of detecting ransomware, by monitoring API calls. Special focus is put on the False Negatives that each algorithm produces, which is a "red flag" for an algorithm performing in such a domain.

Language is good, the existing work that is analyzed is sufficient, and the experimental setup used to obtain the results seems sound and realistic.”

The authors thank the reviewer for their valuable comments to improve the current state of the paper. Please note the following revision in response to the review:

“However, regarding the design, I would expect to see a more thorough evaluation, e.g. k-fold validation, more datasets, more state-of-the-art classification algorithms (and perhaps all composed in an extensible open-source tool that can be used for evaluating algorithms applied in the security sector).”

We do appreciate there can be several metrics for comparing ML algorithms. Integrating all classification algorithms within an open-source tool is out of scope for this paper, but we have considered the suggestion in the following ways to improve this study:

  • We have included a new figure (Figure 6) to illustrate the Net Benefit for a range of probability thresholds.
  • We have fully revised the conclusion section and also included a future work section, in which we cover the opportunity for an extensible open-source tool to be developed.
  • The study discusses and utilises the contextual evaluation metrics used in related work, as explained in Section 4.1
  • Additionally, our paper discusses and utilises novel metrics proposed in the literature, such as the Positive Likelihood Ratio, Negative Likelihood Ratio, Diagnostic Odds Ratio, Youden's index, Number Needed to Diagnose, Number Needed to Misdiagnose, and Net Benefit.

“Minor comments:

Including a future work part might improve the overall presentation of the work.”

A future work section was added to the paper.

“The third paragraph of the introduction is too long, and perhaps disrupts the focus of the reader. For example, some historical background could have been placed in Section 2.”

We have implemented the proposed modification.

“l. 94: "has been used": Perhaps this should be rephrased (e.g. is used), because the way it is currently written implies use in other (past) studies.”

We have implemented the proposed modification.

“l. 124: a missing r in "behaviour”"

We have implemented the proposed modification.

“l. 133: but if this is not certain, then perhaps the following discussion should have been based upon the assumption of a different approach, other than that of Netto et al.”

Thank you for your constructive feedback. The sentence has been revised to avoid confusion. The approach described in the referenced study by Netto et al. (2018) is technically clear to us, and the critical discussion included in our study applies to it because it does not rely solely on the utilisation of the FileSystemWatcher class. However, to further benefit the reader, we mention an example of how a programmer can build such a trigger. Likewise, the buffer-overflow discussion within the same paragraph concerns a potential threat regardless of the class used. Hence, we now state more clearly that the FileSystemWatcher class is an example.

“l. 454: The title of 4.2 should change, since "novel" implies that the metrics are introduced in the current paper, but this is not the case, as it is mentioned in l. 455. This is somehow inconsistent.”

We have implemented the proposed modification.

“Table 3: The first row could be omitted, since it contains the same value in all columns.”

We have implemented the proposed modification.

“l. 567-574: I would prefer to see the notes for Table 3 in a paragraph instead of in a numbered list.”

We have implemented the proposed modification.

“l.576-onwards: Quite a few whitespaces are missing.”

Thank you for your constructive feedback. We have revisited the manuscript for missing whitespaces.

Reviewer 3 Report

The work is related to ransomware, which allows criminals with limited knowledge to launch ransomware attacks on systems. The research work offers a critical literature review, examination, and testing of various state-of-the-art machine learning algorithms and models to detect ransomware. The previous focus was to report precision while overlooking the significance of other important factors in confusion matrices, such as false negatives. Therefore, a critical evaluation of ML models using 800 malware and 800 benign samples is done to mitigate ransomware at different levels of the detection system. Some of the detailed comments and recommended changes are mentioned below; as the paper cannot be accepted in its current form, the changes are highly recommended and would be appreciated.

 

Some of the Comments are mentioned below:

  • The writing of the paper can be improved further to enhance the quality of the draft.
  • A generic model of the ransomware detection strategy or system must be shown using a flowchart or figure, so that readers may understand the overall process clearly.
  • In Heading 3 at line 245, some proposed-method-related detail must be included before directly discussing the host environments used for experiments (first, something related to its design or architecture must be presented, and then the experiment-related material).
  • A description of the proposed methods with some flowchart or algorithm representation is recommended in order to make the research design and methodologies more appropriate. The quality of the presentation will be improved in this manner.
  • Some more detail (please elaborate a bit for better understanding) regarding the employment of specific ML algorithms to solve the malware-related problem is highly recommended.
  • The conclusion can be made a bit brief while discussing the detailed results in the discussion and results portion, finally summarizing them in the conclusion very briefly (right now, there seems to be much more detail in it).
  • More paper referencing from 2021 and 2022 must be included in relevant sections, such as Background/ML Algorithms, and in the text where necessary.

Author Response

“The work is related to ransomware, which allows criminals with limited knowledge to launch ransomware attacks on systems. The research work offers a critical literature review, examination, and testing of various state-of-the-art machine learning algorithms and models to detect ransomware. The previous focus was to report precision while overlooking the significance of other important factors in confusion matrices, such as false negatives. Therefore, a critical evaluation of ML models using 800 malware and 800 benign samples is done to mitigate ransomware at different levels of the detection system. Some of the detailed comments and recommended changes are mentioned below; as the paper cannot be accepted in its current form, the changes are highly recommended and would be appreciated.”

 The authors thank the reviewer for their valuable comments to improve the current state of the paper. Please note the following revision in response to the review: 

“Some of the Comments are mentioned below:

  • The writing of the paper can be improved further to enhance the quality of the draft.”

Thank you for your constructive feedback. We have revisited the manuscript for additional proof-reading and in some cases rewriting sentences and paragraphs as part of the review process.

“• A generic model of the ransomware detection strategy or system must be shown using a flowchart or figure, so that readers may understand the overall process clearly.

  • In Heading 3 at line 245, some proposed-method-related detail must be included before directly discussing the host environments used for experiments (first, something related to its design or architecture must be presented, and then the experiment-related material).
  • A description of the proposed methods with some flowchart or algorithm representation is recommended in order to make the research design and methodologies more appropriate. The quality of the presentation will be improved in this manner.”

Thank you for your constructive feedback about the methodology section. We have revised the first paragraph of our methodology section to show how the architecture is influenced by a ransomware detection strategy based on Machine Learning classifiers. We have developed and included a new figure as suggested (Figure 1), which describes both the overall ML process for malware detection and the processing phases required to train an ML classifier (preprocessing, feature extraction, etc.). We have also made other revisions to the methodology section for further clarity, such as including sample code on how to aggregate the API calls made by the processes spawned by each sample file run within our experiment, with the results output to a CSV file.

“• Some more detail (please elaborate a bit for better understanding) regarding the employment of specific ML algorithms to solve the malware-related problem is highly recommended.”

We have incorporated more details in the manuscript regarding the employment of ML algorithms for the detection of ransomware. Some of these changes are now part of the results section, e.g., more analysis and a new figure. Other revisions took place within the background section, e.g., more recent papers have been referenced and discussed in relation to specific algorithms.

“• The conclusion can be made a bit brief while discussing the detailed results in the discussion and results portion, finally summarizing them in the conclusion very briefly (right now, there seems to be much more detail in it).”

We have revisited the conclusion section accordingly making sure it includes a summary of findings while all the results are contained within the results section. We have also added a new section on future work in this area.

“• More paper referencing from 2021 and 2022 must be included in relevant sections, such as Background/ML Algorithms, and in the text where necessary.”

Thank you for your constructive feedback. We have revisited the Background section and extended the critical discussion with more studies from 2021 and 2022 as suggested.

Round 2

Reviewer 1 Report

The comments and suggestions were addressed accordingly. Thank you.

Author Response

We are pleased to know that we have addressed your constructive feedback comments and that you have accepted our paper for publication. Your valuable comments helped us revise the document and improve the proposed methodology and research outcomes.

Reviewer 2 Report

I appreciate the text additions, however there are still some parts that must be improved.

It is not clear whether the dataset consists of 1600 or 1606 samples. Also, 20% of the dataset equals 320 samples, but the results of Table 3 indicate that the test samples were 293, a fact that creates some confusion for the reader.

It would be important to have a note on class imbalance, since, as I anticipate, in real world cases, the number of positive examples will be much lower than the negative examples of the normal traffic. It would greatly improve the quality of this work if a series of experiments with imbalanced datasets was included, otherwise at least recognize and state such issues.

l. 107-110: Since such questions are put in place, I would expect clear and explicit answers to them, e.g. at the end of the results section, or e.g. in the conclusions.

l. 767: ...this thesis... -> this paper?

l. 779: One would expect the algorithm to be fine-tuned before being put under comparison. If further fine-tuning is required, then how can the results of the study be considered precise and reliable?

Author Response

“It is not clear whether the dataset consists of 1600 or 1606 samples. Also, 20% of the dataset equals 320 samples, but the results of Table 3 indicate that the test samples were 293, a fact that creates some confusion for the reader.”

We greatly appreciate this important follow-up; this particular feedback helped us investigate our experimental procedure and related notes. We confirm this has now been corrected: the number of samples included in this particular experiment has been amended to make it clear to the reader within Section 3.2, Dataset Preparation. Overall, the sample size was 1465 (730 malware, 735 benign). Hence, 20% of the overall dataset gives 293 samples, which we confirm is the number we used for testing.
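The corrected arithmetic is easy to verify: with 1465 samples, a standard 80/20 shuffle split yields 293 test samples. A stdlib-only sketch (illustrative; the fixed seed and split mechanics are assumptions, not the authors' exact procedure):

```python
import random

# Reported dataset: 1465 samples in total (730 malware, 735 benign).
samples = [("malware", i) for i in range(730)] + [("benign", i) for i in range(735)]

random.seed(42)                          # fixed seed so the split is reproducible
random.shuffle(samples)

test_size = round(len(samples) * 0.2)    # 20% of 1465 -> 293 samples
test_set = samples[:test_size]
train_set = samples[test_size:]
```

This leaves 1172 samples for training, matching the 80/20 proportions reported above.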

“It would be important to have a note on class imbalance, since, as I anticipate, in real world cases, the number of positive examples will be much lower than the negative examples of the normal traffic. It would greatly improve the quality of this work if a series of experiments with imbalanced datasets was included, otherwise at least recognize and state such issues.”

With regard to the issue of class imbalance in real-world scenarios, where the number of positive samples is much lower, we appreciate that this is a genuine concern in this field of research. Firstly, we balanced our samples to avoid under-sampling/over-sampling. We also included a paragraph in the conclusion section, supported by a reference, to make sure this limitation is clear to the reader and well acknowledged.

“l. 107-110: Since such questions are put in place, I would expect clear and explicit answers to them, e.g. at the end of the results section, or e.g. in the conclusions.”

The conclusion section has now been amended to more precisely summarise the answer to the research questions.

“l. 767: ...this thesis... -> this paper?”

We have corrected the typo.

“l. 779: One would expect the algorithm to be fine-tuned before being put under comparison. If further fine-tuning is required, then how can the results of the study be considered precise and reliable?”

With regard to fine-tuning the ANN Model 3, we have revised the wording of the paragraph. Furthermore, we confirm that fine-tuning was included as part of this study to produce the best possible performance. However, we wanted to acknowledge the theoretical possibility (including via novel means) of further enhancement.

Reviewer 3 Report

It is improved according to my observations, so it is now accepted from my side.

Author Response

We are pleased to know that we have addressed your constructive feedback comments and that you have accepted our paper for publication. Your valuable comments helped us revise the document and improve the proposed methodology and research outcomes.
