
Open Access Article

Peer-Review Record

Network Intrusion Detection Based on Amino Acid Sequence Structure Using Machine Learning

Electronics 2023, 12(20), 4294; https://doi.org/10.3390/electronics12204294

by Thaer AL Ibaisi^1,*, Stefan Kuhn^1,2,†, Mustafa Kaiiali^1,† and Muhammad Kazim^1,†

Reviewer 1: Erwin Kristen

Reviewer 2: Anonymous

Submission received: 15 September 2023 / Revised: 8 October 2023 / Accepted: 13 October 2023 / Published: 17 October 2023

(This article belongs to the Special Issue Machine-Learning-Enabled Big Data Analysis: Advancements, Applications and Challenges)

Round 1

Reviewer 1 Report

The manuscript gives a condensed overview of an Intrusion Detection System (IDS) method using amino acid sequence structure encoding in combination with machine learning technologies.

The manuscript describes a very innovative intrusion detection solution that uses DNA patterns to determine the feature values for a neural network. The document therefore combines network security engineering and neural networks with genetics. This makes the manuscript challenging for readers with only basic general knowledge of genetics.

Example: What is a codon? A one-sentence explanation would be very helpful to the reader (for example: a codon is the smallest functional subunit of a DNA sequence and, in general, consists of three chemical bases).

My own knowledge of genetics is limited in this way: I can follow the explanations of the research approach up to chapter 3.4. The amino acid encoding is clearly explained (Table 3), but the feature transformation (chapter 3.4) is no longer comprehensible to me. It is described on only one page, yet it is a crucial step of the method. The list of the ten structural properties seems entirely out of context to me.

How do these structural properties offer a multi-dimensional view of the amino acid sequence? What does the value 0.04878048780487805 mean for β Strand? What does the value 5210 mean for Reduced Cysteines in Table 5, and so on?

This is an important data preparation step, as these values are used as inputs to the neural network. The values are calculated with the software Biopython. What does Biopython do? How are the values calculated?

That concludes the genetics part, because the next chapter describes the neural network work and presents the data and values determined in the research.

Please completely revise chapter 3.4 “Feature Transformation”. Explain each step in more detail and with examples (as in Table 3 for the amino acid encoding) so that it can also be understood by a non-expert in genetics. Explain the calculated feature values. Explain the differences between the artificially generated amino acid sequences in the manuscript and real natural sequences. What is the benefit of encoding with amino acids instead of numerical values? Is using analysis software from biology an advantage?

The use of amino acid sequences to encode the network data represents an innovative research work. According to the results of the presented research work, there is a high level of accuracy in the detection of network attacks.

Here are some additional recommendations and comments:

· What is the supplementary material intended for? It is not related to the manuscript: there, the input features for the neural network are not encoded with amino acid sequences but with numerical values in the padding algorithm. There are nice table overviews, lists, and figures, but in what context? The material confuses more than it helps. Remove it!

· Line 77: Write “Suyehira [4] proposed an encoding and decoding algorithm tailored for a DNA-based data storage system.” instead of “Suyehira [4] proposed an encoding and decoding algorithm tailored for encoding and decoding algorithm tailored for a DNA-based data storage system.”.

· Line 92: Add a one-sentence description for the term “Codon”.

· Line 258: You write: “The major advantages and disadvantages of subsampling are:”. You list only advantages; please also list one or two disadvantages.

· Figure 3: The output of the neural network topology is the “Output Layer”.

The manuscript is well written and readable. A complete, detailed understanding requires specialist knowledge.

References were only spot-checked. All checked references were found.

From my point of view, the manuscript can be released for publication after a major rework of chapter 3.4 “Feature Transformation”.

Author Response

We thank the reviewers for their time and helpful comments. We have made changes to incorporate their feedback, which are highlighted in red in the revised paper. Replies to individual remarks follow below, with the original comments in italics.

The manuscript gives a condensed overview of an Intrusion Detection System (IDS) method using amino acid sequence structure encoding in combination with machine learning technologies. The manuscript describes a very innovative intrusion detection solution that uses DNA patterns to determine the feature values for a neural network. The document therefore combines network security engineering and neural networks with genetics. This makes the manuscript challenging for readers with only basic general knowledge of genetics. Example: What is a codon? A one-sentence explanation would be very helpful to the reader (for example: a codon is the smallest functional subunit of a DNA sequence and, in general, consists of three chemical bases).

We have added a sentence explaining codon. We have added extensive material to sections 3.2 to 3.4. We hope that sections 3.2 to 3.4 now give a compelling argument for our choice of encoding and are accessible without background knowledge.

We have completely revised section 3.4 and believe it is much clearer now.

We have added extensive explanations for this in section 3.4.

We have added details about software and calculation of values to section 3.4.

That concludes the genetics part, because the next chapter describes the neural network work and presents the data and values determined in the research. Please completely revise chapter 3.4 “Feature Transformation”. Explain each step in more detail and with examples (as in Table 3 for the amino acid encoding) so that it can also be understood by a non-expert in genetics.

We have completely revised this section, including examples and tables, and believe it is much clearer now.

Explain the calculated feature values. Explain the differences between artificially generated amino acid sequences in the manuscript and real natural sequences.

We have added a new Table 5 to section '3.3 Amino acid mapping' titled 'Differences between artificially generated and real natural Amino acid sequences'. In section 3.4, we also explain the calculated feature values extensively.

What is the benefit of encoding with amino acids instead of numerical values?

We have added an explanation at the end of section '3.3 Amino acid mapping' explaining this.

Is using analysis software from biology an advantage?

We explain this in section 3.4.

We are happy that the reviewer agrees with us.

What is the supplementary material intended for? It is not related to the manuscript: there, the input features for the neural network are not encoded with amino acid sequences but with numerical values in the padding algorithm. There are nice table overviews, lists, and figures, but in what context? The material confuses more than it helps. Remove it!

The supplemental information covers two areas: a) machine learning methods other than neural networks and b) additional hyperparameter options for the neural network. We have added clear explanations of this on p. 19, and we believe that, presented this way, the materials are valuable for some readers. We ask the reviewer to consider keeping those materials.

Line 77: Write “Suyehira [4] proposed an encoding and decoding algorithm tailored for a DNA-based data storage system.” instead of “Suyehira [4] proposed an encoding and decoding algorithm tailored for encoding and decoding algorithm tailored for a DNA-based data storage system.”.

This has been changed as suggested.

Line 92: Add a one-sentence description for the term “Codon”.

We have added this.

Line 258: You write: “The major advantages and disadvantages of subsampling are:”. You list only advantages; please also list one or two disadvantages.

We have added two disadvantages to the list.

Figure 3: The output of the neural network topology is the “Output Layer”.

This has been changed as suggested.

The manuscript is well written and readable. A complete, detailed understanding requires specialist knowledge. References were only spot-checked. All checked references were found. From my point of view, the manuscript can be released for publication after a major rework of chapter 3.4 “Feature Transformation”.

We have tried to improve the manuscript for non-specialist readers. Clearly, we cannot provide full coverage of biological information storage and processing here, but we hope that the changes to sections 3.2 to 3.4 make a compelling argument for our choice of methods.

Reviewer 2 Report

This article presents a novel amino acid encoding mechanism for encoding network transactions and generating structural properties from amino acid sequences.

Furthermore, please find below the points I suggest including in the final version of the article if it is considered for acceptance:

- while the approach taken is interesting, it is necessary to argue why it would benefit NIDS and how you see the connection between these fields being applied. This should be properly explained in the Introduction section.

- the use of the first person plural should be reduced in the Introduction section.

- at the end of the Related Work section, the knowledge gap tackled in this research should be stated.

- the choice of using old but classic datasets in this domain should be discussed in the Materials and Methods section.

Author Response

This article presents a novel amino acid encoding mechanism for encoding network transactions and generating structural properties from amino acid sequences. Furthermore, please find below the points I suggest including in the final version of the article if it is considered for acceptance:

- while the approach taken is interesting, it is necessary to argue why it would benefit NIDS and how you see the connection between these fields being applied. This should be properly explained in the Introduction section.

We have added extensive material about this in sections 3.2 to 3.4. We found that it is too much to discuss in the introduction. We hope that sections 3.2 to 3.4 now give a compelling argument for our choice of encoding.

- the use of the first person plural should be reduced in the Introduction section.

We have reworded the introduction to passive voice. Due to the number of changes, those are not marked in red.

- at the end of the Related Work section, the knowledge gap tackled in this research should be stated.

We have added a clear statement of the knowledge gap and how we intend to fill it at the end of section 2.

- the choice of using old but classic datasets in this domain should be discussed in the Materials and Methods section.

This is now discussed at the end of section 3.1. We also list it as future work in the conclusion.

Round 2

Reviewer 1 Report

The revised manuscript provides a well-explained overview of an intrusion detection system (IDS) method that uses amino acid sequence structure encoding in combination with machine learning technologies.

Through the extensive revision of the manuscript, by adding more in-depth explanations and detailed examples, the underlying innovation and research work is well presented and highlighted.

Although this is still largely about biochemistry, amino acids and their active principles, the authors have found a good way to introduce a non-biology expert to the topic.

The manuscript is well written and understandable. A complete, detailed understanding requires specialist knowledge.

References were only spot-checked. All checked references were found.

From my point of view, the manuscript can be approved for publication.
