Communication
Peer-Review Record

Detecting Phishing Accounts on Ethereum Based on Transaction Records and EGAT

Electronics 2023, 12(4), 993; https://doi.org/10.3390/electronics12040993
by Xuanchen Zhou 1,2, Wenzhong Yang 2,* and Xiaodan Tian 2
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 3 January 2023 / Revised: 10 February 2023 / Accepted: 14 February 2023 / Published: 16 February 2023
(This article belongs to the Topic Machine and Deep Learning)

Round 1

Reviewer 1 Report

-more information on Ethereum Network characteristics and the preference to use it in this work would be helpful

-explanation on the creation of the weight matrix W for each node is needed (p5, line:165).

-the role of the blockchain scalability, to handle higher volume of transactions, with the use of a two-layer EGAT  could  be considered.

Author Response

Thank you for all your suggestions.

About the first suggestion "-more information on Ethereum Network characteristics and the preference to use it in this work would be helpful"

Regarding your suggestion to use more Ethereum network features: the original paper already uses many features unique to the Ethereum network, such as gas price, balance, value, and timestamp, and builds additional features from their statistics. Features that I did not use, such as "address", "blockNumber" (the block number of this chain activity), and "hash" (the hash value of a transaction record), act more as sequential identifiers and carry little semantic information.

This is indeed a good suggestion; some features, such as "raw code", are only available for smart contract accounts. I will explore their semantic representation in a separate smart contract detection study.

About the second suggestion "-explanation on the creation of the weight matrix W for each node is needed (p5, line:165)."

The weight matrix W you mention is the weight matrix from the original GAT (graph attention network), and I have added the relevant explanation in the revised version.
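For readers unfamiliar with GAT, the following is a minimal pure-Python sketch of how the shared weight matrix W enters the attention computation; all numbers and dimensions here are made up for illustration (in a real GAT, W and the attention vector a are learned parameters).

```python
import math

# Hypothetical toy parameters: 2-dim input features, 2-dim hidden features.
# W is the shared learnable weight matrix applied to every node's feature
# vector before attention scores are computed (as in the original GAT).
W = [[0.5, -0.2],
     [0.1, 0.8]]
a = [0.3, -0.1, 0.2, 0.4]  # attention vector over the concatenation [Wh_i || Wh_j]

def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def attention_coeffs(h_i, neighbor_feats):
    """Unnormalized GAT scores e_ij = LeakyReLU(a . [Wh_i || Wh_j]),
    then a softmax over the neighborhood to get alpha_ij."""
    Whi = matvec(W, h_i)
    scores = []
    for h_j in neighbor_feats:
        Whj = matvec(W, h_j)
        concat = Whi + Whj  # list concatenation models [Wh_i || Wh_j]
        scores.append(leaky_relu(sum(x * y for x, y in zip(a, concat))))
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]  # normalized coefficients, sum to 1

alphas = attention_coeffs([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(alphas)
```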

About the third suggestion "-the role of the blockchain scalability, to handle higher volume of transactions, with the use of a two-layer EGAT  could  be considered."

Sorry for not being clear in the text: I did use two EGAT layers. This is a very good suggestion, because I also found that a single EGAT layer did not propagate messages well over the graph structure in the article, so I used a two-layer EGAT.

Reviewer 2 Report

All comments are written in the original manuscript.

Comments for author File: Comments.pdf

Author Response

First of all, thank you for all your suggestions!

Your first suggestion concerns the edge features in the text: "Are these characteristics the only ones that can be assumed in the evaluation?"

Gas fee, value, and timestamp are characteristics that necessarily exist and carry real meaning for every transaction, which is why I chose them. The other features are more relevant to the nodes, so I constructed additional features from the node information.

Your second suggestion is "A more complex subparagraph can be added in future studies."

This is a very good suggestion, and I will try other, more complex subgraph construction methods in subsequent studies.

Your third suggestion is "The references from 2020 to 2022 are small in comparison to others; this comment should be considered in future research."

Yes, more recent papers should be cited. Combined with your second suggestion, I will cite more recent work in subsequent research; several recent papers have also been newly added in the current revised version.

Thanks again for your suggestion!

Reviewer 3 Report

Blockchain has a significant impact on a variety of applications, but it also attracts numerous cybercriminals. In blockchain, phishing transfers the victim's virtual currency in order to fraudulently make enormous profits, posing a threat to the blockchain ecosystem. Ethereum, one of the blockchain platforms, can provide information to detect phishing fraud to prevent greater losses.

 

In this paper, the authors aim to provide a solution for this type of problem and similar ones by detecting phishing accounts on Ethereum based on transaction records and EGAT (Edge-Featured Graph Attention Networks).

 

The topic is very interesting and fits the scope of the Electronics journal.

 

However, I have a strong objection to the acceptance of the paper.

Firstly,

 

Authors must make a clear definition of the problem.

 

It would be better to use some figures, graphs, etc.

Secondly,

 

They should be careful about writing the paper.

 

It is not a good style to write even an abstract in this way.

 

I have listed some of them in the abstract section as follows.

Due to the characteristics of blockchain anonymity, Ether gradually becomes the main of phishing.-- It is not clear or proven! What is your basis for this claim?

 

To address the above problems,-- What are the problems above? There is only one sentence previously! It does not describe any problem (and also problems)

 

an EGAT network based on the-- What does EGAT refer to? It is not clear in the abstract section.

 

In this study to detect Ether phishing accounts—What does “Ether phishing accounts” mean?

 

important information as a node and edge features of the graph.--??

Thirdly,

 

Please give some additional details about your datasets.

Fourthly,

Experimental results are not clear.

Table 3. Results of phishing detection. Comparison with the results in other papers

Accuracy Precision Recall F1-score of EGAT are shown as 0.986, same for all.

 

Is it reasonable? Why? How?

Why are the Accuracy, Precision, Recall, and F1-score of EGAT different from Table 2?

Finally, some minor comments:

 

EGAT is defined as (Graph Attention Network)[10]. – How this can be possible?

 

in other papers are shown in Table ??.-- ??

The main contributions are the following two aspects.-- The main contributions are in  the following two aspects.

 

The statistical features manually as follow:--??

 

Lifetime (LT): Time the account has been alive since a transaction was recorded.--??

 

Author Response

Thank you for the very detailed advice you gave; it was very helpful to me.

Firstly,

The INTRODUCTION section in my revised version states the goals of the paper more clearly.

Secondly,

I have made changes to address the language style and presentation issues you raised in the abstract section and checked other locations.

Thirdly,

In the Dataset section, I have added a more detailed description.

Fourthly,

The reason the values of acc, pre, recall, and f1 are the same lies in the torchmetrics toolbox. When I constructed the metrics with the parameter average='micro' (e.g., in F1Score(average='micro')), the output values of these evaluation metrics were exactly the same in every epoch; when I set average='weighted', they differed.

This is the reason for the different results in the two tables.

I am very sorry I forgot to unify the results, I have corrected it.

Finally ,

EGAT uses a line graph together with GAT to learn node and edge representations. The auxiliary graph is constructed as the line graph of the original graph (nodes in the original graph become edges in the line graph, and edges become nodes); the model is trained on both the original graph and the line graph, and node and edge embeddings are updated alternately.

Lifetime (LT): the difference between the timestamp of the account's last transaction record and the timestamp of its first transaction record.
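The line graph transform mentioned above can be sketched in a few lines of pure Python. This is a minimal undirected simplification for illustration only; the exact construction used in EGAT (including the handling of edge direction) may differ.

```python
from itertools import combinations

def line_graph(edges):
    """Build the line graph of a graph given as an edge list: each original
    edge becomes a line-graph node, and two line-graph nodes are connected
    when the corresponding original edges share an endpoint."""
    lg_nodes = list(edges)
    lg_edges = [
        (e1, e2)
        for e1, e2 in combinations(edges, 2)
        if set(e1) & set(e2)  # the two edges share at least one account
    ]
    return lg_nodes, lg_edges

# Toy transaction graph: account A pays B, B pays C, A pays C.
nodes, lg = line_graph([("A", "B"), ("B", "C"), ("A", "C")])
print(nodes)  # original transactions, now treated as nodes
print(lg)     # adjacency among transactions that share an account
```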

Thank you again for your review comments.

Round 2

Reviewer 3 Report

Firstly,

I want to see a detailed explanation of my previous reviews.

For example, I said that

"Fourthly,

Experimental results are not clear.

Table 3. Results of phishing detection. Comparison with the results in other papers

Accuracy Precision Recall F1-score of EGAT are shown as 0.986, same for all.

Is it reasonable? Why? How?

Why the Accuracy Precision Recall F1-score of EGAT are different form Table2?
"

I want to see an explanation of all five of these reviews. However, there is none!

Secondly,

The authors have to enhance the paper, especially the English. It is really hard to read and understand.
I have listed only the hazy parts of the abstract section.

"The development of blockchain technology has brought prosperity to the cryptocurrency market and has made the blockchain platform a hotbed of crimes."-- How? 
How can "the development of blockchain technology"  made "the blockchain platform a hotbed of crimes."? 
This study to detect phishing accounts on Ethereum through the classification of transaction network subgraphs.--?? What is the verb in this sentence?
Firstly, the accounts .. -- Firstly for what?
Firstly, the accounts are used as nodes ...-- which accounts?
the flow of transaction funds is used as directed edges..-- is there only one flow? 
to construct the basic transaction network graph. -- "the basic transaction network graph": is there only one TNG? Or "a transaction network graph"?
the F1 value of the proposed method... -- What does "F1 value" mean?  F1-score!!
the network is more efficient and accurate compared with Graph2Vec and DeepWalk,--??

I strongly suggest a native speaker check the paper.

 

Thirdly,

The problem definition is still  "not satisfactory" in the paper.

Fourthly,


ZHAO [12] analyzed Bitcoin accounts --- Why is it written in uppercase?

MONACO [13] uses 12 transaction---However,
14. MONACO J V. Identifying Bitcoin Users by Transaction Behavior[C]//SPIE. 2015 International Society for Optics and Photonics Defense Security and Sensing, April 20-24, 2015, Baltimore, MD, USA. Washington: SPIE, 2015: 33-47.

Please check all your references.

A. H. H. Kabla [24] et al. proposed--- Why do you use it as "A. H. H. Kabla"


Finally,

The authors showed that,


Model      Accuracy  Precision  Recall  F1-score
DeepWalk   0.651     0.651      0.651   0.651
GraphSage  0.832     0.832      0.832   0.832
EGAT       0.986     0.986      0.986   0.986

According to the formula,
Accuracy =(TP + TN)/(TP + TN + FP + FN)
Precision =(TP)/ (TP + FP)
Recall =(TP) / (TP + FN)

How can all these formulas give the same values?

The authors said that "In the function F1score(average), when I set the parameter average='micro', the output values of these evaluation metrics will be exactly the same for each epoch. When I set average='weighted' it will be different."-- How this can be possible?

 


Could you please show the confusion matrix, at least in the "Authors Reply" section?
I want to see it.

Author Response

Thank you for all your revisions.

The first issue you pointed out:

"Accuracy Precision Recall F1-score of EGAT are shown as 0.986, same for all.

Is it reasonable? Why? How?"

After outputting the ConfusionMatrix, I found that there was indeed a problem. I have retrained the model and corrected the results in my paper.

The reasons are as follows.

In the previous version of the paper, the evaluation metrics were obtained by direct calls to the torchmetrics API, with no manual calculation. The code is as follows.

from torchmetrics import F1Score, Accuracy, Precision, Recall

f1_score = F1Score(num_classes=2, average='micro')
precision_score = Precision(num_classes=2, average='micro')
recall_score = Recall(num_classes=2, average='micro')
accuracy_score = Accuracy(num_classes=2, average='micro')

'micro' calculates the overall metrics by summing TP, FN, etc. over all classes, rather than per class.
Manual calculation revealed that only Accuracy was correct; the Precision, Recall, and F1 values were incorrect, and all three were equal to Accuracy.

Example:

Confusion matrix and evaluation metrics for one epoch of training when using "micro":



tensor([[326,   2],
        [177, 167]])
tensor(0.7336) tensor(0.7336) tensor(0.7336) tensor(0.7336)

Accuracy: 0.7336309552192688, Precision: 0.7336309552192688, Recall: 0.7336309552192688, F1-score: 0.7336309552192688

 

The LaTeX formula is as follows.

\text { Accuracy }=\frac{1}{N} \sum_{i}^{N} 1\left(y_{i}=\hat{y}_{i}\right)

The torchmetrics documentation:

https://torchmetrics.readthedocs.io/en/stable/classification/accuracy.html
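To illustrate why the micro-averaged values coincide, here is a small hand-calculation using the confusion matrix from the "micro" example above. The helper function is mine, not from the paper: with micro averaging, TP summed over both classes counts every correct prediction, and the summed FP equals the summed FN (each error is an FP for one class and an FN for another), so precision, recall, and F1 all collapse to accuracy.

```python
def micro_metrics(cm):
    """cm[i][j] = count of samples with true class i predicted as class j.
    Micro averaging sums TP/FP/FN over all classes before dividing, so
    total TP = all correct predictions and total FP = total FN = all
    errors; precision, recall, and F1 therefore all equal accuracy."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = sum(cm[i][i] for i in range(n))
    fp = sum(cm[i][j] for i in range(n) for j in range(n) if i != j)
    fn = fp  # every misclassified sample is an FP for one class, FN for another
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp / total
    return accuracy, precision, recall, f1

# The confusion matrix from the "micro" example above.
print(micro_metrics([[326, 2], [177, 167]]))
```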

 'weighted' calculates the metrics for each category and then performs a weighted average based on the number of samples in each category.

Example:

Confusion matrix and evaluation metrics for one epoch of training when using "weighted":

tensor([[339,   7],
        [ 15, 311]])
tensor(0.9673) tensor(0.9675) tensor(0.9673) tensor(0.9672)
Accuracy: 0.9672619104385376, Precision: 0.9675042629241943, Recall: 0.9672619104385376, F1-score: 0.9672456383705139

I found that the manual calculation using confusion matrix gives different results than the direct output using torchmetrics.

Based on repeated experiments over the past few days, and after carefully studying the torchmetrics documentation, I concluded that the difference between the manual and API results was due to the way the evaluation metrics are calculated for multi-class tasks.

So I used the binary (0-1) classification metrics instead and found that the results were consistent with my manual calculation. The code used is as follows.

from torchmetrics.classification import BinaryF1Score, BinaryAccuracy, BinaryPrecision, BinaryRecall, BinaryConfusionMatrix

f1_score = BinaryF1Score()
precision_score = BinaryPrecision()
recall_score = BinaryRecall()
accuracy_score = BinaryAccuracy()

Example, using one of the results of the binary method:

tensor([[350,  15],
        [  4, 303]])
TP: 303, FN: 4, FP: 15, TN: 350
tensor(0.9717) tensor(0.9528) tensor(0.9870) tensor(0.9696)
Accuracy: 0.9717261791229248, Precision: 0.9528301954269409, Recall: 0.9869706630706787, F1-score: 0.9696000218391418
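Recomputing the four metrics by hand from the binary confusion matrix above, using the formulas the reviewer quoted, reproduces the torchmetrics output (the helper function is a sketch of mine, not code from the paper):

```python
def binary_metrics(tp, fn, fp, tn):
    """Manual calculation from the binary confusion matrix, using the
    standard formulas quoted in the review."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Counts from the binary example above: TP=303, FN=4, FP=15, TN=350.
acc, prec, rec, f1 = binary_metrics(tp=303, fn=4, fp=15, tn=350)
print(acc, prec, rec, f1)  # now four distinct values, matching torchmetrics
```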

 

The second issue you pointed out:

I consulted a professional English editor to help me correct the many grammatical problems you pointed out in the paper.

The third issue you pointed out:

The problem definition is still  "not satisfactory" in the paper.

I revised the Introduction section of my paper again.

 

The fourth issue you pointed out:

ZHAO [12] analyzed Bitcoin accounts --- Why is it written in uppercase?

I used the wrong reference format. A first pass has been made to correct it.

Thank you very much for your careful guidance.

There may still be problems with the article, and your comments and corrections are welcome.

Round 3

Reviewer 3 Report

In the second revision of the paper,

 

What do you mean by the following text?

 

"Please punctuate equations as regular text. Theorem-type environments (including 223

propositions, lemmas, corollaries etc.) can be formatted as follows: 224

Theorem 1. Example text of a theorem."

 

- - - - - - 

Although the use of wrongly formatted references was explicitly described in my previous (fourth) review as "Please check all your references."

 

There are still too many errors in the paper.

W. Chen, X. Guo, Z. Chen, Z. Zheng, and Y. Lu, Phishing Scam Detection on Ethereum: Towards Financial Security for Blockchain Ecosystem. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence bf 2020, 4506–4512.

 

What does bf mean here?

 

Why do you use different formats for references?

Bian Lingyu, Zhang Linlin, Zhao Kai,

W. Chen, X. Guo, Z. Chen, Z.

Lee D. T. Automated

TOYODA K T.

Elli Androulaki Ghassan O. Karame et al.

- - - - - - 

"In January 2019, according to a Chainalysis report, Ethereum has become the preferred encryption platform for fraudsters." What is your reference for this?

- - - - - -

 

Why do you still use uppercase letters, although this was pointed out in the previous reviews?

 

MONACO [15] includes

ANDROULAKI [16] and others found

REID [17] et al. cluster the

YIN [18] et al. selected

TOYODA [21] et al. extracted the

LIN [19] and others

KANEMURA [22] et

CHEN [23] et al.

Author Response

Thank you very much again for your meticulous review.

"Please punctuate equations as regular text. Theorem-type environments (including 223

propositions, lemmas, corollaries etc.) can be formatted as follows: 224

Theorem 1. Example text of a theorem."

1. I made this mistake because I accidentally left text from the template in the paper; it has now been corrected.

W. Chen, X. Guo, Z. Chen, Z. Zheng, and Y. Lu, Phishing Scam Detection on Ethereum: Towards Financial Security for Blockchain Ecosystem. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence bf 2020, 4506–4512.

2. The second problem was that the template's reference format uses {\bf 2020} to bold the year; the literal "bf" leaked into the text and has now been fixed.

 

3. I failed to fully understand the required reference format. This time, I have revised it.

% Reference 1
Author~1, T. The title of the cited article. {\em Journal Abbreviation} {\bf 2008}, {\em 10}, 142--149.

"In January 2019, according to a Chainalysis report, Ethereum has become the preferred encryption platform for fraudsters." What is your reference for this?

4. Chainalysis releases a report roughly once a year; I have added the corresponding new references.

Why do you still use uppercase letters? Although it was said in the previous reviews.

5. After I made the modifications in Word, I forgot to update them in the LaTeX source.

 
