Peer-Review Record

Multi-Objective Optimization for High-Dimensional Maximal Frequent Itemset Mining

Appl. Sci. 2021, 11(19), 8971; https://doi.org/10.3390/app11198971

by Yalong Zhang¹,*, Wei Yu¹, Xuan Ma², Hisakazu Ogura³ and Dongfen Ye¹

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 24 August 2021 / Revised: 19 September 2021 / Accepted: 22 September 2021 / Published: 26 September 2021

(This article belongs to the Special Issue Selected Papers from FCPAE2021 and 3rd International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM2021))

Round 1

Reviewer 1 Report

This paper aims to propose a frequent itemset mining algorithm, based on multi-objective optimization, for datasets with high-dimensional attributes. Since most existing algorithms (such as APRIORI, FP-Growth, ECLAT and so on) suffer from exponentially growing time complexity, reducing their time and space complexity can be considered a good contribution to the itemset mining field.

The authors provided two reasons why they did not compare their method with existing algorithms, but they could add a new section (Related work) to describe the similarities and differences between their method and traditional methods such as APRIORI, FP-Growth and so on.

The experimental evaluations were performed well, but only on the authors' method. Results were presented mostly as curves. I agree that traditional algorithms cannot be applied to datasets with a large number of transactions or high dimensionality, but is it possible to find smaller, though not very small, datasets (such as “Nursery”, “Adult”... from the UCI ML repository) to which traditional algorithms (APRIORI, FP-Growth) can also be applied, to show the advantages of your method in time and space complexity?

Here are some minor revisions:

In line 34: “Thus, frequent itemsets, such as shopping basket data analysis, webpage prefetching, cross-shopping, personalized websites, and network intrusion detection, have been extensively applied.”

Provide some references if possible.

In line 98, there is a typo: “Based on this this definition”. One “this” must be removed.
In line 102, there is an extra space between “and” and “no”.
In lines 106-107, what does this mean? “When the itemset X is a frequent itemset and no superset Y of X is found in the transaction sets D to make Y be a frequent itemset, then X is a maximal frequent itemset.”

If X is a frequent itemset, any superset of X will be frequent as well.

In line 121, where is M used? M is defined, but it is not used in the definition.
In some places the term “frequent 1 itemsets” is written without a hyphen (for example, line 138); in other places “frequent 2-itemsets” or “1-itemsets” is written with a hyphen (for example, lines 139 and 164).

Make this consistent throughout the paper, or explain the difference.

In lines 139-140: “③Different number of frequent 1 itemsets were randomly added into each of the remaining frequent 2-itemsets, which gained the same quantity of k-itemsets with the frequent 2-itemsets.”

Why were they added randomly rather than in a fixed order?

In line 143: “The group was used as the non-frequent itemset, and it was recorded as group2”.

Which group? (The one with support lower than the user-defined minimum support threshold?)

Algorithm 1 should be revised, as some parts are confusing: in line 1, P is written instead of sum; in line 4, is not defined or updated anywhere in the algorithm; and so on.

In line 19 of Algorithm 1: “fitness = fitness=f ¤length;”. What does this mean (fitness=fitness=…)?

In line 270: “Two curves are seen in Fig. 2(c).” Is this really 2(c), or 2(e)?
In line 305, “traction” should be corrected.
In line 351, in “the accurate”, “the” should be capitalized.
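
For readers following the comment on lines 106-107, the quoted definition of a maximal frequent itemset can be checked on a toy example. The sketch below is illustrative only and is not the authors' algorithm: it enumerates itemsets by brute force, and the function names, the four sample transactions, and the 0.5 support threshold are all assumptions made for this example. Under the usual support-based definition, every subset of a frequent itemset is frequent, while a superset may or may not be, which is why the "no frequent superset" condition is what singles out the maximal itemsets.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all frequent itemsets (tiny inputs only)."""
    items = sorted(set().union(*transactions))
    return [
        frozenset(candidate)
        for k in range(1, len(items) + 1)
        for candidate in combinations(items, k)
        if support(frozenset(candidate), transactions) >= min_support
    ]


def maximal_frequent_itemsets(transactions, min_support):
    """Frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return [x for x in frequent if not any(x < y for y in frequent)]


# Hypothetical transaction set D and threshold, chosen only for illustration.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "d"}),
]
min_support = 0.5

maximal = maximal_frequent_itemsets(transactions, min_support)
```

In this toy set, {a, b} is frequent (support 0.5) while its superset {a, b, c} is not (support 0.25), so {a, b} is maximal; {a} is frequent but not maximal because {a, b} is also frequent.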

Comments for author File: Comments.pdf

Author Response

Thank you for your constructive comments. We have revised the paper according to your comments one by one. Your comments have contributed greatly to the improvement of the quality of the paper. But there are limits to what we can do, and mistakes are inevitable. Please feel free to let me know if there is any remaining problem in the paper.

Author Response File: Author Response.pdf

Reviewer 2 Report

In the context of association rule mining, this paper proposes an algorithm that is able to find maximal frequent itemsets through a genetic algorithm-based solution. The main goal is to have a solution capable of finding maximal frequent itemsets over big transactional databases.

The paper tackles a very interesting problem, and the solution seems to be novel enough. However, the presentation and readability of the paper should be improved.
In particular: 1) contents should be presented in a better way; 2) the authors should formalize the problems they considered; 3) the definitions and the presentation of the solution should be complemented with explanatory examples; and 4) the paper is full of typos.

Several detailed comments follow:
- What do you mean by a "clinical" algorithm?
- The introduction section should better introduce the usefulness of inferring maximal frequent itemsets.
- The introduction section also contains related works, which should be presented in a separate section. Moreover, a more complete discussion of related work (RW) should be integrated. For instance, since the problem of discovering (R)FDs is equivalent to the AR mining problem, such algorithms should also be discussed in the RW section. Recent works are:
https://doi.org/10.1007/s10618-019-00667-7
https://doi.org/10.1109/TKDE.2020.2967722

- An outline of the remainder of the paper should be presented at the end of the introduction section.
- The whole presentation appears quite colloquial. More formality is required for a scientific publication.
- In section 2, the meaning of the implication has not been explained.
- One of the main problems is the fact that no examples have been introduced. They could help the understanding of the problem and the proposal.
- The boundaries of the proposal should be better emphasized. Are there other algorithms addressing the same problem? With a similar strategy?
- Include a discussion on time and space complexity.
- A comparative evaluation should be introduced especially if similar solutions have been defined in the literature.
- The comparison should also involve exact algorithms, in order to evaluate the effectiveness of the proposal.

Author Response

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The manuscript has been improved. Please check for technical errors one more time when you are submitting the camera-ready file.

Author Response

Thank you very much for your comments last time and this time. I will check very carefully for technical errors when I am submitting the final camera-ready file.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper has been substantially improved, and the authors addressed most of my previous remarks. Nevertheless, some issues remain unresolved. They are described in the following:

- A comparative evaluation of the proposal with respect to traditional approaches should be added, even if it requires involving smaller transaction sets. It could consider a transaction set that gradually becomes larger.
- The authors do not discuss future directions. For instance, the management of big result sets can benefit from some visual tools, like the following two:
1) https://doi.org/10.1016/j.bdr.2021.100240
2) https://doi.org/10.1145/2975167.2975185
- I suggest performing another proofreading over the whole paper.

Author Response

Thank you so much for your constructive comments. We have revised the paper according to your comments one by one. Your comments have contributed greatly to the improvement of the quality of the paper.

Author Response File: Author Response.pdf
