Topic Editors

Prof. Dr. Qingshan Jiang

Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

Dr. John (Junhu) Wang

School of Information and Communication Technology, Griffith University, Brisbane, Australia

Dr. Min Yang

Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

Recent Advances in Data Mining

Abstract submission deadline: closed (31 March 2023)
Manuscript submission deadline: closed (31 May 2023)
Viewed by: 13320

Topic Information

Dear Colleagues,

Data mining is the process of identifying valid, potentially useful, and understandable information; detecting patterns; building knowledge graphs; and finding anomalies and relationships in big data with the Artificial-Intelligence-enabled Internet of Things (AIoT). This process is essential for advancing knowledge in fields that deal with raw data from the web, text, numerical records, media, or financial transactions. Its scope has expanded through the hybridization of data mining algorithms for use in financial technology and cryptocurrency, blockchain, data science, sentiment analysis, and recommender systems. Data mining also benefits many practical fields, such as privacy-preserving health data analysis and mining, biology, data security, smart cities, and smart grids. It is also necessary to investigate recent advances in data mining that incorporate machine learning algorithms and artificial neural networks. Among the fields of artificial intelligence, machine learning and deep learning have been among the most studied in recent years, and the advent of deep learning has brought a major shift that has opened up unprecedented theoretical and application-oriented opportunities for data mining. This Topic will present a collection of articles reflecting the latest developments in data mining and related fields, covering practical and theoretical applications; knowledge discovery and extraction; image analysis; classification and clustering; FinTech and cryptocurrency; blockchain and data security; privacy-preserving data mining; and many others. Contributions focusing on both theoretical and practical models are welcome. Papers will be selected for inclusion based on their formal and technical soundness, experimental support, and relevance.

Prof. Dr. Qingshan Jiang
Dr. John (Junhu) Wang
Dr. Min Yang
Topic Editors

Keywords

data mining
web mining
text mining
graph mining
classification
clustering
machine learning
deep learning
knowledge graph
knowledge discovery and extraction
artificial intelligence
statistical modeling
privacy-preserving data mining
social network analysis
natural language processing applications
recommendation systems
big data storage systems
big data analysis
data management and analysis
FinTech data analysis and cryptocurrency
blockchain data security

Participating Journals

Journal Name	Impact Factor	CiteScore	Launched Year	First Decision (median)	APC
Algorithms	2.3	3.7	2008	15 days	CHF 1600
Applied Sciences	2.7	4.5	2011	16.9 days	CHF 2400
Electronics	2.9	4.7	2012	15.6 days	CHF 2400
Energies	3.2	5.5	2008	16.1 days	CHF 2600
Mathematics	2.4	3.5	2013	16.9 days	CHF 2600

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics cooperates with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to benefit from this by posting a preprint at Preprints.org prior to publication:

Immediately share your ideas ahead of publication and establish your research priority;
Protect your idea from being stolen with this time-stamped preprint article;
Enhance the exposure and impact of your research;
Receive feedback from your peers in advance;
Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (7 papers)

16 pages, 1756 KiB

Open Access Article

Effective Event Extraction Method via Enhanced Graph Convolutional Network Indication with Hierarchical Argument Selection Strategy

by Zheng Liu, Yimeng Li, Yu Zhang, Yu Weng, Kunyu Yang and Chaomurilige

Electronics 2023, 12(13), 2981; https://doi.org/10.3390/electronics12132981 - 06 Jul 2023

Cited by 1 | Viewed by 883

Abstract

As one of the foundational technologies for massive data processing in AI, event mining is attracting increasing attention; it mainly comprises event detection (event trigger identification and event classification) and argument extraction. At present, EE-GCN is one of the most effective methods for event detection. However, because EE-GCN focuses only on event detection, the complete extraction of event multi-tuples still needs to be improved. Inspired by the EE-GCN event detection method, this paper proposes an effective event extraction method via graph convolutional network indication with a hierarchical argument selection strategy. The method consists of the following steps. (1) Based on the ACE2005 argument extraction template, a new argument extraction template is established for the Baidu event extraction dataset. (2) The event triggers and event types detected by EE-GCN are used as indicators to determine the argument extraction template, and candidate arguments are extracted via named entity recognition based on the selected template. (3) The edge information of the EE-GCN graph is fully exploited to compute local and global correlation degrees, from which the final argument multi-tuple is determined. (4) Finally, several experiments are conducted on the Baidu event extraction dataset to compare the proposed method with other methods. The experimental results show that the proposed method improves the accuracy and completeness of event extraction compared with other existing methods.

(This article belongs to the Topic Recent Advances in Data Mining)
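Step (2) of the abstract, in which the detected event type selects an argument template and named-entity output is filtered against it, can be illustrated with a small Python sketch. The event types, role names, and entity labels below are hypothetical placeholders, not the ACE2005 or Baidu templates, and the sketch leaves out both the EE-GCN detector and the local/global correlation scoring of step (3).

```python
from collections import defaultdict

# Hypothetical templates: event type -> {argument role: allowed NER entity types}.
TEMPLATES = {
    "Attack": {"Attacker": {"PER", "ORG"}, "Target": {"PER", "ORG", "FAC"}, "Place": {"LOC", "GPE"}},
    "Transport": {"Artifact": {"PER", "VEH"}, "Destination": {"LOC", "GPE", "FAC"}},
}


def candidate_arguments(event_type, entities):
    """Map each role of the selected template to the NER spans whose type fits it.

    entities: list of (text, entity_type) pairs produced by an NER model.
    Returns {role: [text, ...]}; roles with no compatible entity are omitted.
    """
    template = TEMPLATES.get(event_type, {})
    candidates = defaultdict(list)
    for text, ent_type in entities:
        for role, allowed in template.items():
            if ent_type in allowed:
                candidates[role].append(text)
    return dict(candidates)


ents = [("rebels", "ORG"), ("the embassy", "FAC"), ("downtown", "LOC")]
print(candidate_arguments("Attack", ents))
```

Note that one entity ("rebels") can fill several roles; resolving that ambiguity is the job of a later selection step such as the correlation-based one the abstract describes.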

13 pages, 2179 KiB

Open Access Article

Multi-Modal Entity Alignment Method Based on Feature Enhancement

by Huansha Wang, Qinrang Liu, Ruiyang Huang and Jianpeng Zhang

Appl. Sci. 2023, 13(11), 6747; https://doi.org/10.3390/app13116747 - 01 Jun 2023

Viewed by 963

Abstract

Multi-modal entity alignment refers to identifying equivalent entities between two different multi-modal knowledge graphs that contain multi-modal information such as structural triples and descriptive images. Most previous multi-modal entity alignment methods encode entity information with a separate encoder for each modality and then perform feature fusion to obtain a joint multi-modal representation. However, this approach does not fully utilize the multi-modal information of aligned entities. To address this issue, we propose MEAFE, a multi-modal entity alignment method based on feature enhancement. MEAFE adopts a multi-modal pre-trained model, an OCR model, and a GATv2 network to enhance the model's ability to extract useful features from entity structural triples and image descriptions, thereby generating more effective multi-modal representations. In addition, it incorporates the modal distribution information of each entity to strengthen the model's understanding and modeling of multi-modal information. Experiments on bilingual and cross-graph multi-modal datasets demonstrate that the proposed method outperforms models that use traditional feature extraction methods.

(This article belongs to the Topic Recent Advances in Data Mining)
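The baseline pipeline the abstract describes (encode each modality separately, fuse the features into a joint representation, then match entities across graphs) can be sketched in Python as follows. This is a generic illustration under simple assumptions, not MEAFE: fusion here is concatenation of L2-normalized per-modality vectors, matching is greedy cosine nearest neighbour, and the toy two-dimensional embeddings stand in for real encoder outputs.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(modal_vectors):
    """Concatenate per-modality embeddings after normalizing each, then renormalize."""
    fused = []
    for vec in modal_vectors:
        fused.extend(l2_normalize(vec))
    return l2_normalize(fused)


def align(source_entities, target_entities):
    """Map each source entity index to its most cosine-similar target entity index."""
    src = [fuse(e) for e in source_entities]
    tgt = [fuse(e) for e in target_entities]
    alignment = {}
    for i, s in enumerate(src):
        sims = [sum(a * b for a, b in zip(s, t)) for t in tgt]  # cosine, vectors are unit length
        alignment[i] = max(range(len(tgt)), key=sims.__getitem__)
    return alignment


# Each entity: [structural embedding, image embedding] (toy 2-d vectors).
graph1 = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.2]]]
graph2 = [[[0.1, 1.0], [0.9, 0.3]], [[0.9, 0.1], [0.1, 1.0]]]
print(align(graph1, graph2))  # {0: 1, 1: 0}
```

Greedy nearest-neighbour matching can assign two source entities to the same target; practical systems often add a one-to-one constraint.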

16 pages, 543 KiB

Open Access Article

Incrementally Mining Column Constant Biclusters with FVSFP Tree

by Jiaxuan Zhang, Xueyong Wang and Jie Liu

Appl. Sci. 2023, 13(11), 6458; https://doi.org/10.3390/app13116458 - 25 May 2023

Viewed by 627

Abstract

Bicluster mining has been studied extensively in the data mining field. Because column constant biclusters (CCBs) can be transformed into discriminative rules, they have been widely applied in various fields. However, no research on incrementally mining CCBs has been reported in the literature. In real situations, limited computational resources (such as memory) make it impossible to mine biclusters from very large datasets at once. Therefore, in this study, we propose an incremental CCB mining method. A CCB can be regarded as a special case of a frequent pattern (FP), and the most widely used approach to incrementally mining frequent patterns is currently based on the FP tree; our method therefore builds on a modified FP tree data structure. The technical contributions are twofold. First, we propose a modified FP tree data structure, the Feature Value Sorting Frequent Pattern (FVSFP) tree, which can be easily maintained. Second, we design a method for mining CCBs from the FVSFP tree. The proposed method was tested on several datasets, and the experimental results demonstrate that it performs well when incrementally handling newly added data.

(This article belongs to the Topic Recent Advances in Data Mining)
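The abstract's observation that a CCB is a special case of a frequent pattern can be made concrete: treat each (column, value) pair as an item, so a CCB over column set C is an itemset shared by at least a minimum number of rows. The Python sketch below is a deliberately simplified stand-in (a flat dictionary, not the FVSFP tree); the class name, the `max_cols` cap, and the `min_rows` threshold are hypothetical. It only shows why the problem suits incremental processing: a new batch updates the index without revisiting earlier rows.

```python
from collections import defaultdict
from itertools import combinations


class IncrementalCCBIndex:
    """Toy index of (column subset, value tuple) -> row ids, updated batch by batch."""

    def __init__(self, n_cols, max_cols=2):
        self.n_cols = n_cols
        self.max_cols = max_cols        # cap on bicluster width; keeps enumeration small
        self.rows = defaultdict(list)   # (cols, values) -> ids of rows sharing those values
        self.n_rows = 0

    def add_batch(self, batch):
        """Index newly arrived rows without re-scanning earlier ones."""
        for row in batch:
            rid = self.n_rows
            self.n_rows += 1
            for k in range(1, self.max_cols + 1):
                for cols in combinations(range(self.n_cols), k):
                    key = (cols, tuple(row[c] for c in cols))
                    self.rows[key].append(rid)

    def biclusters(self, min_rows):
        """Return every column-constant bicluster with at least min_rows rows."""
        return {key: ids for key, ids in self.rows.items() if len(ids) >= min_rows}


index = IncrementalCCBIndex(n_cols=3)
index.add_batch([[1, 2, 3], [1, 2, 4]])
index.add_batch([[1, 5, 3]])  # new data arrives later
for (cols, values), ids in index.biclusters(min_rows=2).items():
    print(cols, values, ids)
```

The sketch reports every qualifying column subset, including non-maximal ones (columns (1,) with rows [0, 1] is subsumed by columns (0, 1) with the same rows); a real miner would prune those and avoid enumerating all column combinations.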

20 pages, 676 KiB

Open Access Article

Enhancement of Question Answering System Accuracy via Transfer Learning and BERT

by Kai Duan, Shiyu Du, Yiming Zhang, Yanru Lin, Hongzhuo Wu and Quan Zhang

Appl. Sci. 2022, 12(22), 11522; https://doi.org/10.3390/app122211522 - 13 Nov 2022

Cited by 4 | Viewed by 2152

Abstract

Entity linking and predicate matching are two core tasks in Chinese Knowledge Base Question Answering (CKBQA). Compared with English entity linking, Chinese entity linking is far more complicated, which makes accurate linking difficult. Meanwhile, strengthening the correlation between entities and predicates is key to the accuracy of a question answering system. Therefore, we put forward BAT-KBQA, a Knowledge Base Question Answering framework based on transfer learning and feature-enhanced Bidirectional Encoder Representations from Transformers (BERT). The framework first performs Named Entity Recognition (NER), adapted to Chinese datasets through transfer learning and a Bidirectional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF) model. We use a BERT-CNN (Convolutional Neural Network) model to disambiguate between the entity in the question and the candidate entities; then, based on the resulting set of entities and predicates, a BERT-Softmax model with answer-entity predicate features is introduced for predicate matching. Finally, the entity and predicate scores are integrated to determine the definitive answer. The experimental results indicate that our model considerably enhances the overall performance of Knowledge Base Question Answering (KBQA) and has the potential to generalize. On the dataset of the NLPCC-ICCPOL2016 KBQA task, the model also outperforms BB-KBQA, with a mean F1 score of 87.74%.

(This article belongs to the Topic Recent Advances in Data Mining)
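The abstract states that entity and predicate scores are integrated to choose the answer but does not say how. A weighted sum is one simple possibility; the Python sketch below assumes that rule, with a hypothetical weight `alpha` and made-up candidate scores, and should not be read as BAT-KBQA's actual scoring.

```python
def pick_answer(candidates, alpha=0.5):
    """Choose the (entity, predicate) pair with the highest combined score.

    candidates: list of (entity, predicate, entity_score, predicate_score),
    e.g. from an entity-disambiguation model and a predicate-matching model.
    alpha: hypothetical weight on the entity score; 1 - alpha goes to the predicate.
    """
    if not candidates:
        raise ValueError("no candidate (entity, predicate) pairs")
    best = max(candidates, key=lambda c: alpha * c[2] + (1 - alpha) * c[3])
    return best[0], best[1]


cands = [("Entity A", "birthplace", 0.9, 0.2), ("Entity B", "birthplace", 0.6, 0.8)]
print(pick_answer(cands))             # ('Entity B', 'birthplace')
print(pick_answer(cands, alpha=0.9))  # ('Entity A', 'birthplace')
```

The chosen pair would then be looked up in the knowledge base, and the object of the matching triple returned as the answer.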

16 pages, 2240 KiB

Open Access Article

Intelligent Identification and Order-Sensitive Correction Method of Outliers from Multi-Data Source Based on Historical Data Mining

by Guangyu Chen, Zhengyang Zhu, Li Yang, Wenhao Huang, Yuzhuo Zhang, Gang Lin and Shengjie Zhang

Electronics 2022, 11(18), 2819; https://doi.org/10.3390/electronics11182819 - 07 Sep 2022

Cited by 2 | Viewed by 1335

Abstract

In recent years, outliers caused by manual operation errors and equipment acquisition failures have become common, posing challenges to big data analysis. To address the difficulty of identifying and correcting outliers in multi-source data, an intelligent identification and order-sensitive correction method for outliers from multiple data sources based on historical data mining is proposed. First, an intelligent method for identifying outliers in single-source data is proposed based on neural tangent kernel K-means (NTKKM) clustering. The original data are mapped to a high-dimensional feature space using the neural tangent kernel, where the features of outliers are captured by K-means clustering to identify outliers accurately. Second, an order-sensitive missing value imputation framework for multi-source data (OMSMVI) is proposed. A similarity graph of the sources with missing data is constructed based on multidimensional similarity analysis, and the filling-order decision is formulated as an optimization problem to determine the optimal order for filling missing values in multi-source data. Finally, a neighborhood-based imputation (NI) algorithm is proposed. Building on the traditional KNN imputation algorithm, it flexibly selects the neighboring nodes of sources with missing data to correct outliers accurately. A case study was conducted on real power grid data. The results show that the proposed clustering method identifies outliers more accurately and that the determined optimal imputation order yields higher accuracy, providing a feasible new approach to identifying and correcting outliers during data preprocessing.

(This article belongs to the Topic Recent Advances in Data Mining)
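The NI algorithm is described as building on traditional KNN filling. For reference, a plain KNN imputation baseline might look like the Python sketch below. It assumes Euclidean distance over the columns both rows observe, a caller-chosen `k`, and `None` as the missing-value marker; it fills cells in a fixed, order-insensitive way, which is the limitation the OMSMVI ordering step targets. It does not implement NTKKM, OMSMVI, or NI.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of the k nearest rows that observe that column.

    Distance: Euclidean over the columns observed in both rows (the target column
    excluded). Neighbours come from the original data only, so the result does not
    depend on the order in which missing cells are visited.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            scored = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if c != j and row[c] is not None and other[c] is not None]
                if not shared:
                    continue  # no common evidence to compare the two rows on
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared))
                scored.append((dist, other[j]))
            scored.sort(key=lambda pair: pair[0])
            nearest = scored[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


data = [[1.0, 2.0, None], [1.1, 2.1, 3.0], [0.9, 1.9, 5.0], [10.0, 10.0, 100.0]]
print(knn_impute(data, k=2)[0])  # [1.0, 2.0, 4.0]
```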

19 pages, 930 KiB

Open Access Article

Chinese Named Entity Recognition Based on Knowledge Based Question Answering System

by Didi Yin, Siyuan Cheng, Boxu Pan, Yuanyuan Qiao, Wei Zhao and Dongyu Wang

Appl. Sci. 2022, 12(11), 5373; https://doi.org/10.3390/app12115373 - 26 May 2022

Cited by 6 | Viewed by 2308

Abstract

A KBQA (Knowledge-Based Question Answering) system is an essential part of a smart customer service system. KBQA is a type of QA (Question Answering) system based on a KB (Knowledge Base); it aims to automatically answer natural language questions by retrieving structured data stored in the knowledge base. Generally, when a KBQA system receives a user's query, it first needs to recognize the topic entities of the query, such as names, locations, and organizations. This step is NER (Named Entity Recognition). In this paper, we use the Bidirectional Long Short-Term Memory-Conditional Random Field (Bi-LSTM-CRF) model and introduce the SoftLexicon method for a Chinese NER task. In addition, based on an analysis of the characteristics of the application scenario, we propose a fuzzy matching module that combines multiple methods. This module can efficiently correct erroneous recognition results, further improving entity recognition performance. We combine the NER model and the fuzzy matching module into an NER system. To explore the applicability of the system in specific domains, such as the power grid, we use raw power-grid data collected by the Hebei Electric Power Company to adapt our system to the characteristics of data in that domain. We construct a power-grid dataset and a high-frequency word lexicon, which improve the proposed NER system's recognition of power-grid entities. The system is validated with cross-validation. The experimental results show that the F1-score of the improved NER model on the power grid dataset reaches 92.43%. After the recognition results are processed by the fuzzy matching module, about 99% of the entities in the test set are correctly recognized. This demonstrates that the proposed NER system achieves excellent performance in the power grid application scenario. The results of this work will also help fill a gap in research on intelligent customer-service technologies for the power grid field in China.

(This article belongs to the Topic Recent Advances in Data Mining)
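The abstract does not specify which methods the fuzzy matching module combines. As a hedged illustration of the general idea (snapping a near-miss NER output to the closest entry in a domain lexicon), the Python sketch below uses only the standard library's `difflib` similarity ratio; the lexicon entries and the 0.6 cutoff are hypothetical, and English stand-ins replace Chinese power-grid terms.

```python
import difflib


def correct_entity(mention, lexicon, cutoff=0.6):
    """Snap an NER mention to the closest lexicon entry, or keep it unchanged.

    Uses difflib's SequenceMatcher-based similarity ratio; cutoff is the
    minimum ratio (0..1) required before a mention is replaced.
    """
    if mention in lexicon:
        return mention
    matches = difflib.get_close_matches(mention, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else mention


# Hypothetical domain lexicon (English stand-ins for power-grid terms).
LEXICON = ["transformer substation", "circuit breaker", "power outage"]
print(correct_entity("circuit braker", LEXICON))  # circuit breaker
print(correct_entity("meter reading", LEXICON))   # meter reading
```

Raising the cutoff trades recall for precision: fewer mentions get rewritten, but those that do are more likely to be genuine misrecognitions.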

14 pages, 2591 KiB

Open Access Article

One-Shot Fault Diagnosis of Wind Turbines Based on Meta-Analogical Momentum Contrast Learning

by Xiaobo Liu, Hantao Guo and Yibing Liu

Energies 2022, 15(9), 3133; https://doi.org/10.3390/en15093133 - 25 Apr 2022

Cited by 6 | Viewed by 1730

Abstract

The rapid development of artificial intelligence offers more opportunities for intelligent mechanical diagnosis. Fault diagnosis helps improve the reliability of wind turbines. For various reasons, such as the difficulty of obtaining fault data, random changes in operating conditions, and compound faults, many deep learning algorithms perform poorly. When fault samples are scarce, conventional deep learning models tend to overfit. Few-shot learning can effectively mitigate the overfitting caused by limited fault samples. This paper proposes a novel method based on meta-analogical momentum contrast learning (MA-MOCO) to address the scarcity of wind turbine fault samples, especially in the one-shot setting. By improving momentum contrast learning (MOCO) and adopting the meta-learning training paradigm, the method performs one-shot fault diagnosis of the wind turbine drivetrain. The proposed model achieves higher accuracy than other common models (e.g., model-agnostic meta-learning and Siamese networks) in one-shot learning. The learned feature embeddings are visualized with t-distributed stochastic neighbor embedding (t-SNE) to verify the effectiveness of the proposed model.

(This article belongs to the Topic Recent Advances in Data Mining)
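MA-MOCO builds on momentum contrast (MoCo), whose two standard ingredients are a key encoder updated as an exponential moving average of the query encoder, theta_k <- m * theta_k + (1 - m) * theta_q, and an InfoNCE loss that scores one positive key against a set of negative keys. The pure-Python sketch below shows only these two generic MoCo pieces, with parameters and embeddings as plain float lists and hypothetical values for `m` and the temperature `tau`; it does not include the meta-analogical training described in the abstract.

```python
import math


def momentum_update(key_params, query_params, m=0.999):
    """Key-encoder EMA step: theta_k <- m * theta_k + (1 - m) * theta_q."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, pos_key, neg_keys, tau=0.07):
    """InfoNCE loss for one query: softmax cross-entropy with the positive key as target.

    In MoCo the query and keys are L2-normalized embeddings and the negatives come
    from a queue of keys encoded in earlier batches.
    """
    logits = [dot(query, pos_key) / tau] + [dot(query, n) / tau for n in neg_keys]
    top = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[0]


print(momentum_update([1.0, 0.0], [0.0, 1.0], m=0.9))  # roughly [0.9, 0.1]
print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], tau=1.0))  # log(1 + e^-1), about 0.313
```

A large `m` keeps the key encoder changing slowly, which keeps the queued negative keys consistent with each other across batches.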
