Topic Editors

Prof. Dr. Qingshan Jiang
Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Dr. John (Junhu) Wang
School of Information and Communication Technology, Griffith University, Brisbane, Australia
Dr. Min Yang
Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

Recent Advances in Data Mining

Abstract submission deadline
closed (31 March 2023)
Manuscript submission deadline
closed (31 May 2023)
Viewed by
13202

Topic Information

Dear Colleagues,

Data mining is the process of identifying valid, potentially useful, and understandable information; detecting patterns; building knowledge graphs; and finding anomalies and relationships in big data with Artificial-Intelligence-enabled IoT (AIoT). This process is essential for advancing knowledge in fields that deal with raw web, text, numeric, media, or financial transaction data. Its scope has expanded through the hybridization of data mining algorithms for use in financial technology and cryptocurrency, blockchain, data science, sentiment analysis, and recommender systems. Moreover, data mining provides advantages in many practical fields, such as privacy-preserving health data analysis and mining, biology, data security, smart cities, and smart grids.

It is also necessary to investigate recent advances in data mining that incorporate machine learning algorithms and artificial neural networks. Among the fields of artificial intelligence, machine learning and deep learning have been some of the most studied in recent years. The advent of deep learning has brought a massive shift over the last few decades, opening up unprecedented theoretical and application-based opportunities for data mining.

This Topic will present a collection of articles reflecting the latest developments in data mining and related fields, investigating both practical and theoretical applications; knowledge discovery and extraction; image analysis; classification and clustering; FinTech and cryptocurrency; blockchain and data security; privacy-preserving data mining; and many others. Contributions focused on both theoretical and practical models are welcome. Papers will be selected for inclusion based on their formal and technical soundness, experimental support, and relevance.

Prof. Dr. Qingshan Jiang
Dr. John (Junhu) Wang
Dr. Min Yang
Topic Editors

Keywords

  • data mining
  • web mining
  • text mining
  • graph mining
  • classification
  • clustering
  • machine learning
  • deep learning
  • knowledge graph
  • knowledge discovery and extraction
  • artificial intelligence
  • statistical modeling
  • privacy-preserving data mining
  • social networks analysis
  • natural language processing applications
  • recommendation systems
  • big data storage systems
  • big data analysis
  • data management and analysis
  • FinTech data analysis and cryptocurrency
  • blockchain data security

Participating Journals

Journal Name      Impact Factor  CiteScore  Launched Year  First Decision (median)  APC
Algorithms        2.3            3.7        2008           15 Days                  CHF 1600
Applied Sciences  2.7            4.5        2011           16.9 Days                CHF 2400
Electronics       2.9            4.7        2012           15.6 Days                CHF 2400
Energies          3.2            5.5        2008           16.1 Days                CHF 2600
Mathematics       2.4            3.5        2013           16.9 Days                CHF 2600

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to enjoy the benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being stolen with this time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (7 papers)

16 pages, 1756 KiB  
Article
Effective Event Extraction Method via Enhanced Graph Convolutional Network Indication with Hierarchical Argument Selection Strategy
by Zheng Liu, Yimeng Li, Yu Zhang, Yu Weng, Kunyu Yang and Chaomurilige
Electronics 2023, 12(13), 2981; https://doi.org/10.3390/electronics12132981 - 06 Jul 2023
Cited by 1 | Viewed by 874
Abstract
As one of the foundation technologies for massive data processing in AI, event mining, which mainly comprises event detection (event trigger identification and event classification) and argument extraction, is attracting more and more attention. At present, EE-GCN is one of the most effective methods for event detection. However, since EE-GCN only focuses on event detection, complete event multi-tuple extraction needs to be improved. Inspired by the EE-GCN event detection method, this paper proposes an effective event extraction method via graph convolutional network indication with a hierarchical argument selection strategy. The method mainly includes the following steps. (1) Based on the ACE2005 argument extraction template, a new argument extraction template is established for the Baidu event extraction dataset. (2) The trigger events and event classification detected by EE-GCN are used as indicators to determine the argument extraction template, and the alternative arguments are extracted via named entity recognition based on the determined template. (3) The side information of the EE-GCN graph is fully used to compute the local and global correlation degrees, and the final argument multi-tuple is determined based on these degrees. (4) Finally, several experiments are conducted on the Baidu event extraction dataset to compare the proposed method with other methods. The experimental results show that the proposed method improves the accuracy and completeness of event extraction compared to other existing methods.
(This article belongs to the Topic Recent Advances in Data Mining)

13 pages, 2179 KiB  
Article
Multi-Modal Entity Alignment Method Based on Feature Enhancement
by Huansha Wang, Qinrang Liu, Ruiyang Huang and Jianpeng Zhang
Appl. Sci. 2023, 13(11), 6747; https://doi.org/10.3390/app13116747 - 01 Jun 2023
Viewed by 954
Abstract
Multi-modal entity alignment refers to identifying equivalent entities between two different multi-modal knowledge graphs that consist of multi-modal information such as structural triples and descriptive images. Most previous multi-modal entity alignment methods have mainly used a corresponding encoder for each modality to encode entity information and then performed feature fusion to obtain the multi-modal joint representation. However, this approach does not fully utilize the multi-modal information of aligned entities. To address this issue, we propose MEAFE, a multi-modal entity alignment method based on feature enhancement. MEAFE adopts a multi-modal pre-trained model, an OCR model, and a GATv2 network to enhance the model’s ability to extract useful features from entity structure triplet information and image descriptions, respectively, thereby generating more effective multi-modal representations. It further adds modal distribution information of the entity to enhance the model’s understanding and modeling ability of the multi-modal information. Experiments on bilingual and cross-graph multi-modal datasets demonstrate that the proposed method outperforms models that use traditional feature extraction methods.

16 pages, 543 KiB  
Article
Incrementally Mining Column Constant Biclusters with FVSFP Tree
by Jiaxuan Zhang, Xueyong Wang and Jie Liu
Appl. Sci. 2023, 13(11), 6458; https://doi.org/10.3390/app13116458 - 25 May 2023
Viewed by 620
Abstract
Bicluster mining has been frequently studied in the data mining field. Because column constant biclusters (CCB) can be transformed into discriminative rules, they have been widely applied in various fields. However, no research on incrementally mining CCB has been reported in the literature. In real situations, due to the limitation of computation resources (such as memory), it is impossible to mine biclusters from very large datasets. Therefore, in this study, we propose a method for incrementally mining CCB. CCB can be regarded as a special case of frequent patterns (FP). Currently, the most frequently used method for incrementally mining frequent patterns is the FP-tree-based method, and our method accordingly relies on a modified FP tree data structure. The technical contributions lie in two aspects. First, we propose a modified FP tree data structure, namely the Feature Value Sorting Frequent Pattern (FVSFP) tree, which can be easily maintained. Second, we design a method for mining CCB from the FVSFP tree. To verify the performance of the proposed method, it is tested on several datasets. Experimental results demonstrate that the proposed method performs well when incrementally handling a newly added dataset.

20 pages, 676 KiB  
Article
Enhancement of Question Answering System Accuracy via Transfer Learning and BERT
by Kai Duan, Shiyu Du, Yiming Zhang, Yanru Lin, Hongzhuo Wu and Quan Zhang
Appl. Sci. 2022, 12(22), 11522; https://doi.org/10.3390/app122211522 - 13 Nov 2022
Cited by 4 | Viewed by 2133
Abstract
Entity linking and predicate matching are two core tasks in Chinese Knowledge Base Question Answering (CKBQA). Compared with the English entity linking task, Chinese entity linking is extremely complicated, making accurate Chinese entity linking difficult. Meanwhile, strengthening the correlation between entities and predicates is the key to the accuracy of the question answering system. Therefore, we put forward a Bidirectional Encoder Representation from Transformers and transfer learning Knowledge Base Question Answering (BAT-KBQA) framework, which is based on feature-enhanced Bidirectional Encoder Representation from Transformers (BERT). We then perform a Named Entity Recognition (NER) task appropriate for Chinese datasets using transfer learning and the Bidirectional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF) model. We utilize a BERT-CNN (Convolutional Neural Network) model for entity disambiguation of the problem and candidate entities; based on the set of entities and predicates, a BERT-Softmax model with answer entity predicate features is introduced for predicate matching. Finally, the entity and predicate scores are integrated to determine the definitive answer. The experimental results indicate that our model considerably enhances the overall performance of Knowledge Base Question Answering (KBQA) and has the potential to generalize. On the dataset supplied by the NLPCC-ICCPOL2016 KBQA task, the model also outperforms BB-KBQA, with a mean F1 score of 87.74%.

16 pages, 2240 KiB  
Article
Intelligent Identification and Order-Sensitive Correction Method of Outliers from Multi-Data Source Based on Historical Data Mining
by Guangyu Chen, Zhengyang Zhu, Li Yang, Wenhao Huang, Yuzhuo Zhang, Gang Lin and Shengjie Zhang
Electronics 2022, 11(18), 2819; https://doi.org/10.3390/electronics11182819 - 07 Sep 2022
Cited by 2 | Viewed by 1327
Abstract
In recent years, outliers caused by manual operation errors and equipment acquisition failures often occur, bringing challenges to big data analysis. In view of the difficulties in identifying and correcting outliers of multi-source data, an intelligent identification and order-sensitive correction method of outliers from multi-data sources based on historical data mining is proposed. First, an intelligent identification method for outliers of single-source data is proposed based on neural tangent kernel K-means (NTKKM) clustering. The original data are mapped to a high-dimensional feature space using the Neural Tangent Kernel, where the features of outliers are acquired by K-means clustering to realize the accurate identification of outliers. Second, an order-sensitive missing value imputation framework for multi-source data (OMSMVI) is proposed. The similarity graph of sources with missing data is constructed based on multidimensional similarity analysis, and the filling order decision is transformed into an optimization problem to realize the optimal filling order decision of missing values in multi-source data. Finally, a neighborhood-based imputation (NI) algorithm is proposed. Based on the traditional KNN filling algorithm, neighboring nodes of sources with missing data are flexibly selected to achieve the accurate correction of outliers. A case experiment was conducted on actual power grid data, and the results show that the proposed clustering method can identify outliers more accurately and that the determined optimal imputation sequence has higher accuracy, which provides a feasible new idea for the identification and correction of outliers in the process of data preprocessing.

19 pages, 930 KiB  
Article
Chinese Named Entity Recognition Based on Knowledge Based Question Answering System
by Didi Yin, Siyuan Cheng, Boxu Pan, Yuanyuan Qiao, Wei Zhao and Dongyu Wang
Appl. Sci. 2022, 12(11), 5373; https://doi.org/10.3390/app12115373 - 26 May 2022
Cited by 6 | Viewed by 2297
Abstract
The KBQA (Knowledge-Based Question Answering) system is an essential part of the smart customer service system. KBQA is a type of QA (Question Answering) system based on a KB (Knowledge Base). It aims to automatically answer natural language questions by retrieving structured data stored in the knowledge base. Generally, when a KBQA system receives the user’s query, it first needs to recognize the topic entities of the query, such as name, location, organization, etc. This process is NER (Named Entity Recognition). In this paper, we use the Bidirectional Long Short-Term Memory-Conditional Random Field (Bi-LSTM-CRF) model and introduce the SoftLexicon method for a Chinese NER task. At the same time, based on an analysis of the characteristics of the application scenario, we propose a fuzzy matching module that combines multiple methods. This module can efficiently correct erroneous recognition results, which can further improve the performance of entity recognition. We combine the NER model and the fuzzy matching module into an NER system. To explore the applicability of the system in specific fields, such as the power grid field, we utilize the power grid-related original data collected by the Hebei Electric Power Company to improve our system according to the characteristics of data in the power grid field. We construct a dataset and a high-frequency word lexicon for the power grid field, which makes our proposed NER system perform better in recognizing entities in this field. We used the cross-validation method for validation. The experimental results show that the F1-score of the improved NER model on the power grid dataset reaches 92.43%. After processing the recognition results with the fuzzy matching module, about 99% of the entities in the test set can be correctly recognized. This shows that the proposed NER system can achieve excellent performance in the application scenario of a power grid. The results of this work will also fill the gap in the research of intelligent customer-service-related technologies in the power grid field in China.

14 pages, 2591 KiB  
Article
One-Shot Fault Diagnosis of Wind Turbines Based on Meta-Analogical Momentum Contrast Learning
by Xiaobo Liu, Hantao Guo and Yibing Liu
Energies 2022, 15(9), 3133; https://doi.org/10.3390/en15093133 - 25 Apr 2022
Cited by 6 | Viewed by 1715
Abstract
The rapid development of artificial intelligence offers more opportunities for intelligent mechanical diagnosis. Fault diagnosis of wind turbines helps improve their reliability. Due to various reasons, such as difficulty in obtaining fault data, random changes in operating conditions, or compound faults, many deep learning algorithms show poor performance. When fault samples are few, ordinary deep learning falls into overfitting. Few-shot learning can effectively solve the problem of overfitting caused by fewer fault samples. A novel method based on meta-analogical momentum contrast learning (MA-MOCO) is proposed in this paper to address the problem of very few wind turbine failure samples, especially the one-shot case. By improving momentum contrast learning (MOCO) and using the training idea of meta-learning, the one-shot fault diagnosis of the wind turbine drivetrain is analyzed. The proposed model shows a higher accuracy than other common models (e.g., model-agnostic meta-learning and Siamese net) in one-shot learning. The feature embedding is visualized by t-distributed stochastic neighbor embedding (t-SNE) in order to test the effectiveness of the proposed model.