Knowledge Engineering and Data Mining Volume II

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 20 June 2024 | Viewed by 13628

Special Issue Editors

Dr. Agnieszka Konys
Guest Editor
Faculty of Computer Science and Information Technology, West Pomeranian University of Technology Szczecin, Zolnierska 49, 71-210 Szczecin, Poland
Interests: ontology; knowledge representation; semantic web technologies; OWL; RDF; knowledge engineering; knowledge bases; knowledge management; reasoning; information extraction; ontology learning; sustainability; sustainability assessment; ontology evaluation

Prof. Dr. Agnieszka Nowak-Brzezińska
Guest Editor
Institute of Computer Science, Faculty of Science and Technology, University of Silesia, ul. Będzińska 39, 41-200 Sosnowiec, Poland
Interests: knowledge representation and reasoning; rule-based knowledge bases; outliers mining; expert systems; decision support systems; information retrieval systems

Special Issue Information

Dear Colleagues,

Extracting knowledge from data is a fundamental process in creating intelligent systems for information retrieval, decision support, and knowledge management. We welcome research on data mining methods, multidimensional data analysis, supervised and unsupervised learning methods, methods of knowledge base management, language ontologies, ontology learning, and related topics. We encourage authors to present new algorithms as well as practical solutions, i.e., applications and systems that demonstrate real deployments of the proposed research results.

This Special Issue covers the entire knowledge engineering pipeline, from data acquisition and data mining to knowledge extraction and exploitation. Its purpose is to gather the many researchers active in the field and to contribute to a collective effort in understanding current trends and open questions in knowledge engineering and data mining. Topics include, but are not limited to:

  • knowledge acquisition and engineering;
  • data mining methods;
  • big knowledge analytics;
  • data mining, knowledge discovery, and machine learning;
  • knowledge modeling and processing;
  • query and natural language processing;
  • data and information modeling;
  • data and information semantics;
  • data-intensive applications;
  • knowledge representation and reasoning;
  • decision support systems;
  • decision-making;
  • group decision-making;
  • rules mining;
  • outliers mining;
  • data exploration;
  • data science;
  • semantic web data and linked data;
  • ontologies and controlled vocabularies;
  • data acquisition;
  • multidimensional data analysis;
  • supervised and unsupervised learning methods;
  • parallel processing and modeling;
  • languages based on parallel programming and data mining.

Dr. Agnieszka Konys
Prof. Dr. Agnieszka Nowak-Brzezińska
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • knowledge engineering
  • knowledge representation and reasoning
  • decision support systems
  • knowledge acquisition
  • outliers mining
  • decision-making
  • data mining
  • data science
  • data exploration
  • multidimensional data analysis
  • supervised and unsupervised learning methods
  • ontology
  • knowledge-based systems
  • ontology learning
  • methods of knowledge base management
  • parallel processing and modeling
  • languages based on parallel programming and data mining

Published Papers (12 papers)


Research

14 pages, 415 KiB  
Article
GPT-Driven Source-to-Source Transformation for Generating Compilable Parallel CUDA Code for Nussinov’s Algorithm
by Marek Palkowski and Mateusz Gruzewski
Electronics 2024, 13(3), 488; https://doi.org/10.3390/electronics13030488 - 24 Jan 2024
Viewed by 850
Abstract
Designing automatic optimizing compilers is an advanced engineering process requiring a great deal of expertise, programming, testing, and experimentation. Maintaining such an approach and adapting it to evolving libraries and environments is a time-consuming effort. In recent years, OpenAI has presented the GPT model, which is applied in many fields such as computer science, image processing, linguistics, and medicine. It also supports automatic programming and translation between programming languages, as well as between human languages. This article verifies the usability of the widely known large language model GPT for translating the non-trivial NPDP code of Nussinov's parallel algorithm, written in the OpenMP standard, into an equivalent parallel CUDA code for NVIDIA graphics cards. The goal of this approach is to avoid creating any post-processing scripts or writing any lines of target code by hand. To validate the output code, we compare the resulting arrays with those calculated by the optimized CPU code generated by polyhedral compilers. Finally, the code is checked for scalability and performance. We concentrate on assessing the capabilities of GPT, highlighting common challenges that can be refined during future learning processes. This will enhance code generation for various platforms by leveraging the outcomes of polyhedral optimizers.
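For readers unfamiliar with the kernel being translated, below is a minimal Python sketch of Nussinov's dynamic-programming recurrence; the sequence, scoring rule, and loop order are illustrative assumptions, not the authors' benchmark code.

```python
# A minimal, illustrative sketch of Nussinov's DP recurrence -- the kernel
# the paper asks GPT to translate from OpenMP to CUDA. The sequence and
# the 0/1 pairing score are simplified assumptions.
def nussinov(seq):
    n = len(seq)
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    N = [[0] * n for _ in range(n)]
    for length in range(1, n):            # fill diagonal by diagonal
        for i in range(n - length):
            j = i + length
            best = N[i + 1][j - 1] + ((seq[i], seq[j]) in pairs)
            # bifurcation: split [i, j] at every k -- the inner loop that
            # dominates the NPDP workload and is the target of tiling
            for k in range(i, j):
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]

print(nussinov("GGGAAAUCC"))  # maximal number of base pairs
```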

12 pages, 1070 KiB  
Article
Prioritization of Scheduled Surgeries Using Fuzzy Decision Support and Risk Assessment Methods
by Luiza Fabisiak
Electronics 2024, 13(1), 90; https://doi.org/10.3390/electronics13010090 - 25 Dec 2023
Viewed by 530
Abstract
The aim of this study was to develop a method to minimize the risk of cancellation of planned surgery in hospital orthopedic departments. The paper proposes a method that combines multi-criteria and multi-faceted risk assessment using two techniques: the fuzzy TOPSIS (FTOPSIS) method combined with FMEA assessment. The FMEA method presented in this paper uses the FTOPSIS technique of prioritizing preferences according to similarity to the ideal solution, together with a belief structure, in order to overcome the shortcomings of traditional FMEA indicators. Finally, a numerical case study of process optimization for elective surgery in a Polish clinic, focused on planned hip replacements, is presented. The effectiveness of the method in assessing the main factors influencing the cancellation of elective surgery is demonstrated. High accuracy of the results and wide adaptability to other areas are features of this combination of methods. The problem addressed in this publication is the high rate of cancellation of elective surgery; the selection of relevant criteria, their importance, and the preferences of the patients were studied. The results of the method provide a viable action plan for the research problem posed. The proposed method is multifaceted and can be part of an information system to support the reorganization, restructuring, and modification of an operational process.
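To make the ranking step concrete, here is a minimal sketch of crisp TOPSIS, the backbone that FTOPSIS extends with fuzzy numbers; the decision matrix, weights, and criteria below are invented examples, and the fuzzy arithmetic and FMEA belief structure are omitted.

```python
# A minimal sketch of crisp TOPSIS ranking. Alternatives, weights, and
# criteria are made-up examples, not data from the paper's case study.
import numpy as np

def topsis(X, weights, benefit):
    """X: alternatives x criteria; benefit[j] True if higher is better."""
    R = X / np.linalg.norm(X, axis=0)          # vector normalization
    V = R * weights                            # weighted normalized matrix
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)  # distance to ideal solution
    d_neg = np.linalg.norm(V - anti, axis=1)   # distance to anti-ideal
    return d_neg / (d_pos + d_neg)             # closeness: higher = better

# three surgeries scored on (urgency, waiting time, cancellation risk)
X = np.array([[7.0, 30.0, 0.2], [9.0, 10.0, 0.5], [5.0, 60.0, 0.1]])
scores = topsis(X, weights=np.array([0.5, 0.3, 0.2]),
                benefit=np.array([True, True, False]))
print(scores.argsort()[::-1])  # priority order of scheduled surgeries
```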

14 pages, 9268 KiB  
Article
A Knowledge Graph Method towards Power System Fault Diagnosis and Classification
by Cheng Li and Bo Wang
Electronics 2023, 12(23), 4808; https://doi.org/10.3390/electronics12234808 - 28 Nov 2023
Viewed by 1062
Abstract
As the scale and complexity of electrical grids continue to expand, the necessity for robust fault detection techniques becomes increasingly urgent. This paper seeks to address the limitations of traditional fault detection approaches, such as the dependence on human experience, low efficiency, and a lack of logical relationships. In response, this study presents a cascaded model that leverages the Random Forest classifier in combination with knowledge reasoning. The proposed method exhibits high efficiency and accuracy in identifying six basic fault types. This approach not only simplifies fault detection and handling processes but also improves their interpretability. The paper begins by constructing a power fault simulation model based on the IEEE 14-bus system. Subsequently, a Random Forest classification model is developed and compared with other commonly used models such as Support Vector Machines (SVMs), k-Nearest Neighbors (KNN), and Naïve Bayes, using metrics such as the F1-score, accuracy, and confusion matrices. Our results reveal that the Random Forest classifier outperforms the other models, particularly on small-sample datasets, with an accuracy of 90%. We then apply knowledge mining technology to create a comprehensive knowledge graph of power faults. Finally, we use the TransE model for knowledge reasoning to enhance interpretability, assist decision making, and validate its reliability.
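A hedged sketch of the model comparison described in the abstract, using scikit-learn on synthetic stand-in data (the authors' IEEE 14-bus simulation data are not reproduced here):

```python
# Comparing Random Forest against SVM, KNN, and Naive Bayes with accuracy
# and macro F1, as in the paper; the six-class synthetic features below
# are an assumed stand-in for simulated fault signatures.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# small-sample regime with six fault classes, mirroring the paper's setup
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {"RF": RandomForestClassifier(random_state=0), "SVM": SVC(),
          "KNN": KNeighborsClassifier(), "NB": GaussianNB()}
for name, clf in models.items():
    y_hat = clf.fit(X_tr, y_tr).predict(X_te)
    print(name, accuracy_score(y_te, y_hat),
          f1_score(y_te, y_hat, average="macro"))
```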

21 pages, 1202 KiB  
Article
How Far Have We Progressed in the Sampling Methods for Imbalanced Data Classification? An Empirical Study
by Zhongbin Sun, Jingqi Zhang, Xiaoyan Zhu and Donghong Xu
Electronics 2023, 12(20), 4232; https://doi.org/10.3390/electronics12204232 - 13 Oct 2023
Viewed by 644
Abstract
Imbalanced data are ubiquitous in many real-world applications, and they have drawn a significant amount of attention in the field of data mining. A variety of methods have been proposed for imbalanced data classification, and data sampling methods are the most prevalent because they are independent of the classification algorithm. However, given the growing number of sampling methods, there is no consensus about which sampling method performs best, and contradictory conclusions have been reported. Therefore, in the present study, we conducted an extensive comparison of 16 different sampling methods with four popular classification algorithms, using 75 imbalanced binary datasets from several application domains. Four widely used measures were employed to evaluate the corresponding classification performance. The experimental results showed that none of the sampling methods performed best and stably across all the classification algorithms and evaluation measures. Furthermore, we found that the performance of the different sampling methods was usually affected by the classification algorithm employed. It is therefore important for practitioners and researchers to select appropriate sampling methods and classification algorithms jointly when handling the imbalanced data problems at hand.
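In the spirit of the study's protocol, the sketch below compares a few samplers from the imbalanced-learn library on one synthetic dataset; the sampler subset, classifier, and metric are illustrative choices, not the paper's full 16 x 4 x 75 grid.

```python
# Resample the training split only, then score on the untouched test split.
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

# 9:1 imbalanced binary dataset as an assumed stand-in
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

samplers = {"none": None, "SMOTE": SMOTE(random_state=0),
            "over": RandomOverSampler(random_state=0),
            "under": RandomUnderSampler(random_state=0)}
for name, s in samplers.items():
    Xr, yr = (X_tr, y_tr) if s is None else s.fit_resample(X_tr, y_tr)
    clf = DecisionTreeClassifier(random_state=0).fit(Xr, yr)
    print(name, balanced_accuracy_score(y_te, clf.predict(X_te)))
```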

19 pages, 582 KiB  
Article
Electrical Power Edge-End Interaction Modeling with Time Series Label Noise Learning
by Zhenshang Wang, Mi Zhou, Yuming Zhao, Fan Zhang, Jing Wang, Bin Qian, Zhen Liu, Peitian Ma and Qianli Ma
Electronics 2023, 12(18), 3987; https://doi.org/10.3390/electronics12183987 - 21 Sep 2023
Cited by 1 | Viewed by 797
Abstract
In the context of electrical power systems, modeling the edge-end interaction involves understanding the dynamic relationship between different components and endpoints of the system. However, the electrical power time series obtained from user terminals often suffer from low-quality issues such as missing values, numerical anomalies, and noisy labels. These issues can easily reduce the robustness of data mining results for edge-end interaction models. Therefore, this paper proposes a time–frequency noisy label classification (TF-NLC) model, which improves the robustness of edge-end interaction models when dealing with low-quality data. Specifically, we employ two deep neural networks that are trained concurrently, operating on the time and frequency domains, respectively. The two networks mutually guide each other's classification training by selecting clean labels from the small-loss data within each batch. To further improve the robustness of the classification of time- and frequency-domain feature representations, we introduce a time–frequency domain consistency contrastive learning module. By basing the selection of clean labels on time–frequency representations for mutually guided training, TF-NLC can effectively mitigate the negative impact of noisy labels on model training. Extensive experiments on eight electrical power and ten other realistic time series datasets show that the proposed TF-NLC achieves advanced classification performance under different noisy label scenarios. The ablation and visualization experiments further demonstrate the robustness of the proposed method.
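The small-loss co-selection idea at the heart of TF-NLC can be sketched in a few lines of PyTorch; the time/frequency encoders and the contrastive module are omitted, and the keep ratio is an assumed hyperparameter.

```python
# Co-teaching-style small-loss selection: each network picks the
# lowest-loss (presumed clean) samples in a batch and hands them to
# its peer for the peer's update.
import torch
import torch.nn.functional as F

def small_loss_selection(logits_a, logits_b, labels, keep_ratio=0.7):
    """Return the index sets each network trains on, chosen by its peer."""
    n_keep = int(keep_ratio * labels.size(0))
    loss_a = F.cross_entropy(logits_a, labels, reduction="none")
    loss_b = F.cross_entropy(logits_b, labels, reduction="none")
    idx_for_b = torch.argsort(loss_a)[:n_keep]  # A selects clean set for B
    idx_for_a = torch.argsort(loss_b)[:n_keep]  # B selects clean set for A
    return idx_for_a, idx_for_b

# toy batch: 8 samples, 3 classes
logits_a, logits_b = torch.randn(8, 3), torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
idx_a, idx_b = small_loss_selection(logits_a, logits_b, labels)
loss_a = F.cross_entropy(logits_a[idx_a], labels[idx_a])  # update net A
loss_b = F.cross_entropy(logits_b[idx_b], labels[idx_b])  # update net B
```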

14 pages, 296 KiB  
Article
Time and Energy Benefits of Using Automatic Optimization Compilers for NPDP Tasks
by Marek Palkowski and Mateusz Gruzewski
Electronics 2023, 12(17), 3579; https://doi.org/10.3390/electronics12173579 - 24 Aug 2023
Cited by 1 | Viewed by 725
Abstract
In this article, we analyze program codes generated automatically by three advanced optimizers, Pluto, Traco, and Dapt, applied to the NPDP benchmark set. This benchmark set comprises ten program loops, predominantly from the field of bioinformatics. The codes exemplify dynamic programming, a challenging task for well-known program loop optimization tools. Given the intricacy involved, we opted for three automatic compilers based on the polyhedral model and various loop-tiling strategies. In evaluating the performance of the generated code, we carefully considered locality and concurrency to accurately estimate time and energy efficiency. Notably, we dedicated significant attention to the latest Dapt compiler, which applies space–time loop tiling to generate highly efficient code for the NPDP benchmark suite loops. By employing the aforementioned optimizers and conducting an in-depth analysis, we aim to demonstrate the effectiveness and potential of automatic transformation techniques in enhancing the performance and energy efficiency of dynamic programming codes.
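As a toy illustration of the loop tiling these polyhedral compilers apply, the Python sketch below tiles a matrix product for cache locality; the actual space–time tilings of the NPDP loops are considerably more involved, and the tile size here is an assumption.

```python
# Loop tiling in miniature: iterate over blocks so each tile of A, B,
# and C stays hot in cache while it is reused.
import numpy as np

def matmul_tiled(A, B, tile=32):
    n = A.shape[0]
    C = np.zeros((n, n))
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                C[ii:ii+tile, jj:jj+tile] += (
                    A[ii:ii+tile, kk:kk+tile] @ B[kk:kk+tile, jj:jj+tile])
    return C

n = 64
A, B = np.random.rand(n, n), np.random.rand(n, n)
assert np.allclose(matmul_tiled(A, B), A @ B)  # same result, better locality
```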

17 pages, 559 KiB  
Article
A New Method for Graph-Based Representation of Text in Natural Language Processing
by Barbara Probierz, Anita Hrabia and Jan Kozak
Electronics 2023, 12(13), 2846; https://doi.org/10.3390/electronics12132846 - 27 Jun 2023
Cited by 1 | Viewed by 1806
Abstract
Natural language processing is still an emerging field in machine learning. Access to ever more datasets in textual form, new applications for artificial intelligence, and the need for simple communication with operating systems all affect the importance of natural language processing in evolving artificial intelligence. Traditional methods of text representation, such as Bag-of-Words, have limitations that result from the lack of consideration of semantics and dependencies between words. Therefore, we propose a new approach based on graph representations, which takes into account both the local context and the global relationships between words, allowing for a more expressive textual representation. The aim of the paper is to examine the possibility of using graph representations in natural language processing and to demonstrate their use in text classification. An innovative element of the proposed approach is the use of cliques shared across the graphs representing documents to create a feature vector. Experiments confirm that the proposed approach can improve classification efficiency. Using the new text representation method to predict book categories based on an analysis of their content resulted in accuracy, precision, recall, and an F1-score of over 90%. Moving from traditional approaches to a graph-based approach could make a big difference in natural language processing and text analysis and could open up new opportunities in the field.
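A small sketch of the clique-based representation: build a word co-occurrence graph per document with networkx and take its maximal cliques as candidate features. The windowing rule and example text are assumptions, not the authors' exact pipeline.

```python
# Represent a document as a word co-occurrence graph; cliques shared
# across documents then become slots in a feature vector.
import networkx as nx

def cooccurrence_graph(tokens, window=2):
    G = nx.Graph()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window + 1]:
            if v != w:
                G.add_edge(w, v)  # edge = words co-occur within the window
    return G

doc = "graph based text representation for natural language processing".split()
G = cooccurrence_graph(doc)
cliques = [frozenset(c) for c in nx.find_cliques(G) if len(c) >= 2]
print(cliques[:5])  # maximal cliques as candidate features
```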

17 pages, 1213 KiB  
Article
Knowledge Discovery in Databases for a Football Match Result
by Szymon Głowania, Jan Kozak and Przemysław Juszczuk
Electronics 2023, 12(12), 2712; https://doi.org/10.3390/electronics12122712 - 17 Jun 2023
Viewed by 1219
Abstract
The analysis of sports data and the possibility of using machine learning to predict sports results is an increasingly popular topic of research and application. The main problem, apart from choosing the right algorithm, is obtaining data that allow for effective prediction. This article presents a comprehensive KDD (Knowledge Discovery in Databases) approach that allows for the appropriate preparation of sports data for result prediction. The first part of the article covers KDD and sports data. The next section presents an approach to developing a dataset on top football leagues. The developed datasets are the main contribution of the article and have been made publicly available to the research community. In the final part of the article, an experiment based on heterogeneous groups of classifiers and the developed datasets is presented, together with its results.
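A heterogeneous group of classifiers of the kind used in the experiment can be sketched with scikit-learn's VotingClassifier; the synthetic features standing in for match statistics and the choice of base learners are assumptions.

```python
# A heterogeneous ensemble: different model families vote on the outcome.
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# three-class target: home win / draw / away win
X, y = make_classification(n_samples=600, n_features=15, n_informative=8,
                           n_classes=3, random_state=0)
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("nb", GaussianNB())],
    voting="soft")  # average predicted probabilities across the group
print(cross_val_score(ensemble, X, y, cv=5).mean())
```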

11 pages, 598 KiB  
Article
Handling the Complexity of Computing Maximal Consistent Blocks
by Teresa Mroczek
Electronics 2023, 12(10), 2295; https://doi.org/10.3390/electronics12102295 - 19 May 2023
Viewed by 745
Abstract
The maximal consistent blocks technique, adopted from discrete mathematics, describes maximal collections of objects in which all objects are indiscernible in terms of the available information. In this paper, we estimate the total possible number of maximal consistent blocks and prove that the number of such blocks may grow exponentially with respect to the number of attributes for incomplete data with “do not care” conditions. The results indicate that the time complexity of some known algorithms for computing maximal consistent blocks has been underestimated so far. Taking this complexity into account, for the practical use of such blocks, we propose a performance improvement involving the parallelization of the maximal consistent block construction method.
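Under the common formalization in which two objects are consistent when their attribute values agree or at least one is “do not care” (*), maximal consistent blocks correspond to the maximal cliques of the consistency graph, which is the source of the exponential blow-up. A tiny illustrative sketch, with a made-up dataset:

```python
# Maximal consistent blocks as maximal cliques of the consistency graph.
import networkx as nx

def consistent(a, b):
    # values match, or at least one is "do not care"
    return all(x == y or x == "*" or y == "*" for x, y in zip(a, b))

data = [("1", "*"), ("*", "0"), ("1", "0"), ("0", "0")]
G = nx.Graph()
G.add_nodes_from(range(len(data)))
G.add_edges_from((i, j) for i in range(len(data))
                 for j in range(i + 1, len(data))
                 if consistent(data[i], data[j]))
print(list(nx.find_cliques(G)))  # maximal consistent blocks, as index sets
```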

18 pages, 628 KiB  
Article
Householder Transformation-Based Temporal Knowledge Graph Reasoning
by Xiaojuan Zhao, Aiping Li, Rong Jiang, Kai Chen and Zhichao Peng
Electronics 2023, 12(9), 2001; https://doi.org/10.3390/electronics12092001 - 26 Apr 2023
Cited by 1 | Viewed by 1502
Abstract
Knowledge graph reasoning is of great significance for the further development of artificial intelligence and information retrieval, especially reasoning over temporal knowledge graphs. Rotation-based methods have been shown to be effective at modeling entities and relations on a knowledge graph. However, due to their limited capability to represent temporal information, existing approaches can only model some relational patterns and cannot handle temporal combination reasoning. In this regard, we propose HTTR: Householder Transformation-based Temporal knowledge graph Reasoning, which focuses on the characteristics of relations that evolve over time. HTTR first fuses the relational and temporal information in the knowledge graph, then uses the Householder transformation to obtain an orthogonal matrix from the fused information, and finally defines this orthogonal matrix as the rotation of the head entity to the tail entity and calculates the similarity between the rotated vector and the vector representation of the tail entity. In addition, we compare three methods for fusing relational and temporal information; other fusion methods may replace the current one as long as the dimensionality satisfies the requirements. We show that HTTR outperforms state-of-the-art methods in temporal knowledge graph reasoning tasks and is able to learn and infer all four relational patterns over time: symmetric reasoning, antisymmetric reasoning, inversion reasoning, and temporal combination reasoning.
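The core HTTR operation, as described, can be sketched in numpy: build an orthogonal Householder matrix from a fused relation–time vector and rotate the head-entity embedding toward the tail. The additive fusion and cosine scoring below are simplifying assumptions; only the Householder construction itself is standard.

```python
# Householder matrix H = I - 2vv^T (v unit-norm) is orthogonal, so it
# acts as a rotation/reflection of the head-entity embedding.
import numpy as np

def householder(v):
    v = v / np.linalg.norm(v)
    return np.eye(v.size) - 2.0 * np.outer(v, v)

rng = np.random.default_rng(0)
d = 8
head, tail = rng.normal(size=d), rng.normal(size=d)
relation, time = rng.normal(size=d), rng.normal(size=d)

H = householder(relation + time)        # fuse relation and temporal info
rotated = H @ head                      # rotate the head entity
score = rotated @ tail / (np.linalg.norm(rotated) * np.linalg.norm(tail))
print(score)                            # cosine similarity to the tail

assert np.allclose(H @ H.T, np.eye(d))  # Householder matrices are orthogonal
```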

25 pages, 1646 KiB  
Article
The Impact of Data Quality on Software Testing Effort Prediction
by Łukasz Radliński
Electronics 2023, 12(7), 1656; https://doi.org/10.3390/electronics12071656 - 31 Mar 2023
Cited by 1 | Viewed by 1314
Abstract
Background: This paper investigates the impact of data quality on the performance of models predicting software testing effort. Data quality was reflected by training data filtering strategies (data variants) covering combinations of Data Quality Rating, UFP Rating, and a threshold of valid cases. Methods: The experiment used the ISBSG dataset and 16 machine learning models. A process of three-fold cross-validation repeated 20 times was used to train and evaluate each model with each data variant. Model performance was assessed using absolute prediction errors. A ‘win–tie–loss’ procedure, based on the Wilcoxon signed-rank test, was applied to identify the best models and data variants. Results: Most models, especially the most accurate ones, performed best on the complete dataset, even though it contained cases with low data ratings. The detailed results include rankings of the following: (1) models for particular data variants, (2) data variants for particular models, and (3) the best-performing combinations of models and data variants. Conclusions: The arbitrary and restrictive selection of only projects with a Data Quality Rating and UFP Rating of ‘A’ or ‘B’, commonly used in the literature, does not seem justified. It is recommended not to exclude cases with low data ratings, in order to achieve better accuracy for most predictive models of testing effort.
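A hedged sketch of the evaluation protocol: repeated three-fold cross-validation, absolute errors, and a Wilcoxon signed-rank ‘win–tie–loss’ decision between two models. The models, data, and significance threshold are stand-ins, not the ISBSG setup or the paper's 16 learners.

```python
# Collect per-case absolute errors over 3-fold CV repeated 20 times,
# then compare two models with the Wilcoxon signed-rank test.
import numpy as np
from scipy.stats import wilcoxon
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RepeatedKFold

X, y = make_regression(n_samples=200, n_features=10, noise=20, random_state=0)
cv = RepeatedKFold(n_splits=3, n_repeats=20, random_state=0)  # 3-fold x 20

err_a, err_b = [], []
for tr, te in cv.split(X):
    err_a += list(np.abs(y[te] - LinearRegression()
                         .fit(X[tr], y[tr]).predict(X[te])))
    err_b += list(np.abs(y[te] - RandomForestRegressor(random_state=0)
                         .fit(X[tr], y[tr]).predict(X[te])))

stat, p = wilcoxon(err_a, err_b)
if p >= 0.05:
    print("tie")                       # no significant difference
else:
    print("win for", "A" if np.mean(err_a) < np.mean(err_b) else "B")
```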

23 pages, 9327 KiB  
Article
Influence of Different Data Interpolation Methods for Sparse Data on the Construction Accuracy of Electric Bus Driving Cycle
by Xingxing Wang, Peilin Ye, Yelin Deng, Yinnan Yuan, Yu Zhu and Hongjun Ni
Electronics 2023, 12(6), 1377; https://doi.org/10.3390/electronics12061377 - 13 Mar 2023
Cited by 3 | Viewed by 1382
Abstract
Battery electric vehicles (BEVs) are among the most promising new energy vehicle models for industrialization and marketization at this stage, and they are an important way to address urban haze pollution and high fuel costs and to support the sustainable development of the automobile industry. This paper takes pure electric buses as the research object and, relying on the operation information management platform of new energy buses in Nantong city, proposes an electric bus driving cycle construction method based on a mixed interpolation method for processing sparse data. Three different interpolation methods, linear interpolation, step interpolation, and mixed interpolation, were used to preprocess the collected data. Principal component analysis and the K-means clustering algorithm were used to reduce and classify the characteristic parameter matrix. According to the clustering results, libraries of moving sections and idle sections of different categories were established. Based on section duration and the correlation among the various types, several moving sections and idle sections were selected to form a representative driving cycle of Nantong city buses. The results show that the mixed interpolation method, based on linear interpolation and cubic spline interpolation, has a good processing effect: the average relative error between the synthesized driving cycle and the measured data is 15.71%, and the relative error of the seven characteristic parameters is less than 10%, which meets the development requirements. In addition, a comparison with the characteristic parameters of typical worldwide driving cycles (NEDC, WLTC) shows that the constructed driving cycle for Nantong city is reasonable and reliable in representing the driving conditions of pure electric buses there, and it can provide a reference for the optimization of bus energy control strategies.
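The mixed interpolation idea can be sketched with scipy: fill short gaps in a sparse speed series linearly and hand longer gaps to a cubic spline. The gap threshold and the speed signal below are assumptions for demonstration, not the paper's actual rule or bus data.

```python
# Mixed interpolation: linear fill for short gaps, cubic spline for long ones.
import numpy as np
from scipy.interpolate import interp1d, CubicSpline

t_known = np.array([0, 1, 2, 6, 7, 8, 15, 16])     # seconds with GPS data
v_known = np.array([0, 5, 9, 20, 22, 23, 10, 6])   # bus speed, km/h
t_all = np.arange(0, 17)

linear = interp1d(t_known, v_known)(t_all)
spline = CubicSpline(t_known, v_known)(t_all)

gaps = np.diff(t_known)
v_filled = linear.copy()
for i, g in enumerate(gaps):
    if g > 3:  # long gap: trust the smoother cubic spline instead
        seg = (t_all > t_known[i]) & (t_all < t_known[i + 1])
        v_filled[seg] = spline[seg]
print(np.round(v_filled, 1))
```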