International Database Engineered Applications

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Processes".

Deadline for manuscript submissions: 31 July 2024 | Viewed by 15997

Special Issue Editor


Guest Editor
Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
Interests: databases; computational linguistics; bioinformatics; geoinformatics

Special Issue Information

Dear Colleagues,

We are inviting selected authors to submit extended versions of their papers from the 26th International Database Engineered Applications Symposium (IDEAS) to the journal Information (ISSN 2078-2489, ESCI- and Scopus-indexed, with a CiteScore of 4.2). The extended version must provide a minimum of 50% new content and must not overlap with the proceedings paper by more than 30%. In addition, authors who work on topics related to the IDEAS 2022 conference may be considered upon writing to the Guest Editor to express their interest in participating in the Special Issue.

The main topics of the Special Issue are all aspects of database engineering, broadly defined, and particularly topics of emerging interest describing work on integrating new technologies into products and applications, on experiences with existing and novel techniques, and on the identification of unsolved challenges. In addition, the conference featured two special tracks, on data science and on GIS systems and applications; we also welcome papers on these topics.

Thank you,

Prof. Dr. Peter Revesz
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.


Keywords

  • data engineering
  • human–computer interaction
  • data science and analytic methods
  • machine learning foundations of data science
  • geographic information systems (GIS)
  • database systems for practical applications

Published Papers (8 papers)


Research

22 pages, 974 KiB  
Article
Streamlining Temporal Formal Verification over Columnar Databases
by Giacomo Bergami
Information 2024, 15(1), 34; https://doi.org/10.3390/info15010034 - 08 Jan 2024
Viewed by 1141
Abstract
Recent findings demonstrate how database technology enhances the computation of formal verification tasks expressible in linear time logic for finite traces (LTLf). Human-readable declarative languages also help the common practitioner to express temporal constraints in a straightforward and accessible language. Notwithstanding the former, this technology is in its infancy, and therefore, few optimization algorithms are known for dealing with massive amounts of information audited from real systems. We therefore present four novel algorithms subsuming entire LTLf expressions while outperforming previous state-of-the-art implementations on top of KnoBAB, leading to the formulation of novel xtLTLf-derived algebraic operators. Full article
(This article belongs to the Special Issue International Database Engineered Applications)

26 pages, 3768 KiB  
Article
Comparative Analysis of Membership Inference Attacks in Federated and Centralized Learning
by Ali Abbasi Tadi, Saroj Dayal, Dima Alhadidi and Noman Mohammed
Information 2023, 14(11), 620; https://doi.org/10.3390/info14110620 - 19 Nov 2023
Viewed by 1618
Abstract
The vulnerability of machine learning models to membership inference attacks, which aim to determine whether a specific record belongs to the training dataset, is explored in this paper. Federated learning allows multiple parties to independently train a model without sharing or centralizing their data, offering privacy advantages. However, when private datasets are used in federated learning and model access is granted, the risk of membership inference attacks emerges, potentially compromising sensitive data. To address this, effective defenses in a federated learning environment must be developed without compromising the utility of the target model. This study empirically investigates and compares membership inference attack methodologies in both federated and centralized learning environments, utilizing diverse optimizers and assessing attacks with and without defenses on image and tabular datasets. The findings demonstrate that a combination of knowledge distillation and conventional mitigation techniques (such as Gaussian dropout, Gaussian noise, and activity regularization) significantly mitigates the risk of information leakage in both federated and centralized settings. Full article
(This article belongs to the Special Issue International Database Engineered Applications)

22 pages, 558 KiB  
Article
Prototype Selection for Multilabel Instance-Based Learning
by Panagiotis Filippakis, Stefanos Ougiaroglou and Georgios Evangelidis
Information 2023, 14(10), 572; https://doi.org/10.3390/info14100572 - 19 Oct 2023
Viewed by 1221
Abstract
Reducing the size of the training set, which involves replacing it with a condensed set, is a widely adopted practice to enhance the efficiency of instance-based classifiers while trying to maintain high classification accuracy. This objective can be achieved through the use of data reduction techniques, also known as prototype selection or generation algorithms. Although there are numerous algorithms available in the literature that effectively address single-label classification problems, most of them are not applicable to multilabel data, where an instance can belong to multiple classes. Well-known transformation methods cannot be combined with a data reduction technique for several reasons. The Condensed Nearest Neighbor rule is a popular parameter-free single-label prototype selection algorithm. The IB2 algorithm is the one-pass variation of the Condensed Nearest Neighbor rule. This paper proposes variations of these algorithms for multilabel data. Through an experimental study conducted on nine distinct datasets as well as statistical tests, we demonstrate that the eight proposed approaches (four for each algorithm) offer significant reduction rates without compromising the classification accuracy. Full article
(This article belongs to the Special Issue International Database Engineered Applications)

12 pages, 1558 KiB  
Article
Correction of Threshold Determination in Rapid-Guessing Behaviour Detection
by Muhammad Alfian, Umi Laili Yuhana, Eric Pardede and Akbar Noto Ponco Bimantoro
Information 2023, 14(7), 422; https://doi.org/10.3390/info14070422 - 21 Jul 2023
Cited by 1 | Viewed by 988
Abstract
Assessment is one benchmark in measuring students’ abilities. However, assessment results cannot necessarily be trusted, because students sometimes cheat or even guess in answering the questions. Therefore, to obtain valid results, it is necessary to separate valid and invalid answers by considering rapid-guessing behaviour. We conducted a test to record exam log data from undergraduate and postgraduate students to model rapid-guessing behaviour by determining the threshold response time. Rapid-guessing behaviour detection is inspired by the common k-second method. However, the method flattens the application of the threshold, thus allowing misclassification. The modified method considers item difficulty in determining the threshold. The evaluation results show that the system can identify students’ rapid-guessing behaviour with a success rate of 71%, which is superior to the previous method. We also analysed various aggregation techniques of response time and compared them to see the effect of selecting the aggregation technique. Full article
(This article belongs to the Special Issue International Database Engineered Applications)

19 pages, 18994 KiB  
Article
Convolutional Neural Networks Analysis Reveals Three Possible Sources of Bronze Age Writings between Greece and India
by Shruti Daggumati and Peter Z. Revesz
Information 2023, 14(4), 227; https://doi.org/10.3390/info14040227 - 07 Apr 2023
Cited by 3 | Viewed by 2107
Abstract
This paper analyzes the relationships among eight ancient scripts from between Greece and India. We used convolutional neural networks combined with support vector machines to give a numerical rating of the similarity between pairs of signs (one sign from each of two different scripts). Two scripts that had a one-to-one matching of their signs were determined to be related. The result of the analysis is the finding of the following three groups, which are listed in chronological order: (1) Sumerian pictograms, the Indus Valley script, and the proto-Elamite script; (2) Cretan hieroglyphs and Linear B; and (3) the Phoenician, Greek, and Brahmi alphabets. Based on their geographic locations and times of appearance, Group (1) may originate from Mesopotamia in the early Bronze Age, Group (2) may originate from Europe in the middle Bronze Age, and Group (3) may originate from the Sinai Peninsula in the late Bronze Age. Full article
(This article belongs to the Special Issue International Database Engineered Applications)

10 pages, 407 KiB  
Article
Fundamental Research Challenges for Distributed Computing Continuum Systems
by Victor Casamayor Pujol, Andrea Morichetta, Ilir Murturi, Praveen Kumar Donta and Schahram Dustdar
Information 2023, 14(3), 198; https://doi.org/10.3390/info14030198 - 22 Mar 2023
Cited by 11 | Viewed by 2767
Abstract
This article discusses four fundamental topics for future Distributed Computing Continuum Systems: their representation, model, lifelong learning, and business model. Further, it presents techniques and concepts that can be useful to define these four topics specifically for Distributed Computing Continuum Systems. Finally, this article presents a broad view of the synergies among the presented techniques that can enable the development of future Distributed Computing Continuum Systems. Full article
(This article belongs to the Special Issue International Database Engineered Applications)

60 pages, 1718 KiB  
Article
Quickening Data-Aware Conformance Checking through Temporal Algebras
by Giacomo Bergami, Samuel Appleby and Graham Morgan
Information 2023, 14(3), 173; https://doi.org/10.3390/info14030173 - 08 Mar 2023
Cited by 4 | Viewed by 1962
Abstract
A temporal model describes processes as a sequence of observable events characterised by distinguishable actions in time. Conformance checking allows these models to determine whether any sequence of temporally ordered and fully-observable events complies with their prescriptions. The latter aspect leads to Explainable and Trustworthy AI, as we can immediately assess the flaws in the recorded behaviours while suggesting any possible way to amend the wrongdoings. Recent findings on conformance checking and temporal learning lead to an interest in temporal models beyond the usual business process management community, thus including other domain areas such as Cyber Security, Industry 4.0, and e-Health. As current technologies for accessing this are purely formal and not ready for the real world returning large data volumes, the need to improve existing conformance checking and temporal model mining algorithms to make Explainable and Trustworthy AI more efficient and competitive is increasingly pressing. To effectively meet such demands, this paper offers KnoBAB, a novel business process management system for efficient Conformance Checking computations performed on top of a customised relational model. This architecture was implemented from scratch after following common practices in the design of relational database management systems. After defining our proposed temporal algebra for temporal queries (xtLTLf), we show that this can express existing temporal languages over finite and non-empty traces such as LTLf. This paper also proposes a parallelisation strategy for such queries, thus reducing conformance checking into an embarrassingly parallel problem leading to super-linear speed up. This paper also presents how a single xtLTLf operator (or even entire sub-expressions) might be efficiently implemented via different algorithms, thus paving the way to future algorithmic improvements. 
Finally, our benchmarks highlight that our proposed implementation of xtLTLf (KnoBAB) outperforms state-of-the-art conformance checking software running on LTLf logic. Full article
(This article belongs to the Special Issue International Database Engineered Applications)

13 pages, 960 KiB  
Article
DEGAIN: Generative-Adversarial-Network-Based Missing Data Imputation
by Reza Shahbazian and Irina Trubitsyna
Information 2022, 13(12), 575; https://doi.org/10.3390/info13120575 - 12 Dec 2022
Cited by 6 | Viewed by 3242
Abstract
Insights and analysis are only as good as the available data. Data cleaning is one of the most important steps in creating quality data for decision making. Machine learning (ML) helps process data quickly and create error-free or limited-error datasets. One of the quality standards for cleaning data is handling missing data, also known as data imputation. This research focuses on the use of machine learning methods to deal with missing data. In particular, we propose a generative adversarial network (GAN) based model called DEGAIN to estimate the missing values in the dataset. We evaluate the performance of the presented method and compare the results with some of the existing methods on the publicly available Letter Recognition and SPAM datasets. The Letter dataset consists of 20,000 samples and 16 input features, and the SPAM dataset consists of 4601 samples and 57 input features. The results show that the proposed DEGAIN outperforms the existing ones in terms of root mean square error and Fréchet inception distance metrics. Full article
(This article belongs to the Special Issue International Database Engineered Applications)
