Multidimensional Data Structures and Big Data Management

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Systems".

Deadline for manuscript submissions: 31 January 2025

Special Issue Editor


Guest Editor
Department of Computer Engineering and Informatics, University of Patras, 26504 Rio Achaia, Greece
Interests: multidimensional data structures; decentralized systems for big data management; indexing; query processing and query optimization

Special Issue Information

Dear Colleagues,

The MDPI journal Information invites submissions to a Special Issue on “Multidimensional Data Structures and Big Data Management”.

Big data are commonly characterized by their volume, velocity, and variety, among other properties of modern data-intensive scenarios. This has driven the growth of big data management as a field: data often arise from different, and sometimes heterogeneous, sources, so organizing and handling them efficiently is of particular interest.

This raises several challenges, such as indexing massive datasets, optimizing queries over large databases, and designing distributed methods for data mining and knowledge extraction.

Furthermore, data structures, which are responsible for storing, organizing, retrieving, and processing data, play a major role in addressing these challenges. Distributed processing engines and container orchestration platforms such as Apache Spark and Kubernetes, together with elastic cloud tooling such as the Elastic Stack with Kibana, make the handling of massive datasets far more tractable by splitting the data into chunks and distributing the work across clusters, as illustrated in the sketch below.
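To make the split-and-aggregate pattern concrete, here is a minimal, hypothetical PySpark sketch; it assumes a local Spark installation, and the file name and column names are placeholders rather than part of any specific system mentioned above.

```python
# Minimal PySpark sketch: partition a large dataset and aggregate it in parallel.
# The file path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("chunked-aggregation").getOrCreate()

# Spark splits the input into partitions (chunks) that are processed across the cluster.
df = spark.read.csv("measurements.csv", header=True, inferSchema=True)
df = df.repartition(64)  # explicit chunking across executors

# A simple distributed aggregation: mean value per sensor.
result = df.groupBy("sensor_id").agg(F.avg("value").alias("mean_value"))
result.show(10)

spark.stop()
```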

The main scope of this Special Issue is the integration of data structures and indexes with innovative distributed engine tools, accompanied by modern ML and AI methods for large-scale provisioning and prediction.

Ultimately, this Special Issue is concerned with groundbreaking topics at the interface of data structures and indexing, distributed ML, query processing, and optimization, with particular emphasis on multidimensional data structures for big data management.

Topics of call

  • Efficient data structures
  • Big data indexing strategies
  • Distributed machine learning
  • Big data management techniques
  • Random sampling for data mining
  • Automated machine learning
  • Modern database systems
  • Big data management for smart IoT applications
  • Advanced distributed hash tables (DHTs)
  • Innovative schemes for information retrieval and knowledge extraction
  • AI and machine learning approaches for handling massive datasets
  • Query optimization based on machine learning approaches

Prof. Dr. Spyros Sioutas
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • multidimensional data structures
  • big data management
  • big data indexing strategies
  • large scale query processing and query optimization
  • large scale machine learning

Published Papers (6 papers)


Research


20 pages, 1199 KiB  
Article
An Agent-Based Model for Disease Epidemics in Greece
by Vasileios Thomopoulos and Kostas Tsichlas
Information 2024, 15(3), 150; https://doi.org/10.3390/info15030150 - 07 Mar 2024
Abstract
In this research, we present the first steps toward developing a data-driven agent-based model (ABM) specifically designed for simulating infectious disease dynamics in Greece. Amidst the ongoing COVID-19 pandemic caused by SARS-CoV-2, this research holds significant importance as it can offer valuable insights into disease transmission patterns and assist in devising effective intervention strategies. To the best of our knowledge, no similar study has been conducted in Greece. We constructed a prototype ABM that utilizes publicly accessible data to accurately represent the complex interactions and dynamics of disease spread in the Greek population. By incorporating demographic information and behavioral patterns, our model captures the specific characteristics of Greece, enabling accurate and context-specific simulations. By using our proposed ABM, we aim to assist policymakers in making informed decisions regarding disease control and prevention. Through the use of simulations, policymakers have the opportunity to explore different scenarios and predict the possible results of various intervention measures. These may include strategies like testing approaches, contact tracing, vaccination campaigns, and social distancing measures. Through these simulations, policymakers can assess the effectiveness and feasibility of these interventions, leading to the development of well-informed strategies aimed at reducing the impact of infectious diseases on the Greek population. This study is an initial exploration toward understanding disease transmission patterns and a first step towards formulating effective intervention strategies for Greece. Full article
(This article belongs to the Special Issue Multidimensional Data Structures and Big Data Management)
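For readers unfamiliar with agent-based epidemic models, the following minimal SIR-style sketch illustrates the general simulation loop such models rely on; it is not the authors' data-driven model, and all parameter values are hypothetical.

```python
# Minimal agent-based SIR sketch illustrating the kind of simulation the paper describes.
# Population size, contact rate, and probabilities are hypothetical placeholders.
import random

N, DAYS = 1000, 100
CONTACTS_PER_DAY, P_INFECT, P_RECOVER = 8, 0.03, 0.1

# 0 = susceptible, 1 = infected, 2 = recovered
state = [0] * N
for i in random.sample(range(N), 5):          # seed a few initial infections
    state[i] = 1

for day in range(DAYS):
    infected = [i for i in range(N) if state[i] == 1]
    for i in infected:
        # each infected agent contacts a few random agents and may transmit
        for j in random.sample(range(N), CONTACTS_PER_DAY):
            if state[j] == 0 and random.random() < P_INFECT:
                state[j] = 1
        if random.random() < P_RECOVER:
            state[i] = 2
    print(day, state.count(0), state.count(1), state.count(2))  # S, I, R counts
```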

19 pages, 3617 KiB  
Article
Deep Learning Approaches for Big Data-Driven Metadata Extraction in Online Job Postings
by Panagiotis Skondras, Nikos Zotos, Dimitris Lagios, Panagiotis Zervas, Konstantinos C. Giotopoulos and Giannis Tzimas
Information 2023, 14(11), 585; https://doi.org/10.3390/info14110585 - 25 Oct 2023
Abstract
This article presents a study on the multi-class classification of job postings using machine learning algorithms. With the growth of online job platforms, there has been an influx of labor market data. Machine learning, particularly NLP, is increasingly used to analyze and classify job postings. However, the effectiveness of these algorithms largely hinges on the quality and volume of the training data. In our study, we propose a multi-class classification methodology for job postings, drawing on AI models such as text-davinci-003 and the quantized versions of Falcon 7b (Falcon), Wizardlm 7B (Wizardlm), and Vicuna 7B (Vicuna) to generate synthetic datasets. These synthetic data are employed in two use-case scenarios: (a) exclusively as training datasets composed of synthetic job postings (situations where no real data is available) and (b) as an augmentation method to bolster underrepresented job title categories. To evaluate our proposed method, we relied on two well-established approaches: the feedforward neural network (FFNN) and the BERT model. Both the use cases and training methods were assessed against a genuine job posting dataset to gauge classification accuracy. Our experiments substantiated the benefits of using synthetic data to enhance job posting classification. In the first scenario, the models’ performance matched, and occasionally exceeded, that of the real data. In the second scenario, the augmented classes consistently outperformed in most instances. This research confirms that AI-generated datasets can enhance the efficacy of NLP algorithms, especially in the domain of multi-class job posting classification. While data augmentation can boost model generalization, its impact varies. It is especially beneficial for simpler models like the FFNN. BERT, due to its context-aware architecture, also benefits from augmentation but sees limited improvement. Selecting the right type and amount of augmentation is essential. Full article
(This article belongs to the Special Issue Multidimensional Data Structures and Big Data Management)
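The augmentation scenario described above can be sketched, under simplifying assumptions, as follows: a handful of placeholder postings stand in for real and LLM-generated text, and a small scikit-learn feed-forward network stands in for the FFNN baseline. This is illustrative only, not the authors' pipeline.

```python
# Sketch of scenario (b): augmenting an underrepresented job-title class with synthetic
# postings before training a simple classifier. All texts and labels are hypothetical
# placeholders, not LLM-generated data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

real_texts = ["python developer with django experience", "registered nurse for icu",
              "senior java backend engineer", "primary school teacher wanted"]
real_labels = ["IT", "Health", "IT", "Education"]

# Synthetic postings for the underrepresented "Health" class (placeholders for text
# that would be generated by models such as Falcon or Vicuna).
synthetic_texts = ["experienced midwife for maternity ward", "physiotherapist, part time"]
synthetic_labels = ["Health", "Health"]

X = real_texts + synthetic_texts
y = real_labels + synthetic_labels

# TF-IDF features feeding a small feed-forward network, standing in for the FFNN baseline.
model = make_pipeline(TfidfVectorizer(),
                      MLPClassifier(hidden_layer_sizes=(64,), max_iter=500))
model.fit(X, y)
print(model.predict(["looking for a kindergarten teacher"]))
```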

27 pages, 1404 KiB  
Article
EVCA Classifier: A MCMC-Based Classifier for Analyzing High-Dimensional Big Data
by Eleni Vlachou, Christos Karras, Aristeidis Karras, Dimitrios Tsolis and Spyros Sioutas
Information 2023, 14(8), 451; https://doi.org/10.3390/info14080451 - 09 Aug 2023
Cited by 2
Abstract
In this work, we introduce an innovative Markov Chain Monte Carlo (MCMC) classifier, a synergistic combination of Bayesian machine learning and Apache Spark, highlighting the novel use of this methodology in the spectrum of big data management and environmental analysis. By employing a large dataset of air pollutant concentrations in Madrid from 2001 to 2018, we developed a Bayesian Logistic Regression model, capable of accurately classifying the Air Quality Index (AQI) as safe or hazardous. This mathematical formulation adeptly synthesizes prior beliefs and observed data into robust posterior distributions, enabling superior management of overfitting, enhancing the predictive accuracy, and demonstrating a scalable approach for large-scale data processing. Notably, the proposed model achieved a maximum accuracy of 87.91% and an exceptional recall value of 99.58% at a decision threshold of 0.505, reflecting its proficiency in accurately identifying true negatives and mitigating misclassification, even though it slightly underperformed in comparison to the traditional Frequentist Logistic Regression in terms of accuracy and the AUC score. Ultimately, this research underscores the efficacy of Bayesian machine learning for big data management and environmental analysis, while signifying the pivotal role of the first-ever MCMC Classifier and Apache Spark in dealing with the challenges posed by large datasets and high-dimensional data with broader implications not only in sectors such as statistics, mathematics, physics but also in practical, real-world applications. Full article
(This article belongs to the Special Issue Multidimensional Data Structures and Big Data Management)
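The core of such an MCMC classifier can be sketched with a plain Metropolis-Hastings sampler for Bayesian logistic regression, as below. This is a NumPy toy on synthetic data, not the authors' Spark-based EVCA implementation; only the 0.505 decision threshold is taken from the abstract.

```python
# Minimal Metropolis-Hastings sampler for Bayesian logistic regression.
# Features and labels below are synthetic stand-ins for pollutant data and AQI classes.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                        # stand-in pollutant features
true_w = np.array([1.5, -2.0, 0.7])
y = (1 / (1 + np.exp(-X @ true_w)) > rng.random(500)).astype(float)  # 1 = hazardous

def log_posterior(w):
    logits = X @ w
    log_lik = np.sum(y * logits - np.log1p(np.exp(logits)))  # Bernoulli log-likelihood
    log_prior = -0.5 * np.sum(w ** 2)                         # standard normal prior
    return log_lik + log_prior

w, samples = np.zeros(3), []
for step in range(5000):
    proposal = w + rng.normal(scale=0.1, size=3)
    if np.log(rng.random()) < log_posterior(proposal) - log_posterior(w):
        w = proposal                                  # accept the proposed weights
    if step >= 1000:                                  # discard burn-in samples
        samples.append(w)

w_mean = np.mean(samples, axis=0)
p_hazardous = 1 / (1 + np.exp(-X @ w_mean))
pred = (p_hazardous > 0.505).astype(float)            # decision threshold from the abstract
print("train accuracy:", (pred == y).mean())
```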

24 pages, 3063 KiB  
Article
Local Community Detection in Graph Streams with Anchors
by Konstantinos Christopoulos, Georgia Baltsou and Konstantinos Tsichlas
Information 2023, 14(6), 332; https://doi.org/10.3390/info14060332 - 12 Jun 2023
Cited by 2
Abstract
Community detection in dynamic networks is a challenging research problem. One of the main obstacles is the stability issues that arise during the evolution of communities. In dynamic networks, new communities may emerge and existing communities may disappear, grow, or shrink. As a result, a community can evolve into a completely different one, making it difficult to track its evolution (this is known as the drifting/identity problem). In this paper, we focused on the evolution of a single community. Our aim was to identify the community that contains a particularly important node, called the anchor, and to track its evolution over time. In this way, we circumvented the identity problem by allowing the anchor to define the core of the relevant community. We proposed a framework that tracks the evolution of the community defined by the anchor and verified its efficiency and effectiveness through experimental evaluation. Full article
(This article belongs to the Special Issue Multidimensional Data Structures and Big Data Management)
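A simplified, static-snapshot version of anchor-seeded local community detection can be sketched as a greedy conductance-based expansion around the anchor node; the sketch below uses networkx on a toy graph and does not reproduce the authors' streaming framework.

```python
# Greedy anchor-seeded community expansion on a single graph snapshot:
# grow the set around the anchor while adding a neighbor improves conductance.
import networkx as nx

def conductance(G, S):
    cut = nx.cut_size(G, S)                      # edges leaving the community
    vol = sum(dict(G.degree(S)).values())        # total degree inside the community
    return cut / vol if vol else 1.0

def local_community(G, anchor, max_size=50):
    community = {anchor}
    while len(community) < max_size:
        frontier = {n for u in community for n in G.neighbors(u)} - community
        if not frontier:
            break
        best = min(frontier, key=lambda n: conductance(G, community | {n}))
        if conductance(G, community | {best}) >= conductance(G, community):
            break                                # no neighbor improves conductance further
        community.add(best)
    return community

G = nx.karate_club_graph()                       # small example graph
print(sorted(local_community(G, anchor=0)))
```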

34 pages, 1619 KiB  
Article
AutoML with Bayesian Optimizations for Big Data Management
by Aristeidis Karras, Christos Karras, Nikolaos Schizas, Markos Avlonitis and Spyros Sioutas
Information 2023, 14(4), 223; https://doi.org/10.3390/info14040223 - 05 Apr 2023
Cited by 7
Abstract
The field of automated machine learning (AutoML) has gained significant attention in recent years due to its ability to automate the process of building and optimizing machine learning models. However, the increasing amount of big data being generated has presented new challenges for AutoML systems in terms of big data management. In this paper, we introduce Fabolas and learning curve extrapolation as two methods for accelerating hyperparameter optimization. Four methods for quickening training were presented including Bag of Little Bootstraps, k-means clustering for Support Vector Machines, subsample size selection for gradient descent, and subsampling for logistic regression. Additionally, we also discuss the use of Markov Chain Monte Carlo (MCMC) methods and other stochastic optimization techniques to improve the efficiency of AutoML systems in managing big data. These methods enhance various facets of the training process, making it feasible to combine them in diverse ways to gain further speedups. We review several combinations that have potential and provide a comprehensive understanding of the current state of AutoML and its potential for managing big data in various industries. Furthermore, we also mention the importance of parallel computing and distributed systems to improve the scalability of the AutoML systems while working with big data. Full article
(This article belongs to the Special Issue Multidimensional Data Structures and Big Data Management)
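One of the training speed-ups reviewed in the paper, the Bag of Little Bootstraps, can be sketched in a few lines of NumPy; the data, subset sizes, and resample counts below are arbitrary choices for illustration, not the paper's settings.

```python
# Bag of Little Bootstraps (BLB) sketch: estimate a statistic and its uncertainty by
# resampling small subsets up to the full sample size with multinomial weights.
import numpy as np

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)   # synthetic data
n = len(data)

def blb_mean_ci(data, subsets=10, subset_size=2_000, resamples=50):
    intervals = []
    for _ in range(subsets):
        subset = rng.choice(data, size=subset_size, replace=False)
        estimates = []
        for _ in range(resamples):
            # weights sum to n, so each resample behaves like a full-sized bootstrap
            weights = rng.multinomial(n, np.ones(subset_size) / subset_size)
            estimates.append(np.average(subset, weights=weights))
        intervals.append(np.percentile(estimates, [2.5, 97.5]))
    return np.mean(intervals, axis=0)            # average the per-subset bounds

print("mean:", data.mean(), "95% CI:", blb_mean_ci(data))
```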

Review


31 pages, 183324 KiB  
Review
Linked Data Interfaces: A Survey
by Eleonora Bernasconi, Miguel Ceriani, Davide Di Pierro, Stefano Ferilli and Domenico Redavid
Information 2023, 14(9), 483; https://doi.org/10.3390/info14090483 - 30 Aug 2023
Cited by 3
Abstract
In the era of big data, linked data interfaces play a critical role in enabling access to and management of large-scale, heterogeneous datasets. This survey investigates forty-seven interfaces developed by the semantic web community in the context of the Web of Linked Data, displaying information about general topics and digital library contents. The interfaces are classified based on their interaction paradigm, the type of information they display, and the complexity reduction strategies they employ. The main purpose to be addressed is the possibility of categorizing a great number of available tools so that comparison among them becomes feasible and valuable. The analysis reveals that most interfaces use a hybrid interaction paradigm combining browsing, searching, and displaying information in lists or tables. Complexity reduction strategies, such as faceted search and summary visualization, are also identified. Emerging trends in linked data interface focus on user-centric design and advancements in semantic annotation methods, leveraging machine learning techniques for data enrichment and retrieval. Additionally, an interactive platform is provided to explore and compare data on the analyzed tools. Overall, there is no one-size-fits-all solution for developing linked data interfaces and tailoring the interaction paradigm and complexity reduction strategies to specific user needs is essential. Full article
(This article belongs to the Special Issue Multidimensional Data Structures and Big Data Management)
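Most of the surveyed interfaces ultimately sit on top of SPARQL endpoints. As a minimal illustration of the kind of query such interfaces render for users, the following sketch sends a hypothetical query to the public DBpedia endpoint over HTTP.

```python
# Minimal linked data query: fetch a few labels from the public DBpedia SPARQL endpoint.
# The query itself is a hypothetical example.
import requests

query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Greece ;
        rdfs:label ?label .
  FILTER (lang(?label) = "en")
} LIMIT 5
"""

resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
    timeout=30,
)
for binding in resp.json()["results"]["bindings"]:
    print(binding["label"]["value"])
```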
