Advances in Data Science: Methods, Systems, and Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 31 May 2024 | Viewed by 7516

Special Issue Editors

Department of Control and Computer Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
Interests: mobile and pervasive systems; tracking systems; natural interfaces; data-intensive architectures; data-driven methodologies for cultural heritage
French Council of Scientific Research (CNRS), LIRIS, Campus de la Doua, 25 Avenue Pierre de Coubertin, 69622 Villeurbanne CEDEX, France
Interests: data science pipeline optimization and enactment; data analytics operators; graph analytics pipeline specification and execution on just-in-time architectures; data analytics on multi-scale target architectures; domain-specific query languages for data science queries
Department of Control and Computer Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
Interests: explainable AI; data science; automated data analytics; machine learning; natural language processing; concept drift methodologies; computational social science

Special Issue Information

Dear Colleagues,

We are pleased to announce a new Special Issue, “Advances in Data Science: Methods, Systems and Applications”, which aims to bring together researchers and practitioners from different research areas to share their experiences in developing state-of-the-art analytics solutions: new methods, novel architectures and systems, and real-world applications that can benefit from them. Researchers are invited to submit work describing innovative methods, algorithms, and platforms covering any facet of a data analytics process that provides interesting and useful services. Papers detailing industrial implementations of data analytics applications, as well as design and deployment experience reports on the issues raised by data analytics projects, are particularly welcome. We call for research papers, experience reports, and demonstration proposals covering all aspects of data analytics projects and research activities.

We welcome technical, experimental, and methodological manuscripts, as well as contributions to applied data science, that address the following topics:

  • Advances in data science and data management methods, systems, and applications.
  • Intelligent systems, cyber-physical systems, data engines, IoT platforms, and big data frameworks and architectures.
  • Advances in AI and ML methods such as deep neural networks, explainable AI, computational intelligence, natural language processing, reinforcement learning models, concept drift management, and augmented reality.
  • Applications in engineering, computer science, and the physical, social, and life sciences, with a particular emphasis on ethical issues, fairness, and accountability.
  • Outlining academic and industrial needs and suggesting future research directions and agendas.

Application scenarios of interest include, but are not limited to:

Dr. Giovanni Malnati
Dr. Genoveva Vargas-Solar
Dr. Tania Cerquitelli
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data management
  • explainable AI
  • machine learning
  • big data architectures
  • applied data science
  • computational social science

Published Papers (7 papers)

Research

20 pages, 1844 KiB  
Article
APIMiner: Identifying Web Application APIs Based on Web Page States Similarity Analysis
by Yuanchao Chen, Yuliang Lu, Zulie Pan, Juxing Chen, Fan Shi, Yang Li and Yonghui Jiang
Electronics 2024, 13(6), 1112; https://doi.org/10.3390/electronics13061112 - 18 Mar 2024
Viewed by 375
Abstract
Modern web applications offer various APIs for data interaction. However, as the number of these APIs increases, so does the potential for security threats. Essentially, more APIs in an application can lead to more detectable vulnerabilities. Thus, it is crucial to identify APIs as comprehensively as possible in web applications. However, this task faces challenges due to the increasing complexity of web development techniques and the abundance of similar web pages. In this paper, we propose APIMiner, a framework for identifying APIs in web applications by dynamically traversing web pages based on web page state similarity analysis. APIMiner first builds a web page model based on the HTML elements of the current web page. APIMiner then uses this model to represent the state of the page. Then, APIMiner evaluates each element’s similarity in the page model and determines the page state similarity based on these similarity values. From the different states of the page, APIMiner extracts the data interaction APIs on the page. We conduct extensive experiments to evaluate APIMiner’s effectiveness. In the similarity analysis, our method surpasses state-of-the-art methods like NDD and mNDD in accurately distinguishing similar pages. We compare APIMiner with state-of-the-art tools (e.g., Enemy of the State, Crawlergo, and Wapiti3) for API identification. APIMiner excels in the number of identified APIs (average 1136) and code coverage (average 28,470). Relative to these tools, on average, APIMiner identifies 7.96 times more APIs and increases code coverage by 142.72%. Full article
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)
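
As a rough illustration of the underlying idea of page-state similarity (not APIMiner's actual page model or metric), the following Python sketch compares two page states by the overlap of their HTML element signatures; the 0.9 threshold is an assumed value.

```python
# Illustrative sketch only: APIMiner's page model and similarity analysis are more
# elaborate; this shows the general idea of comparing two web page states by the
# similarity of their HTML element structure. The 0.9 threshold is a made-up value.
from html.parser import HTMLParser

class ElementCollector(HTMLParser):
    """Collects a coarse structural signature: one (tag, attribute-names) token per element."""
    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append((tag, tuple(sorted(name for name, _ in attrs))))

def page_signature(html: str) -> set:
    parser = ElementCollector()
    parser.feed(html)
    return set(parser.tokens)

def state_similarity(html_a: str, html_b: str) -> float:
    """Jaccard similarity between the element signatures of two page states."""
    a, b = page_signature(html_a), page_signature(html_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_new_state(html: str, known_states: list, threshold: float = 0.9) -> bool:
    """Treat a page as a new state only if it is not too similar to any known state."""
    return all(state_similarity(html, s) < threshold for s in known_states)
```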

12 pages, 2919 KiB  
Article
Aircraft Behavior Recognition on Trajectory Data with a Multimodal Approach
by Meng Zhang, Lingxi Zhang and Tao Liu
Electronics 2024, 13(2), 367; https://doi.org/10.3390/electronics13020367 - 16 Jan 2024
Viewed by 522
Abstract
Moving traces are essential data for target detection and associated behavior recognition. Previous studies have used time–location sequences, route maps, or tracking videos to establish mathematical recognition models for behavior recognition. The multimodal approach has seldom been considered because of the limited modality of sensing data. With the rapid development of natural language processing and computer vision, the multimodal model has become a possible choice to process multisource data. In this study, we have proposed a mathematical model for aircraft behavior recognition with joint data manners. The feature abstraction, cross-modal fusion, and classification layers are included in the proposed model for obtaining multiscale features and analyzing multimanner information. Attention has been placed on providing self- and cross-relation assessments on the spatiotemporal and geographic data related to a moving object. We have adopted both a feedforward network and a softmax function to form the classifier. Moreover, we have enabled a modality-increasing phase, combining longitude and latitude sequences with related geographic maps to avoid monotonous data. We have collected an aircraft trajectory dataset of longitude and latitude sequences for experimental validation. We have demonstrated the excellent behavior recognition performance of the proposed model joint with the modality-increasing phase. As a result, our proposed methodology reached the highest accuracy of 95.8% among all the adopted methods, demonstrating the effectiveness and feasibility of trajectory-based behavior recognition. Full article
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)
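
The following PyTorch sketch illustrates, under assumptions, what a two-branch trajectory-plus-map classifier can look like; it is not the authors' architecture (which uses attention-based cross-modal fusion), and the GRU/CNN encoders, layer sizes, and plain concatenation fusion are placeholders.

```python
# Minimal sketch of a two-branch multimodal classifier for trajectory + map data.
# NOT the authors' model: encoder choices, layer sizes, and concatenation fusion
# (instead of the paper's attention-based cross-modal fusion) are assumptions.
import torch
import torch.nn as nn

class TrajectoryMapClassifier(nn.Module):
    def __init__(self, num_classes: int = 4, hidden: int = 64):
        super().__init__()
        # Branch 1: encode the (longitude, latitude) sequence.
        self.seq_encoder = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        # Branch 2: encode the related geographic map patch (1-channel image).
        self.img_encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, hidden), nn.ReLU(),
        )
        # Fusion + feedforward classifier with softmax output.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, traj, map_img):
        _, h = self.seq_encoder(traj)          # traj: (B, T, 2)
        seq_feat = h.squeeze(0)                # (B, hidden)
        img_feat = self.img_encoder(map_img)   # map_img: (B, 1, H, W)
        fused = torch.cat([seq_feat, img_feat], dim=1)
        return torch.softmax(self.classifier(fused), dim=1)

# Example: a batch of 8 trajectories of 50 points plus 64x64 map patches.
model = TrajectoryMapClassifier(num_classes=4)
probs = model(torch.randn(8, 50, 2), torch.randn(8, 1, 64, 64))
```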

24 pages, 825 KiB  
Article
Evaluation Method of IP Geolocation Database Based on City Delay Characteristics
by Yuancheng Xie, Zhaoxin Zhang, Yang Liu, Enhao Chen and Ning Li
Electronics 2024, 13(1), 15; https://doi.org/10.3390/electronics13010015 - 19 Dec 2023
Viewed by 601
Abstract
Despite the widespread use of IP geolocation databases, a robust and precise method for evaluating their accuracy remains elusive. This study presents a novel algorithm designed to assess the reliability of IP geolocation databases, leveraging the congruence of delay distributions across network segments and cities. We developed a fusion reference database, termed CDCDB, to facilitate the evaluation of commercial IP geolocation databases. Remarkably, CDCDB achieves an average positioning accuracy at the city level of 94%, coupled with a city coverage of 99.99%. This allows for an effective and comprehensive evaluation of IP geolocation databases. When compared to IPUU, CDCDB demonstrates an increase in the number of network segments by 18.7%, an increase in the number of high-quality network segments by 13.2%, and an enhancement in the coverage of city-level network segments by 20.92%. The evaluation outcomes reveal that the reliability of IP geolocation databases is not uniform across different cities. Moreover, distinct IP geolocation databases display varying preferences for cities. Consequently, we advise online service providers to select suitable IP geolocation databases based on the cities they cater to, as this could significantly enhance service quality. Full article
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)
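
As a hedged illustration of the general idea of checking a database's city claim against reference delay behaviour (not the CDCDB construction itself), the sketch below compares a segment's delay samples with a city's reference delay distribution using a two-sample Kolmogorov–Smirnov test; the test choice and significance level are assumptions.

```python
# Illustrative sketch, not the paper's algorithm: it only shows the underlying idea
# of checking whether the delay measurements of an IP segment are consistent with a
# reference delay distribution for the city a geolocation database claims for it.
# The two-sample KS test and the 0.05 significance level are assumptions.
from scipy.stats import ks_2samp

def city_claim_is_plausible(segment_delays_ms, city_reference_delays_ms, alpha=0.05):
    """True if the segment's delays are statistically compatible with the claimed city."""
    stat, p_value = ks_2samp(segment_delays_ms, city_reference_delays_ms)
    return p_value >= alpha

def database_accuracy(claims, reference_delays_by_city):
    """Fraction of (segment_delays, claimed_city) entries whose claim looks plausible."""
    checked = [
        city_claim_is_plausible(delays, reference_delays_by_city[city])
        for delays, city in claims
        if city in reference_delays_by_city
    ]
    return sum(checked) / len(checked) if checked else 0.0
```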

20 pages, 3841 KiB  
Article
High-Level K-Nearest Neighbors (HLKNN): A Supervised Machine Learning Model for Classification Analysis
by Elife Ozturk Kiyak, Bita Ghasemkhani and Derya Birant
Electronics 2023, 12(18), 3828; https://doi.org/10.3390/electronics12183828 - 10 Sep 2023
Cited by 1 | Viewed by 1863
Abstract
The k-nearest neighbors (KNN) algorithm has been widely used for classification analysis in machine learning. However, it suffers from noise samples that reduce its classification ability and therefore prediction accuracy. This article introduces the high-level k-nearest neighbors (HLKNN) method, a new technique for enhancing the k-nearest neighbors algorithm, which can effectively address the noise problem and contribute to improving the classification performance of KNN. Instead of only considering k neighbors of a given query instance, it also takes into account the neighbors of these neighbors. Experiments were conducted on 32 well-known popular datasets. The results showed that the proposed HLKNN method outperformed the standard KNN method with average accuracy values of 81.01% and 79.76%, respectively. In addition, the experiments demonstrated the superiority of HLKNN over previous KNN variants in terms of the accuracy metric in various datasets. Full article
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)
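
The following sketch reconstructs the neighbors-of-neighbors intuition behind HLKNN in a simplified form; the voting scheme over the two neighbor levels is an assumption, not the paper's exact procedure.

```python
# Simplified sketch of the neighbors-of-neighbors idea behind HLKNN, not the paper's
# exact method: how the two neighbor levels are weighted and tie-broken here is an
# assumption made for illustration.
from collections import Counter
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hlknn_predict(X_train, y_train, X_query, k=5):
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    predictions = []
    for x in np.atleast_2d(X_query):
        # Level 1: the k nearest neighbors of the query point.
        _, level1 = nn.kneighbors([x], n_neighbors=k)
        level1 = level1[0]
        # Level 2: the k nearest neighbors of each level-1 neighbor.
        _, level2 = nn.kneighbors(X_train[level1], n_neighbors=k)
        # Majority vote over the labels of both neighbor levels.
        voters = np.concatenate([level1, level2.ravel()])
        predictions.append(Counter(y_train[voters]).most_common(1)[0][0])
    return np.array(predictions)
```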

23 pages, 1520 KiB  
Article
Research and Hardware Implementation of a Reduced-Latency Quadruple-Precision Floating-Point Arctangent Algorithm
by Changjun He, Bosong Yan, Shiyun Xu, Yiwen Zhang, Zhenhua Wang and Mingjiang Wang
Electronics 2023, 12(16), 3472; https://doi.org/10.3390/electronics12163472 - 16 Aug 2023
Cited by 1 | Viewed by 794
Abstract
In the field of digital signal processing, such as in navigation and radar, a significant number of high-precision arctangent function calculations are required. Lookup tables, polynomial approximation, and single/double-precision floating-point Coordinate Rotation Digital Computer (CORDIC) algorithms are insufficient to meet the demands of practical applications, where both high precision and low latency are essential. In this paper, based on the concept of trading area for speed, a four-step parallel branch iteration CORDIC algorithm is proposed. Using this improved algorithm, a 128-bit quad-precision floating-point arctangent function is designed, and the hardware circuit implementation of the arctangent algorithm is realized. The results demonstrate that the improved algorithm can achieve 128-bit floating-point arctangent calculations in just 32 cycles, with a maximum error not exceeding 2×10⁻³⁴ rad. It possesses exceptionally high computational accuracy and efficiency. Furthermore, the hardware area of the arithmetic unit is approximately 0.6317 mm², and the power consumption is about 40.6483 mW under the TSMC 65 nm process at a working frequency of 500 MHz. This design can be well suited for dedicated CORDIC processor chip applications. The research presented in this paper holds significant value for high-precision and rapid arctangent function calculations in radar, navigation, meteorology, and other fields. Full article
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)
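
For readers unfamiliar with CORDIC, the sketch below shows the classic one-micro-rotation-per-iteration vectoring mode for the arctangent in ordinary double precision; the paper's contribution, a four-step parallel branch iteration in 128-bit quadruple-precision hardware, is not reproduced here.

```python
# Plain vectoring-mode CORDIC for arctangent, one micro-rotation per iteration, in
# double precision. This only illustrates the baseline algorithm that the paper
# accelerates; the proposed four-step parallel branch quadruple-precision hardware
# design is not reproduced here.
import math

def cordic_arctan(y: float, x: float, iterations: int = 40) -> float:
    """Approximate atan2(y, x) for x > 0 by driving y toward zero."""
    angle = 0.0
    for i in range(iterations):
        d = 1.0 if y >= 0 else -1.0           # rotate toward the x-axis
        x, y = x + d * y * 2.0**-i, y - d * x * 2.0**-i
        angle += d * math.atan(2.0**-i)       # accumulate the elementary angle
    return angle

# Sanity check against the math library (agreement to roughly 1e-12 after 40 iterations).
print(cordic_arctan(1.0, 2.0), math.atan2(1.0, 2.0))
```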

34 pages, 2337 KiB  
Article
A Novel Process of Parsing Event-Log Activities for Process Mining Based on Information Content
by Fadilul-lah Yassaanah Issahaku, Xianwen Fang, Sumaiya Bashiru Danwana, Edem Kwedzo Bankas and Ke Lu
Electronics 2023, 12(2), 289; https://doi.org/10.3390/electronics12020289 - 05 Jan 2023
Cited by 1 | Viewed by 1381
Abstract
Process mining has piqued the interest of researchers and technology manufacturers. Process mining aims to extract information from event activities and their interdependencies from events recorded by some enterprise systems. An enterprise system’s transactions are labeled based on their information content, such as an activity that causes the occurrence of another, the timestamp between events, and the resource from which the transaction originated. This paper describes a novel process of parsing event-log activities based on information content (IC). The information content of attributes, especially activity names, which are used to describe the flow processes of enterprise systems, is grouped hierarchically as hypernyms and hyponyms in a subsume tree. The least common subsumer (LCS) values of these activity names are calculated, and the corresponding relatedness values between them are obtained. These values are used to create a fuzzy causal matrix (FCM) for parsing the activities, from which a process mining algorithm is designed to mine the structural and semantic relationships among activities using an enhanced gray wolf optimizer and backpropagation algorithm. The proposed approach is resistant to noisy and incomplete event logs and can be used for process mining to reflect the structure and behavior of event logs. Full article
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)
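
As a toy illustration of information-content relatedness via a least common subsumer (not the paper's method), the sketch below scores two activity names against a hand-made hypernym tree with a Lin-style formula; the taxonomy, the counts, and the formula choice are assumptions, and the fuzzy causal matrix and mining algorithm built on top of such values are not reproduced.

```python
# Toy sketch of information-content (IC) relatedness between two activity names via
# their least common subsumer (LCS) in a hand-made hypernym tree, using a Lin-style
# score. The taxonomy, the occurrence counts, and the formula are all illustrative
# assumptions, not taken from the paper.
import math

# Parent links of a tiny hypernym tree (child -> parent); "activity" is the root.
PARENT = {
    "create order": "order handling", "approve order": "order handling",
    "order handling": "activity", "send invoice": "billing", "billing": "activity",
}
# Made-up occurrence counts used to estimate concept probabilities.
COUNT = {"activity": 100, "order handling": 40, "billing": 20,
         "create order": 25, "approve order": 15, "send invoice": 20}
TOTAL = COUNT["activity"]

def ancestors(term):
    chain = [term]
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain

def information_content(term):
    return -math.log(COUNT[term] / TOTAL)

def lcs(a, b):
    """Least common subsumer: the deepest shared ancestor of a and b."""
    ancestors_b = set(ancestors(b))
    return next(t for t in ancestors(a) if t in ancestors_b)

def lin_relatedness(a, b):
    denom = information_content(a) + information_content(b)
    return 2 * information_content(lcs(a, b)) / denom if denom else 1.0

print(lin_relatedness("create order", "approve order"))  # high: LCS is "order handling"
print(lin_relatedness("create order", "send invoice"))   # low: LCS is the root "activity"
```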

11 pages, 341 KiB  
Article
Theory-Guided Deep Learning Algorithms: An Experimental Evaluation
by Simone Monaco, Daniele Apiletti and Giovanni Malnati
Electronics 2022, 11(18), 2850; https://doi.org/10.3390/electronics11182850 - 09 Sep 2022
Cited by 2 | Viewed by 1323
Abstract
The use of theory-based knowledge in machine learning models has a major impact on many engineering and physics problems. The growth of deep learning algorithms is closely related to an increasing demand for data that is not accessible or available in many use cases. In this context, the incorporation of physical knowledge or a priori constraints has proven beneficial in many tasks. On the other hand, this collection of approaches is context-specific, and it is difficult to generalize them to new problems. In this paper, we experimentally compare some of the most commonly used theory-injection strategies to perform a systematic analysis of their advantages. Selected state-of-the-art algorithms were reproduced for different use cases to evaluate their effectiveness with smaller training data and to discuss how the underlying strategies can fit into new application contexts. Full article
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)
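
One widely used theory-injection strategy of the kind compared in the paper is a soft constraint penalty added to the training loss; the sketch below shows this pattern generically, with a placeholder network, constraint function, and weight rather than any specific use case from the article.

```python
# Generic sketch of one common theory-injection strategy: adding a soft penalty that
# punishes predictions violating a known physical constraint. The network, the
# constraint g(x, y_hat) <= 0, and the weight lambda are placeholders, not a
# specific configuration from the article.
import torch
import torch.nn as nn

def theory_guided_loss(model, x, y_true, constraint_fn, lam=0.1):
    """Data loss + lambda * mean squared violation of the theory constraint."""
    y_pred = model(x)
    data_loss = nn.functional.mse_loss(y_pred, y_true)
    violation = torch.relu(constraint_fn(x, y_pred))   # positive where theory is violated
    return data_loss + lam * (violation ** 2).mean()

# Example with a toy constraint: predictions are known a priori to be non-negative.
model = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))
x, y = torch.randn(32, 3), torch.rand(32, 1)
loss = theory_guided_loss(model, x, y, constraint_fn=lambda x, y_hat: -y_hat)
loss.backward()
```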
