Data, Volume 9, Issue 4 (April 2024) – 14 articles

Cover Story: We present a dataset containing the results of a survey developed during the COVID-19 pandemic that gathers insights from 587 tech-savvy respondents, highlighting nuances in smart device interactions. Each user chose a single device for analysis, revealing the cognitive, emotional, and behavioral dimensions of the interaction. A sense of agency, which reflects users’ perception that their actions influence systems, is crucial for understanding the interaction between humans and computers as well as users’ relationship with socio-technical systems. Investigating these interactions aids in designing user-centered technologies. This dataset can be used in multiple ways to advance the understanding of the evolving digital landscape’s impact on users’ experiences.
27 pages, 1874 KiB  
Article
Predicting Academic Success of College Students Using Machine Learning Techniques
by Jorge Humberto Guanin-Fajardo, Javier Guaña-Moya and Jorge Casillas
Data 2024, 9(4), 60; https://doi.org/10.3390/data9040060 - 22 Apr 2024
Viewed by 987
Abstract
College context and academic performance are important determinants of academic success; using students’ prior experience with machine learning techniques to predict academic success before the end of the first year reinforces college self-efficacy. Dropout prediction is related to student retention and has been studied extensively in recent work; however, there is little literature on predicting academic success using educational machine learning. For this reason, CRISP-DM methodology was applied to extract relevant knowledge and features from the data. The dataset examined consists of 6690 records and 21 variables with academic and socioeconomic information. Preprocessing techniques and classification algorithms were analyzed. The area under the curve was used to measure the effectiveness of the algorithm; XGBoost achieved an AUC of 87.75% and correctly classified eight out of ten cases, while the decision tree improved interpretation with ten rules in seven out of ten cases. Recognizing the gaps in the study and that on-time completion of college consolidates college self-efficacy, creating intervention and support strategies to retain students is a priority for decision makers. Assessing the fairness and discrimination of the algorithms was the main limitation of this work. In the future, we intend to apply the extracted knowledge and learn about its influence on university management. Full article

22 pages, 700 KiB  
Review
Mapping of Data-Sharing Repositories for Paediatric Clinical Research—A Rapid Review
by Mariagrazia Felisi, Fedele Bonifazi, Maddalena Toma, Claudia Pansieri, Rebecca Leary, Victoria Hedley, Ronald Cornet, Giorgio Reggiardo, Annalisa Landi, Annunziata D’Ercole, Salma Malik, Sinéad Nally, Anando Sen, Avril Palmeri, Donato Bonifazi and Adriana Ceci
Data 2024, 9(4), 59; https://doi.org/10.3390/data9040059 - 20 Apr 2024
Viewed by 922
Abstract
The reuse of paediatric individual patient data (IPD) from clinical trials (CTs) is essential to overcome specific ethical, regulatory, methodological, and economic issues that hinder the progress of paediatric research. Sharing data through repositories enables the aggregation and dissemination of clinical information, fosters collaboration between researchers, and promotes transparency. This work aims to identify and describe existing data-sharing repositories (DSRs) developed to store, share, and reuse paediatric IPD from CTs. A rapid review of platforms providing access to electronic DSRs was conducted. A two-stage process was used to characterize DSRs: a first step of identification, followed by a second step of analysis using a set of eight purpose-built indicators. From an initial set of forty-five publicly available DSRs, twenty-one DSRs were identified as meeting the eligibility criteria. Only two DSRs were found to be totally focused on the paediatric population. Despite an increased awareness of the importance of data sharing, the results of this study show that paediatrics remains an area in which targeted efforts are still needed. Promoting initiatives to raise awareness of these DSRs and creating ad hoc measures and common standards for the sharing of paediatric CT data could help to bridge this gap in paediatric research. Full article

21 pages, 5872 KiB  
Tutorial
Introduction to Reproducible Geospatial Analysis and Figures in R: A Tutorial Article
by Philippe Maesen and Edouard Salingros
Data 2024, 9(4), 58; https://doi.org/10.3390/data9040058 - 20 Apr 2024
Viewed by 650
Abstract
The present article is intended to serve an educational purpose for data scientists and students who already have experience with the R language and wish to start using it for geospatial analysis and map creation. The basic concepts of raster data, vector data, CRS and datum are first presented along with a basic workflow to conduct reproducible geospatial research in R. Examples of important types of maps (scatter, bubble, choropleth, hexbin and faceted) created from open-source environmental data are illustrated and their practical implementation in R is discussed. Through these examples, essential manipulations on geospatial vector data are demonstrated (reading, transforming CRS, creating geometries from scratch, buffer zones around existing geometries and intersections between geometries). Full article

10 pages, 309 KiB  
Data Descriptor
Experimental Data on Maximum Swelling Pressure of Clayey Soils and Related Soil Properties
by Reza Taherdangkoo, Muntasir Shehab, Thomas Nagel, Faramarz Doulati Ardejani and Christoph Butscher
Data 2024, 9(4), 57; https://doi.org/10.3390/data9040057 - 16 Apr 2024
Viewed by 838
Abstract
Clayey soils exhibit significant volumetric changes in response to variations in water content. The swelling pressure of clayey soils is a critical parameter for evaluating the stability and performance of structures built on them, facilitating the development of appropriate design methodologies and mitigation strategies to ensure their long-term integrity and safety. We present a dataset comprising maximum swelling pressure values from 759 compacted soil samples, compiled from 16 articles published between 1994 and 2022. The dataset is classified into two main groups: 463 samples of natural clays and 296 samples of bentonite and bentonite mixtures, providing data on various types of soils and their properties. Different swelling test methods, including zero swelling, swell consolidation, restrained swell, double oedometer, free swelling, constant volume oedometer, UPC isochoric cell, isochoric oedometer and consolidometer, were employed to measure the maximum swelling pressure. The comprehensive nature of the dataset enhances its applicability for geotechnical projects. The dataset is a valuable resource for understanding the complex interactions between soil properties and swelling behavior, contributing to advancements in soil mechanics and geotechnical engineering. Full article

8 pages, 494 KiB  
Data Descriptor
A Dataset for Studying the Relationship between Human and Smart Devices
by Francesco Lelli and Heidi Toivonen
Data 2024, 9(4), 56; https://doi.org/10.3390/data9040056 - 11 Apr 2024
Viewed by 1447
Abstract
This dataset reports the responses to a survey designed for investigating the relationship that humans have with their smart devices. The dataset was collected between May and July 2020 and is a sample of over 500 respondents of various ethnicities and backgrounds. These data were used for modeling the ways that people relate to their devices using the notion of agency. However, the data can be used for complementing any study that intends to investigate a tool-mediated communication from the perspective of users, applying a variety of beliefs, attitudes, and expectations that users have in relation to their devices and themselves. This article presents the survey items as well as some preliminary data insights. The collected data were in English and the responses were anonymized to ensure GDPR compliance. The data were stored in a .csv file containing the respondents’ answers to the questions. Full article

18 pages, 2411 KiB  
Article
Learning from conect4children: A Collaborative Approach towards Standardisation of Disease-Specific Paediatric Research Data
by Anando Sen, Victoria Hedley, Eva Degraeuwe, Steven Hirschfeld, Ronald Cornet, Ramona Walls, John Owen, Peter N. Robinson, Edward G. Neilan, Thomas Liener, Giovanni Nisato, Neena Modi, Simon Woodworth, Avril Palmeri, Ricarda Gaentzsch, Melissa Walsh, Teresa Berkery, Joanne Lee, Laura Persijn, Kasey Baker, Kristina An Haack, Sonia Segovia Simon, Julius O. B. Jacobsen, Giorgio Reggiardo, Melissa A. Kirwin, Jessie Trueman, Claudia Pansieri, Donato Bonifazi, Sinéad Nally, Fedele Bonifazi, Rebecca Leary and Volker Straub
Data 2024, 9(4), 55; https://doi.org/10.3390/data9040055 - 8 Apr 2024
Viewed by 1603
Abstract
The conect4children (c4c) initiative was established to facilitate the development of new drugs and other therapies for paediatric patients. It is widely recognised that there are not enough medicines tested for all relevant ages of the paediatric population. To overcome this, it is imperative that clinical data from different sources are interoperable and can be pooled for larger post hoc studies. c4c has collaborated with the Clinical Data Interchange Standards Consortium (CDISC) to develop cross-cutting data resources that build on existing CDISC standards in an effort to standardise paediatric data. The natural next step was an extension to disease-specific data items. c4c brought together several existing initiatives and resources relevant to disease-specific data and analysed their use for standardising disease-specific data in clinical trials. Several case studies that combined disease-specific data from multiple trials have demonstrated the need for disease-specific data standardisation. We identified three relevant initiatives. These include European Reference Networks, European Joint Programme on Rare Diseases, and Pistoia Alliance. Other resources reviewed were National Cancer Institute Enterprise Vocabulary Services, CDISC standards, pharmaceutical company-specific data dictionaries, Human Phenotype Ontology, Phenopackets, Unified Registry for Inherited Metabolic Disorders, Orphacodes, Rare Disease Cures Accelerator-Data and Analytics Platform (RDCA-DAP), and Observational Medical Outcomes Partnership. The collaborative partners associated with these resources were also reviewed briefly. A plan of action focussed on collaboration was generated for standardising disease-specific paediatric clinical trial data. 
A paediatric data standards multistakeholder and multi-project user group was established to guide the remaining actions—FAIRification of metadata, a Phenopackets pilot with RDCA-DAP, applying Orphacodes to case report forms of clinical trials, introducing CDISC standards into European Reference Networks, testing of the CDISC Pediatric User Guide using data from the mentioned resources and organisation of further workshops and educational materials. Full article

13 pages, 2843 KiB  
Data Descriptor
Illumina 16S rRNA Gene Sequencing Dataset of Bacterial Communities of Soil Associated with Ironwood Trees (Casuarina equisetifolia) in Guam
by Tao Jin, Robert L. Schlub and Claudia Husseneder
Data 2024, 9(4), 54; https://doi.org/10.3390/data9040054 - 7 Apr 2024
Viewed by 851
Abstract
Ironwood trees, which are of great importance for the economy and environment of tropical areas, were first discovered to suffer from a slow progressive dieback in Guam in 2002, later referred to as ironwood tree decline (IWTD). A variety of biotic factors have been shown to be associated with IWTD, including putative bacterial pathogens Ralstonia solanacearum and Klebsiella species (K. variicola and K. oxytoca), the fungus Ganoderma australe, and termites. Due to the soilborne nature of these pathogens, soil microbiomes have been suggested to be a significant factor influencing tree health. In this project, we sequenced the microbiome in the soil collected from the root region of healthy ironwood trees and those showing signs of IWTD to evaluate the association between the bacterial community in soil and IWTD. This dataset contains 4,782,728 raw sequencing reads present in soil samples collected from thirty-nine ironwood trees with varying scales of decline severity in Guam obtained via sequencing the V1–V3 region of the 16S rRNA gene on the Illumina NovaSeq (2 × 250 bp) platform. Sequences were taxonomically assigned in QIIME2 using the SILVA 132 database. Firmicutes and Actinobacteria were the most dominant phyla in soil. Differences in soil microbiomes were detected between limestone and sand soil parent materials. No putative plant pathogens of the genera Ralstonia or Klebsiella were found in the samples. Bacterial diversity was not linked to parameters of IWTD. The dataset has been made publicly available through NCBI GenBank under BioProject ID PRJNA883256. This dataset can be used to compare the bacterial taxa present in soil associated with ironwood trees in Guam to bacterial communities of other geographical locations to identify microbial signatures of IWTD. In addition, this dataset can also be used to investigate the relationship between soil microbiomes and the microbiomes of ironwood trees as well as those of the termites which attack ironwood trees. Full article

11 pages, 1651 KiB  
Data Descriptor
Wearable Device Bluetooth/BLE Physical Layer Dataset
by Artis Rusins, Deniss Tiscenko, Eriks Dobelis, Eduards Blumbergs, Krisjanis Nesenbergs and Peteris Paikens
Data 2024, 9(4), 53; https://doi.org/10.3390/data9040053 - 3 Apr 2024
Viewed by 1120
Abstract
Wearable devices, such as headsets and activity trackers, rely heavily on the Bluetooth and/or the Bluetooth Low Energy wireless communication standard to exchange data with smartphones or other peripherals. Since these devices collect personal health and activity data, ensuring the privacy and security of the transmitted data is crucial. Therefore, we present a dataset that captures complete Bluetooth communications—including advertising, connection, data exchange, and disconnection—in an RF isolated environment using software-defined radio. We were able to successfully decode the captured Bluetooth packets using existing tools. This dataset provides researchers with the ability to fully analyze Bluetooth traffic and gain insight into communication patterns and potential security vulnerabilities. Full article

17 pages, 2827 KiB  
Article
Natural Language Processing Patents Landscape Analysis
by Hend S. Al-Khalifa, Taif AlOmar and Ghala AlOlyyan
Data 2024, 9(4), 52; https://doi.org/10.3390/data9040052 - 31 Mar 2024
Viewed by 1283
Abstract
Understanding NLP patents provides valuable insights into innovation trends and competitive dynamics in artificial intelligence. This study uses the Lens patent database to investigate the landscape of NLP patents. Global patent output in the NLP field has grown rapidly over the past decade, indicating rising research and commercial interest in applying NLP techniques. By analyzing patent assignees, technology categories, and geographic distribution, we identify leading innovators as well as research hotspots in applying NLP. The patent landscape reflects intensifying competition between technology giants and research institutions. This research aims to synthesize key patterns and developments in NLP innovation revealed through patent data analysis, highlighting implications for firms and policymakers. A detailed understanding of NLP patenting activity can inform intellectual property strategy and technology investment decisions in this burgeoning AI domain. Full article
(This article belongs to the Section Information Systems and Data Management)

12 pages, 1166 KiB  
Article
Longitudinal Patterns of Online Activity and Social Feedback Are Associated with Current and Perceived Changes in Quality of Life in Adult Facebook Users
by Davide Marengo and Michele Settanni
Data 2024, 9(4), 51; https://doi.org/10.3390/data9040051 - 31 Mar 2024
Viewed by 931
Abstract
The present study explored how sharing verbal status updates on Facebook and receiving Likes, as a form of positive social feedback, correlate with current and perceived changes in Quality of Life (QoL). Utilizing the Facebook Graph API, we collected a longitudinal dataset comprising status updates and Likes received by 1577 adult Facebook users over a 12-month period. Two monthly indicators were calculated: the percentage of verbal status updates and the average number of Likes per post. Participants were administered a survey to assess current and perceived changes in QoL. Confirmatory Factor Analysis (CFA) and the Auto-Regressive Latent Trajectory Model with Structured Residuals (ALT-SRs) were used to model longitudinal patterns emerging from the objective recordings of Facebook activity and explore their correlation with QoL measures. Findings indicated a positive correlation between the percentage of verbal status updates on Facebook and current QoL. Online positive social feedback, measured through received Likes, was associated with both current QoL and perceived improvements in QoL. Of note, perceived improvements in QoL correlated with an increase in received Likes over time. Results highlight the relevance of collecting and modeling longitudinal Facebook data for the investigation of the association between activity on social media and individual well-being. Full article

14 pages, 3697 KiB  
Article
DNA of Music: Identifying Relationships among Different Versions of the Composition Sadhukarn from Thailand, Laos, and Cambodia Using Multivariate Statistics
by Sumetus Eambangyung, Gretel Schwörer-Kohl and Witoon Purahong
Data 2024, 9(4), 50; https://doi.org/10.3390/data9040050 - 30 Mar 2024
Viewed by 1310
Abstract
Sadhukarn, a sacred music composition performed ritually to salute and invite divine powers to open a ceremony or feast, is played in Thailand, Cambodia, and Laos. Different countries have unique versions, arranged based on musicians’ skills and en vogue styles. This study presents the results of multivariate statistical analyses of 26 different versions of Sadhukarn main melodies using non-metric multidimensional scaling (NMDS) and cluster analysis. The objective was to identify the optimal number of parameters for identifying the origin and relationships among Sadhukarn versions, including rhyme structures, pillar tone, rhythmic and melodic patterns, intervals, pitches, and combinations of these parameters. The data were analyzed using both full and normalized datasets (32 phrases) to avoid biases due to differences in phrases among versions. Overall, the combination of six parameters is the best approach for data analysis in both full and normalized datasets. The analysis of the ‘full version’ shows the separation of Sadhukarn versions from different countries of origin, while the analysis of the ‘normalized version’ reveals the rhyme structure, rhythmic structure, and pitch as crucial parameters for identifying Sadhukarn versions. We conclude that multivariate statistics are powerful tools for identifying relationships among different versions of Sadhukarn compositions from Thailand, Laos, and Cambodia and within the same countries of origin. Full article
(This article belongs to the Special Issue Data Analysis for Audio-Visual Stimuli and Learning Algorithms)

11 pages, 3836 KiB  
Article
Analysis of a Bluetooth Traffic Dataset Obtained during University Examination Sessions
by Radu Bouaru, Adrian Peculea, Bogdan Iancu, Sorin Buzura, Emil Cebuc and Vasile Dadarlat
Data 2024, 9(4), 49; https://doi.org/10.3390/data9040049 - 30 Mar 2024
Viewed by 825
Abstract
In academic environments, students take exams simultaneously in campus examination classrooms. Due to recent advancements in technology, examination rooms are flooded with Bluetooth data traffic generated by personal devices (smartphones, smartwatches, etc.). The work presented in this article proposes a method for collecting Bluetooth traffic in an academic examination setting. The desired data were collected during several examination sessions using an Ubertooth One device, and then an in-depth post-processing analysis was performed on the collected dataset. The devices generating traffic were precisely located within the examination room, and areas with heightened data traffic were highlighted. Another goal of the current research was to provide a unique type of dataset to the academic community, facilitating its use in further research. Full article
(This article belongs to the Section Information Systems and Data Management)

20 pages, 1028 KiB  
Review
Luxury Car Data Analysis: A Literature Review
by Pegah Barakati, Flavio Bertini, Emanuele Corsi, Maurizio Gabbrielli and Danilo Montesi
Data 2024, 9(4), 48; https://doi.org/10.3390/data9040048 - 30 Mar 2024
Viewed by 1609
Abstract
The concept of luxury, understood as a rare and exclusive attribute, is evolving due to technological advances and the increasing influence of consumers in the market. Luxury cars have always symbolized wealth, social status, and sophistication. Recently, as technology progresses, the ability and interest to gather, store, and analyze data from these elegant vehicles has also increased. In recent years, the analysis of luxury car data has emerged as a significant area of research, with researchers exploring various aspects that may differentiate luxury cars from ordinary ones. For instance, researchers study factors such as economic impact, technological advancements, customer preferences and demographics, environmental implications, brand reputation, security, and performance. Although the percentage of individuals purchasing luxury cars is lower than that of ordinary cars, the significance of analyzing luxury car data lies in its impact on various aspects of the automotive industry and society. This literature review aims to provide an overview of the current state of the art in luxury car data analysis. Full article
(This article belongs to the Topic Big Data Intelligence: Methodologies and Applications)

12 pages, 14369 KiB  
Data Descriptor
An EEG Dataset of Subject Pairs during Collaboration and Competition Tasks in Face-to-Face and Online Modalities
by María A. Hernández-Mustieles, Yoshua E. Lima-Carmona, Axel A. Mendoza-Armenta, Ximena Hernandez-Machain, Diego A. Garza-Vélez, Aranza Carrillo-Márquez, Diana C. Rodríguez-Alvarado, Jorge de J. Lozoya-Santos and Mauricio A. Ramírez-Moreno
Data 2024, 9(4), 47; https://doi.org/10.3390/data9040047 - 27 Mar 2024
Viewed by 1318
Abstract
This dataset was acquired during collaboration and competition tasks performed by sixteen subject pairs (N = 32) of one female and one male under different (face-to-face and online) modalities. The collaborative task corresponds to cooperating to put together a 100-piece puzzle, while the competition task refers to playing against each other in a one-on-one classic 28-piece dominoes game. In the face-to-face modality, all interactions between the pair occurred in person. On the other hand, in the online modality, participants were physically separated, and interaction was only allowed through Zoom software with an active microphone and camera. Electroencephalography data of the two subjects were acquired simultaneously while performing the tasks. This article describes the experimental setup, the process of the data streams acquired during the tasks, and the assessment of data quality. Full article