Data Science and Big Data in Biology, Physical Science and Engineering

A special issue of Technologies (ISSN 2227-7080). This special issue belongs to the section "Information and Communication Technologies".

Deadline for manuscript submissions: closed (30 September 2023) | Viewed by 50635

Printed Edition Available!
A printed edition of this Special Issue is available.

Special Issue Editor

Guest Editor: Prof. Dr. Mohammed Mahmoud

Special Issue Information

Dear Colleagues,

Big Data analysis is one of the most important contemporary areas of development and research. Tremendous amounts of data are generated every single day by digital technologies and modern information systems, such as cloud computing and Internet of Things (IoT) devices. Analysis of these enormous amounts of data has become crucially significant and requires a great deal of effort to extract valuable knowledge for decision-making, which, in turn, will make important contributions in both academia and industry.

Big Data and data science have emerged from the significant need to generate, store, organise and process immense amounts of data. Data scientists use artificial intelligence (AI) and machine learning (ML) approaches and models that allow computers to identify what the data represent and to detect patterns more quickly, efficiently and reliably than humans can.

The goal of this Special Issue is to explore and discuss principles, tools and models in the context of data science, as well as the diverse concepts and techniques relating to Big Data in biology, chemistry, biomedical engineering, physics, mathematics and other areas that work with Big Data.

Prof. Dr. Mohammed Mahmoud
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as accepted) and are listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Technologies is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data science
  • big data
  • machine learning
  • artificial intelligence

Published Papers (13 papers)


Editorial


5 pages, 159 KiB  
Editorial
Editorial for the Special Issue “Data Science and Big Data in Biology, Physical Science and Engineering”
by Mohammed Mahmoud
Technologies 2024, 12(1), 8; https://doi.org/10.3390/technologies12010008 - 08 Jan 2024
Viewed by 1354
Abstract
Big Data analysis is one of the most contemporary areas of development and research in the present day [...] Full article

Research


16 pages, 293 KiB  
Article
Get Real Get Better: A Framework for Developing Agile Program Management in the U.S. Navy Supported by the Application of Advanced Data Analytics and AI
by Jonathan Haase, Peter B. Walker, Olivia Berardi and Waldemar Karwowski
Technologies 2023, 11(6), 165; https://doi.org/10.3390/technologies11060165 - 20 Nov 2023
Cited by 1 | Viewed by 2009
Abstract
This paper discusses the “Get Real Get Better” (GRGB) approach to implementing agile program management in the U.S. Navy, supported by advanced data analytics and artificial intelligence (AI). GRGB was designed as a set of foundational principles to advance Navy culture and support its core values. This article identifies a need for a more informed and efficient approach to program management by highlighting the benefits of implementing comprehensive data analytics that leverage recent advances in cloud computing and machine learning. The Jupiter enclave within Advana, implemented by the U.S. Navy, is also discussed. The presented approach represents a practical framework that cultivates a “Get Real Get Better” mindset for implementing agile program management in the U.S. Navy. Full article
18 pages, 2892 KiB  
Article
Deep Learning Techniques for Web-Based Attack Detection in Industry 5.0: A Novel Approach
by Abdu Salam, Faizan Ullah, Farhan Amin and Mohammad Abrar
Technologies 2023, 11(4), 107; https://doi.org/10.3390/technologies11040107 - 08 Aug 2023
Cited by 2 | Viewed by 3574
Abstract
As the manufacturing industry advances towards Industry 5.0, which heavily integrates advanced technologies such as cyber-physical systems, artificial intelligence, and the Internet of Things (IoT), the potential for web-based attacks increases. Cybersecurity concerns remain a crucial challenge for Industry 5.0 environments, where cyber-attacks can cause devastating consequences, including production downtime, data breaches, and even physical harm. To address this challenge, this research proposes an innovative deep-learning methodology for detecting web-based attacks in Industry 5.0. Convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformer models are examples of deep learning techniques that are investigated in this study for their potential to effectively classify attacks and identify anomalous behavior. The proposed transformer-based system outperforms traditional machine learning methods and existing deep learning approaches in terms of accuracy, precision, and recall, demonstrating the effectiveness of deep learning for intrusion detection in Industry 5.0. The study’s findings showcased the superiority of the proposed transformer-based system, outperforming previous approaches in accuracy, precision, and recall. This highlights the significant contribution of deep learning in addressing cybersecurity challenges in Industry 5.0 environments. This study contributes to advancing cybersecurity in Industry 5.0, ensuring the protection of critical infrastructure and sensitive data. Full article
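
To make the transformer-based detection idea concrete, the sketch below shows a tiny Transformer encoder classifying tokenized web requests as benign or malicious. It is an editorial illustration only, not the authors' architecture; the vocabulary size, sequence length, model dimensions, and random data are assumptions.

```python
# Minimal sketch (illustrative, not the paper's model): a small Transformer encoder
# that classifies tokenized HTTP requests into benign/malicious classes.
import torch
import torch.nn as nn

vocab_size, seq_len, d_model, n_classes = 1000, 32, 64, 2

class RequestClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))   # (batch, seq_len, d_model)
        return self.head(h.mean(dim=1))        # mean-pool over tokens, then classify

model = RequestClassifier()
tokens = torch.randint(0, vocab_size, (16, seq_len))   # dummy batch of tokenized requests
labels = torch.randint(0, n_classes, (16,))            # dummy benign/malicious labels
loss = nn.CrossEntropyLoss()(model(tokens), labels)
loss.backward()
print(f"loss: {loss.item():.3f}")
```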
Show Figures

Figure 1

10 pages, 217 KiB  
Article
Self-Directed and Self-Designed Learning: Integrating Imperative Topics in the Case of COVID-19
by Alireza Ebrahimi
Technologies 2023, 11(4), 85; https://doi.org/10.3390/technologies11040085 - 29 Jun 2023
Cited by 1 | Viewed by 1484
Abstract
Self-directed learning and self-design became unexpectedly popular and common during the COVID-19 era. Learners are encouraged to take charge of their learning and are often given the opportunity to independently design their learning experience. This research illustrates the use of technology in teaching and learning technology with a central theme of promoting self-directed learning with engaging self-design for both educators and learners. The technology used includes existing tools such as web page design, Learning Management Systems (LMS), project management tools, and basic programming foundations and concepts of big data and databases. In addition, end-users and developers can create their own tools with simple coding. Planning techniques, such as Visual Plan Construct Language with its embedded AI, are used to integrate course material and rubrics with time management. Educators may use project management tools instead. The research proposes a self-directed paradigm with self-designed resources using the existing technology with LMS modules, discussions, and self-tests. The research establishes its criteria for ensuring the quality of content and design, known as 7x2C. Additionally, other criteria for analysis, such as Design Thinking, are included. The approach is examined for a technology-based business course in creating an experiential learning system for COVID-19 awareness. Likewise, among other projects, an environment for educating learners about diabetes and obesity has been designed. The project is known as Sunchoke, which has a theme of Grow, Eat, and Heal. Educators can use their own content and rubrics to adapt this approach to their own customized teaching methods. Full article
24 pages, 2335 KiB  
Article
An Advanced Decision Tree-Based Deep Neural Network in Nonlinear Data Classification
by Mohammad Arifuzzaman, Md. Rakibul Hasan, Tasnia Jahan Toma, Samia Binta Hassan and Anup Kumar Paul
Technologies 2023, 11(1), 24; https://doi.org/10.3390/technologies11010024 - 01 Feb 2023
Cited by 2 | Viewed by 3734
Abstract
Deep neural networks (DNNs), the integration of neural networks (NNs) and deep learning (DL), have proven highly efficient in executing numerous complex tasks, such as data and image classification. Because the multilayer in a nonlinearly separable data structure is not transparent, it is critical to develop a specific data classification model from a new and unexpected dataset. In this paper, we propose a novel approach using the concepts of DNN and decision tree (DT) for classifying nonlinear data. We first developed a decision tree-based neural network (DTBNN) model. Next, we extend our model to a decision tree-based deep neural network (DTBDNN), in which the multiple hidden layers in DNN are utilized. Using DNN, the DTBDNN model achieved higher accuracy compared to the related and relevant approaches. Our proposal achieves the optimal trainable weights and bias to build an efficient model for nonlinear data classification by combining the benefits of DT and NN. By conducting in-depth performance evaluations, we demonstrate the effectiveness and feasibility of the proposal by achieving good accuracy over different datasets. Full article
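
The exact DTBNN/DTBDNN architecture is the paper's own, but the general idea of coupling a decision tree with a neural network can be sketched with a common stand-in technique: the tree's leaf indices are one-hot encoded and fed to a multilayer perceptron on a nonlinear dataset. Dataset, tree depth, and layer sizes below are illustrative assumptions.

```python
# Minimal sketch: decision-tree leaf indices as features for an MLP (a stand-in
# illustration of combining DT and NN, not the authors' DTBNN/DTBDNN model).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, noise=0.25, random_state=0)   # nonlinear toy data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
enc = OneHotEncoder(handle_unknown="ignore").fit(tree.apply(X_tr).reshape(-1, 1))

mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
mlp.fit(enc.transform(tree.apply(X_tr).reshape(-1, 1)), y_tr)
print("Test accuracy:", mlp.score(enc.transform(tree.apply(X_te).reshape(-1, 1)), y_te))
```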

19 pages, 2288 KiB  
Article
Data Model Design to Support Data-Driven IT Governance Implementation
by Vittoria Biagi and Angela Russo
Technologies 2022, 10(5), 106; https://doi.org/10.3390/technologies10050106 - 08 Oct 2022
Cited by 3 | Viewed by 3110
Abstract
Organizations must quickly adapt their processes to understand the dynamic nature of modern business environments. As highlighted in the literature, centralized governance supports decision-making and performance measurement processes in technology companies. For this reason, a reliable decision-making system with an integrated data model that enables the rapid collection and transformation of data stored in heterogeneous and different sources is needed. Therefore, this paper proposes the design of a data model to implement data-driven governance through a literature review of adopted approaches. The lack of a standardized procedure and a disconnection between theoretical frameworks and practical application has emerged. This paper documented the suggested approach following these steps: (i) mapping of monitoring requirements to the data structure, (ii) documentation of ER diagram design, and (iii) reporting dashboards used for monitoring and reporting. The paper helped fill the gaps highlighted in the literature by supporting the design and development of a DWH data model coupled with a BI system. The application prototype shows benefits for top management, particularly those responsible for governance and operations, especially for risk monitoring, audit compliance, communication, knowledge sharing on strategic areas of the company, and identification and implementation of performance improvements and optimizations. Full article
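
As an illustration of the kind of structure such a data-driven approach relies on, the sketch below builds a toy star schema (one KPI fact table plus time and organizational-unit dimensions) and runs a dashboard-style aggregation query. All table and column names are assumptions, not the paper's data model.

```python
# Minimal sketch of a star-schema DWH fragment for governance KPIs (illustrative only).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE dim_unit (unit_id INTEGER PRIMARY KEY, unit_name TEXT);
CREATE TABLE fact_kpi (
    time_id INTEGER REFERENCES dim_time(time_id),
    unit_id INTEGER REFERENCES dim_unit(unit_id),
    kpi_name TEXT, value REAL
);
""")
con.execute("INSERT INTO dim_time VALUES (1, '2022-01', 2022)")
con.execute("INSERT INTO dim_unit VALUES (1, 'IT Operations')")
con.execute("INSERT INTO fact_kpi VALUES (1, 1, 'audit_findings_open', 12.0)")

# Dashboard-style query: KPI values aggregated per unit and month.
for row in con.execute("""
    SELECT u.unit_name, t.month, f.kpi_name, AVG(f.value)
    FROM fact_kpi f JOIN dim_time t USING(time_id) JOIN dim_unit u USING(unit_id)
    GROUP BY u.unit_name, t.month, f.kpi_name"""):
    print(row)
```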

13 pages, 7056 KiB  
Article
Rough-Set-Theory-Based Classification with Optimized k-Means Discretization
by Teguh Handjojo Dwiputranto, Noor Akhmad Setiawan and Teguh Bharata Adji
Technologies 2022, 10(2), 51; https://doi.org/10.3390/technologies10020051 - 08 Apr 2022
Cited by 5 | Viewed by 2724
Abstract
The discretization of continuous attributes in a dataset is an essential step before the Rough-Set-Theory (RST)-based classification process is applied. There are many methods for discretization, but not many of them have linked the RST instruments from the beginning of the discretization process. The objective of this research is to propose a method to improve the accuracy and reliability of the RST-based classifier model by involving RST instruments at the beginning of the discretization process. In the proposed method, a k-means-based discretization method optimized with a genetic algorithm (GA) was introduced. Four datasets taken from UCI were selected to test the performance of the proposed method. The evaluation of the proposed discretization technique for RST-based classification is performed by comparing it to other discretization methods, i.e., equal-frequency and entropy-based. The performance comparison among these methods is measured by the number of bins and rules generated and by its accuracy, precision, and recall. A Friedman test continued with post hoc analysis is also applied to measure the significance of the difference in performance. The experimental results indicate that, in general, the performance of the proposed discretization method is significantly better than the other compared methods. Full article
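
A minimal sketch of the k-means discretization step follows, using scikit-learn's KBinsDiscretizer as a stand-in for the paper's GA-optimized variant; the dataset and the number of bins are illustrative assumptions.

```python
# Minimal sketch: k-means-based discretization of continuous attributes before a
# rule-based (rough-set-style) classifier is applied. Not the paper's GA-tuned method.
from sklearn.datasets import load_iris
from sklearn.preprocessing import KBinsDiscretizer

X, y = load_iris(return_X_y=True)
# strategy="kmeans" derives bin edges from 1-D k-means clusters per attribute;
# the number of bins per attribute is the quantity the paper optimizes with a GA.
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="kmeans")
X_discrete = disc.fit_transform(X).astype(int)
print(X_discrete[:5])
```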

15 pages, 841 KiB  
Article
A Novel Ensemble Machine Learning Approach for Bioarchaeological Sex Prediction
by Evan Muzzall
Technologies 2021, 9(2), 23; https://doi.org/10.3390/technologies9020023 - 01 Apr 2021
Cited by 4 | Viewed by 2723
Abstract
I present a novel machine learning approach to predict sex in the bioarchaeological record. Eighteen cranial interlandmark distances and five maxillary dental metric distances were recorded from n = 420 human skeletons from the necropolises at Alfedena (600–400 BCE) and Campovalano (750–200 BCE and 9–11th Centuries CE) in central Italy. A generalized low rank model (GLRM) was used to impute missing data and Area under the Curve—Receiver Operating Characteristic (AUC-ROC) with 20-fold stratified cross-validation was used to evaluate predictive performance of eight machine learning algorithms on different subsets of the data. Additional perspectives such as this one show strong potential for sex prediction in bioarchaeological and forensic anthropological contexts. Furthermore, GLRMs have the potential to handle missing data in ways previously unexplored in the discipline. Although results of this study look promising (highest AUC-ROC = 0.9722 for predicting binary male/female sex), the main limitation is that the sexes of the individuals included were not known but were estimated using standard macroscopic bioarchaeological methods. However, future research should apply this machine learning approach to known-sex reference samples in order to better understand its value, along with the more general contributions that machine learning can make to the reconstruction of past human lifeways. Full article
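
The evaluation protocol can be sketched as follows: only the 20-fold stratified cross-validated AUC-ROC setup mirrors the paper, while simple mean imputation stands in for the generalized low rank model and synthetic data stands in for the cranial and dental measurements.

```python
# Minimal sketch of the evaluation protocol only (illustrative stand-ins throughout).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=420, n_features=23, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan   # simulate missing skeletal measurements

pipe = make_pipeline(SimpleImputer(strategy="mean"),
                     LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=20, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(f"Mean AUC-ROC over 20 folds: {scores.mean():.3f}")
```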

Review


26 pages, 3485 KiB  
Review
Hyperparameter Optimization and Combined Data Sampling Techniques in Machine Learning for Customer Churn Prediction: A Comparative Analysis
by Mehdi Imani and Hamid Reza Arabnia
Technologies 2023, 11(6), 167; https://doi.org/10.3390/technologies11060167 - 26 Nov 2023
Cited by 1 | Viewed by 2556
Abstract
This paper explores the application of various machine learning techniques for predicting customer churn in the telecommunications sector. We utilized a publicly accessible dataset and implemented several models, including Artificial Neural Networks, Decision Trees, Support Vector Machines, Random Forests, Logistic Regression, and gradient boosting techniques (XGBoost, LightGBM, and CatBoost). To mitigate the challenges posed by imbalanced datasets, we adopted different data sampling strategies, namely SMOTE, SMOTE combined with Tomek Links, and SMOTE combined with Edited Nearest Neighbors. Moreover, hyperparameter tuning was employed to enhance model performance. Our evaluation employed standard metrics, such as Precision, Recall, F1-score, and the Receiver Operating Characteristic Area Under Curve (ROC AUC). In terms of the F1-score metric, CatBoost demonstrates superior performance compared to other machine learning models, achieving an outstanding 93% following the application of Optuna hyperparameter optimization. In the context of the ROC AUC metric, both XGBoost and CatBoost exhibit exceptional performance, recording remarkable scores of 91%. This achievement for XGBoost is attained after implementing a combination of SMOTE with Tomek Links, while CatBoost reaches this level of performance after the application of Optuna hyperparameter optimization. Full article
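
The resampling-plus-tuning pattern described above can be sketched as follows, with scikit-learn's GradientBoostingClassifier standing in for CatBoost/XGBoost and a synthetic imbalanced dataset standing in for the telecom churn data; parameter ranges are assumptions.

```python
# Minimal sketch: SMOTE + Tomek Links resampling combined with Optuna hyperparameter
# search (illustrative stand-ins, not the authors' exact pipeline or dataset).
import optuna
from imblearn.combine import SMOTETomek
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Resample only the training split so the test set keeps its natural imbalance.
X_res, y_res = SMOTETomek(random_state=0).fit_resample(X_tr, y_tr)

def objective(trial):
    model = GradientBoostingClassifier(
        n_estimators=trial.suggest_int("n_estimators", 100, 400),
        learning_rate=trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        max_depth=trial.suggest_int("max_depth", 2, 6),
        random_state=0,
    )
    model.fit(X_res, y_res)
    return f1_score(y_te, model.predict(X_te))

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print("Best F1:", study.best_value, "with params:", study.best_params)
```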

14 pages, 277 KiB  
Review
A Review of Deep Transfer Learning and Recent Advancements
by Mohammadreza Iman, Hamid Reza Arabnia and Khaled Rasheed
Technologies 2023, 11(2), 40; https://doi.org/10.3390/technologies11020040 - 14 Mar 2023
Cited by 71 | Viewed by 12189
Abstract
Deep learning has been the answer to many machine learning problems during the past two decades. However, it comes with two significant constraints: dependency on extensive labeled data and training costs. Transfer learning in deep learning, known as Deep Transfer Learning (DTL), attempts to reduce such reliance and costs by reusing obtained knowledge from a source data/task in training on a target data/task. Most applied DTL techniques are network/model-based approaches. These methods reduce the dependency of deep learning models on extensive training data and drastically decrease training costs. Moreover, the training cost reduction makes DTL viable on edge devices with limited resources. Like any new advancement, DTL methods have their own limitations, and a successful transfer depends on specific adjustments and strategies for different scenarios. This paper reviews the concept, definition, and taxonomy of deep transfer learning and well-known methods. It investigates the DTL approaches by reviewing applied DTL techniques in the past five years and a couple of experimental analyses of DTLs to discover the best practice for using DTL in different scenarios. Moreover, the limitations of DTLs (catastrophic forgetting dilemma and overly biased pre-trained models) are discussed, along with possible solutions and research trends. Full article
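
A minimal sketch of the network-based transfer pattern the review describes: reuse a pretrained backbone, freeze its weights, and train only a new task head. The backbone (ResNet-18, which downloads pretrained weights on first use) and the 10-class target task are assumptions for illustration.

```python
# Minimal sketch of network/model-based deep transfer learning with PyTorch/torchvision.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False                               # freeze source-task knowledge
backbone.fc = nn.Linear(backbone.fc.in_features, 10)      # new head for the target task

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch.
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))
loss = criterion(backbone(x), y)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.3f}")
```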

20 pages, 1248 KiB  
Review
Big Data in Biodiversity Science: A Framework for Engagement
by Tendai Musvuugwa, Muxe Gladmond Dlomu and Adekunle Adebowale
Technologies 2021, 9(3), 60; https://doi.org/10.3390/technologies9030060 - 17 Aug 2021
Cited by 6 | Viewed by 6253
Abstract
Despite best efforts, the loss of biodiversity has continued at a pace that constitutes a major threat to the efficient functioning of ecosystems. Curbing the loss of biodiversity and assessing its local and global trends requires a vast amount of datasets from a variety of sources. Although the means for generating, aggregating and analyzing big datasets to inform policies are now within the reach of the scientific community, the data-driven nature of a complex multidisciplinary field such as biodiversity science necessitates an overarching framework for engagement. In this review, we propose such a schematic based on the life cycle of data to interrogate the science. The framework considers data generation and collection, storage and curation, access and analysis and, finally, communication as distinct yet interdependent themes for engaging biodiversity science for the purpose of making evidenced-based decisions. We summarize historical developments in each theme, including the challenges and prospects, and offer some recommendations based on best practices. Full article

Other

10 pages, 1007 KiB  
Case Report
Dynamic Storage Location Assignment in Warehouses Using Deep Reinforcement Learning
by Constantin Waubert de Puiseau, Dimitri Tegomo Nanfack, Hasan Tercan, Johannes Löbbert-Plattfaut and Tobias Meisen
Technologies 2022, 10(6), 129; https://doi.org/10.3390/technologies10060129 - 11 Dec 2022
Cited by 2 | Viewed by 2977
Abstract
The warehousing industry is faced with increasing customer demands and growing global competition. A major factor in the efficient operation of warehouses is the strategic storage location assignment of arriving goods, termed the dynamic storage location assignment problem (DSLAP). This paper presents a real-world use case of the DSLAP, in which deep reinforcement learning (DRL) is used to derive a suitable storage location assignment strategy to decrease transportation costs within the warehouse. The DRL agent is trained on historic data of storage and retrieval operations gathered over one year of operation. The evaluation of the agent on new data of two months shows a 6.3% decrease in incurring costs compared to the currently utilized storage location assignment strategy which is based on manual ABC-classifications. Hence, DRL proves to be a competitive solution alternative for the DSLAP and related problems in the warehousing industry. Full article
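
A minimal sketch of the underlying idea, not the paper's system: a bandit-style Q-learning agent learns to assign arriving items to storage zones so that expected transport cost is minimized. The item classes, zones, costs, and retrieval frequencies are illustrative assumptions.

```python
# Minimal sketch: epsilon-greedy Q-learning on a toy storage location assignment task.
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_zones = 3, 4                      # item demand classes, storage zones
travel_cost = np.array([1.0, 2.0, 3.0, 4.0])   # cost of reaching each zone
retrievals = np.array([8, 3, 1])               # expected retrievals per demand class

Q = np.zeros((n_classes, n_zones))
alpha, epsilon = 0.1, 0.1
for step in range(20000):
    item = rng.integers(n_classes)             # state: class of the arriving item
    if rng.random() < epsilon:                 # epsilon-greedy choice of a zone
        zone = rng.integers(n_zones)
    else:
        zone = int(np.argmax(Q[item]))
    reward = -travel_cost[zone] * retrievals[item]     # negative transport cost
    Q[item, zone] += alpha * (reward - Q[item, zone])  # one-step (bandit-style) update

print("Learned zone per class:", Q.argmax(axis=1))     # fast movers end up in cheap zones
```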

38 pages, 3783 KiB  
Case Report
Business Intelligence’s Self-Service Tools Evaluation
by Jordina Orcajo Hernández and Pau Fonseca i Casas
Technologies 2022, 10(4), 92; https://doi.org/10.3390/technologies10040092 - 10 Aug 2022
Cited by 2 | Viewed by 1826
Abstract
The software selection process in the context of a big company is not an easy task. In the Business Intelligence area, this decision is critical, since the resources needed to implement the tool are huge and imply the participation of all organization actors. We propose to adopt the systemic quality model to perform a neutral comparison between four business intelligence self-service tools. To assess the quality, we consider eight characteristics and eighty-two metrics. We built a methodology to evaluate self-service BI tools, adapting the systemic quality model. As an example, we evaluated four tools that were selected from all business intelligence platforms, following a rigorous methodology. Through the assessment, we obtained two tools with the maximum quality level. To obtain the differences between them, we were more restrictive increasing the level of satisfaction. Finally, we got a unique tool with the maximum quality level, while the other one was rejected according to the rules established in the methodology. The methodology works well for this type of software, helping in the detailed analysis and neutral selection of the final software to be used for the implementation. Full article
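
A minimal sketch of the weighted-scoring logic behind such a quality comparison follows, with made-up characteristics, weights, scores, and satisfaction threshold rather than the paper's eight characteristics and eighty-two metrics.

```python
# Minimal sketch: rank candidate BI tools by a weighted quality/satisfaction score.
import numpy as np

tools = ["Tool A", "Tool B", "Tool C", "Tool D"]
characteristics = ["functionality", "usability", "efficiency", "portability"]
weights = np.array([0.4, 0.3, 0.2, 0.1])       # illustrative importance weights
scores = np.array([[0.9, 0.8, 0.7, 0.6],       # rows: tools, columns: characteristics
                   [0.8, 0.9, 0.8, 0.7],
                   [0.6, 0.7, 0.9, 0.8],
                   [0.7, 0.6, 0.6, 0.9]])

satisfaction = scores @ weights
threshold = 0.78                               # raise this to discriminate between top tools
for name, s in sorted(zip(tools, satisfaction), key=lambda t: -t[1]):
    verdict = "accepted" if s >= threshold else "rejected"
    print(f"{name}: {s:.2f} ({verdict})")
```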
