Editor’s Choice Articles

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.

16 pages, 382 KiB  
Article
ZeroTrustBlock: Enhancing Security, Privacy, and Interoperability of Sensitive Data through ZeroTrust Permissioned Blockchain
by Pratik Thantharate and Anurag Thantharate
Big Data Cogn. Comput. 2023, 7(4), 165; https://doi.org/10.3390/bdcc7040165 - 17 Oct 2023
Cited by 9 | Viewed by 1916
Abstract
With the digitization of healthcare, an immense amount of sensitive medical data are generated and shared between various healthcare stakeholders. However, traditional health data management mechanisms present interoperability, security, and privacy challenges. The centralized nature of current health information systems leads to single points of failure, making the data vulnerable to cyberattacks. Patients also have little control over their medical records, raising privacy concerns. Blockchain technology presents a promising solution to these challenges through its decentralized, transparent, and immutable properties. This research proposes ZeroTrustBlock, a comprehensive blockchain framework for secure and private health information exchange. The decentralized ledger enhances integrity, while permissioned access and smart contracts enable patient-centric control over medical data sharing. A hybrid on-chain and off-chain storage model balances transparency with confidentiality. Integration gateways bridge ZeroTrustBlock protocols with existing systems like EHRs. Implemented on Hyperledger Fabric, ZeroTrustBlock demonstrates substantial security improvements over mainstream databases via cryptographic mechanisms, formal privacy-preserving protocols, and access policies enacting patient consent. Results validate the architecture's effectiveness, achieving 14,200 TPS average throughput, 480 ms average latency for 100,000 concurrent transactions, and linear scalability up to 20 nodes. Enhancements to performance, advanced cryptography, and real-world pilot deployments remain future work. Overall, ZeroTrustBlock provides a robust application of blockchain capabilities to transform security, privacy, interoperability, and patient agency in health data management. Full article
(This article belongs to the Special Issue Big Data in Health Care Information Systems)
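The hybrid on-chain/off-chain storage and consent-based access described above can be sketched in a few lines of Python. Everything here is a toy stand-in (plain lists and dicts in place of Hyperledger Fabric, hypothetical record fields), intended only to illustrate the pattern of keeping record hashes and consent policies on-chain while the data itself stays off-chain:

```python
import hashlib
import json

# Toy ledger: in ZeroTrustBlock this role is played by Hyperledger Fabric.
chain = []          # on-chain: only hashes and consent policies
off_chain = {}      # off-chain: the actual medical records

def store_record(patient, record, allowed_readers):
    payload = json.dumps(record, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()
    off_chain[digest] = record
    chain.append({"patient": patient, "hash": digest,
                  "consent": set(allowed_readers)})
    return digest

def read_record(digest, requester):
    entry = next(e for e in chain if e["hash"] == digest)
    if requester not in entry["consent"]:
        raise PermissionError(f"{requester} lacks patient consent")
    record = off_chain[digest]
    # Integrity check: recompute the hash against the on-chain value.
    payload = json.dumps(record, sort_keys=True).encode()
    assert hashlib.sha256(payload).hexdigest() == entry["hash"]
    return record

h = store_record("alice", {"dx": "flu"}, allowed_readers=["dr_bob"])
print(read_record(h, "dr_bob"))   # consent granted, hash verified
```

A real deployment would replace the `chain` list with chaincode and enforce consent via smart contracts, as the abstract describes.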

21 pages, 5814 KiB  
Article
Intelligent Method for Classifying the Level of Anthropogenic Disasters
by Khrystyna Lipianina-Honcharenko, Carsten Wolff, Anatoliy Sachenko, Ivan Kit and Diana Zahorodnia
Big Data Cogn. Comput. 2023, 7(3), 157; https://doi.org/10.3390/bdcc7030157 - 21 Sep 2023
Cited by 1 | Viewed by 1476
Abstract
Anthropogenic disasters pose a challenge to management in the modern world. At the same time, it is important to have accurate and timely information to assess the level of danger and take appropriate measures to eliminate disasters. Therefore, the purpose of the paper is to develop an effective method for assessing the level of anthropogenic disasters based on information from witnesses to the event. For this purpose, a conceptual model for assessing the consequences of anthropogenic disasters is proposed, the main components of which are the following ones: the analysis of collected data, modeling and assessment of their consequences. The main characteristics of the intelligent method for classifying the level of anthropogenic disasters are considered, in particular, exploratory data analysis using the EDA method, classification based on textual data using SMOTE, and data classification by the ensemble method of machine learning using boosting. The experimental results confirmed that for textual data, the best classification is at level V and level I with an error of 0.97 and 0.94, respectively, and the average error estimate is 0.68. For quantitative data, the classification accuracy of Potential Accident Level relative to Industry Sector is 77%, and the f1-score is 0.88, which indicates a fairly high accuracy of the model. The architecture of a mobile application for classifying the level of anthropogenic disasters has been developed, which reduces the time required to assess consequences of danger in the region. In addition, the proposed approach ensures interaction with dynamic and uncertain environments, which makes it an effective tool for classifying. Full article
(This article belongs to the Special Issue Quality and Security of Critical Infrastructure Systems)
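The abstract's use of SMOTE to rebalance textual training data rests on a simple idea: synthesize minority-class samples by interpolating between existing ones. A minimal NumPy sketch of that interpolation step (toy data; real SMOTE restricts interpolation to k-nearest neighbours):

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_like(X_min, n_new):
    """Create synthetic minority samples by interpolating between a
    minority point and another randomly chosen minority point (a
    simplification: real SMOTE interpolates toward k-nearest
    neighbours only)."""
    idx_a = rng.integers(0, len(X_min), n_new)
    idx_b = rng.integers(0, len(X_min), n_new)
    lam = rng.random((n_new, 1))          # interpolation weights in [0, 1)
    return X_min[idx_a] + lam * (X_min[idx_b] - X_min[idx_a])

X_min = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]])  # toy minority class
synth = smote_like(X_min, 5)
print(synth.shape)  # (5, 2)
```

Each synthetic point lies on a segment between two existing minority samples, so the oversampled class stays within its original region of feature space.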

27 pages, 3194 KiB  
Article
Predicting Forex Currency Fluctuations Using a Novel Bio-Inspired Modular Neural Network
by Christos Bormpotsis, Mohamed Sedky and Asma Patel
Big Data Cogn. Comput. 2023, 7(3), 152; https://doi.org/10.3390/bdcc7030152 - 15 Sep 2023
Viewed by 3852
Abstract
In the realm of foreign exchange (Forex) market predictions, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been commonly employed. However, these models often exhibit instability due to vulnerability to data perturbations attributed to their monolithic architecture. Hence, this study proposes a novel neuroscience-informed modular network that harnesses closing prices and sentiments from Yahoo Finance and Twitter APIs. Compared to monolithic methods, the objective is to advance the effectiveness of predicting price fluctuations in Euro to British Pound Sterling (EUR/GBP). The proposed model offers a unique methodology based on a reinvigorated modular CNN, replacing pooling layers with orthogonal kernel initialisation RNNs coupled with Monte Carlo Dropout (MCoRNNMCD). It integrates two pivotal modules: a convolutional simple RNN and a convolutional Gated Recurrent Unit (GRU). These modules incorporate orthogonal kernel initialisation and Monte Carlo Dropout techniques to mitigate overfitting, assessing each module’s uncertainty. The synthesis of these parallel feature extraction modules culminates in a three-layer Artificial Neural Network (ANN) decision-making module. Established on objective metrics like the Mean Square Error (MSE), rigorous evaluation underscores the proposed MCoRNNMCD–ANN’s exceptional performance. MCoRNNMCD–ANN surpasses single CNNs, LSTMs, GRUs, and the state-of-the-art hybrid BiCuDNNLSTM, CLSTM, CNN–LSTM, and LSTM–GRU in predicting hourly EUR/GBP closing price fluctuations. Full article
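Monte Carlo Dropout, which the modules above use to assess uncertainty, simply keeps dropout active at inference time and treats the spread of repeated stochastic forward passes as an uncertainty estimate. A minimal NumPy sketch with a hypothetical one-hidden-layer regressor (not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(42)

# Tiny fixed one-hidden-layer regressor (weights are illustrative).
W1 = rng.normal(size=(8, 1)); b1 = np.zeros(8)
W2 = rng.normal(size=(1, 8)); b2 = np.zeros(1)

def forward(x, p_drop=0.2):
    h = np.tanh(W1 @ x + b1)
    # Dropout stays ACTIVE at inference time: each pass is stochastic.
    mask = rng.random(h.shape) > p_drop
    h = h * mask / (1.0 - p_drop)         # inverted-dropout scaling
    return float(W2 @ h + b2)

x = np.array([0.5])
passes = np.array([forward(x) for _ in range(200)])
mean, std = passes.mean(), passes.std()
print(f"prediction {mean:.3f} with spread {std:.3f}")  # std ~ uncertainty
```

The same trick, applied per module, is what lets the architecture weigh each feature extractor by its own uncertainty.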

28 pages, 4173 KiB  
Review
Innovative Robotic Technologies and Artificial Intelligence in Pharmacy and Medicine: Paving the Way for the Future of Health Care—A Review
by Maryna Stasevych and Viktor Zvarych
Big Data Cogn. Comput. 2023, 7(3), 147; https://doi.org/10.3390/bdcc7030147 - 30 Aug 2023
Cited by 5 | Viewed by 5071
Abstract
The future of innovative robotic technologies and artificial intelligence (AI) in pharmacy and medicine is promising, with the potential to revolutionize various aspects of health care. These advances aim to increase efficiency, improve patient outcomes, and reduce costs while addressing pressing challenges such as personalized medicine and the need for more effective therapies. This review examines the major advances in robotics and AI in the pharmaceutical and medical fields, analyzing the advantages, obstacles, and potential implications for future health care. In addition, prominent organizations and research institutions leading the way in these technological advancements are highlighted, showcasing their pioneering efforts in creating and utilizing state-of-the-art robotic solutions in pharmacy and medicine. By thoroughly analyzing the current state of robotic technologies in health care and exploring the possibilities for further progress, this work aims to provide readers with a comprehensive understanding of the transformative power of robotics and AI in the evolution of the healthcare sector. Striking a balance between embracing technology and preserving the human touch, investing in R&D, and establishing regulatory frameworks within ethical guidelines will shape a future for robotics and AI systems. The future of pharmacy and medicine is in the seamless integration of robotics and AI systems to benefit patients and healthcare providers. Full article

17 pages, 4460 KiB  
Article
An End-to-End Online Traffic-Risk Incident Prediction in First-Person Dash Camera Videos
by Hilmil Pradana
Big Data Cogn. Comput. 2023, 7(3), 129; https://doi.org/10.3390/bdcc7030129 - 06 Jul 2023
Cited by 3 | Viewed by 1567
Abstract
Predicting traffic-risk incidents in first-person video helps to ensure that a safe reaction can occur before the incident happens, for a wide range of driving scenarios and conditions. One challenge to building advanced driver assistance systems is to create an early warning system for the driver to react safely and accurately while perceiving the diversity of traffic-risk predictions in real-world applications. In this paper, we aim to bridge the gap by investigating two key research questions regarding the driver's current status of driving through online videos and the types of other moving objects that lead to dangerous situations. To address these problems, we propose an end-to-end two-stage architecture: in the first stage, unsupervised learning is applied to collect all suspicious events in actual driving; in the second stage, supervised learning is used to classify all suspicious events from the first stage into a common event type. To enrich the classification types, metadata from the first stage is passed to the second stage to handle the data limitation while training our classification model. In the online setting, our method runs at 9.60 fps on average with a standard deviation of 1.44 fps. Our quantitative evaluation shows that our method reaches 81.87% and 73.43% average F1-scores on labeled data of the CST-S3D and real driving datasets, respectively. Furthermore, the proposed method has the potential to assist distribution companies in evaluating the driving performance of their drivers by automatically monitoring near-miss events and analyzing driving patterns for training programs to reduce future accidents. Full article
(This article belongs to the Special Issue Deep Network Learning and Its Applications)
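The two-stage design above (unsupervised event collection, then supervised event typing) can be illustrated with a deliberately tiny stand-in: outlier detection on a deceleration signal as stage one, and a hand-written rule in place of the trained classifier as stage two. All signal values and thresholds here are hypothetical:

```python
import statistics

# Stage 1 (unsupervised stand-in): flag frames whose deceleration is an
# outlier relative to the whole trip (z-score threshold, hypothetical).
decel = [0.1, 0.2, 0.1, 3.5, 0.2, 0.1, 2.8, 0.1]   # m/s^2, toy trip
mu, sigma = statistics.mean(decel), statistics.pstdev(decel)
suspicious = [i for i, d in enumerate(decel) if (d - mu) / sigma > 1.0]

# Stage 2 (supervised stand-in): map each flagged event to an event
# type; a hand-written rule here replaces the paper's trained classifier.
def classify(d):
    return "hard_brake" if d > 3.0 else "near_miss"

events = {i: classify(decel[i]) for i in suspicious}
print(events)  # frame index -> event type for the flagged frames
```

The real system works on video features rather than a scalar signal, but the division of labour (flag first, label second) is the same.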

24 pages, 4621 KiB  
Article
Cognitive Network Science Reveals Bias in GPT-3, GPT-3.5 Turbo, and GPT-4 Mirroring Math Anxiety in High-School Students
by Katherine Abramski, Salvatore Citraro, Luigi Lombardi, Giulio Rossetti and Massimo Stella
Big Data Cogn. Comput. 2023, 7(3), 124; https://doi.org/10.3390/bdcc7030124 - 27 Jun 2023
Cited by 8 | Viewed by 4410
Abstract
Large Language Models (LLMs) are becoming increasingly integrated into our lives. Hence, it is important to understand the biases present in their outputs in order to avoid perpetuating harmful stereotypes, which originate in our own flawed ways of thinking. This challenge requires developing new benchmarks and methods for quantifying affective and semantic bias, keeping in mind that LLMs act as psycho-social mirrors that reflect the views and tendencies that are prevalent in society. One such tendency that has harmful negative effects is the global phenomenon of anxiety toward math and STEM subjects. In this study, we introduce a novel application of network science and cognitive psychology to understand biases towards math and STEM fields in LLMs from ChatGPT, such as GPT-3, GPT-3.5, and GPT-4. Specifically, we use behavioral forma mentis networks (BFMNs) to understand how these LLMs frame math and STEM disciplines in relation to other concepts. We use data obtained by probing the three LLMs in a language generation task that has previously been applied to humans. Our findings indicate that LLMs have negative perceptions of math and STEM fields, associating math with negative concepts in 6 cases out of 10. We observe significant differences across OpenAI’s models: newer versions (i.e., GPT-4) produce 5× semantically richer, more emotionally polarized perceptions with fewer negative associations compared to older versions and N=159 high-school students. These findings suggest that advances in the architecture of LLMs may lead to increasingly less biased models that could even perhaps someday aid in reducing harmful stereotypes in society rather than perpetuating them. Full article
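A behavioural forma mentis network is a graph of free associations whose nodes carry valence ratings, so the "negative associations of math" measurement can be illustrated with plain dictionaries. All associations and valence values below are invented for illustration, not the study's data:

```python
# Toy behavioural forma mentis network: nodes are concepts elicited in a
# free-association task, edges are associations, and each concept
# carries a valence rating (negative / neutral / positive).
edges = {
    "math": ["anxiety", "logic", "failure", "numbers", "boredom"],
    "science": ["curiosity", "logic", "numbers"],
}
valence = {"anxiety": -1, "logic": +1, "failure": -1, "numbers": 0,
           "boredom": -1, "curiosity": +1}

def negative_share(concept):
    """Fraction of a concept's neighbours that carry negative valence."""
    neigh = edges[concept]
    return sum(valence[n] < 0 for n in neigh) / len(neigh)

print(f"math: {negative_share('math'):.0%} negative associations")
```

Comparing this share across models and against human participants is, in essence, what the study's BFMN analysis does at scale.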

21 pages, 7048 KiB  
Article
Molecular Structure-Based Prediction of Absorption Maxima of Dyes Using ANN Model
by Neeraj Tomar, Geeta Rani, Vijaypal Singh Dhaka, Praveen K. Surolia, Kalpit Gupta, Eugenio Vocaturo and Ester Zumpano
Big Data Cogn. Comput. 2023, 7(2), 115; https://doi.org/10.3390/bdcc7020115 - 08 Jun 2023
Cited by 2 | Viewed by 1976
Abstract
The exponentially growing energy requirements and, in turn, extensive depletion of non-restorable sources of energy are a major cause of concern. Restorable energy sources such as solar cells can be used as an alternative. However, their low efficiency is a barrier to their practical use. This has prompted the research community to design efficient solar cells. Based on studies of efficacy, design feasibility, and cost of fabrication, dye-sensitized solar cells (DSSCs) show advantages over other photovoltaic solar cells. However, fabricating DSSCs in a laboratory and then assessing their characteristics is a costly affair. Researchers have applied techniques of computational chemistry, such as Time-Dependent Density Functional Theory and ab initio methods, to define the structure and electronic properties of dyes without synthesizing them. However, the inability of descriptors to provide an intuitive physical depiction of the effect of all parameters is a limitation of these approaches. The proven potential of neural network models in data analysis, pattern recognition, and object detection motivated researchers to extend their applicability to predicting the absorption maxima (λmax) of dyes. The objective of this research is to develop an ANN-based QSPR model for correctly predicting the value of λmax for inorganic ruthenium complex dyes used in DSSCs. Furthermore, it demonstrates the impact of different activation functions, optimizers, and loss functions on the prediction accuracy of λmax, and showcases the impact of atomic weight, the types of bonds between constituents of the dye molecule, and the molecular weight of the dye molecule on the value of λmax. The experimental results proved that the value of λmax varies with changes in constituent atoms and types of bonds in a dye molecule. In addition, the model minimizes the difference between the experimental and calculated values of the absorption maxima, and a comparison with existing models confirmed the superiority of the proposed model. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)
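As a sketch of the QSPR idea (molecular descriptors in, λmax out), the snippet below fits a linear least-squares model in place of the paper's ANN. The descriptors and λmax values are invented for illustration and should not be read as the paper's data:

```python
import numpy as np

# Hypothetical descriptors per dye: [molecular weight, # double bonds,
# # Ru-N bonds]; targets are illustrative absorption maxima in nm
# (NOT values from the paper).
X = np.array([[705.0, 8, 4],
              [742.0, 10, 4],
              [890.0, 12, 6],
              [610.0, 6, 4]])
y = np.array([534.0, 541.0, 560.0, 520.0])

# A linear least-squares stand-in for the paper's ANN-based QSPR model.
A = np.hstack([X, np.ones((len(X), 1))])       # add bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_lambda_max(descriptors):
    """Predict λmax (nm) from a descriptor vector under the toy model."""
    return float(np.append(descriptors, 1.0) @ w)

print(round(predict_lambda_max([705.0, 8, 4]), 1))
```

An ANN replaces the linear map with a learned nonlinear one, which is what lets it capture interactions between atomic composition, bond types, and molecular weight that a linear QSPR cannot.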

13 pages, 749 KiB  
Article
Massive Parallel Alignment of RNA-seq Reads in Serverless Computing
by Pietro Cinaglia, José Luis Vázquez-Poletti and Mario Cannataro
Big Data Cogn. Comput. 2023, 7(2), 98; https://doi.org/10.3390/bdcc7020098 - 15 May 2023
Cited by 3 | Viewed by 1687
Abstract
In recent years, the use of Cloud infrastructures for data processing has proven useful, with a computing potential that is not affected by the limitations of a local infrastructure. In this context, Serverless computing is the fastest-growing Cloud service model due to its auto-scaling methodologies, reliability, and fault tolerance. We present a solution based on an in-house Serverless infrastructure, which is able to perform large-scale RNA-seq data analysis focused on the mapping of sequencing reads to a reference genome. The main contribution is bringing the computation of genomic data into serverless computing, focusing on RNA-seq read mapping to a reference genome, as this is the most time-consuming task in some pipelines. The proposed solution handles massively parallel instances to maximize efficiency in terms of running time. We evaluated the performance of our solution in two main tests, both based on the mapping of RNA-seq reads to the human GRCh38 reference genome. Our experiments demonstrated reductions in running time of 79.838%, 90.079%, and 96.382% compared to local environments with 16, 8, and 4 virtual cores, respectively. Furthermore, serverless limitations were investigated. Full article
(This article belongs to the Special Issue Data-Based Bioinformatics and Applications)
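The core of the approach, splitting reads into chunks and mapping each chunk in an independent function invocation, can be sketched locally with a thread pool standing in for serverless functions. The "reference" and the "aligner" below are toys (exact substring search instead of a real aligner):

```python
from concurrent.futures import ThreadPoolExecutor

REFERENCE = "ACGTACGTTAGC"   # toy stand-in for the GRCh38 reference

def map_chunk(reads):
    """Stand-in for one serverless invocation: 'align' a chunk of reads
    by exact substring search (a real pipeline would invoke a proper
    RNA-seq aligner here)."""
    return {r: REFERENCE.find(r) for r in reads}

reads = ["ACGT", "TAGC", "GGGG", "CGTT"]
chunks = [reads[i::2] for i in range(2)]       # split across "functions"

with ThreadPoolExecutor(max_workers=2) as pool:
    results = {}
    for part in pool.map(map_chunk, chunks):
        results.update(part)

print(results)   # position of each read in the reference (-1 = unmapped)
```

Because each chunk is independent, the number of parallel invocations can scale with the input, which is exactly the property the serverless model exploits.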

22 pages, 930 KiB  
Systematic Review
A Systematic Review of Blockchain Technology Adoption Barriers and Enablers for Smart and Sustainable Agriculture
by Gopi Krishna Akella, Santoso Wibowo, Srimannarayana Grandhi and Sameera Mubarak
Big Data Cogn. Comput. 2023, 7(2), 86; https://doi.org/10.3390/bdcc7020086 - 04 May 2023
Cited by 9 | Viewed by 3047
Abstract
Smart and sustainable agricultural practices are more complex than those of other industries, as production depends on many pre- and post-harvesting factors which are difficult to predict and control. Previous studies have shown that technologies such as blockchain, along with sustainable practices, can achieve smart and sustainable agriculture. These studies state that a reliable and trustworthy environment among the intermediaries throughout the agrifood supply chain is needed to achieve sustainability. However, there are limited studies on blockchain technology adoption for smart and sustainable agriculture. Therefore, this systematic review uses the PRISMA technique to explore the barriers and enablers of blockchain adoption for smart and sustainable agriculture. Data were collected using exhaustive selection criteria and filters to evaluate the barriers and enablers of blockchain technology for smart and sustainable agriculture. The results identify, on the one hand, adoption enablers such as stakeholder collaboration, enhanced customer trust, and democratization and, on the other hand, barriers such as a lack of global standards, industry-level best practices, and policies for blockchain adoption in the agrifood sector. The outcome of this review highlights the adoption barriers over the enablers of blockchain technology for smart and sustainable agriculture. Furthermore, several recommendations and implications are presented for addressing knowledge gaps for successful implementation. Full article

20 pages, 3726 KiB  
Article
DLBCNet: A Deep Learning Network for Classifying Blood Cells
by Ziquan Zhu, Zeyu Ren, Siyuan Lu, Shuihua Wang and Yudong Zhang
Big Data Cogn. Comput. 2023, 7(2), 75; https://doi.org/10.3390/bdcc7020075 - 14 Apr 2023
Cited by 3 | Viewed by 2178
Abstract
Background: Blood is responsible for delivering nutrients to various organs and carries important information about the health of the human body. Therefore, the analysis of blood can indirectly help doctors judge a person's physical state. Recently, researchers have applied deep learning (DL) to the automatic analysis of blood cells. However, there are still some deficiencies in these models. Methods: To cope with these issues, we propose a novel network for the multi-classification of blood cells, called DLBCNet. A new model specific to blood cells (BCGAN) is designed to generate synthetic images. The pre-trained ResNet50 is implemented as the backbone model, which serves as the feature extractor. The extracted features are fed to the proposed ETRN to improve the multi-classification performance of blood cells. Results: The average accuracy, average sensitivity, average precision, average specificity, and average f1-score of the proposed model are 95.05%, 93.25%, 97.75%, 93.72%, and 95.38%, respectively. Conclusions: The proposed model surpasses other state-of-the-art methods in reported classification results. Full article
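The averaged sensitivity, precision, specificity, and f1-score reported above are macro averages of per-class scores computed from a confusion matrix. A minimal sketch with an invented 3-class matrix, purely to show how such averages are derived:

```python
import numpy as np

# Toy 3-class confusion matrix (rows = true class, cols = predicted);
# values are illustrative, not the paper's blood-cell results.
cm = np.array([[50,  2,  1],
               [ 3, 45,  4],
               [ 0,  5, 40]])

def macro_metrics(cm):
    """Macro-averaged one-vs-rest metrics from a confusion matrix."""
    n = cm.sum()
    out = {"sensitivity": [], "precision": [], "specificity": [], "f1": []}
    for k in range(len(cm)):
        tp = cm[k, k]
        fn = cm[k].sum() - tp          # true class k, predicted otherwise
        fp = cm[:, k].sum() - tp       # predicted k, true class otherwise
        tn = n - tp - fn - fp
        se = tp / (tp + fn)
        pr = tp / (tp + fp)
        out["sensitivity"].append(se)
        out["precision"].append(pr)
        out["specificity"].append(tn / (tn + fp))
        out["f1"].append(2 * pr * se / (pr + se))
    return {m: float(np.mean(v)) for m, v in out.items()}

print({m: round(v, 3) for m, v in macro_metrics(cm).items()})
```

Each class is scored one-vs-rest and the per-class scores are averaged, which is why a model can have high average specificity while its average sensitivity is noticeably lower.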

26 pages, 895 KiB  
Review
Predicting Colorectal Cancer Using Machine and Deep Learning Algorithms: Challenges and Opportunities
by Dabiah Alboaneen, Razan Alqarni, Sheikah Alqahtani, Maha Alrashidi, Rawan Alhuda, Eyman Alyahyan and Turki Alshammari
Big Data Cogn. Comput. 2023, 7(2), 74; https://doi.org/10.3390/bdcc7020074 - 13 Apr 2023
Cited by 7 | Viewed by 4741
Abstract
Colorectal cancer is one of the three most serious and deadly cancers in the world. As with any cancer, the most crucial stage is early diagnosis. In the medical industry, artificial intelligence (AI) has recently made tremendous strides and is showing promise for clinical applications. Machine learning (ML) and deep learning (DL) applications have recently gained popularity in the analysis of medical texts and images due to the benefits and achievements they have made in the early diagnosis of cancerous tissues and organs. In this paper, we systematically review the state-of-the-art research on AI-based ML and DL techniques applied to the modeling of colorectal cancer. Research papers in the field of colorectal cancer that use ML and DL techniques are collected and then classified into three categories: the aim of the prediction, the method of the prediction, and the data samples. Following that, a thorough summary and a list of the studies gathered under each topic are provided. We conclude our study with a critical discussion of the challenges and opportunities in colorectal cancer prediction using ML and DL techniques, concentrating on both technical and medical points of view. Finally, we believe that our study will be helpful to scientists who are considering employing ML and DL methods to diagnose colorectal cancer. Full article

16 pages, 355 KiB  
Article
The Role of ChatGPT in Data Science: How AI-Assisted Conversational Interfaces Are Revolutionizing the Field
by Hossein Hassani and Emmanuel Sirmal Silva
Big Data Cogn. Comput. 2023, 7(2), 62; https://doi.org/10.3390/bdcc7020062 - 27 Mar 2023
Cited by 71 | Viewed by 30469
Abstract
ChatGPT, a conversational AI interface that utilizes natural language processing and machine learning algorithms, is taking the world by storm and is the buzzword across many sectors today. Given the likely impact of this model on data science, through this perspective article, we seek to provide an overview of the potential opportunities and challenges associated with using ChatGPT in data science, provide readers with a snapshot of its advantages, and stimulate interest in its use for data science projects. The paper discusses how ChatGPT can assist data scientists in automating various aspects of their workflow, including data cleaning and preprocessing, model training, and result interpretation. It also highlights how ChatGPT has the potential to provide new insights and improve decision-making processes by analyzing unstructured data. We then examine the advantages of ChatGPT’s architecture, including its ability to be fine-tuned for a wide range of language-related tasks and generate synthetic data. Limitations and issues are also addressed, particularly around concerns about bias and plagiarism when using ChatGPT. Overall, the paper concludes that the benefits outweigh the costs and ChatGPT has the potential to greatly enhance the productivity and accuracy of data science workflows and is likely to become an increasingly important tool for intelligence augmentation in the field of data science. ChatGPT can assist with a wide range of natural language processing tasks in data science, including language translation, sentiment analysis, and text classification. However, while ChatGPT can save time and resources compared to training a model from scratch, and can be fine-tuned for specific use cases, it may not perform well on certain tasks if it has not been specifically trained for them. Additionally, the output of ChatGPT may be difficult to interpret, which could pose challenges for decision-making in data science applications. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)
33 pages, 12116 KiB  
Article
MalBERTv2: Code Aware BERT-Based Model for Malware Identification
by Abir Rahali and Moulay A. Akhloufi
Big Data Cogn. Comput. 2023, 7(2), 60; https://doi.org/10.3390/bdcc7020060 - 24 Mar 2023
Cited by 6 | Viewed by 4192
Abstract
To proactively mitigate malware threats, cybersecurity tools, such as anti-virus and anti-malware software, as well as firewalls, require frequent updates and proactive implementation. However, processing the vast amounts of dataset examples can be overwhelming when relying solely on traditional methods. In cybersecurity workflows, recent advances in natural language processing (NLP) models can aid in proactively detecting various threats. In this paper, we present a novel approach for representing the relevance and significance of the Malware/Goodware (MG) datasets, through the use of a pre-trained language model called MalBERTv2. Our model is trained on publicly available datasets, with a focus on the source code of the apps by extracting the top-ranked files that present the most relevant information. These files are then passed through a pre-tokenization feature generator, and the resulting keywords are used to train the tokenizer from scratch. Finally, we apply a classifier using bidirectional encoder representations from transformers (BERT) as a layer within the model pipeline. The performance of our model is evaluated on different datasets, achieving a weighted f1 score ranging from 82% to 99%. Our results demonstrate the effectiveness of our approach for proactively detecting malware threats using NLP techniques. Full article
(This article belongs to the Special Issue Artificial Intelligence and Natural Language Processing)
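The pre-tokenization step described above (rank source files by how informative they are, then harvest frequent keywords to train a tokenizer) can be illustrated with plain Python. The file contents, the ranking heuristic, and the keyword counts here are hypothetical simplifications of the paper's feature generator:

```python
from collections import Counter
import re

# Toy app "files" (hypothetical snippets, not a real MG dataset).
files = {
    "Main.java": "startActivity sendTextMessage getDeviceId loop render",
    "Ui.java":   "render layout button click render layout",
    "Net.java":  "openConnection sendTextMessage exec getDeviceId exec",
}

def tokenize(src):
    return re.findall(r"[A-Za-z]+", src)

# Rank files by how much vocabulary they contribute (a crude stand-in
# for the paper's top-ranked-file selection), then keep the most
# frequent keywords from those files for tokenizer training.
ranked = sorted(files, key=lambda f: len(set(tokenize(files[f]))),
                reverse=True)
top_files = ranked[:2]
counts = Counter(tok for f in top_files for tok in tokenize(files[f]))
keywords = [w for w, _ in counts.most_common(5)]
print(top_files, keywords)
```

In the actual pipeline, these harvested keywords seed a tokenizer trained from scratch, whose output then feeds the BERT-based classifier.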

19 pages, 10184 KiB  
Article
Recognizing Road Surface Traffic Signs Based on Yolo Models Considering Image Flips
by Christine Dewi, Rung-Ching Chen, Yong-Cun Zhuang, Xiaoyi Jiang and Hui Yu
Big Data Cogn. Comput. 2023, 7(1), 54; https://doi.org/10.3390/bdcc7010054 - 22 Mar 2023
Cited by 6 | Viewed by 2507
Abstract
In recent years, there have been significant advances in deep learning and road marking recognition due to machine learning and artificial intelligence. Despite this progress, road marking recognition often relies heavily on unrepresentative datasets and limited situations. Drivers and advanced driver assistance systems rely on road markings to better understand their environment on the street. Road markings, also known as pavement markings, are signs and texts painted on the road surface, including directional arrows, pedestrian crossings, speed limit signs, zebra crossings, and other equivalent signs and texts. Our experiments briefly discuss convolutional neural network (CNN)-based object detection algorithms, specifically Yolo V2, Yolo V3, Yolo V4, and Yolo V4-tiny. In our experiments, we built the Taiwan Road Marking Sign Dataset (TRMSD) and made it a public dataset so other researchers can use it. Because we train the model to distinguish left and right objects as separate classes, Yolo V4 and Yolo V4-tiny results benefit from the "No Flip" setting. The best model in the experiment is Yolo V4 (No Flip), with a test accuracy of 95.43% and an IoU of 66.12%. In this study, Yolo V4 (without flipping) outperforms state-of-the-art schemes, achieving 81.22% training accuracy and 95.34% testing accuracy on the TRMSD dataset. Full article
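The "No Flip" finding has a simple geometric cause: horizontal-flip augmentation turns a left-pointing marking into a right-pointing one while leaving its class label unchanged, which is harmful once left and right markings are separate classes. A toy NumPy illustration (the 3×3 "arrows" are invented):

```python
import numpy as np

# Toy 3x3 "road arrow": a left-pointing pattern.
left_arrow = np.array([[0, 1, 0],
                       [1, 1, 1],
                       [0, 0, 1]])
label = "turn_left"

flipped = np.fliplr(left_arrow)   # horizontal-flip augmentation

# After a horizontal flip, a left arrow LOOKS like a right arrow, so
# keeping the old label poisons training; this is why disabling flips
# helps when left/right markings are separate classes.
right_arrow = np.array([[0, 1, 0],
                        [1, 1, 1],
                        [1, 0, 0]])
print(np.array_equal(flipped, right_arrow), "but label is still", label)
```

Flip augmentation remains safe for symmetric classes (e.g., zebra crossings); it is specifically the direction-bearing classes that it corrupts.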
16 pages, 2013 KiB  
Article
A Hybrid Deep Learning Framework with Decision-Level Fusion for Breast Cancer Survival Prediction
by Nermin Abdelhakim Othman, Manal A. Abdel-Fattah and Ahlam Talaat Ali
Big Data Cogn. Comput. 2023, 7(1), 50; https://doi.org/10.3390/bdcc7010050 - 16 Mar 2023
Cited by 4 | Viewed by 2327
Abstract
Because of technological advancements and their use in the medical area, many new methods and strategies have been developed to address complex real-life challenges. Breast cancer, a particular kind of tumor that arises in breast cells, is one of the most prevalent types [...] Read more.
Because of technological advancements and their use in the medical area, many new methods and strategies have been developed to address complex real-life challenges. Breast cancer, a particular kind of tumor that arises in breast cells, is one of the most prevalent types of cancer in women. Early breast cancer detection and classification are crucial. Early detection considerably increases the likelihood of survival, which motivates us to contribute to different detection techniques from a technical standpoint. Additionally, manual detection requires a lot of time and effort and carries the risk of pathologist error and inaccurate classification. To address these problems, in this study, a hybrid deep learning model that enables decision making based on data from multiple data sources is proposed and used with two different classifiers. By incorporating multi-omics data (clinical data, gene expression data, and copy number alteration data) from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset, the accuracy of patient survival predictions is expected to be improved relative to prediction utilizing only one modality of data. A convolutional neural network (CNN) architecture is used for feature extraction. LSTM and GRU are used as classifiers. The accuracy achieved by LSTM is 97.0%, that achieved by GRU is 97.5%, and decision fusion of the two (LSTM and GRU) achieves the best accuracy of 98.0%. The prediction performance assessed using various performance indicators demonstrates that our model outperforms currently used methodologies. Full article
(This article belongs to the Special Issue Deep Network Learning and Its Applications)
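The abstract does not spell out the fusion rule; a common form of decision-level fusion is soft voting, i.e., averaging the two classifiers' class probabilities before taking the argmax. A minimal sketch under that assumption (the weights and inputs below are illustrative):

```python
def decision_fusion(probs_a, probs_b, w=(0.5, 0.5)):
    """Soft-voting fusion of two classifiers: per sample, take a weighted
    average of the class-probability vectors, then predict the argmax class."""
    preds = []
    for pa, pb in zip(probs_a, probs_b):
        fused = [w[0] * a + w[1] * b for a, b in zip(pa, pb)]
        preds.append(fused.index(max(fused)))
    return preds
```

With equal weights this lets the stronger of the two models (here, LSTM or GRU) correct the other on samples where their confidences disagree.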
17 pages, 12591 KiB  
Article
Real-Time Attention Monitoring System for Classroom: A Deep Learning Approach for Student’s Behavior Recognition
by Zouheir Trabelsi, Fady Alnajjar, Medha Mohan Ambali Parambil, Munkhjargal Gochoo and Luqman Ali
Big Data Cogn. Comput. 2023, 7(1), 48; https://doi.org/10.3390/bdcc7010048 - 09 Mar 2023
Cited by 12 | Viewed by 10697
Abstract
Effective classroom instruction requires monitoring student participation and interaction during class, identifying cues to simulate their attention. The ability of teachers to analyze and evaluate students’ classroom behavior is becoming a crucial criterion for quality teaching. Artificial intelligence (AI)-based behavior recognition techniques can [...] Read more.
Effective classroom instruction requires monitoring student participation and interaction during class and identifying cues to stimulate their attention. The ability of teachers to analyze and evaluate students’ classroom behavior is becoming a crucial criterion for quality teaching. Artificial intelligence (AI)-based behavior recognition techniques can help evaluate students’ attention and engagement during classroom sessions. With rapid digitalization, the global education system is adapting and exploring emerging technological innovations, such as AI, the Internet of Things, and big data analytics, to improve education systems. In educational institutions, modern classroom systems are supplemented with the latest technologies to make them more interactive, student centered, and customized. However, it is difficult for instructors to assess students’ interest and attention levels even with these technologies. This study harnesses modern technology to introduce an intelligent real-time vision-based classroom that monitors students’ emotions, attendance, and attention levels even when they have face masks on. We used a machine learning approach to train students’ behavior recognition models, including identifying facial expressions, to identify students’ attention or non-attention in a classroom. The attention/non-attention dataset is collected based on nine categories, and the models are trained starting from YOLOv5 pre-trained weights. For validation, the performance of various versions of the YOLOv5 model (v5m, v5n, v5l, v5s, and v5x) is compared based on different evaluation measures (precision, recall, mAP, and F1 score). Our results show that all models deliver promising performance, with 76% average accuracy. Applying the developed model can enable instructors to visualize students’ behavior and emotional states at different levels, allowing them to appropriately manage teaching sessions by considering student-centered learning scenarios. Overall, the proposed model will enhance instructors’ performance and students’ academic achievement. Full article
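The evaluation measures named above are related through the detection counts: precision and recall come from true/false positives and false negatives, and F1 is their harmonic mean. A minimal sketch (illustrative only):

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 score from detection counts:
    tp = true positives, fp = false positives, fn = missed detections."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

mAP extends this idea by averaging precision over recall levels and over classes, which is why the paper reports it separately.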
18 pages, 2033 KiB  
Article
Machine Learning-Based Identifications of COVID-19 Fake News Using Biomedical Information Extraction
by Faizi Fifita, Jordan Smith, Melissa B. Hanzsek-Brill, Xiaoyin Li and Mengshi Zhou
Big Data Cogn. Comput. 2023, 7(1), 46; https://doi.org/10.3390/bdcc7010046 - 07 Mar 2023
Cited by 3 | Viewed by 3560
Abstract
The spread of fake news related to COVID-19 is an infodemic that leads to a public health crisis. Therefore, detecting fake news is crucial for an effective management of the COVID-19 pandemic response. Studies have shown that machine learning models can detect COVID-19 [...] Read more.
The spread of fake news related to COVID-19 is an infodemic that leads to a public health crisis. Therefore, detecting fake news is crucial for effective management of the COVID-19 pandemic response. Studies have shown that machine learning models can detect COVID-19 fake news based on the content of news articles. However, the use of biomedical information, which is often featured in COVID-19 news, has not been explored in the development of these models. We present a novel approach for predicting COVID-19 fake news by leveraging biomedical information extraction (BioIE) in combination with machine learning models. We analyzed 1164 COVID-19 news articles and used advanced BioIE algorithms to extract 158 novel features. These features were then used to train 15 machine learning classifiers to predict COVID-19 fake news. Among the 15 classifiers, the random forest model achieved the best performance, with an area under the ROC curve (AUC) of 0.882, which is 12.36% to 31.05% higher than models trained on traditional features. Furthermore, incorporating BioIE-based features improved the performance of a state-of-the-art multi-modality model (AUC 0.914 vs. 0.887). Our study suggests that incorporating biomedical information into fake news detection models improves their performance, and thus could be a valuable tool in the fight against the COVID-19 infodemic. Full article
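An AUC such as the reported 0.882 can be read as the probability that a randomly chosen fake article receives a higher score than a randomly chosen real one (the Mann–Whitney formulation). A small illustrative sketch, not the paper's evaluation code:

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann–Whitney) statistic.
    labels: 1 for the positive class (e.g., fake), 0 for negative; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking; 1.0 means every fake article outscores every real one.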
16 pages, 3458 KiB  
Article
An Obstacle-Finding Approach for Autonomous Mobile Robots Using 2D LiDAR Data
by Lesia Mochurad, Yaroslav Hladun and Roman Tkachenko
Big Data Cogn. Comput. 2023, 7(1), 43; https://doi.org/10.3390/bdcc7010043 - 01 Mar 2023
Cited by 9 | Viewed by 2567
Abstract
Obstacle detection is crucial for the navigation of autonomous mobile robots: it is necessary to ensure their presence as accurately as possible and find their position relative to the robot. Autonomous mobile robots for indoor navigation purposes use several special sensors for various [...] Read more.
Obstacle detection is crucial for the navigation of autonomous mobile robots: it is necessary to detect obstacles as accurately as possible and find their position relative to the robot. Autonomous mobile robots for indoor navigation use several special sensors for various tasks, one of which is localizing the robot in space. In most cases, the LiDAR sensor is employed to solve this problem. The data from this sensor are also critical, as the sensor directly measures the distance to objects and obstacles surrounding the robot, so LiDAR data can be used for obstacle detection. This article is devoted to developing an obstacle detection algorithm based on 2D LiDAR sensor data, together with a parallelization method to speed up the algorithm when processing big data. The result is an algorithm that finds obstacles and objects with high accuracy and speed: it receives a set of points from the sensor and data about the robot’s movements, and it outputs a set of line segments, where each group of line segments describes an object. Accuracy was assessed with two proposed metrics, and both averages are high: 86% and 91% for the first and second metrics, respectively. The proposed method is flexible enough to be optimized for a specific configuration of the LiDAR sensor: four hyperparameters are experimentally found for a given sensor configuration to maximize the correspondence between real and found objects. The work of the proposed algorithm has been carefully tested on simulated and actual data, and the authors also investigated the relationship between the selected hyperparameters’ values and the algorithm’s efficiency. Potential applications, limitations, and opportunities for future research are discussed. Full article
(This article belongs to the Special Issue Quality and Security of Critical Infrastructure Systems)
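The abstract does not give the authors' exact segment-extraction algorithm; a classic baseline for turning a 2D LiDAR scan into line segments is recursive splitting (the "split" half of split-and-merge). A sketch under that assumption, with an illustrative deviation threshold standing in for the paper's hyperparameters:

```python
import math

def polar_to_xy(scan, angle_min, angle_step):
    """Convert a 2D LiDAR scan (list of ranges) into Cartesian points."""
    return [(r * math.cos(angle_min + i * angle_step),
             r * math.sin(angle_min + i * angle_step))
            for i, r in enumerate(scan)]

def split_segments(points, threshold):
    """Recursively break a run of consecutive scan points at the point farthest
    from the chord between the endpoints, until every deviation is below
    threshold; returns segments as (start_point, end_point) pairs."""
    if len(points) < 3:
        return [(points[0], points[-1])]
    (x1, y1), (x2, y2) = points[0], points[-1]
    norm = math.hypot(x2 - x1, y2 - y1) or 1e-12
    # Perpendicular distance of each point to the endpoint chord.
    dists = [abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / norm
             for x, y in points]
    i = max(range(len(dists)), key=dists.__getitem__)
    if dists[i] < threshold:
        return [(points[0], points[-1])]
    return split_segments(points[:i + 1], threshold) + split_segments(points[i:], threshold)
```

Grouping nearby segments then yields one object per group, matching the output format the abstract describes.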
25 pages, 6265 KiB  
Article
COVID-19 Classification through Deep Learning Models with Three-Channel Grayscale CT Images
by Maisarah Mohd Sufian, Ervin Gubin Moung, Mohd Hanafi Ahmad Hijazi, Farashazillah Yahya, Jamal Ahmad Dargham, Ali Farzamnia, Florence Sia and Nur Faraha Mohd Naim
Big Data Cogn. Comput. 2023, 7(1), 36; https://doi.org/10.3390/bdcc7010036 - 16 Feb 2023
Cited by 3 | Viewed by 3016
Abstract
COVID-19, an infectious coronavirus disease, has triggered a pandemic that has claimed many lives. Clinical institutes have long considered computed tomography (CT) as an excellent and complementary screening method to reverse transcriptase-polymerase chain reaction (RT-PCR). Because of the limited dataset available on COVID-19, [...] Read more.
COVID-19, an infectious coronavirus disease, has triggered a pandemic that has claimed many lives. Clinical institutes have long considered computed tomography (CT) an excellent and complementary screening method to reverse transcriptase-polymerase chain reaction (RT-PCR). Because of the limited dataset available on COVID-19, transfer learning-based models have become the go-to solutions for automatic COVID-19 detection. However, CT images are typically provided in grayscale, thus posing a challenge for automatic detection using pre-trained models, which were previously trained on RGB images. Several methods have been proposed in the literature for converting grayscale images to RGB (three-channel) images for use with pre-trained deep learning models, such as pseudo-colorization, replication, and colorization. The most common method is replication, where the one-channel grayscale image is repeated across the three channels. While this technique is simple, it does not provide new information and can lead to poor performance due to redundant image features fed into the DL model. This study proposes a novel image pre-processing method for grayscale medical images that utilizes Histogram Equalization (HE) and Contrast Limited Adaptive Histogram Equalization (CLAHE) to create a three-channel image representation that provides different information on each channel. The effectiveness of this method is evaluated using six pre-trained models: InceptionV3, MobileNet, ResNet50, VGG16, ViT-B16, and ViT-B32. The results show that the proposed image representation significantly improves the classification performance of the models, with the InceptionV3 model achieving an accuracy of 99.60% and a recall (also referred to as sensitivity) of 99.59%. The proposed method addresses the limitation of using grayscale medical images for COVID-19 detection and can potentially improve the early detection and control of the disease. Additionally, the proposed method can be applied to other medical imaging tasks with a grayscale image input, making it a generalizable solution. Full article
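A simplified, dependency-free sketch of the three-channel idea: each channel carries a differently enhanced copy of the grayscale image. Here plain histogram equalization and a min-max stretch stand in for the HE and CLAHE variants used in the paper (in practice one would use OpenCV's `equalizeHist` and `createCLAHE`):

```python
def equalize(gray, levels=256):
    """Plain histogram equalization on a 2D list of integer pixel values.
    CLAHE adds tiling and contrast clipping on top of this basic idea."""
    flat = [p for row in gray for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    denom = max(len(flat) - cdf_min, 1)
    lut = [round((c - cdf_min) / denom * (levels - 1)) for c in cdf]
    return [[lut[p] for p in row] for row in gray]

def min_max_stretch(gray, levels=256):
    """Min-max contrast stretch, used here as a second enhancement channel."""
    flat = [p for row in gray for p in row]
    lo, hi = min(flat), max(flat)
    span = max(hi - lo, 1)
    return [[round((p - lo) / span * (levels - 1)) for p in row] for row in gray]

def to_three_channels(gray, levels=256):
    """Per-pixel (original, equalized, stretched) triple, so each channel
    carries different information, mirroring the paper's idea."""
    eq = equalize(gray, levels)
    st = min_max_stretch(gray, levels)
    return [[(g, e, s) for g, e, s in zip(*rows)] for rows in zip(gray, eq, st)]
```

Unlike plain replication, the three channels now differ, so the pre-trained RGB backbone receives non-redundant inputs.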
10 pages, 1754 KiB  
Article
“What Can ChatGPT Do?” Analyzing Early Reactions to the Innovative AI Chatbot on Twitter
by Viriya Taecharungroj
Big Data Cogn. Comput. 2023, 7(1), 35; https://doi.org/10.3390/bdcc7010035 - 16 Feb 2023
Cited by 113 | Viewed by 28291
Abstract
In this study, the author collected tweets about ChatGPT, an innovative AI chatbot, in the first month after its launch. A total of 233,914 English tweets were analyzed using the latent Dirichlet allocation (LDA) topic modeling algorithm to answer the question “what can [...] Read more.
In this study, the author collected tweets about ChatGPT, an innovative AI chatbot, in the first month after its launch. A total of 233,914 English tweets were analyzed using the latent Dirichlet allocation (LDA) topic modeling algorithm to answer the question “what can ChatGPT do?”. The results revealed three general topics: news, technology, and reactions. The author also identified five functional domains: creative writing, essay writing, prompt writing, code writing, and answering questions. The analysis also found that ChatGPT has the potential to impact technologies and humans in both positive and negative ways. In conclusion, the author outlines four key issues that need to be addressed as a result of this AI advancement: the evolution of jobs, a new technological landscape, the quest for artificial general intelligence, and the progress-ethics conundrum. Full article
(This article belongs to the Special Issue Artificial Intelligence and Natural Language Processing)
16 pages, 4313 KiB  
Article
A Novel Approach for Diabetic Retinopathy Screening Using Asymmetric Deep Learning Features
by Pradeep Kumar Jena, Bonomali Khuntia, Charulata Palai, Manjushree Nayak, Tapas Kumar Mishra and Sachi Nandan Mohanty
Big Data Cogn. Comput. 2023, 7(1), 25; https://doi.org/10.3390/bdcc7010025 - 29 Jan 2023
Cited by 32 | Viewed by 3325
Abstract
Automatic screening of diabetic retinopathy (DR) is a well-identified area of research in the domain of computer vision. It is challenging due to structural complexity and a marginal contrast difference between the retinal vessels and the background of the fundus image. As bright [...] Read more.
Automatic screening of diabetic retinopathy (DR) is a well-identified area of research in the domain of computer vision. It is challenging due to structural complexity and a marginal contrast difference between the retinal vessels and the background of the fundus image. As bright lesions are prominent in the green channel, we applied contrast-limited adaptive histogram equalization (CLAHE) on the green channel for image enhancement. This work proposes a novel diabetic retinopathy screening technique using an asymmetric deep learning feature. The asymmetric deep learning features are extracted using U-Net for segmentation of the optic disc and blood vessels. Then a convolutional neural network (CNN) with a support vector machine (SVM) is used for the DR lesions classification. The lesions are classified into four classes, i.e., normal, microaneurysms, hemorrhages, and exudates. The proposed method is tested with two publicly available retinal image datasets, i.e., APTOS and MESSIDOR. The accuracy achieved for non-diabetic retinopathy detection is 98.6% and 91.9% for the APTOS and MESSIDOR datasets, respectively. The accuracies of exudate detection for these two datasets are 96.9% and 98.3%, respectively. The accuracy of the DR screening system is improved due to the precise retinal image segmentation. Full article
20 pages, 3764 KiB  
Article
A Real-Time Computer Vision Based Approach to Detection and Classification of Traffic Incidents
by Mohammed Imran Basheer Ahmed, Rim Zaghdoud, Mohammed Salih Ahmed, Razan Sendi, Sarah Alsharif, Jomana Alabdulkarim, Bashayr Adnan Albin Saad, Reema Alsabt, Atta Rahman and Gomathi Krishnasamy
Big Data Cogn. Comput. 2023, 7(1), 22; https://doi.org/10.3390/bdcc7010022 - 28 Jan 2023
Cited by 22 | Viewed by 6696
Abstract
To constructively ameliorate and enhance traffic safety measures in Saudi Arabia, a prolific number of AI (Artificial Intelligence) traffic surveillance technologies have emerged, including Saher, throughout the past years. However, rapidly detecting a vehicle incident can play a cardinal role in ameliorating the [...] Read more.
To constructively enhance traffic safety measures in Saudi Arabia, a prolific number of AI (Artificial Intelligence) traffic surveillance technologies have emerged throughout the past years, including Saher. However, rapidly detecting a vehicle incident can play a cardinal role in improving the response speed of incident management, which in turn minimizes road injuries induced by the accident’s occurrence. To meet the growing demand for road traffic security and safety, this paper presents a real-time traffic incident detection and alert system based on a computer vision approach. The proposed framework consists of three models, each of which is integrated within a prototype interface to fully visualize the system’s overall architecture. First, the vehicle detection and tracking model utilizes the YOLOv5 object detector with the DeepSORT tracker to detect and track vehicles’ movements by allocating a unique identification number (ID) to each vehicle; this model attained a mean average precision (mAP) of 99.2%. Second, a traffic accident and severity classification model attained a mAP of 83.3% while utilizing the YOLOv5 algorithm to accurately detect and classify an accident’s severity level, sending an immediate alert message to the nearest hospital if a severe accident has taken place. Finally, the ResNet152 algorithm was utilized to detect the ignition of a fire following the accident; this model achieved an accuracy rate of 98.9%, with an automated alert sent to the fire station if this perilous event occurred. The study employed a parallel computing technique to reduce the overall complexity and inference time of the AI-based system by running the three models concurrently. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
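Running the three per-frame models concurrently can be sketched with a thread pool; this is an illustrative pattern, not the authors' implementation, and the model functions below are stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

def run_models_concurrently(frame, models):
    """Run independent per-frame models (e.g., vehicle detection, severity
    classification, fire detection) in parallel threads and collect results.
    models: dict mapping a name to a callable taking the frame."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, frame) for name, fn in models.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

Since each model is independent, the wall-clock latency per frame approaches that of the slowest model rather than the sum of all three.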
18 pages, 3402 KiB  
Article
X-Wines: A Wine Dataset for Recommender Systems and Machine Learning
by Rogério Xavier de Azambuja, A. Jorge Morais and Vítor Filipe
Big Data Cogn. Comput. 2023, 7(1), 20; https://doi.org/10.3390/bdcc7010020 - 22 Jan 2023
Cited by 3 | Viewed by 5711
Abstract
In the current technological scenario of artificial intelligence growth, especially using machine learning, large datasets are necessary. Recommender systems appear with increasing frequency with different techniques for information filtering. Few large wine datasets are available for use with wine recommender systems. This work [...] Read more.
In the current technological scenario of artificial intelligence growth, especially using machine learning, large datasets are necessary. Recommender systems appear with increasing frequency with different techniques for information filtering, yet few large wine datasets are available for use with wine recommender systems. This work presents X-Wines, a new and consistent wine dataset containing 100,000 instances and 21 million real evaluations carried out by users. Data were collected on the open Web in 2022 and pre-processed for wider free use. The ratings are on a 1–5 scale and were collected over a period of 10 years (2012–2021) for wines produced in 62 different countries. A demonstration of some applications using X-Wines in the scope of recommender systems with deep learning algorithms is also presented. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
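As an illustration of the kind of recommender experiment a ratings dataset like X-Wines supports, here is a simple bias baseline (user mean + wine mean - global mean) on invented rating triples; the paper itself demonstrates deep learning recommenders, and this baseline is only a common point of comparison:

```python
def baseline_rating(ratings, user, wine):
    """Bias baseline on (user, wine, rating) triples: predict
    user mean + wine mean - global mean, clamped to the 1-5 scale.
    Unseen users or wines fall back to the global mean."""
    g = sum(r for _, _, r in ratings) / len(ratings)
    u = [r for uu, _, r in ratings if uu == user]
    w = [r for _, ww, r in ratings if ww == wine]
    pred = (sum(u) / len(u) if u else g) + (sum(w) / len(w) if w else g) - g
    return min(5.0, max(1.0, pred))
```

Beating this baseline is the usual first sanity check before moving to learned models.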
16 pages, 975 KiB  
Article
Federated Learning to Safeguard Patients Data: A Medical Image Retrieval Case
by Gurtaj Singh, Vincenzo Violi and Marco Fisichella
Big Data Cogn. Comput. 2023, 7(1), 18; https://doi.org/10.3390/bdcc7010018 - 18 Jan 2023
Cited by 9 | Viewed by 2824
Abstract
Healthcare data are distributed and confidential, making it difficult to use centralized automatic diagnostic techniques. For example, different hospitals hold the electronic health records (EHRs) of different patient populations; however, transferring this data between hospitals is difficult due to the sensitive nature of [...] Read more.
Healthcare data are distributed and confidential, making it difficult to use centralized automatic diagnostic techniques. For example, different hospitals hold the electronic health records (EHRs) of different patient populations; however, transferring these data between hospitals is difficult due to the sensitive nature of the information. This presents a significant obstacle to the development of efficient and generalizable analytical methods that require a large amount of diverse Big Data. Federated learning (FL) allows multiple institutions to work together to develop a machine learning algorithm without sharing their data: organizations share only the parameters of their models with each other, which allows them to reap the benefits of a model developed with a richer data set while protecting the confidentiality of their own data. We conducted a systematic study to analyze the current state of FL in the healthcare industry and explore both the limitations of this technology and its potential. Standard methods for large-scale machine learning, distributed optimization, and privacy-friendly data analytics need to be fundamentally rethought to address the new problems posed by training on diverse networks that may contain large amounts of data. In this article, we discuss the particular qualities and difficulties of federated learning, provide a comprehensive overview of current approaches, and outline several directions for future work that are relevant to a variety of research communities. Full article
(This article belongs to the Special Issue Artificial Intelligence for Online Safety)
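The parameter-sharing step described above is commonly realized as federated averaging (FedAvg): each institution trains locally, then a coordinator averages the model parameters weighted by each client's local data size. A minimal sketch, with plain lists of floats standing in for model parameters:

```python
def federated_average(client_weights, client_sizes):
    """FedAvg aggregation: size-weighted average of client model parameters.
    client_weights: one flat parameter list per client; client_sizes: number
    of local training samples per client."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
            for i in range(n_params)]
```

Only these aggregated parameters leave each institution; the raw patient records never do, which is the privacy property the abstract highlights.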
24 pages, 2389 KiB  
Article
The Extended Digital Maturity Model
by Tining Haryanti, Nur Aini Rakhmawati and Apol Pribadi Subriadi
Big Data Cogn. Comput. 2023, 7(1), 17; https://doi.org/10.3390/bdcc7010017 - 17 Jan 2023
Cited by 9 | Viewed by 8020
Abstract
The Digital Transformation (DX) potentially affects productivity and efficiency while offering high risks to organizations. Necessary frameworks and tools to help organizations navigate such radical changes are needed. An extended framework of DMM is presented through a comparative analysis of various digital maturity [...] Read more.
Digital transformation (DX) can improve productivity and efficiency while also posing high risks to organizations, so frameworks and tools are needed to help organizations navigate such radical change. An extended framework of the Digital Maturity Model (DMM) is presented through a comparative analysis of various digital maturity models and a qualitative approach based on expert feedback. Maturity levels are determined using the international standard for process assessment known as SPICE. This research reveals seven interrelated dimensions supporting the success of DX, developed as an extension of an existing maturity model. The DX Self-Assessment Maturity Model (DX-SAMM) is built to guide organizations by providing a broad roadmap for improving digital maturity. This article presents a digital maturity model from a holistic point of view that meets the criteria for maturity assessment. The case study results show that DX-SAMM can identify DX maturity levels while providing roadmap recommendations for increasing maturity levels in every aspect of its dimensions. It offers practical implications for improving maturity levels and for the ease of real-time monitoring and evaluation of digital maturity. With this development of maturity measurement, DX-SAMM contributes to organizational sustainability by proposing future DX strategies based on current maturity achievements. Full article
(This article belongs to the Special Issue Human Factor in Information Systems Development and Management)
31 pages, 732 KiB  
Systematic Review
Bias and Unfairness in Machine Learning Models: A Systematic Review on Datasets, Tools, Fairness Metrics, and Identification and Mitigation Methods
by Tiago P. Pagano, Rafael B. Loureiro, Fernanda V. N. Lisboa, Rodrigo M. Peixoto, Guilherme A. S. Guimarães, Gustavo O. R. Cruz, Maira M. Araujo, Lucas L. Santos, Marco A. S. Cruz, Ewerton L. S. Oliveira, Ingrid Winkler and Erick G. S. Nascimento
Big Data Cogn. Comput. 2023, 7(1), 15; https://doi.org/10.3390/bdcc7010015 - 13 Jan 2023
Cited by 18 | Viewed by 14164
Abstract
One of the difficulties of artificial intelligence is to ensure that model decisions are fair and free of bias. In research, datasets, metrics, techniques, and tools are applied to detect and mitigate algorithmic unfairness and bias. This study examines the current knowledge on [...] Read more.
One of the difficulties of artificial intelligence is ensuring that model decisions are fair and free of bias. In research, datasets, metrics, techniques, and tools are applied to detect and mitigate algorithmic unfairness and bias. This study examines the current knowledge on bias and unfairness in machine learning models. The systematic review followed the PRISMA guidelines and is registered on the OSF platform. The search was carried out between 2021 and early 2022 in the Scopus, IEEE Xplore, Web of Science, and Google Scholar knowledge bases and found 128 articles published between 2017 and 2022, of which 45 were chosen based on search string optimization and inclusion and exclusion criteria. We discovered that the majority of retrieved works focus on bias and unfairness identification and mitigation techniques, offering tools, statistical approaches, important metrics, and datasets typically used for bias experiments. In terms of the primary forms of bias, data, algorithm, and user interaction were addressed in connection with the preprocessing, in-processing, and postprocessing mitigation methods. The use of Equalized Odds, Opportunity Equality, and Demographic Parity as primary fairness metrics emphasizes the crucial role of sensitive attributes in mitigating bias. The 25 datasets chosen span a wide range of areas, including criminal justice, image enhancement, finance, education, product pricing, and health, with the majority including sensitive attributes. In terms of tools, Aequitas is the most often referenced, yet many of the tools were not employed in empirical experiments. A limitation of current research is the lack of multiclass and multimetric studies, which are found in just a few works and constrain the investigation to binary-focused methods. Furthermore, the results indicate that different fairness metrics do not present uniform results for a given use case, and that more research with varied model architectures is necessary to standardize which ones are more appropriate for a given context. We also observed that all of the research addressed the transparency of the algorithm, or its capacity to explain how decisions are taken. Full article
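Demographic Parity, one of the primary fairness metrics named above, compares positive-prediction rates across groups defined by a sensitive attribute. A minimal sketch of the parity gap (illustrative only):

```python
def demographic_parity_gap(y_pred, groups):
    """Largest difference in positive-prediction rate across groups.
    y_pred: 0/1 predictions; groups: the sensitive-attribute value per sample.
    A gap of 0 means the Demographic Parity criterion is met exactly."""
    rates = {}
    for g in set(groups):
        members = [p for p, gg in zip(y_pred, groups) if gg == g]
        rates[g] = sum(members) / len(members)
    return max(rates.values()) - min(rates.values())
```

Equalized Odds refines this by comparing the rates separately among truly positive and truly negative samples, which is why the two metrics can disagree on the same model.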
28 pages, 4508 KiB  
Article
Big Data Analytics Applications in Information Management Driving Operational Efficiencies and Decision-Making: Mapping the Field of Knowledge with Bibliometric Analysis Using R
by Konstantina Ragazou, Ioannis Passas, Alexandros Garefalakis, Emilios Galariotis and Constantin Zopounidis
Big Data Cogn. Comput. 2023, 7(1), 13; https://doi.org/10.3390/bdcc7010013 - 12 Jan 2023
Cited by 10 | Viewed by 7125
Abstract
Organizations may examine both past and present data with the aid of information management, giving them access to all the knowledge they need to make sound strategic choices. For the majority of contemporary enterprises, using data to make relevant, valid, and timely choices [...] Read more.
Organizations may examine both past and present data with the aid of information management, giving them access to all the knowledge they need to make sound strategic choices. For the majority of contemporary enterprises, using data to make relevant, valid, and timely choices has become a must for success. The volume and format of data have changed significantly over the past few years as a result of the development of new technologies and applications, but there are also impressive possibilities for their analysis and processing. This study offers a bibliometric analysis of 650 publications written by 1977 academics on the use of information management and big data analytics. The Bibliometrix package in R and the VOSviewer program were used to obtain the bibliographic data from the Scopus database and to analyze it. Based on citation analysis criteria, the top research journals, authors, and organizations were identified. The cooperation network at the author level reveals the connections between academics throughout the world, and Multiple Correspondence Analysis (MCA) identifies the research holes in the area. The recommendations for further study are informed by these findings. Full article
20 pages, 698 KiB  
Review
Artificial Intelligence in Pharmaceutical and Healthcare Research
by Subrat Kumar Bhattamisra, Priyanka Banerjee, Pratibha Gupta, Jayashree Mayuren, Susmita Patra and Mayuren Candasamy
Big Data Cogn. Comput. 2023, 7(1), 10; https://doi.org/10.3390/bdcc7010010 - 11 Jan 2023
Cited by 17 | Viewed by 21330
Abstract
Artificial intelligence (AI) is a branch of computer science that allows machines to work efficiently, can analyze complex data. The research focused on AI has increased tremendously, and its role in healthcare service and research is emerging at a greater pace. This review [...] Read more.
Artificial intelligence (AI) is a branch of computer science that allows machines to work efficiently and analyze complex data. The research focused on AI has increased tremendously, and its role in healthcare service and research is emerging at a greater pace. This review elaborates on the opportunities and challenges of AI in healthcare and pharmaceutical research. The literature was collected from domains such as PubMed, Science Direct and Google Scholar using specific keywords and phrases such as ‘Artificial intelligence’, ‘Pharmaceutical research’, ‘drug discovery’, ‘clinical trial’, ‘disease diagnosis’, etc. to select the research and review articles published within the last five years. The application of AI in disease diagnosis, digital therapy, personalized treatment, drug discovery and forecasting epidemics or pandemics was extensively reviewed in this article. Deep learning and neural networks are the most used AI technologies; Bayesian nonparametric models are potential technologies for clinical trial design; and natural language processing and wearable devices are used in patient identification and clinical trial monitoring. Deep learning and neural networks were applied in predicting the outbreak of seasonal influenza, Zika, Ebola, Tuberculosis and COVID-19. With the advancement of AI technologies, the scientific community may witness rapid and cost-effective healthcare and pharmaceutical research as well as provide improved service to the general public. Full article
17 pages, 1267 KiB  
Review
Impact of Artificial Intelligence on COVID-19 Pandemic: A Survey of Image Processing, Tracking of Disease, Prediction of Outcomes, and Computational Medicine
by Khaled H. Almotairi, Ahmad MohdAziz Hussein, Laith Abualigah, Sohaib K. M. Abujayyab, Emad Hamdi Mahmoud, Bassam Omar Ghanem and Amir H. Gandomi
Big Data Cogn. Comput. 2023, 7(1), 11; https://doi.org/10.3390/bdcc7010011 - 11 Jan 2023
Cited by 13 | Viewed by 7139
Abstract
Integrating machine learning technologies into artificial intelligence (AI) is at the forefront of the scientific and technological tools employed to combat the COVID-19 pandemic. This study assesses different uses and deployments of modern technology for combating the COVID-19 pandemic at various levels, such as image processing, disease tracking, outcome prediction, and computational medicine. The results show that computerized tomography (CT) scans help to diagnose patients infected by COVID-19; typical findings include bilateral, multilobar ground-glass opacification (GGO) with a peripheral or posterior distribution, mainly in the lower lobes and less frequently in the middle lobe. An extensive search of modern technology databases relating to COVID-19 was undertaken, followed by a review of the extracted information to examine how technology can be employed to tackle the pandemic. We discuss the technological advancements deployed to reduce the communicability and impact of the pandemic. Although much research exists on the use of technology in combating COVID-19, its application is still not fully explored. In addition, we suggest some open research issues and challenges in deploying AI technology to combat the global pandemic. Full article
20 pages, 1799 KiB  
Article
An Information System Supporting Insurance Use Cases by Automated Anomaly Detection
by Thoralf Reis, Alexander Kreibich, Sebastian Bruchhaus, Thomas Krause, Florian Freund, Marco X. Bornschlegl and Matthias L. Hemmje
Big Data Cogn. Comput. 2023, 7(1), 4; https://doi.org/10.3390/bdcc7010004 - 28 Dec 2022
Cited by 2 | Viewed by 2719
Abstract
The increasing availability of vast quantities of data from various sources significantly impacts the insurance industry, although this industry has always been data driven. It accelerates manual processes and enables new products or business models. On the other hand, it also burdens insurance analysts and other users who need to cope with this development in parallel to other global changes. A novel information system (IS) for artificial intelligence (AI)-supported big data analysis, introduced within this paper, shall help to overcome user overload and to empower human data analysts in the insurance industry. The IS research’s focus lies neither in novel algorithms nor datasets but in concepts that combine AI and big data analysis for synergies, such as usability enhancements. For this purpose, this paper systematically designs and implements an information system that conforms to the AI2VIS4BigData reference model in order to detect anomalies automatically and increase its users’ confidence and efficiency. Practical relevance is assured by an interview with an insurance analyst to verify the demand for the developed system and by deriving all requirements from two insurance industry user stories. A core contribution is the introduction of the IS. Another significant contribution is an extension of the AI2VIS4BigData service-based architecture and user interface (UI) concept towards AI and machine learning (ML)-based user empowerment and data transformation. The implemented prototype was applied to synthetic data to enable the evaluation of the system. The quantitative and qualitative evaluations confirm the system’s usability and applicability to the insurance domain, yet reveal the need for improvements towards bigger quantities of data and further evaluations with a more extensive user group. Full article
20 pages, 1359 KiB  
Article
A Scientific Perspective on Using Artificial Intelligence in Sustainable Urban Development
by Emanuel Rieder, Matthias Schmuck and Alexandru Tugui
Big Data Cogn. Comput. 2023, 7(1), 3; https://doi.org/10.3390/bdcc7010003 - 20 Dec 2022
Cited by 6 | Viewed by 3753
Abstract
Digital transformation (or digitalization) is the process of continuous further development of digital technologies (such as smart devices, cloud services, and Big Data) that have a lasting impact on our economy and society. In this manner, digitalization is a huge driver of permanent change, even in the field of Sustainable Urban Development. In the wake of digitalization, expectations are changing, placing pressure at the societal level on the design and development of smart environments for everything related to Sustainable Urban Development. In this sense, the solution is the integration of Artificial Intelligence into Sustainable Urban Development, because technology can simplify people’s lives. The aim of this paper is to ascertain which Sustainable Urban Development dimensions are taken into account when integrating Artificial Intelligence and what results can be achieved. These questions formed the basic framework for this research article. To provide a snapshot of the current state of Artificial Intelligence in Sustainable Urban Development, a systematic review of the literature between 2012 and 2022 was conducted. The data were collected and analyzed using PRISMA. Based on the studies identified, we found significant growth in studies starting in 2018, and that Artificial Intelligence applications refer to the Sustainable Urban Development dimensions of environmental protection, economic development, social justice and equity, culture, and governance. The Artificial Intelligence techniques used in Sustainable Urban Development cover a broad field, including Artificial Intelligence in general, Machine Learning, Deep Learning, Artificial Neural Networks, Operations Research, Predictive Analytics, and Data Mining. However, with the integration of Artificial Intelligence into Sustainable Urban Development, challenges emerge. These include responsible municipal policies, awareness of data quality, privacy and data security, the formation of partnerships among stakeholders (e.g., local citizens, civil society, industry, and various levels of government), and transparency and traceability in the implementation and rollout of Artificial Intelligence. A first step was taken towards providing an overview of the possible applications of Artificial Intelligence in Sustainable Urban Development. It was clearly shown that Artificial Intelligence is also gaining ground in this sector. Full article
20 pages, 1982 KiB  
Article
Using an Evidence-Based Approach for Policy-Making Based on Big Data Analysis and Applying Detection Techniques on Twitter
by Somayeh Labafi, Sanee Ebrahimzadeh, Mohamad Mahdi Kavousi, Habib Abdolhossein Maregani and Samad Sepasgozar
Big Data Cogn. Comput. 2022, 6(4), 160; https://doi.org/10.3390/bdcc6040160 - 19 Dec 2022
Viewed by 2535
Abstract
Evidence-based policy seeks to use evidence in public policy in a systematic way in a bid to improve decision-making quality. Evidence-based policy cannot work properly and achieve the expected results without accurate, appropriate, and sufficient evidence. Given the prevalence of social media and intense user engagement, the question to ask is whether the data on social media can be used as evidence in the policy-making process. This question gives rise to the debate on which characteristics of data should be considered as evidence. Despite the numerous research studies carried out on social media analysis or policy-making, this domain has not been dealt with through an “evidence detection” lens. Thus, this study addresses the gap in the literature on how to analyze the big text data produced by social media and how to use it for policy-making based on evidence detection. The present paper seeks to fill the gap by developing and offering a model that can help policy-makers to distinguish “evidence” from “non-evidence”. To do so, in the first phase of the study, the researchers elicited the characteristics of “evidence” by conducting a thematic analysis of semi-structured interviews with experts and policy-makers. In the second phase, the developed model was tested on six months of data collected from Twitter accounts. The experimental results show that the evidence detection model performed best with a decision tree (DT), which outperformed the other algorithms with an accuracy score of 85.9%. This shows how the model fulfilled the aim of the present study, which was detecting Twitter posts that can be used as evidence. This study contributes to the body of knowledge by exploring novel models of text processing and offering an efficient method for analyzing big text data. The practical implication of the study also lies in its efficiency and ease of use, which offers the required evidence for policy-makers. Full article
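The evidence/non-evidence split described in this abstract can be pictured with a toy decision tree. The features, keyword lists, and tree structure below are illustrative assumptions, not the study's trained model:

```python
# Minimal sketch of decision-tree-style evidence detection for tweets.
# The features and thresholds are illustrative assumptions, not the
# authors' trained model.

def extract_features(tweet: str) -> dict:
    """Map a tweet to the kind of binary features an evidence model might use."""
    text = tweet.lower()
    return {
        "has_number": any(ch.isdigit() for ch in text),
        "cites_source": any(k in text for k in ("according to", "report", "study", "http")),
        "is_opinion": any(k in text for k in ("i think", "i feel", "imho")),
    }

# A hand-built decision tree: each internal node tests one feature.
TREE = {
    "feature": "is_opinion",
    "true": "non-evidence",                 # opinions are not evidence
    "false": {
        "feature": "cites_source",
        "true": "evidence",                 # sourced claims count as evidence
        "false": {
            "feature": "has_number",
            "true": "evidence",             # quantified claims count as evidence
            "false": "non-evidence",
        },
    },
}

def classify(tweet: str) -> str:
    """Walk the tree until a leaf label is reached."""
    node = TREE
    feats = extract_features(tweet)
    while isinstance(node, dict):
        node = node["true"] if feats[node["feature"]] else node["false"]
    return node

print(classify("According to the ministry report, cases rose 12% this week."))  # evidence
print(classify("I think the new policy is terrible."))  # non-evidence
```

In the paper itself, such a tree would be learned from labelled data rather than written by hand; the sketch only shows how a trained tree turns a post into an evidence/non-evidence decision.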
13 pages, 2240 KiB  
Article
Proposal of Decentralized P2P Service Model for Transfer between Blockchain-Based Heterogeneous Cryptocurrencies and CBDCs
by Keundug Park and Heung-Youl Youm
Big Data Cogn. Comput. 2022, 6(4), 159; https://doi.org/10.3390/bdcc6040159 - 19 Dec 2022
Cited by 5 | Viewed by 2777
Abstract
This paper proposes a solution to the transfer problem between blockchain-based heterogeneous cryptocurrencies and CBDCs, with research derived from an analysis of the existing literature. Interoperability between heterogeneous blockchains has been an obstacle to service diversity and user convenience. Many types of cryptocurrencies are currently trading on the market, and many countries are researching and testing central bank digital currencies (CBDCs). This paper describes existing interoperability studies and solutions between heterogeneous blockchains, and how the proposed service model differs from them. To enhance digital financial services and improve user convenience, transfers between heterogeneous cryptocurrencies, between heterogeneous CBDCs, and between cryptocurrencies and CBDCs are required. This paper proposes an interoperable architecture between heterogeneous blockchains, and a decentralized peer-to-peer (P2P) service model based on that architecture for transfers between blockchain-based heterogeneous cryptocurrencies and CBDCs. Security threats to the proposed service model are identified, and security requirements to prevent them are specified. These threats and requirements should be considered when implementing the proposed service model. Full article
19 pages, 784 KiB  
Review
A Survey on Big Data in Pharmacology, Toxicology and Pharmaceutics
by Krithika Latha Bhaskaran, Richard Sakyi Osei, Evans Kotei, Eric Yaw Agbezuge, Carlos Ankora and Ernest D. Ganaa
Big Data Cogn. Comput. 2022, 6(4), 161; https://doi.org/10.3390/bdcc6040161 - 19 Dec 2022
Cited by 5 | Viewed by 2531
Abstract
Patients, hospitals, sensors, researchers, providers, phones, and healthcare organisations are producing enormous amounts of data in both the healthcare and drug detection sectors. The real challenge in these sectors is to find, investigate, manage, and collect information from patients in order to make their lives easier and healthier, not only in terms of formulating new therapies and understanding diseases, but also in terms of predicting outcomes at earlier stages and making effective decisions. The volumes of data available in the fields of pharmacology, toxicology, and pharmaceutics are constantly increasing, driven by advances in technology that allow for the analysis of ever-larger data sets. Big Data (BD) has the potential to transform drug development and safety testing by providing new insights into the effects of drugs on human health. However, harnessing this potential involves several challenges, including the need for specialised skills and infrastructure. In this survey, we explore how BD approaches are currently being used in the pharmacology, toxicology, and pharmaceutics fields; in particular, we highlight how researchers have applied BD in these fields to address various challenges and establish solutions. A comparative analysis helps to trace the implementation of big data across the three fields, and relevant limitations and directions for future research are emphasised. These fields are still at an early stage of BD adoption, and many research challenges must be overcome in order to employ BD effectively to address specific issues. Full article
29 pages, 464 KiB  
Article
What Is (Not) Big Data Based on Its 7Vs Challenges: A Survey
by Cristian González García and Eva Álvarez-Fernández
Big Data Cogn. Comput. 2022, 6(4), 158; https://doi.org/10.3390/bdcc6040158 - 14 Dec 2022
Cited by 1 | Viewed by 3015
Abstract
Big Data has changed how enterprises and people manage knowledge and make decisions. However, when talking about Big Data, there are often different definitions of what it is and what it is used for, with many interpretations and disagreements. For these reasons, we have reviewed the literature to compile and provide a possible solution to the existing discrepancies between the terms Data Analysis, Data Mining, Knowledge Discovery in Databases, and Big Data. In addition, we have gathered the patterns used in Data Mining, the different phases of Knowledge Discovery in Databases, and some definitions of Big Data according to important companies and organisations. Moreover, Big Data has challenges that are sometimes the same as its own characteristics. These characteristics are known as the Vs. Nonetheless, depending on the author, the number of Vs ranges from 3 to 5, or even 7, and the particular 4Vs or 5Vs are not always the same. Therefore, in this survey, we reviewed the literature to establish how many Vs have been identified and how each is explained in relation to existing problems. In total, we detected 7Vs, three of which have subtypes. Full article
23 pages, 735 KiB  
Review
Explore Big Data Analytics Applications and Opportunities: A Review
by Zaher Ali Al-Sai, Mohd Heikal Husin, Sharifah Mashita Syed-Mohamad, Rasha Moh’d Sadeq Abdin, Nour Damer, Laith Abualigah and Amir H. Gandomi
Big Data Cogn. Comput. 2022, 6(4), 157; https://doi.org/10.3390/bdcc6040157 - 14 Dec 2022
Cited by 11 | Viewed by 7070
Abstract
Big data applications and analytics are vital in proposing ultimate strategic decisions. The existing literature emphasizes that big data applications and analytics can empower those who apply Big Data Analytics during the COVID-19 pandemic. This paper reviews the existing literature specializing in big data applications before and during COVID-19. A comparison of Big Data applications before and during the pandemic is presented and expanded to four highly recognized industry fields: healthcare, education, transportation, and banking. The effectiveness of the four major types of data analytics across these industries is discussed. Hence, this paper provides an illustrative description of the importance of big data applications in the era of COVID-19, as well as aligning the applications with their relevant big data analytics models. This review concludes that applying the ultimate big data applications and their associated data analytics models can overcome the significant limitations faced by organizations during one of the most fateful pandemics worldwide. Future work will conduct a systematic literature review and a comparative analysis of existing Big Data systems and models, and will investigate the critical challenges of Big Data Analytics and applications during the COVID-19 pandemic. Full article
31 pages, 6664 KiB  
Review
Machine Learning Styles for Diabetic Retinopathy Detection: A Review and Bibliometric Analysis
by Shyamala Subramanian, Sashikala Mishra, Shruti Patil, Kailash Shaw and Ebrahim Aghajari
Big Data Cogn. Comput. 2022, 6(4), 154; https://doi.org/10.3390/bdcc6040154 - 12 Dec 2022
Cited by 7 | Viewed by 5877
Abstract
Diabetic retinopathy (DR) is a medical condition caused by diabetes. The development of retinopathy significantly depends on how long a person has had diabetes. Initially, there may be no symptoms or just a slight vision problem due to impairment of the retinal blood vessels; later, it may lead to blindness. Recognizing the early clinical signs of DR is very important for intervening in and effectively treating DR. Thus, regular eye check-ups are necessary to direct the person to a doctor for a comprehensive ocular examination and treatment as soon as possible to avoid permanent vision loss. Nevertheless, due to limited resources, large-scale manual screening is not feasible. As a result, emerging technologies, such as artificial intelligence, for the automatic detection and classification of DR offer alternative screening methodologies that make screening cost-effective. Researchers have been working on artificial-intelligence-based technologies to detect and analyze DR in recent years. This study aimed to investigate the different machine learning styles chosen for diagnosing retinopathy. Thus, a bibliometric analysis was systematically conducted to discover the machine learning styles used for detecting diabetic retinopathy. The data were exported from popular databases, namely, Web of Science (WoS) and Scopus, and analyzed using Biblioshiny and VOSviewer in terms of publications, top countries, sources, subject areas, top authors, trend topics, co-occurrences, thematic evolution, factorial maps, citation analysis, etc., which form the base for researchers to identify the research gaps in diabetic retinopathy detection and classification. Full article
17 pages, 1860 KiB  
Article
Explaining Exploration–Exploitation in Humans
by Antonio Candelieri, Andrea Ponti and Francesco Archetti
Big Data Cogn. Comput. 2022, 6(4), 155; https://doi.org/10.3390/bdcc6040155 - 12 Dec 2022
Cited by 1 | Viewed by 1494
Abstract
Human as well as algorithmic searches are performed to balance exploration and exploitation. The search task in this paper is the global optimization of a 2D multimodal function, unknown to the searcher. Thus, the task presents the following features: (i) uncertainty (i.e., information about the function can be acquired only through function observations), (ii) sequentiality (i.e., the choice of the next point to observe depends on the previous ones), and (iii) a limited budget (i.e., a maximum number of sequential choices allowed to the players). The data about human behavior are gathered through a gaming app whose screen represents all the possible locations the player can click on; the associated value of the unknown function is shown to the player. Experimental data were gathered from 39 subjects, each playing 10 different tasks. Decisions are analyzed in a Pareto optimality setting (improvement vs. uncertainty). The experimental results show that the most significant deviations from Pareto rationality are associated with a behavior named “exasperated exploration”, close to random search. This behavior shows a statistically significant association with stressful situations that occur when, according to their current belief, the humans feel there is no chance to improve over the best value observed so far, while the remaining budget is running out. To classify decisions as Pareto or non-Pareto, an explainable/interpretable Machine Learning model based on Decision Tree learning was developed. The resulting model was used to implement a synthetic human searcher/optimizer, which was then compared against Bayesian Optimization; on half of the test problems, the synthetic human proves more effective and efficient. Full article
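The improvement-vs-uncertainty analysis in this abstract rests on Pareto dominance, which can be sketched in a few lines. Treating both objectives as quantities to maximize is an assumption about the setup, and the candidate values below are made up for illustration:

```python
# Sketch of a Pareto-dominance check over (improvement, uncertainty) pairs.
# Both objectives are assumed to be maximized; the candidate values are
# illustrative, not experimental data from the study.

def dominates(p, q):
    """p dominates q if p is at least as good in both objectives and p != q."""
    return p[0] >= q[0] and p[1] >= q[1] and p != q

def pareto_front(points):
    """Candidates not dominated by any other candidate (Pareto-rational choices)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# (improvement, uncertainty) for four hypothetical clicks in the game
choices = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5), (0.1, 0.1)]
print(pareto_front(choices))  # → [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5)]
```

A player's click would then be labelled Pareto-rational if it lies on this front, and a deviation otherwise; in the paper that labelling feeds the decision-tree classifier.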
22 pages, 2447 KiB  
Article
An Advanced Big Data Quality Framework Based on Weighted Metrics
by Widad Elouataoui, Imane El Alaoui, Saida El Mendili and Youssef Gahi
Big Data Cogn. Comput. 2022, 6(4), 153; https://doi.org/10.3390/bdcc6040153 - 09 Dec 2022
Cited by 7 | Viewed by 2800
Abstract
While the benefits of big data are numerous, its use requires addressing new challenges related to data processing, data security, and especially the degradation of data quality. Despite the increased importance of data quality for big data, data quality measurement is currently limited to a few metrics: while more than 50 data quality dimensions have been defined in the literature, only 11 dimensions are actually measured. Therefore, this paper aims to extend the measured dimensions by defining four new data quality metrics: Integrity, Accessibility, Ease of manipulation, and Security. Thus, we propose a comprehensive Big Data Quality Assessment Framework based on 12 metrics: Completeness, Timeliness, Volatility, Uniqueness, Conformity, Consistency, Ease of manipulation, Relevancy, Readability, Security, Accessibility, and Integrity. In addition, to ensure accurate data quality assessment, we apply data weights at three data unit levels: data fields, quality metrics, and quality aspects. Furthermore, we define and measure five quality aspects to provide a macro-view of data quality. Finally, an experiment is performed to implement the defined measures. The results show that the suggested methodology allows a more exhaustive and accurate big data quality assessment, defining a weighted quality score based on the 12 metrics and achieving a best quality model score of 9/10. Full article
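The weighted aggregation this abstract describes can be sketched as a weight-normalised average rolled up from metric scores to an aspect score. The metric names follow the paper, but the weights, scores, and the "reliability" aspect grouping below are made-up examples, not the framework's actual parameters:

```python
# Sketch of a weighted quality score: metric scores (0-10) are aggregated
# into an aspect score with user-supplied weights. Weights and scores here
# are illustrative, not the paper's experimental values.

def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted average of scores, normalised by the total weight."""
    total_w = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total_w

# Example: a hypothetical "reliability" aspect built from three of the
# twelve metrics.
metric_scores = {"Completeness": 9.0, "Consistency": 8.0, "Integrity": 10.0}
metric_weights = {"Completeness": 3, "Consistency": 1, "Integrity": 2}

aspect = weighted_score(metric_scores, metric_weights)
print(round(aspect, 2))  # (9*3 + 8*1 + 10*2) / 6 → 9.17
```

The same function applied again over aspect scores, with aspect-level weights, would yield the global quality score the framework reports.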
21 pages, 3495 KiB  
Article
Innovative Business Process Reengineering Adoption: Framework of Big Data Sentiment, Improving Customers’ Service Level Agreement
by Heru Susanto, Aida Sari and Fang-Yie Leu
Big Data Cogn. Comput. 2022, 6(4), 151; https://doi.org/10.3390/bdcc6040151 - 08 Dec 2022
Cited by 2 | Viewed by 2350
Abstract
Social media is now regarded as the most valuable source of data for trend analysis and innovative business process reengineering preferences. Data made accessible through social media can be utilized for a variety of purposes, such as by an entrepreneur who wants to learn more about the market they intend to enter and uncover their consumers’ requirements before launching new products or services. Sentiment analysis and text mining of telecommunication businesses via social media posts and comments are the subject of this study. A proposed framework is utilized as a guideline and tested for sentiment analysis: lexicon-based sentiment categorization provides the training dataset for a supervised machine learning support vector machine (SVM). The results are very promising; the accuracy and the quantity of true sentiments detected are compared. This signifies the usefulness of text mining and sentiment analysis on social media data, while the use of machine learning classifiers for predicting sentiment orientation provides a useful tool for operations and marketing departments. The availability of large amounts of data in this digitally active society is advantageous for sectors such as the telecommunication industry. With text mining and sentiment analysis, these companies can stay two steps ahead with their strategy, keep customers happier, and mitigate problems more easily, further supporting the adoption of innovative business process reengineering for service improvements within the telecommunications industry. Full article
(This article belongs to the Special Issue Advanced Data Mining Techniques for IoT and Big Data)
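The lexicon-based labelling step this abstract mentions can be sketched simply: posts are scored against positive and negative word lists, and the resulting (post, label) pairs would then serve as the training set for a supervised classifier such as an SVM. The word lists and example posts below are illustrative assumptions, not the study's lexicon or data:

```python
# Sketch of lexicon-based sentiment labelling. The word lists are
# illustrative assumptions; a real pipeline would use a full sentiment
# lexicon and feed the labels to an SVM training step.

POSITIVE = {"fast", "reliable", "great", "love", "helpful"}
NEGATIVE = {"slow", "outage", "terrible", "hate", "drop"}

def lexicon_label(post: str) -> str:
    """Label a post by the balance of positive vs negative lexicon hits."""
    words = post.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

posts = [
    "love the fast and reliable coverage",
    "another outage today, terrible service",
    "billing page updated",
]
labels = [lexicon_label(p) for p in posts]
print(labels)  # → ['positive', 'negative', 'neutral']
```

These weakly labelled pairs are what make the approach "supervised without manual annotation": the lexicon does the labelling, and the SVM then generalises beyond exact lexicon matches.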
41 pages, 6572 KiB  
Review
A Systematic Literature Review on Diabetic Retinopathy Using an Artificial Intelligence Approach
by Pooja Bidwai, Shilpa Gite, Kishore Pahuja and Ketan Kotecha
Big Data Cogn. Comput. 2022, 6(4), 152; https://doi.org/10.3390/bdcc6040152 - 08 Dec 2022
Cited by 9 | Viewed by 7862
Abstract
Diabetic retinopathy occurs due to long-term diabetes with changing blood glucose levels and has become the most common cause of vision loss worldwide. It has become a severe problem among the working-age group that needs to be addressed early to avoid future vision loss. Artificial-intelligence-based technologies have been utilized to detect and grade diabetic retinopathy at the initial level. Early detection allows for proper treatment and, as a result, eyesight complications can be avoided. This in-depth analysis details the various methods for diagnosing diabetic retinopathy using blood vessels, microaneurysms, exudates, the macula, optic discs, and hemorrhages. In most trials, fundus images of the retina, taken with a fundus camera, are used. This survey discusses the basics of diabetes, its prevalence, its complications, and artificial intelligence approaches to the early detection and classification of diabetic retinopathy. The research also discusses artificial-intelligence-based techniques such as machine learning and deep learning, as well as newer research fields such as transfer learning using generative adversarial networks, domain adaptation, multitask learning, and explainable artificial intelligence in diabetic retinopathy. A list of existing datasets, screening systems, performance measurements, biomarkers in diabetic retinopathy, potential issues, and challenges faced in ophthalmology is discussed, followed by the future scope and conclusions. To the authors’ knowledge, no other literature has analyzed recent state-of-the-art techniques using the PRISMA approach with artificial intelligence at the core. Full article
16 pages, 4442 KiB  
Article
Yolov5 Series Algorithm for Road Marking Sign Identification
by Christine Dewi, Rung-Ching Chen, Yong-Cun Zhuang and Henoch Juli Christanto
Big Data Cogn. Comput. 2022, 6(4), 149; https://doi.org/10.3390/bdcc6040149 - 07 Dec 2022
Cited by 8 | Viewed by 4575
Abstract
Road markings and signs provide vehicles and pedestrians with essential information that helps them follow traffic regulations. Road surface markings include pedestrian crossings, directional arrows, zebra crossings, speed limit signs, and other similar signs and text, which are usually painted directly onto the road surface. Road markings fulfill a variety of important functions, such as alerting drivers to potentially hazardous road sections, directing traffic, prohibiting certain actions, and slowing vehicles down. This research paper provides a summary of the Yolov5 algorithm series for road marking sign identification, which includes Yolov5s, Yolov5m, Yolov5n, Yolov5l, and Yolov5x. This study explores a wide range of contemporary object detectors, such as those used to determine the location of road marking signs. Performance metrics track important measures, including the quantity of BFLOPS, the mean average precision (mAP), the intersection over union (IoU), and the detection time. Our findings show that Yolov5m is the most stable method, with 76% precision, 86% recall, and 83% mAP during the training stage. Moreover, Yolov5m and Yolov5l achieve the highest score in the testing stage, with an mAP of 87% on average. In addition, we have created a new dataset for road marking signs in Taiwan, called TRMSD. Full article
(This article belongs to the Special Issue Computational Collective Intelligence with Big Data–AI Society)
Show Figures

Figure 1
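The IoU metric reported above measures how well a predicted bounding box overlaps a ground-truth box; detections are usually counted as correct when IoU exceeds a threshold such as 0.5. A minimal sketch with corner-format boxes (an illustration of the metric, not the paper's code):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width/height clamp to zero when boxes are disjoint.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, a predicted box shifted halfway across an identically sized ground-truth box yields an IoU of 1/3, which would fail a 0.5 threshold.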

13 pages, 879 KiB  
Article
Trust-Based Data Communication in Wireless Body Area Network for Healthcare Applications
by Sangeetha Ramaswamy and Usha Devi Gandhi
Big Data Cogn. Comput. 2022, 6(4), 148; https://doi.org/10.3390/bdcc6040148 - 01 Dec 2022
Cited by 2 | Viewed by 1843
Abstract
A subset of Wireless Sensor Networks, the Wireless Body Area Network (WBAN) is an emerging technology. A WBAN is a collection of tiny wireless body sensors with limited computational capability that communicate over short distances using ZigBee or Bluetooth, with applications mainly in the healthcare [...] Read more.
A subset of Wireless Sensor Networks, the Wireless Body Area Network (WBAN) is an emerging technology. A WBAN is a collection of tiny wireless body sensors with limited computational capability that communicate over short distances using ZigBee or Bluetooth, with applications mainly in the healthcare industry, such as remote patient monitoring. The sensors monitor health factors such as body temperature, pulse rate, ECG, and heart rate, and communicate them to a base station or central coordinator for aggregation or data computation. The final data are communicated to remote monitoring devices through the internet or cloud service providers. The main challenges for this technology are energy consumption, secure communication within the network, and the possibility of attacks executed by malicious nodes, which create problems for the network. This study proposes a suitable trust model for secure communication in WBAN based on node trust and data trust. Node trust is calculated using direct trust calculation and node behaviours. Data trust is calculated using consistent data success and data aging. The performance is compared with existing non-cryptographic protocols, namely Trust Evaluation (TE)-WBAN and Body Area Network (BAN)-Trust. The proposed protocol is lightweight and has low overhead, and its performance is rated best for throughput, packet delivery ratio, and minimum delay. In extensive simulations, on-off attacks, selfishness attacks, sleeper attacks, and message suppression attacks were prevented. Full article
(This article belongs to the Special Issue Computational Collective Intelligence with Big Data–AI Society)
Show Figures

Figure 1
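The abstract combines node trust (direct observation of a neighbour's behaviour) with data trust (consistency of readings, discounted as data ages). The abstract does not give the formulas, so the following is a hypothetical sketch of such a composite score; the weights, the exponential aging factor, and the decay rate are all assumptions for illustration:

```python
import math

def node_trust(successful_interactions: int, total_interactions: int) -> float:
    """Direct trust: fraction of well-behaved interactions observed."""
    return successful_interactions / total_interactions

def data_trust(consistent_readings: int, total_readings: int,
               age_s: float, decay: float = 0.1) -> float:
    """Data trust: consistency rate, discounted as the data ages (assumed
    exponential decay; the paper's aging model may differ)."""
    return (consistent_readings / total_readings) * math.exp(-decay * age_s)

def composite_trust(nt: float, dt: float, w_node: float = 0.5) -> float:
    """Weighted blend of node trust and data trust (weights are assumptions)."""
    return w_node * nt + (1.0 - w_node) * dt
```

A coordinator would then forward data only from nodes whose composite trust exceeds some threshold, which is how on-off or message-suppression behaviour gets filtered out over time.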

19 pages, 6928 KiB  
Article
Image Fundus Classification System for Diabetic Retinopathy Stage Detection Using Hybrid CNN-DELM
by Dian Candra Rini Novitasari, Fatmawati Fatmawati, Rimuljo Hendradi, Hetty Rohayani, Rinda Nariswari, Arnita Arnita, Moch Irfan Hadi, Rizal Amegia Saputra and Ardhin Primadewi
Big Data Cogn. Comput. 2022, 6(4), 146; https://doi.org/10.3390/bdcc6040146 - 01 Dec 2022
Cited by 7 | Viewed by 1987
Abstract
Diabetic retinopathy (DR) is the leading cause of blindness among working-age adults. The increase in the population diagnosed with DR can be prevented by screening and early treatment of eye damage. This screening process can be conducted by utilizing deep learning techniques. In [...] Read more.
Diabetic retinopathy (DR) is the leading cause of blindness among working-age adults. The increase in the population diagnosed with DR can be prevented by screening and early treatment of eye damage. This screening process can be conducted by utilizing deep learning techniques. In this study, the detection of DR severity was carried out using the hybrid CNN-DELM method (CDELM). The CNN architectures used were ResNet-18, ResNet-50, ResNet-101, GoogLeNet, and DenseNet. The extracted features were then classified using the deep extreme learning machine (DELM) algorithm. The comparison of CNN architectures aimed to find the best architecture for fundus image feature extraction. This research also compared the effect of the kernel function on the performance of DELM in fundus image classification. All experiments using CDELM showed maximum results, with an accuracy of 100% on the DRIVE data and the two-class MESSIDOR data, while the best accuracy obtained on the four-class MESSIDOR data reached 98.20%. The advantage of the DELM method over the conventional CNN method is its much shorter training time: CNN takes an average of 30 min for training, while CDELM takes only an average of 2.5 min. Based on accuracy and training time, the CDELM method performed better than the conventional CNN method. Full article
(This article belongs to the Topic Machine and Deep Learning)
Show Figures

Figure 1
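The speed advantage reported above comes from the extreme learning machine half of CDELM: hidden-layer weights are drawn at random and only the output weights are solved in closed form via a pseudo-inverse, so there is no iterative backpropagation. A minimal single-hidden-layer ELM sketch in NumPy, fed toy stand-ins for CNN feature vectors (an illustration of the principle, not the paper's implementation):

```python
import numpy as np

class ELM:
    """Extreme learning machine: random hidden layer + least-squares output."""

    def __init__(self, n_hidden: int = 64, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        n_classes = int(y.max()) + 1
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)      # fixed random feature map
        T = np.eye(n_classes)[y]              # one-hot targets
        self.beta = np.linalg.pinv(H) @ T     # closed-form output weights
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        H = np.tanh(X @ self.W + self.b)
        return np.argmax(H @ self.beta, axis=1)

# Toy stand-in for CNN feature vectors from two DR severity classes.
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
              [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]])
y = np.array([0, 0, 0, 1, 1, 1])
model = ELM(n_hidden=32).fit(X, y)
```

Because training reduces to one pseudo-inverse, the minutes-vs-seconds gap between CNN fine-tuning and (C)DELM classification in the abstract is exactly what this structure predicts.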

20 pages, 6697 KiB  
Article
Image Segmentation for Mitral Regurgitation with Convolutional Neural Network Based on UNet, Resnet, Vnet, FractalNet and SegNet: A Preliminary Study
by Linda Atika, Siti Nurmaini, Radiyati Umi Partan and Erwin Sukandi
Big Data Cogn. Comput. 2022, 6(4), 141; https://doi.org/10.3390/bdcc6040141 - 25 Nov 2022
Cited by 5 | Viewed by 2160
Abstract
The heart’s mitral valve separates the left atrium from the left ventricle. Heart valve disease is fairly common, and one type is mitral regurgitation, an [...] Read more.
The heart’s mitral valve separates the left atrium from the left ventricle. Heart valve disease is fairly common, and one type is mitral regurgitation, an abnormality of the mitral valve on the left side of the heart that prevents the valve from closing properly. The Convolutional Neural Network (CNN) is a type of deep learning well suited to image analysis. Segmentation is widely used in analyzing medical images because it divides an image into simpler parts, separating the objects to be analyzed (foreground) from everything else (background), which facilitates the analysis process. This study builds a dataset from patients with mitral regurgitation and patients with normal hearts, and analyzes the heart valve images by segmenting the mitral valves. Several types of CNN architecture were applied in this research, including the U-Net, SegNet, V-Net, FractalNet, and ResNet architectures. The experimental results show that the best architecture is U-Net3 in terms of Pixel Accuracy (97.59%), Intersection over Union (86.98%), Mean Accuracy (93.46%), Precision (85.60%), Recall (88.39%), and Dice Coefficient (86.58%). Full article
(This article belongs to the Special Issue Advancements in Deep Learning and Deep Federated Learning Models)
Show Figures

Figure 1
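The Dice Coefficient and Intersection over Union reported above are both overlap measures between a predicted binary mask and the ground truth. A minimal sketch over flattened binary masks (illustrative; not the paper's evaluation code, and it assumes at least one mask is non-empty):

```python
def dice_and_iou(pred, true):
    """Dice = 2|A∩B| / (|A|+|B|); IoU = |A∩B| / |A∪B| for flat binary masks."""
    inter = sum(p & t for p, t in zip(pred, true))
    size_pred = sum(pred)
    size_true = sum(true)
    union = size_pred + size_true - inter
    dice = 2 * inter / (size_pred + size_true)
    iou = inter / union
    return dice, iou
```

Dice is always at least as large as IoU for the same masks (Dice = 2·IoU/(1+IoU)), which is why the paper's Dice figures sit slightly above its IoU figures.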

13 pages, 367 KiB  
Article
PSO-Driven Feature Selection and Hybrid Ensemble for Network Anomaly Detection
by Maya Hilda Lestari Louk and Bayu Adhi Tama
Big Data Cogn. Comput. 2022, 6(4), 137; https://doi.org/10.3390/bdcc6040137 - 13 Nov 2022
Cited by 6 | Viewed by 1952
Abstract
As a system capable of monitoring and evaluating illegitimate network access, an intrusion detection system (IDS) profoundly impacts information security research. Machine learning techniques constitute the backbone of IDS, yet developing an accurate detection mechanism remains challenging. This study aims [...] Read more.
As a system capable of monitoring and evaluating illegitimate network access, an intrusion detection system (IDS) profoundly impacts information security research. Machine learning techniques constitute the backbone of IDS, yet developing an accurate detection mechanism remains challenging. This study aims to enhance the detection performance of IDS by using a particle swarm optimization (PSO)-driven feature selection approach and a hybrid ensemble. Specifically, the final feature subsets derived from different IDS datasets, i.e., NSL-KDD, UNSW-NB15, and CICIDS-2017, are trained using a hybrid ensemble comprising two well-known ensemble learners, i.e., the gradient boosting machine (GBM) and bootstrap aggregation (bagging). Instead of training a single GBM, we train a GBM on each subsample of an intrusion dataset and combine the final class predictions using majority voting. Our proposed scheme yields substantial improvements over existing baselines, such as TSE-IDS, voting ensembles, weighted majority voting, and other individual ensemble-based IDS such as LightGBM. Full article
Show Figures

Figure 1
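The combination stage described above, merging the class predictions of models trained on different subsamples by majority voting, can be sketched as follows (a generic illustration; in the paper the base learners are GBMs combined under a bagging scheme):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model class predictions: predictions[m][i] is model m's
    label for sample i; returns the most common label for each sample."""
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Three hypothetical subsample-trained models voting on four network flows
# (0 = benign, 1 = intrusion).
ensemble_preds = [[1, 0, 1, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 0]]
final = majority_vote(ensemble_preds)
```

Training each learner on a different subsample decorrelates their errors, which is what lets the vote outperform any single model.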

24 pages, 718 KiB  
Review
An Overview of Data Warehouse and Data Lake in Modern Enterprise Data Management
by Athira Nambiar and Divyansh Mundra
Big Data Cogn. Comput. 2022, 6(4), 132; https://doi.org/10.3390/bdcc6040132 - 07 Nov 2022
Cited by 24 | Viewed by 27586
Abstract
Data is the lifeblood of any organization. In today’s world, organizations recognize the vital role of data in modern business intelligence systems for making meaningful decisions and staying competitive in the field. Efficient and optimal data analytics provides a competitive edge to an organization's [...] Read more.
Data is the lifeblood of any organization. In today’s world, organizations recognize the vital role of data in modern business intelligence systems for making meaningful decisions and staying competitive in the field. Efficient and optimal data analytics provides a competitive edge to an organization's performance and services. Major organizations generate, collect, and process vast amounts of data, which fall under the category of big data. Managing and analyzing the sheer volume and variety of big data is a cumbersome process. At the same time, proper utilization of an organization's vast collection of information can generate meaningful insights into business tactics. In this regard, two popular data management systems in the area of big data analytics, the data warehouse and the data lake, act as platforms to accumulate the big data generated and used by organizations. Although seemingly similar, the two differ in terms of their characteristics and applications. This article presents a detailed overview of the roles of data warehouses and data lakes in modern enterprise data management. We detail the definitions, characteristics, and related work for the respective data management frameworks. Furthermore, we explain the architecture and design considerations of the current state of the art. Finally, we provide a perspective on the challenges and promising research directions for the future. Full article
Show Figures

Figure 1

19 pages, 686 KiB  
Article
THOR: A Hybrid Recommender System for the Personalized Travel Experience
by Alireza Javadian Sabet, Mahsa Shekari, Chaofeng Guan, Matteo Rossi, Fabio Schreiber and Letizia Tanca
Big Data Cogn. Comput. 2022, 6(4), 131; https://doi.org/10.3390/bdcc6040131 - 04 Nov 2022
Cited by 2 | Viewed by 3290
Abstract
One of travelers’ main challenges is that they must spend considerable effort to find and choose the most desirable travel offers among a vast list of non-categorized and non-personalized items. Recommendation systems provide an effective way to solve the problem [...] Read more.
One of travelers’ main challenges is that they must spend considerable effort to find and choose the most desirable travel offers among a vast list of non-categorized and non-personalized items. Recommendation systems provide an effective way to solve the problem of information overload. In this work, we design and implement “The Hybrid Offer Ranker” (THOR), a hybrid, personalized recommender system for the transportation domain. THOR assigns every traveler a unique contextual preference model built solely from their personal data, which makes the model sensitive to the user’s choices. This model is used to rank the travel offers presented to each user according to their personal preferences. We reduce the recommendation problem to one of binary classification, predicting the probability with which a traveler will buy each available travel offer. Travel offers are ranked according to the computed probabilities, and hence to the user’s personal preference model. Moreover, to tackle the cold-start problem for new users, we apply clustering algorithms to identify groups of travelers with similar profiles and build a preference model for each group. To test the system’s performance, we generate a dataset according to carefully designed rules. The experimental results show that THOR is capable of learning the contextual preferences of each traveler and ranks offers starting from those with the highest probability of being selected. Full article
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)
Show Figures

Figure 1
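The ranking step described above, reducing recommendation to binary classification and sorting offers by predicted purchase probability, can be sketched as follows. The logistic scorer and the feature names are hypothetical stand-ins for the per-traveler preference model, which the abstract does not specify:

```python
import math

def purchase_probability(offer, weights, bias=0.0):
    """Hypothetical per-traveler model: logistic score over offer features."""
    z = bias + sum(weights[k] * v for k, v in offer["features"].items())
    return 1.0 / (1.0 + math.exp(-z))

def rank_offers(offers, weights):
    """Sort offers by descending predicted probability of purchase."""
    return sorted(offers,
                  key=lambda o: purchase_probability(o, weights),
                  reverse=True)

# Toy traveler who strongly prefers cheap, direct connections.
weights = {"cheapness": 2.0, "direct": 1.5}
offers = [
    {"id": "A", "features": {"cheapness": 0.2, "direct": 0.0}},
    {"id": "B", "features": {"cheapness": 0.9, "direct": 1.0}},
    {"id": "C", "features": {"cheapness": 0.5, "direct": 1.0}},
]
ranking = [o["id"] for o in rank_offers(offers, weights)]
```

For a cold-start user, the same machinery would run with the weights of the cluster the new user is assigned to, rather than a personal model.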

29 pages, 620 KiB  
Article
A Space-Time Framework for Sentiment Scope Analysis in Social Media
by Gianluca Bonifazi, Francesco Cauteruccio, Enrico Corradini, Michele Marchetti, Luigi Sciarretta, Domenico Ursino and Luca Virgili
Big Data Cogn. Comput. 2022, 6(4), 130; https://doi.org/10.3390/bdcc6040130 - 03 Nov 2022
Cited by 19 | Viewed by 2615
Abstract
The concept of scope was introduced in Social Network Analysis to assess the authoritativeness and convincing ability of a user toward other users on one or more social platforms. It has been studied in the past in some specific contexts, for example, to [...] Read more.
The concept of scope was introduced in Social Network Analysis to assess the authoritativeness and convincing ability of a user toward other users on one or more social platforms. It has been studied in the past in some specific contexts, for example, to assess the ability of a user to spread information on Twitter. In this paper, we propose a new investigation of scope, as we want to assess the scope of a user's sentiment on a topic. We also propose a multi-dimensional definition of scope: besides the traditional spatial scope, we introduce the temporal one, which has never been addressed in the literature, and propose a model that allows the concept of scope to be extended to further dimensions in the future. Furthermore, we propose an approach, and a related set of parameters, for measuring the scope of a user's sentiment on a topic in a social network. Finally, we illustrate the results of an experimental campaign conducted to evaluate the proposed framework on a dataset derived from Reddit. The main novelties of this paper are: (i) a multi-dimensional view of scope; (ii) the introduction of the concept of sentiment scope; and (iii) the definition of a general framework capable of analyzing the sentiment scope related to any subject on any social network. Full article
Show Figures

Figure 1

22 pages, 601 KiB  
Review
Facial Age Estimation Using Machine Learning Techniques: An Overview
by Khaled ELKarazle, Valliappan Raman and Patrick Then
Big Data Cogn. Comput. 2022, 6(4), 128; https://doi.org/10.3390/bdcc6040128 - 26 Oct 2022
Cited by 10 | Viewed by 8059
Abstract
Automatic age estimation from facial images is an exciting machine learning topic that has attracted researchers’ attention over the past several years. Numerous human–computer interaction applications, such as targeted marketing, content access control, or soft-biometrics systems, employ age estimation models to carry out [...] Read more.
Automatic age estimation from facial images is an exciting machine learning topic that has attracted researchers’ attention over the past several years. Numerous human–computer interaction applications, such as targeted marketing, content access control, or soft-biometrics systems, employ age estimation models to carry out secondary tasks such as user filtering or identification. Despite the vast array of applications that could benefit from automatic age estimation, building such a system comes with issues such as data disparity, the unique ageing pattern of each individual, and facial photo quality. This paper provides a survey of the standard methods of building automatic age estimation models, the benchmark datasets for building these models, and some of the latest literature introducing new age estimation methods. Finally, we present and discuss the standard evaluation metrics used to assess age estimation models. In addition to the survey, we discuss the gaps identified in the reviewed literature and present recommendations for future research. Full article
Show Figures

Figure 1
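The evaluation metrics discussed in surveys of this area are dominated by the mean absolute error (MAE) and the cumulative score (CS), the fraction of predictions falling within a tolerance of the true age. A minimal sketch (illustrative; the ages and tolerance below are made up):

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and true ages, in years."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def cumulative_score(predicted, actual, tolerance):
    """Fraction of predictions whose error is at most `tolerance` years."""
    within = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return within / len(actual)

# Hypothetical predictions for three test faces.
pred, true = [25, 30, 40], [23, 30, 45]
```

CS is often reported as a curve over tolerances (CS(1), CS(5), ...), which gives a fuller picture than a single MAE number when errors are unevenly distributed across age groups.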
