Big Data Cogn. Comput., Volume 7, Issue 2 (June 2023) – 64 articles

Cover Story: The authors propose a proactive approach to mitigating malware threats using natural language processing (NLP) techniques. The paper introduces a model called MalBERTv2, which has been trained on publicly available datasets and focuses on the app source code to extract relevant information. The model utilizes pre-tokenization feature generation and a classifier with bidirectional encoder representations from transformers (BERT) to detect malware threats. The model performs well, with a weighted F1 score ranging from 82% to 99% using different datasets. The results demonstrate the effectiveness of the NLP-based approach in proactively detecting malware threats.
15 pages, 2248 KiB  
Article
YOLO-v5 Variant Selection Algorithm Coupled with Representative Augmentations for Modelling Production-Based Variance in Automated Lightweight Pallet Racking Inspection
by Muhammad Hussain
Big Data Cogn. Comput. 2023, 7(2), 120; https://doi.org/10.3390/bdcc7020120 - 14 Jun 2023
Cited by 2 | Viewed by 1418
Abstract
The aim of this research is to develop an automated pallet inspection architecture with two key objectives: high performance with respect to defect classification and computational efficiency, i.e., a lightweight footprint. As automated pallet racking inspection via machine vision is a developing field, the procurement of racking datasets can be a difficult task. Therefore, the first contribution of this study was the proposal of several tailored augmentations that were generated based on modelling production floor conditions/variances within warehouses. Secondly, the variant selection algorithm was proposed, starting with extreme-end analysis and providing a protocol for selecting the optimal architecture with respect to accuracy and computational efficiency. The proposed YOLO-v5n architecture generated the highest MAP@0.5 of 96.8% compared to previous works in the racking domain, with a computational footprint in terms of the number of parameters at its lowest, i.e., 1.9 M compared to YOLO-v5x at 86.7 M.

24 pages, 1122 KiB  
Article
Efficient Method for Continuous IoT Data Stream Indexing in the Fog-Cloud Computing Level
by Karima Khettabi, Zineddine Kouahla, Brahim Farou, Hamid Seridi and Mohamed Amine Ferrag
Big Data Cogn. Comput. 2023, 7(2), 119; https://doi.org/10.3390/bdcc7020119 - 14 Jun 2023
Viewed by 1315
Abstract
Internet of Things (IoT) systems include many smart devices that continuously generate massive spatio-temporal data, which can be difficult to process. These continuous data streams need to be stored smartly so that query searches are efficient. In this work, we propose an efficient method, in the fog-cloud computing architecture, to index continuous and heterogeneous data streams in metric space. This method divides the fog layer into three levels: clustering, cluster processing and indexing. The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to group the data from each stream into homogeneous clusters at the clustering fog level. Each cluster in the first data stream is stored in the cluster processing fog level and indexed directly in the indexing fog level in a Binary tree with Hyperplane (BH tree). The indexing of clusters in the subsequent data stream is determined by the coefficient of variation (CV) value of the union of the new cluster with the existing clusters in the cluster processing fog level. An analysis and comparison of our experimental results with other results in the literature demonstrated the effectiveness of the CV method in reducing energy consumption during BH tree construction, as well as reducing the search time and energy consumption during a k Nearest Neighbor (kNN) parallel query search.
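As a rough illustration of the coefficient-of-variation test mentioned in the abstract above, the sketch below computes the CV of the union of a new cluster with each existing cluster and picks the closest match. The abstract does not give the exact formula, the data representation, or any threshold, so the scalar values, the `threshold` of 0.5, and both function names are assumptions made only for illustration; this is not the authors' implementation.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return population standard deviation divided by |mean|."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / abs(m)


def best_cluster_for(new_cluster, existing_clusters, threshold=0.5):
    """Return the index of the existing cluster whose union with
    new_cluster has the lowest CV, or None if no union has a CV at or
    below the (hypothetical) threshold."""
    best_index, best_cv = None, None
    for index, cluster in enumerate(existing_clusters):
        cv = coefficient_of_variation(list(cluster) + list(new_cluster))
        if cv <= threshold and (best_cv is None or cv < best_cv):
            best_index, best_cv = index, cv
    return best_index


if __name__ == "__main__":
    stored = [[10.0, 11.0, 12.0], [100.0, 101.0, 99.0]]
    print(best_cluster_for([10.5, 11.5], stored))    # homogeneous with the first cluster
    print(best_cluster_for([500.0, 510.0], stored))  # no sufficiently homogeneous union
```

A low CV for the union suggests the new cluster is homogeneous with a stored one and could share its index entry; a high CV suggests it should be indexed separately. How the paper acts on the CV value is not stated in the abstract.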

18 pages, 2466 KiB  
Article
Transformational Entrepreneurship and Digital Platforms: A Combination of ISM-MICMAC and Unsupervised Machine Learning Algorithms
by Pejman Ebrahimi, Hakimeh Dustmohammadloo, Hosna Kabiri, Parisa Bouzari and Mária Fekete-Farkas
Big Data Cogn. Comput. 2023, 7(2), 118; https://doi.org/10.3390/bdcc7020118 - 13 Jun 2023
Viewed by 1934
Abstract
For many years, entrepreneurs were considered the change agents of their societies. They use their initiative and innovative minds to solve problems and create value. In the aftermath of the digital transformation era, a new group of entrepreneurs has emerged who are called transformational entrepreneurs. They use various digital platforms to create value. Surprisingly, despite their importance, they have not been sufficiently investigated. Therefore, this research scrutinizes the elements affecting transformational entrepreneurship in digital platforms. To do so, the authors have considered a two-phase method. First, interpretive structural modeling (ISM) and Matrice d’Impacts Croisés Multiplication Appliquée à un Classement (MICMAC) are used to suggest a model. ISM is a qualitative method to reach a visualized hierarchical structure. Then, four unsupervised machine learning algorithms are used to ensure the accuracy of the proposed model. The findings reveal that transformational leadership could mediate the relationship between the entrepreneurial mindset and thinking and digital transformation, interdisciplinary approaches, value creation logic, and technology diffusion. Among the unsupervised algorithms, the Gaussian mixture model (GMM) with the full covariance type has the best accuracy of the various covariance types, at 0.895. From the practical point of view, this paper provides important insights for practitioners, entrepreneurs, and public actors to help them develop transformational entrepreneurship skills. The results could also serve as a guideline for companies regarding how to manage the consequences of a crisis such as a pandemic. The findings also provide significant insight for higher education policymakers.
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)

27 pages, 7168 KiB  
Article
Tactically Maximize Game Advantage by Predicting Football Substitutions Using Machine Learning
by Alex Mohandas, Mominul Ahsan and Julfikar Haider
Big Data Cogn. Comput. 2023, 7(2), 117; https://doi.org/10.3390/bdcc7020117 - 12 Jun 2023
Viewed by 2478
Abstract
Football (also known as soccer) boasts a staggering fan base of 3.5 billion individuals spread across 200 countries, making it the world’s most beloved sport. The widespread adoption of advanced technology in sports has become increasingly prominent, empowering players, coaches, and team management to enhance their performance and refine team strategies. Among the available tactical decisions, player substitution plays a crucial role in altering the dynamics of a match. However, due to the absence of proven methods or software capable of accurately predicting substitutions, these decisions are often based on instinct rather than concrete data. The purpose of this research is to explore the potential of employing machine learning algorithms to predict substitutions in football, and how they could influence the outcome of a match. This study investigates the effect of timely and tactical substitutions in football matches and their influence on the match results. Machine learning techniques such as Logistic Regression (LR), Decision Tree (DT), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Multinomial Naïve Bayes (MNB), and Random Forest (RF) classifiers were implemented and tested to develop models and to predict player substitutions. Relevant data were collected from a Kaggle dataset, which contains data on 51,738 substitutions from 9074 European league football matches in 5 leagues spanning 6 seasons. Machine learning models were trained and tested using an 80-20 data split, and it was observed that the RF model provided the best accuracy of over 70% and the best F1-score of 0.65 on the test set across all football leagues. The SVM model achieved the best precision of almost 0.8 but had the longest computation time, up to 2 min. LR showed some overfitting, with 100% accuracy on the training set but only 60% accuracy on the test set.
To conclude, based on the time of substitution and match score-line, it was possible to predict the players who can be substituted, which can provide a match advantage. The achieved results provided an effective way to decide on player substitutions for both the team manager and coaches.
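For readers unfamiliar with the metrics quoted in the abstract above (accuracy, precision, F1-score), the following minimal sketch shows how they are computed from binary predictions. The label encoding (1 = "substituted") and the function names are assumptions for illustration only; the paper's own evaluation code and data are not reproduced here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels, not taken from the paper's dataset.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [1, 0, 1, 0, 1, 0, 1, 0]
    print(accuracy(truth, predicted))
    print(precision_recall_f1(truth, predicted))
```

Comparing these metrics on the training and test portions of a split is the usual way to spot the kind of overfitting the abstract reports for LR, where training accuracy far exceeds test accuracy.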

21 pages, 2309 KiB  
Communication
Sentiment Analysis and Text Analysis of the Public Discourse on Twitter about COVID-19 and MPox
by Nirmalya Thakur
Big Data Cogn. Comput. 2023, 7(2), 116; https://doi.org/10.3390/bdcc7020116 - 09 Jun 2023
Cited by 16 | Viewed by 2659
Abstract
Mining and analysis of the big data of Twitter conversations have been of significant interest to the scientific community in the fields of healthcare, epidemiology, big data, data science, computer science, and their related areas, as can be seen from several works in the last few years that focused on sentiment analysis and other forms of text analysis of tweets related to Ebola, E. coli, Dengue, Human Papillomavirus (HPV), Middle East Respiratory Syndrome (MERS), Measles, Zika virus, H1N1, influenza-like illness, swine flu, flu, Cholera, Listeriosis, cancer, Liver Disease, Inflammatory Bowel Disease, kidney disease, lupus, Parkinson’s, Diphtheria, and West Nile virus. The recent outbreaks of COVID-19 and MPox have served as “catalysts” for Twitter usage related to seeking and sharing information, views, opinions, and sentiments involving both of these viruses. None of the prior works in this field analyzed tweets focusing on both COVID-19 and MPox simultaneously. To address this research gap, a total of 61,862 tweets that focused on MPox and COVID-19 simultaneously, posted between 7 May 2022 and 3 March 2023, were studied. The findings and contributions of this study are manifold. First, the results of sentiment analysis using the VADER (Valence Aware Dictionary for sEntiment Reasoning) approach show that nearly half the tweets (46.88%) had a negative sentiment, followed by tweets that had a positive sentiment (31.97%) and tweets that had a neutral sentiment (21.14%). Second, this paper presents the top 50 hashtags used in these tweets. Third, it presents the top 100 most frequently used words in these tweets after performing tokenization, removal of stopwords, and word frequency analysis. The findings indicate that tweets in this context included a high level of interest regarding COVID-19, MPox and other viruses, President Biden, and Ukraine.
Finally, a comprehensive comparative study that compares the contributions of this paper with 49 prior works in this field is presented to further uphold the relevance and novelty of this work.
(This article belongs to the Special Issue Machine Learning in Data Mining for Knowledge Discovery)
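The text-analysis steps named in the abstract above (tokenization, stopword removal, word frequency analysis, and hashtag ranking) can be sketched with the Python standard library alone. The tiny stopword set, the tokenization regex, and the function names are assumptions chosen for illustration; the paper's actual preprocessing pipeline is not described in the abstract, and the VADER sentiment step is deliberately left out because it requires a third-party package.

```python
import re
from collections import Counter

# Deliberately small, hypothetical stopword list for illustration.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in",
             "is", "are", "for", "on", "with", "about"}

# Words may contain internal hyphens or apostrophes, e.g. "covid-19".
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")


def top_words(tweets, n=100, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens as (word, count)."""
    counts = Counter()
    for tweet in tweets:
        tokens = TOKEN_PATTERN.findall(tweet.lower())
        counts.update(token for token in tokens if token not in stopwords)
    return counts.most_common(n)


def top_hashtags(tweets, n=50):
    """Return the n most frequent hashtags, compared case-insensitively."""
    counts = Counter(
        tag.lower() for tweet in tweets for tag in re.findall(r"#\w+", tweet)
    )
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        "COVID-19 and MPox are in the news #MPox",
        "Worried about mpox and covid-19 #mpox #COVID19",
    ]
    print(top_words(sample, n=2))
    print(top_hashtags(sample, n=1))
```

Note that in this sketch the text of a hashtag is also counted as an ordinary word; whether the study treated hashtags that way is not stated.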

21 pages, 7048 KiB  
Article
Molecular Structure-Based Prediction of Absorption Maxima of Dyes Using ANN Model
by Neeraj Tomar, Geeta Rani, Vijaypal Singh Dhaka, Praveen K. Surolia, Kalpit Gupta, Eugenio Vocaturo and Ester Zumpano
Big Data Cogn. Comput. 2023, 7(2), 115; https://doi.org/10.3390/bdcc7020115 - 08 Jun 2023
Cited by 2 | Viewed by 1976
Abstract
The exponentially growing energy requirements and, in turn, extensive depletion of non-renewable sources of energy are a major cause of concern. Renewable energy technologies such as solar cells can be used as an alternative. However, their low efficiency is a barrier to their practical use. This motivates the research community to design efficient solar cells. Based on the study of efficacy, design feasibility, and cost of fabrication, dye-sensitized solar cells (DSSCs) compare favorably with other photovoltaic solar cells. However, fabricating DSSCs in a laboratory and then assessing their characteristics is a costly affair. Researchers have applied techniques of computational chemistry such as Time-Dependent Density Functional Theory, and an ab initio method for defining the structure and electronic properties of dyes without synthesizing them. However, the inability of descriptors to provide an intuitive physical depiction of the effect of all parameters is a limitation of these approaches. The proven potential of neural network models in data analysis, pattern recognition, and object detection motivated researchers to extend their applicability for predicting the absorption maxima (λmax) of dyes. The objective of this research is to develop an ANN-based QSPR model for correctly predicting the value of λmax for inorganic ruthenium complex dyes used in DSSCs. Furthermore, it demonstrates the impact of different activation functions, optimizers, and loss functions on the prediction accuracy of λmax. Moreover, this research showcases the impact of atomic weight, types of bonds between constituents of the dye molecule, and the molecular weight of the dye molecule on the value of λmax. The experimental results showed that the value of λmax varies with changes in constituent atoms and types of bonds in a dye molecule. In addition, the model minimizes the difference between the experimental and calculated values of absorption maxima.
The comparison with the existing models showed that the proposed model performs better.
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)

12 pages, 416 KiB  
Article
Twi Machine Translation
by Frederick Gyasi and Tim Schlippe
Big Data Cogn. Comput. 2023, 7(2), 114; https://doi.org/10.3390/bdcc7020114 - 08 Jun 2023
Cited by 1 | Viewed by 2090
Abstract
French is a strategically and economically important language in the regions where the African language Twi is spoken. However, only a very small proportion of Twi speakers in Ghana speak French. The development of a Twi–French parallel corpus and corresponding machine translation applications would provide various advantages, including stimulating trade and job creation, supporting the Ghanaian diaspora in French-speaking nations, assisting French-speaking tourists and immigrants seeking medical care in Ghana, and facilitating numerous downstream natural language processing tasks. Since there are hardly any machine translation systems or parallel corpora between Twi and French that cover a modern and versatile vocabulary, our goal was to extend a modern Twi–English corpus with French and develop machine translation systems between Twi and French. Consequently, in this paper, we present our Twi–French corpus of 10,708 parallel sentences. Furthermore, we describe our machine translation experiments with this corpus. We investigated direct machine translation and cascading systems that use English as a pivot language. Our best Twi–French system is a direct state-of-the-art transformer-based machine translation system that achieves a BLEU score of 0.76. Our best French–Twi system, which is a cascading system that uses English as a pivot language, results in a BLEU score of 0.81. Both systems are fine-tuned with our corpus, and our French–Twi system even slightly outperforms Google Translate on our test set by 7% relative.
(This article belongs to the Special Issue Artificial Intelligence and Natural Language Processing)

25 pages, 990 KiB  
Review
Exploring Machine Learning Models for Soil Nutrient Properties Prediction: A Systematic Review
by Olusegun Folorunso, Oluwafolake Ojo, Mutiu Busari, Muftau Adebayo, Adejumobi Joshua, Daniel Folorunso, Charles Okechukwu Ugwunna, Olufemi Olabanjo and Olusola Olabanjo
Big Data Cogn. Comput. 2023, 7(2), 113; https://doi.org/10.3390/bdcc7020113 - 08 Jun 2023
Cited by 4 | Viewed by 6928
Abstract
Agriculture is essential to a flourishing economy. Although soil is essential for sustainable food production, its quality can decline as cultivation becomes more intensive and demand increases. The importance of healthy soil cannot be overstated, as a lack of nutrients can significantly lower crop yield. Smart soil prediction and digital soil mapping offer accurate data on soil nutrient distribution needed for precision agriculture. Machine learning techniques are now driving intelligent soil prediction systems. This article provides a comprehensive analysis of the use of machine learning in predicting soil qualities. The components and qualities of soil, the prediction of soil parameters, the existing soil dataset, the soil map, the effect of soil nutrients on crop growth, as well as the soil information system, are the key subjects under inquiry. Smart agriculture, as exemplified by this study, can improve food quality and productivity.

25 pages, 5224 KiB  
Article
Expanding the Horizons of Situated Visualization: The Extended SV Model
by Nuno Cid Martins, Bernardo Marques, Paulo Dias and Beatriz Sousa Santos
Big Data Cogn. Comput. 2023, 7(2), 112; https://doi.org/10.3390/bdcc7020112 - 07 Jun 2023
Viewed by 1220
Abstract
To fully leverage the benefits of augmented and mixed reality (AR/MR) in supporting users, it is crucial to establish a consistent and well-defined situated visualization (SV) model. SV encompasses visualizations that adapt based on context, considering the relevant visualizations within their physical display environment. Recognizing the potential of SV in various domains such as collaborative tasks, situational awareness, decision-making, assistance, training, and maintenance, AR/MR is well-suited to facilitate these scenarios by providing additional data and context-driven visualization techniques. While some perspectives on the SV model have been proposed, such as space, time, place, activity, and community, a comprehensive and up-to-date systematization of the entire SV model is yet to be established. Therefore, there is a pressing need for a more comprehensive and updated description of the SV model within the AR/MR framework to foster research discussions.
(This article belongs to the Special Issue Augmented Reality, Virtual Reality, and Computer Graphics)

16 pages, 1026 KiB  
Article
Is My Pruned Model Trustworthy? PE-Score: A New CAM-Based Evaluation Metric
by Cesar G. Pachon, Diego Renza and Dora Ballesteros
Big Data Cogn. Comput. 2023, 7(2), 111; https://doi.org/10.3390/bdcc7020111 - 06 Jun 2023
Cited by 1 | Viewed by 1298
Abstract
One of the strategies adopted to compress CNN models for image classification tasks is pruning, where some elements, channels or filters of the network are discarded. Typically, pruning methods present results in terms of model performance before and after pruning (assessed by accuracy or a related parameter such as the F1-score), assuming that if the difference is less than a certain value (e.g., 2%), the pruned model is trustworthy. However, state-of-the-art models are not concerned with measuring the actual impact of pruning on the network by evaluating the pixels used by the model to make the decision, or the confidence of the class itself. Consequently, this paper presents a new metric, called the Pruning Efficiency score (PE-score), which allows us to identify whether a pruned model preserves the behavior (i.e., the extracted patterns) of the unpruned model, through visualization and interpretation with CAM-based methods. With the proposed metric, it will be possible to better compare pruning methods for CNN-based image classification models, as well as to verify whether the pruned model is efficient by focusing on the same patterns (pixels) as those of the original model, even if it has reduced the number of parameters and FLOPs.

18 pages, 8452 KiB  
Article
Comparing Reservoir Artificial and Spiking Neural Networks in Machine Fault Detection Tasks
by Vladislav Kholkin, Olga Druzhina, Valerii Vatnik, Maksim Kulagin, Timur Karimov and Denis Butusov
Big Data Cogn. Comput. 2023, 7(2), 110; https://doi.org/10.3390/bdcc7020110 - 05 Jun 2023
Cited by 2 | Viewed by 1427
Abstract
For the last two decades, artificial neural networks (ANNs) of the third generation, also known as spiking neural networks (SNNs), have remained a subject of interest for researchers. A significant difficulty for the practical application of SNNs is their poor suitability for von Neumann computer architecture, so many researchers are currently focusing on the development of alternative hardware. Nevertheless, today several experimental libraries implementing SNNs for conventional computers are available. In this paper, using the RCNet library, we compare the performance of reservoir computing architectures based on artificial and spiking neural networks. We explicitly show that, despite the higher execution time, SNNs can demonstrate outstanding classification accuracy in the case of complicated datasets, such as data from industrial sensors used for the fault detection of bearings and gears. For one of the test problems, namely, ball bearing diagnosis using an accelerometer, the accuracy of the classification using reservoir SNN almost reached 100%, while the reservoir ANN was able to achieve recognition accuracy up to only 61%. The results of the study clearly demonstrate the superiority and benefits of SNN classifiers.

20 pages, 1788 KiB  
Article
DSpamOnto: An Ontology Modelling for Domain-Specific Social Spammers in Microblogging
by Malak Al-Hassan, Bilal Abu-Salih and Ahmad Al Hwaitat
Big Data Cogn. Comput. 2023, 7(2), 109; https://doi.org/10.3390/bdcc7020109 - 02 Jun 2023
Cited by 2 | Viewed by 1425
Abstract
The lack of regulations and oversight on Online Social Networks (OSNs) has resulted in the rise of social spam, which is the dissemination of unsolicited and low-quality content that aims to deceive and manipulate users. Social spam can cause a range of negative consequences for individuals and businesses, such as the spread of malware, phishing scams, and reputational damage. While machine learning techniques can be used to detect social spammers by analysing patterns in data, they have limitations such as the potential for false positives and false negatives. In contrast, ontologies allow for the explicit modelling and representation of domain knowledge, which can be used to create a set of rules for identifying social spammers. However, the literature exposes a deficiency of ontologies that conceptualize domain-based social spam. This paper aims to address this gap by designing a domain-specific ontology called DSpamOnto to detect social spammers in microblogging that targets a specific domain. DSpamOnto can identify social spammers based on their domain-specific behaviour, such as posting repetitive or irrelevant content and using misleading information. The proposed model is compared and benchmarked against well-proven ML models using various evaluation metrics to verify and validate its utility in capturing social spammers.
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)

36 pages, 1674 KiB  
Review
Privacy-Enhancing Digital Contact Tracing with Machine Learning for Pandemic Response: A Comprehensive Review
by Ching-Nam Hang, Yi-Zhen Tsai, Pei-Duo Yu, Jiasi Chen and Chee-Wei Tan
Big Data Cogn. Comput. 2023, 7(2), 108; https://doi.org/10.3390/bdcc7020108 - 01 Jun 2023
Cited by 1 | Viewed by 3407
Abstract
The rapid global spread of the coronavirus disease (COVID-19) has severely impacted daily life worldwide. As potential solutions, various digital contact tracing (DCT) strategies have emerged to mitigate the virus’s spread while maintaining economic and social activities. The computational epidemiology problems of DCT often involve parameter optimization through learning processes, making it crucial to understand how to apply machine learning techniques for effective DCT optimization. While numerous research studies on DCT have emerged recently, most existing reviews primarily focus on DCT application design and implementation. This paper offers a comprehensive overview of privacy-preserving machine learning-based DCT in preparation for future pandemics. We propose a new taxonomy to classify existing DCT strategies into forward, backward, and proactive contact tracing. We then categorize several DCT apps developed during the COVID-19 pandemic based on their tracing strategies. Furthermore, we derive three research questions related to computational epidemiology for DCT and provide a detailed description of machine learning techniques to address these problems. We discuss the challenges of learning-based DCT and suggest potential solutions. Additionally, we include a case study demonstrating the review’s insights into the pandemic response. Finally, we summarize the study’s limitations and highlight promising future research directions in DCT.
(This article belongs to the Special Issue Digital Health and Data Analytics in Public Health)

21 pages, 722 KiB  
Article
Semantic Hierarchical Indexing for Online Video Lessons Using Natural Language Processing
by Marco Arazzi, Marco Ferretti and Antonino Nocera
Big Data Cogn. Comput. 2023, 7(2), 107; https://doi.org/10.3390/bdcc7020107 - 31 May 2023
Cited by 1 | Viewed by 1247
Abstract
Huge quantities of audio and video material are available at universities and teaching institutions, but their use can be limited because of the lack of intelligent search tools. This paper describes a possible way to set up an indexing scheme that offers a smart search modality that combines semantic analysis of video/audio transcripts with the exact time positioning of uttered words. The proposal leverages NLP methods for topic modeling with lexical analysis of lessons’ transcripts and builds a semantic hierarchical index into the corpus of lessons analyzed. Moreover, using abstractive summarization, the system can offer short summaries on the subject semantically implied by the search carried out.
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)

13 pages, 398 KiB  
Article
Adaptive KNN-Based Extended Collaborative Filtering Recommendation Services
by Luong Vuong Nguyen, Quoc-Trinh Vo and Tri-Hai Nguyen
Big Data Cogn. Comput. 2023, 7(2), 106; https://doi.org/10.3390/bdcc7020106 - 31 May 2023
Cited by 6 | Viewed by 4499
Abstract
In the current era of e-commerce, users are overwhelmed with countless products, making it difficult to find relevant items. Recommendation systems generate suggestions based on user preferences, to avoid information overload. Collaborative filtering is a widely used model in modern recommendation systems. Despite its popularity, collaborative filtering has limitations that researchers aim to overcome. In this paper, we enhance the K-nearest neighbor (KNN)-based collaborative filtering algorithm for a recommendation system, by considering the similarity of user cognition. This enhancement aimed to improve the accuracy in grouping users and generating more relevant recommendations for the active user. The experimental results showed that the proposed model outperformed benchmark models, in terms of MAE, RMSE, MAP, and NDCG metrics.
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems 2nd Edition)
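As an illustration of the baseline that the abstract above builds on, the following is a minimal sketch of plain user-based KNN collaborative filtering with cosine similarity. It is not the authors' method: the cognition-based similarity they propose is not described in the abstract, and the function names, the rating dictionary layout, and the choice of cosine similarity are all assumptions made for this example.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two users' rating dicts (item -> rating)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item, k=2):
    """Predict a rating as the similarity-weighted mean over the k nearest
    neighbours who rated the item. Returns None if no neighbour is usable."""
    candidates = [
        (cosine(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    top = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:k]
    weight_sum = sum(sim for sim, _ in top)
    if weight_sum <= 0:
        return None
    return sum(sim * rating for sim, rating in top) / weight_sum


# Hypothetical toy data: user "a" has not rated item "z".
ratings = {
    "a": {"x": 5, "y": 3},
    "b": {"x": 5, "y": 3, "z": 4},
    "c": {"x": 1, "y": 5, "z": 2},
}
estimate = predict(ratings, "a", "z", k=2)
```

In this toy data, user "b" rates the shared items exactly as "a" does, so the estimate is pulled toward the rating that "b" gave to "z". An enhanced scheme such as the one in the paper would replace `cosine` with a different similarity measure.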

17 pages, 617 KiB  
Article
Breaking Barriers: Unveiling Factors Influencing the Adoption of Artificial Intelligence by Healthcare Providers
by BM Zeeshan Hameed, Nithesh Naik, Sufyan Ibrahim, Nisha S. Tatkar, Milap J. Shah, Dharini Prasad, Prithvi Hegde, Piotr Chlosta, Bhavan Prasad Rai and Bhaskar K Somani
Big Data Cogn. Comput. 2023, 7(2), 105; https://doi.org/10.3390/bdcc7020105 - 30 May 2023
Cited by 6 | Viewed by 2731
Abstract
Artificial intelligence (AI) is an emerging technological system that provides a platform to manage and analyze data by emulating human cognitive functions with greater accuracy, revolutionizing patient care and introducing a paradigm shift to the healthcare industry. The purpose of this study is to identify the underlying factors that affect the adoption of artificial intelligence in healthcare (AIH) by healthcare providers and to understand “What are the factors that influence healthcare providers’ behavioral intentions to adopt AIH in their routine practice?” An integrated survey was conducted among healthcare providers, including consultants, residents/students, and nurses. The survey included items related to performance expectancy, effort expectancy, initial trust, personal innovativeness, task complexity, and technology characteristics. The collected data were analyzed using structural equation modeling. A total of 392 healthcare professionals participated in the survey, with 72.4% being male and 50.7% being 30 years old or younger. The results showed that performance expectancy, effort expectancy, and initial trust have a positive influence on the behavioral intentions of healthcare providers to use AIH. Personal innovativeness was found to have a positive influence on effort expectancy, while task complexity and technology characteristics have a positive influence on effort expectancy for AIH. The study’s empirically validated model sheds light on healthcare providers’ intention to adopt AIH, while the study’s findings can be used to develop strategies to encourage this adoption. However, further investigation is necessary to understand the individual factors affecting the adoption of AIH by healthcare providers. Full article
(This article belongs to the Special Issue Deep Network Learning and Its Applications)

4 pages, 185 KiB  
Editorial
Perspectives on Big Data, Cloud-Based Data Analysis and Machine Learning Systems
by Fabrizio Marozzo and Domenico Talia
Big Data Cogn. Comput. 2023, 7(2), 104; https://doi.org/10.3390/bdcc7020104 - 30 May 2023
Cited by 1 | Viewed by 1694
Abstract
Huge amounts of digital data are continuously generated and collected from different sources, such as sensors, cameras, in-vehicle infotainment, smart meters, mobile devices, social media platforms, and web applications and services [...] Full article
27 pages, 11455 KiB  
Article
On-Shore Plastic Waste Detection with YOLOv5 and RGB-Near-Infrared Fusion: A State-of-the-Art Solution for Accurate and Efficient Environmental Monitoring
by Owen Tamin, Ervin Gubin Moung, Jamal Ahmad Dargham, Farashazillah Yahya, Ali Farzamnia, Florence Sia, Nur Faraha Mohd Naim and Lorita Angeline
Big Data Cogn. Comput. 2023, 7(2), 103; https://doi.org/10.3390/bdcc7020103 - 29 May 2023
Cited by 1 | Viewed by 2525
Abstract
Plastic waste is a growing environmental concern that poses a significant threat to onshore ecosystems, human health, and wildlife. The accumulation of plastic waste in oceans has reached a staggering estimate of over eight million tons annually, leading to hazardous outcomes in marine life and the food chain. Plastic waste is prevalent in urban areas, posing risks to animals that may ingest it or become entangled in it, and negatively impacting the economy and tourism industry. Effective plastic waste management requires a comprehensive approach that includes reducing consumption, promoting recycling, and developing innovative technologies such as automated plastic detection systems. The development of accurate and efficient plastic detection methods is therefore essential for effective waste management. To address this challenge, machine learning techniques such as the YOLOv5 model have emerged as promising tools for developing automated plastic detection systems. Furthermore, there is a need to study both visible light (RGB) and near-infrared (RGNIR) as part of plastic waste detection due to the unique properties of plastic waste in different environmental settings. To this end, two plastic waste datasets, comprising RGB and RGNIR images, were utilized to train the proposed model, YOLOv5m. The performance of the model was then evaluated using a 10-fold cross-validation method on both datasets. The experiment was extended by adding background images into the training dataset to reduce false positives. An additional experiment was carried out to fuse both the RGB and RGNIR datasets. A performance-metric score called the Weighted Metric Score (WMS) was proposed, where the WMS equaled the sum of the mean average precision at the intersection over union (IoU) threshold of 0.5 (mAP@0.5) × 0.1 and the mean average precision averaged over different IoU thresholds ranging from 0.5 to 0.95 (mAP@0.5:0.95) × 0.9. 
In addition, a 10-fold cross-validation procedure was implemented. Based on the results, the proposed model achieved the best performance using the fusion of the RGB and RGNIR datasets when evaluated on the testing dataset with a mean of mAP@0.5, mAP@0.5:0.95, and a WMS of 92.96% ± 2.63%, 69.47% ± 3.11%, and 71.82% ± 3.04%, respectively. These findings indicate that utilizing both normal visible light and the near-infrared spectrum as feature representations in machine learning could lead to improved performance in plastic waste detection. This opens new opportunities in the development of automated plastic detection systems for use in fields such as automation, environmental management, and resource management. Full article
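The Weighted Metric Score defined in the abstract above can be checked with a short calculation. The sketch below follows the stated weights (0.1 for mAP@0.5 and 0.9 for mAP@0.5:0.95); the function and parameter names are assumptions for illustration and are not taken from the paper.

```python
def weighted_metric_score(map_50, map_50_95, w_50=0.1, w_50_95=0.9):
    """WMS = 0.1 * mAP@0.5 + 0.9 * mAP@0.5:0.95, as stated in the abstract."""
    return w_50 * map_50 + w_50_95 * map_50_95


# Mean values reported for the fused RGB + RGNIR dataset, in percent.
wms = weighted_metric_score(92.96, 69.47)
print(f"{wms:.2f}")  # 71.82
```

Because the score is a linear combination, applying it to the reported fold means reproduces the reported mean WMS of 71.82%. The heavy weight on mAP@0.5:0.95 means the score mostly rewards tight bounding-box localization.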

14 pages, 3229 KiB  
Article
Hand Gesture Recognition Using Automatic Feature Extraction and Deep Learning Algorithms with Memory
by Rubén E. Nogales and Marco E. Benalcázar
Big Data Cogn. Comput. 2023, 7(2), 102; https://doi.org/10.3390/bdcc7020102 - 23 May 2023
Cited by 2 | Viewed by 3999
Abstract
Gesture recognition is widely used to express emotions or to communicate with other people or machines. Hand gesture recognition is a problem of great interest to researchers because it is a high-dimensional pattern recognition problem. The high dimensionality of the problem is directly related to the performance of machine learning models. The dimensionality problem can be addressed through feature selection and feature extraction. In this sense, the evaluation of a model with manual feature extraction and automatic feature extraction was proposed. The manual feature extraction was performed using the statistical functions of central tendency, while the automatic extraction was performed by means of a CNN and BiLSTM. These features were also evaluated in classifiers such as Softmax, ANN, and SVM. The best-performing model was the combination of BiLSTM and ANN (BiLSTM-ANN), with an accuracy of 99.9912%. Full article

23 pages, 4549 KiB  
Article
An Ontology Development Methodology Based on Ontology-Driven Conceptual Modeling and Natural Language Processing: Tourism Case Study
by Shaimaa Haridy, Rasha M. Ismail, Nagwa Badr and Mohamed Hashem
Big Data Cogn. Comput. 2023, 7(2), 101; https://doi.org/10.3390/bdcc7020101 - 21 May 2023
Cited by 2 | Viewed by 2737
Abstract
Ontologies provide a powerful method for representing, reusing, and sharing domain knowledge. They are extensively used in a wide range of disciplines, including artificial intelligence, knowledge engineering, biomedical informatics, and many more. For several reasons, developing domain ontologies is a challenging task. One of these reasons is that it is a complicated and time-consuming process. Multiple ontology development methodologies have already been proposed. However, there is room for improvement in terms of covering more activities during development (such as enrichment) and enhancing others (such as conceptualization). In this research, an enhanced ontology development methodology (ON-ODM) is proposed. Ontology-driven conceptual modeling (ODCM) and natural language processing (NLP) serve as the foundation of the proposed methodology. ODCM is defined as the utilization of ontological ideas from various areas to build engineering artifacts that improve conceptual modeling. NLP refers to the scientific discipline that employs computer techniques to analyze human language. The proposed ON-ODM is applied to build a tourism ontology that will be beneficial for a variety of applications, including e-tourism. The produced ontology is evaluated based on competency questions (CQs) and quality metrics. It is verified that the ontology answers SPARQL queries covering all CQ groups specified by domain experts. Quality metrics are used to compare the produced ontology with four existing tourism ontologies. For instance, according to the metrics related to conciseness, the produced ontology received a first place ranking when compared to the others, whereas it received a second place ranking regarding understandability. These results show that utilizing ODCM and NLP could facilitate and improve the development process, respectively. Full article
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage 2nd Edition)

23 pages, 15598 KiB  
Article
Investigating the Accuracy of Autoregressive Recurrent Networks Using Hierarchical Aggregation Structure-Based Data Partitioning
by José Manuel Oliveira and Patrícia Ramos
Big Data Cogn. Comput. 2023, 7(2), 100; https://doi.org/10.3390/bdcc7020100 - 18 May 2023
Cited by 1 | Viewed by 1515
Abstract
Global models have been developed to tackle the challenge of forecasting sets of series that are related or share similarities, but they have not been developed for heterogeneous datasets. Various methods of partitioning by relatedness have been introduced to enhance the similarities of sets, resulting in improved forecasting accuracy but often at the cost of a reduced sample size, which could be harmful. To shed light on how the relatedness between series impacts the effectiveness of global models in real-world demand-forecasting problems, we perform an extensive empirical study using the M5 competition dataset. We examine cross-learning scenarios driven by the product hierarchy commonly employed in retail planning to allow global models to capture interdependencies across products and regions more effectively. Our findings show that global models outperform state-of-the-art local benchmarks by a considerable margin, indicating that they are not inherently more limited than local models and can handle unrelated time-series data effectively. The accuracy of data-partitioning approaches increases as the sizes of the data pools and the models’ complexity decrease. However, there is a trade-off between data availability and data relatedness. Smaller data pools lead to increased similarity among time series, making it easier to capture cross-product and cross-region dependencies, but this comes at the cost of a reduced sample, which may not be beneficial. Finally, it is worth noting that the successful implementation of global models for heterogeneous datasets can significantly impact forecasting practice. Full article

23 pages, 1484 KiB  
Article
Unsupervised Deep Learning for Structural Health Monitoring
by Roberto Boccagna, Maurizio Bottini, Massimo Petracca, Alessia Amelio and Guido Camata
Big Data Cogn. Comput. 2023, 7(2), 99; https://doi.org/10.3390/bdcc7020099 - 17 May 2023
Cited by 3 | Viewed by 2095
Abstract
In the last few decades, structural health monitoring has gained relevance in the context of civil engineering, and much effort has been made to automate the process of data acquisition and analysis through the use of data-driven methods. Currently, the main issues arising in automated monitoring processing regard the establishment of a robust approach that covers all intermediate steps from data acquisition to output production and interpretation. To overcome this limitation, we introduce a dedicated artificial-intelligence-based monitoring approach for the assessment of the health conditions of structures in near-real time. The proposed approach is based on the construction of an unsupervised deep learning algorithm, with the aim of establishing a reliable method of anomaly detection for data acquired from sensors positioned on buildings. After preprocessing, the data are fed into various types of artificial neural network autoencoders, which are trained to produce outputs as close as possible to the inputs. We tested the proposed approach on data generated from an OpenSees numerical model of a railway bridge and data acquired from physical sensors positioned on the Historical Tower of Ravenna (Italy). The results show that the approach actually flags the data produced when damage scenarios are activated in the OpenSees model as coming from a damaged structure. The proposed method is also able to reliably detect anomalous structural behaviors of the tower, preventing critical scenarios. Compared to other state-of-the-art methods for anomaly detection, the proposed approach shows very promising results. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)
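The abstract above describes autoencoders trained to reproduce their inputs, which is commonly turned into anomaly detection by thresholding the reconstruction error. The sketch below shows only that thresholding step, under stated assumptions: the mean-plus-three-standard-deviations rule, the function names, and the sample error values are all hypothetical, and the abstract does not say which decision rule the authors used.

```python
from statistics import mean, stdev


def reconstruction_error(x, x_hat):
    """Mean squared error between an input window and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(healthy_errors, n_sigma=3.0):
    """Assumed rule: flag errors above mean + n_sigma * std of healthy data."""
    return mean(healthy_errors) + n_sigma * stdev(healthy_errors)


def is_anomalous(error, threshold):
    """A window is flagged when the autoencoder reconstructs it poorly."""
    return error > threshold


# Hypothetical reconstruction errors collected on the undamaged structure.
healthy_errors = [0.010, 0.012, 0.011, 0.009, 0.013]
threshold = fit_threshold(healthy_errors)
flag = is_anomalous(0.05, threshold)
```

The idea is that a model trained only on healthy-state data reconstructs such data well, so a large error suggests the sensor readings come from a regime the model has not seen, such as a damaged structure.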

13 pages, 749 KiB  
Article
Massive Parallel Alignment of RNA-seq Reads in Serverless Computing
by Pietro Cinaglia, José Luis Vázquez-Poletti and Mario Cannataro
Big Data Cogn. Comput. 2023, 7(2), 98; https://doi.org/10.3390/bdcc7020098 - 15 May 2023
Cited by 3 | Viewed by 1687
Abstract
In recent years, the use of Cloud infrastructures for data processing has proven useful, with a computing potential that is not affected by the limitations of a local infrastructure. In this context, Serverless computing is the fastest-growing Cloud service model due to its auto-scaling methodologies, reliability, and fault tolerance. We present a solution based on in-house Serverless infrastructure, which is able to perform large-scale RNA-seq data analysis focused on the mapping of sequencing reads to a reference genome. The main contribution was bringing the computation of genomic data into serverless computing, focusing on RNA-seq read-mapping to a reference genome, as this is the most time-consuming task for some pipelines. The proposed solution handles massive parallel instances to maximize the efficiency in terms of running time. We evaluated the performance of our solution by performing two main tests, both based on the mapping of RNA-seq reads to Human GRCh38. Our experiments demonstrated a reduction of 79.838%, 90.079%, and 96.382%, compared to the local environments with 16, 8, and 4 virtual cores, respectively. Furthermore, serverless limitations were investigated. Full article
(This article belongs to the Special Issue Data-Based Bioinformatics and Applications)

44 pages, 4594 KiB  
Systematic Review
SQL and NoSQL Database Software Architecture Performance Analysis and Assessments—A Systematic Literature Review
by Wisal Khan, Teerath Kumar, Cheng Zhang, Kislay Raj, Arunabha M. Roy and Bin Luo
Big Data Cogn. Comput. 2023, 7(2), 97; https://doi.org/10.3390/bdcc7020097 - 12 May 2023
Cited by 13 | Viewed by 11899
Abstract
The competent software architecture plays a crucial role in the difficult task of big data processing for SQL and NoSQL databases. SQL databases were created to organize data and allow for horizontal expansion. NoSQL databases, on the other hand, support horizontal scalability and can efficiently process large amounts of unstructured data. Organizational needs determine which paradigm is appropriate, yet selecting the best option is not always easy. Differences in database design are what set SQL and NoSQL databases apart. Each NoSQL database type also consistently employs a mixed-model approach. Therefore, it is challenging for cloud users to transfer their data among different cloud storage services (CSPs). There are several different paradigms being monitored by the various cloud platforms (IaaS, PaaS, SaaS, and DBaaS). The purpose of this SLR is to examine the articles that address cloud data portability and interoperability, as well as the software architectures of SQL and NoSQL databases. Numerous studies comparing the capabilities of SQL and NoSQL of databases, particularly Oracle RDBMS and NoSQL Document Database (MongoDB), in terms of scale, performance, availability, consistency, and sharding, were presented as part of the state of the art. Research indicates that NoSQL databases, with their specifically tailored structures, may be the best option for big data analytics, while SQL databases are best suited for online transaction processing (OLTP) purposes. Full article

17 pages, 2423 KiB  
Article
Design Proposal for a Virtual Shopping Assistant for People with Vision Problems Applying Artificial Intelligence Techniques
by William Villegas-Ch, Rodrigo Amores-Falconi and Eduardo Coronel-Silva
Big Data Cogn. Comput. 2023, 7(2), 96; https://doi.org/10.3390/bdcc7020096 - 12 May 2023
Cited by 4 | Viewed by 3555
Abstract
Accessibility is an increasingly important topic for Ecommerce, especially for individuals with vision problems. To improve their online experience, the design of a voice assistant has been proposed to allow these individuals to browse and shop online more quickly and efficiently. This voice assistant forms an intelligent system that can understand and respond to users’ voice commands. The design considers the visual limitations of the users, such as difficulty reading information on the screen or identifying images. The voice assistant provides detailed product descriptions and ideas in a clear, easy-to-understand voice. In addition, the voice assistant has a series of additional features to improve the shopping experience. For example, the assistant can provide product recommendations based on the user’s previous purchases and information about special promotions and discounts. The main goal of this design is to create an accessible and inclusive online shopping experience for the visually impaired. The voice assistant is based on a conversational user interface, allowing users to easily navigate an eCommerce website, search for products, and make purchases. Full article
(This article belongs to the Special Issue Speech Recognition and Machine Learning: Current Trends and Future)

17 pages, 2065 KiB  
Article
Virtual Reality-Based Digital Twins: A Case Study on Pharmaceutical Cannabis
by Orestis Spyrou, William Hurst and Cor Verdouw
Big Data Cogn. Comput. 2023, 7(2), 95; https://doi.org/10.3390/bdcc7020095 - 10 May 2023
Cited by 1 | Viewed by 2935
Abstract
Digital Twins are digital equivalents of real-life objects. They allow producers to act immediately in case of (expected) deviations and to simulate effects of interventions based on real-life data. Digital Twin and eXtended Reality technologies (including Augmented Reality, Mixed Reality and Virtual Reality technologies), when coupled, are promising solutions to address the challenges of highly regulated crop production, namely the complexity of modern production environments for pharmaceutical cannabis, which are growing constantly as a result of legislative changes. Cannabis farms not only have to meet very high quality standards and regulatory requirements but also have to deal with high production and market uncertainties, including energy considerations. Thus, the main contributions of the research include an architecture design for eXtended-Reality-based Digital Twins for pharmaceutical cannabis production and a proof of concept, which was demonstrated at the Wageningen University Digital Twins conference. A convenience sampling method was used to recruit 30 participants who provided feedback on the application. The findings indicate that, despite 70% being unfamiliar with the concept, 80% of the participants were positive regarding the innovation and creativity. Full article
(This article belongs to the Special Issue Digital Twins for Complex Systems)

20 pages, 6486 KiB  
Article
Recognizing Similar Musical Instruments with YOLO Models
by Christine Dewi, Abbott Po Shun Chen and Henoch Juli Christanto
Big Data Cogn. Comput. 2023, 7(2), 94; https://doi.org/10.3390/bdcc7020094 - 10 May 2023
Cited by 6 | Viewed by 2500
Abstract
Researchers in the fields of machine learning and artificial intelligence have recently begun to focus their attention on object recognition. One of the biggest obstacles in image recognition through computer vision is the detection and identification of similar items. Identifying similar musical instruments can be approached as a classification problem, where the goal is to train a machine learning model to classify instruments based on their features and shape. Cellos, clarinets, erhus, guitars, saxophones, trumpets, French horns, harps, recorders, bassoons, and violins were all classified in this investigation. There are many different musical instruments that have the same size, shape, and sound. In addition, we were amazed by the simplicity with which humans can identify items that are very similar to one another, but this is a challenging task for computers. For this study, we used YOLOv7 to identify pairs of musical instruments that are most like one another. Next, we compared and evaluated the results from YOLOv7 with those from YOLOv5. Furthermore, the results of our tests allowed us to enhance the performance in terms of detecting similar musical instruments. Moreover, with an average accuracy of 86.7%, YOLOv7 outperformed previous approaches and other research results. Full article
(This article belongs to the Special Issue Computational Collective Intelligence with Big Data–AI Society)

19 pages, 3116 KiB  
Article
Application of Artificial Intelligence for Fraudulent Banking Operations Recognition
by Bohdan Mytnyk, Oleksandr Tkachyk, Nataliya Shakhovska, Solomiia Fedushko and Yuriy Syerov
Big Data Cogn. Comput. 2023, 7(2), 93; https://doi.org/10.3390/bdcc7020093 - 10 May 2023
Cited by 5 | Viewed by 5971
Abstract
This study considers the task of applying artificial intelligence to recognize bank fraud. In recent years, due to the COVID-19 pandemic, bank fraud has become even more common due to the massive transition of many operations to online platforms and the creation of many charitable funds that criminals can use to deceive users. The present work focuses on machine learning algorithms as a tool well suited for analyzing and recognizing online banking transactions. The study’s scientific novelty is the development of machine learning models for identifying fraudulent banking transactions and techniques for preprocessing bank data for further comparison and selection of the best results. This paper also details various methods for improving detection accuracy, i.e., handling highly imbalanced datasets, feature transformation, and feature engineering. The proposed model, which is based on an artificial neural network, effectively improves the accuracy of fraudulent transaction detection. The results of the different algorithms are visualized, and the logistic regression algorithm performs the best, with an output AUC value of approximately 0.946. The stacked generalization shows a better AUC of 0.954. The recognition of banking fraud using artificial intelligence algorithms is a topical issue in our digital society. Full article
(This article belongs to the Special Issue Quality and Security of Critical Infrastructure Systems)

15 pages, 2317 KiB  
Article
An Improved Pattern Sequence-Based Energy Load Forecast Algorithm Based on Self-Organizing Maps and Artificial Neural Networks
by D. Criado-Ramón, L. G. B. Ruiz and M. C. Pegalajar
Big Data Cogn. Comput. 2023, 7(2), 92; https://doi.org/10.3390/bdcc7020092 - 10 May 2023
Cited by 3 | Viewed by 1611
Abstract
Pattern sequence-based models are a type of forecasting algorithm that utilizes clustering and other techniques to produce easily interpretable predictions faster than traditional machine learning models. This research focuses on their application in energy demand forecasting and introduces two significant contributions to the field. Firstly, this study evaluates the use of pattern sequence-based models with large datasets. Unlike previous works that use only one dataset or multiple datasets with less than two years of data, this work evaluates the models in three different public datasets, each containing eleven years of data. Secondly, we propose a new pattern sequence-based algorithm that uses a genetic algorithm to optimize the number of clusters alongside all other hyperparameters of the forecasting method, instead of using the Cluster Validity Indices (CVIs) commonly used in previous proposals. The results indicate that neural networks provide more accurate results than any pattern sequence-based algorithm and that our proposed algorithm outperforms other pattern sequence-based algorithms, albeit with a longer training time. Full article

21 pages, 7442 KiB  
Article
Predicting Cell Cleavage Timings from Time-Lapse Videos of Human Embryos
by Akriti Sharma, Ayaz Z. Ansari, Radhika Kakulavarapu, Mette H. Stensen, Michael A. Riegler and Hugo L. Hammer
Big Data Cogn. Comput. 2023, 7(2), 91; https://doi.org/10.3390/bdcc7020091 - 09 May 2023
Viewed by 3064
Abstract
Assisted reproductive technology is used for treating infertility, and its success relies on the quality and viability of embryos chosen for uterine transfer. Currently, embryologists manually assess embryo development, including the time duration between the cell cleavages. This paper introduces a machine learning methodology for automating the computations for the start of cell cleavage stages, in hours post insemination, in time-lapse videos. The methodology detects embryo cells in video frames and predicts the frame with the onset of the cell cleavage stage. Next, the methodology reads hours post insemination from the frame using optical character recognition. Unlike traditional embryo cell detection techniques, our suggested approach eliminates the need for extra image processing tasks such as locating embryos or removing extracellular material (fragmentation). The methodology accurately predicts cell cleavage stages up to five cells. The methodology was also able to detect the morphological structures of later cell cleavage stages, such as morula and blastocyst. It takes about one minute for the methodology to annotate the times of all the cell cleavages in a time-lapse video. Full article
(This article belongs to the Special Issue Multimedia Systems for Multimedia Big Data)