Big Data and Cognitive Computing

23 pages, 1380 KiB

Open AccessArticle

Enhancing Self-Supervised Learning through Explainable Artificial Intelligence Mechanisms: A Computational Analysis

by Elie Neghawi and Yan Liu

Big Data Cogn. Comput. 2024, 8(6), 58; https://doi.org/10.3390/bdcc8060058 (registering DOI) - 3 Jun 2024

Self-supervised learning continues to drive advancements in machine learning. However, the absence of unified computational processes for benchmarking and evaluation remains a challenge. This study conducts a comprehensive analysis of state-of-the-art self-supervised learning algorithms, emphasizing their underlying mechanisms and computational intricacies. Building upon [...] Read more.

Self-supervised learning continues to drive advancements in machine learning. However, the absence of unified computational processes for benchmarking and evaluation remains a challenge. This study conducts a comprehensive analysis of state-of-the-art self-supervised learning algorithms, emphasizing their underlying mechanisms and computational intricacies. Building upon this analysis, we introduce a unified model-agnostic computation (UMAC) process, tailored to complement modern self-supervised learning algorithms. UMAC serves as a model-agnostic and global explainable artificial intelligence (XAI) methodology that is capable of systematically integrating and enhancing state-of-the-art algorithms. Through UMAC, we identify key computational mechanisms and craft a unified framework for self-supervised learning evaluation. Leveraging UMAC, we integrate an XAI methodology to enhance transparency and interpretability. Our systematic approach yields a 17.12% increase in improvement in training time complexity and a 13.1% boost in improvement in testing time complexity. Notably, improvements are observed in augmentation, encoder architecture, and auxiliary components within the network classifier. These findings underscore the importance of structured computational processes in enhancing model efficiency and fortifying algorithmic transparency in self-supervised learning, paving the way for more interpretable and efficient AI models. Full article

► Show Figures

Figure 1

16 pages, 4754 KiB

Open AccessArticle

Dynamic Electrocardiogram Signal Quality Assessment Method Based on Convolutional Neural Network and Long Short-Term Memory Network

by Chen He, Yuxuan Wei, Yeru Wei, Qiang Liu and Xiang An

Big Data Cogn. Comput. 2024, 8(6), 57; https://doi.org/10.3390/bdcc8060057 - 31 May 2024

Abstract

Cardiovascular diseases (CVDs) are highly prevalent, sudden onset, and relatively fatal, posing a significant public health burden. Long-term dynamic electrocardiography, which can continuously record the long-term dynamic ECG activities of individuals in their daily lives, has high research value. However, ECG signals are [...] Read more.

Cardiovascular diseases (CVDs) are highly prevalent, sudden onset, and relatively fatal, posing a significant public health burden. Long-term dynamic electrocardiography, which can continuously record the long-term dynamic ECG activities of individuals in their daily lives, has high research value. However, ECG signals are weak and highly susceptible to external interference, which may lead to false alarms and misdiagnosis, affecting the diagnostic efficiency and the utilization rate of healthcare resources, so research on the quality of dynamic ECG signals is extremely necessary. Aimed at the above problems, this paper proposes a dynamic ECG signal quality assessment method based on CNN and LSTM that divides the signal into three quality categories: the signal of the Q1 category has a lower noise level, which can be used for reliable diagnosis of arrhythmia, etc.; the signal of the Q2 category has a higher noise level, but it still contains information that can be used for heart rate calculation, HRV analysis, etc.; and the signal of the Q3 category has a higher noise level that can interfere with the diagnosis of cardiovascular disease and should be discarded or labeled. In this paper, we use the widely recognized MIT-BIH database, based on which the model is applied to realistically collect exercise experimental data to assess the performance of the model in dealing with real-world situations. The model achieves an accuracy of 98.65% on the test set, a macro-averaged F1 score of 98.5%, and a high F1 score of 99.71% for the prediction of Q3 category signals, which shows that the model has good accuracy and generalization performance. Full article

► Show Figures

Figure 1

18 pages, 4725 KiB

Open AccessArticle

Stock Trend Prediction with Machine Learning: Incorporating Inter-Stock Correlation Information through Laplacian Matrix

by Wenxuan Zhang and Benzhuo Lu

Big Data Cogn. Comput. 2024, 8(6), 56; https://doi.org/10.3390/bdcc8060056 - 30 May 2024

Abstract

Predicting stock trends in financial markets is of significant importance to investors and portfolio managers. In addition to a stock’s historical price information, the correlation between that stock and others can also provide valuable information for forecasting future returns. Existing methods often fall [...] Read more.

Predicting stock trends in financial markets is of significant importance to investors and portfolio managers. In addition to a stock’s historical price information, the correlation between that stock and others can also provide valuable information for forecasting future returns. Existing methods often fall short of straightforward and effective capture of the intricate interdependencies between stocks. In this research, we introduce the concept of a Laplacian correlation graph (LOG), designed to explicitly model the correlations in stock price changes as the edges of a graph. After constructing the LOG, we will build a machine learning model, such as a graph attention network (GAT), and incorporate the LOG into the loss term. This innovative loss term is designed to empower the neural network to learn and leverage price correlations among different stocks in a straightforward but effective manner. The advantage of a Laplacian matrix is that matrix operation form is more suitable for current machine learning frameworks, thus achieving high computational efficiency and simpler model representation. Experimental results demonstrate improvements across multiple evaluation metrics using our LOG. Incorporating our LOG into five base machine learning models consistently enhances their predictive performance. Furthermore, backtesting results reveal superior returns and information ratios, underscoring the practical implications of our approach for real-world investment decisions. Our study addresses the limitations of existing methods that miss the correlation between stocks or fail to model correlation in a simple and effective way, and the proposed LOG emerges as a promising tool for stock returns prediction, offering enhanced predictive accuracy and improved investment outcomes. Full article

(This article belongs to the Special Issue Big Data Analytics and Edge Computing: Recent Trends and Future)

► Show Figures

Figure 1

25 pages, 8282 KiB

Open AccessArticle

A Secure Data Publishing and Access Service for Sensitive Data from Living Labs: Enabling Collaboration with External Researchers via Shareable Data

by Mikel Hernandez, Evdokimos Konstantinidis, Gorka Epelde, Francisco Londoño, Despoina Petsani, Michalis Timoleon, Vasiliki Fiska, Lampros Mpaltadoros, Christoniki Maga-Nteve, Ilias Machairas and Panagiotis D. Bamidis

Big Data Cogn. Comput. 2024, 8(6), 55; https://doi.org/10.3390/bdcc8060055 - 28 May 2024

Viewed by 260

Abstract

Intending to enable a broader collaboration with the scientific community while maintaining privacy of the data stored and generated in Living Labs, this paper presents the Shareable Data Publishing and Access Service for Living Labs, implemented within the framework of the H2020 VITALISE [...] Read more.

Intending to enable a broader collaboration with the scientific community while maintaining privacy of the data stored and generated in Living Labs, this paper presents the Shareable Data Publishing and Access Service for Living Labs, implemented within the framework of the H2020 VITALISE project. Building upon previous work, significant enhancements and improvements are presented in the architecture enabling Living Labs to securely publish collected data in an internal and isolated node for external use. External researchers can access a portal to discover and download shareable data versions (anonymised or synthetic data) derived from the data stored across different Living Labs that they can use to develop, test, and debug their processing scripts locally, adhering to legal and ethical data handling practices. Subsequently, they may request remote execution of the same algorithms against the real internal data in Living Lab nodes, comparing the outcomes with those obtained using shareable data. The paper details the architecture, data flows, technical details and validation of the service with real-world usage examples, demonstrating its efficacy in promoting data-driven research in digital health while preserving privacy. The presented service can be used as an intermediary between Living Labs and external researchers for secure data exchange and to accelerate research on data analytics paradigms in digital health, ensuring compliance with data protection laws. Full article

(This article belongs to the Special Issue Privacy-Enhancing Technologies of Data for Sustainable and Secure Cooperation)

► Show Figures

Figure 1

20 pages, 70388 KiB

Open AccessArticle

Analyzing the Attractiveness of Food Images Using an Ensemble of Deep Learning Models Trained via Social Media Images

by Tanyaboon Morinaga, Karn Patanukhom and Yuthapong Somchit

Big Data Cogn. Comput. 2024, 8(6), 54; https://doi.org/10.3390/bdcc8060054 - 27 May 2024

Viewed by 189

Abstract

With the growth of digital media and social networks, sharing visual content has become common in people’s daily lives. In the food industry, visually appealing food images can attract attention, drive engagement, and influence consumer behavior. Therefore, it is crucial for businesses to [...] Read more.

With the growth of digital media and social networks, sharing visual content has become common in people’s daily lives. In the food industry, visually appealing food images can attract attention, drive engagement, and influence consumer behavior. Therefore, it is crucial for businesses to understand what constitutes attractive food images. Assessing the attractiveness of food images poses significant challenges due to the lack of large labeled datasets that align with diverse public preferences. Additionally, it is challenging for computer assessments to approach human judgment in evaluating aesthetic quality. This paper presents a novel framework that circumvents the need for explicit human annotation by leveraging user engagement data that are readily available on social media platforms. We propose procedures to collect, filter, and automatically label the attractiveness classes of food images based on their user engagement levels. The data gathered from social media are used to create predictive models for category-specific attractiveness assessments. Our experiments across five food categories demonstrate the efficiency of our approach. The experimental results show that our proposed user-engagement-based attractiveness class labeling achieves a high consistency of 97.2% compared to human judgments obtained through A/B testing. Separate attractiveness assessment models were created for each food category using convolutional neural networks (CNNs). When analyzing unseen food images, our models achieve a consistency of 76.0% compared to human judgments. The experimental results suggest that the food image dataset collected from social networks, using the proposed framework, can be successfully utilized for learning food attractiveness assessment models. Full article

(This article belongs to the Special Issue Advances and Applications of Deep Learning Methods and Image Processing)

► Show Figures

Figure 1

23 pages, 2866 KiB

Open AccessArticle

Exploiting Rating Prediction Certainty for Recommendation Formulation in Collaborative Filtering

by Dionisis Margaris, Kiriakos Sgardelis, Dimitris Spiliotopoulos and Costas Vassilakis

Big Data Cogn. Comput. 2024, 8(6), 53; https://doi.org/10.3390/bdcc8060053 - 27 May 2024

Viewed by 263

Abstract

Collaborative filtering is a popular recommender system (RecSys) method that produces rating prediction values for products by combining the ratings that close users have already given to the same products. Afterwards, the products that achieve the highest prediction values are recommended to the [...] Read more.

Collaborative filtering is a popular recommender system (RecSys) method that produces rating prediction values for products by combining the ratings that close users have already given to the same products. Afterwards, the products that achieve the highest prediction values are recommended to the user. However, as expected, prediction estimation may contain errors, which, in the case of RecSys, will lead to either not recommending a product that the user would actually like (i.e., purchase, watch, or listen) or to recommending a product that the user would not like, with both cases leading to degraded recommendation quality. Especially in the latter case, the RecSys would be deemed unreliable. In this work, we design and develop a recommendation algorithm that considers both the rating prediction values and the prediction confidence, derived from features associated with rating prediction accuracy in collaborative filtering. The presented algorithm is based on the rationale that it is preferable to recommend an item with a slightly lower prediction value, if that prediction seems to be certain and safe, over another that has a higher value but of lower certainty. The proposed algorithm prevents low-confidence rating predictions from being included in recommendations, ensuring the recommendation quality and reliability of the RecSys. Full article

(This article belongs to the Special Issue Business Intelligence and Big Data in E-commerce)

► Show Figures

Figure 1

14 pages, 4246 KiB

Open AccessArticle

Image-Based Leaf Disease Recognition Using Transfer Deep Learning with a Novel Versatile Optimization Module

by Petra Radočaj, Dorijan Radočaj and Goran Martinović

Big Data Cogn. Comput. 2024, 8(6), 52; https://doi.org/10.3390/bdcc8060052 - 23 May 2024

Viewed by 382

Abstract

Due to the projected increase in food production by 70% in 2050, crops should be additionally protected from diseases and pests to ensure a sufficient food supply. Transfer deep learning approaches provide a more efficient solution than traditional methods, which are labor-intensive and [...] Read more.

Due to the projected increase in food production by 70% in 2050, crops should be additionally protected from diseases and pests to ensure a sufficient food supply. Transfer deep learning approaches provide a more efficient solution than traditional methods, which are labor-intensive and struggle to effectively monitor large areas, leading to delayed disease detection. This study proposed a versatile module based on the Inception module, Mish activation function, and Batch normalization (IncMB) as a part of deep neural networks. A convolutional neural network (CNN) with transfer learning was used as the base for evaluated approaches for tomato disease detection: (1) CNNs, (2) CNNs with a support vector machine (SVM), and (3) CNNs with the proposed IncMB module. In the experiment, the public dataset PlantVillage was used, containing images of six different tomato leaf diseases. The best results were achieved by the pre-trained InceptionV3 network, which contains an IncMB module with an accuracy of 97.78%. In three out of four cases, the highest accuracy was achieved by networks containing the proposed IncMB module in comparison to evaluated CNNs. The proposed IncMB module represented an improvement in the early detection of plant diseases, providing a basis for timely leaf disease detection. Full article

(This article belongs to the Topic Big Data and Artificial Intelligence, 2nd Volume)

► Show Figures

Figure 1

20 pages, 4936 KiB

Open AccessArticle

Development of Context-Based Sentiment Classification for Intelligent Stock Market Prediction

by Nurmaganbet Smatov, Ruslan Kalashnikov and Amandyk Kartbayev

Big Data Cogn. Comput. 2024, 8(6), 51; https://doi.org/10.3390/bdcc8060051 - 22 May 2024

Viewed by 385

Abstract

This paper presents a novel approach to sentiment analysis specifically customized for predicting stock market movements, bypassing the need for external dictionaries that are often unavailable for many languages. Our methodology directly analyzes textual data, with a particular focus on context-specific sentiment words [...] Read more.

This paper presents a novel approach to sentiment analysis specifically customized for predicting stock market movements, bypassing the need for external dictionaries that are often unavailable for many languages. Our methodology directly analyzes textual data, with a particular focus on context-specific sentiment words within neural network models. This specificity ensures that our sentiment analysis is both relevant and accurate in identifying trends in the stock market. We employ sophisticated mathematical modeling techniques to enhance both the precision and interpretability of our models. Through meticulous data handling and advanced machine learning methods, we leverage large datasets from Twitter and financial markets to examine the impact of social media sentiment on financial trends. We achieved an accuracy exceeding 75%, highlighting the effectiveness of our modeling approach, which we further refined into a convolutional neural network model. This achievement contributes valuable insights into sentiment analysis within the financial domain, thereby improving the overall clarity of forecasting in this field. Full article

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)

► Show Figures

Figure 1

Journal Menu

Journal Browser

Big Data Cogn. Comput., Volume 8, Issue 6 (June 2024) – 8 articles

Further Information

Guidelines

MDPI Initiatives

Follow MDPI