Editor’s Choice Articles

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.

24 pages, 7573 KiB  
Article
Robust Multi-Mode Synchronization of Chaotic Fractional Order Systems in the Presence of Disturbance, Time Delay and Uncertainty with Application in Secure Communications
by Ali Akbar Kekha Javan, Assef Zare, Roohallah Alizadehsani and Saeed Balochian
Big Data Cogn. Comput. 2022, 6(2), 51; https://doi.org/10.3390/bdcc6020051 - 08 May 2022
Cited by 3 | Viewed by 2323
Abstract
This paper investigates the robust adaptive synchronization of multi-mode fractional-order chaotic systems (MMFOCS). To that end, synchronization was performed with unknown parameters, unknown time delays, the presence of disturbance, and uncertainty with an unknown boundary. The convergence of the synchronization error to zero was guaranteed using a Lyapunov function, and the control rules were extracted as explicit continuous functions. An image encryption approach based on maps with time-dependent coding was proposed for secure communication. The simulations indicated the effectiveness of the proposed design regarding the suitability of the parameters, the convergence of errors, and robustness. Subsequently, the presented method was applied to fractional-order Chen systems, and different benchmark images were encrypted using chaotic masking. The results indicated the desirable performance of the proposed method in encrypting the benchmark images. Full article
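As a loose illustration of the chaotic-masking step described in this abstract, the sketch below XORs message bytes with a keystream generated by a simple logistic map. The paper itself uses fractional-order Chen systems and a far more elaborate time-dependent coding scheme, so the map, its parameters, and the quantization here are assumptions for exposition only.

```python
# Illustrative chaotic masking: a logistic map stands in for the paper's
# fractional-order Chen system (an assumption made for brevity).

def logistic_keystream(x0: float, r: float, n: int) -> list:
    """Generate n keystream bytes by iterating the logistic map."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)           # chaotic iteration, stays in (0, 1)
        out.append(int(x * 256) % 256)  # quantize the state to one byte
    return out

def mask(data: bytes, x0: float = 0.3141, r: float = 3.9999) -> bytes:
    """XOR data with the chaotic keystream; masking and unmasking coincide."""
    ks = logistic_keystream(x0, r, len(data))
    return bytes(b ^ k for b, k in zip(data, ks))

cipher = mask(b"secret image bytes")
assert mask(cipher) == b"secret image bytes"  # same key recovers the plaintext
```

Because XOR with the same keystream is an involution, the receiver only needs the shared initial condition and parameter (the "key") to unmask.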

32 pages, 5511 KiB  
Article
Gender Stereotypes in Hollywood Movies and Their Evolution over Time: Insights from Network Analysis
by Arjun M. Kumar, Jasmine Y. Q. Goh, Tiffany H. H. Tan and Cynthia S. Q. Siew
Big Data Cogn. Comput. 2022, 6(2), 50; https://doi.org/10.3390/bdcc6020050 - 06 May 2022
Cited by 3 | Viewed by 42999
Abstract
The present analysis of more than 180,000 sentences from movie plots across the period from 1940 to 2019 emphasizes how gender stereotypes are expressed through the cultural products of society. By applying a network analysis to the word co-occurrence networks of movie plots and using a novel method of identifying story tropes, we demonstrate that gender stereotypes exist in Hollywood movies. An analysis of specific paths in the network and of the words reflecting various domains shows the dynamic changes in some of these stereotypical associations. Our results suggest that gender stereotypes are complex and dynamic in nature. Specifically, whereas male characters appear to be associated with a diversity of themes in movies, female characters seem predominantly associated with the theme of romance. Although associations of female characters to physical beauty and marriage are declining over time, associations of female characters to sexual relationships and weddings are increasing. Our results demonstrate how the application of cognitive network science methods can enable a more nuanced investigation of gender stereotypes in textual data. Full article
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)
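To make the word co-occurrence network idea concrete, here is a minimal stdlib sketch of the underlying data structure: nodes are words, and an edge's weight counts how often two words appear in the same sentence. The paper's actual pipeline (180,000+ plot sentences, trope identification) is far richer; the toy plot sentences below are invented.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(sentences):
    """Map each unordered word pair to the number of sentences containing both."""
    edges = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))  # unique words per sentence
        for pair in combinations(words, 2):            # each pair counted once
            edges[pair] += 1
    return edges

plots = [
    "She falls in love",
    "He leads the army",
    "She plans the wedding in love",
]
edges = cooccurrence_edges(plots)  # e.g. ("in", "love") co-occurs twice
```

The resulting weighted edge list can be loaded into any network-analysis library to examine paths between character words and theme words.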

19 pages, 3173 KiB  
Article
A Comparative Study of MongoDB and Document-Based MySQL for Big Data Application Data Management
by Cornelia A. Győrödi, Diana V. Dumşe-Burescu, Doina R. Zmaranda and Robert Ş. Győrödi
Big Data Cogn. Comput. 2022, 6(2), 49; https://doi.org/10.3390/bdcc6020049 - 05 May 2022
Cited by 7 | Viewed by 10160
Abstract
In the context of the heavy demands of Big Data, software developers have also begun to consider NoSQL data storage solutions. One of the important criteria when choosing a NoSQL database for an application is its performance in terms of speed of data accessing and processing, including response times to the most important CRUD operations (CREATE, READ, UPDATE, DELETE). In this paper, the behavior of two of the major document-based NoSQL databases, MongoDB and document-based MySQL, was analyzed in terms of the complexity and performance of CRUD operations, especially query operations. The main objective of the paper is to make a comparative analysis of the impact that each specific database has on application performance when realizing CRUD requests. To perform this analysis, a case-study application was developed using the two databases; it aims to model and streamline the activity of service providers that handle large amounts of data. The results obtained demonstrate the performance of both databases for different volumes of data; based on these, a detailed analysis and several conclusions were presented to support a decision for choosing an appropriate solution that could be used in a big-data application. Full article
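The four CRUD operations being benchmarked can be illustrated with a toy in-memory document store. This is not the paper's test harness, which drives real MongoDB and MySQL Document Store instances through their drivers; it only sketches the document-model access pattern that both systems expose.

```python
# Toy in-memory document store: documents are plain dicts keyed by an
# auto-incremented id, mirroring the document-model CRUD operations timed
# in the benchmark (names and fields here are invented).

class DocumentStore:
    def __init__(self):
        self._docs, self._next_id = {}, 1

    def create(self, doc: dict) -> int:
        doc_id, self._next_id = self._next_id, self._next_id + 1
        self._docs[doc_id] = dict(doc)
        return doc_id

    def read(self, **filters):
        """Return all documents whose fields match every filter."""
        return [d for d in self._docs.values()
                if all(d.get(k) == v for k, v in filters.items())]

    def update(self, doc_id: int, changes: dict) -> None:
        self._docs[doc_id].update(changes)

    def delete(self, doc_id: int) -> None:
        del self._docs[doc_id]

store = DocumentStore()
oid = store.create({"provider": "acme", "open_orders": 3})
store.update(oid, {"open_orders": 4})
assert store.read(provider="acme")[0]["open_orders"] == 4
```

A real benchmark would issue the same four operations through `pymongo` or the MySQL X DevAPI and time them over growing data volumes.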

19 pages, 2456 KiB  
Article
A New Ontology-Based Method for Arabic Sentiment Analysis
by Safaa M. Khabour, Qasem A. Al-Radaideh and Dheya Mustafa
Big Data Cogn. Comput. 2022, 6(2), 48; https://doi.org/10.3390/bdcc6020048 - 29 Apr 2022
Cited by 8 | Viewed by 3953
Abstract
Arabic sentiment analysis is a process that aims to extract the subjective opinions of different users about different subjects, since these opinions and sentiments are used to recognize their perspectives and judgments in a particular domain. Few research studies have addressed semantic-oriented approaches for Arabic sentiment analysis based on domain ontologies and feature importance. In this paper, we built a semantic orientation approach for calculating the overall polarity of Arabic subjective texts based on a built domain ontology and an available sentiment lexicon. We used the ontology concepts to extract and weight the semantic domain features by considering their levels in the ontology tree and their frequencies in the dataset, computing the overall polarity of a given textual review based on the importance of each domain feature. For evaluation, an Arabic dataset from the hotel domain was selected to build the domain ontology and to test the proposed approach. The overall accuracy and f-measure reached 79.20% and 78.75%, respectively. Results showed that the approach outperformed other semantic orientation approaches, making it an appealing approach for Arabic sentiment analysis. Full article
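A hedged sketch of the semantic-orientation idea: weight each domain feature by its level in the ontology tree and its corpus frequency, then take a weighted average of the sentiments of the features a review mentions. The weighting formula and the hotel features below are illustrative assumptions, not the paper's exact definitions.

```python
# Each feature: (ontology_level, corpus_frequency, lexicon_sentiment in [-1, 1]).
# Deeper (more specific) concepts and more frequent features weigh more;
# these example values are invented for illustration.
hotel_features = {
    "room":    (2, 40,  0.8),
    "wifi":    (3, 10, -0.5),
    "service": (1, 50,  0.6),
}

def review_polarity(mentioned, features):
    """Weighted-average polarity of the domain features mentioned in a review."""
    max_level = max(lvl for lvl, _, _ in features.values())
    total_freq = sum(freq for _, freq, _ in features.values())
    num = den = 0.0
    for name in mentioned:
        level, freq, sentiment = features[name]
        weight = (level / max_level) * (freq / total_freq)
        num += weight * sentiment
        den += weight
    return num / den if den else 0.0

polarity = review_polarity(["room", "wifi"], hotel_features)  # net positive
```

With this scheme, a frequent top-level feature like "service" can outweigh a rarely mentioned leaf feature, which is the intuition behind ontology-based feature importance.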

28 pages, 772 KiB  
Article
Incentive Mechanisms for Smart Grid: State of the Art, Challenges, Open Issues, Future Directions
by Sweta Bhattacharya, Rajeswari Chengoden, Gautam Srivastava, Mamoun Alazab, Abdul Rehman Javed, Nancy Victor, Praveen Kumar Reddy Maddikunta and Thippa Reddy Gadekallu
Big Data Cogn. Comput. 2022, 6(2), 47; https://doi.org/10.3390/bdcc6020047 - 27 Apr 2022
Cited by 26 | Viewed by 5823
Abstract
Smart grids (SG) are electricity grids that communicate with each other, provide reliable information, and enable administrators to operate energy supplies across the country, ensuring optimized reliability and efficiency. The smart grid contains sensors that measure and transmit data to adjust the flow of electricity automatically based on supply/demand, and thus, responding to problems becomes quicker and easier. This also plays a crucial role in controlling carbon emissions, by avoiding energy losses during peak load hours and ensuring optimal energy management. The scope of big data analytics in smart grids is huge, as they collect information from raw data and derive intelligent information from the same. However, these benefits of the smart grid are dependent on the active and voluntary participation of the consumers in real-time. Consumers need to be motivated and conscious to avail themselves of the achievable benefits. Incentivizing the appropriate actor is an absolute necessity to encourage prosumers to generate renewable energy sources (RES) and motivate industries to establish plants that support sustainable and green-energy-based processes or products. The current study emphasizes these aspects and presents a comprehensive survey of the state-of-the-art contributions pertinent to incentive mechanisms in smart grids, which can be used to optimize power distribution during peak times and also reduce carbon emissions. The various technologies, such as game theory, blockchain, and artificial intelligence, used in implementing incentive mechanisms in smart grids are discussed, followed by different incentive projects being implemented across the globe. The lessons learnt, challenges faced in such implementations, and open issues such as data quality, privacy, security, and pricing related to incentive mechanisms in SG are identified to guide the future scope of research in this sector. Full article

40 pages, 14654 KiB  
Review
Deep Learning Approaches for Video Compression: A Bibliometric Analysis
by Ranjeet Vasant Bidwe, Sashikala Mishra, Shruti Patil, Kailash Shaw, Deepali Rahul Vora, Ketan Kotecha and Bhushan Zope
Big Data Cogn. Comput. 2022, 6(2), 44; https://doi.org/10.3390/bdcc6020044 - 19 Apr 2022
Cited by 32 | Viewed by 6726
Abstract
All data, of every kind, need physical storage. There has been an explosion in the volume of images, video, and other similar data types circulated over the internet. Internet users expect intelligible data, even under the pressure of multiple resource constraints such as bandwidth bottlenecks and noisy channels. Therefore, data compression is becoming a fundamental problem in wider engineering communities. There has been some related work on data compression using neural networks. Various machine learning approaches are currently applied in data compression techniques and tested to obtain better lossy and lossless compression results. A wide variety of efficient research is already available for image compression; however, this is not the case for video compression. Because of the explosion of big data and the excess use of cameras in various places globally, around 82% of the data generated involve videos. Proposed approaches have used Deep Neural Networks (DNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and various variants of Autoencoders (AEs). All newly proposed methods aim to increase performance (reducing bitrate by up to 50% at the same data quality and complexity). This paper presents a bibliometric analysis and literature survey of all Deep Learning (DL) methods used in video compression in recent years. The results retrieved from Scopus and Web of Science, two well-known research databases, are used for this analytical study. Two types of analysis are performed on the extracted documents: quantitative and qualitative. In the quantitative analysis, records are analyzed based on their citations, keywords, source of publication, and country of publication. The qualitative analysis provides information on DL-based approaches for video compression, as well as the advantages, disadvantages, and challenges of using them. Full article

25 pages, 664 KiB  
Article
New Efficient Approach to Solve Big Data Systems Using Parallel Gauss–Seidel Algorithms
by Shih Yu Chang, Hsiao-Chun Wu and Yifan Wang
Big Data Cogn. Comput. 2022, 6(2), 43; https://doi.org/10.3390/bdcc6020043 - 19 Apr 2022
Viewed by 2554
Abstract
In order to perform big-data analytics, regression involving large matrices is often necessary. In particular, large scale regression problems are encountered when one wishes to extract semantic patterns for knowledge discovery and data mining. When a large matrix can be processed in its factorized form, advantages arise in terms of computation, implementation, and data-compression. In this work, we propose two new parallel iterative algorithms as extensions of the Gauss–Seidel algorithm (GSA) to solve regression problems involving many variables. The convergence study in terms of error-bounds of the proposed iterative algorithms is also performed, and the required computation resources, namely time- and memory-complexities, are evaluated to benchmark the efficiency of the proposed new algorithms. Finally, the numerical results from both Monte Carlo simulations and real-world datasets are presented to demonstrate the striking effectiveness of our proposed new methods. Full article
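The baseline the paper extends is the classic Gauss–Seidel iteration; a minimal sequential sketch for solving Ax = b is shown below. The paper's contribution is parallelizing this iteration for large factorized matrices, which is not reproduced here.

```python
# Sequential Gauss-Seidel iteration for Ax = b. Each sweep solves the i-th
# equation for x[i], reusing components already updated in the same sweep.
# For a (strictly) diagonally dominant A, the iterates converge to A^{-1} b.

def gauss_seidel(A, b, iterations=50):
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]  # freshly updated x[j] (j < i) are used
    return x

A = [[4.0, 1.0], [2.0, 5.0]]
b = [9.0, 16.0]
x = gauss_seidel(A, b)  # converges to the exact solution [29/18, 23/9]
```

Because updated components are consumed within the same sweep, the plain loop is inherently sequential; the parallel variants studied in the paper restructure it so that independent blocks of unknowns can be updated concurrently.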

18 pages, 1307 KiB  
Article
An Emergency Event Detection Ensemble Model Based on Big Data
by Khalid Alfalqi and Martine Bellaiche
Big Data Cogn. Comput. 2022, 6(2), 42; https://doi.org/10.3390/bdcc6020042 - 16 Apr 2022
Cited by 4 | Viewed by 3751
Abstract
Emergency events arise when a serious, unexpected, and often dangerous threat affects normal life. Hence, knowing what is occurring during and after emergency events is critical to mitigate the effect of the incident on human life, on the environment and our infrastructures, as well as the inherent financial consequences. Social network utilization in emergency event detection models can play an important role, as information is shared and users’ status is updated once an emergency event occurs. Besides, big data has proved its significance as a tool to assist and alleviate emergency events by processing an enormous amount of data over a short time interval. This paper shows that it is necessary to have an appropriate emergency event detection ensemble model (EEDEM) to respond quickly once such unfortunate events occur. Furthermore, it integrates Snapchat maps to propose a novel method to pinpoint the exact location of an emergency event. Moreover, merging social networks and big data can accelerate the emergency event detection system: social network data, such as those from Twitter and Snapchat, allow us to manage, monitor, analyze and detect emergency events. The main objective of this paper is to propose a novel and efficient big data-based EEDEM to pinpoint the exact location of emergency events by employing the collected data from social networks, such as “Twitter” and “Snapchat”, while integrating big data (BD) and machine learning (ML). Furthermore, this paper evaluates the performance of five ML base models and the proposed ensemble approach to detect emergency events. Results show that the proposed ensemble approach achieved a very high accuracy of 99.87%, which outperforms the other base models. Moreover, the best base models also yield high accuracy: 99.72% (LSTM) and 99.70% (decision tree), with acceptable training times. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
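The ensemble step can be sketched as plain majority voting over base-model labels. The trivial keyword "models" below are stand-ins for the paper's trained classifiers (LSTM, decision tree, and others); the tweet strings and labels are invented.

```python
from collections import Counter

def ensemble_predict(models, sample):
    """Return the most common label among the base models' predictions."""
    votes = Counter(model(sample) for model in models)
    return votes.most_common(1)[0][0]

# Stand-in base "models": keyword rules over a tweet-like string. In the
# paper these would be trained ML models, not hand-written rules.
m1 = lambda t: "emergency" if "fire" in t else "normal"
m2 = lambda t: "emergency" if "evacuate" in t else "normal"
m3 = lambda t: "emergency" if "help" in t else "normal"

label = ensemble_predict([m1, m2, m3], "fire downtown, please help")
```

Majority voting only needs an odd number of voters to avoid ties on a binary label, which is one reason ensembles of five base models are common.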

21 pages, 312 KiB  
Article
Operations with Nested Named Sets as a Tool for Artificial Intelligence
by Mark Burgin
Big Data Cogn. Comput. 2022, 6(2), 37; https://doi.org/10.3390/bdcc6020037 - 01 Apr 2022
Viewed by 2415
Abstract
Knowledge and data representations are important for artificial intelligence (AI), as well as for intelligence in general. Intelligent functioning presupposes efficient operation with knowledge and data representations in particular. At the same time, it has been demonstrated that named sets, which are also called fundamental triads, instantiate the most fundamental structure in general and for knowledge and data representations in particular. In this context, named sets allow for an effective mathematical portrayal of the key phenomenon called nesting. Nesting plays a weighty role in a variety of fields, such as mathematics and computer science. Nested constructions in the computing tools of AI include nested levels of parentheses in arithmetical expressions; different types of recursion; nesting of several levels of subroutines; nesting in recursive calls; multilevel nesting in information hiding; a variety of nested data structures, such as records, objects, and classes; and nested blocks of imperative source code, such as nested repeat-until clauses, while clauses, if clauses, etc. In this paper, different operations with nested named sets are constructed and their properties obtained, reflecting different attributes of nesting. An AI system receives information in the form of data and knowledge and, processing this information, performs operations with these data and knowledge; thus, such a system needs various operations for these processes. The operations constructed in this paper process data and knowledge in the form of nested named sets, and knowing the properties of these operations can help to optimize the processing of data and knowledge in AI systems. Full article
(This article belongs to the Special Issue Data, Structure, and Information in Artificial Intelligence)
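One possible, deliberately simplified encoding of a named set as the triple (support set, naming correspondence, set of names), with sequential composition as an example operation that chains correspondences, one way nesting can be modeled. This Python rendering is an assumption for illustration, not Burgin's formal apparatus.

```python
# A named set (fundamental triad) as a triple (X, f, I): support X, naming
# correspondence f, and set of names I. Composition chains f through another
# named set's correspondence. Example data below are invented.

class NamedSet:
    def __init__(self, support, naming, names):
        self.support = set(support)
        self.naming = dict(naming)   # f: support -> names
        self.names = set(names)

    def compose(self, other):
        """Sequential composition: x -> other.naming[self.naming[x]]."""
        chained = {x: other.naming[n]
                   for x, n in self.naming.items() if n in other.naming}
        return NamedSet(chained.keys(), chained, other.names)

files = NamedSet({"a.txt", "b.txt"},
                 {"a.txt": "doc", "b.txt": "img"}, {"doc", "img"})
kinds = NamedSet({"doc", "img"},
                 {"doc": "text", "img": "binary"}, {"text", "binary"})
nested = files.compose(kinds)  # names of names: a two-level nesting
```

Chaining two triads like this gives a small model of the multi-level nesting (names of names) that the paper's operations act on.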
22 pages, 2824 KiB  
Article
Startups and Consumer Purchase Behavior: Application of Support Vector Machine Algorithm
by Pejman Ebrahimi, Aidin Salamzadeh, Maryam Soleimani, Seyed Mohammad Khansari, Hadi Zarea and Maria Fekete-Farkas
Big Data Cogn. Comput. 2022, 6(2), 34; https://doi.org/10.3390/bdcc6020034 - 25 Mar 2022
Cited by 17 | Viewed by 5005
Abstract
This study evaluated the impact of startup technology innovations and customer relationship management (CRM) performance on customer participation, value co-creation, and consumer purchase behavior (CPB). This analytical study empirically tested the proposed hypotheses using structural equation modeling (SEM) and SmartPLS 3 techniques. Moreover, we used a support vector machine (SVM) algorithm to verify the model’s accuracy. The SVM algorithm offers four different kernels, and we checked all of them against the accuracy criterion. This research used the convenience sampling approach in gathering the data, along with the conventional bias test method. A total of 466 completed responses were collected. Technological innovations of startups and CRM have a positive and significant effect on customer participation. Customer participation significantly affects the value of pleasure, economic value, and relationship value. Based on the importance-performance map analysis (IPMA) matrix results, “customer participation” with a score of 0.782 had the highest importance: if customers increase their participation performance by one unit during the COVID-19 epidemic, overall CPB increases by 0.782. In addition, our results showed that the lowest performance is related to the technological innovations of startups, which indicates an excellent opportunity for development in this area. SVM results showed that the polynomial kernel of high degree is the best kernel, confirming the model’s accuracy. Full article
(This article belongs to the Special Issue Advancements in Deep Learning and Deep Federated Learning Models)
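The four kernels evaluated are presumably the standard SVM kernels (in scikit-learn's naming: linear, poly, rbf, sigmoid); their formulas can be sketched directly. The hyperparameter values below (gamma, coef0, degree) are illustrative defaults, not the study's tuned settings.

```python
import math

# The four standard SVM kernel functions k(x, z) on feature vectors.

def linear(x, z):
    return sum(a * b for a, b in zip(x, z))            # <x, z>

def poly(x, z, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * linear(x, z) + coef0) ** degree    # (gamma<x,z> + c)^d

def rbf(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def sigmoid(x, z, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * linear(x, z) + coef0)

x, z = [1.0, 0.0], [0.5, 0.5]
values = {k.__name__: k(x, z) for k in (linear, poly, rbf, sigmoid)}
```

Swapping the kernel changes the implicit feature space of the SVM, which is why the study compares classification accuracy across all four before settling on a high-degree polynomial kernel.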

18 pages, 1612 KiB  
Article
Social Networks Marketing and Consumer Purchase Behavior: The Combination of SEM and Unsupervised Machine Learning Approaches
by Pejman Ebrahimi, Marjan Basirat, Ali Yousefi, Md. Nekmahmud, Abbas Gholampour and Maria Fekete-Farkas
Big Data Cogn. Comput. 2022, 6(2), 35; https://doi.org/10.3390/bdcc6020035 - 25 Mar 2022
Cited by 30 | Viewed by 13924
Abstract
The purpose of this paper is to reveal how social network marketing (SNM) can affect consumers’ purchase behavior (CPB). We used the combination of structural equation modeling (SEM) and unsupervised machine learning approaches as an innovative method. The statistical population of the study comprised users who live in Hungary and use Facebook Marketplace. This research used the convenience sampling approach to overcome bias. Out of 475 surveys distributed, a total of 466 respondents successfully filled out the entire survey, a response rate of 98.1%. The results showed that all dimensions of social network marketing, such as entertainment, customization, interaction, WoM and trend, positively and significantly influenced consumer purchase behavior (CPB) in Facebook Marketplace. Furthermore, we used hierarchical clustering and K-means unsupervised algorithms to cluster consumers. The results show that the respondents of this research can be clustered into nine different groups based on behavior regarding demographic attributes. This means that distinct strategies can be used for different clusters, and marketing managers can provide different options, products and services for each group. This study is of high importance in that it has adopted and used the plspm and matrixpls packages in R to show the model’s predictive power. Meanwhile, we used unsupervised machine learning algorithms to cluster consumer behaviors. Full article
(This article belongs to the Special Issue Machine Learning for Dependable Edge Computing Systems and Services)
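The K-means step can be sketched in a few lines of stdlib Python. The study ran hierarchical clustering and K-means in R; the two-attribute consumer tuples and starting centers below are invented for illustration.

```python
import math

# Minimal K-means: alternate assigning points to the nearest center and
# moving each center to the mean of its assigned points.

def kmeans(points, centers, rounds=20):
    for _ in range(rounds):
        # assignment step: index of the nearest center for each point
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)),
                    key=lambda c: math.dist(p, centers[c]))
            groups[i].append(p)
        # update step: move each center to its group's coordinate-wise mean
        centers = [tuple(sum(v) / len(g) for v in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return centers, groups

# Invented consumers as (age, weekly purchases); two starting centers.
consumers = [(20, 1), (22, 2), (60, 9), (58, 8)]
centers, groups = kmeans(consumers, centers=[(20, 1), (60, 9)])
```

In practice the number of clusters (nine in the study) is chosen by comparing solutions across k, e.g. with a dendrogram from the hierarchical run or an elbow plot.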

23 pages, 830 KiB  
Article
Service Oriented R-ANN Knowledge Model for Social Internet of Things
by Mohana S. D., S. P. Shiva Prakash and Kirill Krinkin
Big Data Cogn. Comput. 2022, 6(1), 32; https://doi.org/10.3390/bdcc6010032 - 18 Mar 2022
Cited by 5 | Viewed by 3293
Abstract
The increase in technologies around the world requires adding intelligence to objects, and making an object smart in an environment leads to the Social Internet of Things (SIoT). These social objects are uniquely identifiable and transferable, and they share information from user-to-objects and objects-to-objects through interactions in a smart environment such as smart homes, smart cities and many more applications. SIoT faces certain challenges, such as the handling of heterogeneous objects, the selection of data generated by objects, and missing values in data. Therefore, the discovery and communication of meaningful patterns in data are all the more important for every application, and the analysis of data is essential for smarter decisions and qualifies the performance of data for various applications. In a smart environment, social networks of intelligent objects are increasing services and decreasing relationships in a reliable and efficient way of sharing resources and services. Hence, this work proposes a feature selection method based on proposed semantic rules and establishes relationships to classify services using a relationship artificial neural network (R-ANN). R-ANN applies an inversely proportional relationship to the objects, based on certain rules and conditions between objects-to-objects and users-to-objects, and provides a service-oriented knowledge model for making decisions in the proposed R-ANN model that delivers services to users. The proposed R-ANN provides an accuracy of 89.62% for various services, namely weather, air quality, parking, light status, and people presence, in the SIoT environment compared to the existing model. Full article

17 pages, 2573 KiB  
Article
Big Data Management in Drug–Drug Interaction: A Modern Deep Learning Approach for Smart Healthcare
by Muhammad Salman, Hafiz Suliman Munawar, Khalid Latif, Muhammad Waseem Akram, Sara Imran Khan and Fahim Ullah
Big Data Cogn. Comput. 2022, 6(1), 30; https://doi.org/10.3390/bdcc6010030 - 09 Mar 2022
Cited by 8 | Viewed by 5757
Abstract
The detection and classification of drug–drug interactions (DDI) from existing data are of high importance because recent reports show that DDIs are among the major causes of hospital-acquired conditions and readmissions and are also necessary for smart healthcare. Therefore, to avoid adverse drug interactions, it is necessary to have up-to-date knowledge of DDIs. This knowledge could be extracted by applying text-processing techniques to the medical literature published in the form of ‘Big Data’ because, whenever a drug interaction is investigated, it is typically reported and published in healthcare and clinical pharmacology journals. However, it is crucial to automate the extraction of the interactions taking place between drugs because the medical literature is being published in immense volumes, and it is impossible for healthcare professionals to read and collect all of the investigated DDI reports from these Big Data. To avoid this time-consuming procedure, the Information Extraction (IE) and Relationship Extraction (RE) techniques that have been studied in depth in Natural Language Processing (NLP) could be very promising. Since 2011, a lot of research has been reported in this particular area, and there are many approaches that have been implemented that can also be applied to biomedical texts to extract DDI-related information. A benchmark corpus is also publicly available for the advancement of DDI extraction tasks. The current state-of-the-art implementations for extracting DDIs from biomedical texts have employed Support Vector Machines (SVM) or other machine learning methods that work on manually defined features, which might be the cause of the low precision and recall achieved in this domain so far. Modern deep learning techniques have also been applied for the automatic extraction of DDIs from the scientific literature and have proven to be very promising for the advancement of DDI extraction tasks.
As such, it is pertinent to investigate deep learning techniques for the extraction and classification of DDIs in order for them to be used in the smart healthcare domain. We proposed a deep neural network-based method (SEV-DDI: Severity-Drug–Drug Interaction) with some further-integrated units/layers to achieve higher precision and accuracy. After successfully outperforming other methods in the DDI classification task, we moved a step further and utilized the methods in a sentiment analysis task to investigate the severity of an interaction. The ability to determine the severity of a DDI will be very helpful for clinical decision support systems in making more accurate and informed decisions, ensuring the safety of the patients. Full article

29 pages, 2381 KiB  
Article
Radiology Imaging Scans for Early Diagnosis of Kidney Tumors: A Review of Data Analytics-Based Machine Learning and Deep Learning Approaches
by Maha Gharaibeh, Dalia Alzu’bi, Malak Abdullah, Ismail Hmeidi, Mohammad Rustom Al Nasar, Laith Abualigah and Amir H. Gandomi
Big Data Cogn. Comput. 2022, 6(1), 29; https://doi.org/10.3390/bdcc6010029 - 08 Mar 2022
Cited by 26 | Viewed by 10961
Abstract
Plenty of disease types exist in world communities that can be explained by humans’ lifestyles or the economic, social, genetic, and other factors of the country of residence. Recently, most research has focused on studying common diseases in the population to reduce death risks, take the best procedure for treatment, and enhance the healthcare level of the communities. Kidney disease is one of the common diseases that have affected our societies. Particularly, Kidney Tumors (KT) are the 10th most prevalent tumor for men and women worldwide. Overall, the lifetime likelihood of developing a kidney tumor for males is about 1 in 466 (2.02 percent) and it is around 1 in 80 (1.03 percent) for females. Still, more research is needed on new diagnostic, early, and innovative methods regarding finding an appropriate treatment method for KT. Compared to the tedious and time-consuming traditional diagnosis, automatic detection algorithms of machine learning can save diagnosis time, improve test accuracy, and reduce costs. Previous studies have shown that deep learning can play a role in dealing with complex tasks, diagnosis and segmentation, and classification of Kidney Tumors, one of the most malignant tumors. The goals of this review article on deep learning in radiology imaging are to summarize what has already been accomplished, determine the techniques used by researchers in previous years in diagnosing Kidney Tumors through medical imaging, and identify some promising future avenues, whether in terms of applications or technological developments, as well as identifying common problems, describing ways to expand the data set, summarizing the knowledge and best practices, and determining remaining challenges and future directions. Full article
19 pages, 14228 KiB  
Article
Comparison of Object Detection in Head-Mounted and Desktop Displays for Congruent and Incongruent Environments
by René Reinhard, Erinchan Telatar and Shah Rukh Humayoun
Big Data Cogn. Comput. 2022, 6(1), 28; https://doi.org/10.3390/bdcc6010028 - 07 Mar 2022
Cited by 1 | Viewed by 3038
Abstract
Virtual reality technologies, including head-mounted displays (HMD), can provide benefits to psychological research by combining high degrees of experimental control with improved ecological validity. This is due to the strong feeling of being in the displayed environment (presence) experienced by VR users. It is not yet fully explored how using HMDs impacts basic perceptual tasks, such as object perception. In traditional display setups, the congruency between background environment and object category has been shown to impact response times in object perception tasks. In this study, we investigated whether this well-established effect is comparable when using desktop and HMD devices. In the study, 21 participants used both desktop and HMD setups to perform an object identification task and, subsequently, their subjective presence while experiencing two distinct virtual environments (a beach and a home environment) was evaluated. Participants were quicker to identify objects in the HMD condition, independent of object-environment congruency, while congruency effects were not impacted. Furthermore, participants reported significantly higher presence in the HMD condition. Full article
(This article belongs to the Special Issue Virtual Reality, Augmented Reality, and Human-Computer Interaction)
42 pages, 679 KiB  
Article
Optimizations for Computing Relatedness in Biomedical Heterogeneous Information Networks: SemNet 2.0
by Anna Kirkpatrick, Chidozie Onyeze, David Kartchner, Stephen Allegri, Davi Nakajima An, Kevin McCoy, Evie Davalbhakta and Cassie S. Mitchell
Big Data Cogn. Comput. 2022, 6(1), 27; https://doi.org/10.3390/bdcc6010027 - 01 Mar 2022
Cited by 7 | Viewed by 4073
Abstract
Literature-based discovery (LBD) summarizes information and generates insight from large text corpora. The SemNet framework utilizes a large heterogeneous information network or “knowledge graph” of nodes and edges to compute relatedness and rank concepts pertinent to a user-specified target. SemNet provides a way to perform multi-factorial and multi-scalar analysis of complex disease etiology and therapeutic identification using the 33+ million articles in PubMed. The present work improves the efficacy and efficiency of LBD for end users by augmenting SemNet to create SemNet 2.0. A custom Python data structure replaced reliance on Neo4j to improve knowledge graph query times by several orders of magnitude. Additionally, two randomized algorithms were built to optimize the HeteSim metric calculation for computing metapath similarity. The unsupervised learning algorithm for rank aggregation (ULARA), which ranks concepts with respect to the user-specified target, was reconstructed using derived mathematical proofs of correctness and probabilistic performance guarantees for optimization. The upgraded ULARA is generalizable to other rank aggregation problems outside of SemNet. In summary, SemNet 2.0 is comprehensive open-source software for significantly faster, more effective, and user-friendly automated biomedical LBD. An example case is performed to rank relationships between Alzheimer’s disease and metabolic co-morbidities. Full article
(This article belongs to the Special Issue Graph-Based Data Mining and Social Network Analysis)
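Editor's note: the HeteSim metric mentioned in the abstract scores how related two nodes are along a metapath by comparing the distributions of random walks that meet in the middle. The following is a minimal illustrative sketch of that idea only; it assumes a toy undirected graph with uniform transition probabilities, and does not reproduce SemNet 2.0's randomized, optimized implementation.

```python
from collections import defaultdict
from math import sqrt

def walk_probs(graph, start, path):
    """Distribution over nodes reached from `start` by following the
    edge types in `path`, with uniform transition probabilities."""
    probs = {start: 1.0}
    for edge_type in path:
        nxt = defaultdict(float)
        for node, p in probs.items():
            neighbors = graph.get((node, edge_type), [])
            for nb in neighbors:
                nxt[nb] += p / len(neighbors)
        probs = dict(nxt)
    return probs

def hetesim(graph, s, t, metapath):
    """HeteSim-style relatedness of s and t along an even-length metapath:
    cosine similarity of the two half-walk distributions meeting in the middle."""
    mid = len(metapath) // 2
    ps = walk_probs(graph, s, metapath[:mid])
    # walk backwards from t over the second half (edge types assumed undirected)
    pt = walk_probs(graph, t, metapath[mid:][::-1])
    dot = sum(p * pt.get(n, 0.0) for n, p in ps.items())
    norm = sqrt(sum(v * v for v in ps.values())) * sqrt(sum(v * v for v in pt.values()))
    return dot / norm if norm else 0.0
```

For instance, with a hypothetical drug–gene–disease metapath `['targets', 'assoc']`, drugs are scored against a disease by the gene sets their half-walks share.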
28 pages, 1211 KiB  
Article
A Combined System Metrics Approach to Cloud Service Reliability Using Artificial Intelligence
by Tek Raj Chhetri, Chinmaya Kumar Dehury, Artjom Lind, Satish Narayana Srirama and Anna Fensel
Big Data Cogn. Comput. 2022, 6(1), 26; https://doi.org/10.3390/bdcc6010026 - 01 Mar 2022
Cited by 4 | Viewed by 4048
Abstract
Identifying and anticipating potential failures in the cloud is an effective method for increasing cloud reliability and proactive failure management. Many studies have been conducted to predict potential failure, but none have combined SMART (self-monitoring, analysis, and reporting technology) hard drive metrics with other system metrics, such as central processing unit (CPU) utilisation. Therefore, we propose a combined system metrics approach for failure prediction based on artificial intelligence to improve reliability. We tested data from over 100 cloud servers and four artificial intelligence algorithms: random forest, gradient boosting, long short-term memory, and gated recurrent unit; we also performed correlation analysis. Our correlation analysis sheds light on the relationships that exist between system metrics and failure, and the experimental results demonstrate the advantages of combining system metrics, outperforming the state-of-the-art. Full article
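Editor's note: the correlation-analysis step described above can be reproduced in miniature by pairing a system-metric series (e.g., CPU utilisation per window) with a binary failure indicator and computing a Pearson coefficient. This sketch uses invented toy data, not the paper's 100-server dataset.

```python
def pearson(xs, ys):
    """Pearson correlation between a system-metric series (e.g., hourly CPU
    utilisation) and a failure indicator series (1 = failure in that window)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (vx ** 0.5 * vy ** 0.5)
```

Coefficients near ±1 flag metrics worth combining with the SMART features; near 0, the metric adds little linear signal.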
24 pages, 1832 KiB  
Review
Big Data in Criteria Selection and Identification in Managing Flood Disaster Events Based on Macro Domain PESTEL Analysis: Case Study of Malaysia Adaptation Index
by Mohammad Fikry Abdullah, Zurina Zainol, Siaw Yin Thian, Noor Hisham Ab Ghani, Azman Mat Jusoh, Mohd Zaki Mat Amin and Nur Aiza Mohamad
Big Data Cogn. Comput. 2022, 6(1), 25; https://doi.org/10.3390/bdcc6010025 - 01 Mar 2022
Cited by 6 | Viewed by 6874
Abstract
The impact of Big Data (BD) creates challenges in selecting relevant and significant data to be used as criteria to facilitate flood management plans. Studies on macro domain criteria expand the criteria selection, which is important for assessment in allowing a comprehensive understanding of the current situation, readiness, preparation, resources, and others for decision assessment and disaster events planning. This study aims to facilitate criteria identification and selection from a macro domain perspective in improving flood management planning. The objectives of this study are (a) to explore and identify potential and possible criteria to be incorporated in the current flood management plan from the macro domain perspective; (b) to understand the types of flood measures and decision goals implemented to facilitate flood management planning decisions; and (c) to examine a possible structured mechanism for criteria selection based on the decision analysis technique. Based on a systematic literature review and thematic analysis using the PESTEL framework, the findings have identified and clustered domains and their criteria to be considered and applied in future flood management plans. The critical review of flood measures and decision goals would potentially equip stakeholders and policy makers for better decision making based on a disaster management plan. The decision analysis technique as a structured mechanism would significantly improve criteria identification and selection for comprehensive and collective decisions. The findings from this study could further improve Malaysia Adaptation Index (MAIN) criteria identification and selection, which could serve as a complementary and supporting reference in managing flood disasters. A proposed framework from this study can be used as guidance in dealing with and optimising the criteria based on challenges and the current application of Big Data and criteria in managing disaster events. Full article
27 pages, 1296 KiB  
Article
A Framework for Content-Based Search in Large Music Collections
by Tiange Zhu, Raphaël Fournier-S’niehotta, Philippe Rigaux and Nicolas Travers
Big Data Cogn. Comput. 2022, 6(1), 23; https://doi.org/10.3390/bdcc6010023 - 23 Feb 2022
Cited by 3 | Viewed by 3848
Abstract
We address the problem of scalable content-based search in large collections of music documents. Music content is highly complex and versatile and presents multiple facets that can be considered independently or in combination. Moreover, music documents can be digitally encoded in many ways. We propose a general framework for building a scalable search engine, based on (i) a music description language that represents music content independently from a specific encoding, (ii) an extendible list of feature-extraction functions, and (iii) indexing, searching, and ranking procedures designed to be integrated into the standard architecture of a text-oriented search engine. As a proof of concept, we also detail an actual implementation of the framework for searching in large collections of XML-encoded music scores, based on the popular ElasticSearch system. It is released as open source on GitHub and is available as a ready-to-use Docker image for communities that manage large collections of digitized music documents. Full article
(This article belongs to the Special Issue Big Music Data)
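Editor's note: a common way to make melodic content searchable with a text-oriented engine of the kind the framework targets is to turn melodies into token-like features, such as n-grams of pitch intervals, which are transposition-invariant. The sketch below is a generic illustration of that indexing idea, not the authors' actual feature-extraction functions or ElasticSearch schema.

```python
from collections import defaultdict

def interval_ngrams(pitches, n=3):
    """Encode a melody (MIDI pitch numbers) as n-grams of successive
    pitch intervals, so a transposed melody yields the same tokens."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]

class InvertedIndex:
    """Text-search-style inverted index over interval n-gram tokens."""
    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, pitches):
        for gram in interval_ngrams(pitches):
            self.postings[gram].add(doc_id)

    def search(self, pitches):
        """Documents containing every interval n-gram of the query."""
        grams = interval_ngrams(pitches)
        if not grams:
            return set()
        result = set(self.postings.get(grams[0], set()))
        for g in grams[1:]:
            result &= self.postings.get(g, set())
        return result
```

A query transposed up a fourth still matches, because only the intervals are indexed.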
15 pages, 347 KiB  
Article
LFA: A Lévy Walk and Firefly-Based Search Algorithm: Application to Multi-Target Search and Multi-Robot Foraging
by Ouarda Zedadra, Antonio Guerrieri and Hamid Seridi
Big Data Cogn. Comput. 2022, 6(1), 22; https://doi.org/10.3390/bdcc6010022 - 21 Feb 2022
Cited by 5 | Viewed by 3207
Abstract
In the literature, several exploration algorithms have been proposed so far. Among these, Lévy walk is commonly used since it has been proven more efficient than simple random-walk exploration. It is beneficial when targets are sparsely distributed in the search space. However, due to its super-diffusive behavior, some tuning is needed to improve its performance, specifically when targets are clustered. The firefly algorithm is a swarm intelligence-based algorithm useful for intensive search, but its exploration rate is very limited. An efficient and reliable search can be attained by combining the two algorithms, since the first allows exploration of the space and the second encourages its exploitation. In this paper, we propose a swarm intelligence-based search algorithm called the Lévy walk and Firefly-based Algorithm (LFA), which is a hybridization of the two aforementioned algorithms. The algorithm is applied to Multi-Target Search and Multi-Robot Foraging. Numerical experiments to test its performance are conducted on the robotic simulator ARGoS. A comparison with the original firefly algorithm demonstrates the merit of our contribution. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: 5th Anniversary Feature Papers)
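Editor's note: the heavy-tailed step lengths that give the Lévy walk its super-diffusive exploration are typically drawn from a power law by inverse-transform sampling. The sketch below uses a standard Pareto parameterization with illustrative defaults (`mu`, `x_min`), not the paper's tuned values.

```python
import random

def levy_step(mu=2.0, x_min=1.0, rng=random):
    """Heavy-tailed step length with tail P(X > x) = (x / x_min)^(1 - mu),
    drawn by inverse-transform sampling of a Pareto distribution.
    Exponents 1 < mu <= 3 give the super-diffusive Levy regime:
    mostly short steps, with occasional very long relocations."""
    u = rng.random()  # uniform in [0, 1), so 1 - u is never 0
    return x_min * (1.0 - u) ** (-1.0 / (mu - 1.0))
```

The occasional long steps are what make the walk efficient over sparse targets, and what require tuning when targets are clustered.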
27 pages, 8416 KiB  
Review
Big Data in Construction: Current Applications and Future Opportunities
by Hafiz Suliman Munawar, Fahim Ullah, Siddra Qayyum and Danish Shahzad
Big Data Cogn. Comput. 2022, 6(1), 18; https://doi.org/10.3390/bdcc6010018 - 06 Feb 2022
Cited by 35 | Viewed by 16183
Abstract
Big data have become an integral part of various research fields due to the rapid advancements in the digital technologies available for dealing with data. The construction industry is no exception and has seen a spike in the data being generated due to the introduction of various digital disruptive technologies. However, despite the availability of data and the introduction of such technologies, the construction industry is lagging in harnessing big data. This paper critically explores literature published since 2010 to identify the data trends and how the construction industry can benefit from big data. The presence of tools such as computer-aided drawing (CAD) and building information modelling (BIM) provides a great opportunity for researchers in the construction industry to further improve how infrastructure can be developed, monitored, or improved in the future. The gaps in the existing research data have been explored and a detailed analysis was carried out to identify the different ways in which big data analysis and storage work in relevance to the construction industry. Big data engineering (BDE) and statistics are among the most crucial steps for integrating big data technology in construction. The results of this study suggest that while the existing research studies have set the stage for improving big data research, the integration of the associated digital technologies into the construction industry is not very clear. Among the future opportunities, key areas include big data research into construction safety, site management, heritage conservation, and project waste minimization and quality improvement. Full article
29 pages, 1272 KiB  
Review
Big Data Analytics in Supply Chain Management: A Systematic Literature Review and Research Directions
by In Lee and George Mangalaraj
Big Data Cogn. Comput. 2022, 6(1), 17; https://doi.org/10.3390/bdcc6010017 - 01 Feb 2022
Cited by 44 | Viewed by 26235
Abstract
Big data analytics has been successfully used for various business functions, such as accounting, marketing, supply chain, and operations. Currently, along with the recent development in machine learning and computing infrastructure, big data analytics in the supply chain is surging in importance. In light of the great interest and evolving nature of big data analytics in supply chains, this study conducts a systematic review of existing studies in big data analytics. This study presents a framework for a systematic literature review from interdisciplinary perspectives. From the organizational perspective, this study examines the theoretical foundations and research models that explain the sustainability and performance achieved through the use of big data analytics. Then, from the technical perspective, this study analyzes the types of big data analytics, techniques, algorithms, and features developed for enhanced supply chain functions. Finally, this study identifies the research gaps and suggests future research directions. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: 5th Anniversary Feature Papers)
22 pages, 2156 KiB  
Brief Report
A Dataset for Emotion Recognition Using Virtual Reality and EEG (DER-VREEG): Emotional State Classification Using Low-Cost Wearable VR-EEG Headsets
by Nazmi Sofian Suhaimi, James Mountstephens and Jason Teo
Big Data Cogn. Comput. 2022, 6(1), 16; https://doi.org/10.3390/bdcc6010016 - 28 Jan 2022
Cited by 30 | Viewed by 7659
Abstract
Emotions are viewed as an important aspect of human interactions and conversations, and allow effective and logical decision making. Emotion recognition uses low-cost wearable electroencephalography (EEG) headsets to collect brainwave signals and interprets these signals to provide information on a person's mental state. With the implementation of virtual reality environments in different applications, the gap between human and computer interaction, as well as the understanding process, would shorten, providing an immediate response regarding an individual's mental health. This study aims to use a virtual reality (VR) headset to induce four classes of emotions (happy, scared, calm, and bored), to collect brainwave samples using a low-cost wearable EEG headset, and to run popular classifiers to compare the most feasible ones for this particular setup. Firstly, we attempt to build an immersive VR database that is accessible to the public and that can potentially assist with emotion recognition studies using virtual reality stimuli. Secondly, we use a low-cost wearable EEG headset that is both compact and small, and can be attached to the scalp without any hindrance, allowing freedom of movement for participants to view their surroundings inside the immersive VR stimulus. Finally, we evaluate the emotion recognition system by using popular machine learning algorithms and compare them for both intra-subject and inter-subject classification. The results obtained here show that the prediction model for the four-class emotion classification performed well, including on the more challenging inter-subject classification, with the support vector machine (SVM Class Weight kernel) obtaining 85.01% classification accuracy. This shows that using fewer electrode channels, but with proper parameter tuning and feature selection, can still deliver strong classification performance. Full article
(This article belongs to the Special Issue Virtual Reality, Augmented Reality, and Human-Computer Interaction)
29 pages, 6332 KiB  
Article
Fuzzy Neural Network Expert System with an Improved Gini Index Random Forest-Based Feature Importance Measure Algorithm for Early Diagnosis of Breast Cancer in Saudi Arabia
by Ebrahem A. Algehyne, Muhammad Lawan Jibril, Naseh A. Algehainy, Osama Abdulaziz Alamri and Abdullah K. Alzahrani
Big Data Cogn. Comput. 2022, 6(1), 13; https://doi.org/10.3390/bdcc6010013 - 27 Jan 2022
Cited by 37 | Viewed by 5436
Abstract
Breast cancer is one of the common malignancies among females in Saudi Arabia and has also been ranked as the most prevalent and the number two killer disease in the country. However, the clinical diagnosis process of any disease such as breast cancer, coronary artery disease, diabetes, COVID-19, among others, is often associated with uncertainty due to the complexity and fuzziness of the process. In this work, a fuzzy neural network expert system with an improved Gini index random forest-based feature importance measure algorithm for early diagnosis of breast cancer in Saudi Arabia was proposed to address the uncertainty and ambiguity associated with the diagnosis of breast cancer, as well as the heavy burden on the overlay of the network nodes of the fuzzy neural network system that often arises from insignificant features used to predict or diagnose the disease. The improved Gini index random forest-based feature importance measure algorithm was used to select the five fittest features of the Wisconsin Diagnostic Breast Cancer database out of the 32 features of the dataset. The logistic regression, support vector machine, k-nearest neighbor, random forest, and Gaussian naïve Bayes learning algorithms were used to develop two sets of classification models: models with all 32 features and models with the 5 fittest features. The two sets of classification models were evaluated, and the results were compared. The comparison shows that the models with the selected fittest features outperformed their counterparts with full features in terms of accuracy, sensitivity, and specificity. Therefore, a fuzzy neural network-based expert system was developed with the five selected fittest features, and the system achieved 99.33% accuracy, 99.41% sensitivity, and 99.24% specificity.
Moreover, compared with previous works that applied fuzzy neural networks or other artificial intelligence techniques to the same dataset for the diagnosis of breast cancer, the system developed in this work stands as the best in terms of accuracy, sensitivity, and specificity. A z-test was also conducted, and the result shows that the accuracy achieved by the system for early diagnosis of breast cancer is statistically significant. Full article
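Editor's note: the Gini-index importance measure named above builds on two standard quantities: node impurity and the weighted impurity decrease of a split, which a random forest accumulates per feature. The sketch below shows those standard quantities only; the paper's improved variant modifies the measure in ways not reproduced here.

```python
def gini(labels):
    """Gini impurity of a label multiset: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_decrease(labels, left_idx):
    """Weighted impurity decrease of splitting `labels` into the samples at
    `left_idx` and the rest; a random forest sums this per splitting feature
    to obtain its importance score."""
    left_set = set(left_idx)
    left = [labels[i] for i in left_idx]
    right = [labels[i] for i in range(len(labels)) if i not in left_set]
    n = len(labels)
    return gini(labels) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
```

A split that perfectly separates the classes yields the maximum decrease; features whose splits rarely reduce impurity score near zero and are candidates for removal, as with the 27 features dropped in this study.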
16 pages, 1815 KiB  
Article
Google Street View Images as Predictors of Patient Health Outcomes, 2017–2019
by Quynh C. Nguyen, Tom Belnap, Pallavi Dwivedi, Amir Hossein Nazem Deligani, Abhinav Kumar, Dapeng Li, Ross Whitaker, Jessica Keralis, Heran Mane, Xiaohe Yue, Thu T. Nguyen, Tolga Tasdizen and Kim D. Brunisholz
Big Data Cogn. Comput. 2022, 6(1), 15; https://doi.org/10.3390/bdcc6010015 - 27 Jan 2022
Cited by 9 | Viewed by 5254
Abstract
Collecting neighborhood data can be both time- and resource-intensive, especially across broad geographies. In this study, we leveraged 1.4 million publicly available Google Street View (GSV) images from Utah to construct indicators of the neighborhood built environment and evaluate their associations with 2017–2019 health outcomes of approximately one-third of the population living in Utah. The use of electronic medical records allows for the assessment of associations between neighborhood characteristics and individual-level health outcomes while controlling for predisposing factors, which distinguishes this study from previous GSV studies that were ecological in nature. Among 938,085 adult patients, we found that individuals living in communities in the highest tertiles of green streets and non-single-family homes had a 10–27% lower prevalence of diabetes, uncontrolled diabetes, hypertension, and obesity, but a higher prevalence of substance use disorders, controlling for age, White race, Hispanic ethnicity, religion, marital status, health insurance, and area deprivation index. Conversely, the presence of visible utility wires overhead was associated with a 5–10% higher prevalence of diabetes, uncontrolled diabetes, hypertension, obesity, and substance use disorders. Our study found that non-single-family homes and green streets were related to a lower prevalence of chronic conditions, while visible utility wires and single-lane roads were connected with a higher burden of chronic conditions. These contextual characteristics can help healthcare organizations better understand the drivers of their patients’ health by further considering patients’ residential environments, which present both risks and resources. Full article
(This article belongs to the Special Issue Machine and Deep Learning in Computer Vision Applications)
17 pages, 337 KiB  
Review
Scalable Extended Reality: A Future Research Agenda
by Vera Marie Memmesheimer and Achim Ebert
Big Data Cogn. Comput. 2022, 6(1), 12; https://doi.org/10.3390/bdcc6010012 - 26 Jan 2022
Cited by 7 | Viewed by 4580
Abstract
Extensive research has outlined the potential of augmented, mixed, and virtual reality applications. However, little attention has been paid to scalability enhancements fostering practical adoption. In this paper, we introduce the concept of scalable extended reality (XRS), i.e., spaces scaling between different displays and degrees of virtuality that can be entered by multiple, possibly distributed users. The development of such XRS spaces concerns several research fields. To provide bidirectional interaction and maintain consistency with the real environment, virtual reconstructions of physical scenes need to be segmented semantically and adapted dynamically. Moreover, scalable interaction techniques for selection, manipulation, and navigation as well as a world-stabilized rendering of 2D annotations in 3D space are needed to let users intuitively switch between handheld and head-mounted displays. Collaborative settings should further integrate access control and awareness cues indicating the collaborators’ locations and actions. While many of these topics were investigated by previous research, very few have considered their integration to enhance scalability. Addressing this gap, we review related previous research, list current barriers to the development of XRS spaces, and highlight dependencies between them. Full article
(This article belongs to the Special Issue Virtual Reality, Augmented Reality, and Human-Computer Interaction)
21 pages, 30151 KiB  
Article
Context-Aware Explainable Recommendation Based on Domain Knowledge Graph
by Muzamil Hussain Syed, Tran Quoc Bao Huy and Sun-Tae Chung
Big Data Cogn. Comput. 2022, 6(1), 11; https://doi.org/10.3390/bdcc6010011 - 20 Jan 2022
Cited by 11 | Viewed by 5939
Abstract
With the rapid growth of internet data, knowledge graphs (KGs) are considered an efficient form of knowledge representation that captures the semantics of web objects. In recent years, reasoning over KGs for various artificial intelligence tasks has received a great deal of research interest. Providing recommendations based on users’ natural language queries is an equally difficult undertaking. In this paper, we propose a novel, context-aware recommender system, based on a domain KG, to respond to user-defined natural queries. The proposed recommender system consists of three stages. First, we generate incomplete triples from user queries, which are then segmented using logical conjunction (∧) and disjunction (∨) operations. Then, we generate candidates by utilizing a KGE-based framework (Query2Box) for reasoning over the segmented logical triples with the ∧ and ∨ operators; finally, the generated candidates are re-ranked using a neural collaborative filtering (NCF) model by exploiting contextual (auxiliary) information from GraphSAGE embeddings. Our approach proves to be simple, yet efficient, at providing explainable recommendations for users’ queries, while leveraging user-item contextual information. Furthermore, our framework has been shown to be capable of handling complex logical queries by transforming them into a disjunctive normal form (DNF) of simple queries. In this work, we focus on the restaurant domain as an application domain and use the Yelp dataset to evaluate the system. Experiments demonstrate that the proposed recommender system generalizes well on candidate generation from logical queries and effectively re-ranks those candidates, compared to the matrix factorization model. Full article
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)
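Editor's note: the DNF rewriting the framework relies on distributes ∧ over ∨ so that any nested logical query becomes a union of simple conjunctive queries. The sketch below illustrates the rewriting on a hypothetical tuple-based AST of `('and', ...)` / `('or', ...)` nodes over string atoms, not the paper's actual triple representation.

```python
def to_dnf(expr):
    """Rewrite a query AST of nested ('and', ...) / ('or', ...) nodes over
    atom strings into disjunctive normal form: a list of conjunctions
    (each conjunction is a list of atoms)."""
    if isinstance(expr, str):          # an atom, e.g. one incomplete triple
        return [[expr]]
    op, *args = expr
    branches = [to_dnf(a) for a in args]
    if op == 'or':                     # union of the sub-DNFs
        return [conj for b in branches for conj in b]
    if op == 'and':                    # distribute AND over OR (cross product)
        result = [[]]
        for b in branches:
            result = [c1 + c2 for c1 in result for c2 in b]
        return result
    raise ValueError(f'unknown operator: {op}')
```

Each resulting conjunction can then be answered independently (e.g., by a box-embedding reasoner) and the candidate sets unioned before re-ranking.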
16 pages, 9862 KiB  
Article
On Developing Generic Models for Predicting Student Outcomes in Educational Data Mining
by Gomathy Ramaswami, Teo Susnjak and Anuradha Mathrani
Big Data Cogn. Comput. 2022, 6(1), 6; https://doi.org/10.3390/bdcc6010006 - 07 Jan 2022
Cited by 16 | Viewed by 4307
Abstract
Poor academic performance of students is a concern in the educational sector, especially if it leads to students being unable to meet minimum course requirements. However, with timely prediction of students’ performance, educators can detect at-risk students, thereby enabling early interventions for supporting these students in overcoming their learning difficulties. To date, the majority of studies have taken the approach of developing individual prediction models that target a single course. These models are tailored to specific attributes of each course amongst a very diverse set of possibilities. While this approach can yield accurate models in some instances, this strategy is associated with limitations. In many cases, overfitting can take place when course data is small or when new courses are devised. Additionally, maintaining a large suite of models per course is a significant overhead. This issue can be tackled by developing a generic and course-agnostic predictive model that captures more abstract patterns and is able to operate across all courses, irrespective of their differences. This study demonstrates how a generic predictive model can be developed that identifies at-risk students across a wide variety of courses. Experiments were conducted using a range of algorithms, with the generic model producing effective accuracy. The findings showed that the CatBoost algorithm performed the best on our dataset across the F-measure, ROC (receiver operating characteristic) curve and AUC scores; therefore, it is an excellent candidate algorithm for providing solutions in this domain, given its capability to seamlessly handle categorical and missing data, which is frequently a feature of educational datasets. Full article
(This article belongs to the Special Issue Educational Data Mining and Technology)
26 pages, 1044 KiB  
Article
A Hierarchical Hadoop Framework to Process Geo-Distributed Big Data
by Giuseppe Di Modica and Orazio Tomarchio
Big Data Cogn. Comput. 2022, 6(1), 5; https://doi.org/10.3390/bdcc6010005 - 06 Jan 2022
Cited by 1 | Viewed by 3360
Abstract
In the past twenty years, we have witnessed an unprecedented production of data worldwide that has generated a growing demand for computing resources and has stimulated the design of computing paradigms and software tools to efficiently and quickly obtain insights from such Big Data. State-of-the-art parallel computing techniques such as MapReduce guarantee high performance in scenarios where the involved computing nodes are equally sized, clustered via broadband network links, and co-located with the data. Unfortunately, these techniques have proven ineffective in geographically distributed scenarios, i.e., computing contexts where nodes and data are spread across multiple distant data centers. In the literature, researchers have proposed variants of the MapReduce paradigm that are aware of the constraints imposed by those scenarios (such as the imbalance of the nodes’ computing power and of the interconnecting links) and enforce smart task-scheduling strategies. We have designed a hierarchical computing framework in which a context-aware scheduler orchestrates computing tasks that leverage the potential of the vanilla Hadoop framework within each data center taking part in the computation. In this work, after presenting the features of the developed framework, we advocate fragmenting the data in a smart way so that the scheduler produces a fairer distribution of the workload among the computing tasks. To prove the concept, we implemented a software prototype of the framework and ran several experiments on a small-scale testbed. Test results are discussed in the last part of the paper. Full article
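The paper's smart fragmentation strategy is not reproduced here; a minimal sketch of one plausible ingredient, splitting records across sites in proportion to their computing capacity so that faster sites get larger fragments, might look like:

```python
def fragment(records, capacities):
    """Split records across sites proportionally to each site's capacity
    (a toy stand-in for a workload-balancing fragmentation strategy)."""
    total = sum(capacities)
    fragments, start = [], 0
    for i, cap in enumerate(capacities):
        # The last site takes the remainder, so no record is lost to rounding.
        end = len(records) if i == len(capacities) - 1 \
            else start + round(len(records) * cap / total)
        fragments.append(records[start:end])
        start = end
    return fragments

# Site 0 has twice the capacity of sites 1 and 2.
sites = fragment(list(range(100)), capacities=[4, 2, 2])
print([len(f) for f in sites])  # [50, 25, 25]
```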
(This article belongs to the Special Issue Big Data Analytics and Cloud Data Management)

16 pages, 6652 KiB  
Article
Analyzing COVID-19 Medical Papers Using Artificial Intelligence: Insights for Researchers and Medical Professionals
by Dmitry Soshnikov, Tatiana Petrova, Vickie Soshnikova and Andrey Grunin
Big Data Cogn. Comput. 2022, 6(1), 4; https://doi.org/10.3390/bdcc6010004 - 05 Jan 2022
Cited by 1 | Viewed by 3440
Abstract
Since the beginning of the COVID-19 pandemic almost two years ago, more than 700,000 scientific papers have been published on the subject. An individual researcher cannot possibly get acquainted with such a huge text corpus and, therefore, some help from artificial intelligence (AI) is highly needed. We propose an AI-based tool to help researchers navigate collections of medical papers in a meaningful way and extract knowledge from scientific COVID-19 papers. The main idea of our approach is to extract as much semi-structured information from the text corpus as possible, using named entity recognition (NER) with a model called PubMedBERT and the Text Analytics for Health service, and then store the data in a NoSQL database for further fast processing and insight generation. Additionally, the contexts in which the entities were used (neutral or negative) are determined. Applying NLP and text-based emotion detection (TBED) methods to the COVID-19 text corpus allows us to gain insights into important issues of diagnosis and treatment (such as changes in medical treatment over time, joint treatment strategies using several medications, and the connection between signs and symptoms of coronavirus, etc.). Full article
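A minimal sketch of the store-then-aggregate idea, using an in-memory dict in place of the NoSQL database and hand-written triples in place of real NER output (the paper IDs and entities below are invented):

```python
from collections import defaultdict

# Toy stand-ins for NER output: (paper_id, entity, entity_type) triples.
extracted = [
    ("p1", "hydroxychloroquine", "MedicationName"),
    ("p1", "fever", "SymptomOrSign"),
    ("p2", "remdesivir", "MedicationName"),
    ("p3", "remdesivir", "MedicationName"),
    ("p3", "cough", "SymptomOrSign"),
]

# "Store" the records grouped by entity type, as a NoSQL collection might.
store = defaultdict(list)
for paper_id, entity, etype in extracted:
    store[etype].append((paper_id, entity))

# Fast follow-up query: how often is each medication mentioned?
counts = defaultdict(int)
for paper_id, entity in store["MedicationName"]:
    counts[entity] += 1

print(dict(counts))  # {'hydroxychloroquine': 1, 'remdesivir': 2}
```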

16 pages, 1393 KiB  
Article
Analyzing Political Polarization on Social Media by Deleting Bot Spamming
by Riccardo Cantini, Fabrizio Marozzo, Domenico Talia and Paolo Trunfio
Big Data Cogn. Comput. 2022, 6(1), 3; https://doi.org/10.3390/bdcc6010003 - 04 Jan 2022
Cited by 9 | Viewed by 5897
Abstract
Social media platforms are part of everyday life, allowing the interconnection of people around the world in large discussion groups relating to every topic, including important social or political issues. Social media have therefore become a valuable source of information-rich data, commonly referred to as Social Big Data, which can be effectively exploited to study people’s behavior, opinions, moods, interests and activities. However, these powerful communication platforms can also be used to manipulate conversation, pollute online content and alter the popularity of users through spamming activities and the spreading of misinformation. Recent studies have shown the use on social media of automated entities, known as social bots, that appear as legitimate users by imitating human behavior, with the aim of influencing discussions of any kind, including political issues. In this paper we present a new methodology, TIMBRE (Time-aware opInion Mining via Bot REmoval), aimed at discovering the polarity of social media users during election campaigns characterized by the rivalry of political factions. The methodology is temporally aware and relies on a keyword-based classification of posts and users. Moreover, it recognizes and filters out data produced by social media bots, which aim to alter public opinion about political candidates, thus avoiding heavily biased information. The methodology was applied to a case study analyzing the polarization of a large number of Twitter users during the 2016 US presidential election. The results show the benefits brought by both removing bots and taking temporal aspects into account in the forecasting process, revealing the high accuracy and effectiveness of the proposed approach. Finally, we investigated how the presence of social bots may affect political discussion, analyzing the main differences between human and artificial political support and estimating the influence of social bots on legitimate users. Full article
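TIMBRE's actual keyword lexicon and classification rules are not detailed here; a minimal sketch of keyword-based polarity scoring, with invented keyword sets for two rival factions, could be:

```python
# Invented keyword sets for two rival factions; the real TIMBRE lexicon differs.
FACTION_A = {"maga", "trump2016"}
FACTION_B = {"imwithher", "clinton2016"}

def post_polarity(text):
    """Return +1 (faction A), -1 (faction B) or 0 (neutral/mixed) for one post."""
    words = set(text.lower().split())
    score = len(words & FACTION_A) - len(words & FACTION_B)
    return (score > 0) - (score < 0)

def user_polarity(posts):
    """Classify a user by the sign of the sum of their posts' polarities."""
    total = sum(post_polarity(p) for p in posts)
    return (total > 0) - (total < 0)

print(user_polarity(["Trump2016 rally today", "great crowd"]))  # 1
```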
(This article belongs to the Special Issue Big Data and Cognitive Computing: 5th Anniversary Feature Papers)

23 pages, 4422 KiB  
Article
Early Diagnosis of Alzheimer’s Disease Using Cerebral Catheter Angiogram Neuroimaging: A Novel Model Based on Deep Learning Approaches
by Maha Gharaibeh, Mothanna Almahmoud, Mostafa Z. Ali, Amer Al-Badarneh, Mwaffaq El-Heis, Laith Abualigah, Maryam Altalhi, Ahmad Alaiad and Amir H. Gandomi
Big Data Cogn. Comput. 2022, 6(1), 2; https://doi.org/10.3390/bdcc6010002 - 28 Dec 2021
Cited by 21 | Viewed by 6056
Abstract
Neuroimaging refers to techniques that provide efficient information about the neural structure of the human brain, which is utilized for diagnosis, treatment, and scientific research. Classifying neuroimages is one of the most important steps medical staff need in order to diagnose their patients early by investigating the indicators of different neuroimaging types. Early diagnosis of Alzheimer’s disease is of great importance in preventing the deterioration of the patient’s condition. In this research, a novel approach was devised based on digital subtracted angiogram scans, which provide sufficient features of a new biomarker, cerebral blood flow. The dataset was acquired from the database of the K.A.U.H hospital and contains digital subtracted angiograms of participants diagnosed with Alzheimer’s disease, as well as samples from normal controls. Since each scan included multiple frames for the left and right ICAs, pre-processing steps were applied to prepare the dataset for the subsequent stages of feature extraction and classification. The multiple frames of each scan were transformed from real space into DCT space and averaged to remove noise. The averaged image was then transformed back to real space, and both sides were filtered with the Meijering filter and concatenated into a single image. The proposed model extracts features using two pre-trained models, InceptionV3 and DenseNet201. The PCA method was then utilized to select features with a 0.99 explained-variance ratio, and the combination of features selected from both pre-trained models was fed into machine learning classifiers. Overall, the obtained experimental results are at least as good as other state-of-the-art approaches in the literature and more efficient according to recent medical standards, with a 99.14% level of accuracy, considering the difference in dataset samples and the cerebral blood flow biomarker used. Full article
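Of the pipeline steps above, the PCA selection at a 0.99 explained-variance ratio is easy to sketch. This minimal NumPy version via SVD is an illustration, not the authors' implementation; the random data stands in for the extracted deep features:

```python
import numpy as np

def pca_99(X, threshold=0.99):
    """Keep the fewest principal components whose cumulative explained
    variance ratio reaches `threshold`, and project X onto them."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: squared singular values give component variances.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_ratio = (s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(np.cumsum(var_ratio), threshold) + 1)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))   # toy stand-in for deep feature vectors
X_reduced = pca_99(X)
print(X_reduced.shape)           # (200, k) with k <= 50
```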
(This article belongs to the Special Issue Machine Learning and Data Analysis for Image Processing)

41 pages, 2295 KiB  
Article
AGR4BS: A Generic Multi-Agent Organizational Model for Blockchain Systems
by Hector Roussille, Önder Gürcan and Fabien Michel
Big Data Cogn. Comput. 2022, 6(1), 1; https://doi.org/10.3390/bdcc6010001 - 21 Dec 2021
Cited by 7 | Viewed by 5004
Abstract
Blockchain is a very attractive technology since it maintains a public, append-only, immutable and ordered log of transactions, which guarantees an auditable ledger accessible by anyone. Blockchain systems are inherently interdisciplinary since they combine various fields such as cryptography, multi-agent systems, distributed systems, social systems, economics, and finance. Furthermore, they have a very active and dynamic ecosystem in which new blockchain platforms and algorithms are developed continuously, owing to the interest of the public and industry in the technology. Consequently, we anticipate a challenging and interdisciplinary research agenda in blockchain systems, built upon a methodology that strives to capture the rich process resulting from the interplay between the behavior of agents and the dynamic interactions among them. To be effective, however, this requires modeling studies that provide insights into blockchain systems and an appropriate description of agents, paired with a generic understanding of their components. Such studies will create a more unified field of blockchain systems that advances our understanding and leads to further insight. From this perspective, we propose a generic multi-agent organizational model for studying blockchain systems, namely AGR4BS. Concretely, we use the Agent/Group/Role (AGR) organizational modeling approach to identify and represent the generic entities common to blockchain systems. We show through four real case studies how this generic model can be used to model different blockchain systems, and briefly how it can be used to model three well-known attacks on blockchain systems. Full article

18 pages, 546 KiB  
Article
DASentimental: Detecting Depression, Anxiety, and Stress in Texts via Emotional Recall, Cognitive Networks, and Machine Learning
by Asra Fatima, Ying Li, Thomas Trenholm Hills and Massimo Stella
Big Data Cogn. Comput. 2021, 5(4), 77; https://doi.org/10.3390/bdcc5040077 - 13 Dec 2021
Cited by 5 | Viewed by 5920
Abstract
Most current affect scales and sentiment analysis on written text focus on quantifying valence/sentiment, the primary dimension of emotion. Distinguishing broader, more complex negative emotions of similar valence is key to evaluating mental health. We propose a semi-supervised machine learning model, DASentimental, to extract depression, anxiety, and stress from written text. We trained DASentimental to identify how N = 200 sequences of recalled emotional words correlate with the recallers’ depression, anxiety, and stress from the Depression Anxiety Stress Scale (DASS-21). Using cognitive network science, we modeled every recall list as a bag-of-words (BOW) vector and as a walk over a network representation of semantic memory, in this case, free associations. This weights BOW entries according to their centrality (degree) in semantic memory and informs recalls using semantic network distances, thus embedding recalls in a cognitive representation. This embedding translated into state-of-the-art, cross-validated predictions for depression (R = 0.7), anxiety (R = 0.44), and stress (R = 0.52), equivalent to previous results employing additional human data. Powered by a multilayer perceptron neural network, DASentimental opens the door to probing the semantic organization of emotional distress. We found that the semantic distance between recalls (i.e., walk coverage) was key for estimating depression levels but redundant for anxiety and stress levels. Semantic distances from “fear” boosted anxiety predictions but were redundant when the “sad–happy” dyad was considered. We applied DASentimental to a clinical dataset of 142 suicide notes and found that the predicted depression and anxiety levels (high/low) corresponded to differences in valence and arousal, as expected from a circumplex model of affect. We discuss key directions for future research enabled by artificial intelligence detecting stress, anxiety, and depression in texts. Full article
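The semantic network distances central to the model are shortest-path lengths over a free-association graph; a minimal breadth-first-search sketch over a tiny invented graph (the real model uses full free-association norms) could be:

```python
from collections import deque

# Invented toy free-association network; the study uses real association norms.
edges = {
    "sad": ["cry", "happy", "fear"],
    "happy": ["sad", "smile"],
    "fear": ["sad", "dark"],
    "cry": ["sad", "tears"],
    "smile": ["happy"],
    "dark": ["fear"],
    "tears": ["cry"],
}

def semantic_distance(src, dst):
    """Shortest-path length between two words via breadth-first search."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        word, d = frontier.popleft()
        if word == dst:
            return d
        for nxt in edges.get(word, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return float("inf")  # disconnected words

print(semantic_distance("tears", "fear"))  # 3: tears -> cry -> sad -> fear
```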
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)

11 pages, 440 KiB  
Article
Exploring Ensemble-Based Class Imbalance Learners for Intrusion Detection in Industrial Control Networks
by Maya Hilda Lestari Louk and Bayu Adhi Tama
Big Data Cogn. Comput. 2021, 5(4), 72; https://doi.org/10.3390/bdcc5040072 - 06 Dec 2021
Cited by 18 | Viewed by 3415
Abstract
Classifier ensembles have been utilized in the industrial cybersecurity sector for many years. However, their efficacy and reliability for intrusion detection systems remain questionable in current research, owing to the particularly imbalanced data issue. The purpose of this article is to address a gap in the literature by illustrating the benefits of ensemble-based models for identifying threats and attacks in a cyber-physical power grid. We provide a framework that compares nine cost-sensitive individual and ensemble models designed specifically for handling imbalanced data, including cost-sensitive C4.5, roughly balanced bagging, random oversampling bagging, random undersampling bagging, synthetic minority oversampling bagging, random undersampling boosting, synthetic minority oversampling boosting, AdaC2, and EasyEnsemble. Each ensemble’s performance is tested against a range of benchmarked power system datasets utilizing balanced accuracy, Kappa statistics, and AUC metrics. Our findings demonstrate that EasyEnsemble significantly outperformed its rivals across the board. Furthermore, undersampling and oversampling strategies were effective in boosting-based ensembles but not in bagging-based ensembles. Full article
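The random undersampling step shared by several of the compared ensembles can be sketched generically (this is not the exact sampler used in the paper): discard majority-class samples until both classes are the same size.

```python
import random

def random_undersample(X, y, seed=42):
    """Balance a binary dataset by randomly discarding majority-class
    samples until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]

X = list(range(10))
y = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]   # 2 positives, 8 negatives
Xb, yb = random_undersample(X, y)
print(sum(yb), len(yb) - sum(yb))    # 2 2  (balanced classes)
```

In undersampling bagging, each base learner would be trained on a fresh balanced sample drawn this way.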
(This article belongs to the Special Issue Artificial Intelligence for Trustworthy Industrial Internet of Things)

12 pages, 857 KiB  
Review
Spiking Neural Networks for Computational Intelligence: An Overview
by Shirin Dora and Nikola Kasabov
Big Data Cogn. Comput. 2021, 5(4), 67; https://doi.org/10.3390/bdcc5040067 - 15 Nov 2021
Cited by 21 | Viewed by 6752
Abstract
Deep neural networks with rate-based neurons have exhibited tremendous progress in the last decade. However, the same level of progress has not been observed in research on spiking neural networks (SNN), despite their ability to handle temporal data, their energy efficiency and their low latency. This could be because the benchmarking techniques for SNNs are based on the methods used for evaluating deep neural networks, which do not provide a clear evaluation of the capabilities of SNNs. In particular, benchmarking SNN approaches with regard to energy efficiency and latency requires realization in suitable hardware, which imposes additional temporal and resource constraints on ongoing projects. This review provides an overview of the current real-world applications of SNNs and identifies steps to accelerate future research involving SNNs. Full article
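As background for the rate-based vs. spiking distinction, a leaky integrate-and-fire (LIF) neuron, the canonical SNN building block, can be simulated in a few lines. Parameters here are illustrative only, not tied to any benchmark in the review:

```python
# Leaky integrate-and-fire: the membrane potential leaks toward rest,
# integrates input current, and emits a spike (then resets) on crossing
# a threshold -- information is carried in spike timing, not in a rate.
def lif(currents, tau=10.0, v_rest=0.0, v_thresh=1.0, dt=1.0):
    v, spikes = v_rest, []
    for i in currents:
        v += dt * (-(v - v_rest) + i) / tau   # leaky integration
        if v >= v_thresh:
            spikes.append(1)
            v = v_rest                        # reset after the spike
        else:
            spikes.append(0)
    return spikes

spikes = lif([1.5] * 50)   # constant drive yields a regular spike train
print(sum(spikes))         # spikes emitted over 50 time steps
```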
(This article belongs to the Special Issue Computational Intelligence: Spiking Neural Networks)

25 pages, 873 KiB  
Article
An Enhanced Parallelisation Model for Performance Prediction of Apache Spark on a Multinode Hadoop Cluster
by Nasim Ahmed, Andre L. C. Barczak, Mohammad A. Rashid and Teo Susnjak
Big Data Cogn. Comput. 2021, 5(4), 65; https://doi.org/10.3390/bdcc5040065 - 05 Nov 2021
Cited by 4 | Viewed by 4120
Abstract
Big data frameworks play a vital role in storing, processing, and analysing large datasets. Apache Spark has established itself as one of the most popular big data engines for its efficiency and reliability. However, one of the significant problems of the Spark system is performance prediction. Spark has more than 150 configurable parameters, and configuring so many parameters is a challenging task when determining the settings suitable for the system. In this paper, we propose two distinct parallelisation models for performance prediction. Our insight is that each node in a Hadoop cluster can communicate with identical nodes, and a certain function of the non-parallelisable runtime can be estimated accordingly. Both models use simple equations that allow us to predict the runtime when the size of the job and the number of executors are known. The proposed models were evaluated on five HiBench workloads: Kmeans, PageRank, Graph (NWeight), SVM, and WordCount. Each workload’s empirical data were fitted with whichever of the two models met the accuracy requirements. Finally, the experimental findings show that the model can be a handy and helpful tool for scheduling and planning system deployment. Full article
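The paper's exact equations are not reproduced here. A common two-parameter form, assumed purely for illustration, splits runtime into a non-parallelisable part plus parallelisable work divided by the number of executors, t(n) = a + b/n, and can be fitted by ordinary least squares on x = 1/n:

```python
def fit_runtime(ns, times):
    """Least-squares fit of t(n) = a + b/n
    (a: non-parallelisable part, b: parallelisable work)."""
    xs = [1.0 / n for n in ns]
    mx = sum(xs) / len(xs)
    mt = sum(times) / len(times)
    b = sum((x - mx) * (t - mt) for x, t in zip(xs, times)) / \
        sum((x - mx) ** 2 for x in xs)
    a = mt - b * mx
    return a, b

# Synthetic noise-free measurements generated from t(n) = 20 + 240/n.
ns = [1, 2, 4, 8]
times = [260.0, 140.0, 80.0, 50.0]
a, b = fit_runtime(ns, times)
print(a, b)        # 20.0 240.0
print(a + b / 16)  # predicted runtime on 16 executors: 35.0
```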

13 pages, 2180 KiB  
Article
Prediction of Cloud Fractional Cover Using Machine Learning
by Hanna Svennevik, Michael A. Riegler, Steven Hicks, Trude Storelvmo and Hugo L. Hammer
Big Data Cogn. Comput. 2021, 5(4), 62; https://doi.org/10.3390/bdcc5040062 - 03 Nov 2021
Cited by 1 | Viewed by 4028
Abstract
Climate change is considered one of the largest issues of our time, resulting in many unwanted effects on life on Earth. Cloud fractional cover (CFC), the portion of the sky covered by clouds, might affect global warming and various other aspects of human society such as agriculture and solar energy production. It is therefore important to improve projections of future CFC, which are usually produced using numerical climate models. In this paper, we explore the potential of using machine learning as part of a statistical downscaling framework to project future CFC; we are not aware of any other research that has explored this. We evaluated two different methods, a convolutional long short-term memory model (ConvLSTM) and a multiple regression equation, for predicting CFC from other environmental variables. The predictions were associated with considerable uncertainty, indicating that the environmental variables used in the study may not carry much information for predicting CFC. Overall the regression equation performed best, but the ConvLSTM was the better-performing model along some coastal and mountain areas. All aspects of the research analyses are explained, including data preparation, model development, ML training, performance evaluation and visualization. Full article
(This article belongs to the Special Issue Multimedia Systems for Multimedia Big Data)

54 pages, 6458 KiB  
Article
6G Cognitive Information Theory: A Mailbox Perspective
by Yixue Hao, Yiming Miao, Min Chen, Hamid Gharavi and Victor C. M. Leung
Big Data Cogn. Comput. 2021, 5(4), 56; https://doi.org/10.3390/bdcc5040056 - 16 Oct 2021
Cited by 24 | Viewed by 26717
Abstract
With the rapid development of 5G communications, enhanced mobile broadband, massive machine-type communications and ultra-reliable low-latency communications are widely supported. However, a 5G communication system is still based on Shannon’s information theory, and the meaning and value of information itself are not taken into account in the process of transmission. It is therefore difficult to meet the requirements of intelligence, customization, and value transmission of 6G networks. To address these challenges, we propose a 6G mailbox theory, namely a cognitive information carrier that enables distributed algorithm embedding for intelligent networking. Based on the mailbox, a 6G network will form an intelligent agent with self-organization, self-learning, self-adaptation, and continuous evolution capabilities. With this intelligent agent, redundant transmission of data can be reduced while the value transmission of information is improved. We then introduce the features of the mailbox principle, including polarity, traceability, dynamics, convergence, figurability, and dependence, as well as the key technologies by which value transmission of information can be realized, including knowledge graphs, distributed learning, and blockchain. Finally, we establish a cognitive communication system assisted by deep learning. The experimental results show that, compared with a traditional communication system, our communication system transmits less data with fewer errors. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: 5th Anniversary Feature Papers)

16 pages, 2196 KiB  
Article
Effects of Neuro-Cognitive Load on Learning Transfer Using a Virtual Reality-Based Driving System
by Usman Alhaji Abdurrahman, Shih-Ching Yeh, Yunying Wong and Liang Wei
Big Data Cogn. Comput. 2021, 5(4), 54; https://doi.org/10.3390/bdcc5040054 - 13 Oct 2021
Cited by 10 | Viewed by 4301
Abstract
Understanding the ways different people perceive and apply acquired knowledge, especially when driving, is an important area of study. This study introduced a novel virtual reality (VR)-based driving system to determine the effects of neuro-cognitive load on learning transfer. In the experiment, easy and difficult routes were presented to the participants, and the VR system recorded eye gaze, pupil dilation and heart rate, as well as driving performance data. The main purpose was to apply multimodal data fusion, several machine learning algorithms, and strategic analytic methods to measure neuro-cognitive load for user classification. A total of ninety-eight (98) university students participated in the experiment, of whom forty-nine (49) were male and forty-nine (49) were female. The results showed that data fusion methods achieved higher accuracy than other classification methods. These findings highlight the importance of physiological monitoring for measuring mental workload during the process of learning transfer. Full article
(This article belongs to the Special Issue Virtual Reality, Augmented Reality, and Human-Computer Interaction)

21 pages, 1020 KiB  
Article
Hardening the Security of Multi-Access Edge Computing through Bio-Inspired VM Introspection
by Huseyn Huseynov, Tarek Saadawi and Kenichi Kourai
Big Data Cogn. Comput. 2021, 5(4), 52; https://doi.org/10.3390/bdcc5040052 - 08 Oct 2021
Cited by 4 | Viewed by 3385
Abstract
The extreme bandwidth and performance of 5G mobile networks change the way we develop and utilize digital services. Within a few years, 5G will not only touch technology and applications, but will dramatically change the economy, our society and individual life. One of the emerging technologies that enables the evolution to 5G by bringing cloud capabilities near to end users is Edge Computing, also known as Multi-Access Edge Computing (MEC), which will become pertinent to the evolution of 5G. This evolution also entails growth in the threat landscape and increased privacy concerns across different application areas; hence, security and privacy play a central role in the evolution towards 5G. Since MEC applications are instantiated in virtualized infrastructure, in this paper we present a distributed application that constantly introspects multiple virtual machines (VMs) in order to detect malicious activities based on their anomalous behavior. Once suspicious processes are detected, our IDS notifies the system administrator in real time about the potential threat. The developed software is able to detect keyloggers, rootkits, trojans, process hiding and other intrusion artifacts via agent-less operation, working remotely or directly from the host machine. Remote memory introspection means there is no software to install and no notice given to malware to evacuate or destroy data. Experimental results of remote VMI on more than 50 different malicious code samples demonstrate an average anomaly detection rate close to 97%. We established a wide-area testbed connecting the networks of two universities, the Kyushu Institute of Technology and The City College of New York, through a secure GRE tunnel. Experiments conducted on this testbed show the fast response time of the proposed system. Full article
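The anomaly-based detection idea can be caricatured as a baseline comparison. Real VM introspection reads guest memory from outside the VM; this sketch simply checks a hard-coded, invented process list against a known-good baseline:

```python
# Known-good process baseline for a VM (invented names for illustration).
BASELINE = {"systemd", "sshd", "cron", "nginx"}

def detect_anomalies(observed_processes):
    """Flag any observed process absent from the baseline as suspicious."""
    return sorted(set(observed_processes) - BASELINE)

# Toy "introspection" result, including two processes not in the baseline.
observed = ["systemd", "sshd", "cron", "nginx", "kworker_fake", "xmrig"]
suspicious = detect_anomalies(observed)
print(suspicious)  # ['kworker_fake', 'xmrig']
```

A real system would of course use behavioral features rather than names alone, since malware can masquerade as legitimate processes.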
(This article belongs to the Special Issue Information Security and Cyber Intelligence)

15 pages, 42126 KiB  
Article
Bag of Features (BoF) Based Deep Learning Framework for Bleached Corals Detection
by Sonain Jamil, MuhibUr Rahman and Amir Haider
Big Data Cogn. Comput. 2021, 5(4), 53; https://doi.org/10.3390/bdcc5040053 - 08 Oct 2021
Cited by 17 | Viewed by 5395
Abstract
Coral reefs are sub-aqueous calcium carbonate structures built by the invertebrates known as corals. The charm and beauty of coral reefs attract tourists, and reefs play a vital role in preserving biodiversity, ceasing coastal erosion, and promoting business trade; coral compounds are also used in treatments for human immunodeficiency virus (HIV) and heart disease. However, reefs are declining because of over-exploitation, damaging fishery practices, marine pollution, and global climate change. The corals of Australia’s Great Barrier Reef have started bleaching due to ocean acidification and global warming, which is an alarming threat to the Earth’s ecosystem. Many techniques have been developed to address such issues, but each has limitations due to the low resolution of images, diverse weather conditions, etc. In this paper, we propose a bag of features (BoF)-based approach that can detect and localize bleached corals before safety measures are applied. The dataset contains images of bleached and unbleached corals, and various kernels are used with a support vector machine to classify the extracted features. The accuracy of handcrafted descriptors and deep convolutional neural networks is analyzed and reported in detail, with comparison to the current methods. Various handcrafted descriptors, such as the local binary pattern, histogram of oriented gradients, locally encoded transform feature histogram, gray-level co-occurrence matrix, and completed joint-scale local binary pattern, are used for feature extraction, and deep convolutional neural networks such as AlexNet, GoogLeNet, VGG-19, ResNet-50, Inception v3, and CoralNet are used as well. Experimental analysis shows that the proposed technique outperforms the current state-of-the-art methods, achieving 99.08% accuracy with a classification error of 0.92%. A novel bleached-coral positioning algorithm is also proposed to locate bleached corals in coral reef images. Full article
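The core BoF step, quantizing local descriptors against a visual-word codebook into a normalized histogram, can be sketched with invented 2-D "descriptors" (real BoF uses high-dimensional LBP, HOG or CNN features and a learned codebook):

```python
import math

def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (Euclidean)."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(descriptor, codebook[i]))

def bof_histogram(descriptors, codebook):
    """Bag-of-features: count how many local descriptors fall on each
    visual word, then normalize so the histogram sums to 1."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts]

# Invented 2-D descriptors and a 3-word codebook, purely for illustration.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.1), (4.8, 5.2), (5.1, 4.9)]
print(bof_histogram(descriptors, codebook))  # [0.25, 0.25, 0.5]
```

The resulting histogram is the fixed-length image representation that an SVM (with various kernels) can classify.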
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
Show Figures

Figure 1
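The bag-of-features idea described in the abstract can be illustrated with a minimal sketch: cluster local descriptors into a visual vocabulary, encode each image as a visual-word histogram, and classify the histograms with an SVM. The synthetic descriptors, the vocabulary size of 16, and helper names like `build_vocabulary` below are assumptions for illustration, not the paper's actual pipeline.

```python
# Sketch of a bag-of-features (BoF) pipeline for bleached/unbleached coral
# classification. Synthetic descriptor sets stand in for real coral imagery.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def build_vocabulary(descriptor_sets, k=16):
    """Cluster all local descriptors into k visual words."""
    stacked = np.vstack(descriptor_sets)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(stacked)

def encode(descriptors, vocab):
    """Normalized histogram of visual-word occurrences for one image."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

# Synthetic "images": bleached descriptors cluster near 1.0, unbleached near 0.0
bleached = [rng.normal(1.0, 0.1, size=(30, 8)) for _ in range(20)]
unbleached = [rng.normal(0.0, 0.1, size=(30, 8)) for _ in range(20)]
images = bleached + unbleached
labels = [1] * 20 + [0] * 20

vocab = build_vocabulary(images)
X = np.array([encode(d, vocab) for d in images])

# RBF kernel here; the paper compares several SVM kernels
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.score(X, labels))
```

In practice the descriptors would come from the handcrafted features (LBP, HOG, GLCM, etc.) or CNN activations that the paper evaluates.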

22 pages, 3598 KiB  
Article
Big Data Contribution in Desktop and Mobile Devices Comparison, Regarding Airlines’ Digital Brand Name Effect
by Damianos P. Sakas and Nikolaos Th. Giannakopoulos
Big Data Cogn. Comput. 2021, 5(4), 48; https://doi.org/10.3390/bdcc5040048 - 26 Sep 2021
Cited by 26 | Viewed by 4930
Abstract
Rising demand for optimized digital marketing strategies has led firms to hunt for every possible signal of users’ experience and preferences. People regularly visit numerous websites throughout the day using both desktop and mobile devices, so it is extremely important for businesses to understand each device’s usage rates. This research therefore analyzes each device’s usage and its effect on airline firms’ digital brand name. In the first phase of the research, we gathered web data from 10 airline firms over an observation period of 180 days. We then developed an exploratory model using Fuzzy Cognitive Mapping, as well as a predictive and simulation model using Agent-Based Modeling. We inferred that various factors of airlines’ digital brand name are affected by both desktop and mobile usage, with mobile usage having a slightly bigger impact on most of them and gradually rising values. Desktop device usage also appeared to be quite significant, especially in traffic coming from referral sources. The paper’s contribution is to provide time-accurate insights for marketers regarding airlines’ digital marketing strategies. Full article
Show Figures

Figure 1
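A Fuzzy Cognitive Map of the kind used in this study can be sketched as a signed weight matrix over concepts, iterated through a squashing function until the concept activations stabilize. The concepts and weights below are entirely hypothetical, chosen only to echo the abstract's variables; they are not the authors' model.

```python
# Toy Fuzzy Cognitive Map (FCM): concepts influence each other through a
# weight matrix; activations are iterated through a sigmoid until stable.
import numpy as np

def sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + np.exp(-lam * x))

concepts = ["mobile_usage", "desktop_usage", "referral_traffic", "brand_name"]
# W[i, j]: influence of concept i on concept j (hypothetical values)
W = np.array([
    [0.0, 0.0, 0.2, 0.6],
    [0.0, 0.0, 0.5, 0.4],
    [0.0, 0.0, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.0],
])

state = np.array([0.8, 0.5, 0.3, 0.2])   # initial concept activations
for _ in range(20):                       # iterate until the map stabilizes
    state = sigmoid(state @ W + state)    # A(t+1) = f(A(t) + A(t)·W)

print(dict(zip(concepts, state.round(3))))
```

Agent-Based Modeling, the study's second tool, would instead simulate individual users whose device choices aggregate into these concept values.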

45 pages, 10033 KiB  
Article
AI Based Emotion Detection for Textual Big Data: Techniques and Contribution
by Sheetal Kusal, Shruti Patil, Ketan Kotecha, Rajanikanth Aluvalu and Vijayakumar Varadarajan
Big Data Cogn. Comput. 2021, 5(3), 43; https://doi.org/10.3390/bdcc5030043 - 09 Sep 2021
Cited by 30 | Viewed by 13268
Abstract
Online Social Media (OSM) platforms such as Facebook and Twitter have emerged as powerful tools for people to express, via text, their opinions and feelings about current events. Understanding the emotions in these expressed thoughts at a fine-grained level is important for system improvement. Such crucial insights cannot be fully obtained through AI-based big data sentiment analysis alone; hence, text-based emotion detection using AI on social media big data has become an emerging area of Natural Language Processing research. It can be applied in various fields such as understanding expressed emotions, human–computer interaction, data mining, online education, recommendation systems, and psychology. Even though research is ongoing in this domain, it still lacks a formal study that gives a qualitative (techniques used) and quantitative (contributions) literature overview. This study analyzed 827 Scopus and 83 Web of Science research papers from the years 2005–2020. The qualitative review covers the different emotion models, datasets, algorithms, and application domains of text-based emotion detection. The quantitative bibliometric review of contributions presents research details such as publication volume, co-authorship networks, citation analysis, and demographic research distribution. Finally, challenges and probable solutions are showcased, which can provide future research directions in this area. Full article
(This article belongs to the Special Issue Big Data and Internet of Things)
Show Figures

Figure 1

28 pages, 3163 KiB  
Article
Indoor Localization for Personalized Ambient Assisted Living of Multiple Users in Multi-Floor Smart Environments
by Nirmalya Thakur and Chia Y. Han
Big Data Cogn. Comput. 2021, 5(3), 42; https://doi.org/10.3390/bdcc5030042 - 08 Sep 2021
Cited by 21 | Viewed by 4173
Abstract
This paper presents a multifunctional interdisciplinary framework that makes four scientific contributions towards the development of personalized ambient assisted living (AAL), with a specific focus on addressing the different and dynamic needs of the diverse aging population in future smart living environments. First, it presents a probabilistic reasoning-based mathematical approach to model all possible forms of user interaction for any activity arising from the diversity of multiple users in such environments. Second, it presents a system that uses this approach with a machine learning method to model individual user profiles and user-specific interactions for detecting the dynamic indoor location of each specific user. Third, to address the need for highly accurate indoor localization systems that foster trust, reliance, and seamless user acceptance, the framework introduces a novel methodology in which two boosting approaches, Gradient Boosting and AdaBoost, are integrated with a decision-tree-based learning model to perform indoor localization. Fourth, the framework introduces two novel functionalities that provide semantic context to indoor localization: detecting each user’s floor-specific location and tracking whether a specific user is inside or outside a given spatial region in a multi-floor indoor setting. These functionalities were tested on a dataset of localization-related Big Data collected from 18 different users who navigated 3 buildings comprising 5 floors and 254 indoor spatial regions, with the aim of addressing the lack of training data from diverse users that limited prior works in this field. The results show that this approach of indoor localization for personalized AAL, which models each specific user, consistently achieves higher accuracy than the traditional approach of modeling an average user.
The results further demonstrate that the proposed framework outperforms all prior works in this field in terms of functionalities, performance characteristics, and operational features. Full article
(This article belongs to the Special Issue Advanced Data Mining Techniques for IoT and Big Data)
Show Figures

Figure 1
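One way to combine the two boosting approaches the abstract names is a soft-voting ensemble over an AdaBoost and a Gradient Boosting classifier, both of which use decision trees as base learners. The synthetic "sensor features to spatial region" task and the voting integration below are assumptions for illustration; the paper's actual integration scheme may differ.

```python
# Sketch: AdaBoost + Gradient Boosting (both decision-tree based) combined
# through soft voting, on a synthetic multi-class "localization" task.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              VotingClassifier)
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 8 sensor features, 3 spatial-region classes
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# AdaBoost's default base learner is a shallow decision tree
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
gb = GradientBoostingClassifier(n_estimators=50, random_state=0)

ensemble = VotingClassifier([("ada", ada), ("gb", gb)], voting="soft")
ensemble.fit(X_tr, y_tr)
print(round(ensemble.score(X_te, y_te), 2))
```

Floor detection and region containment, the framework's two semantic functionalities, would sit on top of such a region classifier.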

21 pages, 476 KiB  
Review
A Review of Artificial Intelligence, Big Data, and Blockchain Technology Applications in Medicine and Global Health
by Supriya M. and Vijay Kumar Chattu
Big Data Cogn. Comput. 2021, 5(3), 41; https://doi.org/10.3390/bdcc5030041 - 06 Sep 2021
Cited by 52 | Viewed by 11781
Abstract
Artificial intelligence (AI) programs are applied to tasks such as diagnostic procedures, treatment protocol development, patient monitoring, drug development, and personalized medicine in healthcare, as well as outbreak prediction in global health, as in the case of the current COVID-19 pandemic. Machine learning (ML) is a field of AI that allows computers to learn and improve without being explicitly programmed. ML algorithms can also analyze large amounts of data, called Big Data, from electronic health records for disease prevention and diagnosis. Wearable medical devices are used to continuously monitor an individual’s health status and store the data in the cloud. In the context of recently published studies, this review discusses the potential benefits of sophisticated data analytics and machine learning. We conducted a literature search in popular databases and search engines, including Web of Science, Scopus, MEDLINE/PubMed, and Google Scholar. This paper describes the concepts underlying ML, Big Data, and blockchain technology and their importance in medicine, healthcare, public health surveillance, and case estimation in the COVID-19 pandemic and other epidemics. The review also examines the possible consequences of, and difficulties for, medical practitioners and health technologists in designing futuristic models to improve the quality and well-being of human lives. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: 5th Anniversary Feature Papers)
Show Figures

Figure 1

16 pages, 1130 KiB  
Article
A Novel Approach to Learning Models on EEG Data Using Graph Theory Features—A Comparative Study
by Bhargav Prakash, Gautam Kumar Baboo and Veeky Baths
Big Data Cogn. Comput. 2021, 5(3), 39; https://doi.org/10.3390/bdcc5030039 - 28 Aug 2021
Cited by 4 | Viewed by 5338
Abstract
Brain connectivity is studied as a functionally connected network using statistical methods such as correlation or covariance. Signals from non-invasive neuroimaging techniques such as Electroencephalography (EEG) are converted to networks by transforming them into a correlation matrix and analyzing the resulting networks. Here, four learning models, namely Logistic Regression, Random Forest, Support Vector Machine, and Recurrent Neural Networks (RNN), are implemented on two different types of correlation matrices: the Correlation Matrix (static connectivity) and the Time-resolved Correlation Matrix (dynamic connectivity), to classify subjects either by their psychometric assessment or by the effect of therapy. This approach differs from traditional learning techniques in that it incorporates graph-theory-based features into the learning models, which provides the novelty of this study. The EEG data used in this study are trial-based/event-related, drawn from five different experimental paradigms that can be broadly classified as working memory tasks and assessments of emotional states (depression, anxiety, and stress). The RNN-based classifications provided higher accuracy (74–88%) than the other three models (50–78%). Instead of using individual graph features, the Correlation Matrix provides an initial test of the data; compared with the Time-resolved Correlation Matrix, it offered 4–5% higher accuracy. Although the Time-resolved Correlation Matrix is better suited to dynamic studies, here it yielded lower accuracy than the static Correlation Matrix. Full article
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)
Show Figures

Figure 1
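The signal-to-network step the abstract describes can be sketched directly: correlate the channels, threshold the correlation matrix into an adjacency matrix, and feed a simple graph feature (node degree) to a classifier. The synthetic signals, the 0.5 threshold, and the choice of degree as the sole graph feature are assumptions for illustration; the study uses richer graph-theory features and real EEG trials.

```python
# Sketch: EEG-like signals -> correlation matrix -> thresholded graph ->
# node-degree features -> classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def graph_features(signals, threshold=0.5):
    """signals: (channels, samples) -> degree of each node in the graph."""
    corr = np.corrcoef(signals)
    adj = (np.abs(corr) > threshold).astype(int)
    np.fill_diagonal(adj, 0)           # no self-loops
    return adj.sum(axis=1)             # node degrees

# Two synthetic "conditions": strongly correlated channels vs. independent noise
def make_trial(correlated):
    base = rng.standard_normal(256)
    if correlated:
        return np.stack([base + 0.3 * rng.standard_normal(256)
                         for _ in range(8)])
    return rng.standard_normal((8, 256))

X = np.array([graph_features(make_trial(i < 30)) for i in range(60)])
y = np.array([1] * 30 + [0] * 30)

clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))
```

A time-resolved variant would compute such features over sliding windows rather than the whole trial.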

15 pages, 355 KiB  
Article
Comparing Swarm Intelligence Algorithms for Dimension Reduction in Machine Learning
by Gabriella Kicska and Attila Kiss
Big Data Cogn. Comput. 2021, 5(3), 36; https://doi.org/10.3390/bdcc5030036 - 13 Aug 2021
Cited by 14 | Viewed by 4549
Abstract
Nowadays, the high dimensionality of data causes a variety of problems in machine learning. It is necessary to reduce the number of features by selecting only the most relevant ones, a task known as Feature Selection. In this paper, we propose a Feature Selection method that uses Swarm Intelligence techniques, which perform optimization by searching for optimal points in the search space. We show the usability of these techniques for solving Feature Selection and compare the performance of five major swarm algorithms: Particle Swarm Optimization, Artificial Bee Colony, Invasive Weed Optimization, Bat Algorithm, and Grey Wolf Optimizer. The accuracy of a decision tree classifier was used to evaluate the algorithms. It turned out that the dimensionality of the data can be roughly halved without a loss in accuracy; moreover, the accuracy increased when redundant features were abandoned. Based on our experiments, GWO turned out to be the best: it has the highest ranking across different datasets, and its average number of iterations to find the best solution is 30.8. ABC obtained the lowest ranking on high-dimensional datasets. Full article
Show Figures

Figure 1
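Swarm-based feature selection of the kind compared here can be sketched as a population of binary feature masks scored by a decision tree's cross-validated accuracy, with particles drifting toward the best mask found so far. This is a deliberately simplified toy, not PSO, ABC, IWO, BA, or GWO proper; the copy/flip probabilities and the breast-cancer dataset are assumptions for illustration.

```python
# Toy swarm search over binary feature masks; fitness is a decision tree's
# 3-fold cross-validated accuracy on the selected features.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    if not mask.any():
        return 0.0
    tree = DecisionTreeClassifier(random_state=0)
    return cross_val_score(tree, X[:, mask], y, cv=3).mean()

swarm = rng.random((10, n_features)) < 0.5     # 10 random feature masks
best = max(swarm, key=fitness).copy()
best_fit = fitness(best)

for _ in range(15):                            # swarm iterations
    for i, particle in enumerate(swarm):
        copy = rng.random(n_features) < 0.3    # drift toward the global best
        candidate = np.where(copy, best, particle)
        flip = rng.random(n_features) < 0.05   # random flips keep exploring
        swarm[i] = candidate ^ flip
        f = fitness(swarm[i])
        if f > best_fit:
            best, best_fit = swarm[i].copy(), f

print(int(best.sum()), "features selected; accuracy", round(best_fit, 3))
```

The real algorithms differ mainly in how this "drift toward the best" step is defined (velocities in PSO, scout bees in ABC, encircling in GWO, and so on).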

20 pages, 1241 KiB  
Article
Big Data Research in Fighting COVID-19: Contributions and Techniques
by Dianadewi Riswantini, Ekasari Nugraheni, Andria Arisal, Purnomo Husnul Khotimah, Devi Munandar and Wiwin Suwarningsih
Big Data Cogn. Comput. 2021, 5(3), 30; https://doi.org/10.3390/bdcc5030030 - 12 Jul 2021
Cited by 12 | Viewed by 7051
Abstract
The COVID-19 pandemic has caused many problems in various sectors of human life. After more than one year of the pandemic, many studies have been conducted to discover technological innovations and applications to combat the virus that has claimed many lives, and the use of Big Data technology to mitigate the threats of the pandemic has accelerated. Therefore, this survey explores Big Data technology research in fighting the pandemic. The relevance of Big Data technology is analyzed, and technological contributions to five main areas are highlighted: healthcare, social life, government policy, business and management, and the environment. Analytical techniques from machine learning, deep learning, statistics, and mathematics for solving pandemic-related issues are discussed. The data sources used in previous studies are also presented; they consist of government official releases, institutional services, IoT-generated data, online media, and open data. This study thus presents the role of Big Data technologies in enhancing COVID-19 research, provides insights into the current state of knowledge within the domain, and offers references for further development or for starting new studies. Full article
(This article belongs to the Special Issue Advanced Data Mining Techniques for IoT and Big Data)
Show Figures

Figure 1

29 pages, 549 KiB  
Article
Big Data and the United Nations Sustainable Development Goals (UN SDGs) at a Glance
by Hossein Hassani, Xu Huang, Steve MacFeely and Mohammad Reza Entezarian
Big Data Cogn. Comput. 2021, 5(3), 28; https://doi.org/10.3390/bdcc5030028 - 28 Jun 2021
Cited by 45 | Viewed by 14843
Abstract
The launch of the United Nations (UN) 17 Sustainable Development Goals (SDGs) in 2015 was a historic event, uniting countries worldwide around the shared agenda of sustainable development and a more balanced relationship between human beings and the planet. The SDGs affect almost all aspects of life, as does the technological revolution empowered by Big Data and its related technologies. It is inevitable that these two significant domains, and their integration, will play central roles in achieving the 2030 Agenda. This research provides a comprehensive overview of how these domains currently interact by illustrating the impact of Big Data on sustainable development in the context of each of the 17 UN SDGs. Full article
(This article belongs to the Special Issue Big Data and UN Sustainable Development Goals (SDGs))
Show Figures

Figure 1
