Editor’s Choice Articles

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.

23 pages, 735 KiB  
Review
Explore Big Data Analytics Applications and Opportunities: A Review
by Zaher Ali Al-Sai, Mohd Heikal Husin, Sharifah Mashita Syed-Mohamad, Rasha Moh’d Sadeq Abdin, Nour Damer, Laith Abualigah and Amir H. Gandomi
Big Data Cogn. Comput. 2022, 6(4), 157; https://doi.org/10.3390/bdcc6040157 - 14 Dec 2022
Cited by 11 | Viewed by 7608
Abstract
Big data applications and analytics are vital to sound strategic decision-making. The existing literature emphasizes that big data applications and analytics empowered those who applied them during the COVID-19 pandemic. This paper reviews the literature on big data applications before and during COVID-19 and presents a comparison of how such applications were used pre- and peri-pandemic. The comparison covers four highly recognized industry fields: healthcare, education, transportation, and banking. The effectiveness of the four major types of data analytics across these industries is also discussed. The paper thus provides an illustrative account of the importance of big data applications in the era of COVID-19 and aligns the applications with their relevant big data analytics models. The review concludes that applying the right big data applications and their associated analytics models can help organizations overcome the significant limitations they faced during one of the most fateful pandemics worldwide. Future work will conduct a systematic literature review and a comparative analysis of existing big data systems and models, and will investigate the critical challenges of big data analytics and applications during the COVID-19 pandemic.

31 pages, 6664 KiB  
Review
Machine Learning Styles for Diabetic Retinopathy Detection: A Review and Bibliometric Analysis
by Shyamala Subramanian, Sashikala Mishra, Shruti Patil, Kailash Shaw and Ebrahim Aghajari
Big Data Cogn. Comput. 2022, 6(4), 154; https://doi.org/10.3390/bdcc6040154 - 12 Dec 2022
Cited by 7 | Viewed by 6341
Abstract
Diabetic retinopathy (DR) is a medical condition caused by diabetes, and its development significantly depends on how long a person has had the disease. Initially, there may be no symptoms or only a slight vision problem due to impairment of the retinal blood vessels; later, it may lead to blindness. Recognizing the early clinical signs of DR is very important for intervening in and effectively treating the disease. Regular eye check-ups are therefore necessary to direct the person to a doctor for a comprehensive ocular examination and treatment as soon as possible, to avoid permanent vision loss. Nevertheless, limited resources make such universal screening infeasible. As a result, emerging technologies such as artificial intelligence offer alternative, cost-effective methodologies for the automatic detection and classification of DR, and researchers have been working on such technologies in recent years. This study investigated the different machine learning styles chosen for diagnosing retinopathy. A bibliometric analysis was performed systematically, with data exported from two popular databases, Web of Science (WoS) and Scopus, and analyzed using Biblioshiny and VOSviewer in terms of publications, top countries, sources, subject areas, top authors, trend topics, co-occurrences, thematic evolution, factorial maps, citation analysis, and more. The results form a base from which researchers can identify the research gaps in diabetic retinopathy detection and classification.

17 pages, 1860 KiB  
Article
Explaining Exploration–Exploitation in Humans
by Antonio Candelieri, Andrea Ponti and Francesco Archetti
Big Data Cogn. Comput. 2022, 6(4), 155; https://doi.org/10.3390/bdcc6040155 - 12 Dec 2022
Cited by 1 | Viewed by 1585
Abstract
Human searches, like algorithmic ones, must balance exploration and exploitation. The search task in this paper is the global optimization of a 2D multimodal function that is unknown to the searcher. The task thus presents the following features: (i) uncertainty (information about the function can be acquired only through function observations), (ii) sequentiality (the choice of the next point to observe depends on the previous ones), and (iii) a limited budget (a maximum number of sequential choices allowed to the players). The data about human behavior are gathered through a gaming app whose screen represents all the possible locations the player can click on; the associated value of the unknown function is then shown to the player. Experimental data were gathered from 39 subjects, each playing 10 different tasks. Decisions are analyzed in a Pareto optimality setting of improvement vs. uncertainty. The experimental results show that the most significant deviations from Pareto rationality are associated with a behavior named “exasperated exploration”, close to random search. This behavior shows a statistically significant association with stressful situations that occur when, according to their current beliefs, humans feel there is no chance of improving over the best value observed so far while the remaining budget runs out. To discriminate between Pareto and non-Pareto decisions, an explainable and interpretable machine learning model based on decision tree learning was developed. The resulting model was used to implement a synthetic human searcher/optimizer, which was then compared against Bayesian optimization; on half of the test problems, the synthetic human proved more effective and efficient.
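
To make the improvement-vs.-uncertainty analysis concrete, here is a minimal sketch (not the authors' code; the candidate values are hypothetical) of the Pareto test the abstract describes: a choice is Pareto-rational only if no other candidate offers both higher expected improvement and higher uncertainty.

```python
# Pareto-rationality check over two maximized objectives.
import numpy as np

def pareto_mask(objectives: np.ndarray) -> np.ndarray:
    """Return a boolean mask of non-dominated rows; both columns are maximized."""
    n = objectives.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # row i is dominated if some row is >= on all objectives and > on one
        dominated = np.all(objectives >= objectives[i], axis=1) & \
                    np.any(objectives > objectives[i], axis=1)
        if dominated.any():
            mask[i] = False
    return mask

# Hypothetical candidates: columns = (expected improvement, uncertainty)
candidates = np.array([[0.8, 0.1], [0.5, 0.5], [0.1, 0.9], [0.3, 0.2]])
print(pareto_mask(candidates))  # [ True  True  True False]
```

A decision tree trained on such labeled decisions, as in the paper, would then predict Pareto vs. non-Pareto behavior from the decision's features.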

22 pages, 2447 KiB  
Article
An Advanced Big Data Quality Framework Based on Weighted Metrics
by Widad Elouataoui, Imane El Alaoui, Saida El Mendili and Youssef Gahi
Big Data Cogn. Comput. 2022, 6(4), 153; https://doi.org/10.3390/bdcc6040153 - 09 Dec 2022
Cited by 7 | Viewed by 3020
Abstract
While big data offers numerous benefits, its use requires addressing new challenges related to data processing, data security, and especially degradation of data quality. Despite the increased importance of data quality for big data, its measurement is currently limited to a few metrics: while more than 50 data quality dimensions have been defined in the literature, only 11 of them are actually measured. This paper therefore extends the measured dimensions by defining four new data quality metrics: Integrity, Accessibility, Ease of manipulation, and Security. We thus propose a comprehensive Big Data Quality Assessment Framework based on 12 metrics: Completeness, Timeliness, Volatility, Uniqueness, Conformity, Consistency, Ease of manipulation, Relevancy, Readability, Security, Accessibility, and Integrity. In addition, to ensure an accurate quality assessment, we apply data weights at three data unit levels: data fields, quality metrics, and quality aspects. Furthermore, we define and measure five quality aspects to provide a macro-view of data quality. Finally, an experiment is performed to implement the defined measures. The results show that the suggested methodology allows a more exhaustive and accurate big data quality assessment, defining a weighted quality score based on the 12 metrics and achieving a best quality model score of 9/10.
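
As an illustration of the weighting idea, the following sketch aggregates per-metric scores into a single weighted quality score. The metric names follow the abstract, but the scores and weights are invented for the example, not taken from the paper.

```python
# Hypothetical per-metric quality scores on a 0-1 scale.
metric_scores = {"Completeness": 0.96, "Timeliness": 0.88, "Uniqueness": 0.99,
                 "Consistency": 0.91, "Security": 0.85}
# Hypothetical importance weights assigned by the data owner.
metric_weights = {"Completeness": 3, "Timeliness": 1, "Uniqueness": 2,
                  "Consistency": 2, "Security": 2}

total_weight = sum(metric_weights.values())
quality_score = sum(metric_scores[m] * metric_weights[m]
                    for m in metric_scores) / total_weight
print(f"Weighted quality score: {quality_score:.2f}")  # 0.93 here
```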

21 pages, 3495 KiB  
Article
Innovative Business Process Reengineering Adoption: Framework of Big Data Sentiment, Improving Customers’ Service Level Agreement
by Heru Susanto, Aida Sari and Fang-Yie Leu
Big Data Cogn. Comput. 2022, 6(4), 151; https://doi.org/10.3390/bdcc6040151 - 08 Dec 2022
Cited by 2 | Viewed by 2467
Abstract
Social media is now regarded as one of the most valuable sources of data for trend analysis and for guiding innovative business process reengineering. Data made accessible through social media can be used for a variety of purposes, for instance by an entrepreneur who wants to learn more about the market they intend to enter and uncover their consumers’ requirements before launching new products or services. This study applies sentiment analysis and text mining to the social media posts and comments of telecommunication businesses. A proposed framework is used as a guideline and tested for sentiment analysis: lexicon-based sentiment categorization provides the training dataset for a supervised machine learning support vector machine. The results, compared in terms of accuracy and the quantity of true sentiments detected, are very promising. This signifies the usefulness of text mining and sentiment analysis on social media data, while the use of machine learning classifiers for predicting sentiment orientation provides a useful tool for operations and marketing departments. The availability of large amounts of data in this digitally active society is advantageous for sectors such as the telecommunication industry: with text mining and sentiment analysis, these companies can stay two steps ahead with their strategy, build a more cohesive organization that keeps customers happier, and mitigate problems more easily, further supporting the adoption of innovative business process reengineering for service improvements within the telecommunications industry.
(This article belongs to the Special Issue Advanced Data Mining Techniques for IoT and Big Data)
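
A minimal sketch of the bootstrapping pipeline the abstract outlines, assuming the vaderSentiment and scikit-learn packages; the posts are toy examples, not the study's data. Lexicon scores provide pseudo-labels on which a supervised SVM is then trained.

```python
# Lexicon-based labels bootstrap a supervised SVM classifier.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

posts = ["great coverage and fast support!",
         "network is down again, awful service"]

analyzer = SentimentIntensityAnalyzer()
labels = ["positive" if analyzer.polarity_scores(p)["compound"] >= 0
          else "negative" for p in posts]   # lexicon-based pseudo-labels

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(posts)
svm = LinearSVC().fit(X, labels)            # supervised model on lexicon labels
print(svm.predict(vectorizer.transform(["the new plan is great"])))
```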

41 pages, 6572 KiB  
Review
A Systematic Literature Review on Diabetic Retinopathy Using an Artificial Intelligence Approach
by Pooja Bidwai, Shilpa Gite, Kishore Pahuja and Ketan Kotecha
Big Data Cogn. Comput. 2022, 6(4), 152; https://doi.org/10.3390/bdcc6040152 - 08 Dec 2022
Cited by 9 | Viewed by 8474
Abstract
Diabetic retinopathy occurs due to long-term diabetes with fluctuating blood glucose levels and has become the most common cause of vision loss worldwide. It is a severe problem among the working-age group that needs to be addressed early to avoid future vision loss. Artificial intelligence-based technologies have been used to detect and grade diabetic retinopathy at an initial stage; early detection allows for proper treatment, so that eyesight complications can be avoided. This in-depth analysis details the various methods for diagnosing diabetic retinopathy using blood vessels, microaneurysms, exudates, the macula, optic discs, and hemorrhages. Most studies use fundus images of the retina, taken with a fundus camera. This survey discusses the basics of diabetes, its prevalence and complications, and artificial intelligence approaches for the early detection and classification of diabetic retinopathy, covering machine learning and deep learning as well as newer research directions such as transfer learning using generative adversarial networks, domain adaptation, multitask learning, and explainable artificial intelligence. A list of existing datasets, screening systems, performance measurements, and biomarkers in diabetic retinopathy, along with potential issues and challenges faced in ophthalmology, is discussed, followed by the future scope and conclusions. To the authors’ knowledge, no other review has analyzed recent state-of-the-art techniques with the PRISMA approach and artificial intelligence at its core.

16 pages, 4442 KiB  
Article
Yolov5 Series Algorithm for Road Marking Sign Identification
by Christine Dewi, Rung-Ching Chen, Yong-Cun Zhuang and Henoch Juli Christanto
Big Data Cogn. Comput. 2022, 6(4), 149; https://doi.org/10.3390/bdcc6040149 - 07 Dec 2022
Cited by 10 | Viewed by 4777
Abstract
Road markings and signs provide vehicles and pedestrians with essential information that helps them follow traffic regulations. Road surface markings, which are usually painted directly onto the road surface, include pedestrian crossings, directional arrows, zebra crossings, speed limit markings, and similar signs and text. They fulfill a variety of important functions, such as alerting drivers to potentially hazardous road sections, directing traffic, prohibiting certain actions, and prompting drivers to slow down. This paper summarizes the Yolov5 algorithm series for road marking sign identification, covering Yolov5s, Yolov5m, Yolov5n, Yolov5l, and Yolov5x, and explores a range of contemporary object detectors used to locate road marking signs. Performance is tracked with several metrics, including BFLOPS, mean average precision (mAP), intersection over union (IoU), and detection time. Our findings show that Yolov5m is the most stable method, with 76% precision, 86% recall, and 83% mAP during the training stage, while Yolov5m and Yolov5l achieve the highest testing-stage score, with an average mAP of 87%. In addition, we created a new dataset for road marking signs in Taiwan, called TRMSD.
(This article belongs to the Special Issue Computational Collective Intelligence with Big Data–AI Society)
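
For readers who want to try a Yolov5 variant, here is a minimal sketch of inference via PyTorch Hub. Hedged: this loads the public COCO-pretrained yolov5m from the ultralytics/yolov5 repository, not the TRMSD road-marking weights, and 'road_scene.jpg' is a hypothetical image path.

```python
# Load a pretrained Yolov5m and run it on one image (downloads repo + weights).
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5m", pretrained=True)
results = model("road_scene.jpg")   # preprocessing and NMS are built in
results.print()                     # per-class detection counts and speed
print(results.xyxy[0])              # tensor of [x1, y1, x2, y2, conf, class]
```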

13 pages, 879 KiB  
Article
Trust-Based Data Communication in Wireless Body Area Network for Healthcare Applications
by Sangeetha Ramaswamy and Usha Devi Gandhi
Big Data Cogn. Comput. 2022, 6(4), 148; https://doi.org/10.3390/bdcc6040148 - 01 Dec 2022
Cited by 3 | Viewed by 1936
Abstract
Wireless Body Area Networks (WBAN), a subset of Wireless Sensor Networks, are an emerging technology. A WBAN is a collection of tiny wireless body sensors with small computational capability that communicate over short distances using ZigBee or Bluetooth, with applications mainly in the healthcare industry, such as remote patient monitoring. The sensors monitor health factors such as body temperature, pulse rate, ECG, and heart rate, and communicate with a base station or central coordinator for aggregation or data computation; the final data are communicated to remote monitoring devices through the Internet or cloud service providers. The main challenges for this technology are energy consumption, secure communication within the network, and the possibility of attacks executed by malicious nodes, which create problems for the network. This work proposes a suitable trust model for secure communication in WBAN based on node trust and data trust. Node trust is calculated using direct trust calculation and node behaviors, while data trust is calculated using consistent data success and data aging. The performance is compared with existing non-cryptographic protocols such as Trust Evaluation (TE)-WBAN and Body Area Network (BAN)-Trust. The protocol is lightweight, has low overhead, and rates best for throughput, packet delivery ratio, and minimum delay. In extensive simulations, on-off attacks, selfishness attacks, sleeper attacks, and message suppression attacks were prevented.
(This article belongs to the Special Issue Computational Collective Intelligence with Big Data–AI Society)
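
The sketch below illustrates the general shape of such a trust computation: direct node trust from past interaction outcomes, and data trust that combines reading consistency with exponential aging. The formulas and weights are hypothetical illustrations, not the paper's exact model.

```python
# Combine node trust (behaviour history) with data trust (consistency + aging).
import math

def node_trust(successes: int, failures: int) -> float:
    # direct trust as the success ratio of past interactions; 0.5 if no history
    total = successes + failures
    return successes / total if total else 0.5

def data_trust(consistent: int, total: int, age_s: float,
               half_life_s: float = 60.0) -> float:
    consistency = consistent / total
    aging = math.exp(-age_s * math.log(2) / half_life_s)  # halves per half-life
    return consistency * aging

trust = 0.6 * node_trust(48, 2) + 0.4 * data_trust(95, 100, age_s=30)
print(f"combined trust: {trust:.2f}")  # forward data only above a threshold
```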

19 pages, 6928 KiB  
Article
Image Fundus Classification System for Diabetic Retinopathy Stage Detection Using Hybrid CNN-DELM
by Dian Candra Rini Novitasari, Fatmawati Fatmawati, Rimuljo Hendradi, Hetty Rohayani, Rinda Nariswari, Arnita Arnita, Moch Irfan Hadi, Rizal Amegia Saputra and Ardhin Primadewi
Big Data Cogn. Comput. 2022, 6(4), 146; https://doi.org/10.3390/bdcc6040146 - 01 Dec 2022
Cited by 7 | Viewed by 2080
Abstract
Diabetic retinopathy (DR) is the leading cause of blindness among working-age adults. The increase in the population diagnosed with DR can be countered by screening and early treatment of eye damage, and this screening process can be supported by deep learning techniques. In this study, the severity of DR was detected using the hybrid CNN-DELM method (CDELM). The CNN architectures used were ResNet-18, ResNet-50, ResNet-101, GoogleNet, and DenseNet, and the learned features were classified using the DELM algorithm. The comparison of CNN architectures aimed to find the best architecture for extracting features from fundus images, and the research also compared the effect of different kernel functions on DELM’s classification performance. All experiments using CDELM achieved maximum results, with an accuracy of 100% on the DRIVE data and on the two-class MESSIDOR data, while the best result on the four-class MESSIDOR data reached 98.20%. The advantage of the DELM method over conventional CNN training is a much shorter training time: CNNs take an average of 30 min to train, while the CDELM method takes only an average of 2.5 min. Based on accuracy and training time, the CDELM method outperformed the conventional CNN method.
(This article belongs to the Topic Machine and Deep Learning)
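
The speed advantage comes from the extreme-learning-machine idea: the hidden layer is random and only the output layer is solved in closed form, with no gradient descent. Below is a minimal sketch of that idea (not the authors' CDELM code) applied to hypothetical CNN feature vectors.

```python
# Basic ELM: random hidden layer + least-squares output weights.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 512))        # stand-in for ResNet feature vectors
y = rng.integers(0, 4, size=200)       # 4 DR severity classes
Y = np.eye(4)[y]                       # one-hot targets

n_hidden = 256
W = rng.normal(size=(512, n_hidden))   # random, never-trained input weights
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)                 # hidden-layer activations
beta = np.linalg.pinv(H) @ Y           # output weights in closed form

pred = np.argmax(H @ beta, axis=1)
print("train accuracy:", (pred == y).mean())
```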

20 pages, 6697 KiB  
Article
Image Segmentation for Mitral Regurgitation with Convolutional Neural Network Based on UNet, Resnet, Vnet, FractalNet and SegNet: A Preliminary Study
by Linda Atika, Siti Nurmaini, Radiyati Umi Partan and Erwin Sukandi
Big Data Cogn. Comput. 2022, 6(4), 141; https://doi.org/10.3390/bdcc6040141 - 25 Nov 2022
Cited by 5 | Viewed by 2282
Abstract
The mitral valve separates the left atrium from the left ventricle of the heart. Heart valve disease is fairly common, and one type is mitral regurgitation, an abnormality of the mitral valve on the left side of the heart that prevents the valve from closing properly. The Convolutional Neural Network (CNN) is a type of deep learning well suited to image analysis, and segmentation is widely used in analyzing medical images because it divides an image into simpler parts, separating objects that are not analyzed into the background and objects to be analyzed into the foreground. This study builds a dataset from patients with mitral regurgitation and patients with normal hearts, and analyzes the heart valve images by segmenting the mitral valves. Several CNN architectures were applied, including U-Net, SegNet, V-Net, FractalNet, and ResNet. The experimental results show that the best architecture is U-Net3 in terms of pixel accuracy (97.59%), intersection over union (86.98%), mean accuracy (93.46%), precision (85.60%), recall (88.39%), and Dice coefficient (86.58%).
(This article belongs to the Special Issue Advancements in Deep Learning and Deep Federated Learning Models)
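
For reference, the two overlap metrics reported above are computed as follows; this is a generic sketch on hypothetical binary masks, not the study's evaluation code.

```python
# Dice coefficient and IoU on binary segmentation masks.
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    return 2 * inter / (pred.sum() + gt.sum())

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(dice(pred, gt), iou(pred, gt))  # 0.666..., 0.5
```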

13 pages, 367 KiB  
Article
PSO-Driven Feature Selection and Hybrid Ensemble for Network Anomaly Detection
by Maya Hilda Lestari Louk and Bayu Adhi Tama
Big Data Cogn. Comput. 2022, 6(4), 137; https://doi.org/10.3390/bdcc6040137 - 13 Nov 2022
Cited by 6 | Viewed by 2156
Abstract
As a system capable of monitoring and evaluating illegitimate network access, the intrusion detection system (IDS) profoundly impacts information security research. Although machine learning techniques constitute the backbone of IDS, developing an accurate detection mechanism remains challenging. This study aims to enhance the detection performance of IDS by using a particle swarm optimization (PSO)-driven feature selection approach and a hybrid ensemble. Specifically, the final feature subsets derived from different IDS datasets, i.e., NSL-KDD, UNSW-NB15, and CICIDS-2017, are trained using a hybrid ensemble comprising two well-known ensemble learners: the gradient boosting machine (GBM) and bootstrap aggregation (bagging). Instead of training the GBM as a single learner, we train a GBM on a subsample of each intrusion dataset and combine the final class predictions using majority voting. Our proposed scheme yields pivotal refinements over existing baselines, such as TSE-IDS, voting ensembles, weighted majority voting, and other individual ensemble-based IDS such as LightGBM.
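
Bagging a GBM captures the hybrid-ensemble idea: several boosted models, each trained on a bootstrap subsample, with votes aggregated at prediction time. The sketch below (scikit-learn assumed, synthetic data, not the authors' exact configuration) shows one way to set this up.

```python
# Hybrid ensemble: bagging over gradient boosting machines.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ensemble = BaggingClassifier(
    GradientBoostingClassifier(n_estimators=50),  # GBM base learner
    n_estimators=10,     # 10 GBMs, each on its own bootstrap subsample
    max_samples=0.5,     # each sees half of the training data
    random_state=0,
).fit(X_tr, y_tr)        # predictions are aggregated by voting/averaging

print("accuracy:", ensemble.score(X_te, y_te))
```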

24 pages, 718 KiB  
Review
An Overview of Data Warehouse and Data Lake in Modern Enterprise Data Management
by Athira Nambiar and Divyansh Mundra
Big Data Cogn. Comput. 2022, 6(4), 132; https://doi.org/10.3390/bdcc6040132 - 07 Nov 2022
Cited by 25 | Viewed by 29521
Abstract
Data is the lifeblood of any organization. In today’s world, organizations recognize the vital role of data in modern business intelligence systems for making meaningful decisions and staying competitive in the field, as efficient and optimal data analytics provides a competitive edge in performance and services. Major organizations generate, collect, and process vast amounts of data that fall under the category of big data, and managing and analyzing its sheer volume and variety is a cumbersome process. At the same time, proper utilization of an organization’s vast collection of information can generate meaningful insights into business tactics. In this regard, two popular data management systems in the area of big data analytics, the data warehouse and the data lake, act as platforms to accumulate the big data generated and used by organizations. Although seemingly similar, the two differ in their characteristics and applications. This article presents a detailed overview of the roles of data warehouses and data lakes in modern enterprise data management. We detail the definitions, characteristics, and related work for the respective data management frameworks, explain the architecture and design considerations of the current state of the art, and finally provide a perspective on the challenges and promising research directions for the future.

19 pages, 686 KiB  
Article
THOR: A Hybrid Recommender System for the Personalized Travel Experience
by Alireza Javadian Sabet, Mahsa Shekari, Chaofeng Guan, Matteo Rossi, Fabio Schreiber and Letizia Tanca
Big Data Cogn. Comput. 2022, 6(4), 131; https://doi.org/10.3390/bdcc6040131 - 04 Nov 2022
Cited by 2 | Viewed by 3527
Abstract
One of travelers’ main challenges is that they must spend great effort to find and choose the most desirable travel offer(s) from a vast list of non-categorized, non-personalized items. Recommendation systems provide an effective way to solve this problem of information overload. In this work, we design and implement “The Hybrid Offer Ranker” (THOR), a hybrid, personalized recommender system for the transportation domain. THOR assigns every traveler a unique contextual preference model built solely from their personal data, which makes the model sensitive to the user’s choices. This model is used to rank the travel offers presented to each user according to their personal preferences. We reduce the recommendation problem to one of binary classification, predicting the probability with which the traveler will buy each available travel offer; offers are then ranked by these probabilities, hence by the user’s personal preference model. Moreover, to tackle the cold start problem for new users, we apply clustering algorithms to identify groups of travelers with similar profiles and build a preference model for each group. To test the system’s performance, we generate a dataset according to carefully designed rules. The results of the experiments show that THOR learns the contextual preferences of each traveler and ranks offers starting from those with the highest probability of being selected.
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)
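
The classify-then-rank reduction looks roughly like the sketch below: a per-user binary classifier scores each offer with a purchase probability, and offers are shown sorted by that score. The features and data here are hypothetical, not THOR's actual model.

```python
# Rank travel offers by predicted purchase probability.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical offer features (price, duration_min, n_changes) + buy/skip labels.
past_offers = np.array([[120, 90, 1], [60, 240, 3], [200, 75, 0], [80, 180, 2]])
bought      = np.array([1, 0, 1, 0])
model = LogisticRegression().fit(past_offers, bought)  # one model per traveler

new_offers = np.array([[110, 100, 1], [70, 220, 2], [150, 80, 0]])
p_buy = model.predict_proba(new_offers)[:, 1]
ranking = np.argsort(-p_buy)                  # most likely purchase first
print(list(zip(ranking, p_buy[ranking])))
```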

29 pages, 620 KiB  
Article
A Space-Time Framework for Sentiment Scope Analysis in Social Media
by Gianluca Bonifazi, Francesco Cauteruccio, Enrico Corradini, Michele Marchetti, Luigi Sciarretta, Domenico Ursino and Luca Virgili
Big Data Cogn. Comput. 2022, 6(4), 130; https://doi.org/10.3390/bdcc6040130 - 03 Nov 2022
Cited by 19 | Viewed by 2746
Abstract
The concept of scope was introduced in Social Network Analysis to assess the authoritativeness and convincing ability of a user toward other users on one or more social platforms. It has been studied in the past in some specific contexts, for example, to assess the ability of a user to spread information on Twitter. In this paper, we propose a new investigation of scope, as we want to assess the scope of the sentiment of a user on a topic. We also propose a multi-dimensional definition of scope: besides the traditional spatial scope, we introduce the temporal one, which has never been addressed in the literature, and propose a model that allows the concept of scope to be extended to further dimensions in the future. Furthermore, we propose an approach and a related set of parameters for measuring the scope of the sentiment of a user on a topic in a social network. Finally, we illustrate the results of an experimental campaign conducted to evaluate the proposed framework on a dataset derived from Reddit. The main novelties of this paper are: (i) a multi-dimensional view of scope; (ii) the introduction of the concept of sentiment scope; (iii) the definition of a general framework capable of analyzing the sentiment scope related to any subject on any social network.

22 pages, 601 KiB  
Review
Facial Age Estimation Using Machine Learning Techniques: An Overview
by Khaled ELKarazle, Valliappan Raman and Patrick Then
Big Data Cogn. Comput. 2022, 6(4), 128; https://doi.org/10.3390/bdcc6040128 - 26 Oct 2022
Cited by 10 | Viewed by 8769
Abstract
Automatic age estimation from facial images is an exciting machine learning topic that has attracted researchers’ attention over the past several years. Numerous human–computer interaction applications, such as targeted marketing, content access control, and soft-biometrics systems, employ age estimation models to carry out secondary tasks such as user filtering or identification. Despite the vast array of applications that could benefit from it, building an automatic age estimation system comes with issues such as data disparity, the unique ageing pattern of each individual, and facial photo quality. This paper surveys the standard methods for building automatic age estimation models, the benchmark datasets used to build them, and some of the latest studies that introduce new age estimation methods. Finally, we present and discuss the standard evaluation metrics used to assess age estimation models. In addition to the survey, we discuss the gaps identified in the reviewed literature and present recommendations for future research.

23 pages, 598 KiB  
Review
Applications and Challenges of Federated Learning Paradigm in the Big Data Era with Special Emphasis on COVID-19
by Abdul Majeed, Xiaohan Zhang and Seong Oun Hwang
Big Data Cogn. Comput. 2022, 6(4), 127; https://doi.org/10.3390/bdcc6040127 - 26 Oct 2022
Cited by 5 | Viewed by 3760
Abstract
Federated learning (FL) is one of the leading paradigms of modern times, with higher privacy guarantees than any other digital solution. Since its inception in 2016, FL has been rigorously investigated from multiple perspectives, including extensions of its applications to different sectors, communication overheads, statistical heterogeneity problems, client dropout issues, the legitimacy of FL system results, and privacy preservation. Recently, FL has been increasingly used in the medical domain for multiple purposes, and many successful applications exist that serve mankind in various ways. In this work, we describe the novel applications and challenges of the FL paradigm with special emphasis on the COVID-19 pandemic. We describe the synergies of FL with other emerging technologies in accomplishing multiple services to fight the pandemic, analyze the recent open-source development of FL that can help in designing scalable and reliable FL models, and, lastly, suggest valuable recommendations to enhance the technical persuasiveness of the FL paradigm. To the best of the authors’ knowledge, this is the first work that highlights the efficacy of FL in the era of COVID-19, and the analysis enclosed in this article can pave the way to understanding the technical efficacy of FL in the medical field, specifically for COVID-19.
(This article belongs to the Special Issue Cyber Security in Big Data Era)

20 pages, 3064 KiB  
Article
Explaining Intrusion Detection-Based Convolutional Neural Networks Using Shapley Additive Explanations (SHAP)
by Remah Younisse, Ashraf Ahmad and Qasem Abu Al-Haija
Big Data Cogn. Comput. 2022, 6(4), 126; https://doi.org/10.3390/bdcc6040126 - 25 Oct 2022
Cited by 13 | Viewed by 2840
Abstract
Artificial intelligence (AI) and machine learning (ML) models have become essential tools used in many critical systems to make significant decisions, and the decisions taken by these models often need to be trusted and explained. At the same time, the performance of different ML and AI models varies even on the same dataset, and developers have often tried multiple models before deciding which one to use, without understanding the reasons behind this variance in performance. Explainable artificial intelligence (XAI) models explain a model’s performance by highlighting the features the model considered necessary when making its decision. This work presents an analytical approach to studying the density functions of intrusion detection dataset features; the study explains how and why these features are essential during the XAI process. We aim in this study to explain XAI behavior in order to add an extra layer of explainability. The density function analysis presented in this paper adds a deeper understanding of the importance of features in different AI models. Specifically, we present a method to explain the results of SHAP (Shapley additive explanations) for different machine learning models based on kernel density estimation (KDE) plots of the feature data. We also survey the specifications of dataset features that perform better for convolutional neural network (CNN)-based models.
(This article belongs to the Special Issue Machine Learning for Dependable Edge Computing Systems and Services)
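
The two ingredients combined above, SHAP attributions and feature KDEs, can be produced with a few library calls. The sketch below uses synthetic data and a random forest rather than the paper's IDS datasets and CNNs, purely as an illustration of the mechanics.

```python
# SHAP feature attributions paired with a KDE of one feature's distribution.
import numpy as np
import shap
from scipy.stats import gaussian_kde
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

shap_values = shap.TreeExplainer(model).shap_values(X)  # per-feature attributions

kde = gaussian_kde(X[:, 0])          # density of feature 0 across the dataset
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 5)
print(kde(grid))                     # compare density shape with SHAP importance
```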

15 pages, 1058 KiB  
Article
White Blood Cell Classification Using Multi-Attention Data Augmentation and Regularization
by Nasrin Bayat, Diane D. Davey, Melanie Coathup and Joon-Hyuk Park
Big Data Cogn. Comput. 2022, 6(4), 122; https://doi.org/10.3390/bdcc6040122 - 21 Oct 2022
Cited by 7 | Viewed by 4392
Abstract
Accurate and robust assessment of the human immune system through white blood cell evaluation requires computer-aided tools with pathologist-level accuracy. This work presents a multi-attention leukocyte subtype classification method that leverages the fine-grained and spatially local attributes of white blood cells. The proposed framework comprises three main components: texture-aware/attention map generation blocks, attention regularization, and attention-based data augmentation. The framework is applicable to general CNN-based architectures and enhances decision making by paying specific attention to the discriminative regions of a white blood cell. The performance of the proposed method was evaluated through an extensive set of experiments and validation; the results demonstrate the superior performance of the model, which achieved 99.69% accuracy, compared to other state-of-the-art approaches. The proposed model is a good alternative and complement to existing computer diagnosis tools, helping pathologists evaluate white blood cells from blood smear images.
(This article belongs to the Special Issue Data Science in Health Care)

24 pages, 1286 KiB  
Article
Ontology-Based Personalized Job Recommendation Framework for Migrants and Refugees
by Dimos Ntioudis, Panagiota Masa, Anastasios Karakostas, Georgios Meditskos, Stefanos Vrochidis and Ioannis Kompatsiaris
Big Data Cogn. Comput. 2022, 6(4), 120; https://doi.org/10.3390/bdcc6040120 - 19 Oct 2022
Cited by 7 | Viewed by 2551
Abstract
Participation in the labor market is seen as the most important factor favoring the long-term integration of migrants and refugees into society. This paper describes the job recommendation framework of the Integration of Migrants MatchER SErvice (IMMERSE). The proposed framework acts as a matching tool that captures the contexts of individual migrants and refugees, including their expectations, languages, educational background, previous job experience, and skills, in an ontology and facilitates their matching with the job opportunities available in their host country. Profile information and job listings are processed in real time in the back-end, and matches are revealed in the front-end. Moreover, the matching tool considers users’ activity on the platform to provide recommendations based on the similarity between existing jobs they have already shown interest in and new jobs posted on the platform. Finally, the framework takes the location of the users into account to rank the results and only shows the most relevant location-based recommendations.
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)

14 pages, 307 KiB  
Article
A Survey on Medical Image Segmentation Based on Deep Learning Techniques
by Jayashree Moorthy and Usha Devi Gandhi
Big Data Cogn. Comput. 2022, 6(4), 117; https://doi.org/10.3390/bdcc6040117 - 17 Oct 2022
Cited by 18 | Viewed by 4902
Abstract
Deep learning techniques have rapidly become a preferred method for medical image segmentation. This survey analyzes the different contributions to the deep learning medical field, including the major common issues published in recent years, and also discusses the fundamentals of deep learning concepts applicable to medical image segmentation. Deep learning can be applied to image categorization, object recognition, segmentation, registration, and other tasks. First, the basic ideas of deep learning techniques, applications, and frameworks are introduced, and the deep learning techniques best suited to particular applications are briefly explained. The paper then reviews previous experience with the different techniques used for medical image segmentation. Deep learning has been applied to address various challenges in the field of medical image analysis, such as low accuracy of image classification, low segmentation resolution, and poor image enhancement. Aiming to solve these present issues and drive progress on medical image segmentation challenges, we provide suggestions for future research.
(This article belongs to the Special Issue Computational Collective Intelligence with Big Data–AI Society)

40 pages, 4281 KiB  
Article
A Probabilistic Data Fusion Modeling Approach for Extracting True Values from Uncertain and Conflicting Attributes
by Ashraf Jaradat, Fadi Safieddine, Aziz Deraman, Omar Ali, Ahmad Al-Ahmad and Yehia Ibrahim Alzoubi
Big Data Cogn. Comput. 2022, 6(4), 114; https://doi.org/10.3390/bdcc6040114 - 13 Oct 2022
Cited by 1 | Viewed by 2176
Abstract
Real-world data obtained by integrating heterogeneous data sources are often multi-valued, uncertain, imprecise, error-prone, outdated, and of varying degrees of accuracy and correctness. It is critical to resolve data uncertainty and conflicts to present quality data that reflect actual real-world values; this task is called data fusion. In this paper, we deal with the problem of data fusion based on probabilistic entity linkage and uncertainty management for conflicting data. Data fusion has been widely explored in the research community; however, concerns such as explicit uncertainty management and on-demand data fusion, which can cope with dynamic data sources, have not been well studied. This paper proposes a new probabilistic data fusion modeling approach that attempts to find true data values under conditions of uncertain or conflicting multi-valued attributes. These attributes are generated from the probabilistic linkage and merging alternatives of multi-corresponding entities. Consequently, the paper identifies and formulates several data fusion cases and sample spaces that require further conditional computation using our computational fusion method. The identification is established to fit a real-world data fusion problem in which there is always the possibility of heterogeneous data sources, the integration of probabilistic entities, single or multiple truth values for certain attributes, and different combinations of attribute values as alternatives for each generated entity. We validate our probabilistic data fusion approach through a mathematical representation based on three data sources with different reliability scores, and we assess its validity by implementing it in our probabilistic integration system to show how it can manage and resolve different cases of data conflicts and inconsistencies. The outcome shows improved accuracy in identifying true values due to the association of constructive evidence.
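
The basic intuition behind choosing a true value from conflicting source claims is reliability-weighted voting, sketched below with hypothetical numbers; the paper's probabilistic model goes further, but this shows the core mechanism.

```python
# Reliability-weighted voting over conflicting attribute claims.
from collections import defaultdict

source_reliability = {"A": 0.9, "B": 0.6, "C": 0.5}
claims = {"A": "Paris", "B": "Lyon", "C": "Lyon"}   # conflicting values

support = defaultdict(float)
for source, value in claims.items():
    support[value] += source_reliability[source]

best = max(support, key=support.get)
confidence = support[best] / sum(support.values())
print(best, round(confidence, 2))  # Lyon wins 1.1 vs 0.9 -> confidence 0.55
```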

44 pages, 5439 KiB  
Article
Graph-Based Conversation Analysis in Social Media
by Marco Brambilla, Alireza Javadian Sabet, Kalyani Kharmale and Amin Endah Sulistiawati
Big Data Cogn. Comput. 2022, 6(4), 113; https://doi.org/10.3390/bdcc6040113 - 12 Oct 2022
Cited by 4 | Viewed by 4563
Abstract
Social media platforms offer their audience the possibility to reply to posts through comments and reactions, allowing users to express their ideas and opinions on shared content and thus opening virtual discussions. Most studies on social networks have focused only on user relationships or on the shared content, ignoring the valuable information hidden in digital conversations, in terms of the structure of the discussion and the relation between contents, which is essential for understanding online communication behavior. This work proposes a graph-based framework to assess the shape and structure of online conversations. The analysis comprises two main stages: intent analysis and network generation. Users’ intentions are detected using keyword-based classification, followed by machine learning-based classification algorithms for the uncategorized comments, with a human-in-the-loop step to improve the keyword-based classification. To extract essential information on the communication patterns among users, we build conversation graphs using a directed multigraph network, and we show our model at work in two real-life experiments. The first experiment, on data from a real social media challenge, categorized 90% of comments with 98% accuracy. The second focused on COVID vaccine-related discussions in online forums and investigated stance and sentiment to understand how comments are affected by their parent discussion. Finally, the most popular online discussion patterns were mined and interpreted; the dynamics obtained from conversation graphs turn out to be similar to traditional communication activities.
(This article belongs to the Special Issue Graph-Based Data Mining and Social Network Analysis)
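
A directed multigraph is a natural fit here because the same pair of users can interact repeatedly with different intents. The sketch below (networkx assumed; the thread data is hypothetical) shows the representation at its simplest.

```python
# A post-and-replies thread as a directed multigraph with intent labels.
import networkx as nx

G = nx.MultiDiGraph()
# Edges point from the comment's author to the author being replied to,
# labelled with the detected intent of the comment.
G.add_edge("alice", "brand", intent="question")
G.add_edge("brand", "alice", intent="answer")
G.add_edge("bob", "brand", intent="complaint")
G.add_edge("bob", "brand", intent="complaint")   # parallel edge: second comment

print(G.number_of_edges(), "interactions")
print(dict(G.out_degree()))                      # who drives the conversation
```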

33 pages, 2658 KiB  
Article
Question Answer System: A State-of-Art Representation of Quantitative and Qualitative Analysis
by Bhushan Zope, Sashikala Mishra, Kailash Shaw, Deepali Rahul Vora, Ketan Kotecha and Ranjeet Vasant Bidwe
Big Data Cogn. Comput. 2022, 6(4), 109; https://doi.org/10.3390/bdcc6040109 - 07 Oct 2022
Cited by 13 | Viewed by 5738
Abstract
A Question Answer System (QAS) automatically answers questions asked in natural language. Due to the varying dimensions and approaches that are available, QAS has a very diverse solution space, and a proper bibliometric study is required to map the entire domain. This work presents a bibliometric and literature analysis of QAS, drawing on Scopus and Web of Science, two well-known research databases. A systematic analytical study comprising performance analysis and science mapping is performed. Recent research trends, seminal works, and influential authors are identified in the performance analysis using statistical tools on the research constituents, while science mapping is performed using network analysis on citation and co-citation network graphs. Through this analysis, the domain’s conceptual evolution and intellectual structure are shown. We divide the literature into four important architecture types and provide a literature analysis of Knowledge Base (KB)-based and GNN-based approaches to QAS.

13 pages, 2235 KiB  
Article
Deep Learning-Based Computer-Aided Classification of Amniotic Fluid Using Ultrasound Images from Saudi Arabia
by Irfan Ullah Khan, Nida Aslam, Fatima M. Anis, Samiha Mirza, Alanoud AlOwayed, Reef M. Aljuaid, Razan M. Bakr and Nourah Hasan Al Qahtani
Big Data Cogn. Comput. 2022, 6(4), 107; https://doi.org/10.3390/bdcc6040107 - 03 Oct 2022
Cited by 5 | Viewed by 2308
Abstract
Amniotic fluid (AF) is a protective liquid surrounding the fetus inside the amniotic sac that serves multiple purposes and is hence a key indicator of fetal health. Determining AF levels at an early stage helps to ascertain lung maturation and gastrointestinal development, among other things. Low AF entails the risk of premature birth and perinatal mortality, and thereby admission to the intensive care unit (ICU); the AF level is also a critical factor in decisions about early delivery. Hence, AF detection is a vital measurement during early ultrasound (US), and its automation is essential. Detecting AF is usually time-consuming because it is patient-specific, and its measurement and accuracy are prone to errors since they depend heavily on the sonographer’s experience. Automating this process with robust, precise, and effective detection methods would therefore benefit the healthcare community. In this paper, we use transfer learning models to classify AF levels as normal or abnormal from US images. The dataset consisted of 166 US images of pregnant women and was preprocessed before training. Five transfer learning models were applied: Xception, Densenet, InceptionResNet, MobileNet, and ResNet. The results showed that MobileNet achieved an overall accuracy of 0.94. Overall, the proposed study successfully classifies AF levels, building automated, effective models based on transfer learning to aid sonographers in evaluating fetal health.
(This article belongs to the Special Issue Data Science in Health Care)
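
The standard transfer-learning recipe behind results like these is to freeze a pretrained backbone and train only a small classification head. Below is a minimal Keras sketch (not the authors' code; input shape and hyperparameters are illustrative assumptions).

```python
# MobileNet transfer learning for binary normal/abnormal AF classification.
import tensorflow as tf

base = tf.keras.applications.MobileNet(include_top=False, weights="imagenet",
                                       input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                 # keep pretrained ImageNet features frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # normal vs abnormal
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # datasets not shown
```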

17 pages, 935 KiB  
Article
Supporting Meteorologists in Data Analysis through Knowledge-Based Recommendations
by Thoralf Reis, Tim Funke, Sebastian Bruchhaus, Florian Freund, Marco X. Bornschlegl and Matthias L. Hemmje
Big Data Cogn. Comput. 2022, 6(4), 103; https://doi.org/10.3390/bdcc6040103 - 28 Sep 2022
Cited by 2 | Viewed by 2011
Abstract
Climate change means that everybody must cope, directly or indirectly, with extreme weather conditions. Analyzing meteorological data to create precise models is therefore gaining importance and might become inevitable. Meteorologists have extensive domain knowledge about meteorological data yet often lack practical data analysis skills, and this paper presents a method to bridge that gap by empowering the data knowledge carriers to analyze the data themselves. The proposed system utilizes symbolic AI, a knowledge base created by experts, and a recommendation expert system to offer suitable data analysis methods or data pre-processing steps to meteorologists. The paper systematically analyzes the target user group of meteorologists and practical use cases to arrive at a conceptual and technical system design, implemented in the CAMeRI prototype. The concepts are aligned with the AI2VIS4BigData Reference Model and comprise a novel first-order logic knowledge base that represents analysis methods and their related pre-processing. The prototype implementation was evaluated qualitatively and quantitatively, including recommendation validation on real-world data, a cognitive walkthrough, and measurement of the computation timings of the different system components.
(This article belongs to the Topic Big Data and Artificial Intelligence)

42 pages, 4691 KiB  
Article
An Improved African Vulture Optimization Algorithm for Feature Selection Problems and Its Application of Sentiment Analysis on Movie Reviews
by Aitak Shaddeli, Farhad Soleimanian Gharehchopogh, Mohammad Masdari and Vahid Solouk
Big Data Cogn. Comput. 2022, 6(4), 104; https://doi.org/10.3390/bdcc6040104 - 28 Sep 2022
Cited by 14 | Viewed by 2994
Abstract
The African Vulture Optimization Algorithm (AVOA) is inspired by African vultures’ feeding and orienting behaviors. It comprises powerful operators while maintaining the balance of exploration and exploitation in solving optimization problems, but to be used in discrete applications, the algorithm needs to be discretized. This paper introduces two binary versions of the AVOA (BAOVAH) based on S-shaped and V-shaped transfer functions, while avoiding any increase in computational complexity; a disruption operator and a bitwise strategy are also used to maximize the model’s performance. In addition, a multi-strategy version of the AVOA called BAVOA-v1 is presented, in which different strategies such as IPRS, a mutation neighborhood search strategy (MNSS) (balancing exploration and exploitation), multi-parent crossover (increasing exploitation), and the bitwise strategy (increasing diversity and exploration) are used to provide solutions with greater variety and to assure solution quality. The proposed methods are evaluated on 30 UCI datasets of different dimensions. The simulation results show that the proposed BAOVAH algorithm performed better than other binary meta-heuristic algorithms: it was the most accurate on 67% of the datasets and achieved the best fitness value on 93% of them, demonstrating high performance in feature selection. Finally, the proposed method was used in a case study on sentiment analysis of movie reviews, determining the number of neurons and the activation function to improve deep learning results with the designed CNNEM model. Experiments on three sentiment analysis datasets—IMDB, Amazon, and Yelp—show that the BAOVAH algorithm increases the accuracy of the CNNEM network by 6% on IMDB, 33% on Amazon, and 30% on Yelp.
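
For readers unfamiliar with transfer-function binarization, the sketch below shows the generic mechanism: an S-shaped (sigmoid) function maps a continuous position component to a selection probability for each feature. This is a generic illustration, not the paper's full BAOVAH update rule.

```python
# S-shaped transfer function binarizing a continuous position vector.
import numpy as np

rng = np.random.default_rng(0)

def s_shaped(x):
    return 1.0 / (1.0 + np.exp(-x))   # classic sigmoid (S1) transfer function

position = rng.normal(size=10)        # continuous metaheuristic position
prob = s_shaped(position)             # per-feature selection probability
mask = (rng.random(10) < prob).astype(int)
print(mask)                           # 1 = feature kept, 0 = feature dropped
```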

20 pages, 3926 KiB  
Article
An Efficient and Secure Big Data Storage in Cloud Environment by Using Triple Data Encryption Standard
by Mohan Naik Ramachandra, Madala Srinivasa Rao, Wen Cheng Lai, Bidare Divakarachari Parameshachari, Jayachandra Ananda Babu and Kivudujogappa Lingappa Hemalatha
Big Data Cogn. Comput. 2022, 6(4), 101; https://doi.org/10.3390/bdcc6040101 - 26 Sep 2022
Cited by 28 | Viewed by 4346
Abstract
In recent decades, big data analysis has become the most important research topic. Hence, big data security offers Cloud application security and monitoring to host highly sensitive data to support Cloud platforms. However, the privacy and security of big data have become an emerging issue that restricts organizations from utilizing Cloud services. The existing privacy-preserving approaches show several drawbacks, such as a lack of data privacy and accurate data analysis, a lack of performance efficiency, and complete reliance on third parties. In order to overcome such issues, the Triple Data Encryption Standard (TDES) methodology is proposed to provide security for big data in the Cloud environment. The proposed TDES methodology provides a relatively simple technique of increasing the key size of the Data Encryption Standard (DES) to protect against attacks and defend the privacy of data. The experimental results showed that the proposed TDES method is effective in providing security and privacy to big healthcare data in the Cloud environment. The proposed TDES methodology showed less encryption and decryption time compared to the existing Intelligent Framework for Healthcare Data Security (IFHDS) method. Full article
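Triple DES applies DES three times with an extended (16- or 24-byte) key. A minimal sketch using the pycryptodome library is shown below; the record contents and the choice of CBC mode are illustrative assumptions, not the paper's exact configuration.

```python
from Crypto.Cipher import DES3
from Crypto.Random import get_random_bytes
from Crypto.Util.Padding import pad, unpad

# 24-byte (three-key) 3DES key with adjusted parity bits.
key = DES3.adjust_key_parity(get_random_bytes(24))

record = b'{"patient_id": 42, "diagnosis": "..."}'  # placeholder health record

# Encrypt one record in CBC mode.
iv = get_random_bytes(DES3.block_size)
cipher = DES3.new(key, DES3.MODE_CBC, iv)
ciphertext = cipher.encrypt(pad(record, DES3.block_size))

# Decrypt with the same key and IV to recover the plaintext.
plain = unpad(DES3.new(key, DES3.MODE_CBC, iv).decrypt(ciphertext), DES3.block_size)
assert plain == record
```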
23 pages, 6422 KiB  
Article
Triggers and Tweets: Implicit Aspect-Based Sentiment and Emotion Analysis of Community Chatter Relevant to Education Post-COVID-19
by Heba Ismail, Ashraf Khalil, Nada Hussein and Rawan Elabyad
Big Data Cogn. Comput. 2022, 6(3), 99; https://doi.org/10.3390/bdcc6030099 - 16 Sep 2022
Cited by 10 | Viewed by 3085
Abstract
This research proposes a well-being analytical framework using social media chatter data. The proposed framework infers analytics and provides insights into the public’s well-being relevant to education during and after the COVID-19 pandemic through a comprehensive Emotion and Aspect-based Sentiment Analysis (ABSA). Moreover, this research aims to examine the variability in the emotions of students, parents, and faculty toward the e-learning process over time and across different locations. The proposed framework curates Twitter chatter data relevant to the education sector, identifies tweets carrying sentiment, and then identifies the exact emotion and the emotional triggers associated with those feelings through implicit ABSA. The produced analytics are then factored by location and time to provide more comprehensive insights that aim to assist decision-makers and personnel in the educational sector in enhancing and adapting the educational process during and after the pandemic and looking toward the future. The experimental results for emotion classification show that the Linear Support Vector Classifier (SVC) outperformed the other classifiers, with an overall accuracy, precision, recall, and F-measure of 91%. Moreover, the Logistic Regression classifier outperformed all other classifiers for aspect classification, with an overall accuracy, recall, and F-measure of 81% and a precision of 83%. In online experiments using UAE COVID-19 education-related data, the analytics show high relevance to the public concerns around the education process reported during the experiment’s timeframe. Full article
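A minimal sketch of the emotion-classification stage with scikit-learn is given below; the tiny corpus and labels are invented placeholders, and the pipeline (TF-IDF features feeding a Linear SVC) simply mirrors the classifier family the abstract names.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented placeholder tweets and emotion labels.
tweets = ["online classes are exhausting", "so proud of my students today",
          "the exam schedule is fine", "remote learning hurts my eyes"]
emotions = ["sadness", "joy", "neutral", "sadness"]

# TF-IDF features feeding a Linear Support Vector Classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(tweets, emotions)
print(model.predict(["my students did great in the online exam"]))
```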
15 pages, 364 KiB  
Article
Machine Learning Techniques for Chronic Kidney Disease Risk Prediction
by Elias Dritsas and Maria Trigka
Big Data Cogn. Comput. 2022, 6(3), 98; https://doi.org/10.3390/bdcc6030098 - 14 Sep 2022
Cited by 40 | Viewed by 6395
Abstract
Chronic kidney disease (CKD) is a condition characterized by progressive loss of kidney function over time. It describes a clinical entity that causes kidney damage and affects the general health of the human body. Improper diagnosis and treatment of the disease can eventually lead to end-stage renal disease and, ultimately, to the patient’s death. Machine Learning (ML) techniques have acquired an important role in disease prediction and are a useful tool in the field of medical science. In the present research work, we aim to build efficient tools for predicting CKD occurrence, following an approach that exploits ML techniques. More specifically, we first apply class balancing to tackle the non-uniform distribution of the instances between the two classes, then perform feature ranking and analysis, and finally train and evaluate several ML models based on various performance metrics. The results highlighted the Rotation Forest (RotF), which prevailed over the compared models with an Area Under the Curve (AUC) of 100% and Precision, Recall, F-Measure, and Accuracy equal to 99.2%. Full article
(This article belongs to the Special Issue Digital Health and Data Analytics in Public Health)
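A compact sketch of that balance-then-train workflow appears below; it uses SMOTE from the imbalanced-learn package and, since Rotation Forest is not available in scikit-learn, substitutes a Random Forest purely as a stand-in classifier on synthetic data.

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier  # stand-in; Rotation Forest is not in scikit-learn
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data standing in for the CKD records.
X, y = make_classification(n_samples=400, weights=[0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Balance the training classes with SMOTE, then fit and evaluate.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```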
21 pages, 6061 KiB  
Article
Improving Real Estate Rental Estimations with Visual Data
by Ilia Azizi and Iegor Rudnytskyi
Big Data Cogn. Comput. 2022, 6(3), 96; https://doi.org/10.3390/bdcc6030096 - 09 Sep 2022
Cited by 3 | Viewed by 2887
Abstract
Multi-modal data are widely available for online real estate listings. Announcements can contain various forms of data, including visual data and unstructured textual descriptions. Nonetheless, many traditional real estate pricing models rely solely on well-structured tabular features. This work investigates whether it is possible to improve the performance of the pricing model using additional unstructured data, namely images of the property and satellite images. We compare four models based on the type of input data they use: (1) tabular data only, (2) tabular data and property images, (3) tabular data and satellite images, and (4) tabular data and a combination of property and satellite images. In a supervised context, the branches of dedicated neural networks for each data type are fused (concatenated) to predict log rental prices. The novel dataset devised for the study (SRED) consists of 11,105 flat rentals advertised over the internet in Switzerland. The results reveal that using all three sources of data generally outperforms machine learning models built on only tabular information. The findings pave the way for further research on integrating other non-structured inputs, for instance, the textual descriptions of properties. Full article
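The fusion described (dedicated branches per modality, concatenated before the regression head) can be sketched in a few lines of PyTorch; the layer sizes and the tiny CNN branch below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FusionRegressor(nn.Module):
    """Two-branch network: tabular features + image features, concatenated."""
    def __init__(self, n_tabular: int):
        super().__init__()
        self.tabular = nn.Sequential(nn.Linear(n_tabular, 32), nn.ReLU())
        self.image = nn.Sequential(                     # tiny CNN branch
            nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(32 + 8, 1)                # predicts log rent

    def forward(self, x_tab, x_img):
        fused = torch.cat([self.tabular(x_tab), self.image(x_img)], dim=1)
        return self.head(fused)

model = FusionRegressor(n_tabular=12)
log_rent = model(torch.randn(4, 12), torch.randn(4, 3, 64, 64))
print(log_rent.shape)  # torch.Size([4, 1])
```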
21 pages, 7858 KiB  
Article
Multimodal Emotional Classification Based on Meaningful Learning
by Hajar Filali, Jamal Riffi, Chafik Boulealam, Mohamed Adnane Mahraz and Hamid Tairi
Big Data Cogn. Comput. 2022, 6(3), 95; https://doi.org/10.3390/bdcc6030095 - 08 Sep 2022
Cited by 6 | Viewed by 2634
Abstract
Emotion recognition has become one of the most researched subjects in the scientific community, especially in the human–computer interface field. Decades of scientific research have been conducted on unimodal emotion analysis, whereas recent contributions concentrate on multimodal emotion recognition. These efforts have achieved great success in terms of accuracy in diverse areas of Deep Learning applications. To achieve better performance for multimodal emotion recognition systems, we exploit the effectiveness of the Meaningful Neural Network to enable emotion prediction during a conversation. Using the text and audio modalities, we propose feature extraction methods based on Deep Learning. A bimodal modality is then created by fusing the text and audio features. The feature vectors from these three modalities are fed into a Meaningful Neural Network to learn each characteristic separately. Its architecture consists of a set of neurons for each component of the input vector that are combined together in the last layer. Our model was evaluated on MELD, a multimodal and multiparty dataset for emotion recognition in conversation. The proposed approach reached an accuracy of 86.69%, which significantly outperforms all current multimodal systems. To sum up, several evaluation techniques applied to our work demonstrate the robustness and superiority of our model over other state-of-the-art MELD models. Full article
14 pages, 421 KiB  
Article
Hierarchical Co-Attention Selection Network for Interpretable Fake News Detection
by Xiaoyi Ge, Shuai Hao, Yuxiao Li, Bin Wei and Mingshu Zhang
Big Data Cogn. Comput. 2022, 6(3), 93; https://doi.org/10.3390/bdcc6030093 - 05 Sep 2022
Cited by 2 | Viewed by 3533
Abstract
Social media fake news has become a pervasive and problematic issue today with the development of the internet. Recent studies have utilized different artificial intelligence technologies to verify the truth of the news and provide explanations for the results, and have shown remarkable success in interpretable fake news detection. However, individuals’ judgments of news are usually hierarchical, prioritizing valuable words above essential sentences, which is neglected by existing fake news detection models. In this paper, we propose a novel interpretable neural-network-based model, the hierarchical co-attention selection network (HCSN), to predict whether the source post is fake and to produce an explanation that emphasizes important comments and particular words. The key insight of the HCSN model is to incorporate the Gumbel–Max trick in the hierarchical co-attention selection mechanism, which captures sentence-level and word-level information from the source post and comments following the sequence of words–sentences–words–event. In addition, HCSN enjoys the additional benefit of interpretability: it provides a conscious explanation of how it reaches certain results by selecting comments and highlighting words. According to experiments conducted on real-world datasets, our model outperformed state-of-the-art methods and generated reasonable explanations. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
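The Gumbel–Max trick lets a network draw a discrete sample (e.g., which comment to select) by adding Gumbel noise to log-probabilities and taking the argmax. A short self-contained demonstration in Python follows; the attention weights are invented for illustration.

```python
import numpy as np

def gumbel_max_sample(log_probs, rng):
    """Draw one index from a categorical distribution via the Gumbel-Max trick:
    argmax(log p_i + g_i) with g_i ~ Gumbel(0, 1) is an exact sample."""
    gumbel_noise = -np.log(-np.log(rng.random(len(log_probs))))
    return int(np.argmax(log_probs + gumbel_noise))

rng = np.random.default_rng(0)
attention = np.array([0.7, 0.2, 0.1])          # e.g., comment-selection weights
draws = [gumbel_max_sample(np.log(attention), rng) for _ in range(10_000)]
print(np.bincount(draws) / len(draws))          # approx. [0.7, 0.2, 0.1]
```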
19 pages, 705 KiB  
Article
PRIVAFRAME: A Frame-Based Knowledge Graph for Sensitive Personal Data
by Gaia Gambarelli and Aldo Gangemi
Big Data Cogn. Comput. 2022, 6(3), 90; https://doi.org/10.3390/bdcc6030090 - 26 Aug 2022
Cited by 3 | Viewed by 2625
Abstract
The pervasiveness of dialogue systems and virtual conversation applications raises an important theme: the potential of sharing sensitive information, and the consequent need for protection. To guarantee the subject’s right to privacy and avoid the leakage of private content, it is important to treat sensitive information. However, any treatment first requires identifying sensitive text, and appropriate techniques to do so automatically. The Sensitive Information Detection (SID) task has been explored in the literature in different domains and languages, but there is no common benchmark. Current approaches are mostly based on artificial neural networks (ANNs) or transformers built on them. Our research focuses on identifying categories of personal data in informal English sentences by adopting a new logical-symbolic approach, and eventually hybridizing it with ANN models. We present a frame-based knowledge graph built for personal data categories defined in the Data Privacy Vocabulary (DPV). The knowledge graph is designed through the logical composition of already existing frames and has been evaluated as background knowledge for a SID system against a labeled sensitive information dataset. The accuracy of PRIVAFRAME reached 78%. By comparison, a transformer-based model achieved 12% lower performance on the same dataset. The top-down logical-symbolic frame-based model allows a granular analysis and does not require a training dataset. These advantages lead us to use it as a layer in a hybrid model, where the logical SID is combined with an ANN-based SID tested in a previous study by the authors. Full article
(This article belongs to the Special Issue Artificial Intelligence for Online Safety)
17 pages, 715 KiB  
Article
Argumentation-Based Query Answering under Uncertainty with Application to Cybersecurity
by Mario A. Leiva, Alejandro J. García, Paulo Shakarian and Gerardo I. Simari
Big Data Cogn. Comput. 2022, 6(3), 91; https://doi.org/10.3390/bdcc6030091 - 26 Aug 2022
Cited by 5 | Viewed by 2119
Abstract
Decision support tools are key components of intelligent sociotechnical systems, and their successful implementation faces a variety of challenges, including the multiplicity of information sources, heterogeneous formats, and constant changes. Handling such challenges requires the ability to analyze and process inconsistent and incomplete information with varying degrees of associated uncertainty. Moreover, some domains require the system’s outputs to be explainable and interpretable; an example of this is cyberthreat analysis (CTA) in cybersecurity domains. In this paper, we first present the P-DAQAP system, an extension of a recently developed query-answering platform based on defeasible logic programming (DeLP) that incorporates a probabilistic model and focuses on delivering these capabilities. After discussing the details of its design and implementation, and describing how it can be applied in a CTA use case, we report on the results of an empirical evaluation designed to explore the effectiveness and efficiency of a possible-world sampling-based approximate query answering approach that addresses the intractability of exact computations. Full article
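Possible-world sampling replaces an intractable exact sum over all worlds with a Monte Carlo estimate: sample truth assignments for the probabilistic facts and count how often the query holds. The toy Python sketch below, with invented facts and a hypothetical two-fact query, shows the idea.

```python
import random

# Hypothetical probabilistic facts: each independent event holds with prob p.
facts = {"compromised(host1)": 0.30, "phishing(email7)": 0.60}

def query_holds(world):
    """Toy query: the threat is confirmed if both facts hold in the world."""
    return world["compromised(host1)"] and world["phishing(email7)"]

def approximate_probability(n_samples=100_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Sample one possible world by flipping a biased coin per fact.
        world = {f: rng.random() < p for f, p in facts.items()}
        hits += query_holds(world)
    return hits / n_samples

print(approximate_probability())  # approx. 0.30 * 0.60 = 0.18
```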
19 pages, 33832 KiB  
Article
Large-Scale Oil Palm Trees Detection from High-Resolution Remote Sensing Images Using Deep Learning
by Hery Wibowo, Imas Sukaesih Sitanggang, Mushthofa Mushthofa and Hari Agung Adrianto
Big Data Cogn. Comput. 2022, 6(3), 89; https://doi.org/10.3390/bdcc6030089 - 24 Aug 2022
Cited by 10 | Viewed by 4554
Abstract
Tree counting is an important plantation practice for biological asset inventories, among other purposes. The application of precision agriculture in counting oil palm trees can be implemented by detecting oil palm trees from aerial imagery. This research uses a deep learning approach with YOLOv3, YOLOv4, and YOLOv5m to detect oil palm trees. The dataset consists of drone images of an oil palm plantation acquired using a Fixed Wing VTOL drone with a resolution of 5 cm/pixel, covering an area of 730 ha and labeled with 56,614 labels of the oil palm class. The test dataset covers an area of 180 ha with flat and hilly conditions, with sparse, dense, and overlapping canopy, and with oil palm trees intersecting other vegetation. Model testing using images from 24 regions, each covering 12 ha with up to 1000 trees (for a total of 17,343 oil palm trees), yielded F1-scores of 97.28%, 97.74%, and 94.94%, with average detection times of 43 s, 45 s, and 21 s for models trained with YOLOv3, YOLOv4, and YOLOv5m, respectively. This result shows that the method is sufficiently accurate and efficient in detecting oil palm trees and has the potential to be implemented in commercial applications for plantation companies. Full article
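For reference, the reported F1-scores combine precision and recall as F1 = 2PR/(P + R). The snippet below recomputes an F1 from hypothetical detection counts; the counts are invented, since the paper reports only the final scores.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative numbers only (not the paper's raw counts): with 17,343
# ground-truth trees, 16,900 found and 500 false alarms give F1 ~ 0.973.
print(round(f1_score(tp=16_900, fp=500, fn=443), 4))
```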
26 pages, 5309 KiB  
Article
RSS-Based Wireless LAN Indoor Localization and Tracking Using Deep Architectures
by Muhammed Zahid Karakusak, Hasan Kivrak, Hasan Fehmi Ates and Mehmet Kemal Ozdemir
Big Data Cogn. Comput. 2022, 6(3), 84; https://doi.org/10.3390/bdcc6030084 - 08 Aug 2022
Cited by 8 | Viewed by 3115
Abstract
Wireless Local Area Network (WLAN) positioning is a challenging task indoors due to environmental constraints and the unpredictable behavior of signal propagation, even at a fixed location. The aim of this work is to develop deep learning-based approaches for indoor localization and tracking by utilizing Received Signal Strength (RSS). The study proposes Multi-Layer Perceptron (MLP), One- and Two-Dimensional Convolutional Neural Network (1D CNN and 2D CNN), and Long Short-Term Memory (LSTM) deep network architectures for WLAN indoor positioning based on data obtained from actual RSS measurements in an existing WLAN infrastructure in a mobile user scenario. Results are presented for the different deep architectures (MLP, CNNs, and LSTMs) alongside existing WLAN algorithms, with the Root Mean Square Error (RMSE) used as the assessment criterion. The proposed LSTM Model 2 achieved a dynamic positioning RMSE of 1.73 m, which outperforms probabilistic WLAN algorithms such as Memoryless Positioning (RMSE: 10.35 m) and the Nonparametric Information (NI) filter with variable acceleration (RMSE: 5.2 m) under the same experimental environment. Full article
(This article belongs to the Topic Machine and Deep Learning)
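A sequence model for this task consumes a window of RSS scans and regresses the current 2-D position. The PyTorch sketch below is a minimal stand-in (layer sizes, window length, and access-point count are assumptions) showing the LSTM-to-coordinates shape of such a tracker, along with the RMSE criterion.

```python
import torch
import torch.nn as nn

class RSSTracker(nn.Module):
    """LSTM over a window of RSS vectors -> 2-D position estimate."""
    def __init__(self, n_access_points: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_access_points, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)        # (x, y) coordinates

    def forward(self, rss_window):              # (batch, time, n_access_points)
        out, _ = self.lstm(rss_window)
        return self.head(out[:, -1])            # position at the last time step

model = RSSTracker(n_access_points=20)
pred = model(torch.randn(8, 10, 20))            # 8 windows of 10 scans each
target = torch.randn(8, 2)                      # placeholder ground-truth positions
rmse = torch.sqrt(nn.functional.mse_loss(pred, target))
print(pred.shape, float(rmse))
```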
17 pages, 1280 KiB  
Article
Impactful Digital Twin in the Healthcare Revolution
by Hossein Hassani, Xu Huang and Steve MacFeely
Big Data Cogn. Comput. 2022, 6(3), 83; https://doi.org/10.3390/bdcc6030083 - 08 Aug 2022
Cited by 58 | Viewed by 8650
Abstract
Over the last few decades, our digitally expanding world has experienced another significant digitalization boost because of the COVID-19 pandemic. Digital transformations are changing every aspect of this world. New technological innovations are springing up continuously, attracting increasing attention and investments. Digital twin, one of the highest-trending technologies of recent years, is now joining forces with the healthcare sector, which has been under the spotlight since the outbreak of COVID-19. This paper sets out to promote a better understanding of digital twin technology, clarify some common misconceptions, and review the current trajectory of digital twin applications in healthcare. Furthermore, the functionalities of the digital twin in different life stages are summarized in the context of a digital twin model in healthcare. Following the Internet of Things as a service concept and the digital twinning as a service model supporting Industry 4.0, we propose a paradigm of digital twinning everything as a healthcare service, and we clarify different groups of physical entities for clear reference to the digital twin architecture in healthcare. This research discusses the value of digital twin technology in healthcare, as well as current challenges and insights for future research. Full article
17 pages, 26907 KiB  
Article
Real-Time End-to-End Speech Emotion Recognition with Cross-Domain Adaptation
by Konlakorn Wongpatikaseree, Sattaya Singkul, Narit Hnoohom and Sumeth Yuenyong
Big Data Cogn. Comput. 2022, 6(3), 79; https://doi.org/10.3390/bdcc6030079 - 15 Jul 2022
Cited by 8 | Viewed by 4696
Abstract
Language resources are the main factor in speech-emotion-recognition (SER)-based deep learning models. Thai is a low-resource language with a smaller data size than high-resource languages such as German. This paper describes a framework that uses a pretrained-model-based front-end and back-end network to adapt feature spaces from the speech recognition domain to the speech emotion classification domain. It consists of two parts: a speech recognition front-end network and a speech emotion recognition back-end network. For speech recognition, Wav2Vec2 is the state-of-the-art for high-resource languages, while XLSR is used for low-resource languages. Wav2Vec2 and XLSR offer generalized end-to-end learning for speech understanding, with feature space representations from feature encoding grounded in the speech recognition domain; this is one reason why Wav2Vec2 and XLSR were selected as the pretrained models for our front-end network. The pretrained Wav2Vec2 and XLSR are used for the front-end networks and fine-tuned for specific languages using the Common Voice 7.0 dataset. The feature vectors of the front-end network are then input to the back-end networks, which include convolution time reduction (CTR) and linear mean encoding transformation (LMET). Experiments using two different datasets show that our proposed framework can outperform the baselines in terms of unweighted and weighted accuracies. Full article
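Extracting front-end features from a pretrained Wav2Vec2 checkpoint takes only a few lines with the Hugging Face transformers library. The sketch below uses the public English checkpoint facebook/wav2vec2-base-960h and random audio as placeholders; the paper instead fine-tunes Wav2Vec2/XLSR on Common Voice 7.0 for the target language.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Pretrained English checkpoint as a stand-in for the fine-tuned front-end.
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")

waveform = torch.randn(16_000)  # 1 s of 16 kHz audio standing in for real speech
inputs = extractor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    features = encoder(**inputs).last_hidden_state  # (1, frames, 768)
print(features.shape)  # front-end feature vectors for the SER back-end
```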
22 pages, 1108 KiB  
Article
We Know You Are Living in Bali: Location Prediction of Twitter Users Using BERT Language Model
by Lihardo Faisal Simanjuntak, Rahmad Mahendra and Evi Yulianti
Big Data Cogn. Comput. 2022, 6(3), 77; https://doi.org/10.3390/bdcc6030077 - 07 Jul 2022
Cited by 15 | Viewed by 3626
Abstract
Twitter user location data provide essential information that can be used for various purposes. However, user location is not easy to identify because many profiles omit this information, or users enter data that do not correspond to their actual locations. Several related works have attempted to predict location from English-language tweets. In this study, we attempted to predict the location of Indonesian tweets. We utilized machine learning approaches, i.e., long short-term memory (LSTM) and bidirectional encoder representations from transformers (BERT), to infer Twitter users’ home locations using the display name in the profile, the user description, and the user’s tweets. By concatenating the display name, description, and aggregated tweets, the model achieved a best accuracy of 0.77. The performance of the IndoBERT model outperformed several baseline models. Full article
(This article belongs to the Topic Machine and Deep Learning)
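The input construction, concatenating the three profile fields into one sequence for the transformer, can be sketched as follows. The IndoBERT checkpoint name and the [SEP]-joined format are assumptions for illustration, not the paper's exact preprocessing.

```python
from transformers import AutoTokenizer

# Hypothetical Indonesian BERT checkpoint; swap in the model actually used.
tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")

display_name = "Wayan di Denpasar"
description = "suka pantai dan kopi"
tweets = "macet lagi di jalan menuju Kuta ..."

# Join the three signals into one sequence for a BERT classification head.
text = " [SEP] ".join([display_name, description, tweets])
encoded = tokenizer(text, truncation=True, max_length=256, return_tensors="pt")
print(encoded["input_ids"].shape)
```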
15 pages, 695 KiB  
Article
Topological Data Analysis Helps to Improve Accuracy of Deep Learning Models for Fake News Detection Trained on Very Small Training Sets
by Ran Deng and Fedor Duzhin
Big Data Cogn. Comput. 2022, 6(3), 74; https://doi.org/10.3390/bdcc6030074 - 05 Jul 2022
Cited by 6 | Viewed by 4214
Abstract
Topological data analysis has recently found applications in various areas of science, such as computer vision and understanding of protein folding. However, applications of topological data analysis to natural language processing remain under-researched. This study applies topological data analysis to a particular natural language processing task: fake news detection. We have found that deep learning models are more accurate in this task than topological data analysis. However, assembling a deep learning model with topological data analysis significantly improves the model’s accuracy if the available training set is very small. Full article
(This article belongs to the Topic Machine and Deep Learning)
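One common way to combine topological data analysis with a deep model is to compute persistence diagrams over a point cloud of text embeddings and append simple diagram summaries to the learned features. The sketch below uses the ripser package on random vectors; the feature choices (total persistence, maximum lifetime, count) are illustrative assumptions rather than the paper's exact TDA features.

```python
import numpy as np
from ripser import ripser  # pip install ripser

def persistence_features(point_cloud: np.ndarray) -> np.ndarray:
    """Summarize H0/H1 persistence diagrams as simple fixed-length features."""
    diagrams = ripser(point_cloud)["dgms"]
    feats = []
    for dgm in diagrams:
        finite = dgm[np.isfinite(dgm[:, 1])]
        lifetimes = finite[:, 1] - finite[:, 0]
        feats += [lifetimes.sum(), lifetimes.max(initial=0.0), len(lifetimes)]
    return np.array(feats)

# Embedded word vectors of one article could form the point cloud; these
# features would then be concatenated with the deep model's representation.
cloud = np.random.default_rng(0).normal(size=(50, 8))
print(persistence_features(cloud))
```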
19 pages, 491 KiB  
Article
Digital Technologies and the Role of Data in Cultural Heritage: The Past, the Present, and the Future
by Vassilis Poulopoulos and Manolis Wallace
Big Data Cogn. Comput. 2022, 6(3), 73; https://doi.org/10.3390/bdcc6030073 - 04 Jul 2022
Cited by 19 | Viewed by 7825
Abstract
Is culture considered to be our past, our roots, ancient ruins, or an old piece of art? Culture is all the factors that define who we are, how we act and interact in our world, in our daily activities, in our personal and public relations, in our life. Culture is all the things we are not obliged to do. However, today we live in a mixed environment, a combination of the “offline” world and the online, digital world. In this mixed environment, it is technology that defines our behaviour, technology that unites people in a large world and that ultimately defines a status of “monoculture”. In this article, we examine the role of technology, and especially big data, in relation to culture. We present the advances that led to paradigm shifts in the research area of cultural informatics and forecast the future of culture as it will be defined in this mixed world. Full article
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage)
16 pages, 4437 KiB  
Article
Lightweight AI Framework for Industry 4.0 Case Study: Water Meter Recognition
by Jalel Ktari, Tarek Frikha, Monia Hamdi, Hela Elmannai and Habib Hmam
Big Data Cogn. Comput. 2022, 6(3), 72; https://doi.org/10.3390/bdcc6030072 - 01 Jul 2022
Cited by 21 | Viewed by 3940
Abstract
The evolution of applications in telecommunication, networking, computing, and embedded systems has led to the emergence of the Internet of Things and Artificial Intelligence. The combination of these technologies has enabled improved productivity by optimizing consumption and facilitating access to real-time information. In this work, we focus on the Industry 4.0 and Smart City paradigms and propose a new approach to monitor and track water consumption using OCR and artificial intelligence, in particular the YOLOv4 machine learning model. The goal of this work is to provide optimized results in real time. The recognition rate obtained with the proposed algorithms is around 98%. Full article
(This article belongs to the Special Issue Advancements in Deep Learning and Deep Federated Learning Models)
25 pages, 3658 KiB  
Article
A Comprehensive Spark-Based Layer for Converting Relational Databases to NoSQL
by Manal A. Abdel-Fattah, Wael Mohamed and Sayed Abdelgaber
Big Data Cogn. Comput. 2022, 6(3), 71; https://doi.org/10.3390/bdcc6030071 - 27 Jun 2022
Cited by 1 | Viewed by 3720
Abstract
Currently, the continuous massive growth in the size, variety, and velocity of data is defined as big data. Relational databases have a limited ability to work with big data. Consequently, not-only-SQL (NoSQL) databases have been utilized to handle big data, because NoSQL represents data in diverse models and uses a variety of query languages, unlike traditional relational databases. Therefore, using NoSQL has become essential, and many studies have attempted to propose different layers to convert relational databases to NoSQL; however, most of them targeted only one or two NoSQL models and evaluated their layers on a single node, not in a distributed environment. This study proposes a Spark-based layer for mapping relational databases to the document, column, and key–value NoSQL models. The proposed Spark-based layer comprises two parts. The first part is concerned with converting relational databases to document, column, and key–value databases and encompasses two phases: a metadata analyzer of relational databases and Spark-based transformation and migration. The second part focuses on executing structured query language (SQL) on the NoSQL databases. The suggested layer was applied and compared with Unity, as it has similar components and features and supports sub-queries and join operations, in a single-node environment. The experimental results show that the proposed layer outperformed Unity in terms of query execution time by a factor of three. In addition, the proposed layer was applied to multi-node clusters using different scenarios, and the results show that the integration of the Spark cluster with NoSQL databases on multi-node clusters provided better read and write performance as the dataset size increased than a single node did. Full article
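The migration phase boils down to reading relational tables into Spark DataFrames, reshaping them for the target model, and writing through a connector. The PySpark sketch below shows a document-model conversion; the connection details and table names are placeholders, and the JSON sink stands in for a real NoSQL connector.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("rdb-to-nosql").getOrCreate()

# Read a relational table over JDBC (connection details are placeholders).
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/shop")
          .option("dbtable", "orders")
          .option("user", "reader")
          .option("password", "secret")
          .load())

# Nest line items per order to fit a document model; a real deployment would
# hand the frame to the target store's Spark connector instead of JSON files.
documents = orders.groupBy("order_id").agg(
    F.collect_list(F.struct("item_id", "qty", "price")).alias("items"))
documents.write.mode("overwrite").json("/tmp/orders_documents")
```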
20 pages, 6876 KiB  
Article
DeepWings©: Automatic Wing Geometric Morphometrics Classification of Honey Bee (Apis mellifera) Subspecies Using Deep Learning for Detecting Landmarks
by Pedro João Rodrigues, Walter Gomes and Maria Alice Pinto
Big Data Cogn. Comput. 2022, 6(3), 70; https://doi.org/10.3390/bdcc6030070 - 27 Jun 2022
Cited by 10 | Viewed by 5455
Abstract
Honey bee classification by wing geometric morphometrics entails a first step of manually annotating 19 landmarks at the forewing vein junctions. This is a time-consuming and error-prone endeavor, with implications for classification accuracy. Herein, we developed a software application called DeepWings© that overcomes this constraint in wing geometric morphometrics classification by automatically detecting the 19 landmarks on digital images of the right forewing. We used a database containing 7634 forewing images, including 1864 analyzed by F. Ruttner in the original delineation of 26 honey bee subspecies, to tune a convolutional neural network as a wing detector, a deep learning U-Net as a landmarks segmenter, and a support vector machine as a subspecies classifier. The implemented MobileNet wing detector achieved a mAP of 0.975, and the landmarks segmenter detected the 19 landmarks with 91.8% accuracy and an average positional precision of 0.943 relative to manually annotated landmarks. The subspecies classifier, in turn, presented an average accuracy of 86.6% for 26 subspecies and 95.8% for a subset of five important subspecies. The final implementation of the system showed good speed performance, requiring only 14 s to process 10 images. DeepWings© is very user-friendly and is the first fully automated software, offered as a free Web service, for honey bee classification from wing geometric morphometrics. DeepWings© can be used for honey bee breeding, conservation, and even scientific purposes, as it provides the coordinates of the landmarks in Excel format, facilitating the work of research teams using classical identification approaches and alternative analytical tools. Full article
19 pages, 5202 KiB  
Article
Iris Liveness Detection Using Multiple Deep Convolution Networks
by Smita Khade, Shilpa Gite and Biswajeet Pradhan
Big Data Cogn. Comput. 2022, 6(2), 67; https://doi.org/10.3390/bdcc6020067 - 15 Jun 2022
Cited by 8 | Viewed by 3181
Abstract
In the recent decade, comprehensive research has been carried out on promising biometric modalities based on humans’ physical features for person recognition. This work focuses on iris characteristics and traits for person identification and iris liveness detection. This study used five pre-trained networks, including VGG-16, InceptionV3, ResNet50, DenseNet121, and EfficientNetB7, to recognize iris liveness using transfer learning techniques. These models are compared using three state-of-the-art biometric databases: the LivDet-Iris 2015 dataset, the IIITD contact dataset, and the ND Iris3D 2020 dataset. Validation accuracy, loss, precision, recall, F1-score, APCER (attack presentation classification error rate), NPCER (normal presentation classification error rate), and ACER (average classification error rate) were used to evaluate the performance of all pre-trained models. According to the observational data, these models have a considerable ability to transfer their experience to the field of iris recognition and to recognize the nanostructures within the iris region. Using the ND Iris3D 2020 dataset, the EfficientNetB7 model achieved 99.97% identification accuracy. Experiments show that pre-trained models outperform other current iris biometrics variants. Full article
(This article belongs to the Special Issue Data, Structure, and Information in Artificial Intelligence)
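Transfer learning here means reusing a frozen ImageNet backbone and training only a small classification head for live-versus-spoof decisions. The Keras sketch below follows that recipe with EfficientNetB7; the input size, dropout rate, and head are illustrative assumptions.

```python
import tensorflow as tf

# Transfer learning sketch: frozen ImageNet backbone + binary live/spoof head.
base = tf.keras.applications.EfficientNetB7(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # reuse pretrained features; train only the head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # live vs. presentation attack
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.Precision(),
                       tf.keras.metrics.Recall()])
model.summary()
```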
32 pages, 7749 KiB  
Article
CompositeView: A Network-Based Visualization Tool
by Stephen A. Allegri, Kevin McCoy and Cassie S. Mitchell
Big Data Cogn. Comput. 2022, 6(2), 66; https://doi.org/10.3390/bdcc6020066 - 14 Jun 2022
Cited by 3 | Viewed by 4305
Abstract
Large networks are quintessential to bioinformatics, knowledge graphs, social network analysis, and graph-based learning. CompositeView is a Python-based open-source application that improves interactive complex network visualization and extraction of actionable insight. CompositeView utilizes specifically formatted input data to calculate composite scores and display them using the Cytoscape component of Dash. Composite scores are defined representations of smaller sets of conceptually similar data that, when combined, generate a single score to reduce information overload. Visualized interactive results are user-refined via filtering elements such as node value and edge weight sliders and graph manipulation options (e.g., node color and layout spread). The primary difference between CompositeView and other network visualization tools is its ability to auto-calculate and auto-update composite scores as the user interactively filters or aggregates data. CompositeView was developed to visualize network relevance rankings, but it performs well with non-network data. Three disparate CompositeView use cases are shown: relevance rankings from SemNet 2.0, an open-source knowledge graph relationship ranking software for biomedical literature-based discovery; Human Development Index (HDI) data; and the Framingham cardiovascular study. CompositeView was stress tested to construct reference benchmarks that define breadth and size of data effectively visualized. Finally, CompositeView is compared to Excel, Tableau, Cytoscape, neo4j, NodeXL, and Gephi. Full article
(This article belongs to the Special Issue Graph-Based Data Mining and Social Network Analysis)
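A composite score of the kind described is a weighted aggregation of conceptually similar metrics that is recomputed whenever the user filters or reweights them. A minimal, hypothetical Python version (metric names and weights invented) is shown below.

```python
def composite_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Collapse several conceptually similar scores into one weighted value,
    ignoring metrics the user has filtered out (weight 0 or missing)."""
    total_weight = sum(weights.get(name, 0.0) for name in scores)
    if total_weight == 0:
        return 0.0
    return sum(value * weights.get(name, 0.0)
               for name, value in scores.items()) / total_weight

# Invented node metrics; re-running this after a filter change updates the score.
node_scores = {"citation_rank": 0.82, "recency": 0.40, "degree": 0.65}
print(composite_score(node_scores,
                      {"citation_rank": 2.0, "recency": 1.0, "degree": 1.0}))
```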
22 pages, 3553 KiB  
Article
Synthesizing a Talking Child Avatar to Train Interviewers Working with Maltreated Children
by Pegah Salehi, Syed Zohaib Hassan, Myrthe Lammerse, Saeed Shafiee Sabet, Ingvild Riiser, Ragnhild Klingenberg Røed, Miriam S. Johnson, Vajira Thambawita, Steven A. Hicks, Martine Powell, Michael E. Lamb, Gunn Astrid Baugerud, Pål Halvorsen and Michael A. Riegler
Big Data Cogn. Comput. 2022, 6(2), 62; https://doi.org/10.3390/bdcc6020062 - 01 Jun 2022
Cited by 10 | Viewed by 5022
Abstract
When responding to allegations of child sexual, physical, and psychological abuse, Child Protection Service (CPS) workers and police personnel need to elicit detailed and accurate accounts of the abuse to assist in decision-making and prosecution. Current research emphasizes the importance of the interviewer’s ability to follow empirically based guidelines. In doing so, it is essential to implement economical and scientific training courses for interviewers. Due to recent advances in artificial intelligence, we propose to generate a realistic and interactive child avatar, aiming to mimic a child. Our ongoing research involves the integration and interaction of different components with each other, including how to handle the language, auditory, emotional, and visual components of the avatar. This paper presents three subjective studies that investigate and compare various state-of-the-art methods for implementing multiple aspects of the child avatar. The first user study evaluates the whole system and shows that it is well received by the experts and highlights the importance of its realism. The second user study investigates the emotional component and how it can be integrated with video and audio, and the third user study investigates realism in the auditory and visual components of the avatar created by different methods. The insights and feedback from these studies have contributed to the refined and improved architecture of the child avatar system, which we present here. Full article
(This article belongs to the Special Issue Multimedia Systems for Multimedia Big Data)
20 pages, 9323 KiB  
Article
COVID-19 Tweets Classification Based on a Hybrid Word Embedding Method
by Yosra Didi, Ahlam Walha and Ali Wali
Big Data Cogn. Comput. 2022, 6(2), 58; https://doi.org/10.3390/bdcc6020058 - 18 May 2022
Cited by 16 | Viewed by 4427
Abstract
In March 2020, the World Health Organisation declared COVID-19 a new pandemic. This deadly virus spread to and affected many countries in the world. During the outbreak, social media platforms such as Twitter contributed valuable and massive amounts of data to better assess health-related decision making. Therefore, we propose that users’ sentiments could be analysed with the application of effective supervised machine learning approaches to predict disease prevalence and provide early warnings. The collected tweets were prepared for preprocessing and categorised into: negative, positive, and neutral. In the second phase, different features were extracted from the posts by applying several widely used techniques, such as TF-IDF, Word2Vec, GloVe, and FastText, to capture feature datasets. The novelty of this study is based on hybrid feature extraction, where we combined syntactic features (TF-IDF) with semantic features (FastText and GloVe) to represent posts accurately, which helps in improving the classification process. Experimental results show that FastText combined with TF-IDF performed better with SVM than the other models. SVM outperformed the other models with an accuracy score of 88.72%, followed by XGBoost with 85.29%. This study shows that the hybrid methods proved their capability of extracting features from the tweets and increasing the performance of classification. Full article
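Hybrid feature extraction of this kind concatenates a sparse TF-IDF matrix with dense averaged word embeddings before classification. The sketch below uses random vectors as stand-ins for pretrained FastText/GloVe lookups, so the corpus and embedding table are purely illustrative.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = ["stay home stay safe", "vaccine rollout is too slow",
          "feeling hopeful today"]

# Syntactic features: TF-IDF.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(tweets)

# Semantic features: averaged word embeddings (random vectors stand in for
# pretrained FastText/GloVe lookups).
rng = np.random.default_rng(0)
embedding = {w: rng.normal(size=100) for t in tweets for w in t.split()}
X_sem = np.array([np.mean([embedding[w] for w in t.split()], axis=0)
                  for t in tweets])

# Hybrid representation: concatenate both feature spaces for the classifier.
X_hybrid = hstack([X_tfidf, X_sem])
print(X_hybrid.shape)  # (3, n_tfidf_terms + 100)
```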
19 pages, 2274 KiB  
Article
Virtual Reality Adaptation Using Electrodermal Activity to Support the User Experience
by Francesco Chiossi, Robin Welsch, Steeven Villa, Lewis Chuang and Sven Mayer
Big Data Cogn. Comput. 2022, 6(2), 55; https://doi.org/10.3390/bdcc6020055 - 13 May 2022
Cited by 16 | Viewed by 4223
Abstract
Virtual reality is increasingly used for tasks such as work and education. Thus, rendering scenarios that neither interfere with such goals nor deplete the user experience is becoming progressively more relevant. We present a physiologically adaptive system that optimizes the virtual environment based on physiological arousal, i.e., electrodermal activity. We investigated the usability of the adaptive system in a simulated social virtual reality scenario. Participants completed an n-back task (primary) and a visual detection task (secondary). Here, we adapted the visual complexity of the secondary task, in the form of its number of non-player characters, to support accomplishing the primary task. We show that an adaptive virtual reality can improve users’ comfort by adapting the task complexity to physiological arousal. Our findings suggest that physiologically adaptive virtual reality systems can improve users’ experience in a wide range of scenarios. Full article
(This article belongs to the Special Issue Cognitive and Physiological Assessments in Human-Computer Interaction)
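At its core, such a system is a closed loop: read an arousal estimate each window and nudge the scene's complexity toward a target band. The Python loop below is a toy stand-in (the target value, step size, and random sensor reading are invented) that shows the control logic.

```python
import random
import time

TARGET_AROUSAL = 0.5   # normalized electrodermal activity the system aims for
npc_count = 10         # visual complexity of the secondary task

def read_eda() -> float:
    """Stand-in for a normalized skin-conductance reading from a real sensor."""
    return random.random()

for _ in range(20):                       # one adaptation step per window
    arousal = read_eda()
    if arousal > TARGET_AROUSAL + 0.1:    # over-aroused: simplify the scene
        npc_count = max(0, npc_count - 1)
    elif arousal < TARGET_AROUSAL - 0.1:  # under-aroused: add complexity
        npc_count += 1
    print(f"arousal={arousal:.2f} -> non-player characters: {npc_count}")
    time.sleep(0.01)                      # placeholder for the sampling window
```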
21 pages, 4585 KiB  
Article
Cognitive Networks Extract Insights on COVID-19 Vaccines from English and Italian Popular Tweets: Anticipation, Logistics, Conspiracy and Loss of Trust
by Massimo Stella, Michael S. Vitevitch and Federico Botta
Big Data Cogn. Comput. 2022, 6(2), 52; https://doi.org/10.3390/bdcc6020052 - 12 May 2022
Cited by 10 | Viewed by 4097
Abstract
Monitoring social discourse about COVID-19 vaccines is key to understanding how large populations perceive vaccination campaigns. This work reconstructs how popular and trending posts on Twitter framed COVID-19 vaccines semantically and emotionally. We achieve this by merging natural language processing, cognitive network science, and AI-based image analysis. We focus on 4765 unique popular tweets in English or Italian about COVID-19 vaccines posted between December 2020 and March 2021. One popular English tweet in our data set was liked around 495,000 times, highlighting how popular tweets can cognitively affect large parts of the population. We investigate both the text and the multimedia content of tweets and build a cognitive network of syntactic/semantic associations in messages, including emotional cues and pictures. This network representation indicates how online users linked ideas in social discourse and framed vaccines along specific semantic/emotional content. The English semantic frame of “vaccine” was highly polarised between trust/anticipation (towards the vaccine as a scientific asset saving lives) and anger/sadness (mentioning critical issues with dose administering). Semantic associations between “vaccine,” “hoax,” and conspiratorial jargon indicated the persistence of conspiracy theories about vaccines in extremely popular English posts. Interestingly, these were absent in Italian messages. Popular tweets with images of people wearing face masks used language that lacked the trust and joy found in tweets showing people with no masks, a difference that indicates a negative effect attributed to face-covering in social discourse. Behavioural analysis revealed a tendency for users to share content eliciting joy, sadness, and disgust and to like sad messages less, both patterns indicating an interplay between emotions and content diffusion beyond sentiment. After its suspension in mid-March 2021, “AstraZeneca” was associated with trustful language driven by experts. After the deaths of a small number of vaccinated people in mid-March, popular Italian tweets framed “vaccine” by crucially replacing earlier levels of trust with deep sadness. Our results stress how cognitive networks and innovative multimedia processing open new ways to reconstruct online perceptions about vaccines and trust. Full article
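A cognitive network of word associations can be approximated by linking terms that co-occur in the same message and inspecting the neighbourhood of a focal word such as "vaccine". The networkx sketch below does exactly that on three invented tweets; real pipelines would use syntactic parsing rather than raw co-occurrence.

```python
import itertools
import networkx as nx

tweets = [
    "vaccine gives hope and saves lives",
    "vaccine dose delays cause anger",
    "hope the vaccine campaign speeds up",
]

# Link words that co-occur within the same tweet (a crude stand-in for the
# syntactic/semantic parsing used in the study).
G = nx.Graph()
for tweet in tweets:
    for a, b in itertools.combinations(set(tweet.split()), 2):
        weight = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=weight + 1)

# The neighbourhood of "vaccine" approximates its semantic frame.
print(sorted(G["vaccine"], key=lambda n: -G["vaccine"][n]["weight"])[:5])
```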