Editorial

8 pages, 201 KiB

Open Access Editorial

Special Issue on Applied Machine Learning

by Grzegorz Dudek

Appl. Sci. 2022, 12(4), 2039; https://doi.org/10.3390/app12042039 - 16 Feb 2022

Viewed by 1343

Abstract

Machine learning (ML) is one of the most exciting fields of computing today [...]

(This article belongs to the Special Issue Applied Machine Learning)

Research

17 pages, 13452 KiB

Open Access Article

A Comparison of Time-Series Predictions for Healthcare Emergency Department Indicators and the Impact of COVID-19

by Diego Duarte, Chris Walshaw and Nadarajah Ramesh

Appl. Sci. 2021, 11(8), 3561; https://doi.org/10.3390/app11083561 - 15 Apr 2021

Cited by 16 | Viewed by 3131

Abstract

Across the world, healthcare systems are under stress, and this has been hugely exacerbated by the COVID pandemic. Key Performance Indicators (KPIs), usually in the form of time-series data, are used to help manage that stress. Making reliable predictions of these indicators, particularly for emergency departments (ED), can facilitate acute unit planning, enhance quality of care and optimise resources. This motivates models that can forecast relevant KPIs, and this paper addresses that need by comparing the Autoregressive Integrated Moving Average (ARIMA) method, a purely statistical model, to Prophet, a decomposable forecasting model based on trend, seasonality and holiday variables, and to the General Regression Neural Network (GRNN), a machine learning model. The dataset analysed is formed of four hourly valued indicators from a UK hospital: Patients in Department; Number of Attendances; Unallocated Patients with a DTA (Decision to Admit); Medically Fit for Discharge. Typically, the data exhibit regular patterns and seasonal trends and can be impacted by external factors such as the weather or major incidents. The COVID pandemic is an extreme instance of the latter, and the behaviour of the sample data changed dramatically. The capacity to adapt quickly to these changes is crucial, and it is here that GRNN shows better results in both accuracy and reliability.

(This article belongs to the Special Issue Applied Machine Learning)

16 pages, 1614 KiB

Open Access Article

Anticipatory Classifier System with Average Reward Criterion in Discretized Multi-Step Environments

by Norbert Kozłowski and Olgierd Unold

Appl. Sci. 2021, 11(3), 1098; https://doi.org/10.3390/app11031098 - 25 Jan 2021

Cited by 5 | Viewed by 1570

Abstract

Initially, Anticipatory Classifier Systems (ACS) were designed to address both single-step and multistep decision problems. In the latter case, the objective was to maximize the total discounted reward, usually based on Q-learning algorithms. Studies on other Learning Classifier Systems (LCS) revealed many real-world sequential decision problems where the preferred objective is the maximization of the average of successive rewards. This paper proposes a modification of the learning component that allows such problems to be addressed. The modified system is called AACS2 (Averaged ACS2) and is tested on three multistep benchmark problems.

(This article belongs to the Special Issue Applied Machine Learning)

27 pages, 1135 KiB

Open Access Article

NowDeepN: An Ensemble of Deep Learning Models for Weather Nowcasting Based on Radar Products’ Values Prediction

by Gabriela Czibula, Andrei Mihai and Eugen Mihuleţ

Appl. Sci. 2021, 11(1), 125; https://doi.org/10.3390/app11010125 - 24 Dec 2020

Cited by 6 | Viewed by 2372

Abstract

One of the hottest topics in today’s meteorological research is weather nowcasting, which is the weather forecast for a short time period such as one to six hours. Radar is an important data source used by operational meteorologists for issuing nowcasting warnings. With the main goal of helping meteorologists in analysing radar data for issuing nowcasting warnings, we propose NowDeepN, a supervised learning based regression model which uses an ensemble of deep artificial neural networks for predicting the values for radar products at a certain time moment. The values predicted by NowDeepN may be used by meteorologists in estimating the future development of potential severe phenomena and would replace the time consuming process of extrapolating the radar echoes. NowDeepN is intended to be a proof of concept for the effectiveness of learning from radar data relevant patterns that would be useful for predicting future values for radar products based on their historical values. For assessing the performance of NowDeepN, a set of experiments on real radar data provided by the Romanian National Meteorological Administration is conducted. The impact of a data cleaning step introduced for correcting the erroneous radar products’ values is investigated both from the computational and meteorological perspectives. The experimental results also indicate the relevance of the features considered in the supervised learning task, highlighting that the radar products’ values at a certain geographical location at a time moment may be predicted from the products’ values from a neighboring area of that location at previous time moments. An overall Normalized Root Mean Squared Error less than 4% was obtained for NowDeepN on the cleaned radar data. Compared to similar related work from the nowcasting literature, NowDeepN outperforms several approaches and this emphasizes the performance of our proposal.

(This article belongs to the Special Issue Applied Machine Learning)

22 pages, 3809 KiB

Open Access Article

Effectiveness of Machine Learning Approaches Towards Credibility Assessment of Crowdfunding Projects for Reliable Recommendations

by Wafa Shafqat, Yung-Cheol Byun and Namje Park

Appl. Sci. 2020, 10(24), 9062; https://doi.org/10.3390/app10249062 - 18 Dec 2020

Cited by 4 | Viewed by 2276

Abstract

Recommendation systems aim to decipher user interests, preferences, and behavioral patterns automatically. However, it becomes trickier to make trustworthy and reliable recommendations to users, especially when their hard-earned money is at risk. The credibility of the recommendation is of paramount importance in crowdfunding project recommendations. This research work devises a hybrid machine learning-based approach for credible crowdfunding project recommendations by incorporating backers’ sentiments and other influential features. The proposed model has four modules: a feature extraction module, a hybrid LDA-LSTM (latent Dirichlet allocation and long short-term memory) based latent topics evaluation module, a credibility formulation module, and a recommendation module. The credibility analysis correlates the project creator’s proficiency, reviewers’ sentiments, and their influence to estimate a project’s authenticity level, which makes our model robust to unauthentic and untrustworthy projects and profiles. The recommendation module selects the projects that match the user’s interests and have the highest credibility scores, and recommends them. The proposed recommendation method harnesses numeric data and sentiment expressions linked with comments, backers’ preferences, profile data, and the creator’s credibility for quantitative examination of several alternative projects. The evaluation of the proposed model shows that credibility assessment based on the hybrid machine learning approach yields better results (98% accuracy) than existing recommendation models. We have also evaluated our credibility assessment technique on different categories of projects, i.e., suspended, canceled, delivered, and never delivered projects, and achieved satisfactory outcomes, with 93%, 84%, 58%, and 93% of projects, respectively, accurately classified into our desired range of credibility.

(This article belongs to the Special Issue Applied Machine Learning)

17 pages, 1201 KiB

Open Access Article

Using BiLSTM Networks for Context-Aware Deep Sensitivity Labelling on Conversational Data

by Antreas Pogiatzis and Georgios Samakovitis

Appl. Sci. 2020, 10(24), 8924; https://doi.org/10.3390/app10248924 - 14 Dec 2020

Cited by 14 | Viewed by 2286

Abstract

Information privacy is a critical design feature for any exchange system, with privacy-preserving applications requiring, most of the time, the identification and labelling of sensitive information. However, privacy and the concept of “sensitive information” are extremely elusive terms, as they are heavily dependent upon the context they are conveyed in. To accommodate such specificity, we first introduce a taxonomy of four context classes to categorise relationships of terms with their textual surroundings by meaning, interaction, precedence, and preference. We then propose a predictive context-aware model based on a Bidirectional Long Short Term Memory network with Conditional Random Fields (BiLSTM + CRF) to identify and label sensitive information in conversational data (multi-class sensitivity labelling). We train our model on a synthetic annotated dataset of real-world conversational data categorised in 13 sensitivity classes that we derive from the P3P standard. We parameterise and run a series of experiments featuring word and character embeddings and introduce a set of auxiliary features to improve model performance. Our results demonstrate that the BiLSTM + CRF model architecture with BERT embeddings and WordShape features is the most effective (F1 score 96.73%). The model is evaluated under both temporal and semantic contexts, achieving a 76.33% F1 score on unseen data and outperforming Google’s Data Loss Prevention (DLP) system on sensitivity labelling tasks.

(This article belongs to the Special Issue Applied Machine Learning)

16 pages, 763 KiB

Open Access Article

Parsing Expression Grammars and Their Induction Algorithm

by Wojciech Wieczorek, Olgierd Unold and Łukasz Strąk

Appl. Sci. 2020, 10(23), 8747; https://doi.org/10.3390/app10238747 - 07 Dec 2020

Cited by 1 | Viewed by 1942

Abstract

Grammatical inference (GI), i.e., the task of finding a rule that lies behind given words, can be used in the analysis of amyloidogenic sequence fragments, which are essential in studies of neurodegenerative diseases. In this paper, we develop a new method that generates non-circular parsing expression grammars (PEGs) and compare it with other GI algorithms on sequences from a real dataset. The main contribution of this paper is a genetic programming-based algorithm for the induction of parsing expression grammars from a finite sample. The induction method has been tested on a real bioinformatics dataset and its classification performance has been compared to that of existing grammatical inference methods. The evaluation of the generated PEG on an amyloidogenic dataset revealed its accuracy when predicting amyloid segments. We show that the new grammatical inference algorithm achieves the best ACC (accuracy), AUC (area under the ROC curve), and MCC (Matthews correlation coefficient) scores in comparison to five other automata or grammar learning methods.

(This article belongs to the Special Issue Applied Machine Learning)

18 pages, 3482 KiB

Open Access Article

Structural Vibration Tests: Use of Artificial Neural Networks for Live Prediction of Structural Stress

by Laura Wilmes, Raymond Olympio, Kristin M. de Payrebrune and Markus Schatz

Appl. Sci. 2020, 10(23), 8542; https://doi.org/10.3390/app10238542 - 29 Nov 2020

Cited by 3 | Viewed by 2442

Abstract

One of the ongoing tasks in space structure testing is the vibration test, in which a given structure is mounted onto a shaker and excited by a certain input load over a given frequency range in order to reproduce the rigor of launch. These vibration tests need to be conducted to ensure that the devised structure can withstand the expected loads of its future application. However, the structure must not be overtested, to avoid any risk of damage. For this, the system’s response to the testing loads, i.e., stresses and forces in the structure, must be monitored and predicted live during the test. To solve the issues associated with existing methods of live monitoring of the structure’s response, this paper investigates the use of artificial neural networks (ANNs) to predict the system’s responses during the test. A framework was developed with different use cases to compare various kinds of artificial neural networks and identify the most promising one. The research thus provides a novel method for live prediction of stresses, allowing failure to be evaluated for different types of material via yield criteria.

(This article belongs to the Special Issue Applied Machine Learning)

15 pages, 2823 KiB

Open Access Article

Machine Learning-Based Code Auto-Completion Implementation for Firmware Developers

by Junghyun Kim, Kyuman Lee and Sanghyun Choi

Appl. Sci. 2020, 10(23), 8520; https://doi.org/10.3390/app10238520 - 28 Nov 2020

Cited by 4 | Viewed by 2824

Abstract

With the advent of artificial intelligence, the research paradigm in natural language processing has transitioned from statistical methods to machine learning-based approaches. One application is to develop a deep learning-based language model that helps software engineers write code faster. Although different research groups have already made many attempts to develop code auto-completion functionality, a need to establish an in-house code has been identified for the following reasons: (1) a security-sensitive company (e.g., Samsung Electronics) may not want to use commercial tools given the risk of source code leaks, and (2) commercial tools may not be applicable to a specific domain (e.g., SSD firmware development), especially if one needs to predict unique code patterns and style. This research proposes a hybrid approach that harnesses the synergy between machine learning techniques and advanced design methods, aiming to develop a code auto-completion framework that helps firmware developers write code more efficiently. The sensitivity analysis results show that the deterministic design reduces prediction accuracy because it generates output in some unexpected ways, while the probabilistic design provides a list of reasonable next code elements from which one can select manually to increase prediction accuracy.

(This article belongs to the Special Issue Applied Machine Learning)

17 pages, 1372 KiB

Open Access Article

Time-Aware Learning Framework for Over-The-Top Consumer Classification Based on Machine- and Deep-Learning Capabilities

by Jaeun Choi and Yongsung Kim

Appl. Sci. 2020, 10(23), 8476; https://doi.org/10.3390/app10238476 - 27 Nov 2020

Cited by 5 | Viewed by 2143

Abstract

With the widespread use of over-the-top (OTT) media, such as YouTube and Netflix, network markets are changing and innovating rapidly, making it essential for network providers to quickly and efficiently analyze OTT traffic with respect to pricing plans and infrastructure investments. This study proposes a time-aware deep-learning method of analyzing OTT traffic to classify users for this purpose. Traditional deep learning can improve classification accuracy over conventional methods, but it takes a considerable amount of time. Therefore, we propose a novel framework that better exploits accuracy, which is the strength of deep learning, while dramatically reducing classification time. This framework uses a two-step classification process. Because only ambiguous data need to be subjected to deep-learning classification, vast amounts of unambiguous data can be filtered out. This reduces the workload and ensures higher accuracy. The resulting method provides a simple way to customize pricing plans and balance load by classifying OTT users more accurately.

(This article belongs to the Special Issue Applied Machine Learning)
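The two-step idea described in the abstract above can be sketched roughly as follows. This is only an illustration, not the authors' framework: the function `two_step_classify`, the confidence threshold, and the toy `fast_model` and `slow_model` stand-ins are all hypothetical, and the abstract does not say how ambiguity is measured.

```python
from typing import Callable, List, Sequence, Tuple


def two_step_classify(
    samples: Sequence[float],
    fast_model: Callable[[float], Tuple[str, float]],
    slow_model: Callable[[float], str],
    threshold: float = 0.9,
) -> Tuple[List[str], int]:
    """Classify with a cheap model first; defer only ambiguous samples.

    fast_model returns (label, confidence in [0, 1]).
    slow_model returns a label and stands in for the costly deep model.
    Returns the labels and the number of samples sent to the slow model.
    """
    labels: List[str] = []
    deferred = 0
    for x in samples:
        label, confidence = fast_model(x)
        if confidence < threshold:
            # Ambiguous case: pay for the expensive classifier.
            label = slow_model(x)
            deferred += 1
        labels.append(label)
    return labels, deferred


# Hypothetical stand-ins: classify a usage score as "light" or "heavy".
def fast_model(x: float) -> Tuple[str, float]:
    # Confidence is lowest near the decision boundary at 5.0.
    confidence = min(1.0, abs(x - 5.0) / 5.0 + 0.5)
    return ("heavy" if x > 5.0 else "light", confidence)


def slow_model(x: float) -> str:
    return "heavy" if x >= 4.8 else "light"


labels, deferred = two_step_classify([0.5, 4.9, 9.0], fast_model, slow_model)
print(labels, deferred)
```

In this toy run only the sample near the boundary is passed to the slow model, which is the source of the time saving the abstract describes.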

20 pages, 13577 KiB

Open Access Article

Comparing Classical and Modern Machine Learning Techniques for Monitoring Pedestrian Workers in Top-View Construction Site Video Sequences

by Marcel Neuhausen, Dennis Pawlowski and Markus König

Appl. Sci. 2020, 10(23), 8466; https://doi.org/10.3390/app10238466 - 27 Nov 2020

Cited by 5 | Viewed by 2104

Abstract

Keeping an overview of all ongoing processes on construction sites is almost infeasible, especially for the construction workers executing their tasks. It is difficult for workers to concentrate on their work while paying attention to other processes. If their workflows in hazardous areas do not run properly, this can lead to dangerous accidents. Tracking pedestrian workers could improve productivity and safety management on construction sites. Vision-based tracking approaches are suitable for this, but training and evaluating such a system requires a large amount of data originating from construction sites. Such data are rarely available, which complicates deep learning approaches. Thus, we use a small generic dataset and juxtapose a deep learning detector with an approach based on classical machine learning techniques. We identify workers using a YOLOv3 detector and compare its performance with an approach based on a soft cascaded classifier. Afterwards, tracking is done by a Kalman filter. In our experiments, the classical approach outperforms YOLOv3 on the detection task given a small training dataset. However, the Kalman filter is sufficiently robust to compensate for the drawbacks of YOLOv3. We found that both approaches generally yield satisfactory tracking performance but exhibit different characteristics.

(This article belongs to the Special Issue Applied Machine Learning)
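To illustrate how a Kalman filter can smooth over weak or missing detections, as the abstract above reports, here is a deliberately simplified scalar sketch. It assumes a one-dimensional random-walk state and made-up noise variances; the abstract does not specify the state model used in the paper, and the function name `kalman_1d` is hypothetical.

```python
from typing import List, Optional, Sequence


def kalman_1d(
    measurements: Sequence[Optional[float]],
    process_var: float = 0.01,
    meas_var: float = 1.0,
    x0: float = 0.0,
    p0: float = 1.0,
) -> List[float]:
    """Scalar Kalman filter with a random-walk motion model.

    A measurement of None models a frame in which the detector
    missed the worker; the filter then only predicts.
    """
    x, p = x0, p0  # state estimate and its variance
    estimates: List[float] = []
    for z in measurements:
        # Predict: the state is carried over, the uncertainty grows.
        p = p + process_var
        if z is not None:
            # Update: blend prediction and measurement by the Kalman gain.
            k = p / (p + meas_var)
            x = x + k * (z - x)
            p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# The second frame has no detection, so the estimate is simply held.
print(kalman_1d([1.0, None, 1.0], process_var=0.0))
```

A tracker for real video would normally use a multi-dimensional state (for example position and velocity in image coordinates), but the predict and update steps follow the same pattern.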

19 pages, 3262 KiB

Open Access Article

Predicting and Interpreting Students’ Grades in Distance Higher Education through a Semi-Regression Method

by Stamatis Karlos, Georgios Kostopoulos and Sotiris Kotsiantis

Appl. Sci. 2020, 10(23), 8413; https://doi.org/10.3390/app10238413 - 26 Nov 2020

Cited by 21 | Viewed by 2541

Abstract

Multi-view learning is a machine learning approach aiming to exploit the knowledge retrieved from data represented by multiple feature subsets known as views. Co-training is considered the most representative form of multi-view learning, a very effective semi-supervised classification algorithm for building highly accurate and robust predictive models. Even though it has been implemented in various scientific fields, it has not been adequately used in educational data mining and learning analytics, since the hypothesis about the existence of two feature views cannot be easily implemented. Some notable studies have emerged recently dealing with semi-supervised classification tasks, such as student performance or student dropout prediction, while semi-supervised regression is uncharted territory. Therefore, the present study attempts to implement a semi-regression algorithm for predicting the grades of undergraduate students in the final exams of a one-year online course, which exploits three independent and naturally formed feature views, since they are derived from different sources. Moreover, we examine a well-established framework for interpreting the acquired results regarding their contribution to the final outcome per student/instance. To this end, a plethora of experiments is conducted based on data offered by the Hellenic Open University and representative machine learning algorithms. The experimental results demonstrate that the early prognosis of students at risk of failure can be accurately achieved compared to supervised models, even for a small amount of initially collected data from the first two semesters. The robustness of the applied semi-supervised regression scheme along with supervised learners and the investigation of features’ reasoning could highly benefit the educational domain.

(This article belongs to the Special Issue Applied Machine Learning)

12 pages, 2140 KiB

Open Access Article

Heatwave Damage Prediction Using Random Forest Model in Korea

by Minsoo Park, Daekyo Jung, Seungsoo Lee and Seunghee Park

Appl. Sci. 2020, 10(22), 8237; https://doi.org/10.3390/app10228237 - 20 Nov 2020

Cited by 17 | Viewed by 3396

Abstract

Climate change increases the frequency and intensity of heatwaves, causing significant human and material losses every year. Big data, whose volumes are rapidly increasing, are expected to be used for preemptive responses. However, human cognitive abilities are limited, which can lead to ineffective decision making during disaster responses when artificial intelligence-based analysis models are not employed. Existing prediction models have limitations with regard to their validation, and most models focus only on heat-associated deaths. In this study, a random forest model was developed for the weekly prediction of heat-related damages on the basis of four years (2015–2018) of statistical, meteorological, and floating population data from South Korea. The model was evaluated through comparisons with other traditional regression models in terms of mean absolute error, root mean squared error, root mean squared logarithmic error, and coefficient of determination (R²). In a comparative analysis with observed values, the proposed model showed an R² value of 0.804. The results show that the proposed model outperforms existing models. They also show that the floating population variable collected from mobile global positioning systems contributes more to predictions than the aggregate population variable.

(This article belongs to the Special Issue Applied Machine Learning)
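For readers unfamiliar with the four evaluation measures named in the abstract above, the following sketch computes them from their standard textbook definitions. The function name `regression_metrics` and the toy numbers are illustrative assumptions and are not taken from the paper; RMSLE is written here with log(1 + y), which requires non-negative values.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MAE, RMSE, RMSLE and R^2 for paired observations."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # RMSLE compares log-scaled values, so it penalises relative error.
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / n
    )
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"mae": mae, "rmse": rmse, "rmsle": rmsle, "r2": r2}


# Toy data, chosen only to make the arithmetic easy to follow.
print(regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Lower is better for MAE, RMSE and RMSLE, while R² approaches 1 as predictions explain more of the variance in the observations.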

20 pages, 1354 KiB

Open Access Article

Prediction of Stock Performance Using Deep Neural Networks

by Yanlei Gu, Takuya Shibukawa, Yohei Kondo, Shintaro Nagao and Shunsuke Kamijo

Appl. Sci. 2020, 10(22), 8142; https://doi.org/10.3390/app10228142 - 17 Nov 2020

Cited by 11 | Viewed by 3997

Abstract

Stock performance prediction is one of the most challenging issues in time series data analysis. Machine learning models have been widely used to predict financial time series during the past decades. Even though automatic trading systems that use Artificial Intelligence (AI) have become a commonplace topic, there are few examples that successfully leverage the proven methods developed by human stock traders to build automatic trading systems. This study proposes to build an automatic trading system by integrating AI with these proven methods. First, the knowledge and experience of successful stock traders are extracted from their related publications. After that, a Long Short-Term Memory-based deep neural network is developed to use the human stock traders’ knowledge in the automatic trading system. Four different strategies are developed for stock performance prediction, and feature selection is performed to achieve the best performance in the classification of well-performing stocks. Finally, the proposed deep neural network is trained and evaluated on historical data from the Japanese stock market. Experimental results indicate that the proposed ranking-based stock classification strategy that considers historical volatility performs best among the four developed strategies. This method can achieve about a 20% earning rate per year over the baseline of all stocks and has a lower risk than the baseline. Comparison experiments also show that the proposed method outperforms conventional methods.

(This article belongs to the Special Issue Applied Machine Learning)

17 pages, 4282 KiB

Open Access Article

An Improvement on Estimated Drifter Tracking through Machine Learning and Evolutionary Search

by Yong-Wook Nam, Hwi-Yeon Cho, Do-Youn Kim, Seung-Hyun Moon and Yong-Hyuk Kim

Appl. Sci. 2020, 10(22), 8123; https://doi.org/10.3390/app10228123 - 16 Nov 2020

Cited by 3 | Viewed by 2115

Abstract

In this study, we estimated drifter tracking over seawater using machine learning and evolutionary search techniques. The parameters used for the prediction are the hourly position of the drifter, the wind velocity, and the flow velocity at each drifter position. Our prediction model was constructed through cross-validation. Trajectories were affected by wind velocity and flow velocity from the starting points of drifters. Mean absolute error (MAE) and normalized cumulative Lagrangian separation (NCLS) were used to evaluate various prediction models. The radial basis function network showed the lowest MAE of 0.0556, an improvement of 35.20% over the numerical model MOHID. Long short-term memory showed the highest NCLS of 0.8762, an improvement of 6.24% over MOHID.

(This article belongs to the Special Issue Applied Machine Learning)

17 pages, 398 KiB

Open Access Article

PreNNsem: A Heterogeneous Ensemble Learning Framework for Vulnerability Detection in Software

by Lu Wang, Xin Li, Ruiheng Wang, Yang Xin, Mingcheng Gao and Yulin Chen

Appl. Sci. 2020, 10(22), 7954; https://doi.org/10.3390/app10227954 - 10 Nov 2020

Cited by 10 | Viewed by 2259

Abstract

Automated vulnerability detection is one of the critical issues in the realm of software security. Existing solutions to this problem are mostly based on features defined by human experts, which directly leads to missed potential vulnerabilities. Deep learning is an effective method for automating the extraction of vulnerability characteristics. Our paper proposes intelligent and automated vulnerability detection using deep representation learning and heterogeneous ensemble learning. Firstly, we transform sample data from source code by removing segments that are unrelated to the vulnerability in order to reduce code analysis and improve detection efficiency in our experiments. Secondly, we represent the sample data as real vectors by pre-training on the corpus and maintaining its semantic information. Thirdly, the vectors are fed to a deep learning model to obtain the features of vulnerability. Lastly, we train a heterogeneous ensemble classifier. We analyze the effectiveness and resource consumption of different network models, pre-training methods, classifiers, and vulnerabilities separately in order to evaluate the detection method. We also compare our approach with some well-known commercial vulnerability detection tools and academic methods. The experimental results show that our proposed method provides improvements in false positive rate, false negative rate, precision, recall, and F1 score.

(This article belongs to the Special Issue Applied Machine Learning)

15 pages, 427 KiB

Open AccessArticle

Gaussian Mixture Models for Detecting Sleep Apnea Events Using Single Oronasal Airflow Record

by Hisham ElMoaqet, Jungyoon Kim, Dawn Tilbury, Satya Krishna Ramachandran, Mutaz Ryalat and Chao-Hsien Chu

Appl. Sci. 2020, 10(21), 7889; https://doi.org/10.3390/app10217889 - 06 Nov 2020

Cited by 22 | Viewed by 2327

Abstract

Sleep apnea is a common sleep-related disorder that significantly affects the population. It is characterized by repeated breathing interruption during sleep. Such events can induce hypoxia, which is a risk factor for multiple cardiovascular and cerebrovascular diseases. Polysomnography, the gold standard, is expensive, inaccessible, uncomfortable and an expert technician is needed to score sleep-related events. To address these limitations, many previous studies have proposed and implemented automatic scoring processes based on fewer sensors and machine learning classification algorithms. However, alternative device technologies developed for both home and hospital still have limited diagnostic accuracy for detecting apnea events even though many of the previous investigational algorithms are based on multiple physiological channel inputs. In this paper, we propose a new probabilistic algorithm based on (only) oronasal respiration signal for automated detection of apnea events during sleep. The proposed model leverages AASM recommendations for characterizing apnea events with respect to dynamic changes in the local respiratory airflow baseline. Unlike classical threshold-based classification models, we use a Gaussian mixture probability model for detecting sleep apnea based on the posterior probabilities of the respective events. Our results show significant improvement in the ability to detect sleep apnea events compared to a rule-based classifier that uses the same classification features and also compared to two previously published studies for automated apnea detection using the same respiratory flow signal. We use 96 sleep patients with different apnea severity levels as reflected by their Apnea-Hypopnea Index (AHI) levels. The performance was not only analyzed over obstructive sleep apnea (OSA) but also over other types of sleep apnea events including central and mixed sleep apnea (CSA, MSA). 
Also the performance was comprehensively analyzed and evaluated over patients with varying disease severity conditions, where it achieved an overall performance of TPR = 88.5%, TNR = 82.5%, and AUC = 86.7%. The proposed approach contributes a new probabilistic framework for detecting sleep apnea events using a single airflow record with an improved capability to generalize over different apnea severity conditions. Full article
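
To illustrate the idea of classifying by posterior probability rather than by a fixed threshold, the following is a minimal sketch, not the authors' implementation. It assumes a single scalar feature (here, airflow amplitude relative to the local baseline) and one Gaussian mixture per class. The mixture parameters, the prior, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical mixtures: lists of (weight, mean, std) for a scalar feature,
# e.g. airflow amplitude as a fraction of the local baseline (assumption).
APNEA_GMM = [(0.6, 0.10, 0.05), (0.4, 0.25, 0.08)]
NORMAL_GMM = [(0.7, 0.90, 0.10), (0.3, 1.10, 0.15)]


def gauss_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a Gaussian mixture given (weight, mean, std) triples."""
    return sum(w * gauss_pdf(x, m, s) for w, m, s in components)


def posterior_apnea(x, apnea_gmm, normal_gmm, prior_apnea=0.5):
    """Bayes' rule: P(apnea | x) from class-conditional mixture densities."""
    p_apnea = prior_apnea * mixture_pdf(x, apnea_gmm)
    p_normal = (1.0 - prior_apnea) * mixture_pdf(x, normal_gmm)
    return p_apnea / (p_apnea + p_normal)
```

A segment would then be flagged as an apnea event when `posterior_apnea(...)` exceeds a chosen probability level, which lets the decision boundary adapt to the fitted distributions instead of a hand-set amplitude cutoff.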

(This article belongs to the Special Issue Applied Machine Learning)

16 pages, 5908 KiB

Open AccessArticle

A Machine Learning-Assisted Numerical Predictor for Compressive Strength of Geopolymer Concrete Based on Experimental Data and Sensitivity Analysis

by An Thao Huynh, Quang Dang Nguyen, Qui Lieu Xuan, Bryan Magee, TaeChoong Chung, Kiet Tuan Tran and Khoa Tan Nguyen

Appl. Sci. 2020, 10(21), 7726; https://doi.org/10.3390/app10217726 - 31 Oct 2020

Cited by 43 | Viewed by 3652

Abstract

Geopolymer concrete offers a favourable alternative to conventional Portland concrete due to its reduced embodied carbon dioxide (CO₂) content. Engineering properties of geopolymer concrete, such as compressive strength, are commonly characterised based on experimental practices requiring large volumes of raw materials, time for sample preparation, and costly equipment. To help address this inefficiency, this study proposes machine learning-assisted numerical methods to predict compressive strength of fly ash-based geopolymer (FAGP) concrete. Methods assessed included artificial neural network (ANN), deep neural network (DNN), and deep residual network (ResNet), based on experimentally collected data. Performance of the proposed approaches was evaluated using various statistical measures including R-squared (R²), root mean square error (RMSE), and mean absolute percentage error (MAPE). Sensitivity analysis was carried out to identify effects of the following six input variables on the compressive strength of FAGP concrete: sodium hydroxide/sodium silicate ratio, fly ash/aggregate ratio, alkali activator/fly ash ratio, concentration of sodium hydroxide, curing time, and temperature. Fly ash/aggregate ratio was found to significantly affect compressive strength of FAGP concrete. Results obtained indicate that the proposed approaches offer reliable methods for FAGP design and optimisation. Of note was ResNet, which demonstrated the highest R² and lowest RMSE and MAPE values. Full article
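
The three statistical measures named in this abstract have standard definitions, sketched below in plain Python. The function names are illustrative assumptions, and MAPE is expressed here as a percentage; some papers report it as a fraction instead.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A model that fits well shows R² close to 1 and RMSE and MAPE close to 0, which is the sense in which one model can have "the highest R² and lowest RMSE and MAPE".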

(This article belongs to the Special Issue Applied Machine Learning)

18 pages, 370 KiB

Open AccessArticle

Answer Set Programming for Regular Inference

by Wojciech Wieczorek, Tomasz Jastrzab and Olgierd Unold

Appl. Sci. 2020, 10(21), 7700; https://doi.org/10.3390/app10217700 - 30 Oct 2020

Cited by 2 | Viewed by 1815

Abstract

We propose an approach to non-deterministic finite automaton (NFA) inductive synthesis that is based on answer set programming (ASP) solvers. To that end, we explain how an NFA and its response to input samples can be encoded as rules in a logic program. We then ask an ASP solver to find an answer set for the program, which we use to extract the automaton of the required size. We conduct a series of experiments on some benchmark sets, using the implementation of our approach. The results show that our method outperforms, in terms of CPU time, a SAT approach and other exact algorithms on all benchmarks. Full article

(This article belongs to the Special Issue Applied Machine Learning)

14 pages, 2527 KiB

Open AccessArticle

Selection of Support Vector Candidates Using Relative Support Distance for Sustainability in Large-Scale Support Vector Machines

by Minho Ryu and Kichun Lee

Appl. Sci. 2020, 10(19), 6979; https://doi.org/10.3390/app10196979 - 06 Oct 2020

Cited by 2 | Viewed by 1888

Abstract

Support vector machines (SVMs) are a well-known classifier due to their superior classification performance. They are defined by a hyperplane, which separates two classes with the largest margin. In the computation of the hyperplane, however, it is necessary to solve a quadratic programming problem. The storage cost of a quadratic programming problem grows with the square of the number of training sample points, and the time complexity is proportional to the cube of the number in general. Thus, it is worth studying how to reduce the training time of SVMs without compromising the performance to prepare for sustainability in large-scale SVM problems. In this paper, we proposed a novel data reduction method for reducing the training time by combining decision trees and relative support distance. We applied a new concept, relative support distance, to select good support vector candidates in each partition generated by the decision trees. The selected support vector candidates improved the training speed for large-scale SVM problems. In experiments, we demonstrated that our approach significantly reduced the training time while maintaining good classification performance in comparison with existing approaches. Full article

(This article belongs to the Special Issue Applied Machine Learning)

26 pages, 7714 KiB

Open AccessArticle

Application of Machine Learning Techniques to Delineate Homogeneous Climate Zones in River Basins of Pakistan for Hydro-Climatic Change Impact Studies

by Ammara Nusrat, Hamza Farooq Gabriel, Sajjad Haider, Shakil Ahmad, Muhammad Shahid and Saad Ahmed Jamal

Appl. Sci. 2020, 10(19), 6878; https://doi.org/10.3390/app10196878 - 01 Oct 2020

Cited by 9 | Viewed by 4157

Abstract

Climatic data archives, including grid-based remote-sensing and general circulation model (GCM) data, are used to identify future climate change trends. The performances of climate models vary in regions with spatio-temporal climatic heterogeneities because of uncertainties in model equations, anthropogenic forcing or climate variability. Hence, GCMs should be selected from climatically homogeneous zones. This study presents a framework for selecting GCMs and detecting future climate change trends after regionalizing the Indus river sub-basins in three basic steps: (1) regionalization of large river basins, based on spatial climate homogeneities, for four seasons using different machine learning algorithms and daily gridded precipitation data for 1975–2004; (2) selection of GCMs in each homogeneous climate region based on performance to simulate past climate and its temporal distribution pattern; (3) detecting future precipitation change trends using projected data (2006–2099) from the selected model for two future scenarios. The comprehensive framework, subject to some limitations and assumptions, provides divisional boundaries for the climatic zones in the study area, suitable GCMs for climate change impact projections for adaptation studies and spatially mapped precipitation change trend projections for four seasons. Thus, the importance of machine learning techniques for different types of analyses and managing long-term data is highlighted. Full article

(This article belongs to the Special Issue Applied Machine Learning)

18 pages, 4053 KiB

Open AccessArticle

Data-Driven Real-Time Online Taxi-Hailing Demand Forecasting Based on Machine Learning Method

by Zhizhen Liu, Hong Chen, Xiaoke Sun and Hengrui Chen

Appl. Sci. 2020, 10(19), 6681; https://doi.org/10.3390/app10196681 - 24 Sep 2020

Cited by 17 | Viewed by 3136

Abstract

The development of the intelligent transport system has created conditions for solving the supply–demand imbalance of public transportation services. For example, forecasting the demand for online taxi-hailing could help to rebalance taxi resources. In this research, we introduce a method to forecast real-time online taxi-hailing demand. First, we analyze the relation between taxi demand and online taxi-hailing demand. Next, we propose six models containing different information based on backpropagation neural network (BPNN) and extreme gradient boosting (XGB) to forecast online taxi-hailing demand. Finally, we present a real-time online taxi-hailing demand forecasting model considering the projected taxi demand (“PTX”). The results indicate that including more information leads to better prediction performance: including the projected taxi demand reduces MAPE from 0.190 to 0.183 and RMSE from 23.921 to 21.050, and increases R² from 0.845 to 0.853. The analysis indicates the demand regularity of online taxi-hailing and taxi, and the experiment realizes real-time prediction of online taxi-hailing by considering the projected taxi demand. The proposed method can help to schedule online taxi-hailing resources in advance. Full article

(This article belongs to the Special Issue Applied Machine Learning)

15 pages, 4286 KiB

Open AccessArticle

Detection of Precipitation and Fog Using Machine Learning on Backscatter Data from Lidar Ceilometer

by Yong-Hyuk Kim, Seung-Hyun Moon and Yourim Yoon

Appl. Sci. 2020, 10(18), 6452; https://doi.org/10.3390/app10186452 - 16 Sep 2020

Cited by 5 | Viewed by 2472

Abstract

The lidar ceilometer estimates cloud height by analyzing backscatter data. This study examines weather detectability using a lidar ceilometer by making an unprecedented attempt at detecting weather phenomena through the application of machine learning techniques to the backscatter data obtained from a lidar ceilometer. This study investigates the weather phenomena of precipitation and fog, which are expected to greatly affect backscatter data. In this experiment, the backscatter data obtained from the lidar ceilometer, CL51, installed in Boseong, South Korea, were used. For validation, the data from the automatic weather station for precipitation and visibility sensor PWD20 for fog, installed at the same location, were used. The experimental results showed potential for precipitation detection, which yielded an F1 score of 0.34. However, fog detection was found to be very difficult and yielded an F1 score of 0.10. Full article

(This article belongs to the Special Issue Applied Machine Learning)

24 pages, 9002 KiB

Open AccessArticle

Remote Sensing Scene Classification and Explanation Using RSSCNet and LIME

by Sheng-Chieh Hung, Hui-Ching Wu and Ming-Hseng Tseng

Appl. Sci. 2020, 10(18), 6151; https://doi.org/10.3390/app10186151 - 04 Sep 2020

Cited by 26 | Viewed by 3304

Abstract

Classification is needed in disaster investigation, traffic control, and land-use resource management. How to quickly and accurately classify such remote sensing imagery has become a popular research topic. However, the application of large, deep neural network models for the training of classifiers in the hope of obtaining good classification results is often very time-consuming. In this study, a new CNN (convolutional neural network) architecture, i.e., RSSCNet (remote sensing scene classification network), with high generalization capability was designed. Moreover, a two-stage cyclical learning rate policy and the no-freezing transfer learning method were developed to speed up model training and enhance accuracy. In addition, the manifold learning t-SNE (t-distributed stochastic neighbor embedding) algorithm was used to verify the effectiveness of the proposed model, and the LIME (local interpretable model-agnostic explanations) algorithm was applied to improve the results in cases where the model made wrong predictions. Comparing the results on three publicly available datasets in this study with those obtained in previous studies, the experimental results show that the model and method proposed in this paper can achieve better scene classification more quickly and more efficiently. Full article

(This article belongs to the Special Issue Applied Machine Learning)

17 pages, 1276 KiB

Open AccessArticle

Solving Partial Differential Equations Using Deep Learning and Physical Constraints

by Yanan Guo, Xiaoqun Cao, Bainian Liu and Mei Gao

Appl. Sci. 2020, 10(17), 5917; https://doi.org/10.3390/app10175917 - 26 Aug 2020

Cited by 59 | Viewed by 14570

Abstract

The various studies of partial differential equations (PDEs) are hot topics of mathematical research. Among them, solving PDEs is a very important and difficult task. Since many partial differential equations do not have analytical solutions, numerical methods are widely used to solve PDEs. Although numerical methods have been widely used with good performance, researchers are still searching for new methods for solving partial differential equations. In recent years, deep learning has achieved great success in many fields, such as image classification and natural language processing. Studies have shown that deep neural networks have powerful function-fitting capabilities and have great potential in the study of partial differential equations. In this paper, we introduce an improved Physics Informed Neural Network (PINN) for solving partial differential equations. PINN takes the physical information that is contained in partial differential equations as a regularization term, which improves the performance of neural networks. In this study, we use the method to study the wave equation, the KdV–Burgers equation, and the KdV equation. The experimental results show that PINN is effective in solving partial differential equations and deserves further research. Full article
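
The idea of using the PDE itself as a regularization term can be sketched as a loss with two parts: a data-fitting term and a penalty on the PDE residual at sample (collocation) points. The sketch below is an illustration under stated assumptions, not the paper's method: it uses the one-dimensional wave equation u_tt = c² u_xx, approximates derivatives with central finite differences (a real PINN would differentiate the network by automatic differentiation), and takes `u` to be any Python callable standing in for the network. All names and the weight `lam` are hypothetical.

```python
import math


def wave_residual(u, x, t, c=1.0, h=1e-3):
    """Residual u_tt - c^2 * u_xx of the 1D wave equation at (x, t).

    Derivatives are approximated by central finite differences here;
    this is a stand-in for automatic differentiation.
    """
    u_tt = (u(x, t + h) - 2.0 * u(x, t) + u(x, t - h)) / h**2
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h**2
    return u_tt - c**2 * u_xx


def pinn_style_loss(u, data, collocation, c=1.0, lam=1.0):
    """Mean squared data error plus lam times mean squared PDE residual.

    data: list of (x, t, target) observations.
    collocation: list of (x, t) points where the PDE is enforced.
    """
    data_loss = sum((u(x, t) - y) ** 2 for x, t, y in data) / len(data)
    physics_loss = sum(
        wave_residual(u, x, t, c) ** 2 for x, t in collocation
    ) / len(collocation)
    return data_loss + lam * physics_loss


# A travelling wave sin(x - c*t) satisfies the wave equation for c = 1,
# so its loss is close to zero; sin(x - 2t) violates it and is penalized.
exact = lambda x, t: math.sin(x - t)
wrong = lambda x, t: math.sin(x - 2.0 * t)
points = [(0.5, 0.1), (1.0, 0.3), (1.5, 0.2)]
observations = [(x, t, exact(x, t)) for x, t in points]
```

During training, minimizing such a loss pushes the candidate solution toward functions that both match the available data and satisfy the governing equation.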

(This article belongs to the Special Issue Applied Machine Learning)

28 pages, 1043 KiB

Open AccessArticle

Towards the Discovery of Influencers to Follow in Micro-Blogs (Twitter) by Detecting Topics in Posted Messages (Tweets)

by Mubashir Ali, Anees Baqir, Giuseppe Psaila and Sayyam Malik

Appl. Sci. 2020, 10(16), 5715; https://doi.org/10.3390/app10165715 - 18 Aug 2020

Cited by 8 | Viewed by 4390

Abstract

Micro-blogs, such as Twitter, have become important tools to share opinions and information among users. Messages concerning any topic are posted daily. A message posted by a given user reaches all the users that decided to follow her/him. Some users post many messages because they aim at being recognized as influencers, typically on specific topics. How can a user discover influencers concerned with her/his interests? Micro-blog apps and web sites lack functionality for recommending influencers to users on the basis of the content of posted messages. In this paper, we envision such a scenario and we identify the problem that constitutes the basic building block for developing a recommender of (possibly influencer) users: training a classification model by exploiting messages labeled with topical classes, so that this model can be used to classify unlabeled messages and let the hidden topic they talk about emerge. Specifically, the paper reports the investigation activity we performed to demonstrate the suitability of our idea. To perform the investigation, we developed an investigation framework that exploits various patterns for extracting features from within messages (labeled with topical classes) in conjunction with the most commonly used classifiers for text classification problems. By means of the investigation framework, we were able to perform a large pool of experiments that allowed us to evaluate all the combinations of feature patterns with classifiers. By means of a cost-benefit function called “Suitability”, which combines accuracy with execution time, we were able to demonstrate that a technique for discovering topics from within messages suitable for the application context is available. Full article

(This article belongs to the Special Issue Applied Machine Learning)

19 pages, 7567 KiB

Open AccessArticle

Fast Self-Adaptive Digital Camouflage Design Method Based on Deep Learning

by Houdi Xiao, Zhipeng Qu, Mingyun Lv, Yi Jiang, Chuanzhi Wang and Ruiru Qin

Appl. Sci. 2020, 10(15), 5284; https://doi.org/10.3390/app10155284 - 30 Jul 2020

Cited by 16 | Viewed by 6579

Abstract

Traditional digital camouflage is mainly designed for a single background and state. Its camouflage performance is appealing in the specified time and place, but with a change of place, season, or time, its camouflage performance is greatly weakened. Therefore, camouflage technology that can change with the environment in real time is the inevitable development direction of the military camouflage field. In this paper, a fast self-adaptive digital camouflage design method based on deep learning is proposed for the new generation of adaptive optical camouflage. Firstly, we trained a YOLOv3 model that could identify four typical military targets with a mean average precision (mAP) of 91.55%. Secondly, a pre-trained deepfillv1 model was used to design the preliminary camouflage texture. Finally, the preliminary camouflage texture was standardized by the k-means algorithm. The experimental results show that the camouflage pattern designed by our proposed method is consistent with the background in texture and semantics, and has excellent camouflage performance in optical camouflage. Meanwhile, the whole pattern generation process takes less than 0.4 s, which meets the design requirements of future near-real-time camouflage. Full article

(This article belongs to the Special Issue Applied Machine Learning)

21 pages, 4849 KiB

Open AccessArticle

Research on Sentiment Classification of Online Travel Review Text

by Wen Chen, Zhiyun Xu, Xiaoyao Zheng, Qingying Yu and Yonglong Luo

Appl. Sci. 2020, 10(15), 5275; https://doi.org/10.3390/app10155275 - 30 Jul 2020

Cited by 33 | Viewed by 5012

Abstract

In recent years, the number of review texts on online travel review sites has increased dramatically, which has provided a novel source of data for travel research. Sentiment analysis is a process that can extract tourists’ sentiments regarding travel destinations from online travel review texts. The results of sentiment analysis form an important basis for tourism decision making. Thus far, there has been minimal concern as to how sentiment analysis methods can be effectively applied to improve the effect of sentiment analysis. However, online travel review texts are largely short texts characterized by uneven sentiment distribution, which makes it difficult to obtain accurate sentiment analysis results. Accordingly, in order to improve the sentiment classification accuracy of online travel review texts, this study transformed sentiment analysis into a multi-classification problem based on machine learning methods, and further designed a keyword semantic expansion method based on a knowledge graph. Our proposed method extracts keywords from online travel review texts and obtains the concept list of keywords through Microsoft Knowledge Graph. This list is then added to the review text to facilitate the construction of semantically expanded classification data. Our proposed method increases the number of classification features used for short text by employing the huge corpus of information associated with the knowledge graph. In addition, this article introduces online travel review text preprocessing, keyword extraction, text representation, sampling, establishment classification labeling, and the selection and application of machine learning-based sentiment classification methods in order to build an effective sentiment classification model for online travel review text. Experiments were implemented and evaluated based on the English review texts of four famous attractions in four countries on the TripAdvisor website. 
Our experimental results demonstrate that the method proposed in this paper can be used to effectively improve the accuracy of the sentiment classification of online travel review texts. Our research attempts to emphasize and improve the methodological relevance and applicability of sentiment analysis for future travel research. Full article

(This article belongs to the Special Issue Applied Machine Learning)

31 pages, 987 KiB

Open AccessArticle

Automatic Identification of Local Features Representing Image Content with the Use of Convolutional Neural Networks

by Paweł Tarasiuk, Arkadiusz Tomczyk and Bartłomiej Stasiak

Appl. Sci. 2020, 10(15), 5186; https://doi.org/10.3390/app10155186 - 28 Jul 2020

Cited by 3 | Viewed by 2015

Abstract

Image analysis has many practical applications and proper representation of image content is its crucial element. In this work, a novel type of representation is proposed where an image is reduced to a set of highly sparse matrices. Equivalently, it can be viewed as a set of local features of different types, as precise coordinates of detected keypoints are given. Additionally, every keypoint has a value expressing feature intensity at a given location. These features are extracted from a dedicated convolutional neural network autoencoder. This kind of representation has many advantages. First of all, local features are not manually designed but are automatically trained for a given class of images. Second, as they are trained in a network that restores its input on the output, they may be expected to minimize information loss. Consequently, they can be used to solve similar tasks replacing original images; such an ability was illustrated with image classification task. Third, the generated features, although automatically synthesized, are relatively easy to interpret. Taking a decoder part of our network, one can easily generate a visual building block connected with a specific feature. As the proposed method is entirely new, a detailed analysis of its properties for a relatively simple data set was conducted and is described in this work. Moreover, to present the quality of trained features, it is compared with results of convolutional neural networks having a similar working principle (sparse coding). Full article

(This article belongs to the Special Issue Applied Machine Learning)

16 pages, 714 KiB

Open AccessArticle

The Efficiency of Social Network Services Management in Organizations. An In-Depth Analysis Applying Machine Learning Algorithms and Multiple Linear Regressions

by Luis Matosas-López and Alberto Romero-Ania

Appl. Sci. 2020, 10(15), 5167; https://doi.org/10.3390/app10155167 - 27 Jul 2020

Cited by 9 | Viewed by 2498

Abstract

The objective of this work is to detect the variables that allow organizations to manage their social network services efficiently. The study, applying machine learning algorithms and multiple linear regressions, reveals which aspects of published content increase the recognition of publications through retweets and favorites. The authors examine (I) the characteristics of the content (publication volumes, publication components, and publication moments) and (II) the message of the content (publication topics). The research considers 21,771 publications and thirty-nine variables. The results show that the recognition obtained through retweets and favorites is conditioned both by the characteristics of the content and by the message of the content. The recognition through retweets improves when the organization uses links, hashtags, and topics related to gender equality, whereas the recognition through favorites increases when the organization uses original tweets, publications between 8:00 and 10:00 a.m. and, again, gender equality related topics. The findings of this research provide new knowledge about trends and patterns of use in social media, providing academics and professionals with the necessary guidelines to efficiently manage these technologies in the organizational field. Full article

(This article belongs to the Special Issue Applied Machine Learning)

15 pages, 6192 KiB

Open AccessArticle

Environment Classification for Unmanned Aerial Vehicle Using Convolutional Neural Networks

by Carlos Villaseñor, Alberto A. Gallegos, Javier Gomez-Avila, Gehová López-González, Jorge D. Rios and Nancy Arana-Daniel

Appl. Sci. 2020, 10(14), 4991; https://doi.org/10.3390/app10144991 - 20 Jul 2020

Cited by 2 | Viewed by 2150

Abstract

Environment classification is one of the most critical tasks for Unmanned Aerial Vehicles (UAVs). Since water accumulation may destabilize a UAV, clouds must be detected and avoided. In a previous work presented by the authors, Superpixel Segmentation (SPS) descriptors with low computational cost were used to classify ground, sky, and clouds. In this paper, an enhanced approach to classify the environment into those three classes is presented. The proposed scheme consists of a Convolutional Neural Network (CNN) trained with a dataset generated by both a human expert and a Support Vector Machine (SVM) to capture context and precise localization. The advantage of this approach is that the CNN classifies each pixel, instead of a cluster as in SPS, which improves the resolution of the classification. It is also less tedious for the human expert, who generates only a few training samples instead of the amount normally required. This proposal is implemented for images obtained from video and photographic cameras mounted on a UAV and facing in the direction of flight. Experimental results and a comparison with other approaches are shown to demonstrate the effectiveness of the algorithm. Full article

(This article belongs to the Special Issue Applied Machine Learning)

15 pages, 1090 KiB

Open AccessArticle

Prediction of Academic Performance at Undergraduate Graduation: Course Grades or Grade Point Average?

by Ahmet Emin Tatar and Dilek Düştegör

Appl. Sci. 2020, 10(14), 4967; https://doi.org/10.3390/app10144967 - 19 Jul 2020

Cited by 25 | Viewed by 4854

Abstract

Predicting the academic standing of a student at graduation time can be very useful, for example, in helping institutions select among candidates, or in helping potentially weak students overcome educational challenges. Most studies use individual course grades to represent college performance, with a recent trend towards using grade point average (GPA) per semester. It is unknown, however, which of these representations yields the best predictive power, due to the lack of a comparative study. To answer this question, a case study is conducted that generates two sets of classification models, using individual course grades and GPAs, respectively. Comprehensive sets of experiments are conducted, spanning different student data, using several well-known machine learning algorithms, and trying various prediction window sizes. Results show that using course grades yields better accuracy if the prediction is done before the third term, whereas using GPAs achieves better accuracy otherwise. Most importantly, variance analysis on the experiment results reveals easily generalizable insights: using individual course grades with a short prediction window induces noise, and using GPAs with a long prediction window causes over-simplification. The demonstrated analytical approach can be applied to any dataset to determine when to use which college performance representation for enhanced prediction. Full article

(This article belongs to the Special Issue Applied Machine Learning)

18 pages, 9007 KiB

Open AccessArticle

Using Synthetic Data to Improve and Evaluate the Tracking Performance of Construction Workers on Site

by Marcel Neuhausen, Patrick Herbers and Markus König

Appl. Sci. 2020, 10(14), 4948; https://doi.org/10.3390/app10144948 - 18 Jul 2020

Cited by 18 | Viewed by 2819

Abstract

Vision-based tracking systems enable the optimization of productivity and safety management on construction sites by monitoring the workers’ movements. However, training and evaluation of such a system requires a vast amount of data. Sufficient datasets rarely exist for this purpose. We investigate the use of synthetic data to overcome this issue. Using 3D computer graphics software, we model virtual construction site scenarios. These are rendered for use as a synthetic dataset which augments a self-recorded real world dataset. Our approach is verified by means of a tracking system. For this, we train a YOLOv3 detector identifying pedestrian workers. Kalman filtering is applied to the detections to track them over consecutive video frames. First, the detector’s performance is examined when using synthetic data of various environmental conditions for training. Second, we compare the evaluation results of our tracking system on real world and synthetic scenarios. With an increase of about 7.5 percentage points in mean average precision, our findings show that a synthetic extension is beneficial for otherwise small datasets. The similarity of synthetic and real world results allows for the conclusion that 3D scenes are an alternative to evaluate vision-based tracking systems on hazardous scenes without exposing workers to risks. Full article

(This article belongs to the Special Issue Applied Machine Learning)

28 pages, 2198 KiB

Open AccessArticle

Optimization of Warehouse Operations with Genetic Algorithms

by Mirosław Kordos, Jan Boryczko, Marcin Blachnik and Sławomir Golak

Appl. Sci. 2020, 10(14), 4817; https://doi.org/10.3390/app10144817 - 13 Jul 2020

Cited by 20 | Viewed by 7483

Abstract

We present a complete, fully automatic solution based on genetic algorithms for the optimization of discrete product placement and of order picking routes in a warehouse. The solution takes as input the warehouse structure and the list of orders and returns the optimized product placement, which minimizes the sum of the order picking times. The order picking routes are optimized mostly by genetic algorithms with a multi-parent crossover operator, but in some cases permutations and local search methods can also be used. The product placement is optimized by another genetic algorithm, where the sum of the lengths of the optimized order picking routes is used as the cost of the given product placement. We present several ideas that improve and accelerate the optimization, such as the proper number of parents in crossover, the caching procedure, multiple restart and order grouping. In the presented experiments, in comparison with random product placement and random product picking order, the optimization of order picking routes reduced the total order picking time to 54%, optimization of product placement with the basic version of the method reduced that time to 26%, and optimization of product placement with the improvements, such as multiple restart and multi-parent crossover, reduced it to 21%. Full article

(This article belongs to the Special Issue Applied Machine Learning)

16 pages, 596 KiB

Open AccessArticle

Hybrid Forecasting Models Based on the Neural Networks for the Volatility of Bitcoin

by Monghwan Seo and Geonwoo Kim

Appl. Sci. 2020, 10(14), 4768; https://doi.org/10.3390/app10144768 - 10 Jul 2020

Cited by 13 | Viewed by 3592

Abstract

In this paper, we study volatility forecasts in the Bitcoin market, which has become popular in the global market in recent years. Since volatility forecasts support the trading decisions of traders seeking a profit, volatility forecasting is an important task in the market. To improve the forecasting accuracy of Bitcoin’s volatility, we develop hybrid forecasting models combining the GARCH family models with the machine learning (ML) approach. Specifically, we adopt the Artificial Neural Network (ANN) and the Higher Order Neural Network (HONN) for the ML approach and construct the hybrid models using the outputs of the GARCH models and several relevant variables as input variables. We carry out many experiments based on the proposed models and compare the forecasting accuracy of the models. In addition, we provide the Model Confidence Set (MCS) test to identify the statistically best model. The results show that the hybrid models based on HONN provide more accurate forecasts than the other models. Full article
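
As a rough illustration of the hybrid construction described in this abstract, the sketch below computes a GARCH(1,1) conditional variance series and pairs each value with extra variables to form neural-network input rows. It is a minimal sketch, not the authors' implementation: the function names `garch11_variance` and `hybrid_inputs`, the initialisation with the sample variance, and all parameter values are assumptions made for illustration, and only the standard GARCH(1,1) recursion σ²ₜ = ω + α·r²ₜ₋₁ + β·σ²ₜ₋₁ is taken as given.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances from the standard GARCH(1,1) recursion.

    Assumption: the first variance is initialised with the sample
    variance of the returns; the abstract does not specify this.
    """
    n = len(returns)
    mean = sum(returns) / n
    initial = sum((r - mean) ** 2 for r in returns) / n
    sigma2 = [initial]
    # sigma2[t] depends on the previous return and the previous variance.
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


def hybrid_inputs(sigma2, extra_variables):
    """Build one input row per time step: GARCH output plus extra variables.

    `extra_variables` is a hypothetical list of per-step feature lists
    standing in for the "several relevant variables" of the abstract.
    """
    return [[s] + list(x) for s, x in zip(sigma2, extra_variables)]


# Illustrative values only, not data from the paper.
returns = [0.01, -0.02, 0.015]
sigma2 = garch11_variance(returns, omega=1e-5, alpha=0.1, beta=0.8)
rows = hybrid_inputs(sigma2, [[0.5], [0.7], [0.6]])
```

Each row in `rows` could then be passed to any regression model; in practice the GARCH parameters would be estimated from data rather than fixed by hand.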

(This article belongs to the Special Issue Applied Machine Learning)

16 pages, 1559 KiB

Open AccessArticle

Social Media Rumor Refuter Feature Analysis and Crowd Identification Based on XGBoost and NLP

by Zongmin Li, Qi Zhang, Yuhong Wang and Shihang Wang

Appl. Sci. 2020, 10(14), 4711; https://doi.org/10.3390/app10144711 - 08 Jul 2020

Cited by 15 | Viewed by 3194

Abstract

One prominent dark side of online information behavior is the spreading of rumors. The feature analysis and crowd identification of social media rumor refuters based on machine learning methods can shed light on the rumor refutation process. This paper analyzed the association between user features and rumor refuting behavior in five main rumor categories: economics, society, disaster, politics, and military. Natural language processing (NLP) techniques were applied to quantify the user’s sentiment tendency and recent interests. Then, those results were combined with other personalized features to train an XGBoost classification model, so that potential refuters could be identified. Information from 58,807 Sina Weibo users (including their 646,877 microblogs) for the five anti-rumor microblog categories was collected for model training and feature analysis. The results revealed that there were significant differences between rumor stiflers and refuters, as well as between refuters for different categories. Refuters tended to be more active on social media and a large proportion of them gathered in more developed regions. Tweeting history was a vital reference as well, and refuters showed higher interest in topics related to the rumor refuting message. Meanwhile, features such as gender, age, user labels and sentiment tendency also varied between refuters across categories. Full article

(This article belongs to the Special Issue Applied Machine Learning)

24 pages, 812 KiB

Open AccessArticle

Salespeople Performance Evaluation with Predictive Analytics in B2B

by Nelito Calixto and João Ferreira

Appl. Sci. 2020, 10(11), 4036; https://doi.org/10.3390/app10114036 - 11 Jun 2020

Cited by 10 | Viewed by 6423

Abstract

Performance Evaluation is a process that occurs multiple times per year in a company. During this process, the manager and the salesperson evaluate how the salesperson performed on numerous Key Performance Indicators (KPIs). To prepare the evaluation meeting, managers have to gather data from the Customer Relationship Management System, Financial Systems, and Excel files, among others, leading to a very time-consuming process. The result of the Performance Evaluation is a classification followed by actions to improve the performance where it is needed. Nowadays, through predictive analytics technologies, it is possible to make classifications based on data. In this work, the authors applied a Naive Bayes model to a dataset composed of sales from 594 salespeople over 3 years at a global freight forwarding company, to classify salespeople into pre-defined categories provided by the business. The classification is done in 3 classes: Not Performing, Good, and Outstanding. The classification was based on KPIs such as growth volume and percentage, sales variability over the year, opportunities created, customer base line, and target achievement, among others. The authors assessed the performance of the model with a confusion matrix and measures such as True Positives, True Negatives, and F1 score. The results showed an accuracy of 92.50% for the whole model. Full article
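
The evaluation measures named in this abstract can be sketched as follows for a three-class problem. This is a minimal illustration, not the authors' code: the helper names (`confusion_matrix`, `per_class_f1`, `accuracy`) and the four example predictions are hypothetical, and only the three class labels come from the abstract.

```python
from collections import Counter

# Class labels as listed in the abstract.
LABELS = ["Not Performing", "Good", "Outstanding"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_f1(matrix, labels=LABELS):
    """One-vs-rest F1 score for each class, from the confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted as class i, but wrong
        fn = sum(matrix[i]) - tp                 # class i, but predicted otherwise
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def accuracy(matrix):
    """Share of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical predictions, for illustration only.
y_true = ["Good", "Good", "Outstanding", "Not Performing"]
y_pred = ["Good", "Outstanding", "Outstanding", "Not Performing"]
matrix = confusion_matrix(y_true, y_pred)
f1 = per_class_f1(matrix)
acc = accuracy(matrix)
```

Reporting per-class F1 alongside overall accuracy is useful when the classes are imbalanced, since a high accuracy can hide poor performance on a small class.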

(This article belongs to the Special Issue Applied Machine Learning)

19 pages, 3728 KiB

Open AccessArticle

Comparison of Instance Selection and Construction Methods with Various Classifiers

by Marcin Blachnik and Mirosław Kordos

Appl. Sci. 2020, 10(11), 3933; https://doi.org/10.3390/app10113933 - 05 Jun 2020

Cited by 9 | Viewed by 3088

Abstract

Instance selection and construction methods were originally designed to improve the performance of the k-nearest neighbors classifier by increasing its speed and improving the classification accuracy. These goals were achieved by eliminating redundant and noisy samples, thus reducing the size of the training set. In this paper, the performance of instance selection methods is investigated in terms of classification accuracy and reduction of training set size. The classification accuracy of the following classifiers is evaluated: decision trees, random forest, Naive Bayes, linear model, support vector machine and k-nearest neighbors. The obtained results indicate that for most of the classifiers compressing the training set affects prediction performance and only a small group of instance selection methods can be recommended as a general purpose preprocessing step. These are learning vector quantization based algorithms, along with Drop2 and Drop3. Other methods are less efficient or provide a low compression ratio. Full article

(This article belongs to the Special Issue Applied Machine Learning)

19 pages, 3022 KiB

Open AccessArticle

Analysis of Cross-Referencing Artificial Intelligence Topics Based on Sentence Modeling

by Hosung Woo, JaMee Kim and WonGyu Lee

Appl. Sci. 2020, 10(11), 3681; https://doi.org/10.3390/app10113681 - 26 May 2020

Cited by 6 | Viewed by 3505

Abstract

Artificial intelligence (AI) is bringing about enormous changes in everyday life and today’s society. Interest in AI is continuously increasing as many countries are creating new AI-related degrees, short-term intensive courses, and secondary school programs. This study was conducted with the aim of identifying the interrelationships among topics based on the understanding of various bodies of knowledge and to provide a foundation for topic compositions to construct an academic body of knowledge of AI. To this end, machine learning-based sentence similarity measurement models used in machine translation, chatbots, and document summarization were applied to the body of knowledge of AI. Consequently, several similar topics related to agent designing in AI, such as algorithm complexity, discrete structures, fundamentals of software development, and parallel and distributed computing were identified. The results of this study provide the knowledge necessary to cultivate talent by identifying relationships with other fields in the edutech field. Full article

(This article belongs to the Special Issue Applied Machine Learning)

12 pages, 776 KiB

Open AccessArticle

Transfer Learning from Deep Neural Networks for Predicting Student Performance

by Maria Tsiakmaki, Georgios Kostopoulos, Sotiris Kotsiantis and Omiros Ragos

Appl. Sci. 2020, 10(6), 2145; https://doi.org/10.3390/app10062145 - 21 Mar 2020

Cited by 77 | Viewed by 7129

Abstract

Transferring knowledge from one domain to another has gained a lot of attention among scientists in recent years. Transfer learning is a machine learning approach aiming to exploit the knowledge retrieved from one problem for improving the predictive performance of a learning model for a different but related problem. This is particularly the case when there is a lack of data regarding a problem, but there is plenty of data about another related one. To this end, the present study intends to investigate the effectiveness of transfer learning from deep neural networks for the task of students’ performance prediction in higher education. Since building predictive models in the Educational Data Mining field through transfer learning methods has been poorly studied so far, we consider this study as an important step in this direction. Therefore, a plethora of experiments were conducted based on data originating from five compulsory courses of two undergraduate programs. The experimental results demonstrate that the prognosis of students at risk of failure can be achieved with satisfactory accuracy in most cases, provided that datasets of students who have attended other related courses are available. Full article

(This article belongs to the Special Issue Applied Machine Learning)

19 pages, 5701 KiB

Open AccessArticle

Use of Deep Multi-Target Prediction to Identify Learning Styles

by Everton Gomede, Rodolfo Miranda de Barros and Leonardo de Souza Mendes

Appl. Sci. 2020, 10(5), 1756; https://doi.org/10.3390/app10051756 - 04 Mar 2020

Cited by 26 | Viewed by 5193

Abstract

It is possible to classify students according to the manner in which they recognize, process, and store information. This classification should be considered when developing adaptive e-learning systems. It also creates a comprehension of the different styles students demonstrate while in the process of learning, which can help adaptive e-learning systems offer advice and instructions to students, teachers, administrators, and parents in order to optimize students’ learning processes. Moreover, e-learning systems using computational and statistical algorithms to analyze students’ learning may offer the opportunity to complement traditional learning evaluation methods with new ones based on analytical intelligence. In this work, we propose a method based on a deep multi-target prediction algorithm using the Felder–Silverman learning styles model to improve students’ learning evaluation using feature selection, learning styles models, and multiple target classification. As a result, we present a set of features and a model based on an artificial neural network to investigate the possibility of improving the accuracy of automatic learning styles identification. The obtained results show that learning styles allow adaptive e-learning systems to improve the learning processes of students. Full article

(This article belongs to the Special Issue Applied Machine Learning)

19 pages, 2395 KiB

Open AccessArticle

Graphs Regularized Robust Matrix Factorization and Its Application on Student Grade Prediction

by Yupei Zhang, Yue Yun, Huan Dai, Jiaqi Cui and Xuequn Shang

Appl. Sci. 2020, 10(5), 1755; https://doi.org/10.3390/app10051755 - 04 Mar 2020

Cited by 24 | Viewed by 2713

Abstract

Student grade prediction (SGP) is an important educational problem for designing personalized strategies of teaching and learning. Many studies adopt the technique of matrix factorization (MF). However, their methods often focus on the grade records regardless of the side information, such as backgrounds and relationships. To this end, in this paper, we propose a new MF method, called graph regularized robust matrix factorization (GRMF), based on the recent robust MF version. GRMF integrates two side graphs built on the side data of students and courses into the objective of robust low-rank MF. As a result, the learned features of students and courses can capture more priors from educational situations to achieve better grade prediction results. The resulting objective problem can be effectively optimized by the Majorization Minimization (MM) algorithm. In addition, GRMF not only can yield the specific features for the education domain but can also deal with missing, noisy, and corrupted data. To verify our method, we test GRMF on two public data sets for rating prediction and image recovery. Finally, we apply GRMF to educational data from our university, which is composed of 1325 students and 832 courses. The extensive experimental results show that GRMF is robust to various data problems and achieves more effective features in comparison with other methods. Moreover, GRMF also delivers higher prediction accuracy than other methods on our educational data set. This technique can facilitate personalized teaching and learning in higher education. Full article

(This article belongs to the Special Issue Applied Machine Learning)
