Applications of Machine Learning in Big Data

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (30 June 2022) | Viewed by 18627

Special Issue Information

Dear Colleagues,

In recent years, the rapid growth of storage technologies, combined with other factors such as the development of mobile networks, the digital transformation of society and the emergence of new technologies, has enabled the generation of huge volumes of data. This has led to an information explosion. In the search for patterns in such data, machine learning techniques have proven to be of great utility. However, in this context, such techniques have to be adapted to handle very large data volumes.

This Special Issue focuses on the design, adaptation and implementation of machine learning techniques in the Big Data context for solving real-life problems in areas such as finance, health care, astrophysics, physics, geoscience, e-commerce, chemistry, life sciences and education.

Prof. Dr. Miguel García-Torres
Prof. Dr. Federico Divina
Prof. Dr. Francisco A. Gómez Vela
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts are available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • big data
  • supervised learning
  • unsupervised learning
  • data preprocessing

Published Papers (6 papers)

Research

21 pages, 2838 KiB  
Article
Analysis of Electric Energy Consumption Profiles Using a Machine Learning Approach: A Paraguayan Case Study
by Félix Morales, Miguel García-Torres, Gustavo Velázquez, Federico Daumas-Ladouce, Pedro E. Gardel-Sotomayor, Francisco Gómez-Vela, Federico Divina, José Luis Vázquez Noguera, Carlos Sauer Ayala, Diego P. Pinto-Roa, Julio César Mello-Román and David Becerra-Alonso
Electronics 2022, 11(2), 267; https://doi.org/10.3390/electronics11020267 - 14 Jan 2022
Cited by 6 | Viewed by 2404
Abstract
Correctly defining and grouping electrical feeders is of great importance for electrical system operators. In this paper, we compare two clustering techniques, K-means and hierarchical agglomerative clustering, applied to real data from the eastern region of Paraguay. The raw data were pre-processed, resulting in four data sets, namely, (i) a weekly feeder demand, (ii) a monthly feeder demand, (iii) a statistical feature set extracted from the original data and (iv) a seasonal and daily consumption feature set obtained by considering the characteristics of the Paraguayan load curve. Considering the four data sets, two clustering algorithms, two distance metrics and five linkage criteria, a total of 36 models were assessed using the Silhouette, Davies–Bouldin and Calinski–Harabasz index scores. K-means with the seasonal feature data set showed the best performance on the Silhouette, Calinski–Harabasz and Davies–Bouldin validation indices, with a configuration of six clusters.
(This article belongs to the Special Issue Applications of Machine Learning in Big Data)
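
The three internal validation indices used to rank the clustering models can be illustrated with a short sketch. The Python code below is not the authors' implementation; it assumes NumPy, Euclidean distance and the standard definitions of the indices, and the function names and toy data are hypothetical. In practice, scikit-learn provides the same measures as `sklearn.metrics.silhouette_score`, `davies_bouldin_score` and `calinski_harabasz_score`.

```python
import numpy as np


def _centroids(X, labels):
    """Return the sorted cluster ids and one centroid per cluster."""
    ks = np.unique(labels)
    return ks, np.array([X[labels == k].mean(axis=0) for k in ks])


def silhouette(X, labels):
    """Mean silhouette coefficient in [-1, 1]; higher is better."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    ks = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum() - 1  # exclude the point itself
        if n_same == 0:
            continue  # singleton clusters get s(i) = 0 by convention
        a = D[i, same].sum() / n_same  # mean distance to own cluster (D[i, i] is 0)
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])  # nearest other cluster
        s[i] = (b - a) / max(a, b)
    return s.mean()


def davies_bouldin(X, labels):
    """Davies-Bouldin index; lower is better."""
    ks, C = _centroids(X, labels)
    # Average distance of each cluster's points to its own centroid.
    scatter = np.array([np.linalg.norm(X[labels == k] - C[q], axis=1).mean()
                        for q, k in enumerate(ks)])
    M = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1)  # centroid-to-centroid distances
    R = (scatter[:, None] + scatter[None, :]) / np.where(M == 0, np.inf, M)
    np.fill_diagonal(R, -np.inf)  # never compare a cluster with itself
    return R.max(axis=1).mean()


def calinski_harabasz(X, labels):
    """Calinski-Harabasz index (between/within dispersion ratio); higher is better."""
    ks, C = _centroids(X, labels)
    n, k = len(X), len(ks)
    overall = X.mean(axis=0)
    between = sum((labels == c).sum() * np.sum((C[q] - overall) ** 2) for q, c in enumerate(ks))
    within = sum(np.sum((X[labels == c] - C[q]) ** 2) for q, c in enumerate(ks))
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups, labelled correctly and arbitrarily.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
    for name, lab in (("good", np.repeat([0, 1], 20)), ("bad", np.tile([0, 1], 20))):
        print(name, silhouette(X, lab), davies_bouldin(X, lab), calinski_harabasz(X, lab))
```

Higher Silhouette and Calinski–Harabasz values and lower Davies–Bouldin values indicate better-separated clusters, so the indices are read in opposite directions when comparing models.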

21 pages, 771 KiB  
Article
ImbTreeEntropy and ImbTreeAUC: Novel R Packages for Decision Tree Learning on the Imbalanced Datasets
by Krzysztof Gajowniczek and Tomasz Ząbkowski
Electronics 2021, 10(6), 657; https://doi.org/10.3390/electronics10060657 - 11 Mar 2021
Cited by 6 | Viewed by 1726
Abstract
This paper presents two R packages, ImbTreeEntropy and ImbTreeAUC, for handling imbalanced data problems. ImbTreeEntropy applies generalized entropy functions, such as Rényi, Tsallis, Sharma–Mittal, Sharma–Taneja and Kapur, to measure the impurity of a node. ImbTreeAUC provides non-standard measures to choose an optimal split point for an attribute (as well as the optimal attribute for splitting) by employing local, semi-global and global AUC (Area Under the ROC Curve) measures. Both packages are applicable to binary and multiclass problems, and they support cost-sensitive learning, by defining a misclassification cost matrix, and weighted-sensitive learning. The packages accept all types of attributes, including continuous, ordered and nominal, where the latter type is simplified for multiclass problems to reduce computational overhead. Both packages enable optimization of the thresholds at which posterior probabilities determine the final class labels, so that misclassification costs are minimized. Model overfitting can be managed either during the growing phase or at the end using post-pruning. The packages are mainly implemented in R; however, some computationally demanding functions are written in plain C++. To speed up learning, parallel processing is also supported.
(This article belongs to the Special Issue Applications of Machine Learning in Big Data)
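
The generalized entropies listed above replace the usual Shannon or Gini impurity when scoring a candidate split. As a hedged illustration, written in Python rather than the packages' R and not reflecting their API, the sketch below implements two of them, Rényi and Tsallis, together with a size-weighted impurity decrease for a binary split. The function names are hypothetical, and the Sharma–Mittal, Sharma–Taneja and Kapur entropies are not shown.

```python
import math
from collections import Counter


def class_probs(labels):
    """Empirical class probabilities in a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def shannon(labels):
    return -sum(p * math.log(p) for p in class_probs(labels))


def renyi(labels, alpha=2.0):
    """Renyi entropy of order alpha (natural log); alpha -> 1 gives Shannon."""
    if math.isclose(alpha, 1.0):
        return shannon(labels)
    return math.log(sum(p ** alpha for p in class_probs(labels))) / (1.0 - alpha)


def tsallis(labels, q=2.0):
    """Tsallis entropy of index q; q -> 1 gives Shannon, q = 2 gives Gini impurity."""
    if math.isclose(q, 1.0):
        return shannon(labels)
    return (1.0 - sum(p ** q for p in class_probs(labels))) / (q - 1.0)


def impurity_decrease(parent, left, right, impurity):
    """Impurity reduction of a binary split, weighting each child by its size."""
    n = len(parent)
    return (impurity(parent)
            - len(left) / n * impurity(left)
            - len(right) / n * impurity(right))


if __name__ == "__main__":
    parent = ["minority"] * 2 + ["majority"] * 8   # hypothetical imbalanced node
    left, right = parent[:3], parent[3:]           # a candidate split
    for alpha in (0.5, 2.0):
        gain = impurity_decrease(parent, left, right, lambda y: renyi(y, alpha))
        print(f"Renyi alpha={alpha}: gain={gain:.4f}")
```

With q = 2 the Tsallis entropy reduces to the Gini impurity 1 - sum(p^2), and both families tend to Shannon entropy as their order approaches 1; orders below 1 weight rare classes relatively more heavily, which is one motivation for using them on imbalanced data.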

22 pages, 5795 KiB  
Article
XGB+FM for Severe Convection Forecast and Factor Selection
by Zhiying Lu, Xudong Ding, Xin Li, Haopeng Wu and Xiaolei Sun
Electronics 2021, 10(3), 321; https://doi.org/10.3390/electronics10030321 - 30 Jan 2021
Viewed by 1690
Abstract
In the field of meteorology, radiosonde data and observation data are critical for analyzing regional meteorological characteristics. Because of the high false alarm rate, severe convection forecasting is still challenging. In addition, existing methods struggle to capture the interactions among meteorological factors simultaneously. In this research, we propose XGB+FM, a cascade of extreme gradient boosting (XGBoost) for feature transformation and a factorization machine (FM) for second-order feature interaction, to capture nonlinear interactions. An attention-based bidirectional long short-term memory (Att-Bi-LSTM) network is proposed to impute the missing data of meteorological observation stations. The class imbalance problem is addressed by the support vector machines–synthetic minority oversampling technique (SVM-SMOTE), for which two oversampling strategies based on the support vector discrimination mechanism are proposed. The method is shown to be effective, with a threat score (TS) 7.27–14.28% higher than that of other methods. Moreover, we propose a meteorological factor selection method based on XGB+FM that improves forecast accuracy; this method and the forecast system are further contributions of this work.
(This article belongs to the Special Issue Applications of Machine Learning in Big Data)
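
The FM stage models second-order interactions by giving each feature a latent vector and scoring each feature pair by the dot product of their vectors. A minimal NumPy sketch of the standard FM scoring function (Rendle's formulation) is shown below. It is not the authors' XGB+FM code: it omits the XGBoost feature-transformation stage and all training, and the function and parameter names are hypothetical.

```python
import numpy as np


def fm_predict(x, w0, w, V):
    """Second-order factorization machine score for one feature vector.

    x: (n,) features, w0: bias, w: (n,) linear weights,
    V: (n, k) latent factor matrix (row i is the vector v_i of feature i).
    The pairwise term uses the identity
        sum_{i<j} <v_i, v_j> x_i x_j
          = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2],
    which costs O(n*k) instead of O(n^2 * k).
    """
    linear = w0 + w @ x
    xv = x @ V                   # (k,): sum_i v_if * x_i for each factor f
    x2v2 = (x ** 2) @ (V ** 2)   # (k,): sum_i v_if^2 * x_i^2 for each factor f
    return linear + 0.5 * np.sum(xv ** 2 - x2v2)


def fm_predict_naive(x, w0, w, V):
    """Reference version with the explicit double loop over feature pairs."""
    n = len(x)
    pairwise = sum((V[i] @ V[j]) * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return w0 + w @ x + pairwise


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, k = 8, 3                       # hypothetical sizes
    x, w, V = rng.normal(size=n), rng.normal(size=n), rng.normal(size=(n, k))
    print(fm_predict(x, 0.1, w, V), fm_predict_naive(x, 0.1, w, V))
```

The rewritten pairwise term keeps scoring linear in the number of features, which matters when the transformed feature vectors are large; `fm_predict_naive` is included only to check the identity.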

19 pages, 3799 KiB  
Article
Translating Sentimental Statements Using Deep Learning Techniques
by Yin-Fu Huang and Yi-Hao Li
Electronics 2021, 10(2), 138; https://doi.org/10.3390/electronics10020138 - 10 Jan 2021
Cited by 1 | Viewed by 1777
Abstract
Natural Language Processing (NLP) allows machines to understand natural languages and helps us perform tasks such as information retrieval, question answering, text summarization, text categorization and machine translation. To our knowledge, NLP has not been used to translate statements from negative to positive sentiment while preserving similar semantics, although human communication calls for it. This paper proposes methods for translating sentimental statements using deep learning techniques. First, we create negative–positive sentimental statement datasets for a sentiment translation model. Then, the sentiment translation model is developed using deep learning techniques. Perplexity, bilingual evaluation understudy (BLEU) and human evaluations are used in the experiments to test the model, and the results are satisfactory. Finally, we believe that if the training datasets can be constructed as planned, translating sentimental statements with these techniques is feasible, and more sophisticated models can be developed.
(This article belongs to the Special Issue Applications of Machine Learning in Big Data)
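
Perplexity, one of the evaluation measures mentioned above, is the exponential of the average negative log-likelihood that a model assigns to the reference tokens. The hypothetical helper below computes it from a list of per-token probabilities; how those probabilities are obtained from the sentiment translation model is not shown and would depend on the model.

```python
import math


def perplexity(token_probs):
    """Perplexity from the probability a model assigned to each reference token.

    PPL = exp(-(1/N) * sum_t log p_t); lower means the model is less surprised.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must be in (0, 1]")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # Hypothetical per-token probabilities for one generated sentence.
    print(perplexity([0.5, 0.25, 0.125]))
```

For example, per-token probabilities of 0.5, 0.25 and 0.125 give a perplexity of about 4, the reciprocal of their geometric mean.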

15 pages, 639 KiB  
Article
gMSR: A Multi-GPU Algorithm to Accelerate a Massive Validation of Biclusters
by Aurelio López-Fernández, Domingo S. Rodríguez-Baena and Francisco Gómez-Vela
Electronics 2020, 9(11), 1782; https://doi.org/10.3390/electronics9111782 - 27 Oct 2020
Cited by 1 | Viewed by 1977
Abstract
Nowadays, Biclustering is one of the most widely used machine learning techniques for discovering local patterns in datasets from areas such as energy consumption, marketing, social networks and bioinformatics. Particularly in bioinformatics, Biclustering techniques have become extremely time-consuming and generate huge numbers of results, owing to the continuous growth in database sizes over the last few years. For this reason, validation techniques must be adapted to this new environment to help researchers focus their efforts on a specific subset of results in an efficient, fast and reliable way. This situation may well be considered a Big Data context. In this sense, multiple machine learning techniques have been implemented using Graphics Processing Unit (GPU) technology and the CUDA architecture to accelerate the processing of large databases. However, as far as we know, this technology has not yet been applied to any bicluster validation technique. In this work, a multi-GPU version of one of the most widely used bicluster validation measures, the Mean Squared Residue (MSR), is presented. It takes advantage of all the hardware and memory resources offered by GPU devices. Because of this, gMSR is able to validate a massive number of biclusters in any Biclustering-based study within a Big Data context.
(This article belongs to the Special Issue Applications of Machine Learning in Big Data)
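
The MSR of a bicluster (I, J), as defined by Cheng and Church, averages the squared residue a_ij - a_iJ - a_Ij + a_IJ over the submatrix, where a_iJ, a_Ij and a_IJ are the row, column and overall means of the bicluster. The CPU sketch below assumes NumPy and uses hypothetical function names and data. It is not gMSR; it only shows that each bicluster is scored independently, which is what makes massive validation a natural fit for parallel hardware.

```python
import numpy as np


def msr(data, rows, cols):
    """Mean Squared Residue (Cheng and Church) of the bicluster data[rows, cols]."""
    sub = np.asarray(data, dtype=float)[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)   # a_iJ
    col_means = sub.mean(axis=0, keepdims=True)   # a_Ij
    overall = sub.mean()                          # a_IJ
    residue = sub - row_means - col_means + overall
    return float(np.mean(residue ** 2))


def msr_batch(data, biclusters):
    """Score many biclusters; each (rows, cols) pair is scored independently."""
    return [msr(data, rows, cols) for rows, cols in biclusters]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(100, 20))              # hypothetical expression matrix
    candidates = [(rng.choice(100, 10, replace=False), rng.choice(20, 5, replace=False))
                  for _ in range(1000)]            # hypothetical biclusters
    scores = msr_batch(expr, candidates)
    print(min(scores), max(scores))
```

A perfectly additive bicluster (a_ij = r_i + c_j) has an MSR of 0; larger values indicate less coherent biclusters.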

Review

32 pages, 7536 KiB  
Review
Use and Adaptations of Machine Learning in Big Data—Applications in Real Cases in Agriculture
by Ania Cravero and Samuel Sepúlveda
Electronics 2021, 10(5), 552; https://doi.org/10.3390/electronics10050552 - 26 Feb 2021
Cited by 44 | Viewed by 7623
Abstract
The data generated in modern agricultural operations come from diverse sources, which allow a better understanding of the dynamic conditions of crops, soil and climate, and indicate that these processes will be increasingly data-driven. Big Data and Machine Learning (ML) have emerged as high-performance computing technologies that create new opportunities to unravel, quantify and understand agricultural processes through data. However, integrating these technologies poses many challenges. It requires adapting ML for use with Big Data, and these adaptations must consider the increasing volume of data, its variety and transmission speed issues. This paper provides information on the use of Big Data and ML in agriculture, identifying challenges, adaptations and the design of architectures for these systems. We conducted a Systematic Literature Review (SLR), which allowed us to analyze 34 real cases applied in agriculture. This review may be of interest to computer or data scientists and electronic or software engineers. The results show that manipulating large volumes of data is no longer a challenge thanks to Cloud technologies. Challenges remain regarding (1) processing speed, owing to limited control over the data in its different stages (raw, semi-processed and processed data, or value data); and (2) information visualization systems, which present technical data that farmers understand poorly.
(This article belongs to the Special Issue Applications of Machine Learning in Big Data)
