Recent Advances in Data Science

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Network Science".

Deadline for manuscript submissions: closed (31 July 2021) | Viewed by 13241

Special Issue Editor


Prof. Dr. David Delgado-Gómez
Guest Editor
Department of Statistics, Universidad Carlos III of Madrid, 28903 Madrid, Spain
Interests: data science; machine learning; statistical learning

Special Issue Information

Dear Colleagues,

Data science is an emerging field that is attracting enormous interest from both academia and industry. This interdisciplinary field applies analytical knowledge from statistics, mathematics, and machine learning, supported by recent advances in computer science, to a domain of interest. In other words, it combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data. This combination has succeeded in solving problems in several areas, including computer vision, natural language processing, and medicine. These successes have led to data science being called the sexiest job of the 21st century.

This Special Issue looks for novel theoretical developments as well as interesting applications of data science. Among the methodological advances are:

  • Deep learning (GANs, reinforcement learning, transfer learning, etc.);
  • Kernel methods;
  • Ensemble methods;
  • Tree-based techniques;
  • Discriminants;
  • Gaussian processes;
  • Bayesian methods;
  • Unsupervised techniques.

Interesting applications include (but are not limited to) the following:

  • Natural language processing;
  • Computer vision;
  • Finance;
  • Medicine;
  • Recommender systems;
  • Industry;
  • Sports.

Prof. Dr. David Delgado-Gómez
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Data science
  • Machine learning
  • Statistical learning
  • Deep learning
  • Kernel methods

Published Papers (6 papers)

Research

21 pages, 569 KiB  
Article
Motor Load Balancing with Roll Force Prediction for a Cold-Rolling Setup with Neural Networks
by Sangho Lee and Youngdoo Son
Mathematics 2021, 9(12), 1367; https://doi.org/10.3390/math9121367 - 12 Jun 2021
Cited by 6 | Viewed by 1732
Abstract
The use of machine learning algorithms to improve productivity and quality and to maximize efficiency in the steel industry has recently become a major trend. In this paper, we propose an algorithm that automates the setup in the cold-rolling process and maximizes productivity by predicting the roll forces and motor loads with multilayer perceptron networks in addition to balancing the motor loads to increase production speed. The proposed method first constructs multilayer perceptron models with all available information from the components, the hot-rolling process, and the cold-rolling process. Then, the cold-rolling variables related to the normal part setup are adjusted to balance the motor loads among the rolling stands. To validate the proposed method, we used a data set with 70,533 instances of 128 types of steel with 78 variables, extracted from the actual manufacturing process. The proposed method was found to be superior to the physical prediction model currently used for setups with regard to the prediction accuracy, motor load balancing, and production speed.
(This article belongs to the Special Issue Recent Advances in Data Science)

21 pages, 354 KiB  
Article
Measure of Similarity between GMMs by Embedding of the Parameter Space That Preserves KL Divergence
by Branislav Popović, Lenka Cepova, Robert Cep, Marko Janev and Lidija Krstanović
Mathematics 2021, 9(9), 957; https://doi.org/10.3390/math9090957 - 25 Apr 2021
Cited by 2 | Viewed by 1764
Abstract
In this work, we deliver a novel measure of similarity between Gaussian mixture models (GMMs) by neighborhood preserving embedding (NPE) of the parameter space, which projects the components of GMMs that, by our assumption, lie close to a lower-dimensional manifold. By doing so, we obtain a transformation from the original high-dimensional parameter space into a much lower-dimensional resulting parameter space. Therefore, resolving the distance between two GMMs is reduced to calculating the distance between sets of lower-dimensional Euclidean vectors, taking into account the corresponding weights. A much better trade-off between recognition accuracy and computational complexity is achieved in comparison to measures utilizing distances between Gaussian components evaluated in the original parameter space. The proposed measure is much more efficient in machine learning tasks that operate on large data sets, as in such tasks the required number of overall Gaussian components is always large. Artificial as well as real-world experiments are conducted, showing a much better trade-off between recognition accuracy and computational complexity for the proposed measure in comparison to all baseline measures of similarity between GMMs tested in this paper.
(This article belongs to the Special Issue Recent Advances in Data Science)

17 pages, 836 KiB  
Article
Automatic Tempered Posterior Distributions for Bayesian Inversion Problems
by Luca Martino, Fernando Llorente, Ernesto Curbelo, Javier López-Santiago and Joaquín Míguez
Mathematics 2021, 9(7), 784; https://doi.org/10.3390/math9070784 - 06 Apr 2021
Cited by 9 | Viewed by 1966
Abstract
We propose a novel adaptive importance sampling scheme for Bayesian inversion problems where the inference of the variables of interest and the power of the data noise are carried out using distinct (but interacting) methods. More specifically, we consider a Bayesian analysis for the variables of interest (i.e., the parameters of the model to invert), whereas we employ a maximum likelihood approach for the estimation of the noise power. The whole technique is implemented by means of an iterative procedure with alternating sampling and optimization steps. Moreover, the noise power is also used as a tempered parameter for the posterior distribution of the variables of interest. Therefore, a sequence of tempered posterior densities is generated, where the tempered parameter is automatically selected according to the current estimate of the noise power. A complete Bayesian study over the model parameters and the scale parameter can also be performed. Numerical experiments show the benefits of the proposed approach.
(This article belongs to the Special Issue Recent Advances in Data Science)

20 pages, 937 KiB  
Article
Combining Grammatical Evolution with Modal Interval Analysis: An Application to Solve Problems with Uncertainty
by Ivan Contreras, Remei Calm, Miguel A. Sainz, Pau Herrero and Josep Vehi
Mathematics 2021, 9(6), 631; https://doi.org/10.3390/math9060631 - 16 Mar 2021
Cited by 1 | Viewed by 1790
Abstract
Complex systems are usually affected by various sources of uncertainty, and it is essential to account for mechanisms that ensure the proper management of such disturbances. This paper introduces a novel approach to solve symbolic regression problems, which combines the potential of Grammatical Evolution to obtain solutions by describing the search space with context-free grammars, and the ability of Modal Interval Analysis (MIA) to handle quantified uncertainty. The presented methodology uses an MIA solver to evaluate the fitness function, which represents a novel method to manage uncertainty by means of interval-based prediction models. This paper first introduces the theory that establishes the basis of the proposed methodology, and follows with a description of the system architecture and implementation details. Then, we present an illustrative application example which consists of determining the outer and inner approximations of the mean velocity of the water current of a river stretch. Finally, the interpretation of the obtained results and the limitations of the proposed methodology are discussed.
(This article belongs to the Special Issue Recent Advances in Data Science)

18 pages, 538 KiB  
Article
Assessment of Variability in Irregularly Sampled Time Series: Applications to Mental Healthcare
by Pablo Bonilla-Escribano, David Ramírez, Alejandro Porras-Segovia and Antonio Artés-Rodríguez
Mathematics 2021, 9(1), 71; https://doi.org/10.3390/math9010071 - 31 Dec 2020
Cited by 1 | Viewed by 2015
Abstract
Variability is defined as the propensity of a given signal to change. There are many choices for measuring variability, and it is not generally known which ones offer better properties. This paper compares different variability metrics applied to irregularly (nonuniformly) sampled time series, which have important clinical applications, particularly in mental healthcare. Using both synthetic and real patient data, we identify the most robust and interpretable variability measures out of a set of 21 candidates. Some of these candidates are also proposed in this work based on the absolute slopes of the time series. An additional synthetic data experiment shows that when the complete time series is unknown, as happens with real data, a non-negligible bias appears that favors normalized metrics and/or metrics based on the raw observations of the series. Therefore, only the results of the synthetic experiments, which have access to the full series, should be used to draw conclusions. Accordingly, the median absolute deviation of the absolute value of the successive slopes of the data is the best way of measuring variability for this kind of time series.
(This article belongs to the Special Issue Recent Advances in Data Science)
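
For illustration only, the measure named at the end of this abstract (the median absolute deviation of the absolute successive slopes) might be computed along the following lines. This is a minimal sketch and not the authors' code: the function name `mad_abs_slopes`, the use of the unscaled median absolute deviation, and the requirement that time stamps be strictly increasing are all assumptions made here.

```python
from statistics import median


def mad_abs_slopes(times, values):
    """Median absolute deviation of the absolute successive slopes.

    Hypothetical helper: `times` and `values` are equal-length sequences
    describing an irregularly sampled series; `times` is assumed to be
    strictly increasing. The MAD is left unscaled (an assumption).
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    if len(times) < 2:
        raise ValueError("at least two samples are needed to form a slope")

    abs_slopes = []
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Slope between consecutive samples, accounting for uneven spacing.
        abs_slopes.append(abs((values[i + 1] - values[i]) / dt))

    center = median(abs_slopes)
    # Median of the absolute deviations from the median absolute slope.
    return median([abs(s - center) for s in abs_slopes])


if __name__ == "__main__":
    # Absolute slopes are 2, 0, 3, 2; their median is 2; deviations are 0, 2, 1, 0.
    print(mad_abs_slopes([0, 1, 3, 4, 6], [0, 2, 2, 5, 1]))  # prints 0.5
```

Dividing each difference by its own time gap is what lets the sketch handle nonuniform sampling; a scaled variant of the median absolute deviation could be substituted if consistency with a standard deviation is wanted.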

27 pages, 2977 KiB  
Article
Cognitive Emotional Embedded Representations of Text to Predict Suicidal Ideation and Psychiatric Symptoms
by Mauricio Toledo-Acosta, Talin Barreiro, Asela Reig-Alamillo, Markus Müller, Fuensanta Aroca Bisquert, Maria Luisa Barrigon, Enrique Baca-Garcia and Jorge Hermosillo-Valadez
Mathematics 2020, 8(11), 2088; https://doi.org/10.3390/math8112088 - 23 Nov 2020
Cited by 2 | Viewed by 2720
Abstract
Mathematical modeling of language in Artificial Intelligence is of the utmost importance for many research areas and technological applications. Over the last decade, research on text representation has been directed towards the investigation of dense vectors popularly known as word embeddings. In this paper, we propose a cognitive-emotional scoring and representation framework for text based on word embeddings. This representation framework aims to mathematically model the emotional content of words in short free-form text messages, produced by adults in follow-up due to any mental health condition in the outpatient facilities within the Psychiatry Department of Hospital Fundación Jiménez Díaz in Madrid, Spain. Our contribution is a geometrical-topological framework for Sentiment Analysis that includes a hybrid method that uses a cognitively-based lexicon together with word embeddings to generate graded sentiment scores for words, and a new topological method for clustering dense vector representations in high-dimensional spaces, where points are very sparsely distributed. Our framework is useful in detecting word association topics, emotional scoring patterns, and embedded vectors’ geometrical behavior, which might be useful in understanding language use in this kind of text. Our proposed scoring system and representation framework might be helpful in studying relations between language and behavior, and their use might have a predictive potential to prevent suicide.
(This article belongs to the Special Issue Recent Advances in Data Science)
