Robust Distance Metric Learning in the Framework of Statistical Information Theory

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (30 November 2023) | Viewed by 7960

Special Issue Editors


Prof. Dr. Leandro Pardo
Guest Editor
Department of Statistics and O.R., Complutense University of Madrid, 28040 Madrid, Spain
Interests: minimum divergence estimators: robustness and efficiency; robust test procedures based on minimum divergence estimators; robust test procedures in composite likelihood, empirical likelihood, change point, and time series

Prof. Dr. Pedro Miranda
Guest Editor
Faculty of Mathematical Sciences, Department of Statistics and Operations Research, Complutense University of Madrid, 28040 Madrid, Spain
Interests: decision making; fuzzy measures; convex polytopes; mathematical aspects of subfamilies of fuzzy measures; divergence measures in statistical inference

Special Issue Information

Dear Colleagues,

In the last 30 years, the use of suitable tools of statistical information theory (divergence measures and entropies) in inferential procedures has become a very popular technique, not only in the field of statistics but also in other areas, such as machine learning (ML) and pattern recognition. Distance metric learning (DML) plays an important role in ML, helping to better understand and analyze the structure of the data: distances reveal whether groups of data points are close together or well separated. DML therefore provides a strong foundation for several machine learning algorithms, such as k-nearest neighbors for supervised learning and k-means clustering for unsupervised learning, and it has received great attention from many researchers in recent years.

Different distance metrics are chosen depending on the type of data and the problem under consideration. Although simple distances, e.g., the Euclidean distance, can be used to measure similarities within a group of data, they usually cannot capture the statistical irregularities appearing in the data, such as contamination or outliers. Hence, the performance of these measures is rather poor even when the specified model assumptions are only slightly violated (due to noise present in the data). This has led to the consideration of entropies and divergence measures as an alternative way of measuring distances between datasets, since these measures take the underlying data distribution into account. Consequently, the Kullback–Leibler divergence, the family of phi-divergence measures (including the Cressie–Read family), the Bregman divergence measures (including the density power divergence family), and other divergence measures, as well as different entropies, have led to alternative procedures for treating the classical problems considered in ML.
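As a small illustration of this contrast (a sketch of ours, not part of the call itself), the following Python snippet compares the Euclidean distance between the histograms of two samples with two divergence-based dissimilarities, the Kullback–Leibler divergence and the density power divergence; the histogram construction and the tuning value alpha = 0.5 are illustrative choices.

    import numpy as np

    def kl_divergence(p, q, eps=1e-12):
        """Kullback-Leibler divergence KL(p || q) between discrete distributions."""
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        p, q = p / p.sum(), q / q.sum()
        return float(np.sum(p * np.log(p / q)))

    def density_power_divergence(p, q, alpha=0.5, eps=1e-12):
        """Density power divergence d_alpha(p, q) between discrete distributions.

        As alpha -> 0 it approaches KL(p || q); alpha controls the trade-off
        between efficiency and robustness in minimum DPD procedures.
        """
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        p, q = p / p.sum(), q / q.sum()
        return float(np.sum(q ** (1 + alpha)
                            - (1 + 1 / alpha) * p * q ** alpha
                            + (1 / alpha) * p ** (1 + alpha)))

    # Two samples from the same model, one mildly contaminated by outliers.
    rng = np.random.default_rng(0)
    clean = rng.normal(0.0, 1.0, size=500)
    contaminated = np.concatenate([rng.normal(0.0, 1.0, size=475),
                                   rng.normal(8.0, 0.5, size=25)])
    bins = np.linspace(-5.0, 10.0, 41)
    p, _ = np.histogram(clean, bins=bins)
    q, _ = np.histogram(contaminated, bins=bins)

    print("Euclidean distance:", np.linalg.norm(p / p.sum() - q / q.sum()))
    print("KL divergence     :", kl_divergence(p, q))
    print("DPD (alpha = 0.5) :", density_power_divergence(p, q, alpha=0.5))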

The aim of this Special Issue is to present new and original research papers based on different families of entropy and divergence measures for solving, from either a theoretical or an applied point of view, different problems considered in ML, paying special attention to robustness. Problems to be considered include (but are not limited to):

  • Dimensionality reduction: Informational Correlation Analysis (ICA), Canonical Correlation Analysis, Principal Component Analysis, ICA procedures for blind source separation, and so on.
  • Clustering (see the divergence-based clustering sketch after this list).
  • Classification.
  • Density-Ratio Estimation.
  • Non-negative Matrix Factorization.
  • Singular value decomposition: robust SVD, Active Learning, and so on.
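As a rough sketch related to the clustering item above (our illustration, with hypothetical names such as kl_kmeans), the snippet below runs a k-means-style algorithm on probability vectors using the Kullback–Leibler divergence, a Bregman divergence, in place of the squared Euclidean distance; for Bregman divergences the arithmetic mean remains the optimal cluster centroid, so only the assignment step changes.

    import numpy as np

    def kl_divergence_rows(X, c, eps=1e-12):
        """KL divergence D(x_i || c) of every row x_i of X from centroid c."""
        X = X + eps
        c = c + eps
        return np.sum(X * np.log(X / c), axis=1)

    def kl_kmeans(X, k, n_iter=100, seed=0):
        """k-means-style clustering of probability vectors under the KL divergence.

        For any Bregman divergence, the centroid minimizing the within-cluster
        divergence d(x, c) over c is the arithmetic mean of the cluster's points.
        """
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        labels = np.full(len(X), -1, dtype=int)
        for _ in range(n_iter):
            # Assignment step: nearest centroid in KL divergence.
            dists = np.stack([kl_divergence_rows(X, c) for c in centroids], axis=1)
            new_labels = dists.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
            # Update step: arithmetic mean (the Bregman centroid).
            for j in range(k):
                if np.any(labels == j):
                    centroids[j] = X[labels == j].mean(axis=0)
        return labels, centroids

    # Toy example: normalized word-count-like vectors from two topics.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.dirichlet([8, 1, 1], size=50),
                   rng.dirichlet([1, 1, 8], size=50)])
    labels, _ = kl_kmeans(X, k=2)
    print(labels)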

Papers treating high-dimensional data in ML problems within the framework of divergence measures are also welcome.

Finally, reviews emphasizing the most recent state of the art in solving ML problems based on divergence measures are also welcome.

Prof. Dr. Leandro Pardo
Prof. Dr. Pedro Miranda
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (4 papers)


Research

16 pages, 500 KiB  
Article
Distance-Metric Learning for Personalized Survival Analysis
by Wolfgang Galetzka, Bernd Kowall, Cynthia Jusi, Eva-Maria Huessler and Andreas Stang
Entropy 2023, 25(10), 1404; https://doi.org/10.3390/e25101404 - 30 Sep 2023
Viewed by 915
Abstract
Personalized time-to-event or survival prediction with right-censored outcomes is a pervasive challenge in healthcare research. Although various supervised machine learning methods, such as random survival forests or neural networks, have been adapted to handle such outcomes effectively, they do not provide explanations for their predictions and thus lack interpretability. In this paper, an alternative method for survival prediction by weighted nearest neighbors is proposed. Fitting this model to data entails optimizing the weights by learning a metric. An individual prediction of this method can be explained by providing the user with the most influential data points for this prediction, i.e., the closest data points and their weights. The strengths and weaknesses in terms of predictive performance are highlighted on simulated data, and an application of the method to two different real-world datasets of breast cancer patients shows its competitiveness with established methods.
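The abstract does not spell out the estimator in detail, so the following is only a rough sketch, under assumptions of ours, of the general idea: a feature-weighted distance selects and weights the nearest neighbors, and a weighted Kaplan–Meier curve is built from their (time, event) pairs. The Gaussian kernel weighting and the names weighted_knn_survival and feature_weights are illustrative, not the authors' exact method.

    import numpy as np

    def weighted_kaplan_meier(times, events, weights, eval_times):
        """Weighted Kaplan-Meier estimate S(t) evaluated at eval_times."""
        order = np.argsort(times)
        times, events, weights = times[order], events[order], weights[order]
        surv_times, surv_probs = [0.0], [1.0]
        s, at_risk = 1.0, weights.sum()
        for t, e, w in zip(times, events, weights):
            if e == 1 and at_risk > 1e-12:
                s *= 1.0 - w / at_risk
                surv_times.append(t)
                surv_probs.append(s)
            at_risk -= w
        # Step function: carry the last value forward.
        idx = np.searchsorted(surv_times, eval_times, side="right") - 1
        return np.asarray(surv_probs)[idx]

    def weighted_knn_survival(x_new, X_train, times, events, feature_weights,
                              k=20, eval_times=None):
        """Predict a survival curve for x_new from its k nearest training points.

        Distances use a diagonal metric: d(x, x_i)^2 = sum_j w_j (x_j - x_ij)^2.
        Neighbor weights decay with distance (Gaussian kernel, illustrative choice).
        """
        if eval_times is None:
            eval_times = np.linspace(0.0, times.max(), 100)
        d2 = np.sum(feature_weights * (X_train - x_new) ** 2, axis=1)
        nn = np.argsort(d2)[:k]
        bandwidth = np.sqrt(d2[nn].max()) + 1e-12
        w = np.exp(-d2[nn] / (2 * bandwidth ** 2))
        return eval_times, weighted_kaplan_meier(times[nn], events[nn], w, eval_times)

    # Toy usage: survival times depend on the first feature only.
    rng = np.random.default_rng(2)
    X = rng.normal(size=(200, 3))
    t = rng.exponential(scale=np.exp(X[:, 0]))
    c = rng.exponential(scale=2.0, size=200)
    times, events = np.minimum(t, c), (t <= c).astype(int)
    w = np.array([1.0, 0.1, 0.1])   # a "learned" diagonal metric (illustrative)
    grid, curve = weighted_knn_survival(X[0], X[1:], times[1:], events[1:], w)
    print(curve[:5])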

21 pages, 425 KiB  
Article
An Approach to Canonical Correlation Analysis Based on Rényi’s Pseudodistances
by María Jaenada, Pedro Miranda, Leandro Pardo and Konstantinos Zografos
Entropy 2023, 25(5), 713; https://doi.org/10.3390/e25050713 - 25 Apr 2023
Viewed by 878
Abstract
Canonical Correlation Analysis (CCA) infers a pairwise linear relationship between two groups of random variables, X and Y. In this paper, we present a new procedure based on Rényi’s pseudodistances (RP) aiming to detect linear and non-linear relationships between the two groups. RP canonical analysis (RPCCA) finds canonical coefficient vectors, a and b, by maximizing an RP-based measure. This new family includes Information Canonical Correlation Analysis (ICCA) as a particular case and extends the method to distances that are inherently robust against outliers. We provide estimating techniques for RPCCA and show the consistency of the proposed estimated canonical vectors. Further, a permutation test for determining the number of significant pairs of canonical variables is described. The robustness properties of RPCCA are examined theoretically and empirically through a simulation study, concluding that RPCCA presents a competitive alternative to ICCA with an added advantage in terms of robustness against outliers and data contamination.
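Rényi's pseudodistances are not reproduced here; the sketch below illustrates only the permutation-test idea mentioned in the abstract, using the classical (non-robust) first canonical correlation as a stand-in statistic. In the paper the statistic is an RP-based measure instead, so this snippet should be read as a template, not as RPCCA itself.

    import numpy as np

    def first_canonical_correlation(X, Y, ridge=1e-8):
        """Largest classical canonical correlation between column blocks X and Y."""
        Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
        n = len(X)
        Sxx = Xc.T @ Xc / n + ridge * np.eye(X.shape[1])
        Syy = Yc.T @ Yc / n + ridge * np.eye(Y.shape[1])
        Sxy = Xc.T @ Yc / n
        # Whiten both blocks and take the largest singular value of the
        # whitened cross-covariance matrix.
        Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)
        M = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
        return np.linalg.svd(M, compute_uv=False)[0]

    def permutation_test(X, Y, n_perm=999, seed=0):
        """Permutation p-value for the first canonical correlation between X and Y."""
        rng = np.random.default_rng(seed)
        observed = first_canonical_correlation(X, Y)
        null = np.array([first_canonical_correlation(X, Y[rng.permutation(len(Y))])
                         for _ in range(n_perm)])
        return observed, (1 + np.sum(null >= observed)) / (n_perm + 1)

    # Toy usage: the first Y column is linearly related to the first X column.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    Y = np.column_stack([X[:, 0] + rng.normal(scale=0.5, size=100),
                         rng.normal(size=100)])
    rho, p_value = permutation_test(X, Y)
    print(f"first canonical correlation = {rho:.3f}, permutation p-value = {p_value:.3f}")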

35 pages, 988 KiB  
Article
Revisiting Chernoff Information with Likelihood Ratio Exponential Families
by Frank Nielsen
Entropy 2022, 24(10), 1400; https://doi.org/10.3390/e24101400 - 01 Oct 2022
Cited by 6 | Viewed by 3802
Abstract
The Chernoff information between two probability measures is a statistical divergence measuring their deviation, defined as their maximally skewed Bhattacharyya distance. Although the Chernoff information was originally introduced for bounding the Bayes error in statistical hypothesis testing, the divergence has found many other applications, owing to its empirical robustness, in areas ranging from information fusion to quantum information. From the viewpoint of information theory, the Chernoff information can also be interpreted as a minmax symmetrization of the Kullback–Leibler divergence. In this paper, we first revisit the Chernoff information between two densities of a measurable Lebesgue space by considering the exponential families induced by their geometric mixtures: the so-called likelihood ratio exponential families. Second, we show how to (i) solve exactly the Chernoff information between any two univariate Gaussian distributions or obtain a closed-form formula using symbolic computing, (ii) report a closed-form formula for the Chernoff information of centered Gaussians with scaled covariance matrices, and (iii) use a fast numerical scheme to approximate the Chernoff information between any two multivariate Gaussian distributions.
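As a small numerical illustration of the definition recalled above (not the paper's closed-form solution or fast scheme), the snippet below computes the Chernoff information between two univariate Gaussians by maximizing the skewed Bhattacharyya distance -log int p(x)^alpha q(x)^(1-alpha) dx over alpha in (0, 1), using quadrature and a bounded scalar optimizer.

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import minimize_scalar
    from scipy.stats import norm

    def skewed_bhattacharyya(alpha, p, q):
        """Skewed Bhattacharyya distance -log int p(x)^alpha q(x)^(1-alpha) dx."""
        integrand = lambda x: p.pdf(x) ** alpha * q.pdf(x) ** (1 - alpha)
        coeff, _ = quad(integrand, -np.inf, np.inf)
        return -np.log(coeff)

    def chernoff_information(p, q):
        """Chernoff information: the maximally skewed Bhattacharyya distance."""
        res = minimize_scalar(lambda a: -skewed_bhattacharyya(a, p, q),
                              bounds=(1e-6, 1 - 1e-6), method="bounded")
        return -res.fun, res.x   # value and optimal skewing parameter alpha*

    p = norm(loc=0.0, scale=1.0)
    q = norm(loc=2.0, scale=1.5)
    value, alpha_star = chernoff_information(p, q)
    print(f"Chernoff information = {value:.4f} at alpha* = {alpha_star:.3f}")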

18 pages, 713 KiB  
Article
Testing Equality of Multiple Population Means under Contaminated Normal Model Using the Density Power Divergence
by Jagannath Das, Beste Hamiye Beyaztas, Maxwell Kwesi Mac-Ocloo, Arunabha Majumdar and Abhijit Mandal
Entropy 2022, 24(9), 1189; https://doi.org/10.3390/e24091189 - 25 Aug 2022
Viewed by 1451
Abstract
This paper considers the problem of comparing several means under the one-way Analysis of Variance (ANOVA) setup. In ANOVA, outliers and heavy-tailed error distributions can seriously hinder the assessment of the treatment effect, leading to false positive or false negative test results. We propose a robust test of ANOVA using an M-estimator based on the density power divergence. Compared with the existing robust and non-robust approaches, the proposed testing procedure is less affected by data contamination and improves the analysis. The asymptotic properties of the proposed test are derived under some regularity conditions. The finite-sample performance of the proposed test is examined via a series of Monte Carlo experiments and two empirical data examples: a bone marrow transplant dataset and a glucose level dataset. The results produced by the proposed testing procedure compare favorably with the classical ANOVA and robust tests based on Huber's M-estimator and Tukey's MM-estimator.
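The paper's full ANOVA test is not reconstructed here; as a limited sketch of the underlying tool, the snippet below computes the minimum density power divergence estimate of the mean and standard deviation of a single normal sample, using the closed form int f^(1+alpha) dx = (2*pi*sigma^2)^(-alpha/2) / sqrt(1+alpha); the tuning value alpha = 0.3 and the helper names are our own illustrative choices.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    def dpd_objective(params, x, alpha):
        """Empirical density power divergence objective for a normal model.

        H_n(mu, sigma) = int f^(1+alpha) dx - (1 + 1/alpha) * mean(f(x_i)^alpha),
        with int f^(1+alpha) dx = (2*pi*sigma^2)^(-alpha/2) / sqrt(1 + alpha).
        """
        mu, log_sigma = params
        sigma = np.exp(log_sigma)   # optimize log(sigma) to keep sigma > 0
        integral_term = (2 * np.pi * sigma ** 2) ** (-alpha / 2) / np.sqrt(1 + alpha)
        data_term = (1 + 1 / alpha) * np.mean(norm.pdf(x, mu, sigma) ** alpha)
        return integral_term - data_term

    def mdpde_normal(x, alpha=0.3):
        """Minimum density power divergence estimate of (mu, sigma)."""
        start = np.array([np.median(x), np.log(np.std(x))])
        res = minimize(dpd_objective, start, args=(x, alpha), method="Nelder-Mead")
        return res.x[0], np.exp(res.x[1])

    # Contaminated sample: the sample mean is pulled toward the outliers,
    # whereas the minimum DPD estimate stays close to the bulk of the data.
    rng = np.random.default_rng(3)
    x = np.concatenate([rng.normal(0.0, 1.0, 95), rng.normal(10.0, 1.0, 5)])
    print("sample mean:", x.mean())
    print("MDPDE (mu, sigma):", mdpde_normal(x, alpha=0.3))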
