Information Theoretic Criteria: New Theoretical Developments and Applications

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (21 August 2023) | Viewed by 15867

Special Issue Editor


Dr. Ciprian Doru Giurcăneanu
Guest Editor
Department of Statistics, University of Auckland, Auckland 1142, New Zealand
Interests: statistical signal processing; time series; model selection; graphical models; dictionary learning; data compression

Special Issue Information

Dear Colleagues,

The Akaike Information Criterion and the Bayesian Information Criterion (sometimes called the Schwarz Information Criterion) were introduced decades ago and are still extensively used in many areas of science and engineering. They are implemented in various software packages and are deemed standard instruments for model selection. At the same time, there is a significant body of literature in which other information theoretic (IT) criteria have been proposed.
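
For reference, for a model with k estimated parameters, maximized likelihood L-hat, and sample size n, the two criteria take the familiar forms (standard definitions, written here in LaTeX):

    \mathrm{AIC} = -2\ln\hat{L} + 2k, \qquad \mathrm{BIC} = -2\ln\hat{L} + k\ln n.

In both cases the candidate model minimizing the criterion is selected; the BIC penalty grows with the sample size, so it favours more parsimonious models than the AIC once n exceeds e^2 (about 7.4).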

This Special Issue aims to be a forum for the presentation of new and improved IT criteria. Authors are encouraged to submit works that are focused on the following topics:

  1. Development of novel criteria;
  2. Theoretical analysis of the performance of the existing or newly proposed criteria;
  3. Computational aspects related to IT criteria;
  4. Use of the IT criteria in machine learning, especially in deep learning;
  5. Applications of the IT criteria in non-standard settings such as, for example, big data, data sets with missing values, and non-Gaussian noise;
  6. Use of the IT criteria in time series analysis and forecasting;
  7. Applications of the Minimum Description Length and of the Minimum Message Length;
  8. Other applications where IT criteria play a key role in improving performance.

Dr. Ciprian Giurcaneanu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • information theoretic criteria
  • model selection
  • theoretical analysis
  • machine learning
  • signal processing
  • time series analysis and forecasting
  • applications

Published Papers (11 papers)

Research

18 pages, 706 KiB  
Article
Expecting the Unexpected: Entropy and Multifractal Systems in Finance
by Giuseppe Orlando and Marek Lampart
Entropy 2023, 25(11), 1527; https://doi.org/10.3390/e25111527 - 09 Nov 2023
Viewed by 1084
Abstract
Entropy serves as a measure of chaos in systems by representing the average rate of information loss about a phase point’s position on the attractor. When dealing with a multifractal system, a single exponent cannot fully describe its dynamics, necessitating a continuous spectrum of exponents, known as the singularity spectrum. From an investor’s point of view, a rise in entropy is a signal of abnormal and possibly negative returns. This means that investors have to expect the unexpected and prepare for it. To explore this, we analyse the New York Stock Exchange (NYSE) U.S. Index as well as its constituents. Through this examination, we assess their multifractal characteristics and identify market conditions (bearish/bullish markets) using entropy, an effective method for recognizing fluctuating fractal markets. Our findings challenge conventional beliefs by demonstrating that price declines lead to increased entropy, contrary to some studies in the literature that suggest that reduced entropy in market crises implies more determinism. Instead, we propose that bear markets are likely to exhibit higher entropy, indicating a greater chance of unexpected extreme events. Moreover, our study reveals a power-law behaviour and indicates the absence of variance.
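
As a reminder of the multifractal formalism invoked in this abstract (standard definitions, not taken from the paper itself): if τ(q) denotes the mass exponent of order q, the singularity spectrum f(α) follows from the Legendre transform

    \alpha(q) = \frac{d\tau(q)}{dq}, \qquad f(\alpha) = q\,\alpha(q) - \tau(q),

so a monofractal collapses onto a single point of the (α, f(α)) plane, whereas a multifractal traces out a whole curve of local scaling exponents.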

26 pages, 4309 KiB  
Article
Stock Market Forecasting Based on Spatiotemporal Deep Learning
by Yung-Chen Li, Hsiao-Yun Huang, Nan-Ping Yang and Yi-Hung Kung
Entropy 2023, 25(9), 1326; https://doi.org/10.3390/e25091326 - 12 Sep 2023
Viewed by 1739
Abstract
This study introduces the Spacetimeformer model, a novel approach for predicting stock prices, leveraging the Transformer architecture with a time–space mechanism to capture both spatial and temporal interactions among stocks. Traditional Long Short-Term Memory (LSTM) and recent Transformer models lack the ability to directly incorporate spatial information, making the Spacetimeformer model a valuable addition to stock price prediction. This article uses the ten-minute stock prices of the constituent stocks of the Taiwan 50 Index and the intraday data of individual stocks on the Taiwan Stock Exchange. By training the Spacetimeformer model with multi-time-step stock price data, we can predict the stock prices at every ten-minute interval within the next hour. Finally, we also compare the prediction results with LSTM and Transformer models that only consider temporal relationships. The research demonstrates that the Spacetimeformer model consistently captures essential trend changes and provides stable predictions in stock price forecasting. This article proposes a Spacetimeformer model combined with daily moving windows. This method has superior performance in stock price prediction and also demonstrates the significance and value of the space–time mechanism for prediction. We recommend that people who want to predict stock prices or other financial instruments try our proposed method to obtain a better return on investment.
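
As an illustration of the kind of input such a spatiotemporal model consumes (a hypothetical sketch, not code from the paper; the function name, array shapes, and the six-step lookback/horizon split are assumptions), the ten-minute series of several stocks can be cut into sliding windows that pair a past block with the next hour to be predicted:

    import numpy as np

    def sliding_windows(prices: np.ndarray, lookback: int = 6, horizon: int = 6):
        """Split a (T, n_stocks) array of ten-minute prices into inputs of
        shape (lookback, n_stocks) and targets of shape (horizon, n_stocks)."""
        X, Y = [], []
        for t in range(lookback, prices.shape[0] - horizon + 1):
            X.append(prices[t - lookback:t])   # past hour of prices (input)
            Y.append(prices[t:t + horizon])    # next hour of prices (target)
        return np.stack(X), np.stack(Y)

    # Example: three days of ten-minute bars for 50 stocks (27 bars per 4.5 h session, assumed)
    prices = np.random.rand(3 * 27, 50)
    X, Y = sliding_windows(prices)
    print(X.shape, Y.shape)   # (70, 6, 50) (70, 6, 50)

Advancing such windows day by day mirrors the daily moving-window scheme mentioned in the abstract.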

17 pages, 5676 KiB  
Article
An Approach for the Estimation of Concentrations of Soluble Compounds in E. coli Bioprocesses
by Deividas Masaitis, Renaldas Urniezius, Rimvydas Simutis, Vygandas Vaitkus, Mindaugas Matukaitis, Benas Kemesis, Vytautas Galvanauskas and Benas Sinkevicius
Entropy 2023, 25(9), 1302; https://doi.org/10.3390/e25091302 - 06 Sep 2023
Cited by 1 | Viewed by 956
Abstract
Accurate estimations of the concentrations of soluble compounds are crucial for optimizing bioprocesses involving Escherichia coli (E. coli). This study proposes a hybrid model structure that leverages off-gas analysis data and physiological parameters, including the average biomass age and specific growth rate, to estimate soluble compounds such as acetate and glutamate in fed-batch cultivations. We used a hybrid recurrent neural network to establish the relationships between these parameters. To enhance the precision of the estimates, the model incorporates ensemble averaging and information gain. Ensemble averaging combines varying model inputs, leading to more robust representations of the underlying dynamics in E. coli bioprocesses. Our hybrid model estimates acetate with 1% and 8% system precision using data from the first site and the second site at GSK plc, respectively. Using the data from the second site, the precision of the approach for other solutes was as follows: 8% for isoleucine, 9% for lactate and glutamate, and 13% for glutamine. These results demonstrate its practical potential.
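
As background on one ingredient named above, the information gain of a candidate input X about a target Y is the standard entropy reduction (generic definition, not the paper's specific implementation):

    IG(Y;X) = H(Y) - H(Y \mid X)
            = -\sum_{y} p(y)\log p(y) + \sum_{x} p(x)\sum_{y} p(y\mid x)\log p(y\mid x),

so inputs with a larger gain carry more information about the quantity being estimated.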

29 pages, 2475 KiB  
Article
Consistent Model Selection Procedure for Random Coefficient INAR Models
by Kaizhi Yu and Tielai Tao
Entropy 2023, 25(8), 1220; https://doi.org/10.3390/e25081220 - 16 Aug 2023
Viewed by 805
Abstract
In the realm of time series data analysis, information criteria constructed on the basis of likelihood functions serve as crucial instruments for determining the appropriate lag order. However, the intricate structure of random coefficient integer-valued time series models, which are founded on thinning operators, complicates the establishment of likelihood functions. Consequently, employing information criteria such as AIC and BIC for model selection becomes problematic. This study introduces an innovative methodology that formulates a penalized criterion by utilizing the estimation equation within conditional least squares estimation, effectively addressing the aforementioned challenge. Initially, the asymptotic properties of the penalized criterion are derived, followed by a numerical simulation study and a comparative analysis. The findings from both theoretical examinations and simulation investigations reveal that this novel approach consistently selects variables under relatively relaxed conditions. Lastly, the applications of this method to infectious disease data and seismic frequency data produce satisfactory outcomes.
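
To make the setting concrete (textbook forms, not reproduced from the paper): a random coefficient INAR(1) process and the conditional least squares (CLS) objective on which such a penalized criterion can be built are

    X_t = \phi_t \circ X_{t-1} + \varepsilon_t, \qquad E[X_t \mid X_{t-1}] = \phi X_{t-1} + \lambda,

    Q(\phi, \lambda) = \sum_{t=2}^{T} \bigl(X_t - \phi X_{t-1} - \lambda\bigr)^2,

where \circ is binomial thinning, the \phi_t are i.i.d. random coefficients with mean \phi, and \lambda is the innovation mean. Minimizing Q yields the CLS estimates without ever forming the full likelihood, which is why an estimation-equation-based penalty is attractive in this setting.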

23 pages, 1183 KiB  
Article
Improving the Performance and Stability of TIC and ICE
by Tyler Ward
Entropy 2023, 25(3), 512; https://doi.org/10.3390/e25030512 - 16 Mar 2023
Viewed by 840
Abstract
Takeuchi’s Information Criterion (TIC) was introduced as a generalization of Akaike’s Information Criterion (AIC) in 1976. Though TIC avoids many of AIC’s strict requirements and assumptions, it is only rarely used. One of the reasons for this is that the trace term introduced in TIC is numerically unstable and computationally expensive to compute. An extension of TIC called ICE was published in 2021, which allows this trace term to be used for model fitting (where it was primarily compared to L2 regularization) instead of just model selection. That paper also examined numerically stable and computationally efficient approximations that could be applied to TIC or ICE, but these approximations were only examined on small synthetic models. This paper applies and extends these approximations to larger models on real datasets for both TIC and ICE. This work shows that practical models may use TIC and ICE in a numerically stable way to achieve superior results at a reasonable computational cost.
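
For context, TIC replaces the parameter count in AIC by a trace term (standard formulation, independent of this paper):

    \mathrm{TIC} = -2\ln L(\hat{\theta}) + 2\,\mathrm{tr}\bigl(\hat{J}^{-1}\hat{I}\bigr),

where \hat{J} is the empirical Hessian of the negative log-likelihood and \hat{I} is the empirical outer product of the score vectors. Under a correctly specified model the trace reduces to the number of parameters k and AIC is recovered; estimating \hat{J}^{-1}\hat{I} reliably is exactly the source of the numerical instability discussed above.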

24 pages, 446 KiB  
Article
ROC Analyses Based on Measuring Evidence Using the Relative Belief Ratio
by Luai Al-Labadi, Michael Evans and Qiaoyu Liang
Entropy 2022, 24(12), 1710; https://doi.org/10.3390/e24121710 - 23 Nov 2022
Viewed by 1128
Abstract
ROC (Receiver Operating Characteristic) analyses are considered under a variety of assumptions concerning the distributions of a measurement X in two populations. These include the binormal model as well as nonparametric models where little is assumed about the form of distributions. The methodology is based on a characterization of statistical evidence which is dependent on the specification of prior distributions for the unknown population distributions as well as for the relevant prevalence w of the disease in a given population. In all cases, elicitation algorithms are provided to guide the selection of the priors. Inferences are derived for the AUC (Area Under the Curve), the cutoff c used for classification as well as the error characteristics used to assess the quality of the classification.
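
For readers unfamiliar with the evidence measure used here, the relative belief ratio of a quantity of interest ψ is (standard definition)

    RB(\psi \mid x) = \frac{\pi_{\Psi}(\psi \mid x)}{\pi_{\Psi}(\psi)},

the ratio of the posterior density to the prior density of ψ; values above 1 mean the data have increased belief in ψ (evidence in favour), while values below 1 indicate evidence against.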

27 pages, 402 KiB  
Article
How to Evaluate Theory-Based Hypotheses in Meta-Analysis Using an AIC-Type Criterion
by Rebecca M. Kuiper
Entropy 2022, 24(11), 1525; https://doi.org/10.3390/e24111525 - 25 Oct 2022
Viewed by 1165
Abstract
Meta-analysis techniques allow researchers to aggregate effect sizes—like standardized mean difference(s), correlation(s), or odds ratio(s)—of different studies. This leads to overall effect-size estimates and their confidence intervals. Additionally, researchers can aim for theory development or theory evaluation. That is, researchers may not only be interested in these overall estimates but also in a specific ordering or size of them, which then reflects a theory. Researchers may have expectations regarding the ordering of standardized mean differences or about the (ranges of) sizes of an odds ratio or Hedges’ g. Such theory-based hypotheses most probably contain inequality constraints and can be evaluated with the Akaike information criterion type (i.e., AIC-type) confirmatory model selection criterion called the generalized order-restricted information criterion (GORICA). This paper introduces and illustrates how the GORICA can be applied to meta-analyzed estimates. Additionally, it compares the use of the GORICA to that of classical null hypothesis testing and the AIC, that is, the use of theory-based hypotheses versus null hypotheses. By using the GORICA, researchers from all types of fields (e.g., psychology, sociology, political science, biomedical science, and medicine) can quantify the support for theory-based hypotheses specified a priori. This leads to increased statistical power, because of (i) the use of theory-based hypotheses (cf. one-sided vs. two-sided testing) and (ii) the use of meta-analyzed results (that are based on multiple studies which increase the combined sample size). The quantification of support and the power increase aid in, for instance, evaluating and developing theories and, therewith, developing evidence-based treatments and policy.
36 pages, 2085 KiB  
Article
Mixture Complexity and Its Application to Gradual Clustering Change Detection
by Shunki Kyoya and Kenji Yamanishi
Entropy 2022, 24(10), 1407; https://doi.org/10.3390/e24101407 - 01 Oct 2022
Cited by 1 | Viewed by 1509
Abstract
We consider measuring the number of clusters (cluster size) in finite mixture models for interpreting their structures. Many existing information criteria have been applied for this issue by regarding it as the same as the number of mixture components (mixture size); however, this may not be valid in the presence of overlaps or weight biases. In this study, we argue that the cluster size should be measured as a continuous value and propose a new criterion called mixture complexity (MC) to formulate it. It is formally defined from the viewpoint of information theory and can be seen as a natural extension of the cluster size considering overlap and weight bias. Subsequently, we apply MC to the issue of gradual clustering change detection. Conventionally, clustering changes have been regarded as abrupt, induced by the changes in the mixture size or cluster size. Meanwhile, we consider the clustering changes to be gradual in terms of MC; it has the benefits of finding the changes earlier and discerning the significant and insignificant changes. We further demonstrate that the MC can be decomposed according to the hierarchical structures of the mixture models; it helps us to analyze the details of the substructures.

13 pages, 1853 KiB  
Article
Bibliometric Analysis of Information Theoretic Studies
by Weng Hoe Lam, Weng Siew Lam, Saiful Hafizah Jaaman and Pei Fun Lee
Entropy 2022, 24(10), 1359; https://doi.org/10.3390/e24101359 - 25 Sep 2022
Cited by 7 | Viewed by 1674
Abstract
Statistical information theory is a method for quantifying the amount of stochastic uncertainty in a system. This theory originated in communication theory. The application of information theoretic approaches has been extended to different fields. This paper aims to perform a bibliometric analysis of information theoretic publications listed on the Scopus database. The data of 3701 documents were extracted from the Scopus database. The software used for analysis includes Harzing’s Publish or Perish and VOSviewer. Results including publication growth, subject areas, geographical contributions, country co-authorship, most cited publications, keyword co-occurrence analysis, and citation metrics are presented in this paper. Publication growth has been steady since 2003. The United States has the highest number of publications and received more than half of the total citations from all 3701 publications. Most of the publications are in computer science, engineering, and mathematics. The United States, the United Kingdom, and China have the highest collaboration across countries. The focus of information theoretic research is slowly shifting from mathematical models to technology-driven applications such as machine learning and robotics. This study highlights the trends and developments of information theoretic publications, which helps researchers to understand the state of the art of information theoretic approaches for future contributions in this research domain.

25 pages, 409 KiB  
Article
Information Theoretic Methods for Variable Selection—A Review
by Jan Mielniczuk
Entropy 2022, 24(8), 1079; https://doi.org/10.3390/e24081079 - 04 Aug 2022
Cited by 6 | Viewed by 1628
Abstract
We review the principal information theoretic tools and their use for feature selection, with the main emphasis on classification problems with discrete features. Since it is known that empirical versions of conditional mutual information perform poorly for high-dimensional problems, we focus on various ways of constructing its counterparts and the properties and limitations of such methods. We present a unified way of constructing such measures based on truncation, or truncation and weighting, for the Möbius expansion of conditional mutual information. We also discuss the main approaches to feature selection which apply the introduced measures of conditional dependence, together with the ways of assessing the quality of the obtained vector of predictors. This involves discussion of recent results on asymptotic distributions of empirical counterparts of criteria, as well as advances in resampling.
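
The central quantity of this review is the conditional mutual information, which for discrete variables is (standard definition)

    I(X;Y \mid Z) = \sum_{x,y,z} p(x,y,z)\,\log\frac{p(x,y\mid z)}{p(x\mid z)\,p(y\mid z)},

which vanishes exactly when X and Y are conditionally independent given Z; it is the empirical plug-in version of this sum that degrades in high dimensions, motivating the truncated and weighted constructions discussed above.
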
49 pages, 1768 KiB  
Article
Multivariate Time Series Imputation: An Approach Based on Dictionary Learning
by Xiaomeng Zheng, Bogdan Dumitrescu, Jiamou Liu and Ciprian Doru Giurcăneanu
Entropy 2022, 24(8), 1057; https://doi.org/10.3390/e24081057 - 31 Jul 2022
Viewed by 1607
Abstract
The problem addressed by dictionary learning (DL) is the representation of data as a sparse linear combination of columns of a matrix called dictionary. Both the dictionary and the sparse representations are learned from the data. We show how DL can be employed in the imputation of multivariate time series. We use a structured dictionary, which comprises one block for each time series and a common block for all the time series. The size of each block and the sparsity level of the representation are selected by using information theoretic criteria. The objective function used in learning is designed to minimize either the sum of the squared errors or the sum of the magnitudes of the errors. We propose dimensionality reduction techniques for the case of high-dimensional time series. For demonstrating how the new algorithms can be used in practical applications, we conduct a large set of experiments on five real-life data sets. The missing data (MD) are simulated according to various scenarios where both the percentage of MD and the length of the sequences of MD are considered. This allows us to identify the situations in which the novel DL-based methods are superior to the existing methods.
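
As a pointer to the underlying optimization (the generic dictionary-learning formulation, not the exact structured objective of the paper): given a data matrix Y, one seeks a dictionary D and sparse coefficients X by solving

    \min_{D,X}\; \|Y - DX\|_F^2 \quad \text{subject to} \quad \|x_i\|_0 \le s \;\text{ for every column } x_i \text{ of } X,

where s is the sparsity level. In the paper, both s and the sizes of the per-series blocks of the structured dictionary are selected with information theoretic criteria, and an error-magnitude variant replaces the squared error when the sum of the magnitudes of the errors is minimized.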
