Information-Theoretic Methods in Deep Learning: Theory and Applications

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: 15 May 2024 | Viewed by 12405

Special Issue Editors


Dr. Shuangming Yang
Guest Editor
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Interests: information theoretic learning; information bottleneck; deep learning; artificial general intelligence; correntropy

Dr. Shujian Yu
Guest Editor
Department of Computer Science, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
Interests: information theory of deep neural networks; explainable/interpretable AI; machine learning in non-stationary environments; time series analysis; brain network analysis

Dr. Luis Gonzalo Sánchez Giraldo
Guest Editor
Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY 40506, USA
Interests: machine learning for signal processing; information theoretic learning; representation learning; computer vision; computational neuroscience

Prof. Dr. Badong Chen
Guest Editor
Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China
Interests: information theoretic learning; artificial intelligence; cognitive science; adaptive filtering; brain machine learning; robotics

Special Issue Information

Dear Colleagues,

Information theory provides the mathematical infrastructure for quantifying and manipulating information, and it has had a significant influence on the design of efficient and reliable communication systems. Information theoretic learning (ITL) has attracted increasing attention in the field of deep learning in recent years. It provides useful descriptions of the underlying behavior of random variables or processes with which to develop and analyze deep models. Novel ITL estimators and principles have been applied to a range of deep learning problems, such as the mutual information neural estimator for representation learning under the information maximization principle, and the principle of relevant information for redundancy compression and graph sparsification. As a versatile way to describe performance constraints and design mappings, ITL has essential applications in supervised, unsupervised, and reinforcement learning problems such as classification, clustering, and sequential decision making. In this field, the information bottleneck (IB) seeks the right balance between data fit and generalization by using mutual information as both a regularizer and a cost function. IB theory helps to characterize the fundamental limits of learning problems, such as the learning performance of deep neural networks, geometric clustering, and extracting the Gaussian part of a signal.
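
For readers who want the IB trade-off mentioned above in symbols, it is usually written as a Lagrangian over the stochastic encoder; the notation below (X: input, Y: target, T: learned representation, β: trade-off parameter) is the standard one and is not specific to any contribution in this Special Issue.

    \min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)

The first term penalizes how much of the input the representation retains (compression), the second rewards how much task-relevant information it keeps, and β sets the balance between the two.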

In recent years, researchers have shown that ITL provides a powerful paradigm for analyzing neural networks, shedding light on their layered structure, generalization capabilities, and learning dynamics. For example, IB theory has demonstrated great potential for critical problems in deep learning, including understanding and analyzing black-box neural networks and serving as an optimization criterion for training them. Divergence estimation is another approach with a broad range of applications, including domain shift detection, domain adaptation, generative modeling, and model regularization.
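
To make the estimator-based perspective concrete, the sketch below shows the Donsker–Varadhan lower bound on mutual information that underlies MINE-style neural estimators, written in PyTorch-flavoured Python. The statistics network `StatNet`, its width, and the shuffling trick for the product of marginals are generic illustrative choices, not an implementation taken from any paper in this Special Issue.

    # Minimal sketch (illustrative assumptions only) of the Donsker-Varadhan bound used by
    # MINE-style estimators: I(X; Z) >= E_P[T(x, z)] - log E_{P_X x P_Z}[exp T(x, z')].
    import math
    import torch
    import torch.nn as nn

    class StatNet(nn.Module):
        """Statistics network T_theta(x, z) mapping a sample pair to a scalar score."""
        def __init__(self, x_dim: int, z_dim: int, hidden: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
            return self.net(torch.cat([x, z], dim=1)).squeeze(1)

    def dv_lower_bound(stat_net: StatNet, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        """Estimate the DV lower bound on I(X; Z) from paired samples (x_i, z_i)."""
        joint_term = stat_net(x, z).mean()                 # pairs drawn from the joint p(x, z)
        z_shuffled = z[torch.randperm(z.size(0))]          # shuffling breaks the pairing -> product of marginals
        marginal_term = torch.logsumexp(stat_net(x, z_shuffled), dim=0) - math.log(x.size(0))
        return joint_term - marginal_term

    # Maximizing this bound over stat_net's parameters tightens the estimate of I(X; Z); the same
    # quantity can then serve as a cost function or regularizer when training a representation encoder.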

As ITL theory continues to develop, we believe that it can provide new perspectives, theories, and algorithms for the challenging problems of deep learning. This Special Issue therefore aims to report the latest developments in ITL methods and their applications. Topics of interest include, but are not limited to, the following:

  • Information-theoretic quantities and estimators;
  • Information-theoretic principles and regularization in deep neural networks;
  • Interpretation and explanation of deep learning models with information-theoretic methods;
  • Information-theoretic methods for distributed deep learning;
  • Information-theoretic methods for brain-inspired neural networks;
  • The information bottleneck in deep representation learning;
  • Representation learning beyond the information bottleneck, such as total correlation explanation and the principle of relevant information.

Dr. Shuangming Yang
Dr. Shujian Yu
Dr. Luis Gonzalo Sánchez Giraldo
Prof. Dr. Badong Chen
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • information theoretic learning
  • information bottleneck
  • deep learning
  • neural networks

Published Papers (10 papers)

Research

23 pages, 927 KiB  
Article
PyDTS: A Python Toolkit for Deep Learning Time Series Modelling
by Pascal A. Schirmer and Iosif Mporas
Entropy 2024, 26(4), 311; https://doi.org/10.3390/e26040311 - 31 Mar 2024
Viewed by 967
Abstract
In this article, the topic of time series modelling is discussed. It highlights the criticality of analysing and forecasting time series data across various sectors, identifying five primary application areas: denoising, forecasting, nonlinear transient modelling, anomaly detection, and degradation modelling. It further outlines the mathematical frameworks employed in a time series modelling task, categorizing them into statistical, linear algebra, and machine- or deep-learning-based approaches, with each category serving distinct dimensions and complexities of time series problems. Additionally, the article reviews the extensive literature on time series modelling, covering statistical processes, state space representations, and machine and deep learning applications in various fields. The unique contribution of this work lies in its presentation of a Python-based toolkit for time series modelling (PyDTS) that integrates popular methodologies and offers practical examples and benchmarking across diverse datasets.

24 pages, 1055 KiB  
Article
A Unifying Generator Loss Function for Generative Adversarial Networks
by Justin Veiner, Fady Alajaji and Bahman Gharesifard
Entropy 2024, 26(4), 290; https://doi.org/10.3390/e26040290 - 27 Mar 2024
Viewed by 505
Abstract
A unifying α-parametrized generator loss function is introduced for a dual-objective generative adversarial network (GAN) that uses a canonical (or classical) discriminator loss function, such as the one in the original GAN (VanillaGAN) system. The generator loss function is based on a symmetric class probability estimation type function, L_α, and the resulting GAN system is termed L_α-GAN. Under an optimal discriminator, it is shown that the generator’s optimization problem consists of minimizing a Jensen-f_α-divergence, a natural generalization of the Jensen–Shannon divergence, where f_α is a convex function expressed in terms of the loss function L_α. It is also demonstrated that this L_α-GAN problem recovers as special cases a number of GAN problems in the literature, including VanillaGAN, least squares GAN (LSGAN), least kth-order GAN (LkGAN), and the recently introduced (α_D, α_G)-GAN with α_D = 1. Finally, experimental results are provided for three datasets—MNIST, CIFAR-10, and Stacked MNIST—to illustrate the performance of various examples of the L_α-GAN system.
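
For context on the divergence-minimization view taken in this abstract, recall the classical VanillaGAN result that the L_α-GAN analysis generalizes: under the optimal discriminator, the generator criterion reduces to a Jensen–Shannon divergence,

    C(G) \;=\; \max_{D} V(D, G) \;=\; -\log 4 \;+\; 2\,\mathrm{JSD}\!\left(p_{\mathrm{data}} \,\|\, p_{G}\right).

The paper replaces this Jensen–Shannon divergence with a Jensen-f_α-divergence induced by the loss L_α; the displayed identity is the standard result from the original GAN formulation, not a restatement of the authors' derivation.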

22 pages, 1728 KiB  
Article
Ensemble Transductive Propagation Network for Semi-Supervised Few-Shot Learning
by Xueling Pan, Guohe Li and Yifeng Zheng
Entropy 2024, 26(2), 135; https://doi.org/10.3390/e26020135 - 31 Jan 2024
Cited by 1 | Viewed by 723
Abstract
Few-shot learning aims to solve the difficulty in obtaining training samples, leading to high variance, high bias, and over-fitting. Recently, graph-based transductive few-shot learning approaches supplement the deficiency of label information via unlabeled data to make a joint prediction, which has become a new research hotspot. Therefore, in this paper, we propose a novel ensemble semi-supervised few-shot learning strategy via transductive network and Dempster–Shafer (D-S) evidence fusion, named ensemble transductive propagation networks (ETPN). First, we present homogeneity and heterogeneity ensemble transductive propagation networks to better use the unlabeled data, which introduce a preset weight coefficient and provide the process of iterative inferences during transductive propagation learning. Then, we combine the information entropy to improve the D-S evidence fusion method, which improves the stability of multi-model result fusion by pre-processing the evidence sources. Third, we use the L2 norm to improve an ensemble pruning approach that selects individual learners with higher accuracy to participate in the integration of the few-shot model results. Moreover, interference sets are introduced into semi-supervised training to improve the anti-disturbance ability of the model. Eventually, experiments indicate that the proposed approaches outperform the state-of-the-art few-shot models. The best accuracy of ETPN increases by 0.3% and 0.28% in the 5-way 5-shot setting, and by 3.43% and 7.6% in the 5-way 1-shot setting, on miniImageNet and tieredImageNet, respectively.
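
As background for the evidence-fusion step, Dempster's rule for combining two basic probability assignments m_1 and m_2 is

    (m_1 \oplus m_2)(A) \;=\; \frac{1}{1-K} \sum_{B \cap C = A} m_1(B)\, m_2(C), \qquad K \;=\; \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C),

for A ≠ ∅, where K measures the conflict between the two evidence sources. This is the standard D-S combination rule, shown here only as a reference point for the entropy-weighted pre-processing the paper proposes.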

17 pages, 857 KiB  
Article
Deep Individual Active Learning: Safeguarding against Out-of-Distribution Challenges in Neural Networks
by Shachar Shayovitz, Koby Bibas and Meir Feder
Entropy 2024, 26(2), 129; https://doi.org/10.3390/e26020129 - 31 Jan 2024
Viewed by 604
Abstract
Active learning (AL) is a paradigm focused on purposefully selecting training data to enhance a model’s performance by minimizing the need for annotated samples. Typically, strategies assume that the training pool shares the same distribution as the test set, which is not always valid in privacy-sensitive applications where annotating user data is challenging. In this study, we operate within an individual setting and leverage an active learning criterion which selects data points for labeling based on minimizing the min-max regret on a small unlabeled test set sample. Our key contribution lies in the development of an efficient algorithm, addressing the challenging computational complexity associated with approximating this criterion for neural networks. Notably, our results show that, especially in the presence of out-of-distribution data, the proposed algorithm substantially reduces the required training set size by up to 15.4%, 11%, and 35.1% for CIFAR10, EMNIST, and MNIST datasets, respectively.

16 pages, 2785 KiB  
Article
Continual Reinforcement Learning for Quadruped Robot Locomotion
by Sibo Gai, Shangke Lyu, Hongyin Zhang and Donglin Wang
Entropy 2024, 26(1), 93; https://doi.org/10.3390/e26010093 - 22 Jan 2024
Cited by 1 | Viewed by 1045
Abstract
The ability to learn continuously is crucial for a robot to achieve a high level of intelligence and autonomy. In this paper, we consider continual reinforcement learning (RL) for quadruped robots, which includes the ability to continually learn subsequent tasks (plasticity) and maintain performance on previous tasks (stability). The policy obtained by the proposed method enables robots to learn multiple tasks sequentially, while overcoming both catastrophic forgetting and loss of plasticity. At the same time, it achieves these goals with as little modification to the original RL learning process as possible. The proposed method uses the Piggyback algorithm to select protected parameters for each task and reinitializes the unused parameters to increase plasticity. Meanwhile, we encourage the policy network to explore by increasing the entropy of its output distribution. Our experiments show that traditional continual learning algorithms cannot perform well on robot locomotion problems, and that our algorithm is more stable and less disruptive to the RL training progress. Several robot locomotion experiments validate the effectiveness of our method.
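
To illustrate the mask-based parameter protection the abstract refers to, the sketch below shows one simple way to freeze parameters claimed by earlier tasks and reinitialize the rest before training on a new task. It is an illustrative approximation of the general Piggyback-style idea, not the authors' implementation; all helper names (`protect_top_fraction`, `start_new_task`, `zero_protected_grads`) are hypothetical.

    # Illustrative sketch (hypothetical helpers, not the paper's code): protect per-task parameters
    # with boolean masks and reinitialize the remaining, unclaimed parameters before a new task.
    import torch
    import torch.nn as nn

    def protect_top_fraction(weight: torch.Tensor, already_protected: torch.Tensor,
                             fraction: float = 0.2) -> torch.Tensor:
        """Mark the largest-magnitude unclaimed weights as protected for the current task."""
        free = ~already_protected
        k = int(fraction * free.sum().item())
        if k == 0:
            return torch.zeros_like(already_protected)
        scores = weight.abs() * free.float()              # only unclaimed weights compete
        threshold = scores.flatten().topk(k).values.min()
        return (scores >= threshold) & free               # caller ORs this into already_protected

    def start_new_task(layer: nn.Linear, protected: torch.Tensor) -> None:
        """Reinitialize unclaimed weights to restore plasticity, keeping protected ones intact."""
        with torch.no_grad():
            fresh = torch.empty_like(layer.weight)
            nn.init.kaiming_uniform_(fresh)
            layer.weight.copy_(torch.where(protected, layer.weight, fresh))

    def zero_protected_grads(layer: nn.Linear, protected: torch.Tensor) -> None:
        """Call after loss.backward(): stop gradient flow into parameters owned by earlier tasks."""
        if layer.weight.grad is not None:
            layer.weight.grad.mul_((~protected).float())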

15 pages, 656 KiB  
Article
A Deep Neural Network Regularization Measure: The Class-Based Decorrelation Method
by Chenguang Zhang, Tian Liu and Xuejiao Du
Entropy 2024, 26(1), 7; https://doi.org/10.3390/e26010007 - 20 Dec 2023
Viewed by 950
Abstract
In response to the challenge of overfitting, which may lead to a decline in network generalization performance, this paper proposes a new regularization technique, called the class-based decorrelation method (CDM). Specifically, this method views the neurons in a specific hidden layer as base learners and aims to boost network generalization as well as model accuracy by minimizing the correlation among individual base learners while simultaneously maximizing their class-conditional correlation. Intuitively, CDM not only promotes diversity among the hidden neurons but also enhances their cohesiveness when processing samples from the same class. Comparative experiments conducted on various datasets using deep models demonstrate that CDM effectively reduces overfitting and improves classification performance.
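
The sketch below illustrates the general flavour of such a regularizer: penalize correlation between hidden units computed over the whole batch while rewarding correlation computed within each class. It is a simplified illustration under assumed notation, not the paper's exact CDM definition.

    # Illustrative decorrelation-style regularizer (assumed formulation, not the paper's exact CDM):
    # penalize off-diagonal correlation of hidden units over the batch, reward it within each class.
    import torch

    def off_diagonal_corr(h: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
        """Mean absolute off-diagonal entry of the unit-by-unit correlation matrix of h (batch x units)."""
        h = h - h.mean(dim=0, keepdim=True)
        h = h / (h.std(dim=0, keepdim=True) + eps)
        corr = (h.T @ h) / h.size(0)
        off_diag = corr - torch.diag(torch.diag(corr))
        return off_diag.abs().mean()

    def class_based_decorrelation_loss(hidden: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        """Global decorrelation minus average within-class correlation (added to the task loss)."""
        global_term = off_diagonal_corr(hidden)
        within_terms = []
        for c in labels.unique():
            h_c = hidden[labels == c]
            if h_c.size(0) > 1:                           # correlation needs at least two samples
                within_terms.append(off_diagonal_corr(h_c))
        within_term = torch.stack(within_terms).mean() if within_terms else hidden.new_zeros(())
        return global_term - within_term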

21 pages, 913 KiB  
Article
Analysis of Deep Convolutional Neural Networks Using Tensor Kernels and Matrix-Based Entropy
by Kristoffer K. Wickstrøm, Sigurd Løkse, Michael C. Kampffmeyer, Shujian Yu, José C. Príncipe and Robert Jenssen
Entropy 2023, 25(6), 899; https://doi.org/10.3390/e25060899 - 3 Jun 2023
Viewed by 1344
Abstract
Analyzing deep neural networks (DNNs) via information plane (IP) theory has recently attracted tremendous attention as a way to gain insight into, among other properties, DNNs’ generalization ability. However, it is by no means obvious how to estimate the mutual information (MI) between each hidden layer and the input/desired output to construct the IP. For instance, hidden layers with many neurons require MI estimators that are robust to the high dimensionality associated with such layers. MI estimators should also be able to handle convolutional layers while remaining computationally tractable enough to scale to large networks. Existing IP methods have not been able to study truly deep convolutional neural networks (CNNs). We propose an IP analysis using the new matrix-based Rényi’s entropy coupled with tensor kernels, leveraging the power of kernel methods to represent properties of the probability distribution independently of the dimensionality of the data. Our results shed new light on previous studies concerning small-scale DNNs using a completely new approach. We provide a comprehensive IP analysis of large-scale CNNs, investigating the different training phases and providing new insights into the training dynamics of large-scale neural networks.
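
For readers unfamiliar with the estimator named in this abstract, the sketch below computes the usual matrix-based Rényi α-order entropy from a normalized Gram matrix, plus joint entropy and mutual information via a Hadamard product. The Gaussian kernel and its bandwidth are generic assumptions rather than the tensor-kernel construction used in the paper.

    # Sketch of matrix-based Renyi alpha-entropy from a normalized Gram matrix (a generic Gaussian
    # kernel is assumed here; the paper itself uses tensor kernels for convolutional layers).
    import torch

    def gram_matrix(x: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
        """Gaussian Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) for a batch x (n x d)."""
        sq_dists = torch.cdist(x, x).pow(2)
        return torch.exp(-sq_dists / (2.0 * sigma ** 2))

    def matrix_renyi_entropy(k: torch.Tensor, alpha: float = 2.0) -> torch.Tensor:
        """S_alpha(A) = 1/(1-alpha) * log2(sum_i lambda_i(A)^alpha), with A = K / tr(K), alpha != 1."""
        a = k / torch.trace(k)
        eigvals = torch.linalg.eigvalsh(a).clamp(min=0)   # A is PSD; clamp guards numerical noise
        return torch.log2((eigvals ** alpha).sum()) / (1.0 - alpha)

    def matrix_joint_entropy(k_x: torch.Tensor, k_y: torch.Tensor, alpha: float = 2.0) -> torch.Tensor:
        """Joint entropy from the Hadamard product of the two Gram matrices (normalized inside)."""
        return matrix_renyi_entropy(k_x * k_y, alpha)

    def matrix_mutual_information(k_x: torch.Tensor, k_y: torch.Tensor, alpha: float = 2.0) -> torch.Tensor:
        """I_alpha(X; Y) = S_alpha(X) + S_alpha(Y) - S_alpha(X, Y), the quantity plotted in an IP."""
        return (matrix_renyi_entropy(k_x, alpha) + matrix_renyi_entropy(k_y, alpha)
                - matrix_joint_entropy(k_x, k_y, alpha))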

49 pages, 10680 KiB  
Article
Multivariate Time Series Information Bottleneck
by Denis Ullmann, Olga Taran and Slava Voloshynovskiy
Entropy 2023, 25(5), 831; https://doi.org/10.3390/e25050831 - 22 May 2023
Cited by 1 | Viewed by 1973
Abstract
Time series (TS) and multiple time series (MTS) predictions have historically paved the way for distinct families of deep learning models. The temporal dimension, distinguished by its evolutionary sequential aspect, is usually modeled by decomposition into the trio of “trend, seasonality, noise”, by attempts to copy the functioning of human synapses, and more recently, by transformer models with self-attention on the temporal dimension. These models may find applications in finance and e-commerce, where any increase in performance of less than 1% has large monetary repercussions; they also have potential applications in natural language processing (NLP), medicine, and physics. To the best of our knowledge, the information bottleneck (IB) framework has not received significant attention in the context of TS or MTS analyses. One can demonstrate that a compression of the temporal dimension is key in the context of MTS. We propose a new approach with partial convolution, where a time sequence is encoded into a two-dimensional representation resembling images. Accordingly, we use the recent advances made in image extension to predict an unseen part of an image from a given one. We show that our model compares well with traditional TS models, has information-theoretical foundations, and can be easily extended to more dimensions than only time and space. An evaluation of our multiple time series–information bottleneck (MTS-IB) model proves its efficiency in electricity production, road traffic, and astronomical data representing solar activity, as recorded by NASA’s interface region imaging spectrograph (IRIS) satellite.

19 pages, 3673 KiB  
Article
Position-Wise Gated Res2Net-Based Convolutional Network with Selective Fusing for Sentiment Analysis
by Jinfeng Zhou, Xiaoqin Zeng, Yang Zou and Haoran Zhu
Entropy 2023, 25(5), 740; https://doi.org/10.3390/e25050740 - 30 Apr 2023
Viewed by 1134
Abstract
Sentiment analysis (SA) is an important task in natural language processing in which convolutional neural networks (CNNs) have been successfully applied. However, most existing CNNs can only extract predefined, fixed-scale sentiment features and cannot synthesize flexible, multi-scale sentiment features. Moreover, these models’ convolutional and pooling layers gradually lose local detailed information. In this study, a new CNN model based on residual network technology and attention mechanisms is proposed. This model exploits more abundant multi-scale sentiment features and addresses the loss of locally detailed information to enhance the accuracy of sentiment classification. It is primarily composed of a position-wise gated Res2Net (PG-Res2Net) module and a selective fusing module. The PG-Res2Net module can adaptively learn multi-scale sentiment features over a large range using multi-way convolution, residual-like connections, and position-wise gates. The selective fusing module is developed to fully reuse and selectively fuse these features for prediction. The proposed model was evaluated using five baseline datasets. The experimental results demonstrate that the proposed model surpassed the other models in performance. In the best case, the model outperforms the other models by up to 1.2%. Ablation studies and visualizations further revealed the model’s ability to extract and fuse multi-scale sentiment features.

Review

28 pages, 570 KiB  
Review
To Compress or Not to Compress—Self-Supervised Learning and Information Theory: A Review
by Ravid Shwartz Ziv and Yann LeCun
Entropy 2024, 26(3), 252; https://doi.org/10.3390/e26030252 - 12 Mar 2024
Cited by 18 | Viewed by 1943
Abstract
Deep neural networks excel in supervised learning tasks but are constrained by the need for extensive labeled data. Self-supervised learning emerges as a promising alternative, allowing models to learn without explicit labels. Information theory has shaped deep neural networks, particularly the information bottleneck principle. This principle optimizes the trade-off between compression and preserving relevant information, providing a foundation for efficient network design in supervised contexts. However, its precise role and adaptation in self-supervised learning remain unclear. In this work, we scrutinize various self-supervised learning approaches from an information-theoretic perspective, introducing a unified framework that encapsulates the self-supervised information-theoretic learning problem. This framework includes multiple encoders and decoders, suggesting that all existing work on self-supervised learning can be seen as specific instances of it. We aim to unify these approaches to understand their underlying principles better and address the main challenge: many works present different frameworks with differing theories that may seem contradictory. By weaving existing research into a cohesive narrative, we delve into contemporary self-supervised methodologies, spotlight potential research areas, and highlight inherent challenges. Moreover, we discuss how to estimate information-theoretic quantities and their associated empirical problems. Overall, this paper provides a comprehensive review of the intersection of information theory, self-supervised learning, and deep neural networks, aiming for a better understanding through our proposed unified approach.
