Article

Thermodynamics of the Ising Model Encoded in Restricted Boltzmann Machines

1 Division of Natural and Applied Sciences, Duke Kunshan University, Kunshan 215300, China
2 Data Science Research Center (DSRC), Duke Kunshan University, Kunshan 215300, China
* Author to whom correspondence should be addressed.
Entropy 2022, 24(12), 1701; https://doi.org/10.3390/e24121701
Submission received: 18 October 2022 / Revised: 13 November 2022 / Accepted: 17 November 2022 / Published: 22 November 2022
(This article belongs to the Section Statistical Physics)

Abstract

The restricted Boltzmann machine (RBM) is a two-layer energy-based model that uses its hidden–visible connections to learn the underlying distribution of visible units, whose interactions are often complicated by high-order correlations. Previous studies on the Ising model of small system sizes have shown that RBMs are able to accurately learn the Boltzmann distribution and reconstruct thermal quantities at temperatures away from the critical point T c . How the RBM encodes the Boltzmann distribution and captures the phase transition are, however, not well explained. In this work, we perform RBM learning of the 2 d and 3 d Ising model and carefully examine how the RBM extracts useful probabilistic and physical information from Ising configurations. We find several indicators derived from the weight matrix that could characterize the Ising phase transition. We verify that the hidden encoding of a visible state tends to have an equal number of positive and negative units, whose sequence is randomly assigned during training and can be inferred by analyzing the weight matrix. We also explore the physical meaning of the visible energy and loss function (pseudo-likelihood) of the RBM and show that they could be harnessed to predict the critical point or estimate physical quantities such as entropy.

1. Introduction

The tremendous success of deep learning in multiple areas over the last decade has revived the interplay between physics and machine learning, in particular neural networks [1]. On the one hand, (statistical) physics ideas [2], such as the renormalization group (RG) [3], the energy landscape [4], free energy [5], glassy dynamics [6], jamming [7], Langevin dynamics [8], and field theory [9], shed light on the interpretation of deep learning and statistical inference in general [10]. On the other hand, machine learning and deep learning tools are harnessed to solve a wide range of physics problems, such as interaction potential construction [11], phase transition detection [12,13], structure encoding [14], the discovery of physical concepts [15], and many others [16,17]. At the very intersection of these two fields lies the restricted Boltzmann machine (RBM) [18], which serves as a classical paradigm to investigate how an overarching perspective could benefit both sides.
The RBM uses hidden–visible connections to encode (high-order) correlations between visible units [19]. Its precursor—the (unrestricted) Boltzmann machine—was inspired by spin glasses [20,21] and is often used in the inverse Ising problem to infer physical parameters [22,23,24]. The restriction of hidden–hidden and visible–visible connections in RBMs allows for more efficient training algorithms and, therefore, leads to recent applications in Monte Carlo simulation acceleration [25], quantum wavefunction representation [26,27], and polymer configuration generation [28]. Deep neural networks formed by stacks of RBMs have been mapped onto the variational RG due to their conceptual similarity [29]. RBMs are also shown to be equivalent to tensor network states from quantum many-body physics [30] and interpretable in light of statistical thermodynamics [31,32,33]. As simple as it seems, energy-based models like the RBM could eventually become the building blocks of autonomous machine intelligence [34].
Besides the above-mentioned efforts, the RBM has also been applied extensively to the minimal model of a second-order phase transition, the Ising model. For the small systems under investigation, it was found that RBMs with enough hidden units can encode the Boltzmann distribution, reconstruct thermal quantities, and generate new Ising configurations fairly well [35,36,37]. The visible → hidden → visible ⋯ generating sequence of the RBM can be mapped onto an RG flow in physical temperature (often towards the critical point) [38,39,40,41,42]. However, the mechanism and power of the RBM to capture physical concepts and principles have not been fully explored. First, in what way is the Boltzmann distribution of the Ising model learned by the RBM? Second, can the RBM learn and even quantitatively predict the phase transition without extra human knowledge? An affirmative answer to the second question is particularly appealing, because simple unsupervised learning methods such as principal component analysis (PCA) using configuration information alone do not provide a quantitative prediction of the transition temperature [43,44,45], and supervised learning with neural networks requires human labeling of the phase type or temperature of a given configuration [46,47].
In this article, we report a detailed numerical study on RBM learning of the Ising model with a system size much larger than those used previously. The purpose is to thoroughly dissect the various parts of the RBM and reveal how each part contributes to the learning of the Boltzmann distribution of the input Ising configurations. Such understanding allows us to extract several useful machine learning estimators or predictors for physical quantities, such as entropy and phase transition temperature. Conversely, the analysis of a physical model helps us to obtain important insights about the meaning of RBM parameters and functions, such as the weight matrix, visible energy, and pseudo-likelihood. Below, we first introduce our Ising datasets and the RBM and its training protocols in Section 2. We then report and discuss the results of the model parameters, hidden layers, visible energy, and pseudo-likelihood in Section 3. After the conclusion, more details about the Ising model and the RBM are provided in the Appendix A, Appendix B and Appendix C. Sample codes of the RBM are shared on GitHub at https://github.com/Jing-DS/isingrbm (accessed on 18 November 2022).

2. Models and Methods

2.1. Dataset of Ising Configurations Generated by Monte Carlo Simulations

The Hamiltonian of the Ising model with $N = L^d$ spins in a configuration $\mathbf{s} = [s_1, s_2, \dots, s_N]^T$ on a d-dimensional hypercubic lattice of linear dimension L in the absence of a magnetic field is

$$H(\mathbf{s}) = -J \sum_{\langle i,j \rangle} s_i s_j,$$

where the spin variable $s_i = \pm 1$ ($i = 1, 2, \dots, N$), the coupling parameter $J > 0$ (set to unity) favors ferromagnetic configurations (parallel spins), and the notation $\langle i,j \rangle$ denotes a sum over nearest neighbors [48]. At a given temperature T, a configuration $\mathbf{s}$ drawn from the sample space of $2^N$ states follows the Boltzmann distribution

$$p_T(\mathbf{s}) = \frac{e^{-H(\mathbf{s})/k_B T}}{Z_T},$$

where $Z_T = \sum_{\mathbf{s}} e^{-H(\mathbf{s})/k_B T}$ is the partition function. The Boltzmann constant $k_B$ is set to unity.
Using single-flip Monte Carlo simulations under periodic boundary conditions [49], we generate Ising configurations for two-dimensional (2d) systems ($d = 2$) of $L = 64$ ($N = 4096$) at $n_T = 16$ temperatures T = 0.25, 0.5, 0.75, 1.0, …, 4.0 (in units of $J/k_B$) and for three-dimensional (3d) systems ($d = 3$) of $L = 16$ ($N = 4096$) at $n_T = 20$ temperatures T = 2.5, 2.75, 3.0, 3.25, 3.5, 3.75, 4.0, 4.25, 4.3, 4.4, 4.5, 4.6, 4.7, 4.75, 5.0, 5.25, 5.5, 5.75, 6.0, 6.25. After being fully equilibrated, M = 50,000 configurations at each T are collected into a dataset $D_T$ for that T. For 2d systems, we also use an all-temperature dataset consisting of 50,000 configurations per temperature, pooled over all Ts.
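The dataset generation can be illustrated with a short sketch. The following is a minimal single-flip Metropolis implementation for the 2d case (with J = k_B = 1 and periodic boundary conditions); it is an illustrative example under these assumptions, not the code released with this article, and it runs much slower than an optimized simulation.

```python
import numpy as np

def metropolis_sweep(spins, T, rng):
    """One Monte Carlo sweep: N attempted single-spin flips at temperature T (J = k_B = 1)."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        # Sum over the four nearest neighbours with periodic boundary conditions.
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nn          # energy change if spin (i, j) is flipped
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i, j] *= -1

rng = np.random.default_rng(0)
L, T = 64, 2.25
spins = rng.choice([-1, 1], size=(L, L))
for _ in range(2000):                        # equilibration sweeps; collect samples afterwards
    metropolis_sweep(spins, T, rng)
```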
Analytical results of the thermal quantities of the 2d Ising model, such as the internal energy $\langle E \rangle$, the (physical) entropy S, the heat capacity $C_V$, and the magnetization $\langle m \rangle$, are well known [50,51,52,53]. Numerical simulation methods and results of the 3d Ising model have also been reported [54]. The thermodynamic definitions and relations used in this work are summarized in Appendix A.

2.2. Restricted Boltzmann Machine

The restricted Boltzmann machine (RBM) is a two-layer energy-based model with $n_h$ hidden units (or neurons) $h_i = \pm 1$ ($i = 1, 2, \dots, n_h$) in the hidden layer, whose state vector is $\mathbf{h} = [h_1, h_2, \dots, h_{n_h}]^T$, and $n_v$ visible units $v_j = \pm 1$ ($j = 1, 2, \dots, n_v$) in the visible layer, whose state vector is $\mathbf{v} = [v_1, v_2, \dots, v_{n_v}]^T$ (Figure 1) [55]. In this work, the visible layer is just the Ising configuration vector, i.e., $\mathbf{v} = \mathbf{s}$, with $n_v = N$. We chose binary units $\{-1, +1\}$ (instead of $\{0, 1\}$) to better align with the definition of the Ising spin variable $s_i$.
The total energy $E_\theta(\mathbf{v}, \mathbf{h})$ of the RBM is defined as

$$E_\theta(\mathbf{v}, \mathbf{h}) = -\mathbf{b}^T\mathbf{v} - \mathbf{c}^T\mathbf{h} - \mathbf{h}^T W \mathbf{v} = -\sum_{j=1}^{n_v} b_j v_j - \sum_{i=1}^{n_h} c_i h_i - \sum_{i=1}^{n_h}\sum_{j=1}^{n_v} W_{ij} h_i v_j,$$

where $\mathbf{b} = [b_1, b_2, \dots, b_{n_v}]^T$ is the visible bias, $\mathbf{c} = [c_1, c_2, \dots, c_{n_h}]^T$ is the hidden bias, and

$$W_{n_h \times n_v} = \begin{bmatrix} \mathbf{w}_1^T \\ \mathbf{w}_2^T \\ \vdots \\ \mathbf{w}_{n_h}^T \end{bmatrix} = \begin{bmatrix} | & | & & | \\ \mathbf{w}_{:,1} & \mathbf{w}_{:,2} & \cdots & \mathbf{w}_{:,n_v} \\ | & | & & | \end{bmatrix}$$

is the interaction weight matrix between visible and hidden units. Under this notation, each row vector $\mathbf{w}_i^T$ (of dimension $n_v$) is a filter mapping from the visible state $\mathbf{v}$ to a hidden unit i, and each column vector $\mathbf{w}_{:,j}$ (of dimension $n_h$) is an inverse filter mapping from the hidden state $\mathbf{h}$ to a visible unit j. All parameters are collectively written as $\theta = \{W, \mathbf{b}, \mathbf{c}\}$. "Restricted" refers to the lack of interaction between hidden units or between visible units.
The joint distribution for an overall state $(\mathbf{v}, \mathbf{h})$ is

$$p_\theta(\mathbf{v}, \mathbf{h}) = \frac{e^{-E_\theta(\mathbf{v}, \mathbf{h})}}{Z_\theta},$$

where the partition function of the RBM is

$$Z_\theta = \sum_{\mathbf{v}} \sum_{\mathbf{h}} e^{-E_\theta(\mathbf{v}, \mathbf{h})}.$$

The learned model distribution for a visible state $\mathbf{v}$ follows from the marginalization of $p_\theta(\mathbf{v}, \mathbf{h})$:

$$p_\theta(\mathbf{v}) = \sum_{\mathbf{h}} p_\theta(\mathbf{v}, \mathbf{h}) = \frac{1}{Z_\theta} e^{-E_\theta(\mathbf{v})},$$

where the visible energy (an effective energy for the visible state $\mathbf{v}$, often termed "free energy" in the machine learning literature)

$$E_\theta(\mathbf{v}) = -\mathbf{b}^T\mathbf{v} - \sum_{i=1}^{n_h} \ln\left(e^{-\mathbf{w}_i^T\mathbf{v} - c_i} + e^{\mathbf{w}_i^T\mathbf{v} + c_i}\right)$$

is defined according to $e^{-E_\theta(\mathbf{v})} = \sum_{\mathbf{h}} e^{-E_\theta(\mathbf{v}, \mathbf{h})}$, such that $Z_\theta = \sum_{\mathbf{v}} e^{-E_\theta(\mathbf{v})}$. See Appendix B for a detailed derivation.
The conditional distributions to generate $\mathbf{h}$ from $\mathbf{v}$, $p_\theta(\mathbf{h}|\mathbf{v})$, and to generate $\mathbf{v}$ from $\mathbf{h}$, $p_\theta(\mathbf{v}|\mathbf{h})$, satisfying $p_\theta(\mathbf{v}, \mathbf{h}) = p_\theta(\mathbf{h}|\mathbf{v})\, p_\theta(\mathbf{v}) = p_\theta(\mathbf{v}|\mathbf{h})\, p_\theta(\mathbf{h})$, can be written as products:

$$p_\theta(\mathbf{h}|\mathbf{v}) = \prod_{i=1}^{n_h} p_\theta(h_i|\mathbf{v}), \qquad p_\theta(\mathbf{v}|\mathbf{h}) = \prod_{j=1}^{n_v} p_\theta(v_j|\mathbf{h}),$$

because the $h_i$ are independent of each other (at fixed $\mathbf{v}$) and the $v_j$ are independent of each other (at fixed $\mathbf{h}$). It can be shown that

$$\begin{aligned}
p_\theta(h_i = +1|\mathbf{v}) &= \sigma\!\left(2(c_i + \mathbf{w}_i^T\mathbf{v})\right), & p_\theta(h_i = -1|\mathbf{v}) &= 1 - \sigma\!\left(2(c_i + \mathbf{w}_i^T\mathbf{v})\right), \\
p_\theta(v_j = +1|\mathbf{h}) &= \sigma\!\left(2(b_j + \mathbf{h}^T\mathbf{w}_{:,j})\right), & p_\theta(v_j = -1|\mathbf{h}) &= 1 - \sigma\!\left(2(b_j + \mathbf{h}^T\mathbf{w}_{:,j})\right),
\end{aligned}$$

where $\sigma(z) = \frac{1}{1+e^{-z}}$ is the sigmoid function (Appendix B).
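For concreteness, the conditional distributions above translate directly into block Gibbs samplers. The following minimal sketch (assuming NumPy arrays with W of shape (n_h, n_v) and ±1 units; not taken from the released code) samples one layer given the other:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_h_given_v(v, W, c, rng):
    """Sample h in {-1, +1}^{n_h} from p(h | v); W has shape (n_h, n_v)."""
    p_plus = sigmoid(2.0 * (c + W @ v))       # p(h_i = +1 | v) from Equation (10)
    return np.where(rng.random(len(c)) < p_plus, 1, -1)

def sample_v_given_h(h, W, b, rng):
    """Sample v in {-1, +1}^{n_v} from p(v | h)."""
    p_plus = sigmoid(2.0 * (b + W.T @ h))     # p(v_j = +1 | h)
    return np.where(rng.random(len(b)) < p_plus, 1, -1)
```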

2.3. Loss Function and Training of RBMs

Given the dataset $D = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_M]^T$ of M samples generated independently from the identical data distribution $p_D(\mathbf{v})$ ($\mathbf{v} \overset{\mathrm{i.i.d.}}{\sim} p_D(\mathbf{v})$), the goal of RBM learning is to find a model distribution $p_\theta(\mathbf{v})$ that approximates $p_D(\mathbf{v})$. In the context of this work, the data samples $\mathbf{v}$ are Ising configurations, and the data distribution $p_D(\mathbf{v})$ is or is related to the Ising–Boltzmann distribution $p_T(\mathbf{s})$.
Based on maximum likelihood estimation, the optimal parameters $\theta^* = \arg\min_\theta \mathcal{L}(\theta)$ can be found by minimizing the negative log likelihood

$$\mathcal{L}(\theta) = -\langle \ln p_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} = \langle E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} + \ln Z_\theta,$$

which serves as the loss function of RBM learning. Note that the partition function $Z_\theta$ only depends on the model, not on the data. Since the calculation of $Z_\theta$ involves a summation over all possible $(\mathbf{v}, \mathbf{h})$ states, which is not feasible, $\mathcal{L}(\theta)$ cannot be evaluated exactly, except for very small systems [56]. Special treatments have to be devised, for example by mean-field theory [57] or by importance sampling methods [58]. An interesting feature of the RBM is that, although the actual loss function $\mathcal{L}(\theta)$ is not accessible, its gradient

$$\nabla_\theta \mathcal{L}(\theta) = \langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} - \langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_\theta}$$

can be sampled, which enables a gradient descent learning algorithm. From step t to step t + 1, the model parameters are updated with learning rate η as

$$\theta_{t+1} = \theta_t - \eta\, \nabla_\theta \mathcal{L}(\theta_t).$$
To evaluate the loss function, we used an approximation to it, the pseudo-(negative log) likelihood [59]:

$$\tilde{\mathcal{L}}(\theta) = -\left\langle \sum_{i=1}^{n_v} \ln p_\theta(v_i \,|\, v_{j\neq i}) \right\rangle_{\mathbf{v}\sim p_D} \approx \mathcal{L}(\theta),$$

where

$$p_\theta(v_i \,|\, v_{j\neq i}) = p_\theta(v_i \,|\, v_j \text{ for } j\neq i) = \frac{e^{-E_\theta(\mathbf{v})}}{e^{-E_\theta(\mathbf{v})} + e^{-E_\theta([v_1, \dots, -v_i, \dots, v_{n_v}])}}$$

is the conditional probability for component $v_i$ given that all the other components $v_j$ ($j\neq i$) are fixed [37]. In practice, to avoid the time-consuming sum over all visible units $i = 1, \dots, n_v$, it is suggested to randomly sample one unit $i_0 \in \{1, 2, \dots, n_v\}$ and estimate

$$\tilde{\mathcal{L}}(\theta) \approx -n_v \left\langle \ln p_\theta(v_{i_0} \,|\, v_{j\neq i_0}) \right\rangle_{\mathbf{v}\sim p_D},$$

provided that the visible units are on average translation-invariant [60]. To monitor the reconstruction error, we also calculated the cross-entropy CE between the initial configuration $\mathbf{v}$ and the conditional probability $p_\theta(\mathbf{v}|\mathbf{h})$ for the reconstruction $\mathbf{v} \xrightarrow{p_\theta(\mathbf{h}|\mathbf{v})} \mathbf{h} \xrightarrow{p_\theta(\mathbf{v}|\mathbf{h})} \mathbf{v}$ (see Appendix C for the definition).
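The stochastic pseudo-likelihood estimate above can be computed from the visible energy alone. The sketch below is a minimal illustration under the same array-shape assumptions as before; it is not the authors' implementation.

```python
import numpy as np

def visible_energy(v, W, b, c):
    """Visible energy E_theta(v) for v in {-1,+1}^{n_v}; W: (n_h, n_v), b: (n_v,), c: (n_h,)."""
    a = W @ v + c
    return -b @ v - np.sum(np.logaddexp(-a, a))   # -b.v - sum_i ln(e^{-a_i} + e^{a_i})

def pseudo_log_likelihood_term(v, W, b, c, rng):
    """Stochastic estimate of the pseudo-(negative log) likelihood for one sample v."""
    i0 = rng.integers(len(v))                     # randomly chosen visible unit
    v_flip = v.copy()
    v_flip[i0] *= -1
    e, e_flip = visible_energy(v, W, b, c), visible_energy(v_flip, W, b, c)
    log_p = -np.logaddexp(0.0, e - e_flip)        # ln p(v_{i0} | rest), computed stably
    return -len(v) * log_p                        # average this over the dataset to obtain L~
```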
For both 2d and 3d Ising systems, we first trained single-temperature RBMs (T-RBMs). The M = 50,000 Ising configurations at each T forming a dataset $D_T$ are used to train one model, so that there are $n_T$ T-RBMs in total. While $n_v = N$, we tried various numbers of hidden units, $n_h = 400, 900, 1600, 2500$ in 2d and $n_h = 400, 900, 1600$ in 3d. For 2d systems, we also trained an all-temperature RBM ( T -RBM), for which 50,000 Ising configurations per temperature are drawn to compose an all-temperature dataset of $M = 50{,}000\, n_T = 8 \times 10^5$ samples. The number of hidden units for this T -RBM is $n_h = 400, 900, 1600$. The weight matrix W is initialized with Glorot normal initialization [61] ($\mathbf{b}$ and $\mathbf{c}$ are initialized to zero). Parameters are optimized with stochastic gradient descent with learning rate $\eta = 1.0 \times 10^{-4}$ and batch size 128. The negative phase (model term) of the gradient, $\langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_\theta}$, is calculated using CD-k Gibbs sampling with k = 5. We stopped the training when $\tilde{\mathcal{L}}$ and CE converged, typically after 100–2000 epochs (see the Supplementary Materials). Three Nvidia GPU cards (GeForce RTX 3090 and 2070) were used to train the models, which took about two minutes per epoch for an M = 50,000 dataset.
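A minimal sketch of one CD-k parameter update, with the hyperparameters quoted above (k = 5, η = 10⁻⁴) and everything else (array shapes, function names) assumed for illustration, is given below; the released GitHub code should be consulted for the actual implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd_k_update(v_data, W, b, c, rng, k=5, eta=1e-4):
    """One stochastic gradient step on a minibatch v_data of shape (batch, n_v), entries ±1."""
    # Positive (data) phase: <h_i | v> = tanh(c_i + w_i^T v).
    h_data = np.tanh(c + v_data @ W.T)
    # Negative (model) phase: k steps of Gibbs sampling starting from the data (CD-k).
    v_model = v_data.copy()
    for _ in range(k):
        h_sample = np.where(rng.random((len(v_data), len(c))) < sigmoid(2 * (c + v_model @ W.T)), 1, -1)
        v_model = np.where(rng.random(v_data.shape) < sigmoid(2 * (b + h_sample @ W)), 1, -1)
    h_model = np.tanh(c + v_model @ W.T)
    # Gradient descent on L(theta): dL/dW_ij = -<h_i v_j>_data + <h_i v_j>_model.
    W += eta * (h_data.T @ v_data - h_model.T @ v_model) / len(v_data)
    b += eta * (v_data - v_model).mean(axis=0)
    c += eta * (h_data - h_model).mean(axis=0)
```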

3. Results and Discussion

In this section, we investigate how the RBM uses its weight matrix W and hidden layer h to encode the Boltzmann distributed states of the Ising model and what physical information can be extracted from machine learning concepts such as the visible energy and loss function.

3.1. Filters and Inverse Filters

It can be verified that the trained weight matrix elements $W_{ij}$ of a T-RBM follow a Gaussian distribution of zero mean with the largest variance at $T \approx T_c$ (Figure 2a) [62]. The low temperature distribution here is different from the uniform distribution observed in [35], which results from the uniform initialization scheme used there. This suggests that the training of RBMs could converge to different minima when initialized differently. According to Equation (10), the biases $c_i$ and $b_j$ can be associated with the activation thresholds of a hidden unit and a visible unit, respectively. For example, whether a hidden unit is activated ($h_i = +1$) or anti-activated ($h_i = -1$) depends on whether the incoming signal $\mathbf{w}_i^T\mathbf{v}$ from all visible units exceeds the threshold $-c_i$. The values of $c_i$ (and $b_j$) are all close to zero and are often negligible in comparison with the total incoming signal $\mathbf{w}_i^T\mathbf{v}$ (and $\mathbf{h}^T\mathbf{w}_{:,j}$) (see the Supplementary Materials for the results of constrained RBMs where all biases are set to zero). The distributions of $c_i$ and $b_j$ should in principle be symmetric about zero (Figure 2b,c). A non-zero mean can be caused by an unbalanced dataset with an unequal number of m > 0 and m < 0 Ising configurations. The corresponding filter or inverse filter sums may also be distributed with a non-zero mean in order to compensate for the asymmetric bias, as will be shown next.
Since $\mathbf{v} = \mathbf{s}$ is an Ising configuration with ±1 units in our problem, $\mathbf{w}_i^T\mathbf{v}$ will be more positive (or negative) if the components of $\mathbf{w}_i^T$ better match (or anti-match) the signs of the spin variables. In this sense, we can think of $\mathbf{w}_i^T$ as a filter extracting certain patterns in Ising configurations. Knowing the representative spin configurations of the Ising model below, close to, and above the critical temperature $T_c$, we expect that $\mathbf{w}_i^T$ ($i = 1, 2, \dots, n_h$) wrapped into an $L^d$ arrangement exhibits similar features. In Figure 3a, we show sample filters of T-RBMs with $n_h = 400$ trained for the 2d Ising model at three temperatures T = 1.0, 2.25, and 3.5 (see the Supplementary Materials for more examples of filters). At low T, the components of $\mathbf{w}_i^T$ tend to be mostly positive (or negative), matching the spin-up (or spin-down) configurations in the ferromagnetic phase. At high T, the filters $\mathbf{w}_i^T$ possess stripe domains consisting of roughly equal numbers of well-mixed positive and negative components, like Ising configurations during spinodal decomposition. Close to $T_c$, the $\mathbf{w}_i^T$ patterns vary dramatically from each other, in accord with the large critical fluctuations. In particular, some even exhibit hierarchical clusters of various sizes. The element sum of a filter, the filter sum $\mathrm{sum}(\mathbf{w}_i^T) = \sum_{j=1}^{n_v} W_{ij}$, plays a similar role as the magnetization m. The distribution of all the $n_h$ filter sums at each T changes with increasing temperature as the Ising magnetization does, from bimodal to unimodal, with the largest variance at $T_c$ (Figure 3b). This suggests that the peak of the variance $\langle(\sum_{j=1}^{n_v} W_{ij})^2\rangle - \langle\sum_{j=1}^{n_v} W_{ij}\rangle^2$ as a function of temperature coincides with the Ising phase transition (inset of Figure 3b). More detailed results for the 2d and 3d Ising models are in the Supplementary Materials.
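As an illustration, the filter-sum indicator can be evaluated directly from a trained weight matrix. In the sketch below, the per-temperature dictionary trained_weights is hypothetical:

```python
import numpy as np

def filter_sum_variance(W):
    """Variance of the n_h filter sums sum(w_i^T) = sum_j W_ij for one trained T-RBM."""
    sums = W.sum(axis=1)
    return sums.var()

# Example usage (hypothetical dict of trained weight matrices, one per temperature):
# variance_vs_T = {T: filter_sum_variance(W_T) for T, W_T in trained_weights.items()}
# The temperature at which variance_vs_T peaks is the predicted transition point.
```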
When a hidden layer $\mathbf{h}$ is provided, the RBM reconstructs the visible layer $\mathbf{v}$ by applying the $n_v$ inverse filters $\mathbf{w}_{:,j}$ ($j = 1, 2, \dots, n_v$) to $\mathbf{h}$. The distribution of the inverse filter sum $\mathrm{sum}(\mathbf{w}_{:,j}) = \sum_{i=1}^{n_h} W_{ij}$ is Gaussian with a mean close to zero (Figure 3c), where a large deviation from zero mean is accompanied by a non-zero average bias $\sum_j b_j / n_v$, as mentioned above (Figure 2b). We find that this is a result of the unbalanced dataset, which has ∼60% m < 0 Ising configurations. Because the activation probability of a visible unit $v_j$ is determined by $\mathbf{w}_{:,j}$, the correlation between visible units (Ising spins) is reflected in the correlation between inverse filters. This is equivalent to the analysis of the $n_v \times n_v$ matrix $W^T W$ or its eigenvectors as in [38,42], whose entries are the inner products $\mathbf{w}_{:,j}^T \mathbf{w}_{:,j'}$ of inverse filters. We can therefore locate the Ising phase transition by identifying the temperature with the strongest correlation among the $\mathbf{w}_{:,j}$s, e.g., the peak of $\mathbf{w}_{:,j}^T \mathbf{w}_{:,j'}$ at a given distance $r_{jj'}$ (inset of Figure 3c). See the Supplementary Materials for results in 2d and 3d.
In contrast, the filters of the T -RBM trained from 2 d Ising configurations at all temperatures have background patterns like the high temperature T-RBM (in the paramagnetic phase). A clear difference is that most T -RBM filters have one large domain of positive or negative elements (Figure 4a), similar to the receptive field in a deep neural network [29]. This domain randomly covers an area of the visual field of the L × L Ising configuration (see the Supplementary Materials for all the n h filters). The existence of such domains in the filter causes the filter sum and the corresponding bias c i to be positive or negative with a bimodal distribution (Figure 4b,c). The inverse filter sum and its corresponding bias b j still have a Gaussian distribution, although the unbalanced dataset shifts the mean of b j away from zero.

3.2. Hidden Layer

Whether a hidden unit uses +1 or −1 to encode a pattern of the visible layer $\mathbf{v}$ is randomly assigned during training. In the former case, the filter $\mathbf{w}_i^T$ matches the pattern ($\mathbf{w}_i^T\mathbf{v}$ is positive); in the latter case, the filter anti-matches the pattern ($\mathbf{w}_i^T\mathbf{v}$ is negative). For a visible layer $\mathbf{v}$ of magnetization m, the sign of $\mathbf{w}_i^T\mathbf{v}$ and of the encoding $h_i$ is largely determined by the sign of $\mathrm{sum}(\mathbf{w}_i^T)$ (Table 1). Since the distribution of $\mathrm{sum}(\mathbf{w}_i^T)$ is symmetric about zero, the hidden layer of a T-RBM consists of roughly equal numbers of +1 and −1 units; the "magnetization" $m_h = \frac{1}{n_h}\sum_{i=1}^{n_h} h_i$ of the hidden layer is always close to zero, and its average $\langle m_h \rangle \approx 0$. The histogram of $m_h$ for all hidden encodings of visible states is expected to be symmetric about zero (Figure 5). We found that, for the smallest $n_h$, the histogram of $m_h$ at temperatures close to $T_c$ is bimodal due to the relatively large randomness of small hidden layers. As more hidden units are added, the two peaks merge into one and the distribution of $m_h$ becomes narrower. This suggests that a larger hidden layer tends to have a smaller deviation from $m_h = 0$.
The order of the h i = ± 1 sequence in each hidden encoding h is arbitrary, but relatively fixed once the T-RBM is trained. The permutation of hidden units together with their corresponding filters (swap the rows of the matrix W ) results in an equivalent T-RBM. Examples of hidden layers of T-RBMs with n h = 400 at different temperatures are shown in the inset of Figure 5, where the vector h is wrapped into a 20 × 20 arrangement. Note that there are actually no spatial relationships between different hidden units, and any apparent pattern in this 2 d illustration is an artifact of the wrapping protocol.
As a generative model, a T-RBM can be used to produce more Boltzmann-distributed Ising configurations. Starting from a random hidden state $\mathbf{h}^{(0)}$, this is often achieved by a sequence of Markov chain moves $\mathbf{h}^{(0)} \rightarrow \mathbf{v}^{(0)} \rightarrow \mathbf{h}^{(1)} \rightarrow \mathbf{v}^{(1)} \rightarrow \cdots$ until the steady state is reached [31]. Based on the above-mentioned observations, we can design an algorithm to initialize $\mathbf{h}^{(0)}$ that better captures the hidden encoding of visible states (equilibrium Ising configurations), thus enabling faster convergence of the Markov chain. After choosing a low temperature $T_L$ and a high temperature $T_H$, we generate the hidden layer as follows (a code sketch follows the list):
  • At low $T \leq T_L < T_c$: if $\mathrm{sum}(\mathbf{w}_i^T) > 0$, set $h_i = +1$; if $\mathrm{sum}(\mathbf{w}_i^T) < 0$, set $h_i = -1$. This will be an encoding of an m > 0 ferromagnetic configuration. To encode an m < 0 ferromagnetic configuration, simply flip the sign of every $h_i$.
  • At high $T \geq T_H > T_c$: randomly assign $h_i = +1$ or $-1$ with equal probability. This will be an encoding of a paramagnetic configuration with m ≈ 0.
  • At intermediate $T_L < T < T_H$: to encode an m > 0 Ising configuration, if $\mathrm{sum}(\mathbf{w}_i^T) > 0$, assign $h_i = +1$ with probability $p_h \in (0.5, 1.0)$ and $h_i = -1$ with probability $1 - p_h$; if $\mathrm{sum}(\mathbf{w}_i^T) < 0$, assign $h_i = -1$ with probability $p_h \in (0.5, 1.0)$ and $h_i = +1$ with probability $1 - p_h$. Here $p_h$ is a predetermined parameter, and the above two rules are just the special cases $p_h = 1.0$ ($T \leq T_L$) and $p_h = 0.5$ ($T \geq T_H$), respectively. In practice, one may approximately use $p_h = (|m| + 1)/2$ or use linear interpolation within $T_L < T < T_H$, $p_h = 0.5 + 0.5\,(T_H - T)/(T_H - T_L)$.
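A compact sketch of this initialization rule is given below (illustrative only; the function name and the m_sign argument are assumptions, and the interpolation follows the limits stated above):

```python
import numpy as np

def init_hidden(W, T, T_L, T_H, rng, m_sign=+1):
    """Initial hidden state h(0) encoding an Ising configuration with magnetization sign m_sign."""
    filter_sums = W.sum(axis=1)
    if T <= T_L:
        p_h = 1.0
    elif T >= T_H:
        p_h = 0.5
    else:
        p_h = 0.5 + 0.5 * (T_H - T) / (T_H - T_L)   # interpolates from 1.0 at T_L to 0.5 at T_H
    # A hidden unit whose filter sum matches the target sign is set to +1 with probability p_h.
    match = np.sign(filter_sums) * m_sign
    prob_hi_plus = np.where(match > 0, p_h, 1.0 - p_h)
    return np.where(rng.random(len(filter_sums)) < prob_hi_plus, 1, -1)
```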
Below, we compare the (one-step) reconstructed thermal quantities using two different initial hidden encodings with results from a conventional multi-step Markov chain (Figure 6). The hidden encoding methods proposed here are quite reliable at low and high T, but less accurate at T close to T c .

3.3. Visible Energy

When a T-RBM for temperature T is trained, we expect that $p_\theta(\mathbf{v}) \approx p_D(\mathbf{v}) \approx p_T(\mathbf{s})$, the Boltzmann distribution at that T. Although formally related to the physical energy in the Boltzmann factor (with the temperature absorbed), the visible energy $E_\theta(\mathbf{v})$ of an RBM should really be considered as the negative log (relative) probability of a visible state $\mathbf{v}$. For single-temperature T-RBMs, the mean visible energy $\langle E_\theta(\mathbf{v}) \rangle$ increases monotonically with temperature (except for the largest $n_h$, which might be due to overfitting) (Figure 7a,b). The value of $\langle E_\theta(\mathbf{v}) \rangle$ and its trend, however, cannot be used to identify the physical phase transition. In fact, $E_\theta(\mathbf{v})$ can differ from the reduced Hamiltonian $H(\mathbf{s})/k_B T$ by an arbitrary (temperature-dependent) constant while still maintaining the Boltzmann distribution $p_\theta(\mathbf{v}) \approx p_T(\mathbf{s})$ (if the partition function $Z_\theta$ is calibrated accordingly).
The trend of $\langle E_\theta(\mathbf{v}) \rangle$ for T-RBMs can be understood by considering the following approximate forms. First, due to the symmetry of +1 and −1, the biases $b_j$ and $c_i$ are all close to zero. A constrained T-RBM with zero biases has a visible energy

$$E_W(\mathbf{v}) = -\sum_{i=1}^{n_h} \ln\left(e^{-\mathbf{w}_i^T\mathbf{v}} + e^{\mathbf{w}_i^T\mathbf{v}}\right)$$

that approximates the visible energy of the full T-RBM, i.e., $E_\theta(\mathbf{v}) \approx E_W(\mathbf{v})$. Next, unless $\mathbf{w}_i^T\mathbf{v}$ is close to zero, one of the two exponential terms in Equation (17) always dominates, such that $E_W(\mathbf{v}) \approx \tilde{E}_W(\mathbf{v})$, where

$$\tilde{E}_W(\mathbf{v}) = -\sum_{i=1}^{n_h} \left| \mathbf{w}_i^T\mathbf{v} \right| = -\sum_{i=1}^{n_h} \left| \sum_{j=1}^{n_v} W_{ij} v_j \right|.$$

Equation (18) can further be approximated by setting $\mathbf{v} = \mathbf{1}$ with all $v_j = +1$, i.e., $\tilde{E}_W(\mathbf{v}) \approx \tilde{E}_W(\mathbf{1})$ with

$$\tilde{E}_W(\mathbf{1}) = -\sum_{i=1}^{n_h} \left| \mathrm{sum}(\mathbf{w}_i^T) \right| = -\sum_{i=1}^{n_h} \left| \sum_{j=1}^{n_v} W_{ij} \right|.$$
In summary, $E_W(\mathbf{v})$, $\tilde{E}_W(\mathbf{v})$, and $\tilde{E}_W(\mathbf{1})$ are all good approximations to the original $E_\theta(\mathbf{v})$ (Figure 7a). The increase of the mean $\langle E_\theta(\mathbf{v}) \rangle$ with temperature coincides with the narrowing of the filter sum distribution, i.e., the decrease of $|\mathrm{sum}(\mathbf{w}_i^T)|$ with temperature, which is evident from Figure 3b. At fixed temperature, the decrease of $E_\theta(\mathbf{v})$ with $n_h$ is a consequence of the sum $\sum_{i=1}^{n_h}$ in the definition of the visible energy. The variance $\langle E_\theta^2 \rangle - \langle E_\theta \rangle^2$ is a useful quantity for phase transition detection, because it reflects the fluctuation of the probability $p_\theta(\mathbf{v})$. In both the low-T ferromagnetic and high-T paramagnetic regimes, $p_\theta(\mathbf{v})$ is relatively homogeneous among different states. When T is close to $T_c$, the variances of $p_\theta(\mathbf{v})$ and $E_\theta(\mathbf{v})$ are expected to peak (Figure 7d,e). The abnormal rounded (and even shifted) peaks at large $n_h$ could be a sign of overfitting.
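These approximations are straightforward to evaluate from the weight matrix alone; a minimal sketch (zero biases assumed, array shapes as in the earlier sketches) is:

```python
import numpy as np

def visible_energy_approximations(v, W):
    """The three zero-bias approximations to E_theta(v) discussed above; W: (n_h, n_v)."""
    a = W @ v
    E_W = -np.sum(np.logaddexp(-a, a))        # E_W(v)   = -sum_i ln(e^{-w_i.v} + e^{w_i.v})
    E_tilde = -np.sum(np.abs(a))              # E~_W(v)  = -sum_i |w_i.v|
    E_ones = -np.sum(np.abs(W.sum(axis=1)))   # E~_W(1)  = -sum_i |sum(w_i^T)|
    return E_W, E_tilde, E_ones
```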
For the all-temperature T -RBM, the Ising phase transition can be revealed by either the sharp increase of the mean $\langle E_\theta(\mathbf{v}) \rangle$ or the peak of the variance $\langle E_\theta^2 \rangle - \langle E_\theta \rangle^2$ (Figure 7c,f). However, this apparent detection can be a trivial consequence of the special composition of the all-temperature dataset, which contains Ising configurations at different temperatures in equal proportion. Only configurations at a specific T are fed into the model to calculate the average quantity at that T. Technically, a visible state $\mathbf{v}$ in the all-temperature dataset is not subject to the Boltzmann distribution at any specific temperature. Instead, the true ensemble of this dataset is a collection of $n_T$ different Boltzmann-distributed subsets. Many replicas of the same or similar ferromagnetic states are in the dataset, giving rise to a large multiplicity, high probability, and low visible energy for such states. In comparison, high temperature paramagnetic states are all different from each other and, therefore, have low $p_\theta(\mathbf{v})$ (high $E_\theta(\mathbf{v})$) for each one of them. Knowing this caveat, one should be cautious when monitoring the visible energy of a T -RBM to detect the phase transition, because changing the proportion of Ising configurations at different temperatures in the dataset can modify the relative probability of each state.

3.4. Pseudo-Likelihood and Entropy Estimation

The negative log likelihood $\mathcal{L}(\theta)$ defined in Equation (11) is conceptually equivalent to the physical entropy S defined by the Gibbs entropy formula, apart from the factor of the Boltzmann constant $k_B$ (Appendix A). However, just as the entropy S cannot be directly sampled, the exact value of $\mathcal{L}(\theta)$ is not accessible. In order to estimate S, we calculated the pseudo-likelihood $\tilde{\mathcal{L}}(\theta)$ instead, which is based on the mean-field-like approximation $p_\theta(\mathbf{v}) \approx \prod_{i=1}^{n_v} p_\theta(v_i \,|\, v_{j\neq i})$. Similar ideas to estimate the free energy or entropy were put forward with the aid of variational autoregressive networks [63] or neural importance sampling [64]. The true and estimated entropy of the 2d and 3d Ising models using T-RBMs with different $n_h$ are shown in Figure 8a,b. As a comparison, we also considered a "pseudo-entropy" based on a similar approximation:

$$\tilde{S} = -k_B \left\langle \sum_{i=1}^{N} \ln p_T(s_i \,|\, s_{j\neq i}) \right\rangle_{\mathbf{s}\sim p_T} \approx S,$$

where the conditional probability is

$$p_T(s_i \,|\, s_{j\neq i}) = \frac{e^{-H(\mathbf{s})/k_B T}}{e^{-H(\mathbf{s})/k_B T} + e^{-H([s_1, \dots, -s_i, \dots, s_N])/k_B T}}$$

and the ensemble average $\langle \cdot \rangle_{\mathbf{s}\sim p_T}$ is taken over states obtained from Monte Carlo sampling. In both 2d and 3d, $\tilde{S}$ is lower than the true S, especially at high T, because a mean-field treatment tends to underestimate fluctuations.
While increasing the model complexity by adding hidden units is usually believed to reduce the reconstruction error, e.g., of the energy and heat capacity [35,36] (see also the Supplementary Materials), a recent study suggested that a trade-off could exist between the accuracy of different statistical quantities [65]. Here, we found that the pseudo-likelihood of T-RBMs with the fewest hidden units in our trials ($n_h = 400$) appears to provide the best prediction for the entropy. Increasing $n_h$ leads to larger deviations from the true S at higher T. The decrease of $\tilde{\mathcal{L}}$ with $n_h$ at fixed temperature agrees with the trend of the visible energy: a lower $E_\theta(\mathbf{v})$ corresponds to a higher $p_\theta(\mathbf{v})$ and, thus, a lower $\tilde{\mathcal{L}}$ according to its definition. The surprisingly good performance of $\tilde{\mathcal{L}}$ in approximating S could be due to the fact that the visible units $v_i$ in RBMs are only indirectly correlated through hidden units, which collectively serve as an effective mean-field on each visible unit. We also calculated $\tilde{\mathcal{L}}(\theta)$ with the all-temperature T -RBM in 2d (Figure 8c). Compared with single-temperature T-RBMs of the same $n_h$ (Figure 8a), the T -RBM predicts a higher $\tilde{\mathcal{L}}(\theta)$ with considerable deviations even at low T. The trend of $\tilde{\mathcal{L}}(\theta)$ also agrees with that of $\langle E_\theta(\mathbf{v}) \rangle$ (Figure 7c).
Knowledge of the entropy allows us to estimate the phase transition point using the thermodynamic relation $C_V = T \frac{dS}{dT}$. We constructed this estimated $C_V$ as a function of temperature from $\tilde{\mathcal{L}}(\theta)$ and its numerical fit; its peak is expected to be located at $T_c$ (Figure 9). The predicted $T_c$ values are compared with the results from the entropy and pseudo-entropy, as well as with the Monte Carlo simulation results for our finite systems and the known exact values for infinite systems, in Table 2. It can be seen that single-temperature T-RBMs capture the transition point fairly well, within an error of about 1–3%.
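A minimal sketch of this estimator is given below, assuming that S/k_B is approximated by the measured pseudo-likelihood values and using a simple polynomial fit; the fit degree and function name are illustrative, not the authors' exact procedure.

```python
import numpy as np

def predict_tc(temps, pseudo_L, deg=8):
    """Estimate T_c from pseudo-likelihood values L~(T) via C_V = T dS/dT, with S/k_B ≈ L~."""
    temps = np.asarray(temps, dtype=float)
    s_fit = np.polyfit(temps, np.asarray(pseudo_L, dtype=float), deg)   # smooth fit of S(T)
    t_dense = np.linspace(temps.min(), temps.max(), 1000)
    c_v = t_dense * np.polyval(np.polyder(s_fit), t_dense)              # C_V = T dS/dT
    return t_dense[np.argmax(c_v)], t_dense, c_v
```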

4. Conclusions

In this work, we trained RBMs using equilibrium Ising configurations in 2d and 3d collected from Monte Carlo simulations at various temperatures. For single-temperature T-RBMs, the filters (row vectors) and the inverse filters (column vectors) of the weight matrix exhibit different characteristic patterns and correlations, respectively, below, around, and above the phase transition. Metrics derived from them, such as the filter sum fluctuation and the inverse filter correlation, can be used to locate the phase transition point. The hidden layer $\mathbf{h}$ on average contains an equal number of +1 and −1 units, whose variance decreases as more hidden units are added. The sign of a particular hidden unit $h_i$ is determined by the signs of the filter sum $\mathrm{sum}(\mathbf{w}_i^T)$ and of the magnetization m of the visible pattern. However, there is no spatial pattern in the sequence of positive and negative units in a hidden encoding.
The visible energy reflects the relative probability of visible states in the Boltzmann distribution. Although the mean of the visible energy is not directly related to the (physical) internal energy and does not reveal a clear transition, its fluctuation, which peaks at the critical point, can be used to identify the phase transition. The value and trend of the visible energy can be understood from several approximate forms, in particular the sum of the absolute values of the filter sums. The pseudo-likelihood of RBMs is conceptually related to, and can be used to estimate, the physical entropy. Numerical differentiation of the pseudo-likelihood yields an estimate of the heat capacity and thereby provides another estimator of the transition temperature. All these predictions of the critical temperature were made by unsupervised RBM learning, for which human labeling of the phase types is not needed.
As a comparison, we also trained an all-temperature T -RBM, whose dataset is a mixture of Boltzmann-distributed states over a range of temperatures. Each filter of this T -RBM features one large domain in its receptive field. Although the visible energy and pseudo-likelihood of the T -RBM show a certain signature of the phase transition, one should be cautious, as this detection could be an artifact of the composition of the dataset. Changing the proportions of Ising configurations at different temperatures could bias the probabilities and the transition learned by the T -RBM.
By extracting the underlying (Boltzmann) distribution of the input data, RBMs capture the rapid (phase) transition of such a distribution as the tuning parameter (temperature) is changed, without knowledge of the physical Hamiltonian. Information about the distribution is completely embedded in the configurations and their frequencies in the dataset. It would be interesting to see if such a general scheme of RBM learning can be extended to study other physical models of phase transition.

Supplementary Materials

The following Supporting Information can be downloaded at: https://www.mdpi.com/article/10.3390/e24121701/s1, training of RBMs; the constrained RBM with biases set to zero; variance of filter sum; correlation of inverse filter; reconstructed thermal quantities by RBMs; example filters of single temperature RBMs at different temperatures; example filters of the all temperature RBM.

Author Contributions

Conceptualization, J.G. and K.Z.; methodology, K.Z.; software, J.G.; formal analysis, J.G.; investigation, J.G. and K.Z.; data curation, J.G.; writing—original draft preparation, J.G. and K.Z.; writing—review and editing, K.Z.; visualization, J.G.; supervision, K.Z. All authors have read and agreed to the published version of the manuscript.

Funding

Duke Kunshan startup and SRS fund; Kunshan Government Research fund (KGR-R97030021S).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank Duke Kunshan startup funding, the Summer Research Scholars (SRS) program and Kunshan Government Research fund (KGR-R97030021S) for supporting this work. We also thank Giuseppe Magnifico and Kim Nicoli for helpful discussions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RBM    restricted Boltzmann machine
T-RBM    single-temperature restricted Boltzmann machine
T -RBM    all-temperature restricted Boltzmann machine

Appendix A. Statistical Thermodynamics of Ising Model

In this Appendix, we review the statistical thermodynamics of the Ising model covered in this work. The internal energy at a given temperature is

$$\langle E \rangle = \sum_{\mathbf{s}} p_T(\mathbf{s})\, H(\mathbf{s}) = \frac{\sum_{\mathbf{s}} H(\mathbf{s})\, e^{-H(\mathbf{s})/k_B T}}{Z_T},$$

where $\langle \cdot \rangle$ denotes the thermal average over equilibrated configurations. The heat capacity is

$$C_V = k_B \beta^2 \left( \langle E^2 \rangle - \langle E \rangle^2 \right),$$

where $\beta = \frac{1}{k_B T}$, and the heat capacity per spin (or specific heat) is $c_V = C_V / N$. The magnetization per spin is

$$m = \frac{1}{N} \sum_{i=1}^{N} s_i.$$

In small finite systems, because flips from m to −m configurations are common, we need to take the absolute value |m| before the thermal average:

$$\langle |m| \rangle = \left\langle \left| \frac{1}{N} \sum_{i=1}^{N} s_i \right| \right\rangle.$$

The physical entropy can be defined using the Gibbs entropy formula:

$$S = -k_B \langle \ln p_T(\mathbf{s}) \rangle = -k_B \sum_{\mathbf{s}} p_T(\mathbf{s}) \ln p_T(\mathbf{s}).$$
For the 2d Ising model, the critical temperature solved from $\sinh\frac{2J}{k_B T_c} = 1$ is $k_B T_c = \frac{2J}{\ln(1+\sqrt{2})} = 2.269185\, J$. Define

$$K = \frac{J}{k_B T}, \quad x = e^{-2K}, \quad q(K) = \frac{2\sinh 2K}{\cosh^2 2K}, \quad K_1(q) = \int_0^{\pi/2} \frac{d\phi}{\sqrt{1 - q^2\sin^2\phi}}, \quad E_1(q) = \int_0^{\pi/2} d\phi\, \sqrt{1 - q^2\sin^2\phi};$$

the analytical results of the 2d Ising model are then expressed as follows. Magnetization per spin [52]:

$$m = \left[ \frac{1+x^2}{(1-x^2)^2} \left( 1 - 6x^2 + x^4 \right)^{1/2} \right]^{1/4} = \left[ 1 - \sinh^{-4}(2K) \right]^{1/8};$$

internal energy per spin [50]:

$$\frac{\langle E \rangle}{N} = -J \coth 2K \left[ 1 + \frac{2}{\pi} \left( 2\tanh^2 2K - 1 \right) K_1(q) \right];$$

specific heat [53]:

$$c_V = k_B \frac{4}{\pi} \left( K \coth 2K \right)^2 \left\{ K_1(q) - E_1(q) - \left( 1 - \tanh^2 2K \right) \left[ \frac{\pi}{2} + \left( 2\tanh^2 2K - 1 \right) K_1(q) \right] \right\};$$

and the partition function per spin (or free energy per spin, $f = F/N$) [51]:

$$-\beta f = \ln(2\cosh 2K) + \frac{1}{\pi} \int_0^{\pi/2} \ln\left[ \frac{1}{2}\left( 1 + \sqrt{1 - q^2\sin^2\phi} \right) \right] d\phi.$$
The equation for the entropy can be obtained from the thermodynamic relation $F = \langle E \rangle - TS$.
For the 3d Ising model, $\langle m \rangle$, $\langle E \rangle$, and $c_V$ can be calculated directly from Monte Carlo sampling [54]. The numerical prediction for the critical temperature is $T_c \approx 4.511\, J/k_B$ [66]. Special techniques are needed to compute the free energy or entropy. We used thermodynamic integration,

$$F = -N k_B T \ln 2 + k_B T \int_0^{1/k_B T} \langle E \rangle\, d\beta,$$

in the high temperature regime, or

$$S(T) = \int_0^T \frac{C_V(T')}{T'}\, dT'$$

in the low temperature regime, since $S(T \to 0) = 0$ and $C_V(T \to 0) \to 0$ for the Ising model.
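A minimal sketch of the low-temperature integration from tabulated Monte Carlo heat-capacity data (illustrative variable names, trapezoidal rule, with the C_V/T integrand taken as 0 at T = 0) is:

```python
import numpy as np

def entropy_low_T(temps, C_V):
    """Cumulative S(T) = integral_0^T C_V(T')/T' dT' on an increasing temperature grid."""
    temps = np.asarray(temps, dtype=float)
    integrand = np.asarray(C_V, dtype=float) / temps
    dT = np.diff(np.concatenate(([0.0], temps)))          # segment widths, starting from T = 0
    left = np.concatenate(([0.0], integrand[:-1]))        # integrand at the left edge of each segment
    return np.cumsum(0.5 * (left + integrand) * dT)       # trapezoidal rule, S(T) at each grid point
```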

Appendix B. Energy and Probability of RBMs

In this Appendix, we review the derivations of the energy and probability of RBMs, which can be found in the standard machine learning literature [67]. The visible energy $E_\theta(\mathbf{v})$ is

$$\begin{aligned}
E_\theta(\mathbf{v}) &= -\ln \sum_{\mathbf{h}} e^{-E_\theta(\mathbf{v},\mathbf{h})} = -\ln p_\theta(\mathbf{v}) - \ln Z_\theta \\
&= -\ln \left[ e^{\sum_j^{n_v} b_j v_j} \sum_{\mathbf{h}} e^{\sum_i^{n_h} \left( \sum_j^{n_v} W_{ij} v_j + c_i \right) h_i} \right] \\
&= -\sum_j^{n_v} b_j v_j - \ln \sum_{h_1 = -1}^{+1} \sum_{h_2 = -1}^{+1} \cdots \sum_{h_{n_h} = -1}^{+1} \prod_{i=1}^{n_h} e^{\left( \sum_j^{n_v} W_{ij} v_j + c_i \right) h_i} \\
&= -\sum_j^{n_v} b_j v_j - \ln \prod_{i=1}^{n_h} \sum_{h_i = -1, +1} e^{\left( \sum_j^{n_v} W_{ij} v_j + c_i \right) h_i} \\
&= -\sum_j^{n_v} b_j v_j - \ln \prod_{i=1}^{n_h} \left( e^{-\sum_j^{n_v} W_{ij} v_j - c_i} + e^{\sum_j^{n_v} W_{ij} v_j + c_i} \right) \\
&= -\sum_j^{n_v} b_j v_j - \sum_{i=1}^{n_h} \ln \left( e^{-\sum_j^{n_v} W_{ij} v_j - c_i} + e^{\sum_j^{n_v} W_{ij} v_j + c_i} \right) \\
&= -\mathbf{b}^T\mathbf{v} - \sum_{i=1}^{n_h} \ln \left( e^{-\mathbf{w}_i^T\mathbf{v} - c_i} + e^{\mathbf{w}_i^T\mathbf{v} + c_i} \right).
\end{aligned}$$
The conditional probability is

$$p_\theta(\mathbf{h}|\mathbf{v}) = \frac{p_\theta(\mathbf{v},\mathbf{h})}{p_\theta(\mathbf{v})} = \frac{e^{-E_\theta(\mathbf{v},\mathbf{h})}}{e^{-E_\theta(\mathbf{v})}} = \frac{e^{\mathbf{b}^T\mathbf{v}}}{e^{-E_\theta(\mathbf{v})}}\, e^{\mathbf{c}^T\mathbf{h} + \mathbf{h}^T W \mathbf{v}} = \frac{1}{\Omega_\theta(\mathbf{v})}\, e^{\mathbf{c}^T\mathbf{h} + \mathbf{h}^T W \mathbf{v}},$$

where the $\mathbf{h}$-independent constant $\Omega_\theta(\mathbf{v}) = e^{-\mathbf{b}^T\mathbf{v} - E_\theta(\mathbf{v})} = \sum_{\mathbf{h}} e^{\mathbf{c}^T\mathbf{h} + \mathbf{h}^T W \mathbf{v}}$, such that $Z_\theta = \sum_{\mathbf{v}} \Omega_\theta(\mathbf{v})\, e^{\mathbf{b}^T\mathbf{v}}$. Therefore,

$$p_\theta(\mathbf{h}|\mathbf{v}) = \frac{1}{\Omega_\theta(\mathbf{v})}\, e^{\sum_{i=1}^{n_h} c_i h_i + \sum_{i=1}^{n_h} h_i \mathbf{w}_i^T\mathbf{v}} = \frac{1}{\Omega_\theta(\mathbf{v})}\, e^{\sum_{i=1}^{n_h} h_i \left( c_i + \mathbf{w}_i^T\mathbf{v} \right)} = \frac{1}{\Omega_\theta(\mathbf{v})} \prod_{i=1}^{n_h} e^{h_i \left( c_i + \mathbf{w}_i^T\mathbf{v} \right)} = \prod_{i=1}^{n_h} p_\theta(h_i|\mathbf{v}),$$

from which it can be recognized that $p_\theta(h_i|\mathbf{v}) \propto e^{h_i \left( c_i + \mathbf{w}_i^T\mathbf{v} \right)}$. The single-unit conditional probability is

$$p_\theta(h_i = +1|\mathbf{v}) = \frac{p_\theta(h_i = +1|\mathbf{v})}{p_\theta(h_i = +1|\mathbf{v}) + p_\theta(h_i = -1|\mathbf{v})} = \frac{e^{c_i + \mathbf{w}_i^T\mathbf{v}}}{e^{-c_i - \mathbf{w}_i^T\mathbf{v}} + e^{c_i + \mathbf{w}_i^T\mathbf{v}}} = \frac{1}{1 + e^{-2\left( c_i + \mathbf{w}_i^T\mathbf{v} \right)}} = \sigma\!\left( 2\left( c_i + \mathbf{w}_i^T\mathbf{v} \right) \right).$$

The other relations for $p_\theta(h_i = -1|\mathbf{v})$, $p_\theta(v_j = +1|\mathbf{h})$, and $p_\theta(v_j = -1|\mathbf{h})$ can be found similarly.

Appendix C. Maximum Likelihood Estimation and Gradient Descent of RBMs

In this Appendix, we review the gradient descent algorithm of RBMs derived from maximum likelihood estimation [67]. The likelihood function for a given dataset $D = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_M]^T$ is $P_\theta(D) = \prod_{m=1}^{M} p_\theta(\mathbf{v}_m)$, and maximizing the likelihood is equivalent to minimizing the negative log likelihood (or its average):

$$\theta^* = \arg\max_\theta \prod_{m=1}^{M} p_\theta(\mathbf{v}_m) = \arg\min_\theta \left[ -\sum_{m=1}^{M} \ln p_\theta(\mathbf{v}_m) \right] = \arg\min_\theta \left[ -\frac{1}{M} \sum_{m=1}^{M} \ln p_\theta(\mathbf{v}_m) \right] = \arg\min_\theta \left[ -\langle \ln p_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} \right] = \arg\min_\theta \mathcal{L}(\theta),$$

where $\mathbf{v} \sim p_D$ means to randomly draw $\mathbf{v}$ from $p_D$, and $\langle \cdot \rangle$ is the expectation value (with respect to that distribution). Alternatively, this can be considered as minimizing the Kullback–Leibler (KL) divergence

$$D_{\mathrm{KL}}(p_D \,\|\, p_\theta) = \sum_{m=1}^{M} p_D(\mathbf{v}_m) \ln\frac{p_D(\mathbf{v}_m)}{p_\theta(\mathbf{v}_m)} = \langle \ln p_D(\mathbf{v}) - \ln p_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} \geq 0$$

with respect to θ, where only the second term $-\langle \ln p_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D}$ depends on the parameters θ. In this work, we used $\mathcal{L}(\theta)$ as the loss function to train the RBMs.
It is sometimes useful to directly monitor the reconstruction error by comparing the input ($\mathbf{v}$) and reconstructed configurations ($\mathbf{v}'$) or, more quantitatively, by the (normalized) cross-entropy

$$\mathrm{CE} = -\frac{1}{n_v} \left\langle \sum_{j=1}^{n_v} \left[ \mathbb{1}_{v_j = +1} \ln p_\theta(v_j = +1 | \mathbf{h}) + \mathbb{1}_{v_j = -1} \ln p_\theta(v_j = -1 | \mathbf{h}) \right] \right\rangle_{\mathbf{v}\sim p_D},$$

where the indicator function $\mathbb{1}_A = 1$ if A is true and 0 if A is false, and $\mathbf{h} \sim p_\theta(\mathbf{h}|\mathbf{v})$ is the hidden encoding of the input configuration.
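A minimal sketch of this reconstruction cross-entropy for a single input configuration (array shapes as in the earlier sketches; not the released code) is:

```python
import numpy as np

def cross_entropy(v, W, b, c, rng):
    """Normalized CE between the input v and the reconstruction probabilities p(v_j | h)."""
    p_h_plus = 1.0 / (1.0 + np.exp(-2.0 * (c + W @ v)))       # p(h_i = +1 | v)
    h = np.where(rng.random(len(c)) < p_h_plus, 1, -1)        # hidden encoding of the input
    p_v_plus = 1.0 / (1.0 + np.exp(-2.0 * (b + W.T @ h)))     # p(v_j = +1 | h)
    p_observed = np.where(v == 1, p_v_plus, 1.0 - p_v_plus)   # probability of the actual v_j
    return -np.mean(np.log(p_observed))
```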
The gradient of the loss function is

$$\nabla_\theta \mathcal{L}(\theta) = \nabla_\theta \langle E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} + \nabla_\theta \ln Z_\theta = \langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} + \nabla_\theta \ln Z_\theta,$$

where

$$\nabla_\theta \ln Z_\theta = \frac{\nabla_\theta Z_\theta}{Z_\theta} = \frac{\nabla_\theta \sum_{\mathbf{v}} e^{-E_\theta(\mathbf{v})}}{Z_\theta} = \frac{\sum_{\mathbf{v}} \nabla_\theta e^{-E_\theta(\mathbf{v})}}{Z_\theta} = -\frac{\sum_{\mathbf{v}} e^{-E_\theta(\mathbf{v})} \nabla_\theta E_\theta(\mathbf{v})}{Z_\theta} = -\sum_{\mathbf{v}} p_\theta(\mathbf{v}) \nabla_\theta E_\theta(\mathbf{v}) = -\langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_\theta}.$$

Furthermore,

$$\nabla_\theta \mathcal{L}(\theta) = \langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} - \langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_\theta} = \text{(positive phase)} + \text{(negative phase)} = \text{(data term)} + \text{(model term)}.$$

In both the positive and negative phases,

$$\nabla_\theta E_\theta(\mathbf{v}) = \nabla_\theta \left[ -\mathbf{b}^T\mathbf{v} - \sum_{i=1}^{n_h} \ln\left( e^{-\mathbf{w}_i^T\mathbf{v} - c_i} + e^{\mathbf{w}_i^T\mathbf{v} + c_i} \right) \right],$$

which has components

$$\begin{aligned}
\frac{\partial E_\theta(\mathbf{v})}{\partial W_{ij}} &= -\frac{-v_j\, e^{-\mathbf{w}_i^T\mathbf{v} - c_i} + v_j\, e^{\mathbf{w}_i^T\mathbf{v} + c_i}}{e^{-\mathbf{w}_i^T\mathbf{v} - c_i} + e^{\mathbf{w}_i^T\mathbf{v} + c_i}} = -v_j \tanh\left( \mathbf{w}_i^T\mathbf{v} + c_i \right) \\
&= -v_j \left[ (-1)\, p_\theta(h_i = -1|\mathbf{v}) + (+1)\, p_\theta(h_i = +1|\mathbf{v}) \right] = -v_j\, \langle h_i \rangle_{h_i \sim p_\theta(h_i|\mathbf{v})}, \\
\frac{\partial E_\theta(\mathbf{v})}{\partial c_i} &= -\frac{-e^{-\mathbf{w}_i^T\mathbf{v} - c_i} + e^{\mathbf{w}_i^T\mathbf{v} + c_i}}{e^{-\mathbf{w}_i^T\mathbf{v} - c_i} + e^{\mathbf{w}_i^T\mathbf{v} + c_i}} = -\tanh\left( \mathbf{w}_i^T\mathbf{v} + c_i \right) = -\langle h_i \rangle_{h_i \sim p_\theta(h_i|\mathbf{v})}, \\
\frac{\partial E_\theta(\mathbf{v})}{\partial b_j} &= -v_j.
\end{aligned}$$
To evaluate the expectation value $\langle \nabla_\theta E_\theta(\mathbf{v}) \rangle$, in the positive phase $\mathbf{v}$ can be directly drawn from the dataset, while in the negative phase $\mathbf{v}$ must be sampled from the model distribution $p_\theta(\mathbf{v})$. In practice, as an approximation, the Markov chain Monte Carlo (MCMC) method is used to generate $\mathbf{v}$ states that obey the distribution $p_\theta(\mathbf{v})$, such that

$$\langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_\theta} \approx \frac{1}{\text{sample size}} \sum_{\mathbf{v}\sim p_\theta} \nabla_\theta E_\theta(\mathbf{v}).$$

Using the conditional probabilities $p_\theta(\mathbf{h}|\mathbf{v})$ and $p_\theta(\mathbf{v}|\mathbf{h})$, we can generate a sequence of states

$$\mathbf{v}^{(0)} \rightarrow \mathbf{h}^{(0)} \rightarrow \mathbf{v}^{(1)} \rightarrow \mathbf{h}^{(1)} \rightarrow \cdots \rightarrow \mathbf{v}^{(t)} \rightarrow \mathbf{h}^{(t)} \rightarrow \cdots.$$

As $t \to \infty$, the MCMC converges with $(\mathbf{v}^{(t)}, \mathbf{h}^{(t)}) \sim p_\theta(\mathbf{v}, \mathbf{h})$ and $\mathbf{v}^{(t)} \sim p_\theta(\mathbf{v})$.
The Markov chain starting from a random $\mathbf{v}^{(0)}$ takes many steps to equilibrate. There are two ways to speed up the sampling [68]:
  • k-step contrastive divergence (CD-k): for each parameter update, draw $\mathbf{v}^{(0)}$ (or a minibatch) from the training data $D = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_M]^T$ and run Gibbs sampling for k steps. Even CD-1 can work reasonably well.
  • Persistent contrastive divergence (PCD-k): keep the same Markov chain during the entire training process. For each parameter update, run this persistent chain for another k steps to collect $\mathbf{v}$ states.

References

  1. Carleo, G.; Cirac, I.; Cranmer, K.; Daudet, L.; Schuld, M.; Tishby, N.; Vogt-Maranto, L.; Zdeborová, L. Machine learning and the physical sciences. Rev. Mod. Phys. 2019, 91, 045002. [Google Scholar] [CrossRef] [Green Version]
  2. Bahri, Y.; Kadmon, J.; Pennington, J.; Schoenholz, S.S.; Sohl-Dickstein, J.; Ganguli, S. Statistical mechanics of deep learning. Annu. Rev. Condens. Matter Phys. 2020, 11, 501. [Google Scholar] [CrossRef] [Green Version]
  3. Lin, H.W.; Tegmark, M.; Rolnick, D. Why does deep and cheap learning work so well? J. Stat. Phys. 2017, 168, 1223–1247. [Google Scholar] [CrossRef] [Green Version]
  4. Ballard, A.J.; Das, R.; Martiniani, S.; Mehta, D.; Sagun, L.; Stevenson, J.D.; Wales, D.J. Energy landscapes for machine learning. Phys. Chem. Chem. Phys. 2017, 19, 12585–12603. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Zhang, Y.; Saxe, A.M.; Advani, M.S.; Lee, A.A. Energy–entropy competition and the effectiveness of stochastic gradient descent in machine learning. Mol. Phys. 2018, 116, 3214–3223. [Google Scholar] [CrossRef] [Green Version]
  6. Baity-Jesi, M.; Sagun, L.; Geiger, M.; Spigler, S.; Arous, G.B.; Cammarota, C.; LeCun, Y.; Wyart, M.; Biroli, G. Comparing dynamics: Deep neural networks versus glassy systems. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2018; pp. 314–323. [Google Scholar]
  7. Geiger, M.; Spigler, S.; d’Ascoli, S.; Sagun, L.; Baity-Jesi, M.; Biroli, G.; Wyart, M. Jamming transition as a paradigm to understand the loss landscape of deep neural networks. Phys. Rev. E 2019, 100, 012115. [Google Scholar] [CrossRef] [Green Version]
  8. Feng, Y.; Tu, Y. The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima. Proc. Natl. Acad. Sci. USA 2021, 118, e2015617118. [Google Scholar] [CrossRef]
  9. Roberts, D.A.; Yaida, S.; Hanin, B. The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks; Cambridge University Press: New York, NY, USA, 2022. [Google Scholar]
  10. Zdeborová, L.; Krzakala, F. Statistical physics of inference: Thresholds and algorithms. Adv. Phys. 2016, 65, 453–552. [Google Scholar] [CrossRef] [Green Version]
  11. Behler, J.; Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 2007, 98, 146401. [Google Scholar] [CrossRef]
  12. Carrasquilla, J.; Melko, R.G. Machine learning phases of matter. Nat. Phys. 2017, 13, 431–434. [Google Scholar] [CrossRef]
  13. Tibaldi, S.; Magnifico, G.; Vodola, D.; Ercolessi, E. Unsupervised and supervised learning of interacting topological phases from single-particle correlation functions. arXiv 2022, arXiv:2202.09281. [Google Scholar]
  14. Bapst, V.; Keck, T.; Grabska-Barwińska, A.; Donner, C.; Cubuk, E.D.; Schoenholz, S.S.; Obika, A.; Nelson, A.W.; Back, T.; Hassabis, D.; et al. Unveiling the predictive power of static structure in glassy systems. Nat. Phys. 2020, 16, 448–454. [Google Scholar] [CrossRef]
  15. Iten, R.; Metger, T.; Wilming, H.; del Rio, L.; Renner, R. Discovering physical concepts with neural networks. Phys. Rev. Lett. 2020, 124, 010508. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. Bedolla, E.; Padierna, L.C.; Castaneda-Priego, R. Machine learning for condensed matter physics. J. Phys. Condens. Matter 2020, 33, 053001. [Google Scholar] [CrossRef]
  17. Cichos, F.; Gustavsson, K.; Mehlig, B.; Volpe, G. Machine learning for active matter. Nat. Mach. Intell. 2020, 2, 94–103. [Google Scholar] [CrossRef] [Green Version]
  18. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [Green Version]
  19. Smolensky, P. Information processing in dynamical systems: Foundations of harmony theory. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition; MIT Press: Cambridge, MA, USA, 1986; pp. 194–281. [Google Scholar]
  20. Sherrington, D.; Kirkpatrick, S. Solvable model of a spin-glass. Phys. Rev. Lett. 1975, 35, 1792. [Google Scholar] [CrossRef]
  21. Ackley, D.H.; Hinton, G.E.; Sejnowski, T.J. A learning algorithm for Boltzmann machines. Cogn. Sci. 1985, 9, 147–169. [Google Scholar] [CrossRef]
  22. Cocco, S.; Monasson, R. Adaptive cluster expansion for inferring Boltzmann machines with noisy data. Phys. Rev. Lett. 2011, 106, 090601. [Google Scholar] [CrossRef] [Green Version]
  23. Aurell, E.; Ekeberg, M. Inverse Ising inference using all the data. Phys. Rev. Lett. 2012, 108, 090201. [Google Scholar] [CrossRef] [Green Version]
  24. Nguyen, H.C.; Zecchina, R.; Berg, J. Inverse statistical problems: From the inverse Ising problem to data science. Adv. Phys. 2017, 66, 197–261. [Google Scholar] [CrossRef]
  25. Huang, L.; Wang, L. Accelerated Monte Carlo simulations with restricted Boltzmann machines. Phys. Rev. B 2017, 95, 035105. [Google Scholar] [CrossRef] [Green Version]
  26. Carleo, G.; Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 2017, 355, 602–606. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  27. Melko, R.G.; Carleo, G.; Carrasquilla, J.; Cirac, J.I. Restricted Boltzmann machines in quantum physics. Nat. Phys. 2019, 15, 887–892. [Google Scholar] [CrossRef]
  28. Yu, W.; Liu, Y.; Chen, Y.; Jiang, Y.; Chen, J.Z. Generating the conformational properties of a polymer by the restricted Boltzmann machine. J. Chem. Phys. 2019, 151, 031101. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  29. Mehta, P.; Schwab, D.J. An exact mapping between the variational renormalization group and deep learning. arXiv 2014, arXiv:1410.3831. [Google Scholar]
  30. Chen, J.; Cheng, S.; Xie, H.; Wang, L.; Xiang, T. Equivalence of restricted Boltzmann machines and tensor network states. Phys. Rev. B 2018, 97, 085104. [Google Scholar] [CrossRef] [Green Version]
  31. Salazar, D.S. Nonequilibrium thermodynamics of restricted Boltzmann machines. Phys. Rev. E 2017, 96, 022131. [Google Scholar] [CrossRef] [Green Version]
  32. Decelle, A.; Fissore, G.; Furtlehner, C. Thermodynamics of restricted Boltzmann machines and related learning dynamics. J. Stat. Phys. 2018, 172, 1576–1608. [Google Scholar] [CrossRef] [Green Version]
  33. Decelle, A.; Furtlehner, C. Restricted Boltzmann machine: Recent advances and mean-field theory. Chin. Phys. B 2021, 30, 040202. [Google Scholar] [CrossRef]
  34. LeCun, Y. A path towards autonomous machine intelligence. Openreview 2022. Available online: https://openreview.net/forum?id=BZ5a1r-kVsf (accessed on 17 October 2022).
  35. Torlai, G.; Melko, R.G. Learning thermodynamics with Boltzmann machines. Phys. Rev. B 2016, 94, 165134. [Google Scholar] [CrossRef] [Green Version]
  36. Morningstar, A.; Melko, R.G. Deep Learning the Ising Model Near Criticality. J. Mach. Learn. Res. 2018, 18, 1–17. [Google Scholar]
  37. D’Angelo, F.; Böttcher, L. Learning the Ising model with generative neural networks. Phys. Rev. Res. 2020, 2, 023266. [Google Scholar] [CrossRef]
  38. Iso, S.; Shiba, S.; Yokoo, S. Scale-invariant feature extraction of neural network and renormalization group flow. Phys. Rev. E 2018, 97, 053304. [Google Scholar] [CrossRef] [Green Version]
  39. Funai, S.S.; Giataganas, D. Thermodynamics and feature extraction by machine learning. Phys. Rev. Res. 2020, 2, 033415. [Google Scholar] [CrossRef]
  40. Koch, E.D.M.; Koch, R.D.M.; Cheng, L. Is deep learning a renormalization group flow? IEEE Access 2020, 8, 106487–106505. [Google Scholar] [CrossRef]
  41. Veiga, R.; Vicente, R. Restricted Boltzmann Machine Flows and The Critical Temperature of Ising models. arXiv 2020, arXiv:2006.10176. [Google Scholar]
  42. Funai, S.S. Feature extraction of machine learning and phase transition point of Ising model. arXiv 2021, arXiv:2111.11166. [Google Scholar]
  43. Wang, L. Discovering phase transitions with unsupervised learning. Phys. Rev. B 2016, 94, 195105. [Google Scholar] [CrossRef] [Green Version]
  44. Hu, W.; Singh, R.R.; Scalettar, R.T. Discovering phases, phase transitions, and crossovers through unsupervised machine learning: A critical examination. Phys. Rev. E 2017, 95, 062122. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  45. Wetzel, S.J. Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders. Phys. Rev. E 2017, 96, 022140. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  46. Tanaka, A.; Tomiya, A. Detection of phase transition via convolutional neural networks. J. Phys. Soc. Jpn. 2017, 86, 063001. [Google Scholar] [CrossRef] [Green Version]
  47. Kashiwa, K.; Kikuchi, Y.; Tomiya, A. Phase transition encoded in neural network. Prog. Theor. Exp. Phys. 2019, 2019, 083A04. [Google Scholar] [CrossRef] [Green Version]
  48. Cipra, B.A. An introduction to the Ising model. Am. Math. Mon. 1987, 94, 937–959. [Google Scholar] [CrossRef]
  49. Newman, M.E.J.; Barkema, G.T. Monte Carlo Methods in Statistical Physics; Oxford University: Oxford, UK, 1999. [Google Scholar]
  50. Kramers, H.A.; Wannier, G.H. Statistics of the Two-Dimensional Ferromagnet: Part I. Phys. Rev. 1941, 60, 252–262. [Google Scholar] [CrossRef]
  51. Onsager, L. Crystal statistics. I. A two-dimensional model with an order-disorder transition. Phys. Rev. 1944, 65, 117. [Google Scholar] [CrossRef]
  52. Yang, C.N. The Spontaneous Magnetization of a Two-Dimensional Ising Model. Phys. Rev. 1952, 85, 808–816. [Google Scholar] [CrossRef]
  53. Plischke, M.; Bergersen, B. Equilibrium Statistical Physics; World Scientific: Singapore, 1994. [Google Scholar]
  54. Landau, D.; Binder, K. A Guide to Monte Carlo Simulations in Statistical Physics; Cambridge University Press: New York, NY, USA, 2021. [Google Scholar]
  55. Fischer, A.; Igel, C. An introduction to restricted Boltzmann machines. In Proceedings of the Iberoamerican Congress on Pattern Recognition, Havana, Cuba, 28–31 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 14–36. [Google Scholar]
  56. Oh, S.; Baggag, A.; Nha, H. Entropy, free energy, and work of restricted boltzmann machines. Entropy 2020, 22, 538. [Google Scholar] [CrossRef]
  57. Huang, H.; Toyoizumi, T. Advanced mean-field theory of the restricted Boltzmann machine. Phys. Rev. E 2015, 91, 050101(R). [Google Scholar] [CrossRef] [Green Version]
  58. Cossu, G.; Del Debbio, L.; Giani, T.; Khamseh, A.; Wilson, M. Machine learning determination of dynamical parameters: The Ising model case. Phys. Rev. B 2019, 100, 064304. [Google Scholar] [CrossRef] [Green Version]
  59. Besag, J. Statistical analysis of non-lattice data. J. R. Stat. Soc. Ser. D 1975, 24, 179–195. [Google Scholar] [CrossRef] [Green Version]
  60. LISA. Deep Learning Tutorials. 2018. Available online: https://github.com/lisa-lab/DeepLearningTutorials (accessed on 1 August 2022).
  61. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, Sardinia, Italy, 13–15 May 2010; pp. 249–256. [Google Scholar]
  62. Rao, W.J.; Li, Z.; Zhu, Q.; Luo, M.; Wan, X. Identifying product order with restricted Boltzmann machines. Phys. Rev. B 2018, 97, 094207. [Google Scholar] [CrossRef] [Green Version]
  63. Wu, D.; Wang, L.; Zhang, P. Solving statistical mechanics using variational autoregressive networks. Phys. Rev. Lett. 2019, 122, 080602. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  64. Nicoli, K.A.; Nakajima, S.; Strodthoff, N.; Samek, W.; Müller, K.R.; Kessel, P. Asymptotically unbiased estimation of physical observables with neural samplers. Phys. Rev. E 2020, 101, 023304. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  65. Yevick, D.; Melko, R. The accuracy of restricted Boltzmann machine models of Ising systems. Comput. Phys. Commun. 2021, 258, 107518. [Google Scholar] [CrossRef]
  66. Ferrenberg, A.M.; Landau, D.P. Critical behavior of the three-dimensional Ising model: A high-resolution Monte Carlo study. Phys. Rev. B 1991, 44, 5081. [Google Scholar] [CrossRef] [PubMed]
  67. Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Boston, MA, USA, 2012. [Google Scholar]
  68. Hinton, G.E. A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 2012; pp. 599–619. [Google Scholar]
Figure 1. A restricted Boltzmann machine (RBM) with $n_h = 6$ hidden units and $n_v = 9$ visible units. Model parameters $\theta = \{W, \mathbf{b}, \mathbf{c}\}$ are represented by connections. The filter $\mathbf{w}_1^T$ from the visible units to the first hidden unit is highlighted by red (light color) connections.
Figure 2. Probability density function (PDF) of the distributions of (a) $W_{ij}$, (b) $b_j$, and (c) $c_i$ of T-RBMs with $n_h = 400$ hidden units for the 2d Ising model at temperatures below, close to, and above $T_c$.
Figure 3. T-RBMs with n_h = 400 for the 2d Ising model at temperatures T = 1.0, 2.25, and 3.5. (a) Five sample filters w_i^T at each temperature. The color bar range is set to within about two standard deviations of the distribution. (b) PDF of the distribution of the n_h = 400 filter sums (normalized by n_v). Inset: variance ⟨(∑_{j=1}^{n_v} W_ij)²⟩ − ⟨∑_{j=1}^{n_v} W_ij⟩² of the filter sum as a function of temperature. (c) PDF of the distribution of the n_v = 4096 inverse filter sums (normalized by n_h). Inset: correlation between a pair of inverse filters w_{:,j} and w_{:,j′} (normalized by the auto-correlation) as a function of spin–spin distance r_{jj′}.
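The filter-sum statistics of Figure 3b,c follow directly from the learned weight matrix. The minimal numpy sketch below shows one way to compute them; the array shape convention (W of shape n_h × n_v, so that rows are filters and columns are inverse filters) and the normalizations by n_v and n_h are assumptions based on the caption, not the authors' code.

```python
import numpy as np

def filter_statistics(W):
    """Filter sums, inverse filter sums, and the filter-sum variance.

    W is assumed to be the learned weight matrix of shape (n_h, n_v),
    so row i is the filter w_i^T and column j is the inverse filter w_{:,j}.
    """
    n_h, n_v = W.shape
    filter_sums = W.sum(axis=1) / n_v          # n_h values, normalized by n_v
    inverse_filter_sums = W.sum(axis=0) / n_h  # n_v values, normalized by n_h
    # Variance of the (unnormalized) filter sum over the n_h filters:
    # <s^2> - <s>^2 with s_i = sum_j W_ij
    s = W.sum(axis=1)
    variance = np.mean(s ** 2) - np.mean(s) ** 2
    return filter_sums, inverse_filter_sums, variance
```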
Figure 4. The T -RBM with n_h = 400 for the 2d Ising model. (a) Four sample filters w_i^T. (b) PDF of the distribution of W_ij, b_j, and c_i. (c) PDF of the distribution of the n_h = 400 filter sums and the n_v = 4096 inverse filter sums.
Figure 5. Histogram of m_h obtained from the hidden encodings of M = 50,000 2d Ising configurations at T = 2.25 using T-RBMs with various n_h. Inset: examples of the hidden layer of T-RBMs with n_h = 400 wrapped into a 20 × 20 matrix at three temperatures, where +1/−1 units are represented by black/white pixels.
Figure 6. (a) Internal energy, (b) magnetization, and (c) specific heat of 2d Ising states reconstructed by T-RBMs (n_h = 400) with the hidden layer h^(0) initialized according to p_h = (|m| + 1)/2 or p_h = 1.0 (T ≤ 2.0), 0.5 (T ≥ 2.5), 0.75 (2.0 < T < 2.5) (stepwise). Reconstruction by a seven-step Markov chain from a random h^(0) is shown for comparison (v^(7)). Analytical and Monte Carlo simulation results are also shown.
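The reconstruction procedure behind Figure 6 can be pictured as initializing the hidden layer from a Bernoulli probability p_h and then alternately sampling visible and hidden spins from the RBM conditionals. The sketch below is a minimal illustration with ±1 units; the conditional p(v_j = +1 | h) = σ(2(∑_i W_ij h_i + b_j)) assumes the standard bilinear RBM energy with ±1 units, and the step count, update order, and names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct(W, b, c, p_h, n_steps=1, rng=None):
    """Sample a visible configuration from an RBM with +/-1 units.

    Assumes the energy E = -h.W.v - b.v - c.h, so that
    p(v_j=+1|h) = sigmoid(2*(W^T h + b)_j) and p(h_i=+1|v) = sigmoid(2*(W v + c)_i).
    p_h is the probability that each hidden unit is initialized to +1.
    """
    rng = rng or np.random.default_rng()
    n_h, n_v = W.shape
    h = np.where(rng.random(n_h) < p_h, 1.0, -1.0)   # h^(0)
    for _ in range(n_steps):                          # block Gibbs updates
        v = np.where(rng.random(n_v) < sigmoid(2.0 * (W.T @ h + b)), 1.0, -1.0)
        h = np.where(rng.random(n_h) < sigmoid(2.0 * (W @ v + c)), 1.0, -1.0)
    return v

# e.g., magnetization-based initialization p_h = (|m| + 1)/2 for a target |m|,
# or n_steps = 7 with p_h = 0.5 for the seven-step chain from a random h^(0)
```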
Figure 7. Mean and variance of the visible energy E_θ as a function of temperature for the 2d (a,c,d,f) and 3d (b,e) Ising models captured by T-RBMs (a,b,d,e) and the T -RBM (c,f) with various numbers n_h of hidden units. Three approximate forms of the visible energy for n_h = 400 T-RBMs are shown in (a).
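For reference, the visible energy E_θ plotted in Figure 7 is obtained by tracing out the hidden layer. The sketch below gives the standard marginalized expression for an RBM with ±1 hidden units, E_θ(v) = −b·v − ∑_i log 2cosh((W v + c)_i); this form and the shapes are assumptions meant only to indicate how the mean and variance over a sample of configurations would be computed, not to reproduce the paper's three approximate forms.

```python
import numpy as np

def visible_energy(W, b, c, V):
    """Visible energy E_theta(v) with the hidden layer traced out.

    Assumes +/-1 hidden units and the bilinear energy used above, giving
    E_theta(v) = -b.v - sum_i log(2*cosh((W v + c)_i)).
    V holds configurations with shape (M, n_v); returns M energies.
    """
    A = V @ W.T + c                                   # shape (M, n_h)
    return -V @ b - np.sum(np.log(2.0 * np.cosh(A)), axis=1)

# Mean and variance over a sample of configurations, as in Figure 7:
# E = visible_energy(W, b, c, V)
# E_mean, E_var = E.mean(), E.var()
```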
Figure 8. Pseudo-likelihood L̃ per spin of T-RBMs (a,b) and of the T -RBM (c) with different numbers n_h of hidden units for the 2d (a,c) and 3d (b) Ising models, in comparison with the entropy S and the pseudo-entropy S̃ per spin. Dashed lines are polynomial fits around T_c.
Figure 9. T dL̃/dT per spin of T-RBMs (a,b) and of the T -RBM (c) with different numbers n_h of hidden units for the 2d (a,c) and 3d (b) Ising models, in comparison with T dS/dT and T dS̃/dT per spin, as well as the specific heat c_V calculated from Monte Carlo simulation.
Table 1. When sum(w_i^T) > 0, a visible layer pattern v with magnetization m > 0 (or m < 0) is more likely to be encoded by a hidden unit h_i = +1 (or h_i = −1). When sum(w_i^T) < 0, the encoding is reversed.
         sum(w_i^T) > 0    sum(w_i^T) < 0
m > 0    h_i = +1          h_i = −1
m < 0    h_i = −1          h_i = +1
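The encoding rule of Table 1 can be checked directly from a trained weight matrix. The minimal sketch below assumes ±1 units with p(h_i = +1 | v) = σ(2(∑_j W_ij v_j + c_i)), so that the most likely hidden sign is sign((W v + c)_i); the function name and the example configuration are illustrative only.

```python
import numpy as np

def expected_hidden_signs(W, c, v):
    """Most likely sign of each hidden unit given a visible configuration v.

    Assumes +/-1 units and p(h_i=+1|v) = sigmoid(2*(W v + c)_i), so the most
    likely sign is sign((W v + c)_i).  For a strongly magnetized v (m > 0),
    hidden units with sum(w_i^T) > 0 tend to be +1 and those with
    sum(w_i^T) < 0 tend to be -1, reproducing Table 1.
    """
    return np.sign(W @ v + c)

# Example: an all-up visible configuration (m = +1)
# n_h, n_v = W.shape
# signs = expected_hidden_signs(W, c, np.ones(n_v))
# agreement = np.mean(signs == np.sign(W.sum(axis=1)))  # close to 1 if Table 1 holds
```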
Table 2. T_c estimated from the peak of T dL̃/dT obtained from single-temperature T-RBMs and from the all-temperature T -RBM with different numbers n_h of hidden units. Predictions from the numerical derivatives T dS/dT and T dS̃/dT are also shown for comparison. Results extracted from the peak of c_V obtained by Monte Carlo simulations of finite systems are listed under "MC". "Exact" refers to analytical or numerical results for infinite systems.
Model        n_h = 400   n_h = 900   n_h = 1600   n_h = 2500   S       S̃       MC     Exact
2d T-RBM     2.240       2.291       2.316        2.367        2.267   2.367   2.28   2.269
2d T -RBM    2.189       2.163       2.214        -            2.267   2.367   2.28   2.269
3d T-RBM     4.444       4.434       4.444        -            4.390   4.383   4.44   4.511
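A hedged sketch of how estimates like those in Table 2 can be obtained is given below: evaluate the pseudo-likelihood L̃ (or the entropy S) per spin on a temperature grid, form T dL̃/dT by finite differences, and take the temperature at the peak as the T_c estimate. The grid, window, and polynomial smoothing are illustrative assumptions, not the authors' exact procedure (the paper reports polynomial fits around T_c).

```python
import numpy as np

def tc_from_peak(temperatures, L_tilde, poly_degree=4):
    """Estimate T_c as the location of the peak of T * dL~/dT.

    temperatures : 1d array of T values, assumed sorted in increasing order.
    L_tilde      : pseudo-likelihood (or entropy) per spin at each T.
    A low-order polynomial fit in a window around the raw peak smooths the
    finite-difference derivative before the maximum is located.
    """
    T = np.asarray(temperatures, dtype=float)
    dL_dT = np.gradient(np.asarray(L_tilde, dtype=float), T)
    signal = T * dL_dT
    i = int(np.argmax(signal))                    # raw peak
    lo, hi = max(0, i - 3), min(len(T), i + 4)    # local fitting window
    coeffs = np.polyfit(T[lo:hi], signal[lo:hi], deg=min(poly_degree, hi - lo - 1))
    T_fine = np.linspace(T[lo], T[hi - 1], 1000)
    return T_fine[np.argmax(np.polyval(coeffs, T_fine))]
```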