Article

Thermodynamics of the Ising Model Encoded in Restricted Boltzmann Machines

1 Division of Natural and Applied Sciences, Duke Kunshan University, Kunshan 215300, China
2 Data Science Research Center (DSRC), Duke Kunshan University, Kunshan 215300, China
* Author to whom correspondence should be addressed.
Entropy 2022, 24(12), 1701; https://doi.org/10.3390/e24121701
Submission received: 18 October 2022 / Revised: 13 November 2022 / Accepted: 17 November 2022 / Published: 22 November 2022
(This article belongs to the Section Statistical Physics)

Abstract

The restricted Boltzmann machine (RBM) is a two-layer energy-based model that uses its hidden–visible connections to learn the underlying distribution of visible units, whose interactions are often complicated by high-order correlations. Previous studies on the Ising model of small system sizes have shown that RBMs are able to accurately learn the Boltzmann distribution and reconstruct thermal quantities at temperatures away from the critical point T c . How the RBM encodes the Boltzmann distribution and captures the phase transition are, however, not well explained. In this work, we perform RBM learning of the 2 d and 3 d Ising model and carefully examine how the RBM extracts useful probabilistic and physical information from Ising configurations. We find several indicators derived from the weight matrix that could characterize the Ising phase transition. We verify that the hidden encoding of a visible state tends to have an equal number of positive and negative units, whose sequence is randomly assigned during training and can be inferred by analyzing the weight matrix. We also explore the physical meaning of the visible energy and loss function (pseudo-likelihood) of the RBM and show that they could be harnessed to predict the critical point or estimate physical quantities such as entropy.

1. Introduction

The tremendous success of deep learning in multiple areas over the last decade has revived the interplay between physics and machine learning, in particular neural networks [1]. On the one hand, (statistical) physics ideas [2], such as the renormalization group (RG) [3], the energy landscape [4], free energy [5], glassy dynamics [6], jamming [7], Langevin dynamics [8], and field theory [9], shed light on the interpretation of deep learning and statistical inference in general [10]. On the other hand, machine learning and deep learning tools are harnessed to solve a wide range of physics problems, such as interaction potential construction [11], phase transition detection [12,13], structure encoding [14], the discovery of physical concepts [15], and many others [16,17]. At the very intersection of these two fields lies the restricted Boltzmann machine (RBM) [18], which serves as a classical paradigm to investigate how an overarching perspective could benefit both sides.
The RBM uses hidden–visible connections to encode (high-order) correlations between visible units [19]. Its precursor—the (unrestricted) Boltzmann machine—was inspired by spin glasses [20,21] and is often used in the inverse Ising problem to infer physical parameters [22,23,24]. The restriction of hidden–hidden and visible–visible connections in RBMs allows for more efficient training algorithms and, therefore, leads to recent applications in Monte Carlo simulation acceleration [25], quantum wavefunction representation [26,27], and polymer configuration generation [28]. Deep neural networks formed by stacks of RBMs have been mapped onto the variational RG due to their conceptual similarity [29]. RBMs are also shown to be equivalent to tensor network states from quantum many-body physics [30] and interpretable in light of statistical thermodynamics [31,32,33]. As simple as it seems, energy-based models like the RBM could eventually become the building blocks of autonomous machine intelligence [34].
Besides the above-mentioned efforts, the RBM has also been applied extensively to the minimal model of a second-order phase transition, the Ising model. For the small systems under investigation, it was found that RBMs with enough hidden units can encode the Boltzmann distribution, reconstruct thermal quantities, and generate new Ising configurations fairly well [35,36,37]. The visible → hidden → visible ⋯ generating sequence of the RBM can be mapped onto an RG flow in physical temperature (often towards the critical point) [38,39,40,41,42]. However, the mechanism and power of the RBM to capture physical concepts and principles have not been fully explored. First, in what way is the Boltzmann distribution of the Ising model learned by the RBM? Second, can the RBM learn and even quantitatively predict the phase transition without extra human knowledge? An affirmative answer to the second question is particularly appealing, because simple unsupervised learning methods such as principal component analysis (PCA) using configuration information alone do not provide a quantitative prediction of the transition temperature [43,44,45], and supervised learning with neural networks requires human labeling of the phase type or temperature of a given configuration [46,47].
In this article, we report a detailed numerical study on RBM learning of the Ising model with a system size much larger than those used previously. The purpose is to thoroughly dissect the various parts of the RBM and reveal how each part contributes to the learning of the Boltzmann distribution of the input Ising configurations. Such understanding allows us to extract several useful machine learning estimators or predictors for physical quantities, such as entropy and phase transition temperature. Conversely, the analysis of a physical model helps us to obtain important insights about the meaning of RBM parameters and functions, such as the weight matrix, visible energy, and pseudo-likelihood. Below, we first introduce our Ising datasets and the RBM and its training protocols in Section 2. We then report and discuss the results of the model parameters, hidden layers, visible energy, and pseudo-likelihood in Section 3. After the conclusion, more details about the Ising model and the RBM are provided in the Appendix A, Appendix B and Appendix C. Sample codes of the RBM are shared on GitHub at https://github.com/Jing-DS/isingrbm (accessed on 18 November 2022).

2. Models and Methods

2.1. Dataset of Ising Configurations Generated by Monte Carlo Simulations

The Hamiltonian of the Ising model with $N = L^d$ spins in a configuration $\mathbf{s} = [s_1, s_2, \dots, s_N]^T$ on a d-dimensional hypercubic lattice of linear dimension L in the absence of a magnetic field is

$$H(\mathbf{s}) = -J \sum_{\langle i,j \rangle} s_i s_j,$$

where the spin variable $s_i = \pm 1$ ($i = 1, 2, \dots, N$), the coupling parameter $J > 0$ (set to unity) favors ferromagnetic configurations (parallel spins), and the notation $\langle i,j \rangle$ denotes a sum over nearest neighbors [48]. At a given temperature T, a configuration $\mathbf{s}$ drawn from the sample space of $2^N$ states follows the Boltzmann distribution

$$p_T(\mathbf{s}) = \frac{e^{-H(\mathbf{s})/k_B T}}{Z_T},$$

where $Z_T = \sum_{\mathbf{s}} e^{-H(\mathbf{s})/k_B T}$ is the partition function. The Boltzmann constant $k_B$ is set to unity.
Using single-flip Monte Carlo simulations under periodic boundary conditions [49], we generate Ising configurations for two-dimensional (2d) systems ($d = 2$) of $L = 64$ ($N = 4096$) at $n_T = 16$ temperatures T = 0.25, 0.5, 0.75, 1.0, …, 4.0 (in units of $J/k_B$) and for three-dimensional (3d) systems ($d = 3$) of $L = 16$ ($N = 4096$) at $n_T = 20$ temperatures T = 2.5, 2.75, 3.0, 3.25, 3.5, 3.75, 4.0, 4.25, 4.3, 4.4, 4.5, 4.6, 4.7, 4.75, 5.0, 5.25, 5.5, 5.75, 6.0, 6.25. After being fully equilibrated, M = 50,000 configurations at each T are collected into a dataset $D_T$ for that T. For 2d systems, we also use an all-temperature dataset consisting of 50,000 configurations per temperature, pooled over all Ts.
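The dataset generation can be illustrated with a short sketch. The following is a minimal single-flip Metropolis implementation for the 2d case (with J = k_B = 1 and periodic boundary conditions); it is an illustrative example under these assumptions, not the code released with this article, and it runs much slower than an optimized simulation.

```python
import numpy as np

def metropolis_sweep(spins, T, rng):
    """One Monte Carlo sweep: N attempted single-spin flips at temperature T (J = k_B = 1)."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        # Sum over the four nearest neighbours with periodic boundary conditions.
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nn          # energy change if spin (i, j) is flipped
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i, j] *= -1

rng = np.random.default_rng(0)
L, T = 64, 2.25
spins = rng.choice([-1, 1], size=(L, L))
for _ in range(2000):                        # equilibration sweeps; collect samples afterwards
    metropolis_sweep(spins, T, rng)
```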
Analytical results of the thermal quantities of the 2d Ising model, such as the internal energy $\langle E \rangle$, the (physical) entropy S, the heat capacity $C_V$, and the magnetization $\langle m \rangle$, are well known [50,51,52,53]. Numerical simulation methods and results of the 3d Ising model have also been reported [54]. The thermodynamic definitions and relations used in this work are summarized in Appendix A.

2.2. Restricted Boltzmann Machine

The restricted Boltzmann machine (RBM) is a two-layer energy-based model with $n_h$ hidden units (or neurons) $h_i = \pm 1$ ($i = 1, 2, \dots, n_h$) in the hidden layer, whose state vector is $\mathbf{h} = [h_1, h_2, \dots, h_{n_h}]^T$, and $n_v$ visible units $v_j = \pm 1$ ($j = 1, 2, \dots, n_v$) in the visible layer, whose state vector is $\mathbf{v} = [v_1, v_2, \dots, v_{n_v}]^T$ (Figure 1) [55]. In this work, the visible layer is just the Ising configuration vector, i.e., $\mathbf{v} = \mathbf{s}$, with $n_v = N$. We chose binary units $\{-1, +1\}$ (instead of $\{0, 1\}$) to better align with the definition of the Ising spin variable $s_i$.
The total energy $E_\theta(\mathbf{v}, \mathbf{h})$ of the RBM is defined as

$$E_\theta(\mathbf{v}, \mathbf{h}) = -\mathbf{b}^T\mathbf{v} - \mathbf{c}^T\mathbf{h} - \mathbf{h}^T W \mathbf{v} = -\sum_{j=1}^{n_v} b_j v_j - \sum_{i=1}^{n_h} c_i h_i - \sum_{i=1}^{n_h}\sum_{j=1}^{n_v} W_{ij} h_i v_j,$$

where $\mathbf{b} = [b_1, b_2, \dots, b_{n_v}]^T$ is the visible bias, $\mathbf{c} = [c_1, c_2, \dots, c_{n_h}]^T$ is the hidden bias, and

$$W_{n_h \times n_v} = \begin{bmatrix} \mathbf{w}_1^T \\ \mathbf{w}_2^T \\ \vdots \\ \mathbf{w}_{n_h}^T \end{bmatrix} = \begin{bmatrix} | & | & & | \\ \mathbf{w}_{:,1} & \mathbf{w}_{:,2} & \cdots & \mathbf{w}_{:,n_v} \\ | & | & & | \end{bmatrix}$$

is the interaction weight matrix between visible and hidden units. Under this notation, each row vector $\mathbf{w}_i^T$ (of dimension $n_v$) is a filter mapping from the visible state $\mathbf{v}$ to a hidden unit i, and each column vector $\mathbf{w}_{:,j}$ (of dimension $n_h$) is an inverse filter mapping from the hidden state $\mathbf{h}$ to a visible unit j. All parameters are collectively written as $\theta = \{W, \mathbf{b}, \mathbf{c}\}$. "Restricted" refers to the lack of interaction between hidden units or between visible units.
The joint distribution for an overall state $(\mathbf{v}, \mathbf{h})$ is

$$p_\theta(\mathbf{v}, \mathbf{h}) = \frac{e^{-E_\theta(\mathbf{v}, \mathbf{h})}}{Z_\theta},$$

where the partition function of the RBM is

$$Z_\theta = \sum_{\mathbf{v}} \sum_{\mathbf{h}} e^{-E_\theta(\mathbf{v}, \mathbf{h})}.$$

The learned model distribution for a visible state $\mathbf{v}$ follows from the marginalization of $p_\theta(\mathbf{v}, \mathbf{h})$:

$$p_\theta(\mathbf{v}) = \sum_{\mathbf{h}} p_\theta(\mathbf{v}, \mathbf{h}) = \frac{1}{Z_\theta} e^{-E_\theta(\mathbf{v})},$$

where the visible energy (an effective energy for the visible state $\mathbf{v}$, often termed "free energy" in the machine learning literature)

$$E_\theta(\mathbf{v}) = -\mathbf{b}^T\mathbf{v} - \sum_{i=1}^{n_h} \ln\left(e^{-\mathbf{w}_i^T\mathbf{v} - c_i} + e^{\mathbf{w}_i^T\mathbf{v} + c_i}\right)$$

is defined according to $e^{-E_\theta(\mathbf{v})} = \sum_{\mathbf{h}} e^{-E_\theta(\mathbf{v}, \mathbf{h})}$, such that $Z_\theta = \sum_{\mathbf{v}} e^{-E_\theta(\mathbf{v})}$. See Appendix B for a detailed derivation.
The conditional distributions to generate $\mathbf{h}$ from $\mathbf{v}$, $p_\theta(\mathbf{h}|\mathbf{v})$, and to generate $\mathbf{v}$ from $\mathbf{h}$, $p_\theta(\mathbf{v}|\mathbf{h})$, satisfying $p_\theta(\mathbf{v}, \mathbf{h}) = p_\theta(\mathbf{h}|\mathbf{v})\, p_\theta(\mathbf{v}) = p_\theta(\mathbf{v}|\mathbf{h})\, p_\theta(\mathbf{h})$, can be written as products:

$$p_\theta(\mathbf{h}|\mathbf{v}) = \prod_{i=1}^{n_h} p_\theta(h_i|\mathbf{v}), \qquad p_\theta(\mathbf{v}|\mathbf{h}) = \prod_{j=1}^{n_v} p_\theta(v_j|\mathbf{h}),$$

because the $h_i$ are independent of each other (at fixed $\mathbf{v}$) and the $v_j$ are independent of each other (at fixed $\mathbf{h}$). It can be shown that

$$\begin{aligned}
p_\theta(h_i = +1|\mathbf{v}) &= \sigma\!\left(2(c_i + \mathbf{w}_i^T\mathbf{v})\right), & p_\theta(h_i = -1|\mathbf{v}) &= 1 - \sigma\!\left(2(c_i + \mathbf{w}_i^T\mathbf{v})\right), \\
p_\theta(v_j = +1|\mathbf{h}) &= \sigma\!\left(2(b_j + \mathbf{h}^T\mathbf{w}_{:,j})\right), & p_\theta(v_j = -1|\mathbf{h}) &= 1 - \sigma\!\left(2(b_j + \mathbf{h}^T\mathbf{w}_{:,j})\right),
\end{aligned}$$

where $\sigma(z) = \frac{1}{1+e^{-z}}$ is the sigmoid function (Appendix B).
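For concreteness, the conditional distributions above translate directly into block Gibbs samplers. The following minimal sketch (assuming NumPy arrays with W of shape (n_h, n_v) and ±1 units; not taken from the released code) samples one layer given the other:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_h_given_v(v, W, c, rng):
    """Sample h in {-1, +1}^{n_h} from p(h | v); W has shape (n_h, n_v)."""
    p_plus = sigmoid(2.0 * (c + W @ v))       # p(h_i = +1 | v) from Equation (10)
    return np.where(rng.random(len(c)) < p_plus, 1, -1)

def sample_v_given_h(h, W, b, rng):
    """Sample v in {-1, +1}^{n_v} from p(v | h)."""
    p_plus = sigmoid(2.0 * (b + W.T @ h))     # p(v_j = +1 | h)
    return np.where(rng.random(len(b)) < p_plus, 1, -1)
```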

2.3. Loss Function and Training of RBMs

Given the dataset $D = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_M]^T$ of M samples generated independently from the identical data distribution $p_D(\mathbf{v})$ ($\mathbf{v} \overset{\mathrm{i.i.d.}}{\sim} p_D(\mathbf{v})$), the goal of RBM learning is to find a model distribution $p_\theta(\mathbf{v})$ that approximates $p_D(\mathbf{v})$. In the context of this work, the data samples $\mathbf{v}$ are Ising configurations, and the data distribution $p_D(\mathbf{v})$ is or is related to the Ising–Boltzmann distribution $p_T(\mathbf{s})$.
Based on maximum likelihood estimation, the optimal parameters $\theta^* = \arg\min_\theta \mathcal{L}(\theta)$ can be found by minimizing the negative log likelihood

$$\mathcal{L}(\theta) = -\langle \ln p_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} = \langle E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} + \ln Z_\theta,$$

which serves as the loss function of RBM learning. Note that the partition function $Z_\theta$ only depends on the model, not on the data. Since the calculation of $Z_\theta$ involves a summation over all possible $(\mathbf{v}, \mathbf{h})$ states, which is not feasible, $\mathcal{L}(\theta)$ cannot be evaluated exactly, except for very small systems [56]. Special treatments have to be devised, for example by mean-field theory [57] or by importance sampling methods [58]. An interesting feature of the RBM is that, although the actual loss function $\mathcal{L}(\theta)$ is not accessible, its gradient

$$\nabla_\theta \mathcal{L}(\theta) = \langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} - \langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_\theta}$$

can be sampled, which enables a gradient descent learning algorithm. From step t to step t + 1, the model parameters are updated with learning rate η as

$$\theta_{t+1} = \theta_t - \eta\, \nabla_\theta \mathcal{L}(\theta_t).$$
To evaluate the loss function, we used an approximation to it, the pseudo-(negative log) likelihood [59]:

$$\tilde{\mathcal{L}}(\theta) = -\left\langle \sum_{i=1}^{n_v} \ln p_\theta(v_i \,|\, v_{j\neq i}) \right\rangle_{\mathbf{v}\sim p_D} \approx \mathcal{L}(\theta),$$

where

$$p_\theta(v_i \,|\, v_{j\neq i}) = p_\theta(v_i \,|\, v_j \text{ for } j\neq i) = \frac{e^{-E_\theta(\mathbf{v})}}{e^{-E_\theta(\mathbf{v})} + e^{-E_\theta([v_1, \dots, -v_i, \dots, v_{n_v}])}}$$

is the conditional probability for component $v_i$ given that all the other components $v_j$ ($j\neq i$) are fixed [37]. In practice, to avoid the time-consuming sum over all visible units $i = 1, \dots, n_v$, it is suggested to randomly sample one unit $i_0 \in \{1, 2, \dots, n_v\}$ and estimate

$$\tilde{\mathcal{L}}(\theta) \approx -n_v \left\langle \ln p_\theta(v_{i_0} \,|\, v_{j\neq i_0}) \right\rangle_{\mathbf{v}\sim p_D},$$

provided that the visible units are on average translation-invariant [60]. To monitor the reconstruction error, we also calculated the cross-entropy CE between the initial configuration $\mathbf{v}$ and the conditional probability $p_\theta(\mathbf{v}|\mathbf{h})$ for the reconstruction $\mathbf{v} \xrightarrow{p_\theta(\mathbf{h}|\mathbf{v})} \mathbf{h} \xrightarrow{p_\theta(\mathbf{v}|\mathbf{h})} \mathbf{v}$ (see Appendix C for the definition).
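The stochastic pseudo-likelihood estimate above can be computed from the visible energy alone. The sketch below is a minimal illustration under the same array-shape assumptions as before; it is not the authors' implementation.

```python
import numpy as np

def visible_energy(v, W, b, c):
    """Visible energy E_theta(v) for v in {-1,+1}^{n_v}; W: (n_h, n_v), b: (n_v,), c: (n_h,)."""
    a = W @ v + c
    return -b @ v - np.sum(np.logaddexp(-a, a))   # -b.v - sum_i ln(e^{-a_i} + e^{a_i})

def pseudo_log_likelihood_term(v, W, b, c, rng):
    """Stochastic estimate of the pseudo-(negative log) likelihood for one sample v."""
    i0 = rng.integers(len(v))                     # randomly chosen visible unit
    v_flip = v.copy()
    v_flip[i0] *= -1
    e, e_flip = visible_energy(v, W, b, c), visible_energy(v_flip, W, b, c)
    log_p = -np.logaddexp(0.0, e - e_flip)        # ln p(v_{i0} | rest), computed stably
    return -len(v) * log_p                        # average this over the dataset to obtain L~
```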
For both 2d and 3d Ising systems, we first trained single-temperature RBMs (T-RBMs). The M = 50,000 Ising configurations at each T forming a dataset $D_T$ are used to train one model, so that there are $n_T$ T-RBMs in total. While $n_v = N$, we tried various numbers of hidden units, $n_h = 400, 900, 1600, 2500$ in 2d and $n_h = 400, 900, 1600$ in 3d. For 2d systems, we also trained an all-temperature RBM ( T -RBM), for which 50,000 Ising configurations per temperature are drawn to compose an all-temperature dataset of $M = 50{,}000\, n_T = 8 \times 10^5$ samples. The number of hidden units for this T -RBM is $n_h = 400, 900, 1600$. The weight matrix W is initialized with Glorot normal initialization [61] ($\mathbf{b}$ and $\mathbf{c}$ are initialized to zero). Parameters are optimized with stochastic gradient descent with learning rate $\eta = 1.0 \times 10^{-4}$ and batch size 128. The negative phase (model term) of the gradient, $\langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_\theta}$, is calculated using CD-k Gibbs sampling with k = 5. We stopped the training when $\tilde{\mathcal{L}}$ and CE converged, typically after 100–2000 epochs (see the Supplementary Materials). Three Nvidia GPU cards (GeForce RTX 3090 and 2070) were used to train the models, which took about two minutes per epoch for an M = 50,000 dataset.
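A minimal sketch of one CD-k parameter update, with the hyperparameters quoted above (k = 5, η = 10⁻⁴) and everything else (array shapes, function names) assumed for illustration, is given below; the released GitHub code should be consulted for the actual implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd_k_update(v_data, W, b, c, rng, k=5, eta=1e-4):
    """One stochastic gradient step on a minibatch v_data of shape (batch, n_v), entries ±1."""
    # Positive (data) phase: <h_i | v> = tanh(c_i + w_i^T v).
    h_data = np.tanh(c + v_data @ W.T)
    # Negative (model) phase: k steps of Gibbs sampling starting from the data (CD-k).
    v_model = v_data.copy()
    for _ in range(k):
        h_sample = np.where(rng.random((len(v_data), len(c))) < sigmoid(2 * (c + v_model @ W.T)), 1, -1)
        v_model = np.where(rng.random(v_data.shape) < sigmoid(2 * (b + h_sample @ W)), 1, -1)
    h_model = np.tanh(c + v_model @ W.T)
    # Gradient descent on L(theta): dL/dW_ij = -<h_i v_j>_data + <h_i v_j>_model.
    W += eta * (h_data.T @ v_data - h_model.T @ v_model) / len(v_data)
    b += eta * (v_data - v_model).mean(axis=0)
    c += eta * (h_data - h_model).mean(axis=0)
```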

3. Results and Discussion

In this section, we investigate how the RBM uses its weight matrix W and hidden layer h to encode the Boltzmann distributed states of the Ising model and what physical information can be extracted from machine learning concepts such as the visible energy and loss function.

3.1. Filters and Inverse Filters

It can be verified that the trained weight matrix elements $W_{ij}$ of a T-RBM follow a Gaussian distribution of zero mean with the largest variance at $T \approx T_c$ (Figure 2a) [62]. The low temperature distribution here is different from the uniform distribution observed in [35], which results from the uniform initialization scheme used there. This suggests that the training of RBMs could converge to different minima when initialized differently. According to Equation (10), the biases $c_i$ and $b_j$ can be associated with the activation thresholds of a hidden unit and a visible unit, respectively. For example, whether a hidden unit is activated ($h_i = +1$) or anti-activated ($h_i = -1$) depends on whether the incoming signal $\mathbf{w}_i^T\mathbf{v}$ from all visible units exceeds the threshold $-c_i$. The values of $c_i$ (and $b_j$) are all close to zero and are often negligible in comparison with the total incoming signal $\mathbf{w}_i^T\mathbf{v}$ (and $\mathbf{h}^T\mathbf{w}_{:,j}$) (see the Supplementary Materials for the results of constrained RBMs where all biases are set to zero). The distributions of $c_i$ and $b_j$ should in principle be symmetric about zero (Figure 2b,c). A non-zero mean can be caused by an unbalanced dataset with an unequal number of m > 0 and m < 0 Ising configurations. The corresponding filter or inverse filter sums may also be distributed with a non-zero mean in order to compensate for the asymmetric bias, as will be shown next.
Since $\mathbf{v} = \mathbf{s}$ is an Ising configuration with ±1 units in our problem, $\mathbf{w}_i^T\mathbf{v}$ will be more positive (or negative) if the components of $\mathbf{w}_i^T$ better match (or anti-match) the signs of the spin variables. In this sense, we can think of $\mathbf{w}_i^T$ as a filter extracting certain patterns in Ising configurations. Knowing the representative spin configurations of the Ising model below, close to, and above the critical temperature $T_c$, we expect that $\mathbf{w}_i^T$ ($i = 1, 2, \dots, n_h$) wrapped into an $L^d$ arrangement exhibits similar features. In Figure 3a, we show sample filters of T-RBMs with $n_h = 400$ trained for the 2d Ising model at three temperatures T = 1.0, 2.25, and 3.5 (see the Supplementary Materials for more examples of filters). At low T, the components of $\mathbf{w}_i^T$ tend to be mostly positive (or negative), matching the spin-up (or spin-down) configurations in the ferromagnetic phase. At high T, the filters $\mathbf{w}_i^T$ possess stripe domains consisting of roughly equal numbers of well-mixed positive and negative components, like Ising configurations during spinodal decomposition. Close to $T_c$, the $\mathbf{w}_i^T$ patterns vary dramatically from each other, in accord with the large critical fluctuations. In particular, some even exhibit hierarchical clusters of various sizes. The element sum of a filter, the filter sum $\mathrm{sum}(\mathbf{w}_i^T) = \sum_{j=1}^{n_v} W_{ij}$, plays a similar role as the magnetization m. The distribution of all the $n_h$ filter sums at each T changes with increasing temperature as the Ising magnetization does, from bimodal to unimodal, with the largest variance at $T_c$ (Figure 3b). This suggests that the peak of the variance $\langle(\sum_{j=1}^{n_v} W_{ij})^2\rangle - \langle\sum_{j=1}^{n_v} W_{ij}\rangle^2$ as a function of temperature coincides with the Ising phase transition (inset of Figure 3b). More detailed results for the 2d and 3d Ising models are in the Supplementary Materials.
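As an illustration, the filter-sum indicator can be evaluated directly from a trained weight matrix. In the sketch below, the per-temperature dictionary trained_weights is hypothetical:

```python
import numpy as np

def filter_sum_variance(W):
    """Variance of the n_h filter sums sum(w_i^T) = sum_j W_ij for one trained T-RBM."""
    sums = W.sum(axis=1)
    return sums.var()

# Example usage (hypothetical dict of trained weight matrices, one per temperature):
# variance_vs_T = {T: filter_sum_variance(W_T) for T, W_T in trained_weights.items()}
# The temperature at which variance_vs_T peaks is the predicted transition point.
```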
When a hidden layer $\mathbf{h}$ is provided, the RBM reconstructs the visible layer $\mathbf{v}$ by applying the $n_v$ inverse filters $\mathbf{w}_{:,j}$ ($j = 1, 2, \dots, n_v$) to $\mathbf{h}$. The distribution of the inverse filter sum $\mathrm{sum}(\mathbf{w}_{:,j}) = \sum_{i=1}^{n_h} W_{ij}$ is Gaussian with a mean close to zero (Figure 3c), where a large deviation from zero mean is accompanied by a non-zero average bias $\sum_j b_j / n_v$, as mentioned above (Figure 2b). We find that this is a result of the unbalanced dataset, which has ∼60% m < 0 Ising configurations. Because the activation probability of a visible unit $v_j$ is determined by $\mathbf{w}_{:,j}$, the correlation between visible units (Ising spins) is reflected in the correlation between inverse filters. This is equivalent to the analysis of the $n_v \times n_v$ matrix $W^T W$ or its eigenvectors as in [38,42], whose entries are the inner products $\mathbf{w}_{:,j}^T \mathbf{w}_{:,j'}$ of inverse filters. We can therefore locate the Ising phase transition by identifying the temperature with the strongest correlation among the $\mathbf{w}_{:,j}$s, e.g., the peak of $\mathbf{w}_{:,j}^T \mathbf{w}_{:,j'}$ at a given distance $r_{jj'}$ (inset of Figure 3c). See the Supplementary Materials for results in 2d and 3d.
In contrast, the filters of the T -RBM trained from 2 d Ising configurations at all temperatures have background patterns like the high temperature T-RBM (in the paramagnetic phase). A clear difference is that most T -RBM filters have one large domain of positive or negative elements (Figure 4a), similar to the receptive field in a deep neural network [29]. This domain randomly covers an area of the visual field of the L × L Ising configuration (see the Supplementary Materials for all the n h filters). The existence of such domains in the filter causes the filter sum and the corresponding bias c i to be positive or negative with a bimodal distribution (Figure 4b,c). The inverse filter sum and its corresponding bias b j still have a Gaussian distribution, although the unbalanced dataset shifts the mean of b j away from zero.

3.2. Hidden Layer

Whether a hidden unit uses +1 or −1 to encode a pattern of the visible layer $\mathbf{v}$ is randomly assigned during training. In the former case, the filter $\mathbf{w}_i^T$ matches the pattern ($\mathbf{w}_i^T\mathbf{v}$ is positive); in the latter case, the filter anti-matches the pattern ($\mathbf{w}_i^T\mathbf{v}$ is negative). For a visible layer $\mathbf{v}$ of magnetization m, the sign of $\mathbf{w}_i^T\mathbf{v}$ and of the encoding $h_i$ is largely determined by the sign of $\mathrm{sum}(\mathbf{w}_i^T)$ (Table 1). Since the distribution of $\mathrm{sum}(\mathbf{w}_i^T)$ is symmetric about zero, the hidden layer of a T-RBM consists of roughly equal numbers of +1 and −1 units; the "magnetization" $m_h = \frac{1}{n_h}\sum_{i=1}^{n_h} h_i$ of the hidden layer is always close to zero, and its average $\langle m_h \rangle \approx 0$. The histogram of $m_h$ for all hidden encodings of visible states is expected to be symmetric about zero (Figure 5). We found that, for the smallest $n_h$, the histogram of $m_h$ at temperatures close to $T_c$ is bimodal due to the relatively large randomness of small hidden layers. As more hidden units are added, the two peaks merge into one and the distribution of $m_h$ becomes narrower. This suggests that a larger hidden layer tends to have a smaller deviation from $m_h = 0$.
The order of the h i = ± 1 sequence in each hidden encoding h is arbitrary, but relatively fixed once the T-RBM is trained. The permutation of hidden units together with their corresponding filters (swap the rows of the matrix W ) results in an equivalent T-RBM. Examples of hidden layers of T-RBMs with n h = 400 at different temperatures are shown in the inset of Figure 5, where the vector h is wrapped into a 20 × 20 arrangement. Note that there are actually no spatial relationships between different hidden units, and any apparent pattern in this 2 d illustration is an artifact of the wrapping protocol.
As a generative model, a T-RBM can be used to produce more Boltzmann-distributed Ising configurations. Starting from a random hidden state $\mathbf{h}^{(0)}$, this is often achieved by a sequence of Markov chain moves $\mathbf{h}^{(0)} \rightarrow \mathbf{v}^{(0)} \rightarrow \mathbf{h}^{(1)} \rightarrow \mathbf{v}^{(1)} \rightarrow \cdots$ until the steady state is reached [31]. Based on the above-mentioned observations, we can design an algorithm to initialize $\mathbf{h}^{(0)}$ that better captures the hidden encoding of visible states (equilibrium Ising configurations), thus enabling faster convergence of the Markov chain. After choosing a low temperature $T_L$ and a high temperature $T_H$, we generate the hidden layer as follows (a code sketch follows the list):
  • At low $T \leq T_L < T_c$: if $\mathrm{sum}(\mathbf{w}_i^T) > 0$, set $h_i = +1$; if $\mathrm{sum}(\mathbf{w}_i^T) < 0$, set $h_i = -1$. This will be an encoding of an m > 0 ferromagnetic configuration. To encode an m < 0 ferromagnetic configuration, simply flip the sign of every $h_i$.
  • At high $T \geq T_H > T_c$: randomly assign $h_i = +1$ or $-1$ with equal probability. This will be an encoding of a paramagnetic configuration with m ≈ 0.
  • At intermediate $T_L < T < T_H$: to encode an m > 0 Ising configuration, if $\mathrm{sum}(\mathbf{w}_i^T) > 0$, assign $h_i = +1$ with probability $p_h \in (0.5, 1.0)$ and $h_i = -1$ with probability $1 - p_h$; if $\mathrm{sum}(\mathbf{w}_i^T) < 0$, assign $h_i = -1$ with probability $p_h \in (0.5, 1.0)$ and $h_i = +1$ with probability $1 - p_h$. Here $p_h$ is a predetermined parameter, and the above two rules are just the special cases $p_h = 1.0$ ($T \leq T_L$) and $p_h = 0.5$ ($T \geq T_H$), respectively. In practice, one may approximately use $p_h = (|m| + 1)/2$ or use linear interpolation within $T_L < T < T_H$, $p_h = 0.5 + 0.5\,(T_H - T)/(T_H - T_L)$.
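A compact sketch of this initialization rule is given below (illustrative only; the function name and the m_sign argument are assumptions, and the interpolation follows the limits stated above):

```python
import numpy as np

def init_hidden(W, T, T_L, T_H, rng, m_sign=+1):
    """Initial hidden state h(0) encoding an Ising configuration with magnetization sign m_sign."""
    filter_sums = W.sum(axis=1)
    if T <= T_L:
        p_h = 1.0
    elif T >= T_H:
        p_h = 0.5
    else:
        p_h = 0.5 + 0.5 * (T_H - T) / (T_H - T_L)   # interpolates from 1.0 at T_L to 0.5 at T_H
    # A hidden unit whose filter sum matches the target sign is set to +1 with probability p_h.
    match = np.sign(filter_sums) * m_sign
    prob_hi_plus = np.where(match > 0, p_h, 1.0 - p_h)
    return np.where(rng.random(len(filter_sums)) < prob_hi_plus, 1, -1)
```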
Below, we compare the (one-step) reconstructed thermal quantities using two different initial hidden encodings with results from a conventional multi-step Markov chain (Figure 6). The hidden encoding methods proposed here are quite reliable at low and high T, but less accurate at T close to T c .

3.3. Visible Energy

When a T-RBM for temperature T is trained, we expect that $p_\theta(\mathbf{v}) \approx p_D(\mathbf{v}) \approx p_T(\mathbf{s})$, the Boltzmann distribution at that T. Although formally related to the physical energy in the Boltzmann factor (with the temperature absorbed), the visible energy $E_\theta(\mathbf{v})$ of an RBM should really be considered as the negative log (relative) probability of a visible state $\mathbf{v}$. For single-temperature T-RBMs, the mean visible energy $\langle E_\theta(\mathbf{v}) \rangle$ increases monotonically with temperature (except for the largest $n_h$, which might be due to overfitting) (Figure 7a,b). The value of $\langle E_\theta(\mathbf{v}) \rangle$ and its trend, however, cannot be used to identify the physical phase transition. In fact, $E_\theta(\mathbf{v})$ can differ from the reduced Hamiltonian $H(\mathbf{s})/k_B T$ by an arbitrary (temperature-dependent) constant while still maintaining the Boltzmann distribution $p_\theta(\mathbf{v}) \approx p_T(\mathbf{s})$ (if the partition function $Z_\theta$ is calibrated accordingly).
The trend of $\langle E_\theta(\mathbf{v}) \rangle$ for T-RBMs can be understood by considering the following approximate forms. First, due to the symmetry of +1 and −1, the biases $b_j$ and $c_i$ are all close to zero. A constrained T-RBM with zero biases has a visible energy

$$E_W(\mathbf{v}) = -\sum_{i=1}^{n_h} \ln\left(e^{-\mathbf{w}_i^T\mathbf{v}} + e^{\mathbf{w}_i^T\mathbf{v}}\right)$$

that approximates the visible energy of the full T-RBM, i.e., $E_\theta(\mathbf{v}) \approx E_W(\mathbf{v})$. Next, unless $\mathbf{w}_i^T\mathbf{v}$ is close to zero, one of the two exponential terms in Equation (17) always dominates, such that $E_W(\mathbf{v}) \approx \tilde{E}_W(\mathbf{v})$, where

$$\tilde{E}_W(\mathbf{v}) = -\sum_{i=1}^{n_h} \left| \mathbf{w}_i^T\mathbf{v} \right| = -\sum_{i=1}^{n_h} \left| \sum_{j=1}^{n_v} W_{ij} v_j \right|.$$

Equation (18) can further be approximated by setting $\mathbf{v} = \mathbf{1}$ with all $v_j = +1$, i.e., $\tilde{E}_W(\mathbf{v}) \approx \tilde{E}_W(\mathbf{1})$ with

$$\tilde{E}_W(\mathbf{1}) = -\sum_{i=1}^{n_h} \left| \mathrm{sum}(\mathbf{w}_i^T) \right| = -\sum_{i=1}^{n_h} \left| \sum_{j=1}^{n_v} W_{ij} \right|.$$
In summary, $E_W(\mathbf{v})$, $\tilde{E}_W(\mathbf{v})$, and $\tilde{E}_W(\mathbf{1})$ are all good approximations to the original $E_\theta(\mathbf{v})$ (Figure 7a). The increase of the mean $\langle E_\theta(\mathbf{v}) \rangle$ with temperature coincides with the narrowing of the filter sum distribution, i.e., the decrease of $|\mathrm{sum}(\mathbf{w}_i^T)|$ with temperature, which is evident from Figure 3b. At fixed temperature, the decrease of $E_\theta(\mathbf{v})$ with $n_h$ is a consequence of the sum $\sum_{i=1}^{n_h}$ in the definition of the visible energy. The variance $\langle E_\theta^2 \rangle - \langle E_\theta \rangle^2$ is a useful quantity for phase transition detection, because it reflects the fluctuation of the probability $p_\theta(\mathbf{v})$. In both the low-T ferromagnetic and high-T paramagnetic regimes, $p_\theta(\mathbf{v})$ is relatively homogeneous among different states. When T is close to $T_c$, the variances of $p_\theta(\mathbf{v})$ and $E_\theta(\mathbf{v})$ are expected to peak (Figure 7d,e). The abnormal rounded (and even shifted) peaks at large $n_h$ could be a sign of overfitting.
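These approximations are straightforward to evaluate from the weight matrix alone; a minimal sketch (zero biases assumed, array shapes as in the earlier sketches) is:

```python
import numpy as np

def visible_energy_approximations(v, W):
    """The three zero-bias approximations to E_theta(v) discussed above; W: (n_h, n_v)."""
    a = W @ v
    E_W = -np.sum(np.logaddexp(-a, a))        # E_W(v)   = -sum_i ln(e^{-w_i.v} + e^{w_i.v})
    E_tilde = -np.sum(np.abs(a))              # E~_W(v)  = -sum_i |w_i.v|
    E_ones = -np.sum(np.abs(W.sum(axis=1)))   # E~_W(1)  = -sum_i |sum(w_i^T)|
    return E_W, E_tilde, E_ones
```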
For the all-temperature T -RBM, the Ising phase transition can be revealed by either the sharp increase of the mean $\langle E_\theta(\mathbf{v}) \rangle$ or the peak of the variance $\langle E_\theta^2 \rangle - \langle E_\theta \rangle^2$ (Figure 7c,f). However, this apparent detection can be a trivial consequence of the special composition of the all-temperature dataset, which contains Ising configurations at different temperatures in equal proportion. Only configurations at a specific T are fed into the model to calculate the average quantity at that T. Technically, a visible state $\mathbf{v}$ in the all-temperature dataset is not subject to the Boltzmann distribution at any specific temperature. Instead, the true ensemble of this dataset is a collection of $n_T$ different Boltzmann-distributed subsets. Many replicas of the same or similar ferromagnetic states are in the dataset, giving rise to a large multiplicity, high probability, and low visible energy for such states. In comparison, high temperature paramagnetic states are all different from each other and, therefore, have low $p_\theta(\mathbf{v})$ (high $E_\theta(\mathbf{v})$) for each one of them. Knowing this caveat, one should be cautious when monitoring the visible energy of a T -RBM to detect the phase transition, because changing the proportion of Ising configurations at different temperatures in the dataset can modify the relative probability of each state.

3.4. Pseudo-Likelihood and Entropy Estimation

The negative log likelihood $\mathcal{L}(\theta)$ defined in Equation (11) is conceptually equivalent to the physical entropy S defined by the Gibbs entropy formula, apart from the factor of the Boltzmann constant $k_B$ (Appendix A). However, just as the entropy S cannot be directly sampled, the exact value of $\mathcal{L}(\theta)$ is not accessible. In order to estimate S, we calculated the pseudo-likelihood $\tilde{\mathcal{L}}(\theta)$ instead, which is based on the mean-field-like approximation $p_\theta(\mathbf{v}) \approx \prod_{i=1}^{n_v} p_\theta(v_i \,|\, v_{j\neq i})$. Similar ideas to estimate the free energy or entropy were put forward with the aid of variational autoregressive networks [63] or neural importance sampling [64]. The true and estimated entropy of the 2d and 3d Ising models using T-RBMs with different $n_h$ are shown in Figure 8a,b. As a comparison, we also considered a "pseudo-entropy" based on a similar approximation:

$$\tilde{S} = -k_B \left\langle \sum_{i=1}^{N} \ln p_T(s_i \,|\, s_{j\neq i}) \right\rangle_{\mathbf{s}\sim p_T} \approx S,$$

where the conditional probability is

$$p_T(s_i \,|\, s_{j\neq i}) = \frac{e^{-H(\mathbf{s})/k_B T}}{e^{-H(\mathbf{s})/k_B T} + e^{-H([s_1, \dots, -s_i, \dots, s_N])/k_B T}}$$

and the ensemble average $\langle \cdot \rangle_{\mathbf{s}\sim p_T}$ is taken over states obtained from Monte Carlo sampling. In both 2d and 3d, $\tilde{S}$ is lower than the true S, especially at high T, because a mean-field treatment tends to underestimate fluctuations.
While increasing the model complexity by adding hidden units is usually believed to reduce the reconstruction error, e.g., of the energy and heat capacity [35,36] (see also the Supplementary Materials), a recent study suggested that a trade-off could exist between the accuracy of different statistical quantities [65]. Here, we found that the pseudo-likelihood of T-RBMs with the fewest hidden units in our trials ($n_h = 400$) appears to provide the best prediction for the entropy. Increasing $n_h$ leads to larger deviations from the true S at higher T. The decrease of $\tilde{\mathcal{L}}$ with $n_h$ at fixed temperature agrees with the trend of the visible energy: a lower $E_\theta(\mathbf{v})$ corresponds to a higher $p_\theta(\mathbf{v})$ and, thus, a lower $\tilde{\mathcal{L}}$ according to its definition. The surprisingly good performance of $\tilde{\mathcal{L}}$ in approximating S could be due to the fact that the visible units $v_i$ in RBMs are only indirectly correlated through hidden units, which collectively serve as an effective mean-field on each visible unit. We also calculated $\tilde{\mathcal{L}}(\theta)$ with the all-temperature T -RBM in 2d (Figure 8c). Compared with single-temperature T-RBMs of the same $n_h$ (Figure 8a), the T -RBM predicts a higher $\tilde{\mathcal{L}}(\theta)$ with considerable deviations even at low T. The trend of $\tilde{\mathcal{L}}(\theta)$ also agrees with that of $\langle E_\theta(\mathbf{v}) \rangle$ (Figure 7c).
Knowledge of the entropy allows us to estimate the phase transition point using the thermodynamic relation $C_V = T \frac{dS}{dT}$. We constructed this estimated $C_V$ as a function of temperature from $\tilde{\mathcal{L}}(\theta)$ and its numerical fit; its peak is expected to be located at $T_c$ (Figure 9). The predicted $T_c$ values are compared with the results from the entropy and pseudo-entropy, as well as with the Monte Carlo simulation results for our finite systems and the known exact values for infinite systems, in Table 2. It can be seen that single-temperature T-RBMs capture the transition point fairly well, within an error of about 1–3%.
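A minimal sketch of this estimator is given below, assuming that S/k_B is approximated by the measured pseudo-likelihood values and using a simple polynomial fit; the fit degree and function name are illustrative, not the authors' exact procedure.

```python
import numpy as np

def predict_tc(temps, pseudo_L, deg=8):
    """Estimate T_c from pseudo-likelihood values L~(T) via C_V = T dS/dT, with S/k_B ≈ L~."""
    temps = np.asarray(temps, dtype=float)
    s_fit = np.polyfit(temps, np.asarray(pseudo_L, dtype=float), deg)   # smooth fit of S(T)
    t_dense = np.linspace(temps.min(), temps.max(), 1000)
    c_v = t_dense * np.polyval(np.polyder(s_fit), t_dense)              # C_V = T dS/dT
    return t_dense[np.argmax(c_v)], t_dense, c_v
```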

4. Conclusions

In this work, we trained RBMs using equilibrium Ising configurations in 2d and 3d collected from Monte Carlo simulations at various temperatures. For single-temperature T-RBMs, the filters (row vectors) and the inverse filters (column vectors) of the weight matrix exhibit different characteristic patterns and correlations, respectively, below, around, and above the phase transition. Metrics derived from them, such as the filter sum fluctuation and the inverse filter correlation, can be used to locate the phase transition point. The hidden layer $\mathbf{h}$ on average contains an equal number of +1 and −1 units, whose variance decreases as more hidden units are added. The sign of a particular hidden unit $h_i$ is determined by the signs of the filter sum $\mathrm{sum}(\mathbf{w}_i^T)$ and of the magnetization m of the visible pattern. However, there is no spatial pattern in the sequence of positive and negative units in a hidden encoding.
The visible energy reflects the relative probability of visible states in the Boltzmann distribution. Although the mean of the visible energy is not directly related to the (physical) internal energy and does not reveal a clear transition, its fluctuation, which peaks at the critical point, can be used to identify the phase transition. The value and trend of the visible energy can be understood from several approximate forms, in particular the sum of the absolute values of the filter sums. The pseudo-likelihood of RBMs is conceptually related to, and can be used to estimate, the physical entropy. Numerical differentiation of the pseudo-likelihood yields an estimate of the heat capacity and thereby provides another estimator of the transition temperature. All these predictions of the critical temperature were made by unsupervised RBM learning, for which human labeling of the phase types is not needed.
As a comparison, we also trained an all-temperature T -RBM, whose dataset is a mixture of Boltzmann-distributed states over a range of temperatures. Each filter of this T -RBM features one large domain in its receptive field. Although the visible energy and pseudo-likelihood of the T -RBM show a certain signature of the phase transition, one should be cautious, as this detection could be an artifact of the composition of the dataset. Changing the proportions of Ising configurations at different temperatures could bias the probabilities and the transition learned by the T -RBM.
By extracting the underlying (Boltzmann) distribution of the input data, RBMs capture the rapid (phase) transition of such a distribution as the tuning parameter (temperature) is changed, without knowledge of the physical Hamiltonian. Information about the distribution is completely embedded in the configurations and their frequencies in the dataset. It would be interesting to see if such a general scheme of RBM learning can be extended to study other physical models of phase transition.

Supplementary Materials

The following Supporting Information can be downloaded at: https://www.mdpi.com/article/10.3390/e24121701/s1, training of RBMs; the constrained RBM with biases set to zero; variance of filter sum; correlation of inverse filter; reconstructed thermal quantities by RBMs; example filters of single temperature RBMs at different temperatures; example filters of the all temperature RBM.

Author Contributions

Conceptualization, J.G. and K.Z.; methodology, K.Z.; software, J.G.; formal analysis, J.G.; investigation, J.G. and K.Z.; data curation, J.G.; writing—original draft preparation, J.G. and K.Z.; writing—review and editing, K.Z.; visualization, J.G.; supervision, K.Z. All authors have read and agreed to the published version of the manuscript.

Funding

Duke Kunshan startup and SRS fund; Kunshan Government Research fund (KGR-R97030021S).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank Duke Kunshan startup funding, the Summer Research Scholars (SRS) program and Kunshan Government Research fund (KGR-R97030021S) for supporting this work. We also thank Giuseppe Magnifico and Kim Nicoli for helpful discussions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RBM    restricted Boltzmann machine
T-RBM    single-temperature restricted Boltzmann machine
T -RBM    all-temperature restricted Boltzmann machine

Appendix A. Statistical Thermodynamics of Ising Model

In this Appendix, we review the statistical thermodynamics of the Ising model covered in this work. The internal energy at a given temperature is

$$\langle E \rangle = \sum_{\mathbf{s}} p_T(\mathbf{s})\, H(\mathbf{s}) = \frac{\sum_{\mathbf{s}} H(\mathbf{s})\, e^{-H(\mathbf{s})/k_B T}}{Z_T},$$

where $\langle \cdot \rangle$ denotes the thermal average over equilibrated configurations. The heat capacity is

$$C_V = k_B \beta^2 \left( \langle E^2 \rangle - \langle E \rangle^2 \right),$$

where $\beta = \frac{1}{k_B T}$, and the heat capacity per spin (or specific heat) is $c_V = C_V / N$. The magnetization per spin is

$$m = \frac{1}{N} \sum_{i=1}^{N} s_i.$$

In small finite systems, because flips from m to −m configurations are common, we need to take the absolute value |m| before the thermal average:

$$\langle |m| \rangle = \left\langle \left| \frac{1}{N} \sum_{i=1}^{N} s_i \right| \right\rangle.$$

The physical entropy can be defined using the Gibbs entropy formula:

$$S = -k_B \langle \ln p_T(\mathbf{s}) \rangle = -k_B \sum_{\mathbf{s}} p_T(\mathbf{s}) \ln p_T(\mathbf{s}).$$
For the 2d Ising model, the critical temperature solved from $\sinh\frac{2J}{k_B T_c} = 1$ is $k_B T_c = \frac{2J}{\ln(1+\sqrt{2})} = 2.269185\, J$. Define

$$K = \frac{J}{k_B T}, \quad x = e^{-2K}, \quad q(K) = \frac{2\sinh 2K}{\cosh^2 2K}, \quad K_1(q) = \int_0^{\pi/2} \frac{d\phi}{\sqrt{1 - q^2\sin^2\phi}}, \quad E_1(q) = \int_0^{\pi/2} d\phi\, \sqrt{1 - q^2\sin^2\phi};$$

the analytical results of the 2d Ising model are then expressed as follows. Magnetization per spin [52]:

$$m = \left[ \frac{1+x^2}{(1-x^2)^2} \left( 1 - 6x^2 + x^4 \right)^{1/2} \right]^{1/4} = \left[ 1 - \sinh^{-4}(2K) \right]^{1/8};$$

internal energy per spin [50]:

$$\frac{\langle E \rangle}{N} = -J \coth 2K \left[ 1 + \frac{2}{\pi} \left( 2\tanh^2 2K - 1 \right) K_1(q) \right];$$

specific heat [53]:

$$c_V = k_B \frac{4}{\pi} \left( K \coth 2K \right)^2 \left\{ K_1(q) - E_1(q) - \left( 1 - \tanh^2 2K \right) \left[ \frac{\pi}{2} + \left( 2\tanh^2 2K - 1 \right) K_1(q) \right] \right\};$$

and the partition function per spin (or free energy per spin, $f = F/N$) [51]:

$$-\beta f = \ln(2\cosh 2K) + \frac{1}{\pi} \int_0^{\pi/2} \ln\left[ \frac{1}{2}\left( 1 + \sqrt{1 - q^2\sin^2\phi} \right) \right] d\phi.$$
The equation for the entropy can be obtained from the thermodynamic relation $F = \langle E \rangle - TS$.
For the 3d Ising model, $\langle m \rangle$, $\langle E \rangle$, and $c_V$ can be calculated directly from Monte Carlo sampling [54]. The numerical prediction for the critical temperature is $T_c \approx 4.511\, J/k_B$ [66]. Special techniques are needed to compute the free energy or entropy. We used thermodynamic integration,

$$F = -N k_B T \ln 2 + k_B T \int_0^{1/k_B T} \langle E \rangle\, d\beta,$$

in the high temperature regime, or

$$S(T) = \int_0^T \frac{C_V(T')}{T'}\, dT'$$

in the low temperature regime, since $S(T \to 0) = 0$ and $C_V(T \to 0) \to 0$ for the Ising model.
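A minimal sketch of the low-temperature integration from tabulated Monte Carlo heat-capacity data (illustrative variable names, trapezoidal rule, with the C_V/T integrand taken as 0 at T = 0) is:

```python
import numpy as np

def entropy_low_T(temps, C_V):
    """Cumulative S(T) = integral_0^T C_V(T')/T' dT' on an increasing temperature grid."""
    temps = np.asarray(temps, dtype=float)
    integrand = np.asarray(C_V, dtype=float) / temps
    dT = np.diff(np.concatenate(([0.0], temps)))          # segment widths, starting from T = 0
    left = np.concatenate(([0.0], integrand[:-1]))        # integrand at the left edge of each segment
    return np.cumsum(0.5 * (left + integrand) * dT)       # trapezoidal rule, S(T) at each grid point
```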

Appendix B. Energy and Probability of RBMs

In this Appendix, we review the derivations of the energy and probability of RBMs, which can be found in the standard machine learning literature [67]. The visible energy $E_\theta(\mathbf{v})$ is

$$\begin{aligned}
E_\theta(\mathbf{v}) &= -\ln \sum_{\mathbf{h}} e^{-E_\theta(\mathbf{v},\mathbf{h})} = -\ln p_\theta(\mathbf{v}) - \ln Z_\theta \\
&= -\ln \left[ e^{\sum_j^{n_v} b_j v_j} \sum_{\mathbf{h}} e^{\sum_i^{n_h} \left( \sum_j^{n_v} W_{ij} v_j + c_i \right) h_i} \right] \\
&= -\sum_j^{n_v} b_j v_j - \ln \sum_{h_1 = -1}^{+1} \sum_{h_2 = -1}^{+1} \cdots \sum_{h_{n_h} = -1}^{+1} \prod_{i=1}^{n_h} e^{\left( \sum_j^{n_v} W_{ij} v_j + c_i \right) h_i} \\
&= -\sum_j^{n_v} b_j v_j - \ln \prod_{i=1}^{n_h} \sum_{h_i = -1, +1} e^{\left( \sum_j^{n_v} W_{ij} v_j + c_i \right) h_i} \\
&= -\sum_j^{n_v} b_j v_j - \ln \prod_{i=1}^{n_h} \left( e^{-\sum_j^{n_v} W_{ij} v_j - c_i} + e^{\sum_j^{n_v} W_{ij} v_j + c_i} \right) \\
&= -\sum_j^{n_v} b_j v_j - \sum_{i=1}^{n_h} \ln \left( e^{-\sum_j^{n_v} W_{ij} v_j - c_i} + e^{\sum_j^{n_v} W_{ij} v_j + c_i} \right) \\
&= -\mathbf{b}^T\mathbf{v} - \sum_{i=1}^{n_h} \ln \left( e^{-\mathbf{w}_i^T\mathbf{v} - c_i} + e^{\mathbf{w}_i^T\mathbf{v} + c_i} \right).
\end{aligned}$$
The conditional probability is

$$p_\theta(\mathbf{h}|\mathbf{v}) = \frac{p_\theta(\mathbf{v},\mathbf{h})}{p_\theta(\mathbf{v})} = \frac{e^{-E_\theta(\mathbf{v},\mathbf{h})}}{e^{-E_\theta(\mathbf{v})}} = \frac{e^{\mathbf{b}^T\mathbf{v}}}{e^{-E_\theta(\mathbf{v})}}\, e^{\mathbf{c}^T\mathbf{h} + \mathbf{h}^T W \mathbf{v}} = \frac{1}{\Omega_\theta(\mathbf{v})}\, e^{\mathbf{c}^T\mathbf{h} + \mathbf{h}^T W \mathbf{v}},$$

where the $\mathbf{h}$-independent constant $\Omega_\theta(\mathbf{v}) = e^{-\mathbf{b}^T\mathbf{v} - E_\theta(\mathbf{v})} = \sum_{\mathbf{h}} e^{\mathbf{c}^T\mathbf{h} + \mathbf{h}^T W \mathbf{v}}$, such that $Z_\theta = \sum_{\mathbf{v}} \Omega_\theta(\mathbf{v})\, e^{\mathbf{b}^T\mathbf{v}}$. Therefore,

$$p_\theta(\mathbf{h}|\mathbf{v}) = \frac{1}{\Omega_\theta(\mathbf{v})}\, e^{\sum_{i=1}^{n_h} c_i h_i + \sum_{i=1}^{n_h} h_i \mathbf{w}_i^T\mathbf{v}} = \frac{1}{\Omega_\theta(\mathbf{v})}\, e^{\sum_{i=1}^{n_h} h_i \left( c_i + \mathbf{w}_i^T\mathbf{v} \right)} = \frac{1}{\Omega_\theta(\mathbf{v})} \prod_{i=1}^{n_h} e^{h_i \left( c_i + \mathbf{w}_i^T\mathbf{v} \right)} = \prod_{i=1}^{n_h} p_\theta(h_i|\mathbf{v}),$$

from which it can be recognized that $p_\theta(h_i|\mathbf{v}) \propto e^{h_i \left( c_i + \mathbf{w}_i^T\mathbf{v} \right)}$. The single-unit conditional probability is

$$p_\theta(h_i = +1|\mathbf{v}) = \frac{p_\theta(h_i = +1|\mathbf{v})}{p_\theta(h_i = +1|\mathbf{v}) + p_\theta(h_i = -1|\mathbf{v})} = \frac{e^{c_i + \mathbf{w}_i^T\mathbf{v}}}{e^{-c_i - \mathbf{w}_i^T\mathbf{v}} + e^{c_i + \mathbf{w}_i^T\mathbf{v}}} = \frac{1}{1 + e^{-2\left( c_i + \mathbf{w}_i^T\mathbf{v} \right)}} = \sigma\!\left( 2\left( c_i + \mathbf{w}_i^T\mathbf{v} \right) \right).$$

The other relations for $p_\theta(h_i = -1|\mathbf{v})$, $p_\theta(v_j = +1|\mathbf{h})$, and $p_\theta(v_j = -1|\mathbf{h})$ can be found similarly.

Appendix C. Maximum Likelihood Estimation and Gradient Descent of RBMs

In this Appendix, we review the gradient descent algorithm of RBMs derived from maximum likelihood estimation [67]. The likelihood function for a given dataset $D = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_M]^T$ is $P_\theta(D) = \prod_{m=1}^{M} p_\theta(\mathbf{v}_m)$, and maximizing the likelihood is equivalent to minimizing the negative log likelihood (or its average):

$$\theta^* = \arg\max_\theta \prod_{m=1}^{M} p_\theta(\mathbf{v}_m) = \arg\min_\theta \left[ -\sum_{m=1}^{M} \ln p_\theta(\mathbf{v}_m) \right] = \arg\min_\theta \left[ -\frac{1}{M} \sum_{m=1}^{M} \ln p_\theta(\mathbf{v}_m) \right] = \arg\min_\theta \left[ -\langle \ln p_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} \right] = \arg\min_\theta \mathcal{L}(\theta),$$

where $\mathbf{v} \sim p_D$ means to randomly draw $\mathbf{v}$ from $p_D$, and $\langle \cdot \rangle$ is the expectation value (with respect to that distribution). Alternatively, this can be considered as minimizing the Kullback–Leibler (KL) divergence

$$D_{\mathrm{KL}}(p_D \,\|\, p_\theta) = \sum_{m=1}^{M} p_D(\mathbf{v}_m) \ln\frac{p_D(\mathbf{v}_m)}{p_\theta(\mathbf{v}_m)} = \langle \ln p_D(\mathbf{v}) - \ln p_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} \geq 0$$

with respect to θ, where only the second term $-\langle \ln p_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D}$ depends on the parameters θ. In this work, we used $\mathcal{L}(\theta)$ as the loss function to train the RBMs.
It is sometimes useful to directly monitor the reconstruction error by comparing the input ($\mathbf{v}$) and reconstructed configurations ($\mathbf{v}'$) or, more quantitatively, by the (normalized) cross-entropy

$$\mathrm{CE} = -\frac{1}{n_v} \left\langle \sum_{j=1}^{n_v} \left[ \mathbb{1}_{v_j = +1} \ln p_\theta(v_j = +1 | \mathbf{h}) + \mathbb{1}_{v_j = -1} \ln p_\theta(v_j = -1 | \mathbf{h}) \right] \right\rangle_{\mathbf{v}\sim p_D},$$

where the indicator function $\mathbb{1}_A = 1$ if A is true and 0 if A is false, and $\mathbf{h} \sim p_\theta(\mathbf{h}|\mathbf{v})$ is the hidden encoding of the input configuration.
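A minimal sketch of this reconstruction cross-entropy for a single input configuration (array shapes as in the earlier sketches; not the released code) is:

```python
import numpy as np

def cross_entropy(v, W, b, c, rng):
    """Normalized CE between the input v and the reconstruction probabilities p(v_j | h)."""
    p_h_plus = 1.0 / (1.0 + np.exp(-2.0 * (c + W @ v)))       # p(h_i = +1 | v)
    h = np.where(rng.random(len(c)) < p_h_plus, 1, -1)        # hidden encoding of the input
    p_v_plus = 1.0 / (1.0 + np.exp(-2.0 * (b + W.T @ h)))     # p(v_j = +1 | h)
    p_observed = np.where(v == 1, p_v_plus, 1.0 - p_v_plus)   # probability of the actual v_j
    return -np.mean(np.log(p_observed))
```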
The gradient of the loss function is

$$\nabla_\theta \mathcal{L}(\theta) = \nabla_\theta \langle E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} + \nabla_\theta \ln Z_\theta = \langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} + \nabla_\theta \ln Z_\theta,$$

where

$$\nabla_\theta \ln Z_\theta = \frac{\nabla_\theta Z_\theta}{Z_\theta} = \frac{\nabla_\theta \sum_{\mathbf{v}} e^{-E_\theta(\mathbf{v})}}{Z_\theta} = \frac{\sum_{\mathbf{v}} \nabla_\theta e^{-E_\theta(\mathbf{v})}}{Z_\theta} = -\frac{\sum_{\mathbf{v}} e^{-E_\theta(\mathbf{v})} \nabla_\theta E_\theta(\mathbf{v})}{Z_\theta} = -\sum_{\mathbf{v}} p_\theta(\mathbf{v}) \nabla_\theta E_\theta(\mathbf{v}) = -\langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_\theta}.$$

Furthermore,

$$\nabla_\theta \mathcal{L}(\theta) = \langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} - \langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_\theta} = \text{(positive phase)} + \text{(negative phase)} = \text{(data term)} + \text{(model term)}.$$

In both the positive and negative phases,

$$\nabla_\theta E_\theta(\mathbf{v}) = \nabla_\theta \left[ -\mathbf{b}^T\mathbf{v} - \sum_{i=1}^{n_h} \ln\left( e^{-\mathbf{w}_i^T\mathbf{v} - c_i} + e^{\mathbf{w}_i^T\mathbf{v} + c_i} \right) \right],$$

which has components

$$\begin{aligned}
\frac{\partial E_\theta(\mathbf{v})}{\partial W_{ij}} &= -\frac{-v_j\, e^{-\mathbf{w}_i^T\mathbf{v} - c_i} + v_j\, e^{\mathbf{w}_i^T\mathbf{v} + c_i}}{e^{-\mathbf{w}_i^T\mathbf{v} - c_i} + e^{\mathbf{w}_i^T\mathbf{v} + c_i}} = -v_j \tanh\left( \mathbf{w}_i^T\mathbf{v} + c_i \right) \\
&= -v_j \left[ (-1)\, p_\theta(h_i = -1|\mathbf{v}) + (+1)\, p_\theta(h_i = +1|\mathbf{v}) \right] = -v_j\, \langle h_i \rangle_{h_i \sim p_\theta(h_i|\mathbf{v})}, \\
\frac{\partial E_\theta(\mathbf{v})}{\partial c_i} &= -\frac{-e^{-\mathbf{w}_i^T\mathbf{v} - c_i} + e^{\mathbf{w}_i^T\mathbf{v} + c_i}}{e^{-\mathbf{w}_i^T\mathbf{v} - c_i} + e^{\mathbf{w}_i^T\mathbf{v} + c_i}} = -\tanh\left( \mathbf{w}_i^T\mathbf{v} + c_i \right) = -\langle h_i \rangle_{h_i \sim p_\theta(h_i|\mathbf{v})}, \\
\frac{\partial E_\theta(\mathbf{v})}{\partial b_j} &= -v_j.
\end{aligned}$$
To evaluate the expectation value $\langle \nabla_\theta E_\theta(\mathbf{v}) \rangle$, in the positive phase $\mathbf{v}$ can be directly drawn from the dataset, while in the negative phase $\mathbf{v}$ must be sampled from the model distribution $p_\theta(\mathbf{v})$. In practice, as an approximation, the Markov chain Monte Carlo (MCMC) method is used to generate $\mathbf{v}$ states that obey the distribution $p_\theta(\mathbf{v})$, such that

$$\langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_\theta} \approx \frac{1}{\text{sample size}} \sum_{\mathbf{v}\sim p_\theta} \nabla_\theta E_\theta(\mathbf{v}).$$

Using the conditional probabilities $p_\theta(\mathbf{h}|\mathbf{v})$ and $p_\theta(\mathbf{v}|\mathbf{h})$, we can generate a sequence of states

$$\mathbf{v}^{(0)} \rightarrow \mathbf{h}^{(0)} \rightarrow \mathbf{v}^{(1)} \rightarrow \mathbf{h}^{(1)} \rightarrow \cdots \rightarrow \mathbf{v}^{(t)} \rightarrow \mathbf{h}^{(t)} \rightarrow \cdots.$$

As $t \to \infty$, the MCMC converges with $(\mathbf{v}^{(t)}, \mathbf{h}^{(t)}) \sim p_\theta(\mathbf{v}, \mathbf{h})$ and $\mathbf{v}^{(t)} \sim p_\theta(\mathbf{v})$.
The Markov chain starting from a random $\mathbf{v}^{(0)}$ takes many steps to equilibrate. There are two ways to speed up the sampling [68]:
  • k-step contrastive divergence (CD-k): for each parameter update, draw $\mathbf{v}^{(0)}$ (or a minibatch) from the training data $D = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_M]^T$ and run Gibbs sampling for k steps. Even CD-1 can work reasonably well.
  • Persistent contrastive divergence (PCD-k): keep the same Markov chain during the entire training process. For each parameter update, run this persistent chain for another k steps to collect $\mathbf{v}$ states.

References

  1. Carleo, G.; Cirac, I.; Cranmer, K.; Daudet, L.; Schuld, M.; Tishby, N.; Vogt-Maranto, L.; Zdeborová, L. Machine learning and the physical sciences. Rev. Mod. Phys. 2019, 91, 045002. [Google Scholar] [CrossRef] [Green Version]
  2. Bahri, Y.; Kadmon, J.; Pennington, J.; Schoenholz, S.S.; Sohl-Dickstein, J.; Ganguli, S. Statistical mechanics of deep learning. Annu. Rev. Condens. Matter Phys. 2020, 11, 501. [Google Scholar] [CrossRef] [Green Version]
  3. Lin, H.W.; Tegmark, M.; Rolnick, D. Why does deep and cheap learning work so well? J. Stat. Phys. 2017, 168, 1223–1247. [Google Scholar] [CrossRef] [Green Version]
  4. Ballard, A.J.; Das, R.; Martiniani, S.; Mehta, D.; Sagun, L.; Stevenson, J.D.; Wales, D.J. Energy landscapes for machine learning. Phys. Chem. Chem. Phys. 2017, 19, 12585–12603. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Zhang, Y.; Saxe, A.M.; Advani, M.S.; Lee, A.A. Energy–entropy competition and the effectiveness of stochastic gradient descent in machine learning. Mol. Phys. 2018, 116, 3214–3223. [Google Scholar] [CrossRef] [Green Version]
  6. Baity-Jesi, M.; Sagun, L.; Geiger, M.; Spigler, S.; Arous, G.B.; Cammarota, C.; LeCun, Y.; Wyart, M.; Biroli, G. Comparing dynamics: Deep neural networks versus glassy systems. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2018; pp. 314–323. [Google Scholar]
  7. Geiger, M.; Spigler, S.; d’Ascoli, S.; Sagun, L.; Baity-Jesi, M.; Biroli, G.; Wyart, M. Jamming transition as a paradigm to understand the loss landscape of deep neural networks. Phys. Rev. E 2019, 100, 012115. [Google Scholar] [CrossRef] [Green Version]
  8. Feng, Y.; Tu, Y. The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima. Proc. Natl. Acad. Sci. USA 2021, 118, e2015617118. [Google Scholar] [CrossRef]
  9. Roberts, D.A.; Yaida, S.; Hanin, B. The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks; Cambridge University Press: New York, NY, USA, 2022. [Google Scholar]
  10. Zdeborová, L.; Krzakala, F. Statistical physics of inference: Thresholds and algorithms. Adv. Phys. 2016, 65, 453–552. [Google Scholar] [CrossRef] [Green Version]
  11. Behler, J.; Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 2007, 98, 146401. [Google Scholar] [CrossRef]
  12. Carrasquilla, J.; Melko, R.G. Machine learning phases of matter. Nat. Phys. 2017, 13, 431–434. [Google Scholar] [CrossRef]
  13. Tibaldi, S.; Magnifico, G.; Vodola, D.; Ercolessi, E. Unsupervised and supervised learning of interacting topological phases from single-particle correlation functions. arXiv 2022, arXiv:2202.09281. [Google Scholar]
  14. Bapst, V.; Keck, T.; Grabska-Barwińska, A.; Donner, C.; Cubuk, E.D.; Schoenholz, S.S.; Obika, A.; Nelson, A.W.; Back, T.; Hassabis, D.; et al. Unveiling the predictive power of static structure in glassy systems. Nat. Phys. 2020, 16, 448–454. [Google Scholar] [CrossRef]
  15. Iten, R.; Metger, T.; Wilming, H.; del Rio, L.; Renner, R. Discovering physical concepts with neural networks. Phys. Rev. Lett. 2020, 124, 010508. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. Bedolla, E.; Padierna, L.C.; Castaneda-Priego, R. Machine learning for condensed matter physics. J. Phys. Condens. Matter 2020, 33, 053001. [Google Scholar] [CrossRef]
  17. Cichos, F.; Gustavsson, K.; Mehlig, B.; Volpe, G. Machine learning for active matter. Nat. Mach. Intell. 2020, 2, 94–103. [Google Scholar] [CrossRef] [Green Version]
  18. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [Green Version]
  19. Smolensky, P. Information processing in dynamical systems: Foundations of harmony theory. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition; MIT Press: Cambridge, MA, USA, 1986; pp. 194–281. [Google Scholar]
  20. Sherrington, D.; Kirkpatrick, S. Solvable model of a spin-glass. Phys. Rev. Lett. 1975, 35, 1792. [Google Scholar] [CrossRef]
  21. Ackley, D.H.; Hinton, G.E.; Sejnowski, T.J. A learning algorithm for Boltzmann machines. Cogn. Sci. 1985, 9, 147–169. [Google Scholar] [CrossRef]
  22. Cocco, S.; Monasson, R. Adaptive cluster expansion for inferring Boltzmann machines with noisy data. Phys. Rev. Lett. 2011, 106, 090601. [Google Scholar] [CrossRef] [Green Version]
  23. Aurell, E.; Ekeberg, M. Inverse Ising inference using all the data. Phys. Rev. Lett. 2012, 108, 090201. [Google Scholar] [CrossRef] [Green Version]
  24. Nguyen, H.C.; Zecchina, R.; Berg, J. Inverse statistical problems: From the inverse Ising problem to data science. Adv. Phys. 2017, 66, 197–261. [Google Scholar] [CrossRef]
  25. Huang, L.; Wang, L. Accelerated Monte Carlo simulations with restricted Boltzmann machines. Phys. Rev. B 2017, 95, 035105. [Google Scholar] [CrossRef] [Green Version]
  26. Carleo, G.; Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 2017, 355, 602–606. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  27. Melko, R.G.; Carleo, G.; Carrasquilla, J.; Cirac, J.I. Restricted Boltzmann machines in quantum physics. Nat. Phys. 2019, 15, 887–892. [Google Scholar] [CrossRef]
  28. Yu, W.; Liu, Y.; Chen, Y.; Jiang, Y.; Chen, J.Z. Generating the conformational properties of a polymer by the restricted Boltzmann machine. J. Chem. Phys. 2019, 151, 031101. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  29. Mehta, P.; Schwab, D.J. An exact mapping between the variational renormalization group and deep learning. arXiv 2014, arXiv:1410.3831. [Google Scholar]
  30. Chen, J.; Cheng, S.; Xie, H.; Wang, L.; Xiang, T. Equivalence of restricted Boltzmann machines and tensor network states. Phys. Rev. B 2018, 97, 085104. [Google Scholar] [CrossRef] [Green Version]
  31. Salazar, D.S. Nonequilibrium thermodynamics of restricted Boltzmann machines. Phys. Rev. E 2017, 96, 022131. [Google Scholar] [CrossRef] [Green Version]
  32. Decelle, A.; Fissore, G.; Furtlehner, C. Thermodynamics of restricted Boltzmann machines and related learning dynamics. J. Stat. Phys. 2018, 172, 1576–1608. [Google Scholar] [CrossRef] [Green Version]
  33. Decelle, A.; Furtlehner, C. Restricted Boltzmann machine: Recent advances and mean-field theory. Chin. Phys. B 2021, 30, 040202. [Google Scholar] [CrossRef]
  34. LeCun, Y. A path towards autonomous machine intelligence. Openreview 2022. Available online: https://openreview.net/forum?id=BZ5a1r-kVsf (accessed on 17 October 2022).
  35. Torlai, G.; Melko, R.G. Learning thermodynamics with Boltzmann machines. Phys. Rev. B 2016, 94, 165134. [Google Scholar] [CrossRef] [Green Version]
  36. Morningstar, A.; Melko, R.G. Deep Learning the Ising Model Near Criticality. J. Mach. Learn. Res. 2018, 18, 1–17. [Google Scholar]
  37. D’Angelo, F.; Böttcher, L. Learning the Ising model with generative neural networks. Phys. Rev. Res. 2020, 2, 023266. [Google Scholar] [CrossRef]
  38. Iso, S.; Shiba, S.; Yokoo, S. Scale-invariant feature extraction of neural network and renormalization group flow. Phys. Rev. E 2018, 97, 053304. [Google Scholar] [CrossRef] [Green Version]
  39. Funai, S.S.; Giataganas, D. Thermodynamics and feature extraction by machine learning. Phys. Rev. Res. 2020, 2, 033415. [Google Scholar] [CrossRef]
  40. Koch, E.D.M.; Koch, R.D.M.; Cheng, L. Is deep learning a renormalization group flow? IEEE Access 2020, 8, 106487–106505. [Google Scholar] [CrossRef]
  41. Veiga, R.; Vicente, R. Restricted Boltzmann Machine Flows and The Critical Temperature of Ising models. arXiv 2020, arXiv:2006.10176. [Google Scholar]
  42. Funai, S.S. Feature extraction of machine learning and phase transition point of Ising model. arXiv 2021, arXiv:2111.11166. [Google Scholar]
  43. Wang, L. Discovering phase transitions with unsupervised learning. Phys. Rev. B 2016, 94, 195105. [Google Scholar] [CrossRef] [Green Version]
  44. Hu, W.; Singh, R.R.; Scalettar, R.T. Discovering phases, phase transitions, and crossovers through unsupervised machine learning: A critical examination. Phys. Rev. E 2017, 95, 062122. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  45. Wetzel, S.J. Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders. Phys. Rev. E 2017, 96, 022140. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  46. Tanaka, A.; Tomiya, A. Detection of phase transition via convolutional neural networks. J. Phys. Soc. Jpn. 2017, 86, 063001. [Google Scholar] [CrossRef] [Green Version]
  47. Kashiwa, K.; Kikuchi, Y.; Tomiya, A. Phase transition encoded in neural network. Prog. Theor. Exp. Phys. 2019, 2019, 083A04. [Google Scholar] [CrossRef] [Green Version]
  48. Cipra, B.A. An introduction to the Ising model. Am. Math. Mon. 1987, 94, 937–959. [Google Scholar] [CrossRef]
  49. Newman, M.E.J.; Barkema, G.T. Monte Carlo Methods in Statistical Physics; Oxford University: Oxford, UK, 1999. [Google Scholar]
  50. Kramers, H.A.; Wannier, G.H. Statistics of the Two-Dimensional Ferromagnet: Part I. Phys. Rev. 1941, 60, 252–262. [Google Scholar] [CrossRef]
  51. Onsager, L. Crystal statistics. I. A two-dimensional model with an order-disorder transition. Phys. Rev. 1944, 65, 117. [Google Scholar] [CrossRef]
  52. Yang, C.N. The Spontaneous Magnetization of a Two-Dimensional Ising Model. Phys. Rev. 1952, 85, 808–816. [Google Scholar] [CrossRef]
  53. Plischke, M.; Bergersen, B. Equilibrium Statistical Physics; World Scientific: Singapore, 1994. [Google Scholar]
  54. Landau, D.; Binder, K. A Guide to Monte Carlo Simulations in Statistical Physics; Cambridge University Press: New York, NY, USA, 2021. [Google Scholar]
  55. Fischer, A.; Igel, C. An introduction to restricted Boltzmann machines. In Proceedings of the Iberoamerican Congress on Pattern Recognition, Havana, Cuba, 28–31 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 14–36. [Google Scholar]
  56. Oh, S.; Baggag, A.; Nha, H. Entropy, free energy, and work of restricted boltzmann machines. Entropy 2020, 22, 538. [Google Scholar] [CrossRef]
  57. Huang, H.; Toyoizumi, T. Advanced mean-field theory of the restricted Boltzmann machine. Phys. Rev. E 2015, 91, 050101(R). [Google Scholar] [CrossRef] [Green Version]
  58. Cossu, G.; Del Debbio, L.; Giani, T.; Khamseh, A.; Wilson, M. Machine learning determination of dynamical parameters: The Ising model case. Phys. Rev. B 2019, 100, 064304. [Google Scholar] [CrossRef] [Green Version]
  59. Besag, J. Statistical analysis of non-lattice data. J. R. Stat. Soc. Ser. D 1975, 24, 179–195. [Google Scholar] [CrossRef] [Green Version]
  60. LISA. Deep Learning Tutorials. 2018. Available online: https://github.com/lisa-lab/DeepLearningTutorials (accessed on 1 August 2022).
  61. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, Sardinia, Italy, 13–15 May 2010; pp. 249–256. [Google Scholar]
  62. Rao, W.J.; Li, Z.; Zhu, Q.; Luo, M.; Wan, X. Identifying product order with restricted Boltzmann machines. Phys. Rev. B 2018, 97, 094207. [Google Scholar] [CrossRef] [Green Version]
  63. Wu, D.; Wang, L.; Zhang, P. Solving statistical mechanics using variational autoregressive networks. Phys. Rev. Lett. 2019, 122, 080602. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  64. Nicoli, K.A.; Nakajima, S.; Strodthoff, N.; Samek, W.; Müller, K.R.; Kessel, P. Asymptotically unbiased estimation of physical observables with neural samplers. Phys. Rev. E 2020, 101, 023304. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  65. Yevick, D.; Melko, R. The accuracy of restricted Boltzmann machine models of Ising systems. Comput. Phys. Commun. 2021, 258, 107518. [Google Scholar] [CrossRef]
  66. Ferrenberg, A.M.; Landau, D.P. Critical behavior of the three-dimensional Ising model: A high-resolution Monte Carlo study. Phys. Rev. B 1991, 44, 5081. [Google Scholar] [CrossRef] [PubMed]
  67. Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Boston, MA, USA, 2012. [Google Scholar]
  68. Hinton, G.E. A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 2012; pp. 599–619. [Google Scholar]
Figure 1. A restricted Boltzmann machine (RBM) with $n_h = 6$ hidden units and $n_v = 9$ visible units. Model parameters $\theta = \{W, \mathbf{b}, \mathbf{c}\}$ are represented by connections. The filter $\mathbf{w}_1^T$ from the visible units to the first hidden unit is highlighted by red (light color) connections.
Figure 2. Probability density function (PDF) of the distributions of (a) $W_{ij}$, (b) $b_j$, and (c) $c_i$ of T-RBMs with $n_h = 400$ hidden units for the 2d Ising model at temperatures below, close to, and above $T_c$.
Figure 3. T-RBMs with n_h = 400 for the 2d Ising model at temperatures T = 1.0, 2.25, and 3.5. (a) Five sample filters w_i^T at each temperature. The color bar range is set to within about two standard deviations of the distribution. (b) PDF of the distribution of the n_h = 400 filter sums (normalized by n_v). Inset: variance ⟨(∑_{j=1}^{n_v} W_ij)²⟩ − ⟨∑_{j=1}^{n_v} W_ij⟩² of the filter sum as a function of temperature. (c) PDF of the distribution of the n_v = 4096 inverse filter sums (normalized by n_h). Inset: correlation between a pair of inverse filters w_{:,j} and w_{:,j′} (normalized by the auto-correlation) as a function of spin–spin distance r_{jj′}.
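The filter-sum statistics of Figure 3b,c follow directly from the learned weight matrix. The minimal numpy sketch below shows one way to compute them; the array shape convention (W of shape n_h × n_v, so that rows are filters and columns are inverse filters) and the normalizations by n_v and n_h are assumptions based on the caption, not the authors' code.

```python
import numpy as np

def filter_statistics(W):
    """Filter sums, inverse filter sums, and the filter-sum variance.

    W is assumed to be the learned weight matrix of shape (n_h, n_v),
    so row i is the filter w_i^T and column j is the inverse filter w_{:,j}.
    """
    n_h, n_v = W.shape
    filter_sums = W.sum(axis=1) / n_v          # n_h values, normalized by n_v
    inverse_filter_sums = W.sum(axis=0) / n_h  # n_v values, normalized by n_h
    # Variance of the (unnormalized) filter sum over the n_h filters:
    # <s^2> - <s>^2 with s_i = sum_j W_ij
    s = W.sum(axis=1)
    variance = np.mean(s ** 2) - np.mean(s) ** 2
    return filter_sums, inverse_filter_sums, variance
```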
Figure 4. The T -RBM with n_h = 400 for the 2d Ising model. (a) Four sample filters w_i^T. (b) PDF of the distribution of W_ij, b_j, and c_i. (c) PDF of the distribution of the n_h = 400 filter sums and the n_v = 4096 inverse filter sums.
Figure 5. Histogram of m_h obtained from the hidden encodings of M = 50,000 2d Ising configurations at T = 2.25 using T-RBMs with various n_h. Inset: examples of the hidden layer of T-RBMs with n_h = 400 wrapped into a 20 × 20 matrix at three temperatures, where +1/−1 units are represented by black/white pixels.
Figure 6. (a) Internal energy, (b) magnetization, and (c) specific heat of 2d Ising states reconstructed by T-RBMs (n_h = 400) with the hidden layer h^(0) initialized according to p_h = (|m| + 1)/2 or p_h = 1.0 (T ≤ 2.0), 0.5 (T ≥ 2.5), 0.75 (2.0 < T < 2.5) (stepwise). Reconstruction by a seven-step Markov chain from a random h^(0) is shown for comparison (v^(7)). Analytical and Monte Carlo simulation results are also shown.
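The reconstruction procedure behind Figure 6 can be pictured as initializing the hidden layer from a Bernoulli probability p_h and then alternately sampling visible and hidden spins from the RBM conditionals. The sketch below is a minimal illustration with ±1 units; the conditional p(v_j = +1 | h) = σ(2(∑_i W_ij h_i + b_j)) assumes the standard bilinear RBM energy with ±1 units, and the step count, update order, and names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct(W, b, c, p_h, n_steps=1, rng=None):
    """Sample a visible configuration from an RBM with +/-1 units.

    Assumes the energy E = -h.W.v - b.v - c.h, so that
    p(v_j=+1|h) = sigmoid(2*(W^T h + b)_j) and p(h_i=+1|v) = sigmoid(2*(W v + c)_i).
    p_h is the probability that each hidden unit is initialized to +1.
    """
    rng = rng or np.random.default_rng()
    n_h, n_v = W.shape
    h = np.where(rng.random(n_h) < p_h, 1.0, -1.0)   # h^(0)
    for _ in range(n_steps):                          # block Gibbs updates
        v = np.where(rng.random(n_v) < sigmoid(2.0 * (W.T @ h + b)), 1.0, -1.0)
        h = np.where(rng.random(n_h) < sigmoid(2.0 * (W @ v + c)), 1.0, -1.0)
    return v

# e.g., magnetization-based initialization p_h = (|m| + 1)/2 for a target |m|,
# or n_steps = 7 with p_h = 0.5 for the seven-step chain from a random h^(0)
```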
Figure 7. Mean and variance of the visible energy E_θ as a function of temperature for the 2d (a,c,d,f) and 3d (b,e) Ising models captured by T-RBMs (a,b,d,e) and the T -RBM (c,f) with various numbers n_h of hidden units. Three approximate forms of the visible energy for n_h = 400 T-RBMs are shown in (a).
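For reference, the visible energy E_θ plotted in Figure 7 is obtained by tracing out the hidden layer. The sketch below gives the standard marginalized expression for an RBM with ±1 hidden units, E_θ(v) = −b·v − ∑_i log 2cosh((W v + c)_i); this form and the shapes are assumptions meant only to indicate how the mean and variance over a sample of configurations would be computed, not to reproduce the paper's three approximate forms.

```python
import numpy as np

def visible_energy(W, b, c, V):
    """Visible energy E_theta(v) with the hidden layer traced out.

    Assumes +/-1 hidden units and the bilinear energy used above, giving
    E_theta(v) = -b.v - sum_i log(2*cosh((W v + c)_i)).
    V holds configurations with shape (M, n_v); returns M energies.
    """
    A = V @ W.T + c                                   # shape (M, n_h)
    return -V @ b - np.sum(np.log(2.0 * np.cosh(A)), axis=1)

# Mean and variance over a sample of configurations, as in Figure 7:
# E = visible_energy(W, b, c, V)
# E_mean, E_var = E.mean(), E.var()
```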
Figure 8. Pseudo-likelihood L̃ per spin of T-RBMs (a,b) and of the T -RBM (c) with different numbers n_h of hidden units for the 2d (a,c) and 3d (b) Ising models, in comparison with the entropy S and the pseudo-entropy S̃ per spin. Dashed lines are polynomial fits around T_c.
Figure 9. T dL̃/dT per spin of T-RBMs (a,b) and of the T -RBM (c) with different numbers n_h of hidden units for the 2d (a,c) and 3d (b) Ising models, in comparison with T dS/dT and T dS̃/dT per spin, as well as the specific heat c_V calculated from Monte Carlo simulation.
Table 1. When sum(w_i^T) > 0, a visible layer pattern v with magnetization m > 0 (or m < 0) is more likely to be encoded by a hidden unit h_i = +1 (or h_i = −1). When sum(w_i^T) < 0, the encoding is reversed.
         sum(w_i^T) > 0    sum(w_i^T) < 0
m > 0    h_i = +1          h_i = −1
m < 0    h_i = −1          h_i = +1
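The encoding rule of Table 1 can be checked directly from a trained weight matrix. The minimal sketch below assumes ±1 units with p(h_i = +1 | v) = σ(2(∑_j W_ij v_j + c_i)), so that the most likely hidden sign is sign((W v + c)_i); the function name and the example configuration are illustrative only.

```python
import numpy as np

def expected_hidden_signs(W, c, v):
    """Most likely sign of each hidden unit given a visible configuration v.

    Assumes +/-1 units and p(h_i=+1|v) = sigmoid(2*(W v + c)_i), so the most
    likely sign is sign((W v + c)_i).  For a strongly magnetized v (m > 0),
    hidden units with sum(w_i^T) > 0 tend to be +1 and those with
    sum(w_i^T) < 0 tend to be -1, reproducing Table 1.
    """
    return np.sign(W @ v + c)

# Example: an all-up visible configuration (m = +1)
# n_h, n_v = W.shape
# signs = expected_hidden_signs(W, c, np.ones(n_v))
# agreement = np.mean(signs == np.sign(W.sum(axis=1)))  # close to 1 if Table 1 holds
```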
Table 2. T_c estimated from the peak of T dL̃/dT obtained from single-temperature T-RBMs and from the all-temperature T -RBM with different numbers n_h of hidden units. Predictions from the numerical derivatives T dS/dT and T dS̃/dT are also shown for comparison. Results extracted from the peak of c_V obtained by Monte Carlo simulations of finite systems are listed under "MC". "Exact" refers to analytical or numerical results for infinite systems.
Model        n_h = 400   n_h = 900   n_h = 1600   n_h = 2500   S       S̃       MC     Exact
2d T-RBM     2.240       2.291       2.316        2.367        2.267   2.367   2.28   2.269
2d T -RBM    2.189       2.163       2.214        -            2.267   2.367   2.28   2.269
3d T-RBM     4.444       4.434       4.444        -            4.390   4.383   4.44   4.511
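A hedged sketch of how estimates like those in Table 2 can be obtained is given below: evaluate the pseudo-likelihood L̃ (or the entropy S) per spin on a temperature grid, form T dL̃/dT by finite differences, and take the temperature at the peak as the T_c estimate. The grid, window, and polynomial smoothing are illustrative assumptions, not the authors' exact procedure (the paper reports polynomial fits around T_c).

```python
import numpy as np

def tc_from_peak(temperatures, L_tilde, poly_degree=4):
    """Estimate T_c as the location of the peak of T * dL~/dT.

    temperatures : 1d array of T values, assumed sorted in increasing order.
    L_tilde      : pseudo-likelihood (or entropy) per spin at each T.
    A low-order polynomial fit in a window around the raw peak smooths the
    finite-difference derivative before the maximum is located.
    """
    T = np.asarray(temperatures, dtype=float)
    dL_dT = np.gradient(np.asarray(L_tilde, dtype=float), T)
    signal = T * dL_dT
    i = int(np.argmax(signal))                    # raw peak
    lo, hi = max(0, i - 3), min(len(T), i + 4)    # local fitting window
    coeffs = np.polyfit(T[lo:hi], signal[lo:hi], deg=min(poly_degree, hi - lo - 1))
    T_fine = np.linspace(T[lo], T[hi - 1], 1000)
    return T_fine[np.argmax(np.polyval(coeffs, T_fine))]
```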