Article

Inferring Cultural Landscapes with the Inverse Ising Model

by Victor Møller Poulsen 1 and Simon DeDeo 1,2,*
1 Department of Social and Decision Sciences, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
2 Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
* Author to whom correspondence should be addressed.
Entropy 2023, 25(2), 264; https://doi.org/10.3390/e25020264
Submission received: 13 December 2022 / Revised: 22 January 2023 / Accepted: 25 January 2023 / Published: 31 January 2023
(This article belongs to the Special Issue Modern Trends in Sociophysics)

Abstract

The space of possible human cultures is vast, but some cultural configurations are more consistent with cognitive and social constraints than others. This leads to a “landscape” of possibilities that our species has explored over millennia of cultural evolution. However, what does this fitness landscape, which constrains and guides cultural evolution, look like? The machine-learning algorithms that can answer these questions are typically developed for large-scale datasets. Applications to the sparse, inconsistent, and incomplete data found in the historical record have received less attention, and standard recommendations can lead to bias against marginalized, under-studied, or minority cultures. We show how to adapt the minimum probability flow algorithm and the Inverse Ising model, a physics-inspired workhorse of machine learning, to the challenge. A series of natural extensions—including dynamical estimation of missing data, and cross-validation with regularization—enables reliable reconstruction of the underlying constraints. We demonstrate our methods on a curated subset of the Database of Religious History: records from 407 religious groups throughout human history, ranging from the Bronze Age to the present day. This reveals a complex, rugged, landscape, with both sharp, well-defined peaks where state-endorsed religions tend to concentrate, and diffuse cultural floodplains where evangelical religions, non-state spiritual practices, and mystery religions can be found.

1. Introduction

If we want to understand the powers and potentials of the human species—the landscape of both what has been, and could be, done—we are driven to make comparisons across vast ranges of time and culture. In these cases, data is not only missing, but differentially missing [1]. To analyze, at the same time, a contemporary culture of the digital age, and one that vanished five thousand years ago, requires careful accounting. There is both the intellectual challenge of making best use of what information reaches us, and an ethical imperative to treat long-lost cultures—and marginalized, under-studied, or minority cultures that survive today—on an equal epistemic footing with the dominant, often “WEIRD” [2] ones, for whom data is both more abundant and more complete.
Appropriate modeling of small, and potentially biased, data is a challenge. Replacing missing values with “no” or “not present”, for example, is the fallacy of taking absence of evidence for evidence of absence. Replacing them with the median answer, or the best match, from the remainder of the data makes unfamiliar cultures clones of the ones we know. Replacing them with “a fifty-fifty mixture of present and absent” is not much better: it attributes the lack of knowledge in the observer to a lack of coherence in the original culture; because we do not know what they did, we assume they did not, either. All these challenges are exacerbated in the “small data” limit common in studies of cultural evolution—archives with hundreds of data points, rather than the millions on which machine-learning algorithms are usually trained and tested.
This paper addresses the challenge of inferring cultural landscapes in deep time [3,4]. We show how to extend a commonly used workhorse of machine learning—the Inverse Ising Problem with minimum probability flow [5]—to the kind of sparse, under-sampled, and potentially biased samples of the historical record. While standard approaches can give misleading answers, we show how a set of carefully constructed modifications and extensions can provide new ways to ask basic questions about the evolution of human culture. We then demonstrate the power, and potential, of cultural landscape construction with an analysis of a curated subset of the Database of Religious History (DRH) [4,6].

2. Methods

The goal of our analysis is the construction of a cultural landscape: a general model of what makes different cultural patterns more or less likely to appear in the course of time. To be more specific, imagine that we have a set of “characteristics”—aspects of a culture that we care about, and which can be represented with a binary answer such as YES or NO, TRUE or FALSE, PRESENT or ABSENT, and so on. A particular setting of all the answers is called a configuration, and a landscape model says, for any particular configuration, how likely it is to appear.
Depending on how the experts understand the questions, the landscape derived from it might characterize, on one extreme, the patterns of behavior that could emerge in an individual—or, on the other extreme, the kinds of patterns that entire societies might explore across the span of human history. In the case treated here, we have cross-cultural data on religious groups in different cultures and time periods from 10,000 BCE to the present day; one group characteristic we consider, for example, is “Are supernatural beings believed to mete out punishment?” while another is “does membership in this group require participation in small-scale (private, household) rituals?” and a third is “does membership in this religious group require sacrifice of children?”
A landscape model, could it be found, would be a powerful tool for systematic investigation of how societies compose these different characteristics together to form the foundation of a stable cultural practice. We might want to know, for example, whether a “yes” answer to a belief in punishing gods makes it more likely for the religion to rely on small-scale rituals, all other things being equal, and how this relationship might be mediated by the presence of extreme practices such as child sacrifice.
Being able to answer these questions would provide important empirical constraints to more fundamental models. One model, for example, might understand child sacrifice as an extreme example of costly signals of devotion in a social context, otherwise disconnected from the metaphysical account the religion provides about god, while another might see the practice as something that could only be conceivable against a particular conceptualization of the relationship between humankind, nature, and the transcendent (see, e.g., Ref. [7] for discussion). The two models will make different predictions of how the practice co-varies with other characteristics.
Answers to questions such as these cannot be simply read off from the data, however, because religions with and without a belief in supernatural punishment generically differ on a wide range of characteristics, all of which might impact a violent practice such as child sacrifice. The correct answer requires a comparison to a fiducial culture that differs in only one characteristic. The space of configurations expands exponentially, and probing fundamental questions requires knowledge not just of the religions we happen to have observed, but the larger, law-governed landscape of what combinations—including those never observed in human history—are more or less likely.
A landscape model allows us to investigate which features of a religion most strongly couple with others. It provides insight into how different aspects of a religion bundle together [8], with a small number of distinct patterns of yes/no answers, as might happen if religions were divided into (for example) Axial and pre-Axial types. It would even allow us to identify practices that have yet to emerge: unexplored regions of cultural-evolutionary space. A more prosaic, though no less important, use of a landscape model is to predict missing data. For a long-lost culture, for example, whose metaphysical beliefs are unknown, a landscape model can predict the probabilities of different combinations of epistemic commitments on the basis of its material culture.

2.1. From Physics to Machine Learning: An Introduction to the Inverse Ising Problem

Inferring such a fitness landscape from data requires us first to specify the structure of the landscape itself—the spectrum of ways in which it allows one aspect of a pattern to make other aspects more or less likely. In traditional approaches, such as logistic regression, one chooses, ahead of time, a small number of possible effects, based on an explicit model; with a hundred data points, for example, one might try to learn—estimate—three or four regression coefficients.
When learning a landscape, by contrast, the number of parameters is very large, often comparable to, or even very much larger than, the number of observations [9]. The particular model we consider in this paper is a very general form of a neural network known in the machine-learning literature as the “unrestricted Boltzmann machine”, and (in the physics literature) as the “inverse Ising problem” [10].
The inverse Ising model has been applied, with great success, to data ranging from neuroscience [11,12], the immune system [13], and the fitness landscapes of HIV [14], to animal behavior [15,16], political polarization and voting behavior [17,18], and linguistics [19]. It has also been used as a general model of generic complex cultural practices in cultural evolution [20]. In one common notation choice, the inverse Ising model says that the probability of observing a configuration i is
$$p_i = \frac{\exp E_i(\theta)}{Z(\theta)}, \tag{1}$$
where $\theta$ are the parameters (to be estimated); $Z(\theta)$, traditionally called the “partition function”, is the normalization constant; and the “energy”, $E_i(\theta)$, of a particular configuration is given by
$$E_i(\theta) = \sum_{a,b;\, a>b} J_{ab}\, \sigma_a \sigma_b + \sum_a h_a \sigma_a, \tag{2}$$
where $\sigma_a$ is the truth value of the $a$th entry in configuration $i$; by convention, YES is $+1$, and NO is $-1$; there are $n(n-1)/2$ of the “J” parameters (the “pairwise couplings”), and $n$ of the “h” parameters (the “local fields”).
In general, physicists take the J and h values (or the probability distributions they are drawn from) as given, and try to understand the properties of the resulting distribution [21]. The converse problem, which we consider here, is to infer the “best fit” J and h that can predict the observed frequencies of the occurrence of different configurations in a dataset.
As first noted by E.T. Jaynes [22], the form of Equation (2) means that, properly estimated, $p$ is the distribution with maximum entropy that, at the same time, matches the observed means and pairwise correlations; i.e., those found by averaging over all the observed vectors, $\sigma_d$, in the dataset $\mathcal{D}$,
$$\sum_i \sigma_a\, p(i) = \frac{1}{|\mathcal{D}|} \sum_{d \in \mathcal{D}} \sigma_{a,d} \quad \text{and} \quad \sum_i \sigma_a \sigma_b\, p(i) = \frac{1}{|\mathcal{D}|} \sum_{d \in \mathcal{D}} \sigma_{a,d}\, \sigma_{b,d}. \tag{3}$$
Such models embody a kind of inverted form of Occam’s Razor: make the model just sophisticated enough to explain only the least complicated features of the data at hand, leaving everything else maximally undetermined. Surprisingly enough, this works: as has been repeatedly discovered, higher order correlations often “come along for the ride”, emerging spontaneously when the pairwise constraints of Equation (3) are satisfied [12,23,24]. Despite its simplicity, Equation (2) can capture a great deal of the real variability in complex systems, and many of the most celebrated successes of machine learning are, at heart, adaptations of this insight [25].
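The energy and probability definitions above can be made concrete for a very small system, where all $2^n$ configurations can be enumerated directly. The following is a minimal sketch, not the authors' implementation: it assumes the sign convention $p_i \propto \exp(+E_i)$, and the function names and toy parameter values are hypothetical.

```python
import itertools
import math

def energy(sigma, J, h):
    """E = sum_{a>b} J[a][b]*s_a*s_b + sum_a h[a]*s_a, with s in {-1, +1}."""
    n = len(sigma)
    pair = sum(J[a][b] * sigma[a] * sigma[b] for a in range(n) for b in range(a))
    field = sum(h[a] * sigma[a] for a in range(n))
    return pair + field

def ising_distribution(J, h):
    """Exact p_i = exp(E_i) / Z over all 2^n configurations (small n only)."""
    n = len(h)
    configs = list(itertools.product([-1, 1], repeat=n))
    weights = [math.exp(energy(s, J, h)) for s in configs]
    Z = sum(weights)  # the partition function
    return {s: w / Z for s, w in zip(configs, weights)}

# Hypothetical toy parameters for n = 3; only entries J[a][b] with a > b are used.
J = [[0.0, 0.0, 0.0],
     [0.8, 0.0, 0.0],
     [-0.3, 0.1, 0.0]]
h = [0.2, 0.0, -0.1]

p = ising_distribution(J, h)
# Model-side moments, i.e. the left-hand sides of the matching constraints.
mean_0 = sum(s[0] * prob for s, prob in p.items())
corr_01 = sum(s[0] * s[1] * prob for s, prob in p.items())
```

Exhaustive enumeration is only feasible for small $n$; it is used here purely to show how the couplings and local fields translate into probabilities and moments.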

2.2. Minimum Probability Flow

Finding the values of $J$ and $h$ that satisfy Equation (3) is exponentially hard, because it requires averaging over all $2^n$ configurations in the probability distribution, Equation (1). We can rephrase the problem, however, as trying to find the Ising-model distribution, $p_i(J_{ab}, h_a)$, that best fits the true (or “data”) distribution, $p_i$, where the “best fit” is the one that minimizes the Kullback–Leibler divergence,
$$K(\theta) = \sum_{i \in \mathcal{C}} p_i \log_2 \frac{p_i}{p_i(\theta)}, \tag{4}$$
where $\mathcal{C}$ is the (exponentially large) set of all $2^n$ configurations. The $\theta$ that minimizes Equation (4) produces a $p_i(\theta)$ which is minimally distinguishable, in a basic information-theoretic sense, from the true distribution $p_i$.
Minimizing Equation (4) directly, however, still requires multiple sums over $\mathcal{C}$. The insight of MPF [5] is that, given a collection of observed configurations, $\mathcal{D}$, Equation (4) can be approximated by minimizing the “probability flow”. When a parameter choice $\theta$ is a poor match to the data, probability tends to flow “away” from data states to non-data states. Up to constant factors, we can approximate Equation (4) as
$$K(\theta) = \sum_{j \in \mathcal{D}} \sum_{i \in \mathcal{N} \setminus \mathcal{D}} \Gamma_{ij}(\theta), \tag{5}$$
where $\Gamma_{ij}(\theta)$ is the rate of flow from state $j$ to state $i$ for parameter choice $\theta$, and $\mathcal{N}$ is a set of “neighbouring” non-data configurations. Minimizing Equation (5) is a tractable task; in contrast to Equation (4), the sums are no longer over $\mathcal{C}$, but a radically smaller set of observed data, $\mathcal{D}$, and a well-chosen $\mathcal{N}$. MPF is related to a basic method in machine learning, contrastive divergence [26], with the principal advantage, for our purposes, of having a well-defined, epistemically principled, objective function.

2.3. Improvements and Extensions to the MPF Algorithm

In this section, we present a series of improvements and extensions to the basic MPF algorithm. These include both apparently minor, but critical, variations in the basic algorithm, and a new extension and derivation. We are particularly grateful to the authors of ConIII [27], whose implementation, and clear discussion, of MPF enabled us to debug and test our own code.
Section 2.3.1 and Section 2.3.2 present a pair of improvements to the basic algorithm; these provide significant boosts in performance and accuracy on sparse social and cultural data. Section 2.3.3 shows how to handle inconsistencies between different observers (or inconsistencies within the same observer), and Section 2.3.4 shows how the same tools also allow us to account for uneven sampling in time or space. Finally, Section 2.3.5 describes a novel extension to the MPF algorithm, Partial-MPF, which enables us to handle missing data in a principled fashion.

2.3.1. Nearest-Neighbour Sampling

In the original version of the MPF algorithm, flow is computed from the observed configurations (“data states”) to a subset of other configurations, explicitly excluding flow into any other data states. It is equally valid, under the MPF approximation, to allow flow into states that do appear elsewhere in the data; this can be seen at line A-6 of Ref. [5], where you can interchange the order of the derivative and the summation. This alternative choice is the default under ConIII.
Our experiments find that the alternative choice provides greatly improved out-of-sample performance, because the exclusion biases the algorithm against configurations near a metastable peak. With this change in hand, the function to be minimized is
$$K(\theta) = \sum_{j \in \mathcal{D}} \sum_{i \in \mathcal{N}(j)} \Gamma_{ij}(\theta). \tag{6}$$
A natural choice is to set $\mathcal{N}(j)$ to include states within a certain Hamming distance of $j$; the original MPF paper considered states that differed from the data state at one position, i.e., $\mathcal{N}_1(j)$; we also consider a strategy which uses states up to two ($\mathcal{N}_2(j)$) Hamming units away. Since $|\mathcal{N}(j)|$ is the same for all $j$, this provides equal weighting to all data states. (It is also possible to consider randomly chosen neighbours; however, this tends to give significantly decreased performance; the MPF algorithm performs best when it is allowed to focus on reasonably nearby variations from the observations.)
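The nearest-neighbour objective can be sketched for the single-flip neighbourhood. The text does not spell out the flow rate, so this sketch assumes the standard MPF form, which under the convention $p_i \propto \exp(+E_i)$ reads $\Gamma_{ij} = \exp[(E_i - E_j)/2]$. The function names, the symmetric storage of `J`, and the toy data are all hypothetical.

```python
import math

def local_field(sigma, J, h, a):
    """h_a + sum_b J[a][b] * s_b, with J stored as a symmetric matrix."""
    return h[a] + sum(J[a][b] * sigma[b] for b in range(len(sigma)) if b != a)

def mpf_objective(data, J, h):
    """Sum of assumed flow rates from each data state to its single-flip neighbours.

    Neighbours that also occur in the data are NOT excluded, matching the
    alternative choice described in the text.
    """
    total = 0.0
    for sigma in data:
        for a in range(len(sigma)):
            # Flipping spin a changes the energy by dE = E_i - E_j = -2 * s_a * field_a.
            dE = -2.0 * sigma[a] * local_field(sigma, J, h, a)
            total += math.exp(0.5 * dE)
    return total

# Hypothetical toy data: two questions that always agree.
data = [(1, 1), (1, 1), (-1, -1)]
h0 = [0.0, 0.0]
J_zero = [[0.0, 0.0], [0.0, 0.0]]
J_pos = [[0.0, 1.0], [1.0, 0.0]]
k_zero = mpf_objective(data, J_zero, h0)
k_pos = mpf_objective(data, J_pos, h0)
```

A positive coupling lowers the objective for this data, because it makes the observed (agreeing) states more probable than their one-flip neighbours, so less probability flows away from the data.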

2.3.2. Regularization Constraint

Minimizing Equation (6) is equivalent to (attempting to) maximize the log-likelihood, i.e., the log-probability of the data given the model. A proper Bayesian analysis, however, should include not just the likelihood, but a prior over the parameters themselves,
$$K_\lambda(\theta) = K(\theta) - \lambda\, |\mathcal{D}|\, |\mathcal{N}| \log P(\theta), \tag{7}$$
where $\lambda$ is a constant, and $P(\theta)$ is the probability of a particular choice for $J$ and $h$.
It is natural to choose P ( θ ) so that, all other things being equal, smaller values are preferred; this is sometimes known as a regularization penalty, which often provides significant benefits to out-of-sample prediction [28]. Without regularization, models tend to overfit, producing unreasonably low probabilities for configurations that happen not to appear in the data.
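If we assume a Gaussian prior on $J$ and $h$, which leads to what is sometimes known as an $L_2$-norm penalty, we have
$$K_\lambda(\theta) = K(\theta) + \lambda\, |\mathcal{D}|\, |\mathcal{N}| \sum_{k=1}^{N_p} \frac{\theta_k^2}{2}, \tag{8}$$
where $N_p$ is the number of parameters, and the value of $\lambda$ encodes the variance in the Gaussian; a larger $\lambda$ corresponds to a smaller variance.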
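The step from a Gaussian prior to a quadratic penalty can be written out explicitly. The short derivation below is a sketch under stated assumptions: the prior is taken to be an independent, zero-mean Gaussian with a common standard deviation $s$ (a symbol introduced here for illustration), and the prefactor is read as the product $\lambda |\mathcal{D}| |\mathcal{N}|$.

```latex
P(\theta) = \prod_{k=1}^{N_p} \frac{1}{\sqrt{2\pi s^2}}
            \exp\!\left(-\frac{\theta_k^2}{2 s^2}\right)
\quad\Longrightarrow\quad
-\log P(\theta) = \frac{1}{s^2} \sum_{k=1}^{N_p} \frac{\theta_k^2}{2}
                  + \frac{N_p}{2} \log\!\left(2\pi s^2\right).

% The last term does not depend on \theta and can be dropped when minimizing.
% Absorbing 1/s^2 into \lambda gives the penalty
-\lambda\, |\mathcal{D}|\, |\mathcal{N}| \log P(\theta)
\;\longrightarrow\;
\lambda\, |\mathcal{D}|\, |\mathcal{N}| \sum_{k=1}^{N_p} \frac{\theta_k^2}{2}.
```

Because $1/s^2$ is absorbed into $\lambda$, a larger $\lambda$ corresponds to a narrower prior, which is the relationship stated in the text.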
The optimal choice for $\lambda$ depends on $P(\theta)$, which is, in general, unknown. It can be estimated, however, by cross-validation: if there are $m$ datapoints, fit the data using $m - k$ datapoints (the training set), and compute the log-likelihood for the remaining $k$ datapoints (the test set). In this paper, we take $k$ equal to one, i.e., leave-one-out cross-validation. Repeating this for all possible choices of the left-out observation, and then averaging the result, allows us to estimate the performance of the fit as a function of $\lambda$.

2.3.3. Inconsistent Data

In some cases (for example, in about 17% of religious groups in the DRH data used below) we have inconsistent coding, where multiple, incompatible answers exist for the same configuration. This can emerge when different observers interpret a question, or evidence, in different ways, or have different examples in mind. In the DRH, it most commonly appears when the same observer flags a feature as less straightforward than it appears; for example, “Iban traditional religion” is inconsistently coded for whether the religion had scriptures, with the coder citing it as a “borderline case” and answering both “yes” and “no”. Another example is “Unitarian Universalism” (UU), where the same observer coded belief in afterlife as both “yes” and “no”, noting that some UUs do, and some do not, believe in an afterlife. A proper accounting of the landscape ought to allow for both.
To make explicit use of inconsistent data requires an error model, and there are two natural choices. Consider, as an example, two observers who provide inconsistent answers, for the same system, to three binary questions: $j_1$ gives $\{1, 1, 0\}$, while $j_2$ gives $\{1, 0, 1\}$. If we assume that, for each observer, their best answer to one question is dependent upon all the others, we can include both records, with a weighting term, $w_j$, which captures the epistemic uncertainty,
$$K(\theta) = \sum_{j \in \mathcal{D}} \sum_{i \in \mathcal{N}(j)} w_j\, \Gamma_{ij}, \tag{9}$$
where $w_{j_1} = w_{j_2} = 1/2$. Alternatively, one can take inconsistencies as evidence of uncertainty question by question (the “independent” model). Then we interpret the observations $j_1$ and $j_2$ as indicating that observers are, in general, uncertain about the answers to questions two and three, each with an independent probability of “yes” equal to $1/2$. In this case, one includes not only the observed records ($r_1 = \{1, 1, 0\}$, $r_2 = \{1, 0, 1\}$) but also the unreported combinations $r_3 = \{1, 1, 1\}$ and $r_4 = \{1, 0, 0\}$, each with weight $1/4$.
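The two error models can be illustrated by expanding a set of inconsistent records into weighted records. This is a minimal sketch using the 1/0 coding of the example above; the function names are hypothetical, and the independent model here estimates each question's probability of “yes” as the fraction of records answering 1, which is an assumption that reproduces the $1/2$ of the two-observer example.

```python
import itertools

def expand_dependent(records):
    """Dependent model: keep each reported record, sharing the weight equally."""
    w = 1.0 / len(records)
    out = {}
    for r in records:
        out[tuple(r)] = out.get(tuple(r), 0.0) + w
    return out

def expand_independent(records):
    """Independent model: treat each disputed question as its own coin flip."""
    n = len(records[0])
    p_yes = [sum(r[q] for r in records) / len(records) for q in range(n)]
    disputed = [q for q in range(n) if 0.0 < p_yes[q] < 1.0]
    out = {}
    for values in itertools.product([0, 1], repeat=len(disputed)):
        combo = [int(p_yes[q]) for q in range(n)]  # agreed answers are 0 or 1
        w = 1.0
        for q, v in zip(disputed, values):
            combo[q] = v
            w *= p_yes[q] if v == 1 else 1.0 - p_yes[q]
        out[tuple(combo)] = w
    return out

records = [(1, 1, 0), (1, 0, 1)]
dependent = expand_dependent(records)      # two records, weight 1/2 each
independent = expand_independent(records)  # four records, weight 1/4 each
```

Only the disputed questions are enumerated, so the expansion stays small even when the total number of questions is large.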
Both choices imply that differences between observers trace back, not to uncertainty about a fixed reality, but rather to fluidity in the practices themselves, where both answers are equally valid depending on the details of time and place. The examples presented above are the most common form of inconsistency, and this argues in favor of the independent model.

2.3.4. Correcting for Non-Uniform Weighting across Time and Space

Cultural data is often unevenly sampled. We have more examples from the present than the distant past; more from high-GDP countries than from low-GDP countries; more from dominant cultures in a region than from marginalized or minority ones.
This can lead to bias in our landscape estimation. If we have, for example, 20 observations from cultures of Type A (the “contemporary developed world” sample), and only 10 observations from cultures of Type B (the “understudied”, or “minority”, sample), then a naive use of the data would tend to lead to landscapes that made Type-A cultures look more stable than Type-B cultures, and would produce accounts of the interlocking constraints that made Type-A cultures look more natural than Type-B cultures.
Often, however, we will know from archival records or field reports that groups exist, even if we know nothing about them, which allows us to estimate the sampling bias. With such an estimate in hand, Equation (9) allows us to re-weight observations to compensate.
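One simple way to turn an estimate of sampling bias into observation weights is inverse-frequency weighting. The sketch below is an illustration under stated assumptions rather than the authors' exact prescription: each observation is weighted by its group's estimated share of the true population divided by the number of observations sampled from that group. The function name and the population shares are hypothetical.

```python
from collections import Counter

def sampling_weights(labels, population_share):
    """Weight each observation by (true share of its group) / (sampled count)."""
    counts = Counter(labels)
    return [population_share[g] / counts[g] for g in labels]

# The example from the text: 20 Type-A observations and 10 Type-B observations,
# with the (assumed) knowledge that both types are equally common in reality.
labels = ["A"] * 20 + ["B"] * 10
weights = sampling_weights(labels, {"A": 0.5, "B": 0.5})
```

With these weights, each Type-B observation counts twice as much as each Type-A observation, so the two types contribute equally to the weighted objective.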

2.3.5. Partial-MPF: Accounting for Missing Data

Handling missing data is a challenge. Consider an observation such as the following,
$$j = \{1, 0, X, X\}, \tag{10}$$
where answers to the last two questions are not provided. The function that MPF minimizes, Equation (5), can only be calculated for fully specified data, and so a natural response is to perform data imputation: for example, replacing missing answers with the most common responses for that question in the remainder of the data.
While naive imputation methods are often suggested in machine-learning tutorials, they are, in the final analysis, an epistemic fallacy: they replace what is unknown by what is known, and assume that what has not been seen looks like what has. In qualitative work, such a fallacy would be obvious. An archaeologist would not suggest, for example, that the metaphysical beliefs of a long-vanished civilization should match the median beliefs of civilizations today.
A better way to solve this problem, which we refer to as “Partial-MPF”, is to dynamically infer the missing data from the best estimates of the parameters $\theta$; i.e., to work not with a particular completion for $j$, but a distribution over, in this case, the four possible values, $j_1 = \{1, 0, 0, 0\}$, $j_2 = \{1, 0, 0, 1\}$, $j_3 = \{1, 0, 1, 0\}$, and $j_4 = \{1, 0, 1, 1\}$, found using Equation (1).
When the amount of missing data is small (in practice, fewer than 10 missing values per configuration, independent of the total number of questions), the distribution can be computed exactly. For an observation with $m$ missing values, we expand the observation into the $2^m$ different combinations; compute the weights, $w_j(\theta)$, for each combination; and combine them together as in Equation (9). This is somewhat like the “expectation-maximization” step suggested by Ref. [29] for missing data, but with probabilistic weightings that preserve continuity in the derivative.
Performing this correctly requires care, and there are three alterations we have to make to the basic algorithm. First, we must update the weights $w_j(\theta)$ as we move through parameter space. Second, because the weights depend on $\theta$, this changes the form of the derivative $dK(\theta)/d\theta$. Third, when considering a configuration with missing data, we have to restrict its neighbour space to include only those configurations that differ in a known question.
Importantly, while inference of the missing data is exact, $K(\theta)$ is still only an approximation, and so minimizing $K(\theta)$ will be in slight tension with the new (exact) inference step that Partial-MPF takes. As we will see in Section 4.3, this is not a show stopper, and our treatment of missing data is, in practice, much more effective than standard alternatives.
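The expansion step of Partial-MPF can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the $\pm 1$ coding and the convention $p \propto \exp(+E)$, marks missing answers with `None`, and uses hypothetical function names. It computes the weights for the $2^m$ completions at fixed parameters and lists the restricted neighbour set; the re-computation of the weights during optimization and the modified derivative are not shown.

```python
import itertools
import math

def energy(sigma, J, h):
    """E = sum_{a>b} J[a][b]*s_a*s_b + sum_a h[a]*s_a (J stored symmetrically)."""
    n = len(sigma)
    pair = sum(J[a][b] * sigma[a] * sigma[b] for a in range(n) for b in range(a))
    return pair + sum(h[a] * sigma[a] for a in range(n))

def completion_weights(obs, J, h):
    """Weights over the 2^m completions of obs, conditional on the known answers."""
    missing = [a for a, v in enumerate(obs) if v is None]
    completions, energies = [], []
    for fill in itertools.product([-1, 1], repeat=len(missing)):
        sigma = list(obs)
        for a, v in zip(missing, fill):
            sigma[a] = v
        completions.append(tuple(sigma))
        energies.append(energy(sigma, J, h))
    top = max(energies)  # subtract the maximum for numerical stability
    unnorm = [math.exp(e - top) for e in energies]
    Z = sum(unnorm)
    return {c: u / Z for c, u in zip(completions, unnorm)}

def known_neighbours(sigma, obs):
    """Single-flip neighbours that differ only in a question that was observed."""
    out = []
    for a, v in enumerate(obs):
        if v is not None:
            nb = list(sigma)
            nb[a] = -nb[a]
            out.append(tuple(nb))
    return out

# Hypothetical example: four questions, the last two unanswered.
obs = (1, -1, None, None)
J = [[0.0] * 4 for _ in range(4)]
h = [0.0, 0.0, 1.0, 0.0]
weights = completion_weights(obs, J, h)
neighbours = known_neighbours((1, -1, 1, 1), obs)
```

The partition function cancels in the conditional weights, so only the $2^m$ completions of a record need to be enumerated, never the full configuration space.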

3. Data

Our case study draws on data from the Database of Religious History (DRH; http://religiondatabase.org, accessed on 12 October 2022) [4,6]. The DRH, an ongoing project based at the University of British Columbia, includes a peer-reviewed collection of information about religious groups in the contemporary, historical, and archaeological record, in the form of coded answers to standardized question sets (“polls”, in the DRH) [4,30,31].
The DRH is organized hierarchically, such that some “super” questions (e.g., “Is a spirit-body distinction present?”) have sub-questions (e.g., “Is spirit-mind conceived of as having qualitatively different powers or properties than other body parts?”), and even sub-sub-questions. For this case study, we limit ourselves to super questions, since sub-questions are contingent on answers to super questions. This limits the number of questions from 1133 to 171. The majority of the questions are binary questions, and so are a natural fit to the Inverse Ising method. When we limit ourselves to questions that ask for binary answers, this further limits the number of questions from 171 to 149, and the number of records from 838 to 835.
The DRH is under continuous development. In this preliminary analysis, intended to demonstrate the methods and the basic ideas behind landscape construction, we focus on a subset of 20 questions, and do not correct for potentially uneven sampling of groups by time or place. We start by selecting the questions with the fewest missing answers across records, and then select all records (i.e., religious groups) that have five or fewer missing answers. Additionally selecting only civilizations from the “group” poll [30] leaves us with a final data set of 407 civilizations. We infer parameters by running the Partial-MPF algorithm on these observations. See Appendix B Table A1 for the full list of questions, and Appendix B Table A3 and Table A4 for all religious groups in our curated dataset.

4. Results: Simulations

We first present the results of simulations; these confirm that our extensions to the basic MPF algorithm provide critically important improvements to the quality of the fit. To do this, we create large numbers of “imaginary” landscapes, where the underlying parameters have statistics similar to those observed in the real world. We take n, the number of YES/NO questions, equal to 20, and we draw the parameters J a b and h a from a Gaussian distribution. We then simulate data as draws from this underlying distribution, using the Metropolis–Hastings algorithm, altering it in different ways to take into account how real-world data is distorted by the data-gathering process.
With these simulated datasets in hand, we use our different extensions to the MPF algorithm to attempt to infer the underlying true parameters. We quantify the performance of our algorithms by direct calculation of the Kullback–Leibler divergence in the inferred distribution (corresponding to inferred parameters $\hat{J}_{ab}$ and $\hat{h}_a$) from the true distribution (which, in our simulations, is known: it is just the distribution produced by the original $J_{ab}$ and $h_a$),
$$\mathrm{KL} = \sum_{i \in \mathcal{C}} p_i(J_{ab}, h_a) \log \frac{p_i(J_{ab}, h_a)}{p_i(\hat{J}_{ab}, \hat{h}_a)}. \tag{11}$$
When KL is close to zero, the inferred distribution is hard to distinguish from the true distribution—i.e., it is a good fit. KL has a number of useful properties that allow it to play the role of “mean squared error” for probability distributions [32], quantifying the relative error in reconstruction and prioritizing accurate reconstruction of the more common states.
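For small systems the reconstruction error can be computed by brute-force enumeration. The sketch below is illustrative only: it assumes the convention $p \propto \exp(+E)$, uses natural logarithms, and the parameter values and function names are hypothetical.

```python
import itertools
import math

def ising_distribution(J, h):
    """Exact distribution over all 2^n configurations (J stored symmetrically)."""
    n = len(h)
    configs = list(itertools.product([-1, 1], repeat=n))
    weights = []
    for s in configs:
        e = sum(J[a][b] * s[a] * s[b] for a in range(n) for b in range(a))
        e += sum(h[a] * s[a] for a in range(n))
        weights.append(math.exp(e))
    Z = sum(weights)
    return {s: w / Z for s, w in zip(configs, weights)}

def kl_divergence(p_true, p_model):
    """KL from the model distribution to the true distribution, in nats."""
    return sum(p * math.log(p / p_model[s]) for s, p in p_true.items() if p > 0)

# Hypothetical "true" and "inferred" parameters for n = 3.
J_true = [[0.0, 0.5, -0.2], [0.5, 0.0, 0.1], [-0.2, 0.1, 0.0]]
h_true = [0.1, -0.3, 0.0]
J_hat = [[0.0, 0.4, -0.1], [0.4, 0.0, 0.0], [-0.1, 0.0, 0.0]]
h_hat = [0.0, -0.2, 0.1]

p_true = ising_distribution(J_true, h_true)
p_hat = ising_distribution(J_hat, h_hat)
kl_error = kl_divergence(p_true, p_hat)
```

Because each term is weighted by the true probability, errors on common configurations contribute more than errors on rare ones, which is the prioritization described in the text.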
In general, reconstruction performance will depend upon the parameters of the distribution from which the test values $J_{ab}$ and $h_a$ are drawn. For our particular case of $n = 20$, we choose this to be a Gaussian with mean zero, and $\sigma$ ranging between $0.01$ and $1.0$.
When $\sigma$ is small, the constraints are very weak and we are in a near-random or “dispersed” regime. As $\sigma$ becomes larger, we enter what we call the “ordered” regime up to $\sigma$ of approximately $0.25$, where constraints are strong enough to produce peaks where data tends to cluster; practically speaking, this is where most real-world systems, including the DRH, tend to be found. For completeness, we consider yet larger $\sigma$ values: going above $0.25$ we enter the “near critical” regime, where peaks become sufficiently strong to produce large-scale order, and, finally, what we call the “critical” regime, above $0.5$, where the distribution is near, or past, the spin-glass phase transition.
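The simulation setup described in this section, Gaussian parameters followed by Metropolis–Hastings sampling, can be sketched in a few lines. This is an assumption-laden illustration, not the authors' simulation code: it uses single-spin-flip proposals, the convention $p \propto \exp(+E)$, and hypothetical names and settings (`scale` stands for the Gaussian width $\sigma$, and `burn_in` and `thin` are arbitrary choices).

```python
import math
import random

def draw_parameters(n, scale, rng):
    """Draw symmetric couplings J and local fields h from N(0, scale^2)."""
    J = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a):
            J[a][b] = J[b][a] = rng.gauss(0.0, scale)
    h = [rng.gauss(0.0, scale) for _ in range(n)]
    return J, h

def metropolis_sample(J, h, n_samples, rng, burn_in=1000, thin=50):
    """Single-spin-flip Metropolis-Hastings targeting p(s) proportional to exp(E(s))."""
    n = len(h)
    s = [rng.choice([-1, 1]) for _ in range(n)]
    samples = []
    for step in range(burn_in + n_samples * thin):
        a = rng.randrange(n)
        field = h[a] + sum(J[a][b] * s[b] for b in range(n) if b != a)
        dE = -2.0 * s[a] * field  # energy change if spin a is flipped
        if dE >= 0 or rng.random() < math.exp(dE):
            s[a] = -s[a]  # accept the flip with probability min(1, exp(dE))
        if step >= burn_in and (step - burn_in) % thin == 0:
            samples.append(tuple(s))
    return samples

rng = random.Random(0)
J, h = draw_parameters(5, 0.25, rng)
samples = metropolis_sample(J, h, 128, rng)
```

Thinning reduces the correlation between successive samples; how much is needed depends on the regime, and larger values of `scale` generally call for longer runs.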

4.1. Regularization and Cross-Validation Greatly Improve Performance

Regularization using the $\lambda$ parameter significantly improves our ability to estimate the underlying landscape, making reliable extraction possible with very small amounts of data. An example is shown in Figure 1, where we take a particular simulated dataset (with $\sigma$ equal to $0.5$), and compare the probabilities estimated using the baseline MPF (i.e., without regularization), to our regularization method where $\lambda$ is estimated using leave-one-out cross-validation.
The regularized model is not only better at estimating the probability of the peaks of the landscape (the more likely, high-probability configurations), it also avoids overfitting to less common configurations. Standard MPF, by contrast, can sometimes recover very large values for the $J_{ab}$ parameters, leading it to underestimate the vast majority of the less likely configurations ($p$ less than $10^{-2}$). For Standard MPF, sometimes, what has not been seen is not just less likely, but effectively impossible.
Table 1 shows that regularization makes reconstruction possible even in the radically under-sampled regime where the number of parameters (here, 210, for $n = 20$) exceeds the amount of data (here, 128 observations), and cross-validation leads to near-optimal results.

4.2. Re-Weighting Can Correct for Sampling Bias

To study bias correction, we simulate multiple examples of a biased sampling process. First, we construct landscapes (for a variety of β values) where answers to one of the questions are split, evenly, between YES (the “Type A” groups) and NO (the “Type B” groups). We then create two samples: a full sample of 256 observations, and a biased data sample, with 128 observations of Type-A groups, but only 64 observations of Type-B groups. This simulates an extreme form of bias, where the dominant Type-A cultures are over-sampled by a factor of 2:1.
We then compare the reconstruction performance in three conditions: the ideal case, with 256 observations; the naive-biased case, where parameters are learned from the biased sample; and the re-weighting case, where we implement the weighting prescription of Section 2.3.4. We measure both the KL divergence, and the average log-odds bias against the Type-B groups, defined as
$$\mathrm{Bias} = \exp\left\langle \log \frac{p_B}{p_A} \right\rangle - 1, \tag{12}$$
where $p_B$ is the model’s predicted probability of Type-B groups, $p_A$ (equal to $1 - p_B$) is the predicted probability of Type-A groups, and the average $\langle \cdot \rangle$ is taken over multiple simulations in a $\beta$ range. The true value, by construction, is $p_A$ equal to $p_B$, and negative values indicate bias against the minority cultures.
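The bias measure can be computed directly from the predicted Type-B probabilities of a set of simulations. The sketch below reads the definition as the exponential of the simulation-averaged log-odds, minus one; that reading, the function name, and the input values are assumptions made for illustration.

```python
import math

def log_odds_bias(p_b_values):
    """exp(mean over simulations of log(p_B / p_A)) - 1, with p_A = 1 - p_B."""
    logs = [math.log(pb / (1.0 - pb)) for pb in p_b_values]
    return math.exp(sum(logs) / len(logs)) - 1.0

# Hypothetical predicted probabilities of Type-B groups from four simulations.
unbiased = log_odds_bias([0.5, 0.5, 0.5, 0.5])
against_b = log_odds_bias([0.4, 0.4, 0.4, 0.4])
```

An unbiased model gives a value of zero, while a model that assigns Type-B groups a probability of 0.4 instead of 0.5 gives a value of about $-1/3$.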
Table 2 shows the results; even at 2:1 levels of bias, our methods can achieve high reconstruction accuracy without inappropriately biasing the underlying landscape in favor of dominant cultures.

4.3. Partially Observed Data Can Be Consistently Integrated into Inference

To test the performance of Partial-MPF, we consider a scenario where we have a certain amount of complete data, and then add in new partially observed data. Figure 2 shows an example of how this works in practice for a single simulated system. We begin with 128 data points, and then add increasing amounts of data which is 25% incomplete (a random selection of five of the 20 features is blanked out). We compare our method to a common “naive” choice of taking missing variables to have the most commonly observed value in the remainder.
The three lines show how fit quality changes as (1) more fully observed data is added (the ideal case); (2) partially observed data is added, and integrated in using the Partial-MPF strategy; and (3) partially observed data is added, using the naive strategy. While Partial-MPF is able to make good use of the data to improve the fit (the KL from the estimated landscape to the actual landscape declines), additional (noisy) data very often harms the quality of the naive fit beyond a certain point. Table 3 shows the average results in different regimes; the same pattern is observed.
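The “naive” baseline can be sketched as follows. This is a minimal illustration, assuming that answers are coded as +1 (YES) and −1 (NO), that missing answers are marked with `None`, and that “most commonly observed value” is computed separately for each feature; the function name and the toy data are hypothetical.

```python
from collections import Counter

def impute_most_common(rows):
    """Fill missing entries (None) with the per-feature most common observed value."""
    n_features = len(rows[0])
    modes = []
    for j in range(n_features):
        observed = [row[j] for row in rows if row[j] is not None]
        # Counter.most_common(1) returns [(value, count)] for the top value.
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if v is None else v for j, v in enumerate(row)] for row in rows]

# Toy data: four groups, three features, three missing answers.
data = [
    [1, -1, 1],
    [1, None, -1],
    [None, -1, -1],
    [1, 1, None],
]
filled = impute_most_common(data)
```

Because every blank is replaced by the same majority answer, this strategy pushes partially observed groups towards the dominant pattern, which is one plausible reason why adding more such data can degrade the naive fit.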

5. Results: The Database of Religious History

We present our empirical results in four parts. First, in Section 5.1, we look at the values of the inferred parameters. The parameters suggest how we should read the underlying “logic” of the landscape: the key interactions that, in combination, make some configurations more consistent with constraints than others.
We then look at the landscape of configurations, as implied by the parameters. In Section 5.2, we show how it can be used to inform hypotheses in cases where data is inconsistent or missing; we take, as an example, the case of a cult in the ancient Mediterranean.
In Section 5.3, we show how to visualize the large-scale structure of the landscape—the topography of “peaks” (concentrated regions where religions tend to cluster), “valleys” (where underlying constraints make traditions harder to sustain), and “floodplains” (areas of configuration space where constraints are weaker, favoring diversity and variation). Finally, in Section 5.4, we show how to analyze the local neighbourhood of a configuration, which gives us a new window into the question of cultural evolution over time.

5.1. Parameter Interpretation and Landscape Logic

Figure 3 provides a simple overview of the logic of the cultural landscape derived from the DRH. This compares the underlying parameters of the Inverse Ising model (the $J_{ij}$ and $h_i$ terms), inferred by Partial-MPF, to the surface-level, observed correlations in the data.
In some cases, the surface-level correlations are a good guide to the underlying logic. Our model suggests that, for example, the observed correlation between small-scale (18) and large-scale (19) rituals is most naturally explained, at this resolution, by an underlying sympathetic (i.e., $J_{18,19}$ positive) pairwise constraint. Similarly, the “big Gods” [33] pairing of supernatural monitoring (12) and supernatural punishment (13) is both a strong surface-level feature, and a core part of the landscape logic.
Much of the surface-level structure that we observe, however, turns out to be an emergent property of more complex relationships in the underlying parameters. The model suggests, for example, that a strong surface-level correlation between monumental architecture (3) and special treatment for corpses (7) can be explained away by mediation through other variables. Grave goods (9) is another example: it is rare in the observed data, but the local field for this feature is slightly positive, suggesting that there is nothing inherently difficult about maintaining a grave-good tradition. Instead, the practice is disfavored because of how it interacts with, for example, the keeping of written scriptures (2). Our model also reveals an underlying logic that links interactions among an “extreme” set of practices (castration (14), adult sacrifice (15), child sacrifice (16), grave-co-sacrifices (8), and suicide (17)).

5.2. Hypothesising the Unknown

Landscape models enable us to predict unknown data: given partial information about a group, Equation (1) allows us to conjecture about how the constraints, inferred from other systems, would interact in the particular case at hand. Cases with genuine expert disagreement, and cases where features of religious cultures are unknown due to the ravages of time, are the most exciting to analyze in this way.
As a particularly compelling example, consider the “Archaic Spartan Cults” (800 BCE—500 BCE). For these precursors to the Spartan state, both the presence of child sacrifice and small-scale rituals have been coded by the DRH expert as “unknown to the field”. In Table 4, we use the inferred parameters, along with what is known about the Spartans, to compute the degrees of belief in the four combinations.
The model is nearly 99% certain that the Cults did not practice child sacrifice. In this case, the known absence of both castration and adult sacrifice, both of which have sympathetic links with child sacrifice in the underlying model, are sources of evidence against the proposition (see Figure 3A).
The model is also reasonably confident about the presence of small-scale rituals; here, emergent constraints such as the strong pairwise coupling to the presence of large-scale rituals, which the Spartan Cults are known to have had, tilt the balance in favor of small-scale ritual. The judgement is less certain, however. The power of the Inverse Ising model is seen here not just in its recognition of common patterns, but in how it parses out the evidentiary value of different pieces of information.
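The calculation behind such degrees of belief can be sketched as follows. This is a minimal illustration, assuming the standard pairwise form $P(s) \propto \exp(\sum_i h_i s_i + \sum_{i<j} J_{ij} s_i s_j)$ with $s_i \in \{-1, +1\}$; the sign convention is an assumption, chosen so that a positive $J_{ij}$ is sympathetic. The four-feature parameters are invented for the example and are not the inferred DRH values.

```python
import itertools
import math

def log_weight(s, h, J):
    """Unnormalized log-probability: sum_i h_i s_i + sum_{i<j} J_ij s_i s_j."""
    n = len(s)
    field = sum(h[i] * s[i] for i in range(n))
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return field + pair

def posterior_over_unknowns(known, unknown_idx, h, J):
    """Condition on the known answers; return {completion: probability}.

    known: list of +1 / -1 / None (None marks an unknown answer).
    """
    weights = {}
    for values in itertools.product([+1, -1], repeat=len(unknown_idx)):
        s = list(known)
        for idx, v in zip(unknown_idx, values):
            s[idx] = v
        weights[values] = math.exp(log_weight(s, h, J))
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Toy parameters (hypothetical, 4 features). J is stored upper-triangular.
h = [0.0, 0.0, -0.5, 0.2]
J = [
    [0.0, 0.0, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
# Features 0 and 1 are known (NO, YES); features 2 and 3 are unknown.
known = [-1, +1, None, None]
post = posterior_over_unknowns(known, [2, 3], h, J)
p_feature2_yes = sum(p for (v2, _), p in post.items() if v2 == +1)
```

In this toy case the known absence of feature 0, which is sympathetically coupled to feature 2, lowers the belief that feature 2 is present, mirroring the way known absences count as evidence in the Spartan example.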

5.3. The Landscape of Religious Culture

The basic output of our model is a probability distribution over $2^{20}$ possible configurations: a cultural landscape with peaks (small groups of high-probability configurations), and valleys (areas of low-probability configurations). As we shall see, landscapes can also include wider “floodplains”—more widely dispersed collections of configurations that are reasonably, and relatively equally, probable.
It is difficult, however, to visualize all the configurations at the same time: placing all the points of a 20-dimensional hypercube on a two-dimensional plot makes it hard to see which configurations are close (and, e.g., part of a connected plateau) vs. far (e.g., two well-separated peaks).
One way to approach this problem is to start with the topography of the most likely configurations. In Figure 4, we represent the 150 most probable configurations as a network, where configurations that differ in only one answer are connected by an edge, and the nodes are arranged to best represent distances; roughly speaking, configurations that differ in more answers are further apart (see Appendix A.1 for details). The configurations shown in the network represent 42% of the total probability mass, and provide an overview of the region of the landscape that contains the most favored configurations. Since we only visualize the 150 most probable configurations, a great deal of the landscape structure is not represented, including rarely explored parts of the space (e.g., configurations that support extreme practices, such as human sacrifice, suicide, and castration).
As a second aid to visualization, we used hierarchical clustering to construct a dendrogram (see Appendix Figure A1). Based on this clustering, we can split nodes into nested communities (see Appendix A.2). Table 5 provides names for the religions labeled in Figure 4A, and Table 6 (and Appendix B Table A2) provides a list of the most distinctive features of each group. The full list of groups is provided in Appendix B Table A3.
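The construction of such a network can be sketched on a toy system. This is a minimal illustration, assuming the same pairwise form with ±1 answers as above; the four-feature parameters and the function names are hypothetical, and a real 20-feature system has 1,048,576 configurations, so exhaustive enumeration in pure Python is slow and an optimized implementation would be preferable.

```python
import itertools
import math

def enumerate_probabilities(h, J):
    """Return {configuration: probability} over all 2^n spin configurations."""
    n = len(h)
    weights = {}
    for s in itertools.product([-1, +1], repeat=n):
        e = sum(h[i] * s[i] for i in range(n))
        e += sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        weights[s] = math.exp(e)
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}

def top_k_network(probs, k):
    """Keep the k most probable configurations; connect pairs at Hamming distance 1."""
    nodes = sorted(probs, key=probs.get, reverse=True)[:k]
    edges = [
        (a, b)
        for a, b in itertools.combinations(nodes, 2)
        if sum(x != y for x, y in zip(a, b)) == 1
    ]
    mass = sum(probs[s] for s in nodes)
    return nodes, edges, mass

# Toy example with n = 4 features (hypothetical parameters).
h = [0.3, -0.2, 0.0, 0.1]
J = [[0, 0.5, 0, 0], [0, 0, -0.4, 0], [0, 0, 0, 0.6], [0, 0, 0, 0]]
probs = enumerate_probabilities(h, J)
nodes, edges, mass = top_k_network(probs, k=6)
```

The returned `mass` plays the role of the fraction of total probability covered by the plotted nodes, and the edge list is what a layout algorithm would receive.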
Group 1 (red) is the largest by probability mass (21%); it is characterized by a relative presence of small- and large-scale rituals, monuments, and scriptures. Among others, this group contains Ancient Egyptian religions, many Islamic traditions, and Catholic groups such as the Jesuits and Cistercians.
Group 2 (blue) is the second-largest (12%); it is characterized by a relative absence of small- and large-scale rituals, grave goods, and special corpse treatment. It includes demotic, charismatic, and reform traditions, including many Protestant groups such as the Southern Baptists, Jehovah’s Witnesses, and Pentecostalism. Group 2, in contrast to Group 1, is more evenly weighted among its configurations; where Group 1 has a small number of peaks, Group 2 is more like a “floodplain”. The configurations in the light blue group (2.1) tend to be found closer to Group 1 topologically; they have higher rates of state-political support than the configurations in the dark-blue group (2.2), and higher rates of both monuments and special treatment of corpses.
Finally, Group 3 (yellow) is the smallest by probability mass (9%). It is characterized by a relative absence of written scripture, and a relative presence of grave goods, and co-sacrifices in tombs. It includes many folk, traditional, indigenous, and “pre-Axial” [36] pagan cultures, such as the Iban traditional religion, Roman Imperial Cults and Mesopotamian religions. The light-yellow group (3.1), topologically further away from both Group 1 and Group 2, is characterized by a relative absence of moralizing “big Gods” [37] who conduct supernatural monitoring and punishment.
While the landscape is inferred without reference to time, cultural evolution appears to have explored the landscape in a somewhat sequential fashion. These temporal effects include shifts from Group 3’s pre-Axial tribal and archaic religious cultures towards Group 1’s later Axial religious cultures [36] and “big Gods” religions that co-evolved with large-scale complex societies [31,33]. Group-3 religions tend to be older than those in nearby Group 1, which has the highest concentration of religious cultures committed to a belief in high Gods. Group 2, in turn, includes popular developments out of Group-1 traditions into contemporary society, including many Protestant religions and more recent groups such as Pentecostalism, a sect established in the twentieth century and rapidly becoming one of the largest Christian sub-groups [33].
The landscape reflects more than just a temporal sequence of social, economic, and material evolution, however. It also seems to capture the constraints of more permanent features of the human mind. While Group 1 includes many later “solutions” to the constraints found by Axial-age and “Big Five” religions, it also includes cases such as pre-Christian Ireland. Religions, in other words, may co-evolve with social context, but they also have to respect the psychological constraints on how we believe and keep faith, and may well wander back to earlier solutions [38].

5.4. Focal Landscapes

Figure 4A provides an overview of how constraints combine to imply a landscape of configurations; a second possibility is to map the local landscape of configurations around a particular group. Among other things, this provides a grounded way to speculate on how a culture might evolve into the future, or where it might have come from—to ask, for example, which bits in a configuration might flip, and whether or not this would push the religion to a more probable configuration which is better able to satisfy the underlying constraints.
Figure 4B,C does this for two groups in our data, the (contemporary) Free Methodist Church and the (ancient) Roman Imperial Cult. In both cases, we show fifty nodes: the group itself as the focal node, and then the 49 most probable nearby configurations, which differ in up to two answers from the focal case.
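The selection of nodes for a focal landscape can be sketched as follows. This is a minimal illustration with hypothetical function names; `prob` stands for any function returning the (possibly unnormalized) probability of a configuration of ±1 answers, and the toy probability used here is invented for the example.

```python
import itertools
import math

def neighbors_within(config, max_flips=2):
    """All configurations that differ from `config` in 1..max_flips answers."""
    n = len(config)
    out = []
    for r in range(1, max_flips + 1):
        for idx in itertools.combinations(range(n), r):
            s = list(config)
            for i in idx:
                s[i] = -s[i]
            out.append(tuple(s))
    return out

def focal_landscape(config, prob, n_neighbors=49, max_flips=2):
    """Focal node plus the most probable nearby configurations."""
    nearby = neighbors_within(config, max_flips)
    nearby.sort(key=prob, reverse=True)
    return [tuple(config)] + nearby[:n_neighbors]

def is_local_peak(config, prob):
    """True if every single-flip neighbor is less probable than `config`."""
    return all(prob(s) < prob(tuple(config)) for s in neighbors_within(config, 1))

# Toy probability (hypothetical): favors YES answers, so all-YES is the peak.
def toy_prob(s):
    return math.exp(sum(s))

focal = (1,) * 20
landscape = focal_landscape(focal, toy_prob)
```

With 20 features there are 20 single-flip and 190 double-flip neighbors, so the fifty plotted nodes are a selection from 210 candidates.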
As seen in Figure 4B, the Free Methodist Church is situated at a local peak, and all neighboring configurations are of lower probability. Some of them appear in our data (e.g., the Southern Baptists, and Pauline Christianity), but several are unoccupied. The highest probability configuration in the local region is occupied by the Jehovah’s Witnesses, two steps away.
The Free Methodist Church does not require participation in large-scale rituals. A change in this attribute is their most probable reformation (15%) and would place them in the same configuration as the Southern Baptists. This change would take the Free Methodist Church configuration closer to the local maximum, which is occupied by the Jehovah’s Witnesses. Another path to the Jehovah’s Witnesses configuration is through Pauline Christianity. However, all paths from the Free Methodist Church to the Jehovah’s Witnesses require intermediary states of lower probability. Slightly less probable is a mutation in which the Free Methodists adopt a practice for special treatment of corpses (13%). This reformation would take the Free Methodists in another direction in the landscape, and there is no religion in our dataset that corresponds to this configuration.
In contrast to the Free Methodist Church, the Roman Imperial Cult (Figure 4C) sits in a valley, with several neighboring configurations of higher probability. The Cult satisfies the constraints better without its own distinct written language, than with (as was actually the case), and with scriptures rather than without. Loss of its own distinct language would shift it up to the Mesopotamia configuration, while acquiring scriptures would shift it up to the Achaemenid configuration.

6. Discussion

The main goal of this work was to provide those in cultural evolution and sociophysics with new methods, and accompanying code, for inferring the landscapes beneath the incomplete data of the historical record. In addition to characterizing these methods through simulation, we showed how they play out in a real-world example, drawn from the Database of Religious History. In the words of archaeologist David Hurst Thomas, “it’s not what you find, it’s what you find out”, and we endeavored to show how landscape models not only organize data from the field, but provide insight into the underlying laws and dynamics that can help explain it.
A key direction for future research is to consider how these methods might be extended to even larger configuration spaces. As the number of features considered increases, so do the challenges; when $n$ goes from 20 to 100, for example, the number of parameters goes from 210 to more than 5000. To maintain the same level of accuracy would, generically, require the amount of data to rise by a similar factor. This may not always be possible, however; in the final analysis, there are only a finite number of civilizations in human history.
A more creative solution to the problem is to go from the “unrestricted” Boltzmann machine case, where all $J_{ij}$ are (potentially) non-zero, to the “restricted” case, where some links are set to zero by the researcher ahead of time. In this case, the researcher sculpts a theory of constraints, restricting a priori the ways in which features may interact and reducing the number of free parameters. Another solution is to connect nodes not to each other, but to a small number of hidden variables (“layers”, in the deep-learning jargon). If there are $n$ features, and $m$ hidden nodes, then the total number of parameters, including local fields, is $n(m+1)$, which may make the problem tractable again. Hidden layers have proven to be particularly expressive; in the physics jargon, they are equivalent to how renormalization leads to higher order interactions [39]. The original MPF paper [5] demonstrated the use of hidden nodes in this fashion, and the framework makes it possible to extend our Partial-MPF algorithm to these cases as well.
These are the challenges in inference. There are equally compelling challenges in data curation itself. The DRH is one example of the exciting resources coming online for researchers in the human sciences, but these sources bring complexities of interpretation in their wake. As discussed in Appendix A.3, for example, drawing the boundaries between one group and another—in space, or time—is not a simple matter. This raises questions about how to properly combine the rich, qualitative data that comes from the field in ways that properly represent the diversity of human possibilities.
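The parameter counts quoted above can be checked with a few lines of arithmetic. In this sketch the function names are hypothetical, and the choice of five hidden nodes is arbitrary and made only for illustration.

```python
def full_model_parameters(n):
    """Pairwise couplings J_ij (i < j) plus local fields h_i."""
    return n * (n - 1) // 2 + n

def hidden_layer_parameters(n, m):
    """n*m feature-to-hidden couplings plus n local fields, i.e. n(m + 1)."""
    return n * (m + 1)

for n in (20, 100):
    print(n, full_model_parameters(n))   # prints "20 210" then "100 5050"

print(hidden_layer_parameters(100, 5))   # prints 600
```

For 100 features, a small hidden layer therefore needs roughly an order of magnitude fewer parameters than the fully connected model.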
We paid particular attention to mitigating different forms of bias: both the bias that comes from undersampling a subset of traditions, and from how we treat missing data. There are other forms of bias in the data curation stage, however, and one we have not addressed is “question bias”: the ways in which the questions we use map the neighbourhoods of some cultures better than others.
One might imagine, for example, a set of questions very finely tuned to distinctions between different forms of Christianity, but that end up lumping indigenous practices in Africa into a single configuration. A scholar of Christianity might not, for example, include questions about whether a religious group has practitioners who are separately distinguished as “sorcerers” or “witches”, because the answer for all the traditions they have in mind would be “no”; the same question, however, could track important aspects of cultural evolution in other parts of the world (we thank one of our referees for this example). If we build a global landscape solely on the basis of “Christianity” questions, we will radically underestimate the diversity of indigenous traditions, and learn little about the network of constraints that stabilize these traditions.
A natural test for question bias is to check the extent to which “truly different” groups are mapped to the same configuration. If all of the groups in a particular region have an identical configuration, for example, or an unusually low level of diversity, it might suggest that we are biased against important dimensions of the religious experience in that region.
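One simple way to run such a check is sketched below. This is a minimal illustration only: the region labels, the toy records, and the diversity measure (the fraction of groups in a region that have a distinct configuration) are all assumptions made for the example, not a procedure used in the analysis.

```python
from collections import defaultdict

def configuration_diversity(records):
    """For each region, the fraction of groups with a distinct configuration.

    records: iterable of (region, configuration) pairs; both the region labels
    and the diversity measure are illustrative assumptions.
    """
    by_region = defaultdict(list)
    for region, config in records:
        by_region[region].append(tuple(config))
    return {
        region: len(set(configs)) / len(configs)
        for region, configs in by_region.items()
    }

records = [
    ("Region X", (1, 1, -1)),
    ("Region X", (1, -1, -1)),
    ("Region X", (-1, 1, 1)),
    ("Region Y", (1, 1, 1)),
    ("Region Y", (1, 1, 1)),
    ("Region Y", (1, 1, 1)),
]
diversity = configuration_diversity(records)
```

A region whose score falls far below the others, as Region Y does here, would be a candidate for closer inspection by domain experts rather than proof of question bias.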
Question bias is not, however, something that can be spotted or corrected purely at the algorithmic level. It may well be the case, for example, that one region truly has less religious diversity than another: the religions in a region may have emerged from a single founding group and undergone very little further evolution. Adding questions to artificially increase the diversity, in such cases, can do more harm than good—if the new questions are about somewhat accidental properties, they will increase noise without adding insight. In the final analysis, the proper construction of a landscape requires a proper choice of questions, which, in turn, requires sensitivity to the differences that matter.

Author Contributions

Conceptualization, V.M.P. and S.D.; Methodology, V.M.P. and S.D.; Software, V.M.P. and S.D.; Validation, V.M.P. and S.D.; Formal analysis, V.M.P. and S.D.; Investigation, V.M.P. and S.D.; Resources, S.D.; Data curation, V.M.P.; Writing—original draft, V.M.P. and S.D.; Writing—review & editing, V.M.P. and S.D.; Visualization, V.M.P. and S.D.; Supervision, S.D.; Project administration, S.D.; Funding acquisition, S.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work used the Extreme Science and Engineering Discovery Environment (XSEDE [40]), which is supported by National Science Foundation grant number ACI-1548562. Specifically, it used the Bridges-2 system [41], which is supported by NSF award number ACI-1928147, at the Pittsburgh Supercomputing Center (PSC), under grant HUM220003. The DRH is funded by the John Templeton Foundation, Templeton Religious Trust, and Canada’s Social Sciences and Humanities Research Council (SSHRC). This work was supported in part by the Survival and Flourishing Fund.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data and open-source code (incl. optimized C code mpf_CMU) for the methods and analysis described in this paper are available at https://github.com/victor-m-p/cultural-landscapes, accessed on 24 January 2023.

Acknowledgments

We thank the DRH team for discussions and data sharing, and participants in the Santa Fe Institute workshop “Coding the Past: The Challenges and Promise of Large-Scale Cultural Databases” for discussions, which made this work possible.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Database of Religious History: Analysis and Data Considerations

Appendix A.1. Network Layout

As briefly touched upon in Section 5.3, it is mathematically impossible to faithfully represent the 20-dimensional hypercube landscape in a two-dimensional layout. By only laying out a subset of the possible configurations (e.g., in Figure 4A, 150 out of the total $2^{20}$ possible configurations), dimensionality reduction techniques can approximately compress the high-dimensional space into a low-dimensional spatial representation. We attempted approaches based on minimizing a global energy function (e.g., multi-dimensional scaling, and similar approaches [42]), as well as force-directed placement algorithms (e.g., Fruchterman–Reingold heuristic [34]). We achieved the most appealing results following the latter approach, using the algorithm as implemented in Graphviz [35]. We stuck with this approach for network layouts throughout, i.e., in all the plots shown in Figure 4, and in both plots shown in Figure 3. For the plots shown in Figure 4, the layout uses only immediate neighbors (Hamming distance 1) and is unweighted. To create the spatial layout of nodes for Figure 3, we thresholded the connections (couplings) between nodes (Figure 3A), such that only connections with an absolute coupling value above 0.15 are taken into account when running the force-directed algorithm. Figure 3B uses the layout obtained from Figure 3A to facilitate comparison.
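The thresholding step for the coupling network can be sketched as follows. This is a minimal illustration that only builds the edge list; the function name and the three-feature coupling matrix are hypothetical, and the layout itself would be delegated to a force-directed routine.

```python
def threshold_couplings(J, threshold=0.15):
    """Edges (i, j, J_ij) for i < j with |J_ij| above the threshold."""
    n = len(J)
    return [
        (i, j, J[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if abs(J[i][j]) > threshold
    ]

# Toy symmetric coupling matrix (hypothetical values).
J = [
    [0.0, 0.40, -0.05],
    [0.40, 0.0, -0.22],
    [-0.05, -0.22, 0.0],
]
edges = threshold_couplings(J)
```

Here the weak coupling between features 0 and 2 is dropped from the layout input, while both the sympathetic (positive) and antagonistic (negative) strong couplings are kept.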

Appendix A.2. Hierarchical Clustering

We use agglomerative clustering as implemented in the Python package scikit-learn [43] to cluster nodes into nested groups. This produces the dendrogram shown in Figure A1. The algorithm starts off with all individual leaves (in our case configurations) in individual clusters and then successively merges nearby elements together. This results in a hierarchical grouping, where, e.g., the red clade is closer to the two yellow clades than it is to the two blue clades. There is no natural resolution for the number of clades, as can be seen from Figure A1. Five clusters were chosen for clarity, but we emphasized the three overall splits between red, blue, and yellow, and we could equally well have used a higher resolution (i.e., more fine-grained clades). Notice that changing the resolution does not change the structure for the hierarchical clustering algorithm used here. This is in contrast to other common (non-hierarchical) approaches, such as Louvain community detection [44], which were found to give unstable results on our network of top configurations.
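The merging procedure, and the fact that coarser resolutions are nested unions of finer ones, can be illustrated with a dependency-free stand-in. This sketch is not the scikit-learn implementation: it assumes Hamming distance between configurations and average linkage, neither of which is specified above, and the toy configurations are invented.

```python
import itertools

def hamming(a, b):
    """Number of answers in which two configurations differ."""
    return sum(x != y for x, y in zip(a, b))

def agglomerate(configs, n_clusters):
    """Average-linkage agglomerative clustering on Hamming distance.

    Starts with one cluster per configuration and repeatedly merges the
    closest pair until n_clusters remain. Returns a list of clusters,
    each a list of indices into `configs`.
    """
    clusters = [[i] for i in range(len(configs))]

    def linkage(c1, c2):
        total = sum(hamming(configs[i], configs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters

configs = [
    (1, 1, 1, 1), (1, 1, 1, -1),        # two nearby configurations
    (-1, -1, -1, -1), (-1, -1, -1, 1),  # another nearby pair
    (1, -1, 1, -1),
]
two = agglomerate(configs, 2)
three = agglomerate(configs, 3)
```

Because the merge sequence is the same regardless of where it is stopped, every cluster in `three` is contained in exactly one cluster in `two`, which is the sense in which changing the resolution leaves the structure unchanged.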

Appendix A.3. Duplicate Entry Names

Entries in the Database of Religious History (DRH) all have a unique Entry ID, and the record has a name associated with it (Entry Name), but this name is not necessarily unique. In our sample of 407 religious cultures (Entry IDs), two Entry Names are each shared by two entries: Donatism and Roman Private Religion. In the case of Donatism, we do, in fact, appear to have two overlapping entries from different experts, one which focuses on Donatism from 311 CE to 427 CE, and one which focuses on Donatism from 311 CE to 600 CE. In this one case, a legitimate argument can be made that it would be more appropriate to collapse this into one record about Donatism, using the flexibility of our MPF algorithm to weight potential disagreement. The Roman Private Religion case is different, with both entries submitted by the same expert, one describing the religious culture between 202 BCE and 44 BCE, and the other one describing the religious culture between 600 BCE and 202 BCE. In this case, the periods do not overlap, and it is reasonable to assume that the expert has made the decision to split the records based on differences between the two cultures about which she has expert knowledge. These cases raise a more general point about independence, and how we should treat partially overlapping cultures.
The most extreme example from our curated subset of the DRH is the case of the Ancient Egyptian religions. We have a total of six different entries about Ancient Egyptian religions (e.g., early dynastic, first intermediate period, and old kingdom, etc.). Since these cultures naturally overlap on most attributes, the model will consider Ancient Egyptian religions as very stable configurations (which is, in fact, not totally unreasonable). Four of these six different entries focusing on Ancient Egyptian religions have the same configuration, and this configuration is assigned the second-highest probability mass in the landscape (annotated as “Ancient Egyption” in Figure 4A). Whether this is reasonable, or whether all of the Ancient Egyptian religions should be treated as one religious culture (and be weighted accordingly) is a difficult question. As culture is fluid, and no culture is completely independent from other—past and present—cultures, it seems impossible to design a general decision rule for whether to consider two cultures meaningfully independent. In this paper, we took the records from the DRH at face value, and treated each unique Entry ID as its own religious culture. In the future, more sophisticated approaches should be pursued in collaboration with domain experts.

Appendix B. Database of Religious History: Religions and Question Codes Used in This Analysis

Table A1. Question subset used in DRH analysis. “Long” question names, e.g., “Does the religion have official political support”, correspond to “Related Question” as they appear in the DRH, besides ending characters which have been modified, e.g., “Does the religious group have scriptures?” instead of “Does the religious group have scriptures:” as it appears in the DRH. Short question names are used for convenience. ID column was recoded to range from 1–20, and does not correspond to “Related Question ID” in the DRH.
| ID | Question (Short) | Question (Long) |
| --- | --- | --- |
| 1 | Official political support | Does the religion have official political support |
| 2 | Scriptures | Does the religious group have scriptures? |
| 3 | Monumental religious architecture | Is monumental religious architecture present? |
| 4 | Spirit-body distinction | Is a spirit-body distinction present? |
| 5 | Belief in afterlife | Belief in afterlife? |
| 6 | Reincarnation in this world | Reincarnation in this world? |
| 7 | Special treatment for corpses | Are there special treatments for adherents’ corpses? |
| 8 | Co-sacrifices in tomb/burial | Are co-sacrifices present in tomb/burial? |
| 9 | Grave goods | Are grave goods present? |
| 10 | Formal burials | Are formal burials present? |
| 11 | Supernatural beings present | Are supernatural beings present? |
| 12 | Supernatural monitoring present | Is supernatural monitoring present? |
| 13 | Supernatural beings punish | Do supernatural beings mete out punishment? |
| 14 | Castration required | Does membership in this religious group require castration? |
| 15 | Adult sacrifice required | Does membership in this religious group require sacrifice of adults? |
| 16 | Child sacrifice required | Does membership in this religious group require sacrifice of children? |
| 17 | Suicide required | Does membership in this religious group require self-sacrifice (suicide)? |
| 18 | Small-scale rituals required | Does membership in this religious group require participation in small-scale rituals (private, household)? |
| 19 | Large-scale rituals required | Does membership in this religious group require participation in large-scale rituals? |
| 20 | Distinct written language | Does the religious group in question possess its own distinct written language? |
Figure A1. Dendrogram obtained by running an agglomerative clustering algorithm [43] on the 150 most probable configurations in our landscape. Labels correspond to the “Node” column in Table A3.
Table A2. For each community, we calculate the average possession of each of the 20 religious attributes. We weight this by the probability of each configuration in the community, and convert it to a percentage (Self). We compare this to the average possession of the attribute across all configurations that are not in the community. Again, we weight this by the probability of each configuration and convert it to a percentage (Other). We calculate the difference (S-O), and show the five attributes for which each community differs most from the rest.
| Group | Color | Question | Self | Other | S-O |
| --- | --- | --- | --- | --- | --- |
| Group 1 | Red | Small-scale rituals required | 97.41 | 38.67 | 58.74 |
| | | Monumental religious architecture | 79.64 | 40.41 | 39.23 |
| | | Scriptures | 94.95 | 56.08 | 38.87 |
| | | Large-scale rituals required | 100.00 | 65.77 | 34.23 |
| | | Reincarnation in this world | 38.47 | 8.93 | 29.54 |
| Group 2 | Blue | Small-scale rituals required | 22.25 | 86.72 | −64.47 |
| | | Grave goods | 7.43 | 57.19 | −49.76 |
| | | Large-scale rituals required | 50.90 | 95.99 | −45.09 |
| | | Special treatment for corpses | 44.53 | 87.74 | −43.20 |
| | | Official political support | 40.79 | 76.01 | −35.22 |
| Group 3 | Yellow | Scriptures | 19.37 | 90.21 | −70.84 |
| | | Grave goods | 86.97 | 30.62 | 56.35 |
| | | Co-sacrifices in tomb/burial | 37.15 | 3.81 | 33.34 |
| | | Official political support | 91.69 | 58.64 | 33.06 |
| | | Special treatment for corpses | 98.27 | 68.76 | 29.51 |
Distinctive community features.
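The Self, Other, and S-O columns can be computed as in the following sketch. This is a minimal illustration of the weighting described in the caption of Table A2; the function name, the 0/1 coding of attributes, and the two-attribute toy landscape are assumptions made for the example.

```python
def self_other_table(configs, probs, community):
    """Probability-weighted attribute possession inside vs. outside a community.

    configs: list of 0/1 tuples (1 = attribute present)
    probs: probability of each configuration
    community: set of indices into `configs`
    Returns a list of (attribute_index, self_pct, other_pct, difference).
    """
    def weighted_pct(indices, attr):
        mass = sum(probs[i] for i in indices)
        present = sum(probs[i] * configs[i][attr] for i in indices)
        return 100.0 * present / mass

    inside = sorted(community)
    outside = [i for i in range(len(configs)) if i not in community]
    rows = []
    for attr in range(len(configs[0])):
        s = weighted_pct(inside, attr)
        o = weighted_pct(outside, attr)
        rows.append((attr, s, o, s - o))
    # Largest absolute difference first.
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return rows

# Toy landscape: four configurations of two attributes (hypothetical values).
configs = [(1, 1), (1, 0), (0, 1), (0, 0)]
probs = [0.4, 0.1, 0.3, 0.2]
rows = self_other_table(configs, probs, community={0, 1})
```

In the toy case, attribute 0 is present in all of the community's probability mass and none of the rest, so it is ranked as the most distinctive feature.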
Table A3. For each observed religion, we count up the total probability mass which corresponds to configurations in any of the five communities. We assign the religion to the community which contains the configuration with the highest probability mass. Some observed religions will not appear in this table because none of their possible realized configurations correspond to data states in the top 150 most probable configurations. Some observed religions contain missing data or inconsistent coding. Complete records (religions that have a unique configuration) will appear with 100% weight, and incomplete records will appear for the community in which the sum is highest. DRH ID can be linked back to the original entry online; e.g., https://religiondatabase.org/browse/654/#/ links to DRH ID 654 (the Cistercians). Group 1 corresponds to the red community, Group 2.1 corresponds to the light-blue community, Group 2.2 corresponds to the dark-blue community, Group 3.1 corresponds to the light-yellow community, and Group 3.2 corresponds to the dark-yellow community (see Figure 4 and Figure A1).
| Group | Node | DRH ID | Entry Name (DRH) | Weight |
|---|---|---|---|---|
| Group 1 | 1 | 654 | 12th–13th c Cistercians | 100 |
| Group 1 | 1 | 661 | Congregation of Savigny | 100 |
| Group 1 | 1 | 899 | Nahdlatul Ulama (NU) | 100 |
| Group 1 | 1 | 943 | Moravian Missionaries in Nunatsiavut | 100 |
| Group 1 | 1 | 963 | The Knights Templar | 100 |
| Group 1 | 1 | 966 | Naqshbandī Order, Naqshbandī Tarīqa, Naqshbandīyyah, Khwājagān, | 100 |
| Group 1 | 1 | 974 | Greek Chalcedonian Christians, Nicaea | 100 |
| Group 1 | 1 | 1293 | The Zapotec or Ben ’Zaa (The Cloud People) | 100 |
| Group 1 | 1 | 676 | The Order of the Holy Trinity for the Redemption of Captives, 1198–1500 | 55.01 |
| Group 1 | 1 | 1466 | Islamic modernists | 51.06 |
| Group 1 | 1 | 900 | Pharisees | 50.63 |
| Group 1 | 1 | 965 | Pachomian Monasticism | 44.75 |
| Group 1 | 1 | 888 | Céli Dé monks | 38.77 |
| Group 1 | 1 | 843 | Bnay Qyāmā and Bnāt Qyāmā | 29.88 |
| Group 1 | 1 | 691 | Fur | 28.28 |
| Group 1 | 1 | 1268 | Monotheistic Pre-Islamic South Arabia | 26.53 |
| Group 1 | 2 | 476 | Cham Bani | 100.00 |
| Group 1 | 2 | 738 | Ancient Egyptian | 100 |
| Group 1 | 2 | 788 | The Late Bronze Age City-State of Ugarit | 100 |
| Group 1 | 2 | 970 | Cult of Isis (Mysteries of Isis) | 100 |
| Group 1 | 2 | 1006 | Ancient Egypt—Old Kingdom | 100.00 |
| Group 1 | 2 | 1008 | Ancient Egypt—First Intermediate Period | 100 |
| Group 1 | 2 | 1125 | Ancient Egypt—the Ramesside Period | 100 |
| Group 1 | 2 | 1069 | Religion at Deir el-Medina | 95.76 |
| Group 1 | 2 | 1162 | Lingbao dafa | 93.52 |
| Group 1 | 2 | 1258 | Religion in Judah | 93.52 |
| Group 1 | 2 | 1191 | Religion in Greco-Roman Alexandria | 75.48 |
| Group 1 | 2 | 228 | Late Chosŏn Korea | 53.15 |
| Group 1 | 2 | 1149 | Christianity in Tang China | 35.99 |
| Group 1 | 3 | 632 | Local Religion at Selinous | 100 |
| Group 1 | 3 | 931 | The Society of Jesus (Jesuits) in Britain | 100 |
| Group 1 | 3 | 1140 | Early Medieval Confucianism | 100 |
| Group 1 | 3 | 354 | Priests and Scholars of Hellenistic Uruk | 98.01 |
| Group 1 | 3 | 1004 | Religion in Roman Ostia | 51.50 |
| Group 1 | 3 | 1227 | Polytheistic Pre-Islamic South Arabia | 32.14 |
| Group 1 | 5 | 1043 | Islam in Aceh | 100.00 |
| Group 1 | 5 | 1108 | Early Christianity and Monasticism in Egypt | 83.98 |
| Group 1 | 6 | 852 | Pre-Christian Religion/Paganism in Ireland | 82.15 |
| Group 1 | 6 | 1231 | Shenxiao (“Divine Empyrean”) Daoism | 54.45 |
| Group 1 | 6 | 960 | Bön (Bon) | 52.93 |
| Group 1 | 7 | 1218 | Yiguan Dao/I-Kuan Tao | 90.44 |
| Group 1 | 9 | 358 | 16th-17th c. Gaudiya Vaisnava Tradition | 73.72 |
| Group 1 | 9 | 590 | Huayan School (Early Tang) | 51.70 |
| Group 1 | 9 | 1172 | Twofold Mystery (Chongxuan) | 33.31 |
| Group 1 | 11 | 914 | Tariqa Shadhiliyya | 100 |
| Group 1 | 11 | 1370 | Teda | 69.30 |
| Group 1 | 11 | 994 | Monastic Communities of Lower Egypt: Nitria, Kellia, Scetis | 35.27 |
| Group 1 | 11 | 456 | The Essenes | 31.62 |
| Group 1 | 12 | 416 | Edinoverie | 100.00 |
| Group 1 | 12 | 984 | Calvinism (Early/Reformation) | 100 |
| Group 1 | 15 | 442 | Donatism | 100.00 |
| Group 1 | 15 | 483 | Northern Irish Roman Catholics | 100 |
| Group 1 | 15 | 989 | Opus Dei | 100.00 |
| Group 1 | 15 | 1038 | Sino-Muslims in Qing China | 100 |
| Group 1 | 15 | 1321 | Mourides (Muridiyya) | 100 |
| Group 1 | 15 | 387 | Ahmadi; Ahmadiyya Muslim Jama’at; Ahmadiya | 50.15 |
| Group 1 | 15 | 927 | Zealots | 28.67 |
| Group 1 | 18 | 176 | Qumran Movement | 62.40 |
| Group 1 | 26 | 1295 | Donatism | 77.18 |
| Group 1 | 26 | 621 | Haitians | 33.15 |
| Group 1 | 28 | 869 | Sichuan Esoteric Buddhist Cult | 100 |
| Group 1 | 29 | 891 | Peruvian Mormons | 100 |
| Group 1 | 29 | 983 | Tibetan and Himalayan Mundane and Landscape Cults | 100 |
| Group 1 | 29 | 493 | Krishna Worship in North India—Modern Period | 32.01 |
| Group 1 | 29 | 441 | Worshipers of Śītalā | 29.47 |
| Group 1 | 31 | 972 | Nestorian Christianity | 86.73 |
| Group 1 | 31 | 1012 | Inhabitants of Medieval Kurgus | 83.58 |
| Group 1 | 31 | 944 | Mohyla’s Ukrainian Church | 76.62 |
| Group 1 | 33 | 200 | Nechung Cult | 100 |
| Group 1 | 33 | 390 | Dasara | 100 |
| Group 1 | 33 | 415 | Shaiva World Renouncers | 100 |
| Group 1 | 33 | 440 | Jain Digambara Tantra, Karnataka | 100 |
| Group 1 | 33 | 636 | Balinese Śaiva priests (pedanda siwa) | 68.74 |
| Group 1 | 38 | 1041 | Korean Catholicism | 100 |
| Group 1 | 39 | 850 | Pre-Christian Religion / Paganism in Gaul | 77.24 |
| Group 1 | 41 | 420 | Swaminarayan Sampraday | 100 |
| Group 1 | 41 | 227 | Hindu Goddess Worship in Northwest India—Modern Period | 83.93 |
| Group 1 | 41 | 623 | Diasporic American Hinduism | 32.95 |
| Group 1 | 42 | 1241 | Taiping Movement | 61.03 |
| Group 1 | 42 | 870 | Postsocialist Mongolian Buddhism | 49.11 |
| Group 1 | 42 | 934 | Mongolian Buddhism during the Revolutionary Period | 44.78 |
| Group 1 | 43 | 1192 | Kaharingan | 100.00 |
| Group 1 | 45 | 657 | Trukese | 23.23 |
| Group 1 | 49 | 667 | Ainu | 58.28 |
| Group 1 | 49 | 736 | Jivaro | 45.07 |
| Group 1 | 49 | 1210 | Mende | 44.09 |
| Group 1 | 49 | 717 | Igbo | 26.85 |
| Group 1 | 49 | 1110 | Twana | 18.96 |
| Group 1 | 59 | 563 | Gaddi, a Hindu community of the Western Himalayas | 100 |
| Group 1 | 63 | 1178 | Modern Mystery School (MMS) | 100 |
| Group 1 | 63 | 802 | Chinese Nunnery in Myanmar | 33.18 |
| Group 1 | 69 | 886 | Pushtimarg (The Path of Grace) in the UK and Gujarat | 100 |
| Group 1 | 75 | 714 | Tamil Śaiva Bhakti | 100 |
| Group 1 | 75 | 926 | Ladakhi Buddhism | 100.00 |
| Group 1 | 94 | 1247 | Exovedate | 100 |
| Group 1 | 94 | 744 | Rwala Bedouin | 27.13 |
| Group 1 | 108 | 526 | Hmong Christianity | 100.00 |
| Group 1 | 108 | 564 | Tribal Christianity (and allied castes) in the Himalayas | 100 |
| Group 1 | 110 | 686 | Popoluca | 53.45 |
| Group 1 | 124 | 887 | Postsocialist Mongolian Shamanism | 84.17 |
| Group 1 | 124 | 576 | Kuy traditional religions | 49.14 |
| Group 1 | 133 | 958 | Society of Jesus | 100 |
| Group 1 | 150 | 987 | Parsis, Zoroastrians of India | 100 |
| Group 2.1 | 8 | 855 | Middle-Class Migrant Muslims in the UAE | 100 |
| Group 2.1 | 8 | 1076 | Inquisitors of Goa’s Santo Ofício | 100 |
| Group 2.1 | 22 | 1196 | K’iche’ (Quiché) | 68.03 |
| Group 2.1 | 24 | 839 | 19th century German Protestantism | 100 |
| Group 2.1 | 24 | 1371 | Twelver Shi’ism in post-revolutionary Iran | 100 |
| Group 2.1 | 37 | 1374 | The Reformed Church (Early Orthodoxy) | 100 |
| Group 2.1 | 37 | 1522 | Tijaniyya Order/ | 100 |
| Group 2.1 | 37 | 906 | The Church of England | 62.60 |
| Group 2.1 | 54 | 609 | Tallensi | 28.36 |
| Group 2.1 | 58 | 977 | Chishti Sufis | 100 |
| Group 2.1 | 58 | 883 | Catholics in the People’s Republic of China (PRC) | 99.63 |
| Group 2.1 | 60 | 602 | Amhara | 26.91 |
| Group 2.1 | 71 | 1517 | Tunisian Women’s Associations | 55.28 |
| Group 2.1 | 79 | 1333 | Cult of Thecla | 90.43 |
| Group 2.1 | 85 | 935 | Nigerian Pentecostalism | 100 |
| Group 2.1 | 85 | 1127 | Anglican Church of Korea | 100 |
| Group 2.1 | 85 | 1376 | African Initiated Churches | 53.25 |
| Group 2.1 | 88 | 633 | Mādhva | 69.68 |
| Group 2.1 | 95 | 884 | Sub Saharan Africa Pentecostalism | 100 |
| Group 2.1 | 96 | 941 | Chan Buddhists in early Qing period | 86.26 |
| Group 2.1 | 96 | 1024 | Universal Salvation Ritual | 34.38 |
| Group 2.1 | 103 | 928 | The Ghost Dance Movement and the Lakota Sioux | 100 |
| Group 2.1 | 104 | 1349 | The Dingxiang Wang Cult | 61.80 |
| Group 2.1 | 130 | 419 | The Worship of Jagannath in Puri (Odisha) | 100 |
| Group 2.1 | 145 | 607 | Mohism | 66.52 |
| Group 2.1 | 146 | 637 | Yahgan | 90.63 |
| Group 2.2 | 4 | 873 | The Branch Davidians | 100 |
| Group 2.2 | 4 | 880 | Egyptian Salafism (inluding North Africa and West Asia) | 100 |
| Group 2.2 | 4 | 968 | Anabaptist Mennonites in North America, 1683–2021 | 100 |
| Group 2.2 | 4 | 988 | Churches of Christ- United States | 100 |
| Group 2.2 | 4 | 1311 | Jehovah’s Witnesses | 100 |
| Group 2.2 | 4 | 858 | The New Prophecy or “Montanism” | 87.89 |
| Group 2.2 | 4 | 897 | Gaengjeongyudo | 71.90 |
| Group 2.2 | 4 | 948 | Christianity in Ephesus | 46.18 |
| Group 2.2 | 4 | 196 | Pauline Christianity (ca. 45–60 CE) | 39.29 |
| Group 2.2 | 4 | 1309 | Circumcellions | 31.63 |
| Group 2.2 | 17 | 857 | Wesleyanism | 100 |
| Group 2.2 | 17 | 879 | Free Methodist Church | 100.00 |
| Group 2.2 | 17 | 942 | African Methodist Episcopal Church | 100 |
| Group 2.2 | 17 | 950 | The Religious Society of Friends | 100 |
| Group 2.2 | 17 | 975 | Neo-Charismatic Movement—Third Wave Charismatic Movement | 100.00 |
| Group 2.2 | 17 | 1334 | No-debt Movement in US Evangelicalism | 100.00 |
| Group 2.2 | 17 | 892 | Charismatic Renewal Movement in Christianity—Second Wave Pentecostalism | 96.88 |
| Group 2.2 | 36 | 1307 | Southern Baptists | 100 |
| Group 2.2 | 44 | 1392 | Messalians | 39.84 |
| Group 2.2 | 46 | 859 | Valentinians | 83.50 |
| Group 2.2 | 53 | 953 | Sachchai | 96.93 |
| Group 2.2 | 55 | 1010 | Pythagoreanism | 66.18 |
| Group 2.2 | 64 | 1013 | Estado da Índia Renegades in Deccan | 100 |
| Group 2.2 | 64 | 182 | Pauline Christianity | 63.55 |
| Group 2.2 | 109 | 1304 | Peyote Religion (Peyotism) and the Native American Church | 100 |
| Group 2.2 | 109 | 812 | Temple of the Jedi Order | 81.19 |
| Group 2.2 | 115 | 862 | Ilm-e-Khshnoom | 100.00 |
| Group 2.2 | 125 | 842 | American Evangelicalism | 100.00 |
| Group 2.2 | 125 | 915 | Protestantism welcoming People with Disabilities | 100.00 |
| Group 2.2 | 129 | 871 | Spiritualism | 100 |
| Group 3.1 | 76 | 768 | Mundurucu | 35.17 |
| Group 3.1 | 76 | 723 | Trumai | 21.93 |
| Group 3.1 | 80 | 1511 | Sokoto | 53.77 |
| Group 3.1 | 92 | 710 | Hidatsa | 50.32 |
| Group 3.1 | 105 | 689 | Lengua | 21.84 |
| Group 3.1 | 127 | 677 | Yapese | 44.57 |
| Group 3.1 | 128 | 658 | Mapuche | 87.90 |
| Group 3.1 | 139 | 769 | Wogeo | 46.72 |
| Group 3.1 | 143 | 764 | Timbira (Canela) | 95.67 |
| Group 3.1 | 143 | 1228 | Lesu | 41.89 |
| Group 3.1 | 149 | 1240 | Azande | 63.60 |
| Group 3.1 | 149 | 737 | Nama Hottentot | 51.49 |
| Group 3.1 | 149 | 1458 | Mbuti | 35.30 |
| Group 3.2 | 10 | 1251 | Tsonga | 10 |
| Group 3.2 | 10 | 794 | Omaha | 82.23 |
| Group 3.2 | 10 | 389 | Iban traditional religion | 68.05 |
| Group 3.2 | 10 | 492 | Roman Divination | 59.68 |
| Group 3.2 | 10 | 748 | Lamet | 41.79 |
| Group 3.2 | 10 | 729 | Kikuyu | 32.93 |
| Group 3.2 | 10 | 1434 | Kuna | 28.65 |
| Group 3.2 | 10 | 739 | Lakalai | 27.59 |
| Group 3.2 | 13 | 230 | Religion in Mesopotamia | 100 |
| Group 3.2 | 13 | 1129 | Ancient Thessalians | 100 |
| Group 3.2 | 13 | 1337 | Tell Afis (Syria) | 55.01 |
| Group 3.2 | 13 | 757 | Kiribati | 32.51 |
| Group 3.2 | 13 | 1101 | Natchez | 22.65 |
| Group 3.2 | 19 | 685 | Badjau | 82.79 |
| Group 3.2 | 20 | 1323 | Luguru | 69.58 |
| Group 3.2 | 21 | 993 | Pontifex Maximus and Pontifices (Pontifical College) | 100 |
| Group 3.2 | 23 | 424 | Achaemenid Religion | 89.49 |
| Group 3.2 | 23 | 681 | Sargonic Empire | 50.82 |
| Group 3.2 | 25 | 534 | Roman Imperial Cult | 100 |
| Group 3.2 | 25 | 1051 | Religion at Tell el-Dab’a (ancient Avaris) in Ancient Egypt | 98.74 |
| Group 3.2 | 25 | 1189 | Yādiya/Sam’al | 90.74 |
| Group 3.2 | 25 | 479 | Mesopotamian city-state cults of the Early Dynastic periods | 37.75 |
| Group 3.2 | 27 | 1248 | Religion in the Old Assyrian Period | 100 |
| Group 3.2 | 27 | 712 | Huichol | 27.33 |
| Group 3.2 | 27 | 1015 | Ancient Egypt—Predynastic Period—Early Naqada Culture | 27.12 |
| Group 3.2 | 32 | 721 | Semang | 89.41 |
| Group 3.2 | 32 | 651 | Kapauku | 56.67 |
| Group 3.2 | 35 | 470 | Archaic Spartan Cults | 69.65 |
| Group 3.2 | 48 | 224 | Old Norse Fornsed | 42.32 |
| Group 3.2 | 57 | 581 | Mentawai (Rereiket) | 39.64 |
| Group 3.2 | 57 | 620 | Papago | 25.17 |
| Group 3.2 | 77 | 662 | Goajiro | 51.79 |
| Group 3.2 | 77 | 578 | Gond | 42.36 |
| Group 3.2 | 81 | 650 | Manus | 61.11 |
| Group 3.2 | 81 | 626 | Barama River Carib | 52.54 |
| Group 3.2 | 81 | 614 | !Kung | 39.80 |
| Group 3.2 | 87 | 1246 | Shilluk | 58.89 |
| Group 3.2 | 87 | 733 | Kaska | 50.10 |
| Group 3.2 | 99 | 722 | Gros Ventre | 92.31 |
| Group 3.2 | 101 | 918 | Fangshi | 56.42 |
| Group 3.2 | 120 | 257 | Classic Zapotec | 73.85 |
| Group 3.2 | 120 | 671 | Mbau Fijians | 69.48 |
| Group 3.2 | 120 | 1420 | Creek | 63.49 |
| Group 3.2 | 120 | 732 | Marquesans | 19.45 |
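The assignment rule described in the Table A3 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the `prob` callable (any function returning a model probability for a configuration), and the `community_of` lookup (covering only the most probable configurations) are hypothetical. It implements the "highest summed mass" reading of the caption, and it assumes that a weight is the share of a religion's total probability mass, taken over all completions of its missing answers, that falls in the winning community.

```python
from itertools import product

def completions(record):
    """All fully specified configurations consistent with a record.

    record: tuple of +1, -1, or None, where None marks a missing answer.
    """
    options = [(v,) if v is not None else (-1, 1) for v in record]
    return list(product(*options))

def community_weights(record, prob, community_of):
    """Share of the record's probability mass falling in each community.

    prob: callable, configuration -> model probability (unnormalized is fine).
    community_of: dict, configuration -> community label (top configurations only).
    Normalizing over every completion is an assumption of this sketch.
    """
    configs = completions(record)
    total = sum(prob(c) for c in configs)
    mass = {}
    for c in configs:
        label = community_of.get(c)
        if label is not None:
            mass[label] = mass.get(label, 0.0) + prob(c) / total
    return mass

def assign(record, prob, community_of):
    """Return (community, weight in percent), or None if nothing overlaps."""
    mass = community_weights(record, prob, community_of)
    if not mass:
        return None  # such a record would be listed in Table A4 instead
    best = max(mass, key=mass.get)
    return best, 100.0 * mass[best]
```

Under these assumptions a complete record that matches one of the top configurations gets a weight of 100, and a record with k missing answers spreads its mass over up to 2^k completions.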
Table A4. Religious cultures in our dataset that do not have configurations (or even possible configurations) which overlap with the 150 most probable configurations visualized in Figure 4.
| DRH ID | Entry Name (DRH) |
|---|---|
| 23 | Late Shang Religion |
| 173 | Johannine Christianity |
| 174 | Matthew-James-Didache Movement |
| 190 | Sri Lankan Buddhism (1948-Present) |
| 204 | Han Confucianism |
| 217 | Roman private religion |
| 222 | Late Classic Lowland Maya |
| 231 | Roman private religion |
| 263 | Irish Catholicism |
| 284 | Yolngu religion |
| 294 | Xuanxue |
| 308 | Neo-Assyrian Scholars |
| 310 | Aztec Imperial Core |
| 364 | Chittagong Plain Buddhists |
| 378 | Bahinabai Chaudhari’s Songs: A Performance Tradition in Maharashtra |
| 381 | Sikhism: Guru Nanak to Guru Arjan |
| 383 | Varkaris |
| 392 | Sikhism: Guru Hargobind to Guru Gobind Singh |
| 395 | International Society for Krishna Consciousness (ISKCON) |
| 400 | Singaporean Mega-Churches |
| 422 | Demeter Cult |
| 439 | Śaiva Magic |
| 455 | Lan Na Buddhism |
| 469 | Won-Buddhism |
| 472 | Karma Kagyu or Kamtsang Kagyu |
| 477 | Tractarian Movement |
| 478 | Early Indian Buddhism |
| 484 | Northern Irish Protestants |
| 485 | Sámi pre-Christian religion |
| 486 | Church of Jesus Christ of Latter-day Saints (early) |
| 490 | Guglielmites |
| 491 | Anglican Church |
| 520 | Cham Ahiér |
| 525 | Church of Jesus Christ of Latter-day Saints (modern) |
| 535 | Pāśupatas |
| 562 | Medieval Srivaisnavism |
| 570 | Supreme Master Ching Hai World Society |
| 580 | Nāth Sampradāya |
| 582 | Siamese (Central Thai) |
| 586 | Kelantanese Thai Religion |
| 589 | Warrau |
| 592 | Veerashaivas |
| 597 | Confucianism - Eastern Zhou |
| 599 | Lepcha |
| 605 | Burmese |
| 613 | Maori |
| 618 | Nuxalk |
| 619 | Gilyak |
| 624 | The Roshaniyya |
| 625 | Copper Inuit |
| 627 | Tikopia |
| 629 | Newar Buddhists |
| 630 | Aranda |
| 631 | Cayapa |
| 635 | Ifugao |
| 638 | Klamath |
| 639 | Chinese Esoteric Buddhism (Tang Tantrism) |
| 645 | Meo Muslim, Mev, Mewati Muslim |
| 646 | Lakota Religious Traditions |
| 647 | Havasupai |
| 649 | Worshippers at the Chidambaram Nataraja Temple, Modern Period |
| 652 | Tiwi |
| 655 | Siriono |
| 660 | Iban |
| 666 | Siuai |
| 669 | Orokaiva |
| 674 | Aymara |
| 675 | Chukchee |
| 679 | Santal |
| 680 | Gujarati Mata Worshipers |
| 682 | Lakher |
| 684 | Huron |
| 688 | Darul Uloom Deoband |
| 690 | Ahl-e-Sunnat wa Jamaat |
| 711 | Kwoma |
| 713 | Northern Saulteaux |
| 719 | Hinduism in Trinidad |
| 726 | Early Sramanas |
| 727 | Raglai |
| 742 | Thai Bhikkhunis |
| 745 | Buka |
| 749 | Korean shamanism |
| 751 | Buddhism in the Mekong Delta |
| 752 | Haroi |
| 755 | Uyghur Islam |
| 759 | Comanche |
| 765 | The Oneida Community |
| 770 | Thai Forest Tradition |
| 771 | Muscular Christianity |
| 826 | The Church of Christ, Scientist |
| 833 | Dobu |
| 841 | Sa skya |
| 846 | Xuanzang’s Yogācāra Tradition |
| 848 | Sadducees |
| 849 | “Gaiwiio Religion,” “Longhouse Religion,” or “The Way of Handsome Lake” of the Seneca Tribe |
| 851 | Sikhism in the United States |
| 854 | pre-Christian Irish |
| 860 | Kimpa Vita |
| 867 | Nyingma Treasure |
| 877 | The International Network of Engaged Buddhists - INEB |
| 882 | Orphism |
| 885 | Contemporary West African Vodun |
| 893 | Sethian Gnostic |
| 894 | Universal Fellowship of Metropolitan Community Churches |
| 896 | Sannō Shintō |
| 898 | Rastafari of Jamaica |
| 910 | Christian Base Community movement |
| 919 | Catholicism in contemporary Croatia |
| 921 | Bhils |
| 924 | Muridiyya of Senegal |
| 925 | Rabbinic Judaism (Babylonia) |
| 929 | Drikung Kagyu |
| 933 | Marcionites |
| 937 | Vestal Virgins |
| 938 | Ugarit |
| 939 | Goodenough and Fergusson Islanders |
| 940 | Julio-Claudian Imperial Cult |
| 946 | Nyingma (rnying ma) |
| 949 | The Church of All Worlds |
| 952 | Ethiopian Jews |
| 957 | Spartan Religion |
| 961 | Romanian Orthodox Church |
| 962 | Amdo Gelukpa |
| 964 | Baul Fakirs of Bengal |
| 967 | Unitarian Universalism |
| 969 | Adi Dravida/Valluva Sakya Buddhism |
| 971 | Ganapatya |
| 973 | The Sarna religion of the Oraons of Jharkhand |
| 976 | Mising Community |
| 978 | Indonesian Catholicism |
| 980 | Pure Land Buddhist Schools in Early Medieval Japan |
| 981 | Nation of Islam |
| 982 | The Samaritans (Persian to early Roman periods) |
| 985 | The Victorines |
| 986 | Theurgy |
| 991 | Beat Buddhism |
| 995 | Tamil Neo-Saivism |
| 997 | Ancient Egypt - Early Dynastic Period |
| 1028 | German Pietists (Hasidei Ashkenaz) |
| 1037 | Third Intermediate Period in Ancient Egypt |
| 1044 | Han Imperial Cult under Emperor Wu |
| 1060 | The Taizhou Movement |
| 1071 | Digital Shinto Communities |
| 1083 | Drukpa Kagyü School (Bhutan) |
| 1087 | The Bogomils |
| 1106 | Old Kingdom Religion at Abydos |
| 1109 | Chan Buddhism in the Song |
| 1133 | Religion in Greco-Roman Egypt |
| 1134 | Tiantai |
| 1136 | The Fellowship of Goodness (Tongshanshe) |
| 1153 | Russian Orthodox Mission in Alaska |
| 1154 | The Cult of the Fox |
| 1156 | Solovetski monastery |
| 1180 | Batak Traditional Religions |
| 1183 | Religion at Nippur in the Ur III period |
| 1197 | Atheism in the Soviet Union |
| 1199 | Tiv |
| 1223 | Toda |
| 1234 | Ashanti |
| 1279 | Mandarese Muslims |
| 1283 | Butonese Muslims |
| 1284 | Religion of Phoenicia |
| 1289 | Buginese Muslims |
| 1299 | Russians (of Viriatino Village) |
| 1300 | Enlace de Agentes de Pastoral Indígena (EAPI, Network of Indigenous Ministry Agents) |
| 1301 | Moche (Mochica) |
| 1312 | Bishnoi |
| 1319 | Reginistas |
| 1322 | Pagans under the Emperor Julian |
| 1335 | Congregation of the Oratory |
| 1341 | Muslim Students Association of the United States and Canada |
| 1344 | Tibetan Nonsectarianism (ris med) |
| 1352 | Sakadvipiya Brahmanas |
| 1357 | Encratites |
| 1386 | Liumen (Liu School) |
| 1390 | Eastern Apache |
| 1396 | Ptolemaic Egypt—Egyptian Religion |
| 1397 | The Classic Period Peripheral Coastal Lowlands Ritual Ballgame Cult |
| 1409 | Formative Olmec |
| 1412 | Tenrikyo |
| 1415 | Anomeans |
| 1419 | Ancient Egyptian Religion in the Early 18th Dynasty |
| 1426 | Early Missionary Christianity in China |
| 1433 | Novatians |
| 1436 | The Monastic School of Gaza |
| 1441 | Mexica (Aztec) Religion |
| 1454 | Hesychastic Controversy |
| 1468 | Eastern Christianity From Nicaea to Chalcedon |
| 1495 | Secular Buddhists |
| 1521 | Order of the Hermits of St Augustine (Augustinian friars) |
| 1542 | Umbanda |

References

  1. Clarke, D. Archaeology: The loss of innocence. Antiquity 1973, 47, 6–18.
  2. Henrich, J.; Heine, S.J.; Norenzayan, A. The WEIRDest people in the world? Behav. Brain Sci. 2010, 33, 61–83.
  3. Smail, D.L. On Deep History and the Brain; University of California Press: Oakland, CA, USA, 2007.
  4. Slingerland, E.; Sullivan, B. Durkheim with data: The database of religious history. J. Am. Acad. Relig. 2017, 85, 312–347.
  5. Sohl-Dickstein, J.; Battaglino, P.; DeWeese, M.R. Minimum probability flow learning. In Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011; Omnipress: Madison, WI, USA, 2011; pp. 905–912.
  6. Slingerland, E.; Monroe, M.W.; Muthukrishna, M. The Database of Religious History (DRH): Ontology, Coding Strategies and the Future of Cultural Evolutionary Analyses. Relig. Brain Behav. 2022, in press.
  7. Wilson, A.S.; Brown, E.L.; Villa, C.; Lynnerup, N.; Healey, A.; Ceruti, M.C.; Reinhard, J.; Previgliano, C.H.; Araoz, F.A.; Diez, J.G.; et al. Archaeological, radiological, and biological evidence offer insight into Inca child sacrifice. Proc. Natl. Acad. Sci. USA 2013, 110, 13322–13327.
  8. DeDeo, S. Major transitions in political order. In From Matter to Life: Information and Causality; Walker, S.I., Davies, P.C.W., Ellis, G.F.R., Eds.; Cambridge University Press: Cambridge, UK, 2017; pp. 393–428.
  9. Bubeck, S.; Sellke, M. A Universal Law of Robustness via Isoperimetry. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–14 December 2021; Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2021; Volume 34, pp. 28811–28822.
  10. Ackley, D.H.; Hinton, G.E.; Sejnowski, T.J. A learning algorithm for Boltzmann machines. Cogn. Sci. 1985, 9, 147–169.
  11. Tkačik, G.; Marre, O.; Mora, T.; Amodei, D.; Berry, M.J., II; Bialek, W. The simplest maximum entropy model for collective behavior in a neural network. J. Stat. Mech. Theory Exp. 2013, 2013, P03011.
  12. Schneidman, E.; Berry, M.J.; Segev, R.; Bialek, W. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 2006, 440, 1007–1012.
  13. Mora, T.; Walczak, A.M.; Bialek, W.; Callan, C.G., Jr. Maximum entropy models for antibody diversity. Proc. Natl. Acad. Sci. USA 2010, 107, 5405–5410.
  14. Louie, R.H.Y.; Kaczorowski, K.J.; Barton, J.P.; Chakraborty, A.K.; McKay, M.R. Fitness landscape of the human immunodeficiency virus envelope protein that is targeted by antibodies. Proc. Natl. Acad. Sci. USA 2018, 115, E564–E573.
  15. Bialek, W.; Cavagna, A.; Giardina, I.; Mora, T.; Silvestri, E.; Viale, M.; Walczak, A.M. Statistical mechanics for natural flocks of birds. Proc. Natl. Acad. Sci. USA 2012, 109, 4786–4791.
  16. Daniels, B.C.; Krakauer, D.C.; Flack, J.C. Control of finite critical behaviour in a small-scale social system. Nat. Commun. 2017, 8, 1–8.
  17. Lee, E.D.; Broedersz, C.P.; Bialek, W. Statistical mechanics of the US Supreme Court. J. Stat. Phys. 2015, 160, 275–301.
  18. Lee, E.D. Partisan intuition belies strong, institutional consensus and wide Zipf’s law for voting blocs in US Supreme Court. J. Stat. Phys. 2018, 173, 1722–1733.
  19. Stephens, G.J.; Bialek, W. Statistical mechanics of letters in words. Phys. Rev. E 2010, 81, 66119.
  20. Miton, H.; DeDeo, S. The cultural transmission of tacit knowledge. J. R. Soc. Interface 2022, 19, 20220238.
  21. Sherrington, D.; Kirkpatrick, S. Solvable model of a spin-glass. Phys. Rev. Lett. 1975, 35, 1792.
  22. Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. 1957, 106, 620.
  23. Stephens, G.J.; Osborne, L.C.; Bialek, W. Searching for simplicity in the analysis of neurons and behavior. Proc. Natl. Acad. Sci. USA 2011, 108, 15565–15571.
  24. Hillar, C.; Sohl-Dickstein, J.; Koepsell, K. Efficient and optimal binary Hopfield associative memory storage using minimum probability flow. In Proceedings of the Neural Information Processing Systems (NeurIPS) Workshop on Discrete Optimization in Machine Learning (DISCML), Virtual, 6–14 December 2021.
  25. Nguyen, H.C.; Zecchina, R.; Berg, J. Inverse statistical problems: From the inverse Ising problem to data science. Adv. Phys. 2017, 66, 197–261.
  26. Hinton, G.E. Training products of experts by minimizing contrastive divergence. Neural Comput. 2002, 14, 1771–1800.
  27. Lee, E.D.; Daniels, B.C. Convenient Interface to Inverse Ising (ConIII): A Python 3 Package for Solving Ising-Type Maximum Entropy Models. J. Open Res. Softw. 2018, 7.
  28. Bickel, P.J.; Li, B.; Tsybakov, A.B.; van de Geer, S.A.; Yu, B.; Valdés, T.; Rivero, C.; Fan, J.; van der Vaart, A. Regularization in statistics. Test 2006, 15, 271–344.
  29. Battaglino, P.B. Minimum Probability Flow Learning: A New Method For Fitting Probabilistic Models. Ph.D. Thesis, University of California, Berkeley, CA, USA, 2014.
  30. Slingerland, E.; Atkinson, Q.D.; Ember, C.R.; Sheehan, O.; Muthukrishna, M.; Bulbulia, J.; Gray, R.D. Coding culture: Challenges and recommendations for comparative cultural databases. Evol. Hum. Sci. 2020, 2, e29.
  31. Spicer, R.; Monroe, M.W.; Hamm, M.; Danielson, A.; Canlas, G.; Randall, I.; Slingerland, E. Religion and ecology: A pilot study employing the database of religious history. Curr. Res. Ecol. Soc. Psychol. 2022, 3, 100073.
  32. Kline, D.M.; Berardi, V.L. Revisiting squared-error and cross-entropy functions for training neural network classifiers. Neural Comput. Appl. 2005, 14, 310–318.
  33. Norenzayan, A. Big Gods: How Religion Transformed Cooperation and Conflict; Princeton University Press: Princeton, NJ, USA, 2013.
  34. Fruchterman, T.M.; Reingold, E.M. Graph drawing by force-directed placement. Softw. Pract. Exp. 1991, 21, 1129–1164.
  35. Ellson, J.; Gansner, E.; Koutsofios, L.; North, S.C.; Woodhull, G. Graphviz—Open source graph drawing tools. In Proceedings of the International Symposium on Graph Drawing, Vienna, Austria, 23–26 September 2001; pp. 483–484.
  36. Bellah, R.N. Religion in Human Evolution: From the Paleolithic to the Axial Age; Harvard University Press: Cambridge, MA, USA, 2017.
  37. Whitehouse, H.; François, P.; Savage, P.E.; Hoyer, D.; Feeney, K.C.; Cioni, E.; Purcell, R.; Larson, J.; Baines, J.; Haar, B.t.; et al. Testing the Big Gods hypothesis with global historical data: A review and “retake”. Relig. Brain Behav. 2022, 1–43.
  38. Luhrmann, T. How God Becomes Real: Kindling the Presence of Invisible Others; Princeton University Press: Princeton, NJ, USA, 2020.
  39. Mehta, P.; Schwab, D.J. An exact mapping between the variational renormalization group and deep learning. arXiv 2014, arXiv:1410.3831.
  40. Towns, J.; Cockerill, T.; Dahan, M.; Foster, I.; Gaither, K.; Grimshaw, A.; Hazlewood, V.; Lathrop, S.; Lifka, D.; Peterson, G.D.; et al. XSEDE: Accelerating scientific discovery. Comput. Sci. Eng. 2014, 16, 72–74.
  41. Brown, S.T.; Buitrago, P.; Hanna, E.; Sanielevici, S.; Scibek, R.; Nystrom, N.A. Bridges-2: A platform for rapidly-evolving and data intensive research. Practice and Experience in Advanced Research Computing. PEARC Conf. Ser. 2021, 1–4.
  42. Gansner, E.R.; Koren, Y.; North, S. Graph drawing by stress majorization. In Proceedings of the International Symposium on Graph Drawing, New York, NY, USA, 29 September–2 October 2004; pp. 239–250.
  43. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  44. Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008.
Figure 1. Regularization corrects for overfitting. A sample reconstruction of the $2^{20}$ (≈1 million) probabilities for a landscape, based on 256 datapoints. Without the regularization constraint (red points), the model underestimates the probabilities of some reasonably common configurations. The effect is largely controlled for when using regularization with cross-validation (blue points).
Figure 2. An example of how Partial-MPF adapts the baseline MPF algorithm to make use of partial data. We begin with 128 complete samples of a particular 20-question landscape (drawn from a distribution with β equal to 0.2), and then add additional, incomplete samples where five of the 20 questions are marked unknown. As more, but incomplete, data is added, the Partial-MPF fit (blue line) continues to improve, though not as fast as when the additional data is complete (yellow line). By contrast, the naive strategy (red line) often makes the fit worse, because imputation destroys implicit correlations.
Figure 3. The logic of the cultural landscape (A), compared to the surface-level correlations (B). Nodes represent questions; see Table A1 for the question text. (A) Edges represent the fifteen strongest pairwise couplings ($J_{ij}$) between questions, as inferred by Partial-MPF; nodes (questions) are colored by the value of the local fields $h_i$. (B) Edges represent the fifteen strongest Pearson correlations; nodes are colored by the observed mean. Node placement (layout) is explained in Appendix A.1.
Figure 4. (A) (left) shows the 150 configurations that have the highest probability mass according to our model. We only show edges between configurations (nodes) that are immediate neighbors (separated by a Hamming distance of 1). Nodes are scaled by the probability mass assigned to each configuration, and edges are scaled by the product of the probability mass of the nodes that they connect. Colors are assigned to each of five groups based on hierarchical clustering (see Appendix A.2 and the dendrogram, Figure A1). (B) shows the 50 most probable configurations in the local neighborhood of the Free Methodist Church, while (C) shows the 50 most probable configurations in the local neighborhood of the Roman Imperial Cult. In all cases (A–C), the layout is determined by a force-directed placement algorithm [34] as implemented in Graphviz [35]. For more on the layout approach, see Appendix A.1.
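The edge rule in the Figure 4 caption (connect configurations that differ in exactly one answer, and scale each edge by the product of the two probability masses) can be sketched as below. This is an illustrative sketch only: the function names and the dictionary input are hypothetical, and the layout step (force-directed placement in Graphviz) is not reproduced here.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length configurations differ."""
    return sum(x != y for x, y in zip(a, b))

def neighbor_edges(probabilities):
    """Edges between configurations at Hamming distance 1.

    probabilities: dict mapping configuration tuple -> probability mass.
    Returns a list of (config_a, config_b, weight), where weight is the
    product of the two probability masses.
    """
    edges = []
    for a, b in combinations(probabilities, 2):
        if hamming(a, b) == 1:
            edges.append((a, b, probabilities[a] * probabilities[b]))
    return edges
```

For the 150 configurations of panel (A) this pairwise scan involves only 150 × 149 / 2 = 11,175 comparisons, so a more elaborate neighbor lookup is not needed at this scale.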
Table 1. Cross-validation can recover near-optimal sparsity parameters. Without sparsity, MPF consistently overfits to observed data. Reconstruction with 20 nodes (210 parameters), and 128 data points (i.e., the undersampled regime). The more computationally expensive $N_2$ strategy does not improve significantly over the simpler $N_1$.

n = 20, 128 points

| β Range | Optimal KL | KL with CV | Standard MPF |
|---|---|---|---|
| 0.01–0.125 (dispersed) | 0.22 | 0.23 | 1.2 |
| 0.125–0.25 (ordered) | 0.55 | 0.56 | 2.3 |
| 0.25–0.5 (near critical) | 0.62 | 0.63 | 19.4 |
| 0.5–1.0 (critical) | 0.50 | 0.54 | 9.5 |
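As a quick check on the counts quoted in the Table 1 caption, and assuming the usual pairwise parameterization with one local field $h_i$ per question and one coupling $J_{ij}$ per unordered pair of questions, the number of parameters for $N = 20$ is

$$
N + \binom{N}{2} = 20 + \frac{20 \cdot 19}{2} = 20 + 190 = 210,
$$

while the number of configurations is $2^{20} = 1{,}048{,}576$. With 128 data points there are fewer observations than parameters, which is consistent with the caption's description of this setting as the undersampled regime.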
Table 2. Reweighting observations can correct for sample bias.

| β Range | Ideal | Biased sample: KL (corrected) | Biased sample: KL (naive) | Biased sample: bias against minority (corrected) | Biased sample: bias against minority (naive) |
|---|---|---|---|---|---|
| 0.01–0.125 (dispersed) | 0.13 | 0.16 | 0.16 | −0.2% | −14% |
| 0.125–0.25 (ordered) | 0.34 | 0.45 | 0.46 | −0.1% | −40% |
| 0.25–0.5 (near-critical) | 0.43 | 0.55 | 0.60 | 9.6% | −51% |
| 0.5–1.0 (critical) | 0.48 | 0.57 | 0.71 | 0.1% | −65% |
Table 3. Using Partial-MPF to reconstruct landscapes in the presence of partially-observed data. While the “naive” strategy actually decreases the quality of the fit, Partial-MPF enables efficient use of partial observations to improve knowledge of the landscape.

| β Range | 128 Full | 128 Full + 128 Partial (Partial-MPF) | 128 Full + 128 Partial (Naive) | 256 Full |
|---|---|---|---|---|
| 0.01–0.125 (dispersed) | 0.23 | 0.17 | 0.23 | 0.15 |
| 0.125–0.25 (ordered) | 0.56 | 0.41 | 0.56 | 0.34 |
| 0.25–0.5 (near critical) | 0.63 | 0.44 | 0.81 | 0.40 |
| 0.5–1.0 (critical) | 0.54 | 0.41 | 1.06 | 0.38 |
Table 4. Predictions of the landscape model for Archaic Spartan Cults.

| | Small-Scale Ritual | No Small-Scale Ritual |
|---|---|---|
| Child sacrifice | 1.2% | 0.4% |
| No child sacrifice | 69.7% | 28.8% |
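Reading the four cells of Table 4 as a joint distribution over the two features, the marginal and conditional probabilities follow by simple arithmetic. The sketch below is only a worked example on the published percentages; the variable names are illustrative, and the renormalization step is there because the rounded entries sum to 100.1 rather than 100.

```python
# Joint predictions from Table 4, in percent (rounded values as published).
joint = {
    ("sacrifice", "ritual"): 1.2,
    ("sacrifice", "no ritual"): 0.4,
    ("no sacrifice", "ritual"): 69.7,
    ("no sacrifice", "no ritual"): 28.8,
}

# Renormalize so the four cells sum to exactly 1.
total = sum(joint.values())
p = {cell: value / total for cell, value in joint.items()}

# Marginals: sum over the other feature.
p_ritual = p[("sacrifice", "ritual")] + p[("no sacrifice", "ritual")]
p_sacrifice = p[("sacrifice", "ritual")] + p[("sacrifice", "no ritual")]

# Conditionals: joint cell divided by the marginal of the conditioning feature.
p_sacrifice_given_ritual = p[("sacrifice", "ritual")] / p_ritual
p_sacrifice_given_no_ritual = p[("sacrifice", "no ritual")] / (1.0 - p_ritual)

print(f"P(small-scale ritual)         = {p_ritual:.3f}")
print(f"P(child sacrifice)            = {p_sacrifice:.3f}")
print(f"P(sacrifice | ritual)         = {p_sacrifice_given_ritual:.3f}")
print(f"P(sacrifice | no ritual)      = {p_sacrifice_given_no_ritual:.3f}")
```

On these numbers the model puts roughly 71% of the mass on small-scale ritual being present and under 2% on child sacrifice, with the conditional probability of child sacrifice nearly the same whether or not small-scale ritual is present.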
Table 5. Observed configurations labelled in Figure 4A; DRH ID can be used as reference to the original source, e.g., DRH ID 654 (the Cistercians) can be found at https://religiondatabase.org/browse/654/#/.

| Group | DRH ID | Entry Name (DRH) |
|---|---|---|
| Group 1 | 654 | 12th–13th c. Cistercians |
| Group 1 | 738 | Ancient Egyptian |
| Group 1 | 931 | The Society of Jesus (Jesuits) in Britain |
| Group 1 | 1043 | Islam in Aceh |
| Group 1 | 852 | Pre-Christian Religion / Paganism in Ireland |
| Group 1 | 1218 | Yiguan Dao/I-Kuan Tao |
| Group 1 | 358 | 16th-17th c. Gaudiya Vaisnava Tradition |
| Group 1 | 456 | The Essenes |
| Group 1 | 984 | Calvinism (Early/Reformation) |
| Group 2 | 1311 | Jehovah’s Witnesses |
| Group 2 | 855 | Middle-Class Migrant Muslims in the UAE |
| Group 2 | 879 | Free Methodist Church |
| Group 2 | 839 | 19th century German Protestantism |
| Group 2 | 1307 | Southern Baptists |
| Group 2 | 906 | The Church of England |
| Group 2 | 1392 | Messalians |
| Group 2 | 859 | Valentinians |
| Group 2 | 609 | Tallensi |
| Group 2 | 1010 | Pythagoreanism |
| Group 2 | 883 | Catholics in the People’s Republic of China (PRC) |
| Group 2 | 1304 | Peyote Religion (Peyotism) |
| Group 3 | 1251 | Tsonga |
| Group 3 | 230 | Religion in Mesopotamia |
| Group 3 | 1323 | Luguru |
| Group 3 | 534 | Roman Imperial Cult |
| Group 3 | 723 | Trumai |
| Group 3 | 1511 | Sokoto |
| Group 3 | 710 | Hidatsa |
| Group 3 | 769 | Wogeo |
Table 6. Distinctive features of the five clusters in the landscape of Figure 4A; + indicates higher than average rates of “yes”; −, higher than average rates of “no”. See Appendix B Table A3 for the full list.

| Group | Color | Top Distinctive Practices |
|---|---|---|
| Group 1 | Red | + rituals (small, large); + monuments; + scriptures |
| Group 2 | Blue | − rituals (small, large); − grave goods; − special corpse treatment |
| Group 3 | Yellow | − scriptures; + grave goods; + co-sacrifices in tomb |
