Reinforcement Learning in a New Keynesian Model

Deák, Szabolcs; Levine, Paul; Pearlman, Joseph; Yang, Bo

doi:10.3390/a16060280


Reinforcement Learning in a New Keynesian Model^†

¹ Department of Economics, University of Exeter, Exeter EX4 4PU, UK

² School of Economics, University of Surrey, Guildford GU2 7XH, UK

³ Department of Economics, City University London, London EC1R 0JD, UK

⁴ Department of Economics, Swansea University, Swansea SA2 8PP, UK

^* Author to whom correspondence should be addressed.

^† This paper is an extended version of our paper presented at the CEF 2015 Conference, Taiwan, June 2015; the MMF 2016 Conference, University of Bath, September 2016; and workshops at the Bank of England, March 2015, the Tinbergen Institute, October 2015, and the Bank of Portugal, June 2018.

Algorithms 2023, 16(6), 280; https://doi.org/10.3390/a16060280

Submission received: 24 March 2023 / Revised: 22 May 2023 / Accepted: 22 May 2023 / Published: 31 May 2023

(This article belongs to the Special Issue Advancements in Reinforcement Learning Algorithms)


Abstract

We construct a New Keynesian (NK) behavioural macroeconomic model with bounded-rationality (BR) and heterogeneous agents. We solve and simulate the model using a third-order approximation for a given policy and evaluate its properties using this solution. The model is inhabited by fully rational (RE) and BR agents. The latter are anticipated utility learners, given their beliefs of aggregate states, and they use simple heuristic rules to forecast aggregate variables exogenous to their micro-environment. In the most general form of the model, RE and BR agents learn from their forecasting errors by observing and comparing them with each other, making the composition of the two types endogenous. This reinforcement learning is then at the core of the heterogeneous expectations model and leads to the striking result that increasing the volatility of exogenous shocks, by assisting the learning process, increases the proportion of RE agents and is welfare-increasing.

Keywords:

new Keynesian behavioural model; heterogeneous expectations; bounded rationality; reinforcement learning

1. Introduction

Since the bursting of the United States housing bubble in 2008, a large behavioural macroeconomics literature has emerged in response to what many regard as the extreme modelling assumption of rational (model-consistent) expectations, henceforth RE. The defining characteristic of this literature is that it limits the cognitive skills of at least one group of agents in the model. One strand achieves this by introducing simple ‘heuristic’ learning rules, which can be thought of as parsimonious forecasting rules (as in References [1,2,3,4]). This, we argue, fits well with the behavioural approach of assuming that agents have limited cognitive skills and behave according to bounded rationality, henceforth BR.

However, this raises the opposite concern regarding the bounds on BR: with heuristic rule-of-thumb behaviour, agents may fall considerably short of RE, and such models are particularly vulnerable to the Lucas critique when policy scenarios are studied. The problem is that agents can depart from rationality in an infinite number of ways, leading into the “wilderness of bounded rationality” problem of Reference [5]. The challenge posed by the wilderness is clearly demonstrated by the sheer size of the literature on behavioural macroeconomics and the large number of equilibria proposed. Surveys include References [6,7,8,9].

The concerns of behavioural models regarding RE are shared by the recent agent-based (AB) alternatives. This approach represents economic agents, as well as various social and environmental phenomena, as autonomous virtual entities that interact during simulation experiments following pre-defined rules. In standard macroeconomic models, agents’ decisions consist of behavioural equations or, in the case of dynamic stochastic general equilibrium (DSGE) models, micro-founded first-order conditions satisfying a dynamic optimisation problem, which are continuous functions of the current and past state of the economy. The AB approach provides a potentially more flexible way of modelling the cognitive capabilities of decision makers and their responses to both the macro- and individual micro-environment (for example, the authors of Reference [10] studied the inter-linkages between the real and financial sides of the economy using an AB framework in which different types of agents interact in different markets following simple heuristic rules).

When emotional states, cognitive limitations and past information play a key role in economic behaviour, the AB decision process is a promising approach for accounting for the behaviour of heterogeneous rule-possessing agents. In AB models, economies can exhibit out-of-equilibrium behaviour and non-market clearing and can be regarded as “evolving systems of autonomous interacting agents” (Reference [11]). Hence, while DSGE models assume that agents have very sophisticated computational capabilities and live in very simple environments, AB models assume that people use simple behavioural rules to cope with complex and dynamic environments. Many of the features of AB models in addition to non-RE, such as heterogeneous agents and unemployment, are now being incorporated into DSGE models. Bounded-rational behavioural models with learning can then be seen as a genre with both classical DSGE and AB modelling features (see Reference [12] for further discussion).

In response to the wilderness concern, the literature on BR models adopts a basic general heterogeneous expectations framework pioneered by Reference [13]. To limit the departure from rationality, the approach of reinforcement learning proposes that, although adaptation can be slow and there can be a random component of choice, the higher the “payoff” (defined appropriately) from taking an action in the past, the more likely it is to be taken in the future. We adopt a heterogeneous RE-BR model of this type. The idea behind this correction mechanism, in which agents evaluate the payoff function, is rooted in discrete choice theory, which is extensively studied in the fields of experimental economics and cognitive psychology. Recent studies have shown that, when managing their incentive structures, agents with market-consistent information may not follow rational choice theory and do not always correct irrational behaviour even if they have sufficient knowledge available to correct it (Reference [14]). Instead, Reference [15] conducted several experiments to analyse how agents decide between different alternatives. The results showed that people tend to evaluate their perceived efficacy in correcting the error by following rational principles based on cognitively assessing the costs and benefits (payoff) associated with the correction.

In addition to the selection mechanism, for given proportions of RE and BR agents, there then exists a choice of learning model: Euler learning versus the anticipated utility approach (following Reference [16]), henceforth EL and AU. In both approaches, agents cannot form model-consistent expectations. Under EL, agents forecast their own one-period-ahead decisions, whereas under AU, agents form beliefs over the future infinite time horizon of aggregate states and prices which are exogenous to their decisions (AU, also known as the “infinite time-horizon” framework, is closely related to the “internal rationality” (IR) approach of Reference [17]). Under both IR and AU, agents maximise utility, given their constraints and a consistent set of probability beliefs about payoff-relevant variables that are external. With IR, beliefs take the form of a well-defined probability measure over a stochastic process (the “fully Bayesian” plan). The authors of Reference [18] compared IR with AU and found that AU can closely approximate the fully Bayesian optimisation. The two approaches, EL and AU, then differ with respect to what agents learn about: their own one-period-ahead decisions for EL and variables exogenous to the agents for AU.

In this paper, we introduce heterogeneity in a full Brock–Hommes new Keynesian (NK) model with a composite specification of BR and RE agents allowing for a wealth distribution between the two groups. A third-order perturbation solution leads to a demonstration of the effects of reinforcement learning in our NK boundedly rational model environment. The primary interest of this paper is to study the effect of learning on the business cycle and its implications for the design of optimal policy strategies within the BR environment. To this end, the discussions are organised around a number of issues that we aim to address. Can our model with an endogenous selection mechanism generate endogenous persistence and non-normality in the frequency distribution of macroeconomic aggregates? Does the composition of the types of agents change with reinforcement learning and the nature of the shocks hitting the economy? What are the welfare implications based on a behavioural macroeconomic model of this type?

In particular, the main contributions of this paper are as follows: (1) we develop a micro-founded framework that models the endogenous composition of RE and non-RE agents with reinforcement learning along the lines of Reference [19]; (2) we carry out our simulations based on different parameterisations of the model and focus on an assessment of the model-implied moments, including the simulated impulse response functions. Furthermore, in Appendices A–G, we discuss the sources of instability and indeterminacy in our setup featuring BR agents who solve their decision problems using the EL and AU expectation formation schemes. The highly non-linear structure of the BR specification, in which agents endogenously select the heuristic rules, is crucial for conducting optimal policy in macroeconomic models.

Our paper aims to contribute to both the learning and macroeconomic literature. The investigation on the role of BR behaviour in understanding the dynamics in economic activity observed empirically and guiding policy choices is not a trivial one. Various attempts modify the baseline NK model to account for hybrid heterogeneous expectations and BR. An approach that is closely related to ours in this regard is from the earlier contributions of References [3,19,20], in which they studied calibrated composite heterogeneous expectations models of RE and BR agents and discuss implications for the business cycle and designing stabilisation policies. In our setting, we focus on the major BR approaches with reinforcement learning—a highly non-linear structure within BR which is methodologically relevant for capturing movements that are non-normally distributed in empirical data. We also investigate the effect on rationality when we subject our model to the occurrence of more volatile exogenous shocks.

The rest of the paper is structured as follows. Section 2 sets out the standard linear RE NK model used in the literature and then proceeds to the Brock–Hommes composite model of rational and boundedly rational agents. Section 3 goes back to the non-linear foundations of the model. Section 4 describes the specific market-consistent environment in which households and firms form their expectations. Then, Section 5 presents our main results. Section 5.3 discusses how we choose the set of parameter values that avoids chaotic dynamics. Finally, Section 6 concludes the paper. Appendices A–G contain further details and results on the model’s stability and the construction of the model.

2. The Standard Behavioural NK Model

This section discusses the standard behavioural NK model framework used by References [3,4,19,20,21,22] and others.

2.1. The Workhorse NK Model

We first set out the most basic three-equation linearised workhorse NK model with RE

\begin{matrix} y_{t} & = & E_{t} y_{t + 1} - (r_{n, t} - E_{t} π_{t + 1}) + u_{1, t} \end{matrix}

(1)

\begin{matrix} π_{t} & = & β E_{t} π_{t + 1} + κ y_{t} + u_{2, t} \end{matrix}

(2)

\begin{matrix} r_{n, t} & = & ρ_{r} r_{n, t - 1} + (1 - ρ_{r}) (θ_{π} π_{t} + θ_{y} y_{t}) + u_{3, t} \end{matrix}

(3)

where $y_{t}$, $π_{t}$ and $r_{n, t}$ are the output gap, the inflation rate and the nominal interest rate, respectively. All variables are expressed in log-deviation form about a zero net-inflation steady state. The shock processes $u_{i, t}, i = 1, 2, 3$ should be interpreted as exogenous shocks to demand (or preferences), the supply side, and monetary policy, respectively, and they are usually AR(1) processes. Expectations ($E_{t}$) up to now are formed assuming RE and perfect information of the state vector (which includes the shock processes). Equation (1) is the linearised Euler equation for consumption, which is equated with output in equilibrium (there is no government expenditure). Equation (2) is the NK Phillips curve, and (3) is the nominal interest rate rule in “implementable form” in that it responds to output relative to the steady state rather than the output gap (note that (1) assumes logarithmic utility and that the supply-side shock is a composite of technology and marginal cost processes in the model developed in this paper. The AR(1) feature of shock processes is criticised by Reference [4], as it implies that persistence is exogenously generated. This paper addresses this critique by developing strong endogenous persistence mechanisms through learning).

Before relaxing the RE assumption, two points about this formulation need to be made. First, there is neither a lagged term in $y_{t}$ in the demand curve (1) nor a lagged term in $π_{t}$ in the Phillips curve (2) (as, for example, in Reference [23]). These can enter through the introduction of external habits in the consumers’ utility function and price indexing, respectively, but we choose to focus on learning as a persistence mechanism; thus, both these features are omitted. Second, the linearisation, even without these persistence terms, is only correct for a zero-inflation steady state.
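To illustrate how (1)–(3) pin down an RE equilibrium, the following minimal sketch solves a special case by the method of undetermined coefficients. The restrictions are ours and are made purely for illustration: no interest-rate smoothing ($ρ_{r} = 0$), a single AR(1) demand shock with persistence `rho`, a determinate parameterisation, and placeholder parameter values that are not the paper's calibration. The guess is $y_{t} = a u_{1, t}$ and $π_{t} = b u_{1, t}$, so that $E_{t} y_{t + 1} = ρ y_{t}$ and $E_{t} π_{t + 1} = ρ π_{t}$.

```python
def solve_demand_shock_response(beta=0.99, kappa=0.1, theta_pi=1.5,
                                theta_y=0.5, rho=0.8):
    """Return (a, b) with y_t = a*u_t and pi_t = b*u_t solving (1)-(3)
    under the illustrative assumptions rho_r = 0 and u_2 = u_3 = 0."""
    # Phillips curve (2): b = beta*rho*b + kappa*a  =>  b = slope * a
    slope = kappa / (1.0 - beta * rho)
    # IS curve (1) with rule (3): a*(1 - rho + theta_y) + (theta_pi - rho)*b = 1
    a = 1.0 / ((1.0 - rho + theta_y) + (theta_pi - rho) * slope)
    return a, slope * a


def equation_residuals(a, b, beta=0.99, kappa=0.1, theta_pi=1.5,
                       theta_y=0.5, rho=0.8, u=1.0):
    """Plug the guessed solution back into (1) and (2); both should be zero."""
    y, pi = a * u, b * u
    exp_y, exp_pi = rho * y, rho * pi          # RE forecasts under AR(1) shock
    r_n = theta_pi * pi + theta_y * y          # rule (3) with rho_r = 0
    res_is = y - (exp_y - (r_n - exp_pi) + u)  # equation (1)
    res_pc = pi - (beta * exp_pi + kappa * y)  # equation (2)
    return res_is, res_pc


a, b = solve_demand_shock_response()
print("output response:", a, "inflation response:", b)
print("residuals:", equation_residuals(a, b))
```

In this special case all persistence in $y_{t}$ and $π_{t}$ is inherited from the shock, which is the sense in which persistence is exogenously generated in the workhorse model.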

2.2. The Brock–Hommes Behavioural NK Model

In the Brock–Hommes framework, which we later follow, the model becomes behavioural by a departure from the RE assumption and the introduction of two groups of agents. One group is rational, and the other forms EL expectations through simple “heuristic” learning rules. RE agents form model-consistent expectations fully aware of the existence of BR agents in the composite model. A version of general adaptive learning rules that encompasses those adopted by References [3,4,13,19,25] is given by (4) and (5) (the authors of Reference [24] provided lab-based support for such rules, and the generalised heuristic rule we later adopt in Section 4 includes a $t - 2$ period and encompasses all the different behavioural group forecast heuristics):

\begin{matrix} E_{t}^{*} y_{t + 1} & = & E_{t - 1}^{*} y_{t} + λ_{y} (y_{t - j} - E_{t - 1}^{*} y_{t}); λ_{y} \in [0, 1], j = 0, 1 \end{matrix}

(4)

\begin{matrix} E_{t}^{*} π_{t + 1} & = & E_{t - 1}^{*} π_{t} + λ_{π} (π_{t - j} - E_{t - 1}^{*} π_{t}); λ_{π} \in [0, 1], j = 0, 1 \end{matrix}

(5)

where we can in principle allow for both current and lagged observations of output and inflation, $j = 0, 1$, respectively. Throughout the rest of the paper, we make the following information assumptions: for observations of aggregate output and inflation, similar to the EL approach, we assume $j = 1$. Later, in the AU approach, we need to model observations of market-specific variables consisting of factor prices, profits and marginal costs. These, we assume, can be observed without a lag, and therefore $j = 0$.
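The adaptive rules (4) and (5) are simple exponential-smoothing updates, which the following sketch makes concrete for a generic series. The function name, the initial belief `e0` and the treatment of the first periods (the belief is left unchanged until an observation with the required lag exists) are our own illustrative choices.

```python
def adaptive_forecasts(x, lam, j=1, e0=0.0):
    """Forecasts E*_t x_{t+1} = E*_{t-1} x_t + lam * (x_{t-j} - E*_{t-1} x_t).

    x   : list of realised values x_0, x_1, ...
    lam : learning gain in [0, 1]
    j   : observation lag (0 = current value observed, 1 = observed with a lag)
    e0  : initial belief E*_{-1} x_0 (an assumption of this sketch)
    """
    forecasts = []
    prev = e0
    for t in range(len(x)):
        # Before any lagged observation exists, keep the belief unchanged.
        observed = x[t - j] if t - j >= 0 else prev
        prev = prev + lam * (observed - prev)
        forecasts.append(prev)
    return forecasts


# lam = 1 and j = 1 give the naive rule: the forecast is last period's value.
print(adaptive_forecasts([1.0, 2.0, 3.0, 4.0], lam=1.0, j=1))
# A smaller gain adjusts beliefs only gradually towards a constant outcome.
print(adaptive_forecasts([2.0, 2.0, 2.0], lam=0.5, j=0))
```

With $λ = 1$ the rule collapses to a naive forecast, and with $λ = 0$ beliefs are never revised, so the gain parameter governs how much persistence learning adds.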

Let $n_{y, t}$, $n_{π, t}$ be the proportions of rational agents forecasting output and inflation, respectively. The IS and NK Phillips curve equations then become

\begin{matrix} y_{t} & = & n_{y, t} E_{t} y_{t + 1} + (1 - n_{y, t}) E_{t}^{*} y_{t + 1} - [r_{n, t} - (n_{π, t} E_{t} π_{t + 1} + (1 - n_{π, t}) E_{t}^{*} π_{t + 1})] + u_{1, t} \end{matrix}

(6)

\begin{matrix} π_{t} & = & β [n_{π, t} E_{t} π_{t + 1} + (1 - n_{π, t}) E_{t}^{*} π_{t + 1}] + κ y_{t} + u_{2, t} \end{matrix}

(7)

To complete the model, we need expressions for the weights $n_{y, t}$ and $n_{π, t}$. These follow the reinforcement learning literature by choosing probabilities

n_{x, t} = \frac{exp (- γ Φ_{x, t}^{R E} ({x_{t}}))}{exp (- γ Φ_{x, t}^{R E} ({x_{t}})) + exp (- γ Φ_{x, t}^{A E} ({x_{t}}))}

(8)

where $- Φ_{x, t}^{R E} (\{x_{t}\})$ and $- Φ_{x, t}^{A E} (\{x_{t}\})$ are “fitness” measures, respectively, of the forecast performance of the rational and non-rational predictor of outcome $\{x_{t}\} = \{y_{t}\}, \{π_{t}\}$ given by a discounted least-squares error predictor

\begin{matrix} Φ_{x, t}^{R E} ({x_{t}}) & = & μ_{R E} Φ_{x, t - 1}^{R E} ({x_{t}}) + (1 - μ_{R E}) ({[x_{t} - E_{t - 1} x_{t}]}^{2} + C_{x}) \end{matrix}

(9)

\begin{matrix} Φ_{x, t}^{A E} ({x_{t}}) & = & μ_{A E} Φ_{x, t - 1}^{A E} ({x_{t}}) + (1 - μ_{A E}) {[x_{t - j} - E_{t - 1 - j}^{*} x_{t - j}]}^{2}; j = 0, 1 \end{matrix}

(10)

where $μ_{R E}$ and $μ_{A E}$ capture the memory of the agents forming RE and adaptive expectations, respectively (a measure of forgetfulness of past observations). $C_{x}$ represents the relative cost of being rational in learning about variable $x_{t}$. In the steady state, forecast errors are zero, so that (9) and (10) give $Φ_{x}^{R E} = C_{x}$ and $Φ_{x}^{A E} = 0$. Thus, the proportion of rational agents in the steady state is given by

n_{x} = \frac{exp (- γ C_{x})}{exp (- γ C_{x}) + 1}

which is pinned down by the product $γ C_{x}$. Equations (3)–(10) constitute the linearised NK behavioural model (the authors of References [3,4] constructed a rather different composite EL-type model consisting of “fundamentalist” rather than rational agents alongside adaptive learners. For the former, the RE operators $E (\cdot)$ are replaced with $E^{f} y_{t + 1} = y_{t}^{F}$ and $E^{f} π_{t + 1} = 0$. Thus, fundamentalists always believe that the next period’s output gap is zero and that the net inflation rate will return to its steady-state value of zero. The same authors also assume $C_{x} = 0$ in (9)).
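The selection mechanism in (8)–(10) can be sketched in a few lines. The function names and numerical values below are hypothetical and chosen only for illustration; the logistic expression used for the share is algebraically identical to (8) after dividing numerator and denominator by $exp (- γ Φ^{R E})$.

```python
import math


def update_fitness(phi_prev, squared_error, mu, cost=0.0):
    """One step of the discounted least-squares recursion in (9) or (10).

    For the rational predictor pass cost=C_x; for the adaptive one cost=0.
    """
    return mu * phi_prev + (1.0 - mu) * (squared_error + cost)


def rational_share(phi_re, phi_ae, gamma):
    """Proportion of rational agents n_x from (8), written in logistic form."""
    return 1.0 / (1.0 + math.exp(-gamma * (phi_ae - phi_re)))


# Steady state: zero forecast errors, so Phi_RE -> C_x and Phi_AE -> 0.
gamma, cost, mu = 2.0, 0.1, 0.9
phi_re = phi_ae = 0.0
for _ in range(2000):
    phi_re = update_fitness(phi_re, 0.0, mu, cost)
    phi_ae = update_fitness(phi_ae, 0.0, mu)
print("steady-state share:", rational_share(phi_re, phi_ae, gamma))

# Larger forecast errors by adaptive agents raise the rational share.
print("after BR errors:", rational_share(phi_re, 0.5, gamma))
```

The second print line illustrates the reinforcement channel: whenever the adaptive predictor accumulates larger squared errors than the rational one, net of the cost $C_{x}$, the share of rational agents rises above its steady-state value.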

3. The Non-Linear NK Model

Thus far, the form of the adaptive forecasts in the linearised model has not been justified. To address this, we step back to the underlying non-linear model and introduce the distinction between internal decisions and aggregate macro-variables. We start with the non-linear RE model and proceed from full to bounded rationality in stages. The complete model setup and its balanced growth steady state are summarised in Appendices A–G.

3.1. Households

Household j chooses its savings and its labour supply. Let $C_{t} (j)$ be consumption and $H_{t} (j)$ be the proportion of the time available for work or leisure that is spent at work. The single-period utility we choose, compatible with a balanced growth steady state, is

U_{t} (j) = U (C_{t} (j), H_{t} (j)) = log (C_{t} (j)) - \frac{H_{t} {(j)}^{1 + ϕ}}{1 + ϕ}

and the value function of the representative household at time t dependent on its assets B is

V_{t} (j) = V_{t} (B_{t - 1} (j)) = E_{t} [\sum_{s = 0}^{\infty} β^{s} U (C_{t + s} (j), H_{t + s} (j))]

(11)

The household’s problem at time t is to choose paths for consumption $\{C_{t} (j)\}$, labour supply $\{H_{t} (j)\}$ and holdings of financial savings to maximise $V_{t} (j)$, given by (11), subject to its budget constraint in period t

B_{t} (j) = R_{t} B_{t - 1} (j) + W_{t} H_{t} (j) + Γ_{t} - C_{t} (j) - T_{t} - \frac{ϖ}{2} {(B_{t - 1} (j) - B)}^{2}

(12)

where $B_{t} (j)$ is the given net stock of real financial assets at the end of period t, $W_{t}$ is the wage rate, $T_{t}$ are lump-sum taxes, and $Γ_{t}$ are profits from wholesale and retail firms owned by households. In order to allow for a wealth distribution across the heterogeneous agents introduced later and to achieve a stationary path for bond holdings, we introduce a portfolio adjustment cost (this modelling device, similar to that used in open economies with home and foreign households, was pioneered by Reference [26]. We examine the limit as $ϖ$ becomes very small so that our choice of real rather than nominal bond holding costs is immaterial. The wealth distribution effect does not significantly change the equilibrium). $R_{t}$ is the real interest rate paid on assets held at the beginning of period t, given by $R_{t} = \frac{R_{n, t - 1}}{Π_{t}} R S_{t}$, where $R_{n, t}$ and $Π_{t}$ are the nominal interest and inflation rates, respectively, and $R S_{t}$ is a risk premium shock. $W_{t}$, $R_{n, t}$, $Π_{t}$ and $Γ_{t}$ are all exogenous to household j. As usual, all real variables are expressed relative to the price of the final output. The standard first-order conditions are

\begin{matrix} E_{t} [Λ_{t, t + 1} (j) R_{t + 1}] & = & 1 + ϖ (B_{t} (j) - B) \\ \frac{U_{H, t} (j)}{U_{C, t} (j)} & = & - W_{t} \end{matrix}

where $Λ_{t, t + 1} (j) \equiv β \frac{U_{C, t + 1} (j)}{U_{C, t} (j)}$ is the stochastic discount factor for household j over the interval $[t, t + 1]$. For our choice of utility function, $U_{C, t} = \frac{1}{C_{t}}$ and $U_{H, t} = - H_{t}^{ϕ}$, and these become

\begin{matrix} β E_{t} [\frac{C_{t} (j) R_{t + 1}}{C_{t + 1} (j)}] & = & 1 + ϖ (B_{t} (j) - B) \end{matrix}

(13)

\begin{matrix} C_{t} (j) H_{t} {(j)}^{ϕ} & = & W_{t} \Rightarrow H_{t} (j) = {(\frac{W_{t}}{C_{t} (j)})}^{\frac{1}{ϕ}} \end{matrix}

(14)

The first-order conditions up to now are suitable for the RE solution. We now express the solution in a form suitable for moving from an RE to a learning equilibrium. We consider the limit as $ϖ \to 0$. Solving (12) forward in time and imposing the transversality condition on debt, we can write

B_{t - 1} (j) = {PV}_{t} (C_{t} (j)) - {PV}_{t} (W_{t} H_{t} (j)) - {PV}_{t} (Γ_{t}) + {PV}_{t} (T_{t})

(15)

where the present (expected) value of a series $X \equiv \{X_{t + i}\}_{i = 0}^{\infty}$ at time t is defined by

{PV}_{t} (X_{t}) \equiv E_{t} \sum_{i = 0}^{\infty} \frac{X_{t + i}}{R_{t, t + i}} = \frac{X_{t}}{R_{t}} + \frac{1}{R_{t}} {PV}_{t} (X_{t + 1})

(16)

writing $R_{t, t + i} \equiv R_{t} R_{t + 1} R_{t + 2} \dots R_{t + i}$ as the real interest rate over the interval $[t - 1, t + i]$.
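The definition in (16) and its recursive form can be checked numerically. The sketch below is a simplified illustration under two assumptions of ours: the paths of $X$ and $R$ are known with certainty (so the expectation operator drops out), and the infinite sum is truncated at the length of the supplied lists.

```python
def present_value(x, r):
    """Finite-horizon, perfect-foresight version of PV_t(X_t) in (16).

    x : list [X_t, X_{t+1}, ...]
    r : list [R_t, R_{t+1}, ...] of gross real interest rates
    Each X_{t+i} is discounted by R_{t,t+i} = R_t * R_{t+1} * ... * R_{t+i}.
    """
    pv, cumulative_rate = 0.0, 1.0
    for x_i, r_i in zip(x, r):
        cumulative_rate *= r_i
        pv += x_i / cumulative_rate
    return pv


x_path = [1.0, 1.2, 0.9, 1.1]
r_path = [1.02, 1.03, 1.01, 1.04]

# Recursive form: PV_t = X_t / R_t + PV_{t+1} / R_t.
direct = present_value(x_path, r_path)
recursive = x_path[0] / r_path[0] + present_value(x_path[1:], r_path[1:]) / r_path[0]
print(direct, recursive)

# With constant X = 1 and constant gross rate R, the sum approaches 1 / (R - 1).
print(present_value([1.0] * 200, [1.25] * 200))
```

Note that, under this definition, the current-period value $X_{t}$ is itself discounted by $R_{t}$, which is why (15) expresses beginning-of-period assets $B_{t - 1}$ rather than $R_{t} B_{t - 1}$.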

The forward-looking budget constraint (15) holds for the representative household. If we allow RE and BR agents to borrow from or lend to one another, we must allow for $B_{t - 1} \neq 0$. Then, in a symmetric equilibrium with $C_{t} (j) = C_{t}$ and $H_{t} (j) = H_{t}$, (15) and (14) become

\begin{matrix} B_{t - 1} & = & {PV}_{t} (C_{t}) - {PV}_{t} (\frac{W_{t}^{1 + \frac{1}{ϕ}}}{C_{t}^{\frac{1}{ϕ}}}) - {PV}_{t} (Γ_{t}) + {PV}_{t} (T_{t}) \\ H_{t} & = & {(\frac{W_{t}}{C_{t}})}^{\frac{1}{ϕ}} \end{matrix}

Solving (13) forward in time and using the law of iterated expectations, we have

\frac{1}{C_{t}} = β^{i} E_{t} [\frac{R_{t + 1, t + i}}{C_{t + i}}]; i \geq 1

(17)

We now express the solution to the household optimisation problem for $C_{t}$ and $H_{t}$ as functions of point expectations $\{E_{t} W_{t + i}\}_{i = 1}^{\infty}$, $\{E_{t} R_{t + 1, t + i}\}_{i = 1}^{\infty}$ and $\{E_{t} Γ_{t + i}\}_{i = 1}^{\infty}$, treated as exogenous processes given at time t. With point expectations, we use (17) to obtain the following optimal decision for $C_{t + i}$, given the point expectations $E_{t} R_{t + 1, t + i}$

\begin{matrix} C_{t + i} & = & C_{t} β^{i} E_{t} R_{t + 1, t + i}; i \geq 1 \end{matrix}

(18)

\begin{matrix} E_{t} (W_{t + i} H_{t + i}) & = & \frac{{(E_{t} W_{t + i})}^{1 + \frac{1}{ϕ}}}{C_{t + i}^{\frac{1}{ϕ}}} \end{matrix}

(19)

Substituting (18) and (19) into the forward-looking household budget constraint, using $\sum_{i = 0}^{\infty} β^{i} = \frac{1}{1 - β}$ and $E_{t} R_{t, t + i} = R_{t} E_{t} R_{t + 1, t + i}$ for $i \geq 1$, we arrive at

\frac{C_{t} - R_{t} B_{t - 1}}{(1 - β)} = \frac{1}{C_{t}^{\frac{1}{ϕ}}} (W_{t}^{1 + \frac{1}{ϕ}} + \sum_{i = 1}^{\infty} {(β^{\frac{1}{ϕ}})}^{- i} {(\frac{E_{t} W_{t + i}}{E_{t} R_{t + 1, t + i}})}^{1 + \frac{1}{ϕ}}) + Γ_{t} - T_{t} + \sum_{i = 1}^{\infty} \frac{E_{t} (Γ_{t + i} - T_{t + i})}{E_{t} R_{t + 1, t + i}}

which can be written in recursive form as

\begin{matrix} \frac{C_{t} - R_{t} B_{t - 1}}{(1 - β)} & = & \frac{1}{C_{t}^{\frac{1}{ϕ}}} (W_{t}^{1 + \frac{1}{ϕ}} + Ω_{1, t}) + Γ_{t} - T_{t} + Ω_{2, t} \\ Ω_{1, t} & \equiv & \sum_{i = 1}^{\infty} {(β^{\frac{1}{ϕ}})}^{- i} {(\frac{E_{t} W_{t + i}}{E_{t} R_{t + 1, t + i}})}^{1 + \frac{1}{ϕ}} = {(β^{\frac{1}{ϕ}})}^{- 1} {(\frac{E_{t} W_{t + 1}}{E_{t} R_{t + 1, t + 1}})}^{1 + \frac{1}{ϕ}} + \frac{Ω_{1, t + 1}}{β^{\frac{1}{ϕ}} E_{t} R_{t + 1}} \\ Ω_{2, t} & \equiv & \sum_{i = 1}^{\infty} \frac{E_{t} (Γ_{t + i} - T_{t + i})}{E_{t} R_{t + 1, t + i}} = \frac{E_{t} (Γ_{t + 1} - T_{t + 1})}{E_{t} R_{t + 1, t + 1}} + \frac{Ω_{2, t + 1}}{E_{t} R_{t + 1}} \end{matrix}

(20)

Consumption is then given by (20), assuming point expectations, or by the symmetric form of the Euler equation (13) under full rationality (i.e., households know the symmetric nature of equilibrium with $C_{t} (j) = C_{t}$). $C_{t}$ is a function of point expectations $\{E_{t} W_{t + i}\}_{i = 1}^{\infty}$, $\{E_{t} R_{t, t + i}\}_{i = 1}^{\infty}$ and $\{E_{t} Γ_{t + i}\}_{i = 1}^{\infty}$, which can be treated as exogenous processes given at time t or as rational model-consistent expectations. Since $E_{t} f (X_{t}) \approx f (E_{t} (X_{t}))$ and $E_{t} f (X_{t} Y_{t}) \approx f (E_{t} (X_{t}) E_{t} (Y_{t}))$ up to a first-order Taylor-series expansion, assuming point expectations is equivalent to using a linear approximation (given below), as is usually done in the literature.
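As a rough numerical illustration of the consumption function under point expectations, the sketch below solves the infinite-sum form that precedes (20) for $C_{t}$ by bisection. It rests on simplifying assumptions of ours: no initial assets ($B_{t - 1} = 0$), beliefs that the wage and net profits $Γ - T$ stay constant forever, a constant gross real rate $R = 1 / β$, and sums truncated at a long finite horizon. All names and parameter values are hypothetical.

```python
def consumption_constant_beliefs(w, net_profit, beta=0.99, phi=2.0, horizon=3000):
    """Solve the consumption function for C_t with B_{t-1} = 0 and constant beliefs.

    w          : wage, expected to remain constant
    net_profit : Gamma - T, expected to remain constant
    """
    r = 1.0 / beta  # assumed constant gross real interest rate
    # Truncated versions of the two infinite sums (Omega_1 and Omega_2).
    omega1 = sum((beta ** (1.0 / phi)) ** (-i) * (w / r ** i) ** (1.0 + 1.0 / phi)
                 for i in range(1, horizon + 1))
    omega2 = sum(net_profit / r ** i for i in range(1, horizon + 1))

    def gap(c):
        lhs = c / (1.0 - beta)
        rhs = c ** (-1.0 / phi) * (w ** (1.0 + 1.0 / phi) + omega1) + net_profit + omega2
        return lhs - rhs

    # gap() is increasing in c, so bisection on a wide bracket finds the root.
    lo, hi = 1e-8, 1e3
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


c = consumption_constant_beliefs(w=1.0, net_profit=0.2)
hours = (1.0 / c) ** (1.0 / 2.0)  # labour supply (14) with w = 1, phi = 2
print("consumption:", c, "wage income plus net profits:", 1.0 * hours + 0.2)
```

Under these stationary beliefs the solution coincides with the static budget identity $C = W H + Γ - T$, which provides a simple consistency check on the forward-looking formula.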

3.2. Firms, Government Expenditures and Monetary Policy

This section sets out the wholesale sector and the retail sector, in which prices are set optimally subject to Calvo pricing contracts. We close the non-linear setup with resource and balanced government budget constraints, a monetary policy rule and a specification of the structural shocks in the economy. Wholesale firms employ a Cobb–Douglas production function to produce a homogeneous output

Y_{t}^{W} = F (A_{t}, H_{t}) = A_{t} H_{t}^{α}

where $A_{t}$ is total factor productivity. Profit-maximising demand for labour results in the first-order condition

W_{t} = \frac{P_{t}^{W}}{P_{t}} F_{H, t} = α \frac{P_{t}^{W}}{P_{t}} \frac{Y_{t}^{W}}{H_{t}}

(21)

The retail sector costlessly converts a homogeneous wholesale good into a basket of differentiated goods for aggregate consumption

C_{t} = {(\int_{0}^{1} C_{t} {(m)}^{(ζ - 1) / ζ} d m)}^{ζ / (ζ - 1)}

(22)

where $ζ$ is the elasticity of substitution. For each m, the consumer chooses $C_{t} (m)$ at a price $P_{t} (m)$ to maximise (22) given total expenditure $\int_{0}^{1} P_{t} (m) C_{t} (m) d m$. Assuming that government services are similarly differentiated, this results in a set of demand equations for each differentiated good m with price $P_{t} (m)$ of the form

Y_{t} (m) = {(\frac{P_{t} (m)}{P_{t}})}^{- ζ} Y_{t}

(23)

where $P_{t} = {[\int_{0}^{1} P_{t} {(m)}^{1 - ζ} d m]}^{\frac{1}{1 - ζ}}$ is the aggregate price index, and $C_{t}$ and $P_{t}$ are Dixit–Stiglitz aggregates; see Reference [27].
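The demand system (23) and the price index can be illustrated with a discrete approximation in which the unit interval of goods is replaced by a finite list of equally weighted goods. This discretisation, the function names and the numbers used are assumptions of the sketch.

```python
def price_index(prices, zeta):
    """Discrete analogue of P_t = [ integral of P_t(m)^(1 - zeta) dm ]^(1 / (1 - zeta))."""
    n = len(prices)
    return (sum(p ** (1.0 - zeta) for p in prices) / n) ** (1.0 / (1.0 - zeta))


def demands(prices, y, zeta):
    """Demand for each good from (23): Y_t(m) = (P_t(m) / P_t)^(-zeta) * Y_t."""
    p_agg = price_index(prices, zeta)
    return [(p / p_agg) ** (-zeta) * y for p in prices]


prices = [0.9, 1.0, 1.1, 1.25]
zeta, y = 6.0, 2.0
quantities = demands(prices, y, zeta)

# Total expenditure on the differentiated goods equals P_t * Y_t ...
spending = sum(p * q for p, q in zip(prices, quantities)) / len(prices)
print(spending, price_index(prices, zeta) * y)

# ... and the CES aggregate (22) of the individual demands returns Y_t.
aggregate = (sum(q ** ((zeta - 1.0) / zeta) for q in quantities)
             / len(quantities)) ** (zeta / (zeta - 1.0))
print(aggregate, y)
```

Both identities hold for any vector of positive prices, which is what allows the retail sector to be summarised by the two aggregates.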

Following Reference [28], we assume that there is a probability of $1 - ξ$ in each period that the price of each retail good m is set optimally to $P_{t}^{O} (m)$. If the price is not re-optimised, then it is held fixed. For each retail producer m, given its real marginal cost $M C_{t} = \frac{P_{t}^{W}}{P_{t}}$, the objective at time t is to choose $P_{t}^{O} (m)$ to maximise discounted real profits

E_{t} \sum_{k = 0}^{\infty} ξ^{k} \frac{Λ_{t, t + k}}{P_{t + k}} Y_{t + k} (m) [P_{t}^{O} (m) - P_{t + k} M C_{t + k}]

subject to (23), where $Λ_{t, t + k} \equiv β^{k} \frac{U_{C, t + k}}{U_{C, t}}$ is the stochastic discount factor over the interval $[t, t + k]$. The solution to this is standard and is given by

\begin{matrix} \frac{P_{t}^{O} (m)}{P_{t}} = \frac{ζ}{ζ - 1} \frac{E_{t} \sum_{k = 0}^{\infty} ξ^{k} Λ_{t, t + k} {(Π_{t, t + k})}^{ζ} Y_{t + k} M C_{t + k}}{E_{t} \sum_{k = 0}^{\infty} ξ^{k} Λ_{t, t + k} {(Π_{t, t + k})}^{ζ} {(Π_{t, t + k})}^{- 1} Y_{t + k}} \end{matrix}

Denoting the numerator and denominator by $J_{t}$ and $J J_{t}$, respectively, and introducing a mark-up shock $M S_{t}$ to $M C_{t}$, from Appendix D, we write in recursive form

\begin{matrix} \frac{P_{t}^{O} (m)}{P_{t}} & = & \frac{J_{t}}{J J_{t}} \end{matrix}

(24)

\begin{matrix} J_{t} - ξ E_{t} [Λ_{t, t + 1} Π_{t + 1}^{ζ} J_{t + 1}] & = & \frac{1}{1 - \frac{1}{ζ}} Y_{t} M C_{t} M S_{t} \end{matrix}

(25)

\begin{matrix} J J_{t} - ξ E_{t} [Λ_{t, t + 1} Π_{t + 1}^{ζ - 1} J J_{t + 1}] & = & Y_{t} \end{matrix}

(26)

Using the fact that all resetting firms will choose the same price, by the law of large numbers, we can find the evolution of inflation given by

1 = ξ {(Π_{t - 1, t})}^{ζ - 1} + (1 - ξ) {(\frac{P_{t}^{O}}{P_{t}})}^{1 - ζ}

(27)

Price dispersion lowers aggregate output as follows. Market clearing in the labour market gives

H_{t} = \sum_{m = 1}^{n} H_{t} (m) = \sum_{m = 1}^{n} {(\frac{Y_{t} (m)}{A_{t}})}^{\frac{1}{α}} = {(\frac{Y_{t}}{A_{t}})}^{\frac{1}{α}} \sum_{m = 1}^{n} {(\frac{P_{t} (m)}{P_{t}})}^{- \frac{ζ}{α}}

using (23). Hence, equilibrium for good m gives $Y_{t} = \frac{Y_{t}^{W}}{Δ_{t}^{α}}$, where price dispersion is defined by

Δ_{t} \equiv (\sum_{m = 1}^{n} {(\frac{P_{t} (m)}{P_{t}})}^{- \frac{ζ}{α}})

Assuming that the number of firms is large, from Appendix E we obtain the following dynamic relationship

Δ_{t} = ξ Π_{t}^{\frac{ζ}{α}} Δ_{t - 1} + (1 - ξ) {(\frac{J_{t}}{J J_{t}})}^{- \frac{ζ}{α}}

(28)
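Equations (27) and (28) can be iterated directly for a given path of gross inflation. The following sketch inverts (27) for the optimal relative price, which equals $J_{t} / J J_{t}$ by (24), and then updates price dispersion using (28). The parameter values are placeholders rather than the paper's calibration, and $Δ$ is normalised to one in the zero-inflation steady state.

```python
def optimal_relative_price(infl, xi, zeta):
    """Invert (27): (P^O_t / P_t)^(1 - zeta) = (1 - xi * Pi_t^(zeta - 1)) / (1 - xi)."""
    return ((1.0 - xi * infl ** (zeta - 1.0)) / (1.0 - xi)) ** (1.0 / (1.0 - zeta))


def dispersion_path(infl_path, xi=0.75, zeta=7.0, alpha=0.7, delta0=1.0):
    """Iterate (28) for price dispersion given a path of gross inflation rates."""
    delta, path = delta0, []
    for infl in infl_path:
        rel_price = optimal_relative_price(infl, xi, zeta)  # equals J_t / JJ_t by (24)
        delta = (xi * infl ** (zeta / alpha) * delta
                 + (1.0 - xi) * rel_price ** (-zeta / alpha))
        path.append(delta)
    return path


# Zero net inflation (Pi = 1): resetters choose the average price, no dispersion.
print(dispersion_path([1.0] * 5))
# Steady positive inflation: dispersion settles above one, lowering Y relative to Y^W.
print(dispersion_path([1.01] * 500)[-1])
```

Because $Y_{t} = Y_{t}^{W} / Δ_{t}^{α}$, any value of $Δ_{t}$ above one represents output lost to price dispersion, an effect that vanishes in a first-order approximation around zero inflation.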

To close the model, we first require that total profits from retail and wholesale firms, $Γ_{t}$, are remitted to households. This is given in real terms by

Γ_{t} = \underbrace{Y_{t} - \frac{P_{t}^{W}}{P_{t}} Y_{t}^{W}}_{\text{retail}} + \underbrace{\frac{P_{t}^{W}}{P_{t}} Y_{t}^{W} - W_{t} H_{t}}_{\text{wholesale}} = Y_{t} - α \frac{P_{t}^{W}}{P_{t}} Y_{t}^{W}

using the first-order condition (21). Then, to complete closure, we have resource and balanced government budget constraints

Y_{t} = C_{t} + G_{t} = C_{t} + T_{t}

where $G_{t}$ is an exogenous demand process, and a monetary policy rule for the nominal interest rate given by the following implementable Taylor-type rule

\begin{matrix} log (\frac{R_{n, t}}{R_{n}}) & = & ρ_{r} log (\frac{R_{n, t - 1}}{R_{n}}) + (1 - ρ_{r}) (θ_{π} log (\frac{Π_{t}}{Π_{t a r g, t}}) + θ_{y} log (\frac{Y_{t}}{Y}) + θ_{d y} log (\frac{Y_{t}}{Y_{t - 1}})) + ϵ_{M P, t} \end{matrix}

where $ϵ_{M P, t}$ is an i.i.d. shock to monetary policy. $Π_{t a r g, t}$ is a time-varying inflation target and, together with $A_{t}$, $G_{t}$, $R S_{t}$ and $M S_{t}$, follows an AR(1) process. This completes the model.
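For concreteness, the Taylor-type rule above can be written as a small function that returns the gross nominal interest rate. The argument names and default coefficient values are hypothetical and are not taken from the paper's calibration.

```python
import math


def taylor_rule(r_n_prev, infl, y, y_prev, r_n_ss, y_ss, infl_target=1.0,
                rho_r=0.7, theta_pi=1.5, theta_y=0.3, theta_dy=0.0, eps=0.0):
    """Gross nominal rate R_{n,t} implied by the implementable rule.

    All rates are gross; r_n_ss and y_ss are the steady-state values R_n and Y.
    """
    log_dev = (rho_r * math.log(r_n_prev / r_n_ss)
               + (1.0 - rho_r) * (theta_pi * math.log(infl / infl_target)
                                  + theta_y * math.log(y / y_ss)
                                  + theta_dy * math.log(y / y_prev))
               + eps)
    return r_n_ss * math.exp(log_dev)


# At the steady state with inflation on target, the rule returns R_n.
print(taylor_rule(1.01, 1.0, 1.0, 1.0, r_n_ss=1.01, y_ss=1.0))
# Inflation above target raises the policy rate.
print(taylor_rule(1.01, 1.02, 1.0, 1.0, r_n_ss=1.01, y_ss=1.0))
```

The rule is implementable in the sense used in the text because it responds to output relative to its steady state and to output growth, both of which are observable, rather than to a model-dependent output gap.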

3.3. Recovering the NK Workhorse Model

We now show that the linearised form of the non-linear model about the steady state reduces to the standard workhorse model in Section 2.1, where rational expectations $E_{t} y_{t + 1}$ and $E_{t} π_{t + 1}$ or non-RE $E_{t}^{*} y_{t + 1}$ and $E_{t}^{*} π_{t + 1}$ can be treated as expectations by individual households and firms, respectively, of aggregate future output and inflation. We consider the linearised form of the above set-up about a deterministic steady state with zero inflation and zero growth. We also ignore lending or borrowing between RE and BR agents. With RE, household j’s first-order conditions take one of two forms. First, linearising (20), we have

\begin{matrix} α_{1} c_{t} (j) & = & α_{2} w_{t} + α_{3} (ω_{2, t} + r_{t}) + α_{4} ω_{1, t} \\ ω_{1, t} & = & α_{5} E_{t} w_{t + 1} - α_{6} E_{t} r_{t + 1} + β E_{t} ω_{1, t + 1} \\ ω_{2, t} & = & (1 - β) (γ_{t} - g_{t}) - r_{t} + β E_{t} ω_{2, t + 1} \\ γ_{t} & = & \frac{1}{γ_{y}} y_{t} - \frac{α}{γ_{y}} (w_{t} + h_{t}) \end{matrix}

(29)

where lower case variables

x_{t} \equiv log (X_{t} / X)

, X is the steady state of

X_{t}

;

c_{y} \equiv \frac{C}{Y}

,

γ_{y} \equiv \frac{Γ}{Y}

,

g_{y} \equiv \frac{G}{Y}

and

γ_{t}

is exogenous profit per household (a function of aggregate consumption and hours). Positive coefficients are given by

α_{1} \equiv 1 + \frac{α}{ϕ c_{y}}

,

α_{2} \equiv (1 - β) (1 + \frac{1}{ϕ}) \frac{α}{c_{y}}

,

α_{3} \equiv \frac{γ_{y}}{c_{y}}

,

α_{4} \equiv \frac{β α}{c_{y}}

,

α_{5} \equiv (1 - β) (1 + \frac{1}{ϕ})

and

α_{6} \equiv (1 + \frac{1}{ϕ})

. Alternatively, from Euler Equation (13),

c_{t} = E_{t} c_{t + 1} - E_{t} r_{t + 1}

(30)

in a symmetric equilibrium. Under RE, (29) or (30) lead to the same equilibrium, but under BR, this is no longer the case.

Linearising the household supply of hours decision, the resource constraint and the Fisher equation, we have

\begin{matrix} y_{t} & = & (1 - g_{y}) c_{t} + g_{y} g_{t} \end{matrix}

(31)

\begin{matrix} r_{t} & = & r_{n, t - 1} - π_{t} + r s_{t - 1} \\ h_{t} & = & \frac{1}{ϕ} (w_{t} - c_{t}) \end{matrix}

(32)

Then, in a special case where

G_{t} = 0

and there is no distinction between public and private consumption,

g_{y} = 0

and

y_{t} = c_{t}

. Equations (30)–(32) with

r s_{t} = u_{1, t}

reduce to (1) where

E_{t} y_{t + 1}

is the forecast of aggregate output.

Turning to the supply side, for the wholesale sector

\begin{matrix} y_{t} & = & a_{t} + α h_{t} \\ m c_{t} & = & w_{t} - y_{t} + h_{t} \end{matrix}

For retail firm m, linearising the pricing dynamics (24)–(26) about a zero net inflation steady state and solving forward, we have

\begin{matrix} p_{t}^{o} (m) - p_{t} & = & β ξ E_{t} [π_{t + 1} + p_{t + 1}^{o} (m) - p_{t + 1}] + (1 - β ξ) (m c_{t} + m s_{t}) \\ = & E_{t} \sum_{i = 0}^{\infty} {(β ξ)}^{i} [β ξ π_{t + i + 1} + (1 - β ξ) (m c_{t + i} + m s_{t + i})] \end{matrix}

(33)

Then, in a symmetric equilibrium, we have

π_{t} = \frac{(1 - ξ)}{ξ} (E_{t} \sum_{i = 0}^{\infty} {(β ξ)}^{i} [β ξ π_{t + i + 1} + (1 - β ξ) (m c_{t + i} + m s_{t + i})])

(34)

where

E_{t} [π_{t + i + 1}]

and

E_{t} [m c_{t + i} + m s_{t + i}]

are expectations of aggregate inflation and real marginal costs, both variables exogenous to individual price setters. However, if price setters know they are identical, they know the aggregate price level over non-optimising and optimising firms

p_{t} (m) = ξ p_{t - 1} + (1 - ξ) p_{t}^{o} (m)

(35)

to obtain in a symmetric equilibrium

p_{t}^{o} (m) - p_{t} = p_{t}^{o} - p_{t} = \frac{ξ}{(1 - ξ)} (p_{t} - p_{t - 1}) = \frac{ξ}{(1 - ξ)} π_{t}

Then, substituting back into (33), we arrive at

π_{t} = \frac{(1 - ξ) (1 - β ξ)}{ξ} E_{t}^{*} \sum_{i = 0}^{\infty} β^{i} (m c_{t + i} + m s_{t + i})

(36)

which omits learning about aggregate inflation. Equation (36) is the familiar linearised Phillips curve. Under RE, (34) and (36) are equivalent. (Putting

m c_{t} = w_{t} - a_{t} + h_{t} = (1 + ϕ) h_{t} = \frac{(1 + ϕ) (y_{t} - a_{t})}{α}

, (36) in recursive form gives (2) with

λ = \frac{(1 - ξ) (1 - β ξ) (1 + ϕ)}{α ξ}

and

u_{2, t} = λ m s_{t}

). The form of the Phillips curve (36), which is equivalent to (2), is often used in the behavioural NK literature (see, for example, Reference [4]), but as we have shown, this assumes that firms know they are identical. In our BR model, we use (29) and (34), which do not make this assumption.
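To make the mapping from the deep parameters to the Phillips curve slope concrete, the expression for λ given above can be evaluated directly. The following is a minimal sketch; the function name and the parameter values in the example are hypothetical and are not the estimates used in the paper.

```python
def phillips_slope(xi, beta, phi, alpha):
    """Slope lambda = (1 - xi)(1 - beta*xi)(1 + phi) / (alpha * xi).

    xi    : Calvo probability that a retail price is not re-optimised
    beta  : household discount factor
    phi   : inverse Frisch elasticity of hours
    alpha : labour exponent in the wholesale production function
    """
    if not 0.0 < xi < 1.0:
        raise ValueError("xi must lie strictly between 0 and 1")
    return (1.0 - xi) * (1.0 - beta * xi) * (1.0 + phi) / (alpha * xi)


# Illustrative values only: stickier prices (higher xi) flatten the curve.
slope_sticky = phillips_slope(xi=0.9, beta=0.99, phi=2.0, alpha=0.7)
slope_flexible = phillips_slope(xi=0.5, beta=0.99, phi=2.0, alpha=0.7)
```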

4. AU Learning and Market-Consistent Information

With anticipated utility (AU) learning, our learning model is one where agents make fully optimal decisions, given their individual specification of beliefs, but have no macroeconomic model to form expectations of aggregate variables. We draw a clear distinction between aggregate and internal quantities so that identical agents in our model are not aware of this equilibrium property (nor any others).

To close the model, we need to specify the manner in which households and firms form their expectations. To do so, we assume that variables which are local to the agents, in a geographical sense, are observable within the period, whereas variables that are strictly macroeconomic are only observable with a lag. This categorisation regarding information about the current state of the economy follows Reference [29], which distinguishes between the local information that agents acquire directly through their interactions in markets and statistics that are collected and summarised, usually by governments, and are made available to the wider public. (This paper actually focuses on a third category, information provided by the news media, and allows for imperfect information in the form of noisy signals, issues which go beyond the scope of our paper.) The policy rate is announced by the central bank; thus, it is observed without a lag, and it is common knowledge. Given this, we assume an adaptive expectations forecasting rule given below by (38) and (39) about variables external to agents’ decisions. Let

x_{t} = r_{t}, r_{n, t}, π_{t}, w_{t}, γ_{t}

, then household expectations are given by

E_{t}^{*} x_{t + i} = E_{t}^{*} x_{t + 1}; i \geq 1

(37)

Expressing

E_{t} ω_{1, t + 1}

and

E_{t} ω_{2, t + 1}

in (29) as forward-looking summations and using (37), we arrive at the individual learning consumption equation

\begin{matrix} α_{1} c_{t} & = & α_{2} w_{t} + α_{3} (ω_{2, t} + r_{t}) + α_{4} ω_{1, t} \\ ω_{1, t} & = & \frac{1}{1 - β} [α_{5} E_{t}^{*} w_{t + 1} - α_{6} (β E_{t}^{*} r_{n, t + 1} - E_{h, t}^{*} π_{t + 1})] - α_{6} r_{n, t} \\ ω_{2, t} & = & (1 - β) (γ_{t} - g_{t}) - r_{t} + \frac{β}{1 - β} ((1 - β) (E_{t}^{*} γ_{t + 1} - E_{t}^{*} g_{t + 1}) - E_{t}^{*} r_{t + 1}) \end{matrix}

which is now expressed in terms of one-step ahead forecasts by

E_{t}^{*} x_{t + 1} = E_{t}^{*} x_{t} + λ_{x} (x_{t - j} - E_{t}^{*} x_{t}); x = w, r_{n}, π, γ; j = 0, 1

(38)

Households make inter-temporal decisions for their consumption and hours supplied given adaptive expectations of the wage rate, the nominal interest rate, inflation and profits. These macro-variables may in principle be observed with or without a one-period lag (

j = 1, 0

), but as stated earlier, we assume

j = 0

for market-specific variables

w_{t}, γ_{t}

, and

j = 1

for aggregate inflation

π_{t}

. However, we assume that the current nominal interest rate,

r_{n, t}

, is announced and therefore is observed without a lag.

We distinguish household and firm expectations

E_{h, t}^{*} π_{t + 1}

,

E_{f, t}^{*} π_{t + 1}

. Then, for retail firm m

\begin{matrix} E_{t}^{*} π_{t + i + 1} & = & E_{t}^{*} π_{t + 1}; i \geq 0 \\ E_{t}^{*} (m c_{t + i} + m s_{t + i}) & = & E_{t}^{*} (m c_{t + 1} + m s_{t + 1}); i \geq 1 \\ p_{t}^{o} (m) - p_{t} & = & \frac{β ξ}{1 - β} E_{f, t}^{*} π_{t + 1} + (1 - β ξ) (m c_{t} + m s_{t}) + \frac{β}{1 - β} E_{t}^{*} (m c_{t + 1} + m s_{t + 1}) \end{matrix}

where one-step ahead forecasts are given by the adaptive expectations rule

E_{t}^{*} x_{t + 1} = E_{t}^{*} x_{t} + λ_{x} (x_{t - j} - E_{t}^{*} x_{t}); x = π, (m c + m s); j = 0, 1

(39)

Retail firms make inter-temporal decisions for their price and output given adaptive expectations of the aggregate inflation rate and their real marginal cost inclusive of the mark-up shock. As before, these variables may be observed with or without a one-period lag (

j = 1, 0

), but for aggregate inflation, we assume

j = 1

as for households, but

j = 0

for the market-specific variable

m c_{t}

. Note that we can in principle distinguish between households’ and firms’ expectations of inflation.
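The adaptive expectations rules (38) and (39) can be sketched as a simple recursion in which the previous forecast is revised by a fraction λ_x of the latest observed forecast error, with the observation lag j set to 0 for market-specific variables and to 1 for aggregate inflation. The code below is an illustrative sketch only: the function name, the treatment of the initial forecast and the example series are assumptions made for the example.

```python
def adaptive_forecasts(series, lam, lag=0, initial_forecast=0.0):
    """One-step-ahead adaptive forecasts.

    forecasts[t] is the forecast of x_{t+1} formed in period t, updated as
        new = old + lam * (x_{t-lag} - old),
    where `old` is the forecast carried over from the previous period.
    lag=0 : the variable is observed within the period (e.g. wages, profits)
    lag=1 : the variable is observed with a one-period lag (e.g. inflation)
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    forecasts = []
    previous = initial_forecast  # assumed starting belief (hypothetical)
    for t in range(len(series)):
        if t - lag >= 0:
            observed = series[t - lag]
            previous = previous + lam * (observed - previous)
        # If no observation is available yet, the belief is left unchanged.
        forecasts.append(previous)
    return forecasts


# With lam = 1 the rule collapses to a naive forecast of the last observation.
wage_forecasts = adaptive_forecasts([1.0, 2.0, 3.0], lam=1.0, lag=0)
inflation_forecasts = adaptive_forecasts([1.0, 2.0, 3.0], lam=1.0, lag=1)
```

A smaller `lam` corresponds to the slow-learning case, in which forecasts adjust only partially to each new observation.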

5. Heterogeneous Expectations across Agents

We now turn to the full Brock–Hommes NK model, but with BR-AU rather than EL boundedly rational agents. Our benchmark models are an agent-level learning behavioural NK model with infinite-horizon (AU) learners who use the standard Brock–Hommes forecast heuristics to form expectations, and a composite version with fixed proportions of RE and AU agents in an NK setting. We select them because we want to compare the equilibrium features and empirical performance of these assumptions in an informationally consistent environment. We assume that all RE agents know the composite model and, moreover, we impose informational consistency by assuming that they have the same imperfect information set as the BR-AU agents. The latter do not know the model, but they make individually optimal decisions given individual observations of the states and belief formations. The composite RE-BR model then has an equilibrium (in non-linear form)

\begin{matrix} H_{t}^{d} & = & n_{h, t} {(H_{t}^{s})}^{R E} + (1 - n_{h, t}) {(H_{t}^{s})}^{B R} \\ C_{t} & = & n_{h, t} {(C_{t})}^{R E} + (1 - n_{h, t}) {(C_{t})}^{B R} = Y_{t} - G_{t} \\ \frac{P_{t}^{o}}{P_{t}} & = & n_{f, t} {(\frac{P_{t}^{o}}{P_{t}})}^{R E} + (1 - n_{f, t}) {(\frac{P_{t}^{o}}{P_{t}})}^{B R} \end{matrix}

Zero net wealth in aggregate implies that

n_{h, t} B_{t}^{R E} = - (1 - n_{h, t}) B_{t}^{B R}

.

We first consider the properties of the model with fixed exogenous proportions of RE and BR agents. Then, in Section 5.2, we allow these proportions to be determined endogenously.

5.1. Exogenous Proportions of RE and BR Agents

For our model of BR with AU, Figure 1 plots the impulse response functions (IRFs) with standard parameters for the rule for a shock to monetary policy under fast and slow learning. Figure A3 and Figure A4 in Appendix F show IRFs for the technology and mark-up shocks. Not surprisingly, fast learning sees an IRF converge faster to the RE case, but in either case, BR introduces more persistence compared with RE. This suggests that this feature should lead to a better fit of the data without relying on other persistence mechanisms (shocks, habit or price indexing). The stability properties of the model are examined in the WP version of the paper and Appendix A.

5.2. Endogenous Proportions of RE and BR Agents with Reinforcement Learning

Proportions of rational households (

n_{h, t}

) and firms (

n_{f, t}

) are given by (8)

n_{j, t} = \frac{exp (- γ Φ_{j, t}^{R E})}{exp (- γ Φ_{j, t}^{R E}) + exp (- γ Φ_{j, t}^{B R})}; j = h, f

where fitness for households and firms

j = h, f

is given by

\begin{matrix} Φ_{j, t}^{R E} & = & μ_{j}^{R E} Φ_{j, t - 1}^{R E} + (1 - μ_{j}^{R E}) (\text{weighted sum of forecast errors} + C_{j}) \\ Φ_{j, t}^{B R} & = & μ_{j}^{B R} Φ_{j, t - 1}^{B R} + (1 - μ_{j}^{B R}) (\text{weighted sum of forecast errors}) \end{matrix}
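The switching mechanism can be sketched numerically as follows, using the sign convention of this section in which Φ accumulates forecast errors (plus the fixed cost for RE agents), so that a lower Φ raises the share of the corresponding predictor. This is an illustrative sketch under stated assumptions: the function names and example numbers are hypothetical, and the logit share is written in an algebraically equivalent form that avoids overflow for large γ.

```python
import math


def update_fitness(phi_prev, mu, weighted_sq_errors, fixed_cost=0.0):
    """Phi_t = mu * Phi_{t-1} + (1 - mu) * (weighted errors + fixed cost)."""
    return mu * phi_prev + (1.0 - mu) * (weighted_sq_errors + fixed_cost)


def rational_share(phi_re, phi_br, gamma):
    """Share of RE agents, exp(-g*Phi_RE) / (exp(-g*Phi_RE) + exp(-g*Phi_BR)).

    Rewritten as a logistic function of z = -gamma * (Phi_RE - Phi_BR) so
    that large values of gamma do not overflow math.exp.
    """
    z = -gamma * (phi_re - phi_br)
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical numbers: RE forecasts are more accurate but carry a fixed cost.
phi_re = update_fitness(0.0, mu=0.5, weighted_sq_errors=0.01, fixed_cost=0.02)
phi_br = update_fitness(0.0, mu=0.5, weighted_sq_errors=0.04)
share_soft = rational_share(phi_re, phi_br, gamma=1.0)       # close to 0.5
share_sharp = rational_share(phi_re, phi_br, gamma=1000.0)   # close to 1
```

In this example the BR predictor accumulates the larger loss, so a higher γ moves the share of RE agents further above one half; with equal losses the share is one half for any γ.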

Table 1 provides a third-order perturbation solution of the non-linear NK RE-BR model. We use the Bayesian estimation of the model in Reference [30] where the model is linearised and the proportions

n_{h, t}

and

n_{f, t}

are fixed. Non-linear estimation would be required to pin down the parameters

n_{h}

,

n_{f}

in the steady state in the BR scenarios and

μ_{h}^{R E, B R}

,

μ_{f}^{R E, B R}

and

γ

in the reinforcement learning process, which goes beyond the scope of this paper. Thus, here we impose them as reported in the table (

n_{h, t} = n_{f, t} = 0.1

). We also scale the estimated standard deviations of the shocks using a parameter

σ = 1, 2

. For the robustness of our results, we perform additional simulations for different choices of the memory parameters and present the results with

μ_{h}^{R E} = μ_{h}^{B R} = μ_{f}^{R E} = μ_{f}^{B R} = 0.5

and

= 0.75

in Appendix G. The robustness exercise assumes instead that agents have some memory of past observations.

The main results from these simulations are as follows. First, reinforcement learning introduces high kurtosis and skewness in macroeconomic variables. The absence of kurtosis in the standard NK model, often highlighted in the literature (see, for example, Reference [3]), is in part simply the consequence of linearisation, and non-normality is a feature of higher-order approximations. Second, reinforcement learning with stronger switching processes (i.e.,

γ = 100, 1000

) coupled with higher volatility of exogenous shocks results in the proportions of rational agents increasing from the estimated deterministic steady state value of

0.1

to

0.13

and

0.15

for households and firms, respectively, in the stochastic steady state. Third, given that BR is a welfare-reducing friction in these models, it follows that volatility can actually be welfare-increasing in our heterogeneous expectations setting. Furthermore, when we assume that agents have some memory of past observations when revising their expectations given their forecast performances, the simulated skewness and kurtosis are lower compared to the case when no memory is assumed in the learning process.

Our main results suggest that, when the switching process between groups of heterogeneous agents becomes more deterministic, depending on agents’ willingness to learn from past performance when predicting future outcomes, the level of rationality in the BR macroeconomy increases. This result is in line with the finding in Reference [3]. The cognitive effect of this selection mechanism is much stronger when large exogenous shocks occur. This group behaviour not only plays a key role in explaining the dynamic properties of the data, re-evaluating the importance of expectations in driving economic fluctuations in the spirit of Keynes’ concept of animal spirits, but also has important implications for the optimal control of policy in the spirit of the Lucas critique. Depending on intentions on the part of policymakers, the model suggests that different versions of policy can be designed and devised in a game between policymakers and the economy, with uncertainty as to which expectation formation is selected.

5.3. The Possibility of Bifurcation and Chaotic Dynamics

Non-linear models in general open up the possibility that, for certain parameter values or initial conditions, they may exhibit chaotic dynamics. How are the obtained results related to such dynamics? This possibility is examined using the model of this paper in Reference [22].

The conclusions are: first, the RE determinacy condition for the linearised model in the vicinity of the deterministic steady state ensures local determinacy and stability in the model with a fixed proportion n of fully rational agents. Second, if the linear form of the model starts from a position of indeterminacy, an increase in the fixed cost of being fully rational can lead to the loss of local stability via a Hopf bifurcation. This Hopf bifurcation appears to be super-critical, giving rise to stable limit cycles. As the speed at which agents learn increases, a rational route to randomness appears to follow, which we explore with numerical methods. From a policy point of view, the main conclusion is that local indeterminacy about the steady state can be avoided by a careful choice of interest-rate rule that obeys a “Taylor condition” modified to allow for persistence. This is the case for our simulations which avoid chaotic dynamics.

6. Conclusions

This paper studies an NK behavioural model for which boundedly rational beliefs of economic agents are about payoff-relevant macroeconomic variables that are exogenous to their decision rules. Reinforcement learning is at the core of the heterogeneous expectations model and leads to the striking result that a high volatility of exogenous shocks, by assisting the learning process, can be welfare-increasing.

The results from our simulations have a range of practical and theoretical implications. From a practical point of view, our model provides a behavioural explanation for the important properties of business cycle dynamics and (ir)rationality in a market economy. Our findings shed more light on the underlying mechanism that guides policy choices in a society comprising policymakers and agents who form heterogeneous expectations. Regarding the theoretical implications, our results for a simple NK model suggest a new agenda for constructing empirical medium-sized NK models for agents’ behaviours under imperfect information. Future work will embed the RE-BR composite model into a richer NK macroeconomic model along the lines of Reference [31], use non-linear estimation methods to identify a number of parameters involving reinforcement learning that are not identified using linear Bayesian estimation, and examine optimal monetary policy.

Another potential direction for future research is to investigate how reinforcement learning affects the possible chaotic dynamics of the model. We know that an increase in the fixed cost of being fully rational can lead to the loss of local stability. If we enter a region of local instability, but global boundedness, we see chaotic dynamics as highlighted generally in Reference [25]. In addition, from Reference [22], who plotted the simulated trajectories for various parameter values with an almost purely stochastic switching process (

γ = 0.1

), it is evident that, when the level of rationality varies according to reinforcement learning, it is likely that we see very different stability/determinacy properties of the model, which imply that uncertainty as to how expectations and learning are processed can lead to a policy rule that is unstable or has infinite multiple equilibria (i.e., is indeterminate).

As with any research, there are limitations in our study that should be addressed in future work. We have alluded to the wilderness of non-rational expectations posed by the sheer size of the literature on behavioural macroeconomics and the huge number of equilibria proposed. Any analysis based on only one choice of model clearly has limitations when turning to policy implications. A policy that works well for one particular choice may perform badly using a different model. One solution to this problem proposed by References [32,33] is to choose a policy to maximise weighted average inter-temporal welfare across a set of competing models and to weigh models based on relative forecasting performance. In other studies, the proportions of rational and non-rational agents are fixed; a possible avenue for future research would be to extend the analysis to time-varying endogenous proportions as in this paper.

Finally, there remains a wide range of views over the asymmetric macroeconomic effects of economic shocks (e.g., news, energy and monetary policy) as well as over the variations in these effects with respect to economic conditions and states. Different strands of literature offer different explanations on the existence of non-linearities, focusing on the sources of the shocks, econometric specifications and time-variation in impact and policy responses (see Reference [34] for a recent study that addresses the latter two aspects). We argue that the modelling approach and non-linear techniques used in our paper add an important dimension to this strand of literature by providing a variety of starting points for future work that investigates the non-linear effects of shocks that may originate from the time-varying nature of expectation formations and complex adaptive systems.

Author Contributions

Conceptualization, S.D., P.L., J.P. and B.Y.; methodology, S.D. and P.L.; software, S.D., P.L. and B.Y.; validation, P.L. and B.Y.; formal analysis, S.D., P.L. and B.Y.; investigation, S.D., P.L., J.P. and B.Y.; writing—original draft preparation, P.L. and J.P.; writing—review and editing, P.L. and B.Y.; project administration, P.L.; funding acquisition, P.L. and J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the ESRC, grant number: ES/K005154/1.

Data Availability Statement

No data were created or analysed in this study.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Stability Analysis

We have three possible models of expectations: rational (i.e., model consistent), boundedly rational with Euler learning and boundedly rational with infinite-horizon learning. We denote these three cases by RE, EL and AU, respectively. In this section, we consider homogeneous expectations for which all agents (households and firms) form either RE or AU or EL expectations. In Section 5 of the main paper, we then allow for the possibility that households and firms are heterogeneous across these groups (but retain intra-group homogeneity).

In the numerical results below, we fix parameters at their priors used in the Bayesian estimation apart from the adaptive learning parameter

λ_{x}

which we set at unity. We make the following information assumptions: for observations of aggregate output and inflation,

j = 1

, which is assumed in the EL approach. In the AU approach, we need to model observations of market-specific variables consisting of factor prices, profits and marginal costs. These we assume can be observed without a lag, and therefore,

j = 0

. Note this only applies to the EL and AU agents, but the RE equilibrium assumes perfect information where agents observe all current values of state variables. However, for rational agents, the stability conditions considered now can be derived from a perfect foresight equilibrium and are independent of the information assumption.

Figure A1 compares the models in the

(ρ_{r}, θ_{π})

space with

θ_{y} = 0.3

and

θ_{d y} = 0

. Figure A2 sets

ρ_{r} = 1

and compares the EL and AU models in

(θ_{y}, θ_{π})

space having re-parameterised the rule as

r_{n, t} = ρ_{r} r_{n, t - 1} + θ_{π} π_{t} + θ_{y} y_{t}

. Note that this rule reduces to a price-level rule when

θ_{y} = 0

. The differences in the sizes of the policy spaces that result in a saddle-path stable equilibrium are significant. Furthermore, a clear ranking of the sizes of these spaces emerges with

R E \supset E L \supset A U

. This means that, unless the policy rule is designed for the AU model, uncertainty as to which model of expectations is correct can lead to a rule that is unstable or has infinite multiple equilibria (i.e., is indeterminate).

Figure A1. Comparison of stability properties of RE, EL and AU models in

(ρ_{r}, θ_{π})

space;

ρ_{r} > 0

,

λ_{x} = 1

; red: determinacy; black: indeterminacy; green: instability. (a) RE:

θ_{y} = 0.3

; (b) EL:

θ_{y} = 0.3

; (c) AU:

θ_{y} = 0.3

.

Figure A2. Comparison of stability properties of EL and AU models in

(θ_{y}, θ_{π})

space;

ρ_{r} = 1

,

λ_{x} = 1

; red: determinacy; black: indeterminacy; green: instability. (a) EL:

ρ_{r} = 1

; (b) AU:

ρ_{r} = 1

.

Appendix B. Summary of Composite RE-BR Model

In the stationarised form of the model for exogenous proportions

n_{h, t}

and

n_{f, t}

, we have

RE Households:

\begin{matrix} U_{t}^{R E} & = & U (C_{t}^{R E}, H_{t}^{R E}) = log C_{t}^{R E} - \frac{{(H_{t}^{R E})}^{1 + ϕ}}{1 + ϕ} \\ U_{C, t}^{R E} & = & E_{t} [β_{g, t + 1} U_{C, t + 1}^{R E} R_{t + 1}] \\ β_{g, t} & = & β / (1 + g_{t}) \\ g_{t} & = & (1 + g) exp (ϵ_{A t r e n d}) - 1 \\ R_{t} & = & \frac{R_{n, t - 1}}{Π_{t}} \\ U_{C, t}^{R E} & = & \frac{1}{C_{t}^{R E}} \\ U_{H, t}^{R E} & = & - {(H_{t}^{R E})}^{ϕ} \\ - \frac{U_{H, t}^{R E}}{U_{C, t}^{R E}} & = & W_{t} \end{matrix}

BR Households:

\begin{matrix} U_{t}^{B R} & = & U (C_{t}^{B R}, H_{t}^{B R}) = log C_{t}^{B R} - \frac{{(H_{t}^{B R})}^{1 + ϕ}}{1 + ϕ} \\ \frac{C_{t}^{B R}}{(1 - E_{t} β_{g, t + 1})} & = & \frac{1}{{(C_{t}^{B R})}^{\frac{1}{ϕ}}} (W_{t}^{1 + \frac{1}{ϕ}} + \frac{{((\frac{E_{t}^{*} R_{n, t + 1}}{R_{n, t}}) E_{t}^{*} W_{t + 1})}^{1 + \frac{1}{ϕ}}}{{(E_{t} β_{g, t + 1})}^{\frac{1}{ϕ}} {(E_{t}^{*} R_{t + 1}^{e x})}^{1 + \frac{1}{ϕ}} - 1}) \\ + & Γ_{t} - G_{t} + \frac{(\frac{E_{t}^{*} R_{n, t + 1}}{R_{n, t}}) E_{t}^{*} (Γ_{t + 1} - G_{t + 1})}{E_{t}^{*} R_{t + 1}^{e x} - 1} \\ \equiv & \frac{1}{{(C_{t}^{B R})}^{\frac{1}{ϕ}}} (W_{t}^{1 + \frac{1}{ϕ}} + {(\frac{E_{t}^{*} R_{n, t + 1}}{R_{n, t}})}^{1 + \frac{1}{ϕ}} Ω_{1, t}) \\ + & Γ_{t} - G_{t} + (\frac{E_{t}^{*} R_{n, t + 1}}{R_{n, t}}) Ω_{2, t} \\ U_{C, t}^{B R} & = & \frac{1}{C_{t}^{B R}} \\ U_{H, t}^{B R} & = & - {(H_{t}^{B R})}^{ϕ} \\ - \frac{U_{H, t}^{B R}}{U_{C, t}^{B R}} & = & W_{t} \end{matrix}

where

\begin{matrix} Ω_{1, t} & = & \frac{{(E_{t}^{*} W_{t + 1})}^{1 + \frac{1}{ϕ}}}{{(E_{t} β_{g, t + 1})}^{\frac{1}{ϕ}} {(E_{t}^{*} R_{t + 1}^{e x})}^{1 + \frac{1}{ϕ}} - 1} \\ Ω_{2, t} & = & \frac{E_{t}^{*} (Γ_{t + 1} - G_{t + 1})}{E_{t}^{*} R_{t + 1}^{e x} - 1} \\ E_{t}^{*} R_{t + 1}^{e x} & = & \frac{E_{t}^{*} R_{n, t + 1}}{E_{h, t}^{*} Π_{t + 1}} \end{matrix}

Wholesale Firms:

\begin{matrix} Y_{t}^{W} & = & F (A_{t}, H_{t}) = A_{t} H_{t}^{α} = A_{t} {(n_{h, t} H_{t}^{R E} + (1 - n_{h, t}) H_{t}^{B R})}^{α} \\ Y_{t} & = & \frac{Y_{t}^{W}}{Δ_{t}^{α}} \\ \frac{P_{t}^{W}}{P_{t}} F_{H, t} & = & \frac{P_{t}^{W}}{P_{t}} \frac{α Y_{t}^{W}}{H_{t}} = W_{t} \\ 1 & = & ξ Π_{t}^{ζ - 1} + (1 - ξ) (n_{f, t} {(\frac{J_{t}^{R E}}{J J_{t}^{R E}})}^{1 - ζ} + (1 - n_{f, t}) {(\frac{J_{t}^{B R}}{J J_{t}^{B R}})}^{1 - ζ}) \\ Δ_{t} & = & ξ Π_{t}^{\frac{ζ}{α}} Δ_{t - 1} + (1 - ξ) (n_{f, t} {(\frac{J_{t}^{R E}}{J J_{t}^{R E}})}^{- \frac{ζ}{α}} + (1 - n_{f, t}) {(\frac{J_{t}^{B R}}{J J_{t}^{B R}})}^{- \frac{ζ}{α}}) \\ M C_{t} & = & \frac{P_{t}^{W}}{P_{t}} = \frac{W_{t}}{F_{H, t}} \\ Γ_{t} & = & Y_{t} - α M C_{t} Y_{t}^{W} \end{matrix}

RE Retail Firms:

\begin{matrix} J J_{t}^{R E} - ξ E_{t} [Π_{t + 1}^{ζ - 1} J J_{t + 1}^{R E} β_{g, t + 1}] & = & Y_{t} (n_{h, t} U_{C, t}^{R E} + (1 - n_{h, t}) U_{C, t}^{B R}) \\ J_{t}^{R E} - ξ E_{t} [Π_{t + 1}^{ζ} J_{t + 1}^{R E} β_{g, t + 1}] & = & (\frac{1}{1 - \frac{1}{ζ}}) Y_{t} M C_{t} M S_{t} (n_{h, t} U_{C, t}^{R E} + (1 - n_{h, t}) U_{C, t}^{B R}) \\ {(\frac{P_{t}^{0}}{P_{t}})}^{R E} & = & \frac{J_{t}^{R E}}{J J_{t}^{R E}} \end{matrix}

BR Retail Firms:

\begin{matrix} J_{t}^{B R} & = & (\frac{1}{1 - \frac{1}{ζ}}) (Y_{t} M C_{t} M S_{t} + Ω_{3, t}) \\ J J_{t}^{B R} & = & Y_{t} + Ω_{4, t} \\ {(\frac{P_{t}^{0}}{P_{t}})}^{B R} & = & \frac{J_{t}^{B R}}{J J_{t}^{B R}} \end{matrix}

where

\begin{matrix} Ω_{3, t} & = & \frac{ξ {(E_{f, t}^{*} Π_{t + 1})}^{ζ} E_{t}^{*} Y_{t + 1} E_{t}^{*} M C_{t + 1} E_{t}^{*} M S_{t + 1}}{E_{f, t}^{*} R_{t + 1} - ξ {(Π_{t + 1})}^{ζ}} \\ Ω_{4, t} & = & \frac{ξ {(E_{f, t}^{*} Π_{t + 1})}^{ζ - 1} E_{t}^{*} Y_{t + 1}}{E_{f, t}^{*} R_{t + 1} - ξ {(E_{f, t}^{*} Π_{t + 1})}^{ζ - 1}} \\ E_{f, t}^{*} R_{t + 1} & = & E_{f, t}^{*} [\frac{R_{n, t}}{Π_{t + 1}}] = \frac{R_{n, t}}{E_{f, t}^{*} Π_{t + 1}} \end{matrix}

One-Period Ahead Adaptive Expectations:

\begin{matrix} E_{t}^{*} [β_{g, t + 1}] & = & E_{t - 1}^{*} [β_{g, t}] + λ_{1, β_{g}} (β_{g, t - 1} - E_{t - 1}^{*} [β_{g, t}]) + λ_{2, β_{g}} (β_{g, t - 1} - β_{g, t - 2}); λ_{i, β_{g}} \in [0, 1] \\ E_{t}^{*} [G_{t + 1}] & = & E_{t - 1}^{*} [G_{t}] + λ_{1, G} (G_{t} - E_{t - 1}^{*} [G_{t}]) + λ_{2, G} (G_{t} - G_{t - 1}); λ_{i, G} \in [0, 1] \\ E_{t}^{*} [W_{t + 1}] & = & E_{t - 1}^{*} [W_{t}] + λ_{1, W} (W_{t} - E_{t - 1}^{*} [W_{t}]) + λ_{2, W} (W_{t} - W_{t - 1}); λ_{i, W} \in [0, 1] \\ E_{t}^{*} [Γ_{t + 1}] & = & E_{t - 1}^{*} [Γ_{t}] + λ_{1, Γ} (Γ_{t} - E_{t - 1}^{*} [Γ_{t}]) + λ_{2, Γ} (Γ_{t} - Γ_{t - 1}); λ_{i, Γ} \in [0, 1] \\ E_{t}^{*} [R_{n, t + 1}] & = & E_{t - 1}^{*} [R_{n, t}] + λ_{1, R_{n}} (R_{n, t} - E_{t - 1}^{*} [R_{n, t}]) + λ_{2, R_{n}} (R_{n, t} - R_{n, t - 1}); λ_{i, R_{n}} \in [0, 1] \text{ (households)} \\ E_{h, t}^{*} [Π_{t + 1}] & = & E_{h, t - 1}^{*} [Π_{t}] + λ_{1 h, Π} (Π_{t - 1} - E_{h, t - 1}^{*} [Π_{t}]) + λ_{2 h, Π} (Π_{t - 1} - Π_{t - 2}); λ_{i h, Π} \in [0, 1] \text{ (households)} \\ E_{f, t}^{*} [Π_{t + 1}] & = & E_{f, t - 1}^{*} [Π_{t}] + λ_{1 f, Π} (Π_{t - 1} - E_{f, t - 1}^{*} [Π_{t}]) + λ_{2 f, Π} (Π_{t - 1} - Π_{t - 2}); λ_{i f, Π} \in [0, 1] \text{ (firms)} \\ E_{t}^{*} [Y_{t + 1}] & = & E_{t - 1}^{*} [Y_{t}] + λ_{1, Y} (Y_{t - 1} - E_{t - 1}^{*} [Y_{t}]) + λ_{2, Y} (Y_{t - 1} - Y_{t - 2}); λ_{i, Y} \in [0, 1] \\ E_{t}^{*} [{\tilde{M C}}_{t + 1}] & = & E_{t - 1}^{*} [{\tilde{M C}}_{t}] + λ_{1, M C} ({\tilde{M C}}_{t} - E_{t - 1}^{*} [{\tilde{M C}}_{t}]) + λ_{2, M C} ({\tilde{M C}}_{t} - {\tilde{M C}}_{t - 1}); λ_{i, M C} \in [0, 1] \end{matrix}

where

{\tilde{M C}}_{t} \equiv M C_{t} M S_{t}

. Note that we have used the first-order approximation

log \frac{X_{t}}{X} \approx \frac{X_{t} - X}{X}

.

Wealth Distribution:

First, define bond holdings of BR households by

B_{t}^{B R} = R_{t} B_{t - 1}^{B R} + W_{t} H_{t}^{B R} + Γ_{t} - C_{t}^{B R} - T_{t} - \frac{ϖ}{2} {(B_{t - 1}^{B R} - B)}^{2}

having introduced a portfolio cost adjustment with a small

ϖ

. Then, replace

C_{t}^{B R}

and the Euler equation above with

\begin{matrix} \frac{C_{t}^{B R} - B_{t}^{B R}}{(1 - E_{t}^{*} β_{g, t + 1})} & = & \frac{1}{{(C_{t}^{B R})}^{\frac{1}{ϕ}}} (W_{t}^{1 + \frac{1}{ϕ}} + \frac{{((\frac{E_{t}^{*} R_{n, t + 1}}{R_{n, t}}) E_{t}^{*} W_{t + 1})}^{1 + \frac{1}{ϕ}}}{{(E_{t}^{*} β_{g, t + 1})}^{\frac{1}{ϕ}} {(E_{t}^{*} R_{t + 1}^{e x})}^{1 + \frac{1}{ϕ}} - 1}) + Γ_{t} - G_{t} \\ + & \frac{(\frac{E_{t}^{*} R_{n, t + 1}}{R_{n, t}}) E_{t}^{*} (Γ_{t + 1} - G_{t + 1})}{E_{t}^{*} R_{t + 1}^{e x} - 1} \\ \equiv & \frac{1}{{(C_{t}^{B R})}^{\frac{1}{ϕ}}} (W_{t}^{1 + \frac{1}{ϕ}} + {(\frac{E_{t}^{*} R_{n, t + 1}}{R_{n, t}})}^{1 + \frac{1}{ϕ}} Ω_{1, t}) \\ + & Γ_{t} - G_{t} + (\frac{E_{t}^{*} R_{n, t + 1}}{R_{n, t}}) Ω_{2, t} \\ U_{C, t}^{R E} & = & E_{t} [β_{g, t + 1} U_{C, t + 1}^{R E} (R_{t + 1} - ϖ (B_{t}^{R E} - B))] \end{matrix}

where zero net wealth implies

n_{h, t} B_{t}^{R E} = - (1 - n_{h, t}) B_{t}^{B R}

.

Closure of Model:

\begin{matrix} Y_{t} & = & n_{h, t} C_{t}^{R E} + (1 - n_{h, t}) C_{t}^{B R} + G_{t} \\ G_{t} & = & T_{t} \\ log (\frac{R_{n, t}}{R_{n}}) & = & ρ_{r} log (\frac{R_{n, t - 1}}{R_{n}}) + (1 - ρ_{r}) (θ_{π} log (\frac{Π_{t}}{Π_{t a r g, t}}) \\ + & θ_{y} log (\frac{Y_{t}}{Y}) + θ_{d y} log (\frac{Y_{t}}{Y_{t - 1}})) + ϵ_{M P, t} \\ log A_{t} - log A & = & ρ_{A} (log A_{t - 1} - log A) + ϵ_{A, t} \\ log G_{t} - log G & = & ρ_{G} (log G_{t - 1} - log G) + ϵ_{G, t} \\ log M S_{t} - log M S & = & ρ_{M S} (log M S_{t - 1} - log M S) + ϵ_{M S, t} \\ log Π_{t a r g, t} - log Π & = & ρ_{π} (log Π_{t a r g, t - 1} - log Π) + ϵ_{π, t} \end{matrix}
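The exogenous processes in the closure are AR(1) in log deviations from steady state, and the simulations in Section 5.2 scale the shock standard deviations by a common factor σ. The sketch below illustrates one such process under Gaussian innovations; the function name, the Gaussian assumption, the seed handling and the numerical values are hypothetical and serve only to show the mechanics.

```python
import random


def simulate_ar1(rho, shock_sd, periods, scale=1.0, seed=0):
    """Simulate x_t = rho * x_{t-1} + eps_t, with x_t = log(X_t) - log(X).

    eps_t is drawn as N(0, (scale * shock_sd)^2); `scale` plays the role of
    the volatility multiplier sigma used in the simulations (sigma = 1, 2).
    The process starts from the steady state, x_{-1} = 0.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    x = 0.0
    path = []
    for _ in range(periods):
        x = rho * x + rng.gauss(0.0, scale * shock_sd)
        path.append(x)
    return path


# Same innovations, doubled volatility: the whole path is scaled by two.
baseline = simulate_ar1(rho=0.9, shock_sd=0.01, periods=200, scale=1.0)
high_vol = simulate_ar1(rho=0.9, shock_sd=0.01, periods=200, scale=2.0)
```

Because the process is linear, doubling σ simply doubles the path of the shock itself; any non-proportional effects on endogenous variables come from the non-linear model and the switching mechanism, not from the shock process.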

Endogenous Proportions of RE and BR Agents:

The payoff for households and firms is expressed in terms of a discounted sum of past weighted forecast errors,

Φ_{h, t}

say, starting at

t = 0

for rational and non-rational households, respectively,

\begin{matrix} Φ_{h, t}^{R E} & = & μ_{h}^{R E} Φ_{h, t - 1}^{R E} - (1 - μ_{h}^{R E}) (w_{β_{g}} {((β_{g, t} - E_{h, t - 1} β_{g, t}) / β_{g})}^{2} + w_{G} {((G_{t} - E_{h, t - 1} G_{t}) / G)}^{2} \\ + & w_{W} {((W_{t} - E_{h, t - 1} W_{t}) / W)}^{2} + w_{h, Π} {((Π_{t} - E_{h, t - 1} Π_{t}) / Π)}^{2} \\ + & w_{Γ} {((Γ_{t} - E_{h, t - 1} Γ_{t}) / Γ)}^{2} + w_{R} {((R_{n, t} - E_{t - 1} R_{n, t}) / R_{n})}^{2} + C_{h}) \\ Φ_{h, t}^{B R} & = & μ_{h}^{B R} Φ_{h, t - 1}^{B R} - (1 - μ_{h}^{B R}) (w_{β_{g}} {((β_{g, t} - E_{h, t - 1}^{*} β_{g, t}) / β_{g})}^{2} + w_{G} {((G_{t} - E_{h, t - 1}^{*} G_{t}) / G)}^{2} \\ + & w_{W} {((W_{t} - E_{h, t - 1}^{*} W_{t}) / W)}^{2} + w_{h, Π} {((Π_{t} - E_{h, t - 1}^{*} Π_{t}) / Π)}^{2} + w_{Γ} {((Γ_{t} - E_{h, t - 1}^{*} Γ_{t}) / Γ)}^{2} \\ + & w_{R} {((R_{n, t} - E_{t - 1}^{*} R_{n, t}) / R_{n})}^{2}) \end{matrix}

The parameter

C_{h}

is a fixed cost of being rational for households. For firms, this becomes

\begin{matrix} Φ_{f, t}^{R E} & = & μ_{f}^{R E} Φ_{f, t - 1}^{R E} - (1 - μ_{f}^{R E}) (w_{Y} {((Y_{t} - E_{f, t - 1} Y_{t}) / Y)}^{2} + w_{f, Π} {((Π_{t} - E_{f, t - 1} Π_{t}) / Π)}^{2} \\ + & w_{M C} {(({\tilde{M C}}_{t} - E_{f, t - 1} {\tilde{M C}}_{t}) / M C)}^{2} + C_{f}) \\ Φ_{f, t}^{B R} & = & μ_{f}^{B R} Φ_{f, t - 1}^{B R} - (1 - μ_{f}^{B R}) (w_{Y} {((Y_{t} - E_{f, t - 1}^{*} Y_{t}) / Y)}^{2} + w_{f, Π} {((Π_{t} - E_{f, t - 1}^{*} Π_{t}) / Π)}^{2} \\ + & w_{M C} {(({\tilde{M C}}_{t} - E_{f, t - 1}^{*} {\tilde{M C}}_{t}) / M C)}^{2}) \end{matrix}

where parameter

C_{f}

is a fixed cost of being rational for firms, and we allow for the possibility that

C_{h} \neq C_{f}

. Then, the proportions of rational households and firms are given by

\begin{matrix} n_{h, t} & = & \frac{exp (γ Φ_{h, t}^{R E})}{exp (γ Φ_{h, t}^{R E}) + exp (γ Φ_{h, t}^{B R})} = \frac{exp (γ (Φ_{h, t}^{R E} - Φ_{h, t}^{B R}))}{exp (γ (Φ_{h, t}^{R E} - Φ_{h, t}^{B R})) + 1} \\ n_{f, t} & = & \frac{exp (γ Φ_{f, t}^{R E})}{exp (γ Φ_{f, t}^{R E}) + exp (γ Φ_{f, t}^{B R})} = \frac{exp (γ (Φ_{f, t}^{R E} - Φ_{f, t}^{B R}))}{exp (γ (Φ_{f, t}^{R E} - Φ_{f, t}^{B R})) + 1} \end{matrix}

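In the steady state, all forecast errors are zero, so the payoff recursions give

Φ_{h}^{R E} = - C_{h}, Φ_{f}^{R E} = - C_{f}, Φ_{h}^{B R} = Φ_{f}^{B R} = 0

Thus, the proportions of rational agents in the steady state are given by

\begin{matrix} n_{h} & = & \frac{exp (- γ C_{h})}{exp (- γ C_{h}) + 1} \\ n_{f} & = & \frac{exp (- γ C_{f})}{exp (- γ C_{f}) + 1} \end{matrix}

which are pinned down by the cost parameters

(C_{h}, C_{f})

(which can be positive or negative).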
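
The payoff recursion and the logit rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the value of γ and the cost (picked so that the steady-state share equals 0.1) are hypothetical and are not taken from the estimated model.

```python
import math


def update_payoff(phi_prev, mu, weighted_sq_errors, fixed_cost=0.0):
    """One step of Phi_t = mu*Phi_{t-1} - (1-mu)*(sum of weighted squared errors + cost)."""
    return mu * phi_prev - (1.0 - mu) * (sum(weighted_sq_errors) + fixed_cost)


def rational_share(phi_re, phi_br, gamma):
    """Logit share exp(gamma*d) / (exp(gamma*d) + 1) with d = phi_re - phi_br."""
    x = gamma * (phi_re - phi_br)
    if x >= 0.0:  # evaluate in the algebraically equivalent form that cannot overflow
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical values, chosen only so that the steady-state share is 0.1.
gamma = 1.0
cost = math.log(9.0) / gamma

# With zero forecast errors the RE payoff converges to -cost; the BR payoff stays at 0.
phi_re, phi_br = 0.0, 0.0
for _ in range(200):
    phi_re = update_payoff(phi_re, mu=0.5, weighted_sq_errors=[], fixed_cost=cost)
    phi_br = update_payoff(phi_br, mu=0.5, weighted_sq_errors=[])

n_steady = rational_share(phi_re, phi_br, gamma)
```

Writing the share as a function of the payoff difference, as in the second equality of the logit formula, avoids evaluating two large exponentials separately when γ is large.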

Welfare and Consumption Equivalence:

\begin{matrix} U_{t} & = & log (n_{h, t} C_{t}^{R E} + (1 - n_{h, t}) C_{t}^{B R}) - \frac{{(n_{h, t} H_{t}^{R E} + (1 - n_{h, t}) H_{t}^{B R})}^{1 + ϕ}}{1 + ϕ} \\ {w e l}_{t} & = & (1 - β_{g, t}) U_{t} + E_{t} [β_{g, t + 1} {w e l}_{t + 1}] \\ {w e l}_{t}^{R E} & = & (1 - β_{g, t}) U_{t}^{R E} + E_{t} [β_{g, t + 1} {w e l}_{t + 1}^{R E}] \\ {w e l}_{t}^{B R} & = & (1 - β_{g, t}) U_{t}^{B R} + E_{t} [β_{g, t + 1} {w e l}_{t + 1}^{B R}] \\ C E_{t} & = & log (1.01 C_{t}) - log (C_{t}) \end{matrix}
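
A short numerical sketch of these definitions, in Python, may help. It relies on two observations that follow from the equations themselves: with a constant discount factor and constant utility the welfare recursion has the fixed point wel = U, and CE is the constant log(1.01) whatever the level of consumption. Using CE to express a welfare gap in units of a permanent 1% consumption change is our reading of its purpose, and the function names are hypothetical.

```python
import math

# CE_t = log(1.01 * C_t) - log(C_t) does not depend on C_t.
CE = math.log(1.01)


def welfare_fixed_point(utility, beta_g, n_iter=2000):
    """Iterate wel = (1 - beta_g)*U + beta_g*wel with constant U and beta_g."""
    wel = 0.0
    for _ in range(n_iter):
        wel = (1.0 - beta_g) * utility + beta_g * wel
    return wel


def consumption_equivalent_percent(wel_a, wel_b):
    """Welfare gap wel_a - wel_b in units of a permanent 1% consumption change."""
    return (wel_a - wel_b) / CE
```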

Appendix C. Balanced Growth Steady State

In recursive form, the zero-growth zero-inflation (

Π = 1

) steady state can be written as

\begin{matrix} R & = & \frac{1}{β} \\ Λ & = & β \\ M C = \frac{P^{W}}{P} & = & 1 - \frac{1}{ζ} \\ \frac{C}{Y} & = & 1 - g_{y} \\ H & = & \frac{α Δ^{α} M C}{κ (1 - g_{y})} \\ Y^{W} & = & {(A H)}^{α} \\ Y & = & \frac{Y^{W}}{Δ^{α}} \\ W & = & α \frac{P^{W}}{P} \frac{Y^{W}}{H} \\ J & = & \frac{Y M C U_{C}}{(1 - \frac{1}{ζ}) (1 - ξ β Π^{ζ})} \\ J J & = & \frac{Y U_{C}}{(1 - ξ β Π^{ζ - 1})} \\ Hence, with Π = 1, J & = & J J \\ Δ & = & 1 \\ Γ & = & Y - α M C Y^{W} \end{matrix}

For a steady state with a particular inflation rate

Π > 1

, the NK features of the steady state become

\begin{matrix} \frac{J}{J J} & = & {(\frac{1 - ξ Π^{ζ - 1}}{1 - ξ})}^{\frac{1}{1 - ζ}} \\ M C = \frac{P^{W}}{P} & = & (1 - \frac{1}{ζ}) \frac{J (1 - β ξ Π^{ζ})}{J J (1 - β ξ Π^{ζ - 1})} \\ Δ & = & \frac{(1 - ξ) {(\frac{J}{J J})}^{- ζ}}{1 - ξ Π^{ζ}} \end{matrix}

then,

P^{W} Y^{W} / P Y = M C Δ

.
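
The NK block of the steady state can be evaluated numerically as follows. This is a minimal Python sketch, not the authors' code: the function name and the parameter values in the usage lines are hypothetical, and Δ is computed as the fixed point of the price-dispersion recursion derived in Appendix E, which reduces to Δ = 1 at Π = 1.

```python
def nk_steady_state(beta, zeta, xi, Pi=1.0):
    """Return (J/JJ, MC, Delta) for gross steady-state inflation Pi."""
    if xi * Pi**zeta >= 1.0:
        raise ValueError("need xi * Pi**zeta < 1 for a finite price dispersion")
    # Optimal reset price relative to the price level, J/JJ.
    ratio = ((1.0 - xi * Pi ** (zeta - 1.0)) / (1.0 - xi)) ** (1.0 / (1.0 - zeta))
    # Real marginal cost MC = P^W / P.
    mc = (1.0 - 1.0 / zeta) * ratio * (1.0 - beta * xi * Pi**zeta) / (
        1.0 - beta * xi * Pi ** (zeta - 1.0)
    )
    # Fixed point of Delta = xi * Pi^zeta * Delta + (1 - xi) * (J/JJ)^(-zeta).
    delta = (1.0 - xi) * ratio ** (-zeta) / (1.0 - xi * Pi**zeta)
    return ratio, mc, delta


# Hypothetical calibration, for illustration only.
zero_inflation = nk_steady_state(beta=0.99, zeta=7.0, xi=0.75, Pi=1.0)
trend_inflation = nk_steady_state(beta=0.99, zeta=7.0, xi=0.75, Pi=1.005)
```

At Π = 1 the function returns J/JJ = 1, MC = 1 - 1/ζ and Δ = 1, which matches the zero-inflation steady state listed earlier.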

We can now easily set up the model with a balanced exogenous-growth steady state. Now the process for

A_{t}

is replaced with

\begin{matrix} A_{t} & = & {\bar{A}}_{t} A_{t}^{c} \\ {\bar{A}}_{t} & = & (1 + g) {\bar{A}}_{t - 1} exp (ϵ_{A, t}) \\ log A_{t}^{c} - log A^{c} & = & ρ_{A} (log A_{t - 1}^{c} - log A^{c}) + ϵ_{A, t} \end{matrix}

where

A_{t}

is a labour-augmenting technical progress parameter which we decompose into a cyclical component,

A_{t}^{c}

, modelled as a temporary AR(1) process and a stochastic trend, whose log is a random walk with drift,

{\bar{A}}_{t}

. Thus, the balanced growth deterministic steady state path is driven by labour-augmenting technical change growing at a net rate g. If we put

g = ϵ_{t r e n d, t} = 0

and

{\bar{A}}_{t} = 1

, we arrive at our previous formulation with

A_{t}^{c} = A_{t}

.
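
The trend-cycle decomposition can be simulated directly. In the Python sketch below the function name and all argument names are hypothetical, and the trend and cyclical innovations are drawn as two separate shocks, which is an assumption on our part, in line with the distinct trend shock referred to in the text.

```python
import math
import random


def simulate_technology(T, g, rho_A, sd_trend, sd_cycle, A_c_ss=1.0, seed=0):
    """Simulate A_t = A_bar_t * A_c_t; returns a list of (A_bar, A_c, A) tuples."""
    rng = random.Random(seed)
    A_bar = 1.0    # stochastic trend, initial level normalised to one
    log_dev = 0.0  # log A_c_t - log A_c (deviation of the cycle from steady state)
    path = []
    for _ in range(T):
        eps_trend = rng.gauss(0.0, sd_trend)
        eps_cycle = rng.gauss(0.0, sd_cycle)
        A_bar = (1.0 + g) * A_bar * math.exp(eps_trend)  # log random walk with drift
        log_dev = rho_A * log_dev + eps_cycle            # AR(1) in log deviations
        A_c = A_c_ss * math.exp(log_dev)
        path.append((A_bar, A_c, A_bar * A_c))
    return path


# Switching off growth and the trend shock recovers the earlier formulation A_t = A_t^c.
no_trend = simulate_technology(T=50, g=0.0, rho_A=0.9, sd_trend=0.0, sd_cycle=0.01)
```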

Now stationarise the variables by defining cyclical and stationary components

\begin{matrix} {(Y_{t}^{W})}^{c} & \equiv & \frac{Y_{t}^{W}}{{\bar{A}}_{t}} = A_{t}^{c} H_{t}^{α} \\ C_{t}^{c} & \equiv & \frac{C_{t}}{{\bar{A}}_{t}} \\ W_{t}^{c} & \equiv & \frac{W_{t}}{{\bar{A}}_{t}} \\ U_{t}^{c} & \equiv & log C_{t}^{c} - κ \frac{H_{t}^{1 + ϕ}}{1 + ϕ} \\ U_{C, t}^{c} & \equiv & \frac{1}{C_{t}^{c}} \\ Λ_{t, t + 1} & = & β \frac{U_{C, t + 1}}{U_{C, t}} = β_{g, t + 1} \frac{U_{C, t + 1}^{c}}{U_{C, t}^{c}} \end{matrix}

for all non-stationary variables where

\begin{matrix} g_{t} & \equiv & \frac{({\bar{A}}_{t} - {\bar{A}}_{t - 1})}{{\bar{A}}_{t - 1}} = (1 + g) exp (ϵ_{A, t}) - 1 \\ β_{g, t} & \equiv & β (1 + g_{t}) \end{matrix}

is the stochastic steady state growth rate; then, the stationarised Euler equation and the Calvo pricing become

E_{t} [Λ_{t, t + 1} R_{t + 1}] = E_{t} [β_{g, t + 1} \frac{U_{C, t + 1}^{c}}{U_{C, t}^{c}} R_{t + 1}] = 1

and

\begin{matrix} {\hat{J J}}_{t}^{c} - ξ E_{t} [Π_{t + 1}^{ζ - 1} {\hat{J J}}_{t + 1}^{c} Λ_{t, t + 1}] & = & Y_{t}^{c} \\ {\hat{J}}_{t}^{c} - ξ E_{t} [Π_{t + 1}^{ζ} {\hat{J}}_{t + 1}^{c} Λ_{t, t + 1}] & = & Y_{t}^{c} M C_{t} M S_{t} \end{matrix}

or equivalently

\begin{matrix} {\hat{J J}}_{t}^{c} - ξ E_{t} [Π_{t + 1}^{ζ - 1} {\hat{J J}}_{t + 1}^{c} β_{g, t + 1}] & = & Y_{t}^{c} U_{C, t}^{c} \\ {\hat{J}}_{t}^{c} - ξ E_{t} [Π_{t + 1}^{ζ} {\hat{J}}_{t + 1}^{c} β_{g, t + 1}] & = & Y_{t}^{c} U_{C, t}^{c} M C_{t} M S_{t} \end{matrix}

The steady state for the rest of the system is the same as the zero-growth one except for the following relationships:

\begin{matrix} R & = & \frac{1}{β_{g}} = \frac{R_{n}}{Π} \end{matrix}

where R and

R_{n}

are the real and nominal steady state interest rates, and

Π

is inflation.

Appendix D. Lemma

In the first-order conditions for Calvo contracts and expressions for value functions, we are confronted with expected discounted sums of the general form

Ω_{t} = E_{t} [\sum_{k = 0}^{\infty} β^{k} X_{t, t + k} Y_{t + k}]

where

X_{t, t + k}

has the property

X_{t, t + k} = X_{t, t + 1} X_{t + 1, t + k}

and

X_{t, t} = 1

(for example an inflation, interest or discount rate over the interval

[t, t + k]

).

Lemma A1.

Ω_{t}

can be expressed as

Ω_{t} = Y_{t} + β E_{t} [X_{t, t + 1} Ω_{t + 1}]

Proof.

\begin{matrix} Ω_{t} & = & X_{t, t} Y_{t} + E_{t} [\sum_{k = 1}^{\infty} β^{k} X_{t, t + k} Y_{t + k}] \\ & = & Y_{t} + E_{t} [\sum_{k^{'} = 0}^{\infty} β^{k^{'} + 1} X_{t, t + k^{'} + 1} Y_{t + k^{'} + 1}] \\ & = & Y_{t} + β E_{t} [\sum_{k^{'} = 0}^{\infty} β^{k^{'}} X_{t, t + 1} X_{t + 1, t + k^{'} + 1} Y_{t + k^{'} + 1}] \\ & = & Y_{t} + β E_{t} [X_{t, t + 1} Ω_{t + 1}] \end{matrix}

□
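
Lemma A1 can be checked numerically on a finite, perfect-foresight path, where the expectation operator drops out and all terms beyond the end of the path are set to zero. Both simplifications are assumptions made for this Python sketch, and the function names are hypothetical. The list x holds the one-period factors, with x[j] standing for X_{j-1, j}.

```python
def direct_sum(beta, x, y, t):
    """Omega_t as the explicit sum of beta^k * X_{t,t+k} * Y_{t+k} over the path."""
    total, X = 0.0, 1.0  # X_{t,t} = 1
    for k in range(len(y) - t):
        if k > 0:
            X *= x[t + k]  # X_{t,t+k} = X_{t,t+k-1} * X_{t+k-1,t+k}
        total += beta**k * X * y[t + k]
    return total


def recursive_sum(beta, x, y, t):
    """Omega_t from the backward recursion Omega_s = Y_s + beta * X_{s,s+1} * Omega_{s+1}."""
    omega = 0.0  # Omega beyond the end of the path
    for s in range(len(y) - 1, t - 1, -1):
        x_step = x[s + 1] if s + 1 < len(x) else 1.0  # multiplies a zero at the last date
        omega = y[s] + beta * x_step * omega
    return omega


# Hypothetical path: x[0] is never used because x[j] is the factor from j-1 to j.
x_path = [1.0, 1.02, 0.97, 1.05, 1.01]
y_path = [2.0, 1.5, 3.0, 0.5, 1.0]
```

The recursive form is the one used when the Calvo sums J and JJ are written as first-order difference equations.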

Appendix E

Proof of Equation (28).

In the next period,

ξ

of these firms will keep their old prices, and

(1 - ξ)

will change their prices to

P_{t + 1}^{O}

. By the law of large numbers, we assume that the distribution of prices among those firms that do not change their prices is the same as the overall distribution in period t. It follows that we may write

\begin{matrix} Δ_{t + 1} & = & ξ \sum_{j_{n o c h a n g e}} {(\frac{P_{t} (j)}{P_{t + 1}})}^{- ζ} + (1 - ξ) {(\frac{J_{t + 1}}{J J_{t + 1}})}^{- ζ} \\ & = & ξ {(\frac{P_{t}}{P_{t + 1}})}^{- ζ} \sum_{j_{n o c h a n g e}} {(\frac{P_{t} (j)}{P_{t}})}^{- ζ} + (1 - ξ) {(\frac{J_{t + 1}}{J J_{t + 1}})}^{- ζ} \\ & = & ξ {(\frac{P_{t}}{P_{t + 1}})}^{- ζ} \sum_{j} {(\frac{P_{t} (j)}{P_{t}})}^{- ζ} + (1 - ξ) {(\frac{J_{t + 1}}{J J_{t + 1}})}^{- ζ} \\ & = & ξ Π_{t + 1}^{ζ} Δ_{t} + (1 - ξ) {(\frac{J_{t + 1}}{J J_{t + 1}})}^{- ζ} \end{matrix}

□
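
The law of motion for price dispersion is a linear first-order recursion and is straightforward to iterate. The Python sketch below uses hypothetical names and parameter values. It starts from a dispersed price distribution and holds inflation and the reset-price ratio at their zero-inflation values, so the gap Δ - 1 shrinks by the factor ξ each period.

```python
def dispersion_path(delta0, inflation, reset_ratio, xi, zeta):
    """Iterate Delta_{t+1} = xi * Pi_{t+1}^zeta * Delta_t + (1 - xi) * (J/JJ)_{t+1}^(-zeta)."""
    path = [delta0]
    for Pi, ratio in zip(inflation, reset_ratio):
        path.append(xi * Pi**zeta * path[-1] + (1.0 - xi) * ratio ** (-zeta))
    return path


# Hypothetical experiment: 20 periods with Pi = 1 and J/JJ = 1, starting from Delta = 1.2.
T = 20
path = dispersion_path(1.2, [1.0] * T, [1.0] * T, xi=0.75, zeta=7.0)
```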

Appendix F. Additional Simulated IRFs for RE-BR Composite Models

Figure A3. RE versus RE-BR composite expectations with

n_{h} = n_{f} = 0.5

;

λ_{x} = 0.25, 1.0

; Taylor rule with

ρ_{r} = 0.7

,

θ_{π} = 1.5

and

θ_{y} = 0.3

; technology shock.

Figure A4. RE versus RE-BR composite expectations with

n_{h} = n_{f} = 0.5

;

λ_{x} = 0.25, 1.0

; Taylor rule with

ρ_{r} = 0.7

,

θ_{π} = 1.5

and

θ_{y} = 0.3

; mark-up shock.

Appendix G. Robustness

Table A1. Third-order solution of the estimated NK RE-BR model;

μ_{h}^{R E} = μ_{h}^{B R} = μ_{f}^{R E} = μ_{f}^{B R} = 0.5

;

γ = 1, 100, 1000

.

Variable	Stochastic Mean	Standard Deviation (%)	Skewness	Kurtosis
$\frac{C_{t}}{C}$	0.999544	0.042057	0.323304	0.093034
$\frac{H_{t}}{H}$	1.000273	0.005111	0.038002	−0.020743
$\frac{W_{t}}{W}$	0.999810	0.038145	0.318586	0.073488
$\frac{Π_{t}}{Π}$	0.999898	0.004235	−0.045800	0.030136
$\frac{R_{n, t}}{R_{n}}$	0.999887	0.004440	−0.046254	0.044145
$Φ_{h, t}^{R E} - C_{h}$	−0.000443	0.000257	−1.504159	3.793195
$Φ_{h, t}^{A E}$	−0.000526	0.000303	−1.592581	4.581412
$Φ_{f, t}^{R E} - C_{f}$	−0.000199	0.000116	−1.672777	5.558457
$Φ_{f, t}^{A E}$	−0.000349	0.000226	−1.897335	7.457836
$n_{h, t} (γ = 1; σ = 1)$	0.100008	0.000013	0.488774	3.275592
$n_{f, t} (γ = 1; σ = 1)$	0.100014	0.000016	1.680492	6.480563
$n_{h, t} (γ = 100; σ = 1)$	0.100750	0.001295	0.488774	3.275592
$n_{f, t} (γ = 100; σ = 1)$	0.101352	0.001568	1.680492	6.480563
$n_{h, t} (γ = 1000; σ = 1)$	0.107502	0.012952	0.488774	3.275592
$n_{f, t} (γ = 1000; σ = 1)$	0.113519	0.015679	1.680492	6.480563
$n_{h, t} (γ = 1000; σ = 2)$	0.130010	0.052873	0.535046	3.638229
$n_{f, t} (γ = 1000; σ = 2)$	0.154185	0.063624	1.779321	7.399916

Table A2. Third-order solution of the estimated NK RE-BR model;

μ_{h}^{R E} = μ_{h}^{B R} = μ_{f}^{R E} = μ_{f}^{B R} = 0.75

;

γ = 1, 100, 1000

.

Variable	Stochastic Mean	Standard Deviation (%)	Skewness	Kurtosis
$\frac{C_{t}}{C}$	0.999544	0.042057	0.323304	0.093034
$\frac{H_{t}}{H}$	1.000273	0.005111	0.038002	−0.020743
$\frac{W_{t}}{W}$	0.999810	0.038145	0.318586	0.073488
$\frac{Π_{t}}{Π}$	0.999898	0.004235	−0.045797	0.030137
$\frac{R_{n, t}}{R_{n}}$	0.999887	0.004440	−0.046251	0.044145
$Φ_{h, t}^{R E} - C_{h}$	−0.000443	0.000170	−0.978598	1.538134
$Φ_{h, t}^{A E}$	−0.000526	0.000204	−1.088202	2.164231
$Φ_{f, t}^{R E} - C_{f}$	−0.000199	0.000077	−1.063312	2.243911
$Φ_{f, t}^{A E}$	−0.000349	0.000159	−1.414287	4.290569
$n_{h, t} (γ = 1; σ = 1)$	0.100008	0.000008	0.350716	2.281134
$n_{f, t} (γ = 1; σ = 1)$	0.100014	0.000011	1.385635	4.243151
$n_{h, t} (γ = 100; σ = 1)$	0.100750	0.000821	0.350716	2.281134
$n_{f, t} (γ = 100; σ = 1)$	0.101352	0.001081	1.385635	4.243151
$n_{h, t} (γ = 1000; σ = 1)$	0.107503	0.008211	0.350716	2.281134
$n_{f, t} (γ = 1000; σ = 1)$	0.113521	0.010812	1.385635	4.243151
$n_{h, t} (γ = 1000; σ = 2)$	0.130012	0.033699	0.406619	2.557592
$n_{f, t} (γ = 1000; σ = 2)$	0.154191	0.044060	1.491071	4.993491

Figure 1. RE versus RE-BR composite expectations with

n_{h} = n_{f} = 0.5

,

λ_{x} = 0.25, 1.0

; Taylor rule with

ρ_{r} = 0.7

,

θ_{π} = 1.5

and

θ_{y} = 0.3

,

θ_{d y} = 0

; monetary policy shock.

Table 1. Third-order solution of the estimated NK RE-BR model;

μ_{h}^{R E} = μ_{h}^{B R} = μ_{f}^{R E} = μ_{f}^{B R} = 0

;

γ = 1, 100, 1000

.

Variable	Stochastic Mean	Standard Deviation (%)	Skewness	Kurtosis
$\frac{C_{t}}{C}$	0.999544	0.042057	0.323304	0.093034
$\frac{H_{t}}{H}$	1.000273	0.005111	0.038002	−0.020743
$\frac{W_{t}}{W}$	0.999810	0.038145	0.318586	0.073488
$\frac{Π_{t}}{Π}$	0.999898	0.004235	−0.045800	0.030136
$\frac{R_{n, t}}{R_{n}}$	0.999887	0.004440	−0.046254	0.044145
$Φ_{h, t}^{R E} - C_{h}$	−0.000443	0.000446	−2.078809	6.635580
$Φ_{h, t}^{A E}$	−0.000526	0.000516	−2.168947	8.000489
$Φ_{f, t}^{R E} - C_{f}$	−0.000199	0.000203	−2.279557	9.082031
$Φ_{f, t}^{A E}$	−0.000349	0.000342	−2.269953	9.937975
$n_{h, t} (γ = 1; σ = 1)$	0.100008	0.000023	0.857638	4.454288
$n_{f, t} (γ = 1; σ = 1)$	0.100014	0.000025	1.586194	6.015115
$n_{h, t} (γ = 100; σ = 1)$	0.100750	0.002297	0.857638	4.454288
$n_{f, t} (γ = 100; σ = 1)$	0.101352	0.002479	1.586194	6.015115
$n_{h, t} (γ = 1000; σ = 1)$	0.107501	0.022973	0.857638	4.454288
$n_{f, t} (γ = 1000; σ = 1)$	0.113518	0.024787	1.586194	6.015115
$n_{h, t} (γ = 1000; σ = 2)$	0.130007	0.093482	0.888592	4.857691
$n_{f, t} (γ = 1000; σ = 2)$	0.154182	0.100265	1.683430	6.867599

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
