# Transfer Entropy for Coupled Autoregressive Processes


## Abstract


## 1. Introduction

Each of the transfer entropies TE_{x→y} and TE_{y→x} is nonnegative, and both will be positive (and not necessarily equal) when information flow is bidirectional. Because of these properties, transfer entropy is useful for detecting causal relationships between systems generating measurement time series. Indeed, transfer entropy has been shown to be equivalent, for Gaussian variables, to Granger causality [2]. Reasons for caution about making causal inferences using transfer entropy in some situations, however, are discussed in [3,4,5,6]. A formula for normalized transfer entropy is provided in [7].

In this paper we compute transfer entropies for two example systems: (1) an AR process X = {x_{n}} together with its noisy measurement process Y = {y_{n}}, and (2) a set of two mutually coupled AR processes. Computation of transfer entropies for these systems is a worthwhile demonstration, since they are simple models that admit intuitive understanding. In what follows, we first show how to compute the covariance matrix for successive iterates of the example AR processes and then use these matrices to compute transfer entropy quantities based on the differential entropy expression for multivariate Gaussian random variables. Plots of transfer entropies versus various system parameters illustrate the relationships of interest.

## 2. Differential Entropy

The Shannon entropy of a discrete random variable is H = −Σ_{i} p_{i} log p_{i}, where p_{i} is the probability of the i-th outcome and the sum is over all possible outcomes.

To pass to a continuous random variable with probability density function (pdf) f, we set p_{i} = f_{i}Δx, where f_{i} is the value of the pdf at the i-th partition point and Δx is the refinement of the partition. We then obtain:
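This limiting relationship can be checked numerically. The sketch below (using a standard-normal pdf and partition width Δx = 0.01, both chosen purely for illustration) shows that the discrete entropy H plus log Δx approaches the differential entropy ½ln(2πe):

```python
import numpy as np

# Partition the real line and form discrete probabilities p_i = f_i * dx
# for a standard Gaussian pdf f; dx is the partition refinement.
dx = 0.01
x = np.arange(-8.0, 8.0, dx)
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
p = f * dx

# The discrete Shannon entropy H = -sum p_i log p_i diverges as dx -> 0,
# but H + log(dx) converges to the differential entropy h.
H = -np.sum(p * np.log(p))
h_approx = H + np.log(dx)
h_exact = 0.5 * np.log(2 * np.pi * np.e)   # differential entropy of N(0, 1)
print(h_approx, h_exact)
```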

The joint and conditional covariance matrices C_{XY} and C_{Y|X}, respectively, of two jointly Gaussian random variables X and Y (having dimensions n_{x} and n_{y}, respectively) are given by:

$$C_{XY}=\begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix},\qquad C_{Y|X}=\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$

Above, Σ_{11} and Σ_{22} have dimensions n_{x} × n_{x} and n_{y} × n_{y}, respectively. Now, using Leibniz's formula, we have that:
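The determinant factorization det C_{XY} = det Σ_{11} · det C_{Y|X} underlies the chain rule h(X, Y) = h(X) + h(Y|X) for Gaussian differential entropies. A minimal numerical sketch (with illustrative scalar blocks, not values from the paper):

```python
import numpy as np

def gaussian_entropy(C):
    """Differential entropy (nats) of a zero-mean Gaussian with covariance C."""
    C = np.atleast_2d(C)
    n = C.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(C))

# Illustrative block covariance for (X, Y) with n_x = n_y = 1.
S11, S12, S22 = 2.0, 0.8, 1.5
C_xy = np.array([[S11, S12], [S12, S22]])

# Conditional covariance via the Schur complement: C_{Y|X} = S22 - S21 S11^-1 S12.
C_y_given_x = S22 - S12 * S12 / S11

# Chain rule check: h(X, Y) = h(X) + h(Y|X).
lhs = gaussian_entropy(C_xy)
rhs = gaussian_entropy(S11) + gaussian_entropy(C_y_given_x)
print(lhs, rhs)
```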

We define C^{(k)} to be the covariance of the two random processes sampled at k consecutive timestamps {t_{n−k+2}, t_{n−k+3}, …, t_{n}, t_{n+1}}. We then compute transfer entropies for values of k up to a k sufficiently large to ensure that their valuations do not change significantly if k is further increased. For our examples, we have found k = 10 to be more than sufficient. A discussion of the importance of considering this sufficiency is provided in [11].

## 3. Transfer Entropy Computation Using Variable Number of Timestamps

where v_{n} and w_{n} are zero-mean uncorrelated Gaussian noise processes having variances R and Q, respectively. For system stability, we require the model poles to lie within the unit circle. The first model is a filtered process noise X one-way coupled to an instantaneous, but noisy, measurement process Y. The second model is a two-way coupled pair of processes, X and Y.

For the measurement model, the present state of the X process, x_{n+1}, determines the present state of the Y process, y_{n+1} (assuming c_{−1} is not zero). To fully capture the information transfer from the X process to the current state of the Y process, we must identify the correct causal states [4]. For the measurement system, the causal states include the current (present) state. This state is not included in the definition of transfer entropy, which is a mutual information quantity conditioned only on past states. Hence, for the purposes of this paper, we will temporarily define a quantity, "information transfer," similar to transfer entropy, except that the present of the driving process, x_{n+1}, will be lumped in with the past values of the X process: x_{n−k+2}:x_{n}. For the first general model there is no information transferred from the Y to the X process. We define the (non-zero) information transfer from the X to the Y process (based on data from k timetags) as:

The covariance matrix C^{(1)}(t_{n}) corresponding to a single timestamp, t_{n}, is:

Extending to two timestamps (t_{n} and t_{n+1}), we compute the additional expectations required to fill in the matrix C^{(2)}(t_{n}):

These expressions complete the matrix C^{(2)} required to compute block entropies based on two timetags or, equivalently, one time lag. Using this matrix, the single-lag transfer entropies may be computed.

We now view C^{(2)} as a block matrix and, using standard formulas, compute the conditional mean and covariance C_{c} of ${\overline{z}}_{n+1}$ given ${\overline{z}}_{n}$:

where the matrix square root of C_{c} appears; it is conveniently computed using the built-in Matlab function sqrtm. To see analytically that the recursion works, note that using it we recover at each timestamp a process having the correct mean and covariance:
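A sketch of this recursion in Python, where scipy.linalg.sqrtm plays the role of Matlab's sqrtm (the VAR(1) parameter values are hypothetical). The deterministic check at the end confirms that the recursion reproduces the stationary covariance:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, sqrtm

# Hypothetical VAR(1) model z_{n+1} = A z_n + w_n with Cov(w_n) = W
# (parameter values chosen for illustration only).
A = np.array([[0.9, 0.0], [0.25, 0.5]])
W = np.diag([1.0, 1.0])

# Stationary covariance P solves P = A P A^T + W (discrete Lyapunov equation).
P = solve_discrete_lyapunov(A, W)

# Blocks of C^(2) for (z_n, z_{n+1}).
S11, S12, S21, S22 = P, P @ A.T, A @ P, P

# Conditional-mean gain and conditional covariance C_c of z_{n+1} given z_n.
G = S21 @ np.linalg.inv(S11)
Cc = S22 - S21 @ np.linalg.inv(S11) @ S12
M = sqrtm(Cc)          # matrix square root, as with Matlab's sqrtm

# The recursion z_{n+1} = G z_n + M xi reproduces the stationary covariance:
recovered = G @ S11 @ G.T + M @ M.T
print(np.allclose(recovered, S22))
```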

We next compute C^{(k)} for a variable number (k) of timestamps. Note that C^{(k)} has dimensions 2k × 2k. We denote the 2 × 2 blocks of C^{(k)} as C^{(k)}_{ij} for i, j = 1, 2, …, k, where C^{(k)}_{ij} is the 2-by-2 block of C^{(k)} consisting of the four elements of C^{(k)} that are individually located in row 2i − 1 or 2i and column 2j − 1 or 2j.

As an illustration, consider the block elements of C^{(3)}. Each of these block elements is, in turn, expressed in terms of block elements of C^{(2)}. These calculations are shown in detail below, where we have also used the fact that the mean of the z_{n} vector is zero:

These calculations reveal a pattern by which C^{(k−1)} is augmented to yield C^{(k)}. The pattern consists of setting most of the augmented matrix equal to that of the previous one and then computing two additional rows and columns for C^{(k)}, k > 2, to fill out the remaining elements. The general expressions are:

Since the process noise w_{n} and measurement noise v_{n} are white zero-mean Gaussian noise processes, we may express the joint probability density function for the 2k variates as:

As a numerical check, a C^{(3)} covariance was computed by both methods. The error for all C^{(3)} matrices was then averaged, taking the C^{(3)} matrix calculated using the method based on the recursive representation as the true value. The result was that, for each of the matrix elements, the error was less than 0.0071% of its true value. We are now in a position to compute transfer entropies for a couple of illustrative examples.
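A check in this spirit can be scripted. The sketch below (for a hypothetical VAR(1) model, not the paper's parameters) assembles C^{(3)} analytically from the lag covariances Cov(z_{n+h}, z_{n}) = A^{h}P and compares it against a Monte Carlo sample covariance:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical VAR(1) model z_{n+1} = A z_n + w_n, Cov(w_n) = W.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.0], [0.25, 0.5]])
W = np.diag([1.0, 1.0])
P = solve_discrete_lyapunov(A, W)   # stationary covariance of z_n

# Analytic C^(3): block (i, j) is Cov(z_{n+i}, z_{n+j}) = A^(i-j) P for i >= j.
def block(i, j):
    return np.linalg.matrix_power(A, i - j) @ P if i >= j else block(j, i).T

C3 = np.block([[block(i, j) for j in range(3)] for i in range(3)])

# Simulate the process and form the empirical covariance of (z_n, z_{n+1}, z_{n+2}).
n_steps, burn = 400_000, 1_000
noise = rng.standard_normal((n_steps, 2))     # W is the identity here
z = np.zeros((n_steps, 2))
for t in range(1, n_steps):
    z[t] = A @ z[t - 1] + noise[t]
stacked = np.hstack([z[burn:-2], z[burn + 1:-1], z[burn + 2:]])
C3_emp = np.cov(stacked, rowvar=False)
print(np.allclose(C3_emp, C3, rtol=0.1, atol=0.2))
```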

## 4. Example 1: A One-Way Coupled System

The first example system consists of a first-order AR process x_{n} and a noisy measurement, y_{n}, of x_{n}. Thus the x_{n} sequence represents a hidden process (or model) that is observable by way of another sequence, y_{n}. We wish to examine the behavior of transfer entropy as a function of the correlation ρ between x_{n} and y_{n}. One might expect the correlation ρ between x_{n} and y_{n} to be proportional to the degree of information flow; however, we will see that the relationship between transfer entropy and correlation is not quite that simple.

Computing the entries of C^{(1)} for x_{n} and y_{n} and their correlation, we obtain:

The covariance matrix C^{(1)} corresponding to a single timestamp, t_{n}, is:

Extending to two timestamps (t_{n} and t_{n+1}), we compute the additional expectations required to fill in the matrix C^{(2)}:

These expressions complete the matrix C^{(2)} required to compute block entropies based on a single time lag. Using this matrix, the single-lag transfer entropies may be computed. Using the recursive process described in the previous section, we can also compute C^{(10)}. We have found that using higher lags does not change the entropy values significantly.

For this system, TE_{y→x} = 0. Since we are here computing transfer entropy for a single lag (i.e., two time tags t_{n} and t_{n+1}), we have:

Note that x_{n+1} is a causal state of X influencing the value of y_{n+1}; in fact, it is the most important such state. To capture the full information that is transferred from the X process to the Y process over the course of two time tags, we need to include state x_{n+1}. Hence we compute the information transfer from x → y as:

where the subscripted brackets denote the submatrix of C^{(2)} indicated by the list of indices shown. For example, $\mathrm{det}{C}_{[1:4],[1:4]}^{\left(2\right)}$ is the determinant of the matrix formed by extracting columns {1, 2, 3, 4} and rows {1, 2, 3, 4} from matrix C^{(2)}. In later calculations we will use slightly more complicated-looking notation. For example, $\mathrm{det}{C}_{[2:2:20],[2:2:20]}^{\left(10\right)}$ is the determinant of the matrix formed by extracting columns {2, 4, …, 18, 20} and the same-numbered rows from matrix C^{(10)}. (Note that C^{(k)}_{[i],[i]} is not the same as C^{(k)}_{ij} as used in Section 3.)
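The determinant-ratio computations can be sketched numerically. Below, C^{(2)} is assembled for the measurement model of Example 1 with h_c = 1 (the values of a, Q and R are illustrative), and the single-lag transfer entropies are evaluated as differences of joint Gaussian entropies; numpy's np.ix_ plays the role of the bracketed index lists:

```python
import numpy as np

def h(C, idx):
    """Gaussian differential entropy (nats) of the variables in idx."""
    sub = C[np.ix_(idx, idx)]
    return 0.5 * np.log((2 * np.pi * np.e) ** len(idx) * np.linalg.det(sub))

# C^(2) for x_{n+1} = a x_n + w_n, y_n = x_n + v_n (h_c = 1), ordered as
# (x_n, y_n, x_{n+1}, y_{n+1}); a, Q, R are illustrative values.
a, Q, R = 0.7, 1.0, 0.5
P = Q / (1 - a**2)                       # stationary variance of x_n
C2 = np.array([
    [P,     P,     a*P,   a*P  ],
    [P,     P + R, a*P,   a*P  ],
    [a*P,   a*P,   P,     P    ],
    [a*P,   a*P,   P,     P + R],
])

# TE_{x->y} = h(y_{n+1}|y_n) - h(y_{n+1}|y_n, x_n), expanded via the chain rule.
te_xy = (h(C2, [1, 3]) - h(C2, [1])) - (h(C2, [0, 1, 3]) - h(C2, [0, 1]))
# TE_{y->x} = h(x_{n+1}|x_n) - h(x_{n+1}|x_n, y_n); zero for this model.
te_yx = (h(C2, [0, 2]) - h(C2, [0])) - (h(C2, [0, 1, 2]) - h(C2, [0, 1]))
print(te_xy, te_yx)
```

Since y_{n} adds nothing about x_{n+1} beyond x_{n} in this one-way model, te_yx evaluates to zero up to floating-point error.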

Since y_{n+1} = x_{n+1} + v_{n+1}, we see immediately that:

To evaluate these quantities we use C^{(10)} as defined above. We partition the sequence ${\left\{{\overline{z}}_{n+i}^{T}\right\}}_{i=0}^{9}=\left\{{x}_{n},{y}_{n},{x}_{n+1},{y}_{n+1},{x}_{n+2},{y}_{n+2},{x}_{n+3},{y}_{n+3},{x}_{n+4},{y}_{n+4},{x}_{n+5},{y}_{n+5},{x}_{n+6},{y}_{n+6},{x}_{n+7},{y}_{n+7},{x}_{n+8},{y}_{n+8},{x}_{n+9},{y}_{n+9}\right\}$ into three subsets:
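A sketch of the IT^{(10)}_{x→y} evaluation (parameter values are illustrative; z_{n} = (x_{n}, y_{n}) is written as a VAR(1) so that C^{(10)} can be assembled from lag covariances):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Measurement model as a VAR(1): x_{n+1} = a x_n + w_n, y_{n+1} = x_{n+1} + v_{n+1}.
# Ten timestamps give a 20 x 20 covariance C^(10). Illustrative a, Q, R.
a, Q, R = 0.7, 1.0, 0.5
A = np.array([[a, 0.0], [a, 0.0]])
W = np.array([[Q, Q], [Q, Q + R]])      # Cov of the joint noise (w_n, w_n + v_{n+1})
P = solve_discrete_lyapunov(A, W)

def blk(i, j):
    return np.linalg.matrix_power(A, i - j) @ P if i >= j else blk(j, i).T

k = 10
C = np.block([[blk(i, j) for j in range(k)] for i in range(k)])  # C^(10)

def h(idx):
    sign, ld = np.linalg.slogdet(C[np.ix_(idx, idx)])
    return 0.5 * (len(idx) * np.log(2 * np.pi * np.e) + ld)

y_past = list(range(1, 2 * k - 1, 2))   # y_n ... y_{n+8}  (paper columns 2:2:18)
y_fut = [2 * k - 1]                     # y_{n+9}
x_all = list(range(0, 2 * k, 2))        # x_n ... x_{n+9}  (paper columns 1:2:19)

# IT^(10)_{x->y} = h(y_fut | y_past) - h(y_fut | y_past, x_all)
it = (h(y_past + y_fut) - h(y_past)) - (h(x_all + y_past + y_fut) - h(x_all + y_past))
print(it)
```

Because y_{n+9} = x_{n+9} + v_{n+9}, the fully conditioned entropy collapses to ½ln(2πeR), which gives a quick sanity check on the block extraction.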

Setting h_{c} = 1 and Q = 1, and for three different values of a (0.5, 0.7 and 0.9), we vary R so as to scan the correlation ρ between the x and y processes between the values of 0 and 1.

Figure 1 shows the transfer entropy TE^{(k)}_{x→y}. As the correlation ρ between x_{n} and y_{n} increases from a low value, the transfer entropy increases, since the amount of information shared between y_{n+1} and x_{n} is increasing. At a critical value of ρ, the transfer entropy peaks and then starts to decrease. This decrease is due to the fact that at high values of ρ the measurement noise variance R is small. Hence y_{n} becomes very close to equaling x_{n}, so that the amount of information gained (about y_{n+1}) by learning x_{n}, given y_{n}, becomes small. Hence h(y_{n+1} | y_{n}) − h(y_{n+1} | y_{n}, x_{n}) is small. This difference is TE^{(2)}_{x→y}.

**Figure 1.** Example 1: Transfer entropy TE^{(k)}_{x→y} versus correlation coefficient ρ for three values of parameter a (see legend). Solid trace: k = 10; dotted trace: k = 2.

The entropy of the X process, H_{X}, is greater for larger values of a; hence more information is available to be transferred at a fixed value of ρ when a is larger. In the lower half of Figure 3 we see that as ρ increases, the entropy of the Y process, H_{Y}, approaches the value of H_{X}. This result is due to the fact that the mechanism used to increase ρ is to decrease R. Hence, as R drops close to zero, y_{n} looks increasingly identical to x_{n} (since h_{c} = 1).

**Figure 3.** Example 1: Process entropies H_{X} and H_{Y} versus correlation coefficient ρ for three values of parameter a (see legend).

Figure 4 shows the information transfer IT^{(k)}_{x→y} plotted versus correlation coefficient ρ. Note that the trend is now for information transfer to increase as ρ is increased over its full range of values.

**Figure 4.** Example 1: Information transfer IT^{(k)}_{x→y} versus correlation coefficient ρ for three different values of parameter a (see legend) for k = 10 (solid trace) and k = 2 (dotted trace).

This trend occurs because, as ρ increases, y_{n+1} becomes increasingly correlated with x_{n+1}. Also, for a fixed ρ, the lowest information transfer occurs for the largest value of parameter a. We obtain this result since at the higher a values x_{n} and x_{n+1} are more correlated. Thus the benefit of learning the value of y_{n+1} through knowledge of x_{n+1} is relatively reduced, given that y_{n} (itself correlated with x_{n}) is presumed known. Finally, we have IT^{(10)}_{x→y} < IT^{(2)}_{x→y}, since conditioning the entropy quantities comprising the expression for information transfer with more state data acts to reduce their difference. Also, by comparison of Figure 2 and Figure 4, it is seen that information transfer is much greater than transfer entropy. This relationship is expected, since information transfer as defined herein (for k = 2) is the amount of information gained about y_{n+1} from learning x_{n+1} and x_{n}, given that y_{n} is already known, whereas transfer entropy (for k = 2) is the information gained about y_{n+1} from learning only x_{n}, given that y_{n} is known. Since the state y_{n+1} in fact equals x_{n+1} plus noise, learning x_{n+1} is highly informative, especially when the noise variance is small (corresponding to high values of ρ). The difference between transfer entropy and information transfer therefore quantifies the benefit of learning x_{n+1}, given that x_{n} and y_{n} are known (when the goal is to determine y_{n+1}).

Figure 5 shows the information transfer plotted versus the measurement noise variance R. Increasing R makes prediction of y_{n+1} from knowledge of x_{n} and x_{n+1} less accurate. Now, for a fixed R, the greatest value of information transfer occurs for the greatest value of parameter a. This is the opposite of what we obtained for a fixed value of ρ, as shown in Figure 4. To see the rationale for this, note that, for a fixed value of information transfer, R is highest for the largest value of parameter a. This result is obtained since larger values of a yield the most correlation between states x_{n} and x_{n+1}. Hence, even though the measurement y_{n+1} of x_{n+1} is more corrupted by noise (due to higher R), the same information transfer is achieved nevertheless, because x_{n} provides a good estimate of x_{n+1} and, thus, of y_{n+1}.

**Figure 5.** Example 1: Information transfer IT^{(10)}_{x→y} versus measurement error variance R for three different values of parameter a (see legend).

## 5. Example 2: Information-Theoretic Analysis of Two Coupled AR Processes

where w_{n} and v_{n} are the X and Y process noise terms, respectively. Using the following definitions:

we compute the statistics of x_{n} and y_{n} to obtain:

We then compute the covariance matrix C^{(2)} of the variates obtained at two consecutive timestamps to yield:

As before, we use the recursion to compute C^{(k)}, k = 3, 4, …, 10, and the transfer entropies. For illustration purposes, we define the parameters of the system as shown below, yielding a symmetrically coupled pair of processes. To generate a family of curves for each transfer entropy, we choose a fixed coupling term ε from a set of four values. We set Q = 1000 and vary R so that ρ varies from about 0 to 1. For each ρ value we compute the transfer entropies. The relevant system equations and parameters are:
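A sketch of the resulting single-lag transfer entropy computation for a symmetrically coupled pair (the parameter values below are illustrative, not the paper's). It reproduces the qualitative behavior discussed next: TE_{x→y} > TE_{y→x} when R ≪ Q, and equality when R = Q:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Symmetrically coupled AR pair: x_{n+1} = a x_n + e y_n + w_n,
# y_{n+1} = e x_n + a y_n + v_n, with Var(w) = Q, Var(v) = R.
def te_pair(a, eps, Q, R):
    A = np.array([[a, eps], [eps, a]])
    P = solve_discrete_lyapunov(A, np.diag([Q, R]))   # stationary covariance
    # C^(2) for (x_n, y_n, x_{n+1}, y_{n+1})
    C = np.block([[P, P @ A.T], [A @ P, P]])
    def h(idx):
        sign, ld = np.linalg.slogdet(C[np.ix_(idx, idx)])
        return 0.5 * (len(idx) * np.log(2 * np.pi * np.e) + ld)
    te_xy = (h([1, 3]) - h([1])) - (h([0, 1, 3]) - h([0, 1]))
    te_yx = (h([0, 2]) - h([0])) - (h([0, 1, 2]) - h([0, 1]))
    return te_xy, te_yx

# R << Q: the X process dominates. R = Q: transfer is symmetric.
print(te_pair(0.5, 0.25, 1000.0, 10.0))
print(te_pair(0.5, 0.25, 1000.0, 1000.0))
```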

For these parameter values we obtain C^{(2)}:

The minimum correlation value, ρ_{min}, was found by means of the built-in Matlab function fminbnd. This function finds the minimum of a function (in this case ρ(a, b, c, d, R, Q)) with respect to one parameter (in this case R), starting from an initial guess (here, R = 500). It returns the minimum functional value (ρ_{min}) and the value of the parameter at which the minimum is achieved (R_{min}). After identifying R_{min}, a set of R values was computed so that the corresponding set of ρ values spanned from ρ_{min} to the maximum ${\rho}_{\infty}$ in fixed increments of Δρ (here equal to 0.002). This set of R values was generated using the iteration:
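The ρ_{min} search can be sketched with SciPy's bounded scalar minimizer, the analog of fminbnd (the paper's R-stepping iteration is not reproduced here, and the coupled-model parameters are illustrative):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov
from scipy.optimize import minimize_scalar

# Illustrative symmetric coupling parameters.
a, eps, Q = 0.5, 0.25, 1000.0

def rho(R):
    """Stationary correlation between x_n and y_n as a function of R."""
    A = np.array([[a, eps], [eps, a]])
    P = solve_discrete_lyapunov(A, np.diag([Q, R]))
    return P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

# Bounded minimization over R, the SciPy analog of Matlab's fminbnd.
res = minimize_scalar(rho, bounds=(1.0, 1e6), method="bounded")
R_min, rho_min = res.x, res.fun
print(R_min, rho_min)
```

Because ρ depends only on the ratio R/Q and is invariant under R/Q → Q/R for this symmetric coupling, the minimizer lands at R = Q, consistent with the matched-variance minimum noted below.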

This behavior arises because increasing R causes var(y_{n}), a term appearing in the denominator of the expression for ρ, to increase. If Q ≪ R, a similar result is obtained when ε is increased.

**Figure 6.** Example 2: Process noise variance R versus correlation coefficient ρ for a set of ε parameter values (see figure legend).

As R increases from a low value, ρ and TE_{x→y} initially decrease while TE_{y→x} increases. With further increases of R, ρ reaches a minimum value and then begins to increase, while TE_{x→y} continues to decrease and TE_{y→x} continues to increase.

**Figure 7.** Example 2: Transfer entropy values versus correlation ρ for a set of ε parameter values (see figure legend). Arrows indicate direction of increasing R values.

**Figure 8.** Example 2: Transfer entropy difference (TE_{x→y} − TE_{y→x}) and sum (TE_{x→y} + TE_{y→x}) versus correlation ρ for a set of ε parameter values (see figure legend). Arrow indicates direction of increasing R values.

Examining the difference TE_{x→y} − TE_{y→x} in Figure 8, we see the symmetry that arises as R increases from a low value to a high value. When R is low, the X process dominates the Y process, so that TE_{x→y} > TE_{y→x}. As R increases, the two entropies equilibrate. Then, as R rises above Q, the Y process dominates, giving TE_{x→y} < TE_{y→x}. The sum of the transfer entropies shown in Figure 8 reveals that the total information transfer is minimal at the minimum value of ρ and increases monotonically with ρ. The minimum value of ρ in this example occurs when the process noise variances Q and R are equal (matched). Figure 9 shows the changes in the transfer entropy values explicitly as a function of R. Clearly, when R is small (compared to Q = 1000), TE_{x→y} > TE_{y→x}. It is also clear that at every fixed value of R, both transfer entropies are higher at the larger values of the coupling term ε.

**Figure 9.** Example 2: Transfer entropies TE_{x→y} and TE_{y→x} versus process noise variance R for a set of ε parameter values (see figure legend).

**Figure 10.** Example 2: Transfer entropy TE_{x→y} plotted versus TE_{y→x} for a set of ε parameter values (see figure legend). The black diagonal line indicates locations where equality obtains. Arrow indicates direction of increasing R values.

In Figure 10 we see that TE_{y→x} increases from a value less than TE_{x→y} to a value greater than TE_{x→y} as R increases. Note that for higher coupling values ε this relative increase is more abrupt.

Figure 11 shows the correlation coefficient ρ versus the coupling coefficient ε_{x} = ε_{y} = ε for three R values. Note that for the case R = Q the relationship is symmetric around ε = ¼. As R departs from equality, more correlation between x_{n} and y_{n} is obtained.

**Figure 11.** Example 2: Correlation coefficient ρ versus coupling coefficient ε for a set of R values (see figure legend).

In Figure 12 we see that for R < Q, TE_{x→y} > TE_{y→x} (blue), and the reverse for R > Q (red). For R = Q, TE_{x→y} = TE_{y→x} (green).

**Figure 12.** Example 2: Transfer entropies TE_{x→y} (solid lines) and TE_{y→x} (dashed lines) versus coupling coefficient ε for a set of R values (see figure legend).

Choosing three values of R (denoted R_{1}, R_{2} and R_{3} for each case) such that R_{1} < Q, R_{2} = Q and R_{3} = Q^{2}/R_{1} (so that R_{i+1} = QR_{i}/R_{1} for i = 1 and i = 2), we then obtain the symmetric relationships TE_{x→y}(R_{1}) = TE_{y→x}(R_{3}) and TE_{x→y}(R_{3}) = TE_{y→x}(R_{1}) for all ε in the interval (0, ½). For these cases we also obtain ρ(R_{1}) = ρ(R_{3}) on the same ε interval.

## 6. Conclusions

## References

1. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. **2000**, 85, 461–464.
2. Barnett, L.; Barrett, A.B.; Seth, A.K. Granger causality and transfer entropy are equivalent for Gaussian variables. Phys. Rev. Lett. **2009**, 103, 238701.
3. Ay, N.; Polani, D. Information flows in causal networks. Adv. Complex Syst. **2008**, 11, 17–41.
4. Lizier, J.T.; Prokopenko, M. Differentiating information transfer and causal effect. Eur. Phys. J. B **2010**, 73, 605–615.
5. Chicharro, D.; Ledberg, A. When two become one: The limits of causality analysis of brain dynamics. PLoS One **2012**, 7, e32466.
6. Hahs, D.W.; Pethel, S.D. Distinguishing anticipation from causality: Anticipatory bias in the estimation of information flow. Phys. Rev. Lett. **2011**, 107, 128701.
7. Gourevitch, B.; Eggermont, J.J. Evaluating information transfer between auditory cortical neurons. J. Neurophysiol. **2007**, 97, 2533–2543.
8. Kaiser, A.; Schreiber, T. Information transfer in continuous processes. Physica D **2002**, 166, 43–62.
9. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley Series in Telecommunications; Wiley: New York, NY, USA, 1991.
10. Kotz, S.; Balakrishnan, N.; Johnson, N.L. Continuous Multivariate Distributions, Models and Applications, 2nd ed.; John Wiley and Sons, Inc.: New York, NY, USA, 2000; Volume 1.
11. Lizier, J.T.; Prokopenko, M.; Zomaya, A.Y. Local information transfer as a spatiotemporal filter for complex systems. Phys. Rev. E **2008**, 77, 026110.
12. Williams, P.L.; Beer, R.D. Nonnegative decomposition of multivariate information. **2010**, arXiv:1004.2515.
13. Crutchfield, J.P.; Ellison, C.J.; Mahoney, J.R. Time's barbed arrow: Irreversibility, crypticity, and stored information. Phys. Rev. Lett. **2009**, 103, 094101.

© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Hahs, D.W.; Pethel, S.D.
Transfer Entropy for Coupled Autoregressive Processes. *Entropy* **2013**, *15*, 767-788.
https://doi.org/10.3390/e15030767
