Entropy as a Topological Operad Derivation

Bradley, Tai-Danae

doi:10.3390/e23091195

by

Tai-Danae Bradley

Sandbox@Alphabet, Mountain View, CA 94043, USA

Entropy 2021, 23(9), 1195; https://doi.org/10.3390/e23091195

Submission received: 27 July 2021 / Revised: 27 August 2021 / Accepted: 7 September 2021 / Published: 9 September 2021

Abstract

We share a small connection between information theory, algebra, and topology—namely, a correspondence between Shannon entropy and derivations of the operad of topological simplices. We begin with a brief review of operads and their representations with topological simplices and the real line as the main example. We then give a general definition for a derivation of an operad in any category with values in an abelian bimodule over the operad. The main result is that Shannon entropy defines a derivation of the operad of topological simplices, and that for every derivation of this operad there exists a point at which it is given by a constant multiple of Shannon entropy. We show this is compatible with, and relies heavily on, a well-known characterization of entropy given by Faddeev in 1956 and a recent variation given by Leinster.

Keywords:

Shannon entropy; topology; operad

1. Introduction

In this article, we describe a simple connection between information theory, algebra, and topology. To motivate the idea, consider the function

d : [0, 1] \to R

defined by

d (x) = \begin{cases} - x log x & \text{if } x > 0, \\ 0 & \text{if } x = 0 . \end{cases}

This map satisfies an equation reminiscent of the Leibniz rule from calculus,

d (x y) = d (x) y + x d (y)

for all

x, y \in [0, 1]

. In other words, d is a nonlinear derivation [1] (Lemma 2.2.6). This derivation may also bring to mind the Shannon entropy of a probability distribution. Indeed, a probability distribution on a finite set

{1, \dots, n}

for

n \geq 1

is a tuple of nonnegative real numbers

p = (p_{1}, \dots, p_{n})

satisfying

\sum_{i = 1}^{n} p_{i} = 1

, and the Shannon entropy of p is defined to be

H (p) = - \sum_{i = 1}^{n} p_{i} log p_{i} = \sum_{i = 1}^{n} d (p_{i}) .

Although d is not linear, this may prompt one to wonder about settings in which Shannon entropy itself is a derivation. We describe one such setting below by showing a correspondence between Shannon entropy and derivations of the operad of topological simplices.
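As a quick numerical sanity check of the two identities above, the following Python sketch may be helpful. The function names `d` and `shannon_entropy` are illustrative choices rather than notation from the article, the natural logarithm is assumed, and the sample point (x, y) is arbitrary.

```python
import math


def d(x: float) -> float:
    """d(x) = -x log x for x > 0, and d(0) = 0 (natural log assumed)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    return 0.0 if x == 0.0 else -x * math.log(x)


def shannon_entropy(p):
    """H(p) = sum_i d(p_i) for a finite probability distribution p."""
    return sum(d(pi) for pi in p)


# Leibniz-type rule d(xy) = d(x) y + x d(y) at one sample point.
x, y = 0.3, 0.8
lhs = d(x * y)
rhs = d(x) * y + x * d(y)
```

Here `lhs` and `rhs` agree up to floating-point rounding, and `shannon_entropy([0.5, 0.5])` recovers log 2 for a fair coin.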

1.1. Motivation

As evidenced by recent work, the intersection of information theory and algebraic topology is fertile ground. In 2015, tools of information cohomology were introduced in [2] by Baudot and Bennequin, who construct a certain cochain complex for which entropy represents the unique cocycle in degree 1. In the same year, Elbaz-Vincent and Gangl approached entropy from an algebraic perspective and showed that what are known as information functions of degree 1 behave “a lot like certain derivations” [3]. A few years prior, in 2011, Baez, Fritz, and Leinster gave a category-theoretic characterization of entropy in [4], which was recently extended to the quantum setting by Parzygnat in [5]. In preparation for that 2011 result, Baez remarked in the informal article [6] that entropy appears to behave similarly to a derivation in a certain operadic context, an observation we verify and make explicit below. Cohomological ideas are also explored in Mainiero’s recent work, where entropy is found to appear in the Euler characteristic of a particular cochain complex associated to a quantum state [7]. Upon taking inventory, one thus has the sense that entropy behaves somewhat similarly to “d of something,” for some (co)boundary-like operator

d .

The present article is in this same vein. Notably, once a few simple definitions are in place, the mathematics is quite straightforward. Even so, we feel it is worth sharing if for no other reason than to provide a glimpse at yet another algebraic and topological facet of entropy.

1.2. Background

To start, our work is based on a particular characterization of Shannon entropy that is compatible with an operadic viewpoint. Let

Δ^{n}

denote the standard topological n-simplex for

n \geq 0

,

Δ^{n} : = {(p_{0}, p_{1}, \dots, p_{n}) \in R^{n + 1} ∣ 0 \leq p_{i} \leq 1 and \sum_{i = 0}^{n} p_{i} = 1},

where

Δ^{0}

denotes the unique probability distribution on the one-point set. More generally, any probability distribution

p = (p_{0}, \dots, p_{n})

on an

n + 1

-element set is a point in

Δ^{n}

. Given

n + 1

probability distributions

q^{i} = (q_{0}^{i}, \dots, q_{k_{i}}^{i}) \in Δ^{k_{i}}

where

i = 0, 1, \dots, n

, they may be composed with p simultaneously to obtain a point in

Δ^{k_{0} + k_{1} + \dots + k_{n} + n}

denoted by

p \circ (q^{0}, q^{1}, \dots, q^{n}) : = (p_{0} q_{0}^{0}, \dots, p_{0} q_{k_{0}}^{0}, p_{1} q_{0}^{1}, \dots, p_{1} q_{k_{1}}^{1}, \dots, p_{n} q_{0}^{n}, \dots, p_{n} q_{k_{n}}^{n}) .

As shown in [1] and reviewed below, this composition of probabilities finds a natural home in the language of operads. Furthermore, it plays a key role in a well-known 1956 characterization of Shannon entropy due to D. K. Faddeev [8]. A proof of a slight variation of Faddeev’s result was recently given by Leinster [1] (Theorem 2.5.1). That is the version we quote here.

Theorem 1

(Faddeev-Leinster). Let

{F : Δ^{n} \to R}_{n \geq 0}

be a sequence of functions. The following are equivalent:

1. The functions F are continuous and satisfy

$F (p \circ (q^{0}, \dots, q^{n})) = F (p) + \sum_{i = 0}^{n} p_{i} F (q^{i})$

(1)

for all $n \geq 0$, $p \in Δ^{n}$, and $q^{i} \in Δ^{k_{i}}$ with $k_{0}, k_{1}, \dots, k_{n} \geq 0$;
2. $F = c H$ for some $c \in R$.
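The direction from item 2 to item 1 can be illustrated numerically for c = 1. The sketch below is only an illustration under stated assumptions: the helper names `entropy` and `compose` are hypothetical, the natural logarithm is used, and the sample distributions are arbitrary.

```python
import math


def entropy(p):
    """Shannon entropy H(p), skipping zero entries (0 log 0 := 0)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def compose(p, qs):
    """Simultaneous composition p ∘ (q^0, ..., q^n).

    Each q^i is scaled by p_i and the results are concatenated.
    """
    if len(p) != len(qs):
        raise ValueError("need exactly one distribution q^i per coordinate of p")
    return [pi * qij for pi, q in zip(p, qs) for qij in q]


p = [0.5, 0.3, 0.2]
qs = [[0.25, 0.75], [1.0], [0.1, 0.6, 0.3]]

composite = compose(p, qs)
lhs = entropy(composite)                                          # H(p ∘ (q^0, ..., q^n))
rhs = entropy(p) + sum(pi * entropy(q) for pi, q in zip(p, qs))   # right side of Equation (1)
```

The composite is again a probability distribution, and the two sides of Equation (1) agree up to rounding.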

To make the connection with derivations, let us introduce some notation. Given a probability distribution

p \in Δ^{n}

let

\bar{p} : R^{n + 1} \to R

denote the function that maps a point

x = (x_{0}, \dots, x_{n})

to the standard inner product

〈 p, x 〉 = \sum_{i = 0}^{n} p_{i} x_{i}

. Then, when

F = H

, Equation (1) may be rewritten as

H (p \circ (q^{0}, \dots, q^{n})) = H (p) + \bar{p} (H (q^{0}), \dots, H (q^{n})) .

(2)

This equation is one hint that entropy might be a derivation, although a “q” is notably absent from the first term on the right-hand side. As a further teaser, Baez explored an algebraic interpretation of Equation (2) in the informal article [6], where the reader is reminded that Shannon entropy is a derivative of the partition function of a probability distribution with respect to Boltzmann’s constant, considered as a formal parameter. In that article, Equation (2) follows in a few short lines from this computation. One is thus motivated to look for a general framework of operad derivations for which Equation (2) is an example. This is what we describe below.

Section 2 reviews the definition of operads and representations of them. We will recall that the collection of topological simplices admits the structure of an operad as in [1] and that

R

gives rise to a representation of it. In Section 3, we define an abelian bimodule M over any operad

O

and the notion of a derivation of

O

with values in M. With these definitions in place, Equation (2) will find a generalization in Proposition 1, and the main result will quickly follow.

Theorem 2.

Shannon entropy defines a derivation of the operad of topological simplices, and for every derivation of this operad there exists a point at which it is given by a constant multiple of Shannon entropy.

2. Background: Operads and Their Representations

In an introduction to operads, it is helpful to first think about algebras. An algebra A is a vector space V equipped with a bilinear map

μ : V \times V \to V

thought of as multiplication. Depending on whether

μ

satisfies a particular relation, the algebra will usually be described by an appropriate qualifier. For instance, if

μ (v, w) = μ (w, v)

for all

v, w \in V

, then A is called a commutative algebra; if

μ (μ (u, v), w) = μ (u, μ (v, w))

for all

u, v, w \in V

, then A is called an associative algebra, and so on. Behind each of these algebras is a particular operad that encodes the behavior of the multiplication map

μ

. To motivate the formal definition, it is helpful to visualize

μ

as a planar binary rooted tree and more generally to imagine an arbitrary n-ary operation as a planar rooted tree with n leaves. There is a natural way to compose such operations. For instance, when f is a 3-ary operation and g is a 4-ary operation, they may be composed to obtain a 6-ary operation by using the output of g as one of the inputs of f as illustrated in Figure 1. There g has been grafted into the second leaf of the tree associated to f, and so we denote that choice with the subscript “

\circ_{2}

” in the figure. There are two other composites

f \circ_{1} g

and

f \circ_{3} g

, which are not shown but are obtained similarly.

In general, there are n ways to compose an m-ary operation with an n-ary operation, and the resulting operation will always have arity

m + n - 1

. This composition should further satisfy some sensible associativity and unital axioms, and the collection of all such operations with their compositions is called an operad. The concept has origins in category theory [9] and has been used extensively in algebraic topology and homotopy theory [10,11,12,13,14] with applications in physics as well [15,16]. Operads may be defined in any symmetric monoidal category, and for ease of exposition below, we will assume all categories

C

are concrete (that is, all objects have underlying sets) so that we may refer to elements in a given object of

C

. Indeed, the main example to have in mind is the category of topological spaces.

Definition 1.

Let

C

be a symmetric monoidal category with monoidal product

\otimes

. An operad in

C

consists of a sequence of objects

{O (1), O (2), \dots}

together with morphisms

\circ_{i} : O (n) \otimes O (m) \to O (n + m - 1)

in

C

for all

n, m \geq 1

and

1 \leq i \leq n

and an operation

1 \in O (1)

satisfying the following:

(i): [associativity] For all $p \in O (n)$ and $q \in O (m)$ and $r \in O (k)$,

$(p \circ_{j} q) \circ_{i} r = \begin{cases} (p \circ_{i} r) \circ_{j + k - 1} q & \text{if } 1 \leq i \leq j - 1 \\ p \circ_{j} (q \circ_{i - j + 1} r) & \text{if } j \leq i \leq j + m - 1 \\ (p \circ_{i - m + 1} r) \circ_{j} q & \text{if } i \geq j + m \end{cases}$
(ii): [identity] The operation $1 \in O (1)$ acts as an identity in the sense that

$1 \circ_{1} p = p \circ_{i} 1 = p$

for all $p \in O (n)$ and $1 \leq i \leq n$.

The definition is conceptually simple despite its cumbersome appearance. For instance, Figure 2 illustrates the associativity requirements listed in item (i).

As mentioned above, one often thinks of the elements of

O (n)

as abstract n-to-1 operations, and the morphisms

\circ_{i}

specify a way to compose them. It is common to begin indexing the sequence of objects at

n = 0

to account for 0-ary operations, but as we will soon see, our main example of an operad in Example 2 will have no 0-ary operations, and so our definition starts with

O (1)

. We do not consider an action of the symmetric group and so

O

is sometimes called a non-symmetric operad, but we will simply call it an operad. In the special case when

C

is the category of vector spaces with linear maps and ⊗ is the tensor product,

O

is often called a linear operad. When it is the category

Top

of topological spaces with continuous maps and ⊗ is the Cartesian product,

O

is often called a topological operad.

Example 1.

Given a set X, the endomorphism operad is

{End}_{X} = {{End}_{X} (1), {End}_{X} (2), \dots}

where

{End}_{X} (n) : = C (X^{n}, X)

denotes the set of all functions from the n-fold Cartesian product

X^{n}

to X. The unit operation in

{End}_{X} (1)

is the identity function

{id}_{X} : X \to X .

If

f \in C (X^{n}, X)

and

g \in C (X^{m}, X)

are a pair of functions, then for each

i = 1, \dots, n

the composition

f \circ_{i} g

is obtained by using the output of g as the ith input of

f .

Explicitly, given

(x_{1}, \dots, x_{n + m - 1}) \in X^{n + m - 1}

,

(f \circ_{i} g) (x_{1}, \dots, x_{n + m - 1}) : = f (x_{1}, \dots, x_{i - 1}, g (x_{i}, \dots, x_{i + m - 1}), x_{i + m}, \dots, x_{n + m - 1}) .

The simultaneous composition of several functions may also be considered. That is, given n functions

g_{i} \in C (X^{k_{i}}, X)

where

i = 1, \dots, n

they may be composed with f simultaneously to obtain a new function

f \circ (g_{1}, \dots, g_{n}) \in C (X^{k_{1} + \dots + k_{n}}, X)

, which is again defined by using the outputs of the

g_{i}

as the inputs of

f .

Explicitly, given

(x_{1}, \dots, x_{k_{1} + \dots + k_{n}}) \in X^{k_{1} + \dots + k_{n}}

, we have

(f \circ (g_{1}, \dots, g_{n})) (x_{1}, \dots, x_{k_{1} + \dots + k_{n}}) = f (g_{1} (x_{1}, \dots, x_{k_{1}}), \dots, g_{n} (x_{k_{1} + \dots + k_{n - 1} + 1}, \dots, x_{k_{1} + \dots + k_{n}})) .
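The partial composition in the endomorphism operad can be sketched in Python as follows. This is a minimal illustration, not part of the article: the helper `compose_i` is a hypothetical name, and the arities n and m are passed explicitly as an assumption, since a Python callable does not reliably carry its own arity.

```python
def compose_i(f, n, g, m, i):
    """Return f ∘_i g, of arity n + m - 1, for f of arity n and g of arity m.

    The index i is 1-based, matching the convention in the text.
    """
    if not 1 <= i <= n:
        raise ValueError("i must satisfy 1 <= i <= n")

    def composite(*xs):
        if len(xs) != n + m - 1:
            raise TypeError(f"expected {n + m - 1} arguments, got {len(xs)}")
        # Feed the m inputs starting at position i to g ...
        inner = g(*xs[i - 1 : i - 1 + m])
        # ... and use the output of g as the i-th input of f.
        return f(*xs[: i - 1], inner, *xs[i - 1 + m :])

    return composite


f = lambda a, b, c: a + 10 * b + 100 * c   # a 3-ary operation on the integers
g = lambda u, v: u * v                     # a 2-ary operation
h = compose_i(f, 3, g, 2, 2)               # the 4-ary operation f ∘_2 g

value = h(1, 2, 3, 4)                      # f(1, g(2, 3), 4)
```

With these sample functions, `value` equals f(1, 6, 4) = 461, and composing with the identity function in any slot returns f unchanged.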

Example 2.

The simplices

Δ^{0}, Δ^{1}, Δ^{2}, \dots

give rise to a topological operad called the operad of topological simplices

Δ = {Δ_{1}, Δ_{2}, \dots}

where

Δ_{n} : = Δ^{n - 1}

. The unit operation in

Δ_{1}

is the unique probability distribution on a one-point set. If

p = (p_{1}, \dots, p_{n}) \in Δ_{n}

and

q = (q_{1}, \dots, q_{m}) \in Δ_{m}

are probability distributions, then the composition

p \circ_{i} q

is obtained by multiplying each of the m coordinates of q by

p_{i}

and then replacing the

i th

coordinate of p with the resulting m-tuple. Explicitly,

p \circ_{i} q : = (p_{1}, \dots, p_{i} q_{1}, \dots, p_{i} q_{m}, \dots, p_{n}) \in Δ_{n + m - 1} .

Equivalently, the distribution p may be visualized as a planar tree with n leaves labeled by the probabilities

p_{1}, \dots, p_{n}

and similarly for q. Then the composition

p \circ_{i} q

is obtained by “painting” each of the leaves of q with the probability

p_{i}

and grafting the resulting tree into the

i th

leaf of p as below. Notice the sum of the probabilities on the leaves on the composite tree is 1.

As an example, if

p = (\frac{1}{6}, \dots, \frac{1}{6})

represents the probability distribution of rolling a six-sided die and

q = (\frac{1}{2}, \frac{1}{2})

is that of a fair coin toss, then

p \circ_{3} q = (\frac{1}{6}, \frac{1}{6}, \frac{1}{12}, \frac{1}{12}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6})

is a point in

Δ_{7}

, whose picture is shown on the left of Figure 3.
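The die-and-coin computation can be reproduced exactly with rational arithmetic. In this sketch the function name `compose_i` is a hypothetical choice, and distributions are represented as tuples of `Fraction` values.

```python
from fractions import Fraction


def compose_i(p, q, i):
    """p ∘_i q: replace the i-th coordinate of p (1-based) by the tuple p_i * q."""
    if not 1 <= i <= len(p):
        raise ValueError("i must satisfy 1 <= i <= len(p)")
    p_i = p[i - 1]
    return p[: i - 1] + tuple(p_i * q_k for q_k in q) + p[i:]


die = (Fraction(1, 6),) * 6     # a fair six-sided die, a point of Δ_6
coin = (Fraction(1, 2),) * 2    # a fair coin toss, a point of Δ_2

result = compose_i(die, coin, 3)
```

The tuple `result` has seven entries that sum to 1, with the third entry of the die split into two entries equal to 1/12.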

Further recall that if we have n different distributions

q^{i} = (q_{1}^{i}, \dots, q_{k_{i}}^{i}) \in Δ_{k_{i}}

where

i = 1, \dots, n

, then we may compose them with p simultaneously to obtain the following point in

Δ_{k_{1} + \dots + k_{n}},

p \circ (q^{1}, \dots, q^{n}) = (p_{1} q_{1}^{1}, \dots, p_{1} q_{k_{1}}^{1}, p_{2} q_{1}^{2}, \dots, p_{2} q_{k_{2}}^{2}, \dots, p_{n} q_{1}^{n}, \dots, p_{n} q_{k_{n}}^{n}) .

This simultaneous composition is illustrated by the tree on the right in Figure 3.
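The simultaneous composition agrees with an iteration of partial compositions, which the following sketch checks on one example. The helper names are hypothetical, and the iteration runs from the last leaf to the first so that grafting into leaf i does not shift the positions of the leaves still to be processed.

```python
from fractions import Fraction


def compose_i(p, q, i):
    """Partial composition p ∘_i q with a 1-based index i."""
    if not 1 <= i <= len(p):
        raise ValueError("i must satisfy 1 <= i <= len(p)")
    return p[: i - 1] + tuple(p[i - 1] * q_k for q_k in q) + p[i:]


def compose_all(p, qs):
    """Simultaneous composition p ∘ (q^1, ..., q^n), computed directly."""
    if len(p) != len(qs):
        raise ValueError("need exactly one distribution q^i per coordinate of p")
    return tuple(p_i * q_k for p_i, q in zip(p, qs) for q_k in q)


def compose_iterated(p, qs):
    """The same composite, built from n partial compositions."""
    result = p
    # Graft q^n first and q^1 last: leaves 1, ..., i - 1 keep their positions.
    for i in range(len(p), 0, -1):
        result = compose_i(result, qs[i - 1], i)
    return result


p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
qs = (
    (Fraction(1, 2), Fraction(1, 2)),
    (Fraction(1),),
    (Fraction(1, 4), Fraction(3, 4)),
)

direct = compose_all(p, qs)
iterated = compose_iterated(p, qs)
```

Both constructions give the same point of Δ_5 for this input.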

Just as groups come to life when considering representations of them, so operads come to life when each abstract n-ary operation is mapped to a concrete n-ary operation on a particular object. This assignment is traditionally called an algebra of the operad, but we prefer the more descriptive name representation.

Definition 2.

Let

O

be an operad in the category of sets. A representation of

O

, or an

O

-representation, is a set X together with functions

φ_{n} : O (n) \to {End}_{X} (n) for n \geq 1

that respect the operad unit and compositions. That is,

φ_{1} (1) = {id}_{X}

and

φ_{n + m - 1} (p \circ_{i} q) = φ_{n} (p) \circ_{i} φ_{m} (q)

for all

p \in O (n), q \in O (m)

and

1 \leq i \leq n

.

Importantly, one may also wish to define a representation of an operad in any symmetric monoidal category

C

whenever “

{End}_{X} (n)

” is in fact an object in

C .

It must consist of an object X together with a family of morphisms

O (n) \to {End}_{X} (n)

in

C

that are compatible with the operad unit and compositions. This holds, for instance, when the monoidal category

C

is also closed—that is, when it is equipped with an internal hom functor that is compatible with the monoidal product. Monoidal closure, however, will not be required in our work, which primarily concerns the category

Top

of topological spaces. Indeed, the main example to have in mind is when

O = Δ

is the operad of simplices and

X = R

is the real line in

Top

. In this case, we define

{End}_{R} (n) : = Top (R^{n}, R)

to be the space of continuous functions

R^{n} \to R

equipped with the product topology. Now, consider the continuous maps

φ_{n} : Δ_{n} \to {End}_{R} (n)

given by

p \mapsto φ_{n} (p)

where

φ_{n} (p) (x) : = 〈 p, x 〉 = \sum_{i = 1}^{n} p_{i} x_{i}

whenever

x = (x_{1}, \dots, x_{n}) \in R^{n}

. Then, it is simple to check that

φ_{n + m - 1} (p \circ_{i} q) = φ_{n} (p) \circ_{i} φ_{m} (q)

for all

p, q,

and i and that

φ_{1} (1) = {id}_{R}

, and so

R

is a representation of

Δ .
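The compatibility of these maps with partial composition can be checked on a single example. The sketch below is an illustration only: `phi`, `compose_i`, and `compose_fn_i` are hypothetical helper names, arities are passed explicitly, and exact rational arithmetic is used so that both sides can be compared with equality.

```python
from fractions import Fraction


def compose_i(p, q, i):
    """Partial composition p ∘_i q of probability vectors (1-based i)."""
    return p[: i - 1] + tuple(p[i - 1] * q_k for q_k in q) + p[i:]


def compose_fn_i(f, n, g, m, i):
    """Partial composition f ∘_i g of functions of arity n and m (1-based i)."""
    def composite(*xs):
        if len(xs) != n + m - 1:
            raise TypeError(f"expected {n + m - 1} arguments")
        return f(*xs[: i - 1], g(*xs[i - 1 : i - 1 + m]), *xs[i - 1 + m :])
    return composite


def phi(p):
    """phi(p) sends x to the inner product <p, x>."""
    def inner_product(*xs):
        if len(xs) != len(p):
            raise TypeError(f"expected {len(p)} arguments")
        return sum(p_k * x_k for p_k, x_k in zip(p, xs))
    return inner_product


p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
q = (Fraction(1, 4), Fraction(3, 4))
i = 2
x = (Fraction(2), Fraction(-1), Fraction(5), Fraction(3))

lhs = phi(compose_i(p, q, i))(*x)                           # phi(p ∘_i q)(x)
rhs = compose_fn_i(phi(p), len(p), phi(q), len(q), i)(*x)   # (phi(p) ∘_i phi(q))(x)
```

Both sides evaluate to 8/3 at this point, and `phi` applied to the unit distribution (1) is the identity function on the real line.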

3. Derivations of the Operad of Simplices

With these basic definitions in hand, the present goal is to define a mapping d out of the topological operad

Δ

that satisfies an appropriate version of the Leibniz rule,

d (p \circ_{i} q) = d p \circ_{i} q + p \circ_{i} d q \qquad \text{(desideratum)}

(3)

for all

p \in Δ_{n}

and

q \in Δ_{m}

and for all

1 \leq i \leq n

. This desired equation suggests the codomain of d should be a (bi)module over

Δ

that is, moreover, an abelian monoid. This motivates the following two definitions, the first of which is a slight generalization of that given by Markl in [15].

Definition 3.

Let

O = {O (1), O (2), \dots}

be an operad in a symmetric monoidal category

C

. A bimodule over

O

, or simply an

O

-bimodule, is a collection of objects

M = {M (1), M (2), \dots}

in

C

together with morphisms

\begin{aligned} \circ_{i}^{L} & : O (n) \otimes M (m) \to M (n + m - 1) && \text{(left composition)} \\ \circ_{i}^{R} & : M (n) \otimes O (m) \to M (n + m - 1) && \text{(right composition)} \end{aligned}

in

C

for each

1 \leq i \leq n

such that whenever

p \otimes q \otimes r \in \begin{cases} M (n) \otimes O (m) \otimes O (k), \text{ or} \\ O (n) \otimes M (m) \otimes O (k), \text{ or} \\ O (n) \otimes O (m) \otimes M (k) \end{cases}

the following holds:

(p \circ_{j} q) \circ_{i} r = \begin{cases} (p \circ_{i} r) \circ_{j + k - 1} q & \text{if } 1 \leq i \leq j - 1 \\ p \circ_{j} (q \circ_{i - j + 1} r) & \text{if } j \leq i \leq j + m - 1 \\ (p \circ_{i - m + 1} r) \circ_{j} q & \text{if } i \geq j + m . \end{cases}

(4)

The associativity requirements displayed in Equation (4)—and hence the intuition behind them—are completely analogous to those defining operads as illustrated in Figure 2. The only difference here is that one of the three operations may come from the bimodule rather than the operad. Here is the main example to have in mind.

Example 3.

As every algebra is a bimodule over itself, so every representation of

O

is an

O

-bimodule in a straightforward way. Indeed, in the case of the topological operad of simplices, the maps comprising the Δ-representation structure on

R

induce a Δ-bimodule structure on

{End}_{R}

. However, we will make use of a slight variant of this bimodule structure. Right composition will be defined in the expected way, though left composition will not. Explicitly, we define the left and right composition maps

\begin{aligned} \circ_{i}^{L} & : Δ_{n} \times Top (R^{m}, R) \longrightarrow Top (R^{n + m - 1}, R) \\ \circ_{i}^{R} & : Top (R^{n}, R) \times Δ_{m} \longrightarrow Top (R^{n + m - 1}, R) \end{aligned}

as follows. Given a probability distribution

p \in Δ_{n}

and a continuous function

f : R^{m} \to R

, define left composition by

p \circ_{i}^{L} f : = \bar{p} \circ (0, \dots, 0, f, 0, \dots, 0)

, where the composition on the right-hand side is defined as in the simultaneous composition in the endomorphism operad of

R

illustrated in Example 1, and where each 0 denotes the zero function

R \to R

. Here, recall that

\bar{p} : R^{n} \to R

maps a point x to the standard inner product

〈 p, x 〉

as introduced in Section 1. Unwinding this, left composition thus evaluates explicitly as

(p \circ_{i}^{L} f) (x_{1}, \dots, x_{n + m - 1}) = p_{i} f (x_{i}, \dots, x_{i + m - 1})

. In words, the value of the left composite

p \circ_{i}^{L} f : R^{n + m - 1} \to R

at a point x is computed by evaluating f at the m-subtuple of x beginning at the

i th

coordinate and scaling that output by

p_{i}

. All other coordinates of x are ignored. The picture to have in mind is that below, where the bold dots are imagined to be “plugs” that prevent the surplus coordinates from playing a role. In this picture,

n = 3

and

m = 2

.

Given a probability distribution

q \in Δ_{m}

and a continuous function

g : R^{n} \to R

, define right composition by

(g \circ_{i}^{R} q) (x_{1}, \dots, x_{n + m - 1}) : = g (x_{1}, \dots, x_{i - 1}, \sum_{k = 1}^{m} q_{k} x_{i + k - 1}, x_{i + m}, \dots, x_{n + m - 1}) .

This may be understood visually as well. The value of the right composite

g \circ_{i}^{R} q : R^{n + m - 1} \to R

at a point x is computed by taking the inner product of q with the m-tuple of x beginning at the

i th

coordinate and using that number as the

i th

input of g with all other coordinates of x falling into place as in the picture below. There are no “plugs” in this instance since all coordinates of x play a role.

These examples suggest the inner product notation is a convenient choice. Given

N \geq 1

and

k \leq N

and a point

x \in R^{N}

, let

\mathbf{x}_{i, k} \in R^{k}

denote the k-subtuple of x beginning at the

i th

coordinate:

\mathbf{x}_{i, k} : = (x_{i}, \dots, x_{i + k - 1}) .

Then given any point

x \in R^{n + m - 1}

, the left and right composition maps may be written more succinctly as

\begin{aligned} (p \circ_{i}^{L} f) (x) & = p_{i} f (\mathbf{x}_{i, m}) \\ (g \circ_{i}^{R} q) (x) & = g (x_{1}, \dots, x_{i - 1}, 〈 q, \mathbf{x}_{i, m} 〉, x_{i + m}, \dots, x_{n + m - 1}) . \end{aligned}

We will use this notation below and will always write

\mathbf{x}_{i}

in lieu of

\mathbf{x}_{i, m}

since the context will make it clear that

\mathbf{x}_{i}

must be an m-tuple. The boldface font is used to distinguish a tuple

\mathbf{x}_{i}

from a real number

x_{i}

. Finally, note that the maps

\circ_{i}^{L}

and

\circ_{i}^{R}

are continuous since f and g are continuous, and moreover that the associativity requirements in Equation (4) are analogous to those illustrated in Figure 2, so it is straightforward to verify they are satisfied. In particular, the zero functions appearing in the definition of

\circ_{i}^{L}

simplify the situation greatly. For instance, several of the associativity requirements follow from the simple fact that multiplying an input

x_{i}

by a probability and then mapping the result to zero is the same as first mapping the input to zero and then multiplying that zero by a probability. So

{End}_{R}

is indeed a Δ-bimodule.
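The two composition maps of Example 3 can be written out concretely. The following Python sketch is an illustration under stated assumptions: `left_compose` and `right_compose` are hypothetical names, arities that cannot be read off a Python callable are passed explicitly, and the sample functions f and g are arbitrary continuous functions.

```python
def left_compose(p, f, m, i):
    """Left composition p ∘_i^L f for p in Δ_n and f of arity m.

    The composite has arity n + m - 1. It evaluates f on the m-subtuple
    starting at position i (1-based), scales by p_i, and ignores the rest.
    """
    n = len(p)
    if not 1 <= i <= n:
        raise ValueError("i must satisfy 1 <= i <= n")

    def composite(*xs):
        if len(xs) != n + m - 1:
            raise TypeError(f"expected {n + m - 1} arguments")
        return p[i - 1] * f(*xs[i - 1 : i - 1 + m])

    return composite


def right_compose(g, n, q, i):
    """Right composition g ∘_i^R q for g of arity n and q in Δ_m.

    The inner product of q with the m-subtuple starting at position i
    becomes the i-th input of g; all other coordinates pass through.
    """
    m = len(q)
    if not 1 <= i <= n:
        raise ValueError("i must satisfy 1 <= i <= n")

    def composite(*xs):
        if len(xs) != n + m - 1:
            raise TypeError(f"expected {n + m - 1} arguments")
        weighted = sum(q_k * x_k for q_k, x_k in zip(q, xs[i - 1 : i - 1 + m]))
        return g(*xs[: i - 1], weighted, *xs[i - 1 + m :])

    return composite


x = (1.0, 2.0, 3.0, 4.0)

p = (0.25, 0.5, 0.25)             # n = 3
f = lambda u, v: u * v            # m = 2
left_value = left_compose(p, f, 2, 2)(*x)      # p_2 * f(2, 3)

g = lambda a, b, c: a + b * c     # n = 3
q = (0.5, 0.5)                    # m = 2
right_value = right_compose(g, 3, q, 2)(*x)    # g(1, <q, (2, 3)>, 4)
```

With n = 3 and m = 2 as in the picture described above, `left_value` is 0.5 · 6 = 3.0 and `right_value` is g(1, 2.5, 4) = 11.0.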

Next, recall that the desired Leibniz rule in Equation (3) suggests the bimodule should be equipped with a notion of addition. This motivates the following definition.

Definition 4.

Let

O

be an operad in a symmetric monoidal category

C

. An

O

-bimodule M is an abelian

O

-bimodule if each

M (n)

is an abelian monoid in

C

; that is, if for each

n = 1, 2, \dots

the following hold:

(i): [associativity, commutativity] there is a morphism $μ_{n} : M (n) \times M (n) \to M (n)$ in $C$ such that $μ_{n} (μ_{n} (a, b), c) = μ_{n} (a, μ_{n} (b, c))$ and $μ_{n} (a, b) = μ_{n} (b, a)$ for all $a, b, c \in M (n)$ ,
(ii): [identity] there is an element $1 \in M (n)$ such that $μ_{n} (1, a) = a = μ_{n} (a, 1)$ for all $a \in M (n)$ .

As the primary example, consider

{End}_{R}

viewed as a

Δ

-bimodule as described in Example 3. For each

n,

define

μ_{n} : {End}_{R} (n) \times {End}_{R} (n) \to {End}_{R} (n)

by pointwise addition, meaning that for each

f, g \in {End}_{R} (n)

we have

μ_{n} (f, g) = f + g

where

(f + g) (x) : = f (x) + g (x)

for all

x \in R^{n}

. The identity element in

{End}_{R} (n)

is the constant map at zero. Moreover each

μ_{n}

is continuous and inherits associativity and commutativity from

R .

In this way,

{End}_{R}

is an abelian

Δ

-bimodule.

Remark 1.

Notice that the Δ-bimodule composition maps

\circ_{i}^{L}

and

\circ_{i}^{R}

distribute over sums in the abelian Δ-bimodule

{End}_{R}

. In other words, for all continuous functions

f, g \in {End}_{R} (n)

and for all probability distributions

q \in Δ_{m}

,

\begin{matrix} (f + g) \circ_{i}^{R} q = f \circ_{i}^{R} q + g \circ_{i}^{R} q, 1 \leq i \leq n \end{matrix}

and similarly for left composition

\circ_{i}^{L}

. This follows directly from pointwise addition.

With this setup in mind, our desideratum in Equation (3) is now realized in the following definition.

Definition 5.

Let

O

be an operad in a category

C

and let M be an abelian

O

-bimodule. A derivation of

O

valued in M is a sequence of morphisms

{d_{n} : O (n) \to M (n)}

in

C

satisfying

\begin{matrix} d_{n + m - 1} (p \circ_{i} q) = d_{n} p \circ_{i}^{R} q + p \circ_{i}^{L} d_{m} q \end{matrix}

(5)

for all

p \in O (n), q \in O (m)

and for all

1 \leq i \leq n .

In the special case when

O

is a linear operad, this definition coincides with that given by Markl in [15]. In what follows, we omit the subscripts and simply write d instead of

d_{n}

. Now, suppose

O = Δ

is the operad of topological simplices and

{End}_{R}

is equipped with the structure of an abelian

Δ

-bimodule given above. Here is the picture to have in mind for Equation (5):

On the right-hand side we have used the “plug” notation introduced in Example 3, which can also be understood explicitly by evaluating d at a point

x \in R^{n + m - 1}

,

\begin{aligned} d (p \circ_{i} q) (x) & = (d p \circ_{i}^{R} q) (x) + (p \circ_{i}^{L} d q) (x) \\ & = d p (x_{1}, \dots, 〈 q, \mathbf{x}_{i} 〉, \dots, x_{n + m - 1}) + p_{i} \, d q (\mathbf{x}_{i}) . \end{aligned}

Of particular interest is the behavior of a derivation

{d : Δ_{n} \to {End}_{R} (n)}

when it is applied to a simultaneous composition of probability distributions. A derivation applied to the composite

(p \circ_{j} q) \circ_{i} r

for probability distributions

p \in Δ_{n}, q \in Δ_{m}

, and

r \in Δ_{k}

can be understood in a convenient picture when q and r are composed onto different leaves of p; that is, when

1 \leq i \leq j - 1

or

i \geq j + m

. This follows straightforwardly from a repeated application of d. Indeed, by definition we have

d ((p \circ_{j} q) \circ_{i} r) = d (p \circ_{j} q) \circ_{i}^{R} r + (p \circ_{j} q) \circ_{i}^{L} d r

and by applying the Leibniz rule again to the first summand, this is equal to

(d p \circ_{j}^{R} q + p \circ_{j}^{L} d q) \circ_{i}^{R} r + (p \circ_{j} q) \circ_{i}^{L} d r,

which we can expand to obtain

(d p \circ_{j}^{R} q) \circ_{i}^{R} r + (p \circ_{j}^{L} d q) \circ_{i}^{R} r + (p \circ_{j} q) \circ_{i}^{L} d r

since composition distributes over sums as noted in Remark 1. We will identify this function with the picture below in lieu of the cumbersome notation.

Importantly, the obvious generalization of the formula holds for any simultaneous composition

p \circ (q^{1}, \dots, q^{n})

for any

p \in Δ_{n}

and

q^{i} \in Δ_{k_{i}}

where

i = 1, \dots, n

. This again follows directly from repeated applications of Equation (5), as illustrated below.

This is summarized in the following proposition.

Proposition 1.

Let

p \in Δ_{n}

and

q^{i} \in Δ_{k_{i}}

for

n, k_{1}, \dots, k_{n} \geq 1

and let

{d : Δ_{n} \to {End}_{R} (n)}

be a derivation of the operad of topological simplices. Then for any point

x \in R^{k_{1} + \dots + k_{n}}

,

d (p \circ (q^{1}, \dots, q^{n})) (x) = d p (〈 q^{1}, \mathbf{x}_{1} 〉, \dots, 〈 q^{n}, \mathbf{x}_{n} 〉) + \sum_{i = 1}^{n} p_{i} \, d q^{i} (\mathbf{x}_{i}) .

Finally, the main result follows.

Theorem 3.

Shannon entropy defines a derivation of the operad of topological simplices, and for every derivation of this operad there exists a point at which it is given by a constant multiple of Shannon entropy.

Proof.

For each

n \geq 1

define

d : Δ_{n} \to {End}_{R} (n)

by

p \mapsto d p

where

d p (x) = H (p)

is constant for all

x \in R^{n}

. Then, d is continuous since H is continuous. Moreover, if

p = (p_{1}, \dots, p_{n}) \in Δ_{n}

and

q = (q_{1}, \dots, q_{m}) \in Δ_{m}

are probability distributions, then for any

x \in R^{m + n - 1}

and

1 \leq i \leq n

, we have

\begin{aligned} d (p \circ_{i} q) (x) = H (p \circ_{i} q) & = - \left( \sum_{k = 1}^{i - 1} p_{k} log p_{k} + p_{i} \sum_{k = 1}^{m} q_{k} log (p_{i} q_{k}) + \sum_{k = i + 1}^{n} p_{k} log p_{k} \right) \\ & = - \left( \sum_{k = 1}^{i - 1} p_{k} log p_{k} + p_{i} log p_{i} \sum_{k = 1}^{m} q_{k} + p_{i} \sum_{k = 1}^{m} q_{k} log q_{k} + \sum_{k = i + 1}^{n} p_{k} log p_{k} \right) \\ & = - \left( \sum_{k = 1}^{n} p_{k} log p_{k} + p_{i} \sum_{k = 1}^{m} q_{k} log q_{k} \right) \\ & = H (p) + p_{i} H (q) \\ & = (d p \circ_{i}^{R} q + p \circ_{i}^{L} d q) (x), \end{aligned}

where the last line follows since

(d p \circ_{i}^{R} q) (x)

is computed by evaluating the function

d p

at some point, and this function is constant at

H (p)

, while

(p \circ_{i}^{L} d q) (x) = p_{i} \, d q (\mathbf{x}_{i}) = p_{i} H (q) .

Conversely, suppose

{d : Δ_{n} \to {End}_{R} (n)}

is a derivation. For each

n \geq 1

define a function

F : Δ_{n} \to R

by

F (p) = d p (0)

where

0 = (0, \dots, 0) \in R^{n}

. Then F is continuous since d is continuous, and Proposition 1 further implies that

\begin{aligned} F (p \circ (q^{1}, \dots, q^{n})) & = d (p \circ (q^{1}, \dots, q^{n})) (0) \\ & = d p (〈 q^{1}, 0_{1} 〉, \dots, 〈 q^{n}, 0_{n} 〉) + \sum_{i = 1}^{n} p_{i} d q^{i} (0_{i}) \\ & = d p (0) + \sum_{i = 1}^{n} p_{i} d q^{i} (0) \\ & = F (p) + \sum_{i = 1}^{n} p_{i} F (q^{i}) . \end{aligned}

From the Faddeev–Leinster result in Theorem 1, it follows that

d p (0) = F (p) = c H (p)

for some

c \in R

. □
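The first half of the proof can be checked numerically at a sample point. The sketch below assumes the natural logarithm and uses hypothetical helper names (`entropy`, `compose_i`, `d`, `left_compose`, `right_compose`); it verifies the Leibniz rule of Equation (5) for the derivation that sends p to the constant function at H(p), at one arbitrarily chosen input.

```python
import math


def entropy(p):
    """Shannon entropy H(p), skipping zero entries (0 log 0 := 0)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def compose_i(p, q, i):
    """Partial composition p ∘_i q of probability vectors (1-based i)."""
    return p[: i - 1] + tuple(p[i - 1] * q_k for q_k in q) + p[i:]


def d(p):
    """The derivation from the proof: d p is constant at H(p) on R^n."""
    h, n = entropy(p), len(p)

    def constant(*xs):
        if len(xs) != n:
            raise TypeError(f"expected {n} arguments")
        return h

    return constant


def left_compose(p, f, m, i):
    """(p ∘_i^L f)(x) = p_i * f(x_i, ..., x_{i+m-1})."""
    def composite(*xs):
        if len(xs) != len(p) + m - 1:
            raise TypeError(f"expected {len(p) + m - 1} arguments")
        return p[i - 1] * f(*xs[i - 1 : i - 1 + m])
    return composite


def right_compose(g, n, q, i):
    """(g ∘_i^R q)(x) = g(x_1, ..., <q, x_i-block>, ..., x_{n+m-1})."""
    m = len(q)

    def composite(*xs):
        if len(xs) != n + m - 1:
            raise TypeError(f"expected {n + m - 1} arguments")
        weighted = sum(q_k * x_k for q_k, x_k in zip(q, xs[i - 1 : i - 1 + m]))
        return g(*xs[: i - 1], weighted, *xs[i - 1 + m :])

    return composite


p = (0.5, 0.3, 0.2)
q = (0.25, 0.75)
i = 2
x = (1.0, -2.0, 0.5, 3.0)   # an arbitrary point of R^{n + m - 1}

lhs = d(compose_i(p, q, i))(*x)
rhs = right_compose(d(p), len(p), q, i)(*x) + left_compose(p, d(q), len(q), i)(*x)
```

Both sides equal H(p) + p_i H(q) up to rounding, independently of the point x, as the computation in the proof predicts.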

Notice that the important Equation (2) mentioned in the introduction is obtained as a corollary. Indeed, if for each

n \geq 1

the map

d : Δ_{n} \to {End}_{R} (n)

is defined to be constant at entropy

p \mapsto d p \equiv H (p)

, then d is a derivation by Theorem 3 and so Proposition 1 yields the following by evaluating

d (p \circ (q^{1}, \dots, q^{n}))

at any point.

Corollary 1.

Let

p \in Δ_{n}

and

q^{i} \in Δ_{k_{i}}

with

1 \leq i \leq n .

Then

H (p \circ (q^{1}, \dots, q^{n})) = H (p) + \sum_{i = 1}^{n} p_{i} H (q^{i}) .

As a closing remark, Faddeev’s characterization of entropy in Theorem 1 can be reexpressed using the language of category theory and operads as in [1] (Theorem 12.3.1). We have omitted this language here but invite the reader to explore the full category-theoretic story in Chapter 12 of Leinster’s book.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Acknowledgments

I thank Darij Grinberg, Joey Hirsh, Tom Leinster, Jim Stasheff, and John Terilla for helpful discussions as well as the anonymous referees for their insightful feedback.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Leinster, T. Entropy and Diversity: The Axiomatic Approach; Cambridge University Press: Cambridge, UK, 2021.
2. Baudot, P.; Bennequin, D. The Homological Nature of Entropy. Entropy 2015, 17, 3253–3318.
3. Elbaz-Vincent, P.; Gangl, H. Finite Polylogarithms, Their Multiple Analogues and the Shannon Entropy. In Lecture Notes in Computer Science; Nielsen, F., Barbaresco, F., Eds.; Geometric Science of Information. GSI 2015; Springer: Cham, Switzerland, 2015; Volume 9389, pp. 277–285.
4. Baez, J.C.; Fritz, T.; Leinster, T. A characterization of entropy in terms of information loss. Entropy 2011, 13, 1945–1957.
5. Parzygnat, A.J. A functorial characterization of von Neumann entropy. arXiv 2020, arXiv:2009.07125.
6. Baez, J.C. Entropy as a Functor. Blog Post. 2011. Available online: https://www.ncatlab.org/johnbaez/show/Entropy+as+a+functor (accessed on 15 July 2021).
7. Mainiero, T. Homological Tools for the Quantum Mechanic. arXiv 2019, arXiv:1901.02011.
8. Faddeev, D.K. On the concept of entropy of a finite probabilistic scheme. Uspekhi Mat. Nauk 1956, 11, 227–231. (In Russian)
9. Lambek, J. Deductive systems and categories. In Deductive systems and categories II. Standard constructions and closed categories; Hilton, P., Ed.; Category Theory, Homology Theory and their Applications, I (Battelle Institute Conference, Seattle, 1968); Springer: Berlin/Heidelberg, Germany, 1969; Volume 68.
10. May, J. The Geometry of Iterated Loop Spaces. In Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1972; Volume 271.
11. Boardman, J.M.; Vogt, R. Homotopy Invariant Algebraic Structures on Topological Spaces. In Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1973; Volume 347.
12. Loday, J.L.; Vallette, B. Algebraic Operads; Grundlehren der mathematischen Wissenschaften; Springer: Berlin/Heidelberg, Germany, 2012.
13. Vallette, B. Algebra + Homotopy = Operad. arXiv 2012, arXiv:1202.3245.
14. Stasheff, J. What is... an operad? Notices Amer. Math. Soc. 2004, 51, 630–631.
15. Markl, M. Models for operads. Commun. Algebra 1996, 24, 1471–1500.
16. Markl, M.; Shnider, S.; Stasheff, J. Operads in Algebra, Topology and Physics; Mathematical Surveys and Monographs, American Mathematical Society: Providence, RI, USA, 2002.

Figure 1. One of the three ways to compose a 4-ary operation g with a 3-ary operation f.

Figure 2. Associativity in an operad. (Left) First composing q with p and then r is the same as first composing r with p and then q. The order in which this is performed does not matter. (Right) The same is true if r appears to the right, rather than the left, of q. (Middle) Likewise, r may first be composed with q and their composite may then be composed with p, or q may be first composed with p followed by r. Again, the order does not matter.

Figure 3. (Left) A picture of the composition

p \circ_{3} q

when p is the probability distribution associated to a six-sided die and q is that of a fair coin toss. (Right) The simultaneous composition of n probability distributions

q^{i} \in Δ_{k_{i}}

with a given

p \in Δ_{n}

.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

