1. Introduction
The description of ecosystems is often based on networks of interactions of different types. For terrestrial ecosystems, recent developments concern different types of interactions, sometimes gathered into a common model called multiplex [1]. In marine ecology, the most studied interactions are trophic, i.e., interactions between predators and their prey. These highly complex food webs have been represented by numerous models, widely used to assess the impact of human activities on marine ecosystems; see [2,3]. They are also important tools for the sustainable management of marine and coastal environments [4].
There are many ways to model food webs. Flows from prey to predators can be represented by nonquantified interactions, in which case the model is a graph (web) with vertices (the species) and edges (the interactions). Trophic modeling for the ecosystem-based management of fisheries and other management questions is mainly based on weighted networks; see, for example, the Ecopath–Ecosim–Ecospace model [5]. Each link corresponds to a flow of organic matter between two trophic compartments, each compartment grouping individuals with similar feeding behaviors and metabolisms and with the same predators.
In ecology, field studies are often combined with laboratory experiments to estimate some flows within food webs, but many flow values remain unknown, which Linear Inverse Modeling (LIM) takes into account; see [6]. LIM relies on the principle of a steady state for the biomass of all compartments, i.e., for each compartment, the sum of its inflows minus the sum of its outflows equals the rate of change in its standing stock, which is most often considered negligible or null. This yields a set of linear equations, the Mass-Balance Equations (MBE), describing the steady state or mass balance. Then, constraints are added from field measurements of mass transfers, such as local estimates of primary production, respiration or diet contents. Additional constraints come from experiments or from studies of other ecosystems. All these constraints form a set of linear inequalities that force linear combinations of flows to lie between given bounds. Together, the MBE and the inequalities define a bounded multidimensional polyhedron, called a polytope, within which lie all realistic solutions to the problem.
The first idea for describing the polytope was to select a single solution inside this space of possible solutions, namely the one minimizing the sum of squared flows (least squares); see [7]. Then, methods were developed to describe the solution space by computing a representative sample of all possible solutions, first with the Monte Carlo approach, see [8], and more efficiently with the Markov Chain Monte Carlo (MCMC) approach, see [9,10]. Further, Linear Inverse Modeling Markov Chain Monte Carlo (LIM-MCMC) methods involve mass-balanced models (the biomass of each species is assumed to be constant) that take into account the uncertainty in the estimates and link it to the variability of living organisms; see [6,10]. As such, LIM-MCMC provides a range of possible results, which is advantageous because it allows one to include uncertainties in the data; see [11].
Still, the question remains of how to extract a unique solution from this simulated sample. In [12], we proposed a numerical optimization of some pertinent goal functions that directly yields a unique functioning point of the ecosystem with a low computing time. This straightforward approach allowed us to infer properties of the stability of the selected state and hence of the evolution of the ecosystem. Precisely, the assumption is that the biomasses are the stationary state of a dynamical system determined by the constraints on the flows. Determining the stability of this fixed point indicates how the biomasses will vary. In this sense, this approach is data-driven by the information on both the biomasses and the biological constraints.
Such an optimization may also be formulated in a probabilistic framework in which all flows are seen as random variables. MCMC then amounts to simulating samples drawn from the probability density functions (pdfs) of the flows, while the optimization can be carried out in the space of pdfs satisfying the biological constraints.
In ecology, Ecological Network Analysis (ENA) indices have been introduced in the literature as criteria of fitness for ecosystems; see [9,12,13,14]. They are all related to Shannon information theory: some are based on Shannon entropy, while others are based on the Kullback–Leibler divergence [15] or on Shannon mutual information. In this paper, we present several of these classical goal (cost) functions. Then, we consider goal functions that take into account reference distributions that may not belong to the polytope but are of interest, for instance, those resulting from previous studies of the same ecosystem. We also consider goal functions based on Rényi entropy and divergence, which may better fit particular food webs through an adjustment of the associated parameter. Their optimization opens the way to a better-fitting choice of the unique solution within the MCMC sample.
2. The Constraints, the Polytope of Solutions and the Probabilistic Framework
2.1. The Constraints
We denote by $V$ the set of all vertices of the trophic network (a directed graph) and by $E$ the set of all oriented edges, the edge going from vertex $i$ to vertex $j$ being denoted by $(i,j)$, with $i, j \in V$.
In many situations, all biomasses of the species can be measured with accuracy. On the other hand, the flows between the species (or compartments) are much more difficult to evaluate. Therefore, we will adopt here the standard viewpoint that the biomasses are given and the flows between nodes are unknown.
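Mathematically, the flow is a vector $F = (F_{ij})_{(i,j) \in E}$ of dimension $n = |E|$ with non-negative components, also called flows, satisfying the Kirchhoff law on all vertices:
$$\sum_{j \in P(i)} F_{ji} = \sum_{j \in S(i)} F_{ij}, \qquad i \in V, \tag{1}$$
where the successors and predecessors of a vertex $i$ are
$$S(i) = \{ j \in V : (i,j) \in E \} \quad \text{and} \quad P(i) = \{ j \in V : (j,i) \in E \}.$$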
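Here is a minimal Python sketch of the Kirchhoff law (1). It assumes the flows are stored in a dictionary keyed by oriented edges $(i,j)$. The function name and the three-compartment cycle are hypothetical illustrations.

```python
from collections import defaultdict

def kirchhoff_residuals(flows):
    """Return, for each vertex i, the sum of its inflows minus the sum of its outflows.

    `flows` maps an oriented edge (i, j) to the non-negative flow F_ij.
    The Kirchhoff law (1) holds when every residual is zero.
    """
    residual = defaultdict(float)
    for (i, j), value in flows.items():
        if value < 0:
            raise ValueError(f"negative flow on edge {(i, j)}")
        residual[i] -= value  # outflow of i (j is a successor of i)
        residual[j] += value  # inflow of j (i is a predecessor of j)
    return dict(residual)

# Hypothetical toy network containing cycles, so that every vertex can be balanced.
flows = {(1, 2): 10.0, (2, 3): 6.0, (2, 1): 4.0, (3, 1): 6.0}
print(kirchhoff_residuals(flows))  # {1: 0.0, 2: 0.0, 3: 0.0}
```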
The flows satisfy biological constraints. For example, a fish cannot eat more than a certain percentage of its biomass. These constraints are numerous, thanks to the substantial knowledge acquired over the years on the functioning of ecosystems, and they can be summarized as a system of linear equalities and inequalities over the flow space.
2.2. The Polytope of Solutions
The set $\mathcal{F}$ of solutions $F$ to the constraint system is a convex subset of $\mathbb{R}_+^n$, where $n$ is the number of edges of the graph. Precisely, for any such ecosystem,
$$\mathcal{F} = \{ F \in \mathbb{R}_+^n : AF = b, \ GF \geq h \}, \tag{2}$$
where $A$ and $G$ are matrices of sizes $m \times n$ and $k \times n$, respectively, and $b$ and $h$ are vectors of sizes $m$ and $k$, respectively, $m$ and $k$ being the numbers of equality and inequality constraints (with $m < n$, so that the system $AF = b$ is underdetermined).
The domain $\mathcal{F}$ is the intersection of hyperplanes and half-spaces and hence is a convex polyhedron. In addition, the biological constraints include bounds on the flows, so $\mathcal{F}$ is bounded; the same holds for most similar ecosystems. Such bounded polyhedra are called polytopes. Apart from the MBE, which involve many flows in the same equation, most constraints involve a small number of flows. Therefore, the matrices $A$ and $G$ are sparse, mainly composed of zeros and a few $\pm 1$ entries.
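The membership test $F \in \mathcal{F}$ can be sketched as follows. It assumes constraints written as $AF = b$ and $GF \geq h$, stored as dense Python lists (real LIM matrices are sparse and much larger). The example is a hypothetical three-compartment system whose MBE rows have entries in $\{0, \pm 1\}$.

```python
def in_polytope(F, A, b, G, h, tol=1e-9):
    """Check A F = b and G F >= h (componentwise), up to a numerical tolerance."""
    def dot(row, x):
        return sum(a * xi for a, xi in zip(row, x))
    equalities = all(abs(dot(row, F) - bi) <= tol for row, bi in zip(A, b))
    inequalities = all(dot(row, F) >= hi - tol for row, hi in zip(G, h))
    return equalities and inequalities

# Hypothetical toy system: flows ordered as [F12, F23, F21, F31].
A = [[1, 0, -1, -1],   # MBE of vertex 1: outflows minus inflows
     [-1, 1, 1, 0],    # MBE of vertex 2
     [0, -1, 0, 1]]    # MBE of vertex 3 (rank of A is 2 < n = 4)
b = [0, 0, 0]
G = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],  # F >= 0
     [0, 0, 1, 0],     # F21 >= 2, e.g., a measured lower bound
     [-1, 0, 0, 0]]    # F12 <= 12, written as -F12 >= -12
h = [0, 0, 0, 0, 2, -12]

print(in_polytope([10, 6, 4, 6], A, b, G, h))  # True
print(in_polytope([10, 6, 4, 5], A, b, G, h))  # False: the MBE are violated
print(in_polytope([13, 6, 7, 6], A, b, G, h))  # False: balanced, but F12 > 12
```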
2.3. The Probabilistic Framework
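The problem can also be presented in terms of pdfs or proportions. Let $f_{ij} = F_{ij}/T$, for $(i,j) \in E$, denote the proportion of flow from vertex (i.e., compartment) $i$ to vertex $j$ with respect to the total flow in the system, where $T = \sum_{(i,j) \in E} F_{ij}$. The marginal distributions $(f_{i\cdot})_{i \in V}$ and $(f_{\cdot j})_{j \in V}$, where $f_{i\cdot} = \sum_{j \in S(i)} f_{ij}$ and $f_{\cdot j} = \sum_{i \in P(j)} f_{ij}$, represent the proportions of, respectively, the outflows and inflows of the trophic compartments. The polytope of admissible flows defined by (2) is identified as
$$\mathcal{P} = \Big\{ f \in \mathbb{R}_+^n : Af = \tilde{b}, \ Gf \geq \tilde{h}, \ \sum_{(i,j) \in E} f_{ij} = 1 \Big\}, \tag{3}$$
where $\tilde{b} = b/T$ and $\tilde{h} = h/T$.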
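The normalization and the two marginals can be sketched as follows, using a hypothetical helper and toy flows. When the Kirchhoff law holds at every vertex, each compartment's inflow equals its outflow, so the two marginals coincide. The example illustrates this.

```python
from collections import defaultdict

def proportions(flows):
    """Return (f, out_marginal, in_marginal, T) for flows keyed by edges (i, j)."""
    T = sum(flows.values())  # total flow of the system
    f = {edge: value / T for edge, value in flows.items()}
    out_marginal, in_marginal = defaultdict(float), defaultdict(float)
    for (i, j), p in f.items():
        out_marginal[i] += p  # f_{i.}: share of the outflows of compartment i
        in_marginal[j] += p   # f_{.j}: share of the inflows of compartment j
    return f, dict(out_marginal), dict(in_marginal), T

flows = {(1, 2): 10.0, (2, 3): 6.0, (2, 1): 4.0, (3, 1): 6.0}  # hypothetical, balanced
f, f_out, f_in, T = proportions(flows)
print(T)             # 26.0
print(f_out, f_in)   # approximately {1: 10/26, 2: 10/26, 3: 6/26} for both
```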
Actually, the equality constraints can be eliminated, and hence MCMC can be applied in a smaller space. Let $d$ be the rank of $A$; the solutions of $AF = b$ then form an affine subspace of $\mathbb{R}^n$ of dimension $n - d$. One way to present this is to write $F = F_0 + Zq$, with $q \in \mathbb{R}^{n-d}$, where $F_0$ is a particular solution of $AF = b$ in $\mathbb{R}^n$ and $Z$ is the $n \times (n-d)$ matrix with orthonormal columns spanning the null space of $A$, obtained, e.g., through the well-known QR decomposition of $A^\top$. In this subspace, only inequality constraints remain, say $G'q \geq h'$ with $G' = GZ$ and $h' = h - GF_0$, which define a volume delimited by the intersection of half-spaces. MCMC methods are then applied to this latter system, which lives in a smaller space.
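The reduction to a smaller space, followed by a generic MCMC step, can be sketched in plain Python. This is only an illustration under stated assumptions:
- the toy network, the particular solution `F0` and all function names are hypothetical;
- the null-space basis is computed by modified Gram–Schmidt, which plays the role of the QR decomposition;
- the sampler is a generic random-direction hit-and-run chain, not necessarily the algorithm used in the cited LIM-MCMC works.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def null_space_basis(A, tol=1e-10):
    """Orthonormal basis (the columns of Z) of {x : A x = 0}.

    Modified Gram-Schmidt on the rows of A, then on the canonical basis:
    this is the orthogonalization underlying a QR decomposition of A^T.
    """
    n = len(A[0])
    candidates = [[float(a) for a in row] for row in A]
    candidates += [[1.0 if k == j else 0.0 for k in range(n)] for j in range(n)]
    basis, null = [], []
    for idx, v in enumerate(candidates):
        w = v[:]
        for e in basis:
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # dependent vectors (e.g., redundant MBE rows) are skipped
            w = [wi / norm for wi in w]
            basis.append(w)
            if idx >= len(A):  # orthogonal to every row of A
                null.append(w)
    return null  # n - rank(A) vectors

def reduce_inequalities(G, h, F0, Z):
    """Rewrite G F >= h, with F = F0 + Z q, as G' q >= h'."""
    Gp = [[dot(row, z) for z in Z] for row in G]
    hp = [hi - dot(row, F0) for row, hi in zip(G, h)]
    return Gp, hp

def hit_and_run(Gp, hp, q0, n_steps, rng):
    """Random-direction hit-and-run in the bounded polytope {q : G' q >= h'}."""
    q, chain = list(q0), []
    for _ in range(n_steps):
        u = [rng.gauss(0.0, 1.0) for _ in q]
        norm = math.sqrt(dot(u, u))
        u = [ui / norm for ui in u]
        lo, hi = -math.inf, math.inf
        for row, bound in zip(Gp, hp):
            a, slack = dot(row, u), dot(row, q) - bound
            if a > 1e-12:       # slack + t * a >= 0 gives t >= -slack / a
                lo = max(lo, -slack / a)
            elif a < -1e-12:    # ... or t <= -slack / a
                hi = min(hi, -slack / a)
        t = rng.uniform(lo, hi)  # uniform point on the feasible chord
        q = [qi + t * ui for qi, ui in zip(q, u)]
        chain.append(q)
    return chain

# Hypothetical toy system, flows ordered as [F12, F23, F21, F31].
A = [[1, 0, -1, -1], [-1, 1, 1, 0], [0, -1, 0, 1]]
G = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
     [0, 0, 1, 0], [-1, 0, 0, 0]]
h = [0, 0, 0, 0, 2, -12]
F0 = [10.0, 6.0, 4.0, 6.0]  # assumed known particular solution of A F = 0

Z = null_space_basis(A)  # here n - d = 4 - 2 = 2
Gp, hp = reduce_inequalities(G, h, F0, Z)
chain = hit_and_run(Gp, hp, [0.0] * len(Z), 1000, random.Random(1))
samples = [[F0[i] + sum(qk * z[i] for qk, z in zip(q, Z)) for i in range(len(F0))]
           for q in chain]
```

Working with `q` instead of `F` removes the equality constraints from the sampler. Each proposed move then only has to respect the inequalities.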
3. The Goal Functions
In ecological studies, goal functions appear either as indices of the behavior of the system or as a means of comparing estimated solutions. They are also potential tools for selecting one solution among the numerous MCMC-simulated ones.
Many goal functions can be considered in relation to ecosystems. They are either statistical tools, information-theoretic quantities or purely ecological indices. The most classical are the mean, the quadratic energy and the so-called ENA indices, the latter all being derived from entropy. An important characteristic is whether or not the goal function includes information from the constraints. Further, only strictly convex (or strictly concave) functions are guaranteed to yield a unique optimum over the polytope, corresponding to a unique functioning state of the system.
The natural setting is to express all these functions in terms of the pdfs $f$. Indeed, the advantage of considering pdfs for ENA indices computed in different situations is that different ecosystems can be compared. This makes it possible to compare different stations or different time periods. For example, Le Guen et al. [16] compared seven spatial subsections of the Seine estuary at two time periods. They were thus able to describe the change in the functioning of the estuary’s food web linked to the construction of the Le Havre port extension called Port2000, and also the change linked to the shift in precipitation regime that occurred over the same period. Such comparisons are essential in the search for indicators of marine ecosystem health.
3.1. Classical ENA
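The most usual function is the empirical mean of all flows,
$$M(F) = \frac{1}{n} \sum_{(i,j) \in E} F_{ij}. \tag{4}$$
The quadratic energy, giving rise to the least squares method, is also classical, with
$$Q(F) = \sum_{(i,j) \in E} F_{ij}^2. \tag{5}$$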
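Here is a direct transcription of $M$ and $Q$, applied to a hypothetical flow vector:

```python
def mean_flow(F):
    """Empirical mean M of the n flows."""
    return sum(F) / len(F)

def quadratic_energy(F):
    """Quadratic energy Q: sum of squared flows (least-squares criterion)."""
    return sum(x * x for x in F)

F = [10.0, 6.0, 4.0, 6.0]  # hypothetical flows
print(mean_flow(F), quadratic_energy(F))  # 6.5 188.0
```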
Various functions involving the concept of the entropy of distributions have been considered in the literature on ecosystems under various specific names. See [17] and the references therein for a synthesis of these Shannon entropic indices, and [18] for details and more on information theory tools. Shannon’s definition of entropy gave birth to information theory; see [18,19]. It was introduced in the field of ecological systems by [20], through the so-called MacArthur index $C$,
$$C(f) = -\sum_{(i,j) \in E} f_{ij} \log f_{ij}. \tag{6}$$
Note that the base of the logarithm is irrelevant, since only comparisons or optimizations are to be considered.
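The following sketch computes $C$ and shows why the base of the logarithm does not matter. Changing the base multiplies $C$ by a positive constant, so the ordering of candidate solutions is unchanged. The proportions are hypothetical.

```python
import math

def shannon_entropy(f, base=math.e):
    """Index C = -sum f_ij log f_ij (terms with f_ij = 0 contribute 0)."""
    return -sum(p * math.log(p, base) for p in f if p > 0)

f1 = [10 / 26, 6 / 26, 4 / 26, 6 / 26]  # hypothetical proportions
f2 = [0.25, 0.25, 0.25, 0.25]           # uniform proportions on n = 4 flows
# Changing the base rescales C by a positive constant, so comparisons are unaffected.
for base in (2, math.e, 10):
    print(base, shannon_entropy(f1, base) < shannon_entropy(f2, base))  # True for each base
```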
Mutual information is also a classical tool in information theory. It was introduced in ecology by [21], where it is called ascendency, defined as
$$A(f) = \sum_{(i,j) \in E} f_{ij} \log \frac{f_{ij}}{f_{i\cdot} f_{\cdot j}}.$$
The ascendency can be written as $A(f) = K(f \,\|\, f_{i\cdot} f_{\cdot j})$, that is, the Kullback–Leibler divergence with respect to the product distribution $(f_{i\cdot} f_{\cdot j})$. This function is neither convex nor concave, which hinders its use as an optimization function; see [12].
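The ascendency can be computed directly as the divergence from the product of the marginals. The hypothetical helper below checks two limiting cases:
- a two-compartment exchange, where the destination is determined by the origin ($A = \log 2$);
- a product distribution ($A = 0$; self-loops are allowed there purely for illustration).

```python
import math
from collections import defaultdict

def ascendency(f):
    """A = sum f_ij log(f_ij / (f_i. f_.j)): divergence of f from the product of its marginals."""
    out_m, in_m = defaultdict(float), defaultdict(float)
    for (i, j), p in f.items():
        out_m[i] += p
        in_m[j] += p
    return sum(p * math.log(p / (out_m[i] * in_m[j])) for (i, j), p in f.items() if p > 0)

# Two-compartment exchange: each flow is fully determined by its origin, so A = log 2.
exchange = {(1, 2): 0.5, (2, 1): 0.5}
# Product distribution: origin and destination are independent, so A = 0.
independent = {(1, 1): 0.25, (1, 2): 0.25, (2, 1): 0.25, (2, 2): 0.25}
print(ascendency(exchange), ascendency(independent))
```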
While the ratio $A/C$ also serves to measure the degree of constraint of the system, Fath [22], in order to keep a balance between $A$ and $C$, prefers
$$-\frac{A}{C} \log \frac{A}{C},$$
for which we fail to find an interpretation in terms of pdfs.
The symmetrized conditional entropy becomes, in ecology, the system redundancy (overhead), introduced in [23] as the negative quantity $\Phi = A - C$, that is to say
$$\Phi(f) = \sum_{(i,j) \in E} f_{ij} \log \frac{f_{ij}^2}{f_{i\cdot} f_{\cdot j}}.$$
Several indices drawn from overhead and ascendency have been proposed as health indicators based on marine ecosystem functioning; see [24,25].
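Here is a numerical check, on hypothetical proportions, of the identity linking these indices. With $\Phi = \sum f_{ij} \log\big(f_{ij}^2/(f_{i\cdot} f_{\cdot j})\big)$, one has $\Phi = A - C$. Moreover, $\Phi \leq 0$, since it is minus the symmetrized conditional entropy. All helper names are ours.

```python
import math
from collections import defaultdict

def marginals(f):
    """Out-marginal f_i. and in-marginal f_.j of a pdf keyed by edges (i, j)."""
    out_m, in_m = defaultdict(float), defaultdict(float)
    for (i, j), p in f.items():
        out_m[i] += p
        in_m[j] += p
    return out_m, in_m

def mcarthur(f):
    """Index C = -sum f_ij log f_ij."""
    return -sum(p * math.log(p) for p in f.values() if p > 0)

def ascendency(f):
    out_m, in_m = marginals(f)
    return sum(p * math.log(p / (out_m[i] * in_m[j])) for (i, j), p in f.items() if p > 0)

def overhead(f):
    """Phi = sum f_ij log(f_ij^2 / (f_i. f_.j)), minus the symmetrized conditional entropy."""
    out_m, in_m = marginals(f)
    return sum(p * math.log(p * p / (out_m[i] * in_m[j])) for (i, j), p in f.items() if p > 0)

T = 26.0  # total flow of the hypothetical toy network
f = {(1, 2): 10 / T, (2, 3): 6 / T, (2, 1): 4 / T, (3, 1): 6 / T}
print(overhead(f), ascendency(f) - mcarthur(f))  # equal up to rounding, both negative
```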
Note that, in the literature, these ENA indices have sometimes been multiplied by the total flow $T$ to make them depend on the total mass of the system. They can then only be used as tools for analyzing the system, and neither for determining a flow solution to the constraints nor for comparing two systems.
3.2. Goal Function Involving a Reference
An added value for determining a pertinent solution in $\mathcal{P}$ is to incorporate information on the system that is not given by the MBE and the inequalities. With this aim, the following goal functions involve some reference pdf, say $g$, known a priori to be informative on the behavior of the ecosystem. In particular, a solution obtained in a previous study, or for a previous year, may not be a solution to the present problem but can be considered a reasonable reference. Another natural choice is the middle of the constraint intervals, in particular to take into account the regular feeding habits of most species; see [12].
The most classical tool in information theory is the Kullback–Leibler divergence
$$K(f \,\|\, g) = \sum_{(i,j) \in E} f_{ij} \log \frac{f_{ij}}{g_{ij}},$$
where $g$ is some reference pdf of flows that makes sense in terms of the ecosystem. Note that [17] considers it as a “structural term”.
The divergence is not a mathematical distance, because it is not symmetric in $f$ and $g$. Still, it is non-negative and null only if $f = g$. Minimizing $K(\cdot \,\|\, g)$ over $\mathcal{P}$ determines the projection, in the sense of the divergence, of the reference $g$ onto the set $\mathcal{P}$ of solutions $f$ to the constraints; see [18,26].
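Here is a short sketch of the divergence on hypothetical pdfs. It shows that the divergence is non-negative, vanishes at $f = g$, and is not symmetric.

```python
import math

def kl_divergence(f, g):
    """K(f || g) = sum f_k log(f_k / g_k); requires g_k > 0 wherever f_k > 0."""
    return sum(p * math.log(p / q) for p, q in zip(f, g) if p > 0)

f = [0.5, 0.3, 0.2]  # hypothetical candidate pdf of flows
g = [0.4, 0.4, 0.2]  # hypothetical reference pdf
print(kl_divergence(f, g), kl_divergence(g, f))  # both positive, but different
print(kl_divergence(f, f))  # 0.0
```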
It is worth remembering that the entropy $C(f)$ in Equation (6) is, up to sign and an additive constant, the Kullback–Leibler divergence with respect to the uniform distribution $u$ on the flows, with $u_{ij} = 1/n$: indeed, $K(f \,\|\, u) = \log n - C(f)$. This uniform distribution is well known to maximize the entropy when no information is available a priori.
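The identity $K(f \,\|\, u) = \sum f_{ij} \log(n f_{ij}) = \log n - C(f)$ can be checked numerically on hypothetical proportions. It means that maximizing $C$ amounts to minimizing the divergence from the uniform distribution.

```python
import math

def entropy(f):
    return -sum(p * math.log(p) for p in f if p > 0)

def kl_divergence(f, g):
    return sum(p * math.log(p / q) for p, q in zip(f, g) if p > 0)

f = [10 / 26, 6 / 26, 4 / 26, 6 / 26]  # hypothetical proportions
n = len(f)
u = [1 / n] * n                        # uniform pdf on the n flows
print(kl_divergence(f, u), math.log(n) - entropy(f))  # equal up to rounding
```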
A simple generalization of the quadratic function $Q$ is the well-known $\chi^2$-distance:
$$\chi^2(f, g) = \sum_{(i,j) \in E} \frac{(f_{ij} - g_{ij})^2}{g_{ij}};$$
see [12] for an application, where, for both $K$ and $\chi^2$, we chose $g_{ij}$ for each flow as the middle of the interval defined by the inequality constraints.
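Here is a sketch of the $\chi^2$-distance with a reference built from the midpoints of the inequality intervals. The intervals, the candidate pdf and the helper names are hypothetical. The midpoints are normalized so that the reference is a pdf; this normalization is our assumption.

The second part checks that, for the uniform reference $u_{ij} = 1/n$, $\chi^2(f,u) = n \sum f_{ij}^2 - 1$. This is an affine function of the quadratic energy of $f$, which is why $\chi^2$ generalizes the quadratic criterion.

```python
def midpoint_reference(bounds):
    """Hypothetical helper: normalized midpoints of the flow intervals [lo, hi]."""
    mids = [(lo + hi) / 2 for lo, hi in bounds]
    total = sum(mids)
    return [m / total for m in mids]

def chi2_distance(f, g):
    """Chi-square distance sum (f_k - g_k)^2 / g_k, with every g_k > 0."""
    return sum((p - q) ** 2 / q for p, q in zip(f, g))

bounds = [(8, 12), (4, 8), (2, 6), (4, 8)]  # hypothetical inequality intervals
g = midpoint_reference(bounds)              # midpoints 10, 6, 4, 6 divided by 26
f = [0.40, 0.22, 0.16, 0.22]                # hypothetical candidate pdf
print(chi2_distance(f, g))

# Uniform reference: chi2 = n * sum f_k^2 - 1.
n = len(f)
u = [1 / n] * n
print(chi2_distance(f, u), n * sum(p * p for p in f) - 1)  # equal up to rounding
```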
3.3. Rényi Goal Functions
Shannon-type entropy quantities have been generalized in many ways that, to the best of our knowledge, have not yet been used in ecological networks. For a better fit of the goal function to the problem through the choice of a parameter $s$, we propose considering the family of Rényi entropies, introduced in [27] and defined for positive $s \neq 1$ by
$$R_s(f) = \frac{1}{1-s} \log \sum_{(i,j) \in E} f_{ij}^s,$$
with associated divergence
$$K_s(f \,\|\, g) = \frac{1}{s-1} \log \sum_{(i,j) \in E} f_{ij}^s \, g_{ij}^{1-s}.$$
The value of the parameter $s$ has to be chosen. In a first stage, $s = 1/2$ and $s = 2$ can illustrate the role of the regions separated by the threshold $s = 1$. Note that Shannon entropy appears as the limit of Rényi entropy when $s$ tends to one, and the Kullback–Leibler divergence likewise as the limit of $K_s$; see [18].
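Here is a sketch of $R_s$ and $K_s$ on hypothetical proportions, with a uniform reference. It checks numerically that:
- $R_s$ decreases in $s$;
- $R_s$ approaches the Shannon entropy as $s \to 1$;
- $K_s(f \,\|\, u) = \log n - R_s(f)$, the Rényi analogue of the Shannon identity for the uniform reference.

```python
import math

def renyi_entropy(f, s):
    """R_s(f) = log(sum f_k^s) / (1 - s), for s > 0, s != 1."""
    return math.log(sum(p ** s for p in f if p > 0)) / (1 - s)

def renyi_divergence(f, g, s):
    """K_s(f || g) = log(sum f_k^s g_k^(1 - s)) / (s - 1), for s > 0, s != 1."""
    return math.log(sum(p ** s * q ** (1 - s) for p, q in zip(f, g) if p > 0)) / (s - 1)

def shannon_entropy(f):
    return -sum(p * math.log(p) for p in f if p > 0)

f = [10 / 26, 6 / 26, 4 / 26, 6 / 26]  # hypothetical proportions
g = [0.25] * 4                          # uniform reference on n = 4 flows
for s in (0.5, 2.0, 1 - 1e-6, 1 + 1e-6):
    print(s, renyi_entropy(f, s), renyi_divergence(f, g, s))
print(shannon_entropy(f))  # close to the values obtained for s near 1
```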
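Similarly to the Shannon case, the Rényi mutual information is the Rényi divergence with respect to the product distribution $(f_{i\cdot} f_{\cdot j})$:
$$A_s(f) = K_s(f \,\|\, f_{i\cdot} f_{\cdot j}) = \frac{1}{s-1} \log \sum_{(i,j) \in E} f_{ij}^s \, (f_{i\cdot} f_{\cdot j})^{1-s}.$$
The associated quantities $A_s/R_s$ and $A_s - R_s$ may also be of use in further studies.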
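Here is a sketch of $A_s$ as the Rényi divergence from the product of the marginals, with a hypothetical helper and toy pdfs. Only edges with $f_{ij} > 0$ contribute to the sum, since $s > 0$. In the two-compartment exchange, $A_s = \log 2$ for every $s$, and $A_s = 0$ for a product distribution.

```python
import math
from collections import defaultdict

def renyi_mutual_information(f, s):
    """A_s = K_s(f || f_i. f_.j), for s > 0, s != 1; f is keyed by edges (i, j)."""
    out_m, in_m = defaultdict(float), defaultdict(float)
    for (i, j), p in f.items():
        out_m[i] += p
        in_m[j] += p
    total = sum(p ** s * (out_m[i] * in_m[j]) ** (1 - s)
                for (i, j), p in f.items() if p > 0)
    return math.log(total) / (s - 1)

exchange = {(1, 2): 0.5, (2, 1): 0.5}                                # hypothetical
independent = {(1, 1): 0.25, (1, 2): 0.25, (2, 1): 0.25, (2, 2): 0.25}
for s in (0.5, 2.0):
    print(s, renyi_mutual_information(exchange, s), renyi_mutual_information(independent, s))
```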
4. Conclusions
Classically in ecology, ENA indices are used to bring more information to an estimated model or as a tool for comparing two models. Considering them as goal (cost) functions, as in [12], opens the field to full mathematical optimization methods once the convexity of the function has been established. Adding to this collection divergence-like functions that take reference pdfs into account should lead to a better fit to a priori information on the ecosystem. Tools from extended information theory, such as Rényi’s, may also yield an optimized solution that better fits certain complex systems through a pertinent choice of the parameter.
A fully determined food web model (one in which all the flows are known, mainly from the mean of the observations) is presented in [28], which describes the eight habitats of the Sylt-Rømø Bight. These food webs are composed of 56 living compartments and three nonliving compartments. In [29], the known flows are replaced with inequalities at four decreasing levels of information, block by block. Then, a unique solution is chosen within the LIM-MCMC sample by optimizing a variety of goal functions. In that paper, computation-time issues with the R package limSolve made the randomization difficult. This will be corrected by using a new version of limSolve, updated in C++, which will appear soon; see [30]. Then, a meaningful comparison of the goal functions on the ecosystems of [28] at decreasing levels of information will be conducted by the authors.