Article

Constraint Preserving Mixers for the Quantum Approximate Optimization Algorithm

by Franz Georg Fuchs *, Kjetil Olsen Lye, Halvor Møll Nilsen, Alexander Johannes Stasik and Giorgio Sartor

SINTEF, Department of Mathematics and Cybernetics, 0373 Oslo, Norway

* Author to whom correspondence should be addressed.
Algorithms 2022, 15(6), 202; https://doi.org/10.3390/a15060202
Submission received: 12 May 2022 / Revised: 1 June 2022 / Accepted: 6 June 2022 / Published: 10 June 2022
(This article belongs to the Collection Feature Paper in Algorithms and Complexity Theory)

Abstract:
The quantum approximate optimization algorithm/quantum alternating operator ansatz (QAOA) is a heuristic to find approximate solutions of combinatorial optimization problems. Most of the literature is limited to quadratic problems without constraints. However, many practically relevant optimization problems do have (hard) constraints that need to be fulfilled. In this article, we present a framework for constructing mixing operators that restrict the evolution to a subspace of the full Hilbert space given by these constraints. We generalize the “XY”-mixer designed to preserve the subspace of “one-hot” states to the general case of subspaces given by a number of computational basis states. We expose the underlying mathematical structure, which reveals how mixers work and how one can minimize their cost in terms of the number of CX gates, particularly when Trotterization is taken into account. Our analysis also leads to valid Trotterizations for an “XY”-mixer with fewer CX gates than is known to date. In view of practical implementations, we also describe algorithms for efficient decomposition into basis gates. Several examples of more general cases are presented and analyzed.

1. Introduction

The quantum approximate optimization algorithm (QAOA) [1], and its generalization, the quantum alternating operator ansatz (also abbreviated as QAOA) [2], is a meta-heuristic for solving combinatorial optimization problems that can utilize gate-based quantum computers and possibly outperform purely classical heuristic algorithms. Typical examples that can be tackled are quadratic (binary) optimization problems of the form
$$x^* = \underset{x \in \{0,1\}^n,\; g(x) = 0}{\arg\min} \; f(x), \qquad f(x) = x^T Q_f\, x + c_f, \qquad g(x) = x^T Q_g\, x + c_g,$$
where $Q_f, Q_g \in \mathbb{R}^{n \times n}$ are symmetric $n \times n$ matrices. For binary variables $x \in \{0,1\}^n$, any linear part can be absorbed into the diagonal of $Q_f$ and $Q_g$. In this article, we focus on the case where the constraint is given by a feasible subspace as defined in the following:
Definition 1
(Constraints given by indexed computational basis states). Let $\mathcal{H} = (\mathbb{C}^2)^{\otimes n}$ be the Hilbert space for n qubits, which is spanned by all computational basis states $|z_j\rangle$, i.e., $\mathcal{H} = \mathrm{span}\{|z_j\rangle,\ 1 \le j \le 2^n,\ z_j \in \{0,1\}^n\}$. Let
$$B = \left\{ |z_j\rangle,\ j \in J,\ z_j \in \{0,1\}^n \right\}$$
be the subset of all computational basis states defined by an index set J. This corresponds to
$$g(x) = \prod_{j \in J} \sum_{i=1}^{n} \left( x_i - (z_j)_i \right)^2,$$
which is a quadratic constraint.
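As a quick illustration of Definition 1, the following is a minimal Python sketch (assumed helper code, not from the paper's repository) that evaluates this constraint for a small feasible set:

```python
# Minimal sketch: evaluate g(x) from Definition 1 for a feasible set B given
# as a list of bitstrings. g(x) = 0 exactly when x is one of the states in B.
import numpy as np

def g(x, B):
    x = np.asarray(x)
    return np.prod([np.sum((x - np.asarray(z)) ** 2) for z in B])

B = [(0, 1, 0, 0, 1), (1, 1, 0, 0, 1), (1, 1, 1, 1, 0)]
print(g((0, 1, 0, 0, 1), B))  # 0  -> feasible
print(g((0, 0, 1, 0, 0), B))  # >0 -> infeasible
```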
There is a well-established connection between quadratic (binary) optimization problems and Ising models (see, e.g., [3]) that allows one to directly translate these problems to the QAOA. The general form of the QAOA is given by
$$|\gamma, \beta\rangle = U_M(\beta_p)\, U_P(\gamma_p) \cdots U_M(\beta_1)\, U_P(\gamma_1)\, |\phi_0\rangle,$$
where one alternates the application of the phase-separating and mixing operators p times. Here, $U_P(\gamma)$ is a phase-separating operator that depends on the objective function f. As defined in [2], the requirements for the mixing operator $U_M(\beta)$ are as follows:
  • $U_M$ does not commute with $U_P$, i.e., $[U_M(\beta), U_P(\gamma)] \neq 0$ for almost all $\gamma, \beta \in \mathbb{R}$;
  • $U_M$ preserves the feasible subspace as given in Definition 1, i.e., $\mathrm{Sp}\,B$ is an invariant subspace of $U_M$:
    $$U_M(\beta)\,|v\rangle \in \mathrm{Sp}\,B, \qquad \forall\, |v\rangle \in \mathrm{Sp}\,B,\ \forall\, \beta \in \mathbb{R};$$
  • $U_M$ provides transitions between all pairs of feasible states, i.e., for each pair of computational basis states $|x\rangle, |y\rangle \in B$ there exist $\beta^* \in \mathbb{R}$ and $r \in \mathbb{N} \cup \{0\}$ such that
    $$\Big| \langle x |\, \underbrace{U_M(\beta^*) \cdots U_M(\beta^*)}_{r \text{ times}}\, | y \rangle \Big| > 0.$$
If both $U_M$ and $U_P$ correspond to the time evolution under some Hamiltonians $H_M, H_P$, i.e., $U_M = e^{-i\beta H_M}$ and $U_P = e^{-i\gamma H_P}$, the approach can be termed “Hamiltonian-based QAOA” (H-QAOA). If the Hamiltonians $H_M, H_P$ are sums of (polynomially many) local terms, this represents a sub-class termed “local Hamiltonian-based QAOA” (LH-QAOA).
In practice, it is not possible to implement $U_M$ or $U_P$ directly. It is necessary to decompose the evolution into smaller pieces, which means that instead of applying $e^{-it(H_1+H_2)}$, one can only apply $e^{-itH_1}$ and $e^{-itH_2}$. This process is typically referred to as “Trotterization”. As an example, the simplest Suzuki–Trotter decomposition, or exponential product formula [4,5], is given by
$$e^{x(H_1+H_2)} = e^{xH_1}\, e^{xH_2} + \mathcal{O}(x^2),$$
where x is a parameter and $H_1, H_2$ are two operators with commutation relation $[H_1, H_2] \neq 0$. Higher-order formulas can be found, for instance, in [4].
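The quadratic scaling of the first-order error is easy to check numerically. The following is a minimal sketch (assumed code, not from the paper's repository) using random Hermitian matrices:

```python
# Minimal sketch: the first-order Trotter error || e^{x(H1+H2)} - e^{xH1} e^{xH2} ||
# shrinks roughly by a factor of four each time x is halved, i.e., it is O(x^2).
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H1, H2 = (A + A.conj().T) / 2, (B + B.conj().T) / 2  # two non-commuting Hermitian operators

for x in (0.1, 0.05, 0.025):
    err = np.linalg.norm(expm(x * (H1 + H2)) - expm(x * H1) @ expm(x * H2))
    print(f"x = {x:5.3f}, error = {err:.2e}")
```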
Practical algorithms need to be defined using a few operators from a universal gate set, e.g., $\{U_3, CX\}$, where
$$U_3(\theta, \phi, \lambda) = \begin{pmatrix} \cos(\theta/2) & -e^{i\lambda}\sin(\theta/2) \\ e^{i\phi}\sin(\theta/2) & e^{i(\phi+\lambda)}\cos(\theta/2) \end{pmatrix}, \qquad CX = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$
A good (and simple) indicator for the complexity of a quantum algorithm is given by the number of required C X gates. Overall, the most efficient algorithm is the one that provides the best accuracy in a given time [6].
Remark 1
(Repeated mixers). If U M is the exponential of a Hermitian matrix, the parameter r in Equation (6) does not matter, as it can be absorbed as a re-scaling of β. However, if  U M is Trotterized, this can lead to missing transitions. In this case, r > 1 can again provide these transitions. It is therefore suggested in [2] to repeat mixers within one mixing step. For this reason, we will consider the cost of Trotterized mixers including the necessary repetitions to provide transitions for all feasible states.

2. Related Work

The QAOA was introduced in [1], where it was applied to the Max-Cut problem. The authors in [7] compared the QAOA to the classical AKMAXSAT solver, extrapolated from small to large instances, and estimated that a quantum speed-up can be obtained with (several) hundreds of qubits. A general overview of variational quantum algorithms, including challenges and how to overcome them, is provided in [8,9]. A key challenge is that it is in general hard to find good parameters; it has been shown that the corresponding training problem is in general NP-hard [10]. Another obstacle is so-called barren plateaus, i.e., regions in the training landscape where the loss function is effectively constant [9]. This phenomenon can be caused by random initializations, noise, and over-expressibility of the ansatz [11,12].
Since its inception, several extensions/variants of the QAOA have been proposed. ADAPT-QAOA [13] is an iterative, problem-tailored version of QAOA that can adapt to specific hardware constraints. A non-local version, referred to as R-QAOA [14], recursively removes variables from the Hamiltonian until the remaining instance is small enough to be solved classically. Numerical evidence shows that this procedure significantly outperforms standard QAOA for frustrated Ising models on random three-regular graphs for the Max-Cut problem. WS-QAOA [15] uses solutions obtained by classical algorithms to warm-start the QAOA. Numerical evidence shows an advantage at low depth, in the form of a systematic increase in the size of the obtained cut for fully connected graphs with random weights.
There are two principal ways to take constraints into account when solving Equation (1) with the QAOA. The standard, simple approach is to penalize unsatisfied constraints in the objective function with the help of a so-called Lagrange multiplier λ , leading to
$$x^* = \underset{x \in \{0,1\}^n}{\arg\min} \; f(x) + \lambda\, g(x).$$
This approach is popular, since it is straightforward to define a phase-separating Hamiltonian for $f(x) + \lambda g(x)$. Some applications include the tail-assignment problem [16], the Max-k-Cut problem [17], graph coloring problems, and the traveling salesperson problem [18]. A downside of this approach is that infeasible solutions are also possible outcomes, especially for approximate solvers such as QAOA. This also makes the search space much bigger and the entire approach less efficient. In addition, the quality of the results turns out to be very sensitive to the chosen value of the hyperparameter $\lambda$. On one hand, $\lambda$ should be chosen large enough such that the lowest eigenstates of $H_P$ correspond to feasible solutions. On the other hand, too large values of $\lambda$ mean that the resulting optimization landscape in the $\gamma$ parameter has very high frequencies, which makes the problem hard to solve in practice. In general, it can be very challenging to find the (problem-dependent) value of $\lambda$ that best balances the tradeoff between optimality and feasibility in the objective function [19].
For QAOA, a second approach is to define mixers that have zero probability of going from a feasible state to an infeasible one, making the hyperparameter $\lambda$ of the previous approach unnecessary. However, it is generally more challenging to devise mixers that take constraints into account. The most prominent example in the literature is the XY-mixer [2,18,19], which constrains the evolution to states with nonzero overlap with “one-hot” states. One-hot states are computational basis states with exactly one entry equal to one. For instance, |0001⟩ and |010000⟩ are one-hot states, while |00⟩ and |110⟩ are not. The name XY-mixer comes from the related XY-Hamiltonian [20]. The mixers derived in the literature follow the physicists' intuition of using “hopping” terms. A performance analysis of the XY-mixer applied to the maximum k-vertex cover shows a heavy dependence on the initial states as well as on the chosen Trotterization [21].
QAOA can be viewed as a discretized version of quantum annealing. In quantum annealing, enforcing constraints via penalty terms is particularly “harmful”, since they often require all-to-all connectivity of the qubits [22]. The authors in [23] therefore introduce driver Hamiltonians that commute with the constraints of the problem. This bears similarities with and actually inspired the approaches in [2,18].
The main contributions of this article are:
  • A general framework to construct mixers restricted to a set of computational basis states; see Section 3.1.
  • An analysis of the underlying mathematical structure, which is largely independent of the actual states; see Section 3.2.
  • Efficient algorithms for decomposition into basis gates; see Section 3.3 and Section 3.5.
  • Valid Trotterizations, a topic that is not completely understood in the literature; see Section 3.5.
  • We prove that it is always possible to realize a valid Trotterization; see Theorem 3.
  • Improved efficiency of Trotterized mixers for “one-hot” states in Section 5.1.
  • Discussion of the general case, exemplified in Section 5.2.
We start by describing the general framework.

3. Construction of Constraint Preserving Mixers

In the following, we will derive a general framework for mixers that are restricted to a subspace given by certain basis states. For example, one may want to construct a mixer for five qubits that is restricted to the subspace $\mathrm{Sp}\{|01001\rangle, |11001\rangle, |11110\rangle\}$ of $(\mathbb{C}^2)^{\otimes 5}$, where $\mathrm{Sp}\,B$ denotes the linear span of B. In this section, we will describe the conditions for a Hamiltonian-based QAOA mixer to preserve the feasible subspace and to provide transitions between all pairs of feasible states. We also provide efficient algorithms to decompose these mixers into basis gates.

3.1. Conditions on the Mixer Hamiltonian

Theorem 1
(Mixer Hamiltonians for subspaces). Given a feasible subspace B as in Definition 1 and a real-valued transition matrix $T \in \mathbb{R}^{|J| \times |J|}$. Then, for the mixer constructed via
$$U_M(\beta) = e^{-i\beta H_M}, \qquad \text{where} \qquad H_M = \sum_{j,k \in J} (T)_{j,k}\, |x_j\rangle\langle x_k|,$$
the following statements hold.
  • If T is symmetric, the mixer is well defined and preserves the feasible subspace, i.e., condition (5) is fulfilled.
  • If T is symmetric and for all $1 \le j,k \le |J|$ there exists an $r \in \mathbb{N} \cup \{0\}$ (possibly depending on the pair) such that
    $$(T^r)_{j,k} \neq 0,$$
    then $U_M$ provides transitions between all pairs of feasible states, i.e., condition (6) is fulfilled.
Proof. 
Well-definedness. Almost trivially, $H_M$ is Hermitian if T is symmetric:
$$H_M^\dagger = \sum_{j,k \in J} (T)_{j,k}\, |x_k\rangle\langle x_j| = \sum_{j,k \in J} (T)_{k,j}\, |x_k\rangle\langle x_j| = H_M.$$
Since $H_M$ is a Hermitian (and therefore normal) matrix, there exists a diagonal matrix D, whose diagonal entries are the (real-valued) eigenvalues of $H_M$, and a matrix U, whose columns are the corresponding orthonormal eigenvectors. The mixer is therefore well defined through the convergent series
$$e^{-itH_M} = \sum_{m=0}^{\infty} \frac{(-it)^m H_M^m}{m!} = U e^{-itD} U^\dagger.$$
Reformulations. We can rewrite $H_M$ in the following way:
$$H_M : |y\rangle \mapsto \sum_{j,k \in J} (T)_{j,k}\, \langle x_k | y \rangle\, |x_j\rangle = E\, T\, E^T\, |y\rangle, \qquad |y\rangle \in \mathbb{C}^{2^n},$$
where the columns of the matrix $E \in \mathbb{R}^{2^n \times |J|}$ consist of the feasible computational basis states, i.e., $E = [\,x_j\,]_{j \in J}$; see Figure 1 for an illustration.
Using that $E^T E = I \in \mathbb{R}^{|J| \times |J|}$ is the identity matrix, we have that
$$H_M^m = E\, T^m E^T = \sum_{j,k \in J} (T^m)_{j,k}\, |x_j\rangle\langle x_k|, \qquad m \in \mathbb{N},$$
and Equation (13) can be written as
$$e^{-itH_M} = E \left( \sum_{m=0}^{\infty} \frac{(-it)^m T^m}{m!} \right) E^T.$$
Preservation of the feasible subspace. Let $|v\rangle \in \mathrm{Sp}\,B$. Using Equation (15), we know that
$$H_M^m |v\rangle = \sum_{j,k \in J} (T^m)_{j,k}\, |x_j\rangle\langle x_k | v\rangle = \sum_{j \in J} c_j\, |x_j\rangle \in \mathrm{Sp}\,B,$$
with coefficients $c_j \in \mathbb{C}$. Therefore, also $e^{-itH_M}|v\rangle \in \mathrm{Sp}\,B$ for all $t \in \mathbb{R}$, since it is a sum of such terms.
Transitions between all pairs of feasible states. For any pair of feasible computational basis states $|x_{j^*}\rangle, |x_{k^*}\rangle \in B$, we have that
$$f(t) = \langle x_{j^*} | U_M(t) | x_{k^*} \rangle = \langle x_{j^*} |\, \sum_{m=0}^{\infty} \frac{(-it)^m}{m!} \sum_{j,k \in J} (T^m)_{j,k}\, |x_j\rangle\langle x_k|\, | x_{k^*} \rangle = \sum_{m=0}^{\infty} \frac{(-it)^m}{m!}\, (T^m)_{j^*,k^*}.$$
It is enough to show that f(t) is not the zero function. Since $f : \mathbb{R} \to \mathbb{C}$ is an analytic function, it has a unique extension to $\mathbb{C}$. Assume that f is indeed the zero function on $\mathbb{R}$; then, the extension to $\mathbb{C}$ would also be the zero function, and all coefficients of its Taylor series would be zero. However, we assumed the existence of an $r \in \mathbb{N} \cup \{0\}$ such that $|(T^r)_{j^*,k^*}| > 0$, and hence there exists a nonzero coefficient, which is a contradiction to f being the zero function.   □
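The construction $H_M = E\,T\,E^T$ is easy to play with numerically. The following is a minimal sketch (assumed code, not the authors' implementation) that builds $H_M$ for a small feasible set and checks that the exponential stays inside the feasible subspace:

```python
# Minimal sketch of Theorem 1: build H_M = E T E^T for a feasible set B and
# verify numerically that exp(-i*beta*H_M) maps Sp(B) into Sp(B).
import numpy as np
from scipy.linalg import expm

def basis_state(bits):
    """Column vector |z> for a bitstring z (leftmost character = most significant)."""
    v = np.zeros(2 ** len(bits))
    v[int("".join(map(str, bits)), 2)] = 1.0
    return v

B = [(0, 1, 0, 0, 1), (1, 1, 0, 0, 1), (1, 1, 1, 1, 0)]
E = np.column_stack([basis_state(z) for z in B])        # shape (2^n, |J|)
T = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # symmetric transition matrix
H_M = E @ T @ E.T

U_M = expm(-1j * 0.7 * H_M)
v = E @ np.array([0.3, -0.5, 0.8])        # an arbitrary vector in Sp(B)
w = U_M @ v
print(np.allclose(E @ (E.T @ w), w))      # True: w still lies in Sp(B)
```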
A natural question is how the statements in Theorem 1 depend on the particular ordering of the elements of B.
Corollary 1
(Independence of the ordering of B). Statements in Theorem 1 that hold for a particular ordering of the computational basis states of a given B also hold for any permutation $\pi : \{1, \ldots, |J|\} \to \{1, \ldots, |J|\}$, i.e., they are independent of the ordering of the elements. For each ordering, the transition matrix T changes according to $T_\pi = P_\pi^T\, T\, P_\pi$, where $P_\pi$ is the permutation matrix associated with π.
Proof. 
We start by pointing out that the inverse of $P_\pi$ exists and can be written as $P_\pi^{-1} = P_{\pi^{-1}} = P_\pi^T$.
The resulting matrix $H_M$ is unchanged. Following the derivation in Equation (14), we have that $H_M^\pi = E_\pi\, T_\pi\, E_\pi^T$, where the columns of the matrix $E_\pi \in \mathbb{R}^{2^n \times |J|}$ consist of the permuted feasible computational basis states, i.e., $E_\pi = [\,x_{\pi(j)}\,]_{j \in J}$. Inserting $T_\pi = P_\pi^T T P_\pi$, we indeed have $H_M^\pi = E_\pi T_\pi E_\pi^T = (E_\pi P_\pi^T)\, T\, (P_\pi E_\pi^T) = E\, T\, E^T = H_M$.
$T_\pi$ is symmetric if T is. Assuming that $T^T = T$, we also have
$$(T_\pi)^T = (P_\pi^T T P_\pi)^T = P_\pi^T T^T P_\pi = P_\pi^T T P_\pi = T_\pi.$$
If the condition in Equation (11) holds for T, then it also holds for $T_\pi$. Using $T_\pi^r = P_\pi^T T^r P_\pi$, we can show that Equation (11) holds for the permuted index pair $(\pi(j), \pi(k))$ for $T_\pi$ if it holds for $(j,k)$ for T.    □
In the following, if nothing else is remarked, computational basis states are ordered with respect to increasing integer value, e.g., $|001\rangle, |010\rangle, |111\rangle$.
Apart from special cases, there is a lot of freedom to choose the transition matrix T that fulfills the conditions of Theorem 1. The entries of T will heavily influence the circuit complexity, which will be investigated in Section 3.3. In addition, we have the following property which adds additional flexibility to develop efficient mixers.
Corollary 2
(Properties of mixers). For a given feasible subspace $\mathrm{Sp}\,B$, let $U_{M,B}$ be the mixer given by Theorem 1. For any subspace $\mathrm{Sp}\,C$ with $\mathrm{Sp}\,B \cap \mathrm{Sp}\,C = \{0\}$, or equivalently $B \cap C = \emptyset$, also $U_M = U_{M,B}\, U_{M,C}$ is a valid mixer for B satisfying the conditions of Equations (5) and (6); see also Figure 2.
Proof. 
Any $|v\rangle \in \mathrm{Sp}\,B$ is in the null space of $H_{M,C}$, i.e., $H_{M,C}|v\rangle = 0$, and hence $U_{M,C}|v\rangle = |v\rangle$. Therefore, $U_{M,B} U_{M,C}|v\rangle = U_{M,B}|v\rangle \in \mathrm{Sp}\,B$, and $U_{M,C} U_{M,B}|v\rangle = U_{M,C}|w\rangle = |w\rangle$ with $|w\rangle \in \mathrm{Sp}\,B$, which means that the feasible subspace is preserved. Condition (6) follows similarly from the fact that $U_{M,C}$ acts as the identity on any $|v\rangle \in \mathrm{Sp}\,B$.    □
Corollary 2 naturally holds as well for any linear combination of mixers, i.e., $H_{M,B} + \sum_i a_i H_{M,C_i}$ is a mixer for the feasible subspace $\mathrm{Sp}\,B$ as long as $\mathrm{Sp}\,C_i \cap \mathrm{Sp}\,B = \{0\}$ for all i. At first, it might sound counterintuitive that adding more terms to the mixer can result in a more efficient decomposition into basis gates. However, as we will see in Section 5, it can lead to cancellations due to symmetry considerations.
Next, we describe the structure of the eigensystem of U M .
Corollary 3
(Eigensystem of mixers). Given the setting in Theorem 1 with a symmetric transition matrix T. Let $(\lambda, v)$ be an eigenpair of T; then $(\lambda, Ev)$ is an eigenpair of $H_M$ and $(e^{-it\lambda}, Ev)$ is an eigenpair of $U_M$, where $E = [\,|x_j\rangle\,]_{j \in J}$ as defined in Equation (14).
Proof. 
Let $(\lambda, v)$ be an eigenpair of T. Then, $H_M E v = E\, T\, E^T E\, v = E\, T\, v = \lambda\, E v$, so $(\lambda, Ev)$ is an eigenpair of $H_M$. The connection between $H_M$ and $U_M$ is general knowledge from linear algebra.    □
An example illustrating Corollary 3 is provided by the transition matrix $T \in \mathbb{R}^{4 \times 4}$ with zero diagonal and all other entries equal to one. A unit eigenvector of T, which fulfills Theorem 1, is $v = \tfrac{1}{2}(1,1,1,1)^T$. For any $B = \{|z_1\rangle, |z_2\rangle, |z_3\rangle, |z_4\rangle\}$, the uniform superposition of these states is an eigenvector, since
$$\frac{1}{\|v\|_2}\, E\, v = \frac{1}{2}\left( |z_1\rangle,\ |z_2\rangle,\ |z_3\rangle,\ |z_4\rangle \right) (1,1,1,1)^T = \frac{1}{2}\left( |z_1\rangle + |z_2\rangle + |z_3\rangle + |z_4\rangle \right).$$
This result holds irrespective of what the states are and which dimension they have.
Theorem 2
(Products of mixers for subspaces). Given the same setting as in Theorem 1. For any decomposition of T into a sum of Q symmetric matrices $T_q$, in the sense that
$$T = \sum_{q=1}^{Q} T_q, \qquad (T_q)_{i,j} = (T_q)_{j,i} = \text{either } (T)_{i,j} \text{ or } 0,$$
we construct the mixing operator via
$$U_M(\beta) = \prod_{n=1}^{N} e^{-i\beta T_{q_n}}, \qquad q_n \in \{1, 2, \ldots, Q\}.$$
If all entries of T are positive, then $U_M$ provides transitions between all pairs of feasible states, i.e., condition (6) is fulfilled, if for all $1 \le j,k \le |J|$ there exist $r_m \in \mathbb{N} \cup \{0\}$ (possibly depending on the pair) such that
$$\left( \prod_{m=1}^{M} T_{q_m}^{\,r_m} \right)_{j,k} \neq 0, \qquad q_m \in \{1, \ldots, Q\}.$$
Proof. 
Combining Equations (15) and (16), we have
$$\langle x_j | U_M(\beta) | x_k \rangle = \sum_{j_1, j_2, \ldots, j_M = 0}^{\infty} \frac{(-it)^{j_1 + j_2 + \cdots + j_M}}{j_1!\, j_2! \cdots j_M!} \left( T_{q_1}^{\,j_1} T_{q_2}^{\,j_2} \cdots T_{q_M}^{\,j_M} \right)_{j,k} = \sum_{j=0}^{\infty} \frac{(-it)^{j}}{j!} \sum_{\substack{j_1, \ldots, j_M \\ j_1 + \cdots + j_M = j}} \frac{j!}{j_1! \cdots j_M!} \left( T_{q_1}^{\,j_1} \cdots T_{q_M}^{\,j_M} \right)_{j,k}.$$
Using that T only has positive entries and the condition in Equation (20), the same argument as in Theorem 1 can be used to show that this overlap is not the zero function, and therefore, we have transitions between all pairs of feasible states.    □
As Theorem 1 leaves a lot of freedom for choosing valid transition matrices, we will continue by describing important examples for T.

3.2. Transition Matrices for Mixers

Theorem 1 provides conditions for the construction of mixer Hamiltonians that preserve the feasible subspace and provide transitions between all pairs of feasible computational basis states, namely
  • $T \in \mathbb{R}^{|J| \times |J|}$ is symmetric; and
  • for all $1 \le j,k \le |J|$ there exists an $r_{j,k} \in \mathbb{N} \cup \{0\}$ such that $(T^{r_{j,k}})_{j,k} \neq 0$.
Remarkably, these conditions depend only on the dimension of the feasible subspace $|J| = \dim(\mathrm{Sp}\,B) = |B|$; they are independent of the specific states that constitute B. In addition, Corollary 1 shows that these conditions are robust with respect to a reordering of the rows if the columns are reordered in the same way. Moreover, Equation (17) also shows that the overlap between computational basis states $|x_j\rangle, |x_k\rangle \in B$ is independent of the specific states that B consists of and depends only on T, since the right-hand side of the expression
$$\langle x_j | U_M(t) | x_k \rangle = \sum_{m=0}^{\infty} \frac{(-it)^m}{m!}\, (T^m)_{j,k}$$
is independent of the elements in B. This allows us to describe and analyze valid transition matrices knowing only the number of feasible states, i.e., $|B|$. What these specific states are is irrelevant, unless one wants to determine an optimal mixer, which we come back to in Section 3.4. Figure 3 provides a comparison of some of the mixers described in the following with respect to the overlap between different states.
In the following, we denote the matrix for pairs of indices whose binary representations have a Hamming distance equal to d as
$$T_{\mathrm{Ham}(d)}, \quad \text{with} \quad \left(T_{\mathrm{Ham}(d)}\right)_{i,j} = \begin{cases} 1, & \text{if } d_{\mathrm{Hamming}}\!\left(\mathrm{bin}(i), \mathrm{bin}(j)\right) = d, \\ 0, & \text{else}. \end{cases}$$
Examples of the structure of T Ham ( d ) can be found in Figure 4.
Furthermore, it will be useful to denote the matrix which has its two nonzero entries at (k,l) and (l,k) as
$$T^{k \leftrightarrow l}, \quad \text{with} \quad \left(T^{k \leftrightarrow l}\right)_{i,j} = \begin{cases} 1, & \text{if } (i,j) = (k,l) \text{ or } (i,j) = (l,k), \\ 0, & \text{else}. \end{cases}$$
Before we start, we point out that the diagonal entries of T can be chosen to be zero, because $|(T^0)_{j,j}| = 1 \neq 0$ for all $j \in J$. Although trivial, we will repeatedly use that $v = \tfrac{1}{\sqrt{|J|}}(1,1,\ldots,1)^T$ is an eigenvector of a matrix $F \in \mathbb{C}^{|J| \times |J|}$ if the vector of row sums is a multiple of v, i.e., if all row sums are equal.
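For concreteness, the following is a minimal sketch (assumed code, not from the paper's repository) of how these transition matrices can be constructed; note that Python indices are 0-based, whereas the text uses 1-based indices:

```python
# Minimal sketch: the Hamming-distance-d matrix T_Ham(d) and the single-pair
# matrix T^{k<->l} introduced above, as plain NumPy arrays.
import numpy as np

def t_hamming(d, size):
    T = np.zeros((size, size))
    for i in range(size):
        for j in range(size):
            if bin(i ^ j).count("1") == d:   # Hamming distance of binary representations
                T[i, j] = 1.0
    return T

def t_pair(k, l, size):
    T = np.zeros((size, size))
    T[k, l] = T[l, k] = 1.0
    return T

print(t_hamming(1, 4))   # the pattern behind the "standard" mixer for |J| = 4
print(t_pair(0, 2, 4))   # a single symmetric transition between states 0 and 2
```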

3.2.1. Hamming Distance One Mixer T Ham ( 1 )

The matrix $T_{\mathrm{Ham}(1)} \in \mathbb{R}^{|J| \times |J|}$ fulfills Theorem 1 when $|J| = 2^n$, $n \in \mathbb{N}$. The symmetry of $T_{\mathrm{Ham}(1)}$ is due to the fact that the Hamming distance is a symmetric function. Using the identity
$$T_{\mathrm{Ham}(k)}\, T_{\mathrm{Ham}(1)} = T_{\mathrm{Ham}(1)}\, T_{\mathrm{Ham}(k)} = \left(n - (k-1)\right) T_{\mathrm{Ham}(k-1)} + (k+1)\, T_{\mathrm{Ham}(k+1)},$$
it can be shown that
$$T_{\mathrm{Ham}(1)}^{\,k} = \sum_{j=1}^{k-1} c_j\, T_{\mathrm{Ham}(j)} + k!\, T_{\mathrm{Ham}(k)},$$
where the $c_j$ are real coefficients. Therefore, it is clear that $T_{\mathrm{Ham}(1)}^{\,k}$ reaches all states with Hamming distance k. Furthermore, $v = \tfrac{1}{\sqrt{2^n}}(1,1,\ldots,1)^T$ is a unit eigenvector of $T_{\mathrm{Ham}(1)}$, since the sum of each row is n. This is because, for each bitstring, there are exactly n other states with a Hamming distance of one.

3.2.2. All-to-All Mixer T A

We denote the matrix with all but the diagonal entries equal to one as
$$T_A, \quad \text{with} \quad (T_A)_{i,j} = \begin{cases} 1, & \text{if } i \neq j, \\ 0, & \text{else}. \end{cases}$$
Trivially, $T_A \in \mathbb{R}^{|J| \times |J|}$ fulfills Theorem 1, and $v = \tfrac{1}{\sqrt{|J|}}(1,1,\ldots,1)^T$ is a unit eigenvector of $T_A$, since the sum of each row is $|J| - 1$.

3.2.3. (Cyclic) Nearest Integer Mixer T Δ / T Δ , c

Inspired by the stencil of finite-difference methods, we introduce $T_\Delta, T_{\Delta,c} \in \mathbb{R}^{|J| \times |J|}$ as the matrices with entries on the first off-diagonals equal to one,
$$(T_\Delta)_{i,j} = \begin{cases} 1, & \text{if } i = j+1 \text{ or } i = j-1, \\ 0, & \text{else}, \end{cases} \qquad (T_{\Delta,c})_{i,j} = \begin{cases} 1, & \text{if } i = (j+1) \bmod n \text{ or } i = (j-1) \bmod n, \\ 0, & \text{else}. \end{cases}$$
Both matrices fulfill Theorem 1. Symmetry holds by definition, and it is easy to see that the k-th off-diagonal of $T_\Delta^{\,k}$ and $T_{\Delta,c}^{\,k}$ is nonzero for $1 \le k \le |J|$.
For the nearest integer mixer $T_\Delta$, it is known that
$$v_k = \left( \sin(c), \sin(2c), \ldots, \sin(|J|\,c) \right)^T, \qquad c = \frac{k\pi}{|J|+1},$$
are eigenvectors for $1 \le k \le |J|$. For the cyclic nearest integer mixer, the sum of each row/column of $T_{\Delta,c}$ is equal to two (except for n = 1, when it is one). Therefore, $v = \tfrac{1}{\sqrt{|J|}}(1,1,\ldots,1)^T$ is a unit eigenvector.

3.2.4. Products of Mixers and T E , T O

In some cases, it will be necessary to use Theorem 2 to implement mixer unitaries. When splitting transition matrices into odd and even entries, the following definition is useful. Denote the matrix with entries on the d-th off-diagonal for even rows equal to one as
$$T_{E(d)}, \quad \text{with} \quad \left(T_{E(d)}\right)_{i,j} = \begin{cases} 1, & \text{if } i = j+d \text{ or } i = j-d, \text{ and } i \text{ even}, \\ 0, & \text{else}, \end{cases}$$
and accordingly $T_{O(d)}$ for odd rows. In addition, we will use $T_{O(1),c}$ for the cyclic version, defined in the same way as in Equation (28). As an example, this allows one to decompose $T_{\Delta,c} = T_1 + T_2 \in \mathbb{R}^{n \times n}$ with $T_1 = T_{O(1)} + T_{O(n-1)} = T_{O(1),c}$ and $T_2 = T_{E(1)}$; a small sketch of this splitting follows below.
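The splitting of the cyclic nearest-integer matrix into two sets of disjoint pairs can be checked with a few lines of code. This is a minimal sketch (assumed code, with 0-based indices) of one such parity-partitioned realization:

```python
# Minimal sketch: split the ring 0-1-2-...-(n-1)-0 into two sets of disjoint
# edges T1 ("even" edges) and T2 ("odd" edges) whose sum is the cyclic matrix.
import numpy as np

def t_pair(k, l, size):
    T = np.zeros((size, size))
    T[k, l] = T[l, k] = 1.0
    return T

n = 6
T1 = sum(t_pair(i, (i + 1) % n, n) for i in range(0, n, 2))
T2 = sum(t_pair(i, (i + 1) % n, n) for i in range(1, n, 2))

T_cyclic = np.zeros((n, n))
for i in range(n):
    T_cyclic[i, (i + 1) % n] = T_cyclic[(i + 1) % n, i] = 1.0

print(np.array_equal(T1 + T2, T_cyclic))  # True (for even n)
```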

3.2.5. Random Mixer T rand

Finally, the upper triangular entries of the mixer T rand are drawn from a continuous uniform distribution on the interval [ 0 , 1 ] , and the lower triangular entries are chosen such that T becomes symmetric. Since the probability of getting a zero entry is zero, such a random mixer fulfills Theorem 1 with probability 1.

3.3. Decomposition of (Constraint) Mixers into Basis Gates

Given a set of feasible (computational basis) states $B = \{|x_j\rangle,\ j \in J,\ x_j \in \{0,1\}^n\}$, we can use Theorem 1 to define a suitable mixer Hamiltonian. The next question is how to (efficiently) decompose the resulting mixer into basis gates. In order to do so, we first decompose the Hamiltonian $H_M$ into a weighted sum of Pauli-strings. A Pauli-string P is a Hermitian operator of the form $P = P_1 \otimes \cdots \otimes P_n$, where $P_i \in \{I, X, Y, Z\}$. Pauli-strings form a basis of the real vector space of all n-qubit Hermitian operators. Therefore, we can write
$$H_M = \sum_{i_1, \ldots, i_n = 1}^{4} c_{i_1, \ldots, i_n}\, \sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}, \qquad c_{i_1, \ldots, i_n} \in \mathbb{R},$$
with real coefficients $c_{i_1, \ldots, i_n}$, where $\sigma_1 = I$, $\sigma_2 = X$, $\sigma_3 = Y$, $\sigma_4 = Z$. After applying a standard Trotterization scheme [4,5] (which is exact for commuting Pauli-strings),
$$U_M(t) = e^{-itH_M} \approx \prod_{\substack{i_1, \ldots, i_n = 1 \\ |c_{i_1, \ldots, i_n}| > 0}}^{4} e^{-it\, c_{i_1, \ldots, i_n}\, \sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}},$$
it is well established how to implement each of the terms of the product using basis gates; see Equation (33). We will discuss the effects of Trotterization in more detail in Section 3.5, as there are several important aspects to consider for a valid mixer.
[Figure in the published version (Equation (33)): quantum circuit implementing $e^{-it\,c\,\sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}}$ with basis gates, using the single-qubit basis changes $U_i$ defined below.]
$$U_i = \begin{cases} H, & \text{if } P_i = X, \\ S\,H, & \text{if } P_i = Y, \\ I, & \text{if } P_i = Z, \end{cases} \qquad \left(U^\dagger\right)_i = \begin{cases} H, & \text{if } P_i = X, \\ H\,S^\dagger, & \text{if } P_i = Y, \\ I, & \text{if } P_i = Z. \end{cases}$$
Here, S is the S or Phase gate and H is the Hadamard gate. The standard way to compute the coefficients c i 1 , , i n is given in Algorithm 1.  
Algorithm 1: Decompose H M given by Equation (10) into Pauli-strings via trace
[Algorithm 1 appears as a figure in the published version: it computes the coefficients of the Pauli-string decomposition of $H_M$ via the trace, $c_{i_1,\ldots,i_n} = \operatorname{Tr}\!\left(\sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}\, H_M\right)/2^n$.]
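Since the pseudocode itself is only available as a figure, the following is a minimal sketch (assumed code, not the authors' implementation) of this trace-based decomposition:

```python
# Minimal sketch of the trace-based Pauli decomposition: for a Hermitian H on
# n qubits, every coefficient is c_P = Tr(P @ H) / 2^n, and H = sum_P c_P * P.
import itertools
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_decompose(H):
    """Return {Pauli-string: coefficient}, looping over all 4^n strings."""
    n = int(np.log2(H.shape[0]))
    coeffs = {}
    for labels in itertools.product("IXYZ", repeat=n):
        P = np.array([[1.0]], dtype=complex)
        for l in labels:
            P = np.kron(P, PAULIS[l])
        c = np.trace(P @ H).real / 2 ** n
        if abs(c) > 1e-12:
            coeffs["".join(labels)] = c
    return coeffs

# Example: |01><10| + |10><01| on two qubits decomposes into (XX + YY)/2.
H = np.zeros((4, 4), dtype=complex)
H[1, 2] = H[2, 1] = 1.0
print(pauli_decompose(H))  # {'XX': 0.5, 'YY': 0.5}
```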
For n qubits, this requires computing $4^n$ coefficients, as well as multiplications of $2^n \times 2^n$ matrices. However, most of these terms are expected to vanish. We therefore describe an alternative way to produce this decomposition, using the language of quantum mechanics [24]. In the following, we use the ladder operators appearing in the creation and annihilation operators of the second quantization formulation in quantum chemistry, defined by
$$a = \tfrac{1}{2}\left(X + iY\right), \qquad a^\dagger = \tfrac{1}{2}\left(X - iY\right).$$
Since $a|0\rangle = 0$ and $a|1\rangle = |0\rangle$, where 0 is the zero vector, we have that $|0\rangle\langle 1| = a$. Since $a^\dagger|0\rangle = |1\rangle$ and $a^\dagger|1\rangle = 0$, we have that $|1\rangle\langle 0| = a^\dagger$; finally, $a^\dagger a|0\rangle = 0$, $a^\dagger a|1\rangle = |1\rangle$, $a a^\dagger|0\rangle = |0\rangle$, $a a^\dagger|1\rangle = 0$ means that $|0\rangle\langle 0| = a a^\dagger$ and $|1\rangle\langle 1| = a^\dagger a$. Note that
$$a^\dagger a = \tfrac{1}{2}\left(I - Z\right), \qquad a a^\dagger = \tfrac{1}{2}\left(I + Z\right).$$
As an example, consider the matrix $M = |01\rangle\langle 10| = |0\rangle\langle 1| \otimes |1\rangle\langle 0|$, which can be expressed with ladder operators as $M = a_1\, a_2^\dagger$. Another example is given by $M = |01\rangle\langle 11| = a_1\, a_2^\dagger a_2$. This approach clearly extends to the general case and leads to Algorithm 2.
Algorithm 2: Decompose H M given by Equation (10) into Pauli-strings directly
[Algorithm 2 appears as a figure in the published version: for each nonzero entry $(T)_{j,k}$, it expresses $|x_j\rangle\langle x_k| + |x_k\rangle\langle x_j|$ qubit-wise via the ladder operators above and expands the result symbolically into Pauli-strings.]
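In the same spirit, the following is a minimal sketch (assumed code, not the published Algorithm 2) that expands $|z\rangle\langle w| + |w\rangle\langle z|$ directly into Pauli-strings by writing each single-qubit factor in the Pauli basis, e.g. $|0\rangle\langle 1| = \tfrac{1}{2}(X + iY)$, and multiplying out the tensor product symbolically:

```python
# Minimal sketch of a ladder-operator-style decomposition: represent each
# single-qubit factor |z_i><w_i| as a dictionary {Pauli label: coefficient}
# and expand the tensor product, collecting the surviving Pauli-strings.
import itertools
from collections import defaultdict

SINGLE = {
    (0, 0): {"I": 0.5, "Z": 0.5},    # |0><0| = (I + Z)/2
    (0, 1): {"X": 0.5, "Y": 0.5j},   # |0><1| = (X + iY)/2
    (1, 0): {"X": 0.5, "Y": -0.5j},  # |1><0| = (X - iY)/2
    (1, 1): {"I": 0.5, "Z": -0.5},   # |1><1| = (I - Z)/2
}

def ketbra_paulis(z, w):
    """Pauli decomposition of |z><w| for bit tuples z, w."""
    terms = defaultdict(complex)
    factors = [SINGLE[(zi, wi)] for zi, wi in zip(z, w)]
    for labels in itertools.product(*[f.items() for f in factors]):
        string = "".join(l for l, _ in labels)
        coeff = 1.0 + 0j
        for _, c in labels:
            coeff *= c
        terms[string] += coeff
    return terms

def symmetric_term(z, w):
    """Pauli decomposition of |z><w| + |w><z| (only nonzero strings)."""
    terms = ketbra_paulis(z, w)
    for s, c in ketbra_paulis(w, z).items():
        terms[s] += c
    return {s: c.real for s, c in terms.items() if abs(c) > 1e-12}

print(symmetric_term((0, 1), (1, 0)))  # {'XX': 0.5, 'YY': 0.5}
```

The expansion only involves the nonzero entries of T and small per-qubit dictionaries, which avoids the $4^n$ trace evaluations of the naive approach.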
A comparison of the complexity of the two algorithms is given in Table 1. The naive algorithm needs to perform a matrix–matrix multiplication with matrices of size 2 n × 2 n for each of the 4 n coefficients. This quickly becomes prohibitive for larger n. The algorithm based on ladder operators requires resources that scale with the number of nonzero entries of the transition matrix T, which is much more favorable. In the end, a symbolic mathematics library is used to simplify the expressions in order to create the list of nonzero Pauli-strings.

3.4. Optimality of Mixers

On current NISQ devices, the gate times and error rates of two-qubit gates (CX) are roughly one order of magnitude higher than those of single-qubit gates ($U_3$). In addition, most devices lack all-to-all connectivity; CX gates between unconnected qubits require SWAP operations, which consist of additional CX gates. An optimal mixer will therefore contain as few CX gates as possible. Since Pauli-strings are implemented according to Equation (33), we define the cost to implement $e^{-itH_M}$ as
$$\mathrm{Cost}(H_M) = \sum_{\substack{i_1, \ldots, i_n = 1 \\ |c_{i_1, \ldots, i_n}| > 0 \\ \mathrm{len}(\sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}) > 1}}^{4} 2\left( \mathrm{len}(\sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}) - 1 \right),$$
where $\mathrm{len}(P)$ is the length of a Pauli-string P, defined as the number of literals that are not the identity. For instance, $P = I \otimes X \otimes I \otimes I \otimes Y = I_1 X_2 I_3 I_4 Y_5 = X_2 Y_5$ has $\mathrm{len}(P) = 2$. $\mathrm{Cost}(H_M)$ specifies the number of CX gates required to implement the mixer; a lower cost means fewer and/or shorter Pauli-strings. A small sketch of this cost measure is given below. There are four interconnected factors that influence the cost of implementing the mixer for a given B.
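The following is a minimal sketch (assumed code) of this cost measure for a mixer given as a list of Pauli-strings:

```python
# Minimal sketch: each Pauli-string with more than one non-identity literal
# contributes 2 * (len(P) - 1) CX gates; shorter/fewer strings mean lower cost.
def pauli_length(string):
    """Number of non-identity literals in a Pauli-string such as 'IXIIY'."""
    return sum(1 for p in string if p != "I")

def cost(pauli_strings):
    return sum(2 * (pauli_length(p) - 1)
               for p in pauli_strings if pauli_length(p) > 1)

# The n = 2 "XY" mixer (XX + YY)/2 needs 2 + 2 = 4 CX gates.
print(cost(["XX", "YY"]))  # 4
```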

3.4.1. Transition Matrix T

The larger | B | , the more freedom we have in choosing the transition matrix T that fulfills Theorem 1. The combination of T and the specific states of B define the cost of the Hamiltonian. Unless one can find a way to utilize the structure of the states of B to efficiently compute an optimal T, we expect this problem to be NP-hard. In practice, a careful analysis of the specific states of B is required to determine T such that the cost becomes low. We will revisit optimality for both unrestricted and restricted mixers in Section 4 and Section 5.

3.4.2. Adding Mixers

Corollary 2 allows one to add mixers whose kernel contains $\mathrm{Sp}\,B$. In general, this is also a combinatorial optimization problem, which we do not expect to solve exactly with an efficient algorithm. However, we will provide a heuristic that can be used to reduce the cost of mixers in certain cases. We will provide more details in Section 5, where we discuss constrained mixers for some examples in detail.

3.4.3. Non-Commuting Pauli-Strings

Depending on the mixer—which depends on the transition matrix and addition of mixers outside the feasible subspace—one can influence the commutativity pattern of the resulting Pauli-strings. This is an intricate topic, which we discuss next.

3.5. Trotterizations

Algorithms 1 and 2 produce a weighted sum of Pauli-strings equal to the mixer Hamiltonian $H_M$ defined in Theorem 1. A further complication arises when the non-vanishing Pauli-strings of the mixer Hamiltonian $H_M$ do not all commute. In that case, one cannot realize $U_M$ exactly but has to find a suitable approximation/Trotterization; see Equation (32). Two Pauli-strings commute, i.e., $[P_A, P_B] = P_A P_B - P_B P_A = 0$, if, and only if, they fail to commute on an even number of indices [25]. An example is given in Figure 5.
This problem is similar to a problem for observables: how does one divide the Pauli-strings into groups of commuting families [25,26] to maximize efficiency and increase accuracy? In order to minimize the number of measurements required to estimate a given observable, one wants to find a “min-commuting-partition”; given a set of Pauli-strings from a Hamiltonian, one seeks to partition the strings into commuting families such that the total number of partitions is minimized. This problem is NP-hard in general [25]. However, based on Theorem 3, we expect our problem to be much more tractable.
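The pairwise commutation criterion used above (two Pauli-strings commute if and only if they fail to commute on an even number of indices) is simple to check in code. The following is a minimal sketch (assumed code):

```python
# Minimal sketch: two Pauli-strings commute if and only if they have different
# non-identity letters on an even number of positions.
def strings_commute(p, q):
    anticommuting = sum(1 for a, b in zip(p, q)
                        if a != "I" and b != "I" and a != b)
    return anticommuting % 2 == 0

print(strings_commute("XXI", "YYI"))  # True  (two anti-commuting positions)
print(strings_commute("XXI", "IYY"))  # False (one anti-commuting position)
```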
For our case, it turns out that not all Trotterizations are suitable as mixing operators; they can either fail to preserve the feasible subspace, i.e., Equation (5), or fail to provide transitions between all pairs of feasible states, i.e., Equation (6). An example is given by $B = \{|001\rangle, |010\rangle, |100\rangle\}$ with the mixer $H_M = \tfrac{1}{2}(XXI + YYI) + \tfrac{1}{2}(IXX + IYY)$ associated with $T_\Delta = T^{1\leftrightarrow 2} + T^{2\leftrightarrow 3}$; see Section 5.1. Looking at Figure 5, these terms can be grouped into commuting families in two ways, which represent two (of many) different ways to realize the mixer unitary with basis gates.
  • The first possible Trotterization is given by $U_1(\beta) = e^{-i\beta(XXI + IXX)}$ and $U_2(\beta) = e^{-i\beta(YYI + IYY)}$. However, it turns out that there exist $\beta \in \mathbb{R}$ such that $|\langle 111 | U_1(\beta) U_2(\beta) | z \rangle| > 0$ for $|z\rangle \in B$. This means that this Trotterization does not preserve the feasible subspace and does not represent a valid mixer. The underlying reason for this is that the terms XXI and YYI are generated from the entry $T^{1\leftrightarrow 2}$ but are split in this Trotterization. The same holds true for IXX and IYY, which are generated via $T^{2\leftrightarrow 3}$.
  • The second possible Trotterization is given by $U_1(\beta) = e^{-i\beta(XXI + YYI)}$ and $U_2(\beta) = e^{-i\beta(IXX + IYY)}$, which splits the terms with respect to $T^{1\leftrightarrow 2}$ and $T^{2\leftrightarrow 3}$. In this case, we have that $|\langle 100 | U_1(\beta) U_2(\beta) | 001 \rangle| = 0$, so it does not provide an overlap between all feasible computational basis states. This can be understood via Theorem 2: we have that $\left( (T^{1\leftrightarrow 2})^{n_1} (T^{2\leftrightarrow 3})^{n_2} \right)_{3,1} = 0$ for all $n_1, n_2 \in \mathbb{N}$, so one cannot “reach” |100⟩ from |001⟩. The opposite is not true; we have that $\left( T^{1\leftrightarrow 2}\, T^{2\leftrightarrow 3} \right)_{1,3} = 1$, so there exists a $\beta$ such that $|\langle 001 | U_1(\beta) U_2(\beta) | 100 \rangle| > 0$.
We have just learned that it is a bad idea to split, during Trotterization, terms that belong to the same nonzero entry of T, i.e., to one $T^{i\leftrightarrow j}$. Therefore, we need to show that all non-vanishing Pauli-strings of $|x_j\rangle\langle x_i| + |x_i\rangle\langle x_j|$ commute; otherwise, there might exist subspaces for which we cannot realize the mixer constructed in Theorem 1. Luckily, the following theorem shows that it is always possible to realize a mixer by Trotterizing according to the nonzero entries of $T = \sum_{i,j \in J,\, i<j} T^{i\leftrightarrow j}$.
Theorem 3
(Pauli-strings for $T^{i\leftrightarrow j}$ commute). Let $|z\rangle, |w\rangle$ be two computational basis states in $(\mathbb{C}^2)^{\otimes n}$. Then, all non-vanishing Pauli-strings $\sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}$ of the decomposition
$$|z\rangle\langle w| + |w\rangle\langle z| = \sum_{i_1, \ldots, i_n = 1}^{4} c_{i_1, \ldots, i_n}\, \sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}, \qquad c_{i_1, \ldots, i_n} \in \mathbb{R},$$
commute. 
Proof. 
We will prove the following more general assertion by induction. Let $P_{+,1}, P_{+,2}$ be two non-vanishing Pauli-strings of the decomposition of $|z\rangle\langle w| + |w\rangle\langle z|$, and let $P_{-,1}, P_{-,2}$ be two non-vanishing Pauli-strings of the decomposition of $i\left(|z\rangle\langle w| - |w\rangle\langle z|\right)$. Then, $[P_{+,1}, P_{+,2}] = 0$, $[P_{-,1}, P_{-,2}] = 0$ and $[P_{+,\cdot}, P_{-,\cdot}] \neq 0$. We will use that two Pauli-strings commute if, and only if, they fail to commute on an even number of indices [25].
For n = 1 , we have the following cases.
$$A_1 = |z\rangle\langle w| + |w\rangle\langle z| = \begin{cases} I + Z, & \text{if } (z,w) = (0,0), \\ X, & \text{if } (z,w) = (0,1), \\ X, & \text{if } (z,w) = (1,0), \\ I - Z, & \text{if } (z,w) = (1,1), \end{cases} \qquad B_1 = i\left(|z\rangle\langle w| - |w\rangle\langle z|\right) = \begin{cases} 0, & \text{if } (z,w) = (0,0), \\ -Y, & \text{if } (z,w) = (0,1), \\ +Y, & \text{if } (z,w) = (1,0), \\ 0, & \text{if } (z,w) = (1,1). \end{cases}$$
It is trivially true that $[P_{+,1}, P_{+,2}] = 0$ and $[P_{-,1}, P_{-,2}] = 0$, since the maximum number of Pauli-strings is two, and in that case, one of the Pauli-strings is the identity. Moreover, $P_{-,\cdot}$ is nonzero only when $z \neq w$. In that case, $[P_{+,\cdot}, P_{-,\cdot}] = [X, \pm Y] \neq 0$.
$n \to n+1$. We assume the assertions hold for two computational basis states $|z\rangle, |w\rangle \in (\mathbb{C}^2)^{\otimes n}$. Then, there are the following four cases:
$$A_{n+1} = |zx\rangle\langle wy| + |wy\rangle\langle zx| = \frac{1}{2}\begin{cases} A_n \otimes (I+Z), & \text{if } (x,y) = (0,0), \\ A_n \otimes X + B_n \otimes Y, & \text{if } (x,y) = (0,1), \\ A_n \otimes X - B_n \otimes Y, & \text{if } (x,y) = (1,0), \\ A_n \otimes (I-Z), & \text{if } (x,y) = (1,1), \end{cases} \qquad B_{n+1} = i\left(|zx\rangle\langle wy| - |wy\rangle\langle zx|\right) = \frac{1}{2}\begin{cases} 0, & \text{if } (x,y) = (0,0), \\ B_n \otimes X - A_n \otimes Y, & \text{if } (x,y) = (0,1), \\ B_n \otimes X + A_n \otimes Y, & \text{if } (x,y) = (1,0), \\ 0, & \text{if } (x,y) = (1,1), \end{cases}$$
where $A_n = |z\rangle\langle w| + |w\rangle\langle z|$ and $B_n = i\left(|z\rangle\langle w| - |w\rangle\langle z|\right)$.
Case x = y. According to our assumption that all non-vanishing Pauli-strings of $A_n$ commute, the same holds for $A_{n+1} = \tfrac{1}{2} A_n \otimes (I \pm Z)$. Since $B_{n+1} = 0$, the rest of the assertions are trivially true, as there are no non-vanishing Pauli-strings.
Case x ≠ y. Our assumptions mean that non-vanishing Pauli-strings of $A_n$ fail to commute on an odd number of indices with non-vanishing Pauli-strings of $B_n$. Therefore, non-vanishing Pauli-strings of $A_{n+1} = \tfrac{1}{2}\left(A_n \otimes X \pm B_n \otimes Y\right)$ fail to commute on an even number of indices and, hence, commute. The same argument holds for $B_{n+1} = \tfrac{1}{2}\left(B_n \otimes X \mp A_n \otimes Y\right)$. Finally, we prove that non-vanishing Pauli-strings of $A_{n+1}$ and $B_{n+1}$ do not commute. Either the Pauli-strings $P_{+,\cdot}$ and $P_{-,\cdot}$ stem from $A_n \otimes X$ and $B_n \otimes X$, respectively, or they stem from $\pm B_n \otimes Y$ and $\mp A_n \otimes Y$, respectively. In both cases, the number of indices on which they fail to commute does not change, so non-vanishing Pauli-strings of $A_{n+1}$ and $B_{n+1}$ do not commute.   □
The proof in Theorem 3 inspires the following algorithm to decompose H M into Pauli-strings. For each item in the list S that the algorithm produces, all Pauli-strings commute.
We can illustrate the difference between Algorithms 2 and 3 for $B = \{|01\rangle, |10\rangle\}$ and $T^{1\leftrightarrow 2}$. With Algorithm 2, we have $P_{1,2} = \tfrac{1}{4}(X - iY) \otimes (X + iY)$ and $P_{2,1} = \tfrac{1}{4}(X + iY) \otimes (X - iY)$, which can be simplified to $S = P_{1,2} + P_{2,1} = \tfrac{1}{2}(X \otimes X + Y \otimes Y)$. With Algorithm 3, we have $A_1 = X$, $B_1 = Y$ and $S = A_2 = \tfrac{1}{2}(A_1 \otimes X + B_1 \otimes Y) = \tfrac{1}{2}(X \otimes X + Y \otimes Y)$ without the need to simplify the expression.
Algorithm 3: Decompose H M given by Equation (10) into Pauli-strings directly
[Algorithm 3 appears as a figure in the published version: it builds the Pauli-strings of $|x_j\rangle\langle x_k| + |x_k\rangle\langle x_j|$ recursively, qubit by qubit, following the case distinctions in the proof of Theorem 3.]
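As the pseudocode is only available as a figure, the following is a minimal sketch (assumed code, not the published Algorithm 3) of such a qubit-by-qubit recursion. It tracks $A = |z\rangle\langle w| + |w\rangle\langle z|$ together with $B = i(|z\rangle\langle w| - |w\rangle\langle z|)$ as dictionaries mapping Pauli-strings to coefficients:

```python
# Minimal sketch: build the Pauli-strings of |z><w| + |w><z| one qubit at a
# time; no symbolic simplification is needed because terms never cancel.
from collections import defaultdict

def _combine(parts):
    """Sum of terms * factor, with a Pauli label appended to each string."""
    out = defaultdict(float)
    for terms, label, factor in parts:
        for string, coeff in terms.items():
            out[string + label] += coeff * factor
    return {s: c for s, c in out.items() if abs(c) > 1e-12}

def symmetric_pauli_strings(z, w):
    A, B = {"": 2.0}, {}   # empty-prefix base case: |><| + |><| = 2, i(...) = 0
    for x, y in zip(z, w):
        if x == y:
            sign = 1.0 if x == 0 else -1.0
            A, B = (_combine([(A, "I", 0.5), (A, "Z", 0.5 * sign)]),
                    _combine([(B, "I", 0.5), (B, "Z", 0.5 * sign)]))
        else:
            sign = 1.0 if (x, y) == (0, 1) else -1.0
            A, B = (_combine([(A, "X", 0.5), (B, "Y", 0.5 * sign)]),
                    _combine([(B, "X", 0.5), (A, "Y", -0.5 * sign)]))
    return A

# |001><010| + |010><001| = (IXX + IYY + ZXX + ZYY)/4
print(symmetric_pauli_strings((0, 0, 1), (0, 1, 0)))
```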
As shown above, Trotterizations can also lead to missing transitions. It is suggested in [2] that it is useful to repeat mixers within one mixing step, which corresponds to r > 1 in Equation (6). However, as we see in Figure 6, there can be more efficient ways to obtain mixers that provide transitions between all pairs of feasible states. One way to do so is to construct an exact Trotterization (restricted to the feasible subspace) as described in [19]. However, the ultimate goal is not to avoid Trotterization errors, but rather to provide transitions between all pairs of feasible states. We will revisit the topic of Trotterizations in Section 5 in more detail for each case and show that there are more efficient ways to do so.

4. Full/Unrestricted Mixer

We start by applying the proposed algorithm to the case without constraints, i.e., the case g = 0 in Equation (1), in order to check for consistency and to gain new insight. We will see that the presented approach is able to reproduce the “standard” X mixer as one possibility, but it provides a more general framework. For this case, $B = \{|x_j\rangle,\ j \in J,\ x_j \in \{0,1\}^n\}$ with $J = \{i,\ 1 \le i \le 2^n\}$, which means that $\mathrm{Sp}\,B = \mathcal{H}$. Furthermore, using Equation (14), we have that $H_{M,B} = T$, since E is the identity.

4.1. T Ham ( 1 ) aka “Standard” Full Mixer

The Hamiltonian of the standard full mixer for n qubits can be written as
$$H_M = \sum_{j=1}^{n} X_j = \sum_{j=1}^{n} \left( |0\rangle\langle 0| + |1\rangle\langle 1| \right)^{\otimes (j-1)} \otimes \left( |0\rangle\langle 1| + |1\rangle\langle 0| \right) \otimes \left( |0\rangle\langle 0| + |1\rangle\langle 1| \right)^{\otimes (n-j)} = \sum_{j,k \in J} \left( T_{\mathrm{Ham}(1)} \right)_{j,k} |x_j\rangle\langle x_k|.$$
The last identity in Equation (40) shows that $H_M$ is generated by the transition matrix $T_{\mathrm{Ham}(1)}$. This assumes that the feasible states in B are ordered from the smallest to the largest integer representation.

4.2. All-to-All Full Mixer

For $|J| = 2^n$, the full mixer $T_A$ can be written as $T_A = \sum_{j=1}^{n} T_{\mathrm{Ham}(j)}$. For the case $T_{\mathrm{Ham}(2)}$, the resulting Hamiltonian $H_M$ does not provide transitions between all pairs of feasible states, but we observe that $H_M = \sum_{j,k \in J} (T_{\mathrm{Ham}(2)})_{j,k} |x_j\rangle\langle x_k| = \sum_{j_1=1}^{n} \sum_{j_2 = j_1+1}^{n} X_{j_1} X_{j_2}$, i.e., $H_M$ consists of all $\binom{n}{2}$ possible Pauli-strings containing exactly two Xs. For $m \le n$, this can be further generalized to
$$H_M = \sum_{j,k \in J} \left( T_{\mathrm{Ham}(m)} \right)_{j,k} |x_j\rangle\langle x_k| = \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} \cdots \sum_{j_m = j_{m-1}+1}^{n} X_{j_1} \cdots X_{j_m},$$
which consists of all $\binom{n}{m}$ possible Pauli-strings with exactly m Xs. The resulting mixer Hamiltonian for $T_A$ is therefore given by
$$H_M = \sum_{j_1=1}^{n} X_{j_1} + \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} X_{j_1} X_{j_2} + \cdots + \sum_{j_1=1}^{n} \cdots \sum_{j_n = j_{n-1}+1}^{n} X_{j_1} \cdots X_{j_n}.$$
This means that the mixer consists of the standard mixer plus $\binom{n}{k}$ Pauli-strings with exactly k Xs for each k from 2 to n, which is a large overhead compared to the standard X-mixer.

4.3. (Cyclic) Nearest Integer Full Mixer

The resulting mixer for $T_\Delta$ / $T_{\Delta,c}$ involves exponentially many Pauli-strings as n increases. The following shows the mixer for $T_\Delta$ for $1 \le n \le 4$:
$$\begin{aligned} H_M^{n=1} &= X_1, \\ H_M^{n=2} &= I_1 \otimes H_M^{n=1} + \tfrac{1}{2}\left( X_1 X_2 + Y_1 Y_2 \right), \\ H_M^{n=3} &= I_1 \otimes H_M^{n=2} + \tfrac{1}{4}\left( X_1 X_2 X_3 - X_1 Y_2 Y_3 + Y_1 X_2 Y_3 + Y_1 Y_2 X_3 \right), \\ H_M^{n=4} &= I_1 \otimes H_M^{n=3} + \tfrac{1}{8}\big( X_1 X_2 X_3 X_4 - X_1 X_2 Y_3 Y_4 - X_1 Y_2 X_3 Y_4 - X_1 Y_2 Y_3 X_4 \\ &\qquad\qquad\qquad\qquad\quad + Y_1 X_2 X_3 Y_4 + Y_1 X_2 Y_3 X_4 + Y_1 Y_2 X_3 X_4 - Y_1 Y_2 Y_3 Y_4 \big). \end{aligned}$$

4.4. Comparison and Optimality of Full Mixers

It would be convenient to have a condition on the transition matrix for the optimality of the resulting mixer. We define the total Hamming distance of T to be
$$\mathrm{Ham}(T) = \sum_{\substack{j,k = 1 \\ |(T)_{j,k}| > 0}}^{|J|} d_{\mathrm{Hamming}}\!\left( \mathrm{bin}(j), \mathrm{bin}(k) \right),$$
where $\mathrm{bin}(i)$ is the binary representation of an integer i. As a first instinct, one might suspect that the mixer with minimal total Hamming distance also minimizes the cost. However, this turns out to be false, because of cancellations when more entries of T are nonzero. Table 2 gives a comparison of the total Hamming distance and cost for different full mixers. The standard full mixer has a total Hamming distance $\mathrm{Ham}(T) = n\,2^n$, as there are $2^n$ states, each with exactly n states at a Hamming distance of one. The all-to-all full mixer has $\mathrm{Ham}(T) = 2^n \sum_{k=1}^{n} k \binom{n}{k}$. For the rest of the transition matrices, it is not that straightforward to derive a general formula for $\mathrm{Ham}(T)$, but the table gives an impression. Table 2 shows a dramatic difference between the different mixers with regard to resource requirements. The standard mixer is the only one that does not require CX gates and is the most efficient to implement. Furthermore, as the resulting Pauli terms for the full mixers given by $T_{\mathrm{Ham}(1)}$ and $T_A$ consist only of I and X and therefore commute, they can be implemented without Trotterization. For the mixers given by $T_{\Delta,c}$, $T_\Delta$, and $T_{\mathrm{rand}}$, on the other hand, not all Pauli-strings commute, which results in the need for Trotterization. We continue with the case of constrained mixers.

5. Constrained Mixers

We start by describing what is known as the “XY”-mixer [2,17,19] before we explore more general cases. Our framework provides additional insights into this case and inspires further improvements of the above algorithms with respect to the optimality of the mixers described in Section 3.4, by (possibly) reducing the length of the Pauli-strings. For this case, we will analyze $T_A$, $T_\Delta$, and $T_{\Delta,c}$ only; $T_{\mathrm{Ham}(1)}$ only makes sense when n is a power of two, and $T_{\mathrm{rand}}$ has, in general, a high cost; see Table 2.

5.1. “One-Hot” Aka “XY”-Mixer

We are concerned with the case given by all computational basis states with exactly one “1”, i.e., $B = \{|x\rangle,\ x \in \{0,1\}^n,\ \text{s.t.}\ \sum_j x_j = 1\}$. These states are sometimes referred to as “one-hot”. We have that $n = |B|$ is the number of qubits. After some motivating examples, we present the general case of constructing mixers for any n > 2.

5.1.1. Case n = 2

The smallest, non-trivial case is given by $B = \{|01\rangle, |10\rangle\}$. For any $b \in \mathbb{R}$ with $|b| > 0$, the transition matrix $T = \begin{pmatrix} d & b \\ b & d \end{pmatrix}$ fulfills Theorem 1 and leads to the mixer $H_M = \tfrac{b}{2}(XX + YY) + \tfrac{d}{2}(II - ZZ)$. Since we want to minimize $\mathrm{Cost}(H_M)$ as given in Equation (36), we set d = 0, which results in $\mathrm{Cost}(H_M) = 4$. However, by using Corollary 2, there is room for further reducing the cost. We can add the mixer for $C = \{|00\rangle, |11\rangle\}$, since $B \cap C = \emptyset$. Using the same T (setting d = 0) gives
$$H_{M,B} + H_{M,C} = b\, XX,$$
which has $\mathrm{Cost}(H_M) = 2$. No Trotterization is needed in this case.
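This cancellation is easy to verify numerically. The following is a minimal sketch (assumed code) that reproduces $H_{M,B} + H_{M,C} = b\,XX$ from the $E\,T\,E^T$ construction of Theorem 1:

```python
# Minimal sketch: adding the mixer for C = {|00>, |11>} to the mixer for
# B = {|01>, |10>} (same off-diagonal transition matrix, d = 0) gives b * XX.
import numpy as np

b = 1.0
T = np.array([[0.0, b], [b, 0.0]])

E_B = np.zeros((4, 2)); E_B[1, 0] = E_B[2, 1] = 1.0   # columns |01>, |10>
E_C = np.zeros((4, 2)); E_C[0, 0] = E_C[3, 1] = 1.0   # columns |00>, |11>

H = E_B @ T @ E_B.T + E_C @ T @ E_C.T
X = np.array([[0.0, 1.0], [1.0, 0.0]])
print(np.allclose(H, b * np.kron(X, X)))  # True
```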

5.1.2. Case n = 3

We continue with $B = \{|001\rangle, |010\rangle, |100\rangle\}$. For the transition matrix $T = a\,T^{1\leftrightarrow 2} + b\,T^{2\leftrightarrow 3} + c\,T^{1\leftrightarrow 3}$, $a, b, c \in \mathbb{R}$, this results in the mixer
$$H_{M,B} = \frac{a}{4}\left( XX + YY \right) \otimes \left( I + Z \right) + \frac{b}{4}\left( I + Z \right) \otimes \left( XX + YY \right) + \frac{c}{4}\left( X \otimes (I+Z) \otimes X + Y \otimes (I+Z) \otimes Y \right),$$
with associated cost $\mathrm{Cost}(H_{M,B}) = 24 + 12c$ for $a = b = 1$, $c \in \{0,1\}$. In this case, Corollary 2 allows us to add the mixer for $C = \{|110\rangle, |101\rangle, |011\rangle\}$, since $B \cap C = \emptyset$. The mixer
$$H_M = H_{M,B} + H_{M,C} = \frac{a}{2}\left( XXI + YYI \right) + \frac{b}{2}\left( IXX + IYY \right) + \frac{c}{2}\left( XIX + YIY \right)$$
has cost $\mathrm{Cost}(H_M) = 8 + 4c$ for $a = b = 1$, $c \in \{0,1\}$. However, this mixer cannot be realized directly, since not all terms of $H_M$ commute. Figure 5 shows two ways to group the terms into commuting Pauli-strings, with only one way preserving the feasible subspace, as discussed in Section 3.5. For the Trotterization according to $T = T^{1\leftrightarrow 2} + T^{2\leftrightarrow 3}$, we have that $\left( (T^{1\leftrightarrow 2})^{k_1} (T^{2\leftrightarrow 3})^{k_2} \right)_{3,1} = 0$ for all $k_1, k_2 \in \mathbb{N}$. To fulfill Theorem 2, we need to include the term $T^{1\leftrightarrow 3}$ as well. The Trotterized mixer with minimal cost is therefore given by $T = T^{1\leftrightarrow 2} + T^{2\leftrightarrow 3} + T^{1\leftrightarrow 3}$.

5.1.3. The General Case n > 2

We start with the observation that, for any symmetric $T \in \mathbb{R}^{n \times n}$ with zero diagonal, we have
$$H_{M,B} = \sum_{j=1}^{n} \sum_{k=j+1}^{n} (T)_{j,k}\, \hat{P}_{j,k}, \qquad \hat{P}_{j,k} = \frac{1}{2^{n-1}} \left( \bigotimes_{l=1}^{n} A_l + \bigotimes_{l=1}^{n} B_l \right), \quad A_l = \begin{cases} X, & \text{if } l \in \{j,k\}, \\ I+Z, & \text{else}, \end{cases} \quad B_l = \begin{cases} Y, & \text{if } l \in \{j,k\}, \\ I+Z, & \text{else}. \end{cases}$$
The cost of implementing one of these terms, i.e., $e^{-i\beta \hat{P}_{j,k}}$, is given by the recursive formula
$$\mathrm{Cost}(\hat{P}_{j,k}) = \sum_{l=2}^{n} 2(l-1)\, f_n^l, \quad n > 2, \qquad f_n^l = f_{n-1}^l + f_{n-1}^{l-1}, \qquad f_2^l = \begin{cases} 2, & \text{if } l = 2, \\ 0, & \text{else}, \end{cases}$$
where f n l is Pascal’s triangle starting with 2 instead of 1. Examples of the resulting costs for different transition matrices can be seen in Table 3.
The cost of the mixers can be considerably reduced by adding mixers, generalizing the case n = 3. If the entry $(T)_{i,j}$ of T is nonzero, we can add mixers for each of the $2^{n-2}$ pairs of states $x \in \{0,1\}^n$ that fulfill $(x_i = 0 \wedge x_j = 1)$ or $(x_i = 1 \wedge x_j = 0)$. We can enumerate them with $0 \le l \le 2^{n-2} - 1$ by $\tilde{B}_{i,j}^{\,l} = \{|x\rangle,\ x \in \{0,1\}^n,\ \text{s.t.}\ x_{\setminus i,j} = \mathrm{bin}(l)\}$, where $x_{\setminus i,j}$ removes the indices i and j from x. We have that $B \cap \tilde{B}_{i,j}^{\,l} = \emptyset$. We observe that, for $n \ge 2$ and $|x\rangle, |z\rangle$ with $\mathrm{Ham}(x,z) = 2$, i.e., strings x, z that differ at exactly two positions, we have
$$|x\rangle\langle z| + |z\rangle\langle x| = \frac{1}{2^{n-1}} \left( \bigotimes_{l=1}^{n} A_l + \bigotimes_{l=1}^{n} B_l \right), \quad A_l = \begin{cases} X, & \text{if } x_l \neq z_l, \\ I+Z, & \text{if } x_l = z_l = 0, \\ I-Z, & \text{if } x_l = z_l = 1, \end{cases} \quad B_l = \begin{cases} Y, & \text{if } x_l \neq z_l, \\ I+Z, & \text{if } x_l = z_l = 0, \\ I-Z, & \text{if } x_l = z_l = 1. \end{cases}$$
Adding these mixers for each nonzero entry $(T)_{j,k}$ of T has the effect of summing over all possible combinations of $(I \pm Z)^{\otimes (n-2)}$, which is equal to the identity. Therefore, we obtain the mixer
$$H_{M,B} + \sum_{i,j \in J} \sum_{l=0}^{2^{n-2}-1} H_{M,\tilde{B}_{i,j}^{\,l}} = \sum_{j=1}^{n} \sum_{k=j+1}^{n} (T)_{j,k}\, P_{j,k}, \qquad P_{j,k} = X_j X_k + Y_j Y_k,$$
which reduces the cost of one term to $\mathrm{Cost}(P_{j,k}) = 4$.

5.1.4. Trotterizations

Not all Pauli-strings of the mixer in Equation (51) commute. This necessitates a suitable and efficient Trotterization. We will use Theorems 2 and 3 to identify valid Trotterized mixers. As pointed out in [19], when n is a power of two, one can realize a Trotterization that is exact on the feasible subspace B. Termed the simultaneous complete-graph mixer, it involves all possible pairs (i,j), corresponding to a certain Trotterization of the mixer for $T_A$. We will see that there are more efficient mixers that provide transitions between all pairs of feasible states.
Another possibility is to Trotterize $T_{\Delta,c}$ or $T_\Delta$ according to odd and even entries, as described in Section 3.2.4. This is what is termed a parity-partitioned mixer in [19]. However, fewer and fewer feasible states can be reached as n increases, as we have seen in Figure 6. Repeated applications (r > 0 in Equation (6)) are necessary, and r increases with increasing n. Figure 7 shows a comparison of different Trotterizations. As the cost of the mixer is dictated by the number of nonzero entries of the transition matrix, it is more efficient to add mixers for off-diagonals according to $\sum_{i \in I}\left( T_{O(i)} + T_{E(i)} \right)$ for some suitable index set I.

5.2. General Cases

In this section, we analyze some specific cases that go beyond unrestricted mixers and mixers restricted to one-hot states.

5.2.1. Example 1

We start by looking at the case $B = \{|100\rangle, |010\rangle, |011\rangle\}$. Using $T_\Delta = c_{1,2}\, T^{1\leftrightarrow 2} + c_{2,3}\, T^{2\leftrightarrow 3}$ and $T_{\Delta,c} = T_\Delta + c_{3,1}\, T^{3\leftrightarrow 1}$, this results in the mixer
$$H_{M,B} = c_{1,2}\, \frac{1}{4}\left( XX + YY \right)\left( I + Z \right) + c_{2,3}\, \frac{1}{4}\left( I + Z \right)\left( I - Z \right) X + c_{3,1}\, \frac{1}{4}\left( XXX + YXY + YYX - XYY \right),$$
with $\mathrm{Cost}(H_{M,B}) = 12\,c_{1,2} + 8\,c_{2,3} + 16\,c_{3,1}$. Here, $(c_{1,2}, c_{2,3}, c_{3,1}) = (1,1,0)$ corresponds to $T_\Delta$ and $(1,1,1)$ to $T_{\Delta,c}$. There is a lot of freedom in adding mixers, which is summarized in Table 4. Adding more terms only increases the cost in this case. Overall, the most efficient mixers for B are given by
$$H_M = \frac{c_{1,2}}{2}\left[ XXI + XXZ \ \text{ or } \ XXI + YYZ \right] + \frac{c_{2,3}}{2}\left[ (I+Z)IX \ \text{ or } \ I(I-Z)X \right] + \frac{c_{3,1}}{2}\left[ XXX - XYY \ \text{ or } \ XXX + YXY \right],$$
with associated cost $\mathrm{Cost}(H_M) = 6\,c_{1,2} + 2\,c_{2,3} + 8\,c_{3,1}$. A valid Trotterization is given by splitting according to the $T^{i\leftrightarrow j}$.

5.2.2. Example 2

Finally, we investigate the case $B = \{|10010\rangle, |01110\rangle, |10011\rangle, |11101\rangle, |00110\rangle, |01010\rangle\}$, which restricts to six of the total $2^5 = 32$ computational basis states for 5 qubits. It is not clear a priori whether, for any (distinct) pair $T^{i_1\leftrightarrow j_1}$ and $T^{i_2\leftrightarrow j_2}$, all pairs of non-vanishing Pauli-strings commute. In order to fulfill Equation (6) for r = 1, this means that one needs to Trotterize according to all pairs of $T_A$, as shown in Table 5. The resulting cost of this Trotterized mixer is $\mathrm{Cost}(H_M) = 1360$. Since the kernel of $H_{M,B}$ is spanned by $k = 2^n - |B| = 26$ computational basis states, there are $\binom{k}{2} = 325$ different pairs that can be added to each $T^{i\leftrightarrow j}$. As Table 5 shows, this can reduce the cost of the resulting mixer to $\mathrm{Cost}(H_M) = 568$. Of course, there is the possibility of reducing the cost even further by adding more mixers for states in the kernel of $H_{M,B}$. However, this quickly becomes computationally very demanding when all possibilities are considered in a brute-force fashion.

6. Conclusions and Outlook

While designing mixers with the presented framework is more or less straightforward, designing efficient mixers turns out to be a difficult task. An additional difficulty arises due to the need for Trotterization. Somewhat counter-intuitively, the more restricted the mixer, i.e., the smaller the subspace, the more design freedom one has to increase efficiency. More structure/symmetry in the restricted subspace seems to allow for a lower cost of the resulting mixer. For the case of “one-hot” states, we provide a deeper understanding of the requirements for valid Trotterizations. Compared to the state of the art in the literature, this leads to a considerable reduction of the cost of the mixer, as defined in Equation (36). The introduced framework provides a rigorous mathematical analysis of the underlying structure of mixer Hamiltonians and deepens the understanding thereof. We believe the framework can serve as the backbone for the further development of efficient mixers.
When adding mixers, in general, the kernel of $H_{M,B}$ is spanned by $k = 2^n - |B|$ computational basis states. Therefore, one can add
$$\sum_{i=2}^{k} \binom{k}{i}$$
different mixers for each nonzero entry $(T)_{i,j}$ of T. Out of all these, one wants to find the combination leading to the lowest overall cost. Clearly, brute-force optimization is computationally not tractable, even for a moderate number of qubits n when $|B| \ll 2^n$. Further research should aim to carefully analyze the structure of the basis states in B in order to develop efficient (heuristic) algorithms that find low-cost mixers by adding mixers in the kernel of $H_{M,B}$.

Author Contributions

Conceptualization, F.G.F., K.O.L., H.M.N., A.J.S. and G.S.; software, F.G.F.; formal analysis, F.G.F.; data curation, F.G.F.; writing—original draft preparation, F.G.F.; writing—review and editing, F.G.F., K.O.L., H.M.N., A.J.S. and G.S.; visualization, F.G.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data and the python/jupyter notebook source code for reproducing the results obtained in this article are available at https://github.com/OpenQuantumComputing as of 1 June 2022.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Illustration of properties of Hamiltonians constructed with Theorem 1.
Figure 2. Corollary 2 shows that adding a mixer with support outside Sp B is also a valid mixer for B.
Figure 3. Examples of the squared overlap between two states for the case $|B| = 4$. The squared overlap is independent of which states make up $B = \{|z_0\rangle, |z_1\rangle, |z_2\rangle, |z_3\rangle\}$. The comparison for different $T$ shows that there exists a $\beta$ such that the overlap is nonzero, except for $T_{2,3}$, which, as expected, does not provide transitions between $|z_0\rangle$ and $|z_3\rangle$.
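Curves like those in Figure 3 can be reproduced by working directly in the feasible subspace, where the restriction of the mixer Hamiltonian to span(B) is given by a transition matrix $T$. The following is a minimal sketch; the two transition matrices used below (a ring over the four states, and a matrix connecting only states 2 and 3) are illustrative assumptions, not necessarily the exact $T$ used for the figure.

```python
import numpy as np
from scipy.linalg import expm

def squared_overlap(T, beta, j, k):
    """|<z_k| exp(i*beta*H_M) |z_j>|^2, evaluated in the feasible subspace,
    where the restriction of H_M to span(B) is represented by T."""
    U = expm(1j * beta * T)
    return abs(U[k, j]) ** 2

# |B| = 4 feasible states. T_ring connects 0-1-2-3-0 (assumed example);
# T_23 connects only states 2 and 3.
T_ring = np.array([[0, 1, 0, 1],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 1, 0]], dtype=float)
T_23 = np.zeros((4, 4))
T_23[2, 3] = T_23[3, 2] = 1.0

betas = np.linspace(0, 2 * np.pi, 201)
print(max(squared_overlap(T_ring, b, 0, 3) for b in betas))  # nonzero for some beta
print(max(squared_overlap(T_23, b, 0, 3) for b in betas))    # stays 0: no |z_0> <-> |z_3> transition
```

For the second matrix the overlap between $|z_0\rangle$ and $|z_3\rangle$ remains zero for every $\beta$, which is the behavior the caption describes for $T_{2,3}$.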
Figure 4. Examples of the structure of $T_{\mathrm{Ham}(d)}$. Black squares mark the nonzero entries (all equal to one), i.e., pairs of states with the specified Hamming distance.
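Assuming $T_{\mathrm{Ham}(d)}$ is the $2^n \times 2^n$ matrix with a one in entry $(i, j)$ exactly when the bit strings $i$ and $j$ have Hamming distance $d$, which is how the caption reads, the patterns in Figure 4 can be generated with a few lines of Python:

```python
import numpy as np

def t_ham(n: int, d: int) -> np.ndarray:
    """Transition matrix with T[i, j] = 1 iff the n-bit strings i and j
    differ in exactly d positions (Hamming distance d)."""
    dim = 2 ** n
    T = np.zeros((dim, dim), dtype=int)
    for i in range(dim):
        for j in range(dim):
            if bin(i ^ j).count("1") == d:
                T[i, j] = 1
    return T

print(t_ham(3, 1))  # structure as in the d = 1 panel for 3 qubits
```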
Figure 5. Commutation graph (middle) of the terms of the mixer given in Equation (47); an edge is drawn whenever two terms commute. From this graph, the terms can be grouped into three sets (nodes connected by green edges) or two sets (nodes connected by red/blue edges). Only the left/green grouping preserves the feasible subspace; the right one does not.
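The grouping in Figure 5 is based on checking which Pauli terms of the mixer commute: two Pauli strings commute if and only if they anticommute on an even number of qubit positions. Below is a minimal sketch of building such a commutation graph; the concrete strings are placeholders rather than the terms of Equation (47).

```python
from itertools import combinations

def commutes(p: str, q: str) -> bool:
    """Two Pauli strings commute iff the number of sites where both are
    non-identity and different (i.e., anticommuting single-qubit Paulis) is even."""
    return sum(a != "I" and b != "I" and a != b for a, b in zip(p, q)) % 2 == 0

terms = ["XXI", "YYI", "IXX", "IYY", "XIX", "YIY"]  # placeholder terms, not Equation (47)
edges = [(s, t) for s, t in combinations(terms, 2) if commutes(s, t)]
print(edges)  # edges of the commutation graph
```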
Figure 6. Valid (white) and invalid (black) transitions between pairs of states, as defined in Theorem 2, for Trotterized mixer Hamiltonians. The first row shows that for $T_1 = T_{O(1),c}$ and $T_2 = T_{E(1)}$, the mixer $U = e^{i\beta T_1} e^{i\beta T_2}$ does not provide transitions between all pairs of feasible states, although $U = e^{i\beta (T_1 + T_2)}$ does.
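The white/black pattern in Figure 6 can be reproduced numerically by checking which matrix elements of the Trotterized product vanish compared to the exact exponential. Here is a small sketch with placeholder transition matrices (a path of three feasible states split into its two edges, not the paper's $T_{O(1),c}$ and $T_{E(1)}$):

```python
import numpy as np
from scipy.linalg import expm

def provides_transition(U, j, k, tol=1e-12):
    """True if the mixer U has a nonzero matrix element between basis states j and k."""
    return abs(U[k, j]) > tol

# Placeholder splitting: three feasible states on a path 0-1-2,
# with T1 = edge (0, 1) and T2 = edge (1, 2).
T1 = np.zeros((3, 3)); T1[0, 1] = T1[1, 0] = 1.0
T2 = np.zeros((3, 3)); T2[1, 2] = T2[2, 1] = 1.0

beta = 0.7
U_trotter = expm(1j * beta * T1) @ expm(1j * beta * T2)
U_exact = expm(1j * beta * (T1 + T2))
print(provides_transition(U_trotter, 0, 2))  # False: a single Trotter step misses 0 <-> 2
print(provides_transition(U_exact, 0, 2))    # True: exp(i*beta*(T1 + T2)) connects them
```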
Figure 7. Comparison of different Trotterizations of mixers restricted to “one-hot” states. All markers represent cases in which the resulting mixer provides transitions between all pairs of feasible states; see also Figure 6. All versions can be implemented in linear depth. The most efficient Trotterizations are achieved by using sub-diagonal entries. The cost equals 4 times the number of (XX + YY) terms.
Table 1. Comparison of the complexity of the algorithms for $n$ qubits. Here, $\gamma$ is the number of nonzero entries of $T$.

            Algorithm 1     Algorithm 2     Algorithm 3
runtime     $O(2^{5n})$     $O(n\gamma)$    $O(n\gamma)$
memory      $O(2^{2n})$     $O(n\gamma)$    $O(n\gamma)$
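For context on the exponential column of Table 1: a brute-force Pauli decomposition of an $n$-qubit mixer Hamiltonian touches all $4^n$ Pauli strings and works with dense $2^n \times 2^n$ matrices, which is what drives the exponential runtime and memory. The following is a minimal sketch of that naive baseline, not a reproduction of the paper's Algorithms 1–3:

```python
import numpy as np
from functools import reduce
from itertools import product

PAULIS = {"I": np.eye(2),
          "X": np.array([[0, 1], [1, 0]]),
          "Y": np.array([[0, -1j], [1j, 0]]),
          "Z": np.array([[1, 0], [0, -1]])}

def pauli_decomposition(H):
    """Brute force: the coefficient of Pauli string P is Tr(P H) / 2^n, over all 4^n strings."""
    n = int(np.log2(H.shape[0]))
    coeffs = {}
    for labels in product("IXYZ", repeat=n):
        P = reduce(np.kron, (PAULIS[l] for l in labels))
        c = np.trace(P @ H) / 2 ** n
        if abs(c) > 1e-12:
            coeffs["".join(labels)] = c
    return coeffs

# Example: mixer for B = {|01>, |10>}, i.e., H_M = |01><10| + |10><01|.
H_M = np.zeros((4, 4))
H_M[1, 2] = H_M[2, 1] = 1.0
print(pauli_decomposition(H_M))  # XX and YY, each with coefficient 1/2
```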
Table 2. Full/unrestricted mixer case for $n$ qubits, i.e., $|B| = 2^n$. Comparison of the total Hamming distance of the transition matrix $T$, as well as the resulting requirements for implementation in terms of single- and two-qubit gates, for different $T$.

              Ham(T), n = 1, ..., 6           #U3, n = 1, ..., 6     #CX = Cost(H_M), n = 1, ..., 6
T_{Ham(1)}    2, 8, 24, 64, 160, 384          1, 2, 3, 4, 5, 6       0, 0, 0, 0, 0, 0
T_A           2, 16, 96, 512, 2560, 12,288    1, 2, 3, 4, 5, 6       0, 2, 10, 34, 98, 258
T_{Δ,c}       2, 12, 28, 60, 124, 252         1, 1, 1, 1, 1, 1       0, 2, 12, 44, 132, 356
T_Δ           2, 8, 22, 52, 114, 240          1, 1, 1, 1, 1, 1       0, 4, 20, 68, 196, 516
T_rand        2, 16, 96, 512, 2560, 12,288    2, 4, 6, 8, 10, 12     0, 10, 86, 552, 3260, 17,650
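The Ham(T) column of Table 2 is the total Hamming distance summed over all nonzero entries of the transition matrix $T$. A sketch that reproduces this column, assuming $T_\Delta$ places ones on the first sub- and super-diagonal, $T_{\Delta,c}$ additionally connects the first and last basis state, and $T_A$ connects all pairs:

```python
import numpy as np

def hamming(i, j):
    return bin(i ^ j).count("1")

def total_hamming(T):
    """Ham(T): sum of Hamming distances over all nonzero entries of T."""
    return sum(hamming(i, j) for i, j in zip(*np.nonzero(T)))

def t_delta(n, cyclic=False):
    """Assumed structure: ones on the first sub-/super-diagonal (T_Δ),
    plus a connection between the first and last state for T_{Δ,c}."""
    dim = 2 ** n
    T = np.zeros((dim, dim), dtype=int)
    for i in range(dim - 1):
        T[i, i + 1] = T[i + 1, i] = 1
    if cyclic:
        T[0, dim - 1] = T[dim - 1, 0] = 1
    return T

def t_all(n):
    """Assumed structure of T_A: all off-diagonal entries equal to one."""
    return 1 - np.eye(2 ** n, dtype=int)

for n in range(1, 7):
    print(n, total_hamming(t_delta(n)),
          total_hamming(t_delta(n, cyclic=True)),
          total_hamming(t_all(n)))
```

Running this for $n = 1, \ldots, 6$ matches the Ham(T) values of the $T_\Delta$, $T_{\Delta,c}$, and $T_A$ rows under these structural assumptions.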
Table 3. Comparison of the cost #CX = Cost($H_M$) of mixers constrained to “one-hot” states. For the Trotterized versions, we define $T_1 = T_{O(1),c}$ and $T_2 = T_{E(1)}$. All Hamiltonians need to be Trotterized.

n            3        4        5         6         7         8          9          10          15
$H_{M,B}$
T_Δ        12 · 2   32 · 3   80 · 4    192 · 5   448 · 6   1024 · 7   2304 · 8   5120 · 9    245,760 · 14
T_{Δ,c}    12 · 3   32 · 4   80 · 5    192 · 6   448 · 7   1024 · 8   2304 · 9   5120 · 10   245,760 · 15
T_A        12 · 3   32 · 6   80 · 10   192 · 15  448 · 21  1024 · 28  2304 · 36  5120 · 45   245,760 · 105
$H_{M,B} + \sum_i H_{M,C_i}$
T_Δ        4 · 2    4 · 3    4 · 4     4 · 5     4 · 6     4 · 7      4 · 8      4 · 9       4 · 14
T_{Δ,c}    4 · 3    4 · 4    4 · 5     4 · 6     4 · 7     4 · 8      4 · 9      4 · 10      4 · 15
T_A        4 · 3    4 · 6    4 · 10    4 · 15    4 · 21    4 · 28     4 · 36     4 · 45      4 · 105
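The scaling of the lower block of Table 3 follows from counting the (XX + YY) pair terms and using 4 CX per term (cf. the caption of Figure 7). Below is a sketch under the assumption that $T_\Delta$ chains the $n$ one-hot states, $T_{\Delta,c}$ closes that chain into a ring, and $T_A$ connects all pairs:

```python
def num_xy_terms(n, kind):
    """Number of (XX + YY) pair terms, assuming T_Δ chains the n one-hot states,
    T_Δ,c closes the chain into a ring, and T_A connects all pairs."""
    return {"delta": n - 1, "delta_c": n, "all": n * (n - 1) // 2}[kind]

def cx_cost(n, kind):
    """#CX using 4 CX per (XX + YY) term, as stated in the caption of Figure 7."""
    return 4 * num_xy_terms(n, kind)

for n in (3, 4, 5, 10, 15):
    print(n, cx_cost(n, "delta"), cx_cost(n, "delta_c"), cx_cost(n, "all"))
```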
Table 4. Comparison of Cost($H_{M,B} + H_{M,C}$) for different added mixers $C$ for the case $B = \{|100\rangle, |010\rangle, |011\rangle\}$. All 10 possible pairs are shown. We see that the cost can be both reduced and increased.

C                    T_{1,2}   T_{2,3}   T_{3,1}
{}                      12         8        16
{|000⟩, |001⟩}          20         2        24
{|000⟩, |101⟩}          24        20        28
{|000⟩, |110⟩}           6        20        28
{|000⟩, |111⟩}          28        24         8
{|001⟩, |101⟩}          20        16        24
{|001⟩, |110⟩}          28        24         8
{|001⟩, |111⟩}           6        20        28
{|101⟩, |110⟩}          24        20        28
{|101⟩, |111⟩}          20        16        24
{|110⟩, |111⟩}          20         2        24
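The row labels of Table 4 (and likewise Table 5) are all unordered pairs of computational basis states outside the feasible set $B$. A short sketch that enumerates them:

```python
from itertools import combinations, product

def candidate_pairs(B, n):
    """All unordered pairs of computational basis states outside the feasible set B."""
    outside = ["".join(bits) for bits in product("01", repeat=n)
               if "".join(bits) not in B]
    return list(combinations(outside, 2))

B = {"100", "010", "011"}
pairs = candidate_pairs(B, 3)
print(len(pairs))   # 10 pairs, as in Table 4
print(pairs[:3])
```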
Table 5. Comparison of Cost($H_{M,B} + H_{M,C}$) for different added mixers $C$ for the case $B = \{|10010\rangle, |01110\rangle, |10011\rangle, |11101\rangle, |00110\rangle, |01010\rangle\}$. There are 325 possible pairs in total. We see that the cost can be both reduced and increased.

Columns (in order): T_{1,2}, T_{1,3}, T_{1,4}, T_{1,5}, T_{1,6}, T_{2,3}, T_{2,4}, T_{2,5}, T_{2,6}, T_{3,4}, T_{3,5}, T_{3,6}, T_{4,5}, T_{4,6}, T_{5,6}

C = {}:                   96, 64, 112, 80, 80, 112, 96, 64, 64, 96, 96, 96, 112, 112, 80
C = {|00010⟩, |00011⟩}:   160, 24, 176, 144, 144, 176, 160, 128, 128, 160, 160, 160, 176, 176, 144
C = {|00010⟩, |01101⟩}:   208, 176, 48, 192, 192, 224, 208, 176, 176, 208, 208, 208, 224, 224, 192
C = {|00010⟩, |10001⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|00010⟩, |10101⟩}:   208, 176, 224, 192, 192, 224, 208, 176, 176, 208, 208, 208, 224, 48, 192
C = {|00010⟩, |11001⟩}:   208, 176, 224, 192, 192, 224, 208, 176, 176, 208, 208, 208, 48, 224, 192
C = {|00011⟩, |01101⟩}:   192, 160, 208, 176, 176, 208, 192, 160, 160, 40, 192, 192, 208, 208, 176
C = {|00011⟩, |10000⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|00100⟩, |01000⟩}:   176, 144, 192, 160, 160, 192, 176, 144, 144, 176, 176, 176, 192, 192, 32
C = {|00100⟩, |01100⟩}:   160, 128, 176, 144, 144, 176, 160, 24, 128, 160, 160, 160, 176, 176, 144
C = {|00100⟩, |10000⟩}:   176, 144, 192, 32, 160, 192, 176, 144, 144, 176, 176, 176, 192, 192, 160
C = {|00100⟩, |10001⟩}:   192, 160, 208, 176, 176, 208, 192, 160, 160, 192, 40, 192, 208, 208, 176
C = {|00100⟩, |10111⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|00101⟩, |10110⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|00111⟩, |01011⟩}:   176, 144, 192, 160, 160, 192, 176, 144, 144, 176, 176, 176, 192, 192, 32
C = {|00111⟩, |01111⟩}:   160, 128, 176, 144, 144, 176, 160, 24, 128, 160, 160, 160, 176, 176, 144
C = {|00111⟩, |10100⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|01000⟩, |01100⟩}:   160, 128, 176, 144, 144, 176, 160, 128, 24, 160, 160, 160, 176, 176, 144
C = {|01000⟩, |10000⟩}:   176, 144, 192, 160, 32, 192, 176, 144, 144, 176, 176, 176, 192, 192, 160
C = {|01000⟩, |10001⟩}:   192, 160, 208, 176, 176, 208, 192, 160, 160, 192, 192, 40, 208, 208, 176
C = {|01000⟩, |11011⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|01001⟩, |11010⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|01011⟩, |01111⟩}:   160, 128, 176, 144, 144, 176, 160, 128, 24, 160, 160, 160, 176, 176, 144
C = {|01011⟩, |11000⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|01100⟩, |10000⟩}:   40, 160, 208, 176, 176, 208, 192, 160, 160, 192, 192, 192, 208, 208, 176
C = {|01100⟩, |10001⟩}:   208, 176, 224, 192, 192, 48, 208, 176, 176, 208, 208, 208, 224, 224, 192
C = {|01100⟩, |11111⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|01101⟩, |11110⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|01111⟩, |11100⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|10000⟩, |10001⟩}:   160, 24, 176, 144, 144, 176, 160, 128, 128, 160, 160, 160, 176, 176, 144
C = {|10110⟩, |10111⟩}:   160, 24, 176, 144, 144, 176, 160, 128, 128, 160, 160, 160, 176, 176, 144
C = {|10110⟩, |11010⟩}:   176, 144, 192, 160, 160, 192, 176, 144, 144, 176, 176, 176, 192, 192, 32
C = {|10110⟩, |11110⟩}:   160, 128, 176, 144, 144, 176, 160, 24, 128, 160, 160, 160, 176, 176, 144
C = {|11010⟩, |11011⟩}:   160, 24, 176, 144, 144, 176, 160, 128, 128, 160, 160, 160, 176, 176, 144
C = {|11010⟩, |11110⟩}:   160, 128, 176, 144, 144, 176, 160, 128, 24, 160, 160, 160, 176, 176, 144
C = {|00000⟩, |11111⟩}:   224, 192, 240, 208, 208, 240, 224, 192, 192, 224, 224, 224, 240, 240, 208
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
