Article

Constraint Preserving Mixers for the Quantum Approximate Optimization Algorithm

by Franz Georg Fuchs *, Kjetil Olsen Lye, Halvor Møll Nilsen, Alexander Johannes Stasik and Giorgio Sartor

SINTEF, Department of Mathematics and Cybernetics, 0373 Oslo, Norway

* Author to whom correspondence should be addressed.
Algorithms 2022, 15(6), 202; https://doi.org/10.3390/a15060202
Submission received: 12 May 2022 / Revised: 1 June 2022 / Accepted: 6 June 2022 / Published: 10 June 2022
(This article belongs to the Collection Feature Paper in Algorithms and Complexity Theory)

Abstract:
The quantum approximate optimization algorithm/quantum alternating operator ansatz (QAOA) is a heuristic to find approximate solutions of combinatorial optimization problems. Most of the literature is limited to quadratic problems without constraints. However, many practically relevant optimization problems do have (hard) constraints that need to be fulfilled. In this article, we present a framework for constructing mixing operators that restrict the evolution to a subspace of the full Hilbert space given by these constraints. We generalize the “XY”-mixer designed to preserve the subspace of “one-hot” states to the general case of subspaces given by a number of computational basis states. We expose the underlying mathematical structure, which reveals how mixers work and how one can minimize their cost in terms of the number of CX gates, particularly when Trotterization is taken into account. Our analysis also leads to valid Trotterizations for an “XY”-mixer with fewer CX gates than is known to date. In view of practical implementations, we also describe algorithms for efficient decomposition into basis gates. Several examples of more general cases are presented and analyzed.

1. Introduction

The quantum approximate optimization algorithm (QAOA) [1], and its generalization, the quantum alternating operator ansatz (also abbreviated as QAOA) [2], is a meta-heuristic for solving combinatorial optimization problems that can utilize gate-based quantum computers and possibly outperform purely classical heuristic algorithms. Typical examples that can be tackled are quadratic (binary) optimization problems of the form
$$x^* = \underset{x \in \{0,1\}^n,\; g(x) = 0}{\arg\min} \; f(x), \qquad f(x) = x^T Q_f\, x + c_f, \qquad g(x) = x^T Q_g\, x + c_g,$$
where $Q_f, Q_g \in \mathbb{R}^{n \times n}$ are symmetric $n \times n$ matrices. For binary variables $x \in \{0,1\}^n$, any linear part can be absorbed into the diagonal of $Q_f$ and $Q_g$. In this article, we focus on the case where the constraint is given by a feasible subspace as defined in the following:
Definition 1
(Constraints given by indexed computational basis states). Let $\mathcal{H} = (\mathbb{C}^2)^{\otimes n}$ be the Hilbert space for n qubits, which is spanned by all computational basis states $|z_j\rangle$, i.e., $\mathcal{H} = \mathrm{span}\{|z_j\rangle,\ 1 \le j \le 2^n,\ z_j \in \{0,1\}^n\}$. Let
$$B = \left\{ |z_j\rangle,\ j \in J,\ z_j \in \{0,1\}^n \right\}$$
be the subset of all computational basis states defined by an index set J. This corresponds to
$$g(x) = \prod_{j \in J} \sum_{i=1}^{n} \left( x_i - (z_j)_i \right)^2,$$
which is a quadratic constraint.
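As a quick illustration of Definition 1, the following is a minimal Python sketch (assumed helper code, not from the paper's repository) that evaluates this constraint for a small feasible set:

```python
# Minimal sketch: evaluate g(x) from Definition 1 for a feasible set B given
# as a list of bitstrings. g(x) = 0 exactly when x is one of the states in B.
import numpy as np

def g(x, B):
    x = np.asarray(x)
    return np.prod([np.sum((x - np.asarray(z)) ** 2) for z in B])

B = [(0, 1, 0, 0, 1), (1, 1, 0, 0, 1), (1, 1, 1, 1, 0)]
print(g((0, 1, 0, 0, 1), B))  # 0  -> feasible
print(g((0, 0, 1, 0, 0), B))  # >0 -> infeasible
```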
There is a well-established connection between quadratic (binary) optimization problems and Ising models (see, e.g., [3]) that allows one to directly translate these problems to the QAOA. The general form of the QAOA is given by
$$|\gamma, \beta\rangle = U_M(\beta_p)\, U_P(\gamma_p) \cdots U_M(\beta_1)\, U_P(\gamma_1)\, |\phi_0\rangle,$$
where one alternates the application of the phase-separating and mixing operators p times. Here, $U_P(\gamma)$ is a phase-separating operator that depends on the objective function f. As defined in [2], the requirements for the mixing operator $U_M(\beta)$ are as follows:
  • $U_M$ does not commute with $U_P$, i.e., $[U_M(\beta), U_P(\gamma)] \neq 0$ for almost all $\gamma, \beta \in \mathbb{R}$;
  • $U_M$ preserves the feasible subspace as given in Definition 1, i.e., $\mathrm{Sp}\,B$ is an invariant subspace of $U_M$:
    $$U_M(\beta)\,|v\rangle \in \mathrm{Sp}\,B, \qquad \forall\, |v\rangle \in \mathrm{Sp}\,B,\ \forall\, \beta \in \mathbb{R};$$
  • $U_M$ provides transitions between all pairs of feasible states, i.e., for each pair of computational basis states $|x\rangle, |y\rangle \in B$ there exist $\beta^* \in \mathbb{R}$ and $r \in \mathbb{N} \cup \{0\}$ such that
    $$\Big| \langle x |\, \underbrace{U_M(\beta^*) \cdots U_M(\beta^*)}_{r \text{ times}}\, | y \rangle \Big| > 0.$$
If both $U_M$ and $U_P$ correspond to the time evolution under some Hamiltonians $H_M, H_P$, i.e., $U_M = e^{-i\beta H_M}$ and $U_P = e^{-i\gamma H_P}$, the approach can be termed “Hamiltonian-based QAOA” (H-QAOA). If the Hamiltonians $H_M, H_P$ are sums of (polynomially many) local terms, this represents a sub-class termed “local Hamiltonian-based QAOA” (LH-QAOA).
In practice, it is not possible to implement $U_M$ or $U_P$ directly. It is necessary to decompose the evolution into smaller pieces, which means that instead of applying $e^{-it(H_1+H_2)}$, one can only apply $e^{-itH_1}$ and $e^{-itH_2}$. This process is typically referred to as “Trotterization”. As an example, the simplest Suzuki–Trotter decomposition, or exponential product formula [4,5], is given by
$$e^{x(H_1+H_2)} = e^{xH_1}\, e^{xH_2} + \mathcal{O}(x^2),$$
where x is a parameter and $H_1, H_2$ are two operators with commutation relation $[H_1, H_2] \neq 0$. Higher-order formulas can be found, for instance, in [4].
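The quadratic scaling of the first-order error is easy to check numerically. The following is a minimal sketch (assumed code, not from the paper's repository) using random Hermitian matrices:

```python
# Minimal sketch: the first-order Trotter error || e^{x(H1+H2)} - e^{xH1} e^{xH2} ||
# shrinks roughly by a factor of four each time x is halved, i.e., it is O(x^2).
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H1, H2 = (A + A.conj().T) / 2, (B + B.conj().T) / 2  # two non-commuting Hermitian operators

for x in (0.1, 0.05, 0.025):
    err = np.linalg.norm(expm(x * (H1 + H2)) - expm(x * H1) @ expm(x * H2))
    print(f"x = {x:5.3f}, error = {err:.2e}")
```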
Practical algorithms need to be defined using a few operators from a universal gate set, e.g., $\{U_3, CX\}$, where
$$U_3(\theta, \phi, \lambda) = \begin{pmatrix} \cos(\theta/2) & -e^{i\lambda}\sin(\theta/2) \\ e^{i\phi}\sin(\theta/2) & e^{i(\phi+\lambda)}\cos(\theta/2) \end{pmatrix}, \qquad CX = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$
A good (and simple) indicator for the complexity of a quantum algorithm is given by the number of required C X gates. Overall, the most efficient algorithm is the one that provides the best accuracy in a given time [6].
Remark 1
(Repeated mixers). If U M is the exponential of a Hermitian matrix, the parameter r in Equation (6) does not matter, as it can be absorbed as a re-scaling of β. However, if  U M is Trotterized, this can lead to missing transitions. In this case, r > 1 can again provide these transitions. It is therefore suggested in [2] to repeat mixers within one mixing step. For this reason, we will consider the cost of Trotterized mixers including the necessary repetitions to provide transitions for all feasible states.

2. Related Work

The QAOA was introduced in [1], where it was applied to the Max-Cut problem. The authors in [7] compared the QAOA to the classical AKMAXSAT solver, extrapolated from small to large instances, and estimated that a quantum speed-up can be obtained with (several) hundreds of qubits. A general overview of variational quantum algorithms, including challenges and how to overcome them, is provided in [8,9]. A key challenge is that it is in general hard to find good parameters; it has been shown that the corresponding training problem is in general NP-hard [10]. Another obstacle is so-called barren plateaus, i.e., regions in the training landscape where the loss function is effectively constant [9]. This phenomenon can be caused by random initializations, noise, and over-expressibility of the ansatz [11,12].
Since its inception, several extensions/variants of the QAOA have been proposed. ADAPT-QAOA [13] is an iterative, problem-tailored version of QAOA that can adapt to specific hardware constraints. A non-local version, referred to as R-QAOA [14], recursively removes variables from the Hamiltonian until the remaining instance is small enough to be solved classically. Numerical evidence shows that this procedure significantly outperforms standard QAOA for frustrated Ising models on random three-regular graphs for the Max-Cut problem. WS-QAOA [15] uses solutions obtained by classical algorithms to warm-start the QAOA. Numerical evidence shows an advantage at low depth, in the form of a systematic increase in the size of the obtained cut for fully connected graphs with random weights.
There are two principal ways to take constraints into account when solving Equation (1) with the QAOA. The standard, simple approach is to penalize unsatisfied constraints in the objective function with the help of a so-called Lagrange multiplier λ , leading to
$$x^* = \underset{x \in \{0,1\}^n}{\arg\min} \; f(x) + \lambda\, g(x).$$
This approach is popular, since it is straightforward to define a phase-separating Hamiltonian for $f(x) + \lambda g(x)$. Some applications include the tail-assignment problem [16], the Max-k-Cut problem [17], graph coloring problems, and the traveling salesperson problem [18]. A downside of this approach is that infeasible solutions are also possible outcomes, especially for approximate solvers such as QAOA. This also makes the search space much bigger and the entire approach less efficient. In addition, the quality of the results turns out to be very sensitive to the chosen value of the hyperparameter $\lambda$. On one hand, $\lambda$ should be chosen large enough such that the lowest eigenstates of $H_P$ correspond to feasible solutions. On the other hand, too large values of $\lambda$ mean that the resulting optimization landscape in the $\gamma$ parameter has very high frequencies, which makes the problem hard to solve in practice. In general, it can be very challenging to find the (problem-dependent) value of $\lambda$ that best balances the tradeoff between optimality and feasibility in the objective function [19].
For QAOA, a second approach is to define mixers that have zero probability of going from a feasible state to an infeasible one, making the hyperparameter $\lambda$ of the previous approach unnecessary. However, it is generally more challenging to devise mixers that take constraints into account. The most prominent example in the literature is the XY-mixer [2,18,19], which constrains the evolution to states with nonzero overlap with “one-hot” states. One-hot states are computational basis states with exactly one entry equal to one. For instance, |0001⟩ and |010000⟩ are one-hot states, while |00⟩ and |110⟩ are not. The name XY-mixer comes from the related XY-Hamiltonian [20]. The mixers derived in the literature follow the physicists' intuition of using “hopping” terms. A performance analysis of the XY-mixer applied to the maximum k-vertex cover shows a heavy dependence on the initial states as well as on the chosen Trotterization [21].
QAOA can be viewed as a discretized version of quantum annealing. In quantum annealing, enforcing constraints via penalty terms is particularly “harmful”, since they often require all-to-all connectivity of the qubits [22]. The authors in [23] therefore introduce driver Hamiltonians that commute with the constraints of the problem. This bears similarities with and actually inspired the approaches in [2,18].
The main contributions of this article are:
  • A general framework to construct mixers restricted to a set of computational basis states; see Section 3.1.
  • An analysis of the underlying mathematical structure, which is largely independent of the actual states; see Section 3.2.
  • Efficient algorithms for decomposition into basis gates; see Section 3.3 and Section 3.5.
  • Valid Trotterizations, a topic that is not completely understood in the literature; see Section 3.5.
  • We prove that it is always possible to realize a valid Trotterization; see Theorem 3.
  • Improved efficiency of Trotterized mixers for “one-hot” states in Section 5.1.
  • Discussion of the general case, exemplified in Section 5.2.
We start by describing the general framework.

3. Construction of Constraint Preserving Mixers

In the following, we will derive a general framework for mixers that are restricted to a subspace given by certain basis states. For example, one may want to construct a mixer for five qubits that is restricted to the subspace $\mathrm{Sp}\{|01001\rangle, |11001\rangle, |11110\rangle\}$ of $(\mathbb{C}^2)^{\otimes 5}$, where $\mathrm{Sp}\,B$ denotes the linear span of B. In this section, we will describe the conditions for a Hamiltonian-based QAOA mixer to preserve the feasible subspace and to provide transitions between all pairs of feasible states. We also provide efficient algorithms to decompose these mixers into basis gates.

3.1. Conditions on the Mixer Hamiltonian

Theorem 1
(Mixer Hamiltonians for subspaces). Given a feasible subspace B as in Definition 1 and a real-valued transition matrix $T \in \mathbb{R}^{|J| \times |J|}$. Then, for the mixer constructed via
$$U_M(\beta) = e^{-i\beta H_M}, \qquad \text{where} \qquad H_M = \sum_{j,k \in J} (T)_{j,k}\, |x_j\rangle\langle x_k|,$$
the following statements hold.
  • If T is symmetric, the mixer is well defined and preserves the feasible subspace, i.e., condition (5) is fulfilled.
  • If T is symmetric and for all $1 \le j,k \le |J|$ there exists an $r \in \mathbb{N} \cup \{0\}$ (possibly depending on the pair) such that
    $$(T^r)_{j,k} \neq 0,$$
    then $U_M$ provides transitions between all pairs of feasible states, i.e., condition (6) is fulfilled.
Proof. 
Well-definedness. Almost trivially, $H_M$ is Hermitian if T is symmetric:
$$H_M^\dagger = \sum_{j,k \in J} (T)_{j,k}\, |x_k\rangle\langle x_j| = \sum_{j,k \in J} (T)_{k,j}\, |x_k\rangle\langle x_j| = H_M.$$
Since $H_M$ is a Hermitian (and therefore normal) matrix, there exists a diagonal matrix D, whose diagonal entries are the (real-valued) eigenvalues of $H_M$, and a matrix U, whose columns are the corresponding orthonormal eigenvectors. The mixer is therefore well defined through the convergent series
$$e^{-itH_M} = \sum_{m=0}^{\infty} \frac{(-it)^m H_M^m}{m!} = U e^{-itD} U^\dagger.$$
Reformulations. We can rewrite $H_M$ in the following way:
$$H_M : |y\rangle \mapsto \sum_{j,k \in J} (T)_{j,k}\, \langle x_k | y \rangle\, |x_j\rangle = E\, T\, E^T\, |y\rangle, \qquad |y\rangle \in \mathbb{C}^{2^n},$$
where the columns of the matrix $E \in \mathbb{R}^{2^n \times |J|}$ consist of the feasible computational basis states, i.e., $E = [\,x_j\,]_{j \in J}$; see Figure 1 for an illustration.
Using that $E^T E = I \in \mathbb{R}^{|J| \times |J|}$ is the identity matrix, we have that
$$H_M^m = E\, T^m E^T = \sum_{j,k \in J} (T^m)_{j,k}\, |x_j\rangle\langle x_k|, \qquad m \in \mathbb{N},$$
and Equation (13) can be written as
$$e^{-itH_M} = E \left( \sum_{m=0}^{\infty} \frac{(-it)^m T^m}{m!} \right) E^T.$$
Preservation of the feasible subspace. Let $|v\rangle \in \mathrm{Sp}\,B$. Using Equation (15), we know that
$$H_M^m |v\rangle = \sum_{j,k \in J} (T^m)_{j,k}\, |x_j\rangle\langle x_k | v\rangle = \sum_{j \in J} c_j\, |x_j\rangle \in \mathrm{Sp}\,B,$$
with coefficients $c_j \in \mathbb{C}$. Therefore, also $e^{-itH_M}|v\rangle \in \mathrm{Sp}\,B$ for all $t \in \mathbb{R}$, since it is a sum of such terms.
Transitions between all pairs of feasible states. For any pair of feasible computational basis states $|x_{j^*}\rangle, |x_{k^*}\rangle \in B$, we have that
$$f(t) = \langle x_{j^*} | U_M(t) | x_{k^*} \rangle = \langle x_{j^*} |\, \sum_{m=0}^{\infty} \frac{(-it)^m}{m!} \sum_{j,k \in J} (T^m)_{j,k}\, |x_j\rangle\langle x_k|\, | x_{k^*} \rangle = \sum_{m=0}^{\infty} \frac{(-it)^m}{m!}\, (T^m)_{j^*,k^*}.$$
It is enough to show that f(t) is not the zero function. Since $f : \mathbb{R} \to \mathbb{C}$ is an analytic function, it has a unique extension to $\mathbb{C}$. Assume that f is indeed the zero function on $\mathbb{R}$; then, the extension to $\mathbb{C}$ would also be the zero function, and all coefficients of its Taylor series would be zero. However, we assumed the existence of an $r \in \mathbb{N} \cup \{0\}$ such that $|(T^r)_{j^*,k^*}| > 0$, and hence there exists a nonzero coefficient, which is a contradiction to f being the zero function.   □
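The construction $H_M = E\,T\,E^T$ is easy to play with numerically. The following is a minimal sketch (assumed code, not the authors' implementation) that builds $H_M$ for a small feasible set and checks that the exponential stays inside the feasible subspace:

```python
# Minimal sketch of Theorem 1: build H_M = E T E^T for a feasible set B and
# verify numerically that exp(-i*beta*H_M) maps Sp(B) into Sp(B).
import numpy as np
from scipy.linalg import expm

def basis_state(bits):
    """Column vector |z> for a bitstring z (leftmost character = most significant)."""
    v = np.zeros(2 ** len(bits))
    v[int("".join(map(str, bits)), 2)] = 1.0
    return v

B = [(0, 1, 0, 0, 1), (1, 1, 0, 0, 1), (1, 1, 1, 1, 0)]
E = np.column_stack([basis_state(z) for z in B])        # shape (2^n, |J|)
T = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # symmetric transition matrix
H_M = E @ T @ E.T

U_M = expm(-1j * 0.7 * H_M)
v = E @ np.array([0.3, -0.5, 0.8])        # an arbitrary vector in Sp(B)
w = U_M @ v
print(np.allclose(E @ (E.T @ w), w))      # True: w still lies in Sp(B)
```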
A natural question is how the statements in Theorem 1 depend on the particular ordering of the elements of B.
Corollary 1
(Independence of the ordering of B). Statements in Theorem 1 that hold for a particular ordering of the computational basis states of a given B also hold for any permutation $\pi : \{1, \ldots, |J|\} \to \{1, \ldots, |J|\}$, i.e., they are independent of the ordering of the elements. For each ordering, the transition matrix T changes according to $T_\pi = P_\pi^T\, T\, P_\pi$, where $P_\pi$ is the permutation matrix associated with π.
Proof. 
We start by pointing out that the inverse of $P_\pi$ exists and can be written as $P_\pi^{-1} = P_{\pi^{-1}} = P_\pi^T$.
The resulting matrix $H_M$ is unchanged. Following the derivation in Equation (14), we have that $H_M^\pi = E_\pi\, T_\pi\, E_\pi^T$, where the columns of the matrix $E_\pi \in \mathbb{R}^{2^n \times |J|}$ consist of the permuted feasible computational basis states, i.e., $E_\pi = [\,x_{\pi(j)}\,]_{j \in J}$. Inserting $T_\pi = P_\pi^T T P_\pi$, we indeed have $H_M^\pi = E_\pi T_\pi E_\pi^T = (E_\pi P_\pi^T)\, T\, (P_\pi E_\pi^T) = E\, T\, E^T = H_M$.
$T_\pi$ is symmetric if T is. Assuming that $T^T = T$, we also have
$$(T_\pi)^T = (P_\pi^T T P_\pi)^T = P_\pi^T T^T P_\pi = P_\pi^T T P_\pi = T_\pi.$$
If the condition in Equation (11) holds for T, then it also holds for $T_\pi$. Using $T_\pi^r = P_\pi^T T^r P_\pi$, we can show that Equation (11) holds for the permuted index pair $(\pi(j), \pi(k))$ for $T_\pi$ if it holds for $(j,k)$ for T.    □
In the following, if nothing else is remarked, computational basis states are ordered with respect to increasing integer value, e.g., $|001\rangle, |010\rangle, |111\rangle$.
Apart from special cases, there is a lot of freedom to choose the transition matrix T that fulfills the conditions of Theorem 1. The entries of T will heavily influence the circuit complexity, which will be investigated in Section 3.3. In addition, we have the following property which adds additional flexibility to develop efficient mixers.
Corollary 2
(Properties of mixers). For a given feasible subspace $\mathrm{Sp}\,B$, let $U_{M,B}$ be the mixer given by Theorem 1. For any subspace $\mathrm{Sp}\,C$ with $\mathrm{Sp}\,B \cap \mathrm{Sp}\,C = \{0\}$, or equivalently $B \cap C = \emptyset$, also $U_M = U_{M,B}\, U_{M,C}$ is a valid mixer for B satisfying the conditions of Equations (5) and (6); see also Figure 2.
Proof. 
Any $|v\rangle \in \mathrm{Sp}\,B$ is in the null space of $H_{M,C}$, i.e., $H_{M,C}|v\rangle = 0$, and hence $U_{M,C}|v\rangle = |v\rangle$. Therefore, $U_{M,B} U_{M,C}|v\rangle = U_{M,B}|v\rangle \in \mathrm{Sp}\,B$, and $U_{M,C} U_{M,B}|v\rangle = U_{M,C}|w\rangle = |w\rangle$ with $|w\rangle \in \mathrm{Sp}\,B$, which means that the feasible subspace is preserved. Condition (6) follows similarly from the fact that $U_{M,C}$ acts as the identity on any $|v\rangle \in \mathrm{Sp}\,B$.    □
Corollary 2 naturally holds as well for any linear combination of mixers, i.e., $H_{M,B} + \sum_i a_i H_{M,C_i}$ is a mixer for the feasible subspace $\mathrm{Sp}\,B$ as long as $\mathrm{Sp}\,C_i \cap \mathrm{Sp}\,B = \{0\}$ for all i. At first, it might sound counterintuitive that adding more terms to the mixer can result in a more efficient decomposition into basis gates. However, as we will see in Section 5, it can lead to cancellations due to symmetry considerations.
Next, we describe the structure of the eigensystem of U M .
Corollary 3
(Eigensystem of mixers). Given the setting in Theorem 1 with a symmetric transition matrix T. Let $(\lambda, v)$ be an eigenpair of T; then $(\lambda, Ev)$ is an eigenpair of $H_M$ and $(e^{-it\lambda}, Ev)$ is an eigenpair of $U_M$, where $E = [\,|x_j\rangle\,]_{j \in J}$ as defined in Equation (14).
Proof. 
Let $(\lambda, v)$ be an eigenpair of T. Then, $H_M E v = E\, T\, E^T E\, v = E\, T\, v = \lambda\, E v$, so $(\lambda, Ev)$ is an eigenpair of $H_M$. The connection between $H_M$ and $U_M$ is general knowledge from linear algebra.    □
An example illustrating Corollary 3 is provided by the transition matrix $T \in \mathbb{R}^{4 \times 4}$ with zero diagonal and all other entries equal to one. A unit eigenvector of T, which fulfills Theorem 1, is $v = \tfrac{1}{2}(1,1,1,1)^T$. For any $B = \{|z_1\rangle, |z_2\rangle, |z_3\rangle, |z_4\rangle\}$, the uniform superposition of these states is an eigenvector, since
$$\frac{1}{\|v\|_2}\, E\, v = \frac{1}{2}\left( |z_1\rangle,\ |z_2\rangle,\ |z_3\rangle,\ |z_4\rangle \right) (1,1,1,1)^T = \frac{1}{2}\left( |z_1\rangle + |z_2\rangle + |z_3\rangle + |z_4\rangle \right).$$
This result holds irrespective of what the states are and which dimension they have.
Theorem 2
(Products of mixers for subspaces). Given the same setting as in Theorem 1. For any decomposition of T into a sum of Q symmetric matrices $T_q$, in the sense that
$$T = \sum_{q=1}^{Q} T_q, \qquad (T_q)_{i,j} = (T_q)_{j,i} = \text{either } (T)_{i,j} \text{ or } 0,$$
we construct the mixing operator via
$$U_M(\beta) = \prod_{n=1}^{N} e^{-i\beta T_{q_n}}, \qquad q_n \in \{1, 2, \ldots, Q\}.$$
If all entries of T are positive, then $U_M$ provides transitions between all pairs of feasible states, i.e., condition (6) is fulfilled, if for all $1 \le j,k \le |J|$ there exist $r_m \in \mathbb{N} \cup \{0\}$ (possibly depending on the pair) such that
$$\left( \prod_{m=1}^{M} T_{q_m}^{\,r_m} \right)_{j,k} \neq 0, \qquad q_m \in \{1, \ldots, Q\}.$$
Proof. 
Combining Equations (15) and (16), we have
$$\langle x_j | U_M(\beta) | x_k \rangle = \sum_{j_1, j_2, \ldots, j_M = 0}^{\infty} \frac{(-it)^{j_1 + j_2 + \cdots + j_M}}{j_1!\, j_2! \cdots j_M!} \left( T_{q_1}^{\,j_1} T_{q_2}^{\,j_2} \cdots T_{q_M}^{\,j_M} \right)_{j,k} = \sum_{j=0}^{\infty} \frac{(-it)^{j}}{j!} \sum_{\substack{j_1, \ldots, j_M \\ j_1 + \cdots + j_M = j}} \frac{j!}{j_1! \cdots j_M!} \left( T_{q_1}^{\,j_1} \cdots T_{q_M}^{\,j_M} \right)_{j,k}.$$
Using that T only has positive entries and the condition in Equation (20), the same argument as in Theorem 1 can be used to show that this overlap is not the zero function, and therefore, we have transitions between all pairs of feasible states.    □
As Theorem 1 leaves a lot of freedom for choosing valid transition matrices, we will continue by describing important examples for T.

3.2. Transition Matrices for Mixers

Theorem 1 provides conditions for the construction of mixer Hamiltonians that preserve the feasible subspace and provide transitions between all pairs of feasible computational basis states, namely
  • $T \in \mathbb{R}^{|J| \times |J|}$ is symmetric; and
  • for all $1 \le j,k \le |J|$ there exists an $r_{j,k} \in \mathbb{N} \cup \{0\}$ such that $(T^{r_{j,k}})_{j,k} \neq 0$.
Remarkably, these conditions depend only on the dimension of the feasible subspace $|J| = \dim(\mathrm{Sp}\,B) = |B|$; they are independent of the specific states that constitute B. In addition, Corollary 1 shows that these conditions are robust with respect to a reordering of the rows if the columns are reordered in the same way. Moreover, Equation (17) also shows that the overlap between computational basis states $|x_j\rangle, |x_k\rangle \in B$ is independent of the specific states that B consists of and depends only on T, since the right-hand side of the expression
$$\langle x_j | U_M(t) | x_k \rangle = \sum_{m=0}^{\infty} \frac{(-it)^m}{m!}\, (T^m)_{j,k}$$
is independent of the elements in B. This allows us to describe and analyze valid transition matrices knowing only the number of feasible states, i.e., $|B|$. What these specific states are is irrelevant, unless one wants to determine an optimal mixer, which we come back to in Section 3.4. Figure 3 provides a comparison of some of the mixers described in the following with respect to the overlap between different states.
In the following, we denote the matrix for pairs of indices whose binary representations have a Hamming distance equal to d as
$$T_{\mathrm{Ham}(d)}, \quad \text{with} \quad \left(T_{\mathrm{Ham}(d)}\right)_{i,j} = \begin{cases} 1, & \text{if } d_{\mathrm{Hamming}}\!\left(\mathrm{bin}(i), \mathrm{bin}(j)\right) = d, \\ 0, & \text{else}. \end{cases}$$
Examples of the structure of T Ham ( d ) can be found in Figure 4.
Furthermore, it will be useful to denote the matrix which has its two nonzero entries at (k,l) and (l,k) as
$$T^{k \leftrightarrow l}, \quad \text{with} \quad \left(T^{k \leftrightarrow l}\right)_{i,j} = \begin{cases} 1, & \text{if } (i,j) = (k,l) \text{ or } (i,j) = (l,k), \\ 0, & \text{else}. \end{cases}$$
Before we start, we point out that the diagonal entries of T can be chosen to be zero, because $|(T^0)_{j,j}| = 1 \neq 0$ for all $j \in J$. Although trivial, we will repeatedly use that $v = \tfrac{1}{\sqrt{|J|}}(1,1,\ldots,1)^T$ is an eigenvector of a matrix $F \in \mathbb{C}^{|J| \times |J|}$ if the vector of row sums is a multiple of v, i.e., if all row sums are equal.
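For concreteness, the following is a minimal sketch (assumed code, not from the paper's repository) of how these transition matrices can be constructed; note that Python indices are 0-based, whereas the text uses 1-based indices:

```python
# Minimal sketch: the Hamming-distance-d matrix T_Ham(d) and the single-pair
# matrix T^{k<->l} introduced above, as plain NumPy arrays.
import numpy as np

def t_hamming(d, size):
    T = np.zeros((size, size))
    for i in range(size):
        for j in range(size):
            if bin(i ^ j).count("1") == d:   # Hamming distance of binary representations
                T[i, j] = 1.0
    return T

def t_pair(k, l, size):
    T = np.zeros((size, size))
    T[k, l] = T[l, k] = 1.0
    return T

print(t_hamming(1, 4))   # the pattern behind the "standard" mixer for |J| = 4
print(t_pair(0, 2, 4))   # a single symmetric transition between states 0 and 2
```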

3.2.1. Hamming Distance One Mixer T Ham ( 1 )

The matrix $T_{\mathrm{Ham}(1)} \in \mathbb{R}^{|J| \times |J|}$ fulfills Theorem 1 when $|J| = 2^n$, $n \in \mathbb{N}$. The symmetry of $T_{\mathrm{Ham}(1)}$ is due to the fact that the Hamming distance is a symmetric function. Using the identity
$$T_{\mathrm{Ham}(k)}\, T_{\mathrm{Ham}(1)} = T_{\mathrm{Ham}(1)}\, T_{\mathrm{Ham}(k)} = \left(n - (k-1)\right) T_{\mathrm{Ham}(k-1)} + (k+1)\, T_{\mathrm{Ham}(k+1)},$$
it can be shown that
$$T_{\mathrm{Ham}(1)}^{\,k} = \sum_{j=1}^{k-1} c_j\, T_{\mathrm{Ham}(j)} + k!\, T_{\mathrm{Ham}(k)},$$
where the $c_j$ are real coefficients. Therefore, it is clear that $T_{\mathrm{Ham}(1)}^{\,k}$ reaches all states with Hamming distance k. Furthermore, $v = \tfrac{1}{\sqrt{2^n}}(1,1,\ldots,1)^T$ is a unit eigenvector of $T_{\mathrm{Ham}(1)}$, since the sum of each row is n. This is because, for each bitstring, there are exactly n other states with a Hamming distance of one.

3.2.2. All-to-All Mixer T A

We denote the matrix with all but the diagonal entries equal to one as
$$T_A, \quad \text{with} \quad (T_A)_{i,j} = \begin{cases} 1, & \text{if } i \neq j, \\ 0, & \text{else}. \end{cases}$$
Trivially, $T_A \in \mathbb{R}^{|J| \times |J|}$ fulfills Theorem 1, and $v = \tfrac{1}{\sqrt{|J|}}(1,1,\ldots,1)^T$ is a unit eigenvector of $T_A$, since the sum of each row is $|J| - 1$.

3.2.3. (Cyclic) Nearest Integer Mixer T Δ / T Δ , c

Inspired by the stencil of finite-difference methods, we introduce $T_\Delta, T_{\Delta,c} \in \mathbb{R}^{|J| \times |J|}$ as the matrices with entries on the first off-diagonals equal to one,
$$(T_\Delta)_{i,j} = \begin{cases} 1, & \text{if } i = j+1 \text{ or } i = j-1, \\ 0, & \text{else}, \end{cases} \qquad (T_{\Delta,c})_{i,j} = \begin{cases} 1, & \text{if } i = (j+1) \bmod n \text{ or } i = (j-1) \bmod n, \\ 0, & \text{else}. \end{cases}$$
Both matrices fulfill Theorem 1. Symmetry holds by definition, and it is easy to see that the k-th off-diagonal of $T_\Delta^{\,k}$ and $T_{\Delta,c}^{\,k}$ is nonzero for $1 \le k \le |J|$.
For the nearest integer mixer $T_\Delta$, it is known that
$$v_k = \left( \sin(c), \sin(2c), \ldots, \sin(|J|\,c) \right)^T, \qquad c = \frac{k\pi}{|J|+1},$$
are eigenvectors for $1 \le k \le |J|$. For the cyclic nearest integer mixer, the sum of each row/column of $T_{\Delta,c}$ is equal to two (except for n = 1, when it is one). Therefore, $v = \tfrac{1}{\sqrt{|J|}}(1,1,\ldots,1)^T$ is a unit eigenvector.

3.2.4. Products of Mixers and T E , T O

In some cases, it will be necessary to use Theorem 2 to implement mixer unitaries. When splitting transition matrices into odd and even entries, the following definition is useful. Denote the matrix with entries on the d-th off-diagonal for even rows equal to one as
$$T_{E(d)}, \quad \text{with} \quad \left(T_{E(d)}\right)_{i,j} = \begin{cases} 1, & \text{if } i = j+d \text{ or } i = j-d, \text{ and } i \text{ even}, \\ 0, & \text{else}, \end{cases}$$
and accordingly $T_{O(d)}$ for odd rows. In addition, we will use $T_{O(1),c}$ for the cyclic version, defined in the same way as in Equation (28). As an example, this allows one to decompose $T_{\Delta,c} = T_1 + T_2 \in \mathbb{R}^{n \times n}$ with $T_1 = T_{O(1)} + T_{O(n-1)} = T_{O(1),c}$ and $T_2 = T_{E(1)}$; a small sketch of this splitting follows below.
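The splitting of the cyclic nearest-integer matrix into two sets of disjoint pairs can be checked with a few lines of code. This is a minimal sketch (assumed code, with 0-based indices) of one such parity-partitioned realization:

```python
# Minimal sketch: split the ring 0-1-2-...-(n-1)-0 into two sets of disjoint
# edges T1 ("even" edges) and T2 ("odd" edges) whose sum is the cyclic matrix.
import numpy as np

def t_pair(k, l, size):
    T = np.zeros((size, size))
    T[k, l] = T[l, k] = 1.0
    return T

n = 6
T1 = sum(t_pair(i, (i + 1) % n, n) for i in range(0, n, 2))
T2 = sum(t_pair(i, (i + 1) % n, n) for i in range(1, n, 2))

T_cyclic = np.zeros((n, n))
for i in range(n):
    T_cyclic[i, (i + 1) % n] = T_cyclic[(i + 1) % n, i] = 1.0

print(np.array_equal(T1 + T2, T_cyclic))  # True (for even n)
```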

3.2.5. Random Mixer T rand

Finally, the upper triangular entries of the mixer T rand are drawn from a continuous uniform distribution on the interval [ 0 , 1 ] , and the lower triangular entries are chosen such that T becomes symmetric. Since the probability of getting a zero entry is zero, such a random mixer fulfills Theorem 1 with probability 1.

3.3. Decomposition of (Constraint) Mixers into Basis Gates

Given a set of feasible (computational basis) states $B = \{|x_j\rangle,\ j \in J,\ x_j \in \{0,1\}^n\}$, we can use Theorem 1 to define a suitable mixer Hamiltonian. The next question is how to (efficiently) decompose the resulting mixer into basis gates. In order to do so, we first decompose the Hamiltonian $H_M$ into a weighted sum of Pauli-strings. A Pauli-string P is a Hermitian operator of the form $P = P_1 \otimes \cdots \otimes P_n$, where $P_i \in \{I, X, Y, Z\}$. Pauli-strings form a basis of the real vector space of all n-qubit Hermitian operators. Therefore, we can write
$$H_M = \sum_{i_1, \ldots, i_n = 1}^{4} c_{i_1, \ldots, i_n}\, \sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}, \qquad c_{i_1, \ldots, i_n} \in \mathbb{R},$$
with real coefficients $c_{i_1, \ldots, i_n}$, where $\sigma_1 = I$, $\sigma_2 = X$, $\sigma_3 = Y$, $\sigma_4 = Z$. After applying a standard Trotterization scheme [4,5] (which is exact for commuting Pauli-strings),
$$U_M(t) = e^{-itH_M} \approx \prod_{\substack{i_1, \ldots, i_n = 1 \\ |c_{i_1, \ldots, i_n}| > 0}}^{4} e^{-it\, c_{i_1, \ldots, i_n}\, \sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}},$$
it is well established how to implement each of the terms of the product using basis gates; see Equation (33). We will discuss the effects of Trotterization in more detail in Section 3.5, as there are several important aspects to consider for a valid mixer.
[Figure in the published version (Equation (33)): quantum circuit implementing $e^{-it\,c\,\sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}}$ with basis gates, using the single-qubit basis changes $U_i$ defined below.]
$$U_i = \begin{cases} H, & \text{if } P_i = X, \\ S\,H, & \text{if } P_i = Y, \\ I, & \text{if } P_i = Z, \end{cases} \qquad \left(U^\dagger\right)_i = \begin{cases} H, & \text{if } P_i = X, \\ H\,S^\dagger, & \text{if } P_i = Y, \\ I, & \text{if } P_i = Z. \end{cases}$$
Here, S is the S or Phase gate and H is the Hadamard gate. The standard way to compute the coefficients c i 1 , , i n is given in Algorithm 1.  
Algorithm 1: Decompose H M given by Equation (10) into Pauli-strings via trace
[Algorithm 1 appears as a figure in the published version: it computes the coefficients of the Pauli-string decomposition of $H_M$ via the trace, $c_{i_1,\ldots,i_n} = \operatorname{Tr}\!\left(\sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}\, H_M\right)/2^n$.]
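Since the pseudocode itself is only available as a figure, the following is a minimal sketch (assumed code, not the authors' implementation) of this trace-based decomposition:

```python
# Minimal sketch of the trace-based Pauli decomposition: for a Hermitian H on
# n qubits, every coefficient is c_P = Tr(P @ H) / 2^n, and H = sum_P c_P * P.
import itertools
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_decompose(H):
    """Return {Pauli-string: coefficient}, looping over all 4^n strings."""
    n = int(np.log2(H.shape[0]))
    coeffs = {}
    for labels in itertools.product("IXYZ", repeat=n):
        P = np.array([[1.0]], dtype=complex)
        for l in labels:
            P = np.kron(P, PAULIS[l])
        c = np.trace(P @ H).real / 2 ** n
        if abs(c) > 1e-12:
            coeffs["".join(labels)] = c
    return coeffs

# Example: |01><10| + |10><01| on two qubits decomposes into (XX + YY)/2.
H = np.zeros((4, 4), dtype=complex)
H[1, 2] = H[2, 1] = 1.0
print(pauli_decompose(H))  # {'XX': 0.5, 'YY': 0.5}
```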
For n qubits, this requires computing $4^n$ coefficients, as well as multiplications of $2^n \times 2^n$ matrices. However, most of these terms are expected to vanish. We therefore describe an alternative way to produce this decomposition, using the language of quantum mechanics [24]. In the following, we use the ladder operators appearing in the creation and annihilation operators of the second quantization formulation in quantum chemistry, defined by
$$a = \tfrac{1}{2}\left(X + iY\right), \qquad a^\dagger = \tfrac{1}{2}\left(X - iY\right).$$
Since $a|0\rangle = 0$ and $a|1\rangle = |0\rangle$, where 0 is the zero vector, we have that $|0\rangle\langle 1| = a$. Since $a^\dagger|0\rangle = |1\rangle$ and $a^\dagger|1\rangle = 0$, we have that $|1\rangle\langle 0| = a^\dagger$; finally, $a^\dagger a|0\rangle = 0$, $a^\dagger a|1\rangle = |1\rangle$, $a a^\dagger|0\rangle = |0\rangle$, $a a^\dagger|1\rangle = 0$ means that $|0\rangle\langle 0| = a a^\dagger$ and $|1\rangle\langle 1| = a^\dagger a$. Note that
$$a^\dagger a = \tfrac{1}{2}\left(I - Z\right), \qquad a a^\dagger = \tfrac{1}{2}\left(I + Z\right).$$
As an example, consider the matrix $M = |01\rangle\langle 10| = |0\rangle\langle 1| \otimes |1\rangle\langle 0|$, which can be expressed with ladder operators as $M = a_1\, a_2^\dagger$. Another example is given by $M = |01\rangle\langle 11| = a_1\, a_2^\dagger a_2$. This approach clearly extends to the general case and leads to Algorithm 2.
Algorithm 2: Decompose H M given by Equation (10) into Pauli-strings directly
[Algorithm 2 appears as a figure in the published version: for each nonzero entry $(T)_{j,k}$, it expresses $|x_j\rangle\langle x_k| + |x_k\rangle\langle x_j|$ qubit-wise via the ladder operators above and expands the result symbolically into Pauli-strings.]
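In the same spirit, the following is a minimal sketch (assumed code, not the published Algorithm 2) that expands $|z\rangle\langle w| + |w\rangle\langle z|$ directly into Pauli-strings by writing each single-qubit factor in the Pauli basis, e.g. $|0\rangle\langle 1| = \tfrac{1}{2}(X + iY)$, and multiplying out the tensor product symbolically:

```python
# Minimal sketch of a ladder-operator-style decomposition: represent each
# single-qubit factor |z_i><w_i| as a dictionary {Pauli label: coefficient}
# and expand the tensor product, collecting the surviving Pauli-strings.
import itertools
from collections import defaultdict

SINGLE = {
    (0, 0): {"I": 0.5, "Z": 0.5},    # |0><0| = (I + Z)/2
    (0, 1): {"X": 0.5, "Y": 0.5j},   # |0><1| = (X + iY)/2
    (1, 0): {"X": 0.5, "Y": -0.5j},  # |1><0| = (X - iY)/2
    (1, 1): {"I": 0.5, "Z": -0.5},   # |1><1| = (I - Z)/2
}

def ketbra_paulis(z, w):
    """Pauli decomposition of |z><w| for bit tuples z, w."""
    terms = defaultdict(complex)
    factors = [SINGLE[(zi, wi)] for zi, wi in zip(z, w)]
    for labels in itertools.product(*[f.items() for f in factors]):
        string = "".join(l for l, _ in labels)
        coeff = 1.0 + 0j
        for _, c in labels:
            coeff *= c
        terms[string] += coeff
    return terms

def symmetric_term(z, w):
    """Pauli decomposition of |z><w| + |w><z| (only nonzero strings)."""
    terms = ketbra_paulis(z, w)
    for s, c in ketbra_paulis(w, z).items():
        terms[s] += c
    return {s: c.real for s, c in terms.items() if abs(c) > 1e-12}

print(symmetric_term((0, 1), (1, 0)))  # {'XX': 0.5, 'YY': 0.5}
```

The expansion only involves the nonzero entries of T and small per-qubit dictionaries, which avoids the $4^n$ trace evaluations of the naive approach.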
A comparison of the complexity of the two algorithms is given in Table 1. The naive algorithm needs to perform a matrix–matrix multiplication with matrices of size 2 n × 2 n for each of the 4 n coefficients. This quickly becomes prohibitive for larger n. The algorithm based on ladder operators requires resources that scale with the number of nonzero entries of the transition matrix T, which is much more favorable. In the end, a symbolic mathematics library is used to simplify the expressions in order to create the list of nonzero Pauli-strings.

3.4. Optimality of Mixers

On current NISQ devices, the gate times and error rates of two-qubit gates (CX) are roughly one order of magnitude higher than those of single-qubit gates ($U_3$). In addition, most devices lack all-to-all connectivity; CX gates between unconnected qubits require SWAP operations, which consist of additional CX gates. An optimal mixer will therefore contain as few CX gates as possible. Since Pauli-strings are implemented according to Equation (33), we define the cost to implement $e^{-itH_M}$ as
$$\mathrm{Cost}(H_M) = \sum_{\substack{i_1, \ldots, i_n = 1 \\ |c_{i_1, \ldots, i_n}| > 0 \\ \mathrm{len}(\sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}) > 1}}^{4} 2\left( \mathrm{len}(\sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}) - 1 \right),$$
where $\mathrm{len}(P)$ is the length of a Pauli-string P, defined as the number of literals that are not the identity. For instance, $P = I \otimes X \otimes I \otimes I \otimes Y = I_1 X_2 I_3 I_4 Y_5 = X_2 Y_5$ has $\mathrm{len}(P) = 2$. $\mathrm{Cost}(H_M)$ specifies the number of CX gates required to implement the mixer; a lower cost means fewer and/or shorter Pauli-strings. A small sketch of this cost measure is given below. There are four interconnected factors that influence the cost of implementing the mixer for a given B.
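The following is a minimal sketch (assumed code) of this cost measure for a mixer given as a list of Pauli-strings:

```python
# Minimal sketch: each Pauli-string with more than one non-identity literal
# contributes 2 * (len(P) - 1) CX gates; shorter/fewer strings mean lower cost.
def pauli_length(string):
    """Number of non-identity literals in a Pauli-string such as 'IXIIY'."""
    return sum(1 for p in string if p != "I")

def cost(pauli_strings):
    return sum(2 * (pauli_length(p) - 1)
               for p in pauli_strings if pauli_length(p) > 1)

# The n = 2 "XY" mixer (XX + YY)/2 needs 2 + 2 = 4 CX gates.
print(cost(["XX", "YY"]))  # 4
```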

3.4.1. Transition Matrix T

The larger | B | , the more freedom we have in choosing the transition matrix T that fulfills Theorem 1. The combination of T and the specific states of B define the cost of the Hamiltonian. Unless one can find a way to utilize the structure of the states of B to efficiently compute an optimal T, we expect this problem to be NP-hard. In practice, a careful analysis of the specific states of B is required to determine T such that the cost becomes low. We will revisit optimality for both unrestricted and restricted mixers in Section 4 and Section 5.

3.4.2. Adding Mixers

Corollary 2 allows one to add mixers whose kernel contains $\mathrm{Sp}\,B$. In general, this is also a combinatorial optimization problem, which we do not expect to solve exactly with an efficient algorithm. However, we will provide a heuristic that can be used to reduce the cost of mixers in certain cases. We will provide more details in Section 5, where we discuss constrained mixers for some examples in detail.

3.4.3. Non-Commuting Pauli-Strings

Depending on the mixer—which depends on the transition matrix and addition of mixers outside the feasible subspace—one can influence the commutativity pattern of the resulting Pauli-strings. This is an intricate topic, which we discuss next.

3.5. Trotterizations

Algorithms 1 and 2 produce a weighted sum of Pauli-strings equal to the mixer Hamiltonian $H_M$ defined in Theorem 1. A further complication arises when the non-vanishing Pauli-strings of the mixer Hamiltonian $H_M$ do not all commute. In that case, one cannot realize $U_M$ exactly but has to find a suitable approximation/Trotterization; see Equation (32). Two Pauli-strings commute, i.e., $[P_A, P_B] = P_A P_B - P_B P_A = 0$, if, and only if, they fail to commute on an even number of indices [25]. An example is given in Figure 5.
This problem is similar to a problem for observables: how does one divide the Pauli-strings into groups of commuting families [25,26] to maximize efficiency and increase accuracy? In order to minimize the number of measurements required to estimate a given observable, one wants to find a “min-commuting-partition”; given a set of Pauli-strings from a Hamiltonian, one seeks to partition the strings into commuting families such that the total number of partitions is minimized. This problem is NP-hard in general [25]. However, based on Theorem 3, we expect our problem to be much more tractable.
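The pairwise commutation criterion used above (two Pauli-strings commute if and only if they fail to commute on an even number of indices) is simple to check in code. The following is a minimal sketch (assumed code):

```python
# Minimal sketch: two Pauli-strings commute if and only if they have different
# non-identity letters on an even number of positions.
def strings_commute(p, q):
    anticommuting = sum(1 for a, b in zip(p, q)
                        if a != "I" and b != "I" and a != b)
    return anticommuting % 2 == 0

print(strings_commute("XXI", "YYI"))  # True  (two anti-commuting positions)
print(strings_commute("XXI", "IYY"))  # False (one anti-commuting position)
```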
For our case, it turns out that not all Trotterizations are suitable as mixing operators; they can either fail to preserve the feasible subspace, i.e., Equation (5), or fail to provide transitions between all pairs of feasible states, i.e., Equation (6). An example is given by $B = \{|001\rangle, |010\rangle, |100\rangle\}$ with the mixer $H_M = \tfrac{1}{2}(XXI + YYI) + \tfrac{1}{2}(IXX + IYY)$ associated with $T_\Delta = T^{1\leftrightarrow 2} + T^{2\leftrightarrow 3}$; see Section 5.1. Looking at Figure 5, these terms can be grouped into commuting families in two ways, which represent two (of many) different ways to realize the mixer unitary with basis gates.
  • The first possible Trotterization is given by $U_1(\beta) = e^{-i\beta(XXI + IXX)}$ and $U_2(\beta) = e^{-i\beta(YYI + IYY)}$. However, it turns out that there exist $\beta \in \mathbb{R}$ such that $|\langle 111 | U_1(\beta) U_2(\beta) | z \rangle| > 0$ for $|z\rangle \in B$. This means that this Trotterization does not preserve the feasible subspace and does not represent a valid mixer. The underlying reason for this is that the terms XXI and YYI are generated from the entry $T^{1\leftrightarrow 2}$ but are split in this Trotterization. The same holds true for IXX and IYY, which are generated via $T^{2\leftrightarrow 3}$.
  • The second possible Trotterization is given by $U_1(\beta) = e^{-i\beta(XXI + YYI)}$ and $U_2(\beta) = e^{-i\beta(IXX + IYY)}$, which splits the terms with respect to $T^{1\leftrightarrow 2}$ and $T^{2\leftrightarrow 3}$. In this case, we have that $|\langle 100 | U_1(\beta) U_2(\beta) | 001 \rangle| = 0$, so it does not provide an overlap between all feasible computational basis states. This can be understood via Theorem 2: we have that $\left( (T^{1\leftrightarrow 2})^{n_1} (T^{2\leftrightarrow 3})^{n_2} \right)_{3,1} = 0$ for all $n_1, n_2 \in \mathbb{N}$, so one cannot “reach” |100⟩ from |001⟩. The opposite is not true; we have that $\left( T^{1\leftrightarrow 2}\, T^{2\leftrightarrow 3} \right)_{1,3} = 1$, so there exists a $\beta$ such that $|\langle 001 | U_1(\beta) U_2(\beta) | 100 \rangle| > 0$.
We have just learned that it is a bad idea to split, during Trotterization, terms that belong to the same nonzero entry of T, i.e., to one $T^{i\leftrightarrow j}$. Therefore, we need to show that all non-vanishing Pauli-strings of $|x_j\rangle\langle x_i| + |x_i\rangle\langle x_j|$ commute; otherwise, there might exist subspaces for which we cannot realize the mixer constructed in Theorem 1. Luckily, the following theorem shows that it is always possible to realize a mixer by Trotterizing according to the nonzero entries of $T = \sum_{i,j \in J,\, i<j} T^{i\leftrightarrow j}$.
Theorem 3
(Pauli-strings for $T^{i\leftrightarrow j}$ commute). Let $|z\rangle, |w\rangle$ be two computational basis states in $(\mathbb{C}^2)^{\otimes n}$. Then, all non-vanishing Pauli-strings $\sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}$ of the decomposition
$$|z\rangle\langle w| + |w\rangle\langle z| = \sum_{i_1, \ldots, i_n = 1}^{4} c_{i_1, \ldots, i_n}\, \sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}, \qquad c_{i_1, \ldots, i_n} \in \mathbb{R},$$
commute. 
Proof. 
We will prove the following more general assertion by induction. Let $P_{+,1}, P_{+,2}$ be two non-vanishing Pauli-strings of the decomposition of $|z\rangle\langle w| + |w\rangle\langle z|$, and let $P_{-,1}, P_{-,2}$ be two non-vanishing Pauli-strings of the decomposition of $i\left(|z\rangle\langle w| - |w\rangle\langle z|\right)$. Then, $[P_{+,1}, P_{+,2}] = 0$, $[P_{-,1}, P_{-,2}] = 0$ and $[P_{+,\cdot}, P_{-,\cdot}] \neq 0$. We will use that two Pauli-strings commute if, and only if, they fail to commute on an even number of indices [25].
For n = 1 , we have the following cases.
$$A_1 = |z\rangle\langle w| + |w\rangle\langle z| = \begin{cases} I + Z, & \text{if } (z,w) = (0,0), \\ X, & \text{if } (z,w) = (0,1), \\ X, & \text{if } (z,w) = (1,0), \\ I - Z, & \text{if } (z,w) = (1,1), \end{cases} \qquad B_1 = i\left(|z\rangle\langle w| - |w\rangle\langle z|\right) = \begin{cases} 0, & \text{if } (z,w) = (0,0), \\ -Y, & \text{if } (z,w) = (0,1), \\ +Y, & \text{if } (z,w) = (1,0), \\ 0, & \text{if } (z,w) = (1,1). \end{cases}$$
It is trivially true that $[P_{+,1}, P_{+,2}] = 0$ and $[P_{-,1}, P_{-,2}] = 0$, since the maximum number of Pauli-strings is two, and in that case, one of the Pauli-strings is the identity. Moreover, $P_{-,\cdot}$ is nonzero only when $z \neq w$. In that case, $[P_{+,\cdot}, P_{-,\cdot}] = [X, \pm Y] \neq 0$.
$n \to n+1$. We assume the assertions hold for two computational basis states $|z\rangle, |w\rangle \in (\mathbb{C}^2)^{\otimes n}$. Then, there are the following four cases:
$$A_{n+1} = |zx\rangle\langle wy| + |wy\rangle\langle zx| = \frac{1}{2}\begin{cases} A_n \otimes (I+Z), & \text{if } (x,y) = (0,0), \\ A_n \otimes X + B_n \otimes Y, & \text{if } (x,y) = (0,1), \\ A_n \otimes X - B_n \otimes Y, & \text{if } (x,y) = (1,0), \\ A_n \otimes (I-Z), & \text{if } (x,y) = (1,1), \end{cases} \qquad B_{n+1} = i\left(|zx\rangle\langle wy| - |wy\rangle\langle zx|\right) = \frac{1}{2}\begin{cases} 0, & \text{if } (x,y) = (0,0), \\ B_n \otimes X - A_n \otimes Y, & \text{if } (x,y) = (0,1), \\ B_n \otimes X + A_n \otimes Y, & \text{if } (x,y) = (1,0), \\ 0, & \text{if } (x,y) = (1,1), \end{cases}$$
where $A_n = |z\rangle\langle w| + |w\rangle\langle z|$ and $B_n = i\left(|z\rangle\langle w| - |w\rangle\langle z|\right)$.
Case x = y. According to our assumption that all non-vanishing Pauli-strings of $A_n$ commute, the same holds for $A_{n+1} = \tfrac{1}{2} A_n \otimes (I \pm Z)$. Since $B_{n+1} = 0$, the rest of the assertions are trivially true, as there are no non-vanishing Pauli-strings.
Case x ≠ y. Our assumptions mean that non-vanishing Pauli-strings of $A_n$ fail to commute on an odd number of indices with non-vanishing Pauli-strings of $B_n$. Therefore, non-vanishing Pauli-strings of $A_{n+1} = \tfrac{1}{2}\left(A_n \otimes X \pm B_n \otimes Y\right)$ fail to commute on an even number of indices and, hence, commute. The same argument holds for $B_{n+1} = \tfrac{1}{2}\left(B_n \otimes X \mp A_n \otimes Y\right)$. Finally, we prove that non-vanishing Pauli-strings of $A_{n+1}$ and $B_{n+1}$ do not commute. Either the Pauli-strings $P_{+,\cdot}$ and $P_{-,\cdot}$ stem from $A_n \otimes X$ and $B_n \otimes X$, respectively, or they stem from $\pm B_n \otimes Y$ and $\mp A_n \otimes Y$, respectively. In both cases, the number of indices on which they fail to commute does not change, so non-vanishing Pauli-strings of $A_{n+1}$ and $B_{n+1}$ do not commute.   □
The proof in Theorem 3 inspires the following algorithm to decompose H M into Pauli-strings. For each item in the list S that the algorithm produces, all Pauli-strings commute.
We can illustrate the difference between Algorithms 2 and 3 for $B = \{|01\rangle, |10\rangle\}$ and $T^{1\leftrightarrow 2}$. With Algorithm 2, we have $P_{1,2} = \tfrac{1}{4}(X - iY) \otimes (X + iY)$ and $P_{2,1} = \tfrac{1}{4}(X + iY) \otimes (X - iY)$, which can be simplified to $S = P_{1,2} + P_{2,1} = \tfrac{1}{2}(X \otimes X + Y \otimes Y)$. With Algorithm 3, we have $A_1 = X$, $B_1 = Y$ and $S = A_2 = \tfrac{1}{2}(A_1 \otimes X + B_1 \otimes Y) = \tfrac{1}{2}(X \otimes X + Y \otimes Y)$ without the need to simplify the expression.
Algorithm 3: Decompose H M given by Equation (10) into Pauli-strings directly
[Algorithm 3 appears as a figure in the published version: it builds the Pauli-strings of $|x_j\rangle\langle x_k| + |x_k\rangle\langle x_j|$ recursively, qubit by qubit, following the case distinctions in the proof of Theorem 3.]
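As the pseudocode is only available as a figure, the following is a minimal sketch (assumed code, not the published Algorithm 3) of such a qubit-by-qubit recursion. It tracks $A = |z\rangle\langle w| + |w\rangle\langle z|$ together with $B = i(|z\rangle\langle w| - |w\rangle\langle z|)$ as dictionaries mapping Pauli-strings to coefficients:

```python
# Minimal sketch: build the Pauli-strings of |z><w| + |w><z| one qubit at a
# time; no symbolic simplification is needed because terms never cancel.
from collections import defaultdict

def _combine(parts):
    """Sum of terms * factor, with a Pauli label appended to each string."""
    out = defaultdict(float)
    for terms, label, factor in parts:
        for string, coeff in terms.items():
            out[string + label] += coeff * factor
    return {s: c for s, c in out.items() if abs(c) > 1e-12}

def symmetric_pauli_strings(z, w):
    A, B = {"": 2.0}, {}   # empty-prefix base case: |><| + |><| = 2, i(...) = 0
    for x, y in zip(z, w):
        if x == y:
            sign = 1.0 if x == 0 else -1.0
            A, B = (_combine([(A, "I", 0.5), (A, "Z", 0.5 * sign)]),
                    _combine([(B, "I", 0.5), (B, "Z", 0.5 * sign)]))
        else:
            sign = 1.0 if (x, y) == (0, 1) else -1.0
            A, B = (_combine([(A, "X", 0.5), (B, "Y", 0.5 * sign)]),
                    _combine([(B, "X", 0.5), (A, "Y", -0.5 * sign)]))
    return A

# |001><010| + |010><001| = (IXX + IYY + ZXX + ZYY)/4
print(symmetric_pauli_strings((0, 0, 1), (0, 1, 0)))
```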
As shown above, Trotterizations can also lead to missing transitions. It is suggested in [2] that it is useful to repeat mixers within one mixing step, which corresponds to r > 1 in Equation (6). However, as we see in Figure 6, there can be more efficient ways to obtain mixers that provide transitions between all pairs of feasible states. One way to do so is to construct an exact Trotterization (restricted to the feasible subspace) as described in [19]. However, the ultimate goal is not to avoid Trotterization errors, but rather to provide transitions between all pairs of feasible states. We will revisit the topic of Trotterizations in Section 5 in more detail for each case and show that there are more efficient ways to do so.

4. Full/Unrestricted Mixer

We start by applying the proposed algorithm to the case without constraints, i.e., the case g = 0 in Equation (1), in order to check for consistency and to gain new insight. We will see that the presented approach is able to reproduce the “standard” X mixer as one possibility, but it provides a more general framework. For this case, $B = \{|x_j\rangle,\ j \in J,\ x_j \in \{0,1\}^n\}$ with $J = \{i,\ 1 \le i \le 2^n\}$, which means that $\mathrm{Sp}\,B = \mathcal{H}$. Furthermore, using Equation (14), we have that $H_{M,B} = T$, since E is the identity.

4.1. T Ham ( 1 ) aka “Standard” Full Mixer

The Hamiltonian of the standard full mixer for n qubits can be written as
$$H_M = \sum_{j=1}^{n} X_j = \sum_{j=1}^{n} \left( |0\rangle\langle 0| + |1\rangle\langle 1| \right)^{\otimes (j-1)} \otimes \left( |0\rangle\langle 1| + |1\rangle\langle 0| \right) \otimes \left( |0\rangle\langle 0| + |1\rangle\langle 1| \right)^{\otimes (n-j)} = \sum_{j,k \in J} \left( T_{\mathrm{Ham}(1)} \right)_{j,k} |x_j\rangle\langle x_k|.$$
The last identity in Equation (40) shows that $H_M$ is generated by the transition matrix $T_{\mathrm{Ham}(1)}$. This assumes that the feasible states in B are ordered from the smallest to the largest integer representation.

4.2. All-to-All Full Mixer

For $|J| = 2^n$, the full mixer $T_A$ can be written as $T_A = \sum_{j=1}^{n} T_{\mathrm{Ham}(j)}$. For the case $T_{\mathrm{Ham}(2)}$, the resulting Hamiltonian $H_M$ does not provide transitions between all pairs of feasible states, but we observe that $H_M = \sum_{j,k \in J} (T_{\mathrm{Ham}(2)})_{j,k} |x_j\rangle\langle x_k| = \sum_{j_1=1}^{n} \sum_{j_2 = j_1+1}^{n} X_{j_1} X_{j_2}$, i.e., $H_M$ consists of all $\binom{n}{2}$ possible Pauli-strings containing exactly two Xs. For $m \le n$, this can be further generalized to
$$H_M = \sum_{j,k \in J} \left( T_{\mathrm{Ham}(m)} \right)_{j,k} |x_j\rangle\langle x_k| = \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} \cdots \sum_{j_m = j_{m-1}+1}^{n} X_{j_1} \cdots X_{j_m},$$
which consists of all $\binom{n}{m}$ possible Pauli-strings with exactly m Xs. The resulting mixer Hamiltonian for $T_A$ is therefore given by
$$H_M = \sum_{j_1=1}^{n} X_{j_1} + \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} X_{j_1} X_{j_2} + \cdots + \sum_{j_1=1}^{n} \cdots \sum_{j_n = j_{n-1}+1}^{n} X_{j_1} \cdots X_{j_n}.$$
This means that the mixer consists of the standard mixer plus $\binom{n}{k}$ Pauli-strings with exactly k Xs for each k from 2 to n, which is a large overhead compared to the standard X-mixer.

4.3. (Cyclic) Nearest Integer Full Mixer

The resulting mixer for $T_\Delta$ / $T_{\Delta,c}$ involves exponentially many Pauli-strings as n increases. The following shows the mixer for $T_\Delta$ for $1 \le n \le 4$:
$$\begin{aligned} H_M^{n=1} &= X_1, \\ H_M^{n=2} &= I_1 \otimes H_M^{n=1} + \tfrac{1}{2}\left( X_1 X_2 + Y_1 Y_2 \right), \\ H_M^{n=3} &= I_1 \otimes H_M^{n=2} + \tfrac{1}{4}\left( X_1 X_2 X_3 - X_1 Y_2 Y_3 + Y_1 X_2 Y_3 + Y_1 Y_2 X_3 \right), \\ H_M^{n=4} &= I_1 \otimes H_M^{n=3} + \tfrac{1}{8}\big( X_1 X_2 X_3 X_4 - X_1 X_2 Y_3 Y_4 - X_1 Y_2 X_3 Y_4 - X_1 Y_2 Y_3 X_4 \\ &\qquad\qquad\qquad\qquad\quad + Y_1 X_2 X_3 Y_4 + Y_1 X_2 Y_3 X_4 + Y_1 Y_2 X_3 X_4 - Y_1 Y_2 Y_3 Y_4 \big). \end{aligned}$$

4.4. Comparison and Optimality of Full Mixers

It would be convenient to have a condition on the transition matrix for the optimality of the resulting mixer. We define the total Hamming distance of T to be
$$\mathrm{Ham}(T) = \sum_{\substack{j,k = 1 \\ |(T)_{j,k}| > 0}}^{|J|} d_{\mathrm{Hamming}}\!\left( \mathrm{bin}(j), \mathrm{bin}(k) \right),$$
where $\mathrm{bin}(i)$ is the binary representation of an integer i. As a first instinct, one might suspect that the mixer with minimal total Hamming distance also minimizes the cost. However, this turns out to be false, because of cancellations when more entries of T are nonzero. Table 2 gives a comparison of the total Hamming distance and cost for different full mixers. The standard full mixer has a total Hamming distance $\mathrm{Ham}(T) = n\,2^n$, as there are $2^n$ states, each with exactly n states at a Hamming distance of one. The all-to-all full mixer has $\mathrm{Ham}(T) = 2^n \sum_{k=1}^{n} k \binom{n}{k}$. For the rest of the transition matrices, it is not that straightforward to derive a general formula for $\mathrm{Ham}(T)$, but the table gives an impression. Table 2 shows a dramatic difference between the different mixers with regard to resource requirements. The standard mixer is the only one that does not require CX gates and is the most efficient to implement. Furthermore, as the resulting Pauli terms for the full mixers given by $T_{\mathrm{Ham}(1)}$ and $T_A$ consist only of I and X and therefore commute, they can be implemented without Trotterization. For the mixers given by $T_{\Delta,c}$, $T_\Delta$, and $T_{\mathrm{rand}}$, on the other hand, not all Pauli-strings commute, which results in the need for Trotterization. We continue with the case of constrained mixers.

5. Constrained Mixers

We start by describing what is known as the “XY”-mixer [2,17,19] before we explore more general cases. Our framework provides additional insights into this case and inspires further improvements of the above algorithms with respect to the optimality of the mixers described in Section 3.4, by (possibly) reducing the length of the Pauli-strings. For this case, we will analyze $T_A$, $T_\Delta$, and $T_{\Delta,c}$ only; $T_{\mathrm{Ham}(1)}$ only makes sense when n is a power of two, and $T_{\mathrm{rand}}$ has, in general, a high cost; see Table 2.

5.1. “One-Hot” Aka “XY”-Mixer

We are concerned with the case given by all computational basis states with exactly one “1”, i.e., $B = \{|x\rangle,\ x \in \{0,1\}^n,\ \text{s.t.}\ \sum_j x_j = 1\}$. These states are sometimes referred to as “one-hot”. We have that $n = |B|$ is the number of qubits. After some motivating examples, we present the general case of constructing mixers for any n > 2.

5.1.1. Case n = 2

The smallest, non-trivial case is given by $B = \{|01\rangle, |10\rangle\}$. For any $b \in \mathbb{R}$ with $|b| > 0$, the transition matrix $T = \begin{pmatrix} d & b \\ b & d \end{pmatrix}$ fulfills Theorem 1 and leads to the mixer $H_M = \tfrac{b}{2}(XX + YY) + \tfrac{d}{2}(II - ZZ)$. Since we want to minimize $\mathrm{Cost}(H_M)$ as given in Equation (36), we set d = 0, which results in $\mathrm{Cost}(H_M) = 4$. However, by using Corollary 2, there is room for further reducing the cost. We can add the mixer for $C = \{|00\rangle, |11\rangle\}$, since $B \cap C = \emptyset$. Using the same T (setting d = 0) gives
$$H_{M,B} + H_{M,C} = b\, XX,$$
which has $\mathrm{Cost}(H_M) = 2$. No Trotterization is needed in this case.
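This cancellation is easy to verify numerically. The following is a minimal sketch (assumed code) that reproduces $H_{M,B} + H_{M,C} = b\,XX$ from the $E\,T\,E^T$ construction of Theorem 1:

```python
# Minimal sketch: adding the mixer for C = {|00>, |11>} to the mixer for
# B = {|01>, |10>} (same off-diagonal transition matrix, d = 0) gives b * XX.
import numpy as np

b = 1.0
T = np.array([[0.0, b], [b, 0.0]])

E_B = np.zeros((4, 2)); E_B[1, 0] = E_B[2, 1] = 1.0   # columns |01>, |10>
E_C = np.zeros((4, 2)); E_C[0, 0] = E_C[3, 1] = 1.0   # columns |00>, |11>

H = E_B @ T @ E_B.T + E_C @ T @ E_C.T
X = np.array([[0.0, 1.0], [1.0, 0.0]])
print(np.allclose(H, b * np.kron(X, X)))  # True
```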

5.1.2. Case n = 3

We continue with $B = \{|001\rangle, |010\rangle, |100\rangle\}$. For the transition matrix $T = a\,T^{1\leftrightarrow 2} + b\,T^{2\leftrightarrow 3} + c\,T^{1\leftrightarrow 3}$, $a, b, c \in \mathbb{R}$, this results in the mixer
$$H_{M,B} = \frac{a}{4}\left( XX + YY \right) \otimes \left( I + Z \right) + \frac{b}{4}\left( I + Z \right) \otimes \left( XX + YY \right) + \frac{c}{4}\left( X \otimes (I+Z) \otimes X + Y \otimes (I+Z) \otimes Y \right),$$
with associated cost $\mathrm{Cost}(H_{M,B}) = 24 + 12c$ for $a = b = 1$, $c \in \{0,1\}$. In this case, Corollary 2 allows us to add the mixer for $C = \{|110\rangle, |101\rangle, |011\rangle\}$, since $B \cap C = \emptyset$. The mixer
$$H_M = H_{M,B} + H_{M,C} = \frac{a}{2}\left( XXI + YYI \right) + \frac{b}{2}\left( IXX + IYY \right) + \frac{c}{2}\left( XIX + YIY \right)$$
has cost $\mathrm{Cost}(H_M) = 8 + 4c$ for $a = b = 1$, $c \in \{0,1\}$. However, this mixer cannot be realized directly, since not all terms of $H_M$ commute. Figure 5 shows two ways to group the terms into commuting Pauli-strings, with only one way preserving the feasible subspace, as discussed in Section 3.5. For the Trotterization according to $T = T^{1\leftrightarrow 2} + T^{2\leftrightarrow 3}$, we have that $\left( (T^{1\leftrightarrow 2})^{k_1} (T^{2\leftrightarrow 3})^{k_2} \right)_{3,1} = 0$ for all $k_1, k_2 \in \mathbb{N}$. To fulfill Theorem 2, we need to include the term $T^{1\leftrightarrow 3}$ as well. The Trotterized mixer with minimal cost is therefore given by $T = T^{1\leftrightarrow 2} + T^{2\leftrightarrow 3} + T^{1\leftrightarrow 3}$.

5.1.3. The General Case n > 2

We start with the observation that, for any symmetric $T \in \mathbb{R}^{n \times n}$ with zero diagonal, we have
$$H_{M,B} = \sum_{j=1}^{n} \sum_{k=j+1}^{n} (T)_{j,k}\, \hat{P}_{j,k}, \qquad \hat{P}_{j,k} = \frac{1}{2^{n-1}} \left( \bigotimes_{l=1}^{n} A_l + \bigotimes_{l=1}^{n} B_l \right), \quad A_l = \begin{cases} X, & \text{if } l \in \{j,k\}, \\ I+Z, & \text{else}, \end{cases} \quad B_l = \begin{cases} Y, & \text{if } l \in \{j,k\}, \\ I+Z, & \text{else}. \end{cases}$$
The cost of implementing one of these terms, i.e., $e^{-i\beta \hat{P}_{j,k}}$, is given by the recursive formula
$$\mathrm{Cost}(\hat{P}_{j,k}) = \sum_{l=2}^{n} 2(l-1)\, f_n^l, \quad n > 2, \qquad f_n^l = f_{n-1}^l + f_{n-1}^{l-1}, \qquad f_2^l = \begin{cases} 2, & \text{if } l = 2, \\ 0, & \text{else}, \end{cases}$$
where f n l is Pascal’s triangle starting with 2 instead of 1. Examples of the resulting costs for different transition matrices can be seen in Table 3.
The cost of the mixers can be considerably reduced by adding mixers, generalizing the case n = 3. If the entry $(T)_{i,j}$ of T is nonzero, we can add mixers for each of the $2^{n-2}$ pairs of states $x \in \{0,1\}^n$ that fulfill $(x_i = 0 \wedge x_j = 1)$ or $(x_i = 1 \wedge x_j = 0)$. We can enumerate them with $0 \le l \le 2^{n-2} - 1$ by $\tilde{B}_{i,j}^{\,l} = \{|x\rangle,\ x \in \{0,1\}^n,\ \text{s.t.}\ x_{\setminus i,j} = \mathrm{bin}(l)\}$, where $x_{\setminus i,j}$ removes the indices i and j from x. We have that $B \cap \tilde{B}_{i,j}^{\,l} = \emptyset$. We observe that, for $n \ge 2$ and $|x\rangle, |z\rangle$ with $\mathrm{Ham}(x,z) = 2$, i.e., strings x, z that differ at exactly two positions, we have
$$|x\rangle\langle z| + |z\rangle\langle x| = \frac{1}{2^{n-1}} \left( \bigotimes_{l=1}^{n} A_l + \bigotimes_{l=1}^{n} B_l \right), \quad A_l = \begin{cases} X, & \text{if } x_l \neq z_l, \\ I+Z, & \text{if } x_l = z_l = 0, \\ I-Z, & \text{if } x_l = z_l = 1, \end{cases} \quad B_l = \begin{cases} Y, & \text{if } x_l \neq z_l, \\ I+Z, & \text{if } x_l = z_l = 0, \\ I-Z, & \text{if } x_l = z_l = 1. \end{cases}$$
Adding these mixers for each nonzero entry $(T)_{j,k}$ of T has the effect of summing over all possible combinations of $(I \pm Z)^{\otimes (n-2)}$, which is equal to the identity. Therefore, we obtain the mixer
$$H_{M,B} + \sum_{i,j \in J} \sum_{l=0}^{2^{n-2}-1} H_{M,\tilde{B}_{i,j}^{\,l}} = \sum_{j=1}^{n} \sum_{k=j+1}^{n} (T)_{j,k}\, P_{j,k}, \qquad P_{j,k} = X_j X_k + Y_j Y_k,$$
which reduces the cost of one term to $\mathrm{Cost}(P_{j,k}) = 4$.

5.1.4. Trotterizations

Not all Pauli-strings of the mixer in Equation (51) commute. This necessitates a suitable and efficient Trotterization. We will use Theorems 2 and 3 to identify valid Trotterized mixers. As pointed out in [19], when n is a power of two, one can realize a Trotterization that is exact on the feasible subspace B. Termed the simultaneous complete-graph mixer, it involves all possible pairs (i,j), corresponding to a certain Trotterization of the mixer for $T_A$. We will see that there are more efficient mixers that provide transitions between all pairs of feasible states.
Another possibility is to Trotterize $T_{\Delta,c}$ or $T_\Delta$ according to odd and even entries, as described in Section 3.2.4. This is what is termed a parity-partitioned mixer in [19]. However, fewer and fewer feasible states can be reached as n increases, as we have seen in Figure 6. Repeated applications (r > 0 in Equation (6)) are necessary, and r increases with increasing n. Figure 7 shows a comparison of different Trotterizations. As the cost of the mixer is dictated by the number of nonzero entries of the transition matrix, it is more efficient to add mixers for off-diagonals according to $\sum_{i \in I}\left( T_{O(i)} + T_{E(i)} \right)$ for some suitable index set I.

5.2. General Cases

In this section, we analyze some specific cases that go beyond unrestricted mixers and mixers restricted to one-hot states.

5.2.1. Example 1

We start by looking at the case $B = \{|100\rangle, |010\rangle, |011\rangle\}$. Using $T_\Delta = c_{1,2}\, T^{1\leftrightarrow 2} + c_{2,3}\, T^{2\leftrightarrow 3}$ and $T_{\Delta,c} = T_\Delta + c_{3,1}\, T^{3\leftrightarrow 1}$, this results in the mixer
$$H_{M,B} = c_{1,2}\, \frac{1}{4}\left( XX + YY \right)\left( I + Z \right) + c_{2,3}\, \frac{1}{4}\left( I + Z \right)\left( I - Z \right) X + c_{3,1}\, \frac{1}{4}\left( XXX + YXY + YYX - XYY \right),$$
with $\mathrm{Cost}(H_{M,B}) = 12\,c_{1,2} + 8\,c_{2,3} + 16\,c_{3,1}$. Here, $(c_{1,2}, c_{2,3}, c_{3,1}) = (1,1,0)$ corresponds to $T_\Delta$ and $(1,1,1)$ to $T_{\Delta,c}$. There is a lot of freedom in adding mixers, which is summarized in Table 4. Adding more terms only increases the cost in this case. Overall, the most efficient mixers for B are given by
$$H_M = \frac{c_{1,2}}{2}\left[ XXI + XXZ \ \text{ or } \ XXI + YYZ \right] + \frac{c_{2,3}}{2}\left[ (I+Z)IX \ \text{ or } \ I(I-Z)X \right] + \frac{c_{3,1}}{2}\left[ XXX - XYY \ \text{ or } \ XXX + YXY \right],$$
with associated cost $\mathrm{Cost}(H_M) = 6\,c_{1,2} + 2\,c_{2,3} + 8\,c_{3,1}$. A valid Trotterization is given by splitting according to the $T^{i\leftrightarrow j}$.

5.2.2. Example 2

Finally, we investigate the case $B = \{|10010\rangle, |01110\rangle, |10011\rangle, |11101\rangle, |00110\rangle, |01010\rangle\}$, which restricts to six of the total $2^5 = 32$ computational basis states for 5 qubits. It is not clear a priori whether, for any (distinct) pair $T^{i_1\leftrightarrow j_1}$ and $T^{i_2\leftrightarrow j_2}$, all pairs of non-vanishing Pauli-strings commute. In order to fulfill Equation (6) for r = 1, this means that one needs to Trotterize according to all pairs of $T_A$, as shown in Table 5. The resulting cost of this Trotterized mixer is $\mathrm{Cost}(H_M) = 1360$. Since the kernel of $H_{M,B}$ is spanned by $k = 2^n - |B| = 26$ computational basis states, there are $\binom{k}{2} = 325$ different pairs that can be added to each $T^{i\leftrightarrow j}$. As Table 5 shows, this can reduce the cost of the resulting mixer to $\mathrm{Cost}(H_M) = 568$. Of course, there is the possibility of reducing the cost even further by adding more mixers for states in the kernel of $H_{M,B}$. However, this quickly becomes computationally very demanding when all possibilities are considered in a brute-force fashion.

6. Conclusions and Outlook

While designing mixers with the presented framework is more or less straightforward, designing efficient mixers turns out to be a difficult task. An additional difficulty arises due to the need for Trotterization. Somewhat counter-intuitively, the more restricted the mixer, i.e., the smaller the subspace, the more design freedom one has to increase efficiency. More structure/symmetry in the restricted subspace seems to allow for a lower cost of the resulting mixer. For the case of “one-hot” states, we provide a deeper understanding of the requirements for valid Trotterizations. Compared to the state of the art in the literature, this leads to a considerable reduction of the cost of the mixer, as defined in Equation (36). The introduced framework provides a rigorous mathematical analysis of the underlying structure of mixer Hamiltonians and deepens the understanding thereof. We believe the framework can serve as the backbone for the further development of efficient mixers.
When adding mixers, in general, the kernel of $H_{M,B}$ is spanned by $k = 2^n - |B|$ computational basis states. Therefore, one can add
$$\sum_{i=2}^{k} \binom{k}{i}$$
different mixers for each nonzero entry $(T)_{i,j}$ of T. Out of all these, one wants to find the combination leading to the lowest overall cost. Clearly, brute-force optimization is computationally not tractable, even for a moderate number of qubits n when $|B| \ll 2^n$. Further research should aim to carefully analyze the structure of the basis states in B in order to develop efficient (heuristic) algorithms that find low-cost mixers by adding mixers in the kernel of $H_{M,B}$.

Author Contributions

Conceptualization, F.G.F., K.O.L., H.M.N., A.J.S. and G.S.; software, F.G.F.; formal analysis, F.G.F.; data curation, F.G.F.; writing—original draft preparation, F.G.F.; writing—review and editing, F.G.F., K.O.L., H.M.N., A.J.S. and G.S.; visualization, F.G.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data and the python/jupyter notebook source code for reproducing the results obtained in this article are available at https://github.com/OpenQuantumComputing as of 1 June 2022.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Illustration of properties of Hamiltonians constructed with Theorem 1.
Figure 2. Corollary 2 shows that adding a mixer with support outside Sp B is also a valid mixer for B.
Figure 3. Examples of the squared overlap between two states for the case $|B| = 4$. The squared overlap is independent of which states make up $B = \{|z_0\rangle, |z_1\rangle, |z_2\rangle, |z_3\rangle\}$. The comparison for different $T$ shows that there exists a $\beta$ such that the overlap is nonzero, except for $T_{2,3}$, which, as expected, does not provide transitions between $|z_0\rangle$ and $|z_3\rangle$.
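Curves like those in Figure 3 can be reproduced by working directly in the feasible subspace, where the restriction of the mixer Hamiltonian to span(B) is given by a transition matrix $T$. The following is a minimal sketch; the two transition matrices used below (a ring over the four states, and a matrix connecting only states 2 and 3) are illustrative assumptions, not necessarily the exact $T$ used for the figure.

```python
import numpy as np
from scipy.linalg import expm

def squared_overlap(T, beta, j, k):
    """|<z_k| exp(i*beta*H_M) |z_j>|^2, evaluated in the feasible subspace,
    where the restriction of H_M to span(B) is represented by T."""
    U = expm(1j * beta * T)
    return abs(U[k, j]) ** 2

# |B| = 4 feasible states. T_ring connects 0-1-2-3-0 (assumed example);
# T_23 connects only states 2 and 3.
T_ring = np.array([[0, 1, 0, 1],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 1, 0]], dtype=float)
T_23 = np.zeros((4, 4))
T_23[2, 3] = T_23[3, 2] = 1.0

betas = np.linspace(0, 2 * np.pi, 201)
print(max(squared_overlap(T_ring, b, 0, 3) for b in betas))  # nonzero for some beta
print(max(squared_overlap(T_23, b, 0, 3) for b in betas))    # stays 0: no |z_0> <-> |z_3> transition
```

For the second matrix the overlap between $|z_0\rangle$ and $|z_3\rangle$ remains zero for every $\beta$, which is the behavior the caption describes for $T_{2,3}$.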
Figure 4. Examples of the structure of $T_{\mathrm{Ham}(d)}$. Black squares mark the nonzero entries (all equal to one), i.e., pairs of states with the specified Hamming distance.
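Assuming $T_{\mathrm{Ham}(d)}$ is the $2^n \times 2^n$ matrix with a one in entry $(i, j)$ exactly when the bit strings $i$ and $j$ have Hamming distance $d$, which is how the caption reads, the patterns in Figure 4 can be generated with a few lines of Python:

```python
import numpy as np

def t_ham(n: int, d: int) -> np.ndarray:
    """Transition matrix with T[i, j] = 1 iff the n-bit strings i and j
    differ in exactly d positions (Hamming distance d)."""
    dim = 2 ** n
    T = np.zeros((dim, dim), dtype=int)
    for i in range(dim):
        for j in range(dim):
            if bin(i ^ j).count("1") == d:
                T[i, j] = 1
    return T

print(t_ham(3, 1))  # structure as in the d = 1 panel for 3 qubits
```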
Figure 5. Commutation graph (middle) of the terms of the mixer given in Equation (47); an edge is drawn whenever two terms commute. From this graph, the terms can be grouped into three sets (nodes connected by green edges) or two sets (nodes connected by red/blue edges). Only the left/green grouping preserves the feasible subspace; the right one does not.
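The grouping in Figure 5 is based on checking which Pauli terms of the mixer commute: two Pauli strings commute if and only if they anticommute on an even number of qubit positions. Below is a minimal sketch of building such a commutation graph; the concrete strings are placeholders rather than the terms of Equation (47).

```python
from itertools import combinations

def commutes(p: str, q: str) -> bool:
    """Two Pauli strings commute iff the number of sites where both are
    non-identity and different (i.e., anticommuting single-qubit Paulis) is even."""
    return sum(a != "I" and b != "I" and a != b for a, b in zip(p, q)) % 2 == 0

terms = ["XXI", "YYI", "IXX", "IYY", "XIX", "YIY"]  # placeholder terms, not Equation (47)
edges = [(s, t) for s, t in combinations(terms, 2) if commutes(s, t)]
print(edges)  # edges of the commutation graph
```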
Figure 6. Valid (white) and invalid (black) transitions between pairs of states, as defined in Theorem 2, for Trotterized mixer Hamiltonians. The first row shows that for $T_1 = T_{O(1),c}$ and $T_2 = T_{E(1)}$, the mixer $U = e^{i\beta T_1} e^{i\beta T_2}$ does not provide transitions between all pairs of feasible states, although $U = e^{i\beta (T_1 + T_2)}$ does.
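The white/black pattern in Figure 6 can be reproduced numerically by checking which matrix elements of the Trotterized product vanish compared to the exact exponential. Here is a small sketch with placeholder transition matrices (a path of three feasible states split into its two edges, not the paper's $T_{O(1),c}$ and $T_{E(1)}$):

```python
import numpy as np
from scipy.linalg import expm

def provides_transition(U, j, k, tol=1e-12):
    """True if the mixer U has a nonzero matrix element between basis states j and k."""
    return abs(U[k, j]) > tol

# Placeholder splitting: three feasible states on a path 0-1-2,
# with T1 = edge (0, 1) and T2 = edge (1, 2).
T1 = np.zeros((3, 3)); T1[0, 1] = T1[1, 0] = 1.0
T2 = np.zeros((3, 3)); T2[1, 2] = T2[2, 1] = 1.0

beta = 0.7
U_trotter = expm(1j * beta * T1) @ expm(1j * beta * T2)
U_exact = expm(1j * beta * (T1 + T2))
print(provides_transition(U_trotter, 0, 2))  # False: a single Trotter step misses 0 <-> 2
print(provides_transition(U_exact, 0, 2))    # True: exp(i*beta*(T1 + T2)) connects them
```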
Figure 7. Comparison of different Trotterizations of mixers restricted to “one-hot” states. All markers represent cases in which the resulting mixer provides transitions between all pairs of feasible states; see also Figure 6. All versions can be implemented in linear depth. The most efficient Trotterizations are achieved by using sub-diagonal entries. The cost equals 4 times the number of (XX + YY) terms.
Table 1. Comparison of the complexity of the algorithms for $n$ qubits. Here, $\gamma$ is the number of nonzero entries of $T$.

            Algorithm 1     Algorithm 2     Algorithm 3
runtime     $O(2^{5n})$     $O(n\gamma)$    $O(n\gamma)$
memory      $O(2^{2n})$     $O(n\gamma)$    $O(n\gamma)$
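For context on the exponential column of Table 1: a brute-force Pauli decomposition of an $n$-qubit mixer Hamiltonian touches all $4^n$ Pauli strings and works with dense $2^n \times 2^n$ matrices, which is what drives the exponential runtime and memory. The following is a minimal sketch of that naive baseline, not a reproduction of the paper's Algorithms 1–3:

```python
import numpy as np
from functools import reduce
from itertools import product

PAULIS = {"I": np.eye(2),
          "X": np.array([[0, 1], [1, 0]]),
          "Y": np.array([[0, -1j], [1j, 0]]),
          "Z": np.array([[1, 0], [0, -1]])}

def pauli_decomposition(H):
    """Brute force: the coefficient of Pauli string P is Tr(P H) / 2^n, over all 4^n strings."""
    n = int(np.log2(H.shape[0]))
    coeffs = {}
    for labels in product("IXYZ", repeat=n):
        P = reduce(np.kron, (PAULIS[l] for l in labels))
        c = np.trace(P @ H) / 2 ** n
        if abs(c) > 1e-12:
            coeffs["".join(labels)] = c
    return coeffs

# Example: mixer for B = {|01>, |10>}, i.e., H_M = |01><10| + |10><01|.
H_M = np.zeros((4, 4))
H_M[1, 2] = H_M[2, 1] = 1.0
print(pauli_decomposition(H_M))  # XX and YY, each with coefficient 1/2
```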
Table 2. Full/unrestricted mixer case for $n$ qubits, i.e., $|B| = 2^n$. Comparison of the total Hamming distance of the transition matrix $T$, as well as the resulting requirements for implementation in terms of single- and two-qubit gates, for different $T$.

              Ham(T), n = 1, ..., 6           #U3, n = 1, ..., 6     #CX = Cost(H_M), n = 1, ..., 6
T_{Ham(1)}    2, 8, 24, 64, 160, 384          1, 2, 3, 4, 5, 6       0, 0, 0, 0, 0, 0
T_A           2, 16, 96, 512, 2560, 12,288    1, 2, 3, 4, 5, 6       0, 2, 10, 34, 98, 258
T_{Δ,c}       2, 12, 28, 60, 124, 252         1, 1, 1, 1, 1, 1       0, 2, 12, 44, 132, 356
T_Δ           2, 8, 22, 52, 114, 240          1, 1, 1, 1, 1, 1       0, 4, 20, 68, 196, 516
T_rand        2, 16, 96, 512, 2560, 12,288    2, 4, 6, 8, 10, 12     0, 10, 86, 552, 3260, 17,650
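The Ham(T) column of Table 2 is the total Hamming distance summed over all nonzero entries of the transition matrix $T$. A sketch that reproduces this column, assuming $T_\Delta$ places ones on the first sub- and super-diagonal, $T_{\Delta,c}$ additionally connects the first and last basis state, and $T_A$ connects all pairs:

```python
import numpy as np

def hamming(i, j):
    return bin(i ^ j).count("1")

def total_hamming(T):
    """Ham(T): sum of Hamming distances over all nonzero entries of T."""
    return sum(hamming(i, j) for i, j in zip(*np.nonzero(T)))

def t_delta(n, cyclic=False):
    """Assumed structure: ones on the first sub-/super-diagonal (T_Δ),
    plus a connection between the first and last state for T_{Δ,c}."""
    dim = 2 ** n
    T = np.zeros((dim, dim), dtype=int)
    for i in range(dim - 1):
        T[i, i + 1] = T[i + 1, i] = 1
    if cyclic:
        T[0, dim - 1] = T[dim - 1, 0] = 1
    return T

def t_all(n):
    """Assumed structure of T_A: all off-diagonal entries equal to one."""
    return 1 - np.eye(2 ** n, dtype=int)

for n in range(1, 7):
    print(n, total_hamming(t_delta(n)),
          total_hamming(t_delta(n, cyclic=True)),
          total_hamming(t_all(n)))
```

Running this for $n = 1, \ldots, 6$ matches the Ham(T) values of the $T_\Delta$, $T_{\Delta,c}$, and $T_A$ rows under these structural assumptions.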
Table 3. Comparison of the cost #CX = Cost($H_M$) of mixers constrained to “one-hot” states. For the Trotterized versions, we define $T_1 = T_{O(1),c}$ and $T_2 = T_{E(1)}$. All Hamiltonians need to be Trotterized.

n            3        4        5         6         7         8          9          10          15
$H_{M,B}$
T_Δ        12 · 2   32 · 3   80 · 4    192 · 5   448 · 6   1024 · 7   2304 · 8   5120 · 9    245,760 · 14
T_{Δ,c}    12 · 3   32 · 4   80 · 5    192 · 6   448 · 7   1024 · 8   2304 · 9   5120 · 10   245,760 · 15
T_A        12 · 3   32 · 6   80 · 10   192 · 15  448 · 21  1024 · 28  2304 · 36  5120 · 45   245,760 · 105
$H_{M,B} + \sum_i H_{M,C_i}$
T_Δ        4 · 2    4 · 3    4 · 4     4 · 5     4 · 6     4 · 7      4 · 8      4 · 9       4 · 14
T_{Δ,c}    4 · 3    4 · 4    4 · 5     4 · 6     4 · 7     4 · 8      4 · 9      4 · 10      4 · 15
T_A        4 · 3    4 · 6    4 · 10    4 · 15    4 · 21    4 · 28     4 · 36     4 · 45      4 · 105
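The scaling of the lower block of Table 3 follows from counting the (XX + YY) pair terms and using 4 CX per term (cf. the caption of Figure 7). Below is a sketch under the assumption that $T_\Delta$ chains the $n$ one-hot states, $T_{\Delta,c}$ closes that chain into a ring, and $T_A$ connects all pairs:

```python
def num_xy_terms(n, kind):
    """Number of (XX + YY) pair terms, assuming T_Δ chains the n one-hot states,
    T_Δ,c closes the chain into a ring, and T_A connects all pairs."""
    return {"delta": n - 1, "delta_c": n, "all": n * (n - 1) // 2}[kind]

def cx_cost(n, kind):
    """#CX using 4 CX per (XX + YY) term, as stated in the caption of Figure 7."""
    return 4 * num_xy_terms(n, kind)

for n in (3, 4, 5, 10, 15):
    print(n, cx_cost(n, "delta"), cx_cost(n, "delta_c"), cx_cost(n, "all"))
```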
Table 4. Comparison of Cost($H_{M,B} + H_{M,C}$) for different added mixers $C$ for the case $B = \{|100\rangle, |010\rangle, |011\rangle\}$. All 10 possible pairs are shown. We see that the cost can be both reduced and increased.

C                    T_{1,2}   T_{2,3}   T_{3,1}
{}                      12         8        16
{|000⟩, |001⟩}          20         2        24
{|000⟩, |101⟩}          24        20        28
{|000⟩, |110⟩}           6        20        28
{|000⟩, |111⟩}          28        24         8
{|001⟩, |101⟩}          20        16        24
{|001⟩, |110⟩}          28        24         8
{|001⟩, |111⟩}           6        20        28
{|101⟩, |110⟩}          24        20        28
{|101⟩, |111⟩}          20        16        24
{|110⟩, |111⟩}          20         2        24
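The row labels of Table 4 (and likewise Table 5) are all unordered pairs of computational basis states outside the feasible set $B$. A short sketch that enumerates them:

```python
from itertools import combinations, product

def candidate_pairs(B, n):
    """All unordered pairs of computational basis states outside the feasible set B."""
    outside = ["".join(bits) for bits in product("01", repeat=n)
               if "".join(bits) not in B]
    return list(combinations(outside, 2))

B = {"100", "010", "011"}
pairs = candidate_pairs(B, 3)
print(len(pairs))   # 10 pairs, as in Table 4
print(pairs[:3])
```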
Table 5. Comparison of Cost($H_{M,B} + H_{M,C}$) for different added mixers $C$ for the case $B = \{|10010\rangle, |01110\rangle, |10011\rangle, |11101\rangle, |00110\rangle, |01010\rangle\}$. There are 325 possible pairs in total. We see that the cost can be both reduced and increased.

Columns (in order): T_{1,2}, T_{1,3}, T_{1,4}, T_{1,5}, T_{1,6}, T_{2,3}, T_{2,4}, T_{2,5}, T_{2,6}, T_{3,4}, T_{3,5}, T_{3,6}, T_{4,5}, T_{4,6}, T_{5,6}

C = {}:                   96, 64, 112, 80, 80, 112, 96, 64, 64, 96, 96, 96, 112, 112, 80
C = {|00010⟩, |00011⟩}:   160, 24, 176, 144, 144, 176, 160, 128, 128, 160, 160, 160, 176, 176, 144
C = {|00010⟩, |01101⟩}:   208, 176, 48, 192, 192, 224, 208, 176, 176, 208, 208, 208, 224, 224, 192
C = {|00010⟩, |10001⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|00010⟩, |10101⟩}:   208, 176, 224, 192, 192, 224, 208, 176, 176, 208, 208, 208, 224, 48, 192
C = {|00010⟩, |11001⟩}:   208, 176, 224, 192, 192, 224, 208, 176, 176, 208, 208, 208, 48, 224, 192
C = {|00011⟩, |01101⟩}:   192, 160, 208, 176, 176, 208, 192, 160, 160, 40, 192, 192, 208, 208, 176
C = {|00011⟩, |10000⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|00100⟩, |01000⟩}:   176, 144, 192, 160, 160, 192, 176, 144, 144, 176, 176, 176, 192, 192, 32
C = {|00100⟩, |01100⟩}:   160, 128, 176, 144, 144, 176, 160, 24, 128, 160, 160, 160, 176, 176, 144
C = {|00100⟩, |10000⟩}:   176, 144, 192, 32, 160, 192, 176, 144, 144, 176, 176, 176, 192, 192, 160
C = {|00100⟩, |10001⟩}:   192, 160, 208, 176, 176, 208, 192, 160, 160, 192, 40, 192, 208, 208, 176
C = {|00100⟩, |10111⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|00101⟩, |10110⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|00111⟩, |01011⟩}:   176, 144, 192, 160, 160, 192, 176, 144, 144, 176, 176, 176, 192, 192, 32
C = {|00111⟩, |01111⟩}:   160, 128, 176, 144, 144, 176, 160, 24, 128, 160, 160, 160, 176, 176, 144
C = {|00111⟩, |10100⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|01000⟩, |01100⟩}:   160, 128, 176, 144, 144, 176, 160, 128, 24, 160, 160, 160, 176, 176, 144
C = {|01000⟩, |10000⟩}:   176, 144, 192, 160, 32, 192, 176, 144, 144, 176, 176, 176, 192, 192, 160
C = {|01000⟩, |10001⟩}:   192, 160, 208, 176, 176, 208, 192, 160, 160, 192, 192, 40, 208, 208, 176
C = {|01000⟩, |11011⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|01001⟩, |11010⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|01011⟩, |01111⟩}:   160, 128, 176, 144, 144, 176, 160, 128, 24, 160, 160, 160, 176, 176, 144
C = {|01011⟩, |11000⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|01100⟩, |10000⟩}:   40, 160, 208, 176, 176, 208, 192, 160, 160, 192, 192, 192, 208, 208, 176
C = {|01100⟩, |10001⟩}:   208, 176, 224, 192, 192, 48, 208, 176, 176, 208, 208, 208, 224, 224, 192
C = {|01100⟩, |11111⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|01101⟩, |11110⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|01111⟩, |11100⟩}:   192, 160, 208, 176, 176, 208, 48, 160, 160, 192, 192, 192, 208, 208, 176
C = {|10000⟩, |10001⟩}:   160, 24, 176, 144, 144, 176, 160, 128, 128, 160, 160, 160, 176, 176, 144
C = {|10110⟩, |10111⟩}:   160, 24, 176, 144, 144, 176, 160, 128, 128, 160, 160, 160, 176, 176, 144
C = {|10110⟩, |11010⟩}:   176, 144, 192, 160, 160, 192, 176, 144, 144, 176, 176, 176, 192, 192, 32
C = {|10110⟩, |11110⟩}:   160, 128, 176, 144, 144, 176, 160, 24, 128, 160, 160, 160, 176, 176, 144
C = {|11010⟩, |11011⟩}:   160, 24, 176, 144, 144, 176, 160, 128, 128, 160, 160, 160, 176, 176, 144
C = {|11010⟩, |11110⟩}:   160, 128, 176, 144, 144, 176, 160, 128, 24, 160, 160, 160, 176, 176, 144
C = {|00000⟩, |11111⟩}:   224, 192, 240, 208, 208, 240, 224, 192, 192, 224, 224, 224, 240, 240, 208
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
