Article

Quantum-Inspired Classification Algorithm from DBSCAN–Deutsch–Jozsa Support Vectors and Ising Prediction Model

Kodai Shiba, Chih-Chieh Chen, Masaru Sogabe, Katsuyoshi Sakamoto and Tomah Sogabe

1 Engineering Department, The University of Electro-Communications, Tokyo 182-8585, Japan
2 Grid, Inc., Tokyo 107-0061, Japan
3 i-PERC, The University of Electro-Communications, Tokyo 182-8585, Japan
* Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(23), 11386; https://doi.org/10.3390/app112311386
Submission received: 2 October 2021 / Revised: 2 November 2021 / Accepted: 9 November 2021 / Published: 1 December 2021
(This article belongs to the Topic Machine and Deep Learning)

Abstract

Quantum computing has been suggested as a new tool for handling large data sets in machine learning applications. However, many quantum algorithms are too expensive to fit into the small-scale quantum hardware available today, and loading big classical data into small quantum memory remains an unsolved obstacle. These difficulties motivate the study of quantum-inspired techniques using classical computation. In this work, we propose a new classification method based on support vectors obtained from a DBSCAN–Deutsch–Jozsa ranking and an Ising prediction model. The proposed algorithm has an advantage over the standard classical SVM in its scaling with respect to the number of training data in the training phase. The method can be executed on a purely classical computer and can be accelerated in a hybrid quantum–classical computing environment. We demonstrate the applicability of the proposed algorithm with simulations and theory.

1. Introduction

Quantum machine learning [1,2,3] is an emerging field that uses quantum information processing for machine learning applications. Various quantum machine learning algorithms have been proposed, such as quantum Boltzmann machines [4], supervised quantum autoencoders [5], reinforcement learning using quantum Boltzmann machines [6], quantum circuit learning [7,8] and quantum methods for linear algebra [9,10,11]. Even though the exponentially large Hilbert space is expected to help tackle the “curse of dimensionality” in machine learning, loading large classical data sets into a small quantum memory is a bottleneck [12,13]. Although this problem can be circumvented in some settings [14], the available quantum information processors are restricted by the small number of qubits and the high operational noise of the current noisy intermediate-scale quantum [15] era. On the other hand, these quantum algorithms have inspired various new efficient classical methods [16,17,18,19]. Dequantizing quantum algorithms is of interest not only for practical reasons, but also for complexity theory [20].
The support-vector machine (SVM) is a widely used classification algorithm [21,22,23,24]. Newly developed variants of the SVM exist [25,26] and several quantum versions of the SVM have been proposed [27,28,29]. One proposal uses the HHL algorithm [9] to invert the kernel matrix [27]. Schuld et al. [28] proposed a QSVM algorithm in which the distance for the kernel classifier is evaluated by single-qubit gate circuits. Recently, Havlíček et al. [29] implemented quantum circuit learning classification and a quantum kernel estimator for the SVM. However, due to the requirements of quantum coherence, the application of these algorithms to large-scale data sets has not yet been demonstrated.
In this work, we propose a new quantum-inspired algorithm for classification tasks. The support vectors are selected from the data by a principle inspired by the classical clustering algorithm density-based spatial clustering of applications with noise (DBSCAN) [30] and the quantum Deutsch–Jozsa (DJ) algorithm [31]. DBSCAN is a commonly used classical clustering algorithm that determines clusters by counting data points within some distance parameter $\epsilon$. DBSCAN is efficient, noise-resilient and does not require the number of clusters as input. DJ is a deterministic quantum algorithm that determines a specific property of a given Boolean function $f : \{0,1\}^n \to \{0,1\}$ with a single oracle call to the function $f$; the classical deterministic algorithm to solve the DJ problem requires $O(2^n)$ calls. We combine DBSCAN and DJ to identify support vectors that are close to the classification boundary. The support vectors are then used for prediction based on an Ising model classifier: the prediction labels are identified with spins in an Ising model, and each prediction label is aligned with nearby support vectors through spin–spin interactions. The training time complexity of the proposed algorithm scales as $O(l^2 d)$, where $l$ is the number of training data and $d$ is the feature dimension. This compares favorably with the $O(l^3 d)$ scaling of standard kernel support-vector machines, such as LIBSVM, used in scikit-learn [21,22,23,24]. The proposed algorithm can be executed on classical computers, while quantum devices could be used to accelerate several of its subroutines. Numerical verification is presented. We also provide theoretical evidence for the learnability of the algorithm: we show that the training algorithm is described by an integer linear programming problem and derive upper bounds for the VC dimension of the Ising classifier in two special lattices.

2. The Learning Algorithm

2.1. Hypothesis Set

In this subsection, we briefly describe the hypothesis set and the training and prediction algorithms. The flow chart is illustrated in Figure 1 and the details are discussed in the following sections. Given the training data set $\tau = \{(x_i, y_i) \mid i = 1, \ldots, l\}$, where $y_i \in \{\pm 1\}$ are the labels for binary classification, the goal of a classification algorithm is to find a hypothesis $f_\theta : (x_1, \ldots, x_{n_t}) \mapsto \{-1, +1\}^{n_t}$ with small prediction error, where $n_t$ is the number of test data. The classification algorithm consists of two parts, training and prediction. A training algorithm finds a hypothesis by minimizing some loss function $L(f_\theta(x_i), y_i)$. Our hypothesis set consists of the ground states of an Ising model. The hypothesis set is
$$f_\theta(x_1, \ldots, x_{n_t}) = (s_1, \ldots, s_{n_t}), \qquad x_i \in T \ \text{for} \ 1 \le i \le n_t, \tag{1}$$
$$(s_1, \ldots, s_{n_t + n_{sv}}) = \arg\min_{\{s_k\}} \Big[ -\sum_{\substack{i<j \\ x_i, x_j \in S \cup T}} J_{i,j}(x_i, x_j)\, s_i s_j - \sum_{i:\, x_i \in S \cup T} h_i s_i \Big], \tag{2}$$
where $S = \{\text{support vectors}\}$, $T = \{\text{test data}\}$, $n_{sv} = |S|$, $n_t = |T|$ and $s_i \in \{-1, +1\}$; $h_i = y_i$ for $x_i \in S$ and $h_i = 0$ for $x_i \in T$. The free parameters of the model are, in general, $\theta = (J_{ij})$. We use a physics-motivated ansatz to efficiently determine the support vectors and $\theta$ from three free parameters, $\theta = (\alpha, \beta, \epsilon)$, as explained in the following subsections. Finally, the training is performed by minimizing the loss function with respect to the remaining three free parameters, $\theta = (\alpha, \beta, \epsilon)$, with particle swarm optimization (PSO). The loss function is the cross-entropy loss function
$$L(f_\theta(x_i), y_i) = -(1 - q(x_i)) \ln\big(p_{-1}(x_i)\big) - q(x_i) \ln\big(p_{+1}(x_i)\big), \tag{3}$$
where $q(x_i) = (1 + y_i)/2$ and $p_{\pm 1}(x_i)$ is the probability that $s_i = \pm 1$ in the Ising ground state.
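As a concrete illustration, the following is a minimal Python sketch of the loss in Equation (3). It assumes that the Ising solver returns an estimate of $p_{+1}(x_i)$ for each point and that $p_{-1} = 1 - p_{+1}$; the function name and the averaging over data points are illustrative choices rather than part of the original formulation.

```python
import numpy as np

def cross_entropy_loss(y, p_plus):
    """Cross-entropy loss of Eq. (3), averaged over the data points.

    y       : array of labels in {-1, +1}
    p_plus  : array of estimated probabilities that the Ising ground
              state assigns s_i = +1 (we assume p_minus = 1 - p_plus)
    """
    eps = 1e-12                              # guard against log(0)
    q = (1.0 + y) / 2.0                      # q(x_i) = (1 + y_i) / 2
    p_plus = np.clip(p_plus, eps, 1.0 - eps)
    p_minus = 1.0 - p_plus
    return np.mean(-(1.0 - q) * np.log(p_minus) - q * np.log(p_plus))

# toy usage: one confident +1 prediction and one less confident -1 prediction
print(cross_entropy_loss(np.array([+1, -1]), np.array([0.95, 0.30])))
```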

2.2. DBSCAN Algorithm for Determining the Support Vector

The training algorithm is designed by applying the DBSCAN clustering algorithm to a Deutsch–Jozsa decision problem on the training data. The idea is depicted in Figure 2. Let us consider the hypersphere $B_\epsilon(x_i)$ of radius $\epsilon$ centered at the data point $x_i$. Near the classification boundary, it is more likely that the data in the sphere carry two different classification labels. On the other hand, for data in the interior of each cluster, it is more likely that all the data points in the sphere have the same label. Hence, the classification boundary can be identified by a DBSCAN-style scan through all the data points. For each data point, we query a decision problem, namely, whether the labels inside the sphere are constant or not. If a given data point is contained in many non-constant spheres, we select this point as a support vector for the prediction algorithm. This decision problem takes the form of a Deutsch–Jozsa problem, and the potential use of the quantum DJ algorithm to enhance the training is discussed in Section 2.4.
After selecting the support vectors from the training data, we compute a kernel matrix $(J_{sv})_{i,j} = J(x_i, x_j)$ for $x_i, x_j \in \{\text{support vectors}\}$. This kernel matrix will be used in the prediction algorithm. The matrix elements are, in general, functions of the data positions. We choose $J$ to be the inverse of a power-law function of the distance between data points:
$$J(x_i, x_j) = \frac{1}{\lVert x_i - x_j \rVert^{\beta}}. \tag{4}$$
The training algorithm is described as pseudo-code in Algorithm 1.
Algorithm 1: The training algorithm to determine support vectors.
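As a concrete illustration of the selection step, the following Python sketch implements the counting rule of Figure 2 together with the kernel of Equation (4). It is a simplified classical rendering under two assumptions: the number of selected support vectors is passed explicitly as n_sv (in the paper it is controlled by the ranking hyperparameter α), and the constancy test is evaluated directly instead of through the DJ oracle of Section 2.4. The function name is illustrative.

```python
import numpy as np

def select_support_vectors(X, y, eps, beta, n_sv):
    """Sketch of the DBSCAN-DJ support-vector selection (Algorithm 1).

    X    : (l, d) array of training features
    y    : (l,) array of labels in {-1, +1}
    eps  : radius of the scanning hypersphere B_eps
    beta : exponent of the power-law kernel of Eq. (4)
    n_sv : number of support vectors to keep
    """
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    K = dist <= eps                          # indicator: x_p inside B_eps(x_j)
    c = np.zeros(len(X), dtype=int)
    for j in range(len(X)):                  # scan every data point
        labels = y[K[j]]
        if labels.min() != labels.max():     # mixed sphere (DJ answer: not constant)
            c[K[j]] += 1                     # every point in the sphere gets one count
    sv_idx = np.argsort(-c)[:n_sv]           # rank by counts and keep the top n_sv
    d_sv = dist[np.ix_(sv_idx, sv_idx)]      # pairwise distances among support vectors
    with np.errstate(divide="ignore"):
        J_sv = np.where(d_sv > 0, 1.0 / d_sv**beta, 0.0)   # Eq. (4) kernel matrix
    return sv_idx, J_sv
```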

2.3. Integer Linear Programming Formulation

We show that the training algorithm is equivalent to solving an integer linear programming problem. We use the following definitions: $B_\epsilon(x_i)$ is the hypersphere of radius $\epsilon$ centered at $x_i$; $Y_{ij} = \frac{1}{2}(1 + y_i y_j)$ is the indicator y-correlation matrix of the training labels; $K^{\epsilon}_{ij} = \kappa_\epsilon(x_i, x_j) = \lim_{\gamma \to \infty} \frac{1}{2}\big[1 + \tanh\big(\gamma(\epsilon - \lVert x_i - x_j \rVert)\big)\big] = H(\epsilon - \lVert x_i - x_j \rVert)$ is the indicator kernel matrix, where $H(x)$ is the Heaviside step function. The indicator kernel matrix satisfies
$$K^{\epsilon}_{ij} = \begin{cases} 1, & \text{if } x_j \in B_\epsilon(x_i) \\ 0, & \text{otherwise}. \end{cases} \tag{5}$$
The indicator y-correlation matrix satisfies
$$Y_{ij} = \begin{cases} 1, & \text{if } y_i = y_j \\ 0, & \text{if } y_i \ne y_j. \end{cases} \tag{6}$$
From the definitions, we have
$$\prod_{p=1}^{l} Y_{jp}^{\,K^{\epsilon}_{jp}} = \begin{cases} 1, & \text{if } \{y_k \mid x_k \in B_\epsilon(x_j)\} \text{ is constant} \\ 0, & \text{if } \{y_k \mid x_k \in B_\epsilon(x_j)\} \text{ is mixed}. \end{cases} \tag{7}$$
Then, the counting number returned by the DBSCAN–DJ training procedure is $c_i = \sum_{j=1}^{l} K^{\epsilon}_{ij} \big(1 - \prod_{p=1}^{l} Y_{jp}^{\,K^{\epsilon}_{jp}}\big)$. Since all $c_i \ge 0$, the ranking algorithm selects support vectors that solve the constrained integer linear programming problem:
$$\min_{\{\alpha_i\}} \ -\sum_{i=1}^{l} c_i \alpha_i \tag{8}$$
$$\alpha_i \in \{0, 1\} \tag{9}$$
$$\sum_{i=1}^{l} \alpha_i = \frac{1}{\alpha}\, l, \tag{10}$$
where $\alpha_i = 1$ if and only if $x_i$ is selected as a support vector. Notice that, if the number of training data is $l$, the sorting only takes $O(l \log l)$ time, while general integer linear programming is NP-complete.
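As an illustration, the weights $c_i$ of the objective in Equation (8) can be computed directly from the indicator matrices. The sketch below uses the power form $Y_{jp}^{K^{\epsilon}_{jp}}$ for the constancy test of Equation (7) and reproduces the counts obtained by the scanning procedure of Section 2.2; the function name is illustrative.

```python
import numpy as np

def counts_from_indicators(X, y, eps):
    """c_i = sum_j K_ij * (1 - prod_p Y_jp ** K_jp), the weights of Eq. (8)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    K = (dist <= eps).astype(int)                    # indicator kernel matrix, Eq. (5)
    Y = ((1 + np.outer(y, y)) // 2).astype(float)    # indicator y-correlation, Eq. (6)
    constant = np.prod(Y**K, axis=1)                 # Eq. (7): 1 if B_eps(x_j) is constant
    return K @ (1.0 - constant)                      # counts c_i
```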

2.4. Quantum-Enhanced Algorithm with DJ

Given a Boolean function $f : \{0,1\}^n \to \{0,1\}$ that is guaranteed to be either constant or balanced, and its reversible circuit $U_f : |x\rangle|y\rangle \mapsto |x\rangle|y \oplus f(x)\rangle$, the DJ algorithm determines whether $f$ is constant or balanced with a single call to $U_f$. A balanced function is one for which $f(x) = 1$ for exactly half of the possible input strings $x$. The DJ algorithm applies the operator $(H^{\otimes n} \otimes 1)\, U_f\, H^{\otimes (n+1)}$ to the initial state $|0\rangle^{\otimes n}|1\rangle$. The probability of measuring the all-zero string $x = 00\cdots0$ is $P_0 = \big| \frac{1}{2^n} \sum_x (-1)^{f(x)} \big|^2$. Hence, if $f$ is constant, the probability of obtaining the measurement result $00\cdots0$ is $P_0 = 1$. If $f$ is balanced, then $P_0 = 0$. If $f$ is neither constant nor balanced, then $0 < P_0 < 1$.
Given the data $\{(x_i, y_i)\}$ in a sphere, we construct the reversible circuit for the function $f : x_i \mapsto (1 + y_i)/2$. This serves as the oracle function in the DJ algorithm. A four-data-point example is shown in Table 1 and Figure 3. For $n$ data points, reading the data and constructing the oracle $U_f$ takes $O(n)$ time. However, once the oracle is given, the DJ algorithm can determine whether the function is constant or balanced with a single call. If the data are not constant, there is some probability of obtaining measurement results other than $0\cdots0$. We could repeat the algorithm several times (the number of repetitions is another hyper-parameter that determines the precision) to obtain a reasonable estimate of the probability that the function is constant. This probability then provides an estimate for $c_i$. For an efficient implementation of the $U_f$ circuit, one could utilize the single-step n-Toffoli gate [32] on a quantum computer. While this scheme works in principle, it is difficult to implement and the overhead is large. In our simulation, we use a simplified oracle consisting of one n-Toffoli gate and one n-Toffoli gate with negative controls, as shown in Figure 4. Each qubit encodes the classical indicator state $(1 + y_i)/2 \in \{0, 1\}$. Then, a single query to the oracle gives a measurement result determined by the indicator bits $(1 + y_i)/2$, which directly indicates whether the data are constant or not. This approach is simpler to implement, but requires more encoding qubits: the original DJ uses $O(\log n)$ qubits for $n$ data points, while this simplified approach uses $O(n)$ qubits. However, it retains a potential acceleration similar to that of DJ, where a single call to the oracle gives the desired result.
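For small oracles, the acceptance probability can also be evaluated classically, which is useful for checking the constant/balanced behaviour described above. The sketch below simply evaluates $P_0 = \big|\frac{1}{2^n}\sum_x (-1)^{f(x)}\big|^2$ and is not the quantum circuit implementation of Figures 3 and 4; the example oracle functions are illustrative.

```python
from itertools import product

def dj_p0(f, n):
    """Probability of measuring the all-zero string in the DJ algorithm,
    P_0 = |(1/2^n) * sum_x (-1)^f(x)|^2, evaluated by direct summation."""
    total = sum((-1) ** f(x) for x in product([0, 1], repeat=n))
    return abs(total / 2**n) ** 2

constant_oracle = lambda x: 0        # constant function
balanced_oracle = lambda x: x[0]     # balanced function: f(x) = x_0
print(dj_p0(constant_oracle, 3))     # 1.0
print(dj_p0(balanced_oracle, 3))     # 0.0
```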

2.5. Annealing Algorithm for Data Prediction

The classical SVM algorithm constructs $w$ and $b$ from the training data and then predicts each test datum by computing $w \cdot x + b$. We propose an algorithm that makes it possible to predict multiple test data points with a single run of the prediction algorithm. The prediction is made by finding the ground state of the Ising model
$$H = -\sum_{i<j} J_{i,j} s_i s_j - \sum_i h_i s_i. \tag{11}$$
The number of spins is $N = n_{sv} + n_t$, where $n_{sv}$ is the number of support vectors and $n_t$ is the number of test data. The coupling constants are functions of the data positions, $J_{ij}(x_i, x_j)$, defined in Equation (4). We use the notation
$$J_{ij} = \begin{cases} (J_{sv})_{ij}, & \text{if } x_i, x_j \in \{\text{support vectors}\} \\ (J_t)_{ij}, & \text{if } x_i, x_j \in \{\text{test data}\} \\ (J_{int})_{ij}, & \text{if } x_i \in \{\text{test data}\},\ x_j \in \{\text{support vectors}\}. \end{cases} \tag{12}$$
$J_{sv}$ is an $n_{sv} \times n_{sv}$ matrix, $J_t$ is an $n_t \times n_t$ matrix and $J_{int}$ is an $n_{sv} \times n_t$ matrix. Notice that the $J_{sv}$ matrix is given by the training algorithm and does not need to be computed again. The magnetic field is defined as
$$h_i = \begin{cases} y_i, & \text{if } x_i \in \{\text{support vectors}\} \\ 0, & \text{if } x_i \in \{\text{test data}\}. \end{cases} \tag{13}$$
The prediction result is given by the ground state of the Ising model:
$$y_i = s_i, \qquad \text{for } x_i \in \{\text{test data}\}. \tag{14}$$
The configurations $s_i$ of the support vectors are dummy variables and are not used for prediction. Intuitively, the model favors parallel spins, since $J_{ij} > 0$; hence, we expect the prediction to have the same spin direction as nearby training data. In principle, the Ising ground state can be calculated by any Ising solver. However, finding the ground state of a general Ising model is equivalent to quadratic unconstrained binary optimization (QUBO), which is NP-hard. In this work, we use adiabatic quantum computing implemented on a quantum circuit simulator to find the ground state [33]. The prediction labels are determined by
$$s_i = \begin{cases} -1, & \text{if } p_{-1}(x_i) > p_{+1}(x_i) \\ +1, & \text{if } p_{-1}(x_i) < p_{+1}(x_i), \end{cases} \tag{15}$$
where $p_{\pm 1}(x_i)$ is the probability that the label of the point is predicted to be $\pm 1$.
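For small instances, the prediction of Equations (11)–(15) can be reproduced with a brute-force ground-state search in place of the adiabatic quantum computation used in this work. The sketch below builds the couplings from Equation (4) and the fields from Equation (13), enumerates all spin configurations and returns the test-spin part of the lowest-energy state; function and variable names are illustrative, and exhaustive enumeration is only feasible for a handful of spins.

```python
import numpy as np
from itertools import product

def ising_predict(X_sv, y_sv, X_test, beta):
    """Predict test labels from the ground state of the Ising model of Eq. (11),
    found here by brute-force enumeration (small n_sv + n_t only)."""
    X = np.vstack([X_sv, X_test])
    n_sv = len(X_sv)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        J = np.where(dist > 0, 1.0 / dist**beta, 0.0)      # couplings, Eq. (4)
    h = np.concatenate([np.asarray(y_sv, float), np.zeros(len(X_test))])  # fields, Eq. (13)
    best_s, best_E = None, np.inf
    for s in product([-1, 1], repeat=len(X)):
        s = np.array(s, dtype=float)
        E = -0.5 * s @ J @ s - h @ s        # H = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i
        if E < best_E:
            best_E, best_s = E, s
    return best_s[n_sv:].astype(int)        # predicted labels of the test spins, Eq. (14)
```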

2.6. The VC Dimension

We further show that the Ising classifier has a finite upper bound on its VC dimension in two simplified toy models, illustrated in Figure 5. These simplifications can be regarded as regularization methods. The intuition is that, for $J_{ij} > 0$, anti-parallel spin configurations have higher energy and are unlikely to be ground states. This difficulty sets a limit on how many points can be shattered and leads to an upper bound for the VC dimension. The detailed derivation is given in Appendix A. We use the following definitions: $n_{sv}$ is the number of support vectors; $n_t$ is the number of prediction data points; $N = n_{sv} + n_t$ is the total number of spins in the Ising model; $d$ is the feature space dimension. We assume that $n_{sv} \le n_t$. For the first toy model, we consider a simplification by introducing equal-distance constraints among the support vectors and test data. This means $J_{ij} = \frac{1}{\lVert x_i - x_j \rVert} = J$ for all $(i, j)$. The Ising prediction model in this case is the fully connected, infinite-dimensional mean-field Ising model:
$$H = -J \sum_{i<j} s_i s_j. \tag{16}$$
Notice that this is a strong constraint that requires $N \le d + 1$, but there is still one continuous parameter $J$, so the hypothesis set is still infinite. This model is exactly solvable. The eigenenergy is $E_m = J\big({-\tfrac{1}{2}} N(N-1) + 2 m (N - m)\big)$ for eigenstates with $m$ spins up and $N - m$ spins down. If there are two test data, an anti-parallel state is always an excited state; hence, it can never be part of a ground state. We obtain the result $d_{VC} = 1$ in this case.
This upper bound can be relaxed by considering a model with fewer constraints. Consider a second toy model in which all the support vectors form a fully connected, equal-distance lattice defined by a parameter $J_{sv}$, and all the test data form another fully connected, equal-distance lattice defined by another parameter, $J_t$. The distance between support vectors and prediction data is parametrized by a third variable, $J_{int}$. Hence, there are three continuous variables in the model. The interaction between support vectors and prediction data points is restricted to a one-to-one coupling. Let
$$\{\text{support vectors}\} = \{x_1, x_2, \ldots, x_{n_{sv}}\}, \qquad \{\text{test data}\} = \{x_{n_{sv}+1}, x_{n_{sv}+2}, \ldots, x_{n_{sv}+n_t}\}. \tag{17}$$
Then, the coupling for this model is
$$J_{ij} = \begin{cases} J_{sv}, & \text{if } 1 \le i < j \le n_{sv} \\ J_t, & \text{if } n_{sv} + 1 \le i < j \le n_{sv} + n_t \\ J_{int}, & \text{if } 1 \le i \le n_{sv} \text{ and } j = i + n_{sv} \\ 0, & \text{otherwise}. \end{cases} \tag{18}$$
For this model, we obtain the result $d_{VC} \le \frac{2 J_{int}}{J_t} n_{sv}$. The intuition is that, if there are more support vectors (larger $n_{sv}$), the model has more degrees of freedom, and if the support vectors are closer to the test data (larger $J_{int}$), the model has a stronger influence on the test data. Hence, the Ising model can shatter more test data for larger $J_{int}$ and $n_{sv}$. On the other hand, if the distances among the test data are small (large $J_t$), the strong interactions among the test data make them difficult to shatter.
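The $d_{VC} = 1$ result for the first toy model can be checked numerically: the brute-force sketch below confirms that the only ground states of Equation (16) are the two fully aligned configurations (so no ground state assigns opposite labels to two test points) and that the directly computed energies match the closed-form $E_m$ quoted in Appendix A. The function name is illustrative.

```python
import numpy as np
from itertools import product

def mean_field_ground_states(N, J=1.0):
    """Ground states of H = -J * sum_{i<j} s_i s_j (the toy model of Eq. (16))."""
    energies = {}
    for s in product([-1, 1], repeat=N):
        s = np.array(s)
        E_direct = -0.5 * J * (np.sum(s) ** 2 - N)               # -J * sum_{i<j} s_i s_j
        m = int(np.sum(s == 1))
        E_formula = J * (-0.5 * N * (N - 1) + 2 * m * (N - m))   # E_m from the text
        assert np.isclose(E_direct, E_formula)
        energies[tuple(s)] = E_direct
    E0 = min(energies.values())
    return [s for s, E in energies.items() if np.isclose(E, E0)]

print(mean_field_ground_states(4))   # [(-1, -1, -1, -1), (1, 1, 1, 1)]
```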

3. Simulation Results

We demonstrate the classification task for two types of teacher data (linear classification boundary and non-linear classification boundary) using the proposed method. The learning of the optimal $(\alpha, \beta, \epsilon)$ parameters was performed by minimizing the cost function. Since the $(\alpha, \beta, \epsilon)$ parameters are associated with a discrete ranking, we chose gradient-free optimization instead of gradient-based optimization. We implemented the Nelder–Mead method and particle swarm optimization (PSO). We observed that PSO generally gave better results; hence, we focus on the results obtained by PSO. The hyperparameter $N$ is the number of particles for PSO. The learned support vectors, classification boundaries and learning curves for $N = 10, 50, 100$ are depicted in Figure 6, and the learned parameters are shown in Table 2. In all cases, the total number of support vectors was fixed at $n_{sv} = 10$, with five belonging to the $+1$ class and five to the $-1$ class. The $\alpha$ and $\epsilon$ parameters were fixed to $\alpha = 3$ and $\epsilon = 0.4$, and the optimization was performed by minimizing the loss function of Equation (3) over $\beta$. We observed that the classification could be performed in all experiments. We also observed that, for larger values of $N$, the optimized loss function reached a lower value, but the test accuracy was also lower. This suggests possible overfitting for large values of $N$ in the PSO optimization.
We now compare the proposed method with the scikit-learn SVM. The verification results are shown in Figure 7. We used 28 teacher data points (small dots) for each verification; the detailed numbers of data used are listed in Table 3. The large sphere symbols are the support vectors. In the classification using the proposed algorithm, we set the radius of the scan sphere to $\epsilon = 0.5$ (linear data) and $\epsilon = 0.6$ (non-linear data). The power of the reciprocal distance between data points was $\beta = 1$ and the criterion for determining the support vectors from the ranking was $\alpha = 3$. For the traditional SVM, we used sklearn.svm.SVC with default values. From Figure 7, we found that both the linear and the non-linear classification performed by the proposed method gave a classification boundary similar to that of scikit-learn. However, when we changed only the positions of the teacher data without changing their number, there were cases where the data could not be classified well. A possible cause is that the number of teacher data was too small; as a result, the number of support vectors was insufficient and the classification accuracy may have been reduced. We therefore expect better classification results with a larger number of training data. However, the quantum adiabatic calculation requires $n_{sv} + n_t$ qubits, where $n_{sv}$ is the number of support vectors and $n_t$ is the number of test data, and our experiment was limited by this restriction.
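For reference, the baseline used in this comparison corresponds to the default sklearn.svm.SVC classifier. The snippet below runs such a baseline on an illustrative synthetic data set with the same 28/12 train/test split as in Table 3; the actual teacher data of the experiments are not reproduced here.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative stand-in for the teacher data: two Gaussian blobs.
X, y = make_blobs(n_samples=40, centers=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=28, test_size=12, random_state=0)

clf = SVC()                      # default settings, as used in the comparison
clf.fit(X_train, y_train)
print("number of support vectors:", clf.support_vectors_.shape[0])
print("test accuracy:", clf.score(X_test, y_test))
```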

4. Computational Complexity

We investigate the efficiency of the proposed algorithm. Table 4 shows the time complexity of the kernel SVM and the proposed algorithm. The training algorithm is DBSCAN-like; hence, the worst-case complexity is essentially the same as that of DBSCAN, which takes $O(l^2)$ time to compute the distances among $l$ training data points. This is better than the standard kernel SVM, which takes $O(l^3)$ worst-case time for kernel matrix inversion (the empirical scaling is $O(l^{2+\delta})$ for some $0 < \delta < 1$) [22,23]. The training algorithm also has the structure of a Deutsch–Jozsa problem; hence, it can be quantum-enhanced by using an n-Toffoli gate to speed it up. This speed-up does not change the worst-case scaling of the training algorithm's complexity. The cost of the prediction part is $O((n_{sv} + n_t) n_t d)$ for computing the $J_t$ and $J_{int}$ matrices. This scaling seems worse than that of the kernel SVM but, in practice, it is not an issue: we can divide the test data into $n_b$ batches and run the prediction algorithm once for each batch. The scaling becomes $O((n_{sv} + n_t/n_b) n_t d)$ in this case. For example, when $n_b = n_t$ (i.e., we predict one datum at a time, as in the kernel SVM), we recover the same scaling as the kernel SVM. The cost of the prediction part also depends on finding the ground state of the Ising model, which is equivalent to solving a QUBO problem. This, in general, could take an exponentially long time, depending on the solver algorithm. However, to make a reasonable prediction, it is not necessary to find the optimal solution. The Ising prediction can be solved by classical simulated annealing or another QUBO solver, and can potentially be accelerated by adiabatic quantum computation or quantum annealing [34]. One might ask whether the DBSCAN–DJ training algorithm could be combined with kernel SVM prediction; however, since our training algorithm does not produce the $w$ and $b$ vectors, it cannot be used for kernel SVM prediction.
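The batching strategy described above amounts to splitting the test set and solving one smaller Ising instance per batch. A minimal sketch is given below, where `solver` stands for any Ising prediction routine (for example, the brute-force sketch in Section 2.5) and the function name is illustrative.

```python
import numpy as np

def predict_in_batches(X_sv, y_sv, X_test, beta, n_b, solver):
    """Split the test data into n_b batches so that each Ising instance
    contains only n_sv + n_t / n_b spins, then concatenate the predictions."""
    predictions = [solver(X_sv, y_sv, batch, beta)
                   for batch in np.array_split(X_test, n_b)]
    return np.concatenate(predictions)
```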

5. Conclusions

In this work, we propose a quantum-inspired algorithm for classification. The algorithm has an advantage over the traditional kernel SVM in the training stage. It can be executed on a purely classical computer and can be enhanced in a hybrid quantum–classical computing environment. We experimentally verify the algorithm and provide a theoretical background for it. The applicability to more general classification problems and possible improvements will be studied in future work. For example, multi-class classification could be treated by one-versus-rest or one-versus-one approaches [35], and the optimal value of $\epsilon$ could be determined by other methods [36].

Author Contributions

K.S. (Kodai Shiba) carried out simulation and experiment. C.-C.C. wrote the manuscript and carried out the theoretical simulation. M.S. and K.S. (Katsuyoshi Sakamoto) contributed to research design. T.S. supervised the project. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data and scripts that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

We thank Naoki Yamamoto for valuable discussions.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. VC Dimension of Ising Predictor

In this appendix, we provide the derivation of the VC dimension bounds for the Ising classifier toy models. First, we show the simple result that one test datum can always be shattered by one support vector; hence, there is a general lower bound $d_{VC} \ge 1$. If there is only one support vector $s_1$ and one test datum $s_2$, the Hamiltonian is $H = -J s_1 s_2 - h_1 s_1 = -s_1 (J s_2 + h_1)$. For any desired label $s_2$, choosing the support-vector label $h_1 = s_2$ makes the aligned configuration $s_1 = s_2$ the ground state, which yields the desired prediction. Hence, one point can always be shattered by one support vector and $d_{VC} \ge 1$.
Then, we consider the fully connected Ising model $H = -J \sum_{i<j} s_i s_j$. The eigenenergy is $E_m = J\big({-\tfrac{1}{2}} N(N-1) + 2 m (N - m)\big)$ for eigenstates with $m$ spins up and $N - m$ spins down. As a function of $m$, the spectrum is a downward-concave parabola. The doubly degenerate ground states are the all-spin-up and all-spin-down states with energy $E_0 = -\frac{J}{2} N(N-1)$. The highest-energy states occur at $m = \frac{N}{2}$, with half of the spins up and half down, and their energy is $E_{max} = \frac{J N}{2}$.
Now, we consider the first prediction model defined in Equation (16). Let us consider two configurations of the prediction data: (A) half of the spins up and half down; (B) all spins aligned in the same direction. To shatter the prediction data, the model has to be able to achieve a ground state when the given test data are in the highest-energy configuration, $E^A_t = \frac{J n_t}{2}$. The best possible choice of the support vectors is to align all of them in the same direction, with some energy $E_{sv}$. The interaction between the support vectors and the test data then gives the energy $E^A_{int} = -\frac{J}{2}(n_t \bmod 2)$. The resulting total energy is $E^A_{total} = E^A_t + E^A_{int} + E_{sv} = \frac{J n_t}{2} - \frac{J}{2}(n_t \bmod 2) + E_{sv}$. To predict configuration (A), $E^A_{total}$ must be lower than the energy $E^B_{total}$ of the state where all prediction spins are aligned with all support vectors. The total energy in that case is $E^B_{total} = -\frac{J}{2} n_t (n_t - 1) - J n_t n_{sv} + E_{sv}$. The requirement $E^A_{total} \le E^B_{total}$ means $\frac{J n_t}{2} - \frac{J}{2}(n_t \bmod 2) + \frac{J}{2} n_t (n_t - 1) + J n_t n_{sv} \le 0$. Since only the second term is negative, this condition can never be satisfied for any $n_t \ge 2$. Hence, the model can never shatter two points, so $d_{VC} \le 1$. Since we also have $d_{VC} \ge 1$, we conclude that $d_{VC} = 1$.
Then, we consider the second, relaxed model defined in Equation (18). To simplify the derivation, we assume that $n_{sv}$ and $n_t$ are both even. The support vectors can be placed in a configuration with half of the spins up and half down, with the spin-up support vectors one-to-one coupled to the spin-up test data. The interaction energy is then $E^A_{int} = -J_{int} n_{sv}$ and the total energy is $E^A_{total} = E^A_t + E^A_{int} + E_{sv} = \frac{J_t n_t}{2} - J_{int} n_{sv} + E_{sv}$. We also have $E^B_{total} = E^B_t + E^B_{int} + E_{sv} = -\frac{1}{2} J_t n_t (n_t - 1) + 0 + E_{sv}$. The requirement $E^A_{total} \le E^B_{total}$ then implies $n_t \le \frac{2 J_{int}}{J_t} n_{sv}$. Hence, $d_{VC} \le \frac{2 J_{int}}{J_t} n_{sv}$. We may also consider the case where all support vectors are aligned in the same direction. In this case, we have $E^A_{total} = E^A_t + E^A_{int} + E_{sv} = \frac{J_t n_t}{2} + 0 + E_{sv}$ and $E^B_{total} = E^B_t + E^B_{int} + E_{sv} = -\frac{1}{2} J_t n_t (n_t - 1) - J_{int} n_{sv} + E_{sv}$; therefore, the upper bound remains the same.

References

  1. Schuld, M.; Sinayskiy, I.; Petruccione, F. An introduction to quantum machine learning. Contemp. Phys. 2015, 56, 172–185. [Google Scholar] [CrossRef] [Green Version]
  2. Biamonte, J.; Wittek, P.; Pancotti, N.; Rebentrost, P.; Wiebe, N.; Lloyd, S. Quantum machine learning. Nature 2017, 549, 195–202. [Google Scholar] [CrossRef] [PubMed]
  3. Aaronson, S. The learnability of quantum states. Proc. R. Soc. A Math. Phys. Eng. Sci. 2007, 463, 3089–3114. [Google Scholar] [CrossRef] [Green Version]
  4. Dumoulin, V.; Goodfellow, I.J.; Courville, A.; Bengio, Y. On the Challenges of Physical Implementations of RBMs. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI’14), Quebec City, QC, Canada, 27–31 July 2014; pp. 1199–1205. [Google Scholar]
  5. Romero, J.; Olson, J.P.; Aspuru-Guzik, A. Quantum autoencoders for efficient compression of quantum data. Quantum Sci. Technol. 2017, 2, 045001. [Google Scholar] [CrossRef] [Green Version]
  6. Crawford, D.; Levit, A.; Ghadermarzy, N.; Oberoi, J.S.; Ronagh, P. Reinforcement Learning Using Quantum Boltzmann Machines. Quantum Inf. Comput. 2018, 18, 51–74. [Google Scholar] [CrossRef]
  7. Mitarai, K.; Negoro, M.; Kitagawa, M.; Fujii, K. Quantum circuit learning. Phys. Rev. A 2018, 98, 032309. [Google Scholar] [CrossRef] [Green Version]
  8. Chen, C.C.; Watabe, M.; Shiba, K.; Sogabe, M.; Sakamoto, K.; Sogabe, T. On the expressibility and overfitting of quantum circuit learning. ACM Trans. Quantum Comput. 2021, 2, 1–24. [Google Scholar] [CrossRef]
  9. Harrow, A.W.; Hassidim, A.; Lloyd, S. Quantum Algorithm for Linear Systems of Equations. Phys. Rev. Lett. 2009, 103, 150502. [Google Scholar] [CrossRef]
  10. Borle, A.; Elfving, V.E.; Lomonaco, S.J. Quantum Approximate Optimization for Hard Problems in Linear Algebra. arXiv 2020, arXiv:2006.15438. [Google Scholar] [CrossRef]
  11. Lin, L.; Tong, Y. Optimal polynomial based quantum eigenstate filtering with application to solving quantum linear systems. Quantum 2020, 4, 361. [Google Scholar] [CrossRef]
  12. Harrow, A.W. Small quantum computers and large classical data sets. arXiv 2020, arXiv:2004.00026. [Google Scholar]
  13. Aaronson, S. Read the fine print. Nat. Phys. 2015, 11, 291–293. [Google Scholar] [CrossRef]
  14. Nakaji, K.; Yamamoto, N. Quantum semi-supervised generative adversarial network for enhanced data classification. arXiv 2020, arXiv:2010.13727. [Google Scholar] [CrossRef]
  15. Preskill, J. Quantum Computing in the NISQ era and beyond. Quantum 2018, 2, 79. [Google Scholar] [CrossRef]
  16. Tang, E. A Quantum-Inspired Classical Algorithm for Recommendation Systems. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC 2019), Phoenix, AZ, USA, 23–26 June 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 217–228. [Google Scholar] [CrossRef] [Green Version]
  17. Chia, N.H.; Li, T.; Lin, H.H.; Wang, C. Quantum-inspired sublinear algorithm for solving low-rank semidefinite programming. arXiv 2019, arXiv:1901.03254. [Google Scholar]
  18. Arrazola, J.M.; Delgado, A.; Bardhan, B.R.; Lloyd, S. Quantum-inspired algorithms in practice. Quantum 2020, 4, 307. [Google Scholar] [CrossRef]
  19. Bravyi, S.; Kliesch, A.; Koenig, R.; Tang, E. Hybrid quantum-classical algorithms for approximate graph coloring. arXiv 2020, arXiv:2011.13420. [Google Scholar]
  20. Chia, N.H.; Gilyén, A.; Li, T.; Lin, H.H.; Tang, E.; Wang, C. Sampling-Based Sublinear Low-Rank Matrix Arithmetic Framework for Dequantizing Quantum Machine Learning. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC 2020), Chicago, IL, USA, 22–26 July 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 387–400. [Google Scholar] [CrossRef]
  21. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  22. Chang, C.C.; Lin, C.J. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27. [Google Scholar] [CrossRef]
  23. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  24. Bordes, A.; Ertekin, S.; Weston, J.; Bottou, L. Fast Kernel Classifiers with Online and Active Learning. J. Mach. Learn. Res. 2005, 6, 1579–1619. [Google Scholar]
  25. Wu, X.; Zuo, W.; Lin, L.; Jia, W.; Zhang, D. F-SVM: Combination of Feature Transformation and SVM Learning via Convex Relaxation. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 5185–5199. [Google Scholar] [CrossRef] [Green Version]
  26. Rizwan, A.; Iqbal, N.; Ahmad, R.; Kim, D.H. WR-SVM Model Based on the Margin Radius Approach for Solving the Minimum Enclosing Ball Problem in Support Vector Machine Classification. Appl. Sci. 2021, 11, 4657. [Google Scholar] [CrossRef]
  27. Rebentrost, P.; Mohseni, M.; Lloyd, S. Quantum Support Vector Machine for Big Data Classification. Phys. Rev. Lett. 2014, 113, 130503. [Google Scholar] [CrossRef]
  28. Schuld, M.; Fingerhuth, M.; Petruccione, F. Implementing a distance-based classifier with a quantum interference circuit. EPL (Europhys. Lett.) 2017, 119, 60002. [Google Scholar] [CrossRef] [Green Version]
  29. Havlíček, V.; Córcoles, A.D.; Temme, K.; Harrow, A.W.; Kandala, A.; Chow, J.M.; Gambetta, J.M. Supervised learning with quantum-enhanced feature spaces. Nature 2019, 567, 209–212. [Google Scholar] [CrossRef] [Green Version]
  30. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD’96), Portland, OR, USA, 2–4 August 1996; pp. 226–231. [Google Scholar]
  31. Deutsch, D.; Jozsa, R. Rapid solution of problems by quantum computation. Proc. R. Soc. Lond. Ser. A Math. Phys. Sci. 1992, 439, 553–558. [Google Scholar] [CrossRef]
  32. Rasmussen, S.E.; Groenland, K.; Gerritsma, R.; Schoutens, K.; Zinner, N.T. Single-step implementation of high-fidelity n-bit Toffoli gates. Phys. Rev. A 2020, 101, 022308. [Google Scholar] [CrossRef] [Green Version]
  33. Farhi, E.; Goldstone, J.; Gutmann, S.; Sipser, M. Quantum Computation by Adiabatic Evolution. arXiv 2000, arXiv:quant-ph/0001106. [Google Scholar]
  34. Kadowaki, T.; Nishimori, H. Quantum annealing in the transverse Ising model. Phys. Rev. E 1998, 58, 5355–5363. [Google Scholar] [CrossRef] [Green Version]
  35. Bishop, C.M. Pattern Recognition and Machine Learning (Information Science and Statistics); Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
  36. Giri, K.; Biswas, T.K.; Sarkar, P. ECR-DBSCAN: An improved DBSCAN based on computational geometry. Mach. Learn. Appl. 2021, 6, 100148. [Google Scholar] [CrossRef]
Figure 1. Outline of the proposed algorithm. The training data set is fed into the support vector selection algorithm. The support vectors are then used in the Ising predictor for prediction. The support vector selection algorithm is described in Algorithm 1 and the Ising predictor is described in Section 2.5.
Figure 2. The training process. The red dots and blue dots are training data with two different classification labels. (a) For each data point $x_i$, we consider the hypersphere $B_\epsilon(x_i)$ of radius $\epsilon$ centered at the data point $x_i$ and ask whether all the data points in the sphere have the same label. (b) If the data within the sphere are mixed (not constant), then every data point within the sphere receives one count, $c[j] = c[j] + 1$. (c) After scanning through all the data, we rank the data according to $c[i]$. (d) The data points with the highest $c[i]$ are selected as support vectors. The blue and yellow dots are support vectors for the two different classes.
Figure 3. The quantum circuit for the 4-data point example in Table 1. (a) The oracle circuit U f . (b) The circuit for the DJ algorithm.
Figure 4. The quantum circuit implementation for the DJ query. The circuit consists of one n-Toffoli gate and one n-Toffoli gate with negative control.
Figure 5. (a) Fully-connected mean field Ising model. (b) A relaxed model where there are only one-to-one couplings between support vectors and test data.
Figure 6. Support vector distribution, prediction probability and classification boundary from the particle swarm optimization (PSO) results. The experimental parameters and optimal values are listed in Table 2. (a) $N = 10$. (b) $N = 50$. (c) $N = 100$. The small dots are the training data. The symbols bordered by white circles are the test data that were classified correctly. The crossed circles are the test data that were classified incorrectly. The large sphere symbols are the support vectors. (d) The learning curves for the four experiments. The orange line is the Nelder–Mead experiment, which does not converge. The other three lines are PSO results: the black line is $N = 10$, the red line is $N = 50$ and the blue line is $N = 100$. (e) The training and testing accuracy suggest overfitting for larger values of $N$.
Figure 7. Classification results. The small dots are the training data. The symbols bordered by white circles are the test data that were classified correctly. The crossed circles are the test data that were classified incorrectly by scikit-learn SVM, but correctly predicted by the proposed algorithm. The large sphere symbols are the support vectors. (a) Linear data classification by the proposed algorithm. (b) Linear data classification by scikit-learn SVM. (c) Non-linear data classification by the proposed algorithm. (d) Non-linear data classification by scikit-learn SVM.
Table 1. An example look-up table of the function f(x) for four training data.

x_i    y_i    (1 + y_i)/2
00     -1     0
01     +1     1
10     +1     1
11     -1     0
Table 2. Particle swarm optimization results. N is the number of particles and β is the hyperparameter being optimized. The optimization is conducted by minimizing the cross-entropy loss function of Equation (3). The optimized results are visualized in Figure 6.

N      β         Test Accuracy
10     1.128     0.9675
50     0.6356    0.9610
100    0.6157    0.9335
Table 3. The number of training and testing data used in the experiments in Figure 7.

Data                   Total    +1 Label    -1 Label
Linear training        28       13          15
Linear testing         12       5           7
Non-linear training    28       9           19
Non-linear testing     12       7           5
Table 4. Worst-case time complexity of the proposed algorithm compared to the standard kernel SVM [22,23,24]. d is the feature space dimension, l is the number of training data, n_sv is the number of support vectors, n_t is the number of test data and n_b is the number of batches for the test data. T_a is the annealing time to find the Ising ground state, which depends on the choice of Ising solver algorithm.

              Training     Prediction
Kernel SVM    O(l^3 d)     O(n_sv n_t d)
This work     O(l^2 d)     O((n_sv + n_t/n_b) n_t d) + T_a
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

