1. Introduction
The shortest vector problem (SVP) and the closest vector problem (CVP) are computationally hard lattice problems that have become central building blocks in lattice-based cryptanalysis. The security analysis of many lattice-based cryptographic primitives is usually reduced to solving underlying mathematical problems that are closely related to SVP and CVP. Some hard problems used in classical public-key cryptosystems can also be converted to variants of SVP or CVP, such as the knapsack problem [1,2,3], the hidden number problem [4] and the integer factoring problem [5].
Lattice enumeration is a general SVP solver with linear space complexity, which can be traced back to the early 1980s [6,7]. Within super-exponential time, it outputs a lattice vector shorter than a given target length (or proves that none exists). Enumeration also serves as a subroutine of the blockwise lattice basis reduction (BKZ) algorithm, and therefore plays an important role in the security analysis and parameter assessment of lattice-based cryptosystems [8,9,10].
Kannan [6] proposed an algorithm that enumerates lattice points in a high-dimensional parallelepiped and outputs the shortest vector, along with a well-reduced HKZ basis, at the cost of huge time consumption. Fincke and Pohst [7,11] proposed using a lighter preprocessing method, such as the LLL or BKZ algorithm, and spending more time enumerating lattice vectors in a hyperellipsoid. Fincke–Pohst enumeration has higher time complexity in theory, but it performs better than Kannan's enumeration in practice.
Pruning is the most important technique for accelerating lattice enumeration. In the classical enumeration algorithm, all coordinate vectors of lattice points are organized as an enumeration tree and searched in a depth-first manner. Pruning cuts off a branch and stops searching in depth when the objective function value at the current node exceeds the bounding function. This might cut off the correct solution during the search; hence, pruned enumeration becomes a probabilistic algorithm. Gama, Nguyen and Regev [12] proposed the extreme pruning method, treating the bounding function as the solution to an optimization problem. The optimal bounding function can be regarded as an extreme point that minimizes the expected total running time (for a given success probability). Extreme pruning is therefore believed to be the most efficient pruning method for classical enumeration, which is also called GNR enumeration. The fplll library [13] provides an open-source implementation of GNR enumeration.
Classical pruned enumeration searches lattice vectors in a hypercylinder intersection, which is treated as a continuous region in the time analysis. Consequently, the expected running time of GNR enumeration is easy to compute, which means the upper bound on the cost of lattice enumeration is clear. Aono et al. also proved a lower bound on GNR enumeration [14].
The discrete pruning method is quite different. Discrete pruned enumeration (DP enumeration) originated from a heuristic “Sampling Reduction” algorithm [15], which iteratively samples lattice vectors under a restriction on their Gram–Schmidt coefficients and then re-randomizes the basis using lattice reduction. Ajtai, Buchmann and Ludwig [16,17] provided some analyses of its time complexity and success probability. Fukase and Kashiwabara [18] put forward a series of significant improvements, including the natural number representation (NNR) of lattice points, to make the sampling reduction method more practical, and provided a heuristic analysis. Teruya et al. [19] designed a parallelized version of the Fukase–Kashiwabara sampling reduction algorithm and solved a 152-dimensional SVP challenge, which was the best record of that year. Other relevant studies include [20,21,22]. The sampling reduction algorithm shows good practicality but lacks sufficient theoretical support, especially regarding parameter settings and the estimation of running time. The concept of “discrete pruning” was formally put forward at EUROCRYPT '17 by Aono and Nguyen [23]. They proposed a novel concept named “lattice partition” to generalize the previous sampling methods, and they solved the problem of which lattice points should be “sampled” using the classical enumeration technique. The success probability of discrete pruning can be described as the volume of a “ball–box intersection”, which can be calculated efficiently using the fast inverse Laplace transform (FILT). Aono et al. [24] made some modifications to DP enumeration and proposed a quantum variant. The theoretical foundation of DP enumeration was gradually developed, but some problems still remain.
A precise cost estimation: There is a gap between the theoretical time complexity and the actual cost. Each sub-algorithm of DP enumeration has been proven to run in polynomial time, but the actual running time is not proportional to the theoretical upper bound, since sub-algorithms with different structures are analyzed in terms of different arithmetic operations.
The optimal parameter setting: The parameters of DP enumeration used to be set empirically, by hand. An important problem that [23,24] did not clearly explain is how many points should be enumerated in each iteration. The authors of [18,23] provided some examples of parameter selection without further explanation. For a given SVP instance, an optimal parameter setting should exist that minimizes the total running time; finding it relies on the solution to the first problem.
To solve the above problems, our work is carried out in two steps: First, we build a precise cost model for DP enumeration, called the “DP simulator”. During this procedure, some implementation details of DP enumeration are improved, and the cost model is based on these improvements. Second, we use an optimization method to find the optimal parameters of DP enumeration.
In the first step, to estimate the precise cost of DP enumeration, we study and improve DP enumeration in terms of both mathematical theory and algorithm implementation. The main work is as follows:
Rectification of the theoretical assumption: It is generally assumed that the lattice point in each cell of the partition follows a uniform distribution, but Ludwig [25] (Section 2.4) pointed out that this randomness assumption does not strictly hold for Schnorr's sampling method, i.e., Schnorr's partition. The same defect exists in the randomness assumption for the natural number partition, and it leads to a paradox: two symmetric lattice vectors with the same length have different moment values and success probabilities. This causes inaccuracies in cell enumeration and in the success probability analysis; hence, we provide a rectified assumption that describes the distribution of lattice points in cells more cautiously and accurately, eliminating the defect.
Improvements in algorithm implementation: We propose a new polynomial-time binary search algorithm to find the cell enumeration radius, which guarantees more precise output than [24] and is more conducive to building a precise cost model. We propose using a lightweight re-randomization method and a truncated version of BKZ, “k-tours-BKZ”, as the reprocessing method when DP enumeration fails in one round and has to be repeated. This method takes both basis quality and controllable running time into consideration. We examine the stabilization of basis quality during repeated reprocessing and propose a model to describe the relationship between the orthogonal basis information and the parameters of DP enumeration.
A cost simulator for DP enumeration: Based on the above improvements, we provide an open-source implementation of DP enumeration. To describe its actual running time, it is necessary to define a unified “basic operation” for all sub-algorithms of DP enumeration and to fit the polynomial coefficients. We calculate a fitted time cost formula in CPU cycles for each sub-algorithm of our implementation. We also modify the calculation of success probability according to the rectified randomness assumption. We then build a cost model, the DP simulator, to estimate the exact cost of DP enumeration under any given parameters. In addition, for random lattices on which the GSA assumption holds, the simulator works in a simple and efficient way, without computing any specific lattice basis.
In the second step, to predict the optimal parameters of DP enumeration, we propose an optimization model in which the DP enumeration parameters are the input of the DP simulator and the output is the estimated running time. We find that the Nelder–Mead method is suitable for solving this optimization problem, since the DP simulator has no explicit expression and unknown derivatives. As the cost model is accurate, the parameters that minimize the output of the DP simulator can also be considered the optimal parameters of the DP enumeration algorithm.
Contributions of this work: We propose a systematic solution to the open problems of DP enumeration by combining an improved implementation of DP enumeration, the DP simulator and an optimization method for finding optimal parameters. Our experiments confirm that DP enumeration outperforms state-of-the-art enumeration with extreme pruning, and we provide concrete crossover points for algorithm performance. Furthermore, the experimental results are extrapolated to higher dimensions, and we provide an asymptotic expression for the cost of DP enumeration. These results provide valuable references for lattice-based cryptography. We released our implementation as open source (https://github.com/LunaLuan9555/DPENUM, accessed on 30 December 2022) to promote the development of lattice-based cryptography.
Roadmap: Section 2 introduces fundamental knowledge of lattices and provides an overview of pruning techniques for lattice enumeration. Section 3 first rectifies the basic randomness assumption of lattice partition, and then describes the details of three improvements to discrete pruning enumeration. Section 4 presents the details of our DP enumeration cost simulator, including the runtime estimation of every sub-algorithm and the rectified success probability model. Section 5 describes how to find the optimal parameter setting for DP enumeration using our cost simulator. Section 6 provides experimental results to verify the accuracy of our cost simulator and compares the efficiency of our implementation with extreme pruned enumeration in the fplll library. Finally, Section 7 concludes and discusses further work.
2. Preliminaries
2.1. Lattice
Lattice: Let ${\mathbb{R}}^{m}$ denote the m-dimensional Euclidean space. Given n linearly independent vectors ${\mathbf{b}}_{1},\dots ,{\mathbf{b}}_{n}\in {\mathbb{R}}^{m}$ ($m\ge n$), a lattice $\mathcal{L}$ is the set of points $\mathcal{L}=\left\{{\sum}_{i=1}^{n}{x}_{i}{\mathbf{b}}_{i}:{x}_{i}\in \mathbb{Z}\right\}$ in ${\mathbb{R}}^{m}$. The vector set $\{{\mathbf{b}}_{1},\dots ,{\mathbf{b}}_{n}\}$ is called a basis of $\mathcal{L}$ and can be written as the column matrix $\mathbf{B}=\left[{\mathbf{b}}_{1},\dots ,{\mathbf{b}}_{n}\right]$. The rank of $\mathbf{B}$ is n, which is also known as the dimension of the lattice. From a computational perspective, we need only consider the case ${\mathbf{b}}_{i}\in {\mathbb{Z}}^{m}$ for $i=1,\dots ,n$, since real numbers are represented by rationals in a computer, and a lattice with a rational basis can always be scaled to one with an integral basis. The lattice is full-rank when $m=n$, which is the common case in lattice-based cryptanalysis. In the following, we only consider $\mathbf{B}\in {\mathbb{Z}}^{n\times n}$.
A lattice has many different bases: given two bases ${\mathbf{B}}_{1},{\mathbf{B}}_{2}\in {\mathbb{Z}}^{m\times n}$ of a lattice $\mathcal{L}$, there exists a unimodular matrix $\mathbf{U}$ such that ${\mathbf{B}}_{1}={\mathbf{B}}_{2}\mathbf{U}$. A basis of the lattice corresponds to a fundamental parallelepiped $\mathcal{P}\left(\mathbf{B}\right)=\left\{{\sum}_{i=1}^{n}{x}_{i}{\mathbf{b}}_{i}:0\le {x}_{i}<1,i=1,\dots ,n\right\}$. The shape of the fundamental parallelepiped depends on the basis, but its volume is an invariant of the lattice, denoted by $\mathrm{vol}\left(\mathcal{L}\right)$. It is also called the determinant $\mathrm{det}\left(\mathcal{L}\right)$ of the lattice, and we have $\mathrm{det}\left(\mathcal{L}\right)=\mathrm{vol}\left(\mathcal{L}\right)=\sqrt{\mathrm{det}\left({\mathbf{B}}^{T}\mathbf{B}\right)}$.
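As a sanity check on the determinant invariant, the following minimal Python sketch (function name illustrative) computes $\mathrm{vol}\left(\mathcal{L}\right)=|\mathrm{det}\left(\mathbf{B}\right)|$ for a full-rank integer basis by exact Gaussian elimination over the rationals:

```python
from fractions import Fraction

def volume(B):
    # vol(L) = sqrt(det(B^T B)); for a full-rank (square) integer basis
    # this equals |det(B)|, computed here exactly with Fractions
    n = len(B)
    A = [[Fraction(x) for x in row] for row in B]
    det = Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if A[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)  # rank-deficient: not a basis
        if pivot != i:
            A[i], A[pivot] = A[pivot], A[i]
            det = -det          # a row swap flips the sign of det
        det *= A[i][i]
        for r in range(i + 1, n):
            factor = A[r][i] / A[i][i]
            A[r] = [a - factor * b for a, b in zip(A[r], A[i])]
    return abs(det)
```

Running it on the two bases `[[2, 0], [0, 3]]` and `[[2, 3], [0, 3]]`, which differ by a unimodular transform, returns the same volume, 6, illustrating the invariance.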
Random lattice: For the formal definition and generation algorithm of random lattices, we refer to Goldstein and Mayer's work [26]. The SVP challenge also adopts Goldstein–Mayer lattices. The lattice of an n-dimensional SVP challenge instance has a volume of about ${2}^{10n}$.
Gaussian heuristic: For a lattice $\mathcal{L}$ and a measurable set S in ${\mathbb{R}}^{n}$, we intuitively expect the set to contain $\mathrm{vol}\left(S\right)/\mathrm{vol}\left(\mathcal{L}\right)$ fundamental parallelepipeds, and therefore roughly the same number of lattice points in $S\cap \mathcal{L}$.
Assumption 1. Gaussian heuristic. Let $\mathcal{L}$ be an n-dimensional lattice in ${\mathbb{R}}^{n}$ and S be a measurable set of ${\mathbb{R}}^{n}$. Then, $\left|S\cap \mathcal{L}\right|\approx \mathrm{vol}\left(S\right)/\mathrm{vol}\left(\mathcal{L}\right)$.
We note that the Gaussian heuristic should be used carefully, because in some “bad” cases this assumption does not hold (see Section 2.1.2 in [27]). However, for random lattices the assumption generally holds, especially for “nice” sets S; therefore, we can use the Gaussian heuristic to predict ${\lambda}_{1}\left(\mathcal{L}\right)$:
${\lambda}_{1}\left(\mathcal{L}\right)\approx \mathrm{GH}\left(\mathcal{L}\right)={\left(\mathrm{vol}\left(\mathcal{L}\right)/{V}_{n}\left(1\right)\right)}^{1/n}, \quad (1)$
where ${V}_{n}\left(1\right)$ denotes the volume of the n-dimensional unit ball.
In fact, $\mathrm{GH}\left(\mathcal{L}\right)$ is exactly the radius of an n-dimensional ball with volume $\mathrm{vol}\left(\mathcal{L}\right)$. It is widely believed that $\mathrm{GH}\left(\mathcal{L}\right)$ is a good estimate of ${\lambda}_{1}\left(\mathcal{L}\right)$ when $n\gtrsim 45$.
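Since $\mathrm{GH}\left(\mathcal{L}\right)$ is the radius of the n-ball whose volume equals $\mathrm{vol}\left(\mathcal{L}\right)$, it can be computed from the unit-ball volume ${V}_{n}\left(1\right)={\pi}^{n/2}/\Gamma (n/2+1)$. A minimal Python sketch (function name illustrative):

```python
import math

def gaussian_heuristic(volume, n):
    # GH(L) = (vol(L) / V_n(1))^(1/n): the radius of the n-ball
    # whose volume equals vol(L), with V_n(1) = pi^(n/2) / Gamma(n/2 + 1)
    unit_ball_vol = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    return (volume / unit_ball_vol) ** (1.0 / n)
```

For large n the volume (about ${2}^{10n}$ for SVP challenge lattices) overflows a floating-point number, so a practical implementation should work with logarithms instead.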
Shortest vector problem (SVP): Given a lattice $\mathcal{L}=\mathcal{L}\left(\mathbf{B}\right)$ with basis $\mathbf{B}\in {\mathbb{Z}}^{m\times n}$, find a lattice vector $\mathbf{Bx}$ with $\mathbf{x}\in {\mathbb{Z}}^{n}\backslash \left\{\mathbf{0}\right\}$ such that $\parallel \mathbf{Bx}\parallel \le \parallel \mathbf{By}\parallel $ for any $\mathbf{y}\in {\mathbb{Z}}^{n}\backslash \left\{\mathbf{0}\right\}$. The length of the shortest vector is denoted by ${\lambda}_{1}\left(\mathcal{L}\right)$.
It is of great interest to find the shortest nonzero vector of a lattice in the fields of complexity theory, computational algebra and cryptanalysis. However, a more common case in cryptanalysis is to find a lattice vector that is shorter than a given bound; in other words, researchers are more interested in finding an approximate solution to SVP. For example, the target of the SVP challenge [28] is to find a lattice vector $\mathbf{v}\in \mathcal{L}$ such that $\parallel \mathbf{v}\parallel \le 1.05\cdot \mathrm{GH}\left(\mathcal{L}\right)\approx 1.05\,{\lambda}_{1}\left(\mathcal{L}\right)$.
Orthogonal projection: The Gram–Schmidt orthogonalization can be considered a direct decomposition of the lattice and is frequently used in lattice problems.
Definition 1. Gram–Schmidt orthogonalization. Let $\mathbf{B}=\left[{\mathbf{b}}_{1},\dots ,{\mathbf{b}}_{n}\right]\in {\mathbb{Z}}^{m\times n}$ be a lattice basis. The Gram–Schmidt orthogonal basis ${\mathbf{B}}^{*}=\left[{\mathbf{b}}_{1}^{*},\dots ,{\mathbf{b}}_{n}^{*}\right]\in {\mathbb{Q}}^{m\times n}$ is defined by ${\mathbf{b}}_{i}^{*}={\mathbf{b}}_{i}-{\sum}_{j=1}^{i-1}{\mu}_{i,j}{\mathbf{b}}_{j}^{*}$, where the orthogonal coefficients are ${\mu}_{i,j}=\frac{\langle {\mathbf{b}}_{i},{\mathbf{b}}_{j}^{*}\rangle}{\langle {\mathbf{b}}_{j}^{*},{\mathbf{b}}_{j}^{*}\rangle}$.
Definition 2. Orthogonal projection. Let ${\pi}_{i}:{\mathbb{R}}^{m}\to \mathrm{span}{({\mathbf{b}}_{1},\dots ,{\mathbf{b}}_{i-1})}^{\perp}$ be the ith orthogonal projection. For $\mathbf{v}\in {\mathbb{R}}^{m}$, we define ${\pi}_{i}\left(\mathbf{v}\right)=\mathbf{v}-{\sum}_{j=1}^{i-1}\frac{\langle \mathbf{v},{\mathbf{b}}_{j}^{*}\rangle}{\parallel {\mathbf{b}}_{j}^{*}{\parallel}^{2}}{\mathbf{b}}_{j}^{*}$. Since any lattice vector $\mathbf{v}$ can be represented by the orthogonal basis ${\mathbf{B}}^{*}$ as $\mathbf{v}={\sum}_{i=1}^{n}{u}_{i}{\mathbf{b}}_{i}^{*}$, we also have ${\pi}_{i}\left(\mathbf{v}\right)={\sum}_{j=i}^{n}{u}_{j}{\mathbf{b}}_{j}^{*}$.
For lattice $\mathcal{L}\left(\mathbf{B}\right)$ and $i=1,\dots ,n$, we can define the $(n-i+1)$-dimensional projected lattice ${\pi}_{i}\left(\mathcal{L}\left(\mathbf{B}\right)\right)=\mathcal{L}\left([{\pi}_{i}\left({\mathbf{b}}_{i}\right),{\pi}_{i}\left({\mathbf{b}}_{i+1}\right),\dots ,{\pi}_{i}\left({\mathbf{b}}_{n}\right)]\right)$. Note that the orthogonal basis of ${\pi}_{i}\left(\mathcal{L}\left(\mathbf{B}\right)\right)$ is exactly $[{\mathbf{b}}_{i}^{*},{\mathbf{b}}_{i+1}^{*},\dots ,{\mathbf{b}}_{n}^{*}]$.
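Definition 1 translates directly into code. The following minimal Python sketch (helper names illustrative) computes ${\mathbf{B}}^{*}$ and the coefficients ${\mu}_{i,j}$ in exact rational arithmetic:

```python
from fractions import Fraction

def dot(u, v):
    # inner product <u, v>
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(B):
    # B: rows are the basis vectors b_1, ..., b_n (integers).
    # Returns (Bstar, mu) following Definition 1:
    #   b_i* = b_i - sum_{j<i} mu_{i,j} * b_j*,
    #   mu_{i,j} = <b_i, b_j*> / <b_j*, b_j*>
    n = len(B)
    Bstar = []
    mu = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        bi = [Fraction(x) for x in B[i]]
        v = bi[:]
        for j in range(i):
            mu[i][j] = dot(bi, Bstar[j]) / dot(Bstar[j], Bstar[j])
            v = [vc - mu[i][j] * wc for vc, wc in zip(v, Bstar[j])]
        Bstar.append(v)
    return Bstar, mu
```

For the toy basis `[[3, 0], [1, 2]]` this yields ${\mathbf{b}}_{1}^{*}=(3,0)$, ${\mu}_{2,1}=1/3$ and ${\mathbf{b}}_{2}^{*}=(0,2)$, with $\langle {\mathbf{b}}_{1}^{*},{\mathbf{b}}_{2}^{*}\rangle =0$ as required.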
2.2. Discrete Pruning
In classical enumeration, we search for lattice points directly, according to their coordinates $({x}_{1},\dots ,{x}_{n})$ with respect to basis $\mathbf{B}$, such that $\parallel \mathbf{v}\parallel =\parallel {\sum}_{i=1}^{n}{x}_{i}{\mathbf{b}}_{i}\parallel \le R$. However, enumeration with discrete pruning behaves in a very different way.
Considering the representation $\mathbf{v}={\sum}_{j=1}^{n}{u}_{j}{\mathbf{b}}_{j}^{*}$, it is intuitive to search for lattice vectors with small ${u}_{j}$, especially at indices j corresponding to very large $\parallel {\mathbf{b}}_{j}^{*}\parallel $. This idea was first applied in a heuristic vector sampling method proposed by Schnorr [15] and dramatically improved by Fukase and Kashiwabara [18]. These sampling strategies were summarized by Aono and Nguyen, who defined the lattice partition to generalize them.
Definition 3. (Lattice partition [23]). Let $\mathcal{L}$ be a full-rank lattice in ${\mathbb{Z}}^{n}$. An $\mathcal{L}$-partition $\left(\mathcal{C}\left(\cdot \right),T\right)$ is a partition of ${\mathbb{R}}^{n}$ such that ${\mathbb{R}}^{n}={\cup}_{\mathbf{t}\in T}\mathcal{C}\left(\mathbf{t}\right)$ and $\mathcal{C}\left(\mathbf{t}\right)\cap \mathcal{C}\left({\mathbf{t}}^{\prime}\right)=\emptyset $ for $\mathbf{t}\ne {\mathbf{t}}^{\prime}$, where the index set T is countable. There is exactly one lattice point in each cell $\mathcal{C}\left(\mathbf{t}\right)$, and there is a polynomial-time algorithm that converts a tag $\mathbf{t}\in T$ to the corresponding lattice vector $\mathbf{v}\in \mathcal{C}\left(\mathbf{t}\right)\cap \mathcal{L}$.
A nontrivial partition is generally related to the orthogonal basis ${\mathbf{B}}^{*}$; some examples are given in [23]. Here, we only introduce the natural partition, first proposed by Fukase and Kashiwabara [18], since it has smaller moments than other lattice partitions such as Babai's and Schnorr's, implying that enumeration with the natural partition tends to be more efficient. In this paper, we only discuss discrete pruning based on the natural partition, following [18,23,24].
Definition 4. Given a lattice $\mathcal{L}$ with basis $\mathbf{B}\in {\mathbb{Z}}^{n\times n}$ and a lattice vector $\mathbf{v}={\sum}_{j=1}^{n}{u}_{j}{\mathbf{b}}_{j}^{*}$, the natural number representation (NNR) of $\mathbf{v}$ is the vector $\mathbf{t}=({t}_{1},\dots ,{t}_{n})\in {\mathbb{N}}^{n}$ such that ${u}_{j}\in \left(-\frac{{t}_{j}+1}{2},\phantom{\rule{0.166667em}{0ex}}-\frac{{t}_{j}}{2}\right]\cup \left(\frac{{t}_{j}}{2},\phantom{\rule{0.166667em}{0ex}}\frac{{t}_{j}+1}{2}\right]$ for all $j=1,\dots ,n$. The natural number representation $\mathbf{t}\in {\mathbb{N}}^{n}$ leads to the natural partition $(\mathcal{C}\left(\cdot \right),{\mathbb{N}}^{n})$ by defining $\mathcal{C}\left(\mathbf{t}\right)=\left\{{\sum}_{i=1}^{n}{x}_{i}{\mathbf{b}}_{i}^{*}:-\frac{{t}_{i}+1}{2}<{x}_{i}\le -\frac{{t}_{i}}{2}\phantom{\rule{4.pt}{0ex}}\mathrm{or}\phantom{\rule{4.pt}{0ex}}\frac{{t}_{i}}{2}<{x}_{i}\le \frac{{t}_{i}+1}{2}\right\}\subset {\mathbb{R}}^{n}$. The shape of $\mathcal{C}\left(\mathbf{t}\right)$ is a union of ${2}^{j}$ hypercuboids (boxes), which are centrosymmetric and disjoint, where j is the number of nonzero coefficients of $\mathbf{t}$.
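The NNR of a single coefficient can be computed by inverting the interval condition of Definition 4: for ${u}_{j}>0$, ${t}_{j}=\lceil 2{u}_{j}\rceil -1$, and for ${u}_{j}\le 0$, ${t}_{j}=\lfloor -2{u}_{j}\rfloor $. A minimal Python sketch (function name illustrative):

```python
import math

def nnr_tag(u):
    # natural number representation of one Gram-Schmidt coefficient u:
    # t is the natural number with u in (-(t+1)/2, -t/2] or (t/2, (t+1)/2]
    if u > 0:
        return math.ceil(2 * u) - 1   # u in (t/2, (t+1)/2]
    return math.floor(-2 * u)         # u in (-(t+1)/2, -t/2]
```

For instance, `nnr_tag(0.6)` and `nnr_tag(-0.6)` both give tag 1, reflecting the symmetric pair of intervals of each cell.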
Given a certain cell, the lattice vector lying in it is determined by the tag and the lattice basis; however, if we randomly pick cells, the position of the lattice vector within $\mathcal{C}\left(\mathbf{t}\right)$ shows a kind of randomness. One can naturally assume that the lattice point belonging to $\mathcal{C}\left(\mathbf{t}\right)$ follows a uniform distribution. The prototype of this assumption was first proposed by Schnorr [15] and generalized by Fukase and Kashiwabara [18]; Aono, Nguyen and Shen [23,24] also use it by default.
Assumption 2. (Randomness Assumption [18]). Given a lattice $\mathcal{L}\left(\mathbf{B}\right)$ with $\mathbf{B}\in {\mathbb{Z}}^{n\times n}$, orthogonal basis ${\mathbf{B}}^{*}$ and a natural number vector $\mathbf{t}\in {\mathbb{N}}^{n}$, for the lattice vector $\mathbf{v}={\sum}_{j=1}^{n}{u}_{j}{\mathbf{b}}_{j}^{*}$ contained in $\mathcal{C}\left(\mathbf{t}\right)$, the Gram–Schmidt coefficients ${u}_{j}$ ($j=1,\dots ,n$) are uniformly distributed over $\left(-\frac{{t}_{j}+1}{2},\phantom{\rule{0.166667em}{0ex}}-\frac{{t}_{j}}{2}\right]\cup \left(\frac{{t}_{j}}{2},\phantom{\rule{0.166667em}{0ex}}\frac{{t}_{j}+1}{2}\right]$ and statistically independent with respect to j.
In this ideal situation, regarding the GS coefficients ${u}_{j}$ of $\mathbf{v}$ as random variables, one can compute the expectation and variance of ${\parallel \mathbf{v}\parallel}^{2}$, since ${\parallel \mathbf{v}\parallel}^{2}$ is itself a random variable. The expectation is defined as the first moment of the cell $\mathcal{C}\left(\mathbf{t}\right)$ [23]:
$E\left[\mathcal{C}\left(\mathbf{t}\right)\right]={\sum}_{j=1}^{n}\left(\frac{{t}_{j}^{2}+{t}_{j}}{4}+\frac{1}{12}\right){\parallel {\mathbf{b}}_{j}^{*}\parallel}^{2}. \quad (2)$
This means that, for a given tag $\mathbf{t}$, we can predict the length of the lattice vector $\mathbf{v}\in \mathcal{C}\left(\mathbf{t}\right)$ immediately, without converting $\mathbf{t}$ to $\mathbf{v}$, which is exact but rather time-consuming. This leads to the core idea of the discrete pruning method: we first search for a batch of cells ${\cup}_{\mathbf{t}\in S}\mathcal{C}\left(\mathbf{t}\right)$ that are “most likely” to contain very short lattice vectors; then, we decode them to obtain the corresponding lattice vectors and check whether there is a $\mathbf{v}$ such that $\parallel \mathbf{v}\parallel \le R$. The pruning set is $P={\cup}_{\mathbf{t}\in S}\mathcal{C}\left(\mathbf{t}\right)$. If the randomness assumption and the Gaussian heuristic hold, the probability that P contains a lattice vector shorter than R can be described by the volume of the intersection $\mathrm{vol}(Bal{l}_{n}\left(R\right)\cap P)={\sum}_{\mathbf{t}\in S}\mathrm{vol}\left(Bal{l}_{n}\left(R\right)\cap \mathcal{C}\left(\mathbf{t}\right)\right)$.
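Under Assumption 2, each ${u}_{j}^{2}$ has expectation $\frac{{t}_{j}^{2}+{t}_{j}}{4}+\frac{1}{12}$ on its pair of intervals, so the first moment of a cell is a simple sum over the Gram–Schmidt norms. A minimal Python sketch (names illustrative):

```python
def expected_sq_length(t, bstar_norms_sq):
    # E[||v||^2] for v in C(t) under the uniform randomness assumption:
    # each coefficient u_j contributes ((t_j^2 + t_j)/4 + 1/12) * ||b_j*||^2
    return sum(((tj * tj + tj) / 4 + 1 / 12) * nj
               for tj, nj in zip(t, bstar_norms_sq))
```

The cell of the all-zero tag, for instance, has expected squared length $\frac{1}{12}{\sum}_{j}{\parallel {\mathbf{b}}_{j}^{*}\parallel}^{2}$, which is why discrete pruning favors tags with small entries at indices where $\parallel {\mathbf{b}}_{j}^{*}\parallel $ is large.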
The outline of discrete pruned enumeration is given in Algorithm 1.
Algorithm 1 Discrete Pruned Enumeration
Require: well-reduced lattice basis $\mathbf{B}$, number of tags M, target vector length R
Ensure: $\mathbf{v}\in \mathcal{L}\left(\mathbf{B}\right)$ such that $\parallel \mathbf{v}\parallel <R=1.05\cdot \mathrm{GH}\left(\mathcal{L}\right)$
1: Reduce lattice basis $\mathbf{B}$
2: while true do
3:  $S\leftarrow \emptyset $
4:  Use binary search to find a bound r such that there are M tags $\mathbf{t}$ satisfying $f\left(\mathbf{t}\right)<r$
5:  Enumerate all these M tags and save them in set S
6:  for $\mathbf{t}\in S$ do
7:   Decode $\mathbf{t}$ to recover the corresponding $\mathbf{v}$ such that $\mathbf{v}\in \mathcal{C}\left(\mathbf{t}\right)$
8:   if ${\parallel \mathbf{v}\parallel}^{2}<{R}^{2}$ then return $\mathbf{v}$ $//$ found a solution
9:   end if
10:  end for
11:  Re-randomize $\mathbf{B}$ by multiplying by a unimodular matrix $\mathbf{Q}\in {\mathbb{Z}}^{n\times n}$, i.e., $\mathbf{B}\leftarrow \mathbf{BQ}$
12:  Reprocess $\mathbf{B}$ using lattice reduction algorithms such as BKZ or LLL
13: end while
3. Improvements in Discrete Pruning Method
3.1. Rectification of Randomness Assumption
Most studies on discrete pruning take the randomness assumption as the foundation of their analyses; therefore, they can apply Equation (2) to predict vector lengths. However, we can easily point out a paradox if this assumption holds: consider two cells with tags $\mathbf{t}=[{t}_{1},\dots ,{t}_{k}\ne 0,0,\dots ,0]$ and ${\mathbf{t}}^{\prime}=[{t}_{1},\dots ,{t}_{k}+1,0,\dots ,0]$, where ${t}_{k}$ is odd. It is easy to verify that the corresponding lattice vectors of $\mathbf{t}$ and ${\mathbf{t}}^{\prime}$ point in opposite directions and have the same length. However, Equation (2) indicates $E\left[\mathcal{C}\left(\mathbf{t}\right)\right]<E\left[\mathcal{C}\left({\mathbf{t}}^{\prime}\right)\right]$. In fact, we also have $\mathrm{vol}\left(Bal{l}_{n}\left(R\right)\cap \mathcal{C}\left(\mathbf{t}\right)\right)\ne \mathrm{vol}\left(Bal{l}_{n}\left(R\right)\cap \mathcal{C}\left({\mathbf{t}}^{\prime}\right)\right)$, which means these two cells have different success probabilities, while the lattice vectors contained in them are essentially the same.
This paradox implies that the distribution of lattice points in cells is not completely uniform. In fact, for a tag $\mathbf{t}=[{t}_{1},\dots ,{t}_{k}\ne 0,0,\dots ,0]$, the GS coefficients ${u}_{k},{u}_{k+1},\dots ,{u}_{n}$ of the corresponding lattice vector $\mathbf{v}\in \mathcal{C}\left(\mathbf{t}\right)$ are fixed integers rather than uniformly distributed real numbers. The exact values of ${u}_{k},{u}_{k+1},\dots ,{u}_{n}$ are given in Proposition 1.
Proposition 1. Given a lattice $\mathcal{L}\left(\mathbf{B}\right)$ with orthogonal basis ${\mathbf{B}}^{*}$ and a tag $\mathbf{t}=[{t}_{1},\dots ,{t}_{k}\ne 0,0,\dots ,0]$, let $\mathbf{v}={\sum}_{j=1}^{n}{u}_{j}{\mathbf{b}}_{j}^{*}\in \mathcal{C}\left(\mathbf{t}\right)$ denote the corresponding lattice vector. Then ${u}_{j}=0$ for $j>k$, and
${u}_{k}=\frac{{t}_{k}+1}{2}$ if ${t}_{k}$ is odd, and ${u}_{k}=-\frac{{t}_{k}}{2}$ if ${t}_{k}$ is even. $\quad (3)$
Proof. We can verify the proposition through the procedures of the decoding algorithm; a brief theoretical proof is also provided. For a lattice vector $\mathbf{v}={\sum}_{i=1}^{n}{x}_{i}{\mathbf{b}}_{i}={\sum}_{i=1}^{n}{u}_{i}{\mathbf{b}}_{i}^{*}\in \mathcal{C}\left(\mathbf{t}\right)$, where ${u}_{k}={x}_{k}+{\sum}_{i=k+1}^{n}{\mu}_{i,k}{x}_{i}$, the last nonzero coefficient of $\mathbf{x}$ is ${x}_{k}$ if, and only if, ${u}_{k}$ is the last nonzero coefficient of $\mathbf{u}$, and then ${u}_{k}={x}_{k}$. According to Definition 4, we have ${t}_{j}=0$ for all $j>k$; ${u}_{k}$ is positive if, and only if, ${t}_{k}=2{x}_{k}-1$ is odd, and ${u}_{k}<0$ if, and only if, ${t}_{k}=-2{x}_{k}$ is even. For brevity, tags whose last nonzero coefficient is odd or even are called “odd-ended tags” and “even-ended tags”, respectively. □
Based on Proposition 1, the rectified randomness assumption is given below, and the moments of natural partition are also modified.
Assumption 3. (The Rectified Randomness Assumption).
Let $\mathcal{L}\left(\mathbf{B}\right)$ be an n-dimensional lattice with orthogonal basis ${\mathbf{B}}^{*}$. Given a tag $\mathbf{t}$ with corresponding lattice vector $\mathbf{v}={\sum}_{j=1}^{n}{u}_{j}{\mathbf{b}}_{j}^{*}\in \mathcal{C}\left(\mathbf{t}\right)$, suppose the last nonzero coefficient of $\mathbf{t}$ is ${t}_{k}$. Then, for $j<k$, we assume that ${u}_{j}$ is uniformly distributed over $\left(-\frac{{t}_{j}+1}{2},\phantom{\rule{0.166667em}{0ex}}-\frac{{t}_{j}}{2}\right]\cup \left(\frac{{t}_{j}}{2},\phantom{\rule{0.166667em}{0ex}}\frac{{t}_{j}+1}{2}\right]$ and independent with respect to j; for $j\ge k$, ${u}_{j}$ is directly given by Equation (3).
The moments of the lattice partition must be modified accordingly, since the last several coefficients are no longer random variables after the rectification. For a tag $\mathbf{t}=[{t}_{1},\dots ,{t}_{k}\ne 0,0,\dots ,0]$, the expectation of the corresponding ${\parallel \mathbf{v}\parallel}^{2}$ is
${E}^{\prime}\left[\mathcal{C}\left(\mathbf{t}\right)\right]={\sum}_{j=1}^{k-1}\left(\frac{{t}_{j}^{2}+{t}_{j}}{4}+\frac{1}{12}\right){\parallel {\mathbf{b}}_{j}^{*}\parallel}^{2}+{u}_{k}^{2}{\parallel {\mathbf{b}}_{k}^{*}\parallel}^{2}, \quad (4)$
where ${u}_{k}$ is defined by Equation (3).
After the rectification, for two tags $\mathbf{t},{\mathbf{t}}^{\prime}$ that differ only by 1 in the last nonzero coefficient, we have ${E}^{\prime}\left[\mathcal{C}\left(\mathbf{t}\right)\right]={E}^{\prime}\left[\mathcal{C}\left({\mathbf{t}}^{\prime}\right)\right]$, and the paradox mentioned at the beginning of this subsection is eliminated.
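The rectified expectation can be checked numerically. The sketch below (names illustrative) fixes the last nonzero coefficient to the integer given by Proposition 1 and keeps the uniform moment for the earlier indices; the two tags from the paradox then receive identical expectations:

```python
def u_last(t_k):
    # Proposition 1: fixed value of the last nonzero GS coefficient
    return (t_k + 1) / 2 if t_k % 2 == 1 else -t_k / 2

def rectified_expectation(t, bstar_norms_sq):
    # E'[C(t)]: entries before the last nonzero tag coefficient keep the
    # uniform moment ((t^2 + t)/4 + 1/12); the last one contributes u_k^2
    nz = [i for i, ti in enumerate(t) if ti != 0]
    if not nz:
        return 0.0  # the zero tag corresponds to the zero vector
    k = nz[-1]
    head = sum(((t[i] ** 2 + t[i]) / 4 + 1 / 12) * bstar_norms_sq[i]
               for i in range(k))
    return head + u_last(t[k]) ** 2 * bstar_norms_sq[k]
```

For example, the symmetric pair of tags `[1, 0]` and `[2, 0]` (whose lattice vectors are opposite) now get the same expectation, whereas Equation (2) would assign them different values.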
3.2. Binary Search and Cell Enumeration
A crucial step of Algorithm 1 is cell enumeration (line 5), which aims to find the “best” M cells. We use an objective function $f\left(\mathbf{t}\right)$ to measure how good a cell is. $E\left[\mathcal{C}\left(\mathbf{t}\right)\right]$ (Equation (2)) is a natural indicator for searching for proper tags $\mathbf{t}$, since it is exactly the expected squared length of the lattice vector $\mathbf{v}\in \mathcal{C}\left(\mathbf{t}\right)$. Aono and Nguyen [23] directly use $E\left[\mathcal{C}\left(\mathbf{t}\right)\right]$ as the objective function $f\left(\mathbf{t}\right)$ in cell enumeration, and Aono et al. [24] use a modified version of $E\left[\mathcal{C}\left(\mathbf{t}\right)\right]$ to guarantee a polynomial runtime. They require the function $f\left(\mathbf{t}\right)={\sum}_{i=1}^{n}f(i,{t}_{i})$ to satisfy $f(i,0)=0$ and $f(i,j)\ge f(i,{j}^{\prime})$ for all i and $j>{j}^{\prime}$, which means we have to drop the constants in $E\left[\mathcal{C}\left(\mathbf{t}\right)\right]$, i.e., let $f(i,j)=\frac{1}{4}({j}^{2}+j){\parallel {\mathbf{b}}_{i}^{*}\parallel}^{2}$. Based on their work and the rectification above, we propose a modified objective function. Given a tag vector $\mathbf{t}=[{t}_{1},\dots ,{t}_{k}\ne 0,0,\dots ,0]$ as input, we first define
$f(i,{t}_{i})=\frac{1}{4}({t}_{i}^{2}+{t}_{i}){\parallel {\mathbf{b}}_{i}^{*}\parallel}^{2}$ for $i<k$, $f(k,{t}_{k})={u}_{k}^{2}{\parallel {\mathbf{b}}_{k}^{*}\parallel}^{2}$, and $f(i,{t}_{i})=0$ for $i>k$, $\quad (5)$
where ${u}_{k}$ is defined by Equation (3). Then, the objective function of cell enumeration is
$f\left(\mathbf{t}\right)={\sum}_{i=1}^{n}f(i,{t}_{i}). \quad (6)$
The complete cell enumeration procedure is given in Algorithm 2.
Remark 1. Considering the symmetry of lattice vectors, we only search for even-ended tags (line 17 and line 31).
The time complexity of Algorithm 2 is similar to that of [24]: the number of times Algorithm 2 enters the loop is at most $(2n-1)\cdot M/2$, where M is the number of tags such that $f\left(\mathbf{t}\right)\le r$. In each loop iteration, the number of arithmetic operations performed is $O\left(1\right)$, and the number of calls to $f\left(\cdot \right)$ is exactly one. The proof is essentially the same as that of Theorem 11 in [24]. (Note that, although we change the definition of $f(i,{t}_{i})$, and therefore the value calculated in line 3, this does not affect the total number of while-loop iterations in the asymptotic sense. Furthermore, the modification of $f\left(\mathbf{t}\right)$ does not change the key step of the proof: each partial assignment satisfying ${\sum}_{i={i}_{0}}^{n}f(i,{t}_{i})\le R$ at a middle node can be expanded to a larger sum satisfying ${\sum}_{i=1}^{n}f(i,{t}_{i})\le R$.)
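For intuition, the tag set that cell enumeration produces can be reproduced on toy instances by brute force. The sketch below (names illustrative; the even-ended restriction of Remark 1 is omitted for simplicity) scans a small grid of tags and keeps those with $f\left(\mathbf{t}\right)\le r$, whereas Algorithm 2 walks the search tree and prunes, visiting only $O(n\cdot M)$ nodes:

```python
from itertools import product

def u_last(t_k):
    # Proposition 1: fixed value of the last nonzero GS coefficient
    return (t_k + 1) / 2 if t_k % 2 == 1 else -t_k / 2

def f_tag(t, norms_sq):
    # objective function: (t^2 + t)/4 terms before the last nonzero entry,
    # the exact u_k^2 term at it, and 0 after it (constants dropped)
    nz = [i for i, ti in enumerate(t) if ti != 0]
    if not nz:
        return 0.0
    k = nz[-1]
    head = sum((t[i] ** 2 + t[i]) / 4 * norms_sq[i] for i in range(k))
    return head + u_last(t[k]) ** 2 * norms_sq[k]

def cells_within(norms_sq, r, t_max=4):
    # brute-force scan of the grid {0..t_max}^n, skipping the zero tag
    n = len(norms_sq)
    return [t for t in product(range(t_max + 1), repeat=n)
            if any(t) and f_tag(t, norms_sq) <= r]
```

On the toy input `norms_sq = [1.0, 1.0]` with `r = 1.0`, the four tags `(1,0)`, `(2,0)`, `(0,1)`, `(0,2)` survive, and the symmetric pair `(1,0)`/`(2,0)` receives the same objective value, as intended by the rectification.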
In cell enumeration, a bound r should be determined, as in the previous section, such that there are exactly M tags satisfying $f\left(\mathbf{t}\right)\le r$. Aono and Nguyen [23] first proposed the idea of using binary search to find a proper r. Aono et al. [24] gave a detailed binary search algorithm (Algorithm 5 in [24]), which was proved to have a polynomial running time $O\left(n(n+1)M\right)+{n}^{O\left(1\right)}+O\left({\log}_{2}M\right)$. Their algorithm uses the precision of the input radius r to control the termination of the binary search, but a slight perturbation of the radius r causes a large disturbance in the number of valid tags, since the number of tags $\mathbf{t}$ such that $f\left(\mathbf{t}\right)<r$ grows exponentially with r. Therefore, their binary search method can only guarantee an output of N tags with $N\in [M,(n+1)M]$, which is a relatively large interval. This makes it intractable to estimate the precise cost of the subsequent tag-wise computation, such as the decoding algorithm.
Since the current termination condition in the binary search of DP enumeration introduces uncertainty into the total cost-estimation model, we provide a more practical and stable polynomial-time binary search strategy to determine the parameter r for cell enumeration. The essential difference from ANS18 [24] is that we use the number of output tags as the bisection indicator of the binary search. This method guarantees that the cell enumeration algorithm (Algorithm 1, line 5) outputs between roughly $(1-\epsilon)M$ and $(1+\epsilon)M$ tags $\mathbf{t}$ satisfying $f\left(\mathbf{t}\right)<r$. When $\epsilon$ is small, the number of output tags can be approximated as M.
Algorithm 2 CellENUM
Require: orthogonal basis ${\mathbf{B}}^{*}=[{\mathbf{b}}_{1}^{*},\dots ,{\mathbf{b}}_{n}^{*}]$, bound r
Ensure: all $\mathbf{t}\in {\mathbb{N}}^{n}$ such that $f\left(\mathbf{t}\right)\le r$, where $f\left(\mathbf{t}\right)$ is defined in Equation (6) and $\mathbf{t}$ is even-ended
1: $S\leftarrow \emptyset$
2: ${t}_{1}={t}_{2}=\dots ={t}_{n}=0$
3: ${c}_{1}={c}_{2}=\dots ={c}_{n+1}=0$
4: $k\leftarrow 1$
5: while true do
6:  if ${t}_{k}={t}_{k+1}=\dots ={t}_{n}=0$ then
7:   ${c}_{k}\leftarrow 0$
8:  else if ${t}_{k}$ is the last nonzero component of $\mathbf{t}$ then
9:   ${c}_{k}\leftarrow {u}_{k}^{2}{\|{\mathbf{b}}_{k}^{*}\|}^{2}$ // ${u}_{k}$ is calculated using Equation (3)
10:  else
11:   ${c}_{k}\leftarrow {c}_{k+1}+\frac{1}{4}({t}_{k}^{2}+{t}_{k}){\|{\mathbf{b}}_{k}^{*}\|}^{2}$
12:  end if // calculating $f(k,{t}_{k})$, defined by Equation (5)
13:  if ${c}_{k}<r$ then
14:   if $k=1$ then
15:    $S\leftarrow S\cup \{\mathbf{t}=({t}_{1},{t}_{2},\dots ,{t}_{n})\}$
16:    if ${t}_{k+1}=\dots ={t}_{n}=0$ then
17:     ${t}_{k}\leftarrow {t}_{k}+2$ // only output “even-ended” tags
18:    else
19:     ${t}_{k}\leftarrow {t}_{k}+1$
20:    end if
21:   else
22:    $k\leftarrow k-1$
23:    ${t}_{k}\leftarrow 0$
24:   end if
25:  else
26:   $k\leftarrow k+1$
27:   if $k=n+1$ then
28:    exit
29:   else
30:    if ${t}_{k+1}=\dots ={t}_{n}=0$ or $k=n$ then
31:     ${t}_{k}\leftarrow {t}_{k}+2$
32:    else
33:     ${t}_{k}\leftarrow {t}_{k}+1$
34:    end if
35:   end if
36:  end if
37: end while
38: return S
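To make the traversal concrete, the search of Algorithm 2 can be sketched in Python. This sketch is a simplified model, not our implementation: it uses $f(i,t_i)=\frac{1}{4}(t_i^2+t_i)\|\mathbf{b}_i^*\|^2$ for every coordinate (omitting the refined cost of the last nonzero coordinate from Equation (3)) and filters even-ended tags after generation, instead of the increment-by-2 trick; function names are ours.

```python
def cell_enum(bstar_sq, r):
    """Enumerate all even-ended tags t in N^n with f(t) <= r, where
    f(t) = sum_i (1/4)(t_i^2 + t_i) * ||b_i*||^2 (simplified assumption:
    the refined cost for the last nonzero coordinate is omitted)."""
    n = len(bstar_sq)
    tags = []

    def cost(i, t_i):
        # contribution f(i, t_i) of coordinate i to the objective function
        return 0.25 * (t_i * t_i + t_i) * bstar_sq[i]

    def dfs(i, t, partial):
        if i < 0:
            # complete tag: keep it only if its last nonzero entry is even
            nonzero = [x for x in t if x != 0]
            if not nonzero or nonzero[-1] % 2 == 0:
                tags.append(tuple(t))
            return
        t_i = 0
        while partial + cost(i, t_i) <= r:  # cost(i, .) increases with t_i
            t[i] = t_i
            dfs(i - 1, t, partial + cost(i, t_i))
            t_i += 1
        t[i] = 0  # restore before backtracking

    dfs(n - 1, [0] * n, 0.0)
    return tags
```

For instance, `cell_enum([1.0, 1.0], 2.0)` returns the four even-ended tags of a toy 2-dimensional instance.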

Remark 2. In Algorithm 3, the call to CellENUM (line 5) is actually a variant of the original Algorithm 2: it only needs to count the number of qualified tags, it can return early once $\#S>(1+\epsilon)M$, and it does not need to store the valid tags.
Algorithm 3 ComputeRadius
Require: M, $\epsilon$, ${\mathbf{B}}^{*}$
Ensure: $r\in \mathbb{R}$ such that $\#\{\mathbf{t}:f(\mathbf{t})<r\}\approx M$
1: ${R}_{l}\leftarrow {\sum}_{i=1}^{n}f(i,0)=0$
2: ${R}_{r}\leftarrow {\sum}_{i=1}^{n}f(i,\lceil {M}^{\frac{1}{n}}\rceil )$
3: while ${R}_{l}<{R}_{r}$ do
4:  ${R}_{m}\leftarrow \frac{{R}_{l}+{R}_{r}}{2}$
5:  $S\leftarrow \mathsf{CellENUM}({\mathbf{B}}^{*},{R}_{m})$
6:  if $\#S<(1-\epsilon)M$ then
7:   ${R}_{l}\leftarrow {R}_{m}$ // ${R}_{m}$ is too small
8:  else if $\#S>(1+\epsilon)M$ then
9:   ${R}_{r}\leftarrow {R}_{m}$ // ${R}_{m}$ is too large
10: else return $r\leftarrow {R}_{m}$ // ${R}_{m}$ is acceptable
11: end if
12: end while
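The bisection of Algorithm 3 can be sketched as follows. This is a toy model under the same simplifying assumptions as before (quadratic objective for every coordinate); `max_steps` is our own safeguard, since for tiny M no radius may hit the target window exactly.

```python
import math

def compute_radius(bstar_sq, M, eps, max_steps=200):
    """Binary search (in the spirit of Algorithm 3) for a bound r such that
    the number of even-ended tags t with f(t) <= r lies in
    [(1 - eps)M, (1 + eps)M], with f(t) = sum_i (1/4)(t_i^2 + t_i)*||b_i*||^2."""
    n = len(bstar_sq)

    def cost(i, t_i):
        return 0.25 * (t_i * t_i + t_i) * bstar_sq[i]

    def count_tags(r):
        # count even-ended tags with f(t) <= r; trailing_zero tracks whether
        # all higher-index coordinates are zero (mirrors the +2 increments)
        def rec(i, partial, trailing_zero):
            if i < 0:
                return 1
            total, t_i = 0, 0
            while partial + cost(i, t_i) <= r:
                if not (trailing_zero and t_i % 2 == 1):  # skip odd-ended tags
                    total += rec(i - 1, partial + cost(i, t_i),
                                 trailing_zero and t_i == 0)
                t_i += 1
            return total
        return rec(n - 1, 0.0, True)

    R_l = 0.0
    R_r = sum(cost(i, math.ceil(M ** (1.0 / n))) for i in range(n))
    R_m = R_r
    for _ in range(max_steps):
        R_m = (R_l + R_r) / 2
        cnt = count_tags(R_m)
        if cnt < (1 - eps) * M:
            R_l = R_m          # R_m is too small
        elif cnt > (1 + eps) * M:
            R_r = R_m          # R_m is too large
        else:
            return R_m         # R_m is acceptable
    return R_m
```

On the toy instance above, `compute_radius([1.0, 1.0], 4, 0.1)` converges in two bisection steps.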

Theorem 1 gives the asymptotic time complexity of Algorithm 3:
Theorem 1. Given a lattice $\mathcal{L}\left(\mathbf{B}\right)$, M and a relaxation factor ϵ, Algorithm 3 terminates within $O\left(\log n+\log \frac{1}{\epsilon}+n\log \left(n\,\mathrm{det}{\left(L\right)}^{\frac{2}{n}}\right)\right)$ loops and outputs the enumeration parameter r such that $(1-\epsilon)M\le \#\{\mathbf{t}\in {\mathbb{N}}^{n}:f\left(\mathbf{t}\right)<r\}\le (1+\epsilon)M$. In each loop, the sub-algorithm CellENUM is called exactly once.
A sketch of the proof of Theorem 1 is given in Appendix A. In the following experiments, we set $\epsilon=0.005$.
3.3. Lattice Decoding
The decoding algorithm converts a tag $\mathbf{t}\in {\mathbb{N}}^{n}$ into a lattice vector $\mathbf{v}\in \mathcal{L}\left(\mathbf{B}\right)\subset {\mathbb{R}}^{n}$. The complete algorithm is described in both [18,23]. However, in discrete pruned enumeration, almost all the tags we process do not correspond to a solution of SVP, and there is no need to recover the coordinates of those lattice vectors. Instead, inspired by classical enumeration, we use an intermediate result, the partial sum of the squared length of the lattice vector (line 14 in Algorithm 4), as an early-abort indicator: when the projected squared length
$\rho ={\sum}_{k=i}^{n}\left({x}_{k}+{\sum}_{j=k+1}^{n}{\mu}_{j,k}{x}_{j}\right)^{2}{\|{\mathbf{b}}_{k}^{*}\|}^{2}$
exceeds the squared target length of SVP, we stop decoding, since the result cannot be a short lattice vector. We thereby avoid a vector–matrix multiplication costing $O\left({n}^{2}\right)$.
Algorithm 4 enters its for loop $O\left(n\right)$ times and, in each iteration, performs about $O\left(n\right)$ arithmetic operations, mainly concentrated in line 7. Therefore, the time complexity of Algorithm 4 is $O\left({n}^{2}\right)$. During experiments, we noticed that, for the SVP challenge, decoding terminates at index $i\approx 0.21n$ on average, which means that the early-abort technique works and saves decoding time.
Space complexity of DP enumeration. We note that Decode can be embedded into line 15 of CellENUM. In other words, the decoding procedure can be invoked immediately when a candidate tag is found, after which we either output the final solution to SVP or discard the tag. This indicates that DP enumeration essentially has polynomial space complexity, since it does not need to store any tags.
Algorithm 4 Decode
Require: tag $\mathbf{t}\in {\mathbb{N}}^{n}$, SVP target length $R=1.05\cdot \mathrm{GH}\left(\mathcal{L}\right)$, orthogonalization information $\mathbf{U}={\left({\mu}_{i,j}\right)}_{n\times n}$, ${\mathbf{B}}^{*}\in {\mathbb{R}}^{n\times n}$
Ensure: a lattice vector $\mathbf{v}$ such that ${\|\mathbf{v}\|}^{2}<{R}^{2}$, or ⌀
1: $\rho \leftarrow 0$
2: $\Delta \leftarrow 1$ // indicates whether we should stop decoding
3: for $i=1$ to n do
4:  ${u}_{i}=0$
5: end for
6: for $i=n$ down to 1 do
7:  $y=-{\sum}_{j=i+1}^{n}{u}_{j}{\mu}_{j,i}$
8:  ${u}_{i}=\lfloor y+0.5\rfloor$
9:  if ${u}_{i}\le y$ then
10:  ${u}_{i}={u}_{i}-{(-1)}^{{t}_{i}}\lceil {t}_{i}/2\rceil$
11: else
12:  ${u}_{i}={u}_{i}+{(-1)}^{{t}_{i}}\lceil {t}_{i}/2\rceil$
13: end if
14: $\rho \leftarrow \rho +{({u}_{i}-y)}^{2}{\|{\mathbf{b}}_{i}^{*}\|}^{2}$ // $\rho ={\sum}_{k=i}^{n}\left({u}_{k}+{\sum}_{j=k+1}^{n}{\mu}_{j,k}{u}_{j}\right)^{2}{\|{\mathbf{b}}_{k}^{*}\|}^{2}$
15: if $\rho >{R}^{2}$ then
16:  $\Delta \leftarrow 0$
17:  exit // early abort: not a short vector
18: else
19:  $\Delta \leftarrow 1$
20: end if
21: end for
22: if $\Delta =1$ then return $\mathbf{v}=\mathbf{Bu}$ // a solution to SVP is found
23: else return ⌀
24: end if
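A Python sketch of decoding with early abort follows. The sign convention for y and the tag-to-offset mapping $0,+1,-1,+2,-2,\dots$ reflect our reading of Algorithm 4 and should be treated as assumptions; the parameter names are ours.

```python
import math

def decode(t, mu, bstar_sq, basis, R2):
    """Convert tag t into a lattice vector v = sum_i u_i * basis[i], aborting
    early once the accumulated squared projected length rho exceeds R2 (= R^2).
    mu[j][i] are the Gram-Schmidt coefficients (used for j > i);
    bstar_sq[i] = ||b_i*||^2."""
    n = len(t)
    u = [0] * n
    rho = 0.0
    for i in range(n - 1, -1, -1):
        y = -sum(u[j] * mu[j][i] for j in range(i + 1, n))
        u_i = math.floor(y + 0.5)                  # nearest integer to y
        off = (-1) ** t[i] * math.ceil(t[i] / 2)   # 0, +-1, +-2, ... as t[i] grows
        u_i = u_i - off if u_i <= y else u_i + off
        u[i] = u_i
        rho += (u_i - y) ** 2 * bstar_sq[i]
        if rho > R2:
            return None                            # early abort: not short enough
    # only now pay the O(n^2) coordinate recovery v = B u
    return [sum(u[i] * basis[i][k] for i in range(n)) for k in range(n)]
```

On $\mathbb{Z}^2$ with the identity basis, the tags $(1,0)$ and $(2,0)$ decode to $(1,0)$ and $(-1,0)$, respectively, and a tight bound triggers the early abort.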

3.4. Lattice Reduction for Reprocessing
3.4.1. Rerandomization and Reduction
To solve an SVP instance, DP enumeration must be run repeatedly on many different bases, which means that the lattice basis should be rerandomized whenever DP enumeration restarts, and then reprocessed to maintain good quality. A plain approach is to use the polynomial-time LLL reduction as the reprocessing method, but LLL only guarantees some primary properties and is not as good as BKZ reduction. However, a complete BKZ reduction takes a long time, and estimating its runtime requires a sophisticated method. Additionally, the BKZ algorithm produces diminishing returns: after the first dozens of tours, the quality of the basis, e.g., the root Hermite factor, changes slowly from one tour to the next, as illustrated in [29,30].
A complete reduction is unnecessary, since our DP enumeration algorithm does not require the lattice basis to be strictly BKZ-reduced. The key point of reprocessing for DP enumeration is to reach a compromise between time consumption and basis quality. The early-abort strategy called terminating BKZ [31] is a good attempt to decrease the number of iterations of BKZ reduction while maintaining some good properties. However, the runtime of BKZ remains hard to estimate, since the number of iterations is not fixed, and those properties mainly describe the shortness of ${\mathbf{b}}_{1}$, which provides little help for our cost estimation.
Another idea is to use a BKZ algorithm with a limited number of reduction tours, which is convenient for runtime analysis and still efficiently produces a well-reduced basis. This is what we call the “k-tours-BKZ” algorithm (Algorithm 5), which restricts the total number of “tours” (lines 4–18 in Algorithm 5) of BKZ to k. Given the BKZ blocksize $\beta$ and k, the time consumption of Algorithm 5 can be approximately estimated by multiplying $k(n-\beta)$ by the cost of solving a single $\beta$-dimensional SVP oracle. This is explained in Section 4.
Algorithm 5 k-tours-BKZ
Require: lattice basis $\mathbf{B}$; BKZ blocksize $\beta$, k
Ensure: a reprocessed lattice basis ${\mathbf{B}}^{\prime}$
1: $Z\leftarrow 0$; $i\leftarrow 0$ // Z is used to judge the termination condition of the original BKZ
2: $K\leftarrow 0$ // K counts the tours
3: LLL(${\mathbf{b}}_{1},\dots ,{\mathbf{b}}_{n}$)
4: while $Z<n-1$ and $K<k$ do
5:  $K\leftarrow K+1$
6:  for $i=1$ to $n-1$ do
7:   $j\leftarrow \min(i+\beta -1,n)$
8:   $h\leftarrow \min(j+1,n)$
9:   $\mathbf{v}\leftarrow \mathrm{ENUM}({\pi}_{i}\left({\mathbf{b}}_{i}\right),\dots ,{\pi}_{i}\left({\mathbf{b}}_{j}\right))$ // call the SVP oracle
10:  if $\mathbf{v}\ne \mathbf{0}$ then
11:   $Z\leftarrow 0$
12:   LLL(${\mathbf{b}}_{1},\dots ,{\mathbf{b}}_{i-1},{\sum}_{s=i}^{j}{v}_{s}{\mathbf{b}}_{s},{\mathbf{b}}_{i},\dots ,{\mathbf{b}}_{h}$)
13:  else
14:   $Z\leftarrow Z+1$
15:   LLL(${\mathbf{b}}_{1},\dots ,{\mathbf{b}}_{h}$)
16:  end if
17: end for
18: end while
19: return ${\mathbf{B}}^{\prime}=\mathbf{B}$

The value of k is tightly related to the rerandomization; hence, the rerandomization method should also be chosen with care. The essence of this idea is to generate a unimodular matrix $\mathbf{Q}$ as an elementary column transformation applied to $\mathbf{B}$, so that $\mathbf{BQ}$ becomes the new basis to be reprocessed. A very “heavy” matrix $\mathbf{Q}$ means the basis is transformed with high intensity, which implies that the basis loses many of the good properties achieved by the previous reduction (e.g., some very short basis vectors) after rerandomization, and the reprocessing procedure needs a long time to return to a well-reduced state. Conversely, a very sparse $\mathbf{Q}$ may provide insufficient randomness during the transformation and may not guarantee a substantially different basis. To balance randomness and reprocessing cost, we heuristically require the Hamming distance between $\mathbf{Q}$ and the identity matrix $\mathbf{I}$ to satisfy
A practical way to generate such a $\mathbf{Q}$ is to randomly select n positions $(i,j)$ with $i<j$ and set each ${Q}_{ij}$ to a small nonzero integer in $\{\pm 1,\pm 2,\dots \}$, as well as set ${Q}_{ii}=1$ for all $1\le i\le n$. This forms an upper-triangular unimodular matrix and immediately satisfies the above formula.
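Such a matrix could be generated as in the sketch below. The restriction of the entries to $\{\pm 1,\pm 2\}$ and the target of exactly n off-diagonal entries (Hamming distance n to the identity) are our concrete instantiation of the heuristic above.

```python
import random

def rerandomization_matrix(n, seed=None):
    """Sparse upper-triangular unimodular Q: unit diagonal plus exactly n
    small nonzero entries at random positions (i, j) with i < j (0-indexed).
    Being triangular with unit diagonal, det(Q) = 1. Requires n >= 3 so that
    at least n positions above the diagonal exist."""
    rng = random.Random(seed)
    Q = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    placed = 0
    while placed < n:
        i = rng.randrange(n - 1)        # row index; i < n-1 guarantees a j exists
        j = rng.randrange(i + 1, n)     # column strictly above the diagonal
        if Q[i][j] == 0:                # do not overwrite an already-placed entry
            Q[i][j] = rng.choice([-2, -1, 1, 2])
            placed += 1
    return Q
```

The new basis to be reprocessed is then obtained by multiplying $\mathbf{B}$ by this $\mathbf{Q}$ on the right.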
In practice, we find that this method guarantees a new basis after reprocessing without destroying or reducing the basis quality too much; therefore, it helps k-tours-BKZ reach a stable basis quality again in only a few tours (see Figure 1, Figure 2 and Figure 3). The experiments in the next subsection suggest that we can set a very small k to save time in the reprocessing procedure. There is a broad consensus that “the most significant improvements of BKZ reduction only occur in the first few rounds” [27], and this is proved in [31]. Some cryptanalyses also use an optimistically small constant c as the number of BKZ tours when estimating the attack overhead (e.g., $c=8$ in [32]). Considering the previous literature and the following experiments, we set $k=8$ for our k-tours-BKZ method.
3.4.2. Average Quality of Lattice Basis During Reprocessing
Even when using the same parameters, DP enumeration has different runtimes and success probabilities on different bases of a lattice. We expect the basis to have a stable quality that remains largely unchanged under the reprocessing operation and can easily be simulated or predicted without performing a real lattice reduction. The first task is therefore to define the quality of a lattice basis and to study how it changes across enumeration loops. We chose three indicators to describe the quality of the basis and observed their behavior during reprocessing.
Gram–Schmidt Sum.
The success probability of DP enumeration is tightly related to the lengths of the Gram–Schmidt basis vectors $\{\|{\mathbf{b}}_{1}^{*}\|,\dots ,\|{\mathbf{b}}_{n}^{*}\|\}$. Fukase et al. [18] proposed using the Gram–Schmidt Sum as a measure of lattice basis quality and gave an intuitive, approximate analysis of the strong negative correlation between $\mathrm{GSS}\left(\mathbf{B}\right)$ and the efficiency of finding a short lattice vector: the larger the Gram–Schmidt Sum, the smaller the success probability. The Gram–Schmidt Sum is defined as
We generated a random $n=120$ dimensional SVP challenge basis and observed how k-tours-BKZ changes the $\mathrm{GSS}\left(\mathbf{B}\right)$ of the basis during reprocessing. The basis was first ${\mathrm{BKZ}}_{\beta}$-reduced; then, it was repeatedly rerandomized and reprocessed by k-tours-BKZ with $k=8$ tours and blocksize $\beta$. As shown in Figure 1, the peak appearing before every $k=8$ tours of BKZ reduction corresponds to the $\mathrm{GSS}\left(\mathbf{B}\right)$ of the rerandomized basis before reprocessing. The peak indicates that the lattice basis is evidently disturbed by our rerandomization method and $\mathrm{GSS}\left(\mathbf{B}\right)$ deteriorates, while, in the following k tours, $\mathrm{GSS}\left(\mathbf{B}\right)$ quickly returns to a state similar to the well-reduced initial one and then hardly changes. In general, when the lattice basis is iteratively reprocessed, $\mathrm{GSS}\left(\mathbf{B}\right)$ shows only mild fluctuations, which implies that the success probability of finding very short lattice vectors is quite stable.
Geometric Series Assumption and GS Slope.
For a well-reduced basis of a random lattice, such as an SVP challenge lattice, the Gram–Schmidt orthogonal basis generally follows a regular pattern, called the Geometric Series Assumption (GSA). For an n-dimensional lattice with a ${\mathrm{BKZ}}_{\beta}$-reduced basis $\mathbf{B}$, GSA states that there is a $q\in (0,1)$ such that
$\|{\mathbf{b}}_{i}^{*}\|={q}^{i-1}\cdot \|{\mathbf{b}}_{1}\|,\quad i=1,\dots ,n.$
If GSA holds, the points ${\left\{(i,\log \|{\mathbf{b}}_{i}^{*}\|)\right\}}_{i=1}^{n}$ form a straight line with slope $\log q$. In other words, q defines the “shape” of the Gram–Schmidt sequence $\{\|{\mathbf{b}}_{i}^{*}\|{\}}_{i=1}^{n}$. In the following, we call $q\in (0,1)$ the GS slope. For real reduced bases, the points ${\left\{(i,\log \|{\mathbf{b}}_{i}^{*}\|)\right\}}_{i=1}^{n}$ do not lie strictly on a straight line, and an approximate value of q can be obtained by least-squares fitting. Generally, a q closer to 1 means that the basis is better reduced, and vice versa.
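The least-squares fit mentioned above is a one-variable linear regression of $\log\|\mathbf{b}_i^*\|$ against the index i; a minimal sketch:

```python
import math

def fit_gs_slope(bstar_norms):
    """Least-squares fit of log||b_i*|| against i; returns the GS slope
    q = exp(slope). For a basis obeying GSA exactly, this recovers q."""
    n = len(bstar_norms)
    xs = range(1, n + 1)
    ys = [math.log(b) for b in bstar_norms]
    xbar = (n + 1) / 2
    ybar = sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return math.exp(num / den)
```

Applied to a sequence generated exactly under GSA, the fit returns the underlying q; on real reduced bases it gives the averaged slope used throughout this section.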
Figure 2 shows the evolution of the fitted q during reprocessing. The value of q drops sharply after each rerandomization and recovers within the next few reprocessing tours, showing a stable trend across the iterated rerandomization and BKZ tours.
Root Hermite Factor.
In studies of the BKZ algorithm, the root Hermite factor $\delta$ is used to describe the “ability” of BKZ to find a short lattice vector, expressed in terms of the first vector ${\mathbf{b}}_{1}$ of the output basis:
$\delta ={\left(\|{\mathbf{b}}_{1}\|/\mathrm{vol}{\left(\mathcal{L}\right)}^{1/n}\right)}^{1/n}.$
Gama and Nguyen [8] pointed out that, for the BKZ algorithm with blocksize parameter $\beta \ll n$, the root Hermite factor is affected only by the blocksize parameter $\beta$ and has no relationship with the lattice dimension. Additional observations on the root Hermite factor are given in Table 2 of [29].
As with $\mathrm{GSS}\left(\mathbf{B}\right)$ and the GS slope q, Figure 3 shows the evolution of the root Hermite factor $\delta$ of the reprocessed basis. The peaks arising before the reprocessing tours again indicate that the first basis vector ${\mathbf{b}}_{1}$ is frequently changed by our rerandomization method; in the following k tours, $\delta$ quickly returns to a state similar to the well-reduced initial one and then hardly changes.
Based on the above observations, we conclude that, during the reprocessing stage, a few BKZ reduction tours suffice to stabilize the properties of the lattice basis.
4. Precise Cost Estimation of DPENUM
Precise cost estimation of DP enumeration is of great concern and remains an open problem in cryptanalysis. There are several obstacles to building a good runtime model that is consistent with experimental results in feasible dimensions and can be extrapolated to very high dimensions.
First, DP enumeration contains many sub-algorithms with different structures: binary search, cell enumeration, decoding and reprocessing. Although the asymptotic time complexity of each part is discussed in Section 3, the real runtime of DP enumeration still needs to be handled carefully. These sub-algorithms involve a variety of arithmetic and logic operations, which makes it hard to define a universal “basic operation” for all the DP enumeration procedures. To build a runtime model for DP enumeration, our key idea is to use CPU cycles as the basic operation unit, since this avoids the differences caused by different types of operations and is also easy to count.
Second, the search space of DP enumeration is a union of many discrete boxes irregularly distributed in ${\mathbb{R}}^{n}$. It is quite hard to compute the volume of the pruning set, which directly determines the success probability of pruning. Aono et al. proposed using FILT to compute the volume of the pruning set [23], but this calculation model should be modified to achieve better accuracy, according to the rectification of the original randomness assumption we made in Section 3.
According to Algorithm 1, the cost of each loop of DP enumeration can be divided into four parts:
${T}_{bin}$: determine the cell enumeration parameter r by binary search (Algorithm 3);
${T}_{cell}$: enumerate all the tags of candidate cells (Algorithm 2);
${T}_{decode}$: decode each tag and check the length of the corresponding lattice vector (Algorithm 4);
${T}_{repro}$: if there is no valid solution to SVP, rerandomize the lattice basis and reprocess it with the k-tours-BKZ algorithm (Algorithm 5).
Denote by ${p}_{succ}$ the probability of finding a lattice vector shorter than R in a single loop of DP enumeration, and assume that ${p}_{succ}$ is stable across rerandomizations; then, the expected number of loops is about $\frac{1}{{p}_{succ}}$ by the geometric distribution, and the total runtime ${T}_{total}$ of DP enumeration can be estimated by
We assume that the preprocessing time ${T}_{pre}$, which denotes the time for a full-tour ${\mathrm{BKZ}}_{\beta}$ reduction of the initial lattice basis, is far less than the time spent in the main iteration and can be ignored when $\beta \ll n$. In this section, our aim is to determine explicit expressions for ${T}_{repro}$, ${T}_{bin}$, ${T}_{cell}$ and ${T}_{decode}$, as well as to provide an accurate estimation of ${p}_{succ}$.
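Assuming Equation (10) takes the natural form ${T}_{total}={T}_{pre}+\frac{1}{{p}_{succ}}\left({T}_{bin}+{T}_{cell}+M\cdot {T}_{decode}+{T}_{repro}\right)$ (the display is not reproduced here, so this exact form is our assumption), the estimate is straightforward to evaluate:

```python
def total_runtime(p_succ, T_bin, T_cell, T_decode, M, T_repro, T_pre=0.0):
    """Expected total runtime of DP enumeration (assumed form of Equation (10)):
    T_pre plus an expected 1/p_succ loops, each paying for binary search,
    cell enumeration, M tag decodings, and reprocessing."""
    per_loop = T_bin + T_cell + M * T_decode + T_repro
    return T_pre + per_loop / p_succ
```

With the fitted sub-costs of this section plugged in, this single expression is what the remainder of Section 4 instantiates term by term.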
For all the experiments in this paper, the computing platform is a server with Intel Xeon E5-2620 CPUs (eight physical cores at 2.10 GHz) and 64 GB RAM. To obtain accurate CPU cycle counts for the DP enumeration algorithm, we fixed the CPU base frequency and set the CPU affinity, and all the timing data used for fitting were obtained from our single-thread implementation.
4.1. Simulation of Lattice Basis
Some parts of the total time cost model, Equation (10), depend heavily on the quality of the basis: more precisely, on the lengths of the Gram–Schmidt orthogonal basis vectors $\{\|{\mathbf{b}}_{i}^{*}\|{\}}_{i=1}^{n}$. Based on the reprocessing method and the stability analysis in Section 3.4, we can reasonably assume that the Gram–Schmidt sequence $\{\|{\mathbf{b}}_{i}^{*}\|{\}}_{i=1}^{n}$ does not change severely across the loops of DP enumeration. The remaining issue is to simulate an “average” GS sequence for a ${\mathrm{BKZ}}_{\beta}$-reduced basis.
The BKZ simulator [29,30] is a universal method, especially when $\beta$ is large, since the Gaussian heuristic generally holds in $\beta$-dimensional blocks for $\beta \gtrsim 45$. In this case, the probabilistic BKZ simulator proposed by Bai et al. [30] is an appropriate model due to its reliable depiction of the “head concavity” phenomenon. Based on Algorithm 3 in [30], we provide this BKZ simulator in C++ as a component of our implementation.
However, the BKZ simulator does not work well for small and medium blocksizes $\beta$ ($\beta <45$), because the keystone of the BKZ simulator is to estimate the shortest vector length in a $\beta$-dimensional sublattice (block) ${\mathcal{L}}_{[i,j]}$ by computing $\mathrm{GH}\left({\mathcal{L}}_{[i,j]}\right)$, which loses accuracy when $\beta <45$. This is rarely the case in asymptotic cryptanalysis, but it is the prevailing case in preprocessing and also needs investigation; hence, we propose a method that covers this case and fills the vacancy in GS sequence simulation.
If we combine $\mathrm{vol}\left(\mathcal{L}\right)={\prod}_{i=1}^{n}\|{\mathbf{b}}_{i}^{*}\|$ and Equation (8), then we have
$\log \|{\mathbf{b}}_{1}\|=\frac{1}{n}\left(\log \mathrm{vol}\left(\mathcal{L}\right)-\frac{1}{2}n(n-1)\log q\right).$
Using Equation (11), we can approximately compute the whole Gram–Schmidt sequence $\{\|{\mathbf{b}}_{i}^{*}\|{\}}_{i=1}^{n}$ if GSA holds and one of $\|{\mathbf{b}}_{1}\|$ or q is known. Here, we prefer to use the GS slope q rather than the value of $\|{\mathbf{b}}_{1}\|$, since it contains more information about the Gram–Schmidt orthogonal basis. Additionally, using $\|{\mathbf{b}}_{1}\|={\delta}^{n}\mathrm{vol}{\left(\mathcal{L}\right)}^{1/n}$ to recover the Gram–Schmidt sequence might lead to an overly optimistic estimation, since the “head concavity” phenomenon [30] indicates that $\|{\mathbf{b}}_{1}\|$ may deviate significantly from the prediction given by GSA. The feasible approach is then to give an average estimation of q for a given lattice and BKZ parameter $\beta$.
We find that the GS slope has a property similar to that of the root Hermite factor: when $\beta \ll n$, the GS slope of a reduced basis depends mostly on the blocksize $\beta$ and hardly on the lattice dimension. For each parameter set $(n,\beta )\in \{120,\dots ,180\}\times \{11,13,\dots ,45\}$, we generated 50 random SVP challenge instances and applied ${\mathrm{BKZ}}_{\beta}$ to the n-dimensional lattice bases to verify this phenomenon. We then used the least-squares method to fit $\log q$ for the reduced bases. Figure 4 shows the relationship between q and the lattice dimension n, indicating that q hardly depends on n and varies mostly with $\beta$. Figure 5 illustrates the positive correlation between q and $\beta$, consistent with the idea that a larger blocksize $\beta$ yields a better basis with a milder GS slope, i.e., $q<1$ closer to 1. Table 1 gives the estimated values of ${q}_{\beta}$.
Using the empirical data for ${q}_{\beta}$, we can generate a virtual GS sequence $\{{B}_{1},{B}_{2},\dots ,{B}_{n}\}$ that simulates the real behavior of the Gram–Schmidt orthogonal basis of a ${\mathrm{BKZ}}_{\beta}$-reduced lattice basis by solving the following equations:
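These equations combine GSA, ${B}_{i}={q}^{i-1}{B}_{1}$, with the volume constraint ${\prod}_{i}{B}_{i}=\mathrm{vol}\left(\mathcal{L}\right)$, and admit the closed-form solution also used in Algorithm 6 (lines 3–8); a sketch:

```python
import math

def simulate_gs_sequence(n, q, log_vol):
    """Virtual GS sequence {B_i} satisfying B_i = q^(i-1) * B_1 (GSA) and
    prod_i B_i = vol(L), where log_vol = log(vol(L)). Solving the product
    constraint in log form gives log B_1 directly."""
    log_b1 = (log_vol - 0.5 * n * (n - 1) * math.log(q)) / n
    return [math.exp(log_b1 + (i - 1) * math.log(q)) for i in range(1, n + 1)]
```

By construction, the logarithms of the returned sequence sum to `log_vol` and consecutive ratios equal q.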
Remark 3. All the explicit values of ${q}_{\beta}$ for $\beta \le 45$ are given in our open-source implementation. Note that the method proposed above only takes effect when $\beta \le 45$, while we still give an extrapolation as an alternative reference. Since $q<1$ holds for all $\beta >0$ and ${q}_{\beta}$ is an increasing function of β, which implies the trend $q\stackrel{\beta \to \infty}{\to}1$, we heuristically assume this function has the form $q=1-\exp(a\beta +b)$; the fitting result from Table 1 is given in Equation (13). The fitting curve is also illustrated in Figure 5.
4.2. Cost of Subalgorithms
Cost of Binary Search and Cell Enumeration.
For the cell enumeration algorithm (Algorithm 2), the asymptotic time complexity is $O\left((2n-1)\cdot M\right)$. We take $n=60,\dots ,160$ and $M=1.0\times {10}^{4},1.5\times {10}^{4},\dots ,1.0\times {10}^{5}$ and, for each parameter set $(n,M)$, we generate 100 random SVP lattices. The fitting result is
For the binary search algorithm (Algorithm 3), Theorem 1 indicates an asymptotic upper bound of $\left(\log n+\log \frac{1}{\epsilon}+n\log \left(n\,\mathrm{det}{\left(L\right)}^{\frac{2}{n}}\right)\right)\times (2n-1)M$. To simplify the fitting function while retaining accuracy, we keep only the dominant term of the complete expansion, which is $n\log \left(n\,\mathrm{det}{\left(L\right)}^{\frac{2}{n}}\right)\cdot (2n-1)M$. For the SVP challenge lattice, we have $\mathrm{vol}\left(\mathcal{L}\right)\sim {2}^{10n}$. The fitting function of ${T}_{bin}$ is then
Both fitting functions, obtained by the least-squares method, have a coefficient of determination (R-squared) larger than $0.95$.
Cost of Decoding.
To decode one tag with Algorithm 4, the for loop is entered $O\left(n\right)$ times, and each iteration performs $O\left(n\right)$ arithmetic operations. Therefore, ${T}_{decode}$ can be regarded as a quadratic function of n. We take $n=60,\dots ,160$, fix $M=1.0\times {10}^{5}$, and generate 100 random SVP lattices for each n. The expected runtime ${T}_{decode}$ of the decoding algorithm is fitted by
Figure 6 shows that the fitting curve of ${T}_{decode}$ is almost strictly consistent with the experimental data. This fitting function also has a coefficient of determination (R-squared) larger than $0.95$.
Cost of Reprocessing.
The cost of the k-tours-BKZ algorithm is a little complicated, since it iteratively calls an $O\left(\beta \right)$-dimensional SVP oracle. Our implementation of k-tours-BKZ is based on the BKZ 2.0 algorithm in the fplll library [13]. In one tour of ${\mathrm{BKZ}}_{\beta}$, the total runtime is composed of the processing time of $n-1$ blocks. For each block $\mathcal{L}[i,j]=\mathcal{L}({\mathbf{b}}_{i},\dots ,{\mathbf{b}}_{j})$ with $i=1,\dots ,n-1$ and $j=\min(n,i+\beta -1)$, the main steps are classical enumeration and LLL reduction for updating:
Then, the cost of k-tours-BKZ can be estimated by
In Equation (17), $\mathrm{BlockProcess}(j,n,\log A)$ is the cost of updating the basis (Algorithm 5, line 12). The asymptotic time complexity of this part is $O({j}^{3}m\log A)$, which is dominated by LLL reduction [33], where $A\lesssim {2}^{10n}$ for the SVP challenge lattice. When $\beta$ is small, the cost of updating cannot be ignored. For the cost of classical pruned enumeration, ${C}_{node}$ is the number of CPU cycles for processing a single node of the enumeration tree, which is reported to be ${C}_{node}\approx 200$ [34]; $\mathrm{EnumCost}(i,j)$ is the total number of nodes that must be traversed to find a short vector in $\mathcal{L}[i,j]$.
Setting $n=60,\dots ,150$, $\beta =11,13,\dots ,43$ and $k=8$, we recorded the cost of each stage (including runtime and the number of nodes) of k-tours-BKZ$_{\beta}$ on 50 random lattices. Least-squares fitting gives ${C}_{node}\approx 205.45$, and
The remaining part is $\mathrm{EnumCost}(i,j)$, the number of enumeration nodes for block $\mathcal{L}[i,j]$. For a full enumeration of $\mathcal{L}[i,j]$, the total number of nodes can easily be derived from the Gaussian heuristic and can be considered a baseline enumeration cost:
where ${V}_{k}\left(R\right)$ denotes the volume of a k-dimensional ball with radius R.
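Under the Gaussian heuristic, the node count at level k is $V_k(R)$ divided by the product of the last k Gram–Schmidt norms of the block, summed over all levels. The sketch below evaluates this; the factor $1/2$ for the $\pm\mathbf{v}$ symmetry of the enumeration tree is our assumption about the exact form of Equation (20).

```python
import math

def ball_volume(k, R):
    # volume of a k-dimensional Euclidean ball of radius R
    return math.pi ** (k / 2) * R ** k / math.gamma(k / 2 + 1)

def full_enum_cost(bstar_norms, R):
    """Gaussian-heuristic node count of full enumeration with radius R over a
    block whose Gram-Schmidt norms are given first to last."""
    n = len(bstar_norms)
    total, prod = 0.0, 1.0
    for k in range(1, n + 1):
        prod *= bstar_norms[n - k]   # running product of the last k norms
        total += ball_volume(k, R) / prod
    return total / 2                  # halve for the +-v symmetry (assumed)
```

In Algorithm 6, the same quantity is evaluated on the simulated sequence $\{B_i\}$ instead of the true Gram–Schmidt norms.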
However, in our implementation of k-tours-BKZ, the SVP oracle uses extreme pruning and a heuristic enumeration radius $c=\min(1.1\,\mathrm{GH}\left(\mathcal{L}[i,j]\right),\|{\mathbf{b}}_{i}^{*}\|)$ for acceleration. We assume that, for classical enumeration on a ${\beta}^{\prime}=j-i+1$ dimensional block $\mathcal{L}[i,j]$, these methods offer a total speedup ratio of ${r}_{{\beta}^{\prime}}$, and that ${r}_{{\beta}^{\prime}}$ is independent of the block index i and the lattice dimension n. The key point is to obtain an explicit expression for ${r}_{{\beta}^{\prime}}$. (An alternative (lower-bound) estimation of the enumeration cost is provided by Chen and Nguyen [29]; the coefficients of their model are given in the LWE estimator [35]. However, their model is more suitable for very large $\beta$ and is not very precise when the blocksize is small.)
The value of ${\mathrm{FullEnumCost}}_{{\beta}^{\prime}}$ can be calculated by Equation (20) with the GS sequence $\{\|{\mathbf{b}}_{i}^{*}\|{\}}_{i=1}^{n}$, and the actual number of enumeration nodes ${\mathrm{ExtremeEnumCost}}_{{\beta}^{\prime}}$ is obtained from experiments. We recorded ${\mathrm{ExtremeEnumCost}}_{{\beta}^{\prime}}$ to calculate the speedup ratio data. To fit ${r}_{{\beta}^{\prime}}$, we ran k-tours-BKZ$_{\beta}$ on ${\mathrm{BKZ}}_{\beta}$-reduced bases with $n=60,\dots ,150$ and $\beta =11,13,\dots ,43$; then, all the data for the same blocksize were gathered and averaged. It is well known that extreme pruning offers an exponential speedup [12], and tightening the radius additionally leads to a superexponential speedup. We assume ${r}_{{\beta}^{\prime}}\sim \exp \left(O({\beta}^{\prime}\log {\beta}^{\prime})\right)$, and by fitting, we obtain
Figure 7 shows the fitting results together with the experimental values of ${r}_{{\beta}^{\prime}}$, confirming that the assumptions we made are reasonable.
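The fitted Equation (22), whose coefficients also appear in Algorithm 6 (line 16), can be evaluated directly; note the superexponential growth in ${\beta}^{\prime}$:

```python
import math

def speedup_ratio(beta):
    """Fitted speedup r_beta' of extreme pruning plus a tightened radius over
    full enumeration on a beta-dimensional block (Equation (22))."""
    lb = math.log(beta)
    return math.exp(0.35461 * beta * lb - 1.5331 * beta + 4.8982 * lb - 2.9084)
```

Dividing the Gaussian-heuristic full enumeration cost by this ratio yields the per-block cost used in the reprocessing estimate.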
To predict $\mathrm{EnumCost}(i,j)$ without any information on a specific lattice basis, the GS sequence in Equation (21) should be replaced by the simulated values $\{{B}_{1},{B}_{2},\dots ,{B}_{n}\}$ derived from the equation set (12) or from the BKZ simulator.
Algorithm 6 gives the procedure for calculating ${T}_{repro}$, i.e., the cost of reprocessing.
Algorithm 6 Calculating ${T}_{repro}$
Require: $\beta$, lattice dimension n, k, $\mathrm{vol}\left(\mathcal{L}\right)$
Ensure: the running time ${T}_{repro}$ of reprocessing with k-tours-BKZ$_{\beta}$
1: if $\beta <45$ then
2:  find q corresponding to $\beta$ // see Table 1 or Equation (13)
3:  $\log {B}_{1}\leftarrow \frac{1}{n}\left(\log \left(\mathrm{vol}\left(\mathcal{L}\right)\right)-\frac{1}{2}n(n-1)\log q\right)$ // Equation (11)
4:  ${B}_{1}\leftarrow \exp(\log {B}_{1})$
5:  for $i=2$ to n do
6:   $\log {B}_{i}\leftarrow (i-1)\cdot \log q+\log {B}_{1}$ // Equation (8)
7:   ${B}_{i}\leftarrow \exp(\log {B}_{i})$
8:  end for
9: else
10: ${\left\{{B}_{i}\right\}}_{i=1}^{n}\leftarrow$ BKZSim() // use the BKZ simulator in [30], Algorithm 3
11: end if
12: $i\leftarrow 0$
13: $Cost\leftarrow 0$
14: for $i=1$ to $n-1$ do
15:  ${\beta}^{\prime}=\min(\beta ,n-i+1)$
16:  ${r}_{{\beta}^{\prime}}\leftarrow \exp \left(0.35461{\beta}^{\prime}\log {\beta}^{\prime}-1.5331{\beta}^{\prime}+4.8982\log {\beta}^{\prime}-2.9084\right)$ // Equation (22)
17:  $Cost\leftarrow Cost+\frac{1}{{r}_{{\beta}^{\prime}}}\cdot$ FullENUMCost(${B}_{i},\dots ,{B}_{i+{\beta}^{\prime}-1}$) + BlockProcess($i+{\beta}^{\prime}-1,n$) // replacing $\|{\mathbf{b}}_{i}^{*}\|$ with ${B}_{i}$ in Equation (20)
18: end for
19: return $k\cdot {C}_{node}\cdot Cost$

4.3. Success Probability
Under the Gaussian heuristic, the success probability of pruned enumeration can be directly reduced to computing the volume of the pruning set. For discrete pruning, the pruning set has always been considered a union of “ball-box intersections”, whose volume is not easy to compute. Aono et al. [23] proposed an efficient numerical method based on the fast inverse Laplace transform (FILT) to compute the volume of a single “ball-box intersection” $\mathcal{C}\left(\mathbf{t}\right)\cap {\mathrm{Ball}}_{n}\left(R\right)$ and used stratified sampling to deduce the total volume of the union.
However, imperfections in the original randomness assumption (Assumption 2) reduce the accuracy of the success probability model for discrete pruning. Consider two cells with tags
$\mathbf{t}=[{t}_{1},\dots ,{t}_{k}\ne 0,0,\dots ,0]$ and
${\mathbf{t}}^{\prime}=[{t}_{1},\dots ,{t}_{k}+1\ne 0,0,\dots ,0]$, where
${t}_{k}$ is odd, i.e.,
$\mathbf{t}$ is odd-ended and
${\mathbf{t}}^{\prime}$ is the corresponding even-ended tag. According to the model given in [
23], they have different success probabilities; however, the lattice vectors contained in
$\mathcal{C}\left(\mathbf{t}\right)$ and
$\mathcal{C}\left({\mathbf{t}}^{\prime}\right)$ have exactly the same length.
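The length symmetry can be seen from the natural-number encoding commonly used in discrete pruning, which pairs tags with signed integer coordinates. The sign convention below is an assumption for illustration (the encoding in [23] may use the mirrored convention), but either choice exhibits the same odd/even pairing:

```python
import math

def decode_tag(t: int) -> int:
    """Map a natural-number tag to a signed coordinate:
    t = 0, 1, 2, 3, 4, ... -> u = 0, -1, 1, -2, 2, ...
    (an assumed sign convention; the mirrored one behaves identically)."""
    return (-1) ** t * math.ceil(t / 2)

# An odd tag t and its even successor t + 1 decode to coordinates of
# equal absolute value, so two tags differing only in this way yield
# lattice vectors of exactly the same length.
for t in range(1, 20, 2):
    assert abs(decode_tag(t)) == abs(decode_tag(t + 1))
```

In other words, flipping the last nonzero entry $t_k$ (odd) to $t_k+1$ (even) only flips the sign of one coordinate, leaving the vector norm unchanged.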
Figure 8 illustrates the gap at a larger scale. For the parameter settings
$n=60,\dots ,84$,
$M=\mathrm{50,000}$ and
$\beta =20,30$, we used 30
${\mathrm{BKZ}}_{\beta}$-reduced
$n$-dimensional lattice bases to compute the average theoretical success probability
${p}_{succ,odd}$ of the
$M$ odd-ended cells enumerated by Algorithm 2, as well as
${p}_{succ,even}$ of their corresponding even-ended cells, both using the method provided by [
23]. Then, we ran a complete DP enumeration on each lattice basis using the same parameters and recorded the number of iteration rounds.
Figure 8 shows that the actual number of rounds of DP enumeration falls in the apparent gap between the expected values
$1/{p}_{succ,odd}$ and
$1/{p}_{succ,even}$, which were estimated using odd-ended and even-ended cells, respectively.
This phenomenon calls for a proper rectification of the success probability model. In fact, Proposition 1 and the rectified Assumption 3 in
Section 3.1 indicate that the lattice point is actually randomly distributed on a hyperplane contained in
$\mathcal{C}\left(\mathbf{t}\right)\cap {\mathrm{Ball}}_{n}\left(R\right)$, which can be described by the assumption below:
Assumption 4. Given lattice basis $\mathbf{B}$ and its orthogonal basis ${\mathbf{B}}^{*}$, for a tag vector $\mathbf{t}=[{t}_{1},\dots ,{t}_{k}\ne 0,0,\dots ,0]$, the lattice vector of $\mathcal{C}\left(\mathbf{t}\right)$ can be considered to be uniformly distributed over ${\mathcal{C}}^{\prime}\left(\mathbf{t}\right)\subset \mathcal{C}\left(\mathbf{t}\right)$, where
This assumption gives a more precise distribution of the lattice vector in the cell. In fact,
${\mathcal{C}}^{\prime}\left(\mathbf{t}\right)$ is the union of
${2}^{k-1}$ $(k-1)$-dimensional “boxes”, which is formally denoted by
Based on Proposition 1 and the new assumption on the lattice vector distribution, we redefine the success probability of DP enumeration on a single cell. For a cell
$\mathcal{C}\left(\mathbf{t}\right)$ with
$\mathbf{t}=[{t}_{1},\dots ,{t}_{k}\ne 0,0,\dots ,0]$, denoting the lattice vector
$\mathbf{v}\in \mathcal{C}\left(\mathbf{t}\right)$ by
$\mathbf{v}={\sum}_{j=1}^{n}{u}_{j}{\mathbf{b}}_{j}^{*}$, the probability that
$\parallel \mathbf{v}\parallel \le R$ is defined by
Let
${R}^{\prime}=\sqrt{{R}^{2}-{u}_{k}^{2}{\parallel {\mathbf{b}}_{k}^{*}\parallel}^{2}}$,
${\alpha}_{i}=\frac{{t}_{i}}{2{R}^{\prime}}\parallel {\mathbf{b}}_{i}^{*}\parallel$ and
${\beta}_{i}=\frac{{t}_{i}+1}{2{R}^{\prime}}\parallel {\mathbf{b}}_{i}^{*}\parallel$; then, the numerator part in Equation (
24) can be written as
Then, the calculation of
${p}_{succ}\left(\mathbf{t}\right)$ is reduced to computing the distribution of the sum of the
$k-1$ independent variables
${x}_{1}^{2},\dots ,{x}_{k-1}^{2}$, which can be approximated by the FILT method combined with the Euler transformation. The details of these methods are given in
Appendix B.
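As a sanity check on the FILT-based computation, the single-cell probability can also be estimated by direct Monte Carlo sampling. This is a sketch under Assumption 4: each $x_i$ is taken uniform on $[\alpha_i,\beta_i]$, and the probability that $\sum_i x_i^2 \le 1$ is estimated by counting; `p_succ_mc` and its inputs are illustrative names:

```python
import random

def p_succ_mc(alphas, betas, trials=100_000, seed=1):
    """Monte Carlo estimate of Pr[ sum_i x_i^2 <= 1 ], where each x_i is
    uniform on [alpha_i, beta_i] -- the probability of Equation (24)
    after the change of variables x_i = u_i * ||b_i*|| / R'.

    Intended only as a slow cross-check of the FILT + Euler
    transformation computation described in the text."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.uniform(a, b) ** 2 for a, b in zip(alphas, betas))
        if s <= 1.0:
            hits += 1
    return hits / trials
```

For instance, when every interval lies close to the origin the estimate approaches 1, and when some $\alpha_i > 1$ it is 0, matching the geometric picture of a box entirely inside (or outside) the unit ball.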
For a set of tags
$\mathcal{U}$, which is the output of cellENUM (Algorithm 2), the total success probability of finding a short lattice vector among
$\mathcal{U}$ is
To extrapolate the probability model to higherdimensional SVP instances without performing any timeconsuming computation of real lattice reduction, the concrete value of the GS sequence involved in the calculation of ${p}_{succ}$ should be replaced by the simulated GS sequence $\{{B}_{1},{B}_{2},\dots ,{B}_{n}\}$.
Figure 9 verifies the accuracy of the rectified success probability model (Equations (
24) and (
26)). We take the SVP instances with
$n=60,\dots ,84,\beta =20,30$ and
M = 50,000 as examples, and we ran the DP enumeration algorithm on each SVP instance to solve the SVP challenge, recording the total number of iteration rounds. The experiment was repeated 30 times on each parameter set to obtain the average value. The dashed line shows the expected iteration rounds
$1/{p}_{succ}$ calculated using the original
$\{\parallel {\mathbf{b}}_{i}^{*}\parallel \}_{i=1}^{n}$ of the real reduced basis, and the dotted line was calculated using only the simulated GS sequence
${\left\{{B}_{i}\right\}}_{i=1}^{n}$. The results illustrate that the rectified model gives a more precise estimation of success probability than the original method provided in [
23].
4.4. Simulator for DP Enumeration
Based on all the works in this section, the runtime of DP enumeration can be estimated by Algorithm 7, shown below. This simulator only needs minimal information for a lattice
$\mathcal{L}$.
Algorithm 7 DP-simulator
Require: Lattice dimension and volume $n,\mathrm{vol}\left(\mathcal{L}\right)$; $k,\beta$; $M$; target length $R$ of SVP
Ensure: Expected runtime (CPU cycles) to find $\mathbf{v}\in \mathcal{L}$ such that $\parallel \mathbf{v}\parallel \le R$ by DP enumeration
1: if $\beta <45$ then
2:  Generate the simulated GS sequence ${B}_{1},\dots ,{B}_{n}$ by solving Equation (12)
3: else
4:  Generate the simulated GS sequence ${B}_{1},\dots ,{B}_{n}$ by the BKZ simulator
5: end if
6: Calculate ${r}_{\beta}$ by Equation (22)
7: Calculate ${T}_{repro}$ by calling Algorithm 6 with parameters $(\beta ,n,k,\mathrm{vol}(\mathcal{L}))$ as input
8: Calculate ${T}_{cell}$ by Equation (14) with $M,n$
9: Calculate ${T}_{bin}$ by Equation (15) with $M,n$
10: Calculate ${T}_{decode}$ by Equation (16) with $n$
11: Call Algorithms 2 and 3 with ${B}_{1},\dots ,{B}_{n}$ as the GS sequence and output the $M$ tags with minimal value of $f\left(\mathbf{t}\right)$
12: Calculate the total success probability ${p}_{succ}$ on the $M$ tags, with GS sequence ${B}_{1},\dots ,{B}_{n}$
13: return $\frac{{T}_{repro}+{T}_{bin}+{T}_{cell}+M\cdot {T}_{decode}}{{p}_{succ}}$

Remark 4. The simulation method for the GS sequence (line 1) only works for lattice bases that satisfy the GSA. For lattices that lack good “randomness” and do not satisfy the GSA, one has to run a real $\mathrm{BKZ}_{\beta}$ reduction algorithm on several lattice bases and compute an averaged GS sequence ${B}_{1},\dots ,{B}_{n}$ as a good simulation of $\{\parallel {\mathbf{b}}_{i}^{*}\parallel \}_{i=1}^{n}$.
5. The Optimal Parameters for DPENUM
To solve a certain SVP instance, the parameters of DP enumeration that need to be manually determined are as follows: $\beta $ of BKZ reduction algorithm, k of preprocessing and M of cell enumeration.
It should be noted that
k could be a fixed constant. There is no need to set a very large
k because of the “diminishing returns” of lattice reduction, which means the improvement in basis quality would slow down with an increase in
k. We heuristically set
$k=8$ for SVP instances with
$n\le 200$, which is also roughly consistent with the observation of [
32] (see Section 2.5 of [
32] ). Then, only
$\beta $ and
M should be determined with restrictions
$0<\beta \le n$ and
$M>0$. The two parameters should minimize the total cost of DP enumeration, i.e., the value of Equation (
10). This value is calculated by Algorithm 7 and can hardly be represented by a differentiable function. The Nelder–Mead simplex method is an effective method for this type of derivative-free optimization problem. Since there are only two independent variables, it is reasonable to expect the Nelder–Mead method to converge quickly to the optimal solution.
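The Nelder–Mead loop can be condensed into a short, runnable sketch. Here `cost` is a stand-in for the simulator-based objective $S(\beta, M)$ (in practice, Algorithm 7 plus a penalty term), and this simplified variant merges the outside/inside contraction cases into one:

```python
def nelder_mead_2d(cost, x0, x1, x2, tol=1e-6, max_iter=1000):
    """Minimal 2-D Nelder-Mead simplex search: the same
    reflection/expansion/contraction/shrink steps used to tune
    (beta, M), demonstrated on an arbitrary 2-variable function."""
    simplex = [list(x0), list(x1), list(x2)]
    for _ in range(max_iter):
        simplex.sort(key=cost)
        best, mid, worst = simplex
        if abs(cost(worst) - cost(best)) < tol:
            break
        # centroid of the two best points
        xm = [(best[0] + mid[0]) / 2, (best[1] + mid[1]) / 2]
        # reflection of the worst point through the centroid
        xr = [2 * xm[0] - worst[0], 2 * xm[1] - worst[1]]
        if cost(best) <= cost(xr) < cost(mid):
            simplex[2] = xr
        elif cost(xr) < cost(best):  # expansion
            xe = [xm[0] + 2 * (xr[0] - xm[0]), xm[1] + 2 * (xr[1] - xm[1])]
            simplex[2] = xe if cost(xe) < cost(xr) else xr
        else:  # contraction toward the worst point
            xc = [(xm[0] + worst[0]) / 2, (xm[1] + worst[1]) / 2]
            if cost(xc) < cost(worst):
                simplex[2] = xc
            else:  # shrink toward the best point
                simplex[1] = [(best[0] + mid[0]) / 2, (best[1] + mid[1]) / 2]
                simplex[2] = [(best[0] + worst[0]) / 2, (best[1] + worst[1]) / 2]
    return min(simplex, key=cost)
```

When tuning DP enumeration, the returned coordinates would additionally be rounded to integer $\beta$ and $M$ (an implementation detail not shown here).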
Algorithm 8 gives the optimal values of $\beta ,M$ for a certain SVP instance based on the standard version of the Nelder–Mead method.
Table 2 gives some concrete values of optimal parameter sets for solving medium-size SVP challenges (
$R=1.05\,\mathrm{GH}\left(\mathcal{L}\right)$) and the corresponding estimated running times. For the medium-size SVP challenges, the optimal parameter set basically follows
$M\sim {10}^{5}$ and
$\beta <n/2$; neither grows very rapidly with
$n$.
Algorithm 8 Finding optimal parameters for DP enumeration
Require: lattice dimension $n$, lattice volume $\mathrm{vol}\left(\mathcal{L}\right)$ and the target vector length $R$ of SVP
Ensure: $(\beta ,M)$ that minimizes the output of DP-simulator($n,\beta ,M,R$)
1: $S(\beta ,M)$ := DP-simulator$(n,\beta ,M,R)$ + $P(\beta ,M)$ $//$ $P(\beta ,M)$ is a penalty function to keep the parameters inside the feasible region, i.e., to prevent $\beta >n$ or $M<0$
2: $N\leftarrow 2$ $//$ 2 independent variables
3: Select initial points ${\mathbf{x}}_{1}=[{\beta}_{1},{M}_{1}],\dots ,{\mathbf{x}}_{N+1}=[{\beta}_{N+1},{M}_{N+1}]$ at random
4: while true do
5:  reorder the $N+1$ points such that $S\left({\mathbf{x}}_{1}\right)<\dots <S\left({\mathbf{x}}_{N+1}\right)$
6:  ${y}_{1}\leftarrow S\left({\mathbf{x}}_{1}\right),\dots ,{y}_{N+1}\leftarrow S\left({\mathbf{x}}_{N+1}\right)$
7:  if $|{\beta}_{1}-{\beta}_{N+1}|<2$ and $|{M}_{1}-{M}_{N+1}|<1000$ then
8:   break
9:  end if
10:  ${\mathbf{x}}_{m}\leftarrow \frac{1}{N}{\sum}_{i=1}^{N}{\mathbf{x}}_{i}$ $//$ centroid of the best $N$ points
11:  ${\mathbf{x}}_{r}\leftarrow 2{\mathbf{x}}_{m}-{\mathbf{x}}_{N+1}$ $//$ reflection
12:  ${y}_{r}\leftarrow S\left({\mathbf{x}}_{r}\right)$
13:  if ${y}_{1}\le {y}_{r}<{y}_{N}$ then
14:   ${\mathbf{x}}_{N+1}\leftarrow {\mathbf{x}}_{r}$
15:   continue
16:  else if ${y}_{r}<{y}_{1}$ then $//$ expansion
17:   ${\mathbf{x}}_{e}\leftarrow {\mathbf{x}}_{m}+2({\mathbf{x}}_{r}-{\mathbf{x}}_{m})$
18:   if $S\left({\mathbf{x}}_{e}\right)<{y}_{r}$ then
19:    ${\mathbf{x}}_{N+1}\leftarrow {\mathbf{x}}_{e}$
20:   else
21:    ${\mathbf{x}}_{N+1}\leftarrow {\mathbf{x}}_{r}$
22:   end if
23:   continue
24:  else if ${y}_{N}\le {y}_{r}<{y}_{N+1}$ then $//$ outside contraction
25:   ${\mathbf{x}}_{c}\leftarrow {\mathbf{x}}_{m}+({\mathbf{x}}_{r}-{\mathbf{x}}_{m})/2$
26:   if $S\left({\mathbf{x}}_{c}\right)<{y}_{r}$ then
27:    ${\mathbf{x}}_{N+1}\leftarrow {\mathbf{x}}_{c}$
28:    continue
29:   end if
30:  else $//$ inside contraction
31:   ${\mathbf{x}}_{c}\leftarrow {\mathbf{x}}_{m}+({\mathbf{x}}_{N+1}-{\mathbf{x}}_{m})/2$
32:   if $S\left({\mathbf{x}}_{c}\right)<{y}_{r}$ then
33:    ${\mathbf{x}}_{N+1}\leftarrow {\mathbf{x}}_{c}$
34:    continue
35:   end if
36:  end if
37:  for $i=2$ to $N+1$ do $//$ shrink
38:   ${\mathbf{x}}_{i}\leftarrow {\mathbf{x}}_{1}+({\mathbf{x}}_{i}-{\mathbf{x}}_{1})/2$
39:  end for
40: end while
41: return the optimal parameters ${\mathbf{x}}_{min}\leftarrow {\mathbf{x}}_{1}=[{\beta}_{1},{M}_{1}]$ and the corresponding cost estimation $S\left({\mathbf{x}}_{min}\right)$

6. Experiments and Analysis
We compared the performance of our optimized DP enumeration with that of different SVP solvers, including polynomial-space extreme pruned enumeration and exponential-space sieving. For each
$n$, the experiments were repeated on 40 different SVP challenge instances. We ran our optimized DP enumeration with parameters given by Algorithm 8. The lattice dimension
$n$ ranged from 60 to 110, and the time cost predicted by the DP simulator is also provided. Extreme pruned enumeration was run using the fplll library [
13] with the default pruning function up to
$n=90$. The sieving data were produced with the G6K library [
36] with the dimension-for-free method, i.e., G6K’s
WorkOut$(s=0,{f}^{+}=1)$, from
$n=60$ to 110.
Figure 10 illustrates the prediction of the DP simulator, as well as the experimental results of our optimized DP enumeration, extreme pruning and G6K sieving.
The experiments confirm the accuracy of our cost model proposed in
Section 4. The prediction (orange dotted line) is quite consistent with the actual performance of DP enumeration (orange broken line). For
$n\lesssim 75$, the DP enumeration algorithm sometimes finds a solution before the first round ends; therefore, the actual running time is slightly smaller than the simulated time. However, for
$n>80$, our implementation of DP enumeration (with the optimal parameter set) coincides with the DP simulator very well.
Compared with extreme pruning,
Figure 10 shows that when
$n\gtrsim 67$, the optimized DP enumeration has a shorter runtime than the state-of-the-art extreme pruning enumeration. As for sieving, Albrecht et al. [
37] observed that the state-of-the-art sieving algorithm outperforms classical enumeration at dimension
$n\gtrsim 70$, which is also verified in our experiments. The experimental results reveal that the crossover point of DP enumeration and sieving is around
$n\approx 82$, an update of the crossover dimension between enumeration and sieving.
In addition to the experimental performance, we also compared the asymptotic behavior of extreme pruned enumeration, G6K sieving and our implementation.
A commonly accepted cost model of extreme pruned enumeration originates from the work of Chen and Nguyen in ASIACRYPT’11 [
29]. An explicit fitting function is given by LWE estimator
estimator.BKZ.CheNgu12 [
35]:
However, a more recent work [
38] (denoted by [ABF+20] in
Figure 11) suggests that the fitting formula should strictly follow the form
${2}^{\frac{1}{2\mathrm{e}}n\log \left(n\right)+an+b}$, and their fitting result for [
29] is
The time complexity of sieving is believed to be
${2}^{0.292n+o\left(n\right)}$ [
10], and G6K gives an even better result by fitting [
37]:
Here,
$\mathrm{CPUfreq}=2.3\times {10}^{9}$ according to their implementation; thus, the metric of
${T}_{\mathrm{sieve}}$ is unified to CPU cycles.
Since the cost model of (optimized) DP enumeration can accurately estimate the actual runtime in high dimensions, we used the DP simulator with the optimal parameter set to predict the runtime of discrete pruning on SVP challenge instances (from
$n=80$ to
$n=160$) and provide an asymptotic prediction. We required the fitting function to have the fixed form
${2}^{\frac{1}{2\mathrm{e}}n\log \left(n\right)+an+b}$ to be consistent with [
38]. The fitting result is
Figure 11 shows the asymptotic behavior of extreme pruned enumeration (
${T}_{extreme}$), sieving with the dimension-for-free technique (
${T}_{sieve}$), and the fitting function of the DP simulator (
${T}_{discrete}$). Both the experimental and the asymptotic comparisons indicate that discrete pruned enumeration might have more practical potential than (classical) extreme pruning for solving high-dimensional SVP, and it might become the most efficient polynomial-space SVP solver known to date.
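Because the leading $\frac{1}{2\mathrm{e}}n\log n$ term of the fitting form is fixed, the constrained fit reduces to ordinary least squares on the residual exponent. This is a sketch on synthetic data (the actual fit in the paper uses DP-simulator outputs for $n=80,\dots,160$; the coefficient values below are illustrative):

```python
import math

def fit_enum_exponent(ns, times):
    """Least-squares fit of log2 T(n) to the fixed form
    (1/(2e)) * n * log2(n) + a*n + b, as in the [ABF+20] convention:
    subtract the fixed leading term, then fit the residual r(n) = a*n + b
    by closed-form 1-D ordinary least squares."""
    lead = [n * math.log2(n) / (2 * math.e) for n in ns]
    resid = [math.log2(t) - l for t, l in zip(times, lead)]
    m = len(ns)
    sx, sy = sum(ns), sum(resid)
    sxx = sum(n * n for n in ns)
    sxy = sum(n * r for n, r in zip(ns, resid))
    a = (m * sxy - sx * sy) / (m * sxx - sx * sx)
    b = (sy - a * sx) / m
    return a, b
```

On synthetic runtimes generated exactly from the model, the fit recovers the planted coefficients, which makes it a convenient self-test before fitting real simulator data.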
7. Conclusions
In this paper, the discrete pruned enumeration algorithm for solving SVP was thoroughly studied and improved. We refined the mathematical theory underlying DP enumeration and proposed several improvements to make the algorithm more practical. The most valuable part is our discrete pruning simulator, which combines theoretical analysis with a number of numerical techniques. The experimental results verify that the DP simulator can precisely predict the performance of DP enumeration. For a given SVP instance, the DP simulator can be used to find optimal parameters that minimize the DP enumeration runtime, and it also yields explicit time and space consumption estimates. Based on simulation experiments, we believe that the time complexity of DP enumeration remains superexponential and its space complexity remains linear, so the general conclusions about enumeration algorithms are unchanged.
When comparing the performance of our implementation with extreme pruned enumeration, we show that DP enumeration, under optimal parameter settings, can outperform extreme pruning when
$n\gtrsim 67$. Comparing with the state-of-the-art exponential-space SVP algorithm, sieving with dimension for free [
36,
37], we report an updated crossover point of enumeration and sieving at
$n\approx 82$, which is slightly higher than previously observed. Then, at higher dimensions (
$80<n<300$), we compared the asymptotic behavior of DP enumeration, extreme pruned enumeration and sieving, which also shows the advantage of the discrete pruning method over other polynomial-space SVP solvers.
We provide the analytical cost formula of DP enumeration as an asymptotic estimation for cryptanalysis reference, and we hope that the opensource implementation of this work could help cryptologists to further develop the algorithm.
There are several possible directions for improvement: