Lookup Table-Based Design of Scalar Multiplication for Elliptic Curve Cryptography

Ning, Yan-Duan; Chen, Yan-Haw; Shih, Cheng-Sin; Chu, Shao-I

doi:10.3390/cryptography8010011

Open AccessArticle

Lookup Table-Based Design of Scalar Multiplication for Elliptic Curve Cryptography

¹

Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 807618, Taiwan

²

Department of Information Engineering, I-Shou University, Kaohsiung 840301, Taiwan

^*

Author to whom correspondence should be addressed.

Cryptography 2024, 8(1), 11; https://doi.org/10.3390/cryptography8010011

Submission received: 7 February 2024 / Revised: 13 March 2024 / Accepted: 14 March 2024 / Published: 18 March 2024

Download

Browse Figures

Versions Notes

Abstract

:

This paper is aimed at using a lookup table method to improve the scalar multiplication performance of elliptic curve cryptography. The lookup table must be divided into two polynomials and requires two iterations of point doubling operation, for which negation operations are needed. It is well known that an inversion operation requires a lot of multiplication time. The advantage of this paper is that we are able to reduce one inverse element calculation for this problem and also improve the basic operations of finite fields through segmentation methods. If the normal basis method is used in the design of the inverse element operation, it must be converted to the normal basis through the standard basis. However, the conversion process requires a lot of matrix operations. Though the anti-element operation has good speed performance, it also increases the computational complexity. Using number theory and grouping methods will greatly improve the performance of inverse element operations. With application of the two-time point doubling operation in the hardware implementation, the developed approach reduces the computing time by 48% as compared with the conventional approach. The computational time of the scalar multiplication using the presented method is further improved by 67% over the traditional algorithm with only an area increase of 12%. Finally, the proposed lookup table-based technique can be utilized for software and hardware implementation, as the developed arithmetic operations are simple and are consistent in their execution.

Keywords:

elliptic curve cryptography; scalar multiplication; finite field; point doubling

1. Introduction

Elliptic curve cryptography (ECC) was presented by N. Koblitz and V. Miller [1,2] in 1986. The ECC is an encryption technique based on the discrete logarithm problem. The public key cryptographic primitives can be implemented by using the elliptic curves over finite fields to generate finite abelian groups. The ECC provides similar security to existing public key cryptography but via application of a smaller key. The ECC with a 160-bit key has the equivalent security level of Rivest–Shamir–Adleman (RSA) and digital signature algorithms (DSA) that require a 1024-bit key. The ECC is defined over prime finite fields GF(p) or binary finite fields GF(2^m). The ECC design over prime fields generally provides more robust resistance to side-channel attacks as compared with those over binary fields. However, the ECC design over binary fields has the carry-free feature and makes arithmetic operations more suitable for hardware implementation.

The authors of [3] presented a method for constructing dynamic lookup tables in order to realize multiplication over a finite field. The Euclidian algorithm can be used for the inverse operation at the expense of computation time. Algorithms for the point multiplication over the ECC have been investigated in many studies. The method of non-adjacent forms (NAF) was presented by Morain et al. [4] to record the scalar in a point multiplication with an aim of reducing nonzero bits and thus the number of point additions. The authors of [5] used the representation of integers in binary form for simplification. Nowadays, most ECC techniques are performed over Koblitz curves [1,6]. Solinas [7] presented the Frobenius map method for data encryption. The projective and affine coordinates were combined in [8] to implement high-efficiency point addition and point doubling operations. There are no inverse operations involved in the exploitation of the coordinate transform techniques. Guo et al. developed [9] a scalar multiplication algorithm based on the step multi-base representation via point halving and the septuple formula to significantly reduce computational cost. The triple-based chain method was proposed in [10] to optimize the time usage in the elliptic curve cryptosystem. The authors of [11] presented a configurable ECC crypto-processor. The ECC operates over the prime field and is defined by the Weierstrass equation. This crypto-processor was verified on a Xilinx FPGA board.

The existing work on modular multipliers over binary finite fields GF(2^m) can be found in [12,13,14]. In [15], a speed-oriented architecture over the finite fields GF(2¹⁶³) and GF(2⁵⁷¹) were implemented by using a balanced quadratic multiplier with high operating frequency. Li et al. [16] exploited the Karatsuba–Ofman multiplier to a scalar multiplication over the finite fields over GF(2⁵⁷¹) and GF(2²⁸³). To minimize the number of point additions without increasing the number of point doubling, the Radix-2^w arithmetic for scalar multiplication was proposed in [17]. The authors of [18] developed a new Montgomery point multiplication algorithm for large field-size ECC over GF(2⁵⁷¹) and GF(2⁴⁰⁹) to optimize resource utilization efficiency. Recently, Zhang et al. [19] introduced a high-performance scalar multiplication architecture over binary fields. A low-latency window (LLW) algorithm was presented for hardware implementation to enhance security, as was an enhanced comb method for point addition and point doubling. The above methods in [15,16,17,18,19] transformed the process of scalar multiplication from an affine coordinate system to a projective coordinate system. These designs were implemented and verified on the FPGA boards.

Different from operations over the projective coordinate system, this paper focuses on the scalar multiplication over the affine system. The hardware architecture and circuit are synthesized for future ASIC implementation. In the affine coordinate system, point addition and point doubling require one modular inversion operation each time. Because the inversion over the finite field is the most time-consuming operation among all of the basic operations. Reducing the number of inverse operations is an important objective of our design. The main idea of this paper can be described as follows: The Fermat’s little theorem [20] is flexibly used for the inverse operation, which controls the critical path easily, as hardware implementation is required. The Horner’s rule is also innovatively exploited to improve the method in [5]. By the binary representation and grouping techniques for constructing lookup tables, the operations of point addition and point doubling are improved. A high-speed scalar multiplication is therefore achieved. The presented approach is applicable to the digital signature [21] for real-time mutual authentication. Finally, the proposed lookup table-based algorithms can be utilized for software and hardware implementations as the developed arithmetic operations are simple and consistent in their execution. From the perspective of the hardware design, the computational time of the scalar multiplication by the proposed method is reduced by 67% over the conventional algorithm. This is because the presented two-time point doubling is superior to the conventional method.

The rest of the paper is organized as follows: The finite field arithmetic is introduced in Section 2. Section 3 briefs the concepts of point addition and point doubling in ECC. The proposed algorithms for the finite field arithmetic is described in Section 4. Section 5 presents the new algorithm for scalar multiplication, which combines the methodology in Section 4. Section 6 concludes this paper.

2. Finite Field Arithmetic

The finite field is a set of finite elements of the field, also known as the Galois field (GF). Multiplication, addition, subtraction and division are defined and certain basic rules are satisfied.

The finite field GF(2) has two elements, with values of 0 and 1. The finite field GF(2) can be extended to GF(2^m) by a primitive polynomial. The finite field GF(2^m) has 2^melements with the values from 0 to 2^m − 1. The primitive polynomial is an irreducible polynomial denoted as

F (x)

. The national institute of standards and technology (NIST) recommended the primitive polynomials that are applicable to different GF(2^m) finite fields in the public document FIPS 186-4 [22] issued in 2013, as shown in Table 1.

2.1. Addition and Subtraction

The addition operation over the finite field is

δ = γ + β

, where

β, γ, δ \in GF (2^{m})

. The element has m-bit representation in a vector form. The addition is based on the bitwise exclusive OR (XOR) operation. Consider an example of GF(2³). Let

γ = (011)

and

β = (110)

. The result of

γ \oplus β

is

(101)

. The subtraction operation is the same as the addition operation.

2.2. Finite Field Multiplication

Multiplication over the finite field is defined as

δ = γ \cdot β

, where

β, γ, δ \in GF (2^{m})

. The element

γ

can be represented by the polynomial

A (x) = a_{m - 1} x^{m - 1} + a_{m - 2} x^{m - 2} + \dots + a_{2} x^{2} + a_{1} x + a_{0}

, where

a_{i} \in {0, 1}

. The element

β

is expressed by the polynomial

B (x) = b_{m - 1} x^{m - 1} + b_{m - 2} x^{m - 2} + \dots + b_{2} x^{2} + b_{1} x + b_{0}

, where

b_{i} \in {0, 1}

. The element

δ

is represented by

C (x) = c_{m - 1} x^{m - 1} + c_{m - 2} x^{m - 2} + \dots + c_{2} x^{2} + c_{1} x + c_{0}

, where

c_{i} \in {0,1}

. The multiplication result

C (x)

is the polynomial multiplication of

A (x)

and

B (x)

modulo

F (x)

, where

F (x)

is the primitive polynomial. That is,

C (x) \equiv A (x) B (x) (m o d F (x))

(1)

Take the field GF(2³) as an example. Let the elements

γ

and

β

be

(011)

and

(110)

, respectively. The primitive polynomial is

F (x) = x^{3} + x + 1

. The multiplication result of

γ

and

β

is obtained as

(001)

.

2.3. Finite Field Division

The division operation is

δ = γ / β

. It can be viewed as

δ = γ \cdot β^{- 1}

, where

β^{- 1}

is the inverse element of

β

. The inverse element of an element can be obtained quickly by Fermat’s little theorem. Let

A (x)

represent the element of a finite field, briefly denoted as

A

. Using the fact that

A^{2^{m} - 1} \equiv 1 (m o d F (x))

, we have

A^{2^{m} - 2} \equiv A^{- 1} (m o d F (x))

. Furthermore,

A^{2^{m} - 2} = A^{2^{1} + 2^{2} + \dots + 2^{m - 2} + 2^{m - 1}} = A^{2^{1}} \cdot A^{2^{2}} \cdot \dots \cdot A^{2^{m - 2}} \cdot A^{2^{m - 1}}

(2)

The inverse element can be calculated efficiently.

3. Arithmetic in ECC

In 1985, Koblitz and Miller proposed the elliptic curve for public key cryptography, elliptic curve cryptography is used in data encryption and decryption, key agreement, digital signature, etc. The definition of the elliptic curve

E

in the finite field

G F (2^{m})

is expressed as

E (G F (2^{m}))

where

m

is a positive integer, as shown in (3).

E (G F (2^{m})) : y^{2} + x y = x^{3} + a x^{2} + b,

(3)

where

a, b \in G F (2^{m})

and

b \neq 0

. The point

P

of the elliptic curve

E

is defined as

P = (x_{1}, y_{1})

, where

x_{1} \in G F (2^{m})

and

y_{1} \in G F (2^{m})

. The inverse of

P

is

- P = (x_{1}, x_{1} + y_{1})

.

Koblitz proposed the elliptic curve

E

in 1991 as

E (G F (2)) : y^{2} + x y = x^{3} + a x^{2} + 1 .

(4)

The cases of

a = 0

or

a = 1

refer to a Koblitz curve.

3.1. Point Addition

The point addition of P and Q is defined as

R = P ⊞ Q

, where

P \neq Q

. Let the points P and Q be

P = (x_{1}, y_{1})

, and

Q = (x_{2}, y_{2})

, respectively, where

x_{1} \neq x_{2}

. The result of point addition R is

R = (x_{3}, y_{3})

. The relationships among these are

λ = \frac{y_{1} + y_{2}}{x_{1} + x_{2}},

(5)

x_{3} = λ^{2} + λ + x_{1} + x_{2} + a,

(6)

y_{3} = λ (x_{1} + x_{3}) + x_{3} + y_{1},

(7)

where the symbol “+” in (5)–(7) is the addition over the finite field.

3.2. Point Doubling

The point doubling of P is defined as

R = 2 P

, where

P = (x_{1}, y_{1})

,

R = (x_{3}, y_{3})

and

x_{1} \neq 0

. The operations of point doubling are

λ = x_{1} + \frac{y_{1}}{x_{1}},

(8)

x_{3} = λ^{2} + λ + a,

(9)

y_{3} = x_{1}^{2} + (λ + 1) x_{3} .

(10)

3.3. The Points at Infinite ( $O$ Points)

The elliptic curve

E

contains a

O

point in the finite field

GF (2^{m})

. The

O

point must comply with the following rules:

O = O + O

(11)

P = P + O = O + P

(12)

O = P + (- P) = (- P) + P

(13)

3.4. Elliptic Curve Scalar Multiplication

The elliptic curve scalar multiplication is to add a point P on elliptic curve K times. That is,

K P = \underset{K}{\underset{⏟}{P + P + P + \dots + P + P}},

(14)

where K is a positive integer.

4. Proposed Lookup Table-Based Techniques for Arithmetic on $G F (2^{163})$

This paper focuses on the finite field

G F (2^{163})

.

4.1. Improved Multiplication Algorithm

Let

H (a_{i}, a_{j}) = a_{i} x + a_{j}

. By Horner’s rule,

A (x)

is expressed as

A (x) = (\begin{matrix} (\begin{matrix} \dots (\begin{matrix} (a_{162} x + a_{161}) x^{2} \\ + (a_{160} x + a_{159}) \end{matrix}) x^{2} \\ + \dots + (a_{4} x + a_{3}) \end{matrix}) x^{2} \\ + (a_{2} x + a_{1}) \end{matrix}) x + a_{0} .

(15)

Furthermore, the multiplication of the polynomials

A (x)

and

B (x)

is re-written as

A (x) B (x) = (\begin{matrix} (\begin{matrix} \dots (\begin{matrix} (a_{162} x B (x) + a_{161} B (x)) x^{2} \\ + (a_{160} x B (x) + a_{159} B (x)) \end{matrix}) x^{2} \\ + \dots + (a_{4} x B (x) + a_{3} B (x)) \end{matrix}) x^{2} \\ + (a_{2} x B (x) + a_{1} B (x)) \end{matrix}) x + a_{0} B (x) .

(16)

Define

d

as the number of input items of

H

.

As

d = 1

,

H (a_{i}) = a_{i}

. In the case of

d = 2

,

H (a_{i}, a_{j}) = a_{i} x + a_{j}

. According to (16), one needs only

⌊\frac{163}{d}⌋

executions to create an H table. The table of

H (a_{i}, a_{j})

for

d = 2

is shown in Table 2.

4.2. Lookup Table-Based Modulo Operation

Consider the multiplication of

A (x) \cdot x^{q}

. We have

A (x) \cdot x^{q} = (\begin{matrix} a_{162} x^{162 + q} + a_{161} x^{161 + q} + \dots \\ + a_{2} x^{2 + q} + a_{1} x^{1 + q} + a_{0} x^{q} \end{matrix}) m o d F (x),

(17)

where

F (x) = x^{163} + x^{7} + x^{6} + x^{3} + 1

. In (17), there are q terms (

a_{162} x^{162}, a_{161} x^{161}, \dots, a_{163 - q} x^{163 - q}

) to perform a modulo

F (x)

operation, because of the degree. As a result, we can establish a table with which to manage these

q

terms. The table is used to store the results of

(a_{162} x^{162} + a_{161} x^{161} + \dots + a_{163 - q} x^{163 - q}) x^{q} m o d F (x)

. Such a table is required when

q < 156

. The inputs of the table with q terms are

a_{162}, a_{161}, \dots, a_{163 - q}

. The output is

a_{162} x^{q - 1} \cdot f + a_{161} x^{q - 2} \cdot f + \dots + a_{163 - q} \cdot f

, where

f = x^{7} + x^{6} + x^{3} + 1

. For

q = 2

, the M table is constructed as Table 3.

The circuit of the finite field multiplication is depicted in Figure 1, where the H and M tables are included. It can also be observed that the H table dominates the critical path of the whole architecture. Note that two XOR elements and one multiplexer are required for constructing the H table as depicted in Figure 2. The M table includes one XOR logic and one multiplexer in Figure 3.

For the software implementation, there is more time to establish these tables and store the corresponding values. However, the building and searching time for the tables will increase as the tables become large. Additionally, in terms of hardware implementation, more logic elements are required.

4.3. Improved Squaring Operation

The squaring operation of

A (x)

is

C (x) \equiv a_{162} x^{324} + a_{161} x^{322} + \dots + a_{1} x^{2} + a_{0} m o d F (x) .

(18)

Consider modulo operations of each term as follows:

A_{162} (x) \equiv a_{162} x^{324} m o d F (x) A_{161} (x) \equiv a_{161} x^{322} m o d F (x) \dots A_{1} (x) \equiv a_{1} x^{2} m o d F (x) A_{0} (x) \equiv a_{0} m o d F (x) .

(19)

Let

A_{i} (x) = \sum_{j = 0}^{162} ρ_{i j} x^{j}

, where

0 \leq i \leq 162

. In light of (19), we have

C (x) = \sum_{i = 0}^{162} A_{i} (x) = \sum_{i = 0}^{162} c_{i} (x),

(20)

where

c_{i} (x) = θ_{i} x^{i}

.

4.4. Improved Inverse Operation

As shown in (12), we have

2^{163} - 2 = 2^{162} + 2^{161} + \dots + 2^{2} + 2^{1}

(21)

(21) can be re-written as

2^{163} - 2 = (\begin{matrix} (\begin{matrix} \dots (\begin{matrix} (\sum_{i = 1}^{w} 2^{i}) 2^{w} \\ + (\sum_{i = 1}^{w} 2^{i}) \end{matrix}) 2^{w} + \dots \\ + (\sum_{i = 1}^{w} 2^{i}) \end{matrix}) 2^{w} \\ + (\sum_{i = 1}^{w} 2^{i}) \end{matrix}) 2^{l} + s g n (l) (\sum_{i = 1}^{l} 2^{i}),

(22)

where

l \equiv 162 m o d w

and

s g n (l)

are sign function defined as follows:

s g n (l) = \{\begin{matrix} 0 & , l = 0 \\ 1 & , l > 0 \end{matrix}

(23)

Simplification of

\sum_{i = 1}^{w} 2^{i}

and

\sum_{i = 1}^{l} 2^{i}

in (22) is required.

As

w

is odd, we have

\sum_{i = 1}^{w} 2^{i} = 2^{w} + (2^{w - 1} + \dots + 2^{2} + 2^{1}) .

(24)

As

w

is even, it is derived that

\sum_{i = 1}^{w} 2^{i} = (2^{w / 2} + 2^{w / 2 - 1} + \dots + 2^{2} + 2^{1}) 2^{w / 2} + (2^{w / 2} + 2^{w / 2 - 1} + \dots + 2^{2} + 2^{1})

(25)

According to (24) and (25), (22) is represented as

\begin{array}{l} A^{2^{m} - 2} & = A^{2^{1} + 2^{2} + \dots + 2^{m - 2} + 2^{m - 1}} \\ = A^{(\begin{matrix} (\begin{matrix} \dots (\begin{matrix} (\sum_{i = 1}^{w} 2^{i}) 2^{w} \\ + (\sum_{i = 1}^{w} 2^{i}) \end{matrix}) 2^{w} + \dots \\ + (\sum_{i = 1}^{w} 2^{i}) \end{matrix}) 2^{w} \\ + (\sum_{i = 1}^{w} 2^{i}) \end{matrix}) 2^{l} + s g n (l) (\sum_{i = 1}^{l} 2^{i})} \end{array}

(26)

The original Fermat’s little theorem for the inverse element operation needs to go through 162 cycles of power operation and 161 cycles of multiplication operation. The main purpose of the above simplification process is to reduce the number of multiplication operations, but there is also a need to increase the number of different

2^{n}

power operations. We investigate the number of multiplications for the inverse operation on the field GF(2¹⁶³) for various

w

. The results are depicted in Figure 4 for

w = 61

to

w = 70

.

It can be seen in Figure 1 that grouping by

w = 65

requires the lowest number of multiplications. In the following, the inverse operation of

G F (2^{163})

and grouping by

w = 65

are simplified, where

l = 32

, as shown in (27).

2^{163} - 2 = ((\sum_{i = 1}^{65} 2^{i}) 2^{65} + (\sum_{i = 1}^{65} 2^{i})) 2^{32} + (\sum_{i = 1}^{32} 2^{i})

(27)

Simplify

\sum_{i = 1}^{w} 2^{i}

and

\sum_{i = 1}^{l} 2^{i}

in (27) and let

Z = 2^{64} + 2^{63} + \dots + 2^{2} + 2^{1}

. We have

\sum_{i = 1}^{65} 2^{i} = 2^{65} + (2^{64} + 2^{63} + \dots + 2^{2} + 2^{1}) = 2^{65} + Z .

(28)

Using the idea, it is derived as follows:

Z = (2^{32} + 2^{31} + \dots + 2^{2} + 2^{1}) 2^{32} + (2^{32} + 2^{31} + \dots + 2^{2} + 2^{1}) = (Y) 2^{32} + Y,

(29)

where

Y = 2^{32} + 2^{31} + \dots + 2^{2} + 2^{1}

.

Y = (2^{16} + 2^{15} + \dots + 2^{2} + 2^{1}) 2^{16} + (2^{16} + 2^{15} + \dots + 2^{2} + 2^{1}) = (X) 2^{16} + X,

(30)

where

X = 2^{16} + 2^{15} + \dots + 2^{2} + 2^{1}

.

X = (2^{8} + 2^{7} + \dots + 2^{2} + 2^{1}) 2^{8} + (2^{8} + 2^{7} + \dots + 2^{2} + 2^{1}) = (W) 2^{8} + W,

(31)

where

W = 2^{8} + 2^{7} + \dots + 2^{2} + 2^{1}

.

W = (2^{4} + 2^{3} + 2^{2} + 2^{1}) 2^{4} + (2^{4} + 2^{3} + 2^{2} + 2^{1}) = (V) 2^{4} + V,

(32)

where

V = 2^{4} + 2^{3} + 2^{2} + 2^{1}

.

V = (2^{2} + 2^{1}) 2^{2} + (2^{2} + 2^{1}) = (U) 2^{2} + U,

(33)

where

U = 2^{2} + 2^{1}

.

Among these,

Y

is the same as the residue

\sum_{i = 1}^{32} 2^{i}

, so there is no need to simplify the residue further, though it does need to store the value of

Y

after the operation. In this case, where

w = 65

, the inverse operation over the field GF(2¹⁶³) requires 9 multiplication operations, 1 squaring operation and

2^{2}, 2^{4}, 2^{8}, 2^{16}, 2^{32}, 2^{65}

power operations.

The circuit for inverse operation over GF(2¹⁶³) is shown in Figure 5. The GFM is the Galois field multiplication unit and the input Ci in the multiplexer means the ith cycle.

5. Proposed Elliptic Curve Scalar Multiplication Operation

By building a table of elliptic curve points and using a table lookup method and an inverse operation that reduces the number of iterations of point doubling by two, the speed of scalar multiplication of elliptic curves is increased.

5.1. Conventional Multiple Time Point Doubling Algorithm

Consider Algorithm 1, called Point_Doubling_Repeating

(P, r)

, where

P

is the point of the elliptic curve and

r

is a positive integer. If

r = 2

, the point of

P

will be subjected to a point doubling operation twice. The coordinate of

P

is

(x_{1}, y_{1})

and the result of point doubling is

(x_{3}, y_{3})

.

Algorithm 1 Point Doubling Repeating

1: Function Point_Doubling_Repeating

(P, r)

2: for

i

from

0

to

r - 1

do
3:

λ = x_{1} + \frac{y_{1}}{x_{1}}

4:

x_{3} = λ^{2} + λ + a

5:

y_{3} = x_{1}^{2} + (λ + 1) x_{3}

6:

P \leftarrow P (x_{3}, y_{3})

7:

x_{1} = x_{3}

8:

y_{1} = y_{3}

9: end for
10: end function

The computational complexity of Algorithm 1 is listed in Table 4 with

r = 1

and

r = 2

. It is observed that point doubling two times (

r = 2

) requires the inverse operation two times. In fact,

k

inverse operations are needed as

r = k

.

5.2. Proposed Two-Time Point Doubling Algorithm

Algorithm 2, Point_Doubling_Modify

(P, S e l)

, is presented, where the input of

P

point is

P (x_{1}, y_{1})

. As

S e l = 0

, the point doubling is performed once and the output is

P (x_{3}, y_{3})

. When

S e l = 1

, the improved algorithm for point doubling twice is executed and the output is

P (x_{5}, y_{5})

.

Algorithm 2 Point Doubling Modify

1: Function Point_Doubling_Modify

(P, S e l)

2: if

S e l = 0

then
3:

λ = x_{1} + \frac{y_{1}}{x_{1}}

4:

x_{3} = λ^{2} + λ + a

5:

y_{3} = x_{1}^{2} + (λ + 1) x_{3}

6:

P \leftarrow P (x_{3}, y_{3})

7: else if

S e l = 1

then
8:

t_{0} = x_{1}^{2}

9:

u_{0} = t_{0} + y_{1}

10:

t_{1} = a \cdot t_{0}

11:

t_{2} = (u_{0} + x_{1}) \cdot u_{0}

12:

u_{1} = t_{1} + t_{2}

13:

t_{3} = u_{1}^{2}

14:

t_{4} = u_{1} \cdot t_{0}

15:

t_{5} = t_{4}^{- 1}

16:

x_{3} = t_{3} \cdot t_{5}

17:

t_{6} = (a + 1) \cdot t_{4}

18:

t_{7} = u_{0}^{2}

19:

t_{8} = t_{7} \cdot u_{1}

20:

t_{9} = t_{0}^{2}

21:

t_{10} = t_{0} \cdot t_{9}

22:

u_{2} = t_{6} + t_{8} + t_{10}

23:

λ = u_{2} \cdot t_{5}

24:

t_{11} = λ^{2}

25:

x_{5} = t_{11} + λ + a

26:

t_{12} = x_{3}^{2}

27:

t_{13} = (λ + 1) \cdot x_{5}

28:

y_{5} = t_{12} + t_{13}

29:

P \leftarrow P (x_{5}, y_{5})

30: end if
31: return

P

32: end function

The number of finite field operations required for improved two-time point doubling is shown in Table 5. It can be found that, after simplification, only one inverse operation is required.

Comparison with the conventional two-time point doubling is revealed in Table 6. Although the proposed two-time point doubling algorithm reduces the number of inverse operations by 1, it increases the number of multiplication operations by 5 and the number of squaring operations by 2.

The proposed architecture of two-time point doubling is presented in Figure 6. The yellow block is used to perform one-time point doubling and the red block is for computing two-time point doubling.

The inverse operation utilizes the framework in Figure 5.

5.3. Proposed Elliptic Curve Scalar Multiplication Algorithm

The elliptic curve scalar multiplication operation is defined in (14). The positive integer K is expressed as follows in the binary form.

K = k_{m - 1} 2^{m - 1} + k_{m - 2} 2^{m - 2} + \dots + k_{2} 2^{2} + k_{1} 2^{1} + k_{0} 2^{0},

(34)

where

k_{i} \in \{0, 1\}

.

If

m

is odd, K is re-written as

K = (\begin{matrix} \dots (\begin{matrix} (k_{m - 1} 2 + k_{m - 2}) 2^{2} \\ + (k_{m - 3} 2 + k_{m - 4}) \end{matrix}) 2^{2} + \dots \\ + (k_{2} 2 + k_{1}) \end{matrix}) 2 + k_{0}

(35)

by Horner’s rule. Moreover,

K P = (\begin{matrix} \dots (\begin{matrix} (k_{m - 1} 2 P + k_{m - 2} P) 2^{2} \\ + (k_{m - 3} 2 P + k_{m - 4} P) \end{matrix}) 2^{2} + \dots \\ + (k_{2} 2 P + k_{1} P) \end{matrix}) 2 + k_{0} P .

(36)

It is observed in (36) that the last term is

k_{0} P

and that the common term is expressed as

k_{i} 2 P + k_{j} P

. As a result, the L table is motivated and constructed. The L table is listed as Table 7, where the input is

(k_{i}, k_{j})

. As when one builds the L table, the operations for point addition and doubling once need to be performed first. There are an additional four 163-bit registers required for the table.

The term of

(k_{i} 2 P + k_{j} P)

can be viewed as a point of the elliptic curve. This term requires point doubling once, as

k_{i} = 1

. Furthermore,

(k_{i} 2 P + k_{j} P) 2^{2}

is the operation of two-time point doubling. According to (36), a new scalar multiplication technique is proposed in Algorithm 3.

Algorithm 3 New Scalar Multiplication 1

1: Function New_ScalarM

(K, P)

2:

Q = (0,0)

3: To make a lookup table

L (k_{i}, k_{j})

as shown in Table 7
4: for

i

from

⌊\frac{m}{2}⌋ - 1

to

0

do
5:

Q \leftarrow

Point_Doubling_Repeating

(Q, 2)

6:

Q \leftarrow

Point_Addition

(Q, L (k_{2 i + 2}, k_{2 i + 1}))

7: end for
8:

Q \leftarrow

Point_Doubling_Repeating

(Q, 1)

9:

Q \leftarrow

Point_Addition

(Q, L (0, k_{0}))

10: return

Q

11: end function

We replace Point_Doubling_Repeating

(P, r)

with Point_Doubling_Modify

(P, S e l)

in Algorithm 3. Algorithm 4 is therefore obtained.

Algorithm 4 New Scalar Multiplication 2

1: Function New_ScalarM

(K, P)

2:

Q = (0,0)

3: To make a lookup table

L (k_{i}, k_{j})

as shown in Table 7
4: for

i

from

⌊\frac{m}{2}⌋ - 1

to 0 do
5:

Q \leftarrow

Point_Doubling_Modify

(Q, 1)

6:

Q \leftarrow

Point_Addition

(Q, L (k_{2 i + 2}, k_{2 i + 1}))

7: end for
8:

Q \leftarrow

Point_Doubling_Modify

(Q, 0)

9:

Q \leftarrow

Point_Addition

(Q, L (0, k_{0}))

10: return

Q

11: end function

The numbers of arithmetic operations required in Algorithms 3 and 4 are demonstrated in Table 8. The proposed framework of the scalar multiplication by Algorithm 4 is depicted in Figure 7, where Algorithm 2 is applied for point doubling. Note that using the improved two-time point doubling reduces the number of inverse operations by 81. However, the numbers of multiplication and squaring operations will increase by 405 and 162, respectively.

5.4. Performance Evaluation

The results of multiplication over the finite field GF(2¹⁶³) are obtained after computing 163 coefficients. Utilization of the H table by Horner’s rule only needs computation of

⌊\frac{163}{d}⌋

coefficients. The M table is exploited in order to speed up the modulo operation by the XOR operation. The inverse operation using Fermat’s little theorem is obtained by the simplifications of

\sum_{i = 1}^{w} 2^{i}

and

\sum_{i = 1}^{l} 2^{i}

. The results of

A^{\sum_{i = 1}^{w} 2^{i}}

and

A^{\sum_{i = 1}^{l} 2^{i}}

are stored in advance. Thus, this leads to improvements in performance of the inverse operation.

The most time-consuming operation over the finite field is the inverse operation. Exploiting Horner’s rule to compute the inverse requires nine multiplication and one squaring operation. The conventional two-time point doubling needs two inverse operations. However, the proposed method requires only one inverse operation at the expense of five multiplication and two squaring operations. Finally, the proposed algorithm reduces four multiplications and increase two squaring operations. Elliptic curve point multiplication KP is improved by using the idea that the integer K is expressed in the binary form with Horner’s rule-based grouping technique. The L table is established to further enhance the efficiency of the scalar multiplication.

The efficiency evaluations for the proposed hardware design are listed in Table 9, Table 10, Table 11 and Table 12. All circuits are synthesized with Taiwan Semiconductor Manufacturing Company (TSMC) 40 nm standard cell library by Synopsys Design Compiler

Table 9 indicates the synthesized results of the proposed multiplication method, with various d values. It is observed that the shortest critical path happens as d = 2. For the proposed squaring circuits, Table 10 shows that the critical path becomes short with the increase of n. However, the improvements are not significant from the cases of 2¹⁶ to 2⁶⁵. Table 11 reveals that the proposed two-time point doubling has a 48% shorter computing time when compared with the conventional one, and with low area–time complexity. Take the architectures of scalar multiplication into consideration in Table 12. The delay of Algorithm 4 is improved by 67% over Algorithm 3, with an area increase of only 12%.

6. Conclusions

This paper proposed a methodology and hardware architecture for the scalar multiplication in the affine coordinate system over ECC. Our main idea is to exploit Horner’s rule to efficiently construct the lookup tables for arithmetic operations over the finite field. The time-consuming part in such a system is the inverse operation. The new algorithm for two-time point doubling was therefore developed by reducing the number of inverse operations. Results of the hardware implementations show that the computing time of two-time point doubling is reduced by 48% over the conventional one, with an area increase of only 18%. For the scalar multiplication, including the point addition and doubling operations, the presented hardware architecture has a reduction of delay by 68% as compared with the traditional algorithm. The total power and circuit area increase by only 13% and 12%, respectively. Such results reveal that the proposed lookup table-based design indeed achieves a high-speed performance by reducing the computational complexity of point doubling. However, the current reported hardware designs are still involved in plenty of registers. The size of the lookup tables may be reduced by some merging techniques. These issues will be investigated and addressed for compact design in the future.

Author Contributions

Conceptualization, Y.-D.N. and Y.-H.C.; methodology, Y.-H.C.; software, C.-S.S.; validation, Y.-H.C. and C.-S.S.; formal analysis, Y.-D.N.; investigation, Y.-H.C. and C.-S.S.; writing—original draft preparation, Y.-D.N. and S.-I.C.; writing—review and editing, Y.-D.N. and S.-I.C.; visualization, C.-S.S.; supervision, Y.-H.C. and S.-I.C.; project administration, Y.-H.C.; funding acquisition, Y.-H.C. All authors have read and agreed to the published version of the manuscript.

Funding

The funding sources are the Ministry of Science and Technology, Taiwan (under Grant no. MOST 111-2221-E-214-014).

Data Availability Statement

The data is available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Koblitz, N. Elliptic curve crypto systems. Math. Comput. 1987, 48, 203–309. [Google Scholar] [CrossRef]
Miller, V. Uses of elliptic curves in cryptography. In Advances in Cryptography-CRYPTO; Springer: Berlin/Heidelberg, Germany, 1986; pp. 417–426. [Google Scholar]
Chen, Y.H.; Huang, C.H. Efficient Operations in Large Finite Fields for Elliptic Curve Cryptographic. Int. J. Eng. Technol. Manag. Res. 2020, 7, 141–151. [Google Scholar] [CrossRef]
Morain, F.; Olivos, J. Speeding Up the Computations on An Elliptic Curve Using Addition-Subtraction Chains. RAIRO Theor. Inform. Appl. 1990, 24, 531–543. [Google Scholar] [CrossRef]
Solinas, J.A. Low-Weight Binary Representation for Pairs of Integers; Combinatorics and Optimization Research Report CORR 2001-41; Technical Report; Centre for Applied Cryptographic Research, University of Waterloo: Waterloo, ON, Canada, 2001. [Google Scholar]
Koblitz, N. CM-Curves with Good Cryptographic Properties. In Advances in Cryptology-CRYPTO ‘91; Feigenbaum, J., Ed.; Springer: Berlin/Heidelberg, Germany, 1992; pp. 279–287. [Google Scholar]
Solinas, J.A. Efficient Arithmetic on Koblitz Curves. In Towards a Quarter-Century of Public Key Cryptography: A Special Issue of Designs, Codes and Cryptography an International Journal; Koblitz, N., Ed.; Springer: New York, NY, USA, 2000; Volume 19, pp. 125–179. [Google Scholar]
Lange, T. A Note On López-Dahab Coordinates; IACR Cryptology ePrint Archive, Paper 2004/323; 2004. [Google Scholar]
Guo, C.; Gong, B. Efficient Scalar Multiplication of ECC Using SMBR and Fast Septuple Formula for IoT. EURASIP J. Wirel. Commun. Netw. 2021, 2021, 82. [Google Scholar] [CrossRef]
Cho, S.M.; Gwak, S.G.; Kim, C.H.; Hong, S. Faster Elliptic Curve Arithmetic for Triple-base Chain by Reordering Sequences of Field Operations. Multimed. Tools Appl. 2016, 75, 14819–14831. [Google Scholar] [CrossRef]
Di Matteo, S.; Baldanzi, L.; Crocetti, L.; Nannipieri, P.; Fanucci, L.; Saponara, S. Secure Elliptic Curve Crypto-Processor for Real-Time IoT Applications. Energies 2021, 14, 4676. [Google Scholar] [CrossRef]
Imran, M.; Rashid, M. Architectural review of polynomial bases finite field multipliers over GF(2 m). In Proceedings of the International Conference on Communication, Computing and Digital Systems (C-CODE), Islamabad, Pakistan, 8–9 March 2017; pp. 331–336. [Google Scholar]
Khan, S.; Javeed, K.; Shah, Y.A. High-speed FPGA implementation of full-word Montgomery multiplier for ECC applications. Microprocess. Microsyst. 2018, 62, 91–101. [Google Scholar] [CrossRef]
SPillutla, R.; Boppana, L. A high-throughput fully digit-serial polynomial basis finite field GF(2 m) multiplier for IoT applications. In Proceedings of the TENCON 2019—2019 IEEE Region 10 Conference (TENCON), Kochi, India, 17–20 October 2019; pp. 920–924. [Google Scholar]
Li, J.; Zhong, S.; Li, Z.; Cao, S.; Zhang, J.; Wang, W. Speed-Oriented Architecture for Binary Field Point Multiplication on Elliptic Curves. IEEE Access 2019, 7, 32048–32060. [Google Scholar] [CrossRef]
Li, J.; Wang, W.; Zhang, J.; Luo, Y.; Ren, S. Innovative Dual-Binary-Field Architecture for Point Multiplication of Elliptic Curve Cryptography. IEEE Access 2021, 9, 12405–12419. [Google Scholar] [CrossRef]
Oudjida, A.K.; Liacha, A. Radix-2w Arithmetic for Scalar Multiplication in Elliptic Curve Cryptography. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 1979–1989. [Google Scholar] [CrossRef]
Lee, C.-Y.; Zeghid, M.; Sghaier, A.; Ahmed, H.Y.; Xie, J. Efficient Hardware Implementation of Large Field-Size Elliptic Curve Cryptographic Processor. IEEE Access 2022, 10, 7926–7936. [Google Scholar] [CrossRef]
Zhang, J.; Chen, Z.; Ma, M.; Jiang, R.; Li, H.; Wang, W. High-Performance ECC Scalar Multiplication Architecture Based on Comb Method and Low-Latency Window Recoding Algorithm. IEEE Trans. Very Large Scale Integr. Syst. 2024, 32, 382–395. [Google Scholar] [CrossRef]
Wang, C.C.; Truong, T.K.; Shao, H.M.; Deutsch, L.J. VLSI Architectures for Computing Multiplications and Inverses in GF(2 m). IEEE Trans. Comput. 1985, 100, 709–717. [Google Scholar] [CrossRef] [PubMed]
National Institute of Standards and Technology. Digital Signature Standard (DSS); National Institute of Standards and Technology: Gaithersburg, MD, USA, 2023.
National Institute of Standards and Technology. Digital Signature Standard (DSS); National Institute of Standards and Technology: Gaithersburg, MD, USA, 2013. [CrossRef]

Figure 1. Multiplication circuit over GF(2¹⁶³) when d = 2.

Figure 2. H table.

Figure 3. M table.

Figure 4. Number of multiplications for different

w

value groupings.

Figure 4. Number of multiplications for different

w

value groupings.

Figure 5. Proposed inversion circuit over GF(2¹⁶³) with

w = 65

.

Figure 5. Proposed inversion circuit over GF(2¹⁶³) with

w = 65

.

Figure 6. Proposed circuit for two-time point doubling.

Figure 7. Proposed circuit for scalar multiplication by Algorithm 4.

Table 1. NIST-recommended primitive polynomial.

Finite Field	Primitive Polynomial
$G F (2^{163})$	$F (x) = x^{163} + x^{7} + x^{6} + x^{3} + 1$
$G F (2^{233})$	$F (x) = x^{233} + x^{74} + 1$
$G F (2^{283})$	$F (x) = x^{283} + x^{12} + x^{7} + x^{5} + 1$
$G F (2^{409})$	$F (x) = x^{409} + x^{87} + 1$
$G F (2^{571})$	$F (x) = x^{571} + x^{10} + x^{5} + x^{2} + 1$

Table 2. H Table.

Input		Output
$\{a_{i}, a_{j}\}$	$H (a_{i}, a_{j}) = a_{i} x + a_{j}$	$H (x) \cdot B (x) m o d F (x)$
$\{0, 0\}$	0	0
$\{0, 1\}$	1	$B (x)$
$\{1, 0\}$	$x$	$x \cdot B (x) m o d F (x)$
$\{1, 1\}$	$x + 1$	$x \cdot B (x) m o d F (x) + B (x)$

Table 3. M Table.

Input	Output
$M (a_{162}, a_{161})$	Output
$M (0, 0)$	0
$M (0, 1)$	$f$
$M (1, 0)$	$x \cdot f$
$M (1, 1)$	$x \cdot f + f$

Table 4. Number of finite field operations required for each of the one-time and two-time point doubling operations.

r	$r = 1$	$r = 2$
Addition	5	10
Multiplication	2	4
Square	2	4
Inverse	1	2

Table 5. Number of finite field operations for improved two-time point doubling.

	Number of Operations
Addition	10
Multiplication	9
Square	6
Inverse	1

Table 6. Comparison of two-time point doubling.

	Conventional	Proposed
Addition	10	10
Multiplication	4	9
Square	4	6
Inverse	2	1

Table 7. L Table.

$L (k_{i}, k_{j})$	$k_{i} 2 P + k_{j} P$
$L (0, 0)$	0
$L (0, 1)$	$P$
$L (1, 0)$	$2 P$
$L (1, 1)$	$2 P + P$

Table 8. Number of arithmetic operations in Algorithms 3 and 4.

Operation	Number
Number of Point Addition	83
Number of Point Doubling	2
Two-time Point Doubling	81

Table 9. Multiplication designs with various d.

d	Critical Path (ns)	Frequency (MHz)	Area (μm²)	Total Power (mW)	Cycle
1	1.44	694.44	4761.44	3.39	164
2	0.86	1162.79	7705.08	4.53	84
3	0.95	1052.63	8750.62	5.03	57
4	0.95	1052.63	10482.24	6.00	43

Table 10. Squaring circuits of 2ⁿ^.

2ⁿ	Critical Path (ns)	Area (μm²)	Total Power (mW)	Cycle
2¹	0.18	355.6	0.38	1
2²	0.4	709.2	1.09	1
2⁴	0.82	1980.6	4.00	1
2⁸	1.11	5736.1	14.06	1
2¹⁶	1.22	6569.3	16.44	1
2³²	1.22	6652.3	16.63	1
2⁶⁴	1.25	6624.6	16.65	1

Table 11. Architecture comparison for two-time point doubling.

Two-Time Doubling	Critical Path (ns)	Area (μm²)	Total Power (mW)	Cycle	Delay (ns)
Conventional	2.64	123,197.08	62.15	1795	4738.8
Proposed	2.65	147,547.68	75.11	924	2448.6

Table 12. Architecture comparison for scalar multiplication.

	Critical Path (ns)	Area (μm²)	Total Power (mW)	Cycle	Delay (ns)
Algorithm 3	2.76	393,205.19	194.23	464,476	1,281,953.76
Algorithm 4	2.78	441,985.79	219.44	150,925	419,571.5

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ning, Y.-D.; Chen, Y.-H.; Shih, C.-S.; Chu, S.-I. Lookup Table-Based Design of Scalar Multiplication for Elliptic Curve Cryptography. Cryptography 2024, 8, 11. https://doi.org/10.3390/cryptography8010011

AMA Style

Ning Y-D, Chen Y-H, Shih C-S, Chu S-I. Lookup Table-Based Design of Scalar Multiplication for Elliptic Curve Cryptography. Cryptography. 2024; 8(1):11. https://doi.org/10.3390/cryptography8010011

Chicago/Turabian Style

Ning, Yan-Duan, Yan-Haw Chen, Cheng-Sin Shih, and Shao-I Chu. 2024. "Lookup Table-Based Design of Scalar Multiplication for Elliptic Curve Cryptography" Cryptography 8, no. 1: 11. https://doi.org/10.3390/cryptography8010011

Article Menu

Lookup Table-Based Design of Scalar Multiplication for Elliptic Curve Cryptography

Abstract

1. Introduction

2. Finite Field Arithmetic

2.1. Addition and Subtraction

2.2. Finite Field Multiplication

2.3. Finite Field Division

3. Arithmetic in ECC

3.1. Point Addition

3.2. Point Doubling

3.3. The Points at Infinite ( $O$ Points)

3.4. Elliptic Curve Scalar Multiplication

4. Proposed Lookup Table-Based Techniques for Arithmetic on $G F (2^{163})$

4.1. Improved Multiplication Algorithm

4.2. Lookup Table-Based Modulo Operation

4.3. Improved Squaring Operation

4.4. Improved Inverse Operation

5. Proposed Elliptic Curve Scalar Multiplication Operation

5.1. Conventional Multiple Time Point Doubling Algorithm

5.2. Proposed Two-Time Point Doubling Algorithm

5.3. Proposed Elliptic Curve Scalar Multiplication Algorithm

5.4. Performance Evaluation

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

Lookup Table-Based Design of Scalar Multiplication for Elliptic Curve Cryptography

Abstract

1. Introduction

2. Finite Field Arithmetic

2.1. Addition and Subtraction

2.2. Finite Field Multiplication

2.3. Finite Field Division

3. Arithmetic in ECC

3.1. Point Addition

3.2. Point Doubling

3.3. The Points at Infinite ( O Points)

3.4. Elliptic Curve Scalar Multiplication

4. Proposed Lookup Table-Based Techniques for Arithmetic on G F 2 163

4.1. Improved Multiplication Algorithm

4.2. Lookup Table-Based Modulo Operation

4.3. Improved Squaring Operation

4.4. Improved Inverse Operation

5. Proposed Elliptic Curve Scalar Multiplication Operation

5.1. Conventional Multiple Time Point Doubling Algorithm

5.2. Proposed Two-Time Point Doubling Algorithm

5.3. Proposed Elliptic Curve Scalar Multiplication Algorithm

5.4. Performance Evaluation

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

3.3. The Points at Infinite ( $O$ Points)

4. Proposed Lookup Table-Based Techniques for Arithmetic on $G F (2^{163})$