# Symmetric Cryptography on RISC-V: Performance Evaluation of Standardized Algorithms

## Abstract

## 1. Introduction

#### 1.1. Previous and Related Work

#### 1.2. Objectives

## 2. Materials and Methods

## 3. Cryptographic Algorithms

## 4. Software Implementations

#### 4.1. Software Implementation of AES

#### 4.2. Software Implementation of SEED

#### 4.3. Software Implementation of CAMELLIA

#### 4.4. Software Implementation of CAST

#### 4.5. Software Implementation of SHA-256 and SHA-512

#### 4.6. Software Implementation of TDEA

#### 4.7. Software Implementation of MISTY1

**Algorithm 1.** MISTY1 key scheduling.

Data: 128-bit main key K

Result: Array of 16-bit round keys EK

    for i = 0 to 7 {
        EK[i] = K[i×2]×256 XOR K[i×2 + 1];
    }
    for i = 0 to 7 {
        EK[i + 8] = FI(EK[i], EK[(i + 1)%8]);
        EK[i + 16] = EK[i + 8] & 0x1ff;
        EK[i + 24] = EK[i + 8] >> 9;
    }
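
The key scheduling in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the implementation evaluated in the paper: the real FI function depends on the MISTY1 S7 and S9 tables, which are not reproduced in this section, so the sketch takes FI as a parameter. The names `misty1_key_schedule` and `placeholder_fi` are hypothetical, and `placeholder_fi` is only a stand-in that makes the example runnable.

```python
from typing import Callable, List, Sequence


def misty1_key_schedule(key: Sequence[int],
                        fi: Callable[[int, int], int]) -> List[int]:
    """Expand a 16-byte key into 32 round-key words, following Algorithm 1.

    `fi` must implement the MISTY1 FI function (16-bit input, 16-bit key).
    """
    if len(key) != 16:
        raise ValueError("MISTY1 uses a 128-bit (16-byte) main key")
    ek = [0] * 32
    # First loop: pack pairs of key bytes into eight 16-bit words.
    for i in range(8):
        ek[i] = ((key[2 * i] & 0xFF) << 8) ^ (key[2 * i + 1] & 0xFF)
    # Second loop: derive EK[8..15] with FI, then split each result into
    # its low 9 bits (EK[16..23]) and its high 7 bits (EK[24..31]).
    for i in range(8):
        ek[i + 8] = fi(ek[i], ek[(i + 1) % 8]) & 0xFFFF
        ek[i + 16] = ek[i + 8] & 0x1FF
        ek[i + 24] = ek[i + 8] >> 9
    return ek


def placeholder_fi(fi_in: int, fi_key: int) -> int:
    """ASSUMPTION: stand-in only, NOT the real MISTY1 FI function."""
    return (fi_in ^ fi_key) & 0xFFFF


if __name__ == "__main__":
    round_keys = misty1_key_schedule(list(range(16)), placeholder_fi)
    print([hex(word) for word in round_keys[:8]])
```

Passing FI as an argument keeps the data flow of the key schedule visible without committing to a particular S-box representation.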

#### 4.8. Software Implementation of HIGHT

**Algorithm 2.** HIGHT key scheduling.

Data: s0 = 0, s1 = 1, s2 = 0, s3 = 1, s4 = 1, s5 = 0, s6 = 1; d0 = s6 || s5 || s4 || s3 || s2 || s1 || s0

Result: Subkey array SK

    for i = 1 to 127 {
        s(i + 6) = s(i + 2) ⊕ s(i − 1)
        di = s(i + 6) || s(i + 5) || s(i + 4) || s(i + 3) || s(i + 2) || s(i + 1) || si
    }
    for i = 0 to 7 {
        for j = 0 to 7 {
            SK(16×i + j) = K(j − i mod 8) [+] d(16×i + j)
        }
        for j = 0 to 7 {
            SK(16×i + j + 8) = K((j − i mod 8) + 8) [+] d(16×i + j + 8)
        }
    }

Here || denotes bit concatenation, ⊕ denotes XOR, and [+] denotes addition modulo $2^{8}$.
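
As a cross-check of Algorithm 2, the constant generation and subkey derivation can be sketched in Python. This is an illustrative sketch rather than the code measured in the paper. The function names are hypothetical, and it assumes that `key[k]` holds the key byte written K(k) in the algorithm.

```python
from typing import List, Sequence


def hight_constants() -> List[int]:
    """Generate the 7-bit constants d0..d127 with the LFSR of Algorithm 2."""
    s = [0, 1, 0, 1, 1, 0, 1]  # s0..s6
    constants = []
    for i in range(128):
        if i > 0:
            # s(i+6) = s(i+2) XOR s(i-1)
            s.append(s[i + 2] ^ s[i - 1])
        d = 0
        # di = s(i+6) || s(i+5) || ... || s(i), most significant bit first.
        for offset in range(6, -1, -1):
            d = (d << 1) | s[i + offset]
        constants.append(d)
    return constants


def hight_subkeys(key: Sequence[int]) -> List[int]:
    """Derive the 128 subkey bytes SK0..SK127 from a 16-byte key."""
    if len(key) != 16:
        raise ValueError("HIGHT uses a 128-bit (16-byte) key")
    d = hight_constants()
    sk = [0] * 128
    for i in range(8):
        for j in range(8):
            # Lower key half, rotated by i positions, plus constant mod 2^8.
            sk[16 * i + j] = (key[(j - i) % 8] + d[16 * i + j]) & 0xFF
        for j in range(8):
            # Upper key half, rotated the same way.
            sk[16 * i + j + 8] = (key[((j - i) % 8) + 8]
                                  + d[16 * i + j + 8]) & 0xFF
    return sk


if __name__ == "__main__":
    print([hex(c) for c in hight_constants()[:4]])
```

Because each subkey byte is a single modular addition of a key byte and a precomputed constant, the constants can be stored in a table or regenerated on the fly, depending on whether memory or cycles matter more.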

#### 4.9. Software Implementation of PRESENT

## 5. Hardware Implementations

- RISC-V Core
- Bit Re-positioning Instructions
- Carry-Less Multiply Instructions
- Crossbar Permutation Instructions
- Logic With Negate Instructions
- Packing Instructions
- Hash Instructions
- AES and SM4 Instructions

#### 5.1. Hardware Architecture of Bit Re-Positioning Instructions

#### 5.2. Hardware Architecture of Carry-Less Multiply Instructions

## 6. Hardware Architecture of 32-bit Algorithm Specific Cryptography Instructions

#### 6.1. Hardware Architecture of Hash Instructions

#### 6.2. Hardware Implementation of AES and SM4 Instructions

- Multiplication in GF(${2}^{4}$):

- Square in GF(${2}^{4}$):

- Addition in GF(${2}^{4}$):

- Inverse in GF(${2}^{4}$):

## 7. Results

#### 7.1. Clock Cycle Count

#### 7.2. Program Memory

#### 7.3. Static Memory

#### 7.4. Analysis for Cryptography Instructions

#### 7.5. Proposed New Instruction for SBOX Address Calculation

#### 7.6. Conclusion

- Compared with implementations that use only the base rv32i instruction set, implementations that use the cryptography instruction set extension run 1.5× to 8.6× faster and need 1.2× to 5.8× less program memory for five of the eleven algorithms. For the remaining six algorithms, the gain in execution speed and the reduction in program memory are both below 6%.
- The hardware crypto implementations add 0.3% to 7.7% hardware complexity over the software implementations using the rv32i ISA.
- The benefit-cost analysis in Figure 31 plots, for each algorithm, the acceleration of execution time against the relative hardware cost. For example, the SHA algorithms achieve an acceleration of approximately 1.7× at a hardware cost increase of less than 7.5%.
- Based on our analysis of execution times, we proposed a new instruction that accelerates the memory address calculation for the 8-bit-input SBOX table, which dominates the execution time of four of the eleven algorithms. This new instruction yields 1.2× to 1.6× faster execution for these four algorithms at only 1.1% additional hardware cost, as shown in Figure 35.
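
The relative hardware costs quoted above can be reproduced from the gate-equivalent areas reported in the hardware module tables. The sketch below assumes that the relative cost of an implementation is the summed area of the instruction modules it uses divided by the area of the RISC-V core. That definition is an assumption made for illustration, and the helper name `relative_cost_percent` is hypothetical.

```python
# Areas in gate equivalents (GE), copied from the hardware module tables.
CORE_AREA_GE = 19706.0
MODULE_AREA_GE = {
    "Bit Re-positioning": 766.0,
    "Carry-Less Multiply": 2248.5,
    "Crossbar Permutation": 756.5,
    "Logic With Negate": 177.0,
    "Packing": 52.0,
    "Hash": 2030.5,
    "AES and SM4": 1437.0,
}
ADDRESS_CALC_AREA_GE = 220.0  # proposed address calculation instruction


def relative_cost_percent(modules):
    """Area of the listed modules as a percentage of the core area.

    ASSUMPTION: cost is defined as module area / core area.
    """
    extra_area = sum(MODULE_AREA_GE[name] for name in modules)
    return 100.0 * extra_area / CORE_AREA_GE


if __name__ == "__main__":
    # Smallest case: algorithms that only use the packing instructions.
    print(round(relative_cost_percent(["Packing"]), 1))
    # Largest case: crossbar permutation plus bit re-positioning.
    print(round(relative_cost_percent(
        ["Crossbar Permutation", "Bit Re-positioning"]), 1))
    # Proposed address calculation instruction on its own.
    print(round(100.0 * ADDRESS_CALC_AREA_GE / CORE_AREA_GE, 1))
```

Under this assumption, the packing-only case and the crossbar-plus-bit-re-positioning case give the two ends of the 0.3% to 7.7% range, and the proposed instruction comes out at about 1.1%.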

## Abbreviations

Abbreviation | Definition |
---|---|
AES | Advanced Encryption Standard |
SHA | Secure Hash Algorithm |
FPGA | Field Programmable Gate Array |
ISE | Instruction Set Extension |
ISA | Instruction Set Architecture |
HDL | Hardware Description Language |
RTL | Register Transfer Level |
GF | Galois Field |
RISC | Reduced Instruction Set Computer |
PI | Proposed Instruction |
GE | Gate Equivalent |
NC | Not Calculated |

**Figure 35.** Acceleration vs. hardware cost of crypto implementations with new address calculation instruction.

Cipher | Block Size (Bits) | Key Size (Bits) | Comment | Reference |
---|---|---|---|---|
AES | 128 | 128, 192, 256 | ISO/IEC 18033-3:2010, FIPS 197 | [8,32] |
SEED | 128 | 128 | ISO/IEC 18033-3:2010 | [29,32] |
CAMELLIA | 128 | 128, 192, 256 | ISO/IEC 18033-3:2010 | [28,32] |
MISTY1 | 64 | 128 | ISO/IEC 18033-3:2010 | [24,32] |
CAST-128 | 64 | 40 to 128 | ISO/IEC 18033-3:2010 | [25,32] |
HIGHT | 64 | 128 | ISO/IEC 18033-3:2010 | [26,32] |
TDEA | 64 | 112, 168 | ISO/IEC 18033-3:2010 | [23,32] |
PRESENT | 64 | 80, 128 | ISO/IEC 29192-2:2019 | [27,33] |

Function | Output Size (Bits) | State Size (Bits) | Rounds | Comment | Reference |
---|---|---|---|---|---|
SHA-256 | 256 | 256 (8 × 32) | 64 | FIPS 180-3 | [9] |
SHA-512 | 512 | 512 (8 × 64) | 80 | FIPS 180-3 | [9] |
SHA3-256 | 256 | 1600 (5 × 5 × 64) | 24 | FIPS 202 | [30] |

Element | Inverse |
---|---|
$y^{0}$ = 1 | 1 |
$y^{1}$ = $y$ | $y^{3}+1$ |
$y^{2}$ = $y^{2}$ | $y^{3}+y^{2}+1$ |
$y^{3}$ = $y^{3}$ | $y^{3}+y^{2}+y+1$ |
$y^{4}$ = $y+1$ | $y^{3}+y^{2}+y$ |
$y^{5}$ = $y^{2}+y$ | $y^{2}+y+1$ |
$y^{6}$ = $y^{3}+y^{2}$ | $y^{3}+y$ |
$y^{7}$ = $y^{3}+y+1$ | $y^{2}+1$ |
$y^{8}$ = $y^{2}+1$ | $y^{3}+y+1$ |
$y^{9}$ = $y^{3}+y$ | $y^{3}+y^{2}$ |
$y^{10}$ = $y^{2}+y+1$ | $y^{2}+y$ |
$y^{11}$ = $y^{3}+y^{2}+y$ | $y+1$ |
$y^{12}$ = $y^{3}+y^{2}+y+1$ | $y^{3}$ |
$y^{13}$ = $y^{3}+y^{2}+1$ | $y^{2}$ |
$y^{14}$ = $y^{3}+1$ | $y$ |
$y^{15}$ = 1 | 1 |
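
The element table above implies $y^{4} = y + 1$, that is, arithmetic in GF($2^{4}$) modulo $y^{4} + y + 1$. Under that reading, the four field operations (addition, multiplication, square, inverse) can be sketched in Python, with each element stored as a 4-bit integer whose bit k is the coefficient of $y^{k}$. The function names are hypothetical, and this software model says nothing about how the hardware datapath is actually structured.

```python
MODULUS = 0b10011  # y^4 + y + 1, inferred from y^4 = y + 1 in the table


def gf16_add(a: int, b: int) -> int:
    """Addition in GF(2^4) is a bitwise XOR of the coefficients."""
    return (a ^ b) & 0xF


def gf16_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication with reduction modulo y^4 + y + 1."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:  # degree reached 4: subtract (XOR) the modulus
            a ^= MODULUS
    return result & 0xF


def gf16_square(a: int) -> int:
    """Squaring expressed as a multiplication of an element by itself."""
    return gf16_mul(a, a)


def gf16_inverse(a: int) -> int:
    """Inverse via a^14, valid because a^15 = 1 for every nonzero a."""
    if a & 0xF == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^4)")
    result = 1
    for _ in range(14):
        result = gf16_mul(result, a)
    return result


if __name__ == "__main__":
    y = 0b0010
    power = 1
    for k in range(16):
        print(k, format(power, "04b"), format(gf16_inverse(power), "04b"))
        power = gf16_mul(power, y)
```

Running the loop at the bottom prints each power of $y$ with its inverse, which can be compared row by row with the table.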

Algorithm | TDEA | MISTY1 | CAST-128 | HIGHT | PRESENT |
---|---|---|---|---|---|
rv32i | 25,041 | 1013 | 2237 | 4528 | 14,102 |
rv32i+crypto | NC | 977 | 2139 | 4400 | 1641 |
Acceleration | NC | 1.037 | 1.046 | 1.029 | 8.607 |

Algorithm | AES | CAMELLIA V1 | CAMELLIA V2 | SEED V1 | SEED V2 |
---|---|---|---|---|---|
rv32i | 1606 | 1861 | 2258 | 2133 | 4533 |
rv32i+crypto | 438 | 1768 | NC | NC | 2854 |
Acceleration | 3.685 | 1.053 | NC | NC | 1.589 |

Algorithm | SHA-256 | SHA-512 | SHA3-256 |
---|---|---|---|
rv32i | 4755 | 13,975 | 25,976 |
rv32i+crypto | 2708 | 8471 | NC |
Acceleration | 1.756 | 1.650 | NC |

Operation | V1 (rv32i) | V2 (rv32i) | V2 (rv32i+crypto) |
---|---|---|---|
SBOX Address Calculation | 800 | 800 | 640 |

Operation | V1 (rv32i) | V2 (rv32i) | V1 (rv32i+crypto) |
---|---|---|---|
128-bit Rotate | 132 | 132 | 132 |
32-bit Rotate | 12 | 12 | 4 |
8-bit Rotate | 0 | 396 | 0 |
SBOX Address Calculation | 440 | 440 | 352 |

Operation | rv32i | rv32i+crypto |
---|---|---|
SBOX calculation | 1152 | 1152 |
32-bit Rotate | 64 | 16 |

Operation | rv32i | rv32i+crypto |
---|---|---|
8-bit Rotation | 1408 | 1280 |

Operation | rv32i |
---|---|
Initial Permutation | 152 |
Inverse Initial Permutation | 254 |
SBOX Table Read | 5424 |
E Permutation | 2016 |
P Permutation | 6144 |
Permuted Choice 1 | 399 |
Permuted Choice 2 | 9216 |

Operation | rv32i | rv32i+crypto |
---|---|---|
pLayer | 7936 | 558 |
sBoxLayer | 5766 | 248 |
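
To make the two PRESENT operations in the last table concrete, the sketch below gives a straightforward bit-level reference for sBoxLayer and pLayer as defined in the PRESENT specification. It is only a functional model with hypothetical function names, not the rv32i or rv32i+crypto code that was measured. It does show, however, that a direct software pLayer moves 64 individual bits per call, which is the kind of work that permutation instructions can collapse.

```python
# 4-bit S-box from the PRESENT specification.
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
MASK64 = (1 << 64) - 1


def sbox_layer(state: int) -> int:
    """Apply the 4-bit S-box to each of the 16 nibbles of the state."""
    out = 0
    for i in range(16):
        nibble = (state >> (4 * i)) & 0xF
        out |= PRESENT_SBOX[nibble] << (4 * i)
    return out


def p_layer(state: int) -> int:
    """Move bit i to position 16*i mod 63 (bit 63 stays in place)."""
    out = 0
    for i in range(64):
        bit = (state >> i) & 1
        position = 63 if i == 63 else (16 * i) % 63
        out |= bit << position
    return out & MASK64


if __name__ == "__main__":
    print(hex(sbox_layer(0)))
    print(hex(p_layer(0b10)))
```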

Algorithm | TDEA | MISTY1 | CAST-128 | HIGHT | PRESENT |
---|---|---|---|---|---|
rv32i | 6680 | 3256 | 3760 | 3028 | 1552 |
rv32i+crypto | NC | 3132 | 3704 | 2996 | 352 |
Reduction | NC | 1.040 | 1.015 | 1.011 | 4.409 |

Algorithm | AES | CAMELLIA V1 | CAMELLIA V2 | SEED V1 | SEED V2 |
---|---|---|---|---|---|
rv32i | 2536 | 7448 | 9032 | 1048 | 2248 |
rv32i+crypto | 436 | 7076 | NC | NC | 1416 |
Reduction | 5.817 | 1.053 | NC | NC | 1.588 |

Algorithm | SHA-256 | SHA-512 | SHA3-256 |
---|---|---|---|
rv32i | 632 | 1392 | 3996 |
rv32i+crypto | 488 | 1088 | NC |
Reduction | 1.295 | 1.279 | NC |

Algorithm | TDEA | MISTY1 | CAST-128 | HIGHT | PRESENT |
---|---|---|---|---|---|
Memory | 256 | 642 | 8192 | 10 | 8 |

Algorithm | AES | CAMELLIA V1 | CAMELLIA V2 | SEED V1 | SEED V2 |
---|---|---|---|---|---|
Memory | 1288 | 1072 | 304 | 4176 | 576 |

Algorithm | SHA-256 | SHA-512 | SHA3-256 |
---|---|---|---|
Memory | 288 | 704 | 188 |

Hardware Module | Area (GE) |
---|---|
RISC-V Core | 19,706 |
Bit Re-positioning Instructions | 766 |
Carry-Less Multiply Instructions | 2248.5 |
Crossbar Permutation Instructions | 756.5 |
Logic With Negate Instructions | 177 |
Packing Instructions | 52 |
Hash Instructions | 2030.5 |
AES and SM4 Instructions | 1437 |

**Table 20.** Cryptography instruction and instruction module extension usage of crypto implementations.

Cryptographic Algorithm | Instruction Usage | Instruction Module Extension |
---|---|---|
AES | aes32esmi, aes32esi | AES and SM4 |
SEED V2 | xperm4, rori | Crossbar Permutation, Bit Re-positioning |
CAMELLIA V1 | xperm4, rol | Crossbar Permutation, Bit Re-positioning |
MISTY1 | pack | Packing |
CAST-128 | pack | Packing |
HIGHT | grev | Bit Re-positioning |
PRESENT | xperm4, unshfli, rori | Crossbar Permutation, Bit Re-positioning |
SHA-256 | SHA-256 Instructions | Hash |
SHA-512 | SHA-512 Instructions | Hash |
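
For readers unfamiliar with the bit-manipulation instructions named in Table 20, the following Python sketch models the 32-bit behaviour of the rotates (`rori`, `rol`), `pack`, `xperm4`, and `grev`, as we understand them from the RISC-V bit-manipulation and scalar cryptography specifications. The Python function names are hypothetical, `unshfli` is omitted for brevity, and the models only illustrate the semantics. They are not a description of the hardware architecture.

```python
MASK32 = 0xFFFFFFFF


def ror32(x: int, shamt: int) -> int:
    """Rotate right by shamt bits (models ror/rori on RV32)."""
    shamt &= 31
    x &= MASK32
    return ((x >> shamt) | (x << (32 - shamt))) & MASK32


def rol32(x: int, shamt: int) -> int:
    """Rotate left by shamt bits (models rol on RV32)."""
    return ror32(x, (32 - (shamt & 31)) & 31)


def pack32(rs1: int, rs2: int) -> int:
    """Low halfword of rs1 in the low half, low halfword of rs2 on top."""
    return ((rs2 & 0xFFFF) << 16) | (rs1 & 0xFFFF)


def xperm4_32(rs1: int, rs2: int) -> int:
    """Nibble-wise lookup: each nibble of rs2 selects a nibble of rs1.

    Indices outside 0..7 produce a zero nibble on RV32.
    """
    result = 0
    for i in range(8):
        index = (rs2 >> (4 * i)) & 0xF
        if index < 8:
            result |= ((rs1 >> (4 * index)) & 0xF) << (4 * i)
    return result


def grev32(x: int, shamt: int) -> int:
    """Generalized reverse: each set bit of shamt swaps adjacent blocks."""
    x &= MASK32
    stages = [(1, 0x55555555), (2, 0x33333333), (4, 0x0F0F0F0F),
              (8, 0x00FF00FF), (16, 0x0000FFFF)]
    for width, low_mask in stages:
        if shamt & width:
            x = ((x & low_mask) << width) | ((x >> width) & low_mask)
    return x & MASK32


if __name__ == "__main__":
    print(hex(grev32(0x12345678, 24)))  # byte reversal of the word
    print(hex(pack32(0x1234ABCD, 0x5678EF01)))
```

In this model, `xperm4_32` with a register of eight 4-bit table entries in `rs1` behaves like eight parallel lookups into a small table, which is presumably why it appears for the algorithms with 4-bit or decomposed S-boxes in Table 20.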

Hardware Module | Area (GE) |
---|---|
Address Calculation Instruction | 220 |

