# AES and EC Cryptographic Processor with Runtime Configuration

#### **Leonel Sousa**

#### Technical University of Lisbon/INESC-ID, Lisbon, Portugal

16th December, 2009





Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa

17th International Conference on Advanced Computing and Communication – ADCOM'09

technology

from seed

# **Motivation**



technology from seed

### Most electronic devices require security

- Applications and devices have to be secure
- Security/computation in the same device
- Both asymmetric and symmetric cryptography required
- Advanced Encryption Standard (AES) widely used in symmetric cryptography

Symmetric block cipher, standardized by the NIST (US)

- EC cryptosystem used in asymmetric cryptography
  - − More security per key bit than RSA (e.g. 163 EC  $\leftrightarrow$  1024 RSA)
  - Therefore, smaller keys to be computed, transmitted and stored



Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa

# **Motivation**

inesc id

- Exploiting TOGETHER reconfigurability ...
  - Design an architecture based on processing modules
  - These modules are independent and autonomous
  - Allow for runtime adaptation to the host requests

#### • ... and programmability

- Control the dedicated hardware with microcode (centralized memory)
- Support different algorithms (production of keys, signing)
- Fast and easy to design based on microcode

#### • Last but nost the least: "Good" random number generators (RNG)

- For private keys generation
- By integrating in a single device avoids communication of private data
- Compact random number generation with good statistical properties





• Motivation

## AES and Elliptic Curve (EC) overview

- Processor design
- Reconfiguration
- Results
- Conclusions



technology from seed

inesc id

- **AES** is performed over GF(2<sup>8</sup>) (8-bit words)
- Data organized in 4x4 matrices
- Simpler and faster than asymmetric cryptography





technology from seed

linesc id

#### • **EC** geometrical interpretation



EC discrete logarithm problem: it is hard to compute k knowing Q and P



Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa

technology from seed

• **EC** over a finite field GF(2<sup>m</sup>)



- Extension of a binary field
- Suitable to hardware implementations

inescid

- Additions are XOR
- Operations modulo an irreducible polynomial
- Polynomial Basis



technology from seed

inesc id

• **EC** operations hierarchy





Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa

technology from seed

inesc id

### **AES + EC:** challenges for putting together

- Different datapath width:
  - AES operates over 8 bit elements
  - EC operates over from 163-bit up to 571-bit elements
- Different security scalability:
  - + Secure AES = More rounds in the algorithm over the same data type
  - + Secure EC = Increase the underlying field size (larger elements)
- Different reduction methods
  - AES implementation can avoid reduction methods by table look-ups
  - EC involves direct computation of reduction methods





technology from seed

- Motivation
- AES and Elliptic Curve (EC) overview
- Processor design
- Reconfiguration
- Results
- Conclusions



inesc id

- Modular system: PU contains arithmetic units for basic field operations
- Centralized control: single shared instruction memory for all PUs
- Arbiter to solve memory access conflicts
  - PUs operate with multi-cycle instructions
- Architecture allows to easily detach or attach PUs
  - AES PUs and/or EC PUs
  - Adapt to runtime requirements
- Random Number Generator (RNG)
  - Avoiding external communication of private data





• Different processing units (PUs) sharing a single RAM with microcode





Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa

- Random Number Generator (RNG)
  - Use internal jitter noise as randomness source (+oscilators = +randomness)

inesc id

- Host can order to write random words to the PUs local memory without accessing them
- Avoiding external communication of private data increases security
- RNG can be switched off by asserting a reset signal





Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa

technology from seed



• EC processing unit



- Each field element is decomposed in several words stored in the RAM
- Take advantage of dual port RAMs
- Operations supported:
  - 2-word addition
  - 2-word multiplication (Karatsuba)
  - Field addition
  - Squaring
  - Reduction
  - Conditional jump (key register)
  - Customized field operation



technology from seed

linesc id

- AES processing unit
  - XOR operation
  - Shared look-up ROM
  - ROM contains the substitution function and constants
  - Jump instructions supported



- Operations Supported (L(.) is a look-up)
  - R(c)=R(a)+R(b)

- R(c)=L(R(b))

- R(c)=R(a)+ L(R(b))
- R(c)=L(R(a)+R(b))



# Reconfiguration



technology from seed

- Motivation
- AES and Elliptic Curve (EC) overview
- Processor design
- Reconfiguration
- Results
- Conclusions



### Reconfiguration

inesc id

- Communication logic redirect input data to ICAP while reconfiguring (ICAP – Internal Configuration Access Port)
- Support different runtime requirements trade AES by EC PUs
- Number of PUs limited by the maximum number of AES PUs
  - Number of cycles per instruction in the AES unit (3 cycles)
- Resources to support a PU (reconfiguration zone) limited by EC PUs
  - EC PUs require more resources
  - More complex datapath
- Signals crossing the reconfiguration zone boundaries directly routed
  - 20 bus macros of 8 bit each required





- Maximum 3 AES PUs per arbiter = 3 reconfigurable zones per arbiter
- Several static EC PUs (more PUs = more conflicts)
- Compromise of up to 6 operating PUs attached to the same arbiter







- Motivation
- AES and Elliptic Curve (EC) Overview
- Processor design
- Reconfiguration
- Results
- Conclusions





- Prototyped on a Xilinx Virtex 4 (XC4VSX35-10) device
- Operating frequency: 100 MHz
- Resources:
  - RNG 20 oscillators: 20 CLB (passed statistical tests)
  - Static logic: 8,446 slices, 11 BRAM
    - 6 EC PUs
    - 2 arbiter
    - 1 RNG
    - Communication logic
  - Reconfigurable PUs: 1 BRAM 157±2 slices (AES) 943±7 slices (EC)
    - Oscillation in the resources usage different reconfigurable zones





- 13 x 21 CLB reconfigurable zones (1092 slices)
  - 14% usage by AES unit
  - 86% usage by EC unit
  - Resource margin allowing to ease the routing task
- 26 reconfiguration frames per reconfigurable zone
  - Different reconfiguration frames for the different reconfiguration zones
  - Compressed bitstream used to runtime reconfiguration
  - Bitstream size:

SUPERIOR

- 29500 x 32-bit words AES
- 31067 x 32-bit words EC
- PU reconfiguration time (limited by communication): 941µs
  - 310 µs if maximum ICAP frequency used



#### No PUs

#### AES PUs only

#### EC PUs only



Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa

### Results

technology from seed

inesc id

- Prototype tested with
  - EC arithmetic over GF(2<sup>163</sup>)
  - AES 128-bit key
- Instructions' time
  - AES: 3 clock cycles
  - EC: from 3 to 14 clock cycles
- AES key expansion (610 cc), ciphering/deciphering (2,290 cc)
  - total 253 instructions
- EC point addition (4,796 cc) and point multiplication (201,661 cc)
  - total 401 instructions





• Conflicts solved by an arbiter

INSTITUTO SUPERIOR TÉCNICO



### Results

- Related art comparison
  - Our performance figures per arbiter (AES cipher. and EC point mult.):
    - Single AES PU throughput:
    - Maximum AES PU throughput:
    - Single EC PU throughput:
    - Maximum EC PU throughput:

5.59 Mbit/s 16.67 Mbit/s (x 2 ports = 33.34Mb/s) 496 OP/s 1536 Op/s (x 2 ports = 3072 Op/s)

inesc id

- Lim et. al. (Xilinx Virtex 41 MHz)
  - SIMD for EC and SISD for AES
  - Does not support simultaneous EC and AES computation
  - Our design is 1.4 and 2.65 faster for AES and EC (considering only one PU)
  - Our design is 2.1 times more compact



### Results

technology from seed

lisboa

- Related art comparison (cont.)
  - Wang et. al. (0.18µm ASIC -100 MHz)
    - Share multipliers and register for AES and EC
    - AES faster but does not support simultaneous AES+EC
    - Our design is 1.4 times faster for EC
  - Good et. al. (Xilinx Spartan 2 67 MHz)
    - Compact AES solution
    - Multiply-accumulate and byte substitution unit
    - 124 slices (vs. 157 slices in our AES PU)
    - Our throughput is 2.5 times higher (for 1 single PU)



## Conclusions



technology from seed

- Motivation
- AES and Elliptic Curve (EC) Overview
- Processor design
- Reconfiguration
- Results
- Conclusions





- EC + AES processor composed of several PUs sharing a common instruction memory
- Reconfigurability and programmability enhance the flexibility of the processor for different algorithms and protocols
- Architecture allows to easily reconfigure the PUs
  - Runtime reconfiguration allows to adapt the system to the runtime needs
- RNG allows to the internal generation of secrets, enhancing security
- PU's reconfiguration time smaller than EC point multiplication
  - Reduced overhead for the adaptability
- Processor provides competitive performance figures regarding the related art



### **Related publications**



- Samuel Freitas Antão, Ricardo Chaves and Leonel Sousa, "Compact and Flexible Microcoded Elliptic Curve processor for Reconfigurable Devices", In the 17th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), 2009
  [HiPEAC paper award]
- Ricardo Chaves, Georgi Kuzmanov, Stamatis Vassiliadis and Leonel Sousa, "Reconfigurable Memory based AES Co-processor", In IPDPS, 13th *Reconfigurable Architectures Workshop* (RAW), 2006.



Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa

technology

from seed



## **Questions?**

# Contact person: Leonel Sousa las@inesc-id.pt



Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa