Introduction: Twofish Encryption Algorithm on ZYBO

Greetings!

In this project I will show you how to create an encryption IP. The algorithm used is Twofish, a block cipher with a 128-bit block size and key sizes of 128, 192 or 256 bits. It was one of the finalists of the Advanced Encryption Standard contest, and no practical cryptanalytic attack on the full cipher is publicly known to date. More information about the cipher can be found here:

https://www.schneier.com/academic/twofish/

The aim of this project is to implement only the 128-bit version of the encryption algorithm. Following along requires some HDL and FPGA knowledge. The IP is described in Verilog 2001, with a small amount of SystemVerilog in certain sections.

Step 1: Tools Used

Simulation tool: Vivado Simulator

Synthesis and implementation tool: Vivado 2016.4

Software development kit: SDK 2016.4

Development board: ZYBO Zynq-7000 Development board

Storage: microSD card

Terminal: Tera Term

Step 2: Twofish Structure

Twofish consists of 16 rounds built on a structure similar to a Feistel network. This is a great advantage because encryption and decryption are quite similar in structure; the only major difference is the order in which the keys are used.
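To see why a Feistel structure makes decryption almost free, here is a minimal C sketch of a generic 16-round Feistel network on two 32-bit halves. It is only an illustration: toy_f is a made-up placeholder round function, not the Twofish F function, and the real cipher also adds 1-bit rotations and whitening that are left out here.

```c
#include <stdint.h>

#define ROUNDS 16

/* Placeholder round function (hypothetical). NOT the Twofish F function. */
static uint32_t toy_f(uint32_t half, uint32_t key)
{
    return (half * 2654435761u) ^ key;
}

/* Run the network. With decrypt != 0 the same code is used,
   only the round keys are taken in reverse order. */
static void feistel(uint32_t *left, uint32_t *right,
                    const uint32_t keys[ROUNDS], int decrypt)
{
    uint32_t l = *left;
    uint32_t r = *right;

    for (int i = 0; i < ROUNDS; i++) {
        uint32_t k = keys[decrypt ? ROUNDS - 1 - i : i];
        uint32_t next_r = l ^ toy_f(r, k);  /* XOR keeps the round reversible */
        l = r;
        r = next_r;
    }

    /* Undo the swap of the last round so that decryption mirrors encryption. */
    *left = r;
    *right = l;
}
```

Calling feistel with decrypt = 0 and then again with decrypt = 1 on the result gives back the original halves, even though toy_f itself is never inverted.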

In addition, the input and the output are each XORed with four of the eight keys K0...K7 (K0...K3 for the input, K4...K7 for the output). These steps are called input whitening and output whitening. XOR operations are used in most ciphers because they are reversible, which is what allows decryption to be implemented.
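Whitening is simple enough to sketch in a few lines of C. The function name whiten and the layout of the block as four 32-bit words are my own choices for this illustration, not names taken from the project sources.

```c
#include <stdint.h>

/* XOR the four 32-bit words of a 128-bit block with four subkeys, in place.
   For input whitening pass K0..K3, for output whitening pass K4..K7. */
static void whiten(uint32_t block[4], const uint32_t subkeys[4])
{
    for (int i = 0; i < 4; i++) {
        block[i] ^= subkeys[i];
    }
}
```

Because XOR is its own inverse, applying whiten a second time with the same subkeys restores the original block.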

Other elements of the algorithm include Maximum Distance Separable matrices (MDS), Pseudo-Hadamard Transform (PHT) and key dependent S-boxes.

Key-dependent S-boxes are an uncommon choice in cipher designs. They are used as a non-linear substitution operation. In Twofish with a 128-bit key, each S-box is built from three 8-by-8-bit fixed permutations, chosen from a set of two possible permutations named q0 and q1, with key material XORed in between them.

The MDS step multiplies the four bytes of a 32-bit input value by 8-bit constants, with all multiplications performed byte by byte in the Galois field GF(256). The polynomial used in this operation is x^8 + x^6 + x^5 + x^3 + 1.
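The following C sketch shows one way to model this multiplication in software. The polynomial is the one given above (0x169 in hex); the matrix constants are the MDS matrix as I know it from the Twofish paper, so please check them against the paper before relying on them. The function names are hypothetical and byte 0 is taken to be the least significant byte of the word.

```c
#include <stdint.h>

#define MDS_POLY 0x169u  /* x^8 + x^6 + x^5 + x^3 + 1 */

/* Multiply two bytes in GF(256), reducing modulo MDS_POLY (shift-and-add). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint16_t acc = 0;
    uint16_t aa = a;

    while (b) {
        if (b & 1u) {
            acc ^= aa;          /* addition in GF(256) is XOR */
        }
        aa <<= 1;               /* multiply aa by x */
        if (aa & 0x100u) {
            aa ^= MDS_POLY;     /* reduce when degree 8 is reached */
        }
        b >>= 1;
    }
    return (uint8_t)acc;
}

/* MDS matrix constants as listed in the Twofish paper (verify before use). */
static const uint8_t MDS[4][4] = {
    {0x01, 0xEF, 0x5B, 0x5B},
    {0x5B, 0xEF, 0xEF, 0x01},
    {0xEF, 0x5B, 0x01, 0xEF},
    {0xEF, 0x01, 0xEF, 0x5B},
};

/* Multiply the 4 bytes of `in` (treated as a column vector) by MDS. */
static uint32_t mds_multiply(uint32_t in)
{
    uint32_t out = 0;

    for (int row = 0; row < 4; row++) {
        uint8_t z = 0;
        for (int col = 0; col < 4; col++) {
            uint8_t y = (uint8_t)(in >> (8 * col));
            z ^= gf_mul(MDS[row][col], y);
        }
        out |= (uint32_t)z << (8 * row);
    }
    return out;
}
```

A model like this is handy for generating expected values to compare against the simulation waveforms.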

The PHT is a simple addition function on two 32-bit words, described by the equations:

a' = a + b mod 2^32
b' = a + 2*b mod 2^32
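In C the two equations map directly onto unsigned 32-bit arithmetic, which wraps around modulo 2^32 on its own. The function name pht is just a label for this sketch.

```c
#include <stdint.h>

/* Pseudo-Hadamard Transform on two 32-bit words, in place.
   uint32_t arithmetic wraps modulo 2^32, so no explicit reduction is needed. */
static void pht(uint32_t *a, uint32_t *b)
{
    uint32_t new_a = *a + *b;
    uint32_t new_b = *a + 2u * *b;

    *a = new_a;
    *b = new_b;
}
```

For example, a = 1 and b = 2 give a' = 3 and b' = 5.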


The cipher uses 40 keys, K0 to K39. As described earlier, 8 keys are used for the whitening steps. The other 32 keys are used in the 16 rounds, two per round. A big advantage of the cipher is that the round structure and the key-generator function differ only slightly, which allows us to use the same blocks for both the rounds and the key generation.

A much more detailed explanation of the structure:

https://www.schneier.com/academic/paperfiles/paper...

Step 3: Structure Modifications

In order to improve the speed of the encryption process, certain modifications have been made:

1) Most matrix multiplications have been replaced with look-up tables. This makes the operations as fast as possible. On the downside, the area occupied by the implementation increases considerably.

2) Instead of using a single function that has its outputs fed back to its inputs 16 times (the rounds), I created 16 functions. This creates a pipeline structure that makes it possible to feed in new plaintext inputs while earlier ones are still being encrypted, provided that the same key is used for all the data. While this considerably speeds up the encryption of large amounts of data, the area occupied increases roughly 16 times.

3) In order to reduce the number of look-up tables (each function has four LUTs of 256 elements and there are 16 functions), the structure of the function F has been modified. Instead of using two H functions that have the same internal structure, a single H function with a MUX on its input is used. This is possible because the only difference between the two is the input: one function has its input rotated 8 positions to the left. A MUX with a toggling select signal drives the unrotated input to the function H on the first clock cycle and the input rotated left by 8 bits on the next clock cycle. This halves the number of LUTs used.
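Modifications 1) and 3) can be modelled in software with the C sketch below. It is an illustration under my own assumptions, not the contents of the project's LUT files: the tables hold the products with the two non-trivial MDS constants (0x5B and 0xEF, as listed in the Twofish paper), and h_input_mux is a hypothetical name for the input selection in front of the shared H function.

```c
#include <stdint.h>

/* Multiply two bytes in GF(256) modulo x^8 + x^6 + x^5 + x^3 + 1 (0x169). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint16_t acc = 0;
    uint16_t aa = a;

    while (b) {
        if (b & 1u) acc ^= aa;
        aa <<= 1;
        if (aa & 0x100u) aa ^= 0x169u;
        b >>= 1;
    }
    return (uint8_t)acc;
}

/* Look-up tables: one read replaces a whole field multiplication. */
static uint8_t mul_5b[256];
static uint8_t mul_ef[256];

static void build_tables(void)
{
    for (int i = 0; i < 256; i++) {
        mul_5b[i] = gf_mul(0x5B, (uint8_t)i);
        mul_ef[i] = gf_mul(0xEF, (uint8_t)i);
    }
}

/* Rotate a 32-bit word left by 8 bits. */
static uint32_t rol8(uint32_t x)
{
    return (x << 8) | (x >> 24);
}

/* Input MUX of the shared H function:
   sel = 0 (first clock cycle)  -> first word, unrotated
   sel = 1 (second clock cycle) -> second word, rotated left by 8 bits */
static uint32_t h_input_mux(int sel, uint32_t r0, uint32_t r1)
{
    return sel ? rol8(r1) : r0;
}
```

In hardware the tables become ROMs and the rotation costs nothing (it is only wiring), so the price of sharing H is one MUX and one extra clock cycle per round.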

Step 4: Simulating the Design

Open Vivado and add the source files from the src folder and twofishTB.v from the tb folder.

-------------------------------------------------------------------------------------------------------------------------------------------------

Note:

Twofish.zip contains the original source files, which use $readmemh statements to initialize the LUTs. This statement might not be synthesizable on some technologies.

LUTmodification.zip does not use $readmemh, but you have to change the compilation settings to SystemVerilog.

-------------------------------------------------------------------------------------------------------------------------------------------------

In order to test the results, we have to compare the simulation waveforms with the outputs from the test vector files that are available from the official website. results.zip contains the official files:

ECB_IVAL.TXT -> encryption intermediate values

ECB_TBL.TXT -> MDS and permutation tests

ECB_VT.TXT -> variable text test

To change the input of the encryption block, edit it in the testbench file. Furthermore, we can check the results against an online encryption tool, such as:

http://twofish.online-domain-tools.com/

Step 5: Creating a Block Design

After simulating the functionality, we can create a new project where we will add the TwofishIP and connect it to the processing system. Select the ZYBO board for the project.

1) Start by adding the IP and the ZYNQ PS to the block design. The block automation on the PS will add DDR and multiple peripheral I/O pins; we can keep only UART1, GPIO and SD0 and disable the others. In the ZYNQ PS configuration, go to the Clock Configuration tab and, under PL Fabric Clocks, change the FCLK_CLK0 frequency to 45 MHz in order to meet the timing requirements.

2) Add a GPIO block and link the switches to it. Optionally, you can add a GPIO for the LEDs; one of them can be used to show whether the encryption module is busy.

3) Run the connection automation tool. It will automatically add a reset block and the AXI interface to the TwofishIP.

4) Validate the design by pressing F6.

5) Right-click on the block design in the Sources tab and create an HDL wrapper. This allows us to synthesize the design.

6) Run Implementation and generate the bitstream. We can also report design utilization and timing.

7) Export the hardware (File->Export...->Export Hardware-> check Include bitstream!) and launch SDK.

Step 6: Working in the SDK

SDK allows us to run a program on the processing system. We will use the PS to send data to the TwofishIP and read the encrypted text from it. The PS also uses the ff.h functions (such as f_open and f_write) to store the encrypted text on a microSD card. The information is stored in the same format used in the official Variable Text test vector file, which makes it easier to compare the results we get from our encryption module with the official expected results.

  • Create a new application project in SDK and add the main.c file.
  • Make sure that the addresses from xparameters.h in SDK and the Address Editor in Vivado match!

main.c creates a 128-bit data word with a 1 in the most significant bit and 0 in all the other bits. After this input is driven to the encryption module, the result is stored on the SD card and the word is shifted right by one bit. This process is repeated 127 times, which covers most of the cases in the ECB_VT.TXT file.
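The pattern generation can be sketched in plain C as shown below. This is not the code from main.c: the function names are hypothetical, the 128-bit word is held in a 16-byte array with byte 0 as the most significant byte, and the hex formatting assumes that the test vector file prints the plaintext as 32 uppercase hex digits.

```c
#include <stdint.h>
#include <string.h>

/* First test pattern: a single 1 in the most significant bit. */
static void vt_first(uint8_t block[16])
{
    memset(block, 0, 16);
    block[0] = 0x80;
}

/* Shift the whole 128-bit value right by one bit (byte 0 is the MSB side). */
static void vt_shift_right(uint8_t block[16])
{
    uint8_t carry = 0;

    for (int i = 0; i < 16; i++) {
        uint8_t next_carry = (uint8_t)(block[i] & 1u);
        block[i] = (uint8_t)((block[i] >> 1) | (carry << 7));
        carry = next_carry;
    }
}

/* Format the block as 32 uppercase hex digits plus a terminating NUL. */
static void vt_to_hex(const uint8_t block[16], char out[33])
{
    static const char digits[] = "0123456789ABCDEF";

    for (int i = 0; i < 16; i++) {
        out[2 * i]     = digits[block[i] >> 4];
        out[2 * i + 1] = digits[block[i] & 0x0Fu];
    }
    out[32] = '\0';
}
```

Calling vt_first once and vt_shift_right 127 times walks the single 1 bit from the leftmost to the rightmost position, with one encryption per pattern.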

  • Program the FPGA with the bitstream generated earlier. After it is done, open a terminal application (such as Tera Term), specify the COM port of the connected ZYBO and select a baud rate of 115200.
  • Run the application on hardware! In the terminal window we can see the inputs that are being tested, as well as some debug prints such as the enable switch being on/off (labeled as Busy).

Certain errors might occur while opening files or writing to the SD card. The errors are identified by the return value (a return value of 0 means that the operation has finished successfully). Don't forget to format the microSD card with a FAT32 filesystem! Please refer to this link if you have any problems with those operations:

http://elm-chan.org/fsw/ff/00index_e.html
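When debugging SD card problems it helps to print the name of the error instead of the bare number. The small helper below is a sketch that maps a return value to its name. The list follows the FRESULT codes documented on the FatFs site linked above; the ff.h shipped with your BSP may be an older or newer revision, so compare it with the enum in that header. The name fresult_name is my own.

```c
#include <stddef.h>

/* FatFs result codes by numeric value (assumed to match the ff.h in use). */
static const char *const FRESULT_NAMES[] = {
    "FR_OK",                  /*  0 */
    "FR_DISK_ERR",            /*  1 */
    "FR_INT_ERR",             /*  2 */
    "FR_NOT_READY",           /*  3 */
    "FR_NO_FILE",             /*  4 */
    "FR_NO_PATH",             /*  5 */
    "FR_INVALID_NAME",        /*  6 */
    "FR_DENIED",              /*  7 */
    "FR_EXIST",               /*  8 */
    "FR_INVALID_OBJECT",      /*  9 */
    "FR_WRITE_PROTECTED",     /* 10 */
    "FR_INVALID_DRIVE",       /* 11 */
    "FR_NOT_ENABLED",         /* 12 */
    "FR_NO_FILESYSTEM",       /* 13 */
    "FR_MKFS_ABORTED",        /* 14 */
    "FR_TIMEOUT",             /* 15 */
    "FR_LOCKED",              /* 16 */
    "FR_NOT_ENOUGH_CORE",     /* 17 */
    "FR_TOO_MANY_OPEN_FILES", /* 18 */
    "FR_INVALID_PARAMETER",   /* 19 */
};

/* Return the symbolic name for a FatFs return value, or "unknown". */
static const char *fresult_name(int code)
{
    size_t count = sizeof FRESULT_NAMES / sizeof FRESULT_NAMES[0];

    if (code < 0 || (size_t)code >= count) {
        return "unknown";
    }
    return FRESULT_NAMES[code];
}
```

For instance, a card that was not formatted as FAT32 typically shows up as code 13 (FR_NO_FILESYSTEM) when the volume is mounted or the first file is opened.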

The file created on the SD card is called "ENCRYPT.TXT" and it is saved in the root directory. By using a compare tool in Notepad++, we can monitor the differences between this file and the official variable text file.

Step 7: Further Improvements

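At the current stage, the encryption module works with a fixed global key of 128 zero bits. Support for arbitrary keys can be added by implementing the matrix multiplication in the finite field GF(256) with the polynomial x^8 + x^6 + x^3 + x^2 + 1, which the key schedule uses to derive the key material for the S-boxes.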
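As a starting point for that improvement, here is a C sketch of the multiplication. The polynomial is the one named above (0x14D in hex). The matrix constants are the RS matrix as I recall it from the Twofish paper, so treat them as an assumption and verify them against the paper; the function names and the little-endian packing of the result are also my own choices.

```c
#include <stdint.h>

#define RS_POLY 0x14Du  /* x^8 + x^6 + x^3 + x^2 + 1 */

/* Multiply two bytes in GF(256), reducing modulo RS_POLY. */
static uint8_t rs_gf_mul(uint8_t a, uint8_t b)
{
    uint16_t acc = 0;
    uint16_t aa = a;

    while (b) {
        if (b & 1u) acc ^= aa;
        aa <<= 1;
        if (aa & 0x100u) aa ^= RS_POLY;
        b >>= 1;
    }
    return (uint8_t)acc;
}

/* RS matrix constants as given in the Twofish paper (verify before use). */
static const uint8_t RS[4][8] = {
    {0x01, 0xA4, 0x55, 0x87, 0x5A, 0x58, 0xDB, 0x9E},
    {0xA4, 0x56, 0x82, 0xF3, 0x1E, 0xC6, 0x68, 0xE5},
    {0x02, 0xA1, 0xFC, 0xC1, 0x47, 0xAE, 0x3D, 0x19},
    {0xA4, 0x55, 0x87, 0x5A, 0x58, 0xDB, 0x9E, 0x03},
};

/* Multiply 8 key bytes by RS; the 4 result bytes are packed with
   byte 0 in the least significant position. */
static uint32_t rs_encode(const uint8_t key8[8])
{
    uint32_t out = 0;

    for (int row = 0; row < 4; row++) {
        uint8_t s = 0;
        for (int col = 0; col < 8; col++) {
            s ^= rs_gf_mul(RS[row][col], key8[col]);
        }
        out |= (uint32_t)s << (8 * row);
    }
    return out;
}
```

With an all-zero key every product is zero, so the result is zero as well, which is consistent with the fixed-key design being able to leave this step out.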

Following the addition of this multiplication, further tests can be run, such as the Variable Key test and the Monte Carlo test.

With small modifications to the structure, decryption can be achieved. It will require another input to the module (encrypt/~decrypt) and the keys to be given in reverse order into the function H. Most of the base implemented structure can be reused for this purpose.

The key size of the cipher can also be increased to 192 and 256 bits. This will require heavy modifications to the structure.

Further improvements on this guide itself!

Thank you!