Introduction: Break a Substitution Cipher

About: I'm currently in college studying math and computer science with a minor in music. I enjoy participating in a wide range of activities from math competitions to whitewater rafting.

This Instructable explains some code I wrote that will help you break a mono-alphabetic substitution cipher. At least one other Instructable covers some of the same ground, but it focuses on the concepts rather than the code (https://www.instructables.com/id/How-to-Solve-Simpl...). It's a great Instructable that I would recommend checking out before reading this one.

For this Instructable, I will be using Sublime Text 3 (http://www.sublimetext.com/3), a great all-around editor for coding. I will also be using a terminal called Cygwin (https://www.cygwin.com/) to compile and run all my code. I'm not going to go over how to set up Cygwin here since that is outside the scope of this Instructable. If you are having trouble with that, let me know and I can help individually or maybe make an Instructable about it. If you have other ways to code, feel free to use them. All of my code in this Instructable will be in C++ (disclaimer: I have only been using C++ for a short time, so there may be better or more efficient ways to do things. If so, leave a comment below!).

Alright, now to the programming!

Step 1: The Basics

For those who may not know, a cipher is a way to encrypt words or letters into a message that is unreadable without the key. The key is typically a rearrangement of the alphabet, but it can use numbers or symbols as well. If a single substitution alphabet is used for the whole message, the cipher is called mono-alphabetic; if multiple alphabets are used, it is called poly-alphabetic. In theory, a randomly rearranged alphabet works well, but because it is very difficult to memorize a truly random key, the systems people actually use to encrypt are generally much simpler. This works in favor of a cryptanalyst (code-breaker) because simple systems leave patterns.

The main focus of this Instructable will be to go through how to use letter frequency to break a simple (mono-alphabetic) cipher and also how to implement a program that can generate the letter frequency for us.

Step 2: Letter Frequency

In the English language, the most common letters are ETAOINHR... , but the order can change slightly depending on your source of frequencies. A mono-alphabetic cipher always replaces a given letter with the same substitute, so these frequencies carry over into the ciphertext. This is useful because the longer the ciphertext you are trying to break, the more closely its letter frequencies will match the English averages. So while long messages may look harder, they actually give you a lot more information.

While frequency order alone will rarely match up every letter in your message, it is a good place to start since it is very simple and straightforward. Once you have the frequencies from the message, you can start looking at single-letter words (such as A and I) and other clues to get more information.

Step 3: Getting Started Coding

For this part, I will do my best to keep it exciting, but I am assuming you have a background in some programming language. If you don't, I will hopefully explain myself well enough. Otherwise, feel free to ask any questions and I will do my best to answer them.

To begin, our C++ program needs a few things: a header, a namespace, a main function, and a return statement. (Strictly speaking, the language only requires the main function; the rest are what this program uses.) See the picture for an example. A quick explanation of what each means:

#include <iostream> allows us to get user input and output text to the terminal, as well as a few other things

using namespace std; makes coding simpler; this is not needed, but without it you will be typing std:: before most standard library names (such as cout and string) to be able to use them

int main(){ } is the main function where the program will start when we run it and will hold all the rest of our code

return 0; ends the main function and tells the operating system that the program finished successfully

Step 4: The Basics of the Program

To begin this program, we need to declare a couple of variables as well as get input from the user. See the code for further explanation of what each part does. This step is getting us ready to actually analyze the user input for letter frequencies.

Step 5: Analyzing the Text

So now we finally are going to analyze the text! To do this, I am going to use ASCII values for the letters, so a quick explanation about that. ASCII stands for American Standard Code for Information Interchange, and it assigns a number to each character so the computer can store and interpret what is typed into it. A table of ASCII values can be found at http://www.asciitable.com/. Because the letters are numbered consecutively, we can scale a letter down to an array subscript (for example, subtracting the value of 'A', which is 65, from an uppercase letter gives a number from 0 to 25) and thus keep a count of how many times it is used in the ciphertext. Each count can then be divided by the total number of letters in the ciphertext to get the percent frequency, which is output to the terminal.

The first part is to count up the number of times each character is used. The first picture shows the code to do this. Essentially, we loop once for every character in the string, and each letter is scaled down to an array subscript. Then, the array element holding the count for that letter is incremented. If a space comes up, the program increments a variable called "spaceCount" and does not change anything in the array. This repeats until all the characters have been processed.

Then, each array element is divided by the total number of non-space characters in the string and multiplied by 100 to get the percent. This is output to the terminal alongside the letter it corresponds to.

Step 6: Final Code and Result

Now we have all the elements to our program together and can analyze some text. To run this program, go to your terminal (Cygwin, etc.) and type in "g++ filename.cpp" replacing filename with whatever you saved your program as. This will then run through all of your code and compile it so that it can be run. If you get an error, it should tell you what line it is on and so you can go back and fix it from there. Any errors that you can't figure out, let me know and I'll try and help you out!

Now that it is compiled, we can run the program! To do this, type in "./a.exe" for Windows (in Cygwin) or "./a.out" for Mac or Linux. This will then launch the program, ask for your input, and give you the letter frequencies. See above for a picture of all this.

This code can definitely be improved, but it is a good place to start. If you have any questions or comments, please leave them below or send me a message! I hope to make more cryptography Instructables (more tools for mono-alphabetic ciphers, maybe some poly-alphabetic ciphers, and this summer I want to try to build a homemade Enigma machine) in the future, so let me know what you think of this one!

Participated in the
Coded Creations

Participated in the
Automation Contest