Introduction: How to Solve Simple Substitution Ciphers

About: I enjoy DIY projects, especially those involving woodworking. I'm an avid computer programmer, computer animator, and electronics enthusiast.

A substitution cipher is a simple "one-to-one" correlation between letters of a key and letters of a message to be encrypted: each letter of the message is always replaced by the same letter of the key. This is one of the easiest cipher types to break, and that's why you'll find these puzzles in newspapers alongside Sudoku puzzles.
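To make the "one-to-one" idea concrete, here is a minimal Python sketch of how such a cipher could be applied. The key `EXAMPLE_KEY` and the function names are made up for illustration; a newspaper puzzle would use its own (unknown) key.

```python
import string

# Hypothetical key: the letter in position 0 replaces "a", position 1 replaces "b", etc.
EXAMPLE_KEY = "qwertyuiopasdfghjklzxcvbnm"

def make_cipher_table(key):
    """Map each plaintext letter a-z to the key letter in the same position."""
    key = key.lower()
    if sorted(key) != list(string.ascii_lowercase):
        raise ValueError("key must contain each letter a-z exactly once")
    return dict(zip(string.ascii_lowercase, key))

def encrypt(message, key):
    table = make_cipher_table(key)
    # Letters are swapped; spaces and punctuation pass through unchanged.
    return "".join(table.get(ch, ch) for ch in message.lower())

def decrypt(ciphertext, key):
    # Reverse the table so each cipher letter points back to its plain letter.
    reverse = {c: p for p, c in make_cipher_table(key).items()}
    return "".join(reverse.get(ch, ch) for ch in ciphertext.lower())

print(encrypt("hello", EXAMPLE_KEY))
```

Because spaces and punctuation are left alone, the word lengths of the original message survive encryption, which is what the solving methods in the following steps rely on.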

Maybe you've never played with these puzzles before and would like to know where to start - I hope this Instructable can answer your questions.

Step 1: Acquire a Cryptogram

Cryptograms are generally very easy to find. As stated earlier, they can be found in newspapers fairly readily, and a Google search can give you more cryptograms than you could ever want.

An example site with many cryptograms to play with is http://www.cryptograms.org/play.php.

Step 2: Method 1: Word Lengths and Punctuation

If a cipher were intended to be a bit more difficult to break by hand, all punctuation would be eliminated and letters would be jumbled together or broken into identical-length "words." Instead, these cryptograms are made to be "easy," and as such, are left with proper word lengths and punctuation.

I begin my solving process by drawing up a solving environment on paper - my method can be seen in the second image. When I find a letter relationship, I mark it in the alphabet on the top line and fill in the letter occurrences in the spaces that follow.
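If you would rather keep this bookkeeping on a computer than on paper, the same idea can be sketched in a few lines of Python. This is only an illustration of the approach, not the layout from my image; the function name `apply_partial_key` and the sample ciphertext are made up.

```python
def apply_partial_key(ciphertext, known):
    """Fill in the letters solved so far and show '_' for the rest.

    `known` maps cipher letters to plain letters, e.g. {"q": "t"}.
    """
    out = []
    for ch in ciphertext:
        if ch.isalpha():
            # Solved letters are filled in; unsolved ones stay blank.
            out.append(known.get(ch.lower(), "_"))
        else:
            # Spaces and punctuation are kept as they are.
            out.append(ch)
    return "".join(out)

# Made-up ciphertext with two letter relationships found so far.
print(apply_partial_key("eq qys qyiq", {"q": "t", "y": "h"}))
```

Each time you find a new letter relationship, add it to `known` and print the text again to see which words are starting to take shape.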

When you first begin to decipher a cryptogram, you will want to identify the one-letter words. The English language has two common one-letter words: "I" and "a." Knowing this, you can make a reasonable assumption for your first letter substitution.
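Spotting the one-letter words is easy by eye, but for a long cipher a short Python sketch like the following could list them for you. The function name and the sample ciphertext are made up for illustration.

```python
import re
from collections import Counter

def one_letter_words(ciphertext):
    """Count the cipher letters that appear as one-letter words."""
    # Treat runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", ciphertext.lower())
    return Counter(w for w in words if len(w) == 1 and w.isalpha())

# Made-up ciphertext: "b" and "w" each stand alone as words.
print(one_letter_words("b kzq b, w xjp"))
```

Each letter this reports is a candidate for "I" or "a"; which is which usually becomes clear once a few neighboring words are filled in.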

Once complete, you move on to two-letter and three-letter words. Look for relationships between letters. For example, if you see a two-letter word, "eq," a three-letter word, "qys," and a four-letter word, "qyiq," a reasonable assumption would be that the first word could be "it," the second word could be "the," and the third word could be "that," as these are common words. The key point here is to think of common words and look at the letter relationships between words to make an assumption.
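The same reasoning can be sketched in Python: compare the repeated-letter pattern of a cipher word against a list of common words, and discard any word that contradicts letters you have already assigned. The tiny word list `COMMON_WORDS` and the function names are assumptions made for this example; a real word list would be much longer.

```python
# A tiny, hand-picked word list for illustration only.
COMMON_WORDS = ["it", "is", "to", "of", "in", "the", "and", "for",
                "that", "this", "with", "have"]

def pattern(word):
    """Describe which positions share a letter, e.g. 'that' -> (0, 1, 2, 0)."""
    seen = {}
    return tuple(seen.setdefault(ch, len(seen)) for ch in word)

def candidates(cipher_word, known=None, words=COMMON_WORDS):
    """List words that fit the cipher word's pattern and the known letters."""
    known = known or {}
    used = set(known.values())
    matches = []
    for word in words:
        if pattern(word) != pattern(cipher_word):
            continue
        fits = True
        for c, p in zip(cipher_word, word):
            if c in known:
                # A solved cipher letter must decode to its known plain letter.
                fits = fits and known[c] == p
            else:
                # An unsolved cipher letter cannot reuse a plain letter
                # that already belongs to a different cipher letter.
                fits = fits and p not in used
        if fits:
            matches.append(word)
    return matches

print(candidates("qyiq"))              # pattern (0, 1, 2, 0)
print(candidates("qys", {"q": "t"}))   # using q -> t from the guess above
print(candidates("eq", {"q": "t"}))
```

With this small list, the three calls narrow down to "that," "the," and "it" respectively. With a larger word list you would usually get several candidates per word and have to pick the one that keeps the rest of the message sensible.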

Contractions make easy substitutions as well. If you see a single letter after an apostrophe, especially one that repeats, it is likely an "s," although it could also be "t," "d," or "m." Two letters after an apostrophe could be "re," "ll," or "ve."
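As a rough sketch, the letters that follow an apostrophe can be collected with a short Python function. It assumes the cipher uses the plain straight apostrophe (') and not a curly one; the function name and sample ciphertext are made up.

```python
import re
from collections import Counter

def apostrophe_endings(ciphertext):
    """Count the letter groups that follow an apostrophe inside a word."""
    # Assumes a straight apostrophe with a letter directly before it.
    return Counter(re.findall(r"[a-z]'([a-z]+)", ciphertext.lower()))

# Made-up ciphertext containing three contractions.
endings = apostrophe_endings("kz'x ab'x qf'wt")
for ending, count in endings.most_common():
    kind = "one letter" if len(ending) == 1 else f"{len(ending)} letters"
    print(f"'{ending} ({kind}) appears {count} time(s)")
```

An ending that shows up several times as a single letter is a good first guess for "s."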

Assemble as many words as you can through these processes, using aids such as http://www.morewords.com/wordsbylength/ to make assumptions and solve words. I have found this process to be a relatively slow but reliable method of solving newspaper cryptograms.

SPOILER: The next step shows the process of solving the first cipher in the newspaper cryptogram pictured above.

Step 3: Solving the First Pictured Cipher

You'll notice I missed a few letters here and there, but I filled them in when I realized they had been missed in the substitution process.

This cipher took right around ten minutes to solve.

Step 4: Alternate Method: Letter Frequency

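The underlying reason substitution ciphers are so easy to solve is that each letter's frequency is predictable: a letter is always replaced by the same cipher letter, so the cipher letter inherits its frequency. A longer message using a substitution cipher is easier to crack because there are more samples, so the observed frequencies follow typical English more closely.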

The first image shows a histogram of letter frequencies - I made this histogram using a Wikipedia article, sampling 50,000 letters. If you search Google for letter frequency charts, you will find similar plots.
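If you want to see a rough histogram for a text of your own without any plotting tools, a text-only version can be sketched in Python as below. This is not the program I used to make the images; the function name `text_histogram` and its `width` parameter are made up for this example.

```python
from collections import Counter

def text_histogram(text, width=40):
    """Return one line per letter, most frequent first, with a '#' bar."""
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    if not counts:
        return []
    top = max(counts.values())
    lines = []
    # Sort by count (highest first), then alphabetically to break ties.
    for ch, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        # Scale each bar against the most frequent letter.
        bar = "#" * max(1, round(width * n / top))
        lines.append(f"{ch} {bar} {n}")
    return lines

sample = "the quick brown fox jumps over the lazy dog"
print("\n".join(text_histogram(sample)))
```

A short sample like this one gives a lumpy chart; the shape only settles toward the familiar English profile once you feed it a few thousand letters.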

The second image is the letter frequency from the cipher I solved in the previous step. As you can see, the letters "p" and "j" have the highest occurrence rate. A reasonable assumption would then be that one letter corresponds to "e," the most common letter, and the other likely corresponds to "t," the second most common letter.
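Here is a minimal Python sketch of that first guess: rank the cipher letters by how often they occur and pair them with English letters in their usual frequency order. The ordering in `ENGLISH_ORDER` is one commonly quoted approximation and differs slightly between sources, and the function names are made up for illustration.

```python
from collections import Counter

# Approximate English letter order, most to least frequent (varies by source).
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def letter_frequencies(text):
    """Return each letter's share of the text, most frequent first."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    total = len(letters)
    return {ch: n / total for ch, n in Counter(letters).most_common()}

def frequency_guess(ciphertext):
    """Pair the most frequent cipher letters with the most frequent English letters."""
    ranked = list(letter_frequencies(ciphertext))
    return dict(zip(ranked, ENGLISH_ORDER))

# Made-up ciphertext in which "j" and "p" are the most frequent letters.
print(frequency_guess("pj jxp jkp jz"))
```

Treat the result as a starting point only. On a short newspaper cipher, only the first one or two pairings tend to be trustworthy, and the rest need to be checked against word patterns.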

You can use a combination of the previous solve-by-hand method with the letter frequency method to solve cryptograms faster.

---------------------------------------------

I have included an executable .jar file I wrote that you can use to analyze letter frequencies and generate plots like the ones I've pictured above. The source files are packaged in the .jar file, if you want to extract them. Alternatively, you could download just the source code, which I have attached here as well.

Step 5: Happy Deciphering!

If you have questions, please ask them in the comments!

Participated in the
How to Play ____

Participated in the
Coded Creations