Cryptogram Solver

Introduction

If you’re anything like me, everytime you see a cryptogram in the entertainment section of the newspaper, you stare at it for about 60 seconds and then you give up. I’ve always wanted to be able to solve cryptograms but I never really learned how. Furthermore, even after I read a few things about the strategies you can use to solve them, I was intrigued by (in my mind) a much more interesting problem – could you write a computer program to solve cryptograms for you?

First, some definition of the problem is in order. The object of the program is to change a ciphertext consisting of cipher words into a cleartext made up of words which exist in a dictionary file, by mapping each cipher letter into exactly one cleartext letter. There is a 1-to-1 mapping in that no two cipher letters share the same cleartext and vice versa. I will also allow for apostrophes as letters which always map to apostrophes, and ignore all other characters (periods, commas). Words are separated by any non-alphabetic or apostrophe letter.

Example

Gb py bgioy lkq ckd’y oqwwttc, yil, yil prpgd.

Random Assignment

My first attempt at solving this problem was very rudimentary. Basically, assign a random mapping of all cipher letters to cleartext letters and produce the result. Check to see that every word is in fact in the dictionary. This solution will inevitably work, but it will also take forever. How many mappings are there for 26 letters? Well, for “A” you have 26 choices, for “B” you have 25 choices left, etc. In other words there are 26! mappings to choose from and only one will give you the right answer. In fact this is worse than just looping through all possibilities because you will start duplicating random assignments that you’ve already tried. Don’t do this.

Distribution Ranking

This is a similar approach to what a human would do. Basically, we know that for most texts, e is the most common letter, followed by t, a, o, i, n, s, h, and r. Insert RSTLNE joke here. What we can try to do is to analyze our ciphertext and rank the letters by occurrence frequency and then assign letters based on our letter ranking list. As it turns out, this approach didn’t work very well. If your text has exactly the same letter distribution as the English language in general, then your first assignment should be the answer, and you’re done. This is not the case, and it’s even possible that the most common letter isn’t even E. And that’s a big problem, because when it comes to recursive algorithms, getting the first part wrong means you spend lots of time searching for solutions where none will be found. There may be a way to do this using some kind of breadth first search which will lead to a relatively quick solution, but I have to say I think this approach is misguided.

Pattern Search

The solution I eventually came up with has much less to do with the letters themselves and more to do with the uniqueness of each word. When you’re looking at a cipher word, you don’t know what the letters actually represent, but you do know one very important thing. You know that everytime the same letter is used in the cipher, it is the same letter in the real word too. In other words, the letter pattern of a word is a property shared by the cleartext of that word.

Here’s an example:

oqwwttc (could be rewritten as)
0122334 (numerically speaking)
abccdde (as a normalized cipher word)
succeed (an actual word with this pattern)

As it turns out, in my dictionary, there are only five seven-letter words with that letter pattern, and succeed is one of them. Fantastic! We now know that the solution, if it will be found, must assign letters according to one of those five words. This leads us to the following pseudocode:

  1. Load the dictionary into a hash table where the hash is the normalized letter pattern of each word, and the real words are added to the table according to their hash
  2. For each ciphertext word, normalize it and get a list of all English words that match this pattern
  3. Sort the list of ciphertext words in increasing count of English words
  4. Recursively assign a word to the current cipher. Double check that this does not violate any previously assigned letters for previous words; if so, back out. Pick the next word and recurse.

This algorithm is actually really fast. For the cipher I gave above, there are only 5 choices for the first* word and only 7 for the second*, and this results in an assignment of 8 letters immediately which have a high (1/35) chance of being correct. The search space increases to 49 choices for the next word, and more for each word following, but by the time you reach the words with a high number of pattern matches you’ve already decided most of your letter assignments already, so your choices are easy to prune.

Here’s the source: View Source

Summary

A true programmatic crypogram solver doesn’t necessarily need to use the same skills as a human would use to solve the same problem; we can use the computer’s advantage of being able to complete repetitive tasks by pre-processing every word in the dictionary based on its letter pattern, and draw on that information to choose whole word assignments instead of individual letters. Choosing from the smallest number of options severely reduces the recursive search space, and from there it’s just a matter of testing the remaining word patterns until we reach the solution(s).

If you see any glaring errors in my code, or possible optimizations, please feel free to add a comment saying how I could improve. Thanks!

Oh and by the way, if at first you don’t succeed, try, try again.

* Not actually the first word, but the word with the lowest matching pattern count in the dictionary.

Solving Tetris via a Scoring Algorithm

When I was in college, I wrote a simple Windows Tetris game I called WinTetris, because the name wasn’t taken and I wasn’t feeling very creative that day. This project was at first just an exercise in Win32 programming and game logic, and the first version featured very ugly GDI graphics with DrawRect.

Over time, I polished it quite a bit, and added a lot of new features. I replaced the ugly graphics with DirectDraw, added support for mouse/keyboard controls, added sound and music, A/B game types, and LAN multiplayer. But quite possibly my favorite new feature was the “demo mode”, which starts when you launch the program.

Demo mode was a simple way for me to use the tetris engine I had already developed to do something cool; write an algorithm that could play tetris extremely well. The bonus was that I could show off both my AI programming and my game at the same time, thanks to my demo mode.

So how does it work? At the top-most level, it does the following:

  1. Considers every possible piece placement for the current piece, including both lateral movement and rotation.
  2. Assigns each piece placement a score, according to some kind of scoring mechanism.
  3. Chooses the placement with the highest score.
  4. Moves and rotates the piece into place.

This arrangement is pretty logical, but everything hinges on the scoring algorithm. How would you write such a function? Personally, I thought a bit about things that are good, and things that are bad when it comes to playing tetris, in general:

  • Completing a line is a good thing. Completing more than one line at once, all the way up to a tetris, is even better.
  • Reducing the maximum stack height is a good thing. Conversely, making the maximum stack height higher is bad.
  • Placing a piece lower in the playing field is generally better than placing it higher.
  • Creating empty holes beneath a piece is bad.

So now that you know what to look for, you write methods that will tell you the effects of your piece placement, assign each measure a weight, and total up the plusses and minuses. But what weight values to pick?

Personally, what I did was to pick weights that seemed reasonable, watch how the game played, and then made adjustments if I thought it was playing in a way that was either too reckless or too safe. And although I didn’t mention it before, my goal was an algorithm that would take a long time to lose, not one that loves to set up tetrises or rack up a high score.

In the end, I made the following weight table:

Hole Count: -500
Lines Completed: +500
Block Height: -150
Chasm Height: -400

If I was to tweak these weights again, I would want to take a more scientific approach. I could use even more AI programming to run thousands of tetris simulations, and see which weight values yielded the lowest average stack height, and use those values. Perhaps that could be a future project of mine.

The last trick I added was a modification of the process, not the scoring algorithm. In the process I gave above, the code only evaluates the current piece. But players get to see the next piece! So instead of choosing the piece placement with the highest score, I modified my algorithm to calculate the best combined score (A + B) for the current piece as well as the next piece, and use that information to place only the current piece. So the current piece is placed with the highest potential score. The weird thing about this is that the algorithm is not committed to that second placement. It can choose to do something completely different if the next piece changes the optimal two-piece score.

So now that you know how to write a Tetris AI, break out your favorite compiler and improve on my methods. I implemented WinTetris such that you can create your own AI DLLs if you want to play around with it. The installer and source code for WinTetris is available at http://www.hexar.net/wintetris.php. Enjoy!