Cryptogram Solver

Introduction

If you’re anything like me, every time you see a cryptogram in the entertainment section of the newspaper, you stare at it for about 60 seconds and then give up. I’ve always wanted to be able to solve cryptograms, but I never really learned how. Furthermore, even after I read a few things about the strategies you can use to solve them, I was intrigued by what was (in my mind) a much more interesting problem: could you write a computer program to solve cryptograms for you?

First, some definition of the problem is in order. The object of the program is to turn a ciphertext consisting of cipher words into a cleartext made up of words that exist in a dictionary file, by mapping each cipher letter to exactly one cleartext letter. The mapping is 1-to-1: no two cipher letters share the same cleartext letter, and vice versa. I will also allow apostrophes, which always map to apostrophes, and ignore all other characters (periods, commas). Words are separated by any character that is neither a letter nor an apostrophe.

Example

Gb py bgioy lkq ckd’y oqwwttc, yil, yil prpgd.

Random Assignment

My first attempt at solving this problem was very rudimentary: assign a random mapping of all cipher letters to cleartext letters, produce the result, and check whether every word is in fact in the dictionary. This solution will eventually work, but it will also take forever. How many mappings are there for 26 letters? Well, for “A” you have 26 choices, for “B” you have 25 choices left, and so on. In other words, there are 26! mappings to choose from, and only one will give you the right answer. In fact, this is worse than simply looping through all possibilities, because you will start repeating random assignments that you’ve already tried. Don’t do this.
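To put 26! in perspective, here is the arithmetic. The rate of a billion mappings checked per second is an assumed figure for illustration, not a measurement:

```latex
\begin{aligned}
26! &= 26 \times 25 \times \cdots \times 1 = 403{,}291{,}461{,}126{,}605{,}635{,}584{,}000{,}000 \approx 4.03 \times 10^{26} \\
\frac{4.03 \times 10^{26}}{10^{9}\ \text{per second}} &\approx 4.03 \times 10^{17}\ \text{s} \approx 1.3 \times 10^{10}\ \text{years}
\end{aligned}
```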

Distribution Ranking

This approach is similar to what a human would do. We know that for most texts, e is the most common letter, followed by t, a, o, i, n, s, h, and r. Insert RSTLNE joke here. What we can try to do is analyze our ciphertext, rank the letters by frequency of occurrence, and then assign letters based on our letter ranking list. As it turns out, this approach didn’t work very well. If your text had exactly the same letter distribution as the English language in general, then your first assignment would be the answer, and you’d be done. In practice this is not the case, and it’s even possible that the most common letter isn’t E at all. That’s a big problem, because when it comes to recursive algorithms, getting the first part wrong means you spend lots of time searching for solutions where none will be found. There may be a way to make this work using some kind of breadth-first search that leads to a relatively quick solution, but I think this approach is misguided.
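The ranking step of this approach can be sketched in C++ as below. It is a minimal illustration, not the code behind this post: it assumes ASCII input, and the English order string kEnglishOrder is one commonly quoted ranking (sources differ slightly past the first few letters). The function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstddef>
#include <string>

// Count each ASCII letter (case-insensitive) and return the letters that occur,
// most frequent first. Ties keep alphabetical order so the result is deterministic.
std::string rankByFrequency(const std::string& ciphertext) {
    std::array<int, 26> counts{};
    for (unsigned char ch : ciphertext)
        if (ch < 128 && std::isalpha(ch)) ++counts[std::tolower(ch) - 'a'];
    std::string letters;
    for (int i = 0; i < 26; ++i)
        if (counts[i] > 0) letters += static_cast<char>('a' + i);
    std::stable_sort(letters.begin(), letters.end(),
                     [&](char a, char b) { return counts[a - 'a'] > counts[b - 'a']; });
    return letters;
}

// The first guess of the distribution-ranking approach: the k-th most frequent
// cipher letter becomes the k-th most frequent English letter. Output is lowercase.
std::string naiveDecrypt(const std::string& ciphertext) {
    const std::string kEnglishOrder = "etaoinshrdlcumwfgypbvkjxqz";  // assumed ordering
    const std::string ranked = rankByFrequency(ciphertext);
    std::array<char, 26> key{};
    for (std::size_t i = 0; i < ranked.size(); ++i)
        key[ranked[i] - 'a'] = kEnglishOrder[i];
    std::string out;
    for (unsigned char ch : ciphertext)
        out += (ch < 128 && std::isalpha(ch)) ? key[std::tolower(ch) - 'a']
                                              : static_cast<char>(ch);
    return out;
}
```

On the example cipher above, the most frequent cipher letter is y, which does not stand for e, so the very first assignment this approach makes is already wrong.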

Pattern Search

The solution I eventually came up with has much less to do with the letters themselves and more to do with the uniqueness of each word. When you’re looking at a cipher word, you don’t know what the letters actually represent, but you do know one very important thing: every time the same letter is used in the cipher word, it stands for the same letter in the real word too. In other words, the letter pattern of a cipher word is shared by its cleartext.

Here’s an example:

oqwwttc (could be rewritten as)
0122334 (numerically speaking)
abccdde (as a normalized cipher word)
succeed (an actual word with this pattern)

As it turns out, in my dictionary, there are only five seven-letter words with that letter pattern, and succeed is one of them. Fantastic! We now know that the solution, if it will be found, must assign letters according to one of those five words. This leads us to the following pseudocode:

  1. Load the dictionary into a hash table keyed by the normalized letter pattern of each word, adding each real word to the list for its pattern.
  2. For each ciphertext word, normalize it and look up the list of all English words that match its pattern.
  3. Sort the ciphertext words in increasing order of the number of matching English words.
  4. Recursively assign a candidate English word to the current cipher word. Check that this does not conflict with any letters already assigned for previous words; if it does, back out and try the next candidate. Otherwise, move on to the next cipher word and recurse.
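The four steps above can be sketched in C++ as follows. This is a minimal illustration, not the source originally linked from this post: it assumes an ASCII ciphertext with straight apostrophes and a lowercase dictionary passed in as a vector (the real program loads a dictionary file), and the names normalize, extend, solve, and solveCryptogram are hypothetical. It stops at the first consistent decryption instead of collecting all of them.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

// Letter pattern of a word, e.g. "oqwwttc" -> "abccdde"; apostrophes are kept as-is.
string normalize(const string& word) {
    unordered_map<char, char> seen;  // word letter -> pattern letter
    string pattern;
    for (unsigned char ch : word) {
        if (ch == '\'') { pattern += '\''; continue; }
        char c = static_cast<char>(tolower(ch));
        auto it = seen.find(c);
        if (it == seen.end()) it = seen.emplace(c, static_cast<char>('a' + seen.size())).first;
        pattern += it->second;
    }
    return pattern;
}

// Step 1: pattern -> every dictionary word with that pattern.
using PatternTable = unordered_map<string, vector<string>>;

// The partial key, kept 1-to-1 in both directions (0 = not assigned yet).
struct Key {
    array<char, 256> toClear{};
    array<char, 256> toCipher{};
};

// Extend the key so that cipher maps onto clear; false if that conflicts with earlier letters.
bool extend(Key& key, const string& cipher, const string& clear) {
    for (size_t i = 0; i < cipher.size(); ++i) {
        unsigned char c = cipher[i], p = clear[i];
        if (key.toClear[c] == 0 && key.toCipher[p] == 0) {
            key.toClear[c] = p;
            key.toCipher[p] = c;
        } else if (key.toClear[c] != p || key.toCipher[p] != c) {
            return false;
        }
    }
    return true;
}

// Step 4: try each candidate for words[i] on a copy of the key, so backing out
// just means discarding that copy.
bool solve(const vector<string>& words, const PatternTable& table, size_t i, const Key& key, Key& out) {
    if (i == words.size()) { out = key; return true; }
    auto it = table.find(normalize(words[i]));
    if (it == table.end()) return false;
    for (const string& candidate : it->second) {
        Key next = key;
        if (extend(next, words[i], candidate) && solve(words, table, i + 1, next, out)) return true;
    }
    return false;
}

// Returns the first decryption found (lowercased), or "" if none exists.
string solveCryptogram(const string& ciphertext, const vector<string>& dictionary) {
    PatternTable table;
    for (const string& w : dictionary) table[normalize(w)].push_back(w);
    // Step 2: split on anything that is not a letter or apostrophe (the extra
    // trailing space flushes the last word), then drop repeated words.
    vector<string> words;
    string cur;
    for (unsigned char ch : ciphertext + " ") {
        if ((ch < 128 && isalpha(ch)) || ch == '\'') cur += static_cast<char>(tolower(ch));
        else if (!cur.empty()) { words.push_back(cur); cur.clear(); }
    }
    sort(words.begin(), words.end());
    words.erase(unique(words.begin(), words.end()), words.end());
    // Step 3: cipher words with the fewest candidate English words go first.
    auto count = [&](const string& w) {
        auto it = table.find(normalize(w));
        return it == table.end() ? size_t{0} : it->second.size();
    };
    stable_sort(words.begin(), words.end(),
                [&](const string& a, const string& b) { return count(a) < count(b); });
    Key key;
    if (!solve(words, table, 0, Key{}, key)) return "";
    string out;
    for (unsigned char ch : ciphertext)
        out += (ch < 128 && isalpha(ch)) ? key.toClear[tolower(ch)] : static_cast<char>(ch);
    return out;
}
```

Copying the key at each level of recursion keeps backing out trivial, at the cost of copying two small arrays per step. A cipher word whose pattern has no dictionary match sorts to the front with a count of zero, so the search fails immediately instead of after a long detour.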

This algorithm is actually really fast. For the cipher I gave above, there are only 5 choices for the first* word and only 7 for the second*, and together they assign 8 letters immediately, with a relatively high (1 in 35) chance of being correct. The search space grows to 49 choices for the next word, and more for each word after that, but by the time you reach the words with many pattern matches, you’ve already decided most of your letter assignments, so those choices are easy to prune.


Summary

A true programmatic cryptogram solver doesn’t necessarily need to use the same skills a human would use to solve the same problem. We can use the computer’s ability to complete repetitive tasks by pre-processing every word in the dictionary by its letter pattern, and draw on that information to choose whole-word assignments instead of individual letters. Always choosing from the smallest number of options drastically reduces the recursive search space, and from there it’s just a matter of testing the remaining word patterns until we reach the solution(s).

If you see any glaring errors in my code, or possible optimizations, please feel free to add a comment saying how I could improve. Thanks!

Oh and by the way, if at first you don’t succeed, try, try again.

* Not actually the first word, but the word with the lowest matching pattern count in the dictionary.

Lambda Functions in C++ for RAII

I recently learned a new trick at Microsoft involving the use of C++ lambda functions for automatic resource cleanup, which I thought was pretty cool. It combines the RAII programming pattern with a new feature of C++0x, namely lambda functions.

What is RAII?

RAII stands for Resource Acquisition Is Initialization, a design pattern which is popular in languages such as C++, D, and Ada. The idea is that you want to acquire resources during the initialization of objects, i.e. as soon as possible, so that you cannot accidentally use an uninitialized object, and also that you want your object to automatically release the resource upon destruction. One of the main advantages of this pattern is that your resources will always be released, even if there are errors or exceptions between when your object is initialized and your object goes out of scope. How does one do this? Here’s a simple example:

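#include <cstdio>
#include <iostream>
#include <stdexcept>
using namespace std;

class FileCloser
{
public:
    // Acquire the resource during initialization.
    explicit FileCloser(const char* fname)
    {
        fp = fopen(fname, "r");
        if (fp == nullptr)
            throw runtime_error("could not open file");
    }
    void ReadLine(char* line, int count)
    {
        if (fgets(line, count, fp) == nullptr)
            line[0] = '\0';  // nothing was read, so return an empty string
    }
    // Release the resource automatically on destruction.
    ~FileCloser()
    {
        fclose(fp);
    }
    // Copying would close the same FILE* twice, so forbid it.
    FileCloser(const FileCloser&) = delete;
    FileCloser& operator=(const FileCloser&) = delete;
private:
    FILE* fp;
};

int main(int argc, char* argv[])
{
    FileCloser fc("test.txt");
    char line[100];
    fc.ReadLine(line, 100);
    cout << line << endl;
    return 0;
}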

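As you can see, we didn’t have to explicitly call fclose on the file pointer, because the FileCloser destructor did it for us as soon as the FileCloser object went out of scope in main. So even if we change main to the following, fp is still closed after the exception is thrown, because fc’s destructor runs while the stack unwinds on the way to the catch block. (The exception does need to be caught: a bare throw; with no active exception calls std::terminate, and if an exception escapes main entirely, it is implementation-defined whether the stack is unwound at all.)

int main(int argc, char* argv[])
{
    try
    {
        FileCloser fc("test.txt");
        char line[100];
        fc.ReadLine(line, 100);
        cout << line << endl;
        throw runtime_error("something went wrong");
    }
    catch (const exception& e)
    {
        // fc's destructor has already run by now, so the file is closed.
        cerr << e.what() << endl;
    }
    return 0;
}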

What are lambda functions?

Lambda functions are a new feature of C++0x, the proposed new standard for the C++ programming language, which is likely to be introduced sometime in 2009. They are a functional programming technique based on the lambda calculus, which you may or may not remember from Theory of Computation. Or was it Programming Languages? Either way, I like to think of a lambda as an anonymous inline function defined right where you need it, instead of a separate function or delegate method that you define elsewhere and pass around as a function pointer. Here’s an example of a simple lambda function:

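#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main(int argc, char* argv[])
{
    vector<int> v = { 1, 2, 3, 4, 5 }; // Another new feature of C++0x: initializer lists
    // Square each element in place (the lambda takes its argument by reference).
    for_each(v.begin(), v.end(), [](int& x) { x = x * x; });
    // Print each element.
    for_each(v.begin(), v.end(), [](int x) { cout << x << endl; });
    return 0;
}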

How to use lambda functions to perform RAII

Here’s where we combine these two techniques to perform RAII using lambda functions. In the first RAII snippet, we still had to define our own class, complete with a destructor, to get the resource release we wanted in the face of errors and exceptions. And although you could design a generic class that automatically deletes pointers allocated via new, you would have to create a separate class for each resource type that needs something more complicated to release it (for example, calling a function such as fclose). But thanks to lambda functions, this is no problem:

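#include <cstdio>
#include <iostream>
#include <memory>
#include <stdexcept>
using namespace std;

int main(int argc, char* argv[])
{
    FILE* fp = fopen("test.txt", "r");
    if (fp == nullptr)
        return 1;  // shared_ptr would still call the deleter on a null pointer
    try
    {
        // The lambda is the deleter; it runs when fileReleaser is destroyed.
        shared_ptr<FILE> fileReleaser(fp, [](FILE* f) { fclose(f); });
        char line[100];
        if (fgets(line, 100, fp) != nullptr)
            cout << line << endl;
        throw runtime_error("something went wrong");
    }
    catch (const exception& e)
    {
        // fileReleaser is gone by now, so the file has been closed.
        cerr << e.what() << endl;
    }
    return 0;
}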

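The shared_ptr in this example owns the pointer to the resource we want to release. The arguments to its constructor are first the pointer, and then a deleter, which can be any callable object; in this case it is the lambda function. When the last shared_ptr that owns the file is destroyed, including during stack unwinding after an exception, it calls the lambda to close the file. Note that shared_ptr calls its deleter even when the stored pointer is null, so check the result of fopen first: calling fclose on a null pointer is undefined behavior.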
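A common variation on the same idea, sketched here as an illustration rather than anything from the original post, is a tiny scope guard class that stores a lambda and calls it from its destructor. Unlike shared_ptr, it has no reference count and works for cleanup that isn't tied to a pointer at all. The names ScopeGuard and printFirstLine are hypothetical.

```cpp
#include <cstdio>
#include <utility>

// Stores any callable (typically a lambda) and calls it when the guard is destroyed,
// whether the scope exits normally or because an exception is propagating.
template <typename F>
class ScopeGuard {
public:
    explicit ScopeGuard(F f) : f_(std::move(f)) {}
    // Destructors are implicitly noexcept, so the cleanup callable should not throw.
    ~ScopeGuard() { f_(); }
    // Copying would run the cleanup twice, so forbid it.
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
private:
    F f_;
};

// Example: close a FILE* without writing a dedicated FileCloser class.
void printFirstLine(const char* path) {
    FILE* fp = std::fopen(path, "r");
    if (fp == nullptr) return;
    auto closeFile = [fp] { std::fclose(fp); };
    ScopeGuard<decltype(closeFile)> guard(closeFile);
    char line[100];
    if (std::fgets(line, sizeof line, fp) != nullptr) std::fputs(line, stdout);
}  // guard's destructor calls closeFile here
```

The trade-off is that each guard is tied to one scope; if the resource has to be handed to other code and released when the last user is done, shared_ptr with a lambda deleter is the better fit.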

Summary

In summary, RAII is a great defensive programming technique you can use to make sure your resources are released and your code doesn’t leak, and lambda functions are a great new way to make using RAII much less painful than it ever was before.


Top Ten Things I Miss in Austin

10. MoPac and 183

I never thought I’d say that I miss an intracity highway, but there’s something to be said for a 3-lane highway that can get you from the tech suburbs to downtown in 20 minutes. My typical commute from the U-District to Microsoft in Seattle is about an hour on 520 (a 2-lane highway), so I definitely miss being able to get around town fast.

9. Barton Springs

Barton Springs might be the coldest body of water in Austin, but it’s also one of the coolest. The fact that Austin decided to turn the springs into a public swimming pool, complete with diving boards, is pretty awesome. It’s best to go on a really hot day and alternate between freezing your skin off and drying off in the sun.

8. Brisket – Rudy’s, Salt Lick, County Line

Austin has some of the best BBQ in the country. Where else but Rudy’s can you find affordable and delicious fast-food barbecue? And you really can’t beat the cash-only BYOB experience at the Salt Lick. Their brisket is absolutely amazing.

7. Town Lake (aka Ladybird Lake)

Town Lake has everything you could ever want in a jogging trail: variable distances, nice dirt paths, a great view of the water, and even a dog park! Plus there’s free water. Thanks, RunTex!

6. Austin City Limits

Three days of fun in the sun and all-you-can-hear music at an affordable price in Zilker Park. Some of my favorite bands over the years have been Muse, Cake, Sheryl Crow, John Mayer, Blue October, Regina Spektor, Ghostland Observatory, Guster, Brazilian Girls… I could go on and on.

5. Lake Travis

Party barges, cliff diving, and scuba diving are a few of the many activities I have enjoyed at Lake Travis, and all of them were incredibly fun. The water is so warm in the summer you don’t even need a wetsuit, even if you’re 60 feet underwater scuba diving. Lake Travis more than makes up for the incessant Texas summer heat.

4. Sixth Street

If you haven’t heard of Sixth Street yet, you probably haven’t heard of Austin. It’s a street so packed with bars, and so popular on Friday and Saturday nights, that the police close it to cars to make room for all of the stumbling foot traffic. Vendors sell pizza and the best wurst you’ve ever had after the bars close, and let me tell you, it’s delicious.

3. Tubing on the Guadalupe and Comal Rivers

I love tubing on the Guadalupe River: spending the whole day lying around, having fun with your friends, and shooting through the “rapids”. Okay, so they aren’t that fast, but they are fun.

2. Tex Mex – Trudy’s and Chuy’s, even Taco Cabana

Finding great Tex Mex in Seattle is like trying to go snowboarding in Austin. No matter how hard you look, you just won’t find it. Trudy’s has both delicious stuffed avocados and potent Mexican Martinis. Chuy’s has a Chuychanga that is easily my favorite chimichanga ever made, and their secret jalapeno ranch salsa makes tortilla chips into a dessert. Even Taco Cabana, a fast food joint in Austin, has some pretty tasty eats.

1. Friends

Since my first foray into Austin as an intern at National Instruments, the friends I made were the reason I was able to do so many fun and exciting things, because without friends to share them with, they would have just been things. To those of you I left behind in Austin, I miss you and I hope you visit Seattle soon.

Solving Tetris via a Scoring Algorithm

When I was in college, I wrote a simple Windows Tetris game I called WinTetris, because the name wasn’t taken and I wasn’t feeling very creative that day. This project was at first just an exercise in Win32 programming and game logic, and the first version featured very ugly GDI graphics with DrawRect.

Over time, I polished it quite a bit, and added a lot of new features. I replaced the ugly graphics with DirectDraw, added support for mouse/keyboard controls, added sound and music, A/B game types, and LAN multiplayer. But quite possibly my favorite new feature was the “demo mode”, which starts when you launch the program.

Demo mode was a simple way for me to use the Tetris engine I had already developed to do something cool: write an algorithm that could play Tetris extremely well. The bonus was that demo mode let me show off both my AI programming and my game at the same time.

So how does it work? At the top-most level, it does the following:

  1. Considers every possible piece placement for the current piece, including both lateral movement and rotation.
  2. Assigns each piece placement a score, according to some kind of scoring mechanism.
  3. Chooses the placement with the highest score.
  4. Moves and rotates the piece into place.
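The four steps above can be sketched as follows. Everything here is a hypothetical stand-in rather than WinTetris code: a 10x20 grid of booleans, each piece given as a list of cell offsets per rotation, and a hard drop straight down from the top of each column (so it ignores slides and tucks under overhangs). The scoring function is passed in as a parameter, since that is where all the interesting decisions live. Requires C++17.

```cpp
#include <array>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical minimal board model: a 10x20 grid of occupied cells, row 0 at the top.
constexpr int W = 10, H = 20;
using Board = std::array<std::array<bool, W>, H>;
// One rotation of a piece, as (row, col) offsets that are all >= 0.
using Shape = std::vector<std::pair<int, int>>;

// Hard-drop shape s with its left edge at column col. Returns the row where the
// shape's origin lands, or -1 if the shape does not fit at that column.
int drop(Board& b, const Shape& s, int col) {
    auto fits = [&](int top) {
        for (auto [r, c] : s) {
            int rr = top + r, cc = col + c;
            if (cc < 0 || cc >= W || rr >= H || b[rr][cc]) return false;
        }
        return true;
    };
    if (!fits(0)) return -1;
    int top = 0;
    while (fits(top + 1)) ++top;
    for (auto [r, c] : s) b[top + r][col + c] = true;
    return top;
}

struct Placement {
    int rotation = -1;
    int column = -1;
    double score = -std::numeric_limits<double>::infinity();
};

// Steps 1-3: try every rotation at every column, score the resulting board, and
// keep the highest-scoring placement. Step 4 (moving the piece) is left to the game.
template <typename ScoreFn>
Placement bestPlacement(const Board& board, const std::vector<Shape>& rotations, ScoreFn score) {
    Placement best;
    for (int rot = 0; rot < static_cast<int>(rotations.size()); ++rot) {
        for (int col = 0; col < W; ++col) {
            Board trial = board;  // work on a copy so the real board is untouched
            if (drop(trial, rotations[rot], col) < 0) continue;  // doesn't fit here
            double s = score(trial);
            if (s > best.score) best = Placement{rot, col, s};
        }
    }
    return best;
}
```

Copying the whole board for each trial keeps the search simple; at 200 cells per board and at most 40 placements per piece, the copies are cheap.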

This arrangement is pretty logical, but everything hinges on the scoring algorithm. How would you write such a function? I thought a bit about what is good and what is bad when playing Tetris in general:

  • Completing a line is a good thing. Completing more than one line at once, all the way up to a tetris, is even better.
  • Reducing the maximum stack height is a good thing. Conversely, making the maximum stack height higher is bad.
  • Placing a piece lower in the playing field is generally better than placing it higher.
  • Creating empty holes beneath a piece is bad.

So now that you know what to look for, you write methods that measure the effects of your piece placement, assign each measure a weight, and total up the pluses and minuses. But what weight values should you pick?

What I did was pick weights that seemed reasonable, watch how the game played, and then make adjustments if I thought it was playing either too recklessly or too safely. And although I didn’t mention it before, my goal was an algorithm that would take a long time to lose, not one that loves to set up tetrises or rack up a high score.

In the end, I made the following weight table:

Hole Count: -500
Lines Completed: +500
Block Height: -150
Chasm Height: -400
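Here is one way to turn the weight table into a scoring function, using a hypothetical 10x20 grid model rather than WinTetris code. The post doesn’t define its measures precisely, so these are assumptions: each weight is applied per unit (per hole, per completed line, per row of height), Block Height is taken to be how far the top of the placed piece sits above the floor, and Chasm Height is taken to be the depth of the deepest one-column well. Requires C++17.

```cpp
#include <algorithm>
#include <array>
#include <utility>
#include <vector>

// Hypothetical board model: 10x20 grid, row 0 at the top.
constexpr int W = 10, H = 20;
using Board = std::array<std::array<bool, W>, H>;
using Shape = std::vector<std::pair<int, int>>;  // (row, col) offsets, all >= 0

// Hard-drop s at column col; returns the landing row of its origin, or -1 if it doesn't fit.
int drop(Board& b, const Shape& s, int col) {
    auto fits = [&](int top) {
        for (auto [r, c] : s) {
            int rr = top + r, cc = col + c;
            if (cc < 0 || cc >= W || rr >= H || b[rr][cc]) return false;
        }
        return true;
    };
    if (!fits(0)) return -1;
    int top = 0;
    while (fits(top + 1)) ++top;
    for (auto [r, c] : s) b[top + r][col + c] = true;
    return top;
}

// Height of column c measured from the floor (0 if the column is empty).
int columnHeight(const Board& b, int c) {
    for (int r = 0; r < H; ++r)
        if (b[r][c]) return H - r;
    return 0;
}

// Empty cells that have at least one filled cell somewhere above them.
int countHoles(const Board& b) {
    int holes = 0;
    for (int c = 0; c < W; ++c) {
        bool covered = false;
        for (int r = 0; r < H; ++r) {
            if (b[r][c]) covered = true;
            else if (covered) ++holes;
        }
    }
    return holes;
}

// Remove full rows, shifting everything above them down; returns how many were removed.
int clearLines(Board& b) {
    int cleared = 0;
    for (int r = H - 1; r >= 0;) {
        bool full = std::all_of(b[r].begin(), b[r].end(), [](bool x) { return x; });
        if (!full) { --r; continue; }
        for (int rr = r; rr > 0; --rr) b[rr] = b[rr - 1];
        b[0].fill(false);
        ++cleared;  // row r now holds the row that was above it, so check it again
    }
    return cleared;
}

// Assumed meaning of "chasm": how far a column sits below the lower of its two
// neighbours (the walls count as full height); returns the deepest such well.
int deepestChasm(const Board& b) {
    int deepest = 0;
    for (int c = 0; c < W; ++c) {
        int left = c > 0 ? columnHeight(b, c - 1) : H;
        int right = c < W - 1 ? columnHeight(b, c + 1) : H;
        deepest = std::max(deepest, std::min(left, right) - columnHeight(b, c));
    }
    return deepest;
}

// Weighted score for dropping s at column col, using the weights from the table.
// Returns false (leaving out unchanged) if the piece does not fit there.
bool scorePlacement(const Board& board, const Shape& s, int col, double& out) {
    Board b = board;
    int top = drop(b, s, col);
    if (top < 0) return false;
    int blockHeight = H - top;  // assumed: height of the piece's top row above the floor
    int lines = clearLines(b);
    out = -500.0 * countHoles(b) + 500.0 * lines - 150.0 * blockHeight - 400.0 * deepestChasm(b);
    return true;
}
```

With these weights, one hole costs exactly as much as one completed line earns, which matches the goal of playing safely for a long time rather than chasing tetrises.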

If I were to tweak these weights again, I would want to take a more scientific approach. I could use even more AI programming to run thousands of Tetris simulations, see which weight values yield the lowest average stack height, and use those values. Perhaps that could be a future project of mine.

The last trick I added was a modification of the process, not the scoring algorithm. In the process I gave above, the code only evaluates the current piece. But players get to see the next piece! So instead of choosing the placement with the highest score for the current piece alone, I modified my algorithm to find the best combined score (A + B) over every placement of the current piece followed by every placement of the next piece, and then to place only the current piece according to that best pair. The interesting thing is that the algorithm is not committed to the second placement: once the next piece becomes the current one and a new next piece is revealed, it can choose something completely different if that changes the optimal two-piece score.
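A sketch of the two-piece lookahead, on a hypothetical 10x20 grid model rather than WinTetris code: for every placement of the current piece, try every placement of the next piece on the resulting board, add the two scores, and return only the current piece’s half of the best pair. The score callback (which also receives the number of lines just cleared) and the Choice struct are illustrative assumptions; plugging in a weighted scoring function built from the table above would give the behavior described. Requires C++17.

```cpp
#include <algorithm>
#include <array>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical board model: 10x20 grid, row 0 at the top.
constexpr int W = 10, H = 20;
using Board = std::array<std::array<bool, W>, H>;
using Shape = std::vector<std::pair<int, int>>;  // (row, col) offsets, all >= 0

// Hard-drop s at column col; returns the landing row of its origin, or -1 if it doesn't fit.
int drop(Board& b, const Shape& s, int col) {
    auto fits = [&](int top) {
        for (auto [r, c] : s) {
            int rr = top + r, cc = col + c;
            if (cc < 0 || cc >= W || rr >= H || b[rr][cc]) return false;
        }
        return true;
    };
    if (!fits(0)) return -1;
    int top = 0;
    while (fits(top + 1)) ++top;
    for (auto [r, c] : s) b[top + r][col + c] = true;
    return top;
}

// Remove full rows, shifting everything above them down; returns how many were removed.
int clearLines(Board& b) {
    int cleared = 0;
    for (int r = H - 1; r >= 0;) {
        if (!std::all_of(b[r].begin(), b[r].end(), [](bool x) { return x; })) { --r; continue; }
        for (int rr = r; rr > 0; --rr) b[rr] = b[rr - 1];
        b[0].fill(false);
        ++cleared;
    }
    return cleared;
}

struct Choice { int rotation = -1; int column = -1; };

// Returns the placement of the current piece that begins the best (A + B) pair.
// If no pair fits at all, the result stays {-1, -1}.
template <typename ScoreFn>  // ScoreFn: double(const Board&, int linesCleared)
Choice bestWithLookahead(const Board& board, const std::vector<Shape>& current,
                         const std::vector<Shape>& next, ScoreFn score) {
    Choice choice;
    double best = -std::numeric_limits<double>::infinity();
    for (int ra = 0; ra < static_cast<int>(current.size()); ++ra) {
        for (int ca = 0; ca < W; ++ca) {
            Board a = board;
            if (drop(a, current[ra], ca) < 0) continue;
            int linesA = clearLines(a);
            double scoreA = score(a, linesA);
            for (const Shape& shapeB : next) {
                for (int cb = 0; cb < W; ++cb) {
                    Board b = a;
                    if (drop(b, shapeB, cb) < 0) continue;
                    int linesB = clearLines(b);
                    double total = scoreA + score(b, linesB);
                    if (total > best) { best = total; choice = Choice{ra, ca}; }
                }
            }
        }
    }
    return choice;  // only A is committed; B is re-decided on the next turn
}
```

With at most 4 rotations and 10 columns per piece, this is at most 40 × 40 = 1,600 board evaluations per move, which is still cheap.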

So now that you know how to write a Tetris AI, break out your favorite compiler and improve on my methods. I implemented WinTetris so that you can create your own AI DLLs if you want to play around with it. The installer and source code for WinTetris are available at http://www.hexar.net/wintetris.php. Enjoy!

How Much Money I Lost in Economic Storm 2008

I like to track my finances every month; I like knowing not just how much money is in my accounts right now, but also where exactly my money went. I use a new Excel spreadsheet for every new month, with a tab for each individual account, as well as a budget tracking tab and a summary page. It sounds like a lot of work, and it is, but it’s worth it for me to know where my money is being spent. It gives me, at the very least, some illusion of control over my finances.

But, for all of 2008, I was frustrated. Frustrated because of the one number I looked at most month to month, my net worth. This number is easily calculated as my assets minus my liabilities. In previous years, starting in 2004, I saw this number gradually rise from a negative number (thanks student loans) to a positive one. And then to grow further from there. At the end of each year I could see how much this number increased on average per month. It was very reassuring. But not in 2008. Despite my best efforts to save money, my number didn’t budge.

As you know, 2008 was a bad year for the stock market. But I didn’t actually realize how bad it really was. Take a look at this graph:


(image from Google Finance)

Between Dec. 28, 2007 and Dec. 31, 2008, the S&P 500 index dropped from 1478.49 to 903.25. That’s a reduction of about 38.9% in a single year, which is a lot more than I realized. Furthermore, at the beginning of January 2008, I had money invested in my 401k and Roth IRA to the tune of about 52% of my yearly salary. I won’t tell you how much I make, but I will say that this means I lost money totaling around 20% of my salary in the stock market, just from the money that I had in January. Not to mention all of the money I invested up until October, when the market started really going south.
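The two percentages above come from this arithmetic, assuming (as the paragraph implies) that the January balances simply tracked the S&P 500:

```latex
\begin{aligned}
\frac{1478.49 - 903.25}{1478.49} &= \frac{575.24}{1478.49} \approx 0.389 = 38.9\% \\
0.52 \times 0.389 &\approx 0.20 = 20\%\ \text{of a year's salary}
\end{aligned}
```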

In a weird way, I find this information comforting. I was beating myself up over not being able to make any headway with my net worth over an entire year, but I failed to realize just how strongly I was trying to swim against the current. The fact that my net worth didn’t move much over a year is actually a good thing; it could have been much worse. And with prices as they are right now, my new 401k and Roth IRA investments have a lot more purchasing power for the same amount of money. Perhaps I should care less about the net worth itself, and worry more about how many new investment shares I acquired over the course of the year. Someday, when the market bounces back, I will be glad I continued to invest even in a year like 2008.

What about you? How were you affected by the Economic Storm of 2008?