Introduction: Python and Word Lists

One of the challenges of working with large amounts of data in a program is doing it efficiently. In Python, importing the data could not be easier, but everything gets bogged down when you try to search for items inside it or modify it. In the next few examples I will provide several solutions to various problems, all working with a word list containing practically every word in the English language.

If you are new to Python - a great first language - everything can be downloaded from here 

Attached is the actual word list in a .txt file. There are 113,809 individual words, each on its own line.

When you run the program, the script and the word list need to be in the same folder; otherwise you will get an error to the effect of "no such file exists".

Step 1: Helpful Code Sections

Before we begin there are a couple of lines of code that you might find helpful for interpreting what is going on if Python is still new to you.

fin = open('words.txt') - This line opens the file (it can be any file).
for line in fin: - This line goes through the file one line at a time, up to and including the 113,809th (last) line.
word = line.strip() - This line removes the whitespace at the start and end of the line, including the newline character, so that only the characters of the word are stored in a string with the variable name word.
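Put together, those three lines might look like the sketch below. The function name read_words and the small stand-in "file" are my own additions for illustration, so that the example runs even without words.txt in the folder.

```python
import io

def read_words(lines):
    """Return a list of stripped, non-empty words from an iterable of lines."""
    words = []
    for line in lines:
        word = line.strip()  # drop the trailing newline and surrounding spaces
        if word:             # skip blank lines
            words.append(word)
    return words

# Stand-in for the real file (an assumption, just so the sketch runs anywhere).
sample_file = io.StringIO("apple\nlevel\nbanana\n")
sample_words = read_words(sample_file)
print(sample_words)

# With the real word list in the same folder as the script:
# with open('words.txt') as fin:
#     words = read_words(fin)
```

Using "with open(...)" is a common way to make sure the file is closed again once the loop is done.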

Step 2: Words With "E"

In this program the challenge was to find how many words have the letter "e" in them and how many do not. One thing to keep in mind is that some words contain "e" more than once, but each of those words should only be counted once.

This program runs through the word list line by line and tests whether "e" is in the word. If it is, it keeps checking through the rest of the word for any more. The number of "e"s found in the word is stored in a temporary variable. At the end of the word, if that temporary variable is greater than or equal to 1, the program adds 1 to the total number of words with "e", so a word with several "e"s is never counted twice.
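Here is a rough sketch of that counting idea. The function name count_e_words and the four sample words are made up for illustration; it is not the attached program itself.

```python
def count_e_words(words):
    """Return (words containing 'e', words without 'e')."""
    with_e = 0
    without_e = 0
    for word in words:
        e_count = 0  # temporary counter, reset for every word
        for letter in word:
            if letter == 'e':
                e_count += 1
        if e_count >= 1:
            with_e += 1      # counted once, however many "e"s there are
        else:
            without_e += 1
    return with_e, without_e

sample = ['eerie', 'python', 'letter', 'zip']
print(count_e_words(sample))
```

If you do not need the number of "e"s in each word, the test 'e' in word gives the same yes/no answer in a single step.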

Step 3: Palindrome

This program checks whether a word is a palindrome by comparing the first and last characters of the string. If they are the same, it works its way inward, pair by pair, until it has compared all the letters. If the number of letters is odd, the middle letter has no partner and is ignored.

If the word is a palindrome it returns True; otherwise it returns False.

This program only checks individual words. For an extra challenge, try modifying it so that it goes through the entire word list and finds all the palindromes.
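One way to write the outside-in comparison is sketched below. The name is_palindrome and the two index variables are my own choices for illustration.

```python
def is_palindrome(word):
    """Compare letters from both ends, working inward."""
    left = 0
    right = len(word) - 1
    while left < right:  # stops at the middle; an odd middle letter is skipped
        if word[left] != word[right]:
            return False
        left += 1
        right -= 1
    return True

print(is_palindrome('level'))
print(is_palindrome('python'))
```

A shorter alternative is to compare the word with its reverse, word == word[::-1], though that hides the inward-stepping logic.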

Step 4: Use Only Some/All of Provided Letters

In this program the challenge was to find all the words that could be made using only the supplied letters. Then, more for fun, we tried to form a complete sentence out of these words.

The words could use one, some, or all of the supplied letters. You can put in different combinations of letters to see which gives you the most or the fewest words.
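A minimal sketch of the "only these letters" test follows. The names uses_only and words_from_letters, and the sample words, are hypothetical.

```python
def uses_only(word, letters):
    """True if every letter of word appears somewhere in letters."""
    for letter in word:
        if letter not in letters:
            return False
    return True

def words_from_letters(words, letters):
    """Keep only the words that can be spelled with the supplied letters."""
    return [word for word in words if uses_only(word, letters)]

sample = ['hoe', 'alfalfa', 'flee', 'python']
print(words_from_letters(sample, 'acefhlo'))
```

Note that a supplied letter may be reused as often as the word needs it, which is why 'alfalfa' passes with a single "a" in the letter string.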

For a second, similar program we had to modify it so that it would only print words that contained each and every letter provided. In addition, if a letter was supplied twice, say two "a"s, then the word had to contain two "a"s as well, not one "a" and some other letter.
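For the second version, counting the letters is one way to handle repeats. This sketch assumes the rule is "the word must contain every supplied letter at least as many times as it was supplied"; the name uses_all is hypothetical, and Counter comes from Python's standard collections module.

```python
from collections import Counter

def uses_all(word, letters):
    """True if word contains every supplied letter, repeats included."""
    needed = Counter(letters)  # e.g. 'aan' -> a: 2, n: 1
    have = Counter(word)       # missing letters count as 0
    return all(have[ch] >= count for ch, count in needed.items())

print(uses_all('banana', 'aan'))  # has at least two "a"s and one "n"
print(uses_all('ban', 'aan'))     # only one "a"
```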

Step 5: Exclusion

In the exact opposite of the previous two, this program's purpose was to find all of the words that did not contain any of the provided letters.

What combination of letters do you think excludes the fewest words?
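The exclusion test can be sketched as the mirror image of the previous one. The names avoids and count_avoiding are hypothetical, as is the sample list.

```python
def avoids(word, forbidden):
    """True if word contains none of the forbidden letters."""
    for letter in word:
        if letter in forbidden:
            return False
    return True

def count_avoiding(words, forbidden):
    """Count how many words survive the exclusion."""
    return sum(1 for word in words if avoids(word, forbidden))

sample = ['rhythm', 'apple', 'sky', 'queue']
print(count_avoiding(sample, 'aeiou'))
```

Running count_avoiding over the full word list with different letter strings is a quick way to experiment with the question above.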

Step 6: Alphabetical Order

Now enough about characters in and out of words! For this program the challenge is to find all of the words that are spelled in alphabetical order. Out of 113,809 words, how many do you think are spelled in alphabetical order?

The way this program works is to look at each word and go through it character by character, comparing each character to the previous one, which is stored in a temporary variable. If the characters never go backward in the alphabet, the word is in alphabetical order and the program adds 1 to the running total.

You can actually compare characters directly in Python; for example, 'a' < 'b' is True. Or you can convert each character to a number with ord(): ord('a') is 97, ord('b') is 98, and so on.
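A sketch of the character-by-character comparison is below. The function name in_alphabetical_order is my own, and I am assuming that repeated letters (as in "bee") still count as alphabetical.

```python
def in_alphabetical_order(word):
    """True if no letter comes before the letter preceding it."""
    previous = ''  # the empty string sorts before every letter
    for letter in word:
        if letter < previous:  # direct character comparison
            return False
        previous = letter      # remember this letter for the next round
    return True

sample = ['almost', 'bee', 'python', 'chimps']
total = sum(1 for word in sample if in_alphabetical_order(word))
print(total)
print(ord('a'), ord('b'))
```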

Step 7: Rotate

You could think of this as a very basic encryption or decryption program. It takes each letter and moves it x places (a number that you define) along the alphabet, so "a" moved 2 places becomes "c".

Since we talked about how to convert a character to a number, in this program it seems only fitting to convert numbers back to characters. Use the chr(x) function; for example, chr(97) returns "a".

This program works by converting each character in a string to a number, adding the defined amount, and then converting it back into a character.
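The convert, add, convert-back idea might look like the sketch below. The names rotate_letter and rotate_word are hypothetical. Wrapping around from "z" back to "a" and leaving anything that is not a lowercase letter unchanged are my own assumptions, since the description above does not say what happens at the end of the alphabet.

```python
def rotate_letter(letter, shift):
    """Shift one lowercase letter along the alphabet, wrapping past 'z'."""
    if not ('a' <= letter <= 'z'):
        return letter  # assumption: leave other characters untouched
    position = ord(letter) - ord('a')    # 'a' -> 0 ... 'z' -> 25
    rotated = (position + shift) % 26    # wrap around the alphabet
    return chr(rotated + ord('a'))

def rotate_word(word, shift):
    """Rotate every letter of word by the same amount."""
    return ''.join(rotate_letter(letter, shift) for letter in word)

print(rotate_word('a', 2))
print(rotate_word('cheer', 7))
print(rotate_word('melon', -10))
```

Rotating by a negative amount undoes a rotation by the same positive amount, which is what makes this usable for both "encryption" and "decryption".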

Step 8: The Ultimate Project - Anagrams / Bingo Solver

This program, a compilation of all the previous ones, goes through the entire word list of 100,000+ words and groups the words according to the characters they are made up of. Amazingly, it takes Python only roughly 1.5 seconds to run the entire program. It then sorts the character sets by the number of anagrams each one forms, from greatest to least. For example, from 'islt' you can form 'list' and 'silt'.

Part 2 is to find which 8-character set forms the most anagrams, in order to see the maximum possible number of Bingos that can be formed from one set of letters. It turns out that 7 words can be formed from the winning set.

The most anagrams formed by any single character set is 11.
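One common way to group anagrams, which may or may not match the attached program, is to use each word's letters in sorted order as a dictionary key, so that every anagram of a word lands under the same key. The names group_anagrams and rank_anagram_sets and the optional length parameter are hypothetical.

```python
from collections import defaultdict

def group_anagrams(words):
    """Map each sorted-letter key to the list of words that share it."""
    groups = defaultdict(list)
    for word in words:
        key = ''.join(sorted(word))  # 'list' and 'silt' both give 'ilst'
        groups[key].append(word)
    return groups

def rank_anagram_sets(words, length=None):
    """Return anagram sets (2+ words), largest first, optionally one word length."""
    groups = group_anagrams(words)
    sets = [group for key, group in groups.items()
            if len(group) > 1 and (length is None or len(key) == length)]
    sets.sort(key=len, reverse=True)
    return sets

sample = ['list', 'silt', 'slit', 'tops', 'post', 'stop', 'spot', 'python']
for anagram_set in rank_anagram_sets(sample):
    print(len(anagram_set), anagram_set)
```

For Part 2, calling rank_anagram_sets(words, length=8) on the full list and looking at the first result would be one way to find the 8-letter set with the most Bingos.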

Comments

bertwert, 2013-12-16

Bigger word list (178 691 words):
http://www.freescrabbledictionary.com/twl06.txt

umursengul, 2013-02-04

These will save my life :) I did some before but definitely needed some for analyzing data! Thanks!

Hammock Boy, 2013-02-06

Glad to help. It is really amazing the way that Python can sort through text. Please let me know if you have any questions or need help.

amandaghassaei, 2012-10-29

nice, I think I've done some of these before too!
