One of the challenges of working with large amounts of data in a program is doing it efficiently. In Python, loading the data could not be easier, but everything gets bogged down when you try to work with it: searching for items inside it or modifying it. In these next few examples I will provide several solutions to various problems, all using a word list that contains practically every word in the English language.
If you are new to Python (a great first language), everything you need can be downloaded from here.
Attached is the word list as a .txt file. It contains 113,809 individual words, each on its own line.
When you run a program, the script and the word list need to be in the same folder (strictly speaking, in the folder you run Python from); otherwise you will get an error along the lines of "FileNotFoundError: No such file or directory".
Step 1: Helpful Code Sections
Before we begin, here are a few lines of code that may help you follow what is going on if Python is still new to you.
fin = open('words.txt') - This line opens the file for reading (it can be any text file).
for line in fin: - This line goes through the file one line at a time, until it reaches the last line.
word = line.strip() - This line removes the whitespace at the start and end of the line (including the newline character), so that only the characters of the word are stored in a string with the variable name word.
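Putting those three lines together, here is a minimal sketch of reading the whole list into a Python list. The function names read_words and load_words are hypothetical, and skipping blank lines is an extra safety step not mentioned above. The demo uses an in-memory file (io.StringIO) in place of words.txt so it runs anywhere.

```python
import io

def read_words(fin):
    """Return a list of words from an open file (or any iterable of lines)."""
    words = []
    for line in fin:              # one line at a time, until the end of the file
        word = line.strip()       # drop the trailing newline and surrounding spaces
        if word:                  # skip blank lines, if there are any
            words.append(word)
    return words

def load_words(path="words.txt"):
    """Read the word list, assumed to be in the folder Python is run from."""
    with open(path) as fin:       # 'with' closes the file automatically
        return read_words(fin)

# Demo: an in-memory file stands in for words.txt.
sample = io.StringIO("aa\naah\naahed\n")
print(read_words(sample))  # ['aa', 'aah', 'aahed']
```

With the real file in place, words = load_words() gives you a list you can pass to any of the sketches in the following steps.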
Step 2: Words With "E"
In this program the challenge was to find how many words do and do not contain the letter "e". Keep in mind that some words contain "e" more than once, but each of those words should only be counted once.
This program runs through the word list line by line and tests whether "e" is in the word. If it is, it keeps checking the rest of the word for more "e"s. The number of "e"s is stored in a temporary variable. At the end of the word, only if that temporary variable is greater than or equal to 1 does it add 1 to the total number of words with "e", which avoids counting any word twice.
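Here is a minimal sketch of that counting approach. The function name count_e_words is hypothetical; it takes any list of words, such as one read from words.txt. It keeps a temporary counter for each word, as described above, and only adds 1 to the total when that counter is at least 1.

```python
def count_e_words(words):
    """Return (words containing 'e', words without 'e')."""
    with_e = 0
    without_e = 0
    for word in words:
        e_count = 0               # temporary counter for this word only
        for letter in word:
            if letter == "e":
                e_count += 1
        if e_count >= 1:          # count the word once, however many e's it has
            with_e += 1
        else:
            without_e += 1
    return with_e, without_e

print(count_e_words(["tree", "cat", "eye", "dog"]))  # (2, 2)
```

If you only need the yes/no answer, the inner loop can be replaced by the test "e" in word, which also counts each word exactly once.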
Step 3: Palindrome
This program checks whether a word is a palindrome by comparing the first and last characters of the string. If they are the same, it works its way inward until it has compared every pair of letters; if the word has an odd number of letters, the middle letter is ignored.
If the word is a palindrome, the program returns True; otherwise, it returns False.
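A minimal sketch of that inward comparison, using two index variables that move toward the middle (the name is_palindrome is just an illustrative choice):

```python
def is_palindrome(word):
    """Compare characters from both ends, moving inward."""
    i = 0                         # index of the left character
    j = len(word) - 1             # index of the right character
    while i < j:                  # stops before the middle letter of odd-length words
        if word[i] != word[j]:
            return False          # a mismatch means it is not a palindrome
        i += 1
        j -= 1
    return True

print(is_palindrome("noon"), is_palindrome("list"))  # True False
```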
This program only checks individual words. For an extra challenge, try modifying it so that it goes through the entire word list and finds all the words that are palindromes.
Step 4: Use Only Some/All of Provided Letters
In this program the challenge was to find all the words that can be made using only the supplied letters. Then, just for fun, we tried to form a complete sentence out of these words.
The words could use one, some, or all of the supplied letters. You can try different combinations of letters to see which gives you the most or the fewest words.
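Here is a minimal sketch of the first program. The names uses_only and words_using_only are hypothetical, and it assumes a supplied letter may be used more than once in a word (so "hello" counts as made from "ehlo").

```python
def uses_only(word, letters):
    """True if every letter in word is one of the supplied letters."""
    for letter in word:
        if letter not in letters:
            return False          # found a letter that was not supplied
    return True

def words_using_only(words, letters):
    return [word for word in words if uses_only(word, letters)]

print(words_using_only(["ace", "hello", "bat"], "acehlo"))  # ['ace', 'hello']
```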
For a second, similar program, we had to modify it so that it would only print words that contained each and every letter provided. However, if the supplied letters included two "a"s, then the word had to contain two "a"s as well, not one "a" plus some other letter.
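One way to handle repeated letters is to count them with collections.Counter from the standard library. This is a sketch, not necessarily the original code; the names uses_all and words_using_all are hypothetical, and it assumes the word needs at least as many copies of each letter as were supplied.

```python
from collections import Counter

def uses_all(word, letters):
    """True if word has every supplied letter, at least as many times as supplied."""
    need = Counter(letters)       # e.g. Counter("aab") -> {'a': 2, 'b': 1}
    have = Counter(word)          # missing letters count as 0
    return all(have[letter] >= n for letter, n in need.items())

def words_using_all(words, letters):
    return [word for word in words if uses_all(word, letters)]

print(words_using_all(["banana", "bat", "abba"], "aab"))  # ['banana', 'abba']
```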
Step 5: Exclusion
As the exact opposite of the previous two programs, this program's purpose was to find all of the words that do not contain any of the provided letters.
Which combination of letters do you think excludes the fewest words?
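A minimal sketch of the exclusion test (the names avoids and words_avoiding are hypothetical):

```python
def avoids(word, forbidden):
    """True if word contains none of the forbidden letters."""
    for letter in word:
        if letter in forbidden:
            return False          # one forbidden letter is enough to exclude the word
    return True

def words_avoiding(words, forbidden):
    return [word for word in words if avoids(word, forbidden)]

print(words_avoiding(["cat", "dog", "bird"], "o"))  # ['cat', 'bird']
```

To explore the question above, you could compare len(words_avoiding(words, letters)) for different letter combinations.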
Step 6: Alphabetical Order
Now enough about characters in and out of words! For this program, the challenge is to find all of the words that are spelled in alphabetical order. Out of 113,809 words, how many do you think are spelled in alphabetical order?
The program looks at each word and goes through it character by character, comparing each character with the previous one, which is stored in a temporary variable. If every character is in order, it adds 1 to the running total.
You can compare characters directly in Python; for example, 'a' < 'b' is True. Alternatively, you can convert a character to a number with ord(): ord('a') is 97, ord('b') is 98, and so on.
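A minimal sketch of the character-by-character check. The names is_alphabetical and count_alphabetical are hypothetical, and it assumes doubled letters (as in "abbey") still count as alphabetical order; change < to <= if you want to reject them.

```python
def is_alphabetical(word):
    """True if each letter is the same as or later than the one before it."""
    previous = ""                 # temporary variable; "" sorts before every letter
    for letter in word:
        if letter < previous:
            return False          # this letter comes earlier than the last one
        previous = letter
    return True

def count_alphabetical(words):
    total = 0
    for word in words:
        if is_alphabetical(word):
            total += 1
    return total

print(count_alphabetical(["abbey", "almost", "list"]))  # 2
```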
Step 7: Rotate
You could think of this as a very basic encryption or decryption program. It shifts each letter a number of places (which you choose) along the alphabet, so "a" shifted 2 places becomes "c".
Since the previous step showed how to convert a character to a number, it seems only fitting that this program converts numbers back to characters. Use the chr() function; for example, chr(97) is "a".
This program works by converting each character in a string to a number, adding the chosen amount, and then converting it back into a character.
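Here is a minimal sketch using ord() and chr(). The names rotate_letter and rotate_word are hypothetical, and it assumes two details the text does not spell out: letters wrap around from "z" back to "a" (using % 26), and anything that is not an ASCII letter is left unchanged.

```python
def rotate_letter(letter, n):
    """Shift one ASCII letter n places, wrapping around the alphabet."""
    if "a" <= letter <= "z":
        start = ord("a")
    elif "A" <= letter <= "Z":
        start = ord("A")
    else:
        return letter             # leave spaces, digits, punctuation alone
    # convert to 0-25, shift, wrap with % 26, then convert back to a character
    return chr((ord(letter) - start + n) % 26 + start)

def rotate_word(word, n):
    return "".join(rotate_letter(letter, n) for letter in word)

print(rotate_word("cheer", 7))   # jolly
print(rotate_word("jolly", -7))  # cheer
```

A negative shift undoes a positive one, so the same function works for decryption.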
Step 8: The Ultimate Project - Anagrams / Bingo Solver
This program, a compilation of all the previous ones, goes through the entire word list (100,000+ words) and groups the words according to the characters they are made of. Amazingly, it only takes Python roughly 1.5 seconds to run the entire program. It then sorts the groups by the number of anagrams per character set, from greatest to least. For example, from the letters 'islt' you can form 'list' and 'silt'.
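One common way to group anagrams (a sketch, not necessarily the author's exact code) is to use each word's letters in sorted order as a dictionary key, so 'list' and 'silt' both map to 'ilst'. The names anagram_groups and sorted_anagram_sets are hypothetical; defaultdict comes from the standard collections module.

```python
from collections import defaultdict

def anagram_groups(words):
    """Map each sorted-letter key (e.g. 'ilst') to the words made of those letters."""
    groups = defaultdict(list)
    for word in words:
        key = "".join(sorted(word))   # 'list' -> 'ilst'
        groups[key].append(word)
    return groups

def sorted_anagram_sets(words):
    """Return (key, words) pairs with 2+ anagrams, largest groups first."""
    groups = anagram_groups(words)
    sets = [(key, group) for key, group in groups.items() if len(group) > 1]
    sets.sort(key=lambda item: len(item[1]), reverse=True)
    return sets

print(sorted_anagram_sets(["list", "silt", "slit", "cat", "act", "dog"]))
# [('ilst', ['list', 'silt', 'slit']), ('act', ['cat', 'act'])]
```

Because each word is visited once and dictionary lookups are fast, this kind of approach scales comfortably to the full word list.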
Part 2 is to find which set of 8 letters forms the most anagrams, which tells you the maximum possible number of bingos that can be formed. (In Scrabble, a bingo is a play that uses all seven tiles from your rack; together with one letter already on the board, that makes an eight-letter word.) It turns out the best set can form 7 different words.
The largest number of anagrams for any single set of letters is 11.
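For the bingo part, the same grouping idea can be restricted to 8-letter words. This is a sketch with a hypothetical function name, best_bingo_sets; the small word list in the demo is only for illustration and is not the result from the full list.

```python
from collections import defaultdict

def best_bingo_sets(words, length=8):
    """Return (largest group size, [(key, words), ...]) for words of the given length."""
    groups = defaultdict(list)
    for word in words:
        if len(word) == length:
            groups["".join(sorted(word))].append(word)
    if not groups:
        return 0, []
    most = max(len(group) for group in groups.values())
    # keep every letter set that ties for the most anagrams
    return most, [(key, group) for key, group in groups.items() if len(group) == most]

print(best_bingo_sets(["entrails", "latrines", "ratlines", "notebook", "list"]))
# (3, [('aeilnrst', ['entrails', 'latrines', 'ratlines'])])
```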
Participated in the Hack It! Contest