How to Make an A.I. Part 2 - Code Modules

This is part 2 of a series about the steps I took to build an AI on a Windows computer, using a free database, a free programming development tool and the free built-in text-to-speech (TTS) engine that comes with Windows.

The word "Windows" belongs to Microsoft.

The word "Dragon" belongs to Nuance.

Step 1: Pick a Programming Language and Get Some Tools

There are many programming languages, and some are specialized for AI. My favorite is Visual Basic, so that is what I used. I also work with SQL Server databases, so I used SQL Server as well.

You can download free versions of both from the Microsoft website. Just search for "Express" on the Microsoft website (Visual Studio Express and SQL Server Express).

Other languages you may want to use include Python, C#, C++, Java, Prolog, Lisp, IPL and many others. AIML (Artificial Intelligence Markup Language) is a "markup language" that is very interesting.

I wanted a better speech recognition program than the one that comes with Windows, so I bought the Dragon software. For text-to-speech, I am using the standard engine that comes with Windows.

Step 2: Design Your System

Divide your big project into a bunch of smaller projects. I divided my program code into modules so that a particular function is easier to find.

I have modules named "Process Input", "Process AI", "Process Output", "User Interface" and a few others. Some of my functions need to be accessible to all of the other code modules, so I put those functions in a "Common" module that every other module can use.
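Here is a rough Python sketch of how those modules might hand text to each other. My real program is in Visual Basic and each module is a separate file; everything here (the function names, the tiny lookup dictionary and the replies) is made up for illustration and kept in one file so it runs on its own.

```python
from datetime import datetime

# --- "Common" module: helpers every other module can use ---
def normalize(text):
    """Lower-case the text and strip a few punctuation marks."""
    text = text.lower()
    for mark in ".?,!":
        text = text.replace(mark, "")
    return text.strip()

# --- "Process Input": reduce many phrasings to one meaning ---
INPUT_LOOKUP = {"hello": "Greeting", "hi": "Greeting", "what time is it": "Query Time"}

def process_input(text):
    return INPUT_LOOKUP.get(normalize(text), "Unknown")

# --- "Process AI": decide how to respond to that meaning ---
def process_ai(meaning):
    return meaning if meaning in ("Greeting", "Query Time") else "Don't Know"

# --- "Process Output": turn the decision into words for the user ---
def process_output(decision):
    if decision == "Greeting":
        return "Hello."
    if decision == "Query Time":
        return "It is " + datetime.now().strftime("%I:%M %p") + "."
    return "I don't know."

def respond(text):
    """Run one line of user input through the whole pipeline."""
    return process_output(process_ai(process_input(text)))

print(respond("Hello!"))
```

Because each module only passes a short string to the next one, you can rewrite one module without touching the others.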

Step 3: Functions That Are Built Into the Programming Language

Different languages may have different names for these, but most high-level languages have similar functions.

LCase or ToLower: Converts a string to all lower case. I convert everything to lower case before doing a database search, even though most searches are case-insensitive, just in case.

Replace: Replaces one string inside another string with a different string. You can replace a string with an empty string ("") to get rid of it. I use this to get rid of periods, question marks, commas and other punctuation marks.

Split: Splits a string into individual pieces and puts them into an array. This function will split a string on any character, called the "delimiter". I split a sentence on the space character (" ") to make an array of words. The AI gurus call this "tokenizing".
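In Python, the same three built-in functions are the string methods `lower()`, `replace()` and `split()`. This short sketch shows each one on a sample sentence; the sentence itself is just an example.

```python
sentence = "What did the Chicken do?"

lowered = sentence.lower()          # LCase / ToLower
cleaned = lowered.replace("?", "")  # Replace "?" with an empty string
words = cleaned.split(" ")          # Split on the space character

print(words)  # ['what', 'did', 'the', 'chicken', 'do']
```

One thing to watch: splitting on a single space gives you empty strings when the user types two spaces in a row. In Python, calling `split()` with no argument splits on any run of whitespace instead.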

I use the individual words to build queries that search the database. (More on this in my next article.)

Step 4: Combine Built-In Functions to Create Your Own Functions

My example for this step was written in Visual Basic. Use your own programming language to build something like it.

Of course, you will need to write a lot of code, and build many functions, using the programming language of your choice.
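As a sketch of what "combining built-in functions" can look like, here is a hypothetical `tokenize` function in Python that lower-cases the text, removes punctuation with repeated replaces and then splits it into words. It strips apostrophes too, so "Can't" becomes "cant"; I am assuming database text is stored that way, which is why the Output Processor later puts the apostrophes back.

```python
import string

def tokenize(text):
    """Combine ToLower, Replace and Split into one reusable function."""
    text = text.lower()
    # Replace every ASCII punctuation mark with an empty string.
    for mark in string.punctuation:
        text = text.replace(mark, "")
    # Split on any whitespace, which also skips doubled spaces.
    return text.split()

print(tokenize("Can't you tell me the time, please?"))
```

Once this function lives in the "Common" module, every other module can call it instead of repeating the same three steps.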

Step 5: What Do the Modules Do? “Input Processor”

There might be a hundred different ways to ask the AI the same question. For example: "What time is it?", "Do you have the time?", "Do you know what time it is?", "Can you tell me the current time of day?" Since the user is just asking for the time, I convert any of these inputs to a single output called "Query Time" using a database "look up" table.

You can write code to loop through a table until it finds a match, or if you are using a SQL database, you can write a SQL query like this:

SELECT Output FROM TableName WHERE Input = 'whatever the user said'

Then I send the output, "Query Time", to the next code module, "Process AI".
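Here is a minimal Python sketch of that look-up table. It uses an in-memory SQLite database as a stand-in for SQL Server, and the table name `InputLookup` is made up. The `?` placeholder passes the user's words to the database as a parameter instead of pasting them into the SQL string, which protects against SQL injection; most database libraries, including the ones for SQL Server, support parameters like this.

```python
import sqlite3

# In-memory SQLite database standing in for a SQL Server look-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE InputLookup (Input TEXT PRIMARY KEY, Output TEXT)")
conn.executemany(
    "INSERT INTO InputLookup (Input, Output) VALUES (?, ?)",
    [
        ("what time is it", "Query Time"),
        ("do you have the time", "Query Time"),
        ("do you know what time it is", "Query Time"),
        ("can you tell me the current time of day", "Query Time"),
    ],
)

def look_up(normalized_input):
    """Return the single Output for an input, or None if there is no match."""
    row = conn.execute(
        "SELECT Output FROM InputLookup WHERE Input = ?", (normalized_input,)
    ).fetchone()
    return row[0] if row else None

print(look_up("do you have the time"))  # Query Time
```

The input has to be normalized first (lower case, no punctuation) so that it matches the way the rows were stored.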

Besides questions, there are many ways to say "Hello":

Hi, Hello, what's up, hey, hola, how ya doin'?, greetings, welcome, salutations, howdy...

All of these are reduced to "Greeting".

When the AI processor sees “Greeting” it sends “Greeting” to the output processor, which will pick a random greeting from a database table, and speak it out loud.
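A small Python sketch of the greeting path follows. The sets and lists here stand in for database tables, and the function names and reply wording are hypothetical. The greeting inputs are written without apostrophes because they are assumed to be normalized already.

```python
import random

# Stand-ins for the database tables (contents are examples only).
GREETING_INPUTS = {"hi", "hello", "whats up", "hey", "hola", "how ya doin",
                   "greetings", "welcome", "salutations", "howdy"}
GREETING_REPLIES = ["Hello!", "Hi there.", "Greetings.", "Howdy!"]

def reduce_input(normalized):
    """Input processor: every way of saying hello becomes "Greeting"."""
    return "Greeting" if normalized in GREETING_INPUTS else normalized

def process_ai(token):
    """AI processor: a greeting is passed straight through to the output."""
    return "Greeting" if token == "Greeting" else "Unknown"

def process_output(token, rng=random):
    """Output processor: pick a random greeting to speak."""
    if token == "Greeting":
        return rng.choice(GREETING_REPLIES)
    return "I don't know."

print(process_output(process_ai(reduce_input("howdy"))))
```

Passing in a seeded `random.Random` object as `rng` makes the choice repeatable, which helps when testing.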

Step 6: “AI Processor”

Process AI is the largest code module. It is so big that I divided it into sections as well.

First, the input is checked to see whether the user spoke a command or asked a question. The AI can also be in one of several "Modes", in which the "Process AI" code expects the user to ANSWER a question instead of ASKING one.
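One way to picture the modes is a small state machine. In this hypothetical Python sketch, the command "learn my name" makes the AI ask a question and switch into a "waiting_for_name" mode, so the next input is treated as an answer. The mode name, the commands and the replies are all made up for illustration.

```python
class AIProcessor:
    """Sketch of command and mode checks (names are hypothetical)."""

    def __init__(self):
        self.mode = None        # None means normal mode
        self.user_name = None

    def handle(self, text):
        # 1. In a special mode, the input is an ANSWER, not a new question.
        if self.mode == "waiting_for_name":
            self.mode = None
            self.user_name = text.title()
            return f"Nice to meet you, {self.user_name}."
        # 2. Otherwise, check whether the user spoke a command.
        if text == "learn my name":
            self.mode = "waiting_for_name"
            return "What is your name?"
        if text == "stop listening":
            return "Okay, I will stop listening."
        # 3. Not a command and not in a mode: treat it as a question.
        return self.answer_question(text)

    def answer_question(self, text):
        # Placeholder for the query-and-score search in the next paragraph.
        return "I don't know."

ai = AIProcessor()
print(ai.handle("learn my name"))  # What is your name?
print(ai.handle("ada"))            # Nice to meet you, Ada.
```

Checking the mode first matters: otherwise an answer like "stop listening" could be mistaken for a command.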

If the user did not speak a command and the AI is not in a special "Mode", then it builds and executes a set of queries from combinations of the words in the "words array". All of the query results are stored in a table, and each result is given a "score" for how closely it matches what the user said. The table is sorted by score, and the result with the highest score is sent to the output if it exceeds a certain threshold. If all of the scores are below the threshold, the AI may respond with "I don't know" or "That does not compute".
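The scoring idea can be sketched in Python like this. I have not described my exact scoring formula, so this sketch assumes a simple one: a result's score is the share of the user's words used by the largest word combination that found it. The list of facts stands in for the database, `run_query` stands in for one SQL query, and the 0.3 threshold is an arbitrary example.

```python
from itertools import combinations

# Hypothetical stand-in for the database of facts.
FACTS = [
    "the chicken crossed the road",
    "the cow jumped over the moon",
    "a chicken laid an egg",
]

def run_query(combo, facts):
    """Stand-in for one database query: facts containing every word in combo."""
    return [fact for fact in facts if set(combo) <= set(fact.split())]

def best_reply(words, facts, threshold=0.3):
    scores = {}  # result -> best score, like an "output and scores" table
    for size in range(1, len(words) + 1):
        for combo in combinations(words, size):
            for fact in run_query(combo, facts):
                # Assumed scoring rule: share of the user's words this query used.
                score = size / len(words)
                scores[fact] = max(scores.get(fact, 0.0), score)
    # Sort the table by score, highest first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if ranked and ranked[0][1] >= threshold:
        return ranked[0][0]
    return "I don't know"

print(best_reply(["what", "did", "the", "chicken", "do"], FACTS))
```

The number of word combinations doubles with every extra word (a 20-word sentence has over a million), so a real program would limit the combination size or drop common words like "the" before building queries.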

Step 7: The "Output and Scores" Table

The AI’s output from my input “What did the chicken do?”

Step 8: “Output Processor”

This module does several seemingly unrelated things, but they all have to do with getting the text from the AI Processor to the user.

Here is a list:

1. Text from the database may be in all lower case and have no punctuation. Subroutines capitalize the first letter and put a period or question mark at the end.

2. Another subroutine puts apostrophes back into contractions (e.g. "cant" becomes "can't"), or converts contractions into full words (e.g. "cant" becomes "cannot").

3. The text-to-speech engine does not pronounce some words the way I like, so the "Output Processor" replaces those words with a phonetic spelling. I have database "look up" tables to hold these, similar to the one in the "Input Processor".

4. If the AI does not find a suitable reply in the database, it can say "I don't know", but I don't want it to say this over and over and over. Real people vary their responses. So there is a table of "Common Output" phrases and a function that picks one at random (and never picks the same one twice in a row).

5. The free text-to-speech (TTS) engine does not give a programmer many options for the way sentences are spoken, but you do have a little control over the pitch and the speed of phonemes. The term for this is "prosody". I added some "prosody" codes into the text in my database, and when the "Output Processor" sees these, it adjusts the pitch and speed in the TTS engine as each word is spoken.

6. Sometimes the TTS is just hard to understand, so besides speaking words out loud, I also display them in big letters on my computer screen. This part of the "User Interface" is a grid that shows the last 6 lines of a conversation (user input and AI output) and scrolls up as new lines are added.
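Items 1, 2, 3, 4 and 6 above can be sketched in Python as a handful of small functions. The dictionaries and lists stand in for database tables and their contents are examples only. The rule for choosing a question mark (the sentence starts with a question word) is my assumption for this sketch. Item 5 is left out because it depends on the TTS engine.

```python
import random
from collections import deque

# Hypothetical look-up tables standing in for database tables.
CONTRACTIONS = {"cant": "can't", "dont": "don't", "im": "I'm"}
PRONUNCIATIONS = {"sql": "sequel"}  # word -> phonetic spelling for the TTS
COMMON_OUTPUT = ["I don't know.", "That does not compute.", "I have no idea."]
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "do", "can", "is"}

def tidy_sentence(text):
    """Item 1: capitalize the first letter and add end punctuation."""
    text = text.strip()
    if not text:
        return text
    if text[-1] not in ".?!":
        # Assumed rule: a leading question word means a question mark.
        text += "?" if text.split()[0].lower() in QUESTION_WORDS else "."
    return text[0].upper() + text[1:]

def restore_contractions(text):
    """Item 2: put apostrophes back into contractions."""
    return " ".join(CONTRACTIONS.get(w, w) for w in text.split())

def apply_pronunciations(text):
    """Item 3: swap hard-to-pronounce words, for the spoken text only."""
    return " ".join(PRONUNCIATIONS.get(w.lower(), w) for w in text.split())

class PhrasePicker:
    """Item 4: pick a random phrase, never the same one twice in a row."""
    def __init__(self, phrases, rng=None):
        self.phrases = list(phrases)
        self.rng = rng or random.Random()
        self.last = None

    def pick(self):
        choices = [p for p in self.phrases if p != self.last] or self.phrases
        self.last = self.rng.choice(choices)
        return self.last

class ConversationLog:
    """Item 6: keep only the last 6 lines, like the scrolling grid."""
    def __init__(self, size=6):
        self.lines = deque(maxlen=size)  # oldest line drops off automatically

    def add(self, speaker, text):
        self.lines.append(f"{speaker}: {text}")

print(tidy_sentence(restore_contractions("i cant find that")))  # I can't find that.
```

Keeping the phonetic spellings out of the displayed text means the grid shows normal words while the TTS engine hears the spelling it pronounces best.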