Introduction: Build a Chatbot Bear

About: I was acceptable in the 80's

Overview

This Instructable covers my attempts to write a chatbot C++ library for the Raspberry Pi.

What makes this different from all the other Eliza-style chatbots is the use of a database to hold information, SQL to retrieve that information, and modifiers to customise the replies to the subject being discussed (sort of).

Supplies

Raspberry Pi. I'm using a 3B model

Geany IDE or similar

Compile command = g++ -Wall -c "%f" -lwiringPi -lsqlite3 -pthread

Build command = g++ -Wall -o "%e" "%f" -lwiringPi -lsqlite3 -pthread

SQLite or similar, but you're on your own with connecting it to C++ if you don't use SQLite

sudo apt-get install sqlite3

sudo apt-get install sqlitebrowser

and

sudo apt-get install libsqlite3-dev

for the SQLite headers and library needed when compiling the C++ code

Pico2wave speech app (from the libttspico-utils package)

wget -q https://ftp-master.debian.org/keys/release-10.asc -O- | sudo apt-key add -
echo "deb http://deb.debian.org/debian buster non-free" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install libttspico-utils

Step 1: Introduction

Bobbs (https://www.instructables.com/id/Build-a-Better-Be...) was cool but his intellect was limited by his Arduino brain. At the end of his Instructable I mentioned a program called Eliza, written by some dude in the 60's (https://en.wikipedia.org/wiki/ELIZA), and I was thinking how cool it would be to have a real (not imaginary) two-way conversation with a toy or robot or whatever.

This is the result. Bobbs now has a counterpart called Bits, and because she talks a lot, she's female.

As an overview, I'm using C++ and SQL to interrogate a database: one table holds a dictionary, and answers or questions are then selected from another. Using a database avoids a heap of hard coding. I haven't seen this approach used before, so maybe this is something new.

Currently Bits has a vocabulary of 650+ words (more than some people) and can choose from over 500 replies (also more than some people). She also has opinions on some subjects, will ask questions, and will say things that might not be true.

Much like the original Eliza program, I'm not claiming Bits has any Artificial Intelligence; it's all done by luck and picking predetermined replies. But it's not too shabby.

I understand that the lexicon, replies and questions I'm using won't be helpful to everyone, therefore I'm going to try to explain how it works and allow you to make your own modifications. Incidentally, if anyone objects to the opinions contained within the lexicon, keep it to yourself and change the files.

Some screen dumps are included which show the conversations I've been having with a stuffed toy. That looks a bit mad now it's written down...

Limitations

Bits works best with short, simple sentences.

Bits will answer questions such as 'what do you think about?' but as she doesn't have any knowledge (also like some people), you can't ask 'where is London?'.

On occasion the user might have to lead the conversation, but that sometimes happens in real life too.

Step 2: How It Works

Bits determines what to reply based on its analysis of a string entered by the user. Where there are several possible answers, it randomly selects one. To extend the conversation, a question to the user may also be chosen.

There are three main processes:

1) Understand the input

2) Choose a reply

3) Customise the reply and output it.

These are the steps Bits follows:

1) Get the user input as a string.

2) Prepare the user input by converting it to lower case and keeping only letters and 'space' characters.

3) Search the lexicon to find any predetermined phrases and known words in the user text.

Lexicon data format.

The lexicon is Bits's dictionary. It contains a list of known words, each word's type (verb, subject, etc.), an opinion (good, bad, mid, boring, interesting), an alias where applicable, and a spare column. I should point out that while I have a reasonable grasp of English, the submissive clause is clearly something to do with S&M, and prepositions are just boring. I won't be using any of these terms and hope my English teacher is turning in his grave.

Order is important in the lexicon. For example, if we're looking for the phrase 'do you like' in the user's input 'do you like music', the phrase needs to come before 'like' in the lexicon. The code starts searching at the first record, and if it finds 'like' in the user's input first, it records a match and stops looking. This would give the wrong reply later in the code.

As we know, English words can be spelt the same but have different meanings depending on the context in which they're used. In the sentence 'I hate running', from what I remember about English, 'running' is the subject, yet it can also be a verb. Because the lexicon holds only one entry per word, it can only hold one instance of 'running', so this sort of thing can be a problem.

I'm using phrases in the lexicon as the code is based on a version of the original Eliza program, and goodness knows how to make 'do' and 'you' and 'like' equal a question. I expect it's quicker not to bother.

Aliases are used to limit the number of replies required. For example, 'great' has more or less the same meaning as 'good', therefore we don't need separate replies for both, only one for 'good'.

Opinions are used to select the reply. A good opinion selects one set of answers, a bad opinion another.

NOTE: change the column headers at your peril. The code will have to be updated as well.

The process continues below:

For the entire user input

Open the lexicon table and read each record. Get the lexicon word.

Convert the word to lower case and add 'space' characters to the start and end.

Use string.find(lexicon word) to find any instances of the lexicon word.

If found, check whether its type is 'alias'.

If it is 'alias', replace the word in the user input with the alias.

Otherwise, assign the word or phrase by type (e.g. subjects, adjectives, verbs, phrases), and remove it from the user text.

Start again.

When the end of the lexicon is reached and no more words are found, record the remaining user input as an unknown word.

Any unknown words are then checked to see if they are plurals. If they are, they are converted to the singular form and the lexicon is searched again.
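The search loop above can be sketched without the database by holding the lexicon in a std::vector, in table order. This is an illustrative sketch, not the author's code: LexEntry, trimSpaces and lexiconSearchSketch are hypothetical names, and instead of filling the global subject/verb structs it simply returns the recognised rows. It shows the space padding, the alias swap, and why the first matching row wins.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one row of the lexicon table.
struct LexEntry
{
    std::string word;   // word or phrase, already lower case
    std::string type;   // "subject", "verb", "eliza", "alias", ...
    std::string alias;  // replacement word, used only when type == "alias"
};

// Strip leading and trailing spaces.
std::string trimSpaces(const std::string& s)
{
    std::size_t first = s.find_first_not_of(' ');
    if (first == std::string::npos)
        return "";
    std::size_t last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Scan the lexicon in table order and restart after every hit, so the first
// matching row always wins: longer phrases must sit above the words inside them.
// Recognised rows are returned; whatever is left over ends up in 'unknown'.
std::vector<LexEntry> lexiconSearchSketch(std::string text,
                                          const std::vector<LexEntry>& lexicon,
                                          std::string& unknown)
{
    std::vector<LexEntry> recognised;
    // cap the restarts so an alias that maps back to itself can't loop forever
    for (int pass = 0; pass < 100; pass++)
    {
        std::string padded = " " + text + " ";
        bool matched = false;
        for (const LexEntry& entry : lexicon)
        {
            // pad with spaces so "and" is not found inside "candy"
            std::string key = " " + entry.word + " ";
            std::size_t pos = padded.find(key);
            if (pos == std::string::npos)
                continue;
            if (entry.type == "alias")
            {
                padded.replace(pos, key.length(), " " + entry.alias + " ");  // swap in the alias
            }
            else
            {
                recognised.push_back(entry);
                padded.replace(pos, key.length(), " ");  // remove the recognised word
            }
            text = trimSpaces(padded);
            matched = true;
            break;  // start again from the top of the lexicon
        }
        if (!matched)
            break;
    }
    unknown = trimSpaces(text);
    return recognised;
}
```

With a lexicon of 'do you like', 'like', 'great' (alias for 'good'), 'good' and 'music', the input 'music is great' comes back as 'good' then 'music', with 'is' left over as the unknown word.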

4) Now we need to get a reply from the 'Replies' file. The obvious is sometimes the way forward.

Replies file format

The file is organised into Word and type fields. The type field is used as a lookup which returns the text in the word column.

I'm creating this to be used by the Pico2wave application so some of the content is spelt 'fonetically'.

Suppose we had a user input of 'do you like music'. The lexicon search (step 3) would identify 'do you like' as a question, and 'music' as a subject with a 'mid' opinion. The search parameter in the SQL command would be

Select * from replies

Where type = 'do you like mid'

Looking at the replies file, there's a choice of two possible answers, and we only need one. The next bit of SQL picks one at random:

Order by random()

Limit 1

If a suitable reply can't be found then a default response is returned.
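In plain C++ terms, 'order by random() limit 1' means: collect every reply whose type matches and pick one of them uniformly at random. A minimal sketch, assuming the replies are held in a hypothetical std::multimap keyed on the type column rather than in SQLite, and that the caller supplies the default response:

```cpp
#include <iterator>
#include <map>
#include <random>
#include <string>

// Hypothetical in-memory version of:
//   select * from replies where type = '<type>' order by random() limit 1
std::string pickReply(const std::multimap<std::string, std::string>& replies,
                      const std::string& type,
                      const std::string& defaultReply,
                      std::mt19937& rng)
{
    auto range = replies.equal_range(type);  // every row with this type
    long matches = static_cast<long>(std::distance(range.first, range.second));
    if (matches == 0)
        return defaultReply;  // no suitable reply: fall back to the default

    std::uniform_int_distribution<long> pick(0, matches - 1);
    auto it = range.first;
    std::advance(it, pick(rng));  // jump to the randomly chosen row
    return it->second;
}
```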

5) Modify the response. The original version of Eliza I was looking at replied in general terms ('why would it be good', 'do you like that', etc.). By modifying the answer from the previous step we can customise the response so that Bits appears to be paying attention, and get results like 'why do you like ACDC?'. The answer is because they rock, if anyone was wondering.

I have a number of modifiers identified by the '#' prefix. The adjustAnswer function scans through the result of the previous step for the # character, then replaces it, plus the next three characters, with the subject or whatever the code asks for. Some of the codes I'm using are:

#tal - an anecdote about the subject

#sub - the subject

#vrb - the verb

#jok - a joke

#hob - a hobby

#ins - an insult

#mus - a word or phrase relating to music

An important code is #qxx (where xx varies, e.g. #qst or #qr2), which refers to questions.
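The substitution step can be sketched as a loop that finds a '#', reads the three-letter code after it and splices in the matching text. Assumptions: expandModifiers is a hypothetical name, the replacement text comes from a std::map instead of the lexicon and replies tables, unknown codes are simply deleted, and the loop is capped at seven passes (the same limit adjustAnswer uses) so a bad reply can't loop forever.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical sketch: replace each "#xxx" code with text looked up in 'values'.
std::string expandModifiers(std::string draft,
                            const std::map<std::string, std::string>& values)
{
    for (int pass = 0; pass < 7; pass++)  // hard limit, as in adjustAnswer
    {
        std::size_t found = draft.find('#');
        if (found == std::string::npos)
            break;  // no modifiers left

        std::string mod = draft.substr(found + 1, 3);  // e.g. "sub"
        draft.erase(found, 1 + mod.length());          // remove "#sub" from the text

        auto it = values.find(mod);
        if (it != values.end())
            draft.insert(found, it->second);  // splice in the replacement
    }
    return draft;
}
```

For example, with sub mapped to 'ACDC', 'why do you like #sub?' becomes 'why do you like ACDC?'.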

6) Add a question to the reply as a tool to extend the conversation.

Question file format

The Question column contains the text of the question, Reply indicates what the next user input should contain, and Situation is the lookup key. The next two columns are used to 'preload' the next input from the user. Suppose the user was asked 'what interests you': the input is preloaded with 'i like' and flagged as an Eliza type, otherwise the user might enter 'ACDC' instead of 'i like ACDC'. On its own, 'ACDC' would get a reply like 'what about ACDC', which is a poor response compared to 'yes I like ACDC as well'.

7) Output the final version of the reply.

Step 3: The Code in Detail

It's all about the permutations and exceptions in English, so tons of ifs and else ifs. I'm assuming that you're reasonably fluent in C++, and I have tried to avoid the use of pointers to keep things simple.

Note - I've found that Instructables adds in HTML to the code. It's also got something against greater than or less than characters. I've tried to correct them all but they keep coming back. Don't copy & paste the code - I'll upload some text files somehow.

First off, the Bits code is called by the following:

#include <cstdio>
#include <iostream>
#include <string>
#include "reply.h"
using namespace std;

reply bits;  // create instance of reply called bits
string userText;

int main()
{
	struct status_t statusCard[14];
// clear structs array (14 elements, so indices 0 to 13)
	for (int x = 0; x < 14; x++)
	{
		statusCard[x] = {};
	}
	while (1)
	{
		printf("\x1B[32mUser :\033[0m"); // prints 'User :' in green, then resets the colour
		fflush(stdout);                   // make sure the prompt appears before waiting for input
		if (!getline(cin, userText))      // stop cleanly at end of input (Ctrl+D)
		{
			break;
		}
		if (userText != "")
		{
			statusCard[5].word = userText;
			userText = bits.replyMethod(statusCard, 14);
			printf("\x1B[32mBits :\033[0m");
			fflush(stdout);
			cout << userText + "\n";
		}
	}
	return 0;
}

I'm passing data to the Bits code using an array of structs called statusCard, so we can have some continuity between answers.

We'll also need a header file. Nothing too tricky: it brings in the string library, defines a struct type and gives us access to the method.

#ifndef REPLY_H
#define REPLY_H
#include <string>

using namespace std;

struct status_t
	{
		string word;
		string type;
		string opinion;
		string alias;
		string tag;
		int count;
	};

class reply
{
	public:
	string replyMethod(status_t *statusCard, int i);	
};
#endif

The struct holds the word or phrase; what type of text it is (subject, verb, etc.); an opinion about the word, mainly used with subjects to pick a reply; the alias, which was explained earlier and reduces the number of replies required; and a tag. Count is used as a flag.

Main code next

Step 4: Main Code - Understanding the Input

First thing the code does is some housekeeping: get the last interaction from the statusCard structs, reset some variables, seed the random number generator and get the user's input. The structs are global variables to avoid having to pass them to functions and return multiple results.

// assign structs
 lastSubject = statusCard[7];
 lastVerb = statusCard[8];
 lastAdject = statusCard[9];
 lastElizaInput = statusCard[10];
 lastUnknown = statusCard[11];
 lastKeyword = statusCard[12];
 query = statusCard[13];				
	
string	userText = statusCard[5].word;
	
	
// clear sentence components
	string word = "";
	string type = "";
	string answer = "";
	string sqlCommand;
	
	goEliza = false;
	goKeyword = false;
	subject = {};
	verb = {};
	adject = {};
	elizaInput = {};
	keyword = {};
	unknown = {};
	temp = {};

// seed random number generator
	srand(time(NULL));

Next we're going to convert the user's input to lower case and remove anything that isn't a letter or a space (so digits and punctuation go too). This will make matching the lexicon and user text a bit easier.

string prepareText(string userText)
{
string prepUserText = "";
// remove punctuation and convert to lower case
// restrict valid characters to A to Z, a to z and space only

for (int count =0; count < (int)userText.length(); count++)
	{
		if (userText[count] != '\'')
		{
			if ((int(userText[count]) > 64 && int(userText[count]) < 91)
			|| (int(userText[count]) > 96 && int(userText[count]) < 123)
			|| int(userText[count]) == 32)
			{
				prepUserText += tolower(userText[count]);
			}
		}
	}
	return prepUserText;
}

Once that's done, it's time to see if we recognise any words or phrases in the user's input.

Using SQL, we take a word from the Lexicon table, convert it to lower case (see above), and test whether it appears in the user's text using string::find. Spaces are added before and after the Lexicon word so that ' and ' isn't found inside 'candy'.

string lexiconSearch(string userText, string table)
{
	string lexiconWord;
	string sqlCommand; 
	size_t found;
	bool matchFlag;

// Get size of lexicon
	int lexiconSize =0;
	sqlCommand = "select count(*) from " + table;
	sqlCall(sqlCommand);
	lexiconSize = result.count;

restart:	
	userText = " " + userText + " ";
	result = {};
	matchFlag = false;

	for (int count =1; count < lexiconSize +1; count++)
	{
		sqlCommand = "select * from " + table +" where _rowid_ =" + to_string(count);
		sqlCall(sqlCommand);
	// ensure lowercase (stop at length(); the original +1 stepped past the last character)
		for (int i =0; i < (int)result.word.length(); i++)
		{
			result.word[i] = tolower(result.word[i]);
		}
						
		lexiconWord = " " + result.word + " ";
		
		found = userText.find(lexiconWord);

The key bit of code above is the sqlCall(sqlCommand) line, which executes the SQL code. This isn't my code and I can't remember where I lifted it from.

void sqlCall(string sqlCommand)
{
	result = {};
	sqlite3 *db;
	char *zErrMsg = 0;
	int rc;
	const char* data = "Callback function called";

	/* Open database */
	/* CHANGE THE PATH BELOW TO PICK UP YOUR DATABASE */
	rc = sqlite3_open("/home/pi/superbits/Chat/lexicon.db", &db);
	if (rc != SQLITE_OK)
	{
		fprintf(stderr, "Can't open database: %s\n", sqlite3_errmsg(db));
		sqlite3_close(db);  // the handle still has to be released
		return;             // don't try to run SQL on a failed connection
	}

	/* Execute SQL statement; callback() is called once for each result row */
	rc = sqlite3_exec(db, sqlCommand.c_str(), callback, (void*)data, &zErrMsg);
	if (rc != SQLITE_OK)
	{
		fprintf(stderr, "SQL error: %s\n", zErrMsg);
		sqlite3_free(zErrMsg);
	}
	sqlite3_close(db);
}

The results from the SQL code are extracted using the callback function, also lifted from somewhere, but modified. sqlite3_exec calls it once for each row returned, and depending on the column names, the data is loaded into a struct called result.

static int callback(void *data, int argc, char **argv, char **azColName)
{
// copy each column of the current row into the global result (or query) struct
	for (int i = 0; i < argc; i++)
	{
		string colAsString = azColName[i];
		// SQL NULL values arrive as null pointers, so treat them as empty strings
		string value = (argv[i] != NULL) ? argv[i] : "";

		if (colAsString == "Word")
			{
			result.word = value;
			}
		else if (colAsString == "Type")
			{
			result.type = value;
			}
		else if (colAsString == "Alias")
			{
			result.alias = value;
			}
		else if (colAsString == "Opinion")
			{
			result.opinion = value;
			}
		else if (colAsString == "Tag")
			{
			result.tag = value;
			}
		else if (colAsString.substr(0,5) == "count")
			{
			// count(*) comes back as text; guard against an empty value before stoi
			result.count = value.empty() ? 0 : stoi(value);
			}
// special case for questions
		else if (colAsString == "Question")
			{
			query.word = value;
			}
		else if (colAsString == "Reply")
			{
			query.type = value;
			}
	}
	return 0;
}

Back in the Lexicon search function, if a match is found, we check whether the type is 'alias' (see the 'How It Works' section), in which case we delete the word, replace it with the alias, and start again. Otherwise we have recognised a word, so we stop searching and set matchFlag to true.

if (found != string::npos)
		{
			if (result.type == "alias")
				{
				// restart search
				userText.erase(found, (int)lexiconWord.length());
				userText.insert(found, " " + result.alias + " ");
				userText = prepareText(userText);
				count = 0;
				
				goto restart;
				}
			else
				{
				// stop searching
				userText.erase(found, (int)lexiconWord.length());
				userText.insert(found, " ");				
				count = lexiconSize;	
				matchFlag = true;
				}			
			break;
		}		
	}

Finally if we have matched a word, we assign it to a word type and continue searching until the end of the Lexicon. Otherwise, we record an unknown word.

if (matchFlag)		{
		assign(false);
		matchFlag = false;
		goto restart;
		}
	else if (!matchFlag)
		{
		// no match found, record as unknown
		result.type = "unknown";
		result.word = userText;
		result.opinion = "mid";
		assign(false);
		}	

	return userText;

Here's the word being assigned. Depending on the word type, the word is assigned to a particular struct, provided a word hasn't already been recorded in that struct. The boolean overwrite parameter gives the option to (erm) overwrite the struct. We're splitting the words into subjects, verbs, adjectives, etc.

void assign(bool overwrite)
{
		if (result.type == "subject" && (isBlank(subject.word)
			|| overwrite))
			{
				subject.word = result.word;
				subject.opinion = result.opinion;
				subject.type = result.type;
			}
		else if (result.type == "verb" && (isBlank(verb.word)
			|| overwrite))
			{
				verb.word = result.word;
				verb.opinion = result.opinion;
			}
		else if (result.type == "adject" && (isBlank(adject.word)
			|| overwrite))
			{
				adject.word = result.word;
				adject.opinion = result.opinion;
			}
		else if (result.type == "unknown" && (isBlank(unknown.word)
			|| overwrite))
			{
				unknown.word = result.word;
				unknown.opinion = result.opinion;
			}
		else if (result.type == "eliza" && (isBlank(elizaInput.word)
			|| overwrite))
			{
				elizaInput.word = result.word;
				goEliza = true;
			}
		else if (result.type == "keyword" && (isBlank(keyword.word)
			|| overwrite))
			{
				goKeyword = true;
				keyword.word = result.word;
			}
		else if (isBlank(subject.word) && (result.type != "keyword"
				&& result.type != "eliza" && result.type != "adject"
				&& result.type != "verb" && result.type != "unknown"
				&& result.type != "subject" ))
			{
			// default
				subject.word = result.word;
				subject.opinion = result.opinion;
				subject.type = result.type;
			}
	result = {};
}

More or less, that's the user input understood. I make one last check, which is to check for plurals, convert them to singular form, and run the Lexicon search once more.

string makeSingle(string word)
{
	string testWord;
	int length = (int)word.length();

// plural ends ies, e.g. "stories" -> "story"
	if (length > 3 && word.substr(length-3) == "ies")
	{
		testWord = word.substr(0, length-3) + "y";
	}
// plural ends es, e.g. "boxes" -> "box" (this rule also turns "horses" into "hors")
	else if (length > 2 && word.substr(length-2) == "es")
	{
		testWord = word.substr(0, length-2);
	}
// regular nouns, e.g. "bands" -> "band"; the length check stops an empty word crashing substr
	else if (length > 1 && word[length-1] == 's')
	{
		testWord = word.substr(0, length-1);
	}
// an empty result means the word didn't look like a plural
	return testWord;
}
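The suffix rules above are a best guess: the 'es' rule turns 'boxes' into 'box' but also 'horses' into 'hors', which won't be in the lexicon. One possible refinement, not what the code above does, is to generate every candidate singular and try each one against the lexicon in turn. A sketch, with singularCandidates as a hypothetical name:

```cpp
#include <string>
#include <vector>

// Hypothetical alternative to makeSingle: return every plausible singular form,
// most specific rule first, so the caller can try each against the lexicon.
std::vector<std::string> singularCandidates(const std::string& word)
{
    std::vector<std::string> out;
    auto endsWith = [&word](const std::string& suffix) {
        return word.size() > suffix.size() &&
               word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0;
    };

    if (endsWith("ies"))
        out.push_back(word.substr(0, word.size() - 3) + "y");  // stories -> story
    if (endsWith("es"))
        out.push_back(word.substr(0, word.size() - 2));        // boxes -> box
    if (endsWith("s") && !endsWith("ss"))
        out.push_back(word.substr(0, word.size() - 1));        // horses -> horse
    return out;  // empty if the word doesn't look plural (e.g. "class")
}
```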

Step 5: Picking an Answer

After the first step we have the user's input split into the following categories according to the type of word or phrase determined by the Lexicon.

Subject

Verb

Adjective

Keyword - The Lexicon search function has detected a word that needs to be dealt with outside of the normal processing. In this iteration, I'm only using the code to replace 'it' with the previous subject.

Unknown - a word or phrase has been found that doesn't appear in the Lexicon

Eliza - showing the program's roots with this. A phrase has been found for which a particular group of answers exists.

From a programming view, these are all global structs.

First up, is there any information carried forward from the previous interaction? If so, restore it. I'll explain this later as it fits better with the questions piece.

Next, are we dealing with a keyword-type word?

if (goKeyword)
	{
		keywordReply();
	}

If so, do the keywordReply function to load the previous unknown word or subject.

void keywordReply()
{
// specific instructions
	string answer;
	if (keyword.word == "i" || keyword.word == "it")
	{
		if (isBlank(subject.word))
			{
			if (!isBlank(lastUnknown.word))
			{
				unknown = lastUnknown;
			}	
			else
			{
				subject = lastSubject;
			}
			}
		}
}

Then decide if we're dealing with an Eliza type or some ad-hoc input. The global 'goEliza' flag was set in the assign function when this type of word is found.

if (goEliza)
	{
		answer = eliza();
	}
	else
	{	
		answer = getReply();
	}

Let's assume we have an Eliza word.

We need to check for the word 'not' in the text as it changes the opinion of an adjective, e.g. 'not good' should have opinion = bad.

There are special replies for 'what', 'body', and 'where' which don't fit into the normal processing pattern, so we need to adjust for them. The output from this function is two parameters which feed into an SQL command. Parameter 1 is the Eliza word; parameter 2 is an opinion that depends on the words found in the user's text.

string eliza(){
	string searchParameter1;
	string searchParameter2;
	string answer = "";

	checkForNot();
	searchParameter1 = elizaInput.word;
		if (elizaInput.word == "what")
			{
				searchParameter2 = subject.word;					
				if (!isBlank(adject.opinion))
				{
					searchParameter2 += " " + adject.opinion;		
				}	
			}
		else if (elizaInput.word == "where")
			{
				if (subject.opinion != "good" && subject.opinion != "bad"
				&& subject.opinion != "boring" && subject.opinion != "interesting"
				&& subject.opinion != "mid")
				{
					subject.opinion = "";
				}
			}
		else if (subject.type == "body")
			{
				searchParameter1 = subject.type;
				if (!isBlank(adject.opinion))
				{
					searchParameter2 = " " +adject.opinion;
				}
			}
		else if (subject.opinion != "") 
			{
					searchParameter2 = subject.opinion;
			}
		else if (verb.opinion != "")
			{
					searchParameter2 = verb.opinion;
			}
		else if (adject.opinion != "")
			{
					searchParameter2 = adject.opinion;
			}
	return checkAnswer(answer, searchParameter1, searchParameter2);	
}
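checkForNot() itself isn't shown. Going by the description ('not good' should have opinion = bad), a minimal version might look like the sketch below. The applyNot name, the list of opposite opinions, and leaving 'mid' unchanged are all assumptions, not the author's code.

```cpp
#include <map>
#include <string>

// Hypothetical sketch: flip an opinion when the input contains the word "not".
std::string applyNot(const std::string& userText, const std::string& opinion)
{
    static const std::map<std::string, std::string> opposite = {
        {"good", "bad"}, {"bad", "good"},
        {"interesting", "boring"}, {"boring", "interesting"}};

    // pad with spaces so "not" isn't matched inside "knot" or "nothing"
    if ((" " + userText + " ").find(" not ") == std::string::npos)
        return opinion;

    auto it = opposite.find(opinion);
    return it == opposite.end() ? opinion : it->second;  // "mid" stays "mid"
}
```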

Suppose we don't have an Eliza word? Then we run the ad-hoc function. I thought it would be cool for Bits to express an opinion about the subject, which is what the nested SQL statement is doing.

Once that headache is done, we build up the two search parameters as in the Eliza function.

string getReply()
{
	string searchParameter1 = "";
	string searchParameter2 = "";
	string answer;
	
	checkForNot();

		if (!isBlank(subject.word) && isBlank(verb.word)
			&& isBlank(adject.word))
			{
			// just subject entered 
			// nested sql statement to find something to say about subject
			searchParameter1  = " Select word from lexicon ";
			searchParameter1 += " where type = \"adject\" and ";
			searchParameter1 += " opinion = (select opinion from lexicon ";
			searchParameter1 += " where lower(word) = lower(\"" + subject.word + "\")) ";
			searchParameter1 += " order by random() ";
			searchParameter1 += " limit 1";
			sqlCall(searchParameter1);		
			answer = subject.word + " is " + result.word + ". ";	
			searchParameter1 = "sub only";
			}
		else if (subject.opinion == "")
			{
				searchParameter1 = verb.opinion;
				searchParameter2 = adject.opinion;
			}
		else if (adject.opinion == "")
			{
				searchParameter1 = subject.opinion;
				searchParameter2 = verb.opinion;				
			}
		else
			{
				searchParameter1 = subject.opinion;
				searchParameter2 = adject.opinion;
			}
		
	return  checkAnswer(answer, searchParameter1, searchParameter2);
}

The parameters are fed into a function which checks the answer. To limit the chances of getting a blank answer from the SQL call, we run it up to three times: first with both parameters, then with the parameters switched round, and finally with just one parameter. Each SQL call returns one randomly selected answer.

string checkAnswer(string answer, string searchParameter1, string searchParameter2)
{
	searchParameter1 = checkSpaces(searchParameter1);
	searchParameter2 = checkSpaces(searchParameter2);
	string sqlCommand = "";
							
// run sql try both parameters
	sqlCommand = "select * from Replies where lower(type) = lower(\"";
	sqlCommand += searchParameter1 + " " + searchParameter2 + "\")";
	sqlCommand += " order by random()";
	sqlCommand += " limit 1";
	sqlCall(sqlCommand);
// check result isn't blank	
	if(isBlank(result.word))
		{
		if(isBlank(searchParameter1))
			{
			searchParameter1 = searchParameter2;
			}
// run sql with parameters switched
			sqlCommand = "select * from Replies where lower(type) = lower(\"";
			sqlCommand += searchParameter2 + " " + searchParameter1 + "\")";
			sqlCommand += " order by random()";
			sqlCommand += " limit 1";
			sqlCall(sqlCommand);
		}	
// check result isn't blank	
	if(isBlank(result.word))
		{
		if(isBlank(searchParameter1))
			{
			searchParameter1 = searchParameter2;
			}
// run sql with one parameter
			sqlCommand = "select * from Replies where lower(type) = lower(\"";
			sqlCommand += searchParameter1 + "\")";
			sqlCommand += " order by random()";
			sqlCommand += " limit 1";
			sqlCall(sqlCommand);
		}	
	
	return answer + " " + result.word;	
}
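The three attempts in checkAnswer boil down to a fixed fallback order over the search keys. Here's a sketch of that order with the SQL call abstracted into a lookup function, a hypothetical stand-in for 'select * from Replies where lower(type) = ... order by random() limit 1' that returns an empty string when nothing matches. Blank parameters are skipped when joining, which is roughly what checkSpaces appears to be for in the original (an assumption, since checkSpaces isn't shown).

```cpp
#include <functional>
#include <string>

// Hypothetical sketch of checkAnswer's fallback order.
std::string replyWithFallback(const std::string& p1, const std::string& p2,
                              const std::function<std::string(const std::string&)>& lookup)
{
    // join two parameters with a single space, ignoring blanks
    auto join = [](const std::string& a, const std::string& b) -> std::string {
        if (a.empty()) return b;
        if (b.empty()) return a;
        return a + " " + b;
    };

    std::string answer = lookup(join(p1, p2));   // 1) both parameters
    if (answer.empty())
        answer = lookup(join(p2, p1));           // 2) switched round
    if (answer.empty())
        answer = lookup(p1.empty() ? p2 : p1);   // 3) just one parameter
    return answer;
}
```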

Now we have a first draft of an answer, but it needs tidying up.

Step 6: Adjusting the Answer

This is probably the worst bit of the project. The previous section produced a draft answer which looks a bit like this:

"maybe your not a #adj #sub #qst"

The 'maybe your not a' bit is straightforward, but the rest? Just to add that 'your' is spelt phonetically for use with Pico2wave; it's not a typo.

#adj, #sub and #qst are instructions (which I'm calling modifiers) to the adjustAnswer function to add in some more text. This is something the original code lacked, and it sort of makes Bits appear to be paying attention rather than just returning generic answers. It also handles questions and anecdotes.

Let's do a walk-through:

User: what is your least favourite band?

From this Bits gets 'what music bad' and selects the following draft answer as per the previous section.

draft answer: !mus$b has a special place in hell #qr2

This goes into the adjustAnswer function.

!mus means get a word relating to music

$b means get a word with the opinion = bad

Kpop would be an excellent example of bad music.

#qr2 means get a question to extend the conversation.
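Pulling a modifier like !mus$b apart can be sketched as a small helper that finds the next '!' or '#', records the three-letter code and any '$' opinion suffix, and deletes them from the draft, leaving the position where the replacement text should go. The Modifier struct and takeModifier name are hypothetical; note that, like the original, it would also trip over a genuine '!' or '#' in a reply.

```cpp
#include <cstddef>
#include <string>

// Hypothetical description of one modifier found in a draft answer.
struct Modifier
{
    char marker = 0;                             // '!' or '#', 0 if none found
    std::string code;                            // three-letter code, e.g. "mus"
    std::string opinion;                         // optional "$x" suffix, e.g. "$b"
    std::size_t position = std::string::npos;    // where the replacement text goes
};

// Find the next modifier, remove it from 'draft' and describe it.
Modifier takeModifier(std::string& draft)
{
    Modifier m;
    std::size_t found = draft.find_first_of("!#");
    if (found == std::string::npos)
        return m;  // nothing left to process

    m.marker = draft[found];
    m.position = found;
    m.code = draft.substr(found + 1, 3);
    std::size_t len = 1 + m.code.length();  // marker plus code

    // an opinion suffix such as "$b" may follow the code directly
    if (found + len < draft.size() && draft[found + len] == '$')
    {
        m.opinion = draft.substr(found + len, 2);
        len += m.opinion.length();
    }
    draft.erase(found, len);
    return m;
}
```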

Here's the first bit of code which creates some variables. Nothing too bad here

string adjustAnswer(string draftAnswer)
{
	int count = 0;
	
	string sqlCommand;
	string searchParameter;
	string searchOpinion;
	string searchTable;
	string mod; // answer modifier
	string type; // further detail on modifier
	size_t found;
					
re_check:
	sqlCommand = "Select * from ";
	searchParameter = "";
	searchOpinion = "";
	searchTable = "";

The next bit finds modifiers that begin with a ! character. We don't want !mus$b to appear in the final answer so the modifier (mus) and type (b) are saved, then the text is deleted. Finally the parameters for the SQL command are determined and we jump to the SQL call.

found = draftAnswer.find("!");

if (found != string::npos)
	{
		// delete !
		draftAnswer.erase(found,1);
		// save modifier
		mod = draftAnswer.substr(found,3);
		// save modifier type	
		type = draftAnswer.substr(found+3,2);
		
		// delete from answer
		draftAnswer.erase(found,3);
		// delete modifier type
		if (type.substr(0,1) == "$")
		{
			draftAnswer.erase(found,2);
		}	
	
		searchTable = "lexicon";
		searchParameter = " where tag = \"!" + 
			mod + "\" ";	
		goto update;		
		}

The next bit goes on for ages so I'm not going to reproduce it in full. It's also full of if and else ifs. When I figure out how to upload the code, you can look at it there.

Modifiers that start with '#' character are dealt with here. Again we don't want #qr2 to appear in the final answer so the modifier is saved and the text deleted.

The SQL call parameters are set depending on the modifier:

found = draftAnswer.find("#");

	if (found != string::npos)
	{
		// delete #
		draftAnswer.erase(found,1);
		// save modifier
		mod = draftAnswer.substr(found,3);
		// save modifier type	
		type = draftAnswer.substr(found+3,2);
		
		// delete from answer
		draftAnswer.erase(found,3);
		// delete modifier type
		if (type.substr(0,1) == "$")
		{
			draftAnswer.erase(found,2);
		}	
				
			if (mod == "tal")
				{
				searchTable = "replies";
				searchParameter = " where lower(type) = lower(\"tale ";
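				if (subject.word !="")
					{
						searchParameter += subject.word + "\") ";
					}
				else if (verb.word !="")
					{
						searchParameter += verb.word + "\") ";
					}
				else
					{
						// nothing to tell a tale about, so skip the SQL call
						searchParameter = "";
					}
				}				
			else if (mod == "jok")
				{
				searchTable = "replies";
				searchParameter = " where type = \"joke\" ";
				}
			// ... the other modifiers (sub, vrb, hob, ins, mus, etc.) follow the same pattern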
			else if (mod.substr(0,1) == "q")
				{
				query.opinion = mod;
				draftAnswer.insert(found,question());	
				}	
		}

In our example we have the modifier #qr2. This tells adjustAnswer to get a question, and we'll look at this in the next section.

We haven't done the $b modifier yet, and that's here. A parameter called 'searchOpinion' is added into the SQL command.

update:		
	// adjust sqlCommand for additional modifier
		if (searchTable == "lexicon")
			{
			if (type == "$b")
				{
				searchOpinion = " and opinion = \"bad\" ";
				}
			else
				{
				searchOpinion = " and opinion =\"good\" ";
				}
			}

Next run the SQL command and insert the result into the draft answer.

if (searchParameter !="")
			{
			result.word = "";
			sqlCommand += searchTable;
			sqlCommand += searchParameter;
			sqlCommand += searchOpinion;
			sqlCommand += " order by random()";
			sqlCommand += " limit 1";

			sqlCall(sqlCommand);			
			draftAnswer.insert(found,result.word);	
			}

There's another bit which relates to questions, so I'll save that until later.

The answer needs to be checked to ensure it's not blank and that all the modifiers have been dealt with. If the answer is blank, we get a default answer and clear the user's input so that it doesn't get carried forward to the next interaction. The program loops back to the start if there are any other modifiers left. We need to be really careful here not to end up in an endless loop, so I've limited the code to seven loops.

// check for valid answer

	if (isBlank(draftAnswer))
		{
		// get unknown reply
		sqlCommand = "select * from Replies where type = \"unknown\"";
		sqlCommand += " order by random()";
		sqlCommand += " limit 1";
		sqlCall(sqlCommand);
		draftAnswer = result.word;		
		// start again with input
		clearAllStructs();
		}

// have all the modifiers been dealt with
	if ((draftAnswer.find("#") != string::npos
		|| draftAnswer.find("!") != string::npos)
		&& count <7)
		{
		count ++;
		goto re_check;
		}
	return draftAnswer;
}

Finally, the finished answer is returned to be output.

Step 7: Questions

Questions are used to extend the number of interactions with the user. They start life as part of the draft reply and are actioned by the adjustAnswer function.

The reply (from the replies table) will look something like this:

now it makes sense. #qst

The reply is processed by adjustAnswer, specifically this bit of code:

else if (mod.substr(0,1) == "q")
	{
	query.opinion = mod;
	draftAnswer.insert(found,question());	
	}

Which then calls the question finding bit of code. This generates another SQL command to find a question in the questions table depending on the modifier used in the adjustAnswer. Once the question has been picked we overwrite any existing structs, and flag that a question has been asked.

string question()
{
	string answer;
	string sqlCommand;

// get question
	sqlCommand = "select * from Questions where lower(situation) = \"";
	sqlCommand += query.opinion + "\"";
	sqlCommand += " order by random()";
	sqlCommand += " limit 1";

// clear assigned words
	elizaInput = {};
	sqlCall(sqlCommand);
	assign(true);
	query.count = 1;	
	return query.word;
}

Why do we do this? Firstly, when the user is asked 'Who is your favourite super hero?' we don't want 'green' as the answer, so there's some error checking. The code picks up that the question flag has been set and runs the following code when processing the user input.

// has a question been asked
	if (query.count == 1)
	{	
// verify reply
	answer = verifyReply();
	query = {};
	if (answer != "")
		{
			goto done;
		}
	}

The verifyReply function looks like this. If we get a duff answer from the user, Bits returns a reply that says 'what' or something similar and clears all the structs, so we start fresh with the next input. An acceptable answer from the user is allowed to continue in the normal program flow.

string verifyReply()
{
	string sqlCommand;
	string answer;
	string parameter = "";

 if (query.type == "#sub" && !isBlank(unknown.word) && isBlank(subject.word))
	{
		parameter = "question unknown";
	}
 else if ((query.type == "#sub" && isBlank(subject.word))
		|| (query.type == "#adj" && isBlank(adject.word))
		|| (query.type == "#vrb" && isBlank(verb.word))
		|| (query.type == "#qyn" && (elizaInput.word != "yes"  // bad only if none of yes/no/maybe
		&& elizaInput.word != "no" && elizaInput.word != "maybe")))
		{
			parameter = "question bad";
			clearAllStructs();
		}	
		
		
	if (parameter != "")
		{
			sqlCommand = "select * from Replies where type = \"";
			sqlCommand += parameter + "\"";
			sqlCommand += " order by random()";
			sqlCommand += " limit 1";
			sqlCall(sqlCommand);
			answer = result.word;
		}
	else
		{
			answer = "";
		}

	return answer;
}

We also want to preload the user's response. If Bits asks 'how are you' and the user replies 'good', that wouldn't be recognised as 'i am good', so we add in 'i am' when the struct data is saved (see later).

A question from the Questions table might look like this (in real life it's laid out in a nice table; thanks so much, Instructables editor, for messing up my layout again. See the image at the top of this section.)

Index | Question | Reply | Situation | Word | Type
5 | whats up buttercup #xxx$f | #adj | qo1 | I am | eliza

In this instance the verifyReply function will expect to see an adjective in the user's response. This question is picked up by replies carrying the modifier #qo1, and it preloads the user's response with 'I am'. If the user just replies 'good', this is interpreted as 'i am good'.

The Questions table includes more modifiers for the adjustAnswer function, in this case #xxx$f. The #xxx doesn't do anything (and could possibly be removed), while the $f bit tells adjustAnswer to clear all the structures except the Eliza and subject structs. This clears out all the old data but keeps the preloaded Eliza data and subject, so Bits remembers only the Eliza word and the subject.

else if (type == "$f")
	// save eliza input and subject
	{
	saveEns();
	}

void saveEns()
{
	// clear previous responses and save Eliza Input and Subject
		adject = {};
		keyword = {};
		unknown = {};
		verb = {};
}

Let's assume that we've processed all the modifiers and have our finished answer. At this point we're back in the replyMethod code, and the last thing to do is save the information in the structs:

statusCard[7] = subject;
statusCard[8] = verb;
statusCard[9] = adject;
statusCard[10] = elizaInput;
statusCard[11] = unknown;
statusCard[12] = keyword;
statusCard[13] = query;

Then all that's left is to return the answer to whatever called the routine:

return answer;

Done.

Step 8: Adding Your Own Words, Replies and Questions

To add words to the Lexicon you'll need:

The word or phrase

What type of word it is (subject, verb, Eliza, alias). This determines how the code searches for a reply.

Tag - is the word related to music (!mus), sport (!spt) or film (!flm)?

Alias - if applicable, otherwise leave blank

Opinion - if you have a subject, adjective or verb, is it a good thing or a bad one?

The order of the words and phrases is important. The code searches from row one to the end and stops when it finds a match. If you have the phrase 'do you like' and the word 'like' comes before the phrase, it'll stop when it finds 'like' and you'll get the wrong answer.

To add an answer to the Replies table you'll need:

The text of the answer

Within the text you'll need to add modifiers like #sub, #qst, etc.

A search key, for example 'I like good' will find answers with the Eliza phrase 'i like' and the subject, verb or adjective opinion 'good'.

To add a question to the Questions table you'll need:

The question text, including modifiers to save preloaded user responses and subjects

The expected reply type

A search key which is taken from the answer text (#qst)

If you want to preload the user's response, enter the text in this column and its type in the next.

Tips

1) Ensure that the reply to an input doesn't link to a question asking for a similar input to the one just processed, e.g.

User: I like [something]

Bits: [reply] it's okay [question] what do you think?

You'll end up going in circles.

2) Spelling. It's really important that the Lexicon has the correct spellings, or the words won't be found.

3) Check the question follows logically on from a reply, e.g.

Bits: [reply] we don't talk about that. [question] why do you ask?

4) Give the chatbot a personality with phrases such as 'what's up buttercup' or 'live long and prosper', if that's your kinda thing.

5) Keep replies in the replies table and questions in the questions table. It gives you more control.

6) Don't have something in the Lexicon like 'good' is an alias of 'great' and 'great' is an alias of 'good'. One way trip to endless loop city.

Step 9: Further Development

Bits needs a routine to tidy up her English before outputting the answer.

It'll never pass the Turing test without more computing horsepower. The Pi 3 B is a bit slow.

I also need to explain how Bits was made but this Instructable is long enough already. One for another time I think.

Step 10: Files

The questions, replies and Lexicon are attached (hopefully). They make more sense when opened in Excel or Calc as a CSV import.

I'm really hoping you'll be able to copy and paste the chat code text.txt file into a suitable editor.