Introduction: Q Learning Crawler Bot
Welcome to the Q-Learning Instructable! In this project we intend to teach you the basics of Q-Learning, a branch of machine learning. The instructions, tutorials, and files required to print and build your own two-degree-of-freedom crawler are provided within this project.
This project is intended for hobbyists and students who are fascinated by demonstrating and replicating machine intelligence. Although the main subject is machine learning, this project also provides a concrete example of mechatronics engineering: the integration of mechanical, electrical, and computer systems.
We have included a bill of materials that you will need in order to build this crawler. First off, you will need access to a 3-D printer for the crawler bot parts, as well as a computer for the programming and testing. If you do not have access to a 3-D printer, there are companies online that will print and ship the parts to you.
In addition, experience with coding and computing in general is required, as well as some experience with 3-D modeling for the parts.
Step 1: Machine Learning Robot: Crawler
In this project, you will be applying the Q-Learning algorithm to the CrawlerBot in order for it to learn how to move forward. The robot will pull itself forward using a two degree of freedom arm by stepping through a state map as seen above.
The state map represents the various positions of the front and rear arm. As the robot transitions to the right on the state map, the front arm rotates up. As it transitions down on the state map, the rear arm rotates down. The robot will use the Q-Algorithm to learn the optimal sequence of movements (or state transitions) for moving forward. These state transitions are rewarded, and the rewards are stored in the robot's representation of its own reward map.
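To make the state map concrete, here is a minimal Python sketch. It is not the project's actual code: the 3x3 grid size and the names GRID_ROWS, GRID_COLS, MOVES, and step are assumptions for illustration, and the real map may use a different number of positions per servo.

```python
# Hypothetical state map: each state is a (row, col) pair.
# col indexes the front-arm position, row indexes the rear-arm position.
GRID_ROWS = 3  # assumed number of rear-arm positions
GRID_COLS = 3  # assumed number of front-arm positions

# Each action shifts the state by one cell on the map.
MOVES = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def step(state, action):
    """Return the state reached by taking `action`, staying put at the map edge."""
    row, col = state
    d_row, d_col = MOVES[action]
    new_row = min(max(row + d_row, 0), GRID_ROWS - 1)
    new_col = min(max(col + d_col, 0), GRID_COLS - 1)
    return (new_row, new_col)


print(step((0, 0), "right"))  # the front arm moves one position
print(step((0, 0), "up"))     # already at the top edge, so the state is unchanged
```

Clamping at the edges is one simple way to keep the servos inside their allowed range; a real implementation could instead refuse the move.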
The wheels of the robot are not motorized. The right wheel simply has an encoder attached to it in order for the robot to know how much it has moved forward or backwards. The algorithm uses the input of the encoder in order to solve for the "quality" of the state transition. This "quality" can either be positive or negative depending on the direction that the bot moved. The algorithm also takes into account several other variables such as the previous quality of the same transition, the level of random exploration of the state map, and more. The details of the algorithm will be discussed later.
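As a rough illustration of how an encoder reading could become a signed reward, consider the sketch below. The function name, the tick-count inputs, and the scale parameter are all hypothetical; the project's own code may compute the reward differently.

```python
def reward_from_encoder(ticks_before, ticks_after, scale=1.0):
    """Turn a change in encoder count into a signed reward.

    Assumes the count increases when the bot rolls forward, so a
    forward move gives a positive reward and a backward move a
    negative one.
    """
    return scale * (ticks_after - ticks_before)


print(reward_from_encoder(10, 14))  # forward move, positive reward
print(reward_from_encoder(10, 7))   # backward move, negative reward
```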
As the robot continues to explore the set of all possible servo position combinations, it slowly learns to crawl because rewarded behavior (moving forward) is reinforced and unwanted behavior (moving backwards) is discouraged by negative values in the reward map.
Here is a Q-map simulator that allows you to explore the basics of reinforcement learning and test out ideas for exploring states and state transitions. Clicking on the states on the reward map changes the amount of reward the bot would receive for making the transition. The amount of reward varies between +1.0 and -1.0 in steps of 0.5. You can manually change states, or set it to auto and speed it up. Auto mode uses the Q-Algorithm to transition from state to state. You can observe the variable values on the bottom right as it moves around on the map.
Step 2: Assembly Part 1 - Electrical
Step 3: Assembly Part 2 - Mechanical
Step 4: Q Learning Tutorial
In this tutorial, we're going to teach you the basics of Q learning, as well as how it applies to this robot.
Q learning is a form of machine learning called model free reinforcement learning. Model free simply means that the program, or machine, starts with no built-in model of the environment it is in. Reinforcement learning means that there are certain states, or state transitions, that are given a certain reward.
Inside the "brain" of the robot, there is a matrix of values corresponding to the present state, along with memory of state transitions. This matrix, Q, contains the values that the robot has learned about the state transitions. It is updated on every state transition using the reward values and the learning parameter.
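The update described above can be sketched with the standard textbook Q-learning rule. This is an illustration, not the robot's own code: the function name q_update, the dictionary layout, and the state names "A" and "B" are assumptions, while the default Alpha and Gamma values match the initial hyperparameters listed later in this project.

```python
def q_update(q, state, action, reward, next_state, alpha=0.8, gamma=0.8):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps each state to a dict of action -> learned value.
    alpha is the learning rate and gamma the discount factor.
    """
    best_next = max(q[next_state].values())  # best value reachable from next_state
    old = q[state][action]
    q[state][action] = old + alpha * (reward + gamma * best_next - old)
    return q[state][action]


actions = ["up", "down", "left", "right"]
q = {s: {a: 0.0 for a in actions} for s in ["A", "B"]}  # hypothetical two-state table
new_value = q_update(q, "A", "right", reward=1.0, next_state="B")
print(new_value)
```

With an all-zero table, the first update simply moves the entry toward the reward by a fraction alpha; later updates also pull in the discounted value of the best next move.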
As the robot continues learning, it updates its own knowledge of the system, learning by itself about the traps and pitfalls of the actions it is allowed to take. Our robot, for example, rewards movement forwards and penalizes movement backwards. Here is a good tutorial on Q learning, but some basic knowledge of linear and matrix algebra might be helpful.
Step 5: Raspberry Pi Setup
In this section we'll cover setting up the Raspberry Pi, connecting it, and setting up the OpenCM environment. We need the Raspberry Pi Zero, an SD card, a USB cable (one end regular, one end microUSB) and an internet connection.
Let's get started. First we need to download the Raspbian OS image for the Pi Zero and flash it onto an SD card.
Head here to obtain the download. Download the Raspbian Jessie with Pixel zip file. It's 4GB in size, so it'll take a bit to download.
We used Etcher, which is a good image flashing program. The pictures above show the process of opening Etcher, selecting the image to be flashed, and then flashing it using Etcher.
Now that you've downloaded the Raspbian image and flashed it onto the SD card, it's time to set up the Raspberry Pi to run headless. What this means is that we won't require a keyboard, mouse, or monitor to run the Raspberry Pi Zero. You will be able to run everything over a USB cord or over wifi. To do this we need to set up SSH on the Raspberry Pi Zero, as well as set up a VNC and enable wifi.
There is a good tutorial on setting up SSH through USB here. Work through the tutorial to change the config.txt file, and the cmdline.txt file.
Newer versions of Raspbian Jessie have SSH disabled by default. To enable SSH, we have to create a file called "ssh" in the boot folder of the SD card. Open Notepad and save an empty file called "ssh" into the root folder on the SD card. Notepad adds a .txt extension, so in Explorer, navigate to the SD card and rename the file you just created to "ssh" with the .txt extension deleted. Now you should have a file named "ssh" in the boot folder with no extension. This will enable SSH on the Pi at boot up.
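If Python is installed on your PC, the empty file can also be created with a few lines of code, which avoids the hidden .txt extension. This is only a sketch: the function name enable_ssh is made up for this example, and the drive letter in the comment is a placeholder you would replace with wherever your SD card's boot partition is mounted.

```python
from pathlib import Path


def enable_ssh(boot_dir):
    """Create an empty file named 'ssh' (no extension) in the given boot folder."""
    ssh_file = Path(boot_dir) / "ssh"
    ssh_file.touch()  # creates the file if it does not already exist
    return ssh_file


# Example call, assuming the SD card's boot partition shows up as drive E:
# enable_ssh("E:/")
```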
Before opening an SSH connection, we need to download some software.
- Download PuTTY. PuTTY is a good terminal emulator. It allows you to control the Raspberry Pi through terminal commands from your PC.
- Download Bonjour Print Services. This may seem strange, but software included in Bonjour does networking magic which allows you to use the USB interface. For more information about why Bonjour is necessary, look at this.
Finally, we are ready to run the Raspberry Pi. Unplug the SD card and put it into your Raspberry Pi. Connect the Raspberry Pi to your computer using a USB cord. This will power on the Pi and start the boot up process. You may need to wait a minute or two to make sure the Pi is booted before trying to connect through SSH. If you are using a Pi Zero, make sure you are using the connector that says "usb", not the one that says "pwr".
Time to make sure everything works. Open PuTTY and enter the following settings.
- Where it says host name type: raspberrypi.local
- Make sure SSH is selected.
- Port: 22
- (optional) Once you have all of the settings set, you can save them by typing something like "RaspberryPi" in the saved sessions box, then hit save. Now whenever you open PuTTY in the future, just double click the name you saved, or load it and click Open. This can save a lot of time if you are working through SSH a lot.
- After setting the Hostname, Port, and SSH, click Open
- If everything goes smoothly you should see a terminal prompt that says "login as:"
- The default login for the raspberry pi is as follows:
- login as: pi
- password: raspberry
- (when typing the password, nothing appears on screen as you type; this can be confusing the first time)
Now you are logged in, and can control the Raspberry Pi through the terminal. As a test, you can type "pwd", and you should see something like "/home/pi". This means you are in the /home/pi folder on the Raspberry Pi. "pwd" is a Linux command that stands for print working directory. If you are not familiar with navigating Linux based systems using the terminal, it is worth taking some time to learn some Linux commands, as working with a Raspberry Pi often requires working through the terminal. You can always do this later as well once everything is set up!
Take a breather. You have done a lot so far. Now that SSH is set up, we can set up a VNC so we can actually see images and the GUI that comes with Raspbian.
We need to download some more software.
Download RealVNC on your PC. Go to the downloads page, select your platform, and click download.
RealVNC is the VNC service that comes with Raspbian Jessie. There is some good documentation for getting it set up on the website here, but to make sure we install the newest version of RealVNC on the Pi, we would need to set up wifi first.
At this stage, the Pi you are using will determine the easiest path forward. If you are using a Pi Zero (no W), then it may be easier to set up your wireless network first since you will have to configure it through SSH anyway.
If you have a Pi with built in Wifi or multiple USB ports (not Pi Zero), it may be easier to set up VNC first over USB or Ethernet and then connect to Wifi using the much more intuitive GUI. To do this, use the RealVNC setup tutorial (linked above), and skip the step that tells you to make sure you have the latest version. Once you download the VNC viewer on your computer, you can connect with the same host name you used for PuTTY (raspberrypi.local by default). You will have to enter your Pi's login info (i.e. username: pi password: raspberry). Now you should see the GUI on your Pi! You can click the Wifi symbol at the top right of the screen and set up your network. You may need to adjust the resolution on the Pi if some of the windows don't fully fit on the screen. Now use the RealVNC setup tutorial to set up VNC over the cloud. This way you can log in without fooling with IP addresses. Now that you can connect through the cloud, your Pi is set up. There is no need to read the rest of this tutorial.
If you have a Pi Zero (no W), let's go ahead and get connected wirelessly before setting up a VNC. To set up network connections using SSH, we will need to alter a configuration file. Let's back up the file in case something happens.
- Connect to the Pi using PuTTY. Then log in.
- Open the config file using nano by typing the following
- type: sudo nano /etc/network/interfaces
- Now you should see network information. Let's save this file somewhere else in case something goes wrong.
- hit: ctrl-o (this is how you save in nano)
- You will see highlighted at the bottom, "File Name to Write: /etc/network/interfaces"
- type something like: ~/interfaces_backup
- This saves the backup to your home directory (home directory is represented by '~' in Linux systems)
Now you are ready to follow this tutorial to set up wifi through the command line. Once you have changed the configuration, shut down the Pi. Unplug the USB, plug in a Wifi stick and then plug in power using the PWR connector. If you have done everything correctly, the Pi will boot and connect to the Wifi on its own.
Great, we set up Wifi, but now we can't use the USB to control the Pi. Here's the trick: we can use SSH wirelessly. All we need is the Pi's IP address. There are a couple of ways to get the address you need. You can log in to your router and find the Raspberry Pi on the network. Follow this tutorial to connect to the router, then look for something like "client list" for a list of IPs on the network. (It looks different on every router, so I can't go into specifics.) Alternatively you can use a tool like Advanced IP Scanner. This tool will simply scan for devices on your Wifi network. Once you have your Pi's IP address, you can plug it in to PuTTY. Just type the IP address where it says host name, and you should be able to log in just as if you were using a USB cable. Keep in mind, the IP address can change from time to time, so if you can't connect, you may need to run the IP scanner again to find the new address.
At this point you are ready to follow the RealVNC setup tutorial. Once you download the RealVNC viewer on your computer, you can start by setting up a direct connection. Open the viewer, and where it asks for host name, type in the IP address that you just discovered by using the IP scanner. You will have to enter your Pi's login info (i.e. username: pi password: raspberry). Now you should see the GUI on your Pi! You may need to adjust the resolution on the Pi if some of the windows don't fully fit on the screen. Now use the RealVNC setup tutorial to set up VNC over the cloud. This way you can log in without fooling with IP addresses. Now that you can connect through the cloud, your Pi is set up. Congratulations!
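As a third option, a short Python lookup run on your PC may find the address for you. This is a sketch resting on an assumption: it only works if your computer can still resolve the Pi's host name over Wifi (for example through Bonjour), which will not be true on every network. The function name resolve_host is made up for this example.

```python
import socket


def resolve_host(hostname):
    """Return the IPv4 address for hostname, or None if it cannot be resolved."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None


# Example call, assuming the Pi still uses its default host name:
# print(resolve_host("raspberrypi.local"))
```

If the lookup returns None, fall back to the router's client list or an IP scanner.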
Step 6: OpenCM Setup
In this tutorial, we're going to show you how to set up the OpenCM IDE and...
The first thing we need to do is download the OpenCM IDE, which allows us to write code for the OpenCM and upload it onto the machine itself. It comes as a zip file, and you can find the download page here.
Once you've downloaded the IDE, go here for a guide on how to install the IDE and set it up for first time use.
Step 7: Code Overview
In this section, we're going to go over the code given and get you started on what this code does. There are two main programs. One runs on OpenCM, and the other runs on the Raspberry Pi. On the Raspberry Pi there is a user interface, so that when you run the program you can type commands to control the robot. On the OpenCM the code governs the servos and the lower level hardware.
Q_Agent_class governs the possible moves of the crawler bot and shows where on the Q-matrix the values end up. S0 stands for the first servo, and S1 stands for the second servo farther down on the arm. The commands up, left, right, and down state where on the Q-map the crawler will go next. Next comes a complicated looking equation, which is the actual reward function of the bot itself. Compare it to the equation given earlier. Last is a section governing exploring behavior. It generates a random number and then tells the bot to go to that random position, thus exploring the possible state space and possibly discovering more beneficial behavior.
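The exploring behavior can be illustrated with a common pattern called epsilon-greedy selection. The sketch below is an assumption about how such a step might look, not a copy of Q_Agent_class; the function name choose_action and the example values are hypothetical.

```python
import random


def choose_action(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-known one.

    q_row maps each available action to its learned value.
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_row))  # explore: any action, chosen uniformly
    return max(q_row, key=q_row.get)    # exploit: action with the highest value


q_row = {"up": 0.1, "down": 0.9, "left": -0.2, "right": 0.4}
print(choose_action(q_row, epsilon=0.0))  # epsilon of zero never explores
```

A high epsilon makes the bot wander across the state map; a low epsilon makes it repeat what has worked so far.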
RPi_Controller acts as the controller for the overall bot. Here we see where the left and right commands come from, as this section tells the OpenCM what to do and where to go. If you want to, you can spend some additional time exploring the links between these two pieces of code. The Controller Thread file does the same thing, but runs it as a thread, which allows you to still program details into the Raspberry Pi while it's running.
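To show why running the controller as a thread is useful, here is a small self-contained sketch using Python's standard threading and queue modules. The loop body is a placeholder and the names learning_loop, commands, and log are hypothetical; the project's Controller Thread file may be organized differently.

```python
import queue
import threading
import time


def learning_loop(commands, log):
    """Run until a 'stop' command arrives, recording each loop iteration."""
    step = 0
    while True:
        try:
            if commands.get_nowait() == "stop":
                break
        except queue.Empty:
            pass               # no command waiting, so keep learning
        step += 1
        log.append(step)       # placeholder for one learning step
        time.sleep(0.01)


commands = queue.Queue()
log = []
worker = threading.Thread(target=learning_loop, args=(commands, log), daemon=True)
worker.start()

time.sleep(0.05)               # the main thread stays free for user input here
commands.put("stop")
worker.join(timeout=1.0)
print(worker.is_alive(), len(log) > 0)
```

Passing commands through a queue keeps the two threads from touching shared state directly, which is one simple way to avoid race conditions.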
On the OpenCM side, we have Q_Learn_Master. This piece of code runs the servos and sets up the rewards, and prints them to the serial line. This allows us to monitor the progress of the robot and determine if the actions rewarded correspond to the theoretical actions rewarded for certain movements of the robot.
Step 8: Uploading Code and Testing Full Assembly
Ok, it's finally time to get this thing moving! First, make sure you have downloaded or cloned the code from the github repository. Once you have the code, we need to transfer it to the Raspberry Pi. You should already have a VNC connection with the Pi Zero, so use the built in file transfer functionality of RealVNC to copy over the files: RPi_Controller and Q_Agent_Class. Make sure you put these in a directory that makes sense. Most importantly, you'll need to know the path which points to these files.
Next, open a terminal and type in:
$ python RPi_Controller.py
This will initialize the Q learning application. You can use the help function to look up commands in the user interface, but here are the basic commands to get the ball rolling:
>>Enter Command: gui on (hit enter)
Now, before we move further on the Pi side, you'll want to download the Q_LEARN_MASTER code to the OpenCM (this will need to happen at the beginning of each session). Open the Q_LEARN_MASTER OpenCM code in the OpenCM IDE and hit download. This will reinitialize this code on the OpenCM side.
Now, to double check that the Pi and OpenCM are communicating, type 'down' at the command prompt. The shoulder servo should move down. If this is successful, type in 'zero'. This will return the servos to a zero position.
Finally, when you're ready to watch your bot learn in real time, type in 'Run Q' and hit enter. A pygame console should pop up displaying the Q map and update live as the bot explores its state space. The bot begins with the following initial hyperparameters:
Epsilon = 0.9
Alpha = 0.8
Gamma = 0.8
To change these, first pause the Q algorithm by typing 'pause'.
Then type 'Eps', 'Alpha', or 'Gamma' followed by '= ' and the new value. To resume the Q learning with the new hyperparameter value, simply type 'Resume'.
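For readers curious how a command prompt might interpret such input, here is a hypothetical parser. It is a sketch under assumptions: the function name parse_hyperparameter and the exact accepted spelling are not taken from RPi_Controller, which may handle these commands differently.

```python
def parse_hyperparameter(command):
    """Parse text such as 'Eps = 0.5' into a (name, value) pair.

    Raises ValueError for an unknown name or a missing or non-numeric value.
    """
    name, _, value = command.partition("=")
    name = name.strip().lower()
    if name not in {"eps", "alpha", "gamma"}:
        raise ValueError("unknown hyperparameter: " + name)
    return name, float(value)


print(parse_hyperparameter("Eps = 0.5"))
```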
Step 9: Learn to Walk!
That's it! Try out different parameters, and see what works and what doesn't. We had some luck letting the crawler explore, then turning down Epsilon and letting the little guy try it out.
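Turning Epsilon down by hand can also be expressed as a schedule. The sketch below shows one common option, geometric decay with a lower bound. The function name and the decay and floor values are assumptions chosen for illustration, while the starting value of 0.9 matches the initial Epsilon given above.

```python
def decayed_epsilon(start, episode, decay=0.99, floor=0.05):
    """Shrink epsilon geometrically with each episode, never going below floor."""
    return max(floor, start * decay ** episode)


for episode in (0, 50, 100, 500):
    print(episode, round(decayed_epsilon(0.9, episode), 3))
```

Early episodes stay exploratory, and later ones mostly follow the learned Q values while the floor keeps a little exploration alive.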