Self-Learning Rock-Paper-Scissors Robot From Lego Mindstorms NXT!


Intro: Self-Learning Rock-Paper-Scissors Robot From Lego Mindstorms NXT!

Hi everyone!  This is my first instructable!

This is a REAL self-learning robot that learns how to play rock-paper-scissors!  It will learn how to beat a person 100% of the time!  A person is NOT needed to teach the robot how to play the game; it really does learn by itself!

This robot does not play rock-paper-scissors the way people play.  It first asks the user to input a move (rock, paper, or scissors).  The robot then calculates the best move to play and extends a retractable arm that shows its move (a Lego rock, paper, or a Lego scissors).  The player must then tell the robot whether the robot won, lost, or tied against the player.

While you may think that this robot is cheating, since it waits for the player to make a move, I did not program the robot to know the rules of the game!  The robot does not know that rock beats scissors, paper beats rock, or scissors beats paper!  Instead, the robot relies on the player to tell it whether it won/lost/tied, so it can learn from past successes/failures and use this information in the future!

This robot was featured at Robogames 2011 for the Lego Open challenge (first place!).  Moar pics and the source code will be attached!

STEP 1: You Shall Not Win! - Construction Overview

The robot is made out of the following pieces:

1x Lego Mindstorms NXT - the brain!
3x Lego Mindstorms Touch Sensors - User Inputs
3x Lego Mindstorms NXT Motors - Peripherals for the robot
Tetrix Pieces for the Base (Aluminum chassis)

This robot has a very simple construction; each motor drives a retractable arm that carries a Lego rock (left), paper (center, I didn't have any Lego paper handy!), and scissors (right).  The robot will extend the arm that indicates its move.

On the bottom of the robot are the user inputs: three touch sensors with gears on them.  Each sensor has two possible meanings.

Left Sensor: Rock and Yes!
Center Sensor: Paper and No!
Right Sensor: Scissors and Tie!

I will explain why each sensor has two options in the next step!
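For the curious, here's roughly what reading those buttons looks like in NXC (a sketch; readButton is my name for it, not a copy of the attached source):

// Poll the three touch sensors until one is pressed.
// Assumes main has already run SetSensor(IN_1, SENSOR_TOUCH), etc.
int readButton()
{
   while (true)
   {
      if (SENSOR_1 == 1) return 0;   // left:   Rock / Yes
      if (SENSOR_2 == 1) return 1;   // center: Paper / No
      if (SENSOR_3 == 1) return 2;   // right:  Scissors / Tie
      Wait(10);                      // don't hog the CPU
   }
}

The same routine serves both questions: first "what's your move?", and later "did I win?".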

STEP 2: How Does the Robot Learn If It Doesn't Cheat?! (Part 1)

As I've written on the first step, this robot first asks the (much smarter!) human to input a move through the touch sensors.  It will then look through a database and determine what the best possible move to make is.  After making that move, the human will have to tell the robot whether it won/lost/tied for that round.  If the robot does not know that rock > scissors, paper > rock, or scissors > paper (I did not program these rules into the robot), how can it use this information to learn?!
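In NXC terms, one round of the game boils down to this (a sketch with my own helper names -- readButton, chooseMove, extendArm, updateDatabase -- not the actual attached source):

int playerMove = readButton();                  // 1) human enters a move
int robotMove = chooseMove(playerMove);         // 2) robot consults its database
extendArm(robotMove);                           // 3) robot shows its move
int result = readButton();                      // 4) human reports win/loss/tie
updateDatabase(playerMove, robotMove, result);  // 5) robot learns from the result

The interesting parts are chooseMove and updateDatabase, which I'll sketch below.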

The robot creates a virtual database (for you computer geeks, it uses a three-dimensional array to do this!).  Think of this database like a Rubik's Cube.  The robot has to keep track of three things: 1) the move the player entered (rock, paper, or scissors); 2) the move the robot made (again rock, paper, or scissors); and 3) the result of this round (did the robot win, lose, or tie with the player?).  In this database, the robot records a probability of success for each move.  Each value is stored in the array, or (using the Rubik's Cube analogy) in one of the 27 little cubes.

For example, if the player chose ROCK, but the robot chose SCISSORS, the robot lost, so it will record a success rate of 0% for playing scissors when the player chooses rock in the future.
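Here's how that database can be declared in NXC (a sketch; the names and index order are my assumption, not the original source):

// 27 cells: [player's move][robot's move][result]
// Moves: 0 = rock, 1 = paper, 2 = scissors; results: 0 = loss, 1 = tie, 2 = win.
float successRate[3][3][3];

task main()
{
   // The example above: player ROCK (0), robot SCISSORS (2), robot lost (0):
   successRate[0][2][0] = 0.0;   // 0% success for that combination
}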

To encourage the robot to learn, I reward the robot using a system of virtual points!  An analogy is that of a little kid.  If I went up to the kid and said, "Hey, I'll give you $20 if you can learn to fly by yourself!", the kid will think, "Wow!  $20!  That's a good reward!  Let me try!".  The kid will first crawl, then walk, then run, and then jump in an attempt to fly and get the $20 reward.  The kid will eventually learn that he/she cannot fly without an airplane and will never collect the money.  However, along the way, the kid had learned how to crawl, walk, run, and jump!

I applied these principles to the robot!  Instead of cash (would I seriously waste my time trying to give my robot $20?!), I will give the robot a virtual point (+1) if the robot beats the player.  BUT, I will take away 10,000 virtual points from the robot (yes I'm mean) if the robot either loses or ties with the player.  Since the robot wants to maximize the number of points it gains, it will use the success probabilities in its database to achieve this goal.
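In code, that whole point scheme is one tiny function (a sketch; robotWon is my placeholder flag -- the attached source computes the reward from the player's button press):

// +1 virtual point for a win; -10,000 points for a loss OR a tie!
int scoreRound(bool robotWon)
{
   if (robotWon)
      return 1;
   return -10000;
}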

STEP 3: How Does the Robot Learn If It Doesn't Cheat?! (Part 2)

Would you believe it if I told you that to make this robot learn how to play RPS, it only requires FOUR variables?!  O_O.  

The main variable is called EPSILON.  This variable is also known as the exploration rate.  Epsilon starts out ridiculously high, which causes the robot to make random moves in the beginning of the game.  As the robot plays more (and consequently learns the best moves to make against the player), Epsilon decreases.  Since Epsilon gets smaller over time, the robot will slowly shift from random moves to using the success probabilities in its database against the player.
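This trick is known as an "epsilon-greedy" strategy, and in NXC it looks something like this (a sketch; chooseMove is my name, and pickBestMove is a hypothetical helper that scans the database for the highest-scoring response):

int epsilon = 90;   // out of 100; starts high and shrinks every round

int chooseMove(int playerMove)
{
   if (Random(100) < epsilon)
      return Random(3);               // explore: a random 0/1/2
   return pickBestMove(playerMove);   // exploit: trust the database
}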

The three other variables are: ALPHA, GAMMA, and KAPPA.

Alpha controls how much each new result affects the robot's learning.  That sounds confusing!  Actually, Alpha is intentionally set as close to zero as possible.  If a player lies (*gasp*) to the robot (say the player chose rock, and the robot chose paper, but the player claims that the robot lost), a low value of Alpha will cause the robot to mostly ignore the lie!  However, if Alpha is too low, then the robot will not learn as quickly.

Gamma is a reward rate (the discount applied to future rewards).  Gamma is set high (0.80) because as Gamma approaches 1, the robot is more likely to begin to use success probabilities sooner.

Kappa is a thoroughness value that helps the robot refine its probabilities.
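Alpha and Gamma meet in the standard Q-learning update, which looks roughly like this (a simplified sketch with example values, not a copy of the attached source; Kappa's bookkeeping is left out):

#define ALPHA 0.05   // tiny, so a single lie barely moves the needle
#define GAMMA 0.80   // high, so future rewards count for a lot

// new score = old score + Alpha * (reward + Gamma * best future score - old score)
float updateScore(float oldScore, int reward, float bestNextScore)
{
   return oldScore + ALPHA * (reward + GAMMA * bestNextScore - oldScore);
}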


STEP 4: Enough Talk! Let's Play a Sample Game!

Now that the theory part is done, let's play a sample game with the robot!  I wish I had taken a video of this bot before I tore it down, but here's a sample game:

Round 1:
Me: Rock
Robot: Scissors
Loss

Round 2:
Me: Paper
Robot: Paper
Tie

Round 3:
Me: Scissors
Robot: Rock
Win!

Round 4:
Me: Rock
Robot: Scissors
Loss

Round 5:
Me: Paper
Robot: Scissors
Win!

Round 6:
Me: Scissors
Robot: Paper
Loss

Round 7:
Me: Rock
Robot: Paper
Win!

Round 8:
Me: Paper
Robot: Scissors
Win!

Round 9:
Me: Scissors
Robot: Rock
Win!

If you're observant, you can see that in the beginning of the game, the robot makes RANDOM moves, but eventually the robot beats me every time!  If you've also kept track of the moves I've made, you should have noticed that I cycled through Rock -> Paper -> Scissors.  This is to help the robot learn what moves to make to counter mine.  Here are rounds 10-13 to prove that the robot can now beat me every time:

Round 10: 
Me: Scissors (again)
Robot: Rock
Win!

Round 11:
Me: Paper 
Robot: Scissors
Win!

Round 12:
Me: Scissors
Robot: Rock
Win!

Round 13:
Me: Rock
Robot: Paper
Win!

STEP 5: Source Code and Moar Pics!

Well, I know that this project may be confusing for people, but here's the source code if you want to try this at home!  I wrote it in NXC, a C-like programming language for Lego Mindstorms, using the BricxCC IDE (http://bricxcc.sourceforge.net/).  I've commented the code to help you understand it!

For more information, try Googling the following:
- Q-Learning
- Reinforcement Learning

Thank you for reading this instructable, and I hope you enjoyed it! :D

P.S. If you try downloading my source code, it may show up as a .tmp file.  Please rename it as "RPSGame.nxc" to open it with Bricxcc.

12 Comments

If I understand this correctly: this robot does NOT learn rock-paper-scissors. It only learns the rules

Interesting.  It's simulating a simple neural network.
There are programs around which will learn the quirks and frequencies of the person (or algorithm) it's playing against and make a good guess as to what they are going to play next.  That could be something to work towards.
Also, what's the range of the NXT colour sensor?  You could possibly use one to detect if the human was holding up a rock, scissors or paper marked with an appropriate colour.
(BTW, the word is 'more'.)
Hi!

Which color sensor are you referring to? The Retail 2.0 one can only see 6 colors: White/Black/Green/Red/Yellow/Blue, but the HiTechnic one (both 1.0 and 2.0) can see 17 different shades. I think it's 2 each of Red-Orange-Yellow-Green-Blue-Indigo-Purple, plus Black/Gray/White.

That's an interesting option... reducing the number of sensors from 3 Touch to 1 Color! I think people liked pushing buttons more, so I chose the buttons over the color sensor!

Actually the color sensor CAN detect ANY color


I'm not that familiar with the NXT accessories but I knew there was (at least) one colour sensor.  Of course you'd only need to differentiate three colours here.  Fair enough if people prefer pushing buttons though.
I just read through all of the code, and your reward system is pointless. It does nothing except set a variable to 1 or -10000. The only time it uses the reward variable is when setting it. There is nothing that tells it that 1 is good or -10000 is bad. Please correct me if I'm wrong.
I'm very sorry, I just checked again, and I somehow missed the point where it uses reward to update qscore or something like that. I don't know how I missed it; I even searched for it and didn't see it.
Very cool! I would appreciate it if you made a separate instructable going into the program and how it in particular works. I find it very interesting, and you seem to have it figured out.
WARGAMES!!!!!!!!!! Next you should make a robot to play tic-tac-toe with zero players.
You mentioned a 'reward points' system to encourage the robot to learn. But what would you do if the robot came to realize that the reward points were really meaningless, and started just messing with the human (always getting a tie, for instance?). This robot doesn't seem like it could take over the world or anything, but it might try.
I guess we're all screwed then! :D

Actually the program CAN tie with a person.... you just need to add an extra if statement in the code:

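// (i is the player's move and j is the robot's, with 0 = rock, 1 = paper, 2 = scissors)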
if((i == 0 && j == 0) || (i == 0 && j == 2) || (i == 1 && j == 0) || (i == 1 && j == 1) || (i == 2 && j ==1) || (i == 2 && j == 2))
{
reward = -10000;
}//end if

//otherwise, the robot must win; reward it with a virtual point!
else
{
reward = 1;
}//end else

Change this to:
if((i == 0 && j == 2) || (i == 1 && j == 0) || (i == 2 && j == 1))
{
reward = -10000; //For losing against human, lose 10K points!
}
else if((i == 0 && j == 0) || (i == 1 && j == 1) || (i == 2 && j == 2))
{
reward = 1; //1 point for tying
}
else
{
reward = 1; //1 point for winning
}

This way, the robot might tie or might decide to win, depending on how it's feeling :D I tested out this code before, and most of the time, it prefers to win....
Oops, forgot to mention the outputs!

Rock - Left Motor - Port A
Paper - Center Motor - Port B
Scissors - Right Motor - Port C

Left Touch - Port 1
Center Touch - Port 2
Right Touch - Port 3
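In NXC, those assignments look something like this at the top of the program (a sketch, not the original source; 180 degrees is just a guess at the arm travel):

task main()
{
   SetSensor(IN_1, SENSOR_TOUCH);   // left:   Rock / Yes
   SetSensor(IN_2, SENSOR_TOUCH);   // center: Paper / No
   SetSensor(IN_3, SENSOR_TOUCH);   // right:  Scissors / Tie

   // Motors A/B/C drive the rock, paper, and scissors arms, e.g.:
   RotateMotor(OUT_B, 75, 180);     // extend the paper (center) arm
}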