Q Learning Game (Artificial Intelligence)


Hello, friends. I guess most DIY users here already know about Q-learning. It is a branch of Machine Learning, or better said, a branch of Reinforcement Learning.

Q-learning is a trial-and-error way of learning where the agent is rewarded for correct moves and punished for wrong moves, building up a Q matrix as it goes. Let me explain it in a better way.

Consider the image of the square grid.

Our bot starts at the yellow circle, i.e. box 16. She has to reach the green box. If she touches a red box she gets a penalty of -1. If she touches a black box she gets a penalty of -2. If she reaches the green box she is rewarded +100. Every move to any other box costs her -0.2.

So she has to find the shortest path, the one that rewards her the most. To our eyes there are clearly two such paths:

Path a = 16 - 17 - 18 - 19 - 20 - 15 - 10 - 5

Reward received = -0.2*7 + 100 = 98.6 pts

Path b = 16 - 17 - 12 - 7 - 2 - 3 - 4 - 5

Reward received = -0.2*7+100 = 98.6 pts

So here each box is a state, and each state has a reward. The Q matrix will look like this (see images).

In the matrix, 0 means the move is not possible, -1 and -2 are the penalties for red and black boxes, -0.2 is the cost of an ordinary move, and +100 is the reward for the green box.

This Q matrix is the BRAIN of the artificial bot. The bot will take random actions, moving only in 4 directions: Up, Down, Left and Right. Using this matrix the bot will try to maximize the reward and minimize the penalty by trial and error.
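To make the trial-and-error concrete, here is a small tabular Q-learning sketch for the grid above (boxes 1-20, start at box 16, goal at box 5). The red and black penalty boxes from the image are left out because their positions aren't listed in the text; only the -0.2 step cost and the +100 goal reward are modeled, and the hyperparameters are my own choices:

```python
import random

ROWS, COLS = 4, 5            # boxes 1..20, row-major: 1-5 on top, 16-20 at the bottom
START, GOAL = 16, 5
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.2   # discount, learning rate, exploration rate (assumed)

def neighbours(box):
    """Boxes reachable by one Up/Down/Left/Right move."""
    r, c = divmod(box - 1, COLS)
    out = []
    if r > 0:        out.append(box - COLS)   # up
    if r < ROWS - 1: out.append(box + COLS)   # down
    if c > 0:        out.append(box - 1)      # left
    if c < COLS - 1: out.append(box + 1)      # right
    return out

# Q value for each (state, next-box) pair; red/black penalties omitted here
Q = {s: {a: 0.0 for a in neighbours(s)} for s in range(1, ROWS * COLS + 1)}

def reward(nxt):
    return 100.0 if nxt == GOAL else -0.2

random.seed(1)
for episode in range(1000):
    s = START
    while s != GOAL:
        # epsilon-greedy: usually take the best known move, sometimes explore
        if random.random() < EPS:
            a = random.choice(neighbours(s))
        else:
            a = max(Q[s], key=Q[s].get)
        r = reward(a)
        best_next = 0.0 if a == GOAL else max(Q[a].values())
        Q[s][a] += ALPHA * (r + GAMMA * best_next - Q[s][a])
        s = a

# follow the learned greedy policy from the start
s, path = START, [START]
while s != GOAL and len(path) < 20:
    s = max(Q[s], key=Q[s].get)
    path.append(s)
print(path)
```

After training, the greedy path reaches box 5 in seven moves, matching one of the two shortest paths worked out above.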


Step 1: Installing Python 3.5 and Module

Go to the official Python website and download Python 3.5, 64-bit.

Then go to the folder where Python was installed and find the Scripts folder. Open it, then Shift+Right-Click on some empty white space to open Command Prompt (Admin) at that location. Your cmd should show an address like this:


Otherwise, if you have the Windows 10 Anniversary update, right-click the Windows icon and open Command Prompt as admin.

Then type cd \

Then type E: or F:, or whichever drive letter is where you installed Python. Skip this step if you installed it on the C drive.

Then, to locate the Scripts folder, type this (mine is on the E drive):

E:\>cd E:\Python3\Scripts

You will get this


After you reach here type this

pip install numpy

Let it install; in the meantime, do something productive.

then type

pip install pygame

then type

pip install tensorflow

pip install tflearn

If everything installed correctly, proceed.
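One quick way to confirm the modules are in place is to look them up from Python itself (this only checks that each module can be found, it does not load TensorFlow):

```python
# check that the required modules are locatable without fully importing them
import importlib.util

for mod in ("numpy", "pygame", "tensorflow", "tflearn"):
    found = importlib.util.find_spec(mod) is not None
    print(mod, "installed" if found else "missing")
```

If anything prints "missing", rerun the corresponding pip install before moving on.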

Step 2: Chase the Pong Ball Code

Our game is to catch a ball that falls from the top of the screen. If the bot misses the ball it is rewarded -1; if it catches the ball it is rewarded +1.
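Before diving into the game code, it helps to see the update the main loop will apply to the Q matrix in isolation. This is the textbook Q-learning rule; the learning rate lr is my own addition for illustration (the loop below adds its target directly), and the names Brain and y mirror the variables used in the code:

```python
import numpy as np

y = 0.98                       # gamma, the discount factor
lr = 0.5                       # learning rate (assumed, for illustration)
Brain = np.zeros([4000, 3])    # Q matrix: one row per state, one column per action

def q_update(s, a, r, s_next):
    """Move Q(s,a) toward r + y * max_a' Q(s_next, a')."""
    target = r + y * np.max(Brain[s_next, :])
    Brain[s, a] += lr * (target - Brain[s, a])

# e.g. state 0, action column 1, reward +1 for a catch, next state 1
q_update(0, 1, 1.0, 1)
print(Brain[0, 1])   # 0.5: halfway from 0 toward the target of 1.0
```

Each catch or miss nudges one cell of the matrix toward the reward plus the discounted best value of the next state, which is how good paddle positions slowly accumulate high Q values.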

The code begins like this

import sys
import random

import pygame as py            # IMPORTING PYGAME
from pygame.locals import *

import numpy as np             # IMPORTING NUMPY


Choose your game FPS (sounds cool, right?)

FPS = 30                     # lower for slow, higher for fast
Clk = py.time.Clock()

py.init()                    # pygame initialization

surface = py.display.set_mode((800, 600))   # Width = 800, Height = 600
py.display.set_caption('Q learning Example!')
font = py.font.SysFont(None, 28)            # font for the on-screen text

Left = 400
Top = 370
Width = 200
Height = 20

BLACK = (0, 0, 0)
GREEN = (0, 255, 0)
WHITE = (255, 255, 255)

rectangle = py.Rect(Left, Top, Width, Height)   # the paddle (note: py.Rect, capital R)

dict = {}                    # dictionary for the Q temp matrix (maps states to rows)

Brain = np.zeros([4000, 3])  # Q matrix: one row per state, one column per action

action = 1                   # -1 for left, 0 for still, 1 for right

score = 0
reward = 0
penalty = 0

cirX = 400                   # circle center x coordinate
cirY = 0                     # circle center y coordinate
radius = 10
pixeljump = 10               # how far the ball falls each frame

y = 0.98                     # gamma

class State:
    def __init__(self, rect, circle):
        self.rect = rect
        self.circle = circle

class Circle:
    def __init__(self, cirX, cirY):
        self.cirX = cirX
        self.cirY = cirY

def changeCirX(radius):
    # respawn the ball at a random column (a multiple of 100, inset by the radius)
    newx = 100 - radius
    mul = random.randint(1, 8)
    newx *= mul
    return newx

def calscore(rectangle, circle):
    # +1 if the ball lands within the paddle, else -1
    if rectangle.left <= circle.cirX <= rectangle.right:
        return 1
    return -1

def convert(state):
    # encode the state as a number and map it to a row of the Brain matrix
    alpha = float(str(state.rect.left) + str(state.rect.right) + str(state.circle.cirX) + str(state.circle.cirY))
    if alpha in dict:
        return dict[alpha]
    if len(dict) >= 1:
        dict[alpha] = max(dict.values()) + 1   # next free row
    else:
        dict[alpha] = 1
    return dict[alpha]

def best_action(state):
    # epsilon-greedy: usually take the best known action, sometimes explore
    if random.random() < 0.1:
        return random.randint(0, 2) - 1                        # random action in {-1, 0, 1}
    return int(np.argmax(Brain[convert(state), :])) - 1        # column 0/1/2 -> action -1/0/1


def afteraction(state, act):
    # apply the chosen action and let the ball fall one step
    rect = state.rect
    if act == 1:
        if rect.right + rect.width <= 800:
            rect = py.Rect(rect.left + rect.width, rect.top, rect.width, rect.height)
    elif act == -1:
        if rect.left - rect.width >= 0:
            rect = py.Rect(rect.left - rect.width, rect.top, rect.width, rect.height)
    C = Circle(state.circle.cirX, state.circle.cirY + pixeljump)
    return State(rect, C)

def newRectangle(rectangle, act):
    # same movement rule, applied to the on-screen paddle
    if act == 1:
        if rectangle.right + rectangle.width > 800:
            return rectangle
        return py.Rect(rectangle.left + rectangle.width, rectangle.top, rectangle.width, rectangle.height)
    elif act == -1:
        if rectangle.left - rectangle.width < 0:
            return rectangle
        return py.Rect(rectangle.left - rectangle.width, rectangle.top, rectangle.width, rectangle.height)
    return rectangle

start = py.time.get_ticks()

while True:
    for event in py.event.get():
        if event.type == QUIT:
            py.quit()
            sys.exit()

    if cirY >= 600 - radius:                      # the ball reached the bottom
        reward = calscore(rectangle, Circle(cirX, cirY))
        cirX = changeCirX(radius)                 # respawn the ball at a new column
        cirY = 0
    else:
        reward = 0
        cirY += pixeljump

    state = State(rectangle, Circle(cirX, cirY))
    action = best_action(state)
    newstate = afteraction(state, action)

    # Q update: reward plus the discounted best value of the next state
    Brain[convert(state), action + 1] += reward + y * np.max(Brain[convert(newstate), :])

    rectangle = newRectangle(rectangle, action)
    cirX = state.circle.cirX
    cirY = state.circle.cirY

    if reward == 1:
        score += reward
    elif reward == -1:
        penalty += 1

    el = (py.time.get_ticks() - start) / 1000     # elapsed seconds

    surface.fill(BLACK)
    py.draw.rect(surface, GREEN, rectangle)
    py.draw.circle(surface, WHITE, (cirX, cirY), radius)

    text = font.render('Score: ' + str(score), True, (243, 160, 90))     # update the score on the screen
    text1 = font.render('Penalty: ' + str(penalty), True, (125, 157, 207))
    text2 = font.render('Time Taken : ' + str(int(el / 60)) + 'm' + str(int(el % 60)) + 's', True, (0, 0, 255))

    surface.blit(text, (670, 10))    # render score
    surface.blit(text1, (10, 10))    # render penalty
    surface.blit(text2, (250, 10))

    py.display.update()              # update display
    Clk.tick(FPS)
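One thing the game above never does is save what the bot has learned, so the Brain matrix is lost when you close the window. A small optional addition (the filename brain.npy is my own choice):

```python
import numpy as np

Brain = np.zeros([4000, 3])   # stands in for the trained Q matrix

np.save("brain.npy", Brain)   # persist the learned Q matrix to disk
restored = np.load("brain.npy")
print(restored.shape)
```

Save on exit and load at startup, and the bot can pick up training where it left off instead of starting from zero every run.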


