Introduction: Self-learning Chaotic Robot
Are you interested in machine learning, AI and robots? You don't need to work at some fancy university. This is a description of my chaotic robot. It is a very simple robot that demonstrates how to use self-learning code and how to implement it on an Arduino platform, in this case an Arduino Due. It is a very cheap platform!
The code evolves the robot so that it learns to crawl. It gets feedback from the mouse that is dragged behind it.
The code is "genetic". This means that a number of individuals are tested, and the best ones are kept and have babies. In other words, the code evolves in an evolutionary way.
Step 1: Hardware AKA the Robot
- 1 Arduino Due
- 8 microservos
- 1 PS/2 mouse
- 1 level shifter
- some variant of a sensor shield or similar (I got tired of the sensor shield and soldered my own)
- external 5V power supply for the servos
- some scrap metal pieces, some glue and some steel wire. And tape!
So put the Due on the floor. Put the servos in a ring around it. Join them together with scrap metal, glue and wire. This is the chaos part! Since the design is chaotic, it is hard to predict how the robot must move in order to crawl. This is why self-learning code is the way to go!
Tip: use some fairly heavy metal parts; it makes it easier for the robot to move.
Connect the servos to the Due. In my case they are connected to D39, D41, D43, D45, D47, D49, D51 and D53.
Connect the servos to the external 5V power supply. For this, build some kind of shield, or use a sensor shield or similar. Do NOT feed the servos from the Due's 5V pin: it cannot supply enough current, and the Due will burn. I used a small prototype board to distribute the 5V to all servos. This board also holds the level shifter for the PS/2 mouse clock and data lines, and it feeds the mouse with 5V. Remember to connect ground from the external power supply to the Arduino Due ground! The schematics show how to connect it all.
Connect the PS/2 mouse to power (5V) and ground. Connect the clock and data lines of the PS/2 mouse to the Due through a level shifter (the Due runs at 3.3V, PS/2 at 5V). Connect clock to D12 and data to D13.
For details on the PS/2 protocol, this is a very good instructable:
The PS/2 library by jazzycamel that I have used: https://github.com/jazzycamel/PS2Mouse
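To give an idea of what comes over the clock and data lines: a standard PS/2 mouse reports movement as 3-byte packets, and a library normally decodes them for you. The snippet below is only a minimal sketch of that decoding step in plain C++. The names MouseDelta and decodePs2Packet are made up for this illustration and are not taken from the PS2Mouse library or from the robot code.

```cpp
#include <cstdint>

// One decoded PS/2 mouse movement report (hypothetical helper type).
struct MouseDelta {
    int dx;         // signed X movement in counts
    int dy;         // signed Y movement in counts
    bool overflow;  // true if the mouse flagged an X or Y overflow
};

// Decode a standard 3-byte PS/2 movement packet.
// Byte 0 (status): bit 4 = X sign, bit 5 = Y sign,
//                  bit 6 = X overflow, bit 7 = Y overflow.
// Bytes 1 and 2:   low 8 bits of the X and Y movement.
MouseDelta decodePs2Packet(uint8_t status, uint8_t xLow, uint8_t yLow) {
    MouseDelta d;
    // The sign bit in the status byte acts as a ninth bit (two's complement).
    d.dx = static_cast<int>(xLow) - ((status & 0x10) ? 256 : 0);
    d.dy = static_cast<int>(yLow) - ((status & 0x20) ? 256 : 0);
    d.overflow = (status & 0xC0) != 0;
    return d;
}
```

For example, the packet 0x18, 0xFB, 0x00 decodes to dx = -5 and dy = 0. Packets with the overflow flag set are probably best ignored when scoring a run.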
Step 2: The Code
First, let me say: I am NOT a programmer. Some parts are very long-winded; a skilled programmer could of course shorten them.
The code is self-learning, and this is the core of the project. This is the fun part! It means that the robot evolves and gets better and better, in this case at crawling. The amazing thing is that the robot will evolve toward whatever you give it feedback on. In this case it drags a PS/2 mouse, and the farther the mouse is dragged, the more points it gets.
This also means that you can use this code to train your robot to do something else, as long as it can be measured and fed back to the robot!
As you can see in the images, the mouse is dragged on a thin cord. At first it was dragged by the mouse cable. However, the cable is rather stiff, so the robot learned to shake the mouse instead of dragging it. Shaking produced high points...
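The shaking problem shows how much the scoring matters. The snippet below is not how the robot code scores a run; it is just a small plain C++ sketch, with made-up names (Step, pathLengthScore, netDisplacementScore), comparing two ways of turning mouse reports into points. Adding up every movement rewards shaking, while the net displacement lets back-and-forth movement cancel out.

```cpp
#include <cstdlib>
#include <vector>

struct Step { int dx; int dy; };  // one mouse report, in counts (hypothetical type)

// Path length: every movement counts, so shaking back and forth scores high.
long pathLengthScore(const std::vector<Step>& steps) {
    long score = 0;
    for (const Step& s : steps) score += std::abs(s.dx) + std::abs(s.dy);
    return score;
}

// Net displacement: movements in opposite directions cancel each other.
long netDisplacementScore(const std::vector<Step>& steps) {
    long x = 0, y = 0;
    for (const Step& s : steps) { x += s.dx; y += s.dy; }
    return std::labs(x) + std::labs(y);
}
```

A shake of +5, -5, +5, -5 counts in X gets 20 points as path length but 0 points as net displacement. A steady drag of +3, +3 gets 6 points either way.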
The code uses 50 individuals. The core of this is an array of 50x50 bytes.
An individual is an array of bytes. When an individual is used to run the robot, it is sent to a function in the code called "tolken" (Swedish for "the interpreter").
At the start of a run there are 8 variables m1, m2, m3, m4, m5, m6, m7 and m8 (one for each servo). In this robot they all have constant starting values. In "tolken" the m's are transformed in a switch/case loop depending on the values of the individual. For example, a value of "1" executes the following: m1 = m1 + m2.
If an individual is 1,2,3,0,0,0,0..... then the m's will be transformed in the following way:
m1 = m1 + m2;
m1 = m1 + m3;
m1 = m1 + m4;
Tolken is a list of 256 different mathematical operations, so every possible value in an individual's array represents a mathematical change to the m values.
The tolken process is run 4 times over, with a read-out after every lap, generating four different motor codes for each "m". The motor codes are the values that are later sent to the servos.
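The idea can be sketched in a few lines of plain C++. This is not the code from the repository, only a minimal illustration. It includes just the three operations from the example above, it assumes that any other value (including 0) does nothing, and it assumes that the m values carry over from one lap to the next. The names applyGene and runTolken are made up.

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Motors = std::array<int, 8>;         // m1..m8 stored as m[0]..m[7]
using MotorCodes = std::array<Motors, 4>;  // one read-out per lap

// Apply one byte of the individual to the motor variables.
// Only the operations spelled out in the text are included here;
// the real list has 256 cases.
void applyGene(uint8_t gene, Motors& m) {
    switch (gene) {
        case 1: m[0] = m[0] + m[1]; break;  // m1 = m1 + m2
        case 2: m[0] = m[0] + m[2]; break;  // m1 = m1 + m3
        case 3: m[0] = m[0] + m[3]; break;  // m1 = m1 + m4
        default: break;                     // ASSUMPTION: treated as "do nothing"
    }
}

// Run the whole individual 4 times and read out the m values after each lap.
// m is passed by value, so the caller's starting values are left untouched.
MotorCodes runTolken(const std::vector<uint8_t>& individual, Motors m) {
    MotorCodes codes{};
    for (int lap = 0; lap < 4; ++lap) {
        for (uint8_t gene : individual) applyGene(gene, m);
        codes[lap] = m;  // ASSUMPTION: m carries over into the next lap
    }
    return codes;
}
```

With starting values 1 to 8 for m1 to m8 and the individual 1,2,3,0, the four read-outs of m1 are 10, 19, 28 and 37, while m2 to m8 stay unchanged. How the values are scaled into a valid servo range is left out here.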
In every step of the evolution, 4 individuals compete in crawling. The best two become parents of two babies, and the babies replace the two worst individuals. When babies are made, a slice of "genetic code" from one parent is traded for a slice from the other parent; this creates two new individuals.
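One such step could look roughly like this in plain C++. It is a sketch under assumptions, not the code from the repository: the function names are made up, the slice boundaries are simply passed in, and how the real code picks the competitors and the slice is not shown.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Individual = std::vector<uint8_t>;

// Trade the bytes in [begin, end) between two parents, giving two babies.
void crossover(const Individual& a, const Individual& b,
               std::size_t begin, std::size_t end,
               Individual& babyA, Individual& babyB) {
    babyA = a;
    babyB = b;
    for (std::size_t i = begin; i < end && i < a.size() && i < b.size(); ++i) {
        std::swap(babyA[i], babyB[i]);
    }
}

// One competition: idx holds the population indices of the 4 competitors,
// score holds their points in the same order.
void tournamentRound(std::vector<Individual>& pop,
                     const std::array<std::size_t, 4>& idx,
                     const std::array<long, 4>& score,
                     std::size_t sliceBegin, std::size_t sliceEnd) {
    // Rank the competitors from best to worst.
    std::array<std::size_t, 4> order = {0, 1, 2, 3};
    std::sort(order.begin(), order.end(),
              [&](std::size_t x, std::size_t y) { return score[x] > score[y]; });
    // The best two are the parents; their babies replace the worst two.
    Individual babyA, babyB;
    crossover(pop[idx[order[0]]], pop[idx[order[1]]],
              sliceBegin, sliceEnd, babyA, babyB);
    pop[idx[order[2]]] = babyA;
    pop[idx[order[3]]] = babyB;
}
```

The parents themselves stay in the population unchanged, so a good individual is never lost in a single step.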
If no individual performs at all, the individuals are mutated to generate new ones.
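A minimal sketch of such a mutation step in plain C++ follows. The threshold minScore, the mutation rate and the function names are assumptions made for this illustration; the repository code may decide and mutate differently. On the Arduino you would use its own random() function instead of std::mt19937.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

using Individual = std::vector<uint8_t>;

// True when no competitor reached minScore (hypothetical threshold).
bool nobodyPerformed(const std::vector<long>& scores, long minScore) {
    return std::none_of(scores.begin(), scores.end(),
                        [minScore](long s) { return s >= minScore; });
}

// Overwrite each byte with a random byte with probability `rate`.
// Returns the number of bytes that were overwritten.
int mutate(Individual& ind, double rate, std::mt19937& rng) {
    std::bernoulli_distribution flip(rate);
    std::uniform_int_distribution<int> anyByte(0, 255);
    int changed = 0;
    for (uint8_t& gene : ind) {
        if (flip(rng)) {
            gene = static_cast<uint8_t>(anyByte(rng));
            ++changed;
        }
    }
    return changed;
}
```

A low rate keeps most of what an individual has already learned, while a high rate is closer to starting over with a random individual.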
You can find the code at GitHub: https://github.com/ola667/Self-learning-robot
Step 3: How to Train It?
This is the tricky part. In order to train the robot properly, you need to "reset" it after every run. This means that you have to put it in the same position every time.
I have put a few checkpoints in the code to ensure that the robot is in its starting position.
So align the robot and let it run.
It tests 4 individuals and then chooses the best 2 to be parents. After replacing the worst with the babies, it prints some data on the performance of the individuals. It also prints the 50x50 array. It is wise to copy this into an Excel sheet or similar (or write some neat code in Processing). If the Due resets (this happens for various reasons), you will then not lose your training work. You can copy/paste the array into the code and keep on training where you left off.
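Copying is easiest if the array is printed in a form that can be pasted straight back into the sketch. The snippet below is a plain C++ sketch of that idea and not the print routine from the repository. The function name formatPopulation and the array name passed to it are made up. On the Due you would send the same text with Serial.print instead of building a string.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Format the population as a C array initializer that can be pasted
// back into the Arduino sketch ("byte" is the Arduino 8-bit type).
std::string formatPopulation(const std::vector<std::vector<uint8_t>>& pop,
                             const std::string& name) {
    std::ostringstream out;
    const std::size_t cols = pop.empty() ? 0 : pop[0].size();
    out << "byte " << name << "[" << pop.size() << "][" << cols << "] = {\n";
    for (std::size_t r = 0; r < pop.size(); ++r) {
        out << "  {";
        for (std::size_t c = 0; c < pop[r].size(); ++c) {
            if (c > 0) out << ",";
            out << static_cast<int>(pop[r][c]);  // print as a number, not a char
        }
        out << "}" << (r + 1 < pop.size() ? "," : "") << "\n";
    }
    out << "};\n";
    return out.str();
}
```

For the full population this gives 50 rows of 50 numbers each, wrapped in a declaration that compiles as-is.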
My robot learned to crawl after a couple of hours. Download the video to see it crawl. It did not go in the direction I thought it would!
Also try different floors! My robot performed best on a nylon carpet.
1. It would be better to have a separate Nano read the PS/2 mouse and send the processed distance moved over serial to the Due. The reading of my PS/2 mouse is a bit shaky. This is the reason for the mouse reading/clearing parts of the code.
2. Some sort of test rig that dragged the robot back to its starting position would speed up the training.
3. I think it is wise to train it a bit slower than I did. Slower training ensures that it is trained "in the right direction". Using the mean performance of several test runs could be one way to do this.
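The mean-performance idea from point 3 could be sketched like this in plain C++. This is not part of the repository code and the function name meanScore is made up. Each individual is run several times and ranked on its average score, so a single lucky run counts for less.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Mean score over repeated runs of the same individual.
double meanScore(const std::vector<long>& runs) {
    assert(!runs.empty());  // at least one run is needed
    const long total = std::accumulate(runs.begin(), runs.end(), 0L);
    return static_cast<double>(total) / static_cast<double>(runs.size());
}
```

An individual that scores 0, 0 and 90 has the best single run but a mean of 30. One that scores 40, 40 and 40 has a mean of 40 and would be ranked higher. The price is that every individual needs several runs, so training takes correspondingly longer.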