Object Detection With Sipeed MaiX Boards (Kendryte K210)




https://www.youtube.com/c/hardwareai My channel about robotics with ROS and machine learning!

As a continuation of my previous article about image recognition with Sipeed MaiX boards, I decided to write another tutorial, this time focusing on object detection. Some interesting hardware built around the Kendryte K210 chip has appeared recently, including the Seeed AI Hat for Edge Computing, M5Stack's M5StickV and DFRobot's HuskyLens (although that one has proprietary firmware and is targeted more at complete beginners). Because of its low price, the Kendryte K210 appeals to people who want to add computer vision to their projects. But, as is usual with Chinese hardware products, the tech support is lacking, and this is something I'm trying to improve with my articles and videos. Do keep in mind that I am not on the Kendryte or Sipeed developer team and cannot answer every question related to their products.

With that in mind, let's start! We'll begin with a short (and simplified) overview of how object detection CNN models work.


Step 1: Object Detection Model Architecture Explained

Image recognition (or image classification) models take the whole image as input and output a list of probabilities, one for each class we're trying to recognize. This is very useful if the object we're interested in occupies a large portion of the image and we don't care much about its location. But what if our project (say, a face-tracking camera) requires us to know not only the type of object in the image, but also its coordinates? And what about projects that require detecting multiple objects (for example, for counting)?

This is where object detection models come in handy. In this article we'll be using the YOLO (You Only Look Once) architecture and focus the explanation on the internal mechanics of this particular architecture.

We're trying to determine which objects are present in the picture and what their coordinates are. Machine learning is not magic and not "a thinking machine", but just an algorithm that uses statistics to optimize a function (the neural network) to better solve a particular problem, so we need to rephrase this problem to make it more "optimizable". A naive approach would be to have the algorithm minimize the loss (difference) between its prediction and the correct coordinates of the object. That would work pretty well as long as there is only one object in the image. For multiple objects we take a different approach: we divide the image into a grid and make our network predict the presence (or absence) of the object(s) in each grid cell. That sounds great, but it still leaves too much uncertainty for the network: how should it output the prediction, and what should it do when multiple objects have their centers inside one grid cell? We need to add one more constraint, the so-called anchors. Anchors are initial sizes (width, height), some of which (the ones closest to the object size) will be resized to the object size using some of the outputs of the neural network (the final feature map).
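To make the grid and anchor idea more concrete, here is a minimal sketch in plain Python. It is not code from the repo: the function names, the 7x7 grid (what a 224x224 input gives if the feature extractor downsamples by 32) and the anchor sizes are all assumptions chosen for illustration. It shows how a ground-truth box is assigned to a grid cell by its center and to the anchor whose shape is closest to it.

```python
def grid_cell(cx, cy, grid_size=7):
    """Map a normalized box center (0..1) to the (row, col) of its grid cell."""
    # Clamp so that a center lying exactly on the right or bottom edge (1.0)
    # still lands in the last cell instead of producing index == grid_size.
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row, col


def shape_iou(w1, h1, w2, h2):
    """IoU of two boxes that share the same center, so only their sizes matter."""
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union


def best_anchor(box_w, box_h, anchors):
    """Index of the anchor whose shape overlaps the box the most."""
    scores = [shape_iou(box_w, box_h, aw, ah) for aw, ah in anchors]
    return scores.index(max(scores))


# Hypothetical anchors as (width, height), measured in grid cells.
ANCHORS = [(1.0, 1.0), (2.0, 4.0), (5.0, 3.0)]

# A box centered at (0.52, 0.30) of the image, 1.8 x 3.5 grid cells in size.
row, col = grid_cell(0.52, 0.30)
anchor = best_anchor(1.8, 3.5, ANCHORS)
print(row, col, anchor)  # prints: 2 3 1
```

The tall box ends up in row 2, column 3 and is matched with the tall anchor, so during training only that anchor slot of that cell is asked to predict this object.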

So, here's a top-level view of what's going on when a YOLO architecture neural network performs object detection on an image. Based on the features detected by the feature extractor network, a set of predictions is made for each grid cell, which includes the anchor offsets, the anchor probability and the anchor class. Then we discard the predictions with low probability, and voila!
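The decoding step can be sketched in a few lines of plain Python. This follows the usual YOLOv2 formulas (sigmoid for the center offset and objectness, exp for the anchor scaling) rather than the repo's actual code; the function names, the 7x7 grid, the anchor size and the raw values are assumptions for illustration. Real detectors also apply non-maximum suppression after thresholding, which is left out here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_cell(raw, row, col, anchor_wh, grid_size=7):
    """Turn one raw anchor prediction into a normalized box, class and score.

    raw = (tx, ty, tw, th, to, class_logit_0, class_logit_1, ...)
    """
    tx, ty, tw, th, to = raw[:5]
    class_logits = raw[5:]
    # Center: offset inside the cell squashed to 0..1, then normalized to the image.
    cx = (col + sigmoid(tx)) / grid_size
    cy = (row + sigmoid(ty)) / grid_size
    # Size: the anchor is stretched or shrunk by exp(t), then normalized.
    w = anchor_wh[0] * math.exp(tw) / grid_size
    h = anchor_wh[1] * math.exp(th) / grid_size
    # Softmax over the class logits, then multiply by the objectness.
    top = max(class_logits)
    exps = [math.exp(c - top) for c in class_logits]
    probs = [e / sum(exps) for e in exps]
    class_id = probs.index(max(probs))
    score = sigmoid(to) * probs[class_id]
    return (cx, cy, w, h), class_id, score


def filter_predictions(predictions, threshold=0.5):
    """Keep only predictions whose score reaches the threshold."""
    return [p for p in predictions if p[2] >= threshold]


# Two made-up raw outputs for the same cell and anchor: one confident, one not.
confident = decode_cell((0.0, 0.0, 0.0, 0.0, 3.0, 2.0, 0.0), row=2, col=3, anchor_wh=(2.0, 4.0))
weak = decode_cell((0.0, 0.0, 0.0, 0.0, -3.0, 2.0, 0.0), row=2, col=3, anchor_wh=(2.0, 4.0))
kept = filter_predictions([confident, weak])
print(len(kept), kept[0][1])
```

With all offsets at zero the box sits in the middle of its cell and keeps the anchor's size; only the prediction with high objectness survives the threshold.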

Step 2: Prepare the Environment

My work is based on a wonderful project by penny4860, the SVHN yolo-v2 digit detector. There are many implementations of the YOLO architecture in Keras, but I found this one to work out of the box and to be easy to tweak to suit my particular use case.

Clone my GitHub repo for this project. It is a fork of penny4860's detector with some minor changes. I highly recommend installing all the necessary dependencies in an Anaconda environment to keep your project separated from others and avoid conflicts.

Download the installer here.

After installation is complete, create a new environment and install the necessary packages:

conda create -n yolo python=3.6 

Let's activate the new environment

conda activate yolo

A prefix with the name of the environment will appear before your bash prompt, indicating that you are now working in that environment.

We'll install the necessary packages from the text file requirements.txt (the next two commands need to be run inside the folder cloned from my GitHub repo)

pip install -r requirements.txt

Then we'll install the yolo package

pip install -e .

Step 3: Train an Object Detection Model With Keras

Now we can run a training script with the configuration file. Since the Keras implementation of the YOLO object detector is quite complicated, instead of explaining every relevant piece of code, I will explain how to configure the training and describe the relevant modules, in case you want to make some changes to them yourself.

Let's start with a toy example and train a raccoon detector. There is a config file, raccoon.json, inside the /config folder. We choose MobileNet as the architecture and 224x224 as the input size. Most of the parameters are pretty much self-explanatory, with the exception of:

jitter - image augmentation: resizing, shifting and blurring the image in order to prevent overfitting and add variety to the dataset. It also flips the image randomly, so set it to false if your objects are orientation-sensitive.

train_times, validation_size - how many times to repeat the dataset. Useful if you have jitter enabled

first_trainable_layer - allows you to freeze certain layers if you're using a pre-trained feature network

Now we will clone this GitHub repo, which is a raccoon detection dataset containing 150 annotated pictures.

Make sure to change the lines in the configuration file (train_image_folder, train_annot_folder) accordingly and then start the training with the following command:

python train.py -c config/raccoon.json
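If you prefer not to edit the configuration file by hand, the two folder entries can be patched with a short script before running the command above. This is only a sketch: the helper names, the nesting of the demo dictionary and the dataset paths are assumptions, which is why the helper searches the whole JSON tree for the key names instead of relying on a particular layout.

```python
import json


def set_keys(node, updates):
    """Set every key named in `updates`, wherever it sits in the JSON tree.

    Returns how many values were replaced.
    """
    changed = 0
    if isinstance(node, dict):
        for key in node:
            if key in updates:
                node[key] = updates[key]
                changed += 1
            else:
                changed += set_keys(node[key], updates)
    elif isinstance(node, list):
        for item in node:
            changed += set_keys(item, updates)
    return changed


def update_config(path, updates):
    """Load a JSON config, patch the given keys and write it back."""
    with open(path) as f:
        config = json.load(f)
    changed = set_keys(config, updates)
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return changed


# Demonstration on an in-memory stand-in (hypothetical layout and paths).
demo = {"train": {"train_image_folder": "", "train_annot_folder": "", "jitter": True}}
count = set_keys(demo, {
    "train_image_folder": "raccoon_dataset/images",
    "train_annot_folder": "raccoon_dataset/annotations",
})
print(count, demo["train"]["train_image_folder"])  # prints: 2 raccoon_dataset/images
```

On the real file you would call update_config("config/raccoon.json", {...}) with the folders of your own clone of the dataset and check that it reports two replaced entries.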

train.py reads the configuration from the .json file and trains the model with the yolo/frontend.py script. yolo/backend/loss.py is where the custom loss function is implemented, and yolo/backend/network.py is where the model is created (input, feature extractor and detection layers put together). yolo/backend/utils/fit.py is the script that implements the training process (I made a slight modification to it, which saves the Keras model to a .tflite file on Ctrl-C and at the end of training), and yolo/backend/utils/feature.py contains the feature extractors. If you intend to use the trained model with the K210 chip, you can choose between MobileNet and TinyYolo, but I've found that MobileNet gives better detection accuracy. By default it is hard-coded to use MobileNet with alpha 0.75, an input size of 224 and ImageNet weights; you can change these settings in yolo/backend/utils/feature.py.
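The "save on Ctrl-C and at the end of training" behaviour boils down to a simple pattern, sketched below in plain Python. This is not the code from fit.py: the function names are hypothetical, and the training loop and the .tflite export are replaced by stand-ins so that the control flow can be seen on its own.

```python
def train_with_export(run_training, export_model):
    """Run training, then export; Ctrl-C stops training but not the export."""
    interrupted = False
    try:
        run_training()
    except KeyboardInterrupt:
        # Stop early, but fall through so the model is still exported.
        interrupted = True
    export_model()
    return interrupted


# Stand-ins for the real training loop and the .tflite export (hypothetical).
log = []


def fake_training():
    log.append("epoch 1")
    raise KeyboardInterrupt  # simulate pressing Ctrl-C during the next epoch


def fake_export():
    log.append("exported")


was_interrupted = train_with_export(fake_training, fake_export)
print(was_interrupted, log)  # prints: True ['epoch 1', 'exported']
```

Any other exception still propagates without exporting, so a genuinely broken run does not leave a misleading model file behind.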

Since this is a toy example with only 150 images of raccoons, the training process should be pretty fast, even without a GPU, although the accuracy will be far from stellar. For work-related projects I've trained a traffic sign detector and a number detector; both datasets included a few thousand training examples.

When the training is done, it's time for the next step: model conversion to the .kmodel format.

Step 4: Convert It to .kmodel Format

When training ends (or on a keyboard interrupt), the model should be saved in both .h5 and .tflite formats in the folder where you ran the train.py script.

After that, clone the Maix toolbox repository and execute the following command in a terminal from the repository directory:

bash get_nncase.sh

This will download nncase, a toolkit for model conversion. Place a few images that have the same dimensions as the input layer of your network (224x224) in the image directory of the Maix toolbox folder. Then copy the trained model to the Maix toolbox folder and run the following command:

./tflite2kmodel.sh model.tflite 

If the conversion was successful you will see output similar to the one above. Now to the last step, actually running our model on Sipeed hardware!

Step 5: Run on Micropython Firmware

It is possible to run inference with our object detection model from C code, but for the sake of convenience we will use the MicroPython firmware and MaixPy IDE instead.

Download MaixPy IDE from here and the MicroPython firmware from here. You can use the Python script kflash.py to burn the firmware, or download a separate GUI flash tool here.

Copy model.kmodel to the root of an SD card and insert the SD card into the Sipeed Maix Bit (or another K210 device). Alternatively, you can burn the .kmodel to the device's flash memory. My example script reads the .kmodel from flash memory. If you are using an SD card, please change this line

task = kpu.load(0x600000)

to

task = kpu.load("/sd/model.kmodel")

Open MaixPy IDE and press the connect button. Open the raccoon_detector.py script and press the Start button. You should see a live stream from the camera with bounding boxes around ... well, raccoons. You can increase the accuracy of the model by providing more training examples, but do keep in mind that it is a fairly small model (1.9 M) and it will have trouble detecting small objects (due to the low resolution).

One of the questions I received in the comments on my previous article on image recognition was how to send the detection results over UART/I2C to another device connected to a Sipeed development board. In my GitHub repository you will find another example script, raccoon_detector_uart.py, which (you guessed it) detects raccoons and sends the coordinates of the bounding boxes over UART. Keep in mind that the pins used for UART communication differ between boards; this is something you need to check yourself in the documentation.
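Whatever the wiring, both sides have to agree on how the bounding boxes are packed into bytes. The sketch below shows one possible line-based format. It is a made-up protocol for illustration, not necessarily the one used in raccoon_detector_uart.py, and the function names are assumptions. It sticks to plain Python so that the same functions can be tried on a PC first; on the board the encoded bytes would be handed to the UART object's write method.

```python
def encode_detections(detections):
    """Pack detections into one ASCII line: class,x,y,w,h;class,x,y,w,h;..."""
    parts = []
    for class_id, (x, y, w, h) in detections:
        parts.append("{},{},{},{},{}".format(class_id, x, y, w, h))
    # The trailing newline marks the end of one frame's results.
    return (";".join(parts) + "\n").encode("ascii")


def decode_detections(line):
    """Inverse of encode_detections, for the receiving device."""
    text = line.decode("ascii").strip()
    if not text:
        return []  # a frame with no detections
    result = []
    for part in text.split(";"):
        class_id, x, y, w, h = (int(v) for v in part.split(","))
        result.append((class_id, (x, y, w, h)))
    return result


# One raccoon (class 0) at x=52, y=40, 96 pixels wide and 120 pixels high.
message = encode_detections([(0, (52, 40, 96, 120))])
print(message)  # prints: b'0,52,40,96,120\n'
print(decode_detections(message))  # prints: [(0, (52, 40, 96, 120))]
```

A text format like this is easy to debug with a serial monitor; a binary format would be more compact, but at a few boxes per frame the difference rarely matters.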

Step 6: Summary

The Kendryte K210 is a solid chip for computer vision: flexible, albeit with limited memory available. So far in my tutorials we have covered using it for recognizing custom objects, detecting custom objects and running some OpenMV-based computer vision tasks. I know for a fact that it is also suitable for face recognition, and with some tinkering it should be possible to do pose detection and image segmentation (for example, for monocular depth estimation). Feel free to fork my GitHub repos and do some awesome things yourself!

Here are some articles I used in writing this tutorial; have a look if you want to learn more about object detection with neural networks:

Bounding box object detectors: understanding YOLO, You Look Only Once

Understanding YOLO (more math)

Gentle guide on how YOLO Object Localization works with Keras (Part 2)

Real-time Object Detection with YOLO, YOLOv2 and now YOLOv3

I hope you can use the knowledge you have now to build some awesome projects with machine vision! You can buy Sipeed boards here; they are among the cheapest options available for ML on embedded systems.

Add me on LinkedIn if you have any questions and subscribe to my YouTube channel to get notified about more interesting projects involving machine learning and robotics.


    16 Discussions


    Question 18 hours ago on Step 6

    Hi ! Thank you for the tutorial, the raccoons works like a charm.

    However, when I try to customize it (to recognise numbers/letters so that my robot would follow S,1,2,3,4,5,E) it trains for a few images (between ~30 and ~80, seemingly randomly) before failing with an [ IndexError: index 7 is out of bounds for axis 0 with size 7 ] error.

    I changed the images/annotations, and the classes in the raccoon.json file, did I miss something ?

    Thanks in advance

    1 answer

    Answer 18 hours ago

    The full error message is the following:

    File "train.py", line 77, in <module>
    File "C:\CNN_robot\yolo\frontend.py", line 140, in train
    saved_weights_name = saved_weights_name)
    File "C:\CNN_robot\yolo\backend\utils\fit.py", line 119, in train
    max_queue_size = 8)
    File "C:\Users\Me\AppData\Local\Continuum\anaconda3\envs\yolo\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
    File "C:\Users\Me\AppData\Local\Continuum\anaconda3\envs\yolo\lib\site-packages\keras\engine\training.py", line 1732, in fit_generator
    File "C:\Users\Me\AppData\Local\Continuum\anaconda3\envs\yolo\lib\site-packages\keras\engine\training_generator.py", line 185, in fit_generator
    generator_output = next(output_generator)
    File "C:\Users\Me\AppData\Local\Continuum\anaconda3\envs\yolo\lib\site-packages\keras\utils\data_utils.py", line 625, in get
    File "C:\Users\Me\AppData\Local\Continuum\anaconda3\envs\yolo\lib\site-packages\six.py", line 696, in reraise
    raise value
    File "C:\Users\Me\AppData\Local\Continuum\anaconda3\envs\yolo\lib\site-packages\keras\utils\data_utils.py", line 610, in get
    inputs = future.get(timeout=30)
    File "C:\Users\Me\AppData\Local\Continuum\anaconda3\envs\yolo\lib\multiprocessing\pool.py", line 644, in get
    raise self._value
    File "C:\Users\Me\AppData\Local\Continuum\anaconda3\envs\yolo\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
    File "C:\Users\Me\AppData\Local\Continuum\anaconda3\envs\yolo\lib\site-packages\keras\utils\data_utils.py", line 406, in get_index
    return _SHARED_SEQUENCES[uid][i]
    File "C:\CNN_robot\yolo\backend\batch_gen.py", line 87, in __getitem__
    y_batch.append(self._netout_gen.run(norm_boxes, labels))
    File "C:\CNN_robot\yolo\backend\batch_gen.py", line 164, in run
    y += self._generate_y(best_anchor, label, norm_box)
    File "C:\CNN_robot\yolo\backend\batch_gen.py", line 179, in _generate_y
    y[grid_y, grid_x, best_anchor, 0:4] = box
    IndexError: index 7 is out of bounds for axis 0 with size 7


    Question 22 days ago

    Good project, everything works for me, and I could make my own object detector. But I have a question: what should I do to train on more than two objects? I have already tried with 3 different objects, but it doesn't produce the .tflite file, just the .h5. I tried to convert it to a .tflite file and then turn it into a .kmodel, but it doesn't work.

    2 answers

    5 weeks ago

    I just purchased the M5StickV K210 device. Will this code run on it? Their quick start guide has you run the training by sending a compressed zip file of images to their website. I would like to try your code and do the training on my PC. Also, their example code only does 10 classifications. I would really like to do more than 10 objects if possible.

    I will get your code and try it out to see if I can get it working. Thanks for this tutorial.

    1 reply

    Reply 16 days ago

    No problem. Yes, it should work with the M5StickV as well, although keep in mind that this is a tutorial on training a detection model, not a classification model. The datasets for detection and classification are different.


    2 months ago

    Just wanted to say a really BIG thank you for your fantastic answer, and congratulations on another great tutorial.
    Thanks to you I found a way to get everything I needed for my robots (CNN object detection, sent via UART to a SoC) and, with ROS, to link the coordinates to a point cloud (PCL) to get a 3D location of the object and achieve semantic SLAM.
    I will need to learn everything from zero to get there: making my own model, learning MicroPython (OpenMV implementation included) and making my own ROS node for OpenMV/Maix communication (I plan to fork and modify one from OpenMV), plus the PCL and semantic SLAM implementation.
    With this I will have entertainment learning for probably years, all thanks to you.
    The part about "with some tinkering it should be possible to do pose detection and image segmentation (for example, for monocular depth estimation)" sounds very similar to what I have planned for semantic SLAM (including segmentation); any further information about this would be great.

    2 replies

    Reply 6 weeks ago

    It does sound like "entertainment learning for probably years", haha
    What is the end goal of your project?


    Reply 6 weeks ago

    I have planned 6 robots, with different features planned for each of them: some in hardware, some in sensors, others in AI, and each one is a whole project in itself. The project you helped me with is an AI project that I will incorporate into at least three robots at LV1 and into the other three at LV2, with more features.
    LV1 is an inference model that detects different kinds of ground surface, such as parquet, carpet, gravel, sand, grass and high grass, and marks them as an obstacle or a drivable surface in the ROS navigation stack. LV2 goes further: with exotic sensors plus ML it will do semantic SLAM while avoiding segmentation (via computer vision) and object classification (CNN), the heaviest part of the semantic work, and so can bring semantic SLAM to low-power SoCs such as the RPi 4, UP Board and Nano and to small robots, not just to autonomous cars or 100k drones.


    7 weeks ago

    Hmm...doesn't work for me...installed modules from requirements.txt....

    During training I get at the end:

    E tensorflow/core/grappler/grappler_item_builder.cc:637] Init node conv1/kernel/Assign doesn't exist in graph

    It creates model.h5 and model.tflite, but only if I use a single label in the config .json. Otherwise only model.h5 is generated when I want images from more than one class, like in the digit example.

    When I train on the example raccoon images, generate the model, convert it for the K210 and flash it, no raccoon is recognized....

    2 replies

    Reply 7 weeks ago

    Okay..got the raccoon example running...but no chance with custom images...

    I don't have many images of different cups/mugs at home to be recognized...and even fewer for the kmodel conversion....

    Even when I cover the camera it displays loads of rectangles of recognized cups/mugs...

    Is there a bare minimum of images to be used either for training and for the tflite2kmodel conversion?


    Reply 6 weeks ago

    Glad that you got it working with example dataset!
    Now, for a custom dataset, I would say the bare minimum would be 200-300 images, and that might not give stable results. I trained a model for detecting 10 classes of objects (traffic signs) with 3000+ images.
    You can check the performance of the model on computer by using evaluate.py script. You'll need to edit it to match the location of your validation dataset though.
    Good luck!


    8 weeks ago

    I get a relu6 error during training, although the SVHN yolo-v2 digit detector trains OK.
    Do you have specific versions of keras and tensorflow?

    Thanks for your fantastic work

    2 replies

    Reply 7 weeks ago

    Yes, tensorflow 1.14.0, keras 2.3.0. If you do pip install -r requirements.txt it will install these versions. I added requirements.txt to my github repo yesterday


    Reply 7 weeks ago

    Now it works like a charm!.

    Thank you very much