Introduction: Data Collection With Raspberry Pi

The GPIO pins on the Raspberry Pi give it great extensibility to all kinds of sensors. Many projects use this ability to collect data on the Pi and do interesting things with it. This article gives an overview of how you can store data on the Raspberry Pi, how to structure more complex data models, and how to export your data for analysis.

This instructable is written based on our experience building the VoteWithYourFeet project. In short, it's two doorways standing in the middle of the street, with a question and two options displayed on a sign above. The installation changes the question every 5 minutes and counts the number of people who walk through each door. All related code can be found on the VWYF GitHub page.

This instructable also assumes the reader has a basic understanding of computer programming; otherwise some of the content might be hard to follow.

Step 1: Writing Data to a CSV File

If all you need is to store a small amount of structured data on the SD card and analyze it later, the easiest approach is simply writing your data to a CSV file. Here is an example CSV file:

Year,Make,Model,Length
1997,Ford,E350,2.34
2000,Mercury,Cougar,2.3

The first row usually holds the column titles, and each following row is one data point, with values separated by commas.

In essence, a CSV file is just a plain-text file in this format. Here is an example of how to write data to a CSV file using Python:

# open data.csv in append mode; the "with" block closes the file for us
with open('./data.csv', 'a') as file:
    file.write("Year,Make,Model,Length")
    file.write("\n")

    # faking a list of data; in reality it may be reading from a sensor
    my_data = [
        ["1997", "Ford", "E350", "2.34"],
        ["2000", "Mercury", "Cougar", "2.3"],
    ]

    for d in my_data:
        # note: ','.join(["a", "b", "c"]) gives you the string "a,b,c"
        file.write(','.join(d))
        file.write("\n")

Alternatively, the Python standard library comes with a csv module that makes it even easier to read and write data in CSV format.
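
For example, here is a minimal sketch (assuming Python 3) of the same logging done with the csv module, which handles commas, quoting, and newlines for you:

import csv

with open('./data.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    # write the header row, then one data row per reading
    writer.writerow(["Year", "Make", "Model", "Length"])
    writer.writerow(["1997", "Ford", "E350", "2.34"])
    writer.writerow(["2000", "Mercury", "Cougar", "2.3"])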

One advantage of the CSV format is that you can open a CSV file with any spreadsheet software, such as Microsoft Excel, for visualization and data analysis.

Step 2: SQLite - Modeling Your Data With a Tiny Database

A CSV file is easy to use for simple data logging. However, if you have a slightly more complicated data model, or if you want to search and filter the collected data, programming against a CSV file quickly becomes cumbersome and inefficient. For the VoteWithYourFeet project, we ended up choosing SQLite as the data store on the Raspberry Pi client:

SQLite is a popular choice for the database engine in cellphones, PDAs, MP3 players, set-top boxes, and other electronic gadgets. SQLite has a small code footprint, makes efficient use of memory, disk space, and disk bandwidth, is highly reliable, and requires no maintenance from a Database Administrator.

Below is an example of using Python to set up a SQLite database and add data to it:

import sqlite3
import datetime

def init():
  conn = sqlite3.connect('data.db')
  create_table(conn)
  return conn

def create_table(conn):
  c = conn.cursor()
  c.execute('''CREATE TABLE IF NOT EXISTS answers (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      questionId INTEGER,
      answer INTEGER,
      createdAt TIMESTAMP
      )''')
  conn.commit()

def add_new_answer(conn, questionId, answer):
  now = datetime.datetime.now().ctime()
  c = conn.cursor()
  c.execute('''INSERT INTO answers (questionId, answer, createdAt) VALUES (?, ?, ?)''',
      (questionId, answer, now))
  conn.commit()

You can learn more about using SQLite with Python here.

This approach works well, but writing SQL statements by hand is a lot of work. If your application has multiple data tables or needs to perform different kinds of updates, consider using an ORM (Object Relational Mapper) library. It lets you write more readable and maintainable code compared to raw SQL statements, especially for larger projects.

SQLAlchemy is one of the most popular ORM libraries for Python. Below is how you can use SQLAlchemy to do the same thing as the code snippet above:

import datetime

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

# SQLAlchemy config
engine = create_engine('sqlite:///data.db', echo=True)
Base = declarative_base()
Session = sessionmaker(bind=engine)

class Answer(Base):
  __tablename__ = 'answers'
  id = Column(Integer, primary_key=True)
  question_id = Column(Integer, index=True)
  answer = Column(Integer)
  created_at = Column(String)

def add_new_answer(question_id, answer):
  now = datetime.datetime.now().ctime()
  session = Session()
  answer_obj = Answer(
      question_id=question_id,
      answer=answer,
      created_at=now)
  session.add(answer_obj)
  session.commit()

Base.metadata.create_all(engine)

To learn more about SQLAlchemy, take a look at its official documentation.
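
As a quick illustration, reading data back through the ORM looks like this. This is a minimal sketch reusing the Answer model and Session from the snippet above, not code from the original project:

# query the ten most recent answers for one question
session = Session()
recent = (session.query(Answer)
          .filter(Answer.question_id == 1)
          .order_by(Answer.id.desc())
          .limit(10)
          .all())
for a in recent:
  print(a.question_id, a.answer, a.created_at)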

Step 3: Play With the Data Collected

Running the Python code in the previous step should generate a '.db' file in the same directory as your Python script. If you are already familiar with SQL, you can use the SQLite command line shell to load this file and play with the data.
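
For instance, here is a minimal sketch (assuming the answers table from Step 2) that tallies the collected votes directly from data.db:

import sqlite3

conn = sqlite3.connect('data.db')
c = conn.cursor()
# count how many times each answer was chosen for each question
c.execute('''SELECT questionId, answer, COUNT(*) FROM answers
             GROUP BY questionId, answer''')
for question_id, answer, count in c.fetchall():
    print(question_id, answer, count)
conn.close()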

However, there are much easier ways to interact with a SQLite database. For example, SQLite Browser is a GUI tool for viewing and editing SQLite databases without writing SQL commands. It also supports exporting SQLite data tables to CSV files.

Step 4: Syncing Data With Server

For the VoteWithYourFeet project, we run a Python script on the Raspberry Pi that collects data from sensors and uploads the results to a web server in real time. This allows us to show the latest voting results live on our website and our Twitter page.

Here are more details: we built a web server with NodeJS that provides an HTTP endpoint for logging data. On the client side, the Raspberry Pi is connected to a 3G Wi-Fi router, while the Python script running in the background sends newly collected data to our web server over HTTP requests. You may check out our source code as a reference implementation.
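
As a rough illustration of the client side only, here is a sketch of posting a vote over HTTP. The endpoint URL and payload fields are made up for this example (not the actual VWYF API), and it assumes the third-party requests library is installed:

import datetime
import requests  # third-party library: pip install requests

API_URL = 'https://example.com/api/answers'  # hypothetical endpoint

def upload_answer(question_id, answer):
    payload = {
        'questionId': question_id,
        'answer': answer,
        'createdAt': datetime.datetime.now().isoformat(),
    }
    try:
        resp = requests.post(API_URL, json=payload, timeout=5)
        resp.raise_for_status()
    except requests.RequestException as e:
        # the 3G connection can be flaky; log the failure and retry later
        print('upload failed:', e)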

If you are new to server-side programming, there are a few other options as well:

  • Some web frameworks, such as Ruby on Rails, allow you to quickly set up REST API endpoints with only a few lines of code.
  • There are many Database-as-a-Service options from AWS, Azure, MongoDB, etc., that allow your application to post data to a remote database directly, without running your own server.

Step 5: Results

Over the three days on Market St., San Francisco, this installation collected 10,811 votes for 83 different questions! After a quick dive into the data we collected, here is what we found.

We hope you found this article helpful. Leave a comment if you have any questions or suggestions.