Introduction: Data Crystals: 3D Data Visualizations From Open Data

About: Scott Kildall is a new media artist and researcher. He works at Autodesk, Pier 9 and is an artist-in-residence with the SETI Institute.
One of my projects while a resident artist at Autodesk has been to create "Data Crystals": 3D prints that I algorithmically generate using data as input. I design these 3D data visualizations for aesthetics over legibility, and they show off what can be done with code and 3D printing.

I've finished the first three and have several more in production. This batch is derived from San Francisco Open Data sets.

This Instructable gives an overview of my creative and technical process for making these. I hope it will encourage others to think about creative data visualization techniques.

At some point, I will likely share my code, but right now this project is too fresh and the code is too rough for public consumption. 

Step 1: Find the Dataset

There are many types of data I would like to represent. It's easy to think of fun and useful possibilities, such as people's favorite lottery numbers, income levels for every household in San Francisco, and every shipwreck in history.

However, in most cases the data that I want is simply unavailable, so instead, I work with what I can get.

What is now becoming accessible is a large amount of data from city governments. San Francisco leads the way with its Open Data Portal, which has an API powered by Socrata. People have a daily relationship with their urban environment and connect to data patterns that reflect the city they live in, so this municipal data can be compelling.

For the first three Data Crystals, I chose three datasets from the SF Open Data site: construction permits, incidents of crime, and the SF Civic Art Collection.



Step 2: Extract and Parse

You can download many of these SF datasets in CSV format.
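As an illustration only (this is not the project's code), a CSV export can also be fetched programmatically. The sketch below assumes the portal is served at data.sfgov.org and follows the Socrata SODA convention of `/resource/<dataset-id>.csv` with a `$limit` query parameter. The class name `CsvDownloader`, the dataset ID, and the output filename are hypothetical placeholders, and it needs Java 11 or later for `java.net.http`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.nio.file.Paths;

class CsvDownloader {
    // Build a SODA-style CSV URL. The dataset ID is whatever the portal lists for the dataset.
    static String buildUrl(String domain, String datasetId, int limit) {
        return "https://" + domain + "/resource/" + datasetId + ".csv?$limit=" + limit;
    }

    // Download the CSV to a local file and return the HTTP status code.
    static int download(String url, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<Path> response =
            client.send(request, HttpResponse.BodyHandlers.ofFile(target));
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // "xxxx-xxxx" is a placeholder, not a real dataset ID.
        String url = buildUrl("data.sfgov.org", "xxxx-xxxx", 50000);
        System.out.println(download(url, Paths.get("dataset.csv")));
    }
}
```

Downloading the file by hand from the portal works just as well; a script mainly helps when a dataset is refreshed often.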

I wrote utility code in Java that goes through each dataset and converts the CSV to JSON, keeping just the fields I want. At a minimum, I'm looking for some way to map each record to (x, y, z) coordinates. The datasets contain geospatial locations, which I translate to (x, y) values. The z-value is usually time.
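To make the conversion step concrete, here is a minimal sketch of a CSV-to-JSON filter. It is not the project's actual utility code. The column names (`latitude`, `longitude`, `date`) and the class name `CsvToJson` are assumptions for illustration. It handles quoted fields on a single line, writes every value as a JSON string, and does not handle line breaks inside quoted fields or escape control characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CsvToJson {
    // Split one CSV line into fields, honoring double-quoted fields and "" escapes.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"');   // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;  // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    // Wrap a value in JSON quotes, escaping backslashes and double quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Convert CSV lines (header first) to a JSON array, keeping only the named columns.
    static String convert(List<String> csvLines, List<String> keep) {
        List<String> header = splitCsvLine(csvLines.get(0));
        StringBuilder out = new StringBuilder("[");
        for (int r = 1; r < csvLines.size(); r++) {
            List<String> row = splitCsvLine(csvLines.get(r));
            if (r > 1) out.append(",");
            out.append("{");
            boolean first = true;
            for (String name : keep) {
                int idx = header.indexOf(name);
                if (idx < 0 || idx >= row.size()) continue;  // column missing: skip it
                if (!first) out.append(",");
                out.append(quote(name)).append(":").append(quote(row.get(idx)));
                first = false;
            }
            out.append("}");
        }
        return out.append("]").toString();
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList(
            "id,latitude,longitude,date,notes",
            "1,37.76,-122.42,2013-05-01,\"mixed use, 12 units\"");
        System.out.println(convert(csv, Arrays.asList("latitude", "longitude", "date")));
    }
}
```

For large or messy exports, a dedicated CSV library is a safer choice than a hand-rolled splitter like this one.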

I can also extract a dimension value, such as the number of units in a construction permit, and use it as the "size" of each datum.
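One possible way to turn a record into model coordinates and a cube size is sketched below. Everything here is an assumption rather than the method actually used: the class name `DatumMapper`, the linear min/max rescaling, treating latitude and longitude as planar (a rough approximation that is usually tolerable at city scale), expressing time as a day number, and scaling the cube edge by the cube root of the unit count so that volume grows in proportion to units.

```java
class DatumMapper {
    final double minLat, maxLat, minLon, maxLon;
    final long minDay, maxDay;
    final double span;  // edge length of the model's bounding cube, in model units

    DatumMapper(double minLat, double maxLat, double minLon, double maxLon,
                long minDay, long maxDay, double span) {
        this.minLat = minLat; this.maxLat = maxLat;
        this.minLon = minLon; this.maxLon = maxLon;
        this.minDay = minDay; this.maxDay = maxDay;
        this.span = span;
    }

    // Linearly rescale v from [lo, hi] to [0, span]; a degenerate range maps to the middle.
    static double rescale(double v, double lo, double hi, double span) {
        if (hi == lo) return span / 2.0;
        return (v - lo) / (hi - lo) * span;
    }

    // Longitude becomes x, latitude becomes y, and time (as a day number) becomes z.
    double[] toXyz(double lat, double lon, long day) {
        return new double[] {
            rescale(lon, minLon, maxLon, span),
            rescale(lat, minLat, maxLat, span),
            rescale(day, minDay, maxDay, span)
        };
    }

    // Cube edge grows with the cube root of the unit count, so volume tracks units.
    static double cubeEdge(int units, double baseEdge) {
        return baseEdge * Math.cbrt(Math.max(1, units));
    }
}
```

With this scaling, a permit for 8 units gets a cube twice as wide as a single-unit permit, which keeps very large projects from swamping the whole form.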


Step 3: What Does Data Look Like?

Remember that these are 3D prints: they have a physical presence and a material form. The question I am trying to answer is: What does data look like?

In the case of open data, I experimented with many shapes and came up with simple cubes, all aligned in the same orientation. The white ones (VeroWhite resin) seemed to resonate both in my mind and in those of colleagues when I showed the samples to them.

Step 4: Map Into 3D Space

Using the Processing program, along with the ModelBuilder libraries by Marius Watz, I map the 3D data onto the screen, so I can see what it looks like in its "raw" state.

The first image is the construction permit data. As you can see, there is a lot of building in the southern part of the city, such as the Mission District. The lone dot is Candlestick Park, which is to be converted into housing units.

The second one is the SF Civic Art Collection data. Many art pieces are located in the same places, such as City Hall, hence the vertical columns. And that column on its own...that's the San Francisco Airport.

Step 5: Do Many, Many Samples

This took weeks and weeks of programming and printing and going back and forth before I settled on a viable technique. I played with different forms. I tested clustering patterns for both looks and for structural considerations. I showed samples to friends.

I kept returning to my central question, "What does data look like?", as a guide and tweaked my code to give a better form to the data.

Step 6: Run Clustering Algorithms

Finally, once I massaged and reworked the data, I ran a clustering algorithm, which essentially bunches the cubes together into one cohesive structure.

The cubes have to stick together. Every single one needs to be accounted for. I use a combination of a gravity attractor, a spherical searcher and a Brownian motion generator. Each "crystal" takes a different amount of time to properly cluster.
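The sketch below shows one simple way cubes can be pulled into a single connected structure. It is not the clustering code used for the Data Crystals, and it simplifies heavily: cubes sit on an integer lattice, the first cube is treated as the fixed attractor, each remaining cube alternates between a "gravity" step toward that attractor and a random "Brownian" step, and it stops as soon as it shares a face with an already placed cube. The spherical searcher is not reproduced. The class name `CubeClusterer` and the `jitter` parameter (the probability of a random step, best kept below 0.5 so the walk reliably drifts inward) are assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class CubeClusterer {
    // Offsets to the six face-adjacent lattice cells.
    static final int[][] FACES = {
        {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};

    static String key(int x, int y, int z) {
        return x + "," + y + "," + z;
    }

    // True if cell p shares a face with any already placed cube.
    static boolean touches(Set<String> placed, int[] p) {
        for (int[] f : FACES) {
            if (placed.contains(key(p[0] + f[0], p[1] + f[1], p[2] + f[2]))) return true;
        }
        return false;
    }

    // Returns final positions in input order; each cube after the first touches an earlier one.
    static List<int[]> cluster(List<int[]> cubes, double jitter, long seed) {
        Random rng = new Random(seed);
        List<int[]> result = new ArrayList<>();
        Set<String> placed = new HashSet<>();
        if (cubes.isEmpty()) return result;

        int[] anchor = cubes.get(0).clone();  // fixed attractor
        result.add(anchor);
        placed.add(key(anchor[0], anchor[1], anchor[2]));

        for (int i = 1; i < cubes.size(); i++) {
            int[] p = cubes.get(i).clone();
            // Several data points at one location: stack upward until a free cell is found.
            while (placed.contains(key(p[0], p[1], p[2]))) p[2]++;
            while (!touches(placed, p)) {
                if (rng.nextDouble() < jitter) {
                    int[] f = FACES[rng.nextInt(FACES.length)];  // Brownian step
                    p[0] += f[0]; p[1] += f[1]; p[2] += f[2];
                } else {
                    // Gravity step: move one cell along the axis with the largest gap.
                    int axis = 0;
                    for (int a = 1; a < 3; a++) {
                        if (Math.abs(anchor[a] - p[a]) > Math.abs(anchor[axis] - p[axis])) {
                            axis = a;
                        }
                    }
                    p[axis] += Integer.signum(anchor[axis] - p[axis]);
                }
            }
            result.add(p);
            placed.add(key(p[0], p[1], p[2]));
        }
        return result;
    }
}
```

Because a cube only moves while none of its neighboring cells are occupied, it can never step into an occupied cell, so the result has no overlaps and every cube is attached to the structure.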

The video depicts the construction permit data, which only takes 2 minutes. However, the crime data has something like 35,000 data points and takes about 5 hours to properly cluster.


Step 7: Patch for Structural Integrity

After running the algorithms, I extract the model as an STL file and inspect it closely for structural defects.
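For readers unfamiliar with STL, the following sketch writes a list of axis-aligned cubes as an ASCII STL string with 12 triangles per cube. It is an illustration under assumptions, not the export path used in this project: the class name `StlCubeWriter` and the `{x, y, z, edge}` cube layout are hypothetical. Each cube is written as its own closed shell, so touching or overlapping cubes may still need to be merged in a mesh tool before printing.

```java
import java.util.List;
import java.util.Locale;

class StlCubeWriter {
    // Each quad lists corner indices (bit 0 = +x, bit 1 = +y, bit 2 = +z),
    // wound counter-clockwise when seen from outside the cube.
    static final int[][] QUADS = {
        {0, 2, 3, 1}, {4, 5, 7, 6},   // -z, +z
        {0, 1, 5, 4}, {2, 6, 7, 3},   // -y, +y
        {0, 4, 6, 2}, {1, 3, 7, 5}};  // -x, +x
    static final int[][] NORMALS = {
        {0, 0, -1}, {0, 0, 1}, {0, -1, 0}, {0, 1, 0}, {-1, 0, 0}, {1, 0, 0}};

    // Append one triangle in ASCII STL syntax.
    static void facet(StringBuilder sb, int[] n, double[] a, double[] b, double[] c) {
        sb.append("  facet normal ").append(n[0]).append(' ')
          .append(n[1]).append(' ').append(n[2]).append('\n');
        sb.append("    outer loop\n");
        for (double[] p : new double[][] {a, b, c}) {
            sb.append(String.format(Locale.ROOT,
                "      vertex %.4f %.4f %.4f\n", p[0], p[1], p[2]));
        }
        sb.append("    endloop\n  endfacet\n");
    }

    // Each cube is {x, y, z, edge}, where (x, y, z) is the minimum corner.
    static String toStl(String name, List<double[]> cubes) {
        StringBuilder sb = new StringBuilder("solid " + name + "\n");
        for (double[] c : cubes) {
            double[][] v = new double[8][3];
            for (int i = 0; i < 8; i++) {
                v[i][0] = c[0] + (i & 1) * c[3];
                v[i][1] = c[1] + ((i >> 1) & 1) * c[3];
                v[i][2] = c[2] + ((i >> 2) & 1) * c[3];
            }
            for (int q = 0; q < QUADS.length; q++) {
                int[] f = QUADS[q];
                // Split each quad into two triangles with the same winding.
                facet(sb, NORMALS[q], v[f[0]], v[f[1]], v[f[2]]);
                facet(sb, NORMALS[q], v[f[0]], v[f[2]], v[f[3]]);
            }
        }
        return sb.append("endsolid ").append(name).append("\n").toString();
    }
}
```

Saving the returned string to a file with an `.stl` extension gives something MeshLab can open for inspection.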

Using a combination of MeshLab (good for quick inspection) and 123D Design (good for adding material), I fix up any weak structural points. Usually there are no more than two questionable spots, but the last thing I want is for the Data Crystal to break because it is too fragile.

Step 8: 3D Print and Clean

I run these prints out on an Objet500 Connex3 printer, which leaves the model in this form, encased in cocoon-like support material.

With all the nuances and contours of the Data Crystal, there is a lot of cleaning required. I soak it in water overnight, I pick away at it with dental tools and I use a high-pressure water jet to blast out the support gunk.

Step 9: Mount on Wood

Once fully cleaned, the data sculptures are ready for finishing work. Using 1/16" stainless wire, I mount each of the 3D prints onto an exotic hardwood stand to give it a compelling presentation.

I carefully drill into the base of the 3D sculpture, which has to be done by hand, and then press-fit it onto the wire. I drill the holes in the wooden stand on a drill press.

Step 10: Done!

These are some of the final Data Crystals (in order of images):
- SF Civic Art Collection
- Development Pipeline (a.k.a. construction permits)
- Incidents of Crime (over a 3-month period).

I hope this was inspiring and showed an alternate approach to data visualization.
For more Data Crystals and other projects, you can find me here: @kildall or www.kildall.com/blog
Scott Kildall