Introduction: Real Estate Data Visualization
As a former real estate agent, aspiring investor, and software engineer, I have set out to learn more about the data behind real estate.
Zillow is an excellent resource for an overview of the trends in the real estate market. For this visualization, I compiled the average sale price of residential homes in each state for each year.
Zillow provided the average sale price of a house in each state for each month. The following visualization shows the year-over-year change in home prices based on Zillow’s data.
To follow along, you will need:
- An internet connection
- Python 3.x
- Node.js (for exporting images)
- A video editor (to polish the visualization)
Step 1: Data Preprocessing
Zillow provided the average sale price of a residential home for each month of a given year, with a separate spreadsheet for each of the fifty states. To get data I could process in Python, I combined the price data from every spreadsheet into one Excel workbook.
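If you prefer to do the combining in pandas, here is a minimal sketch. It assumes each state's monthly data is a DataFrame with hypothetical `Date` and `Price` columns; `pd.read_excel(path, sheet_name=None)` returns a dict of such frames keyed by sheet name, which is one way to load a workbook with one sheet per state. The sample numbers are made up.

```python
import pandas as pd


def combine_states(frames):
    """Stack per-state monthly tables into one long table.

    `frames` maps a state name to a DataFrame with the hypothetical columns
    "Date" and "Price" (one row per month).
    """
    # The dict keys become the outer index level, named "State"
    combined = pd.concat(frames, names=["State", "row"])
    combined = combined.reset_index(level="State").reset_index(drop=True)
    combined["Date"] = pd.to_datetime(combined["Date"])
    return combined[["State", "Date", "Price"]]


# Made-up sample data for two states
frames = {
    "Alabama": pd.DataFrame({"Date": ["2019-01-31", "2019-02-28"], "Price": [150000, 151000]}),
    "Alaska": pd.DataFrame({"Date": ["2019-01-31", "2019-02-28"], "Price": [270000, 269000]}),
}
combined = combine_states(frames)
print(combined)
```

The long layout (one row per state and month) makes the per-state yearly averaging a single `groupby`.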
For each state:
- Average the sale price of a home by year
- Calculate the year-over-year change
Preprocessing produced the table shown in the picture above.
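The two per-state calculations can be sketched in pandas as follows. This assumes a long table with hypothetical `State`, `Date` and `Price` columns, and it computes the change as a percentage: (this year's average - last year's average) / last year's average × 100. Whether the original table used percentages is not stated, so treat that as an assumption.

```python
import pandas as pd


def yearly_yoy(monthly):
    """Average monthly prices by year, then compute the year-over-year % change.

    Returns one row per state and one column per year. The first year has no
    prior year to compare against, so its values are NaN.
    """
    monthly = monthly.assign(Year=pd.to_datetime(monthly["Date"]).dt.year)
    # Mean of the monthly sale prices for each (state, year) pair
    yearly = monthly.groupby(["State", "Year"])["Price"].mean().unstack("Year")
    # Shift one column to the right so each year lines up with the previous one
    previous = yearly.shift(1, axis="columns")
    return (yearly - previous) / previous * 100


# Made-up monthly prices for one state
monthly = pd.DataFrame({
    "State": ["Alabama"] * 4,
    "Date": ["2019-01-31", "2019-07-31", "2020-01-31", "2020-07-31"],
    "Price": [100000.0, 102000.0, 106000.0, 108000.0],
})
print(yearly_yoy(monthly))
```

Here the yearly averages are 101,000 and 107,000, so the 2020 column holds 6,000 / 101,000 × 100, roughly 5.94%.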
To make the table easier to work with, I converted it into a CSV file.
Once the CSV file is saved in the project directory, I can begin working on the visualization.
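If the preprocessed table already lives in a pandas DataFrame, saving it is a single `to_csv` call. A small sketch; the file name `yoy_by_state.csv` and the table layout (one row per state, one column per year) are hypothetical.

```python
from pathlib import Path

import pandas as pd


def save_table(table, project_dir, filename="yoy_by_state.csv"):
    """Write the preprocessed table to the project directory and return the path."""
    path = Path(project_dir) / filename
    table.to_csv(path)  # the index (state names) becomes the first column
    return path


# Made-up values: one row per state, one column per year
table = pd.DataFrame(
    {"2019": [float("nan")], "2020": [5.9]},
    index=pd.Index(["Alabama"], name="State"),
)
path = save_table(table, ".")
print(pd.read_csv(path, index_col="State"))
```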
Step 2: Write Code
The program follows a simple sequence of steps:
- Use the Python library pandas to read the CSV file
- Print the data to make sure everything looks right
- Create a plotly figure
- Set the figure's data to one of the columns in the dataset
- Configure the color scheme, graph parameters, title, etc.
For an interactive visualization, call fig.show() inside a Jupyter notebook to explore the data.
The steps above are all you need to create this type of visualization. If you would like to see how I exported the visualization to an image sequence, see my gist on GitHub.
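The Node.js-based export from the gist is not reproduced here. As an alternative sketch, recent plotly versions can write static images with `Figure.write_image`, assuming the `kaleido` package is installed. Zero-padded file names keep the frames in order when a video editor imports them as an image sequence; the names and default size are assumptions.

```python
from pathlib import Path


def frame_path(out_dir, index):
    """Zero-padded names (frame_000.png, frame_001.png, ...) keep frames in order."""
    return Path(out_dir) / f"frame_{index:03d}.png"


def export_frames(figures, out_dir, width=1280, height=720):
    """Write each figure in `figures` (ordered by year) as a numbered PNG.

    Assumes static image export works in your environment (recent plotly
    versions use the kaleido package for this).
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths = []
    for i, fig in enumerate(figures):
        path = frame_path(out_dir, i)
        fig.write_image(str(path), width=width, height=height)
        paths.append(path)
    return paths
```

Usage would be something like `export_frames([make_figure(y) for y in years], "frames")`, where `make_figure` is a hypothetical function returning one plotly figure per year.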
Step 3: Combine Normalized Graphs
One of the challenges I faced with this visualization was specifying the minimum and maximum values of the chart's color scale. I never managed to figure it out, so to get a minimum viable visualization out the door, I created two visualization sequences instead.
For the first sequence, I hard coded Arkansas and Alabama to the minimum and maximum values in my dataset. This forced plotly to normalize the colors of the remaining states within that range.
For the second sequence, I hard coded Alaska and Hawaii to the minimum and maximum values in my dataset.
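The hard-coding trick works because plotly's automatic color range spans the smallest and largest values in each frame. A sketch with pandas, where `values` is one year's column indexed by state name and `vmin`/`vmax` are the extremes over the whole dataset (all names are hypothetical):

```python
import pandas as pd


def pin_extremes(values, low_state, high_state, vmin, vmax):
    """Return a copy of one frame's values with two states forced to the extremes.

    When vmin and vmax are the minimum and maximum of the whole dataset, every
    frame's automatic color range becomes [vmin, vmax], so colors stay
    comparable from year to year. The two pinned states show wrong colors.
    """
    pinned = values.copy()
    pinned[low_state] = vmin
    pinned[high_state] = vmax
    return pinned


# Made-up YoY values (%) for one year
year = pd.Series({"Alabama": 3.0, "Alaska": 5.0, "Arkansas": 1.0, "Hawaii": 2.0})
vmin, vmax = -4.0, 12.0  # in practice: table.min().min(), table.max().max()

first = pin_extremes(year, "Arkansas", "Alabama", vmin, vmax)  # sequence 1
second = pin_extremes(year, "Alaska", "Hawaii", vmin, vmax)  # sequence 2
print(first, second, sep="\n\n")
```

Each sequence gets two states wrong, but never the same two, which is why the composite in the next step can recover correct colors everywhere.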
My website article shows GIFs of the two sequences if you would like to see them.
In Adobe Premiere, I masked the part of the first sequence (where Arkansas and Alabama were hard coded) that contains Alaska and Hawaii. Because only Arkansas and Alabama are distorted in that sequence, this gave me correctly normalized colors for Alaska and Hawaii. I then overlaid this masked portion onto the second sequence (where Alaska and Hawaii were hard coded).
The resulting composite is the normalized image sequence for all fifty states.
Step 4: Add Artistic Touches
In Adobe Premiere, I added:
- a custom formatted title
- an animated dot that corresponds with the year
- labels for the range
- frame interpolation to smooth out the graph
- an intro/outro sequence
This step is completely optional, but it was by far the most rewarding part.
I hope you found this helpful.
Let me know in the comments if you have any questions or would like to see a new visualization in the future!