Introduction: Plot COVID Data on Interactive Graphs
Sifting through COVID-19 data to find the information that is important and pertinent to you can be difficult. Localized data is often hard to find, and when it is available it is frequently poorly presented and hard to decipher. This project aims to address that need.
In this tutorial, I will teach you how to plot COVID-19 data on interactive graphs using Python. The data is pulled from the Johns Hopkins University COVID-19 Data repository. This project uses two very powerful libraries: Pandas and Plotly. It plots four key pieces of data: total confirmed cases, new cases per day, total confirmed deaths, and new deaths per day. This data is plotted over time since the start of the pandemic at the county and state levels for the United States.
Always remember, stay safe out there and look after each other!
All of the code for this project can be found here: https://github.com/mjdargen/covid
Preview of Output Graphs: https://dargen.io/covid
COVID-19 Data by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University: https://github.com/CSSEGISandData/COVID-19
Step 1: Setting Up Environment & Dependencies
In order to run the code required for this project, you will need to install Python 3 and, if you want to clone the repository from the command line, git. The installation steps can vary slightly based on your operating system, so I will link you to the official resources, which provide the most up-to-date guides for your operating system.
Git Project Files
Now you will need to retrieve the source files. If you would like to run git from the command line, you can install git by following the instructions here: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git. However, you can just as easily download the files in your browser.
To download the files required, you can navigate to the repository in your browser and download a zip file or you can use your git client. The link and the git command are described below.
https://github.com/mjdargen/covid
git clone https://github.com/mjdargen/covid.git
Installing Dependencies
You need to install the following programs on your computer in order to run this program:
- Python 3 with pip: https://www.python.org/downloads/
- Install requests, pandas, & plotly: `pip3 install -r requirements.txt`
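If you want to confirm that the install worked before running the project, a quick check like the one below can help. This is only a sketch: the package names come from the pip command above, and the helper name missing_packages is made up for this example.

```python
import importlib.util

# Package names taken from the pip install line above.
REQUIRED = ["requests", "pandas", "plotly"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```

If anything is reported as missing, re-run the pip command from inside the project folder.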
Step 2: Pandas & Plotly
Pandas is a Python software library created for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. Think of it as the programmatic equivalent of Excel. We will be using it to parse the data from CSV (comma-separated values) files and extract the information needed for plotting.
- pandas Documentation - pandas can do a lot so this site may seem very overwhelming. You can search this website or use a search engine to find the specific information you need.
- pandas Tutorials
- Intro to Dataframes
- Reading in CSV Data
- Manipulating Data
- pandas Cheatsheet
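To give a feel for the parsing step, here is a minimal sketch of pulling one county's time series out of a table shaped like the Johns Hopkins US files, where each date is its own column. The sample data is made up, and the column set is simplified (the real files carry more identifier columns), so treat this as an illustration rather than the project's actual code.

```python
import io

import pandas as pd

# Made-up sample in the wide layout (one column per date).
# Assumption: the real file has additional identifier columns.
SAMPLE_CSV = """Admin2,Province_State,1/22/20,1/23/20,1/24/20
Wake,North Carolina,0,2,5
Durham,North Carolina,0,1,1
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Keep only the row for the county we care about.
row = df[(df["Province_State"] == "North Carolina") & (df["Admin2"] == "Wake")]

# Drop the identifier columns so only the date columns remain,
# then take the single remaining row as a Series indexed by date.
series = row.drop(columns=["Admin2", "Province_State"]).iloc[0]
series.index = pd.to_datetime(series.index, format="%m/%d/%y")
wake = series.rename("cases").astype(int)

print(wake)
```

The result is a date-indexed series, which is a convenient shape to hand to a plotting library.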
Plotly is a software library created for data analytics and visualization. We will be using it to make interactive graphs that let you zoom in and select the data you want to inspect. The Plotly documentation is a lot less overwhelming and has a number of helpful examples.
Step 3: Graph Descriptions
The program produces 16 graphs. The 16 graphs are broken down into four categories: by county, by state, county comparison at state level, and state comparison at federal level. For each of these four categories, there are four different types of graphs: total confirmed cases, new cases, total confirmed deaths, and new deaths. The 16 graphs are listed below, using Wake County, North Carolina as the example:
- Total Confirmed COVID-19 Cases in Wake County
- New COVID-19 Cases in Wake County
- Total Confirmed COVID-19 Deaths in Wake County
- New COVID-19 Deaths in Wake County
- Total Confirmed COVID-19 Cases in North Carolina
- New COVID-19 Cases in North Carolina
- Total Confirmed COVID-19 Deaths in North Carolina
- New COVID-19 Deaths in North Carolina
- Total Confirmed COVID-19 Cases in North Carolina by County
- New COVID-19 Cases in North Carolina by County
- Total Confirmed COVID-19 Deaths in North Carolina by County
- New COVID-19 Deaths in North Carolina by County
- Total Confirmed COVID-19 Cases in US by State
- New COVID-19 Cases in US by State
- Total Confirmed COVID-19 Deaths in US by State
- New COVID-19 Deaths in US by State
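The list above is simply every combination of four categories and four graph types. A short sketch that rebuilds the same 16 titles makes that structure explicit. The variable names here are made up for the example and are not taken from the project code.

```python
from itertools import product

state, county = "North Carolina", "Wake"

# Four categories (where the data comes from).
scopes = [
    f"in {county} County",
    f"in {state}",
    f"in {state} by County",
    "in US by State",
]

# Four graph types (what is being measured).
metrics = [
    "Total Confirmed COVID-19 Cases",
    "New COVID-19 Cases",
    "Total Confirmed COVID-19 Deaths",
    "New COVID-19 Deaths",
]

# product() varies the metric fastest, matching the order of the list above.
titles = [f"{metric} {scope}" for scope, metric in product(scopes, metrics)]

for title in titles:
    print(title)
```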
Step 4: Program Description
The program performs the following functions: creates the necessary directories, checks for updates to the data, downloads the new updates, and produces the 16 graphs. I have broken the program down by function to describe how it works. The code also contains thorough comments describing each step that was performed.
Code: https://github.com/mjdargen/covid/blob/main/main.py
make_dirs() - This function creates the necessary input and output directories to allow the program to work. The input directory is where the .csv files are stored once they are retrieved from Johns Hopkins. The output directory is where the .html files for the interactive graphs are stored. This allows you to open the graphs later.
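A directory-creation step like this can be sketched in a few lines. The folder names "input" and "output" and the function name make_dirs_sketch are assumptions for this example; check main.py for the names the project actually uses.

```python
import os
import tempfile


def make_dirs_sketch(base_dir, names=("input", "output")):
    """Create each directory under base_dir if it does not already exist."""
    paths = [os.path.join(base_dir, name) for name in names]
    for path in paths:
        # exist_ok=True means a second run does not raise an error.
        os.makedirs(path, exist_ok=True)
    return paths


base = tempfile.mkdtemp()  # throwaway folder so the example leaves no clutter
created = make_dirs_sketch(base)
make_dirs_sketch(base)  # safe to call again
print(created)
```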
check_updates() - This function retrieves one of the .csv files from Johns Hopkins and checks to see if it was updated. If it was, it lets the program know that it needs to retrieve all of the new files.
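One possible way to detect a change is to compare a hash of the freshly retrieved file against the copy already on disk. The sketch below does exactly that; it is not necessarily how check_updates() works in the project, and the helper names are made up.

```python
import hashlib
import os
import tempfile


def file_digest(data: bytes) -> str:
    """Return a SHA-256 fingerprint of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def needs_update(new_data: bytes, local_path: str) -> bool:
    """True if there is no local copy yet or its contents differ from new_data."""
    if not os.path.exists(local_path):
        return True
    with open(local_path, "rb") as f:
        return file_digest(f.read()) != file_digest(new_data)


# Demonstration with a throwaway file instead of a real download.
local = os.path.join(tempfile.mkdtemp(), "sample.csv")
first_check = needs_update(b"a,b\n1,2\n", local)   # no local copy yet
with open(local, "wb") as f:
    f.write(b"a,b\n1,2\n")
second_check = needs_update(b"a,b\n1,2\n", local)  # identical contents
third_check = needs_update(b"a,b\n1,3\n", local)   # contents changed
print(first_check, second_check, third_check)
```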
download_files() - This function downloads all of the files from Johns Hopkins. There are four files in total: time_series_covid19_confirmed_US.csv, time_series_covid19_confirmed_global.csv, time_series_covid19_deaths_US.csv, and time_series_covid19_deaths_global.csv. The global files are presently unused. However, if you are interested in pulling data from other countries, this code could be easily manipulated to produce similar graphs for countries and localities around the world.
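A download step for these four files might look like the sketch below. The base URL reflects my understanding of the folder layout in the Johns Hopkins repository and should be treated as an assumption. The sketch also uses the standard library's urllib instead of requests to stay dependency-free, and the function names are made up.

```python
import os
import urllib.request

# Assumption: raw-file location of the time series folder in the JHU repository.
BASE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series/"
)

FILES = [
    "time_series_covid19_confirmed_US.csv",
    "time_series_covid19_confirmed_global.csv",
    "time_series_covid19_deaths_US.csv",
    "time_series_covid19_deaths_global.csv",
]


def build_urls(files=FILES):
    """Map each file name to the URL it would be downloaded from."""
    return {name: BASE_URL + name for name in files}


def download_files_sketch(dest_dir, files=FILES):
    """Download every file into dest_dir (requires network access)."""
    os.makedirs(dest_dir, exist_ok=True)
    for name, url in build_urls(files).items():
        urllib.request.urlretrieve(url, os.path.join(dest_dir, name))


for name, url in build_urls().items():
    print(name, "->", url)
```

Nothing is downloaded until download_files_sketch() is called with a destination folder.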
confirmed_county(state, county, mode, show) - This function produces the graph showing the total number of confirmed cases/deaths for a given county in a given state over time. Pandas is used to open the .csv file, drop the unnecessary data fields, and restructure the data for plotting. Plotly is then used to create the graph, show the graph, and save .html for later reference.
new_county(state, county, mode, show) - This function produces the graph showing the daily number of new cases/deaths for a given county in a given state over time. Pandas is used to open the .csv file, drop the unnecessary data fields, and restructure the data for plotting. Plotly is then used to create the graph, show the graph, and save .html for later reference.
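The Johns Hopkins files store running totals, so a daily count has to be derived from them. One common way to do that in pandas is to take the difference between consecutive days, as in this sketch with made-up numbers. The project may compute it differently.

```python
import pandas as pd

# Made-up cumulative totals for five consecutive days.
total = pd.Series(
    [0, 2, 5, 5, 9],
    index=pd.date_range("2020-03-01", periods=5),
    name="total",
)

# diff() subtracts the previous day's total from each day's total.
# The first day has no previous value, so fill it with that day's own total.
new = total.diff().fillna(total.iloc[0]).astype(int).rename("new")

print(new)
```

Keep in mind that real data sometimes contains corrections, which can make a day's difference negative.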
confirmed_state(state, mode, show) - This function produces the graph showing the total number of confirmed cases/deaths for a given state over time. Pandas is used to open the .csv file, drop the unnecessary data fields, and restructure the data for plotting. Plotly is then used to create the graph, show the graph, and save .html for later reference.
new_state(state, mode, show) - This function produces the graph showing the daily number of new cases/deaths for a given state over time. Pandas is used to open the .csv file, drop the unnecessary data fields, and restructure the data for plotting. Plotly is then used to create the graph, show the graph, and save .html for later reference.
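Because the US files contain one row per county, a state-level series can be built by adding up that state's rows. Here is a minimal sketch using made-up sample data with a simplified set of columns.

```python
import io

import pandas as pd

# Made-up sample; the real file has more identifier columns.
SAMPLE_CSV = """Admin2,Province_State,1/22/20,1/23/20,1/24/20
Wake,North Carolina,0,2,5
Durham,North Carolina,0,1,1
Fairfax,Virginia,1,1,4
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Everything that is not an identifier column is a date column.
date_cols = [c for c in df.columns if c not in ("Admin2", "Province_State")]

# Select the state's rows and sum each date column across its counties.
state_total = df.loc[df["Province_State"] == "North Carolina", date_cols].sum()

print(state_total)
```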
confirmed_by_county(state, mode, show) - This function produces the graph showing the total number of confirmed cases/deaths for all counties in a given state over time. Pandas is used to open the .csv file, drop the unnecessary data fields, and restructure the data for plotting. Plotly is then used to create the graph, show the graph, and save .html for later reference.
new_by_county(state, mode, show) - This function produces the graph showing the daily number of new cases/deaths for all counties in a given state over time. Pandas is used to open the .csv file, drop the unnecessary data fields, and restructure the data for plotting. Plotly is then used to create the graph, show the graph, and save .html for later reference.
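For comparison graphs with one line per county, it often helps to restructure the table from wide (one column per date) to long (one row per county per date). The sketch below shows that reshaping with made-up data. It is one way to do it, not necessarily the project's approach.

```python
import io

import pandas as pd

# Made-up sample; the real file has more identifier columns.
SAMPLE_CSV = """Admin2,Province_State,1/22/20,1/23/20,1/24/20
Wake,North Carolina,0,2,5
Durham,North Carolina,0,1,1
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# melt() turns every date column into its own row.
long_df = df.melt(
    id_vars=["Admin2", "Province_State"],
    var_name="date",
    value_name="cases",
)
long_df["date"] = pd.to_datetime(long_df["date"], format="%m/%d/%y")

nc = long_df[long_df["Province_State"] == "North Carolina"]
print(nc)
```

A long table like this can be passed to plotly.express.line with x="date", y="cases", and color="Admin2" to draw one line per county.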
confirmed_by_state(mode, show) - This function produces the graph showing the total number of confirmed cases/deaths for all states in the US over time. Pandas is used to open the .csv file, drop the unnecessary data fields, and restructure the data for plotting. Plotly is then used to create the graph, show the graph, and save .html for later reference.
new_by_state(mode, show) - This function produces the graph showing the daily number of new cases/deaths for all states in the US over time. Pandas is used to open the .csv file, drop the unnecessary data fields, and restructure the data for plotting. Plotly is then used to create the graph, show the graph, and save .html for later reference.
generate_docs() - This is a commented-out utility function used to create the example documentation.
main() - The main function responsible for the program flow. Calls all other functions in the order described above.
Step 5: Run!
Now, it's time to run the program! Run the program from a terminal using the command below or run it by double-clicking the executable Python script. There will be two user prompts: one asking for the state you would like to view data for and one asking for the county.
python3 main.py
There are some minor modifications you can make to the code to make it work for your application. For example, if you want to just save the graphs instead of having them pop up when they are ready, set the last argument (show) of the graph functions to False instead of True. If you only want to plot case numbers, change the loop in the main() function so that it does not loop back through to plot deaths as well.
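The two tweaks described above can be pictured with the sketch below. The graph functions here are stand-ins that only record how they were called, and the mode value "cases" is a hypothetical placeholder; look at main() in main.py for the real loop and the real mode values.

```python
calls = []


def confirmed_state(state, mode, show):
    """Stand-in for the real graph function; it only records its arguments."""
    calls.append(("confirmed_state", state, mode, show))


def new_state(state, mode, show):
    """Stand-in for the real graph function; it only records its arguments."""
    calls.append(("new_state", state, mode, show))


# Hypothetical mode values: list only the case mode to skip the deaths graphs.
MODES = ("cases",)
# False means save the .html files without opening them.
SHOW = False

for mode in MODES:
    confirmed_state("North Carolina", mode, SHOW)
    new_state("North Carolina", mode, SHOW)

print(calls)
```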
Step 6: More Projects
For more projects, visit these links:
To support me, go here: https://www.buymeacoffee.com/mjdargen