How to Graph the Instructables





Introduction: How to Graph the Instructables

About: Swedish expat living now living in Malaysia after spending some years working in Dubai.
I was curious of how many instructables that are published each month and what the growth rate looks like. Since I couldn't find any statistics here I decided to do it myself.

Sorry, there are no photos, just some screenshots.

Tools needed:
  • Access to a Linux shell prompt
  • A Google account

You can probably use BSD instead of Linux if you're prefer a deamon over a penguin. And I'm sure that even a GNU toolkit for Windows will work as well.

The Google account can also be replaced by Excel, Open Office or any other decent spreadsheet. I wanted to try the Google Docs so I used that here.

Step 1: Finding the Data

Ok, the first step of making a graph is obviously to collect the information you want to created the graph from.

In the Explore-menu on the site there's a entry called Most recent that shows all instructables in groups of 15. The URL for that page looks like this: . After clicking on one of the numbered pages at the bottom the string "&offset=15" is added to end of the URL.

On each of the 15 entries on the pages there are a line that displays the author and the date of that particular instructable. "posted by Parker955 on Aug 4, 2007"

So we now know the URL for getting the data that we want to use and we also have a string to search to extract the data from the page.

Step 2: Getting the Data

Here's the BASH script that I wrote to get the data. I think the comments in in code itself is descriptive enough so I won't explain them again here. I'd be happy to explain any particular part of the code that you don't understand, just ask in the comment section.

Save the file as and make it runnable by giving the command "chmod 0755" .

Run the program by typing "./"

#!/bin/bash# Start from the beginningoffset=0# Continue getting pages until the offset is larger than 5731while [ $offset -lt 5731 ] do  # Be nice to the webserver!  sleep 1  # Show te user the progress  echo $offset    # Make the URL to retrieve  url=''$offset  # Get the page as viewable text (not html) by using the -dump flag  # Then let grep remove all lines not containing the phrase "posted by"  # And finally concatenate all lines into the log-file for later processing  lynx -dump $url | grep "posted by" >> instructables.log  # Increment the offset with 15 to get to the next 15 instructables  offset=$((offset+15))done# Now we've got the dates of all instructables in the file named instrucatbles.log # It't time to count the number of lines for each combination of lines that contains# a particular year and month combo. (Like Mar 2007).  By counting them we know how# many instructables that was publised during that month.# The outer loop handles the years that we are interested infor year in 2005 2006 2007do  # The inner loop handles the months  for month in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec  do    # First search for all lines containign the year.    # Then pipe that result into a second grep that searches those lines for the month    # And finally count the number of resulting lines    count=$(grep $year instructables.log | grep $month | wc --lines)     # Show the information to the user to we can make a graph of it later    echo $year $month,$count  donedone

Step 3: The Output

This program will run for quite a while since I've added a pause for 1 second between each page request. Without that delay the script would hammer the webserver with a shitload of requests in a very short period of time and that would probably incur the wrath of Mr. ewilhelm, something that we all should avoid. :-)

When the program is done it will show a list of months+years and the count of the number of instructables written that month. It will look like this:

2005 Jan,02005 Feb,02005 Mar,02005 Apr,02005 Jun,02005 Jul,02005 Aug,752005 Sep,472005 Oct,232005 Nov,222005 Dec,442006 Jan,302006 Feb,442006 Mar,1582006 Apr,1462006 May,1422006 Jun,2072006 Jul,2832006 Aug,2282006 Sep,2802006 Oct,3172006 Nov,2532006 Dec,2572007 Jan,3202007 Feb,3522007 Mar,3672007 Apr,4272007 May,4482007 Jun,6102007 Jul,6362007 Aug,472007 Sep,02007 Oct,02007 Nov,02007 Dec,0

As can be seen in this list there are no data before August 2005, and since I'm running this in the first week of August 2007 the count for this month is really low and should be discarded.

Step 4: Day-by-Day Info

To extract day by day data from the file generated by the script in step 2 the following script can be run.

#!/bin/bash# The outer loop handles the years that we are interested infor year in 2005 2006 2007do  # The middle loop handles the months  for month in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec  do    # The inner loop handles the days    day=1    maxdays=31    if [ "$month" == "Feb" ]; then maxdays=28; fi    if [ "$month" == "Apr" ]; then maxdays=30; fi    if [ "$month" == "Jun" ]; then maxdays=30; fi    if [ "$month" == "Sep" ]; then maxdays=30; fi    if [ "$month" == "Nov" ]; then maxdays=30; fi    while [ $day -le $maxdays ]    do      # Search for the month day year-combo in the already existing log file      # and count how many instructables there are for that particular day      count=$(grep "$month $day, $year" instructables.log | wc --lines)       # Show the information to the user to we can make a graph of it later      echo $day $month $year,$count      day=$((day+1))    done  donedone

To see the graphs attached to this step in the instructable you need to click on the (i) -symbol in the upper left corner of the image and then on the link with the size information ( (880x585) 24 KB) just below the image. You will then get a bigger and unclipped version of the image.

I've also included the excel spreadsheet with the day-by-day data that generated the charts on this step.

Step 5: Google Docs

Here I'm showing how to do in using Google Docs. If you're using Excel or OO you're on your own.

Start by going to Google Docs. If you don't have a Google Account you need to Sign Up first. It's free and gives you a lot of nice features.

When you are at Google Docs click on the New-button and select Spreadsheet

Step 6: The Spreadsheet

Copy all lines from the output if the Linux script that you want to include in the graph.( I copied from August 2005 to July 2007.)

To paste the data you need to select the A1-cell and then press ctrl-v. Note! You can't click on the clipboard or right click and select paste or even use the Edit | Paste menu of the Firefox. You must use ctrl-v to paste external data.

Now we need to separate the dates from the counts and put these separately into the B and C columns.

In the B1-cell type "=MID(A1,1, 8)" and press Enter
In the C1-cell type "=value(MID(A1,10,999))" and press Enter
( Note: Don't type the "-marks )

After entering the formulas in B1 & C1 we need to copy them to the rest of the rows. Mark both B1 & C1 by clicking and holding on B1 and drag over to C1. Then right-click and select Copy.

Mark the entire destination area by clicking and holding on cell B2 and drag down to C24. Then right-click on select Paste.

We are now ready with the copying of the data and it's time to create the graph.

Step 7: Making the Graph

Creating the graph from the data is the easiest step. Click on the graph symbol and fill in the form that pops up.

There you should select the Columns type of chart. Enter some descriptions for the legends and labels. Tell the chart that you want to use the area B1 to C24 as the data for the graph by typing B1:C24 in the area field. And finally tell the graph that you have the labels (the dates) in column B.

Please note the screenshot below incorrectly shows C22 as the last cell instead if C24. I forgot the month of May on my first run. :-(

Click on Save Chart and you're done here. The graph will be nicely inserted into the spreadsheet and you can save the graph as an image by clicking on the Graph-dropdown and select Save Image.

Easy as pie!

Step 8: The Final Result

The final result of this exercise is the image below.

There it can be clearly seen that we have an increasing rate of new instructables almost every month. Between April and June 2007 we got a 50% increase! And looking back further to February / March 2006 the increase was even larger. On the graph in looks like 300%.

Note: The 3D graph is done in Excel.



    • Woodworking Contest

      Woodworking Contest
    • Planter Challenge

      Planter Challenge
    • Casting Contest

      Casting Contest

    We have a be nice policy.
    Please be positive and constructive.




    I would like to see (or make) a graph on people becoming members of Instructables, but I don't know where to get that information.

    Hmm... I just got numbers... currently I have a list of seemingly random numbers between 0 and 885... help, please?

    Hey thanks! I might just use your script! Month-by-month is little coarse; how about modding the script to do day-day?

    8 replies

    Sure, I can do that. But do you really want to do it this way? I mean that this technique is good for pulling data from a site that you only have web access to. If you got access to the underlying info from the database itself the correct way would be to get the data aggregated into the needed format directly and then plot the data using php or jsp or whatever language the web server is using. Or maybe plot it with Excel connected to the database using a ODBC connection. But if you want to do it "the external way" how would you like to display the data? If doing it we would currently have about 750 point in the dataset. A 750 pixel with graph with one pixel representing each day?

    Well of course I can run a query on the database and get this, but that's only me. I thought others might be interested in seeing the spike in projects right before contest deadlines, for example. Perhaps I'm just being optimistic that anyone wants to see this data besides me.

    I would like to have a graph in my stat tab.

    What do you want to see in the graph?

    I want to see the daily views

    Nah, at least I'm interested in seeing this, so we're two at a minimum :-) I've added a new step4 in the instructable with the script that can be run after running the monthly script. It takes the log file already generated by the first script and extracts daily data from it. There are some graphs and the excel file with that daily data in it. I could need some help visualizing this data because the graphs I've made out of it are very "noisy" and it's a bit hard to really see any real spikes...

    I typically look at the cumulative number of projects rather than the day-by-day totals. You are essentially looking at the derivative, which is definitely going to be noisy.

    I added cumulative graphs in step4, but I'm really not happy with them either....

    Wow, I always wanted to do that. I don't have linux, and have no scripting knowledge. Can this work of an Abyss server running on spare computer?

    So, I go away for a week, and this is why it takes me so long just to skim through what's been posted in my absence!

    I think it would be great if each month had a different colour. Like: January always red, February always orange...

    1 reply

    Yea, I agree, but the level "customibility " of the graphs in Google Docs seems to have its limitations. You can't change the colors - at least I haven't found a way of doing it. But I've added a 3D graph done in Excel in the last step for another view of the data.

    Hmmm... MAYbe it got eaten by a gremlin? :-) Sorry, my bad. I'll correct the script and redo the graph with May included.

    That said and corrected, all around a nifty idea. + !