
I've been disappointed for a while now with the options for displaying population variation or uncertainty in graphs.

(For the files used to create the graphs, see here and here.)

Motivation:
As far as I can tell, there are three options for plotting it:

  1. Standard bar chart + error bars (typically at ±2 SD, i.e. α ≈ 0.05; for example)
  2. Co-graphed CDFs of the populations or samples (for example)
  3. Co-graphed histograms (or PDFs) of the populations (for example)

My complaints, numbered to match the options above:

  1. Requires that the reader be familiar with the concept of population variation and/or uncertainty; it does not provide intuitive support for the idea.
  2. Is a somewhat unusual type of graph, so an unsophisticated reader may not know how to read it. Plus, it gets messy when a large number of data series are graphed.
  3. Is just plain weird to read if there are substantial differences between the variation of the datasets.

To that end, this instructable is an experiment in intuitive, layman-friendly presentation of population variation within a bar chart by, essentially, topping each bar with a CDF.
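One way to make "topping each bar with a CDF" concrete is sketched below. It rests on an assumption about the geometry that is not spelled out above: the horizontal position across a bar is read as a cumulative probability, and the height of the bar's top at that position is the value below which that fraction of the population falls (the inverse CDF). The function name, the Gaussian assumption, and the 1%-99% clipping are illustrative choices, not taken from the Excel files.

```python
from statistics import NormalDist

def cdf_bar_top(mean, sd, n_points=21, p_min=0.01, p_max=0.99):
    """Return (fraction_across_bar, height) pairs for one bar's top edge.

    Assumption: the population is Gaussian, and the top edge is the
    inverse CDF sampled from p_min (left edge) to p_max (right edge).
    """
    dist = NormalDist(mean, sd)
    points = []
    for i in range(n_points):
        t = i / (n_points - 1)            # 0.0 at the left edge, 1.0 at the right
        p = p_min + t * (p_max - p_min)   # stay inside (0, 1); inv_cdf rejects 0 and 1
        points.append((t, dist.inv_cdf(p)))
    return points

# Hypothetical population: mean 10, standard deviation 2.
top = cdf_bar_top(mean=10.0, sd=2.0)
```

Under this reading, the middle of the bar sits at the population median, and a population with little variation produces an almost flat top, i.e. an ordinary bar.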

Let me know what you think!
Also let me know if you know of any work similar to this, or if you have any ideas on the topic.

Step 1: Creating Said Charts

Since this is an early experiment, my process is pretty rough (and time-consuming), but here goes:

  1. Use Excel's X-Y plot to draw some bars. (files here and here; all user-modified values on sheet 1, calculations on sheet 2.)
  2. Use your photo-editing tool of choice to fill in the bars. I use Paint; it takes about 1-2 minutes per bar, depending on geometric complexity.
  3. Add legends, axis labels, etc.
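The draw-then-fill steps above could also be scripted, which avoids the manual fill. The following is only a rough sketch under stated assumptions: it is not the Excel workflow, it assumes Gaussian populations, it reads each bar's top edge as an inverse CDF, and it writes SVG by hand so that nothing beyond the Python standard library is needed. The function names, sizes, and color are hypothetical, and axes and legends are left out.

```python
from statistics import NormalDist

def bar_polygon(x_left, width, mean, sd, n_points=25):
    """Vertices (data units) of one filled bar whose top edge is an inverse CDF."""
    dist = NormalDist(mean, sd)
    top = []
    for i in range(n_points):
        t = i / (n_points - 1)     # 0.0 at the left edge, 1.0 at the right
        p = 0.01 + 0.98 * t        # clip to 1%-99%; inv_cdf rejects 0 and 1
        top.append((x_left + t * width, dist.inv_cdf(p)))
    # Close the outline along the baseline (y = 0) so the shape can be filled.
    return [(x_left, 0.0)] + top + [(x_left + width, 0.0)]

def chart_svg(series, y_max, size=(400, 300), bar_width=0.6):
    """Render {label: (mean, sd)} as an SVG string, one filled polygon per bar.

    Labels are assumed to contain no XML special characters.
    """
    w, h = size
    n = len(series)
    shapes = []
    for k, (label, (mean, sd)) in enumerate(series.items()):
        # Bar k is centered in the unit-wide slot [k, k + 1].
        verts = bar_polygon(k + (1 - bar_width) / 2, bar_width, mean, sd)
        # Map data coordinates to pixels; the SVG y axis points down.
        pts = " ".join(
            f"{x / n * w:.1f},{h - max(y, 0.0) / y_max * h:.1f}" for x, y in verts
        )
        shapes.append(
            f'<polygon points="{pts}" fill="steelblue"><title>{label}</title></polygon>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{w}" height="{h}">'
        + "".join(shapes)
        + "</svg>"
    )

# Two hypothetical populations: similar means, very different variation.
svg = chart_svg({"A": (10.0, 2.0), "B": (12.0, 0.5)}, y_max=20.0)
```

Writing `svg` to a file with an `.svg` extension gives something a browser can open. Swapping `NormalDist` for another quantile function would be the place to handle non-Gaussian data.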
I am always looking for new and interesting ways to present data, as that's what I have to do professionally! This concept is very interesting, but I think it needs a little refinement.

Looking at the plots as an analyst, the concept is intuitive and easy to understand, but the underlying assumption is that your data is Gaussian distributed; even though this is most common, many data sources are not (for example, time-to-failure for devices tends to be gamma distributed, and this is important for many engineers). So some consideration towards shaping will be needed.

Looking at the plots as a non-analyst, the concept, unfortunately, isn't intuitive. Most of the time, those making "decisions" based on the data you produce are not domain experts and are trained to just glance at plots rather than look and think about what the data means. Because of this, the plots make the minimum values very evident. Others may either initially believe that the curve on the top of the bar is just a way to make the plot look more beautiful, or worse, believe it's an optical illusion meant to trick them in how they interpret the data. As such, I'd warn you to be aware of your intended audience, and maybe lead by explaining how to read the plots before presenting data.

> Non-Gaussian distributions exist.
Yep. Sorry for not explicitly addressing this in the OP.
The basic idea supports plotting them... though in Excel, that would get complicated fast, so, in this prototype, I limited myself to one distribution.

> The concept isn't intuitive to the non-analyst.

I think you are correct, especially WRT the possibility that they might think it's just decoration on a bog-standard chart.

The concern that the plots make the minimum values very evident was, on the other hand, very surprising. I felt that, if anything, it was the /maximum/ values that stood out. I'll have to think about that when working on future iterations.

I do understand the issues with Excel scripting. You may wish to consider prototyping data displays with other tools and go from there. Remember, sometimes great ideas will not be supported by current tools, and when they catch on, those tools will bend to support your idea. That's how Tufte's "sparklines" gained mainstream popularity!

Again, I'd like to state that this is a very interesting way of presenting the data. I think one of the most valuable features this could offer is showing when the compared random variables have different distributions (i.e., it would communicate at a glance that one variable was in fact Gaussian and another followed an F-distribution). Adding additional dimensionality to presented data is powerful, and this technique is certainly not something I've ever considered before! Thank you for your contribution!

New ways to display data are always good! Thanks for sharing your knowledge!

About This Instructable

Author: BoilingLeadBath