155 Views, 35 Replies

Ratings, so you know... (Answered)

I spent a bit of time today looking again at the Instructables rating system, here's what I found:

A single rating lands an Instructable around the 3 mark; there is a huge spike just on the upper side of 3.0. The mean rating (rather close to pi) is 3.14

Most ratings seem to be positive: the overall distribution has 68.7% of rated Instructables "above average" (60.5% of all Instructables if you include unrated submissions)

There is a "wave" of ratings heading up towards 4, and a smaller ripple going down towards 2. I see this as some kind of fluid-motion effect like waves on water, but I'd be interested in opinions.

50% of rated Instructables sit between 2.9 & 3.4
Instructables rated above 3.4 are in the top 25%
Instructables rated below 2.9 are in the lower 25%

Note: Rating is essentially a measure of popularity, it is not a linear indicator of quality / value.

Other:
Only 31 Instructables are rated above 4.5 (0.1%)
Only 19 Instructables are rated below 1.5, 1 of which is the lowest at 0.99
Only 12% of Instructables are unrated

The rating algorithm is in Rachel's FAQ if you're interested (I probably should have put this in earlier)
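Rachel's FAQ has the exact formula; systems like this are usually a weighted average that shrinks an item's raw mean toward the site-wide mean. A minimal sketch, with purely illustrative constants (the site's real weights and per-user factors are not reproduced here):

```python
def weighted_rating(ratings, site_mean=3.14, prior_weight=5):
    """Shrink an item's raw average toward the site-wide mean.

    `site_mean` and `prior_weight` are illustrative values, not the
    actual constants used by Instructables.
    """
    n = len(ratings)
    if n == 0:
        return None  # unrated
    raw_mean = sum(ratings) / n
    # Behaves as if `prior_weight` phantom votes sat at the site mean.
    return (raw_mean * n + site_mean * prior_weight) / (n + prior_weight)
```

With these made-up constants, a lone 5-star vote yields (5 + 3.14x5)/6 = 3.45 - it lands just above 3, consistent with the spike described above, and only drifts toward the raw average as more votes arrive.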

Discussions

yokozuna

9 years ago

Quote: "Note: Rating is essentially a measure of popularity, it is not a linear indicator of quality / value." I agree, and don't think that's necessarily a good thing. I once was the first to rate an ible, and gave it a 3.5. It then showed up as having a 2.94 average or something like that. It was like I downrated the ible, which I didn't! We can already tell popularity by page views and to a lesser degree the number of comments. Since it seems so few people rate I would like to see the ratings be more of an average and less of bringing everything to the middle. JMO :)

lemonie (replying to yokozuna)

Reply 9 years ago

It works better with a lot of ratings; in the green & red zones you've got "definitely liked" and "definitely not liked". L

yokozuna (replying to lemonie)

Reply 9 years ago

You are correct, but I still perceive it to be a problem because few get to the red and green zones, mostly due to lack of votes.

kelseymh (replying to yokozuna)

Reply 9 years ago

The weighting, and the use of the site-wide average, is (I believe) an important feature of the system. It prevents the stupid bias of "five star" (or "zero star") Instructables coming from just one or two votes.

They are essentially imposing the Copernican assumption of "everything is average until proven otherwise." :-)

yokozuna (replying to kelseymh)

Reply 9 years ago

I'm not saying it shouldn't be weighted. I just don't think it should be weighted so much. Out of my published instructables, there isn't a full point of difference in the rating of any two of them. I think that some are quite a bit better quality than others, even if they are only leaning towards the two ends of average. In fact, with the exception of the Stolen Ibles one I'm pretty sure the ones that got rated more often tend to be rated higher, regardless of quality. Several of those benefited in page views (and thus number of ratings, and thus higher overall rating) from contests, which should have little to do with the overall rating. Again, it's all just my opinion, I don't mean to step on toes. I'm hoping the staff appreciates the feedback and can do whatever pleases them with the information. On a side note, one of my co-workers once told me "You should always strive for mediocrity, that way nobody steals your... stuff." Only he didn't say stuff.

kelseymh (replying to yokozuna)

Reply 9 years ago

I don't think you're stepping on toes! You've got a good, thoughtful opinion about an intrinsically hard problem. You're pointing out some clear shortcomings of the existing algorithm. Others have pointed out problems with the simplest alternative (no weighting). Rachel et al. could add more complexity to the algorithm to deal with your concerns (e.g., reduce the weight of the site-average into the score as the number of votes goes up). It's not clear that a more complex algorithm will be either palatable or understandable to the average member.

yokozuna (replying to kelseymh)

Reply 9 years ago

I think the algorithm works quite efficiently, it just doesn't provide as much range as I'd like. The simplest solution I can think of is: (current algorithm + actual rating average) / 2. Of course, I think that may take it back too far in the other direction. So perhaps something like (2 x (current algorithm) + actual rating average) / 3.
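Reading the "%" signs above as division, the two proposed blends can be written out explicitly (the function names are mine, and "current" stands for whatever the site's algorithm currently outputs):

```python
def blend_half(current, raw_average):
    """First proposal: equal weight to the site's weighted score
    and the raw (unweighted) rating average."""
    return (current + raw_average) / 2

def blend_thirds(current, raw_average):
    """Second proposal: the site's weighted score counts double,
    pulling less far back toward the raw average."""
    return (2 * current + raw_average) / 3
```

For example, with the site's weighted score at 3.3 and a raw average of 4.5, the first blend gives 3.9 and the second gives 3.7 - both nudge the displayed score toward the raw average, which is the stated goal.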

kelseymh (replying to yokozuna)

Reply 9 years ago

You ought to send Rachel and Eric a PM about this. You've got a concrete proposal, and a good argument supporting it. You need access to the raw data to test whether your proposal does what you want (and doesn't do what you don't want :-).

Ninzerbean

9 years ago

But but but... it depends on how often you rate an ible as to how much weight your rating is worth - this is to prevent shills from over-inflating your tires, or something like that. For example, today I rated an ible that had not been rated yet with a 4.5, and it was posted as a 3.3. I rate pretty often, so I would have thought my rating might actually give the person a rating close to what I rated it, but obviously not. Any thoughts? My thoughts on this are that the folks who don't understand the system might feel quite slighted, and consequently discouraged, by a few ratings and the "score" being a low 3. I know I did until it was explained to me.

zachninme (replying to Ninzerbean)

Reply 9 years ago

The rating system is impersonal: how often you rate doesn't change how much your vote counts. Ratings are just averages, but with a few extra data points thrown in to smooth them out (and prevent the "1 vote, 5 stars" issue)

Ninzerbean (replying to zachninme)

Reply 9 years ago

That's not what I was told a while back - how do you know you are right? That comes across sounding like my hands are on my hips and I'm yelling at you - I'm not, I just want to know how you know what you know.

zachninme (replying to Ninzerbean)

Reply 9 years ago

Hmm... maybe they've changed it. I helped them do some stats when they were setting up the star system, so I understood well how it worked at the time. I wouldn't be too surprised if they changed it, though! I remembered something, though: I know for sure your rating is weighted more if you've commented.

kelseymh (replying to zachninme)

Reply 9 years ago

Hey, Zach. Do you have an easy way to query I'bles and pull out the individual data on ratings? That is, for each I'ble, what is the current value, and how many users have put in values?

zachninme (replying to kelseymh)

Reply 9 years ago

(Re: earlier post) Ah, so they did change it.

You can only see the data that's there, but you just want to automate the scraping? There's no way to see individual ratings, but you can get the 'raw data' by looking in the HTML.

Around line 955 (for me) for this page, there is the following:
var rateIt_TVEZT4LFZ8J39G8 = new InstructStarbox('rate_TVEZT4LFZ8J39G8', 3.073076923045769, { overlay: 'pointy_gray.png', total: 3, max: 5, ...snip...

You can see here the rating is 3.07307... and there are 3 votes ("total"). If I were scraping for this, I'd look for the string "new InstructStarbox("
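A scraping pass over that HTML might look like the sketch below. The regex and argument order are guesses based only on the snippet above, and it assumes the escaped quote entities in the quoted snippet stand for plain single quotes in the raw page source:

```python
import re

# Matches the rating value and vote total from the inline
# InstructStarbox call, assuming the argument order shown above:
# name string, rating, then an options object containing "total".
STARBOX_RE = re.compile(
    r"new InstructStarbox\(\s*'[^']*',\s*([\d.]+),[^}]*?total:\s*(\d+)"
)

def extract_rating(html):
    """Return (rating, votes) from a page's HTML, or None if absent."""
    m = STARBOX_RE.search(html)
    if m is None:
        return None
    return float(m.group(1)), int(m.group(2))
```

Run against the snippet above, this would pull out the pair (3.073076923045769, 3).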

I'm sure lemonie could show you his code or whatever he used.
lemonie (replying to zachninme)

Reply 9 years ago

Is that commented on the posting, then voted? L

Ninzerbean (replying to lemonie)

Reply 9 years ago

My understanding is that it's an overall thing - not at all for that particular posting.

kelseymh (replying to zachninme)

Reply 9 years ago

Hi, Zach. The algorithm was adjusted last November; see Rachel's FAQ, which she updated to describe the per-user weights.

kelseymh (replying to Ninzerbean)

Reply 9 years ago

If your single value contributed ten percent to the weighted average, then you certainly do have "undue influence" (and I mean that in a good way :-D).

There's a minor clarification to what you write. It's not "how often you rate an ible" (you can only rate any given I'ble one time; repeats just replace your previous value); rather, people who rate many ibles get a higher weight contribution.

If you read Rachel's FAQ describing the algorithm, you'll see that the initial value for any I'ble's rating is the average of all ratings of all I'bles on the site. With more than 25,000 rated I'bles (out of 29,000 total), your single rating cannot possibly have 100% influence.

lemonie (replying to kelseymh)

Reply 9 years ago

What's your opinion on the distribution? I'd like more data-points, but the rough shape shows the patterns well enough. L

kelseymh (replying to lemonie)

Reply 9 years ago

The distribution makes sense to me. Since the site-wide average is very close to three, that's what you'd expect for the mode value (all of the one-to-few rated I'bles). For I'bles with enough "votes" to have a meaningful value, you'd expect something more-or-less symmetric about the mode.

I do think people are more likely to rate something they like (even if they aren't "huge fans") than something they dislike only a little. That selection bias may very well be enough to skew the distribution in the way you observe.

I'm not convinced that your interpretation of the secondary peaks as "waves" is correct (but not necessarily incorrect either!). You're looking at a static distribution, not a snapshot of some time evolution. Either those "peaks" are just statistical fluctuations (very unlikely with bin-by-bin contents of ~1,000), or they're an artifact of the discrete rating values.

I would be curious to see the 2D distribution of ratings vs. number of votes. If your hypothesis is correct, then you should see those secondary peaks form a "V" in that 2D space, where they are most widely separated for a large number of votes, and get closer to the modal peak as the number of votes drops. If my hypothesis is correct, then there should be no correlation.

I tried looking at your spreadsheet, but it only contains the bin values needed to draw the plot, not the underlying raw data.
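That 2D test needs only (rating, votes) pairs per I'ble. One rough way to quantify it - the function name and bin edges below are made up purely for illustration - is to check whether the spread of ratings widens as the vote count grows:

```python
from statistics import stdev

def spread_by_votes(items, bins=(1, 3, 10, 30, 100)):
    """Group (rating, votes) pairs by vote count and report the
    spread (sample standard deviation) of ratings in each group.
    """
    groups = {lo: [] for lo in bins}
    for rating, votes in items:
        # Assign to the largest bin threshold the vote count reaches.
        for lo in reversed(bins):
            if votes >= lo:
                groups[lo].append(rating)
                break
    return {lo: (stdev(rs) if len(rs) > 1 else 0.0)
            for lo, rs in groups.items()}
```

Under the "waves" hypothesis the spread should grow with the vote count (the "V" opening up); under the fluctuation/artifact hypothesis it should stay roughly flat.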

lemonie (replying to kelseymh)

Reply 9 years ago

Thanks for the opinion; maybe I should be thinking of standing waves. That is, one rating goes (here), two ratings push (this far), three (this far), and above so many it gets fuzzy - but for a section on their way up / down there's a distinct lump, like an interference pattern or something. The .xls has counts per rating block - 0.1 points per block - that is the raw data, I'm afraid... L

kelseymh (replying to lemonie)

Reply 9 years ago

Ah, we agree :-) What you describe is just what I had in mind with the 2D distribution (rating value vs. votes). I'm not convinced you're right, but such a plot is the test case for both our hypotheses.

lemonie (replying to kelseymh)

Reply 9 years ago

The data I collected in March shows a similar "curve" but because I only counted to the nearest 200 it's much more lumpy so I won't reproduce it. L

Ninzerbean (replying to kelseymh)

Reply 9 years ago

Yes, that's what I meant, it could be read both ways.

lemonie (replying to Ninzerbean)

Reply 9 years ago

When you see people posting comments that indicate they've given it 5, or 0.5 you see a measure of popularity at work. People have to like or dislike it enough to feel inclined. As you observe, what you click and what you get are usually different things, which is why I posted this. The original topic explains the complex maths. L

DJ Radio

9 years ago

There has only been 1 ible in history that is below 1*

Ninzerbean (replying to lemonie)

Reply 9 years ago

58 people have been to this ible and guess what? 58 people have voted on that terrible puzzle one.

DJ Radio (replying to lemonie)

Reply 9 years ago

OMG! I thought the extreme mini knex gun had once again dipped below the 1* mark.

DJ Radio (replying to Rock Soldier)

Reply 9 years ago

the ible lemonie linked to, or the extreme mini knex gun?

trebuchet03

9 years ago

A while back (a long while), the Instructables team posted the rating algorithm's ideology - something about being normalized to the average rating of ALL rated Instructables... This is why you've got a healthy spike around the 3-ish range :) I've seen the rating function written out on a whiteboard (perhaps an old version)... IIRC, there was no page-view (popularity) parameter.

lemonie (replying to trebuchet03)

Reply 9 years ago

Yes, I've seen it. But the spike in the middle is sitting on top of the healthy distribution; this is where the one-rating entries come into the system, as yokozuna observes.
My imagination sees a shield volcano, with a big plume of fresh lava rising in the middle, then rolling down either side.

L

Ninzerbean

9 years ago

I don't think it's about popularity though, L. The average ible gets about 3 ratings per thousand hits.