149 Views, 35 Replies


Ratings, so you know... (Answered)

I spent a bit of time today looking again at the Instructables rating system, here's what I found:

A single rating lands an Instructable around the 3 mark; there is a huge spike just on the upper side of 3.0. The mean rating (rather close to pi) is 3.14

Most ratings seem to be positive: the overall distribution has 68.7% of rated Instructables "above average" (60.5% of all Instructables if you include unrated submissions)

There is a "wave" of ratings heading up towards 4, and a smaller ripple going down towards 2. I see this as some kind of fluid-motion effect like waves on water, but I'd be interested in opinions.

50% of rated Instructables sit between 2.9 & 3.4
Instructables rated above 3.4 are in the top 25%
Instructables rated below 2.9 are in the lower 25%

Note: Rating is essentially a measure of popularity; it is not a linear indicator of quality / value.

Other:
Only 31 Instructables are rated above 4.5 (0.1%)
Only 19 Instructables are rated below 1.5, one of which is the lowest at 0.99
Only 12% of Instructables are unrated

The rating algorithm is in Rachel's FAQ if you're interested (I probably should have put this in earlier).
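
For anyone curious how figures like the quartiles above fall out of a list of scraped ratings, here is a minimal Python sketch (an illustration only, not the code behind the numbers in this post; scraped_ratings is a hypothetical stand-in for the real data):

    # scraped_ratings: one displayed (weighted) rating per rated Instructable.
    # Toy sample only -- replace with real scraped data.
    scraped_ratings = [3.0, 3.1, 3.07, 3.4, 2.9, 4.2, 2.5, 3.2]

    ratings = sorted(scraped_ratings)
    n = len(ratings)
    mean = sum(ratings) / n

    def percentile(sorted_vals, p):
        # Nearest-rank percentile over an already-sorted list.
        k = int(round(p / 100.0 * (len(sorted_vals) - 1)))
        return sorted_vals[k]

    q1 = percentile(ratings, 25)  # lower quartile (~2.9 in the post above)
    q3 = percentile(ratings, 75)  # upper quartile (~3.4 in the post above)
    above = sum(1 for r in ratings if r > mean) / n  # share "above average"

    print(mean, q1, q3, above)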

35 Replies

yokozuna (author), 2009-09-07

Quote: "Note: Rating is essentially a measure of popularity; it is not a linear indicator of quality / value." I agree, and don't think that's necessarily a good thing. I was once the first to rate an ible and gave it a 3.5; it then showed up as having a 2.94 average or something like that. It was as if I had downrated the ible, which I didn't! We can already tell popularity by page views and, to a lesser degree, the number of comments. Since it seems so few people rate, I would like to see the ratings be more of a true average and less of a pull toward the middle. JMO :)

lemonie (author), in reply to yokozuna, 2009-09-07

It works better with a lot of ratings; in the green & red zones you've got the definitely liked and the definitely not liked. L

yokozuna (author), in reply to lemonie, 2009-09-07

You are correct, but I still perceive it to be a problem because few get to the red and green zones, mostly due to lack of votes.

kelseymh (author), in reply to yokozuna, 2009-09-08

The weighting, and the use of the site-wide average, is (I believe) an important feature of the system. It prevents the stupid bias of "five star" (or "zero star") Instructables coming from just one or two votes.

They are essentially imposing the Copernican assumption of "everything is average until proven otherwise." :-)
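
In code, that Copernican prior amounts to a damped average: every I'ble starts with some number of "phantom" votes at the site-wide mean, and real votes gradually pull it away. A minimal Python sketch of the idea (not Instructables' actual algorithm or weights -- those are in Rachel's FAQ; PRIOR_WEIGHT is invented for illustration):

    SITE_MEAN = 3.14     # site-wide average rating, from the stats above
    PRIOR_WEIGHT = 10    # invented: phantom average-votes each I'ble starts with

    def damped_rating(votes):
        # votes: the individual star ratings given to one Instructable.
        total = SITE_MEAN * PRIOR_WEIGHT + sum(votes)
        count = PRIOR_WEIGHT + len(votes)
        return total / count

    print(damped_rating([]))          # 3.14 -- no votes, sits at the mean
    print(damped_rating([4.5]))       # ~3.26 -- one 4.5 barely moves it
    print(damped_rating([4.5] * 50))  # ~4.27 -- many votes overwhelm the prior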

yokozuna (author), in reply to kelseymh, 2009-09-08

I'm not saying it shouldn't be weighted. I just don't think it should be weighted so much. Out of my published instructables, there isn't a full point of difference in the rating of any two of them, yet I think some are quite a bit better quality than others, even if they are only leaning towards the two ends of average. In fact, with the exception of the Stolen Ibles one, I'm pretty sure the ones that got rated more often tend to be rated higher, regardless of quality. Several of those benefited in page views (and thus number of ratings, and thus a higher overall rating) from contests, which should have little to do with the overall rating. Again, it's all just my opinion; I don't mean to step on toes. I'm hoping the staff appreciates the feedback and can do whatever pleases them with the information.

On a side note, one of my co-workers once told me, "You should always strive for mediocrity, that way nobody steals your... stuff." Only he didn't say stuff.

kelseymh (author), in reply to yokozuna, 2009-09-08

I don't think you're stepping on toes! You've got a good, thoughtful opinion about an intrinsically hard problem. You're pointing out some clear shortcomings of the existing algorithm. Others have pointed out problems with the simplest alternative (no weighting). Rachel et al. could add more complexity to the algorithm to deal with your concerns (e.g., reduce the weight of the site-average into the score as the number of votes goes up). It's not clear that a more complex algorithm will be either palatable or understandable to the average member.

yokozuna (author), in reply to kelseymh, 2009-09-08

I think the algorithm works quite efficiently, it just doesn't provide as much range as I'd like. The simplest solution I can think of is (current algorithm + actual rating average) / 2. Of course, I think that may take it back too far in the other direction, so perhaps something like (2 × current algorithm + actual rating average) / 3.
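
As a sketch of what that second formula would look like in Python (purely illustrative; current_score stands in for whatever the current algorithm outputs):

    def blended_rating(current_score, votes):
        # 2:1 blend of the current weighted score and the raw vote average.
        raw_average = sum(votes) / len(votes)
        return (2 * current_score + raw_average) / 3

    # With one 4.5 vote and a current weighted score of ~3.26, the blend
    # gives ~3.67: more range than the damped score alone.
    print(blended_rating(3.26, [4.5]))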

kelseymh (author), in reply to yokozuna, 2009-09-08

You ought to send Rachel and Eric a PM about this. You've got a concrete proposal, and a good argument supporting it. You need access to the raw data to test whether your proposal does what you want (and doesn't do what you don't want :-).

Ninzerbean (author), 2009-09-05

But but but... it depends on how often you rate an ible as to how much weight your rating is worth - this is to prevent shills from over-inflating your tires or something like that. For example, today I rated an ible that had not been rated yet, giving it a 4.5, and it was posted as a 3.3. I rate pretty often, so I would have thought my rating might actually give the person a rating close to what I rated it, but obviously not. Any thoughts? My thoughts are that folks who don't understand the system might feel quite slighted, and consequently discouraged, by a few ratings and the "score" being a low 3. I know I did until it was explained to me.

zachninme (author), in reply to Ninzerbean, 2009-09-07

The rating system is impersonal: how often you rate doesn't change how much your vote counts. Ratings are just averages, but with a few extra data points thrown in to smooth them out (and prevent the "1 vote, 5 stars" issue).

Ninzerbean (author), in reply to zachninme, 2009-09-07

That's not what I was told a while back; how do you know you are right? That comes across sounding like my hands are on my hips and I'm yelling at you - I'm not, I just want to know how you know what you know.

zachninme (author), in reply to Ninzerbean, 2009-09-07

Hmm... maybe they've changed it. I helped them do some stats when they were setting up the star system, so I understood well how it worked at the time. I wouldn't be too surprised if they changed it, though! I did remember one thing: I know for sure that your rating is weighted more if you've commented.

kelseymh (author), in reply to zachninme, 2009-09-08

Hey, Zach. Do you have an easy way to query I'bles and pull out the individual data on ratings? That is, for each I'ble, what is the current value, and how many users have put in values?

zachninme (author), in reply to kelseymh, 2009-09-08

(Re: earlier post) Ah, so they did change it.

You can only see the data that's there, but you just want to automate the scraping? There's no way to see individual ratings, but you can get the 'raw data' by looking in the HTML.

Around line 955 (for me) of this page, there is the following:
    var rateIt_TVEZT4LFZ8J39G8 = new InstructStarbox(
        'rate_TVEZT4LFZ8J39G8',
        3.073076923045769,
        { overlay: 'pointy_gray.png',
          total: 3,
          max: 5,
          ...snip...

You can see here that the rating is 3.07307... and there are 3 votes ("total"). If I were scraping for this, I'd look for the string " new InstructStarbox("

I'm sure lemonie could show you his code or whatever he used.
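
If you did want to automate it, a rough Python sketch would be something like this (untested, and assuming the page source still contains an InstructStarbox call like the snippet above):

    import re
    import urllib.request

    # Matches: new InstructStarbox('rate_...', <rating>, { ... total: <votes> ...
    STARBOX_RE = re.compile(
        r"new InstructStarbox\(\s*'[^']+',\s*([0-9.]+),.*?total:\s*(\d+)",
        re.DOTALL,
    )

    def rating_and_votes(url):
        # Return (rating, vote_count) scraped from one Instructable page,
        # or None if the marker string isn't found (e.g. an unrated page).
        html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
        match = STARBOX_RE.search(html)
        if match is None:
            return None
        return float(match.group(1)), int(match.group(2))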

lemonie (author), in reply to zachninme, 2009-09-07

Is that commenting on the posting, then voting? L

Ninzerbean (author), in reply to lemonie, 2009-09-08

My understanding is that it's an overall thing - not at all for that particular posting.

kelseymh (author), in reply to zachninme, 2009-09-08

Hi, Zach. The algorithm was adjusted last November; see Rachel's FAQ, which she updated to describe the per-user weights.

kelseymh (author), in reply to Ninzerbean, 2009-09-08

If your single value contributed ten percent to the weighted average, then you certainly do have "undue influence" (and I mean that in a good way :-D).

There's a minor clarification to what you write. It's not how often you rate "an" ible (you can only rate any given I'ble once; repeats just replace your previous value); rather, people who rate "many" ibles get a higher weight contribution.

If you read Rachel's FAQ describing the algorithm, you'll see that the initial value for any I'ble's rating is the average of all ratings of all I'bles on the site. With more than 25,000 rated I'bles (out of 29,000 total), your single rating cannot possibly have 100% influence.

lemonie (author), in reply to kelseymh, 2009-09-08

What's your opinion on the distribution? I'd like more data-points, but does the rough shape show the patterns well enough? L

kelseymh (author), in reply to lemonie, 2009-09-08

The distribution makes sense to me. Since the site-wide average is very close to three, that's what you'd expect for the mode value (all of the one-to-few rated I'bles). For I'bles with enough "votes" to have a meaningful value, you'd expect something more-or-less symmetric about the mode.

I do think people are more likely to rate something they like (even if they aren't "huge fans") than something they dislike only a little. That selection bias may very well be enough to skew the distribution in the way you observe.

I'm not convinced that your interpretation of the secondary peaks as "waves" is correct (but not necessarily incorrect either!). You're looking at a static distribution, not a snapshot of some time evolution. Either those "peaks" are just statistical fluctuations (very unlikely with bin-by-bin contents of ~1,000), or they're an artifact of the discrete rating values.

I would be curious to see the 2D distribution of ratings vs. number of votes. If your hypothesis is correct, then you should see those secondary peaks form a "V" in that 2D space, where they are most widely separated for a large number of votes, and get closer to the modal peak as the number of votes drops. If my hypothesis is correct, then there should be no correlation.

I tried looking at your spreadsheet, but it only contains the bin values needed to draw the plot, not the underlying raw data.
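
If those per-I'ble (rating, votes) pairs were available, the test itself would be a simple scatter plot. A Python sketch (scraped_pairs is hypothetical, e.g. collected with the scraper sketched earlier in the thread):

    import matplotlib.pyplot as plt

    # scraped_pairs: hypothetical (rating, vote_count) per rated I'ble.
    scraped_pairs = [(3.07, 3), (3.26, 1), (4.1, 40)]  # replace with real data

    ratings = [r for r, v in scraped_pairs]
    votes = [v for r, v in scraped_pairs]

    plt.scatter(votes, ratings, s=4, alpha=0.3)
    plt.xscale("log")  # vote counts span orders of magnitude
    plt.xlabel("number of votes")
    plt.ylabel("weighted rating")
    plt.show()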

lemonie (author), in reply to kelseymh, 2009-09-08

Thanks for the opinion; maybe I should be thinking of standing waves. That is, one rating goes (here), two ratings push (this far), three (this far), and above so many it gets fuzzy; but for a section on their way up / down there's a distinct lump, like an interference pattern or something. The .xls has counts per rating block (0.1 points per block); that is the raw data, I'm afraid... L

kelseymh (author), in reply to lemonie, 2009-09-08

Ah, we agree :-) What you describe is just what I had in mind with the 2D distribution (rating value vs. votes). I'm not convinced you're right, but such a plot is the test case for both our hypotheses.

lemonie (author), in reply to kelseymh, 2009-09-08

The data I collected in March shows a similar "curve", but because I only counted to the nearest 200 it's much lumpier, so I won't reproduce it. L

Ninzerbean (author), in reply to kelseymh, 2009-09-08

Yes, that's what I meant, it could be read both ways.

lemonie (author), in reply to Ninzerbean, 2009-09-06

When you see people posting comments that indicate they've given it 5 or 0.5, you see a measure of popularity at work. People have to like or dislike it enough to feel inclined. As you observe, what you click and what you get are usually different things, which is why I posted this. The original topic explains the complex maths. L

DJ Radio (author), 2009-09-05

There has only been one ible in history that is below 1*.

lemonie (author), in reply to DJ Radio, 2009-09-06

Ninzerbean (author), in reply to lemonie, 2009-09-08

58 people have been to this ible and guess what? 58 people have voted on that terrible puzzle one.

DJ Radio (author), in reply to lemonie, 2009-09-06

OMG! I thought the extreme mini knex gun had once again dipped below the 1* mark.

Rock Soldier (author), in reply to DJ Radio, 2009-09-07

Nope, now it's 1.2 stars.

DJ Radio (author), in reply to Rock Soldier, 2009-09-07

the ible lemonie linked to, or the extreme mini knex gun?

Rock Soldier (author), in reply to DJ Radio, 2009-09-07

Sadly, extreme mini Knex gun

trebuchet03 (author), 2009-09-07

A while back (a long while), the Instructables team posted the rating algorithm's ideology: something about being normalized to the average rating of ALL rated projects... This is why you've got a healthy spike around the 3-ish range :) I've seen the rating function written out on a whiteboard (perhaps an old version); IIRC, there was no page-view (popularity) parameter.

lemonie (author), in reply to trebuchet03, 2009-09-07

Yes, I've seen it. But the spike in the middle is sitting on top of the healthy distribution; this is where the one-rating entries come into the system, as yokozuna observes.
My imagination sees a shield volcano, with a big plume of fresh lava rising in the middle, then rolling down either side?

L

Ninzerbean (author), 2009-09-05

I don't think it's about popularity though L. The average ible gets about 3 ratings per thousand hits.
