From a fan: “Just stumbled on your site and I'm super confused by your rating system. It seems that this system is extremely disconnected from reality. Just an example: The standard Four Roses Yellow Label is rated 100 points higher than the 2013 Single Barrel. This is impossible in a normal world. Have you tried either of these? This is just one example with countless more that don't add up. Why are you doing this? This is super confusing to people that are starting out in whiskey or whisky.”
This is a pretty common question and a very reasonable one. Assigning numerical scores to a subjective opinion is always problematic. We spend a lot of time trying to get it right, but it's far from infallible. One reason we wanted to bring this to a more public forum is to answer the question "Why are you doing this?" directly, along with the notion that we're actually adding to a would-be purchaser's confusion rather than meeting our stated goal of helping.
Our rating system is based on the combined critical acclaim from the various organizations we follow. The Four Roses Yellow Label has strong results in both the 2013 San Francisco World Spirits Competition and the 2013 Wine Enthusiast review... particularly Wine Enthusiast (the Beverage Testing Institute scores are more modest).
Meanwhile, the 2013 Single Barrel has a single strong result, from the 2013 San Francisco competition. In fact, San Francisco scored the two identically. So given that the Yellow Label has more (and generally solid) results from other years and other institutions, mathematically we have to say it enjoys the greater critical acclaim of the two spirits. Makes sense, no?
In reality, this is imperfect for many reasons. The most obvious is that the 2013 is a vintage-year spirit and will (probably) be entered in only one year at maybe one or two institutions. So in our scoring algorithm, vintage brands get penalized precisely for being vintage brands when set against products that are entered more routinely over multiple years.
In other cases, the comparison is misleading because judges generally classify spirits when they taste: the young whiskeys go with the young, the old with the older, the grain vodkas in one group and the potato vodkas in another, and so forth. They wouldn't assess an 18yr scotch the same way they'd assess a 4yr bourbon, so a silver medal in one isn't the mathematical equal of a silver in the other. While our algorithm takes relative scores within a category into account (say, a vodka versus a whiskey), the mathematics behind it are otherwise the same. And when you're within the same class but comparing different styles or years (the Yellow Label versus the 2013 Single Barrel is a good example), there's a good chance that identical scores mean different things.
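The breadth effect described above can be sketched in a toy example. To be clear, the numbers, weights, and formula below are invented purely for illustration; they are not Proof66's actual algorithm or data:

```python
# Toy illustration of score aggregation across competitions.
# All scores and the "breadth bonus" formula are hypothetical.

# Each result is (institution, year, score on a 0-100 scale).
results = {
    "Four Roses Yellow Label": [
        ("San Francisco WSC", 2013, 90),
        ("Wine Enthusiast", 2013, 93),
        ("Beverage Testing Institute", 2012, 86),
    ],
    "Four Roses 2013 Single Barrel": [
        ("San Francisco WSC", 2013, 90),  # identical San Francisco score
    ],
}

def acclaim(entries):
    """Average the scores, then add a small bonus for breadth of
    coverage (distinct institutions plus distinct years entered)."""
    avg = sum(score for _, _, score in entries) / len(entries)
    institutions = {inst for inst, _, _ in entries}
    years = {yr for _, yr, _ in entries}
    return avg + len(institutions) + len(years)

for label, entries in results.items():
    print(label, round(acclaim(entries), 2))
```

With any breadth-weighted formula like this, the Yellow Label comes out ahead of the Single Barrel even though San Francisco scored them identically, which is exactly how a vintage-year spirit with a single entry ends up penalized.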
Yikes! So given all of that, why do we do it? Easy: for all its flaws, all its subjectivity, and all its confusion, we think it's a far better method of evaluating spirits (really, of introducing people to spirits) than the alternative. One simply can't taste everything, any more than one can read every book that's published. So you go to critics for a little help. The alternative to critics is generally commercials, rap videos, models on labels, cheeky advertising, or celebrity endorsements. For all their flaws, critics lauding a label is generally a better guide to finding new spirits than those other methods. And indeed, I've personally found spirits I never would have tried but for the critical ratings.
So why numbers? Why not just use written opinion? Well, it would be nearly as hard to read all of those critical opinions as it would be to taste everything yourself (though much less expensive). Scores make things much, much easier. That's why the critics use them; that's why we use them. So we try to make it easy: a top-20 guide that allows the novice to pick up the Yellow Label over, say, Jim Beam or Captain Morgan (because a novice probably wouldn't know much of the difference among them). Picking up the Yellow Label, they might get inspired to try other spirits... maybe even the 2013 vintage. Maybe something else (Blanton's?). Then they begin making their own decisions about what they like and don't like, and they will certainly disagree with the critics (as we do). But as a place to start? To us, it's hard to imagine a better one than a general critical consensus. That's what Proof66 is: a critical consensus.
by Neal MacDonald, editor