Assessing the Critics: 2013 Report on the Beverage Testing Institute

Part 2 of our three-part assessment of the three critical institutions we follow brings us to the Beverage Testing Institute. We ask again: how stable and accurate are the scores from critics over time? The scores, and what we do with them, are a collision of the subjective with the scientific. We put our faith in the hands of these critical “experts,” watch them somehow come up with points and medals to rank spirits of all prices and types, and then hope to some kind of Bacchus-God that they’ll be good guides for our personal palates.

Meanwhile, distillers, bottlers, and blenders all submit their life’s work to these same critical “experts” to have it assessed, assayed, and analyzed while praying to their own Bacchus-God that they’ll ultimately receive some kind of award.

We field questions (and sometimes accusations) from all quarters: regular people question the critical method, distillers question the critics’ authenticity, and the press often claims the whole business is pure marketing and/or one step removed from astrology.

What do we have here at Proof66? We have data. Some of these questions are not answerable but we can at least run some analyses and reveal how exclusive the higher scores are, how consistent the scores are over time, and how consistent the critics are among themselves. The results are fascinating and they’re revealed below.

The Beverage Testing Institute

Located in Chicago, Illinois, the BTI takes an approach we’ve always appreciated. For one thing, it has a great pedigree going all the way back to 1981, with spirits entering the game in 1994. They use blind tastings, accept no advertising, charge a flat fee for entry, and ensure that every product is tasted under exactly the same conditions by a panel that, while it changes over time, is always experienced in the category being tasted. They even strive to minimize environmental influences in their tasting room. They use an 80- to 100-point rating system; anything below 80 points is “not recommended” and goes unpublished. We use the four-part breakdown that they themselves use: 96-100 “Superlative,” 90-95 “Exceptional,” 85-89 “Highly Recommended,” and 80-84 “Recommended.” These are the general classes of spirits we’ll use below to examine for trends.

One caveat before we enter the business of stereotyping and grouping: the BTI scoring methodology allows, of course, a finely grained scale of 21 possible scores ranging from 80 to 100. Even though they group scores into bands for simplicity’s sake (and we follow suit), there is a big difference between a 90-point score and a 95-point score… yet for this article they appear virtually identical. That’s somewhat misleading, and we’d be remiss not to point it out. Nevertheless…
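To make the grouping concrete, here is a minimal sketch (in Python; the function name and error handling are ours, purely for illustration) of how a published BTI score maps into the four classes used throughout this article:

```python
def bti_class(score: int) -> str:
    """Map a published BTI score (80-100) to the four classes used in this article."""
    if score >= 96:
        return "Superlative"         # 96-100
    if score >= 90:
        return "Exceptional"         # 90-95
    if score >= 85:
        return "Highly Recommended"  # 85-89
    if score >= 80:
        return "Recommended"         # 80-84
    # Anything below 80 is "not recommended" and never published by the BTI.
    raise ValueError("scores below 80 points go unpublished")

# Example: a 92-point rating and a 95-point rating land in the same class,
# which is exactly the loss of granularity the caveat above describes.
print(bti_class(92), "|", bti_class(95))
```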

Looking back at the last four years of scores, you can see that the distribution of scores shows a very high level of consistency from year to year.

[Chart: Distribution of Scores by Year for BTI]

Only a very small percentage of spirits claim the 96-100 point threshold… a fact we’ve noticed over the years, and we heavily reward those spirits in our internal algorithm. The great bulk of all spirit scores lie between 85 and 95 points… well over three-quarters of all entries, making a very obvious and broad bell curve of results. In general, one can say that a score north of 90 points means a spirit is in the 50th percentile or better: across all spirits, fewer than half of published entries reach 90 points or more. Getting into 96-point-plus territory is truly hallowed ground. Let’s turn our attention to individual categories of spirits.

Spirit Type      | 96-100 Points | 90-95 Points | 85-89 Points | 80-84 Points
-----------------|---------------|--------------|--------------|-------------
Scotch           | 11%           | 53%          | 33%          | 2%
Irish Whiskey    | 5%            | 51%          | 27%          | 17%
Brandy           | 3%            | 45%          | 43%          | 9%
Tequila          | 2%            | 58%          | 35%          | 5%
American Whiskey | 2%            | 48%          | 42%          | 8%
Vodka            | 2%            | 45%          | 43%          | 10%
Liqueur          | 2%            | 41%          | 42%          | 14%
Rum              | 2%            | 35%          | 41%          | 22%
Gin              | 0%            | 49%          | 42%          | 9%
Canadian Whisky  | 0%            | 44%          | 36%          | 20%
Flavored Vodka   | 0%            | 31%          | 45%          | 24%
All Spirits      | 3%            | 44%          | 40%          | 14%

Here we see what we would interpret as a very, very traditional scoring sensibility: sympathetic to aged spirits of any kind (save the lowly and often adulterated Canadian whisky), followed by the white spirits, with the cowboy category of flavored vodka bringing up the rear. But even with rum, Canadian whisky, and flavored vodka landing at the lower end, the bell-curve quality of the BTI’s scoring sensibilities rings clear across the entire spirit world.
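For the curious, a table like the one above can be rebuilt from a flat list of published scores in a few lines of Python; the record layout below is our own guess at a convenient shape, not the BTI’s format:

```python
from collections import Counter, defaultdict

# Hypothetical records of (spirit type, published score); the real data set is
# compiled from the BTI's published results, and this layout is ours.
records = [("Scotch", 97), ("Scotch", 91), ("Vodka", 86), ("Rum", 82), ("Rum", 90)]

def band(score: int) -> str:
    """Class band for a published score, matching the table's column headers."""
    return "96-100" if score >= 96 else "90-95" if score >= 90 else "85-89" if score >= 85 else "80-84"

tallies = defaultdict(Counter)
for spirit_type, score in records:
    tallies[spirit_type][band(score)] += 1

for spirit_type, counts in sorted(tallies.items()):
    total = sum(counts.values())
    row = {b: f"{100 * n / total:.0f}%" for b, n in sorted(counts.items(), reverse=True)}
    print(f"{spirit_type:16s} {row}")
```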

What about volatility versus consistency? We looked back at any bottle with two scores within that four-year window… there were 188 examples in our data set. We’ll cheerfully acknowledge up front that this is a decent sample for all spirits but becomes very, very small when you start looking at individual spirit types. Overall, BTI is very, very consistent, with 76% of the spirits enjoying perfect consistency and less than 1% showing significant inconsistency. (“Perfectly consistent” means receiving the same class of score as in a previous year, while “significant inconsistency” indicates a two-class jump: say, an 80-84 point score becoming a 90-95, or an 85-89 becoming a 96-100.) In fact, the consistency borders on the astonishing (again, ignoring the small sample size). Here are the detailed results sorted by consistency:

Spirit Type      | Perfectly Consistent | Significant Inconsistency
-----------------|----------------------|--------------------------
Canadian Whisky  | 92% (11 bottles)     | 0%
Tequila          | 88% (14 bottles)     | 6% (1 bottle)
Gin              | 80% (4 bottles)      | 0%
Vodka            | 78% (18 bottles)     | 0%
Scotch           | 77% (17 bottles)     | 0%
Rum              | 76% (22 bottles)     | 0%
Brandy           | 75% (6 bottles)      | 0%
Irish Whiskey    | 75% (3 bottles)      | 0%
Liqueur          | 75% (12 bottles)     | 0%
Flavored Vodka   | 72% (13 bottles)     | 0%
American Whiskey | 63% (15 bottles)     | 0%
All Spirits      | 76% (142 bottles)    | 1% (1 bottle)

Unlike the San Francisco competition (which reviews a lot more labels but shows a lot more inconsistency: 33% consistent and 22% inconsistent), the BTI has an almost three-in-four record of perfect consistency with only a very small chance of significant deviation. We’ll reiterate that inconsistency isn’t necessarily the fault of the judges: distillers themselves could be offering variations in their products, and producers often tweak their formulas in response to marketplace demand, conditions and availability of ingredients, and perhaps even prior critical assessments. One might even expect inconsistent scores. But the tiny rate of significant variation implies that the BTI’s rather heroic measures to maintain consistency in their judging practices are paying off.
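For readers who want the classification logic spelled out, here is a minimal sketch of how repeat scores can be bucketed as described above; the helper names and the “minor drift” label for a one-class move are ours, chosen only to make the two published categories explicit:

```python
from collections import Counter

# Class bands in ascending order so a "class distance" can be measured.
BAND_FLOORS = [80, 85, 90, 96]

def band_index(score: int) -> int:
    """0-3 index of the class band a published BTI score (80-100) falls into."""
    return max(i for i, floor in enumerate(BAND_FLOORS) if score >= floor)

def consistency(earlier: int, later: int) -> str:
    """Classify a repeat score against an earlier one by class distance."""
    distance = abs(band_index(earlier) - band_index(later))
    if distance == 0:
        return "perfectly consistent"       # same class as the earlier year
    if distance >= 2:
        return "significant inconsistency"  # e.g., an 80-84 becoming a 90-95
    return "minor drift"                    # adjacent classes, neither extreme

# Hypothetical repeat-score pairs, tallied the way the table above was built.
pairs = [(92, 93), (83, 92), (87, 88), (90, 86)]
print(Counter(consistency(a, b) for a, b in pairs))
```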

One oddity is that some of the most highly variable categories (Canadian whisky in its flavoring, gin in its breadth of experimentation, vodka in its subtlety, and even liqueur in its multiplicity of styles and flavors) all show up as rather consistent, while spirits whose flavors are established by tradition tend to land near the bottom. But this might be an exercise in splitting hairs: given the sample size, the level of consistency seems nothing short of remarkable.

What about comparing the judges from San Francisco with the judges from BTI?

Now this gets very interesting. It’s all one big, happy critical body… they should agree with each other, no?

Well, no.

We looked back across the last four years at every label that was submitted to both the San Francisco competition and the Beverage Testing Institute (777 total bottles). Here we assume that the four classes of BTI scores correspond to the bronze, silver, gold, and double gold medals awarded at San Francisco.

Spirit Type      | Perfectly Consistent | Significant Inconsistency
-----------------|----------------------|--------------------------
Tequila          | 35% (25 bottles)     | 62% (44 bottles)
Rum              | 32% (26 bottles)     | 29% (24 bottles)
Scotch           | 29% (26 bottles)     | 18% (16 bottles)
American Whiskey | 27% (29 bottles)     | 22% (23 bottles)
Brandy           | 26% (10 bottles)     | 21% (8 bottles)
Vodka            | 25% (21 bottles)     | 29% (24 bottles)
Canadian Whisky  | 21% (7 bottles)      | 35% (12 bottles)
Irish Whiskey    | 19% (5 bottles)      | 33% (9 bottles)
Gin              | 18% (7 bottles)      | 40% (15 bottles)
Liqueur          | 16% (7 bottles)      | 50% (22 bottles)
Flavored Vodka   | 15% (6 bottles)      | 38% (15 bottles)
All Spirits      | 24% (186 bottles)    | 30% (233 bottles)

Wow… the level of disagreement between judges at the two big agencies verges on Hatfield vs. McCoy / George Lucas vs. Gene Roddenberry proportions. One would imagine there would be broad agreement between critics, but clearly there is a huge degree of subjectivity at play. Nearly two-thirds of tequilas are significantly inconsistent, an astonishing level of volatility for the producers out of Jalisco. Liqueurs are just as likely to be significantly inconsistent as not.
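To make the cross-critic comparison explicit, here is the correspondence we assume between San Francisco medals and BTI classes, sketched in the same style as above; the mapping and labels are our working assumption, not anything either institution publishes:

```python
# Assumed rank order of San Francisco medals and BTI classes (our assumption).
MEDAL_RANK = {"Bronze": 0, "Silver": 1, "Gold": 2, "Double Gold": 3}
BTI_RANK = {"Recommended": 0, "Highly Recommended": 1,
            "Exceptional": 2, "Superlative": 3}

def cross_critic(medal: str, bti_class: str) -> str:
    """Compare a San Francisco medal with a BTI class for the same label."""
    distance = abs(MEDAL_RANK[medal] - BTI_RANK[bti_class])
    if distance == 0:
        return "perfectly consistent"
    if distance >= 2:
        return "significant inconsistency"
    return "minor disagreement"

# Example: a Double Gold in San Francisco paired with a BTI "Highly Recommended"
# is a two-class gap, so it counts as a significant inconsistency.
print(cross_critic("Double Gold", "Highly Recommended"))
```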

To us at Proof66, as we continue the journey of fine-tuning our algorithm, this suggests paying greater attention not only to the curve within each institution’s scoring but also to the degree of consistency between institutions. Surely, a spirit that earns high praise from both institutions is worthy of high praise indeed.

What this all means to you!

What does this mean to the consumer? We will continue to say that critical scores in general are the best available indication of quality: certainly far superior to price (though price isn’t a bad guide), the fanciness of the bottle, and perhaps even the recommendation of your neighbor.

What does this mean to the distiller or producer?

What does this mean for Proof66? Publicizing and highlighting the results of leading critical institutions will continue to be a passion for us. Analyses like these will, we hope, help maintain the integrity and quality of institutions like the Beverage Testing Institute and competitions like the San Francisco World Spirits Competition so that their findings keep their relevance in the industry. It should also, hopefully, drive more producers to submit more frequently.

By Neal MacDonald, Editor

[Disclosures and notes: we are an independent, limited liability company with no affiliation with the Beverage Testing Institute or any other critical body; our opinions are our own. All scores noted here were compiled from the results made public by the Beverage Testing Institute—while we believe our data are complete and accurate, any errors or omissions are unintentional and ours alone.]


2014-03-02
Published by Proof66.com