Expert Political Judgment: How Good Is It? How Can We Know? | Philip Tetlock | Talks at Google
By Talks at Google
Summary
## Key takeaways

- **Partisanship Trumps Accuracy**: The factors that make pundits charismatic, like passionate partisanship and entertaining sarcasm, are inversely related to empirical accuracy, and draw more blogosphere attention than nuanced probabilities. [00:45], [02:36]
- **Experts Studied Over 20 Years**: Tetlock studied roughly 300 political, economic, and military experts, most with PhDs and an average of 12.2 years' experience, eliciting 30,000 specific forecasts on 50+ nations across domains like the economy and security. [01:54], [09:17]
- **Foxes Outperform Hedgehogs**: Foxes, intellectual opportunists who know many things and score high on cognitive reflection tests, are better-calibrated and more discriminating forecasters than hedgehogs, who cling to one big theory. [26:37], [27:47]
- **Even the Best Experts Are Beaten by Statistics**: The best fox-like forecasters did not outperform simple time-series or statistical models like extrapolation, and only barely beat random dart-throwing monkeys. [28:21], [28:52]
- **Hedgehogs Nail Rare Black Swans**: Hedgehogs make more extreme predictions and are over-represented among those who correctly forecast the Soviet demise, Chinese growth, or the rise of Islamic terrorism, though with many false alarms. [29:57], [30:53]
- **Aggregated Hedgehogs Match Foxes**: Aggregating hedgehogs boosts their accuracy to near the average fox's level, due to their variance, despite individual hedgehogs performing worse. [36:21]
Full Transcript
I'd like to start by asking you to contemplate one of the many early 21st-century social phenomena that Google has helped to facilitate: the blogosphere, particularly the political blogosphere. Which of the following factors do you think would be the more potent determinant of the amount of attention a pundit attracts in the blogosphere? On the one hand, there's the empirical accuracy of the pundit, the degree to which the pundit can attach reasonably nuanced and realistic probabilities to possible futures. On the other hand, there's the degree to which the pundit is passionately partisan and entertainingly sarcastic. OK, it's not even a remotely close call, is it? And that says something interesting. Now, if you're a 19th-century liberal in the spirit of John Stuart Mill, you might say: look, the marketplace of ideas is going to equilibrate; it's going to self-correct in the long run. But of course, in the long run, as Keynes said, we're all dead, so it's a question of how patient you want to be. The major thrust of my talk is that there is great value to be had in keeping score: in systematically monitoring the degree to which pundits and various other self-appointed or otherwise-anointed experts actually do have an empirically realistic view of the world. And that is the segue into what I've done. Over the last 20 years I've been carrying out a series of studies of political, economic, military, and other experts, asking them, essentially, what do you think the future holds? I define the possible futures with sufficient specificity and clarity that we can actually tell who was right or wrong afterward, get the experts to assign subjective probabilities to those futures, and then score them.
So this is essentially a story about two things. One: it is possible to monitor the accuracy of even pretty intellectually slippery characters like political experts. Two: you learn something you would not otherwise have learned; there are some surprises. One of the bigger surprises is that the very factors that make experts charismatic, attractive, and likely to draw attention to themselves in the blogosphere tend to be inversely related to what makes them empirically accurate. So there's a perverse dynamic at work. The reason I wanted to give this talk at Google is that it's a pitch, essentially: not a pitch to sell books, but a pitch that Google could serve a useful public-good function. I think Google has the credibility, the influence, and the visibility to tip the world at least a little bit in the rational direction, in this case by sponsoring competitions that focus on accuracy. Rather than people in the blogosphere competing on how entertaining they can be and how cleverly they can help you reinforce your partisan prejudices, we'd get them to compete on how nuanced and realistic they are about the world, and I think that would, on balance, be a good thing. The research I'm going to talk about today is essentially an illustration that this is possible, not a model for exactly how it would be done in the context of an influential company like Google.
So, the title: foxes, hedgehogs, and dart-throwing monkeys. The dart-throwing monkey is of course a favorite metaphor that The Economist and The Wall Street Journal have used for debunking stock pickers. The question, of course, is whether the stock pickers can do better than dart-throwing monkeys, and in some competitions the pickers do a little bit better, but they don't do a lot better. As for outperforming market averages, forget it: in aggregate, they don't.

My agenda: I'm essentially going to talk about four things. I'm going to describe what I've done in my research. For those of you who looked at the exercises Jennifer circulated, I'm going to assist you with some of the self-assessment exercises: where do you fall on the individual-difference dimensions we assessed in our studies of political and economic forecasters? I'll describe what I found about the forecasters I studied, roughly 300 of them over 20 years. And finally, what more general conclusions should we draw: psychological conclusions about the dynamics of human thought, and perhaps more normative or policy conclusions about how we organize societal decision-making. Incidentally, it's much more interesting, I think, if there's a dialogue, so you shouldn't assume this is going to be a monologue. Feel free to ask questions as they arise; I'll often say things that are not entirely transparent, so raise your hand and I'll try to address them.
OK, what did I do? Essentially, I asked a lot of smart people a lot of questions, and I asked them to assign subjective probabilities to possible futures. Then I scored the accuracy of those judgments. I also did something else: I gauged the willingness of experts to change their minds when they got it wrong. One of the interesting features of a lot of political and economic forecasting is a deep and deeply ingrained reluctance to acknowledge mistakes. You very rarely see it in the blogosphere, and even in academia you don't see it all that often. So we set what we call a Bayesian benchmark: the degree to which people change their minds roughly in accord with Bayesian probability theory.
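To make that benchmark concrete, here is a minimal sketch in Python (my illustration of the idea, not the study's actual procedure, and all the numbers are hypothetical): given the probability an expert assigned to a hypothesis and the likelihoods of the observed outcome under that hypothesis and its rival, Bayes' rule gives the posterior a coherent belief-updater should move to.

```python
# Minimal sketch of a Bayesian updating benchmark (illustrative only;
# not the study's exact procedure).

def bayes_posterior(prior, p_outcome_if_true, p_outcome_if_false):
    """P(hypothesis | observed outcome) via Bayes' rule."""
    numerator = prior * p_outcome_if_true
    return numerator / (numerator + (1 - prior) * p_outcome_if_false)

# Hypothetical example: an expert put 0.7 on "the regime survives,"
# judged the observed outcome 4x likelier if the hypothesis were
# false, and then that outcome happened.
benchmark = bayes_posterior(prior=0.7,
                            p_outcome_if_true=0.2,
                            p_outcome_if_false=0.8)
print(round(benchmark, 2))  # 0.37 -- where a coherent updater should land
# The benchmark compares this to how far the expert actually moved.
```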
[Audience question: bloggers and other opinion writers seem to spend very little time making explicit predictions about the future, so how is it even possible to assess their accuracy?]
You're 100 percent right that it's not a natural activity for most political pundits to articulate expectations about the future that are testable in any scientific or quasi-scientific sense, and that's part of the problem. Part of the problem is pinning people down. What you will frequently see in the blogosphere is people articulating strong, big expectations about the future: if we stay in Iraq, it's going to be a hopeless quagmire; or if we pursue this particular welfare reform bill, we're going to have an increase in child poverty. But those claims are not stated in a way that's directly testable; they're not time-framed and they're not quantified. They lack all the ingredients you would expect to see in, for example, a good performance-management system for employees at a cutting-edge organization. You know the acronym SMART? Has that reached Google? A good performance appraisal system is supposed to be Specific, Measurable, Achievable, Results-oriented, and Time-framed. Well, a good performance appraisal system for political pundits should have roughly the same qualities. So this is an effort to change the ground rules. I think it's a bad thing that we've been content for as long as we have with vague, untestable claims; it helps to entrench a culture of partisanship and blame, and it certainly doesn't facilitate anything remotely resembling organizational learning.
Then, finally, one other thing I did: I wasn't totally remorseless. I did listen to the experts when they complained, when they came back to me and said: look, the criteria you're using for judging accuracy are unfair. For example, an expert who exaggerated the likelihood of finding weapons of mass destruction in Iraq might insist: yes, I was wrong, but I made the right mistake; it was better to overestimate that likelihood than to underestimate it, because Type I and Type II errors are not symmetrical here. Or the expert might say (this defense came up more often a few years ago than now): we haven't found weapons of mass destruction yet, but just wait, we have to be a little bit patient. An expert who predicted the disintegration of Canada a few years ago said: yes, I was wrong, but I was almost right; the second secessionist referendum in Quebec almost succeeded, at 50.1 percent of the vote, well within the margin of sampling error. So we kept running into a lot of resistance to our accuracy criteria, and we tweaked our scoring criteria in a variety of ways to address the objections the experts raised.
OK, participants. Very briefly, there were two major cohorts and many smaller ad hoc groups as well, but the two big cohorts were recruited in 1988-89 and 1992-93. So one big wave of experts was studied before the disintegration of the Soviet Union, when there was still something resembling the Cold War going on; I am indeed that old, actually older than that. All participants made their living writing and thinking about political and economic trends; that was essentially the litmus test for whether you counted as an expert. The experts included a lot of people in academia, some journalists, a fair number of intelligence analysts, and people who worked for international institutions like the World Bank and the IMF. Sixty-four percent had PhDs, virtually all of them had some kind of postgraduate degree, and they had an average of 12.2 years of professional experience. So by virtually all of the superficial criteria for qualifying somebody as an expert, virtually all of them qualified.
Now, the conceptual ingredients of good judgment. What are the minimum things we would need to have in place if, for example, a company like Google were to contemplate seriously the idea of creating competitions that focus on accuracy rather than on entertainment value or reinforcing partisan prejudices? Well, you would need to define possible futures so that they pass certain tests. They need to pass the exclusiveness and exhaustiveness tests in order for subjective probabilities to make sense at all. And then there's something called the clairvoyance test: you have to state the possible future sufficiently precisely that if, say, Jennifer were a genuine clairvoyant and really did have the power of seeing into the future, all she would need to do is focus her attention a little bit, look out into the future, and tell me thumbs up or thumbs down as to whether it happened. She wouldn't have to come back to me and say: what exactly did you mean by "populist backlash in Poland"? What exactly did you mean by... all the vague stuff that you typically observe in the blogosphere and elsewhere.
There needs to be a degree of clarity and precision, to go back to the earlier question, that you don't typically see in the blogosphere. Then you get people to place subjective probabilities on each set of futures. Rather than saying "it's likely" or "it's possible" or what have you, the vague verbal quantifiers of uncertainty people typically use, we were able to induce them to translate those into subjective probabilities. That's a non-trivial thing, learning how to use subjective probability scales. Here's a sample question: central government debt will either hold between 35 and 40 percent of GDP, fall below that range, or rise above it. That would apply to a long list of countries, but it wouldn't have to be debt; it could be predictions about casualties in Iraq, or predictions about where the Nikkei is going to close at the end of this calendar year, and so forth.
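As a concrete illustration of those constraints (my own sketch, not the study's actual instrument), a well-posed question can be encoded so that the outcomes are mutually exclusive and exhaustive and resolution is mechanical, which is the clairvoyance test in data-structure form:

```python
# Sketch of a well-posed forecasting question (hypothetical encoding,
# not the study's instrument). Outcomes are mutually exclusive and
# exhaustive; the text and date are precise enough that a clairvoyant
# could resolve it thumbs-up or thumbs-down without clarification.
from dataclasses import dataclass

@dataclass
class ForecastQuestion:
    text: str
    outcomes: tuple       # mutually exclusive and exhaustive
    resolution_date: str  # fixed time frame
    probabilities: dict   # the expert's subjective probabilities

q = ForecastQuestion(
    text="Central government debt as a share of GDP at year end",
    outcomes=("below 35%", "35% to 40%", "above 40%"),
    resolution_date="1993-12-31",  # hypothetical date
    probabilities={"below 35%": 0.2, "35% to 40%": 0.5, "above 40%": 0.3},
)
assert abs(sum(q.probabilities.values()) - 1.0) < 1e-9  # must sum to 1
```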
To give you a sense of the full range of things we looked at: there were 50 to 59 different nation-states on which we elicited predictions, plus transnational entities like the EU, NATO, and the WTO, across many different domains. There were a lot of predictions on economic performance; a lot on policy priorities, such as how much governments are going to spend and how much they're going to put into defense; state-owned enterprises, or SOEs, and the degree to which privatization is going to proceed in an economy; the degree to which there are going to be leadership changes, or whether the same people are going to stay in power; whether there are going to be cross-border conflicts; and on and on. As for forecasting horizons, we had a number of different ones: shorter-term predictions for the faster-moving variables like stock markets, and longer-term predictions for things that change much more slowly in the world, like changes in borders or changes in nuclear status, whether a country is or is not a nuclear power (although that of course raises its own set of tricky issues about when you qualify a country as a nuclear power).

So this is a rough breakdown of all the forecasts we elicited. There were 30,000 different forecasts, and they fall into these categories: experts making predictions in the domain of their expertise, so you'd have experts on Russia making predictions about Russia; or experts playing the role of dilettante trespassers, so you'd have experts on Russia making predictions about Canada and experts on Canada making predictions about Russia. Shorter- versus longer-term forecasts. Some forecasts falling in the geopolitical zone of stability (Western Europe, North America, Japan), others in the geopolitical zone of turbulence (virtually all of Africa, large parts of Asia). Different domains: government policy, economic performance, national security. Then questions within domains, and finally subjective probability judgments within those categories. So, a total of roughly 30,000 predictions and close to 90,000 subjective probability judgments, because the subjective probability judgments are typically organized into three possibilities, three possible futures for each question: there's going to be more of something, there's going to be less of something, or it's going to stay about the same, with the boundaries defined precisely.
[Audience question, partly inaudible, about disagreements over the past rather than the future.]
You're absolutely right: people in the blogosphere disagree not only about the future; they disagree about the present, and they disagree about the past. A lot of the disagreements about the past hinge on arguments about causation, and those arguments in turn hinge on lots of speculative historical counterfactuals: how history would have unfolded if Reagan hadn't been president, whether the Cold War would have ended just as quickly; or if Clinton hadn't been president, whether the economy would have boomed as much as it did in the '90s. So there are lots of disagreements of a counterfactual character for which it is extremely difficult to have accuracy criteria, because nobody can go back in a time machine, tweak history, and see how it would have unfolded in the alternative universe. It's just not something we can do.
That's true, that's true, and we were not looking at questions of that sort. We were looking at questions that typically had some kind of fairly direct policy relevance; we weren't dealing so much with salacious scandals and the issues of the moment, we were dealing with more policy-wonkish kinds of things. The only part of your question I take a little bit of issue with is the suggestion that people in the blogosphere aren't really making predictions; I would argue they're doing it constantly. That's an interesting question; I haven't done a systematic content analysis of the blogosphere to assess what proportion of their claims are value claims ("this is good or bad in and of itself") as opposed to predictive claims ("this is good or bad because X or Y or Z will happen as a result"). I have not done that, but I have an existential certitude that they're doing a lot of predicting.
And with conditional predictions, that's right: the evaluation becomes counterfactual. We have to go back in our time machine and see how history would have unfolded, and history doesn't give us control groups. That's one of the reasons why arguments over history can be so readily and profoundly politicized. If you want to know whether someone's a liberal or a conservative, for example, just ask them how history would have unfolded in the 1980s without Ronald Reagan, whether the Cold War would have ended in roughly the same way it did. If someone's a conservative, they know with existential certainty that the Cold War would not have ended that way, and that if there had been a weak liberal Democrat in power in 1981, the Soviet Union would in all likelihood still be with us today; they believe Ronald Reagan's policies played the key role in precipitating the collapse of the Soviet Union. Whereas for a lot of liberals that's just total nonsense; they believe Gorbachev was the result of an internal evolution in the Soviet polity, and that those trends would have unfolded pretty much regardless of who was in the American White House.

[Audience question: can we see how individual named experts did?] Absolutely not; the results are not broken down by individuals. One of the conditions for getting cooperation here was anonymity. I told people right up front (and it's in the book as well) that the point of this is not to identify winners and losers; the point is to test some rather abstract psychological and political hypotheses about how styles of reasoning are related to greater or lesser empirical accuracy.

[Audience question: did you track ideology?] Yes, we did, and it doesn't really matter that much in aggregate whether you're a liberal or a conservative. There are certainly issues where liberals are more accurate and issues where conservatives are more accurate, but in aggregate, across the board, ideology is not a very good predictor. I'll say in a minute what is; there's a long list of things that are not good predictors.
OK, probability scoring. This is very, very simple stuff. It's a method of assessing the accuracy of judgments that was developed originally by meteorologists, and one of the reasons meteorologists are among the best-calibrated professionals ever studied by psychologists is that they got in the habit, about 25 or 30 years ago, of testing themselves, of seeing how well calibrated they are: making specific subjective probability judgments about possible futures (precipitation and rainfall and all the things meteorologists are interested in) and getting quick, clear feedback on whether they were right or wrong. That process is one by which people become well calibrated; it just doesn't occur in the political realm. The basic idea is that when an event occurs, you score it as a one; when an event doesn't occur, you score it as a zero; and you measure the deviations of your probabilities from those outcomes. If you assigned a high probability to something that occurs, your probability is close to one, and that reduces the score; if you assigned low probabilities to things that don't occur, again, that reduces your probability score. Low scores are good.
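What he is describing is, as he confirms later in the talk, a quadratic scoring rule, essentially the Brier score from meteorology. A minimal sketch, with made-up forecasts:

```python
# Quadratic (Brier) probability score: mean squared gap between the
# forecast probabilities and the outcomes coded 1 (occurred) or
# 0 (did not occur). Lower is better; 0 is omniscience.

def brier_score(probs, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

probs    = [0.9, 0.7, 0.2, 0.1]  # hypothetical forecasts
outcomes = [1,   1,   0,   0]    # what actually happened
print(round(brier_score(probs, outcomes), 4))  # 0.0375
```

Confident forecasts that come true drive the score toward zero; confident forecasts that fail blow it up, which is why the rule rewards both calibration and discrimination.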
Now, there are lots of manipulations, fancy or not so fancy depending on your point of view, that you can apply to this formula, and various scores you can derive from it: scores for the amount people know about the world, and scores for calibration and discrimination, which is what I'm going to talk about now.
Calibration and discrimination are the two major properties of good judgment that I'm going to address today. What does it mean to be well calibrated? It means you're on this line here: the diagonal represents perfect calibration. That is, there's a perfect correspondence between the subjective probabilities you assign to events and the frequency with which events assigned those probabilities occur; all the things you assigned a 0.2 probability to occur 20 percent of the time, and so on. Now, in the case of this particular forecaster, you could say this forecaster is a bit of a fence-sitter: a forecaster who never really says anything other than varying shades of maybe. Their probability assessments never go below 0.4 and never rise above 0.6. They're perfectly calibrated, so this would be an example of someone who is very well calibrated but isn't very discriminating; they're not doing a very good job of attaching higher probabilities to things that happen than to things that don't happen. The next example is still excellent calibration, but with pretty good discrimination, because now the forecaster is using a much wider range of the probability scale. And finally, this is what God looks like in the subjective-probability-scoring universe; this is omniscience: whenever you say there's a zero probability of something occurring, it never occurs, and whenever you say there's a 1.0 probability of something occurring, it always occurs.
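One standard way to compute the two properties is to bin forecasts by stated probability and compare each bin's observed frequency both to the stated probability (calibration) and to the overall base rate (discrimination). This is a sketch of that decomposition, on made-up data; the study's exact formulas may differ:

```python
# Sketch of calibration and discrimination via probability bins
# (standard decomposition of the quadratic score; the study's exact
# formulas may differ). Calibration: lower is better.
# Discrimination: higher is better.
from collections import defaultdict

def calibration_and_discrimination(probs, outcomes):
    bins = defaultdict(list)
    for p, o in zip(probs, outcomes):
        bins[round(p, 1)].append(o)  # group by stated probability
    n = len(probs)
    base_rate = sum(outcomes) / n
    calib = sum(len(v) * (p - sum(v) / len(v)) ** 2
                for p, v in bins.items()) / n
    discrim = sum(len(v) * (sum(v) / len(v) - base_rate) ** 2
                  for v in bins.values()) / n
    return calib, discrim

# Hypothetical forecaster who uses a wide range of the scale:
probs    = [0.9, 0.9, 0.1, 0.1, 0.5, 0.5]
outcomes = [1,   1,   0,   0,   1,   0]
print(calibration_and_discrimination(probs, outcomes))
```

The fence-sitter in the chart scores well on the first number and poorly on the second; omniscience drives calibration error to zero while maximizing discrimination.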
[Audience question, partly inaudible: isn't there a tension between calibration and discrimination, depending on where you set your thresholds?] It's difficult, but it's possible to manage, depending on where you set your thresholds; and yes, there is a tension there. In practice there's a bit of a trade-off between calibration and discrimination, so the people who are best calibrated do tend to be a bit like fence-sitters; they're more cautious. It's relatively rare to have forecasters who achieve both very good calibration and very good discrimination; nobody really approaches God in this data set.

Now, Jennifer circulated some exercises that are somewhat similar, at least in principle, to the exercises we asked our forecasters to do.
There was a 50-item quiz that asked people questions like: Mandarin is the world's most widely spoken language, true or false? You answer on a scale from 0.5, coin-toss confidence, to 1.0, absolute existential certainty that you're right. Or: in a mortar and pestle, the pestle is the bowl which holds the material, true or false? And it goes on like that. These are pretty tricky questions; it's difficult for people to get much more than 55 or 60 percent right, and typically people are about 20 percent overconfident on an instrument like this. But it's one of the things we used. How many of you actually did that exercise? Or maybe this is all alien to you. Here's what I would offer as a blurb for the exercise Jennifer circulated: if you're interested in how well calibrated or discriminating you are, go ahead and do it, and then we can talk a little bit about what mathematical operations you'd have to perform to compute your scores. I think it's an instructive exercise, and it's a good mental habit to get into. So that's one thing the self-assessment exercise does: it assesses your calibration and your discriminating power.
There's this other thing called the cognitive reflection test. I understand Danny Kahneman was here and gave you the bat-and-ball question; how many of you remember it? The bat and the ball together cost a dollar ten; the bat costs a dollar more than the ball; how much does the ball cost? It's an example of a very clever test, the cognitive reflection test, in which, no matter how smart you are, your first reaction is going to be wrong. Almost everybody gets it wrong at first; the question is whether or not you rein in that first reaction. The people who get it right are the people who reined in their first reaction and went about it analytically. If you did it the Gladwell way, if you blinked, you got it wrong, simple as that.
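(The talk never states the answer, so for completeness, the standard solution: the intuitive answer, ten cents, fails, because then the bat would cost $1.10 and the pair $1.20. Analytically, let b be the ball's price; then b + (b + 1.00) = 1.10, so 2b = 0.10 and b = 0.05. The ball costs five cents.)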
Actually, the Kahneman-Gladwell juxtaposition is an interesting one (I gather Gladwell has also talked here), because Danny Kahneman is a "think" guy, he believes in thinking, whereas Gladwell is of course much more an advocate of blink. They represent quite different perspectives, I think, on what ails human judgment, and I'm much more toward the Kahneman end of that scale. I think a lot of intuitive judgment gets us into serious trouble, especially when you're dealing with very complex societal issues.
Another thing Jennifer circulated asks whether you think of yourself as a hedgehog or a fox. How many of you have ever heard the hedgehog-fox metaphor? "The fox knows many things, but the hedgehog knows one big thing." It comes from a fragment of poetry by the Greek poet Archilochus, 2,500 years ago. What does it mean? Well, in terms of our measurement instruments, foxes tend to be intellectual opportunists: they don't fall in love with ideas, they're not very ideological, they're very pragmatic, they like irony. Whereas hedgehogs are people who fall in love with big organizing principles that impose order on the world; the hedgehog knows one big thing. A very flattering view of a hedgehog would be Einstein, who stuck with his worldview even when it was being undercut by quantum mechanics; he really had a firm belief that God doesn't play dice with the cosmos. He was a very principled, and of course brilliant, hedgehog. So that's another exercise you can engage in, if you want to classify yourself as a fox or a hedgehog.
There's good news and bad news for hedgehogs and foxes, and for people who score high and low on these various tests. It turns out that if you classify yourself as a fox who is wary of master theories (the fox knows many things), and you did really well on the cognitive reflection test, with its bat-and-ball kinds of questions, you're going to resemble the best forecasters in this sample. People who are not very ideological, who keep a certain ironic distance from events, and who analytically rein in their first reactions do better. That's the good news for the analytical foxes. The bad news for them is that even the very best forecasters in our sample weren't very good. None of the best forecasters did better than time-series or statistical models, and some of them did better than the monkey, but it was hard for them to beat even the crudest kinds of statistical models, simple extrapolation algorithms like "whatever happened last time, predict it again" or "whatever is happening now, predict more of the same." Even those very crude models were hard for the experts to beat.
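Those crude baselines are easy to state exactly. Here is a sketch of the two simplest, my reconstruction of "predict more of the same" and "predict the most recent rate of change," on hypothetical data:

```python
# Two mindless extrapolation baselines of the kind the experts
# struggled to beat (my reconstruction of the talk's description).

def persistence_forecast(series):
    """'Whatever is happening now, predict more of the same.'"""
    return series[-1]

def trend_forecast(series):
    """Extrapolate the most recent rate of change."""
    return series[-1] + (series[-1] - series[-2])

growth = [2.1, 2.4, 2.6]  # hypothetical recent annual values
print(persistence_forecast(growth))      # 2.6
print(round(trend_forecast(growth), 1))  # 2.8
```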
So that's the good news and the bad news for people on the fox end of the scale. Now the bad news and the good news for people on the hedgehog end. Again, the hedgehogs are the people with a fondness for master theories, and they are the people who resemble the worst forecasters in my sample. If you have a cognitive style like Einstein's, to put it in an ironic way, it's a bad sign for your ability to predict things in a messy, complicated social world of the sort we have here. One of the trump values of science is parsimony. Hedgehogs treat parsimony as a trump value, and people who treat parsimony as a trump value get into lots of trouble in these kinds of exercises; their subjective probability estimates are substantially further off than those of people who view the world as much more of an exercise in ad-hocery. But there is some good news for the hedgehogs, even in my data: the hedgehogs had lower batting averages, for sure, but they were over-represented among the grand-slam home-run hitters. They were over-represented among the people who predicted the demise of the Soviet Union, over-represented among the people who predicted the phenomenal Chinese growth rates of the last 15 years, and over-represented among the people who predicted the rise of Islamic terrorism. Whenever something out of the blue happens, there is usually a hedgehog standing pretty close by, ready to claim plausible credit for anticipating it. Ah, that triggered lots of questions. In the back?
[Audience question about what kinds of hedgehogs these are.] There are hedgehogs all over the map: Marxist hedgehogs, libertarian hedgehogs, boomster hedgehogs, doomster hedgehogs. They're all over the map in the various predictions they make. They make more extreme predictions, so when something extreme happens, there's usually a hedgehog close by. Now, what does that mean for the dynamics of the blogosphere? It means the path to fame, of course, is not assigning realistic probabilities to things that are just marginally different from the status quo; that's not going to get you very far if you want to become famous. What's going to make you famous is predicting the demise of the Soviet Union three or four years in advance, or being out front predicting Chinese growth rates, or being out front predicting the rise of Islamic terrorism.
[Audience comment about parallels to finance.] Yes, I think it has a number of interesting parallels to finance, and indeed some of our dependent variables are financial in nature, drawn from financial markets. [Audience question: do hedgehogs lose on calibration or on discrimination?] Hedgehogs lose on both. But it's a good question, and it's somewhat in the spirit of the argument up to this point. Here's what you see with calibration; here, higher scores are bad, because higher scores mean larger gaps between your subjective probabilities and reality. The largest calibration scores show up in what is statistically called a third-order interaction: when you have people with relatively extreme theoretical or political views, who are hedgehogs, who have low scores on the cognitive reflection test, and who are making long-term predictions, the combination of all those things produces the largest gaps from reality, by a significant margin.
[Audience question about the scoring rule.] Yes, that's exactly right: it's a quadratic scoring rule. Here you see calibration and discrimination plotted together: along the x-axis is one minus your calibration score, and along the y-axis is improving discrimination. This goes to the question of whether the hedgehogs, having lost on calibration, maybe made up for it on discrimination. What you see is foxes making short-term predictions and foxes making long-term predictions, FST and FLT, those two blobs there, sitting at what is the maximum-performance frontier, the best calibration-discrimination combinations for human beings. The time-series models are off the chart; they're above what can be represented here. The simple extrapolation models, the points at 0.35 and 0.36, represent mindless algorithms like "predict more of the same" or "predict the most recent rate of change." Hedgehogs making short-term predictions and hedgehogs making long-term predictions are worse on calibration, and you can see that by and large they're also somewhat worse on discrimination. Sometimes they tie on discrimination, but by and large they're worse there as well; the disadvantage on discrimination is not as great, but it's still there.
[Audience question: what exactly does discrimination measure?] The discrimination measure is essentially a measure of whether you're assigning much higher probabilities to things that occur than to things that don't occur; it's like the variance of the hit rate across probability categories. And this is what it looks like when you look at the full subjective probability scale from 0 to 1. To the extent that your calibration curve sticks very close to the diagonal, you're perfectly calibrated, and you can see that the forecasters who stray farthest from the diagonal are hedgehogs making long-term predictions in the domain of their expertise. It's the same story, just shown along the full scale.
So essentially there are three big risk factors in subjective probability forecasting. One: you're too quick to make up your mind, that is, you get low scores on bat-and-ball kinds of tests. Two: you're too slow to change your mind, and this goes back to the Bayesian benchmarks for whether you change your mind as much as you should when you discover what actually happened. And three, more generally: you fall too passionately in love with your pet theories, which is gauged by whether you're an ideological or theoretical extremist. So the deeper trade-off: if you want reasonably good calibration and reasonably good discrimination over large numbers of predictions in relatively stable environments, you're best off going to the analytical foxes. But if you want creative contingency planning for sharp breakpoints in history, you should seek a more ideologically balanced portfolio of hedgehogs. You're going to want boomster hedgehogs and doomster hedgehogs; you're going to want realist and institutionalist hedgehogs; you're going to want a wide range. But if you're going to have all these hedgehogs around, you're going to have to have a lot of tolerance for false alarms.
And now we can just talk. [Audience question: did you try aggregating forecasters' judgments?] Great question. Absolutely, we did, and as you would expect from the fact that the hedgehogs have greater variance, hedgehogs benefit more from aggregation than foxes do. In fact, when you aggregate all the hedgehogs together, the accuracy of the aggregated hedgehog is not all that different from the accuracy of the average fox, even though the individual hedgehogs are very inferior to the individual foxes. So you do get that aggregation paradox at work here, a little bit. We don't get a big Surowiecki-style wisdom-of-crowds effect, but we get a little bit of one.
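A toy simulation of that variance effect, my illustration of the mechanism rather than the study's analysis: individually noisy, extreme forecasters whose errors point in different directions partially cancel when averaged.

```python
# Toy illustration of why aggregation helps high-variance forecasters
# (a sketch of the mechanism, not the study's analysis; all parameters
# are made up).
import random

random.seed(0)
truth = 0.3  # hypothetical true frequency of the event

clip = lambda x: min(1.0, max(0.0, x))
hedgehogs = [clip(random.gauss(truth, 0.35)) for _ in range(50)]  # extreme
foxes     = [clip(random.gauss(truth, 0.10)) for _ in range(50)]  # cautious

mean = lambda xs: sum(xs) / len(xs)
mean_abs_err = lambda fs: mean([abs(f - truth) for f in fs])

print("avg individual hedgehog error:", round(mean_abs_err(hedgehogs), 3))
print("avg individual fox error:     ", round(mean_abs_err(foxes), 3))
print("aggregated hedgehog error:    ", round(abs(mean(hedgehogs) - truth), 3))
print("aggregated fox error:         ", round(abs(mean(foxes) - truth), 3))
```

Under these assumed parameters, the averaged hedgehog forecast lands far closer to the truth than the typical individual hedgehog, approaching fox territory, which is the pattern described above.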
This is a long and tedious list of null-hypothesis results. The fox-hedgehog scale and the cognitive reflection test were the two most powerful individual-difference predictors of how well people did in these exercises. Whether someone is liberal or conservative; whether they're in academia, government, or journalism; how old they are; how prestigious the university they went to is; whether they have a PhD or not: there's a long list of things that don't matter. [Audience question, partly inaudible, about well-known opinionated experts.] There were certainly some opinionated people among them, including some who are well known today.
[Audience question about prediction markets.] Yes, I think there's a huge amount of value in prediction markets; in fact, I'm old enough to have a son who's an assistant professor of finance, and he works on prediction markets, so I'm very sympathetic to the idea. From my psychological point of view, prediction markets work by forcing people to be more self-critical and fox-like, because prediction markets have this relentless second-guessing dynamic: there's always somebody ready to pounce and capitalize on your stupidity, and it forces you to engage in a much more thorough kind of introspection than you normally would. So prediction markets make people smarter, I think, than they otherwise would be, and I think that's quite consistent with these results. I don't think prediction markets serve the kind of public-good function I'm talking about here, though.
[Audience suggestion, partly inaudible: you could run prediction markets on the claims experts publish, so you could see how individual pundits perform in those markets.]
Very interesting, clearly, very interesting. The dynamic I had in mind was one in which a powerful institution like Google could tip the incentives just a little. I think the incentives are so skewed right now toward being entertaining and toward being partisan and shrill that even Google is not going to be able to produce a radical shift. But I think it could educate, if it created systematic competitions in which the participating blogs could be ranked by the accuracy of their performance. And it wouldn't all have to be politics; obviously there's a lot of this in finance and economics as well as politics, but it could be sports or any other domain.
[Audience comment: financial magazines have for years had experts write down predictions at the end of the year, and it doesn't really seem to have made much of a dent, because people are also looking for entertainment.] Yeah, absolutely. And index funds are boring. [Further exchange, partly inaudible, about boring-but-accurate advice being disregarded.]
Have you ever been visited by the former chairman of Vanguard, John Bogle? I think he's bald now, but he may have torn out his hair over the years over exactly this phenomenon: how slow people were to realize that even though they were paying large transaction costs for these individual advisors, they weren't getting much on average for their money. It's an interesting question. Index funds have grown a lot in popularity, so it's not clear there's been no effect from this kind of score-keeping; I think there probably has been. Has it put the big individual advisors out of business? Clearly not; they still make very attractive livings. But has there been a palpable shift? I don't know the answer to that question.
[Audience comment: there's a potential paradox there.] Yes, as I understand it, there is a potential paradox: the ideal solution would be for the good judge to move into index funds and let other people do all the hard cognitive work of making the markets efficient; it's a free-rider kind of problem. At the very back? [Audience question: are hedgehogs born or made?] Ah, that's a wonderful question to ask of psychologists like me.
That's the old nature-nurture question, and the answer is both. There is some evidence that cognitive style is heritable and that people vary quite systematically in their tolerance for ambiguity; some people really prefer sharp logical clarity and really dislike ambiguity from a pretty early age, and identical-twin studies and so forth suggest there is some genetic component to it. On the other hand, I think it can be learned too, so it's a mixture. If you want me to estimate the relative effect sizes, I don't think psychology is quite that precise a science, but it would probably be 30 percent and 30 percent, and that adds up to only 60 percent because the other 40 percent is measurement noise. Psychology is very, very messy.
[Audience question: are there issues where liberals or conservatives do better?] Oh, sure, sure, there are lots of topics. Liberals clearly have done better than conservatives on the war in Iraq. Liberals did better than conservatives in predicting that major liberalization could occur in the Soviet Union; a lot of conservatives were on record in 1984 maintaining that that was extremely unlikely, that the Soviet Union was an infallibly self-reproducing totalitarian system and it wouldn't happen. Jeane Kirkpatrick and Richard Pipes and people like that were very, very clear on that point. But the conservatives won later on, because the conservatives argued: if Gorbachev is really serious, if he's actually liberalizing a system that has no legitimacy, then the system has to fall apart; whereas many liberals felt Gorbachev could succeed in sustaining the Soviet Union. So conservatives came out ahead on that one. Conservatives have also done somewhat better on the welfare reform bill of 1996. There's a lot of give and take. You know the Andy Warhol line, everybody gets their 15 minutes of fame? Well, that's true here too: every point of view, I think, gets some portion of it.
[Audience question about being in power versus in opposition.] Yes, yes, there is pressure to become more fox-like when you're in power; responsibility and accountability tend to make you more fox-like. When you're on the outside looking in, it's easy to be demagogic. [Audience question, inaudible.] Not really, no; they come young and old. Yes, I think Jennifer circulated that one too, another of the exercises you could do if you're interested.
[Audience question: can the same person be a hedgehog in one domain and a fox in another?] Yes, that's certainly possible, absolutely. There certainly are some people who are extreme across the board, but for many people it's more of a checkerboard: there are some domains of life where they're more tolerant of ambiguity and other domains where they're less so. There's certainly variance. When I talk about individual differences, I'm talking about an aggregation across domains. There certainly is within-person variance, and it's not just noise; some of it is systematic variance across issues, and in fact there are models in psychology for that. [Audience question, partly inaudible, about whether two such measures would coincide.] They would be close, but maybe not identical.
[Audience question about experts who answer 0 or 1.] Probabilities of 0 and 1 are very problematic from a Bayesian point of view, because in principle you're then never supposed to change your mind. It's like an affirmation of absolute religious faith: what would it take to convince Osama bin Laden he's wrong about Allah? It's just unthinkable. So we had to do a little statistical adjustment; we moved the scores off 0 and 1 to make certain computations work.
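The dead-end nature of 0 and 1 is easy to demonstrate with Bayes' rule; a sketch, using the same illustrative formula as the earlier benchmark example:

```python
# Why 0 and 1 are unrevisable for a Bayesian (sketch): with a prior of
# 0 or 1, Bayes' rule returns the same certainty no matter how strong
# the contrary evidence is.

def posterior(prior, lik_if_true, lik_if_false):
    return prior * lik_if_true / (prior * lik_if_true + (1 - prior) * lik_if_false)

print(posterior(0.0, 0.99, 0.01))  # 0.0 -- certainty never updates
print(posterior(1.0, 0.01, 0.99))  # 1.0 -- ditto, even against 99:1 evidence
```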
But hedgehogs, interestingly, are significantly more likely to use zeros and ones on the probability scale. When I say they're more extreme, they're more extreme by a factor of about two: they're basically saying that something is absolutely certain or absolutely impossible. [Audience exchange, partly inaudible, about making experts bet actual money.] Well, that's really the prediction-market issue again, right.
[Audience question: in your survey, only their pride is at stake, and they're all anonymous, right?] Yes, which makes some of the defenses they invoked later somewhat unusual. For example, "I made the right mistake" is an unusual defense: it might make sense if you're offering advice to a policymaker at a given moment, but when you're talking in an anonymous interview, what it suggests is that people conflate probability judgments and values quite routinely. There's actually a lot of psychological evidence for that: when people attach a probability judgment to something, they're not just making a probability estimate, they're infusing it with a lot of evaluative significance, and people have a hard time teasing those things apart. That's one other reason these calibration exercises are useful: you get in the mental habit of being more thoughtful about how you quantify uncertainty. OK, any more questions?
[Host:] I will go ahead and circulate all the exercises that Phil kept referencing; I will send them to Ms. Andrew so you can filter them out to dev and all your friends. It doesn't...