This is a linkpost for https://markxu.com/strong-evidence

Portions of this are taken directly from Three Things I've Learned About Bayes' Rule.

One time, someone asked me what my name was. I said, “Mark Xu.” Afterward, they probably believed my name was “Mark Xu.” I’m guessing they would have happily accepted a bet at 20:1 odds that my driver’s license would say “Mark Xu” on it.

The prior odds that someone’s name is “Mark Xu” are generously 1:1,000,000. Posterior odds of 20:1 implies that the odds ratio of me saying “Mark Xu” is 20,000,000:1, or roughly 24 bits of evidence. That’s a lot of evidence.
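For concreteness, here is a minimal check of that arithmetic (the 1:1,000,000 prior and 20:1 posterior are the figures stated above):

```python
from math import log2

prior_odds = 1 / 1_000_000   # generous prior odds that a random person is named "Mark Xu"
posterior_odds = 20 / 1      # odds implied by accepting a 20:1 bet

likelihood_ratio = posterior_odds / prior_odds   # posterior odds = prior odds * likelihood ratio
print(f"{likelihood_ratio:,.0f}")        # 20,000,000
print(f"{log2(likelihood_ratio):.1f}")   # ~24.3 bits of evidence
```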

Seeing a Wikipedia page say “X is the capital of Y” is tremendous evidence that X is the capital of Y. Someone telling you “I can juggle” is massive evidence that they can juggle. Putting an expression into Mathematica and getting Z is enormous evidence that the expression evaluates to Z. Vast odds ratios lurk behind many encounters.

One implication of the Efficient Market Hypothesis (EMH) is that it is difficult to make money on the stock market. Generously, maybe only the top 1% of traders will be profitable. How difficult is it to get into the top 1% of traders? To be 50% sure you're in the top 1%, you only need roughly 100:1 evidence (enough to overcome 1:99 prior odds). This seemingly large odds ratio might be easy to get.

On average, people are overconfident, but 12% aren't. It only takes 50:1 evidence to conclude you are much less overconfident than average. An hour or so of calibration training and the resulting calibration plots might be enough.
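A minimal sketch of the odds arithmetic behind the trader and calibration examples above, using the figures stated in those two paragraphs:

```python
def posterior_prob(prior_prob, likelihood_ratio):
    """Update a prior probability by a likelihood ratio, via the odds form of Bayes' rule."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Trader example: a 1-in-100 prior of being in the top 1% reaches 50% with ~100:1 evidence.
print(round(posterior_prob(0.01, 99), 3))   # 0.5

# Calibration example: a 12% prior of not being overconfident, updated on 50:1 evidence.
print(round(posterior_prob(0.12, 50), 3))   # ~0.872
```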

Running through Bayes’ Rule explicitly might produce a bias towards middling values. Extraordinary claims require extraordinary evidence, but extraordinary evidence might be more common than you think.

Comments

I think in the real world there are many situations where (if we were to put explicit Bayesian probabilities on such beliefs, which we almost never do) beliefs with ex ante ~0 credence quickly get extraordinary updates. My favorite example is sense perception. If I woke up after sleeping on a bus and were to put explicit Bayesian probabilities on what I anticipate seeing the next time I open my eyes, then the credence I'd assign to the true outcome (ignoring practical constraints like computation and my near inability to have any visual imagery) would be ~0. Yet it's easy to get strong Bayesian updates: I just open my eyes. In most cases, this should be a large enough update, and I go on my merry way.

But suppose I open my eyes and instead see people who are approximate lookalikes of dead US presidents sitting around the bus. Then at that point (even though the ex ante probability of this outcome and that of a specific other thing I saw isn't much different), I will correctly be surprised, and have some reasons to doubt my sense perception.

Likewise, if instead of saying your name is Mark Xu, you instead said "Lee Kuan Yew", I at least would be pretty suspicious that your actual name is Lee Kuan Yew.

I think a lot of this confusion in intuitions can be resolved by looking at what MacAskill calls the difference between unlikelihood and fishiness:

Lots of things are a priori extremely unlikely yet we should have high credence in them: for example, the chance that you just dealt this particular (random-seeming) sequence of cards from a well-shuffled deck of 52 cards is 1 in 52! ≈ 1 in 10^68, yet you should often have high credence in claims of that form.  But the claim that we’re at an extremely special time is also fishy. That is, it’s more like the claim that you just dealt a deck of cards in perfect order (2 to Ace of clubs, then 2 to Ace of diamonds, etc) from a well-shuffled deck of cards. 

Being fishy is different than just being unlikely. The difference between unlikelihood and fishiness is the availability of alternative, not wildly improbable, hypotheses on which the outcome or evidence is reasonably likely. If I deal the random-seeming sequence of cards, I don't have reason to question my assumption that the deck was shuffled, because there's no alternative background assumption on which the random-seeming sequence is a likely occurrence. If, however, I deal the deck of cards in perfect order, I do have reason to significantly update that the deck was not in fact shuffled, because the probability of getting cards in perfect order if the cards were not shuffled is reasonably high. That is: P(cards not shuffled)P(cards in perfect order | cards not shuffled) >> P(cards shuffled)P(cards in perfect order | cards shuffled), even if my prior credence was that P(cards shuffled) > P(cards not shuffled), so I should update towards the cards having not been shuffled.

Put another way, we can dissolve this by looking explicitly at Bayes' theorem:

P(H|E) = P(E|H)P(H) / P(E)

and in turn,

P(E) = P(E|H)P(H) + P(E|¬H)P(¬H).

P(E|H) is high in both the "fishy" and "non-fishy" regimes. However, P(E|¬H)P(¬H) is much higher for fishy hypotheses than for non-fishy hypotheses, even if the surface-level evidence looks similar!
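A toy numerical illustration of that decomposition; all of the specific probabilities below are assumptions chosen for illustration, not figures from the post or this comment:

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """P(H | E) computed with the two-hypothesis decomposition of P(E)."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

# Non-fishy: a stranger says "Mark Xu". There is no salient alternative hypothesis on
# which saying that unremarkable name is likely, so P(E | not-H) is assumed to be tiny.
print(posterior(p_h=1e-6, p_e_given_h=0.99, p_e_given_not_h=1e-8))   # ~0.99

# Fishy: a stranger says "Lee Kuan Yew". Alternatives like "they are joking or lying"
# make the same surface evidence far more likely, so P(E | not-H) is assumed much larger.
print(posterior(p_h=1e-6, p_e_given_h=0.99, p_e_given_not_h=1e-4))   # ~0.01
```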

More Facebook discussion of this post:

___________________________

Satvik Beri:  I think Bayes' Theorem is extremely hard to apply usefully, to the point that I rarely use it at all despite working in data science.

A major problem that leads people to be underconfident is the temptation to round down evidence to reasonable odds, like the post mentions. A major problem that leads people to be overconfident is applying lots of small pieces of information while discounting the correlations between them.

A comment [on LessWrong] mentions that if you have excellent returns for a year, that's strong evidence you're a top 1% trader. That's not really true: the market tends to move in regimes for long periods of time, so a strategy that works well for a year is pretty likely to have average performance the next year. Studies on hedge fund managers have found it is extremely difficult to find consistent outperformers; e.g., 5-year performance on pretty much any metric is uncorrelated with performance on that metric the next year.

I didn’t say anything about what size/duration of returns would make you a top 1% trader.

Facebook discussion of this post:

___________________________

Duncan Sabien:  This is ... not a clean argument. Haven't read the full post, but I feel the feeling of someone trying to do sleight-of-hand on me.

[Added by Duncan: "my apologies for not being able to devote more time to clarity and constructivity.  Mark Xu is good people in my experience."]

Rob Bensinger:  Isn't 'my prior odds were x, my posterior odds were y, therefore my evidence strength must be z' already good enough?

Are you worried that the person might not actually have a posterior that extreme? Like, if they actually took 21 bets like that they'd get more than 1 of them wrong?

Guy Srinivasan:  I feel like "fight! fight!" except with the word "unpack!"

Duncan Sabien:  > The prior odds that someone’s name is 'Mark Xu' are generously 1:1,000,000. Posterior odds of 20:1 implies that the odds ratio of me saying 'Mark Xu' is 20,000,000:1, or roughly 24 bits of evidence. That’s a lot of evidence.

This is beyond "spherical frictionless cows" and into disingenuous adversarial levels of oversimplification. I'm having a hard time clarifying what's sending up red flags here, except to say "the claim that his mere assertion provided 24 bits of evidence is false, and saying it in this oddly specific and confident way will cow less literate reasoners into just believing him, and I feel gross."

Guy Srinivasan:  Could it be that there's a smuggled intuition here that we're trying to distinguish between names in a good faith world, and that the bad faith hypothesis is important in ways that "the name might be John" isn't, and that just rounding it off to bits of evidence makes it seem like the extra 0.1 bits "maybe this exchange is bad faith" are small in comparison when actually they are the most important bits to gain?

(the above is not math)

Marcello Herreshoff:  I share Duncan's intuition that there's a sleight of hand happening here. Here's my candidate for where the sleight of hand might live:

Vast odds ratios do lurk behind many encounters, but specifically, they show up much more often in situations that raise an improbable hypothesis to consideration worthiness (as in Mark Xu's first set of examples) than in the situation where they raise consideration worthy hypotheses to very high levels of certainty (as in Mark Xu's second set of examples.)

Put another way, how correlated your available observations are to some variable puts a ceiling on how certain you're ever allowed to get about that variable. So we should often expect the last mile of updates in favor of a hypothesis to be much harder to obtain than the first mile.

Ronny Fernandez:  @Duncan Sabien   So is the prior higher or is the posterior lower?

Chana Messinger:  I wonder if this is similar to my confusion at whether expected conservation of evidence is violated if you have a really good experiment that would give you strong evidence for A if it comes out one way and strong evidence for B if it comes out the other way.
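One way to see why such an experiment doesn't violate conservation of expected evidence is that the probability-weighted average of the possible posteriors still equals the prior. A minimal sketch with assumed numbers:

```python
# Assumed numbers for illustration: prior P(A) = 0.5, and an experiment that is
# strongly informative whichever way it comes out.
p_a = 0.5
p_pos_given_a, p_pos_given_not_a = 0.95, 0.05   # likelihood of a "positive" result

p_pos = p_pos_given_a * p_a + p_pos_given_not_a * (1 - p_a)
post_if_pos = p_pos_given_a * p_a / p_pos                 # strong evidence for A
post_if_neg = (1 - p_pos_given_a) * p_a / (1 - p_pos)     # strong evidence against A

print(post_if_pos, post_if_neg)                           # ~0.95 and ~0.05
print(p_pos * post_if_pos + (1 - p_pos) * post_if_neg)    # ~0.5, equal to the prior
```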

Ronny Fernandez:  @Marcello Mathias Herreshoff I don’t think I actually understand the last paragraph in your explanation. Feel like elaborating?

Marcello Herreshoff:  Consider the driver's license example. If we suppose 1/1000 of people are identity thieves carrying perfect driver's license forgeries (of randomly selected victims), then there is absolutely nothing you can do (using drivers licenses alone) to get your level of certainty that the person you're talking to is Mark Xu above 99.9%, because the evidence you can access can't separate the real Mark Xu from a potential impersonator. That's the flavor of effect the first sentence of the last paragraph was trying to point at.
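A simplified numerical version of that scenario (assuming a perfect forgery looks identical to the real thing, so the license itself provides no further discrimination):

```python
# Prior split among people presenting a license that says "Mark Xu":
p_impersonator = 1 / 1000    # identity thieves carrying perfect forgeries
p_genuine = 1 - p_impersonator

# A perfect forgery looks exactly like a real license, so the observation
# "license says Mark Xu" is equally likely under both hypotheses.
p_license_given_genuine = 1.0
p_license_given_impersonator = 1.0

posterior_genuine = (p_license_given_genuine * p_genuine) / (
    p_license_given_genuine * p_genuine + p_license_given_impersonator * p_impersonator
)
print(posterior_genuine)   # 0.999 -- the license alone can never push you past 99.9%
```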

I’m guessing they would have happily accepted a bet at 20:1 odds that my driver’s license would say “Mark Xu” on it

Pretty minor point, but personally there are many situations where I'd be happy to accept the other side of that bet for many (most?) people named Mark Xu, if the only information I and the other person had was someone saying "Hi, I'm Mark Xu."

More Facebook discussion of this post:

___________________________

Ronny Fernandez:  I think maybe what’s actually going on here is that extraordinary claims usually have much lower prior prob than 10^-6

Genuinely extraordinary claims, not claims that seem weird
