Please show lots of digits

Oct 2024

Hi there. It’s me, the person who stares very hard at the numbers in the papers you write. I’ve brought you here today to ask a favor.

Say you wrote something like this:

There were 446 students tested. The left-handed students passed 5.67664% more often than right-handed students.

Many people would think that’s hilarious. You wrote 5 digits after the decimal point! When you only have 446 students! Do you think you’re estimating the true difference in pass rates to an accuracy of 0.00001%? Do you think that final “664” is meaningful? Ha! Hahahaha! What a fool you are!

For example, the American Psychological Association says:

Round as much as possible while considering prospective use and statistical precision

And the Political Analysis journal says:

Numbers in the text of articles and in tables should be reported with no more precision than they are measured and are substantively meaningful.

And The Principles of Biomedical Scientific Writing says:

Significant figures (significant digits) should reflect the degree of precision of the original measurement.

And Clymo (2014) says:

a recent report that the mean of 17 values was 3.863 with a standard error of the mean (SEM) of 2.162 revealed only that none of the seven authors understood the limitations of their work.

And Andrew Gelman makes fun of a paper:

my favorite is the p-value of 4.76×10^−264. I love that they have all these decimal places. Because 4×10^-264 wouldn’t be precise enuf.

I beg you, do not listen to these people. Write all the digits.

Of course, they’re right that when you write 5.67664%, the final digits say nothing about how much better left-handed students might be at whatever was tested. With only 446 students, it’s impossible to estimate more than 2-3 digits with any accuracy.

But the argument isn’t just that those digits are unnecessary. It’s that they’re misleading. Supposedly, printing that many digits implies that the true population difference must be somewhere between 5.676635 and 5.676645%. Which is almost certainly untrue.

But… who says it implies that? If your point is that there’s lots of uncertainty, then add a confidence interval or write “±” whatever. Destroying information is a weird way to show an uncertainty band.

And deleting digits does destroy information. Yes it does. Yes it does! Important information.

Let’s look at the sentence again:

There were 446 students tested. The left-handed students passed 5.67664% more often than right-handed students.

Notice something? This doesn’t say how many of those 446 students were left-handed. You might not care about that. But I, the person staring very hard at the numbers in your paper, don’t necessarily have the same priorities you do.

Fortunately, I know things:

  1. If there were L left-handed students out of which l passed and R right-handed students, out of which r passed, then the exact percentage increase is P = 100% × (RATIO - 1), where RATIO = (l / L) / (r / R)
  2. When you wrote down 5.67664%, you might have rounded up. Or you might have rounded down. Or you might have just truncated the digits. But whatever you did, P must have been somewhere between 5.676635% and 5.67665%.
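The window in item 2 can be derived mechanically from the printed digits. Here is a minimal sketch, assuming the author either rounded to nearest or truncated, and assuming the value is positive; `reported_bounds` is a hypothetical helper name, not something from the original post.

```python
from decimal import Decimal

def reported_bounds(reported: str):
    """Return (lower, upper) so that any exact value in [lower, upper)
    could have been printed as `reported` by rounding or truncating."""
    value = Decimal(reported)
    # One unit in the last printed decimal place, e.g. 0.00001 for "5.67664".
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    lower = value - unit / 2  # smallest value that would round up to `reported`
    upper = value + unit      # truncation allows anything below the next unit
    return lower, upper

print(reported_bounds("5.67664"))
```

Using `Decimal` on the string keeps the printed digits exact, so the window for 5.67664 comes out as 5.676635 to 5.67665, matching the numbers above.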

Since I know those things, I can try all the possible values of L, R, l, and r, and see how many have a total of R+L=446 students and lead to a percentage increase in the right range. Like this:

def p_inc(L, l, R, r):
    # Percentage by which the left-handed pass rate exceeds the right-handed one
    ratio = (l / L) / (r / R)
    return 100*(ratio-1)

def search(lower, upper, tot_students):
    tot_printed = 0  # number of (L, l, R, r) combinations found
    for L in range(1,tot_students):
        R = tot_students - L
        for l in range(1,L):
            for r in range(1,R):
                if lower <= p_inc(L, l, R, r) < upper:
                    print(L, l, R, r)
                    tot_printed += 1
    return tot_printed

search(5.676635, 5.67665, 446)

If I do that, then there’s only one possibility.

L l R r
45 37 401 312

There must have been 45 left-handed students, out of which 37 passed, and 401 right-handed students, out of which 312 passed.
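As a sanity check, the recovered counts can be plugged back into the formula with exact fractions. The sketch below also computes one possible uncertainty band of the kind suggested earlier. The choice of a 95% interval on the log scale (the Katz log method for a ratio of proportions) is an assumption made here for illustration, not something the original post specifies.

```python
from fractions import Fraction
from math import exp, log, sqrt

L, l, R, r = 45, 37, 401, 312  # counts recovered by the search above

ratio = Fraction(l, L) / Fraction(r, R)  # exact RATIO = (l / L) / (r / R)
p = 100 * (ratio - 1)                    # exact percentage increase
print(f"{float(p):.5f}%")

# Approximate 95% interval for RATIO, computed on the log scale.
# (Assumed method: Katz log interval; other interval choices exist.)
se = sqrt(1 / l - 1 / L + 1 / r - 1 / R)
lo = 100 * (exp(log(float(ratio)) - 1.96 * se) - 1)
hi = 100 * (exp(log(float(ratio)) + 1.96 * se) - 1)
print(f"{float(p):.5f}% (95% CI {lo:.1f}% to {hi:.1f}%)")
```

Under that assumed method the interval runs from roughly -9% to +22%, so the full-precision figure and an honest uncertainty band can sit side by side without either one hiding the other.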

But I can only do this because you wrote down so many “unnecessary” digits. If you’d only written 5.6766% then the true number could be anything from 5.67655% to 5.6767% and there would be 7 possibilities:

L l R r
45 37 401 312
203 113 243 128
218 97 228 96
218 194 228 192
238 237 208 196
251 219 195 161
274 234 172 139

If you’d written 5.677%, there would be 71 possibilities. For 5.7%, there would be 9494.
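To recheck counts like these without waiting on a triple loop, here is a sketch of a faster variant. It solves for the admissible values of r directly and uses exact integer arithmetic, so floating-point rounding cannot tip a borderline case. The name `candidates` is hypothetical, and the code assumes the same round-or-truncate window and the same loop ranges as the search above, plus a positive reported value.

```python
from decimal import Decimal
from fractions import Fraction

def candidates(reported: str, tot_students: int):
    """List every (L, l, R, r) consistent with the reported percentage."""
    value = Decimal(reported)
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    # Window on RATIO: ratio_lo <= (l/L)/(r/R) < ratio_hi
    ratio_lo = 1 + Fraction(value - unit / 2) / 100
    ratio_hi = 1 + Fraction(value + unit) / 100
    found = []
    for L in range(1, tot_students):
        R = tot_students - L
        for l in range(1, L):
            # RATIO = l*R / (L*r), so the window on RATIO becomes
            #   l*R / (L*ratio_hi) < r <= l*R / (L*ratio_lo)
            r_min = (l * R * ratio_hi.denominator) // (L * ratio_hi.numerator) + 1
            r_max = (l * R * ratio_lo.denominator) // (L * ratio_lo.numerator)
            for r in range(max(r_min, 1), min(r_max, R - 1) + 1):
                found.append((L, l, R, r))
    return found

results = {s: candidates(s, 446) for s in ["5.67664", "5.6766"]}
for s, found in results.items():
    print(s, len(found))
```

Passing `"5.677"` or `"5.7"` as the first argument gives the candidate lists for the coarser roundings.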

Now I know what you’re thinking: This kind of reverse-engineering is disgusting. Isn’t asking for extra digits of precision the wrong solution? Shouldn’t we simply require authors to publish their data?

As someone who has read thousands of academic papers, I’ll answer those questions as calmly as possible.

NO.

REQUIRING PUBLIC DATA IS NOT THE ANSWER.

HAVE YOU EVER TRIED TO USE THE DATA FROM A PUBLISHED PAPER? HAVE YOU?

USUALLY THERE’S A URL AND IT’S BROKEN. OR YOU CONTACT THE “CORRESPONDING AUTHOR” AND THEY IGNORE YOU. OR THERE’S A WORD DOCUMENT WITH AN EMBEDDED PICTURE OF AN EXCEL SPREADSHEET SOMEONE TOOK WITH A NOKIA 6610. OR YOU CAN’T REPRODUCE THE PUBLISHED RESULTS FROM THE DATA AND WHEN YOU ASK THE AUTHORS WHY, THEY GET ANGRY THAT SOMEONE ACTUALLY TOOK AN INTEREST IN THE KNOWLEDGE THEY SUPPOSEDLY WANTED TO CONTRIBUTE TO THE WORLD.

THIS KIND OF REVERSE ENGINEERING IS NECESSARY ALL THE TIME, OFTEN JUST TO UNDERSTAND WHAT CALCULATIONS WERE DONE BECAUSE THE AUTHORS DON’T EXPLAIN THE DETAILS.

SCIENCE IS ON FIRE. PAPERS EVERYWHERE ARE FULL OF MISTAKES OR THINGS WORSE THAN MISTAKES. THIS IS HOW WE CATCH FRAUDS. THIS IS HOW WE CATCH PEOPLE WHO RIG ELECTIONS. YOUR PETTY WRITING STYLE OPINIONS IMPEDE THE SEARCH FOR TRUTH.

Yes, it would be great if everyone did fully reproducible science and made all their data available and actually responded when you asked them questions and got a pony. But in the current, actual world, most papers are missing important details. The problem of having to scan your eyeballs past a few extra digits is a silly non-issue compared to the problem of meaningless results everywhere. So please stop spending your energy actively trying to convince people to delete one of the very few error correction methods that we actually have and that actually sort of works. Thanks in advance.
