Does the gender-equality paradox actually exist?

Jul 2021

The gender-equality paradox is the (disputed) idea that countries with more gender equality have fewer women in STEM careers. It turns out that the debate in the scientific literature is largely about causality, even though there’s no agreement on a more basic question: Does the paradox even exist, or is it just an illusion caused by a contrived data analysis?


The debate so far

Act I

In 2018, Stoet and Geary had one of the most surprising results in social science in a decade. They took the Global Gender Gap Index (GGGI), which measures gender equality, and plotted it against the percentage of women among STEM graduates.

GGGI against fraction of women in STEM

Finland has high equality but few women in STEM, while Algeria is the opposite. That’s the trend.

Why this would be true is unclear, but the result seems hard to dispute. It’s obvious from the graph that GGGI is measuring something, and you don’t need to trust any fancy statistics. You can just look at the data.

This was picked up by The Atlantic, The American Enterprise Institute, Ars Technica, Maclean’s, and Jordan Peterson. Stoet and Geary themselves published an article at Quillette, where they suggest their graph is partly due to different levels of interest in STEM and partly to comparative advantage: in places like Finland, girls perform similarly to boys in science but much better in reading, meaning fewer girls have science as their personal best subject.

Act II

Inevitably, this was disputed. Richardson and colleagues took the same data and found that the actual percentage of women among STEM graduates was completely different from the numbers in the paper. They (I think) contacted the journal, which led to a corrigendum from Stoet and Geary in late 2019. This clarified what’s on the x-axis in the above graph:

The propensity of women to graduate with STEM degrees was a/(a + b), where a is the percentage of women who graduate with STEM degrees (relative to all women graduating) and b is the percentage of men who graduate with STEM degrees (relative to all men graduating).

Get that? Take a country with the following graduates each year:

               men  women
STEM degrees   100      5
All degrees   1000     50

Women make up 5/105 or around 4.8% of STEM graduates. However, their formula gives 50%, since the fraction of women who do STEM is the same as the fraction of men who do STEM, i.e. a=5/50 is equal to b=100/1000.
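To make the two calculations concrete, here is a minimal Python sketch using the made-up country above. The function names are my own, and the formula is simply the a/(a + b) from the corrigendum quote, computed with fractions rather than percentages (the ratio is the same either way).

```python
def raw_share(stem_women, stem_men):
    """Women as a fraction of all STEM graduates."""
    return stem_women / (stem_women + stem_men)


def propensity(stem_women, all_women, stem_men, all_men):
    """a / (a + b), following the corrigendum quote.

    a: fraction of women graduates who earn STEM degrees
    b: fraction of men graduates who earn STEM degrees
    """
    a = stem_women / all_women
    b = stem_men / all_men
    return a / (a + b)


# The hypothetical country from the table above.
share = raw_share(stem_women=5, stem_men=100)
prop = propensity(stem_women=5, all_women=50, stem_men=100, all_men=1000)

print(f"raw share of women among STEM graduates: {share:.1%}")
print(f"propensity a/(a+b):                      {prop:.1%}")
```

The first number comes out around 4.8% and the second at 50%, even though both describe the same country.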

There’s a good argument for this. The most salient fact about the above country is that few women get degrees, rather than anything STEM-specific. Stoet and Geary’s formula is invariant to this kind of imbalance.

There’s also a good argument against this formula. Maybe you think that this imbalance is really important, and you don’t want to be invariant to it.

What there’s not a good argument for is calling this quantity “Women Among STEM Graduates (%)”! It’s not clear how this happened. In any case, Stoet and Geary didn’t change much about their paper other than adding the quote above and inserting the word “propensity” everywhere.

Act III

Simultaneously with Stoet and Geary’s corrigendum, Richardson and colleagues published a commentary on the corrected paper. They argue:

  1. Propensities are bad.
  2. It’s not cool to use GGGI because it “measures achieved outcomes, not propensities” and “is not intended to be used to causally explain outcomes”.
  3. Better than GGGI is the ultra-simple Basic Indicator of Gender Inequality (BIGI). Stoet and Geary shouldn’t object to this, since it was proposed by… Stoet and Geary.
  4. If they compute the actual percentage of STEM degrees earned by women and plot it against BIGI, they get this graph, along with a non-significant regression coefficient.

richardson reply graph

They also published articles in Slate and on their blog. This was picked up by Buzzfeed and The Scientist, but doesn’t seem to have gotten as much publicity as the original article.

Act IV

In 2020, Breda and colleagues joined the party. They published a paper, part of which uses the same propensities as Stoet and Geary. They argue this is worthwhile both because the original result is well-known and because it’s nice to be invariant to imbalances in the overall number of degrees.

Their first observation is that the propensities aren’t just correlated with GGGI, but with all sorts of other stuff as well.

They do a regression to predict propensities from each of these variables (one variable at a time) and get these coefficients (from Table S5):

regression of STEM propensities on different country variables

Everything “good” is associated with fewer women in STEM, be it more GDP, more development, less income/human inequality, or more gender equality.

Their goal was to test how all this relates to gender stereotypes. They took the PISA 2012 data, and looked at how boys and girls felt about these two statements:

“Whether or not I do well in mathematics is completely up to me.”

“My parents believe it’s important for me to study mathematics.”

These were chosen because they don’t directly mention gender, reducing the risk of social desirability bias.

Their stereotype score for each country reflects how much boys vs. girls agree with the above statements. If a boy (girl) of equal math ability is more likely to agree, the stereotype score is positive (negative).

Their main result is a second regression to predict STEM propensities, now controlling for the stereotype scores in each country:

regression of STEM propensities controlling for stereotypes

Knowing stereotypes makes the other variables less predictive, dramatically so in some cases (Human Inequality) and less so in others (GGGI).
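If "controlling for" feels abstract, here is a minimal sketch of the mechanics in plain Python. Everything in it is an assumption for illustration: the ten "countries" are made-up numbers (not Breda et al.'s data), the variable names are mine, and the propensity is constructed to depend mostly on the stereotype score. The controlled coefficient is computed by residualizing both variables on the stereotype score first (the Frisch-Waugh-Lovell trick), which gives the same coefficient as a regression that includes both predictors.

```python
def mean(v):
    return sum(v) / len(v)


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    """What is left of y after regressing it on x."""
    intercept, slope = simple_ols(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up numbers for ten imaginary countries (NOT real data).
equality = [float(i) for i in range(10)]
wiggle = [1.0 if i % 2 == 0 else -1.0 for i in range(10)]
stereotype = [e + w for e, w in zip(equality, wiggle)]
# By construction, propensity depends mostly on stereotypes.
stem_propensity = [-1.0 * s - 0.2 * e for s, e in zip(stereotype, equality)]

# One variable at a time.
_, coef_alone = simple_ols(equality, stem_propensity)

# Controlling for stereotypes: residualize both sides, then regress.
_, coef_controlled = simple_ols(
    residuals(stereotype, equality),
    residuals(stereotype, stem_propensity),
)

print(f"equality coefficient alone:      {coef_alone:.3f}")
print(f"equality coefficient controlled: {coef_controlled:.3f}")
```

In this toy setup the coefficient on equality shrinks from roughly -1.14 to the -0.2 that was built into the data. How much a real coefficient shrinks depends entirely on the data, which is the point of comparing the two tables.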

This paper is often summarized (e.g. on Wikipedia) with quotes like this (emphasis mine):

The stereotype associating math to men is stronger in more egalitarian and developed countries. It is also strongly associated with various measures of female underrepresentation in math-intensive fields and can therefore entirely explain the gender-equality paradox.

However, most of their paper is about predicting other things (e.g., the intention to study STEM) where controlling for stereotypes has a stronger effect. I think it’s misleading to take them as claiming to entirely explain Stoet and Geary’s paradox, when the reduction in the GGGI coefficient above is so modest.

New Analysis

Paradox dissolved?

After reading these follow-up papers, I had the impression the original study was debunked. But notice three things:

First, causality isn’t everything. Richardson et al. think that BIGI is better than GGGI for establishing causality. I don’t understand their reasoning in the slightest, but it doesn’t matter. None of these analyses prove causality.

Still, does the paradox actually exist? It can’t simultaneously be false (as Richardson et al. seem to claim) and true but explained by gender stereotypes (as Breda et al. claim.) Which is it?

Second, stereotypes don’t solve the paradox. Suppose that the paradox was entirely explained by gender stereotypes. That’s valuable but leaves the mystery of why more gender-equal countries should have stronger stereotypes!

  • It could be cultural. Maybe in rich, gender-equal countries, The Patriarchy has more spare resources to spend indoctrinating everyone.
  • It could be intrinsic interest. Maybe women are less likely to have STEM as their #1 choice, but in unequal countries they have few other options and so they conclude math is important for them.
  • It could be some impossible-to-disentangle combination. Maybe parents in gender-unequal countries know that their daughters have fewer opportunities, and so they constantly tell them how amazing math is, resulting in those girls liking math.

Third, it’s unclear how fragile the result is. Richardson et al. say that the paradox only appears because of “contrived measures and selective data”. Certainly, if the paradox only appears after torturing the data in one way, we shouldn’t trust it. But their evidence is… what happened when they tortured the data in one other way. Shouldn’t we try a bunch of analyses, and see how robust things are?

A bunch of analyses

Let’s start with the original analysis, relating GGGI to propensities. (Click to zoom in and look at the country names.)

gggi vs female propensity

This is the same as the original Stoet and Geary figure, with three small changes:

  1. Switch the axes.
  2. Color countries according to continent.
  3. Show a LOWESS smoothing (linearity is for wimps) along with a 95% confidence interval, computed using bootstrapping.
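For readers who haven't met LOWESS with a bootstrap band before, here is a rough sketch of the idea in plain Python. It is not the code behind the figures: the data is synthetic, the smoother is a simplified LOWESS (tricube-weighted local linear fits, no robustness iterations), and the choices of `frac`, `n_boot`, pair resampling, and percentile intervals are all my assumptions.

```python
import random


def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling to 0 at |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def lowess_at(x0, xs, ys, frac=0.6):
    """Locally weighted linear fit evaluated at x0 (simplified LOWESS)."""
    k = max(2, int(round(frac * len(xs))))
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: distance to the k-th nearest point, widened slightly so
    # that point keeps a nonzero weight even when there are ties.
    h = 1.05 * dists[k - 1] + 1e-12
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def bootstrap_band(xs, ys, grid, n_boot=200, frac=0.6, seed=0):
    """Percentile 95% band for the curve, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    curves = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        curves.append([lowess_at(g, bx, by, frac) for g in grid])
    lo, hi = [], []
    for j in range(len(grid)):
        col = sorted(c[j] for c in curves)
        lo.append(col[int(0.025 * (n_boot - 1))])
        hi.append(col[int(0.975 * (n_boot - 1))])
    return lo, hi


# Synthetic stand-in data: a noisy downward trend (not the real countries).
data_rng = random.Random(1)
xs = [i / 39 for i in range(40)]
ys = [0.5 - 0.3 * x + data_rng.gauss(0, 0.05) for x in xs]

grid = [i / 9 for i in range(10)]
fit = [lowess_at(g, xs, ys) for g in grid]
lo, hi = bootstrap_band(xs, ys, grid)

for g, f, l, h in zip(grid, fit, lo, hi):
    print(f"x={g:.2f}  fit={f:.3f}  95% band=[{l:.3f}, {h:.3f}]")
```

Resampling whole countries and refitting the curve each time means the band reflects how much the curve moves when a few countries are dropped or duplicated, without assuming any particular functional form.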

A different calculation for STEM-participation

The above figure uses propensities, which is a major point of contention. Personally, I think this debate is silly. Propensities give one view of the data, while the raw fraction of women in STEM gives another. They both have value.

So, what if Stoet and Geary had just switched to using the actual percentage of women among people who earn STEM degrees, as Richardson et al. suggest they should have? They’d have gotten the following curve, where I’ve included non-STEM degrees for context.

gggi vs female STEM and non-STEM fractions

In more-equal countries, women earn a larger share of non-STEM degrees, but a smaller share of STEM degrees. The paradox is still there.

Other measures of equality

Maybe this all depends on some weirdness with how GGGI measures equality? A newer alternative is the Gender Inequality Index (GII). I took the 2019 rankings and used them instead of GGGI.

Be careful interpreting this graph: While more equality means a higher GGGI, it means a lower GII.

gii vs female STEM and non-STEM fractions

Again, the most gender-equal countries have a smaller fraction of women in STEM, but not non-STEM. With propensities, this effect is even stronger.

A third alternative is BIGI, as suggested by Richardson et al. Be very careful here. BIGI is negative when women are favored and positive when men are favored. Equality occurs around zero.

bigi vs female STEM and non-STEM fractions

The more women are favored, the more non-STEM degrees they earn. With STEM degrees, women earn the smallest share for BIGI ≈ -.02, where women are just slightly favored. The fraction increases when there’s more inequality in either direction. Comparing BIGI to propensities gives a stronger, but less symmetric, effect.

While we’re on the subject… The red dots in the above graph show the same data as in Richardson et al.’s commentary above, which they used to claim that there was no gender-equality paradox. (You can also see them by themselves with country labels.) What’s going on?

For one thing, I made the graph differently, switching the axes and using smaller markers so you can see the density of countries. Don't believe me? Here's what you get if you take their graph, rotate right by 90 degrees, flip the vertical axis, and change the aspect ratio:

transformed version of richardson's data

For another thing, they did a linear regression and found no significant result. That’s not too surprising, given that the effect above is nonlinear and symmetric.
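Here is a tiny sketch of why a straight-line fit can miss a symmetric effect entirely. The data is made up (a perfect U centered on zero, not the BIGI data), and `linear_fit` is just ordinary least squares written out by hand.

```python
def linear_fit(x, y):
    """Least-squares slope and R^2 for y ~ x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


# Made-up, perfectly U-shaped data centered on zero.
x = [i / 10 for i in range(-10, 11)]
y = [0.2 + 0.5 * xi ** 2 for xi in x]

slope, r_squared = linear_fit(x, y)
print(f"slope = {slope:.6f}, R^2 = {r_squared:.6f}")
```

Here y is completely determined by x, yet the fitted slope and R^2 are both zero up to floating-point rounding, because the rising half cancels the falling half.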

Against BIGI

We have three different measures of gender inequality: GGGI, GII, and BIGI. Here’s a plot of GGGI against GII:

gggi vs gii

Are the Philippines more gender-equal than Japan (as GGGI implies) or the opposite (as GII implies)? I don’t know, but I’ll accept that it depends on different, reasonable definitions of gender-equal.

On the other hand, here’s a plot of GGGI against BIGI:

gggi vs bigi

According to BIGI, Saudi Arabia—where women can only show their hands and eyes in public and must have a legal male guardian—is basically the same as Switzerland. Lesotho—the tiny country inside South Africa—is by far the most women-favored place in the entire world. Ooohkaaay.

This isn’t to say that BIGI is bad (its authors specifically discuss Saudi Arabia in their paper), but that it doesn’t capture what we have in mind when thinking about a gender-equality paradox.

Other measures of women in STEM

While the result seems robust to different measures of gender equality, everything above uses the same data from UNESCO on the number of STEM graduates. We’ve analyzed it both in terms of propensities and raw fractions, and the result is still robust. Still, what if we use a different data source entirely to measure STEM participation?

For variety, I looked at the female share of researchers in engineering and technology. If you compare this to GGGI, there’s really no paradox at all.

GGGI vs. female share of engineering researchers

If you look at natural science researchers instead of engineering, you again see no paradox.

On the other hand, if you compare to GII instead of GGGI, you do see an effect in the most gender-equal countries:

GII vs. female share of engineering researchers

Comparing GII to the natural sciences shows more of a leveling off than a full reversal. I’m not sure if that’s a “paradox” but it’s not something I’d have predicted.


So, is there a gender-equality paradox? Three points.

First, Stoet and Geary’s original paradox is robust. It doesn’t matter how you measure gender inequality or whether you use propensities or raw fractions to measure women’s share of STEM degrees. It’s not fair to imply that they cherry-picked the details of their analysis to support some pre-determined conclusion.

Second, the paradox is somewhat limited. It appears with STEM degrees no matter how you define “equality” and how you torture the data. For STEM researchers, the effect depends on the definition of gender equality, and it is more modest when it does appear. This is weird, and I don’t understand it. Still, it shows that we need more nuance than “more gender equality → fewer women in STEM”.

Third, resist simplistic causal explanations! People choose degrees for lots of reasons: Economics, working conditions, family influences, cultural/media influences, intrinsic interest, and simply what degree programs are accessible. Most of these operate in feedback loops with each other. My love for scatterplots is vaster than the seas, but they’re at most vaguely suggestive of any single cause.

Plot all the plots

Lest I be accused of cherry-picking, here are all the different ways of measuring gender inequality against all the ways of measuring women’s participation in STEM. I also threw in per-capita GDP and Breda et al.’s stereotype measurements. (For GDP I removed Qatar and the top 10 tax havens, where GDP is meaningless.)

                             GGGI  GII  BIGI  GDP  stereotypes
STEM propensity              x     x    x     x    x
STEM degrees                 x     x    x     x    x
non-STEM degrees             x     x    x     x    x
Engineering researchers      x     x    x     x    x
Natural science researchers  x     x    x     x    x

Choose the column you want on the x-axis, the row you want on the y-axis, and let the beautiful dots wash over you.

Data sources
  • GGGI: Wikipedia (2015 rankings)
  • GII: Wikipedia (2019 rankings)
  • BIGI:
  • Women’s share of STEM / non-STEM: The actual UNESCO data used for the share of STEM degrees going to women appears to no longer be on their website. With much gnashing of teeth, I was able to find an older version on
  • Propensities: Due to the same problem, I couldn’t find the raw data for propensities. Instead, I took these from Stoet and Geary’s supplementary material.
  • Women’s share of engineering / natural science researchers: UNESCO report
  • GDP: The IMF’s 2021 estimates in purchasing power parity, via Wikipedia.
  • Stereotypes: Breda et al.’s supplementary material.