Does the gender-equality paradox actually exist?

Jul 2021   (Updated Dec 2021)

The gender-equality paradox is the (disputed) idea that countries with more gender equality have fewer women in STEM careers. While there’s lots of debate in the scientific literature about the causal implications of this paradox, there’s no agreement about a more basic question: Does the paradox even exist, or is it just an illusion caused by a contrived data analysis?

The debate so far

Act I

In 2018, Stoet and Geary had one of the most surprising results in social science in a decade. They took the Global Gender Gap Index (GGGI), which measures gender equality, and plotted it against the percentage of women among STEM graduates.

GGGI against fraction of women in STEM

Finland has high equality but few women in STEM, while Algeria is the opposite. That’s the trend.

Why this would be true is unclear, but the result seems hard to dispute. It’s obvious that GGGI is measuring something: just look at the countries that are high or low on the graph. And you don’t need to trust any fancy statistics, because you can see the trend in the data.

This was picked up by The Atlantic, The American Enterprise Institute, Ars Technica, MacLean’s, and Jordan Peterson. Stoet and Geary themselves published an article at Quillette, where they suggest their graph is partly due to different levels of interest in STEM and partly to comparative advantage—in places like Finland, girls perform similarly to boys in science but much better in reading.

Wait, did I just say this was hard to dispute?

Act II

Suspicious of these results, Richardson and colleagues took the same data, calculated the percentage of women among STEM graduates, and got… completely different numbers. They—I think—contacted the journal, which led to a corrigendum from Stoet and Geary in late 2019. This clarified what’s on the x-axis in the above graph:

The propensity of women to graduate with STEM degrees was a/(a + b), where a is the percentage of women who graduate with STEM degrees (relative to all women graduating) and b is the percentage of men who graduate with STEM degrees (relative to all men graduating).

Get that? Take a country with the following graduates each year:

        STEM degrees   All degrees
Men     100            1000
Women   5              50

Women make up 4.8% (5/105) of STEM graduates. However, their formula gives 50%, since the fraction of women who do STEM is the same as the fraction of men who do STEM. That is, a=5/50 is equal to b=100/1000.
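To make the difference concrete, here’s a minimal sketch that computes both quantities for the example country. The function names are my own, and the propensity function simply follows the corrigendum’s a/(a + b) formula as quoted above.

```python
def raw_share(stem_women, stem_men):
    """Women as a fraction of all STEM graduates."""
    return stem_women / (stem_women + stem_men)


def propensity(stem_women, all_women, stem_men, all_men):
    """The corrigendum's formula a / (a + b)."""
    a = stem_women / all_women  # fraction of women graduates who did STEM
    b = stem_men / all_men      # fraction of men graduates who did STEM
    return a / (a + b)


# The example country from the table above.
share = raw_share(stem_women=5, stem_men=100)
prop = propensity(stem_women=5, all_women=50, stem_men=100, all_men=1000)
print(f"raw share: {share:.1%}, propensity: {prop:.1%}")
# prints "raw share: 4.8%, propensity: 50.0%"
```

Multiplying the number of women graduates by any constant (say, 50 becomes 500 and 5 becomes 50) changes the raw share but leaves the propensity at 50%, which is the invariance discussed next.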

There’s a good argument for this. The most salient fact about the above country isn’t anything STEM-specific; it’s just that few women get degrees. Stoet and Geary’s formula is invariant to this kind of imbalance.

There’s also a good argument against this formula. Maybe you think that imbalances in the total number of degrees are important, and you don’t want to be invariant to them.

What there’s not a good argument for is calling this quantity “Women Among STEM Graduates (%)” like the above graph does. In their corrigendum, Stoet and Geary don’t really explain how this happened. In fact, they don’t change much about their paper at all, other than adding the above quote and inserting “propensity” everywhere.

Act III

Simultaneously with Stoet and Geary’s corrigendum in 2019, Richardson and colleagues published a commentary on the corrected paper. They argue:

  1. Propensities are bad.
  2. It’s not cool to use GGGI because it “measures achieved outcomes, not propensities” and “is not intended to be used to causally explain outcomes”.
  3. Better than GGGI is the ultra-simple Basic Indicator of Gender Inequality (BIGI). Stoet and Geary shouldn’t object to this, since it was proposed by… Stoet and Geary.
  4. If they compute the actual percentage of STEM degrees earned by women and plot it against BIGI, they get this graph, along with a non-significant regression coefficient.

richardson reply graph

They also published articles in Slate and on their blog. This was picked up by Buzzfeed and The Scientist.

Act IV

In 2020, Breda and colleagues published a paper, part of which uses the same propensities as Stoet and Geary. They argue this is worthwhile both because the original result is well-known and because it’s nice to be invariant to imbalances in the overall number of degrees.

Their first observation is that the propensities aren’t just correlated with GGGI. They are also correlated with other country-level measures, including GDP, development, and income and human inequality.

They do a regression to predict propensities from each of these variables (one variable at a time) and get these coefficients (from Table S5):

regression of STEM propensities on different country variables

Everything “good” is associated with lower propensities, be it more GDP, more development, less income/human inequality, or more gender equality.

Their goal was to test how all this relates to gender stereotypes. They took the PISA 2012 data, and looked at how boys and girls felt about these two statements. These were chosen because they don’t directly mention gender, reducing the risk of social desirability bias.

“Whether or not I do well in mathematics is completely up to me.”

“My parents believe it’s important for me to study mathematics.”

Their stereotype score for each country reflects how much boys vs. girls agree with the above statements. If a boy of equal math ability is more likely to agree than a girl, the stereotype score is positive. If a girl is more likely to agree, the stereotype score is negative.

Their main result is a second regression to predict STEM propensities, now controlling for the stereotype scores in each country:

regression of STEM propensities controlling for stereotypes

Knowing stereotypes makes the other variables less predictive, dramatically so in some cases (Human Inequality), less so in others (GGGI).
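If “controlling for” is unfamiliar, here’s a toy sketch of the mechanics. The numbers are made up for illustration and are not Breda et al.’s data, and the helper name `ols_coefs` is my own. I built the toy outcome to depend only on the stereotype score, so the control absorbs the whole effect. That is the extreme case, unlike the modest reduction seen for GGGI.

```python
import numpy as np


def ols_coefs(y, *predictors):
    """Least-squares coefficients of y on the predictors (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


# Made-up values for six hypothetical countries (NOT real data).
equality = np.array([0.60, 0.65, 0.70, 0.75, 0.80, 0.85])
stereotype = np.array([0.10, 0.30, 0.20, 0.50, 0.40, 0.60])

# Toy outcome constructed to depend on the stereotype score only.
propensity = 0.5 - 0.4 * stereotype

# Coefficient on equality when it is the only predictor.
alone = ols_coefs(propensity, equality)[1]

# Coefficient on equality once the stereotype score is also in the model.
controlled = ols_coefs(propensity, equality, stereotype)[1]

print(f"equality alone: {alone:.3f}, controlling for stereotypes: {controlled:.3f}")
```

In this toy, equality looks strongly predictive on its own only because it is correlated with the stereotype score. Once the score enters the regression, the equality coefficient drops to roughly zero.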

This paper is often summarized (e.g. on Wikipedia) with quotes like this (emphasis mine):

The stereotype associating math to men is stronger in more egalitarian and developed countries. It is also strongly associated with various measures of female underrepresentation in math-intensive fields and can therefore entirely explain the gender-equality paradox.

New Analysis

Paradox dissolved?

After first reading these follow-up papers, I had the impression the original study was debunked. But notice three things:

First, causality isn’t everything. Richardson et al. think that BIGI is better than GGGI for establishing causality. I don’t understand their reasoning in the slightest, but it doesn’t matter. None of these analyses prove causality.

Still, does the paradox actually exist? It can’t simultaneously be false (as Richardson et al. seem to claim) and true but explained by gender stereotypes (as Breda et al. claim.) Which is it?

Second, stereotypes don’t solve the paradox. How could they, when the reduction for the GGGI coefficient above is so modest? I think the Wikipedia quote is misleading: Most of Breda et al.’s paper is about predicting other things, e.g. the intention to study STEM, where controlling for stereotypes has a stronger effect.

But OK, suppose that the paradox was entirely explained by gender stereotypes. That would just mean we’ve traded the mystery of why more gender-equal countries have fewer women in STEM for the mystery of why more gender-equal countries would have stronger stereotypes. That is still very paradoxical!

Third, it’s unclear how fragile the result is. Richardson et al. say that the paradox only appears because of “contrived measures and selective data”. Of course, if the paradox only appears after torturing the data in one particular way, then we shouldn’t trust it. But their evidence is what happened when they tortured the data in one other particular way.

Shouldn’t we try a bunch of analyses, and just check how robust things are?

A bunch of analyses

Let’s start with the original analysis, relating GGGI to propensities.

gggi vs female propensity

This is the same as the original Stoet and Geary figure, with three small changes:

  1. Switch the axes.
  2. Color countries according to their continent.
  3. Show a LOWESS smoothing (linearity is for wimps) along with a 95% confidence interval, computed using bootstrapping.
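For the curious, here’s a rough sketch of how a smoothed curve with a bootstrapped confidence band can be computed. It’s a simplified LOWESS (tricube-weighted local linear fits, with no robustness iterations) written in plain NumPy. The function names, the `frac` default, and the number of bootstrap rounds are all assumptions for illustration, not the exact settings behind the figures here.

```python
import numpy as np


def lowess(x, y, x_eval, frac=0.6):
    """Simplified LOWESS: a weighted straight-line fit around each x_eval."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    x_eval = np.asarray(x_eval, float)
    k = max(2, int(np.ceil(frac * len(x))))  # points in each local window
    fitted = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        dist = np.abs(x - x0)
        h = max(np.sort(dist)[k - 1], 1e-12)  # distance to k-th nearest point
        w = np.clip(1 - (dist / h) ** 3, 0, None) ** 3  # tricube weights
        sw = np.sqrt(w)
        # Weighted least squares for a line centered at x0;
        # its intercept is the smoothed value at x0.
        A = np.column_stack([np.ones_like(x), x - x0]) * sw[:, None]
        beta, *_ = np.linalg.lstsq(A, y * sw, rcond=None)
        fitted[i] = beta[0]
    return fitted


def bootstrap_band(x, y, x_eval, n_boot=500, frac=0.6, seed=0):
    """Pointwise 95% band from refitting on resampled (x, y) pairs."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    curves = np.empty((n_boot, len(x_eval)))
    for b in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))  # resample with replacement
        curves[b] = lowess(x[idx], y[idx], x_eval, frac)
    lower, upper = np.percentile(curves, [2.5, 97.5], axis=0)
    return lower, upper
```

Given arrays of per-country values (say, hypothetical `gggi` and `propensity` arrays) and a grid of x positions, `lowess(gggi, propensity, grid)` gives the curve and `bootstrap_band(gggi, propensity, grid)` gives the band. In practice, a library implementation such as the one in statsmodels is probably the more sensible choice.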

A different calculation for STEM-participation

The above figure uses propensities, which is a major point of contention. Personally, I think this debate is silly. Propensities give one view of the data, while the raw fraction of women in STEM gives another. They both have value.

So, what if Stoet and Geary had just switched to using the actual percentage of women among people who earn STEM degrees, as Richardson et al. suggest they should have? They’d have gotten the following curves. (I added non-STEM degrees for context.)

gggi vs female STEM and non-STEM fractions

In more-equal countries, women earn a larger share of non-STEM degrees, but a smaller share of STEM degrees. The paradox is still there.

Other measures of equality

Maybe this all depends on some weirdness with how GGGI measures equality? A newer alternative is the Gender Inequality Index (GII). I took the 2019 rankings and used them instead of GGGI.

Be careful interpreting this graph: While more equality meant a higher GGGI, it means a lower GII.

gii vs female STEM and non-STEM fractions

Again, the most gender-equal countries have a smaller fraction of women in STEM, but not non-STEM. If you use propensities instead of the female share of degrees, the effect is even stronger.

A third alternative is BIGI, as suggested by Richardson et al. Be very careful here: BIGI is negative when women are favored and positive when men are favored. Equality occurs around zero.

bigi vs female STEM and non-STEM fractions

For non-STEM degrees, the trend is simple—the more women are favored, the more degrees they earn. But for STEM degrees, there’s a U-shaped curve where women earn the smallest share around BIGI ≈ -.02, where women are just slightly favored. Comparing BIGI to propensities gives a stronger, but less symmetric, effect.

While we’re on the subject… The red dots in the above graph show the same data as in Richardson et al.’s commentary above, which they used to claim that there was no gender-equality paradox. (You can also see them by themselves with country labels.) What’s going on?

Well, for one thing, I drew the graph differently, switching the axes and using smaller markers so you can see the density of countries. Don’t believe me? Here’s what you get if you take their graph, rotate right by 90 degrees, flip the vertical axis, and change the aspect ratio:

transformed version of richardson’s data

If you look carefully, you can see that these dots are the same as the red dots above.
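Why does rotating and flipping amount to switching the axes? Here’s a small check, treating each step as a 2×2 matrix acting on a point (x, y). I’m reading “rotate right” as a clockwise rotation and “flip the vertical axis” as a top-to-bottom mirror. The change of aspect ratio is only a rescaling, so I leave it out.

```python
import numpy as np

# Rotate right (clockwise) by 90 degrees: (x, y) -> (y, -x)
rotate_right_90 = np.array([[0, 1],
                            [-1, 0]])

# Flip the vertical axis: (x, y) -> (x, -y)
flip_vertical = np.array([[1, 0],
                          [0, -1]])

# Rotate first, then flip.
combined = flip_vertical @ rotate_right_90

point = np.array([3, 7])  # an arbitrary (x, y) point
print(combined)           # [[0 1], [1 0]], the axis-swap matrix
print(combined @ point)   # [7 3]
```

The combined matrix sends (x, y) to (y, x), which is exactly a swap of the two axes.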

For another thing, they did a linear regression and found no significant result. That’s not too surprising, given that the effect above is nonlinear and symmetric.
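Here’s a toy illustration of why a straight-line fit can miss this kind of pattern. The data is a synthetic, perfectly symmetric U-shape that I made up for the purpose. It is not the BIGI data.

```python
import numpy as np

# Synthetic U-shaped data, symmetric around zero (NOT real data).
x = np.linspace(-1, 1, 21)
y = x ** 2

# A straight-line fit: the slope comes out (numerically) zero.
slope, intercept = np.polyfit(x, y, 1)

# The linear correlation is (numerically) zero too.
r = np.corrcoef(x, y)[0, 1]

# A quadratic fit recovers the curve: leading coefficient close to 1.
quad = np.polyfit(x, y, 2)

print(f"linear slope: {abs(slope):.3f}, |correlation|: {abs(r):.3f}, "
      f"quadratic term: {quad[0]:.3f}")
```

Here y is completely determined by x, yet the linear slope and the correlation are both zero. A non-significant linear coefficient says little about whether a symmetric, nonlinear relationship is present.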

Against BIGI

I think BIGI is a terrible measure of gender equality and we shouldn’t be using it. For context, here’s a plot comparing the other two measures we’ve looked at, GGGI and GII:

gggi vs gii

Are the Philippines more gender-equal than Japan (as GGGI implies) or the opposite (as GII implies)? I don’t know, but I’ll accept that it depends on different, reasonable definitions of gender-equal.

On the other hand, here’s a plot of GGGI against BIGI:

gggi vs bigi

According to BIGI, Saudi Arabia—where women can only show their hands and eyes in public and must have a legal male guardian—is basically the same as Switzerland. And Lesotho—the tiny country inside South Africa—is by far the most women-favored place in the entire world. Ooohkaaay.

This isn’t to say that BIGI is bad exactly. Its authors specifically discuss Saudi Arabia in their paper. My point is that it doesn’t capture what we have in mind in this context. At all. So while we do seem to get a paradox with BIGI, I think it’s meaningless and we should forget about it.

Other measures of women in STEM

While the result seems robust to different measures of gender equality, everything above uses the same data from UNESCO on the number of STEM graduates. We’ve analyzed it both in terms of propensities and raw fractions, and the result is still robust. Still, what if we use a different data source entirely to measure STEM participation?

For variety, I looked at the female share of researchers in engineering and technology. If you compare this to GGGI, there’s really no paradox at all. At most, there’s a bit of a “leveling off”.

GGGI vs. female share of engineering researchers

If you look at natural science researchers instead of engineering, you again see no paradox.

On the other hand, if you use GII instead of GGGI, you do see a small effect in the most gender-equal countries:

GII vs. female share of engineering researchers

Comparing GII to the natural sciences shows more of a leveling off than a full reversal.

I’m not sure if all these observations constitute a “paradox” exactly, but they aren’t something I would have predicted.

Takeaways

So, is there a gender-equality paradox? Three points.

First, Stoet and Geary’s original paradox is robust. It doesn’t matter how you measure gender inequality, or whether you use propensities or raw fractions to measure women’s share of STEM degrees. It’s not fair to imply that they cherry-picked the details of their analysis to support some pre-determined conclusion.

Second, the paradox is somewhat limited. It appears with STEM degrees no matter how you define “equality”, or how you torture the data. For STEM researchers, the effect is more modest and only appears for certain definitions of gender equality. This is weird, and I don’t understand it other than that it suggests we need more nuance than “more gender equality → fewer women in STEM”.

Third, resist simplistic causal explanations! People choose degrees for lots of reasons: Economics, working conditions, family influences, cultural/media influences, intrinsic interest, and simply what degree programs are accessible. Most of these operate in feedback loops with each other. My love for scatterplots is vaster than the seas, but they’re at most vaguely suggestive of any single cause.

Plot all the plots

Lest I be accused of cherry-picking, here’s all the different ways of measuring gender inequality against all the ways of measuring women’s participation in STEM. I also threw in per-capita GDP and Breda et al.’s stereotype measurements. (For GDP I removed Qatar and the top 10 tax havens where GDP is meaningless.)

                             GGGI  GII  BIGI  GDP  stereotypes
STEM propensity              x     x    x     x    x
STEM degrees                 x     x    x     x    x
non-STEM degrees             x     x    x     x    x
Engineering researchers      x     x    x     x    x
Natural science researchers  x     x    x     x    x

Choose the column you want on the x-axis, the row you want on the y-axis, and let the beautiful dots wash over you.

Data sources
  • GGGI: Wikipedia (2015 rankings)
  • GII: Wikipedia (2019 rankings)
  • BIGI: genderinequality.info
  • Women’s share of STEM / non-STEM: The actual UNESCO data used for the share of STEM degrees going to women appears to no longer be on their website. With much gnashing of teeth, I was able to find an older version on archive.org.
  • Propensities: Due to the same problem, I couldn’t find the raw data for propensities. Instead, I took these from Stoet and Geary’s supplementary material.
  • Women’s share of engineering / natural science researchers: UNESCO report
  • GDP: The IMF’s 2021 estimates in purchasing power parity, via Wikipedia.
  • Stereotypes: Breda et al.’s supplementary material.