Does the gender-equality paradox actually exist?

Updated Dec 2021

The debate so far
New Analysis
Takeaways

The gender-equality paradox is the (disputed) idea that countries with more gender equality have fewer women in STEM careers. While there’s lots of debate in the scientific literature about the causal implications of this paradox, there’s no agreement about a more basic question: Does the paradox even exist, or is it just an illusion caused by a contrived data analysis?

The debate so far

Act I

In 2018, Stoet and Geary had one of the most surprising results in social science in a decade. They took the Global Gender Gap Index (GGGI), which measures gender equality, and plotted it against the percentage of women among STEM graduates.

$GGGI against fraction of women in STEM$

Finland has high equality but few women in STEM, while Algeria is the opposite. That’s the trend.

Why this would be true is unclear, but the result seems hard to dispute. It’s obvious that GGGI is measuring something: just look at the countries that are high or low on the graph. And you don’t need to trust any fancy statistics; you can see the trend in the data.

This was picked up by The Atlantic, The American Enterprise Institute, Ars Technica, Maclean’s, and Jordan Peterson. Stoet and Geary themselves published an article at Quillette, where they suggest their graph is partly due to different levels of interest in STEM and partly to comparative advantage: in places like Finland, girls perform similarly to boys in science but much better in reading.

Wait, did I just say this was hard to dispute?

Act II

Suspicious of these results, Richardson and colleagues took the same data, calculated the percentage of women among STEM graduates, and got… completely different numbers. They—I think—contacted the journal, which led to a corrigendum from Stoet and Geary in late 2019. This clarified what’s on the x-axis in the above graph:

The propensity of women to graduate with STEM degrees was a/(a + b), where a is the percentage of women who graduate with STEM degrees (relative to all women graduating) and b is the percentage of men who graduate with STEM degrees (relative to all men graduating).

Get that? Take a country with the following graduates each year:

	STEM degrees	All degrees
Men	100	1000
Women	5	50

Women make up 4.8% (5/105) of STEM graduates. However, their formula gives 50%, since the fraction of women who do STEM is the same as the fraction of men who do STEM. That is, a=5/50 is equal to b=100/1000.
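To make the difference between the two quantities concrete, here is a minimal Python sketch that computes both for the hypothetical country above. Only the a/(a + b) formula comes from the corrigendum; the function names are illustrative, and the code uses fractions rather than percentages, which leaves the ratio unchanged.

```python
def raw_share(women_stem, men_stem):
    """Women as a fraction of all STEM graduates."""
    return women_stem / (women_stem + men_stem)


def propensity(women_stem, women_all, men_stem, men_all):
    """The corrigendum's a / (a + b).

    a = share of women graduates who earn STEM degrees
    b = share of men graduates who earn STEM degrees
    """
    a = women_stem / women_all
    b = men_stem / men_all
    return a / (a + b)


# The hypothetical country from the table above.
print(f"raw share:  {raw_share(5, 100):.1%}")             # 4.8%
print(f"propensity: {propensity(5, 50, 100, 1000):.1%}")  # 50.0%
```

Scaling the women's row up or down (say, 50 STEM degrees out of 500) changes the raw share but leaves the propensity at 50%, which is exactly the invariance under discussion.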

There’s a good argument for this. The most salient fact about the above country isn’t anything STEM-specific, it’s just that few women get degrees. Stoet and Geary’s formula is invariant to this kind of imbalance.

There’s also a good argument against this formula. Maybe you think that imbalances in the total number of degrees are important, and you don’t want to be invariant to them.

What there’s not a good argument for is calling this quantity “Women Among STEM Graduates (%)” like the above graph does. In their corrigendum, Stoet and Geary don’t really explain how this happened. In fact, they don’t change much about their paper at all, other than adding the above quote and inserting “propensity” everywhere.

Act III

Simultaneously with Stoet and Geary’s corrigendum in 2019, Richardson and colleagues published a commentary on the corrected paper. They argue:

Propensities are bad.
It’s not cool to use GGGI because it “measures achieved outcomes, not propensities” and “is not intended to be used to causally explain outcomes”.
Better than GGGI is the ultra-simple Basic Indicator of Gender Inequality (BIGI). Stoet and Geary shouldn’t object to this, since it was proposed by… Stoet and Geary.
If they compute the actual percentage of STEM degrees earned by women and plot it against BIGI, they get this graph, along with a non-significant regression coefficient.

richardson reply graph

They also published articles in Slate and on their blog. This was picked up by Buzzfeed and The Scientist.

Act IV

In 2020, Breda and colleagues published a paper, part of which uses the same propensities as Stoet and Geary. They argue this is worthwhile both because the original result is well-known and because it’s nice to be invariant to imbalances in the overall number of degrees.

Their first observation is that the propensities aren’t just correlated with GGGI. They are also correlated with:

GDP per capita.
The human development index.
Income inequality, measured via the Gini index.
The Coefficient of Human Inequality.

They do a regression to predict propensities from each of these variables (one variable at a time) and get these coefficients (from Table S5):

Everything “good” is associated with lower propensities, be it more GDP, more development, less income/human inequality, or more gender equality.

Their goal was to test how all this relates to gender stereotypes. They took the PISA 2012 data, and looked at how boys and girls felt about these two statements. These were chosen because they don’t directly mention gender, reducing the risk of social desirability bias.

“Whether or not I do well in mathematics is completely up to me.”

“My parents believe it’s important for me to study mathematics.”

Their stereotype score for each country reflects how much boys vs. girls agree with the above statements. If a boy of equal math ability is more likely to agree than a girl, the stereotype score is positive. If a girl is more likely to agree, the stereotype score is negative.

Their main result is a second regression to predict STEM propensities, now controlling for the stereotype scores in each country:

Knowing stereotypes makes the other variables less predictive, dramatically so in some cases (Human Inequality), less so in others (GGGI).
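For readers unfamiliar with what "controlling for" a variable does to a regression coefficient, here is a small Python sketch. The numbers are entirely synthetic and are not Breda et al.'s data or coefficients: the outcome is constructed as `10 - 0.3 * equality - 0.7 * stereotype`, with `stereotype` correlated with `equality`. The variable and function names are assumptions made for illustration.

```python
def slope(x, y):
    """OLS slope of y on a single predictor x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_controlling(x, z, y):
    """Coefficient on x when y is regressed on x and z together (with intercept)."""
    n = len(x)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


# Synthetic "countries": stereotype rises with equality, but not perfectly.
equality = [0, 1, 2, 3, 4, 5, 6, 7]
stereotype = [0, 2, 1, 3, 2, 4, 3, 5]
y = [10 - 0.3 * e - 0.7 * s for e, s in zip(equality, stereotype)]

print(round(slope(equality, y), 3))                          # -0.7
print(round(slope_controlling(equality, stereotype, y), 3))  # -0.3
```

On its own, `equality` absorbs part of the stereotype effect and gets a coefficient of -0.7. Once `stereotype` is included, the coefficient shrinks to the -0.3 that was built into the data. It shrinks but does not vanish, which is the qualitative pattern described for GGGI.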

This paper is often summarized (e.g. on Wikipedia) with quotes like this (emphasis mine):

The stereotype associating math to men is stronger in more egalitarian and developed countries. It is also strongly associated with various measures of female underrepresentation in math-intensive fields and can therefore entirely explain the gender-equality paradox.

New Analysis

Paradox dissolved?

After first reading these follow-up papers, I had the impression the original study was debunked. But notice three things:

First, causality isn’t everything. Richardson et al. think that BIGI is better than GGGI for establishing causality. I don’t understand their reasoning in the slightest, but it doesn’t matter. None of these analyses prove causality.

Still, does the paradox actually exist? It can’t simultaneously be false (as Richardson et al. seem to claim) and true but explained by gender stereotypes (as Breda et al. claim). Which is it?

Second, stereotypes don’t solve the paradox. How could they, when the reduction for the GGGI coefficient above is so modest? I think the Wikipedia quote is misleading: Most of Breda et al.’s paper is about predicting other things, e.g. the intention to study STEM, where controlling for stereotypes has a stronger effect.

But OK, suppose that the paradox was entirely explained by gender stereotypes. That would just mean we’ve traded the mystery of why more gender-equal countries have fewer women in STEM for the mystery of why more gender-equal countries would have stronger stereotypes. That is still very paradoxical!

Third, it’s unclear how fragile the result is. Richardson et al. say that the paradox only appears because of “contrived measures and selective data”. Of course, if the paradox only appears after torturing the data in one particular way, then we shouldn’t trust it. But their evidence is what happened when they tortured the data in one other particular way.

Shouldn’t we try a bunch of analyses, and just check how robust things are?

A bunch of analyses

Let’s start with the original analysis, relating GGGI to propensities.

This is the same as the original Stoet and Geary figure, with three small changes:

Switch the axes.
Color countries according to their continent.
Show a LOWESS smoothing (linearity is for wimps) along with a 95% confidence interval, computed using bootstrapping.
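For readers who want to see what a smoothed curve with a bootstrapped band involves, here is a rough sketch in plain Python. It is not the code behind the figures. It uses a simplified local linear smoother with tricube weights (LOWESS without the robustness iterations), synthetic data, and made-up parameter choices (`frac`, `n_boot`, the evaluation grid), all of which are assumptions for illustration.

```python
import random


def local_linear(xs, ys, x0, frac=0.5):
    """Tricube-weighted local linear fit evaluated at x0."""
    n = len(xs)
    k = max(2, round(frac * n))
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: slightly beyond the k-th nearest neighbour, so weights never all vanish.
    h = dists[k - 1] * 1.01 + 1e-12
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # fall back to a local mean if x has no spread
    return my + b * (x0 - mx)


def bootstrap_band(xs, ys, grid, n_boot=200, frac=0.5, seed=0):
    """Pointwise 95% percentile band from resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    curves = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        curves.append([local_linear(bx, by, g, frac) for g in grid])
    lo, hi = [], []
    for j in range(len(grid)):
        col = sorted(c[j] for c in curves)
        lo.append(col[int(0.025 * (n_boot - 1))])
        hi.append(col[int(0.975 * (n_boot - 1))])
    return lo, hi


# Synthetic data: a noisy U-shape. These are NOT the country data from the post.
rng = random.Random(1)
xs = [i / 29 for i in range(30)]
ys = [(x - 0.5) ** 2 + rng.gauss(0, 0.02) for x in xs]

grid = [0.1, 0.3, 0.5, 0.7, 0.9]
fit = [local_linear(xs, ys, g) for g in grid]
lo, hi = bootstrap_band(xs, ys, grid)
for g, f, l, u in zip(grid, fit, lo, hi):
    print(f"x={g:.1f}  fit={f:.3f}  95% band=({l:.3f}, {u:.3f})")
```

Resampling whole (x, y) pairs treats each country as one observation, which is one common way to bootstrap a scatterplot smoother. Other schemes (resampling residuals, for instance) would give somewhat different bands.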

A different calculation for STEM-participation

The above figure uses propensities, which is a major point of contention. Personally, I think this debate is silly. Propensities give one view of the data, while the raw fraction of women in STEM gives another. They both have value.

So, what if Stoet and Geary had just switched to using the actual percentage of women among people who earn STEM degrees, as Richardson et al. suggest they should have? They’d have gotten the following curves. (I added non-STEM degrees for context.)

$gggi vs female STEM and non-STEM fractions$

In more-equal countries, women earn a larger share of non-STEM degrees, but a smaller share of STEM degrees. The paradox is still there.

Other measures of equality

Maybe this all depends on some weirdness with how GGGI measures equality? A newer alternative is the Gender Inequality Index (GII). I took the 2019 rankings and used them instead of GGGI.

Be careful interpreting this graph: more equality means a higher GGGI but a lower GII.

$gii vs female STEM and non-STEM fractions$

Again, the most gender-equal countries have a smaller fraction of women in STEM, but not non-STEM. If you use propensities instead of the female share of degrees, the effect is even stronger.

A third alternative is BIGI, as suggested by Richardson et al. Be very careful here: BIGI is negative when women are favored and positive when men are favored. Equality occurs around zero.

$bigi vs female STEM and non-STEM fractions$

For non-STEM degrees, the trend is simple—the more women are favored, the more degrees they earn. But for STEM degrees, there’s a U-shaped curve where women earn the smallest share around BIGI ≈ -.02, where women are just slightly favored. Comparing BIGI to propensities gives a stronger, but less symmetric, effect.

While we’re on the subject… The red dots in the above graph show the same data as in Richardson et al.’s commentary above, which they used to claim that there was no gender-equality-paradox. (You can also see them by themselves with country labels.) What’s going on?

Well, for one thing, I made the graph ~~better~~ differently, switching the axes and using smaller markers so you can see the density of countries.

Don't believe me? Here's what you get if you take their graph, rotate right by 90 degrees, flip the vertical axis, and change the aspect ratio:

transformed version of richardson's data

If you look carefully, you can see that these dots are the same as the red dots above.

For another thing, they did a linear regression and found no significant result. That’s not too surprising, given that the effect above is nonlinear and symmetric.
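The claim that a straight-line fit can miss a symmetric, nonlinear pattern is easy to check on made-up numbers. The sketch below uses a perfectly U-shaped synthetic dataset centered at zero (not Richardson et al.'s data) and illustrative helper names. The linear slope and correlation come out at essentially zero even though y is completely determined by x.

```python
def ols_slope(x, y):
    """OLS slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Synthetic U-shape, symmetric around x = 0.
x = list(range(-10, 11))
y = [v ** 2 for v in x]

print("linear slope:       ", round(ols_slope(x, y), 6))
print("correlation with x: ", round(pearson_r(x, y), 6))
print("correlation with x²:", round(pearson_r([v ** 2 for v in x], y), 6))
```

A non-significant linear coefficient therefore says little about whether a relationship exists. It only says there is no overall upward or downward tilt.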

Against BIGI

I think BIGI is a terrible measure of gender-equality and we shouldn’t be using it. For context, here’s a plot comparing the other two measures we’ve looked at, GGGI and GII:

Are the Philippines more gender-equal than Japan (as GGGI implies) or the opposite (as GII implies)? I don’t know, but I’ll accept that it depends on different, reasonable definitions of gender-equal.

On the other hand, here’s a plot of GGGI against BIGI:

According to BIGI, Saudi Arabia—where women can only show their hands and eyes in public and must have a legal male guardian—is basically the same as Switzerland. And Lesotho—the tiny country inside South Africa—is by far the most women-favored place in the entire world. Ooohkaaay.

This isn’t to say that BIGI is bad exactly. They specifically discuss Saudi Arabia in their paper. My point is that it doesn’t capture what we have in mind in this context. At all. So while we do seem to get a paradox with BIGI, I think it’s meaningless and we should forget about it.

Other measures of women in STEM

While the result seems robust to different measures of gender equality, everything above uses the same data from UNESCO on the number of STEM graduates. We’ve analyzed it both in terms of propensities and raw fractions, and the result is still robust. Still, what if we use a different data source entirely to measure STEM participation?

For variety, I looked at the female share of researchers in engineering and technology. If you compare this to GGGI, there’s really no paradox at all. At most, there’s a bit of a “leveling off”.

If you look at natural science researchers instead of engineering, you again see no paradox.

On the other hand, if you use GII instead of GGGI, you do see a small effect in the most gender-equal countries:

Comparing GII to the natural sciences shows more of a leveling off than a full reversal.

I’m not sure if all these observations constitute a “paradox” exactly, but they aren’t something I would have predicted.

Takeaways

So, is there a gender-equality paradox? Three points.

First, Stoet and Geary’s original paradox is robust. It doesn’t matter how you measure gender inequality, or whether you use propensities or raw fractions to measure women’s share of STEM degrees. It’s not fair to imply that they cherry-picked the details of their analysis to support some pre-determined conclusion.

Second, the paradox is somewhat limited. It appears with STEM degrees no matter how you define “equality”, or how you torture the data. For STEM researchers, the effect is more modest and only appears for certain definitions of gender equality. This is weird, and I don’t understand it other than that it suggests we need more nuance than “more gender equality → fewer women in STEM”.

Third, resist simplistic causal explanations! People choose degrees for lots of reasons: Economics, working conditions, family influences, cultural/media influences, intrinsic interest, and simply what degree programs are accessible. Most of these operate in feedback loops with each other. My love for scatterplots is vaster than the seas, but they’re at most vaguely suggestive of any single cause.

Plot all the plots

Lest I be accused of cherry-picking, here’s all the different ways of measuring gender inequality against all the ways of measuring women’s participation in STEM. I also threw in per-capita GDP and Breda et al.’s stereotype measurements. (For GDP I removed Qatar and the top 10 tax havens where GDP is meaningless.)

	GGGI	GII	BIGI	GDP	stereotypes
STEM propensity	x	x	x	x	x
STEM degrees	x	x	x	x	x
non-STEM degrees	x	x	x	x	x
Engineering researchers	x	x	x	x	x
Natural science researchers	x	x	x	x	x

Choose the column you want on the x-axis, the row you want on the y-axis, and let the beautiful dots wash over you.

Data sources

GGGI: Wikipedia (2015 rankings)
GII: Wikipedia (2019 rankings)
BIGI: genderinequality.info
Women’s share of STEM / non-STEM: The actual UNESCO data used for the share of STEM degrees going to women appears to no longer be on their website. With much gnashing of teeth, I was able to find an older version on archive.org.
Propensities: Due to the same problem, I couldn’t find the raw data for propensities. Instead, I took these from Stoet and Geary’s supplementary material.
Women’s share of engineering / natural science researchers: UNESCO report
GDP: The IMF’s 2021 estimates in purchasing power parity, via Wikipedia.
Stereotypes: Breda et al.’s supplementary material.
