
The Simpson’s Paradox

This statistical paradox was described by E. H. Simpson in 1951. The seemingly impossible effect is observed quite often in social and medical research. The idea is that a trend which appears in each of several groups can disappear, or even reverse, when the groups are combined.

For a real-world example of this paradox, it is worth looking at the data set described by Morrell (1999) [http://www.amstat.org/publications/jse/secure/v7n3/datasets.morrell.cfm], or at the study by Charig, Webb, Payne, and Wickham (1986). In that medical study, the authors compared the success rates of two different treatments for kidney stones.

The following table shows the success rates and the number of treatments for both small and large kidney stones. Treatment A covers all open surgery procedures, and treatment B is percutaneous nephrolithotomy.


                Treatment A               Treatment B
Small stones    Group 1: 93% (81/87)      Group 2: 87% (234/270)
Large stones    Group 3: 73% (192/263)    Group 4: 69% (55/80)
Both sizes      78% (273/350)             83% (289/350)


The table shows that treatment A is more effective than treatment B on small stones as well as on large stones. However, when both sizes are pooled, treatment B appears to be more effective.
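The reversal is easy to check by recomputing the rates from the counts in the table. Here is a minimal Python sketch; the dictionary layout and function names are my own choices for illustration, and only the counts come from the study.

```python
# (successes, patients) per treatment and stone size, taken from the table
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    """Success rate of a single group."""
    return successes / total

def pooled_rate(treatment):
    """Success rate of a treatment with both stone sizes added together."""
    successes = sum(s for s, _ in data[treatment].values())
    total = sum(n for _, n in data[treatment].values())
    return successes / total

# Within each stone size, compare A against B
for size in ("small", "large"):
    a = rate(*data["A"][size])
    b = rate(*data["B"][size])
    print(f"{size} stones: A = {a:.3f}, B = {b:.3f}, A better: {a > b}")

# Pooled over both sizes, the comparison flips
a_all, b_all = pooled_rate("A"), pooled_rate("B")
print(f"pooled: A = {a_all:.3f}, B = {b_all:.3f}, A better: {a_all > b_all}")
```

Treatment A comes out ahead in both per-size comparisons and behind in the pooled one, which is exactly the paradox.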

I am sure many of you can instantly see a big problem with this design: the group sizes are not equal, and the size of the stone is a confounding variable. To determine which treatment is more successful, the success ratios (successes / total) have to be compared within each stone size, not only on the pooled totals.

When the confounding variable is ignored, the pooled success rates tell a different story from the group rates. The severe cases (large stones) were more likely to receive the better treatment (A), and the milder cases (small stones) the other treatment (B). So the totals are dominated by the bigger groups 2 and 3, rather than the smaller groups 1 and 4.
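One way to see how the group sizes drive the totals is to write each treatment's pooled rate as a weighted average of its two group rates, where the weight is the group's share of that treatment's patients. The sketch below does this in Python; the function name and tuple layout are hypothetical, and only the counts come from the table.

```python
# (treatment, stone size, successes, patients), keyed by group number
groups = {
    1: ("A", "small", 81, 87),
    2: ("B", "small", 234, 270),
    3: ("A", "large", 192, 263),
    4: ("B", "large", 55, 80),
}

def pooled_as_weighted_average(treatment):
    """Return ([(size, weight, rate), ...], pooled rate) for one treatment."""
    rows = [g for g in groups.values() if g[0] == treatment]
    total = sum(n for _, _, _, n in rows)
    # weight = share of this treatment's patients that fall in the group
    parts = [(size, n / total, s / n) for _, size, s, n in rows]
    pooled = sum(weight * rate for _, weight, rate in parts)
    return parts, pooled

for treatment in ("A", "B"):
    parts, pooled = pooled_as_weighted_average(treatment)
    for size, weight, rate in parts:
        print(f"{treatment} {size}: weight = {weight:.2f}, rate = {rate:.3f}")
    print(f"{treatment} pooled = {pooled:.3f}")
```

About three quarters of treatment A's weight sits on the hard cases (large stones), while about three quarters of treatment B's weight sits on the easy cases (small stones), so the pooled comparison mostly compares large stones with small stones.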

The severity of the case influenced the success rate more than the choice of treatment. This means that patients with small stones still do better when given the inferior treatment B (group 2, 87%) than patients with large stones given the better treatment A (group 3, 73%).

According to Pavlides and Perlman (2009), the probability that Simpson’s Paradox occurs just by chance in a 2x2x2 table is 1/60.
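A rough way to get a feel for a figure like this is a small Monte Carlo simulation. The sketch below rests on my own assumptions, not on a description of the authors' method: "by chance" is taken to mean that the eight cell probabilities are drawn uniformly at random (a flat Dirichlet distribution), and a reversal in either direction counts as the paradox. All function names are hypothetical.

```python
import random

def random_table(rng):
    """Draw a random 2x2x2 table of cell probabilities.

    Eight exponential draws divided by their sum are uniformly distributed
    over all probability vectors of length 8 (a flat Dirichlet).
    Layout: table[stratum][treatment] = (p_success, p_failure).
    """
    cells = [rng.expovariate(1.0) for _ in range(8)]
    total = sum(cells)
    cells = [c / total for c in cells]
    return [
        [(cells[4 * z + 2 * t], cells[4 * z + 2 * t + 1]) for t in range(2)]
        for z in range(2)
    ]

def success_rate(cell):
    success, failure = cell
    return success / (success + failure)

def shows_paradox(table):
    """True if both strata agree on the better treatment but the pooled table disagrees."""
    diffs = [success_rate(stratum[0]) - success_rate(stratum[1]) for stratum in table]
    pooled = [
        (table[0][t][0] + table[1][t][0], table[0][t][1] + table[1][t][1])
        for t in range(2)
    ]
    pooled_diff = success_rate(pooled[0]) - success_rate(pooled[1])
    both_positive = diffs[0] > 0 and diffs[1] > 0
    both_negative = diffs[0] < 0 and diffs[1] < 0
    return (both_positive and pooled_diff < 0) or (both_negative and pooled_diff > 0)

def estimate(n_tables=100_000, seed=0):
    """Fraction of random tables that show the paradox."""
    rng = random.Random(seed)
    hits = sum(shows_paradox(random_table(rng)) for _ in range(n_tables))
    return hits / n_tables

print(f"estimated probability: {estimate():.4f}  (1/60 = {1 / 60:.4f})")
```

If these assumptions match the setting of the paper, the estimate should land in the neighbourhood of 1/60; a different definition of "by chance" would give a different number. The same `shows_paradox` check also accepts raw counts, so it can be applied directly to the kidney stone table.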

Simpson’s Paradox shows how important it is to include data about possible confounding variables when estimating causal relations. Pearl (2000, 2009) gives precise criteria for selecting confounding variables using causal graphs.
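As an illustration of what including the confounder can look like in practice, the sketch below applies the standard adjustment formula, P(success | do(treatment)) = sum over sizes of P(success | treatment, size) * P(size), to the kidney stone counts. It assumes that stone size is the only confounder, which is my simplification and not something the data alone can confirm; the function name is hypothetical.

```python
# (successes, patients) per treatment and stone size, from the kidney stone table
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def adjusted_rate(treatment):
    """Success rate if every patient, whatever the stone size, got this treatment.

    Each size-specific rate is weighted by how common that stone size is
    among all 700 patients, not among the patients of one treatment.
    """
    all_patients = sum(n for arms in data.values() for _, n in arms.values())
    result = 0.0
    for size in ("small", "large"):
        size_patients = sum(data[t][size][1] for t in data)
        successes, n = data[treatment][size]
        result += (size_patients / all_patients) * (successes / n)
    return result

for treatment in ("A", "B"):
    print(f"adjusted success rate for {treatment}: {adjusted_rate(treatment):.3f}")
```

With stone size held to the same mix for both treatments, treatment A comes out ahead again, in line with the comparison within each stone size.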

In conclusion, this is another very interesting probability paradox. Researchers need to be very careful and try to control for such confounding variables, not just to avoid running into the paradox itself but also to avoid making Type I and Type II errors, which are often the result of confounding variables. It might be a good idea to explore this topic further in the next blog. Thanks for reading!


Comments on: "The Simpson’s Paradox" (4)

  1. I’m afraid that after reading your blog I was rather unclear about what Simpson’s paradox actually was. I recognised the name etc. but wasn’t really sure what it entailed. After researching it briefly I understood fully what you were talking about and it all became clear.

    A short description, rather than just the one sentence, would really have helped me understand what I was reading. Also there was no discussion of what it is used for and why it is used.

  2. Although initially disappointed that “The Simpsons Paradox” had nothing to do with Homer or Marge, I think you raise several interesting points in this blog. I may have to agree slightly with iamjackscompletelackofsurprise (awesome name btw) that you were a little unclear in places, but I managed to understand what you meant and the problems with this paradox. It is definitely an interesting topic seeing as we are all deciding on project hypotheses at the moment; I certainly wouldn’t want this to occur in the experiment we are going to run!
    An obvious way to not let this happen would be to not combine data sets of different sizes, as well as attempting to account for or prevent confounding variables from being present. This can be done through laboratory conditions that have a high level of control (Dement & Kleitman, 1957), or by assessing participants on a specific trait before the actual experiment, e.g. aggression (Bandura, Ross & Ross, 1961).
