9 Causal Inference

Global health researchers ask lots of causal questions. Do subsidies increase the coverage of bed nets? Does receiving a subsidized bed net make people less likely to purchase a full-price net in the future? Do bed nets prevent malaria? What happens if we treat bed nets with insecticide?


What if? Seems like such a simple question.

Apologies to any philosophers for my severe simplification of causality.

In many cases it is. We humans are experts at inferring causal relationships in our daily lives. You know what will happen if I hurl a carton of eggs at the sun. The Law of Gravity is a deterministic model. What goes up must come down.

But global health is not this simple. The causes we investigate tend to have a probabilistic (or stochastic, if you are fancy) relationship with effects on health and behavior. What goes up might come down.

So how do we make causal inferences when the relationships we study are probabilistic?

Causal inference is a large, vibrant field of study, and researchers continue to develop new techniques for drawing causal inferences from experimental and non-experimental data. My goal in this chapter is relatively modest: I want to help you understand why making causal inferences is difficult and introduce you to several widely used frameworks for asking and answering causal questions.

9.1 Cause and Effect

In global health, causes typically change the probability of an effect rather than guaranteeing it. For example, suppose an experimental treatment is given to 100 people suffering from a disease, and only 60 get better. If the causal relationship between the drug and disease state were deterministic, all 100 patients would have recovered. This is not what happened, however. The causal relationship only increased the probability that the effect would occur.
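A few lines of R make this concrete. This is a minimal sketch with a made-up recovery probability of 0.6 under treatment (not an estimate from any study):

```r
# A probabilistic cause: the hypothetical drug raises each patient's
# probability of recovery but guarantees nothing
set.seed(1)
n <- 100
p_recover_treated <- 0.6   # assumed recovery probability with the drug
recovered <- rbinom(n, size = 1, prob = p_recover_treated)
sum(recovered)  # close to 60, but never all 100, in any given draw
```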

In his book Causal Inference in Statistics, computer scientist Judea Pearl (2016) provides a simple definition of causes: “A variable X is a cause of a variable Y if Y in any way relies on X for its value.” The phrase “in any way” is a reminder that most of the causal relationships we investigate in global health are not deterministic and effects can have more than one cause.

Although it is clearly possible to estimate the average difference between two groups, can this difference be interpreted as an estimate of the causal impact of X on Y? In other words, are X and Y causally related? Shadish et al. (2002) point to three useful characteristics of causal relationships:

  1. The cause is related to (i.e., associated with) the effect.
  2. The cause comes before the effect.
  3. There are no plausible alternative explanations for the effect aside from the cause.

Condition #1 is easy to establish. Is X correlated with Y? In fact, it's so easy to establish that someone came up with the maxim "correlation does not prove causation" to remind us that the burden of proof is greater than the output of correlate x y or cor(x, y), or whatever command your statistical software uses. But it is a start.

Condition #2 is a bit harder to demonstrate conclusively because X and Y might be correlated, but the causal relationship may run in the opposite direction—maybe Y causes X. Correlations do not conclusively indicate which comes first, X or Y.

Consider malaria and poverty as an example. Jeffrey Sachs and Pia Malaney (2002) published a paper in Nature in which they wrote:

As a general rule of thumb, where malaria prospers most, human societies have prospered least…This correlation can, of course, be explained in several possible ways. Poverty may promote malaria transmission; malaria may cause poverty by impeding economic growth; or causality may run in both directions.

Condition #3 is the trickiest of all: ruling out plausible alternative explanations. As Sachs and Malaney note, the literature on poverty and malaria has not found a way to do so conclusively. They write that it is “possible that the correlation [between malaria and poverty] is at least partly spurious, with the tropical climate causing poverty for reasons unrelated to malaria.” The authors are proposing that climate is a potential cause of both poverty and malaria. If true, that would make climate a confounding (or lurking) variable that accounts for the observed relationship between poverty and malaria.
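To see how a lurking variable can manufacture a correlation, consider a minimal simulation sketch in R. The variable names and effect sizes here are hypothetical, not estimates from the malaria literature; a simulated "climate" score drives both poverty and malaria, which are otherwise causally unrelated:

```r
# A lurking variable: climate causes both poverty and malaria, so the
# two correlate despite having no direct causal link to each other
set.seed(1)
n <- 10000
climate <- rnorm(n)                  # hypothetical confounder
poverty <- 0.7 * climate + rnorm(n)  # climate -> poverty
malaria <- 0.7 * climate + rnorm(n)  # climate -> malaria
cor(poverty, malaria)                # sizable correlation, roughly 0.33

# conditioning on the confounder makes the association vanish
summary(lm(malaria ~ poverty + climate))$coefficients["poverty", ]
```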

Causal impact is the difference in counterfactual outcomes (i.e., potential outcomes) caused by some exposure, program, intervention, or policy. Although this sounds simple, it leads back to a fundamental problem: only one counterfactual outcome can be observed for an individual; we cannot observe someone in two states simultaneously (i.e., the treatment and the control). Therefore, it is not possible to observe the effect of the program on an individual. Instead, we observe groups of individuals, some under the intervention and some without it. Thus, we can infer the counterfactual by comparing people who get the treatment to other people who do not.19 Effects can be and often are measured on other units like schools, clinics, etc., but it is easier to think about "subjects" as people.
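A short simulation can make the potential-outcomes logic concrete. In a simulation (and only in a simulation) we can generate both potential outcomes for every person, reveal just one, and check that a randomized group comparison recovers the true average effect. All numbers below are made up for illustration:

```r
# Potential outcomes: y0 is the outcome without treatment, y1 with it.
# In real data we only ever observe one of the two for each person.
set.seed(1)
n <- 1000
y0 <- rnorm(n, mean = 50, sd = 10)    # outcome if untreated
y1 <- y0 + 5                          # outcome if treated (true effect = 5)
treated <- rbinom(n, 1, 0.5)          # random assignment
y_obs <- ifelse(treated == 1, y1, y0) # the one outcome we actually see

mean(y1 - y0)  # true average effect: exactly 5 (unknowable in practice)
mean(y_obs[treated == 1]) - mean(y_obs[treated == 0])  # estimate, close to 5
```

Randomization is what licenses the last line: because assignment is unrelated to y0 and y1, the untreated group stands in for the treated group's missing counterfactual.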

9.2 Statistical Inference vs Causal Inference

As you learned in the last chapter, statistical inference helps you to estimate the direction and size of an effect and to test hypotheses. But neither Frequentist p-values nor Bayesian credible intervals tell you if your estimate or test result reflects a causal relationship. To make a causal inference, you have to consider the study design and analysis details.

For instance, imagine you conduct a study of workforce retention and observe that, on average, team members who took advantage of optional yoga classes before work reported 20% greater job satisfaction, p < 0.05. You can reject the null hypothesis (if you’re into that) and conclude that this is a statistically significant difference. “Eureka,” you shout, “yoga increases job satisfaction!”

Not so fast, friend. Before you tell the world about the benefits of downward facing dog on retention, consider this question: are there other plausible explanations for why employees who participated in the classes scored higher on a measure of satisfaction?

Quiet reflection in child’s pose

Of course there are! You studied an optional wellness program that took place before the start of each shift. This means that participants selected themselves into the program. Guess who is more likely to choose to come early to work? People who like their jobs! So it's plausible that the observed difference in mean satisfaction scores reflects the effects of satisfaction on the decision to participate in the program, rather than the effects of the program on satisfaction.
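You can reproduce this trap with a few lines of R. In this hypothetical sketch, yoga has no effect at all; baseline satisfaction drives both participation and the satisfaction score, yet the naive group comparison still shows a "benefit":

```r
# Self-selection: satisfied employees opt into yoga more often, so the
# yoga group scores higher even though yoga itself does nothing
set.seed(1)
n <- 2000
baseline <- rnorm(n)                           # latent job satisfaction
yoga <- rbinom(n, 1, plogis(baseline))         # satisfied staff opt in more
satisfaction <- baseline + rnorm(n, sd = 0.5)  # no true yoga effect

# clearly positive "effect" that is entirely selection bias
mean(satisfaction[yoga == 1]) - mean(satisfaction[yoga == 0])
```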

9.3 Internal Validity

Internal validity is “internal” because it relates to the correctness of the inferences you make about your local study results. We can contrast this with external validity, which pertains to the applicability of your study results to different populations and places.

In this example, selection bias is probably leading you away from the truth about the causal effect of yoga on worker satisfaction. That’s just bias doing what bias does. A good reviewer will spot this risk of bias easily and might tell you that your study has low internal validity. This type of validity applies to your causal inferences—to your claims that your study result reflects a causal impact (Campbell 1957).

p < 0.05 tells you nothing about potential bias.

A tricky thing about bias is that it does not announce its presence. You have to know where it hides. As a researcher, this means thinking through possible sources of bias at the outset and designing a study that eliminates, reduces, or measures possible bias. As a reviewer, this means evaluating each claim for possible threats to internal validity that an author might not have addressed.

9.4 Threats to Internal Validity

Shadish, Cook, and Campbell (2002) outlined nine primary reasons why it might not be valid to assume that a relationship between X and Y is causal.


Table 9.1: Threats to internal validity. Source: Shadish et al. (2002), http://amzn.to/2cBaAM1.

Ambiguous temporal precedence: Lack of clarity about which variable occurred first may yield confusion about which variable is the cause and which is the effect.
Selection: Systematic differences over conditions in respondent characteristics that could also cause the observed effect.
History: Events occurring concurrently with treatment could cause the observed effect.
Maturation: Naturally occurring changes over time could be confused with a treatment effect.
Regression: When units are selected for their extreme scores, they will often have less extreme scores on other variables, an occurrence that can be confused with a treatment effect.
Attrition: Loss of respondents to treatment or to measurement can produce artifactual effects if that loss is systematically correlated with conditions.
Testing: Exposure to a test can affect scores on subsequent exposures to that test, an occurrence that can be confused with a treatment effect.
Instrumentation: The nature of a measure may change over time or conditions in a way that could be confused with a treatment effect.
Additive and interactive effects: The impact of a threat can be added to that of another threat or may depend on the level of another threat.

9.4.1 AMBIGUOUS TEMPORAL PRECEDENCE

Correlational studies can establish that X and Y are related, but often it is not clear that X occurred before Y. Uncertainty about the way a causal effect might flow is referred to as ambiguous temporal precedence—or simply “the chicken and egg” problem.

Sometimes, the direction is clear because it is not possible for Y to cause X. For instance, hot weather (X) might drive ice cream sales (Y), but ice cream sales (Y) cannot cause the temperature to rise (X).

Most relationships of concern in global health are not so clear, however. Take bed-net use and education as an example. Does bed-net use prevent malaria and allow for greater educational attainment? Or does greater education lead to a better understanding and appreciation of the importance of preventive behaviors like bed-net use?20 The possibility of bidirectional (reciprocal) causation is not considered in this book.

9.4.2 SELECTION

The fundamental challenge of causal inference is that the counterfactual cannot be observed directly. In health research, we often compare a group of people who were exposed to the potential cause to a group of people who were not exposed. No matter the effort to make sure that these two groups of people are equivalent before the treatment occurs, there may be observable and unobservable ways in which these groups differ. These differences represent selection bias, which is a threat to internal validity.

For instance, Bradley et al. (1986) compared parasite and spleen rates among bed-net users and nonusers in The Gambia and concluded that bed nets had a “strong protective effect” against malaria. However, the authors also observed that bed net use and malaria prevalence were also associated with ethnic group and place of residence. Thus, ethnic group and place of residence are confounding variables, that is, plausible alternative explanations for the relationship between bed net use and malaria.

Identifying selection threats and trying to account for them in the analysis can be frustrating because not all differences between groups are visible. Some confounding variables may be discernible, but many go unnoticed. The only way to be certain that such threats have been minimized is to randomly assign people to conditions (i.e., study arms).

9.4.3 HISTORY

History threats to validity begin where selection threats end. Whereas selection threats are reasons that the groups might differ before the treatment occurs, history threats occur between the start of the treatment and the posttest observation.

Before-and-after studies (i.e., pre–post studies) are particularly susceptible to history threats. In these designs, researchers assess the same group of people before and after an intervention without a separate control or comparison group. The assumed counterfactual for what would have happened in the absence of the intervention is simply the pre-intervention observation of the group.

Okabayashi et al. (2006) provide a good example. In this study, the researchers conducted a baseline survey and then began a school-based malaria control program. Nine months later, they conducted a postprogram survey with the same principals, teachers, and students. On the basis of the before-and-after differences they observed, they concluded that the educational program had a positive impact on preventive behaviors. For example, student-reported use of bed nets (“always”) increased from 81.8% before the program to 86.5% after the program.

It is possible that the program changed behavior, but without evidence to the contrary, it is also possible that something else was responsible for the change. Maybe another program was active at the same time. Maybe there was a marketing campaign for a new type of bed net just entering the market. Maybe the posttest occurred during the rainy season, when people know the risk of malaria is greater. These possible history threats illustrate how hard it is to make causal claims from a single-group design, especially when the outcome is a behavior that can change for many reasons.

9.4.4 MATURATION

Single-group designs like Okabayashi et al. (2006) are also subject to maturation threats. The basic issue is that people, things, and places change over time, even in the absence of any treatment. For example, all children grow and change over the course of a school year. Comparing children at the end of the year to their younger selves a year earlier and attributing the difference to some program is problematic because kids gain new cognitive skills as they age. Observed changes could be due to the program, or they may simply reflect the passage of time. Without a comparison group (i.e., control group) of similar-aged children, it can be hard to tell the difference.

9.4.5 REGRESSION ARTIFACTS

Certain study designs are susceptible to regression artifacts. Sometimes, people are selected for a study because they have very high or very low scores on some outcome. Often, these scores are less extreme at retest, independent of any intervention. This statistical phenomenon is called regression to the mean, and it occurs because of measurement error and imperfect correlation.
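A quick sketch shows the phenomenon with simulated test scores (all values hypothetical): select the lowest scorers at baseline, retest them, and their scores rise with no intervention whatsoever.

```r
# Regression to the mean: extreme baseline scores partly reflect
# measurement error, so the same people score less extreme at retest
set.seed(1)
n <- 1000
true_score <- rnorm(n, mean = 50, sd = 10)
test1 <- true_score + rnorm(n, sd = 5)     # baseline with measurement error
test2 <- true_score + rnorm(n, sd = 5)     # retest, nothing has changed
lowest <- test1 < quantile(test1, 0.10)    # select the bottom 10%

mean(test2[lowest]) - mean(test1[lowest])  # positive: scores "improve" on their own
```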

9.4.6 ATTRITION

Attrition occurs when study participants are lost to follow-up, for example, when they do not participate in outcome assessments. Attrition that is uneven between study groups is described as systematic attrition. Whereas selection bias makes groups unequal at the beginning of a study, attrition bias makes groups unequal at the end of the study for reasons unrelated to the treatment under investigation.

For example, imagine researchers recruit depressed patients to take part in an RCT of a novel psychotherapy delivered over the course of 10 weekly sessions. If the most depressed patients in the treatment group drop out because the schedule is too demanding, then the analysis would compare the control group (with the most depressed patients still enrolled) to a treatment group that is missing the most depressed patients. The data would show that the treatment group got better on average, but part or all of the observed treatment effect would be due to attrition of the most depressed patients from the treatment group, not to the treatment itself.21 Recruiting depressed patients sounds like it could invite regression to the mean, but there is no cause for concern here because the control group will undergo the same phenomenon.
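Here is a hypothetical sketch of this scenario in R. The therapy has zero true effect, but dropping the most severe patients from the treatment arm makes its endline depression scores look better than the control arm's:

```r
# Attrition bias: differential dropout, not the therapy, drives the gap
set.seed(1)
n <- 500
severity <- rnorm(n, mean = 20, sd = 5)  # baseline depression score
arm <- rbinom(n, 1, 0.5)                 # 1 = therapy, 0 = control
endline <- severity + rnorm(n, sd = 2)   # no true treatment effect

# the most severe treated patients are lost to follow-up
dropout <- arm == 1 & severity > quantile(severity, 0.75)

# negative difference: lower scores in the therapy arm look like improvement
mean(endline[arm == 1 & !dropout]) - mean(endline[arm == 0])
```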

9.4.7 TESTING

Repeated administrations of the same test can influence test scores, independent of the program that the test is designed to evaluate. For instance, practice can lead to better performance on cognitive assessments, and this improved performance can be mistaken as a treatment effect if there is not a comparison group. Testing threats decrease as the interval between administrations increases.

9.4.8 INSTRUMENTATION

Testing threats describe changes in how participants perform on tests over time due to repeated test administrations. When the tests themselves change over time, an instrumentation threat occurs. For example, if a study uses different microscopes or changes measurement techniques for the posttest assessment, differences in blood smear results could be incorrectly attributed to an intervention.

9.4.9 ADDITIVE AND INTERACTIVE EFFECTS

Unfortunately, a study can be subject to more than one of these threats to internal validity. Threats can work in opposite directions, or they can interact to make matters worse. For example, if Okabayashi et al. (2006) had decided to compare students who went through the malaria education program to students from another part of the country who did not, their study might have been subject to both selection and history threats. The two groups of students might have been different to begin with (selection), and they might have had different experiences over the study period unrelated to their treatment or nontreatment status (history).

9.5 The Fundamental Challenge of Causal Inference

Causal impact is the difference between what did happen and what would have happened.

In an ideal research world you could answer causal questions by cloning individuals and observing what happens when each clone receives a different treatment. Cloning isolates the causal effect of some intervention by holding everything else constant.

Cloning isn't an option, of course, so we're left with what is referred to as the fundamental challenge of causal inference: we can't observe someone in two different states at the same time. Where two roads diverge in a yellow wood, you cannot travel both. The road you do not take represents what we call the counterfactual, a hypothetical situation that we never observe directly. Causal inference is all about imagining this counterfactual—asking "what if" you had taken the road less traveled by.
