Global health, many would agree, is more a bunch of problems than a discipline. As such it lacks theories that can generalise findings—through an iterative process of knowledge construction, empirical testing, critique, new generalisation, and so on—into durable intellectual frameworks that can be applied not only to distinctive health problems, but to different contexts and future scenarios.
That’s psychiatrist and medical anthropologist Arthur Kleinman writing in The Lancet about the atheoretical approach that characterizes much of global health research (Kleinman 2010). Rather than test and build theory though the hypothetico-deductive model—that is, making predictions based on theory and collecting data to test these predictions—global health scholars and practitioners mostly solve problems.
Yes, you’ll find a veneer of the hypothetico-deductive model in our work, a hypothesis here and there, but across a large swath of the research landscape in global health, we test empty hypotheses that do not put theory to the test. We ask questions like, “Does
MY PROGRAM improve
IMPORTANT HEALTH OUTCOME?” We then hypothesize that, yes,
MY PROGRAM will improve health compared to
NOT MY PROGRAM, and use a statistical model to test whether the improvement is equal to zero.5 Frequently you’ll find that authors do not even explicitly state this type of hypothesis.
Skeptical? Let’s put this claim to the test. Based on the Kleinman theory of the atheoretical practice of global health6 Apologies to Professor Kleinman. If you read his article, you’ll see that he goes on from this observation to offer four social theories for global health that could help us critically reflect on illness and the practice of medicine., I hypothesize that a search of recent research articles published in a leading global health journal will turn up no evidence that the studies sought to contribute to theory. To test this hypothesis, I found the latest issue of The Lancet Global Health (July 2019, Vol 7, Num 7) and searched the 13 original research articles for the words
mechanis*. Here’s what I found: nothing. None of the articles had anything to say about theory.7 But of course we don’t say the results of any one study, especially one as poor as this, “prove” a hypothesis. Should we have expected something different?
Several studies were descriptive in nature, so we might not expect them to have much of a theoretical basis or any hypothesis testing to falsify a theory. For instance, Blencowe et al. (2019) estimated prevalence of low birth weight for 195 countries from 2000 to 2015. Hsia et al. (2019) described the use of antibiotics to treat pediatric patients across 56 countries. And Ahmed et al. (2019) estimated modern contraceptive prevalence rates across eight sub-Saharan African countries. To use Kleinman’s phrase, these articles reported on a bunch of problems.8 And good that they did. Descriptive work like this helps us to understand problems.
I don’t intend to pick on these studies. Each one makes an important contribution to the literature. Wagg et al. (2019) find that a simple group-exercise intervention helps manage urinary incontinence in rural, older Bangladeshi women who might not be able to access pharmaceutical or surgical interventions. Sullender et al. (2019) report that vaccinating children in India against influenza had short-term, indirect protective effects for unvaccinated household members. And Gureje et al. (2019) show that a stepped care intervention for depression delivered by community health workers did not result in a clinically significant benefit compared to enhanced usual care, but might be more cost-effective to implement for the same outcome.
The collection of articles did, however, include several intervention trials (Wagg et al. 2019; Sullender et al. 2019; Gureje et al. 2019) that had the potential to contribute to theory, but did not (at least not directly). Instead, these trials sought only to estimate the effect of each intervention on the primary outcome and make an evidence-based claim about whether the intervention “worked”, but not why or why not. Some would call these trials examples of “black box” evaluations. A black box evaluation is one that looks to see if
X leads to
Y—does the intervention improve some health outcome—but does not explore the mechanisms of this change (or the failure to change).
Scholars have been lamenting this black box trend for some time, and not just in global health. Chen and Rossi (1989) wrote about the need for theory-driven evaluation and criticized theoretically-empty program evaluations as being “at best social accounting studies that enumerate clients, describe programs and sometimes count outcomes.”9 See Stame (2004) for an interesting discussion of black box evaluations in the program evaluation literature. Writing about evaluation in international development, White (2009) called for theory-based impact evaluation to open the black box approach. More recently, Wilke and Humphreys (2019) observed the same trend in political science and economics with the rise of black box field experiments.
Is theory dead? Does it even matter?
Wilke and Humphreys (2019) consider this question and make a strong, affirmative case for theory. Citing economist, Nobel laureate, and black box evaluation critic Sir Angus Deaton (2010), the authors remind us that many scholars believe the whole point of research is to learn about theories—to understand how the world works. Let’s consider a few roles for theory before turning to an example of theory in action.10 Items 2-5 in this list come from Wilke and Humphreys (2019). I leave out two additional items from the authors’ original list: denying that we need theory in the first place and using experiments to estimate structural models. Read their paper to learn what you missed.
1. Use theory to improve intervention design
You’d be surprised how often organizations get funding to implement a Program
X that promises to make Outcome
Y better without very much thought about how doing
X will bring about change in
Y. In the second half of this chapter, my aim will be to convince you that it’s easy and worthwhile to think critically about this issue and develop what we call a theory of change.
2. Use theory to improve research design
For reasons that are a bit too technical for this point in our journey, theory can help us design better experiments. For instance, theory can inform the choices we make about where to conduct a study, which treatments to examine, how to recruit participants and randomize them to different study arms, and what variables to measure alongside the outcome of interest. We’ll focus on this last point in the next chapter.
3. Use theory to improve inferences
Imagine that you are a policymaker in Pakistan interested in geriatric health and you come across the paper by Wagg et al. (2019) in The Lancet Global Health (discussed above). You want to know whether the group-exercise intervention they tested in rural Bangladesh is a good use of public funds in Pakistan. Wagg et al. (2019) reported good effects in the Bangladesh trial and, despite listing several limitations, concluded (without evidence or justification) that “these results would likely be generalisable to other communities with few resources in which physiotherapists are working with groups of older people in the community”. But you might wonder if the results will really travel to your setting and hold despite some necessary contextual changes to the program design you’ll need to make. Wilke and Humphreys (2019) suggest that drawing your assumptions in a causal model could help to extend the inferences you can make from this one study in Bangladesh.
4. Use empirical studies to BUILD theory
In the same way that we can use qualitative research to observe the world and, through induction, generate ideas we can test in future studies, we can also use quantitative evaluations to generate theory. On this point you might find authors of black box evaluations quick to say, “But we do this!” In some cases this is probably true, at least in their own minds. I know that I accumulate ideas generated from my work that I do not formally document as theoretical ideas in a Discussion section of any particular article.11 To find writing about theory generation, look for articles with the word “conceptual” in the title (e.g., Li et al. 2015). My reading of Wilke and Humphreys (2019) is a reminder that empirical work is important for theory generation, even if not explicitly linked to published study results.
5. Use empirical studies to TEST theory
This brings us back to the hypothetico-deductive model of science: use theory to generate falsifiable predictions and collect data to test these hypotheses. This is the classic deductive approach to science that some would argue is missing from global health studies.
Figure 6.2: The putative role of PfATP4 in Na+ homeostasis in the malaria parasite, P. falciparum. Source: Spillman and Kirk (2015).
While much of global health might be described as atheoretical, you can find good examples of theory-informed research if you know where to look. One area to explore is basic science and early phase clinical trials where scientists develop theories about the mechanism of action of drug candidates. For instance, Spillman and Kirk (2015) reviewed studies that have tested hypotheses derived from a theory that the protein ‘PfATP4’ plays a role in helping malaria parasites to maintain a low cytosolic Na+ concentration and receive energy.
Another area is development economics. Let’s consider an experiment from Dr. Pascaline Dupas on HIV education in Kenya (Dupas 2011). I consider it to be an interesting example of theory building and testing. Download your copy here.
In 2014, an estimated 1.4 million people in Kenya were living with HIV, a prevalence rate of 5.3% among adults aged 15 to 49. Without a cure for AIDS, prevention remains critical to ending the epidemic.
Starting in 2001, Kenya integrated HIV/AIDS education into the primary school curriculum as a new prevention strategy (JPAL 2007). At the time, the focus of this program—and many other programs across sub-Saharan Africa—was complete risk avoidance, otherwise known as abstinence. Information on risk reduction was limited. Specifically, students were not learning about the differential prevalence of HIV infection by age and gender. Girls were not learning that the older “sugar daddies” who provide nice things like phones and airtime in return for sex are more likely than the girls’ goofy age mates to be infected.
An organization called International Child Support (ICS) Africa aimed to change this by rolling out a “Relative Risk Information Campaign” in Kenya. The intervention was brilliant in its simplicity. A program staffer would talk with students for 40 minutes. During this time, the staffer showed the class a 10-minute video on sugar daddies and led a discussion about cross-generational sex. During the session, the staffer reviewed results of recent studies and wrote facts about the distribution of HIV prevalence on the chalk board.
Researchers from the Abdul Latif Jameel Poverty Action Lab (JPAL) tested ICS Africa’s risk reduction strategy in a randomized experiment in Western Kenya. In the first phase (2003), 328 schools were randomized to teacher training on the national HIV prevention curriculum (Duflo et al. 2006). In the second phase (2004), 71 of these schools were stratified and randomized to receive the sugar daddy intervention (Dupas 2011). In total, there were 4 study arms: (1) teacher training only, (2) sugar daddy only, (3) teacher training and sugar daddy, and (4) nothing.
The results were shocking. Teacher training was a bust. Although the training led to a change in teaching practices—notably that trained teachers mentioned HIV in class more often than nontrained teachers—it had little effect on HIV knowledge or childbearing rates.
Increasing knowledge about HIV makes intuitive sense as an outcome for a study about HIV prevention. But why childbearing? thinks. Because it is harder to lie about having a baby than it is to lie about private sexual behavior, childbearing is considered a more objective measure of unprotected sex. Unprotected sex is also a main driver of HIV transmission, so childbearing serves as a proxy for HIV risk from unprotected sex.
In contrast, the 40-minute sugar daddy discussion and video reduced childbearing with men at least 5 years older by 65%—and not because girls started having babies with males their own age. The overall incidence of childbearing fell by 28%. With a cost of $28.20 USD per school and $0.80 per student, the cost per childbirth averted was $91 (JPAL 2007).
Figure 6.3: Written description of the relative risk reduction conceptual framework. Source: Dupas (2011).
Unlike The Lancet Global Health articles referenced above, Dupas (2011), which was published in the American Economic Journal: Applied Economics, includes a long Introduction to set up the paper, a thorough background section that discusses HIV education in Kenya, and a section called “Theoretical Motivation”.
In this section, Dupas proposes a conceptual framework that I have attempted to present visually in the figure below. This framework puts forward a theory of why teenage girls engage in unprotected sex with older men. Core to the theory is an assumption that girls do not have information about the relative risk of contracting HIV from older men compared to their male peers. As Dupas explains:
Teenage girls maximize their expected utility based on their beliefs about the risks of HIV infection and pregnancy occurring, and how those risks vary with condom use and partners’ characteristics.
Dupas goes on to derive several predictions from this framework (sounds very hypothetico-deductive, right?). Here is one such prediction:
If all men have the same reservation price for sex with teenage girls (…), information on the distribution of [HIV] prevalence among men unambiguously leads teenage girls to move toward lower-risk partners (teenage boys) and thus reduce the rate of cross-generational transmission of HIV.12 Outside of the physical sciences, most theories are probabilistic. Rather than hypothesizing that what goes up must come down, we hypothesize that coming down will be more likely.
In other words, all else equal, telling girls that older men are riskier sexual partners should lead them to have less sex with older men. That is a testable hypothesis. So what did she find?13 Econ papers are different in more ways than just structure and length. Solo authored papers like this one are very uncommon in other disciplines like public health and medicine.
in response to the RR information, teenage girls substituted away from older partners and toward protected sex with teenage partners, but not more than one-for-one.
And there you have it: proof that global health is not completely atheoretical!
Your exposure to conceptual models—and your ideas about what makes a good model—will depend in part on the disciplinary home of the literature you read most often. But at the core, every model is a simplified representation of a more complex reality. A plastic replica of the human heart is a model. So is an epidemic model of Ebola transmission. Neither one is perfect, but both are valuable tools for teaching and learning. As the mathematician and statistician George Box famously wrote, “Essentially, all models are wrong, but some are useful” (Box and Draper 1987). Let’s take a discipline-specific tour of conceptual models before landing on one of my favorites, the theory of change.
Figure 6.7: Graphical presentation of confounding in directed acyclic graphs. Identification of a minimal set of factors to resolve confounding. In (a), the backdoor path from chronic kidney disease (CKD) to mortality can be blocked by just conditioning on age, as depicted by the box around age. However if we assume that cancer also causes CKD (b), the backdoor paths can only be closed by conditioning on two factors, either age and cancer (as depicted) or cancer and dementia. Source: Suttorp et al. (2015).
A related type of causal diagram is the directed acyclic graph, or DAG (or causal diagram/graph). This idea from mathematics and computer science has been applied to observational research to identify potential confounding variables that need to be addressed to make valid causal inferences (Greenland, Pearl, and Robins 1999). Figure 6.7 shows two DAGs. In this example, Suttorp et al. (2015) shows that, at a minimum, it is necessary to control for age when estimating the relationship between chronic kidney disease and mortality (a). But if we also assume that cancer is directly related to kidney disease (b), it is also important to control for cancer. Drawing out these relationships helps clarify that it may not be necessary to control for dementia because there is no direct relationship between dementia and kidney disease. We’ll review causal diagrams in more detail in Chapter 8.
Figure 6.8: Compartmental flow of a mathematical model of the Ebola Epidemic in Liberia and Sierra Leone, 2014. Source: Rivers et al. (2014).
Epidemic models are used to explain or predict the spread of an epidemic. Kermack and McKendrick (1927) proposed a deterministic compartmental model called SIR that consists of the number of uninfected people susceptible to the disease (S), the number of infected (I), and the number of people removed (R) through death or immunization. Rivers et al. (2014) used this basic framework to create a compartmental model of the 2014 Ebola outbreak in Liberia and Sierra Leone (Figure 6.8):
Figure 6.9: Information-Motivation-Behavioral Skills structural equation model. Source: Zhang et al. (2011).
Statistical models are nondeterministic and thus incorporate stochastic variables. In other words, some of the variables being modeled have probability distributions rather than constant inputs like those in physics or mathematics. Statistical models are typically communicated as a set of equations and are visualized as a set of results. An exception is the path diagram represented visually in Figure 6.9.
Figure 6.10: Effect of increased inequality on population mortality. Source: Wildman and Shen (2014).
Economic models can be stochastic or nonstochastic. The field of econometrics shares much in common with statistics, including a focus on stochastic models. Economics more broadly, however, also uses nonstochastic models. For instance, Figure 6.10 shows the hypothesized mathematical relationship between mortality and income (Wildman and Shen 2014). This graph is based on theory and does not plot actual empirical data collected in a particular setting.
Theory of Change: It’s Easier Than You Think. Chemonics International (2018). Source: https://tinyurl.com/y5fc8m6l.
Underneath any good claim of causal inference is a theoretical model of how the researcher thinks
X actually causes
Y. Few studies set out to test a specific mechanism of impact, but many impact evaluations are designed around a theory about how the world works. This is called a theory of change. It’s a type of conceptual model used extensively in the evaluation literature.
A theory of change articulates how an intervention—or a policy, program, or treatment—is expected to impact an outcome. It explains how
Y, and what is needed for this to happen. This concept may be referred to as a theory of change, a mechanism of change, a logic model, a logical framework, a causal model, a results chain, a pipeline model, a results framework, a program theory, or one of several other combinations of these terms. Let’s review strategies for creating a theory of change diagram and its cousin the logic model as a precursor to thinking about study measurement.
Theory of change diagrams can be designed in various ways. There is no RIGHT WAY™ to create one, as long as the fundamentals of how
X leads to
Y are conveyed. If the diagram is easy to understand, it is a good diagram.
Building Adult Capabilities to Improve Child Outcomes: A Theory of Change. Center on the Developing Child at Harvard University (2013). Source: https://tinyurl.com/y4a3n25a.
Most theory of change diagrams include a few common elements. The United Kingdom’s Department for International Development, commonly known as DFID or UK Aid, commissioned a report on the uses of theories of change in international development that identified several common components:
Start with the (research) problem statement (Box 1). This box gets at the heart of the reason the intervention exists. What is the problem to solve? Although this seems obvious, too often there is a disconnect between the primary aim—solving a problem or answering a research question—and the intervention strategy.
Next, take and inventory of the assets that already exist and the needs that remain (Box 2). There is always something to build upon, so it is important to look for strengths in addition to challenges. This process is best conducted in collaboration with people impacted by the problem so that the proposed solution is grounded in their reality. If available, descriptive data sources like the DHS can help to outline boxes 1 and 2. For a more local perspective, it is often beneficial to conduct a brief needs assessment in partnership with representatives from the local community if resources permit.
Box 3 jumps to the desired results. If the program works, what will change? With the results articulated, reconsider the factors that might affect the program’s success positively or negatively (Box 4).
The next step is to outline strategies for achieving the desired results, accounting for potential barriers and facilitators (Box 5). At this point, what the program will actually do should be stated clearly.
Finally, Box 6 reminds us that every theory of change is built on a set of assumptions. It is important to be thorough and transparent when considering the hidden beliefs that underlie the ideas about how this program will achieve results.
Let’s return to our relative risk reduction example from Dupas (2011) and use this template to create a theory of change. Take a moment to compare this theory of change diagram to the previous conceptual framework.
While a theory of change tends to be a high-level depiction of the “why”, a logic model—or logframe—is more detailed and focuses on the “how”. Logical models are useful tools for program planning, monitoring program implementation, and program evaluation and reporting.
If you leave school and take a job in global health or international development—especially if you work at USAID, DFID, or one of their grantees/contractors—you’ll become very familiar with theory of change diagrams and logic models. The most common thing that former students write to tell me is that they impressed the boss by knowing how to create a logic model on Day 1. You’re welcome.
Logic models are often presented in the “results chain” or “pipeline” format shown in Figure 6.13. Inputs and activities represent the planned work. Outputs, outcomes, and impact are the intended results.
Inputs are the resources needed to implement the program, most often including people, money, program materials, and time. Activities comprise what the program will do, like trainings, events, and distribution of goods.
Outputs are counts of what the program did, such as the number of people trained, number of events held, number of goods delivered and number of people who benefitted. Conversely, outcomes are expected indicators of change—the short- and medium-term results of the program. Examples of outcomes include increased knowledge, decreased risky behavior, and improved functioning. Similar to outcomes, impacts are what you expect will happen in the long-run as a result of the program. Things like lower HIV prevalence and reduced mortality.
Judging by the eyerolls I observe in class, it must seem redundant to students to further explain the difference between outputs and outcomes. But judging by historical performance on the midterm, a bit of redundancy is warranted.
So think of it this way:
So what’s the difference between an outcome and impact? Generally speaking, impacts are longer-term changes to population-level indicators. Most studies are too small and short to actually detect any change in “impacts”. For instance, in the Kenya HIV risk reduction program the ultimate goal was to reduce HIV transmission, but the study “only” estimated the impact of the program on unprotected sex (the outcome). The assumption is that reducing unprotected sex in the short-term will reduce HIV transmission over the longer term if the program were to be scaled-up.
Figure 6.14 shows what a logic model might look like for the relative risk reduction program.
It’s been said that global health is more a bunch of problems than a discipline. If you pick up a recent article on a global health topic, chances are good that you won’t find much, if any, reference to theory. Instead, black box evaluations are the norm. These studies ask “does my intervention work, and to what degree”, but not why, why not, or how. There are compelling reasons to think that more fully integrating theory into our work would benefit our studies in terms of design and inference, while also advancing the progress of science. As shown in the Kenya relative risk reduction evaluation, a good conceptual framework can clarify the theoretical motivation underlying your intervention. One common form of conceptual model in the evaluation literature is called a theory of change. A theory of change lifts the lid on the black box and states how intervention activities are hypothesized to generate outcomes, and what assumptions must hold for this to happen. Another way to draw these assumptions is called a logic model. Both models serve as an excellent bridge to our next topic: measurement.
Page built: 2020-08-09
Eric P. Green
page built 2020-08-09