From Pre-Post to Difference-in-Differences: Enhancing Your Evaluation Strategy
When evaluating the impact of a program or intervention on your clients, one of the most common methods is the pre-post test design. This approach involves collecting data before and after the intervention to measure any changes. It’s straightforward, but it cannot answer an important question about the change: why it happened, or even what contributed to it.
You could run a hypothesis test on the change score and conclude, ‘there is a statistically significant difference between time A and time B.’ However, you still can’t attribute that change to the program. So how can we improve our evaluations to better understand the true effect of our programs?
Difference-in-Differences provides one accessible approach.
What is Difference-in-Differences (DiD)?
Difference-in-Differences addresses many of the challenges of pre-post test designs by introducing a comparison group that does not receive the intervention. This comparison group helps account for external factors that could influence the change score over time. Here’s how DiD works:
First, we measure the same outcomes for both the treatment group (which receives the intervention) and the control group (which doesn’t) before the intervention begins (A and B in the graph above). Second, during the evaluation period, we aim to prevent any spillover effect by keeping the groups separate from one another. We also assume that, without the intervention, the groups would follow the same trend in the outcomes over time (the parallel-trends assumption). Third, once the evaluation period concludes, we measure the same outcomes for both groups again (C and D in the graph above) and compare the two changes (C-A and D-B). By accounting for external factors in this way, DiD gives us a clearer understanding of how much of the observed change can be attributed to the intervention.
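To make the arithmetic concrete, here is a minimal sketch in Python; the group means are invented purely for illustration.

```python
# Minimal sketch of the difference-in-differences calculation.
# The group means below are made up purely for illustration.

A = 42.0  # treatment group, pre-intervention
B = 40.0  # control group, pre-intervention
C = 55.0  # treatment group, post-intervention
D = 46.0  # control group, post-intervention

change_treatment = C - A   # how much the treatment group changed
change_control = D - B     # how much the control group changed

# DiD estimate: how much more the treatment group changed than the
# parallel-trends assumption says it would have without the intervention.
did_estimate = change_treatment - change_control

print(f"Treatment change (C-A): {change_treatment:.1f}")
print(f"Control change (D-B):   {change_control:.1f}")
print(f"DiD estimate:           {did_estimate:.1f}")
```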
Example: Evaluating a Foster Care Support Program
Let’s imagine you’re evaluating two versions of a foster care support program. The newer program offers additional support but comes at a higher cost compared to the original program. You want to know if the increased cost is justified by better outcomes.
Step 1: Pre-Test
At the start of the program, you measure key outcomes for both groups: those enrolled in the newer program and those in the original program. These are your pre-intervention measurements (A & B in the graph above).
Step 2: Control for Spillover
Throughout the study, you ensure that there is no ‘spillover effect’ between the groups, meaning that clients in one group don’t influence clients in the other.
Step 3: Post-Test
After the intervention period, you measure the same outcomes again for both groups (C & D in the graph above).
Step 4: Calculate the Impact
To calculate the impact of the new program, you compare the change in outcomes for the treatment group (C-A) with the change in outcomes for the control group (D-B). This gives you the difference-in-differences: the effect of the newer program relative to the original program.
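In practice, the same comparison is often estimated with a regression that includes a group-by-period interaction term, which also yields a standard error. The sketch below uses statsmodels on simulated data; every number in it is hypothetical and only meant to show the mechanics.

```python
# Hedged sketch: estimating DiD with an interaction-term regression.
# All data below are simulated; nothing comes from a real program.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200  # hypothetical clients per group per period

df = pd.DataFrame({
    "treated": np.repeat([1, 1, 0, 0], n),  # 1 = newer program, 0 = original
    "post":    np.repeat([0, 1, 0, 1], n),  # 0 = pre-test, 1 = post-test
})

# Simulated outcome: a common +4 trend for everyone, plus a +6 effect
# that only treated clients receive in the post period.
df["outcome"] = (
    40
    + 2 * df["treated"]
    + 4 * df["post"]
    + 6 * df["treated"] * df["post"]
    + rng.normal(0, 5, size=len(df))
)

# The coefficient on the interaction term is the DiD estimate.
model = smf.ols("outcome ~ treated * post", data=df).fit()
print(model.params["treated:post"])  # close to 6 by construction
```

Framed this way, the interaction coefficient reproduces the (C-A)-(D-B) comparison from Step 4 and makes it straightforward to add covariates.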
Why Difference-in-Differences improves your claims
Using a pre-post test design without a comparison group, you could say, “The outcome changed by 15% over the time period.” While this gives you some insight, you wouldn’t be able to confidently say that the change was due to your program; it could be due to any number of external factors.
With Difference-in-Differences, however, you could say, “The newer program led to a 15% improvement in [specific outcome] compared to the standard approach to care.” By comparing trends between the groups, DiD allows you to strengthen your causal claims. You can attribute the change more confidently to the intervention.
When to use Difference-in-Differences
DiD is particularly useful when you want to understand the effect of an intervention but can’t randomly assign participants to treatment and control groups, which is common in real-world nonprofit settings. By leveraging a comparison group, you can still derive meaningful insights and make stronger causal claims about your program’s effectiveness.
What’s Next?
In our next post, we’ll explore an important question: What if the program we’re interested in evaluating has already started?
What is the difference between outcomes and impact?
Is your program REALLY making a difference?
This question is deceptively simple. We answer, “Yes, of course…” and share stories or output measurements like, “X number attended our events” or “Y number successfully completed our program.” Logic models help define the relationship between our planned work and our intended results.
Inputs: Resources needed for program activities.
Activities: Services or components of a program.
Outputs: Direct result of activities.
Outcomes: Intended benefits of activities.
It’s important to distinguish outputs from outcomes because outputs can easily inflate our perceived impact. During my freshman year of college, I started a weekly meeting in a local prison with men approaching reentry. It was a form of kind-hearted social malpractice that I thought was making a real impact. Why? The room was packed every week! And the feedback from the men was always positive. The difference between outputs (attendance) and outcomes (benefits) can be revealed by adding the follow-up “so that…” to each output: “Well, they come to the meetings, so that they can develop a reentry plan, identify housing and job prospects, and strengthen positive relationships.” These were some of the (unmeasured) program outcomes. How many of the men actually obtained stable housing and employment? Such outcome measures can also be split into short-term and long-term estimates.
But do outcomes equate to impact?
No, outcomes alone cannot fully answer the question, “Is your program making a difference?” We could show convincing pre- and post-test data with a statistically significant change. However, what outcomes alone cannot tell us is what would have happened if the person had not participated in the program. This is known as the counterfactual. The impact of a program is simply “what happened” minus “what would’ve happened without it.”
Impact = factual − counterfactual
Unfortunately, the simplicity of the concept belies its complexity in practice. Since counterfactuals aren’t observable, causal inference methods are needed to create conditions in which we can come as close as possible to observing the unobservable. In the next few posts, I’ll share the primary concepts, terms, and methods for measuring social impact. Before we jump in, let’s make sure to distinguish between the simple difference and the treatment effect (impact).
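To make the counterfactual concrete, here is a small, purely hypothetical simulation in Python. Because the data are simulated, both potential outcomes can be generated for every participant, so the true impact is known; in a real evaluation only one of the two is ever observed.

```python
# Hypothetical simulation of the potential-outcomes idea.
# Both outcomes exist here only because we simulated them.
import numpy as np

rng = np.random.default_rng(42)
n = 1_000  # hypothetical number of participants

without_program = rng.normal(50, 10, n)  # the counterfactual outcome
true_effect = 5                          # impact built into the simulation
with_program = without_program + true_effect + rng.normal(0, 2, n)  # the factual outcome

# Impact = factual - counterfactual, averaged over participants.
impact = np.mean(with_program - without_program)
print(f"Average impact: {impact:.2f}")  # close to 5 by construction

# In real data, each person contributes only one of these two columns,
# which is why causal inference methods are needed to approximate the other.
```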
Simple difference is NOT the treatment effect
Treatment effects (impact) are changes in outcomes due to changes in treatment (activities), holding all other variables constant. This last phrase is really important. To answer the question, “Is the program REALLY making a difference?” we need to isolate the effect of the program, which means that a simple difference between pre- and post-test scores cannot be the treatment effect. Post-test data could show a significant positive change during the program period, but how do we know the program is responsible for the change? We need a control group in order to ‘hold all other variables constant.’ The gold standard for such an evaluation is a randomized controlled trial (RCT); however, our focus will be social programs where resources are limited and RCTs aren’t feasible. How can we identify the effects of our programs with limited resources? That’s our focus in this series of posts.
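As a hedged illustration, the sketch below simulates a program whose true effect is small while participants also improve over time for unrelated reasons; the simple pre-post difference bundles the two together. All numbers are invented.

```python
# Simulated example: the pre-post "simple difference" mixes the program's
# effect with a background trend that would have happened anyway.
import numpy as np

rng = np.random.default_rng(7)
n = 500  # hypothetical participants

pre = rng.normal(50, 10, n)
background_trend = 8      # improvement everyone would experience regardless
true_program_effect = 3   # the effect we actually want to isolate
post = pre + background_trend + true_program_effect + rng.normal(0, 2, n)

simple_difference = np.mean(post - pre)
print(f"Simple pre-post difference: {simple_difference:.1f}")  # roughly 11
print(f"True program effect:        {true_program_effect}")    # 3

# Nothing in the pre-post data alone separates the 8-point background trend
# from the 3-point program effect; that is what a control group provides.
```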