When you think about how sample sizes are decided for an impact evaluation, the mental image is that of a lonely researcher laboring away at a computer, running calculations in Stata or Excel. This scenario is not too far removed from reality.
But this reality is problematic. Researchers should actually be talking to government officials or NGO implementers while making their calculations. What is often deemed ‘technical’ actually involves making several considered choices based on on-the-ground policy and programming realities.
In a recent blog here on Evidence Matters, my 3ie colleagues Ben and Eric highlighted the importance of researchers clarifying the assumptions they had made for their power and sample size calculations. One assumption that is key to power and sample size calculations is the minimum detectable effect (MDE), the smallest effect size you have a reasonable chance of detecting as statistically significant, given your sample size.
For example, to estimate the sample size required to test the effects of a conditional cash transfer (CCT) programme on children’s learning outcomes at a given power level, say 80 per cent, an MDE must be assumed. This assumption is about the minimum effect we want to be able to detect. Suppose the CCT programme is expected to increase learning outcomes by 20 per cent or more relative to the control group; the researchers might then use 20 per cent as the MDE for calculating the required sample size. Of course, the baseline value of the outcome matters: how much one expects the outcome to improve depends on the value at which one is starting.
But there is often no clear basis for knowing what number to pick as the MDE. So, researchers make a best-guess estimate or simply assume a number. Some researchers might use the estimated effect sizes of similar past interventions on selected outcomes. Others might do a thorough literature review to understand, theoretically, how large the impact should be, given the timeline of the intervention.
But how one arrives at this number has serious implications. Once you decide on an MDE of 20 per cent, your impact evaluation will have little chance of detecting any smaller difference. In other words, if the actual difference in learning outcomes between treatment and control schools is 10 per cent, a study powered for a 20 per cent difference is likely to come up with a null finding and conclude that the CCT programme has no impact on learning outcomes.
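To put rough numbers on this, here is a back-of-the-envelope sketch of a two-sided, two-sample z-test. The 20 and 10 per cent figures echo the hypothetical CCT example above; everything else (effects expressed in standard-deviation units of the outcome, an outcome SD of 1, equal arms, no clustering, and the sample size itself) is my illustrative assumption, not a calculation from any actual study:

```python
# Back-of-the-envelope power check for a two-sided, two-sample z-test.
# Assumptions (illustrative, not from the blog): effects in SD units of
# the outcome, equal arms, known variance, no clustering.
import math
from statistics import NormalDist

def approx_power(true_diff, sd, n_per_arm, alpha=0.05):
    """Approximate power to detect `true_diff` with `n_per_arm` per group."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # two-sided critical value
    se = sd * math.sqrt(2 / n_per_arm)    # SE of the difference in means
    return 1 - nd.cdf(z_alpha - true_diff / se)

n = 393  # roughly the n per arm that gives 80% power for a 0.20 SD effect
print(round(approx_power(0.20, 1.0, n), 2))  # ~0.80: the design target
print(round(approx_power(0.10, 1.0, n), 2))  # ~0.29: a null finding is likely
```

Under these assumptions, a study designed around the larger effect has well under a one-in-three chance of detecting the smaller, real one.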
And the smaller your MDE, the larger the required sample size. So, researchers always have to balance the desire for a study that can detect as small a difference as possible against the additional expense of collecting data from larger samples.
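That expense grows quickly. Under the standard normal-approximation formula (a sketch under my own assumptions: two-sided significance of 5 per cent, 80 per cent power, equal arms, MDE in standard-deviation units, no clustering or attrition), the required sample scales with the inverse square of the MDE, so halving the MDE roughly quadruples the sample:

```python
# Standard two-sample sample-size formula under a normal approximation.
# Assumptions (illustrative): two-sided alpha = 0.05, 80% power, equal
# arms, MDE in SD units of the outcome, no clustering or attrition.
import math
from statistics import NormalDist

def n_per_arm(mde, sd, alpha=0.05, power=0.80):
    """Sample size per arm to detect a difference `mde` in means."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / mde) ** 2)

print(n_per_arm(0.20, sd=1.0))  # 393 per arm
print(n_per_arm(0.10, sd=1.0))  # 1570 per arm: halving the MDE ~quadruples n
```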
Having a study that can detect as small a difference as possible between treatment and control groups is not necessarily a good thing. That is why it is important to involve decision makers and other key stakeholders who will use the evaluation in deciding the MDE. They may feel that a 10 per cent increase in learning outcomes between control and treatment groups, as a result of the CCT programme, is too low to justify investment in the programme. Spending money on a study with an MDE of 10 per cent, which would require a larger sample size, would not then be meaningful for those policymakers. If we did, wouldn’t we be spending money on a study that is ‘overpowered’?
At 3ie, many of the research proposals we receive make no mention of the policymaker’s perspective on the expected minimum improvement in outcomes required for justifying the investment being made in the programme. No one has defined what success will look like for the decision maker.
To get that important definition, researchers can ask policymakers and programme managers some simple questions at the time the research proposal is being prepared. Here are some indicative questions to illustrate: What aspects of the scheme or programme are most important to you? What outcomes should this scheme change or improve according to you? If the scheme were to be given to village A and not village B, then what is the difference you would expect to see in the outcomes for individuals in village A as compared to village B (assuming the baseline value of the outcome is the same in both villages)? Would this difference be sufficient for you to decide whether or not to roll out this programme in other districts?
Of course, as 3ie’s Executive Director, Howard White, highlights in his blog, researchers need to balance this information and ensure that the MDE is not set too large.
So, as researchers, instead of monopolizing these decisions, let us involve policymakers and programme implementers in a study’s power calculations. It’s high time we started following the conventional wisdom of setting the MDE at the minimum change in outcome that would justify, for a policymaker, the investment made in the intervention.