When will researchers ever learn?
I was recently sent a link to this 1985 World Health Organization (WHO) paper which examines the case for using experimental and quasi-experimental designs to evaluate water supply and sanitation (WSS) interventions in developing countries.
This paper came out nearly 30 years ago. But the problems it lists in impact evaluation study designs are still encountered today. What are these problems?
Lack of comparability of treatment and control groups, including in randomised controlled trials (RCTs): Experience in several large trials in both developed and developing countries shows that differences in secular trends, including changes caused by epidemics which disproportionately affect one of the two groups, make it very difficult to attribute observed changes to the intervention. Researchers who use black-box RCT designs, which ignore context and simply assume that randomisation will achieve balance, fall into this trap. Data must be collected on contextual factors, including other interventions in the study area.
Sample sizes are often too small: Ex ante power calculations are becoming more common, but they are still not common enough. When they are done, they often assume unrealistically large effect sizes, over-estimate compliance, under-estimate attrition, and sometimes even ignore the effect of clustering on the standard errors. All of this reduces the true power of the study. A typical power calculation determines the sample size required for 80 per cent power, but the power actually achieved is more likely around 50 per cent. This means that even if the intervention works, there is only a 50 per cent chance that the study will find that it does. An under-powered RCT is no better than tossing a coin to find out whether a successful intervention is actually working. The WHO paper suggests that even sample sizes of 120,000 households may be too small to detect the impact of a WSS intervention – yet we see many studies with sample sizes of 200 or less!
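The gap between nominal and achieved power can be made concrete with a back-of-the-envelope calculation. The sketch below uses a standard normal-approximation power formula for comparing two proportions; all the numbers (prevalences, compliance rate, design effect, sample size) are hypothetical illustrations, not figures from the WHO paper.

```python
# Illustrative power calculation for a two-arm trial of a WSS intervention.
# Compares power under optimistic planning assumptions with power under
# more realistic ones (smaller true effect, imperfect compliance, clustering).
# All parameter values are hypothetical.
import math

Z_ALPHA = 1.959964  # critical value for a two-sided 5% test


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def power_two_proportions(p_control, p_treat, n_per_arm, design_effect=1.0):
    """Approximate power for a two-sample test of proportions.

    A design_effect > 1 inflates the variance to account for cluster
    randomisation, shrinking the effective sample size per arm.
    """
    n_eff = n_per_arm / design_effect
    se = math.sqrt(p_control * (1 - p_control) / n_eff
                   + p_treat * (1 - p_treat) / n_eff)
    z = abs(p_control - p_treat) / se
    return normal_cdf(z - Z_ALPHA)


# Planning stage: assume a 40% reduction in child diarrhoea prevalence
# (0.20 -> 0.12), full compliance, and no clustering.
planned = power_two_proportions(0.20, 0.12, n_per_arm=400)

# More realistic scenario: the true reduction is only 25% (0.20 -> 0.15),
# compliance is 80% (observed treatment-arm prevalence
# 0.8 * 0.15 + 0.2 * 0.20 = 0.16), and clustering gives a design effect of 2.
actual = power_two_proportions(0.20, 0.16, n_per_arm=400, design_effect=2.0)

print(f"planned power: {planned:.2f}")
print(f"likely power:  {actual:.2f}")
```

With these illustrative inputs, the planned power comes out above the conventional 80 per cent threshold, while the more realistic scenario falls far below 50 per cent: the same study that looked well-powered on paper is, in practice, worse than a coin toss.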
Inaccurate data (misclassification bias): There are good reasons to believe that outcomes reported by those surveyed, and data on whether or not they used the intervention, are likely to be inaccurate. Better-designed instruments with cross-checks, and more triangulation, including complementary qualitative research, can help get around this problem. In the absence of such countermeasures, this (typically non-differential) misclassification biases estimates toward zero, so studies are likely to under-estimate programme impact, exacerbating the problem of low power.
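The attenuation from non-differential outcome misclassification can be worked through arithmetically. The sketch below assumes a survey instrument with hypothetical sensitivity and specificity values and shows how the observed risk difference shrinks relative to the true one; none of the figures come from the WHO paper.

```python
# Sketch of how non-differential outcome misclassification attenuates
# an estimated effect. Sensitivity and specificity are hypothetical.
SENSITIVITY = 0.80   # P(reported ill | truly ill)
SPECIFICITY = 0.95   # P(reported well | truly well)


def observed_prevalence(true_p):
    """Prevalence as measured by an imperfect survey instrument."""
    return SENSITIVITY * true_p + (1 - SPECIFICITY) * (1 - true_p)


true_control, true_treat = 0.20, 0.12       # true diarrhoea prevalence
true_effect = true_control - true_treat     # true risk difference: 0.08

obs_control = observed_prevalence(true_control)  # 0.8*0.20 + 0.05*0.80 = 0.20
obs_treat = observed_prevalence(true_treat)      # 0.8*0.12 + 0.05*0.88 = 0.14
obs_effect = obs_control - obs_treat             # observed difference: 0.06

print(f"true risk difference:     {true_effect:.3f}")
print(f"observed risk difference: {obs_effect:.3f}")
```

Because the same error rates apply in both arms, the measured effect (0.06) is smaller than the true effect (0.08) but still points in the right direction: the study is biased toward finding nothing, which is exactly why misclassification compounds the power problem.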
Ethical problems: There are legitimate ethical concerns about withholding interventions which we know to have positive impacts. For many interventions, including those in WSS, we know there is a positive impact: most WSS interventions achieve a 40-60 per cent reduction in child diarrhoea. So research resources are better devoted to questions of how to ensure sustained adoption and proper use of improved facilities. Doing so avoids the ethical problem. Yet too few researchers concern themselves with answering these practical design and implementation questions.
Time and budget constraints: Several of the above problems, such as poor survey instrument design and insufficient sample size, stem from the unrealistic time and budget constraints imposed on studies. Study time frames are often too short for intervention effects to emerge, and certainly too short to see whether they are sustained. So what is the solution?
Impact evaluation is not the only sort of evaluation, and with short time frames and small budgets it is probably better to do a high-quality process evaluation than a low-quality impact evaluation. This does not mean that impact evaluations are not required. They are needed, but resources must be strategically deployed to undertake high-quality studies which avoid these problems. Considering we have known about these problems for nearly thirty years, it is about time we learned from them and stopped making the same mistakes.