Our team is evaluating India’s National Rural Livelihoods Mission (NRLM)—one of the world’s largest poverty alleviation programs. The study spans nine states, covering over 25,000 households and 7,000 institutions. The recently completed survey focused on socio-economic indicators, institution-level data, and women’s empowerment. Along the way, we gained valuable insights into what it takes to conduct high-quality data collection in complex, real-world settings.
In this blog, the second in a two-part series, we delve into the strategies we employed to maintain data quality during the survey phase. The first part, which focused on pre-data collection preparation, is here.
Challenges in large-scale data collection
Early in the survey, we noticed a troubling pattern—several sections of the questionnaire, particularly those related to agricultural labor and household income, had missing or illogical responses. For example, a respondent might report working full-time on a farm but list no agricultural income. This prompted an urgent review: was this an issue with the survey design, respondent recall, or enumerator performance? Despite extensive training, we realized that enumerators struggled with skip patterns and instructions in the questionnaire. Some unknowingly bypassed key questions, while others misunderstood how to probe respondents for accurate financial details. Some key challenges included:
- Enumerators skipping or misinterpreting sections due to the survey’s complexity.
- Inconsistencies in responses due to respondent recall issues or enumerator biases.
- Fraudulent practices such as fabricated data entries.
- Variability in enumerator performance, affecting data reliability.
With data collected from diverse geographies, each with its own cultural and logistical challenges, we needed to devise a dynamic approach to ensure quality. We tackled these challenges head-on through real-time monitoring and iterative refinements.
Implementing high-frequency checks
Early in the survey, we identified inconsistencies in how income sources and labor participation were reported. To address this, we implemented high-frequency checks that flagged irregular data points. These checks caught responses that did not align logically (e.g., less time spent on the primary activity than on the secondary activity, or a mismatch between family labor and reported occupation) and values that fell outside the expected range (e.g., implausibly high revenue or costs). For example, if a family member reported working in agriculture in the kharif season, our system verified that at least one crop was listed in the agriculture section for the same season. These automated checks helped us detect and correct errors in real time, preventing systemic data issues.
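To make the logic of such checks concrete, here is a minimal sketch in Python with pandas. The column names (kharif_agri_labor, n_kharif_crops, primary_activity_hours, and so on) and the revenue thresholds are illustrative assumptions, not the actual survey fields or cut-offs we used.

```python
import pandas as pd

def run_hfc(interviews: pd.DataFrame) -> pd.DataFrame:
    """Return one row per flagged interview, with the reason it was flagged."""
    flags = []

    # Logical consistency: family labor reported in kharif agriculture,
    # but no kharif crop listed in the agriculture section.
    labor_no_crop = interviews[(interviews["kharif_agri_labor"] == 1)
                               & (interviews["n_kharif_crops"] == 0)]
    flags.append(labor_no_crop.assign(flag="kharif labor but no kharif crop listed"))

    # Logical consistency: less time on the primary activity than the secondary one.
    time_mismatch = interviews[interviews["primary_activity_hours"]
                               < interviews["secondary_activity_hours"]]
    flags.append(time_mismatch.assign(flag="primary activity hours below secondary"))

    # Outlier check: crop revenue outside an expected range (thresholds illustrative).
    revenue_outliers = interviews[(interviews["crop_revenue"] < 0)
                                  | (interviews["crop_revenue"] > 500_000)]
    flags.append(revenue_outliers.assign(flag="crop revenue outside expected range"))

    return pd.concat(flags)[["interview_id", "enumerator_id", "flag"]]
```

In practice, a script like this runs on each day's submissions so that flagged interviews can go back to the field team while the household is still easy to revisit.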
Field monitoring and spot checks
We had protocols in place for the team of well-trained enumerators in the field. However, field reports suggested that, in some cases, practice in the field diverged from what enumerators had learned in training. We increased field monitoring in areas with data quality issues to bridge this gap. Supervisors conducted random spot checks of about 5% of all interviews to ensure adherence to survey protocols. These checks uncovered issues like enumerators conducting interviews too quickly or skipping key questions. Corrective measures, including re-training and close supervision, improved overall data reliability.
One particularly revealing spot check involved cross-verifying responses with audio audits. We discovered that some enumerators were inadvertently leading respondents toward socially desirable answers. We reinforced the importance of neutral probing techniques by addressing this issue in debriefing sessions.
Back checks: validating enumerator surveys
To verify the accuracy of responses, we conducted back checks by re-contacting 5% of respondents. Initially, we attempted random sampling, but we realized this didn’t capture errors specific to individual enumerators. Instead, we switched to an enumerator-based sampling approach, ensuring that each enumerator’s work was proportionally represented in back checks. For example, if six enumerators worked across three villages, conducting back checks in all three locations was unnecessary. Instead, we chose one village and, within that village, selected one household per enumerator. This method kept the process efficient while maintaining a robust quality check for all enumerators.
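A minimal sketch of this enumerator-based sampling, again in pandas, assuming a table of completed interviews with hypothetical village, enumerator_id, and household_id columns:

```python
import pandas as pd

def sample_back_checks(completed: pd.DataFrame, village: str, seed: int = 42) -> pd.DataFrame:
    """Draw one completed household per enumerator within the chosen village for re-contact."""
    in_village = completed[completed["village"] == village]
    return (in_village.groupby("enumerator_id")
                      .sample(n=1, random_state=seed)
                      [["enumerator_id", "household_id", "village"]])
```

The design choice is to stratify by enumerator rather than by location: every enumerator's work enters the back-check sample, while the number of villages to revisit stays small.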
Debriefing sessions: addressing field challenges in real-time
We supported the data collection agency in reducing errors and inconsistencies by implementing regular debriefing sessions with enumerators. These sessions helped identify field barriers—respondent fatigue, reluctance to discuss income, and confusion over technical terms. Rather than waiting for these issues to be reflected in the data, we worked proactively to address them. For instance, when enumerators struggled to obtain accurate financial data, we collaborated on revising the questioning approach to make it more intuitive and context-specific.
Leveraging audio and text audits
Given the survey's scale, real-time monitoring of all interviews wasn’t feasible. To compensate, we introduced audio and text audits. We initially faced resistance from enumerators who were wary of being recorded, but by ensuring transparency and using these audits constructively, we improved adherence to survey protocols. By reviewing the timing of survey responses (text audits) and listening to actual interviews (audio audits), we caught issues that would have been impossible to detect through data review alone—such as rushed questioning or respondents being cut off mid-response. This led to refinement in our supervision approach, focusing on support rather than oversight.
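As an illustration, text-audit logs that record how long an enumerator spent on each field can be screened for rushed questioning. The sketch below assumes a long-format table with hypothetical interview_id, field_name, and seconds_on_field columns and an illustrative five-second threshold; it is not the exact audit format we used.

```python
import pandas as pd

def flag_rushed_questions(text_audit: pd.DataFrame, key_fields: list[str],
                          min_seconds: float = 5.0) -> pd.DataFrame:
    """Count, per interview, how many key questions were answered implausibly fast."""
    key_rows = text_audit[text_audit["field_name"].isin(key_fields)]
    too_fast = key_rows[key_rows["seconds_on_field"] < min_seconds]
    return (too_fast.groupby("interview_id")
                    .size()
                    .rename("n_rushed_key_questions")
                    .reset_index())
```

Interviews with many rushed key questions then become priority candidates for an audio-audit listen rather than being singled out for disciplinary action, which keeps the emphasis on support.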
Continuous adaptation: lessons learned
Data quality assurance in large-scale surveys isn’t about implementing a rigid system but about continuously adapting to ground realities. Some key takeaways:
- Performance categorization: Tracking enumerator performance over time allowed us to identify consistently underperforming individuals for targeted intervention.
- Survey duration checks: We found that extremely short interviews often indicated skipped questions, so we enforced minimum time thresholds for completion.
- Baseline consistency checks: During the endline survey phase, we matched responses to baseline data to catch inconsistencies in time-invariant indicators like land ownership details; a sketch of this and the duration check follows this list.
- Question reframing: Quality issues weren’t just about enumerator mistakes—sometimes, respondents struggled to recall details, requiring us to refine how we framed questions.
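The sketch below illustrates the survey-duration and baseline-consistency checks from the list above. The 45-minute threshold, the 0.1-acre tolerance, and the column names are illustrative assumptions rather than the study's actual parameters.

```python
import pandas as pd

def flag_short_interviews(df: pd.DataFrame, min_minutes: float = 45.0) -> pd.DataFrame:
    """Interviews far below the minimum duration often indicate skipped questions."""
    return df[df["duration_minutes"] < min_minutes][
        ["interview_id", "enumerator_id", "duration_minutes"]
    ]

def flag_baseline_mismatches(endline: pd.DataFrame, baseline: pd.DataFrame,
                             tolerance_acres: float = 0.1) -> pd.DataFrame:
    """Compare a time-invariant indicator (land owned) between endline and baseline."""
    merged = endline.merge(baseline, on="household_id", suffixes=("_end", "_base"))
    diff = (merged["land_owned_acres_end"] - merged["land_owned_acres_base"]).abs()
    return merged.loc[diff > tolerance_acres,
                      ["household_id", "land_owned_acres_base", "land_owned_acres_end"]]
```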
By integrating automated checks, field monitoring, back checks, and continuous learning loops, we built a system that adapted to challenges in real time. The NRLM impact evaluation reinforced that large-scale surveys don’t just demand technical expertise—they require the ability to problem-solve on the go.