A few years ago, I sat in on a review of an impact report for a girls' education program in northern Nigeria. The headline number was impressive: 1,200 girls enrolled. But when the evaluators dug deeper, they found that only 300 attended regularly, and 75% of those had dropped out by term three. The report never mentioned the girls who left. Their stories were lost — buried under a metric that made the funder happy.
This is not a one-off. Impact measurement, done carelessly, can distort truth. It can privilege the loudest voices, reward easy numbers, and erase the very people the program claims to serve. This article is about that deception — and how to avoid it.
Why This Topic Matters Now
The rise of 'impact washing' and donor pressure
Who gets to define success?
“A number that hides who is left behind is not a metric — it is a silence dressed up as evidence.”
— A biomedical equipment technician, clinical engineering
Real-world consequences of bad data
The worst part is not the wasted budget, though that stings. It is the redirected resources. When a report claims success using a flawed metric, the donor doubles down. More money flows into the same broken approach. Meanwhile, the intervention that actually worked, the one that required messy qualitative feedback and patient follow-up, gets defunded because its numbers did not sparkle. That hurts.
I once watched an education initiative pivot from literacy tutoring to attendance tracking simply because attendance was easier to graph. Tutors lost jobs. Kids lost the only adult who checked their homework. The report looked great. The reality? A quiet disaster.
That is why this topic matters now. Not because metrics are bad; they are essential. But because the ethical cost of measuring the wrong thing, for the wrong audience, has never been higher. We cannot afford to confuse activity with impact anymore. The data deceives only when we let it.
Core Idea in Plain Language
Data is never neutral
Every number you drop into a report arrived there because someone decided what counted. Not what is true — what counts. I once watched a team celebrate a 92% survey response rate as a win, when the missing 8% were the community members who couldn't read the language the survey was printed in. That silence was a choice. It just wasn't theirs. The moment you define a metric, you are also defining what you are willing to overlook. Attendance sheets, clinic logs, app downloads — each one carries a built-in bias about whose effort is visible and whose existence is inconvenient.
This is not an argument against measurement. It is an argument against pretending measurement is innocent. The cleaner the spreadsheet looks, the more power you had to clean it. And that power usually sits far away from the people the numbers claim to represent. One foundation I worked with kept insisting their program was 'data-driven' — yet they had never asked participants what a good outcome looked like to them. The data drove, but the destination was already chosen by someone with a grant deadline and a theory of change written in an air-conditioned office.
The catch is that most teams don't realise they are curating a story. They think they are simply reporting facts. But a fact is a fact only until you have to choose a denominator, or a cut-off date, or a way to classify the messy thing a person said into a tidy checkbox. Those choices bury more than they reveal.
The power of storytelling in numbers
Numbers tell stories — the question is whose voice gets the narrator slot. A 78% graduation rate sounds triumphant until you learn that the 22% who dropped out were all from the same village, and nobody thought to interview them about why. The aggregate pattern hides the individual rupture. Storytelling in numbers does not mean abandoning rigour; it means acknowledging that every statistic is a condensed narrative, stripped of tone, context, and the speaker's own emphasis.
That sounds fine until you are the one whose story got flattened. I have seen program staff push back against qualitative data because it 'doesn't fit the logframe.' That is a power move dressed as methodology. The real work is not to make people's experiences fit your boxes — it is to make your boxes flexible enough to hold their experiences without breaking them. A number from a participant who felt coerced into the program is a different kind of data point than one from a participant who walked in voluntarily. But most reporting tools treat them as identical. That is not accuracy. That is expedience wearing a lab coat.
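One way to keep that village from disappearing into the 78% is to report the rate per subgroup right next to the aggregate. Below is a minimal sketch in Python. It assumes a hypothetical list of student records with a village name and a graduated flag. The village names and counts are invented for illustration, and they were chosen only so the totals match the 78% example.

```python
from collections import defaultdict


def graduation_rates(records):
    """Return the overall graduation rate and a per-village breakdown.

    records: iterable of (village, graduated) pairs, graduated being a bool.
    """
    totals = defaultdict(int)
    graduates = defaultdict(int)
    for village, graduated in records:
        totals[village] += 1
        graduates[village] += int(graduated)
    overall = sum(graduates.values()) / sum(totals.values())
    by_village = {v: graduates[v] / totals[v] for v in totals}
    return overall, by_village


# Hypothetical cohort: 100 students across three invented villages.
records = (
    [("North", True)] * 40
    + [("East", True)] * 38
    + [("South", False)] * 22
)

overall, by_village = graduation_rates(records)
print(f"Overall: {overall:.0%}")
for village, rate in sorted(by_village.items()):
    print(f"  {village}: {rate:.0%}")
```

The aggregate and the breakdown come from the same records. Only the breakdown shows that every non-graduate came from one place.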
Participation vs. extraction
Extraction is when you collect data from a community, write your report, and never show them the results. Participation is when the community helps decide what data matters, how it is gathered, and what it means once you have it. The difference is not subtle — it is fundamental. Extraction treats people as sources of raw material. Participation treats them as co-authors of the evidence.
Most teams skip this because it is slow, messy, and hard to fit into a quarterly deliverable. Fair enough. But the cost of speed is credibility. If your impact report tells a story the community does not recognise, whose story is it? You can hit every target and still be wrong about what changed. The practical fix is not to throw out quantitative methods — it is to spend a tenth of your evaluation budget on sense-making sessions where participants can push back, reinterpret, and add nuance to the numbers you collected.
'We were so busy counting heads that we forgot to ask whose head they were attached to.'
— senior evaluator reflecting on a three-year health project that showed high attendance and zero behaviour change
The trade-off is real: participatory methods can introduce new biases, slow timelines, and produce data that is harder to aggregate across sites. But the alternative is a beautifully formatted report that lies by omission — and that is a worse outcome than a slightly messier one that tells a truer story.
How It Works Under the Hood
According to industry interview notes, the gap is rarely tools — it is inconsistent handoffs between steps.
The transformation chain: from raw data to polished report
Data doesn’t arrive clean. Someone pulls attendance logs from a tablet, someone else exports a spreadsheet from the program manager’s laptop, and an intern copies numbers into a donor template. Right there, at each handoff, is where the first bias sneaks in. The tablet might have recorded only the people who signed the paper sheet, not the ones who arrived late and left early. The export might drop partial rows. The intern, under pressure, might round 47.3% up to 48% because it “looks cleaner.” What usually breaks first is the connection between what happened in the field and what lands in the report. I have seen teams spend hours reconciling missing data, only to discover the field coordinator had been entering dummy test records the whole time. The transformation chain is a series of judgment calls, and each call can nudge the story away from messy reality toward a polished fiction.
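A cheap guard against silent losses at each handoff is to carry a record ID through every stage and compare the sets. The minimal sketch below assumes each stage can be reduced to a set of record IDs. The stage names, the IDs, and the `TEST-` prefix used to mark dummy records are all hypothetical.

```python
def reconcile(stages, test_prefix="TEST-"):
    """Compare record IDs across an ordered list of (stage_name, ids) pairs.

    Reports which IDs were lost or added at each handoff, and which IDs look
    like dummy test records (hypothetical convention: IDs starting with test_prefix).
    """
    report = []
    for (prev_name, prev_ids), (name, ids) in zip(stages, stages[1:]):
        lost = sorted(prev_ids - ids)
        added = sorted(ids - prev_ids)
        report.append((f"{prev_name} -> {name}", lost, added))
    all_ids = set().union(*(ids for _, ids in stages))
    dummies = sorted(i for i in all_ids if i.startswith(test_prefix))
    return report, dummies


stages = [
    ("tablet", {"A01", "A02", "A03", "A04", "TEST-1"}),
    ("spreadsheet export", {"A01", "A02", "A04", "TEST-1"}),
    ("donor template", {"A01", "A02", "TEST-1"}),
]

report, dummies = reconcile(stages)
for handoff, lost, added in report:
    print(f"{handoff}: lost {lost}, added {added}")
print("Possible dummy records:", dummies)
```

This check does not catch rounding choices like 47.3% becoming 48%. Catching those requires the raw numerator and denominator to travel with every percentage.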
Sampling bias and the tyranny of averages
Most impact reports use averages. Average attendance. Average knowledge gain. Average behavior score. Averages hide extremes, and in social impact the extremes often carry the real signal. Consider a program that serves 100 households: 80 show modest improvement, 10 deteriorate, and 10 leap forward. The average looks positive—but the 10 who deteriorated might have dropped out because the intervention was culturally inappropriate. The average buries that. Sampling bias makes it worse. If you only measure the people who showed up to the final event, you exclude dropouts—the very people who might signal failure. One NGO I audited had survey teams that skipped households without a mobile phone. That dropped 30% of their target population. The report said “95% satisfied.” The truth? They only asked the connected few. Who gets counted becomes who gets heard.
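The arithmetic behind both traps fits in a few lines. This minimal sketch makes two assumptions. First, it uses invented change scores for the 100-household example: +1 for modest improvement, -3 for deterioration, and +5 for a leap forward. Second, it treats the unsurveyed 30% as unknown rather than satisfied.

```python
from statistics import mean


def summarize_changes(changes):
    """Report the mean change alongside how many households went backwards."""
    return {
        "mean": mean(changes),
        "improved": sum(1 for c in changes if c > 0),
        "deteriorated": sum(1 for c in changes if c < 0),
    }


def satisfaction_bounds(share_satisfied, coverage):
    """Bound satisfaction across the whole target population.

    share_satisfied: proportion satisfied among those actually surveyed.
    coverage: proportion of the target population that was surveyed.
    The unsurveyed share could be anywhere from all dissatisfied to all satisfied.
    """
    lower = share_satisfied * coverage
    upper = lower + (1 - coverage)
    return lower, upper


# Hypothetical scores for the 100 households described above.
changes = [1] * 80 + [-3] * 10 + [5] * 10
summary = summarize_changes(changes)
print(summary)

# 95% satisfied among the 70% of households that had a mobile phone.
low, high = satisfaction_bounds(0.95, 0.70)
print(f"Satisfied across the target population: {low:.1%} to {high:.1%}")
```

The mean change comes out positive even though ten households got worse. The honest satisfaction claim is a range of roughly 66.5% to 96.5% of the target population, not a flat 95%.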
The role of indicators and unintended incentives
Indicators are supposed to focus attention. Instead they often distort it. Pick “number of training sessions held” as a key metric, and staff will cram sessions into every village, even when the content rushes past comprehension. Pick “percentage of participants who pass a post-test,” and facilitators will teach to the test—drilling memorized answers while real behavior stays unchanged. That sounds fine until you realize the organization’s funding depends on hitting that number. The unintended incentive is to make the data look good, not to make the program work. Most teams skip this: aligning their indicators with the messy, slow, hard-to-measure outcomes that actually matter. The trick is to balance a quantitative metric with a qualitative check—like a follow-up call to a random 10% of beneficiaries, asking open-ended questions about what changed. That catches the gap between reported success and lived experience.
‘The indicator told us people remembered three hand-washing steps. The household visit showed the soap was still in the package.’
— Program officer reflecting on a nutrition project, personal conversation
The pitfall here is that indicators feel objective. They aren’t. They are choices about what to value. If your report only tracks what’s easy to count, you are not telling the whole story—you are telling the story the numbers can tell. Which is not the same thing.
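The random follow-up call suggested above only works if staff cannot choose whom to call. This minimal sketch uses Python's standard `random` module. It assumes beneficiaries are identified by hypothetical string IDs. It also assumes the seed is recorded, so that an auditor can reproduce the same draw later.

```python
import random


def follow_up_sample(beneficiary_ids, percent=10, seed=2024):
    """Draw a reproducible simple random sample for follow-up calls.

    Rounds up (integer arithmetic) so small programs still get at least one call.
    Sorting first makes the draw independent of the order IDs were exported in.
    """
    ids = sorted(beneficiary_ids)
    k = (len(ids) * percent + 99) // 100
    rng = random.Random(seed)
    return sorted(rng.sample(ids, k))


beneficiaries = [f"B{n:03d}" for n in range(1, 238)]  # 237 hypothetical IDs
to_call = follow_up_sample(beneficiaries)
print(len(to_call), to_call[:5])
```

Rounding up with integer arithmetic avoids floating-point surprises in the sample size. Publishing the seed lets anyone confirm the list was not hand-picked.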
Walkthrough: When a Health NGO Confused Attendance with Behavior Change
Setting: maternal health program in rural Malawi
A well-funded health NGO had been running an antenatal care program in three districts of rural Malawi for eighteen months. The original pitch was elegant: train local health workers, distribute iron supplements, teach danger-sign recognition, and watch maternal mortality drop. I sat in on a quarterly review where the program director opened with a slide showing a beautiful upward curve—attendance at antenatal visits had climbed 47 percent since baseline. Smiles all around. The funder's representative nodded approvingly. Nobody asked the question sitting right in front of us: did mothers actually understand anything they heard, or were they just showing up because a community health worker walked them to the clinic?
The metric: number of women attending antenatal visits
The NGO's monitoring system tracked one thing faithfully: every woman who walked through the clinic door got counted. A fingerprint scan at intake, a checkbox on a paper form, and a weekly tally sent to headquarters. That number looked solid—clean, verifiable, easy to graph. The catch? The metric rewarded quantity over quality. Health workers quickly learned that their performance bonus hinged on keeping that graph pointing up. So they started bundling return visits. A woman who came for a tetanus shot, came back for iron refills, and came a third time for a checkup got counted three times. That's not wrong, exactly—she did attend three times—but the report implied three different women receiving care. Worse, when I asked the program manager what happened to women who attended once and never returned, she shrugged. "We don't track dropouts separately. The system wasn't built for that." There it is: the data hid the very people the program was designed to serve.
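The gap between "visits" and "women" is easy to surface if the raw log keeps an identifier for each woman. This minimal sketch assumes a hypothetical visit log of (woman_id, visit_type) rows, like the tetanus, iron-refill, and checkup visits described above.

```python
from collections import Counter


def attendance_summary(visit_log):
    """Separate total visits from distinct women, and list women seen only once."""
    visits_per_woman = Counter(woman_id for woman_id, _ in visit_log)
    return {
        "visits": len(visit_log),
        "women": len(visits_per_woman),
        "attended_once": sorted(w for w, n in visits_per_woman.items() if n == 1),
    }


visit_log = [
    ("W1", "tetanus"), ("W1", "iron refill"), ("W1", "checkup"),
    ("W2", "tetanus"), ("W2", "iron refill"),
    ("W3", "tetanus"),
    ("W4", "checkup"),
]
summary = attendance_summary(visit_log)
print(summary)
```

A dashboard that reports only the first number would show seven "attendances." The same log says four women, and two of them never came back.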
What the data hid: attendance ≠ understanding, and dropouts were invisible
We did a quick field exercise. Sixty women who had attended at least four antenatal visits were given a simple oral quiz: name three signs of pregnancy complications that mean you should go to the hospital immediately. Only twelve could name two signs. Nineteen named one. The remaining twenty-nine, almost half, couldn't name any, and none named all three. They had come to the clinic, sat through the talks, received their supplements, and left without absorbing the core safety message the program was built on. That's not their failure; it's a measurement failure. The data system never checked for comprehension because comprehension is harder to count than butts in chairs.
'We measured what moved. Attendance moved. Understanding didn't. So we pretended understanding didn't matter.'
— Program officer, upon reviewing the field quiz results
And then there were the dropouts. The NGO's dataset had 1,847 women enrolled. Digging into the raw clinic logs—the ones never entered into the tidy dashboard—we found 412 women who attended exactly once and never came back. They simply vanished from the numbers; the aggregate chart showed growth because new enrollments outpaced dropouts. But those 412 women? They were the most vulnerable—younger, poorer, less likely to own a phone or have transport money. The program was systematically failing the people who needed it most, and the data conspired to hide that. The dashboard showed a thriving program. The field showed a revolving door with a pretty coat of paint. That's the core deception: when you only count what's easy to count, you end up telling the story of the people who stay, not the people who leave. And in impact work, the people who leave are often exactly the ones whose story matters most.
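The "thriving" chart is a stock-and-flow effect. The count of active women rises whenever new enrollments outpace exits, no matter how many women leave. The minimal sketch below uses invented monthly figures, not the program's data, to show how a dashboard can climb while losses pile up.

```python
def active_counts(new_enrollments, dropouts):
    """Running count of active women, month by month.

    active[t] = active[t-1] + new_enrollments[t] - dropouts[t]
    """
    active, running = [], 0
    for new, lost in zip(new_enrollments, dropouts):
        running += new - lost
        active.append(running)
    return active


new_enrollments = [100, 120, 140, 160]  # hypothetical
dropouts = [30, 45, 60, 75]             # hypothetical: rising every month

active = active_counts(new_enrollments, dropouts)
print("Active women:", active)
print("Women lost so far:", sum(dropouts))
```

The active line goes up every month even as monthly losses grow. Only the second printout, which the dashboard never showed, reveals the revolving door.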
Edge Cases and Exceptions
Survivor-led data: when participation is re-traumatizing
I once sat in on a focus group where a facilitator asked survivors of domestic violence to rate their 'safety confidence' on a 1–10 scale. A woman in the back row folded her arms and whispered: 'You want me to prove my pain with a number so you can get funding?' She wasn't wrong. Standard measurement assumes distance—that a person can relive a memory, assign it a digit, and walk away clean. That assumption breaks in trauma contexts. The act of counting can re-open a wound. I have seen organizations quietly drop 'empowerment' metrics after staff realized the survey itself triggered flashbacks. The trade-off here is brutal: collect the data you need to prove impact, or protect the people you claim to serve. Some survivors choose silence. That silence is not a failure of evidence—it is a boundary. And boundaries, if you respect them, teach you more than a filled-in Likert scale ever will.
The catch is that funders rarely accept 'we respected their silence' as an outcome metric. Most teams skip this: they never ask which questions cause harm until someone breaks down in a debrief. By then, the damage is done. One rule of thumb worth stealing—run every question past a community advisory group before you field it. They will tell you what stings. Listen.
'They asked us to draw our trauma on a map. I drew a river. They wrote "proximity to water source." I meant where I was raped.'
— Community health worker, rural outreach program debrief, 2022
Informal economies that resist quantification
Counting what happens in an informal settlement is like trying to photograph fog. There is no payroll. No clinic registration. No fixed address. A woman sells vegetables at the roadside—her income fluctuates with rain, police checks, and the day her nephew gets malaria. Standard impact tools demand a baseline: 'How much did you earn last month?' She cannot answer. Not because she is hiding money, but because her economy runs on debt cycles, barter, and survival triage. Push her to give a number and she will guess—and that guess becomes your 'evidence.'
That sounds fine until your report claims income increased by 40% when actually she just had a good Tuesday. The real harm is subtler: by forcing informal systems into formal categories, you erase the coping strategies that keep people alive—borrowing from neighbors, splitting a bag of rice seven ways, skipping a meal so a child can eat school lunch. Those strategies do not fit in a spreadsheet. They look like 'no data.'
I fixed this once by scrapping the income question entirely. Instead we asked: 'In the last week, how many times did you skip a meal so your child could eat?' That gave us a proxy that respected how the economy actually worked. Ugly metric. Honest one.
Cultural taboos and what cannot be said
In some communities, naming a stillborn baby is considered dangerous—it invites the spirit back. So when a health NGO asks 'How many children have you lost?' the answer is always zero. Not because the loss never happened, but because speaking it aloud breaks a taboo older than the survey. Most measurement frameworks treat silence as absence. That is a mistake. A blank field may be the most truthful response in the room. The tricky bit is that your logic model cannot digest a blank. It needs a number. So what do you do? Fabricate a proxy? Or admit that some stories cannot be counted—only honored?
The exception to the counting rule is this: let communities define what is measurable. I have seen a women's cooperative reject a 'household decision-making index' because, in their culture, no woman would ever say 'I decide alone'; that would shame her husband. Instead they tracked 'how often she felt heard in family discussions.' Subjective. Squishy. And far more honest. The limit of the approach is that qualitative proxies do not fit neatly into dashboards. But whose problem is that: the community's, or the measurement framework's? Expecting the culture to fit the tool gets the order wrong. You design the tool *for* the culture, not the other way around. That means sometimes you walk away from a metric altogether. And that is not failure. It is respect.
Limits of the Approach
No perfect system: the trade-offs of mixed methods
I once sat with a team that had designed what looked like the ideal measurement framework—surveys, focus groups, and a control cohort. Six months in, they had rich qualitative data from three villages and zero usable quantitative trends. The reason? The survey instrument had been translated poorly, and the control group had dissolved because community leaders insisted everyone receive the intervention. That’s the hard truth about mixed methods: they demand more than just a checkbox approach. Each layer of data adds depth, yes—but also a new seam where things blow out. The trade-off is constant. Do you prioritize statistical rigor knowing it might alienate local participants? Or do you lean into participatory methods and accept that your sample may never reach significance? Honest answer: either choice leaves something on the table.
Resource constraints and the cost of ethical measurement
Ethical impact measurement is slow. That hurts when funders expect quarterly numbers. I have seen organizations cut participatory workshops, the very spaces where community voice emerges, because they cost four times as much as an online form nobody reads. The catch is that cheap data often tells the wrong story. A survey with a 90% response rate sounds great until you realize field staff nudged respondents toward favorable answers to hit targets. Real ethical rigor means training enumerators, translating instruments and back-translating them to check that the meaning survived, and budget lines for community translators. Most teams simply don’t have that cash. So what happens? They choose the less ethical path, not out of malice, but out of necessity. That’s a system failure, not a character flaw.
Consider this: if your entire monitoring budget covers two staff and a laptop, you cannot run focus groups across seven languages. You default to what fits. The result is impact reports that privilege the voices easiest to reach—English-speaking, literate, near the road. We fixed this at one small NGO by alternating: intense qualitative work every two years, leaner surveys in between. Not perfect, but it broke the cycle of annual reports that only told one story. Still, even that compromise costs time and political capital with boards who want neat year-over-year numbers.
When participation becomes a checkbox
The idea sounds noble: ‘Let communities define what success looks like.’ In practice, I’ve seen this become a box that gets ticked and forgotten. A health team invited village elders to a ‘co-design’ session, served tea, took photos—then used the exact same indicators they had already written. The elders knew. They told me later, “They listened politely and then did what they planned anyway.” That is not participation; it’s performance. The limit is not the tool but our willingness to actually change course based on what we hear. Ethical measurement fails when we treat community input as a procedural step rather than a steering mechanism. A key insight: if your findings never surprise you, you’re probably not listening hard enough.
‘The most ethical measurement in the world is useless if nobody can afford to do it twice.’
— Field director, after watching a three-year longitudinal study collapse under funding cuts
Reader FAQ
How do I know if my data is extractive?
You feel it before you prove it. I once watched a program officer run a focus group where every question started with “tell us what you lack.” Not once did she ask what the community already knew or had built. That’s the fingerprint — your data process takes, but never gives back. Extractiveness lives in the rhythm: you collect, you leave, you publish. No checking in. No sharing raw findings with participants. The real test is simple: would you show the household you surveyed the full report before it goes to your board? If that thought makes you wince, your data is extractive. Fix it by offering a two-page plain-language summary to every respondent. Do that for one quarter. Watch how the tone of your subsequent interviews shifts — people stop guarding their stories.
Can we ever trust impact numbers?
Not blindly — and that’s fine. Trust doesn’t require perfection. It requires traceability. A number becomes trustworthy when I can follow the chain from raw observation to final percentage: Who counted? What was their bias? Did the survey happen after a meal distribution — when people felt grateful and inflated their answers? I have seen NGOs report a 94% satisfaction rate, but the question was asked right after a free lunch. That’s a measurement choice, not a lie — but it’s still deceptive. The honest fix? Publish your denominator. Show the drop-off. If you started with 800 enrolled and ended with 62 valid surveys, say it. That transparency doesn’t weaken your report — it gives skeptics a reason to believe.
“We stopped arguing about whether the number was true. We started arguing about whether the number was useful.”
— Field director, after her team rebuilt their M&E framework from scratch
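Publishing the denominator, as suggested above, can be as simple as printing the funnel from enrollment to valid survey next to the headline. This minimal sketch uses hypothetical stage names. Only the 800 enrolled and the 62 valid surveys come from the example. The intermediate counts are invented.

```python
def attrition_funnel(stages):
    """Format each stage's count, its share of enrollment, and the loss since the previous stage.

    stages: ordered list of (label, count) pairs, starting with enrollment.
    """
    start = stages[0][1]
    lines = []
    for i, (label, count) in enumerate(stages):
        line = f"{label:<26}{count:>5}  {count / start:7.1%} of enrolled"
        if i > 0:
            line += f"  (lost {stages[i - 1][1] - count})"
        lines.append(line)
    return lines


stages = [
    ("Enrolled", 800),
    ("Still active at endline", 310),  # hypothetical
    ("Reached for survey", 140),       # hypothetical
    ("Valid surveys", 62),
]
lines = attrition_funnel(stages)
for line in lines:
    print(line)
```

A reader who sees 62 valid responses out of 800 enrolled will read any satisfaction percentage very differently from a reader who sees the percentage alone.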
What small steps can I take this quarter?
Three moves, none of which require a new software license. First, audit one indicator — pick the metric your last report led with. Ask: does this measure activity or outcome? “Training sessions held” is activity. “Participants who changed a practice three months later” is outcome. If you’re reporting the first while implying the second, rewrite the indicator. Second, run a silent test: have someone outside your team re-analyze a slice of your raw data without seeing your previous conclusions. The gap between their findings and yours is your blind spot. We did this once and discovered our survey had accidentally primed respondents with the word “healthy” before every nutrition question — we had manufactured a 15-point inflation. Third, change one question in your most-used survey from a scale (“rate from 1 to 5”) to a narrative opener (“describe what changed for you”). That single swap will gut-punch you with stories your numbers were hiding. Do these three things before your next impact report. Not because they are easy — because the alternative is telling a comfortable lie.
Practical Takeaways
Audit your indicators for power
Pull your last impact report. Circle every number that claims to measure a person’s change—attendance rates, training completions, surveys collected. Now ask: Who decided this mattered? I once watched a youth program celebrate “85% retention” while every single exit interview mentioned boredom. The indicator served the funder’s spreadsheet, not the young people’s experience. The fix is uncomfortable but fast: map each metric back to the community’s own definition of progress. If your indicator measures something done to someone rather than changed in their life, flag it. That does not mean kill it—funders need numbers. But add a second line in your report: “This number tells you X; it does not tell you Y.”
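The audit can live in the same file as the report numbers, so the caveat line never gets separated from the figure. The sketch below is minimal. The `Indicator` fields and the activity/outcome labels are assumptions. The example values reuse the youth program's 85% retention, with a hypothetical owner for the metric.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str
    value: str
    decided_by: str  # who chose this metric
    kind: str        # "activity" (done to someone) or "outcome" (changed in their life)
    tells: str
    does_not_tell: str

    def report_lines(self):
        """Render the figure, its caveat line, and a flag for activity indicators."""
        lines = [
            f"{self.name}: {self.value} (chosen by {self.decided_by})",
            f"This number tells you {self.tells}; it does not tell you {self.does_not_tell}.",
        ]
        if self.kind == "activity":
            lines.append("Flag: measures activity, not change in participants' lives.")
        return lines


retention = Indicator(
    name="Retention",
    value="85%",
    decided_by="funder logframe",  # hypothetical
    kind="activity",
    tells="how many young people stayed enrolled",
    does_not_tell="whether they found the program worth their time",
)
print("\n".join(retention.report_lines()))
```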
Invest in qualitative methods and community feedback loops
Numbers flatten stories. A seventy-percent increase in clinic visits sounds triumphant—until you talk to the patients who walked three hours for a ten-minute lecture. What usually breaks first is the assumption that participation equals benefit. We fixed this by embedding a simple “listening session” into our quarterly reporting cycle: three open-ended questions, recorded and transcribed, no more than twelve people. The catch? You have to actually publish what you hear. One NGO I advised found that their flagship women’s empowerment program was triggering marital conflict at home—a detail no survey caught. They published it. Lost one grant, gained three because the honesty was rare. A number can tell you scale; a voice tells you direction.
“The most ethical report is the one that leaves the reader slightly less certain—not more certain—of the easy story.”
— Program officer reflecting on a failed evaluation, paraphrased
Publish limitations and uncertainties
Most teams skip this because it feels like admitting failure. That hurts—because the opposite is true. Declaring “we do not know if this change lasted six months” is not weakness; it is the only honest position. Create a dedicated section in your report titled “What We Cannot Say.” List: sample size gaps, attrition patterns you ignored, control-group compromises. One climate project I encountered reported that tree survival rates were “monitored monthly”—but the footnote (buried on page 17) admitted they only checked four of twelve plots. The deceptive part was not the survival rate. It was the smooth confidence. Here is the trade-off: publishing limitations may reduce your score on a funder’s checklist. But it protects your reputation from the far worse hit of being caught later. Start small—one margin of error, one missing data point, named. Next report, add two. That is not a retreat from accountability. It is the first real step toward it.
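Naming one margin of error is mostly arithmetic. The minimal sketch below computes the usual normal-approximation 95% interval for a proportion. It assumes a simple random sample, which is often the first assumption to fail in the field. It also says nothing about the non-response and attrition that a "What We Cannot Say" section should list alongside it. The 62 surveys reuse the earlier FAQ example, and the 90% proportion is hypothetical.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion.

    z=1.96 corresponds to roughly 95% confidence. Assumes a simple random
    sample; unreliable when n is small or p is close to 0 or 1.
    """
    return z * math.sqrt(p * (1 - p) / n)


p, n = 0.90, 62  # hypothetical: 90% of 62 valid surveys
moe = margin_of_error(p, n)
print(f"{p:.0%} ± {moe:.1%} (n = {n}, simple random sample assumed)")
```

With only 62 responses, a 90% headline carries a margin of roughly seven to eight points. That single sentence belongs in the report.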