Data Quality Part 1: Beyond Accuracy — What "Good Data" Really Means When AI Is on the Line
Most executives think data quality means one thing: is the number right? Three decades of research — and a string of nine-figure disasters — say it's actually at least seven different things, and AI is now scaling whichever one your organisation got wrong.
In Part 1 of our Data Quality in the AI Era series, James starts skeptical. Surely "is the data accurate" covers it? Why is this being made harder than it needs to be? Sarah walks him — and the listener — through what data quality actually is, the seven dimensions that matter for enterprise AI, and the killer distinction that explains most of what goes wrong: valid is not the same as accurate.
What we cover:
- Why "we cleaned the data, it's accurate now" has been doing damage for thirty years
- The seven dimensions of data quality — and why a single quality score is dangerous
- Public Health England: 15,841 COVID cases missed from the daily figures because a legacy Excel file format silently truncated rows
- NASA Mars Climate Orbiter: a $327M spacecraft lost to a unit mismatch in data that was perfectly valid
- Citigroup / Revlon: how three fields, six eyes, and one missing range check became an $894M wire transfer
- A heavy-industrial safety story where the data wasn't catastrophically wrong — it was catastrophically ambiguous
- Why AI doesn't inherit these problems gently — it scales them, in a tone of voice that sounds correct
- A teaser for Part 2: the Robodebt case, and the one question that would have prevented it
For executives, senior technology leaders, and data leaders trying to get real value from AI investment — without funding it on a foundation nobody has actually inspected.
"Polished on the surface, shaky underneath." — James
Episode length: ~21 min
Series: Data Quality in the AI Era — Part 1 of 2
References:
- The MIT Total Data Quality Management Program — https://web.mit.edu/tdqm/www/about.shtml
- Journal of Management Information Systems, Wang & Strong (1996), "Beyond Accuracy: What Data Quality Means to Data Consumers" — https://doi.org/10.1080/07421222.1996.11518099
- DAMA UK Working Group, "The Six Primary Dimensions for Data Quality Assessment" (2013) — https://www.sbctc.edu/resources/documents/colleges-staff/commissions-councils/dgc/data-quality-deminsions.pdf
- ISO/IEC 25012:2008, Software engineering — Software product Quality Requirements and Evaluation (SQuaRE) — Data quality model — https://www.iso.org/standard/35736.html
- Sambasivan et al., "Everyone wants to do the model work, not the data work: Data Cascades in High-Stakes AI", CHI 2021 — https://research.google/pubs/everyone-wants-to-do-the-model-work-not-the-data-work-data-cascades-in-high-stakes-ai/
- IBM Institute for Business Value, "2025 CDO Study: The AI multiplier effect" — https://www.ibm.com/thought-leadership/institute-business-value/en-us/report/2025-cdo
- BBC News, "Covid: 16,000 coronavirus cases missed in daily figures after IT error" (5 October 2020) — https://www.bbc.com/news/uk-54422505
- NASA, Mars Climate Orbiter Mishap Investigation Board Phase I Report (1999) — https://llis.nasa.gov/llis_lib/pdf/1009464main1_0641-mr.pdf
- Banking Dive, "Citi cites human error in accidental $900M transfer" — https://www.bankingdive.com/news/citi-cites-human-error-in-accidental-900m-transfer/584156/
- Royal Commission into the Robodebt Scheme, Final Report (7 July 2023) — https://robodebt.royalcommission.gov.au/publications/report
Related episodes:
Episode 1 — Why Data Observability Matters Before AI Scales