Data Quality Part 1: Beyond Accuracy — What "Good Data" Really Means When AI Is on the Line

Most executives think data quality means one thing: is the number right? Three decades of research — and a string of nine-figure disasters — say it's actually at least seven different things, and AI is now scaling whichever one your organisation got wrong.

In Part 1 of our Data Quality in the AI Era series, James starts skeptical. Surely "is the data accurate" covers it? Why is this being made harder than it needs to be? Sarah walks him — and the listener — through what data quality actually is, the seven dimensions that matter for enterprise AI, and the killer distinction that explains most of what goes wrong: valid is not the same as accurate.
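The gap between valid and accurate can be sketched in a few lines of Python. This is a minimal illustration, not anything taken from the episode: the `is_valid` rule, the `Impulse` class, and the sample reading are hypothetical, and the scenario (a value produced in pound-force seconds but read as newton-seconds) is a simplified stand-in for the unit mismatch covered below.

```python
import math
from dataclasses import dataclass

# Exact conversion factor: newton-seconds per pound-force second.
LBF_S_TO_N_S = 4.4482216152605


def is_valid(impulse: float) -> bool:
    """Validity: right type, finite, inside a plausible range (hypothetical rule)."""
    return isinstance(impulse, float) and math.isfinite(impulse) and 0.0 < impulse < 1000.0


@dataclass(frozen=True)
class Impulse:
    """A reading that carries its unit, so the consumer cannot silently assume one."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown unit: {self.unit!r}")


# Hypothetical reading: produced in lbf*s, consumed as if it were N*s.
reported = 10.0
passes_validation = is_valid(reported)  # the bare number looks fine
true_value = Impulse(reported, "lbf*s").to_newton_seconds()
error_factor = true_value / reported  # how far off the "valid" number is

print(f"valid: {passes_validation}, off by a factor of {error_factor:.3f}")
```

The validity rule passes because the bare number is well-formed. Only the explicit unit exposes that the consumer's reading is wrong by a factor of about 4.45.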

What we cover:

  • Why "we cleaned the data, it's accurate now" has been doing damage for thirty years
  • The seven dimensions of data quality — and why a single quality score is dangerous
  • Public Health England: 15,841 COVID cases lost because an Excel file silently truncated rows
  • NASA Mars Climate Orbiter: a $327M spacecraft lost to a unit mismatch that was perfectly valid
  • Citigroup / Revlon: how three fields, six eyes, and one missing range check became an $894M wire transfer
  • A heavy-industrial safety story where the data wasn't catastrophically wrong — it was catastrophically ambiguous
  • Why AI doesn't inherit these problems gently — it scales them, in a tone of voice that sounds correct
  • A teaser for Part 2: the Robodebt case, and the one question that would have prevented it
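The silent-truncation failure in the list above is the kind of problem a completeness check catches. The sketch below assumes a loader that keeps only the first 65,536 rows, the worksheet limit of the legacy .xls format. The function names and the 70,000-record sample are hypothetical and are not figures from the Public Health England incident.

```python
XLS_MAX_ROWS = 65_536  # worksheet row limit of the legacy .xls format


def load_with_row_cap(records: list[str], cap: int = XLS_MAX_ROWS) -> list[str]:
    """Hypothetical loader that silently keeps only the first `cap` rows."""
    return records[:cap]


def missing_rows(source_count: int, loaded_count: int) -> int:
    """Completeness check: how many source records never arrived."""
    if loaded_count > source_count:
        raise ValueError("loaded more rows than the source reported")
    return source_count - loaded_count


# Hypothetical batch that is larger than the row cap.
source = [f"case-{i}" for i in range(70_000)]
loaded = load_with_row_cap(source)

# Every loaded row is individually valid and accurate; the dataset is still incomplete.
gap = missing_rows(len(source), len(loaded))
if gap:
    print(f"completeness check failed: {gap} of {len(source)} records missing")
```

No per-row check would flag this batch. The problem only appears when the loaded count is reconciled against a count taken at the source, which is one reason a single quality score can hide more than it shows.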

For executives, senior technology leaders, and data leaders trying to get real value from AI investment without building it on a foundation nobody has actually inspected.

"Polished on the surface, shaky underneath." — James

Episode length: ~21 min
Series: Data Quality in the AI Era — Part 1 of 2

References:

  • The MIT Total Data Quality Management Program — https://web.mit.edu/tdqm/www/about.shtml
  • Wang & Strong (1996), "Beyond Accuracy: What Data Quality Means to Data Consumers", Journal of Management Information Systems 12(4) — https://doi.org/10.1080/07421222.1996.11518099
  • DAMA UK Working Group, "The Six Primary Dimensions for Data Quality Assessment" (2013) — https://www.sbctc.edu/resources/documents/colleges-staff/commissions-councils/dgc/data-quality-deminsions.pdf
  • ISO/IEC 25012:2008, Software engineering, Software product Quality Requirements and Evaluation (SQuaRE), Data quality model — https://www.iso.org/standard/35736.html
  • Sambasivan et al., "Everyone wants to do the model work, not the data work: Data Cascades in High-Stakes AI", CHI 2021 — https://research.google/pubs/everyone-wants-to-do-the-model-work-not-the-data-work-data-cascades-in-high-stakes-ai/
  • IBM Institute for Business Value, "2025 CDO Study: The AI multiplier effect" — https://www.ibm.com/thought-leadership/institute-business-value/en-us/report/2025-cdo
  • BBC News, "Covid: 16,000 coronavirus cases missed in daily figures after IT error" (5 October 2020) — https://www.bbc.com/news/uk-54422505
  • NASA, Mars Climate Orbiter Mishap Investigation Board Phase I Report (1999) — https://llis.nasa.gov/llis_lib/pdf/1009464main1_0641-mr.pdf
  • Banking Dive, "Citi cites human error in accidental $900M transfer" — https://www.bankingdive.com/news/citi-cites-human-error-in-accidental-900m-transfer/584156/
  • Royal Commission into the Robodebt Scheme, Final Report (7 July 2023) — https://robodebt.royalcommission.gov.au/publications/report


Related episodes:
Episode 1 — Why Data Observability Matters Before AI Scales
