Sarah and James open the three-part Data Security for AI series with a simple argument: AI is only as trustworthy as the data underneath it.
What we cover
The adoption gap: Gartner expects 40% of enterprise apps to embed AI agents by end‑2026 (up from <5%). IBM’s 2025 Cost of a Data Breach Report found 13% of organisations have had an AI-related breach — 97% lacked proper access controls.
Structured vs unstructured data: IDC estimates 80–90% of enterprise data is unstructured. Varonis found only 1 in 10 organisations have labelled files, and 88% still have “ghost” accounts. Point a copilot at that estate and every overshared file is exposed.
The incident catalogue: Samsung engineers pasting source code into ChatGPT (2023). Microsoft’s AI team exposing 38 TB — via a misconfigured Azure SAS token. DeepSeek’s ClickHouse leak exposing chat histories and API keys (2025).
Liability is real: Moffatt v. Air Canada (2024), where the airline argued its chatbot was a separate legal entity — and lost. NYC’s MyCity chatbot.
Shadow AI: IBM found shadow-AI breaches cost US$670K more and make up 20% of incidents.
Memorisation: Carlini et al. (ICLR 2023) showed models memorise training data based on size, duplication, and prompt context — sensitive data should be treated as eventually leakable.
Sources
Gartner 40% forecast: https://finance.yahoo.com/news/40-enterprise-apps-embed-ai-181310288.html
IBM 2025 Cost of a Data Breach: https://www.ibm.com/reports/data-breach
IBM analysis (97%, US$670K): https://www.kiteworks.com/cybersecurity-risk-management/ibm-2025-data-breach-report-ai-risks/
IDC unstructured data: https://blog.box.com/90-percent-unstructured-data
Varonis 2025 State of Data Security: https://www.varonis.com/blog/state-of-data-security-report
Samsung ChatGPT leak: https://www.pcmag.com/news/samsung-software-engineers-busted-for-pasting-proprietary-code-into-chatgpt
Microsoft 38 TB exposure: https://www.wiz.io/blog/38-terabytes-of-private-data-accidentally-exposed-by-microsoft-ai-researchers
DeepSeek ClickHouse exposure: https://www.wiz.io/blog/wiz-research-uncovers-exposed-deepseek-database-leak
Moffatt v. Air Canada (Forbes): https://www.forbes.com/sites/marisagarcia/2024/02/19/what-air-canada-lost-in-remarkable-lying-ai-chatbot-case/
NYC MyCity (The Markup): https://themarkup.org/artificial-intelligence/2024/04/02/malfunctioning-nyc-ai-chatbot-still-active-despite-widespread-evidence-its-encouraging-illegal-behavior
Cisco 2024 Privacy Benchmark: https://www.cisco.com/c/dam/en_us/about/doing_business/trust-center/docs/cisco-privacy-benchmark-study-2024.pdf
Carlini et al., ICLR 2023:
Send us Feedback