Cyber agencies issue guidance to IT developers and admins for training and operating AI systems
More free guidance on securing data used to train and operate artificial intelligence (AI) and machine learning (ML) systems is now available.
It comes from cybersecurity agencies in the U.S., the U.K., Australia and New Zealand, which today issued joint guidance for IT developers and administrators covering the AI system lifecycle and general best practices for securing data during the development, testing, and operation of AI-based systems.
These best practices include the incorporation of techniques such as data encryption, digital signatures, data provenance tracking, secure storage, and trust infrastructure, says a summary. It also provides an in-depth examination of three significant areas of data security risks in AI systems:
—data supply chain;
—maliciously modified (“poisoned”) data;
—and data drift.
Each section provides a detailed description of the risks and the corresponding best practices to mitigate those risks.
“The data resources used during the development, testing, and operation of an AI system are a critical component of the AI supply chain,” says the paper. “Therefore, the data resources must be protected and secured.”
“Data security is paramount in the development and deployment of AI systems,” it says.
“Successful data management strategies must ensure that the data has not been tampered with at any point throughout the entire AI system lifecycle; is free from malicious, unwanted, and unauthorized content; and does not have unintentional duplicative or anomalous information. Note that AI data security depends on robust, fundamental cybersecurity protection for all datasets used in designing, developing, deploying, operating, and maintaining AI systems and the ML models that enable them.”
Release of the paper comes as governments are having trouble passing AI governance legislation (Canada’s proposed Bill C-27 died when the March 2025 election was called), are refusing to pass national rules (in January, President Donald Trump issued an executive order rescinding President Biden’s executive order on the safe development and use of AI), or don’t know whether they should pass legislation similar to the EU’s AI Act.
The cyber agencies’ guidance in some ways fills that gap — assuming organizations follow it.
It builds on joint guidance the partners issued just over a year ago on deploying AI systems securely, but focuses on securing the data used to train and operate AI-based systems.
Note this guidance deals with security, not how to stop AI systems from hallucinating. (In the latest on that, an AI system used to create an article for the Chicago Sun-Times recommended forthcoming books from famous authors that, well, aren’t forthcoming.)
I’m not going through the entire guidance issued by the cyber agencies, but these steps should be highlighted:
-source reliable data and track data provenance;
-verify and maintain data integrity during storage and transportation (a hashing sketch follows this list);
-use digital signatures to authenticate trusted data revisions (see the signature sketch below);
-use only trusted IT infrastructure;
-classify data by sensitivity, then use access controls to protect each data type;
-encrypt data classified as sensitive (see the encryption sketch below);
-store data securely;
-use privacy-preserving techniques such as depersonalizing data and differential privacy where necessary (see the final sketch below);
-securely delete data before disposing of AI storage drives;
-conduct ongoing data security risk assessments.
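To make the integrity point concrete: a common approach is to record a cryptographic hash of every dataset file when it is sourced, then re-check those hashes after storage or transit. Here is a minimal Python sketch using the standard library’s hashlib; the JSON manifest format and file layout are my own illustrative assumptions, not something the guidance prescribes:

import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(data_dir: Path, manifest_path: Path) -> None:
    """Record a hash for every dataset file at ingest time."""
    manifest = {str(p): sha256_of(p)
                for p in sorted(data_dir.rglob("*")) if p.is_file()}
    manifest_path.write_text(json.dumps(manifest, indent=2))

def verify_manifest(manifest_path: Path) -> list[str]:
    """Return the files whose contents no longer match the recorded hash."""
    manifest = json.loads(manifest_path.read_text())
    return [name for name, expected in manifest.items()
            if sha256_of(Path(name)) != expected]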
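For the digital-signature step, the idea is that the publisher of a trusted data revision signs it, and consumers verify the signature before training on it. Here is a sketch using Ed25519 from the pyca/cryptography package; the key handling is simplified for illustration, since in practice the public key would be distributed through your trust infrastructure rather than generated next to the data:

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Publisher side: sign the bytes of a dataset manifest.
private_key = Ed25519PrivateKey.generate()   # in practice, loaded from secure storage
manifest_bytes = b'{"train.csv": "9f86d081..."}'
signature = private_key.sign(manifest_bytes)

# Consumer side: verify before trusting the revision.
public_key = private_key.public_key()        # in practice, obtained out of band
try:
    public_key.verify(signature, manifest_bytes)
    print("manifest signature OK - revision is trusted")
except InvalidSignature:
    raise SystemExit("signature check FAILED - do not train on this data")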
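For encrypting data classified as sensitive, authenticated symmetric encryption is the usual baseline. A minimal sketch with Fernet, also from pyca/cryptography; key management is deliberately hand-waved here, as a real deployment would fetch the key from a KMS or HSM rather than generate it inline:

from cryptography.fernet import Fernet

key = Fernet.generate_key()        # in practice, fetched from a KMS, never stored beside the data
fernet = Fernet(key)

plaintext = b"patient_id,diagnosis\n1001,..."
ciphertext = fernet.encrypt(plaintext)   # authenticated: tampering is detected on decrypt
assert fernet.decrypt(ciphertext) == plaintext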
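Finally, on the privacy-preserving bullet: differential privacy, in its simplest form, means adding calibrated noise to aggregate statistics so no single record can be singled out. A toy sketch of a differentially private count using the Laplace mechanism; the epsilon value and the counting query are illustrative assumptions:

import numpy as np

def dp_count(records: list, epsilon: float = 1.0) -> float:
    """Count records with Laplace noise; the sensitivity of a count query is 1."""
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return len(records) + noise

# Smaller epsilon = more noise = stronger privacy guarantee.
print(dp_count(["r1", "r2", "r3"], epsilon=0.5))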
Adopting these best practices and risk management strategies will help safeguard the sensitive, proprietary, and mission-critical data used in the development and operation of your AI/ML system.