3. Development-time threats

3. Development-time threats

3.0 Development-time threats - Introduction


Data science (data engineering and model engineering - for machine learning often referred to as training phase) uses a development environment typically outside of the regular application development scope, introducing a new attack surface. Data engineering (collecting, storing, and preparing data) is typically a large and important part of machine learning engineering. Together with model engineering, it requires appropriate security to protect against data leaks, data poisoning, leaks of intellectual property, and supply chain attacks (see further below). In addition, data quality assurance can help reduce risks of intended and unintended data issues.


  • Particularity 1: the data in the AI development environment is real data that is typically sensitive, because it is needed to train the model and that obviously needs to happen on real data, instead of fake data that you typically see in standard development environment situations (e.g. for testing). Therefore, data protection activities need to be extended from the live system to the development environment.
  • Particularity 2: elements in the AI development environment (data, code, configuration & parameters) require extra protection as they are prone to attacks to manipulate model behaviour (called poisoning)
  • Particularity 3: source code, configuration, and parameters are typically critical intellectual property in AI

ISO/IEC 42001 B.7.2 briefly mentions development-time data security risks.

Controls for development-time protection:

  • See General controls


(development-time infosec). Development data protect: protect (train/test) data, source code, configuration & parameters

This data protection is important because when it leaks it hurts confidentiality of intellectual property and/or the confidentiality of train/test data which also may contain company secrets, or personal data for example. Also the integrity of this data is important to protect, to prevent data or model poisoning.

Training data is in most cases only present during development-time, but there are exceptions:

  • A machine learning model may be continuously trained with data collected runtime, which puts (part of the) train data in the runtime environment, where it also needs protection
  • For GenAI, information can be retrieved from a repository to be added to a prompt, for example to inform a large language model about the context to take into account for an instruction or question. This principle is called in-context learning. For example OpenCRE-chat uses a repository of requirements from security standards to add to a user question so that the large language model is more informed with background information. In the case of OpenCRE-chat this information is public, but in many cases the application of this so-called Retrieval Augmented Generation (RAG) will have a repository with company secrets or otherwise sensitive data. Organizations can benefit from unlocking their unique data, to be used by themselves, or to be provided as service or product. This is an attractive architecture because the alternative would be to train an LLM or to finetune it, which is expensive and difficult. An RAG approach may suffice. Effectively this puts the repository data to the same use as training data is used: control the behaviour of the model. Therefore, the security controls that apply to train data, also apply to this run-time repository data.

Protection strategies:

  • Encryption of data at rest
    Links to standards:
  • Technical access control for the data, to limit access following the least privilege principle
    Links to standards:
    • 27002 Controls 5.15, 5.16, 5.18, 5.3, 8.3. Gap: covers this control fully, with the particularities
    • OpenCRE
    • Centralized access control for the data
      Links to standards:
    • There is no 27002 control for this
    • OpenCRE
  • Operational security to protect stored data
    Links to standards:
    • Many 27002 controls cover operational security. Gap: covers this control fully, with the particularities.
      • 27002 control 5.23 Information security for use of cloud services
      • 27002 control 5.37 Documented operating procedures
      • Many more 27002 controls (See OpenCRE link)
    • OpenCRE
  • Logging and monitoring to detect suspicious manipulation of data, (e.g. outside office hours)
    Links to standards:


(management). Development security: the security management system needs to take into account the AI particularity: the AI development infrastructure holds sensitive information - regarding people, process and technology perspective. E.g. screening of development personnel, protection of source code/configuration, virus scanning on engineering machines.

Links to standards:

  • 27001 Information Security Management System, with the particularity


(development-time infosec). Segregate data: store sensitive training or test data in a separated environment with restricted access.

Links to standards:

  • 27002 control 8.31 Separation of development, test and production environments. Gap: covers this control partly - the particularity is that the development environment typically has the sensitive data instead of the production environment - which is typically the other way around in non-AI systems. Therefore it helps to restrict access to that data within the development environment. Even more: within the development environment further segregation can take place to limit access to only those who need the data for their work, as some developers will not be processing data.


(development-time infosec). ‘Confidential compute’: If available and possible, use features of the data science environment to hide training data and model parameters from model engineers

Links to standards:

  • Not covered yet in ISO/IEC standards


(development-time datascience). Federative learning can be applied when a training set is distributed over different organizations, preventing that the data needs to be collected in a central place - increasing the risk of leaking.

Federated Learning is a decentralized Machine Learning architecture wherein a number of edge clients (e.g., sensor or mobile devices) participate in collaborative, decentralized, asynchronous training, which is orchestrated and aggregated by a controlling central server. Advantages of Federated Learning include reduced central compute, and the potential for preservation of privacy, since training data may remain local to the edge/client device.

Broadly, Federated Learning generally consists of four high-level steps: First, there is a server-to-client broadcast; next, local models are updated on the client; once trained, local models are then returned to the central server; and finally, the central server updates via model aggregation.

Federated machine learning benefits & use cases

Federated machine learning may offer significant benefits for organizations in several domains, including regulatory compliance, enhanced privacy, scalability and bandwidth, and other user/client considerations.

Regulatory compliance. In federated machine learning, data collection is decentralized, which may allow for greater ease of regulatory compliance. Decentralization of data may be especially beneficial for international organizations, where data transfer across borders may be unlawful.

Enhanced privacy. Federated learning can provide enhanced privacy, as data does not leave the client device, minimizing the potential for exposure of sensitive information. Because data is vulnerable in transit, minimizing data broadcast back to a central server may additionally provide security benefits. However, care must be taken to guard models in transit, as sensitive data may still be extracted from the models themselves.

Scalability & bandwidth. Decreased data transfer between client devices and a central server may provide significant benefits for organizations where data transfer costs are high. Similarly, federation may provide advantages in resource-constrained environments where bandwidth considerations might otherwise limit data uptake and/or availability for modeling. Further, because federated learning optimizes network resources, these benefits may on aggregate allow for overall greater capacity & flexible scalability.

Additional benefits. Because federated learning relies on a plurality of models to aggregate an update to the central model, it may provide benefits in data & model diversity. The ability to operate efficiently in resource-constrained environments may further allow for increases in heterogeneity of client devices, further increasing the diversity of available data.

Challenges in federated machine learning

Challenges in Federated Learning include managing device and model heterogeneity, latency in broadcast phases, and preservation of privacy. Security concerns also include backdoor attacks via data/model poisoning; with federated systems additionally introducing a vast network of edge clients, some of which may be malicious.

Device Heterogeneity. User- or other edge devices may vary widely in their computational, storage, transmission, or other capabilities, presenting challenges for federated deployments. These may additionally introduce device-specific security concerns, which practitioners should take into consideration in design phases. While designing for constraints including connectivity, battery life, and compute, it is also critical to consider edge device security.

Broadcast Latency & Security. Efficient communication across a federated network introduces additional challenges. While strategies exist to minimize broadcast phase latency, they must also take into consideration potential data security risks. Because models are vulnerable during transmission phases, any communication optimizations must account for data security in transit.

Preservation of Privacy. Because data remain local to the client device, Federated Learning architectures are sometimes assumed to be fully privacy-preserving. However this is not the case, as sensitive data may still be extracted from the transmitted models themselves. Data may be revealed from models intercepted in transmission, or during the process of model aggregation by the central server. For this reason, additional privacy-enhancing measures may be applied to Federated Learning settings.

Privacy-preserving mitigations for Federated Learning include cryptographic and information-theoretic strategies, such as Secure Function Evaluation (SFE), also known as Secure Multi-Party Computation (SMC/SMPC); and Differential Privacy. However, all approaches entail tradeoffs between privacy and utility.

Backdoor Attacks. Federated Learning systems have shown particular vulnerability to data- and model-poisoning attacks. Two key aspects of federated systems may exacerbate this vulnerability in production. First, Federated Learning introduces a potentially vast number of edge clients, with few guarantees against malicious actors at the device level. Second, data localization–intended to be privacy-preserving–circumvents typical centralized data curation strategies.

Recent advances in Federated (on-device) Analytics may provide tools to address some of these issues. Practitioners should implement the fullest suite of MLSecOps tools possible to detect and mitigate backdoor attacks against Federated Learning systems.

Links to standards:

  • Not covered yet in ISO/IEC standards


(development-time infosec) Supply chain management: Managing the supply chain to minimize the security risk from externally obtained elements. In regular software engineering these elements are source code or software components (e.g. open source). The particularities for AI are:

  1. supplied elements can include data and models,
  2. many of the software components are executed development-time instead of just in production (the runtime of the application),
  3. as explained in the development-time threats, there are new vulnerable assets during AI development: training data and model parameters.

Security risks in obtained data or models can arise from accidental mistakes or from manipulations - just like with obtained source code or software components.

The AI supply chain can be complex. Just like with obtained source code or software components, data or models may involve multiple suppliers. For example: a model is trained by one vendor and then fine-tuned by another vendor. Or: an AI system contains multiple models, one is a model that has been fine-tuned with data from source X, using a base model from vendor A that claims data is used from sources Y and Z, where the data from source Z was labeled by vendor B. Because of hise supply chain complexity, data and model provenance is a helpful activity. The Software Bill Of Materials (SBOM) becomes the AIBOM (AI Bill Of Materials) or MBOM (Model Bill of Material).

Standard supply chain management includes provenance & pedigree, verifying signatures, using package repositories, frequent patching, and using dependency verification tools.

As said, in AI many of the software components are executed development-time, instead of just in production. Data engineering and model engineering involve operations on data and models for which often external components are used (e.g. tools such as Notebooks, or other MLOps applications). Because AI development has new assets such as the data and model parameters, these components pose a new threat. To make matters worse, data scientists also install dependencies on the Notebooks which makes the data and model engineering environment a dangerous attack vector and the classic supply chain guardrails typically don’t scan it.

See MITRE ATLAS - ML Supply chain compromise.

Links to standards:

  • 27002 Controls 5.19, 5.20, 5.21, 5.22, 5.23, 8.30. Gap: covers this control fully, with said particularity, and lacking controls on data provenance.
  • ISO/IEC AWI 5181 (Data provenance). Gap: covers the data provenance aspect to complete the coverage together with the 27002 controls - provided that the provenance concerns all sensitive data and is not limited to personal data.
  • ISO/IEC 42001 (AI management) briefly mentions data provenance and refers to 5181 in section B.7.5
  • OpenCRE

3.1. Broad model poisoning: model behaviour manipulation by altering data, engineering, or model

Impact: Integrity of model behaviour is affected, leading to issues from unwanted model output (e.g. failing fraud detection, decisions leading to safety issues, reputation damage, liability).

The type of impact on behaviour using broad model poisoning is typically more profound than with an evasion attack, for example:

  • Backdoors - which trigger unwanted responses to specific input variations (e.g. a money transaction is wrongfully marked as NOT fraud because it has a specific amount of money for which the model has been manipulated to ignore). Other name: Trojan attack
  • Unavailability by sabotage, leading to e.g. business continuity problems or safety issues

This poisoning is hard to detect once it has happened: there is no code to review in a model to look for backdoors, the model parameters make no sense to the human eye, and testing is typically done using normal cases, with blind spots for backdoors. This is the intention of attackers - to bypass regular testing. The best approach is 1) to prevent poisoining by protecting development-time, and 2) to assume training data has been compromised.


Controls for broad model poisoning:

  • See General controls
  • See controls for development-time protection


(development-time datascience). Model ensemble: include the model as part of an ensemble, where each model is trained in a separately protected environment. If one model’s output deviates from the others, it can be ignored, as this indicates possible manipulation.

Links to standards:

  • Not covered yet in ISO/IEC standards

3.1.1. Data poisoning by changing data development-time or supply chain

The attacker manipulates (training) data to affect the algorithm’s behavior. Also called causative attacks.

Example 1: an attacker breaks into a training set database to add images of houses and labels them as ‘fighter plane’, to mislead the camera system of an autonomous missile. The missile is then manipulated to attack houses. With a good test set this unwanted behaviour may be detected. However, the attacker can make the poisoned data represent input that normally doesn’t occur and therefore would not be in a testset. The attacker can then create that abnormal input in practice. In the previous exmaple this could be houses with white crosses on the door. See MITRE ATLAS - Poison traing data Example 2: a malicious supplier poisons data that is later obtained by another party to train a model. See MITRE ATLAS - Publish poisoned datasets Example 3: false information in documents on the internet causes a Large Language Model (GenAI) to output false results. That false information can be planted by an attacker, but of course also by accident. The latter case is a real GenAI risk, but technically comes down to the issue of having false data in a training set which falls outside of the security scope. (OWASP for LLM 03)

Controls for data poisoning:

  • See General controls
  • See controls for development-time protection


(development-time datascience): More train data: increasing the amount of non-malicious data makes training more robust against poisoned examples - provided that these poisoned examples are small in number. One way to do this is through data augmentation - the creation of artificial training set samples that are small variations of existing samples.

Links to standards:

  • Not covered yet in ISO/IEC standards


(development-time datascience). Data quality control: Perform quality control on data including detecting poisoned samples through statistical deviation or pattern recognition. For important data and scenarios this may involve human verification.

Particularity for AI and security: standard quality control needs to take into account that data may have maliciously been changed. This means that extra checks can be placed to detect changes that would normally not happen by themselves. For example: safely storing hash codes of data elements, such as images, and regularly checking to see if the images have been manipulated.

A method to detect statistical deviation is to train models on random selections of the training dataset and then feed each training sample to those models and compare results.

Links to standards:

  • ISO/IEC 5259 series on Data quality for analytics and ML. Gap: covers this control minimally. in light of the particularity - the standard does not mention approaches to detect malicious changes (including detecting statistical deviations). Nevertheless, standard data quality control helps to detect malicious changes that violate data quality rules.
  • ISO/iEC 42001 B.7.4 briefly covers data quality for AI. Gap: idem as 5259
  • Not further covered yet in ISO/IEC standards


(development-time datascience) - Train data distortion:.making poisoned samples ineffective by smoothing or adding noise to training data (with the best practice of keeping the original training data, in order to expertiment with the filtering)

See also EVASTIONROBUSTMODEL on adding noise against evasion attacks and OBFUSCATETRAININGDATA to minimize sensitive data through randomisation.


  • Transferability blocking. The true defense mechanism against closed box attacks is to obstruct the transferability of the adversarial samples. The transferability enables the usage of adversarial samples in different models trained on different datasets. Null labeling is a procedure that blocks transferability, by introducing null labels into the training dataset, and trains the model to discard the adversarial samples as null labeled data.
  • Local intrinsic dimensionality
  • (weight)Bagging - see Annex C in ENISA 2021
  • TRIM algorithm - see Annex C in ENISA 2021
  • STRIP technique (after model evaluation) - see Annex C in ENISA 2021

Link to standards:

  • Not covered yet in ISO/IEC standards


(development-time datascience). Poison robust model: select a model type and approach to reduce sensitivity to poisoned training data.

Example: Reducing sensitivity to backdoor poisoning attacks with fine-pruning (See paper on fine-pruning). Fine-pruning consists of two complementary approaches: Pruning and fine-tuning:
Pruning in essence reduces the size of the model so it does not have the capacity to trigger on backdoor-examples while remaining sufficient accuracy for the intended use case. The approach removes neurons in a neural network that have been identified as non-essential for sufficient accuracy.
Fine tuning retrains a model on a clean dataset(without poisoning) with the intention to remove memorisation of any backdoor triggers.

Links to standards:

  • Not covered yet in ISO/IEC standards

3.1.2. Development-time model poisoning

This threat refers to manipulating behaviour of the model by manipulating the engineering elements that lead to the model (including the parameters during development time), e.g. through supplying, changing components, code, or configuration. In some cases, the model is trained externally and supplied as-is, which also introduces a model poisoning threat. Data manipulation is referred to as data poisoning and is covered in separate threats.


  • See General controls
  • See controls for development-time protection

3.1.3 Transfer learning attack

An attacker supplies a manipulated pre-trained model which is then unknowingly further trained/fine tuned with still having the unwanted behaviour.

Example: GenAI models are sometimes obtained elsewhere (e.g. open source) and then fine-tuned. These models may have been manipulated at the source, or in transit. See OWASP for LLM 05: Supply Chain Vulnerabilities..

Controls specific for transfer learning:

  • See General controls
  • Choose a model type resilient against a transfer learning attack

3.2. Sensitive data leak development-time

3.2.1. Development-time data leak

Impact: Confidentiality breach of sensitive train/test data.

Training data or test data can be confidential because it’s sensitive data (e.g. personal data) or intellectual property. An attack or an unintended failure can lead to this training data leaking.
Leaking can happen from the development environment, as engineers need to work with real data to train the model.
Sometimes training data is collected at runtime, so a live system can become attack surface for this attack.
GenAI models are often hosted in the cloud, sometimes managed by an external party. Therefore, if you train or finetune these models, the training data (e.g. company documents) needs to travel to that cloud.


  • See General controls
  • See controls for development-time protection

3.2.2. Model theft through development-time model parameter leak

Impact: Confidentiality breach of model intellectual property.


  • See General controls
  • See controls for development-time protection

3.2.3. Source code/configuration leak

Impact: Confidentiality breach of model intellectual property.


  • See General controls
  • See controls for development-time protection