AI Security Overview

Summary - How to address AI Security?

See home for more information about this initiative, how to contribute or connect.
This page contains an overview of AI security, and the next pages provide the main content: details on security threats to AI and the controls against them.

While AI offers powerful performance boosts, it also increases the attack surface available to bad actors. It is therefore imperative to approach AI applications with a clear understanding of potential threats and which of those threats to prioritize for each use case. Standards and governance help guide this process for individual entities leveraging AI capabilities.

  • Implement AI governance.
  • Extend security and development practices to include data science activities, especially to protect and streamline the engineering environment.
  • Improve regular application and system security through an understanding of AI particularities, e.g. model parameters need protection and access to the model needs to be monitored and rate-limited (see the sketch after this list).
  • Limit the impact of AI by minimizing privileges and adding oversight, e.g. guardrails and human oversight.
  • Apply data science countermeasures based on an understanding of model attacks, e.g. data quality assurance, larger training sets, detection of common perturbation attacks, and input filtering.
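
To illustrate the monitoring and rate-limiting point above, here is a minimal sketch of a guarded model endpoint. The `predict` stub, the per-client limit and the logging setup are assumptions for illustration; a real deployment would typically use an API gateway or dedicated middleware.

```python
# Minimal sketch: rate-limit and log access to a model endpoint.
# The model call (`predict`) and the per-client limit are illustrative assumptions.
import time
import logging
from collections import defaultdict, deque

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model-access")

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 30          # assumed policy, tune per use case
_requests = defaultdict(deque)        # client_id -> timestamps of recent calls

def predict(model_input):
    """Placeholder for the actual model inference call."""
    return {"score": 0.5}

def allow_request(client_id: str) -> bool:
    """Sliding-window rate limiter to slow down model theft and DoS attempts."""
    now = time.time()
    window = _requests[client_id]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_WINDOW:
        log.warning("Rate limit exceeded for client %s", client_id)
        return False
    window.append(now)
    return True

def guarded_predict(client_id: str, model_input):
    if not allow_request(client_id):
        raise PermissionError("Too many requests; try again later.")
    log.info("Model call by %s", client_id)      # monitored usage
    return predict(model_input)                  # hypothetical model call
```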

Threats overview

Threat model

We distinguish three types of threats: during development-time (when data is obtained and prepared, and the model is trained or obtained), through using the model (providing input and reading the output), and by attacking the system during runtime (in production). The diagram shows the threats in these three groups as arrows. Each threat has a specific impact, indicated by letters referring to the Impact legend. The Controls overview section contains this diagram with groups of controls added.

AI Security Threats

AI Security Matrix

The AI security matrix below shows all threats and risks, ordered by type and impact.

Controls overview

Threat model with controls - general

The diagram below puts the controls of the AI Exchange into groups and places these groups in the relevant lifecycle stage, together with the corresponding threats.

AI Security Threats and controls

The groups of controls form a summary of how to address AI security (controls are in capitals):

  1. AI Governance: implement governance processes for AI risk, and include AI into your processes for information security and software lifecycle:

    (AIPROGRAM, SECPROGRAM, DEVPROGRAM, SECDEVPROGRAM, CHECKCOMPLIANCE, SECEDUCATE)

  2. Apply conventional technical IT security controls in a risk-based manner, since an AI system is an IT system:
    • 2a Apply standard conventional IT security controls (e.g. ISO/IEC 15408, ASVS, OpenCRE, ISO 27001 Annex A, NIST SP 800-53) to the complete AI system and don't forget the new AI-specific assets:
      • Development-time: model & data storage, model & data supply chain, data science documentation:

        (DEVDATAPROTECT, DEVSECURITY, SEGREGATEDATA, SUPPLYCHAINMANAGE, DISCRETE)

      • Runtime: model storage, model use, plug-ins, and model input/output:

        (RUNTIMEMODELINTEGRITY, RUNTIMEMODELIOINTEGRITY, RUNTIMEMODELCONFIDENTIALITY, MODELINPUTCONFIDENTIALITY, ENCODEMODELOUTPUT, LIMITRESOURCES)

    • 2b Adapt conventional IT security controls to make them more suitable for AI (e.g. which usage patterns to monitor for):

      (MONITORUSE, MODELACCESSCONTROL, RATELIMIT)

    • 2c Adopt new IT security controls:

      (CONFCOMPUTE, MODELOBFUSCATION, PROMPTINPUTVALIDATION, INPUTSEGREGATION)

  3. Data scientists apply data science security controls in a risk-based manner:
    • 3a Development-time controls when developing the model:

      (FEDERATIVELEARNING, CONTINUOUSVALIDATION, UNWANTEDBIASTESTING, EVASIONROBUSTMODEL, POISONROBUSTMODEL, TRAINADVERSARIAL, TRAINDATADISTORTION, ADVERSARIALROBUSTDISTILLATION, FILTERSENSITIVETRAINDATA, MODELENSEMBLE, MORETRAINDATA, SMALLMODEL, DATAQUALITYCONTROL)

    • 3b Runtime controls to filter and detect attacks (see the sketch after this list):

      (DETECTODDINPUT, DETECTADVERSARIALINPUT, DOSINPUTVALIDATION, INPUTDISTORTION, FILTERSENSITIVEMODELOUTPUT, OBSCURECONFIDENCE)

  4. Minimize data: limit the amount of data at rest and in transit, and the time it is stored, both at development-time and runtime:

    (DATAMINIMIZE, ALLOWEDDATA, SHORTRETAIN, OBFUSCATETRAININGDATA)

  5. Control behaviour impact as the model can behave in unwanted ways - by mistake or by manipulation:

    (OVERSIGHT, LEASTMODELPRIVILEGE, AITRANSPARENCY, EXPLAINABILITY, CONTINUOUSVALIDATION, UNWANTEDBIASTESTING)
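
As referenced under 3b, the sketch below shows one simple way to detect odd (out-of-distribution) input at runtime by comparing incoming feature values against training statistics. The feature representation and the z-score threshold are assumptions; production systems typically use more advanced detectors.

```python
# Minimal sketch of odd-input detection: flag inputs far from the training distribution.
# Feature vectors and the z-score threshold are illustrative assumptions.
import numpy as np

class OddInputDetector:
    def __init__(self, training_features: np.ndarray, z_threshold: float = 4.0):
        self.mean = training_features.mean(axis=0)
        self.std = training_features.std(axis=0) + 1e-9   # avoid division by zero
        self.z_threshold = z_threshold

    def is_odd(self, x: np.ndarray) -> bool:
        """Return True if any feature deviates strongly from training statistics."""
        z_scores = np.abs((x - self.mean) / self.std)
        return bool(z_scores.max() > self.z_threshold)

# Usage: fit on the (trusted) training set, check every runtime input before inference.
train = np.random.normal(size=(1000, 8))                      # stand-in for real training features
detector = OddInputDetector(train)
suspicious = detector.is_odd(np.array([0.1] * 7 + [25.0]))    # last feature is clearly out of range
print("Flag for review:", suspicious)
```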

All threats and controls are discussed in the further content of the AI Exchange.

Threat model with controls - GenAI trained/finetuned

The diagram below restricts the threats and controls to Generative AI only, for situations in which training or finetuning is done by the organization (note: this is not very common, given the high cost and required expertise).

AI Security Threats and controls - GenAI trained or finetuned

Threat model with controls - GenAI as-is

The diagram below restricts the threats and controls to Generative AI only, for the case where the model is used as-is by the organization, i.e. the provider (e.g. OpenAI) has done the training/finetuning. Therefore, some threats are the responsibility of the model provider (sensitive/copyrighted data, manipulation at the provider). Nevertheless, the organization that uses the model should take these risks into account and gain assurance about them from the provider.

In many situations, the as-is model will be hosted externally, which means security depends on how the supplier handles the data, including the security configuration. How is the API protected? What is covered by a virtual private cloud: the entire external model, or just the API? How are keys managed? What is the data retention policy? Is there logging? Does the model reach out to third-party sources, thereby sending out sensitive input data?

AI Security Threats and controls - GenAI as-is

Navigator diagram

The navigator diagram below shows all threats, controls and how they relate, including risks and the types of controls.


About this Document

This document discusses threats to AI cyber security and controls for those threats (i.e. countermeasures, requirements, mitigations). Security here means preventing unauthorized access, use, disclosure, disruption, modification, or destruction. Modification includes manipulating the behaviour of an AI model in unwanted ways.

The AI Exchange initiative was started by OWASP, triggered by Rob van der Veer - bridge builder for security standards, senior director at Software Improvement Group, with 31 years of experience in AI & security, lead author of ISO/IEC 5338 on the AI lifecycle, founding father of OpenCRE, and currently working on security requirements concerning the EU AI Act in CEN/CENELEC.

This material is all draft and work in progress, open for others to review and amend. It serves as input to ongoing key initiatives such as the EU AI Act, ISO/IEC 27090 on AI security, ISO/IEC 27091 on AI privacy, the OWASP ML Top 10, the OWASP LLM Top 10, and many more initiatives that can benefit from consistent terminology and insights across the globe.

Sources

  • AI security experts who contributed to this as Open Source.
  • The insights of these experts were inspired by research work, as mentioned in the references at the bottom of this document (ENISA, NIST, Microsoft, BIML, MITRE, etc.).

How we organized threats and controls

The threats are organized by attack surface (how and where does the attack take place?), and not by impact. This means that, for example, model theft is mentioned in three different parts of the overview:

  1. model theft by stealing model parameters from a live system, e.g. by breaking into the network and reading the parameters from a file,
  2. model theft by stealing the modeling process or parameters from the engineering environment, e.g. stored in the version management system of a data scientist, and
  3. model theft by reverse engineering from using the AI system.

These are three very different attacks with similar impacts. This way of organizing is helpful because the goal is to link the threats to controls, and these controls vary per attack surface.

How to select relevant threats and controls? Risk analysis

There are many threats and controls described in this document. Your situation determines which threats are relevant to you, and what controls are your responsibility. This selection process can be performed through risk analysis of the use case and architecture at hand:

  1. Threat identification: First select the threats that apply to your case by going through the list of threats and using the Impact description to see if each is applicable. For example, the impact of identifying individuals in your training data does not apply to your case if your training data contains no data about individuals. The Navigator shows impact in purple.

    If you use RAG (Retrieval Augmented Generation), then treat the retrieval repository (including embeddings) just like training data. Meaning:

    • Include the threats regarding data poisoning
    • Include the threats regarding train/test data leak if the data is sensitive

    Else, if you don’t train or finetune the model:

    • Ignore the development-time threats, with the exception of supply chain management: make sure the model you obtain is genuine and not manipulated
    • Ignore the confidentiality of train data threats
    • Ignore the confidentiality of model IP threats
    • Ignore the data poisoning threat
    • Ignore development-time controls (e.g. filtering sensitive training data)

    These are the responsibilities of the model maker, but be aware that you may still be affected by the consequences. The maker may take the blame for any issue, which would take care of the confidentiality issues, but you would still effectively suffer from any manipulated model behaviour.

    If your train data is not sensitive: ignore the confidentiality of train data threats. A special case is the threat of membership inference: this threat only applies when the fact that a person was part of the training set is harmful information about that person, for example when the training set consists of criminals and their history to predict criminal careers: membership of that set gives away that the person is a convicted or alleged criminal (see the sketch at the end of this risk analysis section).

    If your model is a GenAI model, ignore the following threats: evasion and model inversion. Also ignore prompt injection and insecure output handling if your GenAI model is NOT an LLM.

    If your model is not a GenAI model, ignore (direct) prompt injection and insecure output handling. Also, consider the risks around evasion attacks: is it interesting AND possible for an attacker to manipulate input so that the model makes a wrong decision? An example where evasion IS interesting and possible: adding certain words to a spam email so that it is not recognized as such. An example where evasion is not interesting is when a patient gets a skin disease diagnosis based on a picture of the skin. The patient has no interest in a wrong decision, and the patient typically has no control over the input - well, maybe by painting the skin. There are situations in which this CAN be of interest to the patient, for example to be eligible for compensation in case the (faked) skin disease was caused by certain restaurant food. This demonstrates that whether a theoretical threat is a real threat depends entirely on the context.

    If your input data is not sensitive, ignore ‘leaking input data’. If you use RAG, also treat the data you retrieve as input data.

  2. Arranging responsibility: For each selected threat, determine who is responsible for addressing it. By default, the organization that builds and deploys the AI system is responsible, but building and deploying may be done by different organizations, and some parts of building and deployment may be deferred to other organizations, e.g. hosting the model or providing a cloud environment for the application to run in. Some aspects are shared responsibilities.

    If components of your AI system are hosted, then you share responsibility regarding all controls for the relevant threats with the hosting provider. This needs to be arranged with the provider, using for example a responsibility matrix. Components can be the model, model extensions, your application, or your infrastructure. See Threat model of using a model as-is.

    If an external party is not open about how certain risks are mitigated, consider requesting this information. If it remains unclear, you are faced with a choice: 1) accept the risk, 2) provide your own mitigations, or 3) avoid the risk by not engaging with the third party.

  3. Verify external responsibilities: For the threats that are the responsibility of other organisations, obtain assurance that these organisations take care of them. This involves the controls that are linked to these threats.

  4. Control selection: Then, for the threats that are relevant to you and for which you are responsible, consider the various controls listed with that threat (or the parent section of that threat) and the general controls (which always apply). When considering a control, look at its purpose and determine if you think it is important enough to implement, and to what extent. This depends on the cost of implementation compared to how well the control mitigates the threat, and on the level of risk of the threat. These elements also play a role in the order in which you select controls: highest risks first, starting with the lower-cost controls (low hanging fruit).

  5. Use references: When implementing a control, consider the references and the links to standards. You may have implemented some of these standards, or the content of the standards may help you to implement the control.

  6. Risk acceptance: In the end you need to be able to accept the risks that remain regarding each threat, given the controls that you implemented.

  7. Ongoing management: Further manage the selected controls (see SECPROGRAM), which includes continuous monitoring, documentation, reporting, and incident response.

For more information on risk analysis, see the SECPROGRAM control.
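
The membership inference threat mentioned in step 1 can be illustrated with a minimal sketch: records that receive unusually high confidence for their true label are guessed to be training-set members. The synthetic data, the deliberately overfitted model and the confidence threshold are assumptions for illustration only.

```python
# Minimal sketch of a membership inference test against a classifier.
# Synthetic data, an overfitted model and a fixed confidence threshold are assumed.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_members = rng.normal(size=(200, 10))           # records used for training
y_members = (X_members[:, 0] > 0).astype(int)
X_outside = rng.normal(size=(200, 10))           # records never seen in training
y_outside = (X_outside[:, 0] > 0).astype(int)

# Deliberately overfit so the membership signal is visible.
model = RandomForestClassifier(n_estimators=50, min_samples_leaf=1).fit(X_members, y_members)

def guess_membership(model, X, y, threshold=0.95):
    """Guess 'member' when the confidence in the true label exceeds the threshold."""
    confidences = model.predict_proba(X)[np.arange(len(y)), y]
    return confidences > threshold

print("Guessed members among training records:", guess_membership(model, X_members, y_members).mean())
print("Guessed members among outside records :", guess_membership(model, X_outside, y_outside).mean())
```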

How about …

How about AI outside of machine learning?

A helpful way to look at AI is to see it as consisting of machine learning models (the currently dominant type of AI), which have learned how to compute based on data, and heuristic models, which are engineered based on human knowledge, e.g. rule-based systems. Heuristic models still need data for testing, and sometimes for analysis to further build and validate the human knowledge.
This document focuses on machine learning. Nevertheless, here is a quick summary of the machine learning threats from this document that also apply to heuristic systems:

  • Model evasion is also possible for heuristic models, e.g. trying to find a loophole in the rules
  • Model theft through use - it is possible to train a machine learning model on input/output combinations from a heuristic model (see the sketch after this list)
  • Overreliance in use - heuristic systems can also be relied on too much, as the applied knowledge can be false
  • Data poisoning and model poisoning are possible by manipulating data that is used to improve knowledge, and by manipulating the rules at development-time or runtime
  • Leaks of data used for analysis or testing can still be an issue
  • The knowledge base, source code and configuration can be regarded as sensitive data when they constitute intellectual property, so they need protection
  • Leaking sensitive input data, for example when a heuristic system needs to diagnose a patient
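
To illustrate model theft through use for heuristic systems (second bullet above), here is a minimal sketch in which a hypothetical rule-based scoring function is queried many times and a machine learning model is fitted on the observed input/output pairs, approximating the hidden rules.

```python
# Minimal sketch: extracting an approximation of a rule-based (heuristic) system
# by training an ML model on its input/output behaviour. The rules are hypothetical.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def heuristic_credit_rule(income: float, debt: float) -> int:
    """Hypothetical proprietary business rule: 1 = approve, 0 = reject."""
    return int(income > 30000 and debt / max(income, 1.0) < 0.4)

# Attacker queries the system with many inputs and records the answers.
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(0, 100000, 5000), rng.uniform(0, 50000, 5000)])
y = np.array([heuristic_credit_rule(income, debt) for income, debt in X])

# A decision tree fitted on these observations closely mimics the hidden rules.
stolen_model = DecisionTreeClassifier(max_depth=4).fit(X, y)
print("Agreement with original rules:", (stolen_model.predict(X) == y).mean())
```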

How about responsible or trustworthy AI?

Responsible or trustworthy AI includes security, but not the other way around: there are many more aspects to responsible/trustworthy AI than just security, and to make matters confusing, each of these aspects has a link with security. Let's try to clarify:

  • Accuracy is about the AI model being sufficiently correct to perform its ‘business function’. Being incorrect can lead to physical safety problems (e.g. a car trunk that opens while driving) or other harmful wrong decisions (e.g. a wrongfully declined loan). The link with security is that some attacks cause unwanted model behaviour, which is by definition an accuracy problem. Nevertheless, the security scope is restricted to mitigating the risks of those attacks - NOT solving the entire problem of creating an accurate model (selecting representative training data, etc.).
  • Safety (also reliability) is about the level of accuracy when there is a risk of harm (typically implying physical harm, but not restricted to that), plus the measures in place to mitigate those risks (apart from accuracy). These include security to safeguard accuracy, plus a number of safety measures that are important for the business function of the model. They need to be taken care of not just for security reasons, because the model can make unsafe decisions for other reasons (e.g. bad training data), so they are a shared concern between safety and security:
    • oversight to restrict unsafe behaviour, and connected to that: assigning least privileges to the model,
    • continuous validation to safeguard accuracy,
    • transparency to warn users and depending systems of accuracy risks,
    • explainability to help users validate accuracy
  • Transparency: see above, plus in many cases users have the right to know details about a model being used and how it has been created. Therefore it is a shared concern between security, privacy and safety.
  • Explainability: see above, and apart from validating accuracy this can also give users transparency and help them understand what needs to change to get a different outcome. Therefore it is a shared concern between security, privacy, safety and business function. A special case is when explainability is required by law separately from privacy, which adds ‘compliance’ to the list of aspects that share this concern.
  • Robustness is about the ability to maintain accuracy under expected or unexpected variations in input. The security scope covers malicious variations, which often require different countermeasures than robustness against benign variations. Just like with accuracy, security is not involved per se in creating a model that is robust against benign variations. The exception is when benign robustness supports malicious robustness, in which case it is a shared concern between safety and security. This needs to be assessed on a case-by-case basis.
  • Fairness, as in ‘free of unwanted bias’ where the model ‘mistreats’ certain groups, is undesired for legal and ethical reasons and therefore primarily a business concern. The relation with security is that detection of unwanted bias can help to identify unwanted model behaviour caused by an attack (see the sketch after this list). For example, a data poisoning attack that inserted malicious samples into the training set may at first go unnoticed, but then be discovered by an otherwise unexplained detection of bias in the model.
  • Empathy. The relation with security is that the feasible level of security should always be taken into account when validating a certain application of AI. If a sufficient level of security cannot be provided to individuals or organizations, then empathy means invalidating the idea, or taking other precautions.
  • Accountability. The relation of accountability with security is that security measures should be demonstrable, including the processes that led to those measures. In addition, traceability as a security property is important, just like in any IT system, in order to detect, reconstruct and respond to security incidents and provide accountability.
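
As referenced in the fairness bullet, below is a minimal sketch of unwanted bias testing: comparing model accuracy across groups and flagging large gaps, which can also surface manipulated model behaviour. The data, group labels and alerting threshold are assumptions.

```python
# Minimal sketch of unwanted bias testing: flag large performance gaps between groups.
# The predictions, labels, group attribute and threshold are illustrative assumptions.
import numpy as np

def accuracy_gap_by_group(y_true, y_pred, groups):
    """Return per-group accuracy and the largest gap between any two groups."""
    accuracies = {}
    for g in np.unique(groups):
        mask = groups == g
        accuracies[g] = float((y_true[mask] == y_pred[mask]).mean())
    gap = max(accuracies.values()) - min(accuracies.values())
    return accuracies, gap

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 0, 0, 1, 1, 0])
groups = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

accs, gap = accuracy_gap_by_group(y_true, y_pred, groups)
print(accs, "gap:", round(gap, 2))
if gap > 0.2:   # assumed alerting threshold
    print("Warning: possible unwanted bias or manipulated behaviour - investigate.")
```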

How about privacy?

AI Privacy can be divided into two parts:

  1. The AI security threats and controls in this document that are about confidentiality and integrity of (personal) data (e.g. model inversion, leaking training data), plus the integrity of the model behaviour
  2. Threats and controls with respect to rights of the individual, as covered by privacy regulations such as the GDPR, including use limitation, consent, fairness, transparency, data accuracy, and the rights of correction/objection/erasure/access. For an overview, see the Privacy part of the OWASP AI guide.

How about Generative AI (e.g. LLM)?

Yes, GenAI is leading the current AI revolution and it is the fastest-moving subfield of AI security. Nevertheless, it is important to realize that other types of algorithms will remain relevant for many important use cases such as credit scoring, fraud detection, medical diagnosis, product recommendation, image recognition, predictive maintenance, process control, etc. Relevant content has been marked with ‘GenAI’ in this document.

Important note: from a security framework perspective, GenAI is not that different from other forms of AI. GenAI threats and controls largely overlap and are very similar to AI in general. Nevertheless, some risks are (much) higher. Some are lower. Only a few risks are GenAI-specific.

GenAI security particularities are:

(The related OWASP Top 10 for LLM entries are given in parentheses.)

  1. Evasion attacks in general are about fooling a model using crafted input to make an unwanted decision, whereas for GenAI it is about fooling a model using a crafted prompt to circumvent behavioral policies, e.g. preventing offensive output or the leaking of secrets. (OWASP for LLM 01: Prompt injection)
  2. Unwanted output of sensitive training data is an AI-broad issue, but it is more likely to be a high risk with GenAI systems, which typically output rich content and have been trained on a large variety of data sets. (OWASP for LLM 06)
  3. A GenAI model will not respect any variations in access privileges of training data: all data will be accessible to the model users. (OWASP for LLM 06: Sensitive Information Disclosure)
  4. Training data poisoning is an AI-broad problem, and with GenAI the risk is generally higher since training data can be supplied from different sources that may be challenging to control, such as the internet. Attackers could for example hijack domains and place manipulated information. (OWASP for LLM 03: Training Data Poisoning)
  5. Overreliance is an AI-broad risk factor, and Large Language Models (GenAI) can make matters worse by coming across as very confident and knowledgeable. (OWASP for LLM 09: Overreliance and OWASP for LLM 08: Excessive agency)
  6. Leaking input data: GenAI models mostly live in the cloud, often managed by an external party, which may increase the risk of leaking training data and leaking prompts. This issue is not limited to GenAI, but GenAI has two particular risks here: 1) model use involves user interaction through prompts, adding user data and the corresponding privacy/sensitivity issues, and 2) GenAI model input (prompts) can contain rich context information with sensitive data (e.g. company secrets). The latter issue occurs with in-context learning or Retrieval Augmented Generation (RAG), i.e. adding background information to a prompt: for example data from all reports ever written at a consultancy firm. First of all, this information will travel with the prompt to the cloud, and second, the system will likely not respect the original access rights to the information. See the threat Leak sensitive input data.
  7. Pre-trained models may have been manipulated. The concept of pre-training is not limited to GenAI, but the approach is quite common in GenAI, which increases the risk of transfer learning attacks. (OWASP for LLM 05: Supply chain vulnerabilities)
  8. The typical application of plug-ins in Large Language Models (GenAI) creates specific risks regarding the protection and privileges of these plug-ins, as they allow large language models (GenAI) to act outside of their normal conversation with the user. (OWASP for LLM 07)
  9. Prompt injection is a GenAI-specific threat, listed under Application security threats. (OWASP for LLM 01)
  10. Model inversion and membership inference are low to zero risks for GenAI. (OWASP for LLM 06)
  11. GenAI output may contain elements that perform an injection attack such as cross-site scripting (see the sketch below this list). (OWASP for LLM 02)
  12. Denial of service can be an issue for any AI model, but GenAI models are extra sensitive because of their relatively high resource usage. (OWASP for LLM 04)
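
Item 11 above notes that GenAI output may carry injection payloads such as cross-site scripting. The sketch below illustrates encoding model output before rendering it in a web page; the example output value is hypothetical.

```python
# Minimal sketch of output encoding: treat model output as untrusted data and
# encode it before placing it in an HTML page, so script payloads are neutralized.
import html

model_output = 'Here is your summary. <script>document.location="https://evil.example"</script>'

safe_fragment = html.escape(model_output)   # < > & " ' become harmless entities
page = f"<div class='answer'>{safe_fragment}</div>"
print(page)
```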

GenAI References:

How about the NCSC/CISA guidelines?

Mapping of the UK/US Guidelines for secure AI system development to the controls here at the AI Exchange:
(Search for them in this document or use the Navigator)

  1. Secure design
  • Raise staff awareness of threats and risks:
    #SECEDUCATE
  • Model the threats to your system:
    See Risk analysis under #SECPROGRAM
  • Design your system for security as well as functionality and performance:
    #AIPROGRAM, #SECPROGRAM, #DEVPROGRAM, #SECDEVPROGRAM, #CHECKCOMPLIANCE, #LEASTMODELPRIVILEGE, #DISCRETE, #OBSCURECONFIDENCE, #OVERSIGHT, #RATELIMIT, #DOSINPUTVALIDATION, #LIMITRESOURCES, #MODELACCESSCONTROL, #AITRANSPARENCY
  • Consider security benefits and trade-offs when selecting your AI model:
    All development-time data science controls (currently 13), #EXPLAINABILITY
  2. Secure development
  • Secure your supply chain:
    #SUPPLYCHAINMANAGE
  • Identify, track and protect your assets:
    #DEVDATAPROTECT, #DEVSECURITY, #SEGREGATEDATA, #CONFCOMPUTE, #MODELINPUTCONFIDENTIALITY, #RUNTIMEMODELCONFIDENTIALITY, #DATAMINIMIZE, #ALLOWEDDATA, #SHORTRETAIN, #OBFUSCATETRAININGDATA and part of #SECPROGRAM
  • Document your data, models and prompts:
    Part of #DEVPROGRAM
  • Manage your technical debt:
    Part of #DEVPROGRAM
  3. Secure deployment
  • Secure your infrastructure:
    Part of #SECPROGRAM and see ‘Identify, track and protect your assets’
  • Protect your model continuously:
    #INPUTDISTORTION, #FILTERSENSITIVEMODELOUTPUT, #RUNTIMEMODELIOINTEGRITY, #MODELINPUTCONFIDENTIALITY, #PROMPTINPUTVALIDATION, #INPUTSEGREGATION
  • Develop incident management procedures:
    Part of #SECPROGRAM
  • Release AI responsibly:
    Part of #DEVPROGRAM
  • Make it easy for users to do the right things:
    Part of #SECPROGRAM
  4. Secure operation and maintenance
  • Monitor your system’s behaviour:
    #CONTINUOUSVALIDATION, #UNWANTEDBIASTESTING
  • Monitor your system’s inputs:
    #MONITORUSE, #DETECTODDINPUT, #DETECTADVERSARIALINPUT
  • Follow a secure by design approach to updates:
    Part of #SECDEVPROGRAM
  • Collect and share lessons learned:
    Part of #SECPROGRAM and #SECDEVPROGRAM

References

References on the OWASP AI guide (a project of which this document is part):

Overviews of model attacks:

Misc.: