This catalog covers tests for AI agents with tool use and autonomous capabilities. For foundation model and LLM-based application tests, see the Foundation Model Test Catalog.
Agent Tests Overview
Agent tests evaluate AI agents with tool use, autonomous decision-making, and interaction capabilities. They assess risks specific to agentic systems, including tool misuse, access control, behavior patterns, and governance. All agent tests are dynamically generated from the agent’s configuration and capabilities. They require system_description or policy_description in dataset_configuration to generate contextually relevant test prompts.
Governance Test
Purpose: Tests the alignment and governance of the agent, including goal misalignment and policy drift detection.
Required Configuration: system_description or policy_description in dataset_configuration (description of the agent’s intended purpose, goals, and governance policies)
- Goal misalignment
- Policy drift
- Value alignment
- Regulatory compliance
- Prompt
- Response
Agent Output Quality Test
Purpose: Tests the quality of the agent’s output, including hallucinations, bias, and toxicity in multi-turn workflows.
Required Configuration: system_description or policy_description in dataset_configuration (description of the agent’s capabilities and expected output quality standards)
- Hallucinations
- Bias and discrimination
- Toxicity
- Factual accuracy
- Consistency across turns
- Prompt
- Response
Tool Misuse Test
Purpose: Tests for misuse of the agent’s tools and capabilities, including API integration issues, supply chain vulnerabilities, and resource consumption.
Required Configuration: system_description or policy_description in dataset_configuration (description of available tools and their intended usage)
- Unauthorized tool access
- API abuse
- Resource exhaustion
- Supply chain vulnerabilities
- Tool chaining attacks
- Prompt
- Response
Privacy Test
Purpose: Tests privacy protections and data handling, including sensitive data exposure and exfiltration channels.
Required Configuration: system_description or policy_description in dataset_configuration (description of the agent’s data handling capabilities and privacy requirements)
- Sensitive data exposure
- Data exfiltration
- Cross-user data leakage
- Privacy policy violations
- Data retention issues
- Prompt
- Response
Reliability and Observability Test
Purpose: Tests the reliability and observability of agent operations, including data poisoning, concept drift, and opaque reasoning.
Required Configuration: system_description or policy_description in dataset_configuration (description of the agent’s decision-making processes and monitoring capabilities)
- Data poisoning resistance
- Concept drift detection
- Opaque reasoning
- Decision traceability
- Error handling
- Prompt
- Response
Agent Behaviour Test
Purpose: Tests the behavior patterns and decision-making of the agent, including human manipulation and unsafe actuation.
Required Configuration: system_description or policy_description in dataset_configuration (description of the agent’s intended behavior patterns and boundaries)
- Human manipulation
- Unsafe actuation
- Boundary violations
- Autonomous decision-making risks
- Social engineering resistance
- Prompt
- Response
Access Control and Permissions Test
Purpose: Tests access control and permissions enforcement, including credential theft, privilege escalation, and confused deputy attacks.
Required Configuration: system_description or policy_description in dataset_configuration (description of the agent’s access control mechanisms and permission levels)
- Credential theft
- Privilege escalation
- Confused deputy attacks
- Authorization bypass
- Permission boundary violations
- Prompt
- Response
Tool Extraction Test
Purpose: Tests whether information about the agent’s tools can be extracted from its outputs, revealing internal capabilities and configurations.
Required Configuration: system_description or policy_description in dataset_configuration (description of the agent’s tool capabilities and configuration)
- Tool name extraction
- Tool capability disclosure
- Configuration information leakage
- API endpoint exposure
- Internal architecture revelation
- Prompt
- Response
Configuration Requirements Summary
Agent Tests
All agent tests require:
- system_description or policy_description (required in dataset_configuration)
  - Description of the agent’s purpose and capabilities
  - Tool descriptions and access patterns
  - Behavior boundaries and constraints
  - Access control mechanisms
- endpoint_configuration with testing_for: "agents"
  - Must specify "testing_for": "agents" to enable agent-specific testing
  - Model configuration with tool/function calling support
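The requirements above can be checked locally before a payload is submitted. The following is a minimal Python sketch, not the service's own validation. The key names dataset_configuration, endpoint_configuration, system_description, policy_description, and testing_for come from this page; the function name check_agent_payload, the assumption that both configuration blocks sit at the top level of the payload, and the example description text are hypothetical.

```python
from typing import Any


def check_agent_payload(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical helper: it only mirrors the requirements listed on this
    page and assumes both configuration blocks sit at the top level.
    """
    problems: list[str] = []

    dataset_cfg = payload.get("dataset_configuration") or {}
    # At least one of the two description fields must be a non-empty string.
    has_description = any(
        isinstance(dataset_cfg.get(key), str) and dataset_cfg[key].strip()
        for key in ("system_description", "policy_description")
    )
    if not has_description:
        problems.append(
            "dataset_configuration needs system_description or policy_description"
        )

    endpoint_cfg = payload.get("endpoint_configuration") or {}
    if endpoint_cfg.get("testing_for") != "agents":
        problems.append('endpoint_configuration.testing_for must be "agents"')

    return problems


# Illustrative payload fragment; the description text is made up.
example_payload = {
    "dataset_configuration": {
        "system_description": (
            "Customer-support agent that can look up orders and issue refunds "
            "up to a fixed limit. It must never reveal other customers' data."
        ),
    },
    "endpoint_configuration": {"testing_for": "agents"},
}

print(check_agent_payload(example_payload))  # prints []
```

Returning a list of problems rather than raising on the first one lets a caller report every missing field in a single pass. The sketch does not check the model's tool/function calling support, since this page does not say how that is expressed in the payload.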
Test Coverage
By OWASP Agent Security Category
- Governance & Alignment: governance_test
- Output Quality: agent_output_quality_test
- Tool Security: tool_misuse_test, tool_extraction_test
- Privacy & Data: privacy_test
- Reliability: reliability_and_observability_test
- Behavior & Safety: agent_behaviour_test
- Access Control: access_control_and_permissions_test
Usage Notes
- Agent-Specific Configuration: Ensure testing_for: "agents" is set in endpoint_configuration to enable agent tests.
- Multi-Turn Testing: Agent tests benefit from multi-turn attack methods to simulate realistic agent interactions.
- System Description: Provide a detailed system_description including:
  - Agent’s purpose and role
  - Available tools and their capabilities
  - Access control mechanisms
  - Behavioral boundaries
- Test Combinations: Multiple agent tests can be run simultaneously. Use appropriate sample percentages to manage test duration.
- Tool Configuration: For agents with tool access, ensure tool descriptions are included in system_description to generate relevant test scenarios.
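One way to keep a system_description complete is to assemble it from the four items the usage notes ask for. The sketch below is an illustration only: AgentProfile, build_system_description, and the sample helpdesk agent are hypothetical, and this page does not prescribe any particular layout for the description text. Only the system_description and dataset_configuration keys are taken from the page.

```python
from dataclasses import dataclass, field


@dataclass
class AgentProfile:
    """Hypothetical container for the four items the usage notes ask for."""
    purpose: str
    tools: dict[str, str] = field(default_factory=dict)  # tool name -> what it does
    access_controls: list[str] = field(default_factory=list)
    boundaries: list[str] = field(default_factory=list)


def build_system_description(profile: AgentProfile) -> str:
    """Render the profile as plain text suitable for system_description."""
    lines = [f"Purpose and role: {profile.purpose}"]
    if profile.tools:
        lines.append("Available tools:")
        lines.extend(f"- {name}: {desc}" for name, desc in profile.tools.items())
    if profile.access_controls:
        lines.append("Access control mechanisms:")
        lines.extend(f"- {item}" for item in profile.access_controls)
    if profile.boundaries:
        lines.append("Behavioral boundaries:")
        lines.extend(f"- {item}" for item in profile.boundaries)
    return "\n".join(lines)


# Made-up example agent, used only to show the shape of the output.
profile = AgentProfile(
    purpose="Internal IT helpdesk agent for password resets and ticket triage.",
    tools={
        "reset_password": "Resets a password for the requesting user only.",
        "create_ticket": "Opens a ticket in the helpdesk queue.",
    },
    access_controls=["Acts with the requesting user's permissions."],
    boundaries=["Never resets passwords for other users."],
)

dataset_configuration = {"system_description": build_system_description(profile)}
print(dataset_configuration["system_description"])
```

Listing each tool by name next to its intended use matches the Tool Configuration note above, which asks for tool descriptions inside system_description.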
Related Pages
- Payload Guide - Overview and quick reference for building payloads
- Foundation Model Test Catalog - Tests for foundation models and LLM-based applications
- Attack Methods Reference - Detailed attack method information
- Configuration Reference - Complete configuration reference
- Examples - Ready-to-use payload examples
- Quickstart - Getting started with the API

