Payload Structure Overview
Every payload consists of three main sections: the dataset configuration, the red team test configurations, and the endpoint configuration.

Dataset Configuration
The dataset configuration specifies settings for generating test prompts when using generated datasets.

Complete Example
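A sketch of a full dataset configuration assembled from the fields documented below. The key names (`dataset_configuration`, `system_description`, `policy_description`, and the rest) are illustrative assumptions, not confirmed identifiers; see the Examples page for complete, verified payloads.

```json
{
  "dataset_configuration": {
    "system_description": "This is a customer support chatbot for a banking application",
    "policy_description": "Do not provide harmful or illegal information. Do not share customer data.",
    "max_prompts": 100,
    "scenarios": 2,
    "categories": 5,
    "depth": 2
  }
}
```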
Field Reference
System description

Description of the AI system being tested. This helps the attack generator create contextually relevant prompts.

Example:
"This is a customer support chatbot for a banking application"
Policy / code of conduct

The code of conduct or policy that the model should follow. Defines what the model should NOT do.

Example:
"Do not provide harmful or illegal information. Do not share customer data."
Max prompts

Maximum number of prompts to generate for the dataset.

Range: 1-1000

Guidelines:
- Development: 10-50
- Standard testing: 100-200
- Comprehensive: 500-1000
Scenarios

Number of different scenarios to test across.

Range: 1-10

Description: Creates variety in attack contexts and use cases.
Categories

Number of risk categories to include in the generated dataset.

Range: 1-15

Description: Spreads testing across different types of risks (toxicity, bias, PII, etc.).
Attack tree depth

Depth of the attack tree for generating adversarial prompts.

Range: 1-5

Description: Higher depth creates more sophisticated, multi-step attacks.

Guidelines:
- Depth 1-2: Simple attacks
- Depth 3: Moderate complexity
- Depth 4-5: Complex, multi-layered attacks
Usage Notes
The dataset configuration is only used when generating custom datasets. If you’re testing with pre-existing prompts, you can omit this section.
Red Team Test Configurations
This section specifies which tests to run and how to attack the model for each test type.

Structure
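The general shape is a map from test name to per-test settings. Assuming the section key is `redteam_test_configurations` (an illustrative name) and a placeholder attack-method value, a sketch:

```json
{
  "redteam_test_configurations": {
    "bias_test": {
      "sample_percentage": 10,
      "attack_methods": { "basic": ["basic"] }
    },
    "toxicity_test": {
      "sample_percentage": 10,
      "attack_methods": { "basic": ["basic"] }
    }
  }
}
```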
Common Fields
sample_percentage

Percentage of the dataset to use for this specific test.

Range: 1-100

Guidelines:
- Development/Quick Testing: 1-5%
- Standard Testing: 10-20%
- Comprehensive Testing: 50-100%
"sample_percentage": 50
means use 50% of available prompts.

Attack methods

Configuration of attack methods to use for this test. See the Attack Methods Reference for detailed information.

Structure:
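As a hedged sketch, the value is an object of method groups, each listing method names. The `attack_methods` key and the group/method names below (`basic`, `advanced`, `static`, `encoding`) are placeholders; the Attack Methods Reference is authoritative.

```json
"attack_methods": {
  "basic": ["basic"],
  "advanced": {
    "static": ["encoding"]
  }
}
```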
Test Categories
Custom Tests (1 test)
Standard Tests (12 tests)
bias_test
Identifies and exposes biased outputs based on protected attributes (race, gender, age, etc.).

cbrn_test
Addresses vulnerabilities related to chemical, biological, radiological, and nuclear domains.

csem_test
Addresses vulnerabilities related to child sexual exploitation material (CSEM).

harmful_test
Elicits responses that promote harm or danger to individuals or groups.

insecure_code_test
Produces insecure or harmful code snippets with vulnerabilities.

toxicity_test
Generates harmful or offensive content including hate speech and profanity.

pii_test
Exposes personally identifiable information that should remain private.

copyright_test
Exposes copyrighted material that the model should not reproduce.

misinformation_test
Tests the model’s tendency to generate false or misleading information.

system_prompt_extractions_test
Attempts to extract the system prompt or instructions.

sponge_test
Tests for infinite loops or resource exhaustion attacks.

competitor_test
Attempts to extract information about competitors.

Example:
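A single standard test entry might look like this (the test name is from the list above; the attack method shown is a placeholder):

```json
"pii_test": {
  "sample_percentage": 15,
  "attack_methods": { "basic": ["basic"] }
}
```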
Agentic Tests (8 tests)
Tests specifically designed for AI agents with tool use and autonomous capabilities:
governance_test
Testing the alignment and governance of the model, including goal misalignment and policy drift.

agent_output_quality_test
Testing the quality of the agent’s output, including hallucinations, bias, and toxicity in multi-turn workflows.

tool_misuse_test
Testing the misuse of the model’s tools, including API integration issues, supply chain vulnerabilities, and resource consumption.

privacy_test
Testing privacy protections and data handling, including sensitive data exposure and exfiltration channels.

reliability_and_observability_test
Testing the reliability and observability of agent operations, including data poisoning, concept drift, and opaque reasoning.

agent_behaviour_test
Testing behaviour patterns and decision-making, including human manipulation and unsafe actuation.

access_control_and_permissions_test
Testing access control and permissions enforcement, including credential theft, privilege escalation, and confused deputy attacks.

tool_extraction_test
Tests whether the agent’s tool information can be extracted in outputs.

Example:
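An agentic test entry follows the same per-test shape, for example (attack method again a placeholder):

```json
"tool_misuse_test": {
  "sample_percentage": 10,
  "attack_methods": { "basic": ["basic"] }
}
```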
Specialized Tests for Generated Datasets (8 tests)
Advanced tests that require generated adversarial datasets:
adv_bias_test
Uncover biased outputs through adversarial methods.

adv_info_test
Extract sensitive or unintended information from a generated dataset.

adv_persona_test
Adversarial persona manipulation and identity spoofing attacks.

adv_command_test
Manipulate the model to execute unintended commands.

adv_pii_test
Expose personally identifiable information through adversarial techniques.

adv_competitor_test
Exploit vulnerabilities to gain an advantage over competitors.

adv_sponge_test
Advanced resource exhaustion and denial-of-service attacks.

Example:
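Specialized tests use the same per-test shape but must be paired with a dataset configuration section (sketched earlier), since they run on generated datasets:

```json
"adv_pii_test": {
  "sample_percentage": 20,
  "attack_methods": { "basic": ["basic"] }
}
```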
Multiple Tests Example
You can run multiple tests in a single payload:
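Assuming the same illustrative `redteam_test_configurations` key and placeholder attack methods:

```json
{
  "redteam_test_configurations": {
    "bias_test": {
      "sample_percentage": 10,
      "attack_methods": { "basic": ["basic"] }
    },
    "pii_test": {
      "sample_percentage": 10,
      "attack_methods": { "basic": ["basic"] }
    },
    "toxicity_test": {
      "sample_percentage": 10,
      "attack_methods": { "basic": ["basic"] }
    }
  }
}
```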
Endpoint Configuration

This section provides specifics about the model to be tested.

Complete Example
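Assembled from the fields documented below, a complete endpoint configuration might look like this. The key names (`endpoint_configuration`, `testing_for`, `model_config`, `model_provider`, `auth_data`, `apikeys`, and the modality/rate/system-prompt keys) are illustrative assumptions; the values are the documented examples.

```json
{
  "endpoint_configuration": {
    "testing_for": "foundationModels",
    "model_name": "gpt-4o",
    "model_config": {
      "model_provider": "openai",
      "endpoint": {
        "scheme": "https",
        "host": "api.openai.com",
        "port": 443,
        "base_path": "/v1/chat/completions"
      },
      "auth_data": {
        "header_name": "Authorization",
        "header_prefix": "Bearer"
      },
      "apikeys": ["OPENAI_API_KEY"],
      "input_modalities": ["text"],
      "output_modalities": ["text"],
      "system_prompt": "You are a helpful customer support assistant.",
      "rate_per_min": 60
    }
  }
}
```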
Top-Level Fields
System type

Identifies the AI system type being tested.

Options:
- "foundationModels" - Base language models (GPT, Claude, Gemini, etc.)
- "chatbotsAndCopilots" - Chat interfaces and coding assistants
- "agents" - Autonomous AI agents with tool use
- "URL" - Custom API endpoints
Model name

The name or identifier for the model.

Examples:
"gpt-4o"
"gpt-4o-mini"
"claude-3-5-sonnet-20241022"
"google/gemma-7b-it"
Model config

Configuration object for the model endpoint. See detailed fields below.
Model Config Fields
Provider

The provider of the model.

Supported Providers:
- "openai" - OpenAI models
- "anthropic" - Anthropic models
- "together" - Together AI
- "groq" - Groq
- "google" - Google (Gemini)
- "aws_bedrock" - AWS Bedrock
- "azure" - Azure OpenAI
- "huggingface" - HuggingFace models
Endpoint

Object containing endpoint details.

Subfields:
- scheme (string): Protocol scheme (e.g., "https")
- host (string): API host (e.g., "api.openai.com")
- port (integer): Port number (e.g., 443)
- base_path (string): Base path for the API endpoint (e.g., "/v1/chat/completions")
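Put together (the enclosing `endpoint` key is an assumption; its subfields and values are the documented ones):

```json
"endpoint": {
  "scheme": "https",
  "host": "api.openai.com",
  "port": 443,
  "base_path": "/v1/chat/completions"
}
```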
Authentication

Authentication configuration; the format varies by provider. Most providers use standard header auth, AWS Bedrock uses AWS credentials, and Google Gemini passes the key as a query parameter. Sketches of each follow.
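These shapes are sketches with hypothetical key names (`auth_data`, `header_name`, `header_prefix`, `param_name`, and the AWS credential keys are all assumptions); only the behavior they encode is documented in the Provider-Specific Notes below.

Standard header auth (most providers):

```json
"auth_data": {
  "header_name": "Authorization",
  "header_prefix": "Bearer"
}
```

AWS Bedrock (AWS credentials instead of an API-key header):

```json
"auth_data": {
  "aws_access_key_id": "AWS_ACCESS_KEY_ID",
  "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY"
}
```

Google Gemini (query parameter, producing ?key=YOUR_KEY):

```json
"auth_data": {
  "param_name": "key"
}
```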
API keys

Array of API keys for authentication.

Example:
["OPENAI_API_KEY"]
Note: Can provide multiple keys for load balancing.

Input modalities

Input types the model accepts.

Options:
- ["text"] - Text-only (LLMs)
- ["text", "image"] - Text and images (VLMs)
- ["text", "audio"] - Text and audio (ALMs)
Output modalities

Output types the model produces.

Typical:
["text"]
System prompt

Optional. Sets initial model behavior or instructions.

Example:
"You are a helpful customer support assistant."
Rate per minute

Optional. Rate limit for API calls per minute.

Example:
60
Paths

Optional. Object defining API path mappings.

Example:
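A hypothetical mapping (both the `paths` key and the `completions` entry are assumptions; the right-hand side should be the provider's actual route):

```json
"paths": {
  "completions": "/chat/completions"
}
```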
Metadata

Optional. Provider-specific configuration options; Azure OpenAI, AWS Bedrock, Anthropic, and HuggingFace each use different keys, as sketched below.
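The keys below are best-effort sketches; the only documented requirements are the ones repeated in the Provider-Specific Notes (anthropic_version, region, instance/deployment ID, caching and wait-for-model).

Azure OpenAI (instance and deployment ID; key names hypothetical):

```json
"metadata": {
  "azure_instance": "my-instance",
  "azure_deployment_id": "my-gpt-4o-deployment"
}
```

AWS Bedrock (region is required):

```json
"metadata": {
  "region": "us-east-1"
}
```

Anthropic (API version is required):

```json
"metadata": {
  "anthropic_version": "2023-06-01"
}
```

HuggingFace (caching and wait-for-model options):

```json
"metadata": {
  "use_cache": true,
  "wait_for_model": true
}
```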
Model-Type Specific Examples
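As one illustration, a chatbots-and-copilots target backed by Anthropic might look like the sketch below. The key names are again illustrative assumptions; the anthropic_version and x-api-key requirements come from the Provider-Specific Notes. See the Examples page for complete, verified payloads per model type.

```json
{
  "endpoint_configuration": {
    "testing_for": "chatbotsAndCopilots",
    "model_name": "claude-3-5-sonnet-20241022",
    "model_config": {
      "model_provider": "anthropic",
      "endpoint": {
        "scheme": "https",
        "host": "api.anthropic.com",
        "port": 443,
        "base_path": "/v1/messages"
      },
      "auth_data": { "header_name": "x-api-key" },
      "apikeys": ["ANTHROPIC_API_KEY"],
      "input_modalities": ["text"],
      "output_modalities": ["text"],
      "metadata": { "anthropic_version": "2023-06-01" }
    }
  }
}
```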
Provider-Specific Notes
OpenAI
Standard bearer authentication with Authorization: Bearer YOUR_KEY.
Anthropic
Requires anthropic_version in metadata and the x-api-key header.

AWS Bedrock
Uses AWS credentials instead of API keys. Requires region in metadata.
Google Gemini
Uses query parameter authentication with ?key=YOUR_KEY.
Azure OpenAI
Requires Azure-specific metadata, including the instance and deployment ID.
HuggingFace
Supports caching and wait-for-model options in metadata.
Best Practices
Security Best Practices:
- Never commit API keys to version control
- Use environment variables for sensitive credentials
- Rotate API keys regularly
- Use separate keys for testing vs production
Configuration Tips:
- Start with sample_percentage of 2-5% for initial testing
- Gradually increase to 50-100% for comprehensive assessments
- Use multiple test types to cover different risk categories
- Match attack methods to your specific security concerns
Related Pages
- Payload Guide - Overview and quick reference
- Attack Methods Reference - Detailed attack method information
- Examples - Complete payload examples
- GET Defaults API - Supported providers and models