redteam_test_configurations and target_model_configuration.
Payload Structure
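A minimal sketch of the top-level payload shape, assuming only the two section names given above. The variable name payload is illustrative, and the inner dictionaries are left empty here because their fields are described in the sections that follow.

```python
import json

# Top-level payload with the two sections named in this document.
# The empty dicts are placeholders, not a complete or validated request body.
payload = {
    "redteam_test_configurations": {},
    "target_model_configuration": {},
}

# Serialize to JSON, the format the payload is sent in.
print(json.dumps(payload, indent=2))
```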
Red Team Test Configurations
This section specifies the test categories and the parameters for each.
Field Descriptions
- Categories: Each key (bias_test, cbrn_test, harmful_test, insecure_code_test, toxicity_test) represents a specific test category.
- sample_percentage: Specifies the percentage of the dataset to use (1-100).
- attack_methods: Methods to apply for each test category.
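The fields above could be assembled along the following lines. This is a sketch under stated assumptions: the helper make_category_config is hypothetical, the value 10 is arbitrary, and the attack_methods value is only a placeholder whose exact shape is not confirmed by this section.

```python
import copy


def make_category_config(sample_percentage, attack_methods):
    """Build the settings for one test category (hypothetical helper).

    Raises ValueError if sample_percentage is outside the documented 1-100 range.
    """
    if not 1 <= sample_percentage <= 100:
        raise ValueError(
            f"sample_percentage must be between 1 and 100, got {sample_percentage}"
        )
    # Deep-copy so categories do not share one mutable attack_methods object.
    return {
        "sample_percentage": sample_percentage,
        "attack_methods": copy.deepcopy(attack_methods),
    }


# Placeholder value; the real attack_methods shape is an assumption here.
example_methods = {"basic": ["basic"]}

# One entry per category key named in the field descriptions.
redteam_test_configurations = {
    name: make_category_config(10, example_methods)
    for name in (
        "bias_test",
        "cbrn_test",
        "harmful_test",
        "insecure_code_test",
        "toxicity_test",
    )
}
```

Validating the range locally catches an out-of-range percentage before any request is sent.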
Test Categories
Custom Tests for Everyone
- custom_test: Custom dataset test.
Standard Tests for Everyone
- bias_test: Identifies and exposes biased outputs.
- cbrn_test: Addresses vulnerabilities related to chemical, biological, radiological, and nuclear domains.
- harmful_test: Elicits responses that promote harm or danger.
- insecure_code_test: Produces insecure or harmful code snippets.
- toxicity_test: Generates harmful or offensive content.
- pii_test: Exposes personally identifiable information.
- copyright_test: Exposes copyrighted material.
- misinformation_test: Exposes misinformation.
- system_prompt_extractions_test: Attempts to extract the system prompt.
- sponge_test: Exposes infinite loops.
- competitor_test: Exposes information about competitors.
Agentic Tests
- alignment_and_governance_test: Testing the alignment and governance of the model.
- input_and_content_integrity_test: Testing the integrity of the input and content.
- infrastructure_and_integration_test: Testing the infrastructure and integration of the model.
- security_and_privacy_test: Testing the security and privacy of the model.
- human_factors_and_societal_impact_test: Testing the human factors and societal impact of the model.
- access_control_test: Testing the access control and permissions of the model.
- physical_and_actuation_safety_test: Testing the physical and actuation safety of the model.
- reliability_and_monitoring_test: Testing the reliability and monitoring of the model.
- governance_test: Testing the governance of the model.
- agent_output_quality_test: Testing the quality of the agent’s output.
- tool_misuse_test: Testing the misuse of the model’s tools.
- privacy_test: Testing the privacy of the model.
- reliability_and_observability_test: Testing the reliability and observability of the model.
- agent_behaviour_test: Testing the behaviour of the agent.
- access_control_and_permissions_test: Testing the access control and permissions of the model.
- tool_extraction_test: Testing the extraction of tools from the model.
Specialized Tests for Generated Datasets
- adv_bias_test: Uncover biased outputs through adversarial methods.
- adv_info_test: Extract sensitive or unintended information from a generated dataset.
- adv_tool_test: Misuse integrated tools or features.
- adv_command_test: Manipulate the model to execute unintended commands.
- adv_pii_test: Expose personally identifiable information.
- adv_competitor_test: Exploit vulnerabilities to gain an advantage over competitors.
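Because a mistyped category key is easy to miss, a local check against the names listed above can help. The sketch below is an illustration only: the set names and the unknown_categories helper are hypothetical, and the lists simply mirror this page rather than any authoritative source.

```python
# Category names copied from the lists above, grouped the same way.
CUSTOM_TESTS = {"custom_test"}
STANDARD_TESTS = {
    "bias_test", "cbrn_test", "harmful_test", "insecure_code_test",
    "toxicity_test", "pii_test", "copyright_test", "misinformation_test",
    "system_prompt_extractions_test", "sponge_test", "competitor_test",
}
AGENTIC_TESTS = {
    "alignment_and_governance_test", "input_and_content_integrity_test",
    "infrastructure_and_integration_test", "security_and_privacy_test",
    "human_factors_and_societal_impact_test", "access_control_test",
    "physical_and_actuation_safety_test", "reliability_and_monitoring_test",
    "governance_test", "agent_output_quality_test", "tool_misuse_test",
    "privacy_test", "reliability_and_observability_test",
    "agent_behaviour_test", "access_control_and_permissions_test",
    "tool_extraction_test",
}
GENERATED_DATASET_TESTS = {
    "adv_bias_test", "adv_info_test", "adv_tool_test",
    "adv_command_test", "adv_pii_test", "adv_competitor_test",
}
KNOWN_TESTS = CUSTOM_TESTS | STANDARD_TESTS | AGENTIC_TESTS | GENERATED_DATASET_TESTS


def unknown_categories(config):
    """Return the keys of config that are not in the lists above, sorted."""
    return sorted(set(config) - KNOWN_TESTS)


# "bais_test" is a deliberate typo to show what the check reports.
print(unknown_categories({"bias_test": {}, "bais_test": {}}))  # ['bais_test']
```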
Attack Methods
- basic: Basic attack method for each test category.
  - basic
- advanced: Advanced attack method for each test category.
  - static: Static attack method for each test category.
    - encoding, single_shot
  - dynamic: Dynamic attack method for each test category.
    - iterative
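One way to picture the method names listed above is as a nested mapping. The nesting below is an assumption inferred from that list (basic holding basic, advanced holding static and dynamic groups), not a confirmed schema, and flatten_methods is a hypothetical helper for inspecting it.

```python
# Assumed nesting, inferred from the attack method list above.
attack_methods = {
    "basic": ["basic"],
    "advanced": {
        "static": ["encoding", "single_shot"],
        "dynamic": ["iterative"],
    },
}


def flatten_methods(node, prefix=()):
    """Return every method as a 'group/subgroup/method' path, in insertion order."""
    if isinstance(node, dict):
        paths = []
        for key, child in node.items():
            paths.extend(flatten_methods(child, prefix + (key,)))
        return paths
    # Leaf level: a list of method names.
    return ["/".join(prefix + (name,)) for name in node]


for path in flatten_methods(attack_methods):
    print(path)
```

Flattening makes it easy to see at a glance which methods a given category would run.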
Target Model Configuration
This section provides specifics about the model to be tested.
Field Descriptions
- testing_for: Identifies the AI system type (e.g., “foundationModels”, “chatbotsAndCopilots”, “agents”).
- model_name: Required. The name or identifier of the model (e.g., “google/gemma-7b-it”).
- model_version: Specifies the model version; a custom value such as v1.
- system_prompt: Sets the initial model behavior.
- model_source: URL of the model documentation or source.
- model_provider: Entity providing the model (e.g., “Google”).
- model_endpoint_url: Required. API endpoint for model interactions.
- model_api_key: Key for API access; replace with actual credentials.
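The fields above might be put together as in the sketch below. Several names are assumptions for illustration: the helper build_target_model_configuration, the environment variable MODEL_API_KEY, and both example.com URLs are placeholders, not values from this documentation.

```python
import os

# The two fields this section marks as required.
REQUIRED_FIELDS = ("model_name", "model_endpoint_url")


def build_target_model_configuration(**fields):
    """Return the fields as a dict, rejecting missing or empty required ones."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    return dict(fields)


target_model_configuration = build_target_model_configuration(
    testing_for="foundationModels",
    model_name="google/gemma-7b-it",
    model_version="v1",
    system_prompt="",  # left empty when no system prompt is needed
    model_source="https://example.com/model-docs",  # placeholder URL
    model_provider="Google",
    model_endpoint_url="https://example.com/v1/chat/completions",  # placeholder URL
    # Hypothetical variable name; reading from the environment keeps the key out of source.
    model_api_key=os.environ.get("MODEL_API_KEY", ""),
)
```

Checking the required fields up front gives a clear local error instead of a failed request.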
Usage Notes
- Verify that API keys and endpoints are correctly configured and secure.
- Adjust sample_percentage based on dataset size and testing requirements.
- Choose suitable attack_methods based on specific evaluation needs.
- Leave system_prompt empty if not needed.
- Use the tool in accordance with model provider terms of service and ethical guidelines.