The payload is a JSON object containing two main sections: redteam_test_configurations and target_model_configuration.

Payload Structure

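{
    "redteam_test_configurations": {},
    "target_model_configuration": {}
}

Here redteam_test_configurations holds the test configurations and target_model_configuration holds the model details. Both are shown empty (JSON does not allow comments) and are filled in as described in the sections below.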
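
As a minimal sketch, the two-section layout can be assembled as a Python dictionary and serialized with the standard json module. The variable names payload and body are illustrative only and are not part of the documented format.

```python
import json

# Top-level skeleton with the two documented sections.
payload = {
    "redteam_test_configurations": {},  # filled in per test category
    "target_model_configuration": {},   # filled in with model details
}

# Serialize to a JSON string; indent=4 matches the layout used in this document.
body = json.dumps(payload, indent=4)
print(body)
```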

Red Team Test Configurations

This section defines the types of tests to run and their parameters.

"redteam_test_configurations": {
    "jailbreak": {
        "sample_percentage": 5,
        "test_methods": ["single_shot", "iterative"]
    },
    "malware": {
        "sample_percentage": 5,
        "test_methods": ["malware"]
    },
    "toxicity": {
        "sample_percentage": 5,
        "test_methods": ["real_toxic_prompts"]
    },
    "bias": {
        "sample_percentage": 5,
        "test_methods": ["implicit_word_test", "implicit_sentence_test"]
    }
}

Fields:

  • Each key (jailbreak, malware, toxicity, bias) represents a test category.
  • sample_percentage: Percentage of the dataset to use for testing (1-100).
  • test_methods: Array of specific test methods to apply for each category.
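
The field rules above can be checked before a payload is submitted. The following is a sketch only: the function name validate_test_configurations is hypothetical, and the requirement that test_methods be non-empty is an assumption, since the field description only says it is an array.

```python
def validate_test_configurations(configs):
    """Return a list of problems; an empty list means the section looks well-formed."""
    problems = []
    for category, settings in configs.items():
        pct = settings.get("sample_percentage")
        # bool is a subclass of int in Python, so it is excluded explicitly.
        is_number = isinstance(pct, (int, float)) and not isinstance(pct, bool)
        if not is_number or not 1 <= pct <= 100:
            problems.append(f"{category}: sample_percentage must be between 1 and 100")

        methods = settings.get("test_methods")
        # Assumption: at least one method is required per category.
        if not isinstance(methods, list) or not methods:
            problems.append(f"{category}: test_methods must be a non-empty array")
    return problems


example = {
    "jailbreak": {"sample_percentage": 5, "test_methods": ["single_shot", "iterative"]},
    "bias": {"sample_percentage": 0, "test_methods": []},
}
for problem in validate_test_configurations(example):
    print(problem)
```

In this example the jailbreak entry passes, while the bias entry is reported twice: once for the out-of-range percentage and once for the empty method list.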

Test Categories:

  1. Jailbreak: Attempts to bypass model safeguards.
  2. Malware: Tests whether the model can be led to generate malicious code.
  3. Toxicity: Evaluates responses to toxic or offensive prompts.
  4. Bias: Assesses model bias in word associations and sentence completions.

Test Methods:

  • single_shot: Attempts a jailbreak with a single prompt.
  • iterative: Makes repeated attempts to bypass restrictions.
  • malware: Specific tests for malware-related content.
  • real_toxic_prompts: Uses real-world toxic language samples.
  • implicit_word_test: Tests for subtle biases in word associations.
  • implicit_sentence_test: Evaluates biases in sentence completions.
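
Each method listed above appears under one category in the example configuration. The sketch below records that pairing and flags methods placed under the wrong category. The mapping is inferred from the example in this document and may not be exhaustive; the names METHODS_BY_CATEGORY and unknown_methods are hypothetical.

```python
# Category-to-method pairing taken from the example configuration.
# Assumption: the tool may accept methods that are not listed here.
METHODS_BY_CATEGORY = {
    "jailbreak": {"single_shot", "iterative"},
    "malware": {"malware"},
    "toxicity": {"real_toxic_prompts"},
    "bias": {"implicit_word_test", "implicit_sentence_test"},
}


def unknown_methods(category, methods):
    """Return the methods that are not listed for the given category."""
    allowed = METHODS_BY_CATEGORY.get(category, set())
    return [m for m in methods if m not in allowed]


print(unknown_methods("jailbreak", ["single_shot", "iterative"]))
print(unknown_methods("bias", ["implicit_word_test", "single_shot"]))
```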

Target Model Configuration

This section specifies details about the model being tested.

"target_model_configuration": {
    "model_name": "google/gemma-7b-it",
    "model_access_method": "Anyscale Endpoint",
    "model_type": "text_2_text",
    "system_prompt": "",
    "conversation_template": "",
    "model_source": "https://docs.anyscale.com",
    "model_provider": "Google",
    "model_endpoint_url": "https://api.endpoints.anyscale.com/v1/chat/completions",
    "model_api_key": "ANYSCALE_API_KEY"
}

Fields:

  • model_name: Identifier for the model (e.g., “google/gemma-7b-it”).
  • model_access_method: How the model is accessed (e.g., “Anyscale Endpoint”).
  • model_type: Type of model, typically “text_2_text” for language models.
  • system_prompt: Initial prompt to set model behavior (if applicable).
  • conversation_template: Template for structuring conversations (if applicable).
  • model_source: Documentation or source URL for the model.
  • model_provider: Company or organization providing the model.
  • model_endpoint_url: API endpoint for model interactions.
  • model_api_key: Authentication key for API access. The example value "ANYSCALE_API_KEY" is a placeholder; replace it with your actual key.
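
One way to fill in these fields without hard-coding the key is to read it from an environment variable. This is a sketch under assumptions: the function name build_target_model_configuration is hypothetical, and using an environment variable named ANYSCALE_API_KEY is a choice made for this example, not something the payload format requires. All other values are copied from the example above.

```python
import os


def build_target_model_configuration(api_key_env_var="ANYSCALE_API_KEY"):
    """Build the target_model_configuration section, reading the key from the environment."""
    # Assumption: the key is stored in an environment variable.
    api_key = os.environ.get(api_key_env_var)
    if not api_key:
        raise RuntimeError(f"environment variable {api_key_env_var} is not set")
    return {
        "model_name": "google/gemma-7b-it",
        "model_access_method": "Anyscale Endpoint",
        "model_type": "text_2_text",
        "system_prompt": "",          # left empty when not applicable
        "conversation_template": "",  # left empty when not applicable
        "model_source": "https://docs.anyscale.com",
        "model_provider": "Google",
        "model_endpoint_url": "https://api.endpoints.anyscale.com/v1/chat/completions",
        "model_api_key": api_key,
    }
```

Calling build_target_model_configuration() with the variable unset raises an error instead of sending the placeholder string as the key.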

Usage Notes

  1. Ensure all API keys and endpoints are correctly set and secured.
  2. Adjust sample_percentage based on your testing needs and dataset size.
  3. Choose appropriate test_methods for each category based on your evaluation goals.
  4. The system_prompt and conversation_template can be left empty if not applicable.
  5. Always use this tool responsibly and in compliance with the model provider’s terms of service.
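
In support of note 1, a payload can be logged or printed without exposing the key by masking model_api_key on a copy. This is an illustrative sketch; the function name redacted and the "***" mask are hypothetical choices.

```python
import copy
import json


def redacted(payload):
    """Return a deep copy of the payload with model_api_key masked, for safe logging."""
    safe = copy.deepcopy(payload)
    target = safe.get("target_model_configuration", {})
    if target.get("model_api_key"):
        target["model_api_key"] = "***"
    return safe


payload = {
    "redteam_test_configurations": {
        "toxicity": {"sample_percentage": 5, "test_methods": ["real_toxic_prompts"]},
    },
    "target_model_configuration": {
        "model_name": "google/gemma-7b-it",
        "model_api_key": "example-secret",  # stand-in value for this sketch
    },
}

# The original payload is left untouched; only the copy is masked.
print(json.dumps(redacted(payload), indent=4))
```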

With this structure, a single payload specifies both which red-team tests to run (jailbreak, malware, toxicity, bias) and which model to run them against, so the same tests can be applied across different providers and model types.