Attack Methods by Model Type
1. Large Language Models (LLMs) & AI Agents
Input: Text | Output: Text

1.1 Direct Prompt Injection
basic - Raw Prompts (Baseline)
Description: Direct injection of adversarial prompts without any obfuscation or encoding. This is the baseline attack that every test should include.
When to Use:
- Always include as your first attack method
- Establishes baseline vulnerability assessment
- Quick testing during development
iterative - Iterative Attacks (Dynamic)
Description: Progressive prompt refinement based on model responses. The attack adapts iteratively, learning from each response to craft increasingly effective prompts.
When to Use:
- Testing models with strong initial defenses
- Comprehensive security assessments
- Understanding model’s resistance to adaptive attacks
Configuration:
- width (integer): Number of parallel attack paths to explore (default: 5, range: 1-10)
- branching_factor (integer): Number of variations per iteration (default: 9, range: 1-15)
- depth (integer): Maximum iteration depth (default: 3, range: 1-5)
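The three parameters can be read as bounds on a tree search. The following is a minimal sketch of that interaction, not this toolkit's implementation; the `score` and `mutate` callables are hypothetical placeholders for a response-rating function and a prompt-variation generator:

```python
def iterative_attack(seed_prompt, score, mutate,
                     width=5, branching_factor=9, depth=3):
    """Explore up to `width` parallel attack paths, generating
    `branching_factor` variations per path for at most `depth` rounds.

    `score(prompt)` rates a prompt's effectiveness (higher is better);
    `mutate(prompt)` produces one refined variation. Both are assumed
    helpers, not part of any real API.
    """
    frontier = [seed_prompt]
    best = seed_prompt
    for _ in range(depth):
        # Branch: every surviving path spawns `branching_factor` variations.
        candidates = [mutate(p) for p in frontier
                      for _ in range(branching_factor)]
        # Prune: keep only the `width` most promising paths.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:width]
        if score(frontier[0]) > score(best):
            best = frontier[0]
    return best
```

Widening `width` or `branching_factor` trades API cost for coverage; `depth` caps how far any single refinement chain can drift from the seed.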
multi_turn - Multi-Turn Attacks (Dynamic)
Description: Distributes malicious intent across multiple conversation turns, exploiting the model’s conversation memory to build up to harmful outputs.
When to Use:
- Testing conversational AI and chatbots
- Assessing context window vulnerabilities
- Evaluating memory-based attack resistance
eai_attack - EAI Attack (Static)
Description: Exploit-Amplify-Iterate methodology for systematic jailbreaking using graph-based encoding techniques.
When to Use:
- Advanced security testing
- Research-grade assessments
- Testing sophisticated defense mechanisms
1.2 Encoding & Obfuscation Techniques
ascii_encoding - ASCII Encoding
Description: Converts characters to ASCII decimal values to evade content filters that pattern-match text.
Example: “hello” → “104 101 108 108 111”
When to Use:
- Bypassing simple text-based filters
- Testing encoding awareness
- Combined with other techniques
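For reference, the transform itself is a one-liner in Python:

```python
def ascii_encode(text: str) -> str:
    """Encode each character as its decimal ASCII/Unicode code point."""
    return " ".join(str(ord(c)) for c in text)

print(ascii_encode("hello"))  # → 104 101 108 108 111
```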
base64_encoding - Base64 Encoding
Description: Base64-encoded prompts to bypass pattern matching and content filters.
Example: “attack” → “YXR0YWNr”
When to Use:
- Testing encoding robustness
- Multi-iteration encoding for advanced evasion
- Common bypass technique
Configuration:
- encoding_type (string): Type of encoding (default: “base64”)
- iterations (integer): Number of encoding iterations (default: 1, range: 1-3)
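A sketch of multi-iteration encoding using Python's standard `base64` module; the `iterations` argument mirrors the configuration option above:

```python
import base64


def base64_encode(text: str, iterations: int = 1) -> str:
    """Apply Base64 encoding `iterations` times (multi-iteration evasion)."""
    data = text.encode("utf-8")
    for _ in range(iterations):
        data = base64.b64encode(data)
    return data.decode("ascii")

print(base64_encode("attack"))                # → YXR0YWNr
print(base64_encode("attack", iterations=2))  # Base64 of the Base64 string
```

Each extra iteration makes the payload less recognizable to pattern matchers but also harder for the target model to decode, which is why the range is capped at 3.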
binary_encoding - Binary Encoding
Description: Represents text as binary (base-2) to obscure malicious instructions.
Example: “hi” → “01101000 01101001”
When to Use:
- Testing technical encoding understanding
- Advanced obfuscation scenarios
- Combined attack vectors
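The encoding shown in the example above can be reproduced as:

```python
def binary_encode(text: str) -> str:
    """Render each character as an 8-bit binary string."""
    return " ".join(format(ord(c), "08b") for c in text)

print(binary_encode("hi"))  # → 01101000 01101001
```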
hex_encoding - Hexadecimal Encoding
Description: Hex-encoded prompts for filter evasion using base-16 representation.
Example: “test” → “74657374”
When to Use:
- Technical content filters
- Programming-focused models
- Combined with other encodings
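In Python this is simply the hex representation of the prompt's UTF-8 bytes:

```python
def hex_encode(text: str) -> str:
    """Hex-encode the UTF-8 bytes of the prompt (base-16)."""
    return text.encode("utf-8").hex()

print(hex_encode("test"))  # → 74657374
```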
url_encoding - URL Encoding
Description: Percent-encoded characters to obscure intent using URL encoding standards.
Example: “hack me” → “hack%20me”
When to Use:
- Web-based applications
- URL/link processing models
- Combined obfuscation
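Standard-library `urllib.parse.quote` produces this encoding; passing `safe=""` forces percent-encoding of every reserved character rather than leaving `/` untouched (its default):

```python
from urllib.parse import quote

# Percent-encode all non-alphanumeric characters, including spaces.
print(quote("hack me", safe=""))  # → hack%20me
```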
obfuscation - General Obfuscation
Description: General obfuscation techniques including character substitution, spacing manipulation, and other text transformations.
When to Use:
- First-line static attack testing
- Complement to basic attacks
- Standard security assessment
1.3 Cipher & Character Substitution
leet_encoding - Leet Speak
Description: Alphanumeric character substitution popular in internet culture.
Example: “hack” → “h4ck”, “elite” → “31337”
When to Use:
- Testing character-level pattern matching
- Social engineering contexts
- Combined with other methods
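There is no single canonical leet alphabet; the table below is one common mapping chosen for illustration:

```python
# One common leet mapping; real-world variants substitute more
# characters (and sometimes multi-character glyphs like "|-|" for "h").
LEET_MAP = {"a": "4", "e": "3", "i": "1", "o": "0", "t": "7"}


def leet_encode(text: str) -> str:
    """Substitute characters per a simple leet-speak table."""
    return "".join(LEET_MAP.get(c.lower(), c) for c in text)

print(leet_encode("hack"))  # → h4ck
```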
rot13_encoding - ROT13 Cipher
Description: Caesar cipher with 13-position character rotation (A↔N, B↔O, etc.).
Example: “hello” → “uryyb”
When to Use:
- Testing cipher understanding
- Classic obfuscation technique
- Educational/demonstration purposes
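ROT13 ships with Python's standard library as a text codec, so no custom cipher code is needed:

```python
import codecs

# ROT13 is self-inverse: applying it twice recovers the original text.
print(codecs.encode("hello", "rot13"))  # → uryyb
```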
rot21_encoding - ROT21 Cipher
Description: Caesar cipher with 21-position character rotation.
Example: “test” → “ozno”
When to Use:
- Alternative to ROT13
- Testing cipher detection range
- Comprehensive cipher testing
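Unlike ROT13, there is no built-in codec for arbitrary rotations, but a general Caesar shift is short to write:

```python
def rot_n(text: str, n: int) -> str:
    """Caesar-rotate alphabetic characters by n positions; pass others through."""
    out = []
    for c in text:
        if c.isalpha():
            base = ord("A") if c.isupper() else ord("a")
            out.append(chr((ord(c) - base + n) % 26 + base))
        else:
            out.append(c)
    return "".join(out)

print(rot_n("test", 21))  # → ozno
```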
morse_encoding - Morse Code
Description: Represents text using Morse code dots and dashes.
Example: “SOS” → “... --- ...”
When to Use:
- Unique encoding tests
- Historical/educational contexts
- Comprehensive encoding coverage
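A sketch with an abbreviated lookup table; a full implementation would cover A-Z and 0-9:

```python
# Abbreviated Morse table for illustration only.
MORSE = {"S": "...", "O": "---"}


def morse_encode(text: str) -> str:
    """Encode text as space-separated Morse letters."""
    return " ".join(MORSE[c] for c in text.upper())

print(morse_encode("SOS"))  # → ... --- ...
```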
1.4 Multilingual Attacks
lang_fr - French
Description: Prompt translation to French for filter bypass. Many content filters are optimized for English.
When to Use:
- International models
- Testing language-specific defenses
- Comprehensive multilingual testing
lang_it - Italian
Description: Italian-language prompt injection to bypass English-focused filters.
lang_hi - Hindi
Description: Hindi-language adversarial prompts, useful for testing non-Latin script handling.
lang_es - Spanish
Description: Spanish-language attack vectors for Romance language testing.
lang_ja - Japanese
Description: Japanese-language jailbreak attempts, testing Asian language defenses.

1.5 Advanced Techniques
deep_inception - Deep Inception
Description: Nested multi-layer prompt injection using recursive context framing. Creates layered scenarios that progressively lead the model toward prohibited outputs.
When to Use:
- Advanced security research
- Testing sophisticated safety measures
- Comprehensive vulnerability assessment
2. Vision-Language Models (VLMs)
Input: Text + Image | Output: Text

2.1 Visual Manipulation Attacks
basic - Raw Prompts for VLM
Description: Direct adversarial prompts with unmodified images as baseline for VLM testing.
When to Use:
- Always include as baseline for VLM testing
- Quick visual content assessment
- Establishing VLM vulnerability baseline
masking - Image Masking
Description: Strategic occlusion or masking of image regions to manipulate context and bypass visual content filters.
When to Use:
- Testing visual content moderation
- Occlusion-based attacks
- Combined visual-text attacks
figstep - FigStep Attack
Description: Figure-based step-wise adversarial attack technique that uses sequential visual elements to build malicious context.
When to Use:
- Advanced VLM security testing
- Multi-step visual attacks
- Research-grade assessments
2.2 Research-Grade Attacks
hades - HADES (Research-Grade)
Description: Advanced visual jailbreak methodology using sophisticated image perturbation techniques.
When to Use:
- Research and academic testing
- Advanced VLM security assessment
- May require special access permissions
jood - JOOD (Research-Grade)
Description: Joint Out-of-Distribution attack leveraging distributional shifts in both visual and textual modalities.
When to Use:
- Research and academic testing
- OOD robustness evaluation
- May require special access permissions
3. Audio-Language Models (ALMs)
Input: Text + Audio | Output: Text

3.1 Audio-Based Attacks
basic - Raw Prompts for ALM
Description: Direct adversarial prompts with audio input as baseline for ALM testing.
When to Use:
- Always include as baseline for ALM testing
- Quick audio content assessment
- Establishing ALM vulnerability baseline
waveform - Waveform Manipulation
Description: Audio waveform modification to bypass safety guardrails through signal processing techniques.
When to Use:
- Testing audio content moderation
- Signal-level attack testing
- Comprehensive ALM security
echo - Echo Effect
Description: Echo-based audio manipulation to obscure or modify harmful audio content.
When to Use:
- Audio obfuscation testing
- Environmental effect bypass
- Combined audio attacks
speed - Speed Alteration
Description: Audio speed modification techniques (faster/slower playback) to bypass detection.
When to Use:
- Temporal manipulation testing
- Rate-based evasion
- Audio processing robustness
pitch - Pitch Shifting
Description: Pitch modification for audio obfuscation while maintaining intelligibility.
When to Use:
- Frequency-based evasion
- Voice transformation testing
- Audio filter bypass
reverb - Reverb Effect
Description: Reverb-based audio manipulation to obscure content through spatial effects.
When to Use:
- Environmental audio testing
- Spatial effect evasion
- Combined audio techniques
noise - Noise Injection
Description: Background noise injection techniques to obscure harmful content while maintaining comprehension.
When to Use:
- Noise robustness testing
- SNR-based evasion
- Real-world audio scenarios
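The core of SNR-based injection is scaling noise power relative to signal power. A pure-Python sketch of the idea; real pipelines would operate on NumPy arrays or audio files, and the function name here is illustrative:

```python
import math
import random


def add_noise(samples, snr_db=20.0, seed=0):
    """Mix Gaussian noise into a list of audio samples at a target
    signal-to-noise ratio, given in decibels (illustrative sketch)."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]

# One second of a 440 Hz tone at a 16 kHz sample rate, degraded to ~10 dB SNR.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_noise(tone, snr_db=10.0)
```

Lower `snr_db` values bury the content deeper in noise; the testing question is at what SNR the model still transcribes or obeys the masked instruction.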
Attack Method Combinations
Recommended Combinations
Starter Combination (Quick Testing)
Standard Combination (Balanced)
Advanced Combination (Comprehensive)
Best Practices
Start Simple: Begin with basic attacks to establish a baseline before adding complexity.
Progressive Testing
Match Your Model
Consider Cost
Related Pages
- Payload Guide - Overview and quick reference
- Configuration Reference - Detailed configuration options
- Examples - Complete payload examples