AI Security Arena

Prompt Engineering Training

Master prompt injection, jailbreak techniques, and defensive prompting through hands-on exercises

13 exercises found

Injectionbeginner

Basic Prompt Override

Learn how attackers attempt to override system instructions by injecting conflicting directives into user input.

Objective: Understand how prompt injection works by overriding system-level instructions with user-level directives.

Injectionintermediate

Indirect Prompt Injection

Explore how malicious instructions embedded in external data sources can compromise AI behavior when the model processes untrusted content.

Objective: Understand indirect prompt injection where malicious payloads arrive through data the model processes, not direct user input.

Injectionadvanced

Delimiter Escape Attack

Explore how attackers use special characters and delimiters to break out of intended input boundaries and inject new instructions.

Objective: Learn how delimiter-based input sanitization can be bypassed and why robust parsing is essential.

Jailbreakbeginner

Role-Play Jailbreak

Understand how fictional framing and role-play scenarios are used to bypass AI safety guardrails.

Objective: Recognize how role-play and fictional framing are common jailbreak vectors that attempt to shift the model's compliance context.

Jailbreakadvanced

Token Smuggling

Explore techniques where harmful requests are fragmented, encoded, or obfuscated to evade content filters.

Objective: Understand how token-level and encoding-based obfuscation can be used to smuggle harmful requests past safety filters.

Jailbreakexpert

Multi-Turn Escalation

Study how seemingly innocent conversation turns can gradually escalate toward restricted outputs through incremental boundary pushing.

Objective: Recognize how multi-turn conversations can be weaponized to gradually erode safety boundaries through contextual manipulation.

Defenseintermediate

System Prompt Hardening

Learn techniques for writing robust system prompts that resist injection and jailbreak attempts.

Objective: Develop skills in writing defensive system prompts that maintain model behavior under adversarial conditions.

Defenseintermediate

Output Filtering Strategies

Design output validation rules that catch potentially harmful model responses before they reach users.

Objective: Understand the principles and trade-offs of output filtering as a defense-in-depth layer for AI systems.

Defenseadvanced

Guardrail Architecture

Design a multi-layered guardrail system combining input validation, prompt engineering, and output filtering.

Objective: Learn to design defense-in-depth architectures for AI systems that combine multiple security layers.

Extractionintermediate

System Prompt Extraction

Understand how attackers attempt to extract hidden system prompts from AI models through carefully crafted queries.

Objective: Understand system prompt extraction techniques and why protecting system-level instructions is a critical security concern.

Extractionexpert

Training Data Probing

Explore techniques used to probe AI models for memorized training data, including personally identifiable information.

Objective: Understand the risks of training data memorization and techniques used to audit models for data leakage.

Alignmentadvanced

Value Alignment Conflicts

Explore scenarios where different alignment objectives (helpfulness, harmlessness, honesty) come into tension with each other.

Objective: Understand the fundamental alignment challenges in AI systems where core values (helpful, harmless, honest) can conflict.

Alignmentexpert

Reward Hacking Scenarios

Study how AI systems might find unintended shortcuts to satisfy their objectives without truly fulfilling the intended goal.

Objective: Recognize how naive optimization objectives can lead to reward hacking, and why robust alignment requires careful objective specification.

Prompt Engineering Training

Master prompt injection, jailbreak techniques, and defensive prompting through hands-on exercises

13 exercises found

Injectionbeginner

Basic Prompt Override

Learn how attackers attempt to override system instructions by injecting conflicting directives into user input.

Objective: Understand how prompt injection works by overriding system-level instructions with user-level directives.

Injectionintermediate

Indirect Prompt Injection

Explore how malicious instructions embedded in external data sources can compromise AI behavior when the model processes untrusted content.

Objective: Understand indirect prompt injection where malicious payloads arrive through data the model processes, not direct user input.

Injectionadvanced

Delimiter Escape Attack

Explore how attackers use special characters and delimiters to break out of intended input boundaries and inject new instructions.

Objective: Learn how delimiter-based input sanitization can be bypassed and why robust parsing is essential.

Jailbreakbeginner

Role-Play Jailbreak

Understand how fictional framing and role-play scenarios are used to bypass AI safety guardrails.

Objective: Recognize how role-play and fictional framing are common jailbreak vectors that attempt to shift the model's compliance context.

Jailbreakadvanced

Token Smuggling

Explore techniques where harmful requests are fragmented, encoded, or obfuscated to evade content filters.

Objective: Understand how token-level and encoding-based obfuscation can be used to smuggle harmful requests past safety filters.

Jailbreakexpert

Multi-Turn Escalation

Study how seemingly innocent conversation turns can gradually escalate toward restricted outputs through incremental boundary pushing.

Objective: Recognize how multi-turn conversations can be weaponized to gradually erode safety boundaries through contextual manipulation.

Defenseintermediate

System Prompt Hardening

Learn techniques for writing robust system prompts that resist injection and jailbreak attempts.

Objective: Develop skills in writing defensive system prompts that maintain model behavior under adversarial conditions.

Defenseintermediate

Output Filtering Strategies

Design output validation rules that catch potentially harmful model responses before they reach users.

Objective: Understand the principles and trade-offs of output filtering as a defense-in-depth layer for AI systems.

Defenseadvanced

Guardrail Architecture

Design a multi-layered guardrail system combining input validation, prompt engineering, and output filtering.

Objective: Learn to design defense-in-depth architectures for AI systems that combine multiple security layers.

Extractionintermediate

System Prompt Extraction

Understand how attackers attempt to extract hidden system prompts from AI models through carefully crafted queries.

Objective: Understand system prompt extraction techniques and why protecting system-level instructions is a critical security concern.

Extractionexpert

Training Data Probing

Explore techniques used to probe AI models for memorized training data, including personally identifiable information.

Objective: Understand the risks of training data memorization and techniques used to audit models for data leakage.

Alignmentadvanced

Value Alignment Conflicts

Explore scenarios where different alignment objectives (helpfulness, harmlessness, honesty) come into tension with each other.

Objective: Understand the fundamental alignment challenges in AI systems where core values (helpful, harmless, honest) can conflict.

Alignmentexpert

Reward Hacking Scenarios

Study how AI systems might find unintended shortcuts to satisfy their objectives without truly fulfilling the intended goal.

Objective: Recognize how naive optimization objectives can lead to reward hacking, and why robust alignment requires careful objective specification.