
What is Guardrails?
Guardrails is a tool that helps your organization adopt Large Language Models (LLMs) faster. It provides an API that detects and prevents security and privacy challenges such as Prompt Injection, Toxicity, NSFW content, PII, and more.
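As a rough sketch of how such an API could sit in front of an LLM, the snippet below screens user input before it is forwarded to a model. The endpoint URL and the response field `flagged` are assumptions for illustration, not the documented Guardrails schema.

```python
import requests

# Hypothetical endpoint; the real URL and response schema depend on your Guardrails deployment.
GUARDRAILS_URL = "https://guardrails.example.com/api/check"

def is_safe(text: str) -> bool:
    """Run the input through Guardrails' detectors before forwarding it to an LLM.
    Assumes the response carries a top-level boolean 'flagged' field."""
    response = requests.post(GUARDRAILS_URL, json={"text": text}, timeout=5)
    response.raise_for_status()
    return not response.json().get("flagged", True)  # treat a missing field as unsafe

user_input = "Summarize this quarterly report for me."
if is_safe(user_input):
    print("Input passed all detectors; forwarding to the LLM.")
else:
    print("Input blocked by Guardrails.")
```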
What are detectors and what are they used for?
Detectors are a set of rules that Guardrails uses to detect and prevent security and privacy issues. They identify potential issues in the text passed through the API and return results you can act on.
Prompt Injection Detector
Recognizes patterns that may indicate a prompt injection attack or jailbreak attempt, protecting the system from malicious inputs. The detector returns a binary value (0 for benign, 1 for attack) indicating whether the input is a potential injection attempt, along with a confidence level for the detection.
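For illustration, assuming the detector's result is exposed as a dictionary with `label` and `confidence` fields (field names are assumptions, not the documented schema), a simple handling policy might look like this:

```python
def handle_prompt_injection(result: dict, threshold: float = 0.8) -> bool:
    """Decide whether to allow an input based on the detector's binary label and
    confidence. The field names 'label' and 'confidence' are assumed for this sketch."""
    label = result.get("label", 1)             # 0 = benign, 1 = potential attack
    confidence = result.get("confidence", 0.0)

    # Block only high-confidence detections; borderline cases could be routed to review.
    if label == 1 and confidence >= threshold:
        return False  # block the input
    return True       # allow the input

# Example detector outputs (illustrative values only).
print(handle_prompt_injection({"label": 1, "confidence": 0.97}))  # False -> blocked
print(handle_prompt_injection({"label": 0, "confidence": 0.92}))  # True  -> allowed
```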
Toxicity Detector
The detector provides a detailed analysis of the text’s toxicity, indicating the presence and levels of general toxicity, obscene language, insults, threats, and identity hate. The detector returns a list of dictionaries, each containing the following information:
- score: The toxicity score of the text on a scale of 0 to 100, with higher scores indicating a higher level of toxicity.
- type: The type of toxicity detected, such as “obscene”, “insult”, “threat”, or “identity_hate”.
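As an example of consuming this output, assuming the list-of-dictionaries shape described above (the sample scores below are illustrative only), a simple policy could flag any category whose score crosses a threshold:

```python
# Illustrative toxicity detector output: a list of dictionaries with
# 'type' and 'score' (0 to 100) keys, as described above.
toxicity_report = [
    {"type": "toxicity", "score": 72},
    {"type": "obscene", "score": 15},
    {"type": "insult", "score": 64},
    {"type": "threat", "score": 3},
    {"type": "identity_hate", "score": 1},
]

THRESHOLD = 50  # tune per application; stricter channels may use a lower value

flagged = [entry["type"] for entry in toxicity_report if entry["score"] >= THRESHOLD]

if flagged:
    print(f"Blocked: elevated scores for {', '.join(flagged)}")
else:
    print("Text passed the toxicity check.")
```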