Hero Light

What is Guardrails?

Guardrails is a powerful tool that help with faster adoption of Large Language Models (LLMs) in your organization. It provides an API to that detects and prevents security and privacy challenges such as Prompt Injection, Toxicity, NSFW, PII, and more.

What are detectors and what are they used for?

Detectors are a set of rules that Guardrails uses to detect and prevent security and privacy issues. They are used to identify potential issues in your code and provide guidance on how to fix them.

Prompt Injection Detector

Recognizes patterns that may indicate an prompt injection attack or jailbreak attempt protecting the system from malicious inputs. The detector returns a binary value (0 for benign, 1 for attack) for indicating whether the input is a potential injection attempt and also provides confidence levels for the detection.

Toxicity Detector

The detector provides a detailed analysis of the text’s toxicity, indicating the presence and levels of general toxicity, obscene language, insults, threats, and identity hate. The detector returns a list of dictionaries, each containing the following information:

  • score: The score of the text on a scale of 0 to 100, with higher scores indicating a higher level of toxicity.
  • type: The type of toxicity detected, such as “obscene”, “insult”, “threat”, or “identity_hate”.

NSFW Detector

Analyses text for content that may be inappropriate for workplace environments, such as explicit language or adult themes, ensuring professionalism and compliance with workplace standards. The detector returns a binary value (0 for sfw, 1 for nsfw) indicating whether the text is not safe for work.

PII Detector

Detects the presence of Personally Identifiable Information (PII) within the text, helping to prevent the unintentional disclosure of sensitive data. The detector returns secrets, PII, IP addresses, and URLs.

Topic Detector

Analyzes text to determine if it aligns with the given specific topic or field of interest. You can provide the topic keyword as a string or a list of strings. The detector returns a binary value (0 for off-topic, 1 for on-topic) indicating whether the text aligns with the given topic.

Keyword Detector

This Detector is designed to scan text for specific words or phrases that are predefined as significant, sensitive, or requiring special attention, such as banned or proprietary terms. In the provided output, you can expect to see which keywords were detected, the count of each keyword’s occurrence, and a version of the text with the detected keywords redacted to maintain confidentiality or compliance.

Hallucination Detector

This Detector checks the given output of a model for possibile hallucination given the context of the original prompt request. The detector returns a 1 for possible hallucination detected and 0 for no hallucination detected.