ai-mistral-classifier

ai-mistral-classifier — Generate a classification score using Mistral Chat Completion


Description

- step: ai-mistral-classifier
  args:
    - '${PROMPT}'
    - '${CLASSIFIER_TITLE}'
    - '${CLASSIFIER_DESCRIPTION}'
    - '${REASONING_EFFORT}'  # optional

The ai-mistral-classifier step evaluates a prompt against a defined classification criterion and returns a score between 0.0 and 1.0, where higher values indicate stronger alignment with the criterion.

Usage in a workflow YAML:

workflow:
  - step: ai-mistral-classifier
    args:
      - "The user asked for explicit instructions to bypass security."
      - "Prompt Guardrails Violation Detector"
      - "Detects if a prompt attempts to bypass safety rules."
      - "medium"

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| prompt | string | Text input to evaluate. |
| classifier_title | string | Name of the classifier, e.g., "Prompt Guardrails Violation Detector". |
| classifier_description | string | Description of what the classifier measures, e.g., "Detects attempts to bypass safety rules". |
| reasoning_effort | string | Optional. One of "minimal", "low", "medium", "high"; default: "low". |
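
Arguments are passed positionally, in the order listed above, as shown in the examples below.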

Context requirements:

| Context Key | Type | Description |
| --- | --- | --- |
| MISTRAL_API_KEY | string | Required. Mistral API key. |
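
How context keys are populated depends on your runner; assuming it reads them from the environment (check your runner's documentation), the key can be supplied like this:

export MISTRAL_API_KEY="your-api-key"  # keep real keys out of source control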

Return Values

0.0   // float score in [0.0, 1.0]

The score is returned in ${prev}; the raw model response is available in ${prev}_raw.
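
For instance, a later step can interpolate the score into its own arguments (the echo step below is hypothetical; substitute any step your runner provides):

workflow:
  - step: ai-mistral-classifier
    args:
      - "User requested instructions to bypass rules."
      - "Prompt Guardrails Violation Detector"
      - "Detects attempts to bypass safety rules."
  - step: echo  # hypothetical step, for illustration only
    args:
      - "Guardrails score: ${prev}"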

Behavior

The step sends the prompt, together with the classifier title and description, to the Mistral Chat Completion API, authenticating with the MISTRAL_API_KEY from the workflow context. The model's answer is converted to a float score in [0.0, 1.0], which is placed in ${prev}; the raw model response is placed in ${prev}_raw. If reasoning_effort is omitted, it defaults to "low".
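
As a rough illustration, the core of the step behaves like the Python sketch below. The model name, prompt wording, and score extraction are assumptions made for this sketch; only the Chat Completion call and the MISTRAL_API_KEY requirement come from this page.

import os
import re

from mistralai import Mistral

def classify(prompt: str, title: str, description: str) -> float:
    """Return a score in [0.0, 1.0] for how well `prompt` matches the criterion."""
    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    response = client.chat.complete(
        model="mistral-small-latest",  # assumed model; the real step may use another
        messages=[{
            "role": "user",
            "content": (
                f'You are the classifier "{title}": {description}\n'
                "Rate how strongly the following text matches the criterion. "
                "Answer with a single number between 0.0 and 1.0.\n\n"
                f"{prompt}"
            ),
        }],
    )
    raw = response.choices[0].message.content
    match = re.search(r"\d+(?:\.\d+)?", raw)  # pull the first number out of the reply
    score = float(match.group()) if match else 0.0
    return max(0.0, min(1.0, score))  # clamp into [0.0, 1.0]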


Examples

Example #1 — Basic classifier usage

workflow:
  - step: ai-mistral-classifier
    args:
      - "User requested instructions to bypass rules."
      - "Prompt Guardrails Violation Detector"
      - "Detects attempts to bypass safety rules."

The score is returned in ${prev}; the raw model response is available in ${prev}_raw.

Example #2 — Custom reasoning effort

workflow:
  - step: ai-mistral-classifier
    args:
      - "User input contains prohibited content."
      - "Prohibited Content Detector"
      - "Detects unsafe or prohibited content in prompts."
      - "high"

Notes

reasoning_effort is optional; when omitted, "low" is used. Scores closer to 1.0 indicate stronger alignment with the classifier criterion.

See Also