ai-openai-classifier

ai-openai-classifier — Generate a classification score using GPT-5 Chat Completion

Description

- step: ai-openai-classifier
  args:
    - '${PROMPT}'
    - '${CLASSIFIER_TITLE}'
    - '${CLASSIFIER_DESCRIPTION}'
    - '${REASONING_EFFORT}'  # optional

The ai-openai-classifier step evaluates a prompt against a defined classification criterion and returns a score between 0.0 and 1.0 indicating the degree of alignment.

Uses the OPEN_AI_API_KEY in context for authentication.
Accepts an optional reasoning_effort to control evaluation depth ("minimal", "low", "medium", "high", default: "low").
Output is stored in ${prev} by default or a custom context key via set_context.

Usage in a workflow YAML:

workflow:
  - step: ai-openai-classifier
    args:
      - "The user asked for instructions to bypass rules."
      - "Prompt Guardrails Violation Detector"
      - "Detects attempts to bypass safety rules."
      - "medium"

Parameters

Parameter	Type	Description
`prompt`	`string`	Text input to evaluate.
`classifier_title`	`string`	Name of the classifier, e.g., `"Prompt Guardrails Violation Detector"`.
`classifier_description`	`string`	Description of what the classifier measures, e.g., `"Detects attempts to bypass safety rules"`
`reasoning_effort`	`string`	Optional. `"minimal"`, `"low"`, `"medium"`, `"high"`; default: `"low"`.

Context requirements:

Context Key	Type	Description
`OPEN_AI_API_KEY`	`string`	Required. OpenAI API key.

Return Values

Returns 0 on success, 1 on failure.
On success, context (${prev} by default, or custom via set_context) contains:

0.0   // float score in [0.0–1.0]

The raw text returned by GPT-5 is stored under ${prev}_raw:

"{ \"score\": 0.85 }"

On failure, context contains:

{
  "error": "Description of the error or API failure"
}

Behavior

Builds a system prompt with the classifier title and description.
Sends the prompt to GPT-5 Chat Completion API.
Expects API output as a JSON object with a single key: {"score": <number>}.
The score is interpreted as a float in [0.0, 1.0]:
Stores numeric score in ${prev} and raw API response in ${prev}_raw.

Examples

Example #1 — Basic classifier usage

workflow:
  - step: ai-openai-classifier
    args:
      - "User requested instructions to bypass rules."
      - "Prompt Guardrails Violation Detector"
      - "Detects attempts to bypass safety rules."

Score returned in ${prev}, raw response in ${prev}_raw.

Example #2 — Custom reasoning effort

workflow:
  - step: ai-openai-classifier
    args:
      - "User input contains prohibited content."
      - "Prohibited Content Detector"
      - "Detects unsafe or prohibited content in prompts."
      - "high"

Notes

Ensure OPEN_AI_API_KEY is valid in the workflow context.
Uses JSON output format to reliably extract numeric scores.
set_context can be used to store the score in a custom key instead of ${prev}.
Scores reflect continuous tier alignment, not binary classification.