
Policies and Rules

What you'll learn: What a policy is, how rules work, the two scopes, and how severity maps to actions.


What is a policy?

A policy is an instantiation of a rule with specific parameters, severity, scope, and optional scoping to an agent or risk classification.

policy = rule + params + scope + severity + (optional agent_id) + (optional risk_classification)

The same rule can back many policies. For example, field_matches_regex can be instantiated once to check for SSNs, once for credit card numbers, and once for email domains — three policies, one rule.
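As a rough illustration of "three policies, one rule", the sketch below models a policy as plain data. The `Policy` dataclass, the policy names, the parameter keys (`field`, `pattern`), and the regex patterns are all assumptions made for this example; they are not the platform's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    """Hypothetical container: rule + params + scope + severity + optional scoping."""
    name: str
    rule: str                                  # name of the rule backing this policy
    params: dict                               # rule-specific parameters
    scope: str                                 # "agent_registration" or "step_execution"
    severity: str                              # "low", "medium", "high", or "critical"
    agent_id: Optional[str] = None             # None: applies to all agents
    risk_classification: Optional[str] = None  # None: applies to all classifications


# Three policies backed by the same rule; only the parameters and severity differ.
# The parameter keys and patterns are illustrative, not the built-in rule's real contract.
POLICIES = [
    Policy("SSN check", "field_matches_regex",
           {"field": "output", "pattern": r"\b\d{3}-\d{2}-\d{4}\b"},
           scope="step_execution", severity="critical"),
    Policy("Credit card check", "field_matches_regex",
           {"field": "output", "pattern": r"\b(?:\d[ -]?){13,16}\b"},
           scope="step_execution", severity="high"),
    Policy("Email domain check", "field_matches_regex",
           {"field": "recipient", "pattern": r"@example\.com$"},
           scope="step_execution", severity="low"),
]
```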

What is a rule?

A rule (or rule function) is a pure Python function that returns True if the policy passes and False if it is violated:

def rule(data: dict, params: dict, context: RuleContext) -> bool:
    """Return True if the policy passes; False if it is violated."""

Rules perform no I/O. They read the step's data, the policy's parameters, and the RuleContext (which provides access to task history and agent metadata). The engine ships with 26 built-in rules.
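A minimal sketch of a custom rule that follows this signature is shown below. The `RuleContext` class here is a stand-in defined only so the example runs on its own; the real class's attribute names may differ. The rule `field_not_empty` and its `field` parameter are hypothetical and are not one of the built-in rules.

```python
from dataclasses import dataclass, field


@dataclass
class RuleContext:
    """Stand-in for the engine's RuleContext; real attribute names may differ."""
    task_history: list = field(default_factory=list)
    agent_metadata: dict = field(default_factory=dict)


def field_not_empty(data: dict, params: dict, context: RuleContext) -> bool:
    """Return True if the policy passes; False if it is violated.

    Passes when the field named by params["field"] is a non-blank string.
    The function is pure: it reads only its arguments and performs no I/O.
    """
    value = data.get(params["field"])
    return isinstance(value, str) and value.strip() != ""
```

Because the function depends only on its inputs, it can be unit-tested without a running engine.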

Policy scopes

Policies have one of two scopes:

agent_registration

Evaluated once, when an agent registers with the platform. These policies check the agent's metadata — purpose, declared tools, risk classification.

Example: "Agent must declare a substantive purpose" checks that the purpose field matches ^.{30,}$.
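The pattern `^.{30,}$` accepts any single-line string of at least 30 characters. The sketch below shows that check in isolation; the helper name `purpose_is_substantive` and the shape of the metadata dict are assumptions for illustration, not the engine's own code.

```python
import re

# At least 30 characters on a single line ("." does not match a newline by default).
PURPOSE_PATTERN = re.compile(r"^.{30,}$")


def purpose_is_substantive(agent_metadata: dict) -> bool:
    """Return True if the agent's declared purpose matches ^.{30,}$."""
    purpose = agent_metadata.get("purpose") or ""
    return PURPOSE_PATTERN.match(purpose) is not None
```

A purpose such as "Chatbot" fails this check, while a full sentence describing what the agent does passes.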

step_execution

Evaluated before every step the agent takes. These policies check the intended Behavior against the task's history and the agent's context.

Example: "Code execution requires a preceding gate" checks that a step.gate exists earlier in the task history before allowing step.exec.
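The gate check in this example can be sketched as a function over the task history. This assumes each step is a dict with a `step_type` key and that the history is ordered oldest first; both are assumptions for illustration, and the engine's actual step representation may differ.

```python
def exec_has_preceding_gate(step: dict, history: list) -> bool:
    """Return True if the step is allowed; False if step.exec has no earlier step.gate.

    `history` holds the steps already taken in this task, oldest first.
    Steps other than step.exec are not constrained by this check.
    """
    if step.get("step_type") != "step.exec":
        return True
    return any(past.get("step_type") == "step.gate" for past in history)
```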

Severity and actions

Each policy has a severity level that maps to an action when the policy is violated:

| Severity | Risk score | Action on violation | Engine behaviour |
| --- | --- | --- | --- |
| low | 0.25 | warn | evaluate() returns normally. Incident webhook fires if configured. |
| medium | 0.50 | warn | evaluate() returns normally. Incident webhook fires if configured. |
| high | 0.75 | warn | evaluate() returns normally. Incident webhook fires if configured. |
| critical | 1.0 | block | evaluate() raises KyvvuBlockedError. Incident webhook fires if configured. |

The engine derives the action from a risk score. Each violated policy contributes the risk score of its severity level, and the scores are aggregated into a single value. The default aggregator is aggregate_max (worst-case severity wins):

  • Risk score 0.0 → allow (no violations)

  • Risk score in (0.0, 1.0) → warn (low, medium, or high severity violation)

  • Risk score 1.0 → block (critical severity violation)

Only critical severity blocks execution. Lower severities (low, medium, high) produce warnings and incident reports but do not prevent the step from running. This is by design — most policies should warn during development and only block for truly critical safety boundaries.
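The mapping from violations to an action can be sketched as follows. The severity scores and thresholds come from the table above; the function bodies are an illustrative re-implementation and not the engine's source. Only the name `aggregate_max` is taken from the documentation, while `action_for` and `decide` are hypothetical helpers.

```python
SEVERITY_SCORE = {"low": 0.25, "medium": 0.50, "high": 0.75, "critical": 1.0}


def aggregate_max(scores: list) -> float:
    """Worst-case aggregation: the highest score wins; no violations gives 0.0."""
    return max(scores, default=0.0)


def action_for(risk_score: float) -> str:
    """Map an aggregated risk score to allow, warn, or block."""
    if risk_score == 0.0:
        return "allow"
    if risk_score < 1.0:
        return "warn"
    return "block"


def decide(violated_severities: list) -> str:
    """Take the severities of all violated policies and return the resulting action."""
    scores = [SEVERITY_SCORE[severity] for severity in violated_severities]
    return action_for(aggregate_max(scores))
```

With max aggregation, any number of low, medium, or high violations still results in a warning; a single critical violation is enough to block.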

Policy scoping

Policies can be scoped to specific agents or risk classifications:

| Field | Effect |
| --- | --- |
| agent_id | Policy applies only to this agent. If null, applies to all agents. |
| risk_classification | Policy applies only to agents with this classification. If null, applies to all. |
| enabled | Toggle. Disabled policies are skipped during evaluation. |
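A minimal sketch of how these three fields might combine when selecting policies for an agent is shown below. It assumes policies are dicts, that a missing or null field means "no restriction", and that a policy setting both `agent_id` and `risk_classification` must match on both. The source does not state how the two fields combine, so treat that last point in particular as an assumption.

```python
def policy_applies(policy: dict, agent_id: str, risk_classification: str) -> bool:
    """Return True if the policy should be evaluated for the given agent."""
    if not policy.get("enabled", True):
        return False  # disabled policies are skipped
    scoped_agent = policy.get("agent_id")
    if scoped_agent is not None and scoped_agent != agent_id:
        return False  # scoped to a different agent
    scoped_class = policy.get("risk_classification")
    if scoped_class is not None and scoped_class != risk_classification:
        return False  # scoped to a different risk classification
    return True
```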

How policies are fetched

The engine fetches policies from the platform API and caches them in memory:

  1. On first evaluation, the engine calls GET /api/v1/policies?agent_key={agent_key}&enabled=true.

  2. Policies are cached for KV_POLICY_TTL_SECONDS (default: 300 seconds).

  3. When the TTL expires, the engine re-fetches on the next evaluate() call.

  4. If the API is unreachable, the engine continues with the previously cached policies.

Policy changes made in the dashboard or via the API take effect within the TTL window — no agent restart required.
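The fetch-and-cache behaviour described in the steps above can be sketched as a small TTL cache with a stale fallback. The `PolicyCache` class, its `fetch` callable, and the use of `ConnectionError` to signal an unreachable API are assumptions for illustration. The source also does not say what happens when the very first fetch fails, so this sketch simply re-raises in that case.

```python
import time
from typing import Callable, Optional


class PolicyCache:
    """Illustrative TTL cache: re-fetch after expiry, keep stale data if the fetch fails."""

    def __init__(self, fetch: Callable[[], list], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical callable wrapping the policies API call
        self._ttl = ttl_seconds      # corresponds to KV_POLICY_TTL_SECONDS
        self._clock = clock
        self._policies: Optional[list] = None
        self._fetched_at = 0.0

    def get(self) -> list:
        now = self._clock()
        if self._policies is not None and now - self._fetched_at < self._ttl:
            return self._policies    # still within the TTL window
        try:
            self._policies = self._fetch()
            self._fetched_at = now
        except ConnectionError:
            if self._policies is None:
                raise                # nothing cached yet; behaviour here is an assumption
            # API unreachable: keep serving the previously cached policies.
        return self._policies
```

Because the timestamp is not updated on a failed fetch, this sketch retries on the next call rather than waiting for another full TTL window.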

