
Tasks and History

What you'll learn: What a task is, how history enables path-dependent policies, and how task lifecycle and memory management work.


What is a task?

A task is one end-to-end agent execution, identified by a task_id. It represents everything an agent does from start to finish for a single request, conversation turn, or job.

The task's history is the ordered list of completed steps within it. History is what makes path-dependent policies possible — the engine doesn't just check the current step, it checks the current step in the context of everything that came before.

Task lifecycle

start_task()
  |
  v
evaluate() -> execute -> record()    (repeat per step)
  |
  v
end_task()  or  error_task()

start_task()

Begins a new task. Returns a task_id. Optionally accepts a caller-provided task_id.

The SDK emits a task.start Behavior that flows through the normal evaluate() path — registration-check and step policies both run.

evaluate() / record() cycle

For each step within the task:

  1. evaluate(intended, context) — checks the intended Behavior against policies and history. Returns allow, warn, or block.

  2. The caller executes the step.

  3. record(completed_step) — appends the step (with output) to the task's history. Assigns a monotonic step number.
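The cycle above can be sketched with a toy in-memory engine. This is a minimal illustration only: `ToyEngine`, its method signatures, and the single hard-coded policy are assumptions made for this example and are not the SDK's actual API.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class ToyEngine:
    """Hypothetical stand-in for the engine; not the SDK's real interface."""

    histories: dict = field(default_factory=dict)

    def start_task(self, task_id=None):
        # Accept a caller-provided task_id, or generate one.
        task_id = task_id or str(uuid4())
        self.histories[task_id] = []
        return task_id

    def evaluate(self, task_id, intended):
        # Toy policy (assumption): block step.exec unless a step.gate
        # has already been recorded in this task's history.
        history = self.histories[task_id]
        gated = any(step["type"] == "step.gate" for step in history)
        if intended == "step.exec" and not gated:
            return "block"
        return "allow"

    def record(self, task_id, step_type, output):
        # Append the completed step and assign a monotonic step number.
        history = self.histories[task_id]
        step = {"n": len(history) + 1, "type": step_type, "output": output}
        history.append(step)
        return step["n"]


engine = ToyEngine()
tid = engine.start_task()
verdicts = []
for step_type in ["step.exec", "step.gate", "step.exec"]:
    verdict = engine.evaluate(tid, step_type)
    verdicts.append(verdict)
    if verdict != "block":
        # The caller would execute the step here, then record it.
        engine.record(tid, step_type, output="ok")

step_numbers = [step["n"] for step in engine.histories[tid]]
```

In this sketch the first `step.exec` is blocked because no gate has been recorded yet, while the second is allowed once `step.gate` is in the history.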

end_task(task_id)

Normal completion. The SDK emits a task.end Behavior through evaluate(), then calls end_task() on the engine. This:

  • Evicts the task's history from memory

  • Flushes buffered logs to the log endpoint (if configured)

No policies run in end_task() itself; it is cleanup only. To enforce task-end invariants, use policies that match on the task.end Behavior type.

error_task(task_id)

Abnormal termination. Same cleanup as end_task(), but emits a task.error Behavior first. History is evicted identically.
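One way a caller might guarantee that every task ends with exactly one of the two terminators is a try/except wrapper. The sketch below uses a hypothetical `StubEngine` that only records which lifecycle calls were made; the `run_task` helper and the no-argument `start_task()` are assumptions for illustration, not part of the SDK.

```python
class StubEngine:
    """Hypothetical stub that records lifecycle calls in order."""

    def __init__(self):
        self.calls = []

    def start_task(self):
        self.calls.append("start_task")
        return "task-1"

    def end_task(self, task_id):
        self.calls.append("end_task")

    def error_task(self, task_id):
        self.calls.append("error_task")


def run_task(engine, work):
    """Run work(task_id), then close the task normally or abnormally."""
    task_id = engine.start_task()
    try:
        result = work(task_id)
    except Exception:
        # Abnormal termination: same cleanup, but flagged as an error.
        engine.error_task(task_id)
        raise
    # Normal completion.
    engine.end_task(task_id)
    return result


ok_engine = StubEngine()
ok_result = run_task(ok_engine, lambda task_id: "done")


def failing_work(task_id):
    raise RuntimeError("step failed")


bad_engine = StubEngine()
try:
    run_task(bad_engine, failing_work)
except RuntimeError:
    pass
```

Either way the task's history is released, so a crash in the agent's own code does not leave the task waiting for the sweeper.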

Path-dependent evaluation

Because evaluate() reads the full task history, policies can express path-dependent constraints:

  • "Code execution requires a preceding gate" — step_requires_gate checks whether step.gate exists anywhere in history before step.exec.

  • "External content taint requires a fresh gate" — step_directly_preceded_by checks whether the immediately previous step was a step.gate.

  • "No more than 50 resource calls" — execution_max_steps counts step.resource entries in history.

  • "Forbidden sequence" — sequence_forbidden checks whether a specific ordered sequence has occurred.

The same step can be allowed or blocked depending on what happened earlier. This is the core of the "policies on paths" model.
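To make the four checks above concrete, here is a minimal sketch that models history as a plain list of Behavior type strings. The function names mirror the policy names in the list, but the bodies, parameters, and return conventions are illustrative assumptions, not the engine's implementation. In particular, `sequence_forbidden` is assumed here to match an ordered subsequence that need not be contiguous.

```python
def step_requires_gate(history, intended):
    """True if the intended step may run: step.exec needs an earlier step.gate."""
    if intended != "step.exec":
        return True
    return "step.gate" in history


def step_directly_preceded_by(history, required="step.gate"):
    """True if the immediately previous step has the required type."""
    return bool(history) and history[-1] == required


def execution_max_steps(history, step_type="step.resource", limit=50):
    """True while one more step of this type would stay within the limit."""
    return history.count(step_type) < limit


def sequence_forbidden(history, sequence):
    """True if the steps in sequence have occurred in that order.

    Assumption: other steps may appear in between (subsequence match).
    """
    remaining = iter(history)
    # Each membership test consumes the iterator up to the match,
    # so later items must appear after earlier ones.
    return all(step in remaining for step in sequence)


history = ["step.resource", "step.gate", "step.resource"]
exec_allowed = step_requires_gate(history, "step.exec")
gate_is_fresh = step_directly_preceded_by(history)
```

With this history, `step.exec` passes the "gate anywhere" check but fails the "fresh gate" check, because a `step.resource` was recorded after the gate.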

Memory management

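Task history lives in memory, keyed by task_id. Explicit termination and an automatic sweeper keep it from growing without bound; pause-aware eviction and flush-before-evict refine how the sweeper behaves: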

Normal termination: end_task(task_id) evicts history from memory and flushes buffered logs.

Automatic background sweeper: A daemon thread calls sweep_stale_tasks() every KV_SWEEP_INTERVAL_SECONDS (default: 300s / 5 minutes). Tasks older than KV_TASK_MAX_AGE_SECONDS (default: 3600s / 1 hour) are evicted automatically — no manual wiring needed.

Pause-aware eviction: When a task records a task.idle Behavior (e.g., waiting for human input), the sweep TTL clock resets. This prevents idle-but-legitimate tasks from being evicted prematurely. If the task remains idle beyond a full TTL window with no further activity, it is still evicted.

Flush-before-evict: When the sweeper evicts an abandoned task, it attempts a best-effort batch post of the buffered steps to the log endpoint (if KV_SWEEP_FLUSH_ON_EVICT=true, which is the default). This captures partial traces for debugging even when tasks crash.

| Config | Default | Purpose |
| --- | --- | --- |
| KV_TASK_MAX_AGE_SECONDS | 3600 | Max age before a task is considered abandoned |
| KV_SWEEP_INTERVAL_SECONDS | 300 | How often the sweeper runs |
| KV_SWEEP_ENABLED | true | Set to false to disable automatic sweeping |
| KV_SWEEP_FLUSH_ON_EVICT | true | Post partial traces before discarding |
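The eviction rules in this section can be sketched as a small in-memory store. `ToyTaskStore`, its fields, and the injectable `clock` and `flush` parameters are hypothetical names for this example. It assumes a task's age is measured from its start and reset only by a task.idle step, which is one reading of the description above; the real sweeper runs on a daemon thread, which this sketch leaves out.

```python
import time


class ToyTaskStore:
    """Hypothetical model of TTL eviction with idle reset and flush-on-evict."""

    def __init__(self, max_age=3600.0, flush=None, clock=time.monotonic):
        self.max_age = max_age  # plays the role of KV_TASK_MAX_AGE_SECONDS
        self.flush = flush      # called on evict, like KV_SWEEP_FLUSH_ON_EVICT
        self.clock = clock
        self.tasks = {}

    def start(self, task_id):
        self.tasks[task_id] = {"steps": [], "clock_start": self.clock()}

    def record(self, task_id, step_type):
        task = self.tasks[task_id]
        task["steps"].append(step_type)
        if step_type == "task.idle":
            # Pause-aware eviction: an idle marker resets the TTL clock.
            task["clock_start"] = self.clock()

    def sweep_stale_tasks(self):
        now = self.clock()
        stale = [
            tid for tid, task in self.tasks.items()
            if now - task["clock_start"] > self.max_age
        ]
        for tid in stale:
            task = self.tasks.pop(tid)
            if self.flush is not None:
                try:
                    self.flush(tid, task["steps"])
                except Exception:
                    pass  # best-effort: a failed flush must not block eviction
        return stale


# Fake clock so the example is deterministic.
now = [0.0]
flushed = []
store = ToyTaskStore(
    max_age=3600.0,
    flush=lambda tid, steps: flushed.append((tid, steps)),
    clock=lambda: now[0],
)
store.start("a")
store.start("b")
now[0] = 3000.0
store.record("b", "task.idle")          # resets b's clock at t=3000
now[0] = 3601.0
first_sweep = store.sweep_stale_tasks()  # only "a" is past its TTL
now[0] = 6601.0
second_sweep = store.sweep_stale_tasks() # "b" stayed idle for a full window
```

Task "b" survives the first sweep because its idle marker reset the clock, but it is still evicted once a full TTL window passes with no further activity.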

A new task_id always starts with a fresh, empty history. Task histories are completely independent — there is no relationship between tasks.

Concurrent tasks

In web servers and concurrent agents, multiple tasks may be active simultaneously. The SDK uses a ContextVar to track the active task_id per execution context (thread or asyncio task). The @kv.step decorator automatically reads the active task ID from this context.
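The isolation that ContextVar provides can be seen with the standard library alone. In the sketch below, `current_task_id` and `handle_request` are hypothetical names; the SDK's own variable and decorator internals are not shown on this page.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical variable name; stands in for the SDK's internal ContextVar.
current_task_id = ContextVar("current_task_id", default=None)


async def handle_request(task_id, delay):
    # Each asyncio task runs in its own copy of the context,
    # so this set() is invisible to the other request.
    current_task_id.set(task_id)
    await asyncio.sleep(delay)  # yield so the two requests interleave
    return current_task_id.get()


async def main():
    return await asyncio.gather(
        handle_request("task-a", 0.02),
        handle_request("task-b", 0.01),
    )


results = asyncio.run(main())
```

Each request reads back its own task ID even though both were active at the same time, which is the property a decorator needs in order to look up the active task without being passed it explicitly.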

For explicit control:

