# agent-platform

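A Python-based scaffold for a multi-service agent development platform.

The repository is set up as a monorepo containing:

- `services/`: core microservices
- `libs/`: shared domain models, DSL, events, database, and common components
- `deployments/`: placeholders for local and cluster deployments
- `docs/`: planning and database design documents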

## Services created so far

- `api-gateway`
- `model-gateway-service`
- `session-service`
- `workflow-service`
- `runtime-service`
- `agent-service`
- `memory-service`
- `team-service`
- `skill-service`
- `human-service`
- `knowledge-service`
- `event-service`
- `auth-service`
- `scheduler-service`
- `tool-service`

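Each service provides a minimal `FastAPI` entry point and a health check endpoint. Services backed by a database also include a `SQLAlchemy` model skeleton and an Alembic directory.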

## Shared libraries created so far

- `core-domain`
- `core-dsl`
- `core-events`
- `core-shared`
- `core-db`

## Recommended local development setup

Create a virtual environment, then install each shared library and service with `uv` or `pip`.

```powershell
cd D:\workspace\auto-platform
python -m venv .venv
.venv\Scripts\activate
pip install -e .\libs\core-shared
pip install -e .\libs\core-domain
pip install -e .\libs\core-dsl
pip install -e .\libs\core-events
pip install -e .\libs\core-db
pip install -e .\services\api-gateway
pip install -e .\services\model-gateway-service
pip install -e .\services\session-service
pip install -e .\services\workflow-service
pip install -e .\services\runtime-service
pip install -e .\services\agent-service
pip install -e .\services\memory-service
pip install -e .\services\team-service
pip install -e .\services\skill-service
pip install -e .\services\human-service
pip install -e .\services\knowledge-service
pip install -e .\services\event-service
pip install -e .\services\auth-service
pip install -e .\services\scheduler-service
pip install -e .\services\tool-service
```

Start a service, for example `api-gateway`:

```powershell
cd D:\workspace\auto-platform\services\api-gateway
uvicorn app.main:app --reload --port 8000
```

By default, each service connects to a SQLite file in its own directory. Override the connection with an environment variable:

```powershell
$env:AGENT_PLATFORM_DATABASE_URL="postgresql+psycopg://user:password@localhost:5432/workflow_db"
```

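## Data layer scaffolding

This iteration adds:

- `libs/core-db`: shared `SQLAlchemy` Base, common mixins, and naming conventions
- `workflow-service`: application and workflow definition models
- `session-service`: session and message models
- `runtime-service`: run and node execution models
- `tool-service`: tool definition and binding models
- A separate `alembic.ini`, `env.py`, and `versions/` for each service
- `workflow-service`: repository / application service / CRUD API wired up
- `session-service`: repository / application service / CRUD API wired up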

Run migrations, for example:

```powershell
cd D:\workspace\auto-platform\services\workflow-service
alembic upgrade head
```

The other services work the same way:

- `services/session-service`
- `services/runtime-service`
- `services/tool-service`

API examples:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8002/workflows/apps `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"sales_assistant","name":"Sales Assistant"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8001/sessions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","app_id":"app-1","user_id":"user-1","channel_type":"web"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8002/workflows/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","workflow_id":"wf-1","dsl_json":{"nodes":[],"edges":[]}}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8001/sessions/run-requests `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","session_id":"sess-1","app_version_id":"appv-1","workflow_version_id":"wfv-1"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/runs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","app_id":"app-1","app_version_id":"appv-1","workflow_id":"wf-1","workflow_version_id":"wfv-1","session_id":"sess-1","initial_node":{"node_id":"start","node_type":"llm"}}'
```

If `initial_node` is omitted, `runtime-service` calls `workflow-service` to load the corresponding `workflow version` and derives the first node from its DSL:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/runs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","app_id":"app-1","app_version_id":"appv-1","workflow_id":"wf-1","workflow_version_id":"wfv-1","session_id":"sess-1"}'
```
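
The README does not spell out how the first node is derived. A minimal sketch, assuming the DSL shape `{"nodes": [...], "edges": [{"source", "target"}]}` used elsewhere in this README and assuming the rule "first node in declaration order with no incoming edge". `derive_initial_node` is a hypothetical name, not the service's actual function:

```python
from typing import Any, Dict, Optional


def derive_initial_node(dsl: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first node that no edge points to (assumed rule)."""
    targets = {edge["target"] for edge in dsl.get("edges", [])}
    for node in dsl.get("nodes", []):
        if node["id"] not in targets:
            return node
    # Empty DSL, or every node has an incoming edge (a pure cycle).
    return None


dsl = {
    "nodes": [{"id": "start", "type": "llm"}, {"id": "answer", "type": "answer"}],
    "edges": [{"source": "start", "target": "answer"}],
}
print(derive_initial_node(dsl))  # {'id': 'start', 'type': 'llm'}
```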

Create a run request and dispatch it to `runtime-service` in a single call:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8001/sessions/run-requests/dispatch `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","session_id":"sess-1","app_id":"app-1","app_version_id":"appv-1","workflow_id":"wf-1","workflow_version_id":"wfv-1","initial_node":{"node_id":"start","node_type":"llm"}}'
```

Tool definition examples:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8004/tools `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"search_products","name":"Search Products","tool_type":"http"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8004/tools/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","tool_id":"tool-1","input_schema_json":{"query":{"type":"string"}},"invoke_config_json":{"method":"GET","path":"/products/search"}}'
```

Run status transition examples:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/node-runs/node-run-id/status `
  -ContentType "application/json" `
  -Body '{"status":"running","worker_key":"runtime-worker-1"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/runs/run-id/status `
  -ContentType "application/json" `
  -Body '{"status":"completed"}'
```

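Notes:

- When you update a node's status via `node-runs/{node_run_id}/status`, `runtime-service` aggregates the statuses of all `node_run` records in that run and updates `workflow_run.status` accordingly.
- The current rules are: if any node is `failed`, the run is `failed`; otherwise, if any node is `running`, the run is `running`; if every node is `completed` or `skipped`, the run is `completed`.
- When a `node_run` is updated to `completed`, `runtime-service` also looks up its successor nodes in the `workflow version` DSL and creates a new `node_run` in `queued` status for each one.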
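
The aggregation rules and successor lookup above can be sketched as two pure functions. This is an illustration, not the service code: the function names are hypothetical, and keeping the current run status when nodes are only `queued`/`pending` is an assumption the README does not state.

```python
from typing import Any, Dict, Iterable, List

OK_TERMINAL = {"completed", "skipped"}


def aggregate_run_status(node_statuses: Iterable[str], current: str) -> str:
    """Apply the documented rules in order of precedence."""
    statuses = list(node_statuses)
    if "failed" in statuses:
        return "failed"
    if "running" in statuses:
        return "running"
    if statuses and all(s in OK_TERMINAL for s in statuses):
        return "completed"
    # Assumption: mixes of queued/pending nodes keep the current run status.
    return current


def successor_ids(dsl: Dict[str, Any], node_id: str) -> List[str]:
    """Targets of every edge leaving node_id; each would get a new queued node_run."""
    return [e["target"] for e in dsl.get("edges", []) if e["source"] == node_id]


print(aggregate_run_status(["completed", "running", "failed"], "running"))  # failed
print(aggregate_run_status(["completed", "running"], "queued"))  # running
print(aggregate_run_status(["completed", "skipped"], "running"))  # completed
print(successor_ids({"edges": [{"source": "start", "target": "answer"}]}, "start"))  # ['answer']
```

Because a freshly queued successor is neither running nor terminal, the run does not flip to `completed` while work is still pending.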

## Directory structure

```text
services/
  api-gateway/
  model-gateway-service/
  session-service/
  workflow-service/
  runtime-service/
  agent-service/
  memory-service/
  team-service/
  skill-service/
  human-service/
  knowledge-service/
  event-service/
  auth-service/
  scheduler-service/
  tool-service/
libs/
  core-domain/
  core-dsl/
  core-events/
  core-shared/
  core-db/
deployments/
  docker/
  k8s/
docs/
scripts/
tests/
```

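## Next steps

1. Complete the `V0.1` repository / service layers
2. Write the first Alembic initial migration
3. Integrate PostgreSQL / Redis
4. Add Docker Compose
5. Start implementing the three main flows: applications, workflows, and runs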

## Agent Service APIs

`agent-service` stores strongly typed agent definitions, versioned prompts/configuration, and agent run records.

Create an agent:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8007/agents `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"sales_agent","name":"Sales Agent","agent_type":"assistant"}'
```

Create a published agent version:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8007/agents/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","agent_id":"agent-id","status":"published","role":"sales_assistant","goal":"Help qualify leads","system_prompt":"You are a careful sales assistant."}'
```

Enable multi-step ReAct planning for an agent version:

```json
{
  "model_config": {
    "react_enabled": true,
    "react_max_steps": 5
  }
}
```

When ReAct is enabled, the model can emit JSON tool actions such as
`{"action":"tool","tool_code":"lookup_order","input_json":{"order_id":"123"}}`
and then finish with `{"action":"finish","answer":"..."}`. Each tool call is
persisted in `agent_tool_invocation`.
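
A minimal sketch of that ReAct loop, assuming the model returns exactly one JSON action per turn and that each tool result is fed back as the next message. `run_react`, the message roles used for observations, and the in-memory `invocations` list (standing in for `agent_tool_invocation` rows) are hypothetical; the model is stubbed so the sketch runs offline.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple


def run_react(
    call_model: Callable[[List[dict]], str],
    tools: Dict[str, Callable[[dict], dict]],
    user_input: str,
    max_steps: int = 5,
) -> Tuple[Optional[str], List[dict]]:
    messages = [{"role": "user", "content": user_input}]
    invocations: List[dict] = []  # stand-in for agent_tool_invocation rows
    for _ in range(max_steps):
        action = json.loads(call_model(messages))
        if action.get("action") == "finish":
            return action.get("answer"), invocations
        if action.get("action") == "tool":
            record = {"tool_code": action["tool_code"], "input_json": action.get("input_json", {})}
            try:
                record["output_json"] = tools[record["tool_code"]](record["input_json"])
                record["status"] = "completed"
            except Exception as exc:  # unknown tool code or tool error
                record["output_json"] = {"error": str(exc)}
                record["status"] = "failed"
            invocations.append(record)
            # Feed the action and its observation back to the model.
            messages.append({"role": "assistant", "content": json.dumps(action)})
            messages.append({"role": "user", "content": json.dumps(record["output_json"])})
    return None, invocations  # max_steps exhausted without a finish action


replies = iter([
    '{"action":"tool","tool_code":"lookup_order","input_json":{"order_id":"123"}}',
    '{"action":"finish","answer":"Order 123 has shipped."}',
])
answer, calls = run_react(
    lambda _messages: next(replies),
    {"lookup_order": lambda args: {"order_id": args["order_id"], "status": "shipped"}},
    "Where is order 123?",
)
print(answer)  # Order 123 has shipped.
```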

Create an agent run. If `agent_version_id` is omitted, the latest published version is used:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8007/agents/runs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","agent_id":"agent-id","session_id":"session-id","input_text":"Summarize this lead."}'
```

List tool invocation records for an agent run:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8007/agents/runs/agent-run-id/tool-invocations?tenant_id=t1"
```

Agent execution now persists tool invocation audit records with selected,
running, skipped, completed, or failed status, including input/output payloads
and `started_time` / `finished_time`.

Through `api-gateway`, use `/gateway/agents/**`.

## Memory Service APIs

`memory-service` stores scoped memories for tenants, users, sessions, agents, and teams. It does not require vector infrastructure: the first version used database text search, and pgvector can be added later behind the same API.

Memory search now also stores a deterministic local embedding for each memory and ranks results with a hybrid rerank:

- `keyword_score`: token overlap and frequency
- `vector_score`: cosine similarity over local hash embeddings
- `importance_score`: normalized memory importance boost
- `rerank_mode`: `hybrid-local`
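
The exact scoring formula is not documented here, so the following is only a sketch of how such a hybrid rerank can work. The 64-dimension embedding, the keyword formula, the 0.5/0.4/0.1 weights, and the 0-100 importance scale are all assumptions; function names are hypothetical.

```python
import hashlib
import math
import re
from collections import Counter
from typing import Any, Dict, List

DIM = 64  # hypothetical embedding size


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def hash_embedding(text: str, dim: int = DIM) -> List[float]:
    """Deterministic: hashlib is stable across processes, unlike the salted built-in hash()."""
    vec = [0.0] * dim
    for token in tokenize(text):
        bucket = int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def keyword_score(query: str, text: str) -> float:
    terms = set(tokenize(query))
    counts = Counter(tokenize(text))
    if not terms:
        return 0.0
    overlap = sum(1 for t in terms if counts[t]) / len(terms)
    frequency = min(sum(counts[t] for t in terms), 5) / 5  # capped term frequency
    return 0.8 * overlap + 0.2 * frequency


def hybrid_score(query: str, content: str, importance: float, weights=(0.5, 0.4, 0.1)) -> Dict[str, Any]:
    kw = keyword_score(query, content)
    # Dot product of unit vectors is their cosine similarity.
    vec = sum(a * b for a, b in zip(hash_embedding(query), hash_embedding(content)))
    imp = max(0.0, min(importance, 100.0)) / 100.0  # assumes a 0-100 importance scale
    w_kw, w_vec, w_imp = weights
    return {"keyword_score": kw, "vector_score": vec, "importance_score": imp,
            "score": w_kw * kw + w_vec * vec + w_imp * imp, "rerank_mode": "hybrid-local"}


memories = [("User prefers concise answers.", 80), ("Shipping takes five days.", 80)]
ranked = sorted(memories, key=lambda m: hybrid_score("concise", m[0], m[1])["score"], reverse=True)
print(ranked[0][0])  # User prefers concise answers.
```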

Create a memory:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8008/memories `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","scope_type":"session","scope_id":"session-id","memory_type":"fact","content_text":"User prefers concise answers.","importance_score":80}'
```

Search memories:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8008/memories/search `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","query":"concise","scope_type":"session","scope_id":"session-id","limit":5}'
```

Through `api-gateway`, use `/gateway/memories/**`.

## Team Service APIs

`team-service` stores multi-agent team definitions, versioned member composition, coordination mode, and team run records. The first version provides the team management backbone; later versions can connect team runs to supervisor/planner/member agent execution.

Create a team:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8009/teams `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"research_team","name":"Research Team","team_type":"collaborative"}'
```

Create a published team version:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8009/teams/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","team_id":"team-id","status":"published","coordination_mode":"supervisor","objective":"Research and summarize complex questions","member_refs":[{"member_key":"lead","agent_id":"agent-lead","role":"supervisor","responsibility":"Plan and assign work"},{"member_key":"writer","agent_id":"agent-writer","role":"executor","responsibility":"Draft final answer"}]}'
```

Create a team run. If `team_version_id` is omitted, the latest published version is used:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8009/teams/runs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","team_id":"team-id","session_id":"session-id","input_text":"Analyze this customer request."}'
```

Through `api-gateway`, use `/gateway/teams/**`.

Execute a team run. The first implementation creates and executes one agent run
per member, then stores a team-level summary. `dry_run=true` lets this work
without model API keys:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8009/teams/runs/team-run-id/execute `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","worker_key":"team-worker-1","dry_run":true}'
```

Execute one queued team run through the worker claim API:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8009/teams/workers/execute-next `
  -ContentType "application/json" `
  -Body '{"worker_key":"team-worker-1","lease_seconds":300,"dry_run":true}'
```

Run a standalone team worker process:

```powershell
Push-Location .\services\team-service
$env:AGENT_PLATFORM_DATABASE_URL="sqlite:///./team_service.db"
$env:AGENT_PLATFORM_WORKER_DRY_RUN="true"
..\..\.venv\Scripts\python -m app.worker
Pop-Location
```

## Skill Service APIs

`skill-service` stores reusable skill definitions, versioned parameter/output schemas,
marketplace-style installations, and executable skill runs. The first executor supports a
dependency-free `template` runtime so local development works without API keys.

Create a skill:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8010/skills `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"hello_user","name":"Hello User","skill_type":"template"}'
```

Create a published skill version:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8010/skills/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","skill_id":"skill-id","status":"published","runtime_type":"template","parameter_schema_json":{"name":{"type":"string"}},"implementation_json":{"template":"Hello $name"}}'
```

Install the skill for a tenant, agent, team, app, or user scope:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8010/skills/installations `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","skill_id":"skill-id","install_scope":"tenant","scope_id":"t1","installed_by":"user-1"}'
```

Create and execute a skill run:

```powershell
$run = Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8010/skills/runs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","skill_id":"skill-id","input_json":{"name":"Lucas"}}'

Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8010/skills/runs/$($run.id)/execute" `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","worker_key":"skill-worker-1"}'
```
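
The `Hello $name` template in the version above looks like Python `string.Template` syntax. Assuming the `template` runtime behaves that way (the README does not confirm it), executing the run with `{"name":"Lucas"}` can be sketched as follows; `run_template_skill` is a hypothetical name:

```python
from string import Template
from typing import Any, Dict


def run_template_skill(implementation_json: Dict[str, Any], input_json: Dict[str, Any]) -> Dict[str, str]:
    template = Template(implementation_json["template"])
    # safe_substitute leaves unknown placeholders such as $missing untouched
    # instead of raising KeyError like substitute() would.
    return {"output_text": template.safe_substitute(input_json)}


print(run_template_skill({"template": "Hello $name"}, {"name": "Lucas"}))
# {'output_text': 'Hello Lucas'}
```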

Through `api-gateway`, use `/gateway/skills/**`.

## Human Service APIs

`human-service` stores human-in-the-loop tasks for approval, input collection,
takeover, pause, and resume flows.

Create an approval task:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8011/human/tasks `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","task_type":"approval","title":"Approve refund","run_id":"run-id","node_run_id":"node-run-id","assigned_to":"ops-1","request_payload_json":{"amount":99}}'
```

Claim and complete a task:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8011/human/tasks/human-task-id/claim `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","claimed_by":"ops-1"}'

Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8011/human/tasks/human-task-id/complete `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","status":"approved","response_payload_json":{"approved":true}}'
```

Through `api-gateway`, use `/gateway/human/**`.

Runtime human-in-the-loop nodes now create `human-service` tasks and pause the
node in `pending` status until the task is completed. Supported node types:

- `human`
- `approval`
- `human-input`
- `human-takeover`

After completing the human task, resume the blocked node:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/node-runs/node-run-id/resume-human `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","human_task_id":"human-task-id","worker_key":"runtime-worker-1"}'
```

## Knowledge Service APIs

`knowledge-service` stores independent knowledge bases, documents, chunks, and
retrieval results. It defaults to deterministic local hash embeddings plus keyword
scoring, so it works without external API keys. For production, set
`AGENT_PLATFORM_EMBEDDING_PROVIDER=http` with an OpenAI-compatible
`/embeddings` endpoint; if the provider fails and fallback is enabled, indexing
and search fall back to local hash embeddings.

Create a knowledge base:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8012/knowledge/bases `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"support_kb","name":"Support Knowledge Base"}'
```

Create and index a document:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8012/knowledge/documents `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","knowledge_base_id":"kb-id","title":"Refund Policy","content_text":"Refunds are available within seven days for eligible orders.","source_type":"text"}'
```

Search the knowledge base:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8012/knowledge/search `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","knowledge_base_id":"kb-id","query":"refund within seven days","top_k":3}'
```

Through `api-gateway`, use `/gateway/knowledge/**`.

## Event Service APIs

`event-service` stores platform events with delivery status so services can use
a durable outbox pattern now, and later swap delivery to Kafka/RabbitMQ behind
the same API.

Publish an event:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8013/events `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","event_type":"run.created","source_service":"runtime-service","aggregate_type":"workflow_run","aggregate_id":"run-id","payload_json":{"run_id":"run-id"}}'
```

Claim pending events for a delivery worker:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8013/events/claim-pending `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","limit":50}'
```

Through `api-gateway`, use `/gateway/events/**`.

## Auth Service APIs

`auth-service` stores users, roles, role assignments, and permission checks.
This is the first RBAC layer for tenant governance.

```powershell
$user = Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8014/auth/users `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","username":"alice","display_name":"Alice"}'

$role = Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8014/auth/roles `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"admin","name":"Admin","permissions_json":["*"]}'

Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8014/auth/assignments `
  -ContentType "application/json" `
  -Body "{`"tenant_id`":`"t1`",`"user_id`":`"$($user.id)`",`"role_id`":`"$($role.id)`"}"

Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8014/auth/permissions/check `
  -ContentType "application/json" `
  -Body "{`"tenant_id`":`"t1`",`"user_id`":`"$($user.id)`",`"permission`":`"workflow:write`"}"
```

Through `api-gateway`, use `/gateway/auth/**`.

## Scheduler Service APIs

`scheduler-service` stores delayed jobs and due-job leases for time-based
automation. It is intentionally service-neutral: jobs can target HTTP,
event, runtime, agent, or team execution.

Create a scheduled job:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8015/scheduler/jobs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","job_type":"runtime","name":"Run workflow later","schedule_time":"2026-04-26T12:00:00Z","payload_json":{"workflow_run_id":"run-id"}}'
```

Claim due jobs for a worker:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8015/scheduler/jobs/claim-due `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","worker_key":"scheduler-worker-1","limit":20}'
```

Mark a job completed or failed:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8015/scheduler/jobs/job-id/status `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","status":"completed"}'
```

Through `api-gateway`, use `/gateway/scheduler/**`.

Run the scheduler worker locally:

```powershell
Push-Location .\services\scheduler-service
$env:AGENT_PLATFORM_DATABASE_URL="sqlite:///./scheduler_service.db"
$env:AGENT_PLATFORM_EVENT_SERVICE_URL="http://127.0.0.1:8013"
..\..\.venv\Scripts\python -m app.worker
Pop-Location
```

## Agent Execution

Execute an agent run without calling an external model:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8007/agents/runs/agent-run-id/execute `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","worker_key":"agent-worker-1","dry_run":true}'
```

Execute against a real model through `model-gateway-service` (omit `dry_run`):

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8007/agents/runs/agent-run-id/execute `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","worker_key":"agent-worker-1"}'
```

Agent memory policy is stored on `agent_version.memory_policy_json`:

- `enabled`: read memories before execution
- `memory_scope`: one of `tenant`, `user`, `session`, `agent`, or `team`
- `read_top_k`: maximum memories to inject into the prompt
- `write_enabled`: write a conversation memory after successful model execution
- `config_json.write_importance_score`: optional importance score for written memories

Agent capability refs are stored on `agent_version.tool_refs_json` and
`agent_version.skill_refs_json`.

- Tool refs are selected when `required=true`, `config_json.auto_invoke=true`, or `selection_keywords` match the run input.
- Skill refs are selected by default unless `config_json.auto_invoke=false`; `selection_keywords` can also select them.
- Dry-run execution returns `selected_tool_refs` and `selected_skill_refs` without calling downstream tools/skills.
- Normal execution invokes selected HTTP tool bindings and selected skills before the model call, then injects their results into the model messages.
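
The selection rules above can be sketched as two filters. This is an illustration only: the function names are hypothetical, and placing `selection_keywords` at the top level of each ref (rather than inside `config_json`) is an assumption.

```python
from typing import Any, Dict, List

Ref = Dict[str, Any]


def _keyword_hit(ref: Ref, run_input: str) -> bool:
    text = run_input.lower()
    # Assumption: selection_keywords lives at the top level of each ref.
    return any(k.lower() in text for k in ref.get("selection_keywords") or [])


def select_tool_refs(tool_refs: List[Ref], run_input: str) -> List[Ref]:
    """Tools are opt-in: required=true, auto_invoke=true, or a keyword hit."""
    return [
        ref for ref in tool_refs
        if ref.get("required") is True
        or (ref.get("config_json") or {}).get("auto_invoke") is True
        or _keyword_hit(ref, run_input)
    ]


def select_skill_refs(skill_refs: List[Ref], run_input: str) -> List[Ref]:
    """Skills are opt-out: selected unless auto_invoke=false, though keywords can still select them."""
    return [
        ref for ref in skill_refs
        if (ref.get("config_json") or {}).get("auto_invoke", True) is not False
        or _keyword_hit(ref, run_input)
    ]


tools = [{"code": "lookup_order", "selection_keywords": ["order"]},
         {"code": "crm_sync", "required": True}, {"code": "weather"}]
skills = [{"code": "summarize"},
          {"code": "refund_calc", "config_json": {"auto_invoke": False}, "selection_keywords": ["refund"]},
          {"code": "translate", "config_json": {"auto_invoke": False}}]
print([r["code"] for r in select_tool_refs(tools, "Where is my order?")])  # ['lookup_order', 'crm_sync']
print([r["code"] for r in select_skill_refs(skills, "I want a refund")])  # ['summarize', 'refund_calc']
```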

Example version with session memory:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8007/agents/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","agent_id":"agent-id","status":"published","role":"assistant","system_prompt":"Use relevant memory when helpful.","memory_policy":{"enabled":true,"memory_scope":"session","read_top_k":5,"write_enabled":true,"config_json":{"write_importance_score":60}}}'
```

Execute one queued agent run through the worker claim API:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8007/agents/workers/execute-next `
  -ContentType "application/json" `
  -Body '{"worker_key":"agent-worker-1","lease_seconds":300,"dry_run":true}'
```

Run a standalone agent worker process:

```powershell
Push-Location .\services\agent-service
$env:AGENT_PLATFORM_DATABASE_URL="sqlite:///./agent_service.db"
$env:AGENT_PLATFORM_WORKER_DRY_RUN="true"
..\..\.venv\Scripts\python -m app.worker
Pop-Location
```

## Runtime Executors

`runtime-service` now includes a typed executor skeleton for these node types:

- `llm`
- `tool`
- `code`
- `human`
- `approval`
- `human-input`
- `human-takeover`
- `answer`
- `if-else`
- `assigner`
- `knowledge-retrieval`
- `template-transform`

Execute a specific queued node:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/node-runs/node-run-id/execute `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1"}'
```

Execute the next queued node in a run:

```powershell
Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8003/runtime/runs/run-id/execute-next?tenant_id=t1" `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1"}'
```

Execute queued nodes in sequence until the run is finished, blocked, or reaches `max_steps`:

```powershell
Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8003/runtime/runs/run-id/execute?tenant_id=t1" `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1","max_steps":16}'
```

Execute one queued node through the worker claim API:

```powershell
Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8003/runtime/workers/execute-next" `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1","lease_seconds":300}'
```

Run a standalone runtime worker process:

```powershell
Push-Location .\services\runtime-service
$env:AGENT_PLATFORM_DATABASE_URL="sqlite:///./runtime_service.db"
..\..\.venv\Scripts\python -m app.worker
Pop-Location
```

The worker treats `node_run.status` plus `lease_expire_time` as a DB-backed queue. This keeps the first version dependency-light; for heavier production concurrency, point `AGENT_PLATFORM_DATABASE_URL` at PostgreSQL before scaling out to many workers.
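
A minimal sketch of a DB-backed lease claim using the standard-library `sqlite3` module. It assumes a simplified `node_run` table (lease times stored as epoch seconds, a `created_time` column for ordering) and assumes that a `running` row whose lease has expired may be re-claimed; the real schema and rules may differ, and `claim_next` is a hypothetical name.

```python
import sqlite3
import time
from typing import Optional

CLAIMABLE = "(status = 'queued' OR (status = 'running' AND lease_expire_time < ?))"


def claim_next(conn: sqlite3.Connection, worker_key: str,
               lease_seconds: int = 300, now: Optional[float] = None) -> Optional[str]:
    now = time.time() if now is None else now
    row = conn.execute(
        f"SELECT id FROM node_run WHERE {CLAIMABLE} ORDER BY created_time LIMIT 1", (now,)
    ).fetchone()
    if row is None:
        return None
    # Conditional UPDATE: succeeds only if no other worker claimed the row in between.
    cur = conn.execute(
        f"UPDATE node_run SET status = 'running', worker_key = ?, lease_expire_time = ? "
        f"WHERE id = ? AND {CLAIMABLE}",
        (worker_key, now + lease_seconds, row[0], now),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_run (id TEXT PRIMARY KEY, status TEXT, worker_key TEXT, "
             "lease_expire_time REAL, created_time REAL)")
conn.executemany("INSERT INTO node_run VALUES (?, 'queued', NULL, NULL, ?)", [("n1", 1.0), ("n2", 2.0)])
print(claim_next(conn, "runtime-worker-1", now=100.0))  # n1
print(claim_next(conn, "runtime-worker-2", now=100.0))  # n2
print(claim_next(conn, "runtime-worker-3", now=100.0))  # None (both leased until 400)
print(claim_next(conn, "runtime-worker-3", now=500.0))  # n1 (lease expired)
```

On PostgreSQL, `SELECT ... FOR UPDATE SKIP LOCKED` is a common way to let many workers claim rows without contending on the same one.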

Node execution results are now persisted on `node_run`:

- `output_text`
- `output_json`

Node execution artifacts are also persisted on `node_artifact`:

- `artifact_type`
- `content_text`
- `content_json`
- `storage_uri`
- `size_bytes`

Query artifacts:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8003/runtime/node-artifacts?tenant_id=t1&run_id=run-id"
```

Trace spans are persisted on `trace_span` for timeline and latency analysis:

- `span_type`
- `name`
- `status`
- `started_time`
- `ended_time`
- `duration_ms`
- `attributes_json`
- `error_code`
- `error_message`

Query trace spans:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8003/runtime/trace-spans?tenant_id=t1&run_id=run-id"
```

Current behavior:

- `answer` nodes persist rendered text to `output_text`
- `assigner` nodes write `state_updates` to `output_json`
- `condition` / `if-else` nodes write `condition_result` and `route` to `output_json`
- `template-transform` nodes render text or JSON using previous node outputs and run state
- `knowledge-retrieval` / `retriever` nodes run keyword retrieval over inline or HTTP JSON documents
- `tool` nodes persist resolved binding/tool metadata to `output_json`
- default executors persist basic executor metadata to `output_json`
- parallel fan-out is supported by defining multiple outgoing edges from one node
- join nodes wait for predecessor completion with `config.join_policy`
- loop/re-entry is supported with `config.allow_loop=true` and `config.max_iterations`
- retry is supported with `config.retry_policy.max_attempts` and `retry_delay_seconds`
- delayed scheduling and node timeout use `config.delay_seconds` and `config.timeout_seconds`
- compensation nodes can be queued on failure with `config.compensation_node_id`

Runtime template context:

- `state.xxx`: values written by previous `assigner` nodes
- `nodes.node_id.output.xxx`: structured output from a previous node
- `nodes.node_id.text`: text output from a previous node
- `current.node_id`: current node id
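
A minimal sketch of resolving these `{{...}}` placeholders by walking dotted paths, where numeric segments index into lists (as in `retrieved_documents.0.title` later in this README). Rendering unresolved paths as an empty string is an assumption; the service may keep them or raise. `lookup` and `render` are hypothetical names.

```python
import re
from typing import Any, Dict

PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def lookup(context: Dict[str, Any], path: str) -> Any:
    value: Any = context
    for part in path.split("."):
        if isinstance(value, list) and part.isdigit():
            index = int(part)
            value = value[index] if index < len(value) else None
        elif isinstance(value, dict):
            value = value.get(part)
        else:
            return None
        if value is None:
            return None
    return value


def render(template: str, context: Dict[str, Any]) -> str:
    def replace(match: "re.Match[str]") -> str:
        value = lookup(context, match.group(1))
        return "" if value is None else str(value)  # assumption: missing -> empty
    return PLACEHOLDER.sub(replace, template)


context = {
    "state": {"user_name": "Alice", "score": 7},
    "nodes": {"retrieve-docs": {"output": {"retrieved_documents": [{"title": "Refund Policy"}]}, "text": ""}},
    "current": {"node_id": "high-path"},
}
print(render("{{state.user_name}} passed with score {{state.score}}", context))
# Alice passed with score 7
```

Splitting on `.` means node ids must not contain dots; hyphens, as in `retrieve-docs`, are fine.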

Assigner node config example:

```json
{
  "id": "seed-state",
  "type": "assigner",
  "config": {
    "assignments": {
      "score": 7,
      "user_name": "Alice"
    }
  }
}
```

Condition node config example:

```json
{
  "id": "check-score",
  "type": "if-else",
  "config": {
    "expression": "state.score >= 5"
  }
}
```

Conditional edge example:

```json
[
  {"source": "check-score", "target": "high-path", "condition": "true"},
  {"source": "check-score", "target": "low-path", "condition": "false"}
]
```
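
How `condition` edges pair with the condition node's result can be sketched as below. The expression grammar here is deliberately tiny (`state.<key> <op> <JSON literal>`) and is an assumption, not the service's actual evaluator; treating edges without a `condition` as always taken is also assumed.

```python
import json
import operator
import re
from typing import Any, Dict, List

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">": operator.gt, "<": operator.lt}
# Only "state.<key> <op> <JSON literal>" is supported in this sketch.
EXPR = re.compile(r"^\s*state\.(\w+)\s*(>=|<=|==|!=|>|<)\s*(.+?)\s*$")


def evaluate(expression: str, state: Dict[str, Any]) -> bool:
    match = EXPR.match(expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    key, op, raw_literal = match.groups()
    literal = json.loads(raw_literal)  # e.g. 5, 2.5, "gold", true
    return bool(OPS[op](state.get(key), literal))


def route_targets(edges: List[Dict[str, Any]], source: str, result: bool) -> List[str]:
    """Follow edges whose condition matches the result; unconditioned edges always fire."""
    route = "true" if result else "false"
    return [e["target"] for e in edges
            if e["source"] == source and e.get("condition") in (None, route)]


edges = [
    {"source": "check-score", "target": "high-path", "condition": "true"},
    {"source": "check-score", "target": "low-path", "condition": "false"},
]
result = evaluate("state.score >= 5", {"score": 7})
print(result, route_targets(edges, "check-score", result))  # True ['high-path']
```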

Join node config example:

```json
{
  "id": "join-results",
  "type": "join",
  "config": {
    "join_policy": "all_completed"
  }
}
```

Loop and retry config example:

```json
{
  "id": "poll-status",
  "type": "tool",
  "config": {
    "allow_loop": true,
    "max_iterations": 5,
    "timeout_seconds": 30,
    "retry_policy": {
      "max_attempts": 3,
      "retry_delay_seconds": 2
    }
  }
}
```

Compensation config example:

```json
{
  "id": "charge-card",
  "type": "tool",
  "config": {
    "compensation_node_id": "refund-card"
  }
}
```

Template node config example:

```json
{
  "id": "high-path",
  "type": "template-transform",
  "config": {
    "template": "{{state.user_name}} passed with score {{state.score}}"
  }
}
```

Retriever node config example:

```json
{
  "id": "retrieve-docs",
  "type": "knowledge-retrieval",
  "config": {
    "query_template": "{{state.query}}",
    "top_k": 2,
    "documents": [
      {
        "id": "refund",
        "title": "Refund Policy",
        "text": "Refund policy allows returns within seven days."
      },
      {
        "id": "shipping",
        "title": "Shipping Policy",
        "text": "Shipping usually takes three to five business days."
      }
    ]
  }
}
```

Retriever nodes can call `knowledge-service` directly:

```json
{
  "id": "retrieve-kb",
  "type": "knowledge-retrieval",
  "config": {
    "knowledge_base_id": "kb-id",
    "query_template": "{{state.query}}",
    "top_k": 3,
    "filters_json": {
      "source_type": "text"
    }
  }
}
```

Retriever output is persisted to `node_run.output_json.retrieved_documents`. Template nodes can consume it:

```json
{
  "id": "render-answer",
  "type": "template-transform",
  "config": {
    "template": "Top doc: {{nodes.retrieve-docs.output.retrieved_documents.0.title}}"
  }
}
```

Retriever nodes can also load documents from an HTTP JSON source:

```json
{
  "id": "retrieve-remote-docs",
  "type": "retriever",
  "config": {
    "query": "refund policy",
    "source_url": "http://127.0.0.1:9000/documents",
    "top_k": 3
  }
}
```

The HTTP source should return either a document list or an object with a `documents` list.
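
A minimal sketch of accepting both response shapes and running a simple keyword retrieval over them with the standard library. The term-count scoring is an assumption (the README only says "keyword retrieval"), and the function names are hypothetical; `fetch_documents` is shown for completeness but not called in the usage example.

```python
import json
import re
import urllib.request
from typing import Any, Dict, List


def normalize_documents(payload: Any) -> List[Dict[str, Any]]:
    """Accept either a bare list or {"documents": [...]}."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict) and isinstance(payload.get("documents"), list):
        return payload["documents"]
    raise ValueError("expected a list or an object with a 'documents' list")


def fetch_documents(source_url: str, timeout: float = 10.0) -> List[Dict[str, Any]]:
    with urllib.request.urlopen(source_url, timeout=timeout) as resp:
        return normalize_documents(json.load(resp))


def keyword_retrieve(query: str, documents: List[Dict[str, Any]], top_k: int = 3) -> List[Dict[str, Any]]:
    terms = set(re.findall(r"\w+", query.lower()))
    scored = []
    for doc in documents:
        words = re.findall(r"\w+", f"{doc.get('title', '')} {doc.get('text', '')}".lower())
        score = sum(1 for w in words if w in terms)  # count query-term occurrences
        if score:
            scored.append({**doc, "score": score})
    scored.sort(key=lambda d: d["score"], reverse=True)
    return scored[:top_k]


docs = normalize_documents({"documents": [
    {"id": "refund", "title": "Refund Policy", "text": "Refund policy allows returns within seven days."},
    {"id": "shipping", "title": "Shipping Policy", "text": "Shipping usually takes three to five business days."},
]})
print([d["id"] for d in keyword_retrieve("refund policy", docs, top_k=2)])  # ['refund', 'shipping']
```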

Run the no-key runtime smoke test after local services are running:

```powershell
.\.venv\Scripts\python scripts\smoke_runtime_no_key.py
```

Run the same smoke test through `api-gateway`:

```powershell
$env:AGENT_PLATFORM_SMOKE_WORKFLOW_URL="http://127.0.0.1:8000/gateway/workflows"
$env:AGENT_PLATFORM_SMOKE_RUNTIME_URL="http://127.0.0.1:8000/gateway/runtime"
.\.venv\Scripts\python scripts\smoke_runtime_no_key.py
```

## API Gateway

`api-gateway` provides a unified entrypoint:

- `GET /gateway/services/health`
- `/gateway/workflows/**` -> `workflow-service /workflows/**`
- `/gateway/sessions/**` -> `session-service /sessions/**`
- `/gateway/runtime/**` -> `runtime-service /runtime/**`
- `/gateway/agents/**` -> `agent-service /agents/**`
- `/gateway/memories/**` -> `memory-service /memories/**`
- `/gateway/teams/**` -> `team-service /teams/**`
- `/gateway/skills/**` -> `skill-service /skills/**`
- `/gateway/human/**` -> `human-service /human/**`
- `/gateway/knowledge/**` -> `knowledge-service /knowledge/**`
- `/gateway/events/**` -> `event-service /events/**`
- `/gateway/auth/**` -> `auth-service /auth/**`
- `/gateway/scheduler/**` -> `scheduler-service /scheduler/**`
- `/gateway/tools/**` -> `tool-service /tools/**`
- `/gateway/models/**` -> `model-gateway-service /models/**`
- `/gateway/code/**` -> `code-runner-service /code/**`

Gateway readiness:

```powershell
Invoke-RestMethod -Uri "http://127.0.0.1:8000/ready"
```

Downstream health:

```powershell
Invoke-RestMethod -Uri "http://127.0.0.1:8000/gateway/services/health"
```

Gateway request context:

- Incoming `x-request-id` is reused; otherwise gateway generates one.
- Incoming `x-tenant-id` is reused; otherwise gateway falls back to `tenant_id` query parameter, then `public`.
- Gateway forwards both `x-request-id` and `x-tenant-id` to downstream services.
- Gateway writes request audit records to `gateway_request_audit`.
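
The fallback order above can be sketched as a framework-independent function that returns the headers to forward downstream. Using `uuid4` for generated request ids is an assumption (the gateway's id format is not documented), and `resolve_request_context` is a hypothetical name.

```python
import uuid
from typing import Dict, Mapping


def resolve_request_context(headers: Mapping[str, str], query: Mapping[str, str]) -> Dict[str, str]:
    """Reuse incoming ids; tenant falls back to the tenant_id query parameter, then "public"."""
    lowered = {k.lower(): v for k, v in headers.items()}  # HTTP header names are case-insensitive
    request_id = lowered.get("x-request-id") or str(uuid.uuid4())  # id format is an assumption
    tenant_id = lowered.get("x-tenant-id") or query.get("tenant_id") or "public"
    return {"x-request-id": request_id, "x-tenant-id": tenant_id}


print(resolve_request_context({"X-Tenant-Id": "t1", "x-request-id": "req-1"}, {}))
# {'x-request-id': 'req-1', 'x-tenant-id': 't1'}
print(resolve_request_context({}, {"tenant_id": "t2"})["x-tenant-id"])  # t2
print(resolve_request_context({}, {})["x-tenant-id"])  # public
```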

Query gateway audits:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8000/gateway/audits?tenant_id=t1&limit=20" `
  -Headers @{"x-tenant-id"="t1"}
```

Query gateway audit stats:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8000/gateway/audits/stats?tenant_id=t1" `
  -Headers @{"x-tenant-id"="t1"}
```

Gateway API Key auth:

- `AGENT_PLATFORM_AUTH_REQUIRED=false` by default for local development.
- Set `AGENT_PLATFORM_AUTH_REQUIRED=true` to protect `/gateway/**`, except `/gateway/services/health`.
- API keys are stored as SHA-256 hashes. The raw key is only returned once at creation.
- When auth is enabled and no API key exists yet, the first `POST /gateway/api-keys` is allowed as bootstrap.
- API keys can be `active`, `disabled`, or `revoked`; only `active` keys are accepted.
- If an API key has `scopes`, gateway checks them before proxying. Use `*`, `gateway:agents:*`, or exact permissions such as `gateway:agents:read`.
- Set `AGENT_PLATFORM_AUTHZ_REQUIRED=true` to require `x-user-id` and call `auth-service` `/auth/permissions/check` for the derived permission.
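
A minimal sketch of SHA-256 key storage, status checks, and scope matching as described above. The `ak_` key prefix, space-separated scope strings (as in the example below), and the method-to-permission mapping in `derive_permission` are assumptions; all function names are hypothetical.

```python
import hashlib
import secrets
from typing import Tuple


def create_api_key() -> Tuple[str, str]:
    """Return (raw_key, sha256_hex); only the hash is stored, the raw key is shown once."""
    raw = "ak_" + secrets.token_urlsafe(32)  # "ak_" prefix is hypothetical
    return raw, hashlib.sha256(raw.encode("utf-8")).hexdigest()


def verify_api_key(raw: str, stored_hash: str, status: str) -> bool:
    if status != "active":  # disabled and revoked keys are rejected
        return False
    candidate = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return secrets.compare_digest(candidate, stored_hash)  # constant-time comparison


def derive_permission(method: str, path: str) -> str:
    """Hypothetical mapping: POST /gateway/agents/runs -> gateway:agents:write."""
    parts = path.strip("/").split("/")
    service = parts[1] if len(parts) > 1 else ""
    action = "read" if method.upper() in {"GET", "HEAD", "OPTIONS"} else "write"
    return f"gateway:{service}:{action}"


def scope_allows(scopes: str, permission: str) -> bool:
    granted = scopes.split()
    if not granted:
        return True  # a key without scopes skips the scope check
    for scope in granted:
        if scope in ("*", permission):
            return True
        if scope.endswith(":*") and permission.startswith(scope[:-1]):
            return True  # "gateway:agents:*" matches "gateway:agents:read"
    return False


raw, digest = create_api_key()
scopes = "gateway:agents:* gateway:runtime:read"
print(verify_api_key(raw, digest, "active"))  # True
print(scope_allows(scopes, derive_permission("POST", "/gateway/agents/runs")))  # True
print(scope_allows(scopes, derive_permission("POST", "/gateway/runtime/runs")))  # False
```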

Create an API key:

```powershell
$body = @{
  tenant_id = "t1"
  name = "local-dev"
  scopes = "gateway:agents:* gateway:runtime:read"
} | ConvertTo-Json

$created = Invoke-RestMethod `
  -Method Post `
  -Uri "http://127.0.0.1:8000/gateway/api-keys" `
  -ContentType "application/json" `
  -Body $body

$created.api_key
```

Use an API key:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8000/gateway/audits?tenant_id=t1" `
  -Headers @{"x-tenant-id"="t1"; "x-api-key"=$created.api_key}
```

Disable or revoke an API key:

```powershell
$body = @{
  tenant_id = "t1"
  status = "revoked"
} | ConvertTo-Json

Invoke-RestMethod `
  -Method Patch `
  -Uri "http://127.0.0.1:8000/gateway/api-keys/$($created.id)/status" `
  -ContentType "application/json" `
  -Headers @{"x-tenant-id"="t1"; "x-api-key"=$created.api_key} `
  -Body $body
```
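The create and revoke calls above can also be issued from Python. This sketch assumes the same local gateway address; the helper names are hypothetical, it adds the `x-tenant-id` header described under the gateway request context, and the client-side status check only rejects values outside the three statuses listed in the auth notes.

```python
import json
import urllib.request
from typing import Any, Optional

GATEWAY_URL = "http://127.0.0.1:8000"
KEY_STATUSES = {"active", "disabled", "revoked"}  # statuses listed in the auth notes


def _json_request(
    method: str, path: str, body: dict[str, Any], tenant_id: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    headers = {"Content-Type": "application/json", "x-tenant-id": tenant_id}
    if api_key:
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        GATEWAY_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method=method,
    )


def build_create_key_request(
    tenant_id: str, name: str, scopes: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    # The bootstrap call (auth enabled, no key yet) needs no x-api-key.
    body = {"tenant_id": tenant_id, "name": name, "scopes": scopes}
    return _json_request("POST", "/gateway/api-keys", body, tenant_id, api_key)


def build_status_request(
    tenant_id: str, key_id: str, status: str, api_key: str
) -> urllib.request.Request:
    if status not in KEY_STATUSES:
        raise ValueError(f"unknown API key status: {status!r}")
    body = {"tenant_id": tenant_id, "status": status}
    return _json_request("PATCH", f"/gateway/api-keys/{key_id}/status", body, tenant_id, api_key)
```

To send either request, pass it to `urllib.request.urlopen` and decode the body with `json.load`; as the PowerShell example shows, the create response carries `api_key` and `id`.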

Run the smoke test through an authenticated gateway:

```powershell
$env:AGENT_PLATFORM_SMOKE_WORKFLOW_URL="http://127.0.0.1:8000/gateway/workflows"
$env:AGENT_PLATFORM_SMOKE_RUNTIME_URL="http://127.0.0.1:8000/gateway/runtime"
$env:AGENT_PLATFORM_SMOKE_TENANT_ID="t1"
$env:AGENT_PLATFORM_SMOKE_API_KEY=$created.api_key
.\.venv\Scripts\python scripts\smoke_runtime_no_key.py
```

HTTP tool node config example:

```json
{
  "id": "search-products",
  "type": "tool",
  "config": {
    "tool_binding_id": "binding-1",
    "query": {
      "keyword": "milk"
    }
  }
}
```

HTTP tool settings are resolved from the node `config`, `binding.config_json`, and `invoke_config_json` as follows:

- URL: `config.url` or `invoke_config_json.url`
- Base URL: `config.base_url` or `binding.config_json.base_url` or `invoke_config_json.base_url`
- Path: `config.path` or `invoke_config_json.path`
- Method: `invoke_config_json.method`, default `GET`
- Query params: merge `invoke_config_json.query` + `config.query`
- Body JSON: merge `invoke_config_json.body` + `config.body`
- Headers: merge `invoke_config_json.headers` + `binding.config_json.headers` + `config.headers`
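A sketch of the resolution rules above. It assumes that for the "or" fields the first non-empty source wins, that an explicit URL takes priority over `base_url` plus `path`, and that merges are applied left to right so later sources override earlier keys. The function names and the `api.example.com` host used in testing are hypothetical.

```python
from typing import Any, Optional


def _first(*values: Any) -> Any:
    """Return the first value that is neither None nor empty."""
    for value in values:
        if value:
            return value
    return None


def _merge(*sources: Optional[dict]) -> dict:
    """Merge dicts left to right; later sources override earlier keys (assumed)."""
    merged: dict = {}
    for source in sources:
        merged.update(source or {})
    return merged


def resolve_http_call(config: dict, binding_config: dict, invoke_config: dict) -> dict:
    """Hypothetical resolver: node config, binding config_json, invoke_config_json."""
    url = _first(config.get("url"), invoke_config.get("url"))
    if url is None:
        base_url = _first(
            config.get("base_url"),
            binding_config.get("base_url"),
            invoke_config.get("base_url"),
        )
        if base_url is None:
            raise ValueError("HTTP tool needs a url or a base_url")
        path = _first(config.get("path"), invoke_config.get("path"))
        # Join base URL and path with exactly one slash.
        url = (base_url.rstrip("/") + "/" + path.lstrip("/")) if path else base_url
    return {
        "method": (invoke_config.get("method") or "GET").upper(),
        "url": url,
        "params": _merge(invoke_config.get("query"), config.get("query")),
        "json": _merge(invoke_config.get("body"), config.get("body")) or None,
        "headers": _merge(
            invoke_config.get("headers"),
            binding_config.get("headers"),
            config.get("headers"),
        ),
    }
```

With the `search-products` node above, a binding whose `config_json` sets `base_url` and an invoke config that sets `path` would resolve to a `GET` on the joined URL with `keyword=milk` added to any default query parameters.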

LLM node config example:

```json
{
  "id": "draft-answer",
  "type": "llm",
  "config": {
    "model": "gpt-4o-mini",
    "system_prompt": "You are a customer support assistant.",
    "prompt": "Summarize the user intent in Chinese.",
    "temperature": 0.2,
    "max_tokens": 400
  }
}
```

`llm` nodes also support explicit `messages`:

```json
{
  "id": "rewrite-message",
  "type": "llm",
  "config": {
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a concise editor."},
      {"role": "user", "content": "Rewrite this sentence in a warmer tone."}
    ]
  }
}
```

`runtime-service` sends `llm` execution requests to `model-gateway-service`, and the gateway forwards them to an OpenAI-compatible `/chat/completions` provider.

Recommended environment variables for `model-gateway-service`:

```powershell
$env:AGENT_PLATFORM_PROVIDER_BASE_URL="https://api.openai.com/v1"
$env:AGENT_PLATFORM_PROVIDER_API_KEY="your-api-key"
$env:AGENT_PLATFORM_DEFAULT_MODEL="gpt-4o-mini"
```
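For orientation, here is how an `llm` node config could map onto an OpenAI-compatible `/chat/completions` call using the environment variables above. It is a sketch under assumptions, not the `model-gateway-service` code: the function names are hypothetical, and `prompt` is assumed to become a single `user` message after an optional `system` message. The `Authorization: Bearer` header is the standard OpenAI-compatible convention.

```python
import json
import os
import urllib.request
from typing import Any


def build_chat_completion_body(config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical mapping from an `llm` node config to a chat completion body."""
    if config.get("messages"):
        # Explicit messages are passed through unchanged.
        messages = list(config["messages"])
    elif config.get("prompt"):
        # Assumed mapping: optional system prompt, then the prompt as a user message.
        messages = []
        if config.get("system_prompt"):
            messages.append({"role": "system", "content": config["system_prompt"]})
        messages.append({"role": "user", "content": config["prompt"]})
    else:
        raise ValueError("llm node needs either 'messages' or 'prompt'")
    model = config.get("model") or os.environ.get("AGENT_PLATFORM_DEFAULT_MODEL")
    if not model:
        raise ValueError("no model in node config and AGENT_PLATFORM_DEFAULT_MODEL is unset")
    body: dict[str, Any] = {"model": model, "messages": messages}
    # Optional sampling settings keep their OpenAI field names.
    for key in ("temperature", "max_tokens"):
        if key in config:
            body[key] = config[key]
    return body


def build_provider_request(body: dict[str, Any]) -> urllib.request.Request:
    """POST {AGENT_PLATFORM_PROVIDER_BASE_URL}/chat/completions."""
    base_url = os.environ["AGENT_PLATFORM_PROVIDER_BASE_URL"]
    headers = {"Content-Type": "application/json"}
    api_key = os.environ.get("AGENT_PLATFORM_PROVIDER_API_KEY")
    if api_key:
        # Some local OpenAI-compatible servers accept requests without a key.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

In a standard OpenAI-compatible response, the reply text is at `choices[0].message.content`.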

Code node config example:

```json
{
  "id": "compute-summary",
  "type": "code",
  "config": {
    "language": "python",
    "timeout_seconds": 5,
    "input_json": {
      "numbers": [1, 2, 3, 4]
    },
    "code": "total = sum(payload['numbers'])\nresult = {'total': total, 'count': len(payload['numbers'])}\nprint(f'total={total}')"
  }
}
```

`runtime-service` sends `code` execution requests to `code-runner-service`. The current `python` execution contract is:

- The node's `input_json` is available to the code as `payload`.
- Assign the execution result to `result`.
- `print(...)` output is captured into `node_run.output_text`.
- The structured `result` is captured into `node_run.output_json.result_json`.
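The `python` contract above can be illustrated with a child interpreter that binds `payload`, captures `print` output, and reads back `result`. This is a minimal sketch, not `code-runner-service`'s implementation: `run_python_node` is a hypothetical name, it uses `sys.executable` rather than `AGENT_PLATFORM_PYTHON_BIN`, it assumes `result` is JSON-serializable, and a plain subprocess gives only process separation and a timeout, not a sandbox.

```python
import json
import subprocess
import sys
from typing import Any

# Runs in the child interpreter: reads {"code", "payload"} from stdin, executes
# the user code with `payload` bound, captures its prints, and writes one JSON object.
_WRAPPER = r'''
import contextlib, io, json, sys
job = json.load(sys.stdin)
scope = {"payload": job["payload"], "result": None}
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(job["code"], scope)
sys.stdout.write(json.dumps({"output_text": buffer.getvalue(), "result_json": scope["result"]}))
'''


def run_python_node(code: str, payload: Any, timeout_seconds: float = 5.0) -> dict:
    """Hypothetical runner; raises subprocess.TimeoutExpired when the timeout elapses."""
    completed = subprocess.run(
        [sys.executable, "-c", _WRAPPER],
        input=json.dumps({"code": code, "payload": payload}),
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )
    if completed.returncode != 0:
        # The child's traceback (or a JSON encoding error) ends up on stderr.
        raise RuntimeError(completed.stderr.strip())
    return json.loads(completed.stdout)
```

Running the `compute-summary` node's code with its `input_json` this way yields `output_text` of `total=10` plus a newline and `result_json` of `{"total": 10, "count": 4}`.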

Recommended environment variables for `code-runner-service`:

```powershell
$env:AGENT_PLATFORM_PYTHON_BIN="python"
$env:AGENT_PLATFORM_MAX_TIMEOUT_SECONDS="30"
```
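One plausible reading of `AGENT_PLATFORM_MAX_TIMEOUT_SECONDS` is an upper bound applied to each node's `timeout_seconds`. The sketch below assumes clamping (the service might reject oversized values instead), a hypothetical 5-second default when the node sets none, a cap of 30 when the variable is unset, and `python` as the interpreter fallback; the function names are hypothetical.

```python
import os
from typing import Optional


def effective_timeout(requested: Optional[float], default: float = 5.0) -> float:
    """Hypothetical: clamp a node's timeout_seconds to the configured maximum."""
    # Assumed cap when the variable is unset; mirrors the example value above.
    max_timeout = float(os.environ.get("AGENT_PLATFORM_MAX_TIMEOUT_SECONDS", "30"))
    value = default if requested is None else float(requested)
    if value <= 0:
        raise ValueError("timeout_seconds must be positive")
    # Clamp instead of rejecting: an assumption about the service's policy.
    return min(value, max_timeout)


def python_interpreter() -> str:
    """Interpreter used for `python` nodes; falls back to `python` on PATH."""
    return os.environ.get("AGENT_PLATFORM_PYTHON_BIN", "python")
```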

## Docker Compose

Files:

- `deployments/docker/docker-compose.yml`
- `deployments/docker/python-service.Dockerfile`
- `deployments/docker/.env.example`

Start all services locally:

```powershell
cd D:\workspace\auto-platform
Copy-Item .\deployments\docker\.env.example .\.env
docker compose -f .\deployments\docker\docker-compose.yml up --build
```

Start in detached mode:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml up --build -d
```

Production-like infrastructure:

- Compose starts `postgres` from the `pgvector` image and runs `CREATE EXTENSION IF NOT EXISTS vector`.
- Compose starts a durable `redis` with append-only persistence.
- Copy `deployments/docker/.env.example` to `.env` to use per-service PostgreSQL databases such as `workflow_service`, `agent_service`, and `knowledge_service`.
- Set `AGENT_PLATFORM_REDIS_URL=redis://redis:6379/0` to enable shared Redis-backed locks, idempotency keys, and queues.
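Redis-backed idempotency keys usually rely on an atomic `SET key value NX EX ttl`: only the first writer succeeds until the key expires. The sketch below shows that pattern through redis-py's `set(..., nx=True, ex=...)`. It is not the API of `core-shared.redis_primitives`; `claim_idempotency_key` and the in-memory stand-in are hypothetical helpers for local testing.

```python
from typing import Any, Optional


def claim_idempotency_key(client: Any, key: str, owner: str, ttl_seconds: int = 600) -> bool:
    """Hypothetical helper: True only for the first caller to claim `key` within the TTL.

    SET NX EX is atomic in Redis, so concurrent workers cannot both win.
    redis-py returns True on success and None when the key already exists.
    """
    return bool(client.set(key, owner, ex=ttl_seconds, nx=True))


class InMemoryRedis:
    """Hypothetical stand-in for redis.Redis.set(..., nx=True); ignores expiry."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def set(self, name: str, value: str, ex: Optional[int] = None, nx: bool = False) -> Optional[bool]:
        if nx and name in self._data:
            return None
        self._data[name] = value
        return True
```

Against a real Redis, pass `redis.Redis.from_url(os.environ["AGENT_PLATFORM_REDIS_URL"])` as `client`.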

Scale runtime workers:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml up --build -d --scale runtime-worker=3
```

Scale agent workers:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml up --build -d --scale agent-worker=3
```

Scale team workers:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml up --build -d --scale team-worker=3
```

Scale scheduler workers:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml up --build -d --scale scheduler-worker=3
```

Stop and remove containers:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml down
```

Important notes:

- Services still fall back to SQLite files under `/data` if `AGENT_PLATFORM_DATABASE_URL` is not set.
- For scaled workers, use PostgreSQL plus Redis rather than SQLite.
- `core-shared.redis_primitives` provides `DistributedLock`, `IdempotencyStore`, and `RedisQueue` for services that need cross-process coordination.
- `agent-service` stores agent definitions, prompt/config versions, and agent run records under `/data`.
- `memory-service` stores scoped memories under `/data`; move it to PostgreSQL before enabling high-volume memory writes.
- `team-service` stores multi-agent team definitions, team versions, and team run records under `/data`.
- `team-worker` executes queued team runs by orchestrating member agent runs; it can be scaled independently.
- `skill-service` stores skill definitions, versions, marketplace-style installations, and skill execution runs under `/data`.
- `human-service` stores human approval, input, pause/resume, and takeover task records under `/data`.
- `knowledge-service` stores knowledge bases, documents, chunks, and local retrieval metadata under `/data`.
- `event-service` stores platform events and delivery status under `/data`.
- `auth-service` stores users, roles, assignments, and permission policy metadata under `/data`.
- `scheduler-service` stores delayed jobs, due-job leases, and retry status under `/data`.
- `agent-worker` has no exposed port and can be scaled independently; set `AGENT_PLATFORM_AGENT_WORKER_DRY_RUN=true` for no-key local smoke runs.
- `scheduler-worker` has no exposed port and can be scaled independently; prefer PostgreSQL for real multi-worker write concurrency.
- `runtime-worker` has no exposed port and can be scaled independently; prefer PostgreSQL for real multi-worker write concurrency.
- `runtime-service` automatically resolves internal URLs to `workflow-service`, `tool-service`, `model-gateway-service`, and `code-runner-service`.
- `model-gateway-service` defaults to `http://host.docker.internal:11434/v1`; override it in `.env` to use OpenAI or another OpenAI-compatible provider.