# agent-platform

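A Python-based scaffold for a multi-service agent development platform.

The repository is set up as a monorepo containing:

- `services/`: core microservices
- `libs/`: shared domain models, DSL, events, database utilities, and common components
- `deployments/`: placeholders for local and cluster deployment
- `docs/`: planning and database design documents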

## Services created so far

- `api-gateway`
- `model-gateway-service`
- `session-service`
- `workflow-service`
- `runtime-service`
- `tool-service`

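Each service provides a minimal `FastAPI` entrypoint and a health-check endpoint. Services that use a database also include `SQLAlchemy` model skeletons and an Alembic directory.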

## Shared libraries created so far

- `core-domain`
- `core-dsl`
- `core-events`
- `core-shared`
- `core-db`

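## Recommended local development setup

Create a virtual environment, then install the shared libraries and each service in editable mode using `uv` or `pip`. The example below uses `venv` and `pip`: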

```powershell
cd D:\workspace\auto-platform
python -m venv .venv
.venv\Scripts\activate
pip install -e .\libs\core-shared
pip install -e .\libs\core-domain
pip install -e .\libs\core-dsl
pip install -e .\libs\core-events
pip install -e .\libs\core-db
pip install -e .\services\api-gateway
pip install -e .\services\session-service
pip install -e .\services\workflow-service
pip install -e .\services\runtime-service
pip install -e .\services\tool-service
```

To run a service, for example `api-gateway`:

```powershell
cd D:\workspace\auto-platform\services\api-gateway
uvicorn app.main:app --reload --port 8000
```

By default, each service connects to a SQLite file in its own directory. Override the connection with an environment variable:

```powershell
$env:AGENT_PLATFORM_DATABASE_URL="postgresql+psycopg://user:password@localhost:5432/workflow_db"
```
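On the Python side, the override described above could be read roughly as follows. This is a minimal sketch: the helper name `resolve_database_url` and the default file name `service.db` are assumptions for illustration, not the repository's actual settings code.

```python
import os

# Hypothetical default; the real per-service SQLite file name may differ.
DEFAULT_DATABASE_URL = "sqlite:///./service.db"


def resolve_database_url(env=None):
    """Return AGENT_PLATFORM_DATABASE_URL if set, else the SQLite default."""
    env = os.environ if env is None else env
    return env.get("AGENT_PLATFORM_DATABASE_URL") or DEFAULT_DATABASE_URL
```

Passing a plain dict as `env` keeps the helper easy to test without touching the real process environment.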

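## Data layer scaffold

This iteration adds:

- `libs/core-db`: shared `SQLAlchemy` Base, common mixins, and naming conventions
- `workflow-service`: application and workflow definition models
- `session-service`: session and message models
- `runtime-service`: run and node execution models
- `tool-service`: tool definition and binding models
- A separate `alembic.ini`, `env.py`, and `versions/` for each service
- `workflow-service`: wired up with repository / application service / CRUD API
- `session-service`: wired up with repository / application service / CRUD API

Migration example: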

```powershell
cd D:\workspace\auto-platform\services\workflow-service
alembic upgrade head
```

The same applies to the other services:

- `services/session-service`
- `services/runtime-service`
- `services/tool-service`

API examples:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8002/workflows/apps `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"sales_assistant","name":"Sales Assistant"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8001/sessions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","app_id":"app-1","user_id":"user-1","channel_type":"web"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8002/workflows/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","workflow_id":"wf-1","dsl_json":{"nodes":[],"edges":[]}}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8001/sessions/run-requests `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","session_id":"sess-1","app_version_id":"appv-1","workflow_version_id":"wfv-1"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/runs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","app_id":"app-1","app_version_id":"appv-1","workflow_id":"wf-1","workflow_version_id":"wfv-1","session_id":"sess-1","initial_node":{"node_id":"start","node_type":"llm"}}'
```

If `initial_node` is omitted, `runtime-service` calls `workflow-service` to load the corresponding `workflow version` and derives the first node from the DSL automatically:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/runs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","app_id":"app-1","app_version_id":"appv-1","workflow_id":"wf-1","workflow_version_id":"wfv-1","session_id":"sess-1"}'
```
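One plausible way to derive the first node is to pick the nodes that no edge points to. The sketch below assumes that rule and the `id` / `source` / `target` keys used in this README's DSL examples; the actual derivation in `runtime-service` may differ (for example, it could look for an explicit start node type).

```python
def derive_initial_nodes(dsl):
    """Return the ids of nodes with no incoming edge, in DSL order."""
    targets = {edge["target"] for edge in dsl.get("edges", [])}
    return [node["id"] for node in dsl.get("nodes", []) if node["id"] not in targets]


# Illustrative DSL; node ids are made up for the example.
dsl = {
    "nodes": [{"id": "seed-state"}, {"id": "check-score"}, {"id": "high-path"}],
    "edges": [
        {"source": "seed-state", "target": "check-score"},
        {"source": "check-score", "target": "high-path"},
    ],
}
print(derive_initial_nodes(dsl))
```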

Dispatch straight through to runtime in a single chained call:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8001/sessions/run-requests/dispatch `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","session_id":"sess-1","app_id":"app-1","app_version_id":"appv-1","workflow_id":"wf-1","workflow_version_id":"wfv-1","initial_node":{"node_id":"start","node_type":"llm"}}'
```

Tool definition examples:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8004/tools `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"search_products","name":"Search Products","tool_type":"http"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8004/tools/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","tool_id":"tool-1","input_schema_json":{"query":{"type":"string"}},"invoke_config_json":{"method":"GET","path":"/products/search"}}'
```

Run status update examples:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/node-runs/node-run-id/status `
  -ContentType "application/json" `
  -Body '{"status":"running","worker_key":"runtime-worker-1"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/runs/run-id/status `
  -ContentType "application/json" `
  -Body '{"status":"completed"}'
```

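Notes:

- When you call `node-runs/{node_run_id}/status` to update a node's status, `runtime-service` aggregates the statuses of all `node_run` records in the current run and refreshes `workflow_run.status` accordingly
- Current rules: if any node is `failed`, the run is `failed`; if any node is `running`, the run is `running`; if all nodes are `completed` or `skipped`, the run is `completed`
- When a `node_run` is updated to `completed`, `runtime-service` also looks up successor nodes in the `workflow version` DSL and creates new `node_run` records in `queued` status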
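The aggregation rules above can be written as a small pure function. This is a sketch, not the service's code: the function name is hypothetical, and keeping the current status when no rule matches (for example, when some nodes are still `queued`) is an assumption.

```python
def aggregate_run_status(node_statuses, current="queued"):
    """Derive a run status from its node_run statuses."""
    statuses = list(node_statuses)
    if any(s == "failed" for s in statuses):
        return "failed"
    if any(s == "running" for s in statuses):
        return "running"
    if statuses and all(s in ("completed", "skipped") for s in statuses):
        return "completed"
    # Assumption: no rule matched (e.g. queued successors exist), keep the status.
    return current
```

Because `failed` is checked first, a run with one failed node and one running node is reported as `failed`.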

## Directory structure

```text
services/
  api-gateway/
  session-service/
  workflow-service/
  runtime-service/
  tool-service/
libs/
  core-domain/
  core-dsl/
  core-events/
  core-shared/
  core-db/
deployments/
  docker/
  k8s/
docs/
tests/
```

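## Suggested next steps

1. Complete the repository / service layers for `V0.1`
2. Write the first Alembic initial migration
3. Integrate PostgreSQL / Redis
4. Add Docker Compose
5. Start implementing the three main flows: applications, workflows, and runs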

## Runtime Execute APIs

`runtime-service` now includes a typed executor skeleton for these node types:

- `llm`
- `tool`
- `code`
- `answer`
- `if-else`
- `assigner`
- `knowledge-retrieval`
- `template-transform`

Execute a specific queued node:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/node-runs/node-run-id/execute `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1"}'
```

Execute the next queued node in a run:

```powershell
Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8003/runtime/runs/run-id/execute-next?tenant_id=t1" `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1"}'
```

Execute queued nodes in sequence until the run is finished, blocked, or reaches `max_steps`:

```powershell
Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8003/runtime/runs/run-id/execute?tenant_id=t1" `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1","max_steps":16}'
```
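The stop conditions of this endpoint (finished, blocked, or `max_steps` reached) amount to a bounded loop. The sketch below assumes a hypothetical `execute_next` callable that returns the executed node id, or `None` when nothing is queued; it is not the service's actual implementation.

```python
def execute_until_done(execute_next, max_steps=16):
    """Call execute_next() until it returns None or max_steps is reached."""
    executed = []
    for _ in range(max_steps):
        node_id = execute_next()
        if node_id is None:  # nothing queued: the run is finished or blocked
            break
        executed.append(node_id)
    return executed
```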

Execute one queued node through the worker claim API:

```powershell
Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8003/runtime/workers/execute-next" `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1","lease_seconds":300}'
```

Run a standalone runtime worker process:

```powershell
Push-Location .\services\runtime-service
$env:AGENT_PLATFORM_DATABASE_URL="sqlite:///./runtime_service.db"
..\..\.venv\Scripts\python -m app.worker
Pop-Location
```

The worker uses `node_run.status` plus `lease_expire_time` as a DB-backed queue. This keeps the first scalable version dependency-light. For heavier production concurrency, point `AGENT_PLATFORM_DATABASE_URL` at PostgreSQL before scaling to many workers.
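A lease-based claim on such a queue could look like the sketch below, written against the standard-library `sqlite3` module. The `worker_key` column, the numeric timestamp, and the rule that an expired lease makes a `running` row claimable again are assumptions for illustration; the real schema and worker logic may differ.

```python
import sqlite3
import time


def claim_next_node_run(conn, worker_key, lease_seconds=300, now=None):
    """Claim one queued (or lease-expired) node_run; return its id or None."""
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id FROM node_run "
            "WHERE status = 'queued' "
            "   OR (status = 'running' AND lease_expire_time < ?) "
            "ORDER BY id LIMIT 1",
            (now,),
        ).fetchone()
        if row is None:
            return None
        # Guarded update: succeeds only if the row is still claimable,
        # so two workers cannot both claim the same node_run.
        updated = conn.execute(
            "UPDATE node_run "
            "SET status = 'running', worker_key = ?, lease_expire_time = ? "
            "WHERE id = ? AND (status = 'queued' OR lease_expire_time < ?)",
            (worker_key, now + lease_seconds, row[0], now),
        )
        return row[0] if updated.rowcount == 1 else None
```

The guarded `UPDATE` is what makes the claim safe under contention: a worker that loses the race sees `rowcount == 0` and gets `None`.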

Node execution results are now persisted on `node_run`:

- `output_text`
- `output_json`

Node execution artifacts are also persisted on `node_artifact`:

- `artifact_type`
- `content_text`
- `content_json`
- `storage_uri`
- `size_bytes`

Query artifacts:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8003/runtime/node-artifacts?tenant_id=t1&run_id=run-id"
```

Trace spans are persisted on `trace_span` for timeline and latency analysis:

- `span_type`
- `name`
- `status`
- `started_time`
- `ended_time`
- `duration_ms`
- `attributes_json`
- `error_code`
- `error_message`

Query trace spans:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8003/runtime/trace-spans?tenant_id=t1&run_id=run-id"
```

Current behavior:

- `answer` nodes persist rendered text to `output_text`
- `assigner` nodes write `state_updates` to `output_json`
- `condition` / `if-else` nodes write `condition_result` and `route` to `output_json`
- `template-transform` nodes render text or JSON using previous node outputs and run state
- `knowledge-retrieval` / `retriever` nodes run keyword retrieval over inline or HTTP JSON documents
- `tool` nodes persist resolved binding/tool metadata to `output_json`
- default executors persist basic executor metadata to `output_json`

Runtime template context:

- `state.xxx`: values written by previous `assigner` nodes
- `nodes.node_id.output.xxx`: structured output from a previous node
- `nodes.node_id.text`: text output from a previous node
- `current.node_id`: current node id

Assigner node config example:

```json
{
  "id": "seed-state",
  "type": "assigner",
  "config": {
    "assignments": {
      "score": 7,
      "user_name": "Alice"
    }
  }
}
```

Condition node config example:

```json
{
  "id": "check-score",
  "type": "if-else",
  "config": {
    "expression": "state.score >= 5"
  }
}
```

Conditional edge example:

```json
[
  {"source": "check-score", "target": "high-path", "condition": "true"},
  {"source": "check-score", "target": "low-path", "condition": "false"}
]
```
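Given edges like these, successor selection after a condition node could be sketched as follows. It assumes the node's `route` output is the string `"true"` or `"false"` and that edges without a `condition` key are unconditional; both are assumptions, and the function name is hypothetical.

```python
def select_successors(edges, node_id, route=None):
    """Return targets of edges leaving node_id whose condition matches route."""
    targets = []
    for edge in edges:
        if edge["source"] != node_id:
            continue
        condition = edge.get("condition")
        # Unconditional edges always apply; conditional ones must match the route.
        if condition is None or condition == route:
            targets.append(edge["target"])
    return targets


edges = [
    {"source": "check-score", "target": "high-path", "condition": "true"},
    {"source": "check-score", "target": "low-path", "condition": "false"},
]
print(select_successors(edges, "check-score", route="true"))
```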

Template node config example:

```json
{
  "id": "high-path",
  "type": "template-transform",
  "config": {
    "template": "{{state.user_name}} passed with score {{state.score}}"
  }
}
```

Retriever node config example:

```json
{
  "id": "retrieve-docs",
  "type": "knowledge-retrieval",
  "config": {
    "query_template": "{{state.query}}",
    "top_k": 2,
    "documents": [
      {
        "id": "refund",
        "title": "Refund Policy",
        "text": "Refund policy allows returns within seven days."
      },
      {
        "id": "shipping",
        "title": "Shipping Policy",
        "text": "Shipping usually takes three to five business days."
      }
    ]
  }
}
```

Retriever output is persisted to `node_run.output_json.retrieved_documents`. Template nodes can consume it:

```json
{
  "id": "render-answer",
  "type": "template-transform",
  "config": {
    "template": "Top doc: {{nodes.retrieve-docs.output.retrieved_documents.0.title}}"
  }
}
```
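To illustrate how a dotted placeholder such as `nodes.retrieve-docs.output.retrieved_documents.0.title` might be resolved, here is a minimal renderer. It assumes a context dict shaped like the runtime template context listed earlier and treats numeric path segments as list indexes; the real template engine may behave differently (for example, on missing keys, which raise `KeyError` here).

```python
import re

_PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def resolve_path(context, path):
    """Walk a dotted path; numeric segments index into lists."""
    value = context
    for part in path.split("."):
        if isinstance(value, list):
            value = value[int(part)]
        else:
            value = value[part]
    return value


def render_template(template, context):
    """Replace each {{path}} placeholder with the string form of its value."""
    return _PLACEHOLDER.sub(lambda m: str(resolve_path(context, m.group(1))), template)


# Illustrative context; values are made up for the example.
context = {
    "state": {"user_name": "Alice", "score": 7},
    "nodes": {
        "retrieve-docs": {
            "output": {"retrieved_documents": [{"title": "Refund Policy"}]}
        }
    },
}
print(render_template("Top doc: {{nodes.retrieve-docs.output.retrieved_documents.0.title}}", context))
```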

Retriever nodes can also load documents from an HTTP JSON source:

```json
{
  "id": "retrieve-remote-docs",
  "type": "retriever",
  "config": {
    "query": "refund policy",
    "source_url": "http://127.0.0.1:9000/documents",
    "top_k": 3
  }
}
```

The HTTP source should return either a document list or an object with a `documents` list.
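The two accepted response shapes, together with a simple keyword ranking, can be sketched as below. The scoring rule (count of distinct query words found in `title` plus `text`) is an assumption chosen for illustration; the retriever's actual ranking is not specified here.

```python
import re


def normalize_documents(payload):
    """Accept a bare list of documents or an object with a 'documents' list."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict) and isinstance(payload.get("documents"), list):
        return payload["documents"]
    raise ValueError("expected a document list or an object with a 'documents' list")


def keyword_retrieve(query, documents, top_k=3):
    """Rank documents by the number of distinct query words they contain."""
    words = set(re.findall(r"\w+", query.lower()))
    scored = []
    for doc in documents:
        text = f"{doc.get('title', '')} {doc.get('text', '')}".lower()
        score = len(words & set(re.findall(r"\w+", text)))
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep input order
    return [doc for _, doc in scored[:top_k]]
```

Documents that share no word with the query are dropped rather than returned with a zero score.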

Run the no-key runtime smoke test after local services are running:

```powershell
.\.venv\Scripts\python scripts\smoke_runtime_no_key.py
```

Run the same smoke test through `api-gateway`:

```powershell
$env:AGENT_PLATFORM_SMOKE_WORKFLOW_URL="http://127.0.0.1:8000/gateway/workflows"
$env:AGENT_PLATFORM_SMOKE_RUNTIME_URL="http://127.0.0.1:8000/gateway/runtime"
.\.venv\Scripts\python scripts\smoke_runtime_no_key.py
```

## API Gateway

`api-gateway` provides a unified entrypoint:

- `GET /gateway/services/health`
- `/gateway/workflows/**` -> `workflow-service /workflows/**`
- `/gateway/sessions/**` -> `session-service /sessions/**`
- `/gateway/runtime/**` -> `runtime-service /runtime/**`
- `/gateway/tools/**` -> `tool-service /tools/**`
- `/gateway/models/**` -> `model-gateway-service /models/**`
- `/gateway/code/**` -> `code-runner-service /code/**`
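The prefix mapping above can be expressed as a lookup table. The sketch below only resolves a path to a service name and downstream path; how `api-gateway` actually resolves service URLs and proxies the request is not shown, and the names `ROUTES` and `resolve_upstream` are hypothetical.

```python
ROUTES = {
    "workflows": "workflow-service",
    "sessions": "session-service",
    "runtime": "runtime-service",
    "tools": "tool-service",
    "models": "model-gateway-service",
    "code": "code-runner-service",
}


def resolve_upstream(path):
    """Map /gateway/<prefix>/... to (service, downstream path), or None."""
    parts = path.lstrip("/").split("/")
    if len(parts) < 2 or parts[0] != "gateway" or parts[1] not in ROUTES:
        return None
    # The /gateway prefix is dropped; the rest is forwarded unchanged.
    return ROUTES[parts[1]], "/" + "/".join(parts[1:])


print(resolve_upstream("/gateway/workflows/apps"))
```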

Gateway readiness:

```powershell
Invoke-RestMethod -Uri "http://127.0.0.1:8000/ready"
```

Downstream health:

```powershell
Invoke-RestMethod -Uri "http://127.0.0.1:8000/gateway/services/health"
```

Gateway request context:

- Incoming `x-request-id` is reused; otherwise gateway generates one.
- Incoming `x-tenant-id` is reused; otherwise gateway falls back to `tenant_id` query parameter, then `public`.
- Gateway forwards both `x-request-id` and `x-tenant-id` to downstream services.
- Gateway writes request audit records to `gateway_request_audit`.
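The fallback order for the two headers can be sketched as a small function. It assumes header names have already been lower-cased and that a generated request id is a UUID hex string; the gateway's real id format is not specified here.

```python
import uuid


def resolve_request_context(headers, query_params):
    """Return the x-request-id and x-tenant-id values to forward downstream."""
    # Reuse the incoming request id, otherwise generate one (format is an assumption).
    request_id = headers.get("x-request-id") or uuid.uuid4().hex
    # Header first, then the tenant_id query parameter, then "public".
    tenant_id = headers.get("x-tenant-id") or query_params.get("tenant_id") or "public"
    return {"x-request-id": request_id, "x-tenant-id": tenant_id}
```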

Query gateway audits:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8000/gateway/audits?tenant_id=t1&limit=20" `
  -Headers @{"x-tenant-id"="t1"}
```

Gateway API Key auth:

- `AGENT_PLATFORM_AUTH_REQUIRED=false` by default for local development.
- Set `AGENT_PLATFORM_AUTH_REQUIRED=true` to protect `/gateway/**`, except `/gateway/services/health`.
- API keys are stored as SHA-256 hashes. The raw key is only returned once at creation.
- When auth is enabled and no API key exists yet, the first `POST /gateway/api-keys` is allowed as bootstrap.
- API keys can be `active`, `disabled`, or `revoked`; only `active` keys are accepted.
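The hash-only storage and status check described above could be implemented along these lines. This is a sketch using the standard library; the key format produced by `secrets.token_urlsafe` and the function names are assumptions, not the gateway's actual code.

```python
import hashlib
import hmac
import secrets


def hash_api_key(raw_key):
    """Return the SHA-256 hex digest that would be stored instead of the key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def create_api_key():
    """Return (raw_key, key_hash); the raw key is shown once, only the hash is kept."""
    raw_key = secrets.token_urlsafe(32)
    return raw_key, hash_api_key(raw_key)


def is_key_accepted(presented_key, stored_hash, status):
    """Accept only active keys whose hash matches the stored hash."""
    if status != "active":
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash)
```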

Create an API key:

```powershell
$body = @{
  tenant_id = "t1"
  name = "local-dev"
} | ConvertTo-Json

$created = Invoke-RestMethod `
  -Method Post `
  -Uri "http://127.0.0.1:8000/gateway/api-keys" `
  -ContentType "application/json" `
  -Body $body

$created.api_key
```

Use an API key:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8000/gateway/audits?tenant_id=t1" `
  -Headers @{"x-tenant-id"="t1"; "x-api-key"=$created.api_key}
```

Disable or revoke an API key:

```powershell
$body = @{
  tenant_id = "t1"
  status = "revoked"
} | ConvertTo-Json

Invoke-RestMethod `
  -Method Patch `
  -Uri "http://127.0.0.1:8000/gateway/api-keys/$($created.id)/status" `
  -ContentType "application/json" `
  -Headers @{"x-tenant-id"="t1"; "x-api-key"=$created.api_key} `
  -Body $body
```

Run smoke test through an authenticated gateway:

```powershell
$env:AGENT_PLATFORM_SMOKE_WORKFLOW_URL="http://127.0.0.1:8000/gateway/workflows"
$env:AGENT_PLATFORM_SMOKE_RUNTIME_URL="http://127.0.0.1:8000/gateway/runtime"
$env:AGENT_PLATFORM_SMOKE_TENANT_ID="t1"
$env:AGENT_PLATFORM_SMOKE_API_KEY=$created.api_key
.\.venv\Scripts\python scripts\smoke_runtime_no_key.py
```

HTTP tool node config example:

```json
{
  "id": "search-products",
  "type": "tool",
  "config": {
    "tool_binding_id": "binding-1",
    "query": {
      "keyword": "milk"
    }
  }
}
```

Supported HTTP tool config resolution order:

- URL: `config.url` or `invoke_config_json.url`
- Base URL: `config.base_url` or `binding.config_json.base_url` or `invoke_config_json.base_url`
- Path: `config.path` or `invoke_config_json.path`
- Method: `invoke_config_json.method`, default `GET`
- Query params: merge `invoke_config_json.query` + `config.query`
- Body JSON: merge `invoke_config_json.body` + `config.body`
- Headers: merge `invoke_config_json.headers` + `binding.config_json.headers` + `config.headers`
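The resolution order above can be sketched as one function over three dicts: the node `config`, the binding's `config_json`, and the tool version's `invoke_config_json`. Two points are assumptions: in each `or` chain the first non-empty value wins, and in each merge later sources override earlier ones.

```python
def resolve_http_request(config, binding_config, invoke_config):
    """Resolve method, URL, query, body, and headers for an HTTP tool call."""
    url = config.get("url") or invoke_config.get("url")
    if not url:
        base_url = (
            config.get("base_url")
            or binding_config.get("base_url")
            or invoke_config.get("base_url")
            or ""
        )
        path = config.get("path") or invoke_config.get("path") or ""
        url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return {
        "method": invoke_config.get("method", "GET"),
        "url": url,
        # Assumption: later sources override earlier ones on key conflicts.
        "query": {**invoke_config.get("query", {}), **config.get("query", {})},
        "body": {**invoke_config.get("body", {}), **config.get("body", {})},
        "headers": {
            **invoke_config.get("headers", {}),
            **binding_config.get("headers", {}),
            **config.get("headers", {}),
        },
    }
```

With the node config from the example above, a binding whose `base_url` is `http://127.0.0.1:9000` (a made-up value), and the `invoke_config_json` from the tool version example, this yields a `GET` to `http://127.0.0.1:9000/products/search` with `keyword=milk`.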

LLM node config example:

```json
{
  "id": "draft-answer",
  "type": "llm",
  "config": {
    "model": "gpt-4o-mini",
    "system_prompt": "You are a customer support assistant.",
    "prompt": "Summarize the user intent in Chinese.",
    "temperature": 0.2,
    "max_tokens": 400
  }
}
```

`llm` nodes also support explicit `messages`:

```json
{
  "id": "rewrite-message",
  "type": "llm",
  "config": {
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a concise editor."},
      {"role": "user", "content": "Rewrite this sentence in a warmer tone."}
    ]
  }
}
```

`runtime-service` sends `llm` execution requests to `model-gateway-service`, which forwards them to an OpenAI-compatible `/chat/completions` provider.
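Both config styles map naturally onto a `/chat/completions` request body. The sketch below assumes that explicit `messages` take precedence over `system_prompt` / `prompt` and that a default model is used when `model` is missing; the function name is hypothetical and the real mapping in `model-gateway-service` may differ.

```python
def build_chat_payload(config, default_model="gpt-4o-mini"):
    """Build an OpenAI-compatible /chat/completions body from llm node config."""
    messages = config.get("messages")
    if not messages:
        # Fall back to system_prompt + prompt when no explicit messages are given.
        messages = []
        if config.get("system_prompt"):
            messages.append({"role": "system", "content": config["system_prompt"]})
        messages.append({"role": "user", "content": config.get("prompt", "")})
    payload = {"model": config.get("model") or default_model, "messages": messages}
    for key in ("temperature", "max_tokens"):
        if key in config:
            payload[key] = config[key]
    return payload
```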

Recommended environment variables for `model-gateway-service`:

```powershell
$env:AGENT_PLATFORM_PROVIDER_BASE_URL="https://api.openai.com/v1"
$env:AGENT_PLATFORM_PROVIDER_API_KEY="your-api-key"
$env:AGENT_PLATFORM_DEFAULT_MODEL="gpt-4o-mini"
```

Code node config example:

```json
{
  "id": "compute-summary",
  "type": "code",
  "config": {
    "language": "python",
    "timeout_seconds": 5,
    "input_json": {
      "numbers": [1, 2, 3, 4]
    },
    "code": "total = sum(payload['numbers'])\nresult = {'total': total, 'count': len(payload['numbers'])}\nprint(f'total={total}')"
  }
}
```

`runtime-service` sends `code` execution requests to `code-runner-service`. Current `python` execution contract:

- input payload is available as `payload`
- execution result should be assigned to `result`
- `print(...)` output is captured into `node_run.output_text`
- structured `result` is captured into `node_run.output_json.result_json`
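The contract above can be illustrated in-process with `exec` and captured stdout. This is only a sketch of the contract: it has no sandboxing or timeout, and `code-runner-service` presumably runs code in a separate interpreter (see `AGENT_PLATFORM_PYTHON_BIN`), which is not reproduced here.

```python
import contextlib
import io


def run_python_snippet(code, payload):
    """Run code with `payload` in scope; return captured stdout and `result`."""
    namespace = {"payload": payload}
    stdout = io.StringIO()
    with contextlib.redirect_stdout(stdout):
        exec(code, namespace)  # illustration only: no sandboxing or timeout
    return {"output_text": stdout.getvalue(), "result_json": namespace.get("result")}


code = (
    "total = sum(payload['numbers'])\n"
    "result = {'total': total, 'count': len(payload['numbers'])}\n"
    "print(f'total={total}')"
)
out = run_python_snippet(code, {"numbers": [1, 2, 3, 4]})
```

Here `out["output_text"]` holds the printed line and `out["result_json"]` holds the dict assigned to `result`.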

Recommended environment variables for `code-runner-service`:

```powershell
$env:AGENT_PLATFORM_PYTHON_BIN="python"
$env:AGENT_PLATFORM_MAX_TIMEOUT_SECONDS="30"
```

## Docker Compose

Files:

- `deployments/docker/docker-compose.yml`
- `deployments/docker/python-service.Dockerfile`
- `deployments/docker/.env.example`

Start all services locally:

```powershell
cd D:\workspace\auto-platform
Copy-Item .\deployments\docker\.env.example .\.env
docker compose -f .\deployments\docker\docker-compose.yml up --build
```

Start in detached mode:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml up --build -d
```

Scale runtime workers:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml up --build -d --scale runtime-worker=3
```

Stop and remove containers:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml down
```

Important notes:

- `workflow-service`, `session-service`, `runtime-service`, `tool-service`, and `api-gateway` use SQLite files mounted under `/data`
- `runtime-worker` has no exposed port and can be scaled independently; prefer PostgreSQL for real multi-worker write concurrency
- `runtime-service` automatically resolves internal URLs to `workflow-service`, `tool-service`, `model-gateway-service`, and `code-runner-service`
- `model-gateway-service` defaults to `http://host.docker.internal:11434/v1`; replace it in `.env` if you want OpenAI or another OpenAI-compatible provider