# agent-platform

A scaffold for a Python-based, multi-service agent development platform.

The repository is initialized as a monorepo and contains:

- `services/`: core microservices
- `libs/`: shared domain models, DSL, events, database, and common components
- `deployments/`: placeholders for local and cluster deployment
- `docs/`: planning and database design documents

## Services created so far

- `api-gateway`
- `model-gateway-service`
- `session-service`
- `workflow-service`
- `runtime-service`
- `tool-service`

Each service provides a minimal `FastAPI` entry point and a health check endpoint. Database-backed services also include `SQLAlchemy` model skeletons and an Alembic directory.

## Shared libraries created so far

- `core-domain`
- `core-dsl`
- `core-events`
- `core-shared`
- `core-db`

## Recommended local development setup

Create a virtual environment, then install each service's dependencies with `uv` or `pip`.

```powershell
cd D:\workspace\auto-platform
python -m venv .venv
.venv\Scripts\activate
pip install -e .\libs\core-shared
pip install -e .\libs\core-domain
pip install -e .\libs\core-dsl
pip install -e .\libs\core-events
pip install -e .\libs\core-db
pip install -e .\services\api-gateway
pip install -e .\services\session-service
pip install -e .\services\workflow-service
pip install -e .\services\runtime-service
pip install -e .\services\tool-service
```

Run example:

```powershell
cd D:\workspace\auto-platform\services\api-gateway
uvicorn app.main:app --reload --port 8000
```

By default, each service connects to a SQLite file in its own directory. The connection can be overridden with an environment variable:

```powershell
$env:AGENT_PLATFORM_DATABASE_URL="postgresql+psycopg://user:password@localhost:5432/workflow_db"
```

## Data layer scaffold

This iteration adds:

- `libs/core-db`: unified `SQLAlchemy` Base, common mixins, and naming conventions
- `workflow-service`: application and workflow definition models
- `session-service`: session and message models
- `runtime-service`: run and node execution models
- `tool-service`: tool definition and binding models
- A separate `alembic.ini`, `env.py`, and `versions/` for each service
- `workflow-service`: repository / application service / CRUD API wired in
- `session-service`: repository / application service / CRUD API wired in

Migration example:

```powershell
cd D:\workspace\auto-platform\services\workflow-service
alembic upgrade head
```

The same applies to the other services:

- `services/session-service`
- `services/runtime-service`
- `services/tool-service`

API examples:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8002/workflows/apps `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"sales_assistant","name":"Sales Assistant"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8001/sessions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","app_id":"app-1","user_id":"user-1","channel_type":"web"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8002/workflows/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","workflow_id":"wf-1","dsl_json":{"nodes":[],"edges":[]}}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8001/sessions/run-requests `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","session_id":"sess-1","app_version_id":"appv-1","workflow_version_id":"wfv-1"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/runs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","app_id":"app-1","app_version_id":"appv-1","workflow_id":"wf-1","workflow_version_id":"wfv-1","session_id":"sess-1","initial_node":{"node_id":"start","node_type":"llm"}}'
```

If `initial_node` is omitted, `runtime-service` calls `workflow-service` to read the corresponding `workflow version` and derives the first node from the DSL:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/runs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","app_id":"app-1","app_version_id":"appv-1","workflow_id":"wf-1","workflow_version_id":"wfv-1","session_id":"sess-1"}'
```

Create a run request and dispatch it to the runtime in a single call:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8001/sessions/run-requests/dispatch `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","session_id":"sess-1","app_id":"app-1","app_version_id":"appv-1","workflow_id":"wf-1","workflow_version_id":"wfv-1","initial_node":{"node_id":"start","node_type":"llm"}}'
```

Tool definition examples:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8004/tools `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"search_products","name":"Search Products","tool_type":"http"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8004/tools/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","tool_id":"tool-1","input_schema_json":{"query":{"type":"string"}},"invoke_config_json":{"method":"GET","path":"/products/search"}}'
```

Run status transition examples:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/node-runs/node-run-id/status `
  -ContentType "application/json" `
  -Body '{"status":"running","worker_key":"runtime-worker-1"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/runs/run-id/status `
  -ContentType "application/json" `
  -Body '{"status":"completed"}'
```

Notes:

- When you call `node-runs/{node_run_id}/status` to update a node's status, `runtime-service` aggregates the statuses of all `node_run` records in the current run and refreshes `workflow_run.status` accordingly.
- Current rule: if any node is `failed`, the run is `failed`; if any node is `running`, the run is `running`; if all nodes are `completed` or `skipped`, the run is `completed`.
- When a `node_run` is updated to `completed`, `runtime-service` also looks up the successor nodes in the DSL of the `workflow version` and creates new `node_run` records in `queued` status.

## Directory structure

```text
services/
  api-gateway/
  session-service/
  workflow-service/
  runtime-service/
  tool-service/
libs/
  core-domain/
  core-dsl/
  core-events/
  core-shared/
  core-db/
deployments/
  docker/
  k8s/
docs/
tests/
```

## Suggested next steps

1. Complete the repository / service layers for `V0.1`
2. Write the first Alembic initial migration
3. Integrate PostgreSQL / Redis
4. Add Docker Compose
5. Start implementing the three main flows: applications, workflows, and runs

## Runtime Execute APIs

`runtime-service` now includes a typed executor skeleton for these node types:

- `llm`
- `tool`
- `code`
- `answer`
- `if-else`
- `assigner`
- `knowledge-retrieval`
- `template-transform`

Execute a specific queued node:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/node-runs/node-run-id/execute `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1"}'
```

Execute the next queued node in a run:

```powershell
Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8003/runtime/runs/run-id/execute-next?tenant_id=t1" `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1"}'
```

Execute queued nodes in sequence until the run is finished, blocked, or reaches `max_steps`:

```powershell
Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8003/runtime/runs/run-id/execute?tenant_id=t1" `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1","max_steps":16}'
```

Execute one queued node through the worker claim API:

```powershell
Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8003/runtime/workers/execute-next" `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1","lease_seconds":300}'
```

Run a standalone runtime worker process:

```powershell
Push-Location .\services\runtime-service
$env:AGENT_PLATFORM_DATABASE_URL="sqlite:///./runtime_service.db"
..\..\.venv\Scripts\python -m app.worker
Pop-Location
```

The worker uses `node_run.status` plus `lease_expire_time` as a DB-backed queue. This keeps the first scalable version dependency-light. For heavier production concurrency, point `AGENT_PLATFORM_DATABASE_URL` at PostgreSQL before scaling out to many workers.
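The DB-backed queue described above can be pictured as a lease-based claim. The following is a minimal sketch using the standard-library `sqlite3` module, not the worker's actual implementation. The `node_run` table layout, the `id` column, and the `claim_next_node_run` helper are assumptions made for illustration; only `status`, `worker_key`, and `lease_expire_time` come from the text above.

```python
import sqlite3
import time


def claim_next_node_run(conn, worker_key, lease_seconds=300, now=None):
    """Claim one runnable node_run and return its id, or None if nothing is claimable.

    Hypothetical helper: a row is claimable when it is queued, or when it is
    running but its lease has expired (the previous worker presumably died).
    """
    now = time.time() if now is None else now
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT id FROM node_run "
            "WHERE status = 'queued' "
            "   OR (status = 'running' AND lease_expire_time < ?) "
            "ORDER BY id LIMIT 1",
            (now,),
        ).fetchone()
        if row is None:
            return None
        # Repeat the predicate in the UPDATE so that a concurrent claimer
        # that got there first makes this statement match zero rows.
        updated = conn.execute(
            "UPDATE node_run "
            "SET status = 'running', worker_key = ?, lease_expire_time = ? "
            "WHERE id = ? "
            "  AND (status = 'queued' "
            "       OR (status = 'running' AND lease_expire_time < ?))",
            (worker_key, now + lease_seconds, row[0], now),
        )
        return row[0] if updated.rowcount == 1 else None


if __name__ == "__main__":
    # Assumed schema, for the demo only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node_run ("
        "id INTEGER PRIMARY KEY, status TEXT, "
        "worker_key TEXT, lease_expire_time REAL)"
    )
    conn.execute("INSERT INTO node_run (status) VALUES ('queued')")
    print(claim_next_node_run(conn, "runtime-worker-1"))  # claims row 1
    print(claim_next_node_run(conn, "runtime-worker-2"))  # nothing left to claim
```

SQLite serializes writers, which is one reason the text above suggests PostgreSQL once several workers compete for the same rows.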
Node execution results are now persisted on `node_run`:

- `output_text`
- `output_json`

Node execution artifacts are also persisted on `node_artifact`:

- `artifact_type`
- `content_text`
- `content_json`
- `storage_uri`
- `size_bytes`

Query artifacts:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8003/runtime/node-artifacts?tenant_id=t1&run_id=run-id"
```

Trace spans are persisted on `trace_span` for timeline and latency analysis:

- `span_type`
- `name`
- `status`
- `started_time`
- `ended_time`
- `duration_ms`
- `attributes_json`
- `error_code`
- `error_message`

Query trace spans:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8003/runtime/trace-spans?tenant_id=t1&run_id=run-id"
```

Current behavior:

- `answer` nodes persist rendered text to `output_text`
- `assigner` nodes write `state_updates` to `output_json`
- `condition` / `if-else` nodes write `condition_result` and `route` to `output_json`
- `template-transform` nodes render text or JSON using previous node outputs and run state
- `knowledge-retrieval` / `retriever` nodes run keyword retrieval over inline or HTTP JSON documents
- `tool` nodes persist resolved binding/tool metadata to `output_json`
- default executors persist basic executor metadata to `output_json`

Runtime template context:

- `state.xxx`: values written by previous `assigner` nodes
- `nodes.node_id.output.xxx`: structured output from a previous node
- `nodes.node_id.text`: text output from a previous node
- `current.node_id`: current node id

Assigner node config example:

```json
{
  "id": "seed-state",
  "type": "assigner",
  "config": {
    "assignments": {
      "score": 7,
      "user_name": "Alice"
    }
  }
}
```

Condition node config example:

```json
{
  "id": "check-score",
  "type": "if-else",
  "config": {
    "expression": "state.score >= 5"
  }
}
```

Conditional edge example:

```json
[
  {"source": "check-score", "target": "high-path", "condition": "true"},
  {"source": "check-score", "target": "low-path", "condition": "false"}
]
```

Template node config example:

```json
{
  "id": "high-path",
  "type": "template-transform",
  "config": {
    "template": "{{state.user_name}} passed with score {{state.score}}"
  }
}
```

Retriever node config example:

```json
{
  "id": "retrieve-docs",
  "type": "knowledge-retrieval",
  "config": {
    "query_template": "{{state.query}}",
    "top_k": 2,
    "documents": [
      {
        "id": "refund",
        "title": "Refund Policy",
        "text": "Refund policy allows returns within seven days."
      },
      {
        "id": "shipping",
        "title": "Shipping Policy",
        "text": "Shipping usually takes three to five business days."
      }
    ]
  }
}
```

Retriever output is persisted to `node_run.output_json.retrieved_documents`. Template nodes can consume it:

```json
{
  "id": "render-answer",
  "type": "template-transform",
  "config": {
    "template": "Top doc: {{nodes.retrieve-docs.output.retrieved_documents.0.title}}"
  }
}
```

Retriever nodes can also load documents from an HTTP JSON source:

```json
{
  "id": "retrieve-remote-docs",
  "type": "retriever",
  "config": {
    "query": "refund policy",
    "source_url": "http://127.0.0.1:9000/documents",
    "top_k": 3
  }
}
```

The HTTP source should return either a document list or an object with a `documents` list.
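To make the two accepted response shapes concrete, here is a minimal sketch of how a client might normalize them and then rank documents by keyword. Both `normalize_documents` and `keyword_retrieve` are hypothetical helpers, and the word-overlap scoring is an assumption: the README says only that retrieval is keyword-based, not how documents are scored.

```python
import re


def normalize_documents(payload):
    """Accept either a bare document list or an object with a `documents` list."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict) and isinstance(payload.get("documents"), list):
        return payload["documents"]
    raise ValueError("expected a document list or an object with a 'documents' list")


def keyword_retrieve(query, documents, top_k=3):
    """Rank documents by the number of distinct query words they contain.

    Assumed scoring for illustration; the real retriever may rank differently.
    """
    query_words = set(re.findall(r"\w+", query.lower()))
    scored = []
    for doc in documents:
        text = f"{doc.get('title', '')} {doc.get('text', '')}".lower()
        score = len(query_words & set(re.findall(r"\w+", text)))
        if score > 0:
            scored.append((score, doc))
    # sort() is stable, so documents with equal scores keep their input order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


if __name__ == "__main__":
    response = {
        "documents": [
            {"id": "shipping", "title": "Shipping Policy",
             "text": "Shipping usually takes three to five business days."},
            {"id": "refund", "title": "Refund Policy",
             "text": "Refund policy allows returns within seven days."},
        ]
    }
    docs = normalize_documents(response)
    print([d["id"] for d in keyword_retrieve("refund policy", docs, top_k=2)])
    # prints ['refund', 'shipping']
```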
Run the no-key runtime smoke test after local services are running:

```powershell
.\.venv\Scripts\python scripts\smoke_runtime_no_key.py
```

Run the same smoke test through `api-gateway`:

```powershell
$env:AGENT_PLATFORM_SMOKE_WORKFLOW_URL="http://127.0.0.1:8000/gateway/workflows"
$env:AGENT_PLATFORM_SMOKE_RUNTIME_URL="http://127.0.0.1:8000/gateway/runtime"
.\.venv\Scripts\python scripts\smoke_runtime_no_key.py
```

## API Gateway

`api-gateway` provides a unified entrypoint:

- `GET /gateway/services/health`
- `/gateway/workflows/**` -> `workflow-service /workflows/**`
- `/gateway/sessions/**` -> `session-service /sessions/**`
- `/gateway/runtime/**` -> `runtime-service /runtime/**`
- `/gateway/tools/**` -> `tool-service /tools/**`
- `/gateway/models/**` -> `model-gateway-service /models/**`
- `/gateway/code/**` -> `code-runner-service /code/**`

Gateway readiness:

```powershell
Invoke-RestMethod -Uri "http://127.0.0.1:8000/ready"
```

Downstream health:

```powershell
Invoke-RestMethod -Uri "http://127.0.0.1:8000/gateway/services/health"
```

Gateway request context:

- Incoming `x-request-id` is reused; otherwise the gateway generates one.
- Incoming `x-tenant-id` is reused; otherwise the gateway falls back to the `tenant_id` query parameter, then to `public`.
- The gateway forwards both `x-request-id` and `x-tenant-id` to downstream services.
- The gateway writes request audit records to `gateway_request_audit`.

Query gateway audits:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8000/gateway/audits?tenant_id=t1&limit=20" `
  -Headers @{"x-tenant-id"="t1"}
```

Gateway API Key auth:

- `AGENT_PLATFORM_AUTH_REQUIRED=false` by default for local development.
- Set `AGENT_PLATFORM_AUTH_REQUIRED=true` to protect `/gateway/**`, except `/gateway/services/health`.
- API keys are stored as SHA-256 hashes. The raw key is only returned once at creation.
- When auth is enabled and no API key exists yet, the first `POST /gateway/api-keys` is allowed as bootstrap.
- API keys can be `active`, `disabled`, or `revoked`; only `active` keys are accepted.

Create an API key:

```powershell
$body = @{
  tenant_id = "t1"
  name = "local-dev"
} | ConvertTo-Json

$created = Invoke-RestMethod `
  -Method Post `
  -Uri "http://127.0.0.1:8000/gateway/api-keys" `
  -ContentType "application/json" `
  -Body $body

$created.api_key
```

Use an API key:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8000/gateway/audits?tenant_id=t1" `
  -Headers @{"x-tenant-id"="t1"; "x-api-key"=$created.api_key}
```

Disable or revoke an API key:

```powershell
$body = @{
  tenant_id = "t1"
  status = "revoked"
} | ConvertTo-Json

Invoke-RestMethod `
  -Method Patch `
  -Uri "http://127.0.0.1:8000/gateway/api-keys/$($created.id)/status" `
  -ContentType "application/json" `
  -Headers @{"x-tenant-id"="t1"; "x-api-key"=$created.api_key} `
  -Body $body
```

Run smoke test through an authenticated gateway:

```powershell
$env:AGENT_PLATFORM_SMOKE_WORKFLOW_URL="http://127.0.0.1:8000/gateway/workflows"
$env:AGENT_PLATFORM_SMOKE_RUNTIME_URL="http://127.0.0.1:8000/gateway/runtime"
$env:AGENT_PLATFORM_SMOKE_TENANT_ID="t1"
$env:AGENT_PLATFORM_SMOKE_API_KEY=$created.api_key
.\.venv\Scripts\python scripts\smoke_runtime_no_key.py
```

HTTP tool node config example:

```json
{
  "id": "search-products",
  "type": "tool",
  "config": {
    "tool_binding_id": "binding-1",
    "query": {
      "keyword": "milk"
    }
  }
}
```

Supported HTTP tool config resolution order:

- URL: `config.url` or `invoke_config_json.url`
- Base URL: `config.base_url` or `binding.config_json.base_url` or `invoke_config_json.base_url`
- Path: `config.path` or `invoke_config_json.path`
- Method: `invoke_config_json.method`, default `GET`
- Query params: merge `invoke_config_json.query` + `config.query`
- Body JSON: merge `invoke_config_json.body` + `config.body`
- Headers: merge `invoke_config_json.headers` + `binding.config_json.headers` + `config.headers`

LLM node config example:

```json
{
  "id": "draft-answer",
  "type": "llm",
  "config": {
    "model": "gpt-4o-mini",
    "system_prompt": "You are a customer support assistant.",
    "prompt": "Summarize the user intent in Chinese.",
    "temperature": 0.2,
    "max_tokens": 400
  }
}
```

`llm` nodes also support explicit `messages`:

```json
{
  "id": "rewrite-message",
  "type": "llm",
  "config": {
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a concise editor."},
      {"role": "user", "content": "Rewrite this sentence in a warmer tone."}
    ]
  }
}
```

`runtime-service` sends `llm` execution requests to `model-gateway-service`, and the gateway forwards them to an OpenAI-compatible `/chat/completions` provider.

Recommended environment variables for `model-gateway-service`:

```powershell
$env:AGENT_PLATFORM_PROVIDER_BASE_URL="https://api.openai.com/v1"
$env:AGENT_PLATFORM_PROVIDER_API_KEY="your-api-key"
$env:AGENT_PLATFORM_DEFAULT_MODEL="gpt-4o-mini"
```

Code node config example:

```json
{
  "id": "compute-summary",
  "type": "code",
  "config": {
    "language": "python",
    "timeout_seconds": 5,
    "input_json": {
      "numbers": [1, 2, 3, 4]
    },
    "code": "total = sum(payload['numbers'])\nresult = {'total': total, 'count': len(payload['numbers'])}\nprint(f'total={total}')"
  }
}
```

`runtime-service` sends `code` execution requests to `code-runner-service`.
Current `python` execution contract:

- input payload is available as `payload`
- execution result should be assigned to `result`
- `print(...)` output is captured into `node_run.output_text`
- structured `result` is captured into `node_run.output_json.result_json`

Recommended environment variables for `code-runner-service`:

```powershell
$env:AGENT_PLATFORM_PYTHON_BIN="python"
$env:AGENT_PLATFORM_MAX_TIMEOUT_SECONDS="30"
```

## Docker Compose

Files:

- `deployments/docker/docker-compose.yml`
- `deployments/docker/python-service.Dockerfile`
- `deployments/docker/.env.example`

Start all services locally:

```powershell
cd D:\workspace\auto-platform
Copy-Item .\deployments\docker\.env.example .\.env
docker compose -f .\deployments\docker\docker-compose.yml up --build
```

Start in detached mode:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml up --build -d
```

Scale runtime workers:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml up --build -d --scale runtime-worker=3
```

Stop and remove containers:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml down
```

Important notes:

- `workflow-service`, `session-service`, `runtime-service`, `tool-service`, and `api-gateway` use SQLite files mounted under `/data`
- `runtime-worker` has no exposed port and can be scaled independently; prefer PostgreSQL for real multi-worker write concurrency
- `runtime-service` automatically resolves internal URLs to `workflow-service`, `tool-service`, `model-gateway-service`, and `code-runner-service`
- `model-gateway-service` defaults to `http://host.docker.internal:11434/v1`; replace it in `.env` if you want OpenAI or another OpenAI-compatible provider