# agent-platform

A Python-based scaffold for a multi-service agent development platform.

The repository is initialized as a monorepo containing:

- `services/`: core microservices
- `libs/`: shared domain models, DSL, events, database, and common components
- `deployments/`: placeholders for local and cluster deployment
- `docs/`: planning and database design documents

## Services created so far

- `api-gateway`
- `model-gateway-service`
- `session-service`
- `workflow-service`
- `runtime-service`
- `agent-service`
- `memory-service`
- `team-service`
- `skill-service`
- `human-service`
- `knowledge-service`
- `event-service`
- `auth-service`
- `scheduler-service`
- `tool-service`

Each service provides a minimal `FastAPI` entry point and a health check endpoint. Database-backed services also include `SQLAlchemy` model skeletons and an Alembic directory.

## Shared libraries created so far

- `core-domain`
- `core-dsl`
- `core-events`
- `core-shared`
- `core-db`

## Recommended local development setup

Create a virtual environment with `uv` or `pip`, then install each service's dependencies.

```powershell
cd D:\workspace\auto-platform
python -m venv .venv
.venv\Scripts\activate
pip install -e .\libs\core-shared
pip install -e .\libs\core-domain
pip install -e .\libs\core-dsl
pip install -e .\libs\core-events
pip install -e .\libs\core-db
pip install -e .\services\api-gateway
pip install -e .\services\session-service
pip install -e .\services\workflow-service
pip install -e .\services\runtime-service
pip install -e .\services\agent-service
pip install -e .\services\memory-service
pip install -e .\services\team-service
pip install -e .\services\skill-service
pip install -e .\services\human-service
pip install -e .\services\knowledge-service
pip install -e .\services\event-service
pip install -e .\services\auth-service
pip install -e .\services\scheduler-service
pip install -e .\services\tool-service
```

Run example:

```powershell
cd D:\workspace\auto-platform\services\api-gateway
uvicorn app.main:app --reload --port 8000
```

By default, each service connects to a SQLite file in its own directory. You can override this with an environment variable:

```powershell
$env:AGENT_PLATFORM_DATABASE_URL="postgresql+psycopg://user:password@localhost:5432/workflow_db"
```

## Data layer scaffold

This iteration adds:

- `libs/core-db`: unified `SQLAlchemy` Base, common mixins, naming conventions
- `workflow-service`: application and workflow definition models
- `session-service`: session and message models
- `runtime-service`: run and node execution models
- `tool-service`: tool definition and binding models
- A separate `alembic.ini`, `env.py`, and `versions/` for each service
- `workflow-service`: wired up with repository / application service / CRUD API
- `session-service`: wired up with repository / application service / CRUD API

Migration example:

```powershell
cd D:\workspace\auto-platform\services\workflow-service
alembic upgrade head
```

The same applies to the other services:

- `services/session-service`
- `services/runtime-service`
- `services/tool-service`

API examples:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8002/workflows/apps `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"sales_assistant","name":"Sales Assistant"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8001/sessions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","app_id":"app-1","user_id":"user-1","channel_type":"web"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8002/workflows/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","workflow_id":"wf-1","dsl_json":{"nodes":[],"edges":[]}}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8001/sessions/run-requests `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","session_id":"sess-1","app_version_id":"appv-1","workflow_version_id":"wfv-1"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/runs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","app_id":"app-1","app_version_id":"appv-1","workflow_id":"wf-1","workflow_version_id":"wfv-1","session_id":"sess-1","initial_node":{"node_id":"start","node_type":"llm"}}'
```

If `initial_node` is omitted, `runtime-service` calls `workflow-service` to read the corresponding `workflow version` and derives the first node from the DSL automatically:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/runs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","app_id":"app-1","app_version_id":"appv-1","workflow_id":"wf-1","workflow_version_id":"wfv-1","session_id":"sess-1"}'
```

Dispatch straight to runtime in a single call:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8001/sessions/run-requests/dispatch `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","session_id":"sess-1","app_id":"app-1","app_version_id":"appv-1","workflow_id":"wf-1","workflow_version_id":"wfv-1","initial_node":{"node_id":"start","node_type":"llm"}}'
```

Tool definition examples:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8004/tools `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"search_products","name":"Search Products","tool_type":"http"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8004/tools/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","tool_id":"tool-1","input_schema_json":{"query":{"type":"string"}},"invoke_config_json":{"method":"GET","path":"/products/search"}}'
```

Run status update examples:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/node-runs/node-run-id/status `
  -ContentType "application/json" `
  -Body '{"status":"running","worker_key":"runtime-worker-1"}'
```

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/runs/run-id/status `
  -ContentType "application/json" `
  -Body '{"status":"completed"}'
```

Notes:

- When you call `node-runs/{node_run_id}/status` to update a node's status, `runtime-service` aggregates the statuses of all `node_run` records in the current run and refreshes `workflow_run.status` accordingly.
- The current rule: if any node is `failed`, the run is `failed`; if any node is `running`, the run is `running`; if all nodes are `completed` or `skipped`, the run is `completed`.
- When a `node_run` is updated to `completed`, `runtime-service` also uses the `workflow version` DSL to find successor nodes and creates new `node_run` records in `queued` status.

## Directory structure

```text
services/
  api-gateway/
  session-service/
  workflow-service/
  runtime-service/
  skill-service/
  human-service/
  knowledge-service/
  event-service/
  auth-service/
  scheduler-service/
  tool-service/
libs/
  core-domain/
  core-dsl/
  core-events/
  core-shared/
  core-db/
deployments/
  docker/
  k8s/
docs/
tests/
```

## Suggested next steps

1. Complete the repository / service layers for `V0.1`
2. Write the first Alembic initial migration
3. Integrate PostgreSQL / Redis
4. Add Docker Compose
5. Start implementing the three main flows: applications, workflows, and runs

## Runtime Execute APIs

## Agent Service APIs

`agent-service` stores strongly typed agent definitions, versioned prompts/configuration, and agent run records.

Create an agent:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8007/agents `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"sales_agent","name":"Sales Agent","agent_type":"assistant"}'
```

Create a published agent version:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8007/agents/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","agent_id":"agent-id","status":"published","role":"sales_assistant","goal":"Help qualify leads","system_prompt":"You are a careful sales assistant."}'
```

Enable multi-step ReAct planning for an agent version:

```json
{
  "model_config": {
    "react_enabled": true,
    "react_max_steps": 5
  }
}
```

When ReAct is enabled, the model can emit JSON tool actions such as `{"action":"tool","tool_code":"lookup_order","input_json":{"order_id":"123"}}` and then finish with `{"action":"finish","answer":"..."}`. Each tool call is persisted in `agent_tool_invocation`.

Create an agent run. If `agent_version_id` is omitted, the latest published version is used:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8007/agents/runs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","agent_id":"agent-id","session_id":"session-id","input_text":"Summarize this lead."}'
```

List tool invocation records for an agent run:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8007/agents/runs/agent-run-id/tool-invocations?tenant_id=t1"
```

Agent execution now persists tool invocation audit records with selected, running, skipped, completed, or failed status, including input/output payloads and `started_time` / `finished_time`.

Through `api-gateway`, use `/gateway/agents/**`.

## Memory Service APIs

`memory-service` stores scoped memories for tenants, users, sessions, agents, and teams.
The first version uses database text search so it works without vector infrastructure; pgvector can be added later behind the same API.

Memory search now stores a local deterministic embedding per memory and uses hybrid rerank:

- `keyword_score`: token overlap and frequency
- `vector_score`: cosine similarity over local hash embeddings
- `importance_score`: normalized memory importance boost
- `rerank_mode`: `hybrid-local`

Create a memory:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8008/memories `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","scope_type":"session","scope_id":"session-id","memory_type":"fact","content_text":"User prefers concise answers.","importance_score":80}'
```

Search memories:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8008/memories/search `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","query":"concise","scope_type":"session","scope_id":"session-id","limit":5}'
```

Through `api-gateway`, use `/gateway/memories/**`.

## Team Service APIs

`team-service` stores multi-agent team definitions, versioned member composition, coordination mode, and team run records. The first version provides the team management backbone; later versions can connect team runs to supervisor/planner/member agent execution.

Create a team:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8009/teams `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"research_team","name":"Research Team","team_type":"collaborative"}'
```

Create a published team version:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8009/teams/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","team_id":"team-id","status":"published","coordination_mode":"supervisor","objective":"Research and summarize complex questions","member_refs":[{"member_key":"lead","agent_id":"agent-lead","role":"supervisor","responsibility":"Plan and assign work"},{"member_key":"writer","agent_id":"agent-writer","role":"executor","responsibility":"Draft final answer"}]}'
```

Create a team run. If `team_version_id` is omitted, the latest published version is used:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8009/teams/runs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","team_id":"team-id","session_id":"session-id","input_text":"Analyze this customer request."}'
```

Through `api-gateway`, use `/gateway/teams/**`.

Execute a team run. The first implementation creates and executes one agent run per member, then stores a team-level summary.
`dry_run=true` lets this work without model API keys:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8009/teams/runs/team-run-id/execute `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","worker_key":"team-worker-1","dry_run":true}'
```

Execute one queued team run through the worker claim API:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8009/teams/workers/execute-next `
  -ContentType "application/json" `
  -Body '{"worker_key":"team-worker-1","lease_seconds":300,"dry_run":true}'
```

Run a standalone team worker process:

```powershell
Push-Location .\services\team-service
$env:AGENT_PLATFORM_DATABASE_URL="sqlite:///./team_service.db"
$env:AGENT_PLATFORM_WORKER_DRY_RUN="true"
..\..\.venv\Scripts\python -m app.worker
Pop-Location
```

## Skill Service APIs

`skill-service` stores reusable skill definitions, versioned parameter/output schemas, marketplace-style installations, and executable skill runs. The first executor supports a dependency-free `template` runtime so local development works without API keys.

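
As a rough illustration of what a dependency-free `template` runtime might do, the sketch below renders `implementation_json.template` with Python's standard `string.Template`. The `$name` placeholder style matches the skill payloads in this section, but the helper name `render_template_skill` and the choice of `safe_substitute` are assumptions for illustration, not the service's actual executor.

```python
from string import Template


def render_template_skill(implementation_json: dict, input_json: dict) -> str:
    """Render a template skill (hypothetical helper, not the real executor)."""
    template = Template(implementation_json["template"])
    # safe_substitute leaves unknown placeholders in place instead of raising.
    values = {key: str(value) for key, value in input_json.items()}
    return template.safe_substitute(values)


output = render_template_skill({"template": "Hello $name"}, {"name": "Lucas"})
print(output)  # Hello Lucas
```

Using `safe_substitute` keeps a run from failing on a missing parameter; a stricter executor could call `substitute` and surface the `KeyError` as a validation error instead.
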
Create a skill:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8010/skills `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"hello_user","name":"Hello User","skill_type":"template"}'
```

Create a published skill version:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8010/skills/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","skill_id":"skill-id","status":"published","runtime_type":"template","parameter_schema_json":{"name":{"type":"string"}},"implementation_json":{"template":"Hello $name"}}'
```

Install the skill for a tenant, agent, team, app, or user scope:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8010/skills/installations `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","skill_id":"skill-id","install_scope":"tenant","scope_id":"t1","installed_by":"user-1"}'
```

Create and execute a skill run:

```powershell
$run = Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8010/skills/runs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","skill_id":"skill-id","input_json":{"name":"Lucas"}}'

Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8010/skills/runs/$($run.id)/execute" `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","worker_key":"skill-worker-1"}'
```

Through `api-gateway`, use `/gateway/skills/**`.

## Human Service APIs

`human-service` stores human-in-the-loop tasks for approval, input collection, takeover, pause, and resume flows.

Create an approval task:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8011/human/tasks `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","task_type":"approval","title":"Approve refund","run_id":"run-id","node_run_id":"node-run-id","assigned_to":"ops-1","request_payload_json":{"amount":99}}'
```

Claim and complete a task:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8011/human/tasks/human-task-id/claim `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","claimed_by":"ops-1"}'

Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8011/human/tasks/human-task-id/complete `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","status":"approved","response_payload_json":{"approved":true}}'
```

Through `api-gateway`, use `/gateway/human/**`.

Runtime human-in-the-loop nodes now create `human-service` tasks and pause the node in `pending` status until the task is completed. Supported node types:

- `human`
- `approval`
- `human-input`
- `human-takeover`

After completing the human task, resume the blocked node:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/node-runs/node-run-id/resume-human `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","human_task_id":"human-task-id","worker_key":"runtime-worker-1"}'
```

## Knowledge Service APIs

`knowledge-service` stores independent knowledge bases, documents, chunks, and retrieval results. It defaults to deterministic local hash embeddings plus keyword scoring, so it works without external API keys. For production, set `AGENT_PLATFORM_EMBEDDING_PROVIDER=http` with an OpenAI-compatible `/embeddings` endpoint; if the provider fails and fallback is enabled, indexing and search fall back to local hash embeddings.

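
To make "deterministic local hash embeddings" concrete, here is a minimal sketch of one common way to build them: feature hashing of tokens into a fixed-size vector, compared with cosine similarity. The dimension of 64, the whitespace tokenizer, and the use of SHA-256 are assumptions for illustration; the embedding that `knowledge-service` actually computes may differ.

```python
import hashlib
import math


def hash_embedding(text: str, dim: int = 64) -> list[float]:
    """Map text to a unit-length vector by hashing tokens into buckets (illustrative)."""
    vector = [0.0] * dim
    for token in text.lower().split():
        # hashlib is stable across processes, unlike Python's built-in hash().
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        vector[index] += 1.0
    norm = math.sqrt(sum(value * value for value in vector))
    return [value / norm for value in vector] if norm else vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; inputs are already unit-length, so this is a dot product."""
    return sum(x * y for x, y in zip(a, b))


query = hash_embedding("refund policy")
document = hash_embedding("Refund policy allows returns within seven days.")
print(round(cosine(query, document), 3))
```

Because the hash is deterministic, the same text always yields the same vector, which is what lets indexing and search work offline without an embedding provider.
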
Create a knowledge base:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8012/knowledge/bases `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"support_kb","name":"Support Knowledge Base"}'
```

Create and index a document:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8012/knowledge/documents `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","knowledge_base_id":"kb-id","title":"Refund Policy","content_text":"Refunds are available within seven days for eligible orders.","source_type":"text"}'
```

Search the knowledge base:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8012/knowledge/search `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","knowledge_base_id":"kb-id","query":"refund within seven days","top_k":3}'
```

Through `api-gateway`, use `/gateway/knowledge/**`.

## Event Service APIs

`event-service` stores platform events with delivery status so services can use a durable outbox pattern now, and later swap delivery to Kafka/RabbitMQ behind the same API.

Publish an event:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8013/events `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","event_type":"run.created","source_service":"runtime-service","aggregate_type":"workflow_run","aggregate_id":"run-id","payload_json":{"run_id":"run-id"}}'
```

Claim pending events for a delivery worker:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8013/events/claim-pending `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","limit":50}'
```

Through `api-gateway`, use `/gateway/events/**`.

## Auth Service APIs

`auth-service` stores users, roles, role assignments, and permission checks. This is the first RBAC layer for tenant governance.

```powershell
$user = Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8014/auth/users `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","username":"alice","display_name":"Alice"}'

$role = Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8014/auth/roles `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","code":"admin","name":"Admin","permissions_json":["*"]}'

Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8014/auth/assignments `
  -ContentType "application/json" `
  -Body "{`"tenant_id`":`"t1`",`"user_id`":`"$($user.id)`",`"role_id`":`"$($role.id)`"}"

Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8014/auth/permissions/check `
  -ContentType "application/json" `
  -Body "{`"tenant_id`":`"t1`",`"user_id`":`"$($user.id)`",`"permission`":`"workflow:write`"}"
```

Through `api-gateway`, use `/gateway/auth/**`.

## Scheduler Service APIs

`scheduler-service` stores delayed jobs and due-job leases for time-based automation. It is intentionally service-neutral: jobs can target HTTP, event, runtime, agent, or team execution.

Create a scheduled job:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8015/scheduler/jobs `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","job_type":"runtime","name":"Run workflow later","schedule_time":"2026-04-26T12:00:00Z","payload_json":{"workflow_run_id":"run-id"}}'
```

Claim due jobs for a worker:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8015/scheduler/jobs/claim-due `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","worker_key":"scheduler-worker-1","limit":20}'
```

Mark a job completed or failed:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8015/scheduler/jobs/job-id/status `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","status":"completed"}'
```

Through `api-gateway`, use `/gateway/scheduler/**`.

Run the scheduler worker locally:

```powershell
Push-Location .\services\scheduler-service
$env:AGENT_PLATFORM_DATABASE_URL="sqlite:///./scheduler_service.db"
$env:AGENT_PLATFORM_EVENT_SERVICE_URL="http://127.0.0.1:8013"
python -m app.worker
Pop-Location
```

Execute an agent run without calling an external model:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8007/agents/runs/agent-run-id/execute `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","worker_key":"agent-worker-1","dry_run":true}'
```

Execute with `model-gateway-service`:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8007/agents/runs/agent-run-id/execute `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","worker_key":"agent-worker-1"}'
```

Agent memory policy is stored on `agent_version.memory_policy_json`:

- `enabled`: read memories before execution
- `memory_scope`: one of `tenant`, `user`, `session`, `agent`, or `team`
- `read_top_k`: maximum memories to inject into the prompt
- `write_enabled`: write a conversation memory after successful model execution
- `config_json.write_importance_score`: optional importance score for written memories

Agent capability refs are stored on `agent_version.tool_refs_json` and `agent_version.skill_refs_json`.

- Tool refs are selected when `required=true`, `config_json.auto_invoke=true`, or `selection_keywords` match the run input.
- Skill refs are selected by default unless `config_json.auto_invoke=false`; `selection_keywords` can also select them.
- Dry-run execution returns `selected_tool_refs` and `selected_skill_refs` without calling downstream tools/skills.
- Normal execution invokes selected HTTP tool bindings and selected skills before the model call, then injects their results into the model messages.

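
The selection rules above can be sketched as two small filters. This is an illustrative reading of those rules, not the service's code: the function names, the assumption that `selection_keywords` sits at the top level of each ref, the case-insensitive substring match, and the `skill_code` field are all hypothetical.

```python
def keywords_match(ref: dict, input_text: str) -> bool:
    """True if any selection keyword occurs in the run input (assumed substring match)."""
    text = input_text.lower()
    return any(str(keyword).lower() in text for keyword in ref.get("selection_keywords") or [])


def select_tool_refs(tool_refs: list[dict], input_text: str) -> list[dict]:
    """Tools are opt-in: required, auto_invoke=true, or a keyword match."""
    selected = []
    for ref in tool_refs:
        config = ref.get("config_json") or {}
        if ref.get("required") is True or config.get("auto_invoke") is True or keywords_match(ref, input_text):
            selected.append(ref)
    return selected


def select_skill_refs(skill_refs: list[dict], input_text: str) -> list[dict]:
    """Skills are opt-out: selected unless auto_invoke=false, and keywords can re-select them."""
    selected = []
    for ref in skill_refs:
        config = ref.get("config_json") or {}
        if config.get("auto_invoke") is not False or keywords_match(ref, input_text):
            selected.append(ref)
    return selected


tools = [
    {"tool_code": "lookup_order", "required": True},
    {"tool_code": "search_products", "selection_keywords": ["product"]},
    {"tool_code": "refund_order", "selection_keywords": ["refund"]},
]
picked = [ref["tool_code"] for ref in select_tool_refs(tools, "Find a product for me")]
print(picked)  # ['lookup_order', 'search_products']
```

The asymmetry is the point of the sketch: a tool ref with no flags and no matching keyword is skipped, while a skill ref with no flags is selected.
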
Example version with session memory:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8007/agents/versions `
  -ContentType "application/json" `
  -Body '{"tenant_id":"t1","agent_id":"agent-id","status":"published","role":"assistant","system_prompt":"Use relevant memory when helpful.","memory_policy":{"enabled":true,"memory_scope":"session","read_top_k":5,"write_enabled":true,"config_json":{"write_importance_score":60}}}'
```

Execute one queued agent run through the worker claim API:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8007/agents/workers/execute-next `
  -ContentType "application/json" `
  -Body '{"worker_key":"agent-worker-1","lease_seconds":300,"dry_run":true}'
```

Run a standalone agent worker process:

```powershell
Push-Location .\services\agent-service
$env:AGENT_PLATFORM_DATABASE_URL="sqlite:///./agent_service.db"
$env:AGENT_PLATFORM_WORKER_DRY_RUN="true"
..\..\.venv\Scripts\python -m app.worker
Pop-Location
```

`runtime-service` now includes a typed executor skeleton for these node types:

- `llm`
- `tool`
- `code`
- `human`
- `approval`
- `human-input`
- `human-takeover`
- `answer`
- `if-else`
- `assigner`
- `knowledge-retrieval`
- `template-transform`

Execute a specific queued node:

```powershell
Invoke-RestMethod -Method Post `
  -Uri http://127.0.0.1:8003/runtime/node-runs/node-run-id/execute `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1"}'
```

Execute the next queued node in a run:

```powershell
Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8003/runtime/runs/run-id/execute-next?tenant_id=t1" `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1"}'
```

Execute queued nodes in sequence until the run is finished, blocked, or reaches `max_steps`:

```powershell
Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8003/runtime/runs/run-id/execute?tenant_id=t1" `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1","max_steps":16}'
```

Execute one queued node through the worker claim API:

```powershell
Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8003/runtime/workers/execute-next" `
  -ContentType "application/json" `
  -Body '{"worker_key":"runtime-worker-1","lease_seconds":300}'
```

Run a standalone runtime worker process:

```powershell
Push-Location .\services\runtime-service
$env:AGENT_PLATFORM_DATABASE_URL="sqlite:///./runtime_service.db"
..\..\.venv\Scripts\python -m app.worker
Pop-Location
```

The worker uses `node_run.status` plus `lease_expire_time` as a DB-backed queue. This keeps the first scalable version dependency-light; for heavier production concurrency, move `AGENT_PLATFORM_DATABASE_URL` to PostgreSQL before scaling many workers.

Node execution results are now persisted on `node_run`:

- `output_text`
- `output_json`

Node execution artifacts are also persisted on `node_artifact`:

- `artifact_type`
- `content_text`
- `content_json`
- `storage_uri`
- `size_bytes`

Query artifacts:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8003/runtime/node-artifacts?tenant_id=t1&run_id=run-id"
```

Trace spans are persisted on `trace_span` for timeline and latency analysis:

- `span_type`
- `name`
- `status`
- `started_time`
- `ended_time`
- `duration_ms`
- `attributes_json`
- `error_code`
- `error_message`

Query trace spans:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8003/runtime/trace-spans?tenant_id=t1&run_id=run-id"
```

Current behavior:

- `answer` nodes persist rendered text to `output_text`
- `assigner` nodes write `state_updates` to `output_json`
- `condition` / `if-else` nodes write `condition_result` and `route` to `output_json`
- `template-transform` nodes render text or JSON using previous node outputs and run state
- `knowledge-retrieval` / `retriever` nodes run keyword retrieval over inline or HTTP JSON documents
- `tool` nodes persist resolved binding/tool metadata to `output_json`
- default executors persist basic executor metadata to `output_json`
- parallel fan-out is supported by defining multiple outgoing edges from one node
- join nodes wait for predecessor completion with `config.join_policy`
- loop/re-entry is supported with `config.allow_loop=true` and `config.max_iterations`
- retry is supported with `config.retry_policy.max_attempts` and `retry_delay_seconds`
- delayed scheduling and node timeout use `config.delay_seconds` and `config.timeout_seconds`
- compensation nodes can be queued on failure with `config.compensation_node_id`

Runtime template context:

- `state.xxx`: values written by previous `assigner` nodes
- `nodes.node_id.output.xxx`: structured output from a previous node
- `nodes.node_id.text`: text output from a previous node
- `current.node_id`: current node id

Assigner node config example:

```json
{
  "id": "seed-state",
  "type": "assigner",
  "config": {
    "assignments": {
      "score": 7,
      "user_name": "Alice"
    }
  }
}
```

Condition node config example:

```json
{
  "id": "check-score",
  "type": "if-else",
  "config": {
    "expression": "state.score >= 5"
  }
}
```

Conditional edge example:

```json
[
  {"source": "check-score", "target": "high-path", "condition": "true"},
  {"source": "check-score", "target": "low-path", "condition": "false"}
]
```

Join node config example:

```json
{
  "id": "join-results",
  "type": "join",
  "config": {
    "join_policy": "all_completed"
  }
}
```

Loop and retry config example:

```json
{
  "id": "poll-status",
  "type": "tool",
  "config": {
    "allow_loop": true,
    "max_iterations": 5,
    "timeout_seconds": 30,
    "retry_policy": {
      "max_attempts": 3,
      "retry_delay_seconds": 2
    }
  }
}
```

Compensation config example:

```json
{
  "id": "charge-card",
  "type": "tool",
  "config": {
    "compensation_node_id": "refund-card"
  }
}
```

Template node config example:

```json
{
  "id": "high-path",
  "type": "template-transform",
  "config": {
    "template": "{{state.user_name}} passed with score {{state.score}}"
  }
}
```

Retriever node config example:

```json
{
  "id": "retrieve-docs",
  "type": "knowledge-retrieval",
  "config": {
    "query_template": "{{state.query}}",
    "top_k": 2,
    "documents": [
      {
        "id": "refund",
        "title": "Refund Policy",
        "text": "Refund policy allows returns within seven days."
      },
      {
        "id": "shipping",
        "title": "Shipping Policy",
        "text": "Shipping usually takes three to five business days."
      }
    ]
  }
}
```

Retriever nodes can call `knowledge-service` directly:

```json
{
  "id": "retrieve-kb",
  "type": "knowledge-retrieval",
  "config": {
    "knowledge_base_id": "kb-id",
    "query_template": "{{state.query}}",
    "top_k": 3,
    "filters_json": {
      "source_type": "text"
    }
  }
}
```

Retriever output is persisted to `node_run.output_json.retrieved_documents`. Template nodes can consume it:

```json
{
  "id": "render-answer",
  "type": "template-transform",
  "config": {
    "template": "Top doc: {{nodes.retrieve-docs.output.retrieved_documents.0.title}}"
  }
}
```

Retriever nodes can also load documents from an HTTP JSON source:

```json
{
  "id": "retrieve-remote-docs",
  "type": "retriever",
  "config": {
    "query": "refund policy",
    "source_url": "http://127.0.0.1:9000/documents",
    "top_k": 3
  }
}
```

The HTTP source should return either a document list or an object with a `documents` list.

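
A consumer of such a source has to accept both response shapes. The sketch below shows one way to normalize them; the function name `normalize_documents` and the decision to drop non-object entries are assumptions for illustration, not the retriever's actual behavior.

```python
def normalize_documents(payload: object) -> list[dict]:
    """Accept either a bare list or {"documents": [...]} and return a list of dicts."""
    if isinstance(payload, list):
        documents = payload
    elif isinstance(payload, dict) and isinstance(payload.get("documents"), list):
        documents = payload["documents"]
    else:
        raise ValueError("expected a document list or an object with a 'documents' list")
    # Skip entries that are not JSON objects rather than failing the whole node.
    return [document for document in documents if isinstance(document, dict)]


as_list = normalize_documents([{"id": "refund", "title": "Refund Policy"}])
as_object = normalize_documents({"documents": [{"id": "refund", "title": "Refund Policy"}]})
print(as_list == as_object)  # True
```
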
Run the no-key runtime smoke test after local services are running:

```powershell
.\.venv\Scripts\python scripts\smoke_runtime_no_key.py
```

Run the same smoke test through `api-gateway`:

```powershell
$env:AGENT_PLATFORM_SMOKE_WORKFLOW_URL="http://127.0.0.1:8000/gateway/workflows"
$env:AGENT_PLATFORM_SMOKE_RUNTIME_URL="http://127.0.0.1:8000/gateway/runtime"
.\.venv\Scripts\python scripts\smoke_runtime_no_key.py
```

## API Gateway

`api-gateway` provides a unified entrypoint:

- `GET /gateway/services/health`
- `/gateway/workflows/**` -> `workflow-service /workflows/**`
- `/gateway/sessions/**` -> `session-service /sessions/**`
- `/gateway/runtime/**` -> `runtime-service /runtime/**`
- `/gateway/agents/**` -> `agent-service /agents/**`
- `/gateway/memories/**` -> `memory-service /memories/**`
- `/gateway/teams/**` -> `team-service /teams/**`
- `/gateway/skills/**` -> `skill-service /skills/**`
- `/gateway/human/**` -> `human-service /human/**`
- `/gateway/knowledge/**` -> `knowledge-service /knowledge/**`
- `/gateway/events/**` -> `event-service /events/**`
- `/gateway/auth/**` -> `auth-service /auth/**`
- `/gateway/scheduler/**` -> `scheduler-service /scheduler/**`
- `/gateway/tools/**` -> `tool-service /tools/**`
- `/gateway/models/**` -> `model-gateway-service /models/**`
- `/gateway/code/**` -> `code-runner-service /code/**`

Gateway readiness:

```powershell
Invoke-RestMethod -Uri "http://127.0.0.1:8000/ready"
```

Downstream health:

```powershell
Invoke-RestMethod -Uri "http://127.0.0.1:8000/gateway/services/health"
```

Gateway request context:

- Incoming `x-request-id` is reused; otherwise gateway generates one.
- Incoming `x-tenant-id` is reused; otherwise gateway falls back to `tenant_id` query parameter, then `public`.
- Gateway forwards both `x-request-id` and `x-tenant-id` to downstream services.
- Gateway writes request audit records to `gateway_request_audit`.

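
The request-context rules above can be sketched as a small resolver. This is a minimal illustration rather than the gateway's actual code: the function name `resolve_request_context`, the use of `uuid4` for generated IDs, and the plain-dict inputs are all assumptions.

```python
import uuid


def resolve_request_context(headers: dict, query_params: dict) -> dict:
    """Return the headers to forward downstream (hypothetical helper)."""
    # HTTP header names are case-insensitive, so compare in lower case.
    normalized = {name.lower(): value for name, value in headers.items()}
    request_id = normalized.get("x-request-id") or uuid.uuid4().hex
    tenant_id = normalized.get("x-tenant-id") or query_params.get("tenant_id") or "public"
    return {"x-request-id": request_id, "x-tenant-id": tenant_id}


context = resolve_request_context({"X-Request-Id": "req-1"}, {"tenant_id": "t1"})
print(context)  # {'x-request-id': 'req-1', 'x-tenant-id': 't1'}
```
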
Query gateway audits:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8000/gateway/audits?tenant_id=t1&limit=20" `
  -Headers @{"x-tenant-id"="t1"}
```

Query gateway audit stats:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8000/gateway/audits/stats?tenant_id=t1" `
  -Headers @{"x-tenant-id"="t1"}
```

Gateway API Key auth:

- `AGENT_PLATFORM_AUTH_REQUIRED=false` by default for local development.
- Set `AGENT_PLATFORM_AUTH_REQUIRED=true` to protect `/gateway/**`, except `/gateway/services/health`.
- API keys are stored as SHA-256 hashes. The raw key is only returned once at creation.
- When auth is enabled and no API key exists yet, the first `POST /gateway/api-keys` is allowed as bootstrap.
- API keys can be `active`, `disabled`, or `revoked`; only `active` keys are accepted.
- If an API key has `scopes`, gateway checks them before proxying. Use `*`, `gateway:agents:*`, or exact permissions such as `gateway:agents:read`.
- Set `AGENT_PLATFORM_AUTHZ_REQUIRED=true` to require `x-user-id` and call `auth-service` `/auth/permissions/check` for the derived permission.

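
A minimal sketch of the key and scope checks described above, under stated assumptions: the function names are hypothetical, the hash is a plain unsalted SHA-256 hex digest, and a trailing `:*` is treated as a prefix wildcard. The gateway's real matching rules may be stricter or looser.

```python
import hashlib
import hmac
from typing import Iterable


def hash_api_key(raw_key: str) -> str:
    """Hash a raw API key so only the digest needs to be stored."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def is_key_accepted(raw_key: str, stored_hash: str, status: str) -> bool:
    """Only `active` keys whose hash matches are accepted."""
    # compare_digest avoids leaking match position through timing.
    return status == "active" and hmac.compare_digest(hash_api_key(raw_key), stored_hash)


def scope_allows(scopes: Iterable[str], permission: str) -> bool:
    """Match `*`, a prefix wildcard such as `gateway:agents:*`, or an exact permission."""
    for scope in scopes:
        if scope == "*" or scope == permission:
            return True
        if scope.endswith(":*") and permission.startswith(scope[:-1]):
            return True
    return False


scopes = "gateway:agents:* gateway:runtime:read".split()
print(scope_allows(scopes, "gateway:agents:write"))   # True
print(scope_allows(scopes, "gateway:runtime:write"))  # False
```
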
Create an API key:

```powershell
$body = @{
  tenant_id = "t1"
  name = "local-dev"
  scopes = "gateway:agents:* gateway:runtime:read"
} | ConvertTo-Json

$created = Invoke-RestMethod `
  -Method Post `
  -Uri "http://127.0.0.1:8000/gateway/api-keys" `
  -ContentType "application/json" `
  -Body $body

$created.api_key
```

Use an API key:

```powershell
Invoke-RestMethod `
  -Uri "http://127.0.0.1:8000/gateway/audits?tenant_id=t1" `
  -Headers @{"x-tenant-id"="t1"; "x-api-key"=$created.api_key}
```

Disable or revoke an API key:

```powershell
$body = @{
  tenant_id = "t1"
  status = "revoked"
} | ConvertTo-Json

Invoke-RestMethod `
  -Method Patch `
  -Uri "http://127.0.0.1:8000/gateway/api-keys/$($created.id)/status" `
  -ContentType "application/json" `
  -Headers @{"x-tenant-id"="t1"; "x-api-key"=$created.api_key} `
  -Body $body
```

Run smoke test through an authenticated gateway:

```powershell
$env:AGENT_PLATFORM_SMOKE_WORKFLOW_URL="http://127.0.0.1:8000/gateway/workflows"
$env:AGENT_PLATFORM_SMOKE_RUNTIME_URL="http://127.0.0.1:8000/gateway/runtime"
$env:AGENT_PLATFORM_SMOKE_TENANT_ID="t1"
$env:AGENT_PLATFORM_SMOKE_API_KEY=$created.api_key
.\.venv\Scripts\python scripts\smoke_runtime_no_key.py
```

HTTP tool node config example:

```json
{
  "id": "search-products",
  "type": "tool",
  "config": {
    "tool_binding_id": "binding-1",
    "query": {
      "keyword": "milk"
    }
  }
}
```

Supported HTTP tool config resolution order:

- URL: `config.url` or `invoke_config_json.url`
- Base URL: `config.base_url` or `binding.config_json.base_url` or `invoke_config_json.base_url`
- Path: `config.path` or `invoke_config_json.path`
- Method: `invoke_config_json.method`, default `GET`
- Query params: merge `invoke_config_json.query` + `config.query`
- Body JSON: merge `invoke_config_json.body` + `config.body`
- Headers: merge `invoke_config_json.headers` + `binding.config_json.headers` + `config.headers`

LLM node config example:

```json
{
  "id": "draft-answer",
  "type": "llm",
  "config": {
    "model": "gpt-4o-mini",
    "system_prompt": "You are a customer support assistant.",
    "prompt": "Summarize the user intent in Chinese.",
    "temperature": 0.2,
    "max_tokens": 400
  }
}
```

`llm` nodes also support explicit `messages`:

```json
{
  "id": "rewrite-message",
  "type": "llm",
  "config": {
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a concise editor."},
      {"role": "user", "content": "Rewrite this sentence in a warmer tone."}
    ]
  }
}
```

`runtime-service` sends `llm` execution requests to `model-gateway-service`, and the gateway forwards them to an OpenAI-compatible `/chat/completions` provider.

Recommended environment variables for `model-gateway-service`:

```powershell
$env:AGENT_PLATFORM_PROVIDER_BASE_URL="https://api.openai.com/v1"
$env:AGENT_PLATFORM_PROVIDER_API_KEY="your-api-key"
$env:AGENT_PLATFORM_DEFAULT_MODEL="gpt-4o-mini"
```

Code node config example:

```json
{
  "id": "compute-summary",
  "type": "code",
  "config": {
    "language": "python",
    "timeout_seconds": 5,
    "input_json": {
      "numbers": [1, 2, 3, 4]
    },
    "code": "total = sum(payload['numbers'])\nresult = {'total': total, 'count': len(payload['numbers'])}\nprint(f'total={total}')"
  }
}
```

`runtime-service` sends `code` execution requests to `code-runner-service`.

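
To make the shape of a `code` node body concrete, here is a rough in-process sketch of how the example script could be evaluated: the input is exposed as `payload`, the script assigns `result`, and `print` output is captured separately. The helper name `run_python_node` is hypothetical, and the sketch has no sandboxing or timeout; the real `code-runner-service` may well run the script in a separate Python process.

```python
import contextlib
import io


def run_python_node(code: str, payload: dict) -> dict:
    """Run a code-node body and collect its printed text and `result` value.

    Illustration only: executes in the current process with no isolation.
    """
    namespace = {"payload": payload}
    stdout = io.StringIO()
    # Capture everything the script prints while it runs.
    with contextlib.redirect_stdout(stdout):
        exec(code, namespace)
    return {
        "output_text": stdout.getvalue(),
        "output_json": {"result_json": namespace.get("result")},
    }


# Same script and input as the "compute-summary" node config.
code = (
    "total = sum(payload['numbers'])\n"
    "result = {'total': total, 'count': len(payload['numbers'])}\n"
    "print(f'total={total}')"
)
node_run = run_python_node(code, {"numbers": [1, 2, 3, 4]})
print(node_run["output_text"])   # total=10
print(node_run["output_json"])   # {'result_json': {'total': 10, 'count': 4}}
```
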
Current `python` execution contract:

- input payload is available as `payload`
- execution result should be assigned to `result`
- `print(...)` output is captured into `node_run.output_text`
- structured `result` is captured into `node_run.output_json.result_json`

Recommended environment variables for `code-runner-service`:

```powershell
$env:AGENT_PLATFORM_PYTHON_BIN="python"
$env:AGENT_PLATFORM_MAX_TIMEOUT_SECONDS="30"
```

## Docker Compose

Files:

- `deployments/docker/docker-compose.yml`
- `deployments/docker/python-service.Dockerfile`
- `deployments/docker/.env.example`

Start all services locally:

```powershell
cd D:\workspace\auto-platform
Copy-Item .\deployments\docker\.env.example .\.env
docker compose -f .\deployments\docker\docker-compose.yml up --build
```

Start in detached mode:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml up --build -d
```

Production-like infrastructure:

- Compose now starts `postgres` with the `pgvector` image and runs `CREATE EXTENSION IF NOT EXISTS vector`.
- Compose now starts durable `redis` with append-only persistence.
- Copy `deployments/docker/.env.example` to `.env` to use per-service PostgreSQL databases such as `workflow_service`, `agent_service`, and `knowledge_service`.
- Set `AGENT_PLATFORM_REDIS_URL=redis://redis:6379/0` to enable shared Redis-backed locks, idempotency keys, and queues.

Scale runtime workers:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml up --build -d --scale runtime-worker=3
```

Scale agent workers:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml up --build -d --scale agent-worker=3
```

Scale team workers:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml up --build -d --scale team-worker=3
```

Scale scheduler workers:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml up --build -d --scale scheduler-worker=3
```

Stop and remove containers:

```powershell
docker compose -f .\deployments\docker\docker-compose.yml down
```

Important notes:

- Services still fall back to SQLite files under `/data` if `AGENT_PLATFORM_DATABASE_URL` is not set.
- For scaled workers, use PostgreSQL plus Redis rather than SQLite.
- `core-shared.redis_primitives` provides `DistributedLock`, `IdempotencyStore`, and `RedisQueue` for services that need cross-process coordination.
- `agent-service` stores agent definitions, prompt/config versions, and agent run records under `/data`
- `memory-service` stores scoped memories under `/data`; move it to PostgreSQL before enabling high-volume memory writes
- `team-service` stores multi-agent team definitions, team versions, and team run records under `/data`
- `team-worker` executes queued team runs by orchestrating member agent runs; it can be scaled independently
- `skill-service` stores skill definitions, versions, marketplace-style installations, and skill execution runs under `/data`
- `human-service` stores human approval, input, pause/resume, and takeover task records under `/data`
- `knowledge-service` stores knowledge bases, documents, chunks, and local retrieval metadata under `/data`
- `event-service` stores platform events and delivery status under `/data`
- `auth-service` stores users, roles, assignments, and permission policy metadata under `/data`
- `scheduler-service` stores delayed jobs, due-job leases, and retry status under `/data`
- `agent-worker` has no exposed port and can be scaled independently; set `AGENT_PLATFORM_AGENT_WORKER_DRY_RUN=true` for no-key local smoke runs
- `scheduler-worker` has no exposed port and can be scaled independently; prefer PostgreSQL for real multi-worker write concurrency
- `runtime-worker` has no exposed port and can be scaled independently; prefer PostgreSQL for real multi-worker write concurrency
- `runtime-service` automatically resolves internal URLs to `workflow-service`, `tool-service`, `model-gateway-service`, and `code-runner-service`
- `model-gateway-service` defaults to `http://host.docker.internal:11434/v1`; replace it in `.env` if you want OpenAI or another OpenAI-compatible provider
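
The SQLite fallback noted above can be pictured with a short Python sketch. It is only an illustration under assumptions: `resolve_database_url` and `is_safe_for_scaled_workers` are hypothetical helpers, and the per-service file name under `/data` is a guess at the layout rather than the services' actual path.

```python
import os


def resolve_database_url(service_name: str, env=None) -> str:
    """Return the configured database URL, or a SQLite fallback under /data."""
    env = os.environ if env is None else env
    url = env.get("AGENT_PLATFORM_DATABASE_URL")
    if url:
        return url
    # Hypothetical layout: one SQLite file per service under /data.
    return f"sqlite:////data/{service_name}.db"


def is_safe_for_scaled_workers(database_url: str) -> bool:
    """Scaled workers should not share a SQLite file; flag that case."""
    return not database_url.startswith("sqlite")


fallback = resolve_database_url("workflow-service", env={})
configured = resolve_database_url(
    "workflow-service",
    env={"AGENT_PLATFORM_DATABASE_URL": "postgresql+psycopg://user:password@postgres:5432/workflow_service"},
)
print(fallback, is_safe_for_scaled_workers(fallback))
print(configured, is_safe_for_scaled_workers(configured))
```

A check like this could run at worker startup to warn when `--scale` is used while the service is still on its SQLite fallback.
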