Platform Engineering Lessons from an AI Detective Game
From “Adult in the Room” to Autonomous Agents
In my previous post, Engineering Beyond Agile: The Platform Becomes the Adult in the Room, I explored how the Agentile organisation impacts platform engineering. The core idea: as teams adopt autonomous agents, the platform has to step up and provide the structure – setting intent, defining guardrails, and making sure things don’t go sideways. But writing about that is one thing. To actually feel how it works, I decided to build something.
Detective Agentic Mysteries is a 2D detective game I built where every NPC is an independent AI agent. It was built entirely with GitHub Copilot CLI, using the Copilot SDK to orchestrate multiple AI agents – each with its own memory, personality, emotional state, and tools – into a cohesive, emergent experience. Every playthrough is unique because the AI agents make real decisions, not scripted ones.
The game works as a contained demonstration of what happens when you give a set of autonomous agents’ clear roles, constrained tools, and a shared state – and let them interact. Not a toy example. A functioning multi-agent system that produces complex, unscripted outcomes. The multi-agent architecture powering this game is the same pattern that platform engineers can use to orchestrate incidents, manage infrastructure, and automate complex workflows.
Below, I’ll break down how the game works, how its concepts map to real platform use-cases – from incident response to continuous compliance – and what I learned about designing agentic systems with intent and guardrails.
What the Game Actually Does
When you play Detective Agentic Mysteries, you step into the role of a detective investigating a murder. Underneath, the game runs 9+ concurrent AI agents, each as an independent Copilot SDK session with its own system prompt, tools, and conversation history. Here’s the cast:
9+ AI suspects with unique personalities, secrets, alibis, and emotional states.
An AI Director that orchestrates the world between days – moving NPCs, planting evidence, escalating drama.
An AI Forensics Analyst that provides scientific analysis of collected evidence.
An AI Narrator that generates atmospheric noir prose in real-time.
An AI Profiler that studies how you play and adapt NPC responses accordingly.
Procedurally generated mysteries – a “Skeleton Key” agent can generate entirely new settings, characters, and crimes.
Day/night cycles where NPCs have private conversations, form alliances, and betray each other.
Multiple levels – Victorian manor and cruise ship, plus randomly generated scenarios.
Each agent has a distinct role and a specific toolset. The NPC suspects, for instance, can check evidence, reveal clues, update their own emotional state, express body language, create physical evidence, gossip about the detective, and more. The Director has tools to read the full game state, review conversation history, submit night plans, trigger panic events, form alliances, and tamper with evidence.
The Forensics agent does pure analysis with no state-modifying tools. The Profiler watches the player’s interrogation style and updates a detective profile. There’s even a Red Herring agent – a dynamically created suspect designed to mislead the player, using the same tools as the other NPC suspects.
None of these agents are scripted. They decide what to say, how to react, and what to do based on their identity (system prompt), their memory (conversation history), and the tools available to them.
The Architecture: How Agents Coordinate
The key insight from the Copilot SDK is that createSession() isn’t just a chat API – it’s an agent factory. Each session gets a system prompt that defines its identity and knowledge, tools that let it take actions in the world, conversation history that persists across messages (memory), and streaming for real-time response delivery.
Each agent gets its full identity injected via systemMessage: { mode: “replace” }, meaning the AI fully embodies the character rather than appending to a default prompt. This is an important design decision: complete identity ownership per agent.
How Agents Communicate
Agents don’t talk directly to each other by default. The Express server acts as the orchestration layer, mediating all inter-agent communication through several patterns:
1. Tool-Mediated State Changes. NPCs call tools (e.g. update_sentiment, create_evidence) that modify shared game state. Other agents read this state through their own tools. This is indirect communication – agents influence each other through a shared world. For example, when NPC A calls update_sentiment(”scared”), NPC B can later call get_overheard_info() and learn that A is scared.
2. onPostToolUse Hooks (Reactive Side Effects). The Copilot SDK’s SessionHooks allow registering lifecycle callbacks that fire after every tool call. The game uses onPostToolUse to propagate side effects automatically – during both player interrogations and night conversations. For instance, if an NPC’s emotional state shifts to “desperate,” nearby NPCs are automatically informed through eavesdropping. If a high-importance clue is revealed, others in the room notice it. This replaces manual eavesdropping wiring and ensures side effects fire consistently regardless of context. The hook receives the tool name, arguments, and result – enabling reactive cross-agent behavior without tight coupling.
3. Server-Orchestrated Conversations (Night Phase). During the night phase, the server takes NPC A’s response, injects it as context into NPC B’s prompt, and vice versa. Each night conversation is 4 exchanges (2 rounds of back-and-forth). NPCs can form alliances, share secrets, or betray each other – all generating new leads for the player to discover the next morning.
4. Fire-and-Forget Broadcasting. After a player interrogates an NPC, the server broadcasts gossip to all other NPCs, profiles the player’s style, and checks for contradictions – all asynchronously. One action fans out into multiple side processes without blocking the main interaction.
Session Health & Recovery
AI sessions can stall or crash. The system auto-recovers through several mechanisms: talkWithRetry() gives 2 attempts with session.abort() between failures; active request tracking monitors every sendAndWait with a timestamp; a 30-second healthcheck detects sessions stuck for more than 50 seconds, aborts and recreates them; and sendAndCollect() for night conversations also tracks and auto-recovers. The health system prioritizes resilience over correctness – a recovered session loses its conversation history but keeps its identity. This is the same principle production systems need: things will fail, so the system must handle it gracefully.
Tech Stack
Table: Tech stack from the project README. md
Design Decisions That Shaped the System
Building the game with Copilot CLI followed a tight loop: describe what you want, Copilot generates the system prompt and tools, playtest, iterate on the prompt, add mechanics. The game’s complexity emerged from composing simple agent patterns, not from writing complex game logic.
Seven design decisions defined the architecture:
systemMessage: { mode: “replace” } — Each agent gets its full identity injected as the system message, not appended to a default. The AI fully embodies the character.
Tools as agency — NPCs don’t just respond to questions. They can act: update their emotions, create physical evidence, express body language, gossip about the detective. This makes them feel alive rather than reactive.
The Director pattern — Instead of scripting game events, an AI agent’s entire job is to orchestrate other agents. It reads game state through tools, decides what happens next, and submits a structured plan. The server executes that plan.
Server as orchestrator, not participant — The Express server never generates content. It routes messages between agents, manages session lifecycle, and applies game rules. The AI agents make all creative decisions.
Fire-and-forget side effects — After every player action, multiple background processes run: gossip spreading, player profiling, contradiction detection. These create emergent complexity without blocking the main interaction.
Hooks for reactive behavior — The SDK’s onPostToolUse hook fires after every tool call, enabling automatic side effects (emotional shift propagation, clue eavesdropping, body language observation) without manual wiring. The same reactive behaviors work during both player interrogations and NPC night conversations.
Resilience over correctness — Sessions crash, responses stall, agents hallucinate. The health system (talkWithRetry, stuck detection, auto-recreation) ensures the game keeps running. A recovered session loses its conversation history but keeps its identity.
Game → Platform: The Parallels
The multi-agent architecture in this game maps directly to real-world platform engineering challenges. The patterns I used – independent agents with specialized tools, a director/orchestrator, server-mediated communication, health monitoring, and auto-recovery – are exactly what’s needed for complex operational systems.
Table: Direct parallels between game mechanics and platform engineering concepts, from the project README.md
Concrete Use Cases
To make the connection tangible, consider swapping “murder mystery” for “production incident.”
Incident Response Orchestration. Each agent owns a domain. A Database Agent has tools to query slow query logs, check replication lag, and trigger failover. A Networking Agent can run traceroutes, check DNS, and update load balancers. An App Agent can pull distributed traces, check dependency health, and trigger rollbacks. An Incident Commander reads all agent findings (like the Director reading game state), synthesizes a response plan, and coordinates execution. The same sendAndWait + tools pattern that lets NPCs interrogate each other lets infrastructure agents share findings and coordinate actions.
Change Management & Deployment. A Director-like agent analyzes the deployment plan and checks all service dependencies. Service agents each check their own health, dependencies, and readiness, reporting back through tools. The Director sequences the rollout, handles failures, and coordinates rollbacks.
Platform Self-Service. Give teams their own agents (like NPC suspects) that understand their services: “What’s the status of the payment service?” – the Payment Agent checks metrics, recent deploys, open incidents. “Why is latency high?” – the agent investigates, correlates with recent changes, suggests fixes. “Scale up the worker pool” – the agent validates the request, checks quotas, executes safely.
Continuous Compliance & Auditing. The Profiler pattern (studying detective behavior) becomes a compliance agent that studies infrastructure changes and flags violations – not through rigid rules, but through contextual understanding of why a change was made and whether it matches organizational policies.
Event-Driven Automation with Hooks. The onPostToolUse hook pattern from the game – where every NPC tool call triggers automatic side effects – maps directly to event-driven platform automation. Every agent action automatically produces audit logs, notifications, and cascading effects – without the agent itself needing to know about them. The hook runs at the SDK level, so it works whether the action was triggered by a human operator, another agent, or an automated schedule.
Why This Architecture Works for Platform Engineering
Building this game crystallized a set of principles that apply directly to designing autonomous platform systems. The README documents seven of them, and I think they deserve attention:
Agent isolation — Each agent owns its domain. A database agent doesn’t need to understand networking. Agents can be updated independently.
Tool-mediated safety — Agents can only do what their tools allow. A monitoring agent can read metrics but can’t modify infrastructure. A deployment agent can roll back but can’t access secrets. Permissions are baked into the tool definitions. [github.com]
Orchestrator pattern — Complex workflows (incident response, multi-service deployments) need a coordinator that can read the full picture and make decisions. The Director agent pattern solves this. [github.com]
Resilience is built in — The health check + auto-recovery pattern from this game (talkWithRetry, stuck detection, session recreation) is exactly what production agent systems need. Agents will fail. The system must recover gracefully. [github.com]
Emergent behavior — Just as NPC conversations create unexpected plot developments, agent-to-agent communication in platform engineering can surface insights no single agent would find alone: “The database agent reports increased connection timeouts at the same time the networking agent detected a BGP route change”. [github.com]
Hooks for cross-cutting concerns — The onPostToolUse pattern separates what agents do from what happens as a result. Agents focus on their domain; hooks handle audit logging, notification, compliance checks, and cascading effects. This is the agent equivalent of middleware or aspect-oriented programming. [github.com]
Human-in-the-loop — The detective (player) decides when to act on information. In platform engineering, the operator reviews agent recommendations before executing. The architecture supports both autonomous and supervised modes. [github.com]
These principles map closely to what the broader industry is calling Agentic Platform Engineering – the practice of building platforms augmented with goal-driven, context-aware AI agents that operate within platform-defined constraints, making decisions within explicit guardrails such as security, cost, and operational constraints. As described by platformengineering.org, the goal is not full autonomy, but bounded autonomy: enabling platforms to handle complex operational tasks without requiring constant human intervention, while novel or high-risk situations still require human judgment.
This compositional approach – treating the platform as a composed system of specialized agents with narrowly defined roles, coordinated through shared context, explicit policies, and supervisory control – improves safety and explainability while allowing autonomy to be introduced incrementally rather than all at once. [platformen...eering.org]
Conclusion: The Gap Is Smaller Than You Think
The leap from “game NPC” to “platform agent” is smaller than you think – it’s the same SDK, the same patterns, just different tools and prompts. The patterns from this game are directly reusable: define agent identity (system prompt), define agent capabilities (tools), create agent sessions with hooks for cross-cutting concerns, and orchestrate using the same Director pattern.
For platform engineers and tech leads, building something like Detective Agentic Mysteries offers a concrete way to stress-test multi-agent architectures before applying them to production infrastructure. The game demonstrates that multiple agents working in concert under the right constraints can produce complex, coherent outcomes – whether that’s solving a fictional murder or coordinating a real incident response.
All seven principles come back to the same idea: balance intent with guardrails. Let agents handle the goals, decisions, and problem-solving within their domain. Let the platform provide the constraints, communication channels, and supervision mechanisms that keep things on track. That balance was critical for making the game work, and it’s equally critical for making platform engineering work in an era of autonomous systems.
The platform remains the adult in the room. The agents play out the drama.




