Building a Multi-Agent ITSM System with AutoGen: A Step-by-Step Tutorial

By Sam Qikaka

Category: Agents & Architecture

Learn how B2B operations leaders can design a multi-agent system for IT service management using AutoGen. This practical tutorial covers incident triage, cost-aware routing, API integrations, and a simulated scenario demonstrating potential MTTR reductions—all with open-weight models.

Introduction to Multi-Agent Systems for IT Service Management

IT service desks face a constant flood of incidents: password resets, server alerts, application hiccups. For B2B operations leaders, the challenge is clear: resolve tickets faster without ballooning headcount. Enter multi-agent AI, where specialized software agents collaborate to triage, troubleshoot, and even prevent issues.

This tutorial walks you through building such a system using AutoGen, Microsoft’s open-source framework for creating conversational agents. While recent rumors suggest a future AutoGen v0.7 with built-in observability and cost-aware routing, our research confirms that as of this writing, the official release is AutoGen v0.2 (check the official sources for the latest). No v0.7 exists yet. So we’ll build those capabilities ourselves, designing agents that log every step, route tasks based on model cost and complexity, and integrate with real ITSM APIs. The result is a vendor-neutral, open-weight multi-agent system that, in our simulated enterprise scenario, demonstrates a potential 30% reduction in mean time to resolve (MTTR).

You’ll learn by doing. We’ll create three specialized agents:

- Incident Triage Agent: classifies and prioritizes incoming tickets.
- Resolution Agent: fetches knowledge base articles, runs diagnostics, and suggests fixes.
- Root Cause Analyst: digs deeper when patterns emerge, reducing recurring incidents.

All will coordinate via a GroupChatManager in AutoGen, with cost-aware routing that chooses which agent (and which underlying open-weight model) handles each step. No proprietary vendor lock-in is required: we’ll use models like Llama 3.1 or Mistral via local or API endpoints. Let’s get started.

Setting Up AutoGen with Observability

First, install AutoGen and prepare your environment. We’ll use Python 3.11+ and a virtual environment. For open-weight models, you can run them locally via Ollama or llama.cpp, or use a cloud endpoint (e.g., DeepInfra, Together AI). This tutorial assumes you have a local Llama 3.1 running via Ollama.

Create a configuration file (AutoGen’s standard way of defining models), then load it into the configuration object that AutoGen agents expect.

Instrumenting Observability from Day One

AutoGen v0.2 doesn’t have a built-in observability dashboard, but we can add structured logging and tracing. Create a simple logger that captures agent messages, tool calls, and latencies. This data becomes essential for measuring MTTR later. Every agent we create will call this logger at key points. This gives you a makeshift observability layer until future AutoGen versions (perhaps v0.7) provide native support. You can later feed these logs into Grafana or an ELK stack.

Designing the Incident Triage Agent

The Triage Agent is the first responder. It receives tickets from your ITSM system (ServiceNow, Jira, etc.) and decides urgency and category. Define its system prompt carefully. For integration, we’ll wrap the agent in a function that AutoGen can call. But first, we need to fetch tickets from an ITSM tool; that’s covered in the API integration section. For now, we’ll simulate a ticket stream.

Create a second agent that acts as the human proxy (or the ITSM bridge) to feed tickets into the group chat. In AutoGen’s group chat, a dedicated proxy agent can represent the external system. The Triage Agent will publish its classification, and the GroupChatManager will decide the next step based on that output. This is where cost-aware routing comes in.

Implementing Cost-Aware Routing for Ticket Resolution

Cost-aware routing means choosing the cheapest model (or agent path) that can successfully handle a task of a given complexity. Since AutoGen v0.2 doesn’t have native routing, we’ll implement it as a custom speaker selection function for the GroupChatManager. First, define a cost table (hypothetical prices for open-weight models per 1K tokens if using a cloud provider, or just relative costs if local).

For this tutorial, we’ll assign complexity scores to tickets:

- P4, simple: “password reset” → low complexity → cheapest model.
- P2, moderate: “VPN connection drops” → medium → mid-tier.
- P1, complex: “server down, multiple alerts” → high → most capable model (but still open-weight).

We’ll use Mistral 7B for low complexity, Llama 3.1 8B for medium, and Llama 3.1 70B (if available) for high, but you can adjust to your environment.

But wait: our resolution agent is just one agent. To parameterize model choice per call, could we dynamically set the model inside the agent’s configuration? That’s not straightforward. A cleaner AutoGen pattern is to have separate agents, each with a different model, and route to them. So we’ll create:

- a low-complexity resolution agent (Mistral 7B)
- a medium-complexity resolution agent (Llama 3.1 8B)
- a high-complexity resolution agent (Llama 3.1 70B)

All share the same system prompt but use different models. This is a practical way to do cost-aware routing in current AutoGen. Later, when