Multi-Agent System for Hotel Operations: Replicable Blueprint from a 30-Hotel Pilot
By Sam Qikaka
Category: Agents & Architecture
As of May 23, 2026, a 30-hotel pilot using a multi-agent system on AWS Bedrock with Llama 5, Qwen 3.8 Max, and a fine-tuned concierge agent cut check-in time by 40% and front desk costs by 25%. This vendor-neutral guide details agent roles, latency benchmarks, and integration with existing PMS platforms for hospitality operators.
Why Hotels Need a Multi-Agent Approach, Not a Single Chatbot

As of May 23, 2026, the hospitality industry faces a familiar squeeze: guests expect frictionless, instant check-in while operators try to contain rising labor costs. A single AI chatbot, no matter how well trained, struggles to handle the variety of tasks required during guest arrival. It must verify identity, process payments, answer local-area questions, handle special requests (early check-in, room changes), and integrate with legacy property management systems (PMS). A monolithic bot hits latency bottlenecks, confuses intents, and fails to scale under peak-hour loads.

That is why leading hotel groups are moving to a multi-agent system for hotel operations, where specialized agents share a common orchestration layer. Instead of one model doing everything, a routing agent delegates tasks to purpose-built models.

Agent Roles in the Check-In Pipeline: Orchestrator, Concierge, and Specialist Agents

In the pilot architecture, three distinct agent roles collaborate:

- Orchestrator Agent (Llama 5 on AWS Bedrock): Receives the guest's initial request, extracts structured intent (e.g., check-in, baggage hold, upgrade request), and routes it to the appropriate specialist. It maintains session state and handles hand-offs. Llama 5 was chosen for its strong reasoning and instruction-following capabilities.
- Concierge Agent (fine-tuned Llama 5 or a smaller Qwen variant): Handles conversational guest interactions: greeting guests, answering hotel-specific questions (restaurant hours, pool access), collecting preferences for room amenities, and managing upsell offers. This agent is fine-tuned on historical guest interaction logs and property-specific data.
- Specialist Agents (Qwen 3.8 Max): Perform discrete backend operations: identity verification (scanning and matching passport data), payment processing (tokenization and charging via the PMS integration), and digital key issuance. Qwen 3.8 Max excels at fast, reliable extraction and API call generation.

This separation of concerns keeps each model focused and lets the orchestration layer optimize for latency and cost. For example, the orchestrator can dispatch the quick backend calls to Qwen 3.8 Max while the concierge agent maintains a rich conversation in parallel.

Architecture Overview: AWS Bedrock with Llama 5, Qwen 3.8 Max, and a Fine-Tuned Concierge Agent

The architecture runs entirely on AWS Bedrock, using its serverless inference endpoints to deploy Llama 5 (Meta AI, 70B-parameter, instruction-tuned) and Qwen 3.8 Max (Alibaba Cloud, 72B-parameter). The fine-tuned concierge agent is a smaller Llama 5 variant (8B) fine-tuned with LoRA on the hotel group's proprietary guest interaction data. Bedrock handles load balancing, auto-scaling, and model versioning.

Key design decisions:

- Orchestrator as state machine: Built with AWS Step Functions, the orchestrator manages the multi-turn dialogue and calls Bedrock InvokeModel with the appropriate agent's prompt.
- Agent communication via structured JSON: Each agent emits messages that follow a standard request/response schema, which includes confidence scores, fallback triggers, and retry policies.
- Latency optimization: Because the 8B concierge model serves most guest-facing interactions, average response time stays under 600 ms, while the 70B orchestrator runs only when complex reasoning is needed.

Fine-Tuning the Concierge Agent: Data Sources and Training Strategy

The concierge agent was fine-tuned on:

- Historical chat logs from the hotel group's previous chatbot (anonymized, 120k conversations)
- Property-specific FAQ documents (pool hours, pet policies, breakfast menus, local attractions)
- Upsell dialogue templates for room upgrades, late checkout, and spa packages

Training used QLoRA (the quantized variant of LoRA) on AWS SageMaker for 3 epochs, reaching 94% accuracy on intent classification and a BLEU-4 score of 0.73 on open-ended responses. The fine-tuned model was then deployed on Bedrock as a custom model endpoint. The pilot team notes that fine-tuning on property-specific data was critical to reducing hallucination rates: guests received accurate local advice 97% of the time.

Integrating with Existing PMS: APIs, Webhooks, and Fallback Modes

Integration with property management systems (PMS) such as Oracle Opera and Maestro was achieved through three patterns:

1. RESTful APIs for direct reservation updates (check-in status, room assignment). The orchestrator agent calls the PMS API via AWS Lambda functions that map agent outputs to specific endpoints.
2. Webhooks for event-driven updates (e.g., guest confirms late checkout → PMS room status changes automatically).
3. Fallback to manual mode when the PMS is offline or the API returns an error. The agent gracefully escalates to a human front desk agent with a summarized co