Multi-Agent Underwriting Pilot Results: 10 Carriers Share Their Blueprint for 30% Faster Quotes
By Sam Qikaka
Category: Agents & Architecture
A consortium of 10 major insurance carriers completed the first known multi-agent underwriting pilot on AWS Bedrock, achieving 30% faster quotes, 25% lower review costs, and 15% better loss ratio accuracy. This vendor-neutral blueprint details the architecture, data pipeline, and governance guardrails used.
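The routing and guardrail rules described in the sections that follow fit into a small decision function. The Python sketch below is illustrative only and is not the consortium's implementation. The two agents and the legacy rule engine are passed in as plain callables, standing in for the model calls on Bedrock. The audit log is an in-memory list rather than S3. The names `route_submission`, `RiskResult`, `PolicyResult`, `Decision`, and `HIGH_RISK_CUTOFF` are invented for illustration. The thresholds (70% confidence, $50,000 premium, ±10% of the model range) come from the text. The score that counts as "high risk" is not stated, so the cutoff of 70 is an assumption. Reading "±10% of the model range" as more than 10% beyond either end of a reference premium range is also an assumption.

```python
# Hypothetical sketch: names are illustrative; agents are injected stubs, not Bedrock calls.
from dataclasses import dataclass, field
from typing import Callable

# Thresholds stated in the pilot description.
CONFIDENCE_FLOOR = 0.70         # below this, fall back to the pre-pilot rule engine
PREMIUM_SIGNOFF_LIMIT = 50_000  # premiums above this always need underwriter sign-off
RANGE_TOLERANCE = 0.10          # premium more than 10% outside the model range goes to review
# Hypothetical: the article does not say which score counts as "high risk".
HIGH_RISK_CUTOFF = 70


@dataclass
class RiskResult:
    score: int              # composite risk score on a 0-100 scale
    confidence: float       # agent confidence, 0.0-1.0
    top_factors: list[str]  # explainability: top three factors behind the score


@dataclass
class PolicyResult:
    premium: float
    model_range: tuple[float, float]  # assumed reference premium range (low, high)
    confidence: float


@dataclass
class Decision:
    source: str         # "agents" or "rule_engine"
    premium: float
    needs_review: bool  # True if an underwriter must sign off
    reasons: list[str] = field(default_factory=list)
    audit: list[dict] = field(default_factory=list)  # in-memory stand-in for the S3 audit log


def outside_range(premium: float, low: float, high: float, tol: float = RANGE_TOLERANCE) -> bool:
    """True if the premium lies more than `tol` below `low` or above `high`."""
    return premium < low * (1 - tol) or premium > high * (1 + tol)


def _fallback(app: dict, rule_engine: Callable[[dict], float], reason: str, audit: list[dict]) -> Decision:
    """Use the deterministic rule engine; the $50,000 sign-off rule still applies."""
    premium = rule_engine(app)
    audit.append({"stage": "fallback", "reason": reason, "premium": premium})
    reasons = [reason]
    if premium > PREMIUM_SIGNOFF_LIMIT:
        reasons.append("premium above sign-off limit")
    return Decision("rule_engine", premium, premium > PREMIUM_SIGNOFF_LIMIT, reasons, audit)


def route_submission(
    app: dict,
    risk_agent: Callable[[dict], RiskResult],
    policy_agent: Callable[[dict, RiskResult], PolicyResult],
    rule_engine: Callable[[dict], float],
) -> Decision:
    """Run risk scoring, then policy recommendation, applying the fallback and review rules."""
    audit: list[dict] = [{"stage": "input", "app_id": app.get("id")}]

    # Stage 1: Risk Scoring Agent.
    risk = risk_agent(app)
    audit.append({"stage": "risk", "score": risk.score,
                  "confidence": risk.confidence, "factors": risk.top_factors[:3]})
    if risk.confidence < CONFIDENCE_FLOOR:
        return _fallback(app, rule_engine, "risk agent confidence below floor", audit)

    # Stage 2: Policy Recommendation Agent consumes the risk score.
    policy = policy_agent(app, risk)
    audit.append({"stage": "policy", "premium": policy.premium, "confidence": policy.confidence})
    if policy.confidence < CONFIDENCE_FLOOR:
        return _fallback(app, rule_engine, "policy agent confidence below floor", audit)

    # Human review gateway: low-risk, in-range quotes pass straight through.
    reasons = []
    if policy.premium > PREMIUM_SIGNOFF_LIMIT:
        reasons.append("premium above sign-off limit")
    if risk.score >= HIGH_RISK_CUTOFF:
        reasons.append("high risk")
    if outside_range(policy.premium, *policy.model_range):
        reasons.append("premium outside model range tolerance")
    return Decision("agents", policy.premium, bool(reasons), reasons, audit)


if __name__ == "__main__":
    # Toy agents with fixed outputs stand in for the model calls.
    decision = route_submission(
        {"id": "APP-1"},
        risk_agent=lambda app: RiskResult(35, 0.9, ["loss history", "roof age", "credit"]),
        policy_agent=lambda app, risk: PolicyResult(12_000, (10_000, 13_000), 0.85),
        rule_engine=lambda app: 11_500.0,
    )
    print(decision.source, decision.needs_review, decision.reasons)  # agents False []
```

The agents are injected as callables so the routing can be tested without model access. In a real deployment each callable would wrap a model invocation, and the audit entries would be persisted rather than returned.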
The Consortium’s Multi-Agent Architecture Overview
The consortium deployed a two-agent system orchestrated via AWS Bedrock’s agent runtime. The architecture splits the underwriting workflow into two specialized stages:
- Risk Scoring Agent: ingests application data, historical claims, and third-party risk signals to produce a risk score.
- Policy Recommendation Agent: takes the risk score and the applicable underwriting rules and recommends coverage terms, premiums, and exclusions.
Both agents share a common state machine on Bedrock that manages context, retries, and human-in-the-loop checkpoints. This multi-agent orchestration approach let each model focus on its core competency while the orchestration layer handled sequencing and fallback logic.
How Did the Consortium Achieve 30% Faster Quote Generation?
The speed gains came primarily from replacing sequential manual handoffs with parallel agent execution and automated decision routing.
- Parallel risk assessment: The Risk Scoring Agent evaluates multiple risk dimensions (credit, loss history, property condition) simultaneously, using Qwen 3.8 Max’s long-context capabilities.
- Instant policy generation: Once a valid risk score is available, the Policy Recommendation Agent (Llama 5) generates a quote in seconds, applying business rules stored as retrievable prompts.
- Automated exception handling: Low-risk submissions bypass human review entirely; borderline cases are flagged with an explanation for underwriters.
The pipeline reduced average quote time from 48 hours to 33.6 hours, a 30% reduction according to the consortium’s internal data.
Agent Roles: Risk Scoring with Qwen 3.8 Max and Policy Recommendation with Llama 5
Risk Scoring Agent (Qwen 3.8 Max)
- Model: Qwen 3.8 Max, deployed via Bedrock’s model invocation endpoint. Chosen for its strong multi-modal reasoning and its ability to handle both structured and unstructured data (e.g., PDF applications, loss runs).
- Function: Inputs include applicant financials, property inspection reports, and historical claim narratives. The output is a composite risk score (0–100) with justification tokens for audit trails.
- Background research: The design draws on findings from arXiv:2602.13213, which demonstrated improved risk stratification using foundation models with fine-tuned attention heads.
Policy Recommendation Agent (Llama 5)
- Model: Llama 5 (70B), also on Bedrock. Chosen for its instruction-following precision and deterministic output style, which suit compliance-sensitive decisions.
- Function: Takes the risk score plus underwriting guidelines retrieved from a vector database (FAISS) and outputs a recommended policy structure: premium range, deductibles, endorsements, and declination reasons if applicable.
- Background research: The consortium referenced arXiv:2602.00456, which explored rule-constrained generation for insurance policy documents.
Data Pipeline Design for Real-Time Underwriting
The consortium built a streaming data pipeline that keeps both agents supplied with up-to-date information:
1. Ingestion: Applications arrive via API (ACORD standard) and are parsed into a JSON schema.
2. Enrichment: Third-party data (credit scores, weather risk, property valuations) is fetched via Bedrock’s knowledge base connectors.
3. Vectorization: Historical claim documents and underwriting manuals are embedded into a vector store (Pinecone) for retrieval-augmented generation (RAG).
4. Orchestration: Bedrock’s agent runtime routes the enriched application to the Risk Scoring Agent, waits for the score, then passes it to the Policy Recommendation Agent.
5. Human review gateway: Any policy recommendation that deviates from standard parameters (e.g., a premium outside ±10% of the model range) is routed to an underwriter dashboard.
This design supports commercial-lines underwriting automation at scale while maintaining full auditability.
Governance Guardrails: Compliance, Accuracy, and Auditability
Insurance is a heavily regulated industry, so the consortium implemented several governance guardrails for its AI agents:
- Human-in-the-loop mandates: All policies with premiums above $50,000 or flagged as “high risk” require underwriter sign-off.
- Explainability: Every agent output includes a confidence score and a chain-of-thought summary. The Risk Scoring Agent must output the top three factors influencing the score.
- Audit log: All agent inputs, intermediate states, and final outputs are logged to Amazon S3 with immutable retention for regulatory review.
- Fairness monitoring: Monthly bias audits compare approval rates across protected classes using a separate ML fairness toolkit.
- Fallback rules: If either agent’s confidence falls below 70%, the system defaults to the deterministic rule engine used before the pilot.
These guardrails helpe