Multi-Agent Fraud Detection for Banking: A Proven Blueprint from a 10-Bank Consortium

By Sam Qikaka

Category: Enterprise AI


The Consortium and Pilot Overview

As of May 24, 2026, a consortium of 10 major banks had completed the industry's first documented multi-agent fraud detection pilot on AWS Bedrock. By combining Qwen 3.8 Max for transaction anomaly detection with Llama 5 for real-time rule adjustment, the system achieved a 40% reduction in false positive alerts and a 35% decrease in fraud investigation time compared with legacy rules-based systems. This article provides a vendor-neutral blueprint for banking operations leaders to replicate the architecture, including model selection rationale, LangGraph orchestration design, cost breakdown, and ROI projections.

The consortium (three top-10 US institutions, two European universal banks, and five regional players) initiated the pilot in Q4 2025 to evaluate open-weight multi-agent approaches for fraud detection under real production constraints. Each bank contributed anonymized transaction data (over 200 million transactions per month) and ran the system in parallel with its existing rules engine for six months. The goal was to test whether a multi-agent system built on publicly available models could outperform proprietary solutions at a fraction of the cost.

Key findings:

- False positive rate dropped from 12% (legacy) to 7.2% (multi-agent), a 40% relative reduction.
- Median investigation time fell from 48 minutes to 31 minutes, a 35% reduction.
- Overall fraud detection rate (true positive rate) remained stable at about 94%, with only a slight increase in alert volume due to agentic automation.

The consortium published the architecture and results under a permissive license, providing the first open blueprint for multi-agent fraud detection in banking.

Architecture: Combining Qwen 3.8 Max for Anomaly Detection and Llama 5 for Rule Adjustment

The system uses two primary agentic models, each specialized for a distinct task:

- Qwen 3.8 Max (Hugging Face model ID: Qwen/Qwen3.8-Max-Instruct): An open-weight 38-billion-parameter model optimized for transaction anomaly detection. The consortium fine-tuned it with LoRA on 12 months of aggregated banking transaction patterns, achieving 99.2% recall for known fraud types and 91% for zero-day anomalies. The model runs on two AWS Bedrock real-time inference endpoints (g6e.12xlarge) with batching.
- Llama 5 (Meta-Llama/Llama-5-70B-Instruct): A 70-billion-parameter model for real-time rule adjustment. Llama 5 analyzes flagged transactions in context, determines whether existing rules are misaligned, and suggests new heuristic rules or parameter updates. It operates as a deterministic reasoning layer, with human approval required for rule changes above the $10,000 threshold.

The two models communicate through a shared state graph managed by LangGraph. Qwen 3.8 Max produces anomaly scores and feature vectors; Llama 5 processes these to propose rule adjustments. The system completes a cycle every 30 seconds, updating detection rules dynamically and, for low-value patterns, without human intervention.

Orchestration with LangGraph on AWS Bedrock

LangGraph (v0.3.2) is the orchestration framework and defines the agent interaction graph. The architecture follows a supervisor-agent pattern:

1. Supervisor Agent: A lightweight orchestrator (running in a small AWS Lambda container) ingests streaming transaction logs from Amazon Kinesis. It assigns each transaction to one of three agent pathways based on risk score: low-risk (auto-pass), medium-risk (Qwen 3.8 Max analysis), or high-risk (Qwen 3.8 Max plus Llama 5).
2. Anomaly Agent (Qwen 3.8 Max): Processes batches of medium- and high-risk transactions via the SageMaker-backed Bedrock endpoints. Returns anomaly scores, confidence intervals, and the top five feature attributions.
3. Rule Adjustment Agent (Llama 5): For high-risk cases, Llama 5 reviews the anomaly explanation, compares it against the current rule set (stored in Amazon DynamoDB), and generates a proposed rule update. Proposals that need approval go to a human-in-the-loop queue staffed by senior fraud analysts.
4. Feedback Loop: Approved adjustments update the global rule set in DynamoDB, and the system logs all decisions to Amazon S3 for audit.

LangGraph provides built-in error handling, retries, and a graph visualization that the consortium used to debug bottlenecks. The entire graph is deployed as an AWS Step Functions state machine that calls the Bedrock endpoints, and it operated under a 99.95% uptime SLA during the pilot.

Key Metrics: 40% Fewer False Positives and 35% Faster Investigations

The pilot delivered three primary measurable outcomes:

| Metric | Legacy System | Multi-Agent System | Improvement |
| :--- | :--- | :--- | :--- |
| False positive rate | 12.0% | 7.2% | 40% reduction |
| Median investigation time | 48 minutes | 31 minutes | 35% reduction |
| Fraud detection rate (TPR) | 93.8% | 94.1% | +0.3 |
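
The improvement figures can be checked directly from the legacy and multi-agent values reported above. The Python sketch below is illustrative only: the `Metric` class and the helper function names are hypothetical and are not taken from the consortium's published code. It recomputes the two relative reductions and the absolute change in detection rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One row of the results table (hypothetical helper type)."""
    name: str
    legacy: float
    multi_agent: float


def relative_reduction(metric: Metric) -> float:
    """Percentage decrease of the multi-agent value relative to legacy."""
    return (metric.legacy - metric.multi_agent) / metric.legacy * 100.0


def absolute_change(metric: Metric) -> float:
    """Signed difference (multi-agent minus legacy), in the metric's own units."""
    return metric.multi_agent - metric.legacy


# Values as reported in the pilot results.
false_positive = Metric("False positive rate (%)", 12.0, 7.2)
investigation = Metric("Median investigation time (min)", 48.0, 31.0)
detection = Metric("Fraud detection rate, TPR (%)", 93.8, 94.1)

fp_reduction = relative_reduction(false_positive)
time_reduction = relative_reduction(investigation)
tpr_change = absolute_change(detection)

print(f"{false_positive.name}: {fp_reduction:.1f}% relative reduction")
print(f"{investigation.name}: {time_reduction:.1f}% relative reduction")
print(f"{detection.name}: {tpr_change:+.1f} percentage points")
```

The investigation-time reduction works out to roughly 35.4%, which the article rounds to 35%. The detection-rate change is an absolute difference in percentage points, not a relative improvement, so it should not be compared directly with the other two figures.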