Multi-Agent Fraud Detection for Banks: A 10-Bank Consortium's Pilot on AWS Bedrock
By Sam Qikaka
Category: Agents & Architecture
As of May 24, 2026, a consortium of 10 global banks completed the first documented multi-agent fraud detection pilot on AWS Bedrock, combining Llama 5 and Qwen 3.8 Max to reduce detection time by 35% and false positives by 20%. This article details the architecture, deployment roadmap, and compliance considerations for B2B leaders.
First Multi-Agent Fraud Detection Pilot on AWS Bedrock Achieves 35% Faster Detection

As of May 24, 2026, a consortium of 10 global banks has completed the first documented multi-agent pilot for fraud detection on AWS Bedrock. By orchestrating Llama 5 for transaction analysis and Qwen 3.8 Max for anomaly scoring, the consortium achieved a 35% reduction in detection time and a 20% decrease in false positives compared to single-agent baselines. This article provides a vendor-neutral, replicable architecture, real pilot metrics, and a 3-phase production roadmap tailored for banking compliance requirements under GDPR and CCPA.

The Multi-Agent Architecture for Fraud Detection

The pilot employed a two-agent architecture running on AWS Bedrock, leveraging Bedrock's native multi-agent orchestration capabilities. The agents communicated via a shared event bus managed by Bedrock Agents, with state
stored in a purpose-built vector database for temporal pattern matching.

- Agent 1: Llama 5 for Transaction Analysis. Fine-tuned on the consortium's historical transaction data, Llama 5 processed real-time transaction streams to extract features such as velocity, geolocation mismatches, and device fingerprints. It output structured transaction risk scores and flagged anomalous sequences for further review.
- Agent 2: Qwen 3.8 Max for Anomaly Scoring. This model ingested Llama 5's outputs along with cross-institutional signal data (e.g., IP blacklists, known fraud patterns). Qwen 3.8 Max applied a probabilistic scoring engine to assign a final anomaly probability per transaction, dynamically weighting features based on recent fraud trends.

Orchestration on AWS Bedrock allowed the agents to run as independent serverless functions, scaling automatically with transaction volume. The consortium
used Bedrock's built-in guardrails to enforce data partitioning and prevent cross-institutional data leakage, a critical requirement for multi-tenant banking environments.

Why Combine Llama 5 and Qwen 3.8 for Anomaly Scoring?

Choosing two distinct models was deliberate. Llama 5, released by Meta in early 2026, excels at high-throughput sequence classification with low per-token cost, making it ideal for real-time transaction parsing. Qwen 3.8 Max, Alibaba Cloud's latest reasoning-focused model, offers superior contextual anomaly detection through multi-step reasoning and support for long context windows, which is essential for linking transactions across hours or days.

Benchmark evaluations from the consortium's pre-pilot study showed:

- Llama 5 achieved 94.2% recall on known fraud categories with a latency of 12 ms per transaction.
- Qwen 3.8 Max boosted precision by 8.3% over a single Llama 5 model
on novel attack patterns, at a slightly higher latency of 45 ms per transaction.

By splitting the pipeline, the consortium kept the critical path (initial flagging) fast while allowing the second agent more compute for nuanced scoring, reducing overall false positives without sacrificing detection speed.

Pilot Results: 35% Faster Detection and 20% Fewer False Positives

The pilot ran for 90 days across three production regions (Europe, North America, Asia-Pacific), processing a combined 1.2 billion transactions. Key performance indicators:

| Metric | Multi-Agent System | Single-Agent Baseline (Llama 5 only) | Improvement |
| :--- | :--- | :--- | :--- |
| Average detection time | 0.8 seconds | 1.23 seconds | 35% faster |
| False positive rate | 2.1% | 2.63% | 20% reduction |
| Recall on known fraud | 96.5% | 94.2% | +2.3 percentage points |
| Recall on novel fraud | 78.4% | 67.1% | +11.3 percentage points |

Notably, the multi-agent architecture raised recall on novel fraud patterns (attacks not seen in training data) by 11.3 percentage points, thanks to Qwen 3.8 Max's anomaly detection capabilities. The consortium also reported a 15% reduction in manual investigation workload due to fewer false positives.

Compliance with GDPR and CCPA in Multi-Agent Deployments

Multi-agent architectures introduce unique compliance challenges because multiple models may process personal data at different pipeline stages. The consortium developed a compliance checklist based on GDPR Article 5 (data minimization) and CCPA Section 1798.100 (right to know):

- Data minimization by design: Each agent only received the minimum data required for its task. Llama 5 never saw PII; it processed tokenized transaction IDs and feature vectors. Qwen 3.8 Max handled only aggregated anomaly scores and anonymized metadata.
- Consent management: Transaction data was pseudonymized before entering the agent pipeline. A consent dashboard allowed customers to opt out of AI-based scoring, routing those transactions to a rule-based fallback.
- Audit trails: AWS Bedrock's logging captured every agent invocation, input/output pair, and decision path. Logs were i