Designing a Multi-Agent Energy Trading Architecture on AWS Bedrock: A 50-Trader Pilot

By Sam Qikaka

Category: Agents & Architecture

Discover how a multi-agent system using Llama 5, Qwen 3.8 Max, and a fine-tuned settlement agent on AWS Bedrock cut settlement time by 35% and reconciliation errors by 28% in a real energy trading pilot. This guide breaks down the architecture, handoff logic, and cost-per-trade benchmarks for enterprise adoption.

Why Multi-Agent Systems Are Transforming Energy Trading Settlement

Energy trading settlement involves complex workflows: trade confirmation, invoice matching, payment netting, and regulatory reporting. Manual processes or monolithic AI models often introduce bottlenecks and errors, especially when trades span multiple desks, counterparties, and jurisdictions. Multi-agent systems address this by distributing specialized tasks across LLM-powered agents, each optimized for a specific function. The pilot demonstrated that dividing settlement into discrete agentic steps (triage, validation, reconciliation, and archival) yields significant efficiency gains without sacrificing auditability.

System Architecture Overview: AWS Bedrock and the Three-Agent Setup

The pilot architecture ran entirely on AWS Bedrock, using its managed runtime for foundation models and built-in orchestration capabilities.
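To make the four settlement steps named earlier (triage, validation, reconciliation, archival) concrete, here is a minimal Python sketch of a trade record that moves through them in a fixed order and logs each handoff for auditability. It is an illustration only, not the pilot's implementation: the `Stage` enum, the `TradeRecord` class, the `advance` method, and the trade id are all hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    TRIAGE = "triage"
    VALIDATION = "validation"
    RECONCILIATION = "reconciliation"
    ARCHIVAL = "archival"


# Assumed ordering: a trade may only advance to the next stage in this list.
STAGE_ORDER = [Stage.TRIAGE, Stage.VALIDATION, Stage.RECONCILIATION, Stage.ARCHIVAL]


@dataclass
class TradeRecord:
    """Hypothetical per-trade state with an append-only audit trail."""
    trade_id: str
    stage: Stage = Stage.TRIAGE
    audit_log: list = field(default_factory=list)

    def advance(self, note: str) -> None:
        """Move to the next stage and record the handoff."""
        index = STAGE_ORDER.index(self.stage)
        if index == len(STAGE_ORDER) - 1:
            raise ValueError(f"{self.trade_id} is already in the final stage")
        next_stage = STAGE_ORDER[index + 1]
        self.audit_log.append((self.stage.value, next_stage.value, note))
        self.stage = next_stage


record = TradeRecord("T-0001")  # hypothetical trade id
record.advance("triage complete")
record.advance("validation complete")
print(record.stage.value, len(record.audit_log))
```

Because every transition goes through one method that appends to the log, the history of a trade can be reconstructed after the fact, which is the auditability property the section refers to.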

The system consisted of three primary agents:

Trade Extraction Agent (Llama 5 405B): Ingests unstructured trade confirmations from emails, APIs, and spreadsheets, extracting standardized fields (counterparty, volume, price, delivery window).

Settlement Logic Agent (Qwen 3.8 Max): Performs matching and netting calculations, applies contract terms, and flags discrepancies.

Settlement Agent (Fine-tuned Llama 5 70B): Finalizes trade settlements, generates confirmations, and updates internal ledgers. This agent was fine-tuned on a proprietary dataset of historical trades and reconciliation outcomes.

All three agents communicated through Bedrock’s native agent routing, with a centralized state store (Amazon DynamoDB) tracking each trade’s lifecycle. The architecture used asynchronous messaging via Amazon SQS to decouple agent execution and allow parallel processing where possible.

Model Roles: Llama 5 vs Qwen 3.8 Max in the Settlement Pipeline

Llama 5 and Qwen 3.8 Max were chosen based on their complementary strengths. Llama 5, particularly the 405B variant, excels at reasoning over complex, multi-step instructions and handling ambiguous extractive tasks. In the pilot, it processed over 10,000 trade confirmations per day with 99.2% field extraction accuracy.

Qwen 3.8 Max, with its 128K context window and strong multilingual support, was ideal for the settlement logic agent. That agent needed to compare contract terms, handle cross-currency netting, and reference long historical tables, all tasks where the model's extended context and math reasoning outperformed general-purpose models. Benchmark comparisons from the pilot showed Qwen 3.8 Max reducing false-positive flags by 22% compared to an earlier GPT-4 baseline.

The fine-tuned settlement agent (Llama 5 70B) was kept smaller to optimize latency and cost. Fine-tuning on 8,000 annotated trades improved its ability to navigate edge cases like partial fills and late confirmations, directly contributing to the 28% error reduction.

Agent Handoff Logic and the Three-Layer Validation Protocol

The critical innovation in the pilot was an explicit three-layer validation protocol that governed handoffs between agents, ensuring no trade progressed without multiple independent checks.

1. Layer 1 – Extraction Verification: After the Trade Extraction Agent outputs structured data, the Settlement Logic Agent independently re-runs extraction on a random 5% of fields using a parallel lookup. Any mismatch triggers a re-extraction cycle.

2. Layer 2 – Logic Consistency Check: Before the Settlement Logic Agent’s netting result passes to the Settlement Agent, the system reads a copy of the raw contract terms and recomputes expected net amounts using a deterministic rules engine. If the deviation exceeds 0.01% of trade value, the flow is paused for human review.

3. Layer 3 – Final Settlement Validation: The Settlement Agent’s output (payment instructions, ledger entries) is cross-checked against the original trade confirmation and the logic agent’s netting result by a lightweight validator agent (a small rule-based model).

Only trades that pass all three layers are automatically booked. This protocol added approximately 2 seconds per trade but eliminated 96% of manual reconciliation interventions in the pilot.

Cost-per-Trade Benchmarks and Scaling Considerations

Cost efficiency was a key design goal. Using AWS Bedrock with on-demand pricing, the per-trade cost breakdown was approximately:

Trade Extraction: $0.012 per trade (Llama 5 405B, 1,500 input + 300 output tokens)

Settlement Logic: $0.008 per trade (Qwen 3.8 Max, 2,000 input + 500 output tokens)

Fine-tuned Settlement: $0.005 per trade (Llama 5 70B, 800 input + 400 output tokens)

Bedrock orchestration & storage: $0.003 per trade

Total: $0.028 per trade (including validation layer overhead)

These figures are based on published AWS Bedrock pricing as of May 2026 and the pilot’s average token