Building a Multi-Agent System for Legal Contract Review: A Vendor-Neutral Guide

By Sam Qikaka

Category: Agents & Architecture

Learn how a three-agent architecture on AWS Bedrock (using Qwen 3.8 Max, Llama 5, and a fine-tuned model) automates contract clause extraction, risk assessment, and compliance reporting. A pilot with a mid-sized law firm achieved a 40% reduction in review time and a 25% improvement in accuracy.

Why Legal Departments Are Turning to Multi-Agent Architectures

Legal teams face mounting pressure: contract volumes are rising, regulatory requirements are growing more complex, and budgets are constrained. Existing tools often fall short, whether they are purely manual reviews or single-model AI assistants. A single large language model (LLM) can extract clauses, but it may lack domain-specific risk frameworks or the ability to generate structured compliance reports. Multi-agent architectures address this by dividing the workflow into specialized tasks, each handled by a model or agent optimized for that function. The result is higher accuracy, faster throughput, and clearer audit trails.

Key drivers include:

- Increased contract complexity: Modern contracts contain nested clauses, conditional obligations, and cross-references.
- E-discovery demands: Litigation support now requires processing millions of documents quickly.
- Regulatory compliance: Frameworks such as GDPR and CCPA, along with sector-specific rules, require systematic checks.
- Cost pressure: Law firms and corporate legal departments aim to reduce billable hours spent on repetitive review.

A multi-agent approach also enables independent updates: each agent can be swapped out or fine-tuned without retraining the entire system.

Architecture Overview: A Three-Agent System for Contract Analysis and Compliance

The architecture consists of three agents running on AWS Bedrock using its multi-agent collaboration capability (released in late 2025). A central orchestrator (a lightweight agent built on Bedrock AgentCore) coordinates the workflow:

1. Agent 1 - Clause Extractor: Uses Qwen 3.8 Max to identify and extract key clauses (e.g., indemnification, termination, governing law) from uploaded contracts.
2. Agent 2 - Risk Assessor: Uses Llama 5 to assign risk scores to the extracted clauses based on a custom risk taxonomy.
3. Agent 3 - Compliance Reporter: Uses a fine-tuned model (based on a compact foundation model such as Mistral 7B or Llama 3.2) to generate a structured compliance report that references the relevant regulations.

Data flows sequentially: raw PDF or DOCX → extraction → risk scoring → report generation. Each agent can also call external APIs (for example, regulatory databases) when it needs additional data. All processing stays within the AWS environment to meet data-locality requirements.

Agent 1: Contract Clause Extraction with Qwen 3.8 Max

Qwen 3.8 Max is a large-scale instruction-tuned model from Alibaba Cloud with strong performance on structured information extraction. On AWS Bedrock, we deploy it with a temperature of 0 and a top-p of 0.1 to maximize determinism.

Implementation details:

- Input: Raw text from contracts (up to 32K tokens per document).
- Output: JSON array of clauses with fields: , , , .
- Prompt engineering: A system prompt instructs the model to return only clauses from a predefined list (e.g., 15 common types). Common clause types are extracted zero-shot; few-shot examples are added to the prompt only for rare clause types.

Accuracy: During the pilot, extraction precision reached 92% and recall 89% after prompt tuning. The model handles dense legalese well, but ambiguous clauses (e.g., “best efforts” without a definition) are flagged for human review.

Cost per document: Based on AWS Bedrock pricing as of May 2026, Qwen 3.8 Max costs approximately $0.015 per 1K input tokens and $0.06 per 1K output tokens. For a typical contract with 5,000 input tokens and 1,500 output tokens, that works out to 5 × $0.015 + 1.5 × $0.06 = $0.075 + $0.09 = $0.165 per document. Batch processing reduces cost slightly via token caching.

Agent 2: Risk Assessment Using Llama 5

Llama 5 (Meta AI) brings improved reasoning and a larger context window (128K tokens) than its predecessor. We fine-tune it on a proprietary dataset of 10,000 clause–risk pairs annotated by in-house legal experts. Fine-tuning uses LoRA through AWS Bedrock’s fine-tuning service (available since Q1 2026).

Risk scoring logic:

- Each clause is scored on a 1–5 scale along three dimensions: legal exposure, financial exposure, and regulatory exposure.
- The three scores are combined with weights (e.g., legal 0.5, financial 0.3, regulatory 0.2) into an overall score, which maps to a risk tier: Low, Medium, or High. Because the weights sum to 1.0, the overall score stays on the same 1–5 scale.
- Llama 5 also provides a text explanation for each score, helping auditors understand how it was reached.

Integration with Agent 1: The orchestrator passes Agent 1’s extraction JSON directly to Agent 2’s API, so no re-parsing is needed. Latency is about 2 seconds per clause, so a contract with 20 clauses takes roughly 40 seconds when clauses are scored one after another.

Cost: Llama 5 on Bedrock is priced at $0.012 per 1K input tokens and $0.045 per 1K output tokens (fine-tuned endpoints may carry a 25% markup). For a typical clause with 500 input tokens and 300 output tokens, each clause costs 0.5 × $0.012 + 0.3 × $0.045 = $0.0195, so 20 clauses cost about $0.39 at base rates, or about $0.49 with the 25% markup.

Agent 3: Compliance Report Generation (Fine-Tuned Specialized Model)

Agent 3 generates a compliance report that maps clauses to sp