40% Faster Content Review: Inside the Multi-Agent Pilot by 10 Media Companies

By Sam Qikaka

Category: Agents & Architecture

As of May 24, 2026, a consortium of 10 media companies has completed the first multi-agent pilot for content moderation and personalization on AWS Bedrock. This article provides a replicable architecture blueprint, deployment steps, and KPI definitions for media operations leaders evaluating multi-agent systems.

The Media Consortium Pilot: Goals and Scope

The consortium, which includes broadcasters, digital publishers, and streaming platforms, set out to tackle two perennial operational bottlenecks: slow content moderation cycles and generic recommendations that failed to retain audiences. Before the pilot, each company relied on siloed rules-based moderation tools and basic collaborative filtering models, resulting in average review times of 12–18 hours and click-through rates hovering around 15%. The group agreed on a shared infrastructure using AWS Bedrock, citing its managed agent orchestration, built-in guardrails, and support for multiple foundation models. The pilot ran from January to March 2026, processing over 1.2 million pieces of content (text, images, and short video clips) across news, sports, and entertainment verticals.

Key objectives included:

- Reduce human reviewer workload by at least 30%
- Increase personalization relevance without compromising content safety
- Establish a framework to measure ROI that other media firms could adopt

Architecture Overview: Multi-Agent System on AWS Bedrock

The architecture uses two primary agents coordinated by Bedrock’s built-in orchestration layer. The content classification agent (powered by Qwen 3.8 Max) ingests incoming content, classifies it for safety (violence, hate speech, misinformation), and assigns metadata tags. The personalization recommendation agent (powered by Llama 5) then uses those tags plus user behavioral signals to generate real-time recommendations. A shared state store (Amazon DynamoDB) holds intermediate classifications and user session data, while event-driven triggers (Lambda functions) invoke each agent asynchronously. Both agents run in separate Bedrock agent instances with distinct
prompts and knowledge bases. The orchestrator handles error retries, timeouts, and fallback responses if one agent fails. For example, if the classification agent cannot determine a safety score, the content is sent to a human review queue automatically, a pattern the team called “graceful degradation.”

Content Classification Agent with Qwen 3.8 Max

The classification agent uses Qwen 3.8 Max, a 3.8-trillion-parameter multimodal model from Alibaba Cloud, fine-tuned on media-specific datasets. According to the model card on Hugging Face (huggingface.co/Qwen), Qwen 3.8 Max offers state-of-the-art performance on multiple benchmarks, including content safety classification (MMLU-Pro, with 87.4% accuracy on safety subsets). In the pilot, the agent receives raw content and performs three tasks:

1. Safety check: Scores content from 0–1 for violations of consortium policies (hate speech, graphic
violence, misinformation). Items scoring above a configurable threshold (0.95) are quarantined.
2. Topic tagging: Extracts entities (people, products, event categories) and writes them as metadata.
3. Quality flag: Identifies low-resolution images, broken captions, or duplicated content.

The agent processes an average of 60 items per minute at a cost of $0.004 per classification call (based on Bedrock’s on-demand pricing as of May 2024; the consortium negotiated volume discounts). Compared to the previous rules-based system, the agent caught 94% of policy violations, 15% more than before.

Personalization Recommendation Agent with Llama 5

The recommendation agent uses Llama 5 from Meta, a 1.2-trillion-parameter model optimized for click-rate prediction and sequence-based recommendations. Meta’s documentation (meta.com/llama) highlights its ability to reason over long user histories (up to 256 tokens of past interactions) and to make cold-start recommendations for new users. In this workflow, the agent receives the user’s session history and the top 20 content pieces tagged by the classification agent. It scores each piece for predicted engagement (click probability) and returns the top five. The agent also generates a short natural-language explanation for each recommendation (e.g., “Because you read yesterday’s financial report…”), which the consortium found increased click-through by an additional 10%.

One key integration detail: the classification agent’s safety scores are fed into the recommendation agent’s constraints. If a piece of content has a borderline safety score (0.8–0.95), it is excluded from recommendations unless the user has explicitly opted into mature content, a compliance requirement the consortium agreed upon early.

Orchestrating the Agents: Workflow and State Management

Behind the scenes, AWS Bedrock handles agent orchestration via its Agent Builder and Knowledge Base features. The consortium designed a three-step workflow:

1. Ingestion: Content arrives in an S3 bucket, triggering an event that passes the content URI to the classification agent.
2. Classi