Amazon Bedrock Pricing Models: On-Demand vs Provisioned Throughput + Model Decision Tree

By Sam Qikaka

Category: Models & Releases

Unlock the unit economics of Amazon Bedrock's pricing models, comparing on-demand and provisioned throughput for Nova, Claude Opus, and Llama profiles. Follow our practical decision tree to select the best model for enterprise RAG and agent workloads.

Amazon Bedrock Model Menu Overview

Amazon Bedrock gives businesses a unified API for more than 100 foundation models (FMs) from leading providers such as Amazon (Nova series), Anthropic (Claude), and Meta (Llama). As of May 2026, the catalog includes high-performance options suited to enterprise workloads such as Retrieval Augmented Generation (RAG) pipelines, coding agents, and multimodal inference, all without managing the underlying infrastructure. Key features include serverless scaling, Bedrock Guardrails for safety controls, and customization through fine-tuning or RAG.

Pricing is model-specific and depends on the throughput mode you select. For current rates, always refer to the official Amazon Bedrock pricing page (aws.amazon.com/bedrock/pricing). The specific values you choose dictate the applicable rates. You can pay per use with on-demand pricing or commit to provisioned capacity for predictable, large-scale usage.

This guide breaks down Bedrock's pricing models and unit economics, and offers a decision framework for operations leaders focused on optimizing AI costs.

On-Demand vs. Provisioned Throughput Explained

Amazon Bedrock provides two primary throughput modes to accommodate different usage patterns:

On-Demand: You pay only for the tokens processed. This mode suits workloads with variable demand or occasional bursts. It requires no upfront commitment, and the service scales automatically to meet demand. Billing is typically per 1,000 input and output tokens (or the equivalent unit for image and video data in multimodal models).

Provisioned Throughput: You commit to dedicated model capacity for a specified term (AWS has historically offered 1-month and 6-month commitments, along with a no-commitment hourly option; confirm current terms on the pricing page). This guarantees predictable latency and throughput. Throughput is quantified in tokens per minute (TPM); AWS sells provisioned capacity in model units, each of which supports a set number of input and output tokens per minute. Provisioned Throughput can offer significant discounts, typically ranging from 30% to 70% compared to on-demand rates, which makes it cost-effective for steady, high-volume usage. You can configure it in the AWS console by specifying the model and the commitment duration.

According to AWS documentation (as of May 2026), provisioned throughput is generally recommended for production RAG or agent workloads that maintain over 90% utilization. On-demand is better suited to prototyping or highly sporadic usage.

Unit Economics Breakdown by Throughput Mode

The unit economics of Amazon Bedrock vary significantly with the throughput mode and the model you choose. Always consult the official Amazon Bedrock pricing page (aws.amazon.com/bedrock/pricing) for the current rates for your chosen model.

On-Demand Economics

Per-Token Billing: You are billed for the number of tokens processed. For example, hypothetical rates for a given model might be around $10-$15 per million input tokens and $30-$75 per million output tokens (these figures are illustrative and based on prior pricing tiers; always verify current rates).

Key Cost Drivers: Output tokens are priced higher than input tokens, so workloads that generate long outputs, especially on premium models such as Claude Opus, cost more overall. RAG adds retrieval latency and, because retrieved passages are appended to the prompt, increases the number of billed input tokens.

Break-Even Point: On-demand pricing is generally cost-effective for workloads with utilization below 50-70%. It also handles unpredictable spikes in demand without any capacity planning.

Provisioned Throughput Economics

Hourly Rate for Reserved Capacity: You pay an hourly rate for reserved model capacity. For instance, committing to a model for 6 months could offer a discount of approximately 50% compared to its on-demand equivalent (based on AWS pricing card, May 2026).

TPM Allocation: You can purchase capacity ranging from 1 million to over 100 million TPM. Unused provisioned capacity does not roll over, but the reservation guarantees a specific quality of service (QoS).

Scenario Examples:

RAG Workload (10,000 QPM): On-demand costs might range from $0.05 to $0.20 per query. With provisioned throughput at scale, this can drop to $0.02 to $0.08 per query.

Coding Agent: For agents that require extensive input (including prompts and tool usage), provisioned throughput for powerful models like Opus can be more economical when usage is sustained, and it also delivers predictable performance.

Methodology Tip: Use the AWS Pricing Calculator to estimate costs. Enter your required TPM, the specific model, and the desired commitment duration. For agent workloads, consider applying a 1.2x to 2x multiplier for output tokens relative to input tokens.

Key Models: Nova