Published On: 29 May 2025
The rise of Large Language Models (LLMs) like Claude, GPT, and Jurassic-2 has revolutionized how businesses automate, create, and serve customers. These models can write content, summarize data, generate code, and even hold contextual conversations. Their growing use across industries—healthcare, legal, education, finance—signals a shift in how organizations operate. From startups to Fortune 500 companies, the adoption of LLMs is rapidly accelerating, becoming a key driver of innovation and productivity in AI-powered workflows.
However, one of the biggest challenges remains: how do you host these powerful models efficiently, securely, and at scale?
Enter Amazon Bedrock—a serverless, fully managed service by AWS that allows you to build and scale generative AI applications using a selection of top-tier foundation models. This blog dives into everything you need to know to get started.
Imagine having instant access to the world’s most advanced AI models—without worrying about infrastructure, scaling, or complex deployments. That’s the promise of Amazon Bedrock.
Built by AWS, Bedrock is a fully managed service that lets you build generative AI applications using a curated selection of foundation models from leading AI companies. Whether you’re generating text, building chatbots, summarizing documents, or enhancing your product with AI-powered features, Bedrock gives you the tools to do it—all through a simple API.
With Bedrock, you can tap into powerful models from:
Anthropic (Claude)
AI21 Labs (Jurassic-2)
Cohere (Command R+)
Meta (Llama family)
Mistral (Mixtral)
Stability AI (image generation)
Amazon (Titan models)
No GPUs to manage. No containers to orchestrate. Just plug into the model you need and start building.
Choose from several models with different strengths—text generation, summarization, Q&A, coding—under one roof.
No infrastructure to set up. Bedrock handles everything behind the scenes, including scaling and load management.
Bedrock gives you a unified API for calling different models, making development faster and more standardized.
Use features like fine-tuning, retrieval-augmented generation (RAG), and Knowledge Bases to adapt models to your data.
Traffic can stay private through VPC endpoints, with IAM permissions, logging, and encrypted data handling, making Bedrock well suited to sensitive Large Language Model use cases.
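To make the unified API mentioned above concrete, here is a minimal sketch of calling a model through Bedrock's Converse API with boto3. The model ID and region are examples; any model your account has been granted access to works the same way:

```python
import boto3

# Bedrock inference goes through the "bedrock-runtime" client.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# The Converse API uses one request shape across providers, so swapping
# models is a one-line change.
response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example model ID
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the benefits of serverless AI hosting."}],
        }
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.5},
)

print(response["output"]["message"]["content"][0]["text"])
```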
Foundation Model Access: Switch between models from Anthropic, AI21, Cohere, and more.
Custom Model Workflows: Enhance Large Language Models with proprietary data.
RAG + Knowledge Bases: Inject current and domain-specific knowledge into your LLM responses (see the sketch after this list).
Guardrails: Apply safety filters to manage output behavior.
Monitoring & Analytics: View usage metrics, model latency, and token consumption.
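As a sketch of the RAG + Knowledge Bases feature, the snippet below queries a Knowledge Base and generates a grounded answer in one call. The knowledge base ID is a hypothetical placeholder; the model ARN follows Bedrock's standard foundation-model format:

```python
import boto3

# Knowledge Base queries go through the "bedrock-agent-runtime" client.
agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB12345678",  # hypothetical Knowledge Base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)

# The response contains the generated answer plus citations into your data.
print(response["output"]["text"])
```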
Chatbots and Virtual Assistants
Automated Knowledge Retrieval (RAG)
Content Generation & Copywriting
Customer Support Automation
Code Explanation and Assistance
Contract and Legal Document Summarization
| Criteria | Self-Hosting | SageMaker | Bedrock |
| --- | --- | --- | --- |
| Infra Required | Yes | Partial | No |
| Model Variety | Depends | Few | Many |
| Scalability | Manual | High | Auto |
| Customization | Advanced | Advanced | Moderate (RAG/Fine-tune) |
| Setup Time | High | Medium | Low |
Amazon Bedrock offers a fully managed experience for building, scaling, and optimizing generative AI applications. Pricing is structured around two primary dimensions: which model you invoke and how you invoke it (on-demand, batch, latency-optimized, or provisioned throughput).
Below is a breakdown of all pricing models, categorized by usage type.
| Plan | Mode | Billing Unit | Use Case |
| --- | --- | --- | --- |
| On-Demand | Real-time | Per 1,000 tokens / image | Pay-per-use; no commitment |
| Batch Mode | Bulk | Per 1,000 tokens | ~50% cheaper; S3 storage for input/output |
| Latency-Optimized | Low-latency | Per 1,000 tokens | Fast responses for Claude 3.5 Haiku, Nova Pro, Llama 3.1 |
| Provisioned Throughput | Reserved | Hourly per model unit | Consistent high-volume workloads |
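For the Provisioned Throughput plan, capacity is reserved ahead of time. Here is a minimal sketch; the provisioned model name and unit count are illustrative, and it assumes your account already has access to the model:

```python
import boto3

# Control-plane operations (provisioning, model import) use the "bedrock" client.
bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.create_provisioned_model_throughput(
    provisionedModelName="prod-summarizer",   # illustrative name
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    modelUnits=1,                              # capacity reserved, billed hourly
    commitmentDuration="OneMonth",             # omit for no-commitment hourly billing
)
```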
| Feature | Unit | Pricing | Notes |
| --- | --- | --- | --- |
| Import Fee | – | Free | No charge to import model weights |
| Active Model Copy | Per model / minute | Starts at $0.0785 | Billed in 5-minute windows; varies by model & region |
| Monthly Storage | Per model | $1.95 | Charged monthly per custom model stored |
| Customization Type | Pricing Metric | Details |
| --- | --- | --- |
| Fine-tuning / Pretraining | Per 1,000 tokens | Based on number of epochs × corpus size |
| Model Distillation | Per 1,000 tokens | Teacher inference + student fine-tuning |
| Storage Fee | Monthly per model | $1.95/month |
| Inference on Custom Model | Per hour per model unit | Starts at $22/hour (with commitment) |
| Prompt Caching | Per cached token | Up to 90% discount and 85% latency improvement |
| Type | Billing | Details |
| --- | --- | --- |
| Proprietary Models | Software + infra cost | Provider sets software fee; infra charged by instance |
| Public Models | Infra cost only | No software fee |
| Autoscaling Support | Yes | Customize instance count/type, autoscale policies |
| Tool | Pricing Unit | Price | Use Case |
| --- | --- | --- | --- |
| Prompt Optimization | Per 1,000 tokens | $0.030 | Improves clarity and token efficiency |
| Prompt Caching | Per cached token | Up to 90% discount | Latency boost for repeated context |
| Prompt Management (UI) | Included in API access | No separate fee | Test, version, and compare prompts |
| Tool | Unit | Price | Region/Notes |
| --- | --- | --- | --- |
| Bedrock Flows | 1,000 node transitions | $0.035 | Workflow execution via visual builder/API |
| Structured SQL Gen | 1,000 queries | $2.00 | Converts NL to SQL; available in US East (Ohio) |
| Intelligent Prompt Router | 1,000 requests | $1.00 | Cost-effective model switching |
| Filter Type | Pricing Unit | Price |
| --- | --- | --- |
| Text Content Filter | 1,000 text units | $0.15 |
| Denied Topics | 1,000 text units | $0.15 |
| Sensitive Info Filter | 1,000 text units | $0.10 |
| Regex / Word Filter | – | Free |
| Image Content Filter | Per image | $0.00075 |
| Contextual Grounding Check | 1,000 text units | $0.10 |
| Evaluation Type | Unit | Price | Notes |
| --- | --- | --- | --- |
| Human Task Evaluation | Per completed task | $0.21 | Appears under SageMaker billing; you provide the workteam |
| LLM-as-a-Judge Evaluation | Token usage per model | Based on selected model | Charges apply to both generator + judge models |
| Bring Your Own Responses | Human task or judge model cost | Same as above | Useful if skipping model inference and only evaluating responses |
Amazon Bedrock’s pricing is organized into several categories to help businesses plan and scale their AI workloads efficiently:
Inference pricing (on-demand, batch, latency-optimized, provisioned throughput)
Custom model hosting and import
Fine-tuning and prompt caching
Bedrock Flows for workflow automation
Guardrails for content filtering and policy enforcement
SQL generation and data automation
Rerank models, prompt optimization, and evaluation tools
Each pricing category is broken down in detailed tables that follow, allowing teams to choose the right features based on use case, budget, and performance requirements.
To improve clarity and readability, model pricing has been broken into sub-tables by provider.
| Feature | Price | Notes |
| --- | --- | --- |
| Hosting per minute | $0.0785 | Billed in 5-minute units |
| Monthly storage | $1.95 | Per custom model stored |
| AI21 Labs Model | Input Tokens (per 1K) | Output Tokens (per 1K) |
| --- | --- | --- |
| Jamba 1.5 Large | $0.002 | $0.008 |
| Jamba 1.5 Mini | $0.0002 | $0.0004 |
| Jurassic-2 Mid | $0.0125 | $0.0125 |
| Jurassic-2 Ultra | $0.0188 | $0.0188 |
| Jamba-Instruct | $0.0005 | $0.0007 |
| Amazon Model | Input Tokens (per 1K) | Output Tokens (per 1K) | Notes |
| --- | --- | --- | --- |
| Nova Micro | $0.000035 | $0.00014 | On-Demand |
| Nova Premier | Up to $0.0025 | Up to $0.0125 | Latency-optimized available |
| Nova Canvas (image) | – | – | $0.04–$0.08/image by resolution |
| Nova Sonic (speech) | $0.0034 | $0.0136 | Speech-to-speech/text |
| Anthropic Model | Input Tokens (per 1K) | Output Tokens (per 1K) | Notes |
| --- | --- | --- | --- |
| Claude Opus 4 | $0.015 | $0.075 | US-only |
| Claude 3.5 Sonnet | $0.003 | $0.015 | Global availability |
| Claude 3 Haiku | $0.00025 | $0.00125 | Caching and batch supported |
| Cohere Model | Input Tokens (per 1K) | Output Tokens (per 1K) | Notes |
| --- | --- | --- | --- |
| Command R+ | $0.003 | $0.015 | |
| Command-Light | $0.0003 | $0.0006 | |
| Rerank 3.5 | – | – | $2.00 per 1,000 queries |
| DeepSeek Model | Input Tokens (per 1K) | Output Tokens (per 1K) |
| --- | --- | --- |
| DeepSeek-R1 | $0.00135 | $0.0054 |
| Meta Model | Input Tokens (per 1K) | Output Tokens (per 1K) |
| --- | --- | --- |
| Llama 4 Maverick 17B | $0.00024 | $0.00097 |
| Llama 3.2 Instruct (1B) | $0.0001 | $0.0001 |
| Mistral Model | Input Tokens (per 1K) | Output Tokens (per 1K) |
| --- | --- | --- |
| Pixtral Large (25.02) | $0.002 | $0.006 |
| Stability AI Model | Price per Image |
| --- | --- |
| Stable Diffusion 3.5 Large | $0.08 |
| SDXL 1.0 (1024×1024) | $0.04 (standard) / $0.08 (premium) |
| Writer Model | Input Tokens (per 1K) | Output Tokens (per 1K) |
| --- | --- | --- |
| Palmyra X4 | $0.0025 | $0.010 |
| Palmyra X5 | $0.0006 | $0.006 |
Understanding On-Demand, Batch, and Latency-Optimized Modes
Below is a comparison of the three primary inference modes in Amazon Bedrock:
| Mode | Use Case | Billing Basis | Advantages |
| --- | --- | --- | --- |
| On-Demand | Real-time apps with dynamic load | Per 1,000 tokens or image | Flexible, scalable; no setup or commitment required |
| Batch | Large-scale predictions processed together | Per 1,000 tokens | ~50% cheaper than On-Demand; input/output handled via S3 |
| Latency-Optimized | Interactive apps needing instant responses | Per 1,000 tokens | Ultra-fast responses; optimized for specific models (e.g., Claude 3.5 Haiku, Llama 3.1 405B/70B) |
These modes allow teams to select the right trade-off between responsiveness, scale, and cost depending on their application’s requirements.
A token is a basic unit of text (typically a few characters) used by models to understand and respond to prompts.
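Since billing is per 1,000 tokens, a quick back-of-envelope estimate shows how requests translate into cost. Using the Claude 3.5 Sonnet On-Demand rates from the Anthropic table above ($0.003 input / $0.015 output per 1K tokens):

```python
# Hypothetical request: a 2,000-token prompt producing a 500-token answer.
input_tokens = 2_000
output_tokens = 500

cost = (input_tokens / 1000) * 0.003 + (output_tokens / 1000) * 0.015
print(f"Estimated cost per request: ${cost:.4f}")  # -> $0.0135
```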
Cross-Region Inference: Supported in On-Demand mode. Lets you manage traffic bursts using compute from multiple AWS Regions. No extra cost—pricing is based on your source region.
Batch Mode: Ideal for bulk predictions. Submit multiple prompts via a file and receive batched responses in an S3 bucket. Batch inference is offered at ~50% lower cost than On-Demand mode for select models.
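Here is a minimal sketch of submitting a batch inference job, assuming a JSONL file of prompts already sits in S3. The job name, bucket paths, and IAM role ARN are placeholders:

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Batch inference reads records from S3 and writes responses back to S3.
bedrock.create_model_invocation_job(
    jobName="nightly-summaries",  # placeholder
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",  # placeholder
    inputDataConfig={
        "s3InputDataConfig": {"s3Uri": "s3://my-bucket/batch-input/records.jsonl"}
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://my-bucket/batch-output/"}
    },
)
```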
Latency-Optimized Mode: Offers faster response times for interactive applications. Available for select models such as Claude 3.5 Haiku and Llama 3.1 405B/70B.
| Feature | Metric / Unit | Price | Region | Notes |
| --- | --- | --- | --- | --- |
| Bedrock Flows | 1,000 node transitions | $0.035 | Global | Workflow execution billing starts Feb 1, 2025 |
| SQL Generation | 1,000 queries | $2.00 | US East (Ohio) | Structured data query generation |
| Rerank Models | 1,000 queries | $1.00 | US West (Oregon) | Amazon-rerank-v1.0 |
| Guardrails (text filters) | 1,000 text units | $0.15 (content), $0.10 (contextual) | Global | Regex and word filters are free |
| Guardrails (image filters) | Per image | $0.00075 | Global | |
| Model Evaluation | Per human task | $0.21 | Global | Model inference charged separately |
| Data Automation (Standard) | Per unit | Docs: $0.01/page, Images: $0.003 | US East (N. Virginia) | Audio: $0.006/min, Video: $0.05/min |
| Data Automation (Custom) | Per unit | Docs: $0.04/page, Images: $0.005 | US East (N. Virginia) | Extra $0.0005 per field beyond 30 |
| Intelligent Prompt Routing | 1,000 requests | $1.00 | Global | Optimizes cost/accuracy between related model variants |
| Prompt Optimization | 1,000 tokens | $0.030 | Global | Billed monthly from April 23, 2025 |
| Policy Type | Pricing Unit | Price | Notes |
| --- | --- | --- | --- |
| Text Content Filter | 1,000 text units | $0.15 | Applies to offensive or inappropriate text content |
| Denied Topics | 1,000 text units | $0.15 | Filters specified topics from responses |
| Sensitive Information Filters | 1,000 text units | $0.10 | Protects PII or confidential content |
| Sensitive Info (Regex-based) | – | Free | User-defined regex filters |
| Word Filters | – | Free | Simple keyword-based blocking |
| Image Content Filter | Per image | $0.00075 | Filters NSFW or unsafe visual content |
| Contextual Grounding Check | 1,000 text units | $0.10 | Verifies model output alignment with reference sources |
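Guardrails are created once (in the console or via API) and then attached to inference calls. Here is a minimal sketch of attaching one to a Converse request; the guardrail identifier and version are placeholders:

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# The guardrail filters both the incoming prompt and the model's response.
response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "Tell me about our product."}]}],
    guardrailConfig={
        "guardrailIdentifier": "gr-abc123",  # placeholder guardrail ID
        "guardrailVersion": "1",
    },
)
```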
Standard Output (US East – N. Virginia): documents $0.01/page, images $0.003/image, audio $0.006/minute, video $0.05/minute.
Custom Output: documents $0.04/page, images $0.005/image, plus $0.0005 per extracted field beyond 30.
Intelligent Prompt Routing dynamically routes prompts across models in a family (e.g., Claude 3.5 Sonnet ↔ Claude 3 Haiku) to optimize cost and accuracy, billed at $1.00 per 1,000 requests.
Prompt Optimization costs $0.030 per 1,000 tokens (billed monthly starting April 23, 2025).
When hosting custom models or using provisioned throughput on Amazon Bedrock, each model requires one or more Model Units, which determine the VM capacity needed to serve inferences reliably. A Model Unit is a compute bundle optimized for different model architectures and sizes. Here’s how VM sizing translates to model usage and estimated monthly costs:
| Model Name | Estimated Model Units | Hourly Rate (No Commitment) | Monthly Cost (720 hrs) | Use Case |
| --- | --- | --- | --- | --- |
| Claude 3.5 Haiku | 1 | $0.001 | ~$0.72 | Fast Q&A, basic chatbots |
| Claude 3.5 Sonnet | 1 | $0.0036 | ~$2.59 | Enterprise content summarization |
| Claude Instant | 1 | $0.044 | ~$31.68 | General-purpose LLMs |
| Claude 2.0 / 2.1 | 1 | $0.070 | ~$50.40 | Advanced contextual understanding |
| Llama 3.1 Instruct (8B) | 2 | $0.0785 × 2 | ~$113.04 | Mid-size generative tasks |
| Llama 3.1 Instruct (70B) | 8 | $0.0785 × 8 | ~$452.16 | High-load applications, assistants |
| Cohere Command-Light | 1 | $0.00856 | ~$6.16 | Low-latency inference |
| Cohere Command R+ | 1 | $0.0495 | ~$35.64 | Complex document understanding |
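The Monthly Cost column above is simple arithmetic: hourly rate × model units × hours in the month (the table assumes 720 hours):

```python
def monthly_cost(hourly_rate: float, model_units: int, hours: int = 720) -> float:
    """Estimated monthly cost for a provisioned model."""
    return hourly_rate * model_units * hours

# Llama 3.1 Instruct (70B): 8 model units at $0.0785/unit/hour
print(f"${monthly_cost(0.0785, 8):,.2f}")  # -> $452.16
```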
That said, Bedrock has a few limitations worth weighing:
Limited fine-tuning compared to full open-source model control
Usage quotas may apply depending on account limits
Still evolving in terms of cost transparency and regional support
If you’re looking to quickly build, test, and scale Large Language Model-powered apps without managing compute infrastructure, Amazon Bedrock is one of the fastest and safest ways to go live.
Its flexibility, model diversity, and serverless setup make it ideal for startups, enterprises, and developers who want to focus on innovation—not infrastructure.
By giving you access to the world’s leading Large Language Models, Bedrock removes the roadblocks that usually slow down AI development. You don’t need to manage GPUs, handle scaling logic, or patch together separate APIs. Instead, you can bring your generative AI use case to life in days—not months.