Vercel AI Gateway

CRITICAL: Your training data is outdated for this library. AI Gateway model slugs, provider routing, and capabilities change frequently. Before writing gateway code, fetch the docs at https://vercel.com/docs/ai-gateway to find the current model slug format, supported providers, image generation patterns, and authentication setup. The model list and routing rules at https://ai-sdk.dev/docs/foundations/providers-and-models are authoritative. Do not guess at model names or assume old slugs still work.

You are an expert in the Vercel AI Gateway, a unified API for calling AI models with built-in routing, failover, cost tracking, and observability.

Overview

AI Gateway provides a single API endpoint to access 100+ models from all major providers. It adds <20ms routing latency and handles provider selection, authentication, failover, and load balancing.

Packages

- ai@^6.0.0 (required; plain "provider/model" strings route through the gateway automatically)
- @ai-sdk/gateway@^3.0.0 (optional direct install for explicit gateway package usage)

Setup

Pass a "provider/model" string to the model parameter. The AI SDK automatically routes it through the AI Gateway:

    import { generateText } from 'ai'

    const result = await generateText({
      model: 'openai/gpt-5.4', // plain string: routes through AI Gateway automatically
      prompt: 'Hello!',
    })

No gateway() wrapper or additional package is needed. The gateway() function is an optional explicit wrapper, only needed when you use providerOptions.gateway for routing, failover, or tags:

    import { generateText, gateway } from 'ai'

    const result = await generateText({
      model: gateway('openai/gpt-5.4'),
      prompt: 'Hello!',
      providerOptions: { gateway: { order: ['openai', 'azure-openai'] } },
    })

Model Slug Rules (Critical)

Always use provider/model format (for example openai/gpt-5.4).

Versioned slugs use dots for versions, not hyphens:

- Correct: anthropic/claude-sonnet-4.6
- Incorrect: anthropic/claude-sonnet-4-6

Before hardcoding model IDs, call gateway.getAvailableModels() and pick from the returned IDs.

Default text models: openai/gpt-5.4 or anthropic/claude-sonnet-4.6. Do not default to outdated choices like openai/gpt-4o.

    import { gateway } from 'ai'

    const availableModels = await gateway.getAvailableModels()
    // Choose model IDs from availableModels before hardcoding.

Authentication (OIDC, Default)

AI Gateway uses OIDC (OpenID Connect) as the default authentication method. No manual API keys are needed.

Setup

    vercel link                   # Connect to your Vercel project
    # Enable AI Gateway in Vercel dashboard: https://vercel.com/{team}/{project}/settings → AI Gateway
    vercel env pull .env.local    # Provisions VERCEL_OIDC_TOKEN automatically

How It Works

- vercel env pull writes a VERCEL_OIDC_TOKEN to .env.local, a short-lived JWT (~24h)
- The @ai-sdk/gateway package reads this token via @vercel/oidc (getVercelOidcToken())
- No AI_GATEWAY_API_KEY or provider-specific keys (like ANTHROPIC_API_KEY) are needed
- On Vercel deployments, OIDC tokens are auto-refreshed, so no maintenance is required

Local Development

For local dev, the OIDC token from vercel env pull is valid for ~24 hours. When it expires:

    vercel env pull .env.local --yes    # Re-pull to get a fresh token
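Because the local token is a short-lived JWT, a dev script can check how much lifetime it has left before deciding to re-pull. The sketch below reads the standard `exp` claim without verifying the signature. The helper names `decodeJwtPayload` and `secondsUntilExpiry` are hypothetical (they are not part of @vercel/oidc), and the code assumes the token is an ordinary three-part JWT.

```typescript
// Hypothetical helpers (not part of @vercel/oidc): inspect a JWT's expiry locally.
// The signature is NOT verified; this only helps decide when to re-pull.

function decodeJwtPayload(token: string): Record<string, unknown> {
  const parts = token.split('.');
  const payload = parts[1];
  if (parts.length !== 3 || payload === undefined) {
    throw new Error('Not a JWT: expected three dot-separated parts');
  }
  // JWTs use base64url; convert to standard base64 and restore padding.
  const base64 = payload.replace(/-/g, '+').replace(/_/g, '/');
  const padded = base64 + '='.repeat((4 - (base64.length % 4)) % 4);
  return JSON.parse(atob(padded)) as Record<string, unknown>;
}

// Seconds until the token's `exp` claim; negative once the token has expired.
function secondsUntilExpiry(token: string, nowMs: number = Date.now()): number {
  const exp = decodeJwtPayload(token)['exp'];
  if (typeof exp !== 'number') {
    throw new Error('JWT payload has no numeric "exp" claim');
  }
  return exp - Math.floor(nowMs / 1000);
}
```

A script could call `secondsUntilExpiry` on the value of VERCEL_OIDC_TOKEN (after confirming it is set) and re-run `vercel env pull .env.local --yes` when the result is close to zero.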

Alternative: Manual API Key

If you prefer a static key (e.g., for CI or non-Vercel environments):

    # Set AI_GATEWAY_API_KEY in your environment
    # The gateway falls back to this when VERCEL_OIDC_TOKEN is not available
    export AI_GATEWAY_API_KEY=your-key-here

Auth Priority

The @ai-sdk/gateway package resolves authentication in this order:

1. AI_GATEWAY_API_KEY environment variable (if set)
2. VERCEL_OIDC_TOKEN via @vercel/oidc (default on Vercel and after vercel env pull)
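The resolution order above can be sketched as a small pure function. This is an illustration of the documented priority, not the package's actual implementation; `resolveGatewayAuth` and the `GatewayAuth` type are hypothetical names introduced here.

```typescript
// Hypothetical sketch of the documented priority; not the real @ai-sdk/gateway code.
type GatewayAuth =
  | { kind: 'api-key'; token: string }
  | { kind: 'oidc'; token: string }
  | { kind: 'none' };

function resolveGatewayAuth(env: Record<string, string | undefined>): GatewayAuth {
  // 1. A static API key wins when it is set to a non-empty value.
  const apiKey = env['AI_GATEWAY_API_KEY'];
  if (apiKey) {
    return { kind: 'api-key', token: apiKey };
  }
  // 2. Otherwise use the OIDC token provisioned by Vercel or `vercel env pull`.
  const oidcToken = env['VERCEL_OIDC_TOKEN'];
  if (oidcToken) {
    return { kind: 'oidc', token: oidcToken };
  }
  // Neither credential is present; a real call would fail to authenticate.
  return { kind: 'none' };
}
```

Passing the environment in as an argument (for example `resolveGatewayAuth(process.env)`) keeps the function easy to test and makes it clear which credential a given environment would use.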

Provider Routing

Configure how AI Gateway routes requests across providers:

    const result = await generateText({
      model: gateway('anthropic/claude-sonnet-4.6'),
      prompt: 'Hello!',
      providerOptions: {
        gateway: {
          // Try providers in order; failover to next on error
          order: ['bedrock', 'anthropic'],
          // Restrict to specific providers only
          only: ['anthropic', 'vertex'],
          // Fallback models if primary model fails
          models: ['openai/gpt-5.4', 'google/gemini-3-flash'],
          // Track usage per end-user
          user: 'user-123',
          // Tag for cost attribution and filtering
          tags: ['feature:chat', 'env:production', 'team:growth'],
        },
      },
    })

Routing Options

| Option | Purpose |
| --- | --- |
| order | Provider priority list; try first, failover to next |
| only | Restrict to specific providers |
| models | Fallback model list if primary model unavailable |
| user | End-user ID for usage tracking |
| tags | Labels for cost attribution and reporting |

Cache-Control Headers

AI Gateway supports response caching to reduce latency and cost for repeated or similar requests:

    const result = await generateText({
      model: gateway('openai/gpt-5.4'),
      prompt: 'What is the capital of France?',
      providerOptions: {
        gateway: {
          // Cache identical requests for 1 hour
          cacheControl: 'max-age=3600',
        },
      },
    })

Caching strategies

| Header Value | Behavior |
| --- | --- |
| max-age=3600 | Cache response for 1 hour |
| max-age=0 | Bypass cache, always call provider |
| s-maxage=86400 | Cache at the edge for 24 hours |
| stale-while-revalidate=600 | Serve stale for 10 min while refreshing in background |

When to use caching

- Static knowledge queries: FAQs, translations, factual lookups. Cache aggressively.
- User-specific conversations: Do not cache. Each response depends on conversation history.
- Embeddings: Cache embedding results for identical inputs to save cost.
- Structured extraction: Cache when extracting structured data from identical documents.

Cache key composition

The cache key is derived from: model, prompt/messages, temperature, and other generation parameters. Changing any parameter produces a new cache key.
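To make the "any parameter change produces a new key" behavior concrete, here is a minimal sketch of building a canonical key from a request. The gateway's real key derivation is not documented in this section, so `stableStringify`, `cacheKey`, and the `CacheableRequest` shape are all assumptions for illustration; a production version would typically hash the resulting string.

```typescript
// Illustrative only: not the gateway's actual cache key algorithm.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// JSON.stringify with object keys sorted, so property order never changes the key.
function stableStringify(value: Json): string {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  }
  const entries = Object.keys(value)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key] as Json)}`);
  return `{${entries.join(',')}}`;
}

// Hypothetical request shape covering the inputs named in the text above.
type CacheableRequest = {
  model: string;
  messages: { role: string; content: string }[];
  temperature?: number;
  maxOutputTokens?: number;
};

function cacheKey(request: CacheableRequest): string {
  return stableStringify({
    model: request.model,
    messages: request.messages.map((m) => ({ role: m.role, content: m.content })),
    // Missing parameters are normalized to null so "unset" is a distinct, stable value.
    temperature: request.temperature ?? null,
    maxOutputTokens: request.maxOutputTokens ?? null,
  });
}
```

Two requests that differ only in property order map to the same key, while changing `temperature` or any message yields a different one, which mirrors the rule stated above.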

Per-User Rate Limiting

Control usage at the individual user level to prevent abuse and manage costs:

    const result = await generateText({
      model: gateway('openai/gpt-5.4'),
      prompt: userMessage,
      providerOptions: {
        gateway: {
          user: userId, // Required for per-user rate limiting
          tags: ['feature:chat'],
        },
      },
    })

Rate limit configuration

Configure rate limits at https://vercel.com/{team}/{project}/settings → AI Gateway → Rate Limits:

- Requests per minute per user: Throttle individual users (e.g., 20 RPM)
- Tokens per day per user: Cap daily token consumption (e.g., 100K tokens/day)
- Concurrent requests per user: Limit parallel calls (e.g., 3 concurrent)
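To illustrate what a "requests per minute per user" limit means in practice, the following sketch implements a sliding-window check in application code. It is not how the gateway enforces limits (that is configured in the dashboard); `SlidingWindowLimiter` is a hypothetical class that an app might use to reject obvious excess traffic before a request is even sent.

```typescript
// Hypothetical in-memory limiter; the gateway's own enforcement is separate.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if the user is under the limit.
  tryAcquire(userId: string, nowMs: number = Date.now()): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only timestamps that still fall inside the window.
    const recent = (this.hits.get(userId) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) {
      recent.push(nowMs);
    }
    this.hits.set(userId, recent);
    return allowed;
  }
}
```

For the 20 RPM example above, `new SlidingWindowLimiter(20, 60000)` would allow at most 20 calls per user in any 60-second window. State lives in process memory, so this only works as a first line of defense on a single instance.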

Handling rate limit responses

When a user exceeds their limit, the gateway returns HTTP 429:

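    import { generateText, gateway, APICallError } from 'ai'

    // Route-handler sketch: userMessage and userId come from the incoming request
    async function handleChat(userMessage: string, userId: string): Promise<Response> {
      try {
        const result = await generateText({
          model: gateway('openai/gpt-5.4'),
          prompt: userMessage,
          providerOptions: { gateway: { user: userId } },
        })
        return new Response(JSON.stringify({ text: result.text }))
      } catch (error) {
        if (APICallError.isInstance(error) && error.statusCode === 429) {
          // Pass the gateway's retry hint through to the client
          const retryAfter = error.responseHeaders?.['retry-after']
          return new Response(
            JSON.stringify({ error: 'Rate limited', retryAfter }),
            { status: 429 }
          )
        }
        throw error
      }
    }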
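The `retry-after` value read in the handler above is a raw header string. Per the HTTP specification it can be either a number of seconds or an HTTP date, so a client that wants to wait before retrying needs to normalize it. The helper below is a minimal sketch; `retryAfterToMs` is a hypothetical name, and whether the gateway sends seconds or a date is not stated in this document.

```typescript
// Hypothetical helper: convert a Retry-After header value into a delay in milliseconds.
// Returns undefined when the header is missing or cannot be parsed.
function retryAfterToMs(
  header: string | undefined,
  nowMs: number = Date.now(),
): number | undefined {
  if (header === undefined) {
    return undefined;
  }
  const trimmed = header.trim();
  // Form 1: delay in whole seconds, e.g. "30".
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }
  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) {
    return undefined;
  }
  // Dates in the past mean "retry now", never a negative wait.
  return Math.max(0, dateMs - nowMs);
}
```

A caller can fall back to its own backoff schedule when the function returns `undefined`.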

Budget Alerts and Cost Controls

Tagging for cost attribution

Use tags to track spend by feature, team, and environment:

    providerOptions: {
      gateway: {
        tags: [
          'feature:document-qa',
          'team:product',
          'env:production',
          'tier:premium',
        ],
        user: userId,
      },
    }

Setting up budget alerts

In the Vercel dashboard at https://vercel.com/{team}/{project}/settings → AI Gateway:

1. Navigate to AI Gateway → Usage & Budgets
2. Set monthly budget thresholds (e.g., $500/month warning, $1000/month hard limit)
3. Configure alert channels (email, Slack webhook, Vercel integration)
4. Optionally set per-tag budgets for granular control

Budget isolation best practice

Use separate gateway keys per environment (dev, staging, prod) and per project. This keeps dashboards clean and budgets isolated:

- Restrict AI Gateway keys per project to prevent cross-tenant leakage
- Use per-project budgets and spend-by-agent reporting to track exactly where tokens go
- Cap spend during staging with AI Gateway budgets

Pre-flight cost controls

The AI Gateway dashboard provides observability (traces, token counts, spend tracking) but no programmatic metrics API. Build your own cost guardrails by estimating token counts and rejecting expensive requests before they execute:

    import { generateText } from 'ai'

    function estimateTokens(text: string): number {
      return Math.ceil(text.length / 4) // rough estimate
    }

    async function callWithBudget(prompt: string, maxTokens: number) {
      const estimated = estimateTokens(prompt)
      if (estimated > maxTokens) {
        throw new Error(`Prompt too large: ~${estimated} tokens exceeds ${maxTokens} limit`)
      }
      return generateText({ model: 'openai/gpt-5.4', prompt })
    }

The AI SDK's usage field on responses gives actual token counts after each request. Store these for historical tracking and cost analysis.

Hard spending limits

When a hard limit is reached, the gateway returns HTTP 402 (Payment Required). Handle this gracefully:

    if (APICallError.isInstance(error) && error.statusCode === 402) {
      // Budget exceeded: degrade gracefully
      return fallbackResponse()
    }

Cost optimization patterns

- Use cheaper models for classification/routing, expensive models for generation
- Cache embeddings and static queries (see Cache-Control above)
- Set per-user daily token caps to prevent runaway usage
- Monitor cost-per-feature with tags to identify optimization targets
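The first pattern, cheaper models for classification and routing with a stronger model for generation, can be expressed as a small lookup. The task names and the `pickModel` helper are hypothetical, and treating `anthropic/claude-haiku-4.5` as the cheaper tier is an assumption made for illustration; check current pricing before adopting a mapping like this.

```typescript
// Hypothetical task tiers; adjust the slugs to whatever current pricing favors.
type Task = 'classification' | 'routing' | 'generation';

const MODEL_FOR_TASK: Record<Task, string> = {
  // Assumed cheaper tier for short, structured decisions.
  classification: 'anthropic/claude-haiku-4.5',
  routing: 'anthropic/claude-haiku-4.5',
  // Stronger default model for user-facing generation.
  generation: 'openai/gpt-5.4',
};

function pickModel(task: Task): string {
  return MODEL_FOR_TASK[task];
}
```

Keeping the mapping in one place means a pricing change only requires editing a single table, and the returned slug can be passed directly as the `model` string.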

Audit Logging

AI Gateway logs every request for compliance and debugging:

What's logged

- Timestamp, model, provider used
- Input/output token counts
- Latency (routing + provider)
- User ID and tags
- HTTP status code
- Failover chain (which providers were tried)

Accessing logs

- Vercel Dashboard at https://vercel.com/{team}/{project}/ai → Logs: filter by model, user, tag, status, date range
- Vercel API: Query logs programmatically:

    curl -H "Authorization: Bearer $VERCEL_TOKEN" \
      "https://api.vercel.com/v1/ai-gateway/logs?projectId=$PROJECT_ID&limit=100"

- Log Drains: Forward AI Gateway logs to Datadog, Splunk, or other providers via Vercel Log Drains (configure at https://vercel.com/dashboard/{team}/~/settings/log-drains) for long-term retention and custom analysis

Compliance considerations

- AI Gateway does not log prompt or completion content by default
- Enable content logging in project settings if required for compliance
- Logs are retained per your Vercel plan's retention policy
- Use the user field consistently to support audit trails

Error Handling Patterns

Provider unavailable

When a provider is down, the gateway automatically fails over if you configured order or models:

    const result = await generateText({
      model: gateway('anthropic/claude-sonnet-4.6'),
      prompt: 'Summarize this document',
      providerOptions: {
        gateway: {
          order: ['anthropic', 'bedrock'], // Bedrock as fallback
          models: ['openai/gpt-5.4'], // Final fallback model
        },
      },
    })

Quota exceeded at provider

If your provider API key hits its quota, the gateway tries the next provider in the order list. Monitor this in logs: persistent quota errors indicate you need to increase limits with the provider.

Invalid model identifier

    // Bad: model doesn't exist
    model: 'openai/gpt-99' // Returns 400 with descriptive error

    // Good: use models listed in Vercel docs
    model: 'openai/gpt-5.4'

Timeout handling

Gateway has a default timeout per provider. For long-running generations, use streaming:

    import { streamText } from 'ai'

    const result = streamText({
      model: 'anthropic/claude-sonnet-4.6',
      prompt: longDocument,
    })

    for await (const chunk of result.textStream) {
      process.stdout.write(chunk)
    }

Complete error handling template

    import { generateText, gateway, APICallError } from 'ai'

    async function callAI(prompt: string, userId: string) {
      try {
        return await generateText({
          model: gateway('openai/gpt-5.4'),
          prompt,
          providerOptions: {
            gateway: {
              user: userId,
              order: ['openai', 'azure-openai'],
              models: ['anthropic/claude-haiku-4.5'],
              tags: ['feature:chat'],
            },
          },
        })
      } catch (error) {
        // Only gateway/provider API errors carry a status code to branch on
        if (!APICallError.isInstance(error)) throw error
        switch (error.statusCode) {
          case 402:
            return { text: 'Budget limit reached. Please try again later.' }
          case 429:
            return { text: 'Too many requests. Please slow down.' }
          case 503:
            return { text: 'AI service temporarily unavailable.' }
          default:
            throw error
        }
      }
    }

Gateway vs Direct Provider: Decision Tree

Use this to decide whether to route through AI Gateway or call a provider SDK directly:

    Need failover across providers?
    └─ Yes → Use Gateway
    └─ No
       Need cost tracking / budget alerts?
       └─ Yes → Use Gateway
       └─ No
          Need per-user rate limiting?
          └─ Yes → Use Gateway
          └─ No
             Need audit logging?
             └─ Yes → Use Gateway
             └─ No
                Using a single provider with provider-specific features?
                └─ Yes → Use direct provider SDK
                └─ No → Use Gateway (simplifies code)

When to use direct provider SDK

- You need provider-specific features not exposed through the gateway (e.g., Anthropic's computer use, OpenAI's custom fine-tuned model endpoints)
- You're self-hosting a model (e.g., vLLM, Ollama) that isn't registered with the gateway
- You need request-level control over HTTP transport (custom proxies, mTLS)

When to always use Gateway

- Production applications: failover and observability are essential
- Multi-tenant SaaS: per-user tracking and rate limiting
- Teams with cost accountability: tag-based budgeting

Claude Code Compatibility

AI Gateway exposes an Anthropic-compatible API endpoint that lets you route Claude Code requests through the gateway for unified observability, spend tracking, and failover.

Configuration

Set these environment variables to route Claude Code through AI Gateway:

    export ANTHROPIC_BASE_URL="https://ai-gateway.vercel.sh"
    export ANTHROPIC_AUTH_TOKEN="your-vercel-ai-gateway-api-key"
    export ANTHROPIC_API_KEY=""    # Must be empty string: Claude Code checks this first

Important

Setting ANTHROPIC_API_KEY to an empty string is required. Claude Code checks this variable first, and if it's set to a non-empty value, it uses that directly instead of ANTHROPIC_AUTH_TOKEN.

Claude Code Max Subscription

AI Gateway supports Claude Code Max subscriptions. When configured, Claude Code continues to authenticate with Anthropic via its Authorization header while AI Gateway uses a separate x-ai-gateway-api-key header, allowing both auth mechanisms to coexist. This gives you unified observability at no additional token cost.

Using Non-Anthropic Models

Override the default Anthropic models by setting:

    export ANTHROPIC_DEFAULT_SONNET_MODEL="openai/gpt-5.4"
    export ANTHROPIC_DEFAULT_OPUS_MODEL="anthropic/claude-opus-4.6"
    export ANTHROPIC_DEFAULT_HAIKU_MODEL="anthropic/claude-haiku-4.5"

Latest Model Availability

GPT-5.4 (added March 5, 2026): agentic and reasoning leaps from GPT-5.3-Codex extended to all domains (knowledge work, reports, analysis, coding). Faster and more token-efficient than GPT-5.2.

| Model | Slug | Input | Output |
| --- | --- | --- | --- |
| GPT-5.4 | openai/gpt-5.4 | $2.50/M tokens | $15.00/M tokens |
| GPT-5.4 Pro | openai/gpt-5.4-pro | $30.00/M tokens | $180.00/M tokens |

GPT-5.4 Pro targets maximum performance on complex tasks. Use standard GPT-5.4 for most workloads.
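The per-million-token prices above translate into request cost with simple arithmetic. The sketch below hardcodes the two listed prices as a snapshot; `PRICING` and `estimateCostUsd` are hypothetical names, and real prices should be re-checked against the current model list because they change.

```typescript
// Snapshot of the prices listed above (USD per million tokens); may be out of date.
type ModelPricing = { inputUsdPerMillion: number; outputUsdPerMillion: number };

const PRICING: Record<string, ModelPricing> = {
  'openai/gpt-5.4': { inputUsdPerMillion: 2.5, outputUsdPerMillion: 15 },
  'openai/gpt-5.4-pro': { inputUsdPerMillion: 30, outputUsdPerMillion: 180 },
};

// Hypothetical helper: cost in USD for one request's input and output token counts.
function estimateCostUsd(slug: string, inputTokens: number, outputTokens: number): number {
  const pricing = PRICING[slug];
  if (pricing === undefined) {
    throw new Error(`No pricing entry for ${slug}`);
  }
  const totalMicroUnits =
    inputTokens * pricing.inputUsdPerMillion + outputTokens * pricing.outputUsdPerMillion;
  return totalMicroUnits / 1000000;
}
```

Feeding the actual token counts from a response's usage field into a function like this gives a per-request cost that can be stored alongside tags for later analysis. At these prices the same request costs twelve times more on GPT-5.4 Pro than on GPT-5.4.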

Supported Providers

- OpenAI (GPT-5.x including GPT-5.4 and GPT-5.4 Pro, o-series)
- Anthropic (Claude 4.x)
- Google (Gemini)
- xAI (Grok)
- Mistral
- DeepSeek
- Amazon Bedrock
- Azure OpenAI
- Cohere
- Perplexity
- Alibaba (Qwen)
- Meta (Llama)
- And many more (100+ models total)

Pricing

- Zero markup: Tokens at exact provider list price, with no middleman markup, whether using Vercel-managed keys or Bring Your Own Key (BYOK)
- Free tier: Every Vercel team gets $5 of free AI Gateway credits per month (refreshes every 30 days, starts on first request). No commitment required; experiment with LLMs indefinitely on the free tier
- Pay-as-you-go: Beyond free credits, purchase AI Gateway Credits at any time with no obligation. Configure auto top-up to automatically add credits when your balance falls below a threshold
- BYOK: Use your own provider API keys with zero fees from AI Gateway

Multimodal Support

Text and image generation both route through the gateway. For embeddings, use a direct provider SDK.

    import { generateText, experimental_generateImage as generateImage } from 'ai'

    // Text: through gateway
    const { text } = await generateText({
      model: 'openai/gpt-5.4',
      prompt: 'Hello',
    })

    // Image: through gateway (multimodal LLMs return images in result.files)
    const result = await generateText({
      model: 'google/gemini-3.1-flash-image-preview',
      prompt: 'A sunset over the ocean',
    })
    const images = result.files.filter((f) => f.mediaType?.startsWith('image/'))

    // Image-only models: through gateway with experimental_generateImage
    const { images: generated } = await generateImage({
      model: 'google/imagen-4.0-generate-001',
      prompt: 'A sunset',
    })

Default image model: google/gemini-3.1-flash-image-preview, for fast multimodal image generation via gateway.

See AI Gateway Image Generation docs for all supported models and integration methods.

Key Benefits

- Unified API: One interface for all providers, no provider-specific code
- Automatic failover: If a provider is down, requests route to the next
- Cost tracking: Per-user, per-feature attribution with tags
- Observability: Built-in monitoring of all model calls
- Low latency: <20ms routing overhead
- No lock-in: Switch models/providers by changing a string

When to Use AI Gateway

| Scenario | Use Gateway? |
| --- | --- |
| Production app with AI features | Yes: failover, cost tracking |
| Prototyping with single provider | Optional: direct provider works fine |
| Multi-provider setup | Yes: unified routing |
| Need provider-specific features | Use direct provider SDK + Gateway as fallback |
| Cost tracking and budgeting | Yes: user tracking and tags |
| Multi-tenant SaaS | Yes: per-user rate limiting and audit |
| Compliance requirements | Yes: audit logging and log drains |

Official Documentation

- AI Gateway (https://vercel.com/docs/ai-gateway)
- Providers and Models (https://ai-sdk.dev/docs/foundations/providers-and-models)
- AI SDK Core
- GitHub: AI SDK
