Portkey
The Gray Failure Nobody Saw Coming
The VP of Engineering reviewed the post-mortem document in silence for a long time. The incident had been a "gray failure" - the worst kind. Not a hard crash where alerts fire and dashboards go red, but a slow, invisible degradation. The AI-powered customer support chat had been returning generic error messages for three hours before a support engineer noticed that the ticket volume was unusually high. By the time the on-call team understood what had happened, the damage was done: three hours of degraded service for 40,000 users.
The cause: Anthropic had experienced a partial regional outage affecting a specific availability zone. API requests weren't failing fast - they were timing out after 45 seconds. The support chat service waited the full 45 seconds on each request, blocking the thread pool. Once the thread pool was exhausted, the entire service appeared hung rather than degraded. From the user's perspective, the chat box just sat there, spinning, never responding.
What made the VP furious was not the outage itself - third-party services degrade. What made her furious was the complete absence of visibility. There were no distributed traces showing the slow LLM calls. There were no latency percentile metrics trending upward before the incident peaked. There were no alerts on P95 call time. The team was flying completely blind. By the time they understood the failure pattern, an additional thirty minutes of downtime had passed, downtime that proactive alerting would have prevented.
The engineering team evaluated LiteLLM Proxy and Portkey. LiteLLM was powerful and self-hostable - but its observability required integrating an external tool like Langfuse, adding another system to maintain. Portkey had traces, cost analytics, user feedback collection, and latency dashboards built into the product as first-class features, not add-ons. Two weeks after deployment, when the next provider degraded, the on-call engineer saw the P95 latency spike in Portkey's dashboard within 60 seconds, triggered a config change to route around the degraded region, and contained the incident before any user noticed a problem.
Why This Exists
Portkey was founded in 2023 by Rohit Agarwal and Ayush Garg with a specific thesis: LLM infrastructure needs observability as a first-class concern, not an afterthought.
While LiteLLM focused on breadth of provider support and self-hosted flexibility, Portkey focused on the operational experience: how does an on-call engineer understand what is happening with LLM traffic right now? How does a platform team trace a specific user's broken session back to the exact LLM call that failed? How does a product team know which features are driving costs and whether the responses are actually good?
The architecture reflects this priority. Every Portkey feature - routing, fallbacks, retries, caching - generates traces automatically. You cannot use Portkey without getting observability. The tracing is not optional, not a plugin, and not an extra configuration step. It is the foundation.
By 2025, Portkey supported over 250 AI models across 45+ providers, with enterprise customers including Autodesk, Postman, and several Fortune 500 companies.
Core Architecture
Portkey operates as a managed cloud gateway. Your application sends requests to api.portkey.ai; Portkey applies the policy defined in your Config, forwards to the appropriate provider, logs everything, and returns the response. The policy layer - routing, fallbacks, guardrails, caching - is defined in a JSON Config object, not in application code.
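Because the routing policy lives in the Config, each request only needs to carry a few headers. The following is a minimal standard-library sketch that builds (but does not send) such a request. The header names are the ones used later in this article; the helper names, config ID, and trace ID are illustrative placeholders, and the sketch assumes the OpenAI-compatible /chat/completions path under https://api.portkey.ai/v1.

```python
import json
import urllib.request

# Gateway base URL as used elsewhere in this article (OpenAI-compatible API)
PORTKEY_BASE_URL = "https://api.portkey.ai/v1"


def build_portkey_headers(api_key: str, config_id: str, trace_id: str, metadata: dict) -> dict:
    """Assemble per-request headers; routing, retries, and fallbacks live in the Config."""
    return {
        "Content-Type": "application/json",
        "x-portkey-api-key": api_key,
        "x-portkey-config": config_id,
        "x-portkey-trace-id": trace_id,
        # Metadata travels as a JSON string so the gateway can index it for filtering
        "x-portkey-metadata": json.dumps(metadata, sort_keys=True),
    }


def build_chat_request(headers: dict, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-format chat request aimed at the gateway."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 300,
    }
    return urllib.request.Request(
        f"{PORTKEY_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    hdrs = build_portkey_headers(
        "pk-...", "pc-prod-fallback-abc", "support-chat-sess_abc123",
        {"user_id": "user_8821", "feature": "support-chat"},
    )
    req = build_chat_request(hdrs, "claude-sonnet-4-6", "How do I reset my password?")
    print(req.get_method(), req.full_url)
```

To actually send it, pass the request to urllib.request.urlopen; any HTTP client in any language can do the same, which is what makes the gateway language-agnostic.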
Virtual Keys: Secure Credential Management
Virtual keys are one of Portkey's most valuable features. Instead of embedding provider API keys in application code or environment variables - where they can be accidentally logged, committed to git, or exposed in error messages - you create virtual keys in the Portkey dashboard that map to real provider credentials stored in Portkey's secure vault.
What virtual keys provide:
- Rotate provider credentials without redeploying any application
- Set per-virtual-key rate limits and spend caps
- Revoke access instantly (useful when an employee leaves or a key is compromised)
- Audit which virtual key made which requests, with full trace history
- Scope keys to specific models or providers
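All five properties follow from one level of indirection: the application holds only an ID, and the gateway resolves it to a real secret at request time. The sketch below is a hypothetical in-memory model of that idea; VirtualKeyVault, resolve, rotate, and revoke are invented names, not Portkey's API or its implementation.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualKey:
    provider: str
    secret: str              # the real provider credential; never shipped to apps
    spend_cap_usd: float
    allowed_models: set[str]
    spent_usd: float = 0.0
    revoked: bool = False


@dataclass
class VirtualKeyVault:
    """Hypothetical stand-in for a gateway-side credential vault."""
    keys: dict[str, VirtualKey] = field(default_factory=dict)

    def resolve(self, vk_id: str, model: str, est_cost_usd: float) -> str:
        """Map a virtual key ID to its secret, enforcing revocation, model scope, and spend cap."""
        key = self.keys.get(vk_id)
        if key is None or key.revoked:
            raise PermissionError(f"virtual key {vk_id!r} is unknown or revoked")
        if model not in key.allowed_models:
            raise PermissionError(f"model {model!r} is not allowed for {vk_id!r}")
        if key.spent_usd + est_cost_usd > key.spend_cap_usd:
            raise PermissionError(f"spend cap reached for {vk_id!r}")
        key.spent_usd += est_cost_usd
        return key.secret

    def rotate(self, vk_id: str, new_secret: str) -> None:
        # Same ID, new credential: callers keep working without a redeploy
        self.keys[vk_id].secret = new_secret

    def revoke(self, vk_id: str) -> None:
        # Takes effect on the very next resolve() call
        self.keys[vk_id].revoked = True
```

Application code only ever sees "anthropic-prod-vk-abc123"; rotation and revocation change what that ID resolves to, not the ID itself.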
```python
import anthropic
import openai
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

# Portkey API key - this is your Portkey account credential
PORTKEY_API_KEY = "pk-..."

# Virtual keys created in the Portkey dashboard.
# These map to your actual provider credentials stored in Portkey's vault.
ANTHROPIC_VIRTUAL_KEY = "anthropic-prod-vk-abc123"
OPENAI_VIRTUAL_KEY = "openai-prod-vk-xyz456"


def call_claude_via_portkey(user_id: str, feature: str) -> str:
    """
    Call Claude through Portkey using the Anthropic SDK.
    The api_key is a placeholder (Portkey uses the virtual key).
    All metadata is attached to the trace in Portkey's dashboard.
    """
    client = anthropic.Anthropic(
        api_key="dummy",  # placeholder: Portkey authenticates upstream via the virtual key
        base_url=PORTKEY_GATEWAY_URL,
        default_headers=createHeaders(
            api_key=PORTKEY_API_KEY,
            virtual_key=ANTHROPIC_VIRTUAL_KEY,
            # metadata is indexed for filtering in Portkey analytics
            metadata={
                "user_id": user_id,
                "feature": feature,
                "environment": "production",
            },
        ),
    )
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        system="You are a concise technical assistant.",
        messages=[{"role": "user", "content": "What is a vector database?"}],
    )
    text = message.content[0].text
    print(f"Response: {text[:200]}")
    return text


def call_gpt4_via_portkey(user_id: str) -> str:
    """
    Call GPT-4o through Portkey using the OpenAI SDK.
    Same pattern - placeholder api_key, real virtual key in headers.
    """
    client = openai.OpenAI(
        api_key="dummy",
        base_url=PORTKEY_GATEWAY_URL,
        default_headers=createHeaders(
            api_key=PORTKEY_API_KEY,
            virtual_key=OPENAI_VIRTUAL_KEY,
            metadata={"user_id": user_id, "feature": "code-assistant"},
        ),
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": "Write a Python decorator that retries a function on exception."},
        ],
        max_tokens=400,
    )
    text = response.choices[0].message.content
    print(f"Response: {text[:200]}")
    return text


if __name__ == "__main__":
    call_claude_via_portkey(user_id="user_8821", feature="docs-assistant")
    call_gpt4_via_portkey(user_id="user_8821")
```
Configs: The Heart of Portkey's Routing
Portkey's Config system is its most powerful feature. A Config is a JSON object that defines the routing policy for a group of requests: which providers to try, in what order, with what retry behavior, what cache settings, and what guardrails to apply. Configs are created once (via API or dashboard) and referenced by ID in request headers. Updating a Config propagates to all traffic using it immediately - no application redeployment.
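The fallback part of a Config can be read as a small interpreter over its targets list. Here is a minimal simulation under stated assumptions: targets are tried in order; a status code listed in a target's retry.on_status_codes is retried up to retry.attempts more times; any other non-2xx response, or running out of retries, moves on to the next target. Portkey's exact semantics (for example, which codes trigger a fallback by default) should be checked against its docs; run_fallback and the send callback are hypothetical.

```python
from typing import Callable

# A "send" callback takes (target, attempt_number) and returns an HTTP status code
Send = Callable[[dict, int], int]

EXAMPLE_CONFIG = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"override_params": {"model": "claude-sonnet-4-6"},
         "retry": {"attempts": 2, "on_status_codes": [429, 500, 502, 503, 504]}},
        {"override_params": {"model": "gpt-4o"},
         "retry": {"attempts": 1, "on_status_codes": [429, 500, 502, 503, 504]}},
        {"override_params": {"model": "claude-haiku-4-5-20251001"}},
    ],
}


def run_fallback(config: dict, send: Send) -> tuple[int, list[tuple[str, int]]]:
    """Walk targets in order, retrying listed status codes, until one returns 2xx.

    Returns the final status and a log of (model, status) for every attempt.
    """
    if not config.get("targets"):
        raise ValueError("config has no targets")
    log: list[tuple[str, int]] = []
    status = 0
    for target in config["targets"]:
        retry = target.get("retry", {})
        retries_left = retry.get("attempts", 0)
        retryable = set(retry.get("on_status_codes", []))
        attempt = 0
        while True:
            status = send(target, attempt)
            log.append((target["override_params"]["model"], status))
            if 200 <= status < 300:
                return status, log
            if status in retryable and retries_left > 0:
                retries_left -= 1
                attempt += 1
                continue
            break  # non-retryable, or out of retries: fall through to the next target
    return status, log
```

With a primary that keeps returning 503, the log shows three Sonnet attempts (one try plus two retries) before GPT-4o answers; a non-retryable 400 skips straight to the next target.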
```python
import json

import httpx
import openai

PORTKEY_API_KEY = "pk-..."
ANTHROPIC_VIRTUAL_KEY = "anthropic-prod-vk-abc123"
OPENAI_VIRTUAL_KEY = "openai-prod-vk-xyz456"
PORTKEY_CONFIGS_URL = "https://api.portkey.ai/v1/configs"


def create_fallback_config() -> str:
    """
    Create a Config with automatic fallback:
    1. Claude Sonnet (primary) - retry up to 2 times on 429/5xx
    2. GPT-4o (first fallback) - retry once
    3. Claude Haiku (last resort) - no retry, just try once
    """
    config = {
        "strategy": {"mode": "fallback"},
        "targets": [
            {
                "virtual_key": ANTHROPIC_VIRTUAL_KEY,
                "override_params": {"model": "claude-sonnet-4-6"},
                "retry": {
                    "attempts": 2,
                    "on_status_codes": [429, 500, 502, 503, 504],
                },
                # request_timeout is in milliseconds: give up on this target
                # after 30 seconds so a slow provider can't block the request
                "request_timeout": 30000,
            },
            {
                "virtual_key": OPENAI_VIRTUAL_KEY,
                "override_params": {"model": "gpt-4o"},
                "retry": {
                    "attempts": 1,
                    "on_status_codes": [429, 500, 502, 503, 504],
                },
                "request_timeout": 30000,
            },
            {
                "virtual_key": ANTHROPIC_VIRTUAL_KEY,
                "override_params": {"model": "claude-haiku-4-5-20251001"},
                "request_timeout": 20000,
            },
        ],
        # Enable semantic caching across all targets in this config
        "cache": {
            "mode": "semantic",
            "max_age": 86400,  # 24-hour TTL
        },
    }
    response = httpx.post(
        PORTKEY_CONFIGS_URL,
        headers={"x-portkey-api-key": PORTKEY_API_KEY},
        json={"name": "prod-fallback-v1", "config": config},
    )
    response.raise_for_status()
    config_id = response.json()["id"]
    print(f"Fallback config created: {config_id}")
    return config_id


def create_load_balanced_config() -> str:
    """
    Create a load-balanced Config that distributes traffic by weight.
    40% to each Anthropic key, 20% to OpenAI.
    Useful for multi-key scaling and A/B provider testing.
    """
    config = {
        "strategy": {"mode": "loadbalance"},
        "targets": [
            {
                "virtual_key": "anthropic-key-1",
                "weight": 40,
                "override_params": {"model": "claude-sonnet-4-6"},
            },
            {
                "virtual_key": "anthropic-key-2",
                "weight": 40,
                "override_params": {"model": "claude-sonnet-4-6"},
            },
            {
                "virtual_key": OPENAI_VIRTUAL_KEY,
                "weight": 20,
                "override_params": {"model": "gpt-4o"},
            },
        ],
    }
    response = httpx.post(
        PORTKEY_CONFIGS_URL,
        headers={"x-portkey-api-key": PORTKEY_API_KEY},
        json={"name": "prod-loadbalance-v1", "config": config},
    )
    response.raise_for_status()
    config_id = response.json()["id"]
    print(f"Load balanced config created: {config_id}")
    return config_id


def create_guardrails_config() -> str:
    """
    Config with input and output guardrails.
    Blocks requests containing sensitive terms.
    Blocks responses that match SSN patterns.
    Retries responses that are too short.
    """
    config = {
        "strategy": {"mode": "single"},
        "targets": [
            {
                "virtual_key": ANTHROPIC_VIRTUAL_KEY,
                "override_params": {"model": "claude-sonnet-4-6"},
            },
        ],
        # Schematic rule format for readability; verify the exact guardrail
        # schema your account expects against Portkey's Guardrails docs
        "guardrails": {
            "input": [
                {
                    "type": "contains",
                    "deny": ["social security number", "SSN", "credit card number", "password"],
                    "action": "block",
                    "message": "Request blocked: sensitive data detected in prompt.",
                },
            ],
            "output": [
                {
                    "type": "regex",
                    "deny": [r"\b\d{3}-\d{2}-\d{4}\b"],  # SSN pattern
                    "action": "block",
                    "message": "Response blocked: sensitive data in LLM output.",
                },
                {
                    "type": "word_count",
                    "min": 10,  # at least 10 words
                    "action": "retry",  # retry if too short
                },
            ],
        },
    }
    response = httpx.post(
        PORTKEY_CONFIGS_URL,
        headers={"x-portkey-api-key": PORTKEY_API_KEY},
        json={"name": "prod-guardrails-v1", "config": config},
    )
    response.raise_for_status()
    return response.json()["id"]


def use_config_in_request(config_id: str, user_id: str, session_id: str) -> dict:
    """
    Use a Config by ID. The routing, fallbacks, and guardrails defined
    in the config are applied automatically on Portkey's side.
    No routing logic required in application code.
    """
    client = openai.OpenAI(
        api_key="dummy",
        base_url="https://api.portkey.ai/v1",
        default_headers={
            "x-portkey-api-key": PORTKEY_API_KEY,
            "x-portkey-config": config_id,
            "x-portkey-trace-id": f"support-{session_id}",
            "x-portkey-metadata": json.dumps({
                "user_id": user_id,
                "session_id": session_id,
                "feature": "support-chat",
            }),
        },
    )
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",  # the config's override_params may replace this
        messages=[
            {"role": "user", "content": "How do I reset my password?"},
        ],
        max_tokens=300,
    )
    return {
        "content": response.choices[0].message.content,
        "model": response.model,  # the model that actually served the request
    }


if __name__ == "__main__":
    fallback_id = create_fallback_config()
    lb_id = create_load_balanced_config()
    print(f"Configs ready: fallback={fallback_id}, loadbalance={lb_id}")
    result = use_config_in_request(fallback_id, "user_8821", "sess_abc123")
    print(f"Response: {result['content'][:200]}")
```
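Load-balance weights are relative, so 40/40/20 gives each request a 40%, 40%, or 20% chance of landing on a given target. A small simulation using random.choices, assuming independent per-request weighted sampling (Portkey's internal selection algorithm may differ); pick_target is a hypothetical helper:

```python
import random
from collections import Counter

TARGETS = [
    {"virtual_key": "anthropic-key-1", "weight": 40},
    {"virtual_key": "anthropic-key-2", "weight": 40},
    {"virtual_key": "openai-prod-vk-xyz456", "weight": 20},
]


def pick_target(targets: list[dict], rng: random.Random) -> dict:
    """Choose one target with probability proportional to its weight."""
    weights = [t["weight"] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the simulation is repeatable
    n = 10_000
    counts = Counter(pick_target(TARGETS, rng)["virtual_key"] for _ in range(n))
    for key, hits in counts.most_common():
        print(f"{key}: {hits / n:.1%}")
```

Because selection is per request, a short window of traffic can deviate noticeably from 40/40/20; the split only converges over many requests, which matters when reading A/B results from a small sample.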
Full Production Example: Traced Multi-Turn Application
The following is a production-quality application that uses Portkey for routing, full trace correlation across conversation turns, and feedback collection for quality tracking.
```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

import anthropic
import httpx
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

PORTKEY_API_KEY = "pk-..."
ANTHROPIC_VIRTUAL_KEY = "anthropic-prod-vk-abc123"
PRODUCTION_CONFIG_ID = "pc-prod-fallback-abc"  # created once via create_fallback_config()


@dataclass
class Turn:
    role: str
    content: str
    trace_id: str  # session-level trace ID shared by every turn
    span_id: str   # identifies this turn within the session trace
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    feedback_submitted: bool = False


@dataclass
class Conversation:
    session_id: str
    user_id: str
    feature: str
    turns: list[Turn] = field(default_factory=list)
    messages: list[dict] = field(default_factory=list)  # full history for context


class PortkeyTracedClient:
    """
    Production LLM client with:
    - Automatic fallback (Claude Sonnet -> GPT-4o -> Claude Haiku) via the Config
    - A session-level trace ID shared by all turns, plus a span ID per turn
    - Feedback API integration for quality tracking
    - Per-turn token and latency logging
    """

    def _build_client(
        self, user_id: str, session_id: str, feature: str, turn_id: str
    ) -> anthropic.Anthropic:
        """Build a client with trace metadata for a single turn."""
        return anthropic.Anthropic(
            api_key="dummy",
            base_url=PORTKEY_GATEWAY_URL,
            default_headers=createHeaders(
                api_key=PORTKEY_API_KEY,
                config=PRODUCTION_CONFIG_ID,
                virtual_key=ANTHROPIC_VIRTUAL_KEY,
                # trace_id groups all requests from the same session in Portkey
                trace_id=f"{feature}-{session_id}",
                # span_id identifies the individual turn within the session trace
                span_id=turn_id,
                metadata={
                    "user_id": user_id,
                    "session_id": session_id,
                    "feature": feature,
                    "environment": "production",
                    "turn_id": turn_id,
                },
            ),
        )

    def send_turn(
        self,
        conversation: Conversation,
        user_message: str,
        max_tokens: int = 1024,
    ) -> Turn:
        """
        Send a conversation turn. On success, appends the user message and the
        model response to the conversation history for multi-turn context.
        """
        turn_id = uuid.uuid4().hex[:8]
        trace_id = f"{conversation.feature}-{conversation.session_id}"
        user_entry = {"role": "user", "content": user_message}
        client = self._build_client(
            conversation.user_id, conversation.session_id, conversation.feature, turn_id
        )
        start = time.perf_counter()
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=max_tokens,
            messages=[*conversation.messages, user_entry],
        )
        latency_ms = (time.perf_counter() - start) * 1000
        response_text = response.content[0].text
        # Commit both sides of the exchange only after the call succeeds,
        # so a failed request never leaves a dangling user message in history
        conversation.messages.append(user_entry)
        conversation.messages.append({"role": "assistant", "content": response_text})
        turn = Turn(
            role="assistant",
            content=response_text,
            trace_id=trace_id,
            span_id=turn_id,
            model=response.model,
            input_tokens=response.usage.input_tokens,
            output_tokens=response.usage.output_tokens,
            latency_ms=latency_ms,
        )
        conversation.turns.append(turn)
        return turn

    def submit_feedback(
        self,
        turn: Turn,
        user_id: str,
        value: int,  # 1 = positive, -1 = negative, 0 = neutral
        comment: Optional[str] = None,
    ) -> bool:
        """
        Submit quality feedback for a turn via the Portkey Feedback API.
        Feedback attaches to the session-level trace_id; the span_id in the
        metadata records which turn it refers to.
        """
        if value > 0:
            label = "thumbs_up"
        elif value < 0:
            label = "thumbs_down"
        else:
            label = "neutral"
        payload = {
            "trace_id": turn.trace_id,
            "value": value,
            "label": label,
            "weight": 1,
            "metadata": {
                "user_id": user_id,
                "span_id": turn.span_id,
                "model": turn.model,
                "comment": comment or "",
            },
        }
        try:
            response = httpx.post(
                "https://api.portkey.ai/v1/feedback",
                headers={
                    "x-portkey-api-key": PORTKEY_API_KEY,
                    "Content-Type": "application/json",
                },
                json=payload,
                timeout=5.0,
            )
        except httpx.HTTPError as e:
            print(f"Feedback submission failed: {e}")
            return False
        if response.is_success:  # any 2xx status
            turn.feedback_submitted = True
        return response.is_success

    def get_session_analytics(self, trace_id: str) -> dict:
        """
        Retrieve analytics for a session trace from the Portkey API.
        Pass the full trace_id (f"{feature}-{session_id}"), not the bare session_id.
        Useful for post-session quality review.
        """
        # Endpoint and query parameters are illustrative; confirm them against
        # Portkey's Analytics API reference before relying on this call.
        response = httpx.get(
            "https://api.portkey.ai/v1/analytics/requests",
            headers={"x-portkey-api-key": PORTKEY_API_KEY},
            params={"trace_id": trace_id, "time_range": "24h"},
            timeout=10.0,
        )
        response.raise_for_status()
        return response.json()


def simulate_support_session() -> None:
    """Simulate a multi-turn support chat session with full tracing."""
    client = PortkeyTracedClient()
    conversation = Conversation(
        session_id=f"sess_{int(time.time())}",
        user_id="user_8821",
        feature="support-chat",
    )
    print(f"=== Session: {conversation.session_id} ===\n")

    # Turn 1
    turn1 = client.send_turn(
        conversation,
        user_message="I can't connect to my database after upgrading to v2.1.",
        max_tokens=512,
    )
    print(f"Turn 1 [{turn1.model}] ({turn1.latency_ms:.0f}ms)")
    print(f"  Tokens: {turn1.input_tokens}+{turn1.output_tokens}")
    print(f"  Response: {turn1.content[:200]}...\n")

    # Thumbs up on turn 1 (user found it helpful)
    success = client.submit_feedback(turn1, conversation.user_id, value=1,
                                     comment="Clear and actionable advice")
    print(f"  Feedback submitted: {success}\n")

    # Turn 2 - follow-up
    turn2 = client.send_turn(
        conversation,
        user_message="The error says 'connection refused on port 5432'. PostgreSQL is running.",
        max_tokens=512,
    )
    print(f"Turn 2 [{turn2.model}] ({turn2.latency_ms:.0f}ms)")
    print(f"  Response: {turn2.content[:200]}...\n")

    # Thumbs down on turn 2 (didn't mention firewall rules)
    client.submit_feedback(turn2, conversation.user_id, value=-1,
                           comment="Missed firewall/pg_hba.conf angle")

    # Session summary
    total_tokens = sum(t.input_tokens + t.output_tokens for t in conversation.turns)
    total_latency = sum(t.latency_ms for t in conversation.turns)
    rated = sum(1 for t in conversation.turns if t.feedback_submitted)
    print("=== Session Summary ===")
    print(f"Turns: {len(conversation.turns)}")
    print(f"Total tokens: {total_tokens}")
    print(f"Total latency: {total_latency:.0f}ms")
    print(f"Feedback submitted: {rated}/{len(conversation.turns)}")


if __name__ == "__main__":
    simulate_support_session()
```
Portkey vs LiteLLM: How to Choose
| Dimension | LiteLLM Proxy | Portkey |
|---|---|---|
| Deployment model | Self-hosted (Docker/K8s) | Managed SaaS (or enterprise self-hosted) |
| Data residency | Full control - stays in your infrastructure | Request data flows through Portkey |
| Setup time | 2–4 hours (Docker Compose) | 15 minutes (API key + SDK) |
| Observability | Requires Langfuse or similar | Built-in traces, cost analytics, dashboards |
| Provider/model support | 100+ | 250+ models across 45+ providers |
| Guardrails | Limited (custom via callbacks) | First-class (input/output, regex, word count) |
| Feedback API | Not built-in | Built-in - thumbs up/down per trace |
| Pricing | Open-source, hosting cost only | Usage-based SaaS subscription |
| Compliance | HIPAA/SOC 2 viable (your infrastructure) | Check Portkey's compliance certifications |
The choice is not which tool is better - it is which tradeoffs fit your context. Regulated industries often require self-hosted deployment. Teams that need operational velocity and built-in observability without infrastructure overhead typically choose Portkey.
The Cost of the Gray Failure: What Portkey Prevents
The incident described in the opening scenario - three hours of invisible degradation - is a direct result of missing three capabilities:
Missing: per-request latency tracking. Without P95/P99 latency trends per model, slow timeouts are invisible until they cascade into a full outage.
Missing: per-target timeout enforcement. Without a request_timeout on each provider target, one slow provider blocks the calling thread for as long as the client is willing to wait (45 seconds in this incident).
Missing: automatic fallback. Without a configured fallback chain, there is no mechanism to route around the degraded provider until the on-call engineer manually investigates and intervenes.
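The two missing mechanics, a bounded wait and a fallback, can be shown in-process with concurrent.futures. This is a sketch, not Portkey code: slow_primary and fast_fallback are hypothetical stand-ins, and the sleeps are shortened to fractions of a second.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# One shared pool, like a web server's worker pool
POOL = ThreadPoolExecutor(max_workers=4)


def slow_primary(prompt: str) -> str:
    time.sleep(0.5)  # simulates a provider that hangs instead of failing fast
    return f"primary: {prompt}"


def fast_fallback(prompt: str) -> str:
    return f"fallback: {prompt}"


def call_with_fallback(prompt: str, timeout_s: float) -> str:
    """Bound the wait on the primary; on timeout, answer from the fallback."""
    future = POOL.submit(slow_primary, prompt)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # We stop waiting, but the slow call keeps running in its worker thread
        return fast_fallback(prompt)
```

Note the comment in the except branch: the abandoned call still occupies a worker, which is exactly how a pool drains during a slow-timeout outage. Bounding the wait at the layer that makes the upstream call (request_timeout in the Config, or the timeout argument that the Anthropic and OpenAI Python SDK clients accept) keeps the calling thread's wait bounded end to end instead of merely hiding the slow call.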
Portkey's Config system addresses all three. A production config that would have prevented the incident:
```json
{
  "strategy": {"mode": "fallback"},
  "targets": [
    {
      "virtual_key": "anthropic-prod",
      "override_params": {"model": "claude-sonnet-4-6"},
      "request_timeout": 15000,
      "retry": {"attempts": 1, "on_status_codes": [429, 500, 502, 503]}
    },
    {
      "virtual_key": "openai-prod",
      "override_params": {"model": "gpt-4o"},
      "request_timeout": 15000
    }
  ]
}
```
With request_timeout set to 15000 (milliseconds, i.e. 15 seconds), slow Anthropic responses fail fast after 15 seconds instead of blocking for 45. With the fallback strategy, the request then moves straight on to GPT-4o. The user sees a response after 15–18 seconds instead of a 45-second hang. The support team sees the fallback rate spike in Portkey's dashboard within minutes, not hours.
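The 15-18 second figure depends on the whole chain, so it helps to compute the worst case explicitly. A sketch under two assumptions worth confirming in Portkey's docs: request_timeout is in milliseconds, and a timed-out attempt surfaces as HTTP 408, which is retried only if 408 appears in the target's retry.on_status_codes. worst_case_ms is a hypothetical helper.

```python
def worst_case_ms(config: dict, timeout_status: int = 408) -> int:
    """Upper bound on wall time if every attempt on every target hits its timeout.

    A timed-out attempt is retried only when `timeout_status` is listed in the
    target's retry.on_status_codes (assumed to be 408 here).
    """
    total = 0
    for target in config["targets"]:
        retry = target.get("retry", {})
        retried = timeout_status in retry.get("on_status_codes", [])
        attempts = 1 + (retry.get("attempts", 0) if retried else 0)
        total += attempts * target["request_timeout"]
    return total


INCIDENT_CONFIG = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"override_params": {"model": "claude-sonnet-4-6"}, "request_timeout": 15000,
         "retry": {"attempts": 1, "on_status_codes": [429, 500, 502, 503]}},
        {"override_params": {"model": "gpt-4o"}, "request_timeout": 15000},
    ],
}

if __name__ == "__main__":
    print(f"Worst case: {worst_case_ms(INCIDENT_CONFIG) / 1000:.0f} s")
```

For this config the bound is 30 seconds. Adding 408 to the primary's retry list would push it back to 45 seconds, the same hang the incident produced, which is why retry lists and timeouts should be reviewed together.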
Production Engineering Notes
:::tip Create fallback configs before your first production deployment
The single most impactful thing you can do with Portkey is configure a fallback chain before your service handles real traffic. Five minutes spent creating a config in the dashboard can save hours of incident response. The fallback does nothing until it is needed - and when it is needed, it is essential.
:::
:::warning Establish a consistent metadata schema early
Portkey indexes metadata fields for filtering in analytics. If your metadata uses user_id in some requests and userId in others, Portkey treats them as different fields - historical traces become unfilterable. Define your metadata schema (field names, types, required fields) before the first production deployment and enforce it as a standard in every LLM call wrapper.
:::
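One way to enforce the schema is a single typed wrapper that every call site must use to produce the metadata dict. LLMCallMetadata and the set of allowed environment values are hypothetical:

```python
from dataclasses import asdict, dataclass

ALLOWED_ENVIRONMENTS = {"development", "staging", "production"}


@dataclass(frozen=True)
class LLMCallMetadata:
    """One agreed metadata schema; snake_case keys are fixed by the field names."""
    user_id: str
    feature: str
    environment: str
    session_id: str = ""

    def __post_init__(self) -> None:
        if not self.user_id or not self.feature:
            raise ValueError("user_id and feature are required")
        if self.environment not in ALLOWED_ENVIRONMENTS:
            raise ValueError(f"unknown environment {self.environment!r}")

    def to_metadata(self) -> dict[str, str]:
        # Drop empty optional fields so analytics filters never see blank values
        return {k: v for k, v in asdict(self).items() if v}
```

Pass the result of to_metadata() as the metadata argument to createHeaders; a misspelled field such as userId then fails as a TypeError at the call site instead of silently splitting your filters.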
:::danger Virtual key rotation requires verifying the config reference
When you rotate a provider API key, you update the mapping in Portkey's virtual key settings. The virtual key ID stays the same, so configs that reference it do not need to change. But verify this: if the rotation was done by creating a new virtual key instead of updating the existing one, any code or config that still references the old virtual key ID will keep routing to the old credential. Always test after rotation.
:::
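Testing after rotation can include a mechanical check that no config still references a retired virtual key ID. A sketch over config dicts you keep in version control or have exported; how you fetch configs and the list of active IDs is outside its scope, and both function names are hypothetical:

```python
def find_virtual_keys(config: dict) -> set[str]:
    """Collect every virtual_key referenced at the top level or in any target."""
    found: set[str] = set()
    if "virtual_key" in config:
        found.add(config["virtual_key"])
    for target in config.get("targets", []):
        found |= find_virtual_keys(target)  # handles targets that nest their own targets
    return found


def stale_virtual_keys(configs: dict[str, dict], active_ids: set[str]) -> dict[str, set[str]]:
    """Map config name -> referenced virtual key IDs that are no longer active."""
    report = {}
    for name, cfg in configs.items():
        stale = find_virtual_keys(cfg) - active_ids
        if stale:
            report[name] = stale
    return report
```

An empty report means every config points only at live keys; anything else names the exact config to fix before the old credential is revoked.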
:::info Portkey stores request payloads for 30 days by default
By default, Portkey stores the full prompt and response for every request. If your prompts contain user PII (names, emails, medical conditions), configure data masking in your account settings or use the x-portkey-disable-logging: true header on sensitive requests to skip payload storage for those calls.
:::
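Masking can also happen before the prompt leaves your application. A minimal regex-based sketch, offered as an app-side complement to Portkey's own controls rather than a description of them; the two patterns are illustrative and far from complete PII detection:

```python
import re

# Illustrative patterns only; real PII detection needs much more than regexes
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace matched PII with typed placeholders before the prompt is sent."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Redacting changes what the model sees, so it suits fields the model does not need (contact details, identifiers) rather than content it must reason about.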
Common Mistakes
Mistake 1: Not setting a consistent trace_id per user session
Without a consistent trace_id, you cannot reconstruct a multi-turn conversation from the trace log. Set the trace_id to {feature}-{session_id} so every turn in a session shares the same trace_id. Use a span_id or turn_id to distinguish individual turns within the session. This is the difference between being able to debug a broken session in 30 seconds versus spending 20 minutes manually correlating log entries.
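The naming scheme from this mistake, plus session reconstruction from exported request records, as a sketch; the record fields (trace_id, span_id, ts) are a hypothetical export format:

```python
import uuid
from collections import defaultdict


def session_trace_id(feature: str, session_id: str) -> str:
    """Every turn in a session shares this trace_id."""
    return f"{feature}-{session_id}"


def new_span_id() -> str:
    """A short per-turn identifier, distinct within the session."""
    return uuid.uuid4().hex[:8]


def group_by_session(records: list[dict]) -> dict[str, list[dict]]:
    """Rebuild conversations from exported request records, ordered by timestamp."""
    sessions: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        sessions[rec["trace_id"]].append(rec)
    return {tid: sorted(recs, key=lambda r: r["ts"]) for tid, recs in sessions.items()}
```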
Mistake 2: One global config for all features
A single config for all features forces one-size-fits-all retry and timeout behavior. A real-time user chat needs fast timeouts (30s) and aggressive fallbacks. An async document summarizer can tolerate 120s timeouts and can wait through longer retry sequences. Create separate configs per feature type. Config creation is free and takes two minutes.
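Per-feature configs can be generated from a small table of profiles instead of hand-edited JSON. A sketch using the 30-second and 120-second figures above; the profile names, the feature-to-profile mapping, the batch retry count of 3, and the millisecond timeout unit are illustrative assumptions:

```python
# Hypothetical profiles; timeouts in milliseconds
PROFILES = {
    "realtime": {"timeout_ms": 30_000, "retries": 1},
    "batch": {"timeout_ms": 120_000, "retries": 3},
}

FEATURE_PROFILE = {"support-chat": "realtime", "doc-summarizer": "batch"}


def config_for_feature(feature: str, chain: list[tuple[str, str]]) -> dict:
    """Build a fallback Config whose timeouts and retries match the feature's profile."""
    profile = PROFILES[FEATURE_PROFILE[feature]]
    return {
        "strategy": {"mode": "fallback"},
        "targets": [
            {
                "virtual_key": virtual_key,
                "override_params": {"model": model},
                "request_timeout": profile["timeout_ms"],
                "retry": {
                    "attempts": profile["retries"],
                    "on_status_codes": [429, 500, 502, 503, 504],
                },
            }
            for virtual_key, model in chain
        ],
    }
```

Each resulting dict would be registered once, for example with the create-config POST from the Configs section, and referenced by its own ID.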
Mistake 3: Collecting feedback but never analyzing it
The Portkey Feedback API generates a rich quality signal, but only if someone looks at it. Teams that implement the feedback collection button but never query the data are wasting their most valuable quality signal. Establish a weekly review: filter traces by negative feedback, find patterns, improve prompts or routing, track score improvement. The feedback loop is only useful when it actually loops.
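A weekly review can start with something as simple as the negative-feedback rate per model or per feature. A sketch over a hypothetical export of request records carrying a feedback value (or None when unrated) and the metadata fields:

```python
from collections import defaultdict


def negative_rate_by(records: list[dict], key: str) -> dict[str, float]:
    """Share of rated records with negative feedback, grouped by a field such as model."""
    rated: dict[str, int] = defaultdict(int)
    negative: dict[str, int] = defaultdict(int)
    for rec in records:
        if rec.get("feedback") is None:
            continue  # unrated requests don't count toward the denominator
        group = rec[key]
        rated[group] += 1
        if rec["feedback"] < 0:
            negative[group] += 1
    return {group: negative[group] / rated[group] for group in rated}
```

Tracking this number week over week turns "improve the prompts" into a measurable change.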
Mistake 4: Treating guardrails as a security boundary
Portkey's guardrails are a useful operational layer, not a security boundary. String-matching and regex-based guardrails are bypassable by creative prompt phrasing. Defense in depth: validate user inputs at the application layer first, use Portkey guardrails as a secondary filter for common patterns, and treat any LLM output as untrusted before rendering it to users. Guardrails reduce risk - they do not eliminate it.
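A quick illustration of why plain string matching is weak in both directions. The sketch mimics a case-insensitive contains rule using the deny terms from the guardrails config earlier in this article; it is not Portkey's guardrail engine:

```python
BLOCKLIST = ["social security number", "ssn", "credit card number", "password"]


def naive_contains_guard(prompt: str) -> bool:
    """Return True if plain, case-insensitive substring matching would block the prompt."""
    lowered = prompt.lower()
    return any(term in lowered for term in BLOCKLIST)


if __name__ == "__main__":
    for prompt in [
        "What is my Social Security Number?",      # blocked, as intended
        "What is my s0cial secur1ty numb3r?",      # obfuscated: slips through
        "How do I use classnames in React?",       # innocent: blocked ("cla-ssn-ames")
    ]:
        print(naive_contains_guard(prompt), prompt)
```

The obfuscated prompt gets through while a harmless question about the classnames library is blocked, which is why such filters belong behind application-level validation, not in front of it.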
Mistake 5: Not setting per-target timeouts
A common cause of gray failure incidents with managed gateways is leaving per-target timeouts unset. Without an explicit timeout, Portkey waits for the provider's default timeout before failing over. For Anthropic, this can be 60+ seconds. Set request_timeout on every target (the value is in milliseconds): 20000–30000 (20–30 seconds) for user-facing features, 60000–90000 for batch processing. This single configuration decision can be the difference between a bounded wait that ends in a fallback response and a 60-second hang.
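Because a value meant as seconds is silently read as milliseconds, a small linter run before creating or updating a config catches both missing and mis-scaled timeouts. A sketch assuming millisecond units; lint_timeouts and its thresholds are hypothetical:

```python
def lint_timeouts(config: dict, min_ms: int = 1_000, max_ms: int = 120_000) -> list[str]:
    """Flag targets with a missing or implausible request_timeout (milliseconds assumed)."""
    problems = []
    for i, target in enumerate(config.get("targets", [])):
        timeout = target.get("request_timeout")
        if timeout is None:
            problems.append(f"target {i}: no request_timeout set")
        elif timeout < min_ms:
            problems.append(f"target {i}: request_timeout={timeout} looks like seconds, not ms")
        elif timeout > max_ms:
            problems.append(f"target {i}: request_timeout={timeout} ms exceeds {max_ms} ms")
    return problems
```

Run it in CI over every config file and fail the build on a non-empty result.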
Interview Q&A
Q: What is Portkey and what differentiates it from LiteLLM?
Portkey is a managed LLM gateway with observability as a first-class product feature. Like LiteLLM, it provides a unified OpenAI-compatible API across multiple providers, routing, fallbacks, and caching. The key differentiators: Portkey is a managed cloud service (your request data flows through Portkey's infrastructure), while LiteLLM is primarily self-hosted. Portkey includes built-in request tracing, cost analytics per user/team/feature, user feedback collection, and guardrails - without requiring integration with any external observability tool. Teams choose Portkey for operational velocity and built-in observability; teams choose LiteLLM for data residency control and self-hosting.
Q: What are virtual keys in Portkey and why are they useful in production?
Virtual keys are Portkey-scoped API key identifiers that map to real provider credentials stored securely in Portkey's vault. Application code uses virtual key IDs instead of provider API keys directly. This decouples credential management from application deployment: rotating an Anthropic key means updating the virtual key mapping in Portkey's dashboard - no code change, no redeployment, no risk of accidentally logging the new key during the rotation. Virtual keys also support per-key rate limits, spend caps, and model allowlists, giving you fine-grained control over what each component of your system can access.
Q: Explain how Portkey's Config system works and what you can define in a config.
A Portkey Config is a JSON object defining the routing policy for requests. You create configs via the Portkey dashboard or API, and reference them by ID in the x-portkey-config request header. A config defines: the routing strategy (single, fallback, or loadbalance), the list of targets (each specifying a virtual key, model override, retry policy, and per-target timeout), cache settings (simple exact-match or semantic, with a TTL), and guardrails (input filtering and output validation rules). When you update a Config in the dashboard, the change applies to all subsequent requests using that Config ID - no application code change or redeployment required. This is the core architectural advantage: routing logic lives in config, not code.
Q: How does Portkey handle a provider partial outage - specifically the "slow timeout" failure mode?
Each target in a Portkey Config has a request_timeout parameter. When a provider is in a slow-timeout failure mode (requests take 45 seconds instead of failing fast), Portkey enforces this timeout per target. If the primary target's timeout fires, Portkey immediately moves to the next target in the fallback list without waiting for the slow response to complete. This prevents the thread pool exhaustion pattern that cascades in slow-timeout failures. The key is setting per-target timeouts shorter than the provider's maximum timeout - for example, 20 seconds for a primary that might time out at 45 seconds. All timeout events are logged as part of the trace, making the failure pattern immediately visible in Portkey's dashboard.
Q: How does the Portkey Feedback API improve LLM product quality over time?
The Feedback API lets application code submit quality signals (positive 1, neutral 0, negative -1) against specific trace IDs immediately after user interaction. The feedback is stored alongside the full trace - prompt, response, model, latency, cost, and metadata. This enables a quality-driven development loop: engineers filter Portkey traces by negative feedback score, analyze what prompts and responses received thumbs-down, identify patterns (wrong information, too verbose, off-topic), improve prompts or routing, and measure score improvement over the following week. This bridges LLM infrastructure and model quality evaluation in one workflow, without building a separate evaluation data pipeline.
Q: Walk me through diagnosing a support ticket: a specific user reports that the AI chat gave them a wrong answer yesterday. How do you use Portkey?
In the Portkey dashboard, filter traces by the user's user_id metadata field and the approximate time range. Identify the specific trace - each turn in the conversation has its own trace entry linked by the session's trace_id. Open the trace: you can see the exact prompt sent (including system prompt and full conversation history), the exact response received, which model handled it, how long it took, and whether a fallback was triggered. If the wrong answer came after a fallback, note which model handled it - the fallback model may have different behavior for this question type. Export the prompt and response, reproduce the failure in your development environment, improve the prompt or add guardrails, and submit a new test case to your evaluation suite to prevent regression.
Q: How would you use Portkey's load-balance Config mode to do an A/B test between two models?
Create a Config with "strategy": {"mode": "loadbalance"} and two targets: Claude Sonnet at weight 50 and GPT-4o at weight 50. Use this config for the test group of users. Every logged request records which model actually served it, so after a week of traffic you can filter Portkey analytics by model and compare the two variants: response quality ratings from the Feedback API, P95 latency, cost per request, and error rates. The A/B framework is built entirely at the gateway level - no application code changes required, and you can adjust the weight split (e.g., 80/20 for a cautious rollout) by updating the Config without any deployment.
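The comparison step might look like this over exported request records. The record fields (model, latency_ms, feedback) are a hypothetical export format, and the P95 uses Python's statistics.quantiles with its default exclusive method, so values may differ slightly from Portkey's dashboard:

```python
import statistics
from collections import defaultdict


def compare_variants(records: list[dict]) -> dict[str, dict[str, float]]:
    """Per-model request count, P95 latency (ms), and mean feedback from exported logs."""
    by_model: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        by_model[rec["model"]].append(rec)
    summary = {}
    for model, recs in by_model.items():
        latencies = [r["latency_ms"] for r in recs]
        ratings = [r["feedback"] for r in recs if r.get("feedback") is not None]
        if len(latencies) > 1:
            # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile
            p95 = statistics.quantiles(latencies, n=20)[18]
        else:
            p95 = latencies[0]
        summary[model] = {
            "requests": len(recs),
            "p95_ms": p95,
            "mean_feedback": statistics.fmean(ratings) if ratings else float("nan"),
        }
    return summary
```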
Portkey Config Reference: Complete Production Example
The following is a complete Portkey Config for a production deployment, written as a Python dict, combining fallback, retry, caching, per-target timeouts, and guardrails.
```python
import portkey_ai

# Production Config illustrating the main options: fallback, retry,
# caching, per-target timeouts, and guardrails
PRODUCTION_CONFIG = {
    "strategy": {
        "mode": "fallback"
    },
    "cache": {
        "mode": "semantic",  # or "simple" for exact-match only
        "max_age": 86400,  # cache TTL: 24 hours
        "similarity_threshold": 0.95
    },
    "retry": {
        "attempts": 2,
        "on_status_codes": [429, 500, 502, 503, 504]
    },
    "targets": [
        {
            "virtual_key": "PORTKEY_ANTHROPIC_VIRTUAL_KEY",
            "override_params": {
                "model": "claude-sonnet-4-6"
            },
            "request_timeout": 20000,  # milliseconds (20 s): fail fast to the fallback
            "weight": 1  # weights only drive routing in loadbalance mode
        },
        {
            "virtual_key": "PORTKEY_OPENAI_VIRTUAL_KEY",
            "override_params": {
                "model": "gpt-4o"
            },
            "request_timeout": 20000,
            "weight": 1
        }
    ],
    # Schematic guardrail rules; check the exact schema in Portkey's Guardrails docs
    "guardrails": {
        "input": [
            {
                "type": "regex",
                "pattern": "\\b(password|secret|api.?key)\\b",
                "action": "block",
                "flags": ["IGNORECASE"]
            }
        ],
        "output": [
            {
                "type": "word_count",
                "min": 10,
                "max": 2000,
                "action": "warn"
            }
        ]
    }
}

# Use the config in requests
client = portkey_ai.Portkey(api_key="PORTKEY_API_KEY")  # placeholder; load from env in real code

# Option 1: create the config in the dashboard, reference it by ID
response = client.with_options(config="config-abc123").chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Explain microservices briefly."}],
    max_tokens=200,
)

# Option 2: pass the config dict inline (for development/testing)
response = client.with_options(config=PRODUCTION_CONFIG).chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Explain microservices briefly."}],
    max_tokens=200,
)
```
Portkey vs LiteLLM: Feature Comparison
| Feature | Portkey (Managed) | LiteLLM (Self-Hosted) |
|---|---|---|
| Provider/model support | 250+ models across 45+ providers | 100+ |
| Setup time | Minutes (API key) | Hours (Docker + config) |
| Operational overhead | Minimal (Portkey runs the gateway) | Your team runs and scales it |
| Built-in traces | Yes - full trace explorer | Requires Langfuse/Helicone |
| Feedback API | Yes (built in) | No (requires custom build) |
| Guardrails | Yes (regex, word count) | Limited (custom callbacks) |
| Data residency | Cloud (verify certs) | Full VPC control |
| Pricing | SaaS subscription | OSS + hosting cost |
| Config hot-reload | Yes - dashboard, no deploy | YAML reload (some configs) |
| Semantic caching | Yes | Yes (Redis) |
| HIPAA / SOC 2 | Check Portkey compliance page | Control all components |
When to Choose Portkey
Portkey is the right choice when:
- Observability is the primary concern: teams that spend significant time debugging LLM failures, quality issues, and cost spikes will get immediate value from Portkey's built-in trace explorer and feedback pipeline
- Time-to-production is constrained: a managed service requires minutes of setup vs hours for self-hosted infrastructure
- Non-Python services need gateway access: Portkey is language-agnostic - any service that can make HTTP requests can use it via the OpenAI-compatible API
- A/B testing between models is a regular workflow: the Config weight system makes traffic splits a dashboard operation, not a deployment
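The sketch below illustrates two of these points together: language-agnostic access and A/B splits. It builds a raw HTTP request with Python's standard library only, standing in for any service that has no Portkey SDK. It assumes the `https://api.portkey.ai/v1/chat/completions` endpoint and the `x-portkey-api-key` / `x-portkey-config` headers, with the Config passed inline as JSON. Check the Portkey API reference for the exact headers your setup needs. The virtual key values are placeholders, and the 90/10 weights describe a hypothetical A/B split.

```python
import json
import urllib.request

PORTKEY_URL = "https://api.portkey.ai/v1/chat/completions"

# Hypothetical A/B split: ~90% of traffic to Claude, ~10% to GPT-4o.
AB_CONFIG = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {
            "virtual_key": "vk-anthropic-placeholder",
            "override_params": {"model": "claude-sonnet-4-6"},
            "weight": 0.9,
        },
        {
            "virtual_key": "vk-openai-placeholder",
            "override_params": {"model": "gpt-4o"},
            "weight": 0.1,
        },
    ],
}


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat request routed by Portkey."""
    body = {
        "model": "claude-sonnet-4-6",  # overridden per target by the config
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 200,
    }
    headers = {
        "Content-Type": "application/json",
        "x-portkey-api-key": api_key,
        "x-portkey-config": json.dumps(AB_CONFIG),  # inline config as a JSON string
    }
    return urllib.request.Request(
        PORTKEY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_request("pk-placeholder", "Explain microservices briefly.")
# To actually send it: urllib.request.urlopen(req, timeout=30)
```

To shift the split, for example to 50/50, you edit the weights in the Config, not the calling service. With a dashboard-managed config, you would pass its ID in `x-portkey-config` instead of the inline JSON.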
Choose LiteLLM instead when:
- Data residency requirements prevent SaaS: regulated industries where all data must stay in your VPC
- You need fully custom routing logic: LiteLLM is open source, so you can extend its routing in code; Portkey's managed routing is limited to the strategies its Config schema supports
- Cost per request is critical: at very high request volumes, the Portkey SaaS fee may exceed the engineering cost of self-hosting
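The cost crossover is simple break-even arithmetic. The sketch below assumes a hypothetical pricing model: the managed service costs a flat monthly plan fee plus a per-1,000-request charge, and self-hosting costs a flat monthly amount for infrastructure plus engineering time. Real pricing tiers for either option are structured differently, and every number here is invented for illustration. `break_even_requests` is a hypothetical helper.

```python
def break_even_requests(plan_fee: float, per_1k_fee: float, self_host_fixed: float) -> float:
    """Monthly request volume above which self-hosting becomes cheaper.

    Hypothetical cost model:
      managed     = plan_fee + per_1k_fee * (requests / 1000)
      self-hosted = self_host_fixed (infra + engineering time), flat per month
    """
    if self_host_fixed <= plan_fee:
        return 0.0  # self-hosting is cheaper at any volume
    if per_1k_fee <= 0:
        return float("inf")  # managed cost never grows, so there is no crossover
    return (self_host_fixed - plan_fee) / per_1k_fee * 1000


# Invented numbers (not Portkey or LiteLLM pricing): $500/month plan,
# $0.50 per 1k requests, $4,000/month to run and maintain a self-hosted gateway.
print(break_even_requests(plan_fee=500, per_1k_fee=0.50, self_host_fixed=4000))
# 7000000.0
```

In this made-up scenario, the managed service is cheaper below about 7 million requests per month, and self-hosting wins above that. Engineering time is usually the hardest input to estimate, and it tends to dominate the result.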
Summary: Portkey in Production
Portkey's core value proposition is managed gateway operations with first-class observability. The Config system decouples routing policy from application code, so routing changes ship without deployments. The trace system provides the audit trail needed to debug quality issues, diagnose provider failures, and improve prompts systematically. The Feedback API closes the quality-improvement loop that most LLM products lack. For teams that want production-grade LLM infrastructure without operating it themselves, Portkey is one of the fastest paths to these gateway capabilities.
Portkey Request Lifecycle
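Understanding how a request flows through Portkey helps clarify which features apply at which stage. A request authenticates with a Portkey API key. The active Config (fallback, retries, timeouts) then routes it to a provider through a virtual key that holds the provider credentials. Along the way, it may be answered from the cache or checked by guardrails. Finally, it is logged as a trace, to which user feedback can later be attached.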
Each step in this flow is a Portkey feature that can be configured independently. A team getting started might use only virtual keys and basic routing; as needs mature, they add semantic caching, guardrails, and the feedback pipeline. The Config system makes each of these additive changes without application code modifications.
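The additive path described above can be sketched with plain dicts. `add_layer` is a hypothetical helper, not part of the Portkey SDK: it returns a copy of a Config with one more top-level section. In practice, each stage would be a new version of a dashboard-managed Config. If that config is updated in place under the same ID, application code such as `client.with_options(config="config-abc123")` stays the same at every stage. The virtual key values are placeholders.

```python
import copy
from typing import Any

# Stage 1: virtual keys + basic fallback routing only
BASE: dict[str, Any] = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"virtual_key": "vk-anthropic-placeholder", "override_params": {"model": "claude-sonnet-4-6"}},
        {"virtual_key": "vk-openai-placeholder", "override_params": {"model": "gpt-4o"}},
    ],
}


def add_layer(config: dict[str, Any], **layer: Any) -> dict[str, Any]:
    """Return a copy of `config` with the given top-level sections added or replaced."""
    updated = copy.deepcopy(config)  # never mutate a config other code may share
    updated.update(layer)
    return updated


stage_1 = BASE
stage_2 = add_layer(stage_1, cache={"mode": "semantic", "max_age": 86400})
stage_3 = add_layer(stage_2, retry={"attempts": 2, "on_status_codes": [429, 500, 502, 503, 504]})

for name, cfg in [("stage_1", stage_1), ("stage_2", stage_2), ("stage_3", stage_3)]:
    print(name, sorted(cfg))
# stage_1 ['strategy', 'targets']
# stage_2 ['cache', 'strategy', 'targets']
# stage_3 ['cache', 'retry', 'strategy', 'targets']
```

Copying instead of mutating keeps earlier stages intact, so you can easily diff versions or roll back to the previous Config.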
Portkey Integration: Complete Working Example
The following example shows the core Portkey features working together in one flow: virtual-key routing, config-based fallback with semantic caching, per-request trace metadata, and feedback submission.
```python
import os
import time
import uuid

from portkey_ai import Portkey

# ─── Configuration ────────────────────────────────────────────────────────────
PORTKEY_API_KEY = os.environ.get("PORTKEY_API_KEY", "pk-placeholder")
ANTHROPIC_VIRTUAL_KEY = os.environ.get("PORTKEY_ANTHROPIC_VK", "vk-placeholder")
OPENAI_VIRTUAL_KEY = os.environ.get("PORTKEY_OPENAI_VK", "vk-placeholder-oai")

# Production config: fallback from Claude to GPT-4o, 20s timeout per target
FALLBACK_CONFIG = {
    "strategy": {"mode": "fallback"},
    "cache": {"mode": "semantic", "max_age": 86400},
    "targets": [
        {
            "virtual_key": ANTHROPIC_VIRTUAL_KEY,
            "override_params": {"model": "claude-sonnet-4-6"},
            "request_timeout": 20000,
        },
        {
            "virtual_key": OPENAI_VIRTUAL_KEY,
            "override_params": {"model": "gpt-4o"},
            "request_timeout": 20000,
        },
    ],
}


def run_support_query(user_id: str, query: str, session_id: str) -> dict:
    """
    Run a support query through Portkey with full observability.
    Returns the answer, trace ID, serving model, latency, and token counts.
    """
    # One trace ID per request, so feedback attaches to exactly this answer;
    # the session ID travels in metadata for grouping in the dashboard.
    trace_id = f"{session_id}-{uuid.uuid4().hex[:8]}"

    # Initialize client with trace metadata
    client = Portkey(
        api_key=PORTKEY_API_KEY,
        config=FALLBACK_CONFIG,
        metadata={
            "user_id": user_id,
            "session_id": session_id,
            "feature": "support-chat",
            "environment": "production",
        },
        trace_id=trace_id,
    )

    start = time.perf_counter()  # monotonic clock for latency measurement
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",  # overridden per target by the config
        messages=[
            {
                "role": "system",
                "content": "You are a helpful support assistant. Be concise and accurate.",
            },
            {"role": "user", "content": query},
        ],
        max_tokens=300,
    )
    latency_ms = round((time.perf_counter() - start) * 1000)

    return {
        "answer": response.choices[0].message.content,
        "trace_id": trace_id,
        "model": response.model,  # shows whether the fallback target answered
        "latency_ms": latency_ms,
        "input_tokens": response.usage.prompt_tokens,
        "output_tokens": response.usage.completion_tokens,
    }


def submit_user_feedback(
    portkey_client: Portkey,
    trace_id: str,
    satisfied: bool,
    comment: str = "",
) -> None:
    """Submit user satisfaction feedback linked to a specific trace."""
    score = 1 if satisfied else -1  # thumbs up / thumbs down
    portkey_client.feedback.create(
        trace_id=trace_id,
        value=score,
        weight=1,
        metadata={"comment": comment} if comment else {},
    )
    status = "positive" if satisfied else "negative"
    print(f"Feedback submitted: {status} for trace {trace_id}")


# ─── Demo usage ───────────────────────────────────────────────────────────────
if __name__ == "__main__":
    feedback_client = Portkey(api_key=PORTKEY_API_KEY)
    session_id = f"sess_{uuid.uuid4().hex[:8]}"  # one session, several queries

    # Simulate a support session with feedback
    questions = [
        ("How do I reset my API key?", True),
        ("What is your refund policy?", True),
        ("Explain quantum entanglement to me.", False),  # Off-topic - thumbs down
    ]

    for query, expected_satisfaction in questions:
        result = run_support_query(
            user_id="user_demo_001",
            query=query,
            session_id=session_id,
        )
        print(f"Q: {query}")
        print(f"A: {result['answer'][:80]}...")
        print(f"   Model: {result['model']} | {result['latency_ms']}ms | "
              f"trace: {result['trace_id']}")

        # Submit feedback based on (simulated) user satisfaction
        submit_user_feedback(
            feedback_client,
            result["trace_id"],
            satisfied=expected_satisfaction,
            comment="" if expected_satisfaction else "Response was off-topic",
        )
        print()
```
