Use Portkey as a managed LLM gateway with built-in observability, virtual keys, guardrails, request tracing, feedback collection, and automated fallbacks across Claude, GPT-4o, and 250+ models from 45+ providers.

Portkey

The Gray Failure Nobody Saw Coming

The VP of Engineering reviewed the post-mortem document in silence for a long time. The incident had been a "gray failure" - the worst kind. Not a hard crash where alerts fire and dashboards go red, but a slow, invisible degradation. The AI-powered customer support chat had been returning generic error messages for three hours before a support engineer noticed that the ticket volume was unusually high. By the time the on-call team understood what had happened, the damage was done: three hours of degraded service for 40,000 users.

The cause: Anthropic had experienced a partial regional outage affecting a specific availability zone. API requests weren't failing fast - they were timing out after 45 seconds. The support chat service waited the full 45 seconds on each request, blocking the thread pool. The thread pool exhausted. The entire service appeared hung rather than degraded. From the user's perspective: the chat box just sat there, spinning, never responding.

What made the VP furious was not the outage itself - third-party services degrade. What made her furious was the complete absence of visibility. There were no distributed traces showing the slow LLM calls. There were no latency percentile metrics trending upward before the incident peaked. There were no alerts on P95 call time. The team was flying completely blind. By the time they understood the failure pattern, thirty minutes of additional downtime had elapsed that proactive alerting would have prevented.

The engineering team evaluated LiteLLM Proxy and Portkey. LiteLLM was powerful and self-hostable - but its observability required integrating an external tool like Langfuse, adding another system to maintain. Portkey had traces, cost analytics, user feedback collection, and latency dashboards built into the product as first-class features, not add-ons. Two weeks after deployment, when the next provider degraded, the on-call engineer saw the P95 latency spike in Portkey's dashboard within 60 seconds, triggered a config change to route around the degraded region, and contained the incident before any user noticed a problem.

Why This Exists

Portkey was founded in 2023 by Rohit Agarwal and Ayush Garg with a specific thesis: LLM infrastructure needs observability as a first-class concern, not an afterthought.

While LiteLLM focused on breadth of provider support and self-hosted flexibility, Portkey focused on the operational experience: how does an on-call engineer understand what is happening with LLM traffic right now? How does a platform team trace a specific user's broken session back to the exact LLM call that failed? How does a product team know which features are driving costs and whether the responses are actually good?

The architecture reflects this priority. Every Portkey feature - routing, fallbacks, retries, caching - generates traces automatically. You cannot use Portkey without getting observability. The tracing is not optional, not a plugin, and not an extra configuration step. It is the foundation.

By 2025, Portkey supported over 250 AI models across 45+ providers, with enterprise customers including Autodesk, Postman, and several Fortune 500 companies.

Core Architecture

Portkey operates as a managed cloud gateway. Your application sends requests to api.portkey.ai; Portkey applies the policy defined in your Config, forwards to the appropriate provider, logs everything, and returns the response. The policy layer - routing, fallbacks, guardrails, caching - is defined in a JSON Config object, not in application code.

Virtual Keys: Secure Credential Management

Virtual keys are one of Portkey's most valuable features. Instead of embedding provider API keys in application code or environment variables - where they can be accidentally logged, committed to git, or exposed in error messages - you create virtual keys in the Portkey dashboard that map to real provider credentials stored in Portkey's secure vault.

What virtual keys provide:

Rotate provider credentials without redeploying any application
Set per-virtual-key rate limits and spend caps
Revoke access instantly (useful when an employee leaves or a key is compromised)
Audit which virtual key made which requests, with full trace history
Scope keys to specific models or providers

import anthropic
import openai
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

# Portkey API key - this is your Portkey account credential
PORTKEY_API_KEY = "pk-..."

# Virtual keys created in Portkey dashboard
# These map to your actual provider credentials stored in Portkey's vault
ANTHROPIC_VIRTUAL_KEY = "anthropic-prod-vk-abc123"
OPENAI_VIRTUAL_KEY = "openai-prod-vk-xyz456"


def call_claude_via_portkey(user_id: str, feature: str) -> str:
    """
    Call Claude through Portkey using the Anthropic SDK.
    The api_key is a dummy (Portkey uses the virtual key).
    All metadata is attached to the trace in Portkey's dashboard.
    """
    client = anthropic.Anthropic(
        api_key="dummy",                  # Portkey ignores this - uses virtual key
        base_url=PORTKEY_GATEWAY_URL,
        default_headers=createHeaders(
            api_key=PORTKEY_API_KEY,
            virtual_key=ANTHROPIC_VIRTUAL_KEY,
            # metadata is indexed for filtering in Portkey analytics
            metadata={
                "user_id": user_id,
                "feature": feature,
                "environment": "production",
            },
        ),
    )

    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        system="You are a concise technical assistant.",
        messages=[{"role": "user", "content": "What is a vector database?"}],
    )

    print(f"Response: {message.content[0].text[:200]}")
    return message.content[0].text


def call_gpt4_via_portkey(user_id: str) -> str:
    """
    Call GPT-4o through Portkey using the OpenAI SDK.
    Same pattern - dummy api_key, real virtual key in headers.
    """
    client = openai.OpenAI(
        api_key="dummy",
        base_url=PORTKEY_GATEWAY_URL,
        default_headers=createHeaders(
            api_key=PORTKEY_API_KEY,
            virtual_key=OPENAI_VIRTUAL_KEY,
            metadata={"user_id": user_id, "feature": "code-assistant"},
        ),
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": "Write a Python decorator that retries a function on exception."},
        ],
        max_tokens=400,
    )

    print(f"Response: {response.choices[0].message.content[:200]}")
    return response.choices[0].message.content


if __name__ == "__main__":
    call_claude_via_portkey(user_id="user_8821", feature="docs-assistant")
    call_gpt4_via_portkey(user_id="user_8821")

Configs: The Heart of Portkey's Routing

Portkey's Config system is its most powerful feature. A Config is a JSON object that defines the routing policy for a group of requests: which providers to try, in what order, with what retry behavior, what cache settings, and what guardrails to apply. Configs are created once (via API or dashboard) and referenced by ID in request headers. Updating a Config propagates to all traffic using it immediately - no application redeployment.

import json

import httpx
import openai  # needed by use_config_in_request() below

PORTKEY_API_KEY = "pk-..."
ANTHROPIC_VIRTUAL_KEY = "anthropic-prod-vk-abc123"
OPENAI_VIRTUAL_KEY = "openai-prod-vk-xyz456"
PORTKEY_CONFIGS_URL = "https://api.portkey.ai/v1/configs"


def create_fallback_config() -> str:
    """
    Create a Config with automatic fallback:
    1. Claude Sonnet (primary) - retry up to 2 times on 429/5xx
    2. GPT-4o (first fallback) - retry once
    3. Claude Haiku (last resort) - no retry, just try once
    """
    config = {
        "strategy": {"mode": "fallback"},
        "targets": [
            {
                "virtual_key": ANTHROPIC_VIRTUAL_KEY,
                "override_params": {"model": "claude-sonnet-4-6"},
                "retry": {
                    "attempts": 2,
                    "on_status_codes": [429, 500, 502, 503, 504],
                },
                # Timeout this target after 30 seconds - don't let it block
                "request_timeout": 30,
            },
            {
                "virtual_key": OPENAI_VIRTUAL_KEY,
                "override_params": {"model": "gpt-4o"},
                "retry": {
                    "attempts": 1,
                    "on_status_codes": [429, 500, 502, 503, 504],
                },
                "request_timeout": 30,
            },
            {
                "virtual_key": ANTHROPIC_VIRTUAL_KEY,
                "override_params": {"model": "claude-haiku-4-5-20251001"},
                "request_timeout": 20,
            },
        ],
        # Enable semantic caching across all targets in this config
        "cache": {
            "mode": "semantic",
            "max_age": 86400,    # 24 hours TTL
        },
    }

    response = httpx.post(
        PORTKEY_CONFIGS_URL,
        headers={"x-portkey-api-key": PORTKEY_API_KEY},
        json={"name": "prod-fallback-v1", "config": config},
    )
    response.raise_for_status()
    config_id = response.json()["id"]
    print(f"Fallback config created: {config_id}")
    return config_id


def create_load_balanced_config() -> str:
    """
    Create a load-balanced Config that distributes traffic by weight.
    40% to each Anthropic key, 20% to OpenAI.
    Useful for multi-key scaling and A/B provider testing.
    """
    config = {
        "strategy": {"mode": "loadbalance"},
        "targets": [
            {
                "virtual_key": "anthropic-key-1",
                "weight": 40,
                "override_params": {"model": "claude-sonnet-4-6"},
            },
            {
                "virtual_key": "anthropic-key-2",
                "weight": 40,
                "override_params": {"model": "claude-sonnet-4-6"},
            },
            {
                "virtual_key": OPENAI_VIRTUAL_KEY,
                "weight": 20,
                "override_params": {"model": "gpt-4o"},
            },
        ],
    }

    response = httpx.post(
        PORTKEY_CONFIGS_URL,
        headers={"x-portkey-api-key": PORTKEY_API_KEY},
        json={"name": "prod-loadbalance-v1", "config": config},
    )
    response.raise_for_status()
    config_id = response.json()["id"]
    print(f"Load balanced config created: {config_id}")
    return config_id


def create_guardrails_config() -> str:
    """
    Config with input and output guardrails.
    Blocks requests containing sensitive terms.
    Blocks responses that match SSN patterns.
    Retries responses that are too short.
    """
    config = {
        "strategy": {"mode": "single"},
        "targets": [
            {
                "virtual_key": ANTHROPIC_VIRTUAL_KEY,
                "override_params": {"model": "claude-sonnet-4-6"},
            },
        ],
        "guardrails": {
            "input": [
                {
                    "type": "contains",
                    "deny": ["social security number", "SSN", "credit card number", "password"],
                    "action": "block",
                    "message": "Request blocked: sensitive data detected in prompt.",
                },
            ],
            "output": [
                {
                    "type": "regex",
                    "deny": [r"\b\d{3}-\d{2}-\d{4}\b"],    # SSN pattern
                    "action": "block",
                    "message": "Response blocked: sensitive data in LLM output.",
                },
                {
                    "type": "word_count",
                    "min": 10,                               # At least 10 words
                    "action": "retry",                       # Retry if too short
                },
            ],
        },
    }

    response = httpx.post(
        PORTKEY_CONFIGS_URL,
        headers={"x-portkey-api-key": PORTKEY_API_KEY},
        json={"name": "prod-guardrails-v1", "config": config},
    )
    response.raise_for_status()
    return response.json()["id"]


def use_config_in_request(config_id: str, user_id: str, session_id: str) -> dict:
    """
    Use a Config by ID. The routing, fallbacks, and guardrails defined
    in the config are applied automatically on Portkey's side.
    No routing logic required in application code.
    """
    client = openai.OpenAI(
        api_key="dummy",
        base_url="https://api.portkey.ai/v1",
        default_headers={
            "x-portkey-api-key": PORTKEY_API_KEY,
            "x-portkey-config": config_id,
            "x-portkey-trace-id": f"support-{session_id}",
            "x-portkey-metadata": json.dumps({
                "user_id": user_id,
                "session_id": session_id,
                "feature": "support-chat",
            }),
        },
    )

    response = client.chat.completions.create(
        model="claude-sonnet-4-6",     # Portkey may override this based on config
        messages=[
            {"role": "user", "content": "How do I reset my password?"},
        ],
        max_tokens=300,
    )

    return {
        "content": response.choices[0].message.content,
        "model": response.model,
    }


if __name__ == "__main__":
    fallback_id = create_fallback_config()
    lb_id = create_load_balanced_config()
    result = use_config_in_request(fallback_id, "user_8821", "sess_abc123")
    print(f"Response: {result['content'][:200]}")

Full Production Example: Traced Multi-Turn Application

The following is a production-quality application that uses Portkey for routing, full trace correlation across conversation turns, and feedback collection for quality tracking.

import anthropic
import json
import time
import uuid
import httpx
from dataclasses import dataclass, field
from typing import Optional
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders


PORTKEY_API_KEY = "pk-..."
ANTHROPIC_VIRTUAL_KEY = "anthropic-prod-vk-abc123"
PRODUCTION_CONFIG_ID = "pc-prod-fallback-abc"   # created once via create_fallback_config()


@dataclass
class Turn:
    role: str
    content: str
    trace_id: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    feedback_submitted: bool = False


@dataclass
class Conversation:
    session_id: str
    user_id: str
    feature: str
    turns: list[Turn] = field(default_factory=list)
    messages: list[dict] = field(default_factory=list)  # full history for context


class PortkeyTracedClient:
    """
    Production LLM client with:
    - Automatic fallback (Claude Sonnet -> GPT-4o -> Claude Haiku)
    - Session-level trace IDs that group every turn of a conversation
    - Per-turn span IDs that identify each call within the session trace
    - Feedback API integration for quality tracking
    - Cost-aware logging per turn
    """

    def __init__(self):
        self._anthropic_client_cache: dict[str, anthropic.Anthropic] = {}

    def _build_client(
        self, user_id: str, session_id: str, feature: str, turn_id: str
    ) -> anthropic.Anthropic:
        """Build a client with trace metadata for a single turn."""
        return anthropic.Anthropic(
            api_key="dummy",
            base_url=PORTKEY_GATEWAY_URL,
            default_headers=createHeaders(
                api_key=PORTKEY_API_KEY,
                config=PRODUCTION_CONFIG_ID,
                virtual_key=ANTHROPIC_VIRTUAL_KEY,
                # trace_id groups all requests from the same session in Portkey
                trace_id=f"{feature}-{session_id}",
                # span_id identifies the individual turn within the session trace
                span_id=turn_id,
                metadata={
                    "user_id": user_id,
                    "session_id": session_id,
                    "feature": feature,
                    "environment": "production",
                    "turn_id": turn_id,
                },
            ),
        )

    def send_turn(
        self,
        conversation: Conversation,
        user_message: str,
        max_tokens: int = 1024,
    ) -> Turn:
        """
        Send a conversation turn. Appends user message and model response
        to conversation history for multi-turn context.
        """
        turn_id = str(uuid.uuid4())[:8]
        trace_id = f"{conversation.feature}-{conversation.session_id}"

        # Append user message to history
        conversation.messages.append({"role": "user", "content": user_message})

        client = self._build_client(
            conversation.user_id, conversation.session_id, conversation.feature, turn_id
        )

        start = time.time()
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=max_tokens,
            messages=conversation.messages,
        )
        latency_ms = (time.time() - start) * 1000

        response_text = response.content[0].text

        # Append assistant response to history for next turn
        conversation.messages.append({"role": "assistant", "content": response_text})

        turn = Turn(
            role="assistant",
            content=response_text,
            trace_id=trace_id,
            model=response.model,
            input_tokens=response.usage.input_tokens,
            output_tokens=response.usage.output_tokens,
            latency_ms=latency_ms,
        )
        conversation.turns.append(turn)
        return turn

    def submit_feedback(
        self,
        turn: Turn,
        user_id: str,
        value: int,          # 1 = positive, -1 = negative, 0 = neutral
        comment: Optional[str] = None,
    ) -> bool:
        """
        Submit quality feedback for a turn via Portkey Feedback API.
        Feedback is linked to the turn's trace_id and visible in the dashboard.
        """
        payload = {
            "trace_id": turn.trace_id,
            "value": value,
            # Label all three documented values; 0 is neutral, not a thumbs-down
            "label": "thumbs_up" if value > 0 else "thumbs_down" if value < 0 else "neutral",
            "weight": 1,
            "metadata": {
                "user_id": user_id,
                "model": turn.model,
                "comment": comment or "",
            },
        }
        try:
            response = httpx.post(
                "https://api.portkey.ai/v1/feedback",
                headers={
                    "x-portkey-api-key": PORTKEY_API_KEY,
                    "Content-Type": "application/json",
                },
                json=payload,
                timeout=5.0,
            )
            success = response.status_code == 200
            if success:
                turn.feedback_submitted = True
            return success
        except Exception as e:
            print(f"Feedback submission failed: {e}")
            return False

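    def get_session_analytics(self, trace_id: str) -> dict:
        """
        Retrieve analytics for a session from Portkey API.
        Useful for post-session quality review.

        Pass the full session trace ID (f"{feature}-{session_id}", as stored
        on Turn.trace_id), not the bare session_id, so the filter matches
        the trace_id header sent with each turn.
        """
        response = httpx.get(
            "https://api.portkey.ai/v1/analytics/requests",
            headers={"x-portkey-api-key": PORTKEY_API_KEY},
            params={"trace_id": trace_id, "time_range": "24h"},
            timeout=10.0,
        )
        response.raise_for_status()
        return response.json()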


def simulate_support_session() -> None:
    """Simulate a multi-turn support chat session with full tracing."""
    client = PortkeyTracedClient()

    conversation = Conversation(
        session_id=f"sess_{int(time.time())}",
        user_id="user_8821",
        feature="support-chat",
    )

    print(f"=== Session: {conversation.session_id} ===\n")

    # Turn 1
    turn1 = client.send_turn(
        conversation,
        user_message="I can't connect to my database after upgrading to v2.1.",
        max_tokens=512,
    )
    print(f"Turn 1 [{turn1.model}] ({turn1.latency_ms:.0f}ms)")
    print(f"  Tokens: {turn1.input_tokens}+{turn1.output_tokens}")
    print(f"  Response: {turn1.content[:200]}...\n")

    # Thumbs up on turn 1 (user found it helpful)
    success = client.submit_feedback(turn1, conversation.user_id, value=1,
                                     comment="Clear and actionable advice")
    print(f"  Feedback submitted: {success}\n")

    # Turn 2 - follow-up
    turn2 = client.send_turn(
        conversation,
        user_message="The error says 'connection refused on port 5432'. PostgreSQL is running.",
        max_tokens=512,
    )
    print(f"Turn 2 [{turn2.model}] ({turn2.latency_ms:.0f}ms)")
    print(f"  Response: {turn2.content[:200]}...\n")

    # Thumbs down on turn 2 (didn't mention firewall rules)
    client.submit_feedback(turn2, conversation.user_id, value=-1,
                           comment="Missed firewall/pg_hba.conf angle")

    # Session summary
    total_tokens = sum(t.input_tokens + t.output_tokens for t in conversation.turns)
    total_latency = sum(t.latency_ms for t in conversation.turns)
    print(f"=== Session Summary ===")
    print(f"Turns: {len(conversation.turns)}")
    print(f"Total tokens: {total_tokens}")
    print(f"Total latency: {total_latency:.0f}ms")
    print(f"Feedback submitted: {sum(1 for t in conversation.turns if t.feedback_submitted)}/{len(conversation.turns)}")


if __name__ == "__main__":
    simulate_support_session()

Portkey vs LiteLLM: How to Choose

| Dimension | LiteLLM Proxy | Portkey |
| --- | --- | --- |
| Deployment model | Self-hosted (Docker/K8s) | Managed SaaS (or enterprise self-hosted) |
| Data residency | Full control - stays in your infrastructure | Request data flows through Portkey |
| Setup time | 2–4 hours (Docker Compose) | 15 minutes (API key + SDK) |
| Observability | Requires Langfuse or similar | Built-in traces, cost analytics, dashboards |
| Provider support | 100+ | 250+ |
| Guardrails | Limited (custom via callbacks) | First-class (input/output, regex, word count) |
| Feedback API | Not built-in | Built-in - thumbs up/down per trace |
| Pricing | Open-source, hosting cost only | Usage-based SaaS subscription |
| Compliance | HIPAA/SOC 2 viable (your infrastructure) | Check Portkey's compliance certifications |

The choice is not which tool is better - it is which tradeoffs fit your context. Regulated industries often require self-hosted. Teams that need operational velocity and built-in observability without infrastructure overhead typically choose Portkey.

The Cost of the Gray Failure: What Portkey Prevents

The incident described in the opening scenario - three hours of invisible degradation - is a direct result of missing three capabilities:

Missing: per-request latency tracking. Without P95/P99 latency trends per model, slow timeouts are invisible until they cascade into a full outage.

Missing: per-target timeout enforcement. Without a request_timeout on each provider target, one slow provider blocks the calling thread until the provider's own timeout fires (45 seconds in this incident).

Missing: automatic fallback. Without a configured fallback chain, there is no mechanism to route around the degraded provider until the on-call engineer manually investigates and intervenes.
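The first of these gaps, latency percentile tracking, can be illustrated with a small sketch. It is not how Portkey computes its dashboards; `LatencyWindow`, the window size, and the 5-second threshold are hypothetical names and values chosen only to show why a P95 alert fires long before every request is slow.

```python
import math
from collections import deque


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


class LatencyWindow:
    """Keeps the most recent call latencies and flags a P95 breach (illustrative only)."""

    def __init__(self, size: int = 200, p95_threshold_ms: float = 5000.0):
        self.samples: deque[float] = deque(maxlen=size)
        self.p95_threshold_ms = p95_threshold_ms

    def record(self, latency_ms: float) -> bool:
        """Record one call; return True when the window's P95 exceeds the threshold."""
        self.samples.append(latency_ms)
        return percentile(list(self.samples), 95) > self.p95_threshold_ms
```

With 94 healthy calls already in the window, the fifth 45-second call is enough to push P95 over the threshold, even though roughly 95% of calls are still fast. An average-latency alert would react far later.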

Portkey's Config system addresses all three. A production config that would have prevented the incident:

{
  "strategy": {"mode": "fallback"},
  "targets": [
    {
      "virtual_key": "anthropic-prod",
      "override_params": {"model": "claude-sonnet-4-6"},
      "request_timeout": 15,
      "retry": {"attempts": 1, "on_status_codes": [429, 500, 502, 503]}
    },
    {
      "virtual_key": "openai-prod",
      "override_params": {"model": "gpt-4o"},
      "request_timeout": 15
    }
  ]
}

With request_timeout: 15, slow Anthropic responses fail fast after 15 seconds instead of blocking for 45. With the fallback strategy, the request immediately retries on GPT-4o. The user sees a response after 15–18 seconds instead of a 45-second hang. The support team sees the fallback rate spike in Portkey's dashboard within minutes, not hours.
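The 15–18 second figure is simple arithmetic, sketched below. The function name is hypothetical, timeouts are treated as seconds exactly as this section writes them, and the 3-second latency for the healthy fallback is an assumed value, not a measurement.

```python
def seconds_until_response(
    target_timeouts_s: list[float],
    first_healthy_index: int,
    healthy_latency_s: float,
) -> float:
    """
    Time the caller waits when every target before `first_healthy_index`
    runs into its timeout and the healthy target answers in
    `healthy_latency_s`. Retries are ignored for simplicity.
    """
    if not 0 <= first_healthy_index < len(target_timeouts_s):
        raise ValueError("first_healthy_index must point at an existing target")
    waited = sum(target_timeouts_s[:first_healthy_index])
    return waited + healthy_latency_s


# Two targets with 15-second timeouts, as in the config above.
# The primary hangs; the fallback is assumed to answer in 3 seconds.
degraded = seconds_until_response([15, 15], first_healthy_index=1, healthy_latency_s=3.0)
healthy = seconds_until_response([15, 15], first_healthy_index=0, healthy_latency_s=2.0)
```

The same arithmetic shows why long fallback chains need short timeouts: each degraded target ahead of the healthy one adds its full timeout to the user's wait.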

Production Engineering Notes

:::tip Create fallback configs before your first production deployment
The single most impactful thing you can do with Portkey is configure a fallback chain before your service handles real traffic. A five-minute config creation at the dashboard prevents hours of incident response. The fallback does nothing until it is needed - and when it is needed, it is essential.
:::

:::warning Establish a consistent metadata schema early
Portkey indexes metadata fields for filtering in analytics. If your metadata uses user_id in some requests and userId in others, Portkey treats them as different fields - historical traces become unfilterable. Define your metadata schema (field names, types, required fields) before the first production deployment and enforce it as a standard in every LLM call wrapper.
:::
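One way to enforce such a schema is to normalize metadata in the wrapper before it is attached to a request. The sketch below is an assumption about how a team might do this: `REQUIRED_FIELDS`, `to_snake_case`, and `normalize_metadata` are hypothetical names, and coercing values to strings is a simplifying choice.

```python
import re

# Hypothetical team-wide schema: every LLM call must carry these fields.
REQUIRED_FIELDS = {"user_id", "feature", "environment"}


def to_snake_case(name: str) -> str:
    """Convert camelCase keys such as userId to snake_case (user_id)."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def normalize_metadata(raw: dict) -> dict[str, str]:
    """Return metadata with snake_case keys and string values, or raise if incomplete."""
    normalized = {to_snake_case(key): str(value) for key, value in raw.items()}
    missing = REQUIRED_FIELDS - normalized.keys()
    if missing:
        raise ValueError(f"metadata missing required fields: {sorted(missing)}")
    return normalized
```

The result can be passed as the `metadata` argument of `createHeaders`, so a stray `userId` never reaches the analytics index as a separate field.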

:::danger Virtual key rotation requires verifying the config reference
When you rotate a provider API key, you update the mapping in Portkey's virtual key settings. The virtual key ID stays the same, so configs and code that reference the virtual key ID do not need to change. Verify this rather than assuming it: anything that bypasses the virtual key and hardcodes the raw provider credential will keep using the old key, and if you created a new virtual key instead of updating the existing one, every config still pointing at the old ID will keep routing to the old credential. Always test after rotation.
:::

:::info Portkey stores request payloads for 30 days by default
By default, Portkey stores the full prompt and response for every request. If your prompts contain user PII (names, emails, medical conditions), configure data masking in your account settings or use the x-portkey-disable-logging: true header on sensitive requests to skip payload storage for those calls.
:::

Common Mistakes

Mistake 1: Not setting a consistent trace_id per user session

Without a consistent trace_id, you cannot reconstruct a multi-turn conversation from the trace log. Set the trace_id to a value like feature-session_id (for example, support-chat-sess_abc123) so every turn in a session shares the same trace ID. Use a span_id or turn_id to distinguish individual turns within the session. This is the difference between being able to debug a broken session in 30 seconds versus spending 20 minutes manually correlating log entries.

Mistake 2: One global config for all features

A single config for all features forces one-size-fits-all retry and timeout behavior. A real-time user chat needs fast timeouts (30s) and aggressive fallbacks. An async document summarizer can tolerate 120s timeouts and can wait through longer retry sequences. Create separate configs per feature type. Config creation is free and takes two minutes.
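A minimal sketch of per-feature configs on the application side, assuming each Config has already been created in Portkey. The Config IDs and the `FeaturePolicy` / `config_headers` names are hypothetical placeholders; only the two header names come from the examples earlier in this lesson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeaturePolicy:
    config_id: str          # hypothetical Portkey Config ID for this feature
    request_timeout_s: int  # the timeout that Config is expected to enforce


# One Config per feature type instead of one global Config.
FEATURE_POLICIES = {
    "support-chat": FeaturePolicy("pc-chat-fast-xxx", 30),
    "doc-summarizer": FeaturePolicy("pc-batch-slow-xxx", 120),
}


def config_headers(feature: str, portkey_api_key: str) -> dict[str, str]:
    """Pick the Config for a feature; an unknown feature raises KeyError on purpose."""
    policy = FEATURE_POLICIES[feature]
    return {
        "x-portkey-api-key": portkey_api_key,
        "x-portkey-config": policy.config_id,
    }
```

Failing loudly on an unknown feature is a deliberate choice here: silently falling back to a default Config would reintroduce the one-size-fits-all behavior this mistake describes.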

Mistake 3: Collecting feedback but never analyzing it

The Portkey Feedback API generates a rich quality signal, but only if someone looks at it. Teams that implement the feedback collection button but never query the data are wasting their most valuable quality signal. Establish a weekly review: filter traces by negative feedback, find patterns, improve prompts or routing, track score improvement. The feedback loop is only useful when it actually loops.
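The weekly review can start from something as small as the sketch below. It assumes feedback records have already been exported into a list of dicts with `model` and `value` keys; that record shape and the function name are assumptions for illustration, not a Portkey export format.

```python
from collections import defaultdict


def negative_rate_by_model(records: list[dict]) -> dict[str, float]:
    """Share of feedback entries with a negative value, grouped by model."""
    totals: dict[str, int] = defaultdict(int)
    negatives: dict[str, int] = defaultdict(int)
    for record in records:
        totals[record["model"]] += 1
        if record["value"] < 0:
            negatives[record["model"]] += 1
    return {model: negatives[model] / totals[model] for model in totals}


# Made-up sample data, for illustration only.
sample = [
    {"model": "claude-sonnet-4-6", "value": 1},
    {"model": "claude-sonnet-4-6", "value": -1},
    {"model": "gpt-4o", "value": 1},
    {"model": "gpt-4o", "value": 1},
]
rates = negative_rate_by_model(sample)
```

Grouping by model is only one cut; the same loop works for feature, prompt version, or any other metadata field you attach consistently.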

Mistake 4: Treating guardrails as a security boundary

Portkey's guardrails are a useful operational layer, not a security boundary. String-matching and regex-based guardrails are bypassable by creative prompt phrasing. Defense in depth: validate user inputs at the application layer first, use Portkey guardrails as a secondary filter for common patterns, and treat any LLM output as untrusted before rendering it to users. Guardrails reduce risk - they do not eliminate it.
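The application-layer check can be sketched as follows, reusing the deny terms and SSN pattern from the guardrails example earlier. The function name is hypothetical, and the last call shows the bypass problem directly: a trivially reformatted number passes.

```python
import re

# Same pattern and terms as the guardrails Config shown earlier in this lesson.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
DENY_TERMS = ("social security number", "ssn", "credit card number", "password")


def validate_user_input(text: str) -> list[str]:
    """Return a list of reasons to reject the input; empty means nothing matched."""
    problems = []
    lowered = text.lower()
    for term in DENY_TERMS:
        if term in lowered:
            problems.append(f"deny term: {term}")
    if SSN_PATTERN.search(text):
        problems.append("SSN-like pattern")
    return problems


blocked = validate_user_input("Call me about 123-45-6789")
bypassed = validate_user_input("Call me about 123 45 6789")  # spaces defeat the regex
```

An empty result means only that these specific patterns did not match, which is why the output still has to be treated as untrusted.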

Mistake 5: Not setting per-target timeouts

The most common cause of gray failure incidents with managed gateways is leaving per-target timeouts unset. Without an explicit timeout, Portkey waits for the provider's default timeout before failing over. For Anthropic, this can be 60+ seconds. Set request_timeout on every target - 20–30 seconds for user-facing features, 60–90 seconds for batch processing. This single configuration decision can be the difference between a delay bounded by the timeout you chose and a hang that lasts as long as the provider takes to give up.
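A check like this can run in CI before a Config is uploaded. It is a sketch, not a Portkey feature: `lint_timeouts` is a hypothetical helper that inspects the same `targets` / `request_timeout` structure used in the Configs in this lesson, with the limit expressed in the same unit as the Config values.

```python
def lint_timeouts(config: dict, max_timeout: int) -> list[str]:
    """Return one message per target whose request_timeout is missing or above the limit."""
    problems = []
    for index, target in enumerate(config.get("targets", [])):
        timeout = target.get("request_timeout")
        if timeout is None:
            problems.append(f"target {index}: request_timeout is not set")
        elif timeout > max_timeout:
            problems.append(f"target {index}: request_timeout {timeout} exceeds {max_timeout}")
    return problems


# Example Config with one unset and one overly long timeout (placeholder keys).
candidate = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"virtual_key": "anthropic-prod", "request_timeout": 15},
        {"virtual_key": "openai-prod"},
        {"virtual_key": "anthropic-prod", "request_timeout": 120},
    ],
}
issues = lint_timeouts(candidate, max_timeout=30)
```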

Interview Q&A

Q: What is Portkey and what differentiates it from LiteLLM?

Portkey is a managed LLM gateway with observability as a first-class product feature. Like LiteLLM, it provides a unified OpenAI-compatible API across multiple providers, routing, fallbacks, and caching. The key differentiators: Portkey is a managed cloud service (your request data flows through Portkey's infrastructure), while LiteLLM is primarily self-hosted. Portkey includes built-in request tracing, cost analytics per user/team/feature, user feedback collection, and guardrails - without requiring integration with any external observability tool. Teams choose Portkey for operational velocity and built-in observability; teams choose LiteLLM for data residency control and self-hosting.

Q: What are virtual keys in Portkey and why are they useful in production?

Virtual keys are Portkey-scoped API key identifiers that map to real provider credentials stored securely in Portkey's vault. Application code uses virtual key IDs instead of provider API keys directly. This decouples credential management from application deployment: rotating an Anthropic key means updating the virtual key mapping in Portkey's dashboard - no code change, no redeployment, no risk of accidentally logging the new key during the rotation. Virtual keys also support per-key rate limits, spend caps, and model allowlists, giving you fine-grained control over what each component of your system can access.

Q: Explain how Portkey's Config system works and what you can define in a config.

A Portkey Config is a JSON object defining the routing policy for requests. You create configs via the Portkey dashboard or API, and reference them by ID in the x-portkey-config request header. A config defines: the routing strategy (single, fallback, or loadbalance), the list of targets (each specifying a virtual key, model override, retry policy, and per-target timeout), cache settings (exact or semantic, with TTL), and guardrails (input filtering and output validation rules). When you update a Config in the dashboard, the change applies to all subsequent requests using that Config ID - no application code change or redeployment required. This is the core architectural advantage: routing logic lives in config, not code.

Q: How does Portkey handle a provider partial outage - specifically the "slow timeout" failure mode?

Each target in a Portkey Config has a request_timeout parameter. When a provider is in a slow-timeout failure mode (requests take 45 seconds instead of failing fast), Portkey enforces this timeout per target. If the primary target's timeout fires, Portkey immediately moves to the next target in the fallback list without waiting for the slow response to complete. This prevents the thread pool exhaustion pattern that cascades in slow-timeout failures. The key is setting per-target timeouts shorter than the provider's maximum timeout - for example, 20 seconds for a primary that might time out at 45 seconds. All timeout events are logged as part of the trace, making the failure pattern immediately visible in Portkey's dashboard.
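The thread pool exhaustion mentioned above follows from Little's law: the average number of requests in flight equals the arrival rate times the time each request holds a worker. The sketch below uses made-up traffic numbers (5 requests per second, a pool of 128 workers, a 3-second fallback response) purely to illustrate the effect of the timeout.

```python
def requests_in_flight(arrival_rate_per_s: float, seconds_held: float) -> float:
    """Little's law: average concurrency = arrival rate x time each request holds a worker."""
    return arrival_rate_per_s * seconds_held


def pool_exhausted(arrival_rate_per_s: float, seconds_held: float, pool_size: int) -> bool:
    """True when steady-state demand exceeds the number of available workers."""
    return requests_in_flight(arrival_rate_per_s, seconds_held) > pool_size


# Illustrative numbers only.
RATE = 5.0   # requests per second
POOL = 128   # worker threads

# No per-target timeout: each request holds a worker for the full 45 seconds.
without_timeout = pool_exhausted(RATE, 45.0, POOL)

# 20-second timeout, then an assumed 3-second response from the fallback.
with_timeout = pool_exhausted(RATE, 20.0 + 3.0, POOL)
```

Under these assumptions the 45-second hang needs 225 workers and saturates the pool, while the 20-second timeout plus fallback needs 115 and fits.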

Q: How does the Portkey Feedback API improve LLM product quality over time?

The Feedback API lets application code submit quality signals (positive 1, neutral 0, negative -1) against specific trace IDs immediately after user interaction. The feedback is stored alongside the full trace - prompt, response, model, latency, cost, and metadata. This enables a quality-driven development loop: engineers filter Portkey traces by negative feedback score, analyze what prompts and responses received thumbs-down, identify patterns (wrong information, too verbose, off-topic), improve prompts or routing, and measure score improvement over the following week. This bridges LLM infrastructure and model quality evaluation in one workflow, without building a separate evaluation data pipeline.

Q: Walk me through diagnosing a support ticket: a specific user reports that the AI chat gave them a wrong answer yesterday. How do you use Portkey?

In the Portkey dashboard, filter traces by the user's user_id metadata field and the approximate time range. Identify the specific trace - each turn in the conversation has its own trace entry linked by the session's trace_id. Open the trace: you can see the exact prompt sent (including system prompt and full conversation history), the exact response received, which model handled it, how long it took, and whether a fallback was triggered. If the wrong answer came after a fallback, note which model handled it - the fallback model may have different behavior for this question type. Export the prompt and response, reproduce the failure in your development environment, improve the prompt or add guardrails, and submit a new test case to your evaluation suite to prevent regression.
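The same lookup can be sketched offline against exported trace records. The field names here (`created_at`, `metadata`, `fallback_used`) and the helper `find_traces` are assumptions made for this example, not a documented Portkey export format.

```python
from datetime import datetime, timedelta

# Hypothetical exported trace records.
RECORDS = [
    {"trace_id": "a1", "created_at": datetime(2024, 5, 1, 9, 0),
     "metadata": {"user_id": "u_42"}, "model": "claude-sonnet-4-6", "fallback_used": False},
    {"trace_id": "a2", "created_at": datetime(2024, 5, 1, 14, 30),
     "metadata": {"user_id": "u_42"}, "model": "gpt-4o", "fallback_used": True},
    {"trace_id": "b1", "created_at": datetime(2024, 5, 1, 14, 35),
     "metadata": {"user_id": "u_99"}, "model": "claude-sonnet-4-6", "fallback_used": False},
]


def find_traces(records, user_id, around, window=timedelta(hours=2)):
    """Return one user's traces within `window` of `around`, oldest first."""
    earliest, latest = around - window, around + window
    hits = [
        r for r in records
        if r["metadata"].get("user_id") == user_id
        and earliest <= r["created_at"] <= latest
    ]
    return sorted(hits, key=lambda r: r["created_at"])


hits = find_traces(RECORDS, "u_42", around=datetime(2024, 5, 1, 15, 0))
for trace in hits:
    print(trace["trace_id"], trace["model"], "fallback" if trace["fallback_used"] else "primary")
```

In this sample data the only matching trace was served by the fallback model, which is exactly the kind of detail worth noting before reproducing the failure.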

Q: How would you use Portkey's load-balance Config mode to do an A/B test between two models?

Create a Config with "strategy": {"mode": "loadbalance"} and two targets: Claude Sonnet at weight 50 and GPT-4o at weight 50. Tag each target with a metadata field identifying the "variant" (e.g., "variant": "claude" and "variant": "openai"). Use this config for the test group of users. After a week of traffic, filter Portkey analytics by the variant metadata field and compare: response quality ratings from the Feedback API, P95 latency per variant, cost per request per variant, and any error rates. The A/B framework is built entirely at the gateway level - no application code changes required, and you can adjust the weight split (e.g., 80/20 for a cautious rollout) by updating the Config without any deployment.
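A rough sketch of such a Config, plus a local simulation of how a weighted split distributes traffic, is shown below. The `"variant"` label on each target follows the description above, but its exact placement in the Config schema is an assumption to check against the Portkey documentation. The helper names and the weighted random choice are illustrative only and say nothing about how Portkey selects targets internally.

```python
import random
from collections import Counter


def make_ab_config(weight_a: float, weight_b: float) -> dict:
    """Build a loadbalance Config with two labelled targets (placeholder keys)."""
    return {
        "strategy": {"mode": "loadbalance"},
        "targets": [
            {"virtual_key": "PORTKEY_ANTHROPIC_VIRTUAL_KEY",
             "weight": weight_a, "variant": "claude"},
            {"virtual_key": "PORTKEY_OPENAI_VIRTUAL_KEY",
             "weight": weight_b, "variant": "openai"},
        ],
    }


def simulate_split(config: dict, n: int, seed: int = 0) -> Counter:
    """Draw n weighted picks locally to see roughly how traffic would divide."""
    rng = random.Random(seed)
    targets = config["targets"]
    picks = rng.choices(
        [t["variant"] for t in targets],
        weights=[t["weight"] for t in targets],
        k=n,
    )
    return Counter(picks)


config = make_ab_config(80, 20)  # cautious 80/20 rollout
counts = simulate_split(config, 10_000)
share = counts["claude"] / 10_000
print(counts)
```

Moving from 80/20 to 50/50 is then a change to two numbers in the saved Config, with no change to the calling service.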

Portkey Config Reference: Complete Production Example

The following is a complete Portkey Config, written as a Python dict, for a production deployment combining fallback, retry, caching, and guardrails.

import portkey_ai

# Complete production Config - reference all key options
PRODUCTION_CONFIG = {
    "strategy": {
        "mode": "fallback"
    },
    "cache": {
        "mode": "semantic",             # or "simple" for exact-match only
        "max_age": 86400,               # Cache TTL: 24 hours
        "similarity_threshold": 0.95
    },
    "retry": {
        "attempts": 2,
        "on_status_codes": [429, 500, 502, 503, 504]
    },
    "targets": [
        {
            "virtual_key": "PORTKEY_ANTHROPIC_VIRTUAL_KEY",
            "override_params": {
                "model": "claude-sonnet-4-6"
            },
            "request_timeout": 20000,   # 20 seconds - fail fast to fallback
            "weight": 1
        },
        {
            "virtual_key": "PORTKEY_OPENAI_VIRTUAL_KEY",
            "override_params": {
                "model": "gpt-4o"
            },
            "request_timeout": 20000,
            "weight": 1
        }
    ],
    "guardrails": {
        "input": [
            {
                "type": "regex",
                "pattern": "\\b(password|secret|api.?key)\\b",
                "action": "block",
                "flags": ["IGNORECASE"]
            }
        ],
        "output": [
            {
                "type": "word_count",
                "min": 10,
                "max": 2000,
                "action": "warn"
            }
        ]
    }
}

# Use the config in requests
client = portkey_ai.Portkey(api_key="PORTKEY_API_KEY")

# Option 1: Create the config in the dashboard, reference by ID
response = client.with_options(config="config-abc123").chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Explain microservices briefly."}],
    max_tokens=200,
)

# Option 2: Pass the config dict inline (for development/testing)
response = client.with_options(config=PRODUCTION_CONFIG).chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Explain microservices briefly."}],
    max_tokens=200,
)

Portkey vs LiteLLM: Feature Comparison

Feature	Portkey (Managed)	LiteLLM (Self-Hosted)
Provider support	250+	100+
Setup time	Minutes (API key)	Hours (Docker + config)
Operational overhead	Zero	Your team
Built-in traces	Yes - full trace explorer	Requires Langfuse/Helicone
Feedback API	Yes (built in)	No (requires custom build)
Guardrails	Yes (regex, word count)	Limited (custom callbacks)
Data residency	Cloud (verify certs)	Full VPC control
Pricing	SaaS subscription	OSS + hosting cost
Config hot-reload	Yes - dashboard, no deploy	YAML reload (some configs)
Semantic caching	Yes	Yes (Redis)
HIPAA / SOC 2	Check Portkey compliance page	Control all components

When to Choose Portkey

Portkey is the right choice when:

Observability is the primary concern: teams that spend significant time debugging LLM failures, quality issues, and cost spikes will get immediate value from Portkey's built-in trace explorer and feedback pipeline
Time-to-production is constrained: a managed service requires minutes of setup vs hours for self-hosted infrastructure
Non-Python services need gateway access: Portkey is language-agnostic - any service that can make HTTP requests can use it via the OpenAI-compatible API
A/B testing between models is a regular workflow: the Config weight system makes traffic splits a dashboard operation, not a deployment

Choose LiteLLM instead when:

Data residency requirements prevent SaaS: regulated industries where all data must stay in your VPC
You need 100% custom routing logic: LiteLLM is open source and extensible; Portkey's routing options are fixed
Cost per request is critical: at very high request volumes, the Portkey SaaS fee may exceed the engineering cost of self-hosting

Summary: Portkey in Production

Portkey's core value proposition is managed gateway operations with first-class observability. The Config system decouples routing policy from application code, enabling routing changes without deployments. The trace system provides the audit trail needed to debug quality issues, diagnose provider failures, and improve prompts systematically. The Feedback API creates the quality-improvement loop that most LLM products lack. For teams that want production-grade LLM infrastructure without managing the infrastructure themselves, Portkey is the fastest path to all gateway capabilities.

Portkey Request Lifecycle

Understanding how a request flows through Portkey helps clarify which features apply at which stage.

Each step in this flow is a Portkey feature that can be configured independently. A team getting started might use only virtual keys and basic routing; as needs mature, they add semantic caching, guardrails, and the feedback pipeline. The Config system makes each of these additive changes without application code modifications.

Portkey Integration: Complete Working Example

The following example demonstrates all core Portkey features working together - virtual key routing, trace metadata, feedback submission, and config-based fallback - in a single cohesive flow.

import os
import time
from portkey_ai import Portkey  # the only Portkey import this example uses

# ─── Configuration ────────────────────────────────────────────────────────────

PORTKEY_API_KEY = os.environ.get("PORTKEY_API_KEY", "pk-placeholder")
ANTHROPIC_VIRTUAL_KEY = os.environ.get("PORTKEY_ANTHROPIC_VK", "vk-placeholder")
OPENAI_VIRTUAL_KEY = os.environ.get("PORTKEY_OPENAI_VK", "vk-placeholder-oai")

# Production config: fallback from Claude to GPT-4o, 20s timeout per target
FALLBACK_CONFIG = {
    "strategy": {"mode": "fallback"},
    "cache": {"mode": "semantic", "max_age": 86400},
    "targets": [
        {
            "virtual_key": ANTHROPIC_VIRTUAL_KEY,
            "override_params": {"model": "claude-sonnet-4-6"},
            "request_timeout": 20000,
        },
        {
            "virtual_key": OPENAI_VIRTUAL_KEY,
            "override_params": {"model": "gpt-4o"},
            "request_timeout": 20000,
        },
    ],
}


def run_support_query(
    user_id: str,
    query: str,
    session_id: str,
) -> dict:
    """
    Run a support query through Portkey with full observability.
    Returns response text, trace ID, and timing metadata.
    """
    # Initialize client with trace metadata
    client = Portkey(
        api_key=PORTKEY_API_KEY,
        config=FALLBACK_CONFIG,
        metadata={
            "user_id": user_id,
            "session_id": session_id,
            "feature": "support-chat",
            "environment": "production",
        },
        trace_id=session_id,
    )

    start = time.time()
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[
            {
                "role": "system",
                "content": "You are a helpful support assistant. Be concise and accurate."
            },
            {"role": "user", "content": query},
        ],
        max_tokens=300,
    )
    latency_ms = round((time.time() - start) * 1000)

    # Use the response's trace_id if the SDK exposes one; otherwise fall back
    # to the session_id we passed as trace_id when creating the client
    trace_id = getattr(response, "trace_id", session_id)
    answer = response.choices[0].message.content

    return {
        "answer": answer,
        "trace_id": trace_id,
        "model": response.model,
        "latency_ms": latency_ms,
        "input_tokens": response.usage.prompt_tokens,
        "output_tokens": response.usage.completion_tokens,
    }


def submit_user_feedback(
    portkey_client: Portkey,
    trace_id: str,
    satisfied: bool,
    comment: str = "",
) -> None:
    """Submit user satisfaction feedback linked to a specific trace."""
    score = 1 if satisfied else -1
    portkey_client.feedback.create(
        trace_id=trace_id,
        value=score,
        weight=1,
        metadata={"comment": comment} if comment else {},
    )
    status = "positive" if satisfied else "negative"
    print(f"Feedback submitted: {status} for trace {trace_id}")


# ─── Demo usage ───────────────────────────────────────────────────────────────
if __name__ == "__main__":
    feedback_client = Portkey(api_key=PORTKEY_API_KEY)

    # Simulate a support session with feedback
    questions = [
        ("How do I reset my API key?", True),
        ("What is your refund policy?", True),
        ("Explain quantum entanglement to me.", False),  # Off-topic - thumbs down
    ]

    for query, expected_satisfaction in questions:
        result = run_support_query(
            user_id="user_demo_001",
            query=query,
            session_id=f"sess_{hash(query) % 10000}",
        )
        print(f"Q: {query}")
        print(f"A: {result['answer'][:80]}...")
        print(f"   Model: {result['model']} | {result['latency_ms']}ms | "
              f"trace: {result['trace_id']}")

        # Submit feedback based on (simulated) user satisfaction
        submit_user_feedback(
            feedback_client,
            result["trace_id"],
            satisfied=expected_satisfaction,
            comment="" if expected_satisfaction else "Response was off-topic",
        )
        print()
