Tecton and Managed Feature Stores
"The question is never which feature store has the best architecture. The question is which feature store your team will actually operate successfully at 3am when the materialization pipeline breaks."
The Weight of Infrastructure Ownership
The Feast deployment worked. For eighteen months it worked remarkably well. The team had 50 engineers building ML models. They had accumulated 500 defined features across 40 feature views. Materialization ran hourly on an Airflow cluster, writing outputs to Redis, and the recommendation and fraud models serving 4 million daily users pulled features in under 3 milliseconds.
Then the cracks appeared. Not in the design - in the operational surface area. Someone had to own the Redis cluster: capacity planning, eviction policy tuning, memory pressure alerts at 2am. Someone had to own the Spark materialization jobs: executor sizing, shuffle partitioning, dependency conflicts between the Feast version and the Spark version. Someone had to own the SQL registry: connection pool exhaustion under load, slow feast apply runs as the registry grew to 500 feature definitions, backup schedules. When a new engineer joined the ML platform team, it took three weeks of onboarding before they could operate the feature store confidently.
The real cost was not the AWS bill for Redis and the Spark cluster - it was the engineering hours. The ML platform team spent 60% of their time on infrastructure operations and 40% on the platform features their model teams actually needed. The economics were inverted.
They evaluated Tecton and migrated over four months. The infrastructure disappeared. Redis was replaced by Tecton's managed online store. Spark materialization jobs were replaced by Tecton's managed compute layer. The SQL registry was replaced by Tecton's hosted control plane. What remained was exactly what should remain: feature definitions in Python, pipelines for raw data, and model code. A new engineer could define a streaming feature and have it in production in an afternoon.
This lesson covers the managed feature store landscape: what you get when you pay someone else to run the infrastructure, where the trade-offs live, how to choose between options, and how to migrate without breaking production.
The Managed Feature Store Value Proposition
When you choose a managed feature store, you are not buying a better feature store architecture. You are buying the elimination of operational surface area. The value is not in the product - it is in what the product takes off your team's plate.
Concretely, a managed feature store eliminates:
- Online store operations: no Redis cluster to size, patch, monitor, or scale
- Materialization compute: no Spark cluster to maintain, no executor tuning, no version conflicts
- Registry operations: no PostgreSQL to back up, no connection pool to manage, no schema migrations
- Streaming infrastructure: no Kafka consumer groups to rebalance, no Flink cluster to operate
- Observability setup: feature freshness monitoring, data quality alerts, and serving latency metrics are built-in
The cost is real: managed feature stores are expensive. Tecton pricing in 2024 starts at approximately $100,000 per year for production deployments and scales with the number of features, entities, and serving volume. Vertex AI Feature Store and Databricks Feature Store are consumption-based and can reach similar costs at scale.
The calculation is simple. Suppose the time your platform team spends on infrastructure operations costs $450,000 a year. A $100,000 licensing line item is then cheaper than that $450,000 of invisible engineering overhead.
Tecton: The Full-Service Feature Platform
Tecton launched publicly in 2020. Its founders built Michelangelo, Uber's internal ML platform, which included Uber's feature store. The design reflects lessons from one of the highest-scale ML platforms in the world:
- a unified abstraction layer over batch and streaming computation
- a managed infrastructure layer that handles materialization without user intervention
- a production-first serving architecture
Tecton Core Abstractions
Tecton mirrors Feast's conceptual model but extends it significantly, particularly around streaming features.
Entity: identical concept to Feast. A join key that identifies the subject of features.
```python
import tecton

user = tecton.Entity(name="user", join_keys=["user_id"])
merchant = tecton.Entity(name="merchant", join_keys=["merchant_id"])
```
BatchFeatureView: reads from a batch source (BigQuery, Snowflake, S3), runs a transformation on a user-defined schedule, materializes automatically to Tecton's managed online store.
```python
from datetime import datetime, timedelta

from tecton import batch_feature_view, FilteredSource

# user_events_batch_source is a Tecton BatchSource defined elsewhere in the repo.
# Note: FilteredSource restricts each run's input to that run's time range, so a
# 7-day SQL window only sees the history inside the filtered range. Check your
# Tecton version's options for extending the lookback, or express long windows
# with Tecton's Aggregation API (as in the streaming example below).
@batch_feature_view(
    sources=[FilteredSource(user_events_batch_source)],
    entities=[user],
    mode="spark_sql",
    feature_start_time=datetime(2024, 1, 1),
    batch_schedule=timedelta(hours=1),
    ttl=timedelta(days=7),
    online=True,
    offline=True,
    description="User purchase statistics, refreshed hourly",
)
def user_purchase_stats(user_events):
    # In spark_sql mode, reference the input as {user_events} so Tecton can
    # substitute the (filtered) source table.
    return f"""
        SELECT
            user_id,
            timestamp,
            COUNT(*) OVER (
                PARTITION BY user_id
                ORDER BY timestamp
                RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW
            ) AS purchase_count_7d,
            SUM(purchase_amount) OVER (
                PARTITION BY user_id
                ORDER BY timestamp
                RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW
            ) AS total_spend_7d
        FROM {user_events}
        WHERE event_type = 'PURCHASE'
    """
```
When you apply this definition, Tecton:
- Provisions and manages the Spark compute that runs the SQL transformation
- Schedules hourly materialization runs automatically
- Writes feature values to its managed online store and offline store
- Provides freshness monitoring and alerting out of the box
You wrote a SQL transformation. Tecton handled everything else.
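The window clause in that SQL is easy to misread. The sketch below is a pure-Python reference (no Spark or Tecton) for testing. It shows what RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW computes for each purchase row, assuming the frame includes both endpoints, as it does in Spark SQL. The function name and the sample events are illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def trailing_window_stats(events, window=timedelta(days=7)):
    """Per-event trailing count and sum over [ts - window, ts], per user.

    events: iterable of (user_id, timestamp, purchase_amount) tuples.
    Returns sorted (user_id, timestamp, purchase_count, total_spend) rows,
    one per input event, mirroring the two window functions in the SQL.
    """
    by_user = defaultdict(list)
    for user_id, ts, amount in events:
        by_user[user_id].append((ts, amount))

    rows = []
    for user_id, user_events in by_user.items():
        for ts, _ in user_events:
            # RANGE frame: every event of this user inside the time interval,
            # including peers that share the current timestamp.
            in_window = [a for t, a in user_events if ts - window <= t <= ts]
            rows.append((user_id, ts, len(in_window), sum(in_window)))
    return sorted(rows)


events = [
    ("u1", datetime(2024, 1, 1), 10.0),
    ("u1", datetime(2024, 1, 5), 20.0),
    ("u1", datetime(2024, 1, 9), 5.0),
    ("u2", datetime(2024, 1, 2), 7.5),
]
for row in trailing_window_stats(events):
    print(row)
```

In this data, the January 9 row no longer includes the January 1 purchase, because that purchase is more than seven days old. A "last N rows" window would have kept it.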
StreamFeatureView: reads from a Kafka or Kinesis stream, computes aggregations in real time, serves features with sub-second freshness. This is where Tecton's managed offering creates the most dramatic difference compared to self-hosted Feast.
```python
from datetime import timedelta

from tecton import stream_feature_view, FilteredSource, Aggregation

# kafka_purchase_events_source is a Tecton StreamSource defined elsewhere.
# Explicit names give the aggregates stable feature names
# (otherwise Tecton auto-generates them).
@stream_feature_view(
    source=FilteredSource(kafka_purchase_events_source),
    entities=[user],
    mode="spark_sql",
    aggregations=[
        Aggregation(column="purchase_amount", function="sum", time_window=timedelta(minutes=15), name="total_spend_15m"),
        Aggregation(column="purchase_amount", function="count", time_window=timedelta(minutes=15), name="purchase_count_15m"),
        Aggregation(column="purchase_amount", function="sum", time_window=timedelta(hours=1), name="total_spend_1h"),
        Aggregation(column="purchase_amount", function="count", time_window=timedelta(hours=1), name="purchase_count_1h"),
    ],
    online=True,
    offline=True,
    ttl=timedelta(hours=2),
    description="Real-time purchase aggregates from Kafka",
)
def user_realtime_purchase_stats(kafka_purchase_events):
    # Select the raw rows; Tecton computes the windowed aggregates
    return f"""
        SELECT
            user_id,
            timestamp,
            purchase_amount
        FROM {kafka_purchase_events}
        WHERE event_type = 'PURCHASE'
    """
```
Tecton provisions a managed Spark Structured Streaming job that consumes from Kafka, computes the windowed aggregations, and writes to the online store. Feature values reflect events that occurred seconds ago. No Flink cluster to manage. No Kafka consumer group to monitor. No state backend to tune.
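The sketch below makes "computes the windowed aggregations" concrete. It models the state a streaming job keeps for the 15-minute and 1-hour windows as an exact sliding window per user, held in memory. It assumes events arrive in timestamp order. This is not how Tecton implements streaming aggregation: production stream processors generally store partial aggregates in a durable state backend. The class and feature names are illustrative.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class SlidingWindowAggregator:
    """Exact sliding-window sum/count per user, kept in memory.

    Assumes events arrive in non-decreasing timestamp order per user.
    """

    def __init__(self, windows):
        self.windows = windows                 # e.g. {"15m": timedelta(minutes=15)}
        self.max_window = max(windows.values())
        self.events = defaultdict(deque)       # user_id -> deque of (ts, amount)

    def ingest(self, user_id, ts, amount):
        q = self.events[user_id]
        q.append((ts, amount))
        # Drop events that have fallen out of even the largest window
        while q and q[0][0] <= ts - self.max_window:
            q.popleft()

    def features(self, user_id, now):
        q = self.events.get(user_id, ())
        out = {}
        for name, window in self.windows.items():
            amounts = [a for t, a in q if now - window < t <= now]
            out[f"total_spend_{name}"] = sum(amounts)
            out[f"purchase_count_{name}"] = len(amounts)
        return out


agg = SlidingWindowAggregator({"15m": timedelta(minutes=15), "1h": timedelta(hours=1)})
t0 = datetime(2024, 6, 1, 12, 0)
agg.ingest("usr_12345", t0, 40.0)
agg.ingest("usr_12345", t0 + timedelta(minutes=30), 60.0)
agg.ingest("usr_12345", t0 + timedelta(minutes=50), 25.0)
print(agg.features("usr_12345", now=t0 + timedelta(minutes=55)))
```

Keeping every raw event is fine for a sketch but grows with traffic. That state-size problem is what a managed streaming layer takes off your hands.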
OnDemandFeatureView: computes features at request time, same concept as Feast on-demand views.
```python
from tecton import on_demand_feature_view, RequestSource
from tecton.types import Field, Float64

# Value supplied by the caller at request time
request_schema = RequestSource(schema=[Field("request_amount", Float64)])

@on_demand_feature_view(
    sources=[user_realtime_purchase_stats, request_schema],
    mode="python",
    schema=[Field("spend_velocity_ratio", Float64)],
)
def spend_velocity_ratio(user_realtime_purchase_stats, request_data):
    # Ratio of this transaction to the user's 15-minute rolling spend.
    # total_spend_15m comes from the streaming feature view; the 0.001 floor
    # avoids division by zero when there is no recent spend.
    recent_spend = user_realtime_purchase_stats["total_spend_15m"] or 0.001
    return {"spend_velocity_ratio": request_data["request_amount"] / recent_spend}
```
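A python-mode transform is ordinary Python. One option is to keep its core logic in a plain function and unit test it outside Tecton. The sketch below does that; the function name is hypothetical. It also explains the 250000.0 in the retrieval example that follows: that value is the 0.001 floor at work for a user with no 15-minute spend.

```python
def spend_velocity_ratio_logic(total_spend_15m, request_amount, floor=0.001):
    """Plain-Python core of the on-demand transform, testable without Tecton."""
    # `or` treats both None (no data) and 0.0 (no recent spend) as missing
    recent_spend = total_spend_15m or floor
    return {"spend_velocity_ratio": request_amount / recent_spend}


print(spend_velocity_ratio_logic(100.0, 250.0))  # {'spend_velocity_ratio': 2.5}
print(spend_velocity_ratio_logic(0.0, 250.0))
```

If a ratio that large would mislead the model, a test like this is the natural place to notice it. You could then cap the ratio, or add a separate "no recent spend" indicator.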
Tecton Feature Retrieval
At serving time, the Tecton Python SDK provides feature lookup:
```python
import tecton

# Look up the feature service in the workspace it was applied to ("prod" is assumed)
fraud_service = tecton.get_workspace("prod").get_feature_service("fraud_detection_v3")

# Online retrieval - low-latency lookup from Tecton's managed online store
feature_vector = fraud_service.get_feature_vector(
    join_keys={"user_id": "usr_12345"},
    request_data={"request_amount": 250.00},
    include_on_demand_feature_views=True,
)

# Access features as a dict
features = feature_vector.to_pandas().to_dict(orient="records")[0]
# {
#     "purchase_count_7d": 12,
#     "total_spend_7d": 847.50,
#     "total_spend_15m": 0.0,
#     "spend_velocity_ratio": 250000.0,
# }
```
Databricks Feature Store
The Databricks Feature Store is the right choice when your organization is already running on the Databricks Lakehouse platform. It is not a stand-alone product - it is deeply integrated with Delta Lake, MLflow, and the Databricks runtime. That integration is its greatest strength and its limiting constraint.
The Central Value: Model-Feature Coupling via MLflow
Databricks Feature Store's most compelling capability is automatic feature lookup at scoring time. When you log a model with MLflow using the Feature Store client, the model carries metadata about which feature table and which columns it requires. At batch scoring time, you provide only the entity keys - the Feature Store client automatically retrieves the current feature values and joins them to your scoring DataFrame.
This eliminates the manual feature lookup code that typically lives in scoring pipelines and tends to drift from the training pipeline over time.
```python
import mlflow
import mlflow.sklearn
from databricks.feature_store import FeatureStoreClient, FeatureLookup
from sklearn.ensemble import GradientBoostingClassifier

# Runs in a Databricks notebook or job, where `spark` is predefined
fs = FeatureStoreClient()

# 1. Compute one row of features per user over the last 7 days
#    (today plus the previous 6 days), then create the feature table
user_stats_df = spark.sql("""
    SELECT
        user_id,
        current_timestamp() AS feature_timestamp,
        COUNT(CASE WHEN event_type = 'PURCHASE' THEN 1 END) AS purchase_count_7d,
        SUM(CASE WHEN event_type = 'PURCHASE' THEN amount ELSE 0 END) AS total_spend_7d
    FROM user_events
    WHERE event_date > date_sub(current_date(), 7)
    GROUP BY user_id
""")

fs.create_table(
    name="ml_catalog.user_features.user_stats",
    primary_keys=["user_id"],
    timestamp_keys=["feature_timestamp"],
    schema=user_stats_df.schema,
    description="User behavioral statistics for ML models",
)

# 2. Write (upsert) feature values to the table
fs.write_table(
    name="ml_catalog.user_features.user_stats",
    df=user_stats_df,
    mode="merge",
)

# 3. Create training data with automatic feature lookup.
#    labels_df (entity keys + labels) is assumed to exist.
training_set = fs.create_training_set(
    df=labels_df,
    feature_lookups=[
        FeatureLookup(
            table_name="ml_catalog.user_features.user_stats",
            feature_names=["purchase_count_7d", "total_spend_7d"],
            lookup_key="user_id",
        ),
    ],
    label="purchased",
    exclude_columns=["user_id"],  # keep the join key out of the model inputs
)
training_pdf = training_set.load_df().toPandas()

# 4. Train the model and log it with its feature metadata
with mlflow.start_run():
    X = training_pdf.drop(columns=["purchased"])
    y = training_pdf["purchased"]
    model = GradientBoostingClassifier(n_estimators=100, max_depth=4)
    model.fit(X, y)

    # Log with feature store - the model now carries feature metadata
    fs.log_model(
        model=model,
        artifact_path="model",
        flavor=mlflow.sklearn,
        training_set=training_set,
        registered_model_name="fraud_detection_v2",  # name used by the scoring step
    )
```
```python
# 5. Batch scoring - Feature Store automatically retrieves features
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# The registered model, in the Production stage
model_uri = "models:/fraud_detection_v2/Production"

# Only provide entity keys - Feature Store handles feature lookup
scoring_df = spark.createDataFrame(
    [("usr_001",), ("usr_002",), ("usr_003",)],
    ["user_id"],
)

# score_batch reads the model's feature metadata and joins features automatically
predictions = fs.score_batch(
    model_uri=model_uri,
    df=scoring_df,
)
# Returns: user_id | purchase_count_7d | total_spend_7d | prediction
```
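Strip away Spark and MLflow, and the mechanism behind fs.log_model and fs.score_batch is a model that carries its own lookup metadata. The pure-Python sketch below illustrates only that idea. It is not the Databricks implementation, and every class and function name in it is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FeatureLookupSpec:
    """What the packaged model remembers about one feature table."""
    table_name: str
    feature_names: List[str]
    lookup_key: str


@dataclass
class PackagedModel:
    """A model bundled with the feature lookups it was trained with."""
    predict: Callable[[Dict], float]
    lookups: List[FeatureLookupSpec]


def score_batch(model, feature_tables, rows):
    """Join features onto key-only rows using the model's own metadata, then predict.

    feature_tables: table_name -> {key value: {feature_name: value}} (latest values)
    rows: list of dicts that contain only the lookup keys
    """
    scored = []
    for row in rows:
        features = dict(row)
        for spec in model.lookups:
            table_row = feature_tables[spec.table_name].get(row[spec.lookup_key], {})
            for name in spec.feature_names:
                features[name] = table_row.get(name)  # None if the entity is unknown
        features["prediction"] = model.predict(features)
        scored.append(features)
    return scored


tables = {"user_stats": {"usr_001": {"purchase_count_7d": 12, "total_spend_7d": 847.5}}}
toy_model = PackagedModel(
    predict=lambda f: float((f["purchase_count_7d"] or 0) > 5),
    lookups=[FeatureLookupSpec("user_stats", ["purchase_count_7d", "total_spend_7d"], "user_id")],
)
for row in score_batch(toy_model, tables, [{"user_id": "usr_001"}, {"user_id": "usr_002"}]):
    print(row)
```

The scoring caller never names a feature. The training and scoring pipelines cannot drift apart, because the feature list travels inside the model.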
Databricks Feature Store Limitations
The Databricks Feature Store has real constraints you must understand before committing.
Platform lock-in is total. Feature tables are Delta Lake tables on Databricks-managed storage. The FeatureStoreClient SDK only works inside the Databricks runtime or with a valid Databricks workspace connection. You cannot take your feature definitions and run them somewhere else. If you ever leave Databricks, you migrate features manually.
Online serving is limited. Databricks Feature Store does not include a managed online serving store with sub-millisecond latency. For real-time serving you must either use Databricks online tables (available in Unity Catalog, still maturing in 2024), export features to an external Redis/DynamoDB, or accept the latency of a Delta Lake read. This makes it a poor fit for inference endpoints with tight latency SLAs.
Streaming ingestion requires extra configuration. For real-time features, you own a Structured Streaming job that computes the features and writes them to the feature table (fs.write_table accepts a streaming DataFrame). If you need low-latency serving, you also own a separate step that publishes the table to an online store. The path is functional but involves more moving parts than Tecton's native streaming support.
Vertex AI Feature Store
Vertex AI Feature Store is Google's managed feature store, deeply integrated with the Google Cloud ecosystem. If your organization runs on GCP - particularly if your data warehouse is BigQuery and your ML platform is Vertex AI Pipelines - it is the lowest-friction managed option.
Architecture
Vertex AI Feature Store uses BigQuery as the offline store (feature data lives in BigQuery tables) and Bigtable as the online store (a managed key-value store serving low-latency lookups). Both are fully managed GCP services, so there are no servers for you to operate.
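Every feature store splits storage the same way. The offline store keeps full history for training, and the online store keeps only the latest value per entity for serving. The toy in-memory sketch below shows that contract; it is not Vertex AI's implementation, and the names are hypothetical. It includes one rule that is easy to get wrong: a late-arriving older value must not overwrite a newer online value.

```python
from datetime import datetime


class ToyFeatureStore:
    def __init__(self):
        self.offline = []  # append-only history: (entity_id, feature_time, values)
        self.online = {}   # entity_id -> (feature_time, values), latest only

    def ingest(self, entity_id, feature_time, values):
        self.offline.append((entity_id, feature_time, dict(values)))
        current = self.online.get(entity_id)
        # Only move the online copy forward in time; late, older rows go to history only
        if current is None or feature_time >= current[0]:
            self.online[entity_id] = (feature_time, dict(values))

    def read_online(self, entity_id):
        entry = self.online.get(entity_id)
        return None if entry is None else entry[1]


store = ToyFeatureStore()
store.ingest("usr_12345", datetime(2024, 6, 2), {"purchase_count_7d": 15})
store.ingest("usr_12345", datetime(2024, 6, 1), {"purchase_count_7d": 12})  # arrives late
print(store.read_online("usr_12345"))  # {'purchase_count_7d': 15}
print(len(store.offline))  # 2
```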
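The API used below belongs to the original Vertex AI Feature Store, now labeled Legacy. It is organized around three concepts: Featurestore (the top-level container), EntityType (equivalent to Feast's Entity), and Feature (an individual feature column). The newer Vertex AI Feature Store serves data registered directly from BigQuery through FeatureOnlineStore and FeatureView resources instead.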
```python
from google.cloud import aiplatform

aiplatform.init(project="your-gcp-project", location="us-central1")

# Create a Featurestore (the top-level container)
user_featurestore = aiplatform.Featurestore.create(
    featurestore_id="user_features",
    online_store_fixed_node_count=3,  # fixed number of online serving nodes
)

# Create an EntityType (equivalent to Entity in Feast)
user_entity_type = user_featurestore.create_entity_type(
    entity_type_id="user",
    description="A registered platform user",
)

# Create Features
user_entity_type.batch_create_features(
    feature_configs={
        "purchase_count_7d": {
            "value_type": "INT64",
            "description": "Number of purchases in last 7 days",
        },
        "total_spend_7d": {
            "value_type": "DOUBLE",
            "description": "Total spend in USD over last 7 days",
        },
        "preferred_category": {
            "value_type": "STRING",
            "description": "Most purchased category",
        },
    }
)
```
Batch ingestion from BigQuery:
```python
# Ingest feature values from a BigQuery table
user_entity_type.ingest_from_bq(
    feature_ids=["purchase_count_7d", "total_spend_7d", "preferred_category"],
    feature_time="feature_timestamp",
    bq_source_uri="bq://your-project.your_dataset.user_features_staging",
    entity_id_field="user_id",
    worker_count=4,
)
```
Streaming ingestion:
```python
# Write individual feature values programmatically (streaming path)
user_entity_type.write_feature_values(
    instances={
        "usr_12345": {
            "purchase_count_7d": 15,
            "total_spend_7d": 412.50,
        },
        "usr_67890": {
            "purchase_count_7d": 3,
            "total_spend_7d": 89.00,
        },
    }
)
```
Online serving:
```python
# Low-latency online lookup; returns a pandas DataFrame
feature_vector = user_entity_type.read(
    entity_ids=["usr_12345", "usr_67890"],
    feature_ids=["purchase_count_7d", "total_spend_7d"],
)
```
Batch serving to BigQuery (for training):
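```python
# Point-in-time batch serving to BigQuery for training: for each row of the
# read-instances table, export the feature values as of that row's timestamp.
# The read-instances table name is a placeholder; it must hold a "user" column
# (entity IDs) and a "timestamp" column.
user_featurestore = aiplatform.Featurestore(featurestore_name="user_features")
user_featurestore.batch_serve_to_bq(
    bq_destination_output_uri="bq://your-project.training_dataset.training_features",
    serving_feature_ids={
        "user": ["purchase_count_7d", "total_spend_7d", "preferred_category"],
    },
    read_instances_uri="bq://your-project.training_dataset.label_rows",
)
```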
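Whichever export path you use, training data should be point-in-time correct. Each label row receives the latest feature values recorded at or before the label's timestamp, never values from its future. The sketch below shows that join rule in pure Python; the names and data are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(label_rows, feature_history):
    """label_rows: list of (entity_id, label_time).
    feature_history: entity_id -> list of (feature_time, values), sorted by feature_time.
    Returns (entity_id, label_time, values_or_None) for each label row.
    """
    joined = []
    for entity_id, label_time in label_rows:
        history = feature_history.get(entity_id, [])
        times = [t for t, _ in history]
        # Number of snapshots with feature_time <= label_time
        idx = bisect_right(times, label_time)
        values = history[idx - 1][1] if idx > 0 else None
        joined.append((entity_id, label_time, values))
    return joined


history = {
    "usr_1": [
        (datetime(2024, 1, 1), {"purchase_count_7d": 1}),
        (datetime(2024, 1, 5), {"purchase_count_7d": 3}),
    ]
}
labels = [
    ("usr_1", datetime(2024, 1, 3)),
    ("usr_1", datetime(2024, 1, 5)),
    ("usr_1", datetime(2023, 12, 31)),
    ("usr_2", datetime(2024, 1, 3)),
]
for row in point_in_time_join(labels, history):
    print(row)
```

Joining each label to the current feature values instead would leak future information into training. The model would then look better offline than it performs in production.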
When to Use Vertex AI Feature Store
Use it when: your data is already in BigQuery, your ML pipelines run in Vertex AI, and you want no infrastructure of your own to operate. The managed control plane handles provisioning, scaling, and patching, and the BigQuery-Bigtable architecture is designed for very high update volumes. You still size the online serving capacity (the online_store_fixed_node_count above).
Avoid it when:
- you are multi-cloud or primarily on AWS or Azure
- you are highly cost-sensitive (Bigtable nodes are billed continuously, whether or not they serve traffic)
- you need transformation logic beyond what you can express in SQL
The Build vs. Buy Decision Framework
Decision Matrix
| Dimension | Feast | Tecton | Databricks FS | Vertex AI FS |
|---|---|---|---|---|
| Cost | Low (infra only) | High (~$100k+/yr license) | Included w/ Databricks (pay for compute/storage) | Consumption-based |
| Ops burden | High | Low | Medium | Low |
| Streaming support | Requires work | Excellent (native) | Requires work | Good |
| Online serving | Redis (self-managed) | Managed | Limited / Databricks Online Tables | Bigtable (managed) |
| Lock-in | None | Vendor SDK (multi-cloud) | Databricks only | GCP only |
| Team size fit | 1–20 engineers | 20+ engineers | Any (on Databricks) | Any (on GCP) |
| Training-serving consistency | Strong | Strong | Excellent (MLflow integration) | Good |
Total Cost of Ownership Comparison
Scenario: 100 features, 10 models, 10 million entity updates per day, 50,000 online lookups per day.
Feast (self-hosted):
- Redis cache.m6g.large: ~$120/month
- Spot Spark cluster for materialization (2hr/day): ~$200/month
- PostgreSQL RDS db.t3.small for registry: ~$30/month
- Engineering operational overhead: 0.5 FTE × $300,000/year = $150,000/year
- Total annual cost: ~$154,000 (mostly engineering time)
Tecton:
- Licensing: ~$80,000/year (estimate for this scale)
- Engineering operational overhead: ~0.1 FTE = $30,000/year
- Total annual cost: ~$110,000 (mostly licensing)
Vertex AI Feature Store:
- Bigtable (3 nodes): ~$2,700/month
- Ingestion: ~$150/month at this volume
- Engineering operational overhead: ~0.1 FTE = $30,000/year
- Total annual cost: ~$64,000 (Bigtable is the largest line item)
These estimates are rough and highly scenario-dependent. The key insight is that at moderate scale, the engineering overhead of self-hosted Feast is the dominant cost - not the infrastructure. At small scale (fewer than 50 features), Feast's infrastructure cost is negligible and the operational overhead is manageable by a single engineer.
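These totals are simple arithmetic, so it is worth re-running them with your own numbers. The sketch below reproduces the scenario. The $300,000 fully loaded cost per FTE is inferred from the "0.1 FTE = $30,000/year" lines above, and every input is an estimate, not a quote.

```python
def annual_tco(monthly_infra_usd, ops_fte, fte_cost_usd, annual_license_usd=0.0):
    """Annual cost = 12 x monthly infrastructure + operations time + license."""
    return 12 * sum(monthly_infra_usd) + ops_fte * fte_cost_usd + annual_license_usd


FTE_COST_USD = 300_000  # implied by "0.1 FTE = $30,000/year"

scenarios = {
    # Redis, spot Spark cluster, registry RDS; 0.5 FTE of operations
    "Feast (self-hosted)": annual_tco([120, 200, 30], 0.5, FTE_COST_USD),
    # No separate infrastructure line; license plus 0.1 FTE
    "Tecton": annual_tco([], 0.1, FTE_COST_USD, annual_license_usd=80_000),
    # Bigtable nodes and ingestion; 0.1 FTE
    "Vertex AI Feature Store": annual_tco([2_700, 150], 0.1, FTE_COST_USD),
}
for name, cost in scenarios.items():
    print(f"{name}: ${cost:,.0f}")
```

Changing the per-FTE cost or the operations fraction moves the Feast total far more than any infrastructure line does.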
Migrating from Feast to a Managed Feature Store
Migration without breaking production requires a parallel-run strategy. You cannot cut over all models simultaneously.
Phase 1: Define features in the new system (2–4 weeks)
Re-implement your feature definitions in the target system's SDK. Do not modify the Feast definitions - keep Feast running in production throughout. Run the new system's materialization in parallel, writing to its own separate online store.
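```python
# Example: the same feature re-implemented in Tecton
from datetime import timedelta

from feast import FeatureView
from tecton import batch_feature_view, FilteredSource

# Your existing Feast feature view (user and user_stats_source live in the Feast repo)
user_stats_fv = FeatureView(
    name="user_stats",
    entities=[user],
    ttl=timedelta(days=7),
    schema=[...],
    source=user_stats_source,
)

# Equivalent Tecton batch feature view
# (tecton_user and tecton_user_events_source live in the Tecton repo)
@batch_feature_view(
    sources=[FilteredSource(tecton_user_events_source)],
    entities=[tecton_user],
    mode="spark_sql",
    batch_schedule=timedelta(hours=1),
    ttl=timedelta(days=7),
    online=True,
)
def user_stats(user_events):
    return "SELECT user_id, timestamp, ..."
```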
Phase 2: Shadow mode validation (2 weeks)
Instrument your inference endpoints to call both Feast and the new system and compare results. Log every discrepancy. Common sources of divergence: different window boundary semantics, different null handling, different timestamp timezone handling.
```python
# Shadow mode comparison in inference endpoint
import logging

logger = logging.getLogger(__name__)

feast_features = feast_store.get_online_features(
    features=["user_stats:purchase_count_7d"],
    entity_rows=[{"user_id": user_id}],
    full_feature_names=True,  # keys become "user_stats__purchase_count_7d"
).to_dict()

tecton_features = tecton_service.get_feature_vector(
    join_keys={"user_id": user_id}
).to_pandas().to_dict(orient="records")[0]

# Log discrepancies
feast_val = feast_features["user_stats__purchase_count_7d"][0]
tecton_val = tecton_features.get("purchase_count_7d")
if feast_val != tecton_val:
    logger.warning(
        "Feature value mismatch",
        extra={
            "user_id": user_id,
            "feature": "purchase_count_7d",
            "feast_value": feast_val,
            "tecton_value": tecton_val,
        },
    )
```
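An exact != comparison, as in the shadow code above, flags harmless floating-point noise. It also buries the null-versus-zero question inside a warning log. The sketch below compares with a tolerance and makes the null policy an explicit choice. The tolerance values and the avg_basket feature are illustrative assumptions.

```python
import math


def _normalize(value, null_as_zero):
    # Treat float NaN like a missing value; optionally map missing to 0
    if isinstance(value, float) and math.isnan(value):
        value = None
    if value is None and null_as_zero:
        value = 0
    return value


def values_match(a, b, rel_tol=1e-6, abs_tol=1e-9, null_as_zero=False):
    a, b = _normalize(a, null_as_zero), _normalize(b, null_as_zero)
    if a is None or b is None:
        return a is None and b is None
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    return a == b


def compare_feature_vectors(old, new, **tolerances):
    """Return {feature_name: (old_value, new_value)} for every mismatch."""
    mismatches = {}
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name), new.get(name)
        if not values_match(a, b, **tolerances):
            mismatches[name] = (a, b)
    return mismatches


feast_vals = {"purchase_count_7d": 12, "total_spend_7d": 847.5, "avg_basket": None}
tecton_vals = {"purchase_count_7d": 12, "total_spend_7d": 847.5000001, "avg_basket": 0.0}
print(compare_feature_vectors(feast_vals, tecton_vals))  # {'avg_basket': (None, 0.0)}
print(compare_feature_vectors(feast_vals, tecton_vals, null_as_zero=True))  # {}
```

Whether null_as_zero is acceptable depends on how the model was trained. Decide it once per feature rather than silencing the warning.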
Phase 3: Canary cutover (1 week per model)
Route 5% of traffic to use the new feature store exclusively. Monitor model metrics (click-through rate, fraud catch rate, AUC on held-out validation set). Increase to 50%, then 100%. Migrate models one at a time, starting with the least business-critical.
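Routing on a hash of the entity ID, rather than a per-request coin flip, keeps each user on one feature store throughout the canary. As a result, a user's features and predictions do not alternate between systems. The sketch below shows one way to do it; the salt and the 10,000-bucket granularity are arbitrary choices.

```python
import hashlib


def in_canary(entity_id: str, percent: float, salt: str = "feature-store-migration") -> bool:
    """Deterministically place entity_id in the canary for `percent`% of entities.

    The same entity always lands in the same bucket, so raising the percentage
    only adds entities; nobody flips back to the old feature store.
    """
    digest = hashlib.sha256(f"{salt}:{entity_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100  # percent=5 -> buckets 0..499


user_ids = [f"usr_{i}" for i in range(10_000)]
share = sum(in_canary(u, 5) for u in user_ids) / len(user_ids)
print(f"{share:.3f}")
```

With 10,000 IDs, the printed share should land close to 0.05. Moving from 5% to 50% keeps every user who was already in the canary.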
Phase 4: Decommission Feast
Once all models have migrated and operated on the new system for at least two weeks without incidents, decommission the Feast materialization jobs, drain the Feast Redis instance, and archive the feature repository.
Vendor lock-in with managed feature stores is structural, not accidental.
Your feature definitions are written in the vendor's SDK (Tecton decorators, Databricks FeatureStoreClient, Vertex EntityType). They are not portable to another system without rewriting. The data is stored in the vendor's managed infrastructure and is often hard to export in bulk in a portable form. If the vendor raises prices by 5x, goes out of business, or deprecates the product, your migration path is a full rewrite. Mitigate this by maintaining a vendor-agnostic feature specification document: a simple CSV or YAML file listing each feature's business definition, computation logic, and data source, so you can re-implement elsewhere.
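The specification can be as plain as one row per feature. The minimal sketch below uses Python's csv module. The column set and the example rows (drawn from the features in this lesson) are assumptions to adapt, not a standard format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class FeatureSpec:
    name: str
    entity: str
    business_definition: str
    computation: str
    data_source: str
    freshness: str


SPECS = [
    FeatureSpec(
        name="purchase_count_7d",
        entity="user_id",
        business_definition="Number of purchases by the user in the trailing 7 days",
        computation="COUNT of PURCHASE events in [as_of - 7 days, as_of]",
        data_source="user_events",
        freshness="hourly",
    ),
    FeatureSpec(
        name="total_spend_7d",
        entity="user_id",
        business_definition="Total purchase amount in USD over the trailing 7 days",
        computation="SUM(purchase_amount) over PURCHASE events in the same window",
        data_source="user_events",
        freshness="hourly",
    ),
]


def specs_to_csv(specs):
    """Serialize specs to CSV text with a header row (csv handles quoting)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(FeatureSpec)])
    writer.writeheader()
    for spec in specs:
        writer.writerow(asdict(spec))
    return buffer.getvalue()


print(specs_to_csv(SPECS))
```

Spell out window boundaries and null handling in the computation column. Those details diverge most often when a feature is re-implemented.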
Managed does not mean zero operations - it means different operations. With Tecton or Vertex AI Feature Store, you still own: the pipelines that deliver raw data to the feature store's sources (your Kafka topics, your BigQuery tables, your S3 paths), the data quality of that raw data, the model integration code that calls the feature store SDK at serving time, the entity key management (who generates user IDs, are they stable, how do they handle user merges), and the feature freshness requirements documentation. "Managed" means the vendor runs the infrastructure. It does not mean the vendor runs your data pipelines.
Production Engineering Notes
Caching at the application layer. Even with a managed online store, adding an application-level cache (a local in-memory cache keyed by entity ID with a 10-second TTL) can reduce serving latency from 3ms to under 0.1ms for hot entities. In recommendation systems where the same popular item is scored thousands of times per second, this cache hit rate is extremely high.
Feature versioning. All three managed options support versioning feature views (Tecton has explicit versioning, Databricks uses the Delta Lake time travel, Vertex AI has point-in-time reads). Use this capability to run A/B tests between feature versions: some models receive features computed by the old transformation, others receive features from the new transformation. Compare model metrics before promoting the new version.
Cross-region serving. Managed feature stores typically replicate online store data across regions automatically. Verify the replication lag and consistency model for your specific deployment. Some managed stores use eventual consistency with a lag of up to 15 seconds - for features that need to reflect events in the last 5 seconds, this matters.
Interview Q&A
Q1: What is the core trade-off between Feast (self-hosted) and Tecton (managed)?
The trade-off is operational burden versus cost. Feast is free but requires your team to operate Redis, Spark materialization, the PostgreSQL registry, and monitoring infrastructure. The engineering overhead scales with the number of features and the complexity of your streaming requirements. Tecton costs on the order of $100,000 or more per year, but it eliminates that operational surface area almost entirely. The decision depends on:
- your team's size
- your infrastructure maturity
- whether you have dedicated ML platform engineers who can absorb the operational work

For teams under 10 ML engineers or under 100 features, Feast's operational overhead is usually manageable. For larger teams building more than 50 new models per year, the math often favors managed.
Q2: What makes Databricks Feature Store unique compared to Feast or Tecton?
The unique capability is automatic feature lookup at model scoring time via MLflow integration. When you log a model with fs.log_model(), the model carries metadata about which feature table and columns it needs. fs.score_batch() reads that metadata and retrieves current feature values automatically - you provide entity keys, not feature values. This is valuable because it eliminates the separate feature retrieval code in batch scoring pipelines, which tends to diverge from the training pipeline over time and create training-serving skew. The limitation is total Databricks lock-in and limited online serving capabilities.
Q3: When would you choose Vertex AI Feature Store over Tecton for a GCP-native deployment?
The primary case for Vertex AI Feature Store over Tecton on GCP is tight integration with BigQuery as the offline store. If your feature computation runs in BigQuery (dbt models outputting to BigQuery tables, BigQuery ML, or BigQuery SQL jobs), Vertex AI Feature Store can ingest directly from BigQuery without an intermediate export step. This eliminates data movement costs and latency. Tecton also supports BigQuery as a source, but the Vertex AI integration is native and requires no SDK installation or vendor credentials on the BigQuery side. Cost is also a factor: at moderate scale, Bigtable plus engineering overhead is often cheaper than Tecton licensing.
Q4: How do you handle the training-serving skew risk when migrating from Feast to a managed feature store?
Training-serving skew during migration is the highest-risk failure mode. The mitigation is a parallel shadow run with explicit value comparison. During migration, both the old (Feast) and new (Tecton/Vertex) systems materialize the same features from the same source data. Inference endpoints query both systems and log every case where values diverge. Common sources of divergence include:
- window boundaries: does a "7-day window" include or exclude the current day?
- null handling: does the system return 0 or null for an entity with no history?
- timestamps: are they interpreted in different timezones?

Resolve all divergences before any traffic cutover. The shadow run should span several full materialization cycles and ideally a full weekly traffic pattern. If features are refreshed daily, run shadow mode for at least a week.
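The window-boundary divergence is easy to reproduce. Both settings of the function below could reasonably be called a "7-day count", yet they disagree on the same data. They are illustrative conventions, not the behavior of any particular feature store.

```python
from datetime import date, timedelta


def count_7d(event_dates, as_of, include_as_of_day=True, days=7):
    """Count events in a `days`-long calendar window ending at as_of.

    include_as_of_day=True:  [as_of - (days - 1), as_of]   (today counts)
    include_as_of_day=False: [as_of - days, as_of - 1]     (complete days only)
    """
    if include_as_of_day:
        start, end = as_of - timedelta(days=days - 1), as_of
    else:
        start, end = as_of - timedelta(days=days), as_of - timedelta(days=1)
    return sum(1 for d in event_dates if start <= d <= end)


purchases = [date(2024, 3, 1), date(2024, 3, 5), date(2024, 3, 8), date(2024, 3, 8)]
as_of = date(2024, 3, 8)
print(count_7d(purchases, as_of, include_as_of_day=True))   # 3
print(count_7d(purchases, as_of, include_as_of_day=False))  # 2
```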
Q5: What does "managed" actually mean - what responsibilities does the vendor NOT take over?
"Managed" means the vendor operates the feature store infrastructure: the online store, the materialization compute, the registry, and the serving layer. It does not mean the vendor takes over your data pipelines. You still own: the upstream data sources (Kafka topics, S3 buckets, BigQuery tables) and the pipelines that keep them fresh and correct, the data quality of those sources (nulls, outliers, schema drift), the entity key lifecycle management (how user IDs are assigned, merged, deleted), the feature freshness requirements for each model (which features must be updated every second vs. every day), and the model integration code (how your inference service calls the feature store SDK). Managed feature stores are infrastructure-as-a-service, not feature-engineering-as-a-service.
Q6: Describe a migration strategy from Feast to Tecton that minimizes risk to production models.
A safe migration follows four phases. Phase 1 (2–4 weeks): re-implement all feature definitions in Tecton's SDK while keeping Feast running. Run Tecton materialization in parallel, writing to Tecton's separate online store. Phase 2 (2 weeks): enable shadow mode - inference endpoints query both Feast and Tecton and log value comparisons. Investigate and resolve every divergence. Phase 3 (1–2 weeks per model): canary cutover per model, starting with the least critical. Route 5% of traffic to Tecton, verify model metrics match expectations, increase to 100%. Phase 4: decommission Feast after all models have been running on Tecton for 2+ weeks without incidents. The key risk mitigations are: never cut over models in bulk, always run shadow mode first, always verify model performance metrics (not just feature value equality) before promoting.
Q7: How do you evaluate whether to use on-demand feature views vs. pre-materialized features in a managed feature store?
The decision criteria are:
(1) Can the feature be computed in advance? If it requires data only available at request time (current GPS coordinates, current request amount, the exact time of the request), it must be on-demand.
(2) What is the cardinality? If the feature has extremely high cardinality and most combinations are requested only once, pre-materialization wastes storage, and on-demand computation is more efficient.
(3) What is the computational cost? On-demand feature views execute synchronously in the serving path, so every millisecond of computation adds directly to serving latency. Pre-materialized features retrieved from Redis or Bigtable take 1–3ms including the lookup, so an on-demand transform should cost less than the lookup it replaces.
(4) Does the feature combine multiple pre-retrieved features? Ratios, differences, and interactions between pre-retrieved features are natural candidates for on-demand views. They are cheap to compute, and they avoid storing derived values when the base features are already stored.
Feature Store Integration Patterns for Inference Services
Regardless of which feature store you choose, the inference service integration follows a common pattern. Understanding this pattern helps you build serving code that is resilient, low-latency, and testable.
The Synchronous Lookup Pattern
The most common pattern: the inference endpoint receives a request, looks up feature values for the entity, assembles a feature vector, runs the model, and returns a prediction. All steps execute synchronously within the request lifetime.
```python
# FastAPI inference endpoint with Tecton feature lookup
import logging
import time

import joblib
import numpy as np
import pandas as pd
import tecton
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
logger = logging.getLogger(__name__)

# Load the model and resolve the feature service once, at startup
model = joblib.load("models/fraud_model_v3.pkl")
fraud_service = tecton.get_feature_service("fraud_detection_v3")


class PredictionRequest(BaseModel):
    user_id: str
    transaction_amount: float
    merchant_id: str


class PredictionResponse(BaseModel):
    user_id: str
    fraud_probability: float
    model_version: str
    feature_latency_ms: float
    total_latency_ms: float


# Declared with `def`, not `async def`: the feature lookup and predict_proba are
# blocking calls, so FastAPI runs this handler in its worker threadpool instead
# of stalling the event loop for every concurrent request.
@app.post("/predict", response_model=PredictionResponse)
def predict_fraud(request: PredictionRequest):
    request_start = time.perf_counter()

    # Feature retrieval
    feature_start = time.perf_counter()
    try:
        features = fraud_service.get_feature_vector(
            join_keys={
                "user_id": request.user_id,
                "merchant_id": request.merchant_id,
            },
            request_data={
                "transaction_amount": request.transaction_amount,
            },
            include_on_demand_feature_views=True,
        ).to_pandas().to_dict(orient="records")[0]
    except Exception as e:
        logger.error("Feature retrieval failed for user %s: %s", request.user_id, e)
        raise HTTPException(status_code=503, detail="Feature store unavailable") from e
    feature_latency_ms = (time.perf_counter() - feature_start) * 1000

    # Check for null features (entity cold start). After the round-trip through
    # pandas, missing values usually arrive as NaN rather than None, so test
    # with pd.isna instead of `is None`.
    critical_features = ["purchase_count_7d", "total_spend_7d"]
    for feat in critical_features:
        if pd.isna(features.get(feat)):
            logger.warning("Null feature %s for user %s - using default", feat, request.user_id)
            features[feat] = 0  # safe default for new users

    # Assemble the feature vector in the exact column order used at training time
    feature_vector = np.array([[
        features["purchase_count_7d"],
        features["total_spend_7d"],
        features["transaction_amount"],  # on-demand feature
        features["spend_velocity_ratio"],  # on-demand derived feature
        features["user_merchant_purchase_count"],
    ]])

    # Model inference: column 1 is the fraud class, assuming classes_ == [0, 1]
    fraud_prob = model.predict_proba(feature_vector)[0][1]
    total_latency_ms = (time.perf_counter() - request_start) * 1000

    return PredictionResponse(
        user_id=request.user_id,
        fraud_probability=float(fraud_prob),
        model_version="fraud_v3",
        feature_latency_ms=feature_latency_ms,
        total_latency_ms=total_latency_ms,
    )
```
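The endpoint above turns any feature store failure into a 503. One way to make the same lookup step resilient and unit-testable is to bound it with a deadline, fall back to per-feature defaults, and report which path was taken. The sketch below assumes a hypothetical `fetch_features` callable (in production it would wrap the feature store client), a hypothetical 10 ms budget, and stub fetchers standing in for the store; it is framework-free so the logic can be tested in isolation.

```python
import concurrent.futures
import math
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

# One shared pool per process; lookups run here so the caller can stop waiting.
_LOOKUP_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


@dataclass
class LookupResult:
    values: List[float]       # feature vector in the model's column order
    used_defaults: List[str]  # features that fell back to a default
    status: str               # "ok", "timeout", or "error"


def lookup_with_deadline(
    fetch_features: Callable[[str], Dict[str, float]],  # hypothetical store wrapper
    entity_id: str,
    feature_order: List[str],
    defaults: Dict[str, float],
    timeout_s: float = 0.010,  # hypothetical 10 ms feature budget
) -> LookupResult:
    future = _LOOKUP_POOL.submit(fetch_features, entity_id)
    try:
        raw, status = future.result(timeout=timeout_s), "ok"
    except concurrent.futures.TimeoutError:
        # The call keeps running in the pool; we only stop waiting for it.
        raw, status = {}, "timeout"
    except Exception:
        raw, status = {}, "error"

    values, used_defaults = [], []
    for name in feature_order:
        value = raw.get(name)
        # Missing keys and NaN (how pandas often encodes nulls) both get defaults
        if value is None or (isinstance(value, float) and math.isnan(value)):
            value = defaults[name]
            used_defaults.append(name)
        values.append(value)
    return LookupResult(values, used_defaults, status)


# Stub fetchers: the serving logic can be tested without a feature store
def fast_fetch(entity_id: str) -> Dict[str, float]:
    return {"purchase_count_7d": 3.0, "total_spend_7d": float("nan")}


def slow_fetch(entity_id: str) -> Dict[str, float]:
    time.sleep(0.05)  # simulates a lookup that blows a 10 ms budget
    return {"purchase_count_7d": 3.0, "total_spend_7d": 42.0}


FEATURES = ["purchase_count_7d", "total_spend_7d"]
DEFAULTS = {"purchase_count_7d": 0.0, "total_spend_7d": 0.0}
```

Returning the status alongside the vector is the important design choice: the serving path can count fallbacks as errors instead of silently scoring on defaults. Note that a timeout does not cancel the in-flight call, so the pool needs headroom for occasional stragglers.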
Application-Level Feature Caching
For high-traffic inference endpoints where the same entity key appears in thousands of requests per minute, an in-process cache reduces feature store calls dramatically.
```python
import threading
import time
from collections import OrderedDict
from typing import Optional


class TimedFeatureCache:
    """
    Thread-safe LRU cache with a TTL for feature values.
    Suitable for features that change infrequently (e.g., user profile features).
    Not suitable for real-time streaming features where freshness is critical.
    """

    def __init__(self, maxsize: int = 10_000, ttl_seconds: float = 30):
        # key -> (stored_at, value); OrderedDict order tracks recency of use
        self._cache: OrderedDict = OrderedDict()
        self._lock = threading.Lock()
        self._maxsize = maxsize
        self._ttl = ttl_seconds

    def get(self, key: str) -> Optional[dict]:
        with self._lock:
            entry = self._cache.get(key)
            if entry is None:
                return None
            stored_at, value = entry
            # time.monotonic() is unaffected by wall-clock changes (NTP, DST)
            if time.monotonic() - stored_at > self._ttl:
                del self._cache[key]
                return None
            self._cache.move_to_end(key)  # mark as most recently used
            return value

    def set(self, key: str, value: dict) -> None:
        with self._lock:
            self._cache[key] = (time.monotonic(), value)
            self._cache.move_to_end(key)
            if len(self._cache) > self._maxsize:
                # Evict the least recently used entry in O(1)
                self._cache.popitem(last=False)


# Usage in inference endpoint
feature_cache = TimedFeatureCache(maxsize=50_000, ttl_seconds=30)


def get_features_with_cache(user_id: str) -> dict:
    cached = feature_cache.get(user_id)
    if cached is not None:
        return cached
    # Cache miss - retrieve from the feature store. This assumes a feature
    # service keyed only on user_id (user-level features); fraud_detection_v3
    # above also needs merchant_id and request data.
    features = fraud_service.get_feature_vector(
        join_keys={"user_id": user_id}
    ).to_pandas().to_dict(orient="records")[0]
    feature_cache.set(user_id, features)
    return features
```
A 30-second TTL cache on user-level features typically cuts feature store calls by 80–95% for recommendation and fraud scoring workloads, where the same users recur within short time windows; the exact reduction depends on how many requests each user sends within one TTL window. Do not apply this cache to streaming features that must reflect events from the last 10 seconds: cache them separately with a 5-second TTL, or skip caching entirely.
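The advice to give streaming features their own short TTL (or none) can be sketched as a cache keyed by (feature group, entity) with one TTL per group, where a TTL of 0 disables caching. This is a minimal single-threaded sketch under stated assumptions: the group names and TTL values are hypothetical, the clock is injectable so expiry can be tested without sleeping, and there is no size bound, so in production you would pair it with LRU eviction like `TimedFeatureCache`.

```python
import time
from typing import Callable, Dict, Tuple


class TieredTTLCache:
    """Caches feature groups with group-specific TTLs in seconds; TTL 0 = never cache.

    Single-threaded and unbounded: a sketch of the TTL policy only.
    """

    def __init__(self, ttls: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic):
        self._ttls = ttls
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, dict]] = {}

    def get_or_fetch(self, group: str, entity_id: str,
                     fetch: Callable[[str], dict]) -> dict:
        ttl = self._ttls.get(group, 0.0)  # unknown groups are never cached
        now = self._clock()
        if ttl > 0:
            entry = self._entries.get((group, entity_id))
            if entry is not None and now - entry[0] <= ttl:
                return entry[1]  # still fresh: no feature store call
        value = fetch(entity_id)
        if ttl > 0:
            self._entries[(group, entity_id)] = (now, value)
        return value


# Hypothetical freshness classes: profile (30 s), streaming (5 s), real-time (no cache)
cache = TieredTTLCache({"user_profile": 30.0, "user_streaming": 5.0, "txn_realtime": 0.0})
```

As a worked case under this policy, a user who sends one request every 2 seconds for a minute triggers 2 store calls for the `user_profile` group instead of 30.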
Feature Store Observability: The Three-Layer Stack
Production feature stores require observability at three layers:
Layer 1 - Pipeline Health: Are materialization jobs completing successfully and on schedule? Monitor Airflow task success rates, Tecton workspace alerts, or Databricks job run history. Alert on job failure, job duration exceeding 2x the p95 baseline, and missing runs.
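The three Layer 1 rules can be expressed as one check over a job's run history. A minimal sketch, assuming a hypothetical `JobRun` record exported from whichever scheduler you use; the p95 uses the nearest-rank method, and "missing run" is interpreted here as no new run starting within 1.5x the expected interval (the 1.5 grace factor is an assumption, not part of the rule above).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobRun:  # hypothetical record exported from the scheduler
    started_at: float  # epoch seconds
    duration_s: float
    succeeded: bool


def p95_nearest_rank(values: List[float]) -> float:
    ordered = sorted(values)
    rank = -(-95 * len(ordered) // 100)  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def pipeline_alerts(history: List[JobRun], latest: JobRun, now: float,
                    expected_interval_s: float, grace: float = 1.5) -> List[str]:
    """Layer 1 checks: failure, duration > 2x baseline p95, missing run."""
    alerts = []
    if not latest.succeeded:
        alerts.append("job_failed")
    baseline = [run.duration_s for run in history if run.succeeded]
    if baseline and latest.duration_s > 2 * p95_nearest_rank(baseline):
        alerts.append("duration_over_2x_p95")
    # No new run has started within the expected schedule plus some slack
    if now - latest.started_at > grace * expected_interval_s:
        alerts.append("missing_run")
    return alerts
```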
Layer 2 - Data Quality: Are feature values within expected statistical bounds? For each feature view, track the null rate, mean, standard deviation, min, max, and a small set of quantiles (p1, p25, p50, p75, p99), computed after each materialization. Compare each metric against its 7-day rolling baseline, and alert if it moves more than 3 standard deviations (measured across that metric's own baseline values) away from its baseline mean.
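A sketch of the Layer 2 check with NumPy: compute the listed statistics for one materialization, then compare each against the mean and standard deviation of the same statistic over the baseline snapshots (for example, seven daily ones). The function names are hypothetical, and how the baseline snapshots are stored is left open.

```python
from typing import Dict, List

import numpy as np

QUANTILES = [1, 25, 50, 75, 99]


def feature_stats(values) -> Dict[str, float]:
    """Summary statistics for one feature after one materialization.

    Assumes at least one non-null value; nulls are encoded as NaN.
    """
    values = np.asarray(values, dtype=float)
    present = values[~np.isnan(values)]
    stats = {
        "null_rate": float(np.isnan(values).mean()),
        "mean": float(present.mean()),
        "std": float(present.std()),
        "min": float(present.min()),
        "max": float(present.max()),
    }
    for q, v in zip(QUANTILES, np.percentile(present, QUANTILES)):
        stats[f"p{q}"] = float(v)
    return stats


def drift_alerts(current: Dict[str, float], baseline: List[Dict[str, float]],
                 n_sigma: float = 3.0) -> List[str]:
    """Metrics more than n_sigma baseline std devs away from their baseline mean."""
    alerts = []
    for metric, value in current.items():
        history = np.array([snapshot[metric] for snapshot in baseline])
        mu, sigma = history.mean(), history.std()
        if sigma == 0:
            # Perfectly constant baseline: flag any change at all
            if value != mu:
                alerts.append(metric)
        elif abs(value - mu) > n_sigma * sigma:
            alerts.append(metric)
    return alerts
```

The zero-variance branch is deliberately strict (a null rate that has always been 0 alerts on the first null); a production check would usually add a small tolerance floor.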
Layer 3 - Serving Health: Are feature lookups completing within SLA at serving time? Track: p50, p95, p99 latency, error rate (null returns, connection failures, timeouts), and throughput. Alert if p99 latency exceeds your SLA (typically 10ms for synchronous serving).
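The Layer 3 metrics reduce to percentiles and ratios over a window of lookups. A minimal NumPy sketch, assuming you already record one latency sample per lookup and count errors (null returns, connection failures, timeouts); the 10 ms default mirrors the typical SLA mentioned above and should be replaced with yours.

```python
from typing import Dict, List

import numpy as np


def serving_health(latencies_ms: List[float], errors: int, window_s: float,
                   p99_sla_ms: float = 10.0) -> Dict[str, object]:
    """Summarize one window of feature lookups and flag a p99 SLA breach.

    latencies_ms: one sample per lookup in the window, failed lookups included
    errors: lookups that returned nulls, failed to connect, or timed out
    """
    lat = np.asarray(latencies_ms, dtype=float)
    p50, p95, p99 = np.percentile(lat, [50, 95, 99])
    total = len(lat)
    report = {
        "p50_ms": float(p50),
        "p95_ms": float(p95),
        "p99_ms": float(p99),
        "error_rate": errors / total,
        "throughput_rps": total / window_s,
    }
    report["p99_breach"] = report["p99_ms"] > p99_sla_ms
    return report
```

Alerting on p99 rather than the mean matters here: in a window where 98 lookups take 2 ms and two take 50-60 ms, the median stays at 2 ms while p99 lands near 50 ms, well past a 10 ms SLA.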
Together, these three layers let you tell apart three failure modes that look identical from the model's metrics alone: the features are wrong (Layer 2 signal), the features are stale (Layer 1 signal), or the serving path is timing out and returning defaults (Layer 3 signal). Without all three layers, debugging production ML issues becomes archaeology.
Embedded Feature Stores: When to Skip the Feature Store Entirely
There is a class of ML systems where a dedicated feature store adds more complexity than it removes. These are characterized by:
- Fewer than 10 features per model
- All features computed at request time from the request payload (no historical state)
- A single team owning both the models and the features
- Model update frequency faster than feature pipeline update frequency
In these cases, compute features directly in the inference service and skip the feature store abstraction:
```python
# Embedded feature computation - no feature store needed
from datetime import datetime, timezone

import numpy as np

# Illustrative (hypothetical) category encoding; in practice, build it from the
# training data and version it alongside the model.
CATEGORY_ENCODING = {"grocery": 1, "electronics": 2, "travel": 3}


def compute_request_features(user_id: str, transaction: dict) -> dict:
    """Compute all features from available context without a feature store."""
    # Request-time features: read the clock once so hour and weekday agree
    now = datetime.now(timezone.utc)
    hour_of_day = now.hour
    is_weekend = now.weekday() >= 5
    amount_log = np.log1p(transaction["amount"])

    # Features from the request context
    is_new_merchant = transaction.get("is_first_purchase_at_merchant", True)
    merchant_category = transaction.get("merchant_category", "unknown")

    return {
        "hour_of_day": hour_of_day,
        "is_weekend": int(is_weekend),
        "amount_log": amount_log,
        "is_new_merchant": int(is_new_merchant),
        "merchant_category_encoded": CATEGORY_ENCODING.get(merchant_category, 0),
    }
```
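Without a feature store, training/serving consistency has to come from running the same feature function on both paths. The `compute_request_features` function above reads the server clock, which is right at serving time but wrong when backfilling training rows from historical transactions. A minimal variation, assuming each historical record carries its own event timestamp (the `event_time` argument and `compute_features_at` name are hypothetical), passes the time in explicitly so one function serves both paths. The category encoding is the same hypothetical mapping, repeated so the block runs on its own.

```python
from datetime import datetime, timezone
from typing import Optional

import numpy as np

# Hypothetical category encoding (illustrative values only)
CATEGORY_ENCODING = {"grocery": 1, "electronics": 2, "travel": 3}


def compute_features_at(transaction: dict, event_time: Optional[datetime] = None) -> dict:
    """Same features as compute_request_features, with the event time made explicit.

    Serving omits event_time (uses the current UTC time); a training backfill
    passes each historical transaction's own timestamp.
    """
    ts = event_time if event_time is not None else datetime.now(timezone.utc)
    category = transaction.get("merchant_category", "unknown")
    return {
        "hour_of_day": ts.hour,
        "is_weekend": int(ts.weekday() >= 5),
        "amount_log": float(np.log1p(transaction["amount"])),
        "is_new_merchant": int(transaction.get("is_first_purchase_at_merchant", True)),
        "merchant_category_encoded": CATEGORY_ENCODING.get(category, 0),
    }


# Training backfill: replay historical transactions with their own timestamps
history = [
    ({"amount": 120.0, "merchant_category": "travel"},
     datetime(2024, 3, 9, 14, 30, tzinfo=timezone.utc)),  # a Saturday afternoon
]
training_rows = [compute_features_at(txn, ts) for txn, ts in history]
```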
The rule is: add a feature store when the operational complexity of managing feature pipelines across multiple teams exceeds the operational complexity of the feature store itself. For single-team systems with fewer than 50 features, that crossover rarely happens.
