Skip to main content

Hash Maps and Sets

Hash table internals, collision handling, frequency counting, grouping patterns, two-sum variations, and set operations - with ML-relevant practice problems and full Python solutions.

Reading time: ~35 min | Interview relevance: Very High | Roles: All AI/ML roles

The Real Interview Moment

You are in an on-site interview at Google. The interviewer, a senior MLE on the recommendations team, writes a problem on the whiteboard: "Given a list of feature names used across multiple models in our system, find all features that appear in exactly K models. The list contains (feature_name, model_id) pairs, and there can be millions of entries."

Your first instinct might be to sort the list by feature name, then scan for groups of size K. That works - O(n log n) time. But the interviewer is watching your face, waiting to see if you reach for the more natural tool.

You say: "I will use a hash map. First, I will build a mapping from each feature name to a set of model IDs. Then I will iterate through the map and collect features where the set size equals K. Total time O(n), space O(n)."

The interviewer nods. "Good. Can you code that up?"

You write it in four minutes. Clean, correct, handles edge cases. The interviewer moves to the follow-up: "What if the data does not fit in memory?" Now you are in system design territory, discussing distributed hash maps and MapReduce. But you earned that conversation by nailing the fundamentals first.

Hash maps are the single most versatile data structure in coding interviews. This chapter makes them second nature.

Hash Table Internals

Understanding how hash tables work under the hood helps you reason about performance and answer follow-up questions confidently.

How a Hash Table Works

How a Hash Table Works

Step by step:

  1. Hash the key - Convert the key into an integer using a hash function
  2. Compute the index - Take the hash modulo the table size: index = hash(key) % table_size
  3. Store in bucket - Place the (key, value) pair at that index
  4. Handle collisions - When two keys map to the same index

Collision Handling Strategies

StrategyHow It WorksProsCons
ChainingEach bucket holds a linked list of entriesSimple, never "full"Cache-unfriendly, extra memory for pointers
Open Addressing (linear probing)If bucket occupied, try next bucketCache-friendly, no extra pointersClustering, degrades as load factor rises
Open Addressing (quadratic probing)Try bucket + 1, + 4, + 9, ...Less clustering than linearCan fail to find empty bucket
Robin Hood HashingSteal from "rich" (close to home) entriesLow variance in probe lengthsComplex implementation

Python's dict uses open addressing with a custom probing sequence. CPython uses a perturbation-based scheme that mixes in higher bits of the hash.

60-Second Answer

When an interviewer asks "What is the worst-case time complexity of a hash table lookup?", the answer is O(n) - if all keys hash to the same bucket, it degenerates to a linear scan. But emphasize that this is astronomically unlikely with a good hash function, and the expected (average-case) complexity is O(1).

Load Factor and Resizing

# Conceptual implementation of hash table resizing
class SimpleHashMap:
def __init__(self, initial_size=8):
self.size = initial_size
self.count = 0
self.buckets = [[] for _ in range(self.size)]
self.load_factor_threshold = 0.75

def _hash(self, key):
return hash(key) % self.size

def put(self, key, value):
idx = self._hash(key)
# Check if key exists and update
for i, (k, v) in enumerate(self.buckets[idx]):
if k == key:
self.buckets[idx][i] = (key, value)
return

self.buckets[idx].append((key, value))
self.count += 1

# Resize if load factor exceeds threshold
if self.count / self.size > self.load_factor_threshold:
self._resize()

def get(self, key):
idx = self._hash(key)
for k, v in self.buckets[idx]:
if k == key:
return v
raise KeyError(key)

def _resize(self):
"""Double the table size and rehash all entries."""
old_buckets = self.buckets
self.size *= 2
self.buckets = [[] for _ in range(self.size)]
self.count = 0
for bucket in old_buckets:
for key, value in bucket:
self.put(key, value)
Common Trap

A common interview mistake is claiming hash table operations are always O(1). They are O(1) on average (amortized). The worst case is O(n) due to collisions, and resizing is O(n) when it happens. Always qualify with "average case" or "amortized."

What Can Be a Hash Key in Python?

This is a surprisingly common interview stumbling point:

TypeHashable?Why / Why Not
intYesImmutable
floatYesImmutable (but beware: hash(1) == hash(1.0))
strYesImmutable
tupleYes (if contents hashable)Immutable container
frozensetYesImmutable set
listNoMutable - hash would change on modification
dictNoMutable
setNoMutable - use frozenset instead
Custom objectYes (default)Uses id() by default; override __hash__ and __eq__
# Converting unhashable types to hashable equivalents
list_key = [1, 2, 3]
# dict_map[list_key] = "value" # TypeError!
dict_map = {}
dict_map[tuple(list_key)] = "value" # Works - tuple is hashable

set_key = {1, 2, 3}
# dict_map[set_key] = "value" # TypeError!
dict_map[frozenset(set_key)] = "value" # Works - frozenset is hashable

Pattern 1: Frequency Counting

The most common hash map pattern in ML interviews. Count occurrences of elements.

from collections import Counter

# Basic frequency counting
def word_frequencies(text):
"""Count word frequencies in text.

ML context: Building vocabulary for NLP models,
computing term frequencies for TF-IDF.

Time: O(n), Space: O(unique words)
"""
words = text.lower().split()
return Counter(words)

# Top-K frequent elements
def top_k_frequent(nums, k):
"""Return the K most frequent elements.

ML context: Finding the most common features, labels, or tokens.

Time: O(n) using bucket sort approach
Space: O(n)
"""
counts = Counter(nums)

# Bucket sort: index = frequency, value = list of elements with that frequency
max_freq = max(counts.values())
buckets = [[] for _ in range(max_freq + 1)]
for num, freq in counts.items():
buckets[freq].append(num)

# Collect top-K from highest frequency buckets
result = []
for freq in range(max_freq, 0, -1):
for num in buckets[freq]:
result.append(num)
if len(result) == k:
return result
return result

# Note: Counter.most_common(k) uses a heap internally - O(n + k log n)
# The bucket sort approach above is O(n) when k is small relative to n
Company Variation

At Meta, "top-K frequent" is one of the most commonly asked problems across all roles, including ML. Expect it as a warm-up question. At Google, the follow-up is usually "What if the data is streaming?" - which requires a Count-Min Sketch or similar probabilistic data structure.

Frequency-Based Grouping

from collections import defaultdict

def group_anagrams(words):
"""Group words that are anagrams of each other.

ML context: Finding near-duplicate text entries in a dataset,
grouping similar feature names.

Key insight: Two words are anagrams if their sorted characters match.
Use sorted characters as the hash key.

Time: O(n * k log k) where k = max word length
Space: O(n * k)
"""
groups = defaultdict(list)
for word in words:
key = tuple(sorted(word)) # Anagrams share the same sorted form
groups[key].append(word)
return list(groups.values())

# Optimization: use character count tuple as key instead of sorting
def group_anagrams_optimal(words):
"""O(n * k) using character counts instead of sorting.

Time: O(n * k) where k = max word length
Space: O(n * k)
"""
groups = defaultdict(list)
for word in words:
# Create a tuple of character counts (26 letters)
count = [0] * 26
for char in word:
count[ord(char) - ord('a')] += 1
groups[tuple(count)].append(word)
return list(groups.values())

Pattern 2: Two-Sum and Its Variants

The "two-sum" pattern is the most iconic hash map problem. Understanding its variants prepares you for a wide range of questions.

# Classic Two Sum
def two_sum(nums, target):
"""Find indices of two numbers that sum to target.

Time: O(n), Space: O(n)
"""
seen = {} # value -> index
for i, num in enumerate(nums):
complement = target - num
if complement in seen:
return [seen[complement], i]
seen[num] = i
return []

# Three Sum (reduces to Two Sum)
def three_sum(nums):
"""Find all unique triplets that sum to zero.

Time: O(n^2), Space: O(1) extra (excluding output)
"""
nums.sort()
result = []
for i in range(len(nums) - 2):
# Skip duplicates for first element
if i > 0 and nums[i] == nums[i - 1]:
continue
# Two-pointer approach for remaining two elements
left, right = i + 1, len(nums) - 1
while left < right:
total = nums[i] + nums[left] + nums[right]
if total < 0:
left += 1
elif total > 0:
right -= 1
else:
result.append([nums[i], nums[left], nums[right]])
# Skip duplicates
while left < right and nums[left] == nums[left + 1]:
left += 1
while left < right and nums[right] == nums[right - 1]:
right -= 1
left += 1
right -= 1
return result

# Four Sum - generalization
def four_sum(nums, target):
"""Find all unique quadruplets that sum to target.

Time: O(n^3), Space: O(1) extra
"""
nums.sort()
result = []
n = len(nums)
for i in range(n - 3):
if i > 0 and nums[i] == nums[i - 1]:
continue
for j in range(i + 1, n - 2):
if j > i + 1 and nums[j] == nums[j - 1]:
continue
left, right = j + 1, n - 1
while left < right:
total = nums[i] + nums[j] + nums[left] + nums[right]
if total < target:
left += 1
elif total > target:
right -= 1
else:
result.append([nums[i], nums[j], nums[left], nums[right]])
while left < right and nums[left] == nums[left + 1]:
left += 1
while left < right and nums[right] == nums[right - 1]:
right -= 1
left += 1
right -= 1
return result

Two-Sum Variants Comparison

VariantApproachTimeSpace
Two Sum (unsorted)Hash mapO(n)O(n)
Two Sum (sorted)Two pointersO(n)O(1)
Two Sum (BST)Inorder + two pointersO(n)O(n) for inorder
Three SumSort + two pointersO(n^2)O(1)
Four SumSort + nested + two pointersO(n^3)O(1)
K Sum (general)Recursion + two pointersO(n^(k-1))O(k)

Pattern 3: Hash Map for Lookup Optimization

The core value of hash maps: turning O(n) lookups into O(1).

# Pattern: Check if complement/partner exists
def find_pairs_with_difference(arr, k):
"""Find all pairs with difference exactly k.

ML context: Finding feature pairs whose correlation
differs by exactly a threshold.

Time: O(n), Space: O(n)
"""
if k < 0:
return []
num_set = set(arr)
pairs = []
for num in arr:
if num + k in num_set:
pairs.append((num, num + k))
# Remove duplicates if needed
return list(set(pairs))

# Pattern: Hash map as cache / memoization
def is_isomorphic(s, t):
"""Check if two strings are isomorphic (same structure).

ML context: Checking if two feature encoding schemes
have the same structural mapping.

Two strings are isomorphic if characters in s can be replaced
to get t, with no two characters mapping to the same character.

Time: O(n), Space: O(n)
"""
if len(s) != len(t):
return False
s_to_t = {}
t_to_s = {}
for cs, ct in zip(s, t):
if cs in s_to_t:
if s_to_t[cs] != ct:
return False
else:
if ct in t_to_s:
return False
s_to_t[cs] = ct
t_to_s[ct] = cs
return True

# Example: is_isomorphic("egg", "add") -> True
# is_isomorphic("foo", "bar") -> False

Pattern 4: Set Operations

Sets provide O(1) membership testing and support mathematical set operations.

# Set operations comparison
def set_operations_demo():
"""Demonstrate Python set operations and their complexity."""
features_model_a = {'age', 'income', 'education', 'city', 'gender'}
features_model_b = {'age', 'income', 'credit_score', 'employment', 'city'}

# Intersection: features used by both models - O(min(n, m))
shared = features_model_a & features_model_b
# {'age', 'income', 'city'}

# Union: all features across both models - O(n + m)
all_features = features_model_a | features_model_b
# {'age', 'income', 'education', 'city', 'gender', 'credit_score', 'employment'}

# Difference: features unique to model A - O(n)
unique_to_a = features_model_a - features_model_b
# {'education', 'gender'}

# Symmetric difference: features in exactly one model - O(n + m)
exclusive = features_model_a ^ features_model_b
# {'education', 'gender', 'credit_score', 'employment'}

# Subset check - O(n)
is_subset = features_model_a <= all_features # True

return shared, all_features, unique_to_a, exclusive

Set for Deduplication

def deduplicate_features(feature_lists):
"""Remove duplicate features across multiple feature lists,
preserving the order of first occurrence.

ML context: Merging feature sets from different data sources
while maintaining a consistent ordering.

Time: O(total features), Space: O(unique features)
"""
seen = set()
result = []
for feature_list in feature_lists:
for feature in feature_list:
if feature not in seen:
seen.add(feature)
result.append(feature)
return result

# Example:
# lists = [['age', 'income', 'city'],
# ['income', 'education', 'age'],
# ['city', 'gender', 'income']]
# Result: ['age', 'income', 'city', 'education', 'gender']

Pattern 5: Hash Map for Caching (LRU Cache)

LRU (Least Recently Used) Cache is a classic interview problem that combines a hash map with a doubly-linked list.

from collections import OrderedDict

class LRUCache:
"""Least Recently Used Cache.

ML context: Caching model predictions for repeated inputs,
caching feature computations, embedding lookups.

All operations O(1).
"""
def __init__(self, capacity):
self.capacity = capacity
self.cache = OrderedDict()

def get(self, key):
"""Get value and mark as recently used."""
if key not in self.cache:
return -1
self.cache.move_to_end(key) # Mark as recently used
return self.cache[key]

def put(self, key, value):
"""Add/update entry. Evict LRU if over capacity."""
if key in self.cache:
self.cache.move_to_end(key)
self.cache[key] = value
if len(self.cache) > self.capacity:
self.cache.popitem(last=False) # Remove least recently used

# From-scratch implementation with doubly-linked list
class DNode:
def __init__(self, key=0, val=0):
self.key = key
self.val = val
self.prev = None
self.next = None

class LRUCacheScratch:
"""LRU Cache without OrderedDict - pure hash map + doubly linked list."""
def __init__(self, capacity):
self.capacity = capacity
self.cache = {} # key -> DNode
# Sentinel nodes to avoid null checks
self.head = DNode() # Dummy head (most recently used end)
self.tail = DNode() # Dummy tail (least recently used end)
self.head.next = self.tail
self.tail.prev = self.head

def _remove(self, node):
"""Remove node from doubly linked list."""
node.prev.next = node.next
node.next.prev = node.prev

def _add_to_front(self, node):
"""Add node right after head (most recently used position)."""
node.next = self.head.next
node.prev = self.head
self.head.next.prev = node
self.head.next = node

def get(self, key):
if key not in self.cache:
return -1
node = self.cache[key]
self._remove(node)
self._add_to_front(node)
return node.val

def put(self, key, value):
if key in self.cache:
self._remove(self.cache[key])
node = DNode(key, value)
self.cache[key] = node
self._add_to_front(node)
if len(self.cache) > self.capacity:
lru = self.tail.prev
self._remove(lru)
del self.cache[lru.key]
Instant Rejection

If you implement LRU Cache with a list and linear search (O(n) get/put), the interviewer will reject it. The entire point is O(1) for both operations. You must combine a hash map (O(1) lookup) with a doubly-linked list (O(1) insert/remove). Using Python's OrderedDict is acceptable but be prepared to explain how it works internally.

Pattern 6: Hash Map for Subarray Problems

def longest_subarray_with_k_distinct(arr, k):
"""Find the length of the longest subarray with at most k distinct elements.

ML context: Finding the longest contiguous sequence in a dataset
where at most K different categories appear.

Uses sliding window + hash map to track element frequencies.

Time: O(n), Space: O(k)
"""
from collections import defaultdict

char_count = defaultdict(int)
left = 0
max_length = 0

for right in range(len(arr)):
char_count[arr[right]] += 1

# Shrink window until we have at most k distinct elements
while len(char_count) > k:
char_count[arr[left]] -= 1
if char_count[arr[left]] == 0:
del char_count[arr[left]]
left += 1

max_length = max(max_length, right - left + 1)

return max_length

def subarrays_with_exactly_k_distinct(arr, k):
"""Count subarrays with exactly k distinct elements.

Trick: exactly(k) = at_most(k) - at_most(k-1)

Time: O(n), Space: O(k)
"""
def at_most_k(arr, k):
from collections import defaultdict
count = defaultdict(int)
left = 0
result = 0
for right in range(len(arr)):
count[arr[right]] += 1
while len(count) > k:
count[arr[left]] -= 1
if count[arr[left]] == 0:
del count[arr[left]]
left += 1
result += right - left + 1
return result

return at_most_k(arr, k) - at_most_k(arr, k - 1)

ML-Specific Hash Map Applications

Building a Vocabulary

def build_vocabulary(documents, min_freq=2, max_vocab_size=10000):
"""Build a vocabulary from a list of documents.

ML context: Creating token-to-index mapping for NLP models.

Time: O(total tokens), Space: O(unique tokens)
"""
from collections import Counter

# Count all tokens
token_counts = Counter()
for doc in documents:
tokens = doc.lower().split()
token_counts.update(tokens)

# Filter by minimum frequency
filtered = {token: count for token, count in token_counts.items()
if count >= min_freq}

# Sort by frequency (descending) and take top max_vocab_size
sorted_tokens = sorted(filtered.keys(),
key=lambda t: filtered[t],
reverse=True)[:max_vocab_size]

# Create token-to-index mapping
vocab = {'<PAD>': 0, '<UNK>': 1} # Special tokens
for i, token in enumerate(sorted_tokens):
vocab[token] = i + 2 # Offset by special tokens

return vocab

def encode_text(text, vocab):
"""Encode text using vocabulary mapping.

Time: O(n), Space: O(n) where n = number of tokens
"""
tokens = text.lower().split()
return [vocab.get(token, vocab['<UNK>']) for token in tokens]

Feature Hashing (Hashing Trick)

def feature_hash(features, n_buckets=1024):
"""Hash features to a fixed-size vector.

ML context: The hashing trick - map high-dimensional categorical
features to a fixed-size vector without maintaining a vocabulary.
Used in Vowpal Wabbit, scikit-learn's HashingVectorizer.

Time: O(number of features), Space: O(n_buckets)
"""
vector = [0.0] * n_buckets
for feature_name, value in features.items():
# Hash feature name to get bucket index
bucket = hash(feature_name) % n_buckets
# Use sign hash to reduce collisions
sign = 1 if hash(feature_name + '_sign') % 2 == 0 else -1
vector[bucket] += sign * value
return vector

# Example:
# features = {'user_age': 25, 'user_city_NYC': 1, 'user_premium': 1}
# vector = feature_hash(features, n_buckets=8)
# Result: [0, 25, 0, -1, 1, 0, 0, 0] (example - actual values depend on hash function)

Embedding Lookup Table

class EmbeddingTable:
"""Simple embedding lookup using a hash map.

ML context: This is conceptually how nn.Embedding works -
a mapping from integer indices to dense vectors.

Time: O(1) lookup, O(n * d) initialization
Space: O(n * d) where n = vocab size, d = embedding dim
"""
def __init__(self, vocab_size, embedding_dim, seed=42):
import random
random.seed(seed)
self.embeddings = {}
for i in range(vocab_size):
self.embeddings[i] = [
random.gauss(0, 0.01) for _ in range(embedding_dim)
]

def lookup(self, indices):
"""Look up embeddings for a list of indices."""
return [self.embeddings.get(idx, self.embeddings[0]) for idx in indices]

def lookup_batch(self, batch_of_indices):
"""Look up embeddings for a batch of sequences."""
return [[self.embeddings.get(idx, self.embeddings[0])
for idx in seq] for seq in batch_of_indices]

Practice Problems

Problem 1: Feature Deduplication Across Models

Given a list of (feature_name, model_name) tuples, find features that are used in all models.

# Input: pairs = [("age", "modelA"), ("income", "modelA"), ("age", "modelB"),
# ("city", "modelB"), ("age", "modelC"), ("income", "modelC")]
# total_models = 3
# Output: ["age"]
# Explanation: "age" appears in all 3 models
Hint 1
Build a mapping from feature_name to the set of models it appears in.
Hint 2
Use a `defaultdict(set)` to collect model names per feature.
Hint 3
Return features where `len(models) == total_models`.
Solution
from collections import defaultdict

def features_in_all_models(pairs, total_models):
"""Find features present in every model.

Time: O(n), Space: O(n)
"""
feature_to_models = defaultdict(set)
for feature, model in pairs:
feature_to_models[feature].add(model)

return [feature for feature, models in feature_to_models.items()
if len(models) == total_models]

Problem 2: Minimum Window Substring

Given a string S and a string T, find the minimum window in S that contains all characters of T.

# Input: s = "ADOBECODEBANC", t = "ABC"
# Output: "BANC"
Hint 1
Use a sliding window with two hash maps: one for the required character counts (from T), one for the current window's character counts.
Hint 2
Track how many characters in T are "satisfied" (current count >= required count). When all are satisfied, try to shrink the window from the left.
Hint 3
The time complexity is O(|S| + |T|) because each character is visited at most twice (once by right pointer, once by left pointer).
Solution
from collections import Counter

def min_window_substring(s, t):
"""Find minimum window in s containing all characters of t.

Time: O(|S| + |T|), Space: O(|S| + |T|)
"""
if not s or not t or len(s) < len(t):
return ""

required = Counter(t)
needed = len(required) # Number of unique chars needed
formed = 0 # Number of unique chars satisfied
window_counts = {}

min_len = float('inf')
min_start = 0
left = 0

for right in range(len(s)):
char = s[right]
window_counts[char] = window_counts.get(char, 0) + 1

# Check if current character's frequency is satisfied
if char in required and window_counts[char] == required[char]:
formed += 1

# Try to shrink window
while formed == needed:
# Update minimum window
if right - left + 1 < min_len:
min_len = right - left + 1
min_start = left

# Remove left character
left_char = s[left]
window_counts[left_char] -= 1
if left_char in required and window_counts[left_char] < required[left_char]:
formed -= 1
left += 1

return s[min_start:min_start + min_len] if min_len != float('inf') else ""

Problem 3: Subarray Sum Equals K

Count the number of contiguous subarrays whose sum equals K.

# Input: nums = [1, 1, 1], k = 2
# Output: 2 (subarrays [1,1] at index 0-1 and 1-2)
Hint 1
Use prefix sums. If prefix[j] - prefix[i] == k, then the subarray from i to j-1 sums to k.
Hint 2
Store prefix sum frequencies in a hash map. At each position, check how many times (current_prefix - k) has appeared.
Hint 3
Initialize the hash map with `{0: 1}` to handle subarrays starting from index 0.
Solution
def subarray_sum(nums, k):
"""Count subarrays with sum equal to k using prefix sum + hash map.

Time: O(n), Space: O(n)
"""
count = 0
prefix = 0
prefix_counts = {0: 1}

for num in nums:
prefix += num
count += prefix_counts.get(prefix - k, 0)
prefix_counts[prefix] = prefix_counts.get(prefix, 0) + 1

return count

Problem 4: Valid Sudoku

Determine if a 9x9 Sudoku board is valid (only check filled cells).

# ML relevance: Constraint satisfaction - similar to checking that
# a feature matrix satisfies uniqueness constraints per category.
Hint 1
Use three sets: one for each row, one for each column, one for each 3x3 box.
Hint 2
The box index for cell (r, c) is `(r // 3, c // 3)`.
Hint 3
For each filled cell, check if the value already exists in its row set, column set, or box set.
Solution
def is_valid_sudoku(board):
"""Check if a Sudoku board is valid.

Time: O(81) = O(1) since board size is fixed
Space: O(81) = O(1)
"""
rows = [set() for _ in range(9)]
cols = [set() for _ in range(9)]
boxes = [set() for _ in range(9)]

for r in range(9):
for c in range(9):
val = board[r][c]
if val == '.':
continue

box_idx = (r // 3) * 3 + (c // 3)

if val in rows[r] or val in cols[c] or val in boxes[box_idx]:
return False

rows[r].add(val)
cols[c].add(val)
boxes[box_idx].add(val)

return True

Problem 5: Longest Consecutive Sequence

Given an unsorted array of integers, find the length of the longest consecutive elements sequence in O(n) time.

# Input: [100, 4, 200, 1, 3, 2]
# Output: 4 (sequence [1, 2, 3, 4])
Hint 1
Put all elements in a set for O(1) lookup. Then for each number, check if it is the start of a sequence (num - 1 not in set).
Hint 2
Only count from sequence starts. This ensures each element is visited at most twice total.
Hint 3
From each start, keep incrementing and checking `num + 1 in set`.
Solution
def longest_consecutive(nums):
"""Find longest consecutive sequence using a set.

Time: O(n), Space: O(n)
"""
if not nums:
return 0

num_set = set(nums)
max_length = 0

for num in num_set:
# Only start counting from the beginning of a sequence
if num - 1 not in num_set:
current = num
length = 1
while current + 1 in num_set:
current += 1
length += 1
max_length = max(max_length, length)

return max_length

Problem 6: Design a Time-Based Key-Value Store

Create a data structure that stores (key, value, timestamp) and supports:

  • set(key, value, timestamp) - store the value for the given key at the given timestamp
  • get(key, timestamp) - return the value associated with the largest timestamp <= given timestamp
# ML context: A feature store that tracks feature values over time.
# "What was the user's credit score at inference time T?"
Hint 1
Use a hash map where each key maps to a list of (timestamp, value) pairs. Since timestamps are always increasing, the list stays sorted.
Hint 2
For `get`, use binary search on the timestamp list to find the largest timestamp <= the query timestamp.
Hint 3
Python's `bisect` module provides `bisect_right` which finds the insertion point - the answer is one position before it.
Solution
from collections import defaultdict
import bisect

class TimeMap:
"""Time-based key-value store.

set: O(1) amortized
get: O(log n) where n = number of timestamps for that key
"""
def __init__(self):
self.store = defaultdict(list) # key -> [(timestamp, value), ...]

def set(self, key, value, timestamp):
self.store[key].append((timestamp, value))

def get(self, key, timestamp):
if key not in self.store:
return ""
entries = self.store[key]
# Binary search for rightmost timestamp <= query timestamp
idx = bisect.bisect_right(entries, (timestamp, chr(127))) - 1
if idx < 0:
return ""
return entries[idx][1]

Interview Cheat Sheet

PatternWhen to UseKey Detail
Frequency counting"Most common", "frequency of", "count"Use Counter, bucket sort for top-K in O(n)
Two-Sum variants"Two numbers that sum to", "pair with property"Hash map for O(1) complement lookup
Grouping"Group by property", "anagrams", "categorize"defaultdict(list), choose the right key
Set membership"Contains", "exists in", "unique"set for O(1) in check
Set operations"Common elements", "difference", "overlap"& (intersection), | (union), - (difference)
Sliding window + hash"Window with K distinct", "contains all chars"Hash map tracks window contents
Prefix sum + hash"Subarray sum equals K"Map stores prefix sum frequencies
LRU Cache"Eviction policy", "bounded cache"OrderedDict or hash map + doubly linked list

Spaced Repetition Checkpoints

DayReview ActivityTime
Day 0Read all patterns. Solve Two Sum, Group Anagrams, and Top K Frequent without looking.35 min
Day 3Implement LRU Cache from scratch (not using OrderedDict).20 min
Day 7Solve Minimum Window Substring and Subarray Sum Equals K from scratch.25 min
Day 14Explain hash table internals to someone: how hashing works, collision handling, load factor, and resizing.15 min
Day 21Time yourself on 3 hash map problems from LeetCode (Medium). Target: under 20 minutes each.60 min

Next Steps

Hash maps give you O(1) lookups. Next, learn Trees and Graphs to handle hierarchical and network-structured data - from decision trees to ML pipeline DAGs.

© 2026 EngineersOfAI. All rights reserved.