Skip to main content

Heap and Priority Queue - Efficient Minimum Tracking

Reading time: ~20 minutes | Level: Foundation → Engineering

Here is a problem that exposes whether you understand the fundamental tradeoff between sorting and heaps.

import heapq
import time
import random

data = [random.randint(0, 1_000_000) for _ in range(10_000_000)]
k = 10

# Method A: full sort, then take top-k
start = time.perf_counter()
result_a = sorted(data, reverse=True)[:k]
sort_time = time.perf_counter() - start

# Method B: heapq.nlargest
start = time.perf_counter()
result_b = heapq.nlargest(k, data)
heap_time = time.perf_counter() - start

print(f"sorted()[-k:]: {sort_time:.3f}s - O(n log n)")
print(f"heapq.nlargest(k): {heap_time:.3f}s - O(n log k)")
print(f"Heap is {sort_time / heap_time:.1f}x faster for k={k}")
print(f"Results equal: {sorted(result_a) == sorted(result_b)}")

Output (approximate):

sorted()[-k:]: 6.241s - O(n log n)
heapq.nlargest(k): 1.083s - O(n log k)
Heap is 5.8x faster for k=10
Results equal: True

Both return the same 10 largest values from 10 million elements. The heap is 5.8x faster because it avoids fully sorting all 10 million elements - it only maintains a 10-element heap throughout the scan. This is the difference between O(n log n) and O(n log k) where k=10.

Understanding when to use a heap instead of a sort is a mark of engineering maturity.

What You Will Learn

  • The binary heap data structure: shape property, heap invariant, and array representation
  • How Python's heapq module implements a min-heap using a plain list
  • The index mapping formula: parent at i, children at 2i+1 and 2i+2
  • heapq.heappush(), heapq.heappop(), heapq.heapify() - internals and complexity
  • Max-heap simulation: the negation pattern and its pitfalls
  • heapq.nlargest() and heapq.nsmallest() - when they beat full sorting
  • queue.PriorityQueue - thread-safe priority queue and when to use it
  • Heap-based algorithmic patterns: top-K, merge K sorted lists, running median
  • Real-world applications: Dijkstra's shortest path, event simulation, task scheduling

Prerequisites

  • Python lists internals - dynamic arrays and indexing (Lesson 1)
  • Time complexity fundamentals - O(log n) and O(n) (Lesson 9)
  • Basic Python classes and tuples

Part 1 - The Binary Heap Data Structure

What Is a Heap?

A heap is a complete binary tree that satisfies the heap property. "Complete" means all levels are fully filled except possibly the last, which is filled left to right.

Rules: Parent(i) ≤ Child_left(i) and Parent(i) ≤ Child_right(i) always. The minimum is always at the root. No other ordering is guaranteed - sibling order is arbitrary (note: 3 is left of 2, but that is fine).

Note: 3 is left of 2, but that is fine. The heap only guarantees parent ≤ children, not left ≤ right.

Max-Heap

The maximum is always at the root. Python's heapq implements only min-heaps.

Why Binary? Why Complete?

A binary tree with the "complete" constraint maps perfectly to a flat array - no pointers needed. This is the key insight that makes heaps memory-efficient.

Part 2 - Heap as a Flat List: The Index Mapping

Python's heapq does not use a tree class. It uses a regular Python list with a mathematical index mapping.

Heap stored as flat array (0-indexed):

Index: 0 1 2 3 4 5 6 7 8
Array: [1, 3, 2, 7, 4, 5, 6, 8, 9]

For element at index i:

RelationshipFormulaExample (i=1, value=3)
Parent(i - 1) // 2(1-1)//2 = 0 → value 1 ✓
Left child2 * i + 12*1+1 = 3 → value 7 ✓
Right child2 * i + 22*1+2 = 4 → value 4 ✓

The flat array representation:

  • Requires zero pointers (unlike linked tree nodes)
  • Is cache-friendly - children are at predictable offsets
  • Enables O(1) parent and child access via index arithmetic
import heapq

# heapq uses a regular list - you can inspect it directly
heap = []
for val in [5, 3, 8, 1, 4, 2, 7]:
heapq.heappush(heap, val)

print(heap) # [1, 3, 2, 4, 5, 8, 7]
# ↑
# Always the minimum (root at index 0)

# Verify heap invariant: every parent ≤ both children
def verify_heap(h):
for i in range(1, len(h)):
parent_idx = (i - 1) // 2
assert h[parent_idx] <= h[i], f"Violated at index {i}: parent={h[parent_idx]}, child={h[i]}"
print(f"Heap invariant holds for {len(h)} elements")

verify_heap(heap) # Heap invariant holds for 7 elements

Part 3 - heapq Operations: Internals and Complexity

heapq.heappush() - Insert and Sift Up

Inserting into a heap: place the new element at the end of the array (maintaining the "complete" shape), then sift up - repeatedly swap with parent until the heap invariant is restored.

heappush(heap, 0) - inserting 0 into [1, 3, 2, 7, 4, 5, 6]:

Step 1: Append to end (index 7)
[1, 3, 2, 7, 4, 5, 6, 0]

Step 2: Compare 0 with parent at (7-1)//2 = 3 → value 7
0 < 7 → swap
[1, 3, 2, 0, 4, 5, 6, 7]

Step 3: Compare 0 with parent at (3-1)//2 = 1 → value 3
0 < 3 → swap
[1, 0, 2, 3, 4, 5, 6, 7]

Step 4: Compare 0 with parent at (1-1)//2 = 0 → value 1
0 < 1 → swap
[0, 1, 2, 3, 4, 5, 6, 7]

Step 5: Reached root - done
Result: [0, 1, 2, 3, 4, 5, 6, 7]

Maximum swaps = height of tree = log₂(n)
Complexity: O(log n)

heapq.heappop() - Remove Minimum and Sift Down

Removing the minimum: take the root (index 0), move the last element to the root, then sift down - repeatedly swap with the smaller child until the heap invariant is restored.

heappop([0, 1, 2, 3, 4, 5, 6, 7]) → returns 0

Step 1: Save root (0), move last element to root
[7, 1, 2, 3, 4, 5, 6]

Step 2: Compare 7 with children: left=1 (idx 1), right=2 (idx 2)
Smaller child is 1. 7 > 1 → swap with left child
[1, 7, 2, 3, 4, 5, 6]

Step 3: Compare 7 with children: left=3 (idx 3), right=4 (idx 4)
Smaller child is 3. 7 > 3 → swap with left child
[1, 3, 2, 7, 4, 5, 6]

Step 4: Compare 7 with children: left=8 (idx 7) - doesn't exist
No children - done
Result: [1, 3, 2, 7, 4, 5, 6], returned value: 0

Maximum swaps = height of tree = log₂(n)
Complexity: O(log n)

heapq.heapify() - Build Heap from List: O(n)

You might expect building a heap from n elements to cost O(n log n) - push each element one by one. But heapify() uses a bottom-up approach that is provably O(n).

heapify([5, 3, 8, 1, 4, 2, 7])

Start from the last non-leaf (index n//2 - 1) and sift down:
Process index 2 (value 8): children are 2,7 - 2 < 8 → swap
Process index 1 (value 3): children are 1,4 - 1 < 3 → swap
Process index 0 (value 5): children after swaps - sift down

Result: [1, 3, 2, 5, 4, 8, 7] ← valid min-heap

Why O(n)?
Most nodes are near the leaves. Leaf nodes do O(1) work (no children to sift to).
Only the root does O(log n) work.
Sum: n/2 × 0 + n/4 × 1 + n/8 × 2 + ... = O(n) (mathematical proof via geometric series)

heapify() is O(n) - always use it instead of n × heappush() when starting from a list.
import heapq

data = [5, 3, 8, 1, 4, 2, 7]

# heapify: O(n)
heap = data[:]
heapq.heapify(heap)
print(heap) # [1, 3, 2, 5, 4, 8, 7]

# Equivalent but O(n log n) - slower:
heap2 = []
for x in data:
heapq.heappush(heap2, x)
print(heap2) # [1, 3, 2, 5, 4, 8, 7] - same result, slower construction

# Always use heapify() when you have the full list upfront

Complete Complexity Table

OperationComplexityNotes
heapq.heappush(heap, x)O(log n)Append + sift up
heapq.heappop(heap)O(log n)Remove root + sift down
heapq.heapify(lst)O(n)Bottom-up heap construction
heap[0] (peek minimum)O(1)Root is always at index 0
heapq.heappushpop(heap, x)O(log n)Push then pop - more efficient
heapq.heapreplace(heap, x)O(log n)Pop then push - even more efficient
heapq.nlargest(k, data)O(n log k)Maintains k-element heap
heapq.nsmallest(k, data)O(n log k)Maintains k-element heap
len(heap)O(1)Python list len
import heapq

# Basic heap operations
heap = []

# Push elements
for val in [5, 3, 8, 1, 4, 2, 7]:
heapq.heappush(heap, val)

print(f"Heap: {heap}") # [1, 3, 2, 5, 4, 8, 7]
print(f"Minimum: {heap[0]}") # 1 - peek without removing

# Pop elements (always in ascending order from min-heap)
popped = []
while heap:
popped.append(heapq.heappop(heap))
print(f"Popped: {popped}") # [1, 2, 3, 4, 5, 7, 8]

# heappushpop: push new element then pop minimum
# More efficient than heappush + heappop separately
heap = [1, 3, 5, 7, 9]
heapq.heapify(heap)
result = heapq.heappushpop(heap, 4)
print(f"Pushed 4, popped: {result}") # 1 (minimum)
print(f"Heap after: {heap}") # [3, 4, 5, 7, 9]

# heapreplace: pop minimum first, then push new element
# Guaranteed that new element is placed (unlike heappushpop which may pop it)
heap = [1, 3, 5, 7, 9]
heapq.heapify(heap)
result = heapq.heapreplace(heap, 0)
print(f"Replaced, popped: {result}") # 1
print(f"Heap after: {heap}") # [0, 3, 5, 7, 9]

Part 4 - Max-Heap: The Negation Pattern

Python's heapq only implements min-heaps. To simulate a max-heap, negate all values.

import heapq

# Min-heap (default): smallest element first
min_heap = [3, 1, 4, 1, 5, 9, 2, 6]
heapq.heapify(min_heap)
print(heapq.heappop(min_heap)) # 1 - smallest

# Max-heap simulation: negate values
values = [3, 1, 4, 1, 5, 9, 2, 6]
max_heap = [-x for x in values] # negate all values
heapq.heapify(max_heap)
print(-heapq.heappop(max_heap)) # 9 - largest (negate back)

# Or push/pop with negation
max_heap = []
for x in [3, 1, 4, 1, 5, 9, 2, 6]:
heapq.heappush(max_heap, -x) # store negated

largest = -heapq.heappop(max_heap)
print(largest) # 9

:::warning Max-Heap Negation Pitfall with Complex Objects Negation works cleanly for numbers. For tuples used in priority queues, negate only the priority field:

# Correct: negate priority, keep rest unchanged
heapq.heappush(max_heap, (-priority, item))
priority, item = heapq.heappop(max_heap)
priority = -priority # restore original value

For objects that cannot be negated (strings, custom objects), use queue.PriorityQueue or invert your priority scale (e.g., lower number = higher priority). :::

Part 5 - heapq.nlargest and heapq.nsmallest

These functions efficiently find the k largest or smallest elements without fully sorting.

import heapq

data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

# 3 largest
print(heapq.nlargest(3, data)) # [9, 6, 5]

# 3 smallest
print(heapq.nsmallest(3, data)) # [1, 1, 2]

# With key function
records = [
{"name": "Alice", "score": 95},
{"name": "Bob", "score": 72},
{"name": "Carol", "score": 88},
{"name": "Dave", "score": 99},
{"name": "Eve", "score": 81},
]

top_3 = heapq.nlargest(3, records, key=lambda r: r["score"])
for r in top_3:
print(f"{r['name']}: {r['score']}")
# Dave: 99
# Alice: 95
# Carol: 88

When nlargest/nsmallest Beat Full Sort

ConditionBest ApproachComplexity
k == 1 (single min/max)min() or max()O(n)
k ≪ n (much smaller than n)heapq.nlargest / nsmallestO(n log k)
k ≈ n (close to n)sorted()[-k:]O(n log n)
k == n (all elements)sorted()O(n log n)

CPython's heapq.nlargest uses this logic internally: k=1 uses max(), k >= n/2 uses sorted(), otherwise heap-based O(n log k).

import heapq
import timeit
import random

data = random.sample(range(10_000_000), 1_000_000)

for k in [10, 1000, 10000, 100000]:
t_heap = timeit.timeit(
f"heapq.nlargest({k}, data)",
globals={"heapq": heapq, "data": data}, number=5
) / 5

t_sort = timeit.timeit(
f"sorted(data, reverse=True)[:{k}]",
globals={"data": data}, number=5
) / 5

ratio = t_sort / t_heap
print(f"k={k:>7,} heap={t_heap:.3f}s sort={t_sort:.3f}s ratio={ratio:.2f}x")

Output (approximate):

k= 10 heap=0.067s sort=0.631s ratio=9.42x
k= 1,000 heap=0.138s sort=0.631s ratio=4.57x
k= 10,000 heap=0.284s sort=0.631s ratio=2.22x
k=100,000 heap=0.572s sort=0.631s ratio=1.10x

The heap advantage disappears as k approaches n. At k=100,000 (10% of n=1M), both are nearly equal.

Part 6 - Priority Queue with Tuples

The standard pattern for a priority queue in Python uses a tuple (priority, item). Python compares tuples lexicographically - the first element (priority) determines order.

import heapq

# Task scheduler: lower number = higher priority
task_queue = []

heapq.heappush(task_queue, (1, "Deploy hotfix"))
heapq.heappush(task_queue, (5, "Update docs"))
heapq.heappush(task_queue, (2, "Fix security bug"))
heapq.heappush(task_queue, (1, "Restart server")) # same priority as first
heapq.heappush(task_queue, (3, "Add feature"))

print("Processing order:")
while task_queue:
priority, task = heapq.heappop(task_queue)
print(f" Priority {priority}: {task}")

Output:

Processing order:
Priority 1: Deploy hotfix
Priority 1: Restart server
Priority 2: Fix security bug
Priority 3: Add feature
Priority 5: Update docs

Handling Equal Priorities: The Counter Trick

When two items have equal priority and the item itself is not comparable (e.g., a custom object), Python will try to compare the items and may raise TypeError.

import heapq
import itertools

# Problem: if priorities are equal, Python compares the second tuple element
# If items don't support comparison, this raises TypeError

class Task:
def __init__(self, name):
self.name = name
# No __lt__ defined

counter = itertools.count() # Unique, ever-increasing integer

task_queue = []

def push_task(priority, task):
# (priority, unique_counter, task)
# Counter ensures no two entries are identical - comparison stops at counter
heapq.heappush(task_queue, (priority, next(counter), task))

push_task(1, Task("Deploy hotfix"))
push_task(1, Task("Restart server")) # Same priority - counter prevents TypeError
push_task(2, Task("Fix bug"))

while task_queue:
priority, _, task = heapq.heappop(task_queue)
print(f" Priority {priority}: {task.name}")

Output:

Priority 1: Deploy hotfix
Priority 1: Restart server
Priority 2: Fix bug

The counter acts as a FIFO tiebreaker - when priorities are equal, earlier-pushed items come out first (counter value is smaller).

Part 7 - queue.PriorityQueue: Thread-Safe Alternative

For multi-threaded applications, use queue.PriorityQueue instead of heapq. It wraps a heap with locking.

from queue import PriorityQueue
import threading
import time

pq = PriorityQueue()

def producer(name, items):
for priority, task in items:
pq.put((priority, task))
time.sleep(0.01)

def consumer(name):
while True:
priority, task = pq.get() # blocks until item available
print(f"{name} processing priority={priority}: {task}")
pq.task_done()
if task == "STOP":
break

# Thread-safe: multiple producers can push simultaneously
t1 = threading.Thread(target=producer, args=("Producer1", [(1, "critical"), (3, "low")]))
t2 = threading.Thread(target=producer, args=("Producer2", [(2, "medium"), (1, "urgent")]))
tc = threading.Thread(target=consumer, args=("Consumer",))

t1.start(); t2.start(); tc.start()
t1.join(); t2.join()
pq.put((99, "STOP")) # Signal consumer to stop
tc.join()
Featureheapqqueue.PriorityQueue
Thread safetyNOT thread-safeThread-safe (locked)
Blocking getNo (raises IndexError)Yes (blocks)
Max size limitNoYes (maxsize param)
PerformanceFaster (no lock)Slower (lock overhead)
Use caseSingle-threadedMulti-threaded
Direct list accessYes (heap[0])No

:::note Use heapq for Single-Threaded, PriorityQueue for Multi-Threaded heapq is faster because it avoids lock overhead - use it in single-threaded algorithms (Dijkstra's, event simulation, batch processing). queue.PriorityQueue is the right choice when multiple threads produce and consume tasks concurrently - its blocking get() and put() methods prevent race conditions. :::

Part 8 - Heap Patterns in Practice

Pattern 1: Streaming Top-K (Fixed Memory)

Process a stream of unknown length, maintaining only the k largest elements seen so far.

import heapq

def top_k_stream(stream, k):
"""
Maintain top-k largest elements from a stream.
Memory: O(k) regardless of stream length.
Time: O(n log k) total.
"""
heap = [] # min-heap of size k

for item in stream:
if len(heap) < k:
heapq.heappush(heap, item)
elif item > heap[0]: # item is larger than current minimum in top-k
heapq.heapreplace(heap, item) # pop min, push item (single O(log k) op)

return sorted(heap, reverse=True) # O(k log k) - small

# Simulate processing 10M streaming numbers
import random
stream = (random.randint(0, 1_000_000) for _ in range(10_000_000))

result = top_k_stream(stream, k=10)
print(f"Top 10: {result}")
# e.g., [999998, 999991, 999988, ...]

# Memory used: only 10 integers - regardless of stream size

Pattern 2: Merge K Sorted Lists

A classic interview problem and common data pipeline operation.

import heapq

def merge_k_sorted(lists):
"""
Merge k sorted lists into one sorted list.
Time: O(n log k) where n = total elements, k = number of lists
Memory: O(k) - only one element from each list in heap at a time

Algorithm:
1. Initialize heap with first element from each list (with list index)
2. Pop minimum → add to result
3. Push next element from same list
4. Repeat until heap empty
"""
heap = []
# Heap entry: (value, list_index, element_index)
for i, lst in enumerate(lists):
if lst:
heapq.heappush(heap, (lst[0], i, 0))

result = []
while heap:
val, list_idx, elem_idx = heapq.heappop(heap)
result.append(val)

next_idx = elem_idx + 1
if next_idx < len(lists[list_idx]):
next_val = lists[list_idx][next_idx]
heapq.heappush(heap, (next_val, list_idx, next_idx))

return result


# Example: merging 4 sorted lists
lists = [
[1, 4, 7, 10],
[2, 5, 8, 11],
[3, 6, 9, 12],
[0, 13, 14, 15],
]

merged = merge_k_sorted(lists)
print(merged)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

# Real-world: merging sorted partitions from distributed sort,
# merging sorted log files from multiple servers

Pattern 3: Running Median with Two Heaps

Maintain the median of a stream efficiently without re-sorting on each insert.

import heapq

class RunningMedian:
"""
Maintains running median of a stream.
Time: O(log n) per insert, O(1) per query.
Memory: O(n)

Algorithm:
- max_heap: lower half of data (negated for max-heap behavior)
- min_heap: upper half of data
- Invariant: max_heap has same size or one more element than min_heap
- Median = max_heap[0] (if sizes differ) or average of both tops (if equal)
"""
def __init__(self):
self.max_heap = [] # lower half - stored negated (max-heap simulation)
self.min_heap = [] # upper half - actual values (min-heap)

def add(self, num):
# Push to max_heap (lower half)
heapq.heappush(self.max_heap, -num)

# Balance: ensure max_heap's max ≤ min_heap's min
if self.max_heap and self.min_heap and (-self.max_heap[0] > self.min_heap[0]):
val = -heapq.heappop(self.max_heap)
heapq.heappush(self.min_heap, val)

# Rebalance sizes: max_heap can have at most one more element than min_heap
if len(self.max_heap) > len(self.min_heap) + 1:
val = -heapq.heappop(self.max_heap)
heapq.heappush(self.min_heap, val)
elif len(self.min_heap) > len(self.max_heap):
val = heapq.heappop(self.min_heap)
heapq.heappush(self.max_heap, -val)

def get_median(self):
if not self.max_heap:
return None
if len(self.max_heap) > len(self.min_heap):
return float(-self.max_heap[0])
return (-self.max_heap[0] + self.min_heap[0]) / 2.0


rm = RunningMedian()
stream = [5, 2, 8, 1, 9, 3, 7, 4, 6]
for num in stream:
rm.add(num)
print(f"Added {num:2d} → median = {rm.get_median()}")

Output:

Added 5 → median = 5.0
Added 2 → median = 3.5
Added 8 → median = 5.0
Added 1 → median = 3.5
Added 9 → median = 5.0
Added 3 → median = 4.0
Added 7 → median = 5.0
Added 4 → median = 4.5
Added 6 → median = 5.0

Pattern 4: Dijkstra's Shortest Path

The canonical graph algorithm that requires a priority queue.

import heapq
from collections import defaultdict

def dijkstra(graph, start):
"""
Find shortest paths from start to all reachable nodes.
Time: O((V + E) log V) where V = vertices, E = edges

Args:
graph: dict of {node: [(neighbor, weight), ...]}
start: starting node

Returns:
dist: dict of {node: shortest_distance_from_start}
"""
# Priority queue: (distance, node)
heap = [(0, start)]
dist = {start: 0}

while heap:
current_dist, node = heapq.heappop(heap)

# If we already found a shorter path to this node, skip
if current_dist > dist.get(node, float("inf")):
continue

for neighbor, weight in graph.get(node, []):
new_dist = current_dist + weight
if new_dist < dist.get(neighbor, float("inf")):
dist[neighbor] = new_dist
heapq.heappush(heap, (new_dist, neighbor))

return dist


# Example: city road network
graph = {
"A": [("B", 4), ("C", 2)],
"B": [("D", 3), ("C", 1)],
"C": [("B", 1), ("D", 5)],
"D": [],
}

distances = dijkstra(graph, "A")
print("Shortest distances from A:")
for node, dist in sorted(distances.items()):
print(f" A → {node}: {dist}")

Output:

Shortest distances from A:
A → A: 0
A → B: 3 (A→C→B: 2+1=3, shorter than A→B: 4)
A → C: 2
A → D: 6 (A→C→B→D: 2+1+3=6, or A→C→D: 2+5=7, minimum is 6)

Part 9 - Event-Driven Simulation

Heaps power discrete event simulation - systems where events are processed in chronological order.

import heapq
from dataclasses import dataclass, field
from typing import Any

@dataclass(order=True)
class Event:
time: float
priority: int # tiebreaker for same-time events
action: Any = field(compare=False) # not compared - use time + priority

class EventSimulator:
"""
Discrete event simulation engine.
Events are processed in time order using a min-heap.
"""
def __init__(self):
self.heap = []
self.counter = 0 # unique tiebreaker
self.current_time = 0.0

def schedule(self, time, action, priority=0):
"""Schedule an event at the given time."""
event = Event(time=time, priority=priority, action=action)
heapq.heappush(self.heap, event)

def run(self, until=float("inf")):
"""Process all events up to `until` time."""
while self.heap and self.heap[0].time <= until:
event = heapq.heappop(self.heap)
self.current_time = event.time
print(f"t={event.time:.1f}: {event.action}")
# In a real simulator, the action function would schedule new events

# Simulation: server processing requests
sim = EventSimulator()
sim.schedule(0.0, "Server starts")
sim.schedule(0.5, "Request #1 arrives")
sim.schedule(0.8, "Request #2 arrives")
sim.schedule(1.2, "Request #1 processed")
sim.schedule(1.5, "Request #3 arrives")
sim.schedule(2.0, "Request #2 processed")
sim.schedule(2.3, "Request #3 processed")

sim.run(until=5.0)

Output:

t=0.0: Server starts
t=0.5: Request #1 arrives
t=0.8: Request #2 arrives
t=1.2: Request #1 processed
t=1.5: Request #3 arrives
t=2.0: Request #2 processed
t=2.3: Request #3 processed

Interview Questions

Q1: What is the heap invariant and how does Python's heapq maintain it?

Answer: The heap invariant for a min-heap: every parent element must be less than or equal to both its children. Python's heapq maintains this invariant through two operations: sift-up (after insertion) and sift-down (after removal). After heappush: the new element is appended to the end of the list, then repeatedly swapped with its parent (index (i-1)//2) until it is larger than its parent or reaches the root - O(log n) swaps maximum. After heappop: the minimum (index 0) is returned, the last element is placed at the root, then it is repeatedly swapped with its smaller child until both children are larger or it reaches a leaf - O(log n) swaps maximum. The invariant is never violated externally because the list is modified only through heapq functions.

Q2: Why is heapify() O(n) rather than O(n log n)?

Answer: A naive approach - pushing n elements one by one - costs O(n log n). heapify() uses a bottom-up algorithm: it starts from the last non-leaf node (index n//2 - 1) and calls sift-down on each node moving toward the root. The key insight: nodes near the leaves (the majority) have almost no sifting to do. Leaf nodes do O(1) work. Nodes at height 1 do O(1) sift-down. Nodes at height h do O(h) work. The total cost: Σ (n / 2^h) × h for h from 0 to log(n) = O(n). This geometric series sums to O(n). In contrast, inserting n elements one-by-one works top-down: each insertion may sift from leaf to root - O(log n) per element, O(n log n) total.

Q3: How do you implement a max-heap using Python's heapq?

Answer: Python's heapq only provides a min-heap. The standard workaround is value negation: store -x instead of x. The smallest negated value corresponds to the largest original value. To push: heapq.heappush(heap, -x). To pop the maximum: max_val = -heapq.heappop(heap). To peek: max_val = -heap[0]. This works correctly for integers and floats. For tuples (priority queues), negate only the priority field: heapq.heappush(heap, (-priority, item)). For custom objects that cannot be negated, you must implement a wrapper class with an inverted __lt__, or use a different data structure like sortedcontainers.SortedList.

Q4: When should you use heapq.nlargest(k, data) instead of sorted(data, reverse=True)[:k]?

Answer: Use heapq.nlargest(k, data) when k is significantly smaller than n. nlargest maintains a min-heap of size k, scanning all n elements once - O(n log k). sorted()[:k] sorts all n elements first - O(n log n). The crossover point: when k approaches n/2, the constant factors make sorted() competitive. CPython's heapq.nlargest internally uses sorted() when k >= n/2. Rule of thumb: for k less than ~10-20% of n, use nlargest; for larger k or when you need the full sorted list anyway, use sorted(). For k=1, always use max() - O(n) with minimal overhead.

Q5: Explain why a heap is more appropriate than a sorted list for a real-time task scheduler that receives continuous new tasks.

Answer: A sorted list supports O(1) access to the minimum (first element) but O(n) insertion - you must find the correct position and shift elements. A heap supports O(log n) for both insertion and minimum retrieval. For a scheduler receiving n tasks per second continuously: sorted list insertion = O(n) per task → O(n²) total for n tasks. Heap insertion = O(log n) per task → O(n log n) total. The heap's O(log n) insertion is the critical advantage in dynamic scenarios where the data is constantly changing. If you received all tasks at once (static scenario), heapify + heappop = O(n) + O(n log n) vs sorted list: the same. But schedulers are never static - they receive tasks continuously while processing others.

Q6: Describe the "merge K sorted lists" problem and explain why a heap solves it efficiently.

Answer: Given k sorted lists with n total elements, merge them into one sorted list. Naive approach: concatenate all lists and sort - O(n log n). Better: use a min-heap of size k. Initialize with the first element from each list. Repeat: pop the minimum (O(log k)), add to result, push the next element from the same list (O(log k)). Total: n pop+push pairs, each O(log k) → O(n log k). Since k ≤ n, this is always at least as good as O(n log n) and better when k << n. Real-world applications: merging sorted log files from multiple servers, merging sorted database partitions after a distributed sort (external sort), combining sorted result streams from multiple API sources. The heap maintains exactly the "frontier" - one candidate from each source - allowing globally correct ordering with minimal memory (O(k) heap size vs O(n) for full concatenation).

Practice Challenges

Beginner: Basic Heap Operations

Predict the output of each heappop call and explain why.

import heapq

data = [10, 5, 8, 3, 1, 7, 2]
heapq.heapify(data)

print(heapq.heappop(data))
print(heapq.heappop(data))
heapq.heappush(data, 4)
print(heapq.heappop(data))
print(data[0]) # peek without popping
Solution

Output:

1
2
3
4

Explanation:

After heapify([10, 5, 8, 3, 1, 7, 2]), the heap looks like [1, 3, 2, 5, 10, 7, 8] (or similar valid min-heap arrangement - exact layout varies, but 1 is always at index 0).

  • First heappop(): removes and returns 1 (the global minimum). Heap rearranges.
  • Second heappop(): removes and returns 2 (next minimum).
  • heappush(data, 4): inserts 4 - heap adjusts to maintain invariant. Heap now has elements {3, 4, 5, 7, 8, 10}.
  • Third heappop(): removes and returns 3 (current minimum).
  • data[0]: peeks at root without removing - returns 4 (next minimum after 3 was removed).

Key insight: heappop() always returns the current minimum, but the internal array layout is not necessarily sorted - only the root is guaranteed to be the minimum.

Intermediate: K-th Largest Element

Implement a function that finds the k-th largest element in a list using a heap. It should be more efficient than fully sorting the list.

data = [3, 2, 1, 5, 6, 4]
k = 2
# Expected: 5 (the 2nd largest element)
Solution
import heapq

def kth_largest(nums, k):
"""
Find the k-th largest element in nums.

Strategy: maintain a min-heap of size k.
- The heap always contains the k largest elements seen so far.
- The minimum of those k elements is the k-th largest.
- heap[0] is the k-th largest.

Time: O(n log k)
Space: O(k)
"""
heap = [] # min-heap of size k

for num in nums:
heapq.heappush(heap, num)
if len(heap) > k:
heapq.heappop(heap) # remove the smallest to maintain size k

# heap[0] is the k-th largest (smallest of the top-k)
return heap[0]


# Tests
print(kth_largest([3, 2, 1, 5, 6, 4], k=2)) # 5
print(kth_largest([3, 2, 3, 1, 2, 4, 5, 5, 6], k=4)) # 4

# Alternative: heapq.nlargest (same complexity, less code)
def kth_largest_v2(nums, k):
return heapq.nlargest(k, nums)[-1]

print(kth_largest_v2([3, 2, 1, 5, 6, 4], k=2)) # 5

Why a min-heap of size k works:

  • We keep the k largest elements seen so far in the heap
  • The heap's minimum (heap[0]) is the smallest of those k elements
  • That is by definition the k-th largest element overall
  • When we see a new element larger than heap[0], we pop the current k-th largest and push the new one

Advanced: Design a Task Scheduler with Priorities and Dependencies

Design a task scheduler that processes tasks by priority, but only schedules a task when all its dependencies are complete.

tasks = {
"A": {"priority": 1, "deps": []},
"B": {"priority": 2, "deps": ["A"]},
"C": {"priority": 1, "deps": ["A"]},
"D": {"priority": 3, "deps": ["B", "C"]},
"E": {"priority": 2, "deps": []},
}
# Expected execution order based on priority (lower = higher priority)
# and dependency constraints
Solution
import heapq
from collections import defaultdict

def schedule_tasks(tasks):
"""
Topological sort with priority queue.

Algorithm:
1. Build in-degree count and reverse dependency graph
2. Initialize heap with all tasks that have no dependencies (in-degree 0)
3. Pop the highest-priority task (lowest priority number)
4. Mark it complete, decrement in-degree of dependents
5. Push any dependent tasks that now have in-degree 0
6. Repeat until heap empty or tasks remain (cycle detection)

Time: O((V + E) log V) - each task pushed/popped once, each edge processed once
Space: O(V + E)
"""
# Build dependency tracking
in_degree = {task: len(info["deps"]) for task, info in tasks.items()}
dependents = defaultdict(list) # task → list of tasks that depend on it

for task, info in tasks.items():
for dep in info["deps"]:
dependents[dep].append(task)

# Priority queue: (priority, task_name)
heap = []
for task, degree in in_degree.items():
if degree == 0: # No dependencies - ready to schedule immediately
heapq.heappush(heap, (tasks[task]["priority"], task))

execution_order = []

while heap:
priority, task = heapq.heappop(heap)
execution_order.append(task)
print(f"Executing: {task} (priority={priority})")

# Unlock dependent tasks
for dependent in dependents[task]:
in_degree[dependent] -= 1
if in_degree[dependent] == 0:
dep_priority = tasks[dependent]["priority"]
heapq.heappush(heap, (dep_priority, dependent))

# Check for cycles
if len(execution_order) < len(tasks):
remaining = set(tasks) - set(execution_order)
raise ValueError(f"Circular dependency detected among: {remaining}")

return execution_order


tasks = {
"A": {"priority": 1, "deps": []},
"B": {"priority": 2, "deps": ["A"]},
"C": {"priority": 1, "deps": ["A"]},
"D": {"priority": 3, "deps": ["B", "C"]},
"E": {"priority": 2, "deps": []},
}

order = schedule_tasks(tasks)
print(f"\nFinal order: {order}")

Output:

Executing: A (priority=1) ← A and E both ready; A has priority 1 (lower = higher priority)
Executing: E (priority=2) ← E was ready from start, priority 2
Executing: C (priority=1) ← A completed; C and B unlocked; C has priority 1
Executing: B (priority=2) ← B unlocked; priority 2
Executing: D (priority=3) ← D unlocked after both B and C; priority 3

Final order: ['A', 'E', 'C', 'B', 'D']

This pattern powers CI/CD pipeline scheduling, build system parallelization (Makefile, Bazel), and workflow orchestration systems (Airflow, Prefect).

Quick Reference

OperationCodeComplexity
Create heap from listheapq.heapify(lst)O(n)
Push elementheapq.heappush(heap, x)O(log n)
Pop minimumheapq.heappop(heap)O(log n)
Peek minimumheap[0]O(1)
Push then popheapq.heappushpop(heap, x)O(log n)
Pop then pushheapq.heapreplace(heap, x)O(log n)
K largestheapq.nlargest(k, data)O(n log k)
K smallestheapq.nsmallest(k, data)O(n log k)
Max-heap pushheapq.heappush(heap, -x)O(log n)
Max-heap pop-heapq.heappop(heap)O(log n)
Thread-safe priority queuequeue.PriorityQueue()O(log n) + lock
Equal-priority tiebreaker(priority, next(counter), item)-
PatternWhen to Use
heapq min-heapSingle-threaded priority retrieval
Negation max-heapWhen you need maximum-first order
(priority, counter, item)Safe tuples with non-comparable items
queue.PriorityQueueMulti-threaded producer-consumer
nlargest(k)Top-k elements, k << n
Two-heap medianRunning median of stream
Heap + in-degreeTopological sort with priorities
Heap in DijkstraShortest path in weighted graphs

Key Takeaways

  • A heap is not sorted - only the minimum (root) is guaranteed to be the global minimum; siblings can be in any order; the flat list is not sorted
  • heapify() is O(n), not O(n log n) - always use it when constructing a heap from an existing list; never use n individual heappush() calls when you have the full data upfront
  • Python's heapq is always a min-heap - simulate max-heap by negating values; for tuples, negate only the priority field: heapq.heappush(heap, (-priority, item))
  • nlargest(k) beats sorted() when k << n - O(n log k) vs O(n log n); for k=1 use max(); for k ≥ n/2, sorted() is better (CPython's nlargest handles this automatically)
  • Use a counter tuple (priority, counter, item) for safe equal-priority handling - prevents TypeError when the item type does not support comparison, and provides FIFO behavior as a tiebreaker
  • Use queue.PriorityQueue for multi-threaded systems - heapq is not thread-safe; PriorityQueue wraps it with locks and blocking get()/put() for safe producer-consumer patterns
  • Heap invariant: for any node at index i, its parent at (i-1)//2 is always ≤ both children at 2i+1 and 2i+2; this enables O(log n) insert and extract-min without maintaining full sorted order
© 2026 EngineersOfAI. All rights reserved.