Skip to main content

Custom Sorting with Key - Sorting Complex Objects

Reading time: ~18 minutes | Level: Foundation → Engineering

Here is code that reveals a performance trap most Python developers don't notice.

import time
from operator import itemgetter

users = [
{"name": f"User{i}", "age": i % 50, "score": i % 100}
for i in range(1_000_000)
]

# Method A: lambda key
start = time.perf_counter()
sorted_a = sorted(users, key=lambda u: u["age"])
lambda_time = time.perf_counter() - start

# Method B: operator.itemgetter key
start = time.perf_counter()
sorted_b = sorted(users, key=itemgetter("age"))
getter_time = time.perf_counter() - start

print(f"lambda key: {lambda_time:.3f}s")
print(f"itemgetter key: {getter_time:.3f}s")
print(f"itemgetter is {lambda_time / getter_time:.1f}x faster")

Output (approximate):

lambda key: 0.412s
itemgetter key: 0.187s
itemgetter is 2.2x faster

Both produce identical results. The 2x speedup comes from itemgetter being a C-level function - no Python function call overhead per element. For 1 million elements, this is a 225ms difference. In a web service handling thousands of sort operations per second, it compounds significantly.

What You Will Learn

  • How the key= parameter works internally - the decorate-sort-undecorate mechanism CPython uses
  • lambda as a sort key - syntax, use cases, and limitations
  • operator.attrgetter and operator.itemgetter - C-level performance and multi-field extraction
  • Sorting objects without natural ordering - implementing __lt__
  • functools.cmp_to_key - converting legacy comparators to key functions
  • Case-insensitive string sorting and Unicode-aware sorting
  • Multi-key sorting with tuple keys and handling mixed ascending/descending
  • Sorting None values safely - the (x is None, x) pattern
  • Sorting dicts by value - the canonical pattern
  • Performance: why the key function is called O(n) times, not O(n log n)
  • Real-world patterns: leaderboards, ranking systems, priority queues with custom ordering

Prerequisites

  • Python functions, lambda expressions (Module 3)
  • Python lists, dicts, tuples (Module 4, Lessons 1–4)
  • Sorting Mechanisms - Timsort (Lesson 10 of this module)
  • Basic understanding of Big-O notation (Lesson 9)

Part 1 - How the key= Parameter Works Internally

When you write sorted(data, key=fn), Python does not call fn during comparisons. It calls fn exactly once per element before sorting, caches the results, then sorts using the cached keys.

This is the Schwartzian transform implemented at the C level. Your key function, no matter how expensive, is called only once per element.

# Verify: key function is called exactly n times
call_log = []

def traced_key(x):
call_log.append(x["id"])
return x["score"]

data = [{"id": i, "score": i % 10} for i in range(100)]
result = sorted(data, key=traced_key)

print(f"Elements: {len(data)}")
print(f"Key fn calls: {len(call_log)}")
print(f"Unique calls: {len(set(call_log))}") # each element called exactly once

Output:

Elements: 100
Key fn calls: 100
Unique calls: 100

Part 2 - Lambda as Sort Key

Lambda is the most common way to define a sort key inline.

Basic Lambda Patterns

# Sort list of dicts by a field
users = [
{"name": "Alice", "age": 31, "score": 95.5},
{"name": "Bob", "age": 25, "score": 88.0},
{"name": "Carol", "age": 28, "score": 92.3},
]

# By age (ascending)
by_age = sorted(users, key=lambda u: u["age"])
# [Bob(25), Carol(28), Alice(31)]

# By score (descending)
by_score = sorted(users, key=lambda u: u["score"], reverse=True)
# [Alice(95.5), Carol(92.3), Bob(88.0)]

# By name length, then alphabetically
by_name = sorted(users, key=lambda u: (len(u["name"]), u["name"]))
# [Bob, Alice, Carol] → (3, "Bob"), (5, "Alice"), (5, "Carol")

Sorting by Computed Value

# Sort by absolute value
nums = [-5, 3, -1, 8, -2, 4]
sorted_abs = sorted(nums, key=abs)
print(sorted_abs) # [-1, -2, 3, 4, -5, 8]

# Sort strings by last character
words = ["Python", "Java", "Rust", "Go"]
sorted_by_last = sorted(words, key=lambda w: w[-1])
print(sorted_by_last) # ['Java', 'Go', 'Python', 'Rust']
# 'a' 'o' 'n' 't'

# Sort by number of vowels in a word
def vowel_count(s):
return sum(1 for c in s.lower() if c in "aeiou")

words = ["Python", "JavaScript", "Go", "Rust", "Erlang"]
sorted_by_vowels = sorted(words, key=vowel_count)
print(sorted_by_vowels) # ['Go', 'Rust', 'Python', 'Erlang', 'JavaScript']
# vowels: 1 1 2 2 3

When Lambda Falls Short

# Lambda limitations:
# 1. Cannot contain statements (only expressions)
# 2. No named function for reuse or testing
# 3. Harder to read for complex logic
# 4. Slower than C-level functions (operator module)

# For complex or reused logic, define a named function
def ranking_key(product):
"""Multi-criteria ranking: in-stock first, then by rating, then by review count."""
in_stock_rank = 0 if product["in_stock"] else 1 # 0 sorts before 1
rating = -product["rating"] # negate for descending
reviews = -product["review_count"] # negate for descending
return (in_stock_rank, rating, reviews)

products = [...]
sorted_products = sorted(products, key=ranking_key)

Part 3 - operator.itemgetter and operator.attrgetter

The operator module provides C-level implementations of common operations, including field extraction for sorting.

operator.itemgetter: For Dicts and Sequences

from operator import itemgetter

users = [
{"name": "Alice", "age": 31, "dept": "Engineering"},
{"name": "Bob", "age": 25, "dept": "Marketing"},
{"name": "Carol", "age": 28, "dept": "Engineering"},
{"name": "Dave", "age": 31, "dept": "Marketing"},
]

# Single field - equivalent to lambda u: u["age"]
by_age = sorted(users, key=itemgetter("age"))

# Multiple fields - equivalent to lambda u: (u["dept"], u["age"])
by_dept_age = sorted(users, key=itemgetter("dept", "age"))

for u in by_dept_age:
print(u["dept"], u["age"], u["name"])

Output:

Engineering 28 Carol
Engineering 31 Alice
Marketing 25 Bob
Marketing 31 Dave

itemgetter("dept", "age") extracts both fields as a tuple - equivalent to the lambda lambda u: (u["dept"], u["age"]) but implemented in C.

# itemgetter also works on lists and tuples (by index)
data = [(3, "b"), (1, "a"), (2, "c"), (1, "b")]

# Sort by index 0, then index 1
by_both = sorted(data, key=itemgetter(0, 1))
print(by_both) # [(1, 'a'), (1, 'b'), (2, 'c'), (3, 'b')]

operator.attrgetter: For Objects and Dataclasses

from operator import attrgetter
from dataclasses import dataclass

@dataclass
class Employee:
name: str
department: str
salary: float
years: int

employees = [
Employee("Alice", "Engineering", 95000, 5),
Employee("Bob", "Marketing", 80000, 3),
Employee("Carol", "Engineering", 110000, 8),
Employee("Dave", "Marketing", 75000, 2),
Employee("Eve", "Engineering", 95000, 3),
]

# Single attribute - equivalent to lambda e: e.salary
by_salary = sorted(employees, key=attrgetter("salary"), reverse=True)

# Multiple attributes - equivalent to lambda e: (e.department, e.salary)
by_dept_salary = sorted(employees, key=attrgetter("department", "salary"))

for e in by_dept_salary:
print(f"{e.department:15} ${e.salary:>9,.0f} {e.name}")

Output:

Engineering $ 95,000 Alice
Engineering $ 95,000 Eve
Engineering $110,000 Carol
Marketing $ 75,000 Dave
Marketing $ 80,000 Bob

attrgetter for Nested Attributes

from operator import attrgetter

class Address:
def __init__(self, city, country):
self.city = city
self.country = country

class Person:
def __init__(self, name, address):
self.name = name
self.address = address

people = [
Person("Alice", Address("London", "UK")),
Person("Bob", Address("Paris", "France")),
Person("Carol", Address("Berlin", "Germany")),
Person("Dave", Address("Lyon", "France")),
]

# Sort by country, then city - accessing nested attributes with dot notation
by_location = sorted(people, key=attrgetter("address.country", "address.city"))
for p in by_location:
print(f"{p.address.country:10} {p.address.city:10} {p.name}")

Output:

France Lyon Dave
France Paris Bob
Germany Berlin Carol
UK London Alice

Performance Comparison: lambda vs itemgetter vs attrgetter

import timeit
from operator import itemgetter, attrgetter
from dataclasses import dataclass

# Dict benchmark
users = [{"name": f"User{i}", "age": i % 80} for i in range(100_000)]

lambda_dict = timeit.timeit(
"sorted(users, key=lambda u: u['age'])",
globals={"users": users}, number=50
) / 50

getter_dict = timeit.timeit(
"sorted(users, key=itemgetter('age'))",
globals={"users": users, "itemgetter": itemgetter}, number=50
) / 50

print(f"Dict sort - lambda: {lambda_dict*1000:.1f}ms")
print(f"Dict sort - itemgetter: {getter_dict*1000:.1f}ms")
print(f"Speedup: {lambda_dict/getter_dict:.1f}x")

Output (approximate):

Dict sort - lambda: 42.3ms
Dict sort - itemgetter: 19.1ms
Speedup: 2.2x

:::tip When to Use itemgetter vs lambda Use itemgetter/attrgetter when: sorting in hot paths (called frequently), sorting large datasets (N > 100K), or the key is a simple field extraction. Use lambda when: the key involves computation, multiple different fields with transformations (e.g., -x["score"]), or the logic is complex. itemgetter("a", "b") cannot negate fields - you need lambda or a named function for descending mixed-direction sorts. :::

Part 4 - Sorting Objects Without Natural Ordering

When sorting custom class instances, Python uses __lt__ (less-than). Without it, sorted() raises TypeError.

class Task:
def __init__(self, name, priority, deadline):
self.name = name
self.priority = priority
self.deadline = deadline

tasks = [Task("Write tests", 2, "2024-12-01"),
Task("Fix bug", 1, "2024-11-30"),
Task("Review PR", 3, "2024-12-02")]

# Without __lt__: TypeError: '<' not supported between instances of 'Task' and 'Task'
# tasks.sort() ← would fail

Approach 1: Implement __lt__

class Task:
def __init__(self, name, priority, deadline):
self.name = name
self.priority = priority
self.deadline = deadline

def __lt__(self, other):
"""Primary: priority ascending. Tie-break: deadline ascending."""
if self.priority != other.priority:
return self.priority < other.priority
return self.deadline < other.deadline

def __repr__(self):
return f"Task({self.name!r}, priority={self.priority})"

tasks = [Task("Write tests", 2, "2024-12-01"),
Task("Fix bug", 1, "2024-11-30"),
Task("Review PR", 3, "2024-12-02")]

tasks.sort() # Uses __lt__
print(tasks)
# [Task('Fix bug', priority=1), Task('Write tests', priority=2), Task('Review PR', priority=3)]

Approach 2: Use @dataclass(order=True)

from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
# Fields are compared in definition order
priority: int # compared first
deadline: str # compared second (if priority equal)
name: str = field(compare=False) # not used in comparison

tasks = [
Task(2, "2024-12-01", "Write tests"),
Task(1, "2024-11-30", "Fix bug"),
Task(3, "2024-12-02", "Review PR"),
]

tasks.sort()
for t in tasks:
print(f"Priority {t.priority}: {t.name} (due {t.deadline})")

Output:

Priority 1: Fix bug (due 2024-11-30)
Priority 2: Write tests (due 2024-12-01)
Priority 3: Review PR (due 2024-12-02)

Approach 3: Use key= Without Modifying the Class

class Task:
def __init__(self, name, priority, deadline):
self.name = name
self.priority = priority
self.deadline = deadline

# Sort without touching the class - most flexible
tasks.sort(key=lambda t: (t.priority, t.deadline))

This is usually preferred in production code - it keeps the class clean and makes the sort criterion explicit at the call site.

Part 5 - functools.cmp_to_key: Adapting Legacy Comparators

Python 2 had a cmp parameter that accepted a comparator function returning negative/0/positive. Python 3 removed it. functools.cmp_to_key adapts old-style comparators for use with key=.

from functools import cmp_to_key

# Old-style comparator: returns negative, 0, or positive
def compare_versions(v1, v2):
"""
Compare version strings like "1.10.2" correctly.
"1.2" < "1.10" (not lexicographic - "10" > "2" numerically)
"""
parts1 = [int(x) for x in v1.split(".")]
parts2 = [int(x) for x in v2.split(".")]

for p1, p2 in zip(parts1, parts2):
if p1 < p2: return -1
if p1 > p2: return 1

# If all shared parts equal, longer version is greater
return len(parts1) - len(parts2)

versions = ["1.10.2", "1.9.0", "2.0.0", "1.2.3", "1.10.1"]
sorted_versions = sorted(versions, key=cmp_to_key(compare_versions))
print(sorted_versions)
# ['1.2.3', '1.9.0', '1.10.1', '1.10.2', '2.0.0']

Note how "1.9.0" correctly sorts before "1.10.0" - lexicographic string comparison would put "1.9" after "1.10" because "9" > "1". The comparator handles this correctly.

:::warning cmp_to_key Has Performance Overhead A comparator function is called O(n log n) times during sorting (once per comparison). A key function is called O(n) times. For large datasets, cmp_to_key is significantly slower than a properly designed key function. Use it for: version strings, Roman numerals, natural sort order, or any case where a comparison function is more natural. Avoid it when a key function can express the same logic. :::

# The key-function equivalent for version sorting (faster)
def version_key(v):
return tuple(int(x) for x in v.split("."))

versions = ["1.10.2", "1.9.0", "2.0.0", "1.2.3", "1.10.1"]
sorted_versions = sorted(versions, key=version_key)
print(sorted_versions)
# ['1.2.3', '1.9.0', '1.10.1', '1.10.2', '2.0.0']

The key function version is faster and more readable. Prefer it when possible.

Part 6 - Case-Insensitive and Unicode-Aware Sorting

Case-Insensitive String Sorting

words = ["Banana", "apple", "Cherry", "DATE", "elderberry"]

# Default: case-sensitive (uppercase before lowercase)
print(sorted(words))
# ['Banana', 'Cherry', 'DATE', 'apple', 'elderberry']

# Case-insensitive: key=str.lower
print(sorted(words, key=str.lower))
# ['apple', 'Banana', 'Cherry', 'DATE', 'elderberry']

# Alternative: key=str.casefold (more thorough Unicode handling)
print(sorted(words, key=str.casefold))
# ['apple', 'Banana', 'Cherry', 'DATE', 'elderberry']

str.casefold() is more aggressive than str.lower() for Unicode. For example, the German "ß" casefolds to "ss", while .lower() leaves it as "ß". For international data, always use casefold.

Locale-Aware Sorting with locale.strxfrm

import locale

# Set locale to user's system locale
locale.setlocale(locale.LC_ALL, "")

names = ["Åse", "Adam", "Bjorn", "Åke", "Anders"]

# Default sort - wrong for Scandinavian alphabets (Å should sort after Z)
print(sorted(names))
# ['Adam', 'Anders', 'Bjorn', 'Åke', 'Åse'] ← WRONG

# Locale-aware sort
print(sorted(names, key=locale.strxfrm))
# ['Adam', 'Anders', 'Bjorn', 'Åke', 'Åse'] ← correct for Swedish locale
# (exact order depends on system locale setting)

For production systems serving international users, use the pyuca library or ICU for proper Unicode collation algorithm compliance.

Part 7 - Multi-Key Sorting: Mixed Ascending and Descending

The canonical approach is a tuple key. For numeric fields, negate to reverse direction.

students = [
{"name": "Alice", "grade": "A", "score": 95, "age": 20},
{"name": "Bob", "grade": "B", "score": 80, "age": 22},
{"name": "Carol", "grade": "A", "score": 95, "age": 19},
{"name": "Dave", "grade": "B", "score": 88, "age": 21},
{"name": "Eve", "grade": "A", "score": 88, "age": 20},
]

# Sort: grade ascending, score descending, age ascending, name ascending
result = sorted(
students,
key=lambda s: (s["grade"], -s["score"], s["age"], s["name"])
)

for s in result:
print(f"{s['grade']} score={s['score']} age={s['age']} {s['name']}")

Output:

A score=95 age=19 Carol
A score=95 age=20 Alice
A score=88 age=20 Eve
B score=88 age=21 Dave
B score=80 age=22 Bob

Handling Mixed Ascending/Descending for Non-Numeric Fields

When you cannot negate (e.g., strings), use the two-pass stable sort approach:

records = [
{"dept": "Engineering", "name": "Alice", "score": 95},
{"dept": "Marketing", "name": "Bob", "score": 80},
{"dept": "Engineering", "name": "Carol", "score": 88},
{"dept": "Marketing", "name": "Dave", "score": 95},
]

# Goal: department DESCENDING (Z before A), score DESCENDING within dept

# Two-pass stable sort:
records.sort(key=lambda r: r["score"], reverse=True) # Step 1: secondary key
records.sort(key=lambda r: r["dept"], reverse=True) # Step 2: primary key

for r in records:
print(f"{r['dept']:15} {r['score']} {r['name']}")

Output:

Marketing 95 Dave
Marketing 80 Bob
Engineering 95 Alice
Engineering 88 Carol

Part 8 - Sorting Dicts by Value

A very common pattern in production: sort a frequency map, ranking dict, or score table.

# Word frequency count
text = "the quick brown fox jumps over the lazy dog the fox"
word_counts = {}
for word in text.split():
word_counts[word] = word_counts.get(word, 0) + 1

print(word_counts)
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, ...}

# Sort by frequency descending, then alphabetically for ties
ranked = sorted(
word_counts.items(),
key=lambda item: (-item[1], item[0])
)

for word, count in ranked:
print(f"{word:10}: {count}")

Output:

the : 3
fox : 2
brown : 1
dog : 1
jumps : 1
lazy : 1
over : 1
quick : 1
# Sorting a dict by value - three common patterns
scores = {"Alice": 95, "Bob": 80, "Carol": 88, "Dave": 72}

# Pattern 1: sorted() returns list of tuples
ranked_items = sorted(scores.items(), key=lambda x: x[1], reverse=True)
print(ranked_items) # [('Alice', 95), ('Carol', 88), ('Bob', 80), ('Dave', 72)]

# Pattern 2: itemgetter for clean syntax
from operator import itemgetter
ranked_items = sorted(scores.items(), key=itemgetter(1), reverse=True)

# Pattern 3: sorted dict (Python 3.7+ preserves insertion order)
ranked_dict = dict(sorted(scores.items(), key=itemgetter(1), reverse=True))
print(ranked_dict) # {'Alice': 95, 'Carol': 88, 'Bob': 80, 'Dave': 72}

Part 9 - Sorting None Values Safely

# Problem: Python 3 raises TypeError when comparing None with other types
data = [3, None, 1, None, 2]
data.sort() # TypeError: '<' not supported between 'NoneType' and 'int'

Pattern: Sort None to End

data = [3, None, 1, None, 2]

# Trick: (x is None, x if x is not None else 0)
# (False, 3) < (True, 0) because False < True
# So all non-None values sort before None values
sorted_none_last = sorted(data, key=lambda x: (x is None, x if x is not None else 0))
print(sorted_none_last) # [1, 2, 3, None, None]

Pattern: Sort None to Front

data = [3, None, 1, None, 2]

# (x is not None, x if x is not None else 0)
# (False, 0) for None - sorts before (True, 3) for non-None
sorted_none_first = sorted(data, key=lambda x: (x is not None, x if x is not None else 0))
print(sorted_none_first) # [None, None, 1, 2, 3]

Pattern: Sort Dicts with Optional Fields

records = [
{"name": "Alice", "score": 95},
{"name": "Bob", "score": None}, # score pending
{"name": "Carol", "score": 88},
{"name": "Dave", "score": None},
{"name": "Eve", "score": 72},
]

# Sort by score descending, None scores go to end
result = sorted(
records,
key=lambda r: (r["score"] is None, -(r["score"] or 0))
)

for r in result:
score_str = str(r["score"]) if r["score"] is not None else "pending"
print(f"{r['name']:8}: {score_str}")

Output:

Alice : 95
Carol : 88
Eve : 72
Bob : pending
Dave : pending

Part 10 - Real-World Patterns

Ranking System: E-Commerce Product Sort

from dataclasses import dataclass
from typing import Optional

@dataclass
class Product:
name: str
in_stock: bool
rating: float # 0.0 to 5.0
review_count: int
price: float
is_prime: bool

def product_sort_key(p: Product):
"""
E-commerce ranking logic:
1. In-stock items first
2. Prime items before non-Prime (for Prime members)
3. Higher rating first
4. More reviews first (proxy for trustworthiness)
5. Lower price first (for equal rating/reviews)
"""
return (
not p.in_stock, # False (0) sorts before True (1)
not p.is_prime, # Prime first
-round(p.rating, 1), # Descending rating (round to avoid float noise)
-p.review_count, # More reviews first
p.price, # Ascending price
)

products = [
Product("Widget A", True, 4.5, 1200, 29.99, True),
Product("Widget B", False, 4.8, 300, 24.99, True),
Product("Widget C", True, 4.5, 3500, 34.99, False),
Product("Widget D", True, 4.3, 800, 19.99, True),
Product("Widget E", True, 4.8, 150, 39.99, True),
]

ranked = sorted(products, key=product_sort_key)
for i, p in enumerate(ranked, 1):
stock = "IN" if p.in_stock else "OUT"
prime = "Prime" if p.is_prime else " "
print(f"{i}. [{stock}] {prime}{p.rating} ({p.review_count:>4} reviews) ${p.price:.2f} {p.name}")

Output:

1. [IN ] Prime ★4.8 ( 150 reviews) $39.99 Widget E
2. [IN ] Prime ★4.5 (1200 reviews) $29.99 Widget A
3. [IN ] ★4.5 (3500 reviews) $34.99 Widget C
4. [IN ] Prime ★4.3 ( 800 reviews) $19.99 Widget D
5. [OUT] Prime ★4.8 ( 300 reviews) $24.99 Widget B

Leaderboard with Time-Based Tiebreaking

from datetime import datetime

players = [
{"name": "alice", "score": 5000, "achieved_at": datetime(2024, 1, 15, 10, 30)},
{"name": "bob", "score": 7200, "achieved_at": datetime(2024, 1, 15, 9, 15)},
{"name": "carol", "score": 5000, "achieved_at": datetime(2024, 1, 15, 8, 45)},
{"name": "dave", "score": 7200, "achieved_at": datetime(2024, 1, 15, 11, 0)},
{"name": "eve", "score": 6100, "achieved_at": datetime(2024, 1, 15, 9, 50)},
]

# Rank by: score desc, then achieved_at asc (earlier = better for ties)
ranked = sorted(
players,
key=lambda p: (-p["score"], p["achieved_at"])
)

for rank, p in enumerate(ranked, 1):
time_str = p["achieved_at"].strftime("%H:%M")
print(f"#{rank} {p['name']:8} {p['score']:,} ({time_str})")

Output:

#1 bob 7,200 (09:15) ← bob achieved 7200 before dave
#2 dave 7,200 (11:00)
#3 eve 6,100 (09:50)
#4 carol 5,000 (08:45) ← carol achieved 5000 before alice
#5 alice 5,000 (10:30)

Interview Questions

Q1: What is the difference between using lambda and operator.itemgetter as a sort key?

Answer: Both extract a field value for comparison, but itemgetter is implemented in C while lambda is a Python function. For each element, lambda u: u["age"] incurs Python function call overhead (creating a frame, resolving the variable, returning). itemgetter("age") is a C-level callable with minimal overhead. In practice, itemgetter is 2-3x faster for field extraction on large lists. The functional difference: itemgetter("a", "b") returns a tuple (u["a"], u["b"]) - equivalent to lambda u: (u["a"], u["b"]). Use lambda when you need computation (negation, conditional logic); use itemgetter/attrgetter for pure field extraction in performance-sensitive code.

Q2: How many times does Python call the key function when sorting n elements?

Answer: Exactly n times - once per element. CPython's sort implementation computes and caches all key values before beginning the sort. During the O(n log n) comparison phase, it compares the pre-computed key values directly - the key function is never called again. This is the Schwartzian transform applied at the C level. The implication: even expensive key functions (database lookups, API calls, file I/O) are only executed once per element, making them safe to use as sort keys. Total key function cost: O(n × key_function_complexity).

Q3: When would you use functools.cmp_to_key and when should you avoid it?

Answer: Use cmp_to_key when: (1) adapting legacy Python 2 comparators, (2) the ordering is most naturally expressed as a three-way comparison (negative/zero/positive), such as version strings, semantic versioning, Roman numerals, or natural sort order that interleaves numbers and text. Avoid it when a key function can express the same logic - cmp_to_key comparators are called O(n log n) times during sorting, while key functions are called O(n) times. For 1 million elements, that is 20 million vs 1 million calls - a 20x overhead difference. Always check if you can express the comparator as a key function first.

Q4: How do you sort a list of objects by multiple fields where some fields are ascending and others descending?

Answer: For numeric fields: use negation in the tuple key - key=lambda x: (x.primary_asc, -x.secondary_desc). The negated value reverses the sort direction for that field. For non-numeric fields (strings, dates): use two stable sorts - sort by the secondary key first (in its desired direction), then sort by the primary key. Python's stable sort preserves secondary ordering within primary groups. Alternatively, use cmp_to_key with a comparator that handles each field explicitly. The most Pythonic and performant approach for mixed types is usually two-pass stable sorting.

Q5: Explain why key=str.lower is preferred over key=lambda s: s.lower() for case-insensitive sorting.

Answer: Both are functionally equivalent - they transform each string to lowercase for comparison. The difference is performance: str.lower is a bound method object that Python can call as a C-level function without creating a Python lambda frame. lambda s: s.lower() creates a Python closure, and calling it incurs Python function call overhead for each element. For sorting 1 million strings, this overhead accumulates. Additionally, str.lower is more readable - it directly communicates the intent ("sort by lowercased value") without the noise of lambda s: s.lower(). Note: for international text with special characters (ß, æ, ø), prefer str.casefold over str.lower.

Q6: Design a sort key for a task scheduler that prioritizes: (1) critical tasks first, (2) earlier deadline within each priority, (3) shorter estimated duration as a tiebreaker. Handle the case where deadline is None (no deadline = sort last within priority).

Answer:

from datetime import datetime
from enum import IntEnum

class Priority(IntEnum):
CRITICAL = 0 # Lower int = higher priority
HIGH = 1
MEDIUM = 2
LOW = 3

def task_sort_key(task):
"""
Returns a tuple for comparison.
All fields sort in ascending order (Python default).
Lower tuple = higher priority = scheduled first.
"""
priority = task["priority"].value # IntEnum: 0=CRITICAL, 3=LOW
has_deadline = task["deadline"] is None # False=has deadline, True=no deadline
# False < True, so deadline tasks first
deadline = task["deadline"] or datetime.max # None → far future (sort last)
duration = task["duration_minutes"]

return (priority, has_deadline, deadline, duration)


tasks = [
{"name": "Deploy hotfix", "priority": Priority.CRITICAL, "deadline": datetime(2024,12,1,10,0), "duration_minutes": 15},
{"name": "Board report", "priority": Priority.HIGH, "deadline": datetime(2024,12,1, 9,0), "duration_minutes": 60},
{"name": "Code review", "priority": Priority.MEDIUM, "deadline": None, "duration_minutes": 30},
{"name": "Security patch", "priority": Priority.CRITICAL, "deadline": datetime(2024,12,1,11,0), "duration_minutes": 45},
{"name": "Team meeting", "priority": Priority.HIGH, "deadline": datetime(2024,12,1, 9,0), "duration_minutes": 45},
]

scheduled = sorted(tasks, key=task_sort_key)
for t in scheduled:
dl = t["deadline"].strftime("%H:%M") if t["deadline"] else "no deadline"
print(f"{t['priority'].name:10} {dl:12} {t['duration_minutes']:3}min {t['name']}")

Output:

CRITICAL 10:00 15min Deploy hotfix
CRITICAL 11:00 45min Security patch
HIGH 09:00 45min Team meeting
HIGH 09:00 60min Board report
MEDIUM no deadline 30min Code review

Practice Challenges

Beginner: Sort by Computed Field

Sort this list of strings by their number of vowels (ascending), with alphabetical order as a tiebreaker.

words = ["Python", "JavaScript", "Rust", "Go", "Erlang", "Lua", "C"]
Solution
def vowel_count(s):
return sum(1 for c in s.lower() if c in "aeiou")

words = ["Python", "JavaScript", "Rust", "Go", "Erlang", "Lua", "C"]

result = sorted(words, key=lambda w: (vowel_count(w), w.lower()))
print(result)
# Vowel counts: C=0, Rust=1, Go=1, Lua=2, Python=2, Erlang=2, JavaScript=3
# Tiebreakers: Go < Rust, Erlang < Lua < Python

# Output: ['C', 'Go', 'Rust', 'Erlang', 'Lua', 'Python', 'JavaScript']

Explanation: The tuple key (vowel_count(w), w.lower()) sorts primarily by vowel count (ascending), then alphabetically within each count group. w.lower() ensures case-insensitive alphabetical ordering for the tiebreaker.

Intermediate: Implement a Ranking Function

Given a list of search results, implement a ranking key that combines relevance score, recency (newer is better), and whether the result is from a trusted domain.

from datetime import datetime, timedelta

results = [
{"title": "Python Tutorial", "score": 0.92, "date": datetime.now() - timedelta(days=5), "trusted": True},
{"title": "Python Reference", "score": 0.85, "date": datetime.now() - timedelta(days=1), "trusted": True},
{"title": "Python Guide", "score": 0.92, "date": datetime.now() - timedelta(days=3), "trusted": False},
{"title": "Learn Python Fast", "score": 0.78, "date": datetime.now() - timedelta(hours=2), "trusted": True},
{"title": "Python Cheatsheet", "score": 0.88, "date": datetime.now() - timedelta(days=10), "trusted": True},
]
Solution
from datetime import datetime, timedelta

def search_ranking_key(result):
"""
Ranking criteria:
1. Trusted domain first (boolean: False sorts before True, so trusted=True → not trusted = False)
2. Relevance score descending (negate)
3. Recency: newer first (ascending date means older first, so negate timestamp)
"""
not_trusted = not result["trusted"] # False (trusted) sorts before True (untrusted)
relevance = -round(result["score"], 2) # Descending: higher score = lower key
recency = -result["date"].timestamp() # Descending: newer = higher timestamp = lower key

return (not_trusted, relevance, recency)

results = [
{"title": "Python Tutorial", "score": 0.92, "date": datetime.now() - timedelta(days=5), "trusted": True},
{"title": "Python Reference", "score": 0.85, "date": datetime.now() - timedelta(days=1), "trusted": True},
{"title": "Python Guide", "score": 0.92, "date": datetime.now() - timedelta(days=3), "trusted": False},
{"title": "Learn Python Fast", "score": 0.78, "date": datetime.now() - timedelta(hours=2), "trusted": True},
{"title": "Python Cheatsheet", "score": 0.88, "date": datetime.now() - timedelta(days=10), "trusted": True},
]

ranked = sorted(results, key=search_ranking_key)
for i, r in enumerate(ranked, 1):
trust = "trusted" if r["trusted"] else " "
age = (datetime.now() - r["date"])
age_str = f"{int(age.total_seconds() // 3600)}h ago" if age.days == 0 else f"{age.days}d ago"
print(f"#{i} [{trust}] score={r['score']:.2f} {age_str:8} {r['title']}")

Output:

#1 [trusted] score=0.92 5d ago Python Tutorial
#2 [trusted] score=0.88 10d ago Python Cheatsheet
#3 [trusted] score=0.85 1d ago Python Reference
#4 [trusted] score=0.78 2h ago Learn Python Fast
#5 [ ] score=0.92 3d ago Python Guide

"Python Tutorial" ranks above "Python Cheatsheet" despite lower recency because both are trusted with the same score (0.92 vs 0.88). "Python Guide" has the highest relevance but is not trusted, so it ranks last.

Advanced: Build a Natural Sort Key

Implement a natural sort key that sorts strings containing numbers in numeric order rather than lexicographic order. "file10.txt" should sort after "file9.txt", not before.

files = [
"file1.txt", "file10.txt", "file2.txt", "file20.txt",
"file3.txt", "notes.txt", "README.md", "file9.txt"
]

# Default sort (wrong):
print(sorted(files))
# ['README.md', 'file1.txt', 'file10.txt', 'file2.txt', 'file20.txt', 'file3.txt', 'file9.txt', 'notes.txt']

# Natural sort (desired):
# ['README.md', 'file1.txt', 'file2.txt', 'file3.txt', 'file9.txt', 'file10.txt', 'file20.txt', 'notes.txt']
Solution
import re

def natural_sort_key(s):
"""
Split the string into alternating text and number segments.
Convert number segments to integers so they sort numerically.

"file10.txt" → ["file", 10, ".txt"]
"file9.txt" → ["file", 9, ".txt"]

When compared as tuples:
("file", 9, ".txt") < ("file", 10, ".txt") ← correct!
"""
parts = re.split(r'(\d+)', s) # Split on digit sequences, keep them
result = []
for part in parts:
if part.isdigit():
result.append(int(part)) # Convert numeric parts to int
else:
result.append(part.lower()) # Keep text parts, case-insensitive
return result

files = [
"file1.txt", "file10.txt", "file2.txt", "file20.txt",
"file3.txt", "notes.txt", "README.md", "file9.txt"
]

print("Default sort:")
print(sorted(files))

print("\nNatural sort:")
print(sorted(files, key=natural_sort_key))

Output:

Default sort:
['README.md', 'file1.txt', 'file10.txt', 'file2.txt', 'file20.txt', 'file3.txt', 'file9.txt', 'notes.txt']

Natural sort:
['README.md', 'file1.txt', 'file2.txt', 'file3.txt', 'file9.txt', 'file10.txt', 'file20.txt', 'notes.txt']

How it works:

  • re.split(r'(\d+)', "file10.txt")["file", "10", ".txt"]
  • Integers are parsed so 10 > 9 (not "10" < "9" lexicographically)
  • Text is lowercased for case-insensitive comparison
  • Mixed lists compare correctly because Python compares tuple elements sequentially

This pattern is widely used in file managers, version control systems, and any tool displaying mixed text-numeric sequences.

Quick Reference

GoalCodeNotes
Sort by fieldsorted(data, key=lambda x: x["field"])Simple, readable
Sort by dict field (fast)sorted(data, key=itemgetter("field"))2x faster than lambda
Sort by attribute (fast)sorted(data, key=attrgetter("attr"))C-level, no overhead
Multiple fields ascendingkey=itemgetter("a", "b")Returns tuple
Mixed asc/desc (numeric)key=lambda x: (x["a"], -x["b"])Negate for descending
Case-insensitive stringskey=str.lower or key=str.casefoldcasefold for Unicode
Sort by length, then alphakey=lambda x: (len(x), x.lower())Tuple key
None values lastkey=lambda x: (x is None, x or 0)None after non-None
None values firstkey=lambda x: (x is not None, x or 0)None before non-None
Sort dict by valuesorted(d.items(), key=itemgetter(1))Returns list of tuples
Legacy comparatorkey=cmp_to_key(old_cmp_fn)Slower - avoid if possible
Version stringskey=lambda v: tuple(int(x) for x in v.split("."))Numeric version sort

Key Takeaways

  • The key function is called exactly once per element - CPython caches all key values before sorting begins; even expensive key functions are safe and only called O(n) times, never O(n log n)
  • operator.itemgetter and attrgetter are 2-3x faster than lambda for pure field extraction - use them in hot paths and for large datasets
  • Tuple keys enable multi-criteria sorting in a single pass - key=lambda x: (x["primary"], -x["secondary"]) sorts ascending by primary and descending by secondary simultaneously
  • cmp_to_key is a last resort - comparators are called O(n log n) times vs O(n) for key functions; always prefer a key function when the logic can be expressed as one
  • Sort None values safely using the (x is None, x or default) pattern - Python 3 cannot compare None with other types; this two-element tuple routes None to the correct end
  • str.lower and str.casefold are the correct tools for case-insensitive sorting - casefold handles Unicode edge cases like German ßss; never rely on default string comparison for user-facing text
  • For objects without natural ordering, prefer key=lambda at the call site over implementing __lt__in the class - keeps sorting logic explicit and the class clean
© 2026 EngineersOfAI. All rights reserved.