Python List Comprehensions Deep Dive: Practice Problems & Exercises

Practice: List Comprehensions Deep Dive

12 problems · 4 Easy · 5 Medium · 3 Hard · ⏱ 45–60 min

Easy

#1 Basic List Comprehension: Squares (Easy)

list-comprehension · transform · basics

Use a list comprehension to create a list of squares of numbers from 1 to 10.

Python

squares = [x * x for x in range(1, 11)]
print(squares)

Solution

squares = [x * x for x in range(1, 11)]
print(squares)

Output:

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

How it works: The list comprehension [x * x for x in range(1, 11)] iterates over each integer from 1 to 10. For each value of x, it computes x * x and places the result into a new list. This is equivalent to:

squares = []
for x in range(1, 11):
    squares.append(x * x)

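Key insight: The comprehension version is not just shorter. In CPython it compiles to bytecode that uses the LIST_APPEND opcode, avoiding the overhead of looking up and calling squares.append() on every iteration. This typically makes it measurably faster than the equivalent for loop (figures of roughly 25-35% are often quoted), though the exact margin depends on the Python version and on how much work is done per item.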
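One way to check the bytecode claim on your own interpreter is to disassemble both forms. This is a minimal sketch that assumes CPython; the helper name disassemble is made up for illustration, and the exact opcodes listed around LIST_APPEND differ between Python versions.

```python
import dis
import io


def disassemble(source: str, mode: str) -> str:
    """Return CPython's disassembly of `source` as text (nested code objects included)."""
    buffer = io.StringIO()
    dis.dis(compile(source, "<example>", mode), file=buffer)
    return buffer.getvalue()


comprehension_asm = disassemble("[x * x for x in range(1, 11)]", "eval")
loop_asm = disassemble(
    "squares = []\nfor x in range(1, 11):\n    squares.append(x * x)\n",
    "exec",
)

# The comprehension appends via a dedicated opcode; the loop looks up
# the `append` attribute and calls it on each iteration instead.
print("LIST_APPEND" in comprehension_asm)
print("LIST_APPEND" in loop_asm)
```

On CPython the first check should print True and the second False. Other implementations may compile comprehensions differently.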

Expected Output

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Hints

Hint 1: The syntax is [expression for variable in iterable].

Hint 2: range(1, 11) produces integers from 1 through 10 inclusive.

#2 Filtering with Trailing if (Easy)

list-comprehension · filter · conditional

Use a list comprehension with a filter to extract all numbers from 1 to 30 that are even but NOT divisible by 3.

Python

result = [x for x in range(1, 31) if x % 2 == 0 if x % 3 != 0]
print(result)

Solution

result = [x for x in range(1, 31) if x % 2 == 0 if x % 3 != 0]
print(result)

Output:

[2, 4, 8, 10, 14, 16, 20, 22, 26, 28]

How it works: Multiple trailing if clauses act as AND conditions. The comprehension keeps only values where both x % 2 == 0 and x % 3 != 0 are True. This is equivalent to writing if x % 2 == 0 and x % 3 != 0.

Why these numbers? Even numbers from 1-30 are 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30. Removing those also divisible by 3 (6, 12, 18, 24, 30) leaves 2, 4, 8, 10, 14, 16, 20, 22, 26, 28. These are numbers divisible by 2 but not by 6.

Key insight: The trailing if is a filter — it reduces the number of items in the output. Do not confuse it with the ternary if-else in the expression position, which transforms values but keeps all items.
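The claim that chained trailing if clauses behave like a single and condition is easy to verify. The sketch below builds the list both ways and compares them; the variable names chained and combined are illustrative only.

```python
numbers = range(1, 31)

# Two trailing `if` clauses, evaluated left to right.
chained = [x for x in numbers if x % 2 == 0 if x % 3 != 0]

# One trailing `if` joining both conditions with `and`.
combined = [x for x in numbers if x % 2 == 0 and x % 3 != 0]

print(chained == combined)  # True
print(chained)              # [2, 4, 8, 10, 14, 16, 20, 22, 26, 28]
```

The and form is usually easier to read; the chained form is mostly worth recognizing when you meet it in existing code.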

Expected Output

[2, 4, 8, 10, 14, 16, 20, 22, 26, 28]

Hints

Hint 1: A trailing if clause filters which items are included — no else allowed.

Hint 2: You need two conditions: divisible by 2 AND not divisible by 3.

#3 Ternary Expression in Comprehension (Easy)

list-comprehension · ternary · transform

Use a list comprehension with a ternary expression to label each number from 1 to 10 as "even" or "odd".

Python

labels = ["even" if x % 2 == 0 else "odd" for x in range(1, 11)]
print(labels)

Solution

labels = ["even" if x % 2 == 0 else "odd" for x in range(1, 11)]
print(labels)

Output:

['odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even']

How it works: The ternary expression "even" if x % 2 == 0 else "odd" sits in the expression position of the comprehension — it determines what value each element has, not whether it is included. Every item from range(1, 11) produces exactly one output.

Comprehension anatomy:

[  "even" if x % 2 == 0 else "odd"   for x in range(1, 11)  ]
   ↑ conditional expression            ↑ iteration
   (transforms — every item kept)

Key insight: A ternary in the expression position transforms: the output list has the same length as the input. A trailing if filters: the output may be shorter. These are fundamentally different operations that occupy different positions. A single comprehension can use both, but neither can stand in for the other.
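To see the two roles side by side, here is a small sketch that uses a ternary in the expression position and a trailing if in the same comprehension. The x > 5 cutoff is an arbitrary choice for illustration and is not part of the exercise.

```python
# Ternary (before `for`) decides WHAT each kept item becomes.
# Trailing `if` (after the iterable) decides WHICH items are kept.
labels = ["even" if x % 2 == 0 else "odd" for x in range(1, 11) if x > 5]

print(labels)       # ['even', 'odd', 'even', 'odd', 'even']
print(len(labels))  # 5, shorter than the 10-item input because of the filter
```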

Expected Output

['odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even']

Hints

Hint 1: The ternary form is: value_if_true if condition else value_if_false.

Hint 2: This goes in the expression position (before the for), not as a trailing filter.

#4 Set Comprehension: Unique Domains (Easy)

set-comprehension · string-methods · deduplication

Use a set comprehension to extract all unique email domains from a list of addresses.

Python

emails = [
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
]

domains = {email.split("@")[1] for email in emails}
print(len(domains))
print("gmail.com" in domains)
print("yahoo.com" in domains)
print("hotmail.com" in domains)

Solution

emails = [
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
]

domains = {email.split("@")[1] for email in emails}
print(len(domains))
print("gmail.com" in domains)
print("yahoo.com" in domains)
print("hotmail.com" in domains)

Output:

3
True
True
False

How it works: The set comprehension {email.split("@")[1] for email in emails} splits each email at @ and takes the domain part (index 1). Since sets automatically deduplicate, gmail.com and yahoo.com each appear only once despite multiple emails using them.

Key insight: Set comprehensions look like dict comprehensions but without the colon. {expr for x in it} creates a set; {k: v for x in it} creates a dict. Be careful: {} alone creates an empty dict, not an empty set. Use set() for an empty set.
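A self-contained sketch of the same idea follows. The addresses are hypothetical, invented purely for illustration (they are not the ones from the exercise), and the last two lines check the empty-literal behavior described above.

```python
# Hypothetical addresses, made up for this example only.
sample_emails = [
    "ana@gmail.com",
    "ben@yahoo.com",
    "cho@gmail.com",
    "dee@example.org",
    "eli@yahoo.com",
]

# Curly braces without a colon build a set, so repeated domains collapse.
sample_domains = {address.split("@")[1] for address in sample_emails}
print(sorted(sample_domains))  # ['example.org', 'gmail.com', 'yahoo.com']

# Empty-literal pitfall: {} is a dict, set() is a set.
print(type({}).__name__)     # dict
print(type(set()).__name__)  # set
```

Sorting before printing gives a stable display, since sets have no guaranteed order.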

Expected Output

3
True
True
False

Hints

Hint 1: Set comprehensions use curly braces: {expression for var in iterable}.

Hint 2: str.split("@")[1] extracts the domain portion of an email address.

Medium

#5 Flatten a Matrix with Nested Comprehension (Medium)

nested-comprehension · flatten · matrix

Use a nested list comprehension to flatten a 3x3 matrix into a single list. Then flatten again but keep only even numbers.

Python

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

flat = [x for row in matrix for x in row]
print(flat)

flat_evens = [x for row in matrix for x in row if x % 2 == 0]
print(flat_evens)

Solution

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

flat = [x for row in matrix for x in row]
print(flat)

flat_evens = [x for row in matrix for x in row if x % 2 == 0]
print(flat_evens)

Output:

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[2, 4, 6, 8]

Reading order for nested comprehensions: The for-clauses read left-to-right, matching the equivalent nested loop:

# This comprehension:
flat = [x for row in matrix for x in row]

# Is equivalent to:
flat = []
for row in matrix:        # first for in the comprehension
    for x in row:         # second for in the comprehension
        flat.append(x)    # expression at the start

The outer loop (for row in matrix) appears first in the comprehension. This is the part that trips up most engineers — they expect the inner loop first because the expression x comes from the innermost loop.

Adding a filter: The trailing if x % 2 == 0 applies to the innermost loop, filtering individual elements after they are extracted from each row.

Key insight: For deeply nested structures or complex filtering, consider itertools.chain.from_iterable(matrix) instead — it is often more readable and equally performant for simple flattening.
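For comparison, a short sketch of the itertools alternative mentioned above, applied to the same matrix. The names flat_chain and flat_evens_chain are illustrative.

```python
from itertools import chain

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# chain.from_iterable lazily yields the items of each row in turn.
flat_chain = list(chain.from_iterable(matrix))
print(flat_chain)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# It composes with a comprehension when a filter is still needed.
flat_evens_chain = [x for x in chain.from_iterable(matrix) if x % 2 == 0]
print(flat_evens_chain)  # [2, 4, 6, 8]
```

Like the two-for comprehension, this flattens exactly one level of nesting.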

Expected Output

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[2, 4, 6, 8]

Hints

Hint 1: For flattening, the outer for comes first (left to right): [x for row in matrix for x in row].

Hint 2: You can add a trailing if to filter elements during the flatten.

#6 Dict Comprehension: Invert and Filter (Medium)

dict-comprehension · invert · filter

Use dict comprehensions to first filter a dictionary (keep only items where the value is greater than 1), then invert the filtered result (swap keys and values).

Python

data = {"a": 1, "b": 2, "c": 3, "d": 4}

filtered = {k: v for k, v in data.items() if v > 1}
print(filtered)

inverted = {v: k for k, v in filtered.items()}
print(inverted)

Solution

data = {"a": 1, "b": 2, "c": 3, "d": 4}

filtered = {k: v for k, v in data.items() if v > 1}
print(filtered)

inverted = {v: k for k, v in filtered.items()}
print(inverted)

Output:

{'b': 2, 'c': 3, 'd': 4}
{2: 'b', 3: 'c', 4: 'd'}

How it works:

Filtering: {k: v for k, v in data.items() if v > 1} iterates over all key-value pairs and keeps only those where the value exceeds 1. The item "a": 1 is excluded.
Inverting: {v: k for k, v in filtered.items()} swaps each key-value pair — the old value becomes the new key and vice versa.

Warning about inverting: Inversion only works correctly when values are unique and hashable. If two keys share the same value, only the last one survives (dict keys must be unique). For example, inverting {"a": 1, "b": 1} yields {1: "b"} — "a" is silently lost.
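When values may repeat, one possible workaround is to map each value to a list of keys instead of a single key. This is a sketch of that idea using collections.defaultdict, not part of the exercise; the sample dict is made up for illustration.

```python
from collections import defaultdict

data = {"a": 1, "b": 1, "c": 2}  # illustrative input with a duplicate value

# Naive inversion: the later key overwrites the earlier one.
naive = {v: k for k, v in data.items()}
print(naive)  # {1: 'b', 2: 'c'}

# Grouped inversion: every key is preserved under its value.
grouped = defaultdict(list)
for key, value in data.items():
    grouped[value].append(key)
print(dict(grouped))  # {1: ['a', 'b'], 2: ['c']}
```

This step uses an explicit loop because each entry depends on what has already been accumulated, which a plain dict comprehension cannot express directly.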

Key insight: Dict comprehensions are the Pythonic way to transform, filter, and reshape dictionaries. They replace verbose loops like new_dict = {}; for k, v in data.items(): if v > 1: new_dict[k] = v with a single, readable expression.

Expected Output

{'b': 2, 'c': 3, 'd': 4}
{2: 'b', 3: 'c', 4: 'd'}

Hints

Hint 1: Dict comprehension syntax: {key_expr: value_expr for var in iterable if condition}.

Hint 2: To invert a dict, swap keys and values: {v: k for k, v in original.items()}.

#7 Generator Expression with sum, any, all (Medium)

generator-expression · sum · any · all · lazy

Use generator expressions (not list comprehensions) with sum, any, and all to answer questions about a dataset without building intermediate lists.

Python

data = [3, 7, 12, 5, 20, 8, 15, 2, 18, 10]

# Sum of squares of all elements
total = sum(x * x for x in data)
print(total)

# Is any element greater than 15?
has_large = any(x > 15 for x in data)
print(has_large)

# Are all elements greater than 5?
all_above_five = all(x > 5 for x in data)
print(all_above_five)

# Are all elements positive?
all_positive = all(x > 0 for x in data)
print(all_positive)

Solution

data = [3, 7, 12, 5, 20, 8, 15, 2, 18, 10]

total = sum(x * x for x in data)
print(total)

has_large = any(x > 15 for x in data)
print(has_large)

all_above_five = all(x > 5 for x in data)
print(all_above_five)

all_positive = all(x > 0 for x in data)
print(all_positive)

Output:

1344
True
False
True

Why generators here, not list comprehensions?

sum(), any(), and all() consume their input in a single pass. They do not need random access or len(). A generator produces one value at a time without allocating a list.
any() short-circuits — it stops as soon as it finds a True value. When checking any(x > 15 for x in data), it stops at 20 (index 4) without examining elements 5-9. A list comprehension would evaluate all 10 elements first.
all() short-circuits at the first False. For all(x > 5 for x in data), it stops at 3 (index 0).
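The short-circuit behavior can be observed by recording which elements the predicate actually sees. In this sketch the helper is_large and the examined list are illustrative names, not part of the exercise.

```python
data = [3, 7, 12, 5, 20, 8, 15, 2, 18, 10]
examined = []


def is_large(x):
    """Record each element that gets tested, then apply the x > 15 check."""
    examined.append(x)
    return x > 15


# Generator expression: any() stops pulling once it sees a True.
any(is_large(x) for x in data)
lazy_examined = list(examined)

# List comprehension: all 10 elements are tested before any() starts.
examined.clear()
any([is_large(x) for x in data])
eager_examined = list(examined)

print(lazy_examined)        # [3, 7, 12, 5, 20]
print(len(eager_examined))  # 10
```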

Memory difference: For data of size N:

sum([x * x for x in data]) — allocates a list of N integers, then sums it
sum(x * x for x in data) — processes one integer at a time, O(1) memory
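A rough way to see the difference is sys.getsizeof, sketched below with an arbitrary n of 100,000. Note the assumption: getsizeof reports only the container itself (the list's pointer array or the generator object), not the integers it refers to, so it understates the list's real footprint.

```python
import sys

n = 100_000  # arbitrary size chosen for illustration

as_list = [x * x for x in range(n)]
as_generator = (x * x for x in range(n))

list_bytes = sys.getsizeof(as_list)       # grows with n
gen_bytes = sys.getsizeof(as_generator)   # small and independent of n

print(list_bytes > gen_bytes)  # True

# Both forms produce the same total.
list_total = sum(as_list)
gen_total = sum(as_generator)
print(list_total == gen_total)  # True
```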

Key insight: When passing a generator directly as the only argument to a function, the outer parentheses of the function call serve double duty — you do not need an extra pair. sum(x*x for x in data) works; no need for sum((x*x for x in data)).

Expected Output

1344
True
False
True

Hints

Hint 1: Generator expressions use parentheses: (expr for x in iterable).

Hint 2: When passed directly to a function like sum(), you can omit the extra parentheses.

#8 Build a Matrix with Nested Comprehension (Medium)

nested-comprehension · matrix · construction

Use a nested list comprehension to build a 4x4 identity matrix (1 on the diagonal, 0 elsewhere). Print each row on its own line.

Python

n = 4
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

for row in identity:
    print(row)

Solution

n = 4
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

for row in identity:
    print(row)

Output:

[1, 0, 0, 0]
[0, 1, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 1]

How to read this nested comprehension:

[[1 if i == j else 0 for j in range(n)] for i in range(n)]
 ↑ inner comprehension (builds one row)   ↑ outer (iterates over rows)

The outer comprehension iterates i from 0 to 3 (rows). For each row i, the inner comprehension iterates j from 0 to 3 (columns) and produces 1 when i == j (diagonal) or 0 otherwise.

Building vs Flattening — the critical difference:

Building (matrix construction): [[expr for j in cols] for i in rows] — the outer comprehension wraps inner ones, creating a list of lists.
Flattening (matrix destruction): [x for row in matrix for x in row] — a single comprehension with two for-clauses, producing a flat list.

The bracket placement determines which pattern you get. Building has [[ ]] (nested brackets). Flattening has [ ] with multiple for clauses inside a single bracket pair.
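The two patterns can be put next to each other in a short sketch: build the identity matrix with nested brackets, rebuild it with explicit loops to confirm the reading order, then flatten it with a single bracket pair. The names identity_loop and flat are illustrative.

```python
n = 4

# Building: nested brackets, inner comprehension makes one row.
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# Equivalent explicit loops: the OUTER comprehension is the outer loop.
identity_loop = []
for i in range(n):
    row = []
    for j in range(n):
        row.append(1 if i == j else 0)
    identity_loop.append(row)

# Flattening: one bracket pair, two for-clauses.
flat = [value for row in identity for value in row]

print(identity == identity_loop)  # True
print(len(identity), len(flat))   # 4 16
print(sum(flat))                  # 4 (one 1 per diagonal entry)
```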

Expected Output

[1, 0, 0, 0]
[0, 1, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 1]

Hints

Hint 1: For building a matrix, the outer comprehension creates rows, the inner creates columns.

Hint 2: An identity matrix has 1 on the diagonal (where row index equals column index) and 0 elsewhere.

#9 ETL Pipeline with Dict Comprehension (Medium)

dict-comprehension · etl · data-cleaning · real-world

Use a dict comprehension to clean and transform raw API data: strip and lowercase names, cast scores to integers, and include only active users.

Python

raw_records = [
    {"name": "  Alice  ", "score": "92", "active": "true"},
    {"name": "  Bob  ",   "score": "67", "active": "false"},
    {"name": "  Carol  ", "score": "88", "active": "true"},
    {"name": "  Dave  ",  "score": "45", "active": "false"},
]

active_scores = {
    r["name"].strip().lower(): int(r["score"])
    for r in raw_records
    if r["active"] == "true"
}
print(active_scores)

top_user = max(active_scores, key=active_scores.get)
print("Highest:", top_user, "(" + str(active_scores[top_user]) + ")")

Solution

raw_records = [
    {"name": "  Alice  ", "score": "92", "active": "true"},
    {"name": "  Bob  ",   "score": "67", "active": "false"},
    {"name": "  Carol  ", "score": "88", "active": "true"},
    {"name": "  Dave  ",  "score": "45", "active": "false"},
]

active_scores = {
    r["name"].strip().lower(): int(r["score"])
    for r in raw_records
    if r["active"] == "true"
}
print(active_scores)

top_user = max(active_scores, key=active_scores.get)
print("Highest:", top_user, "(" + str(active_scores[top_user]) + ")")

Output:

{'alice': 92, 'carol': 88}
Highest: alice (92)

How it works: The dict comprehension performs three operations in a single pass:

Filter: if r["active"] == "true" excludes inactive users (Bob, Dave)
Transform keys: r["name"].strip().lower() removes whitespace and normalizes to lowercase
Transform values: int(r["score"]) casts string scores to integers

This is a common Extract-Transform-Load (ETL) pattern. The equivalent loop version would be 6-8 lines. The comprehension version is a single expression that clearly communicates intent: "build a name-to-score mapping for active users."
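For reference, one plausible shape of the loop version is sketched below and checked against the comprehension. The exact line count depends on style; the name active_scores_loop is illustrative.

```python
raw_records = [
    {"name": "  Alice  ", "score": "92", "active": "true"},
    {"name": "  Bob  ",   "score": "67", "active": "false"},
    {"name": "  Carol  ", "score": "88", "active": "true"},
    {"name": "  Dave  ",  "score": "45", "active": "false"},
]

# Loop version: filter, clean the key, cast the value, store.
active_scores_loop = {}
for r in raw_records:
    if r["active"] != "true":
        continue
    name = r["name"].strip().lower()
    active_scores_loop[name] = int(r["score"])

# Comprehension version from the exercise.
active_scores = {
    r["name"].strip().lower(): int(r["score"])
    for r in raw_records
    if r["active"] == "true"
}

print(active_scores_loop == active_scores)  # True
```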

Key insight: Dict comprehensions excel at building lookup tables from raw data. When you find yourself writing result = {}; for item in data: result[key] = value, that is a signal to use a dict comprehension instead.

Expected Output

{'alice': 92, 'carol': 88}
Highest: alice (92)

Hints

Hint 1: Chain operations: strip whitespace, lowercase names, cast scores to int, filter by active status.

Hint 2: Use max() with a key argument to find the highest scorer.

Hard

#10 Walrus Operator in Comprehensions (Hard)

walrus-operator · list-comprehension · assignment-expression · optimization

Use the walrus operator (:=) to compute the cube of each number, filter cubes between 5 and 500, and include both the original number and its cube — without computing the cube twice.

Python

numbers = range(1, 10)

# Without walrus, you would compute x**3 twice:
# [(x, x**3) for x in numbers if 5 < x**3 < 500]

# With walrus, compute once and reuse:
results = [(x, cube) for x in numbers if 5 < (cube := x ** 3) < 500]
print(results)
print(len(results))

Solution

numbers = range(1, 10)

results = [(x, cube) for x in numbers if 5 < (cube := x ** 3) < 500]
print(results)
print(len(results))

Output:

[(2, 8), (3, 27), (4, 64), (5, 125), (6, 216), (7, 343)]
6

How the walrus operator works here:

The expression (cube := x ** 3) does two things simultaneously:

Computes x ** 3 and assigns the result to the name cube
Returns the computed value so it can be used in the comparison 5 < ... < 500

Then in the output expression (x, cube), cube already holds the computed value — no need to call x ** 3 again.

Without the walrus operator, you would either:

Compute x ** 3 twice: [(x, x**3) for x in numbers if 5 < x**3 < 500]
Use a nested comprehension trick: [(x, c) for x in numbers for c in [x**3] if 5 < c < 500]
Fall back to an explicit loop

When this matters: If the expression is an expensive function call (database query, API call, complex computation), avoiding the duplicate call is a real performance win, not just a style preference.
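To make the saving concrete, the sketch below swaps x ** 3 for a hypothetical slow_cube function that logs every call, then compares call counts with and without the walrus operator. slow_cube and calls are stand-ins invented for illustration. Requires Python 3.8+.

```python
calls = []


def slow_cube(x):
    """Hypothetical stand-in for an expensive call; logs each invocation."""
    calls.append(x)
    return x ** 3


numbers = range(1, 10)

# Without walrus: called once in the filter, again for every item kept.
twice = [(x, slow_cube(x)) for x in numbers if 5 < slow_cube(x) < 500]
calls_without_walrus = len(calls)

# With walrus: called exactly once per input.
calls.clear()
once = [(x, cube) for x in numbers if 5 < (cube := slow_cube(x)) < 500]
calls_with_walrus = len(calls)

print(twice == once)         # True
print(calls_without_walrus)  # 15 (9 filter checks + 6 kept items)
print(calls_with_walrus)     # 9
```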

Key insight: The walrus operator (Python 3.8+) is most valuable inside comprehensions where you need the same computed value in both the filter (if clause) and the output expression. It keeps the comprehension form viable in cases that would otherwise require falling back to a loop.

Expected Output

[(2, 8), (3, 27), (4, 64), (5, 125), (6, 216), (7, 343)]
6

Hints

Hint 1: The walrus operator := assigns a value and returns it in a single expression.

Hint 2: Use it to avoid computing the same expensive expression twice (once for the filter, once for the output).

#11 Generator Pipeline: Multi-Stage Data Processing (Hard)

generator-expression · pipeline · lazy-evaluation · memory-efficiency

Build a multi-stage generator pipeline that filters and transforms transaction data lazily. Each stage must be a generator expression — no intermediate lists should be materialized.

Python

def process_transactions(transactions):
    # Stage 1: Remove refunds (negative amounts)
    non_refunds = (t for t in transactions if t["amount_cents"] > 0)

    # Stage 2: Convert cents to dollars
    with_dollars = (
        {"merchant": t["merchant"], "dollars": t["amount_cents"] / 100}
        for t in non_refunds
    )

    # Stage 3: Keep only transactions over $10
    large_only = (
        (t["merchant"], t["dollars"])
        for t in with_dollars
        if t["dollars"] > 10
    )

    return large_only

# Test data
transactions = [
    {"merchant": "Amazon", "amount_cents": 4999},
    {"merchant": "Refund-Store", "amount_cents": -1500},
    {"merchant": "Coffee Shop", "amount_cents": 450},
    {"merchant": "Grocery", "amount_cents": 8732},
    {"merchant": "Gas Station", "amount_cents": 5100},
    {"merchant": "Refund-Online", "amount_cents": -2000},
    {"merchant": "Restaurant", "amount_cents": 3275},
]

for merchant, amount in process_transactions(transactions):
    print(merchant, amount)

Solution

def process_transactions(transactions):
    # Stage 1: Remove refunds (negative amounts)
    non_refunds = (t for t in transactions if t["amount_cents"] > 0)

    # Stage 2: Convert cents to dollars
    with_dollars = (
        {"merchant": t["merchant"], "dollars": t["amount_cents"] / 100}
        for t in non_refunds
    )

    # Stage 3: Keep only transactions over $10
    large_only = (
        (t["merchant"], t["dollars"])
        for t in with_dollars
        if t["dollars"] > 10
    )

    return large_only

transactions = [
    {"merchant": "Amazon", "amount_cents": 4999},
    {"merchant": "Refund-Store", "amount_cents": -1500},
    {"merchant": "Coffee Shop", "amount_cents": 450},
    {"merchant": "Grocery", "amount_cents": 8732},
    {"merchant": "Gas Station", "amount_cents": 5100},
    {"merchant": "Refund-Online", "amount_cents": -2000},
    {"merchant": "Restaurant", "amount_cents": 3275},
]

for merchant, amount in process_transactions(transactions):
    print(merchant, amount)

Output:

Amazon 49.99
Grocery 87.32
Gas Station 51.0
Restaurant 32.75

What happens when the for loop pulls the first item:

large_only asks with_dollars for the next item
with_dollars asks non_refunds for the next item
non_refunds pulls from transactions, gets {"merchant": "Amazon", "amount_cents": 4999} — positive, so it yields it
with_dollars transforms it to {"merchant": "Amazon", "dollars": 49.99} and yields
large_only checks 49.99 > 10 — True, so it yields ("Amazon", 49.99)
The for loop prints it

For "Refund-Store" (amount -1500), Stage 1 rejects it, so Stages 2 and 3 never see it. For "Coffee Shop" ($4.50), Stages 1 and 2 process it, but Stage 3 rejects it since $4.50 is not greater than $10.

Memory analysis: At any moment, only one transaction dict flows through the pipeline. If transactions were a file reader yielding millions of rows, the pipeline would still use O(1) memory — each row enters, gets processed or rejected, and is discarded.
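The on-demand behavior can be observed by feeding the pipeline from a source that records each row as it is pulled. This is a condensed sketch, not the exercise solution: read_transactions and pulled are hypothetical names, the data is a shortened copy of the test data, and Stage 2 yields tuples directly to keep it brief.

```python
def read_transactions(rows, pulled):
    """Hypothetical source: yields rows one at a time and logs each pull."""
    for row in rows:
        pulled.append(row["merchant"])
        yield row


rows = [
    {"merchant": "Amazon", "amount_cents": 4999},
    {"merchant": "Refund-Store", "amount_cents": -1500},
    {"merchant": "Coffee Shop", "amount_cents": 450},
    {"merchant": "Grocery", "amount_cents": 8732},
]
pulled = []

source = read_transactions(rows, pulled)
non_refunds = (t for t in source if t["amount_cents"] > 0)
with_dollars = ((t["merchant"], t["amount_cents"] / 100) for t in non_refunds)
large_only = (pair for pair in with_dollars if pair[1] > 10)

pulled_before = list(pulled)        # building the pipeline reads nothing
first = next(large_only)
pulled_after_first = list(pulled)   # only one row was needed
second = next(large_only)
pulled_after_second = list(pulled)  # two rejected rows were skipped on the way

print(pulled_before)        # []
print(first)                # ('Amazon', 49.99)
print(pulled_after_first)   # ['Amazon']
print(second)               # ('Grocery', 87.32)
print(pulled_after_second)  # ['Amazon', 'Refund-Store', 'Coffee Shop', 'Grocery']
```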

Key insight: Generator pipelines are Python's answer to Unix pipes. Each generator is like a filter in cat data | grep | sed | awk. Data flows through stages on demand, with zero intermediate storage. This is the correct pattern for processing large datasets that do not fit in memory.

Starter Code

def process_transactions(transactions):
    """Build a lazy pipeline that:
    1. Filters out refunds (amount < 0)
    2. Converts amounts from cents to dollars
    3. Keeps only transactions over $10
    4. Returns (merchant, dollar_amount) tuples
    
    Each stage must be a generator — no intermediate lists.
    """
    # TODO: Implement 3-stage generator pipeline
    pass

# Test data
transactions = [
    {"merchant": "Amazon", "amount_cents": 4999},
    {"merchant": "Refund-Store", "amount_cents": -1500},
    {"merchant": "Coffee Shop", "amount_cents": 450},
    {"merchant": "Grocery", "amount_cents": 8732},
    {"merchant": "Gas Station", "amount_cents": 5100},
    {"merchant": "Refund-Online", "amount_cents": -2000},
    {"merchant": "Restaurant", "amount_cents": 3275},
]

for merchant, amount in process_transactions(transactions):
    print(merchant, amount)

Expected Output

Amazon 49.99
Grocery 87.32
Gas Station 51.0
Restaurant 32.75

Hints

Hint 1: Each stage should be a generator expression that feeds into the next.

Hint 2: Stage 1 filters negatives, Stage 2 transforms cents to dollars, Stage 3 filters by threshold.

Hint 3: No data flows until the final consumer (the for loop) pulls from the pipeline.

#12 Comprehension vs Loop: Refactoring for Performance (Hard)

performance · refactoring · list-comprehension · set-comprehension · dict-comprehension

Refactor three verbose loops into Pythonic comprehensions. The function processes server log entries and extracts error messages, unique sources, and per-source error counts.

Python

from collections import Counter

def analyze_logs(logs):
    error_messages = [
        log["message"].upper()
        for log in logs
        if log["level"] == "ERROR"
    ]

    unique_sources = {log["source"] for log in logs}

    error_counts = dict(Counter(
        log["source"]
        for log in logs
        if log["level"] == "ERROR"
    ))

    return {
        "error_messages": error_messages,
        "unique_sources": unique_sources,
        "error_counts": error_counts,
    }

# Test data
logs = [
    {"level": "INFO",  "source": "web",     "message": "Request received"},
    {"level": "ERROR", "source": "storage", "message": "Disk full"},
    {"level": "WARN",  "source": "network", "message": "High latency"},
    {"level": "ERROR", "source": "network", "message": "Connection timeout"},
    {"level": "INFO",  "source": "web",     "message": "Response sent"},
    {"level": "ERROR", "source": "storage", "message": "Disk failure"},
]

result = analyze_logs(logs)
print("Errors:", result["error_messages"])
print("Sources:", len(result["unique_sources"]))
print("Counts:", result["error_counts"])

Solution

from collections import Counter

def analyze_logs(logs):
    error_messages = [
        log["message"].upper()
        for log in logs
        if log["level"] == "ERROR"
    ]

    unique_sources = {log["source"] for log in logs}

    error_counts = dict(Counter(
        log["source"]
        for log in logs
        if log["level"] == "ERROR"
    ))

    return {
        "error_messages": error_messages,
        "unique_sources": unique_sources,
        "error_counts": error_counts,
    }

logs = [
    {"level": "INFO",  "source": "web",     "message": "Request received"},
    {"level": "ERROR", "source": "storage", "message": "Disk full"},
    {"level": "WARN",  "source": "network", "message": "High latency"},
    {"level": "ERROR", "source": "network", "message": "Connection timeout"},
    {"level": "INFO",  "source": "web",     "message": "Response sent"},
    {"level": "ERROR", "source": "storage", "message": "Disk failure"},
]

result = analyze_logs(logs)
print("Errors:", result["error_messages"])
print("Sources:", len(result["unique_sources"]))
print("Counts:", result["error_counts"])

Output:

Errors: ['DISK FULL', 'CONNECTION TIMEOUT', 'DISK FAILURE']
Sources: 3
Counts: {'storage': 2, 'network': 1}

Refactoring breakdown:

1. List comprehension (filter + transform):

# Before: 4 lines with manual append
error_messages = []
for log in logs:
    if log["level"] == "ERROR":
        error_messages.append(log["message"].upper())

# After: 1 expression — filter with trailing if, transform with .upper()
error_messages = [log["message"].upper() for log in logs if log["level"] == "ERROR"]

2. Set comprehension (deduplication):

# Before: 3 lines with manual add
unique_sources = set()
for log in logs:
    unique_sources.add(log["source"])

# After: 1 expression — set automatically deduplicates
unique_sources = {log["source"] for log in logs}

3. Counter with generator expression (aggregation):

# Before: 7 lines with manual counting
error_counts = {}
for log in logs:
    if log["level"] == "ERROR":
        src = log["source"]
        if src not in error_counts:
            error_counts[src] = 0
        error_counts[src] += 1

# After: Counter + generator expression — one pass, automatic counting
error_counts = dict(Counter(log["source"] for log in logs if log["level"] == "ERROR"))

Performance notes: In CPython, the list and set comprehensions use the LIST_APPEND and SET_ADD bytecodes instead of an attribute lookup plus method call on every iteration, which typically speeds up the iteration itself (roughly 25-35% is often quoted, but the margin varies with Python version and workload). The Counter accepts a generator expression, so no intermediate list is created for the counting step.
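Since the speedup depends on interpreter version and workload, it is worth measuring rather than assuming. Below is a minimal timeit sketch; the input size and repetition count are arbitrary choices, and the function names are illustrative. No expected timings are shown because they differ from machine to machine.

```python
import timeit

values = list(range(1000))  # arbitrary input size for illustration


def with_loop():
    """Build the list with an explicit loop and .append()."""
    result = []
    for x in values:
        result.append(x * x)
    return result


def with_comprehension():
    """Build the same list with a comprehension."""
    return [x * x for x in values]


# Total seconds for 500 runs of each variant.
loop_seconds = timeit.timeit(with_loop, number=500)
comp_seconds = timeit.timeit(with_comprehension, number=500)

print(f"loop: {loop_seconds:.4f}s  comprehension: {comp_seconds:.4f}s")
```

For steadier numbers, timeit.repeat with the minimum of several runs is a common refinement.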

Key insight: The three tools used here (a list comprehension, a set comprehension, and a Counter fed by a generator expression in place of a hand-written counting dict) cover the vast majority of data transformation patterns. When you see a loop that builds a collection with append, add, or key assignment, it is almost always a refactoring candidate for a comprehension.

Starter Code

def analyze_logs_slow(logs):
    """Slow version using loops. Refactor to comprehensions.
    
    Given log entries, return a dict with:
    - 'error_messages': list of messages from ERROR logs (uppercase)
    - 'unique_sources': set of all unique source names
    - 'error_counts': dict mapping each source to its error count
    """
    # TODO: Refactor these three loops into comprehensions
    error_messages = []
    for log in logs:
        if log["level"] == "ERROR":
            error_messages.append(log["message"].upper())

    unique_sources = set()
    for log in logs:
        unique_sources.add(log["source"])

    error_counts = {}
    for log in logs:
        if log["level"] == "ERROR":
            src = log["source"]
            if src not in error_counts:
                error_counts[src] = 0
            error_counts[src] += 1

    return {
        "error_messages": error_messages,
        "unique_sources": unique_sources,
        "error_counts": error_counts,
    }

Expected Output

Errors: ['DISK FULL', 'CONNECTION TIMEOUT', 'DISK FAILURE']
Sources: 3
Counts: {'storage': 2, 'network': 1}

Hints

Hint 1: error_messages can be a list comprehension with a filter.

Hint 2: unique_sources can be a set comprehension.

Hint 3: error_counts needs collections.Counter or a dict comprehension over grouped data.
