Python List Comprehensions Deep Dive: Practice Problems & Exercises
Practice: List Comprehensions Deep Dive
Easy
Use a list comprehension to create a list of squares of numbers from 1 to 10.
Solution
squares = [x * x for x in range(1, 11)]
print(squares)
Output:
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
How it works: The list comprehension [x * x for x in range(1, 11)] iterates over each integer from 1 to 10. For each value of x, it computes x * x and places the result into a new list. This is equivalent to:
squares = []
for x in range(1, 11):
    squares.append(x * x)
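Key insight: The comprehension version is not just shorter. In CPython it compiles to bytecode that appends with the LIST_APPEND opcode, avoiding the overhead of looking up and calling squares.append() on every iteration. This often makes it faster than the equivalent for loop, by roughly 25-35% in simple cases, though the exact gain depends on the Python version and on how much work is done per item.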
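One way to check the speed claim on your own interpreter is a small timeit comparison. The sketch below is illustrative only: the function names, the list size n, and the repeat count are assumptions, and the measured ratio will vary with Python version and hardware.

```python
import timeit

def squares_loop(n):
    """Build the list with an explicit loop and one method call per item."""
    result = []
    for x in range(1, n + 1):
        result.append(x * x)
    return result

def squares_comprehension(n):
    """Build the same list with a list comprehension."""
    return [x * x for x in range(1, n + 1)]

n = 10_000     # assumed size, large enough to make per-item cost visible
repeats = 200  # assumed repeat count

loop_time = timeit.timeit(lambda: squares_loop(n), number=repeats)
comp_time = timeit.timeit(lambda: squares_comprehension(n), number=repeats)

print(f"loop:          {loop_time:.3f}s")
print(f"comprehension: {comp_time:.3f}s")
```

Both functions return the same list, so any timing difference comes from how the list is built rather than from what it contains.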
Expected Output
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Hints
Hint 1: The syntax is [expression for variable in iterable].
Hint 2: range(1, 11) produces integers from 1 through 10 inclusive.
Use a list comprehension with a filter to extract all numbers from 1 to 30 that are even but NOT divisible by 3.
Solution
result = [x for x in range(1, 31) if x % 2 == 0 if x % 3 != 0]
print(result)
Output:
[2, 4, 8, 10, 14, 16, 20, 22, 26, 28]
How it works: Multiple trailing if clauses act as AND conditions. The comprehension keeps only values where both x % 2 == 0 and x % 3 != 0 are True. This is equivalent to writing if x % 2 == 0 and x % 3 != 0.
Why these numbers? Even numbers from 1-30 are 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30. Removing those also divisible by 3 (6, 12, 18, 24, 30) leaves 2, 4, 8, 10, 14, 16, 20, 22, 26, 28. These are numbers divisible by 2 but not by 6.
Key insight: The trailing if is a filter — it reduces the number of items in the output. Do not confuse it with the ternary if-else in the expression position, which transforms values but keeps all items.
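The equivalence between chained trailing if clauses, a single and condition, and nested if statements can be checked directly. This is a minimal sketch; the variable names chained, combined, and nested are illustrative.

```python
numbers = range(1, 31)

# Two trailing ifs in a row
chained = [x for x in numbers if x % 2 == 0 if x % 3 != 0]

# One trailing if with `and`
combined = [x for x in numbers if x % 2 == 0 and x % 3 != 0]

# Explicit loop: each trailing if becomes one more nesting level
nested = []
for x in numbers:
    if x % 2 == 0:
        if x % 3 != 0:
            nested.append(x)

print(chained == combined == nested)  # True
print(chained)  # [2, 4, 8, 10, 14, 16, 20, 22, 26, 28]
```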
Expected Output
[2, 4, 8, 10, 14, 16, 20, 22, 26, 28]
Hints
Hint 1: A trailing if clause filters which items are included — no else allowed.
Hint 2: You need two conditions: divisible by 2 AND not divisible by 3.
Use a list comprehension with a ternary expression to label each number from 1 to 10 as "even" or "odd".
Solution
labels = ["even" if x % 2 == 0 else "odd" for x in range(1, 11)]
print(labels)
Output:
['odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even']
How it works: The ternary expression "even" if x % 2 == 0 else "odd" sits in the expression position of the comprehension — it determines what value each element has, not whether it is included. Every item from range(1, 11) produces exactly one output.
Comprehension anatomy:
[ "even" if x % 2 == 0 else "odd"   for x in range(1, 11) ]
  ↑ conditional expression          ↑ iteration
  (transforms; every item kept)
Key insight: A ternary in the expression position transforms: the output list has the same length as the input. A trailing if filters: the output may be shorter. These are fundamentally different operations that live in different positions. The ternary goes before the for, the filter goes after the iterable, and a single comprehension can use both at once.
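To make the two positions concrete, here is a small sketch that uses a ternary and a trailing filter in the same comprehension. The x > 5 threshold is an arbitrary choice for illustration.

```python
# Ternary (before the for) decides the VALUE of each kept item.
# Trailing if (after the iterable) decides WHICH items are kept.
labels = ["even" if x % 2 == 0 else "odd" for x in range(1, 11) if x > 5]

print(labels)       # ['even', 'odd', 'even', 'odd', 'even']
print(len(labels))  # 5: only 6..10 passed the filter
```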
Expected Output
['odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even']
Hints
Hint 1: The ternary form is: value_if_true if condition else value_if_false.
Hint 2: This goes in the expression position (before the for), not as a trailing filter.
Use a set comprehension to extract all unique email domains from a list of addresses.
Solution
emails = [
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
]
domains = {email.split("@")[1] for email in emails}
print(len(domains))
print("gmail.com" in domains)
print("yahoo.com" in domains)
print("hotmail.com" in domains)
Output:
3
True
True
False
How it works: The set comprehension {email.split("@")[1] for email in emails} splits each email at @ and takes the domain part (index 1). Since sets automatically deduplicate, gmail.com and yahoo.com each appear only once despite multiple emails using them.
Key insight: Set comprehensions look like dict comprehensions but without the colon. {expr for x in it} creates a set; {k: v for x in it} creates a dict. Be careful: {} alone creates an empty dict, not an empty set. Use set() for an empty set.
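The brace rules above can be seen side by side in a short sketch. The addresses here are hypothetical placeholders on reserved example domains, not the ones used in the exercise.

```python
# Hypothetical addresses, for illustration only
emails = ["ana@example.com", "bo@example.org", "cy@example.com"]

# No colon: set comprehension (duplicates collapse)
domain_set = {e.split("@")[1] for e in emails}

# key: value with a colon: dict comprehension
domain_len = {d: len(d) for d in domain_set}

print(sorted(domain_set))          # ['example.com', 'example.org']
print(domain_len["example.org"])   # 11
print(type({}).__name__)           # dict
print(type(set()).__name__)        # set
```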
Expected Output
3
True
True
False
Hints
Hint 1: Set comprehensions use curly braces: {expression for var in iterable}.
Hint 2: str.split("@")[1] extracts the domain portion of an email address.
Medium
Use a nested list comprehension to flatten a 3x3 matrix into a single list. Then flatten again but keep only even numbers.
Solution
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]
print(flat)
flat_evens = [x for row in matrix for x in row if x % 2 == 0]
print(flat_evens)
Output:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[2, 4, 6, 8]
Reading order for nested comprehensions: The for-clauses read left-to-right, matching the equivalent nested loop:
# This comprehension:
flat = [x for row in matrix for x in row]
# Is equivalent to:
flat = []
for row in matrix:          # first for in the comprehension
    for x in row:           # second for in the comprehension
        flat.append(x)      # expression at the start
The outer loop (for row in matrix) appears first in the comprehension. This is the part that trips up most engineers — they expect the inner loop first because the expression x comes from the innermost loop.
Adding a filter: The trailing if x % 2 == 0 applies to the innermost loop, filtering individual elements after they are extracted from each row.
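Written as loops, the filtered flatten looks like the sketch below. It also shows, as an aside, that an if placed between the two for clauses filters whole rows instead of elements; the sum(row) > 6 condition is an arbitrary example.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Loop equivalent of [x for row in matrix for x in row if x % 2 == 0]
flat_evens = []
for row in matrix:
    for x in row:
        if x % 2 == 0:          # trailing if sits inside the innermost loop
            flat_evens.append(x)
print(flat_evens)  # [2, 4, 6, 8]

# An if after the FIRST for is checked once per row, before the inner loop
from_big_rows = [x for row in matrix if sum(row) > 6 for x in row]
print(from_big_rows)  # [4, 5, 6, 7, 8, 9]
```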
Key insight: For deeply nested structures or complex filtering, consider itertools.chain.from_iterable(matrix) instead — it is often more readable and equally performant for simple flattening.
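A minimal sketch of the itertools alternative, using the same matrix. Note that chain.from_iterable flattens exactly one level of nesting and returns a lazy iterator, so it is wrapped in list() here.

```python
from itertools import chain

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Flatten one level; list() materializes the lazy iterator
flat = list(chain.from_iterable(matrix))
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Filtering still works by iterating over the chained result
flat_evens = [x for x in chain.from_iterable(matrix) if x % 2 == 0]
print(flat_evens)  # [2, 4, 6, 8]
```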
Expected Output
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[2, 4, 6, 8]
Hints
Hint 1: For flattening, the outer for comes first (left to right): [x for row in matrix for x in row].
Hint 2: You can add a trailing if to filter elements during the flatten.
Use dict comprehensions to first filter a dictionary (keep only items where the value is greater than 1), then invert the filtered result (swap keys and values).
Solution
data = {"a": 1, "b": 2, "c": 3, "d": 4}
filtered = {k: v for k, v in data.items() if v > 1}
print(filtered)
inverted = {v: k for k, v in filtered.items()}
print(inverted)
Output:
{'b': 2, 'c': 3, 'd': 4}
{2: 'b', 3: 'c', 4: 'd'}
How it works:
- Filtering: {k: v for k, v in data.items() if v > 1} iterates over all key-value pairs and keeps only those where the value exceeds 1. The item "a": 1 is excluded.
- Inverting: {v: k for k, v in filtered.items()} swaps each key-value pair: the old value becomes the new key and vice versa.
Warning about inverting: Inversion only works correctly when values are unique and hashable. If two keys share the same value, only the last one survives (dict keys must be unique). For example, inverting {"a": 1, "b": 1} yields {1: "b"} — "a" is silently lost.
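The collision described above is easy to reproduce. The sketch below also shows one possible workaround, grouping keys into lists per value; that grouping approach is an illustration, not part of the exercise.

```python
data = {"a": 1, "b": 1, "c": 2}

# Plain inversion: the later key overwrites the earlier one for value 1
inverted = {v: k for k, v in data.items()}
print(inverted)  # {1: 'b', 2: 'c'}

# One way to keep every key: collect them in a list per value
grouped = {}
for k, v in data.items():
    grouped.setdefault(v, []).append(k)
print(grouped)  # {1: ['a', 'b'], 2: ['c']}
```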
Key insight: Dict comprehensions are the Pythonic way to transform, filter, and reshape dictionaries. They replace verbose loops like new_dict = {}; for k, v in data.items(): if v > 1: new_dict[k] = v with a single, readable expression.
Expected Output
{'b': 2, 'c': 3, 'd': 4}
{2: 'b', 3: 'c', 4: 'd'}
Hints
Hint 1: Dict comprehension syntax: {key_expr: value_expr for var in iterable if condition}.
Hint 2: To invert a dict, swap keys and values: {v: k for k, v in original.items()}.
Use generator expressions (not list comprehensions) with sum, any, and all to answer questions about a dataset without building intermediate lists.
Solution
data = [3, 7, 12, 5, 20, 8, 15, 2, 18, 10]
# Sum of squares of all elements
total = sum(x * x for x in data)
print(total)
# Is any element greater than 15?
has_large = any(x > 15 for x in data)
print(has_large)
# Are all elements greater than 5?
all_above_five = all(x > 5 for x in data)
print(all_above_five)
# Are all elements positive?
all_positive = all(x > 0 for x in data)
print(all_positive)
Output:
1344
True
False
True
Why generators here, not list comprehensions?
- sum(), any(), and all() consume their input in a single pass. They do not need random access or len(). A generator produces one value at a time without allocating a list.
- any() short-circuits: it stops as soon as it finds a True value. When checking any(x > 15 for x in data), it stops at 20 (index 4) without examining elements 5-9. A list comprehension would evaluate all 10 elements first.
- all() short-circuits at the first False. For all(x > 5 for x in data), it stops at 3 (index 0).
Memory difference: For data of size N:
- sum([x * x for x in data]): allocates a list of N integers, then sums it
- sum(x * x for x in data): processes one integer at a time, O(1) memory
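Both points, short-circuiting and memory, can be observed with a small experiment. In this sketch the helper is_large and the examined log are illustrative names, and the 100,000-element range is an assumed size. sys.getsizeof reports only the container's own size, so the exact byte counts are interpreter-specific.

```python
import sys

data = [3, 7, 12, 5, 20, 8, 15, 2, 18, 10]
examined = []

def is_large(x):
    """Record each element that actually gets tested."""
    examined.append(x)
    return x > 15

# Generator: any() stops pulling values once it sees 20
any(is_large(x) for x in data)
lazy_count = len(examined)

# List comprehension: every element is tested before any() even starts
examined.clear()
any([is_large(x) for x in data])
eager_count = len(examined)

print(lazy_count, eager_count)  # 5 10

# Container size: the list grows with N, the generator object does not
big = range(100_000)
list_size = sys.getsizeof([x * x for x in big])
gen_size = sys.getsizeof(x * x for x in big)
print(list_size > gen_size)  # True
```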
Key insight: When passing a generator directly as the only argument to a function, the outer parentheses of the function call serve double duty — you do not need an extra pair. sum(x*x for x in data) works; no need for sum((x*x for x in data)).
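The shortcut applies only when the generator expression is the sole argument. As a brief sketch, passing a second argument (here sum's start value, chosen arbitrarily as 100) requires the generator's own parentheses; writing sum(x * x for x in data, 100) is a SyntaxError.

```python
data = [3, 7, 12, 5, 20, 8, 15, 2, 18, 10]

# Sole argument: the call's parentheses are enough
total = sum(x * x for x in data)

# Second argument present: the generator needs its own parentheses
shifted = sum((x * x for x in data), 100)

print(total)    # 1344
print(shifted)  # 1444
```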
Expected Output
1344
True
False
True
Hints
Hint 1: Generator expressions use parentheses: (expr for x in iterable).
Hint 2: When passed directly to a function like sum(), you can omit the extra parentheses.
Use a nested list comprehension to build a 4x4 identity matrix (1 on the diagonal, 0 elsewhere). Print each row on its own line.
Solution
n = 4
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
for row in identity:
    print(row)
Output:
[1, 0, 0, 0]
[0, 1, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 1]
How to read this nested comprehension:
[[1 if i == j else 0 for j in range(n)] for i in range(n)]
↑ inner comprehension (builds one row) ↑ outer (iterates over rows)
The outer comprehension iterates i from 0 to 3 (rows). For each row i, the inner comprehension iterates j from 0 to 3 (columns) and produces 1 when i == j (diagonal) or 0 otherwise.
Building vs Flattening — the critical difference:
- Building (matrix construction): [[expr for j in cols] for i in rows]. The outer comprehension wraps inner ones, creating a list of lists.
- Flattening (matrix destruction): [x for row in matrix for x in row]. A single comprehension with two for-clauses, producing a flat list.
The bracket placement determines which pattern you get. Building has [[ ]] (nested brackets). Flattening has [ ] with multiple for clauses inside a single bracket pair.
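A round trip makes the contrast visible: build a grid with nested brackets, then flatten it with a single bracket pair. The grid contents (i * n + j) are an arbitrary choice for illustration.

```python
n = 3

# Building: nested brackets produce a list of lists
grid = [[i * n + j for j in range(n)] for i in range(n)]
print(grid)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# Flattening: one bracket pair, two for clauses, produces a flat list
flat = [x for row in grid for x in row]
print(flat)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```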
Expected Output
[1, 0, 0, 0]
[0, 1, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 1]
Hints
Hint 1: For building a matrix, the outer comprehension creates rows, the inner creates columns.
Hint 2: An identity matrix has 1 on the diagonal (where row index equals column index) and 0 elsewhere.
Use a dict comprehension to clean and transform raw API data: strip and lowercase names, cast scores to integers, and include only active users.
Solution
raw_records = [
    {"name": " Alice ", "score": "92", "active": "true"},
    {"name": " Bob ", "score": "67", "active": "false"},
    {"name": " Carol ", "score": "88", "active": "true"},
    {"name": " Dave ", "score": "45", "active": "false"},
]
active_scores = {
    r["name"].strip().lower(): int(r["score"])
    for r in raw_records
    if r["active"] == "true"
}
print(active_scores)
top_user = max(active_scores, key=active_scores.get)
print("Highest:", top_user, "(" + str(active_scores[top_user]) + ")")
Output:
{'alice': 92, 'carol': 88}
Highest: alice (92)
How it works: The dict comprehension performs three operations in a single pass:
- Filter: if r["active"] == "true" excludes inactive users (Bob, Dave)
- Transform keys: r["name"].strip().lower() removes whitespace and normalizes to lowercase
- Transform values: int(r["score"]) casts string scores to integers
This is a common Extract-Transform-Load (ETL) pattern. The equivalent loop version would be 6-8 lines. The comprehension version is a single expression that clearly communicates intent: "build a name-to-score mapping for active users."
Key insight: Dict comprehensions excel at building lookup tables from raw data. When you see yourself writing result = {}; for item in data: result[key] = value, that is a signal to use a dict comprehension instead.
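For comparison, here is the loop that the comprehension replaces, run against the same records. The names loop_result and comp_result are illustrative.

```python
raw_records = [
    {"name": " Alice ", "score": "92", "active": "true"},
    {"name": " Bob ", "score": "67", "active": "false"},
    {"name": " Carol ", "score": "88", "active": "true"},
    {"name": " Dave ", "score": "45", "active": "false"},
]

# Loop version: empty dict, then filter and assign one key at a time
loop_result = {}
for r in raw_records:
    if r["active"] == "true":
        loop_result[r["name"].strip().lower()] = int(r["score"])

# Comprehension version: the same filter and transforms in one expression
comp_result = {
    r["name"].strip().lower(): int(r["score"])
    for r in raw_records
    if r["active"] == "true"
}

print(loop_result == comp_result)  # True
print(comp_result)                 # {'alice': 92, 'carol': 88}
```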
Expected Output
{'alice': 92, 'carol': 88}
Highest: alice (92)
Hints
Hint 1: Chain operations: strip whitespace, lowercase names, cast scores to int, filter by active status.
Hint 2: Use max() with a key argument to find the highest scorer.
Hard
Use the walrus operator (:=) to compute the cube of each number, filter cubes between 5 and 500, and include both the original number and its cube — without computing the cube twice.
Solution
numbers = range(1, 10)
# Without walrus, you would compute x**3 twice:
# [(x, x**3) for x in numbers if 5 < x**3 < 500]
# With walrus, compute once and reuse:
results = [(x, cube) for x in numbers if 5 < (cube := x ** 3) < 500]
print(results)
print(len(results))
Output:
[(2, 8), (3, 27), (4, 64), (5, 125), (6, 216), (7, 343)]
6
How the walrus operator works here:
The expression (cube := x ** 3) does two things simultaneously:
- Computes x ** 3 and assigns the result to the name cube
- Returns the computed value so it can be used in the comparison 5 < ... < 500
Then in the output expression (x, cube), cube already holds the computed value, so there is no need to compute x ** 3 again.
Without the walrus operator, you would either:
- Compute x ** 3 twice: [(x, x**3) for x in numbers if 5 < x**3 < 500]
- Use a nested comprehension trick: [(x, c) for x in numbers for c in [x**3] if 5 < c < 500]
- Fall back to an explicit loop
When this matters: If the expression is an expensive function call (database query, API call, complex computation), avoiding the duplicate call is a real performance win, not just a style preference.
Key insight: The walrus operator (Python 3.8+) is most valuable inside comprehensions where you need the same computed value in both the filter (if clause) and the output expression. It keeps the comprehension form viable in cases that would otherwise require falling back to a loop.
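Counting calls shows the saving directly. In this sketch, slow_cube is a hypothetical stand-in for an expensive function, and call_log is an illustrative way to record how often it runs. It requires Python 3.8 or later.

```python
call_log = []

def slow_cube(x):
    """Stand-in for an expensive computation; records every call."""
    call_log.append(x)
    return x ** 3

numbers = range(1, 10)

# Without walrus: one call in the filter, another for each item that passes
twice = [(x, slow_cube(x)) for x in numbers if 5 < slow_cube(x) < 500]
calls_without_walrus = len(call_log)

# With walrus: exactly one call per input
call_log.clear()
once = [(x, cube) for x in numbers if 5 < (cube := slow_cube(x)) < 500]
calls_with_walrus = len(call_log)

print(twice == once)                            # True
print(calls_without_walrus, calls_with_walrus)  # 15 9
```

The first version makes 9 filter calls plus 6 more for the items that pass; the walrus version makes only the 9 filter calls.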
Expected Output
[(2, 8), (3, 27), (4, 64), (5, 125), (6, 216), (7, 343)]
6
Hints
Hint 1: The walrus operator := assigns a value and returns it in a single expression.
Hint 2: Use it to avoid computing the same expensive expression twice (once for the filter, once for the output).
Build a multi-stage generator pipeline that filters and transforms transaction data lazily. Each stage must be a generator expression — no intermediate lists should be materialized.
Solution
def process_transactions(transactions):
    # Stage 1: Remove refunds (negative amounts)
    non_refunds = (t for t in transactions if t["amount_cents"] > 0)
    # Stage 2: Convert cents to dollars
    with_dollars = (
        {"merchant": t["merchant"], "dollars": t["amount_cents"] / 100}
        for t in non_refunds
    )
    # Stage 3: Keep only transactions over $10
    large_only = (
        (t["merchant"], t["dollars"])
        for t in with_dollars
        if t["dollars"] > 10
    )
    return large_only
# Test data
transactions = [
    {"merchant": "Amazon", "amount_cents": 4999},
    {"merchant": "Refund-Store", "amount_cents": -1500},
    {"merchant": "Coffee Shop", "amount_cents": 450},
    {"merchant": "Grocery", "amount_cents": 8732},
    {"merchant": "Gas Station", "amount_cents": 5100},
    {"merchant": "Refund-Online", "amount_cents": -2000},
    {"merchant": "Restaurant", "amount_cents": 3275},
]
for merchant, amount in process_transactions(transactions):
    print(merchant, amount)
Output:
Amazon 49.99
Grocery 87.32
Gas Station 51.0
Restaurant 32.75
What happens when the for loop pulls the first item:
1. large_only asks with_dollars for the next item
2. with_dollars asks non_refunds for the next item
3. non_refunds pulls from transactions, gets {"merchant": "Amazon", "amount_cents": 4999}, which is positive, so it yields it
4. with_dollars transforms it to {"merchant": "Amazon", "dollars": 49.99} and yields
5. large_only checks 49.99 > 10, which is True, so it yields ("Amazon", 49.99)
6. The for loop prints it
For "Refund-Store" (amount -1500), Stage 1 rejects it. Stage 2 and 3 never see it. For "Coffee Shop" (450 cents), Stages 1 and 2 pass it along as 4.5 dollars, and Stage 3 rejects it because $4.50 is not greater than $10.
Memory analysis: At any moment, only one transaction dict flows through the pipeline. If transactions were a file reader yielding millions of rows, the pipeline would still use O(1) memory — each row enters, gets processed or rejected, and is discarded.
Key insight: Generator pipelines are Python's answer to Unix pipes. Each generator is like a filter in cat data | grep | sed | awk. Data flows through stages on demand, with zero intermediate storage. This is the correct pattern for processing large datasets that do not fit in memory.
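The on-demand behavior can be observed by feeding the pipeline from a source that logs each row it hands out. In this sketch, read_rows is a hypothetical stand-in for a lazy file reader, pulled is an illustrative log, and the rows are a shortened copy of the test data.

```python
pulled = []

def read_rows():
    """Hypothetical lazy source; logs each row as it is handed out."""
    rows = [("Amazon", 4999), ("Refund-Store", -1500),
            ("Coffee Shop", 450), ("Grocery", 8732)]
    for merchant, cents in rows:
        pulled.append(merchant)
        yield {"merchant": merchant, "amount_cents": cents}

# Same three stages as the solution, wired to the logging source
non_refunds = (t for t in read_rows() if t["amount_cents"] > 0)
with_dollars = (
    {"merchant": t["merchant"], "dollars": t["amount_cents"] / 100}
    for t in non_refunds
)
large_only = (
    (t["merchant"], t["dollars"])
    for t in with_dollars
    if t["dollars"] > 10
)

before = list(pulled)        # nothing has been read yet
first = next(large_only)
after_first = list(pulled)   # exactly one row was read
second = next(large_only)
after_second = list(pulled)  # two rejected rows were read on the way

print(before)                # []
print(first, after_first)    # ('Amazon', 49.99) ['Amazon']
print(second, after_second)
```

Building the pipeline reads nothing; each next() pulls only as many source rows as it takes to produce one result.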
def process_transactions(transactions):
    """Build a lazy pipeline that:
    1. Filters out refunds (amount < 0)
    2. Converts amounts from cents to dollars
    3. Keeps only transactions over $10
    4. Returns (merchant, dollar_amount) tuples
    Each stage must be a generator; no intermediate lists.
    """
    # TODO: Implement 3-stage generator pipeline
    pass
# Test data
transactions = [
    {"merchant": "Amazon", "amount_cents": 4999},
    {"merchant": "Refund-Store", "amount_cents": -1500},
    {"merchant": "Coffee Shop", "amount_cents": 450},
    {"merchant": "Grocery", "amount_cents": 8732},
    {"merchant": "Gas Station", "amount_cents": 5100},
    {"merchant": "Refund-Online", "amount_cents": -2000},
    {"merchant": "Restaurant", "amount_cents": 3275},
]
for merchant, amount in process_transactions(transactions):
    print(merchant, amount)
Expected Output
Amazon 49.99
Grocery 87.32
Gas Station 51.0
Restaurant 32.75
Hints
Hint 1: Each stage should be a generator expression that feeds into the next.
Hint 2: Stage 1 filters negatives, Stage 2 transforms cents to dollars, Stage 3 filters by threshold.
Hint 3: No data flows until the final consumer (the for loop) pulls from the pipeline.
Refactor three verbose loops into Pythonic comprehensions. The function processes server log entries and extracts error messages, unique sources, and per-source error counts.
Solution
from collections import Counter
def analyze_logs(logs):
    error_messages = [
        log["message"].upper()
        for log in logs
        if log["level"] == "ERROR"
    ]
    unique_sources = {log["source"] for log in logs}
    error_counts = dict(Counter(
        log["source"]
        for log in logs
        if log["level"] == "ERROR"
    ))
    return {
        "error_messages": error_messages,
        "unique_sources": unique_sources,
        "error_counts": error_counts,
    }
# Test data
logs = [
    {"level": "INFO", "source": "web", "message": "Request received"},
    {"level": "ERROR", "source": "storage", "message": "Disk full"},
    {"level": "WARN", "source": "network", "message": "High latency"},
    {"level": "ERROR", "source": "network", "message": "Connection timeout"},
    {"level": "INFO", "source": "web", "message": "Response sent"},
    {"level": "ERROR", "source": "storage", "message": "Disk failure"},
]
result = analyze_logs(logs)
print("Errors:", result["error_messages"])
print("Sources:", len(result["unique_sources"]))
print("Counts:", result["error_counts"])
Output:
Errors: ['DISK FULL', 'CONNECTION TIMEOUT', 'DISK FAILURE']
Sources: 3
Counts: {'storage': 2, 'network': 1}
Refactoring breakdown:
1. List comprehension (filter + transform):
# Before: 4 lines with manual append
error_messages = []
for log in logs:
    if log["level"] == "ERROR":
        error_messages.append(log["message"].upper())
# After: 1 expression — filter with trailing if, transform with .upper()
error_messages = [log["message"].upper() for log in logs if log["level"] == "ERROR"]
2. Set comprehension (deduplication):
# Before: 3 lines with manual add
unique_sources = set()
for log in logs:
    unique_sources.add(log["source"])
# After: 1 expression — set automatically deduplicates
unique_sources = {log["source"] for log in logs}
3. Counter with generator expression (aggregation):
# Before: 7 lines with manual counting
error_counts = {}
for log in logs:
    if log["level"] == "ERROR":
        src = log["source"]
        if src not in error_counts:
            error_counts[src] = 0
        error_counts[src] += 1
# After: Counter + generator expression — one pass, automatic counting
error_counts = dict(Counter(log["source"] for log in logs if log["level"] == "ERROR"))
Performance notes: The list and set comprehensions use the LIST_APPEND and SET_ADD bytecodes instead of an attribute lookup plus a method call, which often gives a speedup of roughly 25-35% on the iteration itself, depending on the Python version and the work done per item. Counter accepts a generator expression, so no intermediate list is created for the counting step.
Key insight: The three tools used here (a list comprehension, a set comprehension, and a generator expression fed to Counter) cover the vast majority of data transformation patterns. When you see a loop that builds a collection with append, add, or key assignment, it is almost always a refactoring candidate for a comprehension.
def analyze_logs_slow(logs):
    """Slow version using loops. Refactor to comprehensions.
    Given log entries, return a dict with:
    - 'error_messages': list of messages from ERROR logs (uppercase)
    - 'unique_sources': set of all unique source names
    - 'error_counts': dict mapping each source to its error count
    """
    # TODO: Refactor these three loops into comprehensions
    error_messages = []
    for log in logs:
        if log["level"] == "ERROR":
            error_messages.append(log["message"].upper())
    unique_sources = set()
    for log in logs:
        unique_sources.add(log["source"])
    error_counts = {}
    for log in logs:
        if log["level"] == "ERROR":
            src = log["source"]
            if src not in error_counts:
                error_counts[src] = 0
            error_counts[src] += 1
    return {
        "error_messages": error_messages,
        "unique_sources": unique_sources,
        "error_counts": error_counts,
    }
Expected Output
Errors: ['DISK FULL', 'CONNECTION TIMEOUT', 'DISK FAILURE']
Sources: 3
Counts: {'storage': 2, 'network': 1}
Hints
Hint 1: error_messages can be a list comprehension with a filter.
Hint 2: unique_sources can be a set comprehension.
Hint 3: error_counts needs collections.Counter or a dict comprehension over grouped data.
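Hint 3 mentions two routes. As a rough sketch, here is one possible dict-comprehension version next to the Counter version, using the same log entries. The names error_sources, by_comprehension, and by_counter are illustrative. The comprehension rescans the list once per distinct source, so Counter is usually the better choice for large inputs.

```python
from collections import Counter

logs = [
    {"level": "INFO", "source": "web", "message": "Request received"},
    {"level": "ERROR", "source": "storage", "message": "Disk full"},
    {"level": "WARN", "source": "network", "message": "High latency"},
    {"level": "ERROR", "source": "network", "message": "Connection timeout"},
    {"level": "INFO", "source": "web", "message": "Response sent"},
    {"level": "ERROR", "source": "storage", "message": "Disk failure"},
]

# Sources of ERROR entries only, in order of appearance
error_sources = [log["source"] for log in logs if log["level"] == "ERROR"]

# Dict comprehension: count each distinct source with list.count
by_comprehension = {src: error_sources.count(src) for src in set(error_sources)}

# Counter: a single pass over the same data
by_counter = dict(Counter(error_sources))

print(by_comprehension == by_counter)  # True
print(by_counter)                      # {'storage': 2, 'network': 1}
```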
