Python Sets Practice Problems & Exercises
Practice: Sets and Mathematical Operations
← Back to lessonGiven a list of items (which may contain duplicates), return the count of unique elements.
Constraints:
- Items can be any hashable type (int, str, tuple)
- The list can be empty
- Do NOT use manual loops for counting -- use set conversion
Solution
def count_unique(items: list) -> int:
"""Return the number of unique elements in the list."""
return len(set(items))
# Tests
print(count_unique([1, 2, 2, 3, 3, 3])) # 3
print(count_unique(["a", "b", "a", "c", "b"])) # 3
print(count_unique([])) # 0
print(count_unique([42])) # 1
Complexity: O(n) time to build the set, O(n) space for the set. The set() constructor iterates through the list once, inserting each element into the hash table. Duplicates are silently ignored because the hash slot is already occupied by an equal element.
def count_unique(items: list) -> int:
"""Return the number of unique elements in the list."""
# TODO: implement using a set
passExpected Output
count_unique([1, 2, 2, 3, 3, 3]) => 3
count_unique(["a", "b", "a", "c", "b"]) => 3
count_unique([]) => 0
count_unique([42]) => 1Hints
Hint 1: Converting a list to a set removes all duplicates automatically.
Hint 2: The len() of a set gives the count of unique elements.
Given two sets a and b, return a dictionary containing the results of all four fundamental set operations. The 'difference' key should hold a - b (elements in a but not in b).
Constraints:
- Use operator syntax (
|,&,-,^) - Return a dict with exactly these four keys
Solution
def set_operations(a: set, b: set) -> dict:
"""Return a dict with keys 'union', 'intersection',
'difference', 'symmetric_difference' mapped to the
corresponding result sets."""
return {
"union": a | b,
"intersection": a & b,
"difference": a - b,
"symmetric_difference": a ^ b,
}
# Test
result = set_operations({1, 2, 3, 4}, {3, 4, 5, 6})
for key, val in result.items():
print(f"{key}: {val}")
Complexity: Union and symmetric difference are O(n+m). Intersection is O(min(n,m)). Difference is O(n). All operations create new sets rather than modifying the originals.
def set_operations(a: set, b: set) -> dict:
"""Return a dict with keys 'union', 'intersection',
'difference', 'symmetric_difference' mapped to the
corresponding result sets."""
# TODO: compute all four operations
passExpected Output
a={1,2,3,4}, b={3,4,5,6}
union: {1, 2, 3, 4, 5, 6}
intersection: {3, 4}
difference: {1, 2}
symmetric_difference: {1, 2, 5, 6}Hints
Hint 1: Union uses | operator or .union() method.
Hint 2: Intersection uses & operator. Difference uses - operator. Symmetric difference uses ^ operator.
Given a list with possible duplicates, return a new list containing only the first occurrence of each element, in the original order.
Constraints:
- Must be O(n) time -- do NOT use
list.index()or nested loops - Must preserve the order of first occurrence
- The key insight: sets are unordered, but you can combine a set (for O(1) membership) with a list (for order)
Solution
def unique_ordered(items: list) -> list:
"""Remove duplicates while preserving first-occurrence order."""
seen = set()
result = []
for item in items:
if item not in seen:
seen.add(item)
result.append(item)
return result
# Tests
print(unique_ordered([3, 1, 4, 1, 5, 9, 2, 6, 5]))
# [3, 1, 4, 5, 9, 2, 6]
print(unique_ordered(["b", "a", "b", "c", "a"]))
# ["b", "a", "c"]
print(unique_ordered([]))
# []
Complexity: O(n) time -- each element is checked against the set (O(1)) and potentially added to both the set and list (O(1) amortized each). O(n) space for the seen set and result list. This is the canonical pattern for order-preserving deduplication.
def unique_ordered(items: list) -> list:
"""Remove duplicates while preserving first-occurrence order."""
# TODO: use a set for O(1) lookups combined with a list for order
passExpected Output
unique_ordered([3, 1, 4, 1, 5, 9, 2, 6, 5]) => [3, 1, 4, 5, 9, 2, 6]
unique_ordered(["b", "a", "b", "c", "a"]) => ["b", "a", "c"]
unique_ordered([]) => []Hints
Hint 1: Maintain a 'seen' set and a result list. For each item, only append to result if it is not in 'seen'.
Hint 2: After appending, add the item to the seen set so future duplicates are skipped in O(1).
Model a simple social network analysis. Given the friend sets of two people, compute their mutual friends, exclusive friends, the combined social circle, and whether they share any connections. Complexity: Each operation is O(n+m) or better. The entire function runs in O(n+m) time. This pattern directly maps to real-world problems like finding common customers, shared dependencies, or overlapping feature sets.Solution
def analyze_friends(alice_friends: set, bob_friends: set) -> dict:
"""Analyze social connections between Alice and Bob.
Return a dict with keys:
'mutual' - friends of both
'alice_only' - friends exclusive to Alice
'bob_only' - friends exclusive to Bob
'all_friends' - everyone in either circle
'are_connected' - True if they share at least one friend
"""
# TODO: implement using set operations
passExpected Output
alice={"Carol","Dave","Eve","Frank"}
bob={"Dave","Eve","Grace","Heidi"}
mutual: {'Dave', 'Eve'}
alice_only: {'Carol', 'Frank'}
bob_only: {'Grace', 'Heidi'}
all_friends: {'Carol', 'Dave', 'Eve', 'Frank', 'Grace', 'Heidi'}
are_connected: TrueHints
Hint 1: Mutual friends = intersection (&). Exclusive friends = difference (-). All friends = union (|).
Hint 2: Two people are connected if their friend sets are not disjoint -- use len(intersection) > 0 or not isdisjoint().
Given a list of email addresses (with possibly inconsistent casing), extract the unique domains in lowercase using a single set comprehension.
Constraints:
- Return a set (not a list)
- Domains must be lowercase
- Use a set comprehension -- not a loop with
.add() - Assume all emails are valid (contain exactly one
@)
Solution
def extract_email_domains(emails: list[str]) -> set:
"""Extract unique lowercase domains from a list of email addresses."""
return {email.split("@")[1].lower() for email in emails}
# Test
print(extract_email_domains(emails))
# {'gmail.com', 'yahoo.com'}
Complexity: O(n) where n is the number of emails. The set comprehension iterates once, and each .split() and .lower() is O(k) where k is the string length. Deduplication happens automatically during set construction.
def extract_email_domains(emails: list[str]) -> set:
"""Extract unique lowercase domains from a list of email addresses.
Example: ["[email protected]", "[email protected]"] => {"gmail.com"}
"""
# TODO: use a set comprehension with .split("@") and .lower()
passExpected Output
["[email protected]", "[email protected]", "[email protected]", "[email protected]"]
=> {'gmail.com', 'yahoo.com'}Hints
Hint 1: Split each email on '@' and take the second part (index [1]).
Hint 2: Use .lower() on the domain part. A set comprehension automatically deduplicates.
You are building a deployment safety checker. Given the old and new feature configuration sets, compute what changed and determine if the deployment is safe (no features were removed).
Constraints:
'changed'should equal the symmetric difference (all features that were either added or removed)'is_safe'isTrueonly when nothing was removed
Solution
def config_diff(old_config: set, new_config: set) -> dict:
removed = old_config - new_config
return {
"added": new_config - old_config,
"removed": removed,
"unchanged": old_config & new_config,
"changed": old_config ^ new_config,
"is_safe": len(removed) == 0,
}
# Test
old = {"auth", "logging", "cache", "rate_limit"}
new = {"auth", "logging", "metrics", "tracing"}
result = config_diff(old, new)
for key, val in result.items():
print(f"{key}: {val}")
Complexity: All operations are O(n+m). This is the exact pattern used in infrastructure tools for computing config deltas, dependency diffs, and feature flag audits.
def config_diff(old_config: set, new_config: set) -> dict:
"""Compare two configuration sets and report changes.
Return a dict with:
'added' - features in new but not old
'removed' - features in old but not new
'unchanged' - features in both
'changed' - features that were either added or removed
'is_safe' - True if nothing was removed (additive change only)
"""
# TODO: implement using set operations
passExpected Output
old={"auth","logging","cache","rate_limit"}
new={"auth","logging","metrics","tracing"}
added: {'metrics', 'tracing'}
removed: {'cache', 'rate_limit'}
unchanged: {'auth', 'logging'}
changed: {'cache', 'metrics', 'rate_limit', 'tracing'}
is_safe: FalseHints
Hint 1: Added = new - old. Removed = old - new. Unchanged = old & new.
Hint 2: Changed is the symmetric difference (old ^ new). A deploy is safe if the removed set is empty.
Demonstrate how frozenset enables using sets as dictionary keys. Build a cache where the key is a set of features (order-independent) and the value is a configuration name.
Constraints:
- The cache must use
frozensetkeys so that{"auth", "logging"}and{"logging", "auth"}map to the same entry - Return
Noneif the query features are not in the cache
Solution
def cached_query(feature_sets_and_results: list[tuple[set, str]],
query_features: set) -> str | None:
cache = {}
for feature_set, result in feature_sets_and_results:
cache[frozenset(feature_set)] = result
return cache.get(frozenset(query_features))
# Tests
data = [
({"auth", "logging"}, "config_A"),
({"auth", "metrics"}, "config_B"),
]
print(cached_query(data, {"logging", "auth"})) # "config_A"
print(cached_query(data, {"auth", "metrics"})) # "config_B"
print(cached_query(data, {"auth", "tracing"})) # None
Key insight: frozenset is the immutable, hashable counterpart of set. Because frozenset({"a", "b"}) == frozenset({"b", "a"}) and they produce the same hash, order of elements does not matter for cache lookups. This is why frozenset exists -- it allows set-like data to participate in hash-based structures (dict keys, set elements).
def cached_query(feature_sets_and_results: list[tuple[set, str]],
query_features: set) -> str | None:
"""Build a cache from (feature_set, result) pairs using frozenset keys,
then look up the result for query_features.
Return the cached result string, or None if not found.
"""
# TODO: build a dict with frozenset keys, then look up query_features
passExpected Output
cache data:
({"auth","logging"}, "config_A")
({"auth","metrics"}, "config_B")
query {"logging","auth"} => "config_A" (order doesn't matter)
query {"auth","metrics"} => "config_B"
query {"auth","tracing"} => NoneHints
Hint 1: Convert each set to frozenset() before using it as a dict key.
Hint 2: When looking up, also convert the query set to frozenset(). frozenset({'a','b'}) == frozenset({'b','a'}).
Implement an authorization checker using subset/superset relationships and set operations. A user is granted access only if they hold ALL required permissions AND NONE of the forbidden permissions. Complexity: Subset check is O(len(required)). Solution
isdisjoint is O(min(len(user), len(forbidden))) and short-circuits on the first common element. All difference/intersection operations are O(n). This maps directly to role-based access control (RBAC) systems.
def check_permissions(
user_perms: set,
required_perms: set,
forbidden_perms: set,
) -> dict:
"""Check if a user has the right permissions for an action.
Return a dict with:
'has_access' - True if user has ALL required perms
'is_forbidden' - True if user has ANY forbidden perm
'missing' - set of required perms the user lacks
'violations' - set of forbidden perms the user holds
'granted' - True if has_access and not is_forbidden
"""
# TODO: use subset/superset checks and set operations
passExpected Output
user={"read","write","delete"}
required={"read","write","admin"}
forbidden={"root","sudo"}
has_access: False
is_forbidden: False
missing: {'admin'}
violations: set()
granted: FalseHints
Hint 1: has_access is True when required_perms is a subset of user_perms (required <= user).
Hint 2: is_forbidden is True when user_perms and forbidden_perms are NOT disjoint. Missing = required - user. Violations = user & forbidden.
Given a list of sets, find elements common to ALL sets. This generalizes the two-set intersection to N sets.
Constraints:
- Handle the empty list case (return empty set)
- Handle sets with no overlap (return empty set)
- Your solution should work for any number of sets
Solution
def common_to_all(sets: list[set]) -> set:
"""Return elements that appear in ALL of the given sets."""
if not sets:
return set()
result = sets[0].copy()
for s in sets[1:]:
result &= s
return result
# Alternative using set.intersection with unpacking:
def common_to_all_v2(sets: list[set]) -> set:
if not sets:
return set()
return sets[0].intersection(*sets[1:])
# Tests
print(common_to_all([{1, 2, 3}, {2, 3, 4}, {3, 4, 5}])) # {3}
print(common_to_all([{1, 2}, {1, 2, 3}, {1, 2, 3, 4}])) # {1, 2}
print(common_to_all([{"a", "b"}, {"c", "d"}])) # set()
print(common_to_all([])) # set()
Complexity: O(n * k) where n is the number of sets and k is the size of the smallest set. Each intersection reduces the working set, so later intersections become cheaper. The set.intersection(*others) approach lets CPython optimize the multi-set intersection internally.
def common_to_all(sets: list[set]) -> set:
"""Return elements that appear in ALL of the given sets.
If the list is empty, return an empty set.
"""
# TODO: find the intersection across an arbitrary number of sets
passExpected Output
common_to_all([{1,2,3},{2,3,4},{3,4,5}]) => {3}
common_to_all([{1,2},{1,2,3},{1,2,3,4}]) => {1, 2}
common_to_all([{"a","b"},{"c","d"}]) => set()
common_to_all([]) => set()Hints
Hint 1: You can chain intersections: start with the first set and intersect with each subsequent set.
Hint 2: Use set.intersection(*others) which accepts multiple arguments, or use functools.reduce with the & operator.
Build a dependency conflict detector for a package manager. Two packages "conflict" if they share some dependencies but each also pulls in unique ones -- this can cause version mismatches in practice.
Constraints:
- Only report pairs where the intersection is non-empty AND both differences are non-empty
- Return results sorted by pair names (alphabetically)
- Use
itertools.combinationsfor generating pairs
Solution
from itertools import combinations
def find_conflicts(packages: dict[str, set[str]]) -> list[dict]:
conflicts = []
for (name_a, deps_a), (name_b, deps_b) in combinations(
packages.items(), 2
):
shared = deps_a & deps_b
exclusive_a = deps_a - deps_b
exclusive_b = deps_b - deps_a
if shared and exclusive_a and exclusive_b:
pair = tuple(sorted([name_a, name_b]))
conflicts.append({
"pair": pair,
"shared": shared,
"exclusive_a": exclusive_a if pair[0] == name_a else exclusive_b,
"exclusive_b": exclusive_b if pair[1] == name_b else exclusive_a,
})
return sorted(conflicts, key=lambda c: c["pair"])
# Test
packages = {
"web": {"requests", "flask", "jinja2"},
"api": {"requests", "fastapi", "pydantic"},
"cli": {"click", "rich"},
}
for conflict in find_conflicts(packages):
a, b = conflict["pair"]
print(f"CONFLICT {a} <-> {b}")
print(f" shared: {conflict['shared']}")
print(f" {a}-only: {conflict['exclusive_a']}")
print(f" {b}-only: {conflict['exclusive_b']}")
Complexity: O(P^2 * D) where P is the number of packages and D is the average dependency set size. Each pair comparison involves intersection and two differences, all O(D). For a package manager with hundreds of packages, this is practical. For thousands, you would index by dependency to avoid the quadratic pair enumeration.
def find_conflicts(
packages: dict[str, set[str]],
) -> list[dict]:
"""Detect dependency conflicts between packages.
'packages' maps package names to their dependency sets.
A conflict exists between two packages if they share at least
one dependency AND each has dependencies the other doesn't.
Return a list of conflict dicts, each with:
'pair' - tuple of the two package names
'shared' - set of common dependencies
'exclusive_a' - deps only in the first package
'exclusive_b' - deps only in the second package
"""
# TODO: compare every pair of packages using set operations
passExpected Output
packages:
"web": {"requests","flask","jinja2"}
"api": {"requests","fastapi","pydantic"}
"cli": {"click","rich"}
conflicts:
pair=('web','api') shared={'requests'}
web-only: {'flask', 'jinja2'}
api-only: {'fastapi', 'pydantic'}
(web,cli) and (api,cli) have no shared deps => no conflictHints
Hint 1: Use itertools.combinations to get all unique pairs of packages.
Hint 2: For each pair, compute intersection. If intersection is non-empty AND each side has exclusive deps (difference is non-empty), it is a conflict.
Implement BFS-based graph reachability. The visited set is the key data structure -- it provides O(1) checks to avoid revisiting nodes, which prevents infinite loops in cyclic graphs and keeps the algorithm O(V+E).
Constraints:
- Use a set for visited node tracking (not a list)
- Handle nodes with no outgoing edges (empty neighbor lists)
- Handle nodes not present as keys in the graph (they have no neighbors)
Solution
from collections import deque
def find_reachable(graph: dict[str, list[str]], start: str) -> set[str]:
"""Return all nodes reachable from 'start' using BFS."""
visited = {start}
queue = deque([start])
while queue:
node = queue.popleft()
for neighbor in graph.get(node, []):
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
return visited
def are_connected(
graph: dict[str, list[str]], node_a: str, node_b: str
) -> bool:
"""Return True if there is a path from node_a to node_b."""
return node_b in find_reachable(graph, node_a)
# Test
graph = {
"A": ["B", "C"],
"B": ["D"],
"C": ["D", "E"],
"D": [],
"E": ["F"],
"F": [],
"G": ["H"],
"H": [],
}
print(f"reachable from A: {find_reachable(graph, 'A')}")
print(f"reachable from G: {find_reachable(graph, 'G')}")
print(f"are_connected(A, F): {are_connected(graph, 'A', 'F')}")
print(f"are_connected(A, G): {are_connected(graph, 'A', 'G')}")
Complexity: O(V+E) time where V is vertices and E is edges. The visited set ensures each node is processed exactly once. Without the set (using a list for visited checks), the algorithm would be O(V*(V+E)) because node in visited_list is O(V). This is the textbook example of why sets matter in graph algorithms.
def find_reachable(
graph: dict[str, list[str]],
start: str,
) -> set[str]:
"""Return all nodes reachable from 'start' using BFS.
The result should include the start node itself.
The graph is represented as an adjacency list (dict of lists).
"""
# TODO: implement BFS using a set to track visited nodes
pass
def are_connected(
graph: dict[str, list[str]],
node_a: str,
node_b: str,
) -> bool:
"""Return True if there is a path from node_a to node_b."""
# TODO: use find_reachable
passExpected Output
graph:
A -> [B, C]
B -> [D]
C -> [D, E]
D -> []
E -> [F]
F -> []
G -> [H]
H -> []
reachable from A: {'A', 'B', 'C', 'D', 'E', 'F'}
reachable from G: {'G', 'H'}
are_connected(A, F): True
are_connected(A, G): FalseHints
Hint 1: Use a collections.deque for the BFS queue and a set for visited nodes.
Hint 2: For each node you dequeue, add its unvisited neighbors to both the queue and the visited set.
Hint 3: are_connected simply checks if node_b is in the reachable set from node_a.
Simulate the log analysis design challenge from the lesson. Given two lists of nginx-format log lines, extract IP addresses and compute all set operations to answer: who returned, who left, who is new, and how many total unique visitors are there?
Design considerations:
- In production, you would stream the file line by line (not load all lines into a list) to handle multi-GB files
- The IP sets themselves fit in memory because unique IPs are far fewer than total log lines
- All set operations are O(n+m), making this approach feasible for 10M+ line files
Solution
def analyze_ip_logs(
log_a_lines: list[str],
log_b_lines: list[str],
) -> dict:
# Extract unique IPs from each log
ips_a = {line.split()[0] for line in log_a_lines if line.strip()}
ips_b = {line.split()[0] for line in log_b_lines if line.strip()}
returning = ips_a & ips_b
only_a = ips_a - ips_b
only_b = ips_b - ips_a
total = ips_a | ips_b
return {
"unique_a": len(ips_a),
"unique_b": len(ips_b),
"returning": returning,
"only_a": only_a,
"only_b": only_b,
"total_unique": len(total),
}
# Test with sample nginx-format log lines
log_a = [
'192.168.1.1 - - [01/Jan:00:00] "GET / HTTP/1.1" 200 1234',
'10.0.0.1 - - [01/Jan:00:01] "GET /about HTTP/1.1" 200 5678',
'192.168.1.1 - - [01/Jan:00:02] "GET /contact HTTP/1.1" 200 910',
'172.16.0.1 - - [01/Jan:00:03] "POST /api HTTP/1.1" 201 100',
]
log_b = [
'10.0.0.1 - - [01/Feb:00:00] "GET / HTTP/1.1" 200 1234',
'172.16.0.1 - - [01/Feb:00:01] "GET /about HTTP/1.1" 200 5678',
'8.8.8.8 - - [01/Feb:00:02] "GET / HTTP/1.1" 200 910',
'10.0.0.1 - - [01/Feb:00:03] "GET /api HTTP/1.1" 200 100',
]
result = analyze_ip_logs(log_a, log_b)
for k, v in result.items():
print(f"{k}: {v}")
Complexity: Building each IP set is O(n) where n is the number of log lines. Intersection, difference, and union are each O(n+m) where n and m are the unique IP counts. Total: O(lines_a + lines_b) time. For 10M-line logs with perhaps 500K unique IPs each, this completes in seconds. The set comprehension {line.split()[0] for line in lines} handles both extraction and deduplication in a single pass.
Production adaptation: Replace the list input with a file-streaming generator. Use split(' ', 1)[0] instead of split()[0] to avoid splitting the entire line. Add a 1MB read buffer. The memory usage is dominated by the IP sets (roughly 50-500MB for millions of unique IPs), not the log lines.
def analyze_ip_logs(
log_a_lines: list[str],
log_b_lines: list[str],
) -> dict:
"""Analyze two sets of log lines (nginx format: IP is first token).
Return a dict with:
'unique_a' - count of unique IPs in log A
'unique_b' - count of unique IPs in log B
'returning' - set of IPs in both logs
'only_a' - set of IPs only in log A
'only_b' - set of IPs only in log B
'total_unique' - count of unique IPs across both logs
"""
# TODO: extract IP sets from each log, then compute all operations
passExpected Output
Log A: 3 unique IPs (192.168.1.1, 10.0.0.1, 172.16.0.1)
Log B: 3 unique IPs (10.0.0.1, 172.16.0.1, 8.8.8.8)
unique_a: 3
unique_b: 3
returning: {'10.0.0.1', '172.16.0.1'}
only_a: {'192.168.1.1'}
only_b: {'8.8.8.8'}
total_unique: 4Hints
Hint 1: Extract the IP from each line with line.split()[0]. Build a set from each log for automatic deduplication.
Hint 2: Use intersection for returning IPs, difference for exclusive IPs, and union for total unique count.
Hint 3: Think about why this approach works for 10M-line logs: building the set is O(n) and all set operations are O(n+m).
