Python Sets Practice Problems & Exercises

Practice: Sets and Mathematical Operations

12 problems4 Easy5 Medium3 Hard⏱ 45-60 min

← Back to lesson

#1Unique Element CounterEasy

set basicsdeduplication

Given a list of items (which may contain duplicates), return the count of unique elements.

Constraints:

Items can be any hashable type (int, str, tuple)
The list can be empty
Do NOT use manual loops for counting -- use set conversion

Solution

def count_unique(items: list) -> int:
    """Return the number of unique elements in the list."""
    return len(set(items))

# Tests
print(count_unique([1, 2, 2, 3, 3, 3]))        # 3
print(count_unique(["a", "b", "a", "c", "b"]))  # 3
print(count_unique([]))                          # 0
print(count_unique([42]))                        # 1

Complexity: O(n) time to build the set, O(n) space for the set. The set() constructor iterates through the list once, inserting each element into the hash table. Duplicates are silently ignored because the hash slot is already occupied by an equal element.

Starter Code

def count_unique(items: list) -> int:
  """Return the number of unique elements in the list."""
  # TODO: implement using a set
  pass

Expected Output

count_unique([1, 2, 2, 3, 3, 3]) => 3
count_unique(["a", "b", "a", "c", "b"]) => 3
count_unique([]) => 0
count_unique([42]) => 1

Hints

Hint 1: Converting a list to a set removes all duplicates automatically.

Hint 2: The len() of a set gives the count of unique elements.

#2Set Operation ClassifierEasy

unionintersectiondifferencesymmetric difference

Given two sets a and b, return a dictionary containing the results of all four fundamental set operations. The 'difference' key should hold a - b (elements in a but not in b).

Constraints:

Use operator syntax (|, &, -, ^)
Return a dict with exactly these four keys

Solution

def set_operations(a: set, b: set) -> dict:
    """Return a dict with keys 'union', 'intersection',
    'difference', 'symmetric_difference' mapped to the
    corresponding result sets."""
    return {
        "union": a | b,
        "intersection": a & b,
        "difference": a - b,
        "symmetric_difference": a ^ b,
    }

# Test
result = set_operations({1, 2, 3, 4}, {3, 4, 5, 6})
for key, val in result.items():
    print(f"{key}: {val}")

Complexity: Union and symmetric difference are O(n+m). Intersection is O(min(n,m)). Difference is O(n). All operations create new sets rather than modifying the originals.

Starter Code

def set_operations(a: set, b: set) -> dict:
  """Return a dict with keys 'union', 'intersection',
  'difference', 'symmetric_difference' mapped to the
  corresponding result sets."""
  # TODO: compute all four operations
  pass

Expected Output

a={1,2,3,4}, b={3,4,5,6}
union: {1, 2, 3, 4, 5, 6}
intersection: {3, 4}
difference: {1, 2}
symmetric_difference: {1, 2, 5, 6}

Hints

Hint 1: Union uses | operator or .union() method.

Hint 2: Intersection uses & operator. Difference uses - operator. Symmetric difference uses ^ operator.

#3Order-Preserving DeduplicationEasy

deduplicationinsertion orderset + list pattern

Given a list with possible duplicates, return a new list containing only the first occurrence of each element, in the original order.

Constraints:

Must be O(n) time -- do NOT use list.index() or nested loops
Must preserve the order of first occurrence
The key insight: sets are unordered, but you can combine a set (for O(1) membership) with a list (for order)

Solution

def unique_ordered(items: list) -> list:
    """Remove duplicates while preserving first-occurrence order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

# Tests
print(unique_ordered([3, 1, 4, 1, 5, 9, 2, 6, 5]))
# [3, 1, 4, 5, 9, 2, 6]
print(unique_ordered(["b", "a", "b", "c", "a"]))
# ["b", "a", "c"]
print(unique_ordered([]))
# []

Complexity: O(n) time -- each element is checked against the set (O(1)) and potentially added to both the set and list (O(1) amortized each). O(n) space for the seen set and result list. This is the canonical pattern for order-preserving deduplication.

Starter Code

def unique_ordered(items: list) -> list:
  """Remove duplicates while preserving first-occurrence order."""
  # TODO: use a set for O(1) lookups combined with a list for order
  pass

Expected Output

unique_ordered([3, 1, 4, 1, 5, 9, 2, 6, 5]) => [3, 1, 4, 5, 9, 2, 6]
unique_ordered(["b", "a", "b", "c", "a"]) => ["b", "a", "c"]
unique_ordered([]) => []

Hints

Hint 1: Maintain a 'seen' set and a result list. For each item, only append to result if it is not in 'seen'.

Hint 2: After appending, add the item to the seen set so future duplicates are skipped in O(1).

#4Common and Exclusive FriendsEasy

intersectiondifferencereal-world modeling

Model a simple social network analysis. Given the friend sets of two people, compute their mutual friends, exclusive friends, the combined social circle, and whether they share any connections.

Solution

def analyze_friends(alice_friends: set, bob_friends: set) -> dict:
    mutual = alice_friends & bob_friends
    return {
        "mutual": mutual,
        "alice_only": alice_friends - bob_friends,
        "bob_only": bob_friends - alice_friends,
        "all_friends": alice_friends | bob_friends,
        "are_connected": len(mutual) > 0,
    }

# Test
alice = {"Carol", "Dave", "Eve", "Frank"}
bob = {"Dave", "Eve", "Grace", "Heidi"}
result = analyze_friends(alice, bob)
for key, val in result.items():
    print(f"{key}: {val}")

Complexity: Each operation is O(n+m) or better. The entire function runs in O(n+m) time. This pattern directly maps to real-world problems like finding common customers, shared dependencies, or overlapping feature sets.

Starter Code

def analyze_friends(alice_friends: set, bob_friends: set) -> dict:
  """Analyze social connections between Alice and Bob.

  Return a dict with keys:
    'mutual'       - friends of both
    'alice_only'   - friends exclusive to Alice
    'bob_only'     - friends exclusive to Bob
    'all_friends'  - everyone in either circle
    'are_connected' - True if they share at least one friend
  """
  # TODO: implement using set operations
  pass

Expected Output

alice={"Carol","Dave","Eve","Frank"}
bob={"Dave","Eve","Grace","Heidi"}
mutual: {'Dave', 'Eve'}
alice_only: {'Carol', 'Frank'}
bob_only: {'Grace', 'Heidi'}
all_friends: {'Carol', 'Dave', 'Eve', 'Frank', 'Grace', 'Heidi'}
are_connected: True

Hints

Hint 1: Mutual friends = intersection (&). Exclusive friends = difference (-). All friends = union (|).

Hint 2: Two people are connected if their friend sets are not disjoint -- use len(intersection) > 0 or not isdisjoint().

#5Set Comprehension FilterMedium

set comprehensionfilteringstring processing

Given a list of email addresses (with possibly inconsistent casing), extract the unique domains in lowercase using a single set comprehension.

Constraints:

Return a set (not a list)
Domains must be lowercase
Use a set comprehension -- not a loop with .add()
Assume all emails are valid (contain exactly one @)

Solution

def extract_email_domains(emails: list[str]) -> set:
    """Extract unique lowercase domains from a list of email addresses."""
    return {email.split("@")[1].lower() for email in emails}

# Test
emails = ["[email protected]", "[email protected]", "[email protected]", "[email protected]"]
print(extract_email_domains(emails))
# {'gmail.com', 'yahoo.com'}

Complexity: O(n) where n is the number of emails. The set comprehension iterates once, and each .split() and .lower() is O(k) where k is the string length. Deduplication happens automatically during set construction.

Starter Code

def extract_email_domains(emails: list[str]) -> set:
  """Extract unique lowercase domains from a list of email addresses.

  Example: ["[email protected]", "[email protected]"] => {"gmail.com"}
  """
  # TODO: use a set comprehension with .split("@") and .lower()
  pass

Expected Output

["[email protected]", "[email protected]", "[email protected]", "[email protected]"]
=> {'gmail.com', 'yahoo.com'}

Hints

Hint 1: Split each email on '@' and take the second part (index [1]).

Hint 2: Use .lower() on the domain part. A set comprehension automatically deduplicates.

#6Config Diff ReporterMedium

differencesymmetric differencereal-world pattern

You are building a deployment safety checker. Given the old and new feature configuration sets, compute what changed and determine if the deployment is safe (no features were removed).

Constraints:

'changed' should equal the symmetric difference (all features that were either added or removed)
'is_safe' is True only when nothing was removed

Solution

def config_diff(old_config: set, new_config: set) -> dict:
    removed = old_config - new_config
    return {
        "added": new_config - old_config,
        "removed": removed,
        "unchanged": old_config & new_config,
        "changed": old_config ^ new_config,
        "is_safe": len(removed) == 0,
    }

# Test
old = {"auth", "logging", "cache", "rate_limit"}
new = {"auth", "logging", "metrics", "tracing"}
result = config_diff(old, new)
for key, val in result.items():
    print(f"{key}: {val}")

Complexity: All operations are O(n+m). This is the exact pattern used in infrastructure tools for computing config deltas, dependency diffs, and feature flag audits.

Starter Code

def config_diff(old_config: set, new_config: set) -> dict:
  """Compare two configuration sets and report changes.

  Return a dict with:
    'added'     - features in new but not old
    'removed'   - features in old but not new
    'unchanged' - features in both
    'changed'   - features that were either added or removed
    'is_safe'   - True if nothing was removed (additive change only)
  """
  # TODO: implement using set operations
  pass

Expected Output

old={"auth","logging","cache","rate_limit"}
new={"auth","logging","metrics","tracing"}
added: {'metrics', 'tracing'}
removed: {'cache', 'rate_limit'}
unchanged: {'auth', 'logging'}
changed: {'cache', 'metrics', 'rate_limit', 'tracing'}
is_safe: False

Hints

Hint 1: Added = new - old. Removed = old - new. Unchanged = old & new.

Hint 2: Changed is the symmetric difference (old ^ new). A deploy is safe if the removed set is empty.

#7Frozenset Cache KeyMedium

frozensethashabilitycaching

Demonstrate how frozenset enables using sets as dictionary keys. Build a cache where the key is a set of features (order-independent) and the value is a configuration name.

Constraints:

The cache must use frozenset keys so that {"auth", "logging"} and {"logging", "auth"} map to the same entry
Return None if the query features are not in the cache

Solution

def cached_query(feature_sets_and_results: list[tuple[set, str]],
                 query_features: set) -> str | None:
    cache = {}
    for feature_set, result in feature_sets_and_results:
        cache[frozenset(feature_set)] = result
    return cache.get(frozenset(query_features))

# Tests
data = [
    ({"auth", "logging"}, "config_A"),
    ({"auth", "metrics"}, "config_B"),
]
print(cached_query(data, {"logging", "auth"}))   # "config_A"
print(cached_query(data, {"auth", "metrics"}))    # "config_B"
print(cached_query(data, {"auth", "tracing"}))    # None

Key insight: frozenset is the immutable, hashable counterpart of set. Because frozenset({"a", "b"}) == frozenset({"b", "a"}) and they produce the same hash, order of elements does not matter for cache lookups. This is why frozenset exists -- it allows set-like data to participate in hash-based structures (dict keys, set elements).

Starter Code

def cached_query(feature_sets_and_results: list[tuple[set, str]],
               query_features: set) -> str | None:
  """Build a cache from (feature_set, result) pairs using frozenset keys,
  then look up the result for query_features.

  Return the cached result string, or None if not found.
  """
  # TODO: build a dict with frozenset keys, then look up query_features
  pass

Expected Output

cache data:
({"auth","logging"}, "config_A")
({"auth","metrics"}, "config_B")

query {"logging","auth"} => "config_A"  (order doesn't matter)
query {"auth","metrics"} => "config_B"
query {"auth","tracing"} => None

Hints

Hint 1: Convert each set to frozenset() before using it as a dict key.

Hint 2: When looking up, also convert the query set to frozenset(). frozenset({'a','b'}) == frozenset({'b','a'}).

#8Subset Permission CheckerMedium

subsetsupersetisdisjointauthorization

Implement an authorization checker using subset/superset relationships and set operations. A user is granted access only if they hold ALL required permissions AND NONE of the forbidden permissions.

Solution

def check_permissions(
    user_perms: set,
    required_perms: set,
    forbidden_perms: set,
) -> dict:
    has_access = required_perms <= user_perms
    is_forbidden = not user_perms.isdisjoint(forbidden_perms)
    return {
        "has_access": has_access,
        "is_forbidden": is_forbidden,
        "missing": required_perms - user_perms,
        "violations": user_perms & forbidden_perms,
        "granted": has_access and not is_forbidden,
    }

# Test 1: missing 'admin'
result = check_permissions(
    user_perms={"read", "write", "delete"},
    required_perms={"read", "write", "admin"},
    forbidden_perms={"root", "sudo"},
)
for k, v in result.items():
    print(f"{k}: {v}")

print()

# Test 2: has all required but also holds a forbidden perm
result2 = check_permissions(
    user_perms={"read", "write", "admin", "sudo"},
    required_perms={"read", "write"},
    forbidden_perms={"root", "sudo"},
)
for k, v in result2.items():
    print(f"{k}: {v}")

Complexity: Subset check is O(len(required)). isdisjoint is O(min(len(user), len(forbidden))) and short-circuits on the first common element. All difference/intersection operations are O(n). This maps directly to role-based access control (RBAC) systems.

Starter Code

def check_permissions(
  user_perms: set,
  required_perms: set,
  forbidden_perms: set,
) -> dict:
  """Check if a user has the right permissions for an action.

  Return a dict with:
    'has_access'    - True if user has ALL required perms
    'is_forbidden'  - True if user has ANY forbidden perm
    'missing'       - set of required perms the user lacks
    'violations'    - set of forbidden perms the user holds
    'granted'       - True if has_access and not is_forbidden
  """
  # TODO: use subset/superset checks and set operations
  pass

Expected Output

user={"read","write","delete"}
required={"read","write","admin"}
forbidden={"root","sudo"}
has_access: False
is_forbidden: False
missing: {'admin'}
violations: set()
granted: False

Hints

Hint 1: has_access is True when required_perms is a subset of user_perms (required <= user).

Hint 2: is_forbidden is True when user_perms and forbidden_perms are NOT disjoint. Missing = required - user. Violations = user & forbidden.

#9Multi-Set Intersection PipelineMedium

intersectionreducemultiple sets

Given a list of sets, find elements common to ALL sets. This generalizes the two-set intersection to N sets.

Constraints:

Handle the empty list case (return empty set)
Handle sets with no overlap (return empty set)
Your solution should work for any number of sets

Solution

def common_to_all(sets: list[set]) -> set:
    """Return elements that appear in ALL of the given sets."""
    if not sets:
        return set()
    result = sets[0].copy()
    for s in sets[1:]:
        result &= s
    return result

# Alternative using set.intersection with unpacking:
def common_to_all_v2(sets: list[set]) -> set:
    if not sets:
        return set()
    return sets[0].intersection(*sets[1:])

# Tests
print(common_to_all([{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]))     # {3}
print(common_to_all([{1, 2}, {1, 2, 3}, {1, 2, 3, 4}]))      # {1, 2}
print(common_to_all([{"a", "b"}, {"c", "d"}]))                 # set()
print(common_to_all([]))                                        # set()

Complexity: O(n * k) where n is the number of sets and k is the size of the smallest set. Each intersection reduces the working set, so later intersections become cheaper. The set.intersection(*others) approach lets CPython optimize the multi-set intersection internally.

Starter Code

def common_to_all(sets: list[set]) -> set:
  """Return elements that appear in ALL of the given sets.

  If the list is empty, return an empty set.
  """
  # TODO: find the intersection across an arbitrary number of sets
  pass

Expected Output

common_to_all([{1,2,3},{2,3,4},{3,4,5}]) => {3}
common_to_all([{1,2},{1,2,3},{1,2,3,4}]) => {1, 2}
common_to_all([{"a","b"},{"c","d"}]) => set()
common_to_all([]) => set()

Hints

Hint 1: You can chain intersections: start with the first set and intersect with each subsequent set.

Hint 2: Use set.intersection(*others) which accepts multiple arguments, or use functools.reduce with the & operator.

#10Dependency Conflict DetectorHard

symmetric differencepairwise operationsreal-world

Build a dependency conflict detector for a package manager. Two packages "conflict" if they share some dependencies but each also pulls in unique ones -- this can cause version mismatches in practice.

Constraints:

Only report pairs where the intersection is non-empty AND both differences are non-empty
Return results sorted by pair names (alphabetically)
Use itertools.combinations for generating pairs

Solution

from itertools import combinations

def find_conflicts(packages: dict[str, set[str]]) -> list[dict]:
    conflicts = []
    for (name_a, deps_a), (name_b, deps_b) in combinations(
        packages.items(), 2
    ):
        shared = deps_a & deps_b
        exclusive_a = deps_a - deps_b
        exclusive_b = deps_b - deps_a
        if shared and exclusive_a and exclusive_b:
            pair = tuple(sorted([name_a, name_b]))
            conflicts.append({
                "pair": pair,
                "shared": shared,
                "exclusive_a": exclusive_a if pair[0] == name_a else exclusive_b,
                "exclusive_b": exclusive_b if pair[1] == name_b else exclusive_a,
            })
    return sorted(conflicts, key=lambda c: c["pair"])

# Test
packages = {
    "web": {"requests", "flask", "jinja2"},
    "api": {"requests", "fastapi", "pydantic"},
    "cli": {"click", "rich"},
}
for conflict in find_conflicts(packages):
    a, b = conflict["pair"]
    print(f"CONFLICT {a} <-> {b}")
    print(f"  shared: {conflict['shared']}")
    print(f"  {a}-only: {conflict['exclusive_a']}")
    print(f"  {b}-only: {conflict['exclusive_b']}")

Complexity: O(P^2 * D) where P is the number of packages and D is the average dependency set size. Each pair comparison involves intersection and two differences, all O(D). For a package manager with hundreds of packages, this is practical. For thousands, you would index by dependency to avoid the quadratic pair enumeration.

Starter Code

def find_conflicts(
  packages: dict[str, set[str]],
) -> list[dict]:
  """Detect dependency conflicts between packages.

  'packages' maps package names to their dependency sets.
  A conflict exists between two packages if they share at least
  one dependency AND each has dependencies the other doesn't.

  Return a list of conflict dicts, each with:
    'pair'        - tuple of the two package names
    'shared'      - set of common dependencies
    'exclusive_a' - deps only in the first package
    'exclusive_b' - deps only in the second package
  """
  # TODO: compare every pair of packages using set operations
  pass

Expected Output

packages:
"web": {"requests","flask","jinja2"}
"api": {"requests","fastapi","pydantic"}
"cli": {"click","rich"}

conflicts:
pair=('web','api') shared={'requests'}
  web-only: {'flask', 'jinja2'}
  api-only: {'fastapi', 'pydantic'}
(web,cli) and (api,cli) have no shared deps => no conflict

Hints

Hint 1: Use itertools.combinations to get all unique pairs of packages.

Hint 2: For each pair, compute intersection. If intersection is non-empty AND each side has exclusive deps (difference is non-empty), it is a conflict.

#11Graph Reachability with Visited SetHard

BFSvisited setgraph traversal

Implement BFS-based graph reachability. The visited set is the key data structure -- it provides O(1) checks to avoid revisiting nodes, which prevents infinite loops in cyclic graphs and keeps the algorithm O(V+E).

Constraints:

Use a set for visited node tracking (not a list)
Handle nodes with no outgoing edges (empty neighbor lists)
Handle nodes not present as keys in the graph (they have no neighbors)

Solution

from collections import deque

def find_reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Return all nodes reachable from 'start' using BFS."""
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited

def are_connected(
    graph: dict[str, list[str]], node_a: str, node_b: str
) -> bool:
    """Return True if there is a path from node_a to node_b."""
    return node_b in find_reachable(graph, node_a)

# Test
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["F"],
    "F": [],
    "G": ["H"],
    "H": [],
}
print(f"reachable from A: {find_reachable(graph, 'A')}")
print(f"reachable from G: {find_reachable(graph, 'G')}")
print(f"are_connected(A, F): {are_connected(graph, 'A', 'F')}")
print(f"are_connected(A, G): {are_connected(graph, 'A', 'G')}")

Complexity: O(V+E) time where V is vertices and E is edges. The visited set ensures each node is processed exactly once. Without the set (using a list for visited checks), the algorithm would be O(V*(V+E)) because node in visited_list is O(V). This is the textbook example of why sets matter in graph algorithms.

Starter Code

def find_reachable(
  graph: dict[str, list[str]],
  start: str,
) -> set[str]:
  """Return all nodes reachable from 'start' using BFS.

  The result should include the start node itself.
  The graph is represented as an adjacency list (dict of lists).
  """
  # TODO: implement BFS using a set to track visited nodes
  pass

def are_connected(
  graph: dict[str, list[str]],
  node_a: str,
  node_b: str,
) -> bool:
  """Return True if there is a path from node_a to node_b."""
  # TODO: use find_reachable
  pass

Expected Output

graph:
A -> [B, C]
B -> [D]
C -> [D, E]
D -> []
E -> [F]
F -> []
G -> [H]
H -> []

reachable from A: {'A', 'B', 'C', 'D', 'E', 'F'}
reachable from G: {'G', 'H'}
are_connected(A, F): True
are_connected(A, G): False

Hints

Hint 1: Use a collections.deque for the BFS queue and a set for visited nodes.

Hint 2: For each node you dequeue, add its unvisited neighbors to both the queue and the visited set.

Hint 3: are_connected simply checks if node_b is in the reachable set from node_a.

#12Log Analysis: IP Set Operations at ScaleHard

real-worldscaleall operationsdesign

Simulate the log analysis design challenge from the lesson. Given two lists of nginx-format log lines, extract IP addresses and compute all set operations to answer: who returned, who left, who is new, and how many total unique visitors are there?

Design considerations:

In production, you would stream the file line by line (not load all lines into a list) to handle multi-GB files
The IP sets themselves fit in memory because unique IPs are far fewer than total log lines
All set operations are O(n+m), making this approach feasible for 10M+ line files

Solution

def analyze_ip_logs(
    log_a_lines: list[str],
    log_b_lines: list[str],
) -> dict:
    # Extract unique IPs from each log
    ips_a = {line.split()[0] for line in log_a_lines if line.strip()}
    ips_b = {line.split()[0] for line in log_b_lines if line.strip()}

    returning = ips_a & ips_b
    only_a = ips_a - ips_b
    only_b = ips_b - ips_a
    total = ips_a | ips_b

    return {
        "unique_a": len(ips_a),
        "unique_b": len(ips_b),
        "returning": returning,
        "only_a": only_a,
        "only_b": only_b,
        "total_unique": len(total),
    }

# Test with sample nginx-format log lines
log_a = [
    '192.168.1.1 - - [01/Jan:00:00] "GET / HTTP/1.1" 200 1234',
    '10.0.0.1 - - [01/Jan:00:01] "GET /about HTTP/1.1" 200 5678',
    '192.168.1.1 - - [01/Jan:00:02] "GET /contact HTTP/1.1" 200 910',
    '172.16.0.1 - - [01/Jan:00:03] "POST /api HTTP/1.1" 201 100',
]
log_b = [
    '10.0.0.1 - - [01/Feb:00:00] "GET / HTTP/1.1" 200 1234',
    '172.16.0.1 - - [01/Feb:00:01] "GET /about HTTP/1.1" 200 5678',
    '8.8.8.8 - - [01/Feb:00:02] "GET / HTTP/1.1" 200 910',
    '10.0.0.1 - - [01/Feb:00:03] "GET /api HTTP/1.1" 200 100',
]

result = analyze_ip_logs(log_a, log_b)
for k, v in result.items():
    print(f"{k}: {v}")

Complexity: Building each IP set is O(n) where n is the number of log lines. Intersection, difference, and union are each O(n+m) where n and m are the unique IP counts. Total: O(lines_a + lines_b) time. For 10M-line logs with perhaps 500K unique IPs each, this completes in seconds. The set comprehension {line.split()[0] for line in lines} handles both extraction and deduplication in a single pass.

Production adaptation: Replace the list input with a file-streaming generator. Use split(' ', 1)[0] instead of split()[0] to avoid splitting the entire line. Add a 1MB read buffer. The memory usage is dominated by the IP sets (roughly 50-500MB for millions of unique IPs), not the log lines.

Starter Code

def analyze_ip_logs(
  log_a_lines: list[str],
  log_b_lines: list[str],
) -> dict:
  """Analyze two sets of log lines (nginx format: IP is first token).

  Return a dict with:
    'unique_a'     - count of unique IPs in log A
    'unique_b'     - count of unique IPs in log B
    'returning'    - set of IPs in both logs
    'only_a'       - set of IPs only in log A
    'only_b'       - set of IPs only in log B
    'total_unique' - count of unique IPs across both logs
  """
  # TODO: extract IP sets from each log, then compute all operations
  pass

Expected Output

Log A: 3 unique IPs (192.168.1.1, 10.0.0.1, 172.16.0.1)
Log B: 3 unique IPs (10.0.0.1, 172.16.0.1, 8.8.8.8)

unique_a: 3
unique_b: 3
returning: {'10.0.0.1', '172.16.0.1'}
only_a: {'192.168.1.1'}
only_b: {'8.8.8.8'}
total_unique: 4

Hints

Hint 1: Extract the IP from each line with line.split()[0]. Build a set from each log for automatic deduplication.

Hint 2: Use intersection for returning IPs, difference for exclusive IPs, and union for total unique count.

Hint 3: Think about why this approach works for 10M-line logs: building the set is O(n) and all set operations are O(n+m).