Python Interning Practice Problems & Exercises

Practice: Interning and Object Caching

10 problems3 Easy4 Medium3 Hard⏱ 35–50 min

Easy

#1Integer Cache Range CheckEasy

integer-cacheis-operatoridentity

Predict the output of each is comparison. Think about which integers CPython caches.

Python

a = 256
b = 256
print(a is b)

c = -5
d = -5
print(c is d)

e = 0
f = 0
print(e is f)

# Now outside the cache range
g = 257
h = 257
print(g is h)

i = -6
j = -6
print(i is j)

Solution

True
True
True
False
False

How it works: CPython pre-allocates a fixed array of integer objects for values -5 through 256 during interpreter startup. Every time your code uses an integer in this range, Python returns a pointer to the existing cached object rather than creating a new one.

256 is 256 is True because 256 is the upper boundary of the cache.
-5 is -5 is True because -5 is the lower boundary.
0 is 0 is True because 0 is well inside the range.
257 is 257 is False because 257 falls outside the cache — two separate objects are created.
-6 is -6 is False because -6 falls below the cache range.

Key insight: Never use is to compare integer values. The cache is a CPython implementation detail, not a Python language guarantee. Always use == for value comparison.

Expected Output

True\nTrue\nTrue\nFalse\nFalse

Hints

Hint 1: CPython pre-allocates integer objects from -5 to 256 at startup.

Hint 2: Any integer within that range will always be the same object. Outside it, new objects are created.

#2String Interning for IdentifiersEasy

string-interningis-operatoridentifier

Predict the output. Think about which strings CPython automatically interns.

Python

# Identifier-like strings (only letters, digits, underscores)
a = "hello"
b = "hello"
print(a is b)

# String with a space — not identifier-like
c = "hello world"
d = "hello world"
print(c is d)

# Underscore is fine — still identifier-like
e = "hello_world"
f = "hello_world"
print(e is f)

# Hyphen breaks it — not a valid identifier character
g = "hello-world"
h = "hello-world"
print(g is h)

Solution

True
False
True
False

How it works: CPython automatically interns string literals that resemble valid Python identifiers — strings containing only ASCII letters, digits, and underscores. This optimization exists because such strings frequently appear as attribute names, variable names, and dictionary keys in the interpreter itself.

"hello" is interned — simple identifier-like string.
"hello world" is NOT interned — the space makes it non-identifier-like.
"hello_world" is interned — underscores are valid in identifiers.
"hello-world" is NOT interned — hyphens are not valid in identifiers.

Key insight: String interning is an optimization for the interpreter's internal bookkeeping. It reduces memory usage for commonly used names and speeds up dictionary lookups (identity check before full equality check). You should never rely on it in application logic.

Expected Output

True\nFalse\nTrue\nFalse

Hints

Hint 1: CPython automatically interns string literals that look like valid Python identifiers (letters, digits, underscores only).

Hint 2: Strings with spaces, hyphens, or other special characters are NOT automatically interned.

#3Singleton Identity — True, False, NoneEasy

singletonsis-operatorNonebool

Predict the output. These are Python's built-in singletons.

Python

# None is always the same object
a = None
b = None
print(a is b)

# True and False are singletons
c = True
d = True
print(c is d)

e = False
f = False
print(e is f)

# Even when obtained through expressions
g = (1 == 1)   # Evaluates to True
print(g is True)

h = (1 == 2)   # Evaluates to False
print(h is False)

Solution

True
True
True
True
True

How it works: Python guarantees that None, True, and False are singletons. There is exactly one object of each in the entire interpreter process. Every reference to None points to the same object, every reference to True points to the same object, and the same for False.

This is different from integer or string interning — singletons are a language guarantee, not an implementation detail.

None is None is always True — PEP 8 explicitly recommends if x is None over if x == None.
True is True and False is False are always True.
Boolean expressions like 1 == 1 return the actual True singleton, not a new True-like object.

Key insight: is None is the only idiomatic use of is for comparison in everyday Python code. is True and is False work but are rarely needed — use truthiness checks instead (if x: rather than if x is True:).

Expected Output

True\nTrue\nTrue\nTrue\nTrue

Hints

Hint 1: True, False, and None are singletons in Python — only one instance of each exists.

Hint 2: Using `is` with None is actually the recommended idiom (PEP 8).

Medium

#4The 256/257 Boundary ExperimentMedium

integer-cacheboundaryidentity

Write a function that tests whether a given integer value falls within CPython's integer cache. Then use it to map the exact boundaries.

Challenge: The tricky part is avoiding compile-time constant folding, which can make literals in the same code block share identity even outside the cache range. The function boundary helps, but we also use n + 0 to force runtime computation.

Python

def test_cache_boundary(n):
    """Return True if integer n is cached (same object when assigned twice)."""
    a = n + 0
    b = n + 0
    return a is b

for val in [255, 256, 257, -5, -6, 0, 100, 1000]:
    cached = test_cache_boundary(val)
    print(f"{val:>6}: {'CACHED' if cached else 'NOT CACHED'}")

Solution

def test_cache_boundary(n):
    """Return True if integer n is cached."""
    a = n + 0  # Forces runtime computation, avoids constant folding
    b = n + 0
    return a is b

for val in [255, 256, 257, -5, -6, 0, 100, 1000]:
    cached = test_cache_boundary(val)
    print(f"{val:>6}: {'CACHED' if cached else 'NOT CACHED'}")

Output:

CACHED
CACHED
NOT CACHED
    -5: CACHED
    -6: NOT CACHED
CACHED
CACHED
NOT CACHED

Why n + 0: If you simply write a = 257 and b = 257 in the same function, CPython's peephole optimizer may fold them into the same constant, making a is b return True even though 257 is outside the cache. Using n + 0 forces the computation to happen at runtime, bypassing constant folding.

The cache in CPython source: In Objects/longobject.c, CPython defines NSMALLPOSINTS = 257 and NSMALLNEGINTS = 5, creating a contiguous array of 262 integer objects from -5 to 256. The function PyLong_FromLong() checks if the value falls in this range and returns the pre-allocated object instead of creating a new one.

Starter Code

def test_cache_boundary(n):
    """Return True if integer n is cached (same object when assigned twice)."""
    # Create two separate assignments of the same value
    # IMPORTANT: use a function to avoid compile-time constant folding
    a = n + 0  # Forces runtime evaluation
    b = n + 0
    return a is b

# Test the boundary
for val in [255, 256, 257, -5, -6, 0, 100, 1000]:
    cached = test_cache_boundary(val)
    print(f"{val:>6}: {'CACHED' if cached else 'NOT CACHED'}")

Expected Output

   255: CACHED\n   256: CACHED\n   257: NOT CACHED\n    -5: CACHED\n    -6: NOT CACHED\n     0: CACHED\n   100: CACHED\n  1000: NOT CACHED

Hints

Hint 1: Using `n + 0` forces CPython to compute the result at runtime, avoiding constant folding.

Hint 2: The cache range is exactly -5 through 256 inclusive — 262 pre-allocated integer objects.

#5Explicit String Interning with sys.intern()Medium

sys-internstring-interningperformance

Use sys.intern() to explicitly intern strings that CPython would not intern automatically. Show that interned strings share identity even when they contain spaces or special characters.

Python

import sys

# Without interning
a = "hello world"
b = "hello world"
print(f"Without intern: a is b -> {a is b}")

# With sys.intern()
c = sys.intern("hello world")
d = sys.intern("hello world")
print(f"With intern:    c is d -> {c is d}")

# Interning a computed string
parts = ["hello", " ", "world"]
e = sys.intern("".join(parts))
print(f"Interned join:  c is e -> {c is e}")

print(f"id(c) == id(d) == id(e): {id(c) == id(d) == id(e)}")

Solution

import sys

def demonstrate_intern():
    # Without interning — strings with spaces create separate objects
    a = "hello world"
    b = "hello world"
    print(f"Without intern: a is b -> {a is b}")

    # With sys.intern() — forces the same object
    c = sys.intern("hello world")
    d = sys.intern("hello world")
    print(f"With intern:    c is d -> {c is d}")

    # Even computed strings get interned to the same object
    parts = ["hello", " ", "world"]
    e = sys.intern("".join(parts))
    print(f"Interned join:  c is e -> {c is e}")

    print(f"id(c) == id(d) == id(e): {id(c) == id(d) == id(e)}")

demonstrate_intern()

When to use sys.intern():

Parsing large datasets where the same string keys repeat thousands of times (CSV columns, JSON keys).
Building lookup tables with string keys that are compared frequently.
Any case where you have many duplicate strings and want to trade one-time interning cost for faster comparisons and lower memory.

How it works internally: sys.intern() maintains a global dictionary of interned strings. When you call sys.intern(s), Python checks if an equal string already exists in that dictionary. If yes, it returns the existing object (and the new string can be garbage collected). If no, it adds the string to the dictionary and returns it.

Performance impact: Identity comparison (is) is O(1) — a single pointer comparison. Equality comparison (==) is O(n) in the worst case — it must compare character by character. For interned strings, Python's == implementation short-circuits to an identity check first, so interning can speed up equality checks too.

Starter Code

import sys

def demonstrate_intern():
    """Show how sys.intern() forces interning for non-identifier strings."""
    # Without interning — strings with spaces are separate objects
    a = "hello world"
    b = "hello world"
    print(f"Without intern: a is b -> {a is b}")

    # With sys.intern() — force the same object
    c = sys.intern("hello world")
    d = sys.intern("hello world")
    print(f"With intern:    c is d -> {c is d}")

    # Interning a computed string
    parts = ["hello", " ", "world"]
    e = sys.intern("".join(parts))
    print(f"Interned join:  c is e -> {c is e}")

    # Show memory benefit: all interned versions are the same object
    print(f"id(c) == id(d) == id(e): {id(c) == id(d) == id(e)}")

demonstrate_intern()

Expected Output

Without intern: a is b -> False\nWith intern:    c is d -> True\nInterned join:  c is e -> True\nid(c) == id(d) == id(e): True

Hints

Hint 1: sys.intern() takes a string and returns the interned version. If the string was already interned, it returns the existing object.

Hint 2: Interning is useful when you have many copies of the same string (e.g., dictionary keys from parsed data).

#6Constant Folding and Identity SurprisesMedium

constant-foldingpeepholecompiler

Predict the output. This explores how CPython's compiler optimizations affect identity checks in ways that surprise even experienced developers.

Python

# Case 1: Two literals in the same code block share constants
a = 1000
b = 1000
print(f"Literal in same scope: {a is b}")

# Case 2: Computed at runtime — no sharing
def make_thousand():
    return 500 + 500

c = make_thousand()
d = make_thousand()
print(f"Computed at runtime:   {c is d}")

# Case 3: Constant folding — the compiler computes 10 * 100 = 1000
e = 10 * 100
f = 10 * 100
print(f"Folded expression:     {e is f}")

# Case 4: String concatenation of literals is also folded
g = "hello" + "_" + "world"
h = "hello" + "_" + "world"
print(f"Long string folded:    {g is h}")

Solution

Literal in same scope: True
Computed at runtime:   False
Folded expression:     True
Long string folded:    True

What is happening here:

Case 1 — Same-scope literal deduplication: When CPython compiles a code block, it stores all constants in a co_consts tuple. If the same literal appears twice, the compiler stores it once and both names reference the same entry. This is why a is b is True even though 1000 is outside the integer cache range.

Case 2 — Runtime computation: Each call to make_thousand() computes 500 + 500 at runtime, producing a new integer object each time. The results are equal (==) but not identical (is).

Case 3 — Peephole constant folding: The compiler evaluates 10 * 100 at compile time, replacing it with the constant 1000. Since both e and f reference the same compiled constant, e is f is True. You can verify this with dis.dis():

import dis
dis.dis(compile("e = 10 * 100", "<string>", "exec"))
# Shows LOAD_CONST 1000, not LOAD_CONST 10 followed by LOAD_CONST 100

Case 4 — String literal concatenation: The compiler also folds string concatenation of literals. "hello" + "_" + "world" becomes "hello_world" at compile time.

Key insight: These optimizations make is behavior unpredictable for values outside the guaranteed cache/singleton range. This is exactly why you should never use is for value comparison — the result depends on which code path created the object, not on its value.

Expected Output

Literal in same scope: True\nComputed at runtime:   False\nFolded expression:     True\nLong string folded:    True

Hints

Hint 1: CPython compiles each code block (module, function, class) into a code object. Constants in the same code object are deduplicated.

Hint 2: The peephole optimizer folds constant expressions like `10 * 100` into `1000` at compile time.

#7Integer Cache Range MapperMedium

integer-cacheexplorationrange

Build an experiment that automatically discovers the exact range of CPython's integer cache. Your function should return the lower and upper bounds without hardcoding them.

Python

def is_cached(n):
    """Test if integer n is in the cache without constant folding."""
    a = n + 0
    b = n + 0
    return a is b

def find_cache_range():
    """Discover the exact integer cache range by scanning."""
    # Find upper bound: scan up from 0
    upper = 0
    while is_cached(upper + 1):
        upper += 1

    # Find lower bound: scan down from 0
    lower = 0
    while is_cached(lower - 1):
        lower -= 1

    return (lower, upper)

lower, upper = find_cache_range()
print(f"Cache range: [{lower}, {upper}]")
print(f"Total cached integers: {upper - lower + 1}")
print(f"Boundary check: {lower - 1} cached? {is_cached(lower - 1)}")
print(f"Boundary check: {upper + 1} cached? {is_cached(upper + 1)}")

Solution

def is_cached(n):
    """Test if integer n is in the cache without constant folding."""
    a = n + 0
    b = n + 0
    return a is b

def find_cache_range():
    """Discover the exact integer cache range by scanning."""
    upper = 0
    while is_cached(upper + 1):
        upper += 1

    lower = 0
    while is_cached(lower - 1):
        lower -= 1

    return (lower, upper)

lower, upper = find_cache_range()
print(f"Cache range: [{lower}, {upper}]")
print(f"Total cached integers: {upper - lower + 1}")
print(f"Boundary check: {lower - 1} cached? {is_cached(lower - 1)}")
print(f"Boundary check: {upper + 1} cached? {is_cached(upper + 1)}")

Output:

Cache range: [-5, 256]
Total cached integers: 262
Boundary check: -6 cached? False
Boundary check: 257 cached? False

Why 262 integers: The range was chosen pragmatically by CPython developers. Most Python programs use small integers heavily — loop counters, indexing, boolean-like values (0/1), ASCII codes (0-127), byte values (0-255). Caching -5 to 256 covers the vast majority of these use cases while keeping the memory overhead minimal (262 objects, roughly 7 KB on 64-bit systems).

Alternative approach — binary search: For a faster algorithm, you could binary search for the upper bound (start at 0, double until not cached, then bisect). But since the range is small (a few hundred), linear scanning is perfectly fine and more readable.

Why this matters: Understanding the cache range helps you reason about unexpected is behavior in debugging sessions. But in production code, you should never write logic that depends on these boundaries.

Starter Code

def find_cache_range():
    """Discover the exact integer cache range by binary search and scanning.
    Return (lower_bound, upper_bound) — the inclusive range of cached integers.
    """
    # Strategy: test integers outward from 0 until we find the boundaries
    # Use a helper that avoids constant folding
    pass

def is_cached(n):
    """Test if integer n is in the cache without constant folding interference."""
    # Create two independent runtime computations that produce n
    a = n + 0
    b = n + 0
    return a is b

lower, upper = find_cache_range()
print(f"Cache range: [{lower}, {upper}]")
print(f"Total cached integers: {upper - lower + 1}")
print(f"Boundary check: {lower - 1} cached? {is_cached(lower - 1)}")
print(f"Boundary check: {upper + 1} cached? {is_cached(upper + 1)}")

Expected Output

Cache range: [-5, 256]\nTotal cached integers: 262\nBoundary check: -6 cached? False\nBoundary check: 257 cached? False

Hints

Hint 1: Start from 0 and scan upward until is_cached returns False to find the upper bound.

Hint 2: Then scan downward from 0 until is_cached returns False to find the lower bound.

Hint 3: The upper bound is the last value that returns True, the lower bound is the first (most negative) value that returns True.

Hard

#8Custom Intern Pool for Arbitrary ObjectsHard

interningdesign-patternflyweight

Implement a generic InternPool that works like sys.intern() but for any hashable type — tuples, frozensets, custom named tuples, etc. After interning, objects with the same value should share identity.

pool = InternPool()

# Intern tuples — normally (1, 2) creates a new object each time
a = pool.intern((1, 2))
b = pool.intern((1, 2))
print(f"a is b: {a is b}")
print(f"Pool size: {pool.pool_size()}")
print(f"Is interned: {pool.is_interned(a)}")

# Intern multiple coordinate tuples
coords = [(1, 2), (3, 4), (1, 2), (5, 6), (3, 4), (1, 2)]
interned = [pool.intern(c) for c in coords]
print(f"Coord pool size: {pool.pool_size()}")
print(f"(1, 2) is (1, 2): {interned[0] is interned[2]}")
print(f"All same object: {all(interned[i] is interned[0] for i in [0, 2, 5])}")

print(f"Stats: {pool.stats()}")

Solution

class InternPool:
    """A generic intern pool that deduplicates objects by value."""

    def __init__(self):
        self._pool = {}

    def intern(self, obj):
        """Return the canonical version of obj."""
        if obj in self._pool:
            return self._pool[obj]
        self._pool[obj] = obj
        return obj

    def pool_size(self):
        """Return the number of unique objects in the pool."""
        return len(self._pool)

    def is_interned(self, obj):
        """Check if obj is the canonical (interned) instance."""
        return obj in self._pool and self._pool[obj] is obj

    def stats(self):
        """Return dict with pool statistics."""
        types = {type(v).__name__ for v in self._pool.values()}
        return {
            "pool_size": self.pool_size(),
            "types": types,
        }

Design decisions:

Dictionary as the pool: We use the object's value as both the key and the value. Python's dictionary uses __hash__ and __eq__ to match keys, which gives us automatic deduplication by value.
Why this works: When pool.intern((1, 2)) is called twice, the second call finds (1, 2) already in the dictionary (via __eq__) and returns the stored object. Both callers now hold a reference to the same tuple object.
Hashable requirement: This pool only works for hashable types (tuples, frozensets, strings, ints, named tuples). Mutable types like lists and dicts cannot be dictionary keys.
This is the Flyweight pattern: In software engineering, this is known as the Flyweight design pattern — sharing objects to reduce memory usage. CPython's integer cache and string interning are built-in implementations of this pattern.

Memory impact: If you have 1 million coordinate tuples but only 1000 unique values, interning reduces memory from 1M objects to 1K objects — a 1000x reduction.

Starter Code

class InternPool:
    """A generic intern pool that deduplicates objects by value.
    Similar to sys.intern() but works for any hashable type.
    """

    def __init__(self):
        self._pool = {}  # Maps value -> canonical object

    def intern(self, obj):
        """Return the canonical version of obj.
        If an equal object is already in the pool, return that instead.
        Otherwise, add obj to the pool and return it.
        """
        pass

    def pool_size(self):
        """Return the number of unique objects in the pool."""
        pass

    def is_interned(self, obj):
        """Check if obj is the canonical (interned) instance."""
        pass

    def stats(self):
        """Return dict with pool statistics."""
        pass

Expected Output

a is b: True\nPool size: 1\nIs interned: True\nCoord pool size: 3\n(1, 2) is (1, 2): True\nAll same object: True\nStats: {'pool_size': 3, 'types': {'tuple'}}

Hints

Hint 1: The key insight is to use the object itself (or a hashable representation) as the dictionary key.

Hint 2: When interning, check if the pool already contains an equal object. If yes, return the pooled one.

Hint 3: For is_interned, check if the object IS (identity) the one stored in the pool, not just equal to it.

#9Weak-Reference Object CacheHard

weakrefcachegcadvanced

Build a WeakInternPool that caches objects using weak references. Objects stay in the pool only while at least one strong reference exists elsewhere. When all strong references are deleted and GC runs, the cached entry is automatically removed. This mirrors how CPython manages some internal caches.

import gc

pool = WeakInternPool()

# Intern some objects — using a wrapper class since basic types
# (int, str, tuple) don't support weak references
class Box:
    """Wrapper that supports weak references and hashing."""
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return hash(self.value)
    def __eq__(self, other):
        return isinstance(other, Box) and self.value == other.value
    def __repr__(self):
        return f"Box({self.value!r})"

a = pool.intern(Box("alpha"))
b = pool.intern(Box("alpha"))
c = pool.intern(Box("beta"))

print(f"After interning 3: pool_size={pool.pool_size()}")
print(f"a is b (while held): {a is b}")
print(f"Hit rate: {pool.hit_rate():.2f}")

# Delete strong references and force GC
del a
del b
gc.collect()

print(f"After del + GC: pool_size <= 3: {pool.pool_size() <= 3}")

# Re-intern — should be a cache miss since the old object was collected
d = pool.intern(Box("alpha"))
print(f"Re-intern after GC — same object? {d is c}")

print(f"Final stats keys: {sorted(pool.stats().keys())}")

Solution

import weakref

class WeakInternPool:
    """An intern pool using weak references for automatic cleanup."""

    def __init__(self):
        self._pool = weakref.WeakValueDictionary()
        self._hits = 0
        self._misses = 0

    def intern(self, obj):
        """Intern an object. Return cached version if available."""
        key = (type(obj).__name__, obj)
        try:
            cached = self._pool[key]
            self._hits += 1
            return cached
        except KeyError:
            self._pool[key] = obj
            self._misses += 1
            return obj

    def pool_size(self):
        """Return current number of live objects in pool."""
        return len(self._pool)

    def hit_rate(self):
        """Return cache hit rate as a float."""
        total = self._hits + self._misses
        if total == 0:
            return 0.0
        return self._hits / total

    def stats(self):
        """Return dict with pool statistics."""
        return {
            "pool_size": self.pool_size(),
            "hits": self._hits,
            "misses": self._misses,
            "hit_rate": self.hit_rate(),
        }

How WeakValueDictionary works:

Values in the dictionary are stored as weak references — they do not prevent garbage collection.
When the last strong reference to a value is deleted and GC runs, the dictionary entry is automatically removed (via a weak reference callback).
Accessing a key whose value was collected raises KeyError, just like a missing key.

Why we need the Box wrapper: Python's basic types (int, str, tuple) do not support weak references because they are implemented in C and lack the __weakref__ slot. Custom classes support weak references by default.

How this mirrors CPython: CPython uses similar patterns internally. For example, the type object cache and certain codec lookup tables use weak references to avoid pinning objects in memory indefinitely. The key insight is that caching should be helpful — it should speed up access to objects that are actively in use — but it should not prevent cleanup of objects that nobody needs anymore.

Real-world use case: If you build a large graph processing system where nodes reference shared label objects, a WeakInternPool ensures that labels used by active nodes stay deduplicated, but labels for deleted nodes get garbage collected automatically.

Starter Code

import weakref

class WeakInternPool:
    """An intern pool that uses weak references so cached objects
    can be garbage collected when no strong references remain.
    Like CPython's internal approach — cache what's in use, release what's not.
    """

    def __init__(self):
        # WeakValueDictionary: values are weak refs, auto-removed on GC
        self._pool = weakref.WeakValueDictionary()
        self._hits = 0
        self._misses = 0

    def intern(self, obj):
        """Intern an object. Return cached version if available.
        If the cached version was garbage collected, store this one.
        """
        pass

    def pool_size(self):
        """Return current number of live (not GC'd) objects in pool."""
        pass

    def hit_rate(self):
        """Return cache hit rate as a float (0.0 to 1.0)."""
        pass

    def stats(self):
        """Return dict with pool statistics."""
        pass

Expected Output

After interning 3: pool_size=3\na is b (while held): True\nHit rate: 0.50\nAfter del + GC: pool_size <= 3: True\nRe-intern after GC — same object? False\nFinal stats keys: ['pool_size', 'hits', 'misses', 'hit_rate']

Hints

Hint 1: weakref.WeakValueDictionary automatically removes entries when the value is garbage collected.

Hint 2: Use a hashable key derived from the object. For simple types, the object itself works as a key.

Hint 3: The tricky part: after GC reclaims a cached object, the next intern() of the same value should create a new entry (cache miss).

#10Interning-Aware Smart ComparatorHard

interningidentityequalityoptimization

Build a SmartComparator that mimics CPython's internal comparison strategy: try identity first for known-interned types, fall back to full equality only when necessary. Track statistics to show the optimization impact.

import sys

comp = SmartComparator()

# Register some interned strings
key1 = comp.register_interned("user_id")
key2 = comp.register_interned("user_id")

# Test various comparisons
# 1. None — always singleton
r1 = comp.equal(None, None)
print(f"None comparison (identity): {r1}, used identity: {comp._identity_hits == 1}")

# 2. Small integer — in cache
r2 = comp.equal(42, 42)
print(f"Small int (identity): {r2}, used identity: {comp._identity_hits == 2}")

# 3. Large integer — outside cache, needs ==
x, y = 10**18, 10**18
r3 = comp.equal(x, y)
print(f"Large int (equality): {r3}, used identity: {comp._identity_hits == 2}")

# 4. Interned string
r4 = comp.equal(key1, key2)
print(f"Interned string (identity): {r4}, used identity: {comp._identity_hits == 3}")

# 5. Non-interned string
r5 = comp.equal("hello world!", "hello world!")
print(f"Regular string (equality): {r5}, used identity: {comp._identity_hits == 3}")

# 6. List — never identity-safe
r6 = comp.equal([1, 2], [1, 2])
print(f"List (equality): {r6}, used identity: {comp._identity_hits == 3}")

s = comp.stats()
pct = s['identity_hits'] / s['total_comparisons'] * 100
print(f"Stats: identity_pct={pct:.1f}%")

Solution

import sys

class SmartComparator:
    """An equality checker that exploits interning for performance."""

    def __init__(self):
        self._comparisons = 0
        self._identity_hits = 0
        self._equality_checks = 0
        self._interned_strings = set()

    def register_interned(self, s):
        """Register a string as interned. Returns the interned string."""
        interned = sys.intern(s)
        self._interned_strings.add(id(interned))
        return interned

    def equal(self, a, b):
        """Compare a and b, using identity shortcut when possible."""
        self._comparisons += 1

        # Fast path: identity check
        if a is b:
            self._identity_hits += 1
            return True

        # Check if identity was the right test but objects aren't identical
        if self._can_use_identity(a, b):
            # Both are interned types but not the same object → not equal
            self._identity_hits += 1
            return False

        # Slow path: full equality check
        self._equality_checks += 1
        return a == b

    def _can_use_identity(self, a, b):
        """Determine if identity check is sufficient."""
        # NoneType — singleton
        if a is None or b is None:
            return True

        # Booleans — singletons
        if isinstance(a, bool) and isinstance(b, bool):
            return True

        # Small integers in cache range
        if isinstance(a, int) and isinstance(b, int):
            if not isinstance(a, bool) and not isinstance(b, bool):
                if -5 <= a <= 256 and -5 <= b <= 256:
                    return True

        # Explicitly interned strings
        if isinstance(a, str) and isinstance(b, str):
            if id(a) in self._interned_strings and id(b) in self._interned_strings:
                return True

        return False

    def stats(self):
        """Return comparison statistics."""
        return {
            "total_comparisons": self._comparisons,
            "identity_hits": self._identity_hits,
            "equality_checks": self._equality_checks,
        }

How CPython's dict lookup actually works:

When Python looks up a key in a dictionary, it follows this exact strategy:

Compute the hash of the lookup key.
Find the slot in the hash table.
Compare the stored key with the lookup key using identity first (stored_key is lookup_key).
Only if identity fails, compare using equality (stored_key == lookup_key).

This is why interned strings make dictionary lookups faster — step 3 succeeds immediately for interned keys, skipping the character-by-character comparison in step 4.

Performance analysis:

Identity check (is): ~1 nanosecond — single pointer comparison.
Equality check for strings (==): O(n) where n is string length — could be microseconds for long strings.
For a dictionary with 1 million lookups of interned keys, this optimization saves roughly 1 million string comparisons.

The bool check matters: bool is a subclass of int in Python. True == 1 and False == 0 are both True. We check for bool before int to handle them as singletons, not as cached integers.

Starter Code

import sys

class SmartComparator:
    """An equality checker that exploits interning for performance.
    Uses identity (is) for known-interned types, falls back to == otherwise.

    This mimics how CPython's dict lookup works:
    first check identity, then check equality.
    """

    def __init__(self):
        self._comparisons = 0
        self._identity_hits = 0
        self._equality_checks = 0
        # Track explicitly interned strings
        self._interned_strings = set()

    def register_interned(self, s):
        """Register a string as interned (via sys.intern).
        Returns the interned string."""
        pass

    def equal(self, a, b):
        """Compare a and b, using identity shortcut when possible.
        Returns True if equal, False otherwise.
        """
        pass

    def _can_use_identity(self, a, b):
        """Determine if identity check is sufficient for these objects."""
        pass

    def stats(self):
        """Return comparison statistics."""
        pass

Expected Output

None comparison (identity): True, used identity: True\nSmall int (identity): True, used identity: True\nLarge int (equality): True, used identity: False\nInterned string (identity): True, used identity: True\nRegular string (equality): True, used identity: False\nList (equality): True, used identity: False\nStats: identity_pct=50.0%

Hints

Hint 1: None, True, False are always safe for identity comparison — they are singletons.

Hint 2: Small integers in the cache range can use identity. For integers outside the range, fall back to equality.

Hint 3: For strings, check if they are in your registered interned set. For all other types, always use equality.

Practice: Interning and Object Caching

Easy​

Medium​

Hard​

Easy

Medium

Hard