Project 02 - Mini Profiler Tool

Estimated time: 6–8 hours core | Level: Intermediate

Before reading the requirements, consider this question: cProfile reports "tottime" (time spent in a function excluding callees) and "cumtime" (time including callees). How does it compute them? A flat list of (function, elapsed time) pairs is not enough - you need to know the call relationships. What data structure represents that? Sketch the answer before you start.

Learning Objectives

By the time this project is complete, you will have practiced:

Using sys.setprofile() to intercept call and return events with nanosecond timestamps
Building a call tree - not a flat list - to correctly compute own time vs total time
Using tracemalloc to measure peak memory, total allocations, and top allocation sites by traceback
Implementing the context manager protocol (__enter__, __exit__) and the decorator protocol correctly
Formatting a table using only the Python standard library (no rich, no tabulate)
Comparing two profiler runs and computing deltas
Exporting profiling data as JSON suitable for CI threshold enforcement

System Overview

You are building a single module profiler.py that exports one class: Profiler. The class works as both a context manager and a decorator factory, and produces a formatted report on demand. The module has no external dependencies - standard library only.

from profiler import Profiler

with Profiler() as p:
    result = my_expensive_function(data)

p.report()
# Output:
# ┌─────────────────────────────────────────────────────┐
# │ Time Profile                                        │
# ├─────────────────────────┬────────┬────────┬────────┤
# │ Function                │ Calls  │ Total  │ Own    │
# ├─────────────────────────┼────────┼────────┼────────┤
# │ my_expensive_function   │ 1      │ 2.341s │ 0.012s │
# │   helper_a              │ 1000   │ 1.890s │ 1.890s │
# │   helper_b              │ 500    │ 0.439s │ 0.439s │
# └─────────────────────────┴────────┴────────┴────────┘
# Memory: peak 45.2 MB, allocated 12.3 MB in 847 objects

Requirements

R1 - Time profiling with `sys.setprofile()`

Install a profile callback using sys.setprofile() when profiling begins and remove it (restore None or the previous callback) when profiling ends.

The callback signature is:

def profile_callback(frame, event, arg):
    ...

event is one of: "call", "return", "c_call", "c_return", "c_exception".
frame is the current stack frame (frame.f_code.co_name gives the function name, frame.f_code.co_filename gives the file).
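arg is the return value on "return" events, and the C function object on "c_call", "c_return", and "c_exception" events.
Record a time.perf_counter_ns() timestamp on every "call" event and every "return" event.
Handle "c_call" and "c_return" in the same way as "call" and "return" - built-in functions must appear in the profile. On these events frame is the frame of the Python caller, so take the function name from arg (for example arg.__name__) rather than from frame.f_code. Treat "c_exception" as a return, since it means the C function exited by raising.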
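The callback mechanics above can be tried in isolation before building the Profiler class. The following is a minimal sketch; the names events, log_event, and work are illustrative and are not part of the project skeleton.

```python
import sys
import time

events = []  # (event, name, timestamp_ns) tuples, in arrival order


def log_event(frame, event, arg):
    """Profile callback: record the event kind, a function name, and a timestamp."""
    if event in ("c_call", "c_return", "c_exception"):
        # For C events, frame belongs to the Python caller; arg is the C function.
        name = getattr(arg, "__name__", "?")
    else:
        name = frame.f_code.co_name
    events.append((event, name, time.perf_counter_ns()))


def work():
    return len("abc")  # one Python call containing one built-in call


previous = sys.getprofile()      # remember whatever was installed before
sys.setprofile(log_event)
try:
    work()
finally:
    sys.setprofile(previous)     # always restore the previous callback

for event, name, _ in events:
    print(f"{event:10s} {name}")
```

The printed list may also contain an event for sys.setprofile itself, because the call that removes the callback is still observed by it.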

R2 - Call tree construction

Do not use a flat list. Build a call tree where each node represents one invocation of a function and contains:

name: function name (and file for disambiguation)
call_time_ns: timestamp when the call was entered
return_time_ns: timestamp when the call returned
children: list of child call nodes (functions called from within this function)
call_count: how many times this function was called (for aggregated reporting)

Total time for a node = return_time_ns - call_time_ns (the wall time for the entire call including callees).

Own time for a node = total time minus the sum of total times of all direct children.

Maintain a call stack (a Python list) in the profiler callback. On "call", push a new node. On "return", pop the node, set its return time, and attach it as a child of the new top of the stack.

The root of the tree is a synthetic node representing the profiled block itself.
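To make the total/own distinction concrete, here is a small worked example on a hand-built tree with made-up timestamps. Node is a simplified stand-in for the skeleton's CallNode, and the numbers are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    call_time_ns: int
    return_time_ns: int
    children: list["Node"] = field(default_factory=list)

    @property
    def total_ns(self) -> int:
        # Wall time of the whole call, callees included.
        return self.return_time_ns - self.call_time_ns

    @property
    def own_ns(self) -> int:
        # Total time minus the total time of each direct child.
        return self.total_ns - sum(child.total_ns for child in self.children)


# A runs from t=0 to t=1000 and calls B (100..400) and then C (450..650).
b = Node("B", 100, 400)
c = Node("C", 450, 650)
a = Node("A", 0, 1000, children=[b, c])

print(a.total_ns, a.own_ns)  # 1000 500
print(b.total_ns, b.own_ns)  # 300 300
```

Only direct children are subtracted: time spent in grandchildren is already contained in each child's total.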

R3 - Memory profiling with `tracemalloc`

Start tracemalloc when profiling begins and stop it when profiling ends.

Record peak memory usage: use tracemalloc.get_traced_memory() which returns (current, peak) in bytes.
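Record total number of memory allocations and total bytes allocated during the profiled block: take a snapshot with tracemalloc.take_snapshot() on stop and aggregate the statistics. Note that a snapshot only contains memory blocks that are still allocated when it is taken, so these totals describe memory that is live at the end of the block, not every allocation made along the way.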
Store the top-10 allocation sites by traceback (file + line number + size).
Convert all byte values to megabytes for display (divide by 1,048,576).
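A minimal sketch of these tracemalloc calls outside the Profiler class. The payload list is an illustrative workload, and the variable names are assumptions rather than part of the skeleton.

```python
import tracemalloc

tracemalloc.start()
payload = [bytes(1024) for _ in range(2000)]   # roughly 2 MB kept alive
current, peak = tracemalloc.get_traced_memory()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

stats = snapshot.statistics("lineno")          # grouped by file and line, largest first
live_bytes = sum(s.size for s in stats)
live_blocks = sum(s.count for s in stats)

print(f"peak {peak / 1_048_576:.1f} MB, "
      f"live {live_bytes / 1_048_576:.1f} MB in {live_blocks} blocks")
for stat in stats[:3]:
    frame = stat.traceback[0]
    print(f"  {stat.size / 1_048_576:.1f} MB  {frame.filename}:{frame.lineno}  "
          f"({stat.count} allocs)")
```

The snapshot only sees blocks that are still allocated when it is taken, which is why payload is kept in a variable here.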

R4 - Context manager and decorator protocols

Profiler must work in both modes without the user doing anything different:

Context manager:

with Profiler() as p:
    some_code()
p.report()

Decorator:

p = Profiler()

@p.profile
def my_function():
    some_code()

my_function()
p.report()

__enter__ starts sys.setprofile() and tracemalloc.
__exit__ stops them, records final memory stats, and stores results - regardless of whether an exception occurred.
p.profile is a method used directly as a decorator (@p.profile, without parentheses): it takes the function and returns a wrapper that starts and stops profiling around each call.
A Profiler instance may be used multiple times; each use accumulates results (for the decorator mode) or resets them (for the context manager mode, unless configured otherwise).
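The dual-mode pattern can be practiced on something simpler than a profiler first. The sketch below uses a hypothetical Stopwatch class (not part of the project) whose decorator method reuses its own context manager, so both modes share one start/stop path.

```python
import functools
import time


class Stopwatch:
    """Hypothetical stand-in for Profiler: times a block or a decorated function."""

    def __init__(self):
        self.runs = []          # elapsed nanoseconds, one entry per completed use
        self._started_ns = 0

    def __enter__(self):
        self._started_ns = time.perf_counter_ns()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Runs whether or not the block raised.
        self.runs.append(time.perf_counter_ns() - self._started_ns)
        return False            # let any exception propagate

    def profile(self, fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            with self:          # reuse the context manager for each call
                return fn(*args, **kwargs)
        return wrapper


sw = Stopwatch()

with sw:
    sum(range(1000))


@sw.profile
def square(x):
    return x * x


square(3)

try:
    with sw:
        raise ValueError("boom")
except ValueError:
    pass

print(len(sw.runs))  # 3: the failing block was still recorded
```

Whether repeated uses accumulate or reset is a design decision; this sketch always accumulates.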

R5 - Formatted report table

p.report() must print a formatted box-drawing table to stdout. Use only the standard library. The table must contain:

A "Time Profile" section with columns: Function, Calls, Total, Own.
Functions indented by their depth in the call tree (two spaces per level).
Time values formatted as X.XXXs (three decimal places, in seconds) or XXX.Xms if under one second.
A memory summary line below the table: Memory: peak X.X MB, allocated X.X MB in N objects.

The box-drawing characters to use:

┌ ─ ┐   (top border)
│       (vertical separator)
├ ─ ┤   (mid border, after header)
│ │ │   (data rows)
└ ─ ┘   (bottom border)
┬ ┼ ┴   (column separators in borders)

Column widths must be computed dynamically from the data - do not hardcode them.

R6 - Top-N memory allocations by traceback

p.report() must also print the top-10 memory allocation sites (optionally, expose them through a separate p.memory_report() call as well):

Top memory allocations:
  1.  12.4 MB  mymodule.py:42  (847 allocs)
  2.   3.1 MB  mymodule.py:87  (203 allocs)
  ...

Use tracemalloc.Snapshot.statistics("lineno") to get this data.

R7 - Run comparison

Implement p1.compare(p2) that prints a comparison table showing the delta between two profiler runs:

┌─────────────────────────┬───────────┬───────────┬───────────┐
│ Function                │ Run 1     │ Run 2     │ Delta     │
├─────────────────────────┼───────────┼───────────┼───────────┤
│ my_function             │ 2.341s    │ 1.892s    │ -0.449s   │
│   helper_a              │ 1.890s    │ 1.455s    │ -0.435s   │
└─────────────────────────┴───────────┴───────────┴───────────┘
Memory delta: -3.2 MB peak

Match functions by name across the two runs.
Show positive deltas in green (using the ANSI escape code \033[32m, followed by the reset code \033[0m) and negative deltas as-is. ANSI escape codes are the only permitted addition to the plain box-drawing table output.
Functions present in one run but not the other are shown with - in the missing column.
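One possible way to compute the comparison rows, sketched under the assumption that each run has already been reduced to a {function name: total seconds} mapping. The helper name delta_rows and the fixed X.XXXs formatting are choices made for this example only.

```python
GREEN = "\033[32m"
RESET = "\033[0m"


def delta_rows(run1: dict[str, float], run2: dict[str, float]) -> list[list[str]]:
    """Build [name, run 1, run 2, delta] rows from two {name: seconds} mappings."""
    names = list(run1) + [n for n in run2 if n not in run1]  # keep first-seen order
    rows = []
    for name in names:
        t1, t2 = run1.get(name), run2.get(name)
        col1 = f"{t1:.3f}s" if t1 is not None else "-"
        col2 = f"{t2:.3f}s" if t2 is not None else "-"
        if t1 is None or t2 is None:
            delta = "-"                      # no delta without both measurements
        else:
            delta = f"{t2 - t1:+.3f}s"
            if t2 - t1 > 0:
                delta = f"{GREEN}{delta}{RESET}"
        rows.append([name, col1, col2, delta])
    return rows


rows = delta_rows(
    {"my_function": 2.341, "helper_a": 1.890},
    {"my_function": 1.892, "helper_a": 1.455, "helper_c": 0.100},
)
for row in rows:
    print(row)
```

len() counts the escape characters even though a terminal does not display them, so pad each cell to its column width before wrapping it in colour codes.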

R8 - JSON export and CI threshold enforcement

Implement p.export_json() returning a dict and p.check_threshold(function_name, max_seconds) that raises ThresholdExceeded if the named function's total time exceeds max_seconds.

data = p.export_json()
# data structure:
# {
#   "functions": [
#     {"name": "my_function", "calls": 1, "total_s": 2.341, "own_s": 0.012, "depth": 0},
#     ...
#   ],
#   "memory": {"peak_mb": 45.2, "allocated_mb": 12.3, "objects": 847}
# }

p.check_threshold("my_expensive_function", max_seconds=3.0)  # passes
p.check_threshold("my_expensive_function", max_seconds=1.0)  # raises ThresholdExceeded

The JSON must be serialisable with json.dumps() without a custom encoder.
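In CI, the exported dict is typically written to disk and checked by a separate step. The sketch below assumes the data shape shown above; check_export and the limits mapping are hypothetical names for this example, and the numbers are illustrative.

```python
import json


class ThresholdExceeded(Exception):
    pass


def check_export(data: dict, limits: dict[str, float]) -> None:
    """Raise ThresholdExceeded if any listed function is slower than its limit."""
    totals = {f["name"]: f["total_s"] for f in data["functions"]}
    for name, max_seconds in limits.items():
        if name in totals and totals[name] > max_seconds:
            raise ThresholdExceeded(
                f"{name} took {totals[name]:.3f}s, limit {max_seconds:.3f}s"
            )


# Sample data in the shape shown above; the numbers are illustrative.
exported = {
    "functions": [
        {"name": "my_function", "calls": 1, "total_s": 2.341, "own_s": 0.012, "depth": 0},
    ],
    "memory": {"peak_mb": 45.2, "allocated_mb": 12.3, "objects": 847},
}

text = json.dumps(exported)              # works without a custom encoder
restored = json.loads(text)

check_export(restored, {"my_function": 3.0})       # passes silently
try:
    check_export(restored, {"my_function": 1.0})
except ThresholdExceeded as exc:
    print(f"build should fail: {exc}")
```

Because the dict holds only str, int, float, list, and dict values, it survives a json.dumps()/json.loads() round trip unchanged.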

Starter Code Skeleton

import functools
import json
import sys
import time
import tracemalloc
from dataclasses import dataclass, field
from typing import Optional


# ── Custom Exceptions ──────────────────────────────────────────────────────────

class ThresholdExceeded(Exception):
    pass


# ── Data Structures ────────────────────────────────────────────────────────────

@dataclass
class CallNode:
    name: str
    filename: str
    call_time_ns: int = 0
    return_time_ns: int = 0
    children: list["CallNode"] = field(default_factory=list)

    @property
    def total_time_s(self) -> float:
        return (self.return_time_ns - self.call_time_ns) / 1_000_000_000

    @property
    def own_time_s(self) -> float:
        children_total = sum(c.total_time_s for c in self.children)
        return self.total_time_s - children_total

    @property
    def call_count(self) -> int:
        # For aggregated reporting, count recursive occurrences with same name
        # TODO: implement if needed for aggregation
        return 1


@dataclass
class MemoryStats:
    peak_mb: float = 0.0
    allocated_mb: float = 0.0
    objects: int = 0
    top_allocations: list[dict] = field(default_factory=list)


# ── Table Formatting ───────────────────────────────────────────────────────────

def format_time(seconds: float) -> str:
    """Format a time value as Xs or Xms depending on magnitude."""
    if seconds >= 1.0:
        return f"{seconds:.3f}s"
    else:
        return f"{seconds * 1000:.1f}ms"


def make_table(headers: list[str], rows: list[list[str]]) -> str:
    """
    Build a box-drawing table from headers and rows.
    Column widths are computed from the data.
    Returns the table as a string (caller prints it).
    """
    # TODO: compute column widths as max(len(header), max(len(cell) for cell in col))
    # TODO: build top border: ┌─...─┬─...─┐
    # TODO: build header row: │ h1  │ h2  │
    # TODO: build mid border: ├─...─┼─...─┤
    # TODO: build data rows:  │ v1  │ v2  │
    # TODO: build bottom:     └─...─┴─...─┘
    pass


# ── Core Profiler ──────────────────────────────────────────────────────────────

class Profiler:
    def __init__(self):
        self._root: Optional[CallNode] = None
        self._call_stack: list[CallNode] = []
        self._memory: Optional[MemoryStats] = None
        self._previous_profile = None  # save/restore sys.getprofile()

    # ── Internal profile callback ─────────────────────────────────────────────

    def _profile_callback(self, frame, event: str, arg) -> None:
        """sys.setprofile callback. Called on every function call and return."""
        if event in ("call", "c_call"):
            name = frame.f_code.co_name if event == "call" else getattr(arg, "__name__", "?")
            filename = frame.f_code.co_filename if event == "call" else "<builtin>"
            node = CallNode(
                name=name,
                filename=filename,
                call_time_ns=time.perf_counter_ns(),
            )
            # TODO: append node to self._call_stack

        elif event in ("return", "c_return", "c_exception"):
            if not self._call_stack:
                return
            # TODO: pop node from self._call_stack
            # TODO: set node.return_time_ns = time.perf_counter_ns()
            # TODO: if self._call_stack is non-empty, append node to self._call_stack[-1].children
            # TODO: else, append node to self._root.children

    # ── Start / Stop ──────────────────────────────────────────────────────────

    def _start(self) -> None:
        self._root = CallNode(name="<profiled block>", filename="")
        self._root.call_time_ns = time.perf_counter_ns()
        self._call_stack = []
        self._previous_profile = sys.getprofile()
        sys.setprofile(self._profile_callback)
        tracemalloc.start()

    def _stop(self) -> None:
        sys.setprofile(self._previous_profile)
        self._root.return_time_ns = time.perf_counter_ns()

        current, peak = tracemalloc.get_traced_memory()
        snapshot = tracemalloc.take_snapshot()
        tracemalloc.stop()

        stats = snapshot.statistics("lineno")
        top = []
        for stat in stats[:10]:
            top.append({
                "file": str(stat.traceback[0].filename),
                "line": stat.traceback[0].lineno,
                "size_mb": stat.size / 1_048_576,
                "count": stat.count,
            })

        self._memory = MemoryStats(
            peak_mb=peak / 1_048_576,
            allocated_mb=sum(s.size for s in stats) / 1_048_576,
            objects=sum(s.count for s in stats),
            top_allocations=top,
        )

    # ── Context Manager Protocol ───────────────────────────────────────────────

    def __enter__(self) -> "Profiler":
        self._start()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> bool:
        self._stop()
        return False  # do not suppress exceptions

    # ── Decorator Protocol ─────────────────────────────────────────────────────

    def profile(self, fn):
        """Decorator: wrap fn so each call is profiled."""
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            self._start()
            try:
                return fn(*args, **kwargs)
            finally:
                self._stop()
        return wrapper

    # ── Reporting ─────────────────────────────────────────────────────────────

    def _flatten_tree(
        self,
        node: CallNode,
        depth: int = 0,
        result: Optional[list] = None,
    ) -> list[tuple[int, CallNode]]:
        """Return a depth-annotated flat list of CallNodes for display."""
        if result is None:
            result = []
        # TODO: for each child in node.children (skip root itself):
        #   append (depth, child) to result
        #   recurse into child's children with depth + 1
        return result

    def report(self) -> None:
        """Print the full profiling report to stdout."""
        if self._root is None:
            print("No profiling data. Run inside `with Profiler() as p:` or use `@p.profile`.")
            return

        flat = self._flatten_tree(self._root)

        headers = ["Function", "Calls", "Total", "Own"]
        rows = []
        for depth, node in flat:
            indent = "  " * depth
            rows.append([
                f"{indent}{node.name}",
                str(node.call_count),
                format_time(node.total_time_s),
                format_time(node.own_time_s),
            ])

        # TODO: print "Time Profile" title
        # TODO: call make_table(headers, rows) and print the result
        # TODO: print memory summary line

    def memory_report(self) -> None:
        """Print the top memory allocation sites."""
        if self._memory is None:
            return
        print("Top memory allocations:")
        for i, alloc in enumerate(self._memory.top_allocations, 1):
            print(f"  {i:2d}.  {alloc['size_mb']:6.1f} MB  {alloc['file']}:{alloc['line']}  ({alloc['count']} allocs)")

    def compare(self, other: "Profiler") -> None:
        """Print a side-by-side comparison of this run vs another."""
        # TODO: flatten both trees into dicts keyed by function name
        # TODO: collect all function names from both
        # TODO: print comparison table with delta column
        # TODO: print memory delta
        pass

    def export_json(self) -> dict:
        """Return a JSON-serialisable dict of the profiling results."""
        if self._root is None:
            return {}
        flat = self._flatten_tree(self._root)
        functions = []
        for depth, node in flat:
            functions.append({
                "name": node.name,
                "calls": node.call_count,
                "total_s": round(node.total_time_s, 6),
                "own_s": round(node.own_time_s, 6),
                "depth": depth,
            })
        memory = {}
        if self._memory:
            memory = {
                "peak_mb": round(self._memory.peak_mb, 2),
                "allocated_mb": round(self._memory.allocated_mb, 2),
                "objects": self._memory.objects,
            }
        return {"functions": functions, "memory": memory}

    def check_threshold(self, function_name: str, max_seconds: float) -> None:
        """Raise ThresholdExceeded if function_name's total time exceeds max_seconds."""
        if self._root is None:
            return
        flat = self._flatten_tree(self._root)
        for _, node in flat:
            if node.name == function_name:
                if node.total_time_s > max_seconds:
                    raise ThresholdExceeded(
                        f"{function_name} took {node.total_time_s:.3f}s, "
                        f"exceeding threshold of {max_seconds:.3f}s"
                    )
                return


# ── Demonstration ──────────────────────────────────────────────────────────────

if __name__ == "__main__":
    import time as _time

    def helper_a(n: int) -> int:
        total = 0
        for i in range(n):
            total += i
        return total

    def helper_b(items: list) -> list:
        return sorted(items)

    def my_expensive_function(data: list) -> dict:
        a_result = helper_a(len(data))
        b_result = helper_b(data)
        return {"sum": a_result, "sorted": b_result}

    data = list(range(5000))

    # --- context manager usage ---
    print("=== Context Manager Usage ===")
    with Profiler() as p:
        result = my_expensive_function(data)

    p.report()

    print()
    p.memory_report()

    # --- decorator usage ---
    print("\n=== Decorator Usage ===")
    p2 = Profiler()

    @p2.profile
    def profiled_sort(items):
        return sorted(items)

    profiled_sort(data)
    p2.report()

    # --- comparison ---
    print("\n=== Run Comparison ===")
    with Profiler() as p3:
        my_expensive_function(list(range(1000)))

    with Profiler() as p4:
        my_expensive_function(list(range(10000)))

    p3.compare(p4)

    # --- JSON export ---
    print("\n=== JSON Export ===")
    data_json = p.export_json()
    print(json.dumps(data_json, indent=2))

    # --- threshold check ---
    print("\n=== Threshold Check ===")
    try:
        p.check_threshold("my_expensive_function", max_seconds=100.0)
        print("Threshold check passed.")
    except ThresholdExceeded as e:
        print(f"Threshold exceeded: {e}")

Expected Output

=== Context Manager Usage ===
┌─────────────────────────────────────────────────────────────┐
│ Time Profile                                                │
├─────────────────────────┬────────┬──────────┬──────────────┤
│ Function                │ Calls  │ Total    │ Own          │
├─────────────────────────┼────────┼──────────┼──────────────┤
│ my_expensive_function   │ 1      │ 3.2ms    │ 0.1ms        │
│   helper_a              │ 1      │ 1.8ms    │ 1.8ms        │
│   helper_b              │ 1      │ 1.3ms    │ 1.3ms        │
└─────────────────────────┴────────┴──────────┴──────────────┘
Memory: peak 0.4 MB, allocated 0.2 MB in 312 objects

Top memory allocations:
   1.    0.1 MB  profiler.py:42  (128 allocs)
   2.    0.0 MB  profiler.py:87  (64 allocs)
  ...

=== Decorator Usage ===
┌─────────────────────────────────────────────────────────────┐
│ Time Profile                                                │
├─────────────────────────┬────────┬──────────┬──────────────┤
│ Function                │ Calls  │ Total    │ Own          │
├─────────────────────────┼────────┼──────────┼──────────────┤
│ profiled_sort           │ 1      │ 0.9ms    │ 0.9ms        │
└─────────────────────────┴────────┴──────────┴──────────────┘
Memory: peak 0.2 MB, allocated 0.1 MB in 89 objects

=== Run Comparison ===
┌─────────────────────────┬───────────┬───────────┬───────────┐
│ Function                │ Run 1     │ Run 2     │ Delta     │
├─────────────────────────┼───────────┼───────────┼───────────┤
│ my_expensive_function   │ 0.8ms     │ 3.2ms     │ +2.4ms    │
│   helper_a              │ 0.4ms     │ 1.8ms     │ +1.4ms    │
│   helper_b              │ 0.3ms     │ 1.3ms     │ +1.0ms    │
└─────────────────────────┴───────────┴───────────┴───────────┘
Memory delta: +0.3 MB peak

=== JSON Export ===
{
  "functions": [
    {
      "name": "my_expensive_function",
      "calls": 1,
      "total_s": 0.003241,
      "own_s": 0.000124,
      "depth": 0
    },
    ...
  ],
  "memory": {
    "peak_mb": 0.38,
    "allocated_mb": 0.19,
    "objects": 312
  }
}

=== Threshold Check ===
Threshold check passed.

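Note: timing and memory values will vary by machine and Python version. The table structure, column alignment, and JSON keys must match exactly. Because R1 requires built-in calls to be recorded, your tables will also contain rows for built-ins such as len, range, and sorted; those rows are omitted above.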

Step-by-Step Hints

Hint 1 - Understand the call stack invariant before writing the callback. The call stack in your profiler must always mirror the real Python call stack. On every "call" event, a new frame has been pushed. On every "return" event, the current frame is about to be popped. Your profiler's _call_stack list must stay in sync with this. Draw out what happens for a three-deep call A -> B -> C -> return C -> return B -> return A before writing the code:

event         action                  _call_stack after
"call" A      push CallNode(A)        [A]
"call" B      push CallNode(B)        [A, B]
"call" C      push CallNode(C)        [A, B, C]
"return" C    pop C, set return_time  [A, B],  B.children = [C]
"return" B    pop B, set return_time  [A],     A.children = [B]
"return" A    pop A, set return_time  [],      root.children = [A]

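The trace above can be replayed in code without installing a real callback. This sketch feeds synthetic (event, name, timestamp) tuples through the same push/pop rules; Node and replay are illustrative names, and the timestamps are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    call_time_ns: int = 0
    return_time_ns: int = 0
    children: list["Node"] = field(default_factory=list)


def replay(events):
    """Build a call tree from (event, name, timestamp_ns) tuples."""
    root = Node("<root>")
    stack = []
    for event, name, ts in events:
        if event == "call":
            stack.append(Node(name, call_time_ns=ts))
        elif event == "return" and stack:
            node = stack.pop()
            node.return_time_ns = ts
            parent = stack[-1] if stack else root   # empty stack: attach to root
            parent.children.append(node)
    return root


root = replay([
    ("call", "A", 0), ("call", "B", 10), ("call", "C", 20),
    ("return", "C", 30), ("return", "B", 50), ("return", "A", 90),
])
a = root.children[0]
b = a.children[0]
c = b.children[0]
print(a.name, b.name, c.name)  # A B C
```
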
Hint 2 - The root node is synthetic. The _root node is not a real function call. It represents the profiled block itself. Its children are the top-level functions called directly within the with block. Initialise _root with call_time_ns = time.perf_counter_ns() before installing the profile callback, and set _root.return_time_ns after removing the callback.

Hint 3 - sys.setprofile() vs sys.settrace(). sys.setprofile() fires only on function call and return events - its overhead is much lower than that of sys.settrace() because it does not fire on every line. sys.settrace() additionally fires on every executed line, which is what coverage.py needs but is too expensive for a profiler. Use sys.setprofile(). The callback signature is identical, but the event strings are different: setprofile uses "call", "return", "c_call", "c_return", "c_exception" - not "line".
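The difference is easy to observe by running the same function under each hook. In this sketch, record and work are illustrative helpers; the callback keeps only events whose frame is work, so events for other functions are ignored.

```python
import sys


def work():
    x = 1
    y = 2
    return x + y


def record(install, current):
    """Run work() under the given hook and return the event names seen for it."""
    seen = []

    def callback(frame, event, arg):
        if frame.f_code.co_name == "work":
            seen.append(event)
        return callback   # settrace needs the local tracer returned; setprofile ignores it

    previous = current()
    install(callback)
    try:
        work()
    finally:
        install(previous)
    return seen


profile_events = record(sys.setprofile, sys.getprofile)
trace_events = record(sys.settrace, sys.gettrace)
print(profile_events)   # ['call', 'return']
print(trace_events)     # also contains one 'line' event per executed line
```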

Hint 4 - Save and restore the previous profile. Do not assume sys.getprofile() is None when you start. Another profiler, a debugger, or a test framework may have installed its own callback. Save it before installing yours and restore it on stop:

def _start(self):
    self._previous_profile = sys.getprofile()
    sys.setprofile(self._profile_callback)

def _stop(self):
    sys.setprofile(self._previous_profile)

Hint 5 - tracemalloc snapshot statistics. After tracemalloc.take_snapshot(), use snapshot.statistics("lineno") to get a list of tracemalloc.Statistic objects sorted by size (largest first). Each statistic has .size (bytes), .count (number of memory blocks), and .traceback (a Traceback object). With the "lineno" key each traceback holds a single frame, so stat.traceback[0] is the allocation site, with .filename and .lineno attributes.

snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics("lineno")
for stat in stats[:10]:
    print(f"{stat.size / 1_048_576:.1f} MB  {stat.traceback[0]}")

Hint 6 - Building the table formatter. The key is computing column widths from the data:

col_widths = [
    max(len(headers[i]), max((len(row[i]) for row in rows), default=0))
    for i in range(len(headers))
]

Then build each border and row by repeating "─" for col_width characters and joining with "┬", "┼", or "┴" depending on which border it is. The top border uses "┌" / "┬" / "┐", the mid border uses "├" / "┼" / "┤", and the bottom uses "└" / "┴" / "┘".
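As a starting point, the border and row builders might look like the sketch below. The helper names border and row, and the choice of one space of padding on each side of a cell, are assumptions for this example rather than requirements of the project.

```python
def border(widths: list[int], left: str, mid: str, right: str) -> str:
    """One horizontal border; each column gets one space of padding per side."""
    return left + mid.join("─" * (w + 2) for w in widths) + right


def row(cells: list[str], widths: list[int]) -> str:
    """One header or data row, left-aligned and padded to the column widths."""
    return "│" + "│".join(f" {cell:<{w}} " for cell, w in zip(cells, widths)) + "│"


widths = [8, 5]
print(border(widths, "┌", "┬", "┐"))
print(row(["Function", "Calls"], widths))
print(border(widths, "├", "┼", "┤"))
print(row(["helper_a", "1000"], widths))
print(border(widths, "└", "┴", "┘"))
```

Borders and rows line up because both add the same two padding characters to every column width.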

Hint 7 - __exit__ must always stop the profiler. Even if an exception occurs inside the with block, __exit__ is called. Use try/finally inside _stop() or rely on the fact that __exit__ is always called after the block. Return False from __exit__ to let the exception propagate. Never return True unless you intentionally want to swallow the exception.

Hint 8 - Own time can go negative for tiny functions. Due to time.perf_counter_ns() resolution and the overhead of the profiler callback itself, own time can sometimes compute as a small negative number. Clamp it to zero: max(0.0, self.total_time_s - children_total).

Internals Concepts Tested

Concept	Where it appears
`sys.setprofile()`	R1 - installs the profiler callback
Profile callback signature	R1 - `(frame, event, arg)` handling
`frame.f_code`	R1 - extracting function name and filename
Call tree construction	R2 - own time vs total time
`time.perf_counter_ns()`	R1, R2 - high-resolution timestamps
`tracemalloc.start()` / `stop()`	R3 - memory tracking
`tracemalloc.take_snapshot()`	R3 - allocation site analysis
Context manager protocol	R4 - `__enter__` / `__exit__`
Decorator protocol	R4 - `p.profile` wrapper
`sys.getprofile()`	R4 - save/restore previous callback

Engineering Notes - Where These Patterns Appear in Production

cProfile and profile. The standard library cProfile module is a C extension that implements the same idea you are building, but in C for lower overhead. It installs a C-level profile callback, builds a call graph of (caller, callee) pairs, and computes cumulative times from the graph. When you run python -m cProfile myscript.py, this is what runs. Understanding your own implementation makes cProfile's output much easier to read.

py-spy. py-spy is a sampling profiler that reads the CPython frame stack directly from the memory of a running Python process - without installing any hooks in it. It uses process_vm_readv on Linux and vm_read on macOS to walk PyInterpreterState -> PyThreadState -> frame from outside the target process. Because it is sampling rather than tracing, it has very low overhead but can miss short-lived function calls. Your profiler uses tracing (sys.setprofile) which captures every call but adds overhead.

Austin. Austin is another sampling profiler that works at the OS level, sampling the call stack by reading Python frame pointers from the process. It outputs in the collapsed stack format used by flamegraph.pl to produce flame graphs. The data model - a tree of calls with time attribution - is the same tree structure you build in R2.

Pyroscope. Pyroscope is a continuous profiling platform. It integrates with Python via the pyroscope-io SDK, which, unlike your tracer, is built on a sampling profiler (py-spy) and periodically ships aggregated flame graph data to a central server. This data can then be visualised over time - "the profiling data for the last 24 hours for this service, grouped by endpoint". The JSON export you implement in R8 is a much simplified analogue of what Pyroscope ships.

CI performance regression detection. Many engineering teams implement what R8 asks for: a profiler that runs in CI on every pull request and fails the build if a critical function's performance regresses beyond a threshold. The pattern is: run benchmarks, export JSON, compare to a stored baseline, raise if delta exceeds N%. The pytest-benchmark library does this for benchmarks; your check_threshold() method implements the core of the same idea for profiling data.

Extension Challenges

Extension 1 - Flame graph output. Implement p.export_flamegraph(filename) that writes the profiling data in the collapsed stack format used by flamegraph.pl:

my_expensive_function;helper_a 1800000
my_expensive_function;helper_b 1300000
my_expensive_function 124000

Each line is a semicolon-separated call stack path and a time value in nanoseconds (1800000 corresponds to the 1.8ms shown for helper_a in the expected output). This format can be fed directly to flamegraph.pl (Brendan Gregg's flame graph generator) to produce an SVG.
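A possible shape for the tree walk, sketched on a simplified Node that stores only a name, a total time, and children. Each line carries the node's own time, so that a parent's weight is not counted twice; flamegraph.pl treats the number as a relative weight, so any consistent integer unit works.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    total_ns: int
    children: list["Node"] = field(default_factory=list)


def collapsed(node: Node, prefix: str = "") -> list[str]:
    """Return 'path;to;function own_time' lines for node and its descendants."""
    path = f"{prefix};{node.name}" if prefix else node.name
    own = max(0, node.total_ns - sum(c.total_ns for c in node.children))
    lines = []
    for child in node.children:
        lines.extend(collapsed(child, path))
    lines.append(f"{path} {own}")
    return lines


tree = Node("my_expensive_function", 3_224_000, [
    Node("helper_a", 1_800_000),
    Node("helper_b", 1_300_000),
])
print("\n".join(collapsed(tree)))
```

With these illustrative totals the output reproduces the three sample lines shown above.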

Extension 2 - Line-level profiling. Switch from sys.setprofile() to sys.settrace() for a line-level profiler that shows not just which functions were called but which lines within those functions were executed and how many times. This is more expensive but shows exactly where time is spent within a function. Match the output format of line_profiler's @profile decorator.

Extension 3 - Thread-aware profiling. sys.setprofile() installs a per-thread callback. Extend the profiler to work correctly when the profiled block spawns threads. Use threading.settrace() and threading.setprofile() (which set the default trace/profile functions for newly created threads) to ensure all threads are profiled. Aggregate results per-thread in the report.
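A minimal sketch of the threading.setprofile() mechanism, recording only the names of Python functions called in each new thread. The names calls_by_thread, callback, and worker are illustrative; results are keyed by thread ident so they can later be grouped per thread.

```python
import threading

calls_by_thread = {}          # thread ident -> list of Python function names
lock = threading.Lock()


def callback(frame, event, arg):
    if event == "call":
        with lock:            # several threads may report at the same time
            calls_by_thread.setdefault(threading.get_ident(), []).append(
                frame.f_code.co_name
            )


def worker(idents):
    idents.append(threading.get_ident())


worker_idents = []
threading.setprofile(callback)        # applies to threads started after this point
try:
    t = threading.Thread(target=worker, args=(worker_idents,))
    t.start()
    t.join()
finally:
    threading.setprofile(None)        # stop instrumenting new threads

print(calls_by_thread[worker_idents[0]])
```

threading.setprofile() does not affect the thread that calls it, so the main thread still needs its own sys.setprofile() call.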

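Extension 4 - Object allocation tracking. In addition to tracemalloc byte-level tracking, use sys.getrefcount() and gc.get_objects() before and after the profiled block to track the number of live objects of each type created during profiling. Report: "847 new dict objects, 312 new list objects". Note that gc.get_objects() only returns objects tracked by the garbage collector (generally containers such as lists, dicts, and class instances), so atomic types such as int and str will not appear in these counts.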
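The before/after counting can be sketched with collections.Counter. The helper live_counts and the Widget class are illustrative; only objects tracked by the garbage collector are visible to gc.get_objects(), so the example counts instances of an ordinary class.

```python
import gc
from collections import Counter


def live_counts() -> Counter:
    """Count GC-tracked live objects by type name."""
    gc.collect()              # drop unreachable objects before counting
    return Counter(type(obj).__name__ for obj in gc.get_objects())


class Widget:
    pass


before = live_counts()
widgets = [Widget() for _ in range(100)]   # kept alive so they are still counted
after = live_counts()

new_objects = after - before               # Counter subtraction drops non-positive counts
print(new_objects["Widget"])               # 100
```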

Extension 5 - Async profiling. sys.setprofile() reports a coroutine's suspension and resumption as ordinary "return" and "call" events, so a call tree built as in R2 cannot tell a suspended coroutine from one that has finished. Extend the profiler to handle async def functions using a custom asyncio event loop runner that instruments task scheduling. Record the time a coroutine spends suspended (waiting for I/O) separately from the time it spends executing. This distinction - CPU time vs wall time for async code - is the core challenge in async profiling.
