
The Python Import System - importlib, Finders, Loaders, and Import Hooks

Reading time: ~35 minutes | Level: Intermediate → Engineering

Before reading further, predict what each line prints:

import sys

print("mymodule" in sys.modules) # ?

import mymodule # assume it exists

print("mymodule" in sys.modules) # ?
import mymodule # does this re-execute the module?
print(id(sys.modules["mymodule"]) == id(mymodule)) # ?
Answer:
False
True
True

Before the first import mymodule, the name is absent from sys.modules. After the first import, Python finds the module file, executes it, and stores the resulting module object in sys.modules["mymodule"].

The second import mymodule does not re-execute anything. Python checks sys.modules first - the name is there - and binds the cached module object without reading or executing the file again. The module's top-level code runs exactly once per interpreter session (unless you explicitly reload it), no matter how many files import it.

id(sys.modules["mymodule"]) == id(mymodule) is True because import mymodule simply binds the local name mymodule to the object already in sys.modules - it is the same object in memory.

This caching behaviour is the reason import is cheap after the first time, and the reason global state defined at module level (like a database connection pool or a logger) is shared across all importers.
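The caching behaviour can be checked with a throwaway module. The sketch below is illustrative only: cache_demo_mod and its RUNS counter are hypothetical names, and the module is written to a temporary directory so nothing on your system is touched.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module: every execution of its top-level code bumps RUNS.
SOURCE = "RUNS = globals().get('RUNS', 0) + 1\n"

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "cache_demo_mod.py").write_text(SOURCE)
    sys.path.insert(0, tmp)
    try:
        first = importlib.import_module("cache_demo_mod")   # executes the file
        second = importlib.import_module("cache_demo_mod")  # served from sys.modules
        same_object = first is second is sys.modules["cache_demo_mod"]
        runs = first.RUNS
    finally:
        # Clean up so the demo leaves no trace in the import state
        sys.path.remove(tmp)
        sys.modules.pop("cache_demo_mod", None)

print(same_object, runs)  # True 1
```

RUNS stays at 1 because the second import never executes the file; it only hands back the object already stored in sys.modules.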

The import system is one of Python's most consequential design decisions. Every framework, plugin system, test runner, and build tool depends on it. Understanding it at depth lets you write plugin systems, diagnose import errors, build dynamic module loaders, and reason about why circular imports fail in unexpected ways.

What You Will Learn

  • The import statement desugared to __import__() and importlib
  • Import resolution order: sys.modules cache → built-ins → sys.path finders
  • sys.path, PYTHONPATH, and sys.meta_path
  • The finder/loader two-step protocol: find_spec → create_module + exec_module
  • importlib.import_module() for runtime dynamic imports
  • importlib.util.spec_from_file_location() for loading files by path
  • Relative vs absolute imports and when to use each
  • __init__.py, __all__, and namespace packages
  • Circular imports: why they happen and how to fix them
  • Custom import hooks: loading modules from non-standard sources
  • importlib.reload(): re-executing a module

Prerequisites

  • Lesson 08 (sys and inspect) - sys.modules and sys.meta_path are sys-module concepts
  • Basic understanding of Python packages and the file system layout
  • Lesson 05 (Reference Counting) - understanding why module objects persist

Part 1 - The import Statement Desugared

Every import statement is syntactic sugar for __import__(). The interpreter translates:

import json

into approximately:

json = __import__("json", globals(), locals(), [], 0)

And:

from os.path import join, exists

into:

_temp = __import__("os.path", globals(), locals(), ["join", "exists"], 0)
join = _temp.join
exists = _temp.exists

__import__() hands the actual work to importlib._bootstrap._find_and_load(). importlib was added in Python 3.1, and since Python 3.3 the import machinery itself is implemented in importlib. You can use importlib directly in your own code for more control.
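One detail of the desugaring is easy to miss: with an empty fromlist, __import__() returns the top-level package, not the submodule you named. A small sketch using only the standard library (the variable names are illustrative):

```python
import importlib
import os
import os.path

# Empty fromlist: the TOP-LEVEL package comes back, which is why
# 'import os.path' binds the name 'os' rather than 'path'.
top = __import__("os.path")

# Non-empty fromlist: the submodule itself comes back, which is what
# 'from os.path import join' needs.
sub = __import__("os.path", globals(), locals(), ["join"], 0)

# importlib.import_module() always returns the module you named.
via_importlib = importlib.import_module("os.path")

print(top is os, sub is os.path, via_importlib is os.path)  # True True True
```

This difference is one reason importlib.import_module() is usually the more convenient call in application code.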

Part 2 - Import Resolution Order

sys.path and PYTHONPATH

sys.path is the list of directories Python searches for importable modules. It is populated at startup from:

  1. The directory containing the script being run (or "" for interactive mode, meaning the current working directory)
  2. The PYTHONPATH environment variable (colon-separated on Unix, semicolon-separated on Windows)
  3. Installation-dependent defaults (site-packages, standard library directories)

import sys

# Show the current search path
for i, path in enumerate(sys.path):
    print(f"[{i}] {path!r}")

# [0] ''  # current directory
# [1] '/usr/lib/python312.zip'
# [2] '/usr/lib/python3.12'
# [3] '/usr/lib/python3.12/lib-dynload'
# [4] '/usr/local/lib/python3.12/dist-packages'

# Set PYTHONPATH before running Python (shell command)
PYTHONPATH=/opt/my-libs:/opt/other-libs python my_script.py

sys.meta_path: The Finder List

sys.meta_path is a list of finder objects. Python calls finder.find_spec(fullname, path, target) on each one in order until one returns a non-None ModuleSpec.

import sys

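for finder in sys.meta_path:
    # The default finders are classes, not instances, so prefer their own __name__
    print(getattr(finder, "__name__", type(finder).__name__))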

# BuiltinImporter - handles built-in modules (sys, builtins)
# FrozenImporter - handles frozen modules (compiled into the interpreter)
# PathFinder - handles regular files on sys.path

PathFinder is the one that searches sys.path and uses sys.path_hooks to find module files.
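The "ask each finder in order" loop can be approximated in a few lines. This is a simplified sketch, not the real implementation: first_matching_finder is a hypothetical helper, it skips the sys.modules check and all locking, and it only handles top-level names (path=None).

```python
import sys

def first_matching_finder(fullname, path=None):
    """Return (finder, spec) for the first meta path finder that claims fullname."""
    for finder in sys.meta_path:
        find_spec = getattr(finder, "find_spec", None)
        if find_spec is None:
            continue  # skip anything that does not implement find_spec
        spec = find_spec(fullname, path)
        if spec is not None:
            return finder, spec
    return None, None

for name in ("sys", "json", "no_such_module_xyz"):
    finder, spec = first_matching_finder(name)
    if spec is None:
        print(f"{name}: no finder claimed it")
    else:
        label = getattr(finder, "__name__", type(finder).__name__)
        print(f"{name}: {label} (origin={spec.origin!r})")
```

When every finder returns None, the real import system raises ModuleNotFoundError; the sketch just reports that nothing matched.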

Part 3 - Finders and Loaders: The Two-Step Protocol

The import system separates finding a module from loading it. This separation lets you intercept either step.

Step 1: Finder → ModuleSpec

A finder's find_spec(fullname, path, target) returns a ModuleSpec object that describes the module - where it lives, which loader to use, whether it is a package, etc.

import importlib.util

# Use the public API to replicate what Python does internally
spec = importlib.util.find_spec("json")
print(spec.name) # json
print(spec.origin) # /usr/lib/python3.12/json/__init__.py
print(spec.submodule_search_locations) # ['/usr/lib/python3.12/json'] - it's a package
print(type(spec.loader).__name__) # SourceFileLoader

Step 2: Loader → Module Object

The loader's create_module(spec) creates the module object (usually returns None to use the default), and exec_module(module) executes the module's code in the module's namespace.

import importlib.util
import sys

def load_module_from_spec(spec):
    """Manually execute the full load protocol."""
    module = importlib.util.module_from_spec(spec)
    sys.modules[spec.name] = module  # add to cache BEFORE exec (important for circular imports)
    spec.loader.exec_module(module)
    return module

spec = importlib.util.find_spec("textwrap")
textwrap = load_module_from_spec(spec)
print(textwrap.fill("Hello world", width=40))

Part 4 - importlib: Dynamic Imports

importlib.import_module(): Runtime Dynamic Imports

importlib.import_module() is the public API for programmatic imports - use it when the module name is only known at runtime.

import importlib

# Import by string name - same as 'import json'
json = importlib.import_module("json")
print(json.dumps({"key": "value"}))

# Import a submodule using dotted name
path_module = importlib.import_module("os.path")
print(path_module.join("/usr", "local", "bin"))

# Import relative to a package (like 'from . import sibling')
# The 'package' argument is required for relative imports
# importlib.import_module(".sibling", package="mypackage")
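The commented-out relative form can be tried against a standard-library package. The sketch below uses json as the anchor package purely for illustration; pkg.sub and pkg.utils are hypothetical names that do not need to exist, because resolve_name() only manipulates strings.

```python
import importlib
import importlib.util
import json.decoder

# resolve_name() shows which absolute name a relative name maps to.
absolute_name = importlib.util.resolve_name(".decoder", "json")
parent_relative = importlib.util.resolve_name("..utils", "pkg.sub")

# The relative import itself: equivalent to 'from . import decoder' inside json.
decoder = importlib.import_module(".decoder", package="json")

# Without the 'package' anchor a relative name cannot be resolved.
try:
    importlib.import_module(".decoder")
except TypeError as exc:
    missing_anchor_error = str(exc)

print(absolute_name)            # json.decoder
print(parent_relative)          # pkg.utils
print(decoder is json.decoder)  # True
```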
tip

Use importlib.import_module(name) for plugin systems where module names are not known at write time. A plugin system typically reads plugin names from a config file, then imports each one dynamically. This is how Django's INSTALLED_APPS works: Django iterates the list and calls importlib.import_module(app_name) for each entry.

A minimal plugin system:

import importlib
from typing import Protocol

class Plugin(Protocol):
    def execute(self, data: dict) -> dict: ...

def load_plugins(plugin_names: list[str]) -> list[Plugin]:
    plugins = []
    for name in plugin_names:
        module = importlib.import_module(name)
        if not hasattr(module, "Plugin"):
            raise ImportError(f"Module '{name}' has no 'Plugin' class")
        plugins.append(module.Plugin())
    return plugins

# config.toml might contain:
# plugins = ["myapp.plugins.csv_exporter", "myapp.plugins.json_exporter"]

importlib.util.spec_from_file_location(): Load a File by Path

When you have a path to a .py file that is not on sys.path, use spec_from_file_location:

import importlib.util
import sys

def import_file(module_name: str, file_path: str):
    """Import a Python file from an absolute path, bypassing sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    # spec is None when no loader can be determined (e.g. an unknown file suffix)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot find module at {file_path!r}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register before exec, as the import system does
    spec.loader.exec_module(module)
    return module

# Load a plugin from a user-defined path
plugin = import_file("user_plugin", "/home/user/my_custom_plugin.py")
plugin.run()

pytest's --import-mode=importlib option loads test modules in a similar way, from their file paths rather than through a modified sys.path.
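A self-contained way to try path-based loading is to write a file into a temporary directory and load it from there. The file name, the module name user_plugin_demo, and the run() function below are made up for the sketch.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    plugin_path = Path(tmp, "my_custom_plugin.py")
    plugin_path.write_text("def run():\n    return 'plugin ran'\n")

    # Build a spec straight from the file path, then create and execute the module.
    spec = importlib.util.spec_from_file_location("user_plugin_demo", str(plugin_path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    result = module.run()
    on_sys_path = tmp in sys.path               # the directory was never added
    cached = "user_plugin_demo" in sys.modules  # nothing was registered for us

print(result, on_sys_path, cached)  # plugin ran False False
```

Note that module_from_spec() and exec_module() do not touch sys.modules. Registering the module there is a separate, deliberate step, needed only if other code should be able to import it by name.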

Part 5 - Relative vs Absolute Imports

Absolute Imports (default in Python 3)

Absolute imports always start from the top of the package hierarchy:

# In myproject/services/email.py
from myproject.models.user import User # absolute - always works
from myproject.utils.formatting import format_email # absolute

Relative Imports

Relative imports use dots to express position within the current package:

# In myproject/services/email.py

from . import formatting # from myproject/services/formatting.py
from .formatting import format_email # same, specific name
from .. import models # up one level: myproject/models/__init__.py
from ..models.user import User # up one level, then down into models

| Syntax | Meaning | When to Use |
| --- | --- | --- |
| from . import x | sibling module in same package | intra-package references |
| from .. import x | parent package | referencing a package-level export |
| from .sub import x | child submodule | accessing a subpackage |
| import mypackage.mod | absolute | cross-package references |

note

Relative imports only work inside packages - a file that is run directly as python myfile.py has __name__ == "__main__" and no __package__, so relative imports raise ImportError: attempted relative import with no known parent package. Use python -m mypackage.myfile to run a module inside a package while preserving its package context.

Part 6 - Packages: __init__.py and __all__

__init__.py: The Package Marker

A directory containing __init__.py is a regular package. The __init__.py runs when the package is first imported.

mypackage/
    __init__.py          ← runs on 'import mypackage'
    models.py
    services/
        __init__.py      ← runs on 'import mypackage.services'
        email.py
        billing.py

__init__.py controls the package's public interface:

# mypackage/__init__.py

# Re-export key names so users can write:
# from mypackage import User, EmailService
# instead of:
# from mypackage.models import User
# from mypackage.services.email import EmailService

from .models import User
from .services.email import EmailService

# Set the package version
__version__ = "2.4.1"

# Optionally control star imports
__all__ = ["User", "EmailService", "__version__"]
note

__init__.py is not required in Python 3.3+ for namespace packages - packages that span multiple directories with no __init__.py. However, __init__.py is still best practice for regular packages: it makes the package intent explicit, enables star-import control via __all__, and ensures consistent behaviour across all Python implementations.
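To see a namespace package in action, the sketch below builds two separate directories that each contribute one module to the same package name. ns_demo_pkg, alpha, and beta are hypothetical names, and both directories are temporary.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root_a, tempfile.TemporaryDirectory() as root_b:
    # Each root holds an 'ns_demo_pkg' directory WITHOUT an __init__.py.
    for root, mod, value in ((root_a, "alpha", 1), (root_b, "beta", 2)):
        portion = Path(root, "ns_demo_pkg")
        portion.mkdir()
        (portion / f"{mod}.py").write_text(f"VALUE = {value}\n")

    sys.path[:0] = [root_a, root_b]
    importlib.invalidate_caches()
    try:
        alpha = importlib.import_module("ns_demo_pkg.alpha")
        beta = importlib.import_module("ns_demo_pkg.beta")
        pkg = sys.modules["ns_demo_pkg"]
        portions = len(pkg.__path__)  # one entry per contributing directory
        has_file = getattr(pkg, "__file__", None) is not None
        total = alpha.VALUE + beta.VALUE
    finally:
        sys.path.remove(root_a)
        sys.path.remove(root_b)
        for name in [n for n in sys.modules if n.split(".")[0] == "ns_demo_pkg"]:
            del sys.modules[name]

print(portions, has_file, total)  # 2 False 3
```

The package object has no file of its own; its __path__ simply lists every directory that contributes a portion.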

__all__: Controlling Star Imports

__all__ is a list[str] that controls what from module import * imports. It does not restrict direct attribute access.

# utils.py
__all__ = ["public_function", "PublicClass"]

def public_function():
    return "I'm public"

def _private_helper():  # not in __all__ - excluded from star import
    return "I'm private"

class PublicClass:
    pass

class _InternalClass:  # not in __all__ - excluded from star import
    pass

SECRET = "do not export"  # not in __all__ - excluded even without a leading underscore

# In another module:
from utils import *

public_function()  # works - in __all__
PublicClass()  # works - in __all__
_private_helper()  # NameError - not imported by star import

import utils
utils._private_helper()  # works - __all__ does NOT restrict direct access
utils.SECRET  # works - only the star import skips names missing from __all__
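The effect of __all__ is easiest to see side by side with a module that does not define it. In this sketch, all_demo_with, all_demo_without, and the star_import() helper are hypothetical names; the helper runs the star import in a fresh namespace and lists what arrived.

```python
import sys
import tempfile
from pathlib import Path

BODY = (
    "def public_function():\n    return 'public'\n"
    "def _private_helper():\n    return 'private'\n"
    "SECRET = 'do not export'\n"
)

def star_import(module_name):
    """Run 'from <module_name> import *' in a fresh namespace and list the names."""
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    return sorted(name for name in namespace if name != "__builtins__")

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "all_demo_with.py").write_text("__all__ = ['public_function']\n" + BODY)
    Path(tmp, "all_demo_without.py").write_text(BODY)
    sys.path.insert(0, tmp)
    try:
        with_all = star_import("all_demo_with")
        without_all = star_import("all_demo_without")
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("all_demo_with", None)
        sys.modules.pop("all_demo_without", None)

print(with_all)     # ['public_function']
print(without_all)  # ['SECRET', 'public_function']
```

Without __all__, the star import falls back to every name that does not start with an underscore, which is why SECRET leaks out of the second module.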

Part 7 - Circular Imports: The Definitive Explanation

Circular imports are one of the most confusing Python problems. Understanding why they fail requires understanding the import protocol's caching step.

Why Circular Imports Happen

# File: a.py
from b import B_value # triggers import of b

# File: b.py
from a import A_value # triggers import of a - but a is already being imported!

When Python imports a.py:

  1. Starts executing a.py
  2. Hits from b import B_value - starts importing b.py
  3. Starts executing b.py
  4. Hits from a import A_value
  5. Checks sys.modules["a"] - the module object IS there (added in step 1 before execution)
  6. But a has only been partially executed - A_value may not exist yet
  7. ImportError: cannot import name 'A_value' from partially initialized module 'a'
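The sequence above can be reproduced with two throwaway files. circ_a and circ_b are hypothetical module names written to a temporary directory; the exact wording of the error message may vary between Python versions.

```python
import importlib
import sys
import tempfile
from pathlib import Path

error_message = None

with tempfile.TemporaryDirectory() as tmp:
    # Each module needs a name from the other at import time.
    Path(tmp, "circ_a.py").write_text("from circ_b import B_value\nA_value = 1\n")
    Path(tmp, "circ_b.py").write_text("from circ_a import A_value\nB_value = 2\n")
    sys.path.insert(0, tmp)
    try:
        importlib.import_module("circ_a")
    except ImportError as exc:
        error_message = str(exc)
    finally:
        sys.path.remove(tmp)
        for name in ("circ_a", "circ_b"):
            sys.modules.pop(name, None)

print(error_message)
```

The failure is raised while circ_b executes: circ_a is already in sys.modules, but its A_value assignment has not run yet.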

How Python Handles Partially-Initialised Modules

# Python adds the module to sys.modules BEFORE executing its code.
# This happens before exec_module() runs, which prevents infinite recursion.
# But it means the module object in sys.modules may be incomplete.

sys.modules["a"] = module_a # added early - empty namespace
spec.loader.exec_module(module_a) # NOW the module code runs
# After exec_module, module_a.__dict__ is fully populated

This is why import a (deferred attribute access) often works where from a import A_value (immediate attribute access) fails in circular scenarios.

Circular Import Patterns: Fail vs Work

# Pattern 1: from-import at module level - FAILS
# a.py
from b import B_value  # needs B_value to exist at import time

# b.py
from a import A_value  # needs A_value to exist at import time
# → ImportError: circular import


# Pattern 2: import-module at module level - WORKS (usually)
# a.py
import b  # binds name 'b' - does not access b's attributes yet

def use_b():
    return b.B_value  # attribute access deferred to call time

# b.py
import a

def use_a():
    return a.A_value  # attribute access deferred to call time
# → Works because neither module accesses the other's attributes at import time


# Pattern 3: deferred import inside function - works as long as the
# function is called after both modules have finished loading
# a.py
def get_b_value():
    from b import B_value  # import happens when function is called, not at module load
    return B_value
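Pattern 2 can be verified the same way as the failing case, with two temporary files. circ2_a and circ2_b are hypothetical names; the only change from the failing pattern is that each module binds the other module object and reads its attributes later, inside a function.

```python
import importlib
import sys
import tempfile
from pathlib import Path

A_SOURCE = "import circ2_b\nA_value = 1\ndef use_b():\n    return circ2_b.B_value\n"
B_SOURCE = "import circ2_a\nB_value = 2\ndef use_a():\n    return circ2_a.A_value\n"

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "circ2_a.py").write_text(A_SOURCE)
    Path(tmp, "circ2_b.py").write_text(B_SOURCE)
    sys.path.insert(0, tmp)
    try:
        mod_a = importlib.import_module("circ2_a")  # the cycle resolves cleanly
        mod_b = sys.modules["circ2_b"]
        b_from_a = mod_a.use_b()  # both modules are fully initialised by now
        a_from_b = mod_b.use_a()
    finally:
        sys.path.remove(tmp)
        for name in ("circ2_a", "circ2_b"):
            sys.modules.pop(name, None)

print(b_from_a, a_from_b)  # 2 1
```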

How to Fix Circular Imports

warning

Circular imports are almost always a symptom of poor module structure - two modules that depend on each other probably share too much responsibility. The correct fix is to restructure, not to work around.

The three correct fixes, in order of preference:

1. Extract the shared dependency into a third module (best)

# Before: a ↔ b (circular)
# After: a → c ← b (no cycle)

# c.py - shared definitions
class SharedClass: ...
SHARED_CONSTANT = "value"

# a.py
from c import SharedClass, SHARED_CONSTANT

# b.py
from c import SharedClass, SHARED_CONSTANT

2. Move imports inside functions (acceptable)

# a.py
def make_b():
    from b import B  # deferred - not circular
    return B()

3. Use import module instead of from module import name (last resort)

# a.py
import b # works even if b is partially initialised at this point

def use_b():
    return b.B_value  # b is fully initialised by the time this runs
danger

from module import name at module level fails on circular imports because Python evaluates the from ... import name as attribute access immediately, when the module may be partially initialised. import module only binds the module object - which exists in sys.modules early - and defers attribute access until the name is actually used. If you must have a circular dependency, replace from a import A_value with import a and access a.A_value lazily.

Part 8 - Import Hooks: Custom Finders and Loaders

Python's import system is extensible. You can insert custom finders into sys.meta_path to intercept any import and load the module from an unconventional source - a database, a remote URL, an encrypted archive, or procedurally generated code.

A Complete Custom Finder and Loader

import sys
import importlib.abc
import importlib.machinery
import importlib.util

# Module source stored in a dict - simulates a database or remote source
_VIRTUAL_MODULES = {
    # The parent package must be importable too:
    # 'import virtual.config' imports 'virtual' first.
    "virtual": "",
    "virtual.config": """
CONFIG = {
    "debug": True,
    "version": "1.0.0",
    "db_url": "sqlite:///app.db",
}
""",
    "virtual.greet": """
def hello(name: str) -> str:
    return f"Hello from a virtual module, {name}!"
""",
}


class DictFinder(importlib.abc.MetaPathFinder):
    """Finder that loads modules from an in-memory dict."""

    def find_spec(self, fullname, path, target=None):
        if fullname not in _VIRTUAL_MODULES:
            return None  # not our module - let other finders try
        # A name is a package if other virtual modules live beneath it
        is_package = any(n.startswith(fullname + ".") for n in _VIRTUAL_MODULES)
        return importlib.machinery.ModuleSpec(
            name=fullname,
            loader=DictLoader(fullname),
            origin="<dict>",
            is_package=is_package,
        )


class DictLoader(importlib.abc.Loader):
    """Loader that executes module source from the dict."""

    def __init__(self, name: str):
        self.name = name

    def create_module(self, spec):
        return None  # use default module creation

    def exec_module(self, module):
        source = _VIRTUAL_MODULES[self.name]
        exec(compile(source, "<dict>", "exec"), module.__dict__)


# Register the custom finder at the FRONT of sys.meta_path
sys.meta_path.insert(0, DictFinder())

# Now these imports work without any .py files
import virtual.config
import virtual.greet

print(virtual.config.CONFIG["version"]) # 1.0.0
print(virtual.greet.hello("Alice")) # Hello from a virtual module, Alice!

Real-World Import Hook Use Cases

| Use Case | Description |
| --- | --- |
| Encrypted packages | Decrypt .pyc files on-the-fly during load |
| Remote module loading | Fetch module source from an internal package server |
| Database-stored plugins | Load plugin code stored as text in a database |
| Transpilers | Import .coffee, .ts, or .pyx files by compiling on import |
| Mock injection in tests | Replace real modules with test doubles via sys.modules |
| Jupyter notebooks | Import .ipynb notebooks as modules through a custom finder and loader |

Part 9 - importlib.reload(): Re-executing a Module

importlib.reload(module) re-executes the module's code in its existing namespace. It is used for:

  • Development REPLs where you edit a file and want to pick up changes without restarting
  • Plugin systems that support hot-reloading of extensions

import importlib
import mymodule

# Edit mymodule.py on disk...

importlib.reload(mymodule)
# mymodule.py is re-executed; new/changed names are updated in sys.modules["mymodule"]

What reload() Does and Does Not Do

# DOES:
# - Re-execute the module's code
# - Update names defined at module level (new functions, changed constants)
# - Update sys.modules[name] in place

# DOES NOT:
# - Create a new module object - the existing object is reused
# - Remove names that were deleted from the source file - old names linger
# - Affect names imported BEFORE the reload:

import importlib
import mymodule
from mymodule import some_function # binds to the OLD function object

importlib.reload(mymodule)

# mymodule.some_function is the NEW version
# but the local 'some_function' still points to the OLD version

# Fix: re-import after reload
some_function = mymodule.some_function # re-bind to new version
tip

For production plugin hot-reload, do not use importlib.reload() alone - also reset any registries or caches that stored references to the old module's objects. A pattern that works: reload the module, then call a module.register() function that re-registers all its components with your framework.
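The stale-reference behaviour is easy to reproduce. In the sketch below, reload_demo_mod and greet() are hypothetical names. The rewritten source deliberately has a different length and a later modification time, because the bytecode cache is validated against the source file's size and mtime and a same-size edit within the same second could otherwise go unnoticed.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "reload_demo_mod.py")
    source.write_text("def greet():\n    return 'v1'\n")
    sys.path.insert(0, tmp)
    try:
        mod = importlib.import_module("reload_demo_mod")
        old_greet = mod.greet  # plays the role of 'from mod import greet'

        # "Edit" the file on disk and make sure the change is detectable.
        source.write_text("def greet():\n    return 'version 2'\n")
        stat = source.stat()
        os.utime(source, (stat.st_atime, stat.st_mtime + 10))

        reloaded = importlib.reload(mod)
        same_module = reloaded is mod              # the module object is reused
        stale, fresh = old_greet(), mod.greet()
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("reload_demo_mod", None)

print(same_module, stale, fresh)  # True v1 version 2
```

Anything that captured old_greet before the reload, such as a registry, a callback list, or a from-import in another module, keeps calling the old function until it is re-bound.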

Graded Practice

Level 1 - Predict the Output

import sys

print("os" in sys.modules)

import os

print("os" in sys.modules)
print("os.path" in sys.modules)

import os.path

print("os" in sys.modules)
print("os.path" in sys.modules)
print(sys.modules["os"] is os)
Answer:
False (or True - 'os' may already be imported by startup code)
True
True (importing 'os' also imports 'os.path' as a side effect)
True
True
True

In practice, "os" in sys.modules is often True even before your explicit import os, because Python's startup sequence imports os internally. But the logic is: importing os triggers os.path as well (CPython's os.py imports posixpath, or ntpath on Windows, as path and registers it as sys.modules["os.path"]). After import os.path, both are in sys.modules. sys.modules["os"] is os is always True - same object.

Level 2 - Debug This Code

This plugin loader is supposed to load plugins from a directory and call their run() function. It fails with AttributeError: module 'plugin_a' has no attribute 'run'. Diagnose and fix it.

import importlib
import sys
import os

PLUGIN_DIR = "/opt/plugins"

def load_and_run_plugins():
    for filename in os.listdir(PLUGIN_DIR):
        if filename.endswith(".py"):
            module_name = filename[:-3]  # strip .py
            module = importlib.import_module(module_name)
            module.run()

load_and_run_plugins()
Answer:

Two bugs:

Bug 1: /opt/plugins is not on sys.path

importlib.import_module("plugin_a") searches sys.path. If /opt/plugins is not in sys.path, it finds some other plugin_a (or raises ModuleNotFoundError). If it accidentally finds a different module with the same name (e.g., from an installed package), the AttributeError makes sense - that other module has no run().

Bug 2: The module cache is not cleared between runs

If plugin_a was previously imported from somewhere else (e.g., sys.modules["plugin_a"] exists from a prior load), importlib.import_module returns the cached version without loading from /opt/plugins.

Fixed version:

import importlib.util
import sys
import os

PLUGIN_DIR = "/opt/plugins"

def load_and_run_plugins():
    for filename in os.listdir(PLUGIN_DIR):
        if not filename.endswith(".py"):
            continue
        module_name = filename[:-3]
        file_path = os.path.join(PLUGIN_DIR, filename)

        # Load by explicit path - bypasses sys.path search entirely
        spec = importlib.util.spec_from_file_location(module_name, file_path)
        if spec is None:
            print(f"Cannot load {file_path}")
            continue

        module = importlib.util.module_from_spec(spec)
        sys.modules[module_name] = module  # register before exec (prevents circular issues)
        spec.loader.exec_module(module)

        if not hasattr(module, "run"):
            raise AttributeError(f"Plugin '{module_name}' has no 'run()' function")
        module.run()

load_and_run_plugins()

spec_from_file_location loads from the exact file path, regardless of sys.path.

Level 3 - Design Challenge

You are building a Python test framework. You need to implement a module isolation feature: when running a test, any module the test imports should be loaded fresh (not from the sys.modules cache), and all changes to sys.modules made during the test should be reverted after the test completes - so one test's imports do not pollute another's.

Design and implement an IsolatedImportContext context manager that:

  1. On enter: saves the current sys.modules state
  2. During the context: allows normal imports, but they work on a copy
  3. On exit: restores sys.modules to its pre-enter state, removing anything added during the context

Then explain: why does pytest NOT do this by default, and what are the trade-offs?

Answer:

import sys
from types import ModuleType

class IsolatedImportContext:
    """
    Context manager that rolls back sys.modules to its state before the context.
    Useful for test isolation when tests import modules with global side effects.
    """

    def __init__(self):
        self._saved_modules: dict[str, ModuleType] = {}

    def __enter__(self):
        # Take a shallow snapshot of sys.modules: the names plus references to
        # the existing module objects (the modules themselves are not copied)
        self._saved_modules = dict(sys.modules)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Remove any modules added during the context
        keys_to_remove = set(sys.modules) - set(self._saved_modules)
        for key in keys_to_remove:
            del sys.modules[key]

        # Restore any modules that were present before but were removed or replaced
        for name, module in self._saved_modules.items():
            sys.modules[name] = module

        return False  # do not suppress exceptions


# Usage
with IsolatedImportContext():
    import json
    import textwrap
    # json and textwrap are importable here

# After the context, any module NOT already in sys.modules before
# the context is removed. If json was already cached before, it stays.
# If textwrap was newly imported, it is removed.

# Demonstrate isolation:
modules_before = set(sys.modules.keys())

with IsolatedImportContext():
    import csv  # not commonly pre-imported
    print("csv" in sys.modules)  # True - inside context

print("csv" in sys.modules)  # False - removed on exit (if it wasn't there before)

Why pytest does NOT do this by default:

  1. Performance: Removing and re-importing modules for every test would be very slow. Importing a large dependency can take tens to hundreds of milliseconds, so re-importing per test could turn a 5-second test suite into a multi-minute one.

  2. Shared expensive state: Database connection pools, ML model weights, and parsed configuration files are intentionally shared across tests. Isolating imports would require re-initialising all of these per test.

  3. Incorrect isolation level: True test isolation is about isolating state (database rows, file system changes, global variable mutations), not about which module objects are in memory. Two tests sharing the same module object is fine as long as neither test mutates the module's global state - which pytest addresses with fixtures and monkeypatching.

  4. Import-time side effects are the real problem: If a module has side effects on import (starts a thread, opens a socket, writes a file), the solution is to fix the module, not to hide the problem by re-importing it.

When isolation IS appropriate:

  • Testing import hooks and custom finders (you need a clean sys.meta_path)
  • Testing modules that deliberately mutate sys.modules
  • Testing importlib.reload() behaviour
  • Verifying that a module can be imported in isolation without all other modules present

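For these specific cases, pytest's monkeypatch fixture is usually enough: monkeypatch.syspath_prepend() adjusts sys.path, and monkeypatch.delitem(sys.modules, name) drops a cached module so that a following importlib.import_module() call loads it fresh. Both changes are undone when the test ends.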

Key Takeaways

  • import mymodule runs the module's code exactly once per interpreter session. Subsequent import statements return the cached object from sys.modules - no file I/O, no re-execution.
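  • The import resolution order is: the sys.modules cache first, then each finder on sys.meta_path in turn. By default that means built-in modules (BuiltinImporter), frozen modules (FrozenImporter), and finally the sys.path directories (PathFinder).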
  • sys.meta_path is a list of finder objects. You can insert custom finders to load modules from any source - databases, remote servers, encrypted archives, or generated code.
  • The finder/loader protocol is a clean two-step design: find_spec() locates the module and describes it; exec_module() executes the code in the module's namespace.
  • importlib.import_module(name) is the correct API for dynamic imports in plugin systems and frameworks. Use it whenever the module name is determined at runtime.
  • importlib.util.spec_from_file_location() loads a .py file by its absolute path, bypassing sys.path - essential for loading user plugins from non-standard locations.
  • Relative imports (from . import x, from .. import y) are for intra-package references. They only work when the module is loaded as part of a package (imported normally, or run with python -m package.module), not when the file is run directly as a script.
  • __init__.py makes a directory a package, runs on first import, and is the right place to define the package's public API via re-exports. It is still best practice even though Python 3.3+ allows namespace packages without it.
  • __all__ controls from module import * - it does not restrict direct attribute access (import module; module._private still works).
  • Circular imports fail at the from module import name level because the name may not exist yet in a partially-initialised module. Fix by restructuring (preferred), deferring to function scope, or using import module with lazy attribute access.
  • importlib.reload() re-executes a module's code in its existing namespace. Names imported before the reload (from module import name) still point to the old objects - you must re-bind them manually after reload.

What's Next

You have now completed Module 03 - Python Internals, covering CPython architecture, bytecode, disassembly, the GIL, reference counting, garbage collection, memory profiling, the sys and inspect modules, and the full import system.

Module 04 - Testing and Quality builds directly on this knowledge:

  • pytest internals - how fixture injection uses inspect.signature (you now understand the mechanism)
  • mocking and patching - how unittest.mock.patch manipulates sys.modules (you now understand the import cache)
  • coverage.py - how it uses sys.settrace to track which lines execute (you now understand the trace hook)
  • property-based testing with Hypothesis - generating test cases from type annotations (you now understand __annotations__ and inspect)

The internals knowledge you built in this module is not academic - it is the foundation that makes the testing tools legible rather than magical.

© 2026 EngineersOfAI. All rights reserved.