The Python Import System - importlib, Finders, Loaders, and Import Hooks
Reading time: ~35 minutes | Level: Intermediate → Engineering
Before reading further, predict what each line prints:
import sys
print("mymodule" in sys.modules) # ?
import mymodule # assume it exists
print("mymodule" in sys.modules) # ?
import mymodule # does this re-execute the module?
print(id(sys.modules["mymodule"]) == id(mymodule)) # ?
Show Answer
False
True
True
Before the first import mymodule, the name is absent from sys.modules. On the first import, Python finds the module file, creates a module object, stores it in sys.modules["mymodule"], and then executes the file's code in that object's namespace.
The second import mymodule does not load anything. Python checks sys.modules first, finds the name there, and binds the cached module object without reading or executing the file again. The module's top-level code runs once per interpreter session, no matter how many files import it (unless the module is explicitly reloaded or removed from sys.modules).
id(sys.modules["mymodule"]) == id(mymodule) is True because import mymodule simply binds the local name mymodule to the object already in sys.modules - it is the same object in memory.
This caching behaviour is the reason import is cheap after the first time, and the reason global state defined at module level (like a database connection pool or a logger) is shared across all importers.
The import system is one of Python's most consequential design decisions. Every framework, plugin system, test runner, and build tool depends on it. Understanding it at depth lets you write plugin systems, diagnose import errors, build dynamic module loaders, and reason about why circular imports fail in unexpected ways.
What You Will Learn
- The import statement desugared to __import__() and importlib
- Import resolution order: sys.modules cache → built-ins → sys.path finders
- sys.path, PYTHONPATH, and sys.meta_path
- The finder/loader two-step protocol: find_spec → create_module + exec_module
- importlib.import_module() for runtime dynamic imports
- importlib.util.spec_from_file_location() for loading files by path
- Relative vs absolute imports and when to use each
- __init__.py, __all__, and namespace packages
- Circular imports: why they happen and how to fix them
- Custom import hooks: loading modules from non-standard sources
- importlib.reload(): re-executing a module
Prerequisites
- Lesson 08 (sys and inspect) - sys.modules and sys.meta_path are sys-module concepts
- Basic understanding of Python packages and the file system layout
- Lesson 05 (Reference Counting) - understanding why module objects persist
Part 1 - The import Statement Desugared
Every import statement is syntactic sugar for __import__(). The interpreter translates:
import json
into approximately:
json = __import__("json", globals(), locals(), [], 0)
And:
from os.path import join, exists
into:
_temp = __import__("os.path", globals(), locals(), ["join", "exists"], 0)
join = _temp.join
exists = _temp.exists
__import__() is itself a thin wrapper around importlib._bootstrap._find_and_load(). importlib was added in Python 3.1, and since Python 3.3 the import machinery itself is implemented in importlib. You can use importlib directly in your own code for more control.
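One detail of the desugaring is easy to miss: with an empty fromlist, __import__("os.path") returns the top-level os package rather than os.path, because import os.path has to bind the name os. The following check is a small sketch using only the standard library; the variable names are illustrative.

```python
import importlib
import os
import os.path

# Empty fromlist: __import__ returns the TOP-LEVEL package,
# because 'import os.path' must bind the name 'os'.
top = __import__("os.path")

# Non-empty fromlist: __import__ returns the submodule itself,
# which is what 'from os.path import join' needs.
leaf = __import__("os.path", fromlist=["join"])

# importlib.import_module returns the module named by the full dotted path.
dynamic = importlib.import_module("os.path")

print(top is os)           # True
print(leaf is os.path)     # True
print(dynamic is os.path)  # True
```

This difference is one reason importlib.import_module() is usually the more convenient call in application code.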
Part 2 - Import Resolution Order
sys.path and PYTHONPATH
sys.path is the list of directories Python searches for importable modules. It is populated at startup from:
- The directory containing the script being run (or "" for interactive mode, meaning the current working directory)
- The PYTHONPATH environment variable (colon-separated on Unix, semicolon-separated on Windows)
- Installation-dependent defaults (site-packages, standard library directories)
import sys
# Show the current search path
for i, path in enumerate(sys.path):
    print(f"[{i}] {path!r}")
# [0] '' # current directory
# [1] '/usr/lib/python312.zip'
# [2] '/usr/lib/python3.12'
# [3] '/usr/lib/python3.12/lib-dynload'
# [4] '/usr/local/lib/python3.12/dist-packages'
# Set PYTHONPATH before running Python
PYTHONPATH=/opt/my-libs:/opt/other-libs python my_script.py
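sys.path is an ordinary list, so it can also be changed while the program runs. The sketch below writes a throwaway module into a temporary directory and shows that it only becomes importable once that directory is on sys.path. The module name path_demo_module is hypothetical, chosen to avoid clashing with anything installed.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "path_demo_module"  # hypothetical name used only for this demo

with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, f"{MODULE_NAME}.py").write_text("ANSWER = 42\n")

    # The directory is not on sys.path yet, so no finder can locate the module.
    found_before = importlib.util.find_spec(MODULE_NAME) is not None

    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()  # recommended after creating files at runtime
    try:
        module = importlib.import_module(MODULE_NAME)
        answer = module.ANSWER
    finally:
        # Undo the changes so the demo leaves no trace behind.
        sys.path.remove(tmp_dir)
        sys.modules.pop(MODULE_NAME, None)

print(found_before)  # False
print(answer)        # 42
```

Mutating sys.path at runtime affects every later import in the process, which is why the sketch removes the entry again in a finally block.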
sys.meta_path: The Finder List
sys.meta_path is a list of finder objects. Python calls finder.find_spec(fullname, path, target) on each one in order until one returns a non-None ModuleSpec.
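import sys
for finder in sys.meta_path:
    # The default finders are classes, not instances, so read __name__ first;
    # type(finder).__name__ alone would print 'type' for each of them
    print(getattr(finder, "__name__", type(finder).__name__))
# BuiltinImporter - handles built-in modules (sys, builtins)
# FrozenImporter - handles frozen modules (compiled into the interpreter)
# PathFinder - handles regular files on sys.path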
PathFinder is the one that searches sys.path and uses sys.path_hooks to find module files.
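The search over sys.meta_path can be imitated by hand. The helper below, locate(), is a hypothetical name for this sketch and only handles top-level module names (it always passes path=None); it returns the first finder that produces a spec.

```python
import sys

def locate(fullname):
    """Return (finder_name, spec) from the first meta path finder that claims fullname."""
    for finder in sys.meta_path:
        find_spec = getattr(finder, "find_spec", None)
        if find_spec is None:  # skip objects that do not implement find_spec
            continue
        spec = find_spec(fullname, None, None)  # path=None for top-level names
        if spec is not None:
            # The default finders are classes, so prefer __name__ when present.
            name = getattr(finder, "__name__", type(finder).__name__)
            return name, spec
    return None, None

builtin_finder, builtin_spec = locate("sys")
path_finder, path_spec = locate("json")

print(builtin_finder, builtin_spec.origin)  # on stock CPython: BuiltinImporter built-in
print(path_finder, path_spec.origin)        # a finder name and the path to json/__init__.py
```

The real machinery does more than this (locking, parent package handling, caching in sys.modules), so treat the loop as an illustration of the search order only.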
Part 3 - Finders and Loaders: The Two-Step Protocol
The import system separates finding a module from loading it. This separation lets you intercept either step.
Step 1: Finder → ModuleSpec
A finder's find_spec(fullname, path, target) returns a ModuleSpec object that describes the module - where it lives, which loader to use, whether it is a package, etc.
import importlib.util
# Use the public API to replicate what Python does internally
spec = importlib.util.find_spec("json")
print(spec.name) # json
print(spec.origin) # /usr/lib/python3.12/json/__init__.py
print(spec.submodule_search_locations) # ['/usr/lib/python3.12/json'] - it's a package
print(type(spec.loader).__name__) # SourceFileLoader
Step 2: Loader → Module Object
The loader's create_module(spec) creates the module object (usually returns None to use the default), and exec_module(module) executes the module's code in the module's namespace.
import importlib.util
import sys
def load_module_from_spec(spec):
    """Manually execute the full load protocol."""
    module = importlib.util.module_from_spec(spec)
    sys.modules[spec.name] = module # add to cache BEFORE exec (important for circular imports)
    spec.loader.exec_module(module)
    return module
spec = importlib.util.find_spec("textwrap")
textwrap = load_module_from_spec(spec)
print(textwrap.fill("Hello world", width=40))
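To see that creating and executing really are separate steps, the sketch below inspects a module object between the two calls. It uses colorsys only because it is a small pure-Python standard library module; the copy created here is deliberately not registered in sys.modules.

```python
import importlib.util

spec = importlib.util.find_spec("colorsys")     # small pure-Python stdlib module
module = importlib.util.module_from_spec(spec)  # create the module object only

# Import-related attributes come from the spec, but no module code has run yet.
populated_before = hasattr(module, "rgb_to_hsv")

spec.loader.exec_module(module)                 # now execute the module body
populated_after = hasattr(module, "rgb_to_hsv")

print(module.__spec__ is spec)  # True
print(populated_before)         # False
print(populated_after)          # True
```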
Part 4 - importlib: Dynamic Imports
importlib.import_module(): Runtime Dynamic Imports
importlib.import_module() is the public API for programmatic imports - use it when the module name is only known at runtime.
import importlib
# Import by string name - same as 'import json'
json = importlib.import_module("json")
print(json.dumps({"key": "value"}))
# Import a submodule using dotted name
path_module = importlib.import_module("os.path")
print(path_module.join("/usr", "local", "bin"))
# Import relative to a package (like 'from . import sibling')
# The 'package' argument is required for relative imports
# importlib.import_module(".sibling", package="mypackage")
Use importlib.import_module(name) for plugin systems where module names are not known at write time. A plugin system typically reads plugin names from a config file, then imports each one dynamically. This is how Django's INSTALLED_APPS works: Django iterates the list and calls importlib.import_module(app_name) for each entry.
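The relative form of import_module() can be tried against standard library packages. This is a minimal sketch; json and xml are used only because they ship with Python.

```python
import importlib
import json.decoder

# Equivalent to 'from . import decoder' written inside the json package.
relative = importlib.import_module(".decoder", package="json")

# Two dots climb one level: seen from xml.dom, '..sax' resolves to xml.sax.
sibling = importlib.import_module("..sax", package="xml.dom")

print(relative is json.decoder)  # True
print(sibling.__name__)          # xml.sax
```

Passing a relative name without the package argument raises TypeError, because there is no anchor to resolve the dots against.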
A minimal plugin system:
import importlib
from typing import Protocol
class Plugin(Protocol):
    def execute(self, data: dict) -> dict: ...
def load_plugins(plugin_names: list[str]) -> list[Plugin]:
    plugins = []
    for name in plugin_names:
        module = importlib.import_module(name)
        if not hasattr(module, "Plugin"):
            raise ImportError(f"Module '{name}' has no 'Plugin' class")
        plugins.append(module.Plugin())
    return plugins
# config.toml might contain:
# plugins = ["myapp.plugins.csv_exporter", "myapp.plugins.json_exporter"]
importlib.util.spec_from_file_location(): Load a File by Path
When you have a path to a .py file that is not on sys.path, use spec_from_file_location:
import importlib.util
import sys
def import_file(module_name: str, file_path: str):
    """Import a Python file from an absolute path, bypassing sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None:
        raise ImportError(f"Cannot find module at {file_path!r}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
# Load a plugin from a user-defined path
plugin = import_file("user_plugin", "/home/user/my_custom_plugin.py")
plugin.run()
pytest's --import-mode=importlib takes a similar approach: it imports test files by path instead of modifying sys.path.
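A runnable variant of the same idea, written against a temporary file so it can be tried anywhere. The file name, the module name user_plugin_demo, and the run() function are all hypothetical. Unlike the shorter version, this sketch also removes the module from sys.modules if its code raises, so a half-initialised module is not left in the cache.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    plugin_path = Path(tmp_dir, "my_custom_plugin.py")  # hypothetical plugin file
    plugin_path.write_text("def run():\n    return 'plugin ran'\n")

    spec = importlib.util.spec_from_file_location("user_plugin_demo", str(plugin_path))
    # spec (or spec.loader) can be None, e.g. for an unrecognised file extension.
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {plugin_path}")

    module = importlib.util.module_from_spec(spec)
    sys.modules[spec.name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialised module in the cache.
        sys.modules.pop(spec.name, None)
        raise

    result = module.run()
    on_sys_path = tmp_dir in sys.path
    sys.modules.pop(spec.name, None)  # cleanup for this demo only

print(result)       # plugin ran
print(on_sys_path)  # False
```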
Part 5 - Relative vs Absolute Imports
Absolute Imports (default in Python 3)
Absolute imports always start from the top of the package hierarchy:
# In myproject/services/email.py
from myproject.models.user import User # absolute - always works
from myproject.utils.formatting import format_email # absolute
Relative Imports
Relative imports use dots to express position within the current package:
# In myproject/services/email.py
from . import formatting # from myproject/services/formatting.py
from .formatting import format_email # same, specific name
from .. import models # up one level: myproject/models/__init__.py
from ..models.user import User # up one level, then down into models
| Syntax | Meaning | When to Use |
|---|---|---|
| from . import x | sibling module in same package | intra-package references |
| from .. import x | parent package | referencing a package-level export |
| from .sub import x | child submodule | accessing a subpackage |
| import mypackage.mod | absolute | cross-package references |
Relative imports only work inside packages - a file that is run directly as python myfile.py has __name__ == "__main__" and no __package__, so relative imports raise ImportError: attempted relative import with no known parent package. Use python -m mypackage.myfile to run a module inside a package while preserving its package context.
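The dot arithmetic can be checked without creating any files: importlib.util.resolve_name() converts a relative name into an absolute one, given the package the importing module belongs to. The package name below mirrors the myproject layout used in this part and is purely illustrative; nothing is imported.

```python
import importlib.util

# __package__ of myproject/services/email.py would be "myproject.services".
package = "myproject.services"

same_level = importlib.util.resolve_name(".formatting", package)
one_up = importlib.util.resolve_name("..models.user", package)
absolute = importlib.util.resolve_name("myproject.utils", package)

print(same_level)  # myproject.services.formatting
print(one_up)      # myproject.models.user
print(absolute)    # myproject.utils (absolute names pass through unchanged)
```

A script run directly has no package to anchor the dots to, which is exactly the condition that produces the "no known parent package" error.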
Part 6 - Packages: __init__.py and __all__
__init__.py: The Package Marker
A directory containing __init__.py is a regular package. The __init__.py runs when the package is first imported.
mypackage/
    __init__.py ← runs on 'import mypackage'
    models.py
    services/
        __init__.py ← runs on 'import mypackage.services'
        email.py
        billing.py
__init__.py controls the package's public interface:
# mypackage/__init__.py
# Re-export key names so users can write:
# from mypackage import User, EmailService
# instead of:
# from mypackage.models import User
# from mypackage.services.email import EmailService
from .models import User
from .services.email import EmailService
# Set the package version
__version__ = "2.4.1"
# Optionally control star imports
__all__ = ["User", "EmailService", "__version__"]
__init__.py is not required in Python 3.3+ for namespace packages - packages that span multiple directories with no __init__.py. However, __init__.py is still best practice for regular packages: it makes the package intent explicit, enables star-import control via __all__, and ensures consistent behaviour across all Python implementations.
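A namespace package can be observed directly. The sketch below creates two separate directory roots, each holding a piece of the same package with no __init__.py, and shows that the package's __path__ spans both. All names (root_one, root_two, ns_demo_pkg) are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

added_paths = []
with tempfile.TemporaryDirectory() as tmp_dir:
    # Two separate roots each contribute one module to the same package name.
    for root, member in (("root_one", "alpha"), ("root_two", "beta")):
        pkg_dir = Path(tmp_dir, root, "ns_demo_pkg")  # note: no __init__.py
        pkg_dir.mkdir(parents=True)
        (pkg_dir / f"{member}.py").write_text(f"NAME = {member!r}\n")
        added_paths.append(str(Path(tmp_dir, root)))
    sys.path.extend(added_paths)
    importlib.invalidate_caches()

    try:
        alpha = importlib.import_module("ns_demo_pkg.alpha")
        beta = importlib.import_module("ns_demo_pkg.beta")
        package = sys.modules["ns_demo_pkg"]
        search_locations = list(package.__path__)
    finally:
        # Undo the sys.path and sys.modules changes made for the demo.
        for path in added_paths:
            sys.path.remove(path)
        for name in ("ns_demo_pkg", "ns_demo_pkg.alpha", "ns_demo_pkg.beta"):
            sys.modules.pop(name, None)

print(len(search_locations))  # 2
print(alpha.NAME, beta.NAME)  # alpha beta
```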
__all__: Controlling Star Imports
__all__ is a list[str] that controls what from module import * imports. It does not restrict direct attribute access.
# utils.py
__all__ = ["public_function", "PublicClass"]
def public_function():
    return "I'm public"
def _private_helper(): # not in __all__ - excluded from star import
    return "I'm private"
class PublicClass:
    pass
class _InternalClass: # not in __all__ - excluded from star import
    pass
SECRET = "do not export" # not in __all__ - also excluded, even without a leading underscore
# In another file or the REPL:
from utils import *
public_function() # works - in __all__
PublicClass() # works - in __all__
_private_helper() # NameError - not imported by star import
import utils
utils._private_helper() # works - __all__ does NOT restrict direct access
utils.SECRET # works - excluded from the star import, but still a module attribute
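The star-import rule can be verified without writing any files. This sketch builds a module object in memory, registers it under the hypothetical name all_demo, and runs the star import into a scratch namespace so the imported names can be listed.

```python
import sys
import types

SOURCE = '''
__all__ = ["public_function"]

def public_function():
    return "public"

def _private_helper():
    return "private"

SECRET = "not exported"
'''

demo = types.ModuleType("all_demo")  # hypothetical module name
exec(SOURCE, demo.__dict__)          # populate the module's namespace
sys.modules["all_demo"] = demo       # make it importable by name

namespace = {}
exec("from all_demo import *", namespace)  # star import into a scratch namespace
imported = sorted(name for name in namespace if name != "__builtins__")
sys.modules.pop("all_demo", None)          # cleanup for this demo only

print(imported)     # ['public_function']
print(demo.SECRET)  # not exported
```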
Part 7 - Circular Imports: The Definitive Explanation
Circular imports are one of the most confusing Python problems. Understanding why they fail requires understanding the import protocol's caching step.
Why Circular Imports Happen
# File: a.py
from b import B_value # triggers import of b
# File: b.py
from a import A_value # triggers import of a - but a is already being imported!
When Python imports a.py:
1. Starts executing a.py
2. Hits from b import B_value - starts importing b.py
3. Starts executing b.py
4. Hits from a import A_value
5. Checks sys.modules["a"] - the module object IS there (added in step 1 before execution)
6. But a has only been partially executed - A_value may not exist yet
7. ImportError: cannot import name 'A_value' from partially initialized module 'a'
How Python Handles Partially-Initialised Modules
# Python adds the module to sys.modules BEFORE executing its code.
# This is step 3 in the load protocol - it prevents infinite recursion.
# But it means the module object in sys.modules may be incomplete.
sys.modules["a"] = module_a # added early - empty namespace
spec.loader.exec_module(module_a) # NOW the module code runs
# After exec_module, module_a.__dict__ is fully populated
This is why import a (deferred attribute access) often works where from a import A_value (immediate attribute access) fails in circular scenarios.
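The failure is easy to reproduce in a sandbox. The sketch below writes two mutually importing modules into a temporary directory and captures the resulting error message. The names circ_a and circ_b are hypothetical stand-ins for a.py and b.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Each module needs a name from the other at import time.
    Path(tmp_dir, "circ_a.py").write_text("from circ_b import B_value\nA_value = 1\n")
    Path(tmp_dir, "circ_b.py").write_text("from circ_a import A_value\nB_value = 2\n")

    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    message = None
    try:
        importlib.import_module("circ_a")
    except ImportError as exc:
        message = str(exc)
    finally:
        # Undo the sys.path and sys.modules changes made for the demo.
        sys.path.remove(tmp_dir)
        for name in ("circ_a", "circ_b"):
            sys.modules.pop(name, None)

# The message names the missing attribute and the partially initialised module.
print(message)
```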
Circular Import Patterns: Fail vs Work
# Pattern 1: from-import at module level - FAILS
# a.py
from b import B_value # needs B_value to exist at import time
# b.py
from a import A_value # needs A_value to exist at import time
# → ImportError: cannot import name 'A_value' from partially initialized module 'a'
# Pattern 2: import-module at module level - WORKS (usually)
# a.py
import b # binds name 'b' - does not access b's attributes yet
def use_b():
    return b.B_value # attribute access deferred to call time
# b.py
import a
def use_a():
    return a.A_value # attribute access deferred to call time
# → Works because neither module accesses the other's attributes at import time
# Pattern 3: deferred import inside function - works as long as the function
# is not called while the modules are still being imported
# a.py
def get_b_value():
    from b import B_value # import happens when function is called, not at module load
    return B_value
How to Fix Circular Imports
Circular imports are almost always a symptom of poor module structure - two modules that depend on each other probably share too much responsibility. The correct fix is to restructure, not to work around.
The three correct fixes, in order of preference:
1. Extract the shared dependency into a third module (best)
# Before: a ↔ b (circular)
# After: a → c ← b (no cycle)
# c.py - shared definitions
class SharedClass: ...
SHARED_CONSTANT = "value"
# a.py
from c import SharedClass, SHARED_CONSTANT
# b.py
from c import SharedClass, SHARED_CONSTANT
2. Move imports inside functions (acceptable)
# a.py
def make_b():
    from b import B # deferred - not circular
    return B()
3. Use import module instead of from module import name (last resort)
# a.py
import b # works even if b is partially initialised at this point
def use_b():
    return b.B_value # b is fully initialised by the time this runs
from module import name at module level fails on circular imports because Python evaluates the from ... import name as attribute access immediately, when the module may be partially initialised. import module only binds the module object - which exists in sys.modules early - and defers attribute access until the name is actually used. If you must have a circular dependency, replace from a import A_value with import a and access a.A_value lazily.
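The lazy-access pattern can be checked the same way. In this sketch, two temporary modules import each other with plain import statements and only touch each other's attributes inside functions. The names lazy_a and lazy_b are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, "lazy_a.py").write_text(
        "import lazy_b\n"
        "A_value = 1\n"
        "def use_b():\n"
        "    return lazy_b.B_value\n"
    )
    Path(tmp_dir, "lazy_b.py").write_text(
        "import lazy_a\n"
        "B_value = 2\n"
        "def use_a():\n"
        "    return lazy_a.A_value\n"
    )

    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    try:
        lazy_a = importlib.import_module("lazy_a")  # also imports lazy_b
        lazy_b = sys.modules["lazy_b"]
        # Both modules are fully initialised by the time the functions run.
        results = (lazy_a.use_b(), lazy_b.use_a())
    finally:
        sys.path.remove(tmp_dir)
        for name in ("lazy_a", "lazy_b"):
            sys.modules.pop(name, None)

print(results)  # (2, 1)
```

The cycle is still there, so this remains a workaround: calling use_a() at the top level of lazy_b.py would bring the original error back.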
Part 8 - Import Hooks: Custom Finders and Loaders
Python's import system is extensible. You can insert custom finders into sys.meta_path to intercept any import and load the module from an unconventional source - a database, a remote URL, an encrypted archive, or procedurally generated code.
A Complete Custom Finder and Loader
import sys
import importlib.abc
import importlib.machinery
import importlib.util
# Module source stored in a dict - simulates a database or remote source
_VIRTUAL_MODULES = {
    # Parent package entry: 'import virtual.config' imports 'virtual' first,
    # so the finder must be able to supply it as a package
    "virtual": "",
    "virtual.config": """
CONFIG = {
    "debug": True,
    "version": "1.0.0",
    "db_url": "sqlite:///app.db",
}
""",
    "virtual.greet": """
def hello(name: str) -> str:
    return f"Hello from a virtual module, {name}!"
""",
}
class DictFinder(importlib.abc.MetaPathFinder):
    """Finder that loads modules from an in-memory dict."""
    def find_spec(self, fullname, path, target=None):
        if fullname in _VIRTUAL_MODULES:
            return importlib.machinery.ModuleSpec(
                name=fullname,
                loader=DictLoader(fullname),
                origin="<dict>",
                # Mark 'virtual' as a package so its submodules can be imported
                is_package=(fullname == "virtual"),
            )
        return None # not our module - let other finders try
class DictLoader(importlib.abc.Loader):
    """Loader that executes module source from the dict."""
    def __init__(self, name: str):
        self.name = name
    def create_module(self, spec):
        return None # use default module creation
    def exec_module(self, module):
        source = _VIRTUAL_MODULES[self.name]
        exec(compile(source, "<dict>", "exec"), module.__dict__)
# Register the custom finder at the FRONT of sys.meta_path
sys.meta_path.insert(0, DictFinder())
# Now these imports work without any .py files
import virtual.config
import virtual.greet
print(virtual.config.CONFIG["version"]) # 1.0.0
print(virtual.greet.hello("Alice")) # Hello from a virtual module, Alice!
Real-World Import Hook Use Cases
| Use Case | Description |
|---|---|
| Encrypted packages | Decrypt .pyc files on-the-fly during load |
| Remote module loading | Fetch module source from an internal package server |
| Database-stored plugins | Load plugin code stored as text in a database |
| Transpilers | Import .coffee, .ts, or .pyx files by compiling on import |
| Mock injection in tests | Replace real modules with test doubles via sys.modules |
| Jupyter notebooks | Import .ipynb notebooks as modules through a custom finder and loader |
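A finder does not have to load anything. One that always returns None can simply observe which names pass through the import system, which is a handy way to debug what a piece of code imports. The class name ImportRecorder is hypothetical, and colorsys is used only as a small standard library module that is safe to re-import.

```python
import importlib
import sys
from importlib.abc import MetaPathFinder

class ImportRecorder(MetaPathFinder):
    """Records every module name the import system asks about, loads nothing."""
    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append(fullname)
        return None  # decline, so the regular finders handle the import

recorder = ImportRecorder()
sys.meta_path.insert(0, recorder)
original = sys.modules.pop("colorsys", None)  # force a real (uncached) import
try:
    importlib.import_module("colorsys")
finally:
    sys.meta_path.remove(recorder)
    if original is not None:
        sys.modules["colorsys"] = original    # put the original module back

print("colorsys" in recorder.seen)  # True
```

Finders are only consulted on a cache miss, which is why the sketch removes colorsys from sys.modules first.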
Part 9 - importlib.reload(): Re-executing a Module
importlib.reload(module) re-executes the module's code in its existing namespace. It is used for:
- Development REPLs where you edit a file and want to pick up changes without restarting
- Plugin systems that support hot-reloading of extensions
import importlib
import mymodule
# Edit mymodule.py on disk...
importlib.reload(mymodule)
# mymodule.py is re-executed; new/changed names are updated in sys.modules["mymodule"]
What reload() Does and Does Not Do
# DOES:
# - Re-execute the module's code
# - Update names defined at module level (new functions, changed constants)
# - Update sys.modules[name] in place
# DOES NOT:
# - Create a new module object - the existing object is reused
# - Affect names imported BEFORE the reload:
import mymodule
from mymodule import some_function # binds to the OLD function object
importlib.reload(mymodule)
# mymodule.some_function is the NEW version
# but the local 'some_function' still points to the OLD version
# Fix: re-import after reload
some_function = mymodule.some_function # re-bind to new version
For production plugin hot-reload, do not use importlib.reload() alone - also reset any registries or caches that stored references to the old module's objects. A pattern that works: reload the module, then call a module.register() function that re-registers all its components with your framework.
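The stale-reference behaviour is easy to observe with a temporary module. The module name reload_demo and its greet() function are hypothetical. The rewritten source is deliberately a different length from the original, so cached bytecode cannot be mistaken for current even if both writes land on the same timestamp.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    source = Path(tmp_dir, "reload_demo.py")  # hypothetical module name
    source.write_text("def greet():\n    return 'v1'\n")

    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("reload_demo")
        old_greet = module.greet  # same effect as 'from reload_demo import greet'

        # Simulate editing the file on disk, then reload.
        source.write_text("def greet():\n    return 'version 2'\n")
        reloaded = importlib.reload(module)

        same_object = reloaded is module
        results = (old_greet(), module.greet())
    finally:
        sys.path.remove(tmp_dir)
        sys.modules.pop("reload_demo", None)

print(same_object)  # True
print(results)      # ('v1', 'version 2')
```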
Graded Practice
Level 1 - Predict the Output
import sys
print("os" in sys.modules)
import os
print("os" in sys.modules)
print("os.path" in sys.modules)
import os.path
print("os" in sys.modules)
print("os.path" in sys.modules)
print(sys.modules["os"] is os)
Show Answer
False (or True - 'os' may already be imported by startup code)
True
True (importing 'os' also imports 'os.path' as a side effect)
True
True
True
In practice, "os" in sys.modules is often True even before your explicit import os, because Python's startup sequence imports os internally. But the logic is: importing os triggers os.path as well (CPython's os.py imports posixpath as path). After import os.path, both are in sys.modules. sys.modules["os"] is os is always True - same object.
Level 2 - Debug This Code
This plugin loader is supposed to load plugins from a directory and call their run() function. It fails with AttributeError: module 'plugin_a' has no attribute 'run'. Diagnose and fix it.
import importlib
import sys
import os
PLUGIN_DIR = "/opt/plugins"
def load_and_run_plugins():
    for filename in os.listdir(PLUGIN_DIR):
        if filename.endswith(".py"):
            module_name = filename[:-3] # strip .py
            module = importlib.import_module(module_name)
            module.run()
load_and_run_plugins()
Show Answer
Two bugs:
Bug 1: /opt/plugins is not on sys.path
importlib.import_module("plugin_a") searches sys.path. If /opt/plugins is not in sys.path, it finds some other plugin_a (or raises ModuleNotFoundError). If it accidentally finds a different module with the same name (e.g., from an installed package), the AttributeError makes sense - that other module has no run().
Bug 2: The module cache is not cleared between runs
If plugin_a was previously imported from somewhere else (e.g., sys.modules["plugin_a"] exists from a prior load), importlib.import_module returns the cached version without loading from /opt/plugins.
Fixed version:
import importlib.util
import sys
import os
PLUGIN_DIR = "/opt/plugins"
def load_and_run_plugins():
    for filename in os.listdir(PLUGIN_DIR):
        if not filename.endswith(".py"):
            continue
        module_name = filename[:-3]
        file_path = os.path.join(PLUGIN_DIR, filename)
        # Load by explicit path - bypasses sys.path search entirely
        spec = importlib.util.spec_from_file_location(module_name, file_path)
        if spec is None:
            print(f"Cannot load {file_path}")
            continue
        module = importlib.util.module_from_spec(spec)
        sys.modules[module_name] = module # register before exec (prevents circular issues)
        spec.loader.exec_module(module)
        if not hasattr(module, "run"):
            raise AttributeError(f"Plugin '{module_name}' has no 'run()' function")
        module.run()
load_and_run_plugins()
spec_from_file_location loads from the exact file path, regardless of sys.path.
Level 3 - Design Challenge
You are building a Python test framework. You need to implement a module isolation feature: when running a test, any module the test imports should be loaded fresh (not from the sys.modules cache), and all changes to sys.modules made during the test should be reverted after the test completes - so one test's imports do not pollute another's.
Design and implement an IsolatedImportContext context manager that:
- On enter: saves the current sys.modules state
- During the context: allows normal imports, but they work on a copy
- On exit: restores sys.modules to its pre-enter state, removing anything added during the context
Then explain: why does pytest NOT do this by default, and what are the trade-offs?
Show Answer
import sys
from types import ModuleType
class IsolatedImportContext:
    """
    Context manager that rolls back sys.modules to its state before the context.
    Useful for test isolation when tests import modules with global side effects.
    """
    def __init__(self):
        self._saved_modules: dict[str, ModuleType] = {}
    def __enter__(self):
        # Shallow-copy sys.modules: the snapshot holds the same names and the
        # same module objects, so the modules themselves are not duplicated
        self._saved_modules = dict(sys.modules)
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        # Remove any modules added during the context
        keys_to_remove = set(sys.modules) - set(self._saved_modules)
        for key in keys_to_remove:
            del sys.modules[key]
        # Restore any modules that were present before but were removed or replaced
        for name, module in self._saved_modules.items():
            sys.modules[name] = module
        return False # do not suppress exceptions
# Usage
with IsolatedImportContext():
    import json
    import textwrap
    # json and textwrap are importable here
# After the context, any module NOT already in sys.modules before
# the context is removed. If json was already cached before, it stays.
# If textwrap was newly imported, it is removed.
# Demonstrate isolation:
modules_before = set(sys.modules.keys())
with IsolatedImportContext():
    import csv # not commonly pre-imported
    print("csv" in sys.modules) # True - inside context
print("csv" in sys.modules) # False - removed on exit (if it wasn't there before)
Why pytest does NOT do this by default:
- Performance: Removing and re-importing modules for every test would be slow. Import cost varies widely, from a fraction of a millisecond for a small module to hundreds of milliseconds for a large package, and paying it again for every test adds up quickly across a large suite.
- Shared expensive state: Database connection pools, ML model weights, and parsed configuration files are intentionally shared across tests. Isolating imports would require re-initialising all of these per test.
- Incorrect isolation level: True test isolation is about isolating state (database rows, file system changes, global variable mutations), not about which module objects are in memory. Two tests sharing the same module object is fine as long as neither test mutates the module's global state - which pytest addresses with fixtures and monkeypatching.
- Import-time side effects are the real problem: If a module has side effects on import (starts a thread, opens a socket, writes a file), the solution is to fix the module, not to hide the problem by re-importing it.
When isolation IS appropriate:
- Testing import hooks and custom finders (you need a clean sys.meta_path)
- Testing modules that deliberately mutate sys.modules
- Testing importlib.reload() behaviour
- Verifying that a module can be imported in isolation without all other modules present
For these specific cases, pytest's monkeypatch fixture provides syspath_prepend() for sys.path and setitem()/delitem() for sys.modules. Combine them with importlib.import_module() to force a fresh import that is undone after the test.
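Outside pytest, the standard library offers a ready-made snapshot-and-restore for the same purpose: unittest.mock.patch.dict() applied to sys.modules. The sketch below registers a fake module inside the context and shows that it is gone afterwards. The module name isolated_demo_module is hypothetical.

```python
import sys
import types
from unittest import mock

FAKE_NAME = "isolated_demo_module"  # hypothetical name used only in this sketch

with mock.patch.dict(sys.modules):
    # Anything added to (or removed from) sys.modules here is rolled back on exit.
    sys.modules[FAKE_NAME] = types.ModuleType(FAKE_NAME)
    inside = FAKE_NAME in sys.modules

outside = FAKE_NAME in sys.modules

print(inside)   # True
print(outside)  # False
```

As with the hand-written context manager, this restores only the cache; side effects that a module performed while being imported are not undone.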
Key Takeaways
- import mymodule runs the module's code exactly once per interpreter session. Subsequent import statements return the cached object from sys.modules - no file I/O, no re-execution.
- The import resolution order is: sys.modules cache → sys.meta_path finders in order (BuiltinImporter for built-in modules, FrozenImporter for frozen modules, then PathFinder, which searches the sys.path directories).
- sys.meta_path is a list of finder objects. You can insert custom finders to load modules from any source - databases, remote servers, encrypted archives, or generated code.
- The finder/loader protocol is a clean two-step design: find_spec() locates the module and describes it; exec_module() executes the code in the module's namespace.
- importlib.import_module(name) is the correct API for dynamic imports in plugin systems and frameworks. Use it whenever the module name is determined at runtime.
- importlib.util.spec_from_file_location() loads a .py file by its path, bypassing sys.path - essential for loading user plugins from non-standard locations.
- Relative imports (from . import x, from .. import y) are for intra-package references. They only work when the file is imported or run as part of a package (python -m package.module), not as a script.
- __init__.py makes a directory a regular package, runs on first import, and is the right place to define the package's public API via re-exports. It is still best practice even though Python 3.3+ allows namespace packages without it.
- __all__ controls from module import * - it does not restrict direct attribute access (import module; module._private still works).
- Circular imports fail at the from module import name level because the name may not exist yet in a partially-initialised module. Fix by restructuring (preferred), deferring to function scope, or using import module with lazy attribute access.
- importlib.reload() re-executes a module's code in its existing namespace. Names imported before the reload (from module import name) still point to the old objects - you must re-bind them manually after reload.
What's Next
You have now completed Module 03 - Python Internals, covering CPython architecture, bytecode, disassembly, the GIL, reference counting, garbage collection, memory profiling, the sys and inspect modules, and the full import system.
Module 04 - Testing and Quality builds directly on this knowledge:
- pytest internals - how fixture injection uses inspect.signature (you now understand the mechanism)
- mocking and patching - how unittest.mock.patch replaces attributes on modules cached in sys.modules (you now understand the import cache)
- coverage.py - how it uses sys.settrace to track which lines execute (you now understand the trace hook)
- property-based testing with Hypothesis - generating test cases from type annotations (you now understand __annotations__ and inspect)
The internals knowledge you built in this module is not academic - it is the foundation that makes the testing tools legible rather than magical.
