Dataclasses - Code Generation, Immutability, and Production Patterns
Reading time: ~28 minutes | Level: Intermediate → Engineering
Before reading further, predict whether these two classes are equivalent:
# Version A
class Point:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y
    def __repr__(self):
        return f"Point(x={self.x!r}, y={self.y!r})"
    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)
# Version B
from dataclasses import dataclass
@dataclass
class Point:
    x: float
    y: float
They are functionally equivalent for everyday use - @dataclass generates all three methods from the field declarations. (One subtle difference: the generated __eq__ requires both objects to be of exactly the same class, where Version A uses isinstance.) But this is just the entry point. @dataclass can also generate __hash__, __lt__, __le__, __gt__, __ge__, reject unsafe mutable defaults, enforce immutability, run post-initialisation validation, and more - all through decorator parameters and field().
The lesson is not about saving keystrokes. It is about understanding what the generator produces, knowing its edge cases, and using it correctly in production code.
What You Will Learn
- What @dataclass generates and exactly how each generated method works
- field() - controlling individual field behaviour (defaults, factories, repr, compare, hash)
- frozen=True - how it implements immutability and where it breaks down
- __post_init__ - running validation and computed fields after generation
- ClassVar vs InitVar - fields that are not instance attributes
- Inheritance with dataclasses - what the field ordering rules are and where they fail
- @dataclass(order=True) - comparison method generation and its implications
- Comparing dataclasses to NamedTuple and attrs - when to use each
- Production patterns: FastAPI request models, config objects, value objects
- Performance considerations
Prerequisites
- Lessons 01–09 of this module
- Understanding of __init__, __repr__, __eq__, __hash__ from Lessons 02 and 03
- Familiarity with type hints
Part 1 - What @dataclass Generates
The Basic Case
from dataclasses import dataclass
@dataclass
class Product:
    name: str
    price: float
    quantity: int = 0
The decorator generates:
# Generated __init__
def __init__(self, name: str, price: float, quantity: int = 0):
    self.name = name
    self.price = price
    self.quantity = quantity
# Generated __repr__
def __repr__(self):
    return f"Product(name={self.name!r}, price={self.price!r}, quantity={self.quantity!r})"
# Generated __eq__
def __eq__(self, other):
    if other.__class__ is self.__class__:
        return (self.name, self.price, self.quantity) == (other.name, other.price, other.quantity)
    return NotImplemented
The ordering of __init__ parameters matches the field declaration order. Fields with defaults must come after fields without defaults - same rule as function arguments.
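One consequence of the other.__class__ is self.__class__ check in the generated __eq__ is easy to miss: a subclass instance never compares equal to a parent instance, even when every field matches. The sketch below illustrates this with a hypothetical DiscountedProduct subclass (not part of the lesson's code); a hand-written isinstance check would behave differently.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float


@dataclass
class DiscountedProduct(Product):
    # Hypothetical subclass that adds no new fields.
    pass


base = Product("Widget", 9.99)
sub = DiscountedProduct("Widget", 9.99)

# Same field values, different classes: the generated __eq__ returns
# NotImplemented on both sides, so Python falls back to identity.
print(base == sub)                        # False
print(base == Product("Widget", 9.99))    # True
```

If equality across a class hierarchy is required, that is a case for writing __eq__ by hand.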
Inspecting the Generated Code
from dataclasses import dataclass, fields, asdict, astuple
import inspect
@dataclass
class Config:
    host: str
    port: int = 5432
    debug: bool = False
# Inspect the generated __init__
print(inspect.signature(Config.__init__))
# (self, host: str, port: int = 5432, debug: bool = False) -> None
# Inspect all fields
for f in fields(Config):
    print(f.name, f.type, f.default)
# host <class 'str'> <dataclasses._MISSING_TYPE object at 0x...>  (the MISSING sentinel: no default)
# port <class 'int'> 5432
# debug <class 'bool'> False
# Convert to dict and tuple
cfg = Config(host="localhost")
print(asdict(cfg)) # {'host': 'localhost', 'port': 5432, 'debug': False}
print(astuple(cfg)) # ('localhost', 5432, False)
fields(), asdict(), and astuple() are utility functions from the dataclasses module. fields() accepts a dataclass or an instance of one; asdict() and astuple() accept instances only.
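A detail worth knowing about these helpers is that asdict() recurses into nested dataclasses and rebuilds containers, so the result shares no lists with the original. The following is a minimal sketch with hypothetical Address and Customer classes chosen only for illustration.

```python
from dataclasses import dataclass, field, fields, asdict


@dataclass
class Address:
    city: str


@dataclass
class Customer:
    name: str
    address: Address
    tags: list = field(default_factory=list)


c = Customer("Alice", Address("Berlin"), ["vip"])

# fields() works on the class itself, not only on instances.
print([f.name for f in fields(Customer)])   # ['name', 'address', 'tags']

# asdict() converts nested dataclasses to dicts and copies lists.
d = asdict(c)
print(d)   # {'name': 'Alice', 'address': {'city': 'Berlin'}, 'tags': ['vip']}
print(d["tags"] is c.tags)                  # False - the list was copied
```

Because of the recursive copy, asdict() can be noticeably slower than building a dict by hand for large nested structures.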
The @dataclass Parameters
@dataclass(
    init=True,          # generate __init__ (default True)
    repr=True,          # generate __repr__ (default True)
    eq=True,            # generate __eq__ (default True)
    order=False,        # generate __lt__, __le__, __gt__, __ge__ (default False)
    unsafe_hash=False,  # generate __hash__ even if eq=True (default False)
    frozen=False,       # make instances immutable (default False)
    match_args=True,    # generate __match_args__ for pattern matching (Python 3.10+, default True)
    kw_only=False,      # all fields are keyword-only (Python 3.10+, default False)
    slots=False,        # use __slots__ (Python 3.10+, default False)
)
class MyClass:
    ...
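As one illustration of how these switches change behaviour, the sketch below turns off eq for a hypothetical Session class. With eq=False no __eq__ is generated, so comparison falls back to object identity and the inherited identity-based __hash__ stays in place, while __repr__ is still generated.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Session:
    # Hypothetical class where identity matters more than field values.
    user: str


a = Session("alice")
b = Session("alice")

print(a == b)        # False - no __eq__ generated, identity is compared
print(a == a)        # True
print(len({a, b}))   # 2 - the inherited identity-based __hash__ still works
print(a)             # Session(user='alice') - __repr__ is still generated
```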
Part 2 - field() and default_factory
Why field() Exists
The @dataclass decorator cannot allow mutable defaults directly. This is the same mutable class attribute problem from Lesson 01, but caught at definition time:
from dataclasses import dataclass
@dataclass
class BadRegistry:
    items: list = []  # ValueError: mutable default <class 'list'> for field items is not allowed: use default_factory
Python raises ValueError immediately. The fix is field(default_factory=...):
from dataclasses import dataclass, field
@dataclass
class Registry:
    items: list = field(default_factory=list) # each instance gets its OWN list
    metadata: dict = field(default_factory=dict)
r1 = Registry()
r2 = Registry()
r1.items.append("alpha")
print(r1.items) # ['alpha']
print(r2.items) # [] - separate list
default_factory is a zero-argument callable. It is called once per instance creation.
default_factory is required for mutable defaults like list and dict. Writing items: list = [] raises ValueError at class definition time - Python catches this immediately to prevent the shared-mutable-default bug. The check covers list, dict, and set (and, from Python 3.11, any unhashable default), so other mutable objects can still slip through. Always use field(default_factory=list) or field(default_factory=dict). For scalar immutable values (int, str, bool, None), a plain default is fine.
All field() Parameters
from dataclasses import dataclass, field
@dataclass
class Event:
    # name: included in __init__, __repr__, __eq__, __hash__
    title: str
    # default: scalar default value (for immutable values)
    priority: int = field(default=0)
    # default_factory: callable that produces the default (for mutable values)
    tags: list[str] = field(default_factory=list)
    # repr=False: exclude from __repr__ (useful for sensitive data)
    _internal_id: str = field(default="", repr=False)
    # compare=False: exclude from __eq__ and ordering comparisons
    created_at: str = field(default="", compare=False)
    # hash=False: exclude from __hash__ (only relevant when unsafe_hash=True or frozen=True)
    description: str = field(default="", hash=False)
    # init=False: not a constructor argument - set in __post_init__ instead
    word_count: int = field(default=0, init=False)
    def __post_init__(self):
        self.word_count = len(self.description.split())
e = Event(title="Launch", priority=1, tags=["product", "urgent"])
print(e)
# Event(title='Launch', priority=1, tags=['product', 'urgent'],
#       created_at='', description='', word_count=0)
# Note: _internal_id excluded from repr
Custom default_factory Functions
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid
def generate_id() -> str:
    return str(uuid.uuid4())
def current_timestamp() -> str:
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC datetime
    return datetime.now(timezone.utc).isoformat()
@dataclass
class Order:
    product_id: int
    quantity: int
    order_id: str = field(default_factory=generate_id)
    created_at: str = field(default_factory=current_timestamp)
    line_items: list = field(default_factory=list)
o1 = Order(product_id=42, quantity=3)
o2 = Order(product_id=99, quantity=1)
print(o1.order_id) # e.g. 'f47ac10b-58cc-4372-a567-0e02b2c3d479' (unique UUID)
print(o2.order_id) # e.g. 'a3bb189e-8bf9-3888-9912-ace4e6543002' (different UUID)
print(o1.order_id == o2.order_id) # False - each gets its own UUID
Part 3 - frozen=True and Immutability
What frozen=True Does
frozen=True generates __setattr__ and __delattr__ methods that raise FrozenInstanceError on any attempt to modify the instance after creation. Combined with the default eq=True, it also makes @dataclass generate __hash__ from the fields, so instances become hashable (a class that defines __eq__ without __hash__ is otherwise unhashable in Python 3).
from dataclasses import dataclass
@dataclass(frozen=True)
class Vector:
    x: float
    y: float
    def magnitude(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5
    def __add__(self, other: "Vector") -> "Vector":
        return Vector(self.x + other.x, self.y + other.y) # returns new Vector
v1 = Vector(3.0, 4.0)
v2 = Vector(1.0, 2.0)
print(v1.magnitude()) # 5.0
print(v1 + v2) # Vector(x=4.0, y=6.0)
# v1.x = 10.0 # FrozenInstanceError: cannot assign to field 'x'
# Frozen dataclasses are hashable
print(hash(v1)) # consistent hash
d = {v1: "first vector"}
print(d[Vector(3.0, 4.0)]) # "first vector" - equal frozen instances hash equally
Use frozen=True for value objects - types like Money, Point, Coordinate, or Config that should be treated as immutable values, not mutable state. Frozen dataclasses are hashable (as long as every field is hashable), usable as dictionary keys, and safe to share across threads without locks when their fields are themselves immutable. They communicate design intent: "this object should not change after creation."
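When handling a failed write, it helps to know that FrozenInstanceError is a subclass of AttributeError, so existing except AttributeError handlers catch it too. A minimal sketch, using a hypothetical Coordinate class:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float


c = Coordinate(52.52, 13.405)

try:
    c.lat = 0.0
except FrozenInstanceError as exc:
    print(f"{type(exc).__name__}: {exc}")   # FrozenInstanceError: cannot assign to field 'lat'

# FrozenInstanceError subclasses AttributeError, so a broader handler works too.
try:
    del c.lon
except AttributeError as exc:
    print(f"{type(exc).__name__}: {exc}")   # FrozenInstanceError: cannot delete field 'lon'

print(c)   # Coordinate(lat=52.52, lon=13.405) - unchanged
```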
The Limits of frozen=True
frozen=True prevents reassignment of attributes. It does not prevent mutation of mutable objects stored as attributes.
from dataclasses import dataclass, field
@dataclass(frozen=True)
class Config:
    host: str
    tags: list = field(default_factory=list) # mutable!
cfg = Config(host="localhost", tags=["web"])
# This raises FrozenInstanceError - correct
# cfg.host = "other"
# This WORKS - mutating the list object, not reassigning the attribute
cfg.tags.append("db")
print(cfg.tags) # ['web', 'db'] - mutation succeeded despite frozen=True
True immutability requires immutable container types:
from dataclasses import dataclass
@dataclass(frozen=True)
class ImmutableConfig:
    host: str
    tags: tuple = () # tuple - genuinely immutable
cfg = ImmutableConfig(host="localhost", tags=("web", "db"))
# cfg.tags.append("x") # AttributeError: 'tuple' object has no attribute 'append' - correct
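The choice of container also affects hashing. The generated __hash__ hashes a tuple of the field values, so a frozen dataclass that holds a list cannot actually be hashed. The sketch below uses two hypothetical classes, ListConfig and TupleConfig, to show the difference.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ListConfig:
    host: str
    tags: list = field(default_factory=list)


@dataclass(frozen=True)
class TupleConfig:
    host: str
    tags: tuple = ()


try:
    hash(ListConfig("localhost", ["web"]))
except TypeError as exc:
    print("ListConfig:", exc)   # ListConfig: unhashable type: 'list'

a = TupleConfig("localhost", ("web",))
b = TupleConfig("localhost", ("web",))
print(hash(a) == hash(b))       # True - equal values, equal hashes
```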
Using frozen=True for Dictionary Keys and Set Members
from dataclasses import dataclass
@dataclass(frozen=True)
class Point:
    x: float
    y: float
# Can be used as dict keys and set members
points = {Point(0, 0), Point(1, 1), Point(0, 0)}
print(len(points)) # 2 - duplicates removed
registry = {
    Point(0, 0): "origin",
    Point(1, 0): "unit x",
}
print(registry[Point(0, 0)]) # "origin"
Part 4 - __post_init__ for Validation and Computed Fields
__post_init__ runs immediately after the generated __init__ finishes assigning all fields. It is the correct place for input validation (raise ValueError early), field normalisation (strip/lowercase strings), and derived fields computed from other fields (e.g. word_count = len(description.split())). Do not write your own __init__ on a dataclass for this - a hand-written __init__ replaces the generated one.
Validation
from dataclasses import dataclass
from datetime import date
@dataclass
class DateRange:
    start: date
    end: date
    def __post_init__(self):
        if self.end < self.start:
            raise ValueError(
                f"end ({self.end}) must be >= start ({self.start})"
            )
    @property
    def duration_days(self) -> int:
        return (self.end - self.start).days
dr = DateRange(start=date(2024, 1, 1), end=date(2024, 12, 31))
print(dr.duration_days) # 365
# DateRange(start=date(2024, 12, 31), end=date(2024, 1, 1))
# ValueError: end (2024-01-01) must be >= start (2024-12-31)
Normalisation and Computed Fields
from dataclasses import dataclass, field
@dataclass
class EmailAddress:
    raw: str
    normalised: str = field(init=False, repr=True)
    domain: str = field(init=False, repr=True)
    def __post_init__(self):
        # Normalise: lowercase, strip whitespace
        self.normalised = self.raw.strip().lower()
        # Validate
        if "@" not in self.normalised:
            raise ValueError(f"Invalid email: {self.raw!r}")
        # Extract computed field
        _, self.domain = self.normalised.rsplit("@", 1)
e = EmailAddress(" Alice@Example.com ")  # illustrative address
print(e.domain) # example.com
print(e) # EmailAddress(raw=' Alice@Example.com ', normalised='alice@example.com', domain='example.com')
Type Coercion in __post_init__
from dataclasses import dataclass
@dataclass
class Port:
    number: int
    def __post_init__(self):
        # Coerce string input to int - useful when loading from environment variables
        self.number = int(self.number)
        if not (1 <= self.number <= 65535):
            raise ValueError(f"Port must be 1–65535, got {self.number}")
p = Port(number="8080") # string input
print(p.number) # 8080 (int)
print(type(p.number)) # <class 'int'>
Part 5 - ClassVar and InitVar
ClassVar - Class-Level Attributes
Fields annotated with ClassVar are excluded from __init__, __repr__, __eq__, and all dataclass operations. They are class attributes, not instance attributes.
from dataclasses import dataclass
from typing import ClassVar
@dataclass
class Employee:
    # ClassVar: class-level, not an instance field
    company: ClassVar[str] = "EngineersOfAI"
    max_salary: ClassVar[float] = 200_000.0
    # Instance fields
    name: str
    role: str
    salary: float
e = Employee(name="Alice", role="Engineer", salary=90_000)
print(e)
# Employee(name='Alice', role='Engineer', salary=90000)
# salary stays an int - dataclasses do not coerce values to the annotated type
# company and max_salary NOT in repr
print(Employee.company) # EngineersOfAI - class attribute
print(e.company) # EngineersOfAI - accessed via instance, lives on class
# ClassVar is excluded from __init__
# Employee(name="Alice", role="Engineer", salary=90000, company="Other")
# TypeError: __init__() got an unexpected keyword argument 'company'
InitVar - Constructor Parameters That Are Not Fields
InitVar creates a parameter in __init__ that is passed to __post_init__ but is not stored as an instance attribute.
from dataclasses import dataclass, InitVar
import hashlib
@dataclass
class HashedPassword:
    username: str
    # raw_password is a constructor param - passed to __post_init__ - NOT stored
    raw_password: InitVar[str]
    # password_hash IS stored
    password_hash: str = ""
    def __post_init__(self, raw_password: str):
        # Hash the password in post_init - raw_password is not stored.
        # SHA-256 keeps the example short; real systems should use a salted,
        # deliberately slow password hash (e.g. hashlib.scrypt).
        self.password_hash = hashlib.sha256(raw_password.encode()).hexdigest()
user = HashedPassword(username="alice", raw_password="secret123")
print(user.username) # alice
print(user.password_hash) # sha256 hash of "secret123"
# user.raw_password - AttributeError! Not stored.
print(user)
# HashedPassword(username='alice', password_hash='...')
# raw_password NOT in repr - it is not a field
InitVar is the correct pattern for:
- Passwords that should not be stored in plaintext
- One-time initialisation data (database connections, config files)
- Data used only to compute field values in __post_init__
Part 6 - Inheritance with Dataclasses
Basic Inheritance
from dataclasses import dataclass
@dataclass
class Base:
    id: int
    created_at: str = ""
@dataclass
class Child(Base):
    name: str = "" # Must have default - Base fields with defaults come first
# Child __init__ signature: (id: int, created_at: str = '', name: str = '')
# MRO order: Base fields first, then Child fields
c = Child(id=1, name="Alice")
print(c) # Child(id=1, created_at='', name='Alice')
The Non-Default After Default Problem
This is the most common dataclass inheritance mistake:
from dataclasses import dataclass
@dataclass
class Base:
    x: int = 0 # has default
@dataclass
class Child(Base):
    y: int # no default - ERROR!
# TypeError: non-default argument 'y' follows default argument
Python sees the effective __init__ as (x: int = 0, y: int) - a non-default argument after a default argument.
The fix: either give all child fields defaults, or restructure so base class fields have no defaults:
from dataclasses import dataclass
@dataclass
class Base:
    x: int # no default - OK
@dataclass
class Child(Base):
    y: int # no default - OK
    z: int = 0 # default - OK (at the end)
c = Child(x=1, y=2) # Child(x=1, y=2, z=0)
c = Child(x=1, y=2, z=3) # Child(x=1, y=2, z=3)
Inheritance with dataclasses: parent fields always come before child fields in the generated __init__. This means if any parent field has a default, ALL child fields must also have defaults - otherwise Python raises TypeError at class definition time. This "non-default after default" problem is the most common inheritance mistake with dataclasses. The solution is either: use kw_only=True (Python 3.10+), give all fields defaults, or restructure so only the last class in the hierarchy introduces defaults.
kw_only=True Solves the Ordering Problem (Python 3.10+)
from dataclasses import dataclass, field
@dataclass
class Base:
    x: int = field(default=0, kw_only=True) # keyword-only
@dataclass
class Child(Base):
    y: int # positional - can come "before" in effective init
# Works because kw_only fields don't participate in positional ordering
c = Child(y=5, x=10) # Child(x=10, y=5)
Or use @dataclass(kw_only=True) to make all fields keyword-only:
from dataclasses import dataclass
@dataclass(kw_only=True)
class Config:
    host: str
    port: int = 5432
    debug: bool = False
cfg = Config(host="localhost") # OK
cfg = Config(host="localhost", debug=True) # OK
# Config("localhost") # TypeError: __init__() takes 1 positional argument but 2 were given
Part 7 - order=True for Comparison
from dataclasses import dataclass
@dataclass(order=True)
class Version:
    major: int
    minor: int
    patch: int = 0
v1 = Version(1, 2, 3)
v2 = Version(1, 3, 0)
v3 = Version(2, 0, 0)
print(v1 < v2) # True - (1,2,3) < (1,3,0)
print(v2 < v3) # True - (1,3,0) < (2,0,0)
print(sorted([v3, v1, v2]))
# [Version(major=1, minor=2, patch=3), Version(major=1, minor=3, patch=0), Version(major=2, minor=0, patch=0)]
order=True generates __lt__, __le__, __gt__, __ge__ by comparing instances as tuples of their fields in declaration order. All fields participate unless field(compare=False) is used.
Note: order=True requires eq=True (the default). If eq=False, order=True raises ValueError.
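Both restrictions can be observed directly. The sketch below first triggers the ValueError for eq=False with order=True, then shows that ordering is only defined between instances of the same class; Broken and Release are hypothetical names used for illustration.

```python
from dataclasses import dataclass

# order=True without eq=True is rejected when the class is defined.
try:
    @dataclass(eq=False, order=True)
    class Broken:
        value: int
except ValueError as exc:
    print("ValueError:", exc)   # ValueError: eq must be true if order is true


@dataclass(order=True)
class Version:
    major: int
    minor: int


@dataclass(order=True)
class Release:
    major: int
    minor: int


print(Version(1, 0) < Version(1, 1))   # True

# Comparing across classes returns NotImplemented on both sides.
try:
    Version(1, 0) < Release(1, 0)
except TypeError as exc:
    print("TypeError:", exc)
```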
Excluding Fields from Comparison
from dataclasses import dataclass, field
@dataclass(order=True)
class Task:
    priority: int # compared first
    name: str # compared second
    description: str = field(default="", compare=False) # NOT compared
    created_at: str = field(default="", compare=False) # NOT compared
t1 = Task(priority=1, name="Alpha", description="Long description", created_at="2024-01-01")
t2 = Task(priority=1, name="Alpha", description="Different description", created_at="2024-06-01")
print(t1 == t2) # True - description and created_at excluded from comparison
print(t1 < t2) # False - equal on (priority, name)
Part 8 - Dataclasses vs NamedTuple vs attrs
NamedTuple - Immutable, Tuple-Based
from typing import NamedTuple
class Point(NamedTuple):
    x: float
    y: float
p = Point(3.0, 4.0)
print(p.x) # 3.0
print(p[0]) # 3.0 - indexable like a tuple
print(p._asdict()) # {'x': 3.0, 'y': 4.0}
NamedTuple instances are tuples - immutable by default, indexable, unpackable, usable anywhere a tuple is expected. Use NamedTuple when you need tuple semantics (unpacking, indexing, compatibility with tuple-expecting functions) plus named access.
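Tuple semantics also apply to equality, which can be surprising. In the sketch below, PointNT, SizeNT, and PointDC are hypothetical names: a NamedTuple compares equal to any tuple with the same values, while a dataclass only compares equal to instances of its own class.

```python
from dataclasses import dataclass
from typing import NamedTuple


class PointNT(NamedTuple):
    x: float
    y: float


class SizeNT(NamedTuple):
    width: float
    height: float


@dataclass(frozen=True)
class PointDC:
    x: float
    y: float


x, y = PointNT(3.0, 4.0)                        # unpacks like any tuple
print(x, y)                                     # 3.0 4.0

print(PointNT(3.0, 4.0) == (3.0, 4.0))          # True - it is a tuple
print(PointNT(3.0, 4.0) == SizeNT(3.0, 4.0))    # True - field names are ignored
print(PointDC(3.0, 4.0) == (3.0, 4.0))          # False - dataclass checks the class
```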
Side-by-Side Comparison
| Feature | @dataclass | NamedTuple | attrs |
|---|---|---|---|
| Mutability | Mutable by default | Immutable | Configurable |
| Inheritance | Full Python inheritance | Limited | Full |
| frozen=True | Yes | N/A (always frozen) | @define(frozen=True) |
| __slots__ | slots=True (3.10+) | Built-in | @define uses slots |
| Custom validators | __post_init__ | Manual | field(validator=...) or @<attribute>.validator |
| Performance | Good | Best (tuple) | Best (slots) |
| Indexable | No | Yes | No |
| Standard library | Yes | Yes | Third-party |
| Production use | Config, models, DTOs | Return values, records | High-performance models |
When to Use Each
- @dataclass: general-purpose data containers, config objects, request/response models, when you need mutability or complex inheritance
- NamedTuple: function return values, records where tuple unpacking is useful, small immutable data holders
- attrs: high-performance scenarios, complex validation requirements, when you need slots + validators without Python 3.10
Part 9 - Production Patterns
FastAPI Request and Response Models
FastAPI models are usually Pydantic classes rather than plain @dataclass classes, but the same request/response pattern illustrates how dataclasses are used at API boundaries:
from dataclasses import asdict, dataclass
import json
@dataclass
class CreateUserRequest:
    """Incoming request - validated in __post_init__."""
    username: str
    email: str
    password: str
    role: str = "user"
    def __post_init__(self):
        self.username = self.username.strip()
        self.email = self.email.strip().lower()
        if len(self.password) < 8:
            raise ValueError("Password must be at least 8 characters")
        if self.role not in ("user", "admin", "viewer"):
            raise ValueError(f"Invalid role: {self.role!r}")
@dataclass
class UserResponse:
    """Outgoing response - no password, no sensitive fields."""
    id: int
    username: str
    email: str
    role: str
    def to_dict(self) -> dict:
        return asdict(self)
    def to_json(self) -> str:
        return json.dumps(self.to_dict())
# Request/response flow
req = CreateUserRequest(
    username=" Alice ",
    email=" Alice@Example.com ",  # illustrative address
    password="supersecret123",
)
# Normalised in __post_init__
print(req.username) # "Alice"
resp = UserResponse(id=1, username=req.username, email=req.email, role=req.role)
print(resp.to_json())
# {"id": 1, "username": "Alice", "email": "alice@example.com", "role": "user"}
Configuration Objects
Dataclasses are excellent for typed configuration:
from dataclasses import dataclass, field
from typing import ClassVar
import os
@dataclass
class DatabaseConfig:
    host: str = field(default_factory=lambda: os.getenv("DB_HOST", "localhost"))
    port: int = field(default_factory=lambda: int(os.getenv("DB_PORT", "5432")))
    name: str = field(default_factory=lambda: os.getenv("DB_NAME", "myapp"))
    user: str = field(default_factory=lambda: os.getenv("DB_USER", "postgres"))
    password: str = field(
        default_factory=lambda: os.getenv("DB_PASSWORD", ""),
        repr=False # NEVER include password in repr
    )
    pool_size: int = 10
    max_overflow: int = 20
    DRIVER: ClassVar[str] = "postgresql"
    def __post_init__(self):
        if not self.host:
            raise ValueError("DB_HOST must be set")
    @property
    def url(self) -> str:
        return f"{self.DRIVER}://{self.user}:{self.password}@{self.host}:{self.port}/{self.name}"
@dataclass(frozen=True)
class AppConfig:
    """Immutable application config - loaded once at startup."""
    db: DatabaseConfig = field(default_factory=DatabaseConfig)
    debug: bool = field(default_factory=lambda: os.getenv("DEBUG", "false").lower() == "true")
    secret_key: str = field(
        default_factory=lambda: os.getenv("SECRET_KEY", "dev-secret"),
        repr=False
    )
    allowed_hosts: tuple = field(
        default_factory=lambda: tuple(os.getenv("ALLOWED_HOSTS", "localhost").split(","))
    )
config = AppConfig()
print(config.db.host) # localhost (or from env)
print(config.debug) # False (or from env)
print(config)
# AppConfig(db=DatabaseConfig(host='localhost', ...), debug=False, allowed_hosts=('localhost',))
# secret_key excluded from repr - repr=False
Value Objects with replace()
dataclasses.replace() creates a copy of a dataclass with selected fields changed - useful for immutable value objects.
from dataclasses import dataclass, replace
@dataclass(frozen=True)
class Money:
    amount: float
    currency: str = "USD"
    def __add__(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError(f"Cannot add {self.currency} and {other.currency}")
        return replace(self, amount=self.amount + other.amount)
    def __mul__(self, factor: float) -> "Money":
        return replace(self, amount=self.amount * factor)
    def convert(self, currency: str, rate: float) -> "Money":
        return Money(amount=self.amount * rate, currency=currency)
price = Money(amount=100.0)
tax = Money(amount=10.0)
total = price + tax
print(total) # Money(amount=110.0, currency='USD')
euro_price = price.convert("EUR", 0.92)
print(euro_price) # Money(amount=92.0, currency='EUR')
# replace() usage
doubled = replace(price, amount=price.amount * 2)
print(doubled) # Money(amount=200.0, currency='USD')
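One property of replace() that matters for value objects is that it builds the copy by calling __init__ again, so __post_init__ validation and computed fields are re-run for the new values. A minimal sketch, assuming a hypothetical Percentage class with a derived label field:

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Percentage:
    value: float
    label: str = field(init=False, default="")

    def __post_init__(self):
        if not (0.0 <= self.value <= 100.0):
            raise ValueError(f"value must be 0-100, got {self.value}")
        # Frozen instances need object.__setattr__ for computed fields.
        object.__setattr__(self, "label", f"{self.value:.0f}%")


p = Percentage(40.0)
q = replace(p, value=75.0)
print(q)   # Percentage(value=75.0, label='75%') - label was recomputed

try:
    replace(p, value=150.0)
except ValueError as exc:
    print("ValueError:", exc)   # ValueError: value must be 0-100, got 150.0
```

The copy is shallow: any mutable object held in a field is shared between the original and the copy.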
Part 10 - Performance Considerations
Memory: slots=True (Python 3.10+)
from dataclasses import dataclass
import sys
@dataclass
class PointDict:
    x: float
    y: float
@dataclass(slots=True)
class PointSlots:
    x: float
    y: float
pd = PointDict(1.0, 2.0)
ps = PointSlots(1.0, 2.0)
print(sys.getsizeof(pd) + sys.getsizeof(pd.__dict__)) # ~232 bytes
print(sys.getsizeof(ps)) # ~56 bytes - no __dict__
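The exact byte counts vary with the Python version and platform; what stays constant is that a slotted instance carries no per-instance __dict__. Use slots=True for dataclasses that are instantiated in large numbers (sensor readings, geometric points, event records).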
Instantiation Performance
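The generated __init__ is ordinary Python code, so creating an instance costs about the same as with a hand-written __init__. The decorator's own work (reading annotations and generating methods) is paid once, when the class is defined. Measure rather than assume: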
from dataclasses import dataclass
import timeit
@dataclass
class DataPoint:
    x: float
    y: float
    z: float = 0.0
class ManualPoint:
    def __init__(self, x: float, y: float, z: float = 0.0):
        self.x = x
        self.y = y
        self.z = z
dc_time = timeit.timeit(lambda: DataPoint(1.0, 2.0), number=1_000_000)
mp_time = timeit.timeit(lambda: ManualPoint(1.0, 2.0), number=1_000_000)
print(f"Dataclass: {dc_time:.3f}s")
print(f"Manual: {mp_time:.3f}s")
# Typical: the two timings are within measurement noise of each other
# frozen=True is slower to instantiate, because fields are set via object.__setattr__
For performance-critical code processing millions of objects per second, benchmark before choosing. In most application code, the difference is irrelevant.
Common Mistakes
Mistake 1 - Using a Mutable Default Directly
from dataclasses import dataclass, field
# Wrong: raises ValueError immediately
@dataclass
class Bad:
    items: list = []
# Right
@dataclass
class Good:
    items: list = field(default_factory=list)
Mistake 2 - Non-Default Field After Default Field in Subclass
@dataclass
class Base:
    x: int = 0 # default
@dataclass
class Child(Base):
    y: int # no default - TypeError at class definition time
Mistake 3 - Expecting frozen=True to Deep-Freeze Mutable Containers
@dataclass(frozen=True)
class Config:
    tags: list = field(default_factory=list)
cfg = Config()
cfg.tags.append("x") # Works! frozen prevents reassignment, not mutation
Use tuple instead of list if you need true immutability.
Mistake 4 - Forgetting That @dataclass Does Not Generate __hash__ By Default
When eq=True (the default), Python sets __hash__ = None, making the class unhashable - the same behaviour as manually defining __eq__. To get a hash, either use frozen=True or unsafe_hash=True.
@dataclass
class Point:
    x: float
    y: float
p = Point(1, 2)
hash(p) # TypeError: unhashable type: 'Point'
@dataclass(frozen=True)
class FrozenPoint:
    x: float
    y: float
hash(FrozenPoint(1, 2)) # works
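The name unsafe_hash is a warning, not decoration. On a mutable dataclass the hash is computed from the current field values, so changing a field after the object has been placed in a set or dict leaves it filed under a stale hash. A minimal sketch with a hypothetical Tag class:

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class Tag:
    name: str


t = Tag("draft")
seen = {t}
print(t in seen)     # True

t.name = "final"     # mutating the field changes the hash
print(t in seen)     # False - the set still files it under the old hash
print(len(seen))     # 1 - the object is still stored, just unreachable by lookup
```

Prefer frozen=True when instances need to be hashable; reserve unsafe_hash=True for classes that are immutable by convention.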
Engineering Checklist
Before moving to the next lesson, verify you can answer these without looking:
- What three dunder methods does @dataclass generate by default?
- Why can you not use a mutable value as a field default? What is the fix?
- What is the difference between field(repr=False), field(compare=False), and field(init=False)?
- What does frozen=True generate, and what does it NOT prevent?
- What is __post_init__ and when does it run?
- What is ClassVar and how does it differ from a regular field?
- What is InitVar and when would you use it?
- What is the non-default-after-default problem in dataclass inheritance? How do you fix it?
- When should you prefer NamedTuple over @dataclass?
- How do you create a copy of a frozen dataclass with one field changed?
Graded Practice Challenges
Level 1 - Predict the Output
Predict each output before running:
Question 1
from dataclasses import dataclass
@dataclass
class Item:
    name: str
    price: float = 0.0
a = Item("Widget", 9.99)
b = Item("Widget", 9.99)
print(a == b)
print(a is b)
Show Answer
True
False
@dataclass generates __eq__ that compares field values, so a == b is True. But a and b are two separate objects in memory, so a is b is False. This is the standard value-equality vs identity distinction.
Question 2
from dataclasses import dataclass, field
@dataclass
class Cart:
    items: list = field(default_factory=list)
c1 = Cart()
c2 = Cart()
c1.items.append("apple")
print(c1.items)
print(c2.items)
Show Answer
['apple']
[]
default_factory=list calls list() separately for each instance. c1 and c2 each get their own list object - mutations to one do not affect the other.
Question 3
from dataclasses import dataclass
@dataclass(frozen=True)
class Coord:
    x: int
    y: int
c = Coord(1, 2)
s = {c, Coord(1, 2), Coord(3, 4)}
print(len(s))
Show Answer
2
frozen=True makes the dataclass hashable. Coord(1, 2) and c are equal (same field values) and hash equally, so the set deduplicates them. Only two unique coordinates remain.
Question 4
from dataclasses import dataclass, field
@dataclass
class Record:
    value: int
    label: str = field(default="", repr=False)
    score: float = field(default=0.0, compare=False)
r1 = Record(value=10, label="A", score=99.9)
r2 = Record(value=10, label="B", score=0.0)
print(r1 == r2)
print(r1)
Show Answer
False
Record(value=10, score=99.9)
label has repr=False, which only hides it from __repr__; it still takes part in __eq__. score has compare=False, which removes it from __eq__. So __eq__ compares (value, label): the values match, but the labels differ ("A" vs "B"), so r1 == r2 is False. The repr shows value and score, with label hidden.
Key lesson: repr=False only affects __repr__. compare=False affects __eq__ and ordering. They are independent controls.
Question 5
from dataclasses import dataclass
@dataclass
class Node:
    value: int
    def __post_init__(self):
        self.doubled = self.value * 2
n = Node(5)
print(n.doubled)
print(n)
Show Answer
10
Node(value=5)
__post_init__ runs and sets self.doubled = 10. However, doubled was not declared as a field annotation, so __repr__ does not include it. Only declared fields appear in the generated __repr__.
Level 2 - Debug Challenge
This dataclass has three bugs. Find and fix all of them:
from dataclasses import dataclass, field
@dataclass(frozen=True)
class ShoppingCart:
    owner: str
    items: list = [] # Bug 1
    total: float = field(init=False) # Bug 2
    def __post_init__(self):
        self.total = sum(self.items) # Bug 3
Show Solution
Bug 1 - Mutable default:
items: list = [] raises ValueError: mutable default <class 'list'> for field items is not allowed: use default_factory. This applies to every dataclass, frozen or not. Fix with default_factory:
items: list = field(default_factory=list)
Bug 2 - field(init=False) without a default:
total: float = field(init=False) is legal, but it leaves the attribute unset until __post_init__ assigns it; any code that reads total before then raises AttributeError. Giving the field a default is the safer choice:
total: float = field(init=False, default=0.0)
Bug 3 - Assigning to a field in __post_init__ of a frozen dataclass:
frozen=True generates __setattr__ that raises FrozenInstanceError. You cannot assign to self.total directly. The fix is to use object.__setattr__:
def __post_init__(self):
    object.__setattr__(self, "total", sum(self.items))
Corrected class:
from dataclasses import dataclass, field
@dataclass(frozen=True)
class ShoppingCart:
    owner: str
    items: tuple = () # use tuple for true immutability
    total: float = field(init=False, default=0.0)
    def __post_init__(self):
        object.__setattr__(self, "total", sum(self.items))
cart = ShoppingCart(owner="Alice", items=(10.0, 25.50, 5.0))
print(cart.total) # 40.5
print(cart) # ShoppingCart(owner='Alice', items=(10.0, 25.5, 5.0), total=40.5)
Level 3 - Design Challenge
Design a typed configuration system for a microservice using dataclasses. The system must:
- Have a DatabaseConfig with host, port, name, user, and password (password excluded from repr)
- Have a CacheConfig with host, port, and TTL in seconds
- Have a top-level ServiceConfig that is frozen, holds both configs, and validates that port numbers are in range 1–65535
- Support loading from a dictionary (e.g., parsed from YAML/JSON)
- Use replace() to produce a "test mode" version with an in-memory database
Show Reference Solution
from dataclasses import dataclass, field, replace
from typing import ClassVar
@dataclass
class DatabaseConfig:
    host: str = "localhost"
    port: int = 5432
    name: str = "myservice"
    user: str = "postgres"
    password: str = field(default="", repr=False)
    def __post_init__(self):
        if not (1 <= self.port <= 65535):
            raise ValueError(f"DB port must be 1-65535, got {self.port}")
    @classmethod
    def from_dict(cls, data: dict) -> "DatabaseConfig":
        return cls(
            host=data.get("host", "localhost"),
            port=int(data.get("port", 5432)),
            name=data.get("name", "myservice"),
            user=data.get("user", "postgres"),
            password=data.get("password", ""),
        )
    @property
    def url(self) -> str:
        return f"postgresql://{self.user}:{self.password}@{self.host}:{self.port}/{self.name}"
@dataclass
class CacheConfig:
    host: str = "localhost"
    port: int = 6379
    ttl_seconds: int = 300
    def __post_init__(self):
        if not (1 <= self.port <= 65535):
            raise ValueError(f"Cache port must be 1-65535, got {self.port}")
        if self.ttl_seconds <= 0:
            raise ValueError(f"TTL must be positive, got {self.ttl_seconds}")
    @classmethod
    def from_dict(cls, data: dict) -> "CacheConfig":
        return cls(
            host=data.get("host", "localhost"),
            port=int(data.get("port", 6379)),
            ttl_seconds=int(data.get("ttl_seconds", 300)),
        )
@dataclass(frozen=True)
class ServiceConfig:
    service_name: str
    db: DatabaseConfig = field(default_factory=DatabaseConfig)
    cache: CacheConfig = field(default_factory=CacheConfig)
    debug: bool = False
    VERSION: ClassVar[str] = "1.0"
    @classmethod
    def from_dict(cls, data: dict) -> "ServiceConfig":
        return cls(
            service_name=data["service_name"],
            db=DatabaseConfig.from_dict(data.get("db", {})),
            cache=CacheConfig.from_dict(data.get("cache", {})),
            debug=data.get("debug", False),
        )
    def as_test_config(self) -> "ServiceConfig":
        """Return a copy configured for testing (in-memory database, 1-second cache TTL)."""
        # replace() re-runs __post_init__, so the port checks apply to the copies too
        test_db = replace(self.db, host="memory", name=":memory:", port=1)
        test_cache = replace(self.cache, ttl_seconds=1)
        return replace(self, db=test_db, cache=test_cache, debug=True)
# Load from a dict (simulating YAML parse)
raw = {
    "service_name": "order-service",
    "db": {"host": "db.prod.example.com", "port": "5432", "name": "orders", "password": "s3cr3t"},
    "cache": {"host": "redis.prod.example.com", "ttl_seconds": "600"},
    "debug": False,
}
config = ServiceConfig.from_dict(raw)
print(config.service_name) # order-service
print(config.db.host) # db.prod.example.com
print(config.db) # DatabaseConfig(host='db.prod.example.com', ...) - no password
print(config.cache.ttl_seconds) # 600
test_config = config.as_test_config()
print(test_config.db.host) # memory
print(test_config.debug) # True
print(test_config.cache.ttl_seconds) # 1
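The hand-written from_dict methods above repeat every field name and default. One possible alternative, sketched here under the assumption that unknown keys should simply be ignored, is a generic helper built on fields(). The names from_mapping and CacheSettings are hypothetical and not part of the reference solution.

```python
from dataclasses import dataclass, fields


@dataclass
class CacheSettings:
    host: str = "localhost"
    port: int = 6379
    ttl_seconds: int = 300


def from_mapping(cls, data: dict):
    """Build a dataclass from a dict, ignoring keys that are not init fields."""
    allowed = {f.name for f in fields(cls) if f.init}
    return cls(**{k: v for k, v in data.items() if k in allowed})


raw = {"host": "redis.internal", "ttl_seconds": 600, "unknown_option": True}
settings = from_mapping(CacheSettings, raw)
print(settings)
# CacheSettings(host='redis.internal', port=6379, ttl_seconds=600)
```

Unlike the explicit from_dict methods, this helper does no type coercion, so string values such as "600" would be stored as strings unless __post_init__ converts them.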
Key Takeaways
- @dataclass generates __init__, __repr__, and __eq__ by default. All other methods (__hash__, __lt__, etc.) require explicit opt-in via decorator parameters.
- field(default_factory=...) is mandatory for mutable defaults (list, dict). Using a mutable directly raises ValueError at class definition time - Python catches this to prevent shared-state bugs.
- frozen=True generates __setattr__/__delattr__ that raise FrozenInstanceError, and also generates __hash__. It does NOT deep-freeze: mutable container attributes can still be mutated.
- __post_init__ runs after the generated __init__ finishes. It is the place for validation, normalisation, and computed fields.
- ClassVar fields are excluded from all dataclass operations. InitVar fields appear in __init__ and __post_init__ but are not stored as instance attributes.
- Dataclass inheritance places parent fields before child fields in the generated __init__. If any parent field has a default, all child fields must also have defaults - or use kw_only=True.
- Use frozen=True for value objects (Money, Point, Config) that should be immutable and hashable. Use NamedTuple when you need tuple semantics (indexing, unpacking). Use attrs for high-performance slots + validators.
- dataclasses.replace() creates a shallow copy with selected fields changed - the correct way to "modify" a frozen dataclass.
What's Next
Lesson 11 covers the SOLID principles applied to Python - the five design principles that guide the construction of maintainable, extensible object-oriented systems. You have already seen several SOLID violations and their fixes throughout this module - now you will learn them by name, understand them precisely, and see production Python examples of each.
