
Dataclasses - Code Generation, Immutability, and Production Patterns

Reading time: ~28 minutes | Level: Intermediate → Engineering

Before reading further, predict whether these two classes are equivalent:

# Version A
class Point:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

# Version B
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

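They behave the same in everyday use: @dataclass generates all three methods from the field declarations. There is one subtle difference. The generated __eq__ only matches objects of exactly the same class, while Version A's isinstance check also accepts subclasses. But this is just the entry point. Through decorator parameters and field(), @dataclass can also generate __hash__, __lt__, __le__, __gt__, and __ge__. It can also handle mutable defaults correctly, enforce immutability, run post-initialisation validation, and more.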
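Version A's __eq__ uses isinstance. The generated __eq__ (Part 1 shows its shape) checks other.__class__ is self.__class__ instead. The sketch below makes that difference visible. It uses hypothetical ManualPoint and DataPoint classes, plus subclasses that add nothing.

```python
from dataclasses import dataclass

# Hypothetical hand-written class using isinstance, like Version A
class ManualPoint:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, ManualPoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

# Generated equivalent, like Version B
@dataclass
class DataPoint:
    x: float
    y: float

# Subclasses that add no fields and no behaviour
class ManualChild(ManualPoint):
    pass

class DataChild(DataPoint):
    pass

print(ManualPoint(1, 2) == ManualChild(1, 2))  # True: isinstance accepts the subclass
print(DataPoint(1, 2) == DataChild(1, 2))      # False: generated __eq__ requires the exact same class
```

In most code this never matters. It does matter if you subclass a dataclass and expect parent and child instances to compare equal.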

The lesson is not about saving keystrokes. It is about understanding what the generator produces, knowing its edge cases, and using it correctly in production code.

What You Will Learn

  • What @dataclass generates and exactly how each generated method works
  • field() - controlling individual field behaviour (defaults, factories, repr, compare, hash)
  • frozen=True - how it implements immutability and where it breaks down
  • __post_init__ - running validation and computed fields after generation
  • ClassVar vs InitVar - fields that are not instance attributes
  • Inheritance with dataclasses - what the field ordering rules are and where they fail
  • @dataclass(order=True) - comparison method generation and its implications
  • Comparing dataclasses to NamedTuple and attrs - when to use each
  • Production patterns: FastAPI request models, config objects, value objects
  • Performance considerations

Prerequisites

  • Lessons 01–09 of this module
  • Understanding of __init__, __repr__, __eq__, __hash__ from Lessons 02 and 03
  • Familiarity with type hints

Part 1 - What @dataclass Generates

The Basic Case

from dataclasses import dataclass

@dataclass
class Product:
    name: str
    price: float
    quantity: int = 0

The decorator generates the equivalent of:

# Generated __init__
def __init__(self, name: str, price: float, quantity: int = 0):
    self.name = name
    self.price = price
    self.quantity = quantity

# Generated __repr__
def __repr__(self):
    return f"Product(name={self.name!r}, price={self.price!r}, quantity={self.quantity!r})"

# Generated __eq__
def __eq__(self, other):
    if other.__class__ is self.__class__:
        return (self.name, self.price, self.quantity) == (other.name, other.price, other.quantity)
    return NotImplemented

The ordering of __init__ parameters matches the field declaration order. Fields with defaults must come after fields without defaults - same rule as function arguments.

Inspecting the Generated Code

from dataclasses import dataclass, fields, asdict, astuple
import inspect

@dataclass
class Config:
    host: str
    port: int = 5432
    debug: bool = False

# Inspect the generated __init__
print(inspect.signature(Config.__init__))
# (self, host: str, port: int = 5432, debug: bool = False) -> None

# Inspect all fields
for f in fields(Config):
    print(f.name, f.type, f.default)
# host <class 'str'> <dataclasses._MISSING_TYPE object at 0x...>   (the MISSING sentinel: no default)
# port <class 'int'> 5432
# debug <class 'bool'> False

# Convert to dict and tuple
cfg = Config(host="localhost")
print(asdict(cfg))   # {'host': 'localhost', 'port': 5432, 'debug': False}
print(astuple(cfg))  # ('localhost', 5432, False)

fields(), asdict(), and astuple() are utility functions from the dataclasses module. They work on any dataclass instance.
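asdict() recurses into nested dataclasses, lists, tuples, and dicts. It builds new containers rather than sharing references with the instance. Here is a small sketch with hypothetical User and Address classes:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Address:
    city: str

@dataclass
class User:
    name: str
    address: Address                         # nested dataclass
    tags: list = field(default_factory=list)

user = User("Ada", Address("London"), ["admin"])
data = asdict(user)
print(data)  # {'name': 'Ada', 'address': {'city': 'London'}, 'tags': ['admin']}

# asdict built a new list, so editing the dict leaves the instance untouched
data["tags"].append("editor")
print(user.tags)  # ['admin']
```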

The @dataclass Parameters

@dataclass(
    init=True,           # generate __init__ (default True)
    repr=True,           # generate __repr__ (default True)
    eq=True,             # generate __eq__ (default True)
    order=False,         # generate __lt__, __le__, __gt__, __ge__ (default False)
    unsafe_hash=False,   # generate __hash__ even if eq=True (default False)
    frozen=False,        # make instances immutable (default False)
    match_args=True,     # generate __match_args__ for pattern matching (Python 3.10+, default True)
    kw_only=False,       # all fields are keyword-only (Python 3.10+, default False)
    slots=False,         # use __slots__ (Python 3.10+, default False)
)
class MyClass:
    ...

Part 2 - field() and default_factory

Why field() Exists

@dataclass refuses mutable defaults such as list, dict, and set (since Python 3.11, any unhashable default). This is the same mutable class attribute problem from Lesson 01, but caught at class definition time:

from dataclasses import dataclass

@dataclass
class BadRegistry:
    items: list = []  # ValueError: mutable default <class 'list'> for field items is not allowed: use default_factory

Python raises ValueError immediately. The fix is field(default_factory=...):

from dataclasses import dataclass, field

@dataclass
class Registry:
    items: list = field(default_factory=list)  # each instance gets its OWN list
    metadata: dict = field(default_factory=dict)

r1 = Registry()
r2 = Registry()
r1.items.append("alpha")
print(r1.items)  # ['alpha']
print(r2.items)  # [] - separate list

default_factory is a zero-argument callable. It is called once per instance creation, but only when the caller does not pass a value for that field.
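To see that the factory is skipped when a value is supplied, here is a sketch with a hypothetical counter-based ID generator:

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical ID source: yields 1, 2, 3, ...
_ticket_ids = itertools.count(1)

def next_ticket_id() -> int:
    return next(_ticket_ids)

@dataclass
class Ticket:
    title: str
    ticket_id: int = field(default_factory=next_ticket_id)

a = Ticket("first")       # factory called -> 1
b = Ticket("second", 99)  # explicit value: factory NOT called
c = Ticket("third")       # factory called -> 2
print(a.ticket_id, b.ticket_id, c.ticket_id)  # 1 99 2
```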

warning

default_factory is required for mutable defaults like list, dict, and set. Writing items: list = [] raises ValueError at class definition time. @dataclass rejects it immediately to prevent the shared-mutable-default bug. Always use field(default_factory=list) or field(default_factory=dict). For immutable scalar values (int, str, bool, None), a plain default is fine.

All field() Parameters

from dataclasses import dataclass, field

@dataclass
class Event:
    # title: no default, so it is required; included in __init__, __repr__, __eq__
    # (and __hash__, when one is generated)
    title: str

    # default: scalar default value (for immutable values)
    priority: int = field(default=0)

    # default_factory: callable that produces the default (for mutable values)
    tags: list[str] = field(default_factory=list)

    # repr=False: exclude from __repr__ (useful for sensitive data)
    _internal_id: str = field(default="", repr=False)

    # compare=False: exclude from __eq__ and ordering comparisons
    created_at: str = field(default="", compare=False)

    # hash=False: exclude from __hash__ (only relevant when unsafe_hash=True or frozen=True)
    description: str = field(default="", hash=False)

    # init=False: not a constructor argument - set in __post_init__ instead
    word_count: int = field(default=0, init=False)

    def __post_init__(self):
        self.word_count = len(self.description.split())


e = Event(title="Launch", priority=1, tags=["product", "urgent"])
print(e)
# Event(title='Launch', priority=1, tags=['product', 'urgent'],
#       created_at='', description='', word_count=0)
# Note: _internal_id excluded from repr
# Note: _internal_id excluded from repr

Custom default_factory Functions

from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid

def generate_id() -> str:
    return str(uuid.uuid4())

def current_timestamp() -> str:
    # datetime.utcnow() is deprecated since Python 3.12; use a timezone-aware UTC time
    return datetime.now(timezone.utc).isoformat()

@dataclass
class Order:
    product_id: int
    quantity: int
    order_id: str = field(default_factory=generate_id)
    created_at: str = field(default_factory=current_timestamp)
    line_items: list = field(default_factory=list)

o1 = Order(product_id=42, quantity=3)
o2 = Order(product_id=99, quantity=1)

print(o1.order_id)  # e.g. 'f47ac10b-58cc-4372-a567-0e02b2c3d479' (random UUID4, new on every run)
print(o2.order_id)  # a different random UUID4
print(o1.order_id == o2.order_id)  # False - each instance gets its own UUID

Part 3 - frozen=True and Immutability

What frozen=True Does

frozen=True generates __setattr__ and __delattr__ methods that raise FrozenInstanceError on any attempt to assign or delete a field after creation. Combined with eq=True (the default), it also makes @dataclass generate a field-based __hash__, so instances are hashable. A non-frozen eq=True dataclass is not hashable, because defining __eq__ sets __hash__ to None.

from dataclasses import dataclass

@dataclass(frozen=True)
class Vector:
    x: float
    y: float

    def magnitude(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5

    def __add__(self, other: "Vector") -> "Vector":
        return Vector(self.x + other.x, self.y + other.y)  # returns a new Vector

v1 = Vector(3.0, 4.0)
v2 = Vector(1.0, 2.0)

print(v1.magnitude())  # 5.0
print(v1 + v2)         # Vector(x=4.0, y=6.0)

# v1.x = 10.0  # FrozenInstanceError: cannot assign to field 'x'

# Frozen dataclasses are hashable
print(hash(v1))  # consistent hash
d = {v1: "3-4-5 vector"}
print(d[Vector(3.0, 4.0)])  # "3-4-5 vector" - equal frozen instances hash equally
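The generated __setattr__ and __delattr__ can be observed directly. The small sketch below uses a two-field Vector for illustration. FrozenInstanceError is importable from dataclasses and is a subclass of AttributeError, so existing except AttributeError handlers also catch it.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Vector:
    x: float
    y: float

v = Vector(3.0, 4.0)

try:
    v.x = 10.0
except FrozenInstanceError as exc:
    print(exc)  # cannot assign to field 'x'

try:
    del v.y
except FrozenInstanceError as exc:
    print(exc)  # cannot delete field 'y'

print(issubclass(FrozenInstanceError, AttributeError))  # True
print(v)  # Vector(x=3.0, y=4.0) - unchanged
```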
tip

Use frozen=True for value objects: types like Money, Point, Coordinate, or Config that should be treated as immutable values, not mutable state. Frozen dataclasses are hashable (provided their field values are), so they can serve as dictionary keys. They are also safe to share across threads without locks, as long as their fields are themselves immutable. They communicate design intent: "this object should not change after creation."

The Limits of frozen=True

frozen=True prevents reassignment of attributes. It does not prevent mutation of mutable objects stored as attributes.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Config:
    host: str
    tags: list = field(default_factory=list)  # mutable!

cfg = Config(host="localhost", tags=["web"])

# This raises FrozenInstanceError - correct
# cfg.host = "other"

# This WORKS - mutating the list object, not reassigning the attribute
cfg.tags.append("db")
print(cfg.tags)  # ['web', 'db'] - mutation succeeded despite frozen=True

True immutability requires immutable container types:

from dataclasses import dataclass

@dataclass(frozen=True)
class ImmutableConfig:
    host: str
    tags: tuple = ()  # tuple - genuinely immutable

cfg = ImmutableConfig(host="localhost", tags=("web", "db"))
# cfg.tags.append("x")  # AttributeError: 'tuple' object has no attribute 'append' - correct
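A common way to close that gap is to convert the value to a tuple in __post_init__. This sketch assumes callers may pass a list. Because the generated __setattr__ blocks normal assignment, the conversion goes through object.__setattr__, the same escape hatch the generated __init__ of a frozen dataclass uses internally.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImmutableConfig:
    host: str
    tags: tuple = ()

    def __post_init__(self):
        # Accept any iterable (e.g. a list) but always store a tuple.
        # self.tags = ... would raise FrozenInstanceError, so bypass it.
        object.__setattr__(self, "tags", tuple(self.tags))

cfg = ImmutableConfig(host="localhost", tags=["web", "db"])
print(cfg.tags)  # ('web', 'db')
print(hash(cfg) == hash(ImmutableConfig("localhost", ("web", "db"))))  # True
```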

Using frozen=True for Dictionary Keys and Set Members

from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

# Can be used as dict keys and set members
points = {Point(0, 0), Point(1, 1), Point(0, 0)}
print(len(points))  # 2 - duplicates removed

registry = {
    Point(0, 0): "origin",
    Point(1, 0): "unit x",
}
print(registry[Point(0, 0)])  # "origin"

Part 4 - __post_init__ for Validation and Computed Fields

__post_init__ runs immediately after the generated __init__ completes. Use it for validation, normalisation, and computed fields that depend on the constructor arguments.

note

__post_init__ runs after the generated __init__ finishes assigning all fields. It is the correct place for three kinds of work:

- input validation (raise ValueError early)
- field normalisation (strip/lowercase strings)
- derived fields computed from other fields (e.g. word_count = len(description.split()))

Do not define your own __init__ in a dataclass for this work. @dataclass leaves an __init__ written in the class body in place, so yours replaces the generated one and you must assign every field yourself.

Validation

from dataclasses import dataclass
from datetime import date

@dataclass
class DateRange:
    start: date
    end: date

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError(
                f"end ({self.end}) must be >= start ({self.start})"
            )

    @property
    def duration_days(self) -> int:
        return (self.end - self.start).days


dr = DateRange(start=date(2024, 1, 1), end=date(2024, 12, 31))
print(dr.duration_days)  # 365

DateRange(start=date(2024, 12, 31), end=date(2024, 1, 1))
# ValueError: end (2024-01-01) must be >= start (2024-12-31)

Normalisation and Computed Fields

from dataclasses import dataclass, field

@dataclass
class EmailAddress:
    raw: str
    normalised: str = field(init=False, repr=True)
    domain: str = field(init=False, repr=True)

    def __post_init__(self):
        # Normalise: lowercase, strip whitespace
        self.normalised = self.raw.strip().lower()
        # Validate
        if "@" not in self.normalised:
            raise ValueError(f"Invalid email: {self.raw!r}")
        # Extract computed field (the local part is not needed)
        _local, self.domain = self.normalised.rsplit("@", 1)

e = EmailAddress(raw="  Alice@Example.com  ")
print(e.normalised)  # alice@example.com
print(e.domain)      # example.com
print(e)  # EmailAddress(raw='  Alice@Example.com  ', normalised='alice@example.com', domain='example.com')

Type Coercion in __post_init__

from dataclasses import dataclass

@dataclass
class Port:
    number: int

    def __post_init__(self):
        # Coerce string input to int - useful when loading from environment variables
        self.number = int(self.number)
        if not (1 <= self.number <= 65535):
            raise ValueError(f"Port must be 1–65535, got {self.number}")

p = Port(number="8080")  # string input
print(p.number)          # 8080 (int)
print(type(p.number))    # <class 'int'>
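Compare this with a version that has no coercion step. Dataclass annotations are not checked at runtime, so a hypothetical RawPort with no __post_init__ simply stores whatever it is given:

```python
from dataclasses import dataclass

@dataclass
class RawPort:
    number: int  # annotation only; nothing checks it at runtime

p = RawPort(number="8080")
print(type(p.number))  # <class 'str'> - the string is stored as-is
print(p.number + "1")  # 80801 - string concatenation, not arithmetic
```

Static type checkers such as mypy would flag the call, but the running program does not.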

Part 5 - ClassVar and InitVar

ClassVar - Class-Level Attributes

Fields annotated with ClassVar are excluded from __init__, __repr__, __eq__, and all dataclass operations. They are class attributes, not instance attributes.

from dataclasses import dataclass
from typing import ClassVar

@dataclass
class Employee:
    # ClassVar: class-level, not an instance field
    company: ClassVar[str] = "EngineersOfAI"
    max_salary: ClassVar[float] = 200_000.0

    # Instance fields
    name: str
    role: str
    salary: float

e = Employee(name="Alice", role="Engineer", salary=90_000)
print(e)
# Employee(name='Alice', role='Engineer', salary=90000)
# company and max_salary NOT in repr
# (salary stays an int: the float annotation does not convert values)

print(Employee.company)  # EngineersOfAI - class attribute
print(e.company)         # EngineersOfAI - accessed via instance, lives on class

# ClassVar is excluded from __init__
# Employee(name="Alice", role="Engineer", salary=90000, company="Other")
# TypeError: __init__() got an unexpected keyword argument 'company'
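ClassVar fields also do not appear in fields(), and they are a natural home for state shared across all instances. Here is a minimal sketch with a hypothetical TeamMember class that counts how many instances have been created:

```python
from dataclasses import dataclass, fields
from typing import ClassVar

@dataclass
class TeamMember:
    headcount: ClassVar[int] = 0  # shared counter, lives on the class
    name: str
    role: str

    def __post_init__(self):
        # Update the class attribute, not an instance attribute
        TeamMember.headcount += 1

TeamMember("Alice", "Engineer")
TeamMember("Bob", "Designer")
print([f.name for f in fields(TeamMember)])  # ['name', 'role']
print(TeamMember.headcount)                  # 2
```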

InitVar - Constructor Parameters That Are Not Fields

InitVar creates a parameter in __init__ that is passed to __post_init__ but is not stored as an instance attribute.

import hashlib
from dataclasses import dataclass, field, InitVar

@dataclass
class HashedPassword:
    username: str
    # raw_password is a constructor param - passed to __post_init__ - NOT stored
    raw_password: InitVar[str]
    # password_hash IS stored; init=False so callers cannot pass it in directly
    password_hash: str = field(init=False, default="")

    def __post_init__(self, raw_password: str):
        # Hash the password in __post_init__ - raw_password is not stored.
        # Unsalted SHA-256 keeps the example short; real password storage should
        # use a salted, deliberately slow hash such as hashlib.scrypt.
        self.password_hash = hashlib.sha256(raw_password.encode()).hexdigest()


user = HashedPassword(username="alice", raw_password="secret123")
print(user.username)       # alice
print(user.password_hash)  # sha256 hash of "secret123"
# user.raw_password - AttributeError! Not stored.
print(user)
# HashedPassword(username='alice', password_hash='...')
# raw_password NOT in repr - it is not a field

InitVar is the correct pattern for:

  • Passwords that should not be stored in plaintext
  • One-time initialisation data (database connections, config files)
  • Data used only to compute field values in __post_init__

Part 6 - Inheritance with Dataclasses

Basic Inheritance

from dataclasses import dataclass

@dataclass
class Base:
    id: int
    created_at: str = ""

@dataclass
class Child(Base):
    name: str = ""  # Must have default - Base fields with defaults come first

# Child __init__ signature: (id: int, created_at: str = '', name: str = '')
# MRO order: Base fields first, then Child fields
c = Child(id=1, name="Alice")
print(c)  # Child(id=1, created_at='', name='Alice')

The Non-Default After Default Problem

This is the most common dataclass inheritance mistake:

from dataclasses import dataclass

@dataclass
class Base:
    x: int = 0  # has default

@dataclass
class Child(Base):
    y: int  # no default - ERROR!
# TypeError: non-default argument 'y' follows default argument

Python sees the effective __init__ as (x: int = 0, y: int) - a non-default argument after a default argument.

The fix: either give all child fields defaults, or restructure so base class fields have no defaults:

@dataclass
class Base:
    x: int  # no default - OK

@dataclass
class Child(Base):
    y: int      # no default - OK
    z: int = 0  # default - OK (at the end)

c = Child(x=1, y=2)       # Child(x=1, y=2, z=0)
c = Child(x=1, y=2, z=3)  # Child(x=1, y=2, z=3)
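One more ordering rule applies when a subclass redeclares a field that already exists in the parent. The field keeps its original position in the field order but takes the subclass's type and default. Here is a sketch with hypothetical Base and Child classes:

```python
from dataclasses import dataclass

@dataclass
class Base:
    x: int
    y: int = 0

@dataclass
class Child(Base):
    x: int = 5  # redeclared: keeps Base's position (first), takes Child's default

print(Child())      # Child(x=5, y=0)
print(Child(1, 2))  # Child(x=1, y=2)
```

Giving x a default here is also what keeps the class legal. x still comes before y, and y already has a default.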
danger

Inheritance with dataclasses: parent fields always come before child fields in the generated __init__. This means if any parent field has a default, ALL child fields must also have defaults. Otherwise @dataclass raises TypeError at class definition time. This "non-default after default" problem is the most common inheritance mistake with dataclasses. There are three fixes: use kw_only=True (Python 3.10+), give all fields defaults, or restructure so only the last class in the hierarchy introduces defaults.

kw_only=True Solves the Ordering Problem (Python 3.10+)

from dataclasses import dataclass, field

@dataclass
class Base:
    x: int = field(default=0, kw_only=True)  # keyword-only

@dataclass
class Child(Base):
    y: int  # positional - can come "before" in effective init

# Works because kw_only fields don't participate in positional ordering
c = Child(y=5, x=10)  # Child(x=10, y=5)

Or use @dataclass(kw_only=True) to make all fields keyword-only:

@dataclass(kw_only=True)
class Config:
    host: str
    port: int = 5432
    debug: bool = False

cfg = Config(host="localhost")              # OK
cfg = Config(host="localhost", debug=True)  # OK
# Config("localhost")  # TypeError: Config.__init__() takes 1 positional argument but 2 were given

Part 7 - order=True for Comparison

from dataclasses import dataclass

@dataclass(order=True)
class Version:
    major: int
    minor: int
    patch: int = 0

v1 = Version(1, 2, 3)
v2 = Version(1, 3, 0)
v3 = Version(2, 0, 0)

print(v1 < v2) # True - (1,2,3) < (1,3,0)
print(v2 < v3) # True - (1,3,0) < (2,0,0)
print(sorted([v3, v1, v2]))
# [Version(major=1, minor=2, patch=3), Version(major=1, minor=3, patch=0), Version(major=2, minor=0, patch=0)]

order=True generates __lt__, __le__, __gt__, __ge__ by comparing instances as tuples of their fields in declaration order. All fields participate unless field(compare=False) is used.

Note: order=True requires eq=True (the default). If eq=False, order=True raises ValueError.

Excluding Fields from Comparison

from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    priority: int                                        # compared first
    name: str                                            # compared second
    description: str = field(default="", compare=False)  # NOT compared
    created_at: str = field(default="", compare=False)   # NOT compared

t1 = Task(priority=1, name="Alpha", description="Long description", created_at="2024-01-01")
t2 = Task(priority=1, name="Alpha", description="Different description", created_at="2024-06-01")

print(t1 == t2)  # True - description and created_at excluded from comparison
print(t1 < t2)   # False - equal on (priority, name)
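Because ordering follows declaration order, a common pattern is to put a hidden sort key first. Declare it with init=False and repr=False and compute it in __post_init__. Here is a sketch with a hypothetical Job class that sorts higher priorities first:

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Job:
    # Declared first so it dominates comparisons; hidden from __init__ and __repr__
    sort_index: int = field(init=False, repr=False)
    name: str
    priority: int

    def __post_init__(self):
        # Negate so that a higher priority sorts first
        self.sort_index = -self.priority

jobs = [Job("backup", 1), Job("deploy", 5), Job("report", 3)]
print([j.name for j in sorted(jobs)])  # ['deploy', 'report', 'backup']
```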

Part 8 - Dataclasses vs NamedTuple vs attrs

NamedTuple - Immutable, Tuple-Based

from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float

p = Point(3.0, 4.0)
print(p.x) # 3.0
print(p[0]) # 3.0 - indexable like a tuple
print(p._asdict()) # {'x': 3.0, 'y': 4.0}

NamedTuple instances are tuples - immutable by default, indexable, unpackable, usable anywhere a tuple is expected. Use NamedTuple when you need tuple semantics (unpacking, indexing, compatibility with tuple-expecting functions) plus named access.
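Here is the function-return-value use case in practice, as a minimal sketch with a hypothetical MinMax record. The caller can unpack it like any tuple, and _replace() returns a modified copy, much like dataclasses.replace().

```python
from typing import NamedTuple

class MinMax(NamedTuple):
    low: float
    high: float

def bounds(values: list[float]) -> MinMax:
    return MinMax(min(values), max(values))

low, high = bounds([3.0, 1.5, 4.2])  # tuple unpacking
print(low, high)                      # 1.5 4.2

r = bounds([1.0, 2.0])
print(r._replace(high=10.0))  # MinMax(low=1.0, high=10.0) - a new tuple
print(isinstance(r, tuple))   # True
```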

Side-by-Side Comparison

| Feature | @dataclass | NamedTuple | attrs |
|---|---|---|---|
| Mutability | Mutable by default | Immutable | Configurable |
| Inheritance | Full Python inheritance | Limited | Full |
| frozen=True | Yes | N/A (always frozen) | @define(frozen=True) |
| __slots__ | slots=True (3.10+) | Built-in | @define uses slots |
| Custom validators | __post_init__ | Manual | validator= argument or @<field>.validator |
| Performance | Good | Best (tuple) | Best (slots) |
| Indexable | No | Yes | No |
| Standard library | Yes | Yes | No (third-party) |
| Production use | Config, models, DTOs | Return values, records | High-performance models |

When to Use Each

  • @dataclass: general-purpose data containers, config objects, request/response models, when you need mutability or complex inheritance
  • NamedTuple: function return values, records where tuple unpacking is useful, small immutable data holders
  • attrs: high-performance scenarios, complex validation requirements, when you need slots + validators without Python 3.10

Part 9 - Production Patterns

FastAPI Request and Response Models

FastAPI request and response models are normally Pydantic models (FastAPI can also accept standard-library dataclasses). The patterns below still show how dataclasses are used at API boundaries:

import json
from dataclasses import dataclass, field, asdict

@dataclass
class CreateUserRequest:
    """Incoming request - validated in __post_init__."""
    username: str
    email: str
    password: str = field(repr=False)  # keep the plaintext password out of logs
    role: str = "user"

    def __post_init__(self):
        self.username = self.username.strip()
        self.email = self.email.strip().lower()
        if len(self.password) < 8:
            raise ValueError("Password must be at least 8 characters")
        if self.role not in ("user", "admin", "viewer"):
            raise ValueError(f"Invalid role: {self.role!r}")

@dataclass
class UserResponse:
    """Outgoing response - no password, no sensitive fields."""
    id: int
    username: str
    email: str
    role: str

    def to_dict(self) -> dict:
        return asdict(self)

    def to_json(self) -> str:
        return json.dumps(self.to_dict())


# Request/response flow
req = CreateUserRequest(
    username="  Alice  ",
    email="  Alice@Example.com  ",
    password="supersecret123",
)
# Normalised in __post_init__
print(req.username)  # "Alice"
print(req.email)     # "alice@example.com"

resp = UserResponse(id=1, username=req.username, email=req.email, role=req.role)
print(resp.to_json())
# {"id": 1, "username": "Alice", "email": "alice@example.com", "role": "user"}

Configuration Objects

Dataclasses are excellent for typed configuration:

from dataclasses import dataclass, field
from typing import ClassVar
import os

@dataclass
class DatabaseConfig:
    host: str = field(default_factory=lambda: os.getenv("DB_HOST", "localhost"))
    port: int = field(default_factory=lambda: int(os.getenv("DB_PORT", "5432")))
    name: str = field(default_factory=lambda: os.getenv("DB_NAME", "myapp"))
    user: str = field(default_factory=lambda: os.getenv("DB_USER", "postgres"))
    password: str = field(
        default_factory=lambda: os.getenv("DB_PASSWORD", ""),
        repr=False  # NEVER include password in repr
    )
    pool_size: int = 10
    max_overflow: int = 20

    DRIVER: ClassVar[str] = "postgresql"

    def __post_init__(self):
        if not self.host:
            raise ValueError("DB_HOST must be set")

    @property
    def url(self) -> str:
        # Contains the password: do not log this value
        return f"{self.DRIVER}://{self.user}:{self.password}@{self.host}:{self.port}/{self.name}"


@dataclass(frozen=True)
class AppConfig:
    """Immutable application config - loaded once at startup."""
    # db is a regular (mutable) dataclass, so the freeze is shallow:
    # config.db.port can still be reassigned, and hash(config) raises TypeError
    db: DatabaseConfig = field(default_factory=DatabaseConfig)
    debug: bool = field(default_factory=lambda: os.getenv("DEBUG", "false").lower() == "true")
    secret_key: str = field(
        default_factory=lambda: os.getenv("SECRET_KEY", "dev-secret"),
        repr=False
    )
    allowed_hosts: tuple = field(
        default_factory=lambda: tuple(os.getenv("ALLOWED_HOSTS", "localhost").split(","))
    )


config = AppConfig()
print(config.db.host)  # localhost (or from env)
print(config.debug)    # False (or from env)
print(config)
# AppConfig(db=DatabaseConfig(host='localhost', ...), debug=False, allowed_hosts=('localhost',))
# secret_key excluded from repr - repr=False
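Because default_factory is called each time an instance is created, these environment lookups happen at instantiation time, not when the module is imported. A small sketch with a hypothetical CACHE_TTL variable makes the timing visible:

```python
import os
from dataclasses import dataclass, field

@dataclass
class CacheSettings:
    # The lambda runs at instantiation time, not at class-definition time
    ttl_seconds: int = field(default_factory=lambda: int(os.getenv("CACHE_TTL", "300")))

os.environ["CACHE_TTL"] = "60"
first = CacheSettings()
os.environ["CACHE_TTL"] = "120"
second = CacheSettings()
print(first.ttl_seconds, second.ttl_seconds)  # 60 120
```

This is why the AppConfig pattern loads configuration "once at startup". The values are captured when the object is built, so build it after the environment is ready.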

Value Objects with replace()

dataclasses.replace() creates a copy of a dataclass with selected fields changed - useful for immutable value objects.

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Money:
    amount: float  # float keeps the example short; real currency code usually uses decimal.Decimal
    currency: str = "USD"

    def __add__(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError(f"Cannot add {self.currency} and {other.currency}")
        return replace(self, amount=self.amount + other.amount)

    def __mul__(self, factor: float) -> "Money":
        return replace(self, amount=self.amount * factor)

    def convert(self, currency: str, rate: float) -> "Money":
        return Money(amount=self.amount * rate, currency=currency)


price = Money(amount=100.0)
tax = Money(amount=10.0)
total = price + tax
print(total)  # Money(amount=110.0, currency='USD')

euro_price = price.convert("EUR", 0.92)
print(euro_price)  # Money(amount=92.0, currency='EUR')

# replace() usage
doubled = replace(price, amount=price.amount * 2)
print(doubled)  # Money(amount=200.0, currency='USD')
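replace() builds the new object by calling the class's __init__ with the old field values plus your changes. That means __post_init__ runs again, and any validation there also guards the copy. replace() also refuses fields declared with init=False, raising ValueError. Here is a sketch with a hypothetical Percentage value object:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Percentage:
    value: float

    def __post_init__(self):
        if not (0 <= self.value <= 100):
            raise ValueError(f"percentage out of range: {self.value}")

p = Percentage(40)
print(replace(p, value=75))  # Percentage(value=75)

try:
    replace(p, value=150)
except ValueError as exc:
    print(exc)  # percentage out of range: 150
```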

Part 10 - Performance Considerations

Memory: slots=True (Python 3.10+)

from dataclasses import dataclass
import sys

@dataclass
class PointDict:
    x: float
    y: float

@dataclass(slots=True)
class PointSlots:
    x: float
    y: float

pd = PointDict(1.0, 2.0)
ps = PointSlots(1.0, 2.0)

# Approximate figures; exact sizes vary with Python version and platform
print(sys.getsizeof(pd) + sys.getsizeof(pd.__dict__))  # ~232 bytes
print(sys.getsizeof(ps))                               # ~56 bytes - no __dict__

Use slots=True for dataclasses that are instantiated in large numbers (sensor readings, geometric points, event records).

Instantiation Performance

A dataclass's generated __init__ is ordinary Python code, compiled once when the class is created. Per-instance construction cost is therefore normally close to that of a hand-written __init__. There are two main exceptions. A frozen=True dataclass's __init__ must assign through object.__setattr__, and a default_factory may do real work on every call. Measure rather than assume:

import timeit
from dataclasses import dataclass

@dataclass
class DataPoint:
    x: float
    y: float
    z: float = 0.0

class ManualPoint:
    def __init__(self, x: float, y: float, z: float = 0.0):
        self.x = x
        self.y = y
        self.z = z

dc_time = timeit.timeit(lambda: DataPoint(1.0, 2.0), number=1_000_000)
mp_time = timeit.timeit(lambda: ManualPoint(1.0, 2.0), number=1_000_000)

print(f"Dataclass: {dc_time:.3f}s")
print(f"Manual:    {mp_time:.3f}s")
# Expect the two timings to be close: the decorator's extra work happens once,
# at class-creation time, not on every instantiation

For performance-critical code processing millions of objects per second, benchmark on your own Python version before choosing. In most application code, the difference is irrelevant.

Common Mistakes

Mistake 1 - Using a Mutable Default Directly

from dataclasses import dataclass, field

# Wrong: raises ValueError at class definition time
@dataclass
class Bad:
    items: list = []

# Right
@dataclass
class Good:
    items: list = field(default_factory=list)

Mistake 2 - Non-Default Field After Default Field in Subclass

from dataclasses import dataclass

@dataclass
class Base:
    x: int = 0  # default

@dataclass
class Child(Base):
    y: int  # no default - TypeError at class definition time

Mistake 3 - Expecting frozen=True to Deep-Freeze Mutable Containers

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Config:
    tags: list = field(default_factory=list)

cfg = Config()
cfg.tags.append("x")  # Works! frozen prevents reassignment, not mutation

Use tuple instead of list if you need true immutability.

Mistake 4 - Forgetting That @dataclass Does Not Generate __hash__ By Default

When eq=True (the default), Python sets __hash__ = None, making the class unhashable - the same behaviour as manually defining __eq__. To get a hash, either use frozen=True or unsafe_hash=True.

from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

p = Point(1, 2)
hash(p)  # TypeError: unhashable type: 'Point'

@dataclass(frozen=True)
class FrozenPoint:
    x: float
    y: float

hash(FrozenPoint(1, 2))  # works
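unsafe_hash=True is the other route to a hash. It generates a field-based __hash__ without making the class frozen, and its name is a warning. A sketch with a hypothetical CacheKey class shows what goes wrong when a key is mutated after being stored in a dict:

```python
from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class CacheKey:
    user_id: int
    query: str

key = CacheKey(1, "select")
cache = {key: "cached result"}
print(cache[CacheKey(1, "select")])  # cached result

# unsafe_hash does not stop mutation. Changing a field of a key that is
# already in a dict breaks lookups, which is why the option is called "unsafe".
key.query = "changed"
print(CacheKey(1, "select") in cache)  # False
```

Prefer frozen=True. Reach for unsafe_hash=True only when you can guarantee hashed fields never change while the object is in a set or dict.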

Engineering Checklist

Before moving to the next lesson, verify you can answer these without looking:

  1. What three dunder methods does @dataclass generate by default?
  2. Why can you not use a mutable value as a field default? What is the fix?
  3. What is the difference between field(repr=False), field(compare=False), and field(init=False)?
  4. What does frozen=True generate, and what does it NOT prevent?
  5. What is __post_init__ and when does it run?
  6. What is ClassVar and how does it differ from a regular field?
  7. What is InitVar and when would you use it?
  8. What is the non-default-after-default problem in dataclass inheritance? How do you fix it?
  9. When should you prefer NamedTuple over @dataclass?
  10. How do you create a copy of a frozen dataclass with one field changed?

Graded Practice Challenges

Level 1 - Predict the Output

Predict each output before running:

Question 1

from dataclasses import dataclass

@dataclass
class Item:
    name: str
    price: float = 0.0

a = Item("Widget", 9.99)
b = Item("Widget", 9.99)
print(a == b)
print(a is b)
Answer
True
False

@dataclass generates __eq__ that compares field values, so a == b is True. But a and b are two separate objects in memory, so a is b is False. This is the standard value-equality vs identity distinction.

Question 2

from dataclasses import dataclass, field

@dataclass
class Cart:
    items: list = field(default_factory=list)

c1 = Cart()
c2 = Cart()
c1.items.append("apple")
print(c1.items)
print(c2.items)
Answer
['apple']
[]

default_factory=list calls list() separately for each instance. c1 and c2 each get their own list object - mutations to one do not affect the other.

Question 3

from dataclasses import dataclass

@dataclass(frozen=True)
class Coord:
    x: int
    y: int

c = Coord(1, 2)
s = {c, Coord(1, 2), Coord(3, 4)}
print(len(s))
Answer
2

frozen=True makes the dataclass hashable. Coord(1, 2) and c are equal (same field values) and hash equally, so the set deduplicates them. Only two unique coordinates remain.

Question 4

from dataclasses import dataclass, field

@dataclass
class Record:
    value: int
    label: str = field(default="", repr=False)
    score: float = field(default=0.0, compare=False)

r1 = Record(value=10, label="A", score=99.9)
r2 = Record(value=10, label="B", score=0.0)
print(r1 == r2)
print(r1)
Answer
False
Record(value=10, score=99.9)

score is excluded from __eq__ by compare=False, but label has only repr=False, so it still takes part in equality. The labels differ ("A" vs "B"), so r1 == r2 is False. The repr shows value and score; label is hidden because of repr=False.

Key lesson: repr=False only affects __repr__. compare=False affects __eq__ and ordering. They are independent controls.

Question 5

from dataclasses import dataclass

@dataclass
class Node:
    value: int

    def __post_init__(self):
        self.doubled = self.value * 2

n = Node(5)
print(n.doubled)
print(n)
Answer
10
Node(value=5)

__post_init__ runs and sets self.doubled = 10. However, doubled was not declared as a field annotation, so __repr__ does not include it. Only declared fields appear in the generated __repr__.

Level 2 - Debug Challenge

This dataclass has three bugs. Find and fix all of them:

from dataclasses import dataclass, field

@dataclass(frozen=True)
class ShoppingCart:
    owner: str
    items: list = []                  # Bug 1
    total: float = field(init=False)  # Bug 2

    def __post_init__(self):
        self.total = sum(self.items)  # Bug 3
Solution

Bug 1 - Mutable default: items: list = [] raises ValueError: mutable default <class 'list'> for field items is not allowed: use default_factory. This happens on any dataclass, frozen or not. Fix with default_factory:

items: list = field(default_factory=list)

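Bug 2 - field(init=False) with no default (a latent bug): total: float = field(init=False) is legal, because init=False fields are simply not assigned by __init__. The catch is that total then exists only if __post_init__ assigns it. If a later edit to __post_init__ skips that assignment, the generated __repr__ and __eq__ raise AttributeError. Give it a default so every instance starts in a valid state: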

total: float = field(init=False, default=0.0)

Bug 3 - Assigning to a field in __post_init__ of a frozen dataclass: frozen=True generates __setattr__ that raises FrozenInstanceError. You cannot assign to self.total directly. The fix is to use object.__setattr__:

def __post_init__(self):
    object.__setattr__(self, "total", sum(self.items))

Corrected class:

from dataclasses import dataclass, field

@dataclass(frozen=True)
class ShoppingCart:
    owner: str
    items: tuple = ()  # use tuple for true immutability
    total: float = field(init=False, default=0.0)

    def __post_init__(self):
        object.__setattr__(self, "total", sum(self.items))

cart = ShoppingCart(owner="Alice", items=(10.0, 25.50, 5.0))
print(cart.total)  # 40.5
print(cart)        # ShoppingCart(owner='Alice', items=(10.0, 25.5, 5.0), total=40.5)

Level 3 - Design Challenge

Design a typed configuration system for a microservice using dataclasses. The system must:

  1. Have a DatabaseConfig with host, port, name, user, and password (password excluded from repr)
  2. Have a CacheConfig with host, port, and TTL in seconds
  3. Have a top-level ServiceConfig that is frozen, holds both configs, and validates that port numbers are in range 1–65535
  4. Support loading from a dictionary (e.g., parsed from YAML/JSON)
  5. Use replace() to produce a "test mode" version with an in-memory database
Show Reference Solution
```python
from dataclasses import dataclass, field, replace
from typing import ClassVar

@dataclass
class DatabaseConfig:
    host: str = "localhost"
    port: int = 5432
    name: str = "myservice"
    user: str = "postgres"
    password: str = field(default="", repr=False)  # kept out of repr() and logs

    def __post_init__(self):
        if not (1 <= self.port <= 65535):
            raise ValueError(f"DB port must be 1-65535, got {self.port}")

    @classmethod
    def from_dict(cls, data: dict) -> "DatabaseConfig":
        return cls(
            host=data.get("host", "localhost"),
            port=int(data.get("port", 5432)),  # YAML/JSON may deliver strings
            name=data.get("name", "myservice"),
            user=data.get("user", "postgres"),
            password=data.get("password", ""),
        )

    @property
    def url(self) -> str:
        return f"postgresql://{self.user}:{self.password}@{self.host}:{self.port}/{self.name}"


@dataclass
class CacheConfig:
    host: str = "localhost"
    port: int = 6379
    ttl_seconds: int = 300

    def __post_init__(self):
        if not (1 <= self.port <= 65535):
            raise ValueError(f"Cache port must be 1-65535, got {self.port}")
        if self.ttl_seconds <= 0:
            raise ValueError(f"TTL must be positive, got {self.ttl_seconds}")

    @classmethod
    def from_dict(cls, data: dict) -> "CacheConfig":
        return cls(
            host=data.get("host", "localhost"),
            port=int(data.get("port", 6379)),
            ttl_seconds=int(data.get("ttl_seconds", 300)),
        )


# Port validation (requirement 3) runs in the nested configs' __post_init__,
# which executes before they are handed to ServiceConfig.
@dataclass(frozen=True)
class ServiceConfig:
    service_name: str
    db: DatabaseConfig = field(default_factory=DatabaseConfig)
    cache: CacheConfig = field(default_factory=CacheConfig)
    debug: bool = False

    VERSION: ClassVar[str] = "1.0"  # class-level constant, not a field

    @classmethod
    def from_dict(cls, data: dict) -> "ServiceConfig":
        return cls(
            service_name=data["service_name"],
            db=DatabaseConfig.from_dict(data.get("db", {})),
            cache=CacheConfig.from_dict(data.get("cache", {})),
            debug=data.get("debug", False),
        )

    def as_test_config(self) -> "ServiceConfig":
        """Return a test-mode copy: in-memory database settings, 1-second cache TTL, debug on."""
        # port=1 is a placeholder that still passes the 1-65535 check.
        test_db = replace(self.db, host="memory", name=":memory:", port=1)
        # ttl_seconds must stay positive to pass CacheConfig validation.
        test_cache = replace(self.cache, ttl_seconds=1)
        return replace(self, db=test_db, cache=test_cache, debug=True)


# Load from a dict (simulating YAML parse)
raw = {
    "service_name": "order-service",
    "db": {"host": "db.prod.example.com", "port": "5432", "name": "orders", "password": "s3cr3t"},
    "cache": {"host": "redis.prod.example.com", "ttl_seconds": "600"},
    "debug": False,
}

config = ServiceConfig.from_dict(raw)
print(config.service_name)       # order-service
print(config.db.host)            # db.prod.example.com
print(config.db)                 # DatabaseConfig(host='db.prod.example.com', ...) - no password
print(config.cache.ttl_seconds)  # 600

test_config = config.as_test_config()
print(test_config.db.host)            # memory
print(test_config.debug)              # True
print(test_config.cache.ttl_seconds)  # 1
```
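Requirement 3 asks for a frozen ServiceConfig, but frozen=True is shallow. The sketch below uses trimmed-down stand-ins for the classes above (hypothetical: only host and port on DatabaseConfig, only db on ServiceConfig). It shows that the top level rejects assignment while the nested, non-frozen config can still be changed in place. Direct assignment also skips __post_init__ validation. replace(), by contrast, revalidates.

```python
from dataclasses import dataclass, field, replace, FrozenInstanceError

@dataclass
class DatabaseConfig:  # trimmed stand-in: only host and port, same validation
    host: str = "localhost"
    port: int = 5432

    def __post_init__(self):
        if not (1 <= self.port <= 65535):
            raise ValueError(f"DB port must be 1-65535, got {self.port}")

@dataclass(frozen=True)
class ServiceConfig:  # trimmed stand-in: one nested config only
    service_name: str
    db: DatabaseConfig = field(default_factory=DatabaseConfig)

config = ServiceConfig(service_name="order-service")

# The top-level object rejects assignment...
top_level_frozen = False
try:
    config.service_name = "other-service"
except FrozenInstanceError:
    top_level_frozen = True
print(top_level_frozen)  # True

# ...but the nested DatabaseConfig is mutable, and direct assignment does not
# run __post_init__, so an out-of-range port slips through unchecked.
config.db.port = 70000
print(config.db.port)  # 70000

# replace() builds a new object through __init__, so validation runs again.
replace_error = ""
try:
    replace(config.db, port=80000)
except ValueError as exc:
    replace_error = str(exc)
print(replace_error)  # DB port must be 1-65535, got 80000
```

If the nested configs should be immutable too, declaring DatabaseConfig and CacheConfig with @dataclass(frozen=True) would close the gap. as_test_config() would keep working, because replace() never mutates the original object.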

Key Takeaways

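  • @dataclass generates __init__, __repr__, and __eq__ by default. Ordering methods (__lt__, __le__, __gt__, __ge__) require order=True. __hash__ follows eq and frozen. With the defaults (eq=True, frozen=False) it is set to None, so instances are unhashable. frozen=True generates it, and unsafe_hash=True forces it.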
  • field(default_factory=...) is mandatory for mutable defaults such as list, dict, and set. Using one directly raises ValueError at class definition time. Python catches this to prevent every instance from sharing the same container.
  • frozen=True generates __setattr__/__delattr__ that raise FrozenInstanceError, and also generates __hash__. It does NOT deep-freeze: mutable container attributes can still be mutated.
  • __post_init__ runs after the generated __init__ finishes. It is the place for validation, normalisation, and computed fields.
  • ClassVar attributes are not dataclass fields: they are left out of __init__, __repr__, __eq__, and fields(). InitVar pseudo-fields are __init__ parameters passed on to __post_init__, but they are not stored as instance attributes.
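  • Dataclass inheritance places parent fields before child fields in the generated __init__. If any parent field has a default, every child field must also have a default, or the class definition raises TypeError. kw_only=True (Python 3.10+) lifts this restriction by making the child's fields keyword-only.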
  • Use frozen=True for value objects (Money, Point, Config) that should be immutable and hashable. Use NamedTuple when you need tuple semantics (indexing, unpacking). Use attrs when you want built-in validators and converters, with slotted classes by default. Since Python 3.10, dataclasses also accept slots=True.
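  • dataclasses.replace() returns a new instance by calling __init__ with the existing field values plus your changes, so __post_init__ validation runs again. The copy is shallow (nested objects are shared). It is the correct way to "modify" a frozen dataclass.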

What's Next

Lesson 11 covers the SOLID principles applied to Python - the five design principles that guide the construction of maintainable, extensible object-oriented systems. You have already seen several SOLID violations and their fixes throughout this module - now you will learn them by name, understand them precisely, and see production Python examples of each.

© 2026 EngineersOfAI. All rights reserved.