Python Variables - What Nobody Actually Teaches You
Reading time: ~18 minutes | Level: Foundation → Engineering
Here is a question.
a = [1, 2, 3]
b = a
b.append(4)
print(a) # What prints here?
If you said [1, 2, 3] - you have the wrong mental model.
If you said [1, 2, 3, 4] - you understand Python's memory model.
If you can explain exactly why - you are ahead of most developers.
By the end of this page, you will not just know the answer. You will understand the entire object reference model that governs Python behavior.
What You Will Learn
- Why Python variables are names, not boxes
- The difference between reassignment and mutation - and why it matters
- What id(), type(), and is actually check
- The small integer interning trap that tricks experienced engineers
- Why default mutable function arguments are a silent time bomb
- What pass-by-object-reference means (it is not pass-by-value, not pass-by-reference)
- When to use shallow copy vs deep copy and why it matters in AI/ML pipelines
- The top 5 variable-related bugs that break real production systems
- How this directly applies to NumPy arrays and PyTorch tensors
Prerequisites
Before this page, you should be comfortable with:
- Running Python code in a terminal or REPL
- Understanding that Python has data types (int, str, list)
- Basic function syntax
The Mental Model That Changes Everything
How Beginners Think (Incorrect)
Most people coming from other languages picture a variable like a labelled box:
In C, when you write int y = x, you copy the value. Two independent boxes. Two separate integers in memory.
How Python Actually Works (Correct)
Python does something fundamentally different.
Variables in Python are names bound to objects.
They are not storage containers.
They are labels attached to objects that live on the heap.
This single insight explains dozens of Python behaviors that confuse developers every day.
Watch: Python Names and Values Explained
:::info Video This talk by Ned Batchelder (PyCon) is one of the best explanations of Python's variable model ever recorded. Watch it before or after reading - it reinforces everything on this page. :::
Part 1 - Name Binding: What x = 10 Actually Does
When you write:
x = 10
Python does three things:
Step 1. Creates an integer object 10 on the heap (or reuses a cached one)
Step 2. Creates the name x in the current namespace
Step 3. Binds the name x to that object
You can verify this with id(), which returns an object's identity (in CPython, its memory address):
x = 10
print(id(x)) # e.g. 140234567890
y = x
print(id(y)) # Same address - same object
print(x is y) # True - same object in memory
There is only one object.
Both names reference it.
:::tip The Core Rule
In Python, assignment (=) never copies data automatically.
It creates a name and binds it to an existing object.
:::
Part 2 - Reassignment vs Mutation: Two Operations, One Syntax
This is the most critical distinction in Python's memory model.
Both look similar. They behave completely differently.
Reassignment - The Reference Changes
a = 5
b = a
print(id(a), id(b)) # Same object
b = 6
print(id(a)) # Unchanged - still points to 5
print(id(b)) # New object - now points to 6
print(a) # 5
print(b) # 6
Memory diagram after b = 6:
Before: After b = 6:
namespace heap namespace heap
a ──────► [5] a ──────► [5]
b ──────► [5] b ──────► [6] (new object)
a is untouched.
Reassignment moves a name to a different object.
Mutation - The Object Changes
a = [1, 2, 3]
b = a
print(id(a), id(b)) # Same object
b.append(4)
print(a) # [1, 2, 3, 4] ← a is affected!
print(b) # [1, 2, 3, 4]
print(id(a) == id(b)) # True - still same object
Memory diagram after b.append(4):
Before: After b.append(4):
namespace heap namespace heap
a ──────► [1, 2, 3] a ──────► [1, 2, 3, 4]
b ──────► (same) b ──────► (same object, modified)
The object itself changed.
Both names still point to it.
Both names see the change.
:::danger Critical Distinction
- Reassignment changes which object a name points to
- Mutation changes the object itself
This difference determines whether other variables are affected.
:::
Part 3 - Everything Is an Object
In Python, there are no primitives in the C sense.
Every value - integer, float, boolean, string, None - is a full object.
import sys
x = 42
print(type(x)) # <class 'int'>
print(id(x)) # memory address
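print(sys.getsizeof(x)) # size in bytes (typically 28 on 64-bit CPython)
Why does 42 take 28 bytes when C stores an int in 4 bytes?
Because a Python integer object carries more than the raw value: on a typical 64-bit CPython build it holds a reference count, a pointer to its type object, a size field, and the digits of the value itself.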
This overhead enables Python's flexibility - dynamic typing, garbage collection, introspection.
But it costs memory and speed compared to C.
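To make the overhead concrete, the sketch below compares a list of Python int objects with a packed array from the standard-library array module. The exact byte counts depend on the platform and Python version, so treat the printed numbers as illustrative rather than fixed.

```python
import sys
from array import array

values = list(range(1000, 2000))   # 1000 separate int objects
packed = array("i", values)        # 1000 C ints stored contiguously

# A list stores only references, so each int object is counted separately.
list_total = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
packed_total = sys.getsizeof(packed)

print(f"list of int objects: {list_total} bytes")
print(f"array of C ints:     {packed_total} bytes")
print(f"bytes per element in the array: {packed.itemsize}")
```

The gap between the two totals is the per-object overhead described above, multiplied by the number of elements.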
# You can inspect any object's internals
x = 10
print(x.__class__) # <class 'int'>
print(x.bit_length()) # 4 - number of bits needed
print(x.__doc__[:50]) # Documentation string
This is why people say that in Python "everything is an object" - even int is an ordinary class with methods and a docstring.
Part 4 - Identity vs Equality: is vs ==
These two operators check completely different things.
a = [1, 2, 3]
b = [1, 2, 3]
print(a == b) # True - same VALUE
print(a is b) # False - different OBJECTS in memory
namespace heap
a ──────► [1, 2, 3] object at address 0x7f1a
b ──────► [1, 2, 3] object at address 0x7f2b ← different!
| Operator | Checks | When to use |
|---|---|---|
| == | Value equality (__eq__) | Comparing data content |
| is | Identity (same memory address) | Checking for None, singletons |
The Only Correct Uses of is
# Correct: checking for None
if result is None:
    handle_missing()
# Acceptable: checking for the True singleton itself (for plain truthiness, write `if flag:`)
if flag is True:
    do_something()
# Correct: checking for an exact type (use isinstance() to accept subclasses too)
if type(x) is int:
    process_integer(x)
Never Do This
# Wrong: comparing values with `is`
x = 1000
y = 1000
if x is y: # Unreliable! May be False even if values match
    print("equal")
Using is to compare values is a common bug. It works accidentally for small integers (due to caching), creating a false sense of correctness that breaks in production.
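The split between the two operators is easiest to see on a class that defines its own equality. The sketch below uses a hypothetical Point class (not part of any library) to show that == is delegated to __eq__, while is cannot be customized at all.

```python
class Point:
    """Hypothetical value class used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # == delegates here, so the class decides what "equal" means
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
q = Point(1, 2)
r = p

print(p == q)  # True: equal by value, as defined in __eq__
print(p is q)  # False: two separate objects
print(p is r)  # True: r is another name for the same object
```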
Part 5 - Integer Interning: The Trap That Fools Experts
CPython caches integers from -5 to 256 for performance.
a = 100
b = 100
print(a is b) # True - same cached object
a = 1000
b = 1000
print(a is b) # False when typed line by line in the REPL (outside cache range)
# In a script, CPython may merge equal constants, so this can also print True
Why does this exist?
Small integers like 0, 1, -1 are used constantly in programs (loop counters, indices, flag values). Caching them avoids millions of memory allocations.
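# Checking the small-integer cache (CPython-specific)
for i in range(-5, 257):
    a = int(str(i)) # build each value at runtime so the two names
    b = int(str(i)) # are not trivially bound to the same object
    assert a is b, f"Interning failed at {i}"
print("All integers -5 to 256 are interned")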
What This Means for Your Code
Never rely on integer interning behavior.
It is an implementation detail of CPython, not a language guarantee.
PyPy, Jython, and other implementations may behave differently.
# This code may pass tests locally but break in production
def validate_status(code):
    return code is 200 # BUG: relies on CPython's small-int cache, not on the value
# Correct version
def validate_status(code):
    return code == 200
Python 3.8+ even shows a SyntaxWarning when using is with literals.
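You can watch that warning fire without running the buggy comparison. The sketch below compiles a hypothetical expression string and records what the compiler emits. It assumes Python 3.8 or newer, and the exact message text differs between versions.

```python
import warnings

source = "code is 200"  # hypothetical expression, compiled but never executed

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")        # make sure nothing is suppressed
    compile(source, "<example>", "eval")   # the warning is issued at compile time

for w in caught:
    print(w.category.__name__, "-", w.message)
```

Because the warning is raised at compile time, a test suite that turns warnings into errors will reject this pattern before the code ever runs.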
Part 6 - Multiple Assignment, Unpacking, and the Walrus Operator
Chained Assignment
a = b = c = 0
print(a, b, c) # 0 0 0
# All three names point to the same object
print(id(a) == id(b) == id(c)) # True
This is safe for immutable objects like integers and strings.
It is a trap for mutable objects:
a = b = c = [] # Dangerous!
a.append(1)
print(b) # [1] - not what you intended
print(c) # [1] - all three point to the SAME list
# Correct: create independent lists
a, b, c = [], [], []
Tuple Unpacking (Destructuring Assignment)
# Basic unpacking
x, y = 10, 20
print(x, y) # 10 20
# From a list
first, second, third = [100, 200, 300]
# Extended unpacking (Python 3+)
first, *rest = [1, 2, 3, 4, 5]
print(first) # 1
print(rest) # [2, 3, 4, 5]
# Ignore values with _
x, _, z = (1, 2, 3)
print(x, z) # 1 3
The Pythonic Swap
# Other languages need a temp variable:
# temp = a; a = b; b = temp
# Python:
a, b = 5, 10
a, b = b, a
print(a, b) # 10 5
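Python evaluates the whole right-hand side (b, a) before binding either name, so no explicit temp variable is needed. Conceptually this builds a temporary tuple and unpacks it, although CPython optimizes small swaps and skips the actual tuple.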
Walrus Operator := (Python 3.8+)
The assignment expression binds a value while returning it:
import re
data = "Error: connection timeout at 2024-01-15"
# Without walrus: assign first, then test
match = re.search(r"\d{4}-\d{2}-\d{2}", data)
if match:
    print(match.group())
# With walrus: assign and test in one expression
if m := re.search(r"\d{4}-\d{2}-\d{2}", data):
    print(m.group()) # 2024-01-15
# Useful in while loops
import random
while (n := random.randint(1, 10)) != 7:
    print(f"Got {n}, trying again...")
print("Found 7!")
:::note When to Use Walrus
Use := only when it genuinely eliminates redundancy.
Avoid it when it reduces clarity - a short name and a clear condition are always better than a clever one-liner.
:::
Part 7 - Default Mutable Arguments: The Classic Production Bug
This is one of the most commonly asked Python interview questions.
And one of the most commonly misunderstood bugs in real code.
# Dangerous - DO NOT DO THIS
def add_item(item, collection=[]):
    collection.append(item)
    return collection
print(add_item("apple")) # ['apple']
print(add_item("banana")) # ['apple', 'banana'] ← Bug!
print(add_item("cherry")) # ['apple', 'banana', 'cherry'] ← Still wrong!
Why does this happen?
Default argument values are evaluated once at function definition time, not at each call.
The same list object is reused across all calls.
# Proof: same object every time
def f(x=[]):
    print(id(x))
f() # same id each call
f() # same id
f() # same id
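The shared default is not hidden. Function objects store their default values in the __defaults__ attribute, so you can watch the list grow between calls. A minimal sketch, reusing the buggy signature from above:

```python
def add_item(item, collection=[]):   # the buggy signature from above
    collection.append(item)
    return collection

print(add_item.__defaults__)   # ([],)
add_item("apple")
add_item("banana")
print(add_item.__defaults__)   # (['apple', 'banana'],)
```

The default list lives on the function object itself, which is why it survives between calls.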
The correct pattern:
def add_item(item, collection=None):
    if collection is None:
        collection = [] # New list created on every call
    collection.append(item)
    return collection
print(add_item("apple")) # ['apple']
print(add_item("banana")) # ['banana'] ← Correct
When is the original pattern intentional?
Sometimes developers use mutable defaults as a cache (memoization):
def cached_fibonacci(n, memo={}):
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    memo[n] = cached_fibonacci(n-1, memo) + cached_fibonacci(n-2, memo)
    return memo[n]
This is intentional mutation of the default - but should be documented clearly.
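If the goal is simply memoization, the standard library offers a route that keeps the cache out of the signature. The sketch below uses functools.lru_cache as one possible alternative. The function name fibonacci is chosen here for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)        # unbounded cache managed by the decorator
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(30))            # 832040
print(fibonacci.cache_info())   # hits, misses, and current cache size
```

The trade-off is that arguments must be hashable, and the cache is cleared with fibonacci.cache_clear() instead of by touching a dict.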
Part 8 - Pass-by-Object-Reference
Python's argument passing is neither pass-by-value (like C) nor pass-by-reference (like C++ references).
It is pass-by-object-reference - sometimes called pass-by-assignment.
What This Means
When you call a function, the parameter name is bound to the same object as the argument:
def inspect(value):
    print(id(value))
x = [1, 2, 3]
print(id(x)) # e.g. 140234
inspect(x) # same: 140234 - same object, different name
Mutation Visible Outside the Function
def add_item(lst, item):
    lst.append(item) # Mutates the original object
my_list = [1, 2, 3]
add_item(my_list, 99)
print(my_list) # [1, 2, 3, 99] - modified!
The function received a reference to the same list.
Mutating it affects the caller's variable.
Reassignment Not Visible Outside
def reset(lst):
    lst = [] # Rebinds local name - does NOT affect caller
    lst.append(100)
my_list = [1, 2, 3]
reset(my_list)
print(my_list) # [1, 2, 3] - unchanged
Inside the function, lst = [] creates a new binding for the local name lst.
The original object is unaffected.
Before call:
caller namespace: my_list ──────► [1, 2, 3]
After reset(my_list) starts:
local namespace: lst ──────────► [1, 2, 3] (same object)
After lst = []:
local namespace: lst ──────────► [] (new object)
caller namespace: my_list ──────► [1, 2, 3] (unchanged)
:::tip Summary
- Mutating the object inside a function: affects the caller
- Rebinding the parameter name inside a function: does NOT affect the caller
:::
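If a function really is meant to empty the caller's list, it has to mutate the object rather than rebind the name. A small sketch contrasting the two, with function names made up for this example:

```python
def reset_rebind(lst):
    lst = []          # rebinds the local name only

def reset_in_place(lst):
    lst.clear()       # mutates the object the caller also references

first = [1, 2, 3]
second = [1, 2, 3]

reset_rebind(first)
reset_in_place(second)

print(first)    # [1, 2, 3]
print(second)   # []
```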
Part 9 - Copy Semantics: Shallow vs Deep
When you need an independent copy, you must ask for one explicitly.
Assignment (No Copy)
a = [1, 2, 3]
b = a # b is an alias - same object
b.append(4)
print(a) # [1, 2, 3, 4] - affected
Shallow Copy (One Level Deep)
import copy
a = [[1, 2], [3, 4]]
b = copy.copy(a) # or: b = a[:] or: b = list(a)
b.append([5, 6])
print(a) # [[1, 2], [3, 4]] - outer list unchanged
b[0].append(99)
print(a) # [[1, 2, 99], [3, 4]] - inner list AFFECTED!
Shallow copy creates a new container, but the nested objects are still shared:
After copy.copy(a):
a ──► [ ref0, ref1 ]
│ │
b ──► [ ref0, ref1 ] ← different outer list, same inner objects!
│ │
[1,2] [3,4]
Deep Copy (Fully Independent)
import copy
a = [[1, 2], [3, 4]]
b = copy.deepcopy(a)
b[0].append(99)
print(a) # [[1, 2], [3, 4]] - completely unaffected
Deep copy recursively duplicates every nested object.
After copy.deepcopy(a):
a ──► [ ref0, ref1 ]
│ │
[1,2] [3,4]
b ──► [ ref0', ref1' ] ← completely independent copies
│ │
[1,2] [3,4]
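The diagrams above can be checked directly with is. This short sketch copies the same nested list both ways and tests which objects are shared:

```python
import copy

a = [[1, 2], [3, 4]]
shallow = copy.copy(a)
deep = copy.deepcopy(a)

print(shallow is a)        # False: new outer list
print(shallow[0] is a[0])  # True: inner list is shared
print(deep[0] is a[0])     # False: inner list was duplicated
print(deep == a)           # True: values still match
```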
When to Use Which
| Situation | Method |
|---|---|
| Read-only access - no copy needed | Assignment (b = a) |
| Flat structure (no nested objects) | Shallow copy |
| Nested or complex structure | Deep copy |
| NumPy array - view (no copy) | b = a[:] or b = a.view() |
| NumPy array - true copy | b = a.copy() |
| Performance critical inner loop | Avoid copy entirely if possible |
Top 5 Variable Bugs in Production Code
Bug 1: Aliasing Instead of Copying
# In a data pipeline
def process(data):
    result = data # Bug: alias, not copy
    result.sort() # Sorts the ORIGINAL data too
    return result
records = [5, 2, 8, 1]
processed = process(records)
print(records) # [1, 2, 5, 8] - original destroyed!
# Fix:
def process(data):
    result = sorted(data) # Creates new sorted list
    return result
Bug 2: Default Mutable Accumulation
# Bug seen in real Django projects
def get_permissions(user, perms=[]):
    if user.is_admin:
        perms.append("admin")
    return perms
# First call
get_permissions(admin_user) # ['admin']
# Second call - new user, same list!
get_permissions(regular_user) # ['admin'] ← Bug in production
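One way to fix this is the None sentinel from Part 7. The sketch below is self-contained, so it defines a hypothetical User dataclass as a stand-in for whatever user model a real project has.

```python
from dataclasses import dataclass

@dataclass
class User:                      # hypothetical stand-in for a real user model
    name: str
    is_admin: bool = False

def get_permissions(user, perms=None):
    # Fresh list on every call; a caller-supplied list is copied, not mutated
    perms = [] if perms is None else list(perms)
    if user.is_admin:
        perms.append("admin")
    return perms

admin_user = User("alice", is_admin=True)
regular_user = User("bob")

print(get_permissions(admin_user))     # ['admin']
print(get_permissions(regular_user))   # []
```

Copying the caller's list with list(perms) is a design choice made here, so that the function never mutates an argument it was handed.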
Bug 3: Loop Variable Capture in Closures
# Common in UI handlers or callback systems
functions = []
for i in range(3):
    functions.append(lambda: i) # Bug: captures name i, not value
for f in functions:
    print(f()) # 2, 2, 2 - all print the final value of i
# Fix: capture value explicitly
functions = [] # start again from an empty list
for i in range(3):
    functions.append(lambda x=i: x)
for f in functions:
    print(f()) # 0, 1, 2 - correct
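The default-argument trick works, but it changes the callable's signature. Another option is a small factory function, sketched below with the made-up name make_getter, which gives each closure its own variable.

```python
def make_getter(value):
    # value is a fresh local variable for each call to make_getter
    return lambda: value

getters = [make_getter(i) for i in range(3)]
print([g() for g in getters])   # [0, 1, 2]
```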
Bug 4: Using is for Value Comparison
# Bug: passes tests for small numbers, fails in prod for large ones
user_id = get_user_id()
if user_id is 200: # SyntaxWarning in Python 3.8+
    grant_access()
# Fix
if user_id == 200:
    grant_access()
Bug 5: Shallow Copy of Nested Config
import copy
DEFAULT_CONFIG = {
    "layers": [64, 128, 256],
    "learning_rate": 0.001,
}
# Bug: modifying "layers" affects DEFAULT_CONFIG
user_config = DEFAULT_CONFIG.copy()
user_config["layers"].append(512)
print(DEFAULT_CONFIG["layers"]) # [64, 128, 256, 512] - corrupted!
# Fix
user_config = copy.deepcopy(DEFAULT_CONFIG)
user_config["layers"].append(512)
print(DEFAULT_CONFIG["layers"]) # [64, 128, 256] - safe
AI/ML Real-World Connection
Understanding Python's reference model is not academic - it directly affects correctness in machine learning pipelines.
NumPy Views vs Copies
NumPy arrays can return views (aliases) or copies depending on the operation:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
# Slicing returns a VIEW (alias)
view = arr[1:4]
view[0] = 99
print(arr) # [ 1 99 3 4 5] - original modified!
# .copy() returns a true copy
safe = arr[1:4].copy()
safe[0] = 0
print(arr) # [ 1 99 3 4 5] - unchanged
This directly causes bugs in:
- Data preprocessing - accidentally modifying training data
- Feature engineering - corrupting arrays passed between functions
- Batch processing - one batch modifying the global dataset
PyTorch Tensors
PyTorch has the same concept - tensor.view() returns an alias, tensor.clone() returns a copy:
import torch
x = torch.tensor([1.0, 2.0, 3.0])
y = x.view(3, 1) # alias - shares data
y[0] = 99.0
print(x) # tensor([99., 2., 3.]) - x is modified!
# Safe version
z = x.clone()
Understanding Python's reference model makes this behavior predictable rather than mysterious.
Interview Questions
These appear frequently in Python engineering interviews.
Q1: What is the difference between is and ==?
Answer: == calls __eq__() and checks value equality. is checks identity - whether two names reference the same object in memory. Use == for value comparison. Use is only for None, True, False, and class identity checks.
Q2: What does Python's assignment operator actually do?
Answer: It creates a name in the current namespace and binds it to an object on the heap. It does not copy the object. Multiple names can reference the same object simultaneously.
Q3: Why is def func(x=[]) dangerous?
Answer: Default argument values are evaluated once at function definition time, not at each call. The same list object is reused across all calls. Any mutation persists between invocations. The fix is to use None as the default and create the list inside the function body.
Q4: What happens when you pass a list to a function and modify it inside?
Answer: The function receives a reference to the same object. Mutating the object (e.g., .append()) affects the original. Rebinding the parameter name (e.g., lst = []) only affects the local name - the original is unchanged.
Q5: What is the difference between shallow copy and deep copy?
Answer: A shallow copy creates a new outer container but shares references to nested objects. A deep copy recursively duplicates all nested objects, producing a fully independent structure. Use copy.copy() for shallow and copy.deepcopy() for deep.
Q6: What will this print?
a = b = []
a.append(1)
print(b)
Answer: [1]. Both a and b are bound to the same list object. Mutating through a is visible through b.
Q7: What will this print?
a = []
b = a
a = [1, 2, 3]
print(b)
Answer: []. After b = a, both names reference the same empty list. Then a = [1, 2, 3] rebinds a to a new list. b still points to the original empty list.
Quick Reference Cheatsheet
| Operation | What Changes | Other Names Affected? |
|---|---|---|
| b = a | Nothing - new binding | Yes, via mutation |
| b = a[:] | New outer container | No (for flat list) |
| b = copy.copy(a) | New outer container | No (for flat list) |
| b = copy.deepcopy(a) | Entire object graph | No |
| a.append(x) | Mutates object | Yes - all aliases see it |
| a = [1, 2, 3] | Rebinds a | No - other names unchanged |
| a is b | Checks identity | - |
| a == b | Checks value | - |
Graded Practice Challenges
Level 1 - Predict the Output
What does this print?
x = 10
y = x
x = 20
print(y)
Show Answer
Output: 10
When y = x executes, y is bound to the integer object 10. Then x = 20 rebinds x to a new object 20. The binding of y is unchanged.
What does this print?
a = [1, 2, 3]
b = a
a = a + [4]
print(b)
Show Answer
Output: [1, 2, 3]
a + [4] creates a new list object and rebinds a to it. b still references the original [1, 2, 3].
Compare with a += [4] which calls __iadd__ (in-place extend) and mutates the original - b would then show [1, 2, 3, 4].
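Running the two forms side by side makes the difference visible. This sketch uses is to show when the names still share one object:

```python
a = [1, 2, 3]
b = a
a = a + [4]        # builds a new list and rebinds a
print(b)           # [1, 2, 3]
print(a is b)      # False

c = [1, 2, 3]
d = c
c += [4]           # extends the existing list in place
print(d)           # [1, 2, 3, 4]
print(c is d)      # True
```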
Level 2 - Debug the Code
Find and fix the bug:
def create_matrix(rows, cols, row=[]):
    matrix = []
    for i in range(rows):
        row = row
        for j in range(cols):
            row.append(0)
        matrix.append(row)
    return matrix
print(create_matrix(3, 3))
Show Answer
Two bugs:
- The default row=[] is shared across calls
- row = row does not create a new list - it's a no-op alias
Fixed version:
def create_matrix(rows, cols):
    matrix = []
    for i in range(rows):
        row = [] # New list on every iteration
        for j in range(cols):
            row.append(0)
        matrix.append(row)
    return matrix
# Or more Pythonically:
def create_matrix(rows, cols):
    return [[0] * cols for _ in range(rows)]
Level 3 - Design Challenge
You are building a configuration system for a machine learning training pipeline.
The system has a DEFAULT_CONFIG dictionary containing nested lists (layer sizes, augmentation parameters).
Users should be able to create custom configurations without ever modifying the defaults.
Design a get_config() function that:
- Returns a user-modifiable config
- Guarantees the defaults are never affected
- Allows users to override individual keys
Write the function and demonstrate it is safe.
Show Reference Solution
import copy
DEFAULT_CONFIG = {
    "model": {
        "layers": [64, 128, 256],
        "activation": "relu",
        "dropout": 0.3,
    },
    "training": {
        "epochs": 100,
        "batch_size": 32,
        "learning_rate": 0.001,
    },
    "augmentation": {
        "transforms": ["flip", "rotate"],
    }
}
def get_config(**overrides):
    """
    Returns a deep copy of DEFAULT_CONFIG with optional overrides.
    The global DEFAULT_CONFIG is never modified.
    """
    config = copy.deepcopy(DEFAULT_CONFIG)
    for key, value in overrides.items():
        if key in config:
            if isinstance(value, dict) and isinstance(config[key], dict):
                config[key].update(value) # Merge nested dicts
            else:
                config[key] = value # Replace scalar values
        else:
            config[key] = value
    return config
# Usage
user_config = get_config(
    training={"epochs": 200, "learning_rate": 0.0005}
)
user_config["model"]["layers"].append(512) # Modify freely
# Verify defaults are untouched
print(DEFAULT_CONFIG["model"]["layers"]) # [64, 128, 256]
print(DEFAULT_CONFIG["training"]["epochs"]) # 100
print(user_config["training"]["epochs"]) # 200
Key Takeaways
- Python variables are names bound to objects - not storage containers
- = binds a name to an object - it never copies
- Reassignment changes which object a name points to
- Mutation changes the object itself - all bound names see the change
- is checks identity (same memory address) - use only for None and singletons
- == checks value equality - use for comparing data
- Small integers (-5 to 256) are cached - never rely on is for numeric comparison
- Default mutable arguments are evaluated once - use None as sentinel
- Python passes object references to functions - mutation affects the caller, rebinding does not
- Shallow copy copies the container but shares nested objects
- Deep copy creates a fully independent object graph
- This model directly causes bugs in NumPy views, PyTorch tensors, and config pipelines
