Writing Files - Modes, Atomicity, and Safe File Updates
Reading time: ~17 minutes | Level: Foundation → Engineering
Here is a scenario that has caused production outages:
```python
import json

config = {"database": {"host": "localhost", "port": 5432}}

# Updating a configuration file
with open("config.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(config))
    # Program crashes here (power cut, SIGKILL, disk full)

# Result: config.json is now empty or half-written.
# The server reads it on startup → crashes → you get a 3 AM page.
```
'w' mode truncates the file as soon as open() runs, before anything is written. If the write is then interrupted, the old content is already gone and the new content is missing or incomplete.
Production systems use a different pattern: write to a temporary file, then atomically rename it over the original. This page explains why that works and covers the rest of what you need to write robust file-handling code.
What You Will Learn
- All write modes: 'w' (truncate), 'a' (append), 'x' (exclusive create), 'w+', 'a+'
- write() vs writelines(): behavioral differences and when to use each
- Flushing and syncing: what file.flush() does vs os.fsync(), and the gap between them
- How write buffering can silently lose data on a crash
- The atomic write pattern: write-to-temp-then-rename for crash-safe updates
- Writing binary data: 'wb' mode, the struct module, encoding in binary contexts
- Exclusive create mode 'x': race-condition-free file creation
- The tempfile module: NamedTemporaryFile and mkstemp for safe temporary file creation
- io.StringIO and io.BytesIO: in-memory file objects for testing
- Real-world patterns: safe config updates, date-based log rotation, report generation
Prerequisites
- Understanding of open() and file modes from the Reading Files section
- Basic understanding of Python exceptions and with statements
- Familiarity with os module basics
Mental Model: What Happens When You Write
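The path from your Python f.write("data") call to bits on disk is longer than you think. Data passes through three layers:

1. Layer 1 - Python's buffer: memory inside the file object, in your process (lost if the process crashes)
2. Layer 2 - the OS page cache: kernel memory (survives a process crash, lost on power failure or kernel crash)
3. Layer 3 - the storage device: the physical disk or SSD

A power failure before the data reaches Layer 3 means data loss.

f.flush() moves data from Layer 1 to Layer 2. os.fsync() moves data from Layer 2 to Layer 3. Only after fsync() is the data durable, assuming the device honors the flush request (on macOS, the fsync(2) system call does not force the drive to empty its own write cache; Apple provides the fcntl F_FULLFSYNC command for that).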
Part 1 - Write Modes in Depth
All Write Modes at a Glance
| Mode | File Exists | File Missing | After open() |
|---|---|---|---|
| 'w' | Truncate | Create new | Write-only, position at 0 (empty file) |
| 'a' | Preserve | Create new | Write-only, position at end |
| 'x' | Raise FileExistsError | Create new | Write-only, position at 0 (new file) |
| 'w+' | Truncate | Create new | Read+write, position at 0 |
| 'a+' | Preserve | Create new | Read+write, writes always go to the end |
| 'r+' | Preserve | Raise FileNotFoundError | Read+write, position at 0 |
'w' - Write (Truncate)
```python
# 'w' creates the file if missing, TRUNCATES IT if it exists
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("First line\n")
    f.write("Second line\n")
# File now contains exactly "First line\nSecond line\n"
# Whatever was there before is gone
```
:::danger 'w' mode truncates immediately
The truncation happens when open() is called, not when write() is called. So:

```python
f = open("important.txt", "w")  # file is now EMPTY at this point
# crash here - important.txt is left empty
f.write("data")
f.close()
```

Always write to a temp file first, then rename. See Part 5 for the pattern.
:::
'a' - Append
```python
# 'a' always writes to the end - existing content is preserved
with open("access.log", "a", encoding="utf-8") as f:
    f.write("2024-01-15 14:32:01 GET /api/users 200\n")

# Subsequent writes append to the end
with open("access.log", "a", encoding="utf-8") as f:
    f.write("2024-01-15 14:32:02 POST /api/login 401\n")
```
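'a' mode is safe for appending to logs because:

- On POSIX systems the file is opened with O_APPEND, so for every write() system call the kernel moves to the current end of file and writes in one atomic step. Concurrent appenders, even in different processes, never overwrite each other's data. (POSIX does not promise that a large write() cannot interleave with another process's write, and Python's buffering decides how your f.write() calls map onto write() system calls.)
- Existing content is never truncated

However, 'a' mode cannot be used to update existing content - writes always go to the end regardless of f.seek() calls.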
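To see the seek() behavior for yourself, here is a small sketch that writes a line, reopens the file in 'a' mode, seeks back to the start, and writes again. The file name append_demo.txt is made up for the demo, and a temporary directory keeps it off your working tree:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "append_demo.txt")

    with open(path, "w", encoding="utf-8") as f:
        f.write("first\n")

    with open(path, "a", encoding="utf-8") as f:
        f.seek(0)            # accepted, but ignored for writes in append mode
        f.write("second\n")  # still lands after "first\n"

    with open(path, "r", encoding="utf-8") as f:
        content = f.read()

print(repr(content))  # 'first\nsecond\n'
```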
'x' - Exclusive Create
```python
import os

# 'x' fails if the file already exists - race-condition-free creation
try:
    with open("config.lock", "x", encoding="utf-8") as f:
        f.write(str(os.getpid()))
    print("Lock acquired")
except FileExistsError:
    print("Another process holds the lock")
```
'x' maps to O_CREAT | O_EXCL at the OS level - a single atomic operation. Without 'x', a check-then-create pattern (if not os.path.exists() then open('w')) has a time-of-check/time-of-use (TOCTOU) race condition between the check and the create.
```python
import os

# WRONG - race condition (TOCTOU):
if not os.path.exists("lockfile"):
    # Another process could create it here!
    with open("lockfile", "w") as f:
        f.write("locked")

# CORRECT - atomic:
try:
    with open("lockfile", "x") as f:
        f.write("locked")
except FileExistsError:
    pass  # already locked
```
Part 2 - write() vs writelines()
write(string) - Write a String
```python
with open("data.txt", "w", encoding="utf-8") as f:
    f.write("Hello, World!\n")  # returns 14 (number of chars written)
    f.write("Second line\n")
    # No automatic newline added - you must include it yourself
```
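write() returns the number of characters (text mode) or bytes (binary mode) written. For the default buffered file objects this is always the full length of the argument; a failure raises an exception instead. Partial writes only show up with unbuffered raw files (buffering=0 in binary mode) and with non-blocking streams such as sockets and pipes.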
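A quick way to see the characters-versus-bytes distinction is to write the same non-ASCII string once in text mode and once in binary mode. This sketch uses a throwaway file in a temporary directory (the name count.txt is arbitrary):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "count.txt")

    # Text mode: the return value counts characters ("café\n" is 5)
    with open(path, "w", encoding="utf-8") as f:
        chars_written = f.write("café\n")

    # Binary mode: the return value counts bytes ('é' is 2 bytes in UTF-8)
    with open(path, "wb") as f:
        bytes_written = f.write("café\n".encode("utf-8"))

    size_on_disk = os.path.getsize(path)

print(chars_written, bytes_written, size_on_disk)  # 5 6 6
```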
writelines(iterable) - Write an Iterable of Strings
```python
lines = ["Line 1\n", "Line 2\n", "Line 3\n"]
with open("data.txt", "w", encoding="utf-8") as f:
    f.writelines(lines)
# File: "Line 1\nLine 2\nLine 3\n"
```
writelines() is a convenience method. It calls write() for each item. Crucially, it does not add newlines - that is your responsibility.
```python
# Common mistake: forgetting newlines with writelines
words = ["apple", "banana", "cherry"]

# Wrong - all on one line:
with open("words.txt", "w", encoding="utf-8") as f:
    f.writelines(words)
# File: "applebananacherry"

# Correct - add newlines:
with open("words.txt", "w", encoding="utf-8") as f:
    f.writelines(word + "\n" for word in words)
# File: "apple\nbanana\ncherry\n"
```
Performance: write() vs writelines() vs join()+write()
```python
import io
import timeit

lines = [f"line {i}\n" for i in range(100_000)]

# Option 1: repeated write()
def method_write():
    with io.StringIO() as f:
        for line in lines:
            f.write(line)

# Option 2: writelines()
def method_writelines():
    with io.StringIO() as f:
        f.writelines(lines)

# Option 3: join then single write()
def method_join():
    with io.StringIO() as f:
        f.write("".join(lines))

for fn in (method_write, method_writelines, method_join):
    print(f"{fn.__name__:20} {timeit.timeit(fn, number=10):.3f}s")

# All three are fast; join() builds one extra full-size string, so it uses more peak RAM.
# For very large outputs, writelines() over a generator avoids holding every line at once.
```
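:::tip For large outputs, use writelines() with a generator

```python
def generate_report_lines(data):
    yield "Report Header\n"
    yield "=" * 40 + "\n"
    for item in data:
        yield f"{item['name']:30} {item['value']:10.2f}\n"
    yield "=" * 40 + "\n"
    yield f"Total: {sum(d['value'] for d in data):.2f}\n"

# Sample data; it must be re-iterable (a list), because the total loops over it again
dataset = [{"name": "alpha", "value": 1.5}, {"name": "beta", "value": 2.25}]

with open("report.txt", "w", encoding="utf-8") as f:
    f.writelines(generate_report_lines(dataset))
# Lines are produced one at a time; the full report text is never built as one string
```
:::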
Part 3 - Flushing and Syncing: The Full Story
file.flush() - Python Buffer to OS Buffer
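```python
event_stream = ["user_login", "page_view", "user_logout"]  # sample events

# buffering=1 means line-buffered (text mode only): every newline written
# triggers a flush to the OS
with open("realtime.log", "w", encoding="utf-8", buffering=1) as f:
    for event in event_stream:
        f.write(f"{event}\n")
        f.flush()  # redundant with buffering=1; required with the default block buffering
        # Another process can now read this line from the file
```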
flush() sends Python's internal buffer to the operating system's page cache. After flush(), another process reading the file will see the data. However, the data is still in RAM (the OS page cache) - not necessarily on disk. A power failure can still lose it.
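The visibility part of that claim can be checked with two file objects on the same path. This is a sketch that assumes CPython's default block buffering, so a short write stays inside the file object until flush(); a second file object stands in for "another process", and the file name buffered.log is arbitrary:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "buffered.log")
    writer = open(path, "w", encoding="utf-8")  # default block buffering
    writer.write("event 1\n")                   # still in Python's buffer

    with open(path, "r", encoding="utf-8") as reader:
        before_flush = reader.read()            # '' - nothing reached the OS yet

    writer.flush()                              # hand the data to the OS

    with open(path, "r", encoding="utf-8") as reader:
        after_flush = reader.read()             # 'event 1\n'

    writer.close()

print(repr(before_flush), repr(after_flush))
```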
os.fsync() - OS Buffer to Disk
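```python
import json
import os

serialized_data = json.dumps({"balance": 100})  # sample payload

# ('w' still truncates first; Part 5 combines fsync() with an atomic rename)
with open("critical.db", "w", encoding="utf-8") as f:
    f.write(serialized_data)
    f.flush()             # flush Python buffer to OS
    os.fsync(f.fileno())  # ask the OS to write the file to the physical disk
# After fsync() returns, the data should survive a power failure
```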
On Unix, os.fsync() issues an fsync(2) system call. This tells the OS to write the file's cached data and metadata to the storage device and to wait until the device reports completion. It is slow (often milliseconds per call), but it is the standard way to request durability.
Use fsync() for: database write-ahead logs (WAL), financial transactions, any data where loss = correctness violation.
Skip fsync() for: temporary files, generated reports (recreatable), logs (some loss acceptable), development/test output.
os.fdatasync() - Faster fsync Without Metadata
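```python
import os

data = b"\x00\x01\x02\x03"  # sample payload

# fdatasync() flushes the data blocks plus only the metadata needed to read them
# back (such as a changed file size); it can skip updates like modification times.
# That makes it cheaper than fsync() when you only care about the contents.
with open("data.bin", "wb") as f:
    f.write(data)
    f.flush()
    os.fdatasync(f.fileno())  # Unix only, not available on macOS; use fsync() there and on Windows
```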
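Because os.fdatasync() is missing on some platforms, a portable program has to fall back to os.fsync(). The helper below is a minimal sketch of that choice; the name sync_data is hypothetical, not a standard-library function:

```python
import os
import tempfile

def sync_data(fileobj):
    """Flush Python's buffer, then ask the OS to persist the file's data.

    Uses os.fdatasync() where Python provides it and falls back to
    os.fsync() elsewhere (for example macOS and Windows).
    """
    fileobj.flush()
    if hasattr(os, "fdatasync"):
        os.fdatasync(fileobj.fileno())
    else:
        os.fsync(fileobj.fileno())

# Usage
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.bin")
    with open(path, "wb") as f:
        f.write(b"\x00\x01\x02")
        sync_data(f)
    size = os.path.getsize(path)

print(size)  # 3
```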
Part 4 - Writing Binary Data
Binary Write Mode 'wb'
```python
# Write raw bytes
with open("data.bin", "wb") as f:
    data = b"\x89PNG\r\n\x1a\n"  # PNG file signature
    f.write(data)

# You CANNOT write str to a binary file
with open("data.bin", "wb") as f:
    f.write("hello")  # TypeError: a bytes-like object is required, not 'str'
```
struct Module for Binary Format Writing
```python
import struct

# Pack binary data into a fixed-format byte sequence
# Format string: '<' = little-endian, 'I' = uint32, 'f' = float32, '10s' = 10-byte string
RECORD_FORMAT = "<If10s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 18 bytes per record

def write_binary_records(filepath, records):
    """Write a binary database file with fixed-size records."""
    with open(filepath, "wb") as f:
        for record_id, value, name in records:
            # Cut/pad the name to exactly 10 bytes
            # (cutting can split a multi-byte UTF-8 character)
            name_bytes = name.encode("utf-8")[:10].ljust(10, b"\x00")
            packed = struct.pack(RECORD_FORMAT, record_id, value, name_bytes)
            f.write(packed)

def read_binary_records(filepath):
    """Read all records from the binary file."""
    records = []
    with open(filepath, "rb") as f:
        while True:
            raw = f.read(RECORD_SIZE)
            if len(raw) < RECORD_SIZE:
                break
            record_id, value, name_bytes = struct.unpack(RECORD_FORMAT, raw)
            name = name_bytes.rstrip(b"\x00").decode("utf-8")
            records.append((record_id, value, name))
    return records

# Usage
data = [(1, 98.6, "Alice"), (2, 37.2, "Bob"), (3, 102.1, "Charlie")]
write_binary_records("/tmp/patients.bin", data)
recovered = read_binary_records("/tmp/patients.bin")
print(recovered)
# Floats come back as float32 approximations:
# [(1, 98.59999..., 'Alice'), (2, 37.20000..., 'Bob'), (3, 102.09999..., 'Charlie')]
```
Encoding in Binary Contexts
```python
import struct

# Explicitly encode before writing in binary mode
text = "Hello, café!"
encoded = text.encode("utf-8")  # b'Hello, caf\xc3\xa9!'

with open("output.bin", "wb") as f:
    # Write a length-prefixed string (common binary protocol pattern)
    length = len(encoded)
    f.write(struct.pack("<I", length))  # 4-byte little-endian length
    f.write(encoded)                    # the string bytes

# Read it back
with open("output.bin", "rb") as f:
    length_bytes = f.read(4)
    length = struct.unpack("<I", length_bytes)[0]
    data = f.read(length).decode("utf-8")
print(data)  # Hello, café!
```
Part 5 - Atomic Writes: The Critical Pattern
The Problem with Direct 'w' Mode Updates
```python
import json

# DANGEROUS for critical files:
def update_config(path, new_config):
    with open(path, "w", encoding="utf-8") as f:  # TRUNCATES HERE
        json.dump(new_config, f, indent=2)       # if crash here → empty file
```
If the process is killed between open() and json.dump(), the file is empty. If json.dump() writes half the data and the disk fills up, you have a corrupt JSON file.
The Atomic Write Pattern
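On POSIX systems (Linux, macOS), renaming a file within one filesystem is atomic: the rename(2) system call either succeeds completely or fails completely, and any process opening the path sees either the old file or the new one, never an intermediate state. On POSIX, both os.rename() and os.replace() use rename(2).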
```python
import os
import json
import tempfile

def atomic_write_json(path, data):
    """
    Write JSON data to path atomically.
    The old file is always intact until the new one is complete.
    """
    dir_path = os.path.dirname(os.path.abspath(path))

    # Step 1: Write to a temp file in the SAME directory
    # (same directory = same filesystem = rename is atomic)
    fd, tmp_path = tempfile.mkstemp(dir=dir_path, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # ensure data is on disk before rename

        # Step 2: Atomically replace the target file
        os.replace(tmp_path, path)  # atomic on POSIX; overwrites on Windows
    except Exception:
        # Clean up temp file on failure
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise

# Usage
config = {"database": {"host": "localhost", "port": 5432}}
atomic_write_json("/etc/myapp/config.json", config)
# Either the old config.json exists, or the new one does. Never empty.
```
At no point does a reader see an empty or partial config.json.
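One gap remains: fsync() on the temp file makes its contents durable, but the rename is a change to the directory, and on POSIX systems that change is only guaranteed to reach the disk once the directory itself is fsync'ed. The sketch below adds that step; the helper names fsync_directory and atomic_write_json_durable are hypothetical, and the directory fsync is skipped outside POSIX because Windows does not let os.open() open a directory:

```python
import json
import os
import tempfile

def fsync_directory(dir_path):
    """Flush a directory's entries (e.g. a just-completed rename) to disk. POSIX only."""
    if os.name != "posix":
        return
    dir_fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

def atomic_write_json_durable(path, data):
    """atomic_write_json() from above, plus an fsync of the parent directory."""
    dir_path = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_path, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Also clean up on KeyboardInterrupt, then re-raise
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
    fsync_directory(dir_path)  # persist the rename itself

# Usage
with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "config.json")
    payload = {"database": {"host": "localhost", "port": 5432}}
    atomic_write_json_durable(target, payload)
    with open(target, encoding="utf-8") as f:
        loaded = json.load(f)
    leftovers = sorted(os.listdir(tmpdir))  # no stray .tmp files

print(loaded, leftovers)
```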
:::warning Same filesystem requirement
The temp file and target must be on the same filesystem for os.replace() to be atomic. If you create the temp file in /tmp and the target lives on a different partition, os.replace() fails with OSError (errno EXDEV, "Invalid cross-device link"). Working around that by copying the bytes (as shutil.move() does across filesystems) is not atomic.

Always create the temp file in the same directory as the target:

```python
dir_path = os.path.dirname(os.path.abspath(target_path))
fd, tmp = tempfile.mkstemp(dir=dir_path)
```
:::
os.replace() vs os.rename()
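```python
import os

# os.rename()  - POSIX: atomically replaces an existing target on the same filesystem.
#                Windows: raises FileExistsError if the target already exists.
# os.replace() - replaces an existing target on every platform (Python 3.3+);
#                on POSIX it is the same atomic rename(2) call.
# Use os.replace() for cross-platform replacement
# (tmp_path and target_path as in atomic_write_json() above)
os.replace(tmp_path, target_path)
```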
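A minimal sketch of the difference in practice: os.replace() overwrites an existing destination and the source name disappears. The file names are made up for the demo; swapping in os.rename() would raise FileExistsError on Windows at the marked line:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "new.txt")
    dst = os.path.join(tmpdir, "current.txt")
    for path, text in ((src, "new"), (dst, "old")):
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)

    os.replace(src, dst)  # overwrites dst on every platform

    with open(dst, encoding="utf-8") as f:
        result = f.read()
    src_exists = os.path.exists(src)

print(result, src_exists)  # new False
```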
Part 6 - The tempfile Module
tempfile.mkstemp() - Low-Level Temp File Creation
```python
import os
import tempfile

def process(path):
    """Hypothetical placeholder for whatever consumes the temp file."""
    with open(path, encoding="utf-8") as f:
        print(f.read())

# mkstemp returns (fd, path) - fd is an OS file descriptor (integer)
fd, path = tempfile.mkstemp(
    suffix=".txt",     # file extension
    prefix="report_",  # prefix for temp name
    dir="/tmp",        # directory for temp file
    text=True,         # OS-level text mode; only matters on Windows (os.fdopen() sets the Python mode)
)
try:
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("temporary content")
    # Process the temp file
    process(path)
finally:
    os.unlink(path)  # MUST delete manually - mkstemp does not clean up
```
mkstemp() creates the file and returns an open file descriptor. The file is created with mode 0600 (owner read/write only) for security - no other user can read it.
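That 0600 mode has a side effect in the atomic write pattern: after os.replace(), the target carries the temp file's permissions, not its old ones. A minimal sketch for POSIX systems, assuming you want to keep the old permission bits, copies them with shutil.copymode() before the swap (ownership is not preserved this way; changing it generally needs root). The file name settings.conf is made up:

```python
import os
import shutil
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "settings.conf")
    with open(target, "w", encoding="utf-8") as f:
        f.write("old\n")
    os.chmod(target, 0o644)  # a typical world-readable config file

    fd, tmp_path = tempfile.mkstemp(dir=tmpdir)
    tmp_mode = stat.S_IMODE(os.stat(tmp_path).st_mode)  # 0o600 on POSIX

    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("new\n")

    # Copy the target's permission bits onto the temp file before the swap,
    # so the replacement keeps 0o644 instead of mkstemp's 0o600
    shutil.copymode(target, tmp_path)
    os.replace(tmp_path, target)
    final_mode = stat.S_IMODE(os.stat(target).st_mode)

print(oct(tmp_mode), oct(final_mode))  # 0o600 0o644 on POSIX
```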
tempfile.NamedTemporaryFile() - Context-Managed Temp Files
```python
import os
import tempfile

# NamedTemporaryFile: auto-deleted when closed (or context exits)
with tempfile.NamedTemporaryFile(
    mode="w",
    suffix=".csv",
    encoding="utf-8",
    delete=True,  # default: delete on close
) as tmp:
    tmp.write("id,name,value\n")
    tmp.write("1,Alice,100\n")
    tmp_path = tmp.name
    # File exists while context is open
    print(os.path.exists(tmp_path))  # True

# File is automatically deleted here
print(os.path.exists(tmp_path))  # False
```
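:::note Windows caveat with NamedTemporaryFile
On Windows, a NamedTemporaryFile created with the default delete=True generally cannot be opened a second time by name while it is still open. For the atomic write pattern on Windows, use mkstemp() (which never deletes the file for you) or NamedTemporaryFile(delete=False), close the file before replacing, and manage cleanup manually. Python 3.12 also added a delete_on_close parameter aimed at this case.
:::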
tempfile.TemporaryDirectory() - Temp Directory
```python
import json
import os
import tempfile

def process_directory(path):
    """Hypothetical placeholder: return the files the processing step would see."""
    return sorted(os.listdir(path))

with tempfile.TemporaryDirectory() as tmpdir:
    # Create files inside the temp directory
    config_path = os.path.join(tmpdir, "config.json")
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump({"key": "value"}, f)

    data_path = os.path.join(tmpdir, "data.csv")
    with open(data_path, "w", encoding="utf-8") as f:
        f.write("id,value\n1,100\n")

    # Do processing with the temp files
    result = process_directory(tmpdir)

# Entire directory and all contents deleted here - recursive
```
Part 7 - In-Memory File Objects: io.StringIO and io.BytesIO
io.StringIO - In-Memory Text Stream
```python
import io

# StringIO behaves like a text file opened for reading and writing, but lives in RAM
buffer = io.StringIO()
buffer.write("Line 1\n")
buffer.write("Line 2\n")

# Read it back
buffer.seek(0)
content = buffer.read()
print(content)  # "Line 1\nLine 2\n"

# Get the full string value
value = buffer.getvalue()  # works without seeking
buffer.close()
```
Primary use cases:

- Testing - pass a StringIO instead of a real file to functions that accept file objects
- Building strings - an efficient alternative to repeated string concatenation when you make many small writes
- Capturing output from functions that write to file objects
```python
# Testing a function that writes to a file
import io

def write_report(fileobj, data):
    fileobj.write("=== Report ===\n")
    for k, v in data.items():
        fileobj.write(f"{k}: {v}\n")

# In production:
with open("report.txt", "w", encoding="utf-8") as f:
    write_report(f, {"users": 42, "revenue": 1000})

# In tests - no disk I/O:
output = io.StringIO()
write_report(output, {"users": 42, "revenue": 1000})
assert "users: 42" in output.getvalue()
```

```python
# Capture print() output
import io
from contextlib import redirect_stdout

captured = io.StringIO()
with redirect_stdout(captured):
    print("hello")
    print("world")
text = captured.getvalue()  # "hello\nworld\n"
```
io.BytesIO - In-Memory Binary Stream
```python
import io
import struct

# Build a binary protocol message in memory
buf = io.BytesIO()
buf.write(b"\x89PNG\r\n\x1a\n")   # PNG signature
buf.write(struct.pack(">I", 13))  # chunk length
buf.write(b"IHDR")                # chunk type

raw_bytes = buf.getvalue()
print(len(raw_bytes))  # 16
print(raw_bytes[:4])   # b'\x89PNG'
```
```python
# Testing image processing without disk I/O
import io
from PIL import Image  # pip install Pillow

def make_thumbnail(image_bytes, max_size=(128, 128)):
    """Create a thumbnail from image bytes, return thumbnail bytes."""
    with Image.open(io.BytesIO(image_bytes)) as img:
        img.thumbnail(max_size)
        output = io.BytesIO()
        img.save(output, format="PNG")
        return output.getvalue()

# No temp files needed - pure in-memory processing
```
Part 8 - Encoding for Writing
Always Specify Encoding
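```python
# Dangerous - platform-dependent behavior:
with open("output.txt", "w") as f:
    # Uses the locale's encoding (e.g. cp1252 on many Windows setups): writes b'caf\xe9'
    # there, or raises UnicodeEncodeError for characters the locale cannot represent
    f.write("café")

# Correct:
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("café")  # always writes b'caf\xc3\xa9'
```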
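The difference is easy to see without touching the disk: encode the same string with UTF-8 and with cp1252 (used here only as an example of a non-UTF-8 locale encoding), then try to read the cp1252 bytes back as UTF-8. locale.getpreferredencoding(False) reports the locale's preferred encoding, which is what open() typically falls back to when encoding is omitted:

```python
import locale

text = "café"

utf8_bytes = text.encode("utf-8")     # b'caf\xc3\xa9'
cp1252_bytes = text.encode("cp1252")  # b'caf\xe9'

# A reader that assumes UTF-8 cannot decode bytes written as cp1252
try:
    cp1252_bytes.decode("utf-8")
    utf8_reader_ok = True
except UnicodeDecodeError:
    utf8_reader_ok = False

print("This machine's default:", locale.getpreferredencoding(False))
print(utf8_bytes, cp1252_bytes, utf8_reader_ok)
```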
UTF-8 with BOM (utf-8-sig) for Excel Compatibility
```python
import csv

# Excel expects a UTF-8 BOM to recognize UTF-8 CSV files correctly
with open("report.csv", "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Price", "Description"])
    writer.writerow(["Müsli", 3.99, "German breakfast cereal"])
    writer.writerow(["Café au lait", 4.50, "French coffee drink"])
# Excel will display the umlauts and accents correctly
```
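What utf-8-sig adds is a three-byte byte-order mark (codecs.BOM_UTF8, b'\xef\xbb\xbf') at the very start of the output. This in-memory sketch writes through a text wrapper the way open() would and inspects the raw bytes; the CSV content is made up:

```python
import codecs
import io

raw = io.BytesIO()
# newline="" matches how the csv module expects files to be opened
wrapper = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
wrapper.write("Name,Price\r\n")
wrapper.flush()
encoded = raw.getvalue()  # read before closing: closing the wrapper closes raw too
wrapper.close()

print(encoded[:3] == codecs.BOM_UTF8)  # True
print(repr(encoded.decode("utf-8-sig")))  # BOM stripped: 'Name,Price\r\n'
print(repr(encoded.decode("utf-8")))      # BOM kept as '\ufeff'
```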
Errors Parameter for Writing
```python
# 'strict' (default) - raises UnicodeEncodeError if a char can't be encoded
try:
    with open("ascii_only.txt", "w", encoding="ascii") as f:
        f.write("café")
except UnicodeEncodeError as exc:
    print(exc)  # 'ascii' codec can't encode character '\xe9' in position 3: ...

# 'replace' - replace unencodable chars with '?'
with open("ascii_only.txt", "w", encoding="ascii", errors="replace") as f:
    f.write("café")  # writes "caf?" - data lost but no error

# 'ignore' - drop unencodable chars silently
with open("ascii_only.txt", "w", encoding="ascii", errors="ignore") as f:
    f.write("café")  # writes "caf" - silent loss

# 'xmlcharrefreplace' - encode as XML character references
with open("xml_safe.txt", "w", encoding="ascii", errors="xmlcharrefreplace") as f:
    f.write("café")  # writes "caf&#233;"

# 'backslashreplace' - encode as Python escape sequences
with open("escaped.txt", "w", encoding="ascii", errors="backslashreplace") as f:
    f.write("café")  # writes "caf\xe9"
```
Part 9 - Real-World Patterns
Pattern 1: Safe Config File Update
```python
import json
import os
import tempfile
from pathlib import Path

class ConfigManager:
    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        if not self.path.exists():
            return {}
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def save(self, config):
        """Atomically save config - old file intact if save fails."""
        dir_path = self.path.parent
        dir_path.mkdir(parents=True, exist_ok=True)
        fd, tmp_path = tempfile.mkstemp(
            dir=dir_path,
            suffix=".tmp",
            prefix=f".{self.path.name}.",
        )
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(config, f, indent=2, sort_keys=True)
                f.write("\n")  # POSIX convention: files end with newline
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, self.path)
        except Exception:
            try:
                os.unlink(tmp_path)
            except OSError:
                pass
            raise

    def update(self, **kwargs):
        """Update specific keys in the config."""
        config = self.load()
        config.update(kwargs)
        self.save(config)
```
Pattern 2: Rotating Log Writer
```python
import os
from datetime import datetime

class RotatingWriter:
    """Write to date-stamped log files, one per day."""

    def __init__(self, log_dir, prefix="app"):
        self.log_dir = log_dir
        self.prefix = prefix
        self._current_file = None
        self._current_date = None
        os.makedirs(log_dir, exist_ok=True)

    def _get_path(self, day):
        return os.path.join(self.log_dir, f"{self.prefix}-{day:%Y-%m-%d}.log")

    def write(self, message):
        # Read the clock once so the rollover check, the file name, and the
        # timestamp agree even if this call straddles midnight
        now = datetime.now()
        today = now.date()
        if today != self._current_date:
            if self._current_file:
                self._current_file.close()
            self._current_file = open(
                self._get_path(today), "a", encoding="utf-8", buffering=1
            )
            self._current_date = today
        self._current_file.write(f"[{now:%Y-%m-%d %H:%M:%S}] {message}\n")

    def close(self):
        if self._current_file:
            self._current_file.close()
            self._current_file = None
            self._current_date = None  # so a later write() reopens the file

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

# Usage
with RotatingWriter("/var/log/myapp") as logger:
    logger.write("Application started")
    logger.write("Processing batch 1")
```
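If you do not need a custom writer, the standard library's logging.handlers.TimedRotatingFileHandler handles daily rollover for you. One difference from RotatingWriter: it always writes to the same base file name and renames it with a date suffix when it rotates, instead of putting the date in the active file's name. A minimal sketch, with the logger name and retention count chosen for the demo:

```python
import logging
import os
import tempfile
from logging.handlers import TimedRotatingFileHandler

with tempfile.TemporaryDirectory() as tmpdir:
    log_path = os.path.join(tmpdir, "app.log")
    handler = TimedRotatingFileHandler(
        log_path, when="midnight", backupCount=7, encoding="utf-8"
    )
    handler.setFormatter(logging.Formatter("[%(asctime)s] %(message)s"))

    logger = logging.getLogger("myapp.rotating_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of any root handlers
    logger.addHandler(handler)
    try:
        logger.info("Application started")
        logger.info("Processing batch 1")
    finally:
        logger.removeHandler(handler)
        handler.close()  # release the file before the directory is deleted

    with open(log_path, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines)
```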
Pattern 3: Generating a Report to Multiple Destinations
```python
import io
import sys

def send_email(subject, body):
    """Hypothetical stand-in for real mail-sending code."""
    print(f"[would email] {subject}: {len(body)} characters")

def generate_report(output, data):
    """Write a report to any file-like object."""
    output.write("Sales Report\n")
    output.write("=" * 40 + "\n")
    total = 0
    for item in data:
        line = f"{item['product']:20} ${item['amount']:10.2f}\n"
        output.write(line)
        total += item["amount"]
    output.write("-" * 40 + "\n")
    output.write(f"{'TOTAL':20} ${total:10.2f}\n")

data = [
    {"product": "Widget A", "amount": 1250.00},
    {"product": "Widget B", "amount": 875.50},
    {"product": "Widget C", "amount": 2100.25},
]

# Write to disk
with open("report.txt", "w", encoding="utf-8") as f:
    generate_report(f, data)

# Write to screen
generate_report(sys.stdout, data)

# Capture in memory (for testing or email)
buffer = io.StringIO()
generate_report(buffer, data)
report_text = buffer.getvalue()
send_email(subject="Daily Report", body=report_text)
```
Interview Questions
Q1: What is the difference between 'w' and 'a' mode? When would you choose each?
Answer: 'w' mode truncates the file to zero length immediately when open() is called, then writes from the beginning. 'a' mode preserves existing content and positions the write cursor at the end - all writes append. Use 'w' when you want to replace the entire content of a file (generating reports, saving snapshots). Use 'a' for log files, event streams, or any situation where you want to add to existing content without reading it first. For critical files, avoid 'w' and use the atomic write pattern (write to temp, then os.replace()) instead.
Q2: Explain the difference between file.flush() and os.fsync(). When do you need fsync()?
Answer: flush() transfers data from Python's in-process write buffer (memory owned by the file object) to the operating system's page cache (also RAM, but managed by the kernel). After flush(), other processes can read the data, but a power failure can still lose it. os.fsync(fd) issues a system call that asks the OS to write the file's cached data to the physical storage device and waits for the device to report completion. Only after fsync() is the data expected to survive a power failure. You need fsync() when writing critical data: database transaction logs, financial records, user data in write-ahead logs. You do not need it for regenerable files like build artifacts, or for application logs where some loss is acceptable.
Q3: Why is the exclusive create mode 'x' safer than checking os.path.exists() before writing?
Answer: The check-then-create pattern has a TOCTOU (time-of-check/time-of-use) race condition. Between os.path.exists() returning False and open('w') executing, another process can create the file. Both processes then open the file in 'w' mode, truncating each other's data. The 'x' mode maps to O_CREAT | O_EXCL at the OS level - a single atomic system call that creates the file only if it does not exist. If the file exists, FileExistsError is raised. There is no window for another process to interfere. This is the correct pattern for lock file creation, cache file creation, and any scenario where exactly-once creation matters.
Q4: Describe the atomic write pattern. Why must the temp file be on the same filesystem as the target?
Answer: The atomic write pattern: (1) write new content to a temporary file in the same directory as the target; (2) fsync() the temp file for durability; (3) call os.replace(tmp_path, target_path) to atomically swap them. On POSIX systems, rename() (which os.replace() uses) is atomic at the filesystem level - it either completes or does not; there is no partial state. This means readers always see either the old complete file or the new complete file, never an empty or truncated version.
The temp file must be on the same filesystem because rename() is only atomic within one filesystem; the kernel refuses to rename across filesystems at all. If the temp file is on /tmp and the target is on /data (different mount points), os.replace() raises OSError: [Errno 18] Invalid cross-device link (on Linux). Moving data across filesystems means copying the bytes and then deleting the original, which is what shutil.move() falls back to, and that is not atomic. Always create temp files using tempfile.mkstemp(dir=target_directory).
Q5: What is io.StringIO and what are its main use cases?
Answer: io.StringIO is an in-memory file object that implements the same interface as a text file but stores data in a str buffer in RAM. It supports read(), write(), readline(), seek(), tell(), and getvalue(). Main use cases: (1) Testing - pass a StringIO to functions that accept file objects to test without disk I/O; (2) Output capture - use with contextlib.redirect_stdout to capture print() output; (3) String building - for many small write() calls, StringIO is more efficient than str concatenation because it avoids repeated string allocation; (4) Protocol generation - build formatted text in memory before sending over a network socket.
Q6: You are writing a log file from multiple threads. What issues arise and how do you address them?
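Answer: Two main issues: (1) Interleaving - if a log entry is written with several write() calls (a header, then a body), calls from other threads can land in between, so entries from different threads get mixed together. (2) Object-level thread safety - the io documentation states that binary buffered objects such as BufferedWriter protect their internal state with a lock, but TextIOWrapper objects (what open() returns in text mode) are not thread-safe. Concurrent writes through one text file object are therefore not guaranteed to be safe, even when each entry is a single write() call.

Solutions: (a) Use the logging module - its handlers acquire a per-handler lock around each record they emit, so records from different threads in one process are not mixed. (b) For custom writers, hold a threading.Lock around each complete entry. (c) Use a queue: each thread puts complete records into a queue.Queue; a single writer thread consumes the queue and writes, so all I/O happens on one thread (logging.handlers.QueueHandler and QueueListener package this pattern). (d) Across processes, open the log in 'a' mode: O_APPEND makes every write() system call land at the current end of file, so writers never overwrite each other. POSIX does not promise that a large write() to a regular file cannot interleave with another process's write (the PIPE_BUF atomicity limit applies to pipes and FIFOs, not regular files), and Python's buffering decides how f.write() calls map to system calls, so build each entry as one string and flush it in one go.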
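Option (c) can be sketched with only queue and threading. In this sketch the function names and the None stop signal are illustrative choices: each worker builds its full two-line entry before enqueueing it, and only the writer thread touches the file, so a header is always directly followed by its own body:

```python
import os
import queue
import tempfile
import threading

def writer_thread(path, q):
    """Drain the queue and write each record; None is the stop signal."""
    with open(path, "a", encoding="utf-8") as f:
        while True:
            record = q.get()
            if record is None:
                break
            f.write(record)

def worker(q, worker_id, count):
    for i in range(count):
        # Build the whole multi-line entry first, then enqueue it as one unit
        q.put(f"worker {worker_id} header {i}\nworker {worker_id} body {i}\n")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "threads.log")
    q = queue.Queue()
    writer = threading.Thread(target=writer_thread, args=(path, q))
    writer.start()

    workers = [threading.Thread(target=worker, args=(q, w, 100)) for w in range(4)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    q.put(None)  # all producers are done; tell the writer to stop
    writer.join()

    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(len(lines))  # 800
```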
Practice Challenges
Beginner - Write a Formatted Report
Write a function write_inventory_report(filepath, items) that takes a list of dicts with keys "name", "count", "price" and writes a formatted report to filepath. The report should have a header, one row per item, and a footer with the total value.
Solution
```python
import os
import tempfile

def write_inventory_report(filepath, items):
    """
    Write a formatted inventory report to filepath.
    items: list of dicts with keys 'name', 'count', 'price'
    """
    with open(filepath, "w", encoding="utf-8") as f:
        # Header (every row below is 58 characters wide)
        f.write("Inventory Report\n")
        f.write("=" * 58 + "\n")
        f.write(f"{'Item Name':25} {'Count':8} {'Price':10} {'Total':12}\n")
        f.write("-" * 58 + "\n")

        # Rows
        grand_total = 0.0
        for item in items:
            total = item["count"] * item["price"]
            grand_total += total
            f.write(
                f"{item['name']:25} {item['count']:8d} "
                f"${item['price']:9.2f} ${total:11.2f}\n"
            )

        # Footer: width 45 puts the '$' under the Total column's '$'
        f.write("=" * 58 + "\n")
        f.write(f"{'GRAND TOTAL':45} ${grand_total:11.2f}\n")

# Test it
items = [
    {"name": "Widget Alpha", "count": 50, "price": 9.99},
    {"name": "Widget Beta", "count": 12, "price": 24.99},
    {"name": "Widget Gamma", "count": 200, "price": 2.49},
]

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp_path = tmp.name  # only the name is needed; the file is rewritten below

write_inventory_report(tmp_path, items)
with open(tmp_path, "r", encoding="utf-8") as f:
    print(f.read())
os.unlink(tmp_path)
```

Output:

```
Inventory Report
==========================================================
Item Name                 Count    Price      Total
----------------------------------------------------------
Widget Alpha                    50 $     9.99 $     499.50
Widget Beta                     12 $    24.99 $     299.88
Widget Gamma                   200 $     2.49 $     498.00
==========================================================
GRAND TOTAL                                   $    1297.38
```
Intermediate - Atomic Config Update
Implement a ConfigStore class with get(key, default=None) and set(key, value) methods that persists data to JSON. The set() method must be atomic - if the process is killed during a write, the config file must not be corrupted.
Solution
```python
import json
import os
import tempfile
from pathlib import Path

class ConfigStore:
    """
    A JSON-backed key-value store with atomic writes.
    A crash during a write leaves either the old file or the new one on
    disk, never a corrupt or empty file. (Concurrent set() calls from
    several processes can still lose updates: each call reads, modifies,
    and rewrites the whole file.)
    """

    def __init__(self, path):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def _load(self):
        """Load the current config, return empty dict if file missing."""
        if not self.path.exists():
            return {}
        try:
            with open(self.path, "r", encoding="utf-8") as f:
                return json.load(f)
        except (json.JSONDecodeError, OSError):
            return {}

    def _save(self, data):
        """Atomically write data to the config file."""
        dir_path = self.path.parent
        fd, tmp_path = tempfile.mkstemp(
            dir=dir_path,
            prefix=f".{self.path.stem}_",
            suffix=".tmp",
        )
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(data, f, indent=2, sort_keys=True)
                f.write("\n")
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, self.path)
        except Exception:
            try:
                os.unlink(tmp_path)
            except OSError:
                pass
            raise

    def get(self, key, default=None):
        """Retrieve a value by key."""
        return self._load().get(key, default)

    def set(self, key, value):
        """Set a key atomically."""
        config = self._load()
        config[key] = value
        self._save(config)

    def delete(self, key):
        """Delete a key atomically."""
        config = self._load()
        config.pop(key, None)
        self._save(config)

    def all(self):
        """Return all key-value pairs."""
        return self._load()

# Test
with tempfile.TemporaryDirectory() as tmpdir:
    store = ConfigStore(os.path.join(tmpdir, "app", "config.json"))
    store.set("db_host", "localhost")
    store.set("db_port", 5432)
    store.set("debug", True)

    print(store.get("db_host"))                   # localhost
    print(store.get("missing", "default_value"))  # default_value
    print(store.all())
    # {'db_host': 'localhost', 'db_port': 5432, 'debug': True}

    store.delete("debug")
    print(store.all())
    # {'db_host': 'localhost', 'db_port': 5432}
```
Advanced - Streaming Report Generator with Multiple Output Targets
Build a ReportGenerator class that can write a sales report to multiple destinations simultaneously: a file, an in-memory buffer, and optionally gzip-compressed output.
Requirements:
- Accepts an arbitrary number of output targets (file objects or file paths)
- Streams data - does not build the entire report in memory
- Supports gzip output if the target path ends with .gz
- Uses writelines() with a generator for efficiency
Solution
```python
import gzip
import io
import os
import tempfile
from contextlib import ExitStack

class ReportGenerator:
    """
    Streams a report to multiple output destinations simultaneously.
    A generator yields one line at a time and each line goes to every
    target, so the full report is never held in memory. (With a single
    target this is equivalent to f.writelines(self._generate_lines()).)
    """

    def __init__(self, data):
        self.data = data  # iterable of row dicts

    def _open_target(self, target):
        """Return a (file_obj, should_close) tuple for a target."""
        if isinstance(target, (str, os.PathLike)):
            path = str(target)
            if path.endswith(".gz"):
                return gzip.open(path, "wt", encoding="utf-8"), True
            return open(path, "w", encoding="utf-8"), True
        # Assume it's already a file-like object owned by the caller
        return target, False

    def _generate_lines(self):
        """Generator yielding report lines one at a time."""
        yield "=" * 60 + "\n"
        yield "SALES REPORT\n"
        yield "=" * 60 + "\n"
        yield f"{'Product':25} {'Qty':6} {'Unit Price':12} {'Total':12}\n"
        yield "-" * 60 + "\n"

        grand_total = 0.0
        item_count = 0
        for row in self.data:
            total = row["qty"] * row["price"]
            grand_total += total
            item_count += 1
            yield (
                f"{row['product']:25} {row['qty']:6d} "
                f"${row['price']:11.2f} ${total:11.2f}\n"
            )

        yield "-" * 60 + "\n"
        yield f"{'Items:':25} {item_count:6d}\n"
        yield f"{'Grand Total:':39} ${grand_total:11.2f}\n"
        yield "=" * 60 + "\n"

    def write(self, *targets):
        """Write report to all targets simultaneously."""
        # ExitStack closes every file this method opened, even if opening a
        # later target or writing a line raises; caller-owned objects stay open
        with ExitStack() as stack:
            file_objects = []
            for target in targets:
                f, should_close = self._open_target(target)
                if should_close:
                    stack.enter_context(f)
                file_objects.append(f)

            # Stream each line to all targets
            for line in self._generate_lines():
                for f in file_objects:
                    f.write(line)

# --- Test ---
sales_data = [
    {"product": "Python Course", "qty": 142, "price": 99.00},
    {"product": "AI Fundamentals", "qty": 87, "price": 149.00},
    {"product": "Cloud Workshop", "qty": 34, "price": 299.00},
    {"product": "Docker Bootcamp", "qty": 201, "price": 79.00},
]

gen = ReportGenerator(sales_data)

# Write to file, compressed file, and in-memory buffer simultaneously
mem_buffer = io.StringIO()
with tempfile.TemporaryDirectory() as tmpdir:
    txt_path = os.path.join(tmpdir, "report.txt")
    gz_path = os.path.join(tmpdir, "report.txt.gz")
    gen.write(txt_path, gz_path, mem_buffer)

    # Read plain text
    with open(txt_path, "r", encoding="utf-8") as f:
        print(f.read())

    # Read gzip back and check it matches the in-memory copy
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        assert f.read() == mem_buffer.getvalue()
    print("Gzip size:", os.path.getsize(gz_path), "bytes")

# In-memory
report_text = mem_buffer.getvalue()
assert "Grand Total" in report_text
print("In-memory report:", len(report_text), "chars")
```
Output:
============================================================
SALES REPORT
============================================================
Product Qty Unit Price Total
------------------------------------------------------------
Python Course 142 $99.00 $14058.00
AI Fundamentals 87 $149.00 $12963.00
Cloud Workshop 34 $299.00 $10166.00
Docker Bootcamp 201 $79.00 $15879.00
------------------------------------------------------------
Items: 4
Grand Total: $53066.00
============================================================
Gzip size: 271 bytes
In-memory report: 393 chars
Quick Reference
| Task | Code | Notes |
|---|---|---|
| Write (replacing) | open(path, "w", encoding="utf-8") | Truncates immediately on open |
| Append to file | open(path, "a", encoding="utf-8") | Preserves content, writes at end |
| Create only if new | open(path, "x", encoding="utf-8") | Raises FileExistsError if exists |
| Write binary | open(path, "wb") | No encoding parameter |
| Write one string | f.write("text\n") | No automatic newline |
| Write many strings | f.writelines(iterable) | No automatic newlines added |
| Flush Python buffer | f.flush() | Data reaches OS page cache |
| Sync to disk | os.fsync(f.fileno()) | Durable across power failure |
| Atomic file update | write to temp + os.replace(tmp, target) | Temp must be on the same filesystem (same directory is simplest); fsync before replacing |
| Temp file (auto-delete) | tempfile.NamedTemporaryFile() | Deleted on close |
| Temp file (manual delete) | fd, path = tempfile.mkstemp() | Close with os.close(fd), delete with os.unlink(path) |
| In-memory text stream | io.StringIO() | Testing, output capture |
| In-memory binary stream | io.BytesIO() | Binary protocol building |
| UTF-8 with Excel BOM | encoding="utf-8-sig" | For CSV files opened in Excel |
| Replace bad chars | errors="replace" | Writes ? for unencodable chars |
| Temp directory | tempfile.TemporaryDirectory() | All contents deleted on exit |
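The "Atomic file update" and "Temp file (manual delete)" rows combine naturally into one helper. The sketch below is one way to do it, not a standard-library function: `atomic_write_text` is a hypothetical name. It creates the temp file with `tempfile.mkstemp()` in the target's directory so `os.replace()` stays on one filesystem, flushes and fsyncs before the rename, and deletes the temp file if anything fails. On Linux, fsyncing the containing directory after the rename is also commonly recommended for full durability; the sketch omits that step.

```python
import os
import tempfile


def atomic_write_text(path, text, encoding="utf-8"):
    """Hypothetical helper: replace `path` with `text`; readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp in the target's directory keeps the rename on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        # os.fdopen wraps the descriptor; closing the file object closes fd
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(text)
            f.flush()             # Python buffer -> OS page cache
            os.fsync(f.fileno())  # OS page cache -> storage device
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # Remove the partial temp file; the original `path` is untouched
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "config.json")
        atomic_write_text(target, '{"debug": false}\n')
        atomic_write_text(target, '{"debug": true}\n')
        with open(target, encoding="utf-8") as f:
            print(f.read(), end="")  # {"debug": true}
        print(os.listdir(tmpdir))    # ['config.json']
```

Catching `BaseException` rather than `Exception` means a `KeyboardInterrupt` during the write also cleans up the temp file before re-raising.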
Key Takeaways
- 'w' mode truncates the file at open() time, before any write() call. For critical files, use the atomic write pattern: write to a temp file in the same directory, flush and os.fsync() it, then os.replace(tmp, target).
- file.flush() moves data from Python's buffer to the OS page cache, not to disk. os.fsync() asks the OS to push it to the storage device. Only after fsync() is data durable across power failures.
- 'x' (exclusive create) mode is atomic: it maps to O_CREAT | O_EXCL and is the correct way to create a file only if it does not exist, without race conditions.
- 'a' mode opens the file with O_APPEND, so every OS-level write lands at the current end of file and cannot overwrite data another process appended. For concurrent logging from multiple processes, emit each record as a single small write and flush it: Python's buffering can split or merge write() calls, the PIPE_BUF atomicity guarantee strictly applies to pipes rather than regular files, and network filesystems such as NFS may not honor O_APPEND atomically.
- tempfile.mkstemp() creates a secure temp file (mode 0600) and returns an OS file descriptor plus its path; you must close and delete it yourself. NamedTemporaryFile() deletes the file on close by default.
- io.StringIO and io.BytesIO implement the file-object interface (read, write, seek, and so on) in memory. Use them to test functions that accept file objects without touching the filesystem.
- For memory-efficient output of large datasets, use f.writelines(generator_function()): the generator produces one line at a time, so the entire output is never held in RAM.
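The last two takeaways work well together: a function that accepts any text file object and feeds it a generator through writelines() can be tested against io.StringIO without touching disk. A minimal sketch; `csv_lines` and `write_rows` are hypothetical names, and the comma joining skips quoting (real CSV data should go through the standard csv module).

```python
import io


def csv_lines(rows):
    """Yield one comma-separated line per row (no quoting; illustration only)."""
    for row in rows:
        yield ",".join(str(value) for value in row) + "\n"


def write_rows(f, rows):
    """Write rows to any text file-like object without building a full list."""
    # writelines() pulls lines from the generator one at a time and adds no
    # newlines itself, so csv_lines() appends "\n" to each line.
    f.writelines(csv_lines(rows))


# In a test, io.StringIO stands in for a real file
buffer = io.StringIO()
write_rows(buffer, [("id", "name"), (1, "alpha"), (2, "beta")])
print(buffer.getvalue(), end="")
# id,name
# 1,alpha
# 2,beta
```

In production the same function receives a real file, e.g. `with open(path, "w", encoding="utf-8", newline="") as f: write_rows(f, rows)`, and the test needs no temp directory at all.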
