Writing Files - Modes, Atomicity, and Safe File Updates

Reading time: ~17 minutes | Level: Foundation → Engineering

Here is a scenario that has caused production outages:

# Updating a configuration file
with open("config.json", "w") as f:
    f.write(json.dumps(config))
    # Program crashes here (power cut, SIGKILL, disk full)

# Result: config.json is now empty or half-written.
# The server reads it on startup → crashes → you get a 3 AM page.

'w' mode truncates the file before writing. If your write is interrupted, the old content is already gone and the new content is incomplete or missing. Neither version survives.

Production systems use a different pattern - write to a temporary file, then rename it atomically. This page explains why, and every other piece of file-writing knowledge you need to write robust code.

What You Will Learn

All write modes: 'w' (truncate), 'a' (append), 'x' (exclusive create), 'w+', 'a+'
write() vs writelines() - behavioral differences and when to use each
Flushing and syncing: what file.flush() does vs os.fsync() - the gap between them
How write buffering can silently lose data on crash
The atomic write pattern: write-to-temp-then-rename for crash-safe updates
Writing binary data: 'wb' mode, struct module, encoding in binary contexts
Exclusive create mode 'x': race-condition-free file creation
The tempfile module: NamedTemporaryFile, mkstemp for safe temporary file creation
io.StringIO and io.BytesIO: in-memory file objects for testing
Real-world patterns: safe config updates, atomic log rotation, report generation

Prerequisites

Understanding of open() and file modes from the Reading Files section
Basic understanding of Python exceptions and with statements
Familiarity with os module basics

Mental Model: What Happens When You Write

The path from your Python f.write("data") call to bits on disk is longer than you think:

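Layer 1: Python's in-process buffer → Layer 2: OS page cache (RAM) → Layer 3: storage device

A power failure before the data reaches Layer 3 means data loss.

f.flush() moves data from Layer 1 (Python's buffer) to Layer 2 (the OS page cache). os.fsync() moves data from Layer 2 to Layer 3 (the storage device). Only after fsync() returns is the data truly durable.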
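A small experiment makes the first hop visible. This is a minimal sketch that assumes default buffering and a payload far smaller than the buffer; the file name demo.log and the temporary directory are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.log")   # illustrative file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("hello\n")
        # A small write normally stays in Python's buffer (Layer 1),
        # so the OS does not know about these bytes yet
        size_before_flush = os.path.getsize(path)

        f.flush()                 # Layer 1 -> Layer 2 (OS page cache)
        size_after_flush = os.path.getsize(path)

        os.fsync(f.fileno())      # Layer 2 -> Layer 3 (storage device)

print(size_before_flush, size_after_flush)   # on Linux this typically prints: 0 6
```

The size reported by the OS only changes after flush(), which is why a crash before that point can lose data that your code already "wrote".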

Part 1 - Write Modes in Depth

All Write Modes at a Glance

Mode	File Exists	File Missing	After `open()`
`'w'`	Truncate	Create new	Position at 0 (empty file)
`'a'`	Preserve	Create new	Position at end
`'x'`	Raise Error	Create new	Position at 0 (new file)
`'w+'`	Truncate	Create new	Read+write, position at 0
`'a+'`	Preserve	Create new	Read+write, writes at end
`'r+'`	Preserve	Raise Error	Read+write, position at 0
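The table can be checked with a short experiment. The sketch below uses a hypothetical helper, try_mode, that opens a file already containing "old content" with a given mode, writes "new", and reports what ends up on disk.

```python
import os
import tempfile

def try_mode(mode):
    """Open an existing file with `mode`, write 'new', return the final content."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "sample.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("old content")
        try:
            with open(path, mode, encoding="utf-8") as f:
                f.write("new")
        except FileExistsError:
            return "FileExistsError"
        with open(path, "r", encoding="utf-8") as f:
            return f.read()

for mode in ("w", "a", "x", "r+"):
    print(f"{mode!r:5} -> {try_mode(mode)!r}")
# 'w'   -> 'new'               (truncated, then written)
# 'a'   -> 'old contentnew'    (appended)
# 'x'   -> 'FileExistsError'   (refused, file untouched)
# 'r+'  -> 'new content'       (overwritten in place from position 0)
```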

`'w'` - Write (Truncate)

# 'w' creates the file if missing, TRUNCATES IT if it exists
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("First line\n")
    f.write("Second line\n")

# File now contains exactly "First line\nSecond line\n"
# Whatever was there before is gone

:::danger 'w' mode truncates immediately
The truncation happens when open() is called, not when write() is called. So:

f = open("important.txt", "w")   # file is now EMPTY at this point
# crash here - important.txt is now empty forever
f.write("data")
f.close()

For files you cannot afford to lose, write to a temp file first, then rename. See Part 5 for the pattern.
:::

`'a'` - Append

# 'a' always writes to the end - existing content is preserved
with open("access.log", "a", encoding="utf-8") as f:
    f.write("2024-01-15 14:32:01 GET /api/users 200\n")

# Subsequent writes append to the end
with open("access.log", "a", encoding="utf-8") as f:
    f.write("2024-01-15 14:32:02 POST /api/login 401\n")

'a' mode is safe for appending to logs because:

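On POSIX systems, O_APPEND makes "seek to the end, then write" a single atomic step for each write() system call, so appends from multiple processes to a local file do not overwrite each other. Python's buffer decides when that system call happens, so write each record in one call and flush it promptly if several processes share the file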
Existing content is never truncated

However, 'a' mode cannot be used to update existing content - writes always go to the end regardless of f.seek() calls.
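A quick way to confirm this on a POSIX system is to seek to the start of an append-mode file and write anyway. A minimal sketch with an illustrative file name:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "append.txt")   # illustrative file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("AAAA\n")

    with open(path, "a", encoding="utf-8") as f:
        f.seek(0)            # moves the file position...
        f.write("BBBB\n")    # ...but the write still lands at the end

    with open(path, "r", encoding="utf-8") as f:
        content = f.read()

print(repr(content))   # 'AAAA\nBBBB\n'
```

To overwrite existing content in place, open the file with 'r+' instead.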

`'x'` - Exclusive Create

# 'x' fails if the file already exists - race-condition-free creation
try:
    with open("config.lock", "x", encoding="utf-8") as f:
        f.write(str(os.getpid()))
    print("Lock acquired")
except FileExistsError:
    print("Another process holds the lock")

'x' maps to O_CREAT | O_EXCL at the OS level - a single atomic operation. Without 'x', a check-then-create pattern (if not os.path.exists() then open('w')) has a time-of-check/time-of-use (TOCTOU) race condition between the check and the create.

# WRONG - race condition (TOCTOU):
if not os.path.exists("lockfile"):
    # Another process could create it here!
    with open("lockfile", "w") as f:
        f.write("locked")

# CORRECT - atomic:
try:
    with open("lockfile", "x") as f:
        f.write("locked")
except FileExistsError:
    pass   # already locked

Part 2 - `write()` vs `writelines()`

`write(string)` - Write a String

with open("data.txt", "w", encoding="utf-8") as f:
    f.write("Hello, World!\n")    # returns 14 (number of chars written)
    f.write("Second line\n")
    # No automatic newline added - you must include it yourself

write() returns the number of characters (text mode) or bytes (binary mode) written. This is almost always the full length - partial writes are an edge case in network I/O but not in regular file I/O.
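The difference between characters and bytes is easy to miss with non-ASCII text. The sketch below, using an illustrative file name, compares the return value of write() with the size on disk.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "count.txt")   # illustrative file name

    # Text mode: the return value counts characters
    with open(path, "w", encoding="utf-8") as f:
        chars_written = f.write("café")
    bytes_on_disk = os.path.getsize(path)       # 'é' takes 2 bytes in UTF-8

    # Binary mode: the return value counts bytes
    with open(path, "wb") as f:
        bytes_written = f.write("café".encode("utf-8"))

print(chars_written, bytes_on_disk, bytes_written)   # 4 5 5
```

Do not use the text-mode return value to compute file offsets or sizes.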

`writelines(iterable)` - Write an Iterable of Strings

lines = ["Line 1\n", "Line 2\n", "Line 3\n"]

with open("data.txt", "w", encoding="utf-8") as f:
    f.writelines(lines)
# File: "Line 1\nLine 2\nLine 3\n"

writelines() is a convenience method. It calls write() for each item. Crucially, it does not add newlines - that is your responsibility.

# Common mistake: forgetting newlines with writelines
words = ["apple", "banana", "cherry"]

# Wrong - all on one line:
with open("words.txt", "w", encoding="utf-8") as f:
    f.writelines(words)
# File: "applebananacherry"

# Correct - add newlines:
with open("words.txt", "w", encoding="utf-8") as f:
    f.writelines(word + "\n" for word in words)
# File: "apple\nbanana\ncherry\n"

Performance: `write()` vs `writelines()` vs `join()`+`write()`

import timeit, io

lines = [f"line {i}\n" for i in range(100000)]

# Option 1: repeated write()
def method_write():
    with io.StringIO() as f:
        for line in lines:
            f.write(line)

# Option 2: writelines()
def method_writelines():
    with io.StringIO() as f:
        f.writelines(lines)

# Option 3: join then single write()
def method_join():
    with io.StringIO() as f:
        f.write("".join(lines))

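for fn in (method_write, method_writelines, method_join):
    print(f"{fn.__name__:20} {timeit.timeit(fn, number=10):.3f}s")

# All three are fast due to buffering; join() builds one large string first, so it uses more peak RAM
# For very large iterables, writelines() with a generator is most memory-efficient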

:::tip For large outputs, use writelines() with a generator

def generate_report_lines(data):
    yield "Report Header\n"
    yield "=" * 40 + "\n"
    for item in data:
        yield f"{item['name']:30} {item['value']:10.2f}\n"
    yield "=" * 40 + "\n"
    yield f"Total: {sum(d['value'] for d in data):.2f}\n"

# dataset is assumed to be a list of dicts with 'name' and 'value' keys
with open("report.txt", "w", encoding="utf-8") as f:
    f.writelines(generate_report_lines(dataset))
# Memory: O(1) per line, not O(len(dataset))

:::

Part 3 - Flushing and Syncing: The Full Story

`file.flush()` - Python Buffer to OS Buffer

with open("realtime.log", "w", encoding="utf-8", buffering=1) as f:
    # buffering=1 means line-buffered in text mode
    for event in event_stream:
        f.write(f"{event}\n")
        f.flush()   # ensures this line is in the OS buffer immediately
        # Another process can now read this line from the file

flush() sends Python's internal buffer to the operating system's page cache. After flush(), another process reading the file will see the data. However, the data is still in RAM (the OS page cache) - not necessarily on disk. A power failure can still lose it.

`os.fsync()` - OS Buffer to Disk

import os

with open("critical.db", "w", encoding="utf-8") as f:
    f.write(serialized_data)
    f.flush()              # flush Python buffer to OS
    os.fsync(f.fileno())   # force OS to write to physical disk
# After fsync(), the data survives a power failure

os.fsync() issues an fsync(2) system call. This tells the OS to flush its page cache for that file to the storage device and wait until the device confirms the write. It is slow (typically milliseconds) but it is the step that makes the data durable.

Use fsync() for: database write-ahead logs (WAL), financial transactions, any data where loss = correctness violation.

Skip fsync() for: temporary files, generated reports (recreatable), logs (some loss acceptable), development/test output.

`os.fdatasync()` - Faster fsync Without Metadata

import os

# fdatasync() syncs the data blocks but skips metadata that is not needed
# to read the data back (timestamps, etc.), so it can be faster than fsync()
with open("data.bin", "wb") as f:
    f.write(data)
    f.flush()
    os.fdatasync(f.fileno())   # Unix only, not available on macOS; use fsync() on macOS/Windows
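Because os.fdatasync() is missing on some platforms, portable code usually needs a fallback. One possible sketch, where sync_data and write_durably are hypothetical names:

```python
import os
import tempfile

# Use fdatasync() where the os module exposes it, otherwise fall back to fsync()
sync_data = getattr(os, "fdatasync", os.fsync)

def write_durably(path, payload):
    """Write bytes and push them to the storage device (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()                 # Python buffer -> OS page cache
        sync_data(f.fileno())     # OS page cache -> storage device

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.bin")   # illustrative file name
    write_durably(path, b"\x00\x01\x02")
    with open(path, "rb") as f:
        round_trip = f.read()

print(round_trip)   # b'\x00\x01\x02'
```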

Part 4 - Writing Binary Data

Binary Write Mode `'wb'`

# Write raw bytes
with open("data.bin", "wb") as f:
    data = b"\x89PNG\r\n\x1a\n"   # PNG file signature
    f.write(data)

# You CANNOT write str to a binary file
with open("data.bin", "wb") as f:
    f.write("hello")   # TypeError: a bytes-like object is required, not 'str'

`struct` Module for Binary Format Writing

import struct

# Pack binary data into a fixed-format byte sequence
# Format string: '<' = little-endian, 'I' = uint32, 'f' = float32, '10s' = 10-byte string
RECORD_FORMAT = "<If10s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)   # 18 bytes per record

def write_binary_records(filepath, records):
    """Write a binary database file with fixed-size records."""
    with open(filepath, "wb") as f:
        for record_id, value, name in records:
            name_bytes = name.encode("utf-8")[:10].ljust(10, b"\x00")
            packed = struct.pack(RECORD_FORMAT, record_id, value, name_bytes)
            f.write(packed)

def read_binary_records(filepath):
    """Read all records from the binary file."""
    records = []
    with open(filepath, "rb") as f:
        while True:
            raw = f.read(RECORD_SIZE)
            if len(raw) < RECORD_SIZE:
                break
            record_id, value, name_bytes = struct.unpack(RECORD_FORMAT, raw)
            name = name_bytes.rstrip(b"\x00").decode("utf-8")
            records.append((record_id, value, name))
    return records

# Usage
data = [(1, 98.6, "Alice"), (2, 37.2, "Bob"), (3, 102.1, "Charlie")]
write_binary_records("/tmp/patients.bin", data)
recovered = read_binary_records("/tmp/patients.bin")
print(recovered)
# [(1, 98.5999..., 'Alice'), (2, 37.2000..., 'Bob'), (3, 102.0999..., 'Charlie')]
# The values drift slightly because 'f' stores a 32-bit float
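One caveat with the fixed 10-byte name field: slicing encoded bytes with [:10] can cut a multi-byte UTF-8 character in half, and the reader's decode("utf-8") would then raise UnicodeDecodeError. A possible workaround is sketched below; fit_utf8 is a hypothetical helper, not part of the struct module.

```python
def fit_utf8(text, size):
    """Encode text as UTF-8, truncated to at most `size` bytes without
    splitting a character, then pad with NUL bytes to exactly `size`."""
    truncated = text.encode("utf-8")[:size]
    # errors="ignore" drops the incomplete trailing character, if any
    safe = truncated.decode("utf-8", errors="ignore").encode("utf-8")
    return safe.ljust(size, b"\x00")

name = "a" + "é" * 5                     # 1 + 5*2 = 11 bytes in UTF-8
naive = name.encode("utf-8")[:10]        # ends in the middle of the last 'é'
field = fit_utf8(name, 10)

print(len(field))                                 # 10
print(field.rstrip(b"\x00").decode("utf-8"))      # aéééé
```

The trade-off is that a name may lose one more character than strictly necessary, in exchange for a field that always decodes cleanly.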

Encoding in Binary Contexts

# Explicitly encode before writing in binary mode
text = "Hello, café!"
encoded = text.encode("utf-8")    # b'Hello, caf\xc3\xa9!'

with open("output.bin", "wb") as f:
    # Write a length-prefixed string (common binary protocol pattern)
    length = len(encoded)
    f.write(struct.pack("<I", length))   # 4-byte little-endian length
    f.write(encoded)                     # the string bytes

# Read it back
with open("output.bin", "rb") as f:
    length_bytes = f.read(4)
    length = struct.unpack("<I", length_bytes)[0]
    data = f.read(length).decode("utf-8")
    print(data)   # Hello, café!

Part 5 - Atomic Writes: The Critical Pattern

The Problem with Direct `'w'` Mode Updates

import json

# DANGEROUS for critical files:
def update_config(path, new_config):
    with open(path, "w", encoding="utf-8") as f:  # TRUNCATES HERE
        json.dump(new_config, f, indent=2)         # if crash here → empty or partial file

If the process is killed between open() and json.dump(), the file is empty. If json.dump() writes half the data and the disk fills up, you have a corrupt JSON file.
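You can reproduce the failure without pulling a power cord. The sketch below simulates an interrupted write by making json.dump() fail partway through: the Unserializable class is a stand-in for the crash, and the file names are illustrative.

```python
import json
import os
import tempfile

class Unserializable:
    """Stand-in for a mid-write failure: json cannot encode this object."""

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "config.json")

    # A good config is already on disk
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"port": 5432}, f)

    # The update fails after open() has already truncated the file
    try:
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"port": 5433, "handler": Unserializable()}, f)
    except TypeError:
        pass

    with open(path, "r", encoding="utf-8") as f:
        leftover = f.read()

    try:
        json.loads(leftover)
        still_valid = True
    except json.JSONDecodeError:
        still_valid = False

print(still_valid)   # False - the old config is gone and the new one is unusable
```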

The Atomic Write Pattern

On POSIX systems (Linux, macOS), renaming a file within one filesystem is atomic. os.rename() and os.replace() both map to a single rename operation that either succeeds completely or fails completely. There is no intermediate state.

import os
import json
import tempfile

def atomic_write_json(path, data):
    """
    Write JSON data to path atomically.
    The old file is always intact until the new one is complete.
    """
    dir_path = os.path.dirname(os.path.abspath(path))

    # Step 1: Write to a temp file in the SAME directory
    # (same directory = same filesystem = rename is atomic)
    fd, tmp_path = tempfile.mkstemp(dir=dir_path, suffix=".tmp")

    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())   # ensure data is on disk before rename

        # Step 2: Atomically replace the target file
        os.replace(tmp_path, path)   # atomic on POSIX; overwrites on Windows

    except Exception:
        # Clean up temp file on failure
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise

# Usage
config = {"database": {"host": "localhost", "port": 5432}}
atomic_write_json("/etc/myapp/config.json", config)
# Either the old config.json exists, or the new one does. Never empty.

At no point does a reader see an empty or partial config.json.

:::warning Same filesystem requirement
The temp file and target must be on the same filesystem for os.replace() to work atomically. Using /tmp when your target is on a different partition breaks the pattern: a rename cannot cross filesystems, so os.replace() raises OSError, and any fallback that copies the bytes and then deletes the source is not atomic.

Always create the temp file in the same directory as the target:

dir_path = os.path.dirname(os.path.abspath(target_path))
fd, tmp = tempfile.mkstemp(dir=dir_path)

:::
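If you want to verify the requirement at runtime, comparing device IDs is a rough check. This is a sketch only: same_filesystem is a hypothetical helper, and matching st_dev values are a strong hint rather than a guarantee on every setup.

```python
import os
import tempfile

def same_filesystem(path_a, path_b):
    """Return True if both existing paths report the same device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev

with tempfile.TemporaryDirectory() as target_dir:
    # Temp file created next to the (future) target, as recommended above
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    os.close(fd)
    on_same_fs = same_filesystem(tmp_path, target_dir)
    os.unlink(tmp_path)

print(on_same_fs)   # True
```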

`os.replace()` vs `os.rename()`

# os.rename()  - atomic on POSIX within one filesystem; on Windows it raises FileExistsError if the target exists
# os.replace() - overwrites an existing target on every platform (Python 3.3+); atomic on POSIX within one filesystem
# Use os.replace() for cross-platform replacement
os.replace(tmp_path, target_path)
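One step this page's pattern leaves out, offered here as an optional addition rather than part of the original recipe: on Linux, fsync() on a file does not necessarily persist the directory entry that names it, so stricter setups also sync the parent directory after the rename. A POSIX-only sketch, where fsync_directory is a hypothetical helper:

```python
import os
import tempfile

def fsync_directory(dir_path):
    """Ask the OS to persist directory entries (POSIX only; hypothetical helper)."""
    fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

# Usage: replace a file, then sync the directory that holds it
with tempfile.TemporaryDirectory() as tmpdir:
    target_path = os.path.join(tmpdir, "config.json")
    fd, tmp_path = tempfile.mkstemp(dir=tmpdir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("{}\n")
        f.flush()
        os.fsync(f.fileno())          # file contents are durable
    os.replace(tmp_path, target_path)
    fsync_directory(tmpdir)           # the rename itself is durable
    with open(target_path, encoding="utf-8") as f:
        replaced_content = f.read()
```

Whether the extra call is worth its cost depends on how much a lost rename would hurt; for most config files the basic pattern is already a large improvement over 'w'.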

Part 6 - The `tempfile` Module

`tempfile.mkstemp()` - Low-Level Temp File Creation

import tempfile, os

# mkstemp returns (fd, path) - fd is an OS file descriptor (integer)
fd, path = tempfile.mkstemp(
    suffix=".txt",     # file extension
    prefix="report_",  # prefix for temp name
    dir="/tmp",        # directory for temp file
    text=True          # open in text mode (default is binary)
)

try:
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("temporary content")
    # Process the temp file
    process(path)
finally:
    os.unlink(path)   # MUST delete manually - mkstemp does not clean up

mkstemp() creates the file and returns an open file descriptor. The file is created with mode 0600 (owner read/write only) for security - no other user can read it.
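On a POSIX system you can confirm the permissions directly. A minimal sketch:

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
try:
    # Keep only the permission bits of the file mode
    mode = stat.S_IMODE(os.stat(path).st_mode)
    print(oct(mode))   # 0o600 on POSIX systems
finally:
    os.close(fd)       # mkstemp hands you an open descriptor; close it
    os.unlink(path)    # and remove the file yourself
```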

`tempfile.NamedTemporaryFile()` - Context-Managed Temp Files

import os
import tempfile

# NamedTemporaryFile: auto-deleted when closed (or context exits)
with tempfile.NamedTemporaryFile(
    mode="w",
    suffix=".csv",
    encoding="utf-8",
    delete=True   # default: delete on close
) as tmp:
    tmp.write("id,name,value\n")
    tmp.write("1,Alice,100\n")
    tmp_path = tmp.name

    # File exists while context is open
    print(os.path.exists(tmp_path))   # True

# File is automatically deleted here
print(os.path.exists(tmp_path))   # False

:::note Windows caveat with NamedTemporaryFile
On Windows, a NamedTemporaryFile cannot be opened a second time by name while it is still open (due to Windows file locking). For the atomic write pattern on Windows, use mkstemp(), or NamedTemporaryFile(delete=False), and manage cleanup manually.
:::
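A sketch of the delete=False variant, which closes the file before anything else opens it and therefore behaves the same on Windows and POSIX. The CSV content is illustrative.

```python
import os
import tempfile

# delete=False keeps the file after the context exits, so it can be reopened by name
with tempfile.NamedTemporaryFile(
    mode="w", suffix=".csv", encoding="utf-8", delete=False
) as tmp:
    tmp.write("id,name\n1,Alice\n")
    tmp_path = tmp.name

try:
    # The file is closed now, so reopening it works on every platform
    with open(tmp_path, "r", encoding="utf-8") as f:
        content = f.read()
finally:
    os.unlink(tmp_path)   # cleanup is your job with delete=False

print(content)
```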

`tempfile.TemporaryDirectory()` - Temp Directory

import json, os, tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Create files inside the temp directory
    config_path = os.path.join(tmpdir, "config.json")
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump({"key": "value"}, f)

    data_path = os.path.join(tmpdir, "data.csv")
    with open(data_path, "w", encoding="utf-8") as f:
        f.write("id,value\n1,100\n")

    # Do processing with the temp files
    result = process_directory(tmpdir)

# Entire directory and all contents deleted here - recursive

Part 7 - In-Memory File Objects: `io.StringIO` and `io.BytesIO`

`io.StringIO` - In-Memory Text Stream

import io

# StringIO behaves exactly like a text file, but lives in RAM
buffer = io.StringIO()
buffer.write("Line 1\n")
buffer.write("Line 2\n")

# Read it back
buffer.seek(0)
content = buffer.read()
print(content)   # "Line 1\nLine 2\n"

# Get the full string value
value = buffer.getvalue()   # works without seeking
buffer.close()

Primary use cases:

Testing - pass a StringIO instead of a real file to functions that accept file objects
Building strings - faster than string concatenation for many small writes (no repeated allocation)
Capturing output from functions that write to file objects

# Testing a function that writes to a file
import io

def write_report(fileobj, data):
    fileobj.write("=== Report ===\n")
    for k, v in data.items():
        fileobj.write(f"{k}: {v}\n")

# In production:
with open("report.txt", "w", encoding="utf-8") as f:
    write_report(f, {"users": 42, "revenue": 1000})

# In tests - no disk I/O:
output = io.StringIO()
write_report(output, {"users": 42, "revenue": 1000})
assert "users: 42" in output.getvalue()

# Capture print() output
from contextlib import redirect_stdout

captured = io.StringIO()
with redirect_stdout(captured):
    print("hello")
    print("world")

text = captured.getvalue()   # "hello\nworld\n"

`io.BytesIO` - In-Memory Binary Stream

import io, struct

# Build a binary protocol message in memory
buf = io.BytesIO()
buf.write(b"\x89PNG\r\n\x1a\n")       # PNG signature
buf.write(struct.pack(">I", 13))       # chunk length
buf.write(b"IHDR")                     # chunk type

raw_bytes = buf.getvalue()
print(len(raw_bytes))   # 16
print(raw_bytes[:4])    # b'\x89PNG'

# Testing image processing without disk I/O
import io
from PIL import Image   # pip install Pillow

def make_thumbnail(image_bytes, max_size=(128, 128)):
    """Create a thumbnail from image bytes, return thumbnail bytes."""
    with Image.open(io.BytesIO(image_bytes)) as img:
        img.thumbnail(max_size)
        output = io.BytesIO()
        img.save(output, format="PNG")
        return output.getvalue()

# No temp files needed - pure in-memory processing

Part 8 - Encoding for Writing

Always Specify Encoding

# Dangerous - platform-dependent behavior:
with open("output.txt", "w") as f:
    f.write("café")   # might corrupt on Windows with non-UTF-8 locale

# Correct:
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("café")   # always writes b'caf\xc3\xa9'
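To see why the encoding matters, compare the bytes different encodings produce for the same text, and what happens when a reader guesses wrong. A minimal sketch:

```python
text = "café"

as_utf8 = text.encode("utf-8")       # b'caf\xc3\xa9' - 5 bytes
as_cp1252 = text.encode("cp1252")    # b'caf\xe9'     - 4 bytes

# UTF-8 bytes misread as cp1252 produce classic mojibake
misread = as_utf8.decode("cp1252")

print(as_utf8, as_cp1252, misread)   # b'caf\xc3\xa9' b'caf\xe9' cafÃ©
```

The same text yields different bytes, so writer and reader must agree on the encoding; naming it explicitly on both sides removes the guesswork.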

UTF-8 with BOM (`utf-8-sig`) for Excel Compatibility

# Excel expects a UTF-8 BOM to recognize UTF-8 CSV files correctly
with open("report.csv", "w", encoding="utf-8-sig", newline="") as f:
    import csv
    writer = csv.writer(f)
    writer.writerow(["Name", "Price", "Description"])
    writer.writerow(["Müsli", 3.99, "German breakfast cereal"])
    writer.writerow(["Café au lait", 4.50, "French coffee drink"])

# Excel will display the umlauts and accents correctly
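The BOM is just three extra bytes at the start of the file. The sketch below writes with utf-8-sig, inspects the raw bytes, and reads the text back; the file name is illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "report.csv")   # illustrative file name

    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write("Name\nMüsli\n")

    with open(path, "rb") as f:
        raw = f.read()

    # Reading with utf-8-sig strips the BOM again
    with open(path, "r", encoding="utf-8-sig") as f:
        text = f.read()

print(raw[:3])      # b'\xef\xbb\xbf'
print(repr(text))   # 'Name\nMüsli\n'
```

If you read such a file with plain utf-8, the BOM shows up as '\ufeff' at the start of the first field, so read with utf-8-sig when a BOM may be present.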

Errors Parameter for Writing

# 'strict' (default) - raises UnicodeEncodeError if char can't be encoded
with open("ascii_only.txt", "w", encoding="ascii") as f:
    f.write("café")   # raises UnicodeEncodeError: 'é' can't encode in ascii

# 'replace' - replace unencodable chars with '?'
with open("ascii_only.txt", "w", encoding="ascii", errors="replace") as f:
    f.write("café")   # writes "caf?" - data lost but no error

# 'ignore' - drop unencodable chars silently
with open("ascii_only.txt", "w", encoding="ascii", errors="ignore") as f:
    f.write("café")   # writes "caf" - silent loss

# 'xmlcharrefreplace' - encode as XML character references
with open("xml_safe.txt", "w", encoding="ascii", errors="xmlcharrefreplace") as f:
    f.write("café")   # writes "caf&#233;"

# 'backslashreplace' - encode as Python escape sequences
with open("escaped.txt", "w", encoding="ascii", errors="backslashreplace") as f:
    f.write("café")   # writes "caf\xe9"

Part 9 - Real-World Patterns

Pattern 1: Safe Config File Update

import json, os, tempfile
from pathlib import Path

class ConfigManager:
    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        if not self.path.exists():
            return {}
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def save(self, config):
        """Atomically save config - old file intact if save fails."""
        dir_path = self.path.parent
        dir_path.mkdir(parents=True, exist_ok=True)

        fd, tmp_path = tempfile.mkstemp(
            dir=dir_path,
            suffix=".tmp",
            prefix=f".{self.path.name}."
        )
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(config, f, indent=2, sort_keys=True)
                f.write("\n")   # POSIX convention: files end with newline
                f.flush()
                os.fsync(f.fileno())

            os.replace(tmp_path, self.path)
        except Exception:
            try:
                os.unlink(tmp_path)
            except OSError:
                pass
            raise

    def update(self, **kwargs):
        """Update specific keys in the config."""
        config = self.load()
        config.update(kwargs)
        self.save(config)

Pattern 2: Rotating Log Writer

import os
from datetime import datetime

class RotatingWriter:
    """Write to date-stamped log files, one per day."""

    def __init__(self, log_dir, prefix="app"):
        self.log_dir = log_dir
        self.prefix = prefix
        self._current_file = None
        self._current_date = None
        os.makedirs(log_dir, exist_ok=True)

    def _get_path(self, day):
        return os.path.join(self.log_dir, f"{self.prefix}-{day:%Y-%m-%d}.log")

    def write(self, message):
        now = datetime.now()
        today = now.date()
        # Reopen on a date change, and also after close() has been called
        if self._current_file is None or today != self._current_date:
            if self._current_file:
                self._current_file.close()
            self._current_file = open(
                self._get_path(today), "a", encoding="utf-8", buffering=1
            )
            self._current_date = today

        timestamp = now.strftime("%Y-%m-%d %H:%M:%S")
        self._current_file.write(f"[{timestamp}] {message}\n")

    def close(self):
        if self._current_file:
            self._current_file.close()
            self._current_file = None

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()


# Usage
with RotatingWriter("/var/log/myapp") as logger:
    logger.write("Application started")
    logger.write("Processing batch 1")

Pattern 3: Generating a Report to Multiple Destinations

import io, sys

def generate_report(output, data):
    """Write a report to any file-like object."""
    output.write("Sales Report\n")
    output.write("=" * 40 + "\n")
    total = 0
    for item in data:
        line = f"{item['product']:20} ${item['amount']:10.2f}\n"
        output.write(line)
        total += item["amount"]
    output.write("-" * 40 + "\n")
    output.write(f"{'TOTAL':20} ${total:10.2f}\n")

data = [
    {"product": "Widget A", "amount": 1250.00},
    {"product": "Widget B", "amount": 875.50},
    {"product": "Widget C", "amount": 2100.25},
]

# Write to disk
with open("report.txt", "w", encoding="utf-8") as f:
    generate_report(f, data)

# Write to screen
generate_report(sys.stdout, data)

# Capture in memory (for testing or email)
buffer = io.StringIO()
generate_report(buffer, data)
report_text = buffer.getvalue()
send_email(subject="Daily Report", body=report_text)   # send_email is a placeholder for your own mail helper

Interview Questions

Q1: What is the difference between `'w'` and `'a'` mode? When would you choose each?

Answer: 'w' mode truncates the file to zero length immediately when open() is called, then writes from the beginning. 'a' mode preserves existing content and positions the write cursor at the end - all writes append. Use 'w' when you want to replace the entire content of a file (generating reports, saving snapshots). Use 'a' for log files, event streams, or any situation where you want to add to existing content without reading it first. For critical files, avoid 'w' and use the atomic write pattern (write to temp, then os.replace()) instead.

Q2: Explain the difference between `file.flush()` and `os.fsync()`. When do you need `fsync()`?

Answer: flush() transfers data from Python's in-process write buffer (a Python bytes object in RAM) to the operating system's page cache (also RAM, but managed by the OS). After flush(), other processes can read the data, but a power failure can still lose it. os.fsync(fd) issues a system call that forces the OS to write its page cache to the physical storage device and waits for the device to confirm. Only after fsync() is the data durable across power failures. You need fsync() when writing critical data: database transaction logs, financial records, user data in write-ahead logs. You do not need it for regeneratable files like build artifacts or application logs where some loss is acceptable.

Q3: Why is the exclusive create mode `'x'` safer than checking `os.path.exists()` before writing?

Answer: The check-then-create pattern has a TOCTOU (time-of-check/time-of-use) race condition. Between os.path.exists() returning False and open('w') executing, another process can create the file. Both processes then open the file in 'w' mode, truncating each other's data. The 'x' mode maps to O_CREAT | O_EXCL at the OS level - a single atomic system call that creates the file only if it does not exist. If the file exists, FileExistsError is raised. There is no window for another process to interfere. This is the correct pattern for lock file creation, cache file creation, and any scenario where exactly-once creation matters.

Q4: Describe the atomic write pattern. Why must the temp file be on the same filesystem as the target?

Answer: The atomic write pattern: (1) write new content to a temporary file in the same directory as the target; (2) fsync() the temp file for durability; (3) call os.replace(tmp_path, target_path) to atomically swap them. On POSIX systems, rename() (which os.replace() uses) is atomic at the filesystem level - it either completes or does not; there is no partial state. This means readers always see either the old complete file or the new complete file, never an empty or truncated version.

The temp file must be on the same filesystem because rename() only works within one filesystem. Moving a file across filesystems means physically copying bytes (which is not atomic) and then deleting the original. If the temp file is on /tmp and the target is on /data (different mount points), os.replace() raises OSError: [Errno 18] Invalid cross-device link. Helpers such as shutil.move() fall back to a non-atomic copy instead of failing. Always create temp files using tempfile.mkstemp(dir=target_directory).

Q5: What is `io.StringIO` and what are its main use cases?

Answer: io.StringIO is an in-memory file object that implements the same interface as a text file but stores data in a str buffer in RAM. It supports read(), write(), readline(), seek(), tell(), and getvalue(). Main use cases: (1) Testing - pass a StringIO to functions that accept file objects to test without disk I/O; (2) Output capture - use with contextlib.redirect_stdout to capture print() output; (3) String building - for many small write() calls, StringIO is more efficient than str concatenation because it avoids repeated string allocation; (4) Protocol generation - build formatted text in memory before sending over a network socket.

Q6: You are writing a log file from multiple threads. What issues arise and how do you address them?

Answer: Two main issues: (1) Interleaving within a call - Python's text file objects (TextIOWrapper) are documented as not thread-safe, so concurrent f.write() calls on the same object should not be relied on to stay intact. In CPython the GIL means a single small write() usually lands in one piece, but that is an implementation detail rather than a guarantee. (2) Interleaving between calls - when one log entry takes several writes (write header, write body), another thread can run between them, so entries from different threads get mixed.

Solutions: (a) Use the logging module - the standard library logging handlers are thread-safe; they use a lock internally. (b) For custom writers, use a threading.Lock around write operations. (c) Use a queue: each thread puts log records into a queue.Queue; a single writer thread consumes the queue and writes - serializing all I/O to one thread. (d) For append-only logs shared between processes, 'a' mode with O_APPEND makes each write() system call land at the current end of the file on local POSIX filesystems, so emit each record as one small write. The PIPE_BUF limit (4096 bytes on Linux) is a POSIX guarantee about pipes and FIFOs, not regular files, so do not treat it as a size bound here.
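Solution (b) can be sketched in a few lines. LockedWriter is a hypothetical class, and the StringIO target stands in for a real log file so the example runs without disk I/O.

```python
import io
import threading

class LockedWriter:
    """Serialize multi-line log entries with a lock (hypothetical helper)."""

    def __init__(self, fileobj):
        self._file = fileobj
        self._lock = threading.Lock()

    def write_entry(self, *lines):
        # Hold the lock for the whole entry so header and body stay together
        with self._lock:
            for line in lines:
                self._file.write(line + "\n")

buffer = io.StringIO()
writer = LockedWriter(buffer)

def worker(n):
    for i in range(100):
        writer.write_entry(f"thread {n} header {i}", f"thread {n} body {i}")

threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

lines = buffer.getvalue().splitlines()
print(len(lines))   # 800 - every header is directly followed by its body
```

The lock must cover the whole entry, not each individual write(), otherwise issue (2) above is still present.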

Practice Challenges

Beginner - Write a Formatted Report

Write a function write_inventory_report(filepath, items) that takes a list of dicts with keys "name", "count", "price" and writes a formatted report to filepath. The report should have a header, one row per item, and a footer with the total value.

Solution

def write_inventory_report(filepath, items):
    """
    Write a formatted inventory report to filepath.

    items: list of dicts with keys 'name', 'count', 'price'
    """
    with open(filepath, "w", encoding="utf-8") as f:
        # Header
        f.write("Inventory Report\n")
        f.write("=" * 50 + "\n")
        f.write(f"{'Item Name':25} {'Count':8} {'Price':10} {'Total':12}\n")
        f.write("-" * 50 + "\n")

        # Rows
        grand_total = 0.0
        for item in items:
            total = item["count"] * item["price"]
            grand_total += total
            f.write(
                f"{item['name']:25} {item['count']:8d} "
                f"${item['price']:9.2f} ${total:11.2f}\n"
            )

        # Footer
        f.write("=" * 50 + "\n")
        f.write(f"{'GRAND TOTAL':44} ${grand_total:11.2f}\n")


# Test it
import os, tempfile

items = [
    {"name": "Widget Alpha", "count": 50, "price": 9.99},
    {"name": "Widget Beta",  "count": 12, "price": 24.99},
    {"name": "Widget Gamma", "count": 200, "price": 2.49},
]

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp_path = tmp.name

write_inventory_report(tmp_path, items)

with open(tmp_path, "r", encoding="utf-8") as f:
    print(f.read())

os.unlink(tmp_path)

Output:

Inventory Report
==================================================
Item Name                 Count    Price      Total
--------------------------------------------------
Widget Alpha                    50 $     9.99 $     499.50
Widget Beta                     12 $    24.99 $     299.88
Widget Gamma                   200 $     2.49 $     498.00
==================================================
GRAND TOTAL                                  $    1297.38

Intermediate - Atomic Config Update

Implement a ConfigStore class with get(key, default=None) and set(key, value) methods that persists data to JSON. The set() method must be atomic - if the process is killed during a write, the config file must not be corrupted.

Solution

import json
import os
import tempfile
from pathlib import Path


class ConfigStore:
    """
    A JSON-backed key-value store with atomic writes.
    Guaranteed: the config file is never in a corrupt or empty state.
    """

    def __init__(self, path):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def _load(self):
        """Load the current config, return empty dict if file missing."""
        if not self.path.exists():
            return {}
        try:
            with open(self.path, "r", encoding="utf-8") as f:
                return json.load(f)
        except (json.JSONDecodeError, OSError):
            return {}

    def _save(self, data):
        """Atomically write data to the config file."""
        dir_path = self.path.parent
        fd, tmp_path = tempfile.mkstemp(
            dir=dir_path,
            prefix=f".{self.path.stem}_",
            suffix=".tmp"
        )
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(data, f, indent=2, sort_keys=True)
                f.write("\n")
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, self.path)
        except BaseException:
            # Also clean up on KeyboardInterrupt/SystemExit, then re-raise.
            try:
                os.unlink(tmp_path)
            except OSError:
                pass
            raise

    def get(self, key, default=None):
        """Retrieve a value by key."""
        return self._load().get(key, default)

    def set(self, key, value):
        """Set a key atomically."""
        config = self._load()
        config[key] = value
        self._save(config)

    def delete(self, key):
        """Delete a key atomically."""
        config = self._load()
        config.pop(key, None)
        self._save(config)

    def all(self):
        """Return all key-value pairs."""
        return self._load()


# Test
import tempfile, os

with tempfile.TemporaryDirectory() as tmpdir:
    store = ConfigStore(os.path.join(tmpdir, "app", "config.json"))

    store.set("db_host", "localhost")
    store.set("db_port", 5432)
    store.set("debug", True)

    print(store.get("db_host"))   # localhost
    print(store.get("missing", "default_value"))   # default_value
    print(store.all())
    # {'db_host': 'localhost', 'db_port': 5432, 'debug': True}

    store.delete("debug")
    print(store.all())
    # {'db_host': 'localhost', 'db_port': 5432}
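
One way to check the atomicity claim is to force a failure partway through a save and confirm that the old file survives. The sketch below is a minimal, self-contained version of the same temp-file-then-os.replace() technique. atomic_write_json is a hypothetical helper name, not part of the solution above, and the failure is simulated with a value that json cannot serialize.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write data as JSON via temp file + os.replace (hypothetical helper)."""
    dir_path = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_path, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file; the target itself was never touched.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise


with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "config.json")
    atomic_write_json(target, {"db_host": "localhost"})

    try:
        # A set is not JSON-serializable, so json.dump raises TypeError
        # partway through the save, before os.replace() is reached.
        atomic_write_json(target, {"db_host": "example", "tags": {1, 2}})
    except TypeError:
        pass

    with open(target, "r", encoding="utf-8") as f:
        surviving = json.load(f)
    leftovers = os.listdir(tmpdir)

print(surviving)   # {'db_host': 'localhost'}
print(leftovers)   # ['config.json']
```

The failed save leaves the previous content in place and no stray .tmp file behind, which is the behavior the challenge asks for.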

Advanced - Streaming Report Generator with Multiple Output Targets

Build a ReportGenerator class that can write a sales report to multiple destinations simultaneously: a file, an in-memory buffer, and optionally gzip-compressed output.

Requirements:

Accepts an arbitrary number of output targets (file objects or file paths)
Streams data - does not build the entire report in memory
Supports gzip output if the target path ends with .gz
Generates report lines lazily with a generator and writes each line to every target

Solution

import io
import os
import gzip
import tempfile
from contextlib import ExitStack


class ReportGenerator:
    """
    Streams a report to multiple output destinations simultaneously.
    Lines are produced lazily by a generator and written to each target in turn.
    """

    def __init__(self, data):
        self.data = data   # iterable of row dicts

    def _open_target(self, target):
        """Return a (file_obj, should_close) tuple for a target."""
        if isinstance(target, (str, os.PathLike)):
            path = str(target)
            if path.endswith(".gz"):
                return gzip.open(path, "wt", encoding="utf-8"), True
            else:
                return open(path, "w", encoding="utf-8"), True
        else:
            # Assume it's already a file-like object
            return target, False

    def _generate_lines(self):
        """Generator yielding report lines one at a time."""
        yield "=" * 60 + "\n"
        yield "SALES REPORT\n"
        yield "=" * 60 + "\n"
        yield f"{'Product':25} {'Qty':6} {'Unit Price':12} {'Total':12}\n"
        yield "-" * 60 + "\n"

        grand_total = 0.0
        item_count = 0
        for row in self.data:
            total = row["qty"] * row["price"]
            grand_total += total
            item_count += 1
            yield (
                f"{row['product']:25} {row['qty']:6d} "
                f"${row['price']:11.2f} ${total:11.2f}\n"
            )

        yield "-" * 60 + "\n"
        yield f"{'Items:':25} {item_count:6d}\n"
        yield f"{'Grand Total:':39} ${grand_total:11.2f}\n"
        yield "=" * 60 + "\n"

    def write(self, *targets):
        """Write the report to every target, one line at a time."""
        # ExitStack closes every file this method opened, even if opening
        # a later target or writing a line fails. Caller-supplied file
        # objects are left open, and close() errors are not silenced.
        with ExitStack() as stack:
            file_objects = []
            for target in targets:
                f, should_close = self._open_target(target)
                if should_close:
                    stack.enter_context(f)
                file_objects.append(f)

            # Stream each line to all targets
            for line in self._generate_lines():
                for f in file_objects:
                    f.write(line)


# --- Test ---
import tempfile, os

sales_data = [
    {"product": "Python Course",     "qty": 142,  "price": 99.00},
    {"product": "AI Fundamentals",   "qty": 87,   "price": 149.00},
    {"product": "Cloud Workshop",    "qty": 34,   "price": 299.00},
    {"product": "Docker Bootcamp",   "qty": 201,  "price": 79.00},
]

gen = ReportGenerator(sales_data)

# Write to file, compressed file, and in-memory buffer simultaneously
mem_buffer = io.StringIO()

with tempfile.TemporaryDirectory() as tmpdir:
    txt_path = os.path.join(tmpdir, "report.txt")
    gz_path = os.path.join(tmpdir, "report.txt.gz")

    gen.write(txt_path, gz_path, mem_buffer)

    # Read plain text
    with open(txt_path, "r", encoding="utf-8") as f:
        print(f.read())

    # Read gzip and confirm it round-trips to the same text
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        assert f.read() == mem_buffer.getvalue()
    print("Gzip size:", os.path.getsize(gz_path), "bytes")

    # In-memory
    report_text = mem_buffer.getvalue()
    assert "Grand Total" in report_text
    print("In-memory report:", len(report_text), "chars")

Output:

============================================================
SALES REPORT
============================================================
Product                   Qty    Unit Price   Total
------------------------------------------------------------
Python Course                142 $      99.00 $   14058.00
AI Fundamentals               87 $     149.00 $   12963.00
Cloud Workshop                34 $     299.00 $   10166.00
Docker Bootcamp              201 $      79.00 $   15879.00
------------------------------------------------------------
Items:                         4
Grand Total:                            $   53066.00
============================================================

Gzip size: 271 bytes
In-memory report: 699 chars

The gzip byte count is indicative; the exact value can vary with the zlib version.
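
The solution loops over the lines and calls write() on each target instead of handing the generator to writelines(). The reason is that a generator can be consumed only once. A minimal sketch of the pitfall, using io.StringIO targets and a hypothetical lines() generator as a stand-in for the report lines:

```python
import io


def lines():
    """Hypothetical stand-in for a report line generator."""
    yield "header\n"
    yield "row 1\n"
    yield "row 2\n"


first, second = io.StringIO(), io.StringIO()

gen = lines()
first.writelines(gen)    # consumes every line from the generator
second.writelines(gen)   # generator is already exhausted: writes nothing

print(repr(first.getvalue()))    # 'header\nrow 1\nrow 2\n'
print(repr(second.getvalue()))   # ''

# Fan-out fix: pull each line once and write it to every target.
a, b = io.StringIO(), io.StringIO()
for line in lines():
    for target in (a, b):
        target.write(line)

print(a.getvalue() == b.getvalue())   # True
```

With a single target, f.writelines(lines()) remains the simpler choice.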

Quick Reference

Task	Code	Notes
Write (replacing)	`open(path, "w", encoding="utf-8")`	Truncates immediately on open
Append to file	`open(path, "a", encoding="utf-8")`	Preserves content, writes at end
Create only if new	`open(path, "x", encoding="utf-8")`	Raises `FileExistsError` if exists
Write binary	`open(path, "wb")`	No encoding parameter
Write one string	`f.write("text\n")`	No automatic newline
Write many strings	`f.writelines(iterable)`	No automatic newlines added
Flush Python buffer	`f.flush()`	Data reaches OS page cache
Sync to disk	`os.fsync(f.fileno())`	Call `f.flush()` first; durable across power failure
Atomic file update	write to temp + `os.replace(tmp, target)`	Same directory/filesystem required
Temp file (auto-delete)	`tempfile.NamedTemporaryFile()`	Deleted on close
Temp file (manual delete)	`fd, path = tempfile.mkstemp()`	Delete with `os.unlink(path)`
In-memory text stream	`io.StringIO()`	Testing, output capture
In-memory binary stream	`io.BytesIO()`	Binary protocol building
UTF-8 with Excel BOM	`encoding="utf-8-sig"`	For CSV files opened in Excel
Replace bad chars	`errors="replace"`	Writes `?` for unencodable chars
Temp directory	`tempfile.TemporaryDirectory()`	All contents deleted on exit
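
The "Create only if new" row is easiest to appreciate in code. The sketch below wraps 'x' mode in a hypothetical create_once helper (the name and the lock-file scenario are illustrative assumptions) and shows that the second attempt fails cleanly instead of overwriting the first.

```python
import os
import tempfile


def create_once(path, text):
    """Create path with text only if it does not exist yet (hypothetical helper).

    Returns True if this call created the file, False if it already existed.
    """
    try:
        # 'x' asks the OS to create the file and fail if it exists,
        # in a single step, so there is no check-then-create race.
        with open(path, "x", encoding="utf-8") as f:
            f.write(text)
        return True
    except FileExistsError:
        return False


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "app.lock")
    first = create_once(path, "owner=1\n")
    second = create_once(path, "owner=2\n")
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()

print(first, second)   # True False
print(repr(content))   # 'owner=1\n'
```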

Key Takeaways

'w' mode truncates the file at open() time - before any write() call. For critical files, always use the atomic write pattern: write to a temp file in the same directory, then os.replace(tmp, target).
file.flush() moves data from Python's buffer to the OS page cache - not to disk. os.fsync() moves it to disk. Only after fsync() is data durable across power failures.
'x' (exclusive create) mode is atomic - it maps to O_CREAT | O_EXCL and is the correct way to create a file only if it does not exist, without race conditions.
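'a' mode opens the file with O_APPEND, so for each write() system call the OS moves to the end of the file and writes in one atomic step. On local filesystems this stops processes from overwriting each other's log lines. The PIPE_BUF limit applies to pipes and FIFOs, not regular files, and the guarantee does not hold on NFS. Python's buffering can also split one logical line across several system calls, so write each complete line in a single call and flush promptly.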
tempfile.mkstemp() creates a secure temp file (mode 0600) and returns an (fd, path) tuple; you must close the descriptor and delete the file yourself. NamedTemporaryFile() deletes the file on close by default (delete=True).
io.StringIO and io.BytesIO implement the file object interface in memory, apart from OS-level features such as fileno() - use them for testing functions that accept file objects, without touching the filesystem.
For memory-efficient output of large datasets, use f.writelines(generator_function()) - the generator produces one line at a time without buffering the entire output in RAM.
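
As a closing illustration of the last two takeaways, the sketch below tests a function that accepts any text file object by passing it an io.StringIO, and feeds writelines() a generator expression. write_summary and its data are hypothetical names used only for this example.

```python
import io


def write_summary(out, scores):
    """Write one 'name: score' line per entry to any text file object (hypothetical)."""
    # The generator expression yields one line at a time to writelines().
    out.writelines(f"{name}: {score}\n" for name, score in scores.items())


buffer = io.StringIO()
write_summary(buffer, {"alice": 90, "bob": 85})
result = buffer.getvalue()
print(repr(result))   # 'alice: 90\nbob: 85\n'

# In-memory streams have no OS file descriptor, so code that calls
# os.fsync(f.fileno()) cannot be exercised with them directly.
try:
    buffer.fileno()
    has_fd = True
except io.UnsupportedOperation:
    has_fd = False
print(has_fd)   # False
```

Functions that need fsync() are better tested against a real file inside tempfile.TemporaryDirectory().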
