Version Control Basics - Git Workflows for Python Engineers
Reading time: ~25 minutes | Level: Foundation → Engineering
# Every engineer has read a commit message like this:
git log --oneline
a3f21c9 fix
b7e14d2 asdf
c902a1f WIP
d44f88e changes
e9801bc final FINAL
f001234 ok now its actually done
0884abc more changes
# And every engineer has wished they could read one like this:
git log --oneline
a3f21c9 feat(auth): add JWT refresh token endpoint
b7e14d2 fix(api): handle empty query string in search endpoint
c902a1f refactor(models): extract User validation into UserValidator
d44f88e test(auth): add parametrized tests for token expiry edge cases
e9801bc docs(readme): update deployment instructions for Docker
The second log is documentation. The first is noise. Six months later, you will be grateful for every commit message that explains why something changed - and you will curse every asdf and WIP at exactly the moment you need it most.
Version control is not a backup system. It is a collaboration infrastructure, a blame tool, a debugging instrument, and a deployment mechanism. Treating it as just "saving your work" wastes most of its value.
What You Will Learn
- The three-stage model of Git at engineering depth
- Surgical staging with git add -p - committing only what you intend
- Writing commit messages that document decisions, not just changes
- The Conventional Commits standard and why teams adopt it
- Branching strategies for different team sizes and release models
- Pull request best practices that make code review productive
- git bisect for finding regressions in O(log n) time
- git blame and git log -S for code archaeology
- Tagging releases and semantic versioning
- Pre-commit hooks for automated quality gates
- Python-specific .gitignore and what should never be committed
Prerequisites
- Basic Git commands: git init, git add, git commit, git push, git pull
- Familiarity with branches at the concept level
- Working Python development environment with a project to apply this to
Git Fundamentals at Engineering Depth
The Three-Stage Model
Git has three locations where your code lives. Understanding this precisely explains why every Git command works the way it does.
- Working tree: The files on disk. Unstaged modifications and untracked files live here.
- Staging area (index): A snapshot of what your next commit will contain. You explicitly control what goes in it.
- Commit history: Permanent, immutable snapshots. Each commit points to its parent(s).
The commands that compare or inspect these locations:
- git diff (no args) - shows changes in working tree vs staging area
- git diff --staged - shows changes in staging area vs last commit
- git log - shows commit history
The staging area is what separates Git from simpler systems. It lets you craft commits precisely - staging only some changes from a file, or only some files from a set of modifications.
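A toy model can make the three locations concrete. The sketch below is only an illustration: the dictionaries and the `changed` helper are hypothetical and say nothing about how Git stores data internally. They just show which pair of locations each diff command compares.

```python
# Toy model: each location maps a file path to its content.
head = {"app.py": "v1"}                                  # last commit
index = {"app.py": "v2"}                                 # staging area after `git add app.py`
working_tree = {"app.py": "v3", "notes.txt": "draft"}    # files on disk


def changed(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Return paths from `old` whose content differs in `new`.

    Only paths present in `old` are compared, which is enough for this example.
    """
    return {path for path in old if old.get(path) != new.get(path)}


unstaged = changed(index, working_tree)      # what `git diff` compares
staged = changed(head, index)                # what `git diff --staged` compares
both = changed(head, working_tree)           # what `git diff HEAD` compares
untracked = set(working_tree) - set(index)   # listed by `git status`, not by `git diff`

print(unstaged, staged, both, untracked)
```

Here app.py shows up in all three comparisons because it was edited again after being staged, while notes.txt appears only as untracked.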
Viewing State
# What has changed since the last commit?
git diff # working tree vs staging area
git diff --staged # staging area vs last commit
git diff HEAD # working tree vs last commit (combines both)
# What files are in what state?
git status # concise summary
# Full commit history as a graph
git log --oneline --graph --all --decorate
# --oneline: one line per commit
# --graph: ASCII art showing branch structure
# --all: show all branches, not just current
# --decorate: show branch and tag names
# Show what a specific commit changed
git show a3f21c9
git show HEAD # the most recent commit
git show HEAD~3 # three commits back
Surgical Staging with git add -p
git add -p (patch mode) is the most powerful and underused Git command for day-to-day work. It shows you each change in your working tree as an interactive "hunk" and asks you what to do with it.
git add -p
diff --git a/src/myapp/services.py b/src/myapp/services.py
index a3f2c9b..b47e1d2 100644
--- a/src/myapp/services.py
+++ b/src/myapp/services.py
@@ -12,6 +12,7 @@ def create_order(user, items, prices):
if not user.is_active:
raise UserNotFoundError(f"User {user.id} is not active")
+ log.debug("Creating order for user %s", user.id)
total = sum(prices[item] for item in items)
return Order(id=_next_id(), user_id=user.id, items=items, total=total)
Stage this hunk [y,n,q,a,d,s,e,?]?
The options:
- y - stage this hunk
- n - skip this hunk (don't stage it)
- s - split this hunk into smaller pieces
- e - manually edit the hunk
- q - quit, leaving the rest unstaged
- ? - show help
Why this matters: You often make multiple changes in one editing session - a bug fix, a refactor, and some debug logging. git add -p lets you commit the bug fix as one clean commit and the refactor as another, even though both are in the same file. This produces a commit history that tells a story instead of a history that records sessions.
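To see what a "hunk" is, it can help to build one outside Git. The sketch below uses Python's standard difflib module, not Git's own diff machinery, and the file contents and the `split_hunks` helper are made up for illustration. Two unrelated edits that sit far apart in the same file end up in separate hunks, which is what lets you stage them separately.

```python
import difflib

old = [f"line {i}\n" for i in range(1, 11)]
new = list(old)
new[1] = "line 2 (bug fix)\n"            # first logical change
new.insert(9, "log.debug('debug')\n")    # second, unrelated change


def split_hunks(before: list[str], after: list[str]) -> list[list[str]]:
    """Group a unified diff into hunks; each hunk starts at an '@@' header."""
    hunks: list[list[str]] = []
    for line in difflib.unified_diff(before, after, "a/f.py", "b/f.py", n=1):
        if line.startswith("@@"):
            hunks.append([line])
        elif hunks:  # skip the '---' / '+++' file headers before the first hunk
            hunks[-1].append(line)
    return hunks


hunks = split_hunks(old, new)
for hunk in hunks:
    print("".join(hunk))
```

Each printed block corresponds to one question that git add -p would ask.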
Commit Messages - The Conventional Commits Standard
A commit message is documentation. It outlives the ticket, the PR description, and the engineer who wrote it. Write it for the engineer who will debug this code at 2am in eighteen months - which might be you.
The Format
type(scope): subject

[optional body]

[optional footer]
Subject line rules:
- Use imperative mood: "add feature", not "added feature" or "adds feature"
- Aim for 50 characters or fewer (treat 72 as the hard limit) - forces you to be specific
- No period at the end
- Lowercase after the colon
Body rules (if needed):
- Blank line between subject and body
- Wrap at 72 characters per line
- Explain why, not what (the diff shows what)
- Explain what problem this solves, what constraints existed, what alternatives were rejected
Types:
| Type | When to use |
|---|---|
| feat | New feature or capability |
| fix | Bug fix |
| refactor | Code restructuring with no behavior change |
| test | Adding or updating tests |
| docs | Documentation changes only |
| style | Formatting changes, no logic change |
| chore | Build process, dependency updates, tooling |
| perf | Performance improvements |
| ci | CI/CD configuration changes |
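The subject-line rules and the type table are mechanical enough to check in code. The following is a minimal sketch, not an official Conventional Commits validator: the `check_subject` name, the scope character set, and the 50-character default are assumptions made for this example.

```python
import re

TYPES = ("feat", "fix", "refactor", "test", "docs", "style", "chore", "perf", "ci")
# type, optional (scope), optional "!" marking a breaking change, then ": subject"
PATTERN = re.compile(rf"^({'|'.join(TYPES)})(\([a-z0-9_-]+\))?!?: (.+)$")


def check_subject(line: str, max_len: int = 50) -> list[str]:
    """Return a list of problems with a commit subject line (empty if none)."""
    match = PATTERN.match(line)
    if not match:
        return ["does not match 'type(scope): subject'"]
    problems = []
    subject = match.group(3)
    if len(line) > max_len:
        problems.append(f"longer than {max_len} characters")
    if subject.endswith("."):
        problems.append("ends with a period")
    if subject[0].isupper():
        problems.append("subject starts with an uppercase letter")
    return problems


print(check_subject("feat(auth): add JWT refresh token endpoint"))
print(check_subject("fix(api): handle empty query."))
```

A check like this cannot judge imperative mood or whether the message explains why, so it complements review rather than replacing it.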
Good vs Bad Commit Messages
# BAD commit messages
git commit -m "fix"
git commit -m "changes to auth"
git commit -m "WIP"
git commit -m "added the thing we talked about"
git commit -m "it works now"
# GOOD commit messages
git commit -m "fix(auth): handle expired JWT tokens with 401, not 500"
git commit -m "feat(api): add pagination to /api/courses endpoint"
git commit -m "refactor(services): extract order validation into OrderValidator"
git commit -m "test(auth): add parametrized tests for all token expiry cases"
git commit -m "chore(deps): upgrade pydantic from 1.x to 2.x"
Writing a Commit with a Body
git commit
# Opens your editor ($EDITOR) with a template
feat(auth): add JWT refresh token endpoint

The current auth flow requires users to log in again after the access token
expires (15 minutes). This creates a poor UX for long-running sessions.

Add POST /api/auth/refresh that accepts a valid refresh token and returns
a new access token. Refresh tokens expire after 7 days and are rotated on
each use (new refresh token issued with each refresh).

Rejects: used or expired refresh tokens with 401
Rejects: refresh tokens from inactive users with 403

Closes #142
This commit message contains more engineering value than the code diff alone. It records the problem, the solution, the design decisions, and the constraints - information that disappears from your head three months later.
.gitignore for Python Projects
# Python bytecode - generated automatically, never commit
__pycache__/
*.py[cod]
*$py.class
# Distribution and packaging - generated by build tools
dist/
build/
*.egg-info/
*.egg
MANIFEST
.eggs/
# Virtual environments - contains absolute paths and platform binaries
.venv/
venv/
env/
ENV/
.env.bak
# Environment variables - NEVER commit secrets
.env
.env.local
.env.development
.env.production
# Testing artifacts
.pytest_cache/
.coverage
.coverage.*
htmlcov/
.tox/
nosetests.xml
coverage.xml
# Type checker caches
.mypy_cache/
.dmypy.json
.pytype/
# Jupyter notebooks - checkpoint files
.ipynb_checkpoints/
# *.ipynb   # uncomment only if notebooks are scratch work you never share
# IDEs
.idea/
.vscode/
*.swp
*.swo
.project
.pydevproject
# OS-specific
.DS_Store
Thumbs.db
desktop.ini
The three most important exclusions:
- .env - contains secrets. Committing this is a security incident.
- .venv/ - contains absolute paths. It doesn't work on other machines.
- dist/ - generated by the build process. Rebuild it, don't commit it.
Branching Strategies
The right branching strategy depends on your team size, release cadence, and deployment model.
Feature Branch Workflow (most teams)
The standard for teams of 2-10 engineers:
# Start a feature
git checkout main
git pull origin main
git checkout -b feature/add-search
# Work, commit, push
git add -p
git commit -m "feat(search): add full-text search to /api/courses"
git push -u origin feature/add-search
# Open a PR, get review, merge to main
# After merge, clean up
git checkout main
git pull origin main
git branch -d feature/add-search
Rules:
- main is always deployable
- All changes go through pull requests
- Branches are short-lived (days, not weeks)
- Rebase or squash before merging to keep main history clean
GitFlow (larger teams, versioned releases)
GitFlow adds a develop branch as the integration point:
Branches:
- main - production releases only
- develop - integration branch for completed features
- feature/* - individual features, branched from and merged to develop
- release/* - release preparation (version bumps, final bug fixes)
- hotfix/* - emergency production fixes, merged to both main and develop
GitFlow adds ceremony that small teams don't need. Use it when you maintain multiple supported versions simultaneously or have a defined release cycle (quarterly releases, for example).
Trunk-Based Development (CI/CD heavy teams)
Large teams at Google, Meta, and similar organizations use a single main branch with very short-lived branches:
# Branch lives for hours or 1-2 days maximum
git checkout -b tiny-change
# ... single focused change ...
git commit -m "refactor(models): inline trivial helper function"
# immediate PR, immediate merge
Requirements:
- Extremely strong test coverage (80%+ is the floor)
- Feature flags to hide incomplete features
- Sophisticated CI/CD that runs tests in under 10 minutes
- Engineers who write small, focused commits
Pull Request Best Practices
A pull request is a communication tool, not just a merge mechanism.
PR Size
Keep PRs under 400 lines of diff. This is not a style preference - it is an engineering effectiveness rule. Industry code-review data, most often cited from SmartBear's study of reviews at Cisco, suggests that defect detection drops sharply once a review grows past roughly 200-400 lines, because reviewers lose context. A 2000-line PR gets rubber-stamped. A 200-line PR gets reviewed.
If your feature requires more than 400 lines, break it into a sequence of smaller PRs:
- PR 1: data model changes
- PR 2: business logic
- PR 3: API endpoint
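The 400-line guideline is easy to measure before you open the PR. The sketch below parses the text printed by git diff --numstat (for example from git diff --numstat main...HEAD). The `diff_size` helper and the sample output are hypothetical and only illustrate the arithmetic.

```python
def diff_size(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for row in numstat.splitlines():
        if not row.strip():
            continue
        added, deleted, _path = row.split("\t", 2)
        if added == "-":  # binary files are reported as "-<TAB>-<TAB>path"
            continue
        total += int(added) + int(deleted)
    return total


# Made-up sample in numstat format: added<TAB>deleted<TAB>path
sample = (
    "120\t30\tsrc/myapp/models.py\n"
    "-\t-\tdocs/diagram.png\n"
    "45\t5\ttests/test_models.py\n"
)
size = diff_size(sample)
print(size, "lines changed;", "consider splitting" if size > 400 else "reviewable")
```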
PR Description Template
## What changed
Brief description of the technical change.
## Why
What problem does this solve? Link to issue or ticket.
## How to test
1. `pip install -e ".[dev]"`
2. `pytest tests/test_auth.py`
3. `curl -X POST http://localhost:8000/api/auth/refresh -d '{"refresh_token": "..."}'`
## Screenshots / Output
(if applicable)
## Checklist
- [ ] Tests added or updated
- [ ] Documentation updated
- [ ] No `.env` or secrets committed
Reviewing Code
Good code review focuses on:
- Logic errors: Does this handle edge cases? What happens on failure?
- Architecture: Does this fit the existing patterns? Does it belong in this module?
- Security: Is input validated? Are secrets handled correctly?
- Testability: Can this be tested? Is it tested?
Bad code review focuses on:
- Style: Use a formatter. Don't review formatting manually.
- Personal preferences: "I would have named this differently." If it's not wrong, it's acceptable.
- Nits that don't matter: One blank line vs two blank lines is noise.
Use inline comments for specific feedback, not general chat. Use "blocking" vs "suggestion" labels if your team uses them.
git stash - The Context-Switch Tool
You're in the middle of a feature when you need to switch to a hotfix:
# Save current work-in-progress (without committing)
git stash push -m "WIP: adding pagination to search results"
# Switch to the hotfix
git checkout main
git checkout -b hotfix/fix-auth-crash
# ... fix the bug, commit, PR ...
# Return to your feature
git checkout feature/add-pagination
git stash pop # restores your WIP changes
git stash list # see all stashes
git stash drop stash@{0} # delete a stash you don't need
git stash is also useful when you realize you started working on the wrong branch:
# I've been modifying code but forgot to create a branch
git stash
git checkout -b feature/my-actual-feature
git stash pop
# Now I'm on the right branch with my changes applied
git bisect - Finding Bugs in O(log n)
git bisect is a binary search through your commit history. Given a commit where a test passes and a later commit where it fails, it finds the exact commit that introduced the bug.
# Start bisect
git bisect start
# Mark current commit as bad (the bug exists here)
git bisect bad
# Mark a known-good commit (before the bug existed)
git bisect good v1.0.0
# or: git bisect good a3f21c9
# Git checks out the midpoint commit
# Bisecting: 47 revisions left to test after this (roughly 6 steps)
# [b7e14d2] feat(search): add full-text search endpoint
# Test whether the bug exists here
pytest tests/test_auth.py
# If test passes:
git bisect good
# If test fails:
git bisect bad
# After ~6 iterations, git identifies the culprit:
# b7e14d2 is the first bad commit
# Clean up
git bisect reset
With 100 commits to search, git bisect finds the bad commit in at most 7 steps. Manually checking 100 commits would take hours; bisect takes minutes.
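The step count follows from ordinary binary search. The sketch below is a simplified model on a linear history, not Git's actual algorithm (which also handles merges). The `first_bad` function and the commit names are invented for the example.

```python
def first_bad(commits: list[str], is_bad) -> tuple[str, int]:
    """Return (first bad commit, number of commits tested).

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1   # invariant: commits[lo] is good, commits[hi] is bad
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], steps


# c000 is the known-good commit, c101 the known-bad one, with 100 commits in between.
history = [f"c{i:03d}" for i in range(102)]
culprit, steps = first_bad(history, lambda commit: commit >= "c064")
print(culprit, steps)
```

Each test halves the remaining range, so 100 untested commits need at most 7 tests.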
Automating bisect
# Run bisect with a test script - fully automated
git bisect start HEAD v1.0.0
git bisect run pytest tests/test_auth.py::test_token_expiry -x
# Git runs this command at each midpoint and uses the exit code
# Exit 0 = good; 1-127 (except 125) = bad; 125 = skip this commit
# Reports the first bad commit automatically
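The exit-code convention used by git bisect run can be written down as a small function. This is a sketch of the rules described in the git bisect documentation; the `bisect_verdict` name is made up for illustration.

```python
def bisect_verdict(returncode: int) -> str:
    """Map a script's exit code to how `git bisect run` treats the commit."""
    if returncode == 0:
        return "good"
    if returncode == 125:
        return "skip"    # commit cannot be tested, e.g. it does not build
    if 1 <= returncode <= 127:
        return "bad"
    return "abort"       # any other code stops the bisect session


for code in (0, 1, 125, 128):
    print(code, bisect_verdict(code))
```

One consequence worth remembering: a nonzero exit for a reason other than the bug, such as the test not existing yet in an older commit, is still counted as bad. A wrapper script that exits with 125 in that situation keeps the search honest.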
git blame and git log -S - Code Archaeology
git blame
git blame shows which commit last modified each line of a file:
git blame src/myapp/services.py
# Output:
a3f21c9 (Alice 2024-01-15) def create_order(user, items, prices):
a3f21c9 (Alice 2024-01-15) if not user.is_active:
b7e14d2 (Bob 2024-02-10) raise UserNotFoundError(...)
c902a1f (Carol 2024-03-01) total = sum(prices[item] for item in items)
When you find a confusing line, git blame tells you who wrote it and when. You can then look at the full commit:
git show b7e14d2
# Shows the full diff and commit message for that change
Most editors (VS Code, IntelliJ) have git blame built into the gutter.
git log -S - The "Pickaxe"
git log -S finds all commits that added or removed a specific string:
# Find when a specific function was introduced or removed
git log -S "def calculate_discount" --all --oneline
# Find when a specific import was added
git log -S "from myapp.auth import verify_token" --all --oneline
# Show the actual diff for each matching commit
git log -S "verify_token" --all -p
This is invaluable when you know a piece of code used to exist but was removed, or when you're trying to find who introduced a specific pattern or bug.
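More precisely, -S reports commits where the number of occurrences of the string changes, which is why an edit that keeps the string in place does not match. The sketch below models that rule on a made-up history; `pickaxe` and the sample data are hypothetical, not Git internals.

```python
def pickaxe(versions: list[tuple[str, str]], needle: str) -> list[str]:
    """Return commit ids where the number of occurrences of `needle` changed.

    `versions` is an oldest-first list of (commit id, file content).
    """
    hits = []
    previous_count = 0  # the file is treated as empty before the first commit
    for commit_id, content in versions:
        count = content.count(needle)
        if count != previous_count:
            hits.append(commit_id)
        previous_count = count
    return hits


history = [
    ("a1", "def total(): ..."),
    ("b2", "def total(): ...\ndef calculate_discount(): ..."),    # added
    ("c3", "def total(): ...\ndef calculate_discount(x): ..."),   # edited, still present
    ("d4", "def total(): ..."),                                   # removed
]
print(pickaxe(history, "def calculate_discount"))
```

If you also want commits like c3, where a line containing the text was modified, git log -G with a regular expression is the related option to look at.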
Tagging Releases - Semantic Versioning
Tags are permanent labels for specific commits. They're how you mark releases.
Semantic Versioning (SemVer)
MAJOR.MINOR.PATCH
1.0.0 → 1.0.1 # PATCH: backward-compatible bug fix
1.0.1 → 1.1.0 # MINOR: backward-compatible new feature
1.1.0 → 2.0.0 # MAJOR: breaking change
The rules:
- MAJOR: you changed the API in a way that breaks existing users
- MINOR: you added functionality that doesn't break existing users
- PATCH: you fixed a bug without changing the API
Pre-release versions: 1.0.0-alpha.1, 1.0.0-beta.2, 1.0.0-rc.1
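The three bump rules translate directly into code. This is a minimal sketch with a hypothetical `bump` helper. It handles plain MAJOR.MINOR.PATCH strings only and deliberately ignores pre-release suffixes such as -rc.1.

```python
def bump(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"          # breaking change resets minor and patch
    if change == "minor":
        return f"{major}.{minor + 1}.0"    # new feature resets patch
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump("1.0.0", "patch"), bump("1.0.1", "minor"), bump("1.1.0", "major"))
```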
Creating and Pushing Tags
# Create an annotated tag (preferred - stores tagger, date, message)
git tag -a v1.2.0 -m "Release v1.2.0 - adds pagination and search"
# Tag a specific past commit
git tag -a v1.1.9 b7e14d2 -m "Release v1.1.9 - security patch"
# Push tags (they are NOT pushed with git push by default)
git push origin v1.2.0 # push one tag
git push origin --tags # push all tags
# List tags
git tag -l "v1.*"
# See what a tag points to
git show v1.2.0
Using Tags in pyproject.toml
# Option 1: hardcoded version (common, simple)
[project]
version = "1.2.0"
# Option 2: dynamic version from git tags (setuptools-scm)
[build-system]
requires = ["setuptools>=68", "setuptools-scm"]
build-backend = "setuptools.build_meta"
[project]
dynamic = ["version"] # instead of a hardcoded version line
[tool.setuptools_scm]
# Reads version from the latest git tag
# `git tag v1.2.0` → package version becomes "1.2.0"
# Between tags: "1.2.1.dev3+g1234abc" (auto-generated)
With setuptools-scm, you never manually update the version number. Tag the commit, and the version follows automatically.
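When the version lives in package metadata rather than in a source file, application code can read it back at runtime with the standard library. The sketch below assumes a hypothetical distribution name, myapp, and a made-up fallback string for the case where the package is not installed (for example when running from a bare checkout).

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(dist_name: str, fallback: str = "0.0.0+unknown") -> str:
    """Return the installed version of a distribution, or a fallback."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Not installed: there is no metadata to read the version from.
        return fallback


print(get_version("myapp"))  # "myapp" is a placeholder distribution name
```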
Pre-commit Hooks
Pre-commit hooks run automatically before every git commit. They're an early line of defense against committing broken code, failing tests, or accidentally staged secrets. Because git commit --no-verify skips them, CI should run the same checks as a backstop.
Two Approaches
Approach 1: Manual .git/hooks/pre-commit
#!/bin/bash
# .git/hooks/pre-commit (must be executable: chmod +x .git/hooks/pre-commit)
# The shebang has to be the first line of the file.
set -e
echo "Running pre-commit checks..."
# Run tests ("if !" lets the message print; with set -e a bare failure exits first)
if ! pytest tests/ -x -q; then
    echo "Tests failed. Commit aborted."
    exit 1
fi
# Check formatting
if ! black --check src/ tests/; then
    echo "Code not formatted. Run: black src/ tests/"
    exit 1
fi
# Type checking (set -e aborts the commit if mypy reports errors)
mypy src/
The downside: .git/hooks/ is not version-controlled. New team members don't get these hooks.
Approach 2: The pre-commit tool (recommended)
pip install pre-commit
# .pre-commit-config.yaml - committed to the repo
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.3.0
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
  - repo: https://github.com/psf/black
    rev: 24.2.0
    hooks:
      - id: black
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.9.0
    hooks:
      - id: mypy
        additional_dependencies: [pydantic>=2.0]
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-json
      - id: detect-private-key # prevents committing secrets
      - id: no-commit-to-branch
        args: [--branch, main] # prevents direct commits to main
# Install hooks for this repo
pre-commit install
# Run hooks manually on all files (first-time setup check)
pre-commit run --all-files
# Update hook versions
pre-commit autoupdate
Now every team member who runs pre-commit install gets the same quality gates. The configuration is version-controlled alongside the code.
What to Gate on
| Gate | When it makes sense |
|---|---|
| Code formatting (black/ruff) | Always - eliminates style review entirely |
| Linting (ruff) | Always - catches obvious bugs |
| Type checking (mypy) | Teams with type annotations |
| Full test suite | Small/fast suites; skip for slow suites (run in CI instead) |
| Secret detection | Always - prevents accidental credential commits |
| No direct commit to main | Always for teams |
Python-Specific Commit Practices
What to Commit
# Always commit:
pyproject.toml # project metadata, dependencies, tool config
.pre-commit-config.yaml # team's quality gates
requirements.txt # lockfile if you use pip-compile
src/**/*.py # all source code
tests/**/*.py # all test code
docs/** # documentation
.gitignore # must be in the repo to work
.env.example # template showing what env vars are needed
# Conditionally commit:
requirements-dev.txt # if using pip-compile workflow
Makefile # build automation, if you use one
Dockerfile # if containerized
What to Never Commit
# Never commit:
.env # contains real secrets
.venv/ # virtual environment (platform-specific binaries)
dist/ # built packages (regenerate with `python -m build`)
*.egg-info/ # generated metadata
__pycache__/ # bytecode (never useful to anyone)
.coverage # test coverage data (regenerate with pytest)
*.pyc # compiled bytecode
The .env and .env.example Pattern
# .env - never commit
DATABASE_URL=postgresql://user:secret@localhost/mydb
API_KEY=sk-your-key-here
SECRET_KEY=my-very-secret-key
# .env.example - commit this (it's a template)
DATABASE_URL=postgresql://user:password@localhost/mydb
API_KEY=your-api-key-here
SECRET_KEY=generate-with-openssl-rand-hex-32
New team members copy .env.example to .env and fill in their values. They never have to guess what environment variables the application needs.
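Because the two files are meant to list the same variable names, it is cheap to check that a local .env has not drifted from the template. The sketch below compares names only and never values. The `env_keys` and `missing_vars` helpers are hypothetical, and the parser handles only plain KEY=value lines.

```python
def env_keys(text: str) -> set[str]:
    """Extract variable names from simple KEY=value lines, skipping comments."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_vars(example_text: str, env_text: str) -> set[str]:
    """Names listed in .env.example that are absent from .env."""
    return env_keys(example_text) - env_keys(env_text)


example = (
    "DATABASE_URL=postgresql://user:password@localhost/mydb\n"
    "API_KEY=your-api-key-here\n"
    "SECRET_KEY=generate-with-openssl-rand-hex-32\n"
)
local = (
    "# my local settings\n"
    "DATABASE_URL=postgresql://user:secret@localhost/mydb\n"
    "API_KEY=sk-your-key-here\n"
)
print(missing_vars(example, local))
```

In a real project you would read the two files from disk. The strings here stand in for their contents.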
A Complete Git Workflow Example
Starting a new feature from scratch:
# 1. Start from a clean, up-to-date main
git checkout main
git pull origin main
# 2. Create a feature branch
git checkout -b feature/add-order-discounts
# 3. Make changes - multiple focused commits
# ... edit models.py ...
git add -p # stage only the model changes
git commit -m "feat(models): add discount_percent field to Order"
# ... edit services.py ...
git add -p # stage only the service changes
git commit -m "feat(services): implement apply_discount business logic"
# ... add tests ...
git add tests/test_services.py
git commit -m "test(services): add parametrized tests for discount edge cases"
# ... add documentation ...
git add docs/api.md
git commit -m "docs(api): document discount_percent field in Order schema"
# 4. Review your commits before pushing
git log --oneline main..HEAD # commits not yet on main
git diff main...HEAD # all changes since branching from main
# 5. Keep branch up to date
git fetch origin
git rebase origin/main # reapply your commits on top of latest main
# 6. Push and open PR
git push -u origin feature/add-order-discounts
gh pr create --title "feat(orders): add configurable order discounts" \
--body "Adds discount_percent to Order model and apply_discount service function.
Closes #89"
# 7. After PR is merged - clean up
git checkout main
git pull origin main
git branch -d feature/add-order-discounts
Interview Questions
Q1: What is git add -p and why should you use it instead of git add .?
Answer: git add -p (patch mode) presents each changed "hunk" of code interactively and asks whether to stage it. This lets you create commits that contain exactly the changes you intend - not everything you've touched since your last commit. The practical benefit is that you often make multiple logically distinct changes in one editing session: you fix a bug while exploring a feature, or you refactor while fixing a crash. With git add ., all of those go into one commit. With git add -p, you can commit the bug fix cleanly, then the refactor, then the feature work. This produces a commit history that tells a coherent story, makes git bisect accurate, and makes code review far easier. git add . is acceptable for the first commit on a new project; for ongoing work, git add -p produces a professional commit history.
Q2: What is the Conventional Commits standard and what problem does it solve?
Answer: Conventional Commits is a specification for commit message format: type(scope): subject. Types include feat, fix, refactor, test, docs, chore, and others. It solves two related problems. First, without a convention, commit messages vary wildly in quality and format, making the log useless for understanding history. With a convention, the log becomes readable documentation. Second, the format is machine-parseable - tools like semantic-release can automatically bump version numbers and generate changelogs from commit messages. A feat: commit triggers a MINOR version bump; a fix: triggers a PATCH bump; a BREAKING CHANGE: footer triggers a MAJOR bump. The scope (in parentheses) is optional but useful: it names the subsystem affected, like feat(auth): or fix(api):.
Q3: Explain git bisect. When would you use it?
Answer: git bisect is a binary search through commit history to find the commit that introduced a bug. You tell it a "bad" commit (where the bug exists) and a "good" commit (where it did not), and it checks out the midpoint commit and asks you to test it. You mark it good or bad, and it halves the search space again. For 100 commits, it finds the culprit in at most 7 steps. You would use it when a test suddenly starts failing, a user reports a regression, or a performance metric degrades - any situation where you know it worked at some point in history and doesn't now. You can also automate it completely with git bisect run pytest tests/test_specific.py, which uses the test exit code to determine good/bad automatically. This turns a potentially day-long investigation into a ten-minute automated process.
Q4: What is the difference between the feature branch workflow, GitFlow, and trunk-based development?
Answer: The feature branch workflow uses a single main branch where every change goes through a short-lived feature branch and pull request. It's simple, effective for teams of 2-10, and the most common approach. GitFlow adds a permanent develop branch as the integration point, with release/ and hotfix/ branches for managing versioned releases - it adds ceremony that makes sense for software with a formal release cycle or multiple maintained versions. Trunk-based development works from a single branch with branches lasting hours rather than days; it requires comprehensive automated testing, feature flags for incomplete work, and fast CI. Large engineering organizations like Google and Meta use it because frequent, small merges keep conflicts rare and easy to resolve and produce the fastest integration cycle. The right choice depends on team size, release cadence, and how much CI/CD infrastructure you have.
Q5: What should and should not be committed in a Python project?
Answer: Always commit: pyproject.toml, all source code in src/, all tests in tests/, .gitignore, .env.example (template without real values), and .pre-commit-config.yaml. Commit lockfiles (requirements.txt from pip-compile or uv.lock) for applications so deployments are reproducible. Never commit: .env (contains secrets - this is a security incident), .venv/ (contains absolute paths and platform-specific binaries that break on other machines), dist/ and *.egg-info/ (generated artifacts that should be rebuilt), and __pycache__/ (bytecode that Python generates automatically). The fundamental principle is: commit source that defines intent, not generated artifacts that can be recreated. The .env.example / .env pattern is critical: new team members need to know what environment variables the app requires, but the actual values must never be in version control.
Q6: What are pre-commit hooks and why are they better than reminding team members to run linters manually?
Answer: Pre-commit hooks run automatically before every git commit. The pre-commit tool manages them as a version-controlled configuration file (.pre-commit-config.yaml) rather than untracked files in .git/hooks/. This means every team member who installs the hooks (pre-commit install) gets identical quality gates. The key advantage over manual conventions is that they're automatic: you cannot forget to run the linter when every commit runs it for you, and skipping it takes a deliberate git commit --no-verify. Formatters and linters usually finish within a second or two on the staged files, so the friction is minimal; slower checks such as a full test suite belong in CI. Hooks catch issues before they reach code review, which removes the most tedious part of review: pointing out formatting issues and obvious linting violations. The reviewer can focus entirely on logic, architecture, and correctness. The most valuable hooks are: code formatting (black/ruff-format), linting (ruff), secret detection, and preventing direct commits to main.
Practice Challenges
Beginner - Write Conventional Commit Messages
For each of the following scenarios, write the correct Conventional Commits message. Make sure the subject line is 50 characters or fewer and uses imperative mood.
- You fixed a bug where the search endpoint returned HTTP 500 for empty query strings (it should return an empty list with HTTP 200).
- You added a new export_csv function to the reports module.
- You renamed get_usr() to get_user() for clarity (no behavior change).
- You updated pytest from 7.x to 8.x in pyproject.toml.
- You added 15 parametrized test cases for the discount calculation function.
Solution
# 1. Bug fix in the search endpoint (43 characters)
git commit -m "fix(api): return 200 for empty search query"
# 2. New feature in reports (38 characters)
git commit -m "feat(reports): add export_csv function"
# 3. Rename with no behavior change = refactor (43 characters)
git commit -m "refactor(users): rename get_usr to get_user"
# 4. Dependency update = chore (43 characters)
git commit -m "chore(deps): upgrade pytest from 7.x to 8.x"
# 5. Tests only (47 characters)
git commit -m "test(services): add parametrized discount tests"
Why each choice:
- fix for the search bug: it was broken behavior, now corrected
- feat for export_csv: new capability that didn't exist before
- refactor for the rename: behavior unchanged, code improved
- chore for dependency update: housekeeping, no feature or bug fix
- test for new tests: no production code changed
Intermediate - Use git bisect to Find a Bug
Your project's test suite was passing at v1.0.0 but fails now at HEAD on test_calculate_discount. Set up a git bisect run to find the bad commit automatically.
Solution
# Start bisect - HEAD is bad, v1.0.0 was good
git bisect start
# Mark current state as bad
git bisect bad HEAD
# Mark the last known good state
git bisect good v1.0.0
# Automated bisect - runs pytest and uses the exit code
# Exit 0 = good (test passes); 1-127 except 125 = bad (test fails); 125 = skip
git bisect run pytest tests/test_services.py::test_calculate_discount -x -q
# Git will output something like:
# b7e14d2 is the first bad commit
# Author: Bob <bob@example.com>
# Date: Mon Feb 12 14:30:00 2024
#
# feat(services): optimize discount calculation for bulk orders
#
# The issue: the "optimization" introduced a floating-point rounding bug
# Reset back to HEAD
git bisect reset
# Now you know exactly which commit to look at
git show b7e14d2
To investigate further after bisect:
# Check out just before the bad commit to confirm it was fine
git checkout b7e14d2~1
pytest tests/test_services.py::test_calculate_discount # passes
# Check out the bad commit
git checkout b7e14d2
pytest tests/test_services.py::test_calculate_discount # fails
# Now look at exactly what changed
git show b7e14d2 -- src/myapp/services.py
# Fix the bug on a new branch
git checkout main
git checkout -b fix/discount-rounding
# ... make the fix ...
git commit -m "fix(services): correct floating-point rounding in discount calculation

The bulk discount optimization introduced in b7e14d2 used integer division
instead of float division, causing rounding errors for discounts that don't
divide evenly.

Fixes #156"
Advanced - Set Up a Complete Pre-commit Configuration
Set up a complete .pre-commit-config.yaml for a Python project that:
- Formats code with black
- Lints with ruff (and auto-fixes safe issues)
- Runs mypy for type checking
- Prevents committing directly to
main - Detects accidentally staged private keys or API tokens
- Fixes trailing whitespace and ensures files end with a newline
Then write a Makefile with targets lint, test, format, and ci that mirror what the hooks check.
Solution
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.3.4
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
      - id: ruff-format
  - repo: https://github.com/psf/black
    rev: 24.3.0
    hooks:
      - id: black
        language_version: python3.11
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.9.0
    hooks:
      - id: mypy
        args: [--strict]
        additional_dependencies:
          - pydantic>=2.0
          - click>=8.1
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-toml
      - id: check-json
      - id: check-merge-conflict
      - id: detect-private-key
      - id: no-commit-to-branch
        args: [--branch, main, --branch, master]
      - id: check-added-large-files
        args: [--maxkb=500]
# Makefile
.PHONY: install lint format test ci clean
# Recipe lines must be indented with a tab character, not spaces
# Install the package and all development dependencies
install:
	python -m venv .venv
	.venv/bin/pip install -e ".[dev]"
	.venv/bin/pre-commit install
# Run all linters without auto-fixing
lint:
	.venv/bin/ruff check src/ tests/
	.venv/bin/black --check src/ tests/
	.venv/bin/mypy src/
# Auto-fix formatting and safe lint issues
format:
	.venv/bin/ruff check --fix src/ tests/
	.venv/bin/black src/ tests/
# Run the full test suite with coverage
test:
	.venv/bin/pytest tests/ -v --tb=short --cov=src/ --cov-report=term-missing
# Run everything - matches what CI runs
ci: lint test
# Remove generated artifacts
clean:
	rm -rf dist/ build/ *.egg-info/
	rm -rf .pytest_cache/ .mypy_cache/ .coverage htmlcov/
	find . -type d -name __pycache__ -exec rm -rf {} +
Run the setup:
# First-time setup
make install
# Day-to-day
make format # fix formatting
make lint # check without fixing
make test # run tests
make ci # run everything (what CI will run)
The Makefile mirrors the pre-commit hooks so engineers can run the same checks locally on demand (make lint) and automatically on commit (pre-commit hooks).
Quick Reference
| Command | What it does |
|---|---|
| git add -p | Interactively stage individual hunks |
| git diff --staged | Show what's staged for next commit |
| git log --oneline --graph --all | Visual branch history |
| git stash push -m "message" | Save WIP without committing |
| git stash pop | Restore most recent stash |
| git bisect start / good / bad | Binary search for bug-introducing commit |
| git bisect run <command> | Automate bisect with a test command |
| git blame <file> | Show who last modified each line |
| git log -S "search string" | Find commits that added/removed a string |
| git tag -a v1.2.0 -m "message" | Create annotated release tag |
| git push origin --tags | Push tags (not included in regular push) |
| pre-commit install | Install hooks for this repo |
| pre-commit run --all-files | Run all hooks on every file |
Key Takeaways
- Commit messages are documentation that outlasts tickets, PRs, and the engineer who wrote them. Write them for the engineer debugging at 2am in 18 months - using the Conventional Commits format and imperative mood.
- git add -p is the single most impactful daily habit change for producing a readable commit history. Never again commit half-finished work alongside a critical bug fix.
- git bisect turns regression hunting from a day-long task into a 10-minute automated search. Know it before you need it.
- The .env file must never be committed. The .env.example file should always be committed. This is non-negotiable.
- Pre-commit hooks managed by the pre-commit tool are version-controlled quality gates that every team member gets by running pre-commit install - they eliminate the most tedious parts of code review.
- Branching strategy should match team size and release cadence. Feature branches for most teams; GitFlow for formal release cycles; trunk-based development for large teams with strong CI.
- Never commit .venv/, dist/, or __pycache__/. Commit pyproject.toml, lockfiles, and .env.example. Generated artifacts are rebuilt, not version-controlled.
