Python Publishing Packages Practice Problems & Exercises

Practice: Publishing Packages

11 problems · 4 Easy · 4 Medium · 3 Hard · ⏱ 40-55 min

Easy

#1 Validate Wheel Filename (Easy)

wheel · filename · compatibility-tags

Parse and validate wheel filenames. The wheel filename encodes Python version compatibility, ABI compatibility, and platform compatibility — pip uses these tags to decide which wheel to download.

Python

import re

def parse_wheel_filename(filename):
    if not filename.endswith('.whl'):
        return {'valid': False}
    stem = filename[:-4]
    parts = stem.split('-')
    if len(parts) not in (5, 6):  # 5 fields, or 6 with the optional build tag
        return {'valid': False}

    name = parts[0]
    version = parts[1]
    # Handle optional build tag (6 parts)
    if len(parts) == 6:
        python_tag = parts[3]
        abi_tag = parts[4]
        platform_tag = parts[5]
    else:
        python_tag = parts[2]
        abi_tag = parts[3]
        platform_tag = parts[4]

    is_pure_python = (
        python_tag.startswith('py') and
        abi_tag == 'none' and
        platform_tag == 'any'
    )

    return {
        'valid': True,
        'name': name,
        'version': version,
        'python_tag': python_tag,
        'abi_tag': abi_tag,
        'platform_tag': platform_tag,
        'is_pure_python': is_pure_python,
    }

tests = [
    'requests-2.31.0-py3-none-any.whl',
    'numpy-1.24.3-cp311-cp311-manylinux_2_17_x86_64.whl',
    'not_a_wheel.tar.gz',
]
for t in tests:
    r = parse_wheel_filename(t)
    print(t[:30], '->', r.get('valid'), r.get('is_pure_python'))

Solution

def parse_wheel_filename(filename):
    if not filename.endswith('.whl'):
        return {'valid': False}
    stem = filename[:-4]
    parts = stem.split('-')
    if len(parts) not in (5, 6):  # 5 fields, or 6 with the optional build tag
        return {'valid': False}
    offset = 1 if len(parts) == 6 else 0
    python_tag = parts[2 + offset]
    abi_tag = parts[3 + offset]
    platform_tag = parts[4 + offset]
    return {
        'valid': True, 'name': parts[0], 'version': parts[1],
        'python_tag': python_tag, 'abi_tag': abi_tag, 'platform_tag': platform_tag,
        'is_pure_python': python_tag.startswith('py') and abi_tag == 'none' and platform_tag == 'any',
    }

The wheel filename is a machine-readable compatibility declaration. py3-none-any means: any Python 3 interpreter, no ABI requirement, any platform, so the wheel installs everywhere Python 3 runs. cp311-cp311-manylinux_2_17_x86_64 means: CPython 3.11, the CPython 3.11 ABI, 64-bit x86 Linux with glibc >= 2.17. When pip installs a package, it ranks the available wheels against the tags the current environment supports and picks the most specific one that matches. A pure-Python wheel ranks below any matching platform-specific wheel and is the one pip selects when no more specific wheel exists.
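The ranking idea can be sketched in a few lines. This is a simplified illustration, not pip's actual algorithm: pick_best_wheel and the SUPPORTED list are hypothetical, and a real environment supports far more tags than the three shown here.

```python
def wheel_tags(filename):
    """Expand a wheel filename into the set of (python, abi, platform) tags it declares."""
    stem = filename[:-len('.whl')]
    # The last three dash-separated fields are always the compatibility tags.
    python_tags, abi_tags, platform_tags = stem.split('-')[-3:]
    # Each field may be a dot-separated set, e.g. 'py2.py3'.
    return {
        (py, abi, plat)
        for py in python_tags.split('.')
        for abi in abi_tags.split('.')
        for plat in platform_tags.split('.')
    }

def pick_best_wheel(filenames, supported_tags):
    """Return the wheel whose best tag appears earliest in supported_tags, or None."""
    rank = {tag: i for i, tag in enumerate(supported_tags)}
    best, best_rank = None, None
    for filename in filenames:
        ranks = [rank[t] for t in wheel_tags(filename) if t in rank]
        if ranks and (best_rank is None or min(ranks) < best_rank):
            best, best_rank = filename, min(ranks)
    return best

# Hypothetical, heavily shortened preference list for CPython 3.11 on x86-64 Linux,
# ordered from most specific to least specific.
SUPPORTED = [
    ('cp311', 'cp311', 'manylinux_2_17_x86_64'),
    ('cp311', 'none', 'any'),
    ('py3', 'none', 'any'),
]

candidates = [
    'demo-1.0-py3-none-any.whl',
    'demo-1.0-cp311-cp311-manylinux_2_17_x86_64.whl',
    'demo-1.0-cp311-cp311-win_amd64.whl',
]
print(pick_best_wheel(candidates, SUPPORTED))
# -> demo-1.0-cp311-cp311-manylinux_2_17_x86_64.whl
```

The Windows wheel is skipped because none of its tags are supported, and the pure-Python wheel loses only because a more specific match exists.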

Starter Code

import re

def parse_wheel_filename(filename):
    """Parse a wheel filename into its components.
    
    Wheel filename format:
    {distribution}-{version}(-{build_tag})?-{python_tag}-{abi_tag}-{platform_tag}.whl
    
    Examples:
    'requests-2.31.0-py3-none-any.whl'
    'numpy-1.24.3-cp311-cp311-manylinux_2_17_x86_64.whl'
    
    Return a dict:
    - 'valid': bool
    - 'name': str
    - 'version': str
    - 'python_tag': str
    - 'abi_tag': str
    - 'platform_tag': str
    - 'is_pure_python': bool (py3-none-any)
    """
    # TODO: implement
    pass

Expected Output

{'valid': True, 'name': 'requests', 'version': '2.31.0', 'python_tag': 'py3', 'abi_tag': 'none', 'platform_tag': 'any', 'is_pure_python': True}

Hints

Hint 1: Split the filename on "-" after removing ".whl". The minimum valid split gives 5 parts.

Hint 2: is_pure_python: python_tag starts with "py", abi_tag is "none", platform_tag is "any".

#2 Generate MANIFEST.in Rules (Easy)

MANIFEST.in · sdist · file-inclusion

Generate MANIFEST.in content for a source distribution. The MANIFEST.in file controls which files are included in an sdist tarball, which is critical for packages with data files, templates, or type stubs.

Python

def generate_manifest(project_files, extras=None):
    lines = [
        '# Auto-generated MANIFEST.in',
        '',
        '# Standard files',
        'include README*',
        'include LICENSE*',
        'include CHANGES* CHANGELOG*',
        'include pyproject.toml',
        '',
        '# Source code',
        'recursive-include src *.py',
        'recursive-include src *.pyi',
        '',
    ]

    if extras:
        lines.append('# Extra patterns')
        for pattern in extras:
            lines.append('include ' + pattern)
        lines.append('')

    lines.extend([
        '# Exclusions',
        'global-exclude *.pyc',
        'global-exclude __pycache__',
        'prune build',
        'prune dist',
        'prune .git',
        'prune .tox',
        'prune *.egg-info',
    ])

    return '\n'.join(lines)

files = ['src/mypackage/__init__.py', 'README.md', 'LICENSE']
extras = ['data/*.json', 'templates/*.html']
print(generate_manifest(files, extras))

Solution

def generate_manifest(project_files, extras=None):
    lines = ['# Auto-generated MANIFEST.in', '',
             'include README*', 'include LICENSE*',
             'include CHANGES* CHANGELOG*', 'include pyproject.toml', '',
             'recursive-include src *.py', 'recursive-include src *.pyi', '']
    if extras:
        lines.append('# Extra patterns')
        lines.extend('include ' + p for p in extras)
        lines.append('')
    lines.extend(['global-exclude *.pyc', 'global-exclude __pycache__',
                  'prune build', 'prune dist', 'prune .git', 'prune *.egg-info'])
    return '\n'.join(lines)

MANIFEST.in is read by setuptools and mainly controls sdist contents; wheel contents are chosen by the build backend's own file selection (for example, the include setting under [tool.hatch.build] when using Hatch). The sdist is still important because it is what pip falls back to building from on platforms that have no pre-built wheel (Alpine Linux with musl libc, for example). Missing data files in the sdist are a common packaging bug: the wheel works fine in testing, but an install from source fails because the data files were never included. Always test your sdist with pip install dist/mypackage-1.0.0.tar.gz before publishing.
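A quick way to catch missing data files before installing anything is to list the tarball's members. The sketch below uses only the standard library; missing_from_sdist is a hypothetical helper, and the example paths in the usage note are placeholders.

```python
import tarfile

def missing_from_sdist(sdist_path, expected_suffixes):
    """Return the expected path suffixes that no member of the sdist ends with."""
    with tarfile.open(sdist_path, 'r:gz') as tar:
        members = tar.getnames()
    # sdist members live under a top-level '{name}-{version}/' directory,
    # so compare by path suffix instead of by full path.
    return [
        suffix for suffix in expected_suffixes
        if not any(member.endswith('/' + suffix) for member in members)
    ]
```

For example, missing_from_sdist('dist/mypackage-1.0.0.tar.gz', ['pyproject.toml', 'src/mypackage/data/schema.json']) would return an empty list when both files made it into the archive, and the names of the absent ones otherwise.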

Starter Code

def generate_manifest(project_files, extras=None):
    """Generate MANIFEST.in content for a Python sdist.
    
    project_files: list of file paths relative to project root
    extras: list of additional glob patterns to include
    
    Rules to generate:
    - Always include: README*, LICENSE*, CHANGES*
    - Include all .py files under src/
    - Include all .pyi stub files
    - Include patterns from extras
    - Exclude: *.pyc, __pycache__, .git, build/, dist/
    
    Return a string with MANIFEST.in content.
    """
    # TODO: implement
    pass

Expected Output

include README*
include LICENSE*
...

Hints

Hint 1: Use "include PATTERN" for specific files and "recursive-include DIR PATTERN" for directories.

Hint 2: Use "global-exclude PATTERN" for patterns to exclude everywhere.

#3 Generate Twine Upload Command (Easy)

twine · upload · pypi-api-token

Generate twine upload commands. Twine is the standard tool for uploading Python packages to PyPI and private registries — understanding its flags is essential for CI/CD pipeline setup.

Python

def generate_twine_command(dist_files, repository='pypi', token=None, skip_existing=False):
    parts = ['twine', 'upload']

    if repository == 'testpypi':
        parts.extend(['--repository', 'testpypi'])
    elif repository == 'pypi':
        pass  # default
    elif repository.startswith('http'):
        parts.extend(['--repository-url', repository])
    else:
        parts.extend(['--repository', repository])

    if token:
        parts.extend(['--username', '__token__', '--password', token])

    if skip_existing:
        parts.append('--skip-existing')

    parts.extend(dist_files)

    return ' '.join(parts)

print(generate_twine_command(['dist/*'], 'testpypi'))
print(generate_twine_command(['dist/*'], 'pypi', token='pypi-abc123'))
print(generate_twine_command(['dist/*'], 'https://my.registry.com/simple', skip_existing=True))

Solution

def generate_twine_command(dist_files, repository='pypi', token=None, skip_existing=False):
    parts = ['twine', 'upload']
    if repository == 'testpypi':
        parts.extend(['--repository', 'testpypi'])
    elif repository.startswith('http'):
        parts.extend(['--repository-url', repository])
    elif repository != 'pypi':
        parts.extend(['--repository', repository])
    if token:
        parts.extend(['--username', '__token__', '--password', token])
    if skip_existing:
        parts.append('--skip-existing')
    parts.extend(dist_files)
    return ' '.join(parts)

Never pass the token as a command-line argument in real CI. The token would appear in process listings and possibly in CI logs. The correct approach is to store TWINE_USERNAME=__token__ and TWINE_PASSWORD=pypi-your-token as CI secrets exposed as environment variables, then run twine upload dist/* with no credentials in the command. PyPI API tokens (starting with pypi-) can be scoped to a single project and can be revoked individually; always use a project-scoped token in CI, not an account-wide token. For GitHub Actions, the pypa/gh-action-pypi-publish action handles the upload step for you.
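The environment-variable approach can be sketched as follows. build_twine_invocation is a hypothetical helper and the token value is a placeholder; the sketch only prepares the command and environment and does not run twine.

```python
import os

def build_twine_invocation(token, dist_files, base_env=None):
    """Return (argv, env) for 'twine upload' with credentials kept out of argv."""
    # Copy the environment so the caller's (or the process's) mapping is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    env['TWINE_USERNAME'] = '__token__'
    env['TWINE_PASSWORD'] = token
    # --non-interactive makes twine fail instead of prompting when credentials are missing.
    argv = ['twine', 'upload', '--non-interactive', *dist_files]
    return argv, env

argv, env = build_twine_invocation(
    'pypi-example-token', ['dist/demo-1.0.0.tar.gz'], base_env={}
)
print(argv)
# -> ['twine', 'upload', '--non-interactive', 'dist/demo-1.0.0.tar.gz']
```

A real pipeline would pass both values to subprocess.run(argv, env=env, check=True), with the token read from a CI secret rather than written in the script.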

Starter Code

def generate_twine_command(dist_files, repository='pypi', token=None, skip_existing=False):
    """Generate the twine upload command string.
    
    dist_files: list of file paths to upload (or ['dist/*'])
    repository: 'pypi' or 'testpypi' or a custom URL
    token: PyPI API token string (starts with 'pypi-')
    skip_existing: bool
    
    Return the command string.
    Use __token__ as username when token is provided.
    For testpypi, use --repository testpypi.
    For custom URL, use --repository-url URL.
    """
    # TODO: implement
    pass

Expected Output

twine upload --repository testpypi dist/*

Hints

Hint 1: Build the command as a list of strings. For token auth in this exercise, add --username __token__ and --password TOKEN; in real CI, set TWINE_USERNAME and TWINE_PASSWORD in the environment instead.

Hint 2: Custom URLs use --repository-url, named repos use --repository.

#4 Package Name Normalizer (Easy)

name-normalization · PyPI · PEP-503

Implement PEP 503 package name normalization. PyPI uses this to prevent namespace squatting — my-package, my_package, and my.package all refer to the same package.

Python

import re

def normalize_package_name(name):
    return re.sub(r'[-_.]+', '-', name.lower())

def names_equivalent(name1, name2):
    return normalize_package_name(name1) == normalize_package_name(name2)

names = ['My-Package', 'my_package', 'My.Package', 'MY__PACKAGE', 'mypackage']
for n in names:
    print(n, '->', normalize_package_name(n))

print(names_equivalent('my-package', 'my_package'))
print(names_equivalent('requests', 'Requests'))
print(names_equivalent('flask', 'Django'))

Solution

import re

def normalize_package_name(name):
    return re.sub(r'[-_.]+', '-', name.lower())

def names_equivalent(name1, name2):
    return normalize_package_name(name1) == normalize_package_name(name2)

Name normalization is why you can pip install Requests (capital R) and it works. Pip normalizes the requested name before querying PyPI's Simple API. This is also why you cannot register my_package on PyPI if my-package is already taken: they normalize to the same value. The Simple API endpoint for a package is https://pypi.org/simple/{normalized-name}/. This normalization is defined in PEP 503. It collapses only case and separator variants, so it does not stop typosquatting, a supply chain attack in which attackers register near-matches like requsets or flaskk. Always double-check package names before pip install.
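As a small illustration of both points, the sketch below builds the Simple API URL for a name and groups names that collide after normalization. simple_index_url and find_collisions are hypothetical helper names.

```python
import re
from collections import defaultdict

def normalize(name):
    """PEP 503 normalization: collapse runs of -, _ and . into one hyphen, then lowercase."""
    return re.sub(r'[-_.]+', '-', name).lower()

def simple_index_url(name, base='https://pypi.org/simple'):
    """Build the Simple API URL for a project name."""
    return f'{base}/{normalize(name)}/'

def find_collisions(names):
    """Group names that normalize to the same value; keep only groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}

print(simple_index_url('My_Package'))
# -> https://pypi.org/simple/my-package/
print(find_collisions(['my-package', 'My.Package', 'requests', 'requsets']))
# -> {'my-package': ['my-package', 'My.Package']}
```

Note that requests and requsets do not collide: normalization treats them as two distinct projects, which is exactly the gap typosquatters exploit.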

Starter Code

import re

def normalize_package_name(name):
    """Normalize a Python package name per PEP 503.
    
    Rules:
    - Lowercase all letters
    - Replace runs of [-_.] with a single hyphen
    - The normalized form is used for PyPI URL paths
    
    Examples:
    'My-Package'   -> 'my-package'
    'my_package'   -> 'my-package'
    'My.Package'   -> 'my-package'
    'MY__PACKAGE'  -> 'my-package'
    
    Also return whether two names are equivalent.
    """
    # TODO: implement normalize_package_name and names_equivalent
    pass

def names_equivalent(name1, name2):
    pass

Expected Output

my-package
True
True

Hints

Hint 1: Use re.sub(r"[-_.]+", "-", name.lower()) to normalize.

Hint 2: Two names are equivalent if their normalized forms are equal.

Medium

#5 Build Artifact Validator (Medium)

build-artifacts · validation · wheel-sdist

Validate a dist/ directory before uploading to PyPI. This is the check that prevents uploading inconsistent or malformed artifacts. Such a mistake cannot be fixed in place, because PyPI does not allow re-uploading a file under the same name and version; the only remedy is to publish a new version.

Python

import os
import re

def extract_version_from_dist(filename):
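    # Match only digit groups joined by dots, so 'pkg-1.0.0.tar.gz' yields
    # '1.0.0' rather than '1.0.0.' (which would cause a false version mismatch).
    m = re.match(r'^[\w.-]+?-(\d+(?:\.\d+)*)', filename)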
    return m.group(1) if m else None

def validate_dist_directory(dist_dir):
    errors = []
    warnings = []

    if not os.path.isdir(dist_dir):
        return {'valid': False, 'wheels': [], 'sdists': [],
                'errors': ['dist directory does not exist'], 'warnings': []}

    all_files = os.listdir(dist_dir)
    wheels = [f for f in all_files if f.endswith('.whl')]
    sdists = [f for f in all_files if f.endswith('.tar.gz')]

    if not wheels:
        errors.append('No wheel (.whl) found in dist/')
    if not sdists:
        warnings.append('No sdist (.tar.gz) found in dist/ — consider including one')

    versions = set()
    for f in wheels + sdists:
        v = extract_version_from_dist(f)
        if v:
            versions.add(v)

    if len(versions) > 1:
        errors.append('Version mismatch across artifacts: ' + ', '.join(sorted(versions)))

    for f in wheels:
        # Tag fields may be dot-separated sets such as 'py2.py3'.
        if not re.match(r'^[\w.-]+-\d[\d.]*-[\w.]+-[\w.]+-[\w.]+\.whl$', f):
            errors.append('Invalid wheel filename format: ' + f)

    return {
        'valid': len(errors) == 0,
        'wheels': sorted(wheels),
        'sdists': sorted(sdists),
        'errors': errors,
        'warnings': warnings,
    }

import tempfile
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'mypackage-1.0.0-py3-none-any.whl'), 'w').close()
    open(os.path.join(tmp, 'mypackage-1.0.0.tar.gz'), 'w').close()
    print(validate_dist_directory(tmp))

Solution

import os, re

def validate_dist_directory(dist_dir):
    if not os.path.isdir(dist_dir):
        return {'valid': False, 'wheels': [], 'sdists': [],
                'errors': ['dist directory does not exist'], 'warnings': []}
    files = os.listdir(dist_dir)
    wheels = sorted(f for f in files if f.endswith('.whl'))
    sdists = sorted(f for f in files if f.endswith('.tar.gz'))
    errors, warnings = [], []
    if not wheels:
        errors.append('No wheel found')
    if not sdists:
        warnings.append('No sdist found')
    versions = set()
    for f in wheels + sdists:
        # Digit groups joined by dots only, so the sdist's '.tar.gz' is not captured.
        m = re.match(r'^[\w.-]+?-(\d+(?:\.\d+)*)', f)
        if m:
            versions.add(m.group(1))
    if len(versions) > 1:
        errors.append('Version mismatch: ' + ', '.join(sorted(versions)))
    return {'valid': not errors, 'wheels': wheels, 'sdists': sdists,
            'errors': errors, 'warnings': warnings}

PyPI's "no reuploads" policy is the most important constraint in packaging. Once you upload mypackage-1.0.0-py3-none-any.whl, that exact filename is permanently reserved: you cannot upload a corrected file with the same name, even after deleting the original. The only option is a new version number (1.0.1). This is why pre-upload validation matters: run twine check dist/* to validate the package metadata before uploading. Always test on TestPyPI first. TestPyPI is a completely separate instance from PyPI with its own accounts and database, so mistakes made there never touch the real index. It also rejects a filename that has already been uploaded, so use throwaway pre-release or dev version numbers when experimenting.
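One pre-upload check that follows from this policy is comparing local filenames against files already published. The sketch below assumes a parsed response from the project-level JSON endpoint containing a releases mapping from version to a list of file entries with a filename key; the mock data and the helper name find_filename_conflicts are illustrative only.

```python
def find_filename_conflicts(dist_files, project_json):
    """Return local filenames that already appear in any release of the project."""
    uploaded = {
        entry['filename']
        for files in project_json.get('releases', {}).values()
        for entry in files
    }
    return sorted(set(dist_files) & uploaded)

# Mock of the assumed response shape, not real PyPI data.
mock_project = {
    'releases': {
        '1.0.0': [
            {'filename': 'mypackage-1.0.0-py3-none-any.whl'},
            {'filename': 'mypackage-1.0.0.tar.gz'},
        ],
    },
}
local = ['mypackage-1.0.0.tar.gz', 'mypackage-1.0.1.tar.gz']
print(find_filename_conflicts(local, mock_project))
# -> ['mypackage-1.0.0.tar.gz']
```

An empty result is necessary but not sufficient: filenames of deleted files stay reserved and would not show up in such a listing.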

Starter Code

import os

def validate_dist_directory(dist_dir):
    """Validate the contents of a dist/ directory before upload.
    
    Check:
    1. At least one wheel (.whl) file exists
    2. At least one sdist (.tar.gz) file exists
    3. All files follow correct naming conventions
    4. No duplicate packages (same name+version, different format)
    5. Versions are consistent across wheel and sdist
    
    Return a dict:
    - 'valid': bool
    - 'wheels': list of filenames
    - 'sdists': list of filenames
    - 'errors': list of str
    - 'warnings': list of str
    """
    # TODO: implement
    pass

Expected Output

{'valid': True, 'wheels': ['mypackage-1.0.0-py3-none-any.whl'], 'sdists': ['mypackage-1.0.0.tar.gz'], 'errors': [], 'warnings': []}

Hints

Hint 1: Use os.listdir() to get files. Filter by .whl and .tar.gz extensions.

Hint 2: Extract version from each filename. Check that all versions match.

#6 Release Checklist Validator (Medium)

release-checklist · CI-CD · release-process

Implement a release readiness checker. Every team should have a documented release checklist — this formalizes it into code that can be run in CI before triggering a release workflow.

Python

def validate_release_checklist(state):
    required = [
        'tests_passing',
        'version_bumped',
        'changelog_updated',
        'git_tag_created',
        'dist_built',
        'twine_check_passed',
    ]
    recommended = ['testpypi_verified']

    blocking = [item for item in required if not state.get(item, False)]
    recommended_missing = [item for item in recommended if not state.get(item, False)]

    return {
        'ready': len(blocking) == 0,
        'blocking': blocking,
        'recommended': recommended_missing,
    }

full_state = {
    'tests_passing': True, 'version_bumped': True,
    'changelog_updated': True, 'git_tag_created': True,
    'dist_built': True, 'twine_check_passed': True,
    'testpypi_verified': False,
}
print("Full state:", validate_release_checklist(full_state))

partial_state = {
    'tests_passing': False, 'version_bumped': True,
    'changelog_updated': False, 'git_tag_created': False,
    'dist_built': False, 'twine_check_passed': False,
}
print("Partial state:", validate_release_checklist(partial_state))

Solution

def validate_release_checklist(state):
    required = ['tests_passing', 'version_bumped', 'changelog_updated',
                'git_tag_created', 'dist_built', 'twine_check_passed']
    recommended = ['testpypi_verified']
    blocking = [i for i in required if not state.get(i, False)]
    recommended_missing = [i for i in recommended if not state.get(i, False)]
    return {'ready': not blocking, 'blocking': blocking, 'recommended': recommended_missing}

The TestPyPI step is the most commonly skipped and the most valuable. It verifies that your package actually installs correctly from an index, not just from your local dist/ directory. A package can pass twine check locally but fail to install from PyPI due to missing data files (forgot to include them in MANIFEST.in or the wheel), bad entry points, or import errors in __init__.py. The workflow: upload to TestPyPI, run pip install --index-url https://test.pypi.org/simple/ mypackage, and run a quick smoke test. If that passes, run twine upload dist/* to publish to production PyPI.

Starter Code

def validate_release_checklist(state):
    """Validate a pre-release checklist state.
    
    state: dict with bool values for:
    - 'tests_passing': CI tests pass
    - 'version_bumped': version in pyproject.toml updated
    - 'changelog_updated': CHANGELOG.md has entry for this version
    - 'git_tag_created': git tag vX.Y.Z exists
    - 'dist_built': wheel and sdist in dist/
    - 'twine_check_passed': twine check dist/* passed
    - 'testpypi_verified': installed from TestPyPI successfully
    
    Return:
    - 'ready': bool (all required items True)
    - 'blocking': list of unmet required items
    - 'recommended': list of unmet recommended items
    
    Required: first 6. testpypi_verified is recommended.
    """
    # TODO: implement
    pass

Expected Output

{'ready': False, 'blocking': ['tests_passing', ...], 'recommended': ['testpypi_verified']}

Hints

Hint 1: Separate required checks from recommended checks. All required must be True for ready=True.

Hint 2: Build the blocking list from required items where state.get(item) is False or missing.

#7 PyPI Metadata Extractor (Medium)

PyPI-API · metadata · JSON-API

Extract structured metadata from PyPI's JSON API response. This is what IDE package inspectors and security scanners do when they need to display or audit package information.

Python

def extract_pypi_metadata(api_response):
    info = api_response.get('info', {})
    urls = api_response.get('urls', [])

    wheel_urls = [
        u['url'] for u in urls
        if u.get('packagetype') == 'bdist_wheel'
    ]
    sdist_entry = next(
        (u for u in urls if u.get('packagetype') == 'sdist'),
        None
    )

    return {
        'name': info.get('name', ''),
        'version': info.get('version', ''),
        'summary': info.get('summary', ''),
        'requires_python': info.get('requires_python', ''),
        'license': info.get('license', ''),
        'author': info.get('author', ''),
        'home_page': info.get('home_page', ''),
        'wheel_urls': wheel_urls,
        'sdist_url': sdist_entry['url'] if sdist_entry else None,
        'total_files': len(urls),
    }

mock_response = {
    'info': {
        'name': 'requests', 'version': '2.31.0',
        'summary': 'Python HTTP for Humans.',
        'requires_python': '>=3.7',
        'license': 'Apache 2.0',
        'author': 'Kenneth Reitz',
        'home_page': 'https://requests.readthedocs.io',
    },
    'urls': [
        {'packagetype': 'bdist_wheel', 'url': 'https://files.pypi.org/packages/requests-2.31.0-py3-none-any.whl'},
        {'packagetype': 'sdist', 'url': 'https://files.pypi.org/packages/requests-2.31.0.tar.gz'},
    ]
}
result = extract_pypi_metadata(mock_response)
print(result['name'], result['version'])
print('wheels:', len(result['wheel_urls']))

Solution

def extract_pypi_metadata(api_response):
    info = api_response.get('info', {})
    urls = api_response.get('urls', [])
    sdist = next((u for u in urls if u.get('packagetype') == 'sdist'), None)
    return {
        'name': info.get('name', ''), 'version': info.get('version', ''),
        'summary': info.get('summary', ''), 'requires_python': info.get('requires_python', ''),
        'license': info.get('license', ''), 'author': info.get('author', ''),
        'home_page': info.get('home_page', ''),
        'wheel_urls': [u['url'] for u in urls if u.get('packagetype') == 'bdist_wheel'],
        'sdist_url': sdist['url'] if sdist else None,
        'total_files': len(urls),
    }

PyPI's JSON API is documented at https://warehouse.pypa.io/api-reference/json.html and is completely public without authentication. The /pypi/{package}/json endpoint returns the latest version. The /pypi/{package}/{version}/json endpoint returns a specific version. The urls list contains every downloadable artifact — for popular packages like numpy or torch, this can be 50+ entries covering every supported Python version, ABI, and platform combination. Tools like pip use the Simple API (/simple/{package}/) rather than the JSON API for performance — it returns just filenames and hashes, not full metadata.
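The two endpoint forms and the shape of the urls list can be illustrated with a short sketch. json_api_url and count_by_packagetype are hypothetical helpers, and the sample entries are mock data in the same shape as the exercise's mock response.

```python
from collections import Counter

def json_api_url(package, version=None, base='https://pypi.org/pypi'):
    """Build the JSON API URL for the latest release or for a specific version."""
    if version is None:
        return f'{base}/{package}/json'
    return f'{base}/{package}/{version}/json'

def count_by_packagetype(urls):
    """Count downloadable artifacts per packagetype ('bdist_wheel', 'sdist', ...)."""
    return dict(Counter(u.get('packagetype', 'unknown') for u in urls))

print(json_api_url('requests'))
# -> https://pypi.org/pypi/requests/json
print(json_api_url('requests', '2.31.0'))
# -> https://pypi.org/pypi/requests/2.31.0/json

sample_urls = [
    {'packagetype': 'bdist_wheel'},
    {'packagetype': 'bdist_wheel'},
    {'packagetype': 'sdist'},
]
print(count_by_packagetype(sample_urls))
# -> {'bdist_wheel': 2, 'sdist': 1}
```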

Starter Code

def extract_pypi_metadata(api_response):
    """Extract key information from a PyPI JSON API response.
    
    The PyPI JSON API returns data at:
    https://pypi.org/pypi/{package}/{version}/json
    
    api_response: dict (parsed JSON)
    
    Return a dict:
    - 'name': str
    - 'version': str
    - 'summary': str
    - 'requires_python': str
    - 'license': str
    - 'author': str
    - 'home_page': str
    - 'wheel_urls': list of wheel download URLs
    - 'sdist_url': str or None
    - 'total_files': int
    """
    # TODO: implement
    pass

Expected Output

{'name': 'requests', 'version': '2.31.0', ...}

Hints

Hint 1: Access api_response["info"] for metadata fields. api_response["urls"] for download URLs.

Hint 2: Filter urls by packagetype: "bdist_wheel" for wheels, "sdist" for source.

#8 Private Registry Configurator (Medium)

private-registry · artifactory · pip-config

Generate pip.ini configuration (pip.conf on Linux and macOS) for a private package registry. Enterprise teams use Artifactory, AWS CodeArtifact, or GCP Artifact Registry to host internal packages, so understanding this configuration is essential for enterprise Python development.

Python

from urllib.parse import urlparse

def configure_private_registry(config):
    registry_url = config['registry_url']
    parsed = urlparse(registry_url)
    hostname = parsed.hostname

    lines = ['[global]']

    if config.get('auth_method') == 'token' and config.get('token'):
        auth_url = registry_url.replace('://', '://__token__:' + config['token'] + '@')
        lines.append('index-url = ' + auth_url)
    elif config.get('auth_method') == 'basic' and config.get('username'):
        auth_url = registry_url.replace('://', '://' + config['username'] + '@')
        lines.append('index-url = ' + auth_url)
    else:
        lines.append('index-url = ' + registry_url)

    if config.get('fallback_to_pypi'):
        lines.append('extra-index-url = https://pypi.org/simple')

    if config.get('trusted'):
        lines.append('trusted-host = ' + hostname)

    lines.extend([
        'timeout = 60',
        'retries = 3',
    ])

    if config.get('auth_method') == 'oidc':
        lines.append('# OIDC: credentials are injected through the environment by CI, not stored in this file')

    return '\n'.join(lines)

config = {
    'registry_url': 'https://my.registry.example.com/simple',
    'fallback_to_pypi': True,
    'auth_method': 'token',
    'token': 'secret-token',
    'trusted': False,
}
print(configure_private_registry(config))

Solution

from urllib.parse import urlparse

def configure_private_registry(config):
    url = config['registry_url']
    hostname = urlparse(url).hostname
    lines = ['[global]']
    if config.get('auth_method') == 'token' and config.get('token'):
        auth_url = url.replace('://', '://__token__:' + config['token'] + '@')
        lines.append('index-url = ' + auth_url)
    elif config.get('auth_method') == 'basic' and config.get('username'):
        auth_url = url.replace('://', '://' + config['username'] + '@')
        lines.append('index-url = ' + auth_url)
    else:
        lines.append('index-url = ' + url)
    if config.get('fallback_to_pypi'):
        lines.append('extra-index-url = https://pypi.org/simple')
    if config.get('trusted'):
        lines.append('trusted-host = ' + hostname)
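    lines.extend(['timeout = 60', 'retries = 3'])
    if config.get('auth_method') == 'oidc':
        lines.append('# OIDC: credentials are injected through the environment by CI, not stored in this file')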
    return '\n'.join(lines)

The extra-index-url setting with a private registry and PyPI as a second index is a security risk. Pip does not treat extra-index-url as a fallback that is consulted only when the primary index has no match: it queries every configured index and installs the best matching version, whichever index it comes from. An attacker who registers a matching package name on PyPI, typically with a higher version number, can therefore intercept private package installs (a dependency confusion attack). The safe configuration is to use the private registry as the sole source and mirror all required public packages through it. AWS CodeArtifact, Artifactory, and Nexus all support mirroring PyPI, so your private registry becomes a complete proxy that includes both your packages and public dependencies.
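A configuration file can be checked for this pattern mechanically. The sketch below uses the standard library's configparser; audit_pip_config and its finding messages are hypothetical, and the credentials check is an extra illustration that goes slightly beyond the risk described above.

```python
import configparser

def audit_pip_config(text):
    """Return a list of findings for risky settings in pip.ini / pip.conf content."""
    # interpolation=None keeps '%' characters in URLs from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    findings = []
    for section in parser.sections():
        if parser.has_option(section, 'extra-index-url'):
            findings.append(f'[{section}] extra-index-url: dependency confusion risk')
        index_url = parser.get(section, 'index-url', fallback='')
        # The host part is everything between '://' and the next '/'.
        host_part = index_url.split('://', 1)[-1].split('/', 1)[0]
        if '@' in host_part:
            findings.append(f'[{section}] index-url: credentials embedded in URL')
    return findings

sample = """\
[global]
index-url = https://__token__:secret@my.registry.example.com/simple
extra-index-url = https://pypi.org/simple
"""
for finding in audit_pip_config(sample):
    print(finding)
# -> [global] extra-index-url: dependency confusion risk
# -> [global] index-url: credentials embedded in URL
```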

Starter Code

def configure_private_registry(config):
    """Generate pip.ini content for a private registry setup.
    
    config: dict with:
    - 'registry_url': str (the private registry URL)
    - 'fallback_to_pypi': bool
    - 'auth_method': 'token', 'basic', or 'oidc'
    - 'token': str or None
    - 'username': str or None
    - 'trusted': bool
    
    Return a string with pip.ini [global] section content.
    For OIDC auth, note that credentials come from environment.
    """
    # TODO: implement
    pass

Expected Output

[global]
index-url = https://registry.example.com/simple
...

Hints

Hint 1: The [global] section sets index-url (the primary index) and extra-index-url (an additional index that pip also searches).

Hint 2: For trusted, add trusted-host with the hostname extracted from the URL.

Hard

#9 CI/CD Release Pipeline Simulator (Hard)

CI-CD · release-pipeline · automation

Implement a release pipeline orchestrator. This models the kind of sequential stage execution used by GitHub Actions, GitLab CI, and tools like semantic-release.

Python

class ReleasePipeline:
    def __init__(self):
        self.stages = []
        self.results = []
        self.failed_at = None

    def add_stage(self, name, func):
        self.stages.append((name, func))
        return self

    def run(self, context):
        self.results = []
        self.failed_at = None

        for name, func in self.stages:
            try:
                success, message = func(context)
            except Exception as e:
                success = False
                message = 'Exception: ' + str(e)

            self.results.append({
                'stage': name,
                'success': success,
                'message': message,
            })

            if not success:
                self.failed_at = name
                break

        return self.get_report()

    def get_report(self):
        stages_run = len(self.results)
        success_count = sum(1 for r in self.results if r['success'])
        all_ok = self.failed_at is None and stages_run == len(self.stages)

        return {
            'status': 'success' if all_ok else 'failed',
            'stages_run': stages_run,
            'stages_total': len(self.stages),
            'failed_at': self.failed_at,
            'results': self.results,
            'success_count': success_count,
        }


# Simulate pipeline stages
def check_version(ctx):
    if ctx.get('version_exists_on_pypi'):
        return False, 'Version already exists on PyPI'
    return True, 'Version is new'

def run_tests(ctx):
    if ctx.get('test_failures', 0) > 0:
        return False, str(ctx['test_failures']) + ' tests failed'
    return True, 'All tests passed'

def build_package(ctx):
    ctx['artifacts'] = ['dist/pkg-1.0.0-py3-none-any.whl', 'dist/pkg-1.0.0.tar.gz']
    return True, 'Built ' + str(len(ctx['artifacts'])) + ' artifacts'

def upload_to_pypi(ctx):
    return True, 'Uploaded ' + str(len(ctx.get('artifacts', []))) + ' files'

pipeline = ReleasePipeline()
pipeline.add_stage('check_version', check_version)
pipeline.add_stage('run_tests', run_tests)
pipeline.add_stage('build', build_package)
pipeline.add_stage('upload', upload_to_pypi)

report = pipeline.run({'version_exists_on_pypi': False, 'test_failures': 0})
print('Status:', report['status'])
print('Stages run:', report['stages_run'])
print('Failed at:', report['failed_at'])

failing_report = pipeline.run({'version_exists_on_pypi': False, 'test_failures': 2})
print('\nFailing pipeline:')
print('Status:', failing_report['status'])
print('Failed at:', failing_report['failed_at'])

Solution

class ReleasePipeline:
    def __init__(self):
        self.stages = []
        self.results = []
        self.failed_at = None

    def add_stage(self, name, func):
        self.stages.append((name, func))
        return self

    def run(self, context):
        self.results = []
        self.failed_at = None
        for name, func in self.stages:
            try:
                success, message = func(context)
            except Exception as e:
                success, message = False, 'Exception: ' + str(e)
            self.results.append({'stage': name, 'success': success, 'message': message})
            if not success:
                self.failed_at = name
                break
        return self.get_report()

    def get_report(self):
        all_ok = self.failed_at is None and len(self.results) == len(self.stages)
        return {
            'status': 'success' if all_ok else 'failed',
            'stages_run': len(self.results), 'stages_total': len(self.stages),
            'failed_at': self.failed_at, 'results': self.results,
            'success_count': sum(1 for r in self.results if r['success']),
        }

This pipeline pattern is how GitHub Actions, GitLab CI, and tools like python-semantic-release work. Each stage has a clear input (context), output (success + message), and failure behavior (stop). The context dict acts as a shared state bag that stages can both read and write — build_package writes context["artifacts"], and upload reads it. In real CI, this becomes environment variables, artifact paths in the workspace, and step outputs. The key design principle: fail fast — stop at the first failure rather than running all stages and accumulating errors, because later stages depend on earlier ones succeeding.

Starter Code

class ReleasePipeline:
    """Simulate a CI/CD release pipeline for a Python package.
    
    Pipeline stages:
    1. validate_version: check version not already on PyPI
    2. run_tests: run test suite
    3. build: create wheel and sdist
    4. check: run twine check on artifacts
    5. upload_testpypi: upload to TestPyPI
    6. smoke_test: install from TestPyPI and verify import
    7. upload_pypi: upload to production PyPI
    8. tag_release: create git tag
    
    Each stage can fail. Failed stages should stop the pipeline.
    """
    
    def __init__(self):
        self.stages = []
        self.results = []
        self.failed_at = None
    
    def add_stage(self, name, func):
        pass
    
    def run(self, context):
        pass
    
    def get_report(self):
        pass

Expected Output

{'status': 'success', 'stages_run': 8, 'failed_at': None}

Hints

Hint 1: Each stage function takes context dict and returns (success: bool, message: str).

Hint 2: Stop processing stages after the first failure. Record which stage failed.

#10 Package Security Scanner (Hard)

security, typosquatting, supply-chain

Implement a package name security scanner. Typosquatting attacks (registering requsets when requests is popular) have affected real developers. Package indexes and supply-chain security tools apply similar name-similarity checks.

Python

def levenshtein(s1, s2):
    if len(s1) < len(s2):
        return levenshtein(s2, s1)
    if not s2:
        return len(s1)
    prev = list(range(len(s2) + 1))
    for c1 in s1:
        curr = [prev[0] + 1]
        for j, c2 in enumerate(s2):
            curr.append(min(prev[j + 1] + 1, curr[j] + 1,
                            prev[j] + (0 if c1 == c2 else 1)))
        prev = curr
    return prev[-1]

def scan_package_name(package_name, known_popular):
    pkg = package_name.lower().replace('-', '_').replace('.', '_')
    suspicious = False
    similar_to = []
    patterns_found = []
    risk_score = 0

    for known in known_popular:
        k = known.lower().replace('-', '_').replace('.', '_')
        if pkg == k:
            continue  # exact match, not a typosquat
        dist = levenshtein(pkg, k)

        if dist == 1:
            similar_to.append(known)
            patterns_found.append('levenshtein_distance_1_from_' + known)
            risk_score = max(risk_score, 8)
            suspicious = True
        elif dist == 2:
            similar_to.append(known)
            patterns_found.append('levenshtein_distance_2_from_' + known)
            risk_score = max(risk_score, 5)
            suspicious = True

        # Check prefix/suffix patterns
        for prefix in ('py-', 'python-'):
            if pkg == prefix.replace('-', '_') + k:
                similar_to.append(known)
                patterns_found.append('suspicious_prefix_' + prefix)
                risk_score = max(risk_score, 6)
                suspicious = True

        for suffix in ('-python', '-py'):
            if pkg == k + suffix.replace('-', '_'):
                similar_to.append(known)
                patterns_found.append('suspicious_suffix_' + suffix)
                risk_score = max(risk_score, 6)
                suspicious = True

    return {
        'suspicious': suspicious,
        'similar_to': list(set(similar_to)),
        'risk_score': risk_score,
        'patterns_found': list(set(patterns_found)),
    }

popular = ['requests', 'flask', 'numpy', 'pandas', 'django']
print("requsets:", scan_package_name('requsets', popular)['suspicious'],
      scan_package_name('requsets', popular)['similar_to'])
print("flask:", scan_package_name('flask', popular)['suspicious'])
print("py-requests:", scan_package_name('py-requests', popular)['patterns_found'])

Solution

def levenshtein(s1, s2):
    if len(s1) < len(s2):
        return levenshtein(s2, s1)
    if not s2:
        return len(s1)
    prev = list(range(len(s2) + 1))
    for c1 in s1:
        curr = [prev[0] + 1]
        for j, c2 in enumerate(s2):
            curr.append(min(prev[j+1]+1, curr[j]+1, prev[j]+(0 if c1==c2 else 1)))
        prev = curr
    return prev[-1]

def scan_package_name(package_name, known_popular):
    pkg = package_name.lower().replace('-', '_').replace('.', '_')
    suspicious, similar, patterns, score = False, [], [], 0
    for known in known_popular:
        k = known.lower().replace('-', '_').replace('.', '_')
        if pkg == k:
            continue  # exact match, not a typosquat
        d = levenshtein(pkg, k)
        if d in (1, 2):
            similar.append(known)
            patterns.append('levenshtein_distance_' + str(d) + '_from_' + known)
            score = max(score, 8 if d == 1 else 5)
            suspicious = True
        # Prefix/suffix variants such as py_requests or requests_python
        variants = {
            'py_' + k: 'suspicious_prefix_py-',
            'python_' + k: 'suspicious_prefix_python-',
            k + '_python': 'suspicious_suffix_-python',
            k + '_py': 'suspicious_suffix_-py',
        }
        if pkg in variants:
            similar.append(known)
            patterns.append(variants[pkg])
            score = max(score, 6)
            suspicious = True
    return {'suspicious': suspicious, 'similar_to': list(set(similar)),
            'risk_score': score, 'patterns_found': list(set(patterns))}

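Typosquatting has caused real damage. In 2017, a batch of malicious packages with names mimicking popular libraries was found on PyPI and removed. The 2018 event-stream incident on npm was not a typosquat, but it showed how a popular package can be compromised through its dependency chain. Detection efforts typically combine edit distance, pattern matching, and behavioral analysis to catch typosquats. Note that pip-audit (which scans dependencies for known vulnerabilities) and bandit (which statically analyzes your own code) address different risks and do not check for lookalike names. Best practices: always verify the exact package name before pip install, check the PyPI page for the number of downloads and project description, and use lockfiles with pinned hashes (for example pip's --require-hashes mode) so your CI installs exactly the artifacts you vetted rather than re-resolving package names.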
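The idea behind installing pinned hashes can be sketched with the standard library. This is a simplified illustration of the check, not pip's actual implementation: verify_pinned, the pins dict, and the file names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(filename, data, pinned_hashes):
    """Accept an artifact only if its hash matches the pinned value.

    pinned_hashes is a hypothetical lockfile mapping: filename -> sha256 hex.
    Files that are not pinned at all are rejected.
    """
    expected = pinned_hashes.get(filename)
    if expected is None:
        return False
    return sha256_hex(data) == expected


artifact = b"pretend wheel contents"
pins = {"demo-1.0.0-py3-none-any.whl": sha256_hex(artifact)}

print(verify_pinned("demo-1.0.0-py3-none-any.whl", artifact, pins))         # matching bytes
print(verify_pinned("demo-1.0.0-py3-none-any.whl", artifact + b"!", pins))  # tampered bytes
print(verify_pinned("dem0-1.0.0-py3-none-any.whl", artifact, pins))         # lookalike name
```

A lookalike package fails this check twice over: its file name is not in the pin list, and its contents would not hash to a pinned value even if the name matched.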

Starter Code

def scan_package_name(package_name, known_popular):
    """Scan a package name for potential typosquatting.
    
    known_popular: list of well-known package names
    
    Check:
    1. Edit distance from known packages (Levenshtein distance <= 2)
    2. Common typo patterns: doubled letters, transpositions, missing chars
    3. Suspicious name patterns: adding 'py-' prefix, '-python' suffix
    
    Return a dict:
    - 'suspicious': bool
    - 'similar_to': list of similar package names found
    - 'risk_score': int 0-10
    - 'patterns_found': list of str
    """
    # TODO: implement
    pass

Expected Output

{'suspicious': True, 'similar_to': ['requests'], 'risk_score': 5, 'patterns_found': ['levenshtein_distance_2_from_requests']}

Hints

Hint 1: Implement Levenshtein distance using dynamic programming. Distance 1-2 from a popular package is suspicious.

Hint 2: Check for prefix/suffix patterns: flag the name if it equals "py-" + known or known + "-python" for some known package.
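One subtlety with Hint 1: plain Levenshtein distance counts a swap of two adjacent letters as two edits, so requsets is at distance 2 from requests, not 1. If transpositions should count as a single edit, one option is the optimal string alignment variant sketched below. The function name osa_distance is an illustrative choice and is not part of the exercise.

```python
def osa_distance(a, b):
    """Optimal string alignment distance.

    Like Levenshtein distance, but swapping two adjacent characters
    counts as one edit instead of two.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition: a[i-2:i] is b[j-2:j] reversed.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("requsets", "requests"))
print(osa_distance("flsak", "flask"))
print(osa_distance("numpy", "numpy"))
```

With this metric a transposition typo lands in the highest-risk bucket, which matches how such typos tend to be made in practice.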

#11 Multi-Platform Wheel Builder (Hard)

cibuildwheel, manylinux, cross-compilation, C-extensions

Plan the multi-platform wheel build matrix for a package with C extensions. This mirrors what cibuildwheel automates: building a wheel for every (Python version, OS, architecture) combination.

Python

def plan_wheel_builds(package_config):
    name = package_config['name']
    has_c = package_config.get('has_c_extension', False)
    platforms = package_config.get('platforms', ['linux', 'macos', 'windows'])

    if not has_c:
        return {
            'pure_python': True,
            'build_matrix': [{'python': 'py3', 'platform': 'any',
                               'arch': 'any', 'image': None, 'wheel_tag': 'py3-none-any'}],
            'total_builds': 1,
        }

    min_py = package_config.get('min_python', '3.8')
    max_py = package_config.get('max_python', '3.12')

    py_versions = []
    min_parts = [int(x) for x in min_py.split('.')]
    max_parts = [int(x) for x in max_py.split('.')]
    for minor in range(min_parts[1], max_parts[1] + 1):
        py_versions.append('3.' + str(minor))

    platform_configs = {
        'linux': [
            # manylinux2014 is the image name for the manylinux_2_17 tag
            ('x86_64', 'quay.io/pypa/manylinux2014_x86_64'),
            ('aarch64', 'quay.io/pypa/manylinux2014_aarch64'),
        ],
        'macos': [
            ('x86_64', None),
            ('arm64', None),
        ],
        'windows': [
            ('AMD64', None),
        ],
    }

    build_matrix = []
    for py_ver in py_versions:
        cp_tag = 'cp' + py_ver.replace('.', '')
        for platform in platforms:
            for arch, image in platform_configs.get(platform, []):
                wheel_tag = cp_tag + '-' + cp_tag + '-'
                if platform == 'linux':
                    wheel_tag += 'manylinux_2_17_' + arch.lower()
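                elif platform == 'macos':
                    # arm64 wheels require macOS 11.0 or later
                    min_os = '11_0' if arch == 'arm64' else '10_9'
                    wheel_tag += 'macosx_' + min_os + '_' + arch.lower()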
                else:
                    wheel_tag += 'win_' + arch.lower()

                build_matrix.append({
                    'python': py_ver,
                    'platform': platform,
                    'arch': arch,
                    'image': image,
                    'wheel_tag': wheel_tag,
                })

    return {
        'pure_python': False,
        'build_matrix': build_matrix,
        'total_builds': len(build_matrix),
    }

config = {
    'name': 'myextension',
    'has_c_extension': True,
    'min_python': '3.9',
    'max_python': '3.12',
    'platforms': ['linux', 'macos', 'windows'],
}
result = plan_wheel_builds(config)
print("Total builds:", result['total_builds'])
print("First entry:", result['build_matrix'][0])
print("Pure python:", result['pure_python'])

Solution

def plan_wheel_builds(package_config):
    has_c = package_config.get('has_c_extension', False)
    platforms = package_config.get('platforms', ['linux', 'macos', 'windows'])
    if not has_c:
        return {'pure_python': True,
                'build_matrix': [{'python': 'py3', 'platform': 'any', 'arch': 'any',
                                   'image': None, 'wheel_tag': 'py3-none-any'}],
                'total_builds': 1}
    min_minor = int(package_config.get('min_python', '3.8').split('.')[1])
    max_minor = int(package_config.get('max_python', '3.12').split('.')[1])
    py_versions = ['3.' + str(m) for m in range(min_minor, max_minor + 1)]
    plat_conf = {
        # manylinux2014 is the image name for the manylinux_2_17 tag
        'linux': [('x86_64', 'quay.io/pypa/manylinux2014_x86_64'),
                  ('aarch64', 'quay.io/pypa/manylinux2014_aarch64')],
        'macos': [('x86_64', None), ('arm64', None)],
        'windows': [('AMD64', None)],
    }
    matrix = []
    for py in py_versions:
        cp = 'cp' + py.replace('.', '')
        for plat in platforms:
            for arch, img in plat_conf.get(plat, []):
                tag = cp + '-' + cp + '-'
                if plat == 'linux':
                    tag += 'manylinux_2_17_' + arch.lower()
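                elif plat == 'macos':
                    # arm64 wheels require macOS 11.0 or later
                    min_os = '11_0' if arch == 'arm64' else '10_9'
                    tag += 'macosx_' + min_os + '_' + arch.lower()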
                else:
                    tag += 'win_' + arch.lower()
                matrix.append({'python': py, 'platform': plat, 'arch': arch, 'image': img, 'wheel_tag': tag})
    return {'pure_python': False, 'build_matrix': matrix, 'total_builds': len(matrix)}

cibuildwheel is the standard tool for building cross-platform C extension wheels. It runs inside CI (GitHub Actions, GitLab CI) and builds a wheel for every supported (Python, OS, arch) combination in a single workflow. For Python 3.8-3.12 on Linux (x86_64 + aarch64), macOS (x86_64 + arm64), and Windows (AMD64), that is 5 Python versions x 5 platform configs = 25 builds. This is why popular packages like numpy and Pillow publish dozens of wheel files per release on PyPI: one for every supported combination. The manylinux Docker images provide an old, fixed glibc baseline, so wheels built in them run on any glibc-based Linux distribution at least as new as that baseline. musl-based distributions such as Alpine need separate musllinux wheels.
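The build-count arithmetic above can be checked with a short sketch. The helper count_builds and the TARGETS list are illustrative names, assuming the same five (OS, arch) targets used in this exercise.

```python
from itertools import product

# The five (OS, arch) targets used in this exercise.
TARGETS = [
    ("linux", "x86_64"),
    ("linux", "aarch64"),
    ("macos", "x86_64"),
    ("macos", "arm64"),
    ("windows", "AMD64"),
]


def count_builds(min_minor, max_minor, targets):
    """Count (Python version, target) pairs for CPython 3.min .. 3.max."""
    pythons = ["3." + str(m) for m in range(min_minor, max_minor + 1)]
    return len(list(product(pythons, targets)))


print(count_builds(8, 12, TARGETS))  # 5 versions x 5 targets
print(count_builds(9, 12, TARGETS))  # 4 versions x 5 targets
```

Dropping one Python version removes five builds at once, which is why narrowing the supported version range is a common way to shorten CI time.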

Starter Code

def plan_wheel_builds(package_config):
    """Plan the matrix of wheel builds for a package with C extensions.
    
    package_config: dict with:
    - 'name': str
    - 'has_c_extension': bool
    - 'min_python': str (e.g. '3.8')
    - 'max_python': str (e.g. '3.12')
    - 'platforms': list of 'linux', 'macos', 'windows'
    
    For pure Python: only one wheel needed (py3-none-any).
    For C extensions: generate build matrix entries.
    
    Each entry: {'python': str, 'platform': str, 'arch': str,
                 'image': str or None, 'wheel_tag': str}
    
    Return: {'pure_python': bool, 'build_matrix': list, 'total_builds': int}
    """
    # TODO: implement
    pass

Expected Output

{'pure_python': False, 'build_matrix': [...], 'total_builds': 20}

Hints

Hint 1: For pure Python, return a single entry. For C extensions, generate entries for each (python_version x platform x arch) combination.

Hint 2: Linux uses manylinux images. macOS supports x86_64 and arm64. Windows typically x86_64 only.
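To see what a manylinux platform tag encodes, here is a simplified sketch that parses a tag such as manylinux_2_17_x86_64 into its glibc baseline and architecture, then checks it against a system. The helper names are hypothetical, and the rule is reduced to its core: real installers rely on the packaging library's tag logic, and legacy aliases such as manylinux2014 are not handled here.

```python
import re

# manylinux_<glibc major>_<glibc minor>_<arch>
_MANYLINUX = re.compile(r"^manylinux_(\d+)_(\d+)_(.+)$")


def parse_manylinux(platform_tag):
    """Split a manylinux tag into its glibc baseline and architecture."""
    match = _MANYLINUX.match(platform_tag)
    if match is None:
        return None  # not a manylinux_X_Y tag
    return {
        "glibc": (int(match.group(1)), int(match.group(2))),
        "arch": match.group(3),
    }


def is_compatible(platform_tag, system_glibc, system_arch):
    """Simplified check: same arch and system glibc at least the baseline."""
    info = parse_manylinux(platform_tag)
    if info is None:
        return False
    return info["arch"] == system_arch and system_glibc >= info["glibc"]


print(parse_manylinux("manylinux_2_17_x86_64"))
print(is_compatible("manylinux_2_17_x86_64", (2, 31), "x86_64"))
print(is_compatible("manylinux_2_17_x86_64", (2, 12), "x86_64"))
```

Comparing the version as a tuple of integers avoids the string-comparison trap where "2.9" would sort after "2.17".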
