Python Publishing Packages Practice Problems & Exercises
Practice: Publishing Packages
Easy
Parse and validate wheel filenames. The wheel filename encodes Python version compatibility, ABI compatibility, and platform compatibility — pip uses these tags to decide which wheel to download.
def parse_wheel_filename(filename):
    if not filename.endswith('.whl'):
        return {'valid': False}
    stem = filename[:-4]
    parts = stem.split('-')
    # A wheel filename has 5 components, or 6 with the optional build tag
    if len(parts) not in (5, 6):
        return {'valid': False}
    name = parts[0]
    version = parts[1]
    # Handle optional build tag (6 parts)
    if len(parts) == 6:
        python_tag = parts[3]
        abi_tag = parts[4]
        platform_tag = parts[5]
    else:
        python_tag = parts[2]
        abi_tag = parts[3]
        platform_tag = parts[4]
    is_pure_python = (
        python_tag.startswith('py') and
        abi_tag == 'none' and
        platform_tag == 'any'
    )
    return {
        'valid': True,
        'name': name,
        'version': version,
        'python_tag': python_tag,
        'abi_tag': abi_tag,
        'platform_tag': platform_tag,
        'is_pure_python': is_pure_python,
    }

tests = [
    'requests-2.31.0-py3-none-any.whl',
    'numpy-1.24.3-cp311-cp311-manylinux_2_17_x86_64.whl',
    'not_a_wheel.tar.gz',
]
for t in tests:
    r = parse_wheel_filename(t)
    print(t[:30], '->', r.get('valid'), r.get('is_pure_python'))
Solution
def parse_wheel_filename(filename):
    if not filename.endswith('.whl'):
        return {'valid': False}
    stem = filename[:-4]
    parts = stem.split('-')
    # 5 components, or 6 when the optional build tag is present
    if len(parts) not in (5, 6):
        return {'valid': False}
    # The build tag, if any, sits between the version and the python tag
    offset = 1 if len(parts) == 6 else 0
    python_tag = parts[2 + offset]
    abi_tag = parts[3 + offset]
    platform_tag = parts[4 + offset]
    return {
        'valid': True, 'name': parts[0], 'version': parts[1],
        'python_tag': python_tag, 'abi_tag': abi_tag, 'platform_tag': platform_tag,
        'is_pure_python': python_tag.startswith('py') and abi_tag == 'none' and platform_tag == 'any',
    }
The wheel filename is a machine-readable compatibility declaration. py3-none-any means: pure Python 3, no ABI requirements, any platform, so the wheel works on every Python 3 interpreter. cp311-cp311-manylinux_2_17_x86_64 means: CPython 3.11, CPython 3.11 ABI, 64-bit x86 Linux with glibc >= 2.17. When pip installs a package, it ranks the available wheels by how closely their tags match the current environment and picks the most specific compatible one. A pure Python wheel is chosen when no more specific compatible wheel is available.
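To make the ranking idea concrete, here is a minimal sketch of choosing among candidate wheels. The `supported` list is a hypothetical, hand-written priority order (a real installer derives it from the running interpreter and platform), and the helper names `split_tags` and `pick_best_wheel` are assumptions for illustration only. Compressed tag sets such as py2.py3 are not handled.

```python
def split_tags(filename):
    """Return the (python, abi, platform) tag triple from a wheel filename."""
    parts = filename[:-len('.whl')].split('-')
    return tuple(parts[-3:])


def pick_best_wheel(filenames, supported):
    """Pick the candidate whose tag triple appears earliest in `supported`.

    `supported` is ordered from most to least preferred. Returns None
    when no candidate is compatible.
    """
    rank = {tags: i for i, tags in enumerate(supported)}
    compatible = [f for f in filenames if split_tags(f) in rank]
    if not compatible:
        return None
    return min(compatible, key=lambda f: rank[split_tags(f)])


# Hypothetical priority list for CPython 3.11 on 64-bit Linux (assumption).
supported = [
    ('cp311', 'cp311', 'manylinux_2_17_x86_64'),
    ('py3', 'none', 'any'),
]
candidates = [
    'demo-1.0-py3-none-any.whl',
    'demo-1.0-cp311-cp311-manylinux_2_17_x86_64.whl',
    'demo-1.0-cp310-cp310-win_amd64.whl',
]
best = pick_best_wheel(candidates, supported)
print(best)
```

With this priority list the platform-specific wheel is selected, and the pure Python wheel is only used when it is the sole compatible candidate.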
import re
def parse_wheel_filename(filename):
"""Parse a wheel filename into its components.
Wheel filename format:
{distribution}-{version}(-{build_tag})?-{python_tag}-{abi_tag}-{platform_tag}.whl
Examples:
'requests-2.31.0-py3-none-any.whl'
'numpy-1.24.3-cp311-cp311-manylinux_2_17_x86_64.whl'
Return a dict:
- 'valid': bool
- 'name': str
- 'version': str
- 'python_tag': str
- 'abi_tag': str
- 'platform_tag': str
- 'is_pure_python': bool (py3-none-any)
"""
# TODO: implement
pass
Expected Output
{'valid': True, 'name': 'requests', 'version': '2.31.0', 'python_tag': 'py3', 'abi_tag': 'none', 'platform_tag': 'any', 'is_pure_python': True}
Hints
Hint 1: Split the filename on "-" after removing ".whl". The minimum valid split gives 5 parts.
Hint 2: is_pure_python: python_tag starts with "py", abi_tag is "none", platform_tag is "any".
Generate MANIFEST.in content for a source distribution. The MANIFEST.in file controls which files are included in an sdist tarball, which is critical for packages with data files, templates, or type stubs.
def generate_manifest(project_files, extras=None):
lines = [
'# Auto-generated MANIFEST.in',
'',
'# Standard files',
'include README*',
'include LICENSE*',
'include CHANGES* CHANGELOG*',
'include pyproject.toml',
'',
'# Source code',
'recursive-include src *.py',
'recursive-include src *.pyi',
'',
]
if extras:
lines.append('# Extra patterns')
for pattern in extras:
lines.append('include ' + pattern)
lines.append('')
lines.extend([
'# Exclusions',
'global-exclude *.pyc',
'global-exclude __pycache__',
'prune build',
'prune dist',
'prune .git',
'prune .tox',
'prune *.egg-info',
])
return '\n'.join(lines)
files = ['src/mypackage/__init__.py', 'README.md', 'LICENSE']
extras = ['data/*.json', 'templates/*.html']
print(generate_manifest(files, extras))
Solution
def generate_manifest(project_files, extras=None):
lines = ['# Auto-generated MANIFEST.in', '',
'include README*', 'include LICENSE*',
'include CHANGES* CHANGELOG*', 'include pyproject.toml', '',
'recursive-include src *.py', 'recursive-include src *.pyi', '']
if extras:
lines.append('# Extra patterns')
lines.extend('include ' + p for p in extras)
lines.append('')
lines.extend(['global-exclude *.pyc', 'global-exclude __pycache__',
'prune build', 'prune dist', 'prune .git', 'prune *.egg-info'])
return '\n'.join(lines)
MANIFEST.in is a setuptools mechanism and only affects sdist builds; wheel builds use the build backend's own file selection mechanism (e.g., [tool.hatch.build.include]). The sdist is still important because it is what pip builds from on platforms that have no pre-built wheel (Alpine Linux with musl libc, for example). Missing data files in the sdist are a common packaging bug: the wheel works fine in testing, but an install from source is broken because the data files were never included. Always test your sdist with pip install dist/mypackage-1.0.0.tar.gz before publishing.
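One way to catch missing data files before publishing is to inspect the built archive directly. The following is a minimal sketch, assuming the sdist is a .tar.gz whose members live under a single top-level directory; the helper name `missing_from_sdist` and the example patterns are hypothetical.

```python
import fnmatch
import tarfile


def missing_from_sdist(sdist_path, required_patterns):
    """Return the patterns that match no file inside the sdist archive."""
    with tarfile.open(sdist_path, 'r:gz') as archive:
        names = [m.name for m in archive.getmembers() if m.isfile()]
    # Strip the leading top-level directory so that patterns are written
    # relative to the project root.
    relative = [n.split('/', 1)[1] for n in names if '/' in n]
    return [p for p in required_patterns
            if not any(fnmatch.fnmatch(r, p) for r in relative)]
```

A call such as `missing_from_sdist('dist/mypackage-1.0.0.tar.gz', ['README*', 'data/*.json'])` returns the patterns that found nothing, so an empty list means every required pattern matched at least one file.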
def generate_manifest(project_files, extras=None):
"""Generate MANIFEST.in content for a Python sdist.
project_files: list of file paths relative to project root
extras: list of additional glob patterns to include
Rules to generate:
- Always include: README*, LICENSE*, CHANGES*
- Include all .py files under src/
- Include all .pyi stub files
- Include patterns from extras
- Exclude: *.pyc, __pycache__, .git, build/, dist/
Return a string with MANIFEST.in content.
"""
# TODO: implement
pass
Expected Output
include README*
include LICENSE*
...
Hints
Hint 1: Use "include PATTERN" for specific files and "recursive-include DIR PATTERN" for directories.
Hint 2: Use "global-exclude PATTERN" for patterns to exclude everywhere.
Generate twine upload commands. Twine is the standard tool for uploading Python packages to PyPI and private registries — understanding its flags is essential for CI/CD pipeline setup.
def generate_twine_command(dist_files, repository='pypi', token=None, skip_existing=False):
parts = ['twine', 'upload']
if repository == 'testpypi':
parts.extend(['--repository', 'testpypi'])
elif repository == 'pypi':
pass # default
elif repository.startswith('http'):
parts.extend(['--repository-url', repository])
else:
parts.extend(['--repository', repository])
if token:
parts.extend(['--username', '__token__', '--password', token])
if skip_existing:
parts.append('--skip-existing')
parts.extend(dist_files)
return ' '.join(parts)
print(generate_twine_command(['dist/*'], 'testpypi'))
print(generate_twine_command(['dist/*'], 'pypi', token='pypi-abc123'))
print(generate_twine_command(['dist/*'], 'https://my.registry.com/simple', skip_existing=True))
Solution
def generate_twine_command(dist_files, repository='pypi', token=None, skip_existing=False):
parts = ['twine', 'upload']
if repository == 'testpypi':
parts.extend(['--repository', 'testpypi'])
elif repository.startswith('http'):
parts.extend(['--repository-url', repository])
elif repository != 'pypi':
parts.extend(['--repository', repository])
if token:
parts.extend(['--username', '__token__', '--password', token])
if skip_existing:
parts.append('--skip-existing')
parts.extend(dist_files)
return ' '.join(parts)
Never pass the token as a command-line argument in real CI. The token would appear in process listings and CI logs. The correct approach is to set TWINE_USERNAME=__token__ and TWINE_PASSWORD=pypi-your-token as CI environment secrets, then run twine upload dist/* with no credentials in the command. PyPI API tokens (starting with pypi-) can be scoped to a single project and revoked individually, so always use a project-scoped token in CI, not an account-wide token. For GitHub Actions, the pypa/gh-action-pypi-publish action handles the upload step for you.
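The environment-variable approach can be sketched as a helper that keeps the token out of the argument list. This is a minimal illustration, not a prescribed implementation: the function name `build_upload_invocation` and the placeholder token are assumptions, while TWINE_USERNAME and TWINE_PASSWORD are the variables named above.

```python
import os


def build_upload_invocation(token, dist_files, base_env=None):
    """Return (argv, env) for running twine with the token kept out of argv."""
    env = dict(os.environ if base_env is None else base_env)
    env['TWINE_USERNAME'] = '__token__'
    env['TWINE_PASSWORD'] = token
    # --non-interactive makes twine fail instead of prompting for input in CI
    argv = ['twine', 'upload', '--non-interactive', *dist_files]
    return argv, env


# Placeholder token for illustration; a real one comes from the CI secret store.
argv, env = build_upload_invocation('pypi-example-token', ['dist/*'], base_env={})
print(' '.join(argv))
```

The pair can then be handed to `subprocess.run(argv, env=env, check=True)`, so the secret only ever travels through the child process environment.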
def generate_twine_command(dist_files, repository='pypi', token=None, skip_existing=False):
"""Generate the twine upload command string.
dist_files: list of file paths to upload (or ['dist/*'])
repository: 'pypi' or 'testpypi' or a custom URL
token: PyPI API token string (starts with 'pypi-')
skip_existing: bool
Return the command string.
Use __token__ as username when token is provided.
For testpypi, use --repository testpypi.
For custom URL, use --repository-url URL.
"""
# TODO: implement
pass
Expected Output
twine upload --repository testpypi dist/*
Hints
Hint 1: Build the command as a list of strings. For token auth, the username is always __token__; in real CI, supply the credentials through the TWINE_USERNAME and TWINE_PASSWORD environment variables instead of flags.
Hint 2: Custom URLs use --repository-url, named repos use --repository.
Implement PEP 503 package name normalization. PyPI uses this so that names differing only in case or separators cannot be registered as separate projects: my-package, my_package, and my.package all refer to the same package.
import re
def normalize_package_name(name):
return re.sub(r'[-_.]+', '-', name.lower())
def names_equivalent(name1, name2):
return normalize_package_name(name1) == normalize_package_name(name2)
names = ['My-Package', 'my_package', 'My.Package', 'MY__PACKAGE', 'mypackage']
for n in names:
print(n, '->', normalize_package_name(n))
print(names_equivalent('my-package', 'my_package'))
print(names_equivalent('requests', 'Requests'))
print(names_equivalent('flask', 'Django'))
Solution
import re
def normalize_package_name(name):
return re.sub(r'[-_.]+', '-', name.lower())
def names_equivalent(name1, name2):
return normalize_package_name(name1) == normalize_package_name(name2)
Name normalization is why you can pip install Requests (capital R) and it works. Pip normalizes the requested name before querying PyPI's Simple API. This is also why you cannot register my_package on PyPI if my-package is already taken: they normalize to the same value. The Simple API endpoint for a package is https://pypi.org/simple/{normalized-name}/. This normalization is defined in PEP 503. It does not protect against typosquatting, a supply chain attack in which attackers register near-matches like requsets or flaskk that normalize to different names. Always double-check package names before pip install.
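As a small illustration of the endpoint format, the sketch below builds the Simple API URL from an arbitrary spelling of a name. The helper name `simple_index_url` and its `base` parameter are assumptions for this example.

```python
import re


def simple_index_url(name, base='https://pypi.org/simple'):
    """Build the Simple API URL for a project from any spelling of its name."""
    normalized = re.sub(r'[-_.]+', '-', name).lower()
    return f'{base}/{normalized}/'


for raw in ('Requests', 'my_package', 'My.Package'):
    print(raw, '->', simple_index_url(raw))
```

Different spellings of the same project name resolve to one URL, which is what makes them equivalent from the index's point of view.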
import re
def normalize_package_name(name):
"""Normalize a Python package name per PEP 503.
Rules:
- Lowercase all letters
- Replace runs of [-_.] with a single hyphen
- The normalized form is used for PyPI URL paths
Examples:
'My-Package' -> 'my-package'
'my_package' -> 'my-package'
'My.Package' -> 'my-package'
'MY__PACKAGE' -> 'my-package'
Also return whether two names are equivalent.
"""
# TODO: implement normalize_package_name and names_equivalent
pass
def names_equivalent(name1, name2):
pass
Expected Output
my-package
True
True
Hints
Hint 1: Use re.sub(r"[-_.]+", "-", name.lower()) to normalize.
Hint 2: Two names are equivalent if their normalized forms are equal.
Medium
Validate a dist/ directory before uploading to PyPI. This is the check that prevents uploading inconsistent or malformed artifacts, a mistake that cannot be corrected in place because PyPI does not allow re-uploading a file under the same name and version.
import os
import re
import tempfile

def extract_version_from_dist(filename):
    # Capture only the dotted release number, so the '.' that starts
    # '.tar.gz' is not swallowed into the version string
    m = re.match(r'^[\w._-]+-(\d+(?:\.\d+)*)', filename)
    return m.group(1) if m else None

def validate_dist_directory(dist_dir):
    errors = []
    warnings = []
    if not os.path.isdir(dist_dir):
        return {'valid': False, 'wheels': [], 'sdists': [],
                'errors': ['dist directory does not exist'], 'warnings': []}
    all_files = os.listdir(dist_dir)
    wheels = [f for f in all_files if f.endswith('.whl')]
    sdists = [f for f in all_files if f.endswith('.tar.gz')]
    if not wheels:
        errors.append('No wheel (.whl) found in dist/')
    if not sdists:
        warnings.append('No sdist (.tar.gz) found in dist/; consider including one')
    versions = set()
    for f in wheels + sdists:
        v = extract_version_from_dist(f)
        if v:
            versions.add(v)
    if len(versions) > 1:
        errors.append('Version mismatch across artifacts: ' + ', '.join(sorted(versions)))
    for f in wheels:
        if not re.match(r'^[\w._-]+-\d[\d.]+-[\w]+-[\w]+-[\w]+\.whl$', f):
            errors.append('Invalid wheel filename format: ' + f)
    return {
        'valid': len(errors) == 0,
        'wheels': sorted(wheels),
        'sdists': sorted(sdists),
        'errors': errors,
        'warnings': warnings,
    }

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'mypackage-1.0.0-py3-none-any.whl'), 'w').close()
    open(os.path.join(tmp, 'mypackage-1.0.0.tar.gz'), 'w').close()
    print(validate_dist_directory(tmp))
Solution
import os
import re

def validate_dist_directory(dist_dir):
    if not os.path.isdir(dist_dir):
        return {'valid': False, 'wheels': [], 'sdists': [],
                'errors': ['dist directory does not exist'], 'warnings': []}
    files = os.listdir(dist_dir)
    wheels = sorted(f for f in files if f.endswith('.whl'))
    sdists = sorted(f for f in files if f.endswith('.tar.gz'))
    errors, warnings = [], []
    if not wheels:
        errors.append('No wheel found')
    if not sdists:
        warnings.append('No sdist found')
    versions = set()
    for f in wheels + sdists:
        # Dotted release number only; avoids capturing the '.' of '.tar.gz'
        m = re.match(r'^[\w._-]+-(\d+(?:\.\d+)*)', f)
        if m:
            versions.add(m.group(1))
    if len(versions) > 1:
        errors.append('Version mismatch: ' + ', '.join(sorted(versions)))
    return {'valid': not errors, 'wheels': wheels, 'sdists': sdists,
            'errors': errors, 'warnings': warnings}
PyPI's "no reuploads" policy is one of the most important constraints in packaging. Once you upload mypackage-1.0.0-py3-none-any.whl, that exact filename is permanently reserved: you cannot upload a corrected file with the same name, even after deleting the release. The only option is a new version number (1.0.1). This is why pre-upload validation matters: run twine check dist/* to validate the package metadata before uploading. Always test on TestPyPI first. TestPyPI is completely separate from PyPI, so a mistake there does not affect your real project. It applies the same filename rule, so use a throwaway version such as 1.0.0.dev1 for trial uploads.
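A pre-upload check for filename collisions could look like the sketch below. It assumes a parsed release document with a 'urls' list whose entries carry a 'filename' key (the shape served by PyPI's JSON API); the function name `find_filename_conflicts` and the mock data are hypothetical. Such a listing only shows files that are currently published, so filenames reserved by deleted releases would not be detected.

```python
def find_filename_conflicts(local_files, release_json):
    """Return local filenames that the index already lists for this release."""
    published = {entry['filename'] for entry in release_json.get('urls', [])}
    return sorted(name for name in local_files if name in published)


# Mock of the relevant part of a release document (assumed shape).
release_json = {
    'urls': [
        {'filename': 'mypackage-1.0.0-py3-none-any.whl'},
    ]
}
local = ['mypackage-1.0.0-py3-none-any.whl', 'mypackage-1.0.0.tar.gz']
conflicts = find_filename_conflicts(local, release_json)
print(conflicts)
```

Any non-empty result means the upload would be rejected for those files and a new version number is needed.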
import os
def validate_dist_directory(dist_dir):
"""Validate the contents of a dist/ directory before upload.
Check:
1. At least one wheel (.whl) file exists
2. At least one sdist (.tar.gz) file exists
3. All files follow correct naming conventions
4. No duplicate packages (same name+version, different format)
5. Versions are consistent across wheel and sdist
Return a dict:
- 'valid': bool
- 'wheels': list of filenames
- 'sdists': list of filenames
- 'errors': list of str
- 'warnings': list of str
"""
# TODO: implement
pass
Expected Output
{'valid': True, 'wheels': ['mypackage-1.0.0-py3-none-any.whl'], 'sdists': ['mypackage-1.0.0.tar.gz'], 'errors': [], 'warnings': []}
Hints
Hint 1: Use os.listdir() to get files. Filter by .whl and .tar.gz extensions.
Hint 2: Extract version from each filename. Check that all versions match.
Implement a release readiness checker. Every team should have a documented release checklist — this formalizes it into code that can be run in CI before triggering a release workflow.
def validate_release_checklist(state):
required = [
'tests_passing',
'version_bumped',
'changelog_updated',
'git_tag_created',
'dist_built',
'twine_check_passed',
]
recommended = ['testpypi_verified']
blocking = [item for item in required if not state.get(item, False)]
recommended_missing = [item for item in recommended if not state.get(item, False)]
return {
'ready': len(blocking) == 0,
'blocking': blocking,
'recommended': recommended_missing,
}
full_state = {
'tests_passing': True, 'version_bumped': True,
'changelog_updated': True, 'git_tag_created': True,
'dist_built': True, 'twine_check_passed': True,
'testpypi_verified': False,
}
print("Full state:", validate_release_checklist(full_state))
partial_state = {
'tests_passing': False, 'version_bumped': True,
'changelog_updated': False, 'git_tag_created': False,
'dist_built': False, 'twine_check_passed': False,
}
print("Partial state:", validate_release_checklist(partial_state))
Solution
def validate_release_checklist(state):
required = ['tests_passing', 'version_bumped', 'changelog_updated',
'git_tag_created', 'dist_built', 'twine_check_passed']
recommended = ['testpypi_verified']
blocking = [i for i in required if not state.get(i, False)]
recommended_missing = [i for i in recommended if not state.get(i, False)]
return {'ready': not blocking, 'blocking': blocking, 'recommended': recommended_missing}
The TestPyPI step is the most commonly skipped and one of the most valuable. It verifies that your package actually installs from an index, not just from your local dist/ directory. A package can pass twine check locally but fail to install from PyPI due to missing data files (forgot to include them in MANIFEST.in or wheel), bad entry points, or import errors in __init__.py. The workflow: pip install --index-url https://test.pypi.org/simple/ mypackage and run a quick smoke test. Because this command looks only at TestPyPI, any dependencies that are not published there must already be installed. If that passes, twine upload dist/* to production PyPI.
def validate_release_checklist(state):
"""Validate a pre-release checklist state.
state: dict with bool values for:
- 'tests_passing': CI tests pass
- 'version_bumped': version in pyproject.toml updated
- 'changelog_updated': CHANGELOG.md has entry for this version
- 'git_tag_created': git tag vX.Y.Z exists
- 'dist_built': wheel and sdist in dist/
- 'twine_check_passed': twine check dist/* passed
- 'testpypi_verified': installed from TestPyPI successfully
Return:
- 'ready': bool (all required items True)
- 'blocking': list of unmet required items
- 'recommended': list of unmet recommended items
Required: first 6. testpypi_verified is recommended.
"""
# TODO: implement
pass
Expected Output
{'ready': False, 'blocking': ['tests_passing', ...], 'recommended': ['testpypi_verified']}
Hints
Hint 1: Separate required checks from recommended checks. All required must be True for ready=True.
Hint 2: Build the blocking list from required items where state.get(item) is False.
Extract structured metadata from PyPI's JSON API response. This is what pip, IDE package inspectors, and security scanners do when they need to display or audit package information.
def extract_pypi_metadata(api_response):
info = api_response.get('info', {})
urls = api_response.get('urls', [])
wheel_urls = [
u['url'] for u in urls
if u.get('packagetype') == 'bdist_wheel'
]
sdist_entry = next(
(u for u in urls if u.get('packagetype') == 'sdist'),
None
)
return {
'name': info.get('name', ''),
'version': info.get('version', ''),
'summary': info.get('summary', ''),
'requires_python': info.get('requires_python', ''),
'license': info.get('license', ''),
'author': info.get('author', ''),
'home_page': info.get('home_page', ''),
'wheel_urls': wheel_urls,
'sdist_url': sdist_entry['url'] if sdist_entry else None,
'total_files': len(urls),
}
mock_response = {
'info': {
'name': 'requests', 'version': '2.31.0',
'summary': 'Python HTTP for Humans.',
'requires_python': '>=3.7',
'license': 'Apache 2.0',
'author': 'Kenneth Reitz',
'home_page': 'https://requests.readthedocs.io',
},
'urls': [
{'packagetype': 'bdist_wheel', 'url': 'https://files.pypi.org/packages/requests-2.31.0-py3-none-any.whl'},
{'packagetype': 'sdist', 'url': 'https://files.pypi.org/packages/requests-2.31.0.tar.gz'},
]
}
result = extract_pypi_metadata(mock_response)
print(result['name'], result['version'])
print('wheels:', len(result['wheel_urls']))
Solution
def extract_pypi_metadata(api_response):
info = api_response.get('info', {})
urls = api_response.get('urls', [])
sdist = next((u for u in urls if u.get('packagetype') == 'sdist'), None)
return {
'name': info.get('name', ''), 'version': info.get('version', ''),
'summary': info.get('summary', ''), 'requires_python': info.get('requires_python', ''),
'license': info.get('license', ''), 'author': info.get('author', ''),
'home_page': info.get('home_page', ''),
'wheel_urls': [u['url'] for u in urls if u.get('packagetype') == 'bdist_wheel'],
'sdist_url': sdist['url'] if sdist else None,
'total_files': len(urls),
}
PyPI's JSON API is documented at https://warehouse.pypa.io/api-reference/json.html and is completely public without authentication. The /pypi/{package}/json endpoint returns the latest version. The /pypi/{package}/{version}/json endpoint returns a specific version. The urls list contains every downloadable artifact — for popular packages like numpy or torch, this can be 50+ entries covering every supported Python version, ABI, and platform combination. Tools like pip use the Simple API (/simple/{package}/) rather than the JSON API for performance — it returns just filenames and hashes, not full metadata.
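The two endpoint forms described above can be sketched with the standard library alone. The helper names `json_api_url` and `fetch_release` are assumptions for illustration, and `fetch_release` performs a real network request, so it is defined here but not called.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def json_api_url(package, version=None, base='https://pypi.org/pypi'):
    """Build the JSON API URL for the latest release or for a specific version."""
    path = quote(package, safe='')
    if version is not None:
        path += '/' + quote(version, safe='')
    return f'{base}/{path}/json'


def fetch_release(package, version=None, timeout=10):
    """Download and parse the JSON document (performs a network request)."""
    with urlopen(json_api_url(package, version), timeout=timeout) as response:
        return json.load(response)


print(json_api_url('requests'))
print(json_api_url('requests', '2.31.0'))
```

The dict returned by `fetch_release` has the same 'info' and 'urls' keys as the mock response used in this exercise, so it can be passed straight to `extract_pypi_metadata`.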
def extract_pypi_metadata(api_response):
"""Extract key information from a PyPI JSON API response.
The PyPI JSON API returns data at:
https://pypi.org/pypi/{package}/{version}/json
api_response: dict (parsed JSON)
Return a dict:
- 'name': str
- 'version': str
- 'summary': str
- 'requires_python': str
- 'license': str
- 'author': str
- 'home_page': str
- 'wheel_urls': list of wheel download URLs
- 'sdist_url': str or None
- 'total_files': int
"""
# TODO: implement
pass
Expected Output
{'name': 'requests', 'version': '2.31.0', ...}
Hints
Hint 1: Access api_response["info"] for metadata fields. api_response["urls"] for download URLs.
Hint 2: Filter urls by packagetype: "bdist_wheel" for wheels, "sdist" for source.
Generate pip.ini configuration (the file is named pip.conf on Linux and macOS) for a private package registry. Enterprise teams use Artifactory, AWS CodeArtifact, or GCP Artifact Registry to host internal packages, so understanding this configuration is essential for enterprise Python development.
from urllib.parse import urlparse
def configure_private_registry(config):
registry_url = config['registry_url']
parsed = urlparse(registry_url)
hostname = parsed.hostname
lines = ['[global]']
if config.get('auth_method') == 'token' and config.get('token'):
auth_url = registry_url.replace('://', '://__token__:' + config['token'] + '@')
lines.append('index-url = ' + auth_url)
elif config.get('auth_method') == 'basic' and config.get('username'):
auth_url = registry_url.replace('://', '://' + config['username'] + '@')
lines.append('index-url = ' + auth_url)
else:
lines.append('index-url = ' + registry_url)
if config.get('fallback_to_pypi'):
lines.append('extra-index-url = https://pypi.org/simple')
if config.get('trusted'):
lines.append('trusted-host = ' + hostname)
lines.extend([
'timeout = 60',
'retries = 3',
])
if config.get('auth_method') == 'oidc':
lines.append('# OIDC: set TWINE_USERNAME and TWINE_PASSWORD from CI secrets')
return '\n'.join(lines)
config = {
'registry_url': 'https://my.registry.example.com/simple',
'fallback_to_pypi': True,
'auth_method': 'token',
'token': 'secret-token',
'trusted': False,
}
print(configure_private_registry(config))
Solution
from urllib.parse import urlparse
def configure_private_registry(config):
url = config['registry_url']
hostname = urlparse(url).hostname
lines = ['[global]']
if config.get('auth_method') == 'token' and config.get('token'):
auth_url = url.replace('://', '://__token__:' + config['token'] + '@')
lines.append('index-url = ' + auth_url)
elif config.get('auth_method') == 'basic' and config.get('username'):
auth_url = url.replace('://', '://' + config['username'] + '@')
lines.append('index-url = ' + auth_url)
else:
lines.append('index-url = ' + url)
if config.get('fallback_to_pypi'):
lines.append('extra-index-url = https://pypi.org/simple')
if config.get('trusted'):
lines.append('trusted-host = ' + hostname)
lines.extend(['timeout = 60', 'retries = 3'])
return '\n'.join(lines)
The extra-index-url setting with a private registry and PyPI is a security risk. Pip does not treat extra-index-url as a fallback: it collects candidates from every configured index and installs the best match, normally the highest version. An attacker who registers a private package's name on PyPI with a higher version number can therefore intercept private package installs (dependency confusion attack). The safe configuration is to use the private registry as the sole source and mirror all required public packages through it. AWS CodeArtifact, Artifactory, and Nexus all support mirroring PyPI, so your private registry becomes a complete proxy that includes both your packages and public dependencies.
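The single-source configuration described above can be sketched as follows. This is an illustration under stated assumptions: the function name `render_single_index_config` and the registry URL are hypothetical, and credentials are deliberately left out of the file.

```python
import configparser
import io


def render_single_index_config(index_url, timeout=60):
    """Render a [global] section that uses one index and no extra-index-url."""
    # interpolation=None keeps '%' characters in URLs from being interpreted
    parser = configparser.ConfigParser(interpolation=None)
    parser['global'] = {'index-url': index_url, 'timeout': str(timeout)}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical registry URL that proxies public packages as well.
text = render_single_index_config('https://registry.example.com/simple')
print(text)
```

Because every package, internal or public, is resolved through the one index, a same-named project on PyPI is never considered as a candidate.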
def configure_private_registry(config):
"""Generate pip.ini content for a private registry setup.
config: dict with:
- 'registry_url': str (the private registry URL)
- 'fallback_to_pypi': bool
- 'auth_method': 'token', 'basic', or 'oidc'
- 'token': str or None
- 'username': str or None
- 'trusted': bool
Return a string with pip.ini [global] section content.
For OIDC auth, note that credentials come from environment.
"""
# TODO: implement
pass
Expected Output
[global]
index-url = https://registry.example.com/simple
...
Hints
Hint 1: The [global] section sets index-url (the primary index) and extra-index-url (an additional index that pip also searches).
Hint 2: For trusted, add trusted-host with the hostname extracted from the URL.
Hard
Implement a release pipeline orchestrator. This models the kind of sequential stage execution used by GitHub Actions, GitLab CI, and tools like semantic-release.
class ReleasePipeline:
def __init__(self):
self.stages = []
self.results = []
self.failed_at = None
def add_stage(self, name, func):
self.stages.append((name, func))
return self
def run(self, context):
self.results = []
self.failed_at = None
for name, func in self.stages:
try:
success, message = func(context)
except Exception as e:
success = False
message = 'Exception: ' + str(e)
self.results.append({
'stage': name,
'success': success,
'message': message,
})
if not success:
self.failed_at = name
break
return self.get_report()
def get_report(self):
stages_run = len(self.results)
success_count = sum(1 for r in self.results if r['success'])
all_ok = self.failed_at is None and stages_run == len(self.stages)
return {
'status': 'success' if all_ok else 'failed',
'stages_run': stages_run,
'stages_total': len(self.stages),
'failed_at': self.failed_at,
'results': self.results,
'success_count': success_count,
}
# Simulate pipeline stages
def check_version(ctx):
if ctx.get('version_exists_on_pypi'):
return False, 'Version already exists on PyPI'
return True, 'Version is new'
def run_tests(ctx):
if ctx.get('test_failures', 0) > 0:
return False, str(ctx['test_failures']) + ' tests failed'
return True, 'All tests passed'
def build_package(ctx):
ctx['artifacts'] = ['dist/pkg-1.0.0-py3-none-any.whl', 'dist/pkg-1.0.0.tar.gz']
return True, 'Built ' + str(len(ctx['artifacts'])) + ' artifacts'
def upload_to_pypi(ctx):
return True, 'Uploaded ' + str(len(ctx.get('artifacts', []))) + ' files'
pipeline = ReleasePipeline()
pipeline.add_stage('check_version', check_version)
pipeline.add_stage('run_tests', run_tests)
pipeline.add_stage('build', build_package)
pipeline.add_stage('upload', upload_to_pypi)
report = pipeline.run({'version_exists_on_pypi': False, 'test_failures': 0})
print('Status:', report['status'])
print('Stages run:', report['stages_run'])
print('Failed at:', report['failed_at'])
failing_report = pipeline.run({'version_exists_on_pypi': False, 'test_failures': 2})
print('\nFailing pipeline:')
print('Status:', failing_report['status'])
print('Failed at:', failing_report['failed_at'])
Solution
class ReleasePipeline:
def __init__(self):
self.stages = []
self.results = []
self.failed_at = None
def add_stage(self, name, func):
self.stages.append((name, func))
return self
def run(self, context):
self.results = []
self.failed_at = None
for name, func in self.stages:
try:
success, message = func(context)
except Exception as e:
success, message = False, 'Exception: ' + str(e)
self.results.append({'stage': name, 'success': success, 'message': message})
if not success:
self.failed_at = name
break
return self.get_report()
def get_report(self):
all_ok = self.failed_at is None and len(self.results) == len(self.stages)
return {
'status': 'success' if all_ok else 'failed',
'stages_run': len(self.results), 'stages_total': len(self.stages),
'failed_at': self.failed_at, 'results': self.results,
'success_count': sum(1 for r in self.results if r['success']),
}
This pipeline pattern is how GitHub Actions, GitLab CI, and tools like python-semantic-release work. Each stage has a clear input (context), output (success + message), and failure behavior (stop). The context dict acts as a shared state bag that stages can both read and write — build_package writes context["artifacts"], and upload reads it. In real CI, this becomes environment variables, artifact paths in the workspace, and step outputs. The key design principle: fail fast — stop at the first failure rather than running all stages and accumulating errors, because later stages depend on earlier ones succeeding.
class ReleasePipeline:
"""Simulate a CI/CD release pipeline for a Python package.
Pipeline stages:
1. validate_version: check version not already on PyPI
2. run_tests: run test suite
3. build: create wheel and sdist
4. check: run twine check on artifacts
5. upload_testpypi: upload to TestPyPI
6. smoke_test: install from TestPyPI and verify import
7. upload_pypi: upload to production PyPI
8. tag_release: create git tag
Each stage can fail. Failed stages should stop the pipeline.
"""
def __init__(self):
self.stages = []
self.results = []
self.failed_at = None
def add_stage(self, name, func):
pass
def run(self, context):
pass
def get_report(self):
pass
Expected Output
{'status': 'success', 'stages_run': 8, 'failed_at': None}
Hints
Hint 1: Each stage function takes context dict and returns (success: bool, message: str).
Hint 2: Stop processing stages after the first failure. Record which stage failed.
Implement a package name security scanner. Typosquatting attacks (registering requsets when requests is popular) have affected real developers. Package indexes and third-party security scanners look for similar signals.
def levenshtein(s1, s2):
if len(s1) < len(s2):
return levenshtein(s2, s1)
if not s2:
return len(s1)
prev = list(range(len(s2) + 1))
for c1 in s1:
curr = [prev[0] + 1]
for j, c2 in enumerate(s2):
curr.append(min(prev[j + 1] + 1, curr[j] + 1,
prev[j] + (0 if c1 == c2 else 1)))
prev = curr
return prev[-1]
def scan_package_name(package_name, known_popular):
pkg = package_name.lower().replace('-', '_').replace('.', '_')
suspicious = False
similar_to = []
patterns_found = []
risk_score = 0
for known in known_popular:
k = known.lower().replace('-', '_').replace('.', '_')
dist = levenshtein(pkg, k)
if pkg == k:
continue # exact match, not a typosquat
if dist == 1:
similar_to.append(known)
patterns_found.append('levenshtein_distance_1_from_' + known)
risk_score = max(risk_score, 8)
suspicious = True
elif dist == 2:
similar_to.append(known)
patterns_found.append('levenshtein_distance_2_from_' + known)
risk_score = max(risk_score, 5)
suspicious = True
# Check prefix/suffix patterns
for prefix in ('py-', 'python-'):
if pkg == prefix.replace('-', '_') + k:
similar_to.append(known)
patterns_found.append('suspicious_prefix_' + prefix)
risk_score = max(risk_score, 6)
suspicious = True
for suffix in ('-python', '-py'):
if pkg == k + suffix.replace('-', '_'):
similar_to.append(known)
patterns_found.append('suspicious_suffix_' + suffix)
risk_score = max(risk_score, 6)
suspicious = True
return {
'suspicious': suspicious,
'similar_to': list(set(similar_to)),
'risk_score': risk_score,
'patterns_found': list(set(patterns_found)),
}
popular = ['requests', 'flask', 'numpy', 'pandas', 'django']
print("requsets:", scan_package_name('requsets', popular)['suspicious'],
scan_package_name('requsets', popular)['similar_to'])
print("flask:", scan_package_name('flask', popular)['suspicious'])
print("py-requests:", scan_package_name('py-requests', popular)['patterns_found'])
Solution
def levenshtein(s1, s2):
    # Keep s1 as the longer string so each DP row has len(s2) + 1 entries.
    if len(s1) < len(s2):
        return levenshtein(s2, s1)
    if not s2:
        return len(s1)
    prev = list(range(len(s2) + 1))
    for c1 in s1:
        curr = [prev[0] + 1]
        for j, c2 in enumerate(s2):
            # deletion, insertion, substitution (free when the characters match)
            curr.append(min(prev[j + 1] + 1, curr[j] + 1, prev[j] + (0 if c1 == c2 else 1)))
        prev = curr
    return prev[-1]
def scan_package_name(package_name, known_popular):
    pkg = package_name.lower().replace('-', '_').replace('.', '_')
    suspicious, similar, patterns, score = False, [], [], 0
    for known in known_popular:
        k = known.lower().replace('-', '_').replace('.', '_')
        if pkg == k:
            continue  # exact match, so the distance below is never 0
        d = levenshtein(pkg, k)
        if d == 1:
            similar.append(known)
            patterns.append('lev_1_from_' + known)
            score = max(score, 8)
            suspicious = True
        elif d == 2:
            similar.append(known)
            patterns.append('lev_2_from_' + known)
            score = max(score, 5)
            suspicious = True
        # Affixes ending in '_' are prefixes; the others are suffixes.
        for pat, s in [('py_', 6), ('python_', 6), ('_python', 6), ('_py', 6)]:
            variant = pat + k if pat.endswith('_') else k + pat
            if pkg == variant:
                similar.append(known)
                patterns.append('prefix_suffix_pattern')
                score = max(score, s)
                suspicious = True
    return {'suspicious': suspicious, 'similar_to': list(set(similar)),
            'risk_score': score, 'patterns_found': list(set(patterns))}
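Typosquatting has cost real engineers real money. The 2018 event-stream incident on npm, a maintainer takeover rather than a typosquat, showed how a popular package can be compromised, and typosquatted uploads have repeatedly been found on and removed from PyPI. Registry-side malware detection can combine edit distance, pattern matching, and behavioral analysis to catch typosquats. Note that pip-audit and bandit do not check package names: pip-audit reports known vulnerabilities in installed dependencies, and bandit is a static analyzer for Python source. Best practices: always verify the exact package name before pip install, check the PyPI page for the project description, maintainers, and release history, and use lockfiles with pinned hashes (for example pip install --require-hashes -r requirements.txt) so that CI installs exactly the artifacts you reviewed rather than re-resolving package names.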
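Plain Levenshtein distance counts a swapped pair of adjacent letters as two edits, so 'requsets' is at distance 2 from 'requests' even though it is a single slip of the fingers. A minimal sketch of the optimal string alignment variant, which counts an adjacent transposition as one edit, is shown below. The name osa_distance is a hypothetical helper and is not part of the exercise's reference solution.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # swap of two adjacent characters counts as one edit
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

print(osa_distance('requsets', 'requests'))  # 1 (one transposition)
print(osa_distance('reqests', 'requests'))   # 1 (one missing character)
```

Swapping this in for the plain distance would raise transposition typos to the highest risk tier, at the cost of a full DP table instead of two rows.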
def scan_package_name(package_name, known_popular):
    """Scan a package name for potential typosquatting.
    known_popular: list of well-known package names
    Check:
    1. Edit distance from known packages (Levenshtein distance <= 2)
    2. Common typo patterns: doubled letters, transpositions, missing chars
    3. Suspicious name patterns: adding 'py-' prefix, '-python' suffix
    Return a dict:
    - 'suspicious': bool
    - 'similar_to': list of similar package names found
    - 'risk_score': int 0-10
    - 'patterns_found': list of str
    """
    # TODO: implement
    pass
Expected Output
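For scan_package_name('requsets', popular), where the swapped letters put the name at Levenshtein distance 2 from 'requests':
{'suspicious': True, 'similar_to': ['requests'], 'risk_score': 5, 'patterns_found': ['levenshtein_distance_2_from_requests']}
Hints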
Hint 1: Implement Levenshtein distance using dynamic programming. Distance 1-2 from a popular package is suspicious.
Hint 2: Check for prefix/suffix patterns: the name is suspicious if it equals a known package with "py-" or "python-" prepended, or with "-python" or "-py" appended.
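Both hints assume names are compared in a normalized form. A minimal sketch of that step, using the normalization rule from PEP 503 (runs of '-', '_' and '.' collapse to a single '-', and the result is lowercased), might look like the following. The helper names normalize_name and affix_variants are hypothetical and are not taken from the exercise.

```python
import re

def normalize_name(name):
    # PEP 503: collapse runs of '-', '_' and '.' into one '-', then lowercase.
    return re.sub(r'[-_.]+', '-', name).lower()

def affix_variants(known):
    # Names formed by adding a "Python-ish" prefix or suffix to a known package.
    base = normalize_name(known)
    return {'py-' + base, 'python-' + base, base + '-python', base + '-py'}

print(normalize_name('Py__Requests'))                               # py-requests
print(normalize_name('py_requests') in affix_variants('requests'))  # True
print(normalize_name('requests') in affix_variants('requests'))     # False
```

Collapsing separator runs matters because 'py--requests' and 'py_requests' refer to the same project name on the index, so a scanner that only swaps single characters can miss them.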
Plan the multi-platform wheel build matrix for a package with C extensions. This is what cibuildwheel does — it generates a build matrix that covers every (Python version, OS, architecture) combination.
def plan_wheel_builds(package_config):
    name = package_config['name']
    has_c = package_config.get('has_c_extension', False)
    platforms = package_config.get('platforms', ['linux', 'macos', 'windows'])
    if not has_c:
        # Pure Python: a single universal wheel covers every target.
        return {
            'pure_python': True,
            'build_matrix': [{'python': 'py3', 'platform': 'any',
                              'arch': 'any', 'image': None, 'wheel_tag': 'py3-none-any'}],
            'total_builds': 1,
        }
    min_py = package_config.get('min_python', '3.8')
    max_py = package_config.get('max_python', '3.12')
    py_versions = []
    min_parts = [int(x) for x in min_py.split('.')]
    max_parts = [int(x) for x in max_py.split('.')]
    for minor in range(min_parts[1], max_parts[1] + 1):
        py_versions.append('3.' + str(minor))
    # manylinux_2_17 is the PEP 600 name for manylinux2014.
    platform_configs = {
        'linux': [
            ('x86_64', 'quay.io/pypa/manylinux_2_17_x86_64'),
            ('aarch64', 'quay.io/pypa/manylinux_2_17_aarch64'),
        ],
        'macos': [
            ('x86_64', None),
            ('arm64', None),
        ],
        'windows': [
            ('AMD64', None),
        ],
    }
    build_matrix = []
    for py_ver in py_versions:
        cp_tag = 'cp' + py_ver.replace('.', '')
        for platform in platforms:
            for arch, image in platform_configs.get(platform, []):
                wheel_tag = cp_tag + '-' + cp_tag + '-'
                if platform == 'linux':
                    wheel_tag += 'manylinux_2_17_' + arch.lower()
                elif platform == 'macos':
                    # arm64 wheels need macOS 11.0 or later; x86_64 can target 10.9.
                    mac_min = '11_0' if arch == 'arm64' else '10_9'
                    wheel_tag += 'macosx_' + mac_min + '_' + arch.lower()
                else:
                    wheel_tag += 'win_' + arch.lower()
                build_matrix.append({
                    'python': py_ver,
                    'platform': platform,
                    'arch': arch,
                    'image': image,
                    'wheel_tag': wheel_tag,
                })
    return {
        'pure_python': False,
        'build_matrix': build_matrix,
        'total_builds': len(build_matrix),
    }
config = {
    'name': 'myextension',
    'has_c_extension': True,
    'min_python': '3.9',
    'max_python': '3.12',
    'platforms': ['linux', 'macos', 'windows'],
}
result = plan_wheel_builds(config)
print("Total builds:", result['total_builds'])
print("First entry:", result['build_matrix'][0])
print("Pure python:", result['pure_python'])
Solution
def plan_wheel_builds(package_config):
    has_c = package_config.get('has_c_extension', False)
    platforms = package_config.get('platforms', ['linux', 'macos', 'windows'])
    if not has_c:
        return {'pure_python': True,
                'build_matrix': [{'python': 'py3', 'platform': 'any', 'arch': 'any',
                                  'image': None, 'wheel_tag': 'py3-none-any'}],
                'total_builds': 1}
    min_minor = int(package_config.get('min_python', '3.8').split('.')[1])
    max_minor = int(package_config.get('max_python', '3.12').split('.')[1])
    py_versions = ['3.' + str(m) for m in range(min_minor, max_minor + 1)]
    plat_conf = {
        'linux': [('x86_64', 'quay.io/pypa/manylinux_2_17_x86_64'),
                  ('aarch64', 'quay.io/pypa/manylinux_2_17_aarch64')],
        'macos': [('x86_64', None), ('arm64', None)],
        'windows': [('AMD64', None)],
    }
    matrix = []
    for py in py_versions:
        cp = 'cp' + py.replace('.', '')
        for plat in platforms:
            for arch, img in plat_conf.get(plat, []):
                tag = cp + '-' + cp + '-'
                if plat == 'linux':
                    tag += 'manylinux_2_17_' + arch.lower()
                elif plat == 'macos':
                    # arm64 wheels need macOS 11.0 or later; x86_64 can target 10.9.
                    tag += 'macosx_' + ('11_0' if arch == 'arm64' else '10_9') + '_' + arch.lower()
                else:
                    tag += 'win_' + arch.lower()
                matrix.append({'python': py, 'platform': plat, 'arch': arch,
                               'image': img, 'wheel_tag': tag})
    return {'pure_python': False, 'build_matrix': matrix, 'total_builds': len(matrix)}
cibuildwheel is a widely used tool for building cross-platform C extension wheels. It runs inside CI (GitHub Actions, GitLab CI) and produces wheels for every supported (Python, OS, arch) combination in a single workflow. Python 3.8 through 3.12 on Linux (x86_64 and aarch64), macOS (x86_64 and arm64), and Windows (x86_64) gives 5 Python versions x 5 platform/architecture targets = 25 builds. This is why popular packages like numpy and Pillow publish dozens of wheel files per release: one for every supported combination. The manylinux Docker images provide an old, fixed glibc baseline, so wheels built in them run on any glibc-based Linux distribution at or above that glibc version (2.17 for manylinux_2_17); musl-based distributions such as Alpine need musllinux wheels instead.
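The build count above is just the number of Python versions multiplied by the number of (OS, architecture) targets. A short sketch to check that arithmetic follows; the ARCHS table mirrors the exercise's platform table, and count_builds is a hypothetical helper rather than anything cibuildwheel provides.

```python
from itertools import product

# Architectures per OS, mirroring the exercise's platform table.
ARCHS = {
    'linux': ['x86_64', 'aarch64'],
    'macos': ['x86_64', 'arm64'],
    'windows': ['AMD64'],
}

def count_builds(min_minor, max_minor, platforms=('linux', 'macos', 'windows')):
    """Count (Python 3.x, OS, arch) combinations for minors in [min_minor, max_minor]."""
    versions = range(min_minor, max_minor + 1)
    targets = [(p, a) for p in platforms for a in ARCHS.get(p, [])]
    return len(list(product(versions, targets)))

print(count_builds(8, 12))              # 25: 5 versions x 5 targets
print(count_builds(9, 12))              # 20: 4 versions x 5 targets
print(count_builds(9, 12, ['linux']))   # 8: 4 versions x 2 targets
```

Dropping one Python version removes a whole row of five builds, which is why narrowing the supported range is the cheapest way to shrink CI time.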
def plan_wheel_builds(package_config):
    """Plan the matrix of wheel builds for a package with C extensions.
    package_config: dict with:
    - 'name': str
    - 'has_c_extension': bool
    - 'min_python': str (e.g. '3.8')
    - 'max_python': str (e.g. '3.12')
    - 'platforms': list of 'linux', 'macos', 'windows'
    For pure Python: only one wheel needed (py3-none-any).
    For C extensions: generate build matrix entries.
    Each entry: {'python': str, 'platform': str, 'arch': str,
                 'image': str, 'wheel_tag': str}
    Return: {'pure_python': bool, 'build_matrix': list, 'total_builds': int}
    """
    # TODO: implement
    pass
Expected Output
{'pure_python': False, 'build_matrix': [...], 'total_builds': 20}
Hints
Hint 1: For pure Python, return a single entry. For C extensions, generate entries for each (python_version x platform x arch) combination.
Hint 2: Linux uses manylinux images. macOS supports x86_64 and arm64. Windows typically x86_64 only.
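Each matrix entry also corresponds to a cibuildwheel build identifier of the form cp311-manylinux_x86_64, which is what its build selector option matches against. The sketch below converts entries into such identifiers, assuming the entry layout used in this exercise; cibw_identifier is a hypothetical helper, and the identifier carries no glibc or macOS version, unlike the final wheel tag.

```python
def cibw_identifier(entry):
    """Return an identifier such as 'cp311-manylinux_x86_64' for one matrix entry."""
    cp = 'cp' + entry['python'].replace('.', '')
    # Platform prefixes as they appear in cibuildwheel build identifiers.
    prefix = {'linux': 'manylinux', 'macos': 'macosx', 'windows': 'win'}[entry['platform']]
    return cp + '-' + prefix + '_' + entry['arch'].lower()

matrix = [
    {'python': '3.11', 'platform': 'linux', 'arch': 'x86_64'},
    {'python': '3.11', 'platform': 'macos', 'arch': 'arm64'},
    {'python': '3.12', 'platform': 'windows', 'arch': 'AMD64'},
]
# A space-separated list is the shape a build selector expects.
cibw_build = ' '.join(cibw_identifier(e) for e in matrix)
print(cibw_build)
# cp311-manylinux_x86_64 cp311-macosx_arm64 cp312-win_amd64
```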
