Project Descriptions - Make Every Bullet Count

How to describe ML projects so recruiters understand impact and interviewers want to dig deeper.

Reading time: ~20 min | Interview relevance: High | Roles: All AI/ML roles

Two Descriptions, One Project

Candidate A and Candidate B worked on the exact same project at the same company. Same team, same codebase, same launch date. Here is how each described it on their resume:

Candidate A:

Built a recommendation system using collaborative filtering and deep learning. Used Python, TensorFlow, and AWS. Worked with a team of 5 engineers.

Candidate B:

Designed and deployed a hybrid recommendation engine combining collaborative filtering with a two-tower neural retrieval model, serving 12M daily active users. Reduced cold-start recommendation latency from 850ms to 120ms by implementing an approximate nearest-neighbor index (FAISS), increasing new-user engagement by 23% in A/B testing (p<0.01, 14-day test window).

Candidate A got no callbacks. Candidate B got phone screens at three FAANG companies in the same week.

The difference is not exaggeration. Candidate B's description is simply more informative. It tells the recruiter "this person built something real at scale" and tells the interviewing engineer "this person understands systems, measurement, and tradeoffs." Candidate A's description could have been written by someone who followed a tutorial.

This chapter teaches you to write descriptions like Candidate B for every project on your resume, portfolio, and LinkedIn.

Why Project Descriptions Matter More Than Job Titles

In most engineering fields, your job title and company name do most of the signaling. A "Senior Software Engineer at Google" conveys a rough skill level regardless of what that person actually built.

AI/ML is different. The field is so broad and moves so fast that titles are nearly meaningless:

  • A "Machine Learning Engineer" might be training foundation models or might be writing SQL queries to populate dashboards
  • A "Data Scientist" might be publishing at NeurIPS or might be making bar charts in Jupyter notebooks
  • An "AI Engineer" might be building RAG systems from scratch or might be calling the OpenAI API with default parameters

Your project descriptions are the only place where your actual skill level becomes visible. A strong description of a side project can outweigh a weak description of a job at a prestigious company. Hiring managers know this. They scan project descriptions first, job titles second.

The 7-Second Rule

Recruiters spend an average of 7.4 seconds on a resume. In that time, they will read at most 2-3 bullet points. If your top project description does not immediately signal "this person is technical and delivers results," the resume goes into the rejection pile. Front-load your strongest project with the most specific, quantified description you can write.

The PSTAR Framework

Every strong project description follows a structure, even if the structure is invisible. Use PSTAR to ensure you cover every dimension a reader cares about:

| Letter | Component | What It Answers | Example |
| --- | --- | --- | --- |
| P | Problem | What business or research problem did you solve? | "Cold-start users saw irrelevant recommendations, causing 34% drop-off in first session" |
| S | Solution | What was your technical approach? | "Hybrid retrieval model combining collaborative filtering with content-based embeddings" |
| T | Technology | What tools, frameworks, and infrastructure did you use? | "PyTorch, FAISS, Redis, Kubernetes, served via gRPC endpoint" |
| A | Architecture | What were the key design decisions and tradeoffs? | "Two-tower model with shared embedding layer; ANN index updated hourly via batch pipeline" |
| R | Results | What was the measurable outcome? | "23% increase in new-user engagement; inference latency reduced from 850ms to 120ms" |

You do not need to include every PSTAR element in every bullet point. A resume bullet might be P+S+R (problem, solution, result). A portfolio README might include all five. The framework ensures you never forget a critical dimension.

PSTAR Applied to a Resume Bullet

Full PSTAR breakdown (for your notes):

  • P: E-commerce search returned irrelevant results for long-tail queries, costing ~$2M/year in lost conversions
  • S: Fine-tuned a cross-encoder reranker on domain-specific query-document pairs
  • T: BERT-base, Elasticsearch, Python, deployed on AWS SageMaker
  • A: Two-stage retrieval: BM25 recall (top 100) then cross-encoder rerank (top 10)
  • R: 18% improvement in NDCG@10, 12% increase in conversion rate on long-tail queries

Compressed into a resume bullet:

Fine-tuned a BERT cross-encoder reranker for e-commerce search, improving NDCG@10 by 18% and conversion rate by 12% on long-tail queries ($2M+ annual revenue impact). Deployed two-stage retrieval pipeline (BM25 recall + neural rerank) serving 50K queries/day on SageMaker.

Notice how a single bullet carries problem context, approach, tools, architecture insight, and results. That density is what separates strong from weak descriptions.
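One way to keep PSTAR notes honest is to store them as a small record and check which components a draft covers. The sketch below is only an illustration: the `PSTAR` class and its `missing()` helper are hypothetical names, not part of any library, and the field values are shortened from the notes above.

```python
from dataclasses import dataclass, fields


@dataclass
class PSTAR:
    """One project's notes; an empty string means 'not written yet'."""
    problem: str = ""
    solution: str = ""
    technology: str = ""
    architecture: str = ""
    results: str = ""

    def missing(self):
        """Return the names of components that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


full_notes = PSTAR(
    problem="Irrelevant results for long-tail queries, ~$2M/year in lost conversions",
    solution="Fine-tuned a cross-encoder reranker on query-document pairs",
    technology="BERT-base, Elasticsearch, Python, AWS SageMaker",
    architecture="BM25 recall (top 100), then cross-encoder rerank (top 10)",
    results="18% NDCG@10 improvement, 12% conversion lift on long-tail queries",
)

# A P+S+R resume bullet deliberately leaves two components out.
short_bullet = PSTAR(
    problem=full_notes.problem,
    solution=full_notes.solution,
    results=full_notes.results,
)

print(full_notes.missing())    # []
print(short_bullet.missing())  # ['technology', 'architecture']
```

An empty result means the full notes are ready for a portfolio README; a non-empty one is fine for a resume bullet as long as the omission is deliberate.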

Writing for Two Audiences

Your project description will be read by at least two very different people:

Audience 1: The Recruiter (Scanning for Impact)

Recruiters are not ML engineers. They are looking for:

  • Numbers - any metric, percentage, dollar amount, or scale indicator
  • Recognizable technologies - PyTorch, TensorFlow, AWS, Kubernetes, LLMs
  • Business language - revenue, users, latency, cost reduction, engagement
  • Scope indicators - "led," "designed," "owned," "scaled to X users"

The recruiter's question: "Is this person senior enough for the role and did they have real impact?"

Audience 2: The Interviewing Engineer (Scanning for Depth)

The technical interviewer is looking for:

  • Specific model architectures - not "deep learning" but "two-tower retrieval model with in-batch negatives"
  • Tradeoff awareness - why this approach and not the obvious alternative?
  • Systems thinking - how was this trained, served, monitored, and updated?
  • Evaluation rigor - proper metrics, statistical significance, offline vs online evaluation

The engineer's question: "Does this person actually understand what they built, or did they just follow instructions?"

The Depth Trap

Do not sacrifice recruiter readability for technical depth. A bullet like "Implemented RLHF with PPO using a reward model trained on 50K human preference pairs" is perfect - the recruiter sees "RLHF" (a hot keyword) and "50K" (scale), while the engineer sees specific methodology. A bullet like "Implemented proximal policy optimization with clipped surrogate objective and generalized advantage estimation for language model alignment" loses the recruiter entirely. Save deep technical details for your portfolio README or interview conversation.

The formula: Lead with impact (for the recruiter), follow with specifics (for the engineer).

Before/After Examples for Common ML Projects

The following examples show how to transform a weak project description into a strong one. Each "before" represents a pattern seen on hundreds of real resumes.

1. Recommendation System

Before:

Built a recommendation system for an e-commerce platform using collaborative filtering and neural networks.

After:

Designed a hybrid recommendation engine combining matrix factorization with a two-tower neural retrieval model, serving personalized product recommendations to 8M monthly active users. Reduced cold-start bounce rate by 31% using content-based fallback embeddings. System processes 200K events/min via Kafka, with model retraining triggered daily on 90-day sliding windows.

What changed: Added scale (8M users, 200K events/min), specific architecture (two-tower, matrix factorization), a named technical challenge solved (cold-start), and infrastructure context (Kafka, daily retraining).

2. NLP Pipeline

Before:

Developed an NLP pipeline for text classification using BERT and Python.

After:

Built an end-to-end document classification pipeline processing 50K insurance claims/day across 47 categories (F1=0.91, up from 0.72 with the previous regex-based system). Fine-tuned DeBERTa-v3 on 120K labeled examples selected via active learning, reducing annotation cost by 60% vs. random sampling. Deployed as a FastAPI service on EKS with sub-200ms p99 latency.

What changed: Added domain context (insurance claims), concrete scale (50K/day, 47 categories), comparison to baseline (0.72 to 0.91), training details (active learning, 120K examples), and deployment specifics (FastAPI, EKS, p99 latency).

3. Computer Vision

Before:

Worked on a computer vision project for defect detection in manufacturing using CNNs.

After:

Developed a real-time visual inspection system detecting 12 defect types on a high-speed production line (200 parts/min). Trained a YOLOv8 model on 45K annotated images (15K synthetic via domain-randomized rendering), achieving 97.3% recall at 99.1% precision - reducing escaped defects by 84% vs. manual inspection. Optimized inference to 8ms/frame on NVIDIA Jetson AGX via TensorRT INT8 quantization.

What changed: Added operational context (200 parts/min production line), specific model (YOLOv8), data strategy (synthetic data), precision/recall tradeoff, business impact (84% defect reduction), and edge deployment details (Jetson, TensorRT, INT8).

4. RAG System

Before:

Built a RAG system using LangChain and OpenAI for internal document search.

After:

Architected a retrieval-augmented generation system over 2.3M internal documents (policy manuals, SOPs, engineering specs), reducing average employee search time from 12 minutes to 45 seconds. Implemented hybrid retrieval (BM25 + dense embeddings via e5-large-v2) with reciprocal rank fusion, achieving 89% answer accuracy on a 500-question eval set curated with domain SMEs. Built citation verification pipeline that grounds every generated answer to source paragraphs, reducing hallucination rate from 23% to 4%.

What changed: Added corpus scale (2.3M documents), document types, user impact (12 min to 45 sec), specific retrieval strategy (hybrid + RRF), evaluation methodology (500-question set, SME-curated), and hallucination mitigation (citation verification with measured rate).

5. MLOps Pipeline

Before:

Set up MLOps infrastructure for the ML team using Kubeflow and MLflow.

After:

Designed and deployed the end-to-end MLOps platform serving 14 ML models across 3 product teams, reducing model deployment time from 2 weeks to 4 hours. Built CI/CD pipelines (GitHub Actions + Kubeflow Pipelines) with automated data validation (Great Expectations), model training, evaluation gating (auto-reject if metrics regress >2%), and canary deployment on Kubernetes. Implemented model monitoring (feature drift via PSI, prediction drift via KS test) with PagerDuty alerting, catching 3 data pipeline failures before they impacted users.

What changed: Added scope (14 models, 3 teams), time savings (2 weeks to 4 hours), specific tooling chain, quality gates, monitoring methodology (PSI, KS test), and a concrete outcome (caught 3 failures).

6. Data Pipeline

Before:

Built data pipelines for ML feature engineering using Spark and Airflow.

After:

Engineered a feature platform processing 2TB/day of clickstream and transaction data, generating 340+ features for 6 downstream ML models. Migrated batch Spark jobs (4-hour SLA) to a streaming architecture (Flink + Kafka), reducing feature freshness from 24 hours to 15 minutes. Implemented point-in-time-correct feature joins, eliminating a data leakage bug that had inflated offline metrics by ~8% across all models.

What changed: Added data volume (2TB/day), feature count (340+), downstream impact (6 models), a meaningful migration story (batch to streaming), and a subtle technical win (point-in-time correctness fixing data leakage).

7. Research Project

Before:

Conducted research on efficient transformers and published a paper at a top venue.

After:

Proposed a sparse attention mechanism (BlockSparse Attention) that reduces transformer self-attention complexity from O(n^2) to O(n*sqrt(n)) while retaining 98.5% of dense attention quality on 6 NLP benchmarks. Published at ACL 2025 (oral presentation, top 4% of submissions). Method adopted by 2 industry teams for long-document summarization, enabling 32K-token context windows on single A100 GPUs.

What changed: Added the specific contribution (named method), complexity improvement, benchmark results, venue prestige indicator (oral, acceptance rate), and real-world adoption.

8. Fine-Tuning Project

Before:

Fine-tuned a large language model for customer support automation.

After:

Fine-tuned Llama-3-8B on 25K curated support conversations using QLoRA (rank=64, 4-bit NormalFloat), reducing GPU cost from $12K (full fine-tune on 8xA100) to $180 (single A100, 6 hours). Achieved 91% resolution rate on Tier-1 tickets (up from 67% with GPT-4 + prompt engineering alone), handling 4K tickets/day and saving $1.2M/year in support staffing. Implemented RLHF alignment stage using 3K preference pairs from senior agents to reduce policy-violating responses from 8% to 0.3%.

What changed: Added model specifics (Llama-3-8B, QLoRA parameters), cost comparison (full vs. efficient fine-tune), baseline comparison (GPT-4 + prompting), business metrics (resolution rate, cost savings), and safety alignment (RLHF with measured violation reduction).

9. Fraud Detection

Before:

Built a fraud detection model using XGBoost and deep learning for a fintech company.

After:

Built a real-time fraud detection system processing 15K transactions/second, catching $47M in fraudulent transactions annually (up from $29M with the previous rule-based system). Designed a two-stage architecture: XGBoost for low-latency scoring (<5ms p99) on 180 engineered features, escalating borderline cases to a graph neural network analyzing transaction networks. Reduced false positive rate by 41% (saving 2,200 analyst hours/year in manual review) while increasing fraud recall from 78% to 94%.

What changed: Added throughput (15K txn/sec), dollar impact ($47M caught), system architecture (two-stage with rationale), feature count, latency requirements, and the precision-recall tradeoff story (reduced FP while increasing recall).

10. Search Ranking

Before:

Improved search ranking for a marketplace using machine learning.

After:

Rebuilt the search ranking stack for a marketplace with 3M active listings, replacing a hand-tuned BM25 configuration with a learning-to-rank pipeline (LambdaMART on 85 features including query-listing semantic similarity via bi-encoder embeddings). Improved NDCG@5 by 22% offline and click-through rate by 14% in A/B test (n=1.2M searches, 21-day test). Implemented online feature serving via Redis (p99 < 10ms) and automated retraining with weekly click-through feedback loops.

What changed: Added marketplace scale (3M listings), specific ranking approach (LambdaMART, 85 features), offline and online evaluation separation, A/B test rigor (sample size, duration), and the serving/retraining architecture.

11. Conversational AI / Chatbot

Before:

Developed a chatbot for customer service using NLP and a large language model.

After:

Designed a multi-turn conversational agent handling 8K daily sessions across billing, troubleshooting, and account management intents (23 intent classes, intent accuracy 96.2%). Implemented a state machine orchestrator with LLM-based slot filling (GPT-4-turbo) and deterministic action execution, ensuring 100% policy compliance on financial operations. Reduced average handle time from 11 minutes (human agent) to 2.4 minutes, with a 72% full-resolution rate and a CSAT score of 4.3/5.0 (vs. 3.9 for human agents).

Describing Projects at Different Scales

Not every project is a production system serving millions of users. Your description strategy should match the project's actual scope.

Side Project / Personal Project

You do not have production metrics, large user bases, or A/B tests. Instead, emphasize:

  • Technical rigor - evaluation methodology, dataset curation, ablation studies
  • Engineering quality - clean code, documentation, reproducibility, CI/CD
  • Problem motivation - why this project, what gap does it fill?

Example:

Built an open-source tool for automatic detection of data leakage in ML pipelines. Analyzes scikit-learn Pipeline objects and pandas DataFrames to identify 7 common leakage patterns (target encoding before split, future data in time series, etc.). Validated on 50 Kaggle kernels, correctly flagging leakage in 12 that had inflated scores by 5-30%. 280+ GitHub stars.

Side Projects That Impress

The most impressive personal projects are tools that other ML engineers would use. A leakage detector, a model debugging library, a dataset versioning tool - these demonstrate engineering taste and understanding of real pain points. A personal project that is just "I trained a model on a Kaggle dataset" does not stand out.

Course / Bootcamp Project

Be honest about the context, but emphasize what you did beyond the baseline requirements.

Example:

Extended the course capstone project (image classification on CIFAR-100) by implementing knowledge distillation from a ResNet-152 teacher to a MobileNetV3 student, achieving 96% of teacher accuracy at 8x lower inference cost. Added Grad-CAM visualization for model interpretability and containerized the inference server with Docker. (Course project, self-directed extensions.)

Production System at a Company

Lean into scale, business impact, and system complexity.

Example:

Owned the pricing ML system serving dynamic prices for 45K SKUs across 12 regional markets, processing 2M price updates/day. Migrated from daily batch predictions (XGBoost) to near-real-time pricing (online gradient-boosted model updating every 15 minutes on streaming demand signals), increasing gross margin by 3.2 percentage points ($18M annual impact). Designed the A/B testing framework with geo-level randomization to avoid cannibalization bias.

Describing Team Projects: Your Contribution vs. Team Contribution

One of the most common interview failures is being unable to clearly articulate your specific contribution to a team project. Interviewers will probe this aggressively.

Rules for Team Project Descriptions

  1. Use "I" language for your work and "we/team" language for shared context
  2. Name your specific technical contributions
  3. Do not claim credit for the entire system, but do not undersell your role either
  4. Be prepared to go deep on any claim you make

Good example:

Led the model development workstream for a real-time bidding optimization system (team of 8 engineers, 6-month project). Personally designed and trained the click-through rate prediction model (DeepFM architecture, 120M parameters, trained on 2B impression logs), improving AUC from 0.74 to 0.81. Collaborated with the infrastructure team on the serving layer (TensorFlow Serving + feature store) and with the product team on A/B test design.

Bad example:

Built a real-time bidding optimization system with 8 engineers over 6 months using DeepFM, improving AUC from 0.74 to 0.81.

The bad example is ambiguous: did you build the whole thing? Design the model? Write tests? Interviewers will assume the worst if you are vague.

The Ownership Question

In almost every technical interview, you will be asked: "What specifically was your contribution?" If your resume description implies you owned the whole project but your interview answers reveal you only ran experiments someone else designed, your credibility collapses instantly. It is far better to honestly describe a smaller, well-defined contribution than to vaguely imply ownership of a larger effort.

Handling Confidential Work (NDA-Safe Descriptions)

Many ML engineers work on proprietary systems where they cannot share specific numbers, model architectures, or business metrics. Here is how to write compelling descriptions without violating your NDA.

Techniques for NDA-Safe Descriptions

| Technique | Example |
| --- | --- |
| Use relative metrics instead of absolutes | "Improved key engagement metric by 23%" instead of "Increased DAU from 4.2M to 5.1M" |
| Generalize the domain | "Large-scale e-commerce platform" instead of naming the company's specific product |
| Describe the class of problem | "Built a multi-task ranking model for content recommendation" - this reveals methodology without revealing IP |
| Use order-of-magnitude scale | "Serving predictions for tens of millions of users" instead of exact numbers |
| Name open-source components only | "Fine-tuned a transformer model" - you can name the architecture class without naming a proprietary variant |
| Describe your role, not the system | "Led model evaluation and A/B testing for the ranking team" avoids revealing system internals |

NDA-safe example:

Designed and trained the core ranking model for a content recommendation system serving tens of millions of daily users at a major social media platform. Improved the primary engagement metric by 15%+ through a multi-task learning architecture jointly optimizing for multiple user actions. Personally led the evaluation framework, including offline replay evaluation and online A/B test design with variance reduction techniques.
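The relative-metric and order-of-magnitude techniques are simple arithmetic, and a small helper can make the conversion repeatable. This is a minimal sketch: `relative_change_pct` and `magnitude_phrase` are hypothetical helper names, and the user counts are invented for illustration.

```python
def relative_change_pct(before, after):
    """Percentage change from before to after, rounded to one decimal."""
    return round((after - before) / before * 100, 1)


def magnitude_phrase(n):
    """Describe a count by order of magnitude, e.g. 'tens of millions'."""
    for unit, name in ((10**9, "billions"), (10**6, "millions"), (10**3, "thousands")):
        if n >= unit:
            multiple = n / unit
            if multiple >= 100:
                return f"hundreds of {name}"
            if multiple >= 10:
                return f"tens of {name}"
            return name
    return "under a thousand"


# Hypothetical internal figures that could not be published as-is.
dau_before, dau_after = 20_000_000, 23_000_000

print(relative_change_pct(dau_before, dau_after))  # 15.0
print(magnitude_phrase(dau_after))                 # tens of millions
```

The outputs map directly onto NDA-safe phrasing such as "improved the primary engagement metric by 15%" and "serving tens of millions of daily users".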

When in Doubt, Ask

If you are unsure whether a description violates your NDA, ask your former employer's legal team or your manager. Most companies are fine with generic descriptions that do not reveal proprietary architecture details or exact business metrics. Some companies (Google, Meta, etc.) have explicit guidelines for what former employees can say. Check before you publish.

Projects Section for New Grads

If you are a recent graduate with limited industry experience, your projects section is the most important part of your resume. Here is how to structure it.

What to Include

| Project Type | Count | Priority |
| --- | --- | --- |
| Research projects (thesis, publications) | 1-2 | Highest if targeting research roles |
| Substantial course projects (with extensions) | 1-2 | Medium - only if you went beyond the assignment |
| Personal / open-source projects | 1-3 | Highest if targeting engineering roles |
| Internship projects | 1-2 | High - treat like industry experience |
| Competition projects (Kaggle, etc.) | 0-1 | Medium - only if top placement |

New Grad Project Description Template

[Action verb] a [system/model/tool] for [problem domain] using [key technologies]. [Key technical detail or design decision]. Achieved [metric] on [evaluation method], [comparison to baseline or significance]. [Scale indicator or deployment status].

Filled example:

Developed a low-resource named entity recognition system for medical text using few-shot learning. Fine-tuned SetFit on 50 labeled examples per entity type (8 types), leveraging domain-adapted BioBERT embeddings. Achieved 82.4 F1 on a held-out test set (1,200 annotated notes), outperforming GPT-4 zero-shot (71.2 F1) and the previous supervised baseline requiring 5K+ labels. Open-sourced with a Streamlit demo for clinical NLP researchers. Senior thesis, advised by Prof. [Name].

What Not to Include

  • Introductory course assignments everyone in the class completed
  • Tutorials you followed without modification
  • Projects where you cannot explain any design decision you made
  • Group projects where your individual contribution was minor

Projects Section for Career Changers

If you are transitioning into AI from another field (software engineering, data analytics, physics, finance, etc.), your projects need to bridge your existing expertise with your new ML skills.

Strategy: Leverage Your Domain Expertise

The fastest path into AI is through the domain you already know. A former financial analyst who builds a credit risk model has a massive advantage over a fresh bootcamp graduate doing the same, because they understand the domain, the data, and the business context.

Software Engineer transitioning to ML:

Rebuilt the rule-based content moderation system (my team's legacy project) as an ML pipeline, replacing 200+ hand-written regex rules with a fine-tuned DistilBERT classifier. Reduced false positives by 62% while maintaining 99.2% recall on policy-violating content. Designed the gradual rollout strategy (shadow mode for 2 weeks, then 10%/50%/100% canary deployment) to build organizational trust in the ML approach.

Data Analyst transitioning to ML:

Extended my quarterly churn analysis (previously descriptive SQL + Tableau dashboards) into a predictive system. Built an XGBoost churn prediction model using 3 years of behavioral data (payment history, product usage, support tickets). Model identifies 78% of churning customers 30 days in advance (precision 0.71), enabling the retention team to proactively intervene. Reduced quarterly churn rate by 2.1 percentage points ($340K impact).

The Bridge Project

If you are changing careers, build at least one "bridge project" that connects your previous domain to ML. This project should demonstrate that your domain expertise is an asset, not just that you can follow a PyTorch tutorial. Hiring managers value domain expertise more than most candidates realize.

Red Flags Interviewers Notice in Project Descriptions

Technical interviewers have seen thousands of resumes. These patterns immediately raise skepticism:

Instant Red Flags

| Red Flag | Why It Is Suspicious | Fix |
| --- | --- | --- |
| "Built an end-to-end ML system" with no specifics | Could mean anything from a Jupyter notebook to a production pipeline | Name the specific components you built |
| Perfect metrics with no context | "99.5% accuracy" - on what? With what baseline? On what test set? | Always include baseline comparison and evaluation context |
| Listing every framework in existence | "PyTorch, TensorFlow, JAX, Keras, scikit-learn, XGBoost, LightGBM, CatBoost" - nobody uses all of these | List only what you actually used for this project |
| Vague scale claims | "Large-scale system" - this could mean 100 users or 100M users | Use specific numbers or order-of-magnitude ranges |
| All projects use the same architecture | Every project is "fine-tuned BERT" - suggests you have one hammer | Show range across your projects |
| No mention of evaluation | Built a model but never mentions how it was validated | Always describe offline metrics, and online metrics if applicable |
| Passive voice throughout | "A model was trained" - who trained it? | Use active voice: "I trained" or "Trained" |
| Buzzword stacking | "Leveraged cutting-edge AI and state-of-the-art deep learning" | Replace adjectives with specific details |
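A few of these flags are mechanical enough to check automatically before a human ever reads the bullet. The sketch below is a deliberately crude heuristic, not a real style checker: the `red_flags` function, the buzzword list, and the passive-voice regex are all assumptions made for illustration, and they will miss many cases.

```python
import re

# Illustrative list only; extend it with whatever filler you tend to write.
BUZZWORDS = ("cutting-edge", "state-of-the-art", "large-scale")
# Very rough passive-voice pattern: "was/were" followed by a word ending in "ed".
PASSIVE = re.compile(r"\b(was|were)\s+\w+ed\b", re.IGNORECASE)


def red_flags(bullet):
    """Return a list of simple red flags found in one resume bullet."""
    flags = []
    if not re.search(r"\d", bullet):
        flags.append("no numbers")
    lowered = bullet.lower()
    for word in BUZZWORDS:
        if word in lowered:
            flags.append(f"buzzword: {word}")
    if PASSIVE.search(bullet):
        flags.append("passive voice")
    return flags


print(red_flags("Leveraged cutting-edge AI and state-of-the-art deep learning"))
print(red_flags("A model was trained"))
print(red_flags("Trained a YOLOv8 model on 45K annotated images"))
```

A clean result does not mean the bullet is good; it only means none of these three surface patterns appeared.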

Subtle Red Flags

  • Describing only the modeling work, never the data work. Real ML projects are mostly data work. If your description is all about model architecture and never mentions data cleaning, labeling, feature engineering, or data quality, interviewers will suspect you worked on a clean dataset, not a real problem.

  • No mention of failure or iteration. Every real project involves approaches that did not work. Descriptions that read like a straight line from problem to perfect solution feel rehearsed. Mentioning what you tried first and why you pivoted shows genuine experience.

  • Mismatched complexity and scale. Claiming you "deployed a transformer model serving 10M users" for a seed-stage startup with 500 customers is a credibility problem. Match your description to the actual scale.

  • Over-indexing on tools, under-indexing on decisions. "Used PyTorch, Weights & Biases, DVC, Docker, Kubernetes, Airflow" is a grocery list. What interviewers want to know is why you chose those tools and what tradeoffs you considered. A description that says "Chose ONNX Runtime over TensorFlow Serving for 3x lower cold-start latency on CPU-only inference nodes" says more about your judgment than listing ten tools.

  • Claiming SOTA without context. Saying "achieved state-of-the-art results" without specifying the benchmark, the date, and the comparison is meaningless. SOTA changes weekly. If you held SOTA on a benchmark, be precise: "Achieved SOTA on SQuAD 2.0 as of March 2025 (EM=92.1), surpassing the previous best by 0.8 points."

  • Describing only happy-path metrics. Reporting "95% accuracy" without mentioning class imbalance, edge cases, or failure modes suggests you do not understand evaluation. Strong descriptions acknowledge limitations: "95% accuracy overall; 78% on the long-tail category (15% of volume), which we addressed with targeted data augmentation."
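The last point is easy to check numerically: break the headline metric down by segment before quoting it. Below is a minimal sketch on synthetic data constructed to reproduce the 95% / 78% split in the example; the `accuracy_by_segment` function and the segment names are hypothetical.

```python
from collections import defaultdict


def accuracy_by_segment(records):
    """records: iterable of (segment, y_true, y_pred).

    Returns (overall_accuracy, {segment: accuracy}).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for segment, y_true, y_pred in records:
        totals[segment] += 1
        hits[segment] += int(y_true == y_pred)
    overall = sum(hits.values()) / sum(totals.values())
    per_segment = {s: hits[s] / totals[s] for s in totals}
    return overall, per_segment


# Synthetic predictions: 850 head examples and 150 long-tail examples (15% of volume).
records = (
    [("head", 1, 1)] * 833 + [("head", 1, 0)] * 17
    + [("long_tail", 1, 1)] * 117 + [("long_tail", 1, 0)] * 33
)

overall, per_segment = accuracy_by_segment(records)
print(f"overall: {overall:.0%}")                     # overall: 95%
print(f"head: {per_segment['head']:.0%}")            # head: 98%
print(f"long tail: {per_segment['long_tail']:.0%}")  # long tail: 78%
```

The overall figure hides a 20-point gap between segments, which is exactly the detail an interviewer will ask about.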

Action Verbs That Signal Seniority

The verb you choose at the start of a bullet point implicitly communicates your seniority level. Choose deliberately.

| Seniority Level | Strong Verbs | Weak Verbs |
| --- | --- | --- |
| Junior / IC | Implemented, built, trained, developed, automated, tested | Helped, assisted, contributed to, participated in |
| Mid-level | Designed, optimized, migrated, led (a workstream), established | Worked on, was responsible for, utilized |
| Senior / Staff | Architected, owned, drove, defined (the strategy), mentored | Managed, oversaw, was involved in |
| Research | Proposed, demonstrated, proved, introduced, formalized | Studied, explored, investigated |

Verb Precision Matters

"Built" and "Architected" describe very different activities. "Built" means you wrote the code. "Architected" means you made the high-level design decisions about components, interfaces, and tradeoffs. If you did both, say "Designed and implemented." If you only wrote code from someone else's design, say "Implemented." Interviewers will probe which one you actually did.

Quantifying When You Do Not Have Numbers

Sometimes you genuinely do not have precise metrics - the project was internal, the A/B test was run after you left, or the system was research-oriented. Here is how to still add quantitative texture.

Quantify the input, not just the output:

  • "Trained on 2.3M labeled examples across 47 categories"
  • "Processed 500GB of raw log data daily"
  • "Feature store contained 340+ features from 12 data sources"

Quantify the engineering effort:

  • "Reduced training time from 18 hours to 2.5 hours via mixed-precision training and gradient accumulation"
  • "Compressed model from 1.2GB to 85MB using knowledge distillation and INT8 quantization"
  • "Reduced deployment pipeline from 14 manual steps to a single CI/CD trigger"

Quantify the scope:

  • "Supported 6 downstream ML models across 3 product teams"
  • "Served 14 internal teams via self-service API (200+ registered users)"
  • "Evaluated across 8 benchmarks spanning 4 languages"

Use order-of-magnitude estimates (clearly labeled):

  • "Estimated ~$500K annual cost savings based on reduced manual review volume"
  • "Serving approximately 10M predictions/day at peak traffic"

Never Fabricate Numbers

Estimated numbers are fine if labeled as estimates. Fabricated numbers will end your interview instantly if caught. If an interviewer asks "how did you measure that 23% improvement?" and you cannot explain the methodology, the entire resume loses credibility. Only include numbers you can defend.
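Being able to defend a number usually means being able to reproduce it. For an A/B-tested lift, one common way to do that is a two-proportion z-test. The sketch below uses only the standard library; the `ab_test_summary` function and the conversion counts are hypothetical, chosen so the relative lift comes out to 23%.

```python
import math


def ab_test_summary(control_successes, control_n, treatment_successes, treatment_n):
    """Return (relative_lift, z_score, two_sided_p_value) for two conversion rates."""
    p_c = control_successes / control_n
    p_t = treatment_successes / treatment_n
    relative_lift = (p_t - p_c) / p_c

    # Pooled standard error under the null hypothesis of equal rates.
    pooled = (control_successes + treatment_successes) / (control_n + treatment_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treatment_n))
    z = (p_t - p_c) / se

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return relative_lift, z, p_value


# Hypothetical counts: 10.0% vs 12.3% conversion, 10,000 users per arm.
lift, z, p = ab_test_summary(1000, 10_000, 1230, 10_000)
print(f"relative lift: {lift:.1%}, z = {z:.2f}, significant at 0.01: {p < 0.01}")
```

If you can state the sample sizes, the test used, and the threshold, the "how did you measure that?" question becomes an opportunity rather than a risk.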

Template Sentences

Use these fill-in-the-blank templates as starting points, then customize for your specific context.

The Impact-First Template

[Action verb] a [system type] that [business outcome in plain language], resulting in [X% improvement in metric] ([absolute numbers if possible]). [One sentence on technical approach]. [One sentence on scale or deployment].

The Technical-Depth Template

Designed and implemented a [specific architecture] for [problem], training on [data scale] with [key training technique]. Achieved [metric = value] on [benchmark/test set], a [X% improvement] over [baseline method]. [Deployment or adoption detail].

The Migration / Improvement Template

Replaced [legacy system/approach] with [new ML approach], reducing [pain metric] by [X%] while improving [quality metric] by [Y%]. [Key technical decision and why]. [Scale or cost impact].

The Research Template

Proposed [method name], a novel approach to [problem] that [key insight]. Demonstrated [X% improvement] over [previous SOTA] on [N benchmarks]. Published at [venue] ([acceptance rate or tier indicator]). [Adoption or impact if any].

The Infrastructure Template

Built [platform/pipeline/tool] supporting [N models/teams/use cases], reducing [time/cost metric] from [old value] to [new value]. Implemented [key technical capabilities]. [Reliability or scale metric].

Describing Open-Source Contributions

Open-source contributions deserve their own mention, especially if they demonstrate depth beyond your job responsibilities.

Contributing to major projects:

Contributed [feature/fix] to [project name] ([stars count]): [specific description of contribution]. PR merged after [review context]. [Impact: downloads, users affected, etc.].

Example:

Contributed the FlashAttention-2 integration to Hugging Face Transformers (120K+ stars): implemented the attention backend swap for LLaMA and Mistral model families, reducing fine-tuning memory usage by 40% for sequences longer than 2048 tokens. PR reviewed by 3 core maintainers, merged in v4.36. The affected models see 50K+ monthly downloads.

Maintaining your own project:

Created and maintain [project name] ([stars/downloads]), a [description of what it does]. Used by [adopters or download count]. [Key technical feature].

Putting It All Together: A Complete Projects Section

Here is what a strong projects section looks like on a resume for a mid-to-senior ML engineer:

Senior ML Engineer, Acme Corp (2023-2025)

  • Owned the real-time fraud detection pipeline processing 25K transactions/sec, catching $52M in annual fraud (up from $31M with the previous rule-based system). Designed a two-stage architecture: gradient-boosted model for low-latency first-pass scoring (<3ms p99) with a graph neural network for high-risk transaction network analysis. Reduced false positive rate by 38%, saving ~3,000 analyst hours/year.

  • Led migration of the feature store from batch-only (Hive, 24-hour freshness) to real-time (Feast + Redis, sub-second freshness), enabling 4 downstream models to incorporate live behavioral features. Personally implemented point-in-time-correct feature joins that eliminated a data leakage bug inflating offline metrics by ~6%.

ML Engineer, StartupCo (2021-2023)

  • Built the company's first search ranking system for a marketplace with 800K listings. Implemented a learning-to-rank pipeline (LambdaMART, 65 features) replacing keyword-only search, improving NDCG@5 by 28% and purchase conversion by 11% in A/B test (n=400K searches). Owned the full stack from feature engineering to online serving (Redis, <15ms p99).

  • Designed an automated product categorization pipeline classifying 50K new listings/week into a 3-level taxonomy (1,200 leaf categories). Fine-tuned a multi-label DistilBERT classifier achieving 93% top-1 accuracy, replacing a manual review process that cost $8K/month in contractor hours.

Personal Projects

  • Created ml-leak-detector (340+ GitHub stars), an open-source tool that statically analyzes scikit-learn pipelines for 7 common data leakage patterns. Validated on 50 public Kaggle kernels, correctly identifying leakage in 12 with inflated scores.

Final Checklist

Before submitting your resume, run every project description through this checklist:

  • Does it pass the "So what?" test? If someone reads it and thinks "so what?", add impact metrics.
  • Does it include at least one number? Percentages, dollar amounts, user counts, latency, dataset size - anything quantifiable.
  • Would an engineer want to ask you about it? If the description is so generic that there is nothing to follow up on, add technical specificity.
  • Can you defend every claim in a 30-minute deep dive? Do not write anything you cannot explain in detail.
  • Is the action verb specific? "Designed," "implemented," "migrated," "optimized" are good. "Worked on," "helped with," "was involved in" are weak.
  • Does it distinguish your contribution from the team's? If it was a team project, is your specific role clear?
  • Is it honest? Exaggeration will be caught in the interview. Accurate descriptions of smaller work are always better than inflated descriptions of larger work.
  • Does it work without the context of your other bullets? Each bullet should stand alone - the reader may only read one.
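
Two of the checks above (at least one number, no weak verb phrases) are mechanical enough to automate as a first pass. The following is a rough sketch, not a substitute for reading the bullet yourself; `lint_bullet` and `WEAK_PHRASES` are hypothetical names, and the heuristics are deliberately crude.

```python
import re

# Weak phrasings named in the checklist above.
WEAK_PHRASES = ("worked on", "helped with", "was involved in")


def lint_bullet(bullet: str) -> list[str]:
    """Return warnings for the checks that can be automated (heuristic only)."""
    warnings = []
    if not re.search(r"\d", bullet):
        warnings.append("no number found")
    lowered = bullet.lower()
    for phrase in WEAK_PHRASES:
        if phrase in lowered:
            warnings.append(f"weak verb phrase: '{phrase}'")
    return warnings


weak = "Worked on a recommendation system using deep learning."
strong = "Reduced cold-start latency from 850ms to 120ms using a FAISS index."
print(lint_bullet(weak))
print(lint_bullet(strong))
```

An empty list only means the bullet passed these two checks. The remaining questions (honesty, defensibility, clarity of your own contribution) still need a human read.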

Key Takeaways

  1. Project descriptions are the highest-signal section of your AI resume. They reveal your actual skill level in a way that titles and company names cannot.

  2. Use the PSTAR framework (Problem, Solution, Technology, Architecture, Results) to ensure every description has substance.

  3. Write for two audiences simultaneously: recruiters scanning for impact and engineers scanning for depth. Lead with metrics, follow with technical specifics.

  4. Be specific about your contribution. Vague team-project descriptions destroy credibility when probed in interviews.

  5. Match your description to the project's actual scale. An honest description of a well-executed side project is more impressive than an inflated description of a minor contribution to a large system.

  6. Every bullet needs at least one number. If you cannot quantify the result, quantify the input (dataset size, query volume, number of features, latency requirement).

Next: GitHub Portfolio - how to build a portfolio that makes your project descriptions verifiable and your skills undeniable.

© 2026 EngineersOfAI. All rights reserved.