Engineers of AI Research Lab is an independent, practitioner-led research group. We study what actually happens when large language models and AI agents leave controlled benchmarks and enter production systems, where latency matters, costs compound, and failure modes are rarely the ones papers warn you about.
The gap between AI research and AI engineering keeps growing. Papers optimize for benchmarks that don't reflect production. Frameworks market features that break under load. "State of the art" means nothing if you can't deploy it. We exist to close that gap.
Every experiment we run ships with code, data, and methodology. Every benchmark is runnable. Every claim is falsifiable. If you can't reproduce it, it's not research. It's marketing. We read papers through an engineering lens, benchmark frameworks under real conditions, and build tools that work in production, not just in notebooks.
Our current focus areas include LLM framework evaluation: systematic benchmarks comparing developer experience, RAG pipelines, agent capabilities, and production concerns across frameworks. We're also building a failure mode taxonomy for deployed LLM agents: uncontrolled loops, tool call errors, context window exhaustion, and cost overruns. And we're working on inference optimization research covering quantization, speculative decoding, and KV cache strategies, benchmarked under real serving conditions.
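To make the taxonomy idea concrete, here is a minimal, hypothetical sketch of how a single observed failure from a deployed agent might be recorded. The category names, fields, and example values are illustrative assumptions for this page, not our published schema.

```python
"""Hypothetical sketch of a failure-mode taxonomy entry; names and
categories are illustrative, not a published schema."""
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    UNCONTROLLED_LOOP = "uncontrolled_loop"      # agent repeats steps without making progress
    TOOL_CALL_ERROR = "tool_call_error"          # malformed arguments or unavailable tool
    CONTEXT_EXHAUSTION = "context_exhaustion"    # prompt grows past the model's context window
    COST_OVERRUN = "cost_overrun"                # spend exceeds the budget set for the task


@dataclass
class AgentFailureReport:
    """One observed failure from a deployed agent run (illustrative fields)."""
    mode: FailureMode
    model: str                   # serving model identifier
    steps_before_failure: int    # tool/LLM calls completed before the failure
    estimated_cost_usd: float    # spend attributed to the failed run
    notes: str = ""              # free-text context for later analysis


# Example: recording a runaway loop that burned budget before it was killed.
report = AgentFailureReport(
    mode=FailureMode.UNCONTROLLED_LOOP,
    model="example-model",
    steps_before_failure=47,
    estimated_cost_usd=3.12,
    notes="Agent kept re-querying the same search tool with identical arguments.",
)
print(report.mode.value, report.estimated_cost_usd)
```

Structuring failures this way, rather than as free-form incident notes, is what makes the taxonomy queryable: you can aggregate by mode, model, or cost once enough reports accumulate.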
No institutional affiliation is required to contribute. No PhD gatekeeping. If you build with AI and care about rigorous evaluation, there's a seat at the table. We're fully async, fully remote, and open to collaborators worldwide. Two paragraphs is enough to get started: your background and what you want to work on. We reply to every email.
Everything we publish is open and built for the people who actually ship AI systems.
