Lemma Launches: Continuous Learning for AI Agents
"Deploy 1x. Learn forever."
TL;DR: Lemma is the first evaluation + observability platform built not just to measure performance, but to improve it automatically. They help AI agents learn from real user feedback and production data, closing the loop so your prompts and agents continuously optimize themselves over time.
Launch Video: https://www.youtube.com/watch?v=E4_v-pY_4fs
Founded by Jerry Zhang & Cole Gawin
They met freshman year at USC and have been building together ever since instead of going to classes.
Before starting Lemma, they were engineers at two high-growth, AI-native startups: Tandem (AI for healthcare) and Chipstack (AI agents for chip design). At both companies, setting up evaluations meant clunky Retool dashboards and multiple engineers manually tweaking experiments. They built internal systems that automated both running the evaluations and the error-driven feedback loop. The result: a 2x improvement in accuracy and iteration speed.
They soon realized every AI company was reinventing the same internal tooling in-house. So they left college, joined YC, and are now bringing continuous learning infrastructure to everyone else.

The Problem:
AI agents don’t learn from their mistakes. In fact, they get worse with use.
In production, prompts and agents continuously degrade due to real-world input drift (new user behaviors and unseen edge cases). Agent performance can drop ~40% within a few weeks, and suddenly what worked in testing breaks in front of customers.
When that happens, engineers are forced to dig through logs, collect failing examples, and manually tweak prompts rather than building core product features.
Solution:
That’s why the team built Lemma: the first end-to-end system that closes the loop between agent deployment and improvement.
Here's what that means:
Step 1: Lemma detects failed outcomes directly from live traffic, and it automatically identifies the exact cause in an agent chain.
Step 2: Lemma alerts you, and with one click, it runs targeted prompt optimizations to fix the failing behavior without any manual tracing or guesswork.
Step 3: They give you back an improved prompt and automatically open a PR in your codebase, so your prompts live where you want them. Alternatively, you can fetch your prompt from the Lemma dashboard.
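The three steps above can be sketched in a few lines. This is a minimal, self-contained simulation of the detect → optimize → ship loop; every name here (`Trace`, `detect_failures`, `optimize_prompt`, `close_the_loop`) is an illustrative assumption, not Lemma's actual API.

```python
# Hypothetical sketch of the closed loop: detect failures in live traffic,
# attribute them to a step in the agent chain, and optimize the prompt.
# All names and logic are illustrative assumptions, not Lemma's real API.
from dataclasses import dataclass, field

@dataclass
class Trace:
    step: str       # which step in the agent chain produced this output
    output: str     # the model's output for that step
    passed: bool    # outcome of an automatic eval on live traffic

def detect_failures(traces):
    """Step 1: group failing traces by the step that caused them."""
    failures = {}
    for t in traces:
        if not t.passed:
            failures.setdefault(t.step, []).append(t)
    return failures

def optimize_prompt(prompt, failing_traces):
    """Step 2: targeted prompt optimization, stubbed here as appending
    guidance distilled from the failing production examples."""
    guidance = f"Avoid the {len(failing_traces)} failure patterns seen in production."
    return prompt + "\n" + guidance

def close_the_loop(prompt, traces):
    """Step 3: return the improved prompt (Lemma would open a PR instead)."""
    for step, failing in detect_failures(traces).items():
        prompt = optimize_prompt(prompt, failing)
    return prompt

traces = [
    Trace("retrieve", "ok", True),
    Trace("answer", "hallucinated citation", False),
    Trace("answer", "wrong output format", False),
]
improved = close_the_loop("You are a helpful support agent.", traces)
print(improved)
```

In a real deployment the optimization step would be model-driven rather than a string append, and the improved prompt would land as a PR rather than a return value.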
Plus, Lemma provides all the LLM eval and observability features you rely on, just reimagined for continuous learning:
- Data-ingestion pipeline to bring in your existing eval sets and automatically flag inconsistencies and gaps.

- Prompt editor with inference support for any closed- or open-source model, for fast prompt iteration.

- Agent tracing observability with live drift detection, regression alerts, and performance visibility across real user interactions.
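Live drift detection like the tracing feature above typically monitors a rolling pass rate against a deploy-time baseline and alerts on regression. Here is a minimal sketch of that general technique; the window size, baseline, and tolerance values are assumptions, not Lemma's actual implementation.

```python
# Minimal sketch of live drift detection via a rolling pass-rate monitor.
# Illustrates the general technique only; all thresholds are assumed values.
from collections import deque

class DriftDetector:
    def __init__(self, window=100, baseline=0.90, tolerance=0.10):
        self.results = deque(maxlen=window)  # recent eval outcomes (True/False)
        self.baseline = baseline             # pass rate measured at deploy time
        self.tolerance = tolerance           # allowed drop before alerting

    def record(self, passed: bool) -> bool:
        """Record one live outcome; return True if a regression alert fires."""
        self.results.append(passed)
        rate = sum(self.results) / len(self.results)
        return rate < self.baseline - self.tolerance

detector = DriftDetector(window=10)
# Simulate live traffic: 7 passing outcomes, then 3 failures (input drift).
alerts = [detector.record(ok) for ok in [True] * 7 + [False] * 3]
```

Once the rolling pass rate falls more than `tolerance` below the baseline, the last outcomes trigger alerts, which in Lemma's flow would kick off the one-click optimization described above.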

Teams using Lemma cut manual prompt iteration by 90%, resolve production drifts in minutes instead of days, and improve model performance ~2–5% every optimization cycle.
Learn More
🌐 Visit www.uselemma.ai to learn more.
👉 Try their platform - If you’re building with LLMs and run a lot of prompt or eval experiments, they’d love to work with you.
⭐ Introductions - If you know a Head of AI/Eng or CTO at a pre-seed to Series A startup, they owe you lunch :) Please reach out to the founders here, or book a live demo on their website, uselemma.ai.
👣 Follow Lemma on LinkedIn & X.