How Braintrust uses AI agents, evals, and CI to ship better software | Ankur Goyal | Startup News

Braintrust founder Ankur Goyal spoke with a reporter about using AI agents for database optimization. He uses Codex to run week-long benchmark experiments across database indexes, column store formats, and execution engines to speed up slow queries. Because agents work tirelessly, they make it practical to run the rigorous benchmarking that teams would otherwise skip. The "agent line" framework helps decide which decisions, directions, and interactions can be handed off to an agent.

Ankur also discussed the importance of evals in modern software development. He argued that evals are like PRDs (product requirements documents) for AI models: they encode what good looks like so a model can figure out how to achieve it. A scoring function is built live, which lets an agent improve the prompt inside a safe playground.
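To make the idea of an eval concrete, here is a minimal sketch of a scoring function and an eval loop in plain Python. It is not Braintrust's API or the code from the interview. The cases, the `exact_match` scorer, and the `stub_model` stand-in are all hypothetical names chosen for illustration.

```python
from typing import Callable

# Hypothetical eval cases: each pairs an input with the output we consider "good".
CASES = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the output matches the expected answer, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(task: Callable[[str], str], cases=CASES, scorer=exact_match) -> float:
    """Run the task on every case and return the mean score."""
    scores = [scorer(task(c["input"]), c["expected"]) for c in cases]
    return sum(scores) / len(scores)


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption); it knows only one answer."""
    return {"2 + 2": "4"}.get(prompt, "unknown")


if __name__ == "__main__":
    print(f"mean score: {run_eval(stub_model):.2f}")
```

An agent, or a person, can then change the prompt or model behind `task` and rerun `run_eval` to see whether the mean score improves.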

The interview also touched on CI/CD investment, with Ankur stating that fixing the continuous integration process is critical for AI-accelerated teams. He emphasized that human attention decays on tedious work and that agents can handle such tasks more efficiently.

Braintrust uses several tools in its workflow, including Codex, GPT 5.4, Claude, and others. The full interview with Ankur Goyal is available to listen to or watch on YouTube, Spotify, or Apple Podcasts.
