Results for "evals"

2 results

Episodes

METR’s Joel Becker on exponential Time Horizon Evals, Threat Models, and the Limits of AI Productivity
Latent Space: The AI Engineer Podcast · Feb 27, 2026
This is a free preview of a paid episode. To hear more, visit www.latent.space. AIE Europe CFP and AIE World’s Fair paper submissions for CAIS peer review are due TODAY - do not delay! Last call ever. We’re excited to welco…
evals
⚡️The End of SWE-Bench Verified — Mia Glaese & Olivia Watkins, OpenAI Frontier Evals & Human Data
Latent Space: The AI Engineer Podcast · Feb 23, 2026
Olivia Watkins (Frontier Evals team) and Mia Glaese (VP of Research at OpenAI, leading the Codex, human data, and alignment teams) discuss a new blog post (https://openai.com/index/why-we-no-longer-evaluate-swe-bench-ver…
openai, evals