David Rein

San Francisco

I'm a researcher at METR, where I build agent benchmarks, measure the impact of AI tools on software developer productivity, and model trends in AI capabilities.

I'm best known for creating GPQA, which as of mid-2025 is one of the most widely used AI capability benchmarks. I've also done research on scalable oversight (i.e., figuring out how to supervise a system that is smarter or more capable than its supervisor), and I was one of the first few employees at Cohere, where I trained embedding models for semantic search.