Blog

Marius Hobbhahn

Apollo is adopting Inspect

Apollo is adopting Inspect as its evals framework. We will contribute features and potentially example agent evals to Inspect, and we look forward to working with the Inspect community.

Marius Hobbhahn

The Evals Gap

The quality and quantity of evals required to make rigorous safety statements could outpace available evals. We explain “the evals gap” and what would be required to close it.

Marius Hobbhahn

We need a Science of Evals

In this post, we argue that for AI model evaluations (evals) to have meaningful real-world impact, we need a “Science of Evals”, i.e. the field needs rigorous scientific processes that provide more confidence in evals methodology and results.
