Blog
Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations
We evaluate whether Claude Sonnet 3.7 and other frontier models know that they are being evaluated.
Forecasting Frontier Language Model Agent Capabilities
We present a new technique for forecasting frontier LM agent capabilities ahead of time.
Demo example - Scheming reasoning evaluations
A brief demonstration video showcasing a representative example of our in-context scheming evaluations.
Apollo 18-month update
Apollo Research is now 18 months old. You can read our latest update here.
Apollo is adopting Inspect
Apollo is adopting Inspect as its evals framework. We will contribute features and potentially example agent evals to Inspect, and we look forward to working with the Inspect community.
The Evals Gap
The quality and quantity of evals needed to make rigorous safety statements could outpace the evals that are available. We explain this “evals gap” and what would be required to close it.
An Opinionated Evals Reading List
A long reading list of evals papers with recommendations and comments by the evals team.
The first year of Apollo Research
A summary of what we have achieved in our first year and what we plan to do in the future.
Black-Box Access is Insufficient for Rigorous AI Audits
We were delighted to collaborate on the paper “Black-Box Access is Insufficient for Rigorous AI Audits.”
We need a Science of Evals
In this post, we argue that if AI model evaluations (evals) are to have meaningful real-world impact, we need a “Science of Evals”: rigorous scientific processes that provide more confidence in evals methodology and results.