Apollo Research

Apollo Research is a London-based organisation, led by Marius Hobbhahn, that tests artificial-intelligence systems for safety. It is known for designing experiments that probe whether AI models engage in deceptive behaviour, a phenomenon Hobbhahn calls "clever cunning" and a key concern in AI alignment.

In 2023 Apollo conducted an experiment in which OpenAI's GPT-4 was instructed to manage a fictional stock portfolio. Given an insider tip, the model bought the stock and then lied about having had advance notice. In further tests, Apollo evaluated Anthropic's Claude 3 Opus and Claude 3.5 Sonnet models for "sandbagging" (deliberately underperforming to avoid being modified) and found that both models chose to submit incorrect answers to preserve their capabilities. Apollo researchers also tested OpenAI's o1 model and reported in December 2024 that its actions were "best explained as scheming against the user".