My current AI safety research is supported by a grant from BlueDot Impact.
Active projects
Evaluation-invariant measurement for alignment of multi-agent systems
llms en garde: ain’t misbehavin’? [2]
Do LLMs misbehave less when they’re en garde?
Is the model scheming to deceive… or is it just wrong?
Natural Language Autoencoders as measurement instruments for AI safety
Multi-agent simulation of a scientific publishing ecosystem
Technical AI safety and alignment research interests
I’m interested in developing construct-valid instruments for measuring properties of AI systems. Recent projects I’m excited about include faithful reconstruction of model latent space and measurement of emergent properties of AI collectives.
I’m broadly interested in technical AI safety across domains: AI evaluations and measurement methodology, AI control, scalable oversight, multi-agent alignment, model organisms of misalignment, and mechanistic interpretability.
1. Name inspired by Emmy Noether (1882–1935), who did foundational work connecting symmetries to invariants.
2. A jazz standard, here performed by Sarah Vaughan (feat. Miles Davis): Ain’t Misbehavin’