AI Alignment Research

My current AI safety research is supported by a grant from BlueDot Impact.

Active projects

emmy [1]

Evaluation-invariant measurement for alignment of multi-agent systems

llms en garde: ain’t misbehavin’? [2]

Do LLMs misbehave less when they’re en garde?

liar liar

Is the model scheming to deceive… or is it just wrong?

activation tomography

Natural Language Autoencoders as measurement instruments for AI safety

paper chase

Multi-agent simulation of a scientific publishing ecosystem

Technical AI safety and alignment research interests

I’m interested in developing construct-valid instruments for measuring properties of AI systems. Recent projects I’m excited about include faithful reconstruction of model latent space and measurement of emergent properties of AI collectives.

I’m broadly interested in technical AI safety across domains: AI evaluations and measurement methodology, AI control, scalable oversight, multi-agent alignment, model organisms of misalignment, and mechanistic interpretability.


  1. Name inspired by Emmy Noether (1882–1935), who did foundational work connecting symmetries to invariants.

  2. A jazz standard, here performed by Sarah Vaughan (feat. Miles Davis): Ain’t Misbehavin’