Aaron J. Li

PhD Student in Computer Science

University of California, Berkeley

I am a (rising) second-year CS PhD student at UC Berkeley advised by Prof. Ion Stoica and Prof. Bin Yu, affiliated with Sky Computing Lab and BAIR. I completed my Master’s degree in Computational Science and Engineering at Harvard University, where I was fortunate to be advised by Prof. Hima Lakkaraju. Prior to that, I earned my Bachelor’s degree from UC Berkeley, double majoring in Computer Science and Psychology.

I’ve been collaborating with LM Arena since spring 2026. In summer 2026, I’m joining Microsoft Research (Redmond) as a Research Intern.

My research centers on LLM/agent evaluation, agentic systems, and alignment, with an emphasis on direct practical impact. My current interests fall into several overarching directions:

· Developing novel evaluation paradigms that are realistic (closing the gap between benchmarks and real-world utility), sustainable (addressing saturation and contamination), and informative (surfacing concrete failure modes that guide model development and alignment).

· Building agentic systems with genuine real-world value, focusing on domain-specific self-improvement, reliable customizability, and efficiency.

· Operationalizing AI safety risks as concrete, measurable behaviors, and studying how post-training and alignment techniques can systematically address them in practice.

I’m always open to collaborations and happy to discuss research ideas of all kinds; the best way to reach me is by email. If you are an undergraduate student interested in working with me, you are welcome to either lead your own project or join an existing one as a contributor, provided there’s a good fit.

Research Interests

Evaluation · Agents · Alignment and Safety

Selected Publications

* equal contribution · † equal advising

  1. BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution
    Yangzhen Wu*, Aaron J. Li*, Wenjie Ma, Li Cao, Ziheng Zhou, Mert Cemri,
    Shu Liu, Yuran Xiu, Chenxiao Yan, Haikun Zhao, Bin Yu,
    Ion Stoica†, and Dawn Song†
    arXiv Preprint, 2026
  2. Green Shielding: A User-Centric Approach Towards Trustworthy AI
    Aaron J. Li*, Nicolas Sanchez*, Hao Huang, Ruijiang Dong, Jaskaran Bains,
    Katrin Jaradeh, Zhen Xiang, Bo Li, Feng Liu, Aaron Kornblith, and Bin Yu
    arXiv Preprint, 2026
  3. Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders
    Aaron J. Li, Suraj Srinivas, Usha Bhalla, and Himabindu Lakkaraju
    EACL 2026
  4. More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
    Aaron J. Li, Satyapriya Krishna, and Himabindu Lakkaraju
    ICLR 2025  ·  Oral, Top 1.8%
  5. Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
    Aaron J. Li, Robin Netzorg, Zhihan Cheng, Zhuoqin Zhang, and Bin Yu
    ICML 2024