Aaron J. Li

Cambridge, MA 02138

I am currently a second-year Master’s student in Computational Science and Engineering at Harvard University, advised by Prof. Hima Lakkaraju. Previously, I graduated from UC Berkeley with a double major in Computer Science and Psychology, where I also collaborated with Prof. Bin Yu as part of the EECS Honors Program.

I’m broadly interested in the intersections of Trustworthy Machine Learning, Large Language Models, and Mechanistic Interpretability. The two overarching research questions I aim to address are:

(1) How can we obtain reliable interpretations of learning dynamics and explanations of observed model behaviors?

(2) How can we leverage such understanding to improve next-generation foundation models?

selected publications

  1. ICLR
    More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
    Aaron J. Li, Satyapriya Krishna, and Himabindu Lakkaraju
    In Proceedings of the International Conference on Learning Representations, 2025
    Oral Presentation, Top 1.8%
  2. ICML
    Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
    Aaron J. Li, Robin Netzorg, Zhihan Cheng, Zhuoqin Zhang, and Bin Yu
    In Proceedings of the International Conference on Machine Learning, 2024