Publications

For a complete and up-to-date list, see my Google Scholar page.

2026

  1. DualEval: Joint Model-Item Calibration for Unified LLM Evaluation
    Aaron J. Li, Hao Huang, Youngmin Park, Yitong Ma, Wei-Lin Chiang,
    Li Chen, Cho-Jui Hsieh, Bin Yu, and Ion Stoica
    arXiv Preprint, 2026
  2. BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution
    Yangzhen Wu*, Aaron J. Li*, Wenjie Ma, Li Cao, Ziheng Zhou, Mert Cemri,
    Shu Liu, Yuran Xiu, Chenxiao Yan, Haikun Zhao, Bin Yu,
    Ion Stoica†, and Dawn Song†
    arXiv Preprint, 2026
  3. Green Shielding: A User-Centric Approach Towards Trustworthy AI
    Aaron J. Li*, Nicolas Sanchez*, Hao Huang, Ruijiang Dong, Jaskaran Bains,
    Katrin Jaradeh, Zhen Xiang, Bo Li, Feng Liu, Aaron Kornblith, and Bin Yu
    arXiv Preprint, 2026

2025

  1. Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders
    Aaron J. Li, Suraj Srinivas, Usha Bhalla, and Himabindu Lakkaraju
    EACL 2026

2024

  1. More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
    Aaron J. Li, Satyapriya Krishna, and Himabindu Lakkaraju
    ICLR 2025  ·  Oral, Top 1.8%
  2. Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
    Aaron J. Li, Robin Netzorg, Zhihan Cheng, Zhuoqin Zhang, and Bin Yu
    ICML 2024

2023

  1. Certifying LLM Safety against Adversarial Prompting
    Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron J. Li, Soheil Feizi, and Himabindu Lakkaraju
    COLM 2024