publications

Publications by category in reverse chronological order. Generated by jekyll-scholar.

2025

  1. ICLR
    More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
    Aaron J. Li, Satyapriya Krishna, and Himabindu Lakkaraju
    In Proceedings of the International Conference on Learning Representations, 2025
    Oral Presentation, Top 1.8%

2024

  1. ICML
    Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
    Aaron J. Li, Robin Netzorg, Zhihan Cheng, Zhuoqin Zhang, and Bin Yu
    In Proceedings of the International Conference on Machine Learning, 2024
  2. COLM
    Certifying LLM Safety against Adversarial Prompting
    Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron J. Li, Soheil Feizi, and Himabindu Lakkaraju
    In Conference on Language Modeling, 2024