Here are some of my research publications and articles.
Policy-as-Prompt: Turning AI Governance Rules into Guardrails for AI Agents
Authors: Gauri Kholkar, Ratinder Ahuja
Conference: NeurIPS 2025, 3rd Regulatable ML Workshop (Spotlight Presentation)
Abstract: A regulatory machine learning framework that converts unstructured design artifacts (PRDs, TDDs, code) into verifiable runtime guardrails. Our Policy-as-Prompt method builds source-linked policy trees and compiles them into prompt-based classifiers for real-time monitoring. In evaluations, the resulting guardrails reduce prompt-injection risk and block out-of-scope requests, supporting scalable AI safety assurance.
CAPTURE: Context-Aware Prompt Injection Testing and Robustness Enhancement
Authors: Gauri Kholkar et al.
Conference: ACL 2025, LLMSEC Workshop
Abstract: A context-aware benchmark that assesses both attack detection and over-defense tendencies in prompt injection guardrails. Experiments reveal that current models suffer from high false-negative rates in adversarial cases and excessive false positives in benign scenarios.
Towards Socio-Culturally Aware Evaluation of Large Language Models for Content Moderation
Authors: Shanu Kumar, Gauri Kholkar, Saish Mendke, Anubhav Sadana, Parag Agrawal, Sandipan Dandapat
Conference: COLING 2025
Abstract: A socio-culturally aware evaluation framework for LLM-driven content moderation, together with a scalable method for creating diverse datasets using persona-based generation. These datasets provide broader perspectives and pose greater challenges for LLMs, especially smaller models.
LITMUS Predictor: An AI Assistant for Building Reliable, High-Performing and Fair Multilingual NLP Systems
Authors: Anirudh Srinivasan*†, Gauri Kholkar*, Rahul Kejriwal*, Tanuja Ganu, Sandipan Dandapat, Sunayana Sitaram, Balakrishnan Santhanam, Somak Aditya, Kalika Bali, Monojit Choudhury (*Equal contribution)
Conference: AAAI 2022
Abstract: A tool that makes reliable performance projections for fine-tuned task-specific models across languages without test and training data, helping practitioners plan data labeling efforts to meet performance and fairness objectives.