Gauri K.

My Publications

Here are some of my research publications and articles.

Policy-as-Prompt: Turning AI Governance Rules into Guardrails for AI Agents

Authors: Gauri Kholkar, Ratinder Ahuja

Conference: NeurIPS 2025, 3rd Regulatable ML Workshop Spotlight Presentation

Abstract: A regulatable machine learning framework that converts unstructured design artifacts (PRDs, TDDs, code) into verifiable runtime guardrails. Our Policy-as-Prompt method builds source-linked policy trees that are compiled into prompt-based classifiers for real-time monitoring. Evaluations show that the approach reduces prompt-injection risk, blocks out-of-scope requests, and scales AI safety assurance.

Read on arXiv | MLCollective (DLCT) Talk →

CAPTURE: Context-Aware Prompt Injection Testing and Robustness Enhancement

Authors: Gauri Kholkar et al.

Conference: ACL 2025, LLMSEC Workshop

Abstract: A context-aware benchmark that assesses both attack detection and over-defense tendencies in prompt injection guardrails. Experiments reveal that current models suffer from high false-negative rates in adversarial cases and excessive false-positive rates in benign scenarios.

Read on arXiv | Watch Community Talk →

Towards Socio-Culturally Aware Evaluation of Large Language Models for Content Moderation

Authors: Shanu Kumar, Gauri Kholkar, Saish Mendke, Anubhav Sadana, Parag Agrawal, Sandipan Dandapat

Conference: COLING 2025

Abstract: A socio-culturally aware evaluation framework for LLM-driven content moderation, paired with a scalable method for creating diverse datasets via persona-based generation. These datasets capture broader perspectives and pose greater challenges for LLMs, especially smaller models.

Read on arXiv

LITMUS Predictor: An AI Assistant for Building Reliable, High-Performing and Fair Multilingual NLP Systems

Authors: Anirudh Srinivasan*†, Gauri Kholkar*, Rahul Kejriwal*, Tanuja Ganu, Sandipan Dandapat, Sunayana Sitaram, Balakrishnan Santhanam, Somak Aditya, Kalika Bali, Monojit Choudhury (*Equal contribution)

Conference: AAAI 2022

Abstract: A tool that makes reliable performance projections for fine-tuned, task-specific models across languages without requiring test or training data in those languages, helping practitioners strategize data-labeling efforts to optimize performance and fairness objectives.

Read the PDF