research | WhileBug

AI Agent Security

Securing autonomous AI agents against adversarial manipulation, prompt injection, and unintended behaviors in real-world deployments.

[Arxiv-2026] PI-SoK: The Landscape of Prompt Injection Threats in LLM Agents
[Arxiv-2025] AgentArmor: Enforcing Program Analysis on Agent Runtime Trace to Defend Against Prompt Injection
[Arxiv-2024] PIB-Bench: Prompt Injection Benchmark for Foundation Model Integrated Systems

Interpretable AI Security

Leveraging interpretability and explainability techniques to understand, diagnose, and mitigate vulnerabilities in AI systems.

Usable Security of AI

Designing intuitive security mechanisms and interfaces that help users safely interact with and configure AI systems.

AI Misuse Measurement

Understanding and measuring how AI is misused in the wild, from Model Context Protocol (MCP) server poisoning to adversarial exploitation of AI-powered services, and developing methods to quantify such misuse at scale.

[Arxiv-2026] MCPSmell: From Docs to Descriptions: Smell-Aware Evaluation of MCP Server Descriptions

AI Society Security

Studying the safety implications of deploying AI agents as participants in social networks and human-like environments, including how they handle moral dilemmas, social norms, and trust dynamics when acting as autonomous social agents.

[ICLR-2026] TrustGen: A Platform of Dynamic Benchmarking on the Trustworthiness of Generative Foundation Models

AI for Security

Applying AI techniques to strengthen cybersecurity defenses, including automated vulnerability repair, threat detection, and security analysis.

[NAACL-2025] CVE-Bench: Benchmarking LLM-based Software Engineering Agent’s Ability to Repair Real-World CVE Vulnerabilities