The Customer: A Scaling Enterprise Running Multiple AI Models
Our client was a fast-growing enterprise SaaS company that had integrated AI into core product features. Over 12 months, different teams had quietly adopted models from three different vendors: OpenAI, Anthropic, and Google (Gemini). The result? Chaos. No unified logging. No way to see which model was handling what. No safeguards against malicious inputs.
The Problem
As AI usage exploded internally, so did risk. The engineering team had no centralized way to monitor, control, or audit AI requests across the organization.
- Zero Visibility: Requests were going directly to individual model APIs. There was no central log of what prompts were being sent or what responses were coming back. Debugging failures meant manually digging through scattered logs across three different systems.
- Prompt Injection Vulnerabilities: Users - both internal and external - were sending crafted inputs designed to manipulate model behavior. Without a detection layer, these attacks were going completely unnoticed.
- Confidentiality Leaks: Sensitive PII and internal data were being inadvertently included in prompts sent to third-party cloud APIs, creating a compliance and legal risk the team hadn't fully appreciated.
- No Cost Control: Different teams were spinning up model calls with no rate limiting or budget awareness, resulting in runaway API costs with no attribution.
How We Helped
We designed and built a centralized AI proxy layer that sits between the client's applications and all external (and internal) model endpoints. Every AI request - regardless of which team sent it or which model it targeted - now flows through this single gateway.
- Unified Multi-Model Router: The proxy intelligently routes requests to the appropriate model based on task type, cost thresholds, and latency requirements. Teams no longer hardcode model endpoints; the router handles it dynamically.
- Prompt Injection Detection Engine: We built a multi-layered detection pipeline that analyzes incoming prompts for known injection patterns, role-override attempts, and jailbreak structures. Flagged requests are quarantined and escalated in real time.
- Confidentiality Controls: A pre-processing layer scans outgoing prompts for PII (names, emails, phone numbers, financial data) using a combination of regex rules and a lightweight classification model. Sensitive fields are masked or rejected before ever reaching an external API.
- Detailed Observability Dashboard: Every request and response is logged with full metadata - model used, latency, token count, cost estimate, user/team attribution, and any flags raised. The dashboard gives leadership a live view of AI activity across the entire organization.
- Rate Limiting and Budget Guardrails: Per-team and per-application rate limits prevent runaway usage, with configurable alerts when spending thresholds are approached.
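To make the request flow through the gateway concrete, here is a minimal Python sketch of the order of checks: rate limiting, injection screening, PII masking, routing, and logging. It is an illustration under stated assumptions, not the client's actual implementation. The regex patterns, the model names in `ROUTES`, the fixed-window limiter, and the `handle_request` function are all hypothetical. The real system also uses a classification model, cost and latency thresholds, and per-application limits, which this sketch omits.

```python
import re
import time
from dataclasses import dataclass, field

# Hypothetical patterns for illustration only; a production detector would
# combine much broader rules with a classification model.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your )?system prompt", re.I),
]

# Hypothetical routing table: task type -> model name (names are made up).
ROUTES = {"summarize": "cheap-model", "code": "strong-model"}
DEFAULT_MODEL = "general-model"


def mask_pii(prompt):
    """Replace each regex match with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt


def looks_like_injection(prompt):
    """Return True if any known injection pattern appears in the prompt."""
    return any(p.search(prompt) for p in INJECTION_PATTERNS)


@dataclass
class RateLimiter:
    """Fixed-window request counter, tracked per team."""
    limit: int
    window_seconds: float = 60.0
    _windows: dict = field(default_factory=dict)

    def allow(self, team, now=None):
        now = time.monotonic() if now is None else now
        start, count = self._windows.get(team, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # window expired: start a new one
        if count >= self.limit:
            return False
        self._windows[team] = (start, count + 1)
        return True


def handle_request(team, task, prompt, limiter, log):
    """Run the gateway checks in order and append one log record."""
    record = {"team": team, "task": task, "flags": []}
    if not limiter.allow(team):
        record["status"] = "rate_limited"
    elif looks_like_injection(prompt):
        record["flags"].append("prompt_injection")
        record["status"] = "quarantined"
    else:
        masked = mask_pii(prompt)
        if masked != prompt:
            record["flags"].append("pii_masked")
        record["model"] = ROUTES.get(task, DEFAULT_MODEL)
        record["prompt"] = masked  # only the masked text leaves the gateway
        record["status"] = "forwarded"
    log.append(record)
    return record
```

In this sketch every request produces exactly one log record, whatever the outcome, which is what gives a dashboard a complete view. Quarantined prompts are never forwarded, and forwarded prompts carry only the masked text.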
The Results: Control Without Compromise
Within the first week of deployment, the system detected and blocked 47 prompt injection attempts that would previously have gone unnoticed. The compliance team was able to retroactively identify three categories of PII that had been leaking into external model calls - and put an immediate stop to the leaks.
Engineering teams reported dramatically faster debugging cycles, with full request-response traces available in seconds rather than hours. Leadership now has a single source of truth for AI usage, cost, and risk across the entire company.
The proxy layer became the foundation on which the client now scales confidently - adding new models and new AI-powered features without sacrificing observability or security.