DLP (Data Loss Prevention)
The set of controls that keeps sensitive data from leaving the organization's perimeter. For AI, it means inspecting every prompt before it reaches the provider and masking or blocking credentials and PII such as CPF numbers (Brazilian taxpayer IDs) and card numbers.
Why it matters
DLP — Data Loss Prevention — is the discipline of keeping sensitive information from crossing the organization's boundary. In the AI context, that boundary is the prompt: everything the user types and sends to the model leaves your operation for an external provider. Without DLP at that point, a tax ID pasted for the model to check, an API key inside a snippet of code, or a customer secret embedded in context all become tokens sent outside — and often recorded in logs no one reviews.
What sets AI DLP apart is that the leak happens through normal usage, not through an attack. No one is circumventing a protection; there simply is none between what gets typed and the model that receives it. And data that has left cannot be recalled. That's why the control has to sit on the path of the call, not in a best-practices manual.
How it works
AI DLP inspects the content before the call reaches the provider: pre-call, at the gateway layer. Detection works on two complementary fronts. The first is deterministic: known patterns such as API keys, secrets, and document-number formats are recognized by regular expressions executed under Google RE2. Because RE2 matches in time linear in the input size and never backtracks, it is immune to ReDoS (regular-expression denial of service) and therefore safe to run on every request. The second front is NLP, using Microsoft Presidio, which recognizes data with no fixed format, such as a person's name. Together they cover both the rigid-pattern secret and the PII that can only be identified from context.
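The deterministic front can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Horse Labs' rule set: the three example patterns, the `Finding` type, and the checksum filters are choices made for the example. One substitution matters: Python's standard `re` module is a backtracking engine and is not ReDoS-immune, so it only stands in for RE2 here to keep the sketch dependency-free. The patterns use no backreferences or lookarounds, so they are also valid RE2 syntax.

```python
import re
from dataclasses import dataclass

# Stand-in for RE2: stdlib `re` backtracks and is NOT ReDoS-immune. The patterns
# below avoid backreferences and lookarounds so they also compile under RE2.
PATTERNS = {
    # CPF: 11 digits, optionally written as 000.000.000-00
    "cpf": re.compile(r"\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b"),
    # Card: 13 to 19 digits, optionally grouped with spaces or hyphens
    "card": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
    # AWS access key ID: fixed "AKIA" prefix plus 16 uppercase letters/digits
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


@dataclass(frozen=True)
class Finding:
    kind: str   # which pattern matched
    start: int  # offset of the first matched character
    end: int    # offset one past the last matched character


def cpf_is_valid(raw: str) -> bool:
    """Verify both CPF check digits (mod-11 algorithm)."""
    digits = [int(c) for c in raw if c.isdigit()]
    if len(digits) != 11 or len(set(digits)) == 1:  # e.g. 111.111.111-11
        return False
    for n in (9, 10):  # check digit at index 9, then at index 10
        total = sum(d * w for d, w in zip(digits[:n], range(n + 1, 1, -1)))
        if (total * 10) % 11 % 10 != digits[n]:  # remainder 10 counts as 0
            return False
    return True


def luhn_is_valid(raw: str) -> bool:
    """Luhn checksum: discards digit runs that cannot be card numbers."""
    total = 0
    for i, d in enumerate(int(c) for c in reversed(raw) if c.isdigit()):
        if i % 2 == 1:  # double every second digit, counting from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


# Extra validation per kind cuts false positives on arbitrary numbers.
VALIDATORS = {"cpf": cpf_is_valid, "card": luhn_is_valid}


def detect_deterministic(text: str) -> list[Finding]:
    """Return every pattern hit that also passes its validator, in text order."""
    findings = []
    for kind, pattern in PATTERNS.items():
        validate = VALIDATORS.get(kind)
        for m in pattern.finditer(text):
            if validate is None or validate(m.group()):
                findings.append(Finding(kind, m.start(), m.end()))
    return sorted(findings, key=lambda f: f.start)
```

In production the same patterns would be compiled with an RE2 binding (for example, the `google-re2` package for Python) instead of `re`. The checksum step is a design choice: it costs a little CPU per hit but filters out most digit runs that merely look like a CPF or a card number.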
On a hit, there are two responses: mask the snippet (replace the sensitive data and let the call proceed without it) or block the request entirely. The choice is a policy decision, set per organization, with three modes: Off (the check doesn't run), Monitor (the hit is recorded but the call proceeds, so exposure can be measured before tightening), and Block (the sensitive data is kept from being sent). This lets each organization calibrate the rigor to its own risk without switching tools.
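The three modes can be sketched as a small policy function. The names (`Mode`, `Enforcement`, `Decision`, `apply_policy`) are hypothetical, and the `Enforcement` switch is an assumption about how Block mode chooses between masking and rejecting; the text above only says both responses exist. Detection is passed in as a callable so that Off mode genuinely skips it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable

# A detector returns (kind, start, end) spans; assumed non-overlapping here.
Detector = Callable[[str], list[tuple[str, int, int]]]


class Mode(Enum):
    OFF = "off"          # the check does not run
    MONITOR = "monitor"  # hits are recorded, the prompt goes out unchanged
    BLOCK = "block"      # sensitive data must not reach the provider


class Enforcement(Enum):
    """Hypothetical setting: how BLOCK mode keeps the data from leaving."""
    MASK = "mask"        # replace each hit with a placeholder, then send
    REJECT = "reject"    # refuse the whole request


@dataclass
class Decision:
    allowed: bool        # forward to the provider?
    prompt: str          # the text that would be forwarded
    hits: list = field(default_factory=list)  # spans found, for the audit trail


def apply_policy(prompt: str, detect: Detector, mode: Mode,
                 enforcement: Enforcement = Enforcement.MASK) -> Decision:
    if mode is Mode.OFF:
        return Decision(True, prompt)           # detector is never called
    hits = detect(prompt)
    if mode is Mode.MONITOR or not hits:
        return Decision(True, prompt, hits)     # measure exposure, change nothing
    if enforcement is Enforcement.REJECT:
        return Decision(False, "", hits)        # nothing is sent
    masked = prompt
    # Replace right to left so earlier offsets stay valid after each splice.
    for kind, start, end in sorted(hits, key=lambda h: h[1], reverse=True):
        masked = masked[:start] + f"[{kind.upper()}]" + masked[end:]
    return Decision(True, masked, hits)
```

Monitor mode returns the hits without touching the prompt, which is what makes it useful for measuring exposure before an organization is switched to Block.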
How Horse Labs handles it
At Horse Labs, every call passes through a SecOps guardrail at the gateway before reaching the provider: the single point where PII and secrets are detected before they are sent. The guardrail combines deterministic detection (regex under RE2) with Presidio's NLP tier, and applies the per-organization policy in Off, Monitor, or Block mode. Because sensitive data is masked or blocked pre-call, it is no longer present when the rest of the chain logs the request: what wasn't sent can't be logged.
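A minimal sketch of that ordering on the masking path: both tiers run, their spans are merged, the prompt is masked, and only then is anything logged or forwarded. This is an illustration, not Horse Labs' gateway code; `guarded_call`, `merge_spans`, and the `<KIND>` placeholder format are hypothetical. The Presidio adapter uses Presidio's `AnalyzerEngine.analyze` call but imports it lazily, since `presidio-analyzer` and its spaCy language model are optional dependencies the rest of the sketch does not need.

```python
from typing import Callable

Span = tuple[str, int, int]  # (kind, start, end)


def presidio_tier(language: str = "en") -> Callable[[str], list[Span]]:
    """NLP tier backed by Microsoft Presidio (needs presidio-analyzer installed)."""
    from presidio_analyzer import AnalyzerEngine  # optional, imported on demand

    analyzer = AnalyzerEngine()

    def detect(text: str) -> list[Span]:
        results = analyzer.analyze(text=text, language=language)
        return [(r.entity_type, r.start, r.end) for r in results]

    return detect


def merge_spans(hits: list[Span]) -> list[Span]:
    """Union overlapping spans (e.g. regex and NLP flagging the same text)."""
    merged: list[Span] = []
    for kind, start, end in sorted(hits, key=lambda h: (h[1], -h[2])):
        if merged and start < merged[-1][2]:
            k, s, e = merged[-1]
            merged[-1] = (k, s, max(e, end))  # keep the first span's kind
        else:
            merged.append((kind, start, end))
    return merged


def guarded_call(prompt: str, detectors: list[Callable[[str], list[Span]]],
                 send: Callable[[str], str], audit_log: list[str]) -> str:
    hits = merge_spans([h for detect in detectors for h in detect(prompt)])
    safe = prompt
    for kind, start, end in reversed(hits):  # sorted and disjoint: right to left
        safe = safe[:start] + f"<{kind}>" + safe[end:]
    audit_log.append(safe)  # the log only ever receives the masked text
    return send(safe)       # and so does the provider
```

Merging before masking matters because two tiers can flag overlapping text; splicing overlapping spans independently would corrupt the offsets of the second replacement.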
Common mistakes
The most frequent mistake is treating DLP as manual prompt review: asking each developer to remember to scrub the content before sending. That doesn't scale and isn't reliable; it depends on everyone remembering, always, in every piece of code, and a single lapse is enough to expose the data. The second mistake is letting the DLP fail open: if the detector becomes unavailable and the call goes through anyway, the protection vanishes exactly when it is needed. Real DLP fails closed and runs at a single enforcement point, not on each person's discipline.
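Fail-closed behavior can be sketched as a wrapper around whatever detector the gateway uses: if the detector raises or does not answer within a deadline, the request is refused instead of forwarded. The exception name, the shared thread pool, and the 0.5-second default timeout are assumptions chosen for illustration.

```python
import concurrent.futures as cf
from typing import Callable

# Shared pool so each request does not pay thread start-up cost.
_POOL = cf.ThreadPoolExecutor(max_workers=8)


class GuardrailUnavailable(Exception):
    """Raised instead of forwarding when the detector cannot give a verdict."""


def inspect_fail_closed(prompt: str, detect: Callable[[str], list],
                        timeout_s: float = 0.5) -> list:
    """Return the detector's hits; any failure or timeout blocks the call."""
    future = _POOL.submit(detect, prompt)
    try:
        return future.result(timeout=timeout_s)
    except cf.TimeoutError as exc:
        # Failing open would be `return []` here: the prompt would leave unchecked.
        raise GuardrailUnavailable("detector timed out") from exc
    except Exception as exc:
        raise GuardrailUnavailable(f"detector failed: {exc!r}") from exc
```

The caller turns `GuardrailUnavailable` into an error response and never forwards the prompt. Note that on a timeout the worker thread keeps running, since Python cannot kill a thread; the gateway simply stops waiting for it.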