AI model governance

AI model governance means deciding and enforcing which models the organization may use, and routing each task to the right model without being tied to a single provider. This guide covers the four mechanisms: an allowlist that starts empty (default-OFF), a catalog pulled live from each provider's API, wildcard routing that avoids lock-in, and approval enforced at the gateway itself.

Default-OFF allowlist

A default-OFF allowlist is a list of approved models that starts empty: nothing is usable until the organization explicitly approves each model.

When access to models follows the provider's default, everything it exposes is available out of the box — the expensive models, the experimental ones, the just-released ones, and the ones your organization never evaluated. The default is "allowed," and it falls to someone to remember to turn off what shouldn't run, one by one, always chasing what the provider publishes. The default-OFF allowlist flips that logic: the list starts empty and nothing is usable until it's explicitly approved. The default stops being permissive and becomes denied by omission — the organization decides what may run, not the vendor's catalog.

In practice this means adopting a new model is a deliberate decision, not a side effect of the provider having shipped it. The approval is stored as governance state — not baked into the code that calls the gateway — so the list of what's allowed is a single, reviewable, auditable source. Approving or revoking a model is editing that list, without touching any application. The result is real control over the model surface: the organization knows exactly what's enabled and on whose decision.

At Horse Labs, the allowlist is global and default-OFF, stored in the governance-db: nothing is usable until it's approved, and approval is toggled in batches (admin + second factor) — the organization decides what may run.
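As a minimal sketch of the default-OFF idea, the allowlist can be modeled as a mapping that starts empty and answers "no" for anything absent. The `Allowlist` class, its method names, and the approver address are hypothetical illustrations; the governance-db storage and the batch approval flow are not modeled here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Allowlist:
    """Default-OFF allowlist: a model is usable only after explicit approval."""

    # model id -> (approver, approval time); starts empty, so everything is denied
    approved: dict[str, tuple[str, datetime]] = field(default_factory=dict)

    def approve(self, model_id: str, approver: str) -> None:
        # Record who approved the model and when, so the list stays auditable
        self.approved[model_id] = (approver, datetime.now(timezone.utc))

    def revoke(self, model_id: str) -> None:
        # Revoking an unknown model is a no-op: it was already denied
        self.approved.pop(model_id, None)

    def is_allowed(self, model_id: str) -> bool:
        # Denied by omission: absence from the list means "no"
        return model_id in self.approved


allowlist = Allowlist()
print(allowlist.is_allowed("anthropic/claude-opus-4-8"))  # False: nothing approved yet
allowlist.approve("anthropic/claude-opus-4-8", approver="admin@example.com")
print(allowlist.is_allowed("anthropic/claude-opus-4-8"))  # True
```

Because the state lives in one object rather than in the calling code, approving or revoking a model never touches an application.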

Live catalog

A live catalog means discovering the available models by querying each provider's API in real time, instead of keeping a static list that ages.

A model list hand-written in the code has a short shelf life. Providers ship new models, deprecate old ones, and rename versions at a pace no static list keeps up with: before long it points at models that no longer exist and ignores ones that just appeared. Keeping that list current becomes recurring, error-prone work, and every lag is either a model wrongly unavailable or a decision made on stale information. The live catalog removes that maintenance: the available models come from each provider's own API, queried when the list is needed.

With that, what the organization sees to approve is always the current state of what each provider actually offers, including the name and date of each model, pulled from the API's own response. New models surface for evaluation as soon as the provider publishes them; deprecated ones drop off. Governance remains the allowlist's job, since the allowlist decides what's approved; the catalog only guarantees that the decision is made against present reality, not a stale snapshot someone forgot to revisit.

At Horse Labs, the model catalog comes live from the providers' /models APIs (with name and date per model), instead of a static list in the code — new models surface for approval and deprecated ones drop off on their own.
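The following sketch shows one way a live catalog could be built and compared against an earlier view. It assumes the /models response is a JSON object with a `data` list whose entries carry an `id` and a Unix `created` timestamp; that shape, the `load_catalog` and `diff_catalog` helpers, and the placeholder model ids are assumptions for illustration. The fetcher is stubbed so the example runs offline.

```python
from datetime import datetime, timezone
from typing import Callable

# Hypothetical fetcher type: returns the decoded JSON body of a provider's /models endpoint.
Fetcher = Callable[[], dict]


def load_catalog(provider: str, fetch: Fetcher) -> dict[str, datetime]:
    """Return {provider-prefixed model id: model date} from a /models response."""
    catalog = {}
    for entry in fetch().get("data", []):
        # Assumed field: "created" is a Unix timestamp in seconds
        created = datetime.fromtimestamp(entry["created"], tz=timezone.utc)
        catalog[f"{provider}/{entry['id']}"] = created
    return catalog


def diff_catalog(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Split a catalog change into (newly published, dropped) model ids."""
    return current - previous, previous - current


def fake_fetch() -> dict:
    # Stubbed response with placeholder ids; a real fetcher would call the API
    return {"data": [{"id": "model-b", "created": 1700000000},
                     {"id": "model-c", "created": 1710000000}]}


catalog = load_catalog("openai", fake_fetch)
new, dropped = diff_catalog({"openai/model-a", "openai/model-b"}, set(catalog))
print(sorted(new), sorted(dropped))
```

Newly published ids are candidates for approval, not approved models: the catalog only reports what exists, and the allowlist still decides what runs.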

Routing without lock-in

Routing without lock-in means dispatching each call through a per-provider wildcard route, so that adding or swapping a model or provider is configuration, not a code rewrite.

Lock-in is born when the application talks directly to a single vendor's API: that provider's client, its formats, and its quirks end up scattered through the code. Switching providers, in that scenario, is a rewrite project, and that friction is exactly what ties an operation to one vendor even when another would be better or cheaper. Wildcard-per-provider routing breaks that coupling: the gateway understands a wildcard route for each provider, so enabling a new provider or a new model is adjusting configuration, not touching the application.

For this to work, whoever calls the gateway uses a model identifier prefixed by the provider, not a loose alias — the prefix makes explicit which provider the call is destined for and lets wildcard routing resolve the target without a manual entry per model. The application talks to a single stable interface, in the OpenAI-compatible standard, while the choice of provider becomes a reversible configuration decision. Control over which models run and portability across vendors stop being conflicting goals: both live in the same layer.

At Horse Labs, the gateway's model_list is wildcard per provider (anthropic/* openai/* gemini/* xai/*) and the caller uses the prefixed id (e.g. anthropic/claude-opus-4-8) — adding or swapping a provider is configuration, not a rewrite.
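To illustrate how a provider prefix lets a wildcard route resolve the target, here is a small sketch using shell-style pattern matching from Python's standard library. The `resolve` function and the `ROUTES` list are hypothetical and only imitate the idea; this is not how any particular gateway implements its routing.

```python
from fnmatch import fnmatchcase

# Wildcard routes per provider, mirroring the patterns named above
ROUTES = ["anthropic/*", "openai/*", "gemini/*", "xai/*"]


def resolve(model_id: str, routes: list[str] = ROUTES) -> tuple[str, str]:
    """Return (provider, provider-side model name) for a prefixed model id."""
    provider, _, name = model_id.partition("/")
    for pattern in routes:
        # The prefix selects the route; no per-model entry is needed
        if name and fnmatchcase(model_id, pattern):
            return provider, name
    raise LookupError(f"no route matches {model_id!r}")


print(resolve("anthropic/claude-opus-4-8"))

# Enabling another provider is one more pattern ("example" is a made-up provider)
print(resolve("example/some-model", ROUTES + ["example/*"]))
```

A loose alias such as `claude-opus-4-8` matches no route and raises `LookupError`, which is why the caller is expected to send the prefixed id.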

Enforcement at the gateway

Enforcement at the gateway means checking the allowlist at the single point every call passes through: a non-approved model returns 403, before it reaches the provider.

An allowlist only counts if it's impossible to bypass. If approval were a convention (a list in a document each team is expected to respect), one direct call, a forgotten snippet of code, or a misconfigured agent would be enough to run a never-approved model, and no one would notice until the spend or the incident showed up. Enforcement at the gateway closes that gap by placing the check at the one point every call necessarily passes through: before forwarding to the provider, the gateway verifies that the model is on the allowlist for that access profile. If it isn't, the call doesn't proceed and the caller gets a 403.

The important detail is that the governance decision (what's approved) and the enforcement point (the gateway) stay separate but connected: the allowlist is the source of truth, and what the gateway permits is the already-reconciled result of that decision. So approving or revoking a model in the allowlist takes effect at enforcement without rewriting any rule. The end result is that "which models may run" stops being an intention and becomes a technical guarantee: what wasn't approved simply doesn't execute.

At Horse Labs, approval is enforced at the gateway via LiteLLM's team.models (the already-reconciled result of the allowlist): a non-approved model returns 403, before it reaches the provider.
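The check itself can be sketched in a few lines of plain Python. This does not use LiteLLM; `TEAM_MODELS`, `handle`, the team name, and the stub provider are hypothetical stand-ins for the reconciled allowlist and the gateway's request path.

```python
from typing import Callable

# Hypothetical reconciled state: access profile -> models approved for it
TEAM_MODELS: dict[str, set[str]] = {
    "platform-team": {"anthropic/claude-opus-4-8"},
}


def handle(team: str, model_id: str, forward: Callable[[str], str]) -> tuple[int, str]:
    """Check approval before forwarding; the provider is never called on a denial."""
    if model_id not in TEAM_MODELS.get(team, set()):
        return 403, f"model {model_id!r} is not approved for {team!r}"
    return 200, forward(model_id)


calls: list[str] = []


def fake_provider(model_id: str) -> str:
    # Stub that records every call that actually reaches the "provider"
    calls.append(model_id)
    return "ok"


status, _ = handle("platform-team", "openai/unapproved-model", fake_provider)
print(status, calls)  # 403 and no provider call recorded

status, _ = handle("platform-team", "anthropic/claude-opus-4-8", fake_provider)
print(status, calls)  # 200 and one provider call recorded
```

The denial happens before `forward` is invoked, so an unapproved model costs nothing and leaves no trace at the provider. An unknown team is treated as having no approved models, which keeps the default denied.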


FAQ

What is AI model governance?

It's the set of controls that decides and enforces which models the organization may use, and that routes each task to the right model without lock-in: a default-OFF allowlist fed by the providers' live catalog, wildcard routing per provider, and approval enforced at the gateway.

How do I switch model providers without rewriting the code?

By routing through a wildcard route per provider: the application talks to a single OpenAI-compatible interface using a provider-prefixed model id, so enabling or swapping a provider becomes a configuration change, not a rewrite.