Status: Experimental
The security gateway checks agent traffic that actually passes through Calciforge. That phrase matters. Calciforge can inspect model calls, tool requests, fetched pages, and provider traffic only when those requests use a Calciforge-controlled path. It is not a spell cast over every process on the machine.
Calciforge treats coverage as a support-tier question, and calciforge doctor reports the gaps it can see.
For stronger guarantees, route model calls through Calciforge’s model gateway, give agents explicit Calciforge fetch/tool wrappers, or run the agent under a host/container boundary that prevents bypass.
Outbound traffic from protected agents can be routed through the gateway by a
specific supported integration. Calciforge’s own provider calls, health
checks, and LAN control-plane traffic should not use ambient
HTTP_PROXY/HTTPS_PROXY; proxying Calciforge itself can send model-gateway
requests and internal webhooks through the security proxy unnecessarily or
recursively.
Outbound pipeline:
1. Manual-credential check (ironclaw.manual_credential): the gateway flags raw credentials supplied by the agent, such as api_key=sk-.... Transport-auth headers, such as Authorization, Cookie, and provider API-key headers, are sanitized before this check; otherwise normal model/provider sessions and local gateways would look suspicious. Exact proxy-managed explicit references such as {{secret:NAME}} and Bearer {{secret:NAME}} are safe control syntax; mixed manual-plus-reference values still remain visible to the check.
2. When scan_outbound = true, outgoing request bodies are analyzed by the adversary-detector for exfiltration language, credential-harvest phrasing, and adversarial patterns. This is opt-in (off by default) because provider/tool transcripts often include benign prompt-injection examples and opaque IDs.
3. The gateway resolves {{secret:NAME}} in URLs, headers, and supported bodies, and injects provider Authorization headers from the configured env/fnox resolver. The staged placeholder path will use this same step to replace registered opaque credentials such as cfg_OPENAI_API_KEY_<random> once lifecycle wiring is enabled.
4. The gateway strips X-Calciforge-* control headers, then forwards the request to the destination.

Inbound pipeline:
1. Response content is scanned for prompt injection when scan_inbound is enabled. When scan_response_secrets = true, response bodies are also checked for high-entropy and secret-shaped values. This is opt-in (off by default) because provider APIs commonly return opaque IDs and hashes as normal transport data.
2. If the verdict is unsafe, the gateway blocks the content and returns 403 Forbidden to the agent.

The gateway has several enforcement modes. They are not interchangeable; pick the strongest mode the target agent can actually run under, then verify that the selected agent adapter actually uses it.
| Mode | Level | Status | Description |
|---|---|---|---|
| Model gateway | API | Working | Route OpenAI-compatible model calls through Calciforge’s gateway. This is the most reliable path for providers and local dispatcher routes because Calciforge owns the HTTP request. |
| Explicit tools/fetch | App | Working/expanding | Give agents Calciforge-provided fetch, MCP, or recipe wrappers for network actions that need scanning or secret substitution. MCP means Model Context Protocol: a structured way to expose tools to an agent. |
| Cooperative HTTP proxy | App | Limited | Set HTTP_PROXY only for agents and tools that have been tested with the proxy. This is useful for plaintext HTTP and simple HTTP clients. |
| HTTPS inspecting proxy | App/host trust | Experimental | Trust a Calciforge CA and terminate CONNECT traffic for clients that support custom trust stores. CA means certificate authority: a local certificate issuer your runtime agrees to trust. The hudsucker-backed prototype runs the existing scan/substitution pipeline over decrypted requests and responses. |
| OS redirect | Host | Roadmap | Use firewall rules such as Linux iptables/nftables or macOS pf to redirect outbound traffic from a controlled UID/process group to the gateway. |
| Container or VM isolation | Runtime | Roadmap | Run the agent in Docker, a Linux namespace, LXC, or a VM where egress is denied except through Calciforge-managed gateways. This is the likely path for agents that ignore proxy env or use complex transports. |
| Placeholder injection | Secret boundary | Staged primitives | Give off-the-shelf agents fake env credentials or managed credential files and substitute real secrets only at the gateway. This keeps raw secrets out of agent memory but still needs agent lifecycle wiring, live request rewriting, and a network enforcement path. |
Calciforge now has two related secret-use shapes:
- Explicit references: {{secret:NAME}}. Agents that know about Calciforge can ask calciforge-secrets ref NAME or use the MCP tool, place that reference into a visible outbound request, and let the gateway resolve it.
- Opaque placeholders: cfg_<NAME>_<random>. Calciforge generates the token, registers the full token against an authoritative secret name for one agent, then provides the token through a supervised surface such as an environment variable, wrapper, or managed credential file. The embedded <NAME> is only a hint for humans; policy must resolve the full token through the per-agent registry.

The second path exists because many agents and tools do not know Calciforge’s
mustache-style syntax. They expect OPENAI_API_KEY, a credentials directory,
or a provider config file. For example, an OpenClaw lane may already have a
plaintext credentials folder. In a managed placeholder setup, Calciforge should
write placeholder values there instead of real keys, register those values with
the security proxy, and retire them when that managed runtime stops or rotates.
Do not mark explicit references deprecated yet. They remain the only fully wired path, they are simple to audit, and they work for agents that can follow Calciforge’s CLI/MCP guidance. Placeholder credentials may become the default for some supervised first-class agents once generation, delivery, registration, live replacement, and retirement are all end-to-end tested. Even then, both mechanisms may remain supported: explicit references are clearer for agent-aware workflows, while opaque placeholders are better for ordinary tools that expect env vars or credential files.
There is also a scanner compatibility reason to keep both. Opaque placeholders are deliberately random and secret-shaped. If IronClaw-style exfiltration detection is enabled, those stand-ins may look like credentials unless the scanner learns Calciforge’s placeholder registry or allowlist. That is solvable, but it means placeholder injection and aggressive exfil detection should be treated as separate knobs until the integration is proven.
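The two secret-use shapes described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Calciforge's implementation: SECRET_STORE, PlaceholderRegistry, and resolve_explicit_refs are hypothetical names, and the in-memory dictionaries stand in for the real secret store and per-agent registry.

```python
import re
import secrets

# Hypothetical in-memory stand-in for the real secret store.
SECRET_STORE = {"OPENAI_API_KEY": "real-secret-value"}
REF_PATTERN = re.compile(r"\{\{secret:([A-Za-z0-9_]+)\}\}")


class PlaceholderRegistry:
    """Maps (agent, full placeholder token) to an authoritative secret name."""

    def __init__(self):
        self._tokens = {}

    def issue(self, agent_id, secret_name):
        # The embedded name is only a hint for humans; the random suffix
        # makes the token unique and unguessable.
        token = f"cfg_{secret_name}_{secrets.token_hex(8)}"
        self._tokens[(agent_id, token)] = secret_name
        return token

    def resolve(self, agent_id, token):
        # Look up the full token for this agent; never parse the name out of it.
        name = self._tokens.get((agent_id, token))
        return SECRET_STORE.get(name) if name else None

    def retire(self, agent_id):
        # Drop every token issued to an agent when its runtime stops or rotates.
        self._tokens = {k: v for k, v in self._tokens.items() if k[0] != agent_id}


def resolve_explicit_refs(text):
    """Replace {{secret:NAME}} references; unknown names raise KeyError."""
    return REF_PATTERN.sub(lambda m: SECRET_STORE[m.group(1)], text)
```

The point of the registry lookup is that a token issued to one agent resolves to nothing when another agent presents it, even though both can read the name embedded in the token.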
The unified installer starts security-proxy, but it does not put
HTTP_PROXY/HTTPS_PROXY on the Calciforge service itself. Do not assume
command-line or exec-backed agents can be protected by generic proxy
environment variables.
Codex, Claude, ACPX, npm-backed adapters, and streaming clients may use
CONNECT, WebSockets, or browser-backed authentication flows that the current
proxy cannot inspect and may break. Keep those agents unproxied unless you
have a tested wrapper for that specific runtime, and prefer OpenAI-compatible
gateway routes or explicit fetch/tool integrations for traffic that must be
scanned.
By default security-proxy binds to 127.0.0.1. Keep that default for a
single-host install. For a trusted LAN deployment where other agent hosts must
use one shared proxy, set SECURITY_PROXY_BIND=0.0.0.0 for the local installer
run, or add "security_proxy_bind": "0.0.0.0" to that host’s node entry in
deploy/nodes.json. Pair a LAN bind with host firewall rules or equivalent
network restrictions when the LAN is not fully trusted.
Ambient HTTPS_PROXY is not a complete protection story unless it points at a
Calciforge inspecting proxy and the client trusts the Calciforge CA. Standard
HTTPS proxying uses CONNECT tunnels; without inspection, a proxy can only see
the destination host and encrypted bytes. Current security-proxy uses
hudsucker to terminate CONNECT traffic, mint per-host certificates from the
configured CA, and run the existing request/response substitution and scanner
pipeline over the decrypted HTTP messages. Prefer Calciforge-owned model
gateway routes, explicit fetch/tool integration, or audited recipe wrappers for
runtimes that cannot use this trust setup.
Externally managed agent daemons are different. OpenClaw, ZeroClaw, Claude
Code, opencode, Dirac, or any custom process started by a separate service
manager must be launched with a tested proxy configuration in that service
manager, or enforced with an OS/network tier. Registering Calciforge webhooks
lets those agents talk back to Calciforge, but it does not by itself prove
their outbound HTTP is going through security-proxy.
For a manually started daemon that uses plaintext HTTP:
```sh
export HTTP_PROXY=http://127.0.0.1:8888
export NO_PROXY=localhost,127.0.0.1,::1
```
Use service-manager environment blocks for persistent daemons, and validate by
checking security-proxy logs while the agent makes a known outbound request.
calciforge doctor warns if the Calciforge daemon itself has ambient proxy
environment, flags explicit subprocess proxy env for verification, and warns
when configured HTTP/native agent daemons need separate validation.
HTTP(S)_PROXY
Calciforge did not remove proxy support; it narrowed where proxy env is treated as a reliable security mechanism.
- HTTP_PROXY remains useful for tested plaintext HTTP clients. The OpenClaw installer path can write service proxy env via proxy_endpoint, after checking that the configured security-proxy is reachable from the OpenClaw host.
- HTTPS_PROXY should only be set for agent runtimes that have been tested with Calciforge’s inspecting-proxy mode and trust the configured CA. Setting it globally can break streaming clients, WebSockets, browser/OAuth flows, and npm-backed adapters.
- OpenClaw browser traffic needs an explicit browser.extraArgs = ["--proxy-server=..."]; relying on ambient env is not enough because OpenClaw strips Chrome proxy env and otherwise starts Chrome with --no-proxy-server.

The installer now starts security-proxy with the hudsucker-backed inspecting
listener enabled by default and generates a persistent local CA if one does not
already exist. On macOS, the installer explains why the trust step is needed
before it asks the system to add that CA to the login keychain. This is required
for any tested browser, tool, or agent runtime that sends HTTPS traffic through
security-proxy and expects inspected pages without certificate errors. Set
SECURITY_PROXY_TRUST_MITM_CA=false to skip the keychain prompt. That makes
inspected HTTPS the default available proxy mode, but it does not automatically
make every runtime trust that CA.
To run the binary manually, use:
```sh
SECURITY_PROXY_CA_CERT=/etc/calciforge/mitm-ca.pem \
SECURITY_PROXY_CA_KEY=/etc/calciforge/mitm-ca-key.pem \
SECURITY_PROXY_PORT=8888 \
security-proxy
```
Then configure the target agent process, not the Calciforge daemon itself:
```sh
export HTTP_PROXY=http://127.0.0.1:8888
export HTTPS_PROXY=http://127.0.0.1:8888
export NO_PROXY=localhost,127.0.0.1,::1
```
The agent runtime must trust mitm-ca.pem. Depending on the runtime that can
mean the system trust store, SSL_CERT_FILE, REQUESTS_CA_BUNDLE,
NODE_EXTRA_CA_CERTS, browser trust settings, or tool-specific configuration.
The current prototype covers explicit proxy mode; OS-level transparent
redirects and installer-managed per-runtime trust setup are next.
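When a wrapper script launches the agent, one way to keep the proxy settings scoped to that agent (and away from the Calciforge daemon) is to build a dedicated environment for the subprocess. The sketch below is illustrative only: build_agent_env is a hypothetical helper, and which trust variable a given runtime honors has to be verified per runtime.

```python
import os


def build_agent_env(base_env, proxy_url, ca_path):
    """Return a copy of base_env with proxy and CA-trust variables set."""
    env = dict(base_env)  # copy, so the caller's environment is untouched
    env["HTTP_PROXY"] = proxy_url
    env["HTTPS_PROXY"] = proxy_url
    env["NO_PROXY"] = "localhost,127.0.0.1,::1"
    # Different runtimes read different trust variables; set the common ones.
    env["SSL_CERT_FILE"] = ca_path
    env["REQUESTS_CA_BUNDLE"] = ca_path
    env["NODE_EXTRA_CA_CERTS"] = ca_path
    return env


agent_env = build_agent_env(
    os.environ, "http://127.0.0.1:8888", "/etc/calciforge/mitm-ca.pem"
)
# Pass agent_env as the env argument of subprocess.Popen for one agent only.
```

Note that SSL_CERT_FILE and REQUESTS_CA_BUNDLE replace the default trust bundle rather than extend it, so a runtime that also makes unproxied HTTPS calls may need a combined bundle file instead of the bare CA certificate.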
Practical tiers:
- Use proxy_endpoint plus inspecting-proxy CA trust for tested HTTP/HTTPS egress. This is convenient but cooperative; OpenClaw can still bypass Calciforge if it opens its own direct connections outside the configured proxy environment.
- Add iptables/nftables rules so the agent process has fewer unmanaged egress paths.

For agents Calciforge launches as subprocesses, start with direct channel routing plus conservative CLI flags. Add gateway coverage only through a path that has been tested for that specific runtime:
- Use kind = "openai-compat" or the model gateway when the work is really a model call.

For externally managed daemons, Calciforge can authenticate inbound callbacks and gate channel access, but it cannot prove outbound network policy unless the daemon is launched in a controlled environment. The practical future path is a local-lab profile that can run selected agents inside a container or VM with egress limited to Calciforge services.
The gateway is configured via GatewayConfig:
- scan_outbound: Toggle outbound adversary/exfiltration detection. Defaults off while this policy matures; enable only for deployments that have tuned false positives on provider/tool transcripts.
- scan_inbound: Toggle injection detection.
- scan_response_secrets: Toggle high-entropy/secret-pattern response leak detection. Defaults off independently from prompt-injection scanning.
- inject_credentials: Toggle automatic API key injection.
- manual_credential_override_requires_operator_approval: Require an operator token for ironclaw.manual_credential override headers. Default: true.
- bypass_domains: List of domains that skip scanning (e.g., internal services).
- scanner_checks: Ordered adversary-detector checks. Empty means the built-in default Starlark scanner policy.

Manual credential blocks return an agent-readable explanation plus structured headers:
- X-Calciforge-Policy: ironclaw.manual_credential
- X-Calciforge-Operator-Approval: required
- X-Calciforge-Override-Supported: operator_scoped
- X-Calciforge-Override-Header: X-Calciforge-Override

The operator override header is request-side control metadata, not upstream API input:
```
X-Calciforge-Override: ironclaw.manual_credential:<token>
```
With the default configuration, <token> must match
SECURITY_PROXY_MANUAL_CREDENTIAL_OVERRIDE_TOKEN. Operators can explicitly
allow self-asserted overrides by setting
manual_credential_override_requires_operator_approval = false in
security-proxy.toml, or
SECURITY_PROXY_MANUAL_CREDENTIAL_OVERRIDE_REQUIRES_OPERATOR_APPROVAL=false
in the service environment. Calciforge strips X-Calciforge-* headers before
forwarding, so override metadata is never sent to the upstream server.
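The override check and header stripping described above can be sketched as follows. This is an illustration under assumptions, not the proxy's code: override_allowed and strip_control_headers are hypothetical names, and treating a self-asserted override as valid when only the policy name is present is a guess about the non-default mode.

```python
import hmac

CONTROL_PREFIX = "x-calciforge-"
OVERRIDE_HEADER = "x-calciforge-override"
POLICY = "ironclaw.manual_credential"


def override_allowed(headers, operator_token, requires_operator_approval=True):
    """Check a request's override header for the manual-credential policy."""
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get(OVERRIDE_HEADER, "")
    # Header value shape: "<policy>:<token>".
    policy, _, token = value.partition(":")
    if policy != POLICY:
        return False
    if not requires_operator_approval:
        # Assumption: a self-asserted override only needs to name the policy.
        return True
    if not operator_token or not token:
        return False
    # Constant-time comparison avoids leaking token prefixes through timing.
    return hmac.compare_digest(token.encode(), operator_token.encode())


def strip_control_headers(headers):
    """Drop X-Calciforge-* headers so they never reach the upstream server."""
    return {
        k: v for k, v in headers.items()
        if not k.lower().startswith(CONTROL_PREFIX)
    }
```

Stripping by prefix rather than by an explicit header list means any control header added later is also kept away from the upstream server.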
Calciforge’s security checks are an ordered pipeline:
1. Built-in default scanner: used when scanner_checks is empty. It implements the default hidden-payload, prompt-injection, PII-harvest, and exfiltration checks in editable policy code.
2. starlark: in-process operator policy. This is the low-latency path for site-specific rules that do not need network calls. Policies can call regex_match(pattern, content) and base64_decoded_regex_match(pattern, content) for bounded Rust-backed matching.
3. remote_http: optional custom policy service. This is where operators can add a model-based classifier, heavier data-loss prevention checks, or organization-specific threat modeling that belongs outside the proxy process.

Not every gateway denial should be equally overrideable. Recommended defaults:
| Policy / block class | Configurable? | Overrideable? | Default approval |
|---|---|---|---|
| ironclaw.manual_credential: raw credential supplied by the agent | Yes | Yes, scoped header | Operator required |
| Secret substitution destination denied by secret_destination_allowlist or dynamic allowed_destinations metadata | Yes, via operator config or secret metadata | Not by agent header | Operator config/metadata change required |
| Malformed or unresolved {{secret:NAME}} | No | No | Fix request or secret store |
| agent_web.forbid_search_engines | Yes | Prefer config only | Operator config change required |
| agent_web.preflight_message_urls destination denial | Yes | Prefer config only | Operator config change required |
| agent_web.scan_search_responses blocked result | Yes | Prefer config only | Operator config change required |
| Provider-side browsing tool stripped/blocked | Yes | Prefer config only | Operator config change required |
| Inbound prompt-injection / unsafe response scan | Yes, scanner policy | Not by agent header | Operator policy/config change required |
| Outbound exfiltration scan | Yes, scanner policy; default off | Not by agent header | Operator policy/config change required |
| Response secret-leak scan | Yes; default off | Not by agent header | Operator policy/config change required |
The reason for the split is blast radius. Manual-credential detection can be a false positive for legacy APIs that use unfortunate parameter names, so a scoped override is useful. Transport authentication is not governed by a provider-host whitelist; known auth headers are sanitized before the manual-credential scanner, and real secret movement is governed by placeholder resolution plus destination allowlists. Destination allowlists, prompt-injection blocks, and opt-in exfiltration/secret-leak blocks are higher-risk policy boundaries; an agent should receive a clear explanation and ask for operator help rather than self-override.
Calciforge can still make these policies configurable for operators. The key
rule is that configuration changes should happen in security-proxy.toml,
service environment, or policy files, while request-carried override metadata
stays narrowly scoped and is stripped before forwarding upstream.
Calciforge intentionally has both local and remote adversary detectors. The local Starlark policy is for deterministic prefiltering: hidden page text, encoding, obvious exfiltration language, and concrete tool-policy bypass patterns. The remote HTTP/model check is for semantic judgment: foreign language, poetry or other style-shift attacks, fictional framing, coercion, multi-step decomposition, and intent that would be brittle or overbroad as regex. The remote pass adds latency and still asks one model to defend another model, so Calciforge keeps Starlark as the default and makes model review explicitly configurable.
No remote service is required for the default gateway. The localhost HTTP hop is small, but a model classifier call is not; enable it only when the extra security pass is worth the added latency.
On a local release build, the built-in Starlark default scanner measured about
299µs per warm scan for ordinary small content. Treat that as a sanity check,
not a universal latency guarantee: large bodies, cold starts, extra configured
policies, proxy I/O, and remote LLM checks dominate real end-to-end latency.
The example prompt covers more than classic prompt injection: credential
exfiltration, malicious tool-use instructions, false authority claims, identity
spoofing, cross-agent propagation, denial-of-service attempts, destructive
cleanup, unbounded resource use, and other governance failures described by
agent red-team work such as
Agents of Chaos.
For the standalone security-proxy binary, the fastest way to add a custom
remote check is:
```sh
SECURITY_PROXY_REMOTE_SCANNER_URL=http://127.0.0.1:9801 \
SECURITY_PROXY_REMOTE_SCANNER_FAIL_CLOSED=true \
security-proxy
```
For Calciforge channel-message scanning, use:
```sh
CALCIFORGE_REMOTE_SCANNER_URL=http://127.0.0.1:9801 \
CALCIFORGE_REMOTE_SCANNER_FAIL_CLOSED=true \
calciforge
```
The unified installer can also host the example scanner as a managed local service:
```sh
CALCIFORGE_REMOTE_SCANNER_ENABLED=1 \
REMOTE_SCANNER_API_KEY_FILE=~/.config/calciforge/secrets/remote-scanner-api-key \
REMOTE_SCANNER_PROMPT_FILE=~/.config/calciforge/remote-llm-scanner-prompt.txt \
bash scripts/install.sh
```
When enabled, the installer starts remote-llm-scanner on
127.0.0.1:9801 and sets SECURITY_PROXY_REMOTE_SCANNER_URL plus
CALCIFORGE_REMOTE_SCANNER_URL for the managed services. The API key can be
provided through REMOTE_SCANNER_API_KEY_FILE or REMOTE_SCANNER_API_KEY; the
file path is preferred so service definitions do not contain the key. The
classifier prompt is also editable: set REMOTE_SCANNER_PROMPT_FILE to a text
file or REMOTE_SCANNER_PROMPT to an inline override. The installer seeds a
default prompt file when it manages the example service.
Or configure checks directly in config.toml:
```toml
[security]
profile = "balanced"
scan_outbound = false
scan_response_secrets = false

# Empty scanner_checks uses the built-in Starlark default:
# builtin:calciforge/default-scanner.star
#
# To customize it, copy
# crates/adversary-detector/policies/default-scanner.star to
# /etc/calciforge/scanner-policies/default-scanner.star, edit it, then
# configure it explicitly:

[[security.scanner_checks]]
kind = "starlark"
path = "/etc/calciforge/scanner-policies/default-scanner.star"
fail_closed = true
max_callstack = 64

[[security.scanner_checks]]
kind = "starlark"
path = "/etc/calciforge/scanner.star"
fail_closed = true
max_callstack = 64

[[security.scanner_checks]]
kind = "remote_http"
url = "http://127.0.0.1:9801"
fail_closed = true
```
Checks are evaluated in order. A clean result continues to the next check.
A review result is retained while later checks continue, so a later
unsafe result can still block; unsafe stops the pipeline immediately.
fail_closed controls scanner errors or outages only: with false, an
unavailable optional check is skipped; successful review or unsafe
verdicts still enforce.
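The evaluation order above can be expressed as a short loop. The following Python sketch is only a model of the described semantics: run_pipeline is a hypothetical name, each check is represented as a (callable, fail_closed) pair, and treating an erroring fail-closed check as an unsafe verdict is an assumption, since the text only specifies the fail_closed = false case.

```python
def run_pipeline(checks, payload):
    """Evaluate ordered checks; each entry is (check_callable, fail_closed)."""
    retained = "clean"
    for check, fail_closed in checks:
        try:
            verdict = check(payload)
        except Exception:
            if fail_closed:
                # Assumption: an unavailable fail-closed check blocks.
                return "unsafe"
            continue  # fail-open: skip the unavailable optional check
        if verdict == "unsafe":
            return "unsafe"  # stop the pipeline immediately
        if verdict == "review":
            retained = "review"  # keep it, but let later checks run
    return retained
```

Because a review verdict is retained rather than returned, a later check can still escalate the outcome to unsafe, and a later clean verdict never downgrades it.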
Starlark checks run in-process with load() disabled and a bounded call stack.
The policy file must define scan(input) and return "clean", "review",
"unsafe", or a dict with verdict and optional reason:
```python
def scan(input):
    content = input["content"].lower()
    if input["context"] == "api" and "wire money" in content:
        return {
            "verdict": "unsafe",
            "reason": "operator policy blocks wire-transfer instructions",
        }
    return "clean"
```
Starlark policies receive url, content, context,
discussion_ratio_threshold, and min_signals_for_ratio. They also have
helpers backed by Rust’s regex crate with compiled-pattern caching:
regex_match(pattern, content) for direct matching and
base64_decoded_regex_match(pattern, content) for bounded inspection of
base64-encoded text tokens. See
crates/adversary-detector/policies/default-scanner.star for the default
policy, examples/security-scanner.star for a minimal starter policy, and
examples/scanner-policies/ for reusable examples covering destination
allowlists, destructive command patterns, and credential-language review.
calciforge doctor --no-network validates Starlark policy files and remote
scanner URL syntax without calling remote scanner services.
Remote checks receive the same content that would otherwise be allowed or blocked by the local scanner:
```
POST /scan
Content-Type: application/json

{"url":"https://api.example.com","content":"...","context":"api"}
```
They return:
```json
{"verdict":"clean|review|unsafe","reason":"short reason"}
```
scripts/remote-llm-scanner.py is a built-in example. It exposes /scan and
uses the local Calciforge model boundary by default with a strict
security-classifier prompt:
```sh
REMOTE_SCANNER_API_KEY_FILE=~/.config/calciforge/secrets/model-gateway-client-key \
REMOTE_SCANNER_API_BASE=http://127.0.0.1:18083/v1 \
REMOTE_SCANNER_MODEL=adversary/default \
REMOTE_SCANNER_PROMPT_FILE=./scripts/remote-llm-scanner-prompt.txt \
./scripts/remote-llm-scanner.py
```
Use fail_closed = true when the remote check is part of your enforcement
boundary. Use fail_closed = false for advisory classifiers where local checks
must continue to work if the remote service is unavailable.
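For operators who want to write their own remote check rather than use the bundled script, the /scan contract is small enough to implement with the Python standard library. The sketch below is a toy stand-in, not scripts/remote-llm-scanner.py: classify, ScanHandler, and serve are hypothetical names, and the single keyword rule exists only to show the request and response shapes.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(payload):
    """Toy rule standing in for a real classifier."""
    content = str(payload.get("content", "")).lower()
    if "ignore previous instructions" in content:
        return {"verdict": "unsafe", "reason": "instruction-override phrase"}
    return {"verdict": "clean", "reason": "no rule matched"}


class ScanHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/scan":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:
            payload = None
        if not isinstance(payload, dict):
            self.send_error(400)  # body must be a JSON object
            return
        body = json.dumps(classify(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=9801):
    """Run the scanner until interrupted."""
    HTTPServer((host, port), ScanHandler).serve_forever()
```

Calling serve() starts the listener on the address used in the configuration examples; pointing a remote_http check at it with fail_closed = false is a low-risk way to try a new rule in advisory mode first.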
There are three extension paths today:
- Rust callers that embed adversary-detector can implement the ScannerCheck trait and compose their own in-process pipeline.
- security-proxy instances can load Starlark policy files for low-latency operator-owned logic without a sidecar service.
- security-proxy instances can also load arbitrary custom logic through the remote_http contract above. That keeps heavyweight code outside the trusted proxy process and lets users write checks in Python, Rust, Go, Lua, shell, or any other runtime.

Scanner code is operator-owned configuration-layer policy, so the sandbox is
not about treating the operator as hostile. It is about reliability and
blast-radius reduction: accidental recursion, dependency behavior, or unexpected
file and network access should not weaken the gateway. Starlark is the default
in-process scanner layer because it is already used by Calciforge policy code,
has no ambient filesystem or network access in this integration, supports
editable branching logic, and can use cached Rust regexes through
regex_match(). WebAssembly remains a possible future plugin layer when
stronger fuel and memory controls are needed. Use Starlark for local rules,
including regexes, keyword lists, size limits, allowed-language checks, or
context-specific branching; use remote_http when the rule needs networked
services or heavyweight dependencies.
Starter Starlark policies live under examples/scanner-policies/:
| Policy | Purpose |
|---|---|
| allowed-destinations.star | Review or block credential-shaped content sent outside an allowed destination list. |
| command-denylist.star | Block destructive shell-command patterns and review network download commands. |
| credential-language.star | Review or block credential disclosure, forwarding, and exfiltration language. |
Copy these into /etc/calciforge/scanner-policies/, edit the constants at the
top of each file, then add one or more starlark checks to config.toml.
Integration tests are located in crates/security-proxy/tests/.
The scanner also has a contributor-friendly red-team fixture suite:
```sh
cargo run -p adversary-detector --example red-team
```
Fixtures live in examples/red-team/adversary-fixtures.json. Add cases there
when you find a bypass or false positive. Useful categories include encoded
payloads, foreign-language prompt injection, Unicode obfuscation, benign
security research, and GTFOBins/LOLBins-style instructions where a legitimate
tool is used to bypass a higher-level policy. Some fixtures can intentionally
document current gaps by expecting clean; hardening work should update the
fixture expectation in the same PR that improves the policy.
Good sources for new fixture families include:
- Agents of Chaos.

[security.secret_access]: identity-scoped secret ACLs
[security.secret_access] gates which secret names an identified agent,
user, or channel may discover, reference, and substitute. This is an
identity gate; secret_destination_allowlist and dynamic
allowed_destinations metadata still apply independently as destination
gates.
```toml
[security.secret_access]

[[security.secret_access.rules]]
agents = ["research-*"]
users = ["owner"]
channels = ["signal"]
secrets = ["BRAVE_*", "SEARCH_*"]
```
Rule selectors are conjunctive. Empty agents, users, or channels
lists are wildcards for that selector type; configured selectors must
match the active identity. secrets must be non-empty and supports *
wildcards.
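The matching rules can be modeled in a few lines. This Python sketch is an approximation under assumptions: selector_matches and secret_allowed are hypothetical names, fnmatchcase stands in for whatever wildcard matcher Calciforge uses (it also understands ? and [] patterns, which the document does not mention), and the sketch only covers the case where rules are configured.

```python
from fnmatch import fnmatchcase


def selector_matches(patterns, value):
    # An empty selector list is a wildcard for that selector type.
    if not patterns:
        return True
    return value is not None and any(fnmatchcase(value, p) for p in patterns)


def secret_allowed(rules, identity, secret_name):
    """Return True if any rule allows this identity to use the secret."""
    for rule in rules:
        if not rule.get("secrets"):
            continue  # secrets must be non-empty; a real loader would reject this
        if (
            selector_matches(rule.get("agents", []), identity.get("agent"))
            and selector_matches(rule.get("users", []), identity.get("user"))
            and selector_matches(rule.get("channels", []), identity.get("channel"))
            and any(fnmatchcase(secret_name, p) for p in rule["secrets"])
        ):
            return True
    return False  # fail closed: no matching rule means no access
```

With the example rule above, an identity of agent research-1, user owner, and channel signal may use BRAVE_API_KEY, while the same identity asking for OPENAI_API_KEY, or a request that omits the user, is refused.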
Identity sources:
- calciforge-secrets reads identity from the environment: CALCIFORGE_AGENT_ID, CALCIFORGE_USER_ID, CALCIFORGE_CHANNEL_ID, or CALCIFORGE_CHANNEL.
- calciforge-secrets wrappers forward those identities to the central secret-control API; managed installs set CALCIFORGE_AGENT_ID to the claw name in the generated wrapper.
- Proxy requests carry identity headers: x-calciforge-agent-id, legacy x-agent-id, x-calciforge-user-id, x-calciforge-channel-id, or x-calciforge-channel.

Secret access rules fail closed: if no rule allows a secret, list_secrets
and calciforge-secrets list hide it, reference creation rejects it, and
security-proxy substitution refuses to resolve it. Unknown identities preserve
process-scoped compatibility only when no secret access rules are configured.
The proxy strips Calciforge identity headers, including legacy x-agent-id,
before forwarding upstream.
This ACL is a read/use policy. The central GET /control/secrets/list and
GET /control/secrets/ref/* helper endpoints use the read-only
secret_discovery_api_key. The central POST /control/secrets/set helper
remains a privileged operator path guarded by the secret_control_api_key
and, when allowed_destinations are supplied, refuses to store the secret
value unless destination metadata is stored first. It does not currently
grant per-identity write permissions; treat
that as separate secret-integrity hardening before exposing write-capable
helpers broadly.
[security.agent_web]: agent-web-content defenses
Calciforge’s inspecting gateway can scan outbound HTTPS when the runtime uses
the trusted proxy path, but the highest-likelihood leak path for blocked
content is not a direct egress to a denied host. It is the search-API
response that contains pre-indexed snippets of the same denied host, or a
provider-side browsing tool that the model invokes from inside an allowed
api.openai.com session.
[security.agent_web] adds four configurable defenses against this class of leak. All default to safe values; operators opt into stricter modes.
This complements but does not replace secret_destination_allowlist or
dynamic allowed_destinations secret metadata. Those allowlists gate
secrets-into-hosts, while agent_web gates content: search snippets,
provider browsing tool definitions, and URLs in large-language-model request
bodies. Static TOML policy and dynamic metadata are intersected; metadata read
failures fail closed when substitution needs a destination policy decision.
forbid_search_engines
Block all egress to known search APIs entirely. When true, requests to any host matching search_engine_patterns are denied.
```toml
[security.agent_web]
forbid_search_engines = true
# Override the default curated list (api.search.brave.com, duckduckgo.com,
# api.tavily.com, serpapi.com, serper.dev, api.firecrawl.dev, api.you.com,
# api.exa.ai, api.kimi.com, api.minimax.com).
search_engine_patterns = ["api.search.brave.com", "api.tavily.com"]
```
scan_search_responses
Scan responses from search APIs for prompt injection and for URLs that fail the url_destination_denylist check.
- search_response_strategy = "block" (default): replace the entire response with the standard block page.
- search_response_strategy = "strip": parse the JSON and drop only the offending result entries; falls back to "block" if the JSON can’t be parsed.

```toml
[security.agent_web]
forbid_search_engines = false
scan_search_responses = true
search_response_strategy = "strip"
url_destination_denylist = ["leaked-corp-docs.example.com", "intranet.acme.local"]
```
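The difference between the block and strip strategies can be illustrated with a small Python sketch. It covers only the denylist half of the scan, not prompt-injection detection, and several details are assumptions: the BLOCK sentinel, the function names, the "results" key in the response JSON, and the rule that a denylist entry also matches its subdomains.

```python
import json
import re
from urllib.parse import urlsplit

BLOCK = object()  # sentinel meaning "replace the response with the block page"
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def mentions_denied(text, denylist):
    """True if text contains a URL whose host is on the denylist."""
    for url in URL_RE.findall(text):
        try:
            host = (urlsplit(url).hostname or "").lower().rstrip(".")
        except ValueError:
            continue  # malformed URL; ignored in this sketch
        # Assumption: an entry matches the host itself and its subdomains.
        if any(host == d or host.endswith("." + d) for d in denylist):
            return True
    return False


def filter_search_response(body, denylist, strategy="block"):
    """Return the body (possibly rewritten), or BLOCK."""
    if not mentions_denied(body, denylist):
        return body  # nothing to act on
    if strategy == "strip":
        try:
            data = json.loads(body)
            data["results"] = [  # "results" is a hypothetical response shape
                r for r in data["results"]
                if not mentions_denied(json.dumps(r), denylist)
            ]
            return json.dumps(data)
        except (ValueError, KeyError, TypeError):
            pass  # unparseable or unexpected shape: fall back to block
    return BLOCK
```

Strip keeps the useful results and removes only entries that reference a denied host, while anything the sketch cannot parse is handled as if the strategy were block.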
forbid_provider_browsing
Inspect outbound LLM API request bodies and either strip or block known provider-side browsing tools (web_search, web_search_preview, web_search_20250305, computer_use_*, google_search, google_search_retrieval, browser, browser_use, …).
Always-search models (gpt-4o-search-preview*) cannot be stripped; they’re always blocked when this is on.
- provider_browsing_strategy = "strip" (default): rewrite the request body to drop the tool defs.
- provider_browsing_strategy = "block": refuse the request entirely.

```toml
[security.agent_web]
forbid_provider_browsing = true
provider_browsing_strategy = "strip"
# Override the curated tool / model lists if needed.
forbidden_browsing_tools = ["web_search", "web_search_20250305", "google_search"]
forbidden_browsing_models = ["gpt-4o-search-preview"]
known_llm_apis = [
    "api.openai.com",
    "chatgpt.com",
    "chat.openai.com",
    "api.anthropic.com",
    "openrouter.ai",
    "generativelanguage.googleapis.com",
    "api.groq.com",
]
```
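A rough model of the strip and block strategies for provider-side browsing tools is shown below. It is a sketch under assumptions: apply_browsing_policy and is_forbidden_tool are hypothetical names, the tool identifier is assumed to live in a "type" or "name" field, and providers that declare tools in other shapes would need extra handling.

```python
from fnmatch import fnmatchcase

FORBIDDEN_TOOLS = ["web_search", "web_search_20250305", "google_search", "computer_use_*"]
FORBIDDEN_MODELS = ["gpt-4o-search-preview*"]


def is_forbidden_tool(tool):
    # Assumption: the tool identifier lives in "type" or "name".
    if not isinstance(tool, dict):
        return False
    ids = [tool.get("type"), tool.get("name")]
    return any(
        isinstance(i, str) and fnmatchcase(i, p)
        for i in ids
        for p in FORBIDDEN_TOOLS
    )


def apply_browsing_policy(request, strategy="strip"):
    """Return ("forward", body) or ("block", reason) for an LLM request body."""
    model = request.get("model", "")
    if any(fnmatchcase(model, p) for p in FORBIDDEN_MODELS):
        # Always-search models cannot be fixed by removing tools.
        return "block", "always-search model"
    tools = request.get("tools") or []
    if not any(is_forbidden_tool(t) for t in tools):
        return "forward", request
    if strategy == "block":
        return "block", "provider-side browsing tool"
    cleaned = dict(request)  # shallow copy; the original body is not mutated
    cleaned["tools"] = [t for t in tools if not is_forbidden_tool(t)]
    return "forward", cleaned
```

The model check runs first because stripping tool definitions does nothing for a model that searches on every request.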
preflight_message_urls
Extract https?://... URLs from outbound LLM request bodies for hosts in known_llm_apis; test each against url_destination_denylist. The scanner covers common shapes such as messages[].content, Anthropic content arrays, OpenAI Responses input, provider-specific nested JSON envelopes, and tools[].description when preflight_tool_descriptions = true.
If any URL would be blocked at fetch time, the LLM request is refused before forwarding to the provider. This is separate from content scanning: response scanners still inspect raw content that crosses the gateway, while URL preflight prevents opaque provider-side browsing from fetching denied origins where the gateway would otherwise only see a synthesized model summary.
```toml
[security.agent_web]
preflight_message_urls = true
preflight_tool_descriptions = true
url_destination_denylist = ["leaked-corp-docs.example.com", "ref.jock.pl"]
```
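URL preflight amounts to walking the request body, collecting URLs, and checking each host against the denylist. The Python sketch below shows that idea in simplified form: iter_strings and denied_urls are hypothetical names, the walk visits every string in the body (so it does not model the preflight_tool_descriptions toggle), and subdomain matching is an assumption.

```python
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s\"'<>)\]]+")


def iter_strings(node):
    """Yield every string value in a nested JSON-like structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def denied_urls(request_body, denylist):
    """Return URLs in the body whose host is on the denylist."""
    hits = []
    for text in iter_strings(request_body):
        for url in URL_RE.findall(text):
            try:
                host = (urlsplit(url).hostname or "").lower().rstrip(".")
            except ValueError:
                continue  # malformed URL; ignored in this sketch
            # Assumption: an entry matches the host itself and its subdomains.
            if any(host == d or host.endswith("." + d) for d in denylist):
                hits.append(url)
    return hits
```

A non-empty result would correspond to refusing the LLM request before it is forwarded to the provider.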
Each policy hit emits a tracing INFO event with structured fields (policy = "agent_web.<feature>", dest_host, decision, plus tool/model/denied_host when relevant). These events flow into the existing Calciforge audit pipeline.