guard json schema ref retrieval against internal targets #2269

Open
uwezkhan wants to merge 1 commit into confluentinc:master from uwezkhan:json-schema-ssrf-guard

@uwezkhan uwezkhan commented Jun 8, 2026

The JSON deserializer resolves a $ref the schema registry doesn't know about by handing the raw URI to httpx.get in _retrieve_via_httpx. The writer schema is selected by the schema id embedded in the consumed message, so a producer can register a schema whose $ref points at http://169.254.169.254/..., loopback, or an RFC1918 host, and the consumer fetches it during deserialization and parses the response as a schema.

Before, any scheme and any address were fetched. After, the helper requires http/https, resolves the host, and refuses private, loopback, link-local, reserved, multicast, or unspecified targets (including IPv4-mapped IPv6); public URLs still resolve as they did. The check lives in the retrieve callback because that is the single point every $ref lookup passes through, so the sync and async paths are both covered without each caller repeating it. Tradeoff: a schema legitimately served from an internal host is now rejected and has to be reachable at a public address or registered as a named reference.
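The flow described above (scheme check, host resolution, address classification) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this PR: the names `is_blocked_ip` and `guard_external_uri` and the error messages are assumptions made for the example.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_ip(ip):
    """Return True for addresses a $ref fetch should never reach."""
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) so the IPv4 rules apply.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def guard_external_uri(uri):
    """Raise ValueError unless uri is http(s) and resolves only to public IPs."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError("Unsupported schema URI scheme: {!r}".format(parts.scheme))
    host = parts.hostname
    if not host:
        raise ValueError("Schema URI has no host: {!r}".format(uri))
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as ex:
        raise ValueError("Could not resolve schema URI host {}: {}".format(host, ex)) from ex
    for info in infos:
        # info[4][0] is the address string; drop any IPv6 zone id ("%eth0").
        address = info[4][0].split("%", 1)[0]
        if is_blocked_ip(ipaddress.ip_address(address)):
            raise ValueError("Schema URI {} resolves to blocked address {}".format(uri, address))
```

In this sketch a single blocked address among the resolved results rejects the whole URI, which is the conservative choice when a hostname has several records. Note that checking at resolution time does not by itself pin the address the HTTP client later connects to.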

@uwezkhan uwezkhan requested review from a team and Matthew Seal (MSeal) as code owners June 8, 2026 15:07
confluent-cla-assistant Bot commented Jun 8, 2026

🎉 All Contributor License Agreements have been signed. Ready to merge.
✅ uwezkhan
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

Robert Yokota (rayokota) added the component:schema-registry label (any schema registry related issues rather than Kafka-isolated ones) Jun 8, 2026
@rayokota (Member) commented

/sem-approve

Copilot AI left a comment

Pull request overview

This PR hardens JSON Schema $ref retrieval during deserialization by adding an allowlist-style network guard in the shared external-retrieve callback, preventing SSRF-style fetches to internal/non-routable targets while preserving public URL retrieval.

Changes:

  • Add _guard_external_uri() and IP classification helpers to block non-HTTP(S) schemes and non-public resolved targets before fetching.
  • Update _retrieve_via_httpx() to apply the guard and explicitly disable redirect following.
  • Add tests validating that internal/non-HTTP(S) targets are rejected without issuing an HTTP request, and that a public-resolving host is allowed.
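The ordering these changes rely on (guard first, then a single fetch with redirects off) might look like the sketch below. It is an illustration, not the PR's `_retrieve_via_httpx`: `retrieve_schema_text`, `guard`, and `http_get` are hypothetical names, `http_get` stands in for a callable shaped like `httpx.get` (which accepts `follow_redirects`), and treating a 3xx response as an error is this sketch's own choice rather than something the PR description states.

```python
from types import SimpleNamespace


def retrieve_schema_text(uri, guard, http_get):
    """Validate uri, then fetch it once without following redirects."""
    guard(uri)  # expected to raise ValueError before any network I/O happens
    response = http_get(uri, follow_redirects=False)
    if 300 <= response.status_code < 400:
        # A redirect target was never seen by the guard, so refuse it.
        raise ValueError("Refusing redirect while retrieving schema {}".format(uri))
    return response.text


# Stand-ins used to show that a rejected URI never reaches the HTTP layer.
calls = []


def fake_get(uri, follow_redirects):
    calls.append(uri)
    return SimpleNamespace(status_code=200, text='{"type": "object"}')


def reject_everything(uri):
    raise ValueError("blocked: {}".format(uri))


try:
    retrieve_schema_text("http://10.0.0.5/schema", reject_everything, fake_get)
except ValueError:
    pass
```

After the rejected call, `calls` is still empty, which is the property the new tests are described as checking: blocked targets fail before an HTTP request is issued.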

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/confluent_kafka/schema_registry/common/json_schema.py | Adds URI scheme/host validation and IP-range blocking before performing external $ref retrieval. |
| tests/schema_registry/test_json_schema_retrieve.py | Adds regression tests for allowed/blocked $ref retrieval behavior. |

Comment on lines +64 to +75

    def _is_blocked_ip(ip) -> bool:
        mapped = getattr(ip, 'ipv4_mapped', None)
        if mapped is not None:
            ip = mapped
        return (
            ip.is_private
            or ip.is_loopback
            or ip.is_link_local
            or ip.is_reserved
            or ip.is_multicast
            or ip.is_unspecified
        )

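To see what the helper quoted above returns, the sketch below restates it and runs it over a few sample addresses with the standard `ipaddress` module. The sample list and the `results` name are assumptions chosen for illustration; only the function body mirrors the excerpt.

```python
import ipaddress


def _is_blocked_ip(ip) -> bool:
    # IPv6 addresses expose ipv4_mapped; IPv4 addresses do not, hence getattr.
    mapped = getattr(ip, 'ipv4_mapped', None)
    if mapped is not None:
        ip = mapped
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


samples = [
    "169.254.169.254",   # link-local (cloud metadata endpoint)
    "127.0.0.1",         # loopback
    "10.0.0.5",          # RFC 1918 private range
    "::1",               # IPv6 loopback
    "::ffff:127.0.0.1",  # IPv4-mapped IPv6 form of loopback
    "0.0.0.0",           # unspecified
    "224.0.0.1",         # multicast
    "8.8.8.8",           # public address
]

results = {text: _is_blocked_ip(ipaddress.ip_address(text)) for text in samples}
for text, blocked in results.items():
    print("{:<18} blocked={}".format(text, blocked))
```

Every sample except `8.8.8.8` is reported as blocked. Unwrapping `ipv4_mapped` first means the mapped form is judged by the IPv4 rules regardless of how a given Python version classifies the IPv6 wrapper itself.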
Comment on lines +85 to +88

    try:
        infos = socket.getaddrinfo(host, parts.port or (443 if parts.scheme == 'https' else 80))
    except socket.gaierror as ex:
        raise ValueError("Could not resolve schema URI host {}: {}".format(host, ex))

Comment on lines +28 to +35

    @pytest.mark.parametrize("uri", [
        "http://169.254.169.254/latest/meta-data/",
        "http://127.0.0.1:8080/internal",
        "http://10.0.0.5/schema",
        "http://[::1]/schema",
        "file:///etc/passwd",
        "gopher://127.0.0.1/x",
    ])