RFC0055 Identity-Aware Routing by rkoster · Pull Request #535 · cloudfoundry/routing-release

rkoster · 2026-03-05T16:26:52Z

RFC0055: Identity-Aware mTLS Routing

Implements Phase 1 (1a + 1b) of RFC0055: App-to-App mTLS Routing.

Tracking: cloudfoundry/community#1481
Acceptance Testing Guide: https://gist.github.com/rkoster/5b252b0edca606f10be2dbdcb81a796f

What This Does

Enables GoRouter to enforce mutual TLS and identity-based authorization on a per-domain basis. Apps calling routes on configured mTLS domains must present a valid Diego instance identity certificate. GoRouter extracts the caller's app/space/org identity and checks it against route policies before forwarding the request.

Phase 1a: mTLS Infrastructure

Per-domain TLS configuration via GetConfigForClient callback (SNI-based)
Domain-specific client certificate validation against configurable CA
Domain-aware XFCC header handling with two formats:
- raw: base64-encoded full certificate (~1.5KB)
- envoy: compact Hash=...;Subject="..." format (~250 bytes)
SNI/Host mismatch protection (prevents connection reuse attacks across domains)
BOSH job properties for router.domains

Phase 1b: Authorization

Identity extraction from Diego instance identity certificates (Subject DN OUs + SPIFFE URIs)
Pre-selection auth: validates mTLS domain, client cert presence, identity extraction
Post-selection auth: enforces route policies (scope and allowed_sources) against selected endpoint
Supports authorization at app, space, and org granularity
Default deny when no route policies are configured
RTR access logs emitted for denied requests (401/403)

Key Design Decisions

Two-layer authorization: Pre-selection (before endpoint is chosen) handles domain/cert/identity checks. Post-selection (after load balancer picks a backend) handles scope and route-policy checks against the specific endpoint's tags.
Feature is dormant by default: No behavior change unless router.domains is configured in the BOSH manifest and a shared domain with --enforce-access-rules is created.
No regression on existing traffic: Non-mTLS domains are completely unaffected.
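A sketch of the post-selection scope check, assuming scope values `any`, `org` and `space` and an endpoint whose org/space come from its registration tags. `Identity` and `checkScope` are hypothetical, and denying unknown scopes is this sketch's own choice. Because it runs against whichever endpoint the load balancer picked, a shared route can yield the intermittent 403s discussed later in this PR.

```go
package main

import "fmt"

// Identity (hypothetical) is the org/space of either the caller or the
// endpoint selected by the load balancer.
type Identity struct {
	OrgGUID, SpaceGUID string
}

// checkScope compares the caller with the *selected* backend, so the same
// caller may be allowed for one backend of a shared route and denied for another.
func checkScope(scope string, caller, backend Identity) error {
	switch scope {
	case "any":
		return nil
	case "org":
		if caller.OrgGUID != backend.OrgGUID {
			return fmt.Errorf("caller org %s does not match selected backend org %s", caller.OrgGUID, backend.OrgGUID)
		}
		return nil
	case "space":
		if caller.SpaceGUID != backend.SpaceGUID {
			return fmt.Errorf("caller space %s does not match selected backend space %s", caller.SpaceGUID, backend.SpaceGUID)
		}
		return nil
	default:
		// Unknown scopes are denied rather than silently allowed.
		return fmt.Errorf("unknown route policy scope %q", scope)
	}
}
```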

Testing

Unit tests for all new handlers and config validation
Integration tests for end-to-end mTLS routing flows
BOSH template tests for configuration rendering
CI runs go fmt, go vet, staticcheck, ginkgo with --race

Configuration Example

# BOSH manifest (via ops-file)
router:
  domains:
    - name: "*.apps.identity"
      ca_certs: "((diego_instance_identity_ca.certificate))"
      forwarded_client_cert: sanitize_set
      xfcc_format: envoy

Related PRs

Component	PR	Status
cloud_controller_ng	cloudfoundry/cloud_controller_ng#4910	Open
capi-release	cloudfoundry/capi-release#625	Open
CLI	cloudfoundry/cli#3758	Draft

Merge Ordering

All PRs are independently safe to merge — the feature is dormant without the ops-file and domain creation. No strict ordering required. Recommend merging around the same time once all are approved.

AI Disclosure

This PR was developed with AI assistance. All code has been read and verified manually. Each error path, branch, and edge case has corresponding test coverage.

rkoster · 2026-04-16T12:43:52Z

Latest Update: RFC-Compliant Post-Selection Authorization

Implemented a breaking change that replaces pre-selection authorization with strict post-selection enforcement, per RFC lines 475-517.

Key Changes (commit `cbf0695`)

Architecture:

✅ Composable PostSelectionHandler interface for middleware pipeline
✅ Separation of pre-selection checks (SNI, route lookup, identity) from post-selection authorization
✅ Immediate 403 on authorization failure (non-retriable, per RFC)
✅ Post-selection scope checking with :post-selection suffix in metrics

Implementation:

handlers/post_selection_pipeline.go - Infrastructure for composable checks
handlers/mtls_scope_auth.go - Org/space boundary enforcement
handlers/mtls_access_rules_auth.go - Access rules evaluation (cf:app:, cf:space:, etc.)
handlers/mtls_pre_auth.go - Pre-selection checks only
handlers/mtls_auth_error.go - Custom error type with Rule/Reason/HTTPStatus
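A sketch of access-rule evaluation against the selected endpoint's policies. It covers only the `cf:app:`, `cf:space:` and `cf:org:` prefixes named above and follows the semantics described in later commits on this PR: an empty scope means enforcement is off, enforcement with no policies is default deny, and an empty GUID never matches. `Caller` and `allowedBySources` are hypothetical names.

```go
package main

import "strings"

// Caller (hypothetical) is the identity extracted from the client certificate.
type Caller struct{ App, Space, Org string }

// allowedBySources reports whether any of the selected endpoint's route
// policies admits the caller.
func allowedBySources(scope string, policies []string, c Caller) bool {
	if scope == "" {
		return true // enforcement disabled: let the backend decide
	}
	for _, p := range policies {
		var want, have string
		switch {
		case strings.HasPrefix(p, "cf:app:"):
			want, have = strings.TrimPrefix(p, "cf:app:"), c.App
		case strings.HasPrefix(p, "cf:space:"):
			want, have = strings.TrimPrefix(p, "cf:space:"), c.Space
		case strings.HasPrefix(p, "cf:org:"):
			want, have = strings.TrimPrefix(p, "cf:org:"), c.Org
		default:
			continue // unrecognized rule: never grants access
		}
		// Empty-GUID guard: "cf:app:" alone must not match an empty caller field.
		if want != "" && want == have {
			return true
		}
	}
	return false // default deny, including the empty-policy case
}
```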

Test Coverage:

+44 new tests (14 scope + 17 access rules + 13 pipeline)
+4 integration tests for shared route scenarios
All 393 tests passing

RFC Compliance

✅ Intermittent 403s - Expected for shared routes across scope boundaries (RFC-compliant)
✅ Error messages - Include "caller org X does not match selected backend org Y"
✅ Strict enforcement - Prevents unauthorized cross-scope access

Breaking Change

⚠️ This replaces the permissive pre-selection authorization entirely. No feature flag provided as this is a security improvement required by the RFC.

Deprecated:

handlers/mtls_authorization.go (old implementation with migration notes)
route/pool.go EndpointOrgIDs/SpaceIDs methods

Integration Test Results

All integration tests compile successfully. Shared route scenarios validate:

Intermittent 403s with scope=space (different spaces in same org)
Always succeed with scope=org (same org, different spaces)
Always fail with scope=org (different orgs)
Per-endpoint access rules with intermittent behavior

Ready for full integration test run and review.

rkoster · 2026-04-16T13:37:29Z

Refactoring: AuthError for Future Extensibility

Commit: 4ff64b9

Renamed MtlsAuthError to AuthError to prepare for future authentication methods beyond mTLS, such as SPIFFE JWT tokens.

Changes

✅ Renamed handlers/mtls_auth_error.go → handlers/auth_error.go
✅ Updated struct, constructor functions, and all references
✅ Changed error messages from "mTLS authorization denied" to "authorization denied"
✅ Updated all test files

Benefits

🔮 Future-proof: Ready for SPIFFE JWT token authentication
🏗️ Generic design: Error type not tied to specific auth mechanism
🧩 Reusable: Can be used across different authentication methods
✨ Clean: Better naming convention for authorization errors

No functional changes - pure refactoring for extensibility.

HTTP clients that include explicit ports in URLs (e.g., https://app.example.com:443/) result in Go's http.Request.Host containing the port (app.example.com:443). Previously, GetMtlsDomainConfig() did not strip the port before matching against configured mTLS domains (e.g., *.apps.identity), causing:
- Domain matching to fail for requests with explicit ports
- No XFCC header added (fell back to default behavior)
- Identity extraction failure in CallerIdentity
- Pre-auth handler denying requests with 403 and reason "identity-extraction-failed"

This particularly affected Java Spring Boot HTTP clients, which construct URLs with explicit ports by default.

Fix: Use net.SplitHostPort() to strip the port before domain matching, ensuring consistent behavior regardless of whether clients include explicit ports.

Added comprehensive unit tests covering:
- Wildcard domain matching with/without ports
- Exact domain matching with/without ports
- IsMtlsDomain() function with/without ports
- Negative test cases for non-mTLS domains
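The port-stripping step on its own; `stripPort` is a hypothetical name. `net.SplitHostPort` returns an error when the input has no port, and in that case the host is used unchanged:

```go
package main

import "net"

// stripPort removes an optional ":port" suffix, including from bracketed IPv6 hosts.
func stripPort(host string) string {
	if h, _, err := net.SplitHostPort(host); err == nil {
		return h
	}
	return host // no port present
}
```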

The test was incorrectly expecting 403 Forbidden when a route is registered on an mTLS domain without route policy enforcement enabled. The correct behavior is to allow the request through (200 OK) and let the backend handle authorization. Route policy enforcement is controlled by Cloud Controller via the RoutePolicyScope field. When RoutePolicyScope is empty (enforcement disabled), GoRouter allows authenticated requests through. Default-deny only applies when enforcement IS enabled but no policies are configured.

- Fix domainMatches wildcard to only match single DNS label (security)
- Improve SNI/Host mismatch security checks to prevent domain confusion attacks
- Add AuthError.ClientMessage() to prevent information leakage in error responses
- Add nil CallerIdentity checks in post-selection auth handlers (defense in depth)
- Set AuthResult.Outcome in all auth success paths for proper observability
- Add proper error handling for response writes in proxy error handler
- Remove unnecessary blank type assertion
- Add comprehensive unit tests for mtls_pre_auth handler

- Fix scope=any to populate AuthResult for access log consistency
- Add AuthResult assertions to scope=any test case
- Add test coverage for unknown RoutePolicyScope default case
- Add empty GUID guard to cf:app: policy for consistency with cf:space:/cf:org:

- Fix wildcard domain matching inconsistency between GetMtlsDomainConfig and domainMatches to only match single-level subdomains
- Add bounds check for empty Subject field in XFCC header parsing
- Change RoutePool nil handling from silent bypass to explicit denial with error logging to prevent authorization bypass
- Improve error messages by including malformed DN in error output
- Add comprehensive test coverage for edge cases including multi-level subdomains, empty Subject strings, and nil RoutePolicies
- Update existing tests to validate new security-focused behavior

All tests pass (163 config, 390 handlers) and code passes go vet, gofmt, and staticcheck linting.

…line
- Fix dead-code bug: skip internal error handler for *AuthError in proxy_round_tripper so ReverseProxy.ErrorHandler can write the 403
- Fix error leak: replace err.Error() with generic status text in fallback error handler to avoid exposing internal details
- Extract handleReverseProxyError() as testable package-level function
- Add unit tests for handleReverseProxyError (proxy_error_handler_test.go)
- Add post-selection pipeline tests in proxy_round_tripper_test.go
- Add Layer 0 security branch test in mtls_pre_auth_test.go

Add ERB template validation that raises a deployment error when xfcc_format is configured alongside forwarded_client_cert: always_forward on an mTLS domain. In always_forward mode the XFCC header is passed through untouched, so xfcc_format has no effect and the combination indicates operator misconfiguration. Add rspec coverage for the new validation and surrounding valid combinations (sanitize_set+envoy, always_forward alone, xfcc_format without explicit forwarded_client_cert).

Previously this combination was only rejected by the BOSH template at deploy time. With gorouter now used outside of BOSH (cf-on-kind), the Go config must also enforce this constraint. Also removes dead code in GetMtlsDomainConfig wildcard matching where the strings.Contains check was redundant due to SplitN guarantees.
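A sketch of the Go-side validation. It assumes a `DomainConfig` struct that mirrors one `router.domains` entry and GoRouter's existing `forwarded_client_cert` modes (`always_forward`, `forward`, `sanitize_set`); the type, method, and messages are illustrative only.

```go
package main

import "fmt"

// DomainConfig (hypothetical) mirrors one router.domains entry.
type DomainConfig struct {
	Name                string
	CACerts             string
	ForwardedClientCert string // always_forward | forward | sanitize_set
	XFCCFormat          string // raw | envoy
}

// Validate mirrors the deploy-time template checks so non-BOSH deployments get them too.
func (d DomainConfig) Validate() error {
	if d.Name == "" {
		return fmt.Errorf("router.domains: name is required")
	}
	if d.CACerts == "" {
		return fmt.Errorf("router.domains[%s]: ca_certs is required", d.Name)
	}
	switch d.ForwardedClientCert {
	case "", "always_forward", "forward", "sanitize_set":
	default:
		return fmt.Errorf("router.domains[%s]: invalid forwarded_client_cert %q", d.Name, d.ForwardedClientCert)
	}
	switch d.XFCCFormat {
	case "", "raw", "envoy":
	default:
		return fmt.Errorf("router.domains[%s]: invalid xfcc_format %q", d.Name, d.XFCCFormat)
	}
	// always_forward passes XFCC through untouched, so xfcc_format would be ignored.
	if d.ForwardedClientCert == "always_forward" && d.XFCCFormat != "" {
		return fmt.Errorf("router.domains[%s]: xfcc_format has no effect with forwarded_client_cert: always_forward", d.Name)
	}
	return nil
}
```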

…ort/readyreader to routing-api BOSH package spec

- Rename caller_app/space/org → caller_cf_app/space/org for clarity
- Remove auth, auth_rule, auth_denied_reason fields (not needed)
- Always emit tls_sni and caller_cf_* fields with "-" when empty
- Removes conditional emission that caused inconsistent log output
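A sketch of the "always emit, dash when empty" rule. The `key:"value"` layout and the helper names are assumptions about the access log format; this is not GoRouter's formatter.

```go
package main

import (
	"fmt"
	"strings"
)

// orDash returns "-" for empty values so every access-log line has the same fields.
func orDash(s string) string {
	if s == "" {
		return "-"
	}
	return s
}

// mtlsLogFields (hypothetical) renders the mTLS-related access-log fields.
func mtlsLogFields(sni, app, space, org string) string {
	parts := []string{
		fmt.Sprintf("tls_sni:%q", orDash(sni)),
		fmt.Sprintf("caller_cf_app:%q", orDash(app)),
		fmt.Sprintf("caller_cf_space:%q", orDash(space)),
		fmt.Sprintf("caller_cf_org:%q", orDash(org)),
	}
	return strings.Join(parts, " ")
}
```

Emitting a fixed set of fields keeps log parsers simple: a missing identity shows up as `-` rather than as a missing key.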

Per-request denial log statements (mtls-route-policies-denied, mtls-pre-auth-denied, mtls-scope-auth-denied, post-selection-auth-denied) now log at DEBUG level to avoid log volume amplification in production. The access log already captures all denial information via caller_cf_* fields and HTTP status codes. These DEBUG logs remain available for local debugging when operators enable debug-level logging.

…ation tests
- Update router.client_cert_validation description to note that router.domains enforce mTLS independently
- Update router.domains description to clarify relationship with router.client_cert_validation
- Add rspec tests for all ERB template validation branches: non-array input, non-hash entry, missing/empty name, missing/empty ca_certs, invalid forwarded_client_cert mode, and invalid xfcc_format value

Addresses PR #535 review threads 1-8.

- Rename identityHandler to cfIdentityHandler / NewCfIdentity to clarify it is specific to CF app instance identity certificates (thread 9)
- Guard identity extraction: only run when (1) TLS was used and (2) the host is a configured mTLS domain, preventing XFCC header spoofing on non-mTLS routes (thread 10)
- Move MtlsPreAuth handler above ClientCert in the proxy chain so a 421 response skips unnecessary certificate processing (thread 11)
- Use configured xfcc_format from domain config instead of auto-detecting format at runtime; reject if format doesn't match (thread 12)

All 386 handler tests and 179 proxy tests passing.

Split MtlsPreAuth into MtlsSniCheck (early 421) and MtlsPreAuth (post-CfIdentity 403) to fix the handler ordering regression from ac2e87e where moving MtlsPreAuth above ClientCert/CfIdentity caused CallerIdentity to always be nil, denying all mTLS app-to-app requests with 403.

Handler chain order is now: Lookup → MtlsSniCheck → ClientCert → CfIdentity → MtlsPreAuth

Additional PR #535 review feedback addressed:
- Add route_policy field to access logs (renamed from auth_rule, always emitted with '-' when empty)
- Remove per-request denial/granted log statements entirely (they duplicate access log information)
- Move routePolicies/routePolicyScope from endpoint to pool-level fields to avoid stale data and reduce mutex contention

All handler, access log, route, and proxy tests passing.

Cover the behavior introduced when moving route policy fields from endpoint to pool level: initial state, Put updates, re-Put updates, persistence after Remove, and default-deny (empty policies with scope).

DNS hostnames are case-insensitive per RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt), but IsMtlsDomain() and GetMtlsDomainConfig() used case-sensitive map lookups. This caused mTLS domain matching to fail when clients sent uppercase or mixed-case hostnames in the Host header or SNI field.

Fix by normalizing domain names to lowercase both when storing in mtlsDomainMap (in processMtlsDomains) and when looking up in GetMtlsDomainConfig.

Added unit tests covering:
- Wildcard domain matching with uppercase host
- Exact domain matching with mixed case host
- Matching with uppercase host and port
- IsMtlsDomain with various case combinations
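A sketch of normalizing on both store and lookup, with the single-label wildcard fallback done by splitting off the first label. `domainIndex` and its methods are hypothetical names and do not reproduce the real `mtlsDomainMap` code.

```go
package main

import "strings"

// domainIndex (hypothetical) maps lowercase domain patterns to a CA bundle name.
type domainIndex map[string]string

// newDomainIndex lowercases names at store time, since DNS names are case-insensitive.
func newDomainIndex(domains map[string]string) domainIndex {
	idx := domainIndex{}
	for name, ca := range domains {
		idx[strings.ToLower(name)] = ca
	}
	return idx
}

// lookup lowercases the host, tries an exact match, then the single-label
// wildcard "*.<parent>" for the host's immediate parent domain.
func (idx domainIndex) lookup(host string) (string, bool) {
	host = strings.ToLower(host)
	if ca, ok := idx[host]; ok {
		return ca, true
	}
	if _, parent, found := strings.Cut(host, "."); found {
		ca, ok := idx["*."+parent]
		return ca, ok
	}
	return "", false
}
```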

The route policies auth handler was using pool-level policies instead of endpoint-level policies. This caused authorization failures when multiple endpoints on the same route have different route policies (e.g., backend-1 allows app-1, backend-2 allows app-2). Now uses the selected endpoint's RoutePolicies field which is already passed to the Check method, enabling per-endpoint authorization decisions. Fixes CI test: allows only the specified app and denies others (per-endpoint rules)

ameowlia · 2026-05-27T16:26:14Z

+// domainMatches checks if a hostname matches a domain pattern (supports wildcard domains).
+// Wildcard patterns (*.domain) only match a single DNS label, not multiple levels.
+func domainMatches(hostname, domainPattern string) bool {
+	if hostname == domainPattern {


I think there is still a case sensitivity issue here because there is nothing that lowercases the hostname.

Fixed in d3f4135 — domainMatches() now normalizes both hostname and domainPattern to lowercase before comparison, per RFC 1035. Added tests for uppercase hostnames, mixed-case suffixes, and mixed-case domain patterns.
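A sketch of what a case-insensitive, single-label `domainMatches` could look like; it is not the code in d3f4135. It requires Go 1.20+ for `strings.CutPrefix`/`strings.CutSuffix`.

```go
package main

import "strings"

// domainMatches reports whether hostname matches domainPattern, comparing
// case-insensitively; "*.example.com" matches exactly one extra DNS label.
func domainMatches(hostname, domainPattern string) bool {
	hostname = strings.ToLower(hostname)
	domainPattern = strings.ToLower(domainPattern)
	if hostname == domainPattern {
		return true
	}
	suffix, isWildcard := strings.CutPrefix(domainPattern, "*")
	if !isWildcard || !strings.HasPrefix(suffix, ".") {
		return false
	}
	label, ok := strings.CutSuffix(hostname, suffix)
	// The wildcard must cover exactly one non-empty label.
	return ok && label != "" && !strings.Contains(label, ".")
}
```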

ameowlia · 2026-05-27T16:30:07Z

+		}
+
+		// Store with lowercase key for case-insensitive matching (RFC 1035)
+		c.mtlsDomainMap[strings.ToLower(domain.Domain)] = domain


There is still a casing issue here. The map key is lowercased. But domain.Domain is not.

I think it should be consistent and lowercased everywhere.

Fixed in d3f4135 — domain.Domain is now normalized to lowercase when storing, not just the map key. This ensures downstream code (like domainMatches) receives consistent lowercase values. Added tests verifying cfg.Domain returns lowercase even when configured with mixed case.

ameowlia · 2026-05-27T16:52:41Z

I found more per-request logs, which goes against the logging standards for gorouter.

mtls_sni_check.go:52 - Non-mTLS domain with cert required
mtls_sni_check.go:69 - mTLS domain without proper cert validation
mtls_scope_auth.go:41 - Route pool missing during scope auth
mtls_route_policies_auth.go:72 - Route pool missing during policies auth
router.go:399 - mTLS domain detected during TLS handshake

All of these should already result in access logs (I think). Thus these logs are duplicative and should be deleted.

ameowlia · 2026-05-27T16:58:25Z

+	n.Use(handlers.NewMtlsSniCheck(cfg, logger))
 	n.Use(handlers.NewClientCert(
 		SkipSanitize(routeServiceHandler.(*handlers.RouteService)),
 		ForceDeleteXFCCHeader(routeServiceHandler.(*handlers.RouteService), cfg.ForwardedClientCert, logger),
 		cfg.ForwardedClientCert,
+		cfg,
 		logger,
 		errorWriter,
 	))
+	n.Use(handlers.NewCfIdentity(cfg))
+	n.Use(handlers.NewMtlsPreAuth(cfg, logger))


Can you add ifs so that these handlers aren't always run? I suggest something like...

shouldRunMtlsHandlers := len(config.Domains) > 0 || config.GlobalMtlsEnabled

Actually, DOES this code work when config.GlobalMtlsEnabled is true? If so, should all domains automatically work with route policies, or do you want to force people to configure the domains in that case?

Done, I went with the no-op handler pattern instead of adding conditionals in proxy.go.

When I tried to write tests for conditional wrapping in proxy.go, I ran into a code smell: the tests needed intimate knowledge of handler internals to verify observable behavior differences. That coupling suggested the abstraction was in the wrong place.

So instead, the handler constructors now return a no-op handler when len(cfg.Domains) == 0. This keeps the decision inside the handler package and is straightforward to test. See commit a4c165a.

Regarding GlobalMtlsEnabled: I think requiring explicit router.domains config makes sense for identity-aware routing. The feature is strongest when a dedicated CA is configured for the identity domain. With global client_cert_validation: require, operators would need to add off-platform client CAs to the trust store to allow external traffic — but then those external certs could potentially spoof CF identity OUs (app:, space:, org:), weakening the trust model. A dedicated domain with only the Diego CA keeps the identity guarantees tight.

Address PR #535 review threads 14, 15, 16:
- Normalize domain.Domain to lowercase when storing in mtlsDomainMap, not just the map key (thread 16)
- Make domainMatches() case-insensitive by lowercasing both hostname and pattern before comparison (thread 15)

Added 9 new tests covering mixed-case domain configuration and hostname matching. All 569 tests passing (175 config + 394 handlers).

…g standards

Remove duplicative per-request logs that violate gorouter logging standards. Access logs already capture all necessary information via status codes and the caller_cf_* fields.

Removed log statements:
- clientcert.go: using-mtls-domain-xfcc-config (Debug)
- mtls_sni_check.go: mtls-enforcement-mismatch (Warn) x2
- mtls_scope_auth.go: mtls-scope-auth-no-route-pool (Error)
- mtls_route_policies_auth.go: mtls-route-policies-auth-no-route-pool (Error)
- router.go: mtls-domain-detected (Debug)

Addresses PR #535 comment: #535 (comment)

rkoster · 2026-05-28T07:30:12Z

I found more per-request logs, which goes against the logging standards for gorouter.

mtls_sni_check.go:52 - Non-mTLS domain with cert required
mtls_sni_check.go:69 - mTLS domain without proper cert validation
mtls_scope_auth.go:41 - Route pool missing during scope auth
mtls_route_policies_auth.go:72 - Route pool missing during policies auth
router.go:399 - mTLS domain detected during TLS handshake

Removed all 5 in f100947. Also found and removed one additional per-request log: clientcert.go:63 (using-mtls-domain-xfcc-config).

mTLS handler constructors now return NoopHandler/NoopPostSelectionHandler when len(cfg.Domains) == 0, avoiding unnecessary handler instantiation. This keeps the conditional logic encapsulated in the handler package rather than coupling proxy setup to handler internals.

Handlers affected:
- NewMtlsSniCheck -> NoopHandler
- NewCfIdentity -> NoopHandler
- NewMtlsPreAuth -> NoopHandler
- NewMtlsScopeAuth -> NoopPostSelectionHandler
- NewMtlsRoutePoliciesAuth -> NoopPostSelectionHandler

Tests added to each handler's test file verifying constructor behavior.

…nt output

This was referenced Mar 5, 2026

RFC0055 Identity-Aware Routing cloudfoundry/cloud_controller_ng#4910

Open

Update cloud_controller_ng for mTLS app-to-app routing (RFC draft) cloudfoundry/capi-release#625

Draft

RFC: Domain-Scoped mTLS for GoRouter cloudfoundry/community#1438

Merged

cf-foundation-community-automation Bot added this to Application Runtime Platform Working Group Mar 5, 2026

cf-foundation-community-automation Bot moved this to Inbox in Application Runtime Platform Working Group Mar 5, 2026

rkoster force-pushed the feature/app-to-app-mtls-routing branch from 46b4007 to 23b4a0c Compare April 3, 2026 15:10

rkoster force-pushed the feature/app-to-app-mtls-routing branch 3 times, most recently from 1f9b804 to 79271b7 Compare April 17, 2026 12:12

rkoster added ready-to-run and removed ready-to-run labels Apr 17, 2026

rkoster force-pushed the feature/app-to-app-mtls-routing branch from 5cc4170 to b875867 Compare April 20, 2026 09:18

rkoster added ready-to-run and removed ready-to-run labels Apr 20, 2026

rkoster added 18 commits May 27, 2026 14:33

fix: apply gofmt to mtls_route_policies_auth_test.go

3fe508b

go mod tidy && go mod vendor

84395ca

fix: add missing locket/lock, cactus/go-statsd-client and grpc/transp…

0a83573

…ort/readyreader to routing-api BOSH package spec

test: add unit tests for pool-level RoutePolicyScope and RoutePolicies

94a81c4

Cover the behavior introduced when moving route policy fields from endpoint to pool level: initial state, Put updates, re-Put updates, persistence after Remove, and default-deny (empty policies with scope).

rkoster force-pushed the feature/app-to-app-mtls-routing branch from 82e9fa1 to 846c45b Compare May 27, 2026 13:50

ameowlia reviewed May 27, 2026


rkoster added 2 commits May 28, 2026 09:07

rkoster added 3 commits May 28, 2026 13:04

style: gofmt mtls_route_policies_auth_test.go

36868de

feat: include route_policy_scope and route_policies in /routes endpoi…

0c7d4ab

…nt output

rkoster wants to merge 69 commits into develop from feature/app-to-app-mtls-routing

2 participants
