[codex] integrate vivaldi latency state#61

Open
miciav wants to merge 63 commits into main from faas-platform-abstraction

Conversation

@miciav miciav commented Mar 27, 2026

Member

Summary

  • add Vivaldi latency-state support on top of the existing libp2p peer-discovery layer
  • introduce common coordinate messages and shared node-table latency/coordinate state without changing current strategy behavior
  • add targeted tests for Vivaldi manager lifecycle, error paths, and startup config wiring

Why

This adds latency-awareness infrastructure to the DFaaS agent while keeping the current Kademlia/mDNS discovery flow and existing strategies unchanged. It prepares the agent for future latency-aware forwarding decisions without introducing a second membership system.
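
For reviewers unfamiliar with Vivaldi, the coordinate update it relies on can be sketched as follows. This is a minimal textbook-style illustration, not the code in `agent/latency/vivaldi`: the `Coord` type, the 2-D space, the constants `ce` and `cc`, and the fixed fallback direction for coincident points are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
)

// Coord is a hypothetical 2-D network coordinate with a local error estimate.
type Coord struct {
	X, Y  float64
	Error float64
}

const (
	ce = 0.25 // assumed smoothing factor for the error estimate
	cc = 0.25 // assumed scaling factor for the movement step
)

// Distance is the Euclidean distance between two coordinates.
func Distance(a, b Coord) float64 { return math.Hypot(a.X-b.X, a.Y-b.Y) }

// Update moves local so that its distance to remote better matches the
// measured round-trip time rtt (same unit as the coordinates, e.g. seconds).
func Update(local, remote Coord, rtt float64) Coord {
	if rtt <= 0 {
		return local // ignore invalid samples
	}
	d := Distance(local, remote)

	// Weight the sample by how uncertain we are relative to the remote node.
	w := 0.5
	if sum := local.Error + remote.Error; sum > 0 {
		w = local.Error / sum
	}

	// Blend the relative error of this sample into the local estimate.
	sampleErr := math.Abs(d-rtt) / rtt
	local.Error = sampleErr*ce*w + local.Error*(1-ce*w)

	// Unit vector pointing from remote to local. Vivaldi normally picks a
	// random direction for coincident points; a fixed one keeps this short.
	ux, uy := 1.0, 0.0
	if d > 0 {
		ux, uy = (local.X-remote.X)/d, (local.Y-remote.Y)/d
	}

	// Move away from remote if it looks too close, towards it if too far.
	step := cc * w * (rtt - d)
	local.X += step * ux
	local.Y += step * uy
	return local
}

func main() {
	local := Coord{Error: 1}
	remote := Coord{X: 0.010, Error: 1}
	for i := 0; i < 5; i++ {
		local = Update(local, remote, 0.050)
		fmt.Printf("distance=%.4f error=%.3f\n", Distance(local, remote), local.Error)
	}
}
```

Each sample only nudges the coordinate, so the predicted distance converges gradually on the measured RTT instead of jumping to it.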

Test Plan

  • go test ./...
  • go test -cover ./agent/latency/vivaldi
  • go test -cover ./agent/latency/vivaldi ./agent/msgtypes ./agent/nodestbl ./agent

@miciav miciav requested a review from ema-pe March 27, 2026 10:01
@miciav miciav marked this pull request as ready for review March 27, 2026 10:01
miciav and others added 28 commits March 27, 2026 11:05
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…trategies

Wire faasprovider.FaaSProvider into RecalcStrategy, NodeMarginStrategy,
StaticStrategy, and AllLocalStrategy; use faasprovider.NewFaaSProvider()
in all four strategy factory methods, removing direct offuncs/ofpromq deps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dcoded OpenFaaS endpoint

Create a FaaSProvider once in agent.go and pass it to httpserver.Initialize(),
replacing the hardcoded healthCheckOpenFaaS() private function with a call to
_faasProvider.HealthCheck() so the health check is platform-agnostic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements faasprovider.FaaSProvider for Apache OpenWhisk by adding
openwhisk.Client that fetches action metadata (name, dfaas.maxrate,
dfaas.timeout_ms) from the /api/v1/namespaces/{ns}/actions endpoint.
Wires the new client into the factory so AGENT_FAAS_PLATFORM=openwhisk
is now functional. Prometheus query methods are stubbed for Task 7.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Change owAnnotation.Value from string to json.RawMessage so that
  non-string annotation values (objects, booleans, numbers) unmarshal
  correctly; update annotation() helper to unquote JSON strings or
  fall back to the raw representation
- Replace deprecated ioutil.ReadAll with io.ReadAll (Go 1.16+)
- Fix HealthCheck() to send the Authorization header via http.NewRequest
  instead of the bare http.Get call that always fails on secured deployments
- Add an httpClient field (30 s timeout) to Client; use it in both
  doActionsRequest and HealthCheck instead of http.DefaultClient
- Add TestNewFaaSProvider_OpenWhisk to factory_test.go
- Update mockOpenWhiskServer to reject requests to unexpected URL paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
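
The `json.RawMessage` change described in the commit above can be illustrated with a small sketch. Only `owAnnotation.Value` and the unquote-or-fall-back behaviour come from the commit message; the `Key` field, the JSON tags, and the helper name `annotationValue` are assumptions for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// owAnnotation mirrors a key/value annotation entry. Keeping Value as raw
// JSON lets strings, numbers, booleans, and objects all decode without error.
type owAnnotation struct {
	Key   string          `json:"key"`
	Value json.RawMessage `json:"value"`
}

// annotationValue returns the unquoted text when raw holds a JSON string and
// the raw JSON representation otherwise.
func annotationValue(raw json.RawMessage) string {
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		return s
	}
	return string(raw)
}

func main() {
	payload := []byte(`[
		{"key": "dfaas.maxrate", "value": "100"},
		{"key": "dfaas.timeout_ms", "value": 5000},
		{"key": "final", "value": true}
	]`)

	var anns []owAnnotation
	if err := json.Unmarshal(payload, &anns); err != nil {
		panic(err)
	}
	for _, a := range anns {
		fmt.Printf("%s = %s\n", a.Key, annotationValue(a.Value))
	}
}
```

With a plain `string` field, the second and third entries would make the whole decode fail, which is the behaviour the commit fixes.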
- Add prometheusHost field to Client struct and NewWithPrometheus constructor
  so tests can inject a mock Prometheus server without touching production config
- Add promQuery helper that uses c.httpClient (not DefaultClient) with proper
  body draining and io.ReadAll (no ioutil)
- Implement QueryAFET using openwhisk_action_duration_seconds_sum/count with
  the "action" label instead of OpenFaaS "function_name"
- Implement QueryInvoc using openwhisk_action_activations_total; map OpenWhisk
  "status" label ("success" -> "200", anything else -> "500")
- Implement QueryServiceCount using kube_deployment_status_replicas filtered
  by the client namespace, keyed by "deployment" label
- Implement QueryCPUusage and QueryRAMusage using identical node-exporter PromQL
  as ofpromq (node-level metrics are platform-independent)
- Implement QueryCPUusagePerFunction and QueryRAMusagePerFunction using
  cAdvisor container_* metrics, keyed by "container" label
- Add promtypes.go with typed response structs for all five query shapes
- Add TDD tests for all seven Query* methods via mockPrometheusServer helper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…OpenWhisk promquery

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…templates

Rename the OpenFaaSHost and OpenFaaSPort fields in the HACfg base struct
and all derived structs/templates to FaaSHost and FaaSPort, making the
internal naming platform-agnostic. Config struct fields (used for env
var mappings) are intentionally left unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ath rewrite

Adds BackendPathPrefix helper in faasprovider, FaaSBackendPath field to HACfg,
and http-request replace-path rule in all four HAProxy templates so /function/<name>
is transparently rewritten to the platform-specific path (no-op for OpenFaaS,
/api/v1/namespaces/<ns>/actions/<name> for OpenWhisk).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
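
As a rough illustration of the path mapping described in the commit above, the following sketch shows the intended input/output behaviour in Go. The function names, signatures, and the `"openwhisk"` platform string are assumptions; the real `BackendPathPrefix` helper and the HAProxy rule may be shaped differently.

```go
package main

import (
	"fmt"
	"strings"
)

// frontPrefix is the path prefix clients use regardless of platform.
const frontPrefix = "/function/"

// backendPathPrefix returns the platform-specific prefix that replaces
// frontPrefix. For OpenFaaS the prefix is unchanged, so the rewrite is a no-op.
func backendPathPrefix(platform, namespace string) string {
	if platform == "openwhisk" {
		return "/api/v1/namespaces/" + namespace + "/actions/"
	}
	return frontPrefix
}

// rewritePath swaps frontPrefix for the backend prefix and leaves any other
// path untouched.
func rewritePath(path, backendPrefix string) string {
	if !strings.HasPrefix(path, frontPrefix) {
		return path
	}
	return backendPrefix + strings.TrimPrefix(path, frontPrefix)
}

func main() {
	for _, platform := range []string{"openfaas", "openwhisk"} {
		prefix := backendPathPrefix(platform, "guest")
		fmt.Printf("%-9s %s\n", platform, rewritePath("/function/hello", prefix))
	}
}
```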
- Add http-request replace-path to per-function be_{{funcName}} backends
  in haproxycfgnms.tmpl and haproxycfgstatic.tmpl so OpenWhisk receives
  the correct /api/v1/namespaces/<ns>/actions/<name> path for locally-
  weighted direct-client requests, not just node-forwarded ones.

- Rename AGENT_OPENFAAS_HOST/PORT → AGENT_FAAS_HOST/AGENT_FAAS_PORT
  across config.go, agent.go, strategyfactory.go, all four strategy
  files, values.yaml, values-openwhisk.yaml, and docs/commands.md.
  The old names implied OpenFaaS even when OpenWhisk was configured.

- Escape function names with regexp.QuoteMeta before building the
  container=~ regex filter in QueryCPUusagePerFunction and
  QueryRAMusagePerFunction to prevent PromQL injection when action
  names contain regex metacharacters (e.g. dots in package names).

- Remove unreachable dead code in staticstrategy.publishNodeInfo()
  (duplicate GetFuncsNames call after return nil).

- Remove CLAUDE.md from .gitignore (the file is tracked; the entry
  was a no-op and only caused confusion).

- Pin OpenWhisk Helm chart to --version 1.0.1 in docs/commands.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
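
The `regexp.QuoteMeta` escaping mentioned in the commit above can be sketched as below. The helper name `escapedAlternation` is hypothetical, and the sketch only builds the regex pattern; how the pattern is embedded in the final PromQL string is left out. Because PromQL double-quoted strings process backslash escapes themselves, the embedding step may need its own escaping or a backtick-quoted literal.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// escapedAlternation builds a regex alternation that matches each name
// literally, so metacharacters in a name cannot widen the match.
func escapedAlternation(names []string) string {
	escaped := make([]string, len(names))
	for i, n := range names {
		escaped[i] = regexp.QuoteMeta(n)
	}
	return strings.Join(escaped, "|")
}

func main() {
	names := []string{"pkg.resize", "thumb|nail"}
	pattern := escapedAlternation(names)
	fmt.Println(pattern)

	// Without escaping, the dot would match any character.
	re := regexp.MustCompile("^(" + pattern + ")$")
	fmt.Println(re.MatchString("pkg.resize"), re.MatchString("pkgXresize"))
}
```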
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove e2e_render_control_plane_sync_env, e2e_deploy_control_plane,
e2e_deploy_function_runtime, e2e_verify_core_pods_running,
e2e_verify_control_plane_health, and e2e_dump_core_pod_logs — all
NanoFaaS-specific helpers that have no place in the generic DFaaS
e2e library.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Delete e2e_register_pool_function, e2e_kubectl_curl_control_plane,
e2e_extract_json_by_field, e2e_extract_execution_id,
e2e_extract_execution_status, e2e_extract_bool_field,
e2e_extract_numeric_field, e2e_invoke_sync_message, e2e_enqueue_message,
e2e_fetch_execution, e2e_wait_execution_success, e2e_enqueue_message_burst,
e2e_get_control_plane_pod_name, and e2e_fetch_control_plane_prometheus.
These were NanoFaaS-specific helpers with no role in the DFaaS e2e suite.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
miciav and others added 29 commits March 27, 2026 11:05
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces MakeCommonCallback in agent/commondispatch.go, which wraps any
strategy's OnReceived callback with a pre-filter that intercepts common
broadcast messages (heartbeat, overload_alert, function_event) and updates
the CommonNodeTable before delegating to the strategy callback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
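
The wrapping pattern behind `MakeCommonCallback` might look roughly like this. The message type names come from the commit message; the envelope field names, the `nodeTable` type, and the callback signature are assumptions, and the real `CommonNodeTable` certainly stores more than the last message type.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// envelope is a hypothetical minimal header shared by all messages.
type envelope struct {
	Type   string `json:"type"`
	Sender string `json:"sender"`
}

// nodeTable is a stand-in for the shared node table.
type nodeTable struct {
	mu       sync.Mutex
	lastType map[string]string
}

func newNodeTable() *nodeTable { return &nodeTable{lastType: make(map[string]string)} }

func (t *nodeTable) record(sender, msgType string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.lastType[sender] = msgType
}

func (t *nodeTable) last(sender string) string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.lastType[sender]
}

type callback func(data []byte) error

var commonTypes = map[string]bool{
	"heartbeat":      true,
	"overload_alert": true,
	"function_event": true,
}

// makeCommonCallback updates the table for common broadcast messages and then
// always delegates to the strategy callback.
func makeCommonCallback(tbl *nodeTable, next callback) callback {
	return func(data []byte) error {
		var env envelope
		if err := json.Unmarshal(data, &env); err == nil && commonTypes[env.Type] {
			tbl.record(env.Sender, env.Type)
		}
		return next(data)
	}
}

func main() {
	tbl := newNodeTable()
	cb := makeCommonCallback(tbl, func(data []byte) error {
		fmt.Printf("strategy received %d bytes\n", len(data))
		return nil
	})
	_ = cb([]byte(`{"type":"heartbeat","sender":"node-a"}`))
	_ = cb([]byte(`{"type":"strategy_specific","sender":"node-b"}`))
	fmt.Println(tbl.last("node-a"), tbl.last("node-b") == "")
}
```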
Initialize DirectMessenger (with configurable timeout and 5s fallback)
before the load balancer, create a CommonNodeTable with a TTL of 3×
HeartbeatInterval (30s fallback), and wrap the strategy callback with
MakeCommonCallback so all incoming PubSub messages update the shared
table before being forwarded to the active strategy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
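
The fallback behaviour in the commit above could be expressed with a small helper like the one below. The helper name `orDefault` and the `config` struct are hypothetical, and treating "30s fallback" as the TTL used when `HeartbeatInterval` is unset is one reading of the commit message.

```go
package main

import (
	"fmt"
	"time"
)

// orDefault returns d when it is positive and fallback otherwise.
func orDefault(d, fallback time.Duration) time.Duration {
	if d <= 0 {
		return fallback
	}
	return d
}

// config holds the two settings the commit message mentions (names assumed).
type config struct {
	DirectTimeout     time.Duration
	HeartbeatInterval time.Duration
}

func main() {
	for _, cfg := range []config{
		{}, // nothing configured: both fallbacks apply
		{DirectTimeout: 2 * time.Second, HeartbeatInterval: 5 * time.Second},
	} {
		timeout := orDefault(cfg.DirectTimeout, 5*time.Second)
		ttl := orDefault(3*cfg.HeartbeatInterval, 30*time.Second)
		fmt.Printf("direct timeout=%v node table TTL=%v\n", timeout, ttl)
	}
}
```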
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move the duplicated msgTypeEnvelope struct to msgtypes.MsgEnvelope and
CommonMsgTypes to msgtypes.CommonBroadcastTypes so both commondispatch
and communication/direct share a single canonical definition. Also
combine the two Write syscalls in writeMsg into a single buffered write.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
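
To show why a single write matters, here is a sketch of a framed write that assembles header and payload in one buffer. The 4-byte big-endian length prefix is an assumption made for the example; the commit message does not say what framing `writeMsg` uses.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
)

// writeMsg frames payload with an assumed 4-byte length prefix and issues a
// single Write call, so header and body cannot be split by another writer
// sharing the stream.
func writeMsg(w io.Writer, payload []byte) error {
	buf := make([]byte, 4+len(payload))
	binary.BigEndian.PutUint32(buf[:4], uint32(len(payload)))
	copy(buf[4:], payload)
	_, err := w.Write(buf)
	return err
}

// countingWriter records how many Write calls it receives.
type countingWriter struct {
	calls int
	data  []byte
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	c.data = append(c.data, p...)
	return len(p), nil
}

func main() {
	cw := &countingWriter{}
	if err := writeMsg(cw, []byte("hello")); err != nil {
		panic(err)
	}
	fmt.Printf("writes=%d bytes=%d\n", cw.calls, len(cw.data))
}
```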
Describes the interface hierarchy (PeriodicStrategy, EventDrivenStrategy,
HybridStrategy), the runner dispatcher, event flow, concurrency guarantees,
and migration path for existing strategies.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
10-task TDD plan covering: interface definitions, three runner
implementations (periodic/event-driven/hybrid), strategy migrations,
and agent.go wiring with context-based shutdown.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds periodicRunner that drives a PeriodicStrategy on a fixed ticker,
with context-cancellation support and a 1-minute default period fallback.
NewRunner dispatches to the correct runner via type switch
(HybridStrategy > EventDrivenStrategy > PeriodicStrategy). Placeholder
runners (noopRunner, hybrid-as-periodic) are clearly marked for Tasks 3/4.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
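
A minimal sketch of the dispatch order and the ticker loop described above follows. The interface names, `Period()`, and `Tick(ctx)` appear in this PR; the `OnEvent` method, the `runnerKind` and `runPeriodic` helpers, and the choice to stop on the first `Tick` error are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

type PeriodicStrategy interface {
	Period() time.Duration
	Tick(ctx context.Context) error
}

// EventDrivenStrategy's method set is assumed for this sketch.
type EventDrivenStrategy interface {
	OnEvent(ctx context.Context) error
}

type HybridStrategy interface {
	PeriodicStrategy
	EventDrivenStrategy
}

// runnerKind checks the most specific interface first, because a hybrid
// strategy also satisfies the other two.
func runnerKind(s any) string {
	switch s.(type) {
	case HybridStrategy:
		return "hybrid"
	case EventDrivenStrategy:
		return "event-driven"
	case PeriodicStrategy:
		return "periodic"
	default:
		return "unsupported"
	}
}

// runPeriodic calls Tick on every period until ctx is cancelled.
func runPeriodic(ctx context.Context, s PeriodicStrategy) error {
	period := s.Period()
	if period <= 0 {
		period = time.Minute // default period fallback
	}
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			if err := s.Tick(ctx); err != nil {
				return err
			}
		}
	}
}

// countingStrategy is a toy periodic strategy used to exercise the runner.
type countingStrategy struct{ ticks atomic.Int32 }

func (c *countingStrategy) Period() time.Duration          { return 10 * time.Millisecond }
func (c *countingStrategy) Tick(ctx context.Context) error { c.ticks.Add(1); return nil }

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 55*time.Millisecond)
	defer cancel()
	s := &countingStrategy{}
	err := runPeriodic(ctx, s)
	fmt.Println(runnerKind(s), s.ticks.Load(), err)
}
```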
Replaces the noopRunner placeholder with a real eventDrivenRunner that
forwards trigger events to a worker goroutine via a capacity-1 channel,
collapses bursts with an optional debounce window, and delegates all
pubsub messages to OnReceived. Also fixes the test mock counters to use
sync/atomic.Int32 so the race detector passes cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
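
The capacity-1 trigger channel and debounce window can be sketched as follows. The type and method names are hypothetical and the real `eventDrivenRunner` may structure its worker differently; the point is that a full channel makes extra triggers no-ops, so a burst collapses into one pending run.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// eventRunner is a hypothetical, simplified event-driven runner.
type eventRunner struct {
	trigger  chan struct{}
	debounce time.Duration
}

func newEventRunner(debounce time.Duration) *eventRunner {
	return &eventRunner{trigger: make(chan struct{}, 1), debounce: debounce}
}

// Trigger requests a run without blocking. It reports false when a run is
// already pending, which is how bursts are coalesced.
func (r *eventRunner) Trigger() bool {
	select {
	case r.trigger <- struct{}{}:
		return true
	default:
		return false
	}
}

// Run is the worker loop: wait for a trigger, optionally wait out the
// debounce window, then call handle once.
func (r *eventRunner) Run(ctx context.Context, handle func(context.Context)) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-r.trigger:
			if r.debounce > 0 {
				timer := time.NewTimer(r.debounce)
				select {
				case <-ctx.Done():
					timer.Stop() // release the timer on shutdown
					return
				case <-timer.C:
				}
				// Drop a trigger that arrived during the window.
				select {
				case <-r.trigger:
				default:
				}
			}
			handle(ctx)
		}
	}
}

func main() {
	r := newEventRunner(20 * time.Millisecond)
	ctx, cancel := context.WithCancel(context.Background())
	var handled atomic.Int32
	done := make(chan struct{})
	go func() {
		defer close(done)
		r.Run(ctx, func(context.Context) { handled.Add(1) })
	}()
	for i := 0; i < 5; i++ {
		r.Trigger() // a burst of five triggers
	}
	time.Sleep(100 * time.Millisecond)
	cancel()
	<-done
	fmt.Println("handled:", handled.Load())
}
```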
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace RunStrategy() self-managed loops in RecalcStrategy,
StaticStrategy, AllLocalStrategy, and NodeMarginStrategy with
Period() and Tick(ctx) methods. Move AllLocalStrategy.prevFuncs
and NodeMarginStrategy pre-loop init (maxValues thresholds,
nodeInfo fields) to their respective factory createStrategy().
Add compile-time interface checks (var _ PeriodicStrategy = ...).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
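
The compile-time check mentioned in the commit above follows a common Go idiom, sketched below. The strategy body is a placeholder; only the interface name, the `Period()`/`Tick(ctx)` methods, and the `var _ PeriodicStrategy = ...` form come from the commit message.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type PeriodicStrategy interface {
	Period() time.Duration
	Tick(ctx context.Context) error
}

// AllLocalStrategy here is a placeholder with an assumed field.
type AllLocalStrategy struct {
	period time.Duration
}

func (s *AllLocalStrategy) Period() time.Duration { return s.period }

// Tick would hold one iteration of the former RunStrategy loop body.
func (s *AllLocalStrategy) Tick(ctx context.Context) error { return ctx.Err() }

// The build fails here if *AllLocalStrategy stops satisfying the interface.
// The typed nil pointer costs nothing at run time.
var _ PeriodicStrategy = (*AllLocalStrategy)(nil)

func main() {
	var s PeriodicStrategy = &AllLocalStrategy{period: time.Minute}
	fmt.Println(s.Period(), s.Tick(context.Background()))
}
```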
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- makeTrigSet: replaces identical trigSet construction in eventdriven/hybrid runners
- makeTriggerCallback: replaces identical Callback() body in both runners
- effectivePeriod: replaces period-defaulting duplication in periodic/hybrid runners
- sleepOrCtx: replaces three identical context-aware select blocks in RecalcStrategy.Tick
- Stop debounce timer on ctx.Done() in eventDrivenRunner for proper cleanup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
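
A helper like `sleepOrCtx` typically looks like the sketch below. The signature and error return are assumptions, and `canceledContext` exists only to demonstrate the cancellation path.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sleepOrCtx waits for d or until ctx is done, whichever comes first, and
// returns ctx.Err() when the wait was cut short.
func sleepOrCtx(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}

// canceledContext returns a context that is already cancelled.
func canceledContext() context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return ctx
}

func main() {
	fmt.Println(sleepOrCtx(context.Background(), 10*time.Millisecond))
	fmt.Println(sleepOrCtx(canceledContext(), time.Hour))
}
```

Using `time.NewTimer` with `Stop` rather than `time.After` lets the timer be released as soon as the context is cancelled.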
…models

Adds docs/paper/ with two IEEEtran-formatted sections:
- Section III: Messaging Subsystem (two-plane architecture, common message
  vocabulary, GossipSub broadcast, CommonNodeTable, directed libp2p streams)
- Section IV: Strategy Execution Models (interface hierarchy, three runner
  implementations, context propagation, migration of existing strategies)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@miciav miciav force-pushed the faas-platform-abstraction branch from 8088992 to 206e902 Compare March 27, 2026 10:05