
dev_process_worker:compute_cached/3 — unwrap AO-Core {ao-result, body} envelope (#942) #943

Open
codex-curator wants to merge 2624 commits into permaweb:main from codex-curator:fix/compute-cached-envelope-unwrap


@codex-curator

Companion to #942.

Summary

Adds one clause to compute_cached/3 in src/preloaded/process/dev_process_worker.erl that unwraps the AO-Core message envelope (#{<<"ao-result">> => <<"body">>, <<"body">> => <<"14040">>}) when the v0.9 HTTP resolve path delivers RawSlot wrapped instead of as a bare binary. Without this clause, every /compute request on a production node running v0.9-FINAL crashes with function_clause in hb_util:int/1, which has no map clause. The full diagnosis and stacktrace are in #942.

Change

src/preloaded/process/dev_process_worker.erl, inserted immediately before the existing compute_cached(ProcID, not_found, Opts) -> clause head:

compute_cached(ProcID, RawSlot, Opts) when is_map(RawSlot) ->
    %% Defensively unwrap the AO-Core message envelope when the v0.9 HTTP
    %% resolve path delivers RawSlot wrapped instead of as a bare binary.
    %% The slot id lives at the `body' key; everything else is unchanged.
    case maps:find(<<"body">>, RawSlot) of
        {ok, Body} -> compute_cached(ProcID, Body, Opts);
        error -> false
    end;

Rationale

  • Bare-binary contract preserved on every existing call site.
  • Wrapped calls unwrap once and recurse into the existing clauses, so hb_util:int(Body) receives the binary that its existing is_binary clause already handles.
  • maps:find/2 returning error falls through to false, matching compute_cached/3's existing "not cached" semantics.
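  • For the observed envelope, recursion terminates after one step: the body is a bare binary, so the recursive call does not match the is_map clause again.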
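The clause ordering described above can be exercised in isolation. The module below is a minimal, self-contained sketch and not the real dev_process_worker: the module name compute_cached_sketch, the body of the not_found clause, lookup_slot/3, and to_int/1 are hypothetical stand-ins. to_int/1 mirrors only the clause set attributed to hb_util:int/1 (binary, list, integer; no map clause).

```erlang
%% Hypothetical sketch of the compute_cached/3 dispatch; not the real module.
-module(compute_cached_sketch).
-export([compute_cached/3]).

%% New clause: unwrap the AO-Core envelope once, then recurse.
compute_cached(ProcID, RawSlot, Opts) when is_map(RawSlot) ->
    case maps:find(<<"body">>, RawSlot) of
        {ok, Body} -> compute_cached(ProcID, Body, Opts);
        error -> false
    end;
%% Stand-in for the existing not_found clause (real body not shown in the PR).
compute_cached(_ProcID, not_found, _Opts) ->
    false;
%% Stand-in for the existing bare-binary path.
compute_cached(ProcID, RawSlot, Opts) ->
    lookup_slot(ProcID, to_int(RawSlot), Opts).

%% Hypothetical cache lookup: just echoes the parsed slot.
lookup_slot(_ProcID, Slot, _Opts) ->
    {ok, Slot}.

%% Mirrors hb_util:int/1's clause set: a map raises function_clause here.
to_int(Bin) when is_binary(Bin) -> binary_to_integer(Bin);
to_int(List) when is_list(List) -> list_to_integer(List);
to_int(Int) when is_integer(Int) -> Int.
```

With the map clause removed, passing the envelope would reach to_int/1 and raise function_clause, which is the failure mode reported in #942.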

If maintainers prefer the unwrap upstream in dev_process:target_slot/2 (closer to where the envelope is introduced), happy to rework.

Validation

Applied against permaweb/HyperBEAM/main HEAD on a production node serving a custom (non-aos) Lua process at PID Dwnuy4MbuQkgwxw4-P08wxeny2KcwCh8Kd22mehacTc. Pre-patch: 100% of /compute requests crash with function_clause. Post-patch: function_clause does not recur.

The two related v0.9 bugs reported in #942 (LMDB mdb_page_dirty assertion against existing cache, loadMessages "Body is not valid" against a fresh cache) are out of scope here.

Follow-ups welcome

  1. Whether the unwrap belongs here or in dev_process:target_slot/2.
  2. Whether the related LMDB / loadMessages envelope issues point at one upstream seam we could fix together.
  3. Whether a regression test should land alongside — happy to write one if pointed at the right test module pattern.

Tad MacPherson, Metavolve Labs / Golden Codex
curator@golden-codex.com · @codex-curator · golden-codex.com

  • Claude (Anthropic) — Jedi Code Master

nikooo777 and others added 30 commits April 17, 2026 01:45
- Return tagged tuples from latest_height and normalize_height
- Propagate errors through parse_range using maybe block
- Return {error, unavailable} (HTTP 503) on upstream failures
- Validate resolved heights are non-negative in parse_range
- Log original upstream error reason before collapsing to unavailable
- Add regression tests with mock server for both failure paths
Two generic, RFC 9421-compliant extensions to the HTTP Signature
encoder and decoder. Neither is tied to a specific commitment device.

## 1. `keyid' is optional on the wire

RFC 9421 §1.4.2.3 permits `keyid' to be absent when an application
does not rely on receiver-side key-material retrieval. Previously, a
commitment without a `keyid' field encoded as `keyid=""'; it now emits
no `keyid' parameter at all, handled uniformly with the other optional
params (`nonce', `created', `expires') via the existing
undefined-drop filter. The decoder has always tolerated absence and is
unchanged.

## 2. `id' parameter: transport the commitment's map key when it is not a function of `Sig'

The decoder has always derived a commitment's map key from the
signature bytes: `human_id(Sig)' for 32-byte sigs, otherwise
`human_id(sha256(Sig))'. That is correct for HMAC, RSA-PSS, and any
device whose identity is a function of `Sig'. It is not correct for
devices whose identity is chosen independently of the signature.

The encoder now threads each commitment's map key through and compares
it to the default derivation:

  - Match (every existing device): nothing emitted. Wire shape unchanged.
  - Mismatch: emit `id="<CommID>"' as an additional parameter on the
    signature-input line.

The decoder checks for `id' first, falling back to the default
derivation when absent, and strips `id' from the commitment body
before returning. The shared derivation is factored out as
`derived_commitment_id/1'. `<<"id">>' is added to the blacklist in
`get_additional_params/1' so it does not round-trip as a
user-defined parameter.

All 2230 regression tests pass (`hb_message_test_vectors',
`dev_codec_httpsig*', `dev_codec_ans104', `dev_codec_flat',
`dev_codec_json', `dev_codec_structured', `hb_cache',
`hb_ao_test_vectors', `hb_http', `hb_message').

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-id-param

impr: optional `keyid` + `id` param on signature-input in `httpsig@1.0`
Caches the raw default message in the process dictionary on first call;
subsequent calls in the same Erlang process return the cached map
directly. Matches the cached_os_env pattern already used for env-var
lookups.
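The process-dictionary memoization described above can be sketched as follows. This is an illustration under assumptions, not the hb_opts code: the module name, the cache keys, and the placeholder message are hypothetical, and raw_default_message/0 here only stands in for the real constructor while counting how often it runs.

```erlang
%% Hypothetical sketch of per-process memoization via the process dictionary.
-module(pdict_memo_sketch).
-export([default_message/0, raw_default_message/0, builds/0]).

-define(CACHE_KEY, {?MODULE, default_message}).
-define(BUILDS_KEY, {?MODULE, builds}).

%% Return the cached message, building it on this process's first call.
default_message() ->
    case erlang:get(?CACHE_KEY) of
        undefined ->
            Msg = raw_default_message(),
            erlang:put(?CACHE_KEY, Msg),
            Msg;
        Cached ->
            Cached
    end.

%% Stand-in for the expensive construction; counts builds in this process.
raw_default_message() ->
    erlang:put(?BUILDS_KEY, builds() + 1),
    #{<<"example-key">> => <<"example-value">>}.

builds() ->
    case erlang:get(?BUILDS_KEY) of
        undefined -> 0;
        N -> N
    end.
```

Because the process dictionary is per-process, each Erlang process pays the build cost once, and callers that need a fresh value can still call raw_default_message/0 directly.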

Measured impact on hb_ao:resolve/3 (simple, single-step), warm-runtime
escript harness, Opts={store, priv_wallet}:

  before: 17,693/s  (56.5 us/resolve)
  after:  95,382/s  (10.5 us/resolve)
  speedup: 5.4x, or 81% time reduction

The immutable portion of the node config is constant for the lifetime
of an Erlang process, so the cache is always safe; mutating code paths
that need a fresh base (none found in the current tree) can call
raw_default_message/0 directly.

All 2148 targeted tests pass (hb_opts, hb_ao_test_vectors,
hb_message_test_vectors).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
perf: memoize node message, leading to ~81% single-step AO-Core exec speedup
…nsform

`hb_test_parallel.erl' is a minimal parse_transform and runtime helper
that lets a test module opt into parallel EUnit execution purely by
naming convention:

    foo_test_parallel()    -> ?assert(...).
    bar_test_parallel_()   -> {timeout, 30, fun() -> ... end}.

The transform does two things at compile time:

  * auto-exports every 0-arity function whose name ends in
    `_test_parallel' or `_test_parallel_', and
  * (when the module does not already define one) injects

        all_parallel_test_() -> hb_test_parallel:all(?MODULE).

There is no rename: the names the author writes are the names that
get compiled. EUnit's own auto-discovery only matches `_test' and
`_test_' suffixes, so our `_parallel'-ending names are ignored by it
and only run once, via the injected generator.
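A runtime helper in the spirit of hb_test_parallel:all/1 could discover tests by name alone. The sketch below is hypothetical (module name and helper names are assumptions): it filters a module's exported 0-arity functions by suffix and wraps them in an EUnit {inparallel, ...} group, treating `_test_parallel_' functions as generators.

```erlang
%% Hypothetical sketch of suffix-based parallel test discovery.
-module(parallel_discovery_sketch).
-export([all/1, parallel_tests/1, is_parallel_name/1]).

%% Exported 0-arity functions whose names carry a parallel suffix.
parallel_tests(Module) ->
    [Name || {Name, 0} <- Module:module_info(exports),
             is_parallel_name(atom_to_list(Name))].

is_parallel_name(Str) ->
    lists:suffix("_test_parallel", Str) orelse
        lists:suffix("_test_parallel_", Str).

%% Build one EUnit group that runs every discovered test concurrently.
all(Module) ->
    {inparallel, [to_test(Module, Name) || Name <- parallel_tests(Module)]}.

%% `_test_parallel_' functions return a test representation (generators);
%% plain `_test_parallel' functions are run directly as tests.
to_test(Module, Name) ->
    case lists:suffix("_", atom_to_list(Name)) of
        true -> {generator, fun Module:Name/0};
        false -> fun Module:Name/0
    end.
```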

Activation is by including `hb.hrl', which wires the transform in
under `-ifdef(TEST)'. `rebar.config' sets `erl_first_files' so the
transform module compiles before anything that uses it.

No existing modules are converted in this PR -- the infrastructure
lands on its own so that subsequent PRs can migrate individual test
suites module-by-module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l-autowire

feat: auto-parallelize all `*_test_parallel[_]` tests
…ation

Cuts full rebar3 eunit wall time from ~10:26 baseline to ~4:55 (~53%
less wall time) by enabling in-VM test parallelism.

Changes:

- hb_test_utils:suite_with_opts/2 now wraps the top-level list of
  OptSpec groups in {inparallel, ...} and, via the foreach setup,
  creates a fresh per-test store. Previously tests within the same
  OptSpec shared a store whose reset ran in every test's setup,
  which meant concurrent tests in the `inparallel` inner group could
  wipe each other's data (visible as load_as_test flakiness).

- hb_test_utils:test_store/2 uses microsecond + 6 random bytes in
  the path instead of millisecond + 1 ms sleep, making unique paths
  cheap under parallel load.

- hb_store_volatile: add <<"max-ttl-ms">> option for test use so the
  max_ttl_test no longer has to use 1 s TTL and 1250 ms waits.
  max_ttl_test now uses 100 ms + 200 ms sleeps (~2.1 s saved).
  <<"max-ttl">> (seconds) is unchanged.

- dev_copycat_arweave, dev_manifest, dev_copycat_graphql, dev_arweave,
  dev_name: rename individual `_test()` cases to `_tc()`, add a single
  `all_tests_test_/0` generator that returns {inparallel, [fun
  ?MODULE:F/0 || ...]}, and export the `_tc/0` functions under
  -ifdef(TEST). Each test creates its own store / node so concurrent
  execution is safe. Per-module wall times drop:

    dev_manifest:          29s -> 7s  (-22s)
    dev_copycat_arweave:   33s -> 16s (-17s)
    dev_arweave:           19s -> 8s  (-11s)
    dev_copycat_graphql:   17s -> 9s  (-8s)
    dev_name:              14s -> 6s  (-8s)
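The rename-plus-single-generator pattern above can be sketched in a toy module. The module and test names are hypothetical; the point is that `_tc' functions are invisible to EUnit's auto-discovery, so they run only once, through the one `all_tests_test_/0' generator.

```erlang
%% Hypothetical sketch of the `_tc' + {inparallel, ...} generator pattern.
-module(inparallel_pattern_sketch).
-include_lib("eunit/include/eunit.hrl").
%% `_tc' names are not auto-exported by EUnit, so export them explicitly.
-export([first_tc/0, second_tc/0]).

first_tc() ->
    ?assertEqual(4, 2 + 2).

second_tc() ->
    ?assertMatch(#{a := 1}, #{a => 1, b => 2}).

%% The only function EUnit discovers; it runs every `_tc' case concurrently.
all_tests_test_() ->
    {inparallel, [fun ?MODULE:F/0 || F <- [first_tc, second_tc]]}.
```

Timing-sensitive cases can be kept out of the parallel list and placed in a trailing {inorder, ...} group, as the dev_bundler and dev_scheduler commits below describe.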

dev_bundler was evaluated but not parallelized: it registers a
singleton bundler_server via hb_name, so parallel tests race on the
registration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 6dc41cd ("fix(?): add waits to gateway store remote node read
test") added two 1 s `?debug_wait`s to `remote_hyperbeam_node_ans104_test`
as a provisional fix for a flakiness that was later fixed at the root
cause in commit fb93d6f ("fix: hb_store_gateway:remote_hyperbeam_node_
ans104_test flaky test") by repairing the `~query@1.0` commitment lookup.
The waits were not removed when the real fix landed.

Re-ran the test 3x with the waits removed: it passes cleanly (~7-9 s per
module run). Saves 2 s per full suite run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…vectors

dev_process_test_vectors (~27 s serial -> ~5.6 s parallel):
- Rename 17 `_test()` / `_test_()` cases to `_tc()` / `_tc_()` and add a
  single `all_tests_test_/0' generator that wraps them in `{inparallel,
  ...}`. Each test creates its own store via `hb_test_utils:test_store/0`
  and, where applicable, its own HTTP server, so they are safe to run
  concurrently.

dev_query_test_vectors (~14 s serial -> ~10 s parallel):
- Same rename + `{inparallel, ...}` wrapper applied to 18 `_test()`
  cases. Savings are smaller here because the tests drive real traffic
  to `arweave.net` via `~copycat@1.0/arweave`, so wall time is dominated
  by network latency.

dev_query_graphql:ensure_started/1 race fix:
- The previous implementation did `hb_name:lookup` -> `spawn_link` +
  `init/1`, which was not atomic. Under parallel eunit two callers could
  both see `undefined` and both call `graphql:load_schema/2`, which the
  `graphql' library rejects with `entry_already_exists_in_schema'.
- New implementation: one spawned process atomically claims the name via
  `hb_name:register/2`, runs `init/1`, and sets a `persistent_term` flag
  when the schema is ready. Losers of the registration race poll that
  flag via `hb_util:wait_until/2` (bounded by `?START_TIMEOUT') instead
  of returning early with a half-initialized schema.
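The claim-then-poll startup can be sketched with plain OTP primitives. This is an assumption-laden illustration, not the dev_query_graphql code: erlang:register/2 stands in for hb_name:register/2, a local polling loop stands in for hb_util:wait_until/2, and init_once/0 stands in for the schema load. All module, macro, and function names are hypothetical.

```erlang
%% Hypothetical sketch of race-free lazy startup.
-module(ensure_started_sketch).
-export([ensure_started/0, init_count/0]).

-define(NAME, ensure_started_sketch_server).
-define(READY_KEY, {?MODULE, ready}).
-define(INITS_KEY, {?MODULE, inits}).
-define(START_TIMEOUT, 5000).
-define(POLL_MS, 10).

ensure_started() ->
    case persistent_term:get(?READY_KEY, false) of
        true ->
            ok;
        false ->
            %% Every caller spawns a candidate; only one can win the name.
            spawn(fun claim_and_init/0),
            wait_ready(?START_TIMEOUT)
    end.

claim_and_init() ->
    try register(?NAME, self()) of
        true ->
            init_once(),
            persistent_term:put(?READY_KEY, true),
            loop()
    catch
        %% register/2 raises badarg if the name is taken: we lost the race.
        error:badarg -> ok
    end.

%% Stand-in for the expensive one-time init; counts runs for testing.
init_once() ->
    persistent_term:put(?INITS_KEY, init_count() + 1),
    timer:sleep(50).

init_count() ->
    persistent_term:get(?INITS_KEY, 0).

loop() ->
    receive
        stop -> ok
    end.

%% Losers (and the winner's caller) poll the ready flag until a deadline.
wait_ready(Remaining) when Remaining =< 0 ->
    {error, timeout};
wait_ready(Remaining) ->
    case persistent_term:get(?READY_KEY, false) of
        true ->
            ok;
        false ->
            timer:sleep(?POLL_MS),
            wait_ready(Remaining - ?POLL_MS)
    end.
```

The sketch does not clear the flag if the registered process later dies; a real implementation would need to handle that.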

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Applies the same `_test()` / `_test_()` -> `_tc()` / `_tc_()` + single
`all_tests_test_/0' generator pattern used elsewhere. Each router test
already starts its own node(s) via `hb_http_server:start_node/1' with
a fresh store, so parallel execution is safe.

Slowest tests: `dynamic_routing_by_performance_tc_` (~5 s, benchmarks
route selection under load) and `full_route_config_tc` (~2.3 s) now
overlap instead of running back-to-back.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…che race

dev_bundler: 17 s -> ~15 s per module run
- Threaded a new `bundler_server_name' option through dev_bundler so
  each test's `hb_name:singleton/2' registration is unique instead of
  sharing the legacy `?SERVER_NAME' global atom. Default stays
  `?SERVER_NAME' so external callers (`dev_arweave', `dev_bundler_cache',
  `hb_client') keep working unchanged.
- `start_mock_gateway/1' stamps a unique `{bundler_server, make_ref()}'
  name into `NodeOpts'; tests now call `stop_test_servers/2' with those
  opts so the per-test server is torn down correctly.
- `stop_server/1' and `get_state/1' added; 0-arity versions retained.
- Rename `_test()` cases to `_tc()`; add `all_tests_test_/0' generator.
- Most tests run in `{inparallel, ...}'. 4 timing-sensitive tests
  (`idle_tc', `bundle_dispatch_delay_tc', `dispatch_blocking_tc',
  `exponential_backoff_timing_tc') assert tight wall-clock windows
  that false-fail under CPU contention, so they run `{inorder, ...}'
  after the parallel batch.
- 4 consecutive runs all pass.

dev_scheduler_cache: fix concurrent_read_write_test flakiness
- Pre-write slot 1 synchronously before spawning the reader processes.
  Under heavy parallel CPU load the 10 readers could blast through
  their 100 reads each before the writer's first `write/2' landed,
  failing `?assert(TotalSuccessfulReads > 0)'. The real assertion
  (`FinalSlots == [1..100]' preserving order) is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dev_scheduler (~13 s serial -> ~3.7 s parallel):
- `http_init/1' was sharing a fixed-name `hb_store_volatile' instance
  (`cache-TEST/volatile') across every scheduler HTTP test. Replace
  with a per-test `hb_test_utils:test_store(hb_store_volatile, ...)'
  so parallel HTTP tests do not share ETS state.
- Rename `_test()` / `_test_()` cases to `_tc()` / `_tc_()' and add
  `all_tests_test_/0' that wraps the 14 parallel-safe cases in
  `{inparallel, ...}' and runs `benchmark_suite_tc_' afterwards in
  `{inorder, ...}'. The benchmark seeds rand globally and picks a
  random port, which is incompatible with running concurrently with
  sibling tests.
- 3 consecutive runs all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`dev_arweave_offset:offset_item_cases_test/0' previously made five
serial live-network calls to arweave.net via `assert_offset_item/4'.
Use `hb_pmap:parallel_map/3' (already imported into the same style at
`dev_copycat_arweave:1d03ba5f3') so the five fetches overlap. Wall
time for that test is now bounded by the slowest remote fetch rather
than the sum of all five.

No semantic change: each fetch still runs `assert_offset_item' with
the same arguments and the same assertions.
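The effect of overlapping the fetches can be illustrated with a generic order-preserving parallel map. This is a hypothetical sketch, not hb_pmap:parallel_map/3, whose signature is not shown here: each element runs in its own linked process, and results are collected by reference in spawn order.

```erlang
%% Hypothetical sketch of an order-preserving parallel map.
-module(pmap_sketch).
-export([pmap/2]).

%% Run Fun over every element concurrently; results keep input order.
%% spawn_link means a crash in any worker also takes down the caller,
%% much like a failing assertion inside a test.
pmap(Fun, List) ->
    Parent = self(),
    Refs = [spawn_worker(Parent, Fun, X) || X <- List],
    %% Receiving by reference, in spawn order, preserves input order.
    [receive {Ref, Result} -> Result end || Ref <- Refs].

spawn_worker(Parent, Fun, X) ->
    Ref = make_ref(),
    spawn_link(fun() -> Parent ! {Ref, Fun(X)} end),
    Ref.
```

Five tasks that each sleep 100 ms finish in roughly 100 ms total rather than 500 ms.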

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`http_get_schedule_tc_' previously did:
    ?assertMatch({ok, #{<<"current">> := 3}}, http_get_slot(Node, PMsg)),
    ?debug_wait(100),
    {ok, Schedule} = http_get_schedule(...)

The `assertMatch' already observes `current == 3', which means every
write-side effect of the three prior POSTs has landed -- the schedule
is readable without an extra 100 ms sleep. Removed. 5 consecutive
runs all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The workers spawned by `spawn_test_workers(random)' exit after 500 ms
and the `hb_name' cleanup reaper unregisters them shortly after. The
test was sleeping a flat second before asserting the table length.
Poll with `hb_util:wait_until/2' (100 ms poll, 2 s timeout) so the
test wakes as soon as cleanup finishes rather than always paying
1 s. 3 consecutive runs pass.
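Polling with a bounded timeout, as described above, can be sketched as below. The function name and the three-argument signature (predicate, poll interval, timeout) are assumptions for illustration and are not the actual hb_util:wait_until/2 API.

```erlang
%% Hypothetical sketch of a bounded polling helper.
-module(wait_until_sketch).
-export([wait_until/3]).

%% Call Pred every IntervalMs until it returns true or TimeoutMs elapses.
wait_until(Pred, IntervalMs, TimeoutMs) ->
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    poll(Pred, IntervalMs, Deadline).

poll(Pred, IntervalMs, Deadline) ->
    case Pred() of
        true ->
            ok;
        false ->
            case erlang:monotonic_time(millisecond) >= Deadline of
                true ->
                    {error, timeout};
                false ->
                    timer:sleep(IntervalMs),
                    poll(Pred, IntervalMs, Deadline)
            end
    end.
```

The test then wakes at the first poll after cleanup completes instead of always sleeping a fixed second.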

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
samcamwilliams and others added 28 commits May 25, 2026 18:27
fix(device): package preloaded store in release
chore: removes SNP and Green Zone support, migrating to `os` repo
chore: bump version number for start of 0.10 series.
Enables defining `match-type` directives (`strict`, `primary`, `only-present`) within nested map structures. This provides more granular control over message comparison.
feat: Support flexible, nested matching modes in `hb_message`
Improves safety in device separation and removes potential for OOM.
…xclusive

impr: terminate early when forge bootstrap lacks required device
…gnore

fix: ignore hyperbeam rt dirs in forge template
Fix arweave http call by adding bundle true
…} envelope

Field report + diagnosis: see companion issue.

The v0.9 HTTP resolve path delivers RawSlot to compute_cached/3 as the
AO-Core message envelope #{<<"ao-result">> => <<"body">>, <<"body">> => <<"14040">>}
instead of as the bare binary <<"14040">> the existing
'hb_util:int(RawSlot)' call expects. hb_util:int/1 has only is_binary
| is_list | is_integer clauses, so the BEAM raises function_clause on
the map, OTP restarts the worker, and every subsequent /compute request
crashes the same way, leaving the substrate effectively offline for reads.

Add a single defensive clause at the top of compute_cached/3 that
unwraps the envelope's body key and recurses through the existing
clauses. Bare-binary callers are unaffected; wrapped callers now
land on hb_util:int(<<"14040">>) which the existing is_binary
clause already handles. maps:find/2 returning error falls through
to false, matching compute_cached/3's existing semantics for
'not cached'.

Validated against production node serving custom (non-aos) Lua AO
process (Golden Codex Aeternum Registrar PID
Dwnuy4MbuQkgwxw4-P08wxeny2KcwCh8Kd22mehacTc). Pre-patch: 100% of
/compute requests crash. Post-patch: function_clause does not recur.

The unwrap may belong upstream in dev_process:target_slot/2 — happy
to follow whichever seam the maintainers prefer.

Co-Authored-By: Claude (Anthropic) <noreply@anthropic.com>

9 participants