dev_process_worker:compute_cached/3 — unwrap AO-Core {ao-result, body} envelope (#942) #943
Open
codex-curator wants to merge 2624 commits into
- Return tagged tuples from latest_height and normalize_height
- Propagate errors through parse_range using maybe block
- Return {error, unavailable} (HTTP 503) on upstream failures
- Validate resolved heights are non-negative in parse_range
- Log original upstream error reason before collapsing to unavailable
- Add regression tests with mock server for both failure paths
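The error propagation described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the argument shapes, the `Fetch` fun standing in for the upstream call, and the `invalid_height` / `negative_height` reasons are all assumptions; only the function names and `{error, unavailable}` come from the message.

```erlang
-module(range_sketch).
-feature(maybe_expr, enable).  % required on OTP 25/26; `maybe` is on by default from OTP 27
-export([parse_range/3]).

%% Hypothetical signature: From/To are integers or the atom `latest`;
%% Fetch is a zero-arity fun standing in for the upstream height lookup.
parse_range(From, To, Fetch) ->
    maybe
        {ok, Start} ?= normalize_height(From, Fetch),
        {ok, End} ?= normalize_height(To, Fetch),
        ok ?= check_non_negative(Start),
        ok ?= check_non_negative(End),
        {ok, {Start, End}}
    end.

normalize_height(latest, Fetch) -> latest_height(Fetch);
normalize_height(H, _Fetch) when is_integer(H) -> {ok, H};
normalize_height(_, _Fetch) -> {error, invalid_height}.

%% Log the original reason, then collapse it to `unavailable`.
latest_height(Fetch) ->
    case Fetch() of
        {ok, H} when is_integer(H) -> {ok, H};
        Other ->
            logger:warning("latest height lookup failed: ~p", [Other]),
            {error, unavailable}
    end.

check_non_negative(H) when H >= 0 -> ok;
check_non_negative(_) -> {error, negative_height}.
```

Because the `maybe` block has no `else` clause, the first value that fails to match a `?=` pattern becomes the result of `parse_range/3` unchanged, so callers see the tagged error directly.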
Rebase neo/edge onto edge
Two generic, RFC 9421-compliant extensions to the HTTP Signature
encoder and decoder. Neither is tied to a specific commitment device.
## 1. `keyid' is optional on the wire
RFC 9421 §1.4.2.3 permits `keyid' to be absent when an application
does not rely on receiver-side key-material retrieval. Previously, a
commitment without a `keyid' field encoded as `keyid=""'; it now emits
no `keyid' parameter at all, handled uniformly with the other optional
params (`nonce', `created', `expires') via the existing
undefined-drop filter. The decoder has always tolerated absence and is
unchanged.
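As an illustration of the undefined-drop behaviour described above, here is a small sketch. The module name, the `{Name, Value}` list shape, and the serialization details are assumptions for the example and are not taken from `dev_codec_httpsig`.

```erlang
-module(sig_params_sketch).
-export([encode_params/1]).

%% Params is an ordered list of {Name, Value}; `undefined` marks "not set".
%% Unset parameters are dropped entirely instead of being emitted as "".
encode_params(Params) ->
    Present = [{K, V} || {K, V} <- Params, V =/= undefined],
    Encoded = [encode_param(K, V) || {K, V} <- Present],
    iolist_to_binary(lists:join(<<";">>, Encoded)).

%% Integers are written bare, binaries as quoted strings.
encode_param(K, V) when is_integer(V) ->
    <<K/binary, "=", (integer_to_binary(V))/binary>>;
encode_param(K, V) when is_binary(V) ->
    <<K/binary, "=\"", V/binary, "\"">>.
```

With this shape, a commitment that has no `keyid` produces no `keyid` parameter at all, the same way a missing `nonce` or `expires` would.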
## 2. `id' parameter — transport the commitment's map key when it is
not a function of `Sig'
The decoder has always derived a commitment's map key from the
signature bytes: `human_id(Sig)' for 32-byte sigs, otherwise
`human_id(sha256(Sig))'. That is correct for HMAC, RSA-PSS, and any
device whose identity is a function of `Sig'. It is not correct for
devices whose identity is chosen independently of the signature.
The encoder now threads each commitment's map key through and compares
it to the default derivation:
- Match (every existing device): nothing emitted. Wire shape unchanged.
- Mismatch: emit `id="<CommID>"' as an additional parameter on the
signature-input line.
The decoder checks for `id' first, falling back to the default
derivation when absent, and strips `id' from the commitment body
before returning. The shared derivation is factored out as
`derived_commitment_id/1'. `<<"id">>' is added to
`get_additional_params/1''s blacklist so it does not round-trip as a
user-defined parameter.
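The encoder and decoder rules above might look roughly like this. It is a sketch under assumptions: `human_id/1` is replaced by a stand-in (unpadded base64url, which needs OTP 26+ for `base64:encode/2`), and `id_param/2` and `decode_id/2` are hypothetical helper names; only `derived_commitment_id/1` is named in the commit message.

```erlang
-module(commitment_id_sketch).
-export([derived_commitment_id/1, id_param/2, decode_id/2]).

%% Stand-in for human_id/1 (assumption): unpadded base64url.
human_id(Bin) ->
    base64:encode(Bin, #{mode => urlsafe, padding => false}).

%% Default derivation: 32-byte signatures are used directly,
%% anything else is hashed with SHA-256 first.
derived_commitment_id(Sig) when byte_size(Sig) =:= 32 ->
    human_id(Sig);
derived_commitment_id(Sig) when is_binary(Sig) ->
    human_id(crypto:hash(sha256, Sig)).

%% Encoder side: emit `id` only when the map key differs from the default.
id_param(CommID, Sig) ->
    case derived_commitment_id(Sig) of
        CommID -> [];
        _Other -> [{<<"id">>, CommID}]
    end.

%% Decoder side: prefer an explicit `id`, strip it from the params,
%% and fall back to the default derivation when it is absent.
decode_id(Params, Sig) when is_map(Params) ->
    case maps:take(<<"id">>, Params) of
        {CommID, Rest} -> {CommID, Rest};
        error -> {derived_commitment_id(Sig), Params}
    end.
```

Keeping the comparison in the encoder means devices whose key already equals the derived value produce the same bytes on the wire as before.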
All 2230 regression tests pass (`hb_message_test_vectors',
`dev_codec_httpsig*', `dev_codec_ans104', `dev_codec_flat',
`dev_codec_json', `dev_codec_structured', `hb_cache',
`hb_ao_test_vectors', `hb_http', `hb_message').
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-id-param impr: optional `keyid` + `id` param on signature-input in `httpsig@1.0`
Caches the raw default message in the process dictionary on first call;
subsequent calls in the same Erlang process return the cached map
directly. Matches the cached_os_env pattern already used for env-var
lookups.
Measured impact on hb_ao:resolve/3 (simple, single-step), warm-runtime
escript harness, Opts={store, priv_wallet}:
before: 17,693/s (56.5 us/resolve)
after: 95,382/s (10.5 us/resolve)
speedup: 5.4x, or 81% time reduction
The immutable portion of the node config is constant for the lifetime
of an Erlang process, so the cache is always safe; mutating code paths
that need a fresh base (none found in the current tree) can call
raw_default_message/0 directly.
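A minimal sketch of the process-dictionary memoization pattern described above. The module name, cache key, and the contents of the default message are placeholders, and the real `hb_opts` code may differ.

```erlang
-module(memo_sketch).
-export([default_message/0, raw_default_message/0]).

-define(CACHE_KEY, {?MODULE, default_message}).

%% First call in a process builds the map and stores it in the
%% process dictionary; later calls in the same process reuse it.
default_message() ->
    case erlang:get(?CACHE_KEY) of
        undefined ->
            Fresh = raw_default_message(),
            erlang:put(?CACHE_KEY, Fresh),
            Fresh;
        Cached ->
            Cached
    end.

%% Placeholder for the expensive construction of the base message.
raw_default_message() ->
    #{<<"example-key">> => <<"example-value">>}.
```

The cache is per Erlang process, so a freshly spawned process pays the construction cost once and no cross-process invalidation is needed.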
All 2148 targeted tests pass (hb_opts, hb_ao_test_vectors,
hb_message_test_vectors).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
perf: memoize node message, leading to ~81% single-step AO-Core exec speedup
…nsform
`hb_test_parallel.erl' is a minimal parse_transform and runtime helper
that lets a test module opt into parallel EUnit execution purely by
naming convention:
foo_test_parallel() -> ?assert(...).
bar_test_parallel_() -> {timeout, 30, fun() -> ... end}.
The transform does two things at compile time:
* auto-exports every 0-arity function whose name ends in
`_test_parallel' or `_test_parallel_', and
* (when the module does not already define one) injects
all_parallel_test_() -> hb_test_parallel:all(?MODULE).
There is no rename: the names the author writes are the names that
get compiled. EUnit's own auto-discovery only matches `_test' and
`_test_' suffixes, so our `_parallel'-ending names are ignored by it
and only run once, via the injected generator.
Activation is by including `hb.hrl', which wires the transform in
under `-ifdef(TEST)'. `rebar.config' sets `erl_first_files' so the
transform module compiles before anything that uses it.
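The runtime half of the scheme could be sketched as below. This is an assumption about how a helper like `hb_test_parallel:all/1` might collect tests, not its actual implementation; the module and function names here are illustrative.

```erlang
-module(parallel_sketch).
-export([all/1, is_parallel_name/1]).

%% True for names ending in `_test_parallel` or `_test_parallel_`.
is_parallel_name(Name) when is_atom(Name) ->
    S = atom_to_list(Name),
    lists:suffix("_test_parallel", S)
        orelse lists:suffix("_test_parallel_", S).

%% Collect every exported 0-arity function that follows the naming
%% convention and wrap the lot in an EUnit `inparallel` group.
all(Module) ->
    Names = [F || {F, 0} <- Module:module_info(exports), is_parallel_name(F)],
    {inparallel, [wrap(Module, F) || F <- Names]}.

%% Names with a trailing underscore are generators; the rest are plain tests.
wrap(Module, F) ->
    case lists:suffix("_test_parallel_", atom_to_list(F)) of
        true -> {generator, fun Module:F/0};
        false -> fun Module:F/0
    end.
```

An injected `all_parallel_test_/0` that returns `all(?MODULE)` is then the only entry point EUnit's own discovery sees, which is why the `_parallel` functions run exactly once.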
No existing modules are converted in this PR -- the infrastructure
lands on its own so that subsequent PRs can migrate individual test
suites module-by-module.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l-autowire feat: auto-parallelize all `*_test_parallel[_]` tests

…ation
Cuts full rebar3 eunit wall time from a ~10:26 baseline to ~4:55 (about
53% less wall time) by enabling in-VM test parallelism.
Changes:
- hb_test_utils:suite_with_opts/2 now wraps the top-level list of
OptSpec groups in {inparallel, ...} and, via the foreach setup,
creates a fresh per-test store. Previously tests within the same
OptSpec shared a store whose reset ran in every test's setup,
which meant concurrent tests in the `inparallel` inner group could
wipe each other's data (visible as load_as_test flakiness).
- hb_test_utils:test_store/2 uses microsecond + 6 random bytes in
the path instead of millisecond + 1 ms sleep, making unique paths
cheap under parallel load.
- hb_store_volatile: add <<"max-ttl-ms">> option for test use so the
max_ttl_test no longer has to use 1 s TTL and 1250 ms waits.
max_ttl_test now uses 100 ms + 200 ms sleeps (~2.1 s saved).
<<"max-ttl">> (seconds) is unchanged.
- dev_copycat_arweave, dev_manifest, dev_copycat_graphql, dev_arweave,
dev_name: rename individual `_test()` cases to `_tc()`, add a single
`all_tests_test_/0` generator that returns {inparallel, [fun
?MODULE:F/0 || ...]}, and export the `_tc/0` functions under
-ifdef(TEST). Each test creates its own store / node so concurrent
execution is safe. Per-module wall times drop:
dev_manifest: 29s -> 7s (-22s)
dev_copycat_arweave: 33s -> 16s (-17s)
dev_arweave: 19s -> 8s (-11s)
dev_copycat_graphql: 17s -> 9s (-8s)
dev_name: 14s -> 6s (-8s)
dev_bundler was evaluated but not parallelized: it registers a
singleton bundler_server via hb_name, so parallel tests race on the
registration.
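The unique-path idea from the `test_store/2` change above can be illustrated as follows. The function name and path layout are assumptions; `binary:encode_hex/1` requires OTP 24 or later.

```erlang
-module(store_path_sketch).
-export([unique_path/1]).

%% Build a store path that is unique without sleeping: a microsecond
%% timestamp plus 6 random bytes (hex-encoded) appended to the prefix.
unique_path(Prefix) when is_binary(Prefix) ->
    Micros = integer_to_binary(erlang:system_time(microsecond)),
    Rand = binary:encode_hex(crypto:strong_rand_bytes(6)),
    <<Prefix/binary, "-", Micros/binary, "-", Rand/binary>>.
```

Two calls in the same microsecond still differ unless the 48 random bits collide, which is what makes the previous 1 ms sleep unnecessary.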
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 6dc41cd ("fix(?): add waits to gateway store remote node read test") added two 1 s `?debug_wait`s to `remote_hyperbeam_node_ans104_test` as a provisional fix for a flakiness that was later fixed at the root cause in commit fb93d6f ("fix: hb_store_gateway:remote_hyperbeam_node_ans104_test flaky test") by repairing the `~query@1.0` commitment lookup. The waits were not removed when the real fix landed.
Re-run the test 3x with the waits removed: passes cleanly (~7-9 s per module run). Saves 2 s per full suite run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…vectors
dev_process_test_vectors (~27 s serial -> ~5.6 s parallel):
- Rename 17 `_test()` / `_test_()` cases to `_tc()` / `_tc_()` and add a
single `all_tests_test_/0' generator that wraps them in `{inparallel,
...}`. Each test creates its own store via `hb_test_utils:test_store/0`
and, where applicable, its own HTTP server, so they are safe to run
concurrently.
dev_query_test_vectors (~14 s serial -> ~10 s parallel):
- Same rename + `{inparallel, ...}` wrapper applied to 18 `_test()`
cases. Savings are smaller here because the tests drive real traffic
to `arweave.net` via `~copycat@1.0/arweave`, so wall time is dominated
by network latency.
dev_query_graphql:ensure_started/1 race fix:
- The previous implementation did `hb_name:lookup` -> `spawn_link` +
`init/1`, which was not atomic. Under parallel eunit two callers could
both see `undefined` and both call `graphql:load_schema/2`, which the
`graphql' library rejects with `entry_already_exists_in_schema'.
- New implementation: one spawned process atomically claims the name via
`hb_name:register/2`, runs `init/1`, and sets a `persistent_term` flag
when the schema is ready. Losers of the registration race poll that
flag via `hb_util:wait_until/2` (bounded by `?START_TIMEOUT') instead
of returning early with a half-initialized schema.
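The claim-then-signal shape of the fix above can be sketched with plain OTP primitives. This is not the `dev_query_graphql` code: it substitutes `erlang:register/2` for `hb_name:register/2` and an inline poll loop for `hb_util:wait_until/2`, and the names, timeout value, and `init_once/0` body are assumptions.

```erlang
-module(ensure_started_sketch).
-export([ensure_started/0]).

-define(NAME, ensure_started_sketch_server).
-define(READY_KEY, {?MODULE, ready}).
-define(START_TIMEOUT, 5000).

%% Every caller may spawn a candidate, but only one candidate can
%% claim the name. All callers then wait for the ready flag.
ensure_started() ->
    case whereis(?NAME) of
        undefined -> spawn(fun claim_and_init/0);
        _Pid -> ok
    end,
    wait_ready(erlang:monotonic_time(millisecond) + ?START_TIMEOUT).

claim_and_init() ->
    %% register/2 is atomic: exactly one process gets `true`,
    %% every other candidate fails with badarg and simply exits.
    try register(?NAME, self()) of
        true ->
            init_once(),
            persistent_term:put(?READY_KEY, true),
            receive stop -> ok end
    catch
        error:badarg -> ok
    end.

%% Placeholder for one-time setup such as loading a schema.
init_once() ->
    timer:sleep(50),
    ok.

%% Poll the flag until it is set or the deadline passes.
wait_ready(Deadline) ->
    case persistent_term:get(?READY_KEY, false) of
        true -> ok;
        false ->
            case erlang:monotonic_time(millisecond) >= Deadline of
                true -> {error, timeout};
                false -> timer:sleep(10), wait_ready(Deadline)
            end
    end.
```

The key property is that callers return only after initialization has finished, whether or not their own candidate won the registration. The sketch does not clear the flag if the owner later dies, which a production version would need to handle.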
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Applies the same `_test()` / `_test_()` -> `_tc()` / `_tc_()` + single `all_tests_test_/0' generator pattern used elsewhere. Each router test already starts its own node(s) via `hb_http_server:start_node/1' with a fresh store, so parallel execution is safe.
Slowest tests: `dynamic_routing_by_performance_tc_` (~5 s, benchmarks route selection under load) and `full_route_config_tc` (~2.3 s) now overlap instead of running back-to-back.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…che race
dev_bundler: 17 s -> ~15 s per module run
- Threaded a new `bundler_server_name' option through dev_bundler so
each test's `hb_name:singleton/2' registration is unique instead of
sharing the legacy `?SERVER_NAME' global atom. Default stays
`?SERVER_NAME' so external callers (`dev_arweave', `dev_bundler_cache',
`hb_client') keep working unchanged.
- `start_mock_gateway/1' stamps a unique `{bundler_server, make_ref()}'
name into `NodeOpts'; tests now call `stop_test_servers/2' with those
opts so the per-test server is torn down correctly.
- `stop_server/1' and `get_state/1' added; 0-arity versions retained.
- Rename `_test()` cases to `_tc()`; add `all_tests_test_/0' generator.
- Most tests run in `{inparallel, ...}'. 4 timing-sensitive tests
(`idle_tc', `bundle_dispatch_delay_tc', `dispatch_blocking_tc',
`exponential_backoff_timing_tc') assert tight wall-clock windows
that false-fail under CPU contention, so they run `{inorder, ...}'
after the parallel batch.
- 4 consecutive runs all pass.
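The split between a parallel batch and a trailing ordered batch, as described in the list above, can be expressed with standard EUnit control tuples. The test bodies here are trivial placeholders and the module name is made up for the example.

```erlang
-module(mixed_suite_sketch).
-export([all_tests_test_/0, fast_a_tc/0, fast_b_tc/0, timing_tc/0]).

%% The only function EUnit discovers by name. The outer list runs in
%% order, so the timing-sensitive case starts after the parallel batch.
all_tests_test_() ->
    [
        {inparallel, [fun ?MODULE:fast_a_tc/0, fun ?MODULE:fast_b_tc/0]},
        {inorder, [fun ?MODULE:timing_tc/0]}
    ].

%% `_tc` names are invisible to EUnit's auto-discovery, so they only
%% run through the generator above.
fast_a_tc() -> 4 = 2 + 2.
fast_b_tc() -> <<"ab">> = <<"a", "b">>.

%% Stand-in for a test that asserts on wall-clock time.
timing_tc() ->
    T0 = erlang:monotonic_time(millisecond),
    timer:sleep(20),
    Elapsed = erlang:monotonic_time(millisecond) - T0,
    true = Elapsed >= 15.
```

Running `eunit:test(mixed_suite_sketch)` executes all three cases once, with the two fast cases overlapping.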
dev_scheduler_cache: fix concurrent_read_write_test flakiness
- Pre-write slot 1 synchronously before spawning the reader processes.
Under heavy parallel CPU load the 10 readers could blast through
their 100 reads each before the writer's first `write/2' landed,
failing `?assert(TotalSuccessfulReads > 0)'. The real assertion
(`FinalSlots == [1..100]' preserving order) is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dev_scheduler (~13 s serial -> ~3.7 s parallel):
- `http_init/1' was sharing a fixed-name `hb_store_volatile' instance
(`cache-TEST/volatile') across every scheduler HTTP test. Replace
with a per-test `hb_test_utils:test_store(hb_store_volatile, ...)'
so parallel HTTP tests do not share ETS state.
- Rename `_test()` / `_test_()` cases to `_tc()` / `_tc_()` and add
`all_tests_test_/0' that wraps the 14 parallel-safe cases in
`{inparallel, ...}' and runs `benchmark_suite_tc_' afterwards in
`{inorder, ...}'. The benchmark seeds rand globally and picks a
random port, which is incompatible with running concurrently with
sibling tests.
- 3 consecutive runs all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`dev_arweave_offset:offset_item_cases_test/0' previously made five serial live-network calls to arweave.net via `assert_offset_item/4'. Use `hb_pmap:parallel_map/3' (already imported into the same style at `dev_copycat_arweave:1d03ba5f3') so the five fetches overlap. Wall time in that test drops proportionally to the slowest remote fetch.
No semantic change: each fetch still runs `assert_offset_item' with the same arguments and the same assertions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
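For readers unfamiliar with the pattern, a parallel map over a small list of independent fetches can be sketched as below. This is a generic illustration with a hypothetical `pmap/3`; the argument order and behaviour of `hb_pmap:parallel_map/3` are not shown in the commit message and may differ.

```erlang
-module(pmap_sketch).
-export([pmap/3]).

%% Run Fun over Items concurrently and return results in input order.
%% Waits up to TimeoutMs for each result in turn and exits with
%% `timeout` if one does not arrive.
pmap(Fun, Items, TimeoutMs) ->
    Parent = self(),
    Refs = [spawn_worker(Parent, Fun, Item) || Item <- Items],
    [await(Ref, TimeoutMs) || Ref <- Refs].

%% Each worker is linked, so a crash (for example a failed assertion)
%% also takes down the calling test process.
spawn_worker(Parent, Fun, Item) ->
    Ref = make_ref(),
    spawn_link(fun() -> Parent ! {Ref, Fun(Item)} end),
    Ref.

await(Ref, TimeoutMs) ->
    receive
        {Ref, Result} -> Result
    after TimeoutMs ->
        exit(timeout)
    end.
```

With five independent fetches, total wall time approaches that of the slowest single fetch instead of the sum of all five.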
`http_get_schedule_tc_' previously did:
?assertMatch({ok, #{<<"current">> := 3}}, http_get_slot(Node, PMsg)),
?debug_wait(100),
{ok, Schedule} = http_get_schedule(...)
The `assertMatch' already observes `current == 3', which means every
write-side effect of the three prior POSTs has landed -- the schedule
is readable without an extra 100 ms sleep. Removed. 5 consecutive
runs all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The workers spawned by `spawn_test_workers(random)' exit after 500 ms and the `hb_name' cleanup reaper unregisters them shortly after. The test was sleeping a flat second before asserting the table length. Poll with `hb_util:wait_until/2' (100 ms poll, 2 s timeout) so the test wakes as soon as cleanup finishes rather than always paying 1 s. 3 consecutive runs pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix: redact more events
fix(device): package preloaded store in release
This reverts commit 02883d9.
chore: removes SNP and Green Zone support, migrating to `os` repo
chore: bump version number for start of 0.10 series.
Enables defining `match-type` directives (`strict`, `primary`, `only-present`) within nested map structures. This provides more granular control over message comparison.
feat: Support flexible, nested matching modes in `hb_message`
Improves safety in device separation and removes potential for OOM.
…xclusive impr: terminate early when forge bootstrap lacks required device
perf: export trie reserved-keys helpers
…gnore fix: ignore hyperbeam rt dirs in forge template
Fix arweave http call by adding bundle true
…20260518 Introduce hint-device
…} envelope
Field report + diagnosis: see companion issue.
The v0.9 HTTP resolve path delivers RawSlot to compute_cached/3 as the
AO-Core message envelope #{<<"ao-result">> => <<"body">>, <<"body">> => <<"14040">>}
instead of as the bare binary <<"14040">> the existing
'hb_util:int(RawSlot)' call expects. hb_util:int/1 has only is_binary
| is_list | is_integer clauses, so the BEAM raises function_clause on
the map, OTP restarts the worker, and every subsequent /compute request
crashes the same way. The substrate is effectively offline for reads.
Add a single defensive clause at the top of compute_cached/3 that
unwraps the envelope's body key and recurses through the existing
clauses. Bare-binary callers are unaffected; wrapped callers now
land on hb_util:int(<<"14040">>) which the existing is_binary
clause already handles. maps:find/2 returning error falls through
to false, matching compute_cached/3's existing semantics for
'not cached'.
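The shape of the defensive clause described above might look like the sketch below. It is a simplified stand-in, not the patch itself: `slot/1` is a hypothetical function that only models the slot normalization, the match on the `ao-result` key is an assumption, and `binary_to_integer/1` stands in for `hb_util:int/1`.

```erlang
-module(unwrap_sketch).
-export([slot/1]).

%% Envelope clause first: pull out the body and recurse so the
%% existing clauses below handle the bare value.
slot(#{<<"ao-result">> := <<"body">>} = Envelope) ->
    case maps:find(<<"body">>, Envelope) of
        {ok, Body} -> slot(Body);
        error -> false          % treated like "not cached"
    end;
slot(not_found) ->
    false;
%% Pre-existing behaviour for bare values (binary or integer slots).
slot(Bin) when is_binary(Bin) ->
    binary_to_integer(Bin);
slot(Int) when is_integer(Int) ->
    Int.
```

Placing the map clause ahead of the others means bare-binary callers never reach it, while wrapped callers end up on the same binary path as before.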
Validated against production node serving custom (non-aos) Lua AO
process (Golden Codex Aeternum Registrar PID
Dwnuy4MbuQkgwxw4-P08wxeny2KcwCh8Kd22mehacTc). Pre-patch: 100% of
/compute requests crash. Post-patch: function_clause does not recur.
The unwrap may belong upstream in dev_process:target_slot/2 — happy
to follow whichever seam the maintainers prefer.
Co-Authored-By: Claude (Anthropic) <noreply@anthropic.com>
Companion to #942.
Summary
Adds one clause to `compute_cached/3` in `src/preloaded/process/dev_process_worker.erl` that unwraps the AO-Core message envelope (`#{<<"ao-result">> => <<"body">>, <<"body">> => <<"14040">>}`) when the v0.9 HTTP resolve path delivers `RawSlot` wrapped instead of as a bare binary. Without this, every `/compute` request on a production node running v0.9-FINAL crashes via `function_clause` in `hb_util:int/1` (no map clause). Full diagnosis + stacktrace in #942.
Change
`src/preloaded/process/dev_process_worker.erl`, inserted before the existing `compute_cached(ProcID, not_found, Opts) ->` head:
Rationale
`hb_util:int(Body)` then sees the binary the existing `is_binary` clause handles. `maps:find/2` returning `error` falls through to `false`, matching `compute_cached/3`'s existing "not cached" semantics.
If maintainers prefer the unwrap upstream in `dev_process:target_slot/2` (closer to where the envelope is introduced), happy to rework.
Validation
Applied against `permaweb/HyperBEAM/main` HEAD on a production node serving a custom (non-aos) Lua process at PID `Dwnuy4MbuQkgwxw4-P08wxeny2KcwCh8Kd22mehacTc`. Pre-patch: 100% of `/compute` requests crash with `function_clause`. Post-patch: `function_clause` does not recur.
The two related v0.9 bugs reported in #942 (LMDB `mdb_page_dirty` assertion against existing cache, `loadMessages` "Body is not valid" against a fresh cache) are out of scope here.
Follow-ups welcome
`dev_process:target_slot/2`.
`loadMessages` envelope issues point at one upstream seam we could fix together.
-- Tad MacPherson, Metavolve Labs / Golden Codex
curator@golden-codex.com · @codex-curator · golden-codex.com