MILAB-6319: canonical-encode createJsonResource + trace string for deterministic CIDs#1662
MILAB-6319: canonical-encode createJsonResource + trace string for deterministic CIDs#1662PaulNewling wants to merge 2 commits into
Conversation
…c RTYPE_JSON CIDs
🦋 Changeset detectedLatest commit: dc18208 The changes in this PR will be included in the next version bump. This PR includes changesets to release 10 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
Code Review
This pull request addresses non-deterministic RTYPE_JSON resource CIDs by updating createJsonResource in smart.lib.tengo to use canonical.encode (which ensures key-sorted, stable serialization) instead of json.encode. Tests have been added in smart.test.tengo to verify that both flat and nested maps are serialized canonically. Feedback is provided to remove an unused import of the json module in the test file.
| canonical := import(":canonical") | ||
| json := import("json") |
❌ 39 Tests Failed:
View the full list of 39 ❄️ flaky test(s)
To view more test analytics, go to the Test Analytics Dashboard |
Second instance of the non-canonical-encoding trap (re-applies ef340dbf9). makeTrace built the pl7.app/trace annotation via json.encode of step maps (randomized Go-map order). The string is embedded inside a spec resource, so non-canonical inner key order escapes the backend's outer canonicalization and poisons consumer CIDs -> CIDConflictError. Switch to canonical.encode; add trace + canonical regression tests. Updated the changeset to cover both canonical fixes.
| encoded := canonical.encode(value) | ||
|
|
||
| return createValueResource(constants.RTYPE_JSON, encoded) |
There was a problem hiding this comment.
Other resource-creation sites still use
json.encode directly
Several callers in the SDK bypass createJsonResource entirely and call smart.createValueResource (or tx.createValue) with a raw json.encode(...) output, leaving those resource types equally non-deterministic. For example, workflow/bobject.lib.tengo:49 does smart.createValueResource(constants.RTYPE_BOBJECT_SPEC, json.encode(spec)), and exec/runcmd.lib.tengo:773-776 creates RTYPE_RUN_COMMAND_OPTIONS and RTYPE_RUN_COMMAND_ARGS the same way. Any map-typed spec or options struct at those sites will produce non-stable CIDs the same way createJsonResource did before this fix. These are outside this PR's stated scope but worth tracking as follow-up.
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/workflow-tengo/src/smart.lib.tengo
Line: 1514-1516
Comment:
**Other resource-creation sites still use `json.encode` directly**
Several callers in the SDK bypass `createJsonResource` entirely and call `smart.createValueResource` (or `tx.createValue`) with a raw `json.encode(...)` output, leaving those resource types equally non-deterministic. For example, `workflow/bobject.lib.tengo:49` does `smart.createValueResource(constants.RTYPE_BOBJECT_SPEC, json.encode(spec))`, and `exec/runcmd.lib.tengo:773-776` creates `RTYPE_RUN_COMMAND_OPTIONS` and `RTYPE_RUN_COMMAND_ARGS` the same way. Any map-typed spec or options struct at those sites will produce non-stable CIDs the same way `createJsonResource` did before this fix. These are outside this PR's stated scope but worth tracking as follow-up.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
What this fixes
Two non-canonical JSON encodings in
workflow-tengoproduced non-deterministic resource CIDs — causingCIDConflictErrorand silent cross-project dedup misses:createJsonResourceencoded viajson.encode, whose Go-map iteration order varies per render, so the same logical value yielded a differentRTYPE_JSONCID each time. This affects every caller, including the SDK-internal params built insidepframes.processColumnandxsv.importFile.makeTrace(pframes/spec.lib.tengo) built thepl7.app/traceannotation viajson.encodeof step maps. That string sits inside a spec resource, so its inner key order escapes the backend's outer canonicalization; a randomized order poisoned the CID of every spec consumer.Both now use
canonical.encode(recursively key-sorted). The decoded values are unchanged; only the byte order becomes deterministic, so CIDs are stable across renders.How it was tested
canonical.test.tengo,spec.test.tengo): assertjson.encodeis non-deterministic across repeated calls whilecanonical.encodeis stable and key-sorted (flat and nested), and that the trace value-string is deterministic. The fullworkflow-tengosuite passes.mixcr-amplicon-alignment, both encodings droveCIDConflictErrorin itswf.test.ts; with these two fixes, the createJsonResource- and trace-driven conflicts no longer occur on a fresh backend.Risks and downsides
createJsonResourcecaller across the SDK and all blocks now gets canonical bytes — intended (it closes the trap everywhere), but a behavior change for all consumers.canonical.encodesorts keys recursively, slightly slower thanjson.encode; negligible for the small params/trace maps on the hot path.Scope note
A separate, related CID-determinism issue — the
hash_override+saveStdoutStreampattern in MiXCR-family blocks, where rate-dependent stdout poisons the dedup CID — is not addressed here. It needs a different fix (an SDK stdout-only-capture primitive) and is tracked separately.Greptile Summary
This PR fixes two CID non-determinism bugs in
workflow-tengoby replacingjson.encode(Go map iteration order is randomized per call) withcanonical.encode(recursively key-sorted, RFC 8785-style) increateJsonResourceandmakeTrace. The decoded values are unchanged; only the byte order becomes deterministic, so resource CIDs are now stable across renders.smart.lib.tengo:createJsonResourcenow callscanonical.encode, closing the trap for everyRTYPE_JSONresource including SDK-internal params inpframes.processColumnandxsv.importFile.pframes/spec.lib.tengo:makeTracenow canonical-encodes thepl7.app/traceannotation string, preventing its non-deterministic inner key order from poisoning the CID of parent spec resources.canonical.test.tengo,spec.test.tengo,smart.test.tengo): regression tests confirmjson.encodeis non-deterministic across 200 calls whilecanonical.encodeproduces stable, lexicographically sorted output for both flat and nested maps.Confidence Score: 4/5
The two targeted fixes are correct and well-tested; the main caveat is a known one-time CID recompute on upgrade, which is unavoidable and documented in the changeset.
The changes are minimal and backed by thorough regression tests pinning exact sorted-key output. Several other
createValueResourcecall sites inworkflow/bobject.lib.tengo,exec/runcmd.lib.tengo, andpframes-rs.lib.tengostill usejson.encodedirectly, so the same class of non-determinism survives in those resource types and warrants a follow-up pass.The remaining
json.encodeusages for direct resource creation inworkflow/bobject.lib.tengoandexec/runcmd.lib.tengoare worth a second look, since they bypass the now-fixedcreateJsonResource.Important Files Changed
createJsonResourcefromjson.encodetocanonical.encode; fix is correct and well-scoped, though several sibling sites that callcreateValueResourcedirectly withjson.encoderemain unfixed.makeTracetrace-string encoding fromjson.encodetocanonical.encode; thestring()cast around the result is now redundant (canonical.encode already returns a string for arrays) but harmless.json.encodeis non-deterministic andcanonical.encodeis stable across 200 calls for both flat and nested maps; good coverage.TestMakeTraceValueStrIsDeterministicandTestMakeTraceValueStrIsSortedKeysto pin the exact sorted-key output ofmakeTrace.valueStr; solidly guards the MILAB-6319 regression.canonical.encodecontract forcreateJsonResourcewith golden-string assertions for flat and nested maps.Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[createJsonResource / makeTrace called] --> B{Encode value} B -- before fix --> C[json.encode\nGo map iteration = random order] B -- after fix --> D[canonical.encode\nkeys sorted lexicographically] C --> E[Non-deterministic bytes] D --> F[Deterministic bytes] E --> G[Different CID per render\nCIDConflictError / dedup miss] F --> H[Stable CID across renders] H --> I[tx.createValue stores resource] G --> J[Bug] I --> K[Fixed]Prompt To Fix All With AI
Reviews (1): Last reviewed commit: "MILAB-6319: canonicalize trace string in..." | Re-trigger Greptile