fix: handle ACM AdminSetup silent-close, Setup fallback, and TLS flag…#1295
fix: handle ACM AdminSetup silent-close, Setup fallback, and TLS flag…#1295nbmaiti wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves AMT provisioning reliability (especially for TLS/LMS flows) by handling AdminSetup silent socket closes, adding a Setup→AdminSetup fallback path, and propagating a “local TLS enforced” flag through CLI → RPS payloads.
Changes:
- Add
localTlsEnforcedto the RPS message payload and thread the flag through command contexts and RPS requests. - Update the RPS executor to (a) synthesize a success response when AdminSetup returns no data, (b) attempt AdminSetup fallback when Setup returns NotSupported, and (c) add targeted diagnostics for CommitChanges 401s.
- Rework LMS
Listen()to read until a complete HTTP response (chunked / Content-Length) and add tests for the HTTP completeness check.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/utils/constants.go | Adds LMSReadIdleTimeout used by LMS read-loop behavior. |
| internal/rps/message.go | Adds LocalTLSEnforced to JSON payload sent to RPS. |
| internal/rps/message_test.go | Adds coverage ensuring localTlsEnforced is always present in request JSON. |
| internal/rps/executor.go | Adds AdminSetup silent-close handling, Setup→AdminSetup fallback, CommitChanges auth reset/diagnostics, and helper parsers/builders. |
| internal/rps/executor_test.go | New test file (currently empty) intended for executor coverage. |
| internal/lm/service.go | Updates LMS Listen() to buffer until “complete” HTTP response; adds read-deadline logic and HTTP completeness helper. |
| internal/lm/service_test.go | Adds unit tests for isCompleteHTTPResponse. |
| internal/commands/shared.go | Adds LocalTLSEnforced to shared command context. |
| internal/commands/deactivate.go | Propagates TLS/control-mode flags into remote deactivation RPS requests. |
| internal/commands/activate/remote.go | Propagates LocalTlsEnforced into activation RPS request. |
| internal/commands/activate/activate.go | Ensures context is updated with control-mode / local TLS enforcement before remote activation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
a1ddef5 to
aaaa919
Compare
There was a problem hiding this comment.
Few important things happening here that should not be. RPC-go should never be manipulating or inspecting wsman messages or making decisions off of them for RPS flows. That all has to be removed. Additionally there is duplicate flags now -- we already handle TLSEnforced, not sure why it has been added twice. Theres an empty cert file that needs to be removed. The control mode should already be available via AMTBaseCmd for activate/deactivate/configure . As it stands the version you tested with beta.23 is over two weeks old. You should be using main always -- same with RPS. I do not see the same failures. Please try again ensuring you have the latest code, and then lets take each failure and take a look at whats going on so we can address each systematically. Thanks !
is enforced When the device has LocalTLSEnforced set, RPS owns the TLS session end-to-end through its TLSTunnelManager. rpc-go must forward raw TCP bytes; layering its own TLS client on top causes a TLS-in-TLS handshake that AMT rejects. * lm/service.go: add a 'tlsTunnel' flag on LMSConnection with SetTLSTunnelMode and ActivateTLSTunnelMode helpers so callers can mark the connection as a raw-bytes tunnel without needing to know the field layout. * rps/executor.go: in NewExecutor, when LocalTlsEnforced is true, build the LMS connection with tunnel mode enabled and mark the executor as already switched. Also auto-switch on the first MethodTLSData if RPS skipped port_switch (v2.x protocol), and have handlePortSwitch dial plain TCP + tag the new connection as tunnel mode. Signed-off-by: Nabendu Maiti <nabendu.bikash.maiti@intel.com>
ReturnValue errors MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit AMT's IPS_HostBasedSetupService/AdminSetup, AMT_SetupAndConfigurationService/ Unprovision, and /transfer/Delete close the LMS socket without sending an HTTP response when they succeed. RPS interprets the dangling read as a failure and aborts the activation. Additionally, in TLS-tunnel mode TLS 1.3 has handshake rounds that legitimately produce zero AMT-side bytes — the Listen loop must not treat those quiet rounds as a fatal close. * lm/service.go: rework Listen() to always deliver the read buffer (including empty), gate the typed ErrLMSReadTimeoutNoData signal on tlsTunnel mode so non-tunnel callers don't see noise on a benign silent close, and let non-tunnel mode loop on timeout-no-data until bytes appear or an error fires. * rps/executor.go: classify the outgoing payload via isAdminSetupRequest. On an empty response: synthesize a SOAP success in LME mode, HECI-verify control mode in LMS mode before synthesizing, and skip synthesis in tunnel mode (RPS expects raw tunnel bytes, not XML). On a non-empty Setup/ AdminSetup response, parse <g:ReturnValue> and surface a typed error for non-zero values (rv=1 NotSupported is common when CCM is disabled). Signed-off-by: Nabendu Maiti <nabendu.bikash.maiti@intel.com>
response MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two end-of-conversation problems left rpc-go and RPS waiting on each other: * CommitChanges sometimes returned '401 Unauthorized' because the existing digest-auth context from prior Setup ops still had a stale nonce. Closing the LMS connection before the CommitChanges request forces a fresh auth negotiation. When 401 still comes back, surface it as a typed error instead of letting RPS time out. * In TLS-tunnel mode, AMT issues a TCP close (FIN/RST) right after the final WSMAN response (CommitChanges, last cert op, etc.). RPS waits for a MethodConnectionReset signal before emitting terminal success; without it the websocket sits idle until RPS times out at ~60s and closes with code 1006 — even though the device is already provisioned. * lm/service.go: add PeekClose(peekTimeout) that does a bounded read to detect EOF/RST and returns any trailing bytes (e.g. TLS close_notify) so the caller can forward them before signalling RPS. * rps/executor.go: add isCommitChangesRequest classification + close the LMS connection up front for CommitChanges; surface 401 Unauthorized in the CommitChanges path as a typed error; after each tunnel response, PeekClose for 150ms and emit MethodConnectionReset on close. Also add isProvisionedAfterUnexpectedRPSClose: when the websocket drops without terminal status in tunnel mode, verify via HECI that the device is actually provisioned before declaring failure. Signed-off-by: Nabendu Maiti <nabendu.bikash.maiti@intel.com>
latency MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two timing knobs were paying a multi-second tax on every activation: * lm/service.go: in Listen()'s TLS branch the subsequent-read timeout was set to 2s. Once AMT starts streaming bytes for a response the remaining packets arrive within milliseconds, so the second timeout fires on every single round-trip — for an ACM/CCM flow with ~50 round-trips this added ~100s of pure idle. Drop it to 100ms; keep the first-byte timeout at 2s so AMT still has time to start producing a TLS record. * commands/base.go: GetControlMode retried on HECI busy with a flat 4s backoff for up to 4 attempts, so a transient busy at startup cost 12s before the command could begin. Use exponential backoff starting at 500ms, capped at 4s; the worst-case 3-attempt total drops to 3.5s and a typical one-retry case to 500ms. * rps/executor.go: cosmetic-only — expand the empty-tunnel-response branch doc comment to describe RPS's Phase-2 expectations and add a //nolint rationale for the best-effort connection_reset Send. No behavior change. Signed-off-by: Nabendu Maiti <nabendu.bikash.maiti@intel.com>
| tlsTunnelActive: config.LocalTlsEnforced, | ||
| switchedToTunnel: config.LocalTlsEnforced, // already in plain-TCP tunnel mode from the start |
| e.isCommitChanges = isCommitChangesRequest(msg.Payload) | ||
| if e.isCommitChanges { | ||
| log.Debug("CommitChanges detected - closing any existing connections to force fresh auth") | ||
| e.localManagement.Close() |
| soapBody := "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + | ||
| "<a:Envelope xmlns:a=\"http://www.w3.org/2003/05/soap-envelope\" xmlns:b=\"http://schemas.xmlsoap.org/ws/2004/08/addressing\" xmlns:c=\"http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd\" xmlns:g=\"http://intel.com/wbem/wscim/1/ips-schema/1/IPS_HostBasedSetupService\">" + | ||
| "<a:Header>" + | ||
| "<b:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</b:To>" + | ||
| "<b:RelatesTo>" + messageID + "</b:RelatesTo>" + | ||
| "<b:Action a:mustUnderstand=\"true\">http://intel.com/wbem/wscim/1/ips-schema/1/IPS_HostBasedSetupService/" + actionName + "Response</b:Action>" + |
All AMT and TLS are working.
Detail explanation of change in wiki
Summary
Fixes ACM activation edge cases and payload propagation:
Validation
Test Result
Index | Mode | AMT Ver | ccm_local | Remote | Overall | acm_p (S/Act/Deact) | ccm_p (S/Act/Deact)
-------+------+---------+-----------+--------+---------+---------------------+---------------------
1 | lms | 16.1.27 | PASS | FAIL | FAIL | FAIL / 20s / 1s | FAIL / 1s / 1s
-------+------+---------+-----------+--------+---------+---------------------+---------------------
2 | lms | 18.1.18 | PASS | FAIL | FAIL | FAIL / 20s / 2s | FAIL / 3s / 2s
-------+------+---------+-----------+--------+---------+---------------------+---------------------
3 | lms | 19.0.5 | PASS | FAIL | FAIL | FAIL / 20s / 9s | FAIL / 22s / 9s
-------+------+---------+-----------+--------+---------+---------------------+---------------------
4 | lms | 20.0.5 | PASS | FAIL | FAIL | FAIL / 19s / 9s | FAIL / 19s / 9s
-------+------+---------+-----------+--------+---------+---------------------+---------------------
5 | lms | 21.0.2 | PASS | FAIL | FAIL | FAIL / 20s / 9s | FAIL / 20s / 9s
Index | Mode | AMT Ver | ccm_local | Remote | Overall | acm_p (S/Act/Deact) | ccm_p (S/Act/Deact)
-------+------+---------+-----------+--------+---------+---------------------+---------------------
1 | lms | 16.1.27 | PASS | PASS | PASS | PASS / 25s / 1s | PASS / 23s / 1s
-------+------+---------+-----------+--------+---------+---------------------+---------------------
2 | lms | 18.1.18 | PASS | PASS | PASS | PASS / 28s / 2s | PASS / 26s / 2s
-------+------+---------+-----------+--------+---------+---------------------+---------------------
3 | lms | 19.0.5 | PASS | PASS | PASS | PASS / 53s / 3s | PASS / 49s / 3s
-------+------+---------+-----------+--------+---------+---------------------+---------------------
4 | lms | 20.0.5 | PASS | PASS | PASS | PASS / 48s / 2s | PASS / 44s / 2s
-------+------+---------+-----------+--------+---------+---------------------+---------------------
5 | lms | 21.0.2 | PASS | PASS | PASS | PASS / 61s / 4s | PASS / 54s / 4s