Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
104 changes: 104 additions & 0 deletions docs/manual-test-plan.md
Original file line number Diff line number Diff line change
Expand Up @@ -1502,6 +1502,110 @@ Expected: per-entry findings count exactly, valid forms produce zero findings, a

Regression anchor: the no-overlap unit test `TestMultidocNetAddrHandlers_NoOverlapWithRefHandlers` pins the dispatch-map disjointness contract. A future entry that lands in BOTH `multidocHandlers` and `multidocNetAddrHandlers` produces double findings — verify via the unit suite before manual smokes.

## N. Output discipline & error diagnostics

These scenarios pin two operator-facing contracts: (1) stdout stays clean enough to be piped into another file without filtering progress text out; (2) when a `lookup` against the live Talos API fails, the error chain names the dialed endpoint, the resource kind, and a class-specific remedy.

### N.1 stdout cleanliness — `talm template --file X > Y`

```bash
cd $PROJECT
talm template --offline -f nodes/node0.yaml > /tmp/rendered.yaml
head -1 /tmp/rendered.yaml
# Expected: the first line is `# talm: nodes=[...]` (rendered modeline comment), NOT `- talm: file=...`.
# A leading `- talm:` would mean the per-file progress line is leaking into stdout.

# Same redirect without --offline against a reachable node:
talm template -f nodes/node0.yaml > /tmp/rendered.yaml
head -1 /tmp/rendered.yaml
# Expected: same `# talm: nodes=[...]` first line; progress is on stderr only.

# Confirm progress lands on stderr:
talm template --offline -f nodes/node0.yaml 1>/dev/null 2>/tmp/stderr.txt
grep '^- talm: file=' /tmp/stderr.txt
# Expected: one matching line per --file argument; stderr is the only channel that carries it.
```

Run the same shape against `talm apply --dry-run -f nodes/node0.yaml > /tmp/rendered.yaml`. Note that `talm apply` writes nothing to stdout in normal mode (only `--debug` emits the recipe stream); the assertion here is **defensive** — verifying that nothing new starts leaking onto stdout in the future, not filtering an existing stream. The progress line must still go to stderr; assert stdout is empty and stderr carries `- talm: file=...`.

### N.2 Lookup error diagnostics — node unreachable

Run with `--endpoints` pointing at an address that is reachable on TCP but does not run Talos (or a local closed port), and WITHOUT `--offline`:

```bash
talm template -f nodes/node0.yaml --endpoints 192.0.2.123 || true
# Expected error chain MUST contain:
# - `looking up resource kind="disks" namespace="" id="" on endpoints=[192.0.2.123]`
# - a class hint mentioning the dialed endpoint (192.0.2.123)
# - one of the remedies depending on class:
# * TLS handshake failure → mentions "cert SANs" / "maintenance mode"
# * connection refused → mentions "firewall" / "wrong port"
# * timeout → mentions "timed out"
# * authn rejection → mentions "talosconfig"
# - a closing sentence pointing at `--offline` as the safe escape ("If live discovery is not required, pass --offline").
```

Repeat with `talm apply --dry-run` against the same unreachable endpoint:

```bash
talm apply --dry-run -f nodes/node0.yaml --endpoints 192.0.2.123 || true
# Expected: same class hint shape, but the remedy clause MUST say
# "Fix node reachability before re-running apply." and MUST NOT mention `--offline`
# (apply has no --offline flag — suggesting it would teach a non-existent workflow).
```

### N.3 Lookup retry policy — transient connectivity

Bring up a node, then drop its API briefly mid-template (e.g. `systemctl stop apid` for ~300ms on a Talos test node, or use `iptables -A INPUT -p tcp --dport 50000 -j DROP` for half a second on a non-prod target). The expectation is that `talm template -f nodes/node0.yaml` succeeds on the second or third attempt without the operator seeing an error — retry budget is 3 attempts with 200ms/400ms exponential backoff (~600ms total).

```bash
# In one shell:
talm template -f nodes/node0.yaml > /tmp/rendered.yaml

# In another (within ~300ms of the first):
# Briefly block the talos API:
ssh ${NODE} iptables -A INPUT -p tcp --dport 50000 -j DROP
sleep 0.3
ssh ${NODE} iptables -D INPUT -p tcp --dport 50000 -j DROP
```

Expected: template render succeeds after retry (1 to 2 retried attempts visible only if `--debug` is on); operator sees the rendered output, no error message.

Same shape against a node that stays down longer than the budget:

```bash
ssh ${NODE} iptables -A INPUT -p tcp --dport 50000 -j DROP
talm template -f nodes/node0.yaml || true
ssh ${NODE} iptables -D INPUT -p tcp --dport 50000 -j DROP
```

Expected: failure after ~600ms+ (3 attempts × backoff); operator sees the `connection refused` class hint pointing at firewall / port / `--offline`.

Comment thread
coderabbitai[bot] marked this conversation as resolved.
Outdated
Ctrl+C should interrupt retries instantly:

```bash
ssh ${NODE} iptables -A INPUT -p tcp --dport 50000 -j DROP
talm template -f nodes/node0.yaml &
sleep 0.1
kill -INT $!
ssh ${NODE} iptables -D INPUT -p tcp --dport 50000 -j DROP
```

Expected: `talm` exits within ~100ms, not after the full 600ms retry budget.

Comment thread
coderabbitai[bot] marked this conversation as resolved.
Outdated
### N.4 Lookup error diagnostics — non-network failure

Forge a chart that calls `lookup` for an unknown resource kind (e.g. add `{{- (lookup "nonexistent_resource_kind" "" "").items }}` to a helper template), then run `talm template -f nodes/node0.yaml` against a reachable Talos node:

```bash
talm template -f nodes/node0.yaml || true
# Expected error chain MUST contain:
# - `looking up resource kind="nonexistent_resource_kind" ...`
# - hint mentioning "chart bug" / "unsupported Talos resource"
# - `talosctl get nonexistent_resource_kind` as the verification step
# - MUST NOT suggest `--offline` (would mask the chart bug by rendering against empty discovery).
```

## Sanity-check block

Run after every destructive section (E, F, H, and anything that touches `--mode=reboot` / `--mode=staged` / `apply -I`):
Expand Down
22 changes: 12 additions & 10 deletions pkg/commands/apply.go
Original file line number Diff line number Diff line change
Expand Up @@ -42,10 +42,12 @@ import (
const parentDir = ".."

// applyCommandName labels this subcommand inside engine.Options for
// FailIfMultiNodes error wording. Centralised so the template
// rendering options block and any test asserting against the field
// share a single canonical value.
const applyCommandName = "talm apply"
// FailIfMultiNodes error wording and for the engine's apply-vs-template
// hint branching. Aliased to engine.CommandNameApply so the engine
// constant remains the single source of truth — drift between the
// two would let apply error hints silently start suggesting the
// non-existent --offline flag.
const applyCommandName = engine.CommandNameApply

// roleAnchor and roleSidePatch label the role of a config file in
// the apply chain for operator-facing error messages
Expand Down Expand Up @@ -417,12 +419,11 @@ func applyOneFileTemplateMode(configFile string, sidePatches, modelineTemplates
// dominant invocation shape is `talm apply -f nodes/<name>.yaml`
// without side-patches; reporting `side-patches=[]` on every line
// was visible noise without operator value.
// Progress line goes to stderr. Apply normally writes nothing to stdout (only `--debug` emits the recipe stream); routing progress here matches the stdout-cleanliness contract already in effect for `talm template > file.yaml` and keeps the future `--debug` recipe stream uncontaminated.
if len(sidePatches) == 0 {
//nolint:forbidigo // CLI progress line surfaces the file-to-target mapping for the operator
fmt.Printf("- talm: file=%s, nodes=[%s], endpoints=[%s]\n", configFile, strings.Join(nodes, ","), strings.Join(GlobalArgs.Endpoints, ","))
fmt.Fprintf(os.Stderr, "- talm: file=%s, nodes=[%s], endpoints=[%s]\n", configFile, strings.Join(nodes, ","), strings.Join(GlobalArgs.Endpoints, ","))
} else {
//nolint:forbidigo // CLI progress line surfaces the file-to-target mapping for the operator
fmt.Printf("- talm: file=%s, side-patches=[%s], nodes=[%s], endpoints=[%s]\n", configFile, strings.Join(sidePatches, ","), strings.Join(nodes, ","), strings.Join(GlobalArgs.Endpoints, ","))
fmt.Fprintf(os.Stderr, "- talm: file=%s, side-patches=[%s], nodes=[%s], endpoints=[%s]\n", configFile, strings.Join(sidePatches, ","), strings.Join(nodes, ","), strings.Join(GlobalArgs.Endpoints, ","))
}

applyClosure := buildApplyClosure()
Expand Down Expand Up @@ -567,8 +568,8 @@ func applyOneFileDirectPatchMode(configFile, withSecretsPath string) error {
return err
}

//nolint:forbidigo // CLI progress line surfaces the file-to-target mapping for the operator
fmt.Printf("- talm: file=%s, nodes=[%s], endpoints=[%s]\n", configFile, strings.Join(targetNodes, ","), strings.Join(GlobalArgs.Endpoints, ","))
// Progress line goes to stderr; stdout is reserved for rendered output.
fmt.Fprintf(os.Stderr, "- talm: file=%s, nodes=[%s], endpoints=[%s]\n", configFile, strings.Join(targetNodes, ","), strings.Join(GlobalArgs.Endpoints, ","))

read := cosiVersionReader(c)

Expand Down Expand Up @@ -1010,6 +1011,7 @@ func buildApplyRenderOptions(modelineTemplates []string, withSecretsPath string)
Root: Config.RootDir,
TemplateFiles: resolvedTemplates,
CommandName: applyCommandName,
TalosEndpoints: append([]string(nil), GlobalArgs.Endpoints...),
}
}

Expand Down
41 changes: 41 additions & 0 deletions pkg/commands/contract_apply_command_name_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
// Copyright Cozystack Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// Contract: applyCommandName is the exact engine.CommandNameApply
// value. Drift between the two would cause apply error hints to
// silently start suggesting the non-existent `--offline` flag — the
// apply hint formatter in pkg/engine/lookup_classify.go branches on
// equality with engine.CommandNameApply to omit the `--offline`
// remedy clause.
//
// `applyCommandName` is declared as `= engine.CommandNameApply` in
// pkg/commands/apply.go so a Go compile error catches drift at the
// declaration site already. This test is the belt-and-suspenders
// safety net: a future refactor that re-introduces a literal would
// fail here too.

package commands

import (
"testing"

"github.com/cozystack/talm/pkg/engine"
)

func TestContract_ApplyCommandName_MatchesEngineConstant(t *testing.T) {
if applyCommandName != engine.CommandNameApply {
t.Errorf("applyCommandName=%q drifted from engine.CommandNameApply=%q — apply hints will silently suggest --offline",
applyCommandName, engine.CommandNameApply)
}
}
Loading
Loading