Skip to content

fix(remote): preserve caller info for default server decode errors#1967

Draft
ChocoLZS wants to merge 4 commits into
cloudwego:mainfrom
ChocoLZS:fix/log-info-in-decode-phase
Draft

fix(remote): preserve caller info for default server decode errors#1967
ChocoLZS wants to merge 4 commits into
cloudwego:mainfrom
ChocoLZS:fix/log-info-in-decode-phase

Conversation

@ChocoLZS

@ChocoLZS ChocoLZS commented Jun 9, 2026

Copy link
Copy Markdown
Member

What type of PR is this?

fix

Check the PR title.

  • This PR title match the format: (optional scope):
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Attach the PR updating the user documentation if the current PR requires user awareness at the usage level. User docs repo

(Optional) Translate the PR title into Chinese.

修复默认服务端 decode 错误日志中调用方信息丢失的问题

(Optional) More detailed description for this PR(en: English/zh: Chinese).

en:
This is a more complete fix based on PR #1272 for the logging issue reported in #1267.

The issue is limited to the default server handler path, which is reused by gonet, netpoll, and invoke. Before PR #1272, decode/read errors were logged inside OnRead, so request rpcinfo was still available. PR #1272 moved the final log to the outer fallback OnError to avoid duplicate error logs, but that fallback runs too late for request rpcinfo:

  • With rpcinfo pool enabled, OnRead resets the reused rpcinfo in deferred cleanup before the outer fallback log.
  • With rpcinfo pool disabled, OnRead creates a request ctx/rpcinfo, but only returns an error. The outer fallback still uses the connection ctx returned by OnActive.
// outer trans server
ctx, err = ts.transHdlr.OnActive(ctx, conn)
err = ts.transHdlr.OnRead(ctx, conn)
if err != nil {
    ts.transHdlr.OnError(ctx, err, conn) // fallback log after OnRead returns
}

// default_server_handler.go
func (t *svrTransHandler) OnRead(ctx context.Context, conn net.Conn) (err error) {
    ctx, ri := t.newCtxWithRPCInfo(ctx, conn)
    defer func() {
        if rpcinfo.PoolEnabled() {
            t.opt.InitOrResetRPCInfoFunc(ri, conn.RemoteAddr()) // reset before fallback log
        }
    }()

    ctx, err = t.transPipe.Read(ctx, conn, recvMsg)
    if err != nil {
        return err
    }
}

This PR restores the correct logging timing by logging decode/read errors inside OnRead, while request rpcinfo is still available. It also keeps PR #1272's duplicate-log optimization by returning an unwrap-compatible marker:

if err != nil {
    t.writeErrorReplyIfNeeded(ctx, recvMsg, conn, err, ri, true, true)
    t.OnError(ctx, err, conn)
    return reportedError{err}
}

type reportedError struct{ error }
func (e reportedError) Unwrap() error { return e.error }

The returned error remains non-nil, so the outer trans server still closes the connection as before. Unwrap preserves errors.Is/As, and the marker only suppresses the second fallback log.

Checked other ServerTransHandler implementations: netpollmux already logs read/write errors before rpcinfo reset; nphttp2 and ttstream use separate stream handling flows; detection only delegates to the selected handler. Business handler panics are also unaffected because they are recovered and logged inside OnRead before deferred cleanup.

A regression test is added for both rpcinfo pool modes to verify that read errors are logged with remoteService and that the outer fallback log is skipped.

zh(optional):
这是基于 PR #1272 的更完整修复,用于解决 #1267 中报告的日志问题。

该问题仅限于默认服务端 handler 路径,也就是 gonet、netpoll、invoke 复用的路径。PR #1272 之前,decode/read 错误是在 OnRead 内部打印的,因此还能拿到请求级 rpcinfo。PR #1272 为了解决重复错误日志问题,把最终日志交给外层兜底 OnError 打印,但兜底打印对请求级 rpcinfo 来说太晚了:

  • 开启 rpcinfo pool 时,OnRead 的 deferred cleanup 会在外层兜底打印前 reset 复用的 rpcinfo。
  • 关闭 rpcinfo pool 时,OnRead 会创建 request ctx/rpcinfo,但只返回 error;外层兜底仍然使用 OnActive 返回的 connection ctx。
// outer trans server
ctx, err = ts.transHdlr.OnActive(ctx, conn)
err = ts.transHdlr.OnRead(ctx, conn)
if err != nil {
    ts.transHdlr.OnError(ctx, err, conn) // fallback log after OnRead returns
}

// default_server_handler.go
func (t *svrTransHandler) OnRead(ctx context.Context, conn net.Conn) (err error) {
    ctx, ri := t.newCtxWithRPCInfo(ctx, conn)
    defer func() {
        if rpcinfo.PoolEnabled() {
            t.opt.InitOrResetRPCInfoFunc(ri, conn.RemoteAddr()) // reset before fallback log
        }
    }()

    ctx, err = t.transPipe.Read(ctx, conn, recvMsg)
    if err != nil {
        return err
    }
}

本 PR 恢复正确的打印时机:在 OnRead 内部、请求级 rpcinfo 仍然可用时打印 decode/read 错误。同时保留 PR #1272 的去重效果:返回一个支持 Unwrap 的 marker error,用来标记该错误已经打印过:

if err != nil {
    t.writeErrorReplyIfNeeded(ctx, recvMsg, conn, err, ri, true, true)
    t.OnError(ctx, err, conn)
    return reportedError{err}
}

type reportedError struct{ error }
func (e reportedError) Unwrap() error { return e.error }

返回错误仍然非 nil,外层 trans server 仍会按原逻辑关闭连接;Unwrap 保留 errors.Is/As 行为;marker 只用于跳过第二条兜底日志。

已检查其他 ServerTransHandler 实现:netpollmux 已经在 rpcinfo reset 前打印 read/write 错误;nphttp2 和 ttstream 使用独立的 stream 处理流程;detection 只负责转发到选中的 handler。业务 handler panic 也不受影响,因为 panic 会在 OnRead 内部、deferred cleanup 前被 recover 并打印。

新增回归测试,覆盖 rpcinfo pool 开启和关闭两种模式,验证 read error 日志包含 remoteService,且外层兜底日志会被跳过。

(Optional) Which issue(s) this PR fixes:

Fixes #1267

(optional) The PR that updates user documentation:

@ChocoLZS ChocoLZS requested review from a team as code owners June 9, 2026 12:05
@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 51.87%. Comparing base (d27f7db) to head (a69bd67).
⚠️ Report is 2 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (d27f7db) and HEAD (a69bd67). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (d27f7db) HEAD (a69bd67)
unit 1 0
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1967       +/-   ##
===========================================
- Coverage   62.84%   51.87%   -10.98%     
===========================================
  Files         394      332       -62     
  Lines       30267    23337     -6930     
===========================================
- Hits        19022    12105     -6917     
+ Misses       9948     9905       -43     
- Partials     1297     1327       +30     
Flag Coverage Δ
integration 51.87% <ø> (+0.01%) ⬆️
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ChocoLZS ChocoLZS marked this pull request as draft June 10, 2026 05:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

svrTransHandler.OnError will be called twice when some error occured in svrTransHandler.OnRead

1 participant