fix(security): harden auth system and fix run journal logic bug #2593

Open
WillemJiang wants to merge 2 commits into release/2.0-rc from review-2566-security

WillemJiang (Collaborator) commented Apr 27, 2026

Reviewed the security-related code from #2566 and found the issues below.

1. Logic Bug in RunJournal.on_chat_model_start (journal.py:144)

if not self._first_human_msg and not messages:

This condition enters the block when messages is empty (falsy), then attempts to iterate messages.reversed() over an empty list — a no-op. The first human message is never captured. Should be:

if not self._first_human_msg and messages:

This is the canonical place to extract the first human message per the module docstring, so this bug means run.input will always be empty.

2. bcrypt 72-byte Password Truncation (password.py)

No password length limit is enforced. bcrypt silently truncates at 72 bytes, so two passwords that differ only after byte 72 produce identical hashes. Consider either pre-hashing with SHA-256 or documenting this limitation.
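
To make the truncation concern concrete without depending on bcrypt itself, here is a minimal standard-library sketch. It assumes the pre-hash is base64-encoded SHA-256; the helper name `pre_hash`, the constant, and the sample passwords are made up for illustration.

```python
import base64
import hashlib

BCRYPT_MAX_BYTES = 72  # bcrypt only considers this many input bytes


def pre_hash(password: str) -> bytes:
    """Illustrative helper: base64(SHA-256(password)), always 44 bytes long."""
    return base64.b64encode(hashlib.sha256(password.encode("utf-8")).digest())


# Two passwords that differ only after byte 72.
pw_a = "a" * 72 + "X"
pw_b = "a" * 72 + "Y"

# Without pre-hashing, the first 72 bytes (all that would matter) are identical.
same_prefix = (
    pw_a.encode("utf-8")[:BCRYPT_MAX_BYTES] == pw_b.encode("utf-8")[:BCRYPT_MAX_BYTES]
)

# With pre-hashing, the inputs differ and fit well inside the 72-byte limit.
digest_a, digest_b = pre_hash(pw_a), pre_hash(pw_b)
```

The base64 step keeps the digest free of NUL bytes, which is the usual reason for encoding the raw SHA-256 output before handing it to bcrypt.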

3. load_dotenv() at Module Scope (auth/config.py)

Calling load_dotenv() at import time mutates os.environ as a side effect of importing the module. This can break test isolation and cause surprising behavior when the module is imported before test fixtures configure the environment. Move it into get_auth_config() or the app startup sequence.
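
A rough sketch of the deferred-loading idea, under stated assumptions: `load_env_file` is a hypothetical stand-in for `load_dotenv()`, the variable name `AUTH_JWT_SECRET` is invented, and caching the result with `lru_cache` is a choice made for this sketch rather than something the PR is known to do.

```python
import os
from functools import lru_cache


def load_env_file() -> None:
    """Hypothetical stand-in for dotenv.load_dotenv(); sets one default value."""
    os.environ.setdefault("AUTH_JWT_SECRET", "dev-only-secret")


@lru_cache(maxsize=1)
def get_auth_config() -> dict:
    """Touch the environment on first use instead of at import time."""
    load_env_file()
    return {"jwt_secret": os.environ["AUTH_JWT_SECRET"]}
```

Importing a module shaped like this leaves os.environ untouched until get_auth_config() is first called, so test fixtures can set up the environment beforehand.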

Security Concerns

4. In-Process Rate Limiting (routers/auth.py:155)

The login rate limiter (_login_attempts dict) is per-process only. With gunicorn multi-worker deployments, an attacker gets 5 attempts per worker (e.g., 4 workers = 20 attempts in 5 minutes). This should be documented or supplemented with a shared store.
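
The arithmetic above can be reproduced with a small simulation. This is only a sketch: `InProcessLoginLimiter` and `total_allowed` are hypothetical names, the 5-attempt / 5-minute numbers come from the text, and round-robin dispatch across workers is an assumption.

```python
import time
from collections import defaultdict
from typing import Optional

MAX_ATTEMPTS = 5
WINDOW_SECONDS = 300


class InProcessLoginLimiter:
    """Hypothetical per-process limiter; each worker owns a separate instance."""

    def __init__(self) -> None:
        self._attempts = defaultdict(list)  # ip -> list of attempt timestamps

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Keep only the attempts that still fall inside the window.
        recent = [t for t in self._attempts[ip] if now - t < WINDOW_SECONDS]
        allowed = len(recent) < MAX_ATTEMPTS
        if allowed:
            recent.append(now)
        self._attempts[ip] = recent
        return allowed


def total_allowed(num_workers: int, tries: int, ip: str = "203.0.113.7") -> int:
    """Count accepted attempts when requests rotate over independent workers."""
    workers = [InProcessLoginLimiter() for _ in range(num_workers)]
    return sum(workers[i % num_workers].allow(ip, now=0.0) for i in range(tries))
```

With one worker the attacker is held to 5 attempts; with four independent workers the same client gets 20, which is why a shared store is needed for a hard limit.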

5. /setup-status Information Leak

This endpoint is public and reveals whether the system has been initialized. An attacker can probe it to determine if /initialize is still open. Consider rate-limiting it or returning a non-descriptive response.

6. Token Error Details Exposed to Clients

langgraph_auth.py returns f"Token error: {payload.value}" which reveals whether a token is expired, has an invalid signature, or is malformed. Return a generic "Invalid token" message instead.
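
One way to keep the detail for operators while hiding it from clients is sketched here. The `TokenError` enum and `client_message` function are hypothetical; only the variant names (expired, invalid signature, malformed) are taken from this PR's text.

```python
import logging
from enum import Enum

logger = logging.getLogger("auth")


class TokenError(Enum):
    """Hypothetical error variants, named after those mentioned in the PR."""

    EXPIRED = "expired"
    INVALID_SIGNATURE = "invalid_signature"
    MALFORMED = "malformed"


def client_message(error: TokenError) -> str:
    """Log the specific reason server-side; return one generic message."""
    logger.info("token rejected: %s", error.value)
    return "Invalid token"
```

Every variant maps to the same client-visible string, so a response no longer distinguishes an expired token from a forged one.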

7. Auth Decorator Coupling (authz.py)

@require_auth alone does NOT enforce authentication — it relies on AuthMiddleware being in the stack. If someone removes the middleware, routes decorated with only @require_auth will silently allow unauthenticated access. Consider having @require_auth independently enforce 401.
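
A framework-free sketch of a fail-closed decorator follows. It assumes the middleware, when present, stores the authenticated user on `request.state.user`; `Unauthorized`, `FakeRequest`, and `whoami` are hypothetical, and the real decorator is presumably async and framework-specific.

```python
import functools
from dataclasses import dataclass, field
from types import SimpleNamespace


class Unauthorized(Exception):
    """Hypothetical exception standing in for an HTTP 401 response."""

    status_code = 401


@dataclass
class FakeRequest:
    """Hypothetical request object; middleware would populate state.user."""

    state: SimpleNamespace = field(default_factory=SimpleNamespace)


def require_auth(handler):
    """Reject the call unless a user is attached, with or without middleware."""

    @functools.wraps(handler)
    def wrapper(request, *args, **kwargs):
        if getattr(request.state, "user", None) is None:
            raise Unauthorized("Authentication required")
        return handler(request, *args, **kwargs)

    return wrapper


@require_auth
def whoami(request):
    return request.state.user
```

If the middleware is removed, `state.user` is never set and the decorator raises instead of silently passing the request through.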

Changes Summary

 ┌─────┬───────────────────────────────┬───────────────────┬──────────────────────────────────────────────────────────────────────────────┐
 │  #  │             Issue             │       File        │                                     Fix                                      │
 ├─────┼───────────────────────────────┼───────────────────┼──────────────────────────────────────────────────────────────────────────────┤
 │ 1   │ on_chat_model_start logic bug │ journal.py:144    │ not messages → messages — first human message now captured correctly         │
 ├─────┼───────────────────────────────┼───────────────────┼──────────────────────────────────────────────────────────────────────────────┤
 │ 2   │ bcrypt 72-byte truncation     │ password.py       │ SHA-256 pre-hashing before bcrypt — full password always contributes to hash │
 ├─────┼───────────────────────────────┼───────────────────┼──────────────────────────────────────────────────────────────────────────────┤
 │ 3   │ load_dotenv() at module scope │ config.py         │ Moved inside get_auth_config() — no more import-time env mutation            │
 ├─────┼───────────────────────────────┼───────────────────┼──────────────────────────────────────────────────────────────────────────────┤
 │ 4   │ Rate limiter docs             │ auth.py           │ Added detailed comment about multi-worker limitation                         │
 ├─────┼───────────────────────────────┼───────────────────┼──────────────────────────────────────────────────────────────────────────────┤
 │ 5   │ /setup-status info leak       │ auth.py           │ Added per-IP 60s cooldown; rapid probing returns needs_setup=False           │
 ├─────┼───────────────────────────────┼───────────────────┼──────────────────────────────────────────────────────────────────────────────┤
 │ 6   │ Token error details exposed   │ langgraph_auth.py │ Generic "Invalid token" instead of specific error variant                    │
 ├─────┼───────────────────────────────┼───────────────────┼──────────────────────────────────────────────────────────────────────────────┤
 │ 7   │ Auth decorator coupling       │ authz.py          │ require_auth now independently raises 401 for unauthenticated requests       │
 └─────┴───────────────────────────────┴───────────────────┴──────────────────────────────────────────────────────────────────────────────┘

  - Fix inverted condition in RunJournal.on_chat_model_start that prevented
    first human message capture (not messages → messages)
  - Pre-hash passwords with SHA-256 before bcrypt to avoid silent 72-byte
    truncation vulnerability
  - Move load_dotenv() from module scope into get_auth_config() to prevent
    import-time os.environ mutation breaking test isolation
  - Return generic ‘Invalid token’ instead of exposing specific error
    variants (expired, malformed, invalid_signature) to clients
  - Make @require_auth independently enforce 401 instead of silently
    passing through when AuthMiddleware is absent
  - Rate-limit /setup-status endpoint with per-IP cooldown to mitigate
    initialization-state information leak
  - Document in-process rate limiter limitation for multi-worker deployments

Copilot AI left a comment

Pull request overview

This PR hardens DeerFlow’s authentication surface area (token errors, decorator behavior, setup-state exposure, password hashing) and fixes a RunJournal callback condition that prevented capturing the first human prompt in a run.

Changes:

  • Fix RunJournal’s first-human-message capture by correcting an inverted messages condition.
  • Harden auth behavior: generic invalid-token messaging and @require_auth fail-closed enforcement.
  • Strengthen security posture: SHA-256 pre-hash before bcrypt, defer load_dotenv() to runtime config parsing, and add per-IP cooldown on /setup-status with documentation.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| backend/tests/test_langgraph_auth.py | Updates assertions to expect the new generic “Invalid token” message. |
| backend/tests/test_initialize_admin.py | Clears the new /setup-status cooldown state between tests for isolation. |
| backend/tests/test_auth.py | Updates require_auth test to expect 401 on unauthenticated requests. |
| backend/packages/harness/deerflow/runtime/journal.py | Fixes RunJournal condition so first human message is captured when messages exist. |
| backend/app/gateway/routers/auth.py | Documents in-process rate limiter limits and adds /setup-status per-IP cooldown. |
| backend/app/gateway/langgraph_auth.py | Returns generic “Invalid token” instead of leaking token error variants. |
| backend/app/gateway/authz.py | Makes @require_auth independently enforce authentication (401) without relying on middleware. |
| backend/app/gateway/auth/password.py | Adds SHA-256 pre-hashing before bcrypt for long-password safety. |
| backend/app/gateway/auth/config.py | Moves load_dotenv() into get_auth_config() to avoid import-time env mutation. |

Comment on lines +15 to 28
+def _pre_hash(password: str) -> bytes:
+    """Pre-hash password with SHA-256 to bypass bcrypt's 72-byte limit."""
+    return base64.b64encode(hashlib.sha256(password.encode("utf-8")).digest())
+
+
 def hash_password(password: str) -> str:
-    """Hash a password using bcrypt."""
-    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt()).decode("utf-8")
+    """Hash a password using bcrypt with SHA-256 pre-hashing."""
+    return bcrypt.hashpw(_pre_hash(password), bcrypt.gensalt()).decode("utf-8")


 def verify_password(plain_password: str, hashed_password: str) -> bool:
     """Verify a password against its hash."""
-    return bcrypt.checkpw(plain_password.encode("utf-8"), hashed_password.encode("utf-8"))
+    return bcrypt.checkpw(_pre_hash(plain_password), hashed_password.encode("utf-8"))

Copilot AI Apr 27, 2026

Changing verify_password() to always SHA-256 pre-hash before bcrypt will break authentication for any existing users whose password_hash values were generated with the old (raw-password) bcrypt scheme. Consider supporting both formats during a transition (e.g., try legacy check first, then new check; or version/prefix hashes and migrate on successful login), otherwise this is a backwards-incompatible auth change that can lock out current deployments after upgrade.
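
The transition idea can be sketched independently of bcrypt by injecting the checkers. Everything named here (`verify_with_migration` and its parameters) is hypothetical; in a real deployment `check_new` and `check_legacy` would wrap `bcrypt.checkpw` with and without the pre-hash, and the caller would persist the returned hash.

```python
from typing import Callable, Optional, Tuple

Checker = Callable[[str, str], bool]


def verify_with_migration(
    plain: str,
    stored: str,
    check_new: Checker,
    check_legacy: Checker,
    hash_new: Callable[[str], str],
) -> Tuple[bool, Optional[str]]:
    """Return (ok, replacement_hash); replacement_hash is set only on upgrade."""
    if check_new(plain, stored):
        return True, None  # already in the new format
    if check_legacy(plain, stored):
        # Legacy hash verified: hand back a new-format hash to store.
        return True, hash_new(plain)
    return False, None
```

A version prefix on stored hashes would make the format explicit and avoid running both checks; the two-check form above is simply the smallest change to illustrate.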

WillemJiang (Collaborator, Author) replied:

@greatmengqi I have been hit by this issue myself. The change should be acceptable because we don't have a public release yet, but we also need to provide a password reset method for users.

Comment on lines +392 to +397
client_ip = _get_client_ip(request)
now = time.time()
last_check = _SETUP_STATUS_COOLDOWN.get(client_ip, 0)
if now - last_check < _SETUP_STATUS_COOLDOWN_SECONDS:
    return {"needs_setup": False}
_SETUP_STATUS_COOLDOWN[client_ip] = now

Copilot AI Apr 27, 2026

The /setup-status cooldown currently returns {"needs_setup": False} during the cooldown window, which can mislead legitimate clients (e.g., UI polling) into thinking setup is complete when it may not be. Instead of returning a falsified state, consider returning 429 with Retry-After, or returning a cached last-computed needs_setup value per IP while enforcing the refresh cooldown.

WillemJiang (Collaborator, Author) replied:

@copilot apply changes based on this feedback

Copilot (Contributor) replied:

Done in commit 9d83642. The /setup-status endpoint now returns HTTP 429 with a Retry-After header (value = remaining cooldown seconds) instead of the misleading {"needs_setup": False} response. The _SETUP_STATUS_COOLDOWN dict also gets bounded eviction (capped at 10,000 IPs) using the same TTL-first then oldest-half strategy as _login_attempts. A new test test_setup_status_rate_limited_on_second_call covers the 429 + Retry-After behavior.

Comment on lines +385 to +387
_SETUP_STATUS_COOLDOWN: dict[str, float] = {}
_SETUP_STATUS_COOLDOWN_SECONDS = 60

Copilot AI Apr 27, 2026

_SETUP_STATUS_COOLDOWN is an unbounded in-memory dict keyed by (potentially attacker-controlled) client IPs. Over time this can grow without limit and become a small memory-DoS vector. Consider adding eviction (max entries + TTL pruning) similar to the _login_attempts limiter, or using a bounded LRU/TTL cache implementation.

WillemJiang (Collaborator, Author) replied:

@copilot apply changes based on this feedback

Copilot (Contributor) replied:

This was already addressed in commit 9d83642 as part of the /setup-status rate-limiting fix. _SETUP_STATUS_COOLDOWN is now capped at _MAX_TRACKED_SETUP_STATUS_IPS = 10000 entries. When the cap is reached, expired (TTL-past) entries are evicted first; if still over the cap, the oldest half is dropped — mirroring the _login_attempts eviction strategy.
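
The two-pass eviction described above might look roughly like this. It is a sketch of the stated strategy only: the function name `evict_if_full` and its parameters are hypothetical, and treating "cap reached" as `len(...) >= max_entries` is an assumption.

```python
def evict_if_full(cooldown: dict, now: float, ttl: float, max_entries: int) -> None:
    """Bound a {ip: last_seen} dict: expired entries first, then oldest half."""
    if len(cooldown) < max_entries:
        return
    # Pass 1: drop entries whose cooldown window has already elapsed.
    for ip in [ip for ip, ts in cooldown.items() if now - ts >= ttl]:
        del cooldown[ip]
    # Pass 2: if the dict is still at the cap, drop the oldest half.
    if len(cooldown) >= max_entries:
        oldest_first = sorted(cooldown, key=cooldown.get)
        for ip in oldest_first[: len(oldest_first) // 2]:
            del cooldown[ip]
```

Dropping half at once keeps eviction infrequent, at the cost of forgetting some still-active cooldowns when the dict is flooded with fresh keys.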

Comment thread backend/app/gateway/routers/auth.py Outdated
Comment on lines +395 to +396
if now - last_check < _SETUP_STATUS_COOLDOWN_SECONDS:
    return {"needs_setup": False}

Copilot AI Apr 27, 2026

The new /setup-status rate limiting behavior isn’t covered by tests (e.g., second call within the cooldown window). Adding a targeted test would help prevent regressions and clarify the intended client-visible behavior (429 vs cached value vs other).

Suggested change
-if now - last_check < _SETUP_STATUS_COOLDOWN_SECONDS:
-    return {"needs_setup": False}
+elapsed = now - last_check
+if elapsed < _SETUP_STATUS_COOLDOWN_SECONDS:
+    retry_after = max(1, int(_SETUP_STATUS_COOLDOWN_SECONDS - elapsed))
+    raise HTTPException(
+        status_code=status.HTTP_429_TOO_MANY_REQUESTS,
+        detail="Setup status check is rate limited",
+        headers={"Retry-After": str(retry_after)},
+    )

…nd cooldown dict

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/070d0be8-99a5-46c8-85bb-6b81b5284021

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

greatmengqi (Collaborator) commented:

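The overall direction is right: 5 of the 7 security issues are fixed cleanly (independent auth in require_auth, token error message sanitization, deferred load_dotenv, rate-limiter documentation, setup-status cooldown). But the fix for the journal "logic bug" is itself wrong, and I suggest blocking the PR until it is corrected.

🔴 Must fix: the journal.py fix is half-finished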

# Before
if not self._first_human_msg and not messages:
    for batch in messages.reversed():
        for m in batch.reversed():

# After (this PR)
if not self._first_human_msg and messages:
    for batch in messages.reversed():       # ← AttributeError
        for m in batch.reversed():          # ← AttributeError

A Python list has no .reversed() method; there is only the built-in reversed(seq) and list.reverse() (in-place, returns None). Verified locally:

>>> [[1,2,3]].reversed()
AttributeError: 'list' object has no attribute 'reversed'

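The original guard, not messages, incidentally hid this AttributeError: the condition was False whenever messages was non-empty, so the loop body never ran in practice. With the guard corrected in this PR, the first entry into the loop raises. LangChain generally swallows callback exceptions and continues, so the symptom is not a 500 but:

  • _first_human_msg is still never set
  • run.input is still always empty
  • The PR claims "first human message now captured correctly", but it is actually still broken

It should be changed to: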

if not self._first_human_msg and messages:
    for batch in reversed(messages):
        for m in reversed(batch):
            ...

This change also has no regression test. I suggest adding a unit test: construct [[HumanMessage(content="hi")]], call on_chat_model_start, and assert _first_human_msg == "hi". Otherwise the same bug can come back.
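
A dependency-free sketch of such a regression test is given here. `MiniJournal` is a hypothetical reduction of RunJournal, the `HumanMessage` dataclass is a stand-in for LangChain's class, and the loop body (elided in the snippet above) is an assumption made so the sketch can run.

```python
from dataclasses import dataclass


@dataclass
class HumanMessage:
    """Stand-in for LangChain's HumanMessage to keep the sketch self-contained."""

    content: str
    type: str = "human"


class MiniJournal:
    """Hypothetical reduction of RunJournal to the logic under test."""

    def __init__(self) -> None:
        self._first_human_msg = None

    def on_chat_model_start(self, serialized, messages, **kwargs) -> None:
        if not self._first_human_msg and messages:
            for batch in reversed(messages):  # built-in, not a list method
                for m in reversed(batch):
                    # Assumed loop body: record the first human message found.
                    if getattr(m, "type", None) == "human":
                        self._first_human_msg = m.content
                        return


def test_first_human_message_is_captured() -> None:
    journal = MiniJournal()
    journal.on_chat_model_start({}, [[HumanMessage(content="hi")]])
    assert journal._first_human_msg == "hi"
```

Against the real class, the same test would have failed both before this PR (guard never entered) and after it (AttributeError inside the callback).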

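🟠 Should fix: the schema break in password.py is not called out

The function computed by hash_password changes from bcrypt(pwd) to bcrypt(b64(sha256(pwd))). This amounts to switching to an incompatible password hashing scheme:

  • Every existing bcrypt hash in the DB fails verification after the upgrade
  • Users experience this as "all passwords are suddenly wrong"
  • There is no migration path, no version prefix (such as $dfv2$...), and no fallback verify

If this auth system has not been formally released yet, a one-line release note stating "fresh deployments only" is enough. If there are already production users, a migration is required: either detect old-format hashes at login and rehash automatically, or add a version prefix to hashes so that both paths coexist.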

🟠 Should fix: the PR description does not match the implementation

The PR description says:

Added per-IP 60s cooldown; rapid probing returns needs_setup=False

But the code returns:

raise HTTPException(status_code=status.HTTP_429_TOO_MANY_REQUESTS, ...)

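The two behaviors have very different security semantics:

  • Returning false: disguises the information, so an attacker cannot get the true state, but a legitimate frontend that polls also sees fake data
  • Returning 429: an honest refusal, but the status code still leaks through a side channel that "this IP just asked"

The code implements 429, so I suggest updating the PR description to match. Alternatively, switch to returning the cached most recent result, which neither reveals that probing was throttled nor interrupts a legitimate frontend.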

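🟡 Suggestion: _SETUP_STATUS_COOLDOWN has the same multi-worker limitation as the login rate limiter

auth.py:155 already carries a detailed comment saying the limiter is independent per worker and that Redis is recommended for production, but the _SETUP_STATUS_COOLDOWN newly added at line 388 of the same file is the same kind of in-process dict with the same limitation, and has no comment at all. Either copy the disclaimer, or extract a helper (InProcessCooldown / InProcessRateLimiter) that both places reuse, so that both can be moved to Redis together later.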
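
A minimal sketch of what such a shared helper could look like. The class name comes from the suggestion above; the method name `retry_after`, its return convention, and the `now` parameter (added for testability) are assumptions.

```python
import time
from typing import Optional


class InProcessCooldown:
    """Hypothetical per-key cooldown.

    State lives in this process only: each worker keeps its own copy, so a
    shared store (e.g. Redis) is needed for a limit that holds across workers.
    """

    def __init__(self, seconds: float) -> None:
        self._seconds = seconds
        self._last_seen = {}  # key -> timestamp of the last allowed call

    def retry_after(self, key: str, now: Optional[float] = None) -> int:
        """Return 0 and start a cooldown if allowed, else seconds left to wait."""
        now = time.time() if now is None else now
        last = self._last_seen.get(key)
        if last is not None and now - last < self._seconds:
            return max(1, int(self._seconds - (now - last)))
        self._last_seen[key] = now
        return 0
```

A non-zero return value maps directly onto a Retry-After header, and the docstring keeps the multi-worker caveat in one place for every caller.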

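🟡 Conceptual: the information-leak fix is applied in the wrong place

Hardening /setup-status does not solve the root problem: an attacker can probe /initialize directly to tell whether the initialization window is still open. /setup-status is only a UX hook. A better fix is to give /initialize its own one-time setup token: on first start, print a random token to the server log, and require that token to initialize (Sonarr / Radarr / Vaultwarden all use this approach). With that in place, it no longer matters whether /setup-status leaks. This PR does not have to do it, but at least leave a follow-up issue.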
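
The one-time setup-token idea could be sketched as follows. `SetupTokenGate` and `consume` are hypothetical names, and logging the token at WARNING level is an assumption; a real implementation would also need the token to survive or be regenerated across restarts.

```python
import hmac
import logging
import secrets
from typing import Optional

logger = logging.getLogger("setup")


class SetupTokenGate:
    """Hypothetical guard for /initialize: one random token, usable once."""

    def __init__(self) -> None:
        self._token: Optional[str] = secrets.token_urlsafe(32)
        # The operator reads the token from the server log on first start.
        logger.warning("One-time setup token: %s", self._token)

    def consume(self, presented: str) -> bool:
        """Accept the correct token exactly once; reject everything else."""
        if self._token is None:
            return False
        # Constant-time comparison avoids leaking how many characters matched.
        if not hmac.compare_digest(presented.encode(), self._token.encode()):
            return False
        self._token = None
        return True
```

Because only someone with access to the server log can initialize, knowing that setup is still pending gives a remote attacker nothing to act on.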

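🟢 What was fixed well

| Fix | File | Assessment |
| --- | --- | --- |
| Independent auth in require_auth | authz.py | ✓ Removes the coupling that forced the decorator and middleware to coexist; tests updated as well |
| Token error sanitization | langgraph_auth.py | ✓ "Invalid token" is standard practice; regression tests updated in step |
| Deferred load_dotenv | config.py | ✓ Resolves the import side effect that polluted test isolation |
| Rate-limit documentation | auth.py | ✓ Stating the multi-worker weakness clearly is enough; a reasonable engineering trade-off |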

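Recommended actions

  1. Must fix: in journal.py, change `messages.reversed()` / `batch.reversed()` to `reversed(messages)` / `reversed(batch)`, and add a unit test
  2. Should fix: add a migration note or version prefix in password.py; align the PR description with the 429 behavior
  3. Suggestion: extract a shared `InProcessCooldown` helper; open a follow-up issue to track the setup-token approach for `/initialize`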

WillemJiang (Collaborator, Author) commented:

@greatmengqi The journal.py change from messages.reversed() / batch.reversed() to reversed(messages) / reversed(batch) is fixed in #2604.
