Skip to content

fix(retool): coerce list prompt to str in reward_func#2120

Open
mvanhorn wants to merge 1 commit into
THUDM:mainfrom
mvanhorn:fix/retool-reward-prompt-concat
Open

fix(retool): coerce list prompt to str in reward_func#2120
mvanhorn wants to merge 1 commit into
THUDM:mainfrom
mvanhorn:fix/retool-reward-prompt-concat

Conversation

@mvanhorn

Copy link
Copy Markdown

Summary

In examples/retool/generate_with_retool.py, reward_func did sample.prompt + sample.response. With --apply-chat-template, sample.prompt is a list[dict], so the concatenation raised TypeError: can only concatenate list (not "str") to list.

Changes

Coerce the prompt to str before concatenation, matching slime's own str(sample.prompt) idiom in sglang_rollout.

Fixes #1829

AI was used for assistance.

With --apply-chat-template, sample.prompt is a list[dict], so
sample.prompt + sample.response raised TypeError. Coerce to str first,
matching slime's own str(sample.prompt) idiom in sglang_rollout.

Fixes THUDM#1829
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] solution_str = sample.prompt + sample.response

1 participant