GRPO Comprehensive Review — Tracking Issue
This issue tracks all findings from the GRPO code review across both openadapt-ml and openadapt-evals.
Critical — Blocks production training
Important — Required for external integration / RL training use case
Medium — Reliability and performance
Batch 2 — Prompt, data structure, VLM, and correctness fixes
#43: Align prompt format across SFT, GRPO, and CoT warmup (openadapt-ml, bug)
#44: Replace _grpo_raw_text monkey-patch with proper data structure (openadapt-ml, refactor)
#45: VLM processor metadata misalignment after token concatenation (openadapt-ml, bug)
#46: Inner tokenizer extraction bypasses processor preprocessing (openadapt-ml, bug)
#47: Use JSONL append-only format for training log (openadapt-ml, performance)
#48: Add gradient accumulation across multiple groups (openadapt-ml, enhancement)
#49: Coordinate precision loss in fraction-pixel roundtrip (openadapt-ml, bug)
#50: Minor issues: regex whitespace, CoT adapter reuse, shutil import (openadapt-ml, bug)
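To illustrate the kind of loss #49 describes, here is a minimal sketch. It assumes that actions carry coordinates as fractions in [0, 1] and are converted to integer pixels and back. The helper names (frac_to_px_truncate, frac_to_px_round, px_to_frac, roundtrip_error) are hypothetical and are not the openadapt-ml API. With truncation via int(), the roundtrip error can approach a full pixel. Rounding to the nearest pixel caps it at half a pixel for interior points.

```python
# Hypothetical helpers for illustration only; not the openadapt-ml implementation.

def frac_to_px_truncate(frac: float, size: int) -> int:
    # int() truncates toward zero, so 959.8 becomes 959 (up to ~1 px of loss).
    return int(frac * size)


def frac_to_px_round(frac: float, size: int) -> int:
    # Round to the nearest pixel, then clamp into the valid range [0, size - 1].
    return min(max(round(frac * size), 0), size - 1)


def px_to_frac(px: int, size: int) -> float:
    # Map an integer pixel back to a fraction of the screen dimension.
    return px / size


def roundtrip_error(frac: float, size: int, to_px) -> float:
    # Absolute difference between the original fraction and its fraction -> pixel -> fraction value.
    return abs(px_to_frac(to_px(frac, size), size) - frac)


if __name__ == "__main__":
    width = 1920
    for to_px in (frac_to_px_truncate, frac_to_px_round):
        err = roundtrip_error(0.4999, width, to_px)
        print(f"{to_px.__name__}: {err * width:.3f} px")
```

Whether the actual fix rounds, uses pixel centers, or keeps fractions as floats until the final action is not stated in this issue. The sketch only shows why truncation is the lossier choice.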
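For #47, here is a minimal sketch of an append-only JSONL training log. It assumes each training step produces one dict of metrics. The function names append_log_record and read_log_records are hypothetical and are not taken from openadapt-ml. Each step appends a single line instead of rewriting the whole history as one JSON document, so the cost per step stays constant. A crash mid-write damages at most the final line.

```python
import json
import os
from typing import Any


def append_log_record(path: str, record: dict[str, Any]) -> None:
    """Append one training-step record as a single JSON line (hypothetical helper)."""
    # "a" mode writes only this step's line; earlier lines are never re-serialized.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()


def read_log_records(path: str) -> list[dict[str, Any]]:
    """Read all complete records back from a JSONL log (hypothetical helper)."""
    records: list[dict[str, Any]] = []
    if not os.path.exists(path):
        return records
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                # Stop at the first undecodable line. In an append-only file this is
                # normally a truncated final write left behind by an interrupted run.
                break
    return records
```

A usage pattern would be to call append_log_record(log_path, {"step": step, "loss": loss}) once per optimizer step and use read_log_records only when plotting or resuming.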