All notable changes to ComfyUI-MultiModal-Prompt-Nodes will be documented in this file.
- Added support for Qwen3.5 local GGUF models
  - Added Qwen3.5 model detection and proper handler selection (`Qwen35ChatHandler`)
  - Fixed incorrect fallback to `Qwen3VLChatHandler` for Qwen3.5 model names
  - Updated mmproj handling for Qwen3.5 (requirement checks and auto-detection flow)
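The detection fix above amounts to checking the more specific model-name pattern first. A minimal sketch of name-based handler selection — the matching rules and the `Qwen25VLChatHandler` name are assumptions for illustration, not the repo's actual code:

```python
import re

def select_handler_name(model_filename: str) -> str:
    """Pick a chat handler class name from a GGUF filename (illustrative sketch)."""
    name = model_filename.lower()
    # Check the more specific Qwen3.5 pattern first, so "Qwen3.5" names
    # no longer fall through to the Qwen3-VL handler.
    if re.search(r"qwen3[._-]?5", name):
        return "Qwen35ChatHandler"
    if "qwen3" in name and "vl" in name:
        return "Qwen3VLChatHandler"
    if re.search(r"qwen2[._-]?5", name) and "vl" in name:
        return "Qwen25VLChatHandler"  # hypothetical name for the Qwen2.5-VL handler
    return "default"
```

Ordering the checks from most to least specific is what prevents the old fallback bug.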
- Improved post-run cleanup behavior for local model nodes
  - `VisionLLMNode`, `WanVideoPromptGenerator`, and `QwenImageEditPromptGenerator` now call `cleanup()` at the end of execution
  - Introduced `cleanup(finalize=False/True)` to separate regular unload from final teardown on process exit
  - Added safe manager re-initialization after cleanup for stable repeated runs
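The `finalize` split described above can be sketched as follows — a minimal stand-in, assuming a manager object that owns the loaded model; the real manager's fields and teardown work will differ:

```python
class GGUFModelManager:
    """Sketch of the regular-unload vs. final-teardown split (illustrative)."""

    def __init__(self):
        self.model = None   # placeholder for a llama-cpp-python Llama instance
        self.alive = True   # manager stays usable between runs

    def load(self, path: str):
        self.model = object()  # stands in for the real model load

    def cleanup(self, finalize: bool = False):
        # Regular unload: free the model after each node execution, but keep
        # the manager re-initializable so repeated runs start cleanly.
        self.model = None
        if finalize:
            # Final teardown on process exit: nothing may be reused afterwards.
            self.alive = False
```

Nodes would call `manager.cleanup()` at the end of execution and `manager.cleanup(finalize=True)` from an exit hook.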
- Expanded the search scope for local Qwen-family GGUF models
  - Added `models/text_encoders` and all subdirectories under both `models/LLM` and `models/text_encoders` to the search paths
  - Centralized model path and mmproj resolution in `local_gguf_utils.py` to reduce duplicated logic
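A centralized path-resolution helper along these lines would cover both base directories and their subdirectories — the function name and return shape are illustrative, not the actual `local_gguf_utils.py` API:

```python
from pathlib import Path

def gguf_search_paths(models_root: str) -> list:
    """Collect candidate directories for Qwen-family GGUF files (illustrative)."""
    root = Path(models_root)
    dirs = []
    for base in ("LLM", "text_encoders"):
        base_dir = root / base
        if not base_dir.is_dir():
            continue
        dirs.append(base_dir)
        # Include every subdirectory so models sorted into per-family
        # folders are still discovered.
        dirs.extend(p for p in sorted(base_dir.rglob("*")) if p.is_dir())
    return dirs
```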
- Improved mmproj selection behavior
  - The UI now shows only mmproj files from the same directory as the selected GGUF model
  - When `mmproj = (Not required)` is selected, the node now explicitly switches to text-only mode to avoid unnecessary vision handler usage
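Restricting the dropdown to the model's own directory could look like this sketch — the helper name and the exact `(Not required)` sentinel handling are assumptions about the node's internals:

```python
from pathlib import Path

NOT_REQUIRED = "(Not required)"

def mmproj_choices(model_path: str) -> list:
    """List mmproj files next to the selected GGUF model (illustrative sketch)."""
    model_dir = Path(model_path).parent
    files = sorted(
        p.name for p in model_dir.glob("*.gguf")
        if "mmproj" in p.name.lower()
    )
    # "(Not required)" is always offered first; choosing it forces
    # text-only mode instead of loading a vision handler.
    return [NOT_REQUIRED] + files
```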
- Strengthened the local prompt rewrite flow for Qwen and Wan
  - Added dedicated system prompts for `qwen_image`, `qwen_image_edit`, `wan_t2v`, and `wan_i2v`
  - Tightened prompt instructions to reduce verbose analysis-style responses and make it easier to return only the final prompt body
  - Added a second pass that preserves quoted text and normalizes the result to Simplified Chinese when Chinese output is requested but another language is returned
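The second-pass trigger and quote preservation could be sketched as two small helpers — the Han-ratio heuristic and placeholder masking are illustrative guesses at the mechanism, not the repo's actual logic:

```python
import re

def needs_chinese_normalization(text: str, want_chinese: bool) -> bool:
    """Heuristic: trigger a second rewrite pass when Chinese output was
    requested but the model answered mostly in another language (sketch)."""
    if not want_chinese:
        return False
    han = len(re.findall(r"[\u4e00-\u9fff]", text))
    return han < max(1, len(text) // 4)

def protect_quotes(text: str):
    """Mask double-quoted spans with placeholders so the normalization pass
    cannot alter them; returns (masked_text, original_spans)."""
    spans = re.findall(r'"[^"]*"', text)
    masked = text
    for i, s in enumerate(spans):
        masked = masked.replace(s, f"__Q{i}__", 1)
    return masked, spans
```

After the second pass, the placeholders would be substituted back with the original quoted text.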
- Expanded Qwen Image Edit Prompt Generator
  - Made the `image` input optional so `Qwen-Image` can also be used for text-only prompt generation
  - Treated mmproj as not required when `Qwen-Image` is run locally without images
  - Increased local inference `max_tokens` and `n_ctx` to better support longer prompt generation
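In ComfyUI terms, making an input optional means moving it from the `required` dict to the `optional` dict in `INPUT_TYPES`. A minimal sketch — the class name and parameter set are illustrative, not the node's real declaration:

```python
class QwenImageEditPromptGeneratorSketch:
    """Illustrative ComfyUI input declaration for an optional image input."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "prompt": ("STRING", {"multiline": True}),
            },
            "optional": {
                # Previously required; now optional so the node can run in
                # text-only mode (mmproj is then treated as not required).
                "image": ("IMAGE",),
                "image2": ("IMAGE",),
                "image3": ("IMAGE",),
            },
        }
```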
- Improved the robustness of Wan Video Prompt Generator
  - Added explicit validation errors when Image-to-Video is used without an input image
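The validation amounts to failing fast with a clear message instead of letting a missing image surface as a confusing downstream error. A sketch, with hypothetical task names and wording:

```python
def validate_wan_inputs(task: str, image=None):
    """Raise a clear error for invalid task/input combinations (illustrative)."""
    if task == "i2v" and image is None:
        raise ValueError(
            "Image-to-Video was selected but no input image is connected. "
            "Connect an image or switch the task to Text-to-Video."
        )
```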
- Fixed an issue where `Qwen2.5-VL` was always loaded in text-only mode even when a valid mmproj file was specified
  - Added vision chat handler support for `Qwen2.5-VL`
  - Enabled vision mode automatically when a supported model and mmproj are present
- Improved mmproj auto-detection logic
  - Auto-detect now selects mmproj files based on the model family prefix (qwen2, qwen3) instead of an arbitrary alphabetical fallback
  - Prevents incorrect mmproj selection when multiple mmproj files exist in the same directory
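Prefix-based auto-detection can be sketched like this — the function name and the decision to return `None` rather than guess are assumptions for illustration:

```python
def autodetect_mmproj(model_name: str, candidates: list):
    """Prefer an mmproj sharing the model's family prefix (qwen2/qwen3)
    over alphabetical fallback (illustrative sketch)."""
    name = model_name.lower()
    family = None
    for prefix in ("qwen3", "qwen2"):
        if prefix in name:
            family = prefix
            break
    if family:
        for c in sorted(candidates):
            if family in c.lower():
                return c
    return None  # no confident match: do not guess across model families
```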
- Fixed incorrect detection of Qwen3-VL when mmproj is set to `(Not required)`
  - Disabled automatic mmproj detection and prevented use of the VL handler in this case
  - Updated `GGUFModelManager.load_model` and node-side mmproj interpretation to correctly respect `(Not required)`
- Fixed an issue where an incorrect mmproj could remain loaded when switching between Qwen3-VL GGUF models
  - Properly unload and reload GGUF models when the model or mmproj changes
  - Prevent stale vision projectors from being reused across different Qwen3-VL models
- Improved mmproj auto-detection logic to avoid accidentally picking mmproj files from other models
- Refined internal GGUF model lifecycle management for better stability when switching models (e.g. 8B ↔ 4B)
- Minor internal refactors to reduce state leakage in llama-cpp-python based vision models
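The stale-projector fix boils down to keying the cached model on the `(model, mmproj)` pair, so changing either forces a full unload. A minimal sketch with placeholder load logic:

```python
class ModelCache:
    """Sketch: cache keyed on (model, mmproj) so switching either one forces
    a full reload, preventing a stale vision projector from being reused."""

    def __init__(self):
        self.key = None
        self.model = None

    def get(self, model_path: str, mmproj_path):
        key = (model_path, mmproj_path)
        if key != self.key:
            self.model = None            # unload old model + projector together
            self.model = ("model", key)  # placeholder for the real load
            self.key = key
        return self.model
```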
- Improved README documentation for clarity and accuracy:
  - Clarified project scope as a prompt generator for QwenImageEdit and Wan2.2
  - Reorganized Credits and Dependencies to clearly separate derived works and external dependencies
  - Updated llama-cpp-python installation notes to reference the JamePeng fork documentation directly, avoiding incomplete or misleading installation instructions
- Added a `backends/` directory as a structural placeholder
  - This directory does not change behavior in v1.0.6
  - Reserved for future refactoring of Local GGUF and Cloud API backends without changing node interfaces
- Removed the General Prompt Rewriter (`nodes.py`), as it was unchanged from the original ComfyUI-QwenPromptRewriter
  - Users should use the original ComfyUI-QwenPromptRewriter for this functionality
- Updated documentation to reflect 3-node architecture
- Clarified project scope: focus on multimodal (vision + text) capabilities
- kantan-kanto (https://github.com/kantan-kanto) - Initial development and release
- Initial release of ComfyUI-MultiModal-Prompt-Nodes
- Vision LLM Node: Local GGUF vision language model support
  - Qwen2.5-VL and Qwen3-VL compatibility
  - Multi-image input (batch support)
  - Multiple style presets (raw, default, detailed, concise, creative)
- Qwen Image Edit Prompt Generator: Image editing prompt generation
  - Dynamic model selection (local GGUF + cloud API)
  - Manual mmproj selection for Qwen3-VL
  - Multi-image support via image2/image3 inputs
- Wan Video Prompt Generator: Video generation prompt optimization
  - Text-to-Video and Image-to-Video task support
  - Local Qwen3-VL integration
  - Wan2.2-specific prompt templates
- GPL-3.0 license (due to llama-cpp-python dependency)
- Unified category: `multimodal/prompt`
- Python 3.10+ support
- Comprehensive error handling and user feedback
- Workflow templates
- Extended documentation