Replies: 3 comments
-
It seems the markdown format that works
-
I'm experiencing the same issue; as soon as I switch from Docling's standard OCR pipeline to the VlmPipeline, the enabled
-
This has been my experience. The VLM pipeline seems to expect that a single pass through a VLM model will do EVERYTHING, including extracting text and generating the descriptions or classifications for images/figures. However, Granite-Docling (the default) is NOT designed to produce general image descriptions; it only extracts text and labels images/figures. The ONLY WAY I have gotten a more complete package is by taking the VLM response, looping through the doctags for figures, and then sending those figures off one at a time to another model to get descriptions. I even tried working with the enrichment pipelines, to no avail. This is incredibly annoying, and I desperately need it to be fixed before I can deploy it for my company. (VLM is a must; we have lots of scanned documents.)
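The two-pass workaround described above can be sketched roughly as follows. This is a sketch under assumptions, not docling's own API for this: the endpoint URL and model name are placeholders for an OpenAI-compatible `/v1/chat/completions` server (as vLLM exposes), and the `PictureItem`/`get_image` accessors follow docling's documented `DoclingDocument` interface, which may differ across versions.

```python
import base64
import io
import json
import urllib.request


def build_description_request(image_bytes: bytes, model: str,
                              prompt: str = "Describe this figure.") -> dict:
    """Build an OpenAI-compatible chat payload with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }


def describe_pictures(doc, endpoint: str, model: str) -> dict:
    """Second pass: send each figure from the VlmPipeline result to a
    separate vision model.

    `doc` is the DoclingDocument returned by the first conversion pass.
    Item iteration and image access follow docling's current API and may
    need adjusting for your version.
    """
    from docling_core.types.doc import PictureItem  # assumes docling-core

    descriptions = {}
    for item, _level in doc.iterate_items():
        if not isinstance(item, PictureItem):
            continue
        image = item.get_image(doc)  # PIL image, if the pipeline stored one
        if image is None:
            continue
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        payload = build_description_request(buf.getvalue(), model)
        req = urllib.request.Request(
            endpoint,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        descriptions[item.self_ref] = body["choices"][0]["message"]["content"]
    return descriptions
```

The description pass is deliberately separate from conversion, so a failure of the second model leaves the already-converted document intact.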
-
I wanted to use Granite-Docling for OCR and Qwen3 VL for image description, but the option doesn't seem to work with the VlmPipeline: nothing happens when I add it. I do have PictureItems in my result, but only with the caption; Qwen3 VL is never called. Both models are running and I can see their vLLM logs — Granite receives requests, Qwen doesn't.
Is this intended, or am I doing something wrong?
I tried to combine two examples from the docs:
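For reference, the combination being attempted above looks roughly like the following. The option names (`VlmPipelineOptions`, `PictureDescriptionApiOptions`, `do_picture_description`, `PdfFormatOption`) are taken from docling's documented examples and may not match your installed version; per this thread, the picture-description settings appear to be silently ignored by VlmPipeline, so treat this as the *intended* wiring rather than a working fix.

```python
def build_converter(qwen_url: str, qwen_model: str):
    """Sketch: VlmPipeline for conversion plus a remote picture-description
    model, combining two examples from the docling docs. Names assumed from
    those docs; this thread reports the second model is never called."""
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        PictureDescriptionApiOptions,
        VlmPipelineOptions,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption
    from docling.pipeline.vlm_pipeline import VlmPipeline

    opts = VlmPipelineOptions(enable_remote_services=True)
    # These two settings come from the standard-pipeline example; whether
    # VlmPipelineOptions actually honors them is exactly what this thread
    # is questioning.
    opts.do_picture_description = True
    opts.picture_description_options = PictureDescriptionApiOptions(
        url=qwen_url,  # e.g. a vLLM /v1/chat/completions endpoint
        params={"model": qwen_model},
        prompt="Describe this figure in a few sentences.",
        timeout=90,
    )
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_cls=VlmPipeline,
                pipeline_options=opts,
            )
        }
    )
```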