Replies: 3 comments
-
It seems the markdown format that works
-
I'm experiencing the same issue; as soon as I switch from Docling's standard OCR pipeline to the VlmPipeline, the enabled
-
This has been my experience. The VLM pipeline seems to expect that a single pass through a VLM model will do EVERYTHING, including extracting text and generating the descriptions or classifications for images/figures. However, Granite-Docling (the default) is NOT designed to produce general image descriptions; it only extracts text and labels images/figures. The ONLY WAY I have gotten a more complete package is by taking the VLM response, looping through the doctags for figures, and then sending those figures off one at a time to another model to get descriptions. I even tried working with the enrichment pipelines, to no avail. This is incredibly annoying, and I desperately need it to be fixed before I can deploy it for my company. (VLM is a must; we have lots of scanned documents.)
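The two-pass workaround described above can be sketched roughly as follows. This is a sketch under assumptions, not docling's own API for this: the endpoint URL and model name are placeholders for an OpenAI-compatible `/v1/chat/completions` server (as vLLM exposes), and the `PictureItem`/`get_image` accessors follow docling's documented `DoclingDocument` interface, which may differ across versions.

```python
import base64
import io
import json
import urllib.request


def build_description_request(image_bytes: bytes, model: str,
                              prompt: str = "Describe this figure.") -> dict:
    """Build an OpenAI-compatible chat payload with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }


def describe_pictures(doc, endpoint: str, model: str) -> dict:
    """Second pass: send each figure from the VlmPipeline result to a
    separate vision model.

    `doc` is the DoclingDocument returned by the first conversion pass.
    Item iteration and image access follow docling's current API and may
    need adjusting for your version.
    """
    from docling_core.types.doc import PictureItem  # assumes docling-core

    descriptions = {}
    for item, _level in doc.iterate_items():
        if not isinstance(item, PictureItem):
            continue
        image = item.get_image(doc)  # PIL image, if the pipeline stored one
        if image is None:
            continue
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        payload = build_description_request(buf.getvalue(), model)
        req = urllib.request.Request(
            endpoint,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        descriptions[item.self_ref] = body["choices"][0]["message"]["content"]
    return descriptions
```

The description pass is deliberately separate from conversion, so a failure of the second model leaves the already-converted document intact.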
-
I wanted to use Granite-Docling for OCR and Qwen3 VL for image description, but the option doesn't seem to work with the VlmPipeline: nothing happens when I add it. I do have PictureItems in my result, but only with the caption; Qwen3 VL is never called. Both models are running and I can see their vLLM logs — Granite receives requests, Qwen doesn't.
Is this intended, or am I doing something wrong?
I tried to combine two examples from the docs:
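For reference, the combination being attempted above looks roughly like the following. The option names (`VlmPipelineOptions`, `PictureDescriptionApiOptions`, `do_picture_description`, `PdfFormatOption`) are taken from docling's documented examples and may not match your installed version; per this thread, the picture-description settings appear to be silently ignored by VlmPipeline, so treat this as the *intended* wiring rather than a working fix.

```python
def build_converter(qwen_url: str, qwen_model: str):
    """Sketch: VlmPipeline for conversion plus a remote picture-description
    model, combining two examples from the docling docs. Names assumed from
    those docs; this thread reports the second model is never called."""
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        PictureDescriptionApiOptions,
        VlmPipelineOptions,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption
    from docling.pipeline.vlm_pipeline import VlmPipeline

    opts = VlmPipelineOptions(enable_remote_services=True)
    # These two settings come from the standard-pipeline example; whether
    # VlmPipelineOptions actually honors them is exactly what this thread
    # is questioning.
    opts.do_picture_description = True
    opts.picture_description_options = PictureDescriptionApiOptions(
        url=qwen_url,  # e.g. a vLLM /v1/chat/completions endpoint
        params={"model": qwen_model},
        prompt="Describe this figure in a few sentences.",
        timeout=90,
    )
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_cls=VlmPipeline,
                pipeline_options=opts,
            )
        }
    )
```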