
[QNN:Bugfix] Add support for convert llm/visual mnn model to Qnn mode…#4549

Open
blueskycoco wants to merge 1 commit into alibaba:master from blueskycoco:multi-model-convert-qnn

Conversation

@blueskycoco


…l, output to same qnn folder

First, convert the visual model:
python3 npu/generate_llm_qnn.py --model ../../../model/Qwen3-VL-4B-Instruct-MNN-QNN --soc_id=57 --dsp_arch=v75 --image_sizes 256x256 --model_name visual.mnn

Then convert the LLM into the same qnn folder:
python3 npu/generate_llm_qnn.py --model ../../../model/Qwen3-VL-4B-Instruct-MNN-QNN --soc_id=57 --dsp_arch=v75 --reuse_config_qnn_json true
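Because the second command is run with `--reuse_config_qnn_json true`, it presumably relies on the config_qnn.json produced by the visual step, so the order of the two steps matters. A minimal wrapper sketch under that assumption: `convert_all` and the `PYTHON` override are hypothetical helpers, while the generate_llm_qnn.py flags are copied unchanged from the two commands.

```shell
# Sketch: run both conversions into the same qnn folder, visual model first.
# convert_all and the PYTHON override are hypothetical; the generate_llm_qnn.py
# flags are exactly the ones used in the two commands of this PR.
convert_all() {
    local model_dir=$1 soc_id=${2:-57} dsp_arch=${3:-v75}
    local py=${PYTHON:-python3}   # PYTHON=echo prints the commands instead of running them

    # Step 1: convert visual.mnn for a fixed 256x256 input image.
    "$py" npu/generate_llm_qnn.py --model "$model_dir" --soc_id="$soc_id" \
        --dsp_arch="$dsp_arch" --image_sizes 256x256 --model_name visual.mnn || return 1

    # Step 2: convert the LLM, reusing config_qnn.json (assumed to come from step 1).
    "$py" npu/generate_llm_qnn.py --model "$model_dir" --soc_id="$soc_id" \
        --dsp_arch="$dsp_arch" --reuse_config_qnn_json true
}
```

For example, `convert_all ../../../model/Qwen3-VL-4B-Instruct-MNN-QNN` runs both steps with soc_id 57 and dsp_arch v75, and prefixing it with `PYTHON=echo` only prints the two commands. Stopping after a failed first step avoids running the LLM conversion against a missing or stale config_qnn.json.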

Layout of the resulting model folder:
.
|-- config.json
|-- config_qnn.json
|-- embeddings_bf16.bin
|-- export_args.json
|-- llm.mnn
|-- llm.mnn.json
|-- llm.mnn.weight
|-- llm_config.json
|-- qnn
| |-- graphllm0.bin
| |-- graphllm1.bin
| |-- graphllm10.bin
| |-- graphllm11.bin
| |-- graphllm12.bin
| |-- graphllm13.bin
| |-- graphllm14.bin
| |-- graphllm15.bin
| |-- graphllm16.bin
| |-- graphllm17.bin
| |-- graphllm18.bin
| |-- graphllm19.bin
| |-- graphllm2.bin
| |-- graphllm20.bin
| |-- graphllm21.bin
| |-- graphllm22.bin
| |-- graphllm23.bin
| |-- graphllm24.bin
| |-- graphllm25.bin
| |-- graphllm26.bin
| |-- graphllm27.bin
| |-- graphllm28.bin
| |-- graphllm29.bin
| |-- graphllm3.bin
| |-- graphllm30.bin
| |-- graphllm31.bin
| |-- graphllm32.bin
| |-- graphllm33.bin
| |-- graphllm34.bin
| |-- graphllm35.bin
| |-- graphllm36.bin
| |-- graphllm37.bin
| |-- graphllm4.bin
| |-- graphllm5.bin
| |-- graphllm6.bin
| |-- graphllm7.bin
| |-- graphllm8.bin
| |-- graphllm9.bin
| |-- graphvisual0.bin
| |-- llm.mnn
| `-- visual.mnn
|-- tokenizer.mtok
|-- visual.mnn
`-- visual.mnn.weight

2 directories, 52 files
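To sanity-check the output of the two steps, a small helper can confirm that key files from the layout above are present. This is a sketch: `check_qnn_layout` is hypothetical, it checks only a subset of the listed files, and it reports rather than enforces the number of graphllm*.bin files, since the 38 shown here presumably depends on the model and conversion settings.

```shell
# Sketch: check that a converted model folder has the qnn/ artifacts listed above.
# check_qnn_layout is a hypothetical helper; the graphllm*.bin count is only reported.
check_qnn_layout() {
    local dir=$1 f missing=0
    for f in config.json config_qnn.json llm.mnn visual.mnn \
             qnn/llm.mnn qnn/visual.mnn qnn/graphvisual0.bin qnn/graphllm0.bin; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # Count the LLM graph binaries written by the second conversion step.
    local n
    n=$(find "$dir/qnn" -maxdepth 1 -name 'graphllm*.bin' 2>/dev/null | wc -l)
    echo "graphllm binaries: $((n))"
    return "$missing"
}
```

Running `check_qnn_layout ../../../model/Qwen3-VL-4B-Instruct-MNN-QNN` after both conversions should return 0; a missing qnn/graphvisual0.bin, for instance, would suggest the visual step did not write into the shared qnn folder.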

Push Qwen3-VL-4B-Instruct-MNN-QNN to /data/local/tmp on the device; image inference then runs on the NPU for both the vision and decode stages.
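A minimal sketch of the push step, assuming the device is reachable over adb. `push_model`, the `ADB` override, and the qnn/ precheck are hypothetical additions; only the standard `adb push <local> <remote>` form is used.

```shell
# Sketch: copy the converted model folder to the device.
# push_model and the ADB override are hypothetical; adb push copies a local
# directory recursively into the remote directory.
push_model() {
    local model_dir=$1 remote=${2:-/data/local/tmp}
    local adb=${ADB:-adb}   # ADB=echo prints the command instead of running it
    # Refuse to push a folder that has not been converted yet (assumed precheck).
    if [ ! -d "$model_dir/qnn" ]; then
        echo "no qnn/ folder in $model_dir; run both conversion steps first" >&2
        return 1
    fi
    "$adb" push "$model_dir" "$remote/"
}
```

For example, `push_model ../../../model/Qwen3-VL-4B-Instruct-MNN-QNN` places the folder under /data/local/tmp/Qwen3-VL-4B-Instruct-MNN-QNN on the device.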

Description

Module

Type

  • Feature
  • [x] Bugfix
  • Perf
  • Refact
  • Style
  • Doc
  • Test
  • Chore

Checklist

  • Commit message follows [Module:Type] Description format
  • Code compiles without errors
  • [x] Tested on relevant platform(s)
  • No unrelated format or style changes included

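This commit mainly makes it convenient to run image/text inference on the NPU with a single QNN model. Currently, for image inference, the vision part appears to run on the NPU while the text-output part (decode) runs on the CPU; this commit makes both stages run on the NPU.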


Signed-off-by: Dillon Min <dillonhua@gmail.com>
@CLAassistant

CLAassistant commented Jun 16, 2026


CLA assistant check
All committers have signed the CLA.



4 participants