[QNN:Bugfix] Add support for convert llm/visual mnn model to Qnn mode… #4549
Open
blueskycoco wants to merge 1 commit into
…l, output to same qnn folder

First, convert the visual model:
    python3 npu/generate_llm_qnn.py --model ../../..//model/Qwen3-VL-4B-Instruct-MNN-QNN --soc_id=57 --dsp_arch=v75 --image_sizes 256x256 --model_name visual.mnn

Second, convert the LLM:
    python3 npu/generate_llm_qnn.py --model ../../..//model/Qwen3-VL-4B-Instruct-MNN-QNN --soc_id=57 --dsp_arch=v75 --reuse_config_qnn_json true

Layout of the resulting QNN model:
.
|-- config.json
|-- config_qnn.json
|-- embeddings_bf16.bin
|-- export_args.json
|-- llm.mnn
|-- llm.mnn.json
|-- llm.mnn.weight
|-- llm_config.json
|-- qnn
|   |-- graphllm0.bin
|   |-- graphllm1.bin
|   |-- graphllm10.bin
|   |-- graphllm11.bin
|   |-- graphllm12.bin
|   |-- graphllm13.bin
|   |-- graphllm14.bin
|   |-- graphllm15.bin
|   |-- graphllm16.bin
|   |-- graphllm17.bin
|   |-- graphllm18.bin
|   |-- graphllm19.bin
|   |-- graphllm2.bin
|   |-- graphllm20.bin
|   |-- graphllm21.bin
|   |-- graphllm22.bin
|   |-- graphllm23.bin
|   |-- graphllm24.bin
|   |-- graphllm25.bin
|   |-- graphllm26.bin
|   |-- graphllm27.bin
|   |-- graphllm28.bin
|   |-- graphllm29.bin
|   |-- graphllm3.bin
|   |-- graphllm30.bin
|   |-- graphllm31.bin
|   |-- graphllm32.bin
|   |-- graphllm33.bin
|   |-- graphllm34.bin
|   |-- graphllm35.bin
|   |-- graphllm36.bin
|   |-- graphllm37.bin
|   |-- graphllm4.bin
|   |-- graphllm5.bin
|   |-- graphllm6.bin
|   |-- graphllm7.bin
|   |-- graphllm8.bin
|   |-- graphllm9.bin
|   |-- graphvisual0.bin
|   |-- llm.mnn
|   `-- visual.mnn
|-- tokenizer.mtok
|-- visual.mnn
`-- visual.mnn.weight

2 directories, 52 files

Push Qwen3-VL-4B-Instruct-MNN-QNN to /data/local/tmp; image inference then runs on the NPU for both the vision and decode stages.

Signed-off-by: Dillon Min <dillonhua@gmail.com>
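The commit message describes the conversion as two passes over the same model folder, followed by a push to the device. As a minimal sketch, the commands can be assembled in that order in Python. The flags and values are copied from the commit message; the helper name `build_commands` is hypothetical, and using `adb push` for the final step is an assumption (the message only says "push"). The visual pass is listed first because the LLM pass is given `--reuse_config_qnn_json true`, which presumably reuses the `config_qnn.json` written by the first pass.

```python
import shlex


def build_commands(model_dir, soc_id=57, dsp_arch="v75",
                   image_size="256x256", device_dir="/data/local/tmp"):
    """Return the visual-convert, LLM-convert and push commands, in order.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    common = [
        "python3", "npu/generate_llm_qnn.py",
        "--model", model_dir,
        f"--soc_id={soc_id}",
        f"--dsp_arch={dsp_arch}",
    ]
    # Pass 1: convert the visual model (flags taken from the commit message).
    visual = common + ["--image_sizes", image_size, "--model_name", "visual.mnn"]
    # Pass 2: convert the LLM into the same folder, reusing config_qnn.json.
    llm = common + ["--reuse_config_qnn_json", "true"]
    # Assumption: the device is reached through adb.
    push = ["adb", "push", model_dir, device_dir]
    return [visual, llm, push]


if __name__ == "__main__":
    for cmd in build_commands("../../..//model/Qwen3-VL-4B-Instruct-MNN-QNN"):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without the QNN toolchain; each list could be handed to `subprocess.run(cmd, check=True)` once the environment is set up.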
Description
Module
Type
Checklist
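[Module:Type] Description format

This commit is mainly intended to make it easy to run image and text inference on the NPU through a single QNN model. At present, it appears that for image inference the vision part runs on the NPU while the text-output part (decode) runs on the CPU; this commit makes both stages run on the NPU.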
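Because the goal is for the vision and decode stages to be served from one QNN folder, a quick check before pushing to the device is to confirm that the `qnn` subfolder holds both the visual graph and the LLM graphs. The sketch below assumes the layout listed in the commit message for Qwen3-VL-4B (38 `graphllm*.bin` files plus `graphvisual0.bin`, `llm.mnn` and `visual.mnn`); the function name `missing_qnn_files` is hypothetical, and the graph count will likely differ for other models.

```python
from pathlib import Path


def missing_qnn_files(model_dir, num_llm_graphs=38):
    """List the expected files that are absent from <model_dir>/qnn.

    The expected set mirrors the layout in the commit message; the
    default of 38 LLM graphs is specific to that example.
    """
    qnn = Path(model_dir) / "qnn"
    expected = [f"graphllm{i}.bin" for i in range(num_llm_graphs)]
    expected += ["graphvisual0.bin", "llm.mnn", "visual.mnn"]
    return [name for name in expected if not (qnn / name).is_file()]


if __name__ == "__main__":
    missing = missing_qnn_files("Qwen3-VL-4B-Instruct-MNN-QNN")
    print("missing:", missing if missing else "none")
```

An empty result means both conversion passes wrote into the same folder; a missing `graphvisual0.bin` would suggest the visual pass was skipped or written elsewhere.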