LLM Model Performance Notes

The following results were obtained on a Jetson Orin™ NX 8GB module configured in 25W power mode to ensure optimal performance. Jetson™ clocks were enabled to maximize CPU and GPU frequency during benchmarking and evaluation. These models were pulled from the official repository of Ollama.

Model Family	Parameters	Quantization	Size	Performance
DeepSeek R1	1.5 B	Q4_K_M	1.1 GB	~15-17 tokens/sec
DeepSeek R1	7 B	Q4_K_M	4.7 GB	~5-7 tokens/sec
DeepSeek Coder	1.3 B	Q4_0	776 MB	~20-25 tokens/sec
Llama 3.2	1 B	Q8_0	1.3 GB	~17-20 tokens/sec
Llama 3.2 Instruct	1 B	Q4_0	~0.8 GB	~17-20 tokens/sec
Llama 3.2	3 B	Q4_K_M	2 GB	~10-12 tokens/sec
Llama 2	7 B	Q4_0	3.8 GB	~5-7 tokens/sec
Tinyllama	1.1 B	Q4_0	637 MB	~22-27 tokens/sec
Qwen 2.5	0.5 B	Q4_K_M	398 MB	~25-30 tokens/sec
Qwen 2.5	1.5 B	Q4_K_M	986 MB	~15-17 tokens/sec
Qwen 2.5 Coder	0.5 B	Q8_0	531 MB	~25-30 tokens/sec
Qwen 2.5 Coder	1.5 B	Q4_K_M	986 MB	~15-17 tokens/sec
Qwen	0.5 B	Q4_0	395 MB	~25-30 tokens/sec
Qwen	1.8 B	Q4_0	1.1 GB	~15-20 tokens/sec
Gemma 2	2 B	Q4_0	1.6 GB	~10-12 tokens/sec
Mistral	7 B	Q4_0	4.1 GB	~5-7 tokens/sec

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLM Model Performance Notes

FilesExpand file tree

llm-models-performance-notes-readme.md

Latest commit

History

llm-models-performance-notes-readme.md

File metadata and controls

LLM Model Performance Notes