Qwen3 uses more memory than Qwen2.5 for a similar model size? #1332
Unanswered
DhruvaKartik asked this question in Q&A
Replies: 0 comments
I was checking out Qwen/Qwen3-0.6B on vLLM and noticed this:
vllm serve Qwen/Qwen3-0.6B --max-model-len 8192
Right after this, I ran the following and saw:
vllm serve Qwen/Qwen2.5-0.5B-Instruct --max-model-len 8192
How can there be a 10x difference in concurrency for a similar model size? Am I missing something?
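One way to check where such a gap could come from: the concurrency vLLM reports depends on how much KV cache each token needs, and that is set by the attention layout (layer count, number of KV heads, head dimension), not by the parameter count. The sketch below is a rough estimate, not vLLM's actual accounting. The helper name `kv_cache_bytes_per_token` is hypothetical, and the config values are assumptions that should be verified against each model's `config.json` (`num_hidden_layers`, `num_key_value_heads`, `head_dim` or `hidden_size / num_attention_heads`). It also assumes a 2-byte cache dtype (fp16/bf16).

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Rough KV cache size for one token: a K and a V vector per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


# ASSUMED config values; check them against each model's config.json.
qwen3_0_6b = kv_cache_bytes_per_token(num_layers=28, num_kv_heads=8, head_dim=128)
qwen25_0_5b = kv_cache_bytes_per_token(num_layers=24, num_kv_heads=2, head_dim=64)

print(f"Qwen3-0.6B:   {qwen3_0_6b / 1024:.0f} KiB per token")
print(f"Qwen2.5-0.5B: {qwen25_0_5b / 1024:.0f} KiB per token")
print(f"ratio:        {qwen3_0_6b / qwen25_0_5b:.2f}x")
```

If those config values are right, the per-token KV cache differs by roughly 9x, which would be in line with a ~10x difference in concurrency at the same `--max-model-len` and the same GPU memory budget.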