benchmarks
#1204
Replies: 2 comments 1 reply
-
what's your setup, e.g. GPU?
I'm not sure what this means.
Listed at the beginning of the page you referenced. Not much can be added, but the OS is a Linux distro.
-
I have an RTX 4060 GPU and an Intel Iris Xe iGPU with a Core i5-13500H. I am not sure, but I think inference runs on the CPU, and I don't really know how to verify that. My vLLM code is:
vLLM API (API v0 or v1):
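One way to settle the CPU-vs-GPU uncertainty above: vLLM runs on PyTorch, so if PyTorch cannot see a CUDA device inside WSL2, vLLM is falling back to the CPU. A minimal sketch (the `cuda_visible` helper name is mine, not part of any vLLM API; it degrades gracefully if PyTorch is not installed):

```python
import importlib.util

def cuda_visible() -> bool:
    """Return True if PyTorch is installed and can see a CUDA GPU.

    Inside WSL2 this should report True for the RTX 4060 when the
    NVIDIA driver's WSL support is set up; the Iris Xe iGPU is not
    a CUDA device and will never appear here.
    """
    if importlib.util.find_spec("torch") is None:
        return False  # no PyTorch -> vLLM could not be using CUDA either
    import torch
    return torch.cuda.is_available()

print(cuda_visible())
```

If this prints `False`, the 5 tok/s result is consistent with CPU inference; `nvidia-smi` inside WSL is another quick cross-check.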
-
Hi,
I have seen your benchmarks:
https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html
Inside WSL2, with Qwen2.5-1.5B-Instruct and Transformers I get similar speed; however, with vLLM I get around 56 tok/sec. Using the vLLM API (API v0 or v1, inside WSL) I get a much worse result of about 5 tok/sec.
Could you please detail the setup?
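For comparing numbers like the 56 tok/s and 5 tok/s above, it helps to be explicit about the metric: the benchmark page reports generated tokens divided by wall-clock time. A trivial sketch of that calculation (the `throughput` helper is illustrative, not from the benchmark scripts):

```python
def throughput(n_tokens: int, elapsed_s: float) -> float:
    """Tokens per second: generated tokens / wall-clock seconds."""
    return n_tokens / elapsed_s

# Example: 1792 tokens generated in 32.0 s -> 56.0 tok/s,
# the offline-vLLM figure reported in this thread.
print(throughput(1792, 32.0))  # → 56.0
```

When timing the API server rather than offline generation, the elapsed time also includes HTTP and scheduling overhead, so the two modes are only comparable on long generations.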