
Conversation

googs1025
Member

What this PR does / why we need it

  • Support vLLM CPU in BackendRuntime

Which issue(s) this PR fixes

Fixes #

Special notes for your reviewer

Does this PR introduce a user-facing change?

Support vLLM CPU in BackendRuntime

@InftyAI-Agent InftyAI-Agent added needs-triage, needs-priority, and do-not-merge/needs-kind labels Jun 29, 2025
@InftyAI-Agent InftyAI-Agent requested review from cr7258 and carlory June 29, 2025 02:28
@@ -0,0 +1,75 @@
{{- if .Values.backendRuntime.enabled -}}
Member Author

To be honest, I think this is a bit redundant: the vLLM CPU image is maintained separately (see: https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo), so at this stage we seem to need a separate backend runtime to maintain it. 🤔
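
For illustration, a separate BackendRuntime for the CPU image might look roughly like this. This is a minimal sketch: the metadata labels and name match the template under review, and the image and tag match the pod output below, but the API group and spec fields (the command in particular) are assumptions mirroring the default vLLM template, not a verified manifest.

apiVersion: inference.llmaz.io/v1alpha1  # assumed API group
kind: BackendRuntime
metadata:
  labels:
    app.kubernetes.io/name: backendruntime
    app.kubernetes.io/part-of: llmaz
    app.kubernetes.io/created-by: llmaz
  name: vllmcpu
spec:
  # Illustrative: mirrors the default vLLM runtime, swapping in the CPU image.
  command:
    - python3
    - -m
    - vllm.entrypoints.openai.api_server
  image: public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo
  version: v0.8.5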

@googs1025
Member Author

root@VM-0-13-ubuntu:/home/ubuntu# kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
qwen3-0--6b-0   1/1     Running   0          24m
root@VM-0-13-ubuntu:/home/ubuntu# kubectl get pods -oyaml | grep image
      image: public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.8.5
      imagePullPolicy: IfNotPresent
      image: inftyai/model-loader:v0.0.10
      imagePullPolicy: IfNotPresent
      image: public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.8.5
      imageID: public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo@sha256:36444ac581f98dc4336a7044e4c9858d5b3997a7bda1e952c03af8b6917c8311
      image: docker.io/inftyai/model-loader:v0.0.10
      imageID: docker.io/inftyai/model-loader@sha256:b67a8bb3acbc496a62801b2110056b9774e52ddc029b379c7370113c7879c7d9

@googs1025
Member Author

root@VM-0-13-ubuntu:/home/ubuntu# kubectl port-forward svc/qwen3-0--6b-lb 8080:8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080

@googs1025
Member Author

root@VM-0-13-ubuntu:/home/ubuntu# curl -X POST http://localhost:8080/v1/chat/completions   -H 'Content-Type: application/json'   -d '{
        "model": "qwen3-0--6b",
        "messages": [
          {
            "role": "user",
            "content": "Who are you?"
          }
        ]
      }' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1436  100  1273  100   163    111     14  0:00:11  0:00:11 --:--:--   393
{
  "id": "chatcmpl-7f5a02b2bd964760832cdf7f5e0a104d",
  "object": "chat.completion",
  "created": 1751164548,
  "model": "qwen3-0--6b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "<think>\nOkay, the user is asking, \"Who are you?\" So, I need to respond appropriately. Let me start by recalling the previous interactions. The user mentioned they are asking about my identity, so I should confirm my name and provide a brief description.\n\nI should make sure to be friendly and offer further assistance. Maybe mention that I am a language model, but also highlight that I can help with various tasks. It's important to keep the response straightforward and conversational.\n\nWait, should I use a specific name? The user might be expecting a name, so I should include that. Also, avoid any technical jargon. Keep the tone natural and helpful. Let me put that together.\n</think>\n\nI am a language model developed by OpenAI. I can help with a wide range of tasks, from answering questions to providing information. How can I assist you today?",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 190,
    "completion_tokens": 178,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}
root@VM-0-13-ubuntu:/home/ubuntu#

@googs1025
Member Author

Seeing that other projects have a vLLM CPU example, I think we can also integrate it into our BackendRuntime.

https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/manifests/vllm/cpu-deployment.yaml
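
For reference, vLLM's CPU backend is tuned through CPU-specific environment variables rather than GPU resources. A trimmed, illustrative container sketch (VLLM_CPU_KVCACHE_SPACE is a documented vLLM CPU setting; the values and resource sizes here are placeholders, not taken from the linked manifest):

containers:
  - name: vllm
    image: public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.8.5
    env:
      - name: VLLM_CPU_KVCACHE_SPACE  # KV cache size in GiB for the CPU backend
        value: "4"
    resources:
      requests:
        cpu: "4"        # CPU cores take the place of GPU requests
        memory: 8Gi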

@googs1025
Member Author

/kind feature

@InftyAI-Agent InftyAI-Agent added the feature label and removed the do-not-merge/needs-kind label Jun 29, 2025
@carlory
Member

carlory commented Jun 30, 2025

The image seems to only support the x86 architecture. https://docs.vllm.ai/en/latest/getting_started/installation/cpu.html?h=cpu#pre-built-images

@googs1025
Member Author

The image seems to only support the x86 architecture. https://docs.vllm.ai/en/latest/getting_started/installation/cpu.html?h=cpu#pre-built-images

Yes, according to the docs, vLLM CPU currently only supports specific architectures 🤔
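
Given that constraint, any example we ship could pin pods to x86 nodes using the standard architecture label; a minimal sketch:

# Schedule only onto amd64 nodes, since the prebuilt vLLM CPU image is x86-only.
nodeSelector:
  kubernetes.io/arch: amd64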

app.kubernetes.io/name: backendruntime
app.kubernetes.io/part-of: llmaz
app.kubernetes.io/created-by: llmaz
name: vllmcpu
Contributor

I think we can move it to the examples instead of making it part of the default template.

Member Author

This seems more reasonable; I will change it. By putting it in the examples, we give users another way to run vLLM (on CPU); see the sketch below.
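
As a usage note, a Playground would then opt into the CPU runtime by name. A minimal sketch, assuming the Playground and backendRuntimeConfig field shape (the model name matches the pod shown above; treat the exact fields as illustrative):

apiVersion: inference.llmaz.io/v1alpha1  # assumed API group
kind: Playground
metadata:
  name: qwen3-0--6b
spec:
  replicas: 1
  modelClaim:
    modelName: qwen3-0--6b
  backendRuntimeConfig:
    backendName: vllmcpu  # select the CPU BackendRuntime from the examples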

@googs1025 googs1025 force-pushed the support/vllm-cpu branch 2 times, most recently from e1d7d06 to 78af4c3 Compare July 3, 2025 13:55
Signed-off-by: googs1025 <googs1025@gmail.com>
@googs1025
Member Author

cc @kerthcet

Contributor

@cr7258 cr7258 left a comment

🐉 LGTM

@carlory
Member

carlory commented Jul 7, 2025

/assign @kerthcet

@googs1025
Member Author

friendly ping @kerthcet
can you check this when you have time? 😄
