How many tokens are used to train the base models? #1386
Unanswered
clairez-cerebras asked this question in Q&A
Replies: 0 comments
Hi! For the released base models (Qwen/Qwen3-0.6B-Base, Qwen/Qwen3-1.7B-Base, Qwen/Qwen3-4B-Base, Qwen/Qwen3-8B-Base, Qwen/Qwen3-14B-Base), how many tokens were used to train each of them? The blog post seems to report only the total token count used to train the largest model (Qwen3-235B-A22B Base?). I would love to learn about the smaller base models too. Thanks!