How many tokens are used to train the base models? #1386
Unanswered
clairez-cerebras asked this question in Q&A
Replies: 0 comments
Hi! For the released base models (Qwen/Qwen3-0.6B-Base, Qwen/Qwen3-1.7B-Base, Qwen/Qwen3-4B-Base, Qwen/Qwen3-8B-Base, Qwen/Qwen3-14B-Base), how many tokens were used to train each of them? The blog post seems to report only the total token count used to train the largest model (Qwen3-235B-A22B Base?). I would love to learn about the smaller base models too. Thanks!