Qwen3-4B produces only empty output after fine-tuning? #1512
Unanswered
guxuan123456
asked this question in
Q&A
Replies: 0 comments
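I am fine-tuning an information extraction model. After fine-tuning, the model returns empty output for every prompt; even a simple question such as "What is the capital of China?" gets an empty response. My training parameters: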
WORKSPACE=./env_run/xxx
export PATH=${WORKSPACE}/env/miniconda3/bin:$PATH
CUDA_VISIBLE_DEVICES=3 swift sft \
    --model xxx/Qwen3-4B \
    --train_type lora \
    --output_dir ./output_v4/Qwen3-4B \
    --dataset llm_event_train_dataset_v4.json \
    --gradient_accumulation_steps 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --attn_impl flash_attn \
    --split_dataset_ratio 0.15 \
    --lazy_tokenize true \
    --num_train_epochs 6 \
    --save_steps 100 \
    --eval_steps 200 \
    --save_total_limit 200
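For reference, with a single visible GPU, a per-device batch size of 1, and gradient accumulation of 1, these flags give an effective batch size of 1, so every sample triggers its own optimizer step. The sketch below works through that arithmetic. The dataset size of 1000 is a hypothetical placeholder (the real size of llm_event_train_dataset_v4.json is not stated), the function name `training_schedule` is made up for illustration, and the way the validation split is rounded is an assumption rather than the confirmed behavior of swift.

```python
import math


def training_schedule(num_samples, split_ratio=0.15, per_device_batch=1,
                      grad_accum=1, num_devices=1, epochs=6, save_steps=100):
    """Estimate optimizer steps implied by the flags in the command above.

    Assumption: the validation split takes round(num_samples * split_ratio)
    samples; the actual rounding used by swift may differ.
    """
    val_samples = round(num_samples * split_ratio)
    train_samples = num_samples - val_samples
    # Samples consumed per optimizer step across all devices.
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(train_samples / effective_batch)
    total_steps = steps_per_epoch * epochs
    return {
        "train_samples": train_samples,
        "val_samples": val_samples,
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # One checkpoint is written every save_steps optimizer steps.
        "checkpoints_written": total_steps // save_steps,
    }


if __name__ == "__main__":
    # 1000 is a hypothetical dataset size, used only for illustration.
    for key, value in training_schedule(1000).items():
        print(f"{key}: {value}")
```

Substituting the real sample count for 1000 gives a rough idea of how many optimizer steps the run performs over 6 epochs.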