Open
Description
Just trying it out and I presume use_triton=False for training. Have you tried it with training?
With attention_type="flash", use_triton=False and bf16, I get "atomic_add does not support bf16" from Triton.
With fp16 it hangs after one forward pass. I know training can be different for flash.
But I think T5 can only be trained with bf16 anyway.
What mode would you recommend for training long contexts on a novel task, different from LM?
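For reference, roughly what I'm running (a minimal sketch; the model class and import are placeholders, only the attention_type="flash" and use_triton=False kwargs are taken from the setup described above):

```python
import torch

# Placeholder import/class name; swap in the repo's actual model class.
from my_t5_repo import T5Model  # assumed, not the real module path

model = T5Model(attention_type="flash", use_triton=False).cuda()
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Dummy long-context batch just to exercise a training step.
input_ids = torch.randint(0, 32000, (1, 4096), device="cuda")
labels = input_ids.clone()

# bf16 autocast -> "atomic_add does not support bf16" from Triton
# fp16 autocast -> hangs after the first forward pass
with torch.autocast("cuda", dtype=torch.bfloat16):
    loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()
optim.step()
```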