Open
Description
Just trying it out and I presume use_triton=False for training. Have you tried it with training?
With attention_type="flash", use_triton=False and bf16, I get "atomic_add does not support bf16" from Triton.
With fp16 it hangs after one forward pass. I know training can be different for flash.
But I think T5 can only be trained with bf16 anyway.
What mode would you recommend for training long contexts on a novel task, different from LM?
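For reference, roughly what I'm running (a minimal sketch; the model class and import are placeholders, only the attention_type="flash" and use_triton=False kwargs are taken from the setup described above):

```python
import torch

# Placeholder import/class name; swap in the repo's actual model class.
from my_t5_repo import T5Model  # assumed, not the real module path

model = T5Model(attention_type="flash", use_triton=False).cuda()
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Dummy long-context batch just to exercise a training step.
input_ids = torch.randint(0, 32000, (1, 4096), device="cuda")
labels = input_ids.clone()

# bf16 autocast -> "atomic_add does not support bf16" from Triton
# fp16 autocast -> hangs after the first forward pass
with torch.autocast("cuda", dtype=torch.bfloat16):
    loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()
optim.step()
```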