Does this work for training? #1

@jamesharrisivi

Just trying this out, and I presume use_triton=False is the intended setting for training. Have you tried it with training?

With attention_type="flash", use_triton=False, and bf16, I get an error that atomic_add does not support bf16 with Triton.

With fp16 it hangs after one forward pass. I know training can behave differently with flash attention, but I think T5 can only be trained in bf16 anyway.

Which mode would you recommend for training on long contexts for a novel task, different from language modeling?
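
For reference, this is a minimal sketch of the kind of training step I mean, assuming a Hugging Face-style T5 model built with the attention_type / use_triton flags above; the model and dataloader construction are omitted, and the interface is an assumption on my side, not this repo's confirmed API:

```python
import torch

def train_step(model, batch, optimizer):
    # bf16 autocast for the forward pass: fp16 tends to be unstable for T5,
    # which is why I am trying bf16 in the first place (assumption, not a
    # claim about this repo).
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        out = model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],  # HF-style seq2seq loss
        )
        loss = out.loss
    # Backward pass is where the Triton atomic_add / bf16 error shows up for me.
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return loss.detach()
```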
