You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi,
First of all, great work. I am big proponent of FLan-t5 and use it in my projects. For multilingual, mT5 and bigscience/mt0 models provide a good baseline and are truly multilingual. Does Flash Attention work on mt5 architecture? Seems like only T5 is supported now?