Hi, I noticed that GELU may return NaN for -inf inputs due to the x * Φ(x) formulation. In cases like MaxPool padding, -inf can be passed into GELU, and I've seen it lead to NaN during testing. Curious if you think it's worth explicitly guarding against this case?

Replies: 1 comment

- This is not a numerical stability issue where an alternative set of operations might be more suitable; the input is simply not in a valid range. Adding such guards typically adds unnecessary overhead, especially on GPU, so perhaps it is better to leave it to the user to clamp the input values, e.g. gelu(tensor.clamp(NEG_MAX, POS_MAX)).
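A minimal sketch of the failure and the suggested workaround, assuming PyTorch for illustration (the thread does not name the library, and NEG_MAX / POS_MAX are placeholder bounds chosen by the user, not identifiers from any API):

```python
import torch
import torch.nn.functional as F

# gelu(x) = x * Phi(x): at x = -inf, Phi(x) -> 0, so the product is (-inf) * 0 = NaN.
x = torch.tensor([0.0, -1.0, float("-inf")])
print(F.gelu(x))  # tensor([ 0.0000, -0.1587,     nan])

# Workaround from the reply: clamp inputs to a finite range before GELU,
# so -inf values (e.g. from MaxPool padding) never reach the activation.
NEG_MAX, POS_MAX = -1e4, 1e4  # hypothetical user-chosen bounds
print(F.gelu(x.clamp(NEG_MAX, POS_MAX)))  # tensor([ 0.0000, -0.1587, -0.0000])
```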