Hi, I noticed that GELU may return NaN for -inf inputs due to the x * Φ(x) formulation. In cases like MaxPool padding, -inf can be passed into GELU, and I've seen it lead to NaN during testing. Curious if you think it's worth explicitly guarding against this case?

Replies: 1 comment

- This is not a numerical stability issue where an alternative set of operations might be more suitable; the input is simply not in a valid range. Adding such guards typically adds unnecessary overhead, especially on GPU, so perhaps it is better to leave it to the user to clamp the input values, e.g. gelu(tensor.clamp(NEG_MAX, POS_MAX)).
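A minimal sketch of the failure and the suggested workaround, assuming PyTorch for illustration (the thread does not name the library, and NEG_MAX / POS_MAX are placeholder bounds chosen by the user, not identifiers from any API):

```python
import torch
import torch.nn.functional as F

# gelu(x) = x * Phi(x): at x = -inf, Phi(x) -> 0, so the product is (-inf) * 0 = NaN.
x = torch.tensor([0.0, -1.0, float("-inf")])
print(F.gelu(x))  # tensor([ 0.0000, -0.1587,     nan])

# Workaround from the reply: clamp inputs to a finite range before GELU,
# so -inf values (e.g. from MaxPool padding) never reach the activation.
NEG_MAX, POS_MAX = -1e4, 1e4  # hypothetical user-chosen bounds
print(F.gelu(x.clamp(NEG_MAX, POS_MAX)))  # tensor([ 0.0000, -0.1587, -0.0000])
```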