
This module implements the GELU-gated linear unit (GeGLU) activation function. It computes \(\text{GeGLU}(x, g) = x \cdot \text{GELU}(g)\), where \(x\) and \(g\) are obtained by splitting the input tensor in half along its last dimension.
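For intuition, the following is a minimal sketch of the computation described above, assuming the torch R package is available. The helper name geglu_sketch() is hypothetical and for illustration only; it is not the package's implementation.

# Hypothetical sketch of the GeGLU computation, not the package's code.
geglu_sketch <- function(input) {
  # split the input in half along the last dimension into x and g
  chunks <- torch::torch_chunk(input, chunks = 2, dim = -1)
  x <- chunks[[1]]
  g <- chunks[[2]]
  # GeGLU(x, g) = x * GELU(g)
  x * torch::nnf_gelu(g)
}

Applied to a tensor of shape (10, 10), this sketch returns a (10, 5) tensor, matching the example output below.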

Usage

nn_geglu()

References

Shazeer N (2020). “GLU Variants Improve Transformer.” arXiv:2002.05202, https://arxiv.org/abs/2002.05202.

Examples

x <- torch::torch_randn(10, 10)
glu <- nn_geglu()
glu(x)  # the last dimension is halved: input (10, 10) -> output (10, 5)
#> torch_tensor
#>  2.1250 -1.3281  0.0847  1.1350  0.0147
#>  0.8679  0.0165 -0.0255 -0.1974  0.1809
#> -2.0777  0.1736  0.0801  0.0842  0.0564
#> -0.0296 -0.0408  0.1391  0.1185  0.1363
#>  0.1634  0.7449  0.0067  0.0154  0.7121
#> -0.0265  0.0178  0.3446 -0.0709 -0.0139
#>  0.0069  0.0336 -0.0962  0.0030 -0.1447
#>  0.0127 -0.0668  0.1435  0.6840  0.0522
#>  0.0875 -0.3387  2.3387 -0.0368  0.0812
#> -0.0378 -0.1659  0.0004 -0.2095 -0.0131
#> [ CPUFloatType{10,5} ]