
This module implements the GeGLU activation function, the GELU (Gaussian Error Linear Unit) variant of the Gated Linear Unit. It computes \(\text{GeGLU}(x, g) = x \cdot \text{GELU}(g)\), where \(x\) and \(g\) are obtained by splitting the input tensor in half along its last dimension.

Usage

nn_geglu()

References

Shazeer N (2020). “GLU Variants Improve Transformer.” arXiv:2002.05202, https://arxiv.org/abs/2002.05202.

Examples

# input with an even-sized last dimension, so it can be split in half
x <- torch::torch_randn(10, 10)
glu <- nn_geglu()
glu(x)
#> torch_tensor
#> -8.6340e-05  7.3866e-01  1.5666e-01  9.0087e-01  1.2459e-01
#> -9.4389e-02 -1.0175e+00 -5.3816e-01  5.3050e-01  3.0874e-02
#> -1.9642e-01 -3.6265e-01  2.3447e-01 -5.2723e-03  2.9839e-02
#>  2.7064e+00  1.0403e+00 -4.3660e-02 -2.4516e-02 -8.8397e-03
#>  4.9637e-01  7.7827e-02 -9.1744e-02  6.6462e-02 -1.4607e-01
#> -1.8864e-01 -5.2940e-01 -1.3365e-02 -5.5786e-02  5.9989e-02
#>  4.5743e-02 -2.9217e-03 -6.7277e-02  5.7167e-02 -3.6195e-02
#> -2.4695e-02  1.1955e-01 -9.6608e-02  3.3875e-02  4.8542e-03
#> -3.9722e-01 -1.0947e-01  1.1897e-01  3.6750e-01 -1.0221e+00
#>  1.3965e-01 -3.4384e-02  2.3788e-01 -3.2289e-03  2.5511e-01
#> [ CPUFloatType{10,5} ]
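
For illustration, the computation described above can be reproduced directly with torch primitives. The following is a minimal sketch of the formula, not necessarily how nn_geglu() is implemented internally:

geglu_sketch <- function(input) {
  # split the input in half along the last dimension to get x and g ...
  chunks <- torch::torch_chunk(input, chunks = 2, dim = -1)
  # ... then gate the first half with the GELU of the second
  chunks[[1]] * torch::nnf_gelu(chunks[[2]])
}
geglu_sketch(x)  # a tensor of shape 10 x 5, like glu(x) above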