
This module implements the Gaussian Error Linear Unit (GELU) variant of the Gated Linear Unit, GeGLU. It computes \(\text{GeGLU}(x, g) = x \cdot \text{GELU}(g)\), where \(x\) and \(g\) are obtained by splitting the input tensor in half along its last dimension.

Usage

nn_geglu()

References

Shazeer N (2020). “GLU Variants Improve Transformer.” arXiv:2002.05202, https://arxiv.org/abs/2002.05202.

Examples

x <- torch::torch_randn(10, 10)  # last dimension is even, so it can be split into two halves of 5
glu <- nn_geglu()
glu(x)                           # output has shape (10, 5)
#> torch_tensor
#> -0.1283  0.0194  0.3077 -0.2315  0.1275
#>  1.1615  0.0015 -0.1145 -0.2539  0.0295
#>  2.6376  2.1295  0.0724  0.1144 -0.1247
#>  0.1080  0.0859  0.0078  0.1499  0.0782
#>  0.4435 -0.0045  0.0531  0.1401  0.0786
#> -0.3958  0.0560  0.5835  0.0464  0.0138
#>  0.0911 -0.1985 -0.1178  0.0112  1.6551
#>  0.0597  0.0639  0.8862 -0.1416  0.5112
#> -0.1585 -0.1959  0.1453  0.0597 -0.0680
#> -0.0445  0.2221 -0.1414  0.0003  1.6765
#> [ CPUFloatType{10,5} ]
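
As a cross-check, the same quantity can be recomputed with torch primitives. The sketch below is illustrative only and assumes that the first half of the last dimension is the value \(x\) and the second half is the gate \(g\); torch_chunk(), nnf_gelu(), and torch_allclose() are standard torch functions used here for comparison.

# minimal sketch, not the package implementation
x <- torch::torch_randn(10, 10)

# split the last dimension (dim = 2 for this 2-d input) into two halves
halves <- torch::torch_chunk(x, chunks = 2, dim = 2)
value  <- halves[[1]]  # assumed value half x
gate   <- halves[[2]]  # assumed gate half g

# GeGLU(x, g) = x * GELU(g)
manual <- value * torch::nnf_gelu(gate)

glu <- nn_geglu()
# expected to return TRUE if the assumed split order matches the module's convention
torch::torch_allclose(glu(x), manual)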