This module implements the GeGLU activation, a Gated Linear Unit (GLU) variant that uses the Gaussian Error Linear Unit (GELU) as its gate activation. It computes \(\text{GeGLU}(x, g) = x \cdot \text{GELU}(g)\), where \(x\) and \(g\) are obtained by splitting the input tensor in half along its last dimension, so the output's last dimension is half the size of the input's.
References
Shazeer, N. (2020). “GLU Variants Improve Transformer.” arXiv:2002.05202, https://arxiv.org/abs/2002.05202.
Examples
x <- torch::torch_randn(10, 10)
glu <- nn_geglu()
glu(x)
#> torch_tensor
#> -0.1283 0.0194 0.3077 -0.2315 0.1275
#> 1.1615 0.0015 -0.1145 -0.2539 0.0295
#> 2.6376 2.1295 0.0724 0.1144 -0.1247
#> 0.1080 0.0859 0.0078 0.1499 0.0782
#> 0.4435 -0.0045 0.0531 0.1401 0.0786
#> -0.3958 0.0560 0.5835 0.0464 0.0138
#> 0.0911 -0.1985 -0.1178 0.0112 1.6551
#> 0.0597 0.0639 0.8862 -0.1416 0.5112
#> -0.1585 -0.1959 0.1453 0.0597 -0.0680
#> -0.0445 0.2221 -0.1414 0.0003 1.6765
#> [ CPUFloatType{10,5} ]
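For illustration, the computation from the description can be reproduced by hand. The sketch below assumes the torch R package's torch_chunk() and nnf_gelu(); geglu_manual() is a hypothetical helper, not the module's actual implementation, and the comparison should agree up to the GELU approximation nn_geglu() uses.
library(torch)

# A minimal sketch of the GeGLU computation, assuming torch_chunk() and
# nnf_gelu() from the torch R package; geglu_manual() is a hypothetical
# helper, not the implementation of nn_geglu().
geglu_manual <- function(input) {
  # Split the input in half along the last dimension:
  # the first half is x, the second half is the gate g.
  halves <- torch_chunk(input, chunks = 2, dim = -1)
  halves[[1]] * nnf_gelu(halves[[2]])
}

# The result should match nn_geglu() up to the GELU approximation it uses.
x <- torch_randn(10, 10)
torch_allclose(nn_geglu()(x), geglu_manual(x))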