This module implements the GeGLU activation, the Gated Linear Unit (GLU) variant that uses the Gaussian Error Linear Unit (GELU) as its gate. It computes \(\text{GeGLU}(x, g) = x \cdot \text{GELU}(g)\), where \(x\) and \(g\) are obtained by splitting the input tensor in half along its last dimension (a by-hand sketch of this split follows the example below).
References
Shazeer N (2020). “GLU Variants Improve Transformer.” arXiv:2002.05202, https://arxiv.org/abs/2002.05202.
Examples
x <- torch::torch_randn(10, 10)  # last dimension of size 10 is split into two halves of 5
glu <- nn_geglu()
glu(x)  # returns x * GELU(g); the last dimension shrinks from 10 to 5
#> torch_tensor
#> 2.1250 -1.3281 0.0847 1.1350 0.0147
#> 0.8679 0.0165 -0.0255 -0.1974 0.1809
#> -2.0777 0.1736 0.0801 0.0842 0.0564
#> -0.0296 -0.0408 0.1391 0.1185 0.1363
#> 0.1634 0.7449 0.0067 0.0154 0.7121
#> -0.0265 0.0178 0.3446 -0.0709 -0.0139
#> 0.0069 0.0336 -0.0962 0.0030 -0.1447
#> 0.0127 -0.0668 0.1435 0.6840 0.0522
#> 0.0875 -0.3387 2.3387 -0.0368 0.0812
#> -0.0378 -0.1659 0.0004 -0.2095 -0.0131
#> [ CPUFloatType{10,5} ]
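For intuition, the split-and-gate step can be sketched by hand with torch primitives. This is a minimal illustration, not the module's internal implementation; `geglu_by_hand` is a hypothetical helper, and it assumes the first half of the input plays the role of \(x\) and the second half the role of the gate \(g\), matching the formula above.

# hypothetical helper: split the input in half along its last dimension,
# then gate the first half with GELU applied to the second half
geglu_by_hand <- function(input) {
  last_dim <- length(dim(input))
  halves <- torch::torch_chunk(input, chunks = 2, dim = last_dim)
  halves[[1]] * torch::nnf_gelu(halves[[2]])  # x * GELU(g)
}

y <- geglu_by_hand(torch::torch_randn(10, 10))
dim(y)  # the last dimension is halved: 10 columns in, 5 out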