This module implements the GeGLU activation, a Gated Linear Unit (GLU) variant that uses the Gaussian Error Linear Unit (GELU) as its gate. It computes \(\text{GeGLU}(x, g) = x \cdot \text{GELU}(g)\), where \(x\) and \(g\) are obtained by splitting the input tensor into two halves along its last dimension.
References
Shazeer, N. (2020). “GLU Variants Improve Transformer.” arXiv:2002.05202, https://arxiv.org/abs/2002.05202.
Examples
x <- torch::torch_randn(10, 10)  # input whose last dimension (size 10) will be split in half
glu <- nn_geglu()
glu(x)                           # output has last dimension 10 / 2 = 5
#> torch_tensor
#> -0.0781 0.0429 0.2326 -0.3157 0.3587
#> 0.0269 -1.2169 1.1411 0.3839 1.0649
#> -2.8136 0.0258 0.3226 -0.1794 -0.2462
#> -0.0137 0.0488 0.0029 -0.0838 -0.1425
#> -0.7842 -1.2676 0.9033 0.0471 0.0375
#> -0.1288 0.3352 -0.1260 0.1101 2.9018
#> -0.1718 -0.0550 -0.2146 0.0457 0.1578
#> 0.0651 0.3436 -0.0093 -0.1082 -0.0523
#> -0.0214 -0.6303 0.0938 2.1864 0.0974
#> -0.0777 -0.1244 0.6496 0.0455 -0.0079
#> [ CPUFloatType{10,5} ]
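The same computation can be reproduced directly with torch primitives. The following is a minimal sketch, not part of the module's documented API: it assumes the nn_geglu() module above applies the formula with the first half of the split as \(x\) and the second half as the gate \(g\), and that it uses the exact (non-approximate) GELU. It relies only on the standard torch functions torch_chunk(), nnf_gelu(), and torch_allclose().

x <- torch::torch_randn(10, 10)
glu <- nn_geglu()
# split the last dimension (size 10) into two (10, 5) halves
parts <- torch::torch_chunk(x, chunks = 2, dim = 2)
# gate the first half with the GELU of the second half
manual <- parts[[1]] * torch::nnf_gelu(parts[[2]])
# should return TRUE if the module follows the formula above
torch::torch_allclose(glu(x), manual)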