This module implements the Gaussian Error Linear Unit variant of the Gated Linear Unit (GeGLU) activation function. It computes \(\text{GeGLU}(x, g) = x \cdot \text{GELU}(g)\), where \(x\) and \(g\) are obtained by splitting the input tensor into two halves along its last dimension.
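The computation can be sketched directly with torch primitives. The code below is an illustrative sketch only, assuming the torch R package; nn_geglu_sketch is a hypothetical name, and which half acts as the gate is a convention assumed here, not taken from this module's source.

library(torch)

# Illustrative sketch of the GeGLU computation (not this module's implementation).
nn_geglu_sketch <- nn_module(
  "nn_geglu_sketch",
  initialize = function() {},
  forward = function(input) {
    # Split the last dimension into two equal halves.
    halves <- torch_chunk(input, chunks = 2, dim = -1)
    # Multiply the first half by the GELU of the second half
    # (the ordering of the halves is an assumed convention).
    halves[[1]] * nnf_gelu(halves[[2]])
  }
)

glu <- nn_geglu_sketch()
glu(torch_randn(10, 10))  # returns a tensor of shape (10, 5)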
References
Shazeer, N. (2020). "GLU Variants Improve Transformer." arXiv:2002.05202. https://arxiv.org/abs/2002.05202.
Examples
# input with an even number of features along the last dimension
x <- torch::torch_randn(10, 10)
glu <- nn_geglu()
# the output has half as many features along the last dimension
glu(x)
#> torch_tensor
#> 0.0793 0.0175 -1.4657 -0.0856 1.0849
#> -0.0235 -0.2009 -0.3677 -0.0243 0.0647
#> 0.1060 -0.0161 0.0025 0.2204 1.1108
#> 0.0067 -0.2920 -0.3804 0.1730 0.1917
#> -1.8504 0.0107 0.0174 0.1139 -0.0466
#> -0.2444 0.5351 0.1470 0.5995 0.7774
#> 1.6299 0.1825 -2.2643 0.2216 -0.2261
#> 0.0314 -0.4396 0.1819 -3.2899 -1.9441
#> -0.0255 -0.1722 -0.2135 -0.0115 0.8493
#> 0.0539 0.1630 0.0691 -0.1778 -0.1063
#> [ CPUFloatType{10,5} ]