
This module implements the Gated Linear Unit with Gaussian Error Linear Unit (GELU) activation, known as GeGLU. It computes \(\text{GeGLU}(x, g) = x \cdot \text{GELU}(g)\), where \(x\) and \(g\) are obtained by splitting the input tensor in half along its last dimension.
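
The split-and-gate computation can also be written out directly with torch, which may help clarify the formula above. The helper below is a minimal sketch for illustration only (geglu_sketch is not an exported function of this package); it assumes the last dimension of the input has even length.

library(torch)

# Illustrative helper: split the input in half along its last dimension
# and gate the first half with the GELU of the second half.
geglu_sketch = function(input) {
  halves = torch_chunk(input, chunks = 2, dim = input$dim())
  halves[[1]] * nnf_gelu(halves[[2]])
}

x = torch_randn(10, 10)
out = geglu_sketch(x)
out$shape
#> [1] 10  5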

Usage

nn_geglu()

References

Shazeer N (2020). “GLU Variants Improve Transformer.” arXiv:2002.05202, https://arxiv.org/abs/2002.05202.

Examples

# input with 10 features along the last dimension
x = torch::torch_randn(10, 10)
glu = nn_geglu()
# the input is split in half and gated, so the output has 5 features
glu(x)
#> torch_tensor
#>  0.0793  0.0175 -1.4657 -0.0856  1.0849
#> -0.0235 -0.2009 -0.3677 -0.0243  0.0647
#>  0.1060 -0.0161  0.0025  0.2204  1.1108
#>  0.0067 -0.2920 -0.3804  0.1730  0.1917
#> -1.8504  0.0107  0.0174  0.1139 -0.0466
#> -0.2444  0.5351  0.1470  0.5995  0.7774
#>  1.6299  0.1825 -2.2643  0.2216 -0.2261
#>  0.0314 -0.4396  0.1819 -3.2899 -1.9441
#> -0.0255 -0.1722 -0.2135 -0.0115  0.8493
#>  0.0539  0.1630  0.0691 -0.1778 -0.1063
#> [ CPUFloatType{10,5} ]