This module implements the GeGLU activation function, a Gated Linear Unit (GLU) variant that uses the Gaussian Error Linear Unit (GELU) as the gating activation. It computes \(\text{GeGLU}(x, g) = x \cdot \text{GELU}(g)\), where \(x\) and \(g\) are obtained by splitting the input tensor in half along its last dimension.
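
For illustration, the same computation can be written by hand with basic torch operations. This is a minimal sketch, not the module's implementation; geglu_manual is a hypothetical helper introduced here only to show the split-and-gate step.

geglu_manual <- function(input) {
  # split the input in half along the last dimension
  chunks <- torch::torch_chunk(input, chunks = 2, dim = -1)
  x <- chunks[[1]]
  g <- chunks[[2]]
  # gate x with the GELU of g
  x * torch::nnf_gelu(g)
}

input <- torch::torch_randn(10, 10)
out <- geglu_manual(input)
out$shape  # 10 5 -- the last dimension is halved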

Usage

nn_geglu()

References

Shazeer N (2020). “GLU Variants Improve Transformer.” arXiv:2002.05202, https://arxiv.org/abs/2002.05202.

Examples

# the 10x10 input is split into two 10x5 halves along the last dimension,
# so the output has shape 10x5
x <- torch::torch_randn(10, 10)
glu <- nn_geglu()
glu(x)
#> torch_tensor
#>  0.0179 -0.6477 -0.1188 -0.1395  1.0449
#>  0.1703  0.0110  0.4797 -0.0022  0.1369
#> -0.0069 -0.0486 -0.2903  0.9705 -0.0463
#> -0.0225 -0.0269 -0.0266  0.0357 -0.6464
#>  0.0490  0.0688 -0.8732  0.7153  0.4669
#> -0.0337 -0.1098 -0.2817 -0.0176 -0.3475
#> -0.0533  0.0595 -0.2736  0.1725 -0.0623
#> -2.2409  1.4362  0.0114 -0.1502 -0.3535
#>  0.0216 -1.6423  0.1215  0.0632  0.0196
#> -0.0972 -0.0770  0.0022 -0.1045 -0.0472
#> [ CPUFloatType{10,5} ]