
This module implements the GeGLU (GELU-gated linear unit) activation function, a Gated Linear Unit variant that uses the Gaussian Error Linear Unit (GELU) as its gate. It computes \(\text{GeGLU}(x, g) = x \cdot \text{GELU}(g)\), where \(x\) and \(g\) are obtained by splitting the input tensor in half along its last dimension.

Usage

nn_geglu()

References

Shazeer N (2020). “GLU Variants Improve Transformer.” arXiv:2002.05202, https://arxiv.org/abs/2002.05202.

Examples

# input with 10 columns; GeGLU splits the last dimension into two halves of 5
x <- torch::torch_randn(10, 10)
glu <- nn_geglu()
glu(x) # the output therefore has 5 columns
#> torch_tensor
#> -0.1476 -0.1378 -0.0542  0.7774 -0.4236
#> -0.4343 -0.2940  0.0242 -0.9113 -0.1184
#> -0.3425 -0.0401  0.1440 -0.0644 -0.0006
#> -0.1055  0.0002  0.0666 -0.0314 -0.0007
#> -0.0079 -0.5192 -0.1715  0.6789 -0.4059
#>  0.0246 -0.1338 -0.2803  0.0073  0.2208
#>  0.6273 -0.1369  0.0140 -0.1649  0.1916
#> -0.6103 -0.5801 -0.1418 -0.3669 -0.0654
#>  0.1867 -0.2060  0.0336  0.0149  0.0064
#> -0.0339  0.0510  3.0001  0.0103 -0.2779
#> [ CPUFloatType{10,5} ]
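
The computation above can also be reproduced by hand. The following is a minimal sketch, assuming the chunk-then-gate definition from the description; geglu_by_hand is a hypothetical helper written for illustration, not the package's implementation.

library(torch)

# Illustrative re-implementation (assumption): split the last dimension into
# two equal halves, then gate the first half with GELU of the second.
geglu_by_hand <- function(input) {
  last <- input$dim()                                 # index of the last dimension
  chunks <- torch_chunk(input, chunks = 2, dim = last)
  x <- chunks[[1]]                                    # values
  g <- chunks[[2]]                                    # gate
  x * nnf_gelu(g)
}

input <- torch_randn(10, 10)
out <- geglu_by_hand(input)
out$shape # the last dimension is halved: 10 x 5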