A transformer block consisting of a multi-head self-attention mechanism followed by a feed-forward network.

This is used in LearnerTorchFTTransformer.
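Because the block is also exposed through that learner, its hyperparameters can typically be set there directly. A minimal sketch, assuming the learner key "classif.ft_transformer" and that the learner forwards block-level parameters such as attention_n_heads:

library(mlr3torch)
# hypothetical: configure the attention heads of each transformer block
# through the learner instead of the standalone PipeOp
learner = lrn("classif.ft_transformer",
  attention_n_heads = 4,
  epochs = 1,
  batch_size = 32
)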

nn_module

Calls nn_ft_transformer_block() when trained.

State

The state is the value calculated by the public method $shapes_out().
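A transformer block leaves the shape of its input unchanged, so for a (batch, n_tokens, d_token) input the output shape is the same. A minimal sketch, assuming shapes_out() accepts a list of input shapes with NA marking the batch dimension:

# hypothetical call: a (batch, n_tokens, d_token) input keeps its shape,
# so the result should again be c(NA, 10, 32)
po("nn_ft_transformer_block")$shapes_out(list(c(NA, 10, 32)))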

See also

Other PipeOps: mlr_pipeops_nn_adaptive_avg_pool1d, mlr_pipeops_nn_adaptive_avg_pool2d, mlr_pipeops_nn_adaptive_avg_pool3d, mlr_pipeops_nn_avg_pool1d, mlr_pipeops_nn_avg_pool2d, mlr_pipeops_nn_avg_pool3d, mlr_pipeops_nn_batch_norm1d, mlr_pipeops_nn_batch_norm2d, mlr_pipeops_nn_batch_norm3d, mlr_pipeops_nn_block, mlr_pipeops_nn_celu, mlr_pipeops_nn_conv1d, mlr_pipeops_nn_conv2d, mlr_pipeops_nn_conv3d, mlr_pipeops_nn_conv_transpose1d, mlr_pipeops_nn_conv_transpose2d, mlr_pipeops_nn_conv_transpose3d, mlr_pipeops_nn_dropout, mlr_pipeops_nn_elu, mlr_pipeops_nn_flatten, mlr_pipeops_nn_ft_cls, mlr_pipeops_nn_geglu, mlr_pipeops_nn_gelu, mlr_pipeops_nn_glu, mlr_pipeops_nn_hardshrink, mlr_pipeops_nn_hardsigmoid, mlr_pipeops_nn_hardtanh, mlr_pipeops_nn_head, mlr_pipeops_nn_identity, mlr_pipeops_nn_layer_norm, mlr_pipeops_nn_leaky_relu, mlr_pipeops_nn_linear, mlr_pipeops_nn_log_sigmoid, mlr_pipeops_nn_max_pool1d, mlr_pipeops_nn_max_pool2d, mlr_pipeops_nn_max_pool3d, mlr_pipeops_nn_merge, mlr_pipeops_nn_merge_cat, mlr_pipeops_nn_merge_prod, mlr_pipeops_nn_merge_sum, mlr_pipeops_nn_prelu, mlr_pipeops_nn_reglu, mlr_pipeops_nn_relu, mlr_pipeops_nn_relu6, mlr_pipeops_nn_reshape, mlr_pipeops_nn_rrelu, mlr_pipeops_nn_selu, mlr_pipeops_nn_sigmoid, mlr_pipeops_nn_softmax, mlr_pipeops_nn_softplus, mlr_pipeops_nn_softshrink, mlr_pipeops_nn_softsign, mlr_pipeops_nn_squeeze, mlr_pipeops_nn_tanh, mlr_pipeops_nn_tanhshrink, mlr_pipeops_nn_threshold, mlr_pipeops_nn_tokenizer_categ, mlr_pipeops_nn_tokenizer_num, mlr_pipeops_nn_unsqueeze, mlr_pipeops_torch_ingress, mlr_pipeops_torch_ingress_categ, mlr_pipeops_torch_ingress_ltnsr, mlr_pipeops_torch_ingress_num, mlr_pipeops_torch_loss, mlr_pipeops_torch_model, mlr_pipeops_torch_model_classif, mlr_pipeops_torch_model_regr

Super classes

mlr3pipelines::PipeOp -> mlr3torch::PipeOpTorch -> PipeOpTorchFTTransformerBlock

Methods

Method new()

Create a new instance of this R6 class.

Usage

PipeOpTorchFTTransformerBlock$new(
  id = "nn_ft_transformer_block",
  param_vals = list()
)

Arguments

id

(character(1))
Identifier of the resulting object.

param_vals

(list())
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction.
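For example, both of the following construct the block with custom attention settings; passing the values through po() is the usual shorthand (the values themselves are illustrative):

library(mlr3torch)
# via the constructor
block = PipeOpTorchFTTransformerBlock$new(
  param_vals = list(attention_n_heads = 4, attention_dropout = 0.1)
)
# equivalently via the po() shorthand
block = po("nn_ft_transformer_block", attention_n_heads = 4, attention_dropout = 0.1)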


Method clone()

The objects of this class are cloneable with this method.

Usage

PipeOpTorchFTTransformerBlock$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

# Construct the PipeOp
pipeop = po("nn_ft_transformer_block")
pipeop
#> PipeOp: <nn_ft_transformer_block> (not trained)
#> values: <attention_n_heads=8, attention_dropout=0.2, attention_initialization=kaiming, attention_normalization=<nn_layer_norm>, ffn_dropout=0.1, ffn_activation=<nn_reglu>, ffn_normalization=<nn_layer_norm>, residual_dropout=0, prenormalization=TRUE, is_first_layer=FALSE, query_idx=<NULL>, attention_bias=TRUE, ffn_bias_first=TRUE, ffn_bias_second=TRUE>
#> Input channels <name [train type, predict type]>:
#>   input [ModelDescriptor,Task]
#> Output channels <name [train type, predict type]>:
#>   output [ModelDescriptor,Task]
# The available parameters
pipeop$param_set
#> <ParamSet(16)>
#>                           id    class lower upper nlevels        default
#>                       <char>   <char> <num> <num>   <num>         <list>
#>  1:        attention_n_heads ParamInt     1   Inf     Inf <NoDefault[0]>
#>  2:        attention_dropout ParamDbl     0     1     Inf <NoDefault[0]>
#>  3: attention_initialization ParamFct    NA    NA       2 <NoDefault[0]>
#>  4:  attention_normalization ParamUty    NA    NA     Inf <NoDefault[0]>
#>  5:             ffn_d_hidden ParamInt     1   Inf     Inf <NoDefault[0]>
#>  6:  ffn_d_hidden_multiplier ParamDbl     0   Inf     Inf <NoDefault[0]>
#>  7:              ffn_dropout ParamDbl     0     1     Inf <NoDefault[0]>
#>  8:           ffn_activation ParamUty    NA    NA     Inf <NoDefault[0]>
#>  9:        ffn_normalization ParamUty    NA    NA     Inf <NoDefault[0]>
#> 10:         residual_dropout ParamDbl     0     1     Inf <NoDefault[0]>
#> 11:         prenormalization ParamLgl    NA    NA       2 <NoDefault[0]>
#> 12:           is_first_layer ParamLgl    NA    NA       2 <NoDefault[0]>
#> 13:                query_idx ParamUty    NA    NA     Inf <NoDefault[0]>
#> 14:           attention_bias ParamLgl    NA    NA       2 <NoDefault[0]>
#> 15:           ffn_bias_first ParamLgl    NA    NA       2 <NoDefault[0]>
#> 16:          ffn_bias_second ParamLgl    NA    NA       2 <NoDefault[0]>
#>                  value
#>                 <list>
#>  1:                  8
#>  2:                0.2
#>  3:            kaiming
#>  4: <nn_layer_norm[1]>
#>  5:             [NULL]
#>  6:             [NULL]
#>  7:                0.1
#>  8:      <nn_reglu[1]>
#>  9: <nn_layer_norm[1]>
#> 10:                  0
#> 11:               TRUE
#> 12:              FALSE
#> 13:             [NULL]
#> 14:               TRUE
#> 15:               TRUE
#> 16:               TRUE
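# When assembling an FT-Transformer-style network by hand, the block sits
# between the tokenizers and the head. A minimal sketch, assuming the d_token
# parameter of the numeric tokenizer and this particular composition (it is
# not necessarily the exact architecture built by LearnerTorchFTTransformer):
library(mlr3pipelines)
graph = po("torch_ingress_num") %>>%
  po("nn_tokenizer_num", d_token = 32) %>>%  # embed each numeric feature (d_token is an assumed parameter name)
  po("nn_ft_cls") %>>%                       # prepend a CLS token
  po("nn_ft_transformer_block", is_first_layer = TRUE) %>>%
  po("nn_head")                              # final linear layer for the target
# The graph can then be completed with po("torch_loss") and
# po("torch_model_classif") to obtain a trainable learner.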