Feature-Tokenizer Transformer for tabular data that can work either on lazy_tensor inputs
or on standard tabular features.
This implementation differs from the paper's in two ways: there is no attention compression, and there is no option to use pre-normalization in the first layer.
If training is unstable, consider combining feature standardization (e.g. via po("scale")), an adaptive optimizer (e.g. Adam), a reduced learning rate,
and a learning rate scheduler (see CallbackSetLRScheduler for options).
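For example, the following sketch combines these suggestions. It is illustrative only: the parameter values are not tuned recommendations, and it assumes mlr3pipelines is available for po("scale"). Optimizer parameters are exposed with the "opt." prefix.
library(mlr3)
library(mlr3pipelines)
library(mlr3torch)
# Adam is the default optimizer; lower its learning rate via the "opt." prefix
learner = lrn("classif.ft_transformer",
  epochs = 10, batch_size = 32, device = "cpu",
  n_blocks = 2, d_token = 32, ffn_d_hidden_multiplier = 4/3,
  opt.lr = 1e-4  # reduced learning rate (illustrative value)
)
# standardize numeric features before they reach the network
graph_learner = as_learner(po("scale") %>>% learner)
graph_learner$train(tsk("iris"))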
Parameters
Parameters from LearnerTorch and PipeOpTorchFTTransformerBlock, as well as:
n_blocks :: integer(1)
The number of transformer blocks.
d_token :: integer(1)
The dimension of the embedding.
cardinalities :: integer()
The number of categories for each categorical feature. This only needs to be specified when working with lazy_tensor inputs.
init_token :: character(1)
The initialization method for the embedding weights. Either "uniform" (the default) or "normal".
ingress_tokens :: named list() or NULL
A list of TorchIngressTokens. Only required when using lazy tensor features. The names are either "num.input" or "categ.input", and the values are lazy tensor ingress tokens constructed by, e.g., ingress_ltnsr(<num_feat_name>); see the sketch below.
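A minimal sketch of configuring ingress tokens, assuming a hypothetical task whose numeric features are stored in a single lazy_tensor column named "x_num" (the column name is an assumption for illustration):
library(mlr3torch)
# "num.input" is the entry point for numeric features;
# ingress_ltnsr() constructs the lazy tensor ingress token
learner = lrn("classif.ft_transformer",
  ingress_tokens = list(num.input = ingress_ltnsr("x_num")),
  epochs = 1, batch_size = 16,
  n_blocks = 2, d_token = 32, ffn_d_hidden_multiplier = 4/3
)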
References
Gorishniy Y, Rubachev I, Khrulkov V, Babenko A (2021). “Revisiting Deep Learning for Tabular Data.” arXiv:2106.11959.
Super classes
mlr3::Learner -> mlr3torch::LearnerTorch -> LearnerTorchFTTransformer
Methods
Inherited methods
mlr3::Learner$base_learner()
mlr3::Learner$configure()
mlr3::Learner$encapsulate()
mlr3::Learner$help()
mlr3::Learner$predict()
mlr3::Learner$predict_newdata()
mlr3::Learner$reset()
mlr3::Learner$selected_features()
mlr3::Learner$train()
mlr3torch::LearnerTorch$dataset()
mlr3torch::LearnerTorch$format()
mlr3torch::LearnerTorch$marshal()
mlr3torch::LearnerTorch$print()
mlr3torch::LearnerTorch$unmarshal()
Method new()
Creates a new instance of this R6 class.
Usage
LearnerTorchFTTransformer$new(
task_type,
optimizer = NULL,
loss = NULL,
callbacks = list()
)
Arguments
task_type (character(1))
The task type, either "classif" or "regr".
optimizer (TorchOptimizer)
The optimizer to use for training. By default, adam is used.
loss (TorchLoss)
The loss used to train the network. By default, mse is used for regression and cross_entropy for classification.
callbacks (list() of TorchCallbacks)
The callbacks. Must have unique ids.
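A short sketch of constructing the learner directly via $new() with non-default components; t_opt(), t_loss(), and t_clbk() are the mlr3torch constructors for optimizers, losses, and callbacks:
library(mlr3torch)
learner = LearnerTorchFTTransformer$new(
  task_type = "classif",
  optimizer = t_opt("sgd"),            # e.g. plain SGD instead of the default adam
  loss      = t_loss("cross_entropy"), # explicit classification loss
  callbacks = list(t_clbk("history"))  # record the training history
)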
Examples
# Define the Learner and set parameter values
learner = lrn("classif.ft_transformer")
learner$param_set$set_values(
epochs = 1, batch_size = 16, device = "cpu",
n_blocks = 2, d_token = 32, ffn_d_hidden_multiplier = 4/3
)
# Define a Task
task = tsk("iris")
# Create train and test set
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
#> classif.ce
#> 0.26