Quickstart
In this vignette we show how to get started with mlr3torch by training
a simple neural network on a tabular regression problem. We assume that
you are familiar with the mlr3 framework; see e.g. the mlr3 book. As a
first example, we will train a simple multi-layer perceptron (MLP) on
the well-known “mtcars” task, where the goal is to predict the miles per
gallon (‘mpg’) of cars. This architecture comes as a predefined learner
with mlr3torch, but you can also easily create new network
architectures; see the Neural Networks as Graphs vignette for a detailed
introduction. We first set a seed for reproducibility, load the library
and construct the task.
set.seed(314)
library(mlr3torch)
task = tsk("mtcars")
task$head()
#> mpg am carb cyl disp drat gear hp qsec vs wt
#> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
#> 1: 21.0 1 4 6 160 3.90 4 110 16.46 0 2.620
#> 2: 21.0 1 4 6 160 3.90 4 110 17.02 0 2.875
#> 3: 22.8 1 1 4 108 3.85 4 93 18.61 1 2.320
#> 4: 21.4 0 1 6 258 3.08 3 110 19.44 1 3.215
#> 5: 18.7 0 2 8 360 3.15 3 175 17.02 0 3.440
#> 6: 18.1 0 1 6 225 2.76 3 105 20.22 1 3.460
Learners in mlr3torch work very similarly to other mlr3 learners. Below,
we construct a simple multi-layer perceptron for regression. We do this
as usual by calling lrn() and configuring the parameters: we use two
hidden layers with 50 neurons each. For training, we set the batch size
to 32, the number of training epochs to 30 and the device to "cpu".
For a complete description of the available parameters see
?mlr3torch::LearnerTorchMLP.
mlp = lrn("regr.mlp",
  # architecture parameters
  neurons = c(50, 50),
  # training arguments
  batch_size = 32, epochs = 30, device = "cpu"
)
#> Warning in warn_deprecated("Learner$initialize argument 'data_formats'"):
#> Learner$initialize argument 'data_formats' is deprecated and will be removed in
#> the future.
mlp
#> <LearnerTorchMLP[regr]:regr.mlp>: My Little Powny
#> * Model: -
#> * Parameters: epochs=30, device=cpu, num_threads=1, seed=random,
#> eval_freq=1, measures_train=<list>, measures_valid=<list>,
#> patience=0, min_delta=0, batch_size=32, neurons=50,50, p=0.5,
#> activation=<nn_relu>, activation_args=<list>
#> * Validate: NULL
#> * Packages: mlr3, mlr3torch, torch
#> * Predict Types: [response]
#> * Feature Types: integer, numeric, lazy_tensor
#> * Properties: internal_tuning, marshal, validation
#> * Optimizer: adam
#> * Loss: mse
#> * Callbacks: -
We can use this learner for training and prediction just like any other regression learner. Below, we split the observations into a training and test set, train the learner on the training set and create predictions for the test set. Finally, we compute the mean squared error of the predictions.
# Split the observations into training and test set
splits = partition(task)
# Train the learner on the train set
mlp$train(task, row_ids = splits$train)
# Predict the test set
prediction = mlp$predict(task, row_ids = splits$test)
# Compute the mse
prediction$score(msr("regr.mse"))
#> regr.mse
#> 283.838
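To make the reported score concrete, the following base R sketch computes the mean squared error by hand. The vectors truth and response are made-up illustration values, not data from the mtcars task; for a trained learner, the corresponding values would come from the prediction object's $truth and $response fields.

```r
# Hypothetical true values and predictions (illustration only)
truth = c(21.0, 22.8, 18.7)
response = c(20.0, 25.0, 18.0)

# Mean squared error: average of the squared residuals
mse = mean((truth - response)^2)
mse
```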
Configuring a Learner
Although torch learners are quite like other mlr3 learners, there are
some differences. One is that all LearnerTorch classes have construction
arguments, i.e. torch learners are more modular than other learners.
While learners are free to implement their own construction arguments,
there are some that are common to all torch learners, namely the loss,
optimizer and callbacks. Each of these objects can have its own
parameters, which are included in the LearnerTorch’s parameter set.
In the previous example, we did not specify any of these explicitly
and used the default values, which are the Adam optimizer, MSE as the
loss and no callbacks. We will now show how to configure these three
aspects of a learner through the mlr3torch::TorchOptimizer,
mlr3torch::TorchLoss, and mlr3torch::TorchCallback classes.
Loss
The loss function, also known as the objective function or cost
function, measures the discrepancy between the predicted output and the
true output. It quantifies how well the model is performing during
training. The R package torch, which underpins the mlr3torch framework,
already provides a number of predefined loss functions such as the Mean
Squared Error (nn_mse_loss), the Mean Absolute Error (nn_l1_loss), or
the cross entropy loss (nn_cross_entropy_loss). In mlr3torch, we
represent loss functions using the mlr3torch::TorchLoss class. It
provides a thin wrapper around the torch loss functions and annotates
them with meta information, most importantly a paradox::ParamSet that
allows configuring the loss function. Such an object can be constructed
using t_loss(<key>). Below, we construct the L1 loss function, which is
also known as Mean Absolute Error (MAE). The printed output below
informs us about the wrapped loss function (nn_l1_loss), the configured
parameters, the packages it depends on and for which task types it can
be used.
l1 = t_loss("l1")
l1
#> <TorchLoss:l1> Absolute Error
#> * Generator: nn_l1_loss
#> * Parameters: list()
#> * Packages: torch,mlr3torch
#> * Task Types: regr
Its ParamSet contains only one parameter, namely reduction, which
specifies how the loss is reduced over the batch.
# the paradox::ParamSet of the loss
l1$param_set
#> <ParamSet(1)>
#> id class lower upper nlevels default value
#> <char> <char> <num> <num> <num> <list> <list>
#> 1: reduction ParamFct NA NA 2 mean [NULL]
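As a rough illustration of what the reduction parameter controls, the base R sketch below computes the absolute errors for a small made-up batch and reduces them once by the mean and once by the sum. The numbers are hypothetical, and the assumption that the two levels of reduction are "mean" and "sum" is ours; the printed parameter set only states that there are two levels.

```r
# Hypothetical batch of predictions and targets (illustration only)
input = c(2.5, 0.0, 2.0, 8.0)
target = c(3.0, -0.5, 2.0, 7.0)

# Element-wise absolute errors, i.e. the unreduced L1 loss
abs_err = abs(input - target)

# "mean" reduction averages over the batch, "sum" adds the errors up
loss_mean = mean(abs_err)
loss_sum = sum(abs_err)
```

With the mean reduction, the loss does not grow with the batch size, which is presumably why it is the default.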
The wrapped loss module generator is accessible through the slot
$generator.
l1$generator
#> <nn_l1_loss> object generator
#> Inherits from: <inherit>
#> Public:
#> .classes: nn_l1_loss nn_loss nn_module
#> initialize: function (reduction = "mean")
#> forward: function (input, target)
#> clone: function (deep = FALSE, ..., replace_values = TRUE)
#> Private:
#> .__clone_r6__: function (deep = FALSE)
#> Parent env: <environment: 0x55a8f10bdd40>
#> Locked objects: FALSE
#> Locked class: FALSE
#> Portable: TRUE
We can pass the TorchLoss as the argument loss during initialization of
the learner. The parameters of the loss are added to the learner’s
ParamSet, prefixed with "loss.".
mlp_l1 = lrn("regr.mlp", loss = l1)
mlp_l1$param_set$values$loss.reduction
#> NULL
All predefined loss functions are stored in the mlr3torch_losses
dictionary, from which they can be retrieved using t_loss(<key>).
mlr3torch_losses
#> <DictionaryMlr3torchLosses> with 3 stored values
#> Keys: cross_entropy, l1, mse
Optimizer
The optimizer determines how the model’s weights are updated based on
the calculated loss. It adjusts the parameters of the model to minimize
the loss function, optimizing the model’s performance. Optimizers work
analogously to loss functions, i.e. mlr3torch provides a thin wrapper,
the TorchOptimizer class, around optimizers such as Adam (optim_adam)
or SGD (optim_sgd). TorchOptimizer objects can be constructed using
t_opt(<key>). For optimizers, the associated ParamSet is more
interesting, as we see below:
sgd = t_opt("sgd")
sgd
#> <TorchOptimizer:sgd> Stochastic Gradient Descent
#> * Generator: optim_sgd
#> * Parameters: list()
#> * Packages: torch,mlr3torch
sgd$param_set
#> <ParamSet(5)>
#> id class lower upper nlevels default value
#> <char> <char> <num> <num> <num> <list> <list>
#> 1: lr ParamDbl 0 Inf Inf <NoDefault[0]> [NULL]
#> 2: momentum ParamDbl 0 1 Inf 0 [NULL]
#> 3: dampening ParamDbl 0 1 Inf 0 [NULL]
#> 4: weight_decay ParamDbl 0 1 Inf 0 [NULL]
#> 5: nesterov ParamLgl NA NA 2 FALSE [NULL]
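To give an intuition for the lr and momentum parameters listed above, here is a simplified base R sketch of gradient descent with momentum on a one-dimensional quadratic. It is not the implementation used by optim_sgd; the helper sgd_minimize() and the objective are hypothetical, and dampening, weight decay and Nesterov momentum are left out.

```r
# Hypothetical helper: minimize f(w) = (w - 3)^2 with SGD-style updates
sgd_minimize = function(w, lr, momentum = 0, steps = 50) {
  velocity = 0
  for (i in seq_len(steps)) {
    grad = 2 * (w - 3)                     # gradient of (w - 3)^2
    velocity = momentum * velocity + grad  # running update direction
    w = w - lr * velocity                  # step against the gradient
  }
  w
}

# A moderate learning rate converges towards the minimum at w = 3
w_final = sgd_minimize(w = 0, lr = 0.1)

# A learning rate that is too large overshoots further on every step
w_diverged = sgd_minimize(w = 0, lr = 1.5, steps = 10)
```

On this toy objective, the choice of lr alone decides whether the iterates approach the minimum or move away from it.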
The wrapped torch optimizer can be accessed through the slot
$generator.
Parameters of TorchOptimizer (but also TorchLoss and TorchCallback) can
be set in the usual mlr3 way, i.e. either during construction, or
afterwards using the $set_values() method of the parameter set.
sgd$param_set$set_values(
  lr = 0.5, # increase learning rate
  nesterov = FALSE # no nesterov momentum
)
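The same values can also be supplied at construction. The sketch below assumes that t_opt() forwards additional named arguments to the optimizer's parameter set, as the other mlr3 sugar functions do; sgd2 is just an illustrative name.

```r
library(mlr3torch)

# Set the parameters directly when retrieving the optimizer
sgd2 = t_opt("sgd", lr = 0.5, nesterov = FALSE)

# The values end up in the parameter set, as with $set_values()
sgd2$param_set$values
```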
Below we see that the optimizer’s parameters are added to the learner’s
ParamSet (prefixed with "opt.") and that the values are set to the
values we specified.
mlp_sgd = lrn("regr.mlp", optimizer = sgd)
as.data.table(mlp_sgd$param_set)[
  startsWith(id, "opt.")][[1L]]
#> [1] "opt.lr" "opt.momentum" "opt.dampening" "opt.weight_decay"
#> [5] "opt.nesterov"
mlp_sgd$param_set$values[c("opt.lr", "opt.nesterov")]
#> $opt.lr
#> [1] 0.5
#>
#> $opt.nesterov
#> [1] FALSE
Because the optimizer’s parameters are exposed in the learner’s
ParamSet, they can be conveniently tuned using mlr3tuning.
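As a minimal sketch of what this could look like, the learning rate can be marked for tuning with a paradox::to_tune() token. The search range is an arbitrary choice for illustration, and the snippet only declares the search space; running an actual tuning experiment with mlr3tuning is not shown.

```r
library(mlr3torch)

# Learner with an SGD optimizer, whose parameters are prefixed with "opt."
learner = lrn("regr.mlp", optimizer = t_opt("sgd"),
  batch_size = 32, epochs = 30)

# Mark the learning rate as tunable on a log scale (illustrative range)
learner$param_set$set_values(
  opt.lr = paradox::to_tune(1e-4, 1e-1, logscale = TRUE)
)

# The resulting search space contains only the tagged parameter
learner$param_set$search_space()
```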
All available optimizers are stored in the mlr3torch_optimizers
dictionary.
mlr3torch_optimizers
#> <DictionaryMlr3torchOptimizers> with 7 stored values
#> Keys: adadelta, adagrad, adam, asgd, rmsprop, rprop, sgd
Callbacks
The third important configuration option is callbacks, which allow
customizing the training process. They can be used for saving model
checkpoints, logging metrics, or implementing custom functionality for
specific training scenarios. For a tutorial on how to implement a custom
callback, see the Custom Callbacks vignette. Here, we will only show how
to use predefined callbacks. Below, we retrieve the "history" callback
using t_clbk(), which has no parameters and merely saves the training
and validation history in the learner so it can be accessed afterwards.
history = t_clbk("history")
history
#> <TorchCallback:history> History
#> * Generator: CallbackSetHistory
#> * Parameters: list()
#> * Packages: mlr3torch,torch
If we want to learn what the callback does, we can access the help page
of the wrapped object using the $help() method.
Note that this is also possible for the loss and optimizer.
history$help()
All predefined callbacks are stored in the mlr3torch_callbacks
dictionary.
mlr3torch_callbacks
#> <DictionaryMlr3torchCallbacks> with 3 stored values
#> Keys: checkpoint, history, progress
Putting it Together
We now define our customized MLP learner using the loss, optimizer and
callback we have just covered. To keep track of the performance, we use
30% of the training data for validation and evaluate it using the MAE
Measure. Note that the measures_valid and measures_train parameters of
LearnerTorch take common mlr3::Measures, whereas the loss function must
be a TorchLoss.
mlp_custom = lrn("regr.mlp",
  # construction arguments
  optimizer = sgd, loss = l1, callbacks = history,
  # scores to keep track of
  measures_valid = msr("regr.mae"),
  # other parameters are left as-is:
  # architecture
  neurons = c(50, 50),
  # training arguments
  batch_size = 32, epochs = 30, device = "cpu",
  # validation proportion
  validate = 0.3
)
mlp_custom
#> <LearnerTorchMLP[regr]:regr.mlp>: My Little Powny
#> * Model: -
#> * Parameters: epochs=30, device=cpu, num_threads=1, seed=random,
#> eval_freq=1, measures_train=<list>,
#> measures_valid=<MeasureRegrSimple>, patience=0, min_delta=0,
#> batch_size=32, neurons=50,50, p=0.5, activation=<nn_relu>,
#> activation_args=<list>, opt.lr=0.5, opt.nesterov=FALSE
#> * Validate: 0.3
#> * Packages: mlr3, mlr3torch, torch
#> * Predict Types: [response]
#> * Feature Types: integer, numeric, lazy_tensor
#> * Properties: internal_tuning, marshal, validation
#> * Optimizer: sgd
#> * Loss: l1
#> * Callbacks: history
We now train the learner on the “mtcars” task again and use the same train-test split as before.
mlp_custom$train(task, row_ids = splits$train)
prediction_custom = mlp_custom$predict(task, row_ids = splits$test)
Below we score the predictions on the unseen test data and compare the
results. In this run, the configured mlp_custom learner, which directly
optimizes the L1 (aka MAE) loss and uses a different optimizer
configuration, has a lower MAE than the default mlp learner.
prediction_custom$score(msr("regr.mae"))
#> regr.mae
#> 7.122375
prediction$score(msr("regr.mae"))
#> regr.mae
#> 15.27983
Because we configured the learner to use the history callback, we can
find the validation history in its $model slot:
head(mlp_custom$model$callbacks$history$valid)
#> epoch regr.mae
#> <num> <num>
#> 1: 1 1.777395e+04
#> 2: 2 3.955504e+08
#> 3: 3 1.143863e+04
#> 4: 4 1.792927e+01
#> 5: 5 1.759594e+01
#> 6: 6 1.726260e+01
The plot below shows the validation MAE for epochs 6 to 30.
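One way to produce such a plot is sketched below with base R graphics. The helper plot_valid_history() is hypothetical and not part of mlr3torch; it assumes a table with the columns epoch and regr.mae, like the validation history printed above.

```r
# Hypothetical helper: plot the validation MAE from a given epoch onwards
plot_valid_history = function(history, from = 6) {
  # Drop the early epochs, whose very large errors would dominate the y-axis
  shown = history[history$epoch >= from, ]
  plot(shown$epoch, shown$regr.mae, type = "l",
    xlab = "Epoch", ylab = "Validation MAE")
  invisible(shown)
}

# Usage with the learner trained above:
# plot_valid_history(mlp_custom$model$callbacks$history$valid)
```

Starting at epoch 6 keeps the y-axis readable, since the first epochs in the printed history have errors that are several orders of magnitude larger.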
Other important information that is stored in the Learner’s model is the
$network, which is the underlying nn_module. For a full description of
the model, see ?LearnerTorch.