3. Training a reflectorch model#

First, we import the necessary methods from the reflectorch package, as well as other basic Python packages:

import matplotlib.pyplot as plt
import numpy as np
import torch

from reflectorch import SAVED_MODELS_DIR, SaveBestModel, StepLR, get_trainer_by_name, get_callbacks_by_name
from reflectorch.extensions.jupyter import JPlotLoss

3.1. The training loop#

3.1.1. Loading the trainer#

For training a model we use the Trainer class, which contains all the components necessary for the training process, such as the data generator, the neural network and the optimizer.

We can initialize the trainer according to the specifications defined in a YAML configuration file using the get_trainer_by_name method, which takes the name of the configuration file as input. If the package was installed in editable mode, the configuration files are read from the configs directory located inside the repository; otherwise, the path to the directory containing the configuration file should also be specified using the config_dir argument. The load_weights argument should be set to False since we want the neural network weights to be randomly initialized for a fresh training.

config_name = 'c1'
trainer = get_trainer_by_name(config_name, load_weights=False)
Model c1 loaded. Number of parameters: 3.83 M

The trainer contains several important attributes we can inspect:

  1. The PyTorch optimizer. We can observe that the optimizer specified in the configuration is AdamW:

trainer.optim
AdamW (
Parameter Group 0
    amsgrad: False
    betas: [0.9, 0.999]
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.0001
    maximize: False
    weight_decay: 0.0005
)

Note

The learning rate can be easily changed using trainer.set_lr(new_lr)

  2. The batch size:

trainer.batch_size
4096
  3. The PyTorch neural network module. We can see that the network is an instance of the class NetworkWithPriorsConvEmb. This architecture consists of a multilayer perceptron (MLP) with residual connections, batch normalization layers and GELU activations (trainer.model.mlp). A 1D CNN embedding network (trainer.model.embedding_net) produces a latent embedding of the input batch of reflectivity curves, which is concatenated with the prior bounds for the thin film parameters before being passed to the MLP (the resulting MLP input dimension is checked in the short note after the printout below).

trainer.model
NetworkWithPriorsConvEmb(
  (embedding_net): ConvEncoder(
    (core): Sequential(
      (0): Sequential(
        (0): Conv1d(1, 32, kernel_size=(3,), stride=(2,), padding=(1,))
        (1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): GELU(approximate='none')
      )
      (1): Sequential(
        (0): Conv1d(32, 64, kernel_size=(3,), stride=(2,), padding=(1,))
        (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): GELU(approximate='none')
      )
      (2): Sequential(
        (0): Conv1d(64, 128, kernel_size=(3,), stride=(2,), padding=(1,))
        (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): GELU(approximate='none')
      )
      (3): Sequential(
        (0): Conv1d(128, 256, kernel_size=(3,), stride=(2,), padding=(1,))
        (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): GELU(approximate='none')
      )
      (4): Sequential(
        (0): Conv1d(256, 512, kernel_size=(3,), stride=(2,), padding=(1,))
        (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): GELU(approximate='none')
      )
    )
    (avpool): AdaptiveAvgPool1d(output_size=1)
    (fc): Linear(in_features=512, out_features=128, bias=True)
  )
  (mlp): ResidualMLP(
    (first_layer): Linear(in_features=144, out_features=512, bias=True)
    (blocks): ModuleList(
      (0-5): 6 x ResidualBlock(
        (activation): GELU(approximate='none')
        (batch_norm_layers): ModuleList(
          (0-1): 2 x BatchNorm1d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (linear_layers): ModuleList(
          (0-1): 2 x Linear(in_features=512, out_features=512, bias=True)
        )
      )
    )
    (last_layer): Linear(in_features=512, out_features=8, bias=True)
  )
)
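
The first_layer of the MLP has in_features=144; this is consistent with the curve embedding being concatenated with a lower and an upper prior bound for each predicted parameter. A quick check, using the values from the configuration discussed later on this page:

# MLP input = curve embedding concatenated with the min/max prior bound of each parameter
dim_embedding = 128   # size of the embedding produced by the 1D CNN
dim_out = 8           # number of predicted thin film parameters
print(dim_embedding + 2 * dim_out)  # 144, matching first_layer in_features above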

3.1.2. Defining callbacks#

We can control the training process using callback objects, such as:

  1. JPlotLoss - enables interactive visualization of the loss curve when training inside a Jupyter Notebook; the frequency argument sets the refresh rate of the interactive widget

  2. StepLR - a learning rate scheduler that decreases the learning rate in steps (after a number of iterations defined by step_size, the learning rate is multiplied by the factor gamma). Other types of learning rate schedulers can be used instead, such as CyclicLR, LogCyclicLR, OneCycleLR or ReduceLROnPlateau.

  3. SaveBestModel - periodically saves the weights of the neural network during training. Every freq iterations, the weights are saved to the specified path if the current average loss (computed over the last average iterations) is lower than the loss at the previous save. The history of the losses and learning rate values is also saved.

When the package is installed in editable mode, the default save path is relative to the repository directory (defined by the global variable SAVED_MODELS_DIR).

save_model_name = 'model_' + config_name + '.pt'
save_path = str(SAVED_MODELS_DIR / save_model_name)

We group the callback objects together in a tuple:

callbacks = (
    JPlotLoss(frequency=10), 
    StepLR(step_size=5000, gamma=0.1, last_epoch=-1), 
    SaveBestModel(path=save_path, freq=100, average=10),
)

Note

The callbacks can also be initialized directly from the configuration file: callbacks = get_callbacks_by_name(config_name)

3.1.3. Run the training#

The training process is initiated by calling the train method of the trainer. This method accepts as arguments the previously defined tuple of callbacks, as well as the number of iterations (batches). Notably, a new batch of data is generated at each iteration, so the training takes place in a “one-epoch regime” in which previously generated data is never revisited.

trainer.train(num_batches=1000, callbacks=callbacks)

After training, the history of the losses and learning rates can be accessed via trainer.losses and trainer.lrs. We can also find them, together with the model state_dict, in the saved dictionary:

torch.load(save_path).keys()
dict_keys(['model', 'lrs', 'losses', 'prev_save', 'batch_num', 'best_loss'])
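
For a quick look at the training history, the recorded values can be plotted directly (a minimal sketch, assuming trainer.losses and trainer.lrs are sequences with one entry per iteration):

fig, (ax_loss, ax_lr) = plt.subplots(1, 2, figsize=(10, 3))
ax_loss.plot(np.asarray(trainer.losses))   # loss history
ax_loss.set_yscale('log')
ax_loss.set_xlabel('iteration')
ax_loss.set_ylabel('loss')
ax_lr.plot(np.asarray(trainer.lrs))        # learning rate history
ax_lr.set_xlabel('iteration')
ax_lr.set_ylabel('learning rate')
plt.show()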

The model state dictionaries of all the saved ‘.pt’ files in a directory can be further converted to the ‘.safetensors’ format for exporting to Hugging Face using the convert_pt_to_safetensors method.

3.1.4. Training from the terminal#

Above we described the workflow for training a model in a Jupyter Notebook, where we loaded the trainer from the configuration file but defined the callbacks manually. Alternatively, one can train a model from the terminal (in this case the callbacks defined in the configuration file are used):

python -m reflectorch.train config_name

3.2. Customizing the YAML configuration for training#

In the following we show how the YAML configuration file can be customized.

Sample YAML configuration
general:
  name: c1
  root_dir: null
  
dset:
  cls: ReflectivityDataLoader
  prior_sampler:
    cls: SubpriorParametricSampler
    kwargs:
      param_ranges:
        thicknesses: [1., 500.]
        roughnesses: [0., 60.]
        slds: [0., 25.]
      bound_width_ranges:
        thicknesses: [1.0e-2, 500.]
        roughnesses: [1.0e-2, 60.]
        slds: [ 1.0e-2, 4.]
      model_name: standard_model
      max_num_layers: 2
      constrained_roughness: true
      max_thickness_share: 0.5
      logdist: false
      scale_params_by_ranges: false
      scaled_range: [-1., 1.]
      device: 'cuda'
      
  q_generator:
    cls: ConstantQ
    kwargs:
      q: [0.02, 0.15, 128]
      device: 'cuda'
      
  intensity_noise:
    cls: BasicExpIntensityNoise
    kwargs:
      relative_errors: [0.0, 0.2]
      abs_errors: 0.0
      consistent_rel_err: true
      logdist: false
      apply_shift: true
      shift_range: [-0.3, 0.3]
      apply_scaling: true
      scale_range: [-0.02, 0.02]
      apply_background: false
      background_range: [1.0e-10, 1.0e-8]

  q_noise:
    cls: BasicQNoiseGenerator
    kwargs:
      shift_std: 1.0e-3
      noise_std: [0., 1.0e-3]
      
  curves_scaler:
    cls: LogAffineCurvesScaler
    kwargs:
      weight: 0.2
      bias: 1.0
      eps: 1.0e-10

model:
  network:
    cls: NetworkWithPriorsConvEmb
    pretrained_name: null
    device: 'cuda'
    kwargs:
      in_channels: 1
      hidden_channels: [32, 64, 128, 256, 512]
      dim_embedding: 128
      dim_avpool: 1
      embedding_net_activation: 'gelu'
      use_batch_norm: true
      dim_out: 8
      layer_width: 512
      num_blocks: 6
      repeats_per_block: 2
      mlp_activation: 'gelu'
      dropout_rate: 0.0 
      pretrained_embedding_net: null
       
training:
  num_iterations: 10000
  batch_size: 4096
  lr: 1.0e-4
  grad_accumulation_steps: 1
  clip_grad_norm_max: null
  train_with_q_input: False
  update_tqdm_freq: 1
  optimizer: AdamW
  trainer_kwargs:
    optim_kwargs:
      betas: [0.9, 0.999]
      weight_decay: 0.0005
  callbacks:
    save_best_model:
      enable: true
      freq: 500
    lr_scheduler:
      cls: StepLR
      kwargs:
        step_size: 500
        gamma: 0.5
  logger:
    use_neptune: false

The general key contains the following subkeys:

  • name - name used for saving the model

  • root_dir - path to the root directory; defaults to the package directory

general:
  name: c1
  root_dir: null

The dset key defines the settings pertaining to the data generation (i.e. the SLD profile parameterization, the ranges of the thin film parameters, the q values, the noise added to the reflectivity curves and the scaling of the reflectivity curves). It has the following subkeys:

  1. cls (optional) - the class of the data loader. If not provided, the default class ReflectivityDataLoader is used.

  2. prior_sampler - responsible for defining the type of SLD parameterization, the ranges from which the thin film parameters are sampled and the ranges from which the widths of the prior bounds are sampled. The SubpriorParametricSampler class first samples a center (C) from the parameter ranges and a width (W) from the bound width ranges. This defines a subinterval delimited by the minimum prior bound B_min = C - W/2 and the maximum prior bound B_max = C + W/2. Then, the values of the parameters (to be used for simulating the reflectivity curves and as ground truth) are uniformly sampled within the interval [B_min, B_max]; this logic is illustrated in the short sketch after this list. It has the following keyword arguments:

  • model_name - name associated with the type of SLD parameterization. Here, standard_model represents the standard box model parameterization of the SLD with the parameters thickness, roughness and real layer SLD.

  • max_num_layers - the number of layers in the thin film (in addition to the substrate)

  • param_ranges - the ranges from which the values of each type of thin film parameter are uniformly sampled (for the standard model thicknesses, roughnesses and slds)

  • bound_width_ranges - the ranges from which the prior bound widths of each type of thin film parameter are uniformly sampled. If the argument logdist is set to true, the prior bound widths are sampled uniformly on a logarithmic scale instead, biasing the training towards smaller prior bound widths.

  • constrained_roughness - if true the sampling of the roughness parameters is constrained such that the roughness of an interface between two layers does not exceed a fraction (defined by the argument max_thickness_share) of the thickness of either one of those layers.

  • scale_params_by_ranges - if true the parameters are scaled with respect to their ranges, otherwise they are scaled with respect to their subprior bound interval. The default is false.

  • scaled_range - the ML-friendly range to which the parameters (and prior bounds) are scaled; the default is [-1, 1]

  • device - default is 'cuda' for GPU use, can be changed to 'cpu' for CPU use
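
The sampling logic described above can be reproduced in a few lines of NumPy (an illustrative sketch only, not the package's internal implementation; possible clipping of the bounds to the full range is omitted). Here for a single thickness parameter with the ranges from the configuration below:

import numpy as np

rng = np.random.default_rng()

param_range = (1., 500.)             # full range of the thickness parameter
bound_width_range = (1.0e-2, 500.)   # range of the allowed prior-bound widths

center = rng.uniform(*param_range)                     # C: sampled from the parameter range
width = rng.uniform(*bound_width_range)                # W: sampled from the bound width range
b_min, b_max = center - width / 2, center + width / 2  # prior bounds B_min and B_max
thickness = rng.uniform(b_min, b_max)                  # ground-truth value used for the simulation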

Fig. 3.1 (a) Parameter sampling process (b) Neural network architecture#

dset:
  cls: ReflectivityDataLoader
  prior_sampler:
    cls: SubpriorParametricSampler
    kwargs:
      param_ranges:
        thicknesses: [1., 500.]
        roughnesses: [0., 60.]
        slds: [0., 25.]
      bound_width_ranges:
        thicknesses: [1.0e-2, 500.]
        roughnesses: [1.0e-2, 60.]
        slds: [ 1.0e-2, 4.]
      model_name: standard_model
      max_num_layers: 2
      constrained_roughness: true
      max_thickness_share: 0.5
      logdist: false
      scale_params_by_ranges: false
      scaled_range: [-1., 1.]
      device: 'cuda'
  3. q_generator - responsible for generating the momentum transfer (q) values at which the reflectivity is simulated. We must first specify its class. The ConstantQ class generates a fixed discretization shared by all the reflectivity curves in the batch. Its q keyword argument is a tuple formatted as [q_min, q_max, num_q_points], defining the minimum q value, the maximum q value and the number of points (including the interval boundaries) sampled equidistantly; a short illustration follows the YAML snippet below. Other q generator classes are available, such as VariableQ (equidistant grid with variable q_min, q_max and num_q_points, further described in the Advanced functionality section) and ConstantAngle, which generates the grid of q values based on equidistantly sampled scattering angles and the wavelength of the beam. The device argument can be changed to 'cpu' for CPU use (default is 'cuda' for GPU use).

dset:
  q_generator:
    cls: ConstantQ
    kwargs:
      q: [0.02, 0.15, 128]
      device: 'cuda'
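
The q argument [0.02, 0.15, 128] therefore corresponds to a fixed equidistant grid, which can be reproduced for illustration with:

import torch

q_min, q_max, num_q_points = 0.02, 0.15, 128
q_values = torch.linspace(q_min, q_max, num_q_points)  # same q grid for every curve in the batch
print(q_values.shape)  # torch.Size([128])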
      
  4. q_noise (optional) - responsible for adding noise to the generated q values, which emulates possible measurement errors due to sample misalignment. The BasicQNoiseGenerator class can add both systematic q shifts (the same change applied to all q points of a curve) and random noise (a different change applied to each q point of a curve) to the q values of the batch of curves; a rough illustration follows the YAML snippet below. It has the following arguments:

  • shift_std - the standard deviation of the normal distribution for sampling the systematic q shifts (one value sampled per curve in the batch)

  • noise_std - the standard deviation of the normal distribution for sampling the random q noise (one value sampled per point in the curve). The standard deviation is the same for all curves in the batch if provided as a float, or uniformly sampled from a range for each curve in the batch if provided as a tuple.

dset:
  q_noise:
    cls: BasicQNoiseGenerator
    kwargs:
      shift_std: 1.0e-3
      noise_std: [0., 1.0e-3]
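
To make the two noise types concrete, here is a rough NumPy illustration (not the package implementation) of a per-curve systematic shift combined with per-point jitter:

import numpy as np

rng = np.random.default_rng()
q = np.linspace(0.02, 0.15, 128)

shift_std = 1.0e-3
noise_std = rng.uniform(0., 1.0e-3)                 # per-curve noise level drawn from [0, 1e-3]

q_shift = rng.normal(0., shift_std)                 # one value for the whole curve (misalignment)
q_jitter = rng.normal(0., noise_std, size=q.shape)  # independent value for every q point

q_noisy = q + q_shift + q_jitter
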
  5. intensity_noise - responsible for adding noise to the intensity values of the reflectivity curves (a rough sketch of the shift and scaling components follows the YAML snippet below).

    The BasicExpIntensityNoise combines four types of noise:

    • Poisson noise: Simulates count-based noise common in photon counting experiments.

    • Scaling noise: Applies a scaling transformation to the curves, equivalent to a vertical stretch or compression in logarithmic space.

    • Shift noise: Applies a multiplicative shift to the curves, equivalent to a vertical shift in logarithmic space.

    • Background noise: Adds a constant background value to the curves.

    It has the following arguments:

    • relative_errors - range of relative errors for Poisson noise

    • consistent_rel_err - if true, uses a consistent relative error for Poisson noise across all points in a curve

    • logdist - if true the relative errors are sampled uniformly on a logarithmic scale from the interval defined by relative_errors

    • apply_shift - if true, applies shift noise to the curves

    • shift_range - range of shift factors for shift noise

    • apply_scaling - if true, applies scaling noise to the curves

    • scale_range - range of scaling factors for scaling noise

    • apply_background - if true, applies background noise to the curves

    • background_range - range for background values

dset:   
  intensity_noise:
    cls: BasicExpIntensityNoise
    kwargs:
      relative_errors: [0.0, 0.2]
      abs_errors: 0.0
      consistent_rel_err: true
      logdist: false
      apply_shift: true
      shift_range: [-0.3, 0.3]
      apply_scaling: true
      scale_range: [-0.02, 0.02]
      apply_background: false
      background_range: [1.0e-10, 1.0e-8]
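
As a rough illustration of the shift and scaling components (a sketch based on the log-space interpretation given above, assuming the shift acts as a constant offset of log10 R and the scaling as a multiplicative stretch of log10 R; the actual noise generator may differ in detail):

import numpy as np

rng = np.random.default_rng()
r_curve = 10. ** np.linspace(0., -6., 128)    # dummy reflectivity curve spanning six decades

shift = rng.uniform(-0.3, 0.3)                # drawn from shift_range
scale = rng.uniform(-0.02, 0.02)              # drawn from scale_range

log_r = np.log10(r_curve)
r_noisy = 10. ** ((log_r + shift) * (1. + scale))
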
  6. smearing (optional) - responsible for adding resolution smearing to the curves (which can occur in neutron experiments due to uncertainties in the incident angle or wavelength). The intensity at a q point becomes the average of the intensities of neighbouring q points, weighted by a Gaussian profile (a minimal sketch follows the YAML snippet below). The arguments of the Smearing class are:

  • sigma_range - the range for sampling the resolutions

  • constant_dq - if True the smearing is constant (the resolution is given by the constant dq at each point in the curve), otherwise the smearing is linear (the resolution is given by the constant dq/q at each point in the curve)

  • gauss_num - the number of interpolating gaussian profiles

  • share_smeared - the share of curves in the batch for which the resolution smearing is applied

dset:   
  smearing:
    cls: Smearing
    kwargs:
      sigma_range: [8.0e-4, 5.0e-3]
      constant_dq: true
      gauss_num: 31
      share_smeared: 0.2
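
A minimal sketch of constant-dq smearing as described above (each intensity replaced by a Gaussian-weighted average of its neighbours; the package itself uses a fixed number of interpolating Gaussian profiles, so this is an illustration rather than the actual implementation):

import numpy as np

def smear_constant_dq(q, r, dq):
    # replace each point by the Gaussian-weighted average over the whole curve
    r_smeared = np.empty_like(r)
    for i, q_i in enumerate(q):
        weights = np.exp(-0.5 * ((q - q_i) / dq) ** 2)
        r_smeared[i] = np.sum(weights * r) / np.sum(weights)
    return r_smeared

q = np.linspace(0.02, 0.15, 128)
r = 10. ** np.linspace(0., -6., 128)              # dummy reflectivity curve
r_smeared = smear_constant_dq(q, r, dq=2.0e-3)    # dq within the sigma_range of the config
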
  7. curves_scaler - responsible for scaling the reflectivity curves to an ML-friendly range. The LogAffineCurvesScaler class scales the reflectivity curves (R) according to the formula:

\(R_{\mathrm{scaled}} = \log_{10}(R + \mathrm{eps}) \times \mathrm{weight} + \mathrm{bias}\)

The argument eps sets the minimum intensity value of the reflectivity curves that is taken into account. The default values of 1.0e-10 for eps, 0.2 for weight and 1.0 for bias map the scaled reflectivity curves to the interval [-1, 1] (a quick numerical check follows the YAML snippet below).

dset:    
  curves_scaler:
    cls: LogAffineCurvesScaler
    kwargs:
      weight: 0.2
      bias: 1.0
      eps: 1.0e-10
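
With the default values, the formula maps the usual dynamic range of a reflectivity curve onto [-1, 1], which can be checked quickly:

import numpy as np

weight, bias, eps = 0.2, 1.0, 1.0e-10

print(np.log10(1.0 + eps) * weight + bias)   # total reflection plateau (R = 1)       -> ~ 1.0
print(np.log10(0.0 + eps) * weight + bias)   # intensity below the eps cutoff (R = 0) -> -1.0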

The model key defines the neural network architecture. Its network subkey contains the following subkeys:

  • cls - the class of the neural network

  • pretrained_name - the name of a pretrained model used to initialize the weights (optional)

  • device - the PyTorch device (either cuda or cpu)

  • kwargs - the keyword arguments for the provided class

The NetworkWithPriorsConvEmb class has the following keyword arguments:

  • in_channels - the number of input channels of the 1D CNN (1 for reflectivity curves)

  • hidden_channels - list with the number of channels for each layer of the 1D CNN

  • dim_embedding - the dimension of the embedding produced by the 1D CNN

  • dim_avpool - the output size of the adaptive average pooling layer (the input size of the linear layer that produces the embedding scales as dim_avpool * hidden_channels[-1])

  • embedding_net_activation - the type of activation function in the 1D CNN

  • use_batch_norm - whether to use batch normalization (in both the 1D CNN and the MLP)

  • dim_out - the dimension of the output produced by the MLP (i.e. the number of thin film parameters to be predicted)

  • layer_width - the width (i.e. number of neurons) of a linear layer in the MLP

  • num_blocks - the number of residual blocks in the MLP

  • repeats_per_block - the number of normalization/activation/linear repeats in a block

  • mlp_activation - the type of activation function in the MLP

  • dropout_rate - dropout rate for each block

  • pretrained_embedding_net - the path to the weights of a pretrained embedding network (optional)

model:
  network:
    cls: NetworkWithPriorsConvEmb
    pretrained_name: null
    kwargs:
      in_channels: 1
      hidden_channels: [32, 64, 128, 256, 512]
      dim_embedding: 128
      dim_avpool: 1
      embedding_net_activation: 'gelu'
      use_batch_norm: true
      dim_out: 8
      layer_width: 512
      num_blocks: 6
      repeats_per_block: 2
      mlp_activation: 'gelu'
      dropout_rate: 0.0 
      pretrained_embedding_net: null
  

The training key can be used to customize the training settings:

  • num_iterations - the total number of training iterations

  • batch_size - the batch size (number of curves generated at each iteration)

  • lr - the initial learning rate

  • grad_accumulation_steps - if larger than 1, training is performed using gradient accumulation with the chosen number of steps

  • clip_grad_norm_max - the maximum norm for gradient clipping (optional)

  • train_with_q_input - must be set to True if the q-values are also used as input (i.e. when the FNO embedding network is used)

  • update_tqdm_freq - the frequency for updating the tqdm progress bar

  • optimizer - the PyTorch optimizer to be used (default: AdamW)

  • trainer_kwargs - additional trainer keyword arguments, can be used for setting the optimizer keyword arguments (optim_kwargs)

  • callbacks - (optional) the callback classes together with their keyword arguments which are used when training a model from the terminal.

training:
  num_iterations: 10000
  batch_size: 4096
  lr: 1.0e-4
  grad_accumulation_steps: 1
  clip_grad_norm_max: null
  train_with_q_input: False
  update_tqdm_freq: 1
  optimizer: AdamW
  trainer_kwargs:
    optim_kwargs:
      betas: [0.9, 0.999]
      weight_decay: 0.0005
  callbacks:
    save_best_model:
      enable: true
      freq: 500
    lr_scheduler:
      cls: StepLR
      kwargs:
        step_size: 500
        gamma: 0.5
  logger:
    use_neptune: false