4. Advanced functionality#
import numpy as np
import torch
import matplotlib.pyplot as plt
from ipywidgets import interact
from reflectorch import *
from reflectorch.extensions.jupyter import JPlotLoss
torch.manual_seed(0); # set seed for reproducibility
4.1. Using alternative parameterizations of the SLD profile#
In this section we describe alternative parameterizations of the SLD profile implemented in reflectorch (i.e. other than the standard box model parameterization).
4.1.1. Model with absorption#
The default box model parameterization of the SLD profile presented in the previous sections takes into account only the real part of the SLD and neglects the imaginary part, which is related to the absorption of the scattering medium. While this is a reasonable approximation in many use cases, in reflectorch the parameterization of the SLD profile can be altered so as to also incorporate a constant imaginary SLD value for each layer.
We can initialize a reflectorch model with absorption by making the following changes to the YAML configuration file:
Firstly, we set the `model_name` argument of the prior sampler to `model_with_absorption` instead of `standard_model`. In addition, the parameter range and bound width range of the imaginary layer SLDs (`islds`) must be specified. If `constrained_isld` is set to `true`, the imaginary part of the SLD is constrained to not exceed a predefined fraction `max_sld_share` of the real part of the SLD (note: `constrained_roughness` must also be `true`).
dset:
  prior_sampler:
    cls: SubpriorParametricSampler
    kwargs:
      param_ranges:
        thicknesses: [0., 300.]
        roughnesses: [0., 20.]
        slds: [0., 50.]
        islds: [0., 5.]
      bound_width_ranges:
        thicknesses: [1.0e-2, 300.]
        roughnesses: [1.0e-2, 20.]
        slds: [1.0e-2, 10.]
        islds: [1.0e-2, 5.]
      model_name: model_with_absorption
      max_num_layers: 2
      constrained_roughness: true
      constrained_isld: true
      max_thickness_share: 0.5
      max_sld_share: 0.2
      logdist: false
      scale_params_by_ranges: false
      scaled_range: [-1., 1.]
      device: 'cuda'
For the 2-layer box model parameterization of the SLD profile without absorption, the neural network had to predict 8 values (2 thicknesses, 3 roughnesses, 3 real layer SLDs). When absorption is considered, there are 3 additional output values (the imaginary layer SLDs), for a total of 11. The computation for a higher number of layers is analogous. The neural network architecture must therefore reflect these changes in the input and output dimensionalities: in this example the `dim_out` argument is set to 11.
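As a quick sanity check, the required output dimension can be computed from the number of layers (a minimal sketch; the helper function below is hypothetical and not part of reflectorch):

# hypothetical helper: output dimension of the box model with absorption for N layers
# on a substrate (N thicknesses, N+1 roughnesses, N+1 real SLDs, N+1 imaginary SLDs)
def dim_out_with_absorption(max_num_layers: int) -> int:
    n = max_num_layers
    return n + 3 * (n + 1)

print(dim_out_with_absorption(2))  # 11, matching the dim_out value in the configuration below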
model:
  network:
    cls: NetworkWithPriorsConvEmb
    pretrained_name: null
    device: 'cuda'
    kwargs:
      in_channels: 1
      hidden_channels: [32, 64, 128, 256, 512]
      dim_embedding: 128
      dim_avpool: 1
      embedding_net_activation: 'gelu'
      use_batch_norm: true
      dim_out: 11
      layer_width: 512
      num_blocks: 6
      repeats_per_block: 2
      mlp_activation: 'gelu'
      dropout_rate: 0.0
      pretrained_embedding_net: null
We initialize a model with absorption from a suitable configuration file.
trainer = get_trainer_by_name(config_name='c_absorption', load_weights=False)
Model c_absorption loaded. Number of parameters: 3.84 M
We observe that the parametric model is `ModelWithAbsorption` and the sampler strategy is `ConstrainedRoughnessAndImgSldSamplerStrategy`:
trainer.loader.prior_sampler.model_name, trainer.loader.prior_sampler.param_model, trainer.loader.prior_sampler.param_model.sampler_strategy
('model_with_absorption',
<reflectorch.data_generation.priors.parametric_models.ModelWithAbsorption at 0x2151ec00e80>,
<reflectorch.data_generation.priors.sampler_strategies.ConstrainedRoughnessAndImgSldSamplerStrategy at 0x21544989fa0>)
batch_size = 64
simulated_data = trainer.loader.get_batch(batch_size)
n_layers = simulated_data['params'].max_layer_num
n_params = simulated_data['params'].num_params
print(f'Number of layers: {n_layers}, Number of film parameters: {n_params}')
print('SLD data type: ' + str(simulated_data['params'].slds.dtype))
print('Img slds properly constrained with respect to real slds: ' + str(torch._is_all_true(simulated_data['params'].slds.imag < 0.2*simulated_data['params'].slds.real).item()))
Number of layers: 2, Number of film parameters: 11
SLD data type: torch.complex128
Img slds properly constrained with respect to real slds: True
We observe that the `slds` tensor is now of complex type. The absorption leads to a rounding of the total reflection edge and a damping of the oscillations of the reflectivity curve.
q = to_np(simulated_data['q_values'])[0]
scaled_noisy_curves = simulated_data['scaled_noisy_curves']
fig, ax = plt.subplots(1,2,figsize=(12,6))
ax[0].set_ylim(-1.1, 1.1)
ax[0].set_xlabel('q [$Å^{-1}$]', fontsize=18)
ax[0].set_ylabel('R$_{scaled}$ (q)', fontsize=18)
ax[0].tick_params(axis='both', which='major', labelsize=14)
i = 0
ax[0].scatter(q, to_np(scaled_noisy_curves[i]), c='blue', s=2.0);
z_axis = torch.linspace(-200, 1000, 1000, device='cuda')
_, sld_profile_real, _ = get_density_profiles(
simulated_data['params'].thicknesses,
simulated_data['params'].roughnesses,
simulated_data['params'].slds.real,
z_axis)
_, sld_profile_imag, _ = get_density_profiles(
simulated_data['params'].thicknesses,
simulated_data['params'].roughnesses,
simulated_data['params'].slds.imag,
z_axis)
ax[1].plot(to_np(z_axis), to_np(sld_profile_real[i]), c='cyan', label='Re(SLD)')
ax[1].plot(to_np(z_axis), to_np(sld_profile_imag[i]), c='violet', label='Im(SLD)')
ax[1].set_xlabel('z [$Å$]', fontsize=20)
ax[1].set_ylabel('SLD [$10^{-6} Å^{-2}$]', fontsize=20)
ax[1].tick_params(axis='both', which='major', labelsize=15)
ax[1].tick_params(axis='both', which='minor', labelsize=15)
ax[1].legend(fontsize=14)
plt.tight_layout()
4.1.2. Periodically repeating monolayer#
This SLD parameterization addresses the following commonly encountered scenario for thin films. On top of a silicon/silicon oxide substrate we consider a thin film composed of repeating identical monolayers (grey curve in the figure), each monolayer consisting of two boxes with distinct SLDs. A sigmoid envelope modulating the SLD profile of the monolayers defines the film thickness and the roughness at the top interface (green curve in the figure). A second sigmoid envelope can be used to modulate the amplitude of the monolayer SLDs as a function of the displacement from the position of the first sigmoid (red curve in the figure). These two sigmoids allow one to model a thin film that is coherently ordered up to a certain coherent thickness and gets incoherently ordered or amorphous toward the top of the film. In addition, a layer between the substrate and the multilayer (i.e. ”phase layer”) is introduced to account for the interface structure, which does not necessarily have to be identical to the multilayer period.
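For intuition, the following is a simplified illustrative sketch of such a sigmoid-modulated periodic SLD profile (hypothetical code with hypothetical parameter values, not the actual reflectorch implementation of `repeating_multilayer_v3`):

import numpy as np

def illustrative_multilayer_sld(z, d_block=16., d_block1_rel=0.7, r_block=15., dr=-8.,
                                d_full=200., sigma_full=5., d_coherent=150., sigma_coherent=20.):
    # periodic two-box profile of the repeating monolayer
    phase = (z % d_block) / d_block
    sld_periodic = np.where(phase < d_block1_rel, r_block, r_block + dr)
    mean_sld = d_block1_rel * r_block + (1 - d_block1_rel) * (r_block + dr)
    # first sigmoid envelope: defines the total film thickness and the top-interface roughness
    envelope_film = 1. / (1. + np.exp((z - d_full) / sigma_full))
    # second sigmoid envelope: damps the periodic modulation beyond the coherently ordered thickness
    envelope_coherent = 1. / (1. + np.exp((z - d_coherent) / sigma_coherent))
    return (mean_sld + (sld_periodic - mean_sld) * envelope_coherent) * envelope_film

z = np.linspace(0, 300, 3000)
sld_profile = illustrative_multilayer_sld(z)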
This parameterization is described by 17 film parameters; their physical descriptions and the corresponding aliases used in the YAML configuration file are shown in the following table:
| Parameter description | Parameter alias in the configuration file |
|---|---|
| monolayer thickness (i.e. two boxes stacked together) | d_block |
| relative roughness of the monolayer interfaces (wrt. the monolayer thickness) | s_block_rel |
| SLD of the first box in the monolayer | r_block |
| SLD difference between the second and the first box in the monolayer | dr |
| fraction of the monolayer thickness belonging to the first box | d_block1_rel |
| roughness of the silicon substrate | s_si |
| SLD of the silicon substrate | r_si |
| thickness of the silicon oxide layer | d_sio2 |
| roughness of the silicon oxide layer | s_sio2 |
| SLD of the silicon oxide layer | r_sio2 |
| SLD of the phase layer | r3 |
| relative thickness of the phase layer (wrt. the monolayer thickness) | d3_rel |
| relative roughness of the phase layer (wrt. the monolayer thickness) | s3_rel |
| relative position of the first sigmoid (i.e. total film thickness) | d_full_rel |
| relative width of the first sigmoid | rel_sigmas |
| relative position of the second sigmoid (coherently ordered film thickness) | dr_sigmoid_rel_pos |
| relative width of the second sigmoid | dr_sigmoid_rel_width |
We can initialize a reflectorch model which uses this type of SLD parameterization by making the following changes to the YAML configuration file:
Firstly, we set the `model_name` argument of the prior sampler to `repeating_multilayer_v3` instead of `standard_model`, and the `max_num_layers` argument to a high value representing the maximum considered number of monolayers (e.g. 30). In addition, the parameter ranges and bound width ranges for the 17 multilayer parameters must be specified (the table above shows the correspondence between the physical parameters and their YAML subkeys in the configuration file).
dset:
  prior_sampler:
    cls: SubpriorParametricSampler
    kwargs:
      param_ranges:
        d_full_rel: [0, 25]
        rel_sigmas: [0, 5]
        dr_sigmoid_rel_pos: [-10, 10]
        dr_sigmoid_rel_width: [0, 20]
        d_block1_rel: [0.01, 0.99]
        d_block: [10, 20]
        s_block_rel: [0., 0.3]
        r_block: [0., 20.]
        dr: [-10., 10.]
        d3_rel: [0, 1]
        s3_rel: [0, 1]
        r3: [0., 25]
        d_sio2: [0, 10]
        s_sio2: [0, 10]
        s_si: [0., 10]
        r_sio2: [17., 19.]
        r_si: [19., 21.]
      bound_width_ranges:
        d_full_rel: [0.1, 25]
        rel_sigmas: [0.1, 5]
        dr_sigmoid_rel_pos: [0.1, 20]
        dr_sigmoid_rel_width: [0.1, 20]
        d_block1_rel: [0.01, 1.0]
        d_block: [0.1, 10.]
        s_block_rel: [0.1, 0.3]
        r_block: [0.1, 5.]
        dr: [0.1, 5.]
        d3_rel: [0.01, 1]
        s3_rel: [0.01, 1]
        r3: [0.01, 25]
        d_sio2: [0.01, 10]
        s_sio2: [0.01, 10]
        s_si: [0.01, 10]
        r_sio2: [0.01, 2]
        r_si: [0.01, 2]
      model_name: repeating_multilayer_v3
      max_num_layers: 30
      logdist: false
      scale_params_by_ranges: false
      scaled_range: [-1., 1.]
      device: 'cuda'
The neural network architecture must be set up such that the `dim_out` argument (i.e. the output dimension) equals the number of predicted parameters, which in this case is 17.
model:
  encoder:
    cls: NetworkWithPriorsConvEmb
    pretrained_name: null
    device: 'cuda'
    kwargs:
      in_channels: 1
      hidden_channels: [32, 64, 128, 256, 512]
      dim_embedding: 128
      dim_avpool: 1
      embedding_net_activation: 'gelu'
      use_batch_norm: true
      dim_out: 17
      layer_width: 512
      num_blocks: 6
      repeats_per_block: 2
      mlp_activation: 'gelu'
      dropout_rate: 0.0
      pretrained_embedding_net: null
4.1.2.1. Trainer#
We initialize a model with the repeating multilayer parameterization from a suitable configuration file. Here we use an extended q range, up to 0.5 Å\(^{-1}\).
trainer = get_trainer_by_name(config_name='c_repeating_multilayer', load_weights=False)
Model c_repeating_multilayer loaded. Number of parameters: 3.85 M
simulated_data = trainer.loader.get_batch(batch_size=64)
n_layers = simulated_data['params'].max_layer_num
n_params = simulated_data['params'].num_params
print(f'Max number of layers: {n_layers}, Number of film parameters: {n_params}')
Max number of layers: 30, Number of film parameters: 17
q = to_np(simulated_data['q_values'])[0]
scaled_noisy_curves = simulated_data['scaled_noisy_curves']
i = 3
fig, ax = plt.subplots(1,2,figsize=(12,6))
ax[0].set_ylim(-1.1, 1.1)
ax[0].set_xlabel('q [$Å^{-1}$]', fontsize=18)
ax[0].set_ylabel('R$_{scaled}$ (q)', fontsize=18)
ax[0].tick_params(axis='both', which='major', labelsize=14)
ax[0].scatter(q, to_np(scaled_noisy_curves[i]), c='blue', s=2.0);
z_axis = torch.linspace(-200, 1000, 1000, device='cuda')
_, sld_profile, _ = get_density_profiles(
simulated_data['params'].thicknesses,
simulated_data['params'].roughnesses,
simulated_data['params'].slds,
z_axis)
ax[1].plot(z_axis.cpu().numpy(), to_np(sld_profile[i]), c='b', label='ground truth')
ax[1].set_xlabel('z [$Å$]', fontsize=20)
ax[1].set_ylabel('SLD [$10^{-6} Å^{-2}$]', fontsize=20)
ax[1].tick_params(axis='both', which='major', labelsize=15)
ax[1].tick_params(axis='both', which='minor', labelsize=15)
plt.tight_layout()
4.1.2.2. Inference#
inference_model = EasyInferenceModel(config_name='c_repeating_multilayer_trained1', device='cuda')
Configuration file `D:\Github Projects\reflectorch\reflectorch\configs\c_repeating_multilayer_trained1.yaml` found locally.
Weights file `D:\Github Projects\reflectorch\reflectorch\saved_models\model_c_repeating_multilayer_trained1.safetensors` found locally.
Model c_repeating_multilayer_trained1 loaded. Number of parameters: 3.85 M
The model corresponds to a `repeating_multilayer_v3` parameterization with 30 layers (17 predicted parameters)
Parameter types and total ranges:
- d_full_rel: [0, 25]
- rel_sigmas: [0, 5]
- dr_sigmoid_rel_pos: [-10, 10]
- dr_sigmoid_rel_width: [0, 20]
- d_block1_rel: [0.01, 0.99]
- d_block: [10, 20]
- s_block_rel: [0.0, 0.3]
- r_block: [0.0, 20.0]
- dr: [-10.0, 10.0]
- d3_rel: [0, 1]
- s3_rel: [0, 1]
- r3: [0.0, 25]
- d_sio2: [0, 10]
- s_sio2: [0, 10]
- s_si: [0.0, 10]
- r_sio2: [17.0, 19.0]
- r_si: [19.0, 21.0]
Allowed widths of the prior bound intervals (max-min):
- d_full_rel: [0.1, 25]
- rel_sigmas: [0.1, 5]
- dr_sigmoid_rel_pos: [0.1, 20]
- dr_sigmoid_rel_width: [0.1, 20]
- d_block1_rel: [0.01, 1.0]
- d_block: [0.1, 10.0]
- s_block_rel: [0.1, 0.3]
- r_block: [0.1, 5.0]
- dr: [0.1, 5.0]
- d3_rel: [0.01, 1]
- s3_rel: [0.01, 1]
- r3: [0.01, 25]
- d_sio2: [0.01, 10]
- s_sio2: [0.01, 10]
- s_si: [0.01, 10]
- r_sio2: [0.01, 2]
- r_si: [0.01, 2]
The model was trained on curves discretized at 256 uniform points between q_min=0.02 and q_max=0.5
We load an experimental curve and make a prediction:
multilayer_data_path = '.././exp_data/DIP-nSi_34a.dat'
data = np.genfromtxt(multilayer_data_path, delimiter='\t', skip_header=0, unpack=True)
q_vals = data[0]
intensity_vals = data[1]
interp_range = [0.02, 0.5]
interp_points = 256
q_interp = np.linspace(interp_range[0], interp_range[1], interp_points)
min_q_idx = abs(q_vals - interp_range[0]).argmin()
max_q_idx = abs(q_vals - interp_range[1]).argmin() + 1
q = q_vals[min_q_idx:max_q_idx]
curve = intensity_vals[min_q_idx:max_q_idx]
exp_curve_interp = interp_reflectivity(q_interp, q, curve)
q_model = inference_model.trainer.loader.q_generator.q.cpu().numpy()
prior_bounds = [
(5., 20.), #relative sigmoid center
(0., 5.), #relative roughness
(-10., 10.), #position of the second sigmoid relative to d_full_rel (units are d_block)
(0., 20.), #width of the second sigmoid relative to d_full_rel (units are d_block)
(0.6, 0.9), #fractional thickness of one box1 in the monolayer
(16., 17.5), #thickness of one monolayer (two boxes stacked together)
(0.05, 0.3), #roughness of each interface in the monolayer relative to d_block
(5., 20.), #SLD of box1 in the monolayer
(-10, -5), #dr = SLD(box2) - SLD(box1)
(0, 1), #relative thickness of phase layer with respect to d_block
(0, 1), #relative roughness of phase layer with respect to d_block
(5, 15), #SLD of phase layer
(5, 10), #thickness SiO2
(0, 10), #roughness SiO2
(0, 10), #roughness Si
(17., 18.), #SLD SiO2
(19.5, 20.5), #SLD Si
]
prediction_dict = inference_model.predict(reflectivity_curve=exp_curve_interp,
q_values=q_model,
prior_bounds=prior_bounds,
clip_prediction=False,
calc_pred_curve=True,
calc_pred_sld_profile=True,
)
pred_params = prediction_dict['predicted_params_array']
pred_curve = prediction_dict['predicted_curve']
pred_sld_xaxis = prediction_dict['predicted_sld_xaxis']
pred_sld_profile = prediction_dict['predicted_sld_profile']
n_layers = inference_model.trainer.loader.prior_sampler.max_num_layers
for param_name, param_val in zip(prediction_dict["param_names"], pred_params):
print(f'{param_name.ljust(20)} : {param_val:.2f}')
fig, ax = plt.subplots(1,2,figsize=(12,6))
ax[0].set_yscale('log')
ax[0].set_ylim(0.5e-10, 5)
ax[0].set_xlabel('q [$Å^{-1}$]', fontsize=20)
ax[0].set_ylabel('R(q)', fontsize=20)
ax[0].tick_params(axis='both', which='major', labelsize=15)
ax[0].tick_params(axis='both', which='minor', labelsize=15)
y_tick_locations = [10**(-2*i) for i in range(6)]
ax[0].yaxis.set_major_locator(plt.FixedLocator(y_tick_locations))
ax[0].scatter(q_model, exp_curve_interp, c='g', s=2, label='interp exp. curve')
ax[0].plot(q_model, pred_curve, c='r', lw=1, label='pred. curve')
ax[0].legend(loc='upper right', fontsize=14);
ax[1].plot(pred_sld_xaxis, pred_sld_profile, c='r', label='prediction')
ax[1].set_xlabel('z [$Å$]', fontsize=20)
ax[1].set_ylabel('SLD [$10^{-6} Å^{-2}$]', fontsize=20)
ax[1].tick_params(axis='both', which='major', labelsize=15)
ax[1].tick_params(axis='both', which='minor', labelsize=15)
ax[1].legend(fontsize=14);
plt.tight_layout()
d_full_rel : 12.20
rel_sigmas : 1.75
dr_sigmoid_rel_pos : -2.30
dr_sigmoid_rel_width : 16.05
d_block1_rel : 0.74
d_block : 16.80
s_block_rel : 0.06
r_block : 14.76
dr : -8.56
d3_rel : 0.18
s3_rel : 0.41
r3 : 9.58
d_sio2 : 7.33
s_sio2 : 1.83
s_si : 4.83
r_sio2 : 17.52
r_si : 20.19
4.1.3. Model with shifts#
This is a version of the standard box model parameterization in which the horizontal and vertical shifts of the reflectivity curve are additional predicted parameters.
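Conceptually, the two extra parameters distort a simulated curve roughly as follows (a hedged sketch of the idea only; the exact convention and sign choices used internally may differ):

import numpy as np

def apply_shifts(q, reflectivity_fn, q_shift=-0.001, norm_shift=0.92):
    # horizontal shift of the momentum-transfer axis and multiplicative rescaling of the intensity;
    # reflectivity_fn is any callable returning R(q) for an array of q values
    return norm_shift * reflectivity_fn(np.asarray(q) + q_shift)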
Firstly, we set the `model_name` argument of the prior sampler to `model_with_shifts` instead of `standard_model`. In addition, the parameter range and bound width range of the q shifts (`q_shift`) and of the (multiplicative) normalization shifts (`norm_shift`) must be specified.
dset:
  prior_sampler:
    cls: SubpriorParametricSampler
    kwargs:
      param_ranges:
        thicknesses: [1., 500.]
        roughnesses: [0., 20.]
        slds: [0., 25.]
        q_shift: [-0.002, 0.002]
        norm_shift: [0.8, 1.2]
      bound_width_ranges:
        thicknesses: [1.0e-2, 500.]
        roughnesses: [1.0e-2, 20.]
        slds: [1.0e-2, 4.]
        q_shift: [1.0e-3, 0.004]
        norm_shift: [1.0e-2, 0.4]
      model_name: model_with_shifts
      max_num_layers: 2
      constrained_roughness: true
      max_thickness_share: 0.5
      logdist: false
      scale_params_by_ranges: false
      scaled_range: [-1., 1.]
      device: 'cuda'
The output dimension of the neural network must also be modified to account for the two additional parameters: for the 2-layer model, the 8 box model parameters plus `q_shift` and `norm_shift` give `dim_out: 10`.
model:
  network:
    cls: NetworkWithPriorsConvEmb
    pretrained_name: null
    device: 'cuda'
    kwargs:
      in_channels: 1
      hidden_channels: [32, 64, 128, 256, 512]
      dim_embedding: 128
      dim_avpool: 1
      embedding_net_activation: 'gelu'
      use_batch_norm: true
      dim_out: 10
      layer_width: 512
      num_blocks: 6
      repeats_per_block: 2
      mlp_activation: 'gelu'
      dropout_rate: 0.0
      pretrained_embedding_net: null
We initialize a model with shifts from a suitable configuration file:
trainer = get_trainer_by_name(config_name='c_model_with_shifts', load_weights=False)
print(trainer.loader.prior_sampler.param_model)
Model c_model_with_shifts loaded. Number of parameters: 3.84 M
<reflectorch.data_generation.priors.parametric_models.ModelWithShifts object at 0x000002151EC00610>
batch_size = 64
simulated_data = trainer.loader.get_batch(batch_size)
params = simulated_data['params']
n_layers = params.max_layer_num
n_params = params.num_params
print(f'Number of layers: {n_layers}, Number of film parameters: {n_params}')
Number of layers: 2, Number of film parameters: 10
q = simulated_data['q_values']
scaled_noisy_curves = simulated_data['scaled_noisy_curves']
curves_without_shifts = trainer.loader.curves_scaler.scale(
reflectivity(simulated_data['q_values'], params.thicknesses, params.roughnesses, params.slds)
)
i = 0
print(f'Q shift: {params.parameters[i, -2].item()} Intensity shift: {params.parameters[i, -1].item()}')
q = to_np(simulated_data['q_values'])
fig, ax = plt.subplots(1,1,figsize=(6,6))
ax.set_ylim(-1.1, 1.1)
ax.set_xlabel('q [$Å^{-1}$]', fontsize=18)
ax.set_ylabel('R$_{scaled}$ (q)', fontsize=18)
ax.tick_params(axis='both', which='major', labelsize=14)
ax.scatter(q[i], to_np(scaled_noisy_curves[i]), c='blue', s=2.0, label='shifted curve');
ax.plot(q[i], to_np(curves_without_shifts[i]), c='green', lw=1.0, label='original curve');
ax.legend(fontsize=14)
plt.tight_layout()
Q shift: -0.0011754242598062277 Intensity shift: 0.9204985043289542
4.2. Using alternative embedding networks#
Embedding networks process the reflectivity curves into a latent representation, which is fed together with the prior bounds to the main fully-connected (MLP) network in order to obtain the predictions. The default embedding network is a 1D CNN, which operates on reflectivity curves with a fixed discretization (fixed q range and fixed number of points per curve). The arguments of the 1D CNN embedding network were already explained in the previous section.
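Schematically, this prediction pipeline can be pictured as follows (a conceptual sketch with hypothetical function names, not the actual reflectorch forward pass):

import torch

def predict_params(embedding_net, mlp, scaled_curves, scaled_prior_bounds):
    # scaled_curves: (batch, 1, n_q); scaled_prior_bounds: (batch, 2 * n_params)
    embedding = embedding_net(scaled_curves)                          # (batch, dim_embedding)
    return mlp(torch.cat([embedding, scaled_prior_bounds], dim=-1))   # (batch, dim_out)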
Alternatively, we can use an embedding network inspired by the Fourier Neural Operator (FNO) architecture. In this case the reflectivity curves are input to the embedding network together with their respective q values. This allows us to train the model on reflectivity curves with variable discretizations (variable q ranges and variable numbers of points per curve).
We can initialize a reflectorch model which uses the FNO-based embedding network by making the following changes to the YAML configuration file:
Firstly, the `q_generator` has to be changed to `VariableQ` instead of `ConstantQ`. Its arguments are:

- `q_min_range` - the range for sampling the minimum q value of the curves, q_min
- `q_max_range` - the range for sampling the maximum q value of the curves, q_max
- `n_q_range` - the range for the number of points in the curves (equidistantly sampled between q_min and q_max; the number of points varies between batches but is constant within a batch)
dset:
  q_generator:
    cls: VariableQ
    kwargs:
      q_min_range: [0.01, 0.03]
      q_max_range: [0.15, 0.4]
      n_q_range: [128, 256]
      device: 'cuda'
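For intuition, a minimal sketch of what such a variable-q generator does conceptually (hypothetical code, not the `VariableQ` internals): sample q_min, q_max and the number of points from the configured ranges, then build an equidistant grid shared by all curves in the batch.

import torch

def sample_variable_q(q_min_range=(0.01, 0.03), q_max_range=(0.15, 0.4),
                      n_q_range=(128, 256), device='cpu'):
    q_min = float(torch.empty(1).uniform_(*q_min_range))
    q_max = float(torch.empty(1).uniform_(*q_max_range))
    n_q = int(torch.randint(n_q_range[0], n_q_range[1] + 1, (1,)))
    # one equidistant grid per batch; the number of points changes from batch to batch
    return torch.linspace(q_min, q_max, n_q, device=device)

q_grid = sample_variable_q()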
The network architecture is changed to be an instance of the `NetworkWithPriorsFnoEmb` class, which has the following keyword arguments:

- `in_channels` - the number of input channels to the FNO-based embedding network (should be 2, i.e. (R(q), q))
- `dim_embedding` - the dimension of the embedding produced by the FNO
- `width_fno` - the number of channels in the FNO blocks
- `n_fno_blocks` - the number of FNO blocks
- `modes` - the number of Fourier modes that are utilized
- `embedding_net_activation` - the type of activation function in the embedding network
- `fusion_self_attention` - if `True`, a fusion layer is used after the FNO blocks to produce the final output
- `use_batch_norm` - whether to use batch normalization (only in the MLP)

The other keyword arguments are the same as for the `NetworkWithPriorsConvEmb` class.
model:
  network:
    cls: NetworkWithPriorsFnoEmb
    pretrained_name: null
    device: 'cuda'
    kwargs:
      in_channels: 2
      dim_embedding: 256
      width_fno: 128
      n_fno_blocks: 6
      modes: 16
      embedding_net_activation: 'gelu'
      use_batch_norm: True
      dim_out: 8
      layer_width: 512
      num_blocks: 5
      repeats_per_block: 2
      mlp_activation: 'gelu'
      dropout_rate: 0.0
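As a usage sketch, the same network could also be instantiated directly in Python with the keyword arguments from the YAML block above (assuming `NetworkWithPriorsFnoEmb` is importable from the top-level reflectorch namespace, as suggested by the wildcard import at the beginning of this section):

from reflectorch import NetworkWithPriorsFnoEmb

fno_network = NetworkWithPriorsFnoEmb(
    in_channels=2,            # (R(q), q) input channels
    dim_embedding=256,
    width_fno=128,
    n_fno_blocks=6,
    modes=16,
    embedding_net_activation='gelu',
    use_batch_norm=True,
    dim_out=8,
    layer_width=512,
    num_blocks=5,
    repeats_per_block=2,
    mlp_activation='gelu',
    dropout_rate=0.0,
)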
The `train_with_q_input` subkey of the `training` key must be set to `True`. Enabling gradient clipping (`clip_grad_norm_max`) is also recommended.
training:
  num_iterations: 10000
  batch_size: 1024
  lr: 1.0e-4
  grad_accumulation_steps: 1
  clip_grad_norm_max: 1.0
  train_with_q_input: True
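For reference, `clip_grad_norm_max: 1.0` corresponds to clipping the global gradient norm before each optimizer step; presumably this is applied internally along the lines of the standard PyTorch utility:

# standard PyTorch gradient-norm clipping, called between loss.backward() and optimizer.step();
# here `model` stands for the network being trained
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)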
We initialize a model having a FNO-based embedding network from a suitable configuration file:
trainer = get_trainer_by_name(config_name='c_fno', load_weights=False)
Model c_fno loaded. Number of parameters: 4.52 M
trainer.model
NetworkWithPriorsFnoEmb(
(embedding_net): FnoEncoder(
(activation): GELU(approximate='none')
(fc0): Linear(in_features=2, out_features=128, bias=True)
(spectral_convs): ModuleList(
(0-5): 6 x SpectralConv1d()
)
(w_convs): ModuleList(
(0-5): 6 x Conv1d(128, 128, kernel_size=(1,), stride=(1,))
)
(fc_out): Linear(in_features=128, out_features=256, bias=True)
(fusion): FusionSelfAttention(
(fuser): Sequential(
(0): Linear(in_features=128, out_features=256, bias=True)
(1): Tanh()
(2): Linear(in_features=256, out_features=1, bias=False)
)
)
)
(mlp): ResidualMLP(
(first_layer): Linear(in_features=272, out_features=512, bias=True)
(blocks): ModuleList(
(0-4): 5 x ResidualBlock(
(activation): GELU(approximate='none')
(batch_norm_layers): ModuleList(
(0-1): 2 x BatchNorm1d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
)
(linear_layers): ModuleList(
(0-1): 2 x Linear(in_features=512, out_features=512, bias=True)
)
)
)
(last_layer): Linear(in_features=512, out_features=8, bias=True)
)
)
4.3. Using alternative trainers#
By default the class of the trainer object is `PointEstimatorTrainer`, which trains the neural network as a regressor predicting the physical parameters from the reflectivity curves and the prior bounds. Other types of trainers can also be specified, for example a trainer for encoding reflectivity curves into a latent representation.

In the configuration file, the `trainer_cls` subkey of the `training` key is set to `DenoisingAETrainer`:
training:
  trainer_cls: DenoisingAETrainer
We also select an appropriate neural network for this task:
model:
  network:
    cls: ConvAutoencoder
    pretrained_name: null
    device: 'cuda'
    kwargs:
      in_channels: 1
      encoder_hidden_channels: [32, 64, 128, 256, 512]
      decoder_hidden_channels: [512, 256, 128, 64, 32]
      dim_latent: 64
      dim_avpool: 1
      use_batch_norm: true
      activation: 'gelu'
      decoder_in_size: 4 # num_q_points / 32
trainer = get_trainer_by_name(config_name='c_ae', load_weights=False)
print(trainer)
Model c_ae loaded. Number of parameters: 1.22 M
<reflectorch.ml.trainers.DenoisingAETrainer object at 0x0000021519AE61F0>
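Conceptually, a denoising autoencoder trainer optimizes a reconstruction objective of the following form (a hedged sketch, not the actual `DenoisingAETrainer` implementation):

import torch.nn.functional as F

def denoising_ae_loss(autoencoder, noisy_scaled_curves, clean_scaled_curves):
    # reconstruct the clean scaled reflectivity curve from its noisy counterpart
    reconstruction = autoencoder(noisy_scaled_curves)
    return F.mse_loss(reconstruction, clean_scaled_curves)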
trainer.model
ConvAutoencoder(
(encoder): ConvEncoder(
(core): Sequential(
(0): Sequential(
(0): Conv1d(1, 32, kernel_size=(3,), stride=(2,), padding=(1,))
(1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
(1): Sequential(
(0): Conv1d(32, 64, kernel_size=(3,), stride=(2,), padding=(1,))
(1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
(2): Sequential(
(0): Conv1d(64, 128, kernel_size=(3,), stride=(2,), padding=(1,))
(1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
(3): Sequential(
(0): Conv1d(128, 256, kernel_size=(3,), stride=(2,), padding=(1,))
(1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
(4): Sequential(
(0): Conv1d(256, 512, kernel_size=(3,), stride=(2,), padding=(1,))
(1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
)
(avpool): AdaptiveAvgPool1d(output_size=1)
(fc): Linear(in_features=512, out_features=64, bias=True)
)
(decoder): ConvDecoder(
(decoder_input): Linear(in_features=64, out_features=2048, bias=True)
(decoder): Sequential(
(0): Sequential(
(0): ConvTranspose1d(512, 256, kernel_size=(3,), stride=(2,), padding=(1,), output_padding=(1,))
(1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
(1): Sequential(
(0): ConvTranspose1d(256, 128, kernel_size=(3,), stride=(2,), padding=(1,), output_padding=(1,))
(1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
(2): Sequential(
(0): ConvTranspose1d(128, 64, kernel_size=(3,), stride=(2,), padding=(1,), output_padding=(1,))
(1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
(3): Sequential(
(0): ConvTranspose1d(64, 32, kernel_size=(3,), stride=(2,), padding=(1,), output_padding=(1,))
(1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
)
)
(final_layer): Sequential(
(0): ConvTranspose1d(32, 32, kernel_size=(3,), stride=(2,), padding=(1,), output_padding=(1,))
(1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): GELU(approximate='none')
(3): Conv1d(32, 1, kernel_size=(3,), stride=(1,), padding=(1,))
)
)
)