multistage.multistage_train

multistage.multistage_train(net, residual_fun_s1, residual_fun_s2, loss_fun_s1, loss_fun_s2, x, training_samples, optimizer, steps, *, learning_rate=None, adaptive_sample_freq=1000, n_stages=2, width_size=20, depth=4, activation=jnp.tanh, num_samples_for_epsilon=(1024,), order=(1,), beta_fun=None, heuristic=0.9, chebyshev=False, x_stage2=None, training_samples_stage2=None, return_loss_history=True, print_every=100, key=None, net_kwargs_for_save=None, name='', checkpoint_dir='checkpoints', checkpoint_every=5000, benchmark_state=None, **adaptive_sample_kwargs)

Multi-stage training: train net in a first stage, then add sub-networks in later stages that are trained to fit the residual error of the preceding stages.

Examples

  • See tests/test_burgers.py.
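A minimal usage sketch for a 1D toy problem. The residual, loss, and data below are illustrative assumptions (the library does not prescribe a particular PDE); only the final commented call follows the signature documented above:

```python
import jax
import jax.numpy as jnp

# Illustrative stage-1 residual for the toy ODE u''(x) + sin(x) = 0
# (an assumed problem, chosen only to show the expected callable shape).
def residual_fun_s1(net, x):
    u_xx = jax.vmap(jax.grad(jax.grad(net)))(x)
    return u_xx + jnp.sin(x)

# Scalar loss: mean squared residual plus a data-misfit term (assumed form).
def loss_fun_s1(net, x, samples):
    r = residual_fun_s1(net, x)
    return jnp.mean(r**2) + jnp.mean((jax.vmap(net)(x) - samples) ** 2)

key = jax.random.PRNGKey(0)
x = jnp.linspace(0.0, 2.0 * jnp.pi, 128)
training_samples = jnp.sin(x)  # assumed target data

# With a real eqx.Module `net`, stage-2 counterparts of the functions
# above, and an optax optimizer, the call would look like:
# net, loss_histories = multistage.multistage_train(
#     net, residual_fun_s1, residual_fun_s2, loss_fun_s1, loss_fun_s2,
#     (x,), training_samples, optax.adam(1e-3), steps=10_000,
#     n_stages=2, key=key,
# )
```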

Parameters:
  • net (eqx.Module) – The initial model architecture.

  • residual_fun_s1 (callable) – Function to compute PDE residuals for the first stage.

  • residual_fun_s2 (callable) – Function to compute PDE residuals for subsequent stages.

  • loss_fun_s1 (callable) – Scalar output loss function for the first stage.

  • loss_fun_s2 (callable) – Scalar output loss function for subsequent stages.

  • x (tuple or list of jax.Array) – Input coordinates for the first stage.

  • training_samples (jax.Array) – Target values for the first stage.

  • optimizer (optax.GradientTransformation) – Optimizer for training loops.

  • steps (int) – Number of training steps per stage.

  • learning_rate (float, optional) – Learning rate passed to the optimizer.

  • adaptive_sample_freq (int, optional) – Frequency of adaptive sampling during training.

  • n_stages (int, optional) – Total number of training stages. Default is 2.

  • width_size (int, optional) – Width of the sub-networks added in later stages.

  • depth (int, optional) – Depth of the sub-networks added in later stages.

  • activation (callable, optional) – Activation function for new stages. Default is jnp.tanh.

  • num_samples_for_epsilon (tuple, optional) – Number of samples used to estimate error statistics between stages.

  • order (tuple, optional) – Order of error estimation.

  • beta_fun (callable, optional) – Function defining the beta distribution for error bounds.

  • heuristic (float, optional) – Heuristic multiplier for error estimation. Default is 0.9.

  • chebyshev (bool, optional) – Whether to use Chebyshev feature mapping instead of Fourier. If True, heuristic is ignored.

  • x_stage2 (tuple of jax.Array, optional) – Input coordinates for stage 2 and beyond. Default is x.

  • training_samples_stage2 (jax.Array, optional) – Training data for stage 2 and beyond. Default is training_samples.

  • return_loss_history (bool, optional) – If True, returns loss histories for all stages.

  • print_every (int, optional) – Logging frequency.

  • key (jax.random.PRNGKey, optional) – Random key for initialization and sampling.

  • net_kwargs_for_save (dict, optional) – Additional metadata to save with the model.

  • name (str, optional) – Base name for saving models and checkpoints.

  • checkpoint_dir (str, optional) – Directory to store stage-specific checkpoints.

  • checkpoint_every (int, optional) – Frequency of checkpointing within stages.

  • benchmark_state (callable, optional) – Callback for external benchmarking or logging. Signature: benchmark_state(net, stage, name, step=step).
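The benchmark_state callback can be any callable matching the documented signature. A minimal illustrative example (the body is an assumption; a real callback might evaluate net against a reference solution):

```python
def my_benchmark(net, stage, name, step=0):
    # Called periodically during training with the current model, the
    # stage index, the run name, and the current step.
    print(f"[{name}] stage {stage}, step {step}")
```

It would be passed as benchmark_state=my_benchmark.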

Returns:

  • net (eqx.Module) – The final trained multi-stage model.

  • loss_histories (list of list of float, optional) – A list containing the loss history for each stage (if return_loss_history is True).
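Conceptually, each later stage fits a new sub-network to the error left by the stages before it, scaled by an estimated error magnitude (the statistics gathered via num_samples_for_epsilon), and the final model sums the stages. A hand-rolled sketch of that composition idea (all names here are illustrative, not the library's internals):

```python
import jax.numpy as jnp

def compose_stages(stage_nets, epsilons, x):
    """Sum stage predictions: u(x) ~ u1(x) + eps1*u2(x) + eps2*u3(x) + ...
    where each eps is the estimated error magnitude after the preceding
    stages (a sketch of the idea, not the library's actual code)."""
    u = stage_nets[0](x)
    for net, eps in zip(stage_nets[1:], epsilons):
        u = u + eps * net(x)
    return u
```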