Adam
phasic.svgd.Adam(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08)

Adam optimizer for SVGD with per-parameter adaptive learning rates.
Adam maintains running estimates of the first moment (mean) and second moment (uncentered variance) of gradients, using these to adaptively scale updates per-parameter. This is especially useful when:

- Gradients have vastly different scales across parameters
- Dataset size causes large gradient magnitudes
- The optimization landscape has varying curvature
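For intuition, the per-coordinate update follows the standard bias-corrected Adam rule of Kingma & Ba (2014), applied to the SVGD direction phi. A minimal NumPy sketch (the helper name adam_step is illustrative only; the variables m, v, t mirror the attributes documented below, and phasic's internal implementation may differ in detail):

import numpy as np

def adam_step(phi, m, v, t, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    # One bias-corrected Adam update on the SVGD direction phi (per particle, per parameter).
    t = t + 1
    m = beta1 * m + (1 - beta1) * phi          # first moment: running mean of phi
    v = beta2 * v + (1 - beta2) * phi**2       # second moment: running mean of phi**2
    m_hat = m / (1 - beta1**t)                 # bias correction for zero initialization
    v_hat = v / (1 - beta2**t)
    update = learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return update, m, v, t                     # SVGD ascends: particles += update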
Parameters
learning_rate : float or StepSizeSchedule = 0.001
Base learning rate (α in Adam paper). Can be a schedule (e.g., ExpStepSize, WarmupExpStepSize) for learning rate decay during optimization.
beta1 : float or StepSizeSchedule = 0.9
Exponential decay rate for first moment estimates (momentum). Higher = more smoothing, slower adaptation. Can be a schedule for advanced warmup strategies.
beta2 : float or StepSizeSchedule = 0.999
Exponential decay rate for second moment estimates (gradient variance). Higher = longer memory of gradient magnitudes. Can be a schedule.
epsilon : float = 1e-8
Small constant for numerical stability in division.
Attributes
m : array or None
First moment estimate (shape: (n_particles, theta_dim))
v : array or None
Second moment estimate (shape: (n_particles, theta_dim))
t : int
Current timestep (for bias correction)
Examples
>>> from phasic import SVGD, Adam
>>>
>>> # Create optimizer with default settings
>>> optimizer = Adam(learning_rate=0.01)
>>>
>>> # Use with SVGD
>>> svgd = SVGD(
... model=model,
... observed_data=observations,
... theta_dim=2,
... optimizer=optimizer,
... n_particles=50,
... n_iterations=200
... )
>>> svgd.fit()
>>>
>>> # Exponential decay learning rate
>>> from phasic import ExpStepSize, WarmupExpStepSize  # assumed import location for schedules
>>> optimizer = Adam(learning_rate=ExpStepSize(first_step=0.01, last_step=0.001, tau=500))
>>>
>>> # Warmup + decay (recommended for large models)
>>> optimizer = Adam(learning_rate=WarmupExpStepSize(peak_lr=0.01, warmup_steps=70))

References
Kingma, D. P., & Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv:1412.6980. https://arxiv.org/abs/1412.6980
Notes
When using Adam, the learning_rate parameter passed to SVGD is ignored in favor of the optimizer’s learning rate.
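For example (a sketch assuming SVGD exposes a learning_rate keyword, as the note implies; model and observations are placeholders as in the examples above):

>>> svgd = SVGD(
...     model=model,
...     observed_data=observations,
...     theta_dim=2,
...     optimizer=Adam(learning_rate=0.01),  # this learning rate is used
...     learning_rate=0.1,                   # ignored when an optimizer is supplied
...     n_particles=50,
...     n_iterations=200
... )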
Methods
| Name | Description |
|---|---|
| reset | Reset optimizer state for given particle shape. |
| step | Compute Adam update given SVGD gradient direction. |
reset
phasic.svgd.Adam.reset(shape)

Reset optimizer state for given particle shape.
Called at the start of optimization to initialize moment estimates.
Parameters
shape : tuple
Shape of particles array (n_particles, theta_dim) or (n_particles, learnable_dim) if fixed parameters are used.
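A minimal usage sketch (reset is normally called for you at the start of optimization; the comments describe the expected state based on the attribute descriptions above):

>>> optimizer = Adam(learning_rate=0.01)
>>> optimizer.reset((50, 2))
>>> # m and v now hold per-particle, per-parameter moment estimates of shape (50, 2),
>>> # and t restarts the bias-correction timestep.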
step
phasic.svgd.Adam.step(phi, particles=None)

Compute Adam update given SVGD gradient direction.
Parameters
phi : array (n_particles, theta_dim)
SVGD gradient direction: (K @ grad_log_p + sum(grad_K)) / n_particles. This is the direction of steepest ascent in the RKHS.
particles : array (n_particles, theta_dim) = None
Current particle positions. Not used by base Adam, but available for subclasses (e.g., Adamelia jitter).
Returns
update : array (n_particles, theta_dim)
Scaled update to add to particles. Each element is adaptively scaled based on the history of gradients for that parameter.
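A minimal sketch of how step() would be called in a hand-rolled update loop (phi here is a placeholder for the SVGD direction described above; in normal use SVGD.fit() performs these calls internally):

>>> import numpy as np
>>> optimizer = Adam(learning_rate=0.01)
>>> optimizer.reset((50, 2))
>>> particles = np.random.randn(50, 2)
>>> phi = np.zeros((50, 2))                       # placeholder SVGD direction
>>> update = optimizer.step(phi, particles=particles)
>>> particles = particles + update                # ascent step along the scaled direction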