Adagrad
phasic.svgd.Adagrad(learning_rate=0.01, epsilon=1e-08)

Adagrad optimizer for SVGD.
Adagrad adapts the learning rate for each parameter based on the accumulated sum of squared gradients. Parameters with historically large gradients receive smaller effective learning rates, while parameters with small gradients keep relatively larger ones.
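The effect is easiest to see in a small NumPy sketch (purely illustrative, not phasic's internal implementation): a parameter that consistently sees large gradients accumulates a large G and therefore ends up with a much smaller effective learning rate.

```python
import numpy as np

# Illustrative sketch of Adagrad's per-parameter scaling; not phasic's internal code.
lr, eps = 0.01, 1e-8
G = np.zeros(2)                    # accumulated squared gradients for two parameters

for _ in range(100):
    grad = np.array([10.0, 0.1])   # parameter 0 sees large gradients, parameter 1 small ones
    G += grad ** 2

eff_lr = lr / (np.sqrt(G) + eps)
print(eff_lr)  # ~[1e-4, 1e-2]: the large-gradient parameter gets a ~100x smaller effective rate
```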
Parameters
learning_rate : float or StepSizeSchedule = 0.01
Base learning rate. Can be a schedule for learning rate decay.
epsilon : float = 1e-8
Small constant for numerical stability.
Attributes
G : array or None
Accumulated sum of squared gradients, shape (n_particles, theta_dim).
Examples
>>> from phasic import SVGD, Adagrad
>>>
>>> optimizer = Adagrad(learning_rate=0.01)
>>> svgd = SVGD(
... model=model,
... observed_data=observations,
... theta_dim=2,
... optimizer=optimizer
... )

References
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. JMLR 12:2121-2159.
Notes
Update rule: G += gradient²; params += lr * gradient / (√G + ε)
Warning: The effective learning rate decays over time as G accumulates. For long runs, consider using RMSprop or Adam, which keep the effective learning rate bounded.
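A minimal standalone re-implementation of the documented rule (a sketch for illustration, not phasic's source) makes the decay concrete:

```python
import numpy as np

class AdagradSketch:
    """Minimal re-implementation of the documented Adagrad update rule (illustrative only)."""

    def __init__(self, learning_rate=0.01, epsilon=1e-8):
        self.lr = learning_rate
        self.eps = epsilon
        self.G = None

    def reset(self, shape):
        # Clear the accumulated squared gradients for a new particle configuration.
        self.G = np.zeros(shape)

    def step(self, phi):
        if self.G is None:
            self.reset(phi.shape)
        self.G += phi ** 2                                    # G += gradient^2
        return self.lr * phi / (np.sqrt(self.G) + self.eps)   # lr * gradient / (sqrt(G) + eps)

opt = AdagradSketch()
phi = np.ones((50, 2))                                # constant SVGD direction for 50 particles
steps = [np.abs(opt.step(phi)).mean() for _ in range(1000)]
print(steps[0], steps[-1])   # the effective step shrinks as G accumulates (~0.01 -> ~0.0003)
```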
Methods
| Name | Description |
|---|---|
| reset | Reset optimizer state for given particle shape. |
| step | Compute Adagrad update. |
reset
phasic.svgd.Adagrad.reset(shape)

Reset optimizer state for given particle shape.
step
phasic.svgd.Adagrad.step(phi, particles=None)

Compute Adagrad update.
Parameters
phi : array (n_particles, theta_dim)
SVGD gradient direction.
particles : array (n_particles, theta_dim) = None
Current particle positions. Not used by Adagrad.
Returns
update : array (n_particles, theta_dim)
Scaled update to add to particles.
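As a rough usage sketch (assuming shape is the (n_particles, theta_dim) tuple and that calling the optimizer outside an SVGD run behaves as documented), reset and step can be driven directly:

```python
import numpy as np
from phasic import Adagrad

optimizer = Adagrad(learning_rate=0.01)
optimizer.reset((100, 2))                # assumed: shape is the (n_particles, theta_dim) tuple

particles = np.random.randn(100, 2)      # toy particle positions
phi = np.random.randn(100, 2)            # stand-in for the SVGD direction
update = optimizer.step(phi, particles)  # particles are accepted but ignored by Adagrad
particles += update                      # add the scaled update to the particles
```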