Adagrad

phasic.svgd.Adagrad(learning_rate=0.01, epsilon=1e-08)

Adagrad optimizer for SVGD.

Adagrad adapts the learning rate for each parameter based on the accumulated sum of squared gradients: parameters whose gradients have been large receive smaller effective learning rates, while parameters whose gradients have been small receive larger ones.
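
For intuition, a rough numerical illustration in plain NumPy (not the phasic implementation): with the same raw gradient, a parameter whose squared gradients have accumulated to a large value receives a much smaller effective step than one with a small accumulator.

>>> import numpy as np
>>> lr, eps = 0.01, 1e-8
>>> G = np.array([100.0, 0.01])              # accumulated squared gradients for two parameters
>>> grad = np.array([1.0, 1.0])              # identical raw gradient for both
>>> step = lr * grad / (np.sqrt(G) + eps)    # ~[0.001, 0.1]: large accumulator, small step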

Parameters

learning_rate : float or StepSizeSchedule = 0.01

Base learning rate. Can be a schedule for learning rate decay.

epsilon : float = 1e-8

Small constant for numerical stability.

Attributes

G : array or None

Accumulated sum of squared gradients, shape (n_particles, theta_dim)

Examples

>>> from phasic import SVGD, Adagrad
>>>
>>> optimizer = Adagrad(learning_rate=0.01)
>>> svgd = SVGD(
...     model=model,
...     observed_data=observations,
...     theta_dim=2,
...     optimizer=optimizer
... )

References

Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. JMLR 12:2121-2159.

Notes

Update rule: G += gradient²; params += lr * gradient / (√G + ε)
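
A minimal sketch of this rule in plain NumPy (illustrative only, not the phasic implementation; phi stands for the SVGD gradient direction, and the shapes follow the G attribute above):

>>> import numpy as np
>>> def adagrad_update(G, phi, lr=0.01, eps=1e-8):
...     G = G + phi ** 2                        # accumulate squared gradients
...     update = lr * phi / (np.sqrt(G) + eps)  # per-parameter scaled step
...     return G, update
>>>
>>> G = np.zeros((50, 2))                       # (n_particles, theta_dim)
>>> phi = np.random.randn(50, 2)
>>> G, update = adagrad_update(G, phi)
>>> # particles would then be updated as: particles += update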

Warning: The effective learning rate decays over time because G only accumulates. For long runs, consider RMSprop or Adam, which keep the effective learning rate bounded.

Methods

Name    Description
reset   Reset optimizer state for given particle shape.
step    Compute Adagrad update.

reset

phasic.svgd.Adagrad.reset(shape)

Reset optimizer state for given particle shape.

step

phasic.svgd.Adagrad.step(phi, particles=None)

Compute Adagrad update.

Parameters

phi : array(n_particles, theta_dim)

SVGD gradient direction.

particles : array(n_particles, theta_dim) = None

Current particle positions. Not used by Adagrad.

Returns

update : array(n_particles, theta_dim)

Scaled update to add to particles.
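
Putting reset and step together, a manual update loop might look like the following sketch. It assumes particles is an array of shape (n_particles, theta_dim); compute_svgd_direction and n_iterations are hypothetical stand-ins for whatever produces the SVGD direction phi (normally handled internally by SVGD).

>>> optimizer = Adagrad(learning_rate=0.01)
>>> optimizer.reset(particles.shape)                   # allocate state for this particle shape
>>> for _ in range(n_iterations):
...     phi = compute_svgd_direction(particles)        # hypothetical helper producing phi
...     particles = particles + optimizer.step(phi)    # step returns the scaled update to add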