from phasic import ( Graph, with_ipv, GaussPrior, HalfCauchyPrior, ExpStepSize, ExpRegularization, clear_caches, Adam, Adagrad, RMSprop, SGDMomentum, SparseObservations, dense_to_sparse, is_sparse_observations,) # ALWAYS import phasic first to set jax backend correctlyimport numpy as npimport jax.numpy as jnpimport pandas as pdfrom typing import Optionalimport matplotlib.pyplot as pltfrom matplotlib.colors import LogNormimport seaborn as snsfrom tqdm.auto import tqdmfrom vscodenb import set_vscode_themenp.random.seed(42)set_vscode_theme()sns.set_palette('tab10')clear_caches()
Removed 14 file(s), preserved directory structure
Rewards
We can fit to reward transformed observations, like the total branch length of the coalescent. In the example below, we sample total branch lengths from the model and pass them to svgd along with the appropriate rewards.
clear_caches(verbose=True)
nr_samples =4@with_ipv([nr_samples]+[0]*(nr_samples-1))def coalescent_1param(state): transitions = []for i inrange(state.size):for j inrange(i, state.size): same =int(i == j)if same and state[i] <2:continueifnot same and (state[i] <1or state[j] <1):continue new = state.copy() new[i] -=1 new[j] -=1 new[i+j+1] +=1 transitions.append([new, [state[i]*(state[j]-same)/(1+same)]])return transitionstrue_theta = [7]graph = Graph(coalescent_1param) # should check using the graph hash if a trace of the graph is cached or available onlinegraph.update_weights(true_theta)graph.plot()
Parameter Fixed MAP Mean SD HPD 95% lo HPD 95% hi
0 No 6.9720 7.0378 0.2717 6.7002 7.3255
Particles: 24, Iterations: 100
svgd.get_results()["theta_mean"].item()
7.037833690911762
svgd.map_estimate_from_particles()
([6.97195685203109], -1530.0278012595795)
Multi-feature observations
Observations can be vectors of features, in which case you need to supply a set or rewards for each feature. The example below is the the same as above, except that we pass observations as single-value vectors.
clear_caches(verbose=True)
Removed 1 file(s), preserved directory structure
# same as abovegraph = Graph(coalescent_1param)graph.update_weights(true_theta)nr_observations =10000rewards = graph.states().Trewards = np.sum(rewards, axis=0)observed_data = graph.sample(nr_observations, rewards=rewards)
# make reward vector the only row of a 2d arrayrewards = np.reshape(rewards, (-1, len(rewards))) # reshape data into a single-column 2d arrayobserved_data = np.reshape(observed_data, (len(observed_data), -1))
With more than one feature, SVGD computes the likelihood for each feature using the the provided rewards. In this example, sample 1’ton, 2’ton, and 3’ton branch lengths are first, second and third observation features. The reward matrix passed to svgd must have as many rows as your data has features. Three rows in the reward matrix:
Data must be passed to svgd as a SparseData object. You can create it like this:
n = nr_observationsidxs = np.arange(rewards.shape[0])feature_data = np.concatenate([graph.sample(n, rewards=rewards[i]) for i in idxs])features = np.concatenate([[i]*n for i in idxs])feature_slices = [(i*n, (i+1)*n) for i in idxs]sparse_observations = SparseObservations( values=observed_data, features=features, n_features=3, slices=feature_slices)sparse_observations
It can also be created from a two-dimensional array where each row is an observation with all but a single non-nan value:
n = nr_observationsobserved_data = np.zeros((n * (rewards.shape[0]), rewards.shape[0]), dtype=float)observed_data[:] = np.nanfor i inrange(rewards.shape[0]): observed_data[(i*n):((i+1)*n), i] = graph.sample(n, rewards=rewards[i])np.random.shuffle(observed_data)observed_data[:nr_observations] # just the first nr_observations rowsobserved_data