from phasic import Graph, with_ipv  # ALWAYS import phasic first to set jax backend correctly
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
import seaborn as sns
from vscodenb import set_vscode_theme

np.random.seed(42)
set_vscode_theme()
sns.set_palette('tab10')
Construct a graph:
nr_samples = 4

@with_ipv([nr_samples] + [0]*(nr_samples-1))
def coalescent(state):
    transitions = []
    for i in range(state.size):
        for j in range(i, state.size):
            same = int(i == j)
            if same and state[i] < 2:
                continue
            if not same and (state[i] < 1 or state[j] < 1):
                continue
            new = state.copy()
            new[i] -= 1
            new[j] -= 1
            new[i+j+1] += 1
            transitions.append((new, state[i]*(state[j]-same)/(1+same)))
    return transitions
graph = Graph(coalescent)
graph.plot()
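As a quick sanity check on the transition rule, here is a standalone copy of the same function (without the `with_ipv` decorator, so it can run outside phasic) applied to the initial state. From four singletons the only possible event is two singletons coalescing into a doubleton, at rate 4·3/2 = 6:

```python
import numpy as np

# Standalone copy of the coalescent transition rule, for checking rates by hand.
def coalescent_transitions(state):
    transitions = []
    for i in range(state.size):
        for j in range(i, state.size):
            same = int(i == j)
            if same and state[i] < 2:       # need two i-ton lineages to pair
                continue
            if not same and (state[i] < 1 or state[j] < 1):
                continue
            new = state.copy()
            new[i] -= 1
            new[j] -= 1
            new[i + j + 1] += 1             # merged lineage subtends i+j+1 samples
            transitions.append((new, state[i] * (state[j] - same) / (1 + same)))
    return transitions

trans = coalescent_transitions(np.array([4, 0, 0, 0]))
print(trans)  # [(array([2, 1, 0, 0]), 6.0)]
```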
Show states nicely as data frame:
labels = [f"{i+1}'ton" for i in range(nr_samples)]
pd.DataFrame(graph.states(), columns=labels)
   1'ton  2'ton  3'ton  4'ton
0      0      0      0      0
1      4      0      0      0
2      2      1      0      0
3      0      2      0      0
4      1      0      1      0
5      0      0      0      1
This phase-type distribution models the time until all lineages have coalesced. For convenience, we can get its expectation and variance like this:
graph.expectation()
1.4999999999999996
graph.variance()
1.1388888888888893
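These values match classical Kingman-coalescent theory: with k lineages the waiting time is exponential with rate k(k−1)/2, and the phases are independent, so mean and variance are sums over k = 2, …, n. A plain-Python cross-check (not a phasic call):

```python
from fractions import Fraction

# With k lineages the waiting time is Exp(k(k-1)/2); phases are independent.
n = 4
rates = [Fraction(k * (k - 1), 2) for k in range(2, n + 1)]  # 1, 3, 6
mean = sum(1 / r for r in rates)    # 1 + 1/3 + 1/6 = 3/2
var = sum(1 / r**2 for r in rates)  # 1 + 1/9 + 1/36 = 41/36
print(float(mean), float(var))  # 1.5 1.1388888888888888
```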
But if you want, you can get any number of moments (here, the first three):
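The phasic call for this is not reproduced above, but the identity it relies on is the standard phase-type moment formula E[T^k] = k! α(−S)^{−k}𝟙. A plain-numpy sketch using a hand-written lineage-count chain for n = 4 (sub-intensity rates 6, 3, 1; my own construction, not read from the graph):

```python
import numpy as np
from math import factorial

# Hand-written sub-intensity matrix: 4 -> 3 -> 2 lineages -> absorbed,
# with coalescence rates 6, 3, 1 (an assumption of this sketch).
S = np.array([[-6.0, 6.0, 0.0],
              [0.0, -3.0, 3.0],
              [0.0, 0.0, -1.0]])
alpha = np.array([1.0, 0.0, 0.0])
ones = np.ones(3)

# k-th raw moment of a phase-type distribution: E[T^k] = k! * alpha @ (-S)^{-k} @ 1
M = np.linalg.inv(-S)
moments = [factorial(k) * alpha @ np.linalg.matrix_power(M, k) @ ones
           for k in (1, 2, 3)]
print(moments)  # approx [1.5, 3.3889, 10.5833]
```

The first moment reproduces `graph.expectation()`, and the second gives the variance via E[T²] − E[T]² = 122/36 − 2.25 = 41/36.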
Expected time spent in each state after some time:
graph.accumulated_occupancy
<bound method Graph.accumulated_occupancy of <Graph (6 vertices)>>
We can get the CDF and PDF. The distribution methods reuse cached computations and recompute only if the graph changes. Compare the running times for the first and second call to the function:
time = np.arange(0, 4, 0.001)
%%time
cdf = graph.cdf(time)
CPU times: user 213 μs, sys: 22 μs, total: 235 μs
Wall time: 238 μs
%%time
cdf = graph.cdf(time)
CPU times: user 28 μs, sys: 4 μs, total: 32 μs
Wall time: 34.1 μs
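Under the hood a phase-type CDF is F(t) = 1 − α exp(St)𝟙. For this particular graph the distribution is hypoexponential with rates 6, 3, 1, so F also has a closed form we can sketch in plain numpy (a check on the math, not phasic's implementation):

```python
import numpy as np

# Hypoexponential CDF with distinct rates: F(t) = 1 - sum_i w_i exp(-lam_i t),
# where w_i = prod_{j != i} lam_j / (lam_j - lam_i).
rates = np.array([6.0, 3.0, 1.0])  # coalescence rates for 4, 3, 2 lineages

def hypoexp_cdf(t):
    t = np.asarray(t, dtype=float)
    total = np.zeros_like(t)
    for i, lam in enumerate(rates):
        others = np.delete(rates, i)
        w = np.prod(others / (others - lam))
        total = total + w * np.exp(-lam * t)
    return 1.0 - total

time = np.arange(0, 4, 0.001)
cdf = hypoexp_cdf(time)
print(cdf[0], cdf[-1])  # starts at 0 and climbs toward 1
```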
Adding these rewards, the phase-type distribution now represents the total doubleton branch length (the total length of branches with two descendants). The expectation and variance are now:
Passing rewards to the moment functions is much faster than transforming the graph. Here is the SFS (site frequency spectrum) computed this way. The 4'ton is included to illustrate that it is represented but has expectation zero, since it corresponds to the absorbing state at the TMRCA.
labels = [f"{i+1}'ton" for i in range(reward_matrix.shape[0])]
sfs = [graph.expectation(r) for r in reward_matrix]
plt.figure(figsize=(4, 3))
sns.barplot(x=labels, y=sfs, hue=labels);
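These rewarded expectations can be cross-checked against the identity E[reward] = α(−S)^{−1}r: the expected total i-ton branch length comes out as 2/i for i < n and 0 for the 4'ton. A plain-numpy sketch, with the transient states and sub-intensity matrix written out by hand (assumptions of this sketch, not read from the graph):

```python
import numpy as np

# Transient states of the n=4 coalescent (rows: counts of 1-,2-,3-,4-ton lineages).
states = np.array([[4, 0, 0, 0],
                   [2, 1, 0, 0],
                   [0, 2, 0, 0],
                   [1, 0, 1, 0]])
# Matching sub-intensity matrix (rates follow the coalescent transition rule).
S = np.array([[-6.0, 6.0, 0.0, 0.0],
              [0.0, -3.0, 1.0, 2.0],
              [0.0, 0.0, -1.0, 0.0],
              [0.0, 0.0, 0.0, -1.0]])
alpha = np.array([1.0, 0.0, 0.0, 0.0])

# Expected time spent in each transient state: alpha @ inv(-S).
occupancy = alpha @ np.linalg.inv(-S)
# Reward each state by its lineage counts to get expected i-ton branch lengths.
sfs = occupancy @ states
print(sfs)  # approx [2.0, 1.0, 0.6667, 0.0], i.e. 2/i for i < n, 0 for the 4'ton
```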
Computing the (unrewarded) expectation on the reward-transformed graph produces the same result as passing the rewards to expectation, but is much slower:
We can use that to compute the expected number of lineages across time:
times = np.arange(0, 5, 0.05)
expected_lineages_left = [
    np.sum(graph.state_probability(t) * np.sum(graph.states(), axis=1))
    for t in times
]
fig, ax = plt.subplots(1, 1, figsize=(4, 3))
ax.plot(times, expected_lineages_left)
ax.set_xlabel('Time')
ax.set_ylabel("Expected nr lineages");
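The same curve can be reproduced analytically: collapse the chain to the number of surviving lineages (rates 6, 3, 1, plus an absorbing state) and evaluate exp(Qt) by eigendecomposition. A self-contained numpy sketch (the collapsed chain is my hand-written construction, not phasic output):

```python
import numpy as np

# Lineage-count chain for n = 4: states are 4, 3, 2, 1 surviving lineages,
# coalescence rates k(k-1)/2 = 6, 3, 1; one lineage is absorbing.
Q = np.array([[-6.0, 6.0, 0.0, 0.0],
              [0.0, -3.0, 3.0, 0.0],
              [0.0, 0.0, -1.0, 1.0],
              [0.0, 0.0, 0.0, 0.0]])
lineages = np.array([4.0, 3.0, 2.0, 1.0])
p0 = np.array([1.0, 0.0, 0.0, 0.0])

# Q has distinct eigenvalues, so exp(Q t) = V diag(e^{w t}) V^{-1}.
w, V = np.linalg.eig(Q)
Vinv = np.linalg.inv(V)

def expected_lineages(t):
    p_t = (p0 @ V @ np.diag(np.exp(w * t)) @ Vinv).real  # state probabilities at t
    return float(p_t @ lineages)

print(expected_lineages(0.0))   # ~4.0: all four lineages present at time zero
print(expected_lineages(50.0))  # ~1.0: only the MRCA lineage remains
```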
Figure 1: Expected number of lineages across time.
The same way we can also compute the expected number of each kind of lineage across time:
times = np.arange(0, 5, 0.05)
state_probs = np.array([graph.state_probability(t) * graph.states().T for t in times])
labels = [f"{i+1}'ton" for i in range(state_probs.shape[1])]
fig, axes = plt.subplots(1, 2, figsize=(6, 3))
axes[0].stackplot(times, np.sum(state_probs, axis=2).T)
axes[1].plot(times, np.sum(state_probs, axis=2), label=labels)
plt.legend();
Figure 2: Expected number of each kind of lineage across time.
We can also get the accumulated visiting time of a particular state, e.g. the total doubleton branch length before time t = 1.5:
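The phasic call is not shown here, but the quantity is the truncated accumulated reward α S^{−1}(exp(St) − I)r, since the integral of exp(Su) from 0 to t is S^{−1}(exp(St) − I). A plain-numpy sketch using a hand-written sub-intensity matrix for the n = 4 coalescent (an assumption of this sketch); it converges to the total expected doubleton length of 1 as t grows:

```python
import numpy as np

# Hand-written sub-intensity matrix over the transient states
# [4,0,0,0], [2,1,0,0], [0,2,0,0], [1,0,1,0] of the n=4 coalescent.
S = np.array([[-6.0, 6.0, 0.0, 0.0],
              [0.0, -3.0, 1.0, 2.0],
              [0.0, 0.0, -1.0, 0.0],
              [0.0, 0.0, 0.0, -1.0]])
alpha = np.array([1.0, 0.0, 0.0, 0.0])
r = np.array([0.0, 1.0, 2.0, 0.0])  # number of doubleton lineages in each state

# S is diagonalizable here, so exp(S t) = V diag(e^{w t}) V^{-1}.
w, V = np.linalg.eig(S)
Vinv = np.linalg.inv(V)

def accumulated_doubleton(t):
    expSt = (V @ np.diag(np.exp(w * t)) @ Vinv).real
    # Accumulated reward before time t: alpha @ inv(S) @ (exp(S t) - I) @ r.
    return float(alpha @ np.linalg.inv(S) @ (expSt - np.eye(4)) @ r)

print(accumulated_doubleton(1.5))   # doubleton length accumulated before t = 1.5
print(accumulated_doubleton(50.0))  # -> 1.0, the total expected doubleton length
```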