State vector indexing

Download notebook

To make it easier to work with complex state spaces phasic sports StateIndexer utility that lets you create the state space from combinations of state properties so you can map between these and the state vector indices. It also provides a flexible way to map between flat the integer indices of the state vector and structured state properties.

This page presents only the core functionality. See the State Indexer Tutorial for more details.

from phasic import Graph, with_ipv, StateIndexer, Property
import numpy as np
import seaborn as sns
from vscodenb import set_vscode_theme
set_vscode_theme()

Overriding theme from NOTEBOOK_THEME environment variable. <phasic._DeviceListFilter object at 0x163734690>

For the coalescent, the state vector indices be referred to by a combination of property values. This is useful when the vector representation of states becomes large. In the coalescent model, there is only a single property (the number of descendants) that maps directly to the state vector index. We need to create a graph more complicated than the standard coalescent to see its use. Lets make a model for the two-locus ancestral recombination graph where we need to keep track of how many descendants a lineage has at each locus.

nr_samples = 4
indexer = StateIndexer(
    n_descendants=[
        Property('locus1', max_value=nr_samples),
        Property('locus2', max_value=nr_samples)
    ]
)
indexer

StateIndexer(n_descendants=[locus1:0-4, locus2:0-4])

indexer.state_length

indexer.n_descendants

PropertySet('n_descendants', state_length=25, properties=2)

Each index in the state vector maps to a set of property values:

print('index', 'locus1', 'locus2', sep='\t')

# you can iterate over indices of a property set
for i in indexer.n_descendants: 
    props = indexer.n_descendants.index_to_props(i) 
    print(i, props.locus1, props.locus2, sep='\t')

index   locus1  locus2
0   0   0
1   1   0
2   2   0
3   3   0
4   4   0
5   0   1
6   1   1
7   2   1
8   3   1
9   4   1
10  0   2
11  1   2
12  2   2
13  3   2
14  4   2
15  0   3
16  1   3
17  2   3
18  3   3
19  4   3
20  0   4
21  1   4
22  2   4
23  3   4
24  4   4

You can map back from properties to index:

props = indexer.n_descendants.index_to_props(5) 
props.locus1, props.locus2

(0, 1)

indexer.n_descendants.props_to_index(props)

indexer.n_descendants.props_to_index(locus1=props.locus1 + 1, 
                                     locus2=props.locus2)

Two-locus ARG:

# zero state vector with appropriate size
initial = [0] * indexer.state_length

# set initial state with all lineages having one descendant at both loci
initial[indexer.props_to_index(locus1=1, locus2=1)] = nr_samples

@with_ipv(initial)
def two_locus_arg(state, indexer=None, N=None, R=None):

    transitions = []
    if state.sum() <= 1: return transitions

    for i in range(indexer.state_length):
        if state[i] == 0: continue
        props_i = indexer.n_descendants.index_to_props(i)

        for j in range(i, indexer.state_length):
            if state[j] == 0: continue
            props_j = indexer.n_descendants.index_to_props(j)
            
            same = int(i == j)
            if same and state[i] < 2:
                continue
            if not same and (state[i] < 1 or state[j] < 1):
                continue 
            child = state.copy()
            child[i] -= 1
            child[j] -= 1
            locus1 = props_i.locus1 + props_j.locus1
            locus2 = props_i.locus2 + props_j.locus2
            if locus1 <= nr_samples and locus2 <= nr_samples:
                child[indexer.props_to_index(locus1=locus1, locus2=locus2)] += 1
                transitions.append([child, state[i]*(state[j]-same)/(1+same)/N])

        if state[i] > 0 and props_i.locus1 > 0 and props_i.locus2 > 0:
            child = state.copy()
            child[i] -= 1
            child[indexer.props_to_index(locus1=props_i.locus1, locus2=0)] += 1
            child[indexer.props_to_index(locus1=0, locus2=props_i.locus2)] += 1
            transitions.append([child, R])

    return transitions


graph = Graph(two_locus_arg, 
                     N=1, R=1,
                     indexer=indexer) # passing indexer as argument
graph.plot(max_nodes=200)

Overriding theme from NOTEBOOK_THEME environment variable. <phasic._DeviceListFilter object at 0x163734690>

# def fun(i):
#     i in indexer.props_to_index({'locus1': True})

# graph.plot(max_nodes=200, by_index=fun)

graph.expectation()

2.009144546646421

reward_matrix = graph.states().T

If you pass only a subset of the state properties, props_to_index will return all the indices matching that property. This lets you get the get the relevant indices for use as rewards. Say, you want the total branch length of lineages with one descendant at locus 1 and any number of descendants at locus 2:

idx_singleton_loc1 = indexer.n_descendants.props_to_index(locus1=1)
idx_singleton_loc1

array([ 1,  6, 11, 16, 21])

Now we can get those reward vectors and sum them:

reward_matrix = graph.states().T

singleton_rewards_loc1 = reward_matrix[idx_singleton_loc1].sum(axis=0)
singleton_rewards_loc1

array([0, 4, 2, 4, 0, 1, 2, 2, 2, 4, 4, 0, 0, 1, 1, 0, 1, 2, 2, 2, 2, 1,
       2, 2, 2, 2, 4, 4, 4, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 2,
       2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 0, 1, 2, 2, 4, 4, 4, 4, 4, 0, 0, 0,
       1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 0,
       1, 1, 2, 4, 4, 4, 4, 0, 0, 0, 1, 1, 2, 2, 0, 0, 1, 4, 4, 4, 0, 4])

graph.expectation(singleton_rewards_loc1)

2.0

If we repeat the exercise for all numbers of descendants at locus 1, we can get the standard SFS (we we should), except that the expected 4’ton branch length is not zero. This is because there can be a 4’ton branch from the TMRCA of the samples at locus 1 and the TMRCA for the full ARG:

sfs_loc1 = [graph.expectation(reward_matrix[indexer.n_descendants.props_to_index(locus1=i)].sum(axis=0)) for i in range(1, nr_samples+1)]
labels = [f"{i+1}'ton" for i in range(nr_samples)]
sns.barplot(x=labels, y=sfs_loc1, hue=labels) ;