Math and Algorithms
Introduction to Phase-Type Distributions
Phase-type distributions are a powerful class of probability distributions that model the time until absorption in continuous- or discrete-time Markov chains on a finite state space. They have diverse applications, including physics, telecommunications, queuing theory, and, more recently, population genetics.
This page provides an introduction to phase-type distributions, their mathematical formulation, and the correspondence between traditional matrix representations and the graph-based approach implemented in phasic.
What is a Phase-Type Distribution?
A phase-type distribution represents the time until a Markov process reaches an absorbing state. The process moves through various “phases” (transient states) before eventually being absorbed.
Continuous Phase-Type Distributions
In a continuous phase-type distribution, the underlying process is a continuous-time Markov jump process.
Example: Consider a system with 3 transient states and 1 absorbing state. The system starts in state 1, transitions between states at exponential rates, and eventually reaches the absorbing state. The time until absorption follows a phase-type distribution.
Key characteristics:
- Transient states: temporary states the process can visit
- Absorbing state: terminal state from which the process cannot escape
- Transition rates: exponential rates governing state transitions
- Time until absorption: the random variable of interest
Discrete Phase-Type Distributions
A discrete phase-type distribution models the number of jumps in a discrete-time Markov chain until reaching the absorbing state.
Example: Number of steps in a random walk before hitting a boundary.
Mathematical Formulation
Continuous Phase-Type Distributions
Let \tau be a phase-type distributed random variable with p transient states. We write \tau \sim PH_p(\boldsymbol{\alpha}, \mathbf{S}) where:
- \boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_p): Initial probability vector
- \alpha_i is the probability of starting in state i
- Potential defect: \alpha_0 = 1 - \sum_{i=1}^p \alpha_i (probability of starting in the absorbing state)
- \mathbf{S}: Sub-intensity matrix (rate matrix excluding absorbing states)
- Entry s_{ij} is the rate of transition from state i to state j
- Diagonal entries are negative: s_{ii} = -\lambda_i, where \lambda_i is the total exit rate from state i (the off-diagonal rates plus the rate of a direct transition to the absorbing state); hence s_{ii} \leq -\sum_{j \neq i} s_{ij}, with equality only when state i cannot be absorbed directly
- \mathbf{U} = (-\mathbf{S})^{-1}: Green matrix
- Entry u_{ij} is the expected total time spent in state j when starting in state i
Cumulative Distribution Function (CDF)
The probability that absorption occurs by time t:
F(t) = 1 - \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{e}, \quad t \geq 0
where e^{\mathbf{S}t} is the matrix exponential and \mathbf{e} is a vector of ones.
Probability Density Function (PDF)
The density function is:
f(t) = \boldsymbol{\alpha} e^{\mathbf{S}t} (-\mathbf{S}) \mathbf{e}, \quad t \geq 0
Moments
The expectation (first moment):
\mathbb{E}[\tau] = \boldsymbol{\alpha} \mathbf{U} \mathbf{e}
Higher-order moments:
\mathbb{E}[\tau^k] = k! \boldsymbol{\alpha} \mathbf{U}^k \mathbf{e}
The variance:
\text{Var}(\tau) = 2\boldsymbol{\alpha} \mathbf{U}^2 \mathbf{e} - (\boldsymbol{\alpha} \mathbf{U} \mathbf{e})^2
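These formulas are easy to check numerically. The sketch below (plain numpy/scipy, not the phasic API) evaluates the CDF, PDF, and the first two moments for the 3-state sub-intensity matrix used later on this page, with an assumed initial distribution that starts in the first state:

```python
import numpy as np
from scipy.linalg import expm

# A 3-state continuous phase-type distribution; start in state 1 (assumed).
alpha = np.array([1.0, 0.0, 0.0])
S = np.array([[-5.0, 1.0, 4.0],
              [1.0, -8.0, 2.0],
              [0.0, 1.0, -3.0]])
e = np.ones(3)

U = np.linalg.inv(-S)                 # Green matrix U = (-S)^{-1}
mean = alpha @ U @ e                  # E[tau] = alpha U e
second = 2.0 * alpha @ U @ U @ e      # E[tau^2] = 2! alpha U^2 e
var = second - mean**2                # Var(tau)

def cdf(t):
    # F(t) = 1 - alpha exp(S t) e
    return 1.0 - alpha @ expm(S * t) @ e

def pdf(t):
    # f(t) = alpha exp(S t) (-S) e
    return alpha @ expm(S * t) @ (-S) @ e
```

A useful self-check is that the mean equals the integral of the survival function 1 - F(t), which holds for any non-negative random variable.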
Discrete Phase-Type Distributions
For discrete phase-type distributions with sub-transition matrix \mathbf{T} and initial distribution \boldsymbol{\pi}:
Green Matrix
\mathbf{U} = -(\mathbf{T} - \mathbf{I})^{-1}
CDF
The probability of absorption at or before jump t:
F(t) = 1 - \boldsymbol{\pi} \mathbf{T}^t \mathbf{e}
Moments
The expectation:
\mathbb{E}[\tau] = \boldsymbol{\pi} \mathbf{U} \mathbf{e}
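A minimal numpy sketch of the discrete formulas, using a hypothetical 2-state chain in which each transient state is left with probability 1/2 per jump:

```python
import numpy as np

# Discrete phase-type example (toy values): from state 1, stay with
# probability 0.5 or move to state 2; from state 2, stay with
# probability 0.5 or be absorbed.
pi = np.array([1.0, 0.0])
T = np.array([[0.5, 0.5],
              [0.0, 0.5]])
I = np.eye(2)
e = np.ones(2)

U = np.linalg.inv(I - T)      # Green matrix U = -(T - I)^{-1} = (I - T)^{-1}
mean = pi @ U @ e             # E[tau] = pi U e

def cdf(t):
    # F(t) = 1 - pi T^t e : probability of absorption at or before jump t
    return 1.0 - pi @ np.linalg.matrix_power(T, t) @ e
```

Here absorption needs two independent Geometric(1/2) phases, so the mean number of jumps is 2 + 2 = 4, which also equals the tail sum of 1 - F(t).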
Reward-Transformed Distributions
Phase-type distributions can be reward-transformed by assigning rewards r_i > 0 to each state i. Instead of measuring time until absorption, the distribution represents accumulated reward until absorption.
Mathematical Interpretation
Intuition: Rewards scale the “waiting time” in each state. If state i has reward r_i = 2, then spending time t in that state accumulates reward 2t. This is like earning interest at different rates in different states.
Given rewards \mathbf{r} = (r_1, \ldots, r_p), the reward-transformed variable is:
Y = \int_0^\tau r(X_t) dt
where \{X_t\} is the underlying Markov process and \tau is the time until absorption.
Example: Consider a system where:
- State 1 has reward r_1 = 1 (baseline accumulation rate)
- State 2 has reward r_2 = 5 (accumulates 5× faster)
- The process spends time t_1 in state 1 and t_2 in state 2
Then the total accumulated reward is Y = 1 \cdot t_1 + 5 \cdot t_2.
The k-th moment of the reward-transformed distribution:
\mathbb{E}[Y^k] = k! \boldsymbol{\alpha} (\mathbf{U} \boldsymbol{\Delta}(\mathbf{r}))^k \mathbf{e}
where \boldsymbol{\Delta}(\mathbf{r}) is a diagonal matrix with rewards on the diagonal.
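A small numpy sketch of this moment formula (not the phasic API; the rewards are chosen arbitrarily for illustration). With unit rewards it must reduce to the ordinary moments:

```python
import math
import numpy as np

alpha = np.array([1.0, 0.0, 0.0])
S = np.array([[-5.0, 1.0, 4.0],
              [1.0, -8.0, 2.0],
              [0.0, 1.0, -3.0]])
e = np.ones(3)
U = np.linalg.inv(-S)   # Green matrix

def moment(r, k):
    # E[Y^k] = k! alpha (U diag(r))^k e
    M = np.linalg.matrix_power(U @ np.diag(r), k)
    return math.factorial(k) * alpha @ M @ e

r = np.array([1.0, 5.0, 2.0])        # hypothetical rewards
mean_Y = moment(r, 1)
var_Y = moment(r, 2) - mean_Y**2
```

Since all rewards here are at least 1, the accumulated reward stochastically dominates the absorption time, so mean_Y exceeds the untransformed mean.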
Graph Interpretation
In the graph representation, rewards are vertex scalars that modify what we measure:
- Without rewards: Measure time until absorption (standard phase-type distribution)
- With rewards: Each vertex v has an associated reward r_v (a positive scalar)
- Accumulation: While the process resides in vertex v, reward accumulates at rate r_v per unit time
- Edge weights unchanged: Transition rates (edge weights) remain the same; only the “value” of time in each state changes
Think of rewards as time multipliers at each vertex: spending time \Delta t in vertex v contributes r_v \cdot \Delta t to the total accumulated reward.
Key insight: Rewards don’t change the transition structure of the graph—they only change what we’re measuring. The same stochastic process gives different distributions depending on the rewards.
Practical uses:
- Population genetics: counting mutations (reward = mutation rate per lineage)
- Queuing theory: accumulated cost (reward = cost per unit time in each state)
- Physics: energy or distance (reward = rate of energy/distance change in each state)
Example: In a population genetics model:
- Each state represents a configuration of genealogical lineages
- A state with k lineages has reward r_k = k (each of the k lineages accumulates branch length per unit time)
- The reward-transformed distribution represents total branch length, not just time
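As a numeric check of this idea, the sketch below (plain numpy, not the phasic API) builds the Kingman coalescent chain for a sample of n lineages, gives the state with k lineages reward k so that the transformed variable is the total branch length, and recovers the classical mean 2\sum_{i=1}^{n-1} 1/i:

```python
from math import comb
import numpy as np

def total_branch_length_mean(n):
    # Transient states are k = n, n-1, ..., 2 lineages; absorption at k = 1.
    ks = list(range(n, 1, -1))
    p = len(ks)
    S = np.zeros((p, p))
    for idx, k in enumerate(ks):
        rate = comb(k, 2)           # coalescence rate with k lineages
        S[idx, idx] = -rate
        if idx + 1 < p:
            S[idx, idx + 1] = rate  # k -> k-1 lineages
    alpha = np.zeros(p)
    alpha[0] = 1.0                  # start with n lineages
    r = np.array(ks, dtype=float)   # reward = number of lineages
    U = np.linalg.inv(-S)           # Green matrix
    return alpha @ U @ np.diag(r) @ np.ones(p)
```

For n = 3 this gives 3·(1/3) + 2·1 = 3, matching 2(1 + 1/2).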
Multivariate Phase-Type Distributions
Multivariate phase-type distributions assign a vector of rewards to each state, resulting in a vector-valued outcome:
\mathbf{Y} \sim MPH^*(\boldsymbol{\alpha}, \mathbf{S}, \mathbf{R})
where \mathbf{R} is a matrix of rewards (each row corresponds to a state, each column to a reward dimension).
Applications: Modeling multiple correlated quantities accumulated along the same stochastic process.
Marginal moments: Computed independently for each dimension.
Joint moments: The covariance between dimensions i and j:
\text{Cov}(Y_i, Y_j) = \boldsymbol{\alpha}\mathbf{U}\boldsymbol{\Delta}(\mathbf{R}_{\cdot i})\mathbf{U}\boldsymbol{\Delta}(\mathbf{R}_{\cdot j})\mathbf{e} + \boldsymbol{\alpha}\mathbf{U}\boldsymbol{\Delta}(\mathbf{R}_{\cdot j})\mathbf{U}\boldsymbol{\Delta}(\mathbf{R}_{\cdot i})\mathbf{e} - \boldsymbol{\alpha}\mathbf{U}\boldsymbol{\Delta}(\mathbf{R}_{\cdot i})\mathbf{e} \cdot \boldsymbol{\alpha}\mathbf{U}\boldsymbol{\Delta}(\mathbf{R}_{\cdot j})\mathbf{e}

where \mathbf{R}_{\cdot i} is column i of \mathbf{R} and \boldsymbol{\Delta}(\cdot) is the corresponding diagonal matrix.
Graph-Based Representation
Traditional matrix-based formulations become computationally infeasible for systems with thousands of states. However, many real-world systems have sparse state spaces where each state connects to only a few others. This sparseness makes a graph-based representation more natural and efficient.
Graph Representation
A phase-type distribution can be represented as a weighted directed graph G = (V, E) where:
- Vertices (V): Represent states in the Markov chain
- Special starting vertex S: initial state
- Transient vertices: states the process can visit
- Absorbing vertices: terminal states (no outgoing edges)
- Edges (E): Represent transitions between states
- An edge (v \to z) connects vertex v to vertex z
- Each edge has a weight w(v \to z) representing the transition rate
- Edge weights:
- For continuous distributions: transition rates (positive real numbers)
- For discrete distributions: transition probabilities (between 0 and 1)
Correspondence: Matrices ↔︎ Graphs
The graph representation directly corresponds to the matrix formulation:
| Matrix Concept | Graph Concept |
|---|---|
| State i | Vertex v |
| Initial probability \alpha_i | Weight w(S \to v_i) from starting vertex |
| Sub-intensity entry s_{ij} | Edge weight w(v_i \to v_j) |
| Absorbing state | Vertex with no outgoing edges |
| Total exit rate from state i | \lambda_{v_i} = \sum_{z \in \text{children}(v_i)} w(v_i \to z) |
Example: The sub-intensity matrix
\mathbf{S} = \begin{pmatrix} -5 & 1 & 4 \\ 1 & -8 & 2 \\ 0 & 1 & -3 \end{pmatrix}
corresponds to a graph with:
- Vertex A with edges: (A → B, weight=1), (A → C, weight=4)
- Vertex B with edges: (B → A, weight=1), (B → C, weight=2), (B → absorbing, weight=5)
- Vertex C with edges: (C → B, weight=1), (C → absorbing, weight=2)
- Total exit rates: \lambda_A = 5, \lambda_B = 8, \lambda_C = 3

The exit rates are the negated diagonal entries of \mathbf{S}; whatever part of \lambda_i is not accounted for by off-diagonal entries is the rate of the edge to the absorbing vertex (rows B and C here; row A sums to zero, so A has no such edge).
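The correspondence can be made mechanical. This sketch (plain Python, not the phasic API; vertex names assumed) extracts the sparse edge list from the sub-intensity matrix above, recovering the implicit edges to the absorbing vertex from the row sums:

```python
import numpy as np

S = np.array([[-5.0, 1.0, 4.0],
              [1.0, -8.0, 2.0],
              [0.0, 1.0, -3.0]])
names = ["A", "B", "C"]

edges = []
for i in range(3):
    exit_rate = -S[i, i]                       # lambda_i
    off_diag = sum(S[i, j] for j in range(3) if j != i)
    for j in range(3):
        if j != i and S[i, j] > 0:
            edges.append((names[i], names[j], S[i, j]))
    to_absorbing = exit_rate - off_diag        # remainder of the row sum
    if to_absorbing > 0:
        edges.append((names[i], "absorbing", to_absorbing))
```

Only 7 edges are stored here, versus 9 matrix entries; the gap grows quickly for large sparse models.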
Advantages of Graph Representation
- Memory efficiency: store only existing transitions (not the full p \times p matrix)
  - For sparse systems: O(|E|) vs O(p^2) memory
- Computational efficiency: operations on edges only
  - Many algorithms: O(|V|^2) vs O(p^3) for matrix operations
- Iterative construction: build the state space incrementally
  - Add vertices and edges as needed
  - Natural for complex phenomena
- Intuitive modeling: specify transitions from each state independently
  - Markov property: the next state depends only on the current state
  - Easy to encode complex models
- Numerical stability: avoid matrix inversions and exponentials
Example: Rabbit Island Model
Consider rabbits jumping between two islands at rate 1. Islands flood at rates 2 and 4, drowning all rabbits on that island. The absorbing state is when all rabbits have drowned.
State representation: Vector (n_1, n_2) where n_i is the number of rabbits on island i.
Graph construction:
- Start with state (2, 0): 2 rabbits on island 1
- From state (i, j):
  - If i > 0: transition to (i-1, j+1) at rate i (rabbit jumps left to right)
  - If j > 0: transition to (i+1, j-1) at rate j (rabbit jumps right to left)
  - If i > 0: transition to (0, j) at rate 2 (island 1 floods)
  - If j > 0: transition to (i, 0) at rate 4 (island 2 floods)
- Absorbing state: (0, 0)
This naturally sparse structure makes graphs ideal for representation.
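As an illustration of iterative construction, the sketch below (plain Python/numpy with a breadth-first search, not phasic's API) builds this state space directly from the transition rules and computes the expected time until all rabbits have drowned:

```python
from collections import deque
import numpy as np

ABSORB = (0, 0)

def transitions(state):
    # Transition rules of the rabbit-island model.
    i, j = state
    out = []
    if i > 0:
        out.append(((i - 1, j + 1), float(i)))  # one of i rabbits jumps 1 -> 2
        out.append(((0, j), 2.0))               # island 1 floods
    if j > 0:
        out.append(((i + 1, j - 1), float(j)))  # one of j rabbits jumps 2 -> 1
        out.append(((i, 0), 4.0))               # island 2 floods
    return out

# Iterative state-space construction from the start state (2, 0).
start = (2, 0)
index = {start: 0}
edges = {}
queue = deque([start])
while queue:
    s = queue.popleft()
    if s == ABSORB:
        continue                                # absorbing: no out-going edges
    for t, rate in transitions(s):
        if t not in index:
            index[t] = len(index)
            queue.append(t)
        edges[(s, t)] = edges.get((s, t), 0.0) + rate

# Assemble the sub-intensity matrix over transient states and compute E[tau].
transient = [s for s in index if s != ABSORB]
pos = {s: k for k, s in enumerate(transient)}
p = len(transient)
S = np.zeros((p, p))
for (s, t), w in edges.items():
    S[pos[s], pos[s]] -= w
    if t != ABSORB:
        S[pos[s], pos[t]] += w
expected_time = np.linalg.inv(-S)[pos[start]] @ np.ones(p)
```

The BFS discovers five transient states ((2,0), (1,1), (0,2), (1,0), (0,1)) plus the absorbing state; solving the resulting 5×5 system by hand gives an expected absorption time of 153/301.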
Computational Algorithms
phasic implements efficient graph-based algorithms for:
- State space construction: Iterative building of the graph
- Reward transformation: Modifying distributions by assigning rewards
- Moment computation: Computing expectations, variances, higher moments
- Distribution functions: Computing PMF/PDF and CDF
- Multivariate distributions: Joint and marginal moments
Key Algorithmic Innovations
From Røikjer, Hobolth, and Munch (2022):
- Acyclic graph transformation: Convert cyclic graphs to acyclic form for recursive computation
- Cached operations: Compute moments in O(|V|^2) time after initial O(|V|^3) setup
- Discrete approximation: Approximate continuous distributions with arbitrary precision
- Parallel computation: Natural parallelization over graph structure
Performance: Orders of magnitude faster than matrix-based methods for sparse systems.
Practical Example
Let’s compute properties of an Erlang distribution (a sum of independent exponential random variables with a common rate):
from phasic import Graph
# Build Erlang(3) distribution: sum of 3 exponentials with rate 1
g = Graph(1)
start = g.starting_vertex()
# Create chain: S -> v1 -> v2 -> v3 -> absorbing
v1 = g.find_or_create_vertex([1])
v2 = g.find_or_create_vertex([2])
v3 = g.find_or_create_vertex([3])
start.add_edge(v1, 1.0) # Start in state 1 with probability 1
v1.add_edge(v2, 1.0) # Transition rate 1
v2.add_edge(v3, 1.0) # Transition rate 1
v3.add_edge(g.find_or_create_vertex([4]), 1.0) # To absorbing state
# Compute properties
print(f"Expectation: {g.expectation()}") # Should be 3.0
print(f"Variance: {g.variance()}") # Should be 3.0
print(f"PDF at t=2.0: {g.pdf(2.0)}")
print(f"CDF at t=2.0: {g.cdf(2.0)}")
When to Use Phase-Type Distributions
Phase-type distributions are particularly useful when:
- Modeling absorption times: Time until system reaches terminal state
- Sparse state spaces: Many states but few transitions per state
- Complex phenomena: Iterative construction from simple rules
- Multiple quantities: Multivariate distributions with correlated outcomes
- Large state spaces: Thousands to millions of states
Applications:
- Queuing theory (service times)
- Reliability theory (time to failure)
- Population genetics (coalescent times)
- Telecommunications (packet delays)
- Survival analysis (time to event)
Further Reading
For detailed algorithms and proofs, see:
Røikjer, T., Hobolth, A., & Munch, K. (2022). Graph-based algorithms for phase-type distributions. Statistics and Computing, 32:103. https://doi.org/10.1007/s11222-022-10174-3
The paper provides:
- Complete algorithmic details
- Complexity analysis
- Proofs of correctness
- Empirical performance comparisons
- Applications to population genetics
References
- Neuts, M.F. (1981). Matrix-Geometric Solutions in Stochastic Models. Johns Hopkins University Press.
- Bladt, M., & Nielsen, B.F. (2017). Matrix-Exponential Distributions in Applied Probability. Springer.
- Hobolth, A., Siri-Jegousse, A., & Bladt, M. (2019). Phase-type distributions in population genetics. Theoretical Population Biology, 127:16-32.
- Duff, I.S., Erisman, A.M., & Reid, J.K. (2017). Direct Methods for Sparse Matrices. Oxford University Press.
- Bobbio, A., Horváth, A., & Telek, M. (2004). The scale factor: a new degree of freedom in phase-type approximation. Performance Evaluation, 56(1-4):121-144.
This documentation is based on the graph-based algorithms implemented in phasic and described in Røikjer et al. (2022).
Statements and Declarations:
The authors received no support from any organization for the submitted work and have no relevant financial or non-financial interests to disclose.
Abstract
Phase-type distributions model the time until absorption in continuous or discrete-time Markov chains on finite state spaces. Multivariate phase-type distributions model rewards accumulated at visited states, enabling diverse applications. Traditional matrix-based computations become infeasible even for moderately sized state spaces. Since phase-type state spaces are often large but sparse, with few transitions per state, graph-based representations are more natural and efficient than matrix-based ones. We develop graph-based algorithms for analyzing phase-type distributions, including state space construction, reward transformation, moments calculation, marginal distribution functions of multivariate distributions, and state probability vectors for both time-homogeneous and time-inhomogeneous Markov chains. We implement these algorithms in phasic, a numerically stable, memory-efficient open source C library with bindings for C and R. We compare phasic to the fastest matrix-based tools for computing probability distributions (via matrix exponentiation) and moments (via matrix inversion). Our graph-based algorithms produce identical numerical results but run orders of magnitude faster. We demonstrate the advantages of phasic using a classic population genetics problem.
Key words: Phase-type distributions, computational statistics, moments, distribution, graph-based algorithms.
Introduction
Phase-type distributions model the time until absorption in continuous-time Markov jump processes or discrete-time Markov chains on finite state spaces (Neuts 1981). They are widely used in physics, telecommunication, queuing theory (Aalen 1995; Faddy and McClean 1999; He 2014; Acal et al. 2019), and recently in population genetics to model coalescence processes (Hobolth, Siri-Jegousse, and Bladt 2019). Their solid theoretical foundation (He 2014; Bladt and Nielsen 2017) enables elegant matrix-based formulations and proofs, but matrix computations become infeasible for systems with thousands of states. For example, in the basic coalescence model (Hobolth, Bladt, and Andersen 2021), state space size grows exponentially with the square root of sample size.
Since most real-world models have sparsely connected states, graph-based representations are more natural and efficient than matrix-based ones. We develop graph-based algorithms for computing moments, probability distributions, state probability vectors, and reward transformations of phase-type distributions. Graph-based approaches offer three key advantages: (1) iterative state space construction simplifies modeling complex phenomena; (2) memory requirements are drastically reduced by avoiding matrix storage; and (3) moments and distributions are computed orders of magnitude faster. Our algorithms operate directly on graph structures and handle both continuous and discrete phase-type distributions. We provide algorithms for iterative state space construction, reward transformation (positive or zero rewards), marginal and joint moments of multivariate distributions, distribution functions, and state probability vectors for time-homogeneous and time-inhomogeneous processes.
We present the theoretical principles and probabilistic interpretation of our algorithms with brief code examples. The algorithms are implemented in C and exposed as the open-source library phasic with bindings for C and R. Install via:
library(devtools)
devtools::install_github("TobiasRoikjer/phasic")
Full documentation is available at the GitHub website and in R man pages.
The paper is organized as follows. Section 2.1 presents the graph representation of continuous phase-type distributions. Sections 2.2-2.3 cover state space construction and reward transformation. Section 2.4 shows how to compute moments recursively on acyclic graphs. Sections 2.5-2.6 describe converting cyclic graphs to acyclic form. Section 2.7 shows caching techniques for quadratic-time computation of higher-order moments. Section 2.8 covers multivariate distributions and joint moments. Sections 2.9-2.11 extend the framework to discrete phase-type distributions and their distribution functions, including arbitrary-precision approximations for continuous distributions. Section 2.12 addresses time-inhomogeneous models. Section 3 compares empirical running times to state-of-the-art matrix-based approaches. Section 4 demonstrates the method on an isolation-with-migration model from population genetics.
Graph-based algorithms for phase-type distributions
Graph-representation of a phase-type distribution
Following (Bladt and Nielsen 2017), we describe a continuous phase-type distribution with p transient states, initial distribution \alpha_i (i=1,\ldots,p), and potential defect \alpha_0=1-\sum_{i=1}^p{\alpha_i}. Let \bm{\alpha}=(\alpha_1,\ldots,\alpha_p) be the initial distribution vector. The sub-intensity matrix (rate matrix excluding absorbing states) is \bm{S} and the Green matrix is \bm{U}=(-\bm{S})^{-1}. Entry (i,j) in the Green matrix, \bm{U}_{ij}, is the expected total time spent in state j when starting in state i. The cumulative distribution function is F_\tau(t) = 1 - \bm{\alpha}e^{\bm{S}t}\bm{e}, \;\; t\geq 0, where e^{\bm{S}t} is the matrix exponential and \bm{e} is the vector of ones.
We represent phase-type distributions as weighted directed graphs, following (Frydenberg 1990; Younes and Simmons 2004). A directed graph G=(V,E) has vertices V and edges E (ordered pairs of vertices). An edge from \texttt{v}\in V to \texttt{z}\in V is denoted (\texttt{v}\rightarrow \texttt{z})\in E. The edges (\texttt{v}\rightarrow \texttt{z}) for all \texttt{z}\in V are the out-going edges of v, and edges (\texttt{u}\rightarrow \texttt{v}) for all \texttt{u}\in V are the in-going edges of v. For any edge (\texttt{u}\rightarrow \texttt{v}), \texttt{u} is a parent of \texttt{v} and \texttt{v} is a child of \texttt{u}. We denote the set of children as \texttt{children}(\texttt{v}) and the set of parents as \texttt{parents}(\texttt{v}). In cyclic graphs, a parent may also be a child of the same vertex.
A weighted directed graph assigns each edge a real-valued weight W\colon E\rightarrow \mathbb{R} representing the transition rate. The weight of edge (\texttt{v}\rightarrow \texttt{z})\in E is w(\texttt{v}\rightarrow \texttt{z})\in \mathbb{R}, and the sum of out-going weights for vertex \texttt{v} is \lambda_\texttt{v}=\sum_{\texttt{z}\in \texttt{children}(\texttt{v})}{w(\texttt{v} \rightarrow \texttt{z})}. For a valid phase-type distribution, all weights must be strictly positive (exponential rates), edges only connect different vertices (no self-loops), at least one vertex has no out-going edges (absorbing state), and all vertices have a path to an absorbing vertex with positive probability.
We designate a single starting vertex S with edge weights representing the initial distribution: w(\texttt{S}\rightarrow \texttt{v}_i)=\alpha_i and \lambda_\texttt{S}=1. A defect is modeled by edges from S to absorbing vertices. The starting vertex has zero reward (Section 2.3), so it doesn’t contribute to absorption time.
The graph is mutable: we can add vertices, add/remove edges, and change weights. We use prime notation for transformations, e.g., G\gets G' represents updating G.
State space construction
We construct state spaces iteratively as graphs. This greatly simplifies specifying large, complex state spaces from simple transition rules. The construction algorithm visits each vertex once, independently of other vertices, exploiting the Markov property: out-going transitions and rates depend only on the current state. Algorithm [alg:generic-state-space-algorithm] shows the procedure. Figure 1 illustrates state space construction with R code.
Our algorithm requires an efficient lookup structure mapping states to vertices. We use an AVL tree in phasic to map states (represented as integer vectors) to vertices. Any state representation with an equivalence relation and ordering (< relation) works.
While algorithms are independent of the underlying data structure, asymptotic complexities assume that merging all children of vertex v to all parents takes quadratic time in the vertex count. This requires constant-time edge addition and weight updates. In phasic, we store edges in ordered linked lists and merge two lists in linear time by element-wise comparison. For O(|V|) parents, we perform O(|V|) merges, each in O(|V|) time, yielding quadratic time overall.
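The linear-time merge of two sorted edge lists can be sketched as follows (a simplified Python illustration; phasic implements this in C on linked lists, and the (target_id, weight) tuple representation is an assumption here). Weights of edges to the same target are summed, as required when merging a child's edges into a parent's:

```python
def merge_edge_lists(a, b):
    # a, b: edge lists sorted by target-vertex id, as (target_id, weight).
    # Returns the merged sorted list; duplicate targets have weights summed.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            out.append((a[i][0], a[i][1] + b[j][1]))
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out
```

Each element of either input is visited once, giving the linear time the complexity argument above relies on.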
For modeling convenience, phasic supports edge parametrization, allowing weight updates after state space construction. This is useful for exploring different parameter values in the same model (see documentation).
To integrate phasic into existing workflows, we provide matrix_as_graph, which produces a graph from a sub-intensity matrix and initial probability vector. For example, computing the expectation of the phase-type distribution in Figure 1:
SIM <- rbind(c(-5, 1, 0, 0, 0),  # rows of the sub-intensity matrix
             c( 1, -8, 1, 4, 2), # (row sums must be non-positive)
             c( 0, 1, -3, 0, 0),
             c( 0, 0, 0, -3, 1),
             c( 0, 0, 0, 1, -5))
IPV <- c(0, 0, 1, 0, 0)
graph <- matrix_as_graph(IPV, SIM)
expectation(graph)
Figure 1 (caption excerpt): find_or_create_vertex only creates vertices for new states. (C) State space after each while-loop iteration in panel B. The index variable enumerates visited states (bold outline), while vertices_length(graph) returns the current vertex count. Construction completes when these are equal.
Let V be the vertices, unvisited a subset of V, and E the edges. Let W be the weight function E\rightarrow\mathbb{R}. Let f be a bijection between states and vertices, and f^{-1} be its inverse. Denote the starting vertex as S. Two user-defined functions specify the model: one returns the states reachable from a given state, and Rate(state_from, state_to) returns the transition rate from state_from to state_to. The construction proceeds as follows:

    V ← {S}
    unvisited ← {S}
    E ← ∅; W ← ∅
    while unvisited ≠ ∅:
        v ← any entry from unvisited
        unvisited ← unvisited \ {v}
        for each state reachable from f^{-1}(v):
            if state has no vertex in f:
                add new vertex z
                V ← V ∪ {z}
                unvisited ← unvisited ∪ {z}
                f ← f ∪ {state ↦ z}
            else:
                z ← f(state)
            E ← E ∪ {(v → z)}
            W ← W ∪ {(v → z) ↦ Rate(f^{-1}(v), f^{-1}(z))}
    return the graph (V, E) and weight function W
Reward transformation of a phase-type graph
We assign zero or positive real-valued rewards to each state. Waiting time in each state is scaled by its reward. The phase-type distribution becomes the accumulated reward until absorption (not time). If all rewards are one, the distribution is unchanged. Let \tau \sim {\rm PH}_p(\bm{\alpha},\bm{S}) with p transient states, and let \{X_t\} be the underlying Markov jump process. We are interested in the reward-transformed variable Y=\int_0^{\tau} r(X_t)dt, where \bm{r}=(r(1),\ldots,r(p)) is the vector of non-negative rewards. Theorem 3.1.33 in (Bladt and Nielsen 2017) states that Y is phase-type distributed and provides matrix formulas for the reward-transformed sub-intensity matrix. We derive the corresponding graph-based construction.
Consider assigning a positive reward r_k>0 to state k while keeping all other rewards at one (r_i=1, i\neq k). The reward-transformed sub-intensity matrix is obtained by scaling row k by 1/r_k: \bm{S}'=\triangle(1/\bm{r})\bm{S} (left-multiplication by the diagonal matrix scales rows). In the graph, this corresponds to multiplying all out-going edge weights from vertex k by 1/r_k.
Now consider assigning zero reward r_k=0 to only state k (so r_i=1, i\neq k). We must remove vertex k and update edges and weights accordingly. Before transformation, the embedded Markov chain’s transition matrix has entries q_{ij}=s_{ij}/\lambda_i, \;\; i \neq j, \;\; {\rm and } \;\; q_{ii}=0, where s_{ij} is the rate from i to j (the weight of edge i\rightarrow j) and \lambda_i is the sum of out-going weights from i. The transition probability from state i\neq k to state j \neq \{ k, i\} with state k removed is p_{ij}=q_{ij}+q_{ik}q_{kj}. Since waiting time in each state must remain unchanged, the intensity from i to j must be s'_{ij}=\lambda_ip_{ij}. Thus the sub-intensity matrix with state k removed has entries s'_{ij}=\lambda_ip_{ij}=\lambda_i(q_{ij}+q_{ik}q_{kj}) =s_{ij}+s_{ik}\frac{s_{kj}}{\lambda_k}. In graph operations, if edge i\rightarrow j exists (s_{ij}>0), add s_{ik} s_{kj} / \lambda_k to its weight. If edge i\rightarrow j doesn’t exist (s_{ij}=0), add it with weight s_{ik} s_{kj} / \lambda_k. Algorithm [alg:reward-trans] shows the procedure. Figure 2 illustrates reward transformation.
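The removal formula can be sanity-checked in matrix form. The sketch below (plain numpy; the 3-state matrix is an assumed toy example) applies s'_{ij}=s_{ij}+s_{ik}s_{kj}/\lambda_k to all remaining entries, diagonals included (which reproduces the implicit change in total exit rates that the graph operations achieve via edges to the absorbing vertex), and confirms that the expectation with reward zero in the removed state is preserved:

```python
import numpy as np

S = np.array([[-3.0, 2.0, 1.0],
              [1.0, -4.0, 2.0],
              [0.0, 1.0, -2.0]])
alpha = np.array([1.0, 0.0, 0.0])
k = 1                          # give state 2 (index 1) reward zero; remove it
lam_k = -S[k, k]               # total exit rate of the removed state

keep = [i for i in range(3) if i != k]
# s'_{ij} = s_{ij} + s_{ik} s_{kj} / lambda_k for all remaining i, j
Sr = S[np.ix_(keep, keep)] + np.outer(S[keep, k], S[k, keep]) / lam_k

# Expectation of the reward-transformed variable on the full chain
# (reward 0 in state k, reward 1 elsewhere) ...
r = np.array([1.0, 0.0, 1.0])
U = np.linalg.inv(-S)
full = alpha @ U @ np.diag(r) @ np.ones(3)

# ... equals the plain expectation on the reduced chain.
Ur = np.linalg.inv(-Sr)
reduced = alpha[keep] @ Ur @ np.ones(2)
```

For this example both computations give 14/13.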
Figure 2 (caption excerpt): Reward transformation assigning reward zero to state B and reward one to states A, C, and D (Algorithm [alg:reward-trans]). Red vertices and edges are removed. Green edges are added or updated.
    for each vertex v with reward r(v) > 0:
        for each child z of v:
            scale weight w(v → z) by 1/r(v)
    for each vertex v with reward r(v) = 0:
        for each parent u of v:
            for each child z of v:
                if edge (u → z) exists:
                    increment weight w(u → z) by w(v → z) · w(u → v) / λ_v
                else:
                    add edge (u → z) with weight w(v → z) · w(u → v) / λ_v
            remove edge (u → v)
        remove vertex v
Properties of higher-order moments
Higher-order moments of reward-transformed phase-type distributions are determined by matrix equations (Theorem 8.1.5, (Bladt and Nielsen 2017)): \mathbb{E}[\tau^k]=k!\bm{\alpha} (\bm{U}\triangle(\bm{r}))^{k}\bm{e}. The expectation is \mathbb{E}[\tau] = \bm{\alpha} \bm{U}\triangle(\bm{r})\bm{e}, and the second moment is \mathbb{E}[\tau^2]=2\bm{\alpha}\bm{U} \underbrace{\triangle(\bm{r})\bm{U}\triangle(\bm{r})\bm{e}}_{\triangle(\bm{r}')\bm{e}}. By associativity of matrix multiplication, we can compute \triangle(\bm{r})\bm{U}\triangle(\bm{r})\bm{e} first, yielding a column vector \triangle(\bm{r'})\bm{e}. This shows that the second moment can be computed like the first moment using a different reward vector. By induction, we find new rewards such that \mathbb{E}[\tau^k]= \bm{\alpha} \big( k!\bm{U}\triangle(\bm{r}') \big)\bm{e}= \mathbb{E}[\tau'], where \tau' is a phase-type distribution with the same state space as \tau but different rewards. Below, we use this property to construct algorithms for all moments and joint moments.
A graph representation of normalized phase-type distributions
We can normalize any rewarded phase-type distribution so that intensities at each state sum to one. This is achieved by adjusting rewards so that reward divided by total intensity remains constant after normalization. This makes state intensities become transition probabilities, exposing the embedded Markov process. In this form, the expected accumulated reward at each state is simply the expected number of visits scaled by the reward.
In the standard graph formulation, both reward and transition rate contribute to edge weights. In our normalized distribution representation, we factorize each edge weight into the exponentially distributed expected waiting time at each parent vertex and the remaining transition probability. We assign each vertex \texttt{v}\in V a scalar x_\texttt{v} =\lambda^{-1}_{\texttt{v}} representing expected waiting time. Out-going edge weights then represent transition probabilities, summing to one. In the normalized graph, total intensity at each non-absorbing vertex \texttt{v}\in V is unchanged and expressed as \frac{1}{x_\texttt{v}}\sum_{\texttt{z}\in\text{children}(\texttt{v})} w'(\texttt{v}\rightarrow \texttt{z}) where w'(\texttt{v}\rightarrow \texttt{z})=w(\texttt{v}\rightarrow \texttt{z})/\lambda_\texttt{v}. At the starting vertex, x=0 (it carries zero reward, cf. Section 2.3, so it contributes no waiting time); for absorbing vertices, x=0.
Acyclic phase-type distributions are an important special case where the sub-intensity matrix can be reorganized into an upper triangular matrix (Cumani 1982). In graphs, such distributions have a topological ordering, allowing expectations to be computed by simple recursion. Let v and z be vertices and let \mathbb{E}[\tau|\texttt{v}] be the expected accumulated reward until absorption starting at v. For the normalized graph, first-step analysis of Markov chains gives: \mathbb{E}[\tau|\texttt{v}] = x_\texttt{v} + \sum_{\texttt{z}\in\text{children}(\texttt{v})}{w(\texttt{v}\rightarrow \texttt{z})\mathbb{E}[\tau|\texttt{z}]}. Since we have a single starting vertex, \mathbb{E}[\tau]=\mathbb{E}[\tau|\texttt{S}]. Topological ordering allows the recursion to be computed via dynamic programming, yielding the k-th moment in quadratic time O(|V|^2k). This is impossible for cyclic state spaces, as the recursion never terminates. However, we can transform any cyclic phase-type distribution graph into an acyclic one with the same states.
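The recursion can be sketched on a tiny acyclic example: an Erlang(3) chain in normalized form (vertex names are assumptions here, and the starting vertex is given scalar 0, matching its zero reward):

```python
# Vertices in topological order: S -> v1 -> v2 -> v3 -> abs.
# x: expected waiting time per vertex (0 at S and at the absorbing vertex).
x = {"S": 0.0, "v1": 1.0, "v2": 1.0, "v3": 1.0, "abs": 0.0}
# Normalized edge weights are transition probabilities (all 1 in a chain).
children = {"S": [("v1", 1.0)], "v1": [("v2", 1.0)],
            "v2": [("v3", 1.0)], "v3": [("abs", 1.0)], "abs": []}

# Dynamic programming in reverse topological order:
# E[tau|v] = x_v + sum over children z of w(v -> z) * E[tau|z]
order = ["abs", "v3", "v2", "v1", "S"]
E = {}
for v in order:
    E[v] = x[v] + sum(w * E[z] for z, w in children[v])
```

E["S"] is the expectation of the distribution; for the Erlang(3) chain with unit rates it is 3, and each vertex is touched once per edge, giving the quadratic bound.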
An acyclic graph representation of phase-type distributions
We can manipulate edges E and vertex scalars x in a cyclic phase-type distribution graph to produce an acyclic graph with the same expected time (or accumulated reward) until absorption. Once constructed, this acyclic graph allows recursive computation of expectation as in the previous section. Algebraically, we find a phase-type distribution such that \bm{U}\triangle(\bm{r})\bm{e}=\bm{U'}\triangle(\bm{r}')\bm{e} where \bm{S}'=-(\bm{U}')^{-1} is an upper-triangular matrix. This matrix can be found by Gaussian elimination applied to the linear system -\bm{S}\bm{x}=\triangle(\bm{r})\bm{e}, whose solution is \bm{x}=\bm{U}\triangle(\bm{r})\bm{e}. Gaussian elimination of sparse matrices via graph theory is well-studied (Duff, Erisman, and Reid 2017). In phasic, we apply this technique directly to graphs. We assume the phase-type distribution has been normalized as in Section 2.5.
We index each vertex arbitrarily to \{1,2,\dots, |V|\}, with the starting vertex S at index 1 and absorbing vertices at the largest indices. We visit all vertices in index order. The initial graph is G^{(0)}; after visiting vertex i, the graph is G^{(i)}. The algorithm ensures three invariants after visiting each vertex i:
1. The vertex has no in-going edges from higher-indexed vertices.
2. Expected accumulated rewards until absorption, starting at any vertex, are preserved.
3. Out-going edge weights sum to 1 (0 for absorbing vertices).
In matrix formulation, this means \bm{U}^{(i+1)}\triangle(\bm{r}^{(i+1)})\bm{e}= \bm{U}^{(i)}\triangle(\bm{r}^{(i)})\bm{e}, where \bm{U}^{(i)} is the Green matrix for graph G^{(i)} and \bm{r}^{(i)} are associated rewards. Since vertex i has no in-going edges from vertices j>i, the graph is acyclic once all vertices are visited. We show in a later section that we only need this relatively expensive computation once to compute any moment or joint moment.
We first show how to remove an in-going and out-going edge of a vertex without changing expectation. Let v and z be vertices with associated indices. Let \mathbb{E}[\tau|\texttt{v}] be the expected accumulated reward until absorption starting at v. The recursion (Section 2.4) applies but doesn’t terminate for cyclic graphs: \mathbb{E}[\tau|\texttt{v}] = x_\texttt{v} + \sum_{\texttt{z}\in\text{children}(\texttt{v})}{w(\texttt{v}\rightarrow \texttt{z})\mathbb{E}[\tau|\texttt{z}]}. However, we can expand the recursion by bridging the immediate children: \mathbb{E}[\tau|\texttt{v}] = x_\texttt{v} + \sum_{\texttt{z}_1\in\text{children}(\texttt{v})}{w(\texttt{v}\rightarrow \texttt{z}_1)\Big( x_{\texttt{z}_1} + \sum_{\texttt{z}_2\in\text{children}(\texttt{z}_1)}{w(\texttt{z}_1\rightarrow \texttt{z}_2)\mathbb{E}[\tau|\texttt{z}_2]}\Big)}, which is equivalent to \mathbb{E}[\tau|\texttt{v}] =\Big( x_\texttt{v} +\sum_{\texttt{z}_1\in\text{children}(\texttt{v})}{w(\texttt{v}\rightarrow \texttt{z}_1)x_{\texttt{z}_1}}\Big)+ \sum_{\texttt{z}_1\in\text{children}(\texttt{v})}{\sum_{\texttt{z}_2\in\text{children}(\texttt{z}_1)}{w(\texttt{v}\rightarrow \texttt{z}_1)w(\texttt{z}_1\rightarrow \texttt{z}_2)\mathbb{E}[\tau|\texttt{z}_2]}}. This reveals that we can remove the edges (\texttt{v}\rightarrow \texttt{z}_1) without changing \mathbb{E}[\tau|\texttt{v}] by two operations: (1) increase the expected waiting time to x_\texttt{v} +\sum_{\texttt{z}_1\in\text{children}(\texttt{v})}{w(\texttt{v}\rightarrow \texttt{z}_1)x_{\texttt{z}_1}}, and (2) add or update the weight of each edge (\texttt{v}\rightarrow \texttt{z}_2), for all \texttt{z}_2\in \text{children}(\texttt{z}_1), by w(\texttt{v}\rightarrow\texttt{z}_1)w(\texttt{z}_1\rightarrow \texttt{z}_2).
To remove z, we increase the expected waiting time of each parent by the in-going weight multiplied by x_\texttt{z}, and add or update edges from each parent to all children of z by the product of edge weights. These operations remove vertex z from a state-path without changing expectation.
When vertex \texttt{v} is also a child of \texttt{z}, this creates a self-loop. A self-loop with weight w(\texttt{v}\rightarrow \texttt{v}) can be removed without changing expectation by increasing x_\texttt{v} to x_\texttt{v}'=x_\texttt{v}/(1-w(\texttt{v}\rightarrow \texttt{v})) and scaling all other out-going edge weights by 1/(1-w(\texttt{v}\rightarrow \texttt{v})).
By the three invariants, when we visit vertex \texttt{v}_i (index i), all its children have higher indices because all vertices with indices below i have been visited. Not all parents of \texttt{v}_i have indices smaller than i. However, redirecting edges from higher-indexed parents establishes the three invariants for vertex i. The equation for \mathbb{E}[\tau|\texttt{v}] shows that expected accumulated reward is unchanged, and by induction, an acyclic graph is produced once all vertices are visited. Algorithm [alg:expectation] summarizes the procedure, building an acyclic graph in O(|V|^3) time. Figure 3 shows a worked example.
The algorithm then computes expected accumulated reward recursively using dynamic programming in O(|V|^2) time on the acyclic graph. Section 2.7 shows we only need to construct the acyclic graph once. In phasic, we improve empirical running time by indexing vertices by topological ordering when it exists or by ordering of strongly connected components.
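The elimination procedure can be sketched in a few lines of Python. This is an illustrative re-implementation under an assumed dictionary-based graph encoding, not phasic's C implementation; it bridges in-going edges from higher-indexed parents and removes any self-loops this creates:

```python
# Illustrative sketch of the elimination in Algorithm [alg:expectation]
# (dictionary encoding, not phasic's C implementation).
# x: vertex -> scalar x_v; w: (u, v) -> edge weight (normalized, no self-loops);
# order: vertices in index order with absorbing vertices last.

def make_acyclic(x, w, order):
    for i, v in enumerate(order):
        for u in order[i + 1:]:                 # in-going edges from higher index
            if (u, v) not in w:
                continue
            wt = w.pop((u, v))                  # remove edge (u -> v) ...
            x[u] += wt * x[v]                   # ... absorbing v's waiting time
            for z in [z for (p, z) in list(w) if p == v]:
                w[(u, z)] = w.get((u, z), 0.0) + wt * w[(v, z)]
            if (u, u) in w:                     # bridging may create a self-loop
                loop = w.pop((u, u))
                x[u] /= 1.0 - loop
                for key in [k for k in w if k[0] == u]:
                    w[key] /= 1.0 - loop
    return x, w

# The cyclic example from the text: A -> B (0.6), B -> C (0.5), C -> A (1),
# exits to absorbing D, normalized waiting times x = (5, 2, 5).
x = {"A": 5.0, "B": 2.0, "C": 5.0, "D": 0.0}
w = {("A", "B"): 0.6, ("A", "D"): 0.4,
     ("B", "C"): 0.5, ("B", "D"): 0.5,
     ("C", "A"): 1.0}
make_acyclic(x, w, ["A", "B", "C", "D"])
# x["C"] becomes 16 and edge (C -> A) is replaced by (C -> D) with weight 1
```

The triple loop (vertex, parent, child) gives the O(|V|^3) bound stated above.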
To see the correspondence with Gaussian elimination, note how the system -\bm{S}\bm{T}=\triangle(\bm{r})\bm{e}, with \bm{T} the vector of conditional expectations, corresponds to the example graph in Figure 3 when \bm{S} =
\begin{bmatrix}
-1 & 0.6 & 0\\
0 & -1 & 0.5 \\
1 & 0 & -1
\end{bmatrix} and \bm{r}=(5, 2, 5). The system of equations is: \begin{aligned}
T_\texttt{A} &= 5 + 0.6 \cdot T_\texttt{B} \\
T_\texttt{B} &= 2 + 0.5 \cdot T_\texttt{C} \\
T_\texttt{C} &= 5 + T_\texttt{A}
\end{aligned} where T_\texttt{A} = \mathbb{E}[\tau | \texttt{A} ] is the expectation starting in state A, and similarly for T_\texttt{B} and T_\texttt{C}. The zero expectation starting at the absorbing state, T_\texttt{D}, doesn’t appear. Gaussian elimination uses three row operations. First, “Swap row positions”: our algorithm maintains proper ordering by visiting vertices in index order. Second, “Add to one row a scalar multiple of another” (substitute one equation into another): this corresponds to removing an in-going edge from a higher-indexed parent. For example, in Step 1 of Figure 3, removing edge (\texttt{C} \rightarrow \texttt{A}) and updating edges (\texttt{C} \rightarrow \texttt{B}) and (\texttt{C} \rightarrow \texttt{D}) corresponds to substituting the first equation into the third. Third, “Multiply a row by a non-zero scalar” (collect multiple instances of a variable): this corresponds to removing a self-loop. For example, in Step 4 of Figure 3, removing self-loop (\texttt{C} \rightarrow \texttt{C}) and updating edge (\texttt{C} \rightarrow \texttt{D}) corresponds to isolating T_\texttt{C} in T_\texttt{C} = 11.2 + 0.3 \cdot T_\texttt{C}. Once all vertices are visited and the graph is acyclic, the system is upper triangular: \begin{aligned}
T_\texttt{A} &= 5 + 0.6 \cdot T_\texttt{B} \\
T_\texttt{B} &= 2 + 0.5 \cdot T_\texttt{C} \\
T_\texttt{C} &= 16,
\end{aligned} which is solved by back-substitution. On the graph, this corresponds to the recursion in topological order at the end of Algorithm [alg:expectation].
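The back-substitution can be checked numerically; the values solve the triangular system and also satisfy the original cyclic equations:

```python
# Back-substitution on the upper-triangular system, checked against the
# original cyclic equation T_C = 5 + T_A.
T_C = 16.0
T_B = 2 + 0.5 * T_C    # 10
T_A = 5 + 0.6 * T_B    # 11
assert abs(T_C - (5 + T_A)) < 1e-9   # the eliminated cyclic equation holds
```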
Computing higher-order moments in quadratic time
In converting to an acyclic graph, scalars x associated with vertices are updated in a series of increments. In Algorithm [alg:expectation], we save this list of updates rather than applying them directly. The resulting list is at most O(|V|^2) long since we visit each vertex at most once and update at most |V| parent scalars. We also save the update functions for computing expectation on the acyclic graph via dynamic programming (also O(|V|^2)). The two lists compute expectation in O(|V|^2) time.
Converting a cyclic graph to acyclic form requires O(|V|^3) operations. However, only O(|V|^2) updates of scalars x are required, and edge weights of the resulting acyclic graph are independent of expected waiting times at each vertex. This means that once the normalized acyclic graph is constructed and edge weights are known, it can be reconstructed for alternative rewards using only O(|V|^2) updates of scalars x. Since higher-order moments of phase-type distributions are just expectations with different rewards (Section 2.4), we can compute any number of moments in O(|V|^2) time once the acyclic graph is constructed. In phasic, the acyclic graph and update functions are created only the first time a user calls a moment function; subsequent moment computations run in O(|V|^2) time.
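In the matrix formulation, this reuse corresponds to the standard identity \mathbb{E}[\tau^k]=k!\,\bm{\alpha}(\bm{U}\triangle(\bm{r}))^k\bm{e} for continuous phase-type distributions: each additional moment costs one re-weighting of the reward vector. A small Python sketch (illustrative only, not phasic's API):

```python
from math import factorial

# Moments by re-weighting: E[tau^k] = k! * alpha (U diag(r))^k e, so each
# extra moment is one matrix-vector product with the precomputed Green
# matrix U. Sketch only; phasic performs equivalent updates on the graph.

def moments(U, r, alpha, kmax):
    n = len(r)
    m = [1.0] * n                                  # m_0 = e
    out = []
    for k in range(1, kmax + 1):
        m = [sum(U[i][j] * r[j] * m[j] for j in range(n)) for i in range(n)]
        out.append(factorial(k) * sum(a * mi for a, mi in zip(alpha, m)))
    return out

# Erlang-2 example (two exponential(1) phases): U = (-S)^{-1}
U = [[1.0, 1.0], [0.0, 1.0]]
mom = moments(U, r=[1.0, 1.0], alpha=[1.0, 0.0], kmax=2)
# mom == [2.0, 6.0], the mean and second moment of an Erlang-2 distribution
```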
Multivariate distributions
Instead of assigning a single real-valued reward to each state, we can assign a vector of real-valued rewards. The outcome is then a vector of positive real numbers. This multivariate phase-type distribution is defined in Chapter 8 of (Bladt and Nielsen 2017) as \bm{Y} \sim {\rm MPH}^*(\bm{\alpha},\bm{S}, \bm{R}), where \bm{R} is a reward matrix in which each row holds the rewards earned in the corresponding state, one column per dimension. A single column of \bm{R} defines a univariate phase-type distribution. Marginal moments of multivariate phase-type distributions can be computed using our graph algorithms and require only a single acyclic graph computation.
The joint distribution of a multivariate phase-type distribution represents the conditional outcome of marginal distributions. Joint moments are well-defined via matrix formulations (Theorem 8.1.5 (Bladt and Nielsen 2017)). The first cross-moment is \mathbb{E}[Y_iY_j]=\bm{\alpha}\bm{U}\triangle(\bm{R}_{\cdot i})\bm{U}\triangle(\bm{R}_{\cdot j})\bm{e}+\bm{\alpha}\bm{U}\triangle(\bm{R}_{\cdot j})\bm{U}\triangle(\bm{R}_{\cdot i})\bm{e}, so the covariance is \mathbb{C}ov(Y_i,Y_j)=\bm{\alpha}\bm{U}\triangle(\bm{R}_{\cdot i})\bm{U}\triangle(\bm{R}_{\cdot j})\bm{e}+\bm{\alpha}\bm{U}\triangle(\bm{R}_{\cdot j})\bm{U}\triangle(\bm{R}_{\cdot i})\bm{e}-\bm{\alpha}\bm{U}\triangle(\bm{R}_{\cdot i})\bm{e}\,\bm{\alpha}\bm{U}\triangle(\bm{R}_{\cdot j})\bm{e}. As for all other moments, joint moments can be computed in quadratic time after constructing the acyclic graph. Computing the covariance matrix of \ell rewards takes O(|V|^3+\ell^2|V|^2) time. The utility functions expectation, variance, covariance, and moments in phasic support multivariate phase-type distributions.
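The covariance formula can be checked numerically on a small hypothetical example (pure Python, not phasic's implementation). Two exponential phases in series give independent marginal sojourn times, so their covariance should vanish:

```python
# Numerical check of the MPH* covariance formula (Theorem 8.1.5) on a
# hypothetical two-phase example; not phasic's implementation.

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def UD(U, r, v):
    """Compute U @ diag(r) @ v for Green matrix U and reward column r."""
    n = len(v)
    return [sum(U[i][j] * r[j] * v[j] for j in range(n)) for i in range(n)]

def cov(alpha, U, ri, rj):
    e = [1.0] * len(ri)
    cross = (dot(alpha, UD(U, ri, UD(U, rj, e)))
             + dot(alpha, UD(U, rj, UD(U, ri, e))))
    return cross - dot(alpha, UD(U, ri, e)) * dot(alpha, UD(U, rj, e))

# Two exponential(1) phases in series; Y_1 and Y_2 are the (independent)
# times spent in each phase, so Cov(Y_1, Y_2) = 0 and Var(Y_i) = 1.
U = [[1.0, 1.0], [0.0, 1.0]]          # U = (-S)^{-1} for S = [[-1,1],[0,-1]]
alpha = [1.0, 0.0]
c12 = cov(alpha, U, [1.0, 0.0], [0.0, 1.0])
v1 = cov(alpha, U, [1.0, 0.0], [1.0, 0.0])
```

As a consistency check, Var(Y_1 + Y_2) computed with the summed reward vector equals Var(Y_1) + Var(Y_2) + 2 Cov(Y_1, Y_2).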
Algorithm [alg:expectation] consists of the following graph operations.
Initialization: set x_\texttt{v}=r_\texttt{v} for all \texttt{v}\in V and uniquely index vertices i_\texttt{v}\in \{1,2,\dots, |V|\} matching the order in list V.
Normalization of vertex \texttt{v}: w(\texttt{v}\rightarrow \texttt{z}) \gets w(\texttt{v}\rightarrow \texttt{z})/\lambda_\texttt{v} for every child \texttt{z}, and x_\texttt{v} \gets x_\texttt{v} + x_\texttt{v} \cdot ( 1/\lambda_\texttt{v} - 1).
Bridging an in-going edge (\texttt{u}\rightarrow \texttt{v}): for every child \texttt{z} of \texttt{v}, increment weight w(\texttt{u}\rightarrow \texttt{z}) by w(\texttt{u}\rightarrow \texttt{v})w(\texttt{v}\rightarrow \texttt{z}) if the edge exists, otherwise add edge (\texttt{u}\rightarrow \texttt{z}) with weight w(\texttt{u}\rightarrow \texttt{v})w(\texttt{v}\rightarrow \texttt{z}); then x_\texttt{u} \gets x_\texttt{u} + x_\texttt{v}\cdot w(\texttt{u}\rightarrow \texttt{v}) and remove edge (\texttt{u}\rightarrow \texttt{v}).
Self-loop removal at \texttt{u}: x_\texttt{u} \gets x_\texttt{u} + x_\texttt{u} \cdot \left ( \frac{1}{1-w(\texttt{u}\rightarrow \texttt{u})} - 1\right) and w(\texttt{u}\rightarrow \texttt{z}) \gets \frac{w(\texttt{u}\rightarrow \texttt{z})}{1-w(\texttt{u}\rightarrow \texttt{u})} for every other child \texttt{z}; then remove edge (\texttt{u}\rightarrow \texttt{u}).
Final recursion in reverse topological order: x_\texttt{v} \gets x_\texttt{v}+ \sum_{\texttt{z}} w(\texttt{v}\rightarrow \texttt{z})\, x_\texttt{z}.
Discrete phase-type distributions
A discrete phase-type distribution is the number of jumps in a discrete Markov chain until the absorbing state is entered. States have total transition probability one, traditionally represented by a sub-transition matrix \bm{T} of transition probabilities between non-absorbing states. The initial probability vector is \bm{\pi}. As for continuous phase-type distributions, the expectation is \mathbb{E}[\tau] = \bm{\pi}\bm{U}\bm{e}, where the Green matrix is \bm{U}=-(\bm{T}-\bm{I})^{-1}.
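For illustration, the expectation can be computed by solving (\bm{I}-\bm{T})\bm{m}=\bm{e} rather than forming the inverse explicitly. A sketch (not phasic's implementation):

```python
# Sketch: E[tau] = pi (I - T)^{-1} e for a discrete phase-type distribution,
# solving (I - T) m = e by Gaussian elimination instead of inverting.

def expectation_dph(pi, T):
    n = len(T)
    A = [[(1.0 if i == j else 0.0) - T[i][j] for j in range(n)] + [1.0]
         for i in range(n)]
    for c in range(n):                              # forward elimination
        piv = max(range(c, n), key=lambda row: abs(A[row][c]))
        A[c], A[piv] = A[piv], A[c]
        for row in range(c + 1, n):
            f = A[row][c] / A[c][c]
            A[row] = [a - f * b for a, b in zip(A[row], A[c])]
    m = [0.0] * n
    for c in reversed(range(n)):                    # back-substitution
        m[c] = (A[c][n] - sum(A[c][j] * m[j] for j in range(c + 1, n))) / A[c][c]
    return sum(pi_i * m_i for pi_i, m_i in zip(pi, m))

# Two transient states: state 1 moves to state 2 w.p. 0.5 (else absorbs);
# state 2 self-transitions w.p. 0.5 (else absorbs). E[tau] = 1 + 0.5 * 2 = 2.
T = [[0.0, 0.5], [0.0, 0.5]]
```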
Our graph representation for continuous phase-type distributions directly accommodates unrewarded discrete phase-type distributions. We don’t represent self-transitions as self-loops (\texttt{v}\rightarrow \texttt{v}), as these aren’t compatible with our graph algorithms. Instead, we represent self-transitions by the missing transition probability 1-\lambda_\texttt{v}. After normalization, out-going weights sum to one, and each vertex scalar x_\texttt{v} equals the geometric expectation of consecutive visits to the state (i.e., the transition to v and immediate self-transitions).
In the normalized discrete phase-type distribution, sub-transition matrix \bm{T} has diagonal zero, so \bm{T}-\bm{I} shares properties of the normalized sub-intensity matrix \bm{S}: diagonal entries are -1 and row sums for non-absorbing states are zero. Thus we can apply Algorithm [alg:expectation] to discrete phase-type distributions.
Rewarded discrete and multivariate discrete phase-type distributions are thoroughly described in (Navarro 2019). We translate the matrix-based reward transformation algorithm from Theorem 5.2 in (Navarro 2019) to graph operations. Consider vertex v with reward r_\texttt{v}\in \mathbb{N}. To represent this integer reward, we augment the graph with a sequence of auxiliary vertices \texttt{H}_1\rightarrow \texttt{H}_2 \rightarrow \dots \rightarrow \texttt{H}_{r_\texttt{v}-1} \rightarrow \texttt{v}, connected by edges with weight 1, and give edge (\texttt{v}\rightarrow \texttt{H}_1) a weight equal to the self-transition probability. By redirecting all in-going edges from v to \texttt{H}_1 instead, we ensure each visit to v results in r_\texttt{v} jumps in the unrewarded discrete phase-type distribution. Zero-reward transformation uses the algorithm for continuous phase-type distributions.
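The auxiliary-chain construction can be sketched as follows; the state bookkeeping and helper names are made up for illustration and do not reflect phasic's implementation:

```python
# Sketch of the integer reward transformation: each state v with reward r_v
# is entered through a chain H_1 -> ... -> H_{r_v - 1} -> v, so every visit
# to v yields r_v jumps in the unrewarded chain.

def reward_transform(T, r):
    n = len(T)
    base, m = [], 0
    for rv in r:                                    # chain start of each state
        base.append(m)
        m += rv
    last = [base[v] + r[v] - 1 for v in range(n)]   # position of v itself
    T2 = [[0.0] * m for _ in range(m)]
    for v in range(n):
        for k in range(r[v] - 1):                   # auxiliary chain, weight 1
            T2[base[v] + k][base[v] + k + 1] = 1.0
        for u in range(n):                          # out-edges leave from v and
            if T[v][u]:                             # enter u's chain head H_1
                T2[last[v]][base[u]] = T[v][u]
    return T2, base

def expected_jumps(p, T2, steps=2000):
    """E[tau] = sum_{t >= 0} pi T2^t e, by forward propagation."""
    total, n = 0.0, len(T2)
    for _ in range(steps):
        total += sum(p)
        p = [sum(p[i] * T2[i][j] for i in range(n)) for j in range(n)]
    return total

# One state with self-transition probability 0.5 and reward 2:
# pi U diag(r) e = (1 / 0.5) * 2 = 4 expected jumps after transformation.
T2, base = reward_transform([[0.5]], [2])
pi2 = [0.0] * len(T2)
pi2[base[0]] = 1.0
```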
Higher-order moments are well-defined for rewarded discrete phase-type distributions (Proposition 5.7 in (Navarro 2019)). The first moment is \mathbb{E}[\tau] = \bm{\pi}\bm{U}\triangle(\bm{r})\bm{e} and the second moment is \mathbb{E}[\tau^2] = 2\bm{\pi}\bm{U}\triangle(\bm{r})\bm{U}\triangle(\bm{r})\bm{e}-\bm{\pi}\bm{U}\triangle(\bm{r}^2)\bm{e}. We construct multivariate discrete phase-type distributions by associating a vector of zero or positive integer rewards with each vertex. The moment generating function is well-defined by matrix equations (Section 5.2.4 in (Navarro 2019)). For example, the first cross-moment is \mathbb{E}[Y_iY_j] = \bm{\pi} \bm{U}\triangle(\bm{r}_i)\bm{U}\triangle(\bm{r}_j)\bm{e} + \bm{\pi}\bm{U}\triangle(\bm{r}_j)\bm{U}\triangle(\bm{r}_i)\bm{e}-\bm{\pi}\bm{U}\triangle(\bm{r}_i)\triangle(\bm{r}_j)\bm{e}. Although moments of discrete and continuous phase-type distributions are defined differently, terms involving \bm{U} still have the form \bm{U}\triangle(\bm{r})\bm{e} from the right, which reduces to a single reward vector \bm{U}\triangle(\bm{r})\bm{e}=\triangle(\bm{r'})\bm{e} = \bm{r'}. These rewards correspond to row sums of the Green matrix computed for the previous moment, as described for continuous phase-type distributions in Section 2.4. This allows O(|V|^2) computation of higher-order moments as described in Section 2.7.
Distribution function of discrete phase-type distributions
The probability mass function (PMF) of a discrete phase-type distribution at time t is the probability of the Markov chain entering the absorbing state exactly at jump t. A defective distribution (initial distribution vector not summing to 1) has non-zero probability at t=0. The cumulative distribution function (CDF) at time t is the probability that the chain has entered the absorbing vertex at jump t or before. In matrix form, the CDF is F(t)=1-\bm{\pi}\bm{T}^t\bm{e}. We can express the PMF as a recursion in t (Eisele 2006). Using total probability: \Pr(X_{t+1}=v)= \sum_{\texttt{u} \neq v} \Pr(X_t=u)T_{uv}+ \Pr(X_t=v)T_{vv}. In our graph, we represent the probability of being at vertex \texttt{v} after t jumps as vertex property \texttt{v}.p[t] and compute the recursion as the probability that a parent will transition to the vertex in the next jump, or that the vertex will jump to itself: \texttt{v}.p[t+1] = \sum_{\texttt{u} \in \text{parents}(\texttt{v})}{\texttt{u}.p[t]\cdot w(\texttt{u}\rightarrow \texttt{v})} + \texttt{v}.p[t](1-\lambda_\texttt{v}) for t \geq 0.
We define the base case at t=-1, where \texttt{v}.p is zero for all vertices except the starting vertex \texttt{S}, where it is one. Using dynamic programming, we find the PMF (and thus CDF) at time t by computing the PMF at times 0,1,\dots,t-1. To compute the CDF at time t, we sum the PMF over times 0,1,\dots, t. The asymptotic complexity up to time t is O(t|V|^2). For sparse matrices or relatively few jumps (t \ll |V|), this algorithm is empirically very fast (Figure 4) and serves as an efficient alternative to the matrix formulation. Algorithm [alg:dph_pmf] shows the CDF computation.
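A minimal sketch of the recursion (illustrative dictionary graph encoding, not phasic's implementation):

```python
# PMF and CDF of a discrete phase-type distribution by propagating state
# probabilities forward in time. graph maps each transient vertex to its
# out-edges (child, weight); probability mass not covered by out-edges is
# an implicit self-loop. Sketch only.

def dph_pmf_cdf(graph, start, absorbing, tmax):
    p = {v: 0.0 for v in graph}
    p[start] = 1.0
    pmf, cdf = [0.0], [0.0]        # no mass at t = 0 for a non-defective start
    for _ in range(tmax):
        q = {v: 0.0 for v in graph}
        hit = 0.0                  # mass entering the absorbing vertex this jump
        for v, out in graph.items():
            stay = 1.0 - sum(wt for _, wt in out)
            q[v] += p[v] * stay
            for z, wt in out:
                if z == absorbing:
                    hit += p[v] * wt
                else:
                    q[z] += p[v] * wt
        pmf.append(hit)
        cdf.append(cdf[-1] + hit)
        p = q
    return pmf, cdf

# Geometric check: one transient state exiting with probability 0.25 gives
# pmf[t] = 0.75**(t-1) * 0.25 and cdf[t] = 1 - 0.75**t
pmf, cdf = dph_pmf_cdf({"S": [("D", 0.25)]}, "S", "D", 10)
```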
We note that, although computationally infeasible in most situations, the joint distribution function of multivariate discrete phase-type distributions can also be described by such a recursive algorithm (including time-inhomogeneous multivariate discrete distributions). Instead of tracking only the probability of visiting a vertex at time t, we track the visited vertex together with the rewards accumulated by time t. Consider an (\ell+1)-dimensional table for an \ell-dimensional multivariate discrete distribution. The first dimension has |V| entries, one per vertex. Each remaining dimension has entries for the natural numbers 0 to k, where k is at least as large as every entry of the reward vector \bm{s} at which the distribution function F_{\bm{s}} is evaluated. At time t, an entry in the table is the probability of being in a specific state having accumulated a specific vector of integer rewards. If any accumulated reward exceeds the corresponding entry of \bm{s}, we set the entry's probability to zero before the next time step. Assuming vertices with zero reward in all dimensions have been removed (e.g., by reward transformation), every time step increases at least one reward entry, so after finitely many steps all entries are either zero or, for absorbing vertices, the total probability of reaching an absorbing state with accumulated rewards at most \bm{s}; this is exactly the cumulative distribution function for multivariate discrete phase-type distributions.
Algorithm [alg:dph_pmf] proceeds as follows. Initialize the number of jumps to t=-1 and property \texttt{v}.p=0 for all vertices \texttt{v}\in V, letting \texttt{S}.p=1; \texttt{v}.p holds the current probability of being at vertex \texttt{v} at time t, and out-going weights are assumed to sum to a value less than or equal to 1. Let \text{cdf}[] be an array of the cumulative distribution function, with entry 0 corresponding to the start at no jumps (t=0), and let \text{pmf}[] be an array of the probability mass function; initialize \text{cdf}[-1] = 0 and \text{pmf}[-1]=0. In each iteration: for every vertex \texttt{v}, set \texttt{v}.q \gets \texttt{v}.p, and for every edge (\texttt{v} \rightarrow \texttt{z}), update \texttt{z}.p \gets \texttt{z}.p + \texttt{v}.q \cdot w(\texttt{v} \rightarrow \texttt{z}) and \texttt{v}.p \gets \texttt{v}.p - \texttt{v}.q \cdot w(\texttt{v} \rightarrow \texttt{z}). Then set t \gets t + 1 and \text{cdf}[t] \gets \text{cdf}[t-1]; for every absorbing vertex \texttt{v}, set \text{cdf}[t] \gets \text{cdf}[t] + \texttt{v}.p and \texttt{v}.p \gets 0; finally, \text{pmf}[t] \gets \text{cdf}[t] - \text{cdf}[t - 1].
Distribution function for continuous phase-type distributions
In the matrix formulation, computing the CDF of continuous phase-type distributions requires matrix exponentiation. However, a continuous phase-type distribution can be approximated with arbitrary precision by a discrete phase-type distribution (Bobbio, Horváth, and Telek 2004). The number of discrete steps occupying each state has geometric expectation approximating the exponential expectation of the continuous distribution. This allows efficient computation of the PDF, CDF, and probability of occupying each state at any time using Algorithm [alg:dph_pmf].
Precision is determined by the number of discrete steps per unit time, controlled by a granularity parameter in phasic. A granularity of 1000 means each time unit of the continuous distribution is divided into 1000 discrete steps. Since our graph representations for discrete and continuous phase-type distributions are identical, and self-loops are represented by the remaining transition probability, we simply divide all out-going weights by the granularity (except starting vertex weights). The constraint is that the granularity must be at least as large as the largest total out-going rate so that out-going probabilities are smaller than one after division. In phasic, granularity can be user-set to control approximation precision, but defaults to at least twice the highest out-going rate and always at least 1024.
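The geometric-to-exponential approximation behind this discretization is easy to see in the one-state case; the numeric check below is a made-up example, not phasic code:

```python
from math import exp

# One-state illustration: an exponential with rate lam is approximated by a
# geometric with per-step exit probability lam/granularity, so
# P(absorbed by time t) = 1 - (1 - lam/granularity)^(t * granularity).

def discrete_cdf_exponential(lam, t, granularity):
    steps = int(round(t * granularity))
    return 1.0 - (1.0 - lam / granularity) ** steps

approx = discrete_cdf_exponential(2.0, 0.5, granularity=10000)
exact = 1.0 - exp(-2.0 * 0.5)      # exact exponential CDF at t = 0.5
# the two CDF values agree to roughly 1e-5 at this granularity
```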
To verify numerical accuracy, we compared against matrix exponentiation for a phase-type distribution with 1000 fully connected states. Transition rates were sampled uniformly at random in [0,1], so each vertex had an average total out-going rate of 500. With granularity 10,000, the average self-loop probability is 0.95. We computed the cumulative distribution function for times (0, 0.01, 0.02, \dots, 1.00) using both Algorithm [alg:dph_pmf] as implemented in phasic and the matrix exponential 1-\bm{\alpha}e^{\bm{S}t}\bm{e}. The average absolute and maximum differences were 0.00003 and 0.0002, demonstrating stability and negligible numerical difference.
Time-inhomogeneous discrete phase-type distributions
So far, algorithms have assumed time-homogeneous phase-type distributions (constant rates between states). Time-inhomogeneous phase-type distributions are also well-described using matrix equations (Albrecher and Bladt 2019). Although not the focus of this paper, we note that our algorithm for computing distribution functions (Algorithm [alg:dph_pmf]) can produce the distribution and state probability vector of time-inhomogeneous discrete and continuous phase-type distributions. This is achieved by changing edge weights at each time step (e.g., using graph_update_weights_parameterized in phasic), effectively allowing edge weight or edge existence to be time-dependent. Having computed the probability distribution this way, we can compute all moments of unrewarded time-inhomogeneous distributions by integration. We can also compute the expectation of a rewarded time-inhomogeneous distribution by summing expected visiting time multiplied by reward for each vertex.
Since Algorithm [alg:dph_pmf] computes the probability of being at each vertex at any time step, we can compute accumulated waiting time in each state \bm{a}(t) at time t. This gives an alternative way to compute the expectation of a reward-transformed truncated distribution as the dot product \bm{r} \cdot \bm{a}(t). In models where state space changes at one or a few points in time, this provides efficient means to compute expectation as the sum of expectations of truncated time-homogeneous distributions.
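Accumulated waiting times can be tracked alongside the probability propagation; the sketch below uses a made-up dictionary graph encoding, not phasic's API:

```python
# Track a(t), the expected number of steps spent in each transient state up
# to step t, alongside the probability propagation. The dot product
# r . a(t) then gives the expectation of the distribution truncated at t.

def accumulated_waiting(graph, start, t_steps):
    p = {v: 0.0 for v in graph}
    p[start] = 1.0
    a = {v: 0.0 for v in graph}
    for _ in range(t_steps):
        for v in graph:
            a[v] += p[v]           # time spent before this step's transition
        q = {v: 0.0 for v in graph}
        for v, out in graph.items():
            stay = 1.0 - sum(wt for _, wt in out)
            q[v] += p[v] * stay
            for z, wt in out:
                if z in graph:     # transitions to the absorbing vertex drop out
                    q[z] += p[v] * wt
        p = q
    return a

# One state exiting w.p. 0.25 per step: a("S") approaches E[tau] = 4.
a = accumulated_waiting({"S": [("D", 0.25)]}, "S", 200)
```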
We can describe a time-inhomogeneous phase-type distribution that is piece-wise time-homogeneous. The n constituent time-homogeneous distributions begin at defined times \bm{\tau}, starting at \tau_1 = 0. We assume constituent distributions share the same state space and differ only in transition probabilities. The first n-1 constituent distributions begin at time \tau_i and are truncated at time \tau_{i+1}, when each state transitions to the corresponding state in the subsequent distribution with probability 1. We call this set of transitions “linking transitions”.
In this formulation, probabilities of residing in each state at time \tau_{i} serve as the initial probability vector for the distribution beginning at \tau_{i+1}. The last distribution beginning at time \tau_{n} is not truncated.
Constituent distributions can be represented as graphs \{G_{1}, \ldots, G_{n}\}, all with state space V, but daisy-chaining graphs as outlined above preserves only marginal expectations. However, we can construct a graph for a single time-homogeneous distribution with state space V^1 \cup V^2 \cup \cdots \cup V^n. We first describe an algorithm for n = 2, where G^1 is truncated and G^2 is not, and then show how this algorithm can be applied iteratively to construct a single graph for any n.
In combining G^1, truncated at time \tau, with G^2, the resulting graph must be time-homogeneous and retain the expected waiting times at each vertex of G^1 and G^2. Truncating G^1 corresponds to adding edges \texttt{v}^1_i \rightarrow \texttt{v}^2_i for i \in \{1, \ldots, \lvert V \rvert\} with weight 1 that exist only at time \tau. Since only expected waiting time needs to be retained, the added transition can instead be allowed at any time if the weight is 1/\bm{a}(\tau) (element-wise), where \bm{a}(\tau) is the accumulated waiting time in each state at time \tau.
This algorithm joins state spaces for epochs a and b at time \tau, producing a time-homogeneous graph with states a and b that retains expected waiting times in each state:
Construct state space of epoch a as graph G^a using Algorithm [alg:generic-state-space-algorithm].
Use Algorithm [alg:dph_pmf] to compute the probabilities of being at each vertex of G^a at time \tau, \bm{s}(\tau), and the accumulated waiting time in each state up to \tau, \bm{a}(\tau).
Connect vertices in G^a to their counterparts in epoch b by edges with weights \bm{s}(\tau) / \bm{a}(\tau) (element-wise).
Add remaining edges between added states using Algorithm [alg:generic-state-space-algorithm].
Let V be the vertices, unvisited be a subset of vertices, and E be the edges. Let W be the weight function E\rightarrow\mathbb{R}. Let f be a bijection between states and vertices, and f^{-1} be its inverse. Denote the starting vertex as S. User-defined functions: \textsc{Transitions}(state) returns the states reachable from state; \textsc{Rate}(state_from, state_to) returns the transition rate from state_from to state_to.
The epoch-joining procedure in pseudocode: G^A \gets \Call{GenerateStateSpace}{\textsc{TransitionsA}, \textsc{RateA}}; run \Call{DistributionFunction}{G^A, \tau} to obtain each vertex probability \texttt{v}.p and accumulated waiting time. For each vertex \texttt{v} of G^A, look up the corresponding epoch-b states via \Call{TransitionsB}{f^{-1}(\texttt{v})}; if the counterpart vertex \texttt{v}' does not yet exist, add it (V \gets V\cup\{\texttt{v}'\}); then add the linking edge (\texttt{v} \rightarrow \texttt{v}') with weight \texttt{v}.p divided by the accumulated waiting time at \texttt{v}, and return the combined graph G.
Algorithm [alg:generic-state-space-algorithm] (\textsc{GenerateStateSpace}): initialize V \gets \{\texttt{S}\}, unvisited \gets \{\texttt{S}\}, E \gets \emptyset, W \gets \emptyset. While unvisited is not empty, take any vertex \texttt{v} from unvisited and set unvisited \gets unvisited\setminus \{\texttt{v}\}. For each state returned by \Call{Transitions}{f^{-1}(\texttt{v})}: if the state has no vertex yet, add a new vertex \texttt{z} with V \gets V\cup\{\texttt{z}\}, unvisited \gets unvisited \cup \{\texttt{z}\}, and f \gets f \cup \{state \mapsto \texttt{z}\}; otherwise let \texttt{z} \gets f(state). Add the edge E \gets E \cup \{(\texttt{v} \rightarrow \texttt{z})\} with weight W \gets W\cup \{(\texttt{v} \rightarrow \texttt{z})\mapsto \Call{Rate}{f^{-1}(\texttt{v}),f^{-1}(\texttt{z})}\}. Return the graph (V, E) and weight function W.
Empirical running time
We describe empirical running time of state space construction, moment computation, and cumulative distribution computation in the rabbit island model (Figure 1) and in the coalescent, a standard model in population genetics. The coalescent describes the decreasing number of ancestors to a sample of DNA sequences as time progresses into the past (Kingman 1982). All experiments were run on a MacBook Pro with an i7 processor using a single core.
Although all phasic library features are exposed as R functions, the similar C API offers an efficient alternative for generating very large state spaces. Figures 4A and 4E show the time to construct the two state spaces for different numbers of rabbits and DNA samples using both R and C APIs. For example, for 1000 rabbits (more than half a million states), we can construct the state space and compute expectation in two minutes (not shown).
We compare the running time of phasic to PhaseTypeR (Rivas-González, Andersen, and Hobolth 2022), which implements the same functionality using matrix computations. PhaseTypeR uses expm::expm() (Goulet et al. 2021) for exponentiation. We evaluated the exponentiation algorithms in expm and use PadeRBS, which performs well for both models. We further modified PhaseTypeR to use the matrixdist R package, which wraps the C++ Armadillo library, for matrix inversion. To our knowledge, this makes PhaseTypeR the fastest available matrix-based approach. Plots in Figure 4 show the time phasic and PhaseTypeR use to compute the expectation, 100 higher moments or cross-moments (e.g., a 10×10 covariance matrix), and the CDF for 100 evenly spaced values up to cumulative probability 0.95. Panels B-D and F-H show computation times for the rabbit and coalescent models, respectively.
{fig-experiments}
An application in population genetics
To demonstrate phasic capabilities, we present an example from population genetics. Kern and Hey (2017) describe and implement a method for computing the joint site frequency spectrum (JSFS) for an isolation-with-migration (IM) model. A site frequency spectrum (SFS) is the number of single-base genetic variants shared by 1, 2, \ldots, n-1 sequences among n sequences sampled from a population. The JSFS tabulates the number of variants shared by i sequences in one population and j in another. The JSFS matrix can be obtained empirically and is determined by the population sizes of the two populations and the common ancestral population, migration rate between populations, and the time the populations have been separated (Figure 5A).
A genetic variant appearing in three samples from population one and four from population two arose from a mutation on a genealogical lineage with three descendants in population one and four in population two. Knowing the mutation rate, the JSFS is given by the expected length of branches with different numbers of descendants in each population.
The structured coalescent models the length of ancestral branches. Formulated as a continuous phase-type distribution, states represent ancestral lineages with particular numbers (i,j) of descendants in each population. The initial state represents i and j present-day individual lineages, each with a single descendant. The absorbing state is where all lineages have found a single common ancestor. Absorption time corresponds to time to the most recent common ancestor (TMRCA).
The IM model is not time-homogeneous as it changes from two separate populations to a shared ancestral population at time T. Using phasic, we can easily model this with one time-homogeneous continuous phase-type distribution truncated at time T and another time-homogeneous distribution representing the system after time T, as described in Section 2.12.
Expected length of genealogical branches with i and j descendants from the two populations is readily computed after appropriately reward transforming the two phase-type distributions. With seven samples from each population, the two state spaces have 123,135 and 2,999 states.
Kern and Hey (2017) computed the JSFS in around 15 minutes using 12 cores. For comparison, state space construction and JSFS computation take only 35 seconds on a single core using phasic (Figure 5B). This is particularly noteworthy because phasic is a general-purpose library not tailored to this specific problem. Code for model construction and JSFS computation is available with phasic on GitHub.
{fig-im-model}
Discussion
This paper presents a new open-source library called phasic written in C that implements graph-based algorithms for constructing and reward-transforming continuous and discrete phase-type distributions, and for computing their moments and distribution functions. Moment computation extends to multivariate phase-type distributions. The library has native interfaces in C and R. While some methods build on previously published graph-based matrix manipulation, to our knowledge, this is the first time these graph-based approaches have been applied to phase-type distributions and published as an accessible software package.
The joint distribution function of multivariate discrete phase-type distributions can be described by a recursive algorithm similar to the one we presented for marginal distribution functions. However, this application requires tracking not just the probability of visiting a vertex but also the rewards accumulated at that point for each vertex. Although computationally infeasible in most situations, such an algorithm would be useful for small state spaces or accumulated rewards and could be an avenue for future work.
Although phasic provides some support for time-inhomogeneous phase-type distributions, this is achieved by discrete changes to time-homogeneous models. New algorithms are required to provide full support for time-inhomogeneous phase-type distributions in a graph framework.
We have demonstrated that our graph representation is orders of magnitude faster than matrix-based approaches. Straightforward iterative construction of state spaces enables powerful modeling, and our algorithms allow computation of moments and distributions for huge state spaces. General multivariate phase-type distributions allow marginal expectations and covariance between events to be studied easily. Since phasic includes functions for converting between graph and matrix representations, our library may serve as a plug-in in a multifaceted modeling and inference process. We hope this package will enable users to quickly and accessibly model many complex phenomena in natural sciences, including population genetics.
Acknowledgements
We are grateful to two anonymous reviewers. Their useful comments and constructive suggestions helped improve a previous version of the manuscript.