Model sharing

Warning: Under Construction

This notebook is under development.

Once you have recorded an elimination trace for a model, you may want to share it with collaborators or the wider research community. The trace_repository module provides a decentralized sharing system built on IPFS (InterPlanetary File System) — a content-addressed peer-to-peer network. Content is identified by cryptographic hashes (CIDs), not by server URLs, so traces remain available as long as any node pins them.

The system uses progressive enhancement:

Tier Configuration How traces are fetched
1 — Zero config Nothing installed Public HTTP gateways (dweb.link, ipfs.io, …)
2 — Auto-start IPFS binary installed Python auto-starts the daemon; faster, peer-to-peer
3 — Optimal IPFS running as a system service Instant access, you also seed content for others

This tutorial walks through the full sharing workflow: checking backend status, downloading traces, pinning and unpinning content, inspecting where a trace is loaded from, and retracting faulty models.

Setup

from phasic import Graph
from phasic.trace_repository import (
    IPFSBackend,
    TraceRegistry,
    TransportBackend,
)

Checking backend status

Before sharing anything, it helps to know the state of your local IPFS environment. The IPFSBackend.status() method returns a dictionary summarising the connection:

backend = IPFSBackend()
status = backend.status()

print("IPFS Backend Status")
print("=" * 40)
print(f"IPFS installed:    {status['ipfs_installed']}")
print(f"Daemon running:    {status['daemon']}")
print(f"Daemon version:    {status['daemon_version'] or 'N/A'}")
print(f"HTTP gateways:     {len(status['gateways'])} configured")
for gw in status['gateways']:
    print(f"  - {gw}")
IPFS Backend Status
========================================
IPFS installed:    True
Daemon running:    True
Daemon version:    0.38.1
HTTP gateways:     4 configured
  - https://ipfs.io
  - https://cloudflare-ipfs.com
  - https://dweb.link
  - https://gateway.pinata.cloud

If daemon is False, the backend falls back to HTTP gateways automatically. If IPFS is installed but the daemon is not running, IPFSBackend will attempt to start it for you (unless you pass auto_start=False).
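If you want to stay in gateway-only mode, for example on a machine where background processes are not allowed, you can disable the auto-start behaviour explicitly. The sketch below assumes only what the note above states: that the IPFSBackend constructor accepts auto_start and that status() reports the daemon state.

# Force tier 1: do not start a daemon, rely on public HTTP gateways only
# (auto_start is described above; treat this as a sketch, not a reference)
gateway_backend = IPFSBackend(auto_start=False)

status = gateway_backend.status()
if not status["daemon"]:
    print("No daemon in use; traces will be fetched via HTTP gateways:")
    for gw in status["gateways"]:
        print(f"  - {gw}")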

To start the daemon manually from a terminal:

ipfs daemon &

Or, for a system-wide service on Linux:

sudo systemctl enable ipfs
sudo systemctl start ipfs

Browsing the trace registry

The TraceRegistry maintains a JSON index of all published traces. The index is hosted on GitHub and is downloaded (and cached locally) the first time you create a registry instance.

registry = TraceRegistry()

# List all available traces
traces = registry.list_traces()
print(f"Total traces available: {len(traces)}")
for t in traces[:5]:  # show first 5
    print(f"  {t['trace_id']:30s}  {t.get('description', '')}")
Total traces available: 5
  coalescent_n10_theta1           Standard Kingman coalescent for n=10 haploid samples with theta parameter
  coalescent_n15_theta1           Standard Kingman coalescent for n=15 haploid samples with theta parameter
  coalescent_n20_theta1           Standard Kingman coalescent for n=20 haploid samples with theta parameter
  coalescent_n3_theta1            Standard Kingman coalescent for n=3 haploid samples with theta parameter
  coalescent_n5_theta1            Standard Kingman coalescent for n=5 haploid samples with theta parameter
# Filter by domain and model type
popgen = registry.list_traces(
    domain="population-genetics",
    model_type="coalescent"
)
print(f"Population-genetics coalescent traces: {len(popgen)}")
for t in popgen:
    print(f"  {t['trace_id']:30s}  vertices={t.get('vertices', '?')}")
Population-genetics coalescent traces: 5
  coalescent_n10_theta1           vertices=10
  coalescent_n15_theta1           vertices=15
  coalescent_n20_theta1           vertices=20
  coalescent_n3_theta1            vertices=3
  coalescent_n5_theta1            vertices=5
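list_traces() also accepts a tags filter (see the summary table at the end of this notebook). A minimal sketch; the tag value here is illustrative and the exact matching semantics may differ:

# Filter by tag (tag value chosen for illustration)
tagged = registry.list_traces(tags=["migration"])
print(f"Traces tagged 'migration': {len(tagged)}")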

Downloading a trace

Use get_trace() to download and deserialise a trace. If the trace is already in your local cache it is loaded instantly.

# Download by ID (cached automatically)
# trace = registry.get_trace("coalescent_n5_theta1")
# print(f"Loaded trace: {trace.n_vertices} vertices, {len(trace.operations)} operations")

# Or use the convenience function:
# from phasic import get_trace_by_hash
# trace = get_trace_by_hash("coalescent_n5_theta1")

print("(Uncomment the lines above when the registry is populated with traces.)")
(Uncomment the lines above when the registry is populated with traces.)

Inspecting the source of a trace

When working with shared models, it is useful to know where a trace would be loaded from — local cache, the IPFS daemon, or a public HTTP gateway. The trace_source() method tells you without actually downloading anything.

# Demonstrate trace_source() using a mock registry
import json, tempfile
from pathlib import Path

# Create a temporary registry with one trace
tmp = Path(tempfile.mkdtemp())
demo_registry = {
    "version": "1.0.0",
    "traces": {
        "coalescent_n5": {
            "cid": "QmYwAPJzv5CZsnN625s3Xf2nemtYgPpHdWEz79ojWnPbdG",
            "description": "Coalescent model, n=5",
            "metadata": {"domain": "population-genetics", "model_type": "coalescent"}
        }
    }
}
(tmp / "registry.json").write_text(json.dumps(demo_registry))

reg = TraceRegistry(cache_dir=tmp, auto_update=False)
info = reg.trace_source("coalescent_n5")

print("Trace source info")
print("=" * 40)
for key, val in info.items():
    print(f"  {key:15s}: {val}")
Trace source info
========================================
  trace_id       : coalescent_n5
  cid            : QmYwAPJzv5CZsnN625s3Xf2nemtYgPpHdWEz79ojWnPbdG
  cache_path     : /var/folders/s6/srs8qkh52w1_h32d65z95tth0000gn/T/tmpew9w3b0f/traces/coalescent_n5/trace.json.gz
  cached         : False
  backend        : ipfs
  private_network: False
  source         : ipfs_daemon

The source field tells you where the next get_trace() call will fetch data from:

source Meaning
local_cache Already downloaded; loaded from ~/.phasic_cache/
ipfs_daemon Will be fetched from the local IPFS daemon (fast, peer-to-peer)
http_gateway Will be fetched from a public HTTP gateway (slower, but zero-config)

If you are using a custom TransportBackend (e.g. S3 or GCS), the source will be the backend’s name property.
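One practical use of trace_source() is deciding whether a download is worth starting over a slow gateway. A minimal sketch, reusing the reg instance and demo registry from the cell above; nothing is fetched here:

# Inspect the source before committing to a download
info = reg.trace_source("coalescent_n5")

if info["source"] == "local_cache":
    print("Already cached; get_trace() will be instant.")
elif info["source"] == "ipfs_daemon":
    print("Will be fetched peer-to-peer from the local IPFS daemon.")
else:  # http_gateway, or the name of a custom backend
    print(f"Will be fetched via '{info['source']}'; expect a slower download.")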

Pinning a model for sharing

On IPFS, content is garbage-collected unless someone explicitly pins it. Pinning tells your local IPFS node to keep the data until you unpin it, and, because IPFS is peer-to-peer, anyone who requests the same CID can fetch it from your node.

In other words: pinning a model = hosting it for the community.

backend = IPFSBackend()  # requires running daemon

cid = "QmYwAPJzv5CZsnN625s3Xf2nemtYgPpHdWEz79ojWnPbdG"

# Pin a CID so it persists on your node
backend.pin(cid)
print(f"Pinned: {cid}")

# Check pinning status
print(f"Is pinned? {backend.is_pinned(cid)}")

To pin all traces from the registry at once you could write:

for t in registry.list_traces():
    backend.pin(t['cid'])
    print(f"Pinned {t['trace_id']}")

This makes your machine a mirror for the entire model library.
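On a large registry some pins may fail, for example when a CID is not yet reachable from your node. A slightly more defensive sketch of the same loop, using the PTDBackendError exception that also appears in the live demo below:

from phasic.exceptions import PTDBackendError

pinned, failed = 0, 0
for t in registry.list_traces():
    try:
        backend.pin(t['cid'])
        pinned += 1
    except PTDBackendError as e:
        failed += 1
        print(f"Could not pin {t['trace_id']}: {e}")

print(f"Pinned {pinned} traces ({failed} failed)")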

Unpinning

If you no longer want to host a particular trace, unpin it. After unpinning, the IPFS garbage collector may reclaim the disk space.

backend.unpin(cid)
print(f"Unpinned: {cid}")
print(f"Is pinned? {backend.is_pinned(cid)}")  # -> False

Unpinning only affects your node. If other nodes also pin the same CID, the trace remains available on the network.

Live demo (requires IPFS daemon)

The cell below demonstrates the full pin/unpin cycle. It will only succeed if you have the IPFS daemon running. If the daemon is not available the cell prints an explanation instead.

from phasic.exceptions import PTDBackendError

backend = IPFSBackend()
cid = "QmYwAPJzv5CZsnN625s3Xf2nemtYgPpHdWEz79ojWnPbdG"

if backend.status()["daemon"]:
    # --- Full pin / unpin cycle ---
    print("Daemon is running — demonstrating pin/unpin cycle:\n")
    try:
        backend.pin(cid)
        print(f"1. Pinned  {cid[:20]}...")
        print(f"   is_pinned = {backend.is_pinned(cid)}")

        backend.unpin(cid)
        print(f"\n2. Unpinned {cid[:20]}...")
        print(f"   is_pinned = {backend.is_pinned(cid)}")
    except PTDBackendError as e:
        print(f"Pin/unpin failed (CID may not exist on the network yet):\n  {e}")
        print("\nThis is expected if the CID has not been published.")
        print("With a real published trace, pin() and unpin() work instantly.")
else:
    print("No IPFS daemon detected.")
    print("To try this demo, start the daemon with:")
    print("  ipfs daemon &")
    print("\nWithout a daemon, traces are still available via HTTP gateways.")
Daemon is running — demonstrating pin/unpin cycle:

Pin/unpin failed (CID may not exist on the network yet):
  Failed to pin QmYwAPJzv5CZsnN625s3Xf2nemtYgPpHdWEz79ojWnPbdG: ReadTimeout: HTTPConnectionPool(host='127.0.0.1', port=5001): Read timed out. (read timeout=5)

This is expected if the CID has not been published.
With a real published trace, pin() and unpin() work instantly.

Publishing a trace

To share a trace you have recorded, use publish_trace(). This uploads the trace to IPFS and optionally submits a pull request to the registry repository so that others can discover it.

# Example publish workflow (requires IPFS daemon)
# ------------------------------------------------
#
# from phasic.trace_elimination import record_elimination_trace
#
# # 1. Build a model and record a trace
# graph = Graph(my_callback, theta_dim=2)
# trace = record_elimination_trace(graph, theta_dim=2)
#
# # 2. Publish to IPFS + registry
# cid = registry.publish_trace(
#     trace=trace,
#     trace_id="my_model_v1",
#     metadata={
#         "description": "Population genetics model with migration",
#         "domain": "population-genetics",
#         "model_type": "structured-coalescent",
#         "param_length": 2,
#         "vertices": trace.n_vertices,
#         "tags": ["migration", "two-population"]
#     },
#     submit_pr=True   # opens a GitHub PR to add the trace to the registry
# )
# print(f"Published with CID: {cid}")
#
# # 3. Pin so your node serves it
# backend.pin(cid)

print("(Uncomment the code above to publish a real trace.)")
(Uncomment the code above to publish a real trace.)

Retracting a faulty model

Mistakes happen — a trace might be built from a buggy callback, the wrong number of samples, or a corrupted graph. The retract_trace() method marks a trace as retracted in your local registry cache. After retraction:

  • list_traces() skips retracted entries
  • get_trace() raises an error with the retraction reason
  • The cached trace file is deleted to prevent accidental use

To retract a trace globally (for all users), submit a PR to the registry repository that adds "retracted": true to the trace entry in registry.json.
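For reference, a retracted entry in registry.json could look like the Python dict below (the same shape the demo registries in this notebook are built from). The "retracted" flag is the field mentioned above; the name of the reason field is an assumption made for illustration only.

# Sketch of a retracted registry entry (dict that would be serialised to registry.json)
retracted_entry = {
    "buggy_model": {
        "cid": "QmBuggyCID",
        "description": "Model with incorrect rate calculation",
        "metadata": {"domain": "population-genetics"},
        "retracted": True,                                 # documented flag
        "retraction_reason": "Incorrect coalescent rate"   # illustrative field name
    }
}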

import json, tempfile
from pathlib import Path
from phasic.trace_repository import TraceRegistry
from phasic.exceptions import PTDBackendError

# Set up a demo registry with two traces
tmp = Path(tempfile.mkdtemp())
demo_reg = {
    "version": "1.0.0",
    "traces": {
        "good_model": {
            "cid": "QmGoodCID",
            "description": "Well-tested coalescent model",
            "metadata": {"domain": "population-genetics"}
        },
        "buggy_model": {
            "cid": "QmBuggyCID",
            "description": "Model with incorrect rate calculation",
            "metadata": {"domain": "population-genetics"}
        }
    }
}
(tmp / "registry.json").write_text(json.dumps(demo_reg))

reg = TraceRegistry(cache_dir=tmp, auto_update=False)

# Before retraction
print("Before retraction:")
for t in reg.list_traces():
    print(f"  {t['trace_id']:20s} {t['description']}")

# Retract the buggy model
reg.retract_trace("buggy_model", reason="Incorrect coalescent rate: n*(n-1) instead of n*(n-1)/2")
print("\nRetracted 'buggy_model'.")

# After retraction: list_traces() skips it
print("\nAfter retraction:")
for t in reg.list_traces():
    print(f"  {t['trace_id']:20s} {t['description']}")

# get_trace() raises with the reason
try:
    reg.get_trace("buggy_model")
except PTDBackendError as e:
    print(f"\nget_trace() error:\n  {e}")
Before retraction:
  good_model           Well-tested coalescent model
  buggy_model          Model with incorrect rate calculation

Retracted 'buggy_model'.

After retraction:
  good_model           Well-tested coalescent model

get_trace() error:
  Trace 'buggy_model' has been retracted: Incorrect coalescent rate: n*(n-1) instead of n*(n-1)/2
  This trace is no longer available for download.

Cache management

Downloaded traces are cached in ~/.phasic_cache/ by default. Individual entries or the whole cache can be removed with clear_cache().
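Before clearing anything you may want to see what is on disk. The sketch below simply walks the default cache directory; the traces/<trace_id>/trace.json.gz layout matches the paths shown earlier in this notebook, but it should be treated as an implementation detail rather than a stable API.

from pathlib import Path

# Default cache location; adjust if you passed cache_dir= to TraceRegistry
cache_dir = Path.home() / ".phasic_cache" / "traces"

if cache_dir.exists():
    for entry in sorted(cache_dir.iterdir()):
        size = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
        print(f"  {entry.name:30s} {size / 1024:.1f} KiB")
else:
    print(f"No cache directory at {cache_dir}")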

import json, gzip, tempfile
from pathlib import Path
from phasic.trace_repository import TraceRegistry

# Set up a cache with two fake traces
tmp = Path(tempfile.mkdtemp())
demo_reg = {
    "version": "1.0.0",
    "traces": {
        "trace_a": {"cid": "QmA", "description": "Trace A", "metadata": {}},
        "trace_b": {"cid": "QmB", "description": "Trace B", "metadata": {}},
    }
}
(tmp / "registry.json").write_text(json.dumps(demo_reg))

# Simulate cached files
for tid in ("trace_a", "trace_b"):
    d = tmp / "traces" / tid
    d.mkdir(parents=True)
    with gzip.open(d / "trace.json.gz", "wt") as f:
        json.dump({"placeholder": True}, f)

reg = TraceRegistry(cache_dir=tmp, auto_update=False)

# Clear a single trace
n = reg.clear_cache("trace_a")
print(f"Cleared {n} cache entry (trace_a)")

# Clear everything
n = reg.clear_cache()
print(f"Cleared {n} remaining cache entries")
Cleared 1 cache entry (trace_a)
Cleared 1 remaining cache entries

Using a custom transport backend

If IPFS is not suitable for your environment, you can implement your own TransportBackend. For example, an S3 backend:

from phasic.trace_repository import TransportBackend

class S3Backend(TransportBackend):
    """Example backend that stores traces in an S3 bucket."""

    @property
    def name(self) -> str:
        return "s3"

    def get(self, cid, output_path=None):
        # In a real implementation you would call boto3 here
        print(f"  S3Backend.get({cid!r})")
        content = b"(placeholder content)"
        if output_path is not None:
            from pathlib import Path
            Path(output_path).write_bytes(content)
            return None
        return content

    def add(self, path):
        print(f"  S3Backend.add({path!r})")
        return "s3://my-bucket/traces/example"

# Plug it into the registry
import json, tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
reg_data = {"version": "1.0.0", "traces": {}}
(tmp / "registry.json").write_text(json.dumps(reg_data))

reg = TraceRegistry(cache_dir=tmp, auto_update=False, backend=S3Backend())
print(f"Registry backend: {reg.backend.name}")
Registry backend: s3

Summary

Task Method
Check IPFS status IPFSBackend().status()
List traces registry.list_traces(domain=..., model_type=..., tags=...)
Download a trace registry.get_trace(trace_id)
Check where a trace comes from registry.trace_source(trace_id)
Pin a model for sharing backend.pin(cid)
Unpin a model backend.unpin(cid)
Publish a new trace registry.publish_trace(trace, trace_id, metadata)
Retract a faulty model registry.retract_trace(trace_id, reason)
Clear local cache registry.clear_cache() or registry.clear_cache(trace_id)
Use a different backend TraceRegistry(backend=MyBackend())