UnClone - An unofficial PyClone and PhyClone Clone for Clonal Analysis
unclone is an unofficial reimplementation of PyClone-VI and PhyClone. It is written as a Crystal CLI with a Rust kernel.
Scope
- vi: PyClone-VI style variational inference
- phy run: PhyClone-style tree trace generation (JSONL)
- phy map: MAP-like summary from phy trace
- phy consensus: topology + clade consensus summary
- phy topology-report: topology support summary from phy trace
- TSV input with PyClone-VI-compatible core fields
- deterministic runs with fixed seeds
- Rust-side parallelism with --kernel-threads
Mode maturity
- PyClone-VI mode is near-parity with the original PyClone-VI implementation.
- phy commands are experimental.
- phy commands are not intended to reproduce upstream PhyClone results exactly.
Release Binaries
GitHub release builds may include separate x86_64 baseline and v3 archives.
- baseline: built with x86-64-v2; use this for wider compatibility on older x86_64 CPUs.
- v3: built with x86-64-v3; use this on newer x86_64 CPUs with AVX2/FMA support for faster VI hot paths.
If unsure, start with baseline. On Apple Silicon, the aarch64 build already uses the platform baseline SIMD/FMA support and does not need a separate v3 variant.
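As a rough aid for choosing between the two archives on Linux, the sketch below scans `/proc/cpuinfo`-style text for the `avx2` and `fma` flags named above. The function name `suggest_archive` is hypothetical and not part of unclone. It is only a heuristic: the x86-64-v3 level requires more features than these two (for example BMI1/BMI2, F16C, LZCNT, and MOVBE), so a "v3" answer here is a hint, not a guarantee.

```python
def suggest_archive(cpuinfo_text):
    """Suggest 'v3' if both avx2 and fma are listed, else 'baseline'.

    Heuristic only: x86-64-v3 also requires other features not checked here.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Linux lists CPU features on lines of the form "flags : fpu vme ..."
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    return "v3" if {"avx2", "fma"} <= flags else "baseline"


# Illustrative input; on a real Linux host, pass the contents of /proc/cpuinfo.
print(suggest_archive("flags\t\t: fpu sse2 avx avx2 fma"))
```

When the flags line is absent (for example on non-Linux systems), the function falls back to "baseline", which matches the advice above for the unsure case.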
Build
Requirements:
- Crystal
- Rust / Cargo
- make
Main workflows are exposed through the Makefile.
make build
For an optimized Crystal CLI build, use:
make build release=1
The resulting binary is bin/unclone.
By default, make build avoids build-machine-specific CPU tuning. This keeps the resulting binary suitable for ordinary local use, CI, and redistribution to similar systems.
For a local machine only, you can ask Rust to tune the kernel for the current CPU:
make build release=1 cpu=native
For distribution builds, prefer an explicit CPU level instead of native:
make build release=1 cpu=x86-64-v2
make build release=1 cpu=x86-64-v3
Additional Rust codegen flags can still be passed through RUSTFLAGS:
RUSTFLAGS="-C opt-level=3" make build release=1 cpu=native
Test
make test
Lint
make lint
Run
Variational inference:
./bin/unclone vi -i ../pyclone-vi/examples/synthetic.tsv -o out.tsv
Deterministic VI run:
./bin/unclone vi -i ../pyclone-vi/examples/synthetic.tsv -o out.tsv -c 4 -d beta-binomial -g 21 -r 2 --max-iters=200 --precision=1000 --seed=7 --kernel-threads=1 --restart-parallelism=1 --print-freq=0
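One way to check determinism locally is to run the same fixed-seed command twice and compare the output files. The sketch below uses only flags shown in this README. The helper names (`vi_command`, `sha256_of`, `runs_match`) and the output file names are hypothetical, and treating "deterministic" as "byte-identical output" is an assumption rather than a documented guarantee.

```python
import hashlib
import subprocess


def vi_command(in_file, out_file, seed):
    """Build a single-threaded, fixed-seed VI command line."""
    return [
        "./bin/unclone", "vi", "-i", in_file, "-o", out_file,
        f"--seed={seed}", "--kernel-threads=1",
        "--restart-parallelism=1", "--print-freq=0",
    ]


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def runs_match(in_file, seed=7):
    """Run VI twice with the same seed and compare the two output files."""
    outputs = ["run_a.tsv", "run_b.tsv"]  # hypothetical file names
    for out in outputs:
        subprocess.run(vi_command(in_file, out, seed), check=True)
    return sha256_of(outputs[0]) == sha256_of(outputs[1])
```

Calling `runs_match("../pyclone-vi/examples/synthetic.tsv")` requires a built `bin/unclone`; `vi_command` alone can be inspected without running anything.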
Phy workflow:
./bin/unclone phy run -i input.tsv -o trace.jsonl --num-iters=50 --num-chains=2 --num-particles=16 --burnin=1000 --seed=7
./bin/unclone phy map -i trace.jsonl -o map.json
./bin/unclone phy consensus -i trace.jsonl -o consensus.json --consensus-threshold=0.5
./bin/unclone phy topology-report -i trace.jsonl -o topology_report.json
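To see how many records a trace ended up with, a small reader is enough. The sketch below assumes only that the trace is JSONL in the usual sense (one JSON value per non-empty line); it makes no assumption about the fields inside each record, which this README does not document. The name `count_records` is hypothetical.

```python
import io
import json


def count_records(handle):
    """Count non-empty lines, checking that each one parses as JSON."""
    count = 0
    for line in handle:
        line = line.strip()
        if line:
            json.loads(line)  # raises ValueError on malformed JSON
            count += 1
    return count


# Stand-in data; for a real trace use: with open("trace.jsonl") as fh: ...
print(count_records(io.StringIO('{"a": 1}\n{"a": 2}\n')))
```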
Expected input columns are:
- mutation_id
- sample_id
- ref_counts
- alt_counts
- major_cn
- minor_cn
- normal_cn
Optional columns:
- tumour_content (default 1.0)
- error_rate (default 0.001)
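A pre-flight check on the header can catch column problems before a run. The sketch below is not unclone's parser. It is a hypothetical helper (`load_rows`) that checks the required column names listed above and fills the two optional columns with their stated defaults. The data row is made up for illustration.

```python
import csv
import io

REQUIRED = ["mutation_id", "sample_id", "ref_counts", "alt_counts",
            "major_cn", "minor_cn", "normal_cn"]
# Optional columns and the defaults listed above.
OPTIONAL_DEFAULTS = {"tumour_content": "1.0", "error_rate": "0.001"}


def load_rows(handle):
    """Read a TSV, check required columns, and fill optional defaults."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {', '.join(missing)}")
    rows = []
    for row in reader:
        for name, default in OPTIONAL_DEFAULTS.items():
            if not row.get(name):  # column absent or cell empty
                row[name] = default
        rows.append(row)
    return rows


# Made-up single-mutation input, for illustration only.
example = (
    "mutation_id\tsample_id\tref_counts\talt_counts\tmajor_cn\tminor_cn\tnormal_cn\n"
    "m1\ts1\t90\t10\t1\t1\t2\n"
)
rows = load_rows(io.StringIO(example))
print(rows[0]["tumour_content"], rows[0]["error_rate"])
```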
Larger VI run:
./bin/unclone vi -i ../pyclone-vi/examples/tracerx.tsv -o out.tsv -c 40 -d beta-binomial -r 2 --precision=200 --seed=7 --print-freq=0
Common Options
- -i, --in-file: input TSV
- -o, --out-file: output TSV
- -c, --num-clusters: cluster cap
- -d, --density: binomial or beta-binomial
- --seed: fixed seed for reproducibility
- --print-freq: progress output frequency
VI-only:
- -g, --num-grid-points: CCF grid size
- -r, --num-restarts: number of restarts
- --max-iters: maximum VI iterations
- --mix-weight-prior: Dirichlet prior weight
- --precision: beta-binomial precision
- --kernel-threads: Rust kernel parallelism
- --restart-parallelism: outer restart parallelism
- --debug-init-file: debug-only JSON file with pi, theta, z arrays for same-initial-state validation
Phy run only:
- -b, --burnin: burn-in iterations
- --num-iters: main-chain MCMC iterations
- --num-chains: number of chains
- --num-particles: particle count
- --thin: trace thinning interval
- --resample-threshold: SMC ESS resampling threshold
- -p, --proposal: bootstrap, fully-adapted, or semi-adapted
- -s, --subtree-update-prob: subtree PG probability
- --num-samples-data-point: data-point Gibbs passes per iteration
- --num-samples-prune-regraph: prune-regraph passes per iteration
- --concentration-update, --no-concentration-update: concentration update toggle
- --concentration-value: initial concentration value
- --grid-size: CCF grid size for the exact outlier model
- -l, --outlier-prob: fallback outlier prior
- -t, --max-time: maximum runtime in seconds
- -c, --cluster-file: optional cluster assignment TSV
Python helper selection:
UNCLONE_PYTHON: default Python executable for helper scripts
Notes:
- vi --python-compatible still expects a Python 3 executable with NumPy support for default_rng
Diagnostics
Restart diagnostics for VI:
PCV_DEBUG_RESTART_METRICS_FILE=restart_metrics.csv \
./bin/unclone vi -i ../pyclone-vi/examples/tracerx.tsv -o out.tsv -c 40 -d beta-binomial -r 2 --precision=200 --seed=7 --print-freq=1
This writes one row per restart with:
- restart
- seed
- final_elbo
- used_clusters
- is_best
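The metrics file can be inspected with a few lines of Python. The sketch below assumes the CSV is comma-delimited with a header row using the column names listed above, which this README does not state explicitly. It also avoids relying on the `is_best` column, since its encoding is not documented here, and picks the highest `final_elbo` instead. The helper name `best_restart` and all sample values are made up.

```python
import csv
import io


def best_restart(handle):
    """Return the row with the highest final_elbo."""
    rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError("no restart rows found")
    return max(rows, key=lambda r: float(r["final_elbo"]))


# Made-up values, for illustration only.
sample = io.StringIO(
    "restart,seed,final_elbo,used_clusters,is_best\n"
    "0,7,-1250.5,3,0\n"
    "1,8,-1190.2,4,1\n"
)
best = best_restart(sample)
print(best["restart"], best["final_elbo"])
```

For a real run, replace the `io.StringIO` stand-in with `open("restart_metrics.csv", newline="")`.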
Optional kernel profiling:
PCV_PROFILE=1 \
./bin/unclone vi -i ../pyclone-vi/examples/tracerx.tsv -o out.tsv -c 4 -d beta-binomial -r 1 --precision=200 --seed=7 --print-freq=0
This prints aggregated timings to stderr for:
- initial ELBO
- update_z
- update_pi
- update_theta
- iterative ELBO recomputation
Debug-only initial value injection:
./bin/unclone vi -i ../pyclone-vi/examples/synthetic.tsv -o out.tsv -c 4 -g 21 -r 1 --debug-init-file=init.json --print-freq=0
The JSON file must contain flat pi, theta, and z arrays matching:
- pi: num_clusters
- theta: num_clusters * num_samples * num_grid_points
- z: num_mutations * num_clusters
This hook is intended for implementation comparison and fairness checks, not normal runs.
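The array lengths can be worked through with a small generator. The sketch below builds flat `pi`, `theta`, and `z` arrays of the required sizes for `-c 4 -g 21`. The sample count (2) and mutation count (10) are hypothetical and must match the actual input. Uniform values, the normalization, and the element ordering within each flat array are assumptions, since this README specifies only the lengths. The name `build_init` is hypothetical.

```python
import json


def build_init(num_clusters, num_samples, num_grid_points, num_mutations):
    """Return flat pi, theta, z arrays with the documented lengths."""
    pi = [1.0 / num_clusters] * num_clusters
    # Uniform over grid points for every (cluster, sample) pair (assumed layout).
    theta = [1.0 / num_grid_points] * (num_clusters * num_samples * num_grid_points)
    # Uniform cluster responsibility for every mutation (assumed layout).
    z = [1.0 / num_clusters] * (num_mutations * num_clusters)
    return {"pi": pi, "theta": theta, "z": z}


init = build_init(num_clusters=4, num_samples=2, num_grid_points=21, num_mutations=10)
payload = json.dumps(init)  # write this string to init.json
print(len(init["pi"]), len(init["theta"]), len(init["z"]))
```

With these values the lengths are 4, 4 * 2 * 21 = 168, and 10 * 4 = 40.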
Current status
- Crystal CLI and Rust kernel are wired end to end
- VI entry point is available and near-parity with the original PyClone-VI implementation
- phy run/map/consensus/topology-report workflow is available with JSONL/JSON outputs
- VI restart selection uses best ELBO
- Rust hot paths use Rayon when enabled
- output rows are inference-derived and cluster IDs are compactly renumbered
- tests cover Rust units, Crystal specs, and a deterministic golden output check
PhyClone Notes
- The phy implementation is separate from upstream PhyClone.
- Exact matching of PhyClone posterior probabilities, sampler behavior, traces, and output files is not planned.
- In phy run, num_iters is total iterations, and the recorded trace length is what remains after --burnin and --thin are applied.
- consensus output includes clade support and a consensus_tree reconstruction, while representative topology fields remain for compatibility.
- loss prior supports cellular_prevalence-informed assignment via --cluster-file metadata.
- trace and post-process outputs use unclone's JSONL/JSON formats.
Attribution And License
unclone is an unofficial reimplementation of the methods below. Cite the original papers, not unclone.
- PyClone-VI — Roth-Lab/pyclone-vi — Gillis & Roth, BMC Bioinformatics 2020. doi:10.1186/s12859-020-03919-2
- PhyClone — Roth-Lab/PhyClone — Hurtado, Bouchard-Côté & Roth, Bioinformatics 2025. doi:10.1093/bioinformatics/btaf344
- PyClone — Roth-Lab/pyclone — Roth et al., Nature Methods 2014. doi:10.1038/nmeth.2883
Upstream is GPL v3 or later; unclone is GPL v3 or later as well.
- June 21, 2026
GNU General Public License v3.0 only
Sun, 21 Jun 2026 05:09:45 GMT