GIZMO · substrate-graph inference

A shared biochemistry substrate for multi-omic disease biology.

GIZMO is a multi-source biochemistry graph — Reactome reactions, StringDB PPI, gene-catalysis edges — operated on by a per-patient MAP reconstruction with a Laplacian smoothness prior. Each patient's projection decomposes into a signed β (projection onto the log-PageRank substrate backbone) and an α residual (orthogonal mechanism axis). The substrate is what makes mixed-panel cohorts comparable; the decomposition is what surfaces disease biology that PageRank-only and Cohen's-d-only baselines miss.

Request a demo → · See an example analysis →

capabilities

Six layers of inference on one substrate.

Contextualized signature

Map your hit list onto the graph: metabolites by HMDB / KEGG / PubChem, genes and transcripts by Ensembl / Entrez, proteins by UniProt. Names are matched fuzzily, and each dataset remembers prior manual corrections. Anchors pin reference nodes so that real biomarkers don't drift when the graph is updated.
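A minimal sketch of the mapping order described above. Assumptions: the lookup tables `ID_INDEX` and `NAME_INDEX`, the node names, and the `map_hits` helper are hypothetical, and fuzzy matching is stood in for by Python's standard `difflib.get_close_matches`, not necessarily what GIZMO uses.

```python
import difflib

# Hypothetical lookup tables; a real deployment would load these from the graph bundle.
ID_INDEX = {
    "HMDB0000122": "met:glucose",      # HMDB ID
    "C00031": "met:glucose",           # KEGG compound ID
    "ENSG00000141510": "gene:TP53",    # Ensembl gene ID
    "P04637": "gene:TP53",             # UniProt accession
}
NAME_INDEX = {
    "glucose": "met:glucose",
    "lactate": "met:lactate",
    "tp53": "gene:TP53",
}


def map_hits(hits, corrections, cutoff=0.8):
    """Map raw hit labels to graph nodes (None when nothing matches).

    Precedence: remembered corrections, exact identifier, exact name,
    then the closest fuzzy name match scoring at least `cutoff`.
    """
    mapped = {}
    for hit in hits:
        key = hit.strip()
        name = key.lower()
        if key in corrections:                    # per-dataset memory of manual fixes
            mapped[hit] = corrections[key]
        elif key in ID_INDEX:                     # exact database identifier
            mapped[hit] = ID_INDEX[key]
        elif name in NAME_INDEX:                  # exact (case-insensitive) name
            mapped[hit] = NAME_INDEX[name]
        else:                                     # fuzzy name match via difflib
            close = difflib.get_close_matches(name, list(NAME_INDEX), n=1, cutoff=cutoff)
            mapped[hit] = NAME_INDEX[close[0]] if close else None
    return mapped


result = map_hits(
    ["HMDB0000122", "Lactat", "glu", "mystery compound"],
    corrections={"glu": "met:glutamate"},
)
print(result)
```

Checking remembered corrections first means a curator's fix always wins over an automatic match on the next upload of the same dataset.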

Per-patient MAP projection

Each patient's heterogeneous measurements (metabolites, transcripts, proteins) are projected onto the biochem subgraph by minimising λ·fᵀLf + Σₐ (1/σₐ²)‖PₐKf − xₐ‖², where f is the node signal, L is the graph Laplacian (the smoothness prior), and the sum runs over assays a, each with its own σₐ estimated from that assay's low-variance features. The result is one comparable signal vector per patient on the same ~38k-node biochem backbone.
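Setting the gradient of this objective to zero gives a linear system, (λL + Σₐ wₐ (PₐK)ᵀPₐK) f = Σₐ wₐ (PₐK)ᵀxₐ with wₐ = 1/σₐ². The sketch below builds and solves that system on a toy graph. Assumptions: the sum is read as running over assays, each with its own Pₐ, xₐ, σₐ; Pₐ is taken to be a row-selection matrix for the nodes an assay measures; K is not defined on this page, so it is passed in and set to the identity for the toy. A dense solve is used only for readability.

```python
import numpy as np


def graph_laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an unweighted, undirected graph."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


def map_projection(L, K, assays, lam):
    """Minimise lam * f^T L f + sum_a (1/sigma_a^2) * ||P_a K f - x_a||^2.

    `assays` is a list of (P_a, x_a, sigma_a). The zero-gradient condition
    (lam*L + sum_a w_a (P_a K)^T P_a K) f = sum_a w_a (P_a K)^T x_a,
    with w_a = 1/sigma_a^2, is solved directly here.
    """
    H = lam * L
    b = np.zeros(L.shape[0])
    for P, x, sigma in assays:
        PK = P @ K
        w = 1.0 / sigma**2
        H = H + w * (PK.T @ PK)
        b = b + w * (PK.T @ x)
    return np.linalg.solve(H, b)


# Toy example: 5-node path graph; one assay observes node 0, another observes node 4.
n = 5
L = graph_laplacian(n, [(0, 1), (1, 2), (2, 3), (3, 4)])
K = np.eye(n)                      # assumption: identity operator for the toy
P_met = np.eye(n)[[0]]             # selection matrix: assay 1 measures node 0
P_rna = np.eye(n)[[4]]             # selection matrix: assay 2 measures node 4
f = map_projection(
    L, K,
    [(P_met, np.array([0.0]), 0.5), (P_rna, np.array([1.0]), 0.5)],
    lam=0.1,
)
print(np.round(f, 3))
```

The unobserved interior nodes interpolate linearly between the two ends, and the observed end values are pulled slightly toward each other; λ and σₐ set how strongly the prior competes with the data.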

Signed β / α decomposition

Sequential OLS regresses each patient's projection on log-PageRank and Louvain community indicators. β is the signed projection onto the substrate's hub backbone (an intensity scalar); α is the orthogonal residual (a mechanism axis). In unimodal disease cohorts β tracks severity; in bimodal cohorts (e.g. IDH-mutant vs IDH-wildtype glioma) the sign of β separates subtypes without supervision.
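A minimal sketch of one way to read "sequential OLS". Assumptions: step 1 regresses the projection on an intercept plus log-PageRank, with β as the slope and α as that step's residual; step 2 regresses α on one-hot community indicators. The function name and the synthetic data are illustrative, not GIZMO's code.

```python
import numpy as np


def beta_alpha(f, pagerank, communities):
    """Two-step OLS split of one patient's projection f (one value per node).

    Step 1: regress f on [1, log PageRank]. beta is the slope; alpha is the
            residual, orthogonal to both regressors by construction.
    Step 2: regress alpha on one-hot Louvain community indicators. With
            one-hot columns the OLS coefficients are per-community means.
    """
    x = np.log(pagerank)
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, f, rcond=None)
    beta = coef[1]
    alpha = f - X @ coef
    labels = np.unique(communities)
    C = (communities[:, None] == labels[None, :]).astype(float)
    gamma, *_ = np.linalg.lstsq(C, alpha, rcond=None)
    return beta, alpha, dict(zip(labels.tolist(), gamma.tolist()))


# Synthetic data: a hub-backbone trend (true beta = 2) plus a community-1 shift.
rng = np.random.default_rng(0)
n = 200
pagerank = rng.uniform(1e-5, 1e-3, n)
communities = rng.integers(0, 4, n)
f = 2.0 * np.log(pagerank) + 0.5 * (communities == 1) + rng.normal(0.0, 0.1, n)

beta, alpha, community_effects = beta_alpha(f, pagerank, communities)
print(round(beta, 2), {c: round(v, 2) for c, v in community_effects.items()})
```

Because α is orthogonal to log-PageRank by construction, any community structure left in it cannot be a restatement of hub position.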

Reaction & pathway over-representation

Classical hypergeometric ORA against Reactome pathways, with per-pathway fold enrichment and BH-FDR correction. Foreground = your hit list; background = the measured HMDB IDs (or a community-derived set). Modules from the supervised arm bridge across pathways rather than recapitulating them, and this cross-pathway bridging is the claim that survives a matched-K head-to-head comparison against WGCNA.
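The classical ORA computation can be sketched in pure Python: a hypergeometric upper-tail p-value per pathway, fold enrichment as (k/n)/(K/N), and Benjamini–Hochberg adjustment across pathways. The identifiers and pathway contents below are toy placeholders, and restricting each pathway to the measured background is an assumption about how the background is applied.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):              # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def ora(foreground, background, pathways):
    """Rows of (pathway, k, K, fold, p, q) for each pathway with measured members."""
    bg = set(background)
    fg = set(foreground) & bg
    N, n = len(bg), len(fg)
    rows = []
    for name, members in pathways.items():
        members = set(members) & bg            # only measured members count
        K, k = len(members), len(fg & members)
        if K == 0 or n == 0:
            continue
        fold = (k / n) / (K / N)
        rows.append((name, k, K, fold, hypergeom_sf(k, N, K, n)))
    qvals = bh_adjust([row[4] for row in rows])
    return [row + (q,) for row, q in zip(rows, qvals)]


background = [f"HMDB{i:07d}" for i in range(10)]
foreground = background[:3]
pathways = {"pathway_A": background[:3], "pathway_B": background[3:8]}
for name, k, K, fold, p, q in ora(foreground, background, pathways):
    print(name, k, K, round(fold, 2), p, q)
```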

Anchor-gene retrieval

The win against PageRank-only and Cohen's-d-only baselines is structural, not absolute: GIZMO surfaces a set of disease anchor genes at median PageRank percentile 52 (mid-substrate; Mann–Whitney p=0.002 vs the Cohen's-d null) that neither baseline finds on its own. On flat metrics GIZMO ties DIABLO, MOFA+, and SNF; what the framework uniquely offers is access to these genes, not a higher score.
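The "invisible to either baseline" set can be illustrated as a set difference. Assumptions: each method's anchors are its top-k genes by score, the PageRank-only baseline ranks by PageRank, and the Cohen's-d-only baseline ranks by |d|. The page does not state the actual selection rule, and the Mann–Whitney test against the Cohen's-d null is not reproduced. All names and scores are hypothetical.

```python
import statistics


def top_k(scores, k):
    """The k highest-scoring genes as a set (ties broken by gene name)."""
    return set(sorted(scores, key=lambda g: (-scores[g], g))[:k])


def pagerank_percentile(pagerank):
    """Percentile (0-100) of each node's PageRank within the graph."""
    ordered = sorted(pagerank, key=pagerank.get)
    n = len(ordered)
    return {g: 100.0 * i / (n - 1) for i, g in enumerate(ordered)}


def method_only_anchors(method_scores, pagerank, cohens_d, k):
    """Genes in the method's top-k that neither baseline ranks in its own top-k."""
    pagerank_top = top_k(pagerank, k)
    effect_top = top_k({g: abs(d) for g, d in cohens_d.items()}, k)
    return top_k(method_scores, k) - (pagerank_top | effect_top)


# Toy scores for ten genes; g9 is the biggest hub, g0 and g1 have the largest effects.
genes = [f"g{i}" for i in range(10)]
pagerank = {g: float(i + 1) for i, g in enumerate(genes)}
cohens_d = {g: 0.1 for g in genes}
cohens_d.update({"g0": 2.0, "g1": -1.5})
method_scores = {g: 0.0 for g in genes}
method_scores.update({"g4": 5.0, "g5": 4.0, "g0": 3.0})

anchors = method_only_anchors(method_scores, pagerank, cohens_d, k=3)
pct = pagerank_percentile(pagerank)
print(sorted(anchors), statistics.median(pct[g] for g in anchors))
```

In the toy, g0 is dropped because the effect-size baseline already finds it, leaving two mid-percentile genes that only the method ranks highly.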

Auditable & reproducible

A single λ-multiplier (0.1), calibrated on the Crohn cohort (n=33), is applied unchanged across every cohort. Conjugate-gradient (CG) solver with Jacobi preconditioning; pinned graph bundle (v1.0.0: 79,998 nodes / 298,906 edges; the solver runs on the ~38k-node hub-capped biochem subgraph). All ten hyperparameters are disclosed in the manuscript. Every run is reproducible against the named active graph version.
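For reference, a textbook Jacobi-preconditioned conjugate-gradient loop looks like the sketch below. It is a generic illustration in dense NumPy, not GIZMO's solver; a production solve would use sparse matrices. The toy system imitates the shape λL + data weights, and λ = 0.1 there merely echoes the number above, since how the multiplier maps to λ is not stated.

```python
import numpy as np


def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient for symmetric positive-definite A with a Jacobi
    (inverse-diagonal) preconditioner. Stops when ||r|| <= tol * ||b||."""
    m_inv = 1.0 / np.diag(A)            # Jacobi preconditioner M^-1 = diag(A)^-1
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    z = m_inv * r
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        Ap = A @ p
        step = rz / (p @ Ap)
        x += step * p
        r -= step * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            break
        z = m_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x


# Toy system shaped like MAP normal equations: lam*L (path graph) + data weights.
n = 50
L = np.diag(np.r_[1.0, 2.0 * np.ones(n - 2), 1.0]) - np.eye(n, k=1) - np.eye(n, k=-1)
weights = np.zeros(n)
weights[::5] = 4.0                     # every fifth node observed, sigma = 0.5
A = 0.1 * L + np.diag(weights)
b = weights * np.random.default_rng(1).normal(size=n)
x = pcg_jacobi(A, b)
print(np.linalg.norm(A @ x - b))
```

Jacobi preconditioning is cheap because it needs only the diagonal, which matters here: observed nodes carry a large data weight while unobserved ones carry only the λL term, so the diagonal varies widely.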

worked example

IDH-glioma subtype, recovered unsupervised from a single modality.

Adult-type diffuse glioma splits cleanly along IDH1/IDH2 mutation status — a 5-year-survival-defining axis that drives standard of care. The benchmark: can a single-modality projection through GIZMO separate IDH-mutant from IDH-wildtype patients without any label supervision? The signed β-projection does it in two different cohorts with two different modalities.

What the run returns

• Trautwein NMR cohort (n=88): signed-β AUC 0.805 for IDH status — from metabolomics alone (0.3% substrate coverage; the framework still works because it routes through the substrate, not the panel).
• TCGA_IDH RNA replication (n=458): signed-β AUC 0.674 — same decomposition, different modality, different cohort, same direction.
• β-sign separates the subtypes without labels; the substrate-mediated projection is what makes the unsupervised split possible. Standard PCA on raw features can't find this axis.
• 286 GIZMO-only anchor genes at median PageRank percentile 52 (mid-substrate; Mann–Whitney p=0.002 vs the Cohen's-d null): genes invisible to both the PageRank-only and the Cohen's-d-only baselines.
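Scoring an unsupervised β split against IDH status can be sketched as below. Labels enter only at evaluation. Assumptions: "signed-β AUC" is read as the rank-based ROC AUC of β itself with IDH-mutant coded 1, and the sign split is β > 0; which subtype lands on the positive side is a convention not specified here. The β values are toy numbers, not cohort data.

```python
def auc(scores, labels):
    """Rank-based ROC AUC: the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sign_split(betas):
    """Unsupervised two-way split: True where beta is positive."""
    return [b > 0 for b in betas]


# Toy betas, not cohort data. Labels (1 = IDH-mutant) are used only to score the split.
betas = [0.8, 0.3, -0.1, 0.5, -0.6, -0.2]
idh_mutant = [1, 1, 1, 0, 0, 0]
print(auc(betas, idh_mutant), sign_split(betas))
```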

Why it's different from running DIABLO / MOFA+ / SNF

On flat benchmark metrics GIZMO ties DIABLO, MOFA+, and SNF. What's unique is structural: substrate-mediated graph inference accesses mid-PageRank disease biology those methods don't surface, and the β/α decomposition functions as an intensity scalar in unimodal disease cohorts and as a subtype-direction indicator in bimodal ones — same projection, two regimes.

Graph snapshot

Total nodes: ~80 k
Metabolites: ~7 k
Reactions: ~15 k
Genes: ~16 k
Disease nodes: ~38 k
Biochem subgraph: ~38 k (hub-cap)

Bundle v1.0.0: Reactome + StringDB PPI + gene catalysis. The per-patient MAP solver operates on the hub-capped (k=200) biochem subgraph; the full graph is kept for inspection and downstream enrichment. Every production run is anchored to a single named active graph version.
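The page does not define hub-cap precisely. The sketch below shows one plausible reading, labelled as an assumption: nodes with degree above k (for example currency metabolites such as water) are removed along with their edges. The `hub_cap` helper and the toy node names are hypothetical.

```python
from collections import defaultdict


def hub_cap(edges, k=200):
    """Remove every node with degree > k, along with all of its edges.

    Assumption: 'hub-cap' is interpreted as dropping over-connected hubs
    (such as currency metabolites); the exact rule is not given on this page.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hubs = {node for node, d in degree.items() if d > k}
    kept = [(u, v) for u, v in edges if u not in hubs and v not in hubs]
    return kept, hubs


# Toy graph: water touches five reactions; with k=3 it is capped out.
edges = [("met:H2O", f"rxn:{i}") for i in range(5)] + [("rxn:0", "met:pyruvate")]
kept, hubs = hub_cap(edges, k=3)
print(hubs, kept)
```

Without such a cap, a Laplacian prior would let a handful of ubiquitous metabolites couple almost every reaction to every other.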

positioning

How GIZMO differs from DIABLO, MOFA+, SNF, and the pathway tools.

Capability	GIZMO	DIABLO	MOFA+	SNF	MetaboAnalyst
Multi-omic patient stratification (flat metric)	ties (7/17 cells; MOFA wins 14/17 at matched-K)	✓ (multi-block PLS)	✓ (probabilistic FA)	✓ (similarity fusion)	—
Panel-agnostic cross-cohort comparability	✓ 28× Jaccard ratio (Su↔Filbin COVID)	— per-cohort latent space	— per-cohort factor space	— per-cohort similarity	—
Mid-substrate disease anchor recovery	✓ 286 anchors at PR-percentile 52 (p=0.002)	— driven by top-loaded features	— driven by top-weighted factors	— driven by k-NN topology	via pathway enrichment
Signed β / α decomposition (intensity vs mechanism)	✓ unimodal & bimodal regimes	—	—	—	—
Unsupervised subtype recovery from a single modality	✓ IDH-glioma β AUC 0.805 (NMR n=88)	— requires multi-block	— factor selection ambiguous	via clustering	—
Mechanistic pathway / ORA on the same projection	✓ Louvain modules → Reactome ORA	—	—	—	✓ standalone ORA

DIABLO, MOFA+, and SNF are mature multi-block / factor models that perform well on flat patient-stratification metrics — GIZMO ties them, it doesn't dominate. What GIZMO uniquely offers is the substrate: a fixed biochemistry graph that makes mixed-panel cohorts comparable, surfaces mid-PageRank disease genes invisible to top-feature-driven methods, and supplies the same projection in two regimes — β as severity scalar in unimodal cohorts, β-sign as subtype direction in bimodal ones.

how to engage

Three paths, depending on what you need.

Self-serve on Forge

Upload your metabolite list, map to graph nodes, run analysis. The active graph is shared across every Forge user, so results are comparable across datasets.

Request Forge access →

Target discovery engagement

For biotechs with a compound panel and a disease area. We return a ranked target list with causal chains, counterfactual knockout (KO) deltas, and druggability scores, typically within 6–8 weeks of NDA signing.

Scope a project →

Custom graph layer

Need a proprietary target set, phenotype ontology, or chemistry corpus integrated into the graph additively? We extend the active graph under contract, keeping public nodes intact.

Discuss a custom layer →

about

Two manuscripts in flight, one shared substrate.

GIZMO's method is described across two single-author manuscripts being prepared for Cell Systems: the supervised paper covers MAP reconstruction, β/α decomposition, anchor-gene recovery against DIABLO / MOFA+ / SNF, and the IDH-glioma flagship; the unsupervised companion covers panel-agnostic cross-cohort comparability (28× Jaccard ratio on Su↔Filbin COVID). bioRxiv preprints of both are planned before submission.

12 cohorts make up the validation ladder (Crohn, Su_COVID, Filbin_COVID, IDH_glioma, Erawijantari, Gao_RA, TCGA_IDH, TCGA_LUAD, KMPLOT_BRCA, GSE89408_RA, HMP2_IBD, CorEvitas_RA). The benchmark suite, the admin editor for custom graph layers, and the pinned v1.0.0 graph bundle ship with the platform. For Joey's essays and related work see insilijo.github.io.

code
github.com/insilijo/GIZMO

platform
github.com/insilijo/forge