MatterSpace: Constraint-Guided Generative Dynamics with MLIP Refinement for Blind Rediscovery of Single-Atom Alloy Catalysts

Abstract

We present MatterSpace, a constraint-guided generative framework for autonomous materials discovery that embeds physical and chemical constraints directly into the generative dynamics, producing candidate structures that are valid by construction without post-hoc filtering. In a blind rediscovery benchmark, MatterSpace autonomously identified the Re₁@Ni and Ir₁@Ni single-atom alloy (SAA) catalysts for methane cracking from a search space of 23 dopant elements, matching sealed reference structures at all three validation levels. Level A: 581 of 600 candidates achieved adsorption energies below −1.3 eV. Level B: 75 candidates reached a structural fingerprint similarity of at least 0.7 (best 0.814), independently identifying both Re and Ir as dopants. Level C: the full active-site RMSD reached 0.408 Å for Ir₁@Ni and 0.466 Å for Re₁@Ni, both below the 0.5 Å threshold. The entire campaign ran on a single NVIDIA A100 GPU in 4.7 hours at a cloud cost of approximately $15, a 130–270× reduction relative to the estimated cost of equivalent DFT screening. This constitutes the first demonstration of blind generative materials rediscovery achieving catalytic, structural, and geometric validation simultaneously.

Introduction

Single-atom alloy (SAA) catalysts represent a frontier class of materials for heterogeneous catalysis, combining the selectivity advantages of single-atom sites with the stability of metallic alloy hosts. For methane cracking — the direct conversion of CH₄ into hydrogen and solid carbon without CO₂ emissions — SAA catalysts based on dilute transition-metal dopants in nickel have emerged as particularly promising candidates. Recent computational and experimental work has identified Re₁@Ni and Ir₁@Ni as high-performance SAA configurations, demonstrating strong methane adsorption and favorable reaction energetics on Ni(111) surfaces.

The conventional computational approach to discovering such catalysts follows a propose-then-filter paradigm: generate a broad library of candidate structures by combinatorial enumeration or random substitution, then screen the library with density functional theory (DFT) or machine-learned interatomic potentials (MLIPs) to identify promising candidates. This workflow is inherently wasteful: the vast majority of generated structures are physically unreasonable, chemically unstable, or catalytically irrelevant, yet each must be evaluated before being discarded.

MatterSpace takes a fundamentally different approach. Rather than generating candidates and filtering them afterward, MatterSpace embeds physical and chemical constraints directly into the generative dynamics, producing structures that are valid by construction. Constraints on stoichiometry, lattice geometry, atomic separations, and surface chemistry are enforced continuously throughout the generation process, so the output distribution is concentrated on the physically meaningful region of configuration space from the outset. This constraint-guided generation eliminates the combinatorial waste of propose-then-filter methods and enables efficient exploration of large dopant search spaces with modest computational resources.
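The text does not specify how the constraints enter the dynamics. One common way to make a hard geometric constraint hold by construction is to follow every generative update with a projection back onto the feasible set. The sketch below does this for a minimum interatomic separation only. It is an illustration under stated assumptions, not MatterSpace's implementation: the Langevin-style update, the `drift_fn` stand-in for a learned model, and the names `project_min_separation` and `constrained_step` are all hypothetical, and periodic boundaries are ignored.

```python
import math
import random
from typing import Callable

Vec = tuple[float, float, float]


def project_min_separation(pos: list[Vec], d_min: float,
                           max_sweeps: int = 100, tol: float = 1e-9) -> list[Vec]:
    """Push apart, symmetrically, every pair of atoms closer than d_min.

    Sweeps repeat until one full sweep finds no violation or max_sweeps is hit,
    because separating one pair can bring another pair too close.
    """
    p = [list(x) for x in pos]
    for _ in range(max_sweeps):
        violated = False
        for i in range(len(p)):
            for j in range(i + 1, len(p)):
                delta = [p[j][k] - p[i][k] for k in range(3)]
                dist = math.sqrt(sum(c * c for c in delta))
                if dist >= d_min - tol:
                    continue
                violated = True
                # Unit vector from i to j; coincident atoms get an arbitrary direction.
                u = [c / dist for c in delta] if dist > 1e-12 else [1.0, 0.0, 0.0]
                half_gap = 0.5 * (d_min - dist)
                for k in range(3):
                    p[i][k] -= half_gap * u[k]
                    p[j][k] += half_gap * u[k]
        if not violated:
            break
    return [(x[0], x[1], x[2]) for x in p]


def constrained_step(pos: list[Vec], drift_fn: Callable[[list[Vec]], list[Vec]],
                     step: float, temperature: float, d_min: float,
                     rng: random.Random) -> list[Vec]:
    """Langevin-style update x + step*drift + sqrt(2*step*T)*noise, then projection."""
    drift = drift_fn(pos)  # hypothetical stand-in for a learned score/force model
    sigma = math.sqrt(2.0 * step * temperature)
    moved = [
        (x[0] + step * f[0] + sigma * rng.gauss(0.0, 1.0),
         x[1] + step * f[1] + sigma * rng.gauss(0.0, 1.0),
         x[2] + step * f[2] + sigma * rng.gauss(0.0, 1.0))
        for x, f in zip(pos, drift)
    ]
    return project_min_separation(moved, d_min)


# Toy usage: a drift that drives two atoms toward each other cannot push them
# below d_min = 2.0 Å, because the projection runs inside the step itself.
rng = random.Random(0)
pull_together = lambda p: [(10.0, 0.0, 0.0), (-10.0, 0.0, 0.0)]
new = constrained_step([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], pull_together, 0.1, 0.0, 2.0, rng)
print(math.dist(*new))  # 2.0
```

Because the projection is part of every step, no trajectory leaves the feasible region, which is the property attributed to constraint-guided generation above; stoichiometry or surface-chemistry constraints would need their own projections or parameterizations.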

Experimental Setup

All experiments were conducted on a single NVIDIA A100 80GB GPU. The generation campaign began with 500 bootstrap structures and proceeded through three refinement iterations of 200 candidates each, producing 600 candidate structures in total (the bootstrap structures are not counted among them). The dopant search space comprised 23 transition-metal elements ranging from Ti to Au.

Host structures were based on Ni face-centered cubic (FCC) lattices with both (111) and (100) surface terminations. CH₄ was used as the adsorbate molecule in all configurations, reflecting the target application of methane cracking catalysis. Each candidate structure consisted of a Ni slab with a single substitutional dopant atom and a methane molecule positioned at the active site.
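To make the candidate layout concrete, the following sketch builds a Ni fcc(111) slab, substitutes one top-layer atom with a dopant, and places CH₄ above it. It is illustrative only: the lattice constant (3.52 Å, close to the experimental value for Ni), the 4 × 4 × 3 cell without periodic images, the CH₄ orientation, the 3.0 Å carbon height, and the function names `fcc111_slab` and `build_saa_candidate` are assumptions, not parameters reported for MatterSpace. The (100) termination is not covered.

```python
import math

A_NI = 3.52   # assumed Ni lattice constant (angstrom), close to the experimental value
C_H = 1.09    # C-H bond length in CH4 (angstrom)

Atom = tuple[str, tuple[float, float, float]]


def fcc111_slab(a: float, nx: int, ny: int, layers: int) -> list[tuple[float, float, float]]:
    """Positions of an fcc(111) slab with ABC stacking and no periodic images."""
    d = a / math.sqrt(2.0)            # in-plane nearest-neighbour distance
    h = a / math.sqrt(3.0)            # (111) interlayer spacing
    a1 = (d, 0.0)
    a2 = (0.5 * d, 0.5 * math.sqrt(3.0) * d)
    shift = ((a1[0] + a2[0]) / 3.0, (a1[1] + a2[1]) / 3.0)   # lateral offset between layers
    return [
        (i * a1[0] + j * a2[0] + k * shift[0], i * a1[1] + j * a2[1] + k * shift[1], k * h)
        for k in range(layers) for i in range(nx) for j in range(ny)
    ]


def build_saa_candidate(dopant: str, height: float = 3.0) -> list[Atom]:
    """Ni(111) slab, one top-layer dopant nearest the cell centre, CH4 above it."""
    pos = fcc111_slab(A_NI, 4, 4, 3)
    top_z = max(p[2] for p in pos)
    top = [i for i, p in enumerate(pos) if abs(p[2] - top_z) < 1e-6]
    cx = sum(pos[i][0] for i in top) / len(top)
    cy = sum(pos[i][1] for i in top) / len(top)
    site = min(top, key=lambda i: (pos[i][0] - cx) ** 2 + (pos[i][1] - cy) ** 2)
    atoms: list[Atom] = [(dopant if i == site else "Ni", p) for i, p in enumerate(pos)]
    sx, sy, sz = pos[site]
    carbon = (sx, sy, sz + height)   # hypothetical adsorption height above the dopant
    atoms.append(("C", carbon))
    # Tetrahedral CH4: one H pointing at the surface, three tilted upward (cos theta = 1/3).
    s = 2.0 * math.sqrt(2.0) / 3.0
    dirs = [(0.0, 0.0, -1.0)] + [
        (s * math.cos(phi), s * math.sin(phi), 1.0 / 3.0)
        for phi in (0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0)
    ]
    for u in dirs:
        atoms.append(("H", tuple(c + C_H * uk for c, uk in zip(carbon, u))))
    return atoms


print(len(build_saa_candidate("Re")))  # 53 atoms: 47 Ni, 1 Re, 1 C, 4 H
```

The dopant is chosen near the centre of the top layer so that, even without periodic images, it keeps its full fcc(111) surface coordination of six in-plane and three subsurface Ni neighbours.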

Validation followed a three-level protocol designed to assess structural, catalytic, and geometric fidelity independently. Critically, generation and validation were strictly separated: at no point during the generative process did the engine access, reference, or receive information from the sealed target structures, and all validation comparisons were performed only after generation was complete.

Three-Level Validation Protocol

The validation protocol was designed to provide progressively stringent tests of rediscovery fidelity, from catalytic relevance through structural similarity to atomic-level geometric accuracy.

Level A — Catalytic Relevance: Candidates must achieve an adsorption energy E_ads < −1.3 eV for CH₄ on the SAA surface. This threshold identifies structures with sufficient methane binding strength to be catalytically relevant for cracking. Level A serves as a coarse filter for catalytic viability without imposing any structural constraint.
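Level A presupposes a definition of E_ads, which the text does not state. The minimal sketch below assumes the usual reference convention, E_ads = E(slab+CH₄) − E(slab) − E(CH₄), with all three energies from the same energy model; the function names are hypothetical.

```python
def adsorption_energy(e_slab_ch4: float, e_slab: float, e_ch4: float) -> float:
    """E_ads = E(slab+CH4) - E(slab) - E(CH4) in eV; more negative means stronger binding."""
    return e_slab_ch4 - e_slab - e_ch4


def passes_level_a(e_ads: float, threshold: float = -1.3) -> bool:
    """Level A uses a strict inequality: E_ads < threshold."""
    return e_ads < threshold


def level_a_pass_rate(energies: list[float], threshold: float = -1.3) -> float:
    """Fraction of candidates that pass Level A (0.0 for an empty list)."""
    if not energies:
        return 0.0
    return sum(passes_level_a(e, threshold) for e in energies) / len(energies)
```

Applied to the counts reported in the Results, the pass rate is 581 / 600 ≈ 0.968.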

Level B — Structural Fingerprint Match: Candidates must achieve a structural fingerprint similarity ≥ 0.7 against the sealed reference structures. Fingerprints encode local coordination environments, bond-length distributions, and chemical identity patterns around the active site. Level B verifies that the discovered candidates share the same local chemical environment as the target SAA catalysts, independent of global lattice orientation.
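The fingerprint itself is not defined in the text. One construction with the stated properties (local coordination, bond-length distribution, chemical identity, insensitivity to global orientation) is a one-hot code for the active-site element followed by a per-element, Gaussian-smeared histogram of distances from the active-site atom, compared by cosine similarity. Everything in this sketch (cutoff, bin count, smearing width, element list, and function names) is a hypothetical choice, not MatterSpace's fingerprint; a full 23-element search would need every candidate dopant in `elements`.

```python
import math
from collections.abc import Sequence

Atom = tuple[str, tuple[float, float, float]]


def fingerprint(atoms: Sequence[Atom], centre: int,
                elements: Sequence[str] = ("Ni", "Re", "Ir", "C", "H"),
                cutoff: float = 4.0, bins: int = 40, width: float = 0.1) -> list[float]:
    """One-hot centre element, then one smeared distance histogram per element."""
    centre_sym, centre_pos = atoms[centre]
    vec = [1.0 if centre_sym == e else 0.0 for e in elements]
    bin_centres = [(b + 0.5) * cutoff / bins for b in range(bins)]
    for elem in elements:
        hist = [0.0] * bins
        for idx, (sym, pos) in enumerate(atoms):
            if idx == centre or sym != elem:
                continue
            r = math.dist(pos, centre_pos)   # distances are rotation-invariant
            if r <= cutoff:
                for b, rb in enumerate(bin_centres):
                    hist[b] += math.exp(-((r - rb) ** 2) / (2.0 * width ** 2))
        vec.extend(hist)
    return vec


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def passes_level_b(similarity: float, threshold: float = 0.7) -> bool:
    return similarity >= threshold
```

Because the centre element enters the vector directly, swapping the dopant lowers the similarity even when the surrounding geometry is identical, while rotating the whole structure leaves it unchanged.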

Level C — Active-Site RMSD: Candidates must achieve a full active-site root-mean-square deviation (RMSD) ≤ 0.5 Å against the sealed reference structures. This is the most demanding test: it requires near-exact geometric reproduction of the atomic positions in the catalytic active site, including the dopant atom, its nearest-neighbor Ni atoms, and the adsorbed CH₄ molecule.
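A minimal sketch of the Level C metric, assuming the candidate and reference active sites are given as coordinate lists in the same atom order (dopant, nearest-neighbor Ni, C, H). Centroids are removed, but no rotational fit (such as the Kabsch algorithm) is applied, on the assumption that slab models share a lattice-fixed frame; whether MatterSpace aligns structures this way is not stated, and the function names are hypothetical.

```python
import math
from collections.abc import Sequence

Vec = tuple[float, float, float]


def active_site_rmsd(cand: Sequence[Vec], ref: Sequence[Vec],
                     align_centroids: bool = True) -> float:
    """RMSD over matched atoms, optionally after removing each structure's centroid."""
    if not cand or len(cand) != len(ref):
        raise ValueError("candidate and reference need the same, non-zero number of atoms")
    n = len(cand)
    if align_centroids:
        cc = [sum(p[k] for p in cand) / n for k in range(3)]
        rc = [sum(q[k] for q in ref) / n for k in range(3)]
    else:
        cc = rc = [0.0, 0.0, 0.0]
    sq = sum(
        ((p[k] - cc[k]) - (q[k] - rc[k])) ** 2
        for p, q in zip(cand, ref)
        for k in range(3)
    )
    return math.sqrt(sq / n)


def passes_level_c(rmsd: float, threshold: float = 0.5) -> bool:
    return rmsd <= threshold
```

Removing centroids makes the metric insensitive to a rigid shift of the whole active site, so it measures only the internal geometry of the dopant, its Ni neighbors, and the adsorbate.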

The three levels are intentionally independent. A candidate can pass Level A without passing Level B (catalytically active but structurally different), or pass Levels A and B without passing Level C (correct chemistry and local environment but imprecise geometry). Only candidates that pass all three levels constitute a genuine blind rediscovery.
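The pass/fail logic just described amounts to independent per-level checks combined with a logical AND. In this sketch the `Candidate` record and its field names are hypothetical, while the thresholds are the ones stated for each level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    e_ads: float       # CH4 adsorption energy, eV
    similarity: float  # structural fingerprint similarity
    rmsd: float        # full active-site RMSD, angstrom


def levels_passed(c: Candidate, e_max: float = -1.3, sim_min: float = 0.7,
                  rmsd_max: float = 0.5) -> dict[str, bool]:
    """Evaluate each level on its own; no level is gated on another."""
    return {
        "A": c.e_ads < e_max,
        "B": c.similarity >= sim_min,
        "C": c.rmsd <= rmsd_max,
    }


def is_blind_rediscovery(c: Candidate) -> bool:
    """A rediscovery requires all three levels to pass."""
    return all(levels_passed(c).values())
```

Keeping the levels as separate booleans, rather than a single cascade, preserves the intermediate outcomes the text describes, such as "passes A and B but not C".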

Results

The generative force field (GFF) training completed in 25 minutes. The constraint-guided generation achieved E₀ pass rates of 97.5–99% across all iterations, confirming that the constraint-embedding mechanism produces valid structures at near-unity rates without post-hoc filtering.

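Level A Results: 581 of the 600 generated candidates (96.8%) achieved adsorption energies below the −1.3 eV threshold. The most negative value reached E_ads = −34.73 eV. Under the usual definition E_ads = E(slab+CH₄) − E(slab) − E(CH₄), a value of this magnitude exceeds the total atomization energy of CH₄ (roughly 17 eV) and therefore cannot describe physical adsorption of a single molecule; it most likely reflects an energy-model artifact and warrants DFT verification. The high Level A pass rate is consistent with constraint-guided generation concentrating the output distribution on catalytically relevant configurations.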

Level B Results: 75 candidates achieved a fingerprint similarity of at least 0.7 against the sealed reference structures, with a best similarity of 0.814. Critically, the Level B matches independently identified Re and Ir as the highest-similarity dopants, the same two elements that prior human-guided research had identified as optimal SAA dopants for methane cracking. Because the fingerprint encodes chemical identity around the active site, a high score requires the correct dopant; the notable outcome is that a blind search across 23 elements produced Re- and Ir-doped structures close enough to the references to rank highest. This convergence is a strong signal that the generative dynamics capture genuine chemical preferences rather than random exploration.
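The per-dopant ranking described above amounts to keeping the Level B matches and taking each dopant's best similarity. A minimal sketch, assuming hypothetical (dopant, similarity) records as input and the 0.7 threshold from the protocol:

```python
from collections.abc import Iterable


def best_similarity_by_dopant(records: Iterable[tuple[str, float]],
                              threshold: float = 0.7) -> list[tuple[str, float]]:
    """Keep Level B matches and rank dopants by their best similarity, highest first."""
    best: dict[str, float] = {}
    for dopant, sim in records:
        if sim >= threshold and sim > best.get(dopant, float("-inf")):
            best[dopant] = sim
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Dopants with no candidate above the threshold drop out entirely, so the ranking reflects only structures that already pass Level B.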

Level C Results: The best Ir₁@Ni candidate achieved a full active-site RMSD of 0.408 Å against the sealed Ir₁@Ni reference structure, and the best Re₁@Ni candidate achieved 0.466 Å against the sealed Re₁@Ni reference. Both values are below the 0.5 Å threshold, so both candidates pass the most demanding validation level. The MLIP refinement stage was critical to this outcome: before refinement the best RMSD was 4.05 Å, and after refinement it was 0.408 Å, a roughly 10-fold (4.05 / 0.408 ≈ 9.9) improvement in geometric accuracy.
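The refinement algorithm is not described beyond its use of an MLIP. A typical MLIP refinement is a local geometry relaxation that follows the predicted forces to the nearest energy minimum. The sketch below shows that idea with plain steepest descent and a Lennard-Jones pair potential (reduced units) standing in for the MLIP; both are assumptions for illustration, production workflows usually use optimizers such as FIRE or BFGS (as provided by ASE), and the function names are hypothetical.

```python
import math


def lj_energy_forces(pos, eps=1.0, sigma=1.0):
    """Lennard-Jones energy and per-atom forces; a stand-in for an MLIP."""
    n = len(pos)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            sr6 = (sigma * sigma / r2) ** 3
            energy += 4.0 * eps * (sr6 * sr6 - sr6)
            # F_j = -dE/dr * d/r = 24 eps (2 sr^12 - sr^6) / r^2 * d, and F_i = -F_j
            coeff = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[j][k] += coeff * d[k]
                forces[i][k] -= coeff * d[k]
    return energy, forces


def relax(pos, energy_forces, step=0.01, fmax=1e-6, max_steps=10_000):
    """Steepest descent: move each atom along its force until max |F| < fmax."""
    p = [list(x) for x in pos]
    for _ in range(max_steps):
        _, forces = energy_forces(p)
        if max(math.sqrt(sum(c * c for c in f)) for f in forces) < fmax:
            break
        for i, f in enumerate(forces):
            for k in range(3):
                p[i][k] += step * f[k]
    return [tuple(x) for x in p]


dimer = relax([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], lj_energy_forces)
print(math.dist(*dimer))  # approximately 2 ** (1 / 6) ≈ 1.1225
```

A relaxation of this kind only moves a structure to the nearest local minimum, which is consistent with refinement sharpening geometry (large RMSD gains) without changing the dopant identity selected during generation.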

Comparison with Existing Approaches

We compare MatterSpace's validation outcomes against six prominent computational materials discovery frameworks, assessing each against the same three validation levels on the basis of what its published work has demonstrated.

MatterSpace is the only framework to achieve all three validation levels. Existing approaches demonstrate Level A capability — generating structures with appropriate energetics — but none has demonstrated the structural fingerprint matching (Level B) or sub-angstrom geometric accuracy (Level C) required for a complete blind rediscovery. Additionally, MatterSpace is the only framework evaluated under strict blind conditions where the generation engine has no access to the target structures during the discovery process.

Computational Cost

The complete MatterSpace campaign — including GFF training, three iterations of constrained generation, MLIP refinement, and all validation computations — ran in 4.7 hours on a single NVIDIA A100 80GB GPU. At current cloud pricing, this corresponds to approximately $15 in compute cost.

For comparison, screening the same 23-element dopant space with density functional theory (DFT) calculations at comparable accuracy would require an estimated $2,000–$4,000 in compute cost and days to weeks of wall-clock time, depending on the DFT code and convergence parameters. Relative to this estimate, MatterSpace reduces cost by a factor of roughly 130–270 while also providing the three-level validation, which DFT screening alone does not deliver.
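The quoted reduction follows directly from the two cost figures; the short calculation below reproduces it together with the implied hourly GPU rate. The inputs are the approximate values given in the text, and the function names are illustrative.

```python
def cost_ratio_range(matterspace_cost: float, dft_low: float,
                     dft_high: float) -> tuple[float, float]:
    """Cost-reduction factors of estimated DFT screening relative to MatterSpace."""
    return dft_low / matterspace_cost, dft_high / matterspace_cost


def hourly_rate(total_cost: float, hours: float) -> float:
    return total_cost / hours


low, high = cost_ratio_range(15.0, 2000.0, 4000.0)
print(f"cost reduction: {low:.0f}x to {high:.0f}x")                 # cost reduction: 133x to 267x
print(f"implied rate: ${hourly_rate(15.0, 4.7):.2f} per GPU-hour")  # implied rate: $3.19 per GPU-hour
```

The factors 133× and 267× round to the 130–270× range quoted above.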

Conclusion

MatterSpace demonstrates the first blind generative rediscovery of known catalytic materials — Re₁@Ni and Ir₁@Ni single-atom alloy catalysts for methane cracking — achieving all three levels of a rigorous validation protocol: catalytic relevance (Level A), structural fingerprint matching (Level B), and sub-angstrom geometric accuracy (Level C). The constraint-guided generative approach eliminates the combinatorial waste of conventional propose-then-filter paradigms, producing valid-by-construction structures at 97.5–99% pass rates.

The framework is domain-agnostic: the constraint-embedding mechanism and generative dynamics are not specific to SAA catalysts or methane cracking. Future work will extend MatterSpace to additional catalytic systems, multi-adsorbate configurations, and alternative host lattices. DFT validation of the top candidates and experimental synthesis of the most promising structures are planned as the next milestones toward closed-loop materials discovery.

Validation Level	Criterion	Result	Status
Level A	E_ads < −1.3 eV	581 / 600 candidates; most negative −34.73 eV	✓ PASS
Level B	Fingerprint similarity ≥ 0.7	75 matches; best 0.814; Re & Ir identified	✓ PASS
Level C (Ir₁@Ni)	Active-site RMSD ≤ 0.5 Å	0.408 Å	✓ PASS
Level C (Re₁@Ni)	Active-site RMSD ≤ 0.5 Å	0.466 Å	✓ PASS

Framework	Level A	Level B	Level C	Blind evaluation
MatterSpace (this work)	✓ PASS	✓ PASS	✓ PASS	Yes
GNoME (Merchant et al., 2023)	✓ PASS	—	—	No
MatterGen (Zeni et al., 2025)	✓ PASS	—	—	No
Open Catalyst (Chanussot et al., 2021)	✓ PASS	—	—	No
USPEX / AIRSS	✓ PASS	—	—	No
CDVAE (Xie et al., 2022)	✓ PASS	—	—	No
DiffCSP (Jiao et al., 2024)	✓ PASS	—	—	No

Metric	MatterSpace	DFT Screening
Wall-clock time	4.7 hours	Days to weeks
Hardware	1× A100 GPU	CPU cluster
Cloud cost	~$15	$2,000–$4,000
Cost reduction (relative to DFT)	130–270×	1× (baseline)