ARDA's Autonomous Discovery: Validated on Real Physics
Can an autonomous system discover genuine scientific laws without human guidance? Not approximate them. Not suggest plausible-looking equations. Actually discover the governing dynamics of physical systems, verify them against known ground truth, and do so with full negative controls, all through a single API call. That is the question Vareon Research set out to answer. According to the results, published March 2026, ARDA recovered a governing equation, a causal graph, and a regulatory network topology across three systems with established ground truth, with zero human input and full reproducibility.
The validation protocol was deliberately rigorous. Each experiment used a system with established ground truth: physics that textbooks agree on, dynamics that decades of laboratory work have confirmed. ARDA was given raw observational data with no hints about the underlying mechanisms. The engine profiled the data, selected discovery modes, ran automated negative controls, and emitted typed scientific claims. Those claims were then compared against known solutions. The reported results: an R² of 0.9944 for the recovered governing equation, path fidelity of 0.982 and 0.989 for the two causal graphs, and every negative control battery passed without exception.
Three systems were chosen to span a range of scientific complexity: a mechanical oscillator governed by a second-order differential equation, a four-compartment pharmacokinetic model with directed causal structure, and a six-variable gene regulatory network with cyclic repression topology. Each system tests a different discovery mode and a different class of scientific claim. Together, they demonstrate that ARDA's discovery capabilities are not narrow tricks tuned to a single domain — they are general-purpose engines that produce defensible science across physics, pharmacology, and biology.
Experiment 1: Damped Harmonic Oscillator
The damped harmonic oscillator is one of the most fundamental systems in physics — a mass on a spring with friction. Its governing equation is taught in every undergraduate mechanics course, making it an ideal first test: can ARDA rediscover what every physics student learns, starting from nothing but time-series data?
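For reference, the textbook form of that governing equation is written out below for a mass m on a spring of stiffness k with viscous damping c. The symbols follow standard mechanics notation and are not taken from the validation paper.

```latex
m\,\ddot{x} + c\,\dot{x} + k\,x = 0
\quad\Longleftrightarrow\quad
\ddot{x} + 2\zeta\omega_0\,\dot{x} + \omega_0^{2}\,x = 0,
\qquad
\omega_0 = \sqrt{k/m},
\quad
\zeta = \frac{c}{2\sqrt{mk}}
```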
ARDA's Symbolic discovery mode was given raw displacement and velocity observations. No equation templates. No hints about spring constants or damping coefficients. No human intervention of any kind. In 16.3 seconds, the engine discovered the governing equation with an R² of 0.9944 against the known dynamics, a close match to the ground truth. The output was not a black-box curve fit. It was a symbolic governing equation, recovered from data alone.
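To make the task concrete, here is a minimal sketch of what recovering oscillator dynamics from displacement and velocity data, and scoring the result with R², can look like. It uses plain least squares on a fixed linear template, which is a deliberately simpler stand-in and not ARDA's Symbolic mode. The function names, parameter values, and noise-free synthetic data are all assumptions made for illustration.

```python
import numpy as np

def simulate_oscillator(omega0=2.0, zeta=0.1, t_end=20.0, dt=0.01):
    """Closed-form underdamped solution of x'' + 2*zeta*omega0*x' + omega0**2*x = 0."""
    t = np.arange(0.0, t_end, dt)
    gamma = zeta * omega0                          # decay rate of the envelope
    omega_d = omega0 * np.sqrt(1.0 - zeta**2)      # damped angular frequency
    envelope = np.exp(-gamma * t)
    x = envelope * np.cos(omega_d * t)
    v = envelope * (-gamma * np.cos(omega_d * t) - omega_d * np.sin(omega_d * t))
    return t, x, v

def fit_linear_dynamics(t, x, v):
    """Least-squares fit of dv/dt = a*x + b*v; returns (a, b, r_squared)."""
    accel = np.gradient(v, t)                      # finite-difference estimate of dv/dt
    features = np.column_stack([x, v])
    coeffs, *_ = np.linalg.lstsq(features, accel, rcond=None)
    residual = accel - features @ coeffs
    r_squared = 1.0 - np.sum(residual**2) / np.sum((accel - accel.mean())**2)
    return coeffs[0], coeffs[1], r_squared

t, x, v = simulate_oscillator()
a, b, r_squared = fit_linear_dynamics(t, x, v)
print(f"dv/dt = {a:.3f}*x + {b:.3f}*v   (R^2 = {r_squared:.4f})")
```

With these illustrative parameters the fitted coefficients should land close to -omega0**2 and -2*zeta*omega0. A symbolic search has a harder job than this sketch, because it must also find the form of the equation rather than only the coefficients of a given template.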
Two critical negative controls confirmed the result was genuine. Time shuffle control randomized the temporal ordering of the data, destroying any true dynamical signal while preserving marginal statistics. Phase randomization preserved the power spectrum while destroying phase relationships. Both controls passed cleanly, confirming that ARDA's discovery depended on the actual dynamical structure of the data, not on statistical artifacts or spectral coincidences. These controls were not manually requested — they ran automatically as part of the governed discovery pipeline.
R² = 0.9944. Sixteen seconds. Zero human input. The governing equation of a damped harmonic oscillator, discovered from raw observational data with automated negative controls. That is the bar ARDA sets for what autonomous discovery means.
Experiment 2: Four-Compartment Pharmacokinetics (ADME)
The second validation system was substantially more complex: a four-compartment pharmacokinetic model describing Absorption, Distribution, Metabolism, and Excretion (ADME). This is the standard framework pharmaceutical scientists use to model how drugs move through the body. The causal structure — which compartments drive which — is well-established, making it a rigorous test for ARDA's Causal Dynamics Engine (CDE).
CDE was given multivariate time-series data from the four-compartment system with no labels, no structural hints, and no prior knowledge of pharmacokinetic theory. The engine recovered the drug absorption causal graph with three high-confidence directed edges, each with a reported confidence above 0.90. Path fidelity, the degree to which the discovered causal pathways match the true dynamical evolution, reached 0.982. CDE correctly identified Tissue as the convergent node in the causal graph, a non-trivial structural inference about how the compartments interact.
The negative control battery for this experiment was comprehensive. All four negative controls passed: time shuffle, phase randomization, label permutation, and noise robustness. The noise robustness test was particularly demanding. The discovered causal structure remained stable at a 93.2% noise level, meaning CDE's causal graph survived even when the signal-to-noise ratio was pushed to extreme limits. That robustness is not a statistical convenience. It is evidence that the discovered structure reflects genuine mechanism, not fragile pattern matching.
Why the ADME result matters beyond pharmacokinetics
Pharmacokinetic models are not academic curiosities — they are the foundation of drug dosing, toxicity prediction, and formulation design across the pharmaceutical industry. Recovering the ADME causal graph autonomously means that ARDA can potentially accelerate the characterization of drug absorption dynamics for novel compounds, reducing the time from candidate identification to mechanistic understanding. The fact that the causal structure was recovered without any domain-specific priors suggests that the same capability generalizes to other multi-compartment systems: environmental transport, chemical reactor networks, and metabolic pathways.
Experiment 3: Six-Variable Gene Regulatory Network (Repressilator)
The third and most demanding validation system was a six-variable gene regulatory network modeled on the repressilator — a synthetic biological circuit consisting of three genes that repress each other in a cycle, producing oscillatory behavior. The repressilator is one of the landmark achievements of synthetic biology, and its cyclic repression topology is well-characterized, making it an exacting test for causal discovery in complex biological systems.
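For readers unfamiliar with the circuit, the sketch below simulates the widely used dimensionless repressilator model: three mRNA species and three proteins, with each gene repressed by the protein of the previous gene in the cycle. The parameter values, initial state, and function names are illustrative assumptions, and nothing here is claimed to match the data generator used in the validation paper.

```python
def repressilator_rhs(state, alpha=216.0, alpha0=0.216, beta=5.0, n=2.0):
    """Time derivatives for [m_A, m_B, m_C, p_A, p_B, p_C]."""
    m, p = state[:3], state[3:]
    # Gene i is repressed by the protein of gene i-1 (index -1 wraps to gene C),
    # which closes the cycle: C represses A, A represses B, B represses C.
    dm = [-m[i] + alpha / (1.0 + p[i - 1] ** n) + alpha0 for i in range(3)]
    # Each protein is produced from its own mRNA and decays at relative rate beta.
    dp = [-beta * (p[i] - m[i]) for i in range(3)]
    return dm + dp

def rk4_step(f, y, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def simulate_repressilator(state0=(0.2, 0.1, 0.3, 0.1, 0.4, 0.5),
                           t_end=100.0, dt=0.01):
    """Return the list of six-variable states sampled every dt time units."""
    state = list(state0)
    trajectory = [state]
    for _ in range(int(t_end / dt)):
        state = rk4_step(repressilator_rhs, state, dt)
        trajectory.append(state)
    return trajectory

trajectory = simulate_repressilator()
protein_a = [s[3] for s in trajectory]
print(f"protein A range after transient: "
      f"{min(protein_a[len(protein_a) // 2:]):.2f} to {max(protein_a[len(protein_a) // 2:]):.2f}")
```

A causal discovery method sees only the six time series from a run like this. The ground truth it is scored against is the directed cycle encoded in the `p[i - 1]` term.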
ARDA's Causal Dynamics Engine discovered the cyclic repression topology from multivariate gene expression data. Path fidelity reached 0.989, a near-perfect reconstruction of the true causal structure. CDE correctly identified mRNA_B as the primary regulatory hub, a structurally important inference about which molecular species occupies the most influential position in the regulatory network. All four CDE negative controls passed: time shuffle, phase randomization, label permutation, and noise robustness. The Neuro-Symbolic decomposition analysis revealed 92.6% conservative dynamics, consistent with the energy-like constraints characteristic of well-regulated biological circuits.
This result deserves emphasis. Gene regulatory networks are among the most challenging systems in modern biology. They are noisy, high-dimensional, and governed by complex nonlinear interactions. Discovering the correct cyclic topology — not just the pairwise correlations, but the directed causal structure — from observational data alone, with automated negative controls confirming every edge, represents a qualitative advance in what autonomous discovery systems can achieve in biology.
Negative controls: automated and non-optional
Every discovery mode in ARDA runs automated negative controls as a non-optional part of the discovery process. This is not a convenience feature — it is an architectural commitment. Time shuffle destroys temporal dependencies while preserving marginal distributions, testing whether a discovered relationship depends on genuine dynamics or mere co-occurrence. Phase randomization preserves power spectra while destroying phase relationships, testing whether the discovery depends on specific temporal coordination or just frequency content. Label permutation tests whether the discovered structure depends on the actual variable assignments or would emerge from any random relabeling. Noise robustness progressively degrades the signal to establish how much corruption the discovery can withstand before the claim collapses.
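The four transformations described above can be sketched in a few lines of NumPy. This is an illustrative implementation of the general techniques, not ARDA's internal code. The function names are hypothetical, and defining the noise level as a fraction of the signal's standard deviation is an assumption, since the source does not say how noise levels are measured.

```python
import numpy as np

def time_shuffle(x, rng):
    """Permute samples in time: marginal distribution kept, temporal order destroyed."""
    return rng.permutation(x)

def phase_randomize(x, rng):
    """Keep the amplitude spectrum of a 1-D series, replace its Fourier phases."""
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    surrogate = np.abs(spectrum) * np.exp(1j * phases)
    surrogate[0] = spectrum[0]              # keep the mean (DC term) unchanged
    if x.size % 2 == 0:
        surrogate[-1] = spectrum[-1]        # Nyquist term stays real for even length
    return np.fft.irfft(surrogate, n=x.size)

def permute_labels(data, rng):
    """Reassign variable labels by permuting the columns of a (time, variables) array."""
    return data[:, rng.permutation(data.shape[1])]

def add_noise(x, level, rng):
    """Add Gaussian noise with standard deviation level * std(x) (assumed definition)."""
    return x + rng.normal(0.0, level * np.std(x), size=x.shape)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 256)
signal = np.sin(2.0 * np.pi * t) + 0.5
shuffled = time_shuffle(signal, rng)
surrogate = phase_randomize(signal, rng)
```

In a control run, the discovery procedure is repeated on the transformed data. For the shuffle, phase, and label controls, the original claim is supported only if the structure found on the real data does not reappear on the surrogate.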
These controls ran automatically on every experiment. No human had to remember to schedule them, design them, or interpret whether they passed. The engine ran them because the governance policy required them, and the results were recorded in the evidence ledger alongside the primary discovery. That automation is what separates a validated scientific claim from a figure in a slide deck. When every claim carries its own stress-test results, the conversation shifts from "do you believe this?" to "here is the evidence — evaluate it yourself."
The four discovery modes
ARDA provides four discovery modes, each targeting a distinct class of scientific question. Symbolic mode searches for compact governing equations — the kind of closed-form relationships that compress mechanism into forms humans can inspect and transfer across contexts. Neural mode captures high-dimensional dynamics where premature structural commitment would be dishonest. Neuro-Symbolic mode bridges the two, using neural expressiveness as a scaffold and distilling interpretable structure when the data support it. The Causal Dynamics Engine (CDE) targets directed mechanistic claims — causal graphs, intervention predictions, and path-level dynamical summaries — treating identifiability and negative controls as first-class requirements.
The validation paper exercised Symbolic mode on the damped oscillator and CDE on the pharmacokinetic and gene regulatory systems, with a Neuro-Symbolic decomposition analysis also reported for the gene regulatory network. Each mode was selected because it matched the scientific question: Symbolic for recovering a governing equation, CDE for recovering causal structure. That mode selection was performed by the engine, not by a human analyst. The routing is part of the governed pipeline: the engine explains which mode it chose and why, so disagreement can happen before results are produced rather than after.
Reproducibility through a single API call
Every experiment described in the validation paper is reproducible through a single API call: POST /v1/discover. The request specifies the data, the discovery mode, and the Truth Dial tier. The engine handles everything else — data profiling, mode routing, control scheduling, claim typing, and evidence packaging. There is no manual pipeline to reconstruct, no notebook to debug, and no environment to resurrect. The API call is the experiment, and the response is the typed scientific claim with full evidence provenance.
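As a rough illustration, a request to that endpoint might be assembled as follows. Only the method and path (POST /v1/discover) and the three things the request specifies come from the text above. The base URL, the JSON field names `data`, `mode`, and `truth_dial`, and the string spellings of the modes and tiers are hypothetical placeholders, and the real schema may differ. The sketch builds the request without sending it.

```python
import json
import urllib.request

# Assumed string spellings for the tiers and modes named in this article.
TRUTH_DIAL_TIERS = {"explore", "validate", "publish"}
DISCOVERY_MODES = {"symbolic", "neural", "neuro-symbolic", "cde"}

def build_discover_request(base_url, rows, mode="symbolic", truth_dial="validate"):
    """Build (but do not send) a POST /v1/discover request with a JSON body."""
    if mode not in DISCOVERY_MODES:
        raise ValueError(f"unknown discovery mode: {mode}")
    if truth_dial not in TRUTH_DIAL_TIERS:
        raise ValueError(f"unknown Truth Dial tier: {truth_dial}")
    # Hypothetical payload layout: field names are placeholders, not the real schema.
    payload = {"data": rows, "mode": mode, "truth_dial": truth_dial}
    return urllib.request.Request(
        url=f"{base_url}/v1/discover",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder host and a single made-up observation row, for illustration only.
request = build_discover_request(
    "https://api.example.com",
    rows=[{"t": 0.0, "x": 1.0, "v": 0.0}],
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. Authentication, which any real deployment would presumably require, is left out because the source does not describe it.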
The Truth Dial governs the rigor tier across all modes: explore for breadth with honest uncertainty, validate for stress testing with mandatory negative controls, publish for external-grade reproducibility with deterministic replay. The validation experiments were conducted at validate tier, meaning every negative control was mandatory and every claim was stress-tested before emission. Moving to publish tier would add deterministic replay guarantees and full evidence packaging suitable for regulatory submission or peer review.
What validation on known ground truth proves
Validation against known ground truth answers the most fundamental question about any discovery system: does it find real science? Not plausible science. Not approximately correct science. Real, verifiable, ground-truth-matching science. ARDA's results on three distinct systems (a mechanical oscillator, a pharmacokinetic model, and a gene regulatory network) demonstrate that the answer is yes, across different domains, different discovery modes, and different levels of system complexity.
The validation also demonstrates something subtler: ARDA does not merely fit data. It discovers structure. The governing equation for the oscillator is not a regression — it is the physical law. The causal graph for ADME is not a correlation matrix — it is the directed mechanism. The cyclic topology for the repressilator is not a clustering — it is the regulatory architecture. These are qualitatively different outputs from what predictive models produce, and they require qualitatively different validation. The fact that all three passed their respective negative control batteries confirms that the discovered structures are genuine, not artifacts of overfitting or statistical coincidence.
AI-native research and engineering
Vareon is an AI-native research and engineering company built from the ground up on first principles. ARDA's API-native design reflects this: every discovery capability is accessible through structured endpoints that AI agents can invoke directly. The validation paper demonstrates that autonomous agents can conduct rigorous scientific research — from data profiling through discovery to controlled validation — without human intermediation. The implications for research at scale are significant. When discovery is an API call, the rate-limiting factor is no longer human attention. It is the quality of the scientific questions being asked.
The validation paper by Vareon Research, March 2026, will be available at vareon.com/research. It provides complete experimental protocols, result tables, and negative control outcomes for all three validation systems. The paper is written for researchers and practitioners who want to evaluate ARDA's capabilities against their own scientific standards — not marketing claims, but reproducible evidence with full provenance.
From validated foundations to open discovery
Validation on known ground truth is not an end goal — it is a foundation. If ARDA can autonomously discover the governing equation of a damped oscillator, the causal graph of drug absorption, and the regulatory topology of a gene network, then the novel discoveries it produces on systems without known ground truth deserve serious scientific consideration. They are not statistical artifacts from a black box. They are typed claims from a governed engine that has demonstrated its ability to find truth when truth is knowable, with every step documented in an evidence ledger that any scientist can audit.
That is the scientific argument for validation-first design. You build trust by showing the engine works where the answer is known. Then you deploy it where the answer is not yet known, with the same governance, the same controls, and the same evidence standards. The science compounds because the methodology is consistent, and the methodology is consistent because it is encoded in the engine — not in the habits of whichever researcher happens to be running the experiment that day.