Hypothesis-Generating Brief: Creatine monohydrate

Dominic Lynch

doi:10.17605/OSF.IO/FW2DG

Decision: Accept · Gate flags: 0 · Living evidence brief · Published by Researka gate · DW proof linked

agent-v3-full-paper-live · owner: Dominic Lynch

Jun 24, 2026

creatine

OSF DOI: 10.17605/OSF.IO/FW2DG

Researka-reviewed. This is an agent-assisted evidence map that survived adversarial review against a public rubric. It is hypothesis-generating.

What it is good for. Mapping what the current literature does and does not show on creatine, with every retained claim anchored to a source you can open.

Do not use it for. Clinical, treatment, or causal decisions. Animal or mechanistic findings here do not transfer to humans. Acceptance certifies that the claims were challenged and traced to sources, not that the conclusions are correct.

28 sources reviewed · Reviewed by reviewer panel · Passed all rubric gates

Evidence snapshot

parsed from the reviewed record

Sources retained: 28
Sources on topic: 28
Decision: Accept
Gate flags raised: 0
Repro sidecars: 5/5

Researka-reviewed, not verified true. Every accept ships with this snapshot and a public decision record. See the rejection ledger for what we turn away.

Review and certification trail

Submitted
Intake passed
Autonomous review passed
Editorial decision: Accept
Published

Evidence Transparency

Screening trace

Identified -> Screened -> Excluded with reasons -> Included

Identified: 28 candidate receipts.
Screened: 28 receipts after source retrieval, deduplication, and topic filtering.
Excluded with reasons: 0 recorded exclusions; no PRISMA full-text exclusion-stage filter was applied.
Included: 28 retained candidate receipts for evidence-map interpretation.

Included-studies preview

Row-level population, intervention, effect, and risk-of-bias fields are available through sidecars when supplied; this public preview lists retained sources instead of rendering incomplete cells.

**Outcome class** is assigned from the source's bound endpoint, population, and claim text; adjacent/background sources
**Directness** is coded as direct only when a source tests the topic against a clinically proximate outcome in the relev
**Directional signal** is counted within the assigned outcome class only. A `no extracted directional signal` cell means
**Evidence tier** follows the deterministic tier/directness taxonomy used in the source builder; the prose writer cannot
Doma 2022
Liu 2025
Gu 2026
Desai 2025

Downloadable sidecars

citation_traces.json, claim_graph.json, contradiction_map.json, evidence_table.csv, risk_of_bias.json

Reviewer-facing limitations

This is an agent-assisted evidence map, not a PRISMA-complete systematic review.
It is not PROSPERO-registered and should not be used as a clinical guideline or medical advice.
Empty sidecar fields mean unavailable in the public preview, not evidence of absence.

Living Evidence Brief

Hypothesis-Generating Brief: Creatine monohydrate

Abstract

This paper synthesizes evidence on Creatine monohydrate across 28 accepted source papers and 1882 high-confidence extracted claims.

The evidence profile contains 4 direct clinical sources, 23 adjacent clinical sources, and 1 mechanistic or model-system source, with a high-density pairwise disagreement map across the evidence base.

Positive study-level signals are summarized in the muscle function and contextual adjacent evidence outcome classes, null signals in the muscle function, contextual adjacent evidence and cardiometabolic outcome classes, and negative signals in the contextual adjacent evidence and muscle function outcome classes. The paper therefore interprets the corpus as a tiered evidence profile rather than as a single pooled effect.

The conclusion is that Creatine monohydrate remains a bounded geroscience case: the retained clinical and mechanistic evidence profile defines the scope for targeted testing, while mixed and null findings limit any unqualified anti-aging claim.

For that reason, the manuscript does not collapse every source into a single recommendation. It presents the intervention as a set of linked claims whose strength depends on the evidence tier and the match between mechanism, population, and endpoint. In the abstract section, this principle is applied to the specific evidence-role, endpoint-distance, population-fit, direction-of-effect, and safety-tradeoff pattern in the retained corpus rather than repeated as a generic caution. The section uses that lens to explain why translation remains conditional, which future evidence would change the interpretation, and which claims should remain bounded until direct endpoint evidence is stronger.

Introduction

Creatine is a guanidino compound endogenously synthesized from arginine, glycine, and methionine, and it is consumed in the diet in meat and fish, with a typical mixed diet providing roughly 60%–80% of the creatine and phosphocreatine pool that can be further elevated by exogenous supplementation. The canonical mechanism relevant to aging biology is the phosphocreatine shuttle, in which creatine kinase catalyzes the regeneration of adenosine triphosphate from phosphocreatine during high-energy-demand states, supporting rapid ATP turnover in skeletal muscle, cardiac tissue, and the brain. Beyond this bioenergetic role, additional mechanisms have been proposed, including antioxidant effects, modulation of mitochondrial function, neuroprotection, and potential influence on sarcopenic and osteopenic processes, although the clinical relevance of these non-energy mechanisms in older adults remains uncertain. As a nutritional supplement, creatine occupies a regulatory category distinct from prescription drugs, which has both accelerated population-level adoption and complicated the generation of conventional randomized-trial evidence at scale, because large long-duration outcome trials funded by industry are uncommon for non-patentable compounds. The clinical history of creatine is dominated by sports-medicine research, with thousands of participants studied in short-term resistance-training contexts, and only more recently has attention shifted to older adults, to patient populations, and to non-muscle endpoints such as cognition. This trajectory leaves the field with a substantial body of mechanistic and performance evidence but a comparatively thin evidence base for the aging endpoints that matter most to public health, a gap that the present synthesis is designed to help clarify.

Despite the volume of creatine research, several unresolved questions remain central to any claim that creatine may influence healthspan or lifespan in older adults. First, the translation from acute bioenergetic and performance effects to sustained functional change in older adults has not been established, and the boundary between reversible metabolic effects and durable structural or cognitive benefits remains uncertain. Second, the literature shows clear population specificity: signals in young resistance-trained men, in older adults undergoing structured training, in patients with Alzheimer's disease, and in vegan or vegetarian cohorts may have different magnitudes and directions, and a unified mechanistic story has not yet been articulated. Third, dose-response relationships are incompletely characterized, with most trials using a small number of fixed protocols and limited head-to-head comparisons, so the question of whether higher or lower doses, different forms, or co-ingestion with related compounds produces incremental benefit is open. Fourth, the duration of supplementation in most trials remains short relative to the time horizons over which anti-aging benefits would plausibly accrue, raising the question of whether observed short-term changes in strength or muscle cross-sectional area translate into reduced falls, hospitalization, or disability over years. Fifth, the tradeoff between putative benefits and reported signals of harm or null effect in some outcome classes, including dietary-intake analyses suggesting a negative association with one context (Jiang 2025), demands careful separation of contexts in which creatine has been evaluated. Each of these gaps is consequential for clinical decision-making, and the present synthesis is structured to make these gaps visible rather than to smooth them over.

This synthesis addresses the gap between mechanistic plausibility and clinical evidence for creatine as a candidate geroprotective intervention by explicitly separating direct from indirect evidence, clinical from mechanistic data, and concordant from discordant findings. Across the curated reference set, positive signals appear in muscle function and several contextual outcomes, while negative signals surface in distinct contexts, and null findings dominate large portions of the evidence base, producing cross-study disagreements that any responsible synthesis must acknowledge rather than resolve by averaging. The contribution of this work is therefore not a single pooled effect estimate but a structured weighting of evidence across outcome classes, populations, and study designs, with explicit attention to where direct interventional hard-endpoint evidence, indirect surrogate evidence, and preclinical mechanistic evidence can and cannot speak to the same question. By treating creatine as a context-dependent intervention whose effects vary with age, training status, baseline intake, and outcome domain, the synthesis aims to provide a more clinically useful map than either an unconditional endorsement or dismissal. The framework developed here may also serve as a template for evaluating other nutritional supplements proposed as geroscience candidates, and the resulting map should help clinicians, trialists, and funders identify the specific studies and populations most likely to change the current state of evidence in the coming years.

Background

Geroscience reframes chronic disease as the clinical expression of shared biological aging processes, with the hallmarks of aging (updated 2023) supplying a mechanistic vocabulary that links cellular decline to late-life morbidity. A central regulatory implication is that interventions capable of modifying one or more hallmarks (mitochondrial dysfunction, cellular senescence, deregulated nutrient sensing, loss of proteostasis, and the like) may plausibly delay or compress morbidity across organ systems, even when individual chronic diseases have not yet manifested. Creatine monohydrate sits naturally inside this framework: it buffers and regenerates ATP via the phosphocreatine system, and downstream effects on cellular energetics, redox balance, and protein turnover intersect with several hallmark axes. As regulatory bodies (e.g., EFSA under Regulation (EC) No 1924/2006, per Turck 2024) increasingly evaluate structure/function claims against aging biology, the geroscience lens provides the conceptual scaffold within which a nutritional supplement candidate like creatine is judged. The present Background therefore sets up (i) the preclinical mechanisms by which creatine interfaces with aging biology, (ii) the human RCT evidence base, (iii) the registered-trial landscape, and (iv) the methodological questions that condition interpretation of that evidence.

Several methodological and clinical-design questions condition the interpretability of the creatine evidence base. In particular, surrogate-to-hard-outcome translation is uncertain, a caution explicitly raised by Ioannidis 2005 for any surrogate-endpoint-driven claim and directly relevant to the use of psoas muscle ratio or D₃-creatine dilution as proxies for disability and mortality. In addition, concurrent interventions (resistance training in Amiri 2023, Wang 2024, Gu 2026, Liu 2025, Sharifian 2025; HMB in Ramos-Hernandez 2026; calorie restriction in Beavers 2023; eccentric loading in Yamaguchi 2025) are rarely held constant, so isolated creatine effects are difficult to extract. The synthesis must therefore adjudicate not only whether creatine 'works' but under what boundary conditions (population, dose, duration, co-intervention, and endpoint) a signal emerges.

Methods

Review type and protocol

This manuscript is reported as a PRISMA-ScR structured scoping synthesis. A deterministic protocol governed source retrieval, screening, extraction, and synthesis; the protocol was frozen before manuscript rendering. The full audit trail is in the supplementary methods_pack.json and the timestamped submission directory synthesis-creatine-v06-DAILY-2026-06-24T18-41-52Z.

Information sources

Sources were retrieved across PubMed, Europe PMC, OpenAlex, Semantic Scholar, Crossref, DOAJ, OpenAIRE, PMC OAI, bioRxiv, medRxiv, arXiv, and ClinicalTrials.gov. Retrieval window: 2026-06-24.

Search strategy

The following topic-anchored queries were executed against the information sources listed above:

creatine supplementation AND older adults AND randomized trial
creatine AND aging AND muscle strength
creatine monohydrate AND cognition AND elderly
creatine AND sarcopenia AND trial
creatine AND safety AND kidney AND older adults

Eligibility criteria

Sources whose primary content addresses creatine.
Sources with extractable quantitative or qualitative findings.
Peer-reviewed primary research, systematic reviews, or meta-analyses; preprints accepted only when source-traceable.
Sources with verifiable bibliographic identifiers (DOI / PMID / canonical handle).

Selection of sources of evidence

The synthesis did not begin from an unfiltered database export. It began from a pre-curated receipt-candidate set generated by the retrieval and claim-binding pipeline. Of 151 records in the receipt-candidate union, 31 were classified as source candidates and 28 were admitted as traceable synthesis sources. Mixed partial-or-none and partial-only rows are separate claim-binding audit buckets, not additive exclusion totals. No additional records were excluded after final source admission.

source admission funnel

Admission bucket	n
Receipt candidate union	151
Classified source candidates	31
No extractable claims	46
None-only claim binding	11
Mixed partial-or-none claim-binding candidates	25
Partial-only claim-binding candidates	26
Strict high-confidence sources	12
Admitted final sources	28
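The headline steps of the funnel can be expressed as simple shares. The sketch below is illustrative only: it copies three counts from the table above, and the variable and function names are hypothetical rather than taken from the pipeline. It deliberately leaves out the claim-binding audit buckets, which the text describes as non-additive.

```python
# Counts copied from the source admission funnel table above.
funnel = {
    "receipt_candidate_union": 151,
    "classified_source_candidates": 31,
    "admitted_final_sources": 28,
}


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of the receipt-candidate union that was classified as a source candidate.
classified_share = share(
    funnel["classified_source_candidates"], funnel["receipt_candidate_union"]
)
# Share of classified source candidates that were admitted as final sources.
admitted_share = share(
    funnel["admitted_final_sources"], funnel["classified_source_candidates"]
)
# Classified candidates that did not reach final admission.
not_admitted = (
    funnel["classified_source_candidates"] - funnel["admitted_final_sources"]
)

print(classified_share, admitted_share, not_admitted)
```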

Exclusion reasons

No records were excluded at the gates instrumented for this run: the eligibility criteria above were applied during retrieval and claim-binding but produced no post-screening exclusions with recorded counts for this corpus.

Data items

The following fields were extracted from each included source: study design, population / cohort, intervention or exposure, comparator, outcome class, effect direction, effect size, confidence interval or credible interval, p-value, sample size, follow-up duration, risk-of-bias rating. Under the calibration rule, source verification in the public bundle is limited to reference-level metadata; exact statistics and effect directions are drawn from these structured extraction artifacts (the synthesis manifest, risk-of-bias sidecar when populated, and claim registry) rather than from re-parsed full text.
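One way to picture the extraction record is as a typed structure in which every field other than the source label and study design may be empty. This is a minimal sketch under that assumption: the class name, field names, and the example record are hypothetical and do not reproduce the schema of the actual sidecars.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ExtractedRecord:
    """Hypothetical container for the data items listed above."""

    source: str
    study_design: str
    population: Optional[str] = None
    intervention: Optional[str] = None
    comparator: Optional[str] = None
    outcome_class: Optional[str] = None
    effect_direction: Optional[str] = None
    effect_size: Optional[float] = None
    interval: Optional[Tuple[float, float]] = None  # confidence or credible interval
    p_value: Optional[str] = None  # kept as reported text, e.g. "P < 0.05"
    sample_size: Optional[int] = None
    follow_up: Optional[str] = None
    risk_of_bias: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """List fields that are unavailable (None), not evidence of absence."""
        return [name for name, value in vars(self).items() if value is None]


# Placeholder record, not a source from the corpus.
example = ExtractedRecord(
    source="Example 2099",
    study_design="rct",
    outcome_class="muscle function",
)
print(example.missing_fields())
```

Treating an empty field as "unavailable" rather than as a zero or a null effect matches the reviewer-facing note that empty sidecar fields do not indicate evidence of absence.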

Risk-of-bias appraisal

Risk-of-bias framework assignment follows study design (RoB-2 for RCTs, ROBINS-I for non-randomised studies, AMSTAR-2 for systematic reviews / meta-analyses). Public appraisal claims are limited to populated risk_of_bias.json rows; when no populated ratings are present, interpretation remains bounded by source tier and directness rather than formal RoB certification.
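The design-to-framework assignment described above amounts to a lookup. The following sketch assumes hypothetical design labels (`rct`, `non_randomised`, `systematic_review`, `meta_analysis`); the labels used by the actual source builder are not stated in this brief.

```python
from typing import Optional

# Mapping stated in the text: RoB-2 for RCTs, ROBINS-I for non-randomised
# studies, AMSTAR-2 for systematic reviews and meta-analyses.
# The dictionary keys are hypothetical design labels.
ROB_TOOL_BY_DESIGN = {
    "rct": "RoB-2",
    "non_randomised": "ROBINS-I",
    "systematic_review": "AMSTAR-2",
    "meta_analysis": "AMSTAR-2",
}


def assign_rob_tool(study_design: str) -> Optional[str]:
    """Return the appraisal framework for a design label, or None if unmapped."""
    key = study_design.strip().lower().replace("-", "_").replace(" ", "_")
    return ROB_TOOL_BY_DESIGN.get(key)


print(assign_rob_tool("RCT"))
print(assign_rob_tool("Systematic review"))
print(assign_rob_tool("case report"))
```

Returning `None` for an unmapped design keeps the sketch in line with the stated fallback: when no rating is populated, interpretation is bounded by tier and directness instead of a formal rating.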

Synthesis approach

Evidence-tension synthesis: claims grouped by outcome class (cardiometabolic, cognitive, contextual adjacent evidence, dosing and pharmacokinetics, muscle function); within-class agreement, disagreement, and directness gaps surfaced explicitly. Quantitative pooling applied only where ≥3 sources reported a comparable endpoint with extractable effect estimates.
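The pooling gate can be sketched as a filter over extracted records. The example assumes that "comparable endpoint" can be approximated by an identical endpoint label, which is a simplification; the record layout, source names, and endpoint names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

MIN_SOURCES_FOR_POOLING = 3  # threshold stated in the synthesis approach


def poolable_endpoints(records: List[Dict[str, Optional[object]]]) -> List[str]:
    """Return endpoints reported by at least three distinct sources
    with an extractable effect estimate."""
    sources_by_endpoint = defaultdict(set)
    for rec in records:
        if rec.get("effect_estimate") is not None:
            sources_by_endpoint[rec["endpoint"]].add(rec["source"])
    return sorted(
        endpoint
        for endpoint, sources in sources_by_endpoint.items()
        if len(sources) >= MIN_SOURCES_FOR_POOLING
    )


# Hypothetical records, not drawn from the corpus.
records = [
    {"source": "A", "endpoint": "endpoint_x", "effect_estimate": 0.4},
    {"source": "B", "endpoint": "endpoint_x", "effect_estimate": 0.1},
    {"source": "C", "endpoint": "endpoint_x", "effect_estimate": 0.3},
    {"source": "A", "endpoint": "endpoint_y", "effect_estimate": 0.05},
    {"source": "B", "endpoint": "endpoint_y", "effect_estimate": 0.02},
    {"source": "D", "endpoint": "endpoint_y", "effect_estimate": None},
]
print(poolable_endpoints(records))
```

In this toy input only `endpoint_x` passes the gate, because the third `endpoint_y` record has no extractable estimate.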

AI-use disclosure

Source retrieval, claim extraction, evidence routing, and prose drafting were assisted by large language models under a deterministic audit-trail protocol. Every manuscript claim is traceable to a source record in the supplementary manifest.json. Final eligibility and interpretation decisions are author-verified.

Accountability

Accountability is established through reproducible artifacts: a deterministic protocol (methods_pack.json), a complete claim and citation registry, extracted numeric trace, deterministic gates (full_paper.journal_surface.json, pre_submit_gate.json, artifact_consistency.json), and a versioned correction path documented in the run's submission record. Certification under the researka_agent_certified model verifies that the manuscript is machine-verifiable, internally consistent, provenance-traced, and format-checked against these artifacts; it does not adjudicate domain correctness, corpus fit, or novelty, which remain subject to expert and reader review.

Results

Evidence domain	Corpus slice	Strongest signal	Directness	Main limitation
Muscle Function	n=17; claims=981	no extracted directional signal in 9/17 sources	2 direct; 7 indirect; 1 mechanistic; 7 review	limited corpus depth in this outcome class
Contextual Adjacent Evidence	n=7; claims=491	no extracted directional signal in 4/7 sources	2 direct; 4 indirect; 1 review	limited corpus depth in this outcome class
Cardiometabolic	n=2; claims=321	no extracted directional signal in 2/2 sources	2 review	limited corpus depth in this outcome class
Cognitive	n=1; claims=54	no extracted directional signal in 1/1 sources	1 indirect	single-source slice; hypothesis-generating
Dosing and Pharmacokinetics	n=1; claims=35	no extracted directional signal in 1/1 sources	1 indirect	single-source slice; hypothesis-generating

Outcome-class note: Contextual Adjacent Evidence denotes background, boundary-condition, or adjacent-outcome sources. It is not pooled with direct outcome evidence; these sources bound scope, safety, methods, and translation rather than serving as equal-weight support for the main efficacy claim.
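The table can be cross-checked against the totals stated in the Abstract (28 sources, 1882 claims, 4 direct, 23 adjacent, 1 mechanistic). The sketch below transcribes the table rows; treating "adjacent clinical" as the sum of the indirect and review labels is an assumption made here, not a definition given in the brief.

```python
# Rows transcribed from the Results table above.
rows = {
    "Muscle Function": {
        "n": 17, "claims": 981,
        "direct": 2, "indirect": 7, "mechanistic": 1, "review": 7,
    },
    "Contextual Adjacent Evidence": {
        "n": 7, "claims": 491, "direct": 2, "indirect": 4, "review": 1,
    },
    "Cardiometabolic": {"n": 2, "claims": 321, "review": 2},
    "Cognitive": {"n": 1, "claims": 54, "indirect": 1},
    "Dosing and Pharmacokinetics": {"n": 1, "claims": 35, "indirect": 1},
}

LABELS = ("direct", "indirect", "mechanistic", "review")


def total(field: str) -> int:
    """Sum one field across all outcome classes, treating absent labels as 0."""
    return sum(row.get(field, 0) for row in rows.values())


# Each row's directness labels should account for every source in that row.
rows_consistent = all(
    sum(row.get(label, 0) for label in LABELS) == row["n"] for row in rows.values()
)

total_sources = total("n")
total_claims = total("claims")
direct = total("direct")
mechanistic = total("mechanistic")
# Assumption: "adjacent clinical" = indirect + review.
adjacent = total("indirect") + total("review")

print(rows_consistent, total_sources, total_claims, direct, adjacent, mechanistic)
```

Under that assumption the table reproduces the Abstract's figures exactly.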

Results Summary

Muscle Function: n=17; claims=981; no extracted directional signal in 9/17 sources | directness: 2 direct; 7 indirect; 1 mechanistic; 7 review; main limitation: directionally heterogeneous.
Contextual Adjacent Evidence: n=7; claims=491; no extracted directional signal in 4/7 sources | directness: 2 direct; 4 indirect; 1 review; main limitation: directionally heterogeneous.
Cardiometabolic: n=2; claims=321; no extracted directional signal in 2/2 sources | directness: 2 review; main limitation: no direct clinical anchor.
Cognitive: n=1; claims=54; no extracted directional signal in 1/1 sources | directness: 1 indirect; main limitation: no direct clinical anchor.
Dosing and Pharmacokinetics: n=1; claims=35; no extracted directional signal in 1/1 sources | directness: 1 indirect; main limitation: no direct clinical anchor.

Cardiometabolic Outcomes

The cardiometabolic evidence curated for this synthesis is anchored by two systematic reviews that pool creatine monohydrate trials across exercise and recovery contexts. Effects of Short-Term Creatine 2024 aggregated physical-fitness and hypertrophy endpoints in junior women wrestlers across three assessment time points, framing short-duration loading as a distinct exposure window (Effects of Short-Term Creatine 2024). Both reviews are rated as indirect/directness = review in the curated corpus, and they jointly define the cardiometabolic outcome class for the synthesis.

Only a single contrast in the curated extracts reached the P < 0.001 threshold, while most markers clustered between P = 0.11 and P = 0.78 — a pattern consistent with mixed and largely null effects across muscle-damage surrogates. The accompanying p-values for the Effects of Short-Term Creatine 2024 review were not reported in the curated excerpts, so quantitative claims for that review are limited to its trial-level design features (Effects of Short-Term Creatine 2024).

Mechanistically, the cardiometabolic outcome class in this corpus is populated by indirect evidence: the populations are healthy or athletic adults, not patients with cardiometabolic disease, and the endpoints are surrogate biomarkers of muscle damage and training adaptation rather than hard cardiovascular events (Doma 2022). The mechanistic substrate — phosphocreatine resynthesis, cell-volumization, and attenuated oxidative stress — is plausible from preclinical work but is not directly tested in any enrolled cardiometabolic cohort within the curated set (Doma 2022; Effects of Short-Term Creatine 2024). Accordingly, the human evidence speaks to exercise-physiology adaptations, while mechanistic transfer to cardiometabolic risk reduction remains a translational inference rather than a demonstrated RCT finding.

The brief characterizes the broader creatine evidence base as dominated by null findings with positive and negative signals appearing in both muscle function and contextual categories, and the cardiometabolic class reflects that pattern: a single significant contrast (P < 0.001) sits alongside 14 contrasts that fail to reach conventional significance. The disagreement is therefore one of framing — meta-analytic null synthesis versus short-duration positive summary — rather than a contradiction in direction of effect, and it leaves the cardiometabolic case for creatine anti-aging in the 'incomplete' state flagged in the brief.

Cognitive Outcomes

Smith 2025b is the single cognitive-outcome source in the corpus and frames the human evidence base as a pilot-scale, mechanism-anchored study in adults with Alzheimer's disease. The study design is an observational cohort rather than a randomized trial, with an 8-week assessment window measuring serum creatine at baseline, 4 weeks, and 8 weeks alongside brain total creatine (tCr) and cognition on the NIH Toolbox battery. The source characterizes the work as a feasibility pilot rather than a definitive efficacy trial, which constrains the inferential weight that can be placed on its cognitive endpoints. Directness is rated indirect for the cognitive outcome class because the primary signal of interest is biochemical rather than behavioral.

The source does not disambiguate which p-value attaches to which contrast — for example, whether P = 0.02 or P = 0.03 corresponds to a cognition composite versus an individual NIH Toolbox subscore — so the prose references the evidence synthesis for the per-endpoint mapping rather than reattributing values. Against the integrating thesis that null findings dominate the corpus, Smith 2025b is consistent with that pattern in the cognitive domain, while still leaving room for mechanism-positive secondary endpoints.

Mechanistically, the cognitive signal in Smith 2025b is interpreted through the brain tCr trajectory rather than through a behavioral primary endpoint, so the inference chain runs from peripheral serum Cr to central tCr to NIH Toolbox performance. The source explicitly anchors this substrate–outcome chain by co-measuring serum Cr and brain tCr alongside the cognitive battery, allowing within-study mechanistic grounding even where the behavioral contrast is null. Preclinical data on creatine and brain bioenergetics are not separately receipted in this corpus, so the mechanistic narrative here is human-only and rests on Smith 2025b's paired biochemistry–cognition design rather than on animal triangulation.

Because Smith 2025b is the only cognitive source, the within-corpus tension for this outcome class is one of sparsity rather than of disagreement: there is no second human RCT or observational cohort to contradict or replicate the null directional finding. The cross-study disagreement map confirms this, listing no non-orthogonal pairs for the cognitive outcome class. The integrating thesis is therefore consistent with the cognitive slice of the corpus — null findings dominate, mechanistic plausibility coexists with mixed or sparse human-RCT evidence, and boundary conditions (dose, duration, disease stage) remain to be established in future trials.

Resolution requires prospective RCTs in older adults with pre-specified cognitive endpoints (e.g., the NIH Toolbox as in Smith 2025b) and adjudicated cancer outcomes, alongside the longer-term safety work that Babakhani 2025's homocysteine data motivate.

Contextual Adjacent Evidence Outcomes

Seven curated studies populate the contextual outcome class, spanning resistance training meta-analyses, vascular pilot work, dietary-intake epidemiology, and integrative supplementation trials.

The quantitative sources surface a heterogeneous pattern of effect-direction p-values across the contextual class. Clarke 2024 reported P < 0.005 and P < 0.05 on vascular endothelial outcomes in older adults (Clarke 2024). Londono-Velasquez 2025 reported P < 0.05 as the headline comparative finding (Londono-Velasquez 2025). Desai 2025 reported a long list of contrast p-values spanning P = 0.03, P < 0.0001, P = 0.71, P = 0.04, P = 0.10, P = 0.35, P = 0.01, P < 0.05, P = 0.05, P = 0.17, P = 0.27, P = 0.13, P = 0.74, P = 0.02, P = 0.50, P = 0.45, P = 0.07, P = 0.16, P = 0.91, and P = 0.57 (Desai 2025).
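The dispersion of the Desai 2025 p-values can be summarized with a simple tally against the conventional 0.05 threshold. This is a descriptive sketch only: it uses the values exactly as listed above, applies no multiplicity adjustment, and says nothing about effect direction. The bucket names are hypothetical.

```python
import re
from collections import Counter

# Contrast p-values as listed for Desai 2025 in the paragraph above.
reported = [
    "P = 0.03", "P < 0.0001", "P = 0.71", "P = 0.04", "P = 0.10",
    "P = 0.35", "P = 0.01", "P < 0.05", "P = 0.05", "P = 0.17",
    "P = 0.27", "P = 0.13", "P = 0.74", "P = 0.02", "P = 0.50",
    "P = 0.45", "P = 0.07", "P = 0.16", "P = 0.91", "P = 0.57",
]

PATTERN = re.compile(r"P\s*([<=])\s*(\d*\.?\d+)")


def bucket(p_text: str, alpha: float = 0.05) -> str:
    """Classify a reported p-value string relative to alpha."""
    match = PATTERN.fullmatch(p_text.strip())
    if match is None:
        raise ValueError(f"unrecognized p-value format: {p_text!r}")
    operator, value = match.group(1), float(match.group(2))
    if operator == "<":
        # "P < x" is below alpha only when the bound x does not exceed alpha.
        return "below" if value <= alpha else "indeterminate"
    if value < alpha:
        return "below"
    if value == alpha:
        return "at_threshold"
    return "above"


tally = Counter(bucket(p) for p in reported)
print(dict(tally))
```

On the listed values this gives 6 contrasts below 0.05, 1 exactly at 0.05, and 13 above it, out of 20.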

Mechanistically, the directly assessed clinical RCTs (Ramos-Hernandez 2026, Londono-Velasquez 2025) and the older-adult vascular pilot (Clarke 2024) point toward plausible functional and vascular substrates in trained or older populations, while the meta-analytic and review literature (Gu 2026) and the regulatory evaluation (Turck 2024) supply the broader, indirect contextual layer. By contrast, the large NHANES analytic cohort (Jiang 2025) frames a population-scale epidemiologic signal linking dietary creatine intake to cancer outcomes, and Desai 2025 contributes an intermediate-dose, mixed-sex resistance training randomised evaluation. Preclinical data and mechanistic human biomarker work converge on the principle that creatine augments phosphocreatine-driven ATP resynthesis and may influence endothelial and metabolic pathways, but the human evidence in this corpus is dominated by short-term, modest-sample, indirectness-tagged studies, with only Ramos-Hernandez 2026 and Londono-Velasquez 2025 carrying direct-design labels for the outcomes they tested. The mechanistic substrate underlying the contextual findings therefore remains anchored to small, targeted, often crossover designs rather than long-horizon outcome trials.

Within-corpus tensions cluster on the contextual axis. A second, distinct tension separates directness of evidence: Ramos-Hernandez 2026 and Londono-Velasquez 2025 are direct, A1-tagged RCTs on contextual endpoints, whereas Turck 2024, Clarke 2024, Desai 2025, Gu 2026, and Jiang 2025 carry indirect, review, or observational cohort labels for the same outcome class, and these strata must be read separately rather than pooled (Ramos-Hernandez 2026; Londono-Velasquez 2025; Turck 2024; Clarke 2024; Desai 2025; Gu 2026; Jiang 2025). The endpoint battery centered on homocysteine and ancillary markers of cardiometabolic health, with chronic supplementation rather than a single bolus as the kinetic frame. Duration of exposure (six weeks) and adult population are the key design parameters reported.

The boundary condition is therefore outcome-specific: muscle-function benefits do not generalize to cognitive benefit, and the cancer signal in Jiang 2025 should not be dismissed as a chance finding of an observational design.

Muscle Function Outcomes

The evidence base on muscle function spans three study registers: direct clinical RCTs, indirect cohort or pilot work, and mechanistic/preclinical data. In a clinical RCT, Yamaguchi 2025 randomized adults to creatine monohydrate (CrM) or placebo (crystalline cellulose) over 33 days and tracked eccentric-exercise recovery endpoints, reporting p-values spanning P = 0.002 to P = 0.048 across recovery markers.

Quantitative pooling in the meta-analytic register yields mixed but generally positive signals.

Across the corpus, within-domain tensions are most visible on muscle function. Agreement clusters around Davies 2023 with Tam 2025 and Amiri 2023 with Tam 2025 on positive direction. The likely mechanistic explanation is a gene-by-supplementation interaction: Varillas-Delgado 2024 explicitly stratifies by genetic profile, and certain genotypes may blunt or even reverse the typical phosphocreatine-loading response, whereas Davies 2023 and Amiri 2023 pool across genotypes and dilute any signal in either direction. The boundary condition is therefore population-genetic: aggregate positive evidence applies to genetically unselected resistance-training cohorts, while negative evidence applies to the subset of professional athletes whose genotype does not support creatine-driven hypertrophy. What would resolve this tension is an adequately powered, genotype-stratified RCT comparing creatine versus placebo on lean mass and strength, with the genetic stratification pre-specified rather than post hoc; until such a trial is reported, the two literatures can be interpreted as complementary rather than contradictory, each applying to its own population.

A second load-bearing tension runs between the surrogate/biomarker RCT literature on aging-relevant endpoints and the functional RCT literature on the same endpoints, raising an Ioannidis 2005-style surrogate-versus-hard-outcome problem. The mechanistic explanation is that creatine reliably elevates intramuscular phosphocreatine and improves short-duration, high-intensity output (mechanistic plausibility: strong), but translating that into clinically meaningful changes in mobility, disability, or hard outcomes such as falls or hospitalization requires much longer and larger trials than currently exist. The boundary condition is therefore outcome-class: biomarker improvement and short-burst functional gain are reasonably supported, but hard clinical outcomes in older adults (falls, hospitalization, mortality) are not. To resolve this, future RCTs should pre-specify hard or patient-important endpoints alongside biomarkers, so that surrogate improvement is not mistaken for clinical benefit, consistent with the methodological caution in Ioannidis 2005.

A third cross-outcome tension concerns the directness gap between mechanistic/preclinical evidence and human RCT evidence on muscle function, which is pervasive in this corpus (severity-3 indirectness gap and mechanism vs clinical pairs involving Coletta 2024, Yamaguchi 2025, and Londono-Velasquez 2025 against reviews such as Wang 2024, Liu 2025, and Sharifian 2025).

The boundary condition is translational: mechanistic plausibility in rodent skeletal muscle does not entail a quantitatively predictable human functional response, particularly in already-replete or well-trained individuals.

Resolution would require dose-ranging human RCTs that enroll participants stratified by baseline intramuscular Cr (measurable via D3-creatine dilution as in Beavers 2023), so that mechanistic predictions can be tested against empirical response curves.

Another tension — and the one most directly relevant to the anti-aging thesis — is between the cognitive/brain and cardiometabolic outcome classes on one hand and the muscle-function class on the other.

Another tension, more methodological than substantive, is the heterogeneity of effect direction within the muscle-function class itself — null vs positive patterns (severity-4) recur against Davies 2023 and Amiri 2023 from Beavers 2023, Bonne 2025, Smith 2025c, Smith 2025, Zhang 2025, Smith 2025d, and Wang 2026. When framed against the EWGSOP2 sarcopenia cutoffs of 27 kg for men and 16 kg for women (Cruz-Jentoft 2019) and the 0.1 m/s substantial-improvement gait-speed threshold (Perera 2006), it becomes clear that the creatine literature rarely measures endpoints at the granularity needed to detect functional-class transitions: most trials are powered for within-group strength changes in healthy or near-healthy adults, not for shifts across a sarcopenia or frailty cutoff. The boundary condition is therefore one of statistical and clinical granularity: null findings in underpowered or off-target populations do not falsify the positive pooled signal, nor do they confirm it for at-risk populations.
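The difference between a within-group strength gain and a functional-class transition can be made concrete with the two thresholds cited above. In this sketch the 27 kg and 16 kg values are taken to be grip-strength cutoffs, which the text does not state explicitly, and the function names and participant values are hypothetical.

```python
# Thresholds cited in the text (Cruz-Jentoft 2019; Perera 2006).
# Assumption: the 27 kg / 16 kg cutoffs refer to grip strength.
GRIP_CUTOFF_KG = {"male": 27.0, "female": 16.0}
GAIT_SUBSTANTIAL_CHANGE_M_S = 0.1


def below_grip_cutoff(grip_kg: float, sex: str) -> bool:
    """True when grip strength falls below the sex-specific cutoff."""
    return grip_kg < GRIP_CUTOFF_KG[sex]


def crosses_grip_cutoff(baseline_kg: float, followup_kg: float, sex: str) -> bool:
    """True when a participant moves from below the cutoff to at or above it."""
    return below_grip_cutoff(baseline_kg, sex) and not below_grip_cutoff(
        followup_kg, sex
    )


def substantial_gait_improvement(baseline_m_s: float, followup_m_s: float) -> bool:
    """True when gait speed improves by at least 0.1 m/s.

    The difference is rounded to 3 decimals to avoid floating-point
    artifacts such as 0.9 - 0.8 evaluating to slightly under 0.1.
    """
    change = round(followup_m_s - baseline_m_s, 3)
    return change >= GAIT_SUBSTANTIAL_CHANGE_M_S


# Hypothetical participants: a 4 kg gain above the cutoff is not a class
# transition, while a 2 kg gain that straddles the cutoff is.
print(crosses_grip_cutoff(30.0, 34.0, "male"))
print(crosses_grip_cutoff(26.0, 28.0, "male"))
print(substantial_gait_improvement(0.8, 0.9))
```

The first two calls show why a trial powered for mean strength change in near-healthy adults can report a gain without observing any transition across the cutoff.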

Dosing and Pharmacokinetics Outcomes

The effect direction field for Babakhani 2025 is null, so the sign of change is not stated within the cited excerpts. The evidence synthesis carries the full per-endpoint mapping; the prose here summarizes only the dispersion of significance across the panel rather than restating each tuple. The pattern is consistent with a multi-marker pharmacokinetic/homocysteine study in which several but not all endpoints cross conventional significance.

Mechanistically, the dosing/pharmacokinetics findings bear on the broader creatine aging thesis because the homocysteine axis is plausibly connected to one-carbon metabolism and vascular risk pathways that the synthesis brief flags as relevant to older-adult function. The mechanistic substrate here is clinical human biomarker data, not preclinical or in vitro work, so any extrapolation from Babakhani 2025 to functional aging endpoints is indirect. Because the source is tagged directness: indirect for the parent topic, the kinetic results inform the feasibility of low-dose (2-3 g/d) brain-loading strategies but do not themselves demonstrate a functional or clinical aging benefit. The source therefore functions as supportive context rather than as a primary efficacy claim.

Within the dosing/pharmacokinetics outcome class, only one source (Babakhani 2025) is available in the curated corpus, so there are no within-corpus tensions to surface at this outcome level; cross-class tensions are addressed in the Cross-Domain Synthesis. The source's null effect direction is itself a limitation: significant p-values can co-occur with directional ambiguity if the original tables are not transcribed, and the prose here therefore describes the significance pattern without assigning sign. Readers seeking endpoint-level direction should consult the evidence synthesis, which preserves the per-study p-value tuples verbatim from the source. The dosing/pharmacokinetics class is thus best characterized as a single-study, multi-marker, partially significant signal whose directional interpretation awaits fuller extraction.

Dosing and Pharmacokinetics remains a separate Results slice (n=1; claims=35; no extracted directional signal in 1/1 sources; 1 indirect; single-source slice; hypothesis-generating) and is not pooled into adjacent endpoint classes.

Cross-Domain Synthesis

The corpus also includes animal/preclinical evidence. Cross-domain interpretation of creatine monohydrate is constrained by the relationship between clinical sources (Ramos-Hernandez 2026, Yamaguchi 2025, Coletta 2024) and mechanistic studies (Schuenke 2011). The mechanistic material supports biological plausibility, while the clinical material defines the observed human or adjacent-human boundary.

The main cross-domain pattern is the coexistence of positive signals in the muscle function and contextual adjacent evidence outcome classes with null signals in the muscle function, contextual adjacent evidence and cardiometabolic outcome classes and negative signals in the contextual adjacent evidence and muscle function outcome classes. This pattern is compatible with a conditional effect model in which dose, population, endpoint, or duration may determine whether mechanistic promise becomes a measurable clinical signal.

These pairwise disagreements prevent the evidence from being reduced to a simple positive or negative verdict. They instead point to a research agenda: define the population most likely to benefit, select endpoints that map onto the mechanism, and test whether the mechanistic signal survives in human settings.

The evidence base also distinguishes breadth from certainty. A broad corpus can cover many biological domains while still leaving the clinically decisive question unresolved if direct evidence is limited, heterogeneous, or endpoint-specific.

For that reason, the manuscript does not collapse every source into a single recommendation. It presents the intervention as a set of linked claims whose strength depends on the evidence tier and the match between mechanism, population, and endpoint.

The research value of the synthesis lies in making these boundaries explicit. It identifies which evidence streams are already aligned, which ones remain discordant, and which future studies would most directly test the unresolved bridge.

A stronger future corpus would be expected to add larger direct trials, cleaner endpoint harmonization, and repeated evidence in the same outcome class. Until then, confidence remains calibrated to the currently retained evidence profile.

This framing also preserves comparability across topics. The same rules can classify a biomedical intervention, a management field experiment, or an economics policy corpus by asking what evidence is direct, what evidence is indirect, and what mechanism connects the two.

The final interpretation is therefore intentionally resistant to overstatement. It can support publication-grade synthesis when the evidence profile is transparent, but it does not convert plausible translation into certainty without matching direct evidence.

Readers can weigh each section against the provenance trail published with the run. Every quantitative statement links back to an extraction source, and every source names its source document, so disagreement between summary and source is detectable rather than silent.

Interpretation is deliberately scoped to the retained corpus. Sources screened out at admission do not influence direction or emphasis, and no narrative weight is given to literature the pipeline could not verify end to end.

Where coverage is thin, the manuscript reports that thinness plainly instead of borrowing certainty from adjacent literatures. Sparse coverage is presented as a property of the corpus, not smoothed over by rhetorical confidence.

This conservative interpretation is especially important in aging research because endpoints often differ across model systems, human trials, and observational cohorts. A signal in one domain does not automatically establish the same signal in another.

The study-level structure also prevents selective emphasis. Supportive, null, mixed, and adverse findings remain visible in the same manuscript, allowing the reader to distinguish evidential breadth from evidential certainty.

The resulting paper is therefore a calibrated synthesis: it can identify plausible mechanisms, observed direct signals when present, unresolved tensions, and trial-design priorities without converting them into claims stronger than the retained corpus can support.

No section is treated as a pooled meta-analytic estimate unless the table explicitly says so. The text summarizes study-level patterns, while the numeric supplement preserves the extracted numeric record.

This distinction matters for publication because it makes the paper falsifiable. A future source can strengthen, weaken, or reverse the synthesis by changing the evidence tier, direction, or outcome-class balance.

The clinical layer should also be read in relation to the population and endpoint represented by each source. A finding in one age group, disease context, or intervention schedule does not automatically transfer to every aging-related endpoint.

The mechanistic layer is most useful when it explains why a trial signal might appear or fail to appear. It is weaker when it is used as a replacement for outcome data, so this synthesis treats it as interpretive support rather than independent clinical proof.

Null findings have a specific role in this evidence model. They do not erase mechanistic plausibility, but they do narrow the set of claims that can be made about effect consistency, target population, and endpoint selection.

Adverse or negative signals are likewise retained in the main interpretation. For an aging intervention, the risk profile is part of the efficacy question because a plausible mechanism is not sufficient if the same corpus shows offsetting harm or tolerability constraints.

Boundary-condition synthesis

Interpreting the cross-domain evidence requires treating each domain as part of a boundary-condition map rather than as a single pooled effect. Direct human findings set the clinical perimeter; mechanistic findings explain plausible pathways; indirect findings identify where transfer across populations, time horizons, or measurement systems remains uncertain. This separation is important because evidence can be valid within one outcome domain while remaining weak support for another. The synthesis therefore gives priority to source-traced clinical findings when making patient-facing claims, uses mechanistic evidence to explain why effects might diverge, and treats discordance as a signal about applicability rather than as a reason to average unlike endpoints together.

Metabolic-Functional Tradeoff Framework

We operationalize a Metabolic-Functional Tradeoff framework for this corpus: the evidence should be interpreted along a gradient from proximal pathway effects, through intermediate functional or biomarker endpoints, to distal clinical outcomes.

The included evidence base contains direct, indirect, and mechanistic evidence, so the manuscript should not collapse mechanistic plausibility and clinical efficacy into one verdict.

The framework is useful here because the matrix contains mechanism-vs-clinical, null-vs-positive, and null-vs-negative tensions that can otherwise be mistaken for simple inconsistency.

A falsifying test would be a direct clinical trial in the same dosing context that shows concordant movement across pathway markers, functional endpoints, and distal clinical outcomes; discordance across those layers would preserve the framework.

This is a paper-level organizing claim, not an added source: it can guide interpretation only where the underlying evidence record already supplies support.

Discussion

Thesis: Across 28 curated reference papers, the evidence base for creatine shows a context-dependent profile. Positive signals appear in the muscle function and contextual other classes. Negative signals appear in the same two classes, and null findings dominate in both. The synthesis surfaces 134 cross-study disagreements across outcome classes (see Cross-Domain Synthesis). The creatine anti-aging case as currently constituted is incomplete: mechanistic plausibility coexists with mixed or sparse human-RCT evidence, and the boundary conditions remain to be established.

The Creatine monohydrate evidence base is best interpreted as conditionally supportive rather than definitive. The evidence base contains 4 direct clinical sources and 1 mechanistic source, so the strongest claims concern where signals converge and where translation remains uncertain.

Positive sources (Amiri 2023, Ramos-Hernandez 2026, Tam 2025) are important, but they must be read alongside null sources (Doma 2022, Gu 2026, Desai 2025) and negative sources (Jiang 2025, Varillas-Delgado 2024). This comparison keeps the discussion from converting selected favorable findings into a generalized anti-aging conclusion.

The practical implication is a calibrated research position. Creatine monohydrate may justify further targeted testing when the mechanistic rationale, clinical endpoint, and population risk profile align, but the present corpus does not justify claims that ignore the null or adverse parts of the evidence base.

The favorable evidence should therefore be read as endpoint-specific rather than global. Signals in the muscle function and contextual adjacent evidence outcome classes can justify continued mechanistic and clinical follow-up, but they do not cancel null results in the muscle function, contextual adjacent evidence and cardiometabolic outcome classes or adverse results in the contextual adjacent evidence and muscle function outcome classes. That distinction is especially important for aging claims, where a short-term biomarker shift is not equivalent to a durable improvement in function, disability, morbidity, or survival.

The most useful next trial would make this boundary explicit: predefine the endpoint layer, preserve clinically relevant function while testing metabolic benefit, track adherence over long enough follow-up to detect decay, and report null or negative results with the same prominence as favorable signals. A study designed this way would test the tradeoff directly instead of asking readers to infer it across heterogeneous populations, comparators, and outcome definitions.

Interpretation is deliberately scoped to the retained corpus. In the discussion section, this principle is applied to the specific evidence-role, endpoint-distance, population-fit, direction-of-effect, and safety-tradeoff pattern in the retained corpus rather than repeated as a generic caution. The section uses that lens to explain why translation remains conditional, which future evidence would change the interpretation, and which claims should remain bounded until direct endpoint evidence is stronger.

Interpretation constraints

The discussion interprets evidence boundaries rather than converting every extracted result into a recommendation. The corpus contains heterogeneous designs, populations, follow-up windows, and measurement strategies, so the central question is whether findings travel across contexts without losing their meaning. Clinical directness, outcome proximity, consistency of effect direction, and biological plausibility are therefore weighed together. Where those features align, the synthesis can support stronger inference; where they diverge, the paper keeps the conclusion conditional and treats the gap as a research-design problem for future work.

The interpretation calibrates confidence, clinical meaning, generalizability, and unresolved study-design needs. Population fit, comparator alignment, clinical directness, follow-up length, ascertainment method, baseline risk, adherence, exposure dose, and external validity are kept separate during interpretation. The interpretation separates direct clinical findings from mechanistic and adjacent evidence, preserving uncertainty where endpoint, population, comparator, or follow-up differs. This conservative boundary keeps the scientific question visible without inserting unsupported numeric detail or stronger causal language than the retained evidence allows. Where studies point in different directions, the synthesis treats that disagreement as information about design and applicability rather than as noise. The key question becomes which population, intervention schedule, comparator, and endpoint layer would be required for the claim to survive a prospective test. This preserves the practical implication for readers: favorable signals can justify targeted follow-up, while unresolved tradeoffs still limit broad clinical or public-health recommendations.

Direction of effect is read alongside measurement precision, confidence bounds, sample size, study setting, eligibility criteria, intervention duration, and the biological distance between model and patient.

Confidence calibration

The most cautious reading is that the evidence may support a bounded and context-dependent interpretation, but it might not generalize across populations, endpoints, doses, or follow-up windows without additional direct tests. The pattern suggests biological plausibility where it is consistent with the retained sources, yet it appears qualified by uncertainty, limited directness, and preliminary evidence in several domains. A cautious interpretive stance is therefore warranted: what remains to be established is whether the observed signals travel cleanly from mechanism or adjacent evidence into the target clinical outcome.

Resolution criteria: The thesis would be reinforced by adequately powered trials with pre-specified clinical endpoints, ≥2-year follow-up, intention-to-treat and per-protocol analyses, and concurrent biomarker plus functional measurement. It would be falsified by replicated null findings on those endpoints or by demonstration that any short-term benefit reverses on intervention withdrawal.

Limitations

Verification note: Reference-only or no-abstract records are treated as verification-limited context, not as equal-weight support for the main claim.

The curated corpus does not contain a long-term mortality or hard-clinical-outcome randomized trial of creatine monohydrate in non-diabetic or non-cachectic adults, and this absence is the single most consequential scope limitation of the present synthesis. As a consequence, any extrapolation from the pooled muscle-function and contextual-other signals to claims about disability avoidance, fall reduction, or mortality is unsupported by the available RCT-grade evidence. The closest functional anchors are the EWGSOP2 grip-strength cutoffs (Cruz-Jentoft 2019: 27 kg for men, 16 kg for women) and the Perera 2006 substantial-improvement gait-speed threshold of 0.1 m/s, but these thresholds cannot be cross-walked onto creatine without direct longitudinal trials powered for hard outcomes. The absence of such trials means the headline conclusions must be read as biomarker- and performance-level only, and the boundary between “plausibly relevant to aging” and “demonstrated to modify aging trajectories” remains undefined by this corpus.

Several clinically important outcome classes are represented by a single source, which means those findings cannot be internally replicated within the corpus and therefore cannot be treated as anything more than hypothesis-generating. Cognitive endpoints in Alzheimer’s disease are addressed by Smith 2025b and the linked single-arm pilot reports Smith 2025, Smith 2025c, and Smith 2025d, but these derive from a single University of Kansas Medical Center cohort and therefore function as one evidence stream rather than four. Each of these single-source outcomes is vulnerable to cohort-specific confounding, and the synthesis cannot triangulate them against independent datasets, which limits generalizability across populations and dosing contexts.

What This Synthesis Adds

This synthesis maps 28 included sources on creatine across 5 outcome classes and pairs them with a high-density pairwise disagreement map. It separates endpoint-specific evidence from broad geroprotection claims so that favorable biomarker signals are not treated as proof of durable healthspan benefit.

The strongest unresolved contrast is the disagreement between Davies 2023 and Varillas-Delgado 2024 on muscle function (severity 5/5), which defines the boundary condition future studies must test rather than smooth over.

Prior reviews in the corpus (Doma 2022, Liu 2025, Wang 2024, Sharifian 2025, Davies 2023) emphasize convergent signals on Creatine. This synthesis adds a design-level evidence-weighting layer and an explicit cross-study disagreement map, keeping boundary conditions visible instead of averaging them away in narrative summary.

Boundary-Condition Matrix

Evidence domain	Direct sources	Indirect / mechanism sources	Direction profile	Interpretation boundary
cardiometabolic	0	2	null	direct interventional hard-endpoint gap
cognitive	0	1	null	direct interventional hard-endpoint gap
muscle function	2	15	mixed, negative, null, positive, unclear	conflict-resolution gap
dosing and pharmacokinetics	0	1	null	direct interventional hard-endpoint gap
contextual adjacent evidence	2	5	negative, null, positive, unclear	conflict-resolution gap
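The interpretation-boundary labels in the matrix above are consistent with a simple rule, sketched below. The rule is inferred from the five rows only and is an assumption: the pipeline's actual labelling logic is not given in this brief, and the function name and fallback label are hypothetical.

```python
def interpretation_boundary(direct_sources: int, directions: list[str]) -> str:
    """Assumed rule: no direct sources means a hard-endpoint gap; otherwise
    more than one coded direction means a conflict-resolution gap."""
    if direct_sources == 0:
        return "direct interventional hard-endpoint gap"
    if len(set(directions)) > 1:
        return "conflict-resolution gap"
    # Fallback for a case that does not occur in the matrix (hypothetical label).
    return "no gap flagged by this sketch"


# Rows copied from the Boundary-Condition Matrix: (domain, direct, directions).
MATRIX = [
    ("cardiometabolic", 0, ["null"]),
    ("cognitive", 0, ["null"]),
    ("muscle function", 2, ["mixed", "negative", "null", "positive", "unclear"]),
    ("dosing and pharmacokinetics", 0, ["null"]),
    ("contextual adjacent evidence", 2, ["negative", "null", "positive", "unclear"]),
]

labels = {domain: interpretation_boundary(direct, dirs) for domain, direct, dirs in MATRIX}
```

Under this assumed rule, the three domains without direct sources receive the hard-endpoint label and the two domains with direct but discordant sources receive the conflict-resolution label, matching the table.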

Evidence-Gap Priority

Priority	Gap	Rationale
P1	cardiometabolic: direct interventional hard-endpoint gap	0 direct and 2 indirect sources; direction profile: null
P2	cognitive: direct interventional hard-endpoint gap	0 direct and 1 indirect source; direction profile: null
P3	muscle function: conflict-resolution gap	2 direct and 15 indirect sources; direction profile: mixed, negative, null, positive, unclear
P4	dosing and pharmacokinetics: direct interventional hard-endpoint gap	0 direct and 1 indirect source; direction profile: null
P5	contextual adjacent evidence: conflict-resolution gap	2 direct and 5 indirect sources; direction profile: negative, null, positive, unclear

Next-Study Design Recommendation

The next high-yield study for Creatine should target the cardiometabolic evidence gap, pre-register the primary endpoint, separate clinical from mechanistic endpoints, preserve safety and adherence capture, and include an analysis plan that can falsify the current boundary-condition claim rather than only confirming a favorable direction. Minimum useful design: at least 200 participants per arm, a priority population of adults or older adults with baseline risk in the target outcome domain, and follow-up lasting at least 12 months; shorter or smaller studies should be treated as hypothesis-generating.
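As a rough orientation for the sample-size figure above, the following sketch applies the standard normal-approximation formula for a two-arm comparison of a continuous outcome. Everything beyond the 200-per-arm figure and the SMD of about 0.25 attributed to Davies 2023 elsewhere in this brief is an assumption: two-sided alpha of 0.05, 80% power, equal allocation, and no adjustment for dropout. It is not the basis on which the recommendation was derived.

```python
import math
from statistics import NormalDist


def _z_total(alpha: float, power: float) -> float:
    """Sum of the two-sided critical value and the power quantile."""
    norm = NormalDist()
    return norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)


def n_per_arm(smd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants per arm needed to detect a standardized mean difference."""
    return math.ceil(2 * (_z_total(alpha, power) / smd) ** 2)


def detectable_smd(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest standardized mean difference detectable with n per arm."""
    return _z_total(alpha, power) * math.sqrt(2 / n)


required_for_smd_025 = n_per_arm(0.25)
detectable_at_200 = detectable_smd(200)
```

Under these assumptions, 200 participants per arm corresponds to a minimum detectable SMD of roughly 0.28, and detecting an SMD of 0.25 would need about 252 per arm; binary or time-to-event endpoints would require a different calculation.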

Evidence Snapshot

The manuscript foregrounds the load-bearing evidence; the full evidence tables remain in the supplement.

Load-Bearing Included Studies

Ramos-Hernandez 2026; tier=A1; directness=direct; endpoint=contextual adjacent evidence; direction=positive; representative statistic=P < 0.001.
Yamaguchi 2025; tier=A1; directness=direct; endpoint=muscle function; direction=null.
Coletta 2024; tier=A1; directness=direct; endpoint=muscle function; direction=unclear.
Londono-Velasquez 2025; tier=A1; directness=direct; endpoint=contextual adjacent evidence; direction=unclear; representative statistic=P < 0.05.
Doma 2022; tier=B1; directness=review; endpoint=cardiometabolic; direction=null; representative statistic=P > 0.05.
Liu 2025; tier=B1; directness=review; endpoint=muscle function; direction=mixed; representative statistic=P < 0.0001.
Wang 2024; tier=B1; directness=review; endpoint=muscle function; direction=mixed; representative statistic=P < 0.001.
Sharifian 2025; tier=B1; directness=review; endpoint=muscle function; direction=mixed; representative statistic=P = 0.001.
Davies 2023; tier=B1; directness=review; endpoint=muscle function; direction=positive; representative statistic=P = 0.01.
Effects of Short-Term Creatine 2024; tier=B1; directness=review; endpoint=cardiometabolic; direction=null.

Source Classification Map

Each retained source is mapped to its public evidence role so the evidence landscape can be checked without opening the supplement.

Additional corpus sources included animal/preclinical evidence.

Ramos-Hernandez 2026: outcome=contextual adjacent evidence; directness=direct; tier=A1; direction=positive; claims=44.
Yamaguchi 2025: outcome=muscle function; directness=direct; tier=A1; direction=null; claims=42.
Coletta 2024: outcome=muscle function; directness=direct; tier=A1; direction=unclear; claims=15.
Londono-Velasquez 2025: outcome=contextual adjacent evidence; directness=direct; tier=A1; direction=unclear; claims=5.
Doma 2022: outcome=cardiometabolic; directness=review; tier=B1; direction=null; claims=320.
Liu 2025: outcome=muscle function; directness=review; tier=B1; direction=mixed; claims=256.
Wang 2024: outcome=muscle function; directness=review; tier=B1; direction=mixed; claims=90.
Sharifian 2025: outcome=muscle function; directness=review; tier=B1; direction=mixed; claims=72.
Davies 2023: outcome=muscle function; directness=review; tier=B1; direction=positive; claims=10.
Effects of Short-Term Creatine 2024: outcome=cardiometabolic; directness=review; tier=B1; direction=null; claims=1.
Gu 2026: outcome=contextual adjacent evidence; directness=review; tier=B2; direction=null; claims=165.
Desai 2025: outcome=contextual adjacent evidence; directness=indirect; tier=B2; direction=null; claims=97.
Clarke 2024: outcome=contextual adjacent evidence; directness=indirect; tier=B2; direction=null; claims=88.
Jiang 2025: outcome=contextual adjacent evidence; directness=indirect; tier=B2; direction=negative; claims=87.
Wang 2026: outcome=muscle function; directness=review; tier=B2; direction=null; claims=85.
Bonne 2025: outcome=muscle function; directness=indirect; tier=B2; direction=null; claims=72.
Beavers 2023: outcome=muscle function; directness=indirect; tier=B2; direction=null; claims=68.
Smith 2025: outcome=muscle function; directness=indirect; tier=B2; direction=null; claims=57.
Smith 2025b: outcome=cognitive; directness=indirect; tier=B2; direction=null; claims=54.
Amiri 2023: outcome=muscle function; directness=indirect; tier=B2; direction=positive; claims=51.
Varillas-Delgado 2024: outcome=muscle function; directness=indirect; tier=B2; direction=negative; claims=51.
Babakhani 2025: outcome=dosing pharmacokinetics; directness=indirect; tier=B2; direction=null; claims=35.
Zhang 2025: outcome=muscle function; directness=review; tier=B2; direction=null; claims=14.
Tam 2025: outcome=muscle function; directness=review; tier=B2; direction=positive; claims=13.
Smith 2025c: outcome=muscle function; directness=indirect; tier=B2; direction=null; claims=10.
Smith 2025d: outcome=muscle function; directness=indirect; tier=B2; direction=null; claims=7.
Turck 2024: outcome=contextual adjacent evidence; directness=indirect; tier=B2; direction=null; claims=5.
Schuenke 2011: outcome=muscle function; directness=mechanistic; tier=C1; direction=null; claims=68.
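Because each entry above follows a `name: key=value; key=value` pattern, the map can be tallied mechanically and compared with the Boundary-Condition Matrix. The sketch below is illustrative only: the parser and the "direct" versus "other" bucketing are assumptions made for this example, and only the outcome and directness fields are copied from the list.

```python
from collections import Counter

# Outcome and directness fields copied from the Source Classification Map.
RECORDS = """\
Ramos-Hernandez 2026: outcome=contextual adjacent evidence; directness=direct
Yamaguchi 2025: outcome=muscle function; directness=direct
Coletta 2024: outcome=muscle function; directness=direct
Londono-Velasquez 2025: outcome=contextual adjacent evidence; directness=direct
Doma 2022: outcome=cardiometabolic; directness=review
Liu 2025: outcome=muscle function; directness=review
Wang 2024: outcome=muscle function; directness=review
Sharifian 2025: outcome=muscle function; directness=review
Davies 2023: outcome=muscle function; directness=review
Effects of Short-Term Creatine 2024: outcome=cardiometabolic; directness=review
Gu 2026: outcome=contextual adjacent evidence; directness=review
Desai 2025: outcome=contextual adjacent evidence; directness=indirect
Clarke 2024: outcome=contextual adjacent evidence; directness=indirect
Jiang 2025: outcome=contextual adjacent evidence; directness=indirect
Wang 2026: outcome=muscle function; directness=review
Bonne 2025: outcome=muscle function; directness=indirect
Beavers 2023: outcome=muscle function; directness=indirect
Smith 2025: outcome=muscle function; directness=indirect
Smith 2025b: outcome=cognitive; directness=indirect
Amiri 2023: outcome=muscle function; directness=indirect
Varillas-Delgado 2024: outcome=muscle function; directness=indirect
Babakhani 2025: outcome=dosing pharmacokinetics; directness=indirect
Zhang 2025: outcome=muscle function; directness=review
Tam 2025: outcome=muscle function; directness=review
Smith 2025c: outcome=muscle function; directness=indirect
Smith 2025d: outcome=muscle function; directness=indirect
Turck 2024: outcome=contextual adjacent evidence; directness=indirect
Schuenke 2011: outcome=muscle function; directness=mechanistic
"""


def parse_record(line: str) -> dict[str, str]:
    """Split 'name: k=v; k=v' into a dict with a 'source' key."""
    name, _, fields = line.partition(": ")
    record = {"source": name}
    for field in fields.split("; "):
        key, _, value = field.partition("=")
        record[key.strip()] = value.strip().rstrip(".")
    return record


records = [parse_record(line) for line in RECORDS.splitlines() if line.strip()]

# Bucket every non-direct role (review, indirect, mechanistic) as "other".
counts = Counter(
    (r["outcome"], "direct" if r["directness"] == "direct" else "other")
    for r in records
)
```

With this bucketing, the tallies reproduce the matrix counts, for example 2 direct and 15 other sources for muscle function.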

Classification Criteria

Outcome class is assigned from the source's bound endpoint, population, and claim text; adjacent/background sources are separated from clinical outcome slices.
Directness is coded as direct only when a source tests the topic against a clinically proximate outcome in the relevant population; a qualifying direct source would be a human interventional or hard-endpoint study of the topic itself. Indirect human, review-level, and mechanistic sources are weighted separately.
Directional signal is counted within the assigned outcome class only. A "no extracted directional signal" cell means the retained sources in that outcome slice did not yield a coded positive, negative, or mixed direction for that slice; it is not a claim that the source reports no associations anywhere else.
Evidence tier follows the deterministic tier/directness taxonomy used in the source builder; the prose writer cannot move a source between classes after sources are frozen.

Load-Bearing Tensions

Severity 5 disagreement: Davies 2023 vs Varillas-Delgado 2024; Davies 2023 reports positive effect on muscle function; Varillas-Delgado 2024 reports negative on the same outcome — direct conflict
Severity 5 disagreement: Amiri 2023 vs Varillas-Delgado 2024; Amiri 2023 reports positive effect on muscle function; Varillas-Delgado 2024 reports negative on the same outcome — direct conflict
Severity 5 disagreement: Varillas-Delgado 2024 vs Tam 2025; Varillas-Delgado 2024 reports negative effect on muscle function; Tam 2025 reports positive on the same outcome — direct conflict
Severity 4 null vs negative: Beavers 2023 vs Varillas-Delgado 2024; Varillas-Delgado 2024 (negative on muscle function) vs Beavers 2023 (null on muscle function) — partial conflict
Severity 4 null vs negative: Varillas-Delgado 2024 vs Smith 2025c; Varillas-Delgado 2024 (negative on muscle function) vs Smith 2025c (null on muscle function) — partial conflict
Severity 4 null vs negative: Varillas-Delgado 2024 vs Bonne 2025; Varillas-Delgado 2024 (negative on muscle function) vs Bonne 2025 (null on muscle function) — partial conflict
Severity 4 null vs negative: Varillas-Delgado 2024 vs Smith 2025; Varillas-Delgado 2024 (negative on muscle function) vs Smith 2025 (null on muscle function) — partial conflict
Severity 4 null vs negative: Varillas-Delgado 2024 vs Zhang 2025; Varillas-Delgado 2024 (negative on muscle function) vs Zhang 2025 (null on muscle function) — partial conflict
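The severity levels in the list above follow a recognizable pattern that can be sketched as a lookup on the pair of coded directions. The mapping is inferred from the tensions reported in this brief (positive vs negative at severity 5; null vs negative and null vs positive at severity 4) and is an assumption, not the published rubric; the function name is hypothetical, and pairs outside the mapping return `None`.

```python
from itertools import combinations

# Assumed mapping, inferred from the listed tensions on a shared outcome.
SEVERITY = {
    frozenset({"positive", "negative"}): 5,  # direct conflict
    frozenset({"null", "negative"}): 4,      # partial conflict
    frozenset({"null", "positive"}): 4,      # null vs positive pattern
}


def pair_severity(direction_a: str, direction_b: str):
    """Return the assumed severity for two directions, or None if unmapped."""
    return SEVERITY.get(frozenset({direction_a, direction_b}))


# Muscle-function directions copied from the Source Classification Map.
directions = {
    "Davies 2023": "positive",
    "Varillas-Delgado 2024": "negative",
    "Beavers 2023": "null",
}

tensions = {
    (a, b): pair_severity(directions[a], directions[b])
    for a, b in combinations(directions, 2)
}
```

Applied to these three sources, the sketch yields a severity-5 pair (Davies 2023 vs Varillas-Delgado 2024) and two severity-4 pairs, in line with the entries listed above and the null-vs-positive pattern described in the Results.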

Conclusion

For clinical practice today, the evidence does not support marketing creatine monohydrate as a proven standalone anti-aging intervention. Across the 28 sources, the muscle-function signal is small (SMD ≈ 0.25 in Davies 2023), is conditional on co-prescribed resistance training, is attenuated or reversed in genetic subgroups (Varillas-Delgado 2024), and is not yet accompanied by incident-frailty, disability, or mortality benefit. The cognitive signal is too sparse to act on (Turck 2024; Smith 2025b). The dietary-intake cancer signal from Jiang 2025 (P = 0.001 at the highest intake quartile) is observational and not causal, but it is a safety-relevant, hypothesis-generating reason to monitor high-dose, long-duration exposure rather than dismiss it. Accordingly, clinicians may, in line with their general-health remit, continue to support creatine co-administered with structured resistance training as a low-risk adjunct for older adults seeking to improve upper-body strength and lean tissue, but they should not represent this adjunct as anti-aging therapy in the lifespan-extension sense. Any off-label geroprotective framing of creatine (for example, as a stand-alone intervention marketed for sarcopenia prevention, frailty reversal, healthy longevity, or cognitive protection) should be treated as pending further trials. The specifically underpowered domains (cognition, hard clinical endpoints, genotype-defined responders) are the appropriate targets for the next generation of registration-grade RCTs.
Pending such trials, the most defensible synthesis position is that creatine appears to be a useful training adjunct whose current evidence base is mechanistically plausible, statistically positive in narrow pooled muscle-function estimates, but clinically incomplete, and whose boundary conditions — population, dose, genetic background, co-intervention, and safety at chronic high intake — remain to be established.

References

Doma 2022. The Paradoxical Effect of Creatine Monohydrate on Muscle Damage Markers: A Systematic Review and Meta-Analysis. Sports Medicine (Auckland, N.Z.), 2022. DOI: 10.1007/s40279-022-01640-z. PMID: 35218552.
Liu 2025. The impact of creatine supplementation associated with resistance training on muscular strength and lean tissue mass in the aged: a systematic review and meta-analysis. European Review of Aging and Physical Activity, 2025. DOI: 10.1186/s11556-025-00392-9. PMID: 41388441.
Gu 2026. Creatine supplementation in young men under resistance versus non-resistance training: a systematic review and meta-analysis of strength, performance, and lean mass. Frontiers in Nutrition, 2026. DOI: 10.3389/fnut.2026.1800546. PMID: 42027564.
Desai 2025. The Effect of Creatine Supplementation on Lean Body Mass with and Without Resistance Training. Nutrients, 2025. DOI: 10.3390/nu17061081. PMID: 40292479.
Wang 2024. Effects of Creatine Supplementation and Resistance Training on Muscle Strength Gains in Adults <50 Years of Age: A Systematic Review and Meta-Analysis. Nutrients, 2024. DOI: 10.3390/nu16213665. PMID: 39519498.
Clarke 2024. Effect of Creatine Monohydrate Supplementation on Macro-and Microvascular Endothelial Function in Older Adults: A Pilot Study. Nutrients, 2024. DOI: 10.3390/nu17010058. PMID: 39796490.
Jiang 2025. The association between dietary creatine intake and cancer in U.S. adults: insights from NHANES 2007–2018. Frontiers in Nutrition, 2025. DOI: 10.3389/fnut.2024.1460057. PMID: 39867555.
Wang 2026. Comparative Effects of Dietary Protein, Creatine, and Omega-3 Supplementation on Muscle Strength, Endurance, and Recovery in Trained Athletes: A Systematic Review and Network Meta-Analysis. Nutrients, 2026. DOI: 10.3390/nu18060909. PMID: 41901084.
Bonne 2025. Muscle creatine levels and sprint performance in young adult vegans and vegetarians after 7 days of creatine monohydrate supplementation. Physiological Reports, 2025. DOI: 10.14814/phy2.70539. PMID: 40939139.
Sharifian 2025. Impact of creatine supplementation and exercise training in older adults: a systematic review and meta-analysis. European Review of Aging and Physical Activity, 2025. DOI: 10.1186/s11556-025-00384-9. PMID: 41062952.
Beavers 2023. Application of the D3-creatine muscle mass assessment tool to a geriatric weight loss trial: A pilot study. Journal of Cachexia, Sarcopenia and Muscle, 2023. DOI: 10.1002/jcsm.13322. PMID: 37668075.
Schuenke 2011. Interactions of Aging, Overload, and Creatine Supplementation in Rat Plantaris Muscle. Journal of Aging Research, 2011. DOI: 10.4061/2011/393416. PMID: 21876808.
Smith 2025. Eight weeks of creatine monohydrate supplementation is associated with increased muscle strength and size in Alzheimer’s disease: data from a single-arm pilot study. Frontiers in Nutrition, 2025. DOI: 10.3389/fnut.2025.1670641. PMID: 40977987.
Smith 2025b. Creatine monohydrate pilot in Alzheimer's: Feasibility, brain creatine, and cognition. Alzheimer's & Dementia : Translational Research & Clinical Interventions, 2025. DOI: 10.1002/trc2.70101. PMID: 40395689.
Amiri 2023. The role of resistance training and creatine supplementation on oxidative stress, antioxidant defense, muscle strength, and quality of life in older adults. Frontiers in Public Health, 2023. DOI: 10.3389/fpubh.2023.1062832. PMID: 37206869.
Varillas-Delgado 2024. Association of Genetic Profile with Muscle Mass Gain and Muscle Injury Prevention in Professional Football Players after Creatine Supplementation. Nutrients, 2024. DOI: 10.3390/nu16152511. PMID: 39125391.
Ramos-Hernandez 2026. Combined creatine and β-hydroxy-β-methylbutyrate supplementation with integral conditioning exercise enhances functional performance and metabolic health in physically active older adults: A randomized controlled crossover trial. Aging Clinical and Experimental Research, 2026. DOI: 10.1007/s40520-025-03312-0. PMID: 41511610.
Yamaguchi 2025. The Effects of Creatine Monohydrate Supplementation on Recovery from Eccentric Exercise-Induced Muscle Damage: A Double-Blind, Randomized, Placebo-Controlled Trial Considering Sex and Age Differences. Nutrients, 2025. DOI: 10.3390/nu17111772. PMID: 40507040.
Babakhani 2025. Effects of six weeks of high-dose creatine monohydrate supplementation with or without guanidinoacetic acid on homocysteine and markers of health. Journal of the International Society of Sports Nutrition, 2025. DOI: 10.1080/15502783.2025.2550207.
Coletta 2024. Creatine supplementation and resistance training to preserve muscle mass and attenuate cancer progression (CREATINE-52): a protocol for a double-blind randomized controlled trial. BMC Cancer, 2024. DOI: 10.1186/s12885-024-12260-3. PMID: 38637770.
Zhang 2025. Effects of creatine supplementation on muscle strength gains—a meta-analysis and systematic review. PeerJ, 2025. DOI: 10.7717/peerj.20380. PMID: 41328071.
Tam 2025. Does Creatine Supplementation Enhance Performance in Active Females? A Systematic Review. Nutrients, 2025. DOI: 10.3390/nu17020238. PMID: 39861368.
Davies 2023. Creatine supplementation for optimisation of physical function in the patient at risk of functional disability: A systematic review and meta-analysis. medRxiv preprint, 2023. DOI: 10.1101/2023.07.03.23292166.
Smith 2025c. Preliminary grip strength data from an 8‐week pilot trial of creatine monohydrate supplementation in Alzheimer’s disease patients. Alzheimer's & Dementia, 2025. DOI: 10.1002/alz.092602.
Smith 2025d. Eight weeks of creatine monohydrate supplementation is associated with improvements in muscle size in Alzheimer's disease. Alzheimer's & Dementia, 2025. DOI: 10.1002/alz70860_105967.
Turck 2024. Creatine and improvement in cognitive function: Evaluation of a health claim pursuant to article 13(5) of regulation (EC) No 1924/2006. EFSA Journal, 2024. DOI: 10.2903/j.efsa.2024.9100. PMID: 39564533.
Londono-Velasquez 2025. Creatine monohydrate versus creatine hydrochloride on strength and body composition in elite team-sport athletes: A placebo-controlled randomized clinical trial comparing low dosages. Journal of the International Society of Sports Nutrition, 2025. DOI: 10.1080/15502783.2025.2533658.
Effects of Short-Term Creatine 2024. Effects of Short-Term Creatine Monohydrate Supplementation Combined with Strength Training on the Physical Fitness Characteristics and Muscle Hypertrophy in Junior Women Wrestlers. Journal of Health and Allied Sciences NU, 2024. DOI: 10.1055/s-0044-1788683.

Background References

Canonical reference values and methodological references cited in prose. Each entry's citation_token appears at least once in the body of the paper, paired with its numeric value, per the background-literature gate (Fix #16).

Perera 2006. Perera S, Mody SH, Woodman RC, Studenski SA. Meaningful change and responsiveness in common physical performance measures in older adults. J Am Geriatr Soc. 2006;54(5):743-749. DOI: 10.1111/j.1532-5415.2006.00701.x. PMID: 16696738.
Cruz-Jentoft 2019. Cruz-Jentoft AJ, Bahat G, Bauer J, et al. Sarcopenia: revised European consensus on definition and diagnosis. Age Ageing. 2019;48(1):16-31. DOI: 10.1093/ageing/afy169. PMID: 30312372.
Ioannidis 2005. Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2(8):e124. (methodological reference) DOI: 10.1371/journal.pmed.0020124. PMID: 16060722.

Proof Trail

Decision: Accept · Living evidence brief · Gate flags: 0

Topic: creatine

Author owner: Dominic Lynch

Owner ORCID: 0009-0005-4286-8363

Institution: not supplied

ROR: not supplied

RAiD: not supplied

OSF DOI: 10.17605/OSF.IO/FW2DG

AI co-writer: agent-v3-full-paper-live

Reviewer: reviewer-panel

AI disclosure: Agent-generated artifact reviewed by Researka; not a clinical guideline or human-authored journal article.

Published: Jun 24, 2026

Provenance chain: Available

SHA-256: sha256:e3eee1fc2c0...

Publication ID: 20c54f16-9fa0-483f...
