How to Use ESCIMate

A guide to using the tool and interpreting results

Quick Start

Upload

Upload a PDF, HTML, or DOCX file — or paste text directly.

Run

Click RUN AUDIT and wait for the analysis to finish. The first request may take about 90 seconds while the server starts up from a cold start.

Review

Green = consistent, amber = review needed, red = discrepancy found.

Status Definitions

PASS

Consistent: The numbers check out.

We recomputed this statistic and it matches what the paper reports. The effect size and p-value are consistent with each other.

No action needed. The numbers are consistent.

Verified: The p-value checks out (no effect size to compare).

No effect size was reported for this test, so we could only check the p-value. It's consistent with the test statistic.

No action needed. P-value is consistent (no effect size to compare).
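As an illustration of what a p-value check involves, the sketch below recomputes a two-sided p-value from a z statistic using only the Python standard library. The function names and the rounding rule are assumptions made for this example; ESCIMate's own tolerance and implementation are not documented here.

```python
import math

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def p_is_consistent(z: float, reported_p: float, decimals: int = 3) -> bool:
    """Hypothetical rule: recomputed and reported p agree after rounding."""
    return round(two_sided_p_from_z(z), decimals) == round(reported_p, decimals)

# z = 1.96 gives a two-sided p just under 0.05
print(two_sided_p_from_z(1.96))
print(p_is_consistent(1.96, 0.05))
```

Other test types follow the same idea with a different reference distribution (t, F, chi-square), which usually requires a statistics library.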

NOTE

Minor caveat: Likely correct, but with a small caveat.

The result appears consistent, but we couldn't fully verify it — for example, the study design was unclear, which affects the exact effect size calculation.

Review the caveat if it matters for your analysis.

WARN

Needs review: Something looks off — worth double-checking.

There is a moderate discrepancy between the reported and computed values, or the significance conclusion may change. This doesn't necessarily mean there's an error, but it deserves a closer look.

Worth double-checking the original calculation in the paper.

ERROR

Inconsistency: The numbers don't add up.

There is a large discrepancy between the reported and computed values. This could indicate a typo, copy-paste error, or calculation mistake in the paper.

Investigate the discrepancy — could be a typo, rounding, or calculation error.

SKIP

Not analyzed: Extracted but could not be verified.

This test statistic was found in the text, but no p-value or effect size was reported alongside it, so nothing could be checked.

Not verifiable. Consider reporting original values more completely.
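The statuses above can be pictured as bands on the gap between a reported and a recomputed value. The sketch below is only a mental model: the function name and both thresholds (`warn_tol`, `error_tol`) are hypothetical and are not ESCIMate's actual cut-offs, and the NOTE status is not modeled. Values are compared by absolute value, in line with the sign-error limitation listed under Known Limitations.

```python
from typing import Optional

def classify(reported: Optional[float], computed: float,
             warn_tol: float = 0.05, error_tol: float = 0.20) -> str:
    """Map a reported/recomputed pair to a status label.

    warn_tol and error_tol are made-up illustrative thresholds.
    """
    if reported is None:
        return "SKIP"   # nothing was reported, so nothing can be checked
    gap = abs(abs(reported) - abs(computed))  # signs are ignored
    if gap <= warn_tol:
        return "PASS"
    if gap <= error_tol:
        return "WARN"
    return "ERROR"

print(classify(0.50, 0.52))
print(classify(-0.50, 0.52))
print(classify(0.50, 1.20))
print(classify(None, 0.30))
```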

Supported Test Types

t (t-test): Compares the means of two groups to see if they differ significantly.

F (F-test, ANOVA): Tests whether the means of two or more groups are significantly different.

r (Correlation): Measures the strength and direction of a linear relationship between two variables.

chisq, chi2 (Chi-square test): Tests whether observed counts differ from what you'd expect by chance.

z (z-test): Similar to a t-test, used with large samples or known population variance.

U (Mann-Whitney U): Compares two groups without assuming data is normally distributed.

W (Wilcoxon W): Compares paired observations without assuming normal distribution.

H (Kruskal-Wallis H): Compares three or more groups without assuming normal distribution.

regression (Regression): Tests whether a predictor variable is significantly related to an outcome.

Effect Size Glossary

d — Cohen's d: How many standard deviations apart two group means are. Small ≈ 0.2, Medium ≈ 0.5, Large ≈ 0.8.

g — Hedges' g: Like Cohen's d but corrected for small sample bias.
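For an independent-samples t-test, d and g can be recovered from the t statistic and the group sizes. The sketch below uses the standard textbook conversions; the function names are made up for this example, and it is not a statement of how ESCIMate computes these values internally.

```python
import math

def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d from an independent-samples t statistic."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)

def hedges_g(d: float, n1: int, n2: int) -> float:
    """Apply the usual small-sample correction factor to d."""
    df = n1 + n2 - 2
    return d * (1.0 - 3.0 / (4.0 * df - 1.0))

d = cohens_d_from_t(2.5, 20, 20)
g = hedges_g(d, 20, 20)
print(round(d, 3), round(g, 3))
```

The correction shrinks d slightly, and the difference fades as the sample grows.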

dz — Cohen's dz: Effect size for paired (within-subjects) designs.

dav — Cohen's dav: Average standardized difference for repeated measures.

drm — Cohen's drm: Repeated-measures effect size accounting for correlation between measurements.
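The three paired-design effect sizes differ only in which standard deviation they divide by. A minimal sketch, assuming you have the mean difference, both standard deviations, and the correlation between the two measurements (the function and argument names are hypothetical):

```python
import math

def paired_effect_sizes(mean_diff: float, sd1: float, sd2: float, r: float) -> dict:
    """dz, dav and drm from summary statistics of a paired design (r < 1)."""
    # Standard deviation of the difference scores
    sd_diff = math.sqrt(sd1**2 + sd2**2 - 2.0 * r * sd1 * sd2)
    dz = mean_diff / sd_diff                  # standardized by SD of differences
    dav = mean_diff / ((sd1 + sd2) / 2.0)     # standardized by the average SD
    drm = dz * math.sqrt(2.0 * (1.0 - r))     # dz corrected for the correlation
    return {"dz": dz, "dav": dav, "drm": drm}

print(paired_effect_sizes(0.5, 1.0, 1.0, 0.8))
```

With equal standard deviations, dav and drm coincide, while dz grows as the correlation between measurements increases. This is one reason an unclear design changes the computed effect size.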

eta2 — Eta-squared: The percentage of total variation explained by the factor. Like R² for ANOVA.

etap2 — Partial eta-squared: Percentage of variance explained by a factor, controlling for other factors.

partial_eta2 — Partial eta-squared: Percentage of variance explained by a factor, controlling for other factors.

eta — Eta: The square root of eta-squared — a correlation-like measure for ANOVA.

omega2 — Omega-squared: A less biased estimate of variance explained than eta-squared.

cohens_f — Cohen's f: Effect size for ANOVA. Small ≈ 0.10, Medium ≈ 0.25, Large ≈ 0.40.
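Several of the ANOVA effect sizes above can be derived from F and its degrees of freedom. The sketch below uses standard conversions; the omega-squared formula assumes a one-way between-subjects design, and all function names are illustrative rather than ESCIMate's own.

```python
import math

def partial_eta2_from_f(f: float, df1: int, df2: int) -> float:
    """Partial eta-squared from F and its degrees of freedom."""
    return (f * df1) / (f * df1 + df2)

def cohens_f_from_partial_eta2(eta2p: float) -> float:
    """Cohen's f from partial eta-squared."""
    return math.sqrt(eta2p / (1.0 - eta2p))

def omega2_one_way(f: float, df1: int, df2: int) -> float:
    """Omega-squared, assuming a one-way between-subjects ANOVA."""
    n_total = df1 + df2 + 1
    return df1 * (f - 1.0) / (df1 * (f - 1.0) + n_total)

eta2p = partial_eta2_from_f(4.0, 2, 40)
print(round(eta2p, 4))
print(round(cohens_f_from_partial_eta2(eta2p), 4))
print(round(omega2_one_way(4.0, 2, 40), 4))
```

Omega-squared comes out smaller than partial eta-squared for the same F, which reflects its bias correction.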

r — Correlation (r): Strength and direction of a linear relationship. Ranges from -1 to +1.
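A correlation and its t statistic carry the same information once the sample size is known, which is what makes a consistency check possible. A small sketch of the standard conversion in both directions (function names are made up for this example):

```python
import math

def t_from_r(r: float, n: int) -> float:
    """t statistic for a Pearson correlation r with sample size n (|r| < 1)."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r))

def r_from_t(t: float, df: int) -> float:
    """Recover r from a t statistic and its degrees of freedom."""
    return t / math.sqrt(t * t + df)

t = t_from_r(0.5, 27)
print(round(t, 4))
print(round(r_from_t(t, 25), 4))
```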

phi — Phi coefficient: Effect size for 2×2 chi-square tests, equivalent to a correlation.

V — Cramér's V: Effect size for chi-square tests with tables larger than 2×2.
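Both chi-square effect sizes rescale the test statistic by the sample size. A minimal sketch of the usual formulas, with hypothetical function names:

```python
import math

def phi_from_chi2(chi2: float, n: int) -> float:
    """Phi coefficient for a 2x2 table."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2: float, n: int, rows: int, cols: int) -> float:
    """Cramer's V for a rows x cols table."""
    k = min(rows, cols) - 1
    return math.sqrt(chi2 / (n * k))

print(round(phi_from_chi2(10.0, 100), 4))
print(round(cramers_v(10.0, 100, 3, 4), 4))
```

For a 2×2 table, `min(rows, cols) - 1` equals 1, so V reduces to phi.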

rank_biserial_r — Rank-biserial correlation: Effect size for Mann-Whitney U tests. Ranges from -1 to +1.

cliffs_delta — Cliff's delta: How often values in one group exceed the other. Ranges from -1 to +1.
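The two measures above are closely related: one is derived from the U statistic, the other from counting pairs directly. The sketch below shows a common form of each. The sign of the rank-biserial value depends on which group's U is reported, so treat the sign convention here as an assumption; function names are illustrative.

```python
from typing import Sequence

def rank_biserial_from_u(u: float, n1: int, n2: int) -> float:
    """Rank-biserial correlation from a Mann-Whitney U and the group sizes."""
    return 1.0 - (2.0 * u) / (n1 * n2)

def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Share of pairs where x > y minus share of pairs where x < y."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

print(round(rank_biserial_from_u(30, 10, 10), 4))
print(round(cliffs_delta([3, 4, 5], [1, 2, 3]), 4))
```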

epsilon_squared — Epsilon-squared: Effect size for Kruskal-Wallis tests. Proportion of variance in ranks.
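Epsilon-squared can be obtained from the H statistic and the total sample size. A short sketch using one commonly cited formula (the function name is made up, and ESCIMate may use a different variant):

```python
def epsilon_squared(h: float, n: int) -> float:
    """Epsilon-squared from Kruskal-Wallis H and total sample size n (n > 1)."""
    # Equivalent to H / (n - 1)
    return h * (n + 1) / (n * n - 1)

print(round(epsilon_squared(9.0, 31), 4))
```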

kendalls_W — Kendall's W: Agreement among raters/rankings. Ranges from 0 (no agreement) to 1 (complete).

beta — Regression coefficient (β): How much the outcome changes per unit change in the predictor.

standardized_beta — Standardized beta: Regression coefficient in standard deviation units, comparable across predictors.

b — Regression coefficient (b): The unstandardized change in outcome per unit of predictor.
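The unstandardized and standardized coefficients are linked by the standard deviations of the predictor and the outcome. A minimal sketch for a single predictor, with hypothetical names:

```python
def standardized_beta(b: float, sd_x: float, sd_y: float) -> float:
    """Convert an unstandardized coefficient b to standard deviation units."""
    return b * sd_x / sd_y

# b = 2 outcome units per predictor unit, SD(x) = 1.5, SD(y) = 6
print(standardized_beta(2.0, 1.5, 6.0))
```

A paper that reports only b cannot be converted this way unless both standard deviations are also given.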

Known Limitations

• Statistics presented in tables are not extracted (table structures are not parsed)
• Sign errors are not detected, because values are compared by absolute value by design
• In some sentences containing several statistics, only the first one is captured (about 2% of detections)
• eta² recomputed from F is actually partial eta²: total eta² requires sums of squares (SS), which cannot be recovered from F and its degrees of freedom alone
• For Welch t-tests, the estimate of N is approximate, because it depends on four unknowns
• t-tests may show WARN when the text does not make clear whether the design is paired or independent
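To see why sample size is hard to recover for a Welch t-test, consider the Welch-Satterthwaite degrees of freedom, which depend on both group sizes and both standard deviations. The sketch below is the standard formula with illustrative argument names; it is not ESCIMate's estimation routine.

```python
def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Welch-Satterthwaite degrees of freedom for two independent groups."""
    a = s1**2 / n1
    b = s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

# Same total N = 20, but very different degrees of freedom
print(round(welch_df(1.0, 10, 1.0, 10), 2))
print(round(welch_df(1.0, 10, 3.0, 10), 2))
```

Because many combinations of sizes and variances produce the same df, working backwards from a reported df to N can only give an approximation.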

Reporting Bugs

Found an issue? Please report it on GitHub Issues.

Include: the input text or PDF, the expected result, and what ESCIMate reported.