How to Use ESCIMate

A guide to using the tool and interpreting results

Quick Start

1. Upload
Upload a PDF, HTML, or DOCX file, or paste text directly.

2. Run
Click RUN AUDIT and wait for the analysis to finish. The first request may take about 90 seconds while the server starts up (cold start).

3. Review
Results are color-coded: green means consistent, amber means review needed, and red means a discrepancy was found.

Status Definitions

PASS
Consistent: The numbers check out.

We recomputed this statistic and it matches what the paper reports. The effect size and p-value are consistent with each other.

No action needed. The numbers are consistent.
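For an independent-samples t-test, one common way to recompute Cohen's d from the reported statistic is d = t·√(1/n1 + 1/n2). The Python sketch below assumes a between-subjects design with known group sizes. The function names and the rounding tolerance (half a unit in the last reported decimal place) are illustrative assumptions, not ESCIMate's actual implementation. Values are compared in absolute terms, in line with the sign limitation listed under Known Limitations.

```python
import math


def d_from_independent_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d implied by an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)


def matches_reported(computed: float, reported: float, decimals: int = 2) -> bool:
    """True if |computed| agrees with |reported| to the reported precision.

    Hypothetical rule: allow half a unit in the last reported decimal place.
    Signs are ignored (absolute-value comparison).
    """
    tolerance = 0.5 * 10 ** (-decimals)
    return abs(abs(computed) - abs(reported)) <= tolerance + 1e-12


# Example: t(38) = 2.50 with 20 participants per group, reported d = 0.79
d = d_from_independent_t(2.50, 20, 20)
print(round(d, 3))                # 0.791
print(matches_reported(d, 0.79))  # True
```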

OK
Verified: The p-value checks out (no effect size to compare).

No effect size was reported for this test, so we could only check the p-value. It's consistent with the test statistic.

No action needed. The p-value is consistent; there was no effect size to compare.
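A p-value check can be illustrated with a z statistic, whose two-sided p-value has a closed form via the complementary error function. This minimal sketch assumes a two-sided test and a p-value reported to three decimals; the helper names and the tolerance rule are hypothetical.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_consistent(z: float, reported_p: float, decimals: int = 3) -> bool:
    """Hypothetical rule: the recomputed p must agree with the reported p
    to within half a unit in the last reported decimal place."""
    recomputed = two_sided_p_from_z(z)
    return abs(recomputed - reported_p) <= 0.5 * 10 ** (-decimals) + 1e-12


print(round(two_sided_p_from_z(1.96), 3))  # 0.05
print(p_consistent(1.96, 0.050))           # True
print(p_consistent(1.96, 0.010))           # False
```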

NOTE
Minor caveat: Likely correct, but could not be fully verified.

The result appears consistent, but we couldn't fully verify it — for example, the study design was unclear, which affects the exact effect size calculation.

Review the caveat if it matters for your analysis.
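To see why an unclear design matters, consider a reported t(28) = 3.00. The sketch below computes the effect size under both readings: as an independent-samples test (assuming two equal groups, N = df + 2) and as a paired test (n = df + 1 pairs). The function names are hypothetical; the formulas are the standard conversions d = t·√(1/n1 + 1/n2) and dz = t/√n.

```python
import math


def d_if_independent(t: float, df: int) -> float:
    """Cohen's d assuming two equal independent groups (N = df + 2)."""
    n_per_group = (df + 2) / 2
    return t * math.sqrt(2 / n_per_group)


def dz_if_paired(t: float, df: int) -> float:
    """Cohen's dz assuming a paired design (n = df + 1 pairs)."""
    return t / math.sqrt(df + 1)


t, df = 3.0, 28
print(round(d_if_independent(t, df), 2))  # 1.1
print(round(dz_if_paired(t, df), 2))      # 0.56
```

The same reported statistic implies an effect roughly twice as large under the independent reading, which is why the exact effect size cannot be pinned down when the design is unclear.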

WARN
Needs review: Something looks off — worth double-checking.

There is a moderate discrepancy between the reported and computed values, or the significance conclusion may change. This doesn't necessarily mean there's an error, but it deserves a closer look.

Worth double-checking the original calculation in the paper.
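The case where "the significance conclusion may change" can be sketched as a check of whether the reported and recomputed p-values fall on opposite sides of the threshold. This assumes α = .05 and treats an inequality such as "p < .05" as a claim of significance; the function and its parameters are hypothetical.

```python
def significance_flips(reported_p: float, recomputed_p: float,
                       alpha: float = 0.05,
                       reported_as_bound: bool = False) -> bool:
    """True if the reported and recomputed p-values disagree about significance.

    reported_as_bound: True when the paper wrote "p < alpha" instead of an
    exact value; in that case reported_p is the bound (e.g. 0.05).
    """
    reported_sig = reported_p < alpha or (reported_as_bound and reported_p <= alpha)
    recomputed_sig = recomputed_p < alpha
    return reported_sig != recomputed_sig


print(significance_flips(0.04, 0.061))                          # True
print(significance_flips(0.05, 0.032, reported_as_bound=True))  # False
```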

ERROR
Inconsistency: The numbers don't add up.

There is a large discrepancy between the reported and computed values. This could indicate a typo, copy-paste error, or calculation mistake in the paper.

Investigate the discrepancy; it could stem from a typo, a rounding issue, or a calculation error.

SKIP
Not analyzed: Extracted but could not be verified.

This test statistic was found in the text, but no p-value or effect size was reported alongside it, so nothing could be checked.

Not verifiable. Reporting the p-value and effect size alongside each test statistic makes a result checkable.
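Taken together, the statuses behave like a decision ladder. The following Python sketch is one hypothetical way to express it. It assumes discrepancies are measured as relative differences between reported and recomputed values, and the 5% and 20% cut-offs are placeholder assumptions, not ESCIMate's actual thresholds.

```python
from typing import Optional


def classify(
    p_discrepancy: Optional[float],
    es_discrepancy: Optional[float],
    significance_flipped: bool = False,
    design_ambiguous: bool = False,
    warn_at: float = 0.05,   # hypothetical cut-off for "moderate"
    error_at: float = 0.20,  # hypothetical cut-off for "large"
) -> str:
    """Map relative discrepancies to a status label.

    None means the value was not reported, so it could not be checked.
    """
    checked = [d for d in (p_discrepancy, es_discrepancy) if d is not None]
    if not checked:
        return "SKIP"  # statistic found, but nothing to compare against
    worst = max(checked)
    if worst >= error_at:
        return "ERROR"
    if worst >= warn_at or significance_flipped:
        return "WARN"
    if design_ambiguous:
        return "NOTE"
    if es_discrepancy is None:
        return "OK"  # only the p-value could be checked
    return "PASS"


print(classify(None, None))                         # SKIP
print(classify(0.01, None))                         # OK
print(classify(0.01, 0.02))                         # PASS
print(classify(0.01, 0.02, design_ambiguous=True))  # NOTE
print(classify(0.10, 0.02))                         # WARN
print(classify(0.30, 0.02))                         # ERROR
```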

Supported Test Types

t (t-test): Compares the means of two groups to see whether they differ significantly.
F (F-test, ANOVA): Tests whether the means of two or more groups differ significantly.
r (Correlation): Measures the strength and direction of a linear relationship between two variables.
chisq, chi2 (Chi-square test): Tests whether observed counts differ from what you would expect by chance.
z (z-test): Similar to a t-test; used with large samples or when the population variance is known.
U (Mann-Whitney U): Compares two independent groups without assuming the data are normally distributed.
W (Wilcoxon W): Compares paired observations without assuming a normal distribution.
H (Kruskal-Wallis H): Compares three or more groups without assuming a normal distribution.
regression (Regression): Tests whether a predictor variable is significantly related to an outcome.
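Several of these tests are referred to a chi-square distribution: the chi-square test itself and, as a large-sample approximation, Kruskal-Wallis H with k − 1 degrees of freedom for k groups. As a dependency-free sketch, the upper-tail probability has a closed form when df = 1 or df is even. The function name is hypothetical, and a full implementation would more likely call a statistics library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable.

    Closed forms only: df = 1 via the normal distribution, and any even df
    via the Poisson-sum identity. Other df need the incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df > 0 and df % 2 == 0:
        half = x / 2
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j  # builds (x/2)^j / j!
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("odd df > 1 needs the incomplete gamma function")


print(round(chi2_sf(3.841, 1), 3))  # 0.05
print(round(chi2_sf(5.991, 2), 3))  # 0.05
# Kruskal-Wallis H = 7.2 with 3 groups, referred to chi-square with 2 df
print(round(chi2_sf(7.2, 2), 4))    # 0.0273
```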

Effect Size Glossary

d (Cohen's d): How many standard deviations apart two group means are. Small ≈ 0.2, medium ≈ 0.5, large ≈ 0.8.
g (Hedges' g): Like Cohen's d, but corrected for small-sample bias.
dz (Cohen's dz): Effect size for paired (within-subjects) designs: the mean difference divided by the standard deviation of the differences.
dav (Cohen's dav): Repeated-measures effect size that divides the mean difference by the average of the two conditions' standard deviations.
drm (Cohen's drm): Repeated-measures effect size that accounts for the correlation between measurements.
eta2 (Eta-squared): The proportion of total variance explained by the factor. Like R² for ANOVA.
etap2, partial_eta2 (Partial eta-squared): The proportion of variance explained by a factor after excluding variance explained by other factors.
eta (Eta): The square root of eta-squared; a correlation-like measure for ANOVA.
omega2 (Omega-squared): A less biased estimate of variance explained than eta-squared.
cohens_f (Cohen's f): Effect size for ANOVA. Small ≈ 0.10, medium ≈ 0.25, large ≈ 0.40.
r (Correlation): Strength and direction of a linear relationship. Ranges from -1 to +1.
phi (Phi coefficient): Effect size for 2×2 chi-square tests; equivalent to a correlation between two binary variables.
V (Cramér's V): Effect size for chi-square tests on tables larger than 2×2.
rank_biserial_r (Rank-biserial correlation): Effect size for Mann-Whitney U tests. Ranges from -1 to +1.
cliffs_delta (Cliff's delta): The probability that a value from one group exceeds a value from the other, minus the probability of the reverse. Ranges from -1 to +1.
epsilon_squared (Epsilon-squared): Effect size for Kruskal-Wallis tests; the proportion of variance in the ranks explained by group membership.
kendalls_W (Kendall's W): Agreement among raters or rankings. Ranges from 0 (no agreement) to 1 (complete agreement).
beta (Regression coefficient, β): How much the outcome changes per unit change in the predictor.
standardized_beta (Standardized beta): Regression coefficient in standard-deviation units, comparable across predictors.
b (Regression coefficient, b): The unstandardized change in the outcome per unit change in the predictor.
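Several glossary entries can be recovered from a test statistic and its degrees of freedom or sample size. The sketch below collects standard conversions in Python. The function names are illustrative; the Hedges' g correction uses the common approximation J = 1 − 3/(4·df − 1) rather than the exact gamma-function form; and epsilon-squared uses ε² = H/(n − 1). Note that F with its degrees of freedom yields partial eta-squared, not total eta-squared.

```python
import math


def partial_eta2_from_f(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta-squared implied by F(df_effect, df_error)."""
    return (f * df_effect) / (f * df_effect + df_error)


def cohens_f_from_eta2(eta2: float) -> float:
    """Cohen's f from (partial) eta-squared."""
    return math.sqrt(eta2 / (1 - eta2))


def r_from_t(t: float, df: int) -> float:
    """Correlation-type effect size implied by a t statistic."""
    return t / math.sqrt(t ** 2 + df)


def phi_from_chi2(chi2: float, n: int) -> float:
    """Phi coefficient for a 2x2 table with n observations."""
    return math.sqrt(chi2 / n)


def cramers_v(chi2: float, n: int, rows: int, cols: int) -> float:
    """Cramer's V for a rows x cols table with n observations."""
    return math.sqrt(chi2 / (n * (min(rows, cols) - 1)))


def hedges_g_from_d(d: float, n1: int, n2: int) -> float:
    """Hedges' g via the approximate correction J = 1 - 3 / (4 * df - 1)."""
    df = n1 + n2 - 2
    return d * (1 - 3 / (4 * df - 1))


def epsilon2_from_h(h: float, n: int) -> float:
    """Epsilon-squared for Kruskal-Wallis H with n total observations."""
    return h / (n - 1)


eta_p2 = partial_eta2_from_f(4.0, 2, 57)
print(round(eta_p2, 3))                      # 0.123
print(round(cohens_f_from_eta2(eta_p2), 3))  # 0.375
print(round(r_from_t(2.5, 38), 3))           # 0.376
```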

Known Limitations

  • Statistics reported in tables are not detected (table structures are not parsed).
  • Sign errors are not detected (values are compared in absolute terms by design).
  • In some sentences that report several statistics, only the first is captured (about 2% of detections).
  • "eta²" computed from F is actually partial eta²: total eta² requires sums of squares, which are not available from F and its degrees of freedom alone.
  • Sample-size (N) estimation for Welch t-tests is approximate, because the Welch df depends on four unknowns (both group sizes and both standard deviations).
  • t-tests with an ambiguous design may show WARN when it is unclear whether they are paired or independent.
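The Welch limitation follows from the Welch-Satterthwaite formula for degrees of freedom, which depends on both group sizes and both standard deviations. This minimal sketch (the function name is hypothetical) shows that the Welch df matches the pooled df, n1 + n2 − 2, when the groups have equal sizes and equal SDs, but falls well below it when the SDs differ, so reading N off the df (for example, as df + 2) can underestimate the sample size.

```python
def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Welch-Satterthwaite degrees of freedom for two independent groups."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Equal SDs and equal n: Welch df equals the pooled df, n1 + n2 - 2
print(round(welch_df(1.0, 11, 1.0, 11), 6))  # 20.0
# Unequal SDs: 40 participants, yet df is only about 23
print(round(welch_df(1.0, 20, 3.0, 20), 2))  # 23.17
```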


Reporting Bugs

Found an issue? Please report it on GitHub Issues.

Please include the input text or PDF, the result you expected, and what ESCIMate reported.