
Modern sequential inference, applied to election auditing
Horizon Lecture, Australian Statistical Conference 2025
1 December 2025
Planned study: 200 participants
After 40 participants, you decide to do an interim analysis 👁️:
Could also have checked at 40, 80, 120, 160 participants.
High probability of accidentally seeing p < 0.05 at least once.
Planned study: 200 participants
After 200 participants, you do the analysis:
After 20 more: p = 0.06
And another 20 more: p = 0.049 🎯 → publish!
As with peeking, there is a large chance of eventually seeing p < 0.05.
Statistical “abstinence”
Abstinence is brittle…
Can we design statistical methods that
remain valid even if we analyse the data
whenever we like?

Developed by Wald (1945)
Birth of sequential analysis
Data (X_i): 0, 0, 1, 1, 0, 1, \dots
Hypotheses: \begin{cases} H_0\colon p = p_0 \\ H_1\colon p = p_1 \end{cases}

Test statistic: S_t = \frac{L_t(p_1)}{L_t(p_0)} = \frac{\Pr(X_1, \dots, X_t \mid H_1)} {\Pr(X_1, \dots, X_t \mid H_0)} = \frac{p_1^{Y_t} (1 - p_1)^{t - Y_t}} {p_0^{Y_t} (1 - p_0)^{t - Y_t}}
Where: Y_t = \sum_{i=1}^t X_i
Stopping rule: \begin{cases} S_t \leqslant \frac{\beta}{1 - \alpha} & \Rightarrow \text{accept } H_0 \\ S_t \geqslant \frac{1 - \beta}{\alpha} & \Rightarrow \text{accept } H_1 \\ \text{otherwise} & \Rightarrow \text{keep sampling} \end{cases}
Error control:

Parameters:
n = 38 for fixed-n test with same power
SPRT often stops earlier
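To make the recipe concrete, here is a minimal Python sketch of Wald's SPRT for the binary setting above. The particular values of p_0, p_1, \alpha, \beta and the simulated data are illustrative assumptions, not the parameters behind the n = 38 comparison.

```python
import numpy as np

def sprt_bernoulli(xs, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 vs H1: p = p1 on a stream of 0/1 data."""
    lower, upper = beta / (1 - alpha), (1 - beta) / alpha  # Wald's thresholds
    s = 1.0                                                # likelihood ratio S_t, starts at 1
    for t, x in enumerate(xs, start=1):
        # Multiply in the likelihood ratio of the new observation
        s *= (p1 if x else 1 - p1) / (p0 if x else 1 - p0)
        if s <= lower:
            return "accept H0", t, s
        if s >= upper:
            return "accept H1", t, s
    return "keep sampling", len(xs), s

# Illustrative run (p0, p1 and the data-generating p are arbitrary choices)
rng = np.random.default_rng(1)
data = rng.binomial(1, 0.7, size=200)
print(sprt_bernoulli(data, p0=0.5, p1=0.7))
```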
H_0\colon X \sim f_0(\cdot) \quad \text{vs} \quad H_1\colon X \sim f_1(\cdot)
Test statistic: S_t = \frac{f_1(X_1, X_2, \dots, X_t)} {f_0(X_1, X_2, \dots, X_t)}
Test statistic, with iid observations: S_t = \prod_{i=1}^t \frac{f_1(X_i)}{f_0(X_i)} = S_{t - 1} \times \frac{f_1(X_t)}{f_0(X_t)}


Martingale: \mathbb{E}(S_t \mid S_0, S_1, \dots, S_{t-1}) = S_{t-1}, i.e. it “stays the same” on average.

Supermartingale: \mathbb{E}(S_t \mid S_0, S_1, \dots, S_{t-1}) \leqslant S_{t-1}, i.e. it “stays the same or reduces in value” on average.
Let S_0, S_1, S_2, \dots be a test supermartingale (TSM):
a supermartingale under H_0, with non-negative terms (S_i \geqslant 0) and starting value S_0 = 1.
Ville's inequality: for any \alpha > 0, \Pr_{H_0}\left(\max_t S_t \geqslant \frac{1}{\alpha}\right) \leqslant \alpha.
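A quick simulation sketch of Ville's inequality: generate data under H_0, track the likelihood-ratio TSM from the previous slides, and check how often it ever reaches 1/\alpha. The parameter values and number of simulations are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, p0, p1, T, n_sims = 0.05, 0.5, 0.7, 500, 2000

crossed = 0
for _ in range(n_sims):
    x = rng.binomial(1, p0, size=T)            # data generated under H0
    lr = np.where(x == 1, p1 / p0, (1 - p1) / (1 - p0))
    s = np.cumprod(lr)                         # test (super)martingale S_1, ..., S_T
    crossed += s.max() >= 1 / alpha            # did it ever reach 1/alpha?

print(f"Empirical crossing rate: {crossed / n_sims:.3f}  (Ville's bound: {alpha})")
```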


H_0\colon \theta = \theta_0 \quad \text{vs} \quad H_1\colon \theta \in \mathcal{A}
Plug-in strategy: S_t = S_{t-1} \times \frac{f(X_t \mid \hat\theta_t)}{f(X_t \mid \theta_0)}, \quad \text{where } \hat\theta_t = a(X_1, X_2, \dots, X_{t-1}) \in \mathcal{A}.
H_0\colon \theta = \theta_0 \quad \text{vs} \quad H_1\colon \theta \in \mathcal{A}
Mixture strategy: S_t = S_{t-1} \times \frac{\int_{\mathcal{A}} f(X_t \mid \theta) \, g_t(\theta) \, d\theta}{f(X_t \mid \theta_0)}, \quad \text{where } g_t(\theta) \text{ is a distribution over } \mathcal{A} \text{ that can depend on } X_1, X_2, \dots, X_{t-1}.
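A sketch of the mixture strategy for the Bernoulli case. It assumes a discretised alternative set and a uniform starting mixture that is updated to the (discretised) posterior after each observation, which is one allowed choice of a data-dependent g_t; the grid and prior here are illustrative, not prescribed by the slide.

```python
import numpy as np

def mixture_tsm(xs, theta0=0.5, grid=None):
    """Mixture-strategy TSM for Bernoulli data.

    H0: theta = theta0 vs H1: theta in (0.5, 1]. g_t is maintained as the
    discretised posterior over a grid of alternative values."""
    if grid is None:
        grid = np.linspace(0.51, 0.99, 49)    # discretised alternative set A
    g = np.full(len(grid), 1 / len(grid))     # g_1: uniform over A
    s, path = 1.0, []
    for x in xs:
        lik = grid if x else 1 - grid         # f(x | theta) for each grid point
        num = np.dot(g, lik)                  # integral of f(x | theta) g_t(theta)
        den = theta0 if x else 1 - theta0     # f(x | theta0)
        s *= num / den
        g = g * lik / num                     # posterior update gives g_{t+1}
        path.append(s)
    return np.array(path)

rng = np.random.default_rng(2)
print(mixture_tsm(rng.binomial(1, 0.7, size=100))[-1])
```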
SAVI: “safe, anytime-valid inference”
Anytime-valid CIs, p-values and test decisions.
Review paper: Ramdas et al. (2023)
Australian Research Council:
More automation \rightarrow less scrutiny
Goal:
Gold standard:
Alternative: statistics! 📊
Election: Albo vs Dutton 🤼
p = \text{Proportion of votes for Albo}
\begin{cases} H_0\colon p \leqslant 0.5, & \text{Albo did not win} \\ H_1\colon p > 0.5, & \text{Albo won} \end{cases}
SPRT with a plug-in estimator of p:
S_t = S_{t-1} \times \frac{\Pr(X_t \mid \hat{p}_t)}{\Pr(X_t \mid p = 0.5)}, \quad \text{where } \hat{p}_t = h(X_1, \dots, X_{t-1}) \in (0.5, 1].
“ALPHA supermartingale” (Stark 2023)
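A rough sketch of an ALPHA-style plug-in TSM for this two-candidate setting. The shrunk-and-truncated running mean used for \hat{p}_t is only an illustrative estimator (not Stark's tuning), and the ballots are treated as an iid sample for simplicity.

```python
import numpy as np

def alpha_like_tsm(xs, eps=0.01):
    """Plug-in (ALPHA-style) TSM for H0: p <= 0.5, given binary votes xs.

    p_hat_t is a shrunk running mean of the first t-1 votes, truncated
    into (0.5 + eps, 1]; this particular estimator is only an illustration."""
    s, total, path = 1.0, 0, []
    for t, x in enumerate(xs, start=1):
        # Estimate p from past votes only, shrink towards 0.5, then truncate
        p_hat = (total + 0.5) / t
        p_hat = min(max(p_hat, 0.5 + eps), 1.0)
        s *= (p_hat if x else 1 - p_hat) / 0.5   # Pr(x | p_hat) / Pr(x | p = 0.5)
        path.append(s)
        total += x
    return np.array(path)

rng = np.random.default_rng(3)
votes = rng.binomial(1, 0.55, size=5000)         # iid stand-in for sampled ballots
s_path = alpha_like_tsm(votes)
print("Certify (reject H0) at alpha = 0.05:", (s_path >= 20).any())
```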
Rank the candidates in order of preference.
Examples:
(1) Lynne, (2) Ben, (3) Ian
(1) Ben, (2) Lynne, (3) Ian
(1) Ian, (2) Lynne
(1) Ben
Iterative algorithm: repeatedly eliminate the candidate with the fewest first-preference votes among those still standing, transferring each of their ballots to its next remaining preference, until only one candidate remains.
This produces an elimination order.
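A sketch of that elimination procedure in code. The ballot profile is hypothetical (the example rankings above plus one extra ballot), and ties are broken arbitrarily.

```python
from collections import Counter

def irv_elimination_order(ballots, candidates):
    """Run IRV: repeatedly eliminate the candidate with the fewest first
    preferences among those still standing; return the elimination order
    (last entry is the winner). Ties are broken arbitrarily here."""
    standing = set(candidates)
    order = []
    while len(standing) > 1:
        tallies = Counter({c: 0 for c in standing})
        for ballot in ballots:
            # A ballot counts for its highest-ranked candidate still standing
            for c in ballot:
                if c in standing:
                    tallies[c] += 1
                    break
        loser = min(standing, key=lambda c: tallies[c])
        order.append(loser)
        standing.remove(loser)
    order.extend(standing)   # the remaining candidate is the winner
    return order

# The first four ballots echo the slide's examples; the fifth is hypothetical
ballots = [("Lynne", "Ben", "Ian"), ("Ben", "Lynne", "Ian"),
           ("Ian", "Lynne"), ("Ben",), ("Lynne", "Ian")]
print(irv_elimination_order(ballots, ["Lynne", "Ben", "Ian"]))
```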
Example:
H_0 includes all elimination orders with
alternative winners (“alt-orders”).
k candidates \Rightarrow \sim k! ballot types
k candidates \Rightarrow \sim k! alt-orders
Challenges:
Example:
“Lynne has more votes than Ben assuming that only
candidates in the set \mathcal{S} remain standing.”
Equivalent to a two-candidate election.
Can test using ALPHA.
Unions: H = H^1 \cup H^2 \cup \dots \cup H^m. \qquad \text{Reject every } H^j \Rightarrow \text{reject } H.
Intersections: H = H^1 \cap H^2 \cap \dots \cap H^m. \qquad \text{Reject at least one } H^j \Rightarrow \text{reject } H.
Base TSM for H^j:
S_{j,t} = \prod_{i = 1}^t s_{j, i}
Intersection TSM for H:
S_t = \prod_{i = 1}^t \frac{\sum_j w_{j, i} \, s_{j, i}}{\sum_j w_{j, i}}
The weights w_{j, i} can depend on data up until time i - 1.
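A sketch of the intersection TSM in code, combining base-TSM increments s_{j,i} with predictable weights. Weighting each H^j by the running value of its own base TSM is just one simple choice used here for illustration; the actual choice of weights is discussed next.

```python
import numpy as np

def intersection_tsm(increments):
    """Combine base-TSM increments s_{j,i} (rows = hypotheses H^j, columns = time i)
    into an intersection TSM. The weights w_{j,i} depend only on data up to time
    i-1: here, the running product of past increments of H^j (weight follows
    past performance), which is one simple predictable choice."""
    m, T = increments.shape
    running = np.ones(m)          # S_{j, i-1} for each base TSM
    s, path = 1.0, []
    for i in range(T):
        w = running               # predictable: uses data up to time i-1 only
        s_i = increments[:, i]
        s *= np.dot(w, s_i) / w.sum()
        path.append(s)
        running = running * s_i   # update base TSMs with the new increments
    return np.array(path)

# Toy example: 3 base TSMs with Bernoulli likelihood-ratio increments (illustrative)
rng = np.random.default_rng(4)
x = rng.binomial(1, 0.6, size=(3, 200))
inc = np.where(x == 1, 0.6 / 0.5, 0.4 / 0.5)
print(intersection_tsm(inc)[-1])
```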
How to choose the weights w_{j, i}?
k! alt-orders
\rightarrow 💥 explosion of decompositions & intersections
Use combinatorial optimisation methods.
\rightarrow see tomorrow’s talk 🎤
(Tue 11:30am, Experimental Design / Theory)
Many open problems!
Let’s modernise our statistical toolbox.
We can peek, we can adapt, and we can stay valid.
Time for safe stats 🤍, not statistical abstinence.
Ek, Stark, Stuckey, Vukcevic (2023). Adaptively Weighted Audits of Instant-Runoff Voting Elections: AWAIRE. In: Electronic Voting. E-Vote-ID 2023. Lecture Notes in Computer Science 14230:35–51.
Ramdas, Grünwald, Vovk, Shafer (2023). Game-Theoretic Statistics and Safe Anytime-Valid Inference. Statist. Sci. 38(4):576–601.
Stark (2023). ALPHA: Audit that learns from previously hand-audited ballots. Ann. Appl. Stat. 17(1):641–679.
Wald (1945). Sequential tests of statistical hypotheses. Ann. Math. Stat. 16(2):117–186.
The coin flipping graphic was designed by Wannapik.
The following photographs were provided by the Australian Electoral Commission and licensed under CC BY 3.0:
The Higgins ballot paper (available from The Conversation) was originally provided by Hshook/Wikimedia Commons and is licensed under CC BY-SA 4.0.
The photograph of Abraham Wald is freely available from Wikipedia.
The photograph of the 18th century print is freely available from the Digital Commonwealth; the original sits at the Boston Public Library.
The “safe stats / abstinence” joke was borrowed from Hadley Wickham.


“Martingales”: 🎲 gambling strategies from 18th century France ♣️
These strategies cannot “beat the house”
\Rightarrow no guaranteed profit
\Rightarrow at best, a “fair” game
Martingales (in mathematics) represent the 💰 wealth over time of a gambler playing a fair game.
Decision rule:
Ville’s inequality \rightarrow anytime-valid type I error control
Growth rate under H_1 \rightarrow statistical efficiency (“power”)
At time t: \mathbb{E}_{H_0} \left(\frac{f_1(X_t)}{f_0(X_t)}\right) = \int \frac{f_1(x)}{f_0(x)} f_0(x) \, dx = \int f_1(x) \, dx = 1
Martingales generalise likelihood ratios.
P_t = \frac{1}{E_t}
E_t > \frac{1}{\alpha} \quad \Longleftrightarrow \quad P_t < \alpha
Fixed-sample hypothesis test \longleftrightarrow confidence interval (CI)
Sequential hypothesis test \longleftrightarrow confidence sequence (CS)


Reanalysing the data with different test statistics (“betting strategies”) and only reporting the best one.
This is not safe stats.
Fundamental principle:
Declare your bet before you play.
Pre-registration is still an important tool.
However, anytime-valid methods retain flexibility even after pre-specification.
Xerox bug 🐛:


Source: https://www.dkriesel.com/
Possible decisions:
Consequences:
| | Certify | Recount |
|---|---|---|
| Reported result is wrong | ❌ | ✅ |
| Reported result is correct | ✅ | ❌ |
| | Reject H_0 (certify) | Do not reject H_0 (recount) |
|---|---|---|
| H_0\colon Reported result is wrong | Type 1 error ❌ | ✅ |
| H_1\colon Reported result is correct | ✅ | Type 2 error ❌ |
Type 1 error = Wrong democratic outcome
Type 2 error = Waste of resources (but no change of outcome)
Instant-runoff voting (IRV)
Single transferable vote (STV)
How to choose the weights w_{j, i}?
Ideas:
Compared about 30 schemes. “Largest” was often the best.