Hare paired-turn eval (#969)

Date: 2026-06-09

Runner: one-off Hare extraction comparison using the sampled contextual-reply fixtures from imperfect-co/hare#131. The run used the sibling Hare repo with `MODEL_PROVIDER=bedrock`.

Purpose: compare two inputs to extraction. The baseline is the user-only source text; the treatment is the same source text plus the paired assistant context from the dossier event payload. Assistant text is context only; the user reply remains the authoritative evidence.
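
The two arms can be sketched roughly as follows. This is an illustrative sketch only, not Hare's actual code: `ExtractionInput`, `build_baseline`, `build_treatment`, `render_source`, and the `assistant_text` payload key are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractionInput:
    """Hypothetical container for what the extractor sees for one turn."""
    user_text: str                    # authoritative evidence in both arms
    assistant_context: Optional[str]  # context only; None in the baseline arm


def build_baseline(user_text: str) -> ExtractionInput:
    # Baseline: the user reply on its own.
    return ExtractionInput(user_text=user_text, assistant_context=None)


def build_treatment(user_text: str, event_payload: dict) -> ExtractionInput:
    # Treatment: same user reply, plus the paired assistant turn.
    # "assistant_text" is an assumed key, not a confirmed payload field.
    return ExtractionInput(
        user_text=user_text,
        assistant_context=event_payload.get("assistant_text"),
    )


def render_source(inp: ExtractionInput) -> str:
    # Label the assistant turn so it cannot be mistaken for evidence.
    parts = []
    if inp.assistant_context:
        parts.append("Assistant (context only): " + inp.assistant_context)
    parts.append("User (authoritative): " + inp.user_text)
    return "\n".join(parts)
```

Keeping the user text identical across both arms means any change in the extraction outcome can be attributed to the added context rather than to a different reading of the reply.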

| Case | Baseline | Treatment | Result |
| --- | --- | --- | --- |
| `still_injured_no` | `skip/not_durable` | `update/injury_history` | Paired context resolves the negation target. |
| `still_hurts_override` | `skip/not_durable` | `update/injury_history` | Paired context resolves the affirmative injury target. |
| `unrelated_no_control` | `skip/not_durable` | `skip/not_durable` | Control still abstains. |
| `multi_question_no_control` | `skip/not_durable` | `skip/not_durable` | Ambiguous multi-question control still abstains. |
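
The pass criteria implied by the table (non-control cases change, controls do not) can be checked mechanically. The snippet below is a hedged sketch, not part of the run: the outcomes are copied from the table above, and `changed_cases` and `controls_hold` are hypothetical helpers that treat a `_control` suffix as marking a control case.

```python
# (baseline, treatment) outcomes per case, copied from the table above.
RESULTS = {
    "still_injured_no": ("skip/not_durable", "update/injury_history"),
    "still_hurts_override": ("skip/not_durable", "update/injury_history"),
    "unrelated_no_control": ("skip/not_durable", "skip/not_durable"),
    "multi_question_no_control": ("skip/not_durable", "skip/not_durable"),
}


def changed_cases(results: dict) -> list:
    # Cases whose outcome differs between the two arms.
    return sorted(
        case for case, (baseline, treatment) in results.items()
        if baseline != treatment
    )


def controls_hold(results: dict) -> bool:
    # Assumption: control cases are the ones named with a "_control" suffix.
    return all(
        baseline == treatment
        for case, (baseline, treatment) in results.items()
        if case.endswith("_control")
    )


print(changed_cases(RESULTS))
print(controls_hold(RESULTS))
```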

No production replay was executed.