Haha, love the “SPAM gods” nod. I’ve lost enough goats to readout bias to appreciate the ritual. Your strawman sounds solid for a NISQ sanity check; the twirling + brickwork is a smart way to normalize the chaos across platforms without reinventing the wheel.
On witnesses: I’d lean toward shadow-based log-negativity lower bounds. They’re pretty forgiving under SPAM noise if you twirl the measurements too, and papers from Huang et al. (2020) show they hold up on 50+ qubits with 10k shadows per bipartition for reasonably tight CIs. Entanglement depth witnesses are flashier but can hallucinate on crosstalk; stabilizer witnesses feel too brittle for arbitrary topologies.
Sample complexity reality: for n=100 and a 0.2 ebit lower bound at 95% CI (ε = 0.05), you’re looking at 5-20k shots per random Clifford setting, times O(n) settings for median-cut coverage. Totally doable on a Friday without melting servers, per recent shadows benchmarks on IBM rigs. Bias correction? Subtract a classical shadow from null runs (no entanglers) to eat readout skew without exploding variance.
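Back-of-envelope version of that budget, as a rough sketch: it just multiplies shots per setting by the number of settings. The constant hiding inside O(n) is an assumption here (set to 1 setting per qubit), and `shot_budget` is a made-up helper name, so scale accordingly.

```python
def shot_budget(n_qubits, shots_per_setting, settings_per_qubit=1):
    """Total shots = (number of settings) x (shots per setting).

    settings_per_qubit stands in for the unknown O(n) constant;
    1 is an assumption, not a measured value.
    """
    n_settings = settings_per_qubit * n_qubits
    return n_settings * shots_per_setting


low = shot_budget(100, 5_000)    # optimistic end of the 5-20k range
high = shot_budget(100, 20_000)  # pessimistic end
print(f"{low:,} to {high:,} shots total")
```

With the constant at 1 that lands between half a million and two million shots for n=100; a bigger constant scales it linearly.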
Noise mapping: yeah, for depolarizing + damping on a CZ, the expected log-neg ~ -log(1 - 2p/3 + λ terms), where p is the Pauli error and λ comes from T1; check Elben’s 2022 shadows review for the curve. Saves you simulating the whole circus.
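If you do want a quick numerical cross-check on the trend, here is a minimal NumPy sketch under a much simpler assumption than the CZ + damping model above: a Bell pair mixed with white noise, rho = (1-p)|Φ+><Φ+| + p I/4. It does not reproduce the closed form quoted above (no λ/T1 terms at all); it only shows how log-negativity falls off with a depolarizing-style p. The function names are illustrative.

```python
import numpy as np


def log_negativity(rho, dims=(2, 2)):
    """log2 of the trace norm of the partial transpose over subsystem B."""
    d_a, d_b = dims
    # Indices: (a, b, a', b'); swapping b <-> b' transposes subsystem B.
    r = rho.reshape(d_a, d_b, d_a, d_b)
    rho_pt = r.transpose(0, 3, 2, 1).reshape(d_a * d_b, d_a * d_b)
    eigs = np.linalg.eigvalsh(rho_pt)  # partial transpose stays Hermitian
    return float(np.log2(np.abs(eigs).sum()))


def noisy_bell(p):
    """Bell state mixed with white noise of weight p (assumed toy model)."""
    phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
    bell = np.outer(phi, phi)
    return (1 - p) * bell + p * np.eye(4) / 4


for p in (0.0, 0.05, 0.1, 0.3):
    print(p, round(log_negativity(noisy_bell(p)), 4))
```

For this toy model the partial-transpose spectrum gives log2(2 - 3p/2) for p below 2/3 and zero beyond that, so the numerics should track that curve.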
Mid-circuit resets? They certify “parkable” entanglement nicely for QAOA-ish apps, but watch for reset fidelity gaslighting your bounds. Interleave calibration runs or you’ll blame decoherence for your own SPAM sins.
“ebits per gate-second” isn’t terrible, but normalize by connectivity degree (ebits / (gates * sec * avg degree)) to level the field between ion and SC haikus. ZNE on shadows? It biases toward optimism, so stick to unmitigated data for rigorous lower bounds, or use it post-hoc for “optimistic budget” footnotes.
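The degree-normalized metric is simple enough to pin down in a few lines. This is only a sketch of the formula as written above; the function name and the example numbers are made up for illustration, not taken from any real device.

```python
def normalized_ebit_rate(ebits, n_gates, seconds, avg_degree):
    """ebits / (gates * sec * avg degree), per the proposal above."""
    if min(n_gates, seconds, avg_degree) <= 0:
        raise ValueError("gates, seconds and avg_degree must be positive")
    return ebits / (n_gates * seconds * avg_degree)


# Hypothetical numbers: 2 certified ebits, 10 entangling gates,
# 0.5 s of wall-clock time, average connectivity degree of 4.
print(normalized_ebit_rate(2.0, 10, 0.5, 4))
```

Feed it the certified lower bound rather than the point estimate if you want the number to stay conservative.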
No 20-line recipe yet (sorry, no day-ruiner), but Cirq’s shadows module + your layout gets you 80% there. What’s your target hardware: supercon or neutral atoms? Might tweak the entangler choice.