I want the simplest possible “are we entangled yet?” test that doesn’t require full tomography, a PhD in loophole-free Bell experiments, or sacrificing a small goat to the SPAM gods. I’m after a push-button, NISQ-friendly protocol that returns a single number with an honest interpretation: “you’ve got at least X ebits across Y of the chip,” with error bars that aren’t aspirational fiction.
Constraints, because of course there are constraints:
- Works on arbitrary topologies with limited connectivity and native two-qubit gates that refuse to commute politely.
- Depth budget tiny enough to finish before T1, T2, and my patience all decohere.
- SPAM-robust(ish): tolerant of biased readout and mild calibration lies without turning “entangled” into “we measured Z and hoped.”
- Scales to, say, 50-200 qubits without data collection that melts a data center. Sample complexity around poly(n)·polylog(1/ε) would be dreamy.
- Produces a lower bound on something meaningful: log-negativity, entanglement of formation, entanglement depth, or number of distillable Bell pairs per layer.
- Comparable across platforms: one metric that doesn’t require translating from “ion trap poetry” to “superconducting haiku.”
What I think might work (aka “Hello, Entanglement” benchmark v0.1; a minimal circuit sketch follows this list):
1) Random local Cliffords on all qubits (twirling to spread errors and kill basis bias).
2) One brickwork layer of native entanglers across available edges (optionally two layers, alternating orientation).
3) Measure in Z. Repeat with fresh random local Cliffords and the same entangler layout.
4) Use classical shadows or randomized-measurement estimators to:
- Estimate 2-Rényi entropies on random bipartitions.
- Lower-bound log-negativity on a subset of cuts via shadow-based witnesses (e.g., partial-transpose moment conditions).
- Extract an “entanglement depth” witness for contiguous or graph-induced blocks.
5) Report:
- Minimum guaranteed ebits across the median cut (lower bound).
- Entanglement per two-qubit gate-second: ebits / (gate count × gate duration), with confidence intervals.
- An “entangling power” vs “effective noise” fit by comparing to a noisy two-qubit channel model learned from interleaved RB or gate-set tomography-lite.
6) Optional: insert mid-circuit resets on a checkerboard to show you can create, park, and recycle entanglement without it evaporating into crosstalk folklore.
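To make steps 1-3 concrete, here's a minimal sketch in Qiskit. This is my strawman, not anyone's packaged API: the function name and the edge-layer convention are mine, `cz` stands in for whatever native entangler your hardware actually speaks, and in a real run you'd record the sampled measurement-frame Cliffords for shadow postprocessing.

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import random_clifford

def hello_entanglement_circuit(n_qubits, edge_layers, rng=None):
    """One randomized instance: local Clifford twirl, brickwork
    entanglers, random local measurement frame, Z readout."""
    rng = rng or np.random.default_rng()
    qc = QuantumCircuit(n_qubits)
    for edges in edge_layers:
        # step 1: fresh random single-qubit Clifford on every qubit
        for q in range(n_qubits):
            cliff = random_clifford(1, seed=int(rng.integers(2**31)))
            qc.append(cliff.to_instruction(), [q])
        # step 2: one brickwork layer; CZ stands in for the native gate
        for a, b in edges:
            qc.cz(a, b)
    # step 3: random local measurement frame (record these Cliffords
    # if you want classical-shadow postprocessing), then measure in Z
    for q in range(n_qubits):
        cliff = random_clifford(1, seed=int(rng.integers(2**31)))
        qc.append(cliff.to_instruction(), [q])
    qc.measure_all()
    return qc

# linear chain, two brickwork layers with alternating orientation
n = 8
layers = [[(i, i + 1) for i in range(0, n - 1, 2)],
          [(i, i + 1) for i in range(1, n - 1, 2)]]
qc = hello_entanglement_circuit(n, layers)
```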
Why I think this is plausible:
- Randomized measurements and shadows can estimate purities and certain entanglement witnesses with sample complexity polynomial in n for fixed-size subsystems (though exponential in the subsystem size itself, so keep the cuts modest); several groups have used this up to dozens of qubits without summoning tomography demons. An estimator sketch follows this list.
- Lower bounds avoid the “we simulated half the Hilbert space” problem and still say something nontrivial when noise is non-zero.
- Twirling plus brickwork gives apples-to-apples across hardware with different native gates.
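For step 4's first bullet, the Brydges-style cross-correlation estimator for Tr[ρ_A²] is short enough to write down. A sketch under the assumption that the measurement frame is single-qubit Haar or Clifford (a local 2-design, which the protocol above provides); the `settings_counts` format and indexing conventions are mine:

```python
import numpy as np

def renyi2_purity(settings_counts, region):
    """Estimate Tr[rho_A^2] from randomized-measurement counts
    (Brydges et al.-style cross-correlations).

    settings_counts: list (one entry per random local-Clifford setting)
        of dicts {bitstring: count}, with s[i] = qubit i's outcome.
    region: qubit indices defining subsystem A.
    """
    k = len(region)
    per_setting = []
    for counts in settings_counts:
        # marginalize the full-register counts onto region A
        marg = {}
        for s, c in counts.items():
            key = tuple(int(s[i]) for i in region)
            marg[key] = marg.get(key, 0) + c
        M = sum(marg.values())
        # Tr[rho_A^2] = 2^k * sum_{s,t} (-2)^(-Hamming(s,t)) E[P(s)P(t)],
        # with P(s)P(t) estimated without bias via distinct-shot pairs
        est = 0.0
        items = list(marg.items())
        for s, ns in items:
            for t, nt in items:
                pairs = ns * (ns - 1) if s == t else ns * nt
                d = sum(a != b for a, b in zip(s, t))
                est += (-2.0) ** (-d) * pairs
        per_setting.append((2 ** k) * est / (M * (M - 1)))
    return float(np.mean(per_setting))
```

Then S2(A) = -log2 of the returned purity. The double loop over distinct outcomes is cheap for modest regions, but the statistical cost grows exponentially with |A|, which is exactly why I only care about modest cuts.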
Actual questions for the hive mind:
- Best-in-class witness choice: If I want a single scalar with a transparent meaning, which do you trust most in practice? Shadow-based log-negativity lower bounds, entanglement depth witnesses, or something stabilizer-ish that behaves nicely under SPAM?
- SPAM-robustness: With biased readout and mild crosstalk, which estimators degrade gracefully instead of impersonating entanglement? Any favorite bias-correction tricks that don’t double my sample complexity? (A toy confusion-matrix inversion sketch sits after the gotchas list below.)
- Sample complexity reality check: For n≈100, how many random settings and shots do we need for a 95% CI on, say, a 0.2-0.5 ebit lower bound across typical bipartitions?
- Noise models: Is there a practical closed-form mapping from a two-qubit entangler with depolarizing + amplitude damping noise to expected CHSH value or log-negativity? I’d love a calibration curve to convert hardware error rates into an entanglement budget before I run anything. (A toy depolarizing calibration script follows this list.)
- Mid-circuit measurement/reset: Does sprinkling resets help certify “useful” entanglement, or does it just create new SPAM pathways that gaslight the witness?
- Cross-platform comparability: Is “ebits per two-qubit gate-second” a terrible idea? If so, what’s a better hardware-agnostic normalization that correlates with algorithmic performance (e.g., QAOA depth that still beats the best classical baseline)?
- Error mitigation: If I zero-noise-extrapolate the witness, do I break its interpretability as a lower bound? Any mitigation that preserves a rigorous bound rather than a pretty number?
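Partially answering my own noise-model question, for the depolarizing part there is a closed form: a Bell pair through two-qubit depolarizing noise is a Werner state ρ = p|Φ+⟩⟨Φ+| + (1-p)I/4, with E_N = log2((1+3p)/2) for p > 1/3, optimal CHSH value 2√2·p, and the p3-PPT moment condition (PPT states obey p2² ≤ p1·p3 with pk = Tr[(ρ^{T_B})^k]) going tight exactly at the p = 1/3 separability threshold. Amplitude damping breaks the Werner symmetry, so that part still needs numerics (conjugate by Kraus operators and rerun the same functions). Toy script, pure numpy:

```python
import numpy as np

def werner(p):
    """Bell pair after two-qubit depolarizing noise:
    rho = p |Phi+><Phi+| + (1 - p) I/4."""
    phi = np.zeros(4)
    phi[0] = phi[3] = 1 / np.sqrt(2)
    return p * np.outer(phi, phi) + (1 - p) * np.eye(4) / 4

def partial_transpose(rho):
    """Partial transpose on the second qubit of a two-qubit state."""
    r = rho.reshape(2, 2, 2, 2)          # indices (a, b, a', b')
    return r.transpose(0, 3, 2, 1).reshape(4, 4)

def log_negativity(rho):
    ev = np.linalg.eigvalsh(partial_transpose(rho))
    return np.log2(np.sum(np.abs(ev)))   # log2 ||rho^T_B||_1

def chsh(p):
    """Optimal CHSH value for the Werner state."""
    return 2 * np.sqrt(2) * p

def p3_ppt_violation(rho):
    """p2^2 - p3 for pk = Tr[(rho^T_B)^k]; > 0 certifies NPT
    entanglement (PPT states satisfy p2^2 <= p1 * p3, p1 = 1)."""
    pt = partial_transpose(rho)
    p2 = np.trace(pt @ pt).real
    p3 = np.trace(pt @ pt @ pt).real
    return p2**2 - p3

for p in (0.2, 1 / 3, 0.5, 0.8, 1.0):
    rho = werner(p)
    print(f"p={p:.2f}  E_N={log_negativity(rho):.3f}  "
          f"CHSH={chsh(p):.3f}  p3-violation={p3_ppt_violation(rho):+.3f}")
```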
Known gotchas I’m trying to dodge:
- Tomography scaling. Hard pass.
- Device-independent Bell tests. Love them philosophically, can’t afford the statistical and loophole overhead on NISQ Tuesday.
- “Entanglement” that’s actually classical crosstalk and readout correlations cosplaying as quantum magic.
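Re: the bias-correction question above, the cheapest trick I know is per-qubit confusion-matrix inversion applied only to small-region marginals, so the cost is 2^|region| rather than 2^n. Sketch below with my own conventions (counts keyed by bitstrings with s[i] = qubit i; mind your stack's endianness); whether the corrected estimate still counts as a rigorous lower bound is exactly what I'm asking.

```python
import numpy as np

def confusion_matrices(p0_given_prep0, p1_given_prep1):
    """Per-qubit 2x2 readout confusion matrices from calibration runs;
    columns index the prepared state, rows the observed outcome."""
    return [np.array([[f0, 1 - f1],
                      [1 - f0, f1]])
            for f0, f1 in zip(p0_given_prep0, p1_given_prep1)]

def unbias_marginal(counts, region, mats):
    """Invert the tensor-product readout model on one small region's
    marginal distribution (only 2^|region| entries to correct)."""
    k = len(region)
    p = np.zeros(2 ** k)
    total = sum(counts.values())
    for s, c in counts.items():
        idx = int("".join(s[i] for i in region), 2)
        p[idx] += c / total
    A = mats[region[0]]
    for q in region[1:]:
        A = np.kron(A, mats[q])
    # result may have small negative entries from sampling noise;
    # clip or project back onto the simplex before feeding estimators
    return np.linalg.solve(A, p)
```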
If someone already packaged this as a 20-line recipe (Qiskit/Cirq/PennyLane/handwaving), please ruin my day with a link. Otherwise, critique the strawman, propose better witnesses/estimators, or tell me why this benchmark will absolutely self-destruct on real hardware so I can fail faster and with more dignity.