Zero-Knowledge Proofs for Clinical Research

Abstract

Multi-institutional health research, the kind of large-scale, collaborative science needed to advance precision medicine, is fundamentally bottlenecked by data sharing. Patient records are sensitive, regulatory frameworks are strict, and institutions are understandably reluctant to share raw clinical data, even for legitimate research purposes. The result is that many scientifically important questions go unanswered not because the data doesn’t exist, but because it can’t be moved.

Zero-knowledge proofs offer an elegant solution to this problem. A zero-knowledge proof allows one party to prove to another that a computation was performed correctly (that inclusion/exclusion criteria were applied, that a regression was run, that a statistical test yielded a specific result) without revealing any of the underlying data. The verifier learns that the statement is true, but learns nothing else. Applied to clinical research, this means that institutions could verify each other’s analyses without ever exchanging patient records.
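To make the prover/verifier exchange concrete, the sketch below runs one round of the textbook Schnorr identification protocol, in which a prover shows knowledge of a secret exponent x behind a public value y = g^x mod p without disclosing x. It is an illustration only and is not the construction used in the systems described here; the group parameters are deliberately tiny and insecure, and the function names (commit, respond, check) are hypothetical.

```python
import secrets

# Toy group (assumption, insecure sizes): G = 2 has prime order Q = 11 modulo P = 23.
P, Q, G = 23, 11, 2

def commit():
    """Prover: pick a random nonce r and send the commitment t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)

def respond(r, c, x):
    """Prover: answer the verifier's challenge c using the secret x."""
    return (r + c * x) % Q

def check(y, t, c, s):
    """Verifier: accept if and only if G^s == t * y^c (mod P)."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P

# One protocol round with a hypothetical secret.
x = 7                      # prover's secret, never sent
y = pow(G, x, P)           # public value known to the verifier
r, t = commit()            # prover -> verifier: t
c = secrets.randbelow(Q)   # verifier -> prover: random challenge
s = respond(r, c, x)       # prover -> verifier: s
accepted = check(y, t, c, s)
```

The verifier sees only (t, c, s). For an honest verifier, such transcripts can be simulated without knowing x, which is the sense in which the exchange reveals nothing beyond the truth of the statement.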

In collaboration with researchers in security and cryptography, I am co-designing zero-knowledge proof systems tailored specifically to clinical research workflows. The first system, CoSMeTIC (Computational Sparse Merkle Trees with Inclusion-Exclusion proofs), enables verifiable computation of cohort selection criteria on sensitive datasets. The second extends this framework to membership verification in linear regression analysis: proving that specific data points were included in a regression without revealing the data itself.
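As background for the Merkle-tree component, the following is a minimal sketch of a plain (dense) binary Merkle tree with an inclusion proof. It is not CoSMeTIC itself: it omits sparsity, exclusion proofs, and any zero-knowledge layer, and the record values and function names are hypothetical. It shows only how a short path of sibling hashes lets a verifier confirm that a committed record belongs to a dataset summarized by a single root hash.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return every tree level, from hashed leaves up to the root."""
    level = [h(b"\x00" + leaf) for leaf in leaves]  # 0x00 prefix marks leaf nodes
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pad odd levels by duplicating the last node
            levels[-1] = level
        level = [h(b"\x01" + level[i] + level[i + 1])  # 0x01 prefix marks inner nodes
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Collect (sibling_hash, node_is_left) pairs from the leaf up to the root."""
    path = []
    for level in levels[:-1]:
        path.append((level[index ^ 1], index % 2 == 0))
        index //= 2
    return path

def verify(root, leaf, path):
    """Recompute the root from a leaf and its sibling path."""
    node = h(b"\x00" + leaf)
    for sibling, node_is_left in path:
        pair = node + sibling if node_is_left else sibling + node
        node = h(b"\x01" + pair)
    return node == root

# Hypothetical record identifiers standing in for committed patient records.
records = [b"record-A", b"record-B", b"record-C"]
levels = build_levels(records)
root = levels[-1][0]
proof = prove(levels, 2)
included = verify(root, b"record-C", proof)
```

The proof contains only hashes, so its length grows logarithmically with the number of records, and the other records themselves are never transmitted.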

CoSMeTIC is under review at ACM CCS, and the regression extension targets the USENIX Security Symposium. This line of work addresses a growing and increasingly urgent need: as multi-institutional research consortia become the norm in precision medicine, the ability to conduct verifiable, privacy-preserving computation on clinical data will be essential infrastructure.