by Martin Tappler — 1 April 2026
Topics: Interview
Ahead of her invited talk at ETAPS 2026, we spoke with Monika Henzinger about how she entered the field of differential privacy, the challenges it presents, and the solutions it offers.
Could you start by introducing yourself and your research area?
I am a professor at the Institute of Science and Technology Austria and currently work in two research fields, namely efficient algorithms and data structures, and differential privacy.
I am curious about your recent focus on differential privacy. Coming from efficient algorithms and data structures, was this a natural transition, or how did your work lead to this area?
Actually, it started during the COVID period, when everybody was talking about privacy. I began looking into the theoretical computer science aspects of privacy. That’s how I found this field of differential privacy, which is essentially a subfield of algorithms, and therefore it fits into my broader research area of combinatorial algorithms. At the time, I noticed that very few researchers in Europe were working on it (which has changed since). This motivated me to take a daring step and write an ERC proposal on differential privacy. I had always worked on a variety of notions of efficient algorithms, including time-efficient algorithms, space-efficient algorithms, and online algorithms, but not on privacy.
Could you give an intuitive explanation of what it guarantees, and how it differs from traditional notions of privacy, like data anonymization?
The idea behind differential privacy is that we want to provide statistics about gathered data without revealing information about individuals who contributed to it. In contrast to other forms of privacy, the recipient of the statistics is a potential “privacy adversary” rather than the operator performing computations.
Let me illustrate the problem with a simple example: assume you publish the average salary of a group of 10 people. After one person leaves the group and the average salary is recomputed, it is easy to determine the salary of the person who left. This example illustrates that basic statistics can actually leak sensitive information.
Differential privacy addresses this problem by carefully adding noise to the computation. The goal is to keep the average approximately the same while avoiding information leakage when individual contributions are removed or added.
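The noise addition just described can be sketched with the Laplace mechanism, a textbook way to privatize a bounded average. The salary data, the salary cap, and the ε value below are illustrative assumptions, not values from the interview:

```python
import random

def private_average(salaries, epsilon, max_salary):
    """Release a differentially private average of salaries in [0, max_salary].

    Changing one person's salary moves the average by at most
    max_salary / n (the sensitivity), so Laplace noise with scale
    sensitivity / epsilon masks any single person's contribution.
    """
    n = len(salaries)
    sensitivity = max_salary / n
    b = sensitivity / epsilon
    # Sample Laplace(0, b) noise as the difference of two exponentials.
    noise = random.expovariate(1 / b) - random.expovariate(1 / b)
    return sum(salaries) / n + noise

# Illustrative group of 10 salaries, capped at 100,000.
salaries = [52_000, 61_000, 48_000, 75_000, 58_000,
            63_000, 49_000, 70_000, 55_000, 66_000]
print(private_average(salaries, epsilon=1.0, max_salary=100_000))
```

With ε = 1 and salaries capped at 100,000, the noise scale is 10,000, so whether one person stayed or left is hidden inside the noise rather than readable off the published average.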
More technically, differential privacy assumes that the set of possible inputs for a computational problem has a neighboring relation. For example, if a possible input is an n × d table of real numbers, then two such tables might be neighboring if the entries in at most one row are different and all other rows are identical in the two tables. A (randomized) algorithm is differentially private if (informally speaking) its output distributions on neighboring inputs are very similar. From the output alone, it should be very unlikely that anyone can determine which of two neighboring inputs was given to the algorithm. This definition has various useful properties, and it guarantees much stronger privacy protection than anonymization.
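In its pure (ε-only) form, this informal statement is usually formalized as follows: a randomized algorithm A is ε-differentially private if, for all neighboring inputs D and D′ and every set S of possible outputs,

```latex
\Pr[A(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[A(D') \in S].
```

Since the inequality also holds with D and D′ swapped, the two output distributions differ by at most a factor of e^ε in either direction.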
So, you get statistical guarantees? On what do they depend?
Yes, exactly. Differential privacy provides a statistical guarantee. Since you add random noise, there’s a small chance that the added noise is very small or even zero. In this case, you can infer information about an individual, such as the salary in our example.
Differential privacy algorithms control the risk using a parameter called ε (epsilon), which is related to the probability that information about an individual can be inferred. The smaller you set the epsilon parameter, the stronger the privacy protection.
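For the standard Laplace mechanism, this trade-off is concrete: the noise scale is sensitivity/ε, so shrinking ε makes large noise values more likely. A small sketch; the sensitivity and threshold values are illustrative assumptions:

```python
import math

def laplace_tail(t, sensitivity, epsilon):
    """P(|noise| > t) when Laplace noise of scale sensitivity/epsilon is added."""
    b = sensitivity / epsilon
    return math.exp(-t / b)

# Smaller epsilon -> larger noise scale -> big deviations more likely:
# stronger privacy is bought with lower accuracy.
for eps in (2.0, 1.0, 0.5):
    print(f"epsilon={eps}: P(|noise| > 10,000) = {laplace_tail(10_000, 10_000, eps):.3f}")
```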
Differential privacy is becoming increasingly important in today’s AI-driven systems, such as large language models and other machine learning applications. What makes it particularly relevant now?
In order to guarantee that it is very unlikely that the input to an AI-driven system can be reconstructed from its output, the training algorithm can be designed to be differentially private. For example, training algorithms for large language models can be made differentially private by adding small amounts of noise during training. With this, it is possible to avoid leaking sensitive information contained in the training data, such as social security numbers.
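One widely used recipe for this is DP-SGD: clip each training example's gradient so that no individual can dominate an update, average, and add Gaussian noise. The sketch below uses plain Python lists; the learning rate, clip norm, and noise multiplier are illustrative assumptions, not parameters from any particular system:

```python
import math
import random

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_mult):
    """One differentially private gradient step: clip every example's
    gradient to L2 norm clip_norm, average, add Gaussian noise scaled
    to the clip norm, then take a gradient-descent step."""
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * factor for x in g])
    n = len(clipped)
    noisy_avg = [
        (sum(g[i] for g in clipped) + random.gauss(0.0, noise_mult * clip_norm)) / n
        for i in range(len(params))
    ]
    return [p - lr * g for p, g in zip(params, noisy_avg)]
```

The per-example clipping is the key step: it bounds how much any single training record can change the model, which is exactly the bounded sensitivity that the added noise then hides.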
The widespread use of AI-driven systems makes this a very timely problem. In this area, I have developed an algorithm for a fundamental computational problem arising in current training algorithms. Google has extended a variant of this algorithm and improved its running time, and uses it to train some of their large language models.
Differential privacy clearly provides benefits. Does it introduce trade-offs? Do you face particular challenges when applying it in complex AI systems involving massive datasets?
In general, we care about two things: privacy protection, meaning how unlikely it is that information about an individual is disclosed, and the error in the results, that is, the accuracy or the utility of the result. Since differential privacy works by adding noise, there is a trade-off between privacy and accuracy.
Interestingly, having more data can actually help. Since the relative contribution of each individual data point decreases, for some computational problems, you need to add less noise, which improves the utility of the result.
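For the bounded-average example, this effect is easy to see: changing one person's value moves the average by at most M/n, so the Laplace noise scale M/(n·ε) shrinks as n grows. The bound M and budget ε below are illustrative:

```python
M, eps = 100_000, 1.0  # value bound and privacy budget (illustrative)
for n in (10, 1_000, 100_000):
    scale = M / (n * eps)  # Laplace noise scale for a DP average of n values
    print(f"n = {n:>7}: noise scale = {scale:10.2f}")
```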
In addition to affecting accuracy, adding noise may slow down the runtime, especially if noise needs to be added at multiple points in a computation. New algorithms, such as the Google variant mentioned above, aim to improve efficiency.
Can formal methods help address these challenges? What does the development of algorithms in this area look like?
Differentially private algorithms are often very delicate in the sense that a small change in the algorithm can already break the differential privacy guarantee. Thus, guaranteeing that an implementation of a differentially private algorithm is indeed differentially private can be challenging, and this is exactly a question where formal methods can help.
There is an initiative, called OpenDP, started at Harvard, which provides open-source implementations of differentially private algorithms. To include an algorithm in OpenDP, contributors must provide a formal proof that their implementation satisfies differential privacy. The framework uses a specification style similar to Hoare triples: assuming certain preconditions, one must prove that a postcondition holds that captures the privacy guarantees of the implementation.
Formal methods could further support this process by automating parts of these proofs, for example, by generating or checking proof steps.
Your ERC Advanced Grant addresses privacy assurance, which falls under GDPR. From your perspective, how are regulatory bodies and large corporations approaching differential privacy in practice, and what gaps still exist between theory and deployment?
Several large corporations, including Google, Apple, and Microsoft, report using differential privacy in practice. For example, Microsoft uses it to collect statistics about emoji usage in a privacy-preserving manner. Companies that adopt these techniques often highlight them publicly, for instance, in blog posts.
However, the privacy assurance requirements in GDPR are written in legal language and not in a mathematical formal way. Thus, it is currently unclear whether, from a legal point of view, the output of a differentially private algorithm applied to personal data fulfills the requirements of the GDPR.
For Ph.D. students or early-career researchers interested in exploring differential privacy, what advice would you give?
First of all, understand differential privacy well, as the definitions are often quite subtle. You also need to understand what the right research questions are. In contrast to other areas of algorithms, you cannot focus solely on improving asymptotic complexity, but you need to be aware of which variants and definitions of differential privacy are relevant to the problem you are studying.
Finally, is there a paper or a book that you would recommend people read?
There is a great book by Cynthia Dwork and Aaron Roth called “The Algorithmic Foundations of Differential Privacy”. I read it as an introduction when I got into differential privacy and found it very understandable. It is available online.
Your upcoming talk discusses differential privacy in streaming applications. What makes this setting particularly challenging?
In streaming applications, you release statistics repeatedly, and with every release, an adversary can potentially learn a little more about the underlying data. In other words, each release may leak a small amount of privacy. If one is not careful, these leaks accumulate, allowing increasingly precise inferences about the data. This problem becomes even more challenging when the dataset changes over time, since an adversary may add or remove items.
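This accumulation has a precise form: by basic (sequential) composition, k releases that are each ε₀-differentially private about the same data are together (k·ε₀)-differentially private. A naive streaming mechanism must therefore split its total budget across releases, and the per-release noise grows with the stream length. A sketch with illustrative budget values, ignoring stronger composition theorems:

```python
def naive_noise_scale(total_epsilon, num_releases, sensitivity=1.0):
    """Per-release Laplace noise scale when a total privacy budget is
    split evenly across num_releases releases (basic composition)."""
    eps_per_release = total_epsilon / num_releases
    return sensitivity / eps_per_release

# The per-release noise grows linearly with the number of releases:
for k in (1, 100, 10_000):
    print(f"{k:>6} releases: noise scale = {naive_noise_scale(1.0, k):,.0f}")
```

Tree-based mechanisms for continual observation reduce this linear blow-up to polylogarithmic in the stream length, which is one reason the streaming setting calls for dedicated algorithms.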
To counter that, it is necessary to devise more sophisticated mechanisms for adding noise that take streaming applications into account and analyze the effect of repeated queries and dataset updates. This is the type of problem I have specialized in, and I will discuss these ideas in more detail in my talk.