Privacy Aware Experimentation: Private Hypothesis Testing over Sensitive Groups
We study a new privacy model in which users belong to sensitive groups, and we would like to conduct statistical inference on whether there are significant differences in outcomes between the groups. In particular, we do not consider users' outcomes to be sensitive, only their membership in certain groups. This contrasts with previous work on locally private statistical tests, where outcomes and groups are privatized jointly, and with private A/B testing, where the groups (control and treatment) are public while the outcomes are privatized. We cover several hypothesis-testing settings in which group membership has been privatized across the samples, including binary and real-valued outcomes. We adopt the generalized chi-square testing framework used in other works on hypothesis testing under different privacy models, which lets us cover Z-tests, chi-square tests, t-tests, and ANOVA tests with a single unified approach. For two groups, we derive confidence intervals for the true difference in means and show that traditional approaches to computing confidence intervals fail to cover the true difference once privacy is introduced. For more than two groups, we consider several mechanisms for privatizing group membership and show that we can improve statistical power over traditional tests that ignore the noise due to privacy. We also consider the application to private A/B testing, to determine whether the difference in means across sensitive groups changes significantly between control and treatment.
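The abstract does not commit to one privatization mechanism at this point. As an illustration only, the sketch below assumes binary randomized response on group membership: each user reports their true group with probability `keep_prob` and the other group otherwise. It compares a naive difference in means, which treats reported groups as if they were true, against a simple moment-inversion estimate that undoes the known mixing. The function names, the Gaussian outcome simulation, and the parameter values are hypothetical. This is not the generalized chi-square procedure of the paper, only a way to see why ignoring privacy noise biases the estimated difference.

```python
import random


def randomized_response(group: int, keep_prob: float, rng: random.Random) -> int:
    """Report the true binary group (0 or 1) with probability keep_prob, else the other group."""
    return group if rng.random() < keep_prob else 1 - group


def _tally(outcomes, reported):
    """Per-reported-group outcome sums and counts."""
    sums, counts = [0.0, 0.0], [0, 0]
    for y, g in zip(outcomes, reported):
        sums[g] += y
        counts[g] += 1
    return sums, counts


def naive_difference(outcomes, reported) -> float:
    """Mean of reported group 1 minus mean of reported group 0, ignoring privacy noise."""
    sums, counts = _tally(outcomes, reported)
    return sums[1] / counts[1] - sums[0] / counts[0]


def debiased_difference(outcomes, reported, keep_prob: float) -> float:
    """Invert the known randomized-response mixing to estimate true per-group sums and counts."""
    sums, counts = _tally(outcomes, reported)
    q = keep_prob
    denom = 2 * q - 1  # the mixing is not invertible when keep_prob == 0.5
    # E[observed_1] = q * true_1 + (1 - q) * true_0, and symmetrically for group 0.
    true_sum_1 = (q * sums[1] - (1 - q) * sums[0]) / denom
    true_sum_0 = (q * sums[0] - (1 - q) * sums[1]) / denom
    true_count_1 = (q * counts[1] - (1 - q) * counts[0]) / denom
    true_count_0 = (q * counts[0] - (1 - q) * counts[1]) / denom
    return true_sum_1 / true_count_1 - true_sum_0 / true_count_0


def simulate(n_per_group=50_000, mean_0=0.0, mean_1=1.0, keep_prob=0.75, seed=0):
    """Hypothetical simulation: unit-variance Gaussian outcomes, privatized binary membership."""
    rng = random.Random(seed)
    outcomes, reported = [], []
    for group, mean in ((0, mean_0), (1, mean_1)):
        for _ in range(n_per_group):
            outcomes.append(rng.gauss(mean, 1.0))
            reported.append(randomized_response(group, keep_prob, rng))
    return (
        naive_difference(outcomes, reported),
        debiased_difference(outcomes, reported, keep_prob),
    )


if __name__ == "__main__":
    naive, debiased = simulate()
    print(f"naive: {naive:.3f}  debiased: {debiased:.3f}  true: 1.000")
```

With equal group sizes, the naive estimate concentrates around (2 * keep_prob - 1) times the true difference, so about 0.5 here, while the inversion recovers a value near 1. This attenuation is one way to see why intervals built on the naive estimate can fail to cover the true difference.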