Seminar in Technology Management and Information Systems
Speaker: Dana Turjeman, Assistant Professor, Arison School of Business, Reichman University
Title: “Privacy Preserving Data Fusion”
Abstract: Data fusion, the combination of multiple datasets, is a powerful technique for making inferences that are more accurate, generalizable, and useful than those made with any single dataset. However, when data fusion involves user-level data, the technique poses a privacy hazard due to the risk of revealing the identities of users.
We propose a privacy-preserving data fusion (PPDF) methodology, intended to preserve user anonymity while allowing for a robust and expressive data fusion process. PPDF is based on variational autoencoders (VAEs) and on a nonparametric Bayesian generative modeling framework estimated in adherence to differential privacy (DP), the state-of-the-art theory for privacy preservation. PPDF does not require the same users to appear in both datasets when making inferences on the joint data, and it explicitly accounts for missingness in each dataset by leveraging additional variation in the other to correct for sample selection.
Moreover, PPDF is model-agnostic: it allows inferences to be made on the fused data without the analyst specifying a model a priori, and does so without the original datasets ever coming into contact on a single machine or model.
We undertake a series of simulations to showcase the quality of our proposed methodology, and describe a planned fusion of a large customer dataset from a matchmaking website with a detailed, anonymous survey that was administered to the same population. This application illustrates the power of the PPDF methodology to infer customers' stated and revealed preferences. We conclude with a discussion of other possible use cases.