Abstract
Social and communication networks are formed by entities (such as individuals or computer hosts) and their connections (which may be contacts, relationships, or flows of information). Such networks are analyzed to understand the influence of individuals in organizations, the transmission of disease in communities, and the operation of computer networks, among many other topics. While network data can now be recorded at unprecedented scale, releasing it can result in unacceptable disclosures about participants and their relationships. As a result, privacy concerns are severely constraining the dissemination of network data and disrupting the emerging field of network science.
In this talk I will give an overview of recent approaches to protecting network data, including both network anonymization and the application of "differential privacy" to networks. To satisfy differential privacy, statistics about the topology of a sensitive network are perturbed before being released. I will describe recent results on the properties of a network that can be accurately estimated in this model. I will show that the degree distribution of a network can be very accurately estimated by a novel technique in which constraints are applied to the noisy output to improve utility. Studying motifs is fundamentally harder, but can be done with acceptable accuracy if the privacy condition is relaxed. Finally, I will discuss current efforts to generate accurate synthetic networks from privately learned statistics, a problem that draws heavily on network modeling techniques.
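To make the idea of constraining noisy output concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method presented in the talk: it assumes the released statistic is the sorted degree sequence, that privacy is defined with respect to adding or removing a single edge (so the L1 sensitivity is taken to be 2), and that the constraint applied is simply "the output must be non-decreasing", enforced by a least-squares isotonic fit. The function names (`laplace_noise`, `isotonic_fit`, `private_sorted_degrees`) are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def isotonic_fit(values):
    """Least-squares non-decreasing fit (pool-adjacent-violators)."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def private_sorted_degrees(degrees, epsilon, rng=None):
    """Perturb the sorted degree sequence, then restore the ordering constraint."""
    rng = rng or random.Random()
    # ASSUMPTION: one edge changes two degrees by 1 each, so sensitivity = 2.
    scale = 2.0 / epsilon
    noisy = [d + laplace_noise(scale, rng) for d in sorted(degrees)]
    # Post-processing the noisy answer does not weaken the privacy guarantee.
    return isotonic_fit(noisy)


if __name__ == "__main__":
    toy_degrees = [1, 1, 2, 2, 3, 5]  # made-up example graph
    print(private_sorted_degrees(toy_degrees, epsilon=1.0, rng=random.Random(7)))
```

The design point is that the ordering constraint is known in advance and independent of the sensitive data, so projecting the noisy sequence onto the set of sorted sequences can only use public information. Long runs of equal degrees tend to have their independent noise averaged away by the pooling step, which is one intuition for why constrained output can be more accurate than the raw noisy answer.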