ECML PKDD 2011 Tutorial
Privacy Challenges and Solutions for Medical Data Sharing

Tutorial information

Various types of data, including demographic, clinical, and genomic information, are increasingly collected and stored in Electronic Medical Record (EMR) systems and biomedical research repositories. Such data have traditionally been used to automate healthcare workflows, but have recently been recognized as an invaluable source for large-scale, low-cost biological, medical, and healthcare analysis and decision making. These tasks are essential for the discovery of new drugs and therapies, and are a key step towards realizing the vision of personalized medicine. As a result, over $50 billion was pledged by the Obama administration in 2009 to promote technologies for managing and sharing medical data. Meanwhile, detailed medical data are increasingly disseminated beyond the institutions that collected them, in accordance with data sharing regulations, such as the policy of the National Institutes of Health (NIH) for genomic information. This, however, may pose serious threats to patients' privacy, and these threats must be eliminated to comply with data sharing policies and legislation, such as the HIPAA Privacy Rule and the EU Directive 95/46/EC.

In this tutorial, we will elaborate on the need for sharing medical data in a privacy-preserving way, review the existing policies and practices for sharing medical data, and present state-of-the-art approaches for ensuring that the disseminated data are both protected and useful. Following that, we will highlight important open problems and future directions. More specifically, the tutorial will consist of three parts. The first part will provide an overview of successful practices and paradigms for sharing and using medical data in applications. We will focus on the analysis and mining tasks supported by different types of medical data, as well as on the privacy threats that data sharing entails. The second part of the tutorial will survey approaches for privacy-preserving medical data sharing. We will address a number of important issues, such as capturing and balancing data utility and privacy in applications, and designing privacy techniques for different types of data and data sharing scenarios. We will also present case studies using data from the US Census and from the EMR system of the Vanderbilt University Medical Center, a state-of-the-art system that stores information about 2 million patients over 15 years. In the third part of the tutorial, we will discuss important open problems and provide a roadmap for the future.

By the end of this tutorial, attendees will have a basic understanding of the concepts and underlying principles used to disseminate medical data in a protected and useful form. The tutorial will be accessible to computer science researchers and educators who are interested in data privacy, data mining, and information systems, as well as to industry developers. By focusing on open problems, we also hope to encourage graduate students to conduct research in this emerging and interesting field.


Tutorial outline

Part 1: Motivation – Medical data sharing and use

Part 2: Research challenges and state-of-the-art solutions

Part 3: Open problems and research challenges


Tutorial materials

The slides of the tutorial can be downloaded in PDF format.

Tutorial audience

The target audience of the tutorial includes:



Aris Gkoulalas-Divanis is a research staff member in the Information Analytics Lab at IBM Research-Zurich. Prior to that, he was a postdoctoral research fellow in the Health Information Privacy LABoratory (HIPLAB) at Vanderbilt University (2009-2010), working on privacy for medical data. Aris received the Diploma from the University of Ioannina, the MS from the University of Minnesota, and the PhD from the University of Thessaly, all in Computer Science. His PhD dissertation was awarded the Certificate of Recognition and Honorable Mention in the 2009 ACM SIGKDD. His research interests are in the areas of databases, data mining, privacy-preserving data mining, privacy in medical data, and knowledge hiding. He is a Professional member of ACM, IEEE, SIAM and AAAS, and an at-large member of UPE and Sigma-Xi.


Grigorios Loukides is an Assistant Professor (Lecturer) in the School of Computer Science and Informatics, Cardiff University, and a Royal Academy of Engineering Research Fellow. Prior to that, he was a postdoctoral researcher in the Health Information Privacy LABoratory (HIPLAB), Vanderbilt University. He received a Diploma from the University of Crete (2005) and a Ph.D. from Cardiff University (2009), both in Computer Science. Grigorios' research interests are in privacy-preserving data mining and biomedical informatics. He has investigated both theoretical and practical research aspects, including algorithmic design, optimization, and formal modeling, and explored interesting applications in healthcare and business. Grigorios is a Professional member of the ACM.



