FPF Webinar Explores the Future of Privacy-Preserving Machine Learning

July 1, 2020

Sara Jordan

On June 8, FPF hosted a webinar, Privacy Preserving Machine Learning: New Research on Data and Model Privacy. Co-hosted by the FPF Artificial Intelligence Working Group and the Applied Privacy Research Coordination Network, an NSF project run by FPF, the webinar explored how machine learning models, as well as the data fed into them, can be secured through tools and techniques that manage the flow of data and models in the ML ecosystem. Academic researchers from the US, EU, and Singapore presented their work to attendees from around the globe.

The papers presented, summarized in the associated Privacy Scholarship Reporter, represent key strategies in the evolution of private and secure machine learning research. Starting with her co-authored piece, Ms. Patricia Thaine of the University of Toronto outlined how combining privacy-enhancing techniques (PETs) can lead to better, perhaps almost perfect, preservation of privacy for the personal data used in ML applications. The combination of techniques, including de-identification, homomorphic encryption, and others, draws on foundational and novel work in private machine learning, such as the use of collaborative or federated learning, pioneered by the next presenter, Professor Reza Shokri of the National University of Singapore.

Professor Shokri discussed data privacy issues in machine learning with a specific focus on “indirect and unintentional” risks, such as those that may arise from metadata, data dependencies, and computations on data. He highlighted that reported statistics and models may provide enough information for an adversary to use the “inference avalanche” to link non-private information back to personal data. He elaborated on the point that models themselves may be personal data that need to be protected under data protection frameworks such as the GDPR. To address these privacy risks from machine learning, Professor Shokri and his colleagues have developed the ML Privacy Meter, an open-source tool available on GitHub. “ML Privacy Meter ... provides privacy risk scores which help in identifying data records among the training data that are under high risk of being leaked through the model parameters or predictions.”
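To give a rough sense of what a per-record privacy risk score can look like, here is a minimal sketch of a loss-based membership signal. It is not the ML Privacy Meter API or Professor Shokri's method; the function name `risk_scores`, the loss values, and the 0.9 cutoff are all hypothetical and chosen only for illustration. The assumption is that a model tends to have unusually low loss on records it was trained on.

```python
def risk_scores(target_losses, reference_losses):
    """Score each record by how unusually low its loss is.

    The score is the fraction of reference losses (from records known NOT to
    be in the training set) that exceed the record's own loss. A score near
    1.0 means the model fits the record better than almost any unseen record,
    which may hint that the record was memorized during training.
    """
    n = len(reference_losses)
    return [
        sum(1 for ref in reference_losses if ref > loss) / n
        for loss in target_losses
    ]


# Hypothetical per-record losses produced by some trained model.
reference_losses = [0.9, 1.2, 0.7, 1.5, 1.1]  # known non-training records
target_losses = [0.05, 1.0, 0.8]              # records being audited

scores = risk_scores(target_losses, reference_losses)

# Illustrative cutoff (an assumption): flag records scoring 0.9 or above.
high_risk = [i for i, s in enumerate(scores) if s >= 0.9]
print(scores, high_risk)
```

In this toy example only the first audited record, with a loss far below every reference loss, would be flagged. Real audits use far more careful statistics than a single threshold.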

Next, the foundational work on federated learning was applied to the domain of healthcare narrative data by Mr. Sven Festag of Jena University Hospital. He explored how collaborative, privacy-preserving training of models that de-identify clinical narratives improves on the privacy protections available from de-identification run at a strictly local level. One reason for Mr. Festag's approach was the potential for malicious actors to seek detailed knowledge of models or data, a threat further assessed by the following speaker, Dr. Ahmed Salem from Saarland University.
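For readers unfamiliar with collaborative training, the sketch below shows one common aggregation step, federated averaging, in which sites share model parameters rather than raw records. This is a generic illustration and not necessarily the protocol used in Mr. Festag's work; the function name `federated_average`, the two hypothetical hospital sites, and all numbers are assumptions.

```python
def federated_average(site_weights, site_sizes):
    """Combine locally trained parameters into one global parameter vector.

    Each site's parameters are weighted by the number of local training
    records, so raw clinical text never leaves the site; only the
    parameter values are shared with the coordinator.
    """
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical parameters from two hospitals after one round of local training.
site_weights = [[1.0, 2.0], [3.0, 6.0]]
site_sizes = [100, 300]  # number of local training records per site

global_weights = federated_average(site_weights, site_sizes)
print(global_weights)
```

The larger site pulls the shared model toward its parameters. Note that sharing parameters alone does not rule out the inference attacks described above, which is part of why the speakers stressed protecting models as well as data.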

Dr. Salem and his co-authors explored how machine learning models can be attacked and prompted to give incorrect answers through both static and dynamic triggers. Triggers, such as the replacement of pixels in an image, can be designed to make a model learn or output something different from what it would for data that does not include such triggers. When attack triggers are made dynamic, such as through the use of algorithms trained to identify areas of vulnerability, the likelihood of a successful attack markedly increases.
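As a rough illustration of a static trigger, the sketch below stamps a fixed patch of pixels into the same corner of an image and pairs it with an attacker-chosen label. It is a simplified toy and not the attack from Dr. Salem's paper; the function name `add_static_trigger`, the patch size and value, and the 4x4 image are hypothetical. Dynamic triggers, which vary in pattern or location, are not shown.

```python
import copy


def add_static_trigger(image, patch_value=255, patch_size=2):
    """Return a copy of `image` with a fixed patch in the bottom-right corner.

    `image` is a list of rows of grayscale pixel values. The original image
    is left untouched so clean and poisoned versions can be compared.
    """
    poisoned = copy.deepcopy(image)
    height, width = len(poisoned), len(poisoned[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            poisoned[r][c] = patch_value
    return poisoned


# Hypothetical 4x4 all-black image with its true label.
clean_image = [[0] * 4 for _ in range(4)]
clean_label = "cat"

# The attacker adds the trigger and swaps in a label of their choosing.
poisoned_image = add_static_trigger(clean_image)
poisoned_label = "dog"

for row in poisoned_image:
    print(row)
```

A model trained on enough such pairs may learn to associate the patch, rather than the image content, with the attacker's label.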

Given the possibility that attacks against models could cause adverse outcomes, it seems reasonable to expect consensus that machine learning models need to be well protected. However, as Dr. Sun and his Northeastern University co-authors found, many mobile app developers do not, in fact, design or include sufficient security for their models. Failure to appropriately secure models can cost companies considerably, from reputational harm to financial losses.

Following the presentations, the speakers joined FPF for a panel discussion about the general outlook for privacy-preserving machine learning. They concurred that the future of privacy in machine learning will necessarily include both data protections and model protections and will need to go beyond a simple compliance-focused effort. As a last question, speakers were asked which articles on this topic they consider most important for privacy professionals to read. Those papers are listed below.

Information about the webinar, including slides from the presentation, some of the papers and GitHub links for our speakers, and a new edition of the Privacy Scholarship Reporter, can be found below.

APRCN – Privacy Scholarship Research Reporter #5