Google: COVID-19 Community Mobility Reports
Google has been recognized with the second-annual FPF Award for Research Data Stewardship for its work to produce, aggregate, anonymize, and share data on community movement during the COVID-19 pandemic. Google’s Community Mobility Reports go through a robust anonymization process that employs differential privacy techniques to ensure that personal data, including an individual’s location, movement, or contacts, cannot be derived from the metrics, while providing researchers and public health authorities with valuable insights to help inform official decision making.
As part of its award submission, Google described an example research collaboration with researchers from Boston University, Harvard University, and Brown University, which evaluated the impacts of state-level policies on mobility and subsequent COVID-19 case trajectories. The researchers found that states with mobility policies experienced substantial reductions in the time people spent away from their places of residence, and that these reductions were in turn associated with decreases in COVID-19 case growth.
Google was also recognized for a related project – the Google COVID-19 Aggregated Mobility Research Dataset – centered around the same underlying anonymized data with small differences in the privacy protections and procedures used. For the purposes of this award, we have combined both Google projects to produce a series of considerations for future data-sharing projects.
“As the COVID-19 crisis emerged, Google moved to support public health officials and researchers with resources to help manage the spread,” said Dr. Karen DeSalvo, Chief Health Officer, Google Health. “We heard from the public health community that mobility data could help provide them an understanding of whether people were social distancing to interrupt the spread. Given the sensitivity of mobility data, we needed to deliver this information in a privacy preserving way, and we’re honored to be recognized by FPF for our approach.”
The Research Project
Since the beginning of the pandemic and during most of 2020, social distancing remained the primary mitigation strategy to combat the spread of COVID-19 in the United States. In response to requests from public health officials to provide aggregated, anonymized insights on community movement that could be used to make critical decisions to combat COVID-19, Google set up Community Mobility Reports to provide insights into what has changed in response to policies aimed at combating COVID-19. The reports chart movement trends over time by geography, across different categories of places such as retail and recreation, groceries, pharmacies, parks, transit stations, workplaces, and residential. To date, the aggregated, anonymized data sets have been heavily used for scientific research and economic analysis, as well as informing policy making by national and local governments and inter-governmental organizations.
Google’s approach to privacy was illustrated by the company’s collaboration with Prof. Gregory Wellenius of Boston University School of Public Health’s Department of Environmental Health; Dr. Thomas Tsai of Brigham and Women’s Hospital Department of Surgery and Harvard T.H. Chan School of Public Health’s Department of Health Policy and Management; and Dr. Ashish Jha, Dean of Brown University’s School of Public Health. The researchers evaluated the impacts of specific state-level policies on mobility and subsequent COVID-19 case trajectories using anonymized and aggregated mobility data from Google users who had opted in to share their data for research. They then correlated the decreases in mobility tied to state-level policies with changes in the number of reported COVID-19 cases. The project produced the following insights:
- State-level emergency declarations resulted in a 9.9% reduction in time spent away from places of residence.
- Implementation of one or more social distancing policies resulted in an additional 24.5% reduction in mobility the following week.
- Subsequent shelter-in-place mandates yielded an additional 29% reduction in mobility.
- Decreases in mobility were associated with substantial reductions in case growth two to four weeks later.
Google was also recognized for a related research project, the Google COVID-19 Aggregated Mobility Research Dataset. Unlike the COVID-19 Community Mobility Reports data, which were made publicly available online, this dataset was shared only with specific, qualified researchers for the sole purpose of studying the effects of COVID-19. These were individual researchers with proven track records in studying epidemiology, public health, or infectious disease, who accepted the data under contractual commitments to use it ethically while maintaining privacy. Under these conditions, Google was able to share more detailed mobility data with the researchers while keeping strong mathematical privacy protections in place.
Data Protection Procedures and Processes in the Google COVID-19 Mobility Reports & Google COVID-19 Aggregated Mobility Research Dataset
- Protocol Development, Partner Criteria, and Agreements. Given the sensitive nature of the data, Google developed strict, technical privacy protocols and stringent partner criteria for the Aggregated Mobility Research Dataset to determine how and with whom to share an aggregated version of the underlying data. Data sharing agreements were offered only to well-established non-governmental researchers with proven publication records in epidemiology, public health, or infectious disease, and the scope of research was limited to studying the effects of COVID-19.
- Generating Anonymized Metrics. The anonymization process for the COVID-19 Mobility Reports involves differential privacy, a technical process that intentionally adds random noise to metrics in a way that maintains both users’ privacy and the overall accuracy of the aggregated data. Differential privacy represents an important step in the aggregation and anonymization process. The metrics produced through the differential privacy process are then used to assess relative percentage changes in movement behavior for each day from a baseline and those percentage changes are subsequently published by Google.
- Aggregation of Data. The metrics are aggregated per day and per geographic area. There are three levels of geographic areas, referred to as granularity levels, including metrics aggregated by country or region (level 0), metrics aggregated by top-level geopolitical subdivisions like states (level 1), and metrics aggregated by higher-resolution granularity like counties (level 2).
- Discarding Anonymized, but Geographically Attributable, Data. In addition to the privacy protections implemented through the differential privacy process, Google discards all metrics for which the geographic region is smaller than 3 km², or for which the differentially private count of contributing users (after noise addition) is smaller than 100.
- Pre-Publication Review. Due to the sensitivity of the COVID-19 Aggregated Mobility Research Dataset, Google reviews all research involving this dataset prior to publication, including research without Google attribution. The review checks that publications describe the dataset and its limitations properly, and that researchers don’t use the dataset improperly, for example, by combining datasets in ways that may lead to the re-identification of individual users.
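The noise-addition, discarding, and baseline-comparison steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's implementation: the text does not say which noise distribution or privacy budget is used, so the Laplace mechanism, the `EPSILON` value, the assumption that each user changes a count by at most 1, and all function names are hypothetical. Only the 3 km² and 100-user thresholds come from the text.

```python
import random
from typing import Optional

# Thresholds quoted in the text above.
MIN_AREA_KM2 = 3.0
MIN_NOISY_USERS = 100.0

# Assumption: illustrative privacy budget; the text does not state one.
EPSILON = 0.5


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Add Laplace noise, assuming each user changes the count by at most 1."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


def keep_metric(area_km2: float, noisy_users: float) -> bool:
    """Apply the two discard rules: region area and noisy user count."""
    return area_km2 >= MIN_AREA_KM2 and noisy_users >= MIN_NOISY_USERS


def percent_change(value: float, baseline: float) -> float:
    """Relative change of a day's metric from its baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


def publishable_change(
    area_km2: float,
    noisy_users: float,
    noisy_value: float,
    noisy_baseline: float,
) -> Optional[float]:
    """Return the percent change, or None if the metric must be discarded."""
    if not keep_metric(area_km2, noisy_users):
        return None
    return percent_change(noisy_value, noisy_baseline)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    users = noisy_count(1200, EPSILON, rng)
    today = noisy_count(750, EPSILON, rng)
    baseline = noisy_count(1000, EPSILON, rng)
    print(publishable_change(12.5, users, today, baseline))
```

Note that the user-count threshold is applied to the noisy count rather than the true count, matching the "after noise addition" wording above, so the decision to discard a metric does not itself depend directly on the exact number of contributing users.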
Lessons for Future Data-Sharing Projects
Google’s COVID-19 Mobility Reports and Google COVID-19 Aggregated Mobility Research Dataset projects highlight a number of valuable lessons that companies and academic institutions may apply to future data sharing collaborations.
- Develop Robust Partner Criteria. Upon launching the Google COVID-19 Aggregated Mobility Research Dataset project, Google established strict criteria for research partners outside of government to ensure that academic researchers were proven stewards of privacy-protective research with established records in epidemiology, public health, and/or infectious disease. By developing stringent protocols for its academic partners, Google worked to ensure that the data was used responsibly and only for the study of the effects of COVID-19.
- Consider Differential Privacy. Google’s COVID-19 Mobility Reports data sharing project employed differential privacy to provide mathematical assurances that no individual user data could be manually inspected, studied, or re-identified. The mathematical process that underlies differential privacy adds random noise to metrics in a manner that preserves both user privacy and the overall accuracy of the data, both of which are essential given the use cases of the data.
- Share Aggregated Data. By aggregating data by day and geographic location, Google provided further assurances that location and behavior could not be attributed to any single individual, protecting their privacy while providing valuable insights to researchers and public health authorities. The Google team also set a geographic threshold for aggregated data: metrics for geographic regions smaller than 3 km² were discarded.
- Tailor Formats & Privacy Protections to Your Audience. The team knew that mobility data could provide a variety of insights in different contexts. Rather than choosing a single application, they tailored their privacy protections to meet the needs of both a publicly available data set and one that could be shared under the terms of a specific agreement.
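As a rough illustration of the "Share Aggregated Data" lesson, the sketch below counts records per day, place category, and geographic area at the three granularity levels described earlier (country or region, top-level subdivision, and county-like area). The `Visit` record and its fields are hypothetical and chosen only for the example; the text does not describe the underlying data schema, and a real pipeline would aggregate only after the privacy steps it describes.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, NamedTuple, Tuple


class Visit(NamedTuple):
    """Hypothetical input record; not the actual schema."""
    day: date
    country: str
    state: str
    county: str
    category: str  # e.g. "parks", "workplaces"


def region_key(visit: Visit, level: int) -> Tuple[str, ...]:
    """Map a record to its geographic area at granularity level 0, 1, or 2."""
    if level == 0:
        return (visit.country,)
    if level == 1:
        return (visit.country, visit.state)
    if level == 2:
        return (visit.country, visit.state, visit.county)
    raise ValueError(f"unknown granularity level: {level}")


def aggregate(visits: Iterable[Visit], level: int) -> Dict[tuple, int]:
    """Count records per (day, geographic area, category)."""
    counts: Dict[tuple, int] = defaultdict(int)
    for visit in visits:
        counts[(visit.day, region_key(visit, level), visit.category)] += 1
    return dict(counts)
```

Coarser levels pool more records into each bucket, which is why a bucket that would be too small to publish at the county level can still contribute to a state-level or country-level metric.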
The Selection Process
Nominees for the Award for Research Data Stewardship were judged by an Award Committee composed of representatives from FPF, leading foundations, academics, and industry leaders. The Award Committee evaluated projects based on several factors, including their adherence to privacy protection in the sharing process, the quality of the data handling process, and the company’s commitment to supporting the academic research.