Data Sharing … By Any Other Name (Would Still Be a Complex, Multi-stakeholder Situation)
“It is widely agreed that (data) should be shared, but deciding when and with whom raises questions that are sometimes difficult to answer.”[1]
Data sets held by commercial and government organizations are an increasingly necessary and valuable resource for researchers. Such data may become the evidence in evidence-based policymaking[2] or the data used to train artificial intelligence.[3] Some large data sets are controlled by government agencies, non-governmental organizations, or academic institutions, but many are being accumulated within the private sector. Academic researchers value access to this data as a way to investigate consumer, commercial, and scientific questions at a scale they cannot reach using conventional research data gathering techniques alone. With such data, researchers can answer questions on topics ranging from bias in targeted advertising, to the influence of misinformation on election outcomes, to early diagnosis of diseases through the use of health and physiological data collected by fitness and health apps.
The Future of Privacy Forum (FPF) is a longtime advocate for facilitating the sharing of data by platforms, providers, apps, and services with the research community. In 2015, we convened the workshop “Beyond IRBs: Designing Ethical Review Processes for Big Data Research,” followed in 2017 by “Bridging Industry and Academia to Tackle Responsible Research and Privacy Practices,” and have since supported the continuing engagement of academics and industry partners through our Research Coordination Network. Whether through our work with policy advocates to support appropriate consideration for research data in emerging state legislation, or through our recent development of an Ethical Data Use Committee, FPF remains firmly committed to responsible data sharing for research.[4]
Recent attention on platform data sharing for research is only one conversation in the cacophony of cross-talk on data sharing. The term “data sharing” is used in many different ways to describe a relationship in which one organization shares data with another for a new purpose. Some uses of the term relate to academic and scientific research, and some relate to the transfer of data for commercial or government purposes. At this moment, when various types of data sharing have drawn the attention of the US Congress and the European Commission,[5] it is imperative that we be more precise about which forms of sharing we are referencing, so that the interests of the parties are adequately considered and the various risks and benefits are appropriately contextualized and managed. In the table below, we outline a taxonomy for the multiplicity of data sharing relationships.
Ultimately, the relationships between these entities are complex. In many cases, the relationship is one-to-many, with a single agency or corporation sharing data with multiple researchers and civil society organizations or, as in the case of data trusts or data donation platforms, potentially one person sharing data with many research or commercial organizations through a trusted intermediary steward.[6] Likewise, researchers and civil society organizations may concurrently pursue data from multiple corporate or government organizations, in many cases to address challenges that require extremely large quantities of data (Big Data) or complex networks of related data. Data never flows along just a single channel, and it rarely stops after a single transfer. Governments and corporations share data with researchers; researchers return that data, generate new data, and share analysis, new questions, and outcomes in turn.
Managing these complex relationships requires multi-layered contracts, defined procedures, accountability mechanisms, and other technical and policy controls. The terms for data sharing cover the obligations of both parties, including privacy, ethics, governance, and other good stewardship protocols. Changes in the legislative landscape around data protection, privacy, and security mean that these relationships must be adjusted periodically to meet legal compliance obligations, whether on the data-sharing or the data-using side.
At the Future of Privacy Forum, we are working to add context, nuance, and a considered evaluation of the needs of these many players to create guidelines and best practices to support data sharing, particularly for the conduct of scientific and evidence-based policy research. What data is shared, and under what conditions, controls, contracts, and use environments, has important privacy and governance implications. We have been actively working in this area since 2015, and continue to engage with various interested organizations around the challenges in today’s digital environment. With respect to the sharing of data itself, FPF is focused on finding ways to incorporate proportionate precautions so that any sharing activities adequately protect privacy and are designed with a full understanding of potential harms to the people whose data is transferred or the communities of which they are a part.
Data Sharing Relationships
Data Sharing Organization Type | Data Using Organization Type | Outcome of Data Sharing | Terms to Describe Data Sharing Relationship |
Government Agencies | Researchers and Research Institutions | Researchers conduct evidence-based evaluations of public programs | Administrative Data Sharing[7] |
Government Agencies | Public Interest and Civil Society Organizations | Citizen scientists and journalists can evaluate public programs and hold government agencies accountable for actions and spending | Open Government Data[8] or Data Transparency[9] |
Private Companies or Corporations | Researchers and Research Institutions | Researchers can evaluate the effects of products, processes, and practices at scale | Data for Good[10], Corporate Data Sharing[11], Data Altruism |
Private Companies or Corporations | Public Interest and Civil Society Organizations | Citizen scientists and journalists can hold companies and corporations accountable; Citizen scientists can conduct research necessary for community improvement | Data for Good[12], Data Altruism |
Private Companies or Corporations | Private Companies or Corporations | Private companies can share data between themselves to accomplish mutually beneficial goals, such as improving advertisement or customer segmentation | Data Sharing[13], Sale of Data[14] |
Researchers and Research Institutions | Private Companies or Corporations | Researchers whose work is sponsored by corporations or who have privileged access to corporate data assets return data gathered for future corporate research and, in many cases, retain copies of that data for future scientific work | Return of research data[15], Data Exchange[16] |
Researchers and Research Institutions | Government Agencies | Researchers whose work is sponsored by or conducted under a government contract return data gathered for future agency research and, in many cases, retain copies of that data for future scientific work | Return of research data[17] |
Researchers and Research Institutions | Public Interest and Civil Society Organizations | Citizen groups, journalists, and communities of interest (e.g., patient advocacy groups) can gain access to data about themselves gathered during the research process so that they can use it for future treatment, advocacy, or research participation | Return of research data and/or research results[18] |
Researchers and Research Institutions | Researchers and Research Institutions | Researchers can reuse other researchers’ data or combine their primary and others’ secondary data to answer novel questions without having to put people at risk of research harms by conducting further research with them | Research Data Sharing[19] |
Researchers and Research Institutions | Archives and Repositories | Archives can collect the primary data from multiple researchers to streamline the process of acquiring data to answer novel questions by re-examining data and not putting people at risk of research-related harms by conducting further research with them | Research Data Archiving[20] |
Data Stewardship bodies, such as Data Trusts or Data Donation Platforms | Researchers and Research Institutions; Government Agencies; Private Companies or Corporations | Individuals and groups share their data with others according to their interests, as specified to and protected by a trusted fiduciary actor | Data Trusts, Data Donation |
[1] HHS Office of Research Integrity, ORI Introduction to RCR. https://ori.hhs.gov/content/Chapter-6-Data-Management-Practices-Data-sharing
[2] H.R.4174 – 115th Congress (2017-2018): Foundations for Evidence-Based Policymaking Act of 2018. (2019, January 14). https://www.congress.gov/bill/115th-congress/house-bill/4174
[3] “The Biden Administration Launches the National Artificial Intelligence Research Resource Task Force”. https://www.whitehouse.gov/ostp/news-updates/2021/06/10/the-biden-administration-launches-the-national-artificial-intelligence-research-resource-task-force/
[4] Goroff, Daniel, Jules Polonetsky, and Omer Tene. (2018). Privacy Protective Research: Facilitating Ethically Responsible Access to Administrative Data. The ANNALS of the American Academy of Political and Social Science, Vol 675, Issue 1, pp. 46-66. https://doi.org/10.1177/0002716217742605.
Harris, Leslie and Chinmayi Sharma. (2017). Understanding Corporate Data Sharing Decisions: Practices, Challenges, And Opportunities for Sharing Corporate Data with Researchers. Future of Privacy Forum. https://fpf.org/wp-content/uploads/2017/11/FPF_Data_Sharing_Report_FINAL.pdf.
[5] European Commission. (2021). “A European Strategy for Data” https://digital-strategy.ec.europa.eu/en/policies/strategy-data
[6] Open Data Institute. (2020). “Data Trusts in 2020”. https://theodi.org/article/data-trusts-in-2020
[7] https://admindatahandbook.mit.edu/
[8] https://obamawhitehouse.archives.gov/open#:~:text=OPEN%20DATA,more%20efficient%20and%20transparent%20government.
[9] https://fiscal.treasury.gov/data-transparency/
[10] https://www.sas.com/en_us/data-for-good.html
[11] https://fpf.org/blog/understanding-corporate-data-sharing-decisions-practices-challenges-and-opportunities-for-sharing-corporate-data-with-researchers/
[12] https://dataforgood.ca/
[13] https://www.gartner.com/smarterwithgartner/data-sharing-is-a-business-necessity-to-accelerate-digital-business
[14] https://www.fastcompany.com/90310803/here-are-the-data-brokers-quietly-buying-and-selling-your-personal-information
[15] https://www.jscdm.org/article/id/21/
[16] https://www.cdisc.org/standards/data-exchange
[17] https://www.sbir.gov/tutorials/data-rights/tutorial-2#; https://www.wsgr.com/en/insights/dod-small-business-innovation-research-sbir-contractors-data-rights-protections-extended-to-20-years-government-rights-limited-thereafter.html
[18] https://www.hhs.gov/ohrp/sachrp-committee/recommendations/attachment-b-return-individual-research-results/index.html
[19] https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html; https://osp.od.nih.gov/scientific-sharing/nih-data-management-and-sharing-activities-related-to-public-access-and-open-science/
[20] https://www.nsf.gov/sbe/ses/common/archive.jsp