Overcoming Hurdles to Effective Data Sharing for Researchers
In 2021, the challenges academics face in accessing corporate data sets for research, and the difficulties companies experience in making privacy-respecting research data available, made the news. With its long history of research data sharing, FPF saw an opportunity to bring together leaders from the corporate, research, and policy communities for a conversation to pave a way forward on this critical issue. We held a series of four engaging dinner-time conversations to listen to and learn from the myriad voices invested in research data sharing. Together, we explored what it will take to create a low-friction, high-efficacy, trusted, safe, ethical, and accountable environment for research data sharing.
FPF formed an expert program committee to set the agenda for the discussion series. The committee guided our selection of topics, helped identify talented experts to present their views, and introduced FPF to new and salient stakeholders in the research data sharing conversation. The four virtual dinners were held on Thursday, November 4, November 16, December 2, and December 18. Below are significant points of discussion from each event.
The Landscape of Data Sharing
During the first dinner discussion, participants emphasized the importance of reviewing research for ethical soundness and methodological rigor. Many highlighted the challenges of performing consistent and fair ethical and methodological reviews given corporate and research stakeholders’ different expectations and capabilities. FPF has explored this dynamic in the past: both companies and researchers operate with a responsibility to the public that requires technical, ethical, and organizational work to fulfill. The ability of critical stakeholders, including consumers themselves, to articulate the clear and practical steps they take to build trusted public engagement in data sharing varies widely.
Participants argued that one of the key steps toward improving public and stakeholder trust in data sharing is better education on the topic for all parties. In particular, current educational efforts should be revised and expanded to explain more intuitively data collection, stewardship, hygiene, interoperability, and the differences between companies’ and researchers’ data needs and expectations. Participants also recommended improving consumers’ digital literacy so that their consent to the collection or use of personal data can be more meaningful and dynamic.
Research Ethics and Integrity for a Data Sharing Environment
During our second dinner, two main topics emerged. First, participants pointed out how regulations and organizational rules limit the ability of institutions to oversee the ethical, technical, and administrative reviews called for in discussions of data sharing.
Second, participants homed in on data de-identification and anonymization as critical components of the ethical and technical review of proposed research uses of data. Differences in how Institutional Review Boards (IRBs) interpret research ethics regulations and norms create an inconsistent and shifting landscape for researchers and companies. Even so, the expert panelists pointed out that the variation between IRBs is not as significant as the variation between the regulatory controls for research governed by federal rules (the Common Rule) and those applied to commercial research under consumer protection laws.
Several participants advocated for a comprehensive U.S. federal data privacy law to reduce institutional variation, eliminate gaps between consumer data protections and research data protections, and clarify protections for research uses of commercial data. Closing such regulatory gaps would require educating all stakeholders, including legislators, researchers, data scientists, and companies’ data protection officers, about how the risks surrounding research data differ from the risks associated with the commercial use or breach of consumer data.
While participants recommended comprehensive privacy legislation as the ideal, they also gave serious consideration to the role that specific agency rulemaking efforts could play in this space. One potential topic for rulemaking was data anonymization. Participants considered how to reach agreement on the ethical imperative for anonymizing data. They identified several important steps toward anonymization, such as developing a more broadly accepted definition of “anonymous” that the many different parties involved in the research data sharing process could implement, and providing the technical support needed to meet expected standards of data anonymization.
The Challenges of Sharing Data from the Perspective of Corporations
During our third dinner, the discussion focused on assessing researchers’ fitness to access an organization’s data. We also discussed evaluating research projects in light of public interest expectations. There was widespread agreement that data sharing is vital for various reasons, such as promoting the next generation of scientific breakthroughs and holding companies publicly accountable. However, there was disagreement about how to ensure that data is available for research while individuals’ privacy is continuously protected.
Some asserted that companies were invoking privacy as an argument to protect their own interests, and that privacy is not as difficult a standard to meet as companies describe. Others disagreed with this assessment, saying that they always assume the worst about the efficacy of privacy protections.
There are also technical and social barriers to democratizing access to corporate data for research. Participants pointed out that technical barriers range from low bars, such as file size and format, to high ones, such as overcoming data fragmentation, having personnel with the expertise to review projects, building and maintaining shareable data, and navigating the sector-specific privacy legislation that governs what companies must do to meet existing data privacy requirements.
Social barriers were described as high bars, such as limiting access to researchers affiliated with the “right” institutions. Participants discussed how to democratize know-how sufficiently to expand corporate data sharing, and how to build and maintain the trusted network relationships that are critical for facilitating data sharing across the researcher-company environment. Consent reemerged as both a technical and a social barrier to data sharing. In particular, participants addressed the problem of securing consumers’ meaningful consent for the use of their data in unforeseen but beneficial research that may arise far in the future.
Legislation, Regulation, and Standardization in Data Sharing
During the final dinner conversation, participants tackled the challenging issues of legislation, regulation, and standardization in the research data sharing environment. There was broad agreement that there should be standards for data sharing to make the process more accessible and data more usable. Most participants agreed that data should be FAIR (findable, accessible, interoperable, and reusable) and harmonized. Still, there was disagreement over which field or institution offers a good model for this (economics, astronomy, and the U.S. Census were discussed as possibilities).
There was agreement that researchers should meet a certain standard to be given access, but that this must be done carefully to avoid creating tiers of first- and second-class researchers. The discussion highlighted the importance of shared standards, vocabulary, terminology, and expectations about the amount of data and supporting material to be transferred.
Interoperability of terms, ontologies, and expectations was another concern flagged throughout the dinner; merely making data available to researchers does not guarantee that they can use it. There was disagreement about what role the National Institute of Standards and Technology (NIST), the Federal Trade Commission (FTC), the National Science Foundation (NSF), or researchers’ professional institutions should play in enforcing these standards, or whether all of them should play a role.
Lack of access to the code used to process data is another barrier to research: without interoperability and code sharing, it is difficult to replicate experiments and make discoveries. Participants also agreed that unethical uses of data could complicate any effort to deliver positive benefits. The challenges they named include zombie data, predatory publication outlets, rogue analysts, and restrictions on access to research that may have national security implications.
Some Topics Came Up Repeatedly
Persistent topics of discussion throughout the dinners that should be addressed through future legislative or regulatory efforts included: ensuring data quality; data storage requirements (i.e., whether data resides with the firm or with a third party); the incentive structure for academics to share their data with other scholars and with companies; and the emerging role of synthetic data as a way to share valuable representations of data without transferring customers’ actual, specific, and sensitive data.
The series also tackled broader privacy questions, such as: Are there special considerations for sharing the data of children or teens (or other vulnerable or protected classes)? Is there a role for funders and publishers in more strongly requiring documentation that verifies accountability around the use of shared data? Is there a need for the Office of Research Integrity (ORI) and research misconduct investigators to be involved in supervising research data sharing?
Next Steps Toward Responsible Research Data Sharing
In the coming weeks and months, FPF will work with participants in the dinner series to consolidate the knowledge they shared into a “Playbook for Responsible Data Sharing for Research.” Developed for corporate data protection officers and their counterparts in research institutions, this playbook will cover:
- the contracting, capacity-stabilization, and accountability assurances that should govern research projects using shared data;
- managing the review of ethics and of the design of research projects using shared data while respecting research independence;
- the challenges that researchers must surmount to access and use shared data resources;
- the need for effective communication of the findings from such research projects.
We look forward to sharing the “Playbook for Responsible Data Sharing for Research” with the FPF community and our many new friends and partners from the research community in the early months of 2022. Follow FPF on LinkedIn and Twitter, and subscribe to our email list to be notified of its release.