By Shea Swauger and Hannah Babinski
Version 1.0 (Updated March 2025)
The Data Sharing For Research Tracker is a growing list of organizations that make data available for researchers. This Tracker provides information about the available data, access restrictions, and relevant links and is intended to help researchers find data for secondary analysis. It also allows organizations to raise awareness about their data sharing programs and benchmark them against what other organizations offer.
Check out these publications to learn more about why data sharing is important and how to share data for research while maintaining privacy and ethics:
Company | Project | Description | Access | Notes |
Cloudflare | Cloudflare Radar | Live and archived aggregate data views of the Internet from Cloudflare’s perspective, including traffic, attack trends, protocol adoption, outages, routing, connection characteristics, and more. | Open Data | Launched in 2020. Data is available as visualizations on the Radar site and in greater detail through the API and the Data Explorer feature. |
Ford | Autonomous Vehicle Dataset | Sensor data collected by a fleet of Ford autonomous vehicles on different days and times during 2017-2018. | Open Data | System requirements, dependencies, and instructions are available on GitHub. |
Google | Open Images | Around nine million images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives. | Open Data | Can be used to train and develop computer vision, machine learning, and AI models. |
Google | Research Datasets | 160+ datasets related to research published by Google. | Open Data | All data comes with documentation and license information. |
Google | Kaggle | Users can find and publish datasets, which supports data exploration and model building in a web-based environment. | Open Data | Google has 9 datasets of its own on Kaggle and is Kaggle's parent company. |
Google | Hugging Face | Google shares hundreds of its datasets related to machine learning, classification, benchmarking, and more. | Open Data, though some platform features may cost money. | The platform hosts AI models, applications, and datasets from anyone with an account. |
Honda | Honda Research Institute | 13 datasets on traffic scene understanding, prediction, driver modeling, motion planning, and related topics. | Non-commercial use only. Users must be affiliated with a university. | Each dataset has its own access request form. |
Johnson & Johnson | Yale University Open Data Access (YODA) Project | Anonymized clinical trial data becomes available for sharing 18 months after study completion. All research data is accessed through a VPN in a secure analysis environment and cannot be downloaded directly. | Data Intermediary: researchers must submit a Research Proposal and follow YODA’s data request process. | Researchers must be affiliated with an academic institution, research or healthcare organization, or government agency that has some form of IRB, and must be able to sign a Data Use Agreement. |
Meta | Meta Content Library | Near real-time public content from Facebook and Instagram. A web-based user interface lets researchers explore data, test search parameters, and assess whether the resulting data suits their planned research. | Individuals can apply for access to the tools through the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan. See the SOMAR Application Guidance Document. | Applicants must be affiliated with a qualified academic or research institution. Begin application here. |
Meta | Content Library API | Near real-time public content from Facebook and Instagram. Researchers can pull data programmatically from the same public content library. | Same as Meta Content Library. More eligibility information is here. | Data pulled from the API can be analyzed in a clean room environment. |
23andMe | Publication Dataset Access Program | Summary statistics from data published in genome-wide association studies (GWAS), including a subset of COVID-19 GWAS. | Submit a publication dataset access request form, a statement of work form, and a data transfer agreement. | Submitters must be associate professors at a non-profit institution. |
To add or change information, or with any questions, please contact Shea Swauger.