Data Foundations of AI

A community and resource hub for understanding, improving, and governing data in modern AI.

Science of training data · Data-centric algorithms · Data & society · Science of evaluation

Our Goal

Modern AI systems are shaped as much by their data as by their architectures and training algorithms. While data has traditionally been treated as a secondary artifact in AI research, its central role in determining model behavior, capabilities, and failure modes has become increasingly clear. Our goal is to foster an active research community that treats data as a first-class foundation of AI and works to systematically understand, optimize, and govern the data that drives AI development.

Scope

We focus on: (i) the scientific understanding of training data, (ii) data-centric algorithms, (iii) data and society (economy, governance, etc.), and (iv) the science of benchmarks and evaluation.

Get involved

We welcome participation from researchers in academia and industry at all career stages. There are many ways to get involved: join discussions in our Slack workspace, give a talk in our seminar series, suggest relevant resources, or help organize community events. Whether you're just getting started or are an active researcher in the field, we'd love to have you contribute.

Our Efforts

๐ŸŽ™๏ธ Seminar Series

Bi-weekly public talks on training data science, methods, governance, and evaluation.

Learn more →

📚 Resources

Curated tutorials, surveys, libraries, datasets, and events.

Browse resources →