More Fruitful SFT by Respecting The Learner's Distribution
Abstract
Classic supervised fine-tuning (SFT) ignores the learner. It treats supervision as universally valid, even when the training data differs substantially from what the model itself would produce, a mismatch that has proven troublesome for LLM post-training in a variety of ways. Recent work on on-policy distillation and self-distillation fine-tuning has similarly argued that effective supervision must respect the learner's own policy.
In this talk, I present two works built around that single principle: supervision should be aligned with the learner's distribution. Both implement it as a simple modification to standard SFT.
GRAPE addresses this from a data selection perspective. For each instruction, it selects the response with the highest probability under the target model from a pool of existing candidates, using only a forward pass. Models trained on GRAPE-curated data outperform multiple strong baselines while being lightweight and scalable.
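The selection rule above can be sketched in a few lines: score each candidate response by its sequence log-probability under the target model (obtainable from a single teacher-forced forward pass) and keep the argmax. The function names and toy log-probs below are illustrative, not from the paper:

```python
import math


def sequence_logprob(token_logprobs):
    """Score a whole response by summing its per-token log-probabilities,
    as produced by one teacher-forced forward pass of the target model."""
    return sum(token_logprobs)


def grape_select(candidate_logprobs):
    """GRAPE-style selection sketch: from a pool of existing candidate
    responses for one instruction, keep the response the target model
    assigns the highest probability.

    `candidate_logprobs` maps response id -> list of per-token log-probs
    under the target model (hypothetical data layout)."""
    return max(candidate_logprobs,
               key=lambda r: sequence_logprob(candidate_logprobs[r]))


# Toy pool with made-up log-probs: resp_a totals -0.6, resp_b totals -2.7,
# so the target model "prefers" resp_a.
pool = {
    "resp_a": [-0.2, -0.1, -0.3],
    "resp_b": [-1.0, -0.9, -0.8],
}
best = grape_select(pool)  # -> "resp_a"
```

Because scoring needs only forward passes (no generation or backprop), this kind of selection stays cheap enough to run over large instruction pools.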
PEAR extends this idea to the setting where SFT is followed by online RL. We first show that stronger SFT checkpoints can paradoxically underperform weaker ones after RL, because standard SFT optimizes for offline performance in isolation, without accounting for the on-policy distribution that RL will later explore. PEAR addresses this by reweighting the loss on each response according to its importance weight: how likely the target policy is to produce that response. We further show that this correction can operate at finer granularities, reweighting individual tokens based on how likely the continuation from that point in the offline data would be under the target policy. This importance-sampling correction, inspired by off-policy evaluation in RL, bridges the gap between the static SFT dataset and the dynamic on-policy distribution, yielding consistent post-RL gains.
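At the response level, the reweighting idea can be sketched as follows. The abstract does not specify the exact functional form of the weights, so this sketch makes one concrete assumption: sequence log-probabilities under the target policy are turned into normalized weights via a softmax, and each example's SFT loss is scaled by its weight. All names and the temperature knob are hypothetical:

```python
import math


def pear_weights(target_logprobs, temperature=1.0):
    """Turn sequence log-probs under the target policy into normalized
    loss weights (softmax over the batch; the temperature is an assumed
    knob, not something stated in the abstract)."""
    scaled = [lp / temperature for lp in target_logprobs]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def reweighted_sft_loss(per_example_nll, target_logprobs):
    """Importance-weighted SFT loss sketch: responses the target policy
    finds more probable contribute more to the training objective."""
    ws = pear_weights(target_logprobs)
    return sum(w * nll for w, nll in zip(ws, per_example_nll))
```

The token-level variant described above would apply the same idea per position, weighting each token's loss by how likely the remaining continuation is under the target policy, rather than assigning one weight per response.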
Both methods operationalize the same insight, that effective supervision must be shaped by the learner's own distribution, through complementary mechanisms: GRAPE by selecting the responses the model trains on, PEAR by reweighting how much it learns from each. Together, they demonstrate that simple, policy-aware corrections can improve the effectiveness of SFT.
Bio
Dylan Zhang is a Ph.D. student in Computer Science at the University of Illinois Urbana-Champaign (UIUC), advised by Prof. Hao Peng. His research focuses on large language model (LLM) post-training, particularly on developing offline training algorithms for efficient and effective model alignment. More broadly, he is interested in understanding the behavior, generalization, and inductive biases of large language models: how they learn from data, adapt through supervision, and exhibit emergent capabilities.