Date Lecture Readings Logistics
Tue 01/13/26 Lecture #1:
  • Course Introduction
  • Logistics
[ slides ]

Thu 01/15/26 Lecture #2:
  • Word embeddings and vector semantics
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 6

Tue 01/20/26 Lecture #3:
  • Word embeddings and vector semantics (cont.)
  • Sparse representations
  • Dense representations
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 6
Optional readings:
  • Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013) [link]
  • Efficient Estimation of Word Representations in Vector Space (Mikolov et al., 2013) [link]
  • word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method (Goldberg and Levy, 2014) [link]

HW 1 out

Thu 01/22/26 Lecture #4:
  • Deriving the gradient of Word2vec
  • Evaluation of word embeddings
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 6
  • Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013) [link]
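
Illustrative sketch of the gradient derivation covered in this lecture: the skip-gram negative-sampling loss and its gradients for one (center, context) pair, written out in NumPy. This is not course-provided code; the vector names, dimensions, and random inputs are made up for demonstration.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sgns_loss_and_grads(v_c, u_o, U_neg):
        """Skip-gram negative-sampling loss for one (center, context) pair.

        v_c   : (d,)   center-word vector
        u_o   : (d,)   positive context-word vector
        U_neg : (k, d) sampled negative context-word vectors
        """
        pos = sigmoid(u_o @ v_c)            # sigma(u_o . v_c)
        neg = sigmoid(U_neg @ v_c)          # sigma(u_k . v_c) for each negative
        loss = -np.log(pos) - np.sum(np.log(1.0 - neg))

        grad_v_c = (pos - 1.0) * u_o + U_neg.T @ neg   # dL/dv_c
        grad_u_o = (pos - 1.0) * v_c                   # dL/du_o
        grad_U_neg = np.outer(neg, v_c)                # dL/du_k, one row per negative
        return loss, grad_v_c, grad_u_o, grad_U_neg

    # Tiny usage example with random vectors
    rng = np.random.default_rng(0)
    loss, g_v, g_o, g_neg = sgns_loss_and_grads(
        rng.normal(size=8), rng.normal(size=8), rng.normal(size=(5, 8)))
    print(loss)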

Tue 01/27/26 Lecture #5:
  • N-Gram Language Models
  • Smoothing
  • Evaluation of Language Models
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 7
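
A minimal sketch of the n-gram topics above: a bigram language model with add-k smoothing and a perplexity computation. Not course code; the toy corpus and the choice of k = 1 are illustrative.

    import math
    from collections import Counter

    def train_bigram_lm(sentences):
        """Count unigrams and bigrams over tokenized sentences with <s>/</s> markers."""
        unigrams, bigrams = Counter(), Counter()
        for sent in sentences:
            toks = ["<s>"] + sent + ["</s>"]
            unigrams.update(toks)
            bigrams.update(zip(toks, toks[1:]))
        return unigrams, bigrams, set(unigrams)

    def bigram_prob(w_prev, w, unigrams, bigrams, vocab, k=1.0):
        """Add-k smoothed bigram probability P(w | w_prev)."""
        return (bigrams[(w_prev, w)] + k) / (unigrams[w_prev] + k * len(vocab))

    def perplexity(sentences, unigrams, bigrams, vocab, k=1.0):
        """Perplexity = exp(-average log-probability per predicted token)."""
        log_prob, n_tokens = 0.0, 0
        for sent in sentences:
            toks = ["<s>"] + sent + ["</s>"]
            for w_prev, w in zip(toks, toks[1:]):
                log_prob += math.log(bigram_prob(w_prev, w, unigrams, bigrams, vocab, k))
                n_tokens += 1
        return math.exp(-log_prob / n_tokens)

    train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    uni, bi, vocab = train_bigram_lm(train)
    print(perplexity([["the", "cat", "sat"]], uni, bi, vocab))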

Thu 01/29/26 Lecture #6:
  • Neural network basics
  • Autograd
[ slides ]
Main readings:
  • The Matrix Calculus You Need For Deep Learning (Terence Parr and Jeremy Howard) [link]
  • The Little Book of Deep Learning (François Fleuret) - Ch. 3, 4
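
To make the autograd topic concrete, here is a minimal, micrograd-style sketch of reverse-mode automatic differentiation over scalars. It is illustrative only, not the framework used in the course; the class and method names are made up.

    class Value:
        """Minimal scalar autograd node: stores a value, its gradient, and a backward rule."""
        def __init__(self, data, parents=()):
            self.data, self.grad = data, 0.0
            self._parents, self._backward = parents, lambda: None

        def __add__(self, other):
            out = Value(self.data + other.data, (self, other))
            def backward():
                self.grad += out.grad           # d(a+b)/da = 1
                other.grad += out.grad          # d(a+b)/db = 1
            out._backward = backward
            return out

        def __mul__(self, other):
            out = Value(self.data * other.data, (self, other))
            def backward():
                self.grad += other.data * out.grad   # d(a*b)/da = b
                other.grad += self.data * out.grad   # d(a*b)/db = a
            out._backward = backward
            return out

        def backward(self):
            # Reverse mode: visit nodes in reverse topological order, applying the chain rule.
            order, seen = [], set()
            def visit(v):
                if v not in seen:
                    seen.add(v)
                    for p in v._parents:
                        visit(p)
                    order.append(v)
            visit(self)
            self.grad = 1.0
            for v in reversed(order):
                v._backward()

    x, w, b = Value(2.0), Value(-3.0), Value(1.0)
    y = x * w + b          # y = w*x + b
    y.backward()
    print(x.grad, w.grad)  # dy/dx = w = -3.0, dy/dw = x = 2.0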

Tue 02/03/26 Lecture #7:
  • Autograd
  • Building blocks of Deep Learning for Language Modeling
  • CNNs
[ slides ]
Main readings:
  • Goldberg Chapter 9

Thu 02/05/26 Lecture #8:
  • RNNs
  • Task specific neural network architectures
  • Training RNNs
  • Machine translation
  • Attention
[ slides ]
Main readings:
  • Understanding LSTM Networks (Christopher Olah) [link]
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
  • Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015) [link]
  • Statistical Machine Translation (Koehn) [link]

Tue 02/10/26 Lecture #9:
  • Transformers
[ slides ]
Main readings:
  • Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Illustrated Transformer (Jay Alammar) [link]

Project teams due on 02/09

HW 1 due 02/10

Thu 02/12/26 Lecture #10:
  • Attention
  • Transformers
[ slides ]
Main readings:
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
  • Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015) [link]
  • Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Illustrated Transformer (Jay Alammar) [link]
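
A minimal NumPy sketch of the scaled dot-product attention at the core of these two lectures, as described in the Vaswani et al. reading. Shapes and variable names are illustrative; this is not course-provided code.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

        Q : (n_q, d_k) queries
        K : (n_k, d_k) keys
        V : (n_k, d_v) values
        """
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarity scores
        scores -= scores.max(axis=-1, keepdims=True)     # numerical stability for softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)   # attention weights, rows sum to 1
        return weights @ V                               # weighted average of values

    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(3, 4)), rng.normal(size=(5, 4)), rng.normal(size=(5, 6))
    print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 6)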

02/14
HW 2 out

Tue 02/17/26 Lecture #11:
  • Tokenization
  • Language modeling with Transformers
  • Early language models
  • Transfer Learning
[ slides ]
Main readings:
  • The Illustrated Transformer (Jay Alammar) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Annotated Transformer (Harvard NLP) [link]
  • GPT-2: Language Models are Unsupervised Multitask Learners (Radford et al., 2019) [link]
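
For the tokenization topic, a toy sketch of byte-pair-encoding merge learning over a small word-frequency table. It is illustrative only (the word counts are invented) and omits the details of real subword tokenizers.

    from collections import Counter

    def learn_bpe_merges(words, n_merges):
        """Learn BPE merges from a word-frequency dict (toy version)."""
        vocab = {tuple(w) + ("</w>",): c for w, c in words.items()}   # words as symbol tuples
        merges = []
        for _ in range(n_merges):
            pairs = Counter()
            for symbols, count in vocab.items():
                for pair in zip(symbols, symbols[1:]):
                    pairs[pair] += count
            if not pairs:
                break
            best = max(pairs, key=pairs.get)               # most frequent adjacent pair
            merges.append(best)
            merged = {}
            for symbols, count in vocab.items():
                out, i = [], 0
                while i < len(symbols):
                    if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                        out.append(symbols[i] + symbols[i + 1])   # merge the pair
                        i += 2
                    else:
                        out.append(symbols[i])
                        i += 1
                merged[tuple(out)] = count
            vocab = merged
        return merges

    print(learn_bpe_merges({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 4))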

Thu 02/19/26 Lecture #12:
  • Transfer Learning (contd.)
  • Objective functions for pre-training
  • Encoder-decoder pretrained models
  • Architecture and pretraining objectives
[ slides ]
Main readings:
  • The Illustrated BERT, ELMo, and co. (Jay Alammar) [link]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018) [link]
  • GPT-2: Language Models are Unsupervised Multitask Learners (Radford et al., 2019) [link]
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2020) [link]
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (Lewis et al., 2019) [link]
  • What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? (Wang et al., 2022) [link]

Tue 02/24/26 Lecture #13:
  • Decoding and generation
  • Large language models and impact of scale
  • In-context learning and prompting
[ slides ]
Main readings:
  • The Curious Case of Neural Text Degeneration (Holtzman et al., 2019) [link]
  • How to generate text: using different decoding methods for language generation with Transformers (Hugging Face blog) [link]
  • Scaling Laws for Neural Language Models (Kaplan et al., 2020) [link]
  • Training Compute-Optimal Large Language Models (Hoffmann et al., 2022) [link]
  • GPT-3: Language Models are Few-Shot Learners (Brown et al., 2020) [link]
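
A minimal sketch of two decoding ideas from this lecture and the Holtzman et al. reading: temperature scaling and nucleus (top-p) sampling over a toy vocabulary. Not course code; the logits and vocabulary are invented.

    import numpy as np

    def sample_top_p(logits, temperature=0.8, top_p=0.9, rng=None):
        """Sample a token id using temperature scaling and nucleus (top-p) filtering."""
        rng = rng or np.random.default_rng()
        logits = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()

        order = np.argsort(probs)[::-1]                  # most to least probable
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1  # smallest nucleus covering top_p mass
        nucleus = order[:cutoff]

        nucleus_probs = probs[nucleus] / probs[nucleus].sum()
        return int(rng.choice(nucleus, p=nucleus_probs))

    vocab = ["the", "a", "cat", "dog", "sat"]
    logits = [2.0, 1.5, 0.3, 0.2, -1.0]
    print(vocab[sample_top_p(logits, rng=np.random.default_rng(0))])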

Thu 02/26/26 Midterm Exam 1

Tue 03/03/26 Lecture #14:
  • Post-training
  • Supervised Finetuning
  • Instruction Following
[ slides ]
Main readings:
  • Multitask Prompted Training Enables Zero-Shot Task Generalization (Sanh et al., 2021) [link]
  • Scaling Instruction-Finetuned Language Models (Chung et al., 2022) [link]
  • Are Emergent Abilities of Large Language Models a Mirage? (Schaeffer et al., 2023) [link]
  • Emergent Abilities of Large Language Models (Wei et al., 2022) [link]

HW 2 due 03/04

Thu 03/05/26 Lecture #15:
  • From Assistants to Collaborators: Building Agents for Long-Form, Open-ended Collaboration
Guest lecturer:
Shannon Zejiang Shen, MIT
Talk abstract:
The ever-increasing capabilities of language models call for rethinking how we build agents: moving from assistants that passively help with simple, well-defined tasks to collaborators that proactively support users in long-form, exploratory work, and that continue learning alongside them. In this talk, I will present a systematic view of what it takes to build collaborator agents, organized around three questions. First, what is the right objective for building collaborator agents, and how should we evaluate whether collaboration succeeds? Second, how do we train agents to operate in settings where goals are open-ended and evolve over time? And third, what architectural improvements to language models are needed to make collaboration practical? I will share lessons learned from our recent projects in this direction, including training deep research agents for open-ended long-form generation with evolving rubrics, and enabling interleaved latent and text chain of thought for more efficient long-form reasoning in language models.

Project proposals due 03/06

03/06/26 - 03/22/26 Spring recess - No classes

Tue 03/24/26 Lecture #16:
  • Guest Lecture 2: TBD

Thu 03/26/26 Lecture #17:
  • Post-training
  • Reinforcement learning from Human Feedback
  • Alignment
Main readings:
  • Training language models to follow instructions with human feedback (Ouyang et al., 2022) [link]
  • Fine-Tuning Language Models from Human Preferences (Ziegler et al., 2019) [link]
  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023) [link]
  • RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (Lee et al., 2023) [link]
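
A minimal sketch of the DPO objective from the Rafailov et al. reading, computed for a single (chosen, rejected) preference pair from summed token log-probabilities. The numbers and the beta value are illustrative, not from the paper or the course.

    import numpy as np

    def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
        """Direct Preference Optimization loss for one preference pair.

        Each argument is the summed token log-probability of a full response
        under the policy (logp_*) or the frozen reference model (ref_logp_*).
        """
        # Implicit reward margin: how much more the policy favors the chosen response
        # over the reference, compared with the same quantity for the rejected response.
        margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
        return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))   # -log sigmoid(beta * margin)

    # Toy numbers: the policy already prefers the chosen response slightly more than the reference does.
    print(dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                   ref_logp_chosen=-13.0, ref_logp_rejected=-14.5))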

Tue 03/31/26 Lecture #18:
  • Post-training (contd.)

HW 3 out

Thu 04/02/26 Lecture #19:
  • Guest Lecture 3
Guest lecturer:
Orion Weller, Johns Hopkins

Tue 04/07/26 Lecture #20:
  • Retrieval Augmented Generation (RAG)

Thu 04/09/26 Midterm Exam 2

Tue 04/14/26 Lecture #21:
  • Guest Lecture 4: TBD

Thu 04/16/26 Lecture #22:
  • Retrieval Augmented Generation (cont.)
  • Introduction to agent-based systems

Tue 04/21/26 Lecture #23:
  • Project presentations session 1

Final project presentations

Thu 04/23/26 Lecture #24:
  • Project presentations session 2

Final project presentations
HW 3 due on 04/27
Final project reports due on 04/30