Date Lecture Readings Logistics
Tue 01/14/25 Lecture #1:
  • Course Introduction
  • Logistics
[ slides ]

Thu 01/16/25 Lecture #2:
  • Word embeddings and vector semantics
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 6

Tue 01/21/25 Lecture #3:
  • Word embeddings and vector semantics (cont.)
  • Sparse representations
  • Dense representations
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 6
Optional readings:
  • Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013) [link]
  • Efficient Estimation of Word Representations in Vector Space (Mikolov et al., 2013) [link]
  • word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method (Goldberg and Levy, 2014) [link]
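
A toy illustration of the dense representations covered here: word similarity as cosine similarity between embedding vectors (the vectors below are made up for illustration; a real model would learn them from data):

```python
import math

def cosine(u, v):
    # cosine similarity: dot(u, v) / (||u|| * ||v||)
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy 4-dimensional "dense" embeddings (illustrative values only).
king  = [0.8, 0.3, 0.1, 0.6]
queen = [0.7, 0.4, 0.2, 0.6]
apple = [0.1, 0.9, 0.8, 0.0]

print(cosine(king, queen) > cosine(king, apple))  # related words score higher
```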

Thu 01/23/25 Lecture #4:
  • Deriving the gradient of Word2vec
  • Evaluation of word embeddings
[ slides (annotated) ]
Main readings:
  • Jurafsky & Martin Chapter 6
  • Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013) [link]
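
For reference, the gradient derived in this lecture is that of the skip-gram negative-sampling loss for one (center word c, outside word o) pair with K negative samples, in the usual notation (v for center-word vectors, u for outside-word vectors, sigma the logistic function):

```latex
J = -\log \sigma(u_o^\top v_c) - \sum_{k=1}^{K} \log \sigma(-u_k^\top v_c)

\frac{\partial J}{\partial v_c}
  = -\bigl(1 - \sigma(u_o^\top v_c)\bigr)\, u_o
    + \sum_{k=1}^{K} \sigma(u_k^\top v_c)\, u_k

\frac{\partial J}{\partial u_o} = -\bigl(1 - \sigma(u_o^\top v_c)\bigr)\, v_c,
\qquad
\frac{\partial J}{\partial u_k} = \sigma(u_k^\top v_c)\, v_c
```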

HW1 out

Thu 01/30/25 Lecture #5:
  • N-Gram Language Models
  • Smoothing
  • Evaluation of Language Models
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 7
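
A minimal sketch of the ideas in this lecture: a bigram model with add-one (Laplace) smoothing, evaluated by perplexity on a toy corpus (corpus and code are illustrative only):

```python
import math
from collections import Counter

corpus = ["<s> the cat sat </s>".split(),
          "<s> the dog sat </s>".split()]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))
V = len(unigrams)  # vocabulary size

def p_addone(w, prev):
    # add-one (Laplace) smoothed bigram probability P(w | prev)
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

def perplexity(sent):
    # exp of the average negative log-probability per predicted token
    logp = sum(math.log(p_addone(w, prev)) for prev, w in zip(sent, sent[1:]))
    return math.exp(-logp / (len(sent) - 1))

print(perplexity("<s> the cat sat </s>".split()))
```

Note that the smoothed distribution still sums to one over the vocabulary for any history, which is the invariant smoothing must preserve.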

Fri 01/31/25 Lecture #6:
  • Neural network basics
  • Autograd
[ slides ]
Main readings:
  • The Matrix Calculus You Need For Deep Learning (Terence Parr and Jeremy Howard) [link]
  • The Little Book of Deep Learning (François Fleuret), Chapters 3 and 4
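
A minimal sketch of the reverse-mode autodiff idea covered in this lecture (a scalar-valued, micrograd-style toy, not a production autograd):

```python
# Minimal scalar reverse-mode autograd: each operation records how to push
# gradients back to its inputs; backward() applies the chain rule in
# reverse topological order.
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents
        self.grad_fn = None  # propagates self.grad to parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def grad_fn():
            self.grad += out.grad
            other.grad += out.grad
        out.grad_fn = grad_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def grad_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out.grad_fn = grad_fn
        return out

    def backward(self):
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v.parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v.grad_fn:
                v.grad_fn()

x = Value(3.0)
y = x * x + x          # d/dx (x^2 + x) = 2x + 1 = 7 at x = 3
y.backward()
print(x.grad)          # 7.0
```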

Tue 02/04/25 Lecture #7:
  • Autograd
  • Building blocks of Deep Learning for NLP
  • CNNs
[ slides ]
Main readings:
  • Goldberg Chapter 9

Thu 02/06/25 Lecture #8:
  • CNNs (cont.)
  • RNNs
  • Task specific neural network architectures
  • Machine translation
[ slides ]
Main readings:
  • Understanding LSTM Networks (Christopher Olah) [link]
  • Eisenstein, Chapter 18
Optional readings:
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
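
As a compact reference alongside the Olah reading, the LSTM cell discussed here computes the following (W, U, b are learned parameters, sigma the logistic function, and the circled dot elementwise product):

```latex
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t = o_t \odot \tanh(c_t)
```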

Tue 02/11/25 Lecture #9:
  • RNNs (cont.)
  • Training sequence models
  • Machine translation (cont.)
[ slides ]
Main readings:
  • Statistical Machine Translation (Koehn) [link]
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
  • Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015) [link]
  • Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • Illustrated Transformer [link]

Project teams due on 02/09

Thu 02/13/25 Lecture #10:
  • Attention
  • Transformers
[ slides (annotated) ]
Main readings:
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
  • Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015) [link]
  • Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • Illustrated Transformer [link]
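
The core operation in the readings above can be sketched in a few lines of NumPy: scaled dot-product attention as in Vaswani et al. (2017), with shapes chosen arbitrarily for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries, d_k = 4
K = rng.normal(size=(3, 4))   # 3 keys
V = rng.normal(size=(3, 4))   # 3 values
out, w = attention(Q, K, V)
print(out.shape)              # each query gets a weighted mix of the 3 values
```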

Tue 02/18/25 Lecture #11:
  • Transformers (cont.)
  • Language modeling with Transformers
[ slides ]
Main readings:
  • Illustrated Transformer [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Annotated Transformer (Harvard NLP) [link]
  • Language Models are Unsupervised Multitask Learners (GPT-2; Radford et al., 2019) [link]

HW1 due / HW2 out

Thu 02/20/25 Lecture #12:
  • Pre-training and transfer learning
  • Objective functions for pre-training
  • Model architectures
  • ELMo, BERT, GPT, T5
Main readings:
  • The Illustrated BERT, ELMo, and co. (Jay Alammar) [link]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018) [link]
  • Language Models are Unsupervised Multitask Learners (GPT-2; Radford et al., 2019) [link]
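
A toy sketch of the masked-language-modeling objective from the BERT reading: select roughly 15% of positions as prediction targets, and of those replace 80% with [MASK], 10% with a random token, and leave 10% unchanged (function name and corpus here are illustrative, not from the paper's code):

```python
import random

def mlm_mask(tokens, vocab, mask_rate=0.15, seed=0):
    # BERT-style corruption: labels[i] is the original token at positions
    # chosen as prediction targets, and None elsewhere.
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                inputs[i] = "[MASK]"       # 80%: mask
            elif r < 0.9:
                inputs[i] = rng.choice(vocab)  # 10%: random token
            # else: 10%: keep the original token
    return inputs, labels

toks = "the quick brown fox jumps over the lazy dog".split()
inp, lab = mlm_mask(toks, vocab=toks, mask_rate=0.3)
print(inp)
print(lab)
```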

Tue 02/25/25 Lecture #13:
  • Transfer learning (cont.)
  • Encoder-decoder pretrained models
  • Architecture and pretraining objectives
Main readings:
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2020) [link]
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (Lewis et al., 2019) [link]
  • What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? (Wang et al., 2022) [link]

Thu 02/27/25 Midterm Exam 1

Tue 03/04/25 Lecture #14:
  • Decoding and generation
  • Large language models and impact of scale
  • In-context learning and prompting
Main readings:
  • The Curious Case of Neural Text Degeneration (Holtzman et al., 2019) [link]
  • How to generate text: using different decoding methods for language generation with Transformers [link]
  • Scaling Laws for Neural Language Models (Kaplan et al., 2020) [link]
  • Training Compute-Optimal Large Language Models (Hoffmann et al., 2022) [link]
  • Language Models are Few-Shot Learners (GPT-3; Brown et al., 2020) [link]
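
A toy sketch of nucleus (top-p) sampling from the Holtzman et al. reading: keep the smallest set of highest-probability tokens whose cumulative mass reaches p, then renormalize before sampling (the example distribution is made up):

```python
def top_p_filter(probs, p=0.9):
    # Nucleus (top-p) filtering: truncate the distribution to the smallest
    # high-probability head whose cumulative mass reaches p, then renormalize.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for tok, pr in ranked:
        kept.append((tok, pr))
        total += pr
        if total >= p:
            break
    return {tok: pr / total for tok, pr in kept}

dist = {"the": 0.5, "a": 0.3, "zebra": 0.15, "qua": 0.05}
print(top_p_filter(dist, p=0.9))  # low-probability tail is dropped
```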

Project proposals due on 03/01

Thu 03/06/25 Lecture #15:
  • Language Models for Code
Guest lecturer:
Valerie Chen, Carnegie Mellon University

HW2 due

03/07/25 - 03/24/25 Spring recess - No classes

Tue 03/25/25 Lecture #16:
  • Learning from Instructions
  • Few-shot Learning
Guest lecturer:
Kyle Lo, Allen Institute for AI (Ai2)
Main readings:
  • Language Models are Few-Shot Learners (Brown et al., 2020) [link]
  • Finetuned Language Models Are Zero-Shot Learners (Wei et al., 2022) [link]
  • Multitask Prompted Training Enables Zero-Shot Task Generalization (Sanh et al., 2021) [link]
  • Scaling Instruction-Finetuned Language Models (Chung et al., 2022) [link]
  • Are Emergent Abilities of Large Language Models a Mirage? (Schaeffer et al., 2023) [link]
  • Emergent Abilities of Large Language Models (Wei et al., 2022) [link]

Thu 03/27/25 Lecture #17:
  • Post-training
  • Reinforcement Learning from Human Feedback
  • Alignment
Main readings:
  • Training language models to follow instructions with human feedback (Ouyang et al., 2022) [link]
  • Fine-Tuning Language Models from Human Preferences (Ziegler et al., 2019) [link]
  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023) [link]
  • RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (Lee et al., 2023) [link]
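
For reference alongside the Rafailov et al. reading, the DPO objective, where y_w is the preferred and y_l the dispreferred response, pi_ref is the frozen reference policy, and beta is a scaling hyperparameter:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[
\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)\right]
```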

Tue 04/01/25 Lecture #18:
  • Post-training (cont.)

HW3 out

Thu 04/03/25 Lecture #19:
  • Evaluation of natural language generation systems
  • LLM evaluations
Guest lecturer:
Yixin Liu, Yale University

Tue 04/08/25 Lecture #20:
  • Agent-based systems

Thu 04/10/25 Midterm Exam 2

Tue 04/15/25 Lecture #21:
  • Parameter-efficient Fine-Tuning

Thu 04/17/25 Lecture #22:
  • Project presentations session 1

Final project presentations session 1

Tue 04/22/25 Lecture #23:
  • Project presentations session 2

Final project presentations session 2

Thu 04/24/25 Lecture #24:
  • Safety
  • Noncompliance
Guest lecturer:
TBD

HW3 due, Final project reports due on 5/1