Date Lecture Readings Logistics
Tue 01/14/25 Lecture #1:
  • Course Introduction
  • Logistics
[ slides ]

Thu 01/16/25 Lecture #2:
  • Word embeddings and vector semantics
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 6

Tue 01/21/25 Lecture #3:
  • Word embeddings and vector semantics (cont.)
  • Sparse representations
  • Dense representations
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 6
Optional readings:
  • Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013) [link]
  • Efficient Estimation of Word Representations in Vector Space (Mikolov et al., 2013) [link]
  • word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method (Goldberg and Levy, 2014) [link]
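
A toy illustration of the dense representations covered here: word similarity as cosine similarity between embedding vectors (the vectors below are made up for illustration; a real model would learn them from data):

```python
import math

def cosine(u, v):
    # cosine similarity: dot(u, v) / (||u|| * ||v||)
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy 4-dimensional "dense" embeddings (illustrative values only).
king  = [0.8, 0.3, 0.1, 0.6]
queen = [0.7, 0.4, 0.2, 0.6]
apple = [0.1, 0.9, 0.8, 0.0]

print(cosine(king, queen) > cosine(king, apple))  # related words score higher
```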

Thu 01/23/25 Lecture #4:
  • Deriving the gradient of Word2vec
  • Evaluation of word embeddings
[ slides (annotated) ]
Main readings:
  • Jurafsky & Martin Chapter 6
  • Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013) [link]
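
For reference, the gradient derived in this lecture is that of the skip-gram negative-sampling loss for one (center word c, outside word o) pair with K negative samples, in the usual notation (v for center-word vectors, u for outside-word vectors, sigma the logistic function):

```latex
J = -\log \sigma(u_o^\top v_c) - \sum_{k=1}^{K} \log \sigma(-u_k^\top v_c)

\frac{\partial J}{\partial v_c}
  = -\bigl(1 - \sigma(u_o^\top v_c)\bigr)\, u_o
    + \sum_{k=1}^{K} \sigma(u_k^\top v_c)\, u_k

\frac{\partial J}{\partial u_o} = -\bigl(1 - \sigma(u_o^\top v_c)\bigr)\, v_c,
\qquad
\frac{\partial J}{\partial u_k} = \sigma(u_k^\top v_c)\, v_c
```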

HW1 out

Thu 01/30/25 Lecture #5:
  • N-Gram Language Models
  • Smoothing
  • Evaluation of Language Models
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 7
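
A minimal sketch of the ideas in this lecture: a bigram model with add-one (Laplace) smoothing, evaluated by perplexity on a toy corpus (corpus and code are illustrative only):

```python
import math
from collections import Counter

corpus = ["<s> the cat sat </s>".split(),
          "<s> the dog sat </s>".split()]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))
V = len(unigrams)  # vocabulary size

def p_addone(w, prev):
    # add-one (Laplace) smoothed bigram probability P(w | prev)
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

def perplexity(sent):
    # exp of the average negative log-probability per predicted token
    logp = sum(math.log(p_addone(w, prev)) for prev, w in zip(sent, sent[1:]))
    return math.exp(-logp / (len(sent) - 1))

print(perplexity("<s> the cat sat </s>".split()))
```

Note that the smoothed distribution still sums to one over the vocabulary for any history, which is the invariant smoothing must preserve.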

Fri 01/31/25 Lecture #6:
  • Neural network basics
  • Autograd
[ slides ]
Main readings:
  • The Matrix Calculus You Need For Deep Learning (Terence Parr and Jeremy Howard) [link]
  • The Little Book of Deep Learning (François Fleuret), Chapters 3 and 4
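
A minimal sketch of the reverse-mode autodiff idea covered in this lecture (a scalar-valued, micrograd-style toy, not a production autograd):

```python
# Minimal scalar reverse-mode autograd: each operation records how to push
# gradients back to its inputs; backward() applies the chain rule in
# reverse topological order.
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents
        self.grad_fn = None  # propagates self.grad to parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def grad_fn():
            self.grad += out.grad
            other.grad += out.grad
        out.grad_fn = grad_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def grad_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out.grad_fn = grad_fn
        return out

    def backward(self):
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v.parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v.grad_fn:
                v.grad_fn()

x = Value(3.0)
y = x * x + x          # d/dx (x^2 + x) = 2x + 1 = 7 at x = 3
y.backward()
print(x.grad)          # 7.0
```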

Tue 02/04/25 Lecture #7:
  • Autograd
  • Building blocks of Deep Learning for NLP
  • CNNs
[ slides ]
Main readings:
  • Goldberg Chapter 9

Thu 02/06/25 Lecture #8:
  • CNNs (cont.)
  • RNNs
  • Task specific neural network architectures
  • Machine translation
[ slides ]
Main readings:
  • Understanding LSTM Networks (Christopher Olah) [link]
  • Eisenstein, Chapter 18
Optional readings:
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
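
As a compact reference alongside the Olah reading, the LSTM cell discussed here computes the following (W, U, b are learned parameters, sigma the logistic function, and the circled dot elementwise product):

```latex
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t = o_t \odot \tanh(c_t)
```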

Tue 02/11/25 Lecture #9:
  • RNNs (cont.)
  • Training sequence models
  • Machine translation (cont.)
[ slides ]
Main readings:
  • Statistical Machine Translation (Koehn) [link]
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
  • Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015) [link]
  • Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • Illustrated Transformer [link]

Project teams due on 02/09

Thu 02/13/25 Lecture #10:
  • Attention
  • Transformers
[ slides (annotated) ]
Main readings:
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
  • Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015) [link]
  • Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • Illustrated Transformer [link]
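
The core operation in the readings above can be sketched in a few lines of NumPy: scaled dot-product attention as in Vaswani et al. (2017), with shapes chosen arbitrarily for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries, d_k = 4
K = rng.normal(size=(3, 4))   # 3 keys
V = rng.normal(size=(3, 4))   # 3 values
out, w = attention(Q, K, V)
print(out.shape)              # each query gets a weighted mix of the 3 values
```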

Tue 02/18/25 Lecture #11:
  • Transformers (cont.)
  • Language modeling with Transformers
[ slides ]
Main readings:
  • Illustrated Transformer [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Annotated Transformer (Harvard NLP) [link]
  • Language Models are Unsupervised Multitask Learners (GPT-2; Radford et al., 2019) [link]

HW1 due / HW2 out

Thu 02/20/25 Lecture #12:
  • Pre-training and transfer learning
  • Objective functions for pre-training
  • Model architectures
  • ELMo, BERT, GPT, T5
Main readings:
  • The Illustrated BERT, ELMo, and co. (Jay Alammar) [link]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018) [link]
  • Language Models are Unsupervised Multitask Learners (GPT-2; Radford et al., 2019) [link]
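
A toy sketch of the masked-language-modeling objective from the BERT reading: select roughly 15% of positions as prediction targets, and of those replace 80% with [MASK], 10% with a random token, and leave 10% unchanged (function name and corpus here are illustrative, not from the paper's code):

```python
import random

def mlm_mask(tokens, vocab, mask_rate=0.15, seed=0):
    # BERT-style corruption: labels[i] is the original token at positions
    # chosen as prediction targets, and None elsewhere.
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                inputs[i] = "[MASK]"       # 80%: mask
            elif r < 0.9:
                inputs[i] = rng.choice(vocab)  # 10%: random token
            # else: 10%: keep the original token
    return inputs, labels

toks = "the quick brown fox jumps over the lazy dog".split()
inp, lab = mlm_mask(toks, vocab=toks, mask_rate=0.3)
print(inp)
print(lab)
```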

Tue 02/25/25 Lecture #13:
  • Transfer learning (cont.)
  • Encoder-decoder pretrained models
  • Architecture and pretraining objectives
Main readings:
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2020) [link]
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (Lewis et al., 2019) [link]
  • What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? (Wang et al., 2022) [link]

Thu 02/27/25 Midterm Exam 1

Tue 03/04/25 Lecture #14:
  • Decoding and generation
  • Large language models and impact of scale
  • In-context learning and prompting
Main readings:
  • The Curious Case of Neural Text Degeneration (Holtzman et al., 2019) [link]
  • How to generate text: using different decoding methods for language generation with Transformers [link]
  • Scaling Laws for Neural Language Models (Kaplan et al., 2020) [link]
  • Training Compute-Optimal Large Language Models (Hoffmann et al., 2022) [link]
  • Language Models are Few-Shot Learners (GPT-3; Brown et al., 2020) [link]
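
A toy sketch of nucleus (top-p) sampling from the Holtzman et al. reading: keep the smallest set of highest-probability tokens whose cumulative mass reaches p, then renormalize before sampling (the example distribution is made up):

```python
def top_p_filter(probs, p=0.9):
    # Nucleus (top-p) filtering: truncate the distribution to the smallest
    # high-probability head whose cumulative mass reaches p, then renormalize.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for tok, pr in ranked:
        kept.append((tok, pr))
        total += pr
        if total >= p:
            break
    return {tok: pr / total for tok, pr in kept}

dist = {"the": 0.5, "a": 0.3, "zebra": 0.15, "qua": 0.05}
print(top_p_filter(dist, p=0.9))  # low-probability tail is dropped
```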

Project proposals due on 03/01

Thu 03/06/25 Lecture #15:
  • Language Models for Code
Guest lecturer:
Valerie Chen, Carnegie Mellon University

HW2 due

03/07/25 - 03/24/25 Spring recess - No classes

Tue 03/25/25 Lecture #16:
  • Learning from Instructions
  • Few-shot Learning
Guest lecturer:
Kyle Lo, Allen Institute for AI (Ai2)
Main readings:
  • Language Models are Few-Shot Learners (Brown et al., 2020) [link]
  • Finetuned Language Models Are Zero-Shot Learners (Wei et al., 2022) [link]
  • Multitask Prompted Training Enables Zero-Shot Task Generalization (Sanh et al., 2021) [link]
  • Scaling Instruction-Finetuned Language Models (Chung et al., 2022) [link]
  • Are Emergent Abilities of Large Language Models a Mirage? (Schaeffer et al., 2023) [link]
  • Emergent Abilities of Large Language Models (Wei et al., 2022) [link]

Thu 03/27/25 Lecture #17:
  • Post-training
  • Reinforcement Learning from Human Feedback
  • Alignment
Main readings:
  • Training language models to follow instructions with human feedback (Ouyang et al., 2022) [link]
  • Fine-Tuning Language Models from Human Preferences (Ziegler et al., 2019) [link]
  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023) [link]
  • RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (Lee et al., 2023) [link]
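
For reference alongside the Rafailov et al. reading, the DPO objective, where y_w is the preferred and y_l the dispreferred response, pi_ref is the frozen reference policy, and beta is a scaling hyperparameter:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[
\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)\right]
```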

Tue 04/01/25 Lecture #18:
  • Post-training (cont.)

HW3 out

Thu 04/03/25 Lecture #19:
  • Evaluation of natural language generation systems
  • LLM evaluations
Guest lecturer:
Yixin Liu, Yale University

Tue 04/08/25 Lecture #20:
  • Agent-based systems

Thu 04/10/25 Midterm Exam 2

Tue 04/15/25 Lecture #21:
  • Parameter-efficient Fine-Tuning

Thu 04/17/25 Lecture #22:
  • Project presentations session 1

Final project presentations session 1

Tue 04/22/25 Lecture #23:
  • Project presentations session 2

Final project presentations session 2

Thu 04/24/25 Lecture #24:
  • Safety
  • Noncompliance
Guest lecturer:
TBD

HW3 due, Final project reports due on 5/1