Date Lecture Readings Logistics
Tue 01/14/25 Lecture #1:
  • Course Introduction
  • Logistics

Thu 01/16/25 Lecture #2:
  • Word embeddings and vector semantics

Tue 01/21/25 Lecture #3:
  • Word embeddings and vector semantics (cont.)

Thu 01/23/25 Lecture #4:
  • Basics of Neural Networks and Language Model Training
Main readings:
  • The Matrix Calculus You Need For Deep Learning (Terence Parr and Jeremy Howard) [link]
  • The Little Book of Deep Learning (François Fleuret), Chapter 3

Tue 01/28/25 Lecture #5:
  • Autograd
  • Building blocks of Neural Networks
  • Convolutional layers
  • Network layers and optimizers
Main readings:
  • The Little Book of Deep Learning (François Fleuret), Chapter 4

Thu 01/30/25 Lecture #6:
  • Building blocks of Neural Networks for NLP
  • Task-specific neural network architectures
  • RNNs
[slides]
Main readings:
  • Goldberg, Chapter 9

Tue 02/04/25 Lecture #7:
  • RNNs (contd.)
  • Machine translation
Main readings:
  • Understanding LSTM Networks (Christopher Olah) [link]
  • Eisenstein, Chapter 18
Optional readings:
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]

Thu 02/06/25 Lecture #8:
  • Machine translation (contd.)
  • Attention
  • Transformers
Main readings:
  • Statistical Machine Translation (Koehn) [link]
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
  • Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015) [link]
  • Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Illustrated Transformer (Jay Alammar) [link]

Tue 02/11/25 Lecture #9:
  • Machine translation (contd.)
  • Attention
  • Transformers
Main readings:
  • Statistical Machine Translation (Koehn) [link]
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
  • Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015) [link]
  • Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Illustrated Transformer (Jay Alammar) [link]

Thu 02/13/25 Lecture #10:
  • Transformers (cont'd.)
  • Language modeling with Transformers
Main readings:
  • The Illustrated Transformer (Jay Alammar) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Annotated Transformer (Harvard NLP) [link]
  • Language Models are Unsupervised Multitask Learners (GPT-2) (Radford et al., 2019) [link]

Tue 02/18/25 Lecture #11:
  • Pre-training and transfer learning
  • Objective functions for pre-training
  • Model architectures
  • ELMo, BERT, GPT, T5
Main readings:
  • The Illustrated BERT, ELMo, and co. (Jay Alammar) [link]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018) [link]
  • Language Models are Unsupervised Multitask Learners (GPT-2) (Radford et al., 2019) [link]

Thu 02/20/25 Lecture #12:
  • Transfer learning (contd.)
  • Encoder-decoder pretrained models
  • Architecture and pretraining objectives
Main readings:
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2020) [link]
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (Lewis et al., 2019) [link]
  • What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? (Wang et al., 2022) [link]

Tue 02/25/25 Midterm Exam 1

Thu 02/27/25 Lecture #13:
  • Decoding and generation
  • Large language models and impact of scale
  • In-context learning and prompting
Main readings:
  • The Curious Case of Neural Text Degeneration (Holtzman et al., 2019) [link]
  • How to generate text: using different decoding methods for language generation with Transformers [link]
  • Scaling Laws for Neural Language Models (Kaplan et al., 2020) [link]
  • Training Compute-Optimal Large Language Models (Hoffmann et al., 2022) [link]
  • GPT-3: Language Models are Few-Shot Learners (Brown et al., 2020) [link]

Tue 03/04/25 Lecture #14:
  • In-context learning and prompting (cont'd)
  • Improving instruction following and few-shot learning
Main readings:
  • Language Models are Few-Shot Learners (Brown et al., 2020) [link]
  • Finetuned Language Models Are Zero-Shot Learners (Wei et al., 2022) [link]
  • Multitask Prompted Training Enables Zero-Shot Task Generalization (Sanh et al., 2021) [link]
  • Scaling Instruction-Finetuned Language Models (Chung et al., 2022) [link]
  • Are Emergent Abilities of Large Language Models a Mirage? (Schaeffer et al., 2023) [link]
  • Emergent Abilities of Large Language Models (Wei et al., 2022) [link]

03/07/25 - 03/24/25 Spring recess - No classes

Tue 03/25/25 Lecture #15:
  • Post-training
  • Reinforcement Learning from Human Feedback
  • Alignment
Main readings:
  • Training language models to follow instructions with human feedback (Ouyang et al., 2022) [link]
  • Fine-Tuning Language Models from Human Preferences (Ziegler et al., 2019) [link]
  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023) [link]
  • RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (Lee et al., 2023) [link]

Thu 03/27/25 Lecture #16:
  • Post-training (cont'd)

Tue 04/01/25 Midterm Exam 2

Thu 04/03/25 Lecture #17:
  • Evaluation

Tue 04/08/25 Lecture #18:
  • Parameter-efficient Fine-Tuning

Thu 04/10/25 Lecture #19:
  • Safety
  • Noncompliance

Tue 04/15/25 Lecture #20:
  • Agent-based systems

Thu 04/17/25 Guest lecturer: TBD

Tue 04/22/25 Guest lecturer: TBD

Thu 04/24/25 Guest lecturer: TBD