Date Lecture Readings Logistics
Tue 01/13/26 Lecture #1:
  • Course Introduction
  • Logistics
[ slides ]

Thu 01/15/26 Lecture #2:
  • Word embeddings and vector semantics
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 6

Tue 01/20/26 Lecture #3:
  • Word embeddings and vector semantics (cont.)
  • Sparse representations
  • Dense representations
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 6
Optional readings:
  • Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013) [link]
  • Efficient Estimation of Word Representations in Vector Space (Mikolov et al., 2013) [link]
  • word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method (Goldberg and Levy, 2014) [link]

HW 1 out

Thu 01/22/26 Lecture #4:
  • Deriving the gradient of Word2vec
  • Evaluation of word embeddings
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 6
  • Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013) [link]
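
Illustrative sketch of the gradient derivation covered in this lecture: the skip-gram negative-sampling loss and its gradients for one (center, context) pair, written out in NumPy. This is not course-provided code; the vector names, dimensions, and random inputs are made up for demonstration.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sgns_loss_and_grads(v_c, u_o, U_neg):
        """Skip-gram negative-sampling loss for one (center, context) pair.

        v_c   : (d,)   center-word vector
        u_o   : (d,)   positive context-word vector
        U_neg : (k, d) sampled negative context-word vectors
        """
        pos = sigmoid(u_o @ v_c)            # sigma(u_o . v_c)
        neg = sigmoid(U_neg @ v_c)          # sigma(u_k . v_c) for each negative
        loss = -np.log(pos) - np.sum(np.log(1.0 - neg))

        grad_v_c = (pos - 1.0) * u_o + U_neg.T @ neg   # dL/dv_c
        grad_u_o = (pos - 1.0) * v_c                   # dL/du_o
        grad_U_neg = np.outer(neg, v_c)                # dL/du_k, one row per negative
        return loss, grad_v_c, grad_u_o, grad_U_neg

    # Tiny usage example with random vectors
    rng = np.random.default_rng(0)
    loss, g_v, g_o, g_neg = sgns_loss_and_grads(
        rng.normal(size=8), rng.normal(size=8), rng.normal(size=(5, 8)))
    print(loss)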

Tue 01/27/26 Lecture #5:
  • N-Gram Language Models
  • Smoothing
  • Evaluation of Language Models
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 7
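
A minimal sketch of the n-gram topics above: a bigram language model with add-k smoothing and a perplexity computation. Not course code; the toy corpus and the choice of k = 1 are illustrative.

    import math
    from collections import Counter

    def train_bigram_lm(sentences):
        """Count unigrams and bigrams over tokenized sentences with <s>/</s> markers."""
        unigrams, bigrams = Counter(), Counter()
        for sent in sentences:
            toks = ["<s>"] + sent + ["</s>"]
            unigrams.update(toks)
            bigrams.update(zip(toks, toks[1:]))
        return unigrams, bigrams, set(unigrams)

    def bigram_prob(w_prev, w, unigrams, bigrams, vocab, k=1.0):
        """Add-k smoothed bigram probability P(w | w_prev)."""
        return (bigrams[(w_prev, w)] + k) / (unigrams[w_prev] + k * len(vocab))

    def perplexity(sentences, unigrams, bigrams, vocab, k=1.0):
        """Perplexity = exp(-average log-probability per predicted token)."""
        log_prob, n_tokens = 0.0, 0
        for sent in sentences:
            toks = ["<s>"] + sent + ["</s>"]
            for w_prev, w in zip(toks, toks[1:]):
                log_prob += math.log(bigram_prob(w_prev, w, unigrams, bigrams, vocab, k))
                n_tokens += 1
        return math.exp(-log_prob / n_tokens)

    train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    uni, bi, vocab = train_bigram_lm(train)
    print(perplexity([["the", "cat", "sat"]], uni, bi, vocab))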

Thu 01/29/26 Lecture #6:
  • Neural network basics
  • Autograd
[ slides ]
Main readings:
  • The Matrix Calculus You Need For Deep Learning (Terence Parr and Jeremy Howard) [link]
  • The Little Book of Deep Learning (François Fleuret) - Ch. 3, 4
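
To make the autograd topic concrete, here is a minimal, micrograd-style sketch of reverse-mode automatic differentiation over scalars. It is illustrative only, not the framework used in the course; the class and method names are made up.

    class Value:
        """Minimal scalar autograd node: stores a value, its gradient, and a backward rule."""
        def __init__(self, data, parents=()):
            self.data, self.grad = data, 0.0
            self._parents, self._backward = parents, lambda: None

        def __add__(self, other):
            out = Value(self.data + other.data, (self, other))
            def backward():
                self.grad += out.grad           # d(a+b)/da = 1
                other.grad += out.grad          # d(a+b)/db = 1
            out._backward = backward
            return out

        def __mul__(self, other):
            out = Value(self.data * other.data, (self, other))
            def backward():
                self.grad += other.data * out.grad   # d(a*b)/da = b
                other.grad += self.data * out.grad   # d(a*b)/db = a
            out._backward = backward
            return out

        def backward(self):
            # Reverse mode: visit nodes in reverse topological order, applying the chain rule.
            order, seen = [], set()
            def visit(v):
                if v not in seen:
                    seen.add(v)
                    for p in v._parents:
                        visit(p)
                    order.append(v)
            visit(self)
            self.grad = 1.0
            for v in reversed(order):
                v._backward()

    x, w, b = Value(2.0), Value(-3.0), Value(1.0)
    y = x * w + b          # y = w*x + b
    y.backward()
    print(x.grad, w.grad)  # dy/dx = w = -3.0, dy/dw = x = 2.0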

Tue 02/03/26 Lecture #7:
  • Autograd
  • Building blocks of Deep Learning for Language Modeling
  • CNNs
[ slides ]
Main readings:
  • Goldberg Chapter 9

Thu 02/05/26 Lecture #8:
  • RNNs
  • Task specific neural network architectures
  • Training RNNs
  • Machine translation
  • Attention
[ slides ]
Main readings:
  • Understanding LSTM Networks (Christopher Olah) [link]
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
  • Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015) [link]
  • Statistical Machine Translation (Koehn) [link]

Tue 02/10/26 Lecture #9:
  • Transformers
[ slides ]
Main readings:
  • Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Illustrated Transformer (Jay Alammar) [link]

Project teams due on 02/09

HW 1 due 02/10

Thu 02/12/26 Lecture #10:
  • Attention
  • Transformers
[ slides ]
Main readings:
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
  • Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015) [link]
  • Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Illustrated Transformer (Jay Alammar) [link]
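
A minimal NumPy sketch of the scaled dot-product attention at the core of these two lectures, as described in the Vaswani et al. reading. Shapes and variable names are illustrative; this is not course-provided code.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

        Q : (n_q, d_k) queries
        K : (n_k, d_k) keys
        V : (n_k, d_v) values
        """
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarity scores
        scores -= scores.max(axis=-1, keepdims=True)     # numerical stability for softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)   # attention weights, rows sum to 1
        return weights @ V                               # weighted average of values

    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(3, 4)), rng.normal(size=(5, 4)), rng.normal(size=(5, 6))
    print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 6)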

02/14
HW 2 out

Tue 02/17/26 Lecture #11:
  • Tokenization
  • Language modeling with Transformers
  • Early language models
  • Transfer Learning
[ slides ]
Main readings:
  • The Illustrated Transformer (Jay Alammar) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Annotated Transformer (Harvard NLP) [link]
  • GPT-2: Language Models are Unsupervised Multitask Learners (Radford et al., 2019) [link]
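
For the tokenization topic, a toy sketch of byte-pair-encoding merge learning over a small word-frequency table. It is illustrative only (the word counts are invented) and omits the details of real subword tokenizers.

    from collections import Counter

    def learn_bpe_merges(words, n_merges):
        """Learn BPE merges from a word-frequency dict (toy version)."""
        vocab = {tuple(w) + ("</w>",): c for w, c in words.items()}   # words as symbol tuples
        merges = []
        for _ in range(n_merges):
            pairs = Counter()
            for symbols, count in vocab.items():
                for pair in zip(symbols, symbols[1:]):
                    pairs[pair] += count
            if not pairs:
                break
            best = max(pairs, key=pairs.get)               # most frequent adjacent pair
            merges.append(best)
            merged = {}
            for symbols, count in vocab.items():
                out, i = [], 0
                while i < len(symbols):
                    if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                        out.append(symbols[i] + symbols[i + 1])   # merge the pair
                        i += 2
                    else:
                        out.append(symbols[i])
                        i += 1
                merged[tuple(out)] = count
            vocab = merged
        return merges

    print(learn_bpe_merges({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 4))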

Thu 02/19/26 Lecture #12:
  • Transfer Learning (contd.)
  • Objective functions for pre-training
  • Encoder-decoder pretrained models
  • Architecture and pretraining objectives
[ slides ]
Main readings:
  • The Illustrated BERT, ELMo, and co. (Jay Alammar) [link]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018) [link]
  • GPT-2: Language Models are Unsupervised Multitask Learners (Radford et al., 2019) [link]
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2020) [link]
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (Lewis et al., 2019) [link]
  • What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? (Wang et al., 2022) [link]

Tue 02/24/26 Lecture #13:
  • Decoding and generation
  • Large language models and impact of scale
  • In-context learning and prompting
[ slides ]
Main readings:
  • The Curious Case of Neural Text Degeneration (Holtzman et al., 2019) [link]
  • How to generate text: using different decoding methods for language generation with Transformers (Hugging Face blog) [link]
  • Scaling Laws for Neural Language Models (Kaplan et al., 2020) [link]
  • Training Compute-Optimal Large Language Models (Hoffmann et al., 2022) [link]
  • GPT-3: Language Models are Few-Shot Learners (Brown et al., 2020) [link]
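
A minimal sketch of two decoding ideas from this lecture and the Holtzman et al. reading: temperature scaling and nucleus (top-p) sampling over a toy vocabulary. Not course code; the logits and vocabulary are invented.

    import numpy as np

    def sample_top_p(logits, temperature=0.8, top_p=0.9, rng=None):
        """Sample a token id using temperature scaling and nucleus (top-p) filtering."""
        rng = rng or np.random.default_rng()
        logits = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()

        order = np.argsort(probs)[::-1]                  # most to least probable
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1  # smallest nucleus covering top_p mass
        nucleus = order[:cutoff]

        nucleus_probs = probs[nucleus] / probs[nucleus].sum()
        return int(rng.choice(nucleus, p=nucleus_probs))

    vocab = ["the", "a", "cat", "dog", "sat"]
    logits = [2.0, 1.5, 0.3, 0.2, -1.0]
    print(vocab[sample_top_p(logits, rng=np.random.default_rng(0))])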

Thu 02/26/26 Midterm Exam 1

Tue 03/03/26 Lecture #14:
  • Post-training
  • Supervised Finetuning
  • Instruction Following
[ slides ]
Main readings:
  • Multitask Prompted Training Enables Zero-Shot Task Generalization (Sanh et al., 2021) [link]
  • Scaling Instruction-Finetuned Language Models (Chung et al., 2022) [link]
  • Are Emergent Abilities of Large Language Models a Mirage? (Schaeffer et al., 2023) [link]
  • Emergent Abilities of Large Language Models (Wei et al., 2022) [link]

HW 2 due 03/04

Thu 03/05/26 Lecture #15:
  • From Assistants to Collaborators: Building Agents for Long-Form, Open-ended Collaboration
Guest lecturer:
Shannon Zejiang Shen, MIT
Talk abstract:
The ever-increasing capabilities of language models call for rethinking how we build agents: moving from assistants that passively help with simple, well-defined tasks to collaborators that proactively support users in long-form, exploratory work, and that continue learning alongside them. In this talk, I will present a systematic view of what it takes to build collaborator agents, organized around three questions. First, what is the right objective for building collaborator agents, and how should we evaluate whether collaboration succeeds? Second, how do we train agents to operate in settings where goals are open-ended and evolve over time? And third, what architectural improvements to language models are needed to make collaboration practical? I will share lessons learned from our recent projects in this direction, including training deep research agents for open-ended long-form generation with evolving rubrics, and enabling interleaved latent and text chain of thought for more efficient long-form reasoning in language models.

Project proposals due 03/06

03/06/26 - 03/22/26 Spring recess - No classes

Tue 03/24/26 Lecture #16:
  • Guest Lecture 2: TBD

Thu 03/26/26 Lecture #17:
  • Post-training
  • Reinforcement learning from Human Feedback
  • Alignment
Main readings:
  • Training language models to follow instructions with human feedback (Ouyang et al., 2022) [link]
  • Fine-Tuning Language Models from Human Preferences (Ziegler et al., 2019) [link]
  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023) [link]
  • RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (Lee et al., 2023) [link]
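
A minimal sketch of the DPO objective from the Rafailov et al. reading, computed for a single (chosen, rejected) preference pair from summed token log-probabilities. The numbers and the beta value are illustrative, not from the paper or the course.

    import numpy as np

    def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
        """Direct Preference Optimization loss for one preference pair.

        Each argument is the summed token log-probability of a full response
        under the policy (logp_*) or the frozen reference model (ref_logp_*).
        """
        # Implicit reward margin: how much more the policy favors the chosen response
        # over the reference, compared with the same quantity for the rejected response.
        margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
        return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))   # -log sigmoid(beta * margin)

    # Toy numbers: the policy already prefers the chosen response slightly more than the reference does.
    print(dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                   ref_logp_chosen=-13.0, ref_logp_rejected=-14.5))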

Tue 03/31/26 Lecture #18:
  • Post-training (contd.)

HW 3 out

Thu 04/02/26 Lecture #19:
  • Guest Lecture 3
Guest lecturer:
Orion Weller, Johns Hopkins

Tue 04/07/26 Lecture #20:
  • Retrieval Augmented Generation (RAG)

Thu 04/09/26 Midterm Exam 2

Tue 04/14/26 Lecture #21:
  • Guest Lecture 4: TBD

Thu 04/16/26 Lecture #22:
  • Retrieval Augmented Generation (cont.)
  • Introduction to agent-based systems

Tue 04/21/26 Lecture #23:
  • Project presentations session 1

Final project presentations

Thu 04/23/26 Lecture #24:
  • Project presentations session 2

Final project presentations
HW 3 due on 04/27
Final project reports due on 04/30