About
Multimodal LLM Agents — A Self-Paced Course
This is a free, self-paced version of CS 6960: Multimodal LLM Agents, a graduate course taught by Kenneth Marino at the University of Utah. All lecture slides, readings, and exercises are available on this site.
Course Description
This course explores the rapidly developing area of multimodal large language models in embodied settings. Students will learn the foundations of reinforcement learning and large language models to understand how large-scale models can be deployed in multimodal environments.
Topics include control flow and scaffolding for agents (ReAct, tool use), coding agents, game-playing agents, computer use, and robotics. The course combines lecture slides, paper readings, and programming exercises.
Topics Covered
| Module | Topics |
|---|---|
| Introduction & Prerequisites | Course overview, RL basics, LLM basics, VLM basics |
| Agent Frameworks | ReAct, chain-of-thought, reflection, reasoning |
| Retrieval and Memory | RAG, dense retrieval, memory-augmented agents |
| Tool Use | Tool-augmented LMs, Toolformer, function calling, MCP |
| Code Agents | SWE-bench, coding agent systems, code execution |
| Agent Evaluation | Benchmarks, evaluation methodology, LLM-as-judge |
| Assistant Agents | General-purpose agents, web agents, task benchmarks |
| Game Agents | Game-playing, interactive environments |
| Computer Use | GUI agents, computer use benchmarks |
| Robotics | Embodied agents, robot learning |
Prerequisites
There are no formal prerequisites, but some familiarity with the following will help:
- Machine learning fundamentals
- Basics of natural language processing
- Python programming
How to Use This Course
This is a self-paced course with no deadlines or grades. Work through the modules in order, read the recommended papers, and complete the optional exercises if you’d like hands-on practice.
Each module on the Lectures & Resources page contains lecture slides, recordings, and recommended readings.
Get started → Lectures & Resources
Related Courses
- Duke Course — Building Intelligent Agents with Frontier Models
- Berkeley Course — Advanced Large Language Model Agents
- MOOC — Advanced Large Language Model Agents
About the Instructor
Kenneth Marino is a faculty member at the Kahlert School of Computing at the University of Utah, where he teaches CS 6960 and researches multimodal AI and agents. This site is maintained as a public companion to that course.