About

Multimodal LLM Agents — A Self-Paced Course

This is a free, self-paced version of CS 6960: Multimodal LLM Agents, a graduate course taught by Kenneth Marino at the University of Utah. All lecture slides, readings, and exercises are available here.

Course Description

This course explores the rapidly developing area of multimodal large language models in embodied settings. Students will learn the foundations of reinforcement learning and large language models in order to understand how large-scale models can be deployed in multimodal environments.

Topics include control flow and scaffolding for agents (ReAct, tool use), coding agents, game-playing agents, computer use, and robotics. The course combines lecture slides, paper readings, and programming exercises.

Topics Covered

Module	Topics
Introduction & Prerequisites	Course overview, RL basics, LLM basics, VLM basics
Agent Frameworks	ReAct, chain-of-thought, reflection, reasoning
Retrieval and Memory	RAG, dense retrieval, memory-augmented agents
Tool Use	Tool-augmented LMs, Toolformer, function calling, MCP
Code Agents	SWE-bench, coding agent systems, code execution
Agent Evaluation	Benchmarks, evaluation methodology, LLM-as-judge
Assistant Agents	General-purpose agents, web agents, task benchmarks
Game Agents	Game-playing, interactive environments
Computer Use	GUI agents, computer use benchmarks
Robotics	Embodied agents, robot learning

Prerequisites

There are no formal prerequisites, but some familiarity with the following will help:

Machine learning fundamentals
Basics of natural language processing
Python programming

How to Use This Course

This is a self-paced course — there are no deadlines or grades. Work through the modules in order, read the recommended papers, and complete the optional exercises if you’d like hands-on practice.

Each module on the Lectures & Resources page contains lecture slides, recordings, and recommended readings.

Get started → Lectures & Resources

About the Instructor

Kenneth Marino is a faculty member at the Kahlert School of Computing at the University of Utah, where he teaches CS 6960 and researches multimodal AI and agents. This site is maintained as a public companion to that course.