
Planning and Learning in Markov Decision Processes

Value Iteration and Policy Iteration on small vs. large MDPs, plus a model-free learner with exploration strategies.
machine-learning · python · reinforcement-learning
The Port of Saint-Tropez by Paul Signac
Early Draft

This is an early version of this project write-up. For now it’s largely a placeholder. I’m actively working on it.

Introduction

Reinforcement learning asks a simple question with deep consequences: how should an agent act when outcomes are uncertain and feedback is delayed? Markov Decision Processes give us the formal playground to study that question. This post digs into the concepts covered in the fourth assignment for Georgia Tech’s Machine Learning course. It compares the planner’s toolkit (Value Iteration, Policy Iteration) with a learner’s approach (Q-learning), focusing on what actually changes policies in practice and what to measure when environments scale.

Note

For a broader survey of supervised, unsupervised, and reinforcement learning from the same course, see: Machine Learning: A Retrospective.

Overview

How do we act under uncertainty when we can model the world versus when we must learn from experience? Markov Decision Processes (MDPs) offer a clean lens. We compare two complementary approaches:

  • Planning with a known model: Value Iteration (VI) and Policy Iteration (PI)
  • Learning without a model: a model-free value-based learner (Q-learning) with exploration strategies (a minimal sketch follows this list)
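To make the learner's side concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. The environment interface (a reset() method and a step(action) method returning next state, reward, and a done flag), the function name, and the hyperparameter defaults are illustrative assumptions, not the assignment's actual settings.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = int(rng.integers(n_actions))
            else:
                action = int(np.argmax(Q[state]))

            next_state, reward, done = env.step(action)

            # TD update toward reward + discounted best next value;
            # no transition model of the MDP is required.
            target = reward + gamma * (0.0 if done else np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state

    # Greedy policy with respect to the learned Q-values.
    return Q, Q.argmax(axis=1)
```

Other exploration strategies (a decaying epsilon schedule, or softmax/Boltzmann action selection) drop into the same loop by swapping out the exploration branch.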

MDPs (Conceptual)

  • Small-state MDP: compact state space with stochastic transitions and shaped rewards to examine convergence criteria and policy structure.
  • Large-state MDP: expanded state space via factorization; highlights scaling behavior and the sensitivity of VI/PI to discount and stopping thresholds (see the sketch after this list).
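
For the planner's side, a minimal Value Iteration sketch shows where the discount factor gamma and the stopping threshold theta enter. The transition-model format assumed here (P[s][a] as a list of (probability, next_state, reward) tuples) and the function name are placeholders for illustration, not the course environments.

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.99, theta=1e-8):
    """Value Iteration over an explicit transition model (illustrative sketch).

    P[s][a] is assumed to be a list of (prob, next_state, reward) tuples.
    gamma is the discount factor; theta is the stopping threshold.
    """
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup for state s.
            q_values = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                        for a in range(n_actions)]
            best = max(q_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # stop once the largest update falls below theta
            break

    # Extract the greedy policy from the converged value function.
    policy = np.array([
        int(np.argmax([sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                       for a in range(n_actions)]))
        for s in range(n_states)
    ])
    return V, policy
```

Pushing gamma toward 1 or tightening theta increases the number of sweeps before convergence, which is exactly the sensitivity the large-state MDP is meant to highlight.
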
A Note on Code Availability

In accordance with Georgia Tech’s academic integrity policy and the license for course materials, the source code for this project is kept in a private repository. I believe passionately in sharing knowledge, but I also respect the university’s policies. Following Dean Joyner’s advice on sharing projects, this document focuses not on any particular solution but on an abstract overview of the problem and the underlying concepts I learned.

I would be delighted to discuss the implementation details, architecture, or specific code sections in an interview. Please feel free to reach out to request private access to the repository.

Table of Contents