From the basics to deep reinforcement learning, this repo provides easy-to-read code examples. One file for each algorithm. Please feel free to create a Pull Request, or open an issue!
Grid World (1-grid-world/)
- Policy Iteration: 1-policy_iteration.py
- Value Iteration: 2-value_iteration.py
- SARSA: 3-sarsa.py
- Q-Learning: 4-q_learning.py
- Deep SARSA: 5-deep_sarsa.py
- REINFORCE: 6-reinforce.py
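For orientation, the two tabular TD methods in this list differ only in the bootstrap target. The following is a minimal sketch of the textbook update rules, not the code in 3-sarsa.py or 4-q_learning.py; the action ids, the `alpha` / `gamma` defaults, and all function names are assumptions made for illustration, and terminal-state handling is left out.

```python
import random
from collections import defaultdict

ACTIONS = [0, 1, 2, 3]  # hypothetical action ids: up, down, left, right


def epsilon_greedy(q, state, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = [q[(state, a)] for a in ACTIONS]
    best = max(values)
    # Break ties at random so the agent does not always favour action 0.
    return rng.choice([a for a, v in zip(ACTIONS, values) if v == best])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action actually chosen in s_next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action available in s_next."""
    target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


# Usage: a Q-table that defaults to 0.0 for unseen (state, action) pairs.
q_table = defaultdict(float)
sarsa_update(q_table, s=(0, 0), a=3, r=0.0, s_next=(0, 1), a_next=1)
```

SARSA needs the next action before it can update, so the agent picks `a_next` first and then reuses it on the following step; Q-learning only needs the next state.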
CartPole (2-cartpole/)
Requires Python 3.11 and uv.
git clone <this repo>
cd reinforcement-learning
uv sync
# Grid World
cd 1-grid-world && uv run python 3-sarsa.py
# CartPole — train
cd 2-cartpole && uv run python 1-dqn.py
# CartPole — watch training (slower)
cd 2-cartpole && uv run python 1-dqn.py --render
# CartPole — replay a trained checkpoint
cd 2-cartpole && uv run python 1-dqn.py --test

Both Atari scripts (1-dqn.py, 2-ppo.py) can stream training metrics to your own Weights & Biases account. One-time login, then pass --wandb:
uv run wandb login # paste the API key from https://wandb.ai/authorize
cd 3-atari && uv run python 2-ppo.py --env breakout --wandb
cd 3-atari && uv run python 1-dqn.py --env breakout --wandb

Runs land in your rl-atari-ppo / rl-atari-dqn project; nothing is shared by default. Omit --wandb and the script runs without ever touching the network.
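One way to get this opt-in behaviour is to import wandb only when the flag is set. This is a hedged sketch, not the code in the Atari scripts: `parse_args` and `make_logger` are hypothetical names, and only the `--env` and `--wandb` flags and the project names come from the commands above.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", default="breakout")
    parser.add_argument("--wandb", action="store_true",
                        help="stream metrics to Weights & Biases")
    return parser.parse_args(argv)


def make_logger(args, project):
    """Return a log(metrics, step) callable; a no-op unless --wandb is set."""
    if not args.wandb:
        return lambda metrics, step: None
    import wandb  # imported lazily so the default path never loads it
    run = wandb.init(project=project, config=vars(args))
    return lambda metrics, step: run.log(metrics, step=step)


# Usage: with no flag, logging calls do nothing and wandb is never imported.
args = parse_args([])
log = make_logger(args, project="rl-atari-ppo")
log({"episode_return": 12.0}, step=1)
```

Keeping the import inside the branch means the training loop can call `log(...)` unconditionally, and a machine without a W&B login still runs the script.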
Modernized from the 2017 original:
- Framework: Keras + TensorFlow 1.0 → PyTorch 2.11
- Env: gym 0.8 → gymnasium 1.2
- Rendering: tkinter → pygame (cross-platform with no system Tk)
- Tooling: requirements.txt → pyproject.toml + uv
- Scope: pruned to 9 core algorithms; dropped Monte Carlo / DDQN / A3C / Atari / mountaincar; added PPO
- Layout: flat 1-grid-world/3-sarsa.py instead of nested 1-grid-world/4-sarsa/sarsa_agent.py
- Docs: each algorithm file now opens with a paper citation and the core update equation
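The gym → gymnasium move changes the shape of the episode loop: `reset()` returns `(observation, info)` instead of a bare observation, and `step()` returns five values with `terminated` and `truncated` in place of the single `done` flag. The sketch below shows that loop against a hypothetical `ToyEnv` that only mimics the gymnasium signatures, so it runs without gymnasium installed; it is not one of the repo's environments.

```python
class ToyEnv:
    """Hypothetical stand-in that follows the gymnasium reset/step signatures."""

    def __init__(self, goal=3, max_steps=10):
        self.goal, self.max_steps = goal, max_steps

    def reset(self, seed=None):
        self.pos, self.t = 0, 0
        return self.pos, {}  # gymnasium style: (observation, info)

    def step(self, action):
        self.pos += action
        self.t += 1
        terminated = self.pos >= self.goal    # reached a terminal state
        truncated = self.t >= self.max_steps  # hit the time limit
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Roll out one episode and return the undiscounted return."""
    obs, info = env.reset(seed=0)
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total


# Usage: a policy that always moves forward by one reaches the goal.
episode_return = run_episode(ToyEnv(), policy=lambda obs: 1)
```

Keeping `terminated` and `truncated` separate matters for value-based agents, which typically bootstrap from the next state after a time-limit cutoff but not after a true terminal state.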
