
rlcode/reinforcement-learning


From the basics to deep reinforcement learning, this repo provides easy-to-read code examples, one file per algorithm. Feel free to open a pull request or an issue!

Algorithms

Grid World (1-grid-world/)

  1. Policy Iteration — 1-policy_iteration.py
  2. Value Iteration — 2-value_iteration.py
  3. SARSA — 3-sarsa.py
  4. Q-Learning — 4-q_learning.py
  5. Deep SARSA — 5-deep_sarsa.py
  6. REINFORCE — 6-reinforce.py
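
SARSA and Q-learning differ only in the bootstrap term of their temporal-difference update. SARSA uses the value of the action the agent actually takes next (on-policy), while Q-learning uses the maximum over next actions (off-policy). Below is a minimal sketch of the two updates. It assumes a dict-based Q-table keyed by (state, action). The function names and the defaults alpha=0.1 and gamma=0.9 are illustrative assumptions, not taken from 3-sarsa.py or 4-q_learning.py.

```python
from collections import defaultdict

# Hypothetical Q-table: maps (state, action) -> estimated return; unseen pairs default to 0.0
Q = defaultdict(float)


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: bootstrap from the action the agent will actually take next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy TD update: bootstrap from the greedy (max-valued) next action."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]
```

Suppose Q[("s1", "right")] = 1.0 and every other entry is 0. Take a transition from "s0" with reward 0. If the next action actually taken is "left", SARSA leaves the estimate at 0. Q-learning moves it to about 0.09 (0.1 × 0.9 × 1.0) because it bootstraps from the best next action.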

CartPole (2-cartpole/)

  1. DQN — 1-dqn.py
  2. A2C — 2-a2c.py
  3. PPO — 3-ppo.py
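
PPO constrains each policy update through the clipped surrogate objective of Schulman et al. (2017). Let r = π_new(a|s) / π_old(a|s) be the probability ratio and A the advantage estimate. Each sample then contributes min(r·A, clip(r, 1 - ε, 1 + ε)·A). The pure-Python sketch below illustrates that objective. The function name and the clip_eps=0.2 default (a commonly used value) are assumptions. The repo's PyTorch code would more likely compute the same expression on tensors, for example with torch.clamp and torch.min.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    ratios[i]     = pi_new(a_i | s_i) / pi_old(a_i | s_i)
    advantages[i] = advantage estimate for the same sample
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for r, adv in zip(ratios, advantages):
        # Keep the ratio inside [1 - eps, 1 + eps]
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms
        total += min(r * adv, clipped_r * adv)
    return total / len(ratios)
```

With a positive advantage, a ratio of 1.5 is clipped to 1.2. The objective therefore stops rewarding updates that push the new policy too far from the old one. With a negative advantage, the min keeps the more pessimistic (clipped) term. A training loop would maximize this value, that is, minimize its negative.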

Setup

Requires Python 3.11 and uv.

git clone <this repo>
cd reinforcement-learning
uv sync

Running

# Grid World
cd 1-grid-world && uv run python 3-sarsa.py

# CartPole — train
cd 2-cartpole && uv run python 1-dqn.py

# CartPole — watch training (slower)
cd 2-cartpole && uv run python 1-dqn.py --render

# CartPole — replay a trained checkpoint
cd 2-cartpole && uv run python 1-dqn.py --test

Logging to Weights & Biases (Atari only)

Both Atari scripts (1-dqn.py, 2-ppo.py) can stream training metrics to your own Weights & Biases account. Log in once, then pass --wandb:

uv run wandb login   # paste the API key from https://wandb.ai/authorize
cd 3-atari && uv run python 2-ppo.py --env breakout --wandb
cd 3-atari && uv run python 1-dqn.py --env breakout --wandb

Runs land in your rl-atari-ppo / rl-atari-dqn project — nothing is shared by default. Omit --wandb and the script runs without ever touching the network.
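
One way to get the opt-in behavior described above is to import wandb only when the flag is set. The sketch below assumes an argparse-based CLI. The names parse_args and make_logger, and the "breakout" default for --env, are hypothetical and chosen for illustration; the real 1-dqn.py and 2-ppo.py may structure this differently. The wandb.init(project=..., config=...) and run.log(..., step=...) calls are part of the standard wandb client API.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical CLI mirroring the flags shown above
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", default="breakout")
    parser.add_argument("--wandb", action="store_true",
                        help="stream metrics to Weights & Biases")
    return parser.parse_args(argv)


def make_logger(args, project="rl-atari-ppo"):
    """Return a log(metrics, step) callable; wandb is imported only when --wandb is set."""
    if not args.wandb:
        # No-op logger: wandb is never imported, so no network calls are made
        return lambda metrics, step: None
    import wandb  # deferred import: only needed for opt-in logging
    run = wandb.init(project=project, config=vars(args))
    return lambda metrics, step: run.log(metrics, step=step)
```

Without --wandb, make_logger returns a no-op and wandb is never imported. With the flag, metrics go to the named project (rl-atari-ppo in this sketch).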

Updates

Modernized from the 2017 original:

  • Framework: Keras + TensorFlow 1.0 → PyTorch 2.11
  • Env: gym 0.8 → gymnasium 1.2
  • Rendering: tkinter → pygame (cross-platform; no system Tk install required)
  • Tooling: requirements.txt → pyproject.toml + uv
  • Scope: pruned to 9 core algorithms; dropped Monte Carlo / DDQN / A3C / Atari / mountaincar; added PPO
  • Layout: flat 1-grid-world/3-sarsa.py instead of nested 1-grid-world/4-sarsa/sarsa_agent.py
  • Docs: each algorithm file now opens with a paper citation and the core update equation
