GitHub - rlcode/reinforcement-learning: Minimal and Clean Reinforcement Learning Examples

From the basics to deep reinforcement learning, this repo provides easy-to-read code examples. One file for each algorithm. Please feel free to create a Pull Request, or open an issue!

Algorithms

Grid World (1-grid-world/)

Policy Iteration — 1-policy_iteration.py
Value Iteration — 2-value_iteration.py
SARSA — 3-sarsa.py
Q-Learning — 4-q_learning.py
Deep SARSA — 5-deep_sarsa.py
REINFORCE — 6-reinforce.py
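
The difference between SARSA and Q-Learning in the list above comes down to the bootstrap target. The following is a minimal sketch, not taken from 3-sarsa.py or 4-q_learning.py; the function names, the dict-based Q-table, the four-action assumption, and the default alpha and gamma values are all illustrative assumptions. Terminal transitions, which would drop the bootstrap term, are left out for brevity.

```python
from collections import defaultdict

N_ACTIONS = 4  # assumed: up, down, left, right in a grid world


def make_q_table():
    # Maps state -> list of action values, initialised to zero.
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy: bootstrap from the action actually chosen in s_next.
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Off-policy: bootstrap from the greedy (max-value) action in s_next.
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
```

With the same transition, the two updates diverge whenever the next action is not the greedy one, which is why SARSA tends to learn more cautious paths under an exploratory policy.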

CartPole (2-cartpole/)

DQN — 1-dqn.py
A2C — 2-a2c.py
PPO — 3-ppo.py
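
Of the CartPole algorithms above, PPO's central idea fits in a few lines. This is a per-sample sketch of the clipped surrogate objective from the PPO paper, written in plain Python for readability; it is not the code in 3-ppo.py, and the function name and the clip_eps default of 0.2 are assumptions.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO clipped objective (to be maximised).

    ratio     -- pi_new(a|s) / pi_old(a|s) for the sampled action
    advantage -- advantage estimate for that action
    """
    # Keep the probability ratio inside [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to push the ratio
    # outside the clip range in the direction that raises the objective.
    return min(ratio * advantage, clipped_ratio * advantage)
```

For example, with a positive advantage the objective stops growing once the ratio exceeds 1 + clip_eps, so a single batch cannot move the policy arbitrarily far from the one that collected the data.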

Setup

Requires Python 3.11 and uv.

git clone <this repo>
cd reinforcement-learning
uv sync

Running

Each command below starts from the repository root.

# Grid World
cd 1-grid-world && uv run python 3-sarsa.py

# CartPole — train
cd 2-cartpole && uv run python 1-dqn.py

# CartPole — watch training (slower)
cd 2-cartpole && uv run python 1-dqn.py --render

# CartPole — replay a trained checkpoint
cd 2-cartpole && uv run python 1-dqn.py --test

Logging to Weights & Biases (Atari only)

Both Atari scripts (1-dqn.py, 2-ppo.py) can stream training metrics to your own Weights & Biases account. One-time login, then pass --wandb:

uv run wandb login   # paste the API key from https://wandb.ai/authorize
cd 3-atari
uv run python 2-ppo.py --env breakout --wandb
uv run python 1-dqn.py --env breakout --wandb

Runs land in your own rl-atari-ppo or rl-atari-dqn project, and nothing is shared by default. Without --wandb, the script never touches the network.
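
One way an opt-in flag like this can be wired up is sketched below. It is an illustration under stated assumptions, not the code in the Atari scripts: parse_args and make_logger are hypothetical names, and the default project name is simply borrowed from the text above. The wandb package is imported only when --wandb is passed, so the default path needs neither the package nor network access.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", default="breakout")
    parser.add_argument("--wandb", action="store_true",
                        help="stream metrics to Weights & Biases")
    return parser.parse_args(argv)


def make_logger(args, project="rl-atari-ppo"):
    """Return a function log(metrics, step); a no-op unless --wandb is set."""
    if not args.wandb:
        # Default path: no wandb import, no network access.
        return lambda metrics, step: None

    import wandb  # lazy import, only reached when --wandb is passed

    wandb.init(project=project, config=vars(args))
    return lambda metrics, step: wandb.log(metrics, step=step)
```

A training loop would then call something like log({"episode_return": ret}, step) unconditionally and let the flag decide whether anything is sent.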

Updates

Modernized from the 2017 original:

Framework: Keras + TensorFlow 1.0 → PyTorch 2.11
Env: gym 0.8 → gymnasium 1.2
Rendering: tkinter → pygame (cross-platform with no system Tk)
Tooling: requirements.txt → pyproject.toml + uv
Scope: pruned to 9 core algorithms; dropped Monte Carlo / DDQN / A3C / Atari / mountaincar; added PPO
Layout: flat 1-grid-world/3-sarsa.py instead of nested 1-grid-world/4-sarsa/sarsa_agent.py
Docs: each algorithm file now opens with a paper citation and the core update equation
