The agent's goal is to learn the best action in each state, maximizing its cumulative reward. It learns with the Q-learning update rule, derived from the Bellman equation:
Q(s,a) ← Q(s,a) + α[R + γ·max_a′ Q(s′,a′) − Q(s,a)]
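The update above can be sketched as a few lines of Python. This is a minimal illustration, not the demo's actual implementation; the state/action counts, learning rate, and discount factor are assumed values chosen for the example.

```python
import numpy as np

# Hypothetical sizes for illustration: 25 states (e.g. a 5x5 grid), 4 actions.
n_states, n_actions = 25, 4
alpha, gamma = 0.1, 0.9                 # learning rate α and discount factor γ
Q = np.zeros((n_states, n_actions))     # one Q-value per (state, action) pair

def update(s, a, reward, s_next):
    """Apply one Q-learning update for the transition (s, a) -> (s_next, reward)."""
    td_target = reward + gamma * Q[s_next].max()   # R + γ·max_a′ Q(s′,a′)
    Q[s, a] += alpha * (td_target - Q[s, a])       # nudge Q(s,a) toward the target

# Example: one transition from state 0 via action 0 to state 1, earning reward 1.
update(0, 0, 1.0, 1)
```

After this single update, Q[0, 0] moves from 0 toward the target 1.0 by a factor of α, i.e. to 0.1; repeating such updates over many transitions makes the Q-values (the arrows in the visualization) converge toward the best action in each state.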
- The arrows show the agent's preference (Q-value) for each action.
- Use the brushes and settings to create custom challenges!
- Use the mouse wheel to zoom and drag to pan the view.