Interactive Q-Learning Gridworld

About Q-Learning

The agent's goal is to learn the best action for each state: the one that maximizes its expected cumulative reward. It does so by repeatedly updating its Q-values with a rule derived from the Bellman equation:

Q(s,a) ← Q(s,a) + α [R + γ max_{a'} Q(s',a') − Q(s,a)]
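
To make the update concrete, here is a minimal Python sketch of a single step. It reads α as the learning rate, γ as the discount factor, R as the reward received after taking action a in state s, and s' as the resulting state. The function name q_update, the four-move action set, the (row, column) state encoding, the default values of alpha and gamma, the step reward of -0.1, and the done flag for terminal states are all assumptions made for illustration; they are not taken from the demo itself.

```python
from collections import defaultdict

# Assumed action set for a gridworld agent (hypothetical names).
ACTIONS = ["up", "down", "left", "right"]


def q_update(q, state, action, reward, next_state,
             alpha=0.1, gamma=0.9, done=False):
    """Apply one Q-learning update in place and return the new Q(s, a)."""
    # max over a' of Q(s', a'); treated as zero when s' is terminal (assumption).
    best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
    # Bracketed term of the update rule: R + gamma * max Q(s', a') - Q(s, a).
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Q-table keyed by (state, action); unseen pairs default to 0.0.
q = defaultdict(float)
q[((0, 1), "right")] = 1.0  # pretend the neighbouring cell already looks promising

new_value = q_update(q, state=(0, 0), action="right",
                     reward=-0.1, next_state=(0, 1))
print(new_value)  # roughly 0.08, up to floating-point rounding
```

In this example the target is R + γ max_{a'} Q(s',a') = -0.1 + 0.9 × 1.0 = 0.8, the old estimate is 0, and the new estimate moves a fraction α = 0.1 of the way toward the target, giving about 0.08. A dictionary-backed table keeps the sketch short; a fixed-size array indexed by state and action would work equally well for a small grid.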