Deep Q-Network (DQN) Visualization

An interactive visualization of a Deep Q-Network (DQN), showing how a neural network learns to make decisions through reinforcement learning.

DQN Overview

Q-Learning

Learns the action-value function Q(s, a): the expected return of taking action a in state s and acting optimally thereafter
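For background, tabular Q-learning nudges its estimate toward a one-step bootstrapped target with learning rate α and discount factor γ (standard textbook form, not text from the visualization itself):

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```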

Deep Neural Network

Approximates the Q-function with a neural network parameterized by weights θ, overcoming the scalability limits of tabular methods
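As a concrete sketch, for a low-dimensional environment such as CartPole the approximator can be a small multilayer perceptron. This assumes PyTorch; the layer sizes and class name are illustrative, not necessarily what the visualization uses.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per discrete action."""
    def __init__(self, state_dim: int = 4, num_actions: int = 2, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),  # one output per action: Q(s, a; θ)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```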

Experience Replay

Stores transitions (s, a, r, s′) in a memory buffer so that minibatches can be sampled at random, breaking the correlations between consecutive samples
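A minimal replay memory is just a fixed-capacity deque with uniform random sampling; the sketch below (capacity and names are illustrative) stores the (s, a, r, s′, done) tuples used later for minibatch updates.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity memory of transitions with uniform random sampling."""
    def __init__(self, capacity: int = 100_000):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform sampling breaks the temporal correlation between consecutive steps.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```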

Target Network

A separate network with weights θ⁻ that are only periodically copied from the online network, which stabilizes the learning targets
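One common way to maintain the target network is a hard copy of the online weights every C steps; a sketch, assuming the hypothetical QNetwork above:

```python
import copy

q_net = QNetwork()
target_net = copy.deepcopy(q_net)   # Q̂ starts with θ⁻ = θ
target_net.eval()                   # the target network is only used for inference

def maybe_sync_target(step: int, C: int = 1000) -> None:
    # Hard update: copy θ into θ⁻ every C steps to keep the bootstrapped targets stable.
    if step % C == 0:
        target_net.load_state_dict(q_net.state_dict())
```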

DQN Algorithm

Initialize replay memory D to capacity N
Initialize action-value function Q with random weights θ
Initialize target action-value function Q̂ with weights θ⁻ = θ
for episode = 1, M do
    Initialize sequence s₁ = {x₁} and preprocessed sequence φ₁ = φ(s₁)
    for t = 1, T do
        With probability ε select a random action aₜ
        otherwise select aₜ = argmaxₐ Q(φ(sₜ), a; θ)
        Execute action aₜ in emulator and observe reward rₜ and image xₜ₊₁
        Set sₜ₊₁ = sₜ, aₜ, xₜ₊₁ and preprocess φₜ₊₁ = φ(sₜ₊₁)
        Store transition (φₜ, aₜ, rₜ, φₜ₊₁) in D
        Sample random minibatch of transitions (φⱼ, aⱼ, rⱼ, φⱼ₊₁) from D
        Set yⱼ = rⱼ                                      if episode terminates at step j+1
            yⱼ = rⱼ + γ maxₐ′ Q̂(φⱼ₊₁, a′; θ⁻)            otherwise
        Perform a gradient descent step on (yⱼ − Q(φⱼ, aⱼ; θ))² with respect to the network parameters θ
        Every C steps reset Q̂ = Q
    end for
end for
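The inner-loop update translates almost line for line into a single training step. The sketch below reuses the hypothetical QNetwork, ReplayBuffer, and target_net from the earlier sketches; the hyperparameters (γ = 0.99, Adam, batch size 64) are illustrative, not necessarily those used by the visualization.

```python
import numpy as np
import torch
import torch.nn.functional as F

gamma = 0.99
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def train_step(buffer: ReplayBuffer, batch_size: int = 64) -> None:
    if len(buffer) < batch_size:
        return  # wait until enough transitions have been stored

    states, actions, rewards, next_states, dones = zip(*buffer.sample(batch_size))
    states      = torch.as_tensor(np.asarray(states), dtype=torch.float32)
    actions     = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    rewards     = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(np.asarray(next_states), dtype=torch.float32)
    dones       = torch.as_tensor(dones, dtype=torch.float32)

    # yⱼ = rⱼ for terminal transitions, rⱼ + γ maxₐ′ Q̂(φⱼ₊₁, a′; θ⁻) otherwise.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)

    # Q(φⱼ, aⱼ; θ) for the actions that were actually taken.
    q_values = q_net(states).gather(1, actions).squeeze(1)

    # Gradient descent step on the squared TD error (MSE loss).
    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```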

DQN Architecture Visualization

The dashboard header reports the current episode, step, and average reward, along with the exploration rate ε (which starts at 1.00), for the CartPole-v1 environment. Four panels visualize the training process:
Replay Memory

Loss Function

Mean Squared Error (MSE) Loss

Q-Values

Average Q-Values per Action

Rewards

Episode Rewards
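The ε value shown in the dashboard header drives ε-greedy action selection and is normally decayed from 1.00 toward a small residual value as training progresses. A sketch, assuming the hypothetical q_net defined above; the decay schedule and constants are illustrative:

```python
import random
import torch

def select_action(state, epsilon: float, num_actions: int = 2) -> int:
    # With probability ε pick a random action, otherwise act greedily w.r.t. Q(s, ·; θ).
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        q = q_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return int(q.argmax(dim=1).item())

def epsilon_at(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
               decay_steps: int = 10_000) -> float:
    # Linear decay from the initial ε = 1.00 shown in the header down to eps_end.
    frac = min(1.0, step / decay_steps)
    return eps_start + frac * (eps_end - eps_start)
```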
