Interactive visualization of a Deep Q-Network, showing how neural networks learn to make decisions through reinforcement learning.
Learns an action-value function Q(s, a) that gives the expected return of taking action a in state s
Approximates the Q-function with a neural network with weights θ, overcoming the scalability limits of tabular methods
Stores transitions (s, a, r, s′) in a memory buffer and samples from it to break correlations between consecutive samples (see the sketch below)
A separate network with weights θ⁻ that is only periodically updated from the online weights θ, which stabilizes learning
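To make the function-approximation, experience-replay, and target-network pieces concrete, here is a minimal sketch in PyTorch. The class names (QNetwork, ReplayBuffer), the small MLP architecture, and the placeholder dimensions are illustrative assumptions; the original DQN used a convolutional network over stacked Atari frames.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Approximates Q(s, a; θ): input is a state, output is one Q-value per action."""

    def __init__(self, state_dim, n_actions):
        super().__init__()
        # Small MLP for illustration; the original DQN used a CNN over image frames.
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state):
        return self.net(state)

class ReplayBuffer:
    """Fixed-capacity memory D storing transitions (s, a, r, s', done)."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks correlations between consecutive transitions.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)

# Target network Q̂: a second copy of the weights (θ⁻) that is only refreshed every C steps.
q_net = QNetwork(state_dim=4, n_actions=2)       # dimensions are placeholders
target_net = QNetwork(state_dim=4, n_actions=2)
target_net.load_state_dict(q_net.state_dict())   # θ⁻ ← θ
```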
Initialize replay memory D to capacity N
Initialize action-value function Q with random weights θ
Initialize target action-value function Q̂ with weights θ⁻ = θ
for episode = 1, M do
Initialize sequence s₁ = {x₁} and preprocessed sequence φ₁ = φ(s₁)
for t = 1, T do
With probability ε select a random action aₜ
otherwise select aₜ = argmaxₐ Q(φ(sₜ), a; θ)
Execute action aₜ in emulator and observe reward rₜ and image xₜ₊₁
Set sₜ₊₁ = sₜ, aₜ, xₜ₊₁ and preprocess φₜ₊₁ = φ(sₜ₊₁)
Store transition (φₜ, aₜ, rₜ, φₜ₊₁) in D
Sample random minibatch of transitions (φⱼ, aⱼ, rⱼ, φⱼ₊₁) from D
Set yⱼ = rⱼ if episode terminates at step j+1, otherwise yⱼ = rⱼ + γ maxₐ′ Q̂(φⱼ₊₁, a′; θ⁻)
Perform a gradient descent step on (yⱼ − Q(φⱼ, aⱼ; θ))² with respect to the network parameters θ
Every C steps reset Q̂ = Q
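The inner loop of this pseudocode maps onto code roughly as follows, continuing the sketch above (q_net, target_net, ReplayBuffer). The Gymnasium-style environment, the hyperparameter values, and the use of Adam (the original paper used RMSProp) are assumptions for illustration, not part of the algorithm listing.

```python
import random

import numpy as np
import torch
import torch.nn.functional as F

gamma, epsilon, batch_size, sync_every = 0.99, 0.1, 32, 1000   # illustrative values
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)

def select_action(state, n_actions):
    """ε-greedy: random action with probability ε, otherwise argmaxₐ Q(s, a; θ)."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return int(q_values.argmax(dim=1).item())

def train_step(batch):
    """One gradient descent step on (yⱼ − Q(φⱼ, aⱼ; θ))²."""
    states, actions, rewards, next_states, dones = zip(*batch)
    states = torch.as_tensor(np.array(states), dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(np.array(next_states), dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    # yⱼ = rⱼ for terminal transitions, else rⱼ + γ maxₐ′ Q̂(φⱼ₊₁, a′; θ⁻)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)

    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_values, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Every C steps, reset Q̂ = Q by copying the online weights into the target network:
#   target_net.load_state_dict(q_net.state_dict())
```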