The distributional shift problem

*(figure: a small policy error takes the agent off the expert's trajectory, and the errors compound)*

Terminology

So what is the distributional shift problem?

The problem is that these two distributions are not the same ($p_{\text{data}}(o_t) \neq p_{\pi_\theta}(o_t)$): the agent is tested in situations it was never trained on.

For example, the image above shows how even a tiny error by the agent can send it into a situation the expert never encountered (i.e., never observed). In this new, unfamiliar state, the agent doesn't know what to do, so it likely makes even bigger mistakes, which push it into states that are more unfamiliar still, and so on. This compounding of errors is called distributional shift.
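This compounding can be made concrete with a toy simulation (a sketch of my own, not from the source: the per-step error rates `eps_on`/`eps_off` and the "one mistake puts you off-distribution" rule are simplifying assumptions). The agent errs rarely while it stays in familiar states, but once a single mistake pushes it off-distribution, its error rate jumps, so the expected number of mistakes grows faster than linearly with the horizon:

```python
import random

def rollout(horizon, eps_on, eps_off, seed=None):
    """One episode of a toy imitation setting (hypothetical).

    The agent is 'on-distribution' until its first mistake; after that it is
    off-distribution and errs with the higher probability eps_off."""
    rng = random.Random(seed)
    on_distribution = True
    mistakes = 0
    for _ in range(horizon):
        p_err = eps_on if on_distribution else eps_off
        if rng.random() < p_err:
            mistakes += 1
            on_distribution = False  # a single error leaves the expert's distribution
    return mistakes

def mean_mistakes(horizon, eps_on=0.01, eps_off=0.3, episodes=20000, seed=0):
    """Monte Carlo estimate of the expected number of mistakes per episode."""
    rng = random.Random(seed)
    return sum(rollout(horizon, eps_on, eps_off, rng.random())
               for _ in range(episodes)) / episodes
```

Doubling the horizon more than doubles `mean_mistakes`, because a longer episode both makes the first mistake more likely and leaves more steps to be spent off-distribution afterwards.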

Ok, let's define more precisely what we want, then.

While the training process focuses on mimicking the expert's actions on the training data (supervised learning), what we really want is for the agent to perform the task well in the real world.

To measure this, a cost function $c(s, a)$ is introduced that simply counts mistakes: $c(s, a) = 0$ if $a$ matches the expert's action in state $s$, and $c(s, a) = 1$ otherwise.

The true goal is to minimize the expected total cost (number of mistakes) under the agent's own distribution of states, $p_{\pi_\theta}$, not the expert's training-data distribution: $\min_\theta \sum_t \mathbb{E}_{s_t \sim p_{\pi_\theta}(s_t)}\left[c(s_t, a_t)\right]$.
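The gap between the two objectives can be seen numerically with a toy example (my own illustration; the state ranges and error rates below are made-up numbers, not from the source). The clone errs rarely on states covered by the expert data, but often on states it never saw; evaluating the same mistake indicator under $p_{\text{data}}$ versus under $p_{\pi_\theta}$ gives very different answers:

```python
import random

def expected_cost(state_dist, err_prob, samples=50000, seed=0):
    """Monte Carlo estimate of E_{s ~ state_dist}[c(s, pi(s))], where the cost
    c is 1 on a mistake and 0 otherwise (toy setup)."""
    rng = random.Random(seed)
    mistakes = 0
    for _ in range(samples):
        s = state_dist(rng)
        mistakes += rng.random() < err_prob(s)  # mistake with probability err_prob(s)
    return mistakes / samples

# Expert data covers states 0..49, where the clone errs 1% of the time;
# on states 50..99, which it never saw during training, it errs 50% of the time.
err_prob = lambda s: 0.01 if s < 50 else 0.50
p_data = lambda rng: rng.randrange(50)    # expert's state distribution
p_pi   = lambda rng: rng.randrange(100)   # agent's own distribution, drifted off-data

train_loss = expected_cost(p_data, err_prob)  # low: supervised loss looks great
true_cost  = expected_cost(p_pi, err_prob)    # much higher: what we actually pay
```

The supervised objective (`train_loss`, around 0.01 here) says the clone is nearly perfect, while the cost under the agent's own state distribution (`true_cost`, around 0.26) is an order of magnitude worse, which is exactly why minimizing under $p_{\pi_\theta}$ is the right goal.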