1. Markov Decision Processes

An MDP is defined by:

- A set of states $S$
- A set of actions $A$
- A transition function $T(s, a, s')$: the probability of landing in state $s'$ after taking action $a$ from state $s$
- A reward function $R(s, a, s')$
- A start state, and possibly one or more terminal states
- Possibly a discount factor $\gamma$
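For concreteness, the tuple above can be bundled into a small Python container. This is a minimal sketch; the class name and field names are my own choices for illustration, not a standard API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class MDP:
    """Minimal container for the pieces that define an MDP."""
    states: list          # set of states S
    actions: list         # set of actions A
    transition: Callable  # T(s, a, s') -> probability of reaching s' from (s, a)
    reward: Callable      # R(s, a, s') -> immediate reward
    gamma: float          # discount factor in [0, 1)
    horizon: Optional[int] = None  # optional finite horizon m
```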

Q-States (or Action States)

[Figure: MDP transition diagram; green nodes mark the Q-states between a state and its possible successor states.]

The green nodes represent Q-states, where an action has been taken from a state but has yet to be resolved into a successor state. It’s important to understand that agents spend zero timesteps in Q-states, and that they are simply a construct created for ease of representation and development of MDP algorithms.

→ So a Q-state is basically the intermediate (state, action) pair: the transition function $T(s, a, s')$ then gives the probability of each possible successor state $s'$.
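A tiny sketch of how a Q-state resolves into successors, assuming a made-up two-state example (states "A" and "B", action "go" that succeeds with probability 0.8 and slips with probability 0.2):

```python
# The transition function maps a Q-state (state, action) to a
# distribution over successor states.
T = {
    ("A", "go"): [("B", 0.8), ("A", 0.2)],
}

def successors(q_state):
    """Return the (next_state, probability) pairs a Q-state resolves into."""
    return T[q_state]

print(successors(("A", "go")))  # [('B', 0.8), ('A', 0.2)]
```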

Finite Horizons and Discounting

A finite horizon restricts an agent to a fixed number of timesteps $m$; after $m$ steps the episode ends regardless of which state the agent is in.

The discount factor $\gamma \in [0, 1)$ applies an exponential decay to the value of rewards over time: a reward received $t$ timesteps from now is worth $\gamma^t$ times as much as an immediate reward, so the utility of a reward sequence is $U([r_0, r_1, r_2, \dots]) = r_0 + \gamma r_1 + \gamma^2 r_2 + \cdots = \sum_{t=0}^{\infty} \gamma^t r_t$.
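A quick sketch of computing that discounted sum in Python; the reward sequence and $\gamma$ below are made-up numbers for illustration:

```python
# Discounted utility of a reward sequence: U = sum_t gamma^t * r_t.
def discounted_return(rewards, gamma):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# With gamma = 0.5: 1 + 0.5*1 + 0.25*1 = 1.75
print(discounted_return([1, 1, 1], gamma=0.5))
```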