In a nutshell, RL is the study of agents and how they learn by trial and error. It formalizes the idea that rewarding or punishing an agent for its behavior makes it more likely to repeat or forego that behavior in the future.
A state $s$ is a complete description of the state of the world. There is no information about the world which is hidden from the state.
An observation $o$ is a partial description of a state, which may omit information.
When the agent is able to observe the complete state of the environment, we say that the environment is fully observed. When the agent can only see a partial observation, we say that the environment is partially observed.
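As a concrete illustration (a minimal sketch, assuming the Gymnasium library and its CartPole-v1 environment), the array returned by env.reset() is the agent's observation; for CartPole it happens to contain the full physical state, so this environment is fully observed:

import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
# For CartPole the observation contains the complete physical state:
# cart position, cart velocity, pole angle, pole angular velocity.
print(obs.shape)  # (4,)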
The action space is the set of all valid actions in a given environment; individual actions are denoted $a$. Some environments, such as Atari games and Go, have discrete action spaces (a finite number of moves), while others, such as robotic control tasks, have continuous action spaces (real-valued vectors).
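A quick way to inspect this is via the environment's action_space attribute (a sketch assuming Gymnasium; CartPole-v1 and Pendulum-v1 are just example environment IDs):

import gymnasium as gym

# Discrete action space: two valid actions (push cart left or right).
print(gym.make("CartPole-v1").action_space)  # Discrete(2)

# Continuous action space: a 1-dimensional torque in [-2, 2].
print(gym.make("Pendulum-v1").action_space)  # Box(-2.0, 2.0, (1,), float32)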
A policy $\pi_\theta(a|s)$ is the rule an agent uses to decide which actions to take; the subscript $\theta$ denotes the parameters (e.g. neural-network weights) that we adjust during learning. We can think of the policy as the brain of the agent, and in many cases the words "agent" and "policy" are used interchangeably.
e.g. a feed-forward MLP (multi-layer perceptron):

import torch.nn as nn

# A stochastic policy network: maps an observation to one logit per action.
pi_net = nn.Sequential(
    nn.Linear(obs_dim, 64),
    nn.Tanh(),
    nn.Linear(64, 64),
    nn.Tanh(),
    nn.Linear(64, act_dim),
)
Here obs_dim is the dimension of the observation space and act_dim is the dimension of the action space. If obs is a NumPy array containing a batch of observations, pi_net can be used to obtain a batch of actions.
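To make the usage concrete, here is a minimal sketch of sampling actions from pi_net, assuming a discrete action space and a randomly generated batch standing in for real observations:

import numpy as np
import torch
from torch.distributions.categorical import Categorical

# Reusing pi_net, obs_dim, and act_dim from the snippet above,
# with a fake batch of 8 observations in place of real environment data.
obs = np.random.rand(8, obs_dim).astype(np.float32)
logits = pi_net(torch.as_tensor(obs))          # shape (8, act_dim)
actions = Categorical(logits=logits).sample()  # one action index per observation

Sampling from the Categorical distribution (rather than taking an argmax) is what makes this a stochastic policy: the logits define a probability over actions for each observation.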