The following algorithms are implemented in the Spinning Up package:
on-policy algorithms: don’t use old data → weaker sample efficiency
works out mathematically: the updates need on-policy data
tradeoff: sample efficiency vs stability
off-policy algorithms: can use old data → stronger sample efficiency
exploit Bellman’s equations for optimality
examples: DDPG and Q-Learning
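
To make the "can use old data" point concrete, here is a minimal sketch of tabular Q-Learning that learns only from a fixed buffer of previously collected transitions. It is not taken from the Spinning Up code: the names `q_learning_update`, `replay`, `q_table` and `old_data`, the toy transitions, and the values of `alpha` and `gamma` are all assumptions made for illustration. Each update is one backup toward the Bellman optimality target r + gamma * max_a' Q(s', a').

```python
import random


def q_learning_update(q, transition, alpha=0.5, gamma=0.9):
    """Apply one Bellman optimality backup to the table q, in place."""
    s, a, r, s_next, done = transition
    # Target: r + gamma * max_a' Q(s', a'); no bootstrapping at terminal states.
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


def replay(q, buffer, n_updates, seed=0):
    """Reuse stored transitions by sampling them uniformly at random."""
    rng = random.Random(seed)
    for _ in range(n_updates):
        q_learning_update(q, rng.choice(buffer))
    return q


# Hypothetical toy problem: 2 states, 2 actions, 2 stored transitions.
# Each transition is (state, action, reward, next_state, done).
q_table = [[0.0, 0.0], [0.0, 0.0]]
old_data = [
    (0, 1, 0.0, 1, False),  # from state 0, action 1 leads to state 1
    (1, 0, 1.0, 1, True),   # from state 1, action 0 pays 1 and ends the episode
]
replay(q_table, old_data, n_updates=200)
```

The update never asks which policy produced a transition, which is why the same stored data can be replayed many times. An on-policy method would instead have to collect fresh data after every policy change.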