SARSE

class SARSE(**kwargs)

Bases: pandemonium.demons.control.TDControl
Semi-gradient Expected \(\SARSA{(\lambda)}\).
References

- “Reinforcement Learning: An Introduction”, Sutton and Barto (2018), ch. 6.6.
  http://incompleteideas.net/book/the-book.html
- “A Theoretical and Empirical Analysis of Expected Sarsa”, Harm van Seijen et al. (2009).
  http://www.cs.ox.ac.uk/people/shimon.whiteson/pubs/vanseijenadprl09.pdf
Methods Summary

- q_t(self, exp: Union[Transition, Trajectory])
  Computes action-value targets \(Q(s_{t+1}, \hat{a})\).
Methods Documentation

q_t(self, exp: Union[Transition, Trajectory])

Computes action-value targets \(Q(s_{t+1}, \hat{a})\).
Algorithms differ in the way \(\hat{a}\) is chosen.
\[\begin{split}\begin{align*}
\text{Q-learning} &: \hat{a} = \argmax_{a \in \mathcal{A}} Q(s_{t+1}, a) \\
\SARSA &: \hat{a} = \mu(s_{t+1})
\end{align*}\end{split}\]
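Since SARSE implements Expected SARSA, the bootstrap value replaces the single sampled or greedy action with an expectation of \(Q(s_{t+1}, \cdot)\) under the policy, i.e. \(\sum_{a} \pi(a \mid s_{t+1}) Q(s_{t+1}, a)\). Below is a minimal sketch of that computation in PyTorch; the helper name ``expected_sarsa_target`` and its arguments are hypothetical illustrations, not part of pandemonium's API:

    import torch

    def expected_sarsa_target(q_next, policy_probs):
        # Expected SARSA target: E_{a ~ pi}[Q(s_{t+1}, a)]
        # q_next:       (batch, n_actions) action values Q(s_{t+1}, .)
        # policy_probs: (batch, n_actions) probabilities pi(. | s_{t+1})
        return (policy_probs * q_next).sum(dim=-1)

    # Toy batch of 2 states with 3 actions each
    q_next = torch.tensor([[1.0, 0.0, 2.0],
                           [0.5, 0.5, 0.5]])
    policy_probs = torch.tensor([[0.1, 0.1, 0.8],
                                 [1/3, 1/3, 1/3]])
    print(expected_sarsa_target(q_next, policy_probs))  # tensor([1.7000, 0.5000])

Averaging over the policy removes the sampling noise introduced by the next action, which is the variance-reduction argument analysed in van Seijen et al. (2009).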