OnlineTD
class OnlineTD(trace_decay: float, eligibility: Type[pandemonium.traces.EligibilityTrace] = <class 'pandemonium.traces.AccumulatingTrace'>, criterion: callable = <function smooth_l1_loss>, **kwargs)
Bases: pandemonium.demons.demon.Demon
Base class for backward-view (online) TD methods.
Methods Summary
delta(self, t)        Specifies the update rule for the approximate value function (avf).
learn(self, t)
target(self, t, v)    Computes the one-step update target.
Methods Documentation
delta(self, t: pandemonium.experience.experience.Transition) → Tuple[Union[torch.Tensor, NoneType], dict]
Specifies the update rule for the approximate value function (avf).
Since the algorithms in this family are online, the update rule is applied on every Transition.
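To illustrate what a per-transition, backward-view update can look like, here is a minimal sketch of a textbook TD(λ) step with an accumulating eligibility trace and a linear value function. It is not pandemonium's implementation: the function `td_lambda_step`, its plain-list arguments, and the `gamma` and `lr` parameters are all hypothetical names chosen for this example. Only `trace_decay` and the accumulating trace correspond to concepts named in the signature above.

```python
from typing import List, Tuple


def td_lambda_step(
    w: List[float],          # weights of a linear value function (assumption)
    z: List[float],          # eligibility trace, same length as w
    x: List[float],          # features of the current state
    r: float,                # reward observed on this transition
    x_next: List[float],     # features of the next state
    done: bool,              # True if the next state is terminal
    gamma: float = 0.9,      # discount factor (hypothetical default)
    trace_decay: float = 0.8,  # lambda, the trace decay rate
    lr: float = 0.1,         # step size (hypothetical default)
) -> Tuple[List[float], List[float], float]:
    """Apply one backward-view TD(lambda) update and return (w, z, td_error)."""
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = 0.0 if done else sum(wi * xi for wi, xi in zip(w, x_next))

    # One-step TD error for this transition.
    td_error = r + gamma * v_next - v

    # Accumulating trace: decay the old trace, then add the gradient of
    # v(s) with respect to w, which is simply x for a linear function.
    z = [gamma * trace_decay * zi + xi for zi, xi in zip(z, x)]

    # Every weight moves in proportion to its eligibility.
    w = [wi + lr * td_error * zi for wi, zi in zip(w, z)]
    return w, z, td_error


# Two consecutive transitions, updating after each one.
w, z = [0.0, 0.0], [0.0, 0.0]
w, z, err1 = td_lambda_step(w, z, x=[1.0, 0.0], r=1.0, x_next=[0.0, 1.0], done=False)
w, z, err2 = td_lambda_step(w, z, x=[0.0, 1.0], r=0.0, x_next=[0.0, 0.0], done=True)
```

Because the update is applied after each transition, no trajectory has to be stored. The trace `z` carries credit back to recently visited features instead.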
learn(self, t: pandemonium.experience.experience.Transition)
target(self, t: pandemonium.experience.experience.Transition, v: torch.Tensor)
Computes the one-step update target.
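For reference, the standard one-step TD target is the reward plus the discounted value estimate of the next state. The sketch below shows that formula in isolation. It assumes that `v` plays the role of the bootstrapped next-state value, which the documentation above does not confirm. The function `one_step_target` and its scalar arguments are hypothetical and do not mirror the library's `Transition` or tensor types.

```python
def one_step_target(reward: float, gamma: float, v_next: float, done: bool = False) -> float:
    """Return r + gamma * v(s'), dropping the bootstrap term at episode end."""
    # Assumption: a terminal next state has value zero, so only the reward remains.
    bootstrap = 0.0 if done else gamma * v_next
    return reward + bootstrap


# Non-terminal transition: the target bootstraps from the next-state estimate.
y = one_step_target(reward=1.0, gamma=0.9, v_next=2.0)

# Terminal transition: the target is just the reward.
y_terminal = one_step_target(reward=1.0, gamma=0.9, v_next=2.0, done=True)
```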