OnlineTD

class OnlineTD(trace_decay: float, eligibility: Type[pandemonium.traces.EligibilityTrace] = <class 'pandemonium.traces.AccumulatingTrace'>, criterion: callable = <function smooth_l1_loss>, **kwargs)

Bases: pandemonium.demons.demon.Demon

Base class for backward-view (online) temporal-difference (TD) methods.

Methods Summary

`delta`(self, t)	Specifies the update rule for the approximate value function (AVF).
`learn`(self, t)
`target`(self, t, v)	Computes the one-step update target.

Methods Documentation

delta(self, t: pandemonium.experience.experience.Transition) → Tuple[Union[torch.Tensor, NoneType], dict]

Specifies the update rule for the approximate value function (AVF).

Since the algorithms in this family are online, the update rule is applied on every Transition.
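
As an illustration of what a per-transition, backward-view update can look like, the sketch below applies one textbook TD(λ) step with an accumulating eligibility trace to a linear value function. It does not reproduce pandemonium's implementation: the function `td_lambda_step`, its arguments, and the plain-list representation are hypothetical names chosen for this example, and the squared-error-style update stands in for whatever `criterion` the demon is configured with.

```python
from typing import List, Sequence, Tuple


def td_lambda_step(
    w: Sequence[float],        # weights of a linear value function (assumption)
    z: Sequence[float],        # eligibility trace, same length as w
    x: Sequence[float],        # features of the current state
    x_next: Sequence[float],   # features of the next state
    reward: float,
    gamma: float,              # discount factor
    trace_decay: float,        # lambda
    alpha: float,              # step size
    done: bool,                # True if the transition ends the episode
) -> Tuple[List[float], List[float], float]:
    """Apply one online TD(lambda) update and return (w, z, td_error)."""
    v = sum(wi * xi for wi, xi in zip(w, x))
    # Terminal states are assigned a value of zero.
    v_next = 0.0 if done else sum(wi * xi for wi, xi in zip(w, x_next))
    td_error = reward + gamma * v_next - v
    # Accumulating trace: decay the old trace, then add the gradient of v,
    # which for a linear function is just the feature vector x.
    new_z = [gamma * trace_decay * zi + xi for zi, xi in zip(z, x)]
    # Every weight moves in proportion to its eligibility.
    new_w = [wi + alpha * td_error * zi for wi, zi in zip(w, new_z)]
    return new_w, new_z, td_error


# Usage: the update runs once per transition, carrying w and z forward.
w, z = [0.0, 0.0], [0.0, 0.0]
w, z, err = td_lambda_step(w, z, [1.0, 0.0], [0.0, 1.0], 1.0, 0.9, 0.5, 0.1, False)
```

Because the trace is carried from one call to the next, earlier states keep receiving credit for later TD errors without any stored trajectory, which is what makes the backward view suitable for online learning.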

learn(self, t: pandemonium.experience.experience.Transition)

target(self, t: pandemonium.experience.experience.Transition, v: torch.Tensor)

Computes the one-step update target.
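
The page does not spell out the formula, so the following is only a sketch of the standard one-step TD target from the literature, \(r + \gamma \, v(s')\), with bootstrapping switched off at terminal states. The function `one_step_target` and its scalar arguments are hypothetical; how `OnlineTD.target` reads the reward, discount, and next-state value from a `Transition` is not shown here and may differ.

```python
def one_step_target(reward: float, gamma: float, v_next: float, done: bool) -> float:
    """Standard one-step TD target: r + gamma * v(s'), or just r at episode end."""
    # No bootstrapping from the next state once the episode has terminated.
    bootstrap = 0.0 if done else gamma * v_next
    return reward + bootstrap


# Usage: the TD error is the gap between this target and the current estimate.
current_value = 2.0
td_error = one_step_target(1.0, 0.9, 2.0, done=False) - current_value
```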