OnlineTD

class OnlineTD(trace_decay: float, eligibility: Type[pandemonium.traces.EligibilityTrace] = <class 'pandemonium.traces.AccumulatingTrace'>, criterion: callable = <function smooth_l1_loss>, **kwargs)

Bases: pandemonium.demons.demon.Demon

Base class for backward-view (online) TD methods.
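Backward-view methods typically keep an eligibility trace alongside the weights, decay it on every step and bump it by the current gradient. The sketch below illustrates an accumulating trace in plain Python. The class name `AccumulatingTraceSketch` and its `update` method are hypothetical and are not the API of `pandemonium.traces.AccumulatingTrace`; the discount `gamma` is assumed to be supplied by the caller.

```python
class AccumulatingTraceSketch:
    """Hypothetical stand-in, not pandemonium.traces.AccumulatingTrace."""

    def __init__(self, size, trace_decay):
        self.trace_decay = trace_decay  # lambda in TD(lambda)
        self.values = [0.0] * size      # one trace entry per weight

    def update(self, gamma, grad):
        # Accumulating trace: e <- gamma * lambda * e + grad
        decay = gamma * self.trace_decay
        self.values = [decay * e + g for e, g in zip(self.values, grad)]
        return self.values


trace = AccumulatingTraceSketch(size=2, trace_decay=0.5)
first = trace.update(gamma=1.0, grad=[1.0, 0.0])
second = trace.update(gamma=1.0, grad=[1.0, 0.0])
```

Visiting the same feature twice in a row grows its trace entry, while the entry of the unvisited feature stays at zero.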

Methods Summary

delta(self, t)

Specifies the update rule for the approximate value function (avf).

learn(self, t)

target(self, t, v)

Computes the one-step update target.

Methods Documentation

delta(self, t: pandemonium.experience.experience.Transition) → Tuple[Union[torch.Tensor, NoneType], dict]

Specifies the update rule for the approximate value function (avf).

Since the algorithms in this family are online, the update rule is applied on every Transition.
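To illustrate what applying the rule on every transition can look like, here is a minimal tabular TD(0) sketch without eligibility traces. It is not the library's `delta` implementation: the `Step` record, the `online_update` function and the `alpha` and `gamma` parameters are all hypothetical names introduced for this example.

```python
from collections import namedtuple

# Hypothetical transition record; pandemonium's Transition may have other fields.
Step = namedtuple("Step", ["state", "reward", "next_state", "done"])


def online_update(values, step, alpha=0.5, gamma=1.0):
    """Apply one tabular TD(0) update in place and return the TD error."""
    # Do not bootstrap from the next state once the episode has ended.
    bootstrap = 0.0 if step.done else values[step.next_state]
    td_error = step.reward + gamma * bootstrap - values[step.state]
    values[step.state] += alpha * td_error
    return td_error


values = [0.0, 0.0]
episode = [Step(0, 0.0, 1, False), Step(1, 1.0, 1, True)]
# The value table changes after each transition, not once per episode.
errors = [online_update(values, step) for step in episode]
```

Because each update happens immediately, later transitions in the same episode already see the adjusted estimates.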

learn(self, t: pandemonium.experience.experience.Transition)

target(self, t: pandemonium.experience.experience.Transition, v: torch.Tensor)

Computes the one-step update target.
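As a rough illustration, a one-step target is commonly formed as the reward plus the discounted value estimate of the next state. The sketch below assumes that `v` plays the role of that next-state estimate and that the discount is passed in explicitly. The function `one_step_target` and its parameters are hypothetical and do not reproduce the method's actual body.

```python
def one_step_target(reward, gamma, v_next):
    """Return reward + gamma * v_next, a bootstrapped one-step target."""
    return reward + gamma * v_next


# Non-terminal step: bootstrap from the next state's value estimate.
bootstrapped = one_step_target(reward=1.0, gamma=0.5, v_next=2.0)
# A discount of zero removes the bootstrap term, as on a terminal step.
terminal = one_step_target(reward=1.0, gamma=0.0, v_next=2.0)
```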