OfflineTD

class OfflineTD(criterion=<function smooth_l1_loss>, **kwargs)

Bases: pandemonium.demons.demon.Demon

Base class for forward-view temporal-difference (TD) methods.

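This class serves as the base for most of the deep reinforcement learning (DRL) algorithms, because forward-view updates are computed over a whole trajectory at once, which fits naturally with batching.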

Methods Summary

`delta`(self, trajectory)	Updates the value of a state using information in the trajectory.
`target`(self, trajectory, v)	Computes discounted returns for each step in the trajectory.

Methods Documentation

delta(self, trajectory: pandemonium.experience.experience.Trajectory) → Tuple[Union[torch.Tensor, NoneType], dict]

Updates the value of a state using information in the trajectory.
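The return annotation of `delta` suggests an optional loss paired with a dictionary of diagnostics. As a rough illustration of that shape, the sketch below computes a smooth L1 loss between value estimates and targets in plain Python. The names `smooth_l1` and `sketch_delta`, the use of plain lists instead of tensors, the `None` result for empty input, and the contents of the info dictionary are all assumptions made for illustration; this is not the library's implementation.

```python
from typing import Dict, List, Optional, Tuple


def smooth_l1(pred: List[float], target: List[float]) -> float:
    """Mean smooth L1 loss with a threshold of 1.0 (hypothetical helper)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for larger errors.
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(pred)


def sketch_delta(
    values: List[float], targets: List[float]
) -> Tuple[Optional[float], Dict[str, float]]:
    """Return (loss, info), or (None, {}) when there is nothing to learn from.

    Assumption: the empty-input behaviour and the info keys are illustrative.
    """
    if not values:
        return None, {}
    loss = smooth_l1(values, targets)
    return loss, {"loss": loss}
```

For example, `sketch_delta([0.0, 2.0], [0.5, 0.0])` averages a quadratic term (0.125) and a linear term (1.5), giving a loss of 0.8125.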

target(self, trajectory: pandemonium.experience.experience.Trajectory, v: torch.Tensor)

Computes discounted returns for each step in the trajectory.
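To make "discounted returns for each step" concrete, here is a minimal sketch of the standard backward recursion G_t = r_t + γ·G_{t+1}, written in plain Python. The function name `discounted_returns`, the `gamma` and `bootstrap` parameters, and the use of lists rather than a `Trajectory` and tensors are assumptions for illustration; how `OfflineTD.target` actually uses `v` is not shown in this page.

```python
from typing import List


def discounted_returns(
    rewards: List[float], gamma: float, bootstrap: float = 0.0
) -> List[float]:
    """Compute a discounted return for every step of a reward sequence.

    `bootstrap` stands in for a value estimate of the state that follows
    the last reward (hypothetical parameter; 0.0 for a terminal state).
    """
    returns = [0.0] * len(rewards)
    g = bootstrap
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Iterating backwards keeps the computation linear in the trajectory length, since each step needs only the return of the step after it.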