ValueReplay

class ValueReplay(replay_buffer: pandemonium.experience.buffers.ER, **kwargs)

Bases: pandemonium.demons.demon.LinearDemon, pandemonium.demons.prediction.OfflineTDPrediction, pandemonium.demons.offline_td.TDn

\(n\)-step TD performed on past experiences

This demon resamples recent sequences of experience, collected under the behavior policy, from the replay buffer and performs additional value-function regression on them. It is used in the UNREAL architecture as an auxiliary task that aids representation learning.
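The regression targets for such a replayed sequence can be illustrated with a small standalone sketch. This is not pandemonium's implementation: the function name `n_step_targets`, its arguments, and the default discount are hypothetical and chosen only for illustration. It assumes a sequence is given as per-step rewards and termination flags, plus a bootstrap value estimate for the state that follows the last step.

```python
from typing import List, Sequence


def n_step_targets(rewards: Sequence[float],
                   dones: Sequence[bool],
                   bootstrap_value: float,
                   gamma: float = 0.99) -> List[float]:
    """Compute discounted return targets for one replayed sequence.

    Hypothetical helper, not part of pandemonium.
    rewards[t] and dones[t] describe the transition taken at step t;
    bootstrap_value is the value estimate of the state reached after
    the final step of the sequence.
    """
    targets = [0.0] * len(rewards)
    g = bootstrap_value
    # Walk backwards so each target reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        if dones[t]:
            # Episode ended at step t: nothing to bootstrap from.
            g = 0.0
        g = rewards[t] + gamma * g
        targets[t] = g
    return targets


# Three-step sequence, no termination, bootstrapping from a value of 10.
example = n_step_targets([1.0, 0.0, 2.0], [False, False, False],
                         bootstrap_value=10.0, gamma=0.5)
print(example)  # [2.75, 3.5, 7.0]
```

Under these assumptions, the value function would then be regressed toward the targets (for example with a squared-error loss) for each state in the sampled sequence.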

References

Reinforcement Learning with Unsupervised Auxiliary Tasks (Jaderberg et al., 2016)

Methods Summary

learn(self, transitions)

Methods Documentation

learn(self, transitions: List[ForwardRef('Transition')])