ValueReplay
class ValueReplay(replay_buffer: pandemonium.experience.buffers.ER, **kwargs)
Bases: pandemonium.demons.demon.LinearDemon, pandemonium.demons.prediction.OfflineTDPrediction, pandemonium.demons.offline_td.TDn
\(n\)-step TD performed on past experiences.
This demon re-samples recent historical sequences from the behavior policy distribution and performs extra value function regression. It is used in the UNREAL architecture as an auxiliary task that helps representation learning.
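The idea of replaying a stored sequence and regressing the value function toward n-step targets can be sketched in plain Python. This is a minimal illustration, not pandemonium's implementation: the helper names `n_step_returns` and `sample_sequence`, the `(reward, next_state_value)` buffer layout, and the constants are all assumptions made for the example, and episode termination is ignored for brevity.

```python
import random
from collections import deque


def n_step_returns(rewards, bootstrap_value, gamma):
    """Return the targets G_t = r_t + gamma * G_{t+1} for a sequence.

    The recursion is seeded with `bootstrap_value`, the value estimate
    of the state that follows the last transition in the sequence.
    """
    returns = []
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def sample_sequence(buffer, n, rng):
    """Pick a random contiguous window of n stored transitions."""
    start = rng.randrange(len(buffer) - n + 1)
    return [buffer[i] for i in range(start, start + n)]


# Hypothetical replay buffer holding (reward, next_state_value) pairs.
buffer = deque(maxlen=100)
for _ in range(10):
    buffer.append((1.0, 0.0))

rng = random.Random(0)
seq = sample_sequence(buffer, n=3, rng=rng)
rewards = [r for r, _ in seq]
bootstrap = seq[-1][1]  # value estimate after the final transition

# Regression targets for the value function on the replayed sequence.
targets = n_step_returns(rewards, bootstrap, gamma=0.5)
```

The value function would then be fitted to `targets`, for example by minimizing the squared error between its predictions for the replayed states and these returns.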
References
RL with unsupervised auxiliary tasks (Jaderberg et al., 2016)
Methods Summary
learn(self, transitions)
Methods Documentation
learn(self, transitions: List[ForwardRef('Transition')])