TDn
class TDn(**kwargs)
Bases: pandemonium.demons.offline_td.TTD
\(n\)-step \(\TD\) for estimating \(V \approx v_{\pi}\)
Targets are calculated using the forward view from \(n\)-step returns. \(n\)-step \(\TD\) is a special case of truncated \(\TD\) with \(\lambda=1\).
The value of \(n\) is not set explicitly: it is determined implicitly by the length of the trajectory, which in turn depends on rollout_fragment_length.
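As an illustration of the forward-view targets described above, the sketch below computes \(G_t = \sum_{k=0}^{n-1} \gamma^{k} R_{t+k+1} + \gamma^{n} V(S_{t+n})\) for every step of a single trajectory fragment, with \(n\) equal to the number of steps remaining in the fragment. The function name n_step_targets and its arguments (rewards, bootstrap_value, gamma, dones) are hypothetical and are not part of the pandemonium API; this is a minimal sketch under those assumptions, not the library's implementation.

```python
def n_step_targets(rewards, bootstrap_value, gamma=0.99, dones=None):
    """Forward-view n-step targets for one trajectory fragment.

    Hypothetical helper, not part of pandemonium.

    rewards[t]       reward received after acting at step t
    bootstrap_value  value estimate V(s_T) of the state after the last step
    dones[t]         optional; True if the transition at step t ended the episode
    """
    targets = [0.0] * len(rewards)
    g = bootstrap_value
    # Walk backwards so each target reuses the return of the following step.
    for t in reversed(range(len(rewards))):
        if dones is not None and dones[t]:
            # Nothing after a terminal transition contributes to the return.
            g = 0.0
        g = rewards[t] + gamma * g
        targets[t] = g
    return targets


targets = n_step_targets([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
print(targets)  # [3.0, 4.0, 6.0]
```

In this sketch the effective \(n\) differs per step: the first step of a fragment of length 3 uses a 3-step return, the last step a 1-step return. A longer fragment therefore yields longer returns for the early steps.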
- TODO: clarify the relationship between n-step, rollout_fragment_length, batch_size, training_iteration