TDn

class TDn(**kwargs)

Bases: pandemonium.demons.offline_td.TTD

\(n\)-step \(\TD\) for estimating \(V \approx v_{\pi}\)

Targets are computed with the forward view from \(n\)-step returns, where \(n\) is determined by the length of the trajectory. \(n\)-step \(\TD\) is the special case of truncated \(\TD\) with \(\lambda=1\).
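To make the \(\lambda=1\) claim concrete, the two returns can be written out in the standard textbook notation (Sutton and Barto); the symbols \(h\) for the truncation horizon and \(G\) for the return are assumptions of this note and do not appear in the class itself:

\[
G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^{n} V(S_{t+n})
\]

\[
G_{t:h}^{\lambda} = (1-\lambda) \sum_{n=1}^{h-t-1} \lambda^{n-1} G_{t:t+n} + \lambda^{h-t-1} G_{t:h}
\]

Setting \(\lambda=1\) makes the weighted sum vanish and leaves \(G_{t:h}^{\lambda} = G_{t:h}\), which is the \((h-t)\)-step return bootstrapped from \(V(S_h)\).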

The value of \(n\) is not set explicitly: it follows from the length of the trajectory, which in turn depends on rollout_fragment_length.
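A minimal sketch of how such targets could be computed for one trajectory fragment is given below. It is an illustration under stated assumptions, not the code used by this class: the function name `n_step_targets` and its parameters are hypothetical, the fragment is assumed to contain no episode boundary, and `bootstrap_value` stands for the value estimate of the state that follows the last transition.

```python
from typing import List, Sequence


def n_step_targets(
    rewards: Sequence[float],
    bootstrap_value: float,
    gamma: float,
) -> List[float]:
    """Forward-view targets for one trajectory fragment (hypothetical helper).

    rewards[t] is the reward received after the t-th state of the fragment.
    bootstrap_value is V(s) for the state that follows the last transition;
    pass 0.0 if the fragment ends in a terminal state.
    """
    targets = [0.0] * len(rewards)
    g = bootstrap_value
    # Walk backwards so each target reuses the return of its successor:
    # G_t = r_t + gamma * G_{t+1}, with the bootstrap value seeding the recursion.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        targets[t] = g
    return targets


# A fragment of length 3: the first state gets a 3-step return,
# the second a 2-step return, the last a 1-step return.
targets = n_step_targets([1.0, 0.0, 2.0], bootstrap_value=4.0, gamma=0.5)
print(targets)  # [2.0, 2.0, 4.0]
```

In this sketch no `n` is passed in: a fragment of length \(T\) gives the state at index \(t\) a \((T-t)\)-step return, so a longer fragment simply yields longer returns.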

TODO: clarify the relationship between n-step, rollout_fragment_length, batch_size, and training_iteration.