GVF
-
class GVF(target_policy: pandemonium.policies.policy.Policy, continuation: pandemonium.continuations.ContinuationFunction, cumulant: pandemonium.cumulants.Cumulant)
Bases: object
General Value Function
Consider a stream of data \(\{ (x_t, A_t) \}^{\infty}_{t=0}\) produced by agent-environment interaction. Here, \(x\) is a tensor of experience (see pandemonium.experience.Transition) and \(A\) is an action from a finite action space \(\mathcal{A}\). The target \(G\) is a summary of the future values of the cumulant \(Z\), discounted according to the termination function \(\gamma\):
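\[G_t = Z_{t+1} + \sum_{\tau=t+1}^{\infty} \left( \prod_{k=t+1}^{\tau} \gamma_k \right) Z_{\tau+1}\]
where \(\gamma_k\) is the value of \(\gamma\) at step \(k\), so that each future cumulant is weighted by every continuation value between step \(t\) and the step at which it arrives. A GVF estimates the expected value of the target, given that actions are generated according to the target policy \(\pi\):
\[\mathbb{E}_\pi [G_t \mid S_t = s]\]
To make things more concrete, keep in mind the example of predicting a robot’s light sensor reading as it drives around a room. The method definitions in this abstract class refer back to this example.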
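As a rough illustration of the target, the sketch below computes \(G_t\) for every step of a finite trajectory. It assumes the continuation is applied cumulatively, i.e. the recursive form \(G_t = Z_{t+1} + \gamma_{t+1} G_{t+1}\), and that the return beyond the end of the trajectory is zero. The function name `gvf_returns` and the list-based inputs are hypothetical and not part of pandemonium.

```python
from typing import List, Sequence


def gvf_returns(cumulants: Sequence[float],
                continuations: Sequence[float]) -> List[float]:
    """Compute G_t for each step of a finite trajectory (illustrative only).

    cumulants[t]     plays the role of Z_{t+1}
    continuations[t] plays the role of gamma_{t+1}, which weights
                     everything that arrives after Z_{t+1}
    """
    if len(cumulants) != len(continuations):
        raise ValueError("cumulants and continuations must have equal length")

    returns = [0.0] * len(cumulants)
    g = 0.0  # assumption: nothing is accumulated past the end of the data
    # Work backwards so each step reuses the return of the following step.
    for t in reversed(range(len(cumulants))):
        g = cumulants[t] + continuations[t] * g
        returns[t] = g
    return returns


# Hypothetical light-sensor readings; the last continuation of 0 ends the sum.
light = [1.0, 2.0, 3.0]
gammas = [0.5, 0.5, 0.0]
returns = gvf_returns(light, gammas)
```

Working backwards keeps the computation linear in the trajectory length, at the cost of needing the whole trajectory up front.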
Note
The value produced is not necessarily a scalar. For example, when estimating an action-value function (Q), the result is a row vector with one value per possible action.
Methods Summary
continuation
(self, s) Outputs the continuation signal based on the agent’s observation.
cumulant
(self, s) The signal whose future values are accumulated.
target_policy
(self, s) The policy whose value we would like to learn.
Methods Documentation
-
continuation
(self, s) Outputs the continuation signal based on the agent’s observation.
\[\gamma: \mathcal{S} \mapsto [0, 1]\]
Note that this differs from the constant discount factor \(\gamma\) of an MDP in classic RL: here the continuation, and therefore termination, is allowed to depend on the state.
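To contrast the two notions of \(\gamma\), here is a minimal sketch of a constant discount next to a state-dependent continuation. The dictionary-based state with a `bumped` flag is a hypothetical stand-in for the robot example, and neither function is taken from pandemonium.

```python
def constant_continuation(state, gamma=0.9):
    """Classic MDP-style discounting: the same value in every state."""
    return gamma


def bump_continuation(state, gamma=0.9):
    """State-dependent continuation (hypothetical example).

    Returns 0 when the robot has bumped into a wall, which cuts off the
    accumulation of future cumulants, and `gamma` otherwise.
    """
    return 0.0 if state["bumped"] else gamma


# Hypothetical states along a short drive that ends at a wall.
states = [{"bumped": False}, {"bumped": False}, {"bumped": True}]
constant = [constant_continuation(s) for s in states]
state_dependent = [bump_continuation(s) for s in states]
```

With the state-dependent version, a prediction such as "how much light will I see before I hit a wall" can be posed without changing the environment itself.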
-
cumulant
(self, s) The signal whose future values are accumulated.
\[z: \mathcal{S} \mapsto \mathbb{R}\]
For example, this could be the current light sensor reading of a robot.
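A cumulant can be any real-valued function of the state. The sketch below shows two hypothetical choices for the robot example, assuming each observation is a dictionary with a `light` entry; this representation and both function names are illustrative and not part of pandemonium.

```python
def light_cumulant(observation):
    """z(s): the raw light-sensor reading as a real number."""
    return float(observation["light"])


def bright_cumulant(observation, threshold=0.5):
    """z(s): 1.0 when the reading exceeds a threshold, else 0.0."""
    return 1.0 if observation["light"] > threshold else 0.0


# Hypothetical observations collected while driving around the room.
observations = [{"light": 0.2}, {"light": 0.7}, {"light": 0.9}]
readings = [light_cumulant(o) for o in observations]
bright = [bright_cumulant(o) for o in observations]
```

The first cumulant leads to a prediction of accumulated light intensity, while the second leads to a prediction of how often the sensor will read as bright.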
-
target_policy
(self, s) → torch.distributions.distribution.Distribution The policy whose value we would like to learn.
\[\pi: \mathcal{S} \times \mathcal{A} \mapsto [0, 1]\]
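To make the mapping \(\pi\) concrete, the sketch below represents a policy as a function from a state to a probability for each action. According to the signature above, the library method returns a `torch.distributions` distribution; the plain dictionaries used here are a dependency-free stand-in, and the action names and both policies are hypothetical.

```python
ACTIONS = ("forward", "left", "right")  # hypothetical finite action space


def uniform_policy(state):
    """Assign the same probability to every action, regardless of state."""
    p = 1.0 / len(ACTIONS)
    return {action: p for action in ACTIONS}


def mostly_forward_policy(state, epsilon=0.1):
    """Drive forward most of the time, spreading `epsilon` over all actions."""
    probs = {action: epsilon / len(ACTIONS) for action in ACTIONS}
    probs["forward"] += 1.0 - epsilon
    return probs


state = {"light": 0.4}  # the state is ignored by both example policies
uniform = uniform_policy(state)
mostly_forward = mostly_forward_policy(state)
```

Whatever the representation, the probabilities for a given state lie in \([0, 1]\) and sum to one over the action space.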
-