Demon

class Demon(gvf: pandemonium.gvf.GVF, avf: Callable, feature, behavior_policy: pandemonium.policies.policy.Policy, eligibility: Optional[pandemonium.traces.EligibilityTrace])

    Bases: object
General Value Function Approximator
Each demon is an independent reinforcement learning agent responsible for learning one piece of predictive knowledge about the main agent’s interaction with its environment.
The demon learns an approximate value function \(\tilde{V}\) (avf) that estimates the general value function (gvf) corresponding to a particular setting of the three “question” functions: \(\pi\), \(\gamma\), and \(z\). The tools the demon uses to learn this approximation are called “answer” functions and comprise \(\mu\), \(\phi\), and \(\lambda\).
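As an illustrative sketch only (not the library's actual implementation; everything beyond the constructor signature above is an assumption), a demon can be pictured as bundling the question functions, carried by the GVF, with the answer functions it uses to learn:

    # Hypothetical toy demon; attribute roles mirror the docs below.
    class ToyDemon:
        def __init__(self, gvf, avf, feature, behavior_policy, eligibility):
            self.gvf = gvf                          # question: pi, gamma, z
            self.avf = avf                          # answer: approximate value function
            self.feature = feature                  # answer: phi
            self.behavior_policy = behavior_policy  # answer: mu
            self.eligibility = eligibility          # answer: lambda

        def predict(self, x):
            # Value estimate of (the features of) a state
            return self.avf(self.feature(x))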
gvf
    General Value Function to be estimated by the demon

avf
    Approximate Value Function learned by the demon to approximate gvf

φ
    Feature generator learning useful state representations

μ
    Behavior policy that collects experience

λ
    Eligibility trace assigning credit to experiences
Methods Summary
behavior_policy(self, s)
    Specifies the behavior of the agent

delta(self, experience: Union['Transition', 'Trajectory'])
    Specifies the update rule for the approximate value function (avf)

eligibility(self, s)
    Specifies the eligibility trace-decay rate

feature(self, *args, **kwargs)
    A mapping from MDP states to features

learn(self, experience: Union['Transition', 'Trajectory'])

predict(self, x)
    Predict the value (or value distribution) of the state
Methods Documentation
behavior_policy(self, s)

    Specifies the behavior of the agent

    \[\mu: \mathcal{S} \times \mathcal{A} \mapsto [0, 1]\]

    The distribution over all possible motor commands of the agent can be specified in this way.
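For instance, a minimal sketch (not the pandemonium Policy API) of \(\mu\) as an ε-greedy distribution over a discrete action set, assuming action values for state s have already been computed:

    import torch

    def mu(s, q_values: torch.Tensor, epsilon: float = 0.1) -> torch.Tensor:
        # Hypothetical epsilon-greedy behavior policy: returns a probability
        # vector over actions for state s (q_values assumed precomputed).
        n = q_values.numel()
        probs = torch.full((n,), epsilon / n)
        probs[q_values.argmax()] += 1.0 - epsilon
        return probs  # sums to 1: a distribution over motor commands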
delta(self, experience: Union['Transition', 'Trajectory']) → Tuple[Optional[torch.Tensor], dict]

    Specifies the update rule for the approximate value function (avf)

    Depending on whether the algorithm is online or offline, the demon learns from a single Transition or from a whole Trajectory of experiences.
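As a hedged example of the online (single Transition) case, assuming a one-step TD(0) update; the actual rule depends on the concrete demon subclass, and the Transition field names and gvf accessor below are assumptions:

    import torch

    def delta(self, t):
        # Sketch of a one-step TD(0) update for a Transition t with assumed
        # fields (s0, s1, r); these names are not the library's.
        x0, x1 = self.feature(t.s0), self.feature(t.s1)
        v0 = self.avf(x0)
        with torch.no_grad():
            gamma = self.gvf.continuation(t)   # hypothetical accessor for gamma
            target = t.r + gamma * self.avf(x1)
        td_error = target - v0
        loss = 0.5 * td_error.pow(2).mean()
        return loss, {'td_error': td_error.detach()}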
eligibility(self, s)

    Specifies the eligibility trace-decay rate

    \[\lambda: \mathcal{S} \mapsto \mathbb{R}\]
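In the simplest case (a sketch; the actual trace machinery lives in pandemonium.traces.EligibilityTrace), \(\lambda\) is state-independent:

    def eligibility(self, s) -> float:
        # Sketch: a constant trace-decay rate that ignores the state.
        # A state-dependent schedule would inspect s instead.
        return 0.9  # lambda in [0, 1]; value chosen arbitrarily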
feature(self, *args, **kwargs)

    A mapping from MDP states to features

    \[\phi: \mathcal{S} \mapsto \mathbb{R}^n\]

    The feature tensor can be constructed from the robot's external sensor readings (not just the ones corresponding to light). Any representation learning module can be used here.
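A minimal sketch of \(\phi\) as a learnable torch module (the library's concrete feature generators may differ; the dimensions below are assumptions):

    import torch.nn as nn

    # Hypothetical phi: maps raw sensor readings (dim 16, assumed) to an
    # n-dimensional feature vector (n = 32, assumed).
    phi = nn.Sequential(
        nn.Linear(16, 64),
        nn.ReLU(),
        nn.Linear(64, 32),
    )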
learn(self, experience: Union['Transition', 'Trajectory'])
predict(self, x)

    Predict the value (or value distribution) of the state
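In terms of the attributes above, prediction composes \(\phi\) with the approximate value function (a sketch, assuming x is a raw state or observation):

    def predict(self, x):
        # Sketch: v(x) = avf(phi(x)); a distributional demon's avf would
        # return a value distribution instead of a scalar estimate.
        return self.avf(self.feature(x))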