HierarchicalPolicy

class HierarchicalPolicy(option_space: pandemonium.utilities.spaces.OptionSpace)

Bases: pandemonium.policies.discrete.Discrete

A decision rule for discrete option spaces.

To produce an action \(a\), this policy first picks an option \(ω\) from the space of available options \(Ω\). To pick an option, it queries the initiation set \(I\) of each available option and selects the one with the highest score. It then uses the internal policy \(π\) of the chosen option to produce the action that the agent takes.
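The two-step selection described above can be sketched in plain Python. This is a minimal illustration, not the pandemonium implementation: the `Option` dataclass, its `initiation` and `policy` fields, and the `hierarchical_act` function are hypothetical names, and the initiation set is assumed to return a single numeric score per state.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class Option:
    """Hypothetical stand-in for an option ω; not the pandemonium class."""
    name: str
    initiation: Callable[[Any], float]  # I(s): assumed to score how suitable ω is in state s
    policy: Callable[[Any], int]        # π(s): internal policy, returns a discrete action


def hierarchical_act(options: Dict[str, Option], state: Any) -> Tuple[str, int]:
    """Pick the option with the highest initiation score, then act with it."""
    if not options:
        raise ValueError("option space is empty")
    # Step 1: query I for every available option and keep the best-scoring one.
    # On ties, max() keeps the first option encountered.
    chosen = max(options.values(), key=lambda o: o.initiation(state))
    # Step 2: delegate action selection to the chosen option's internal policy.
    return chosen.name, chosen.policy(state)


# Toy option space over a scalar state in [0, 1].
options = {
    "go_left": Option("go_left", initiation=lambda s: 1.0 - s, policy=lambda s: 0),
    "go_right": Option("go_right", initiation=lambda s: s, policy=lambda s: 1),
}
print(hierarchical_act(options, state=0.8))  # go_right has the higher score here
```

The sketch re-selects an option on every call. It does not model option termination, so an agent built this way could switch options at every step.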

Todo

Currently starts with a random option from the space. It may be better to wait for the initial state and then pick the option with the highest interest.

Todo

Explore the interplay between the initiation of one option and the termination of another.

Todo

It might be better to move OptionSpace into this file, since it is discrete at the moment.

Methods Summary

act(self, state, vf)

Samples an action from a distribution over actions

dist(self, *args, **kwargs)

Produces a distribution over actions

Methods Documentation

act(self, state, vf)

Samples an action from a distribution over actions

dist(self, *args, **kwargs) → torch.distributions.distribution.Distribution

Produces a distribution over actions