SoftmaxPolicy

class SoftmaxPolicy(temperature: Union[pandemonium.utilities.schedules.ConstantSchedule, pandemonium.utilities.schedules.LinearSchedule], *args, **kwargs)

Bases: pandemonium.policies.discrete.Discrete

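Picks actions with probability proportional to the exponentiated Q-values scaled by the temperature, i.e. P(a) is proportional to exp(Q(a) / temperature). High temperatures make all actions nearly equiprobable; low temperatures approach greedy selection.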

Also called the Boltzmann or Gibbs distribution.
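
As a rough illustration of the Boltzmann rule, the sketch below computes action probabilities from a plain list of Q-values. The function name `softmax_probs` and the use of a float temperature are assumptions made for this example; they are not part of pandemonium, where the temperature is supplied as a schedule object.

```python
import math


def softmax_probs(q_values, temperature=1.0):
    """Return Boltzmann probabilities, p_i proportional to exp(q_i / temperature).

    Hypothetical helper for illustration only; not a pandemonium function.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [q / temperature for q in q_values]
    # Subtracting the maximum avoids overflow and leaves the result unchanged.
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


q = [1.0, 2.0, 3.0]
print(softmax_probs(q, temperature=1.0))    # favours the last action
print(softmax_probs(q, temperature=100.0))  # close to uniform
print(softmax_probs(q, temperature=0.01))   # close to greedy
```

Raising the temperature flattens the distribution, and lowering it concentrates probability on the action with the highest Q-value.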

References

Sutton & Barto, Reinforcement Learning: An Introduction (1st edition), Section 2.3, "Softmax Action Selection"

http://incompleteideas.net/book/ebook/node17.html

Attributes Summary

temperature

Methods Summary

act(self, *args, **kwargs)

Samples an action from a distribution over actions

dist(self, features, q_fn)

Produces a distribution over actions

Attributes Documentation

temperature

Methods Documentation

act(self, *args, **kwargs)

Samples an action from a distribution over actions

dist(self, features, q_fn) → torch.distributions.categorical.Categorical

Produces a distribution over actions
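
The relationship between dist and act can be sketched with a small stand-in class. Everything here is hypothetical: `TinySoftmaxPolicy` is not the pandemonium class, `q_fn` is assumed to return one float per action, the temperature is a plain float rather than a schedule, and a list of probabilities sampled with `random.choices` stands in for the `torch.distributions.categorical.Categorical` object that the real dist returns.

```python
import math
import random


class TinySoftmaxPolicy:
    """Hypothetical stand-in for illustration; not pandemonium's SoftmaxPolicy."""

    def __init__(self, temperature=1.0, seed=None):
        self.temperature = temperature  # assumed to be a positive float here
        self._rng = random.Random(seed)

    def dist(self, features, q_fn):
        # q_fn is assumed to map features to a list with one Q-value per action.
        q_values = q_fn(features)
        scaled = [q / self.temperature for q in q_values]
        shift = max(scaled)  # numerical stability
        exps = [math.exp(s - shift) for s in scaled]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, features, q_fn):
        # Build the distribution, then draw a single action index from it.
        probs = self.dist(features, q_fn)
        return self._rng.choices(range(len(probs)), weights=probs, k=1)[0]


policy = TinySoftmaxPolicy(temperature=0.5, seed=0)
q_fn = lambda features: [1.0, 2.0, 0.5]  # toy Q-function ignoring its input
print(policy.dist(None, q_fn))
print(policy.act(None, q_fn))
```

Keeping dist separate from act means the same distribution can be inspected, for example to read off action probabilities, without drawing a sample.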