SoftmaxPolicy
class SoftmaxPolicy(temperature: Union[pandemonium.utilities.schedules.ConstantSchedule, pandemonium.utilities.schedules.LinearSchedule], *args, **kwargs)
Bases: pandemonium.policies.discrete.Discrete
Picks actions with probability proportional to the exponential of their Q-values, scaled by the temperature.
Also called the Boltzmann or Gibbs distribution.
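As a rough illustration of how temperature shapes a Boltzmann distribution, the sketch below computes action probabilities from a list of Q-values in plain Python. The helper name `boltzmann_probs` is hypothetical, and dividing the Q-values by the temperature is the textbook formulation, assumed here rather than taken from the pandemonium source.

```python
import math


def boltzmann_probs(q_values, temperature):
    """Softmax over q_values / temperature (hypothetical helper, not library code)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Assumption: Q-values are divided by the temperature before exponentiation.
    scaled = [q / temperature for q in q_values]
    # Subtract the maximum so that exp() cannot overflow.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


q = [1.0, 2.0, 3.0]
hot = boltzmann_probs(q, temperature=10.0)   # high temperature: close to uniform
cold = boltzmann_probs(q, temperature=0.1)   # low temperature: close to greedy
```

With a high temperature the three actions receive similar probabilities, while a low temperature concentrates almost all of the mass on the action with the largest Q-value.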
References
- Sutton & Barto Section 2.3
Attributes Summary
temperature
Methods Summary
act(self, *args, **kwargs): Samples an action from a distribution over actions.
dist(self, features, q_fn): Produces a distribution over actions.
Attributes Documentation
temperature
Methods Documentation
act(self, *args, **kwargs)
Samples an action from a distribution over actions.
dist(self, features, q_fn) → torch.distributions.categorical.Categorical
Produces a distribution over actions.
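The relationship between `dist` and `act` might look roughly like the sketch below. `TinySoftmaxPolicy` is a hypothetical stand-in, not the pandemonium class: it takes a plain float where the real policy takes a schedule object, it assumes `q_fn` maps a batch of features to one Q-value per action, and its `act` signature is a guess, since the documented one is only `(*args, **kwargs)`.

```python
import torch
from torch.distributions import Categorical


class TinySoftmaxPolicy:
    """Hypothetical stand-in for illustration; not the pandemonium implementation."""

    def __init__(self, temperature: float):
        # Assumption: a fixed float instead of a ConstantSchedule or LinearSchedule.
        self.temperature = temperature

    def dist(self, features: torch.Tensor, q_fn) -> Categorical:
        # Assumption: q_fn returns a tensor of shape (batch, n_actions).
        q_values = q_fn(features)
        # Categorical applies a softmax to the logits internally.
        return Categorical(logits=q_values / self.temperature)

    def act(self, features: torch.Tensor, q_fn) -> torch.Tensor:
        # Sample one action index per row of the batch.
        return self.dist(features, q_fn).sample()


q_fn = torch.nn.Linear(4, 3)        # toy Q-function: 4 features, 3 actions
policy = TinySoftmaxPolicy(temperature=0.5)
features = torch.randn(2, 4)        # batch of 2 states
pi = policy.dist(features, q_fn)
action = policy.act(features, q_fn)
```

Returning a `Categorical` from `dist` keeps sampling, log-probabilities, and entropy available from a single object, which is presumably why `act` can be described simply as sampling from that distribution.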