Library Reference

pandemonium.demons Package

Classes

Categorical Q-learning mixin.
Learns the optimal policy while learning to predict
General Value Function Approximator
Mixin for value-based control algorithms that uses two separate estimators for action-values and state-values.
Approximates state or state-action values using linear projection
Base class for forward-view \(\TD\) methods.
Offline \(\TD\) for control tasks.
Offline \(\TD\) for prediction tasks.
Base class for backward-view (online) \(\TD\) methods.
Base class for online \(\TD\) methods for control tasks.
Semi-gradient \(\TD{(\lambda)}\) rule for estimating \(\tilde{v} \approx v_{\pi}\)
Base class for parametrized Demons implemented in PyTorch
Collects factual knowledge about the environment by learning to predict
Classic Q-learning update rule.
Semi-gradient \(\SARSA{(\lambda)}\).
Semi-gradient Expected \(\SARSA{(\lambda)}\).
\(n\)-step \(\TD\) for estimating \(V \approx v_{\pi}\)
Truncated \(\TD{(\lambda)}\)
Class Inheritance Diagram

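As an illustration of the semi-gradient \(\TD{(\lambda)}\) rule listed above, here is a minimal, library-agnostic sketch of one update for the linear case. The function name and signature are hypothetical and purely for exposition; they are not the interface exposed by pandemonium.demons.

    import numpy as np

    def td_lambda_step(w, z, x, x_next, reward, alpha, gamma, lam):
        """One semi-gradient TD(lambda) update for a linear estimate v(s) ~= w @ x(s).

        w -- weight vector; z -- accumulating eligibility trace;
        x, x_next -- feature vectors of the current and next state.
        """
        # TD error: delta = r + gamma * v(s') - v(s)
        delta = reward + gamma * (w @ x_next) - (w @ x)
        # Decay the trace and accumulate the gradient of v(s), which is x for linear FA
        z = gamma * lam * z + x
        # Move the weights along the trace, scaled by the TD error and step size
        w = w + alpha * delta * z
        return w, z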
pandemonium.implementations Package

Functions

Classes

Actor Critic equipped with linear FA and eligibility traces.
Measures novelty using the prediction error of the forward model.
Deep Q-Network with all the bells and whistles mixed in.
Implements an online version of Double Q-learning.
Intrinsic Curiosity Module
Simple online Q-learning.
Duelling de-convolutional network for the auxiliary pixel-control task
A demon that maximizes the undiscounted \(n\)-step return.
\(n\)-step \(\TD\) performed on past experiences
Class Inheritance Diagram

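To make the Double Q-learning entry concrete, the sketch below shows one tabular, online Double Q-learning step (van Hasselt, 2010). It is a stand-alone illustration with a hypothetical function name, not the class provided by pandemonium.implementations.

    import numpy as np

    def double_q_step(q1, q2, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=np.random):
        """One online Double Q-learning update on two tabular estimators q1, q2.

        Both q1 and q2 are arrays indexed as q[state, action].  One estimator
        selects the greedy action in s_next while the other evaluates it, which
        removes the maximization bias of classic Q-learning.
        """
        if rng.random() < 0.5:
            a_star = np.argmax(q1[s_next])              # q1 selects the action
            target = r + gamma * q2[s_next, a_star]     # q2 evaluates it
            q1[s, a] += alpha * (target - q1[s, a])
        else:
            a_star = np.argmax(q2[s_next])              # q2 selects the action
            target = r + gamma * q1[s_next, a_star]     # q1 evaluates it
            q2[s, a] += alpha * (target - q2[s, a])
        return q1, q2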
pandemonium.policies Package

Functions

A random tie-breaking argmax
Returns a random tie-breaking argmax mask

Classes

Base class for parametrized policies
Base class for discrete policies.
\(\epsilon\)-greedy policy for discrete action spaces.
\(\epsilon\)-greedy policy over options.
Convenience class for a greedy policy, often used as a learning target.
A decision rule for discrete option spaces.
Base abstract class for decision-making rules
Picks an option at random.
Picks actions with probability proportional to Q-values.
Vanilla Policy Gradient
Class Inheritance Diagram

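The argmax helpers and the \(\epsilon\)-greedy policy above can be illustrated with the following framework-free sketch. The names random_argmax and epsilon_greedy are hypothetical; the code only demonstrates the behaviour the descriptions refer to.

    import numpy as np

    def random_argmax(values, rng=np.random):
        """Argmax that breaks ties uniformly at random instead of always picking the first."""
        values = np.asarray(values)
        best = np.flatnonzero(values == values.max())
        return int(rng.choice(best))

    def epsilon_greedy(q_values, epsilon, rng=np.random):
        """With probability epsilon pick a uniformly random action, otherwise a greedy one."""
        if rng.random() < epsilon:
            return int(rng.randint(len(q_values)))
        return random_argmax(q_values, rng)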
pandemonium.networks Package

Functions

Determines the dimensionality of a 2d convolutional layer.
Determines the dimensionality of a 2d de-convolutional layer.
A helper function for layer initialization.

Classes

ABC for all networks that allows for registration.
A chain of convolutional layers.
A CNN with a recurrent LSTM layer on top.
A chain of linear layers.
Predicts the next features, given the current features and an action.
Predicts an action from a pair of consecutive feature vectors.
A fully connected head that follows the network base.
Adds a fully connected layer after a series of convolutional layers.
A convenience module for reshaping the output at the end of the network.
Mixin that adds a target network to the agent.
Class Inheritance Diagram

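The layer-dimensionality helpers listed under Functions compute the spatial size of a convolution's output. A minimal sketch of that calculation (hypothetical name, using the standard formula from the Conv2d documentation) looks like this:

    def conv2d_output_shape(height, width, kernel_size=3, stride=1, padding=0, dilation=1):
        """Spatial output size of a 2-D convolution:
        out = floor((in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
        """
        def one_dim(size):
            return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
        return one_dim(height), one_dim(width)

    # e.g. an 84x84 frame through a kernel_size=8, stride=4 layer -> (20, 20)
    assert conv2d_output_shape(84, 84, kernel_size=8, stride=4) == (20, 20)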
pandemonium.cumulants Module

Classes

A weighted sum of multiple cumulants.
A signal of interest \(z\) accumulated over time
Measures the amount of causal influence an agent has on the world
Tracks a single value from the feature vector \(\boldsymbol{x}\)
Tracks the scalar extrinsic reward received from the MDP environment
A 2-dimensional cumulant tracking pixel change from frame to frame
Tracks the ‘surprise’ associated with a new state using a density model
Class Inheritance Diagram

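The 2-dimensional pixel-change cumulant above is in the spirit of the UNREAL pixel-control auxiliary task. The following sketch (hypothetical function, plain NumPy, not the module's own class) shows one way such a cumulant can be computed from two consecutive frames:

    import numpy as np

    def pixel_change(frame, prev_frame, cell=4):
        """Mean absolute pixel change per spatial cell between two H x W x C frames.

        Returns a (H // cell, W // cell) grid -- a 2-D cumulant signal that an
        auxiliary pixel-control demon could learn to predict and maximise.
        """
        diff = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32)).mean(axis=-1)
        h, w = diff.shape
        h, w = h - h % cell, w - w % cell      # crop so the cell grid divides evenly
        diff = diff[:h, :w].reshape(h // cell, cell, w // cell, cell)
        return diff.mean(axis=(1, 3))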
pandemonium.continuations Module

Classes

Special case of state-independent discounting
State-dependent discount factor
Base abstract class for parametrized continuation functions
Class Inheritance Diagram

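A continuation function maps each step to a discount \(\gamma\); the constant case above returns the same \(\gamma\) everywhere (and 0 on termination), while the state-dependent case computes \(\gamma\) from the features. A minimal sketch under those assumptions, with hypothetical names rather than the module's classes:

    import numpy as np

    def constant_continuation(gamma):
        """State-independent discounting: gamma on every step, 0 when the episode terminates."""
        return lambda x, done: 0.0 if done else gamma

    def sigmoidal_continuation(weights):
        """State-dependent discounting: gamma is a sigmoid of a linear function of the features x."""
        return lambda x, done: 0.0 if done else float(1.0 / (1.0 + np.exp(-weights @ x)))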
pandemonium.experience Package

Classes

Experience Replay buffer.
Prioritized Experience Replay buffer.
Registrar for replay buffers
Mixin that adds a replay buffer to an agent.
Segmented Experience Replay buffer.
Skewed Experience Replay buffer.
A series of consecutive transitions experienced by the agent.
An experience tuple.
Class Inheritance Diagram

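For orientation, a uniform experience-replay buffer of the kind listed above can be sketched in a few lines. The Transition fields and class name here are assumptions for illustration; the package defines its own experience tuple and buffer registry.

    import random
    from collections import deque, namedtuple

    # Illustrative experience tuple; the field names are assumptions, not the package's.
    Transition = namedtuple('Transition', ['s', 'a', 'r', 's_next', 'done'])

    class UniformReplay:
        """Minimal experience replay: append transitions, sample uniform mini-batches."""

        def __init__(self, capacity):
            self.memory = deque(maxlen=capacity)   # oldest transitions are evicted first

        def add(self, *transition):
            self.memory.append(Transition(*transition))

        def sample(self, batch_size):
            # copy to a list so random.sample sees a plain sequence
            return random.sample(list(self.memory), batch_size)

        def __len__(self):
            return len(self.memory)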
pandemonium.traces Module

Classes

Base class for various eligibility traces in \(\TD\) learners.
Class Inheritance Diagram

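Eligibility traces typically come in accumulating and replacing flavours. Below is a minimal sketch of one trace update for a semi-gradient \(\TD\) learner, assuming linear features so the gradient is just the feature vector; the function name and the generalised replacing rule are assumptions, not this module's API.

    import numpy as np

    def update_trace(z, grad, gamma, lam, kind='accumulating'):
        """Decay the eligibility trace and fold in the current gradient.

        Accumulating traces add the gradient onto the decayed trace;
        replacing traces cap each component at the gradient's value
        instead of letting it grow without bound.
        """
        z = gamma * lam * z
        if kind == 'accumulating':
            return z + grad
        if kind == 'replacing':
            return np.maximum(z, grad)
        raise ValueError(f'unknown trace type: {kind}')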