Library Reference

pandemonium.demons Package

Classes

Categorical Q-learning mixin.
Learns the optimal policy while learning to predict
General Value Function Approximator
Mixin for value-based control algorithms that uses two separate estimators for action-values and state-values.
Approximates state or state-action values using linear projection
Base class for forward-view \(\TD\) methods.
Offline \(\TD\) for control tasks.
Offline \(\TD\) for prediction tasks.
Base class for backward-view (online) \(\TD\) methods.
Base class for online \(\TD\) methods for control tasks.
Semi-gradient \(\TD{(\lambda)}\) rule for estimating \(\tilde{v} \approx v_{\pi}\)
Base class for parametrized Demons implemented in PyTorch
Collects factual knowledge about the environment by learning to predict
Classic Q-learning update rule.
Semi-gradient \(\SARSA{(\lambda)}\).
Semi-gradient Expected \(\SARSA{(\lambda)}\).
\(n\)-step \(\TD\) for estimating \(V \approx v_{\pi}\)
Truncated \(\TD{(\lambda)}\)
Class Inheritance Diagram

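As an illustration of the semi-gradient \(\TD{(\lambda)}\) rule listed above, here is a minimal, library-agnostic sketch of one update for the linear case. The function name and signature are hypothetical and purely for exposition; they are not the interface exposed by pandemonium.demons.

    import numpy as np

    def td_lambda_step(w, z, x, x_next, reward, alpha, gamma, lam):
        """One semi-gradient TD(lambda) update for a linear estimate v(s) ~= w @ x(s).

        w -- weight vector; z -- accumulating eligibility trace;
        x, x_next -- feature vectors of the current and next state.
        """
        # TD error: delta = r + gamma * v(s') - v(s)
        delta = reward + gamma * (w @ x_next) - (w @ x)
        # Decay the trace and accumulate the gradient of v(s), which is x for linear FA
        z = gamma * lam * z + x
        # Move the weights along the trace, scaled by the TD error and step size
        w = w + alpha * delta * z
        return w, z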
pandemonium.implementations Package

Functions

Classes

Actor Critic equipped with linear FA and eligibility traces.
Measures novelty using the prediction error of the forward model.
Deep Q-Network with all the bells and whistles mixed in.
Implements an online version of Double Q-learning.
Intrinsic Curiosity Module
Simple online Q-learning.
Duelling de-convolutional network for the auxiliary pixel-control task
A demon that maximizes the undiscounted \(n\)-step return.
\(n\)-step \(\TD\) performed on past experiences
Class Inheritance Diagram

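To make the Double Q-learning entry concrete, the sketch below shows one tabular, online Double Q-learning step (van Hasselt, 2010). It is a stand-alone illustration with a hypothetical function name, not the class provided by pandemonium.implementations.

    import numpy as np

    def double_q_step(q1, q2, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=np.random):
        """One online Double Q-learning update on two tabular estimators q1, q2.

        Both q1 and q2 are arrays indexed as q[state, action].  One estimator
        selects the greedy action in s_next while the other evaluates it, which
        removes the maximization bias of classic Q-learning.
        """
        if rng.random() < 0.5:
            a_star = np.argmax(q1[s_next])              # q1 selects the action
            target = r + gamma * q2[s_next, a_star]     # q2 evaluates it
            q1[s, a] += alpha * (target - q1[s, a])
        else:
            a_star = np.argmax(q2[s_next])              # q2 selects the action
            target = r + gamma * q1[s_next, a_star]     # q1 evaluates it
            q2[s, a] += alpha * (target - q2[s, a])
        return q1, q2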
pandemonium.policies Package

Functions

A random tie-breaking argmax
Returns a random tie-breaking argmax mask

Classes

Base class for parametrized policies
Base class for discrete policies.
\(\epsilon\)-greedy policy for discrete action spaces.
\(\epsilon\)-greedy policy over options.
Convenience class for a greedy policy, often used as a learning target.
A decision rule for discrete option spaces.
Base abstract class for decision-making rules
Picks an option at random.
Picks actions with probability proportional to Q-values.
Vanilla Policy Gradient
Class Inheritance Diagram

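The argmax helpers and the \(\epsilon\)-greedy policy above can be illustrated with the following framework-free sketch. The names random_argmax and epsilon_greedy are hypothetical; the code only demonstrates the behaviour the descriptions refer to.

    import numpy as np

    def random_argmax(values, rng=np.random):
        """Argmax that breaks ties uniformly at random instead of always picking the first."""
        values = np.asarray(values)
        best = np.flatnonzero(values == values.max())
        return int(rng.choice(best))

    def epsilon_greedy(q_values, epsilon, rng=np.random):
        """With probability epsilon pick a uniformly random action, otherwise a greedy one."""
        if rng.random() < epsilon:
            return int(rng.randint(len(q_values)))
        return random_argmax(q_values, rng)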
pandemonium.networks Package

Functions

Determines the dimensionality of a 2d convolutional layer.
Determines the dimensionality of a 2d de-convolutional layer.
A helper function for layer initialization.

Classes

ABC for all networks that allows for registration.
A chain of convolutional layers.
A CNN with a recurrent LSTM layer on top.
A chain of linear layers.
Predicts the next features, given the current features and an action.
Predicts an action from a pair of consecutive feature vectors.
A fully connected head that follows the network base.
Adds a fully connected layer after a series of convolutional layers.
A convenience module for reshaping the output at the end of the network.
Mixin that adds a target network to the agent.
Class Inheritance Diagram

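The layer-dimensionality helpers listed under Functions compute the spatial size of a convolution's output. A minimal sketch of that calculation (hypothetical name, using the standard formula from the Conv2d documentation) looks like this:

    def conv2d_output_shape(height, width, kernel_size=3, stride=1, padding=0, dilation=1):
        """Spatial output size of a 2-D convolution:
        out = floor((in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
        """
        def one_dim(size):
            return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
        return one_dim(height), one_dim(width)

    # e.g. an 84x84 frame through a kernel_size=8, stride=4 layer -> (20, 20)
    assert conv2d_output_shape(84, 84, kernel_size=8, stride=4) == (20, 20)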
pandemonium.cumulants Module

Classes

A weighted sum of multiple cumulants.
A signal of interest \(z\) accumulated over time
Measures the amount of causal influence an agent has on the world
Tracks a single value from the feature vector \(\boldsymbol{x}\)
Tracks the scalar extrinsic reward received from the MDP environment
A 2-dimensional cumulant tracking pixel change from frame to frame
Tracks the ‘surprise’ associated with a new state using a density model
Class Inheritance Diagram

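The 2-dimensional pixel-change cumulant above is in the spirit of the UNREAL pixel-control auxiliary task. The following sketch (hypothetical function, plain NumPy, not the module's own class) shows one way such a cumulant can be computed from two consecutive frames:

    import numpy as np

    def pixel_change(frame, prev_frame, cell=4):
        """Mean absolute pixel change per spatial cell between two H x W x C frames.

        Returns a (H // cell, W // cell) grid -- a 2-D cumulant signal that an
        auxiliary pixel-control demon could learn to predict and maximise.
        """
        diff = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32)).mean(axis=-1)
        h, w = diff.shape
        h, w = h - h % cell, w - w % cell      # crop so the cell grid divides evenly
        diff = diff[:h, :w].reshape(h // cell, cell, w // cell, cell)
        return diff.mean(axis=(1, 3))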
pandemonium.continuations Module

Classes

Special case of state-independent discounting
State-dependent discount factor
Base abstract class for parametrized continuation functions
Class Inheritance Diagram

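A continuation function maps each step to a discount \(\gamma\); the constant case above returns the same \(\gamma\) everywhere (and 0 on termination), while the state-dependent case computes \(\gamma\) from the features. A minimal sketch under those assumptions, with hypothetical names rather than the module's classes:

    import numpy as np

    def constant_continuation(gamma):
        """State-independent discounting: gamma on every step, 0 when the episode terminates."""
        return lambda x, done: 0.0 if done else gamma

    def sigmoidal_continuation(weights):
        """State-dependent discounting: gamma is a sigmoid of a linear function of the features x."""
        return lambda x, done: 0.0 if done else float(1.0 / (1.0 + np.exp(-weights @ x)))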
pandemonium.experience Package

Classes

Experience Replay buffer.
Prioritized Experience Replay buffer.
Registrar for replay buffers
Mixin that adds a replay buffer to an agent.
Segmented Experience Replay buffer.
Skewed Experience Replay buffer.
A series of consecutive transitions experienced by the agent.
An experience tuple.
Class Inheritance Diagram

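For orientation, a uniform experience-replay buffer of the kind listed above can be sketched in a few lines. The Transition fields and class name here are assumptions for illustration; the package defines its own experience tuple and buffer registry.

    import random
    from collections import deque, namedtuple

    # Illustrative experience tuple; the field names are assumptions, not the package's.
    Transition = namedtuple('Transition', ['s', 'a', 'r', 's_next', 'done'])

    class UniformReplay:
        """Minimal experience replay: append transitions, sample uniform mini-batches."""

        def __init__(self, capacity):
            self.memory = deque(maxlen=capacity)   # oldest transitions are evicted first

        def add(self, *transition):
            self.memory.append(Transition(*transition))

        def sample(self, batch_size):
            # copy to a list so random.sample sees a plain sequence
            return random.sample(list(self.memory), batch_size)

        def __len__(self):
            return len(self.memory)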
pandemonium.traces Module

Classes

Base class for various eligibility traces in \(\TD\) learners.
Class Inheritance Diagram

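Eligibility traces typically come in accumulating and replacing flavours. Below is a minimal sketch of one trace update for a semi-gradient \(\TD\) learner, assuming linear features so the gradient is just the feature vector; the function name and the generalised replacing rule are assumptions, not this module's API.

    import numpy as np

    def update_trace(z, grad, gamma, lam, kind='accumulating'):
        """Decay the eligibility trace and fold in the current gradient.

        Accumulating traces add the gradient onto the decayed trace;
        replacing traces cap each component at the gradient's value
        instead of letting it grow without bound.
        """
        z = gamma * lam * z
        if kind == 'accumulating':
            return z + grad
        if kind == 'replacing':
            return np.maximum(z, grad)
        raise ValueError(f'unknown trace type: {kind}')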