Library Reference

pandemonium.agent Module

Classes

Agent(feature_extractor, behavior_policy, horde)

Class Inheritance Diagram

Inheritance diagram of pandemonium.agent.Agent

pandemonium.gvf Module

Classes

GVF(target_policy, continuation, cumulant)

General Value Function

Class Inheritance Diagram

Inheritance diagram of pandemonium.gvf.GVF

pandemonium.demons Package

Classes

CategoricalQ(num_atoms, v_min, v_max)

Categorical Q-learning mixin.

ControlDemon(aqf, avf, **kwargs)

Learns the optimal policy while learning to predict

Demon(gvf, avf, feature, behavior_policy, …)

General Value Function Approximator

DuellingMixin()

Mixin for value-based control algorithms that uses two separate estimators for action-values and state-values.

LinearDemon(feature, output_dim, *args, **kwargs)

Approximates state or state-action values using linear projection

OfflineTD([criterion])

Base class for forward-view \(\text{TD}\) methods.

OfflineTDControl(**kwargs)

Offline \(\text{TD}\) for control tasks.

OfflineTDPrediction([criterion])

Offline \(\text{TD}\) for prediction tasks.

OnlineTD(trace_decay, eligibility, …)

Base class for backward-view (online) \(\text{TD}\) methods.

OnlineTDControl(**kwargs)

Base class for online \(\text{TD}\) methods for control tasks.

OnlineTDPrediction(trace_decay, eligibility, …)

Semi-gradient \(\text{TD}(\lambda)\) rule for estimating \(\tilde{v} \approx v_{\pi}\)

ParametricDemon(**kwargs)

Base class for parametrized Demons implemented in PyTorch

PredictionDemon(gvf, avf, feature, …)

Collects factual knowledge about environment by learning to predict

QLearning(**kwargs)

Classic Q-learning update rule.

SARSA(**kwargs)

Semi-gradient \(\text{SARSA}(\lambda)\).

SARSE(**kwargs)

Semi-gradient Expected \(\text{SARSA}(\lambda)\).

TDControl(**kwargs)

TDPrediction(**kwargs)

TDn(**kwargs)

\(n\)-step \(\text{TD}\) for estimating \(V \approx v_{\pi}\)

TTD(trace_decay, **kwargs)

Truncated \(\text{TD}(\lambda)\)
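As a rough illustration of two update rules named in the listing above (the \(n\)-step return behind TDn and the classic Q-learning update), here is a minimal tabular sketch. The function names, argument order and list-of-lists Q-table are assumptions made for this example; they are not the pandemonium API, which operates on PyTorch tensors.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """n-step return: r_1 + gamma*r_2 + ... + gamma^n * bootstrap_value.

    Hypothetical helper, not part of pandemonium.
    """
    g = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_learning_update(q, s, a, r, s_next, done, alpha, gamma):
    """One tabular Q-learning step on a list-of-lists table `q` (assumed layout).

    Returns the TD error so the caller can inspect it.
    """
    # Bootstrap from the greedy value of the next state unless the episode ended.
    target = r if done else r + gamma * max(q[s_next])
    td_error = target - q[s][a]
    q[s][a] += alpha * td_error
    return td_error


q_table = [[0.0, 0.0], [1.0, 2.0]]
error = q_learning_update(q_table, s=0, a=1, r=1.0, s_next=1,
                          done=False, alpha=0.5, gamma=0.9)
three_step = n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
```

The backward fold in `n_step_return` avoids computing explicit powers of the discount, which is also how the recursion is usually written for state-dependent discounting.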

Class Inheritance Diagram

Inheritance diagram of pandemonium.demons.control.CategoricalQ, pandemonium.demons.demon.ControlDemon, pandemonium.demons.demon.Demon, pandemonium.demons.control.DuellingMixin, pandemonium.demons.demon.LinearDemon, pandemonium.demons.offline_td.OfflineTD, pandemonium.demons.control.OfflineTDControl, pandemonium.demons.prediction.OfflineTDPrediction, pandemonium.demons.online_td.OnlineTD, pandemonium.demons.control.OnlineTDControl, pandemonium.demons.prediction.OnlineTDPrediction, pandemonium.demons.demon.ParametricDemon, pandemonium.demons.demon.PredictionDemon, pandemonium.demons.control.QLearning, pandemonium.demons.control.SARSA, pandemonium.demons.control.SARSE, pandemonium.demons.control.TDControl, pandemonium.demons.prediction.TDPrediction, pandemonium.demons.offline_td.TDn, pandemonium.demons.offline_td.TTD

pandemonium.implementations Package

Functions

create_horde(config, env, feature_extractor, …)

Classes

AC(*args, **kwargs)

Actor Critic equipped with linear FA and eligibility traces.

Curiosity(icm)

Measures the novelty using prediction error of the forward model.

DQN(feature, behavior_policy, replay_buffer, …)

Deep Q-Network with all the bells and whistles mixed in.

DoubleQLearning(**kwargs)

Implements online version of Double Q-learning.

ICM(feature, behavior_policy, beta)

Intrinsic Curiosity Module

MultistepQLearning(**kwargs)

MultistepSARSA(**kwargs)

OnlineQLearning(**kwargs)

Simple online Q-learning.

OnlineSARSA(**kwargs)

PixelControl(feature, behavior_policy, …)

Duelling de-convolutional network for auxiliary pixel control task

RewardPrediction(replay_buffer, feature, …)

A demon that maximizes un-discounted \(n\)-step return.

ValueReplay(replay_buffer, **kwargs)

\(n\)-step \(\text{TD}\) performed on past experiences

Class Inheritance Diagram

Inheritance diagram of pandemonium.implementations.a2c.AC, pandemonium.implementations.icm.Curiosity, pandemonium.implementations.rainbow.DQN, pandemonium.implementations.q_learning.DoubleQLearning, pandemonium.implementations.icm.ICM, pandemonium.implementations.q_learning.MultistepQLearning, pandemonium.implementations.sarsa.MultistepSARSA, pandemonium.implementations.q_learning.OnlineQLearning, pandemonium.implementations.sarsa.OnlineSARSA, pandemonium.implementations.unreal.PixelControl, pandemonium.implementations.unreal.RewardPrediction, pandemonium.implementations.unreal.ValueReplay

pandemonium.horde Module

Classes

Horde(demons, device, aggregation_fn, …)

A horde of Demons

Class Inheritance Diagram

Inheritance diagram of pandemonium.horde.Horde

pandemonium.policies Package

Functions

randargmax(b, rng)

A random tie-breaking argmax

torch_argmax_mask(q, dim)

Returns a random tie-breaking argmax mask
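A random tie-breaking argmax picks uniformly among all indices that share the maximum value, instead of always returning the first one. The sketch below shows the idea with the standard library only; the name `random_tiebreak_argmax` and the use of `random.Random` as the generator are assumptions for illustration and may differ from how `randargmax(b, rng)` is implemented.

```python
import random


def random_tiebreak_argmax(values, rng):
    """Return the index of a maximal element, breaking ties uniformly at random.

    Hypothetical stand-in for illustration; `rng` is assumed to be a
    `random.Random` instance.
    """
    best = max(values)
    # Collect every index that attains the maximum.
    ties = [i for i, v in enumerate(values) if v == best]
    return rng.choice(ties)


rng = random.Random(0)
picks = {random_tiebreak_argmax([1.0, 3.0, 3.0, 0.0], rng) for _ in range(200)}
```

With a plain argmax, a freshly initialised value table (all zeros) would always select action 0; random tie-breaking removes that bias.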

Classes

Continuous(feature_dim, action_space, **params)

DiffPolicy(*args, **kwargs)

Base class for parametrized policies

Discrete(action_space, feature_dim)

Base class for discrete policies.

Egreedy(epsilon, …)

\(\epsilon\)-greedy policy for discrete action spaces.

EgreedyOverOptions(epsilon, …)

\(\epsilon\)-greedy policy over options.

Greedy(*args, **kwargs)

Convenience for greedy policy, often used as a learning target.

HierarchicalPolicy(option_space)

A decision rule for discrete option spaces.

Policy(feature_dim, action_space, **params)

Base abstract class for decision making rules

Random(action_space, feature_dim)

Picks an option at random.

SoftmaxPolicy(temperature, …)

Picks actions with probability proportional to the exponentiated, temperature-scaled Q-values.

VPG(entropy_coefficient, *args, **kwargs)

Vanilla Policy Gradient
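To make the difference between the \(\epsilon\)-greedy and softmax rules listed above concrete, the following sketch computes the action probabilities each rule assigns to a vector of Q-values. Both function names are hypothetical, and the formulas are the textbook definitions rather than code taken from pandemonium.

```python
import math


def egreedy_probs(q_values, epsilon):
    """Action probabilities under an epsilon-greedy rule (textbook form).

    With probability epsilon act uniformly at random, otherwise act greedily;
    ties among greedy actions share the greedy mass equally.
    """
    n = len(q_values)
    best = max(q_values)
    greedy = [i for i, q in enumerate(q_values) if q == best]
    probs = [epsilon / n] * n
    for i in greedy:
        probs[i] += (1.0 - epsilon) / len(greedy)
    return probs


def softmax_probs(q_values, temperature):
    """Boltzmann probabilities: p(a) is proportional to exp(Q(a) / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


eg = egreedy_probs([1.0, 2.0, 0.0], epsilon=0.3)
sm = softmax_probs([1.0, 2.0, 0.0], temperature=1.0)
```

Under \(\epsilon\)-greedy all non-greedy actions are equally likely, whereas softmax ranks them by value; lowering the temperature moves softmax toward the greedy policy.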

Class Inheritance Diagram

Inheritance diagram of pandemonium.policies.continuous.Continuous, pandemonium.policies.gradient.DiffPolicy, pandemonium.policies.discrete.Discrete, pandemonium.policies.discrete.Egreedy, pandemonium.policies.discrete.EgreedyOverOptions, pandemonium.policies.discrete.Greedy, pandemonium.policies.discrete.HierarchicalPolicy, pandemonium.policies.policy.Policy, pandemonium.policies.discrete.Random, pandemonium.policies.discrete.SoftmaxPolicy, pandemonium.policies.gradient.VPG

pandemonium.networks Package

Functions

conv2d_size_out(size[, kernel_size, stride, …])

Determines the dimensionality of a 2d convolutional layer.

deconv2d_size_out(size, kernel_size, stride, …)

Determines the dimensionality of a 2d de-convolutional layer.

layer_init(layer, scaling_factor)

A helper function for layer initialization.
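The two size helpers above correspond to the standard output-size formulas for 2d convolution and transposed convolution along one spatial dimension. The sketch below writes those formulas out; the function names, parameter names and default values are assumptions chosen for this example and need not match the library's signatures.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one spatial dimension.

    Standard formula; hypothetical helper, not the library function.
    """
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def deconv_out_size(size, kernel_size, stride=1, padding=0,
                    dilation=1, output_padding=0):
    """Output length of a transposed convolution along one spatial dimension."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# An 84-pixel side passed through three convolutions (kernel, stride).
side = 84
for kernel, stride in [(8, 4), (4, 2), (3, 1)]:
    side = conv_out_size(side, kernel, stride)

# Mirroring the same layers with transposed convolutions restores the input size.
restored = side
for kernel, stride in [(3, 1), (4, 2), (8, 4)]:
    restored = deconv_out_size(restored, kernel, stride)
```

Applying the helper per dimension gives the flattened feature size (`channels * height * width`) needed to size the first fully connected layer after a convolutional body.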

Classes

BaseNetwork(obs_shape, **kwargs)

ABC for all networks that allows for registration

ConvBody(obs_shape[, channels, kernels, …])

A chain of convolutional layers.

ConvLSTM(hidden_units, lstm_layers, *args, …)

A CNN with a recurrent LSTM layer on top.

FCBody(obs_shape, hidden_units[, activation])

A chain of linear layers.

ForwardModel(action_dim, feature_dim)

Predicts next features, given the current features and an action.

Identity(obs_shape)

InverseModel(action_dim, feature_dim)

Predicts an action from a pair of consecutive feature vectors.

LinearNet(output_dim, body)

A fully connected head that follows the network base.

NatureCNN(feature_dim, *args, **kwargs)

Adds a fully connected layer after a series of convolutional layers.

Reshape(*args)

A convenience module for reshaping the output at the end of the net.

TargetNetMixin(target_update_freq)

Mixin that adds a target network to the agent.

Class Inheritance Diagram

Inheritance diagram of pandemonium.networks.bodies.BaseNetwork, pandemonium.networks.bodies.ConvBody, pandemonium.networks.bodies.ConvLSTM, pandemonium.networks.bodies.FCBody, pandemonium.networks.heads.ForwardModel, pandemonium.networks.bodies.Identity, pandemonium.networks.heads.InverseModel, pandemonium.networks.heads.LinearNet, pandemonium.networks.bodies.NatureCNN, pandemonium.networks.heads.Reshape, pandemonium.networks.target_network.TargetNetMixin

pandemonium.cumulants Module

Classes

CombinedCumulant(cumulants, weights)

A weighted sum of multiple cumulants.

Cumulant()

A signal of interest \(z\) accumulated over time

Empowerment()

Measures the amount of causal influence an agent has on the world

FeatureCumulant(idx)

Tracks a single value from the feature vector \(\boldsymbol{x}\)

Fitness(env)

Tracks scalar extrinsic reward received from the MDP environment

PixelChange()

A 2-dimensional cumulant tracking pixel change from frame to frame

Surprise()

Tracks the ‘surprise’ associated with a new state using a density model
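A combined cumulant can be read as \(z = \sum_i w_i z_i\) over its component signals. The following sketch illustrates that reading with plain callables; representing a transition as a dictionary and a cumulant as a function of it are assumptions made here, not the interface of `CombinedCumulant(cumulants, weights)`.

```python
def combined_cumulant(cumulants, weights):
    """Build a cumulant that returns the weighted sum of its components.

    Hypothetical sketch: each cumulant is a callable taking a transition.
    """
    if len(cumulants) != len(weights):
        raise ValueError("need exactly one weight per cumulant")

    def z(transition):
        return sum(w * c(transition) for c, w in zip(cumulants, weights))

    return z


# Assumed transition layout: extrinsic reward "r" and feature vector "x".
reward_signal = lambda t: t["r"]
second_feature = lambda t: t["x"][1]

z = combined_cumulant([reward_signal, second_feature], [1.0, 0.25])
value = z({"r": 2.0, "x": [0.5, 4.0]})
```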

Class Inheritance Diagram

Inheritance diagram of pandemonium.cumulants.CombinedCumulant, pandemonium.cumulants.Cumulant, pandemonium.cumulants.Empowerment, pandemonium.cumulants.FeatureCumulant, pandemonium.cumulants.Fitness, pandemonium.cumulants.PixelChange, pandemonium.cumulants.Surprise

pandemonium.continuations Module

Classes

ConstantContinuation(gamma)

Special case of state independent discounting

ContinuationFunction([gamma])

State-dependent discount factor

DiffContinuation()

Base abstract class for parametrized continuation functions

SigmoidContinuation(feature_dim)

TerminationCritic()
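The distinction between constant and state-dependent continuation can be seen in how a return is accumulated: with a per-step discount \(\gamma_k\), the recursion is \(G_k = z_k + \gamma_k G_{k+1}\). The sketch below is a minimal illustration under that convention; the function name and the list-based inputs are assumptions for this example.

```python
def discounted_return(cumulants, continuations):
    """Accumulate G_k = z_k + gamma_k * G_{k+1} over a finite sequence.

    `continuations[k]` is the discount applied to everything after step k
    (an assumed indexing convention for this sketch).
    """
    g = 0.0
    for z, gamma in zip(reversed(cumulants), reversed(continuations)):
        g = z + gamma * g
    return g


# Constant continuation: the same gamma at every step.
constant = discounted_return([1.0, 1.0, 1.0], [0.5, 0.5, 0.5])

# State-dependent continuation: a zero at step 1 cuts off everything after it.
terminated = discounted_return([1.0, 1.0, 1.0], [1.0, 0.0, 1.0])
```

A continuation of zero plays the role of termination, which is why episodic tasks can be expressed without a separate "done" flag in this formulation.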

Class Inheritance Diagram

Inheritance diagram of pandemonium.continuations.ConstantContinuation, pandemonium.continuations.ContinuationFunction, pandemonium.continuations.DiffContinuation, pandemonium.continuations.SigmoidContinuation, pandemonium.continuations.TerminationCritic

pandemonium.experience Package

Classes

ER(size, batch_size)

Experience Replay buffer.

PER(size, batch_size, alpha, beta, epsilon)

Prioritized Experience Replay buffer.

ReplayBuffer(*args, **kwargs)

Registrar for replay buffers

ReplayBufferMixin(replay_buffer, …)

Mixin that adds a replay buffer to an agent.

SegmentedER(size, batch_size, segments, …)

Segmented Experience Replay buffer.

SkewedER(*args, **kwargs)

Skewed Experience Replay buffer.

Trajectory(s0, a, r, s1, done, x0, x1, a1, …)

A series of consecutive transitions experienced by the agent.

Transition(s0, a, r, s1, done, x0, x1, a1, …)

An experience tuple.
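For orientation, a uniform experience replay buffer with a `size` and a `batch_size` can be sketched in a few lines. The class `UniformReplay`, the reduced `Step` tuple and the `seed` argument below are hypothetical and only illustrate the idea behind `ER(size, batch_size)`; the library's buffers store richer transitions and may sample differently.

```python
import random
from collections import deque, namedtuple

# Reduced experience tuple for this sketch; the library's Transition has more fields.
Step = namedtuple("Step", ["s0", "a", "r", "s1", "done"])


class UniformReplay:
    """Fixed-capacity buffer that samples stored steps uniformly at random."""

    def __init__(self, size, batch_size, seed=None):
        self.buffer = deque(maxlen=size)  # oldest steps are evicted first
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def add(self, step):
        self.buffer.append(step)

    def sample(self):
        # Sample without replacement, capped by how much has been stored.
        k = min(self.batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


buf = UniformReplay(size=3, batch_size=2, seed=0)
for i in range(5):
    buf.add(Step(s0=i, a=0, r=float(i), s1=i + 1, done=False))
batch = buf.sample()
```

Prioritized variants replace the uniform draw with sampling proportional to a priority, typically derived from the TD error of each stored step.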

Class Inheritance Diagram

Inheritance diagram of pandemonium.experience.buffers.ER, pandemonium.experience.buffers.PER, pandemonium.experience.buffers.ReplayBuffer, pandemonium.experience.buffers.ReplayBufferMixin, pandemonium.experience.buffers.SegmentedER, pandemonium.experience.buffers.SkewedER, pandemonium.experience.experience.Trajectory, pandemonium.experience.experience.Transition

pandemonium.traces Module

Classes

AccumulatingTrace(λ, trace_dim)

DutchTrace(λ, trace_dim)

EligibilityTrace(λ, trace_dim)

Base class for various eligibility traces in \(\text{TD}\) learners.

Retrace(λ, trace_dim)

Vtrace(λ, trace_dim)
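The accumulating and Dutch traces differ only in how the current feature vector is added to the decayed trace. The sketch below uses the textbook updates for linear function approximation, \(e \leftarrow \gamma\lambda e + x\) and \(e \leftarrow \gamma\lambda e + (1 - \alpha\gamma\lambda\, e^\top x)\,x\); the function names and plain-list vectors are assumptions for illustration, not the interface of the classes listed above.

```python
def accumulating_trace(e, x, gamma, lam):
    """Accumulating trace: decay the old trace, then add the features."""
    return [gamma * lam * ei + xi for ei, xi in zip(e, x)]


def dutch_trace(e, x, gamma, lam, alpha):
    """Dutch trace as used in true online TD(lambda) (textbook form)."""
    # Inner product of the previous trace with the current features.
    dot = sum(ei * xi for ei, xi in zip(e, x))
    scale = 1.0 - alpha * gamma * lam * dot
    return [gamma * lam * ei + scale * xi for ei, xi in zip(e, x)]


x = [1.0, 0.0]

acc = [0.0, 0.0]
for _ in range(2):  # visit the same feature twice
    acc = accumulating_trace(acc, x, gamma=0.9, lam=0.5)

dutch = [0.0, 0.0]
for _ in range(2):
    dutch = dutch_trace(dutch, x, gamma=0.9, lam=0.5, alpha=0.1)
```

On repeated visits the Dutch trace grows more slowly than the accumulating trace because the added features are scaled down by the step size and the existing trace.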

Class Inheritance Diagram

Inheritance diagram of pandemonium.traces.AccumulatingTrace, pandemonium.traces.DutchTrace, pandemonium.traces.EligibilityTrace, pandemonium.traces.Retrace, pandemonium.traces.Vtrace