Welcome to tfrddlsim’s documentation!

tfrddlsim package

Subpackages

tfrddlsim.policy package

Submodules
tfrddlsim.policy.abstract_policy module
class tfrddlsim.policy.abstract_policy.Policy[source]

Bases: object

Abstract base class for representing Policy functions.

__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor][source]

Returns action fluents for the current state and timestep.

Parameters:
  • state (Sequence[tf.Tensor]) – The current state fluents.
  • timestep (tf.Tensor) – The current timestep.
Returns:

A tuple of action fluents.

Return type:

Sequence[tf.Tensor]
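
For concreteness, a minimal sketch of a custom policy subclass (the ConstantPolicy name and its fixed action fluents are illustrative assumptions, not part of the package):

    from typing import Sequence

    import tensorflow as tf

    from tfrddlsim.policy.abstract_policy import Policy


    class ConstantPolicy(Policy):
        """Hypothetical policy that always returns the same action fluents."""

        def __init__(self, action_fluents: Sequence[tf.Tensor]) -> None:
            # Pre-built tensors with the desired action fluent shapes.
            self._action_fluents = tuple(action_fluents)

        def __call__(self,
                     state: Sequence[tf.Tensor],
                     timestep: tf.Tensor) -> Sequence[tf.Tensor]:
            # Ignores state and timestep; returns the pre-built action fluents.
            return self._action_fluents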

tfrddlsim.policy.default_policy module
class tfrddlsim.policy.default_policy.DefaultPolicy(compiler: rddl2tf.compilers.compiler.Compiler, batch_size: int)[source]

Bases: tfrddlsim.policy.abstract_policy.Policy

DefaultPolicy class.

The default policy returns the default action fluents regardless of the current state and timestep.

Parameters:
  • compiler (rddl2tf.compiler.Compiler) – A RDDL2TensorFlow compiler.
  • batch_size (int) – The batch size.
__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor][source]

Returns the default action fluents regardless of the current state and timestep.

Parameters:
  • state (Sequence[tf.Tensor]) – The current state fluents.
  • timestep (tf.Tensor) – The current timestep.
Returns:

A tuple of action fluents.

Return type:

Sequence[tf.Tensor]
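
A minimal usage sketch, assuming compiler is an already-built rddl2tf Compiler for your RDDL model and state/timestep are tensors created elsewhere in the graph:

    from tfrddlsim.policy.default_policy import DefaultPolicy

    policy = DefaultPolicy(compiler, batch_size=64)
    actions = policy(state, timestep)  # tuple of default action fluent tensors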

tfrddlsim.policy.random_policy module
class tfrddlsim.policy.random_policy.RandomPolicy(compiler: rddl2tf.compilers.compiler.Compiler)[source]

Bases: tfrddlsim.policy.abstract_policy.Policy

RandomPolicy class.

The random policy samples action fluents uniformly and checks all action preconditions and constraints. The range of each action fluent is defined by action bound constraints, if specified in the RDDL model, or otherwise by default maximum values.

Parameters:
  • compiler (rddl2tf.compiler.Compiler) – A RDDL2TensorFlow compiler.
  • batch_size (int) – The batch size.
compiler

rddl2tf.compiler.Compiler – A RDDL2TensorFlow compiler.

batch_size

int – The batch size.

MAX_INT_VALUE = 5
MAX_REAL_VALUE = 5.0
__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor][source]

Returns sampled action fluents for the current state and timestep.

Parameters:
  • state (Sequence[tf.Tensor]) – The current state fluents.
  • timestep (tf.Tensor) – The current timestep.
Returns:

A tuple of action fluents.

Return type:

Sequence[tf.Tensor]
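
A usage sketch mirroring DefaultPolicy above (again assuming compiler, state, and timestep already exist; depending on the version, the constructor may also accept a batch_size):

    from tfrddlsim.policy.random_policy import RandomPolicy

    policy = RandomPolicy(compiler)
    actions = policy(state, timestep)  # tuple of uniformly sampled action fluents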

_check_preconditions(state: Sequence[tensorflow.python.framework.ops.Tensor], action: Sequence[tensorflow.python.framework.ops.Tensor], bound_constraints: Dict[str, Tuple[Union[rddl2tf.core.fluent.TensorFluent, NoneType], Union[rddl2tf.core.fluent.TensorFluent, NoneType]]], default: Sequence[tensorflow.python.framework.ops.Tensor]) → Tuple[tensorflow.python.framework.ops.Tensor, Sequence[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor][source]

Samples action fluents until all preconditions are satisfied.

Checks the action preconditions for the sampled action and current state; the sampled action fluents are returned only if all preconditions are satisfied.

Parameters:
  • state (Sequence[tf.Tensor]) – A list of state fluents.
  • action (Sequence[tf.Tensor]) – A list of action fluents.
  • bound_constraints (Dict[str, Tuple[Optional[TensorFluent], Optional[TensorFluent]]]) – The bounds for each action fluent.
  • default (Sequence[tf.Tensor]) – The default action fluents.
Returns:

A tuple with an integer tensor for the number of samples, the action fluents, and a boolean tensor for checking all action preconditions.

Return type:

Tuple[tf.Tensor, Sequence[tf.Tensor], tf.Tensor]
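
To make the control flow concrete, the "resample until accepted" pattern can be sketched with tf.while_loop; this is a generic illustration, not the method's actual implementation, and sample_fn, accept_fn, and max_tries are assumptions:

    import tensorflow as tf

    def rejection_sample(sample_fn, accept_fn, max_tries=100):
        """Resamples until accept_fn(sample) is True or max_tries is reached."""
        def cond(n, sample, accepted):
            return tf.logical_and(tf.logical_not(accepted), n < max_tries)

        def body(n, sample, accepted):
            new_sample = sample_fn()
            return n + 1, new_sample, accept_fn(new_sample)

        n0 = tf.constant(1)
        s0 = sample_fn()
        # Returns (number of samples, sample, acceptance flag), analogous to
        # the (samples count, action fluents, preconditions check) tuple above.
        return tf.while_loop(cond, body, [n0, s0, accept_fn(s0)])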

_sample_action(constraints: Dict[str, Tuple[Union[rddl2tf.core.fluent.TensorFluent, NoneType], Union[rddl2tf.core.fluent.TensorFluent, NoneType]]], default: Sequence[tensorflow.python.framework.ops.Tensor], prob: float = 0.3) → Sequence[tensorflow.python.framework.ops.Tensor][source]

Samples action fluents respecting the given bound constraints.

With probability prob it chooses the action fluent's default value; with probability 1 - prob it samples the fluent with respect to its bounds.

Parameters:
  • constraints (Dict[str, Tuple[Optional[TensorFluent], Optional[TensorFluent]]]) – The bounds for each action fluent.
  • default (Sequence[tf.Tensor]) – The default action fluents.
  • prob (float) – A probability measure.
Returns:

A tuple of action fluents.

Return type:

Sequence[tf.Tensor]
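
The prob-based choice between the default value and a fresh sample can be pictured for a single real-valued fluent (a minimal sketch; the helper name and its bound arguments are assumptions):

    import tensorflow as tf

    def sample_or_default(default_value, low, high, prob=0.3):
        """With probability prob keeps the default; otherwise samples uniformly in [low, high]."""
        use_default = tf.random_uniform([]) < prob
        sampled = tf.random_uniform(tf.shape(default_value), minval=low, maxval=high,
                                    dtype=default_value.dtype)
        return tf.cond(use_default, lambda: default_value, lambda: sampled)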

_sample_action_fluent(name: str, dtype: tensorflow.python.framework.dtypes.DType, size: Sequence[int], constraints: Dict[str, Tuple[Union[rddl2tf.core.fluent.TensorFluent, NoneType], Union[rddl2tf.core.fluent.TensorFluent, NoneType]]], default_value: tensorflow.python.framework.ops.Tensor, prob: float) → tensorflow.python.framework.ops.Tensor[source]

Samples the action fluent with given name, dtype, and size.

With probability prob it chooses the action fluent's default_value; with probability 1 - prob it samples the fluent with respect to its constraints.

Parameters:
  • name (str) – The name of the action fluent.
  • dtype (tf.DType) – The data type of the action fluent.
  • size (Sequence[int]) – The size and shape of the action fluent.
  • constraints (Dict[str, Tuple[Optional[TensorFluent], Optional[TensorFluent]]]) – The bounds for each action fluent.
  • default_value (tf.Tensor) – The default value for the action fluent.
  • prob (float) – A probability measure.
Returns:

A tensor for sampling the action fluent.

Return type:

tf.Tensor

_sample_actions(state: Sequence[tensorflow.python.framework.ops.Tensor]) → Tuple[Sequence[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor][source]

Returns sampled action fluents and tensors related to the sampling.

Parameters:
  • state (Sequence[tf.Tensor]) – A list of state fluents.
Returns:

A tuple with action fluents, an integer tensor for the number of samples, and a boolean tensor for checking all action preconditions.

Return type:

Tuple[Sequence[tf.Tensor], tf.Tensor, tf.Tensor]
Module contents

tfrddlsim.simulation package

Submodules
tfrddlsim.simulation.policy_simulator module
class tfrddlsim.simulation.policy_simulator.PolicySimulationCell(compiler: rddl2tf.compilers.compiler.Compiler, policy: tfrddlsim.policy.abstract_policy.Policy)[source]

Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell

PolicySimulationCell implements a 1-step MDP transition cell.

It extends `tf.nn.rnn_cell.RNNCell` to simulate an MDP transition for a given policy. The cell input is the timestep. The hidden state is the factored MDP state. The cell output is the tuple of MDP fluents (next-state, action, interm, rewards).

Note

All fluents are represented in factored form as Tuple[tf.Tensor, ...].

Parameters:
  • compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
  • policy (tfrddlsim.policy.Policy) – MDP Policy.
  • batch_size (int) – The size of the simulation batch.
__call__(input: tensorflow.python.framework.ops.Tensor, state: Sequence[tensorflow.python.framework.ops.Tensor], scope: Union[str, NoneType] = None) → Tuple[Tuple[Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor]][source]

Returns the simulation cell for the given input and state.

The cell returns states, actions and interms as sequences of tensors (i.e., all representations are factored). The reward is a 1-dimensional tensor.

Note

All tensors have shape: (batch_size, fluent_shape).

Parameters:
  • input (tf.Tensor) – The current MDP timestep.
  • state (Sequence[tf.Tensor]) – State fluents in canonical order.
  • scope (Optional[str]) – Scope for operations in graph.
Returns:

(output, next_state).

Return type:

Tuple[CellOutput, CellState]
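
A sketch of unrolling the cell over a horizon with tf.nn.dynamic_rnn (assuming compiler, a policy instance, and a timesteps input tensor of shape (batch_size, horizon, 1) already exist):

    import tensorflow as tf

    from tfrddlsim.simulation.policy_simulator import PolicySimulationCell

    cell = PolicySimulationCell(compiler, policy)
    outputs, final_state = tf.nn.dynamic_rnn(
        cell,
        timesteps,                       # shape: (batch_size, horizon, 1)
        initial_state=cell.initial_state(),
        dtype=tf.float32)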

classmethod _dtype(tensor: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor[source]

Converts tensor to tf.float32 datatype if needed.

classmethod _output(fluents: Sequence[Tuple[str, rddl2tf.core.fluent.TensorFluent]]) → Sequence[tensorflow.python.framework.ops.Tensor][source]

Returns output tensors for fluents.

classmethod _tensors(fluents: Sequence[Tuple[str, rddl2tf.core.fluent.TensorFluent]]) → Iterable[tensorflow.python.framework.ops.Tensor][source]

Yields the fluents’ tensors.

action_size

Returns the MDP action size.

graph

Returns the computation graph.

initial_state() → Sequence[tensorflow.python.framework.ops.Tensor][source]

Returns the initial state tensor.

interm_size

Returns the MDP intermediate state size.

output_size

Returns the simulation cell output size.

state_size

Returns the MDP state size.

class tfrddlsim.simulation.policy_simulator.PolicySimulator(compiler: rddl2tf.compilers.compiler.Compiler, policy: tfrddlsim.policy.abstract_policy.Policy)[source]

Bases: object

PolicySimulator samples MDP trajectories in the computation graph.

It implements an n-step MDP trajectory simulator by dynamically unrolling a recurrent model. Its inputs are the MDP initial state and the number of timesteps in the horizon.

Parameters:
  • compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
  • policy (tfrddlsim.policy.Policy) – MDP Policy.
  • batch_size (int) – The size of the simulation batch.
classmethod _output(tensors: Sequence[tensorflow.python.framework.ops.Tensor], dtypes: Sequence[tensorflow.python.framework.dtypes.DType]) → Sequence[tensorflow.python.framework.ops.Tensor][source]

Converts tensors to the corresponding dtypes.

batch_size

Returns the size of the simulation batch.

graph

Returns the computation graph.

input_size

Returns the simulation input size (e.g., timestep).

output_size

Returns the simulation output size.

run(horizon: int, initial_state: Union[typing.Sequence[tensorflow.python.framework.ops.Tensor], NoneType] = None) → Tuple[Sequence[np.array], Sequence[np.array], Sequence[np.array], Sequence[np.array], Sequence[np.array], np.array][source]

Builds the MDP graph and simulates the batch of trajectories for the given horizon. Returns the non-fluents, states, actions, interms and rewards. Fluents and non-fluents are returned in factored form.

Note

All output arrays have shape (batch_size, horizon, fluent_shape), except the initial state, which has shape (batch_size, fluent_shape).

Parameters:
  • horizon (int) – The number of timesteps in the simulation.
  • initial_state (Optional[Sequence[tf.Tensor]]) – The initial state tensors.
Returns:

Simulation output tuple.

Return type:

Tuple[NonFluentsArray, StatesArray, ActionsArray, IntermsArray, np.array]
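
An end-to-end usage sketch (assuming compiler and a policy instance already exist; the horizon value is arbitrary):

    from tfrddlsim.simulation.policy_simulator import PolicySimulator

    simulator = PolicySimulator(compiler, policy)
    # Tuple with non-fluents, states, actions, interms and rewards in factored form.
    trajectories = simulator.run(horizon=40)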

state_size

Returns the MDP state size.

timesteps(horizon: int) → tensorflow.python.framework.ops.Tensor[source]

Returns the input tensor for the given horizon.

trajectory(horizon: int, initial_state: Union[typing.Sequence[tensorflow.python.framework.ops.Tensor], NoneType] = None) → Tuple[Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor][source]

Returns the ops for trajectory generation with the given horizon and initial_state.

The simulation returns states, actions and interms as sequences of tensors (i.e., all representations are factored). The reward is a batch-sized tensor. The trajectory output is the tuple (initial_state, states, actions, interms, rewards). If initial_state is None, the compiler's default initial state is used.

Note

All tensors have shape (batch_size, horizon, fluent_shape), except the initial state, which has shape (batch_size, fluent_shape).

Parameters:
  • horizon (int) – The number of simulation timesteps.
  • initial_state (Optional[Sequence[tf.Tensor]]) – The initial state tensors.
Returns:

Trajectory output tuple.

Return type:

Tuple[StateTensor, StatesTensor, ActionsTensor, IntermsTensor, tf.Tensor]
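
If you need the graph ops rather than numpy arrays, a rough TF 1.x-style sketch (assuming a simulator built as in the run() example above):

    import tensorflow as tf

    ops = simulator.trajectory(horizon=40)
    with tf.Session(graph=simulator.graph) as sess:
        initial_state, states, actions, interms, rewards = sess.run(ops)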

tfrddlsim.simulation.transition_simulator module
class tfrddlsim.simulation.transition_simulator.ActionSimulationCell(compiler: rddl2tf.compilers.compiler.Compiler, batch_size: int = 1)[source]

Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell

ActionSimulationCell implements an MDP transition cell.

It extends an RNNCell in order to simulate the next state, given the current state and action. The cell input is the action fluents and the cell output is the next-state fluents.

Note

All fluents are represented in factored form as Sequence[tf.Tensor].

Parameters:
  • compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
  • batch_size (int) – The simulation batch size.
__call__(inputs: Sequence[tensorflow.python.framework.ops.Tensor], state: Sequence[tensorflow.python.framework.ops.Tensor], scope: Union[str, NoneType] = None) → Tuple[Tuple[Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor]][source]

Returns the transition simulation cell for the given input and state.

The cell outputs the reward as a 1-dimensional tensor, and the next state as a tuple of tensors.

Note

All tensors have shape: (batch_size, fluent_shape).

Parameters:
  • inputs (Sequence[tf.Tensor]) – The current action fluents.
  • state (Sequence[tf.Tensor]) – The current state.
  • scope (Optional[str]) – Operations’ scope in computation graph.
Returns:

(output, next_state).

Return type:

Tuple[CellOutput, CellState]
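
A one-step transition sketch (assuming compiler plus action_fluents and state_fluents tuples of tensors with a leading batch dimension already exist):

    from tfrddlsim.simulation.transition_simulator import ActionSimulationCell

    cell = ActionSimulationCell(compiler, batch_size=1)
    output, next_state = cell(action_fluents, state_fluents)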

classmethod _output(fluents: Sequence[Tuple[str, rddl2tf.core.fluent.TensorFluent]]) → Sequence[tensorflow.python.framework.ops.Tensor][source]

Converts fluents to tensors with datatype tf.float32.

action_size

Returns the MDP action size.

interm_size

Returns the MDP intermediate state size.

output_size

Returns the simulation cell output size.

state_size

Returns the MDP state size.

Module contents

tfrddlsim.viz package

Submodules
tfrddlsim.viz.abstract_visualizer module
tfrddlsim.viz.generic_visualizer module
class tfrddlsim.viz.generic_visualizer.GenericVisualizer(compiler: rddl2tf.compilers.compiler.Compiler, verbose: bool)[source]

Bases: tfrddlsim.viz.abstract_visualizer.Visualizer

GenericVisualizer is a generic text-based trajectory visualizer.

Parameters:
  • compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
  • verbose (bool) – Verbosity flag.
_render_batch(non_fluents: Sequence[Tuple[str, Union[int, float, np.array]]], states: Sequence[Tuple[str, np.array]], actions: Sequence[Tuple[str, np.array]], interms: Sequence[Tuple[str, np.array]], rewards: np.array, horizon: Union[int, NoneType] = None) → None[source]

Prints non_fluents, states, actions, interms and rewards for the given horizon.

Parameters:
  • non_fluents (Sequence[Tuple[str, Union[int, float, np.array]]]) – The non-fluent values.
  • states (Sequence[Tuple[str, np.array]]) – A state trajectory.
  • actions (Sequence[Tuple[str, np.array]]) – An action trajectory.
  • interms (Sequence[Tuple[str, np.array]]) – An interm state trajectory.
  • rewards (np.array) – Sequence of rewards (1-dimensional array).
  • horizon (Optional[int]) – Number of timesteps.
_render_fluent_timestep(fluent_type: str, fluents: Sequence[Tuple[str, np.array]], fluent_variables: Sequence[Tuple[str, List[str]]]) → None[source]

Prints fluents of the given fluent_type as a list of instantiated variables with their corresponding values.

Parameters:
  • fluent_type (str) – Fluent type.
  • fluents (Sequence[Tuple[str, np.array]]) – List of pairs (fluent_name, fluent_values).
  • fluent_variables (Sequence[Tuple[str, List[str]]]) – List of pairs (fluent_name, args).
_render_reward(r: numpy.float32) → None[source]

Prints reward r.

_render_round_end(rewards: np.array) → None[source]

Prints round end information about rewards.

_render_round_init(horizon: int, non_fluents: Sequence[Tuple[str, Union[int, float, np.array]]]) → None[source]

Prints round init information about horizon and non_fluents.

_render_timestep(t: int, s: Sequence[Tuple[str, np.array]], a: Sequence[Tuple[str, np.array]], f: Sequence[Tuple[str, np.array]], r: numpy.float32) → None[source]

Prints fluents and rewards for the given timestep t.

Parameters:
  • t (int) – The timestep.
  • s (Sequence[Tuple[str, np.array]]) – State fluents.
  • a (Sequence[Tuple[str, np.array]]) – Action fluents.
  • f (Sequence[Tuple[str, np.array]]) – Interm state fluents.
  • r (np.float32) – Reward.
_render_trajectories(trajectories: Tuple[Sequence[Tuple[str, Union[int, float, np.array]]], Sequence[Tuple[str, np.array]], Sequence[Tuple[str, np.array]], Sequence[Tuple[str, np.array]], np.array]) → None[source]

Prints the first batch of simulated trajectories.

Parameters:
  • trajectories – NonFluents, states, actions, interms and rewards.
render(trajectories: Tuple[Sequence[Tuple[str, Union[int, float, np.array]]], Sequence[Tuple[str, np.array]], Sequence[Tuple[str, np.array]], Sequence[Tuple[str, np.array]], np.array], batch: Union[int, NoneType] = None) → None[source]

Prints the simulated trajectories.

Parameters:
  • trajectories – NonFluents, states, actions, interms and rewards.
  • batch – Number of batches to render.
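
Typical usage sketch, rendering trajectories produced by PolicySimulator.run (assuming compiler and trajectories already exist):

    from tfrddlsim.viz.generic_visualizer import GenericVisualizer

    viz = GenericVisualizer(compiler, verbose=True)
    viz.render(trajectories, batch=1)
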
tfrddlsim.viz.navigation_visualizer module
Module contents

Module contents

Indices and tables