tfrddlsim.simulation package

Submodules

tfrddlsim.simulation.policy_simulator module

class tfrddlsim.simulation.policy_simulator.PolicySimulationCell(compiler: rddl2tf.compilers.compiler.Compiler, policy: tfrddlsim.policy.abstract_policy.Policy)[source]

Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell

PolicySimulationCell implements a 1-step MDP transition cell.

It extends `tf.nn.rnn_cell.RNNCell` to simulate an MDP transition under a given policy. The cell input is the timestep. The hidden state is the factored MDP state. The cell output is the tuple of MDP fluents (next-state, action, interm, rewards).

Note

All fluents are represented in factored form as Tuple[tf.Tensor].

Parameters:
  • compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
  • policy (tfrddlsim.policy.Policy) – MDP Policy.
  • batch_size (int) – The size of the simulation batch.
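
A minimal construction sketch follows. The RDDL parsing helper and the concrete Compiler/Policy classes (rddlgym, DefaultCompiler, RandomPolicy) and the domain id are assumptions used for illustration; any Compiler/Policy pair matching the signature above works.

    # Sketch only: rddlgym, DefaultCompiler, RandomPolicy and 'Reservoir-8'
    # are assumed names; substitute the Compiler/Policy pair you actually use.
    import rddlgym
    from rddl2tf.compilers import DefaultCompiler
    from tfrddlsim.policy import RandomPolicy
    from tfrddlsim.simulation.policy_simulator import PolicySimulationCell

    model = rddlgym.make('Reservoir-8', mode=rddlgym.AST)  # parse RDDL into an AST
    compiler = DefaultCompiler(model, batch_size=64)       # assumed constructor args
    compiler.init()                                        # assumed graph initialization step

    policy = RandomPolicy(compiler)                        # assumed Policy constructor
    cell = PolicySimulationCell(compiler, policy)
    print(cell.state_size, cell.action_size, cell.interm_size)
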
__call__(input: tensorflow.python.framework.ops.Tensor, state: Sequence[tensorflow.python.framework.ops.Tensor], scope: Union[str, NoneType] = None) → Tuple[Tuple[Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor]][source]

Returns the simulation cell for the given input and state.

The cell returns states, actions and interms as a sequence of tensors (i.e., all representations are factored). The reward is a 1-dimensional tensor.

Note

All tensors have shape: (batch_size, fluent_shape).

Parameters:
  • input (tf.Tensor) – The current MDP timestep.
  • state (Sequence[tf.Tensor]) – State fluents in canonical order.
  • scope (Optional[str]) – Scope for operations in graph.
Returns:

(output, next_state).

Return type:

Tuple[CellOutput, CellState]
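
A sketch of a single cell call, reusing the cell built in the construction example above. The timestep is fed as a constant tensor; its exact shape is an assumption (one scalar timestep per batch entry).

    import tensorflow as tf

    state = cell.initial_state()        # factored initial state fluents
    timestep = tf.ones((64, 1))         # current timestep; shape is an assumption
    output, next_state = cell(timestep, state)

    # `output` packs (next-state, action, interm, reward) as described above;
    # `next_state` can be fed back into the cell for the following transition.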

classmethod _dtype(tensor: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor[source]

Converts tensor to tf.float32 datatype if needed.

classmethod _output(fluents: Sequence[Tuple[str, rddl2tf.core.fluent.TensorFluent]]) → Sequence[tensorflow.python.framework.ops.Tensor][source]

Returns output tensors for fluents.

classmethod _tensors(fluents: Sequence[Tuple[str, rddl2tf.core.fluent.TensorFluent]]) → Iterable[tensorflow.python.framework.ops.Tensor][source]

Yields the fluents’ tensors.

action_size

Returns the MDP action size.

graph

Returns the computation graph.

initial_state() → Sequence[tensorflow.python.framework.ops.Tensor][source]

Returns the initial state tensors.

interm_size

Returns the MDP intermediate state size.

output_size

Returns the simulation cell output size.

state_size

Returns the MDP state size.

class tfrddlsim.simulation.policy_simulator.PolicySimulator(compiler: rddl2tf.compilers.compiler.Compiler, policy: tfrddlsim.policy.abstract_policy.Policy)[source]

Bases: object

PolicySimulator samples MDP trajectories within the computation graph.

It implements the n-step MDP trajectory simulator using dynamic unrolling in a recurrent model. Its inputs are the MDP initial state and the number of timesteps in the horizon.

Parameters:
  • compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
  • policy (tfrddlsim.policy.Policy) – MDP Policy.
  • batch_size (int) – The size of the simulation batch.
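
A construction sketch, reusing the compiler and policy from the PolicySimulationCell example above (same naming assumptions apply).

    from tfrddlsim.simulation.policy_simulator import PolicySimulator

    simulator = PolicySimulator(compiler, policy)
    print(simulator.batch_size, simulator.input_size, simulator.state_size)
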
classmethod _output(tensors: Sequence[tensorflow.python.framework.ops.Tensor], dtypes: Sequence[tensorflow.python.framework.dtypes.DType]) → Sequence[tensorflow.python.framework.ops.Tensor][source]

Converts tensors to the corresponding dtypes.

batch_size

Returns the size of the simulation batch.

graph

Returns the computation graph.

input_size

Returns the simulation input size (e.g., timestep).

output_size

Returns the simulation output size.

run(horizon: int, initial_state: Union[typing.Sequence[tensorflow.python.framework.ops.Tensor], NoneType] = None) → Tuple[Sequence[np.array], Sequence[np.array], Sequence[np.array], Sequence[np.array], Sequence[np.array], np.array][source]

Builds the MDP graph and simulates trajectories in batch for the given horizon. Returns the non-fluents, states, actions, interms and rewards. Fluents and non-fluents are returned in factored form.

Note

All output arrays have shape (batch_size, horizon, fluent_shape), except the initial state, which has shape (batch_size, fluent_shape).

Parameters:
  • horizon (int) – The number of timesteps in the simulation.
  • initial_state (Optional[Sequence[tf.Tensor]]) – The initial state tensors.
Returns:

Simulation output tuple.

Return type:

Tuple[NonFluentsArray, StatesArray, ActionsArray, IntermsArray, np.array]
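
A usage sketch: since run() returns NumPy arrays, it should not require explicit session handling. Only the rewards element is unpacked here, following the signature above; the remaining element names vary and are left out.

    # Simulate a batch of trajectories over a 40-step horizon.
    trajectories = simulator.run(horizon=40)

    # The output tuple follows the run() signature above; its last element
    # is the rewards array, with leading dimensions (batch_size, horizon).
    rewards = trajectories[-1]
    print(rewards.shape)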

state_size

Returns the MDP state size.

timesteps(horizon: int) → tensorflow.python.framework.ops.Tensor[source]

Returns the input tensor for the given horizon.

trajectory(horizon: int, initial_state: Union[typing.Sequence[tensorflow.python.framework.ops.Tensor], NoneType] = None) → Tuple[Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor][source]

Returns the ops for trajectory generation with the given horizon and initial_state.

The simulation returns states, actions and interms as a sequence of tensors (i.e., all representations are factored). The reward is a batch-sized tensor. The trajectory output is the tuple (initial_state, states, actions, interms, rewards). If initial_state is None, the compiler's default initial state is used.

Note

All tensors have shape (batch_size, horizon, fluent_shape), except the initial state, which has shape (batch_size, fluent_shape).

Parameters:
  • horizon (int) – The number of simulation timesteps.
  • initial_state (Optional[Sequence[tf.Tensor]]) – The initial state tensors.
Returns:

Trajectory output tuple.

Return type:

Tuple[StateTensor, StatesTensor, ActionsTensor, IntermsTensor, tf.Tensor]
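
A sketch of building the trajectory ops and evaluating them in a TensorFlow 1.x session. The session setup and the initializer call are assumptions; run() above performs an equivalent evaluation and returns arrays directly.

    import tensorflow as tf

    trajectory_ops = simulator.trajectory(horizon=40)

    # TF1-style evaluation; the variable initializer is a precaution and may be
    # unnecessary depending on the compiled graph.
    with tf.Session(graph=simulator.graph) as sess:
        sess.run(tf.global_variables_initializer())
        initial_state, states, actions, interms, rewards = sess.run(trajectory_ops)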

tfrddlsim.simulation.transition_simulator module

class tfrddlsim.simulation.transition_simulator.ActionSimulationCell(compiler: rddl2tf.compilers.compiler.Compiler, batch_size: int = 1)[source]

Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell

ActionSimulationCell implements an MDP transition cell.

It extends an RNNCell in order to simulate the next state, given the current state and action. The cell input is the action fluents and the cell output is the next state fluents.

Note

All fluents are represented in factored form as Sequence[tf.Tensor].

Parameters:
  • compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
  • batch_size (int) – The simulation batch size.
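
A construction sketch, reusing the compiler from the earlier PolicySimulationCell example; matching the batch size to the compiler's is an assumption.

    from tfrddlsim.simulation.transition_simulator import ActionSimulationCell

    cell = ActionSimulationCell(compiler, batch_size=64)
    print(cell.state_size, cell.action_size, cell.output_size)
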
__call__(inputs: Sequence[tensorflow.python.framework.ops.Tensor], state: Sequence[tensorflow.python.framework.ops.Tensor], scope: Union[str, NoneType] = None) → Tuple[Tuple[Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor]][source]

Returns the transition simulation cell for the given input and state.

The cell outputs the reward as a 1-dimensional tensor, and the next state as a tuple of tensors.

Note

All tensors have shape: (batch_size, fluent_shape).

Parameters:
  • inputs (Sequence[tf.Tensor]) – The current action fluents.
  • state (Sequence[tf.Tensor]) – The current state.
  • scope (Optional[str]) – Operations’ scope in computation graph.
Returns:

(output, next_state).

Return type:

Tuple[CellOutput, CellState]
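
A sketch of one transition step. The helpers used to obtain the state and action fluents (compiler.initial_state and compiler.default_action) are assumptions about the rddl2tf compiler; any factored state/action tensors of the right shapes would do.

    # `initial_state` and `default_action` are assumed compiler helpers that
    # return factored state and action fluents for the configured batch size.
    state = compiler.initial_state()
    action = compiler.default_action()

    output, next_state = cell(action, state)
    # `output` packs the next-state, interm and reward tensors described above;
    # `next_state` is the factored next state used to chain further transitions.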

classmethod _output(fluents: Sequence[Tuple[str, rddl2tf.core.fluent.TensorFluent]]) → Sequence[tensorflow.python.framework.ops.Tensor][source]

Converts fluents to tensors with datatype tf.float32.

action_size

Returns the MDP action size.

interm_size

Returns the MDP intermediate state size.

output_size

Returns the simulation cell output size.

state_size

Returns the MDP state size.

Module contents