tfrddlsim.simulation package¶
Submodules¶
tfrddlsim.simulation.policy_simulator module¶
class tfrddlsim.simulation.policy_simulator.PolicySimulationCell(compiler: rddl2tf.compilers.compiler.Compiler, policy: tfrddlsim.policy.abstract_policy.Policy)[source]¶
Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell
PolicySimulationCell implements a 1-step MDP transition cell.
It extends `tf.nn.rnn_cell.RNNCell` to simulate an MDP transition under a given policy. The cell input is the timestep. The hidden state is the factored MDP state. The cell output is the tuple of MDP fluents (next-state, action, interm, rewards).
Note
All fluents are represented in factored form as Tuple[tf.Tensor].
Parameters:
- compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- policy (tfrddlsim.policy.Policy) – MDP policy.
- batch_size (int) – The size of the simulation batch.
__call__(input: tf.Tensor, state: Sequence[tf.Tensor], scope: Optional[str] = None) → Tuple[Tuple[Sequence[tf.Tensor], Sequence[tf.Tensor], Sequence[tf.Tensor], tf.Tensor], Sequence[tf.Tensor]][source]¶
Returns the simulation cell's output for the given input and state.
The cell returns states, actions and interms as sequences of tensors (i.e., all representations are factored). The reward is a 1-dimensional tensor.
Note
All tensors have shape: (batch_size, fluent_shape).
Parameters:
- input (tf.Tensor) – The current MDP timestep.
- state (Sequence[tf.Tensor]) – State fluents in canonical order.
- scope (Optional[str]) – Scope for operations in the graph.
Returns: (output, next_state).
Return type: Tuple[CellOutput, CellState]
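To illustrate the (output, next_state) contract described above, here is a toy pure-Python analogue. This is not tfrddlsim code: the linear policy, quadratic reward, and two-fluent state are made up for illustration. Only the calling convention mirrors the real cell, in which fluents are kept in factored form as tuples.

```python
from typing import Sequence


class ToyTransitionCell:
    """Toy stand-in for PolicySimulationCell's calling convention.

    The real cell compiles RDDL CPFs into TensorFlow ops; this sketch
    only mirrors the (output, next_state) return contract.
    """

    def __call__(self, timestep: int, state: Sequence[float]):
        # Made-up policy: push every state fluent toward zero.
        action = tuple(-0.1 * s for s in state)
        next_state = tuple(s + a for s, a in zip(state, action))
        interm = ()  # no intermediate fluents in this toy MDP
        reward = -sum(s * s for s in next_state)
        output = (next_state, action, interm, reward)
        return output, next_state  # mirrors Tuple[CellOutput, CellState]


cell = ToyTransitionCell()
output, next_state = cell(0, (1.0, 2.0))
```

Note how the next state appears twice: once inside the output bundle (so the unrolled loop can collect it) and once as the recurrent hidden state fed to the next step.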
classmethod _dtype(tensor: tf.Tensor) → tf.Tensor[source]¶
Converts tensor to the tf.float32 datatype if needed.
classmethod _output(fluents: Sequence[Tuple[str, rddl2tf.core.fluent.TensorFluent]]) → Sequence[tf.Tensor][source]¶
Returns output tensors for fluents.
classmethod _tensors(fluents: Sequence[Tuple[str, rddl2tf.core.fluent.TensorFluent]]) → Iterable[tf.Tensor][source]¶
Yields the fluents’ tensors.
action_size¶
Returns the MDP action size.
graph¶
Returns the computation graph.
initial_state() → Sequence[tf.Tensor][source]¶
Returns the initial state tensors.
interm_size¶
Returns the MDP intermediate state size.
output_size¶
Returns the simulation cell output size.
state_size¶
Returns the MDP state size.
class tfrddlsim.simulation.policy_simulator.PolicySimulator(compiler: rddl2tf.compilers.compiler.Compiler, policy: tfrddlsim.policy.abstract_policy.Policy)[source]¶
Bases: object
PolicySimulator samples MDP trajectories in the computation graph.
It implements an n-step MDP trajectory simulator using dynamic unrolling in a recurrent model. Its inputs are the MDP initial state and the number of timesteps in the horizon.
Parameters:
- compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- policy (tfrddlsim.policy.Policy) – MDP policy.
- batch_size (int) – The size of the simulation batch.
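The dynamic unrolling performed by the simulator can be sketched in plain Python. This is illustrative only: the real PolicySimulator builds the loop as ops in the TensorFlow graph, and the `decay_cell` dynamics below are invented. The point is the recurrent pattern: a 1-step cell is applied `horizon` times, threading the hidden state through and collecting per-step outputs.

```python
# Pure-Python sketch of n-step unrolling (illustrative; not tfrddlsim code).
def unroll(cell, initial_state, horizon):
    state = initial_state
    states, rewards = [], []
    for t in range(horizon):
        # Each step follows the RNNCell contract: (output, next_state).
        (next_state, action, interm, reward), state = cell(t, state)
        states.append(next_state)
        rewards.append(reward)
    return states, rewards


# Made-up cell: each step halves every state fluent; reward is their sum.
def decay_cell(t, state):
    next_state = tuple(0.5 * s for s in state)
    return (next_state, (), (), sum(next_state)), next_state


states, rewards = unroll(decay_cell, (8.0,), horizon=3)
# states  -> [(4.0,), (2.0,), (1.0,)]
# rewards -> [4.0, 2.0, 1.0]
```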
classmethod _output(tensors: Sequence[tf.Tensor], dtypes: Sequence[tf.DType]) → Sequence[tf.Tensor][source]¶
Converts tensors to the corresponding dtypes.
batch_size¶
Returns the size of the simulation batch.
graph¶
Returns the computation graph.
input_size¶
Returns the simulation input size (e.g., timestep).
output_size¶
Returns the simulation output size.
run(horizon: int, initial_state: Optional[Sequence[tf.Tensor]] = None) → Tuple[Sequence[np.array], Sequence[np.array], Sequence[np.array], Sequence[np.array], Sequence[np.array], np.array][source]¶
Builds the MDP graph and simulates the trajectories in batch for the given horizon. Returns the non-fluents, states, actions, interms and rewards. Fluents and non-fluents are returned in factored form.
Note
All output arrays have shape (batch_size, horizon, fluent_shape), except the initial state, which has shape (batch_size, fluent_shape).
Parameters:
- horizon (int) – The number of timesteps in the simulation.
- initial_state (Optional[Sequence[tf.Tensor]]) – The initial state tensors.
Returns: Simulation output tuple.
Return type: Tuple[NonFluentsArray, StatesArray, ActionsArray, IntermsArray, np.array]
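The shape convention in the note above can be demonstrated with NumPy. This is a sketch with made-up batch dynamics, not tfrddlsim output: per-step batched fluent arrays, stacked along a time axis, yield the documented (batch_size, horizon, fluent_shape) layout, while the initial state keeps its (batch_size, fluent_shape) shape.

```python
import numpy as np

batch_size, horizon, fluent_shape = 4, 5, 3

initial_state = np.ones((batch_size, fluent_shape))
state = initial_state
per_step = []
for _ in range(horizon):
    state = 0.9 * state  # toy transition applied to the whole batch at once
    per_step.append(state)

# Stacking the per-step (batch_size, fluent_shape) arrays along axis 1
# produces the documented (batch_size, horizon, fluent_shape) layout.
states = np.stack(per_step, axis=1)

assert initial_state.shape == (batch_size, fluent_shape)
assert states.shape == (batch_size, horizon, fluent_shape)
```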
state_size¶
Returns the MDP state size.
timesteps(horizon: int) → tf.Tensor[source]¶
Returns the input tensor for the given horizon.
trajectory(horizon: int, initial_state: Optional[Sequence[tf.Tensor]] = None) → Tuple[Sequence[tf.Tensor], Sequence[tf.Tensor], Sequence[tf.Tensor], Sequence[tf.Tensor], tf.Tensor][source]¶
Returns the ops for trajectory generation with the given horizon and initial_state.
The simulation returns states, actions and interms as sequences of tensors (i.e., all representations are factored). The reward is a batch-sized tensor. The trajectory output is a tuple: (initial_state, states, actions, interms, rewards). If initial_state is None, the compiler's default initial state is used.
Note
All tensors have shape (batch_size, horizon, fluent_shape), except the initial state, which has shape (batch_size, fluent_shape).
Parameters:
- horizon (int) – The number of simulation timesteps.
- initial_state (Optional[Sequence[tf.Tensor]]) – The initial state tensors.
Returns: Trajectory output tuple.
Return type: Tuple[StateTensor, StatesTensor, ActionsTensor, IntermsTensor, tf.Tensor]
tfrddlsim.simulation.transition_simulator module¶
class tfrddlsim.simulation.transition_simulator.ActionSimulationCell(compiler: rddl2tf.compilers.compiler.Compiler, batch_size: int = 1)[source]¶
Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell
ActionSimulationCell implements an MDP transition cell.
It extends an RNNCell in order to simulate the next state, given the current state and action. The cell input is the action fluents and the cell output is the next state fluents.
Note
All fluents are represented in factored form as Sequence[tf.Tensor].
Parameters:
- compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- batch_size (int) – The simulation batch size.
__call__(inputs: Sequence[tf.Tensor], state: Sequence[tf.Tensor], scope: Optional[str] = None) → Tuple[Tuple[Sequence[tf.Tensor], Sequence[tf.Tensor], Sequence[tf.Tensor], tf.Tensor], Sequence[tf.Tensor]][source]¶
Returns the transition simulation cell's output for the given inputs and state.
The cell outputs the reward as a 1-dimensional tensor, and the next state as a tuple of tensors.
Note
All tensors have shape: (batch_size, fluent_shape).
Parameters:
- inputs (Sequence[tf.Tensor]) – The current action fluents.
- state (Sequence[tf.Tensor]) – The current state.
- scope (Optional[str]) – Operations’ scope in the computation graph.
Returns: (output, next_state).
Return type: Tuple[CellOutput, CellState]
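A toy pure-Python analogue of this calling convention (hypothetical; the additive dynamics and L1-style reward are invented): unlike PolicySimulationCell, the cell input here is the action fluents themselves rather than a timestep, and the transition produces the next state plus a reward bundled into the output.

```python
from typing import Sequence


# Hypothetical stand-in for ActionSimulationCell's contract: action fluents
# come in as the cell input, the current state is the hidden state, and the
# call returns (output, next_state).
def action_transition(actions: Sequence[float], state: Sequence[float]):
    next_state = tuple(s + a for s, a in zip(state, actions))
    reward = -sum(abs(s) for s in next_state)  # made-up reward signal
    output = (next_state, reward)  # the real cell bundles more fluents here
    return output, next_state


output, next_state = action_transition((1.0, -1.0), (0.0, 0.0))
```

Because the action is supplied externally at every step, a cell like this lets a caller replay or hand-craft action sequences instead of sampling them from a policy.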
classmethod _output(fluents: Sequence[Tuple[str, rddl2tf.core.fluent.TensorFluent]]) → Sequence[tf.Tensor][source]¶
Converts fluents to tensors with datatype tf.float32.
action_size¶
Returns the MDP action size.
interm_size¶
Returns the MDP intermediate state size.
output_size¶
Returns the simulation cell output size.
state_size¶
Returns the MDP state size.