Welcome to tfrddlsim’s documentation!¶
tfrddlsim package¶
Subpackages¶
tfrddlsim.policy package¶
Submodules¶
tfrddlsim.policy.abstract_policy module¶
class tfrddlsim.policy.abstract_policy.Policy[source]¶
Bases: object
Abstract base class for representing Policy functions.
__call__(state: Sequence[tf.Tensor], timestep: tf.Tensor) → Sequence[tf.Tensor][source]¶
Returns action fluents for the current state and timestep.
Parameters: - state (Sequence[tf.Tensor]) – The current state fluents.
- timestep (tf.Tensor) – The current timestep.
Returns: A tuple of action fluents.
Return type: Sequence[tf.Tensor]
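A minimal sketch of a concrete subclass is shown below; the ConstantPolicy name and its precomputed action_tensors argument are illustrative assumptions, not part of the package.
from typing import Sequence

import tensorflow as tf

from tfrddlsim.policy.abstract_policy import Policy


class ConstantPolicy(Policy):
    """Illustrative policy that always returns the same action fluents."""

    def __init__(self, action_tensors: Sequence[tf.Tensor]) -> None:
        self._action_tensors = tuple(action_tensors)

    def __call__(self, state: Sequence[tf.Tensor], timestep: tf.Tensor) -> Sequence[tf.Tensor]:
        # A real policy would condition on state and timestep.
        return self._action_tensors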
tfrddlsim.policy.default_policy module¶
class tfrddlsim.policy.default_policy.DefaultPolicy(compiler: rddl2tf.compilers.compiler.Compiler, batch_size: int)[source]¶
Bases: tfrddlsim.policy.abstract_policy.Policy
DefaultPolicy class.
The default policy returns the default action fluents regardless of the current state and timestep.
Parameters: - compiler (rddl2tf.compiler.Compiler) – A RDDL2TensorFlow compiler.
- batch_size (int) – The batch size.
__call__(state: Sequence[tf.Tensor], timestep: tf.Tensor) → Sequence[tf.Tensor][source]¶
Returns the default action fluents regardless of the current state and timestep.
Parameters: - state (Sequence[tf.Tensor]) – The current state fluents.
- timestep (tf.Tensor) – The current timestep.
Returns: A tuple of action fluents.
Return type: Sequence[tf.Tensor]
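A usage sketch, assuming compiler is a rddl2tf Compiler instance and state/timestep are tensors built in the same graph (their construction is not covered by this reference):
from tfrddlsim.policy.default_policy import DefaultPolicy

policy = DefaultPolicy(compiler, batch_size=64)
# Ignores state and timestep and returns the default action fluents.
actions = policy(state, timestep)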
tfrddlsim.policy.random_policy module¶
class tfrddlsim.policy.random_policy.RandomPolicy(compiler: rddl2tf.compilers.compiler.Compiler)[source]¶
Bases: tfrddlsim.policy.abstract_policy.Policy
RandomPolicy class.
The random policy samples action fluents uniformly. It checks all action preconditions and constraints. The range of each action fluent is defined by action bounds constraints, if defined in the RDDL model, or by default maximum values.
Parameters: - compiler (rddl2tf.compiler.Compiler) – A RDDL2TensorFlow compiler.
- batch_size (int) – The batch size.
compiler¶
rddl2tf.compiler.Compiler – A RDDL2TensorFlow compiler.
batch_size¶
int – The batch size.
MAX_INT_VALUE = 5¶
MAX_REAL_VALUE = 5.0¶
__call__(state: Sequence[tf.Tensor], timestep: tf.Tensor) → Sequence[tf.Tensor][source]¶
Returns sampled action fluents for the current state and timestep.
Parameters: - state (Sequence[tf.Tensor]) – The current state fluents.
- timestep (tf.Tensor) – The current timestep.
Returns: A tuple of action fluents.
Return type: Sequence[tf.Tensor]
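A usage sketch along the same lines, assuming compiler, state and timestep are built elsewhere; the constructor call follows the class signature above:
from tfrddlsim.policy.random_policy import RandomPolicy

policy = RandomPolicy(compiler)
# Action fluents sampled within bounds and satisfying the preconditions.
actions = policy(state, timestep)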
_check_preconditions(state: Sequence[tf.Tensor], action: Sequence[tf.Tensor], bound_constraints: Dict[str, Tuple[Optional[rddl2tf.core.fluent.TensorFluent], Optional[rddl2tf.core.fluent.TensorFluent]]], default: Sequence[tf.Tensor]) → Tuple[tf.Tensor, Sequence[tf.Tensor], tf.Tensor][source]¶
Samples action fluents until all preconditions are satisfied.
Checks the action preconditions for the sampled action and the current state, and returns the sampled action fluents if and only if all preconditions are satisfied.
Parameters: - state (Sequence[tf.Tensor]) – A list of state fluents.
- action (Sequence[tf.Tensor]) – A list of action fluents.
- bound_constraints (Dict[str, Tuple[Optional[TensorFluent], Optional[TensorFluent]]]) – The bounds for each action fluent.
- default (Sequence[tf.Tensor]) – The default action fluents.
Returns: A tuple with an integer tensor corresponding to the number of samples, action fluents and a boolean tensor for checking all action preconditions.
Return type: Tuple[tf.Tensor, Sequence[tf.Tensor], tf.Tensor]
_sample_action(constraints: Dict[str, Tuple[Optional[rddl2tf.core.fluent.TensorFluent], Optional[rddl2tf.core.fluent.TensorFluent]]], default: Sequence[tf.Tensor], prob: float = 0.3) → Sequence[tf.Tensor][source]¶
Samples action fluents respecting the given bound constraints.
With probability prob it chooses the action fluent default value; with probability 1-prob it samples the fluent w.r.t. its bounds.
Parameters: - constraints (Dict[str, Tuple[Optional[TensorFluent], Optional[TensorFluent]]]) – The bounds for each action fluent.
- default (Sequence[tf.Tensor]) – The default action fluents.
- prob (float) – A probability measure.
Returns: A tuple of action fluents.
Return type: Sequence[tf.Tensor]
_sample_action_fluent(name: str, dtype: tf.DType, size: Sequence[int], constraints: Dict[str, Tuple[Optional[rddl2tf.core.fluent.TensorFluent], Optional[rddl2tf.core.fluent.TensorFluent]]], default_value: tf.Tensor, prob: float) → tf.Tensor[source]¶
Samples the action fluent with given name, dtype, and size.
With probability prob it chooses the action fluent default_value; with probability 1-prob it samples the fluent w.r.t. its constraints.
Parameters: - name (str) – The name of the action fluent.
- dtype (tf.DType) – The data type of the action fluent.
- size (Sequence[int]) – The size and shape of the action fluent.
- constraints (Dict[str, Tuple[Optional[TensorFluent], Optional[TensorFluent]]]) – The bounds for each action fluent.
- default_value (tf.Tensor) – The default value for the action fluent.
- prob (float) – A probability measure.
Returns: A tensor for sampling the action fluent.
Return type: tf.Tensor
_sample_actions(state: Sequence[tf.Tensor]) → Tuple[Sequence[tf.Tensor], tf.Tensor, tf.Tensor][source]¶
Returns sampled action fluents and tensors related to the sampling.
Parameters: state (Sequence[tf.Tensor]) – A list of state fluents.
Returns: A tuple with action fluents, an integer tensor for the number of samples, and a boolean tensor for checking all action preconditions.
Return type: Tuple[Sequence[tf.Tensor], tf.Tensor, tf.Tensor]
Module contents¶
tfrddlsim.simulation package¶
Submodules¶
tfrddlsim.simulation.policy_simulator module¶
class tfrddlsim.simulation.policy_simulator.PolicySimulationCell(compiler: rddl2tf.compilers.compiler.Compiler, policy: tfrddlsim.policy.abstract_policy.Policy)[source]¶
Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell
PolicySimulationCell implements a 1-step MDP transition cell.
It extends tf.nn.rnn_cell.RNNCell in order to simulate an MDP transition for a given policy. The cell input is the timestep. The hidden state is the factored MDP state. The cell output is the tuple of MDP fluents (next-state, action, interm, rewards).
Note
All fluents are represented in factored form as Tuple[tf.Tensors].
Parameters: - compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- policy (tfrddlsim.policy.Policy) – MDP Policy.
- batch_size (int) – The size of the simulation batch.
__call__(input: tf.Tensor, state: Sequence[tf.Tensor], scope: Optional[str] = None) → Tuple[Tuple[Sequence[tf.Tensor], Sequence[tf.Tensor], Sequence[tf.Tensor], tf.Tensor], Sequence[tf.Tensor]][source]¶
Returns the simulation cell for the given input and state.
The cell returns states, actions and interms as sequences of tensors (i.e., all representations are factored). The reward is a 1-dimensional tensor.
Note
All tensors have shape: (batch_size, fluent_shape).
Parameters: - input (tf.Tensor) – The current MDP timestep.
- state (Sequence[tf.Tensor]) – State fluents in canonical order.
- scope (Optional[str]) – Scope for operations in graph.
Returns: (output, next_state).
Return type: Tuple[CellOutput, CellState]
classmethod _dtype(tensor: tf.Tensor) → tf.Tensor[source]¶
Converts tensor to tf.float32 datatype if needed.
classmethod _output(fluents: Sequence[Tuple[str, rddl2tf.core.fluent.TensorFluent]]) → Sequence[tf.Tensor][source]¶
Returns output tensors for fluents.
classmethod _tensors(fluents: Sequence[Tuple[str, rddl2tf.core.fluent.TensorFluent]]) → Iterable[tf.Tensor][source]¶
Yields the fluents’ tensors.
action_size¶
Returns the MDP action size.
graph¶
Returns the computation graph.
initial_state() → Sequence[tf.Tensor][source]¶
Returns the initial state tensor.
interm_size¶
Returns the MDP intermediate state size.
output_size¶
Returns the simulation cell output size.
state_size¶
Returns the MDP state size.
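A single-transition sketch, assuming compiler and policy are built elsewhere; the timestep tensor shape is an assumption, and in practice the cell is unrolled by PolicySimulator:
import tensorflow as tf

from tfrddlsim.simulation.policy_simulator import PolicySimulationCell

cell = PolicySimulationCell(compiler, policy)
state = cell.initial_state()          # factored MDP state, Sequence[tf.Tensor]
timestep = tf.constant([[40.0]])      # assumed shape (batch_size, 1)
output, next_state = cell(timestep, state)
# Output tuple order follows the class description above.
next_state_fluents, action_fluents, interm_fluents, reward = output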
class tfrddlsim.simulation.policy_simulator.PolicySimulator(compiler: rddl2tf.compilers.compiler.Compiler, policy: tfrddlsim.policy.abstract_policy.Policy)[source]¶
Bases: object
PolicySimulator samples MDP trajectories in the computation graph.
It implements an n-step MDP trajectory simulator using dynamic unrolling of a recurrent model. Its inputs are the MDP initial state and the number of timesteps in the horizon.
Parameters: - compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- policy (tfrddlsim.policy.Policy) – MDP Policy.
- batch_size (int) – The size of the simulation batch.
classmethod _output(tensors: Sequence[tf.Tensor], dtypes: Sequence[tf.DType]) → Sequence[tf.Tensor][source]¶
Converts tensors to the corresponding dtypes.
batch_size¶
Returns the size of the simulation batch.
graph¶
Returns the computation graph.
input_size¶
Returns the simulation input size (e.g., timestep).
output_size¶
Returns the simulation output size.
run(horizon: int, initial_state: Optional[Sequence[tf.Tensor]] = None) → Tuple[Sequence[np.array], Sequence[np.array], Sequence[np.array], Sequence[np.array], Sequence[np.array], np.array][source]¶
Builds the MDP graph and simulates the trajectories in batch for the given horizon. Returns the non-fluents, states, actions, interms and rewards. Fluents and non-fluents are returned in factored form.
Note
All output arrays have shape (batch_size, horizon, fluent_shape), except the initial state, which has shape (batch_size, fluent_shape).
Parameters: - horizon (int) – The number of timesteps in the simulation.
- initial_state (Optional[Sequence[tf.Tensor]]) – The initial state tensors.
Returns: Simulation output tuple.
Return type: Tuple[NonFluentsArray, StatesArray, ActionsArray, IntermsArray, np.array]
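An end-to-end sketch, assuming compiler is a rddl2tf Compiler set up for batched simulation (its construction is not covered here):
from tfrddlsim.policy.random_policy import RandomPolicy
from tfrddlsim.simulation.policy_simulator import PolicySimulator

policy = RandomPolicy(compiler)
simulator = PolicySimulator(compiler, policy)

# Simulates a batch of 40-step trajectories from the compiler's default
# initial state; the returned tuple holds non-fluents, states, actions,
# interms and rewards as factored numpy arrays.
trajectories = simulator.run(horizon=40)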
state_size¶
Returns the MDP state size.
timesteps(horizon: int) → tf.Tensor[source]¶
Returns the input tensor for the given horizon.
trajectory(horizon: int, initial_state: Optional[Sequence[tf.Tensor]] = None) → Tuple[Sequence[tf.Tensor], Sequence[tf.Tensor], Sequence[tf.Tensor], Sequence[tf.Tensor], tf.Tensor][source]¶
Returns the ops for the trajectory generation with given horizon and initial_state.
The simulation returns states, actions and interms as sequences of tensors (i.e., all representations are factored). The reward is a batch-sized tensor. The trajectory output is a tuple: (initial_state, states, actions, interms, rewards). If initial_state is None, the compiler’s default initial state is used.
Note
All tensors have shape: (batch_size, horizon, fluent_shape). Except initial state that has shape: (batch_size, fluent_shape).
Parameters: - horizon (int) – The number of simulation timesteps.
- initial_state (Optional[Sequence[tf.Tensor]]) – The initial state tensors.
Returns: Trajectory output tuple.
Return type: Tuple[StateTensor, StatesTensor, ActionsTensor, IntermsTensor, tf.Tensor]
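In contrast to run(), trajectory() only builds the graph ops; a sketch assuming simulator is the PolicySimulator constructed in the previous example:
# Tensors for a 40-step trajectory; nothing is executed until a session runs them.
initial_state, states, actions, interms, rewards = simulator.trajectory(horizon=40)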
tfrddlsim.simulation.transition_simulator module¶
class tfrddlsim.simulation.transition_simulator.ActionSimulationCell(compiler: rddl2tf.compilers.compiler.Compiler, batch_size: int = 1)[source]¶
Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell
ActionSimulationCell implements an MDP transition cell.
It extends an RNNCell in order to simulate the next state, given the current state and action. The cell input is the action fluents, and the cell output is the next-state fluents.
Note
All fluents are represented in factored form as Sequence[tf.Tensors].
Parameters: - compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- batch_size (int) – The simulation batch size.
__call__(inputs: Sequence[tf.Tensor], state: Sequence[tf.Tensor], scope: Optional[str] = None) → Tuple[Tuple[Sequence[tf.Tensor], Sequence[tf.Tensor], Sequence[tf.Tensor], tf.Tensor], Sequence[tf.Tensor]][source]¶
Returns the transition simulation cell for the given input and state.
The cell outputs the reward as a 1-dimensional tensor, and the next state as a tuple of tensors.
Note
All tensors have shape: (batch_size, fluent_shape).
Parameters: - inputs (Sequence[tf.Tensor]) – The current action fluents.
- state (Sequence[tf.Tensor]) – The current state.
- scope (Optional[str]) – Operations’ scope in computation graph.
Returns: (output, next_state).
Return type: Tuple[CellOutput, CellState]
classmethod _output(fluents: Sequence[Tuple[str, rddl2tf.core.fluent.TensorFluent]]) → Sequence[tf.Tensor][source]¶
Converts fluents to tensors with datatype tf.float32.
action_size¶
Returns the MDP action size.
interm_size¶
Returns the MDP intermediate state size.
output_size¶
Returns the simulation cell output size.
state_size¶
Returns the MDP state size.
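A single-transition sketch, assuming compiler is a rddl2tf Compiler and action_fluents/state_fluents are sequences of tf.Tensor built in the same graph:
from tfrddlsim.simulation.transition_simulator import ActionSimulationCell

cell = ActionSimulationCell(compiler, batch_size=1)
output, next_state = cell(action_fluents, state_fluents)
# `output` follows the return type documented above: three sequences of
# fluent tensors plus the 1-dimensional reward tensor.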
Module contents¶
tfrddlsim.viz package¶
Submodules¶
tfrddlsim.viz.abstract_visualizer module¶
tfrddlsim.viz.generic_visualizer module¶
class tfrddlsim.viz.generic_visualizer.GenericVisualizer(compiler: rddl2tf.compilers.compiler.Compiler, verbose: bool)[source]¶
Bases: tfrddlsim.viz.abstract_visualizer.Visualizer
GenericVisualizer is a generic text-based trajectory visualizer.
Parameters: - compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- verbose (bool) – Verbosity flag.
_render_batch(non_fluents: Sequence[Tuple[str, Union[int, float, np.array]]], states: Sequence[Tuple[str, np.array]], actions: Sequence[Tuple[str, np.array]], interms: Sequence[Tuple[str, np.array]], rewards: np.array, horizon: Optional[int] = None) → None[source]¶
Prints non_fluents, states, actions, interms and rewards for given horizon.
Parameters: - states (Sequence[Tuple[str, np.array]]) – A state trajectory.
- actions (Sequence[Tuple[str, np.array]]) – An action trajectory.
- interms (Sequence[Tuple[str, np.array]]) – An interm state trajectory.
- rewards (np.array) – Sequence of rewards (1-dimensional array).
- horizon (Optional[int]) – Number of timesteps.
_render_fluent_timestep(fluent_type: str, fluents: Sequence[Tuple[str, np.array]], fluent_variables: Sequence[Tuple[str, List[str]]]) → None[source]¶
Prints fluents of the given fluent_type as a list of instantiated variables with corresponding values.
Parameters: - fluent_type (str) – Fluent type.
- fluents (Sequence[Tuple[str, np.array]]) – List of pairs (fluent_name, fluent_values).
- fluent_variables (Sequence[Tuple[str, List[str]]]) – List of pairs (fluent_name, args).
_render_round_end(rewards: np.array) → None[source]¶
Prints round end information about rewards.
_render_round_init(horizon: int, non_fluents: Sequence[Tuple[str, Union[int, float, np.array]]]) → None[source]¶
Prints round init information about horizon and non_fluents.
_render_timestep(t: int, s: Sequence[Tuple[str, np.array]], a: Sequence[Tuple[str, np.array]], f: Sequence[Tuple[str, np.array]], r: np.float32) → None[source]¶
Prints fluents and rewards for the given timestep t.
Parameters: - t (int) – The timestep.
- s (Sequence[Tuple[str, np.array]]) – State fluents.
- a (Sequence[Tuple[str, np.array]]) – Action fluents.
- f (Sequence[Tuple[str, np.array]]) – Interm state fluents.
- r (np.float32) – Reward.
_render_trajectories(trajectories: Tuple[Sequence[Tuple[str, Union[int, float, np.array]]], Sequence[Tuple[str, np.array]], Sequence[Tuple[str, np.array]], Sequence[Tuple[str, np.array]], np.array]) → None[source]¶
Prints the first batch of simulated trajectories.
Parameters: trajectories – NonFluents, states, actions, interms and rewards.
render(trajectories: Tuple[Sequence[Tuple[str, Union[int, float, np.array]]], Sequence[Tuple[str, np.array]], Sequence[Tuple[str, np.array]], Sequence[Tuple[str, np.array]], np.array], batch: Optional[int] = None) → None[source]¶
Prints the simulated trajectories.
Parameters: - trajectories – NonFluents, states, actions, interms and rewards.
- batch – Number of batches to render.
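A rendering sketch, assuming compiler is the rddl2tf Compiler used for simulation and trajectories matches the tuple structure documented above:
from tfrddlsim.viz.generic_visualizer import GenericVisualizer

viz = GenericVisualizer(compiler, verbose=True)
viz.render(trajectories)   # prints non-fluents, states, actions, interms and rewards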