tfrddlsim.policy package

Submodules

tfrddlsim.policy.abstract_policy module

class tfrddlsim.policy.abstract_policy.Policy

Bases: object

Abstract base class for representing Policy functions.

__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor]

Returns action fluents for the current state and timestep.

Parameters:
  • state (Sequence[tf.Tensor]) – The current state fluents.
  • timestep (tf.Tensor) – The current timestep.
Returns:

A tuple of action fluents.

Return type:

Sequence[tf.Tensor]
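The contract above can be illustrated with a minimal pure-Python sketch. It is not the library's code: plain lists stand in for tf.Tensor, the timestep is a plain int, and the names ConstantPolicy and rollout are hypothetical. The sketch uses Python's abc module to make __call__ abstract; the documented Policy class derives from object, so treating it as an ABC is an assumption about intent.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence

Fluent = List[float]  # stand-in for a tf.Tensor: one value per batch entry


class Policy(ABC):
    """Hypothetical stand-in for tfrddlsim.policy.abstract_policy.Policy."""

    @abstractmethod
    def __call__(self, state: Sequence[Fluent], timestep: int) -> Sequence[Fluent]:
        """Return the action fluents for the current state and timestep."""


class ConstantPolicy(Policy):
    """Hypothetical subclass that always returns the same action fluents."""

    def __init__(self, action: Sequence[Fluent]) -> None:
        self.action = tuple(action)

    def __call__(self, state: Sequence[Fluent], timestep: int) -> Sequence[Fluent]:
        return self.action


def rollout(policy: Policy, state: Sequence[Fluent], horizon: int) -> List[Sequence[Fluent]]:
    """Query the policy once per timestep; state transitions are omitted."""
    return [policy(state, t) for t in range(horizon)]


actions = rollout(ConstantPolicy([[1.0, 1.0]]), state=[[0.0, 0.0]], horizon=3)
print(len(actions), actions[0])  # 3 ([1.0, 1.0],)
```

A simulator would call the policy once per timestep with the current state fluents; the rollout helper keeps the state fixed only to keep the sketch short.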

tfrddlsim.policy.default_policy module

class tfrddlsim.policy.default_policy.DefaultPolicy(compiler: rddl2tf.compilers.compiler.Compiler, batch_size: int)

Bases: tfrddlsim.policy.abstract_policy.Policy

DefaultPolicy class.

The default policy returns the default action fluents regardless of the current state and timestep.

Parameters:
  • compiler (rddl2tf.compilers.compiler.Compiler) – An RDDL2TensorFlow compiler.
  • batch_size (int) – The batch size.

__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor]

Returns the default action fluents regardless of the current state and timestep.

Parameters:
  • state (Sequence[tf.Tensor]) – The current state fluents.
  • timestep (tf.Tensor) – The current timestep.
Returns:

A tuple of action fluents.

Return type:

Sequence[tf.Tensor]
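The behaviour of DefaultPolicy can be sketched in plain Python. This is not the library's implementation: the compiler is replaced by a hypothetical list of scalar per-fluent default values, lists stand in for tensors, DefaultPolicySketch is a made-up name, and replicating each default batch_size times is an assumption. The sketch shows the documented property that the state and timestep have no effect on the returned action.

```python
from typing import List, Sequence

Fluent = List[float]  # stand-in for a tf.Tensor: one value per batch entry


class DefaultPolicySketch:
    """Hypothetical stand-in for DefaultPolicy; it ignores state and timestep."""

    def __init__(self, default_values: Sequence[float], batch_size: int) -> None:
        # Assumption: each action fluent has one scalar default, replicated
        # batch_size times once, at construction time.
        self.action = tuple([value] * batch_size for value in default_values)

    def __call__(self, state: Sequence[Fluent], timestep: int) -> Sequence[Fluent]:
        # state and timestep are accepted only to match the Policy interface.
        return self.action


policy = DefaultPolicySketch(default_values=[0.0, 1.0], batch_size=3)
print(policy(state=[[5.0, 5.0, 5.0]], timestep=0))
# ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

Building the action once in the constructor reflects that nothing in the output depends on the call arguments.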

tfrddlsim.policy.random_policy module

class tfrddlsim.policy.random_policy.RandomPolicy(compiler: rddl2tf.compilers.compiler.Compiler)

Bases: tfrddlsim.policy.abstract_policy.Policy

RandomPolicy class.

The random policy samples action fluents uniformly. It checks all action preconditions and constraints. The range of each action fluent is defined by its action-bound constraints if the RDDL model defines them, or otherwise by the default maximum values (MAX_INT_VALUE and MAX_REAL_VALUE).

Parameters:
  • compiler (rddl2tf.compilers.compiler.Compiler) – An RDDL2TensorFlow compiler.

compiler

rddl2tf.compilers.compiler.Compiler – An RDDL2TensorFlow compiler.

batch_size

int – The batch size.

MAX_INT_VALUE = 5
MAX_REAL_VALUE = 5.0

__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor]

Returns sampled action fluents for the current state and timestep.

Parameters:
  • state (Sequence[tf.Tensor]) – The current state fluents.
  • timestep (tf.Tensor) – The current timestep.
Returns:

A tuple of action fluents.

Return type:

Sequence[tf.Tensor]

_check_preconditions(state: Sequence[tensorflow.python.framework.ops.Tensor], action: Sequence[tensorflow.python.framework.ops.Tensor], bound_constraints: Dict[str, Tuple[Union[rddl2tf.core.fluent.TensorFluent, NoneType], Union[rddl2tf.core.fluent.TensorFluent, NoneType]]], default: Sequence[tensorflow.python.framework.ops.Tensor]) → Tuple[tensorflow.python.framework.ops.Tensor, Sequence[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor]

Samples action fluents until all preconditions are satisfied.

Checks the action preconditions for the sampled action and the current state and, iff all preconditions are satisfied, returns the sampled action fluents.

Parameters:
  • state (Sequence[tf.Tensor]) – A list of state fluents.
  • action (Sequence[tf.Tensor]) – A list of action fluents.
  • bound_constraints (Dict[str, Tuple[Optional[TensorFluent], Optional[TensorFluent]]]) – The bounds for each action fluent.
  • default (Sequence[tf.Tensor]) – The default action fluents.
Returns:

A tuple with an integer tensor holding the number of samples, the action fluents, and a boolean tensor with the result of checking all action preconditions.

Return type:

Tuple[tf.Tensor, Sequence[tf.Tensor], tf.Tensor]
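The precondition loop can be read as rejection sampling over the batch. The sketch below assumes the check is done per batch entry and that only failing entries are resampled; the callables sample and precondition are hypothetical stand-ins for _sample_action and for evaluating the RDDL action preconditions, and the max_iters safety cap is an addition not mentioned above. The returned counter is the number of resampling rounds; the text above only says "the number of samples", so whether the initial draw is counted is not specified. A TensorFlow version would need a graph-level loop such as tf.while_loop instead of a Python while.

```python
import random
from typing import Callable, List, Tuple

Batch = List[float]  # stand-in for a tensor: one scalar per batch entry


def check_preconditions(
    state: Batch,
    action: Batch,
    sample: Callable[[], Batch],
    precondition: Callable[[float, float], bool],
    max_iters: int = 1000,
) -> Tuple[int, Batch, List[bool]]:
    """Hypothetical batched rejection sampler.

    Entries whose (state, action) pair already satisfies the precondition are
    kept; the remaining entries are replaced by fresh samples until every
    entry passes. Returns (resampling rounds, actions, per-entry checks).
    """
    checking = [precondition(s, a) for s, a in zip(state, action)]
    rounds = 0
    while not all(checking):
        if rounds >= max_iters:
            raise RuntimeError("preconditions still violated after max_iters rounds")
        fresh = sample()
        # Keep entries that already pass; resample only the failing ones.
        action = [a if ok else b for a, ok, b in zip(action, checking, fresh)]
        checking = [ok or precondition(s, a) for s, a, ok in zip(state, action, checking)]
        rounds += 1
    return rounds, action, checking


rng = random.Random(42)
state = [1.0, 2.0, 3.0, 4.0]


def sample() -> Batch:
    # Hypothetical uniform sampler over [-5, 5] for every batch entry.
    return [rng.uniform(-5.0, 5.0) for _ in state]


def precondition(s: float, a: float) -> bool:
    # Hypothetical precondition: the action may not exceed the state value.
    return a <= s


rounds, action, checking = check_preconditions(state, sample(), sample, precondition)
print(rounds, all(checking))
```

Keeping already-valid entries means each batch entry stops being resampled as soon as it passes, so the loop ends once the slowest entry succeeds.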

_sample_action(constraints: Dict[str, Tuple[Union[rddl2tf.core.fluent.TensorFluent, NoneType], Union[rddl2tf.core.fluent.TensorFluent, NoneType]]], default: Sequence[tensorflow.python.framework.ops.Tensor], prob: float = 0.3) → Sequence[tensorflow.python.framework.ops.Tensor]

Samples action fluents respecting the given bound constraints.

With probability prob it chooses the action fluent's default value; with probability 1 - prob it samples the fluent with respect to its bounds.

Parameters:
  • constraints (Dict[str, Tuple[Optional[TensorFluent], Optional[TensorFluent]]]) – The bounds for each action fluent.
  • default (Sequence[tf.Tensor]) – The default action fluents.
  • prob (float) – The probability of choosing the default value of an action fluent.
Returns:

A tuple of action fluents.

Return type:

Sequence[tf.Tensor]
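Read literally, the description above makes each sampled action fluent a two-component mixture. Writing p for prob, d for the fluent's default value and U(ℓ, u) for a uniform draw between its bounds (the uniform choice comes from the class description; the symbols are notation introduced here), a real-valued fluent a satisfies:

```latex
a \sim p\,\delta_{d} + (1 - p)\,\mathcal{U}(\ell, u),
\qquad
\mathbb{E}[a] = p\,d + (1 - p)\,\frac{\ell + u}{2}
```

For example, with the default prob = 0.3, a default value d = 0 and bounds ℓ = 0, u = 1, the expected value is 0.3 · 0 + 0.7 · 0.5 = 0.35, and about 30% of draws return the default outright.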

_sample_action_fluent(name: str, dtype: tensorflow.python.framework.dtypes.DType, size: Sequence[int], constraints: Dict[str, Tuple[Union[rddl2tf.core.fluent.TensorFluent, NoneType], Union[rddl2tf.core.fluent.TensorFluent, NoneType]]], default_value: tensorflow.python.framework.ops.Tensor, prob: float) → tensorflow.python.framework.ops.Tensor

Samples the action fluent with given name, dtype, and size.

With probability prob it chooses default_value; with probability 1 - prob it samples the fluent with respect to its constraints.

Parameters:
  • name (str) – The name of the action fluent.
  • dtype (tf.DType) – The data type of the action fluent.
  • size (Sequence[int]) – The size and shape of the action fluent.
  • constraints (Dict[str, Tuple[Optional[TensorFluent], Optional[TensorFluent]]]) – The bounds for each action fluent.
  • default_value (tf.Tensor) – The default value for the action fluent.
  • prob (float) – The probability of choosing default_value.
Returns:

A tensor with the sampled values of the action fluent.

Return type:

tf.Tensor
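A plain-Python sketch of a per-fluent sampler, under assumptions that go beyond the text above: the fluent is a scalar per batch entry (size is ignored), dtype is a hypothetical string ("real", "int" or "bool") rather than a tf.DType, the default is chosen independently for each batch entry with probability prob, bounds are applied only to real-valued fluents with missing bounds falling back to -/+ MAX_REAL_VALUE, unbounded integers are drawn from 0 to MAX_INT_VALUE - 1, and booleans are fair coin flips. Only the two MAX_* constants and the role of prob come from the documentation.

```python
import random
from typing import List, Optional, Tuple, Union

Value = Union[float, int, bool]
Bounds = Tuple[Optional[float], Optional[float]]

MAX_INT_VALUE = 5     # mirrors RandomPolicy.MAX_INT_VALUE
MAX_REAL_VALUE = 5.0  # mirrors RandomPolicy.MAX_REAL_VALUE


def sample_action_fluent(
    dtype: str,
    batch_size: int,
    bounds: Optional[Bounds],
    default_value: Value,
    prob: float,
    rng: random.Random,
) -> List[Value]:
    """Hypothetical sampler for a scalar action fluent, one value per batch entry."""
    if dtype not in ("real", "int", "bool"):
        raise ValueError(f"unsupported dtype: {dtype!r}")
    out: List[Value] = []
    for _ in range(batch_size):
        if rng.random() < prob:
            # With probability prob, keep the default value for this entry.
            out.append(default_value)
        elif dtype == "real":
            # Assumption: a missing bound falls back to -/+ MAX_REAL_VALUE.
            lower, upper = bounds if bounds is not None else (None, None)
            low = -MAX_REAL_VALUE if lower is None else lower
            high = MAX_REAL_VALUE if upper is None else upper
            out.append(rng.uniform(low, high))
        elif dtype == "int":
            # Assumption: integers ignore bounds and are drawn from 0 .. MAX_INT_VALUE - 1.
            out.append(rng.randrange(MAX_INT_VALUE))
        else:
            # Assumption: booleans are fair coin flips.
            out.append(rng.random() < 0.5)
    return out


rng = random.Random(1)
samples = sample_action_fluent("real", 10_000, (0.0, 1.0), 0.0, prob=0.3, rng=rng)
share_default = sum(1 for x in samples if x == 0.0) / len(samples)
print(f"{share_default:.2f}")  # expected to be close to 0.30
```

Drawing the default per batch entry keeps the batch elements independent of one another, which matches treating each entry as a separate simulation.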

_sample_actions(state: Sequence[tensorflow.python.framework.ops.Tensor]) → Tuple[Sequence[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor]

Returns sampled action fluents and tensors related to the sampling.

Parameters:
  • state (Sequence[tf.Tensor]) – A list of state fluents.
Returns:

A tuple with the action fluents, an integer tensor holding the number of samples, and a boolean tensor with the result of checking all action preconditions.

Return type:

Tuple[Sequence[tf.Tensor], tf.Tensor, tf.Tensor]

Module contents