# Action Sequencing

## Description

The **action sequencing** module takes the task $\langle s_0, g \rangle$ as input, where:

- **$s_0$** represents the initial state of the environment.
- **$g$** is the task goal.

The module uses a transition model $\mathcal{M}$ specific to the simulator, which governs how the environment evolves based on actions. For more details on the transition model, refer to:

- `src/behavior_eval/evolving_graph/evolving_graph.py` for `Behavior`
- `src/virtualhome_eval/simulation/evolving_graph/environment.py` for `VirtualHome`

The module generates an action sequence $\bar{a} = \{a_i\}_{i=1}^{n}$, representing the actions required to move from the initial state toward achieving the task goal.
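
As a rough illustration of how $\mathcal{M}$ turns $s_0$ and $\bar{a}$ into a trajectory, here is a minimal sketch built around a toy transition function. The names `toy_transition` and `rollout`, the tuple-based state encoding, and the simplified one-handed `GRASP` / `PLACE_INSIDE` actions are assumptions made for this example; they are not the simulators' actual APIs.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

Predicate = Tuple[str, ...]            # e.g. ("ontop", "candle_0", "breakfast_table_13")
State = FrozenSet[Predicate]           # a state is a set of ground predicates
Action = Tuple[str, str]               # (action name, target object)


def toy_transition(state: State, action: Action) -> Optional[State]:
    """Hypothetical stand-in for M: return the next state, or None if infeasible."""
    name, obj = action
    held = [p[1] for p in state if p[0] == "holding"]
    if name == "GRASP":
        if held:                       # the single toy hand is already occupied
            return None
        # Drop the relation that located the object, then mark it as held.
        rest = {p for p in state if not (len(p) == 3 and p[1] == obj)}
        return frozenset(rest | {("holding", obj)})
    if name == "PLACE_INSIDE":
        if not held:                   # nothing to place
            return None
        rest = {p for p in state if p[0] != "holding"}
        return frozenset(rest | {("inside", held[0], obj)})
    return None                        # unknown action name


def rollout(
    s0: State,
    actions: List[Action],
    step: Callable[[State, Action], Optional[State]],
) -> Tuple[List[State], List[Action]]:
    """Apply actions in order; stop at the first infeasible one."""
    states, executed = [s0], []
    for action in actions:
        nxt = step(states[-1], action)
        if nxt is None:
            break
        states.append(nxt)
        executed.append(action)
    return states, executed


s0 = frozenset({("ontop", "candle_0", "breakfast_table_13")})
plan = [("GRASP", "candle_0"), ("PLACE_INSIDE", "basket_0")]
states, executed = rollout(s0, plan, toy_transition)
```

In this sketch the returned `states` list plays the role of $\{s_i\}$ and `executed` the role of the actions that were actually applied; a plan that starts with `PLACE_INSIDE` would stop immediately because nothing is held.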

## Evaluation Details

### Evaluation Workflow

The evaluation of the action sequencing module involves two main components:

1. **Trajectory Evaluation**:
   - **Purpose**: To determine whether the generated action sequence $\bar{a}$ is executable in the simulator.
   - **Process**: Execute $\bar{a}$ in the simulator to obtain the trajectory $T = \langle \{s_i\}_{i=0}^{m}, \{a_i\}_{i=1}^{m} \rangle$ (see, e.g., `behavior_eval.evolving_graph.eval_evolving_graph_env.apply_action`).
   - **Outcome**: If an infeasible action occurs, execution may stop early. Execution failures are categorized into:
     - **Missing Steps**: Necessary actions that were omitted.
     - **Additional Steps**: Unnecessary actions that were included.
     - **Wrong Temporal Order**: Actions executed in an incorrect sequence.
     - **Affordance Errors**: Actions incompatible with the current state of objects (e.g., trying to "open" an object that cannot be opened).

2. **Goal Evaluation**:
   - **Purpose**: To assess if the task goal $g$ is satisfied after executing $\bar{a}$.
   - **Process**: Check whether the state reached after execution satisfies $g$ (see, e.g., `behavior_eval.evolving_graph.evolving_graph.check_success`).
   - **Partial Goal Satisfaction Evaluation**:
     - Measures the percentage of subgoals in $g$ that are satisfied by $\bar{a}$.
     - **Process**:
       - Decompose $g$ into simple Linear Temporal Logic (LTL) goals $g_i$.
       - For each $g_i$:
         - Let $g_i = a_1 \overset{\text{then}}{\ldots} a_k ~\textbf{then}~ (p_1 \land \ldots \land p_\ell)$.
         - Check whether some subsequence of $\bar{a}$ matches $\{a_j\}_{j=1}^k$.
         - Evaluate the final-state propositions $p_j$ in $s_m$.
       - Assign partial credit based on the number of propositions satisfied.
     - **Final Metric**: $\textit{PartialSucc}(\bar{a}, g) = \max_{g_i \in \mathcal{G}(g, \mathcal{U})} \textit{PartialSucc}(\bar{a}, g_i)$.
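
The partial-credit computation above can be sketched as follows. This is a simplified illustration rather than the benchmark's implementation. It assumes that each simple goal $g_i$ is given as a pair (required actions, final-state propositions), that "subsequence" means in order but not necessarily contiguous, and that the action part counts as one unit of credit alongside each proposition. All function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Prop = Tuple[str, ...]
SimpleGoal = Tuple[List[str], List[Prop]]   # (a_1..a_k, p_1..p_l)


def is_subsequence(needle: Sequence[str], haystack: Sequence[str]) -> bool:
    """True if `needle` occurs in `haystack` in order (gaps allowed)."""
    it = iter(haystack)
    return all(any(x == y for y in it) for x in needle)


def partial_succ_simple(
    actions: Sequence[str], final_state: Set[Prop], goal: SimpleGoal
) -> float:
    """Fraction of credit units of one simple goal g_i that are satisfied."""
    goal_actions, goal_props = goal
    total = len(goal_props) + (1 if goal_actions else 0)
    if total == 0:
        return 1.0                           # an empty goal is trivially satisfied
    satisfied = sum(p in final_state for p in goal_props)
    if goal_actions and is_subsequence(goal_actions, actions):
        satisfied += 1                       # assumed: action part is one unit
    return satisfied / total


def partial_succ(
    actions: Sequence[str], final_state: Set[Prop], simple_goals: List[SimpleGoal]
) -> float:
    """Max over the simple goals, mirroring the final metric above."""
    return max(
        (partial_succ_simple(actions, final_state, g) for g in simple_goals),
        default=0.0,
    )


actions = ["GRASP", "WALK", "PLACE"]
final_state = {("inside", "a", "b")}
goal_1 = (["GRASP", "PLACE"], [("inside", "a", "b"), ("inside", "c", "b")])
goal_2 = ([], [("inside", "a", "b")])
score = partial_succ(actions, final_state, [goal_1, goal_2])
```

Taking the maximum means a sequence is scored against whichever decomposed goal it comes closest to satisfying.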

### Metrics

The evaluation metrics are divided into two categories:

1. **Trajectory Metrics**:
   - **Execution Success Rate**: The proportion of tasks whose action sequence $\bar{a}$ executes in the simulator without errors.
   - **Error Rates**:
     - **Parsing Errors**: Issues in interpreting the action sequence.
     - **Hallucination Errors**: Actions involving objects or states not present in the environment.
     - **Argument Errors**: Incorrect arguments provided for actions.
     - **Missing Steps**: Rate of necessary actions that were omitted.
     - **Additional Steps**: Rate of unnecessary actions included.
     - **Wrong Temporal Order**: Rate of actions executed in an incorrect sequence.
     - **Affordance Errors**: Rate of actions that cannot be performed due to object states.

2. **Goal Metrics**:
   - **Task Success Rate**: The proportion of tasks where the goal $g$ is fully satisfied after executing $\bar{a}$.
   - **Partial Goal Satisfaction Evaluation**:
     - **State Goal Satisfaction**: Success rate for satisfying state-based goals (e.g., object states).
     - **Relation Goal Satisfaction**: Success rate for satisfying relation-based goals (e.g., object relationships).
     - **Action Goal Satisfaction**: Success rate for achieving the specified action sequence.
     - **Total Goal Satisfaction**: Overall goal achievement rate, combining state, relation, and action goals.
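
To make the aggregation concrete, the sketch below computes a few of these rates from per-task records. It assumes every rate is a fraction of tasks, which is how the "Overall Results Across Tasks" example reads. The record fields (`execution_success`, `goal_satisfied`, `error`) and the `summarize` helper are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, TypedDict


class TaskRecord(TypedDict):
    execution_success: bool      # did the whole sequence execute?
    goal_satisfied: bool         # was g fully satisfied afterwards?
    error: Optional[str]         # e.g. "missing_step", or None if no error


def summarize(records: List[TaskRecord]) -> Dict[str, object]:
    """Aggregate per-task outcomes into rates over all tasks."""
    n = len(records)
    if n == 0:
        raise ValueError("need at least one task record")
    error_counts = Counter(r["error"] for r in records if r["error"] is not None)
    return {
        "execution_success_rate": sum(r["execution_success"] for r in records) / n,
        "task_success_rate": sum(r["goal_satisfied"] for r in records) / n,
        "error_rates": {name: count / n for name, count in error_counts.items()},
    }


records: List[TaskRecord] = [
    {"execution_success": True, "goal_satisfied": True, "error": None},
    {"execution_success": True, "goal_satisfied": True, "error": None},
    {"execution_success": True, "goal_satisfied": False, "error": None},
    {"execution_success": False, "goal_satisfied": False, "error": "missing_step"},
]
summary = summarize(records)
```

Note that in this toy data a sequence can execute without errors and still miss the goal, which is why execution success and task success are reported separately.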

### Output

The evaluation process produces several outputs:

- **Execution Information**:
  - Details for each action in $\bar{a}$, indicating whether it was executed successfully.
  - Error types encountered during execution (if any).
  - Step-by-step execution status.

- **Goal Satisfaction Results**:
  - Metrics indicating whether the goal was fully or partially satisfied.
  - Counts of total and satisfied predicates, including:
    - **Total Predicates**: Number of conditions evaluated.
    - **Satisfied Predicates**: Number of conditions that were satisfied.
    - Breakdown into edge and node predicates.

- **Overall Evaluation Metrics**:
  - **Goal Evaluation**:
    - **Task Success Rate**: Overall success rate for completing the task.
    - **State Goal Satisfaction**: Success rate for satisfying state-based goals.
    - **Relation Goal Satisfaction**: Success rate for satisfying relation-based goals.
    - **Action Goal Satisfaction**: Success rate for achieving the specified action sequence.
    - **Total Goal Satisfaction**: Combined success rate across all goal types.
  - **Trajectory Evaluation**:
    - **Execution Success Rate**: Overall success rate of the action sequence execution.
    - **Grammar Errors**: Rates of parsing, hallucination, and predicate argument number errors.
    - **Runtime Errors**: Rates of wrong order, missing step, affordance, and additional step errors.

### Example

**Task**: `assembling_gift_baskets_0_Beechwood_0_int_0_2021-10-26_12-46-37`

**Model**: `o1-preview`

**Transition Model ($\mathcal{M}$)**: `Behavior` simulator

**Initial States ($s_0$)**:

```python
[
    "['onfloor', 'basket_0', 'room_floor_living_room_0']",
    "['onfloor', 'basket_1', 'room_floor_living_room_0']",
    "['onfloor', 'basket_2', 'room_floor_living_room_0']",
    "['onfloor', 'basket_3', 'room_floor_living_room_0']",
    "['ontop', 'candle_0', 'breakfast_table_13']",
    "['ontop', 'candle_1', 'breakfast_table_13']",
    "['ontop', 'candle_2', 'breakfast_table_13']",
    "['ontop', 'candle_3', 'breakfast_table_13']",
    "['ontop', 'cookie_0', 'breakfast_table_13']",
    "['ontop', 'cookie_1', 'breakfast_table_13']",
    "['ontop', 'cookie_2', 'breakfast_table_13']",
    "['ontop', 'cookie_3', 'breakfast_table_13']",
    "['ontop', 'cheese_0', 'coffee_table_12']",
    "['ontop', 'cheese_1', 'coffee_table_12']",
    "['ontop', 'cheese_2', 'coffee_table_12']",
    "['ontop', 'cheese_3', 'coffee_table_12']",
    "['ontop', 'bow_0', 'coffee_table_12']",
    "['ontop', 'bow_1', 'coffee_table_12']",
    "['ontop', 'bow_2', 'coffee_table_12']",
    "['ontop', 'bow_3', 'coffee_table_12']",
    "['onfloor', 'agent.n.01_1', 'room_floor_living_room_0']"
]
```

**Goal ($g$)**:

```python
[
    "['forpairs', 'basket.n.01', '-', 'basket.n.01', 'candle.n.01', '-', 'candle.n.01', 'inside', 'candle.n.01', 'basket.n.01']",
    "['forpairs', 'basket.n.01', '-', 'basket.n.01', 'cheese.n.01', '-', 'cheese.n.01', 'inside', 'cheese.n.01', 'basket.n.01']",
    "['forpairs', 'basket.n.01', '-', 'basket.n.01', 'cookie.n.01', '-', 'cookie.n.01', 'inside', 'cookie.n.01', 'basket.n.01']",
    "['forpairs', 'basket.n.01', '-', 'basket.n.01', 'bow.n.08', '-', 'bow.n.08', 'inside', 'bow.n.08', 'basket.n.01']"
]
```

**Output**:

```python
[
    {"action": "LEFT_GRASP", "object": "candle_0"},
    {"action": "RIGHT_GRASP", "object": "cookie_0"},
    {"action": "LEFT_PLACE_INSIDE", "object": "basket_0"},
    {"action": "RIGHT_PLACE_INSIDE", "object": "basket_0"},
    {"action": "LEFT_GRASP", "object": "cheese_0"},
    {"action": "RIGHT_GRASP", "object": "bow_0"},
    {"action": "LEFT_PLACE_INSIDE", "object": "basket_0"},
    {"action": "RIGHT_PLACE_INSIDE", "object": "basket_0"},
    {"action": "LEFT_GRASP", "object": "candle_1"},
    {"action": "RIGHT_GRASP", "object": "cookie_1"},
    {"action": "LEFT_PLACE_INSIDE", "object": "basket_1"},
    {"action": "RIGHT_PLACE_INSIDE", "object": "basket_1"},
    {"action": "LEFT_GRASP", "object": "cheese_1"},
    {"action": "RIGHT_GRASP", "object": "bow_1"},
    {"action": "LEFT_PLACE_INSIDE", "object": "basket_1"},
    {"action": "RIGHT_PLACE_INSIDE", "object": "basket_1"},
    {"action": "LEFT_GRASP", "object": "candle_2"},
    {"action": "RIGHT_GRASP", "object": "cookie_2"},
    {"action": "LEFT_PLACE_INSIDE", "object": "basket_2"},
    {"action": "RIGHT_PLACE_INSIDE", "object": "basket_2"},
    {"action": "LEFT_GRASP", "object": "cheese_2"},
    {"action": "RIGHT_GRASP", "object": "bow_2"},
    {"action": "LEFT_PLACE_INSIDE", "object": "basket_2"},
    {"action": "RIGHT_PLACE_INSIDE", "object": "basket_2"},
    {"action": "LEFT_GRASP", "object": "candle_3"},
    {"action": "RIGHT_GRASP", "object": "cookie_3"},
    {"action": "LEFT_PLACE_INSIDE", "object": "basket_3"},
    {"action": "RIGHT_PLACE_INSIDE", "object": "basket_3"},
    {"action": "LEFT_GRASP", "object": "cheese_3"},
    {"action": "RIGHT_GRASP", "object": "bow_3"},
    {"action": "LEFT_PLACE_INSIDE", "object": "basket_3"},
    {"action": "RIGHT_PLACE_INSIDE", "object": "basket_3"}
]
```

**Results**:

```python
"llm_rst": {
    "error_type": {
        "parsing": None,            # No parsing errors occurred
        "hallucination": None,      # No hallucination errors occurred (no objects or states absent from the environment)
        "arguments": None,          # No argument errors occurred
        "execution_success": True   # Execution was successful
    },
    "goal_rst": {
        "all_goal_satisfied_ig": True,       # All goals were satisfied according to the internal graph (IG)
        "all_goal_satisfied_graph": True,    # All goals were satisfied according to the external goal graph
        "tot_predicates": 4.0,               # Total number of predicates (conditions) evaluated
        "tot_edge_predicates": 4.0,          # Total number of edge predicates (relationships between entities)
        "tot_node_predicates": 0.0,          # Total number of node predicates (properties of entities)
        "satisfied_predicates": 4.0,         # Number of predicates that were satisfied
        "satisfied_edge_predicates": 4.0,    # Number of satisfied edge predicates
        "satisfied_node_predicates": 0.0,    # Number of satisfied node predicates
        "pure_edge_predicates": 4,           # Number of pure edge predicates (without involving nodes)
        "pure_node_predicates": 0,           # Number of pure node predicates
        "mixed_predicates": 0,               # Number of mixed predicates (involving both edges and nodes)
        "satisfied_pure_edge_predicates": 4, # Number of satisfied pure edge predicates
        "satisfied_pure_node_predicates": 0, # Number of satisfied pure node predicates
        "satisfied_mixed_predicates": 0      # Number of satisfied mixed predicates
    },
    "execution_info": [
        {
            "action": "LEFT_GRASP",
            "object": "candle_0",
            "execution_success": True,
            "step": 0
        },
        {
            "action": "RIGHT_GRASP",
            "object": "cookie_0",
            "execution_success": True,
            "step": 1
        },
        {
            "action": "LEFT_PLACE_INSIDE",
            "object": "basket_0",
            "execution_success": True,
            "step": 2
        },
        {
            "action": "RIGHT_PLACE_INSIDE",
            "object": "basket_0",
            "execution_success": True,
            "step": 3
        },
        ...
        {
            "action": "RIGHT_PLACE_INSIDE",
            "object": "basket_3",
            "execution_success": True,
            "step": 31
        }
    ]
}
```
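
As a small worked example, the predicate counts in `goal_rst` above can be turned into satisfaction fractions. The helper `satisfaction_ratio` is a hypothetical name, and returning `None` when there are no predicates of a given kind is a design choice of this sketch, not documented behavior of the evaluator.

```python
from typing import Optional


def satisfaction_ratio(satisfied: float, total: float) -> Optional[float]:
    """Fraction of satisfied predicates, or None if there is nothing to satisfy."""
    return satisfied / total if total > 0 else None


# Values copied from the `goal_rst` block of the example above.
goal_rst = {
    "tot_predicates": 4.0,
    "satisfied_predicates": 4.0,
    "tot_edge_predicates": 4.0,
    "satisfied_edge_predicates": 4.0,
    "tot_node_predicates": 0.0,
    "satisfied_node_predicates": 0.0,
}

overall = satisfaction_ratio(goal_rst["satisfied_predicates"], goal_rst["tot_predicates"])
edge = satisfaction_ratio(goal_rst["satisfied_edge_predicates"], goal_rst["tot_edge_predicates"])
node = satisfaction_ratio(goal_rst["satisfied_node_predicates"], goal_rst["tot_node_predicates"])
```

Here all four edge predicates are satisfied, so the overall and edge fractions are both 4/4, while the node fraction is left undefined because the goal contains no node predicates.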

**Overall Results Across Tasks**

```python
{
    "goal_evaluation": {
        "task_success_rate": 0.81,    # Overall success rate for completing the task
        "state_goal": 0.895,          # Success rate for satisfying state-based goals
        "relation_goal": 0.844,       # Success rate for satisfying relation-based goals
        "action_goal": 0,             # Success rate for achieving the specified action sequence
        "total_goal": 0.8579          # Combined goal achievement rate
    },
    "trajectory_evaluation": {
        "execution_success_rate": 0.91,   # Overall success rate of action sequence execution
        "grammar_error": {
            "parsing": 0.0,               # No parsing errors
            "hallucination": 0.0,         # No hallucination errors
            "predicate_argument_number": 0.0  # No predicate argument number errors
        },
        "runtime_error": {
            "wrong_order": 0.0,           # No wrong order errors
            "missing_step": 0.06,         # 6% of sequences had missing steps
            "affordance": 0.02,           # 2% had affordance errors
            "additional_step": 0.03       # 3% had additional steps
        }
    }
}
```

## Usage

To generate prompts for the action sequencing module or to evaluate its results, use the following commands:

```bash
eai-eval --dataset virtualhome --eval-type action_sequencing --mode evaluate_results
eai-eval --dataset behavior --eval-type action_sequencing --mode evaluate_results
eai-eval --dataset virtualhome --eval-type action_sequencing --mode generate_prompts
eai-eval --dataset behavior --eval-type action_sequencing --mode generate_prompts
```
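
If it is more convenient to drive these commands from a script, a minimal Python wrapper might look like the sketch below. It uses only the flags shown above. The helper names, the dry-run default, and the choice to list prompt generation before result evaluation are assumptions of this example.

```python
import shlex
import subprocess
from itertools import product
from typing import List

DATASETS = ["virtualhome", "behavior"]
MODES = ["generate_prompts", "evaluate_results"]   # assumed order: prompts first


def build_command(dataset: str, mode: str) -> List[str]:
    """Build the argument list for one eai-eval invocation."""
    return [
        "eai-eval",
        "--dataset", dataset,
        "--eval-type", "action_sequencing",
        "--mode", mode,
    ]


def run_all(dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute every dataset/mode combination."""
    commands = [build_command(d, m) for m, d in product(MODES, DATASETS)]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))             # show the command without running it
        else:
            subprocess.run(cmd, check=True)    # raises CalledProcessError on failure
    return commands


commands = run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` would execute the four commands in sequence, which requires `eai-eval` to be installed and on the `PATH`.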