
Redundant Computation of dones in PPO Implementation #777

Open
songyuc opened this issue Dec 28, 2024 · 2 comments

Comments

songyuc (Contributor) commented Dec 28, 2024

There seems to be a redundant computation in the current implementation where dones is calculated twice:

  1. First calculation, in ManiSkillVectorEnv.step():

# mani_skill/vector/wrappers/gymnasium.py
dones = torch.logical_or(terminations, truncations)

  2. Second calculation, in ppo.py:

# examples/baselines/ppo/ppo.py
next_done = torch.logical_or(terminations, truncations).to(torch.float32)

Proposed Solution

We can eliminate this redundancy by:

  1. Modifying ManiSkillVectorEnv.step() to return the pre-computed dones:
def step(self, actions: Union[Array, Dict]) -> Tuple[Array, Array, Array, Array, Array, Dict]:
    obs, rew, terminations, truncations, infos = self._env.step(actions)
    ...
    dones = torch.logical_or(terminations, truncations)
    ...
    return obs, rew, terminations, truncations, dones, infos  # Add dones to return values
  1. Updating ppo.py to use the returned dones:
next_obs, reward, terminations, truncations, next_done, infos = envs.step(clip_action(action))
# Remove redundant computation of next_done

Benefits

  • Eliminates redundant computation
  • Makes the code more efficient and cleaner
  • Maintains full compatibility with existing functionality

I'm happy to help submit a PR implementing these changes if this proposal seems reasonable.

StoneT2000 (Member) commented

If there's a simple way to do this without breaking the API / deviating from the Gymnasium VectorEnv API, then sure. The currently proposed fix does break the API, however.
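
For reference, the Gymnasium vector API fixes step() to exactly five return values, so a sixth positional value would break downstream callers. A minimal sketch against the public Gymnasium API (CartPole-v1 is used purely for illustration):

import gymnasium as gym

# Any Gymnasium-compatible vector env returns exactly five values from step().
envs = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])
obs, infos = envs.reset()
obs, rewards, terminations, truncations, infos = envs.step(envs.action_space.sample())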

songyuc (Contributor, Author) commented Dec 31, 2024

Thanks for the feedback on API compatibility. Based on the VectorEnv implementation in gymnasium/vector/vector_env.py, I'd like to propose an alternative solution that keeps the API intact:

Solution: Add dones to the infos dictionary

  1. In ManiSkillVectorEnv.step():

def step(self, actions):
    obs, rew, terminations, truncations, infos = self._env.step(actions)
    dones = torch.logical_or(terminations, truncations)
    infos['dones'] = dones  # Add dones to the infos dictionary
    return obs, rew, terminations, truncations, infos

  2. In ppo.py, use the dones from infos:

next_obs, reward, terminations, truncations, infos = envs.step(clip_action(action))
next_done = infos['dones'].to(torch.float32)  # Get dones from infos
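
Because the five-tuple return shape is unchanged, existing call sites keep working untouched. As a hedged variant of the ppo.py change above, the call site could also guard against the key being absent, e.g. if the wrapper is bypassed (the .get fallback is illustrative, not part of the proposal):

import torch

# Assumes the surrounding ppo.py context (envs, clip_action, action).
next_obs, reward, terminations, truncations, infos = envs.step(clip_action(action))
# Recompute only if the wrapper did not populate infos['dones'].
next_done = infos.get('dones', torch.logical_or(terminations, truncations)).to(torch.float32)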

Benefits:

  • Eliminates redundant computation
  • Maintains Gymnasium VectorEnv API compatibility
  • Follows the pattern shown in vector_env.py for adding additional info
  • Clean and efficient solution

Would this approach work better? I would be happy to submit a PR if this looks good.
