
Redundant Computation of dones in PPO Implementation #777

Open
songyuc opened this issue Dec 28, 2024 · 2 comments

Comments

songyuc (Contributor) commented Dec 28, 2024

There seems to be a redundant computation in the current implementation where dones is calculated twice:

  1. First calculation, in ManiSkillVectorEnv.step():

# mani_skill/vector/wrappers/gymnasium.py
dones = torch.logical_or(terminations, truncations)

  2. Second calculation, in ppo.py:

# examples/baselines/ppo/ppo.py
next_done = torch.logical_or(terminations, truncations).to(torch.float32)

Proposed Solution

We can eliminate this redundancy by:

  1. Modifying ManiSkillVectorEnv.step() to return the pre-computed dones:
def step(self, actions: Union[Array, Dict]) -> Tuple[Array, Array, Array, Array, Array, Dict]:
    obs, rew, terminations, truncations, infos = self._env.step(actions)
    ...
    dones = torch.logical_or(terminations, truncations)
    ...
    return obs, rew, terminations, truncations, dones, infos  # Add dones to return values
  1. Updating ppo.py to use the returned dones:
next_obs, reward, terminations, truncations, next_done, infos = envs.step(clip_action(action))
# Remove redundant computation of next_done

Benefits

  • Eliminates redundant computation
  • Makes the code more efficient and cleaner
  • Maintains full compatibility with existing functionality

I'm happy to help submit a PR implementing these changes if this proposal seems reasonable.

StoneT2000 (Member) commented

If there's a simple way to do this without breaking the API / deviating from the Gymnasium VectorEnv API, then sure. The currently proposed fix does break the API, however.
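
For reference, the Gymnasium vector API fixes step() to exactly five return values, so a sixth positional value would break downstream callers. A minimal sketch against the public Gymnasium API (CartPole-v1 is used purely for illustration):

import gymnasium as gym

# Any Gymnasium-compatible vector env returns exactly five values from step().
envs = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])
obs, infos = envs.reset()
obs, rewards, terminations, truncations, infos = envs.step(envs.action_space.sample())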

songyuc (Contributor, Author) commented Dec 31, 2024

Thanks for the feedback on API compatibility. Based on the VectorEnv implementation in gymnasium/vector/vector_env.py, I'd like to propose an alternative solution that keeps the API intact:

Solution: Add dones to the infos dictionary

  1. In ManiSkillVectorEnv.step():

def step(self, actions):
    obs, rew, terminations, truncations, infos = self._env.step(actions)
    dones = torch.logical_or(terminations, truncations)
    infos['dones'] = dones  # Add dones to the infos dictionary
    return obs, rew, terminations, truncations, infos

  2. In ppo.py, use the dones from infos:

next_obs, reward, terminations, truncations, infos = envs.step(clip_action(action))
next_done = infos['dones'].to(torch.float32)  # Get dones from infos
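
Because the five-tuple return shape is unchanged, existing call sites keep working untouched. As a hedged variant of the ppo.py change above, the call site could also guard against the key being absent, e.g. if the wrapper is bypassed (the .get fallback is illustrative, not part of the proposal):

import torch

# Assumes the surrounding ppo.py context (envs, clip_action, action).
next_obs, reward, terminations, truncations, infos = envs.step(clip_action(action))
# Recompute only if the wrapper did not populate infos['dones'].
next_done = infos.get('dones', torch.logical_or(terminations, truncations)).to(torch.float32)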

Benefits:

  • Eliminates redundant computation
  • Maintains Gymnasium VectorEnv API compatibility
  • Follows the pattern shown in vector_env.py for adding additional info
  • Clean and efficient solution

Would this approach work better? I would be happy to submit a PR if this looks good.
