[Feature Request] Multi-Discrete action spaces for PPO #19

Closed
tobiasmerkt opened this issue Nov 9, 2023 · 3 comments · Fixed by #30
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@tobiasmerkt

🚀 Feature

Currently, PPO only supports (<class 'gymnasium.spaces.box.Box'>, <class 'gymnasium.spaces.discrete.Discrete'>) as action spaces. It would be awesome if it also supported MultiDiscrete action spaces.

Motivation

For many applications (e.g. Atari), one has to choose multiple discrete actions at each time step. Stable-Baselines3 already supports MultiDiscrete action spaces, and it would be great if SBX supported them as well.

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)
@tobiasmerkt tobiasmerkt added the enhancement New feature or request label Nov 9, 2023
@araffin
Owner

araffin commented Nov 9, 2023

Hello,
would you like to contribute this feature?

@araffin araffin added the help wanted Extra attention is needed label Nov 9, 2023
@tobiasmerkt
Author

tobiasmerkt commented Nov 13, 2023 via email

@loafthecomputerphile

If I am reading the PPO source code in SB3 and SBX correctly, the main thing we need is a multi-categorical distribution class. The tensorflow_probability API that is used only provides a Categorical distribution, which means we have to build our own or find a third-party implementation. I could not find the source on TensorFlow's GitHub, so I went into my local Python install to read it. Fortunately, from what I have seen, it looks similar to the SB3 version, meaning it should only need a few list comprehensions or for loops. I am not too sure how I would implement it, though, and I still need to learn how to use GitHub to commit, etc.
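To make the "few list comprehensions or for loops" idea concrete, here is a minimal, framework-free sketch of a multi-categorical distribution. It is not the SB3 or SBX code: the class name `MultiCategorical`, the helper names, and the `nvec` argument (number of choices per sub-action) are hypothetical, and it assumes the policy network outputs one flat logits vector of length `sum(nvec)`. A real SBX version would presumably do the same thing with JAX arrays.

```python
import math
import random


def split_logits(flat_logits, nvec):
    """Cut one flat logits vector into one chunk per discrete sub-action."""
    assert len(flat_logits) == sum(nvec), "expected sum(nvec) logits"
    chunks, start = [], 0
    for n in nvec:
        chunks.append(flat_logits[start:start + n])
        start += n
    return chunks


def log_softmax(logits):
    """Numerically stable log-softmax for a single categorical."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


class MultiCategorical:
    """Hypothetical helper: independent categoricals, one per sub-action."""

    def __init__(self, flat_logits, nvec):
        self.log_probs = [log_softmax(c) for c in split_logits(flat_logits, nvec)]

    def log_prob(self, actions):
        # Independence: the joint log-probability is the sum over sub-actions.
        return sum(lp[a] for lp, a in zip(self.log_probs, actions))

    def entropy(self):
        # Likewise, the joint entropy is the sum of per-dimension entropies.
        return sum(-sum(math.exp(l) * l for l in lp) for lp in self.log_probs)

    def sample(self, rng=random):
        # Draw each sub-action from its own categorical.
        return [
            rng.choices(range(len(lp)), weights=[math.exp(l) for l in lp])[0]
            for lp in self.log_probs
        ]

    def mode(self):
        # Most likely choice for each sub-action (deterministic action).
        return [max(range(len(lp)), key=lp.__getitem__) for lp in self.log_probs]


# Two sub-actions with 3 and 2 choices, all logits equal (uniform).
dist = MultiCategorical([0.0, 0.0, 0.0, 0.0, 0.0], nvec=[3, 2])
action = dist.sample()
```

With uniform logits and `nvec=[3, 2]`, every joint action has probability 1/6, so `dist.log_prob(action)` equals `-log(6)` and `dist.entropy()` equals `log(6)`.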

The distribution code can be found in the distributions folder inside the tensorflow_probability folder after you install SBX or tensorflow_probability. You will see the similarities between the Categorical distribution in SB3 (here) and the tensorflow_probability Categorical distribution code, so it may be easy to apply the necessary conversions for the multi-categorical distribution found at the same link. You may also have to edit the KL-divergence function.
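On the KL-divergence point: if an explicit KL term turns out to be needed, the multi-categorical case reduces to a sum of ordinary categorical KLs, because the sub-actions are treated as independent. The sketch below is plain Python under that assumption. The function names are hypothetical and it does not reproduce the tensorflow_probability or SB3 code.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax for a single categorical."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def categorical_kl(p_logits, q_logits):
    """KL(p || q) between two categoricals given as unnormalized logits."""
    log_p, log_q = log_softmax(p_logits), log_softmax(q_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def multi_categorical_kl(p_flat, q_flat, nvec):
    """Sum the per-sub-action KLs over flat logits split according to nvec."""
    assert len(p_flat) == len(q_flat) == sum(nvec)
    total, start = 0.0, 0
    for n in nvec:
        total += categorical_kl(p_flat[start:start + n], q_flat[start:start + n])
        start += n
    return total


# Old and new policy logits for two sub-actions with 3 and 2 choices.
old_logits = [0.2, -0.1, 0.4, 1.0, -1.0]
new_logits = [0.0, 0.0, 0.5, 0.8, -0.7]
kl = multi_categorical_kl(old_logits, new_logits, nvec=[3, 2])
```

The result is zero when both logit vectors describe the same distribution and positive otherwise, as for any KL divergence.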

I hope this comment is useful and helps others speed up the development of this feature, and of other features if possible.

3 participants