[BugFix] PPOs with composite distribution #2791
Conversation
CI status (hud.pytorch.org/pr/pytorch/rl/2791): as of commit eadb9e1 with merge base 27a8ecc, 1 new failure and 21 pending jobs.
I checked that all PPO tests are still passing; two comments:
- We should probably have a test to catch this (a minimal sketch of such a check is below),
- I'm planning to do a docstring pass on the entire PPO stack so things can be much clearer (some operations are a bit obscure at first read).
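For the first point, here is a minimal, self-contained sketch of the kind of property such a test could assert; it is not the actual torchrl test suite. `sum_td_features` below is a hypothetical stand-in for the private `_sum_td_features` helper, and the two-head composite log-weight and clip value are made up for illustration:

```python
import torch
from tensordict import TensorDict


def sum_td_features(td: TensorDict) -> torch.Tensor:
    # Hypothetical stand-in for torchrl's private _sum_td_features:
    # add every leaf tensor (one log-prob per action head) into one tensor.
    return sum(td.values(include_nested=True, leaves_only=True))


def test_clamp_happens_after_feature_sum():
    torch.manual_seed(0)
    # Composite distribution with two action heads, batch of 8.
    log_weight = TensorDict(
        {"head_a": 3 * torch.randn(8), "head_b": 3 * torch.randn(8)},
        batch_size=[8],
    )
    eps = 0.2
    # Buggy order: clip each head's log-weight, then sum over heads.
    buggy = sum_td_features(log_weight.apply(lambda x: x.clamp(-eps, eps)))
    # Fixed order: sum over heads first, then clip the joint log-weight.
    fixed = sum_td_features(log_weight).clamp(-eps, eps)
    # The two orderings must not coincide in general.
    assert not torch.allclose(buggy, fixed)
```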
if is_tensor_collection(log_weight):
    log_weight = _sum_td_features(log_weight)
log_weight = log_weight.view(adv_shape).unsqueeze(-1)
That is the main change for this method, which is also now consistent with type hints.
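For readers skimming the diff, a rough sketch of what these lines do with a composite log-weight; the shapes and key names here are hypothetical, and the plain sum stands in for `_sum_td_features`:

```python
import torch
from tensordict import TensorDict

# Hypothetical setup: batch of 32 transitions, two action heads.
adv_shape = (32,)
log_weight = TensorDict(
    {"head_a": torch.randn(32), "head_b": torch.randn(32)},
    batch_size=[32],
)

# Collapse the composite log-weight into one tensor per transition...
lw = sum(log_weight.values(include_nested=True, leaves_only=True))
# ...then match the advantage's batch shape and add a trailing singleton
# dimension so it broadcasts against an advantage of shape (*adv_shape, 1).
lw = lw.view(adv_shape).unsqueeze(-1)
assert lw.shape == (32, 1)
```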
@@ -987,8 +982,6 @@ def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
         # to different, unrelated trajectories, which is not standard. Still, it can give an idea of the weights'
         # dispersion.
         lw = log_weight.squeeze()
-        if not isinstance(lw, torch.Tensor):
-            lw = _sum_td_features(lw)
         ess = (2 * lw.logsumexp(0) - (2 * lw).logsumexp(0)).exp()
         batch = log_weight.shape[0]
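For context, the `ess` line computes the effective sample size of the importance weights in log-space, ESS = (sum_i w_i)^2 / sum_i w_i^2. A standalone numerical check of that identity, using made-up log-weights rather than torchrl code:

```python
import torch

torch.manual_seed(0)
lw = torch.randn(256)  # per-sample log importance weights (illustrative)
w = lw.exp()

# Direct definition: ESS = (sum_i w_i)^2 / sum_i w_i^2
ess_direct = w.sum().pow(2) / w.pow(2).sum()
# Log-space form used in the diff: exp(2*logsumexp(lw) - logsumexp(2*lw))
ess_logspace = (2 * lw.logsumexp(0) - (2 * lw).logsumexp(0)).exp()

assert torch.allclose(ess_direct, ess_logspace, rtol=1e-4)
```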
The main error is two lines below: clamp was applied to the TensorDict log_weight before it was summed over the feature dimension.
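To make the ordering concrete with made-up numbers (two per-head log-weights and a hypothetical clip value of 0.2): clamping each entry of the TensorDict and then summing is not the same as summing the heads first and clamping the result.

```python
import torch

eps = 0.2
a, b = torch.tensor(0.5), torch.tensor(-0.4)   # per-head log-weights

clamp_then_sum = a.clamp(-eps, eps) + b.clamp(-eps, eps)   # 0.2 + (-0.2) = 0.0
sum_then_clamp = (a + b).clamp(-eps, eps)                  # clamp(0.1)   = 0.1
```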
LGTM, thanks!
Co-authored-by: Louis Faury <[email protected]> (cherry picked from commit edfa25d)
Description
I believe there is a bug in the PPO implementations when both prev_log_prob and log_prob are TensorDicts.
Motivation and Context
In the setting where both prev_log_prob and log_prob are TensorDicts, we were clamping prev_log_prob - log_prob directly, instead of their sum over features.
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!