[BUG] Multiple Issues in Samplers and Buffers Affecting Stability and Expected Behavior #2205
Comments
Hi @wertyuilife2, I'm currently working on fixes. I think I can integrate 1-3 in #2202. For 4, this seems to be an additional feature, not a bug (as you say, TorchRL doesn't currently encourage modifying a storage in-place). I'm not saying we don't want to support that, but we'd have to look into it carefully! For 5, I'd be open to modifying the logic, but if the initial priority is 0, the initial sample will fail. I agree with 6 too: the priority should be updated just once. For 7, we can group that in the same stack as #2185, which also points at some thread-safety issues. For these to be patched we'd need a good minimally reproducible example!
Thanks for the quick response! I will test 1-3 for sure. For 5, I completely understand your point. However, my key issue is that in the early stages of training, priorities are generally high, but as the network updates, the priorities gradually decrease. For 7, I will try to test that and will provide feedback if I find anything.
Oh yeah, I can see that.
Can I ask you to open one issue for each of these? It will be easier for me to track them!
Of course, I will do it later!
---

I have been using `torchrl` for my work recently and have encountered several bugs or unexpected behaviors. I noticed that @vmoens has been addressing some fixes, but to ensure nothing is overlooked, I am listing the issues I encountered here. These issues have not been resolved in `torchrl-nightly==2024.6.3`.

Given that this is a collection of issues, this post might be a bit lengthy. Please let me know if you would prefer me to split it into multiple issues!
**Issue 1.** In `PrioritizedSliceSampler.sample()`, `preceding_stop_idx` needs to be moved to the CPU before executing `self._sum_tree[preceding_stop_idx] = 0.0`. If `preceding_stop_idx` is on the GPU, the program results in a segmentation fault. It can be reproduced with the example code under Issue 3.
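A minimal sketch of the kind of fix this implies (hypothetical, not a tested patch):

```python
# The segment trees are indexed on CPU; move the index there first to
# avoid the segfault (hypothetical fix, placed just before the assignment).
preceding_stop_idx = preceding_stop_idx.cpu()
self._sum_tree[preceding_stop_idx] = 0.0
```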
**Issue 2.** In commit c2e1c05, at samplers.py lines 1767 and 1213, `index` and `stop_idx` might not be on the same device, with `stop_idx` potentially being on the GPU. This line should be modified as follows:
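The exact snippet was lost from this report; the modification I have in mind is a device alignment along these lines (variable names as used above; the actual code in samplers.py may differ):

```python
# Hypothetical fix: bring stop_idx onto the same device as index before
# they are used together, rather than assuming both live on CPU.
stop_idx = stop_idx.to(index.device)
```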
**Issue 3.** As per the comments, the `preceding_stop_idx` variable in `PrioritizedSliceSampler.sample()` attempts to "build a list of indexes that we don't want to sample: all the steps at a seq_length distance from the end of the trajectory, with the end of the trajectory (stop_idx) included." However, it does not do this correctly. The following code demonstrates the issue:
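The original demo code was lost from this report; a rough reconstruction of the failing check looks like this (the `"episode"` key and the exact sampler arguments are illustrative):

```python
import torch
from tensordict import TensorDict
from torchrl.data.replay_buffers import LazyTensorStorage, TensorDictReplayBuffer
from torchrl.data.replay_buffers.samplers import PrioritizedSliceSampler

# Two trajectories of length 10 each, identified by an "episode" id.
data = TensorDict(
    {
        "episode": torch.cat([torch.zeros(10), torch.ones(10)]).long(),
        "obs": torch.arange(20),
    },
    batch_size=[20],
)
sampler = PrioritizedSliceSampler(
    max_capacity=20, alpha=1.0, beta=1.0, slice_len=5, traj_key="episode"
)
buffer = TensorDictReplayBuffer(
    storage=LazyTensorStorage(20), sampler=sampler, batch_size=10
)
buffer.extend(data)

for _ in range(100):
    sample = buffer.sample()
    # Each slice of length 5 should stay inside a single episode; if the
    # episode id changes within a slice, the sampler crossed a trajectory
    # boundary (which SliceSampler correctly avoids).
    episodes = sample["episode"].reshape(-1, 5)
    assert (episodes == episodes[:, :1]).all(), "slice crosses trajectories!"
```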
This causes `PrioritizedSliceSampler.sample()` to sample across trajectories, which is not the expected behavior, unlike `SliceSampler`, which handles this correctly.
**Issue 4.** My work requires modifying the contents of the buffer. Specifically, I need to sample an item, modify it, and put it back in the buffer. However, torchrl currently does not seem to encourage modifying buffer contents. When calling `buffer._storage.set(index, data)` to put my modified data back into the buffer, it implicitly changes `_storage._len`, which can cause the sampler to sample empty samples. The following code demonstrates this issue:
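The original demo code was lost here as well; a sketch of the behavior as described (this relies on private attributes, so the exact mechanics may vary by version):

```python
import torch
from tensordict import TensorDict
from torchrl.data.replay_buffers import LazyTensorStorage, TensorDictReplayBuffer

buffer = TensorDictReplayBuffer(storage=LazyTensorStorage(100), batch_size=4)
buffer.extend(TensorDict({"obs": torch.arange(10)}, batch_size=[10]))
print(len(buffer))  # 10 valid items

# Sample an item, modify it, and write it back through the storage.
item = buffer._storage.get(3).clone()
item["obs"] += 1
# Writing back at an index at or beyond the current length silently grows
# _storage._len, so slots that were never written become sampleable.
buffer._storage.set(50, item)
print(len(buffer))  # now reports 51, although slots 10..49 are empty
```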
I resolved this by directly modifying `buffer._storage._storage` while holding the `buffer._replay_lock`. It took me two days to discover that `TensorStorage.set()` implicitly changes `_len`, and I believe this method should behave more intuitively. I am not sure if other `Storage` classes have similar issues, but `TensorStorage` definitely does.
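Concretely, the workaround looks something like this (a sketch using private attributes; `index` and `modified_data` are placeholders):

```python
# Bypass TensorStorage.set() so _len is left untouched, and hold the
# replay lock so concurrent sampling threads see a consistent buffer.
with buffer._replay_lock:
    buffer._storage._storage[index] = modified_data
```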
**Issue 5.**

[Current Implementation and Issues]

The current implementation maintains `_max_priority`, which represents the maximum priority of all samples historically, not just the current buffer. Early in RL training, outliers can cause `_max_priority` to remain high, making it unrepresentative. Additionally, `_max_priority` is initialized to 1, while most RL algorithms use the Bellman error as the priority, which is often much smaller (close to 0). Consequently, `_max_priority` may never be updated. New samples are thus given a priority of 1, which essentially means their PER weight is close to 0. This means they are sampled immediately but contribute little to the weighted loss, reducing sample efficiency.
[Proposed Solution]

Maintain a `_neg_min_tree = MinSegmentTree()` to track the maximum priority in the current buffer. With this, and adding `self._upper_priority = 1`, parts of the `PrioritizedSampler` methods can be updated as follows:
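The code for this proposal was lost from this report; the intent can be sketched as follows (illustrative method bodies, not a drop-in patch; `_neg_min_tree` stores negated priorities so a min-query over it yields the maximum priority currently in the buffer):

```python
def default_priority(self, storage) -> float:
    # Maximum priority of the samples currently in the buffer, rather than
    # the historical maximum tracked by _max_priority.
    max_in_buffer = -self._neg_min_tree.query(0, len(storage))
    if max_in_buffer <= 0:
        # Empty buffer (or all priorities zero): fall back to the cap so
        # the first samples can still be drawn.
        return self._upper_priority
    return max_in_buffer

def update_priority(self, index, priority) -> None:
    # Cap priorities and mirror them (negated) into the min-tree so that
    # overwritten entries stop influencing default_priority.
    priority = min(float(priority), self._upper_priority)
    self._neg_min_tree[index] = -priority
    # ... the existing _sum_tree / _min_tree updates stay unchanged ...
```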
This change implies that the `default_priority` function will need to take `storage` as an additional parameter, eventually affecting several methods like `Sampler(ABC).extend()`, `Sampler(ABC).add()`, and `Sampler(ABC).mark_update()`, but I believe this is reasonable, akin to how `Sampler.sample()` already takes `storage` as a parameter.
**Issue 6.** When `ReplayBuffer._add()` is called, the following sequence occurs:

(1) `_writer.add() -> _storage.__setitem__() -> buffer.mark_update() -> _sampler.mark_update() -> _sampler.update_priority()`

(2) `_sampler.add() -> _sampler._add_or_extend()`
Both `_sampler._add_or_extend()` and `_sampler.update_priority()` update the priority, with `update_priority()` even applying additional transformations (e.g., `torch.pow(priority + self._eps, self._alpha)`). This behavior is also present in `ReplayBuffer._extend()`.
This behavior is not reasonable. I believe the `mark_update` mechanism is somewhat redundant. We do not need to ensure that `_attached_entities` are updated when changing `storage` content; any additional updates required after directly modifying `_storage` should be the responsibility of the user. `mark_update` can lead to redundant calls and even cause conflicts.
**Issue 7.** Although I haven't confirmed or tested this, there may be a threading risk. The `ReplayBuffer` initiates a separate thread for prefetching, which will call `PrioritizedSliceSampler.sample()`. In `PrioritizedSliceSampler.sample()`, we have:
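Paraphrasing the relevant lines (not verbatim from samplers.py):

```python
# 1) mask out trajectory ends so they cannot be sampled as slice starts
self._sum_tree[preceding_stop_idx] = 0.0
# <-- a concurrent update_priority() call landing here can re-assign
#     nonzero priorities to the masked entries...
# 2) ...before the actual sampling reads the tree
starts, info = PrioritizedSampler.sample(self, storage, batch_size)
```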
If the main thread calls `update_priority()` between these two lines, it might update `_sum_tree`, causing the effect of `self._sum_tree[preceding_stop_idx] = 0.0` to be lost, so the sampler may pick the end of a trajectory as a slice start.

I'm uncertain of the role of `buffer._futures_lock`, but it doesn't seem to prevent this conflict.

Overall, I hope this comprehensive overview helps in addressing these issues. Please let me know if you need further details or if I should break this down into separate issues!