[Feature Request] A way to specify the input of environment resets through the DataCollector #1906
Comments
That makes total sense yeah. I can imagine several cases:
Maybe a callable would be a good idea? collector = DataCollector(..., env_reset_func: Callable[[], TensorDictBase]) wdyt?
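(For concreteness, a rough sketch of how such a callable could be hooked up. env_reset_func is the hypothetical parameter discussed here and does not exist in torchrl; make_env, policy and dataset_iterator are placeholders.)

```python
from tensordict import TensorDict
from torchrl.collectors import SyncDataCollector

def make_reset_td():
    # Pull the next curated sample and pack it into a TensorDict that the
    # environment's reset can consume; "system_state" is a made-up key.
    sample = next(dataset_iterator)  # placeholder: iterator over the dataset
    return TensorDict({"system_state": sample}, batch_size=[])

# Hypothetical keyword argument proposed in this thread:
collector = SyncDataCollector(
    create_env_fn=make_env,   # placeholder env factory
    policy=policy,            # placeholder policy
    frames_per_batch=100,
    total_frames=10_000,
    env_reset_func=make_reset_td,
)
```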
A callable sounds good to me and is probably much more versatile than just kwargs, though at least to me the name env_reset_func doesn't quite fit. Other than that, it's exactly what I'd like to have :)
i'm pretty bad at naming things, you might have noticed :p
maybe
That actually sounds like TensorDictPrimer. Does that solve your problem or do you need something a bit more fancy?
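(For reference, since it came up: TensorDictPrimer is a transform that pre-populates extra entries in the reset tensordict with default or randomly drawn values from a spec, rather than with user-provided data. A rough sketch of its usage, assuming I have the spec class name right for current releases; the "system_state" key and shape are placeholders.)

```python
from torchrl.envs import GymEnv, TransformedEnv
from torchrl.envs.transforms import TensorDictPrimer
from torchrl.data import UnboundedContinuousTensorSpec

base_env = GymEnv("Pendulum-v1")

# Adds a "system_state" entry (here filled with the default value, zeros of
# shape (7,)) to every reset tensordict.
env = TransformedEnv(
    base_env,
    TensorDictPrimer(system_state=UnboundedContinuousTensorSpec(shape=(7,))),
)

td = env.reset()
print(td["system_state"])  # primer-provided entry, not a dataset sample
```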
I feel like TensorDictPrimer doesn't quite cover my case. In my use case, I need to load an individual (empirical) sample from a recorded dataset (in an unfortunately rather unwieldy format) into the environment. A callback, as discussed earlier, sounds like a much simpler solution for this.
Got it! I can make a PR with that if you think that could work!
I think this would work. My data format is a bit odd in that it also contains strings and all kinds of other stuff that is not supported within a TensorDict.
I love odd :)
Some more details about what I'm trying to achieve and what the data looks like: the dataset contains dictionaries with all sorts of different types of data which together represent a complex system and its corresponding state.

The current limitations
To be able to have all this additional data available in the environment and use collectors during training/inference, all of this data needs to be given to the environment so it can manually fetch everything corresponding to a system state at reset. In principle one could probably do with less information in the environment, but it makes the entire workflow much simpler. For instance, during validation, one could simply feed the entire system and its initial state into the environment, perform an action as per the trained policy, and then take out the resulting system + state in one go.
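(Side note on the mixed-type samples: recent tensordict versions can, if I'm not mistaken, carry arbitrary Python objects via NonTensorData, so a sample like the one described could in principle be packed into a single TensorDict. The keys and values below are made up for illustration.)

```python
import torch
from tensordict import TensorDict, NonTensorData

# A made-up dataset entry mixing numeric state with metadata.
raw_sample = {
    "state": torch.randn(7),
    "system_id": "plant_042",
    "config": {"solver": "implicit", "dt": 0.01},
}

reset_td = TensorDict(
    {
        "system_state": raw_sample["state"],
        # Non-tensor payloads are wrapped so they can travel inside the
        # tensordict alongside the tensors.
        "system_id": NonTensorData(raw_sample["system_id"]),
        "config": NonTensorData(raw_sample["config"]),
    },
    batch_size=[],
)
```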
Ok quite clear thanks! Let me sleep on this a bit and come back to you!
Motivation
I'm working on an RL task in a (continuous) domain; however, the initial state the environment assumes on a reset comes from a curated dataset, since we have prior knowledge of how the state of the environment typically looks in practice.
The environment should ideally not contain the entire dataset but only work with a single example (since that's all it "needs" to know to simulate the agent's actions and their effect). However, I would also like to make use of DataCollectors for training and validation.
Solution
Add an optional parameter reset_env_kwargs (or similar) to DataCollectors that allows specifying the arguments used when the collector calls the environment's reset function. This way one can specify the input of the reset (outside of the environment code) and hence does not have to move the entire dataset into the environment to be able to reset to specific environment states.
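Purely as an illustration of the proposal (reset_env_kwargs does not exist in torchrl; make_env, policy and dataset are placeholders), the usage I have in mind would look roughly like:

```python
from torchrl.collectors import SyncDataCollector

# Hypothetical keyword argument: `reset_env_kwargs` is the parameter
# proposed above, not something torchrl provides today.
collector = SyncDataCollector(
    create_env_fn=make_env,                   # placeholder env factory
    policy=policy,                            # placeholder policy
    frames_per_batch=100,
    total_frames=10_000,
    reset_env_kwargs={"sample": dataset[0]},  # forwarded to env.reset(...)
)
```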
Alternatives
It is possible that I missed another (easier) way of doing this, e.g. by registering some sort of hook. In that case, I'd appreciate a pointer or a small example of how this could be implemented.
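One partial workaround that should already work today, if I'm not mistaken, is to bypass the collector's automatic reset and drive rollouts manually: EnvBase.reset() accepts a tensordict whose contents are forwarded to the environment's _reset(), and EnvBase.rollout() can start from such a tensordict when auto_reset=False. A rough sketch (env, policy and dataset are placeholders, and the "system_state" key is whatever the custom _reset() expects):

```python
from tensordict import TensorDict

trajectories = []
for sample in dataset:  # placeholder: iterable of curated samples
    # Hand the curated initial state to the environment at reset time;
    # the env's _reset() can read "system_state" from the input tensordict.
    start_td = env.reset(TensorDict({"system_state": sample}, batch_size=[]))
    traj = env.rollout(
        max_steps=200,
        policy=policy,        # placeholder policy
        tensordict=start_td,
        auto_reset=False,     # don't let rollout reset again
    )
    trajectories.append(traj)
```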
Additional context
In case you find this addition useful, I'd be happy to contribute.