Official code release for Learning to Look: Seeking Information for Decision Making via Policy Factorization

L2L

Shivin Dass¹, Jiaheng Hu², Ben Abbatematteo¹, Peter Stone¹,², Roberto Martín-Martín¹

¹The University of Texas at Austin, ²Sony AI

Note: The task names in the paper and the codebase differ slightly. There are three simulation tasks -- kitchen, walled, and two_arm -- which correspond to the cooking, walls, and assembly tasks in the paper, respectively.

Setup

Installation

conda create --name l2l python==3.10
conda activate l2l
git clone --recursive https://github.com/ShivinDass/l2l.git
cd l2l && git submodule update --init --recursive
pip install -r requirements.txt
pip install -e stable-baselines3/.
pip install -e .

Install robosuite:

git clone https://github.com/ARISE-Initiative/robosuite.git
cd robosuite 
git checkout 48c1b8a6c077d04399a00db05694d7f9f876ffc9
pip install -e .

The two_arm task uses some assets from mimicgen, so optionally set that up as well:

git clone https://github.com/NVlabs/mimicgen.git
cd mimicgen
pip install -e .
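
After the installations above, a quick sanity check can confirm that the core packages resolve in the active environment. This is a minimal sketch, not part of the codebase; the package names listed are the ones the steps above should have installed (adjust them if your setup differs):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that cannot be found by the importer."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Packages expected after the installation steps above (adjust as needed).
missing = missing_packages(["stable_baselines3", "robosuite", "l2l"])
if missing:
    print(f"Missing packages: {missing}")
else:
    print("All core packages found.")
```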

Usage

Our proposed solution, DISaM, works in two phases:

  1. Phase 1: Train an Information-Receiving (IR) policy using imitation learning. (pretrained ckpts)
  2. Phase 2: Freeze the pretrained IR policy and train the Information-Seeking (IS) policy using RL. (pretrained ckpts)
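
At deployment the two policies interact roughly as in the sketch below: the IS policy decides where to look, and the frozen IR policy acts on the observation that view provides. All function names here are illustrative stand-ins, not the codebase's actual API:

```python
import random

def is_policy(state):
    """Information-Seeking policy (illustrative): chooses a camera/gaze action."""
    return random.choice(["look_left", "look_right", "look_at_goal"])

def ir_policy(observation):
    """Information-Receiving policy (illustrative): maps observations to task actions."""
    return {"arm_action": 0.0, "observed": observation}

def env_observe(gaze_action):
    """Stand-in for the environment rendering the view the IS policy selected."""
    return f"view_from_{gaze_action}"

# One step of the factorized loop: IS decides where to look,
# then the IR policy acts on the information that view provides.
state = {"t": 0}
gaze = is_policy(state)
obs = env_observe(gaze)
action = ir_policy(obs)
print(action)
```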

Below we provide instructions for the walled task; they can be modified accordingly for the kitchen and two_arm tasks.

Phase 1: Imitation Learning (IR)

  1. Download the data (e.g., skill_walled_oh_n200.h5) and change the data path in the IR config file.
  2. Run the imitation learning script:

     python l2l/scripts/train_il.py \
       --config l2l/config/il/bc_ce_walled_multi_stage_config.py \
       --exp_name il_walled

Phase 2: Reinforcement Learning (IS)

  1. Change the path in the IS config file to point to the trained ckpt from Phase 1, or use the provided pretrained ckpts (e.g., walled/weights/weights_ep15.pth).
  2. Run the dual optimization script:

     python l2l/scripts/dual_optimization.py \
       --config l2l/config/dual/robosuite/skill_walled_multi_stage/walled_multi_stage_action_dual_config.py \
       --exp_name disam_walled

Evaluation

python l2l/scripts/final_eval_dual.py --env <task-name> --info_step_break 3 --ckpt path/to/IS_ckpt.zip --n_rollouts 50

where <task-name> is one of kitchen, walled, or two_arm. Set --ckpt to the trained RL ckpt path from Phase 2, or try one of the pretrained ckpts (e.g., disam_walled/epoch_25/weights/rl_model_537120_steps).
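
To evaluate all three tasks in one go, the command above can be wrapped in a small driver. This is a sketch; the checkpoint paths are placeholders you must fill in with your own trained or pretrained ckpts:

```python
import subprocess

# Map each task to its trained IS checkpoint (placeholder paths).
checkpoints = {
    "kitchen": "path/to/kitchen_ckpt.zip",
    "walled": "path/to/walled_ckpt.zip",
    "two_arm": "path/to/two_arm_ckpt.zip",
}

def eval_command(task, ckpt, n_rollouts=50, info_step_break=3):
    """Build the evaluation command for one task, mirroring the CLI above."""
    return [
        "python", "l2l/scripts/final_eval_dual.py",
        "--env", task,
        "--info_step_break", str(info_step_break),
        "--ckpt", ckpt,
        "--n_rollouts", str(n_rollouts),
    ]

for task, ckpt in checkpoints.items():
    cmd = eval_command(task, ckpt)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually launch
```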
