Official code release for Learning to Look: Seeking Information for Decision Making via Policy Factorization

L2L

Shivin Dass¹, Jiaheng Hu², Ben Abbatematteo¹, Peter Stone¹,², Roberto Martín-Martín¹

¹The University of Texas at Austin, ²Sony AI

Note: The task names in the paper and the codebase differ slightly. There are three simulation tasks -- kitchen, walled, and two_arm -- which correspond to the cooking, walls, and assembly tasks in the paper, respectively.

Setup

Installation

conda create --name l2l python==3.10
conda activate l2l
git clone --recursive https://github.com/ShivinDass/l2l.git
cd l2l && git submodule update --init --recursive
pip install -r requirements.txt
pip install -e stable-baselines3/.
pip install -e .

Install robosuite:

git clone https://github.com/ARISE-Initiative/robosuite.git
cd robosuite 
git checkout 48c1b8a6c077d04399a00db05694d7f9f876ffc9
pip install -e .

The two_arm task uses some assets from mimicgen, so optionally set that up as well:

git clone https://github.com/NVlabs/mimicgen.git
cd mimicgen
pip install -e .
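
After the installations above, a quick sanity check can confirm that the core packages resolve in the active environment. This is a minimal sketch, not part of the codebase; the package names listed are the ones the steps above should have installed (adjust them if your setup differs):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that cannot be found by the importer."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Packages expected after the installation steps above (adjust as needed).
missing = missing_packages(["stable_baselines3", "robosuite", "l2l"])
if missing:
    print(f"Missing packages: {missing}")
else:
    print("All core packages found.")
```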

Usage

Our proposed solution, DISaM, works in two phases:

  1. Phase 1: Train an Information-Receiving (IR) policy using imitation learning. (pretrained ckpts)
  2. Phase 2: Freeze the pretrained IR policy and train the Information-Seeking (IS) policy using RL. (pretrained ckpts)
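
At deployment the two policies interact roughly as in the sketch below: the IS policy decides where to look, and the frozen IR policy acts on the observation that view provides. All function names here are illustrative stand-ins, not the codebase's actual API:

```python
import random

def is_policy(state):
    """Information-Seeking policy (illustrative): chooses a camera/gaze action."""
    return random.choice(["look_left", "look_right", "look_at_goal"])

def ir_policy(observation):
    """Information-Receiving policy (illustrative): maps observations to task actions."""
    return {"arm_action": 0.0, "observed": observation}

def env_observe(gaze_action):
    """Stand-in for the environment rendering the view the IS policy selected."""
    return f"view_from_{gaze_action}"

# One step of the factorized loop: IS decides where to look,
# then the IR policy acts on the information that view provides.
state = {"t": 0}
gaze = is_policy(state)
obs = env_observe(gaze)
action = ir_policy(obs)
print(action)
```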

Below we provide instructions for the walled task; they can be modified accordingly for the kitchen and two_arm tasks.

Phase 1: Imitation Learning (IR)

  1. Download the data (e.g., skill_walled_oh_n200.h5) and change the data path in the IR config file.
  2. Run the imitation learning script:

     python l2l/scripts/train_il.py \
       --config l2l/config/il/bc_ce_walled_multi_stage_config.py \
       --exp_name il_walled

Phase 2: Reinforcement Learning (IS)

  1. Change the path in the IS config file to point to the trained ckpt from Phase 1, or use the provided pretrained ckpts (e.g., walled/weights/weights_ep15.pth).
  2. Run the dual optimization script:

     python l2l/scripts/dual_optimization.py \
       --config l2l/config/dual/robosuite/skill_walled_multi_stage/walled_multi_stage_action_dual_config.py \
       --exp_name disam_walled

Evaluation

python l2l/scripts/final_eval_dual.py --env <task-name> --info_step_break 3 --ckpt path/to/IS_ckpt.zip --n_rollouts 50

where <task-name> is one of kitchen, walled, or two_arm. Set --ckpt to the trained RL ckpt path from Phase 2, or try one of the pretrained ckpts (e.g., disam_walled/epoch_25/weights/rl_model_537120_steps).
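
To evaluate all three tasks in one go, the command above can be wrapped in a small driver. This is a sketch; the checkpoint paths are placeholders you must fill in with your own trained or pretrained ckpts:

```python
import subprocess

# Map each task to its trained IS checkpoint (placeholder paths).
checkpoints = {
    "kitchen": "path/to/kitchen_ckpt.zip",
    "walled": "path/to/walled_ckpt.zip",
    "two_arm": "path/to/two_arm_ckpt.zip",
}

def eval_command(task, ckpt, n_rollouts=50, info_step_break=3):
    """Build the evaluation command for one task, mirroring the CLI above."""
    return [
        "python", "l2l/scripts/final_eval_dual.py",
        "--env", task,
        "--info_step_break", str(info_step_break),
        "--ckpt", ckpt,
        "--n_rollouts", str(n_rollouts),
    ]

for task, ckpt in checkpoints.items():
    cmd = eval_command(task, ckpt)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually launch
```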
