Projects

Guaranteed Trust Region Optimization via Two-Phase KL Penalization

We show that KL penalization alone is nearly enough to enforce a trust region, even without clipping, and that a small number of additional "fixup" gradient steps is sufficient to enforce a trust region in practice on every update. A rough sketch of the two-phase update appears below the figure.
[ArXiv URL]

Reinforcement Learning
[Learn More]

A diagram showing a trust region being enforced using a dynamic beta, with fixup gradient steps shown in green. See Algorithm 1 for details.
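
The sketch below illustrates the two-phase idea in PyTorch-style code: a penalized policy-gradient step with a dynamically adjusted beta, followed by "fixup" gradient steps on the KL term alone until the mean KL is back inside the trust region. This is a rough illustration rather than the paper's Algorithm 1; `policy`, `old_policy`, `kl_target`, and the beta-adjustment factor are assumed interfaces and constants.

```python
import torch

def two_phase_update(policy, old_policy, optimizer, obs, actions, advantages,
                     beta, kl_target, max_fixup_steps=50):
    """One KL-penalized policy update followed by 'fixup' steps on the KL term
    alone until the mean KL is back inside the trust region."""
    with torch.no_grad():
        old_dist = old_policy(obs)                 # frozen behavior policy
        old_log_prob = old_dist.log_prob(actions)

    def mean_kl():
        return torch.distributions.kl_divergence(old_dist, policy(obs)).mean()

    # Phase 1: penalized surrogate objective (importance-weighted advantages + KL penalty).
    new_log_prob = policy(obs).log_prob(actions)
    ratio = torch.exp(new_log_prob - old_log_prob)
    loss = -(ratio * advantages).mean() + beta * mean_kl()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Adjust beta: strengthen the penalty after overshooting the target KL, relax it otherwise.
    beta = beta * 1.5 if mean_kl().item() > kl_target else beta / 1.5

    # Phase 2: extra "fixup" gradient steps on the KL penalty alone, so the
    # trust region holds on every update rather than only on average.
    for _ in range(max_fixup_steps):
        if mean_kl().item() <= kl_target:
            break
        kl_loss = beta * mean_kl()
        optimizer.zero_grad()
        kl_loss.backward()
        optimizer.step()
    return beta
```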

Conditionally Combining Robot Skills using Large Language Models

We introduce Plan Conditioned Behavioral Cloning (PCBC), a method that allows fine-tuning the behavior of high-level plans using end-to-end demonstrations. Using Language-World, we show that PCBC achieves strong performance in a variety of few-shot regimes, often achieving task generalization with as little as a single demonstration; a minimal sketch of the training step appears below. [ArXiv URL]

Robotics
Large Language Models
Transfer Learning
[Learn More]

A diagram showing a robot combining three skills using our method. See Figure 5 for details.
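
Below is a minimal sketch of the plan-conditioned behavioral cloning idea, assuming a continuous-action policy trained with an MSE imitation loss. `PlanConditionedPolicy`, the plan-embedding input, and the data layout are illustrative assumptions rather than the paper's exact interfaces.

```python
import torch
import torch.nn as nn

class PlanConditionedPolicy(nn.Module):
    """Policy conditioned on both the observation and an embedding of the
    current high-level plan step (e.g. a language-model embedding)."""
    def __init__(self, obs_dim, plan_embed_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + plan_embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, plan_embedding):
        return self.net(torch.cat([obs, plan_embedding], dim=-1))

def bc_step(policy, optimizer, obs, plan_embedding, expert_actions):
    """One behavioral-cloning gradient step on a batch of demonstration
    (observation, plan step, action) tuples."""
    pred = policy(obs, plan_embedding)
    loss = ((pred - expert_actions) ** 2).mean()  # MSE imitation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```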

Policy Diffusion: Generative Models for Policy Datasets

We show how to train a generative model over neural network parameters (a "diffusion graph hyper-network"). The resulting model takes in a task description (e.g. "hop forward on your right foot") and produces a small neural network that performs that behavior; a simplified sketch of the parameter-generation idea appears below. [ArXiv URL]

Large Language Models
Diffusion
Hyper-Networks
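
The sketch below illustrates the hyper-network half of the idea under simplifying assumptions: a generator maps a task-description embedding to the flattened parameters of a small policy, which are then loaded so the generated policy can be run directly. The paper trains a diffusion model over parameters; here a plain feed-forward generator stands in for it, and `policy_template`, the embedding size, and all dimensions are made up for illustration.

```python
import torch
import torch.nn as nn

def policy_template(obs_dim=11, act_dim=3, hidden=32):
    """The small policy architecture whose weights will be generated."""
    return nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim))

class ParameterGenerator(nn.Module):
    """Maps a task-description embedding to a flat vector of policy weights."""
    def __init__(self, text_embed_dim, n_policy_params):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_embed_dim, 512), nn.ReLU(),
            nn.Linear(512, n_policy_params),
        )

    def forward(self, task_embedding):
        return self.net(task_embedding)

def load_flat_params(policy, flat_params):
    """Copy a flat parameter vector into the policy's weight tensors."""
    offset = 0
    for p in policy.parameters():
        n = p.numel()
        p.data.copy_(flat_params[offset:offset + n].view_as(p))
        offset += n

# Usage: generate a runnable policy for a new task-description embedding.
policy = policy_template()
n_params = sum(p.numel() for p in policy.parameters())
generator = ParameterGenerator(text_embed_dim=768, n_policy_params=n_params)
task_embedding = torch.randn(768)  # stand-in for a language-model embedding
load_flat_params(policy, generator(task_embedding))
```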

Efficient Multi-Task Learning via Iterated Single-Task Transfer

We investigate the feasibility of competing with multi-task RL by performing repeated transfer RL from one task to another. We describe a method for finding near-optimal sequences of transfers in this setting, and use it to show that performing the optimal transfer sequence is competitive with other MTRL methods on the MetaWorld MT10 benchmark; a greedy sketch of the sequence search appears below. [Semantic Scholar URL]

Robotics
Reinforcement Learning
Transfer Learning
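
A greedy sketch of the sequence-search idea, under the assumption that candidate transfers can be fine-tuned and evaluated cheaply: starting from one task trained from scratch, repeatedly perform the single-task transfer that currently evaluates best. `train_from_scratch`, `finetune`, and `evaluate` are placeholder callables, and the greedy rule is a simplification of the method described in the paper.

```python
def iterated_transfer(tasks, train_from_scratch, finetune, evaluate):
    """Greedily build a sequence of single-task transfers covering all tasks."""
    remaining = list(tasks)
    first = remaining.pop(0)
    policies = {first: train_from_scratch(first)}
    order = [first]
    while remaining:
        # Try every (source, target) transfer that is still possible.
        candidates = [(src, tgt, finetune(policies[src], tgt))
                      for src in policies for tgt in remaining]
        # Keep the transfer whose fine-tuned policy evaluates best on its target.
        src, tgt, policy = max(candidates, key=lambda c: evaluate(c[2], c[1]))
        policies[tgt] = policy
        remaining.remove(tgt)
        order.append((src, tgt))
    return order, policies
```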

Towards Exploiting Geometry and Time for Fast Off-Distribution Adaptation in Multi-Task Robot Learning

We train policies for a base set of pre-training tasks, then experiment with adapting to new off-distribution tasks using simple architectural approaches for re-using these policies as black-box priors. We find that combining low-complexity target policy classes, base policies as black-box priors, and simple optimization algorithms allows us to acquire new tasks outside the base task distribution using small amounts of offline training data; a sketch of one such black-box prior appears below. [ArXiv URL]

Robotics
Imitation Learning
Transfer Learning
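
Below is an illustrative sketch of one low-complexity way to use frozen base policies as black-box priors: the new task's policy is a learned softmax mixture over the base policies' actions, fit to a small offline demonstration batch with plain gradient descent. The mixture class, interfaces, and loss are assumptions for illustration, not the specific policy classes studied in the paper.

```python
import torch
import torch.nn as nn

class MixtureOfBasePolicies(nn.Module):
    """Low-complexity target policy: a learned mixture of frozen base policies."""
    def __init__(self, base_policies):
        super().__init__()
        self.base_policies = base_policies          # frozen, queried as black boxes
        self.logits = nn.Parameter(torch.zeros(len(base_policies)))

    def forward(self, obs):
        with torch.no_grad():                        # priors are never updated
            actions = torch.stack([p(obs) for p in self.base_policies], dim=0)
        weights = torch.softmax(self.logits, dim=0)  # only these weights are learned
        return (weights.view(-1, 1, 1) * actions).sum(dim=0)

def fit_offline(policy, obs, expert_actions, steps=200, lr=1e-2):
    """Fit the mixture weights to a small offline demonstration batch."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(steps):
        loss = ((policy(obs) - expert_actions) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy
```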