We factor contexts for contextual policy search into environment and target components, such that experience can be directly generalized over target contexts.
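To illustrate why such a factorization lets experience be reused across targets, here is a minimal sketch. It assumes that a rollout's outcome depends only on the policy parameters and the environment context, while the reward depends only on the outcome and the target context. The names `simulate_outcome`, `reward`, and `relabel`, the toy dynamics, and the squared-distance reward are all hypothetical and are not taken from the original work.

```python
import numpy as np


def simulate_outcome(theta, env_context):
    """Hypothetical rollout: the outcome depends on the policy parameters and
    the environment context only, never on the target (toy dynamics)."""
    return theta * env_context


def reward(outcome, target):
    """Hypothetical reward: negative squared distance between outcome and target."""
    return -float(np.sum((outcome - target) ** 2))


def relabel(theta, env_context, outcome, targets):
    """Re-evaluate one stored rollout for every target context.

    Because the outcome does not depend on the target, a single rollout
    yields one (theta, env_context, target, reward) sample per target
    without running the system again.
    """
    return [(theta, env_context, t, reward(outcome, t)) for t in targets]


theta = np.array([1.0, 2.0])
env_context = np.array([0.5, 0.5])
outcome = simulate_outcome(theta, env_context)  # one rollout

targets = [np.array([0.5, 1.0]), np.array([0.0, 0.0])]
samples = relabel(theta, env_context, outcome, targets)
rewards = [r for (_, _, _, r) in samples]
```

Under these assumptions, one rollout produces a training sample for every target of interest, which is the sense in which experience generalizes directly over target contexts.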
We incorporate bi-perspective reward learning from human preferences into a general hierarchical reinforcement learning framework for robotic grasping.