IQ-Learn: Inverse soft-Q Learning for Imitation
In many sequential decision-making problems (e.g., robotics control, game
playing, sequential prediction), data from humans or other experts containing
useful information about the task is often available. However, imitation
learning (IL) from a small amount of expert data can be challenging in
high-dimensional environments
with complex dynamics. Behavioral cloning is widely used for its simple
implementation and stable convergence, but it does not exploit any information
about the environment's dynamics.
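To make that contrast concrete, here is a minimal behavioral-cloning sketch for discrete actions in PyTorch; the network sizes and the random stand-in for the expert dataset are illustrative assumptions, not part of any benchmark setup:

```python
import torch
import torch.nn as nn

obs_dim, n_actions = 8, 4                      # illustrative sizes
policy = nn.Sequential(                        # maps states to action logits
    nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stand-in for a batch of expert (state, action) demonstrations.
expert_states = torch.randn(256, obs_dim)
expert_actions = torch.randint(0, n_actions, (256,))

for _ in range(100):
    logits = policy(expert_states)
    # Plain supervised learning on expert pairs: no environment
    # interaction and no use of transition dynamics anywhere.
    loss = nn.functional.cross_entropy(logits, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Nothing in this loop touches the transition dynamics; that unused information is exactly what the methods discussed next try to exploit.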
Many existing methods that do exploit dynamics information are difficult to
train in practice, due to an adversarial optimization process over reward and
policy approximators or to biased, high-variance gradient estimators.
We introduce a method for dynamics-aware IL that avoids adversarial training
by learning a single Q-function which implicitly represents both reward and
policy.
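The claim that one Q-function encodes both quantities follows from the maximum-entropy (soft-Q) framework the method builds on: the policy is the softmax of Q over actions, and the reward is recovered by inverting the soft Bellman equation, r(s, a) = Q(s, a) - gamma * E[V(s')] with V(s) = log sum_a exp Q(s, a). A minimal sketch for discrete actions follows; the function names are ours rather than from any released implementation, and a single sampled next state stands in for the expectation:

```python
import torch

gamma = 0.99  # discount factor (assumed)

def soft_value(q_next):
    # V(s) = log sum_a exp Q(s, a): the soft (max-entropy) state value
    return torch.logsumexp(q_next, dim=-1)

def policy_probs(q):
    # pi(a|s) proportional to exp Q(s, a): the policy implicit in Q
    return torch.softmax(q, dim=-1)

def recovered_reward(q_sa, q_next):
    # Inverting the soft Bellman equation Q(s, a) = r(s, a) + gamma * E[V(s')]
    # gives r(s, a) = Q(s, a) - gamma * V(s'); one sampled next state
    # approximates the expectation over transitions.
    return q_sa - gamma * soft_value(q_next)
```

This double duty is what lets a single learned Q-function serve as both policy and reward estimator, with no separate discriminator or reward network to train adversarially.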
On standard benchmarks, the implicitly learned rewards show a high positive
correlation with the ground-truth rewards, indicating that our method can also
be used for inverse reinforcement learning (IRL). Our method, Inverse soft-Q
learning (IQ-Learn), obtains state-of-the-art results in offline and online
imitation learning settings, significantly outperforming existing methods both
in the number of required environment interactions and in scalability to
high-dimensional spaces, often by more than 3x.