Striking a balance

Many existing methods that seek to strike a balance between imitation learning and reinforcement learning do so through brute-force trial and error. The researchers' solution involves training two students: one with a weighted combination of reinforcement learning and imitation learning, and a second that uses only reinforcement learning to learn the same task. The main idea is to adjust the weighting of the first student's reinforcement learning and imitation learning objectives automatically and dynamically. The researchers' algorithm continually compares the two students and uses that comparison to set the weighting. Their method outperformed approaches that used only imitation learning or only reinforcement learning.
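To make the two-student idea more concrete, here is a minimal Python sketch. It is not the researchers' algorithm. The linear mixing of the two losses, the fixed `step` size, the tie-breaking toward reinforcement learning, and the functions `combined_loss` and `update_imitation_weight` are all hypothetical. They are chosen only to illustrate how comparing the two students could drive the weighting. The sketch also assumes that each student's performance is summarized by a single evaluation return.

```python
def combined_loss(rl_loss: float, il_loss: float, imitation_weight: float) -> float:
    """Objective for the first (guided) student: a convex mix of the two losses.

    Hypothetical form: weight 0 means pure reinforcement learning,
    weight 1 means pure imitation learning.
    """
    return (1.0 - imitation_weight) * rl_loss + imitation_weight * il_loss


def update_imitation_weight(
    imitation_weight: float,
    guided_return: float,
    rl_only_return: float,
    step: float = 0.05,
) -> float:
    """Hypothetical rule: compare the two students and shift the weighting.

    If the guided student (RL + imitation) earns a strictly higher return than
    the RL-only student, lean more on imitation; otherwise lean more on RL.
    The result is clipped to the interval [0, 1].
    """
    if guided_return > rl_only_return:
        imitation_weight += step
    else:
        imitation_weight -= step
    return min(1.0, max(0.0, imitation_weight))


if __name__ == "__main__":
    weight = 0.5
    # Made-up per-round evaluation returns (guided, rl_only), for illustration only.
    for guided, rl_only in [(12.0, 8.0), (11.0, 9.0), (10.0, 13.0)]:
        weight = update_imitation_weight(weight, guided, rl_only)
        print(f"imitation weight -> {weight:.2f}")
```

The update is deliberately simple. A small, clipped step keeps the weighting from swinging wildly after one noisy evaluation. A real system would likely compare smoothed returns over many episodes instead.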
Feb 20 2023