One of the most time-consuming and challenging parts of designing a robotic agent today is engineering the reward function, which a human expert must iteratively update as the agent explores and tries different actions. Some previous approaches use crowdsourced, binary feedback to optimize a reward function that the agent then uses to learn the task. This method instead leverages crowdsourced feedback, gathered from many nonexpert users, to guide the agent as it learns to reach its goal: the feedback is not used as a reward function, but rather to steer the agent's exploration.
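The distinction can be sketched in a few lines (a minimal, hypothetical illustration; the function name, the binary-label format, and the fallback rule are assumptions for this sketch, not the authors' implementation):

```python
def select_exploration_goal(visited, labels):
    """Choose the state the agent should explore from next.

    `labels` are binary crowd answers ("does this state look closer to
    the goal?"). Note they only filter candidate states -- they are
    never summed or optimized as a reward signal.
    """
    promising = [s for s, ok in zip(visited, labels) if ok]
    # Fall back to the most recently visited state if the
    # nonexpert users flagged nothing as promising.
    return promising[-1] if promising else visited[-1]

# Even noisy, nonexpert labels can steer exploration:
visited = [0, 3, 7, 2]                 # states the agent has reached
labels = [False, True, True, False]    # hypothetical crowd responses
print(select_exploration_goal(visited, labels))  # -> 7
```

Because the labels never enter a loss or return computation, occasional wrong answers from nonexperts only misdirect a single round of exploration rather than corrupting the learned objective.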