Year: 2008-
Masakazu Hirokawa
Kenji Suzuki
- Private Company
- Cognitive Robotics
- Cybernics
- Emerging Technologies

Coaching Robots with Subjective Feedbacks
Novel Approach to Socially Guided Machine Learning


Most methodologies for behavior learning consist of three steps: performing an action, evaluating the result of that action, and modifying the learning parameters accordingly. The evaluation step matters because it determines how each parameter is modified. Ideally, an evaluation function would immediately give the correct evaluation for every action, but designing such a function in advance is often impractical because it would have to cover the whole state space.
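The three steps above can be sketched as a minimal loop. This is only an illustration under stated assumptions, not the method described on this page: the `evaluate` function, the four-action setting, the epsilon-greedy choice, and all parameter values are hypothetical. Note that `evaluate` is exactly the kind of hand-designed evaluation function that becomes hard to write once the state space grows.

```python
import random
from typing import List


def evaluate(action: int) -> float:
    """Hypothetical hand-designed evaluation: action 2 is the desired one."""
    return 1.0 if action == 2 else 0.0


def learn(n_actions: int = 4, steps: int = 500, lr: float = 0.1,
          epsilon: float = 0.2, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    values = [0.0] * n_actions  # the learning parameters
    for _ in range(steps):
        # Step 1: perform an action (epsilon-greedy over current values).
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=values.__getitem__)
        # Step 2: evaluate the result of the action.
        score = evaluate(action)
        # Step 3: modify the learning parameters accordingly.
        values[action] += lr * (score - values[action])
    return values


if __name__ == "__main__":
    print(learn())
```

Here the evaluation is available for every action at every step, which is the assumption that the rest of this page relaxes.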

We assume that reinforcement learning (RL) is useful for behavior learning, but the question of how to design the reward function remains open. Learning through interaction with a human trainer can achieve action acquisition without a predefined reward function, but giving the agent too much concrete information about each task and its environment reduces the generality of the method. The teaching signals from the trainer to the agent should therefore be as simple and intuitive as possible.

We propose a novel methodology for agent behavior learning, called Coaching. It is an interactive and iterative learning method in which a human trainer gives subjective evaluations to the robotic agent asynchronously and in real time, while the agent dynamically updates its reward function based on those evaluations. We demonstrated that the agent can learn the desired behavior from simple, subjective instructions such as "positive" and "negative". The approach is also effective when it is difficult to determine a suitable reward function for the learning situation in advance. Because the framework is inspired by the human-to-human skill transfer process, it is intuitive for human trainers, and we call it Coaching a robot, as opposed to Teaching a robot. The biggest difference between the two is that conventional Teaching methodologies require an adequate evaluation function in advance, whereas Coaching allows the agent to learn both the desired behavior and an adequate evaluation function simultaneously, the latter as an internal model of the human trainer. In this paper, we show how to implement the Coaching framework on a typical RL agent.
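To make the idea concrete, the following is a rough sketch of how subjective feedback might drive a tabular Q-learning agent. It is not the authors' implementation: the five-state corridor, the simulated trainer, the moving-average reward model `r_hat`, and every parameter value are assumptions chosen for illustration. Feedback arrives only on some steps, which loosely stands in for asynchronous evaluation, and the agent never sees a predefined reward function.

```python
import random

N_STATES, GOAL = 5, 4   # hypothetical corridor: states 0..4, goal at state 4
LEFT, RIGHT = 0, 1


def trainer_feedback(action, rng, p_feedback=0.3):
    """Simulated trainer: sometimes says positive (+1) or negative (-1)."""
    if rng.random() >= p_feedback:
        return None  # no feedback on this step
    return 1.0 if action == RIGHT else -1.0


def coach(episodes=500, alpha=0.5, beta=0.5, gamma=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]      # action values
    r_hat = [[0.0, 0.0] for _ in range(N_STATES)]  # learned reward model
    for _ in range(episodes):
        s = 0
        for _ in range(20):  # step cap per episode
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((LEFT, RIGHT), key=lambda x: q[s][x])
            s2 = max(0, s - 1) if a == LEFT else min(GOAL, s + 1)
            # Update the reward model only when the trainer speaks.
            f = trainer_feedback(a, rng)
            if f is not None:
                r_hat[s][a] += beta * (f - r_hat[s][a])
            # Ordinary Q-learning step, using the learned reward estimate.
            target = r_hat[s][a] + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if s == GOAL:
                break
    policy = [max((LEFT, RIGHT), key=lambda x: q[st][x]) for st in range(GOAL)]
    return policy, r_hat


if __name__ == "__main__":
    policy, r_hat = coach()
    print("greedy policy:", policy)
    print("learned reward model:", r_hat)
```

In this sketch the behavior (the greedy policy over `q`) and the evaluation function (`r_hat`) are learned side by side, which mirrors the distinction drawn above between Coaching and Teaching. A real trainer's feedback would be noisier and delayed, which this simulation does not model.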


This work is partly supported by Grants-in-Aid for Scientific Research, MEXT, Japan.


  • Hirokawa, M., Suzuki, K., "Coaching Robots: Online Behavior Learning from Human Subjective Feedback," Contemporary Achievements in Intelligent Systems, Studies in Computational Intelligence, 442, pp.37-51, 2013.
  • Hirokawa, M., Suzuki, K., "Coaching to Enhance the Online Behavior Learning of a Robotic Agent," Lecture Notes in Computer Science, 6276, pp.148-157, Springer, 2010.


  © 2005-2011 Artificial Intelligent Laboratory, University of Tsukuba, Japan