Hands-On Intelligent Agents with OpenAI Gym

Reinforcement learning

Reinforcement learning is a kind of hybrid of supervised and unsupervised learning. Like supervised learning, it receives feedback. Like unsupervised learning, it is never given labeled correct answers. As we learned at the start of this section, reinforcement learning is driven by a reward signal. In the example of the kid and their homework, the reward signal was the chocolate from their parents. In the machine learning world, a chocolate may not be enticing for a computer (well, we could program a computer to want chocolates, but why would we? Aren't kids enough?!), but a mere scalar value (a number) will do the trick! The reward signals are still specified by humans in some way, and they signify the intended goal of the task. For example, to train an agent to play Atari games using reinforcement learning, the game score can serve as the reward signal. This makes reinforcement learning much easier (for humans, not for the machine!) because we don't have to label which button should be pressed at each point in the game to teach the machine how to play. Instead, we simply ask the machine to learn on its own to maximize its score. Doesn't it sound fascinating that we could make a machine figure out how to play a game, how to control a car, or how to do its homework all by itself, and all we have to do is tell it how well it did with a score? That is why we are learning about it in this chapter. You will develop some of those cool machines yourself in the upcoming chapters.
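A small sketch makes concrete the idea that a scalar score alone is enough to drive learning. It uses a simplified, stateless setting (a multi-armed bandit) rather than a full reinforcement learning problem. A hypothetical three-button "game" returns a fixed score for each button. `BUTTON_SCORES`, `press`, and `train` are illustrative names and are not part of OpenAI Gym. The agent is never told which button is correct. It follows an epsilon-greedy rule to balance exploring and exploiting, and it keeps a running average of the reward each button has earned.

```python
import random

# Hypothetical "game" (an assumption for illustration): pressing button i
# always returns the fixed scalar score BUTTON_SCORES[i].
# The agent never sees this list; it only sees the score after each press.
BUTTON_SCORES = [0.1, 1.0, 0.5]


def press(button: int) -> float:
    """Environment step: return the scalar reward for pressing `button`."""
    return BUTTON_SCORES[button]


def train(steps: int = 1000, epsilon: float = 0.1, seed: int = 0) -> list[float]:
    """Epsilon-greedy learning from reward alone; returns the value estimate per button."""
    rng = random.Random(seed)
    n = len(BUTTON_SCORES)
    estimates = [0.0] * n  # running average reward for each button
    counts = [0] * n       # how many times each button has been pressed

    for _ in range(steps):
        untried = [a for a in range(n) if counts[a] == 0]
        if untried:
            action = untried[0]                 # try every button at least once
        elif rng.random() < epsilon:
            action = rng.randrange(n)           # explore: pick a random button
        else:
            action = max(range(n), key=lambda a: estimates[a])  # exploit the best guess
        reward = press(action)                  # the only feedback is a number
        counts[action] += 1
        # Incremental sample-average update of this button's value estimate.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


estimates = train()
best = max(range(len(estimates)), key=lambda a: estimates[a])
print(f"learned values: {estimates}")  # learned values: [0.1, 1.0, 0.5]
print(f"best button: {best}")          # best button: 1
```

Because the scores in this toy game are fixed, each estimate equals its button's score after a single press, so the greedy choice settles on button 1. Real tasks such as Atari games add states, noisy rewards, and delayed rewards, and those call for more elaborate algorithms.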