Reinforcement Learning — Part 1

An Introduction

Motivation

The ideas behind reinforcement learning date back to the 1950s. Researchers realized that by mimicking the way humans, dogs, and other animals learn, machines could do some incredible things.

In 2013, reinforcement learning began its revolution as a machine learning methodology when researchers at DeepMind built software that learned to play Atari games without any prior knowledge of the game rules.

In 2016, DeepMind’s AlphaGo beat Lee Sedol, one of the world’s top professional players, at the game of Go. Read “Google DeepMind’s AlphaGo: How it works” to learn more.

A familiar example of this is training your dog. How do you teach your dog to sit whenever you give the command? You repeat the command many times, and when the dog finally sits, you reward it with a treat.

Here are some videos that showcase the power of this learning technique and are meant to spark the reader’s interest in the field.

Elements of Reinforcement Learning

The Policy: The way in which the software agent acts at a given time is called the policy. It can be as simple as a function or lookup table, or it can involve more sophisticated techniques such as search. The policy is what determines how the agent reacts. It can be simple or complex, and it can represent different behaviors at different times.
The Reward Signal: Rewards are what drive the agent’s motivation. At each step, the environment gives the software agent a reward or a penalty. The agent has no way of knowing what the reward signal will be until after it receives it. The agent can change its policy: if it receives a penalty instead of a reward, it will probably adjust the policy in order to maximize future rewards.
The Value Function: The value function defines the reward the agent can expect in the long run, not just at the current moment.
The Model: The model mimics the behavior of the environment, so the agent can predict how the environment will respond to an action.
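
To make the four elements concrete, here is a minimal Python sketch on a toy problem. Everything in it is an assumption made for illustration and is not from any library: a five-state corridor whose goal is the last state, a reward of +1 for reaching the goal and -0.1 for every other step, a discount factor of 0.9, and the names `model`, `policy`, and `value`.

```python
# Toy corridor: states 0..4, the goal is state 4.
# All names and numbers here are illustrative assumptions.
N_STATES = 5
GOAL = 4
GAMMA = 0.9  # discount factor: how much future rewards count


def model(state, action):
    """The model: mimics the environment, returning (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    # The reward signal: +1 for reaching the goal, a small penalty otherwise.
    reward = 1.0 if next_state == GOAL else -0.1
    return next_state, reward


# The policy as a lookup table: state -> action (+1 = right, -1 = left).
policy = {s: +1 for s in range(N_STATES)}


def value(state, policy, max_steps=50):
    """The value function: discounted sum of rewards when following
    the policy from `state` until the goal (or max_steps) is reached."""
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        if state == GOAL:
            break
        state, reward = model(state, policy[state])
        total += discount * reward
        discount *= GAMMA
    return total


for s in range(N_STATES):
    print(s, round(value(s, policy), 3))
```

States closer to the goal get a higher value even though the immediate reward for most steps is the same small penalty. That is the difference between the reward signal (what happens now) and the value function (what to expect in the long run).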

Real-World Applications

Manufacturing: At Fanuc, a robot uses deep reinforcement learning to pick a device from one box and put it in a container. Whether it succeeds or fails, it remembers the object, gains knowledge, and trains itself to do the job with great speed and precision.
Delivery management: Reinforcement learning is used to solve the Split Delivery Vehicle Routing problem. Q-learning is used to serve the appropriate customers with just one vehicle.
Gaming: Reinforcement learning is also well suited to games, as the Atari and Go results mentioned earlier show.
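
Q-learning, mentioned above for delivery routing, is easier to follow on a small example. The sketch below is not the routing system itself. It is tabular Q-learning on a hypothetical five-state corridor where the agent starts at state 0 and must reach state 4. The reward values, the hyperparameters `ALPHA`, `GAMMA`, and `EPSILON`, and the function names are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy problem: states 0..4, goal at state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left or right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # assumed hyperparameters


def step(state, action):
    """Environment: returns (next_state, reward) for a move."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else -0.1
    return next_state, reward


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # Q-table: estimated long-run reward for each (state, action) pair.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            # Epsilon-greedy: mostly exploit the table, sometimes explore.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate toward
            # reward + discounted best estimate of the next state.
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (
                reward + GAMMA * best_next - q[(state, action)]
            )
            state = next_state
    return q


q_table = train()
# The learned policy: in each state, take the action with the highest Q-value.
greedy_policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
print(greedy_policy)
```

The agent is never told that moving right is correct. It only sees rewards and penalties, and the table of Q-values gradually comes to favor the actions that lead to the goal.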

If you have not read Part 2 of this series, please check it out here.

Originally published in Hacker Noon on Medium.

Publication date

11/20/2017 - 06:59

Author

Prakhar Mishra

Article source

Reinforcement Learning — Part 1
