What will be the policy if the state space is continuous in Reinforcement learning

I have recently started with reinforcement learning, and I have a few doubts regarding the policy of an agent when it comes to a continuous state space. From my understanding, the policy tells the agent which action to perform given a particular state. This makes sense for the maze example, where the state space is discrete and limited. What if the state space is continuous? Will the agent have information about every possible state in the state space?
Also, will an RL agent be able to make a decision if it is in a new state that it has not encountered during training?

asked Apr 18 at 4:52

Chinni

276

reinforcement-learning


1 Answer

You can still define state value functions $v(s)$, action value functions $q(s,a)$ and policy functions $\pi(s)$ or $\pi(a|s)$ when the state $s$ is from a very large or continuous space. Reinforcement Learning (RL) is still a well-defined problem in that space.

What becomes harder is iterating through the state space. That rules out two simple approaches:

Tabular methods - that store lists of all states with the correct action or value.

Any method that needs to iterate through all states, e.g. the dynamic programming methods Policy Iteration or Value Iteration.

These are important methods for RL. With a tabular representation, and assuming you can iterate through all possibilities, you can prove that they converge to the optimal policy.

However, RL methods can still work with large state spaces. The main method to do so is to use some form of function approximation, which then generalises the space so that knowledge learned about a single state is used to assess similar states.

Function approximation can be as simple as discretising the space to make the number of states manageable. Or you can use a parametrisable machine learning approach, such as neural networks. The combination of neural networks with reinforcement learning methods is behind the "deep" reinforcement learning approaches that have been the subject of much recent research.
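As a rough illustration of the discretisation option, here is a minimal sketch in plain Python. The two-dimensional state (position and velocity), its bounds and the bin counts are all assumptions made up for the example, not taken from any particular environment.

```python
def discretise(state, lows, highs, bins):
    """Map a continuous state vector to a tuple of bin indices."""
    index = []
    for x, lo, hi, n in zip(state, lows, highs, bins):
        x = min(max(x, lo), hi)            # clamp into the assumed range
        i = int((x - lo) / (hi - lo) * n)  # scale to [0, n] and floor
        index.append(min(i, n - 1))        # x == hi belongs to the last bin
    return tuple(index)

# Hypothetical state: position in [-1, 1], velocity in [-2, 2], 10 bins each.
LOWS, HIGHS, BINS = (-1.0, -2.0), (1.0, 2.0), (10, 10)

# A tabular method can now key its table on the bin indices.
q_table = {}  # maps (bin indices, action) -> estimated action value

s1 = discretise((0.03, 0.50), LOWS, HIGHS, BINS)
s2 = discretise((0.07, 0.55), LOWS, HIGHS, BINS)
q_table[(s1, 0)] = 1.5
print(s1, s2, q_table.get((s2, 0)))  # nearby states share one table entry
```

Both example states fall into the same cell, so whatever is learned for one is reused for the other. The price is that the agent can no longer distinguish states inside a cell.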

If you use any function approximation with RL, then you are not guaranteed to find the optimal policy. Instead you will find an approximation of that policy. However, that is often good enough in practice.

To answer the questions more directly:

What will be the policy if the state space is continuous in Reinforcement learning

There is no change at the theoretical level. You can express the policy as $\pi(s)$ for a deterministic policy, or $\pi(a|s)$ for a stochastic policy, regardless of the space of $s$.
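To make the two forms concrete, the sketch below implements both over a continuous state using linear action preferences and a softmax. The action set, the parameter values and the function names are hypothetical choices for illustration; many other parametrisations would work equally well.

```python
import math
import random

ACTIONS = (0, 1, 2)  # hypothetical discrete action set

def preferences(state, theta):
    """Linear preference per action: h(s, a) = theta[a] . s"""
    return [sum(w * x for w, x in zip(theta[a], state)) for a in ACTIONS]

def pi_stochastic(state, theta):
    """pi(a|s): softmax over preferences, returned as probabilities."""
    h = preferences(state, theta)
    m = max(h)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in h]
    total = sum(exps)
    return [e / total for e in exps]

def pi_deterministic(state, theta):
    """pi(s): the single action with the highest preference."""
    h = preferences(state, theta)
    return max(ACTIONS, key=lambda a: h[a])

def sample_action(state, theta):
    """Draw an action according to pi(a|s)."""
    return random.choices(ACTIONS, weights=pi_stochastic(state, theta), k=1)[0]

theta = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # made-up parameters
state = (0.25, 0.8)                             # any real-valued state works

probs = pi_stochastic(state, theta)
action = pi_deterministic(state, theta)
print(probs, action, sample_action(state, theta))
```

Nothing here enumerates states: the same functions accept any real-valued state vector of the right length.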

At the implementation level, you will need to implement a parametric function that takes $s$ as one of its inputs. The function parameters $\theta$ are what is learned. For instance, if you use an action value based method such as Q-learning, then you will create an approximation to $Q(s,a)$. In the literature you may see this directly represented as $\hat{q}(s,a,\theta) \approx Q(s,a)$.

Using a neural network for $\hat{q}(s,a,\theta)$ is one common way to achieve this, where the neural network's weight and bias values are in $\theta$.
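A neural network is too long to show here, so the following sketch swaps in a linear approximator for $\hat{q}(s,a,\theta)$ and applies one semi-gradient Q-learning update to it. The feature layout, step size `alpha`, discount `gamma` and the sample transition are assumptions chosen for the example.

```python
def features(state, action, n_actions):
    """x(s, a): the state plus a bias term, placed in the action's own slot."""
    block = list(state) + [1.0]
    x = [0.0] * (len(block) * n_actions)
    start = action * len(block)
    x[start:start + len(block)] = block
    return x

def q_hat(state, action, theta, n_actions):
    """Linear approximation: q_hat(s, a, theta) = theta . x(s, a)"""
    x = features(state, action, n_actions)
    return sum(w * f for w, f in zip(theta, x))

def q_learning_update(theta, s, a, r, s_next, done, n_actions,
                      alpha=0.1, gamma=0.99):
    """One semi-gradient Q-learning step; returns the updated theta."""
    if done:
        target = r
    else:
        target = r + gamma * max(
            q_hat(s_next, b, theta, n_actions) for b in range(n_actions))
    td_error = target - q_hat(s, a, theta, n_actions)
    # For a linear model the gradient of q_hat w.r.t. theta is x(s, a).
    x = features(s, a, n_actions)
    return [w + alpha * td_error * f for w, f in zip(theta, x)]

N_ACTIONS = 2
theta = [0.0] * (3 * N_ACTIONS)  # 2 state dimensions + bias, per action

# A single made-up transition (s, a, r, s_next).
s, a, r, s_next = (0.5, -0.2), 1, 1.0, (0.6, -0.1)
before = q_hat(s, a, theta, N_ACTIONS)
theta = q_learning_update(theta, s, a, r, s_next, False, N_ACTIONS)
after = q_hat(s, a, theta, N_ACTIONS)
print(before, after)
```

Only the parameters in `theta` are stored and updated. With a neural network the update has the same shape, with the gradient computed by backpropagation instead of being the feature vector itself.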

What if the state space is continuous, will the agent have information of every possible state in the state space?

It depends on what you mean by "have information". The agent cannot possibly store separate data about each state. However, it may have information about similar states, or store its knowledge about states in a more abstract fashion (such as in the parameters $\theta$).

Also, will an RL agent be able to make a decision if it is in a new state that it has not encountered during training?

Yes. For this to work well with function approximation, the agent relies on successful generalisation between similar states, so it is important that the state representation supports this. For instance, if two states are close together in the representation you use, you should expect their values and policy outputs to be similar most of the time. This need not always hold, since the function can have an arbitrary shape, but learning an effectively random mapping would be impossible.
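The sketch below illustrates this kind of generalisation with a deliberately simple approximator: a linear value function over triangular features on a one-dimensional state in $[0, 1]$. The feature centres, the step size and the `observed_return` function (a stand-in for returns the agent would sample) are all assumptions made for the example.

```python
CENTRES = [0.0, 0.25, 0.5, 0.75, 1.0]
WIDTH = 0.25

def features(s):
    """Triangular features: nearby states activate the same features."""
    return [max(0.0, 1.0 - abs(s - c) / WIDTH) for c in CENTRES]

def v_hat(s, theta):
    """Linear value estimate: v_hat(s, theta) = theta . x(s)"""
    return sum(w * f for w, f in zip(theta, features(s)))

def observed_return(s):
    # Stand-in for returns sampled from the environment; purely illustrative.
    return s * s

theta = [0.0] * len(CENTRES)
alpha = 0.5
for _ in range(50):
    for s in CENTRES:  # the only states ever visited during training
        error = observed_return(s) - v_hat(s, theta)
        theta = [w + alpha * error * f for w, f in zip(theta, features(s))]

unseen = 0.6  # never visited during training
estimate = v_hat(unseen, theta)
print(estimate, observed_return(unseen))
```

The state 0.6 was never visited, yet its estimate is close to the stand-in return because it shares features with the visited states 0.5 and 0.75. If the underlying function varied wildly between those points, the estimate would be correspondingly poor.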

edited Apr 18 at 8:34

answered Apr 18 at 7:42

Neil Slater

17.8k33264
