What will the policy be if the state space is continuous in reinforcement learning?












I have recently started with reinforcement learning, and I have a few questions about an agent's policy when the state space is continuous. From my understanding, a policy tells the agent which action to perform in a given state. This makes sense in the maze example, where the state space is discrete and limited. What if the state space is continuous? Will the agent have information about every possible state in the state space?
Also, will an RL agent be able to make a decision in a new state that it has not encountered during training?










      reinforcement-learning






      asked Apr 18 at 4:52









Chinni






















          1 Answer


















          You can still define state value functions $v(s)$, action value functions $q(s,a)$, and policy functions $\pi(s)$ or $\pi(a|s)$ when the state $s$ is from a very large or continuous space. Reinforcement Learning (RL) is still a well-defined problem in that space.



          What becomes harder is iterating through the state space. That rules out two simple approaches:




          • Tabular methods, which store lists of all states with the correct action or value.


          • Any method that needs to iterate through all states, e.g. the dynamic programming methods Policy Iteration or Value Iteration.



          These are important methods for RL. With tabular methods, and assuming you can iterate through all possibilities, you can prove that you will find the optimal policy.
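          For contrast, here is a minimal sketch of what the tabular approach looks like when enumeration is possible. The environment (a 5-state chain with a reward at one end) and all constants are made up for illustration; a plain dict of $(s, a)$ entries works only because every state can be listed:

```python
import random

# Toy tabular Q-learning on a 5-state chain. States 0..4; action 0 moves
# left, action 1 moves right; reaching state 4 ends the episode with reward 1.
random.seed(0)
n_states, n_actions = 5, 2
Q = {(s, a): 0.0 for s in range(n_states) for a in range(n_actions)}
alpha, gamma = 0.5, 0.9

def step(s, a):
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == n_states - 1 else 0.0), s2 == n_states - 1

for _ in range(200):                          # episodes
    s, done = 0, False
    while not done:
        a = random.randrange(n_actions)       # random behaviour; Q-learning is off-policy
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, b)] for b in range(n_actions))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy moves right in every non-terminal state.
policy = {s: max(range(n_actions), key=lambda b: Q[(s, b)]) for s in range(n_states - 1)}
```

          With a continuous state space this dict would need uncountably many keys, which is exactly why the approach breaks down.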



          However, RL methods can still work with large state spaces. The main method to do so is to use some form of function approximation, which then generalises the space so that knowledge learned about a single state is used to assess similar states.



          Function approximation can be as simple as discretising the space to make the numbers more manageable. Or you can use a parametrisable machine learning approach, such as neural networks. The combination of neural networks with reinforcement learning methods is behind the "deep" reinforcement learning approaches that have been the subject of much recent research.
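          As a concrete sketch of the discretisation option, the function below maps a 2-dimensional continuous state to a grid cell index that could then be used as a table key. The ranges and bin count are chosen arbitrarily for illustration:

```python
# Discretisation as the simplest function approximation: a continuous state in
# [-1.2, 0.6] x [-0.07, 0.07] (illustrative ranges) is mapped to one of
# n_bins^2 grid cells, and a table is indexed by the cell, not the raw state.
def discretise(state, lows=(-1.2, -0.07), highs=(0.6, 0.07), n_bins=10):
    idx = []
    for x, lo, hi in zip(state, lows, highs):
        frac = (x - lo) / (hi - lo)                    # position within the range, 0..1
        idx.append(min(n_bins - 1, max(0, int(frac * n_bins))))
    return tuple(idx)

# Two nearby continuous states land in the same cell and share one table entry:
a = discretise((-0.51, 0.001))
b = discretise((-0.52, 0.002))
```

          The price of this simplicity is that generalisation is all-or-nothing: states in the same cell are treated as identical, states in adjacent cells share nothing.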



          If you use any function approximation with RL, then you are not guaranteed to find the optimal policy. Instead you will find an approximation of that policy. However, that is often good enough in practice.



          To answer the questions more directly:




          What will the policy be if the state space is continuous in reinforcement learning?




          There is no change at the theoretical level. You can express the policy as $\pi(s)$ for a deterministic policy, or $\pi(a|s)$ for a stochastic policy, regardless of the space of $s$.



          At the implementation level, you will need to implement a parametric function that takes $s$ as one of its inputs. The function parameters $\theta$ are what is learned. For instance, if you use an action value based method such as Q-learning, then you will create an approximation to $Q(s,a)$ - in the literature you may see this directly represented as $\hat{q}(s,a,\theta) \approx Q(s,a)$.



          Using a neural network for $\hat{q}(s,a,\theta)$ is one common way to achieve this, where the neural network's weight and bias values are in $\theta$.
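          To keep a sketch dependency-light, here is the same idea with a linear approximator in place of a neural network: $\theta$ holds one weight vector per action, and a single semi-gradient Q-learning update moves $\hat{q}(s,a,\theta)$ towards the target. The feature function and all numbers are invented for illustration:

```python
import numpy as np

def phi(s):
    # Hand-made polynomial features of a scalar continuous state s.
    return np.array([1.0, s, s * s])

n_actions = 2
theta = np.zeros((n_actions, 3))          # theta: one weight vector per action

def q_hat(s, a):
    return theta[a] @ phi(s)              # linear q_hat(s, a, theta)

alpha, gamma = 0.1, 0.9
# One semi-gradient Q-learning update for an observed transition (s, a, r, s2):
s, a, r, s2 = 0.3, 1, 1.0, 0.5
target = r + gamma * max(q_hat(s2, b) for b in range(n_actions))
# The gradient of q_hat with respect to theta[a] is just phi(s):
theta[a] += alpha * (target - q_hat(s, a)) * phi(s)
```

          A deep Q-network replaces `phi` and the dot product with a learned nonlinear function, but $\theta$ plays the same role.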




          What if the state space is continuous? Will the agent have information about every possible state in the state space?




          It depends on what you mean by "have information". The agent cannot possibly store separate data about each state. However, it may have information about similar states, or store its knowledge about states in a more abstract fashion (such as in the parameters $\theta$).




          Also, will an RL agent be able to make a decision in a new state that it has not encountered during training?




          Yes. For this to work well with function approximation, it relies on successful generalisation between similar states, so it is important that the state space representation supports this. For instance, if two states are close together in the representation you use, their value function and policy outputs should usually be similar - not always, since the function can have an arbitrary shape, but trying to learn an effectively random mapping would be impossible.
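          A tiny illustration of that generalisation: a model fitted on a handful of state values (here a made-up value function $v(s) = 2s + 1$, fitted with a linear model) still returns a sensible estimate for a state it never saw, because the knowledge lives in the parameters rather than in per-state entries:

```python
import numpy as np

# "Training" states and their values under an invented value function v(s) = 2s + 1.
train_s = np.array([0.0, 0.25, 0.5, 1.0])
train_v = 2 * train_s + 1

# Fit weights for features [1, s] by least squares.
X = np.stack([np.ones_like(train_s), train_s], axis=1)
w, *_ = np.linalg.lstsq(X, train_v, rcond=None)

def v_hat(s):
    return w @ np.array([1.0, s])

unseen = v_hat(0.7)        # 0.7 never appeared during "training"
```

          This only works because nearby states really do have nearby values; a representation where value jumps arbitrarily between neighbouring states would defeat any approximator.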






                edited Apr 18 at 8:34

























                answered Apr 18 at 7:42









                Neil Slater





























