How to write a pseudo-algorithm with algorithm2e package inside a tcolorbox environment?

up vote
3
down vote

favorite

I've recently seen in the second edition of Reinforcement Learning: an introduction by Sutton and Barto an appealing way to display pseudo-algorithms. In the following an example image.

algo

I think that the environment is done with tcolorbox package and I think that I should be able to make something similar. However, I like to put my pseudo-algorithm in an algorithm environment through algorithm2e package. There is a way to mix the two things? The ideal case would be to have the background color, the box, and the captions the same as the image and the internal structure of the algorithm environment of algorithm2e.

This is the code of what I've done so far:

documentclass{article}

usepackage[utf8]{inputenc}

usepackage[ruled,longend]{algorithm2e}

usepackage{textgreek}

usepackage{amssymb}



begin{document}

begin{algorithm}

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

caption{Q-learning (off-policy TD control) for estimating $pi approx pi_*$}

end{algorithm}

end{document}

This is the result:

algo2

Basically, it's only the algorithm part. I don't know how to start to change its appearance since in the documentation there are very few commands to change the visual effect and none of them seems helpful.

edited Nov 23 at 19:56

asked Nov 23 at 19:13

gvgramazio

1,359520

Obviously, it is done using tcolorbox.
– Bernard
Nov 23 at 19:14

@Bernard. I don't have the source, for this reason I only stated that I think that it's done with tcolorbox.
– gvgramazio
Nov 23 at 19:17

add a comment |

up vote
3
down vote

favorite

I've recently seen in the second edition of Reinforcement Learning: an introduction by Sutton and Barto an appealing way to display pseudo-algorithms. In the following an example image.

algo

This is the code of what I've done so far:

documentclass{article}

usepackage[utf8]{inputenc}

usepackage[ruled,longend]{algorithm2e}

usepackage{textgreek}

usepackage{amssymb}



begin{document}

begin{algorithm}

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

caption{Q-learning (off-policy TD control) for estimating $pi approx pi_*$}

end{algorithm}

end{document}

This is the result:

algo2

edited Nov 23 at 19:56

asked Nov 23 at 19:13

gvgramazio

1,359520

Obviously, it is done using tcolorbox.
– Bernard
Nov 23 at 19:14

@Bernard. I don't have the source, for this reason I only stated that I think that it's done with tcolorbox.
– gvgramazio
Nov 23 at 19:17

add a comment |

up vote
3
down vote

favorite

I've recently seen in the second edition of Reinforcement Learning: an introduction by Sutton and Barto an appealing way to display pseudo-algorithms. In the following an example image.

algo

This is the code of what I've done so far:

documentclass{article}

usepackage[utf8]{inputenc}

usepackage[ruled,longend]{algorithm2e}

usepackage{textgreek}

usepackage{amssymb}



begin{document}

begin{algorithm}

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

caption{Q-learning (off-policy TD control) for estimating $pi approx pi_*$}

end{algorithm}

end{document}

This is the result:

algo2

edited Nov 23 at 19:56

asked Nov 23 at 19:13

gvgramazio

1,359520

I've recently seen in the second edition of Reinforcement Learning: an introduction by Sutton and Barto an appealing way to display pseudo-algorithms. In the following an example image.

algo

This is the code of what I've done so far:

documentclass{article}

usepackage[utf8]{inputenc}

usepackage[ruled,longend]{algorithm2e}

usepackage{textgreek}

usepackage{amssymb}



begin{document}

begin{algorithm}

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

caption{Q-learning (off-policy TD control) for estimating $pi approx pi_*$}

end{algorithm}

end{document}

This is the result:

algo2

tcolorbox algorithm2e

edited Nov 23 at 19:56

asked Nov 23 at 19:13

gvgramazio

1,359520

edited Nov 23 at 19:56

asked Nov 23 at 19:13

gvgramazio

1,359520

edited Nov 23 at 19:56

asked Nov 23 at 19:13

gvgramazio

1,359520

asked Nov 23 at 19:13

gvgramazio

1,359520

asked Nov 23 at 19:13

gvgramazio

1,359520

Obviously, it is done using tcolorbox.
– Bernard
Nov 23 at 19:14

@Bernard. I don't have the source, for this reason I only stated that I think that it's done with tcolorbox.
– gvgramazio
Nov 23 at 19:17

add a comment |

Obviously, it is done using tcolorbox.
– Bernard
Nov 23 at 19:14

@Bernard. I don't have the source, for this reason I only stated that I think that it's done with tcolorbox.
– gvgramazio
Nov 23 at 19:17

Obviously, it is done using tcolorbox.
– Bernard
Nov 23 at 19:14

@Bernard. I don't have the source, for this reason I only stated that I think that it's done with tcolorbox.
– gvgramazio
Nov 23 at 19:17

add a comment |

2 Answers
2

active

oldest

votes

up vote
6
down vote

I missed the part in the algorithm2e manual when it is stated that the option H makes the environment non-floatable, thus it could be put inside the tcolorbox environment.

This is the updated code:

documentclass{article}

usepackage[utf8]{inputenc}

usepackage[longend]{algorithm2e}

usepackage{textgreek}

usepackage{amssymb}

usepackage{tcolorbox}



begin{document}

begin{tcolorbox}[fonttitle=bfseries, title=Q-learning (off-policy TD control) for estimating $pi approx pi_*$]

begin{algorithm}[H]

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{algorithm}

end{tcolorbox}

end{document}

This is the visual result:

algo3

Usually, I don't post self-answered questions. This was a genuine question but I found the answer five minutes after I posted it. Since it could be helpful to others I will leave it here.

answered Nov 23 at 19:32

gvgramazio

1,359520

You may prefer to change the title accordingly to reflect the integration of tcolorbox into algoritm2e.
– Diaa
Nov 23 at 19:41

1

@Diaa. Done. Feel free to change it again if you have of a better title.
– gvgramazio
Nov 23 at 19:57

add a comment |

up vote
2
down vote

This is more a long comment that an answer which was already provided by gvgramazio.

I just propose to integrate tcolorbox+algoritm environments inside a new myalgorithm environment. Using before upper and after upper options, the algorithm environment can be easily integrated inside a tcolorbox. And the new environment includes title as as second parameter. The first parameter is optional and can help to customize the box:

documentclass{article}

usepackage[utf8]{inputenc}

usepackage[longend]{algorithm2e}

usepackage{textgreek}

usepackage{amssymb}

usepackage{tcolorbox}



newtcolorbox{myalgorithm}[2]{%

    fonttitle=bfseries,

    title=#2,

    #1,

    before upper={begin{algorithm}[H]},

    after upper={end{algorithm}}

    }



begin{document}

begin{myalgorithm}{Q-learning (off-policy TD control) for estimating $pi approx pi_*$}

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{myalgorithm}



begin{myalgorithm}[sharp corners, colback=cyan!15]{Q-learning (off-policy TD control) for estimating $pi approx pi_*$}

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{myalgorithm}

end{document}

enter image description here

answered 2 days ago

Ignasi

90.3k4163302

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "85"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2ftex.stackexchange.com%2fquestions%2f461468%2fhow-to-write-a-pseudo-algorithm-with-algorithm2e-package-inside-a-tcolorbox-envi%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
6
down vote

I missed the part in the algorithm2e manual when it is stated that the option H makes the environment non-floatable, thus it could be put inside the tcolorbox environment.

This is the updated code:

documentclass{article}

usepackage[utf8]{inputenc}

usepackage[longend]{algorithm2e}

usepackage{textgreek}

usepackage{amssymb}

usepackage{tcolorbox}



begin{document}

begin{tcolorbox}[fonttitle=bfseries, title=Q-learning (off-policy TD control) for estimating $pi approx pi_*$]

begin{algorithm}[H]

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{algorithm}

end{tcolorbox}

end{document}

This is the visual result:

algo3

Usually, I don't post self-answered questions. This was a genuine question but I found the answer five minutes after I posted it. Since it could be helpful to others I will leave it here.

answered Nov 23 at 19:32

gvgramazio

1,359520

You may prefer to change the title accordingly to reflect the integration of tcolorbox into algoritm2e.
– Diaa
Nov 23 at 19:41

1

@Diaa. Done. Feel free to change it again if you have of a better title.
– gvgramazio
Nov 23 at 19:57

add a comment |

up vote
6
down vote

I missed the part in the algorithm2e manual when it is stated that the option H makes the environment non-floatable, thus it could be put inside the tcolorbox environment.

This is the updated code:

documentclass{article}

usepackage[utf8]{inputenc}

usepackage[longend]{algorithm2e}

usepackage{textgreek}

usepackage{amssymb}

usepackage{tcolorbox}



begin{document}

begin{tcolorbox}[fonttitle=bfseries, title=Q-learning (off-policy TD control) for estimating $pi approx pi_*$]

begin{algorithm}[H]

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{algorithm}

end{tcolorbox}

end{document}

This is the visual result:

algo3

Usually, I don't post self-answered questions. This was a genuine question but I found the answer five minutes after I posted it. Since it could be helpful to others I will leave it here.

answered Nov 23 at 19:32

gvgramazio

1,359520

You may prefer to change the title accordingly to reflect the integration of tcolorbox into algoritm2e.
– Diaa
Nov 23 at 19:41

1

@Diaa. Done. Feel free to change it again if you have of a better title.
– gvgramazio
Nov 23 at 19:57

add a comment |

up vote
6
down vote

I missed the part in the algorithm2e manual when it is stated that the option H makes the environment non-floatable, thus it could be put inside the tcolorbox environment.

This is the updated code:

documentclass{article}

usepackage[utf8]{inputenc}

usepackage[longend]{algorithm2e}

usepackage{textgreek}

usepackage{amssymb}

usepackage{tcolorbox}



begin{document}

begin{tcolorbox}[fonttitle=bfseries, title=Q-learning (off-policy TD control) for estimating $pi approx pi_*$]

begin{algorithm}[H]

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{algorithm}

end{tcolorbox}

end{document}

This is the visual result:

algo3

Usually, I don't post self-answered questions. This was a genuine question but I found the answer five minutes after I posted it. Since it could be helpful to others I will leave it here.

answered Nov 23 at 19:32

gvgramazio

1,359520

I missed the part in the algorithm2e manual when it is stated that the option H makes the environment non-floatable, thus it could be put inside the tcolorbox environment.

This is the updated code:

documentclass{article}

usepackage[utf8]{inputenc}

usepackage[longend]{algorithm2e}

usepackage{textgreek}

usepackage{amssymb}

usepackage{tcolorbox}



begin{document}

begin{tcolorbox}[fonttitle=bfseries, title=Q-learning (off-policy TD control) for estimating $pi approx pi_*$]

begin{algorithm}[H]

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{algorithm}

end{tcolorbox}

end{document}

This is the visual result:

algo3

Usually, I don't post self-answered questions. This was a genuine question but I found the answer five minutes after I posted it. Since it could be helpful to others I will leave it here.

answered Nov 23 at 19:32

gvgramazio

1,359520

answered Nov 23 at 19:32

gvgramazio

1,359520

answered Nov 23 at 19:32

gvgramazio

1,359520

answered Nov 23 at 19:32

gvgramazio

1,359520

You may prefer to change the title accordingly to reflect the integration of tcolorbox into algoritm2e.
– Diaa
Nov 23 at 19:41

1

@Diaa. Done. Feel free to change it again if you have of a better title.
– gvgramazio
Nov 23 at 19:57

add a comment |

You may prefer to change the title accordingly to reflect the integration of tcolorbox into algoritm2e.
– Diaa
Nov 23 at 19:41

1

@Diaa. Done. Feel free to change it again if you have of a better title.
– gvgramazio
Nov 23 at 19:57

You may prefer to change the title accordingly to reflect the integration of tcolorbox into algoritm2e.
– Diaa
Nov 23 at 19:41

@Diaa. Done. Feel free to change it again if you have of a better title.
– gvgramazio
Nov 23 at 19:57

add a comment |

up vote
2
down vote

This is more a long comment that an answer which was already provided by gvgramazio.

documentclass{article}

usepackage[utf8]{inputenc}

usepackage[longend]{algorithm2e}

usepackage{textgreek}

usepackage{amssymb}

usepackage{tcolorbox}



newtcolorbox{myalgorithm}[2]{%

    fonttitle=bfseries,

    title=#2,

    #1,

    before upper={begin{algorithm}[H]},

    after upper={end{algorithm}}

    }



begin{document}

begin{myalgorithm}{Q-learning (off-policy TD control) for estimating $pi approx pi_*$}

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{myalgorithm}



begin{myalgorithm}[sharp corners, colback=cyan!15]{Q-learning (off-policy TD control) for estimating $pi approx pi_*$}

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{myalgorithm}

end{document}

enter image description here

answered 2 days ago

Ignasi

90.3k4163302

add a comment |

up vote
2
down vote

This is more a long comment that an answer which was already provided by gvgramazio.

documentclass{article}

usepackage[utf8]{inputenc}

usepackage[longend]{algorithm2e}

usepackage{textgreek}

usepackage{amssymb}

usepackage{tcolorbox}



newtcolorbox{myalgorithm}[2]{%

    fonttitle=bfseries,

    title=#2,

    #1,

    before upper={begin{algorithm}[H]},

    after upper={end{algorithm}}

    }



begin{document}

begin{myalgorithm}{Q-learning (off-policy TD control) for estimating $pi approx pi_*$}

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{myalgorithm}



begin{myalgorithm}[sharp corners, colback=cyan!15]{Q-learning (off-policy TD control) for estimating $pi approx pi_*$}

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{myalgorithm}

end{document}

enter image description here

answered 2 days ago

Ignasi

90.3k4163302

add a comment |

up vote
2
down vote

This is more a long comment that an answer which was already provided by gvgramazio.

documentclass{article}

usepackage[utf8]{inputenc}

usepackage[longend]{algorithm2e}

usepackage{textgreek}

usepackage{amssymb}

usepackage{tcolorbox}



newtcolorbox{myalgorithm}[2]{%

    fonttitle=bfseries,

    title=#2,

    #1,

    before upper={begin{algorithm}[H]},

    after upper={end{algorithm}}

    }



begin{document}

begin{myalgorithm}{Q-learning (off-policy TD control) for estimating $pi approx pi_*$}

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{myalgorithm}



begin{myalgorithm}[sharp corners, colback=cyan!15]{Q-learning (off-policy TD control) for estimating $pi approx pi_*$}

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{myalgorithm}

end{document}

enter image description here

answered 2 days ago

Ignasi

90.3k4163302

This is more a long comment that an answer which was already provided by gvgramazio.

documentclass{article}

usepackage[utf8]{inputenc}

usepackage[longend]{algorithm2e}

usepackage{textgreek}

usepackage{amssymb}

usepackage{tcolorbox}



newtcolorbox{myalgorithm}[2]{%

    fonttitle=bfseries,

    title=#2,

    #1,

    before upper={begin{algorithm}[H]},

    after upper={end{algorithm}}

    }



begin{document}

begin{myalgorithm}{Q-learning (off-policy TD control) for estimating $pi approx pi_*$}

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{myalgorithm}



begin{myalgorithm}[sharp corners, colback=cyan!15]{Q-learning (off-policy TD control) for estimating $pi approx pi_*$}

Algorithm parameters: step size $alpha in (0, 1]$, small $epsilon > 0$;

Initialize $Q(s, a)$, for all $s in mathcal{S}^+, a in mathcal{A}(s)$, arbitrarily except that $Q(mathrm{terminal}, cdot) = 0$;



ForEach{episode}{

    Initialize S;

    ForEach{step of episode}{

        Choose $A$ from $S$ using policy derived from $Q$ (e.g., textepsilon-greedy);

        Take action $A$, observe $R$, $S'$;

        $Q(S, A) leftarrow Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)]$;

        $S leftarrow S'$;

    }

}

end{myalgorithm}

end{document}

enter image description here

answered 2 days ago

Ignasi

90.3k4163302

answered 2 days ago

Ignasi

90.3k4163302

answered 2 days ago

Ignasi

90.3k4163302

answered 2 days ago

Ignasi

90.3k4163302

add a comment |

draft saved

draft discarded

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Csdrhrt