How do I model line comments in a CFG?












1












Assume we want to define a context-free grammar for, say, a programming language in which everything on a line after the character # up to the end of the line is a comment and should be ignored. How can that be expressed in a context-free grammar?
























      context-free formal-grammars














asked Apr 13 at 8:52 by Troy McClure
edited Apr 13 at 9:03 by Raphael






















2 Answers

































































            3





Without multi-line statements, it's rather simple: assuming line is the non-terminal for a well-formed line, change its right-hand-side appearances to

    line (# .*)?

(borrowing regular expression syntax for brevity).

Otherwise, if you handle line breaks explicitly, similarly replace occurrences of the line-break token CRLF (or whatever it is called) by

    (# .*)? CRLF

If line breaks remain implicit, one option you have left is to make comments part of the lexer: don't only skip whitespace, but also (# .*)? [\r\n]. You can also create a token COMMENT for that and put it in every place of the grammar where a comment may appear. This strategy doesn't work for nested multi-line comments, though, a common problem in older languages.

















answered Apr 13 at 9:10 by Raphael
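For readers who need the `(# .*)? CRLF` shorthand from the answer above spelled out as plain productions, here is a minimal sketch. The non-terminal names `eol`, `comment` and `rest` are assumptions made for illustration and do not come from the original answer; a single "\n" stands in for the line-break token. The productions are

    eol     -> comment NEWLINE | NEWLINE
    comment -> '#' rest
    rest    -> char rest | (empty)      where char is any character except NEWLINE

and a small hand-written recognizer that follows them one function per non-terminal could look like this:

```python
NEWLINE = "\n"  # assumption: one character stands in for the CRLF token


def match_rest(text, i):
    """rest -> char rest | (empty); char is any character except NEWLINE.

    The right-recursive production is implemented as a loop.
    Returns the index just after the longest match.
    """
    while i < len(text) and text[i] != NEWLINE:
        i += 1
    return i


def match_comment(text, i):
    """comment -> '#' rest. Returns the index after the comment, or None."""
    if i < len(text) and text[i] == "#":
        return match_rest(text, i + 1)
    return None


def match_eol(text, i):
    """eol -> comment NEWLINE | NEWLINE. Returns the index after eol, or None."""
    j = match_comment(text, i)
    if j is None:
        j = i  # no comment here: fall back to the second alternative
    if j < len(text) and text[j] == NEWLINE:
        return j + 1
    return None
```

Wherever the original grammar mentions the line-break token, it would then refer to `eol` instead, so a comment may precede any line break, including those inside multi-line statements.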












• Thanks, but I don't fully understand the syntax you're using. Also, I have to do it all with a context-free grammar: even if I rely on a lexer, the lexer has to be expressed as a CFG as well. Unfortunately, I also have to support multi-line statements in which each line may contain a comment, just like // in C. (By the way, I see that the C lexer also had to do it with some kind of "hack", as in lysator.liu.se/c/ANSI-C-grammar-l.html.)
  – Troy McClure, Apr 13 at 9:17

• @TroyMcClure Are you a student facing an exercise task, or are you a practitioner trying to build a compiler?
  – Raphael, Apr 13 at 17:40

• The first two approaches should be workable given the setting you describe. Note that, in practice, such issues are probably often taken care of by some preprocessing -- it's trivial to run a tool over a set of files that removes all comment suffixes from lines (unless your language has stuff like multi-line strings, you are interested in extracting the content of some comments, or any number of other complications, of course). Then you don't have to handle them in your grammar.
  – Raphael, Apr 13 at 17:44

• @troy: the link you provide is not "the C lexer", but rather a lexer put together by someone, not as part of any C compiler. Furthermore, it does not recognise line comments, which, as Raphael says, are trivial, but rather C's multi-line comment syntax. (This can also be represented as a regular expression, and you should be able to find a correct one easily enough or, with a little more thought, write one.) The hack in the code snippet was probably pragmatic with lex, but flex works much better with regular expressions, and any serious C implementation would use one.
  – rici, Apr 13 at 19:49

• @Troy: anyway, you "model" a comment just like you model whitespace. In almost all languages, they're syntactically and semantically identical.
  – rici, Apr 13 at 21:38
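The preprocessing route mentioned in the comments can be sketched in a few lines. This is only an illustration under simplifying assumptions: the function name `strip_line_comments` is made up here, the language is assumed to have no string literals that could contain the comment character, and trailing whitespace left in front of a removed comment is dropped as well.

```python
def strip_line_comments(source: str, marker: str = "#") -> str:
    """Remove everything from `marker` to the end of each line.

    Assumes `marker` never occurs inside a string literal or any other
    construct where it should be kept.
    """
    stripped = []
    for line in source.split("\n"):
        head, _, _ = line.partition(marker)  # text before the first marker
        stripped.append(head.rstrip())       # drop whitespace left before it
    return "\n".join(stripped)
```

Line breaks are preserved, so line numbers reported by a later parsing stage still match the original file.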





































                0





You can simply treat comments as whitespace.

















answered Apr 13 at 23:24 by Troy McClure
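To make "treat comments as whitespace" concrete, here is a small sketch of a tokenizer that discards both in the same rule, so the grammar proper never sees a comment. The token classes (`word`, `number`, `op`) and the function name `tokenize` are assumptions chosen for the example, not part of any answer above.

```python
import re

# One alternative per token class; the first one covers everything to discard.
TOKEN = re.compile(
    r"(?P<skip>\s+|#[^\r\n]*)"   # whitespace, or '#' up to the end of the line
    r"|(?P<word>[A-Za-z_]\w*)"
    r"|(?P<number>\d+)"
    r"|(?P<op>[=+\-*/();])"
)


def tokenize(source):
    """Return (kind, text) pairs, dropping whitespace and line comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup != "skip":  # comments vanish exactly like whitespace
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

With this in place, `tokenize("x = 1 # set x\ny = x + 2")` and `tokenize("x = 1\ny = x + 2")` yield the same token list, which is the sense in which the two are syntactically identical.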





























