How do I model line comments in a CFG?

Assume we want to define a context-free grammar for, say, a programming language in which, on each line, everything after the character # until the end of the line is considered a comment and should be ignored. How can that be expressed in a context-free grammar?

edited Apr 13 at 9:03

Raphael♦

58.2k24143319

asked Apr 13 at 8:52

Troy McClure

30129


context-free formal-grammars


2 Answers

Without multi-line statements, it's rather simple: assuming line is the non-terminal for a well-formed line, change its right-hand-side appearances to

line (# .*)?

(borrowing regular expression syntax for brevity).
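For readers unfamiliar with the shorthand, here is a minimal sketch of how `(# .*)?` could be spelled out as plain context-free productions, together with a tiny recursive checker that mirrors them. The non-terminal names `commented_line`, `opt_comment`, `chars` and the terminal class `CHAR` are assumptions made for illustration; they do not appear in the answer.

```python
# Illustrative productions (names are assumptions, not from the answer):
#
#   commented_line -> line opt_comment
#   opt_comment    -> ε | '#' chars
#   chars          -> ε | CHAR chars     (CHAR: any character except a line break)


def matches_chars(text: str) -> bool:
    """Check `chars -> ε | CHAR chars` against the whole of text."""
    if text == "":
        return True  # chars -> ε
    # chars -> CHAR chars: the first character must not be a line break
    return text[0] not in "\r\n" and matches_chars(text[1:])


def matches_opt_comment(text: str) -> bool:
    """Check `opt_comment -> ε | '#' chars` against the whole of text."""
    if text == "":
        return True  # opt_comment -> ε
    return text[0] == "#" and matches_chars(text[1:])
```

The recursion follows the productions one-to-one, which is fine for short lines; since the productions are right-linear, a simple loop would do the same job.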

Otherwise, if you explicitly handle line breaks, replace occurrences of the line-break token CRLF (or so) similarly by

(# .*)? CRLF

If line breaks remain implicit, one option you have left is to make it part of the lexer: don't only skip whitespace, but also (# .*)? [\r\n]. You can also create a token COMMENT for that and place it everywhere in the grammar where it may appear. This strategy doesn't work for nested multiline comments, though, a common problem in older languages.
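As a rough sketch of the lexer-based variant, the following tokenizer skips whitespace and `#` comments before emitting each token. The token set (identifiers, integers, single punctuation characters) and the names `SKIP`, `TOKEN` and `tokenize` are assumptions chosen only to make the example runnable; a real language would have its own token definitions, and a `#` inside a string literal would need extra handling.

```python
import re

# Skipped input: single whitespace characters, or a '#' comment that runs
# up to (but not including) the next line break.
SKIP = re.compile(r"(?:[ \t\r\n]|#[^\r\n]*)+")

# Hypothetical token set: identifiers, integers, single punctuation characters.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w#]")


def tokenize(source: str) -> list[str]:
    """Return the tokens of source, ignoring whitespace and '#' comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        skipped = SKIP.match(source, pos)
        if skipped:
            pos = skipped.end()
            continue
        tok = TOKEN.match(source, pos)
        if tok is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(tok.group())
        pos = tok.end()
    return tokens
```

Because comments are consumed by the same rule as whitespace, the parser never sees them, so a statement spanning several commented lines needs no special treatment in the grammar.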

answered Apr 13 at 9:10

Raphael♦

58.2k24143319

Thanks. Sorry, but I don't fully understand the syntax you're using. Also, I have to do it all using a context-free grammar: even if I rely on a lexer, the lexer has to be expressed as a CFG as well. Unfortunately, I also have to support multi-line statements whose lines may each contain a comment, just like // in C. (By the way, I see that the C lexer also had to do it using some kind of "hack", as in lysator.liu.se/c/ANSI-C-grammar-l.html.)
– Troy McClure
Apr 13 at 9:17

@TroyMcClure Are you a student facing an exercise task, or are you a practitioner trying to build a compiler?
– Raphael♦
Apr 13 at 17:40

The first two approaches should be workable given the setting you describe. Note that, in practice, such issues are probably often taken care of by some preprocessing -- it's trivial to run a tool over a set of files that removes all comment suffixes from lines (unless your language has stuff like multi-line strings, you are interested in extracting the content of some comments, or any number of other complications, of course). Then you don't have to handle them in your grammar.
– Raphael♦
Apr 13 at 17:44
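The preprocessing idea from the comment above might look roughly like this. The function name `strip_line_comments` is made up for illustration, and the sketch deliberately ignores the complications the comment mentions: a `#` inside a string literal would be stripped as well.

```python
import re


def strip_line_comments(source: str) -> str:
    """Delete everything from '#' to the end of each line, keeping line breaks.

    Assumption: '#' never occurs inside string literals or other contexts
    where it should be preserved.
    """
    return re.sub(r"#[^\r\n]*", "", source)
```

Line breaks are left in place, so line numbers reported by a later parsing stage still match the original file.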

@troy: the link you provide is not "the C lexer", but rather a lexer put together by someone, not as part of any C compiler. Furthermore, it does not recognise line comments, which, as Raphael says, are trivial, but rather C's multiline comment syntax. (This can also be represented as a regular expression, and you should be able to find a correct one easily enough or, with a little more thought, write one.) The hack in the code snippet was probably pragmatic with lex, but flex works much better with regular expressions and any serious C implementation would use one.
– rici
Apr 13 at 19:49
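For reference, one commonly cited regular expression for C-style (non-nested) block comments is sketched below in Python's `re` syntax. It is not taken from the linked lexer or from the comment above; the names `C_BLOCK_COMMENT` and `is_block_comment` are illustrative.

```python
import re

# '/*', then any text that does not contain '*/', then '*/'.
# After each run of '*', the next character must not be '/' or '*',
# otherwise the comment would already have ended (or the run continued).
C_BLOCK_COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")


def is_block_comment(text: str) -> bool:
    """True if text is exactly one C-style block comment."""
    return C_BLOCK_COMMENT.fullmatch(text) is not None
```

The pattern is regular precisely because block comments of this kind do not nest; nested comments would need a counter and fall outside what a regular expression can describe.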

@Troy: anyway, you "model" a comment just like you model whitespace. In almost all languages, they're syntactically and semantically identical.
– rici
Apr 13 at 21:38


You can just treat comments as whitespace.
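One way to read this at the grammar level is to let the whitespace non-terminal derive comments too. The sketch below assumes a character-level grammar with hypothetical non-terminals `ws` and `comment`, and a helper `consume_ws` that follows those productions to find where the whitespace ends.

```python
LINE_BREAKS = "\r\n"


def consume_ws(text: str, pos: int = 0) -> int:
    """Return the offset just past the longest `ws` prefix starting at pos.

    Hypothetical productions:
        ws      -> ε | BLANK ws | comment ws
        comment -> '#' followed by any run of non-line-break characters
    """
    while pos < len(text):
        if text[pos] in " \t\r\n":      # ws -> BLANK ws
            pos += 1
        elif text[pos] == "#":          # ws -> comment ws
            pos += 1
            while pos < len(text) and text[pos] not in LINE_BREAKS:
                pos += 1
        else:
            break                       # ws -> ε
    return pos
```

Wherever the grammar already allows `ws` between tokens, comments are then allowed as well, including between the lines of a multi-line statement.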

answered Apr 13 at 23:24

Troy McClure

30129


