How do I model line comments in a CFG?
Assume we want to define a context-free grammar for, say, a programming language in which everything on a line after the character # up to the end of the line is a comment and should be ignored. How can this be expressed in a context-free grammar?
context-free formal-grammars
edited Apr 13 at 9:03
Raphael♦
asked Apr 13 at 8:52
Troy McClure
2 Answers
Without multi-line statements, it's rather simple: assuming line is the nonterminal for a well-formed line, change its appearances on right-hand sides to
line (# .*)?
(borrowing regular expression syntax for brevity).
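If the regex shorthand is the confusing part: X? and Y* are only abbreviations, and each can be rewritten into ordinary productions (X? becomes A -> "" | X, and Y* becomes B -> "" | Y B). Below is a minimal Python sketch of that expansion, assuming a made-up stand-in for line (one or more lowercase letters) and reading "# .*" as "the character # followed by any characters except a line break", with the space only separating symbols. All function names are hypothetical; each one recognizes one nonterminal.

```python
# Rule from the answer, with the regex shorthand expanded into plain CFG
# productions (toy, hypothetical grammar; "" is the empty string):
#
#   line_c      -> line opt_comment      # line_c stands for: line (# .*)?
#   opt_comment -> "" | "#" rest         # the ( ... )? part
#   rest        -> "" | C rest           # the .* part; C = any char except "\n"
#   line        -> L | L line            # stand-in for a real line; L = a..z


def parse_line(s: str, i: int) -> int:
    """line -> L | L line. Returns the index after the match, or -1."""
    j = i
    while j < len(s) and "a" <= s[j] <= "z":
        j += 1
    return j if j > i else -1


def parse_rest(s: str, i: int) -> int:
    """rest -> "" | C rest: consume characters up to (not including) a newline."""
    while i < len(s) and s[i] != "\n":
        i += 1
    return i


def parse_opt_comment(s: str, i: int) -> int:
    """opt_comment -> "" | "#" rest."""
    if i < len(s) and s[i] == "#":
        return parse_rest(s, i + 1)
    return i  # the empty alternative


def accepts_line_c(s: str) -> bool:
    """True if the whole string s derives from line_c."""
    i = parse_line(s, 0)
    return i >= 0 and parse_opt_comment(s, i) == len(s)
```

For example, accepts_line_c("abc#any text here") and accepts_line_c("abc") hold, while accepts_line_c("#only a comment") does not, because this toy line needs at least one letter. The greedy loops are safe here: nothing that may follow a line starts with a letter, and nothing that may follow rest other than a newline exists.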
Otherwise, if you explicitly handle line breaks, replace occurrences of the line-break token CRLF (or so) similarly by
(# .*)? CRLF
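With explicit line breaks, this also covers statements that span several lines: every line break the grammar mentions, including those inside a statement, becomes eol -> opt_comment NEWLINE. A minimal Python sketch, assuming a hypothetical toy language whose statements are a word or a parenthesized sequence of words that may span lines, with "\n" standing in for CRLF. All names are made up for illustration.

```python
# Hypothetical character-level toy grammar ("" is the empty string):
#
#   program     -> "" | end program | stmt end program
#   stmt        -> word | "(" seq ")"          # "( ... )" may span lines
#   seq         -> "" | word seq | " " seq | eol seq
#   end         -> " " end | eol               # spaces, then end of line
#   eol         -> opt_comment "\n"            # the answer's  (# .*)? CRLF
#   opt_comment -> "" | "#" rest
#   rest        -> "" | C rest                 # C = any character except "\n"
#   word        -> L | L word                  # L = a lowercase letter a..z
#
# The alternatives of program, stmt, seq, end and opt_comment start with
# different characters, so each is chosen by peeking at one character;
# word and rest are consumed greedily, which accepts the same strings.
# Each function returns the index after its match, or -1 on failure.


def word(s, i):
    j = i
    while j < len(s) and "a" <= s[j] <= "z":
        j += 1
    return j if j > i else -1


def eol(s, i):
    if i < len(s) and s[i] == "#":              # opt_comment -> "#" rest
        while i < len(s) and s[i] != "\n":      # rest: skip up to the newline
            i += 1
    return i + 1 if i < len(s) and s[i] == "\n" else -1


def end(s, i):
    while i < len(s) and s[i] == " ":
        i += 1
    return eol(s, i)


def seq(s, i):
    while True:
        if i < len(s) and s[i] == " ":
            i += 1
            continue
        j = word(s, i)
        if j < 0:
            j = eol(s, i)                       # a comment and/or line break
        if j < 0:
            return i                            # seq -> "" ends the sequence
        i = j


def stmt(s, i):
    if i < len(s) and s[i] == "(":
        i = seq(s, i + 1)
        return i + 1 if i < len(s) and s[i] == ")" else -1
    return word(s, i)


def is_program(s):
    i = 0
    while i < len(s):
        j = end(s, i)                           # blank or comment-only line
        if j < 0:
            j = stmt(s, i)
            if j < 0:
                return False
            j = end(s, j)                       # every statement ends a line
            if j < 0:
                return False
        i = j
    return True
```

For example, is_program("(a b # first part\n  c d) # done\n") is True, while is_program("(a b # the ) is commented out\n") is False, because the ) inside the comment is ignored and the parenthesis is never closed.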
If line breaks remain implicit, the one option you have left is to make comments part of the lexer: don't only skip whitespace, but also (# .*)? [\r\n]. You can also create a token COMMENT for that and place it everywhere in the grammar where it may appear. This strategy doesn't work for nested multi-line comments, though, a common problem in older languages.
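A minimal sketch of the lexer variant in Python, using the standard re module. The token set (NUMBER, IDENT, OP) is a hypothetical toy example; the relevant parts are the SKIP and COMMENT rules: a comment runs from # up to the line break, and the line break itself is skipped as whitespace. The keep_comments flag (also hypothetical) shows the alternative of emitting COMMENT tokens, which the grammar would then have to accept wherever they may appear.

```python
import re

# Hypothetical token set for a toy language; the first matching rule wins.
TOKEN_SPEC = [
    ("SKIP", r"[ \t\r\n]+"),      # whitespace, including line breaks
    ("COMMENT", r"#[^\r\n]*"),    # from '#' up to (not including) the line break
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/(),;]"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(src, keep_comments=False):
    """Return a list of (kind, text) pairs. Whitespace is always dropped;
    comments are dropped unless keep_comments is True."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = MASTER_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        kind = m.lastgroup
        if kind == "COMMENT" and keep_comments:
            tokens.append((kind, m.group()))
        elif kind not in ("SKIP", "COMMENT"):
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens
```

With this, the call "f(1, # first argument\n  2) # call f\n" spans two lines and contains two comments, yet the parser only receives the six tokens f ( 1 , 2 ).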
answered Apr 13 at 9:10
Raphael♦
Thanks. Sorry, but I don't fully understand the syntax you're using. Also, I have to do it all using a context-free grammar: even if I rely on a lexer, the lexer has to be expressed as a CFG as well. Unfortunately, I have to support multi-line statements that may each contain comments, just like // in C (by the way, I see that a C lexer also had to do it using some kind of "hack", as here: lysator.liu.se/c/ANSI-C-grammar-l.html).
– Troy McClure
Apr 13 at 9:17
@TroyMcClure Are you a student facing an exercise task, or a practitioner trying to build a compiler?
– Raphael♦
Apr 13 at 17:40
The first two approaches should be workable given the setting you describe. Note that, in practice, such issues are probably often taken care of by some preprocessing: it's trivial to run a tool over a set of files that removes all comment suffixes from lines (unless your language has things like multi-line strings, you are interested in extracting the content of some comments, or any number of other complications, of course). Then you don't have to handle them in your grammar.
– Raphael♦
Apr 13 at 17:44
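A minimal sketch of such a preprocessing step in Python. It assumes, hypothetically, that the language has double-quoted strings with backslash escapes that never span lines, so that a # inside a string is not a comment; multi-line strings would need state carried across lines. Line breaks are kept, so a line-based grammar still sees the same lines.

```python
def strip_line_comments(text):
    """Remove '#...' suffixes from every line, keeping '#' inside "..." strings.
    Assumes double-quoted, single-line strings with backslash escapes."""
    out_lines = []
    for line in text.split("\n"):
        in_string = False
        escaped = False
        cut = len(line)  # by default keep the whole line
        for k, ch in enumerate(line):
            if in_string:
                if escaped:
                    escaped = False      # this character was escaped
                elif ch == "\\":
                    escaped = True       # next character is escaped
                elif ch == '"':
                    in_string = False    # closing quote
            elif ch == '"':
                in_string = True         # opening quote
            elif ch == "#":
                cut = k                  # comment starts here
                break
        out_lines.append(line[:cut])
    return "\n".join(out_lines)
```

For example, 's = "a#b" # str' becomes 's = "a#b" ': the # inside the string survives, and the trailing comment is removed.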
@troy: the link you provided is not "the C lexer", but rather a lexer someone put together, not part of any C compiler. Furthermore, it does not recognise line comments, which, as Raphael says, are trivial, but rather C's multi-line comment syntax. (This can also be represented as a regular expression, and you should be able to find a correct one easily enough or, with a little more thought, write one.) The hack in the code snippet was probably pragmatic with lex, but flex works much better with regular expressions, and any serious C implementation would use one.
– rici
Apr 13 at 19:49
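For reference, one widely used regular expression for C's /* ... */ comments is shown below in a minimal Python sketch using the standard re module. It handles runs of stars such as /** ... **/. Like any regular expression it cannot handle nesting (C comments do not nest), and used on its own it does not know about string literals such as "/*", which a real lexer handles with a separate string-literal rule. The helper name is hypothetical.

```python
import re

# "/*", then text without stars, then one or more stars; the group repeats as
# long as the run of stars is not followed by "/", and the final "/" closes it.
C_BLOCK_COMMENT_RE = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")


def strip_c_block_comments(src):
    """Replace each /* ... */ comment by one space, as C's translation
    phase 3 does. Naive sketch: string and character literals are ignored."""
    return C_BLOCK_COMMENT_RE.sub(" ", src)
```

Applied to "int/* a ** b */x;" this yields "int x;". On "/* a /* b */ c */" the match stops at the first */, which is the correct (non-nesting) C behaviour.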
@Troy: anyway, you "model" a comment just like you model whitespace. In almost all languages, they're syntactically and semantically identical.
– rici
Apr 13 at 21:38
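To make that concrete at the grammar level, here is a minimal Python sketch of a scannerless toy grammar (hypothetical, for illustration only) in which the whitespace nonterminal ws also derives line comments. Because ws may appear between any two tokens, comments are allowed anywhere whitespace is, including inside an expression spanning several lines. The comment alternative must end with "\n": otherwise rest could stop early, and the text after # could be parsed as code. The recognizer also evaluates the sum, to show that comments leave no trace.

```python
# Hypothetical scannerless toy grammar ("" is the empty string):
#
#   sum  -> ws num ws more
#   more -> "" | "+" ws num ws more
#   num  -> D | D num                      # D = a digit 0..9
#   ws   -> "" | " " ws | "\n" ws | "#" rest "\n" ws
#   rest -> "" | C rest                    # C = any character except "\n"
#
# ws derives comments, so a comment is allowed wherever whitespace is.


def ws(s, i):
    """Skip spaces, newlines and complete '#...\\n' comments; return new index."""
    while i < len(s):
        if s[i] in " \n":
            i += 1
        elif s[i] == "#":
            j = s.find("\n", i)
            if j < 0:
                return i  # no "\n": the comment alternative of ws does not apply
            i = j + 1
        else:
            break
    return i


def num(s, i):
    """num -> D | D num. Returns (value, new index), or None."""
    j = i
    while j < len(s) and "0" <= s[j] <= "9":
        j += 1
    return (int(s[i:j]), j) if j > i else None


def eval_sum(s):
    """Return the value of s if it derives from sum, else None."""
    r = num(s, ws(s, 0))
    if r is None:
        return None
    total, i = r
    i = ws(s, i)
    while i < len(s) and s[i] == "+":      # more -> "+" ws num ws more
        r = num(s, ws(s, i + 1))
        if r is None:
            return None
        value, i = r
        total += value
        i = ws(s, i)
    return total if i == len(s) else None  # more -> "" must reach the end
```

For example, "1 + # first line\n  20 # second line\n  + 300\n" evaluates to 321, and "1 #+ 2\n" evaluates to 1, because the + 2 is inside the comment.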
You can just treat comments as whitespace.
answered Apr 13 at 23:24
Troy McClure