How to correctly write regular expression to match ASCII control chars
I would like to create a regular expression in elisp (in the standard 'read' form) to match extended ASCII chars the same way PCRE does:
^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$
So, I'm currently confused about \x7f-\xff. Is there a way to set a range using something like \xhh?
regular-expressions
I think the answer depends on whether you're matching against unibyte or multibyte strings. Do you think À (which is undefined in ASCII, 0xC0 in latin-1 and Unicode, but encoded as 0xC380 in UTF-8) falls into the range 0x7F-0xFF?
– npostavs
Apr 14 at 15:34
I think so. At least PCRE matched À as a char in the 0x7F-0xFF range. I need the same behavior for a standard Elisp regular expression.
– serghei
Apr 14 at 16:13
And as I can see, À is defined in ASCII: ascii-code.com. 0xC0 is between 0x7F and 0xFF.
– serghei
Apr 14 at 16:19
asked Apr 14 at 14:54, edited Apr 14 at 15:59
– serghei
1 Answer
You can use a literal DEL character followed by -ÿ instead of \x7f-\xff. StackExchange prints that first character as a space; it has codepoint 127 (decimal), #o177 (octal), and #x7f (hexadecimal).
That is, you can just insert the characters themselves in the regexp pattern.
One way to input such characters is to use C-x 8 RET. To search for any char in the range \x7f through \xff you would type this at the C-M-s prompt (without the spaces):
[ C-x 8 RET # x 7 f - C-x 8 RET # x f f ]
answered Apr 14 at 17:46, edited Apr 14 at 17:52
– Drew
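For use in Lisp code rather than at the isearch prompt, the same range can be written with escapes in the string's read syntax. What follows is only a minimal sketch and not part of the answer above: the names my-identifier-regexp and my-identifier-p are made up for illustration, and it assumes the strings being matched are multibyte. It uses \u007f and \u00ff rather than \x7f and \xff because, as far as I know, a string literal whose only non-ASCII content is \x escapes below 256 is read as a unibyte string, which may not match À the way PCRE does. It also anchors with \` and \' (whole string) instead of ^ and $ (line boundaries), which is an assumption about the intended use.

```elisp
;; Hypothetical names; a sketch of the PCRE pattern from the question
;; translated to Emacs Lisp read syntax.
(defconst my-identifier-regexp
  ;; \u007f is DEL and \u00ff is ÿ.  The \u escapes make this a
  ;; multibyte string, so the range covers characters U+007F..U+00FF.
  "\\`[a-zA-Z_\u007f-\u00ff][a-zA-Z0-9_\u007f-\u00ff]*\\'"
  "Regexp matching a whole string that looks like an identifier.")

(defun my-identifier-p (string)
  "Return t if STRING matches `my-identifier-regexp', else nil."
  ;; Bind `case-fold-search' to nil so that case folding cannot widen
  ;; the character ranges.
  (let ((case-fold-search nil))
    (and (string-match-p my-identifier-regexp string) t)))
```

With this sketch, (my-identifier-p "Àbc") should be non-nil, while (my-identifier-p "1abc") should be nil, since a digit is not allowed as the first character.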