awk, sed, or other text processing suggestions, please

I have the following repeating pattern of text that needs to be reformatted.

Normally this should be easy, even with a standard text editor, but in this case I need to expand the information in the parentheses and enumerate it.

Best I give an example:

"Gene Code (1A - 1F) D2 fragment, D74F"

I need to be able to have the final product look like this:

Gene Code, 1A, D2 fragment, D74F

Gene Code, 1B, D2 fragment, D74F

Gene Code, 1C, D2 fragment, D74F

Gene Code, 1D, D2 fragment, D74F

Gene Code, 1E, D2 fragment, D74F

Gene Code, 1F, D2 fragment, D74F

The snag is that the initial string contained in the parentheses could be anything like 1A-1F, or 3D-3H, etc. Those are the only shifting bits of information. The number in the parentheses is always the same; just the letters need expansion with their associated number.

So some way of correlating the alphabet with the numbers is needed.

This looks like a mind-bender to me. Any help much appreciated. New to this, by the way.

edited Dec 31 '18 at 10:18

Cyrus


asked Dec 30 '18 at 21:49

jeffschips

Is this performance-sensitive? An easy solution with a for loop would be not very fast.

– Eugen Rieck
Dec 30 '18 at 21:53

bash sed awk text-editing


4 Answers

This bash script

#!/bin/bash

PART1=$(echo "$1" | sed 's/\(.*\)\s(.*/\1/')
PART3=$(echo "$1" | sed 's/.*)\(.*\)/\1/')
PART2=$(echo "$1" | sed 's/.*(\s*\(.*\)).*/\1/')

START=$(echo "$PART2" | sed 's/\s*-.*//')
END=$(echo "$PART2" | sed 's/.*-\s*//')

STARTNUM=$(echo "$START" | sed 's/^\(.\).*/\1/')
ENDNUM=$(echo "$END" | sed 's/^\(.\).*/\1/')
if test "$STARTNUM" '!=' "$ENDNUM"; then
    echo "Error: Numeral is different"
    exit 1
fi

STARTLETTER=$(echo "$START" | sed 's/^.\(.\).*/\1/')
ENDLETTER=$(echo "$END" | sed 's/^.\(.\).*/\1/')

OUTPUT=''
for LETTER in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ; do
    test "$LETTER" '==' "$STARTLETTER" && OUTPUT='yes'
    test -n "$OUTPUT" && echo "$PART1, $STARTNUM$LETTER,$PART3"
    test "$LETTER" '==' "$ENDLETTER" && OUTPUT=''
done

Will do what you need when called with the original text as $1, albeit not in a very performant way.

EDIT

As requested a few words about the sed expressions:

I isolate PART1 by taking everything before whitespace and an opening (

I isolate PART3 by taking everything from the closing ) onwards

I isolate PART2 by taking what is between ( and ), ignoring whitespace

START and END are isolated by the dash, again ignoring whitespace

Number and letter are isolated as the first and second character of START and END, so this assumes a single-digit number
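For comparison, here is a minimal sketch of the same split done with a single bash regex match instead of several sed calls. It is not the answer's method: `expand_range` is a hypothetical function name, it assumes bash (for `[[ =~ ]]` and `BASH_REMATCH`), and unlike the script above it accepts a number with more than one digit.

```shell
#!/usr/bin/env bash
# Sketch only: expand_range is a hypothetical helper, not part of the answer.
# Splits "<text> (<num><letter> - <num><letter>)<rest>" with one regex match.
expand_range() {
    local re='^(.*[^[:space:]])[[:space:]]*\(([0-9]+)([[:upper:]])[[:space:]]*-[[:space:]]*([0-9]+)([[:upper:]])\)(.*)$'
    [[ $1 =~ $re ]] || return 1          # no range found in the input
    local part1=${BASH_REMATCH[1]} num=${BASH_REMATCH[2]} first=${BASH_REMATCH[3]}
    local endnum=${BASH_REMATCH[4]} last=${BASH_REMATCH[5]} part3=${BASH_REMATCH[6]}
    if [[ $num != "$endnum" ]]; then
        echo "Error: Numeral is different" >&2
        return 1
    fi
    # Same flag-based loop as above: start printing at the first letter,
    # stop after the last one.
    local letter output=''
    for letter in {A..Z}; do
        [[ $letter == "$first" ]] && output=yes
        [[ -n $output ]] && printf '%s, %s%s,%s\n' "$part1" "$num" "$letter" "$part3"
        [[ $letter == "$last" ]] && output=''
    done
    return 0
}
```

After sourcing this, `expand_range 'Gene Code (1A - 1F) D2 fragment, D74F'` should print the six expanded lines. Because no external process is started per field, it should also be noticeably cheaper than the sed pipeline when called many times.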

edited Dec 30 '18 at 22:56

answered Dec 30 '18 at 22:08

Eugen Rieck


A breakdown of the sed expressions would be fantastic, looks like some sub-expressions, and a \s that does...?

– Xen2050
Dec 30 '18 at 22:48

@Xen2050 The \s is just for robustness: Ignore or correctly process whitespace around the relevant parts. Everything else should be quite self-explaining.

– Eugen Rieck
Dec 30 '18 at 22:52


I wouldn't count on it being self-explaining to someone looking for "awk, sed, or basically anything," every hint helps +1

– Xen2050
Dec 30 '18 at 22:58


If GNU sed is available

sed -r 's/([^(]+) \((.)(.) - .(.)\)(.*)/printf \x27\1, \2%s,\5\\n\x27 {\3..\4}/e' <<<'Gene Code (1A - 1F) D2 fragment, D74F'

Gene Code, 1A, D2 fragment, D74F

Gene Code, 1B, D2 fragment, D74F

Gene Code, 1C, D2 fragment, D74F

Gene Code, 1D, D2 fragment, D74F

Gene Code, 1E, D2 fragment, D74F

Gene Code, 1F, D2 fragment, D74F

If not, drop the e flag and pipe the generated command to the shell

sed -r 's/([^(]+) \((.)(.) - .(.)\)(.*)/printf \x27\1, \2%s,\5\\n\x27 {\3..\4}/' <<<'Gene Code (1A - 1F) D2 fragment, D74F'|bash

Gene Code, 1A, D2 fragment, D74F

Gene Code, 1B, D2 fragment, D74F

Gene Code, 1C, D2 fragment, D74F

Gene Code, 1D, D2 fragment, D74F

Gene Code, 1E, D2 fragment, D74F

Gene Code, 1F, D2 fragment, D74F

(with sh and ksh the output is the same)
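To make the trick easier to follow: the sed substitution does not expand anything itself, it only rewrites the line into a printf command and lets a shell do the expansion. A sketch of the command it builds for the sample line is below; `generated_command` is just a hypothetical wrapper for illustration, and it assumes a shell with brace expansion such as bash.

```shell
#!/usr/bin/env bash
# Sketch only: generated_command is a hypothetical wrapper showing the command
# that the sed expression produces for 'Gene Code (1A - 1F) D2 fragment, D74F'.
generated_command() {
    # {A..F} expands to A B C D E F, and printf reuses its format once per
    # argument, which yields one output line per letter.
    printf 'Gene Code, 1%s, D2 fragment, D74F\n' {A..F}
}
```

One thing to watch, as far as I know: the e flag hands the command to /bin/sh, so on systems where that shell has no brace expansion (dash, for instance) the range would be printed literally and the pipe-to-bash form is the safer one.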

answered Dec 31 '18 at 0:57

Paulo


A perl way:

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

my $str = '"Gene Code (3D - 3H) D2 fragment, D74F"';
# get begin number, begin letter, end number, end letter
my ($bn,$bl,$en,$el) = $str =~ /\((.)(.) - (.)(.)\)/;
# loop from begin letter to end letter
for my $i ($bl .. $el) {
    # do the substitution and print
    ($_ = $str) =~ s/ \(.. - ..\)/, $bn$i,/ && say;
}

Output:

"Gene Code, 3D, D2 fragment, D74F"

"Gene Code, 3E, D2 fragment, D74F"

"Gene Code, 3F, D2 fragment, D74F"

"Gene Code, 3G, D2 fragment, D74F"

"Gene Code, 3H, D2 fragment, D74F"

answered Dec 31 '18 at 11:03

Toto


Thank you everyone for providing these great solutions. I'm really awed by the generosity and professionalism. It works! I didn't know sed was so powerful. Now I need to figure out how to pass over the entries that don't match this specific pattern. Thank you all and have a great New Year!!

– jeffschips
Dec 31 '18 at 22:33

@jeffschips: You're welcome. Feel free to mark one of the answers as accepted, see: superuser.com/help/someone-answers

– Toto
Jan 1 at 11:00
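On passing over entries that don't match the pattern, one possible approach is an awk filter that expands lines containing a range and prints every other line unchanged. This is only a sketch under a few assumptions: `expand_file` is a hypothetical name, letters are uppercase A to Z, the number before the first letter is reused for every output line (the second number is not checked), and a descending range such as (1F - 1A) is left untouched.

```shell
#!/usr/bin/env bash
# Sketch only: expand_file is a hypothetical helper. It reads the named files
# (or standard input) and expands "(<num><letter> - <num><letter>)" ranges.
expand_file() {
    awk '
    BEGIN { alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" }
    {
        # Look for a range such as " (1A - 1F)" anywhere in the line.
        if (match($0, / *\([0-9]+[A-Z] *- *[0-9]+[A-Z]\)/)) {
            before = substr($0, 1, RSTART - 1)
            after  = substr($0, RSTART + RLENGTH)
            range  = substr($0, RSTART, RLENGTH)
            gsub(/[ ()]/, "", range)          # " (1A - 1F)" becomes "1A-1F"
            split(range, bound, "-")
            num = bound[1]
            sub(/[A-Z]$/, "", num)            # "1A" becomes "1"
            # Map each letter to its position in the alphabet (A=1 ... Z=26).
            lo = index(alphabet, substr(bound[1], length(bound[1])))
            hi = index(alphabet, substr(bound[2], length(bound[2])))
            if (lo > 0 && hi >= lo) {
                for (i = lo; i <= hi; i++)
                    print before ", " num substr(alphabet, i, 1) "," after
                next
            }
        }
        print    # anything else passes through unchanged
    }' "$@"
}
```

Usage would be something like `expand_file input.txt > output.txt`, where input.txt is a placeholder for the real file. The `index()` lookup against a fixed alphabet string is what correlates the letters with numbers.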


A version that doesn't require looping, and uses only four calls to sed. Granted though, my version doesn't check that the two numerics are equal. In fact, the second one is ignored and can even be omitted, as with "Gene Code (91K - Q) D2 fragment, D74F". Also the low bound and high bound can appear in either order. If the low bound is greater than the high bound, then the output sequence is reversed.

$ cat foo

#!/usr/bin/env bash



# Script to expand $1 passed as:



# "Gene Code (91K - 91Q) D2 fragment, D74F"

# 

# into the output:

# 

# Gene Code, 91K, D2 fragment, D74F

# Gene Code, 91L, D2 fragment, D74F

# Gene Code, 91M, D2 fragment, D74F

# Gene Code, 91N, D2 fragment, D74F

# Gene Code, 91O, D2 fragment, D74F

# Gene Code, 91P, D2 fragment, D74F

# Gene Code, 91Q, D2 fragment, D74F





# Copy $1 into FMT_STRING, replacing the " (91K - 91Q)" bit with a ', %s,' 

# printf directive, such as 'Gene Code, %s, D2 fragment, D74F':



FMT_STRING="$(sed -e 's/ (.* - .*)/, %s,/' <<< "$1")"



# Parse the beginning and ending bounds and format them with just a 

# space between, such as '91K 91Q':



BOUNDS="$(sed -e 's/^[^(]*(\(.*\) - \(.*\)) .*/\1 \2/' <<< "$1")"



# Extract the (first) static numeric part from BOUNDS, e.g. '91'



NUMERIC="$(sed -e 's/[^0-9].*//' <<< "$BOUNDS")"



# remove all digits [0-9] from BOUNDS, e.g. 'K Q'

BOUNDS="$(sed -e 's/[0-9]//g' <<< "$BOUNDS")"



FMT_STRING="$(printf "$FMT_STRING" "${NUMERIC}%c")"



jot -w "$FMT_STRING" - $BOUNDS

Sample output:

$ ./foo "Gene Code (737L - 737X) D2 fragment, D74F"

Gene Code, 737L, D2 fragment, D74F

Gene Code, 737M, D2 fragment, D74F

Gene Code, 737N, D2 fragment, D74F

Gene Code, 737O, D2 fragment, D74F

Gene Code, 737P, D2 fragment, D74F

Gene Code, 737Q, D2 fragment, D74F

Gene Code, 737R, D2 fragment, D74F

Gene Code, 737S, D2 fragment, D74F

Gene Code, 737T, D2 fragment, D74F

Gene Code, 737U, D2 fragment, D74F

Gene Code, 737V, D2 fragment, D74F

Gene Code, 737W, D2 fragment, D74F

Gene Code, 737X, D2 fragment, D74F

Reversing the bounds reverses the output:

$ ./foo "Gene Code (737X - 737L) D2 fragment, D74F"

Gene Code, 737X, D2 fragment, D74F

Gene Code, 737W, D2 fragment, D74F

Gene Code, 737V, D2 fragment, D74F

Gene Code, 737U, D2 fragment, D74F

Gene Code, 737T, D2 fragment, D74F

Gene Code, 737S, D2 fragment, D74F

Gene Code, 737R, D2 fragment, D74F

Gene Code, 737Q, D2 fragment, D74F

Gene Code, 737P, D2 fragment, D74F

Gene Code, 737O, D2 fragment, D74F

Gene Code, 737N, D2 fragment, D74F

Gene Code, 737M, D2 fragment, D74F

Gene Code, 737L, D2 fragment, D74F
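jot comes with BSD systems and macOS and, as far as I know, is often not installed on Linux. If it is missing, the last line of the script could be replaced by something like the sketch below. It does loop, so it gives up the no-loop property, but it keeps the reversible bounds. `expand_bounds` is a hypothetical name, and the format passed to it is assumed to use %s for the letter rather than the %c that jot expects.

```shell
#!/bin/sh
# Sketch only: expand_bounds is a hypothetical stand-in for the jot call.
# usage: expand_bounds FORMAT FIRST LAST   (FORMAT contains one %s)
expand_bounds() (
    fmt=$1
    # A leading quote makes printf %d print the character code (K -> 75).
    lo=$(printf '%d' "'$2")
    hi=$(printf '%d' "'$3")
    step=1
    [ "$lo" -gt "$hi" ] && step=-1      # count down when the bounds are reversed
    code=$lo
    while :; do
        # Turn the numeric code back into its character via an octal escape.
        letter=$(printf "\\$(printf '%03o' "$code")")
        printf "$fmt\n" "$letter"
        [ "$code" -eq "$hi" ] && break
        code=$((code + step))
    done
)
```

For example, `expand_bounds 'Gene Code, 737%s, D2 fragment, D74F' L X` should give the same lines as the first sample run above, and swapping L and X should reverse them.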

edited Jan 18 at 22:19

answered Jan 18 at 22:11

Jim L.

30617

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "3"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsuperuser.com%2fquestions%2f1389078%2fawk-sed-or-other-text-processing-suggestions-please%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

This bash script

#!/bin/bash



PART1=$(echo "$1" | sed 's/(.*)s(.*/1/')

PART3=$(echo "$1" | sed 's/.*)(.*)/1/')

PART2=$(echo "$1" | sed 's/.*(s*(.*)).*/1/')



START=$(echo "$PART2" | sed 's/s*-.*//')

END=$(echo "$PART2" | sed 's/.*-s*//')



STARTNUM=$(echo "$START" | sed 's/^(.).*/1/')

ENDNUM=$(echo "$END" | sed 's/^(.).*/1/')

if test "$STARTNUM" '!=' "$ENDNUM"; then

    echo "Error: Numeral is different"

    exit 1

fi



STARTLETTER=$(echo "$START" | sed 's/^.(.).*/1/')

ENDLETTER=$(echo "$END" | sed 's/^.(.).*/1/')



OUTPUT=''

for LETTER in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ; do

    test "$LETTER" '==' "$STARTLETTER" && OUTPUT='yes'

    test -n "$OUTPUT" && echo "$PART1, $STARTNUM$LETTER,$PART3"

    test "$LETTER" '==' "$ENDLETTER" && OUTPUT=''

done

Will do what you need, albeit not in a very performant way when called with the original text as $1

EDIT

As requested a few words about the sed expressions:

I isolate PART1 by taking everything before whitespace and an opening (

I isolate PART3 by taking everything from the closing ) onwards

I isolate PART2 by taking what is between ( and ), ignoring whitespace

START and END are isolated by the dash, again ignoring whitespace

Number and Letter are isolated by being first and second character

edited Dec 30 '18 at 22:56

answered Dec 30 '18 at 22:08

Eugen Rieck

10.1k22128

A breakdown of the sed expressions would be fantastic, looks like some sub-expressions, and a s that does...?

– Xen2050
Dec 30 '18 at 22:48

@Xen2050 The s is just for robustness: Ignore or correctly process whitespace around the relevant parts. Everything else should be quite self-explaining.

– Eugen Rieck
Dec 30 '18 at 22:52

1

I wouldn't count on it being self-explaining to someone looking for "awk, sed, or basically anything," every hint helps +1

– Xen2050
Dec 30 '18 at 22:58

add a comment |

This bash script

#!/bin/bash



PART1=$(echo "$1" | sed 's/(.*)s(.*/1/')

PART3=$(echo "$1" | sed 's/.*)(.*)/1/')

PART2=$(echo "$1" | sed 's/.*(s*(.*)).*/1/')



START=$(echo "$PART2" | sed 's/s*-.*//')

END=$(echo "$PART2" | sed 's/.*-s*//')



STARTNUM=$(echo "$START" | sed 's/^(.).*/1/')

ENDNUM=$(echo "$END" | sed 's/^(.).*/1/')

if test "$STARTNUM" '!=' "$ENDNUM"; then

    echo "Error: Numeral is different"

    exit 1

fi



STARTLETTER=$(echo "$START" | sed 's/^.(.).*/1/')

ENDLETTER=$(echo "$END" | sed 's/^.(.).*/1/')



OUTPUT=''

for LETTER in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ; do

    test "$LETTER" '==' "$STARTLETTER" && OUTPUT='yes'

    test -n "$OUTPUT" && echo "$PART1, $STARTNUM$LETTER,$PART3"

    test "$LETTER" '==' "$ENDLETTER" && OUTPUT=''

done

Will do what you need, albeit not in a very performant way when called with the original text as $1

EDIT

As requested a few words about the sed expressions:

I isolate PART1 by taking everything before whitespace and an opening (

I isolate PART3 by taking everything from the closing ) onwards

I isolate PART2 by taking what is between ( and ), ignoring whitespace

START and END are isolated by the dash, again ignoring whitespace

Number and Letter are isolated by being first and second character

edited Dec 30 '18 at 22:56

answered Dec 30 '18 at 22:08

Eugen Rieck

10.1k22128

A breakdown of the sed expressions would be fantastic, looks like some sub-expressions, and a s that does...?

– Xen2050
Dec 30 '18 at 22:48

@Xen2050 The s is just for robustness: Ignore or correctly process whitespace around the relevant parts. Everything else should be quite self-explaining.

– Eugen Rieck
Dec 30 '18 at 22:52

1

I wouldn't count on it being self-explaining to someone looking for "awk, sed, or basically anything," every hint helps +1

– Xen2050
Dec 30 '18 at 22:58

add a comment |

This bash script

#!/bin/bash



PART1=$(echo "$1" | sed 's/(.*)s(.*/1/')

PART3=$(echo "$1" | sed 's/.*)(.*)/1/')

PART2=$(echo "$1" | sed 's/.*(s*(.*)).*/1/')



START=$(echo "$PART2" | sed 's/s*-.*//')

END=$(echo "$PART2" | sed 's/.*-s*//')



STARTNUM=$(echo "$START" | sed 's/^(.).*/1/')

ENDNUM=$(echo "$END" | sed 's/^(.).*/1/')

if test "$STARTNUM" '!=' "$ENDNUM"; then

    echo "Error: Numeral is different"

    exit 1

fi



STARTLETTER=$(echo "$START" | sed 's/^.(.).*/1/')

ENDLETTER=$(echo "$END" | sed 's/^.(.).*/1/')



OUTPUT=''

for LETTER in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ; do

    test "$LETTER" '==' "$STARTLETTER" && OUTPUT='yes'

    test -n "$OUTPUT" && echo "$PART1, $STARTNUM$LETTER,$PART3"

    test "$LETTER" '==' "$ENDLETTER" && OUTPUT=''

done

Will do what you need, albeit not in a very performant way when called with the original text as $1

EDIT

As requested a few words about the sed expressions:

I isolate PART1 by taking everything before whitespace and an opening (

I isolate PART3 by taking everything from the closing ) onwards

I isolate PART2 by taking what is between ( and ), ignoring whitespace

START and END are isolated by the dash, again ignoring whitespace

Number and Letter are isolated by being first and second character

edited Dec 30 '18 at 22:56

answered Dec 30 '18 at 22:08

Eugen Rieck

10.1k22128

This bash script

#!/bin/bash



PART1=$(echo "$1" | sed 's/(.*)s(.*/1/')

PART3=$(echo "$1" | sed 's/.*)(.*)/1/')

PART2=$(echo "$1" | sed 's/.*(s*(.*)).*/1/')



START=$(echo "$PART2" | sed 's/s*-.*//')

END=$(echo "$PART2" | sed 's/.*-s*//')



STARTNUM=$(echo "$START" | sed 's/^(.).*/1/')

ENDNUM=$(echo "$END" | sed 's/^(.).*/1/')

if test "$STARTNUM" '!=' "$ENDNUM"; then

    echo "Error: Numeral is different"

    exit 1

fi



STARTLETTER=$(echo "$START" | sed 's/^.(.).*/1/')

ENDLETTER=$(echo "$END" | sed 's/^.(.).*/1/')



OUTPUT=''

for LETTER in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ; do

    test "$LETTER" '==' "$STARTLETTER" && OUTPUT='yes'

    test -n "$OUTPUT" && echo "$PART1, $STARTNUM$LETTER,$PART3"

    test "$LETTER" '==' "$ENDLETTER" && OUTPUT=''

done

Will do what you need, albeit not in a very performant way when called with the original text as $1

EDIT

As requested a few words about the sed expressions:

I isolate PART1 by taking everything before whitespace and an opening (

I isolate PART3 by taking everything from the closing ) onwards

I isolate PART2 by taking what is between ( and ), ignoring whitespace

START and END are isolated by the dash, again ignoring whitespace

Number and Letter are isolated by being first and second character

edited Dec 30 '18 at 22:56

answered Dec 30 '18 at 22:08

Eugen Rieck

10.1k22128

edited Dec 30 '18 at 22:56

answered Dec 30 '18 at 22:08

Eugen Rieck

10.1k22128

answered Dec 30 '18 at 22:08

Eugen Rieck

10.1k22128

answered Dec 30 '18 at 22:08

Eugen Rieck

10.1k22128

A breakdown of the sed expressions would be fantastic, looks like some sub-expressions, and a s that does...?

– Xen2050
Dec 30 '18 at 22:48

@Xen2050 The s is just for robustness: Ignore or correctly process whitespace around the relevant parts. Everything else should be quite self-explaining.

– Eugen Rieck
Dec 30 '18 at 22:52

1

I wouldn't count on it being self-explaining to someone looking for "awk, sed, or basically anything," every hint helps +1

– Xen2050
Dec 30 '18 at 22:58

add a comment |

A breakdown of the sed expressions would be fantastic, looks like some sub-expressions, and a s that does...?

– Xen2050
Dec 30 '18 at 22:48

@Xen2050 The s is just for robustness: Ignore or correctly process whitespace around the relevant parts. Everything else should be quite self-explaining.

– Eugen Rieck
Dec 30 '18 at 22:52

1

I wouldn't count on it being self-explaining to someone looking for "awk, sed, or basically anything," every hint helps +1

– Xen2050
Dec 30 '18 at 22:58

A breakdown of the sed expressions would be fantastic, looks like some sub-expressions, and a s that does...?

– Xen2050
Dec 30 '18 at 22:48

@Xen2050 The s is just for robustness: Ignore or correctly process whitespace around the relevant parts. Everything else should be quite self-explaining.

– Eugen Rieck
Dec 30 '18 at 22:52

I wouldn't count on it being self-explaining to someone looking for "awk, sed, or basically anything," every hint helps +1

– Xen2050
Dec 30 '18 at 22:58

add a comment |

If GNU sed is available

sed -r 's/([^(]+) ((.)(.) - .(.))(.*)/printf x271, 2%s,5\nx27 {3..4}/e' <<<'Gene Code (1A - 1F) D2 fragment, D74F'

Gene Code, 1A, D2 fragment, D74F

Gene Code, 1B, D2 fragment, D74F

Gene Code, 1C, D2 fragment, D74F

Gene Code, 1D, D2 fragment, D74F

Gene Code, 1E, D2 fragment, D74F

Gene Code, 1F, D2 fragment, D74F

If not, run it sending as pipe to the shell

sed -r 's/([^(]+) ((.)(.) - .(.))(.*)/printf x271, 2%s,5\nx27 {3..4}/' <<<'Gene Code (1A - 1F) D2 fragment, D74F'|bash

Gene Code, 1A, D2 fragment, D74F

Gene Code, 1B, D2 fragment, D74F

Gene Code, 1C, D2 fragment, D74F

Gene Code, 1D, D2 fragment, D74F

Gene Code, 1E, D2 fragment, D74F

Gene Code, 1F, D2 fragment, D74F

(with sh and ksh the output is the same)

answered Dec 31 '18 at 0:57

Paulo

57628

add a comment |

If GNU sed is available

sed -r 's/([^(]+) ((.)(.) - .(.))(.*)/printf x271, 2%s,5\nx27 {3..4}/e' <<<'Gene Code (1A - 1F) D2 fragment, D74F'

Gene Code, 1A, D2 fragment, D74F

Gene Code, 1B, D2 fragment, D74F

Gene Code, 1C, D2 fragment, D74F

Gene Code, 1D, D2 fragment, D74F

Gene Code, 1E, D2 fragment, D74F

Gene Code, 1F, D2 fragment, D74F

If not, run it sending as pipe to the shell

sed -r 's/([^(]+) ((.)(.) - .(.))(.*)/printf x271, 2%s,5\nx27 {3..4}/' <<<'Gene Code (1A - 1F) D2 fragment, D74F'|bash

Gene Code, 1A, D2 fragment, D74F

Gene Code, 1B, D2 fragment, D74F

Gene Code, 1C, D2 fragment, D74F

Gene Code, 1D, D2 fragment, D74F

Gene Code, 1E, D2 fragment, D74F

Gene Code, 1F, D2 fragment, D74F

(with sh and ksh the output is the same)

answered Dec 31 '18 at 0:57

Paulo

57628

add a comment |

If GNU sed is available

sed -r 's/([^(]+) ((.)(.) - .(.))(.*)/printf x271, 2%s,5\nx27 {3..4}/e' <<<'Gene Code (1A - 1F) D2 fragment, D74F'

Gene Code, 1A, D2 fragment, D74F

Gene Code, 1B, D2 fragment, D74F

Gene Code, 1C, D2 fragment, D74F

Gene Code, 1D, D2 fragment, D74F

Gene Code, 1E, D2 fragment, D74F

Gene Code, 1F, D2 fragment, D74F

If not, run it sending as pipe to the shell

sed -r 's/([^(]+) ((.)(.) - .(.))(.*)/printf x271, 2%s,5\nx27 {3..4}/' <<<'Gene Code (1A - 1F) D2 fragment, D74F'|bash

Gene Code, 1A, D2 fragment, D74F

Gene Code, 1B, D2 fragment, D74F

Gene Code, 1C, D2 fragment, D74F

Gene Code, 1D, D2 fragment, D74F

Gene Code, 1E, D2 fragment, D74F

Gene Code, 1F, D2 fragment, D74F

(with sh and ksh the output is the same)

answered Dec 31 '18 at 0:57

Paulo

57628

If GNU sed is available

sed -r 's/([^(]+) ((.)(.) - .(.))(.*)/printf x271, 2%s,5\nx27 {3..4}/e' <<<'Gene Code (1A - 1F) D2 fragment, D74F'

Gene Code, 1A, D2 fragment, D74F

Gene Code, 1B, D2 fragment, D74F

Gene Code, 1C, D2 fragment, D74F

Gene Code, 1D, D2 fragment, D74F

Gene Code, 1E, D2 fragment, D74F

Gene Code, 1F, D2 fragment, D74F

If not, run it sending as pipe to the shell

sed -r 's/([^(]+) ((.)(.) - .(.))(.*)/printf x271, 2%s,5\nx27 {3..4}/' <<<'Gene Code (1A - 1F) D2 fragment, D74F'|bash

Gene Code, 1A, D2 fragment, D74F

Gene Code, 1B, D2 fragment, D74F

Gene Code, 1C, D2 fragment, D74F

Gene Code, 1D, D2 fragment, D74F

Gene Code, 1E, D2 fragment, D74F

Gene Code, 1F, D2 fragment, D74F

(with sh and ksh the output is the same)

answered Dec 31 '18 at 0:57

Paulo

57628

answered Dec 31 '18 at 0:57

Paulo

57628

answered Dec 31 '18 at 0:57

Paulo

57628

answered Dec 31 '18 at 0:57

Paulo

57628

add a comment |

A perl way:

#!/usr/bin/perl

use feature 'say';



my $str = '"Gene Code (3D - 3H) D2 fragment, D74F"';

# get begin number, begin letter, end number, end letter

my ($bn,$bl,$en,$el) = $str =~ /((.)(.) - (.)(.))/;

# loop from begin letter to end letter

for my $i ($bl .. $el) {

    # do the substitution and print

    ($_ = $str) =~ s/ (.. - ..)/, $bn$i,/ && say;

}

Output:

"Gene Code, 3D, D2 fragment, D74F"

"Gene Code, 3E, D2 fragment, D74F"

"Gene Code, 3F, D2 fragment, D74F"

"Gene Code, 3G, D2 fragment, D74F"

"Gene Code, 3H, D2 fragment, D74F"

answered Dec 31 '18 at 11:03

Toto

3,735101226

Thank you everyone for providing these great solutions. I'm really awed by the generosity and professionalism. It works! I didn't know sed was so powerful. Now I need to figure out how to pass over the entries that don't match this specific pattern. Thank you all and have a great New Year!!

– jeffschips
Dec 31 '18 at 22:33

@jeffschips: You're welcome.Feel free to mark one of the answers as accepted, see: superuser.com/help/someone-answers

– Toto
Jan 1 at 11:00

add a comment |

A perl way:

#!/usr/bin/perl

use feature 'say';



my $str = '"Gene Code (3D - 3H) D2 fragment, D74F"';

# get begin number, begin letter, end number, end letter

my ($bn,$bl,$en,$el) = $str =~ /((.)(.) - (.)(.))/;

# loop from begin letter to end letter

for my $i ($bl .. $el) {

    # do the substitution and print

    ($_ = $str) =~ s/ (.. - ..)/, $bn$i,/ && say;

}

Output:

"Gene Code, 3D, D2 fragment, D74F"

"Gene Code, 3E, D2 fragment, D74F"

"Gene Code, 3F, D2 fragment, D74F"

"Gene Code, 3G, D2 fragment, D74F"

"Gene Code, 3H, D2 fragment, D74F"

answered Dec 31 '18 at 11:03

Toto

3,735101226

Thank you everyone for providing these great solutions. I'm really awed by the generosity and professionalism. It works! I didn't know sed was so powerful. Now I need to figure out how to pass over the entries that don't match this specific pattern. Thank you all and have a great New Year!!

– jeffschips
Dec 31 '18 at 22:33

@jeffschips: You're welcome.Feel free to mark one of the answers as accepted, see: superuser.com/help/someone-answers

– Toto
Jan 1 at 11:00

add a comment |

A perl way:

#!/usr/bin/perl

use feature 'say';



my $str = '"Gene Code (3D - 3H) D2 fragment, D74F"';

# get begin number, begin letter, end number, end letter

my ($bn,$bl,$en,$el) = $str =~ /((.)(.) - (.)(.))/;

# loop from begin letter to end letter

for my $i ($bl .. $el) {

    # do the substitution and print

    ($_ = $str) =~ s/ (.. - ..)/, $bn$i,/ && say;

}

Output:

"Gene Code, 3D, D2 fragment, D74F"

"Gene Code, 3E, D2 fragment, D74F"

"Gene Code, 3F, D2 fragment, D74F"

"Gene Code, 3G, D2 fragment, D74F"

"Gene Code, 3H, D2 fragment, D74F"

answered Dec 31 '18 at 11:03

Toto

3,735101226

A perl way:

#!/usr/bin/perl

use feature 'say';



my $str = '"Gene Code (3D - 3H) D2 fragment, D74F"';

# get begin number, begin letter, end number, end letter

my ($bn,$bl,$en,$el) = $str =~ /((.)(.) - (.)(.))/;

# loop from begin letter to end letter

for my $i ($bl .. $el) {

    # do the substitution and print

    ($_ = $str) =~ s/ (.. - ..)/, $bn$i,/ && say;

}

Output:

"Gene Code, 3D, D2 fragment, D74F"

"Gene Code, 3E, D2 fragment, D74F"

"Gene Code, 3F, D2 fragment, D74F"

"Gene Code, 3G, D2 fragment, D74F"

"Gene Code, 3H, D2 fragment, D74F"

answered Dec 31 '18 at 11:03

Toto

3,735101226

answered Dec 31 '18 at 11:03

Toto

3,735101226

answered Dec 31 '18 at 11:03

Toto

3,735101226

answered Dec 31 '18 at 11:03

Toto

3,735101226

Thank you everyone for providing these great solutions. I'm really awed by the generosity and professionalism. It works! I didn't know sed was so powerful. Now I need to figure out how to pass over the entries that don't match this specific pattern. Thank you all and have a great New Year!!

– jeffschips
Dec 31 '18 at 22:33

@jeffschips: You're welcome.Feel free to mark one of the answers as accepted, see: superuser.com/help/someone-answers

– Toto
Jan 1 at 11:00

add a comment |

Thank you everyone for providing these great solutions. I'm really awed by the generosity and professionalism. It works! I didn't know sed was so powerful. Now I need to figure out how to pass over the entries that don't match this specific pattern. Thank you all and have a great New Year!!

– jeffschips
Dec 31 '18 at 22:33

@jeffschips: You're welcome.Feel free to mark one of the answers as accepted, see: superuser.com/help/someone-answers

– Toto
Jan 1 at 11:00

Thank you everyone for providing these great solutions. I'm really awed by the generosity and professionalism. It works! I didn't know sed was so powerful. Now I need to figure out how to pass over the entries that don't match this specific pattern. Thank you all and have a great New Year!!

– jeffschips
Dec 31 '18 at 22:33

@jeffschips: You're welcome.Feel free to mark one of the answers as accepted, see: superuser.com/help/someone-answers

– Toto
Jan 1 at 11:00

add a comment |

$ cat foo

#!/usr/bin/env bash



# Script to expand $1 passed as:



# "Gene Code (91K - 91Q) D2 fragment, D74F"

# 

# into the output:

# 

# Gene Code, 91K, D2 fragment, D74F

# Gene Code, 91L, D2 fragment, D74F

# Gene Code, 91M, D2 fragment, D74F

# Gene Code, 91N, D2 fragment, D74F

# Gene Code, 91O, D2 fragment, D74F

# Gene Code, 91P, D2 fragment, D74F

# Gene Code, 91Q, D2 fragment, D74F





# Copy $1 into FMT_STRING, replacing the " (91K - 91Q)" bit with a ', %s,' 

# printf directive, such as 'Gene Code, %s, D2 fragment, D74F':



FMT_STRING="$(sed -e 's/ (.* - .*)/, %s,/' <<< "$1")"



# Parse the beginning and ending bounds and format them with just a 

# space between, such as '91K 91Q':



BOUNDS="$(sed -e 's/^[^(]*((.*) - (.*)) .*/1 2/' <<< "$1")"



# Extract the (first) static numeric part from BOUNDS, e.g. '91'



NUMERIC="$(sed -e 's/[^0-9].*//' <<< "$BOUNDS")"



# remove all digits [0-9] from BOUNDS, e.g. 'K Q'

BOUNDS="$(sed -e 's/[0-9]//g' <<< "$BOUNDS")"



FMT_STRING="$(printf "$FMT_STRING" "${NUMERIC}%c")"



jot -w "$FMT_STRING" - $BOUNDS

Sample output:

$ ./foo "Gene Code (737L - 737X) D2 fragment, D74F"

Gene Code, 737L, D2 fragment, D74F

Gene Code, 737M, D2 fragment, D74F

Gene Code, 737N, D2 fragment, D74F

Gene Code, 737O, D2 fragment, D74F

Gene Code, 737P, D2 fragment, D74F

Gene Code, 737Q, D2 fragment, D74F

Gene Code, 737R, D2 fragment, D74F

Gene Code, 737S, D2 fragment, D74F

Gene Code, 737T, D2 fragment, D74F

Gene Code, 737U, D2 fragment, D74F

Gene Code, 737V, D2 fragment, D74F

Gene Code, 737W, D2 fragment, D74F

Gene Code, 737X, D2 fragment, D74F

Reversing the bounds reverses the output:

$ ./foo "Gene Code (737X - 737L) D2 fragment, D74F"

Gene Code, 737X, D2 fragment, D74F

Gene Code, 737W, D2 fragment, D74F

Gene Code, 737V, D2 fragment, D74F

Gene Code, 737U, D2 fragment, D74F

Gene Code, 737T, D2 fragment, D74F

Gene Code, 737S, D2 fragment, D74F

Gene Code, 737R, D2 fragment, D74F

Gene Code, 737Q, D2 fragment, D74F

Gene Code, 737P, D2 fragment, D74F

Gene Code, 737O, D2 fragment, D74F

Gene Code, 737N, D2 fragment, D74F

Gene Code, 737M, D2 fragment, D74F

Gene Code, 737L, D2 fragment, D74F

edited Jan 18 at 22:19

answered Jan 18 at 22:11

Jim L.

30617
