How to get the sum of values in each column separately, grouped by the values in another column? [duplicate]

This question already has an answer here:

How to calculate the sum of the data that have the same ID in the first column?

4 answers

I have tabular data like this:

abc 1   1   1
bcd 2   2   4
bcd 12  23  3
cde 3   5   5
cde 3   4   5
cde 14  2   25

I want to sum the values in each column separately, grouped by the value in the first column. The desired result is:

abc 1   1   1
bcd 14  25  7
cde 20  11  35

I used this awk command:

awk -F"\t" '{for(n=2;n<=NF; ++n)a[$1]+=$n}END{for(i in a ) print i, a[i] }' tablefilepath

and got this result:

abc 3
bcd 46
cde 66

I think the end of my code is wrong, but I don't know how to fix it.
I would appreciate some pointers on fixing the code.

edited Nov 27 at 11:40 by terdon♦
asked Nov 27 at 6:05 by awkprob

marked as duplicate by Jeff Schaller, elbarna, RalfFriedl, roaima, Isaac Nov 27 at 23:37

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

shell-script text-processing awk numeric-data

3 Answers

accepted

You were fairly close.
You see what you were doing wrong, don't you?
You were keeping one total for each column 1 value,
when you should have been keeping three.
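To make the diagnosis concrete, here is a minimal sketch that reproduces the question's behaviour, assuming the sample rows are tab-separated. The function name row_totals is hypothetical, and the trailing sort is only there to make the order predictable. Because every numeric field of a row is added to the same a[$1], each key ends up with a single grand total:

```shell
# row_totals: hypothetical wrapper around the command from the question.
# Reads tab-separated rows on stdin.
row_totals() {
    awk -F'\t' '
        # Every column from 2 to NF is added to the same slot, a[$1],
        # so the per-column information is lost here.
        { for (n = 2; n <= NF; ++n) a[$1] += $n }
        END { for (i in a) print i, a[i] }
    ' | LC_ALL=C sort
}

# Sample data from the question.
printf 'abc\t1\t1\t1\nbcd\t2\t2\t4\nbcd\t12\t23\t3\ncde\t3\t5\t5\ncde\t3\t4\t5\ncde\t14\t2\t25\n' |
    row_totals
```

For the sample data this gives abc 3, bcd 46 and cde 66, matching the output in the question.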

The following is similar to Inian's answer,
but trivially extendable to handle any number of columns:

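awk -F"\t" '{for(n=2;n<=NF; ++n) a[$1][n]+=$n}   # a[key][column]; needs GNU awk 4.0 or later
        END {for(i in a) {
                printf "%s", i
                for (n=2; n<=4; ++n) printf "\t%s", a[i][n]   # 4 is the last column; raise it for wider tables
                printf "\n"
             }
        }' tablefilepath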

Rather than keep three arrays, like Inian's answer,
it keeps a two-dimensional array.
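The a[$1][n] syntax (arrays of arrays) is a GNU awk extension available from version 4.0, so other or older awks report a syntax error near the bracket. As a hedged alternative, the sketch below uses the standard composite subscript a[$1, n], which should work in any POSIX awk. The function name sum_by_key and the helper arrays seen and order are illustrative assumptions, not part of the answer above; they also keep the keys in first-seen order instead of the unspecified order of for (i in a).

```shell
# sum_by_key: hypothetical helper; reads tab-separated rows on stdin and
# prints one line per key with the per-column sums, tab-separated.
sum_by_key() {
    awk -F'\t' '
        {
            if (NF > maxnf) maxnf = NF      # remember the widest row seen
            if (!($1 in seen)) {            # record each key once, in input order
                seen[$1] = 1
                order[++nkeys] = $1
            }
            for (n = 2; n <= NF; ++n)
                sum[$1, n] += $n            # composite subscript: key and column number
        }
        END {
            for (k = 1; k <= nkeys; ++k) {
                key = order[k]
                printf "%s", key
                for (n = 2; n <= maxnf; ++n)
                    printf "\t%s", sum[key, n] + 0   # + 0 turns a missing cell into 0
                printf "\n"
            }
        }'
}

# Sample data from the question.
printf 'abc\t1\t1\t1\nbcd\t2\t2\t4\nbcd\t12\t23\t3\ncde\t3\t5\t5\ncde\t3\t4\t5\ncde\t14\t2\t25\n' |
    sum_by_key
```

The column count is taken from the widest input row, so nothing needs to be hard-coded in the END block.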

answered Nov 27 at 6:27 by Scott

Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
– terdon♦
Nov 27 at 11:46

@terdon: Thanks for dropping by. POSIX says of for (variable in array) that it "shall iterate, assigning each index of array to variable in an unspecified order." Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde), but that can be fixed by piping awk into sort. Your enhancement would output the columns in random order, with no way to fix it by post-processing.
– Scott
Nov 27 at 19:10

Ah, yes indeed. Fair point.
– terdon♦
Nov 27 at 19:23

@Scott: Thanks for the direction! Now I can see what was wrong with my code. But when I try your code, I get the syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by a variable expansion problem or an escaping problem? It's difficult to find the reason.
– awkprob
Nov 28 at 1:55

@Scott: I'm running Ubuntu Linux 14.04, and after installing GNU awk, 'awk --version' says GNU Awk 3.1.8. But I still get the syntax error.
– awkprob
Nov 28 at 4:54


So long as your file is tab-delimited, datamash is a good fit for this.

$ datamash groupby 1 sum 2 sum 3 sum 4 < tablefilepath

abc     1       1       1
bcd     14      25      7
cde     20      11      35

Datamash can also work with non-tabs, if you specify -t <delimiter>. But tabs seem closest to the example input you have provided.

By default, datamash won't work if your input is delimited by arbitrary whitespace (i.e. possibly multiple spaces intended to "look like" a tab). Still, even if that is what your data looks like, it is easily munged into the form expected by datamash:

sed -i 's/ \+/\t/g' tablefilepath
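The sed command above relies on GNU sed behaviour (-i, and \+ and \t inside the expression). As a hedged alternative for other systems, the sketch below does the same normalisation with tr, writing to standard output instead of editing the file in place. The function name spaces_to_tabs is a hypothetical label chosen for illustration.

```shell
# spaces_to_tabs: hypothetical helper; reads stdin, writes stdout.
# tr maps each space to a tab, and -s then squeezes runs of tabs into one.
# Caveat: runs of tabs already in the input are squeezed too, so empty
# tab-delimited fields would be lost.
spaces_to_tabs() {
    tr -s ' ' '\t'
}

# One of the sample rows, with mixed single and double spaces.
printf 'bcd 12  23  3\n' | spaces_to_tabs
```

The result can be piped straight into datamash, or redirected to a new file if a tab-delimited copy is wanted.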

answered Nov 27 at 6:12 by cryptarch

At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
– steeldriver
Nov 27 at 6:17

@steeldriver Thanks!
– cryptarch
Nov 27 at 6:57


Using awk to sum columns 2-4, grouped by column 1:

awk -v FS="\t" -v OFS="\t" '{ col1[$1]+=$2; col2[$1]+=$3; col3[$1]+=$4; next } END { for ( i in col1) print i, col1[i], col2[i], col3[i]  }' file
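As pointed out in the comments on the accepted answer, for (i in col1) visits the keys in an unspecified order. A minimal sketch of the suggested fix, piping the result through sort, might look like this. The wrapper name sum_cols_sorted and the temporary file are assumptions made for the example.

```shell
# sum_cols_sorted: hypothetical wrapper; takes a tab-separated file as $1.
sum_cols_sorted() {
    awk -v FS='\t' -v OFS='\t' '
        { col1[$1] += $2; col2[$1] += $3; col3[$1] += $4 }
        END { for (i in col1) print i, col1[i], col2[i], col3[i] }
    ' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1   # order rows by the key column
}

# Sample data from the question, written to a temporary file.
tmp=$(mktemp)
printf 'abc\t1\t1\t1\nbcd\t2\t2\t4\nbcd\t12\t23\t3\ncde\t3\t5\t5\ncde\t3\t4\t5\ncde\t14\t2\t25\n' > "$tmp"
sum_cols_sorted "$tmp"
rm -f "$tmp"
```

Sorting works here because each output row is self-contained; it would not help if the columns within a row came out in arbitrary order.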

answered Nov 27 at 6:17 by Inian
