How to get sum of values in column based on variables in other column separately? [duplicate]











This question already has an answer here:




  • How to calculate the sum of the data that have the same ID in the first column?

    4 answers




I have tabular data like this:



abc 1   1   1
bcd 2 2 4
bcd 12 23 3
cde 3 5 5
cde 3 4 5
cde 14 2 25


I want to sum the values in each column, grouped by the value in the first column. The desired result is:



abc 1   1   1
bcd 14 25 7
cde 20 11 35


I used this awk command:

awk -F"\t" '{for(n=2;n<=NF; ++n)a[$1]+=$n}END{for(i in a ) print i, a[i] }' tablefilepath


and got this result:



abc 3
bcd 46
cde 66


I think the end of my code is wrong, but I don't know how to fix it.
I would appreciate some direction on how to fix the code.










marked as duplicate by Jeff Schaller, elbarna, RalfFriedl, roaima, Isaac Nov 27 at 23:37


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.



















      shell-script text-processing awk numeric-data






      edited Nov 27 at 11:40 by terdon










      asked Nov 27 at 6:05 by awkprob




























          3 Answers
            accepted






            You were fairly close. 
            You see what you were doing wrong, don't you? 
            You were keeping one total for each column 1 value,
            when you should have been keeping three.



            This is similar to Inian's answer,
            but trivially extendable to handle any number of columns:



            # Requires GNU awk 4.0 or later: a[$1][n] is an array of arrays.
            awk -F"\t" '{for (n=2; n<=NF; ++n) a[$1][n] += $n}
            END {for (i in a) {
                    printf "%s", i
                    for (n=2; n<=4; ++n) printf "\t%s", a[i][n]
                    printf "\n"
                }
            }' tablefilepath


            Rather than keep three arrays, like Inian's answer,
            it keeps a two-dimensional array.
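
            For awk implementations without arrays of arrays (mawk, or gawk before 4.0, which may explain the "syntax error at or near [" reported in the comments), the same idea can be sketched with the POSIX a[key, col] subscript form. This is a minimal sketch, not the code from this answer: the sample file, the sum_by_key function and the seen/order/maxnf bookkeeping are illustrative assumptions, added so that keys come out in first-seen order and any number of columns is handled.

```shell
# Hypothetical sample file holding the question's data, tab-separated.
sample=$(mktemp)
printf 'abc\t1\t1\t1\nbcd\t2\t2\t4\nbcd\t12\t23\t3\ncde\t3\t5\t5\ncde\t3\t4\t5\ncde\t14\t2\t25\n' > "$sample"

# sum_by_key FILE: per-key, per-column sums using only POSIX awk features.
sum_by_key() {
    awk -F'\t' '
    {
        # Remember each key the first time it appears, to keep input order.
        if (!($1 in seen)) { seen[$1] = 1; order[++nkeys] = $1 }
        if (NF > maxnf) maxnf = NF
        # a[$1, n] is one flat array indexed by "key SUBSEP column".
        for (n = 2; n <= NF; ++n) a[$1, n] += $n
    }
    END {
        for (k = 1; k <= nkeys; ++k) {
            printf "%s", order[k]
            for (n = 2; n <= maxnf; ++n) printf "\t%s", a[order[k], n] + 0
            printf "\n"
        }
    }' "$1"
}

result=$(sum_by_key "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

            Because the keys are replayed from the order array rather than with for (i in a), the rows appear in the order the keys were first seen, and the columns are always printed from 2 up to the widest row.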

















            answered Nov 27 at 6:27 by Scott












            • Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
              – terdon
              Nov 27 at 11:46










            • @terdon: Thanks for dropping by.  "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." — POSIX  Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort.  Your enhancement would output the columns in random order, with no way to fix it by post-processing.
              – Scott
              Nov 27 at 19:10










            • Ah, yes indeed. Fair point.
              – terdon
              Nov 27 at 19:23










            • @Scott: Thanks for the direction! Now I can see what was wrong with my code. But when I try your code, I get the syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by a variable expansion problem or an escaping problem? It's difficult to find the reason.
              – awkprob
              Nov 28 at 1:55










            • @Scott: I'm running Ubuntu 14.04, and after installing GNU awk, 'awk --version' says GNU Awk 3.1.8. But I still get the syntax error.
              – awkprob
              Nov 28 at 4:54
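
            The ordering point raised in the comments above can be sketched as follows. This is only an illustration under stated assumptions: the sample file is hypothetical (the question's rows, deliberately shuffled), and the c2/c3/c4 array names are made up for the example. The awk part emits groups in whatever order for (i in ...) happens to produce, and sort then orders the rows by the first tab-separated field.

```shell
# Hypothetical sample file: the question's rows in shuffled order, tab-separated.
sample=$(mktemp)
printf 'cde\t3\t5\t5\nabc\t1\t1\t1\nbcd\t2\t2\t4\ncde\t3\t4\t5\nbcd\t12\t23\t3\ncde\t14\t2\t25\n' > "$sample"

tab=$(printf '\t')   # a literal tab character for sort -t

sorted=$(
    awk -F'\t' -v OFS='\t' '
        { c2[$1] += $2; c3[$1] += $3; c4[$1] += $4 }
        # for (i in c2) visits the keys in an unspecified order.
        END { for (i in c2) print i, c2[i], c3[i], c4[i] }
    ' "$sample" | sort -t "$tab" -k1,1   # sort restores a stable row order
)
printf '%s\n' "$sorted"
rm -f "$sample"
```

            Sorting fixes the row order only. The column order stays correct here because each column is printed explicitly rather than by iterating over an array.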






























            So long as your file is tab-delimited, datamash is a good fit for this.



            $ datamash groupby 1 sum 2 sum 3 sum 4 < tablefilepath
            abc 1 1 1
            bcd 14 25 7
            cde 20 11 35


            Datamash can also work with other delimiters if you specify -t <delimiter>, but tabs seem closest to the example input you provided.



            By default, datamash won't work if your input is delimited by arbitrary whitespace (i.e. possibly multiple spaces intended to "look like" a tab). Still, even if that is what your data looks like, it is easily converted into the form datamash expects (GNU sed shown here):



            sed -i -E 's/ +/\t/g' tablefilepath
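
            Where GNU sed is not available, tr offers a similar conversion without editing the file in place. A minimal sketch, assuming the input contains no empty fields (tr -s would also collapse runs of existing tabs); the spaced variable is a hypothetical stand-in for the file contents:

```shell
# Hypothetical space-aligned input, like the first rows of the question's table.
spaced='abc 1   1   1
bcd 2 2 4'

# Translate every space to a tab, then squeeze each run of tabs down to one.
tabbed=$(printf '%s\n' "$spaced" | tr -s ' ' '\t')
printf '%s\n' "$tabbed"
```

            On a real file the same filter could be applied as tr -s ' ' '\t' < tablefilepath, with the output redirected to a new file.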
















            answered Nov 27 at 6:12 by cryptarch








            • At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
              – steeldriver
              Nov 27 at 6:17












            • @steeldriver Thanks!
              – cryptarch
              Nov 27 at 6:57
























                Using awk to sum columns 2 to 4, grouped by column 1:



                awk -v FS="\t" -v OFS="\t" '{ col1[$1]+=$2; col2[$1]+=$3; col3[$1]+=$4; next } END { for ( i in col1) print i, col1[i], col2[i], col3[i]  }' file
















                answered Nov 27 at 6:17 by Inian














