Count appearances of a value until it changes to another value











I have the following DataFrame:

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

I want to calculate the frequency of each value, but not an overall count: the count of each value until it changes to another value.

I tried:

df['values'].value_counts()

but it gives me

10    6
9     3
23    2
12    1

The desired output is

10:2
23:2
9:3
10:4
12:1

How can I do this?

























Tags: python pandas count frequency






asked Nov 29 at 15:43 by Mischa; edited Nov 29 at 20:01 by Alex Riley (7 votes)
  • You might want to have a look at "run-length encoding", since that's basically what you want to be doing.
    – Buhb
    Nov 29 at 21:36
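
For reference, "run-length encoding" is exactly this transformation. Below is a minimal sketch in plain Python (my illustration, not part of the thread; the helper name rle is made up):

from itertools import groupby

def rle(seq):
    # Yield one (value, run_length) pair per consecutive run.
    return [(value, sum(1 for _ in run)) for value, run in groupby(seq)]

print(rle([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12]))
# [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]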


















5 Answers

Answer by jezrael (12 votes), answered Nov 29 at 15:45, edited Nov 29 at 15:51

Use:

df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()

Or:

df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()

print (df)
values  values
1       10        2
2       23        2
3       9         3
4       10        4
5       12        1
Name: values, dtype: int64

Finally, to remove the first index level:

df = df.reset_index(level=0, drop=True)
print (df)
values
10    2
23    2
9     3
10    4
12    1
dtype: int64

Explanation:

Compare the original column with its shifted version using ne (not equal), then take the cumulative sum to build the helper Series (here a is the shifted column, b the comparison, and c the group ids):

a = df['values'].shift()
b = df['values'].ne(a)
c = b.cumsum()

print (pd.concat([df['values'], a, b, c],
                 keys=('orig', 'shifted', 'not_equal', 'cumsum'), axis=1))

    orig  shifted  not_equal  cumsum
0     10      NaN       True       1
1     10     10.0      False       1
2     23     10.0       True       2
3     23     23.0      False       2
4      9     23.0       True       3
5      9      9.0      False       3
6      9      9.0      False       3
7     10      9.0       True       4
8     10     10.0      False       4
9     10     10.0      False       4
10    10     10.0      False       4
11    12     10.0       True       5





  • I got an error: Duplicated level name: "values", assigned to level 1, is already used for level 0.
    – Mischa
    Nov 29 at 15:52

  • @Mischa - Then add .rename, like df['values'].ne(df['values'].shift()).cumsum().rename('val1')
    – jezrael
    Nov 29 at 15:53

  • @jezrael, +1 for the nice code. Could you please break df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size() into parts and explain it? It is not clear to me; I would be grateful.
    – RavinderSingh13
    Nov 30 at 12:34
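
Putting jezrael's comment together with the first variant, a sketch of the corrected call (the level name val1 is just the illustrative name from the comment):

import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# Renaming the helper Series avoids the "Duplicated level name" error,
# since it would otherwise share the name 'values' with the column.
grouper = df['values'].ne(df['values'].shift()).cumsum().rename('val1')
print(df.groupby(grouper)['values'].value_counts())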



















Answer by nixon (5 votes), answered Nov 29 at 15:55, edited Nov 29 at 16:01

You can keep track of where the changes in df['values'] occur:

changes = df['values'].diff().ne(0).cumsum()
print(changes)

0     1
1     1
2     2
3     2
4     3
5     3
6     3
7     4
8     4
9     4
10    4
11    5

Then group by changes and also by df['values'] (to keep the values as the index), computing the size of each group:

df.groupby([changes, 'values']).size().reset_index(level=0, drop=True)

values
10    2
23    2
9     3
10    4
12    1
dtype: int64
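
As a side note (my addition, not from the original answer): with df and changes as above, each run's value and length can also be paired with a single agg; a sketch, assuming a reasonably recent pandas:

# 'first' recovers each run's value, 'size' its length.
out = df['values'].groupby(changes).agg(['first', 'size'])
print(out)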






Answer by piRSquared (5 votes), answered Nov 29 at 15:59, edited Nov 29 at 16:38

itertools.groupby

from itertools import groupby

pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))

10    2
23    2
9     3
10    4
12    1
dtype: int64

It's a generator

def f(x):
    count = 1
    for this, that in zip(x, x[1:]):
        if this == that:
            count += 1
        else:
            yield count, this
            count = 1
    yield count, [*x][-1]

pd.Series(*zip(*f(df['values'])))

10    2
23    2
9     3
10    4
12    1
dtype: int64
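
To unpack the pd.Series(*zip(*...)) idiom (an annotation of mine, not part of the original answer): the comprehension builds one [length, value] pair per run, and zip(*pairs) transposes those pairs into the data and index arguments of the Series constructor:

from itertools import groupby

pairs = [[len([*v]), k] for k, v in groupby(df['values'])]
print(pairs)        # [[2, 10], [2, 23], [3, 9], [4, 10], [1, 12]]

data, index = zip(*pairs)
print(data, index)  # (2, 2, 3, 4, 1) (10, 23, 9, 10, 12)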






Answer by W-B (4 votes), answered Nov 29 at 15:48, edited Nov 29 at 15:59

Using crosstab

df['key'] = df['values'].diff().ne(0).cumsum()
pd.crosstab(df['key'], df['values'])
Out[353]:
values  9   10  12  23
key
1       0   2   0   0
2       0   0   0   2
3       3   0   0   0
4       0   4   0   0
5       0   0   1   0

Slightly modifying the result above:

pd.crosstab(df['key'], df['values']).stack().loc[lambda x: x.ne(0)]
Out[355]:
key  values
1    10        2
2    23        2
3    9         3
4    10        4
5    12        1
dtype: int64

Based on Python's groupby:

from itertools import groupby

[(k, len(list(g))) for k, g in groupby(df['values'].tolist())]
Out[366]: [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
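
If you want that list of tuples back as a pandas object shaped like the desired output, one possible spelling (my addition, not part of the original answer):

from itertools import groupby
import pandas as pd

pairs = [(k, len(list(g))) for k, g in groupby(df['values'].tolist())]
out = pd.Series([n for _, n in pairs], index=[k for k, _ in pairs])
print(out)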






Answer by UBears (0 votes), answered Nov 30 at 19:22

This is far from the most time/memory-efficient method in this thread, but here's an iterative approach that is pretty straightforward. Please feel encouraged to suggest improvements.

import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# Collect (value, run_length) pairs in order of appearance; a dict keyed
# by value would silently merge the two separate runs of 10.
runs = []
curr_val = df.iloc[0]['values']
count = 1
for i in range(1, len(df)):
    if df.iloc[i]['values'] == curr_val:
        count += 1
    else:
        runs.append((curr_val, count))
        curr_val = df.iloc[i]['values']
        count = 1
runs.append((curr_val, count))  # flush the final run

df_count = pd.DataFrame(runs, columns=['value', 'count'])
print(df_count)
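
This prints one row per run, in order of appearance, matching the desired output in the question.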




