A pythonic and uFunc-y way to turn pandas column into “increasing” index? [duplicate]












7















This question already has an answer here:




  • Count each group sequentially pandas

    3 answers



  • Add numeric column to pandas dataframe based on other textual column

    5 answers




Let's say I have a pandas df like so:



Index   A     B
0 foo 3
1 foo 2
2 foo 5
3 bar 3
4 bar 4
5 baz 5


What's a good fast way to add a column like so:



Index   A     B    Aidx
0 foo 3 0
1 foo 2 0
2 foo 5 0
3 bar 3 1
4 bar 4 1
5 baz 5 2


I.e. adding an increasing index for each unique value?



I know I could use df.unique(), then use a dict and enumerate to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby with some special function?










share|improve this question













marked as duplicate by coldspeed pandas
Users with the  pandas badge can single-handedly close pandas questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Dec 14 '18 at 3:54


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.




















    7















    This question already has an answer here:




    • Count each group sequentially pandas

      3 answers



    • Add numeric column to pandas dataframe based on other textual column

      5 answers




    Let's say I have a pandas df like so:



    Index   A     B
    0 foo 3
    1 foo 2
    2 foo 5
    3 bar 3
    4 bar 4
    5 baz 5


    What's a good fast way to add a column like so:



    Index   A     B    Aidx
    0 foo 3 0
    1 foo 2 0
    2 foo 5 0
    3 bar 3 1
    4 bar 4 1
    5 baz 5 2


    I.e. adding an increasing index for each unique value?



    I know I could use df.unique(), then use a dict and enumerate to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby with some special function?










    share|improve this question













    marked as duplicate by coldspeed pandas
    Users with the  pandas badge can single-handedly close pandas questions as duplicates and reopen them as needed.

    StackExchange.ready(function() {
    if (StackExchange.options.isMobile) return;

    $('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
    var $hover = $(this).addClass('hover-bound'),
    $msg = $hover.siblings('.dupe-hammer-message');

    $hover.hover(
    function() {
    $hover.showInfoMessage('', {
    messageElement: $msg.clone().show(),
    transient: false,
    position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
    dismissable: false,
    relativeToBody: true
    });
    },
    function() {
    StackExchange.helpers.removeMessages();
    }
    );
    });
    });
    Dec 14 '18 at 3:54


    This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.


















      7












      7








      7


      2






      This question already has an answer here:




      • Count each group sequentially pandas

        3 answers



      • Add numeric column to pandas dataframe based on other textual column

        5 answers




      Let's say I have a pandas df like so:



      Index   A     B
      0 foo 3
      1 foo 2
      2 foo 5
      3 bar 3
      4 bar 4
      5 baz 5


      What's a good fast way to add a column like so:



      Index   A     B    Aidx
      0 foo 3 0
      1 foo 2 0
      2 foo 5 0
      3 bar 3 1
      4 bar 4 1
      5 baz 5 2


      I.e. adding an increasing index for each unique value?



      I know I could use df.unique(), then use a dict and enumerate to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby with some special function?










      share|improve this question














      This question already has an answer here:




      • Count each group sequentially pandas

        3 answers



      • Add numeric column to pandas dataframe based on other textual column

        5 answers




      Let's say I have a pandas df like so:



      Index   A     B
      0 foo 3
      1 foo 2
      2 foo 5
      3 bar 3
      4 bar 4
      5 baz 5


      What's a good fast way to add a column like so:



      Index   A     B    Aidx
      0 foo 3 0
      1 foo 2 0
      2 foo 5 0
      3 bar 3 1
      4 bar 4 1
      5 baz 5 2


      I.e. adding an increasing index for each unique value?



      I know I could use df.unique(), then use a dict and enumerate to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby with some special function?





      This question already has an answer here:




      • Count each group sequentially pandas

        3 answers



      • Add numeric column to pandas dataframe based on other textual column

        5 answers








      python pandas






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Dec 14 '18 at 0:48









      LagerbaerLagerbaer

      2,6451124




      2,6451124




      marked as duplicate by coldspeed pandas
      Users with the  pandas badge can single-handedly close pandas questions as duplicates and reopen them as needed.

      StackExchange.ready(function() {
      if (StackExchange.options.isMobile) return;

      $('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
      var $hover = $(this).addClass('hover-bound'),
      $msg = $hover.siblings('.dupe-hammer-message');

      $hover.hover(
      function() {
      $hover.showInfoMessage('', {
      messageElement: $msg.clone().show(),
      transient: false,
      position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
      dismissable: false,
      relativeToBody: true
      });
      },
      function() {
      StackExchange.helpers.removeMessages();
      }
      );
      });
      });
      Dec 14 '18 at 3:54


      This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.






      marked as duplicate by coldspeed pandas
      Users with the  pandas badge can single-handedly close pandas questions as duplicates and reopen them as needed.

      StackExchange.ready(function() {
      if (StackExchange.options.isMobile) return;

      $('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
      var $hover = $(this).addClass('hover-bound'),
      $msg = $hover.siblings('.dupe-hammer-message');

      $hover.hover(
      function() {
      $hover.showInfoMessage('', {
      messageElement: $msg.clone().show(),
      transient: false,
      position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
      dismissable: false,
      relativeToBody: true
      });
      },
      function() {
      StackExchange.helpers.removeMessages();
      }
      );
      });
      });
      Dec 14 '18 at 3:54


      This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.


























          3 Answers
          3






          active

          oldest

          votes


















          7














          No need groupby using





          Method 1factorize



          pd.factorize(df.A)[0]
          array([0, 0, 0, 1, 1, 2], dtype=int64)
          #df['Aidx']=pd.factorize(df.A)[0]




          Method 2 sklearn



          from sklearn import preprocessing
          le = preprocessing.LabelEncoder()
          le.fit(df.A)
          LabelEncoder()
          le.transform(df.A)
          array([2, 2, 2, 0, 0, 1])




          Method 3 cat.codes



          df.A.astype('category').cat.codes




          Method 4 map + unique



          l=df.A.unique()
          df.A.map(dict(zip(l,range(len(l)))))
          0 0
          1 0
          2 0
          3 1
          4 1
          5 2
          Name: A, dtype: int64




          Method 5 np.unique



          x,y=np.unique(df.A.values,return_inverse=True)
          y
          array([2, 2, 2, 0, 0, 1], dtype=int64)


          EDIT: Some timings with OP's dataframe



          '''



          %timeit pd.factorize(view.Company)[0]

          The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.
          10000 loops, best of 3: 155 µs per loop

          %timeit view.Company.astype('category').cat.codes

          The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
          1000 loops, best of 3: 449 µs per loop

          from itertools import izip

          %timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))

          1000 loops, best of 3: 666 µs per loop

          import numpy as np

          %timeit np.unique(view.Company.values, return_inverse=True)

          The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.
          10000 loops, best of 3: 32.7 µs per loop


          Seems like numpy wins.






          share|improve this answer























          • Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
            – Vaishali
            Dec 14 '18 at 2:11












          • @Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
            – W-B
            Dec 14 '18 at 2:23



















          7














          One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



          df['Aidx'] = df.groupby('A',sort=False).ngroup()
          >>> df
          Index A B Aidx
          0 0 foo 3 0
          1 1 foo 2 0
          2 2 foo 5 0
          3 3 bar 3 1
          4 4 bar 4 1
          5 5 baz 5 2





          share|improve this answer































            5














            One more method of doing so could be.



            df['C'] = i.ne(df.A.shift()).cumsum()-1
            df


            When we print df value it will be as follows.



              Index  A    B  C
            0 0 foo 3 0
            1 1 foo 2 0
            2 2 foo 5 0
            3 3 bar 3 1
            4 4 bar 4 1
            5 5 baz 5 2


            Explanation of solution: Let's break above solution into parts for understanding purposes.



            1st step: Compare df's A column by shifting its value down to itself as follows.



            i.ne(df.A.shift())


            Output we will get is:



            0     True
            1 False
            2 False
            3 True
            4 False
            5 True


            2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



            i.ne(df.A.shift()).cumsum()-1
            0 0
            1 0
            2 0
            3 1
            4 1
            5 2
            Name: A, dtype: int32


            3rd step: Save command's value into df['C'] which will create a new column named C in df.






            share|improve this answer



















            • 1




              Nice method ve++ for you
              – W-B
              Dec 14 '18 at 1:32










            • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
              – RavinderSingh13
              Dec 14 '18 at 1:34




















            3 Answers
            3






            active

            oldest

            votes








            3 Answers
            3






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            7














            No need groupby using





            Method 1factorize



            pd.factorize(df.A)[0]
            array([0, 0, 0, 1, 1, 2], dtype=int64)
            #df['Aidx']=pd.factorize(df.A)[0]




            Method 2 sklearn



            from sklearn import preprocessing
            le = preprocessing.LabelEncoder()
            le.fit(df.A)
            LabelEncoder()
            le.transform(df.A)
            array([2, 2, 2, 0, 0, 1])




            Method 3 cat.codes



            df.A.astype('category').cat.codes




            Method 4 map + unique



            l=df.A.unique()
            df.A.map(dict(zip(l,range(len(l)))))
            0 0
            1 0
            2 0
            3 1
            4 1
            5 2
            Name: A, dtype: int64




            Method 5 np.unique



            x,y=np.unique(df.A.values,return_inverse=True)
            y
            array([2, 2, 2, 0, 0, 1], dtype=int64)


            EDIT: Some timings with OP's dataframe



            '''



            %timeit pd.factorize(view.Company)[0]

            The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 155 µs per loop

            %timeit view.Company.astype('category').cat.codes

            The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
            1000 loops, best of 3: 449 µs per loop

            from itertools import izip

            %timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))

            1000 loops, best of 3: 666 µs per loop

            import numpy as np

            %timeit np.unique(view.Company.values, return_inverse=True)

            The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 32.7 µs per loop


            Seems like numpy wins.






            share|improve this answer























            • Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
              – Vaishali
              Dec 14 '18 at 2:11












            • @Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
              – W-B
              Dec 14 '18 at 2:23
















            7














            No need groupby using





            Method 1factorize



            pd.factorize(df.A)[0]
            array([0, 0, 0, 1, 1, 2], dtype=int64)
            #df['Aidx']=pd.factorize(df.A)[0]




            Method 2 sklearn



            from sklearn import preprocessing
            le = preprocessing.LabelEncoder()
            le.fit(df.A)
            LabelEncoder()
            le.transform(df.A)
            array([2, 2, 2, 0, 0, 1])




            Method 3 cat.codes



            df.A.astype('category').cat.codes




            Method 4 map + unique



            l=df.A.unique()
            df.A.map(dict(zip(l,range(len(l)))))
            0 0
            1 0
            2 0
            3 1
            4 1
            5 2
            Name: A, dtype: int64




            Method 5 np.unique



            x,y=np.unique(df.A.values,return_inverse=True)
            y
            array([2, 2, 2, 0, 0, 1], dtype=int64)


            EDIT: Some timings with OP's dataframe



            '''



            %timeit pd.factorize(view.Company)[0]

            The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 155 µs per loop

            %timeit view.Company.astype('category').cat.codes

            The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
            1000 loops, best of 3: 449 µs per loop

            from itertools import izip

            %timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))

            1000 loops, best of 3: 666 µs per loop

            import numpy as np

            %timeit np.unique(view.Company.values, return_inverse=True)

            The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 32.7 µs per loop


            Seems like numpy wins.






            share|improve this answer























            • Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
              – Vaishali
              Dec 14 '18 at 2:11












            • @Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
              – W-B
              Dec 14 '18 at 2:23














            7












            7








            7






            No need groupby using





            Method 1factorize



            pd.factorize(df.A)[0]
            array([0, 0, 0, 1, 1, 2], dtype=int64)
            #df['Aidx']=pd.factorize(df.A)[0]




            Method 2 sklearn



            from sklearn import preprocessing
            le = preprocessing.LabelEncoder()
            le.fit(df.A)
            LabelEncoder()
            le.transform(df.A)
            array([2, 2, 2, 0, 0, 1])




            Method 3 cat.codes



            df.A.astype('category').cat.codes




            Method 4 map + unique



            l=df.A.unique()
            df.A.map(dict(zip(l,range(len(l)))))
            0 0
            1 0
            2 0
            3 1
            4 1
            5 2
            Name: A, dtype: int64




            Method 5 np.unique



            x,y=np.unique(df.A.values,return_inverse=True)
            y
            array([2, 2, 2, 0, 0, 1], dtype=int64)


            EDIT: Some timings with OP's dataframe



            '''



            %timeit pd.factorize(view.Company)[0]

            The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 155 µs per loop

            %timeit view.Company.astype('category').cat.codes

            The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
            1000 loops, best of 3: 449 µs per loop

            from itertools import izip

            %timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))

            1000 loops, best of 3: 666 µs per loop

            import numpy as np

            %timeit np.unique(view.Company.values, return_inverse=True)

            The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 32.7 µs per loop


            Seems like numpy wins.






            share|improve this answer














            No need groupby using





            Method 1factorize



            pd.factorize(df.A)[0]
            array([0, 0, 0, 1, 1, 2], dtype=int64)
            #df['Aidx']=pd.factorize(df.A)[0]




            Method 2 sklearn



            from sklearn import preprocessing
            le = preprocessing.LabelEncoder()
            le.fit(df.A)
            LabelEncoder()
            le.transform(df.A)
            array([2, 2, 2, 0, 0, 1])




            Method 3 cat.codes



            df.A.astype('category').cat.codes




            Method 4 map + unique



            l=df.A.unique()
            df.A.map(dict(zip(l,range(len(l)))))
            0 0
            1 0
            2 0
            3 1
            4 1
            5 2
            Name: A, dtype: int64




            Method 5 np.unique



            x,y=np.unique(df.A.values,return_inverse=True)
            y
            array([2, 2, 2, 0, 0, 1], dtype=int64)


            EDIT: Some timings with OP's dataframe



            '''



            %timeit pd.factorize(view.Company)[0]

            The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 155 µs per loop

            %timeit view.Company.astype('category').cat.codes

            The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
            1000 loops, best of 3: 449 µs per loop

            from itertools import izip

            %timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))

            1000 loops, best of 3: 666 µs per loop

            import numpy as np

            %timeit np.unique(view.Company.values, return_inverse=True)

            The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.
            10000 loops, best of 3: 32.7 µs per loop


            Seems like numpy wins.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Dec 14 '18 at 17:14









            Lagerbaer

            2,6451124




            2,6451124










            answered Dec 14 '18 at 1:30









            W-BW-B

            103k73164




            103k73164












            • Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
              – Vaishali
              Dec 14 '18 at 2:11












            • @Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
              – W-B
              Dec 14 '18 at 2:23


















            • Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
              – Vaishali
              Dec 14 '18 at 2:11












            • @Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
              – W-B
              Dec 14 '18 at 2:23
















            Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
            – Vaishali
            Dec 14 '18 at 2:11






            Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
            – Vaishali
            Dec 14 '18 at 2:11














            @Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
            – W-B
            Dec 14 '18 at 2:23




            @Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
            – W-B
            Dec 14 '18 at 2:23













            7














            One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



            df['Aidx'] = df.groupby('A',sort=False).ngroup()
            >>> df
            Index A B Aidx
            0 0 foo 3 0
            1 1 foo 2 0
            2 2 foo 5 0
            3 3 bar 3 1
            4 4 bar 4 1
            5 5 baz 5 2





            share|improve this answer




























              7














              One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



              df['Aidx'] = df.groupby('A',sort=False).ngroup()
              >>> df
              Index A B Aidx
              0 0 foo 3 0
              1 1 foo 2 0
              2 2 foo 5 0
              3 3 bar 3 1
              4 4 bar 4 1
              5 5 baz 5 2





              share|improve this answer


























                7












                7








                7






                One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



                df['Aidx'] = df.groupby('A',sort=False).ngroup()
                >>> df
                Index A B Aidx
                0 0 foo 3 0
                1 1 foo 2 0
                2 2 foo 5 0
                3 3 bar 3 1
                4 4 bar 4 1
                5 5 baz 5 2





                share|improve this answer














                One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:



                df['Aidx'] = df.groupby('A',sort=False).ngroup()
                >>> df
                Index A B Aidx
                0 0 foo 3 0
                1 1 foo 2 0
                2 2 foo 5 0
                3 3 bar 3 1
                4 4 bar 4 1
                5 5 baz 5 2






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Dec 14 '18 at 1:49

























                answered Dec 14 '18 at 0:50









                saculsacul

                29.9k41740




                29.9k41740























                    5














                    One more method of doing so could be.



                    df['C'] = i.ne(df.A.shift()).cumsum()-1
                    df


                    When we print df value it will be as follows.



                      Index  A    B  C
                    0 0 foo 3 0
                    1 1 foo 2 0
                    2 2 foo 5 0
                    3 3 bar 3 1
                    4 4 bar 4 1
                    5 5 baz 5 2


                    Explanation of solution: Let's break above solution into parts for understanding purposes.



                    1st step: Compare df's A column by shifting its value down to itself as follows.



                    i.ne(df.A.shift())


                    Output we will get is:



                    0     True
                    1 False
                    2 False
                    3 True
                    4 False
                    5 True


                    2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



                    i.ne(df.A.shift()).cumsum()-1
                    0 0
                    1 0
                    2 0
                    3 1
                    4 1
                    5 2
                    Name: A, dtype: int32


                    3rd step: Save command's value into df['C'] which will create a new column named C in df.






                    share|improve this answer



















                    • 1




                      Nice method ve++ for you
                      – W-B
                      Dec 14 '18 at 1:32










                    • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                      – RavinderSingh13
                      Dec 14 '18 at 1:34


















                    5














                    One more method of doing so could be.



                    df['C'] = i.ne(df.A.shift()).cumsum()-1
                    df


                    When we print df value it will be as follows.



                      Index  A    B  C
                    0 0 foo 3 0
                    1 1 foo 2 0
                    2 2 foo 5 0
                    3 3 bar 3 1
                    4 4 bar 4 1
                    5 5 baz 5 2


                    Explanation of solution: Let's break above solution into parts for understanding purposes.



                    1st step: Compare df's A column by shifting its value down to itself as follows.



                    i.ne(df.A.shift())


                    Output we will get is:



                    0     True
                    1 False
                    2 False
                    3 True
                    4 False
                    5 True


                    2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



                    i.ne(df.A.shift()).cumsum()-1
                    0 0
                    1 0
                    2 0
                    3 1
                    4 1
                    5 2
                    Name: A, dtype: int32


                    3rd step: Save command's value into df['C'] which will create a new column named C in df.






                    share|improve this answer



















                    • 1




                      Nice method ve++ for you
                      – W-B
                      Dec 14 '18 at 1:32










                    • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                      – RavinderSingh13
                      Dec 14 '18 at 1:34
















                    5












                    5








                    5






                    One more method of doing so could be.



                    df['C'] = i.ne(df.A.shift()).cumsum()-1
                    df


                    When we print df value it will be as follows.



                      Index  A    B  C
                    0 0 foo 3 0
                    1 1 foo 2 0
                    2 2 foo 5 0
                    3 3 bar 3 1
                    4 4 bar 4 1
                    5 5 baz 5 2


                    Explanation of solution: Let's break above solution into parts for understanding purposes.



                    1st step: Compare df's A column by shifting its value down to itself as follows.



                    i.ne(df.A.shift())


                    Output we will get is:



                    0     True
                    1 False
                    2 False
                    3 True
                    4 False
                    5 True


                    2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



                    i.ne(df.A.shift()).cumsum()-1
                    0 0
                    1 0
                    2 0
                    3 1
                    4 1
                    5 2
                    Name: A, dtype: int32


                    3rd step: Save command's value into df['C'] which will create a new column named C in df.






                    share|improve this answer














                    One more method of doing so could be.



                    df['C'] = i.ne(df.A.shift()).cumsum()-1
                    df


                    When we print df value it will be as follows.



                      Index  A    B  C
                    0 0 foo 3 0
                    1 1 foo 2 0
                    2 2 foo 5 0
                    3 3 bar 3 1
                    4 4 bar 4 1
                    5 5 baz 5 2


                    Explanation of solution: Let's break above solution into parts for understanding purposes.



                    1st step: Compare df's A column by shifting its value down to itself as follows.



                    i.ne(df.A.shift())


                    Output we will get is:



                    0     True
                    1 False
                    2 False
                    3 True
                    4 False
                    5 True


                    2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.



                    i.ne(df.A.shift()).cumsum()-1
                    0 0
                    1 0
                    2 0
                    3 1
                    4 1
                    5 2
                    Name: A, dtype: int32


                    3rd step: Save command's value into df['C'] which will create a new column named C in df.







                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Dec 14 '18 at 1:45

























                    answered Dec 14 '18 at 1:26









                    RavinderSingh13RavinderSingh13

                    25.7k41438




                    25.7k41438








                    • 1




                      Nice method ve++ for you
                      – W-B
                      Dec 14 '18 at 1:32










                    • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                      – RavinderSingh13
                      Dec 14 '18 at 1:34
















                    • 1




                      Nice method ve++ for you
                      – W-B
                      Dec 14 '18 at 1:32










                    • @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                      – RavinderSingh13
                      Dec 14 '18 at 1:34










                    1




                    1




                    Nice method ve++ for you
                    – W-B
                    Dec 14 '18 at 1:32




                    Nice method ve++ for you
                    – W-B
                    Dec 14 '18 at 1:32












                    @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                    – RavinderSingh13
                    Dec 14 '18 at 1:34






                    @W-B, thank you for encouragement sir, ++ve for your unique style already :)
                    – RavinderSingh13
                    Dec 14 '18 at 1:34





                    Popular posts from this blog

                    Plaza Victoria

                    Puebla de Zaragoza

                    Musa