A pythonic and uFunc-y way to turn pandas column into “increasing” index? [duplicate]

This question already has an answer here:

Count each group sequentially pandas

3 answers

Add numeric column to pandas dataframe based on other textual column

5 answers

Let's say I have a pandas df like so:

Index   A     B

0      foo    3

1      foo    2

2      foo    5

3      bar    3

4      bar    4

5      baz    5

What's a good fast way to add a column like so:

Index   A     B    Aidx

0      foo    3    0

1      foo    2    0

2      foo    5    0

3      bar    3    1

4      bar    4    1

5      baz    5    2

I.e. adding an increasing index for each unique value?

I know I could use df.unique(), then use a dict and enumerate to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby with some special function?

asked Dec 14 '18 at 0:48

Lagerbaer

2,6451124

marked as duplicate by coldspeed pandas
Users with the pandas badge can single-handedly close pandas questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Dec 14 '18 at 3:54

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

add a comment |

This question already has an answer here:

Count each group sequentially pandas

3 answers

Add numeric column to pandas dataframe based on other textual column

5 answers

Let's say I have a pandas df like so:

Index   A     B

0      foo    3

1      foo    2

2      foo    5

3      bar    3

4      bar    4

5      baz    5

What's a good fast way to add a column like so:

Index   A     B    Aidx

0      foo    3    0

1      foo    2    0

2      foo    5    0

3      bar    3    1

4      bar    4    1

5      baz    5    2

I.e. adding an increasing index for each unique value?

asked Dec 14 '18 at 0:48

Lagerbaer

2,6451124

marked as duplicate by coldspeed pandas
Users with the pandas badge can single-handedly close pandas questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Dec 14 '18 at 3:54

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

add a comment |

This question already has an answer here:

Count each group sequentially pandas

3 answers

Add numeric column to pandas dataframe based on other textual column

5 answers

Let's say I have a pandas df like so:

Index   A     B

0      foo    3

1      foo    2

2      foo    5

3      bar    3

4      bar    4

5      baz    5

What's a good fast way to add a column like so:

Index   A     B    Aidx

0      foo    3    0

1      foo    2    0

2      foo    5    0

3      bar    3    1

4      bar    4    1

5      baz    5    2

I.e. adding an increasing index for each unique value?

asked Dec 14 '18 at 0:48

Lagerbaer

2,6451124

This question already has an answer here:

Count each group sequentially pandas

3 answers

Add numeric column to pandas dataframe based on other textual column

5 answers

Let's say I have a pandas df like so:

Index   A     B

0      foo    3

1      foo    2

2      foo    5

3      bar    3

4      bar    4

5      baz    5

What's a good fast way to add a column like so:

Index   A     B    Aidx

0      foo    3    0

1      foo    2    0

2      foo    5    0

3      bar    3    1

4      bar    4    1

5      baz    5    2

I.e. adding an increasing index for each unique value?

This question already has an answer here:

Count each group sequentially pandas

3 answers

Add numeric column to pandas dataframe based on other textual column

5 answers

python pandas

asked Dec 14 '18 at 0:48

Lagerbaer

2,6451124

asked Dec 14 '18 at 0:48

Lagerbaer

2,6451124

asked Dec 14 '18 at 0:48

Lagerbaer

2,6451124

asked Dec 14 '18 at 0:48

Lagerbaer

2,6451124

asked Dec 14 '18 at 0:48

Lagerbaer

2,6451124

marked as duplicate by coldspeed pandas
Users with the pandas badge can single-handedly close pandas questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Dec 14 '18 at 3:54

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

marked as duplicate by coldspeed pandas
Users with the pandas badge can single-handedly close pandas questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Dec 14 '18 at 3:54

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

add a comment |

3 Answers
3

active

oldest

votes

No need groupby using

Method 1factorize

pd.factorize(df.A)[0]

array([0, 0, 0, 1, 1, 2], dtype=int64)

#df['Aidx']=pd.factorize(df.A)[0]

Method 2 sklearn

from sklearn import preprocessing

le = preprocessing.LabelEncoder()

le.fit(df.A)

LabelEncoder()

le.transform(df.A)

array([2, 2, 2, 0, 0, 1])

Method 3 cat.codes

df.A.astype('category').cat.codes

Method 4 map + unique

l=df.A.unique()

df.A.map(dict(zip(l,range(len(l)))))

0    0

1    0

2    0

3    1

4    1

5    2

Name: A, dtype: int64

Method 5 np.unique

x,y=np.unique(df.A.values,return_inverse=True)

y

array([2, 2, 2, 0, 0, 1], dtype=int64)

EDIT: Some timings with OP's dataframe

'''

%timeit pd.factorize(view.Company)[0]



The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.

10000 loops, best of 3: 155 µs per loop



%timeit view.Company.astype('category').cat.codes



The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.

1000 loops, best of 3: 449 µs per loop



from itertools import izip



%timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))



1000 loops, best of 3: 666 µs per loop



import numpy as np



%timeit np.unique(view.Company.values, return_inverse=True)



The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.

10000 loops, best of 3: 32.7 µs per loop

Seems like numpy wins.

edited Dec 14 '18 at 17:14

Lagerbaer

2,6451124

answered Dec 14 '18 at 1:30

W-B

103k73164

Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
– Vaishali
Dec 14 '18 at 2:11

@Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
– W-B
Dec 14 '18 at 2:23

add a comment |

One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:

df['Aidx'] = df.groupby('A',sort=False).ngroup()

>>> df

   Index    A  B  Aidx

0      0  foo  3     0

1      1  foo  2     0

2      2  foo  5     0

3      3  bar  3     1

4      4  bar  4     1

5      5  baz  5     2

edited Dec 14 '18 at 1:49

answered Dec 14 '18 at 0:50

sacul

29.9k41740

add a comment |

One more method of doing so could be.

df['C'] = i.ne(df.A.shift()).cumsum()-1

df

When we print df value it will be as follows.

  Index  A    B  C

0  0     foo  3  0

1  1     foo  2  0 

2  2     foo  5  0 

3  3     bar  3  1 

4  4     bar  4  1 

5  5     baz  5  2

Explanation of solution: Let's break above solution into parts for understanding purposes.

1st step: Compare df's A column by shifting its value down to itself as follows.

i.ne(df.A.shift())

Output we will get is:

0     True

1    False

2    False

3     True

4    False

5     True

2nd step: Use of cumsum() function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum() function and its value will be increased.

i.ne(df.A.shift()).cumsum()-1

0    0

1    0

2    0

3    1

4    1

5    2

Name: A, dtype: int32

3rd step: Save command's value into df['C'] which will create a new column named C in df.

edited Dec 14 '18 at 1:45

answered Dec 14 '18 at 1:26

RavinderSingh13

25.7k41438

1

Nice method ve++ for you
– W-B
Dec 14 '18 at 1:32

@W-B, thank you for encouragement sir, ++ve for your unique style already :)
– RavinderSingh13
Dec 14 '18 at 1:34

add a comment |

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

No need groupby using

Method 1factorize

pd.factorize(df.A)[0]

array([0, 0, 0, 1, 1, 2], dtype=int64)

#df['Aidx']=pd.factorize(df.A)[0]

Method 2 sklearn

from sklearn import preprocessing

le = preprocessing.LabelEncoder()

le.fit(df.A)

LabelEncoder()

le.transform(df.A)

array([2, 2, 2, 0, 0, 1])

Method 3 cat.codes

df.A.astype('category').cat.codes

Method 4 map + unique

l=df.A.unique()

df.A.map(dict(zip(l,range(len(l)))))

0    0

1    0

2    0

3    1

4    1

5    2

Name: A, dtype: int64

Method 5 np.unique

x,y=np.unique(df.A.values,return_inverse=True)

y

array([2, 2, 2, 0, 0, 1], dtype=int64)

EDIT: Some timings with OP's dataframe

'''

%timeit pd.factorize(view.Company)[0]



The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.

10000 loops, best of 3: 155 µs per loop



%timeit view.Company.astype('category').cat.codes



The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.

1000 loops, best of 3: 449 µs per loop



from itertools import izip



%timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))



1000 loops, best of 3: 666 µs per loop



import numpy as np



%timeit np.unique(view.Company.values, return_inverse=True)



The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.

10000 loops, best of 3: 32.7 µs per loop

Seems like numpy wins.

edited Dec 14 '18 at 17:14

Lagerbaer

2,6451124

answered Dec 14 '18 at 1:30

W-B

103k73164

Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
– Vaishali
Dec 14 '18 at 2:11

@Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
– W-B
Dec 14 '18 at 2:23

add a comment |

No need groupby using

Method 1factorize

pd.factorize(df.A)[0]

array([0, 0, 0, 1, 1, 2], dtype=int64)

#df['Aidx']=pd.factorize(df.A)[0]

Method 2 sklearn

from sklearn import preprocessing

le = preprocessing.LabelEncoder()

le.fit(df.A)

LabelEncoder()

le.transform(df.A)

array([2, 2, 2, 0, 0, 1])

Method 3 cat.codes

df.A.astype('category').cat.codes

Method 4 map + unique

l=df.A.unique()

df.A.map(dict(zip(l,range(len(l)))))

0    0

1    0

2    0

3    1

4    1

5    2

Name: A, dtype: int64

Method 5 np.unique

x,y=np.unique(df.A.values,return_inverse=True)

y

array([2, 2, 2, 0, 0, 1], dtype=int64)

EDIT: Some timings with OP's dataframe

'''

%timeit pd.factorize(view.Company)[0]



The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.

10000 loops, best of 3: 155 µs per loop



%timeit view.Company.astype('category').cat.codes



The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.

1000 loops, best of 3: 449 µs per loop



from itertools import izip



%timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))



1000 loops, best of 3: 666 µs per loop



import numpy as np



%timeit np.unique(view.Company.values, return_inverse=True)



The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.

10000 loops, best of 3: 32.7 µs per loop

Seems like numpy wins.

edited Dec 14 '18 at 17:14

Lagerbaer

2,6451124

answered Dec 14 '18 at 1:30

W-B

103k73164

Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
– Vaishali
Dec 14 '18 at 2:11

@Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
– W-B
Dec 14 '18 at 2:23

add a comment |

No need groupby using

Method 1factorize

pd.factorize(df.A)[0]

array([0, 0, 0, 1, 1, 2], dtype=int64)

#df['Aidx']=pd.factorize(df.A)[0]

Method 2 sklearn

from sklearn import preprocessing

le = preprocessing.LabelEncoder()

le.fit(df.A)

LabelEncoder()

le.transform(df.A)

array([2, 2, 2, 0, 0, 1])

Method 3 cat.codes

df.A.astype('category').cat.codes

Method 4 map + unique

l=df.A.unique()

df.A.map(dict(zip(l,range(len(l)))))

0    0

1    0

2    0

3    1

4    1

5    2

Name: A, dtype: int64

Method 5 np.unique

x,y=np.unique(df.A.values,return_inverse=True)

y

array([2, 2, 2, 0, 0, 1], dtype=int64)

EDIT: Some timings with OP's dataframe

'''

%timeit pd.factorize(view.Company)[0]



The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.

10000 loops, best of 3: 155 µs per loop



%timeit view.Company.astype('category').cat.codes



The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.

1000 loops, best of 3: 449 µs per loop



from itertools import izip



%timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))



1000 loops, best of 3: 666 µs per loop



import numpy as np



%timeit np.unique(view.Company.values, return_inverse=True)



The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.

10000 loops, best of 3: 32.7 µs per loop

Seems like numpy wins.

edited Dec 14 '18 at 17:14

Lagerbaer

2,6451124

answered Dec 14 '18 at 1:30

W-B

103k73164

No need groupby using

Method 1factorize

pd.factorize(df.A)[0]

array([0, 0, 0, 1, 1, 2], dtype=int64)

#df['Aidx']=pd.factorize(df.A)[0]

Method 2 sklearn

from sklearn import preprocessing

le = preprocessing.LabelEncoder()

le.fit(df.A)

LabelEncoder()

le.transform(df.A)

array([2, 2, 2, 0, 0, 1])

Method 3 cat.codes

df.A.astype('category').cat.codes

Method 4 map + unique

l=df.A.unique()

df.A.map(dict(zip(l,range(len(l)))))

0    0

1    0

2    0

3    1

4    1

5    2

Name: A, dtype: int64

Method 5 np.unique

x,y=np.unique(df.A.values,return_inverse=True)

y

array([2, 2, 2, 0, 0, 1], dtype=int64)

EDIT: Some timings with OP's dataframe

'''

%timeit pd.factorize(view.Company)[0]



The slowest run took 6.68 times longer than the fastest. This could mean that an intermediate result is being cached.

10000 loops, best of 3: 155 µs per loop



%timeit view.Company.astype('category').cat.codes



The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.

1000 loops, best of 3: 449 µs per loop



from itertools import izip



%timeit l = view.Company.unique(); view.Company.map(dict(izip(l,xrange(len(l)))))



1000 loops, best of 3: 666 µs per loop



import numpy as np



%timeit np.unique(view.Company.values, return_inverse=True)



The slowest run took 8.08 times longer than the fastest. This could mean that an intermediate result is being cached.

10000 loops, best of 3: 32.7 µs per loop

Seems like numpy wins.

edited Dec 14 '18 at 17:14

Lagerbaer

2,6451124

answered Dec 14 '18 at 1:30

W-B

103k73164

edited Dec 14 '18 at 17:14

Lagerbaer

2,6451124

edited Dec 14 '18 at 17:14

Lagerbaer

2,6451124

edited Dec 14 '18 at 17:14

Lagerbaer

2,6451124

answered Dec 14 '18 at 1:30

W-B

103k73164

answered Dec 14 '18 at 1:30

W-B

103k73164

answered Dec 14 '18 at 1:30

W-B

103k73164

Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
– Vaishali
Dec 14 '18 at 2:11

@Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
– W-B
Dec 14 '18 at 2:23

add a comment |

Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
– Vaishali
Dec 14 '18 at 2:11

@Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
– W-B
Dec 14 '18 at 2:23

Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
– Vaishali
Dec 14 '18 at 2:11

@Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
– W-B
Dec 14 '18 at 2:23

add a comment |

One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:

df['Aidx'] = df.groupby('A',sort=False).ngroup()

>>> df

   Index    A  B  Aidx

0      0  foo  3     0

1      1  foo  2     0

2      2  foo  5     0

3      3  bar  3     1

4      4  bar  4     1

5      5  baz  5     2

edited Dec 14 '18 at 1:49

answered Dec 14 '18 at 0:50

sacul

29.9k41740

add a comment |

One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:

df['Aidx'] = df.groupby('A',sort=False).ngroup()

>>> df

   Index    A  B  Aidx

0      0  foo  3     0

1      1  foo  2     0

2      2  foo  5     0

3      3  bar  3     1

4      4  bar  4     1

5      5  baz  5     2

edited Dec 14 '18 at 1:49

answered Dec 14 '18 at 0:50

sacul

29.9k41740

add a comment |

One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:

df['Aidx'] = df.groupby('A',sort=False).ngroup()

>>> df

   Index    A  B  Aidx

0      0  foo  3     0

1      1  foo  2     0

2      2  foo  5     0

3      3  bar  3     1

4      4  bar  4     1

5      5  baz  5     2

edited Dec 14 '18 at 1:49

answered Dec 14 '18 at 0:50

sacul

29.9k41740

One way is to use ngroup. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False:

df['Aidx'] = df.groupby('A',sort=False).ngroup()

>>> df

   Index    A  B  Aidx

0      0  foo  3     0

1      1  foo  2     0

2      2  foo  5     0

3      3  bar  3     1

4      4  bar  4     1

5      5  baz  5     2

edited Dec 14 '18 at 1:49

answered Dec 14 '18 at 0:50

sacul

29.9k41740

edited Dec 14 '18 at 1:49

answered Dec 14 '18 at 0:50

sacul

29.9k41740

answered Dec 14 '18 at 0:50

sacul

29.9k41740

answered Dec 14 '18 at 0:50

sacul

29.9k41740

add a comment |

One more method of doing so could be.

df['C'] = i.ne(df.A.shift()).cumsum()-1

df

When we print df value it will be as follows.

  Index  A    B  C

0  0     foo  3  0

1  1     foo  2  0 

2  2     foo  5  0 

3  3     bar  3  1 

4  4     bar  4  1 

5  5     baz  5  2

Explanation of solution: Let's break above solution into parts for understanding purposes.

1st step: Compare df's A column by shifting its value down to itself as follows.

i.ne(df.A.shift())

Output we will get is:

0     True

1    False

2    False

3     True

4    False

5     True

i.ne(df.A.shift()).cumsum()-1

0    0

1    0

2    0

3    1

4    1

5    2

Name: A, dtype: int32

3rd step: Save command's value into df['C'] which will create a new column named C in df.

edited Dec 14 '18 at 1:45

answered Dec 14 '18 at 1:26

RavinderSingh13

25.7k41438

1

Nice method ve++ for you
– W-B
Dec 14 '18 at 1:32

@W-B, thank you for encouragement sir, ++ve for your unique style already :)
– RavinderSingh13
Dec 14 '18 at 1:34

add a comment |

One more method of doing so could be.

df['C'] = i.ne(df.A.shift()).cumsum()-1

df

When we print df value it will be as follows.

  Index  A    B  C

0  0     foo  3  0

1  1     foo  2  0 

2  2     foo  5  0 

3  3     bar  3  1 

4  4     bar  4  1 

5  5     baz  5  2

Explanation of solution: Let's break above solution into parts for understanding purposes.

1st step: Compare df's A column by shifting its value down to itself as follows.

i.ne(df.A.shift())

Output we will get is:

0     True

1    False

2    False

3     True

4    False

5     True

i.ne(df.A.shift()).cumsum()-1

0    0

1    0

2    0

3    1

4    1

5    2

Name: A, dtype: int32

3rd step: Save command's value into df['C'] which will create a new column named C in df.

edited Dec 14 '18 at 1:45

answered Dec 14 '18 at 1:26

RavinderSingh13

25.7k41438

1

Nice method ve++ for you
– W-B
Dec 14 '18 at 1:32

@W-B, thank you for encouragement sir, ++ve for your unique style already :)
– RavinderSingh13
Dec 14 '18 at 1:34

add a comment |

One more method of doing so could be.

df['C'] = i.ne(df.A.shift()).cumsum()-1

df

When we print df value it will be as follows.

  Index  A    B  C

0  0     foo  3  0

1  1     foo  2  0 

2  2     foo  5  0 

3  3     bar  3  1 

4  4     bar  4  1 

5  5     baz  5  2

Explanation of solution: Let's break above solution into parts for understanding purposes.

1st step: Compare df's A column by shifting its value down to itself as follows.

i.ne(df.A.shift())

Output we will get is:

0     True

1    False

2    False

3     True

4    False

5     True

i.ne(df.A.shift()).cumsum()-1

0    0

1    0

2    0

3    1

4    1

5    2

Name: A, dtype: int32

3rd step: Save command's value into df['C'] which will create a new column named C in df.

edited Dec 14 '18 at 1:45

answered Dec 14 '18 at 1:26

RavinderSingh13

25.7k41438

One more method of doing so could be.

df['C'] = i.ne(df.A.shift()).cumsum()-1

df

When we print df value it will be as follows.

  Index  A    B  C

0  0     foo  3  0

1  1     foo  2  0 

2  2     foo  5  0 

3  3     bar  3  1 

4  4     bar  4  1 

5  5     baz  5  2

Explanation of solution: Let's break above solution into parts for understanding purposes.

1st step: Compare df's A column by shifting its value down to itself as follows.

i.ne(df.A.shift())

Output we will get is:

0     True

1    False

2    False

3     True

4    False

5     True

i.ne(df.A.shift()).cumsum()-1

0    0

1    0

2    0

3    1

4    1

5    2

Name: A, dtype: int32

3rd step: Save command's value into df['C'] which will create a new column named C in df.

edited Dec 14 '18 at 1:45

answered Dec 14 '18 at 1:26

RavinderSingh13

25.7k41438

edited Dec 14 '18 at 1:45

answered Dec 14 '18 at 1:26

RavinderSingh13

25.7k41438

answered Dec 14 '18 at 1:26

RavinderSingh13

25.7k41438

answered Dec 14 '18 at 1:26

RavinderSingh13

25.7k41438

1

Nice method ve++ for you
– W-B
Dec 14 '18 at 1:32

@W-B, thank you for encouragement sir, ++ve for your unique style already :)
– RavinderSingh13
Dec 14 '18 at 1:34

add a comment |

1

Nice method ve++ for you
– W-B
Dec 14 '18 at 1:32

@W-B, thank you for encouragement sir, ++ve for your unique style already :)
– RavinderSingh13
Dec 14 '18 at 1:34

Nice method ve++ for you
– W-B
Dec 14 '18 at 1:32

@W-B, thank you for encouragement sir, ++ve for your unique style already :)
– RavinderSingh13
Dec 14 '18 at 1:34

add a comment |

This page is only for reference, If you need detailed information, please check here

PBLLk1cMA nf3XYmg YFzoG30ToyvUylGBI4y T2 MTjI

搜尋此網誌

Csdrhrt