How to calculate the average of the most recent three non-nan value using Python

I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.

name day1 day2 day3 day4  day5 day6 day7

A    1     1   nan   2    3    0   3

B    nan   nan nan   nan  nan  nan 3

C    1     1   0     1    1    1   1

D    1     1   0     1    nan  1   4

The expect output should looks like the following

name day1 day2 day3 day4  day5 day6 day7    expected 

A    1     1   nan   2    3    0   3        2     <-  1/3*(day5 + day6 + day7)

B    nan   nan nan   nan  nan  nan 3        nan   <-  less than 3 non-missing

C    1     1   0     1    1    1   1        1     <-  1/3*(day5 + day6 + day7)

D    1     1   0     1    nan  1   4        2    <-  1/3 *(day4 + day6 + day7)

I know how to calculate the average of the last three column and count how many non-missing observation are there.
df.iloc[:, 5:7].count(axis=1) average of the last three column
df.iloc[:, 5:7].count(axis=1) number of non-nan in the last three column

If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.

But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?

edited Dec 26 '18 at 20:56

asked Dec 26 '18 at 20:44

fly36

1911112

add a comment |

I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.

name day1 day2 day3 day4  day5 day6 day7

A    1     1   nan   2    3    0   3

B    nan   nan nan   nan  nan  nan 3

C    1     1   0     1    1    1   1

D    1     1   0     1    nan  1   4

The expect output should looks like the following

name day1 day2 day3 day4  day5 day6 day7    expected 

A    1     1   nan   2    3    0   3        2     <-  1/3*(day5 + day6 + day7)

B    nan   nan nan   nan  nan  nan 3        nan   <-  less than 3 non-missing

C    1     1   0     1    1    1   1        1     <-  1/3*(day5 + day6 + day7)

D    1     1   0     1    nan  1   4        2    <-  1/3 *(day4 + day6 + day7)

If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.

But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?

edited Dec 26 '18 at 20:56

asked Dec 26 '18 at 20:44

fly36

1911112

add a comment |

I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.

name day1 day2 day3 day4  day5 day6 day7

A    1     1   nan   2    3    0   3

B    nan   nan nan   nan  nan  nan 3

C    1     1   0     1    1    1   1

D    1     1   0     1    nan  1   4

The expect output should looks like the following

name day1 day2 day3 day4  day5 day6 day7    expected 

A    1     1   nan   2    3    0   3        2     <-  1/3*(day5 + day6 + day7)

B    nan   nan nan   nan  nan  nan 3        nan   <-  less than 3 non-missing

C    1     1   0     1    1    1   1        1     <-  1/3*(day5 + day6 + day7)

D    1     1   0     1    nan  1   4        2    <-  1/3 *(day4 + day6 + day7)

If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.

But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?

edited Dec 26 '18 at 20:56

asked Dec 26 '18 at 20:44

fly36

1911112

I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.

name day1 day2 day3 day4  day5 day6 day7

A    1     1   nan   2    3    0   3

B    nan   nan nan   nan  nan  nan 3

C    1     1   0     1    1    1   1

D    1     1   0     1    nan  1   4

The expect output should looks like the following

name day1 day2 day3 day4  day5 day6 day7    expected 

A    1     1   nan   2    3    0   3        2     <-  1/3*(day5 + day6 + day7)

B    nan   nan nan   nan  nan  nan 3        nan   <-  less than 3 non-missing

C    1     1   0     1    1    1   1        1     <-  1/3*(day5 + day6 + day7)

D    1     1   0     1    nan  1   4        2    <-  1/3 *(day4 + day6 + day7)

If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.

But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?

python pandas numpy

edited Dec 26 '18 at 20:56

asked Dec 26 '18 at 20:44

fly36

1911112

edited Dec 26 '18 at 20:56

asked Dec 26 '18 at 20:44

fly36

1911112

edited Dec 26 '18 at 20:56

asked Dec 26 '18 at 20:44

fly36

1911112

asked Dec 26 '18 at 20:44

fly36

1911112

asked Dec 26 '18 at 20:44

fly36

1911112

add a comment |

3 Answers
3

active

oldest

votes

Vectorized one using justify -

N = 3 # last N entries for averaging

avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)

df['expected'] = avg

edited Dec 26 '18 at 21:02

answered Dec 26 '18 at 20:58

Divakar

155k1484174

2

I know I will see justify here :-)

– W-B
Dec 26 '18 at 21:00

add a comment |

You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.

def mean_calculator(row):

    non_nulls = row.notnull()

    if non_nulls.sum() < 3:

        return np.nan

    return row[non_nulls].values[-3:].mean()



df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)



print(df)



  name  day1  day2  day3  day4  day5  day6  day7  expected

0    A   1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

1    B   NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

2    C   1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

3    D   1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

answered Dec 26 '18 at 20:56

jpp

97.4k2159109

add a comment |

You can start by calculating the expected column using applying the following function:

expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)

And insert these values in the columns that have at least 3 valid values:

m = df.isnull().sum(axis=1) > 3

df.loc[~m,'expected'] = expected.mask(m)



       day1  day2  day3  day4  day5  day6  day7  expected

name                                                    

A      1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

B      NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

C      1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

D      1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

edited Dec 26 '18 at 21:29

answered Dec 26 '18 at 20:58

yatu

6,6211826

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53936985%2fhow-to-calculate-the-average-of-the-most-recent-three-non-nan-value-using-python%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Vectorized one using justify -

N = 3 # last N entries for averaging

avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)

df['expected'] = avg

edited Dec 26 '18 at 21:02

answered Dec 26 '18 at 20:58

Divakar

155k1484174

2

I know I will see justify here :-)

– W-B
Dec 26 '18 at 21:00

add a comment |

Vectorized one using justify -

N = 3 # last N entries for averaging

avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)

df['expected'] = avg

edited Dec 26 '18 at 21:02

answered Dec 26 '18 at 20:58

Divakar

155k1484174

2

I know I will see justify here :-)

– W-B
Dec 26 '18 at 21:00

add a comment |

Vectorized one using justify -

N = 3 # last N entries for averaging

avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)

df['expected'] = avg

edited Dec 26 '18 at 21:02

answered Dec 26 '18 at 20:58

Divakar

155k1484174

Vectorized one using justify -

N = 3 # last N entries for averaging

avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)

df['expected'] = avg

edited Dec 26 '18 at 21:02

answered Dec 26 '18 at 20:58

Divakar

155k1484174

edited Dec 26 '18 at 21:02

answered Dec 26 '18 at 20:58

Divakar

155k1484174

answered Dec 26 '18 at 20:58

Divakar

155k1484174

answered Dec 26 '18 at 20:58

Divakar

155k1484174

2

I know I will see justify here :-)

– W-B
Dec 26 '18 at 21:00

add a comment |

2

I know I will see justify here :-)

– W-B
Dec 26 '18 at 21:00

I know I will see justify here :-)

– W-B
Dec 26 '18 at 21:00

add a comment |

You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.

def mean_calculator(row):

    non_nulls = row.notnull()

    if non_nulls.sum() < 3:

        return np.nan

    return row[non_nulls].values[-3:].mean()



df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)



print(df)



  name  day1  day2  day3  day4  day5  day6  day7  expected

0    A   1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

1    B   NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

2    C   1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

3    D   1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

answered Dec 26 '18 at 20:56

jpp

97.4k2159109

add a comment |

You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.

def mean_calculator(row):

    non_nulls = row.notnull()

    if non_nulls.sum() < 3:

        return np.nan

    return row[non_nulls].values[-3:].mean()



df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)



print(df)



  name  day1  day2  day3  day4  day5  day6  day7  expected

0    A   1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

1    B   NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

2    C   1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

3    D   1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

answered Dec 26 '18 at 20:56

jpp

97.4k2159109

add a comment |

You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.

def mean_calculator(row):

    non_nulls = row.notnull()

    if non_nulls.sum() < 3:

        return np.nan

    return row[non_nulls].values[-3:].mean()



df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)



print(df)



  name  day1  day2  day3  day4  day5  day6  day7  expected

0    A   1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

1    B   NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

2    C   1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

3    D   1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

answered Dec 26 '18 at 20:56

jpp

97.4k2159109

You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.

def mean_calculator(row):

    non_nulls = row.notnull()

    if non_nulls.sum() < 3:

        return np.nan

    return row[non_nulls].values[-3:].mean()



df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)



print(df)



  name  day1  day2  day3  day4  day5  day6  day7  expected

0    A   1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

1    B   NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

2    C   1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

3    D   1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

answered Dec 26 '18 at 20:56

jpp

97.4k2159109

answered Dec 26 '18 at 20:56

jpp

97.4k2159109

answered Dec 26 '18 at 20:56

jpp

97.4k2159109

answered Dec 26 '18 at 20:56

jpp

97.4k2159109

add a comment |

You can start by calculating the expected column using applying the following function:

expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)

And insert these values in the columns that have at least 3 valid values:

m = df.isnull().sum(axis=1) > 3

df.loc[~m,'expected'] = expected.mask(m)



       day1  day2  day3  day4  day5  day6  day7  expected

name                                                    

A      1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

B      NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

C      1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

D      1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

edited Dec 26 '18 at 21:29

answered Dec 26 '18 at 20:58

yatu

6,6211826

add a comment |

You can start by calculating the expected column using applying the following function:

expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)

And insert these values in the columns that have at least 3 valid values:

m = df.isnull().sum(axis=1) > 3

df.loc[~m,'expected'] = expected.mask(m)



       day1  day2  day3  day4  day5  day6  day7  expected

name                                                    

A      1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

B      NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

C      1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

D      1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

edited Dec 26 '18 at 21:29

answered Dec 26 '18 at 20:58

yatu

6,6211826

add a comment |

You can start by calculating the expected column using applying the following function:

expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)

And insert these values in the columns that have at least 3 valid values:

m = df.isnull().sum(axis=1) > 3

df.loc[~m,'expected'] = expected.mask(m)



       day1  day2  day3  day4  day5  day6  day7  expected

name                                                    

A      1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

B      NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

C      1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

D      1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

edited Dec 26 '18 at 21:29

answered Dec 26 '18 at 20:58

yatu

6,6211826

You can start by calculating the expected column using applying the following function:

expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)

And insert these values in the columns that have at least 3 valid values:

m = df.isnull().sum(axis=1) > 3

df.loc[~m,'expected'] = expected.mask(m)



       day1  day2  day3  day4  day5  day6  day7  expected

name                                                    

A      1.0   1.0   NaN   2.0   3.0   0.0     3       2.0

B      NaN   NaN   NaN   NaN   NaN   NaN     3       NaN

C      1.0   1.0   0.0   1.0   1.0   1.0     1       1.0

D      1.0   1.0   0.0   1.0   NaN   1.0     4       2.0

edited Dec 26 '18 at 21:29

answered Dec 26 '18 at 20:58

yatu

6,6211826

edited Dec 26 '18 at 21:29

answered Dec 26 '18 at 20:58

yatu

6,6211826

answered Dec 26 '18 at 20:58

yatu

6,6211826

answered Dec 26 '18 at 20:58

yatu

6,6211826

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Csdrhrt