How to calculate the average of the most recent three non-nan value using Python
I have a dataframe df
looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.
name day1 day2 day3 day4 day5 day6 day7
A 1 1 nan 2 3 0 3
B nan nan nan nan nan nan 3
C 1 1 0 1 1 1 1
D 1 1 0 1 nan 1 4
The expect output should looks like the following
name day1 day2 day3 day4 day5 day6 day7 expected
A 1 1 nan 2 3 0 3 2 <- 1/3*(day5 + day6 + day7)
B nan nan nan nan nan nan 3 nan <- less than 3 non-missing
C 1 1 0 1 1 1 1 1 <- 1/3*(day5 + day6 + day7)
D 1 1 0 1 nan 1 4 2 <- 1/3 *(day4 + day6 + day7)
I know how to calculate the average of the last three column and count how many non-missing observation are there.
df.iloc[:, 5:7].count(axis=1) average of the last three column
df.iloc[:, 5:7].count(axis=1) number of non-nan in the last three column
If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3
.
But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?
python pandas numpy
add a comment |
I have a dataframe df
looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.
name day1 day2 day3 day4 day5 day6 day7
A 1 1 nan 2 3 0 3
B nan nan nan nan nan nan 3
C 1 1 0 1 1 1 1
D 1 1 0 1 nan 1 4
The expect output should looks like the following
name day1 day2 day3 day4 day5 day6 day7 expected
A 1 1 nan 2 3 0 3 2 <- 1/3*(day5 + day6 + day7)
B nan nan nan nan nan nan 3 nan <- less than 3 non-missing
C 1 1 0 1 1 1 1 1 <- 1/3*(day5 + day6 + day7)
D 1 1 0 1 nan 1 4 2 <- 1/3 *(day4 + day6 + day7)
I know how to calculate the average of the last three column and count how many non-missing observation are there.
df.iloc[:, 5:7].count(axis=1) average of the last three column
df.iloc[:, 5:7].count(axis=1) number of non-nan in the last three column
If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3
.
But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?
python pandas numpy
add a comment |
I have a dataframe df
looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.
name day1 day2 day3 day4 day5 day6 day7
A 1 1 nan 2 3 0 3
B nan nan nan nan nan nan 3
C 1 1 0 1 1 1 1
D 1 1 0 1 nan 1 4
The expect output should looks like the following
name day1 day2 day3 day4 day5 day6 day7 expected
A 1 1 nan 2 3 0 3 2 <- 1/3*(day5 + day6 + day7)
B nan nan nan nan nan nan 3 nan <- less than 3 non-missing
C 1 1 0 1 1 1 1 1 <- 1/3*(day5 + day6 + day7)
D 1 1 0 1 nan 1 4 2 <- 1/3 *(day4 + day6 + day7)
I know how to calculate the average of the last three column and count how many non-missing observation are there.
df.iloc[:, 5:7].count(axis=1) average of the last three column
df.iloc[:, 5:7].count(axis=1) number of non-nan in the last three column
If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3
.
But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?
python pandas numpy
I have a dataframe df
looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.
name day1 day2 day3 day4 day5 day6 day7
A 1 1 nan 2 3 0 3
B nan nan nan nan nan nan 3
C 1 1 0 1 1 1 1
D 1 1 0 1 nan 1 4
The expect output should looks like the following
name day1 day2 day3 day4 day5 day6 day7 expected
A 1 1 nan 2 3 0 3 2 <- 1/3*(day5 + day6 + day7)
B nan nan nan nan nan nan 3 nan <- less than 3 non-missing
C 1 1 0 1 1 1 1 1 <- 1/3*(day5 + day6 + day7)
D 1 1 0 1 nan 1 4 2 <- 1/3 *(day4 + day6 + day7)
I know how to calculate the average of the last three column and count how many non-missing observation are there.
df.iloc[:, 5:7].count(axis=1) average of the last three column
df.iloc[:, 5:7].count(axis=1) number of non-nan in the last three column
If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3
.
But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?
python pandas numpy
python pandas numpy
edited Dec 26 '18 at 20:56
fly36
asked Dec 26 '18 at 20:44
fly36fly36
1911112
1911112
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
Vectorized one using justify
-
N = 3 # last N entries for averaging
avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
df['expected'] = avg
2
I know I will seejustify
here :-)
– W-B
Dec 26 '18 at 21:00
add a comment |
You can use pd.DataFrame.apply
with a custom function. This is only partially vectorised.
def mean_calculator(row):
non_nulls = row.notnull()
if non_nulls.sum() < 3:
return np.nan
return row[non_nulls].values[-3:].mean()
df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)
print(df)
name day1 day2 day3 day4 day5 day6 day7 expected
0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
1 B NaN NaN NaN NaN NaN NaN 3 NaN
2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0
add a comment |
You can start by calculating the expected
column using applying the following function:
expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)
And insert these values in the columns that have at least 3
valid values:
m = df.isnull().sum(axis=1) > 3
df.loc[~m,'expected'] = expected.mask(m)
day1 day2 day3 day4 day5 day6 day7 expected
name
A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
B NaN NaN NaN NaN NaN NaN 3 NaN
C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53936985%2fhow-to-calculate-the-average-of-the-most-recent-three-non-nan-value-using-python%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Vectorized one using justify
-
N = 3 # last N entries for averaging
avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
df['expected'] = avg
2
I know I will seejustify
here :-)
– W-B
Dec 26 '18 at 21:00
add a comment |
Vectorized one using justify
-
N = 3 # last N entries for averaging
avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
df['expected'] = avg
2
I know I will seejustify
here :-)
– W-B
Dec 26 '18 at 21:00
add a comment |
Vectorized one using justify
-
N = 3 # last N entries for averaging
avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
df['expected'] = avg
Vectorized one using justify
-
N = 3 # last N entries for averaging
avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
df['expected'] = avg
edited Dec 26 '18 at 21:02
answered Dec 26 '18 at 20:58
DivakarDivakar
155k1484174
155k1484174
2
I know I will seejustify
here :-)
– W-B
Dec 26 '18 at 21:00
add a comment |
2
I know I will seejustify
here :-)
– W-B
Dec 26 '18 at 21:00
2
2
I know I will see
justify
here :-)– W-B
Dec 26 '18 at 21:00
I know I will see
justify
here :-)– W-B
Dec 26 '18 at 21:00
add a comment |
You can use pd.DataFrame.apply
with a custom function. This is only partially vectorised.
def mean_calculator(row):
non_nulls = row.notnull()
if non_nulls.sum() < 3:
return np.nan
return row[non_nulls].values[-3:].mean()
df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)
print(df)
name day1 day2 day3 day4 day5 day6 day7 expected
0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
1 B NaN NaN NaN NaN NaN NaN 3 NaN
2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0
add a comment |
You can use pd.DataFrame.apply
with a custom function. This is only partially vectorised.
def mean_calculator(row):
non_nulls = row.notnull()
if non_nulls.sum() < 3:
return np.nan
return row[non_nulls].values[-3:].mean()
df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)
print(df)
name day1 day2 day3 day4 day5 day6 day7 expected
0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
1 B NaN NaN NaN NaN NaN NaN 3 NaN
2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0
add a comment |
You can use pd.DataFrame.apply
with a custom function. This is only partially vectorised.
def mean_calculator(row):
non_nulls = row.notnull()
if non_nulls.sum() < 3:
return np.nan
return row[non_nulls].values[-3:].mean()
df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)
print(df)
name day1 day2 day3 day4 day5 day6 day7 expected
0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
1 B NaN NaN NaN NaN NaN NaN 3 NaN
2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0
You can use pd.DataFrame.apply
with a custom function. This is only partially vectorised.
def mean_calculator(row):
non_nulls = row.notnull()
if non_nulls.sum() < 3:
return np.nan
return row[non_nulls].values[-3:].mean()
df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)
print(df)
name day1 day2 day3 day4 day5 day6 day7 expected
0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
1 B NaN NaN NaN NaN NaN NaN 3 NaN
2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0
answered Dec 26 '18 at 20:56
jppjpp
97.4k2159109
97.4k2159109
add a comment |
add a comment |
You can start by calculating the expected
column using applying the following function:
expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)
And insert these values in the columns that have at least 3
valid values:
m = df.isnull().sum(axis=1) > 3
df.loc[~m,'expected'] = expected.mask(m)
day1 day2 day3 day4 day5 day6 day7 expected
name
A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
B NaN NaN NaN NaN NaN NaN 3 NaN
C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0
add a comment |
You can start by calculating the expected
column using applying the following function:
expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)
And insert these values in the columns that have at least 3
valid values:
m = df.isnull().sum(axis=1) > 3
df.loc[~m,'expected'] = expected.mask(m)
day1 day2 day3 day4 day5 day6 day7 expected
name
A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
B NaN NaN NaN NaN NaN NaN 3 NaN
C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0
add a comment |
You can start by calculating the expected
column using applying the following function:
expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)
And insert these values in the columns that have at least 3
valid values:
m = df.isnull().sum(axis=1) > 3
df.loc[~m,'expected'] = expected.mask(m)
day1 day2 day3 day4 day5 day6 day7 expected
name
A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
B NaN NaN NaN NaN NaN NaN 3 NaN
C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0
You can start by calculating the expected
column using applying the following function:
expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)
And insert these values in the columns that have at least 3
valid values:
m = df.isnull().sum(axis=1) > 3
df.loc[~m,'expected'] = expected.mask(m)
day1 day2 day3 day4 day5 day6 day7 expected
name
A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
B NaN NaN NaN NaN NaN NaN 3 NaN
C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0
edited Dec 26 '18 at 21:29
answered Dec 26 '18 at 20:58
yatuyatu
6,6211826
6,6211826
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53936985%2fhow-to-calculate-the-average-of-the-most-recent-three-non-nan-value-using-python%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown