How to calculate the average of the most recent three non-nan value using Python












7















I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.



name day1 day2 day3 day4  day5 day6 day7
A 1 1 nan 2 3 0 3
B nan nan nan nan nan nan 3
C 1 1 0 1 1 1 1
D 1 1 0 1 nan 1 4


The expect output should looks like the following



name day1 day2 day3 day4  day5 day6 day7    expected 
A 1 1 nan 2 3 0 3 2 <- 1/3*(day5 + day6 + day7)
B nan nan nan nan nan nan 3 nan <- less than 3 non-missing
C 1 1 0 1 1 1 1 1 <- 1/3*(day5 + day6 + day7)
D 1 1 0 1 nan 1 4 2 <- 1/3 *(day4 + day6 + day7)


I know how to calculate the average of the last three column and count how many non-missing observation are there.
df.iloc[:, 5:7].count(axis=1) average of the last three column
df.iloc[:, 5:7].count(axis=1) number of non-nan in the last three column



If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.



But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?










share|improve this question





























    7















    I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.



    name day1 day2 day3 day4  day5 day6 day7
    A 1 1 nan 2 3 0 3
    B nan nan nan nan nan nan 3
    C 1 1 0 1 1 1 1
    D 1 1 0 1 nan 1 4


    The expect output should looks like the following



    name day1 day2 day3 day4  day5 day6 day7    expected 
    A 1 1 nan 2 3 0 3 2 <- 1/3*(day5 + day6 + day7)
    B nan nan nan nan nan nan 3 nan <- less than 3 non-missing
    C 1 1 0 1 1 1 1 1 <- 1/3*(day5 + day6 + day7)
    D 1 1 0 1 nan 1 4 2 <- 1/3 *(day4 + day6 + day7)


    I know how to calculate the average of the last three column and count how many non-missing observation are there.
    df.iloc[:, 5:7].count(axis=1) average of the last three column
    df.iloc[:, 5:7].count(axis=1) number of non-nan in the last three column



    If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.



    But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?










    share|improve this question



























      7












      7








      7


      2






      I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.



      name day1 day2 day3 day4  day5 day6 day7
      A 1 1 nan 2 3 0 3
      B nan nan nan nan nan nan 3
      C 1 1 0 1 1 1 1
      D 1 1 0 1 nan 1 4


      The expect output should looks like the following



      name day1 day2 day3 day4  day5 day6 day7    expected 
      A 1 1 nan 2 3 0 3 2 <- 1/3*(day5 + day6 + day7)
      B nan nan nan nan nan nan 3 nan <- less than 3 non-missing
      C 1 1 0 1 1 1 1 1 <- 1/3*(day5 + day6 + day7)
      D 1 1 0 1 nan 1 4 2 <- 1/3 *(day4 + day6 + day7)


      I know how to calculate the average of the last three column and count how many non-missing observation are there.
      df.iloc[:, 5:7].count(axis=1) average of the last three column
      df.iloc[:, 5:7].count(axis=1) number of non-nan in the last three column



      If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.



      But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?










      share|improve this question
















      I have a dataframe df looks like the following. I want to calculate the average of the last 3 non nan columns. If there are less than three non-missing columns then the average number is missing.



      name day1 day2 day3 day4  day5 day6 day7
      A 1 1 nan 2 3 0 3
      B nan nan nan nan nan nan 3
      C 1 1 0 1 1 1 1
      D 1 1 0 1 nan 1 4


      The expect output should looks like the following



      name day1 day2 day3 day4  day5 day6 day7    expected 
      A 1 1 nan 2 3 0 3 2 <- 1/3*(day5 + day6 + day7)
      B nan nan nan nan nan nan 3 nan <- less than 3 non-missing
      C 1 1 0 1 1 1 1 1 <- 1/3*(day5 + day6 + day7)
      D 1 1 0 1 nan 1 4 2 <- 1/3 *(day4 + day6 + day7)


      I know how to calculate the average of the last three column and count how many non-missing observation are there.
      df.iloc[:, 5:7].count(axis=1) average of the last three column
      df.iloc[:, 5:7].count(axis=1) number of non-nan in the last three column



      If there are less than 3 non-missing observation, I know how to set the average value to missing using df.iloc[:, 1:7].count(axis=1) <= 3.



      But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?







      python pandas numpy






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Dec 26 '18 at 20:56







      fly36

















      asked Dec 26 '18 at 20:44









      fly36fly36

      1911112




      1911112
























          3 Answers
          3






          active

          oldest

          votes


















          5














          Vectorized one using justify -



          N = 3 # last N entries for averaging
          avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
          df['expected'] = avg





          share|improve this answer





















          • 2





            I know I will see justify here :-)

            – W-B
            Dec 26 '18 at 21:00



















          2














          You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.



          def mean_calculator(row):
          non_nulls = row.notnull()
          if non_nulls.sum() < 3:
          return np.nan
          return row[non_nulls].values[-3:].mean()

          df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)

          print(df)

          name day1 day2 day3 day4 day5 day6 day7 expected
          0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
          1 B NaN NaN NaN NaN NaN NaN 3 NaN
          2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
          3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





          share|improve this answer































            1














            You can start by calculating the expected column using applying the following function:



            expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)


            And insert these values in the columns that have at least 3 valid values:



            m = df.isnull().sum(axis=1) > 3
            df.loc[~m,'expected'] = expected.mask(m)

            day1 day2 day3 day4 day5 day6 day7 expected
            name
            A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
            B NaN NaN NaN NaN NaN NaN 3 NaN
            C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
            D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





            share|improve this answer

























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53936985%2fhow-to-calculate-the-average-of-the-most-recent-three-non-nan-value-using-python%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              5














              Vectorized one using justify -



              N = 3 # last N entries for averaging
              avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
              df['expected'] = avg





              share|improve this answer





















              • 2





                I know I will see justify here :-)

                – W-B
                Dec 26 '18 at 21:00
















              5














              Vectorized one using justify -



              N = 3 # last N entries for averaging
              avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
              df['expected'] = avg





              share|improve this answer





















              • 2





                I know I will see justify here :-)

                – W-B
                Dec 26 '18 at 21:00














              5












              5








              5







              Vectorized one using justify -



              N = 3 # last N entries for averaging
              avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
              df['expected'] = avg





              share|improve this answer















              Vectorized one using justify -



              N = 3 # last N entries for averaging
              avg = np.mean(justify(df.values,invalid_val=np.nan,axis=1, side='right')[:,-N:],1)
              df['expected'] = avg






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Dec 26 '18 at 21:02

























              answered Dec 26 '18 at 20:58









              DivakarDivakar

              155k1484174




              155k1484174








              • 2





                I know I will see justify here :-)

                – W-B
                Dec 26 '18 at 21:00














              • 2





                I know I will see justify here :-)

                – W-B
                Dec 26 '18 at 21:00








              2




              2





              I know I will see justify here :-)

              – W-B
              Dec 26 '18 at 21:00





              I know I will see justify here :-)

              – W-B
              Dec 26 '18 at 21:00













              2














              You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.



              def mean_calculator(row):
              non_nulls = row.notnull()
              if non_nulls.sum() < 3:
              return np.nan
              return row[non_nulls].values[-3:].mean()

              df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)

              print(df)

              name day1 day2 day3 day4 day5 day6 day7 expected
              0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
              1 B NaN NaN NaN NaN NaN NaN 3 NaN
              2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
              3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





              share|improve this answer




























                2














                You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.



                def mean_calculator(row):
                non_nulls = row.notnull()
                if non_nulls.sum() < 3:
                return np.nan
                return row[non_nulls].values[-3:].mean()

                df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)

                print(df)

                name day1 day2 day3 day4 day5 day6 day7 expected
                0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
                1 B NaN NaN NaN NaN NaN NaN 3 NaN
                2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
                3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





                share|improve this answer


























                  2












                  2








                  2







                  You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.



                  def mean_calculator(row):
                  non_nulls = row.notnull()
                  if non_nulls.sum() < 3:
                  return np.nan
                  return row[non_nulls].values[-3:].mean()

                  df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)

                  print(df)

                  name day1 day2 day3 day4 day5 day6 day7 expected
                  0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
                  1 B NaN NaN NaN NaN NaN NaN 3 NaN
                  2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
                  3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





                  share|improve this answer













                  You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.



                  def mean_calculator(row):
                  non_nulls = row.notnull()
                  if non_nulls.sum() < 3:
                  return np.nan
                  return row[non_nulls].values[-3:].mean()

                  df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)

                  print(df)

                  name day1 day2 day3 day4 day5 day6 day7 expected
                  0 A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
                  1 B NaN NaN NaN NaN NaN NaN 3 NaN
                  2 C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
                  3 D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Dec 26 '18 at 20:56









                  jppjpp

                  97.4k2159109




                  97.4k2159109























                      1














                      You can start by calculating the expected column using applying the following function:



                      expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)


                      And insert these values in the columns that have at least 3 valid values:



                      m = df.isnull().sum(axis=1) > 3
                      df.loc[~m,'expected'] = expected.mask(m)

                      day1 day2 day3 day4 day5 day6 day7 expected
                      name
                      A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
                      B NaN NaN NaN NaN NaN NaN 3 NaN
                      C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
                      D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





                      share|improve this answer






























                        1














                        You can start by calculating the expected column using applying the following function:



                        expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)


                        And insert these values in the columns that have at least 3 valid values:



                        m = df.isnull().sum(axis=1) > 3
                        df.loc[~m,'expected'] = expected.mask(m)

                        day1 day2 day3 day4 day5 day6 day7 expected
                        name
                        A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
                        B NaN NaN NaN NaN NaN NaN 3 NaN
                        C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
                        D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





                        share|improve this answer




























                          1












                          1








                          1







                          You can start by calculating the expected column using applying the following function:



                          expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)


                          And insert these values in the columns that have at least 3 valid values:



                          m = df.isnull().sum(axis=1) > 3
                          df.loc[~m,'expected'] = expected.mask(m)

                          day1 day2 day3 day4 day5 day6 day7 expected
                          name
                          A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
                          B NaN NaN NaN NaN NaN NaN 3 NaN
                          C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
                          D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0





                          share|improve this answer















                          You can start by calculating the expected column using applying the following function:



                          expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis = 1)


                          And insert these values in the columns that have at least 3 valid values:



                          m = df.isnull().sum(axis=1) > 3
                          df.loc[~m,'expected'] = expected.mask(m)

                          day1 day2 day3 day4 day5 day6 day7 expected
                          name
                          A 1.0 1.0 NaN 2.0 3.0 0.0 3 2.0
                          B NaN NaN NaN NaN NaN NaN 3 NaN
                          C 1.0 1.0 0.0 1.0 1.0 1.0 1 1.0
                          D 1.0 1.0 0.0 1.0 NaN 1.0 4 2.0






                          share|improve this answer














                          share|improve this answer



                          share|improve this answer








                          edited Dec 26 '18 at 21:29

























                          answered Dec 26 '18 at 20:58









                          yatuyatu

                          6,6211826




                          6,6211826






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Stack Overflow!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53936985%2fhow-to-calculate-the-average-of-the-most-recent-three-non-nan-value-using-python%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              Plaza Victoria

                              In PowerPoint, is there a keyboard shortcut for bulleted / numbered list?

                              How to put 3 figures in Latex with 2 figures side by side and 1 below these side by side images but in...