python - Rolling sum in subgroups of a dataframe (pandas)
I have a sessions dataframe that contains e-mail and sessions (int) columns.
I need to calculate a rolling sum of sessions per email (i.e. not globally).
Now, the following works, but it's painfully slow:
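
    emails = set(list(sessions['e-mail']))
    ses_sums = []
    for em in emails:
        # Build a separate dataframe for each e-mail address.
        email_sessions = sessions[sessions['e-mail'] == em]
        email_sessions.is_copy = False
        # pd.rolling_sum is the legacy function; later pandas versions use Series.rolling(...).sum().
        email_sessions['session_rolling_sum'] = pd.rolling_sum(email_sessions['sessions'], window=self.window).fillna(0)
        ses_sums.append(email_sessions)
    # Stitch the per-e-mail dataframes back together.
    df = pd.concat(ses_sums, ignore_index=True)
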
Is there a way of achieving the same in pandas, but using pandas operators on the dataframe instead of creating separate dataframes for each email and concatenating them?
(Either that or some other way of making this faster.)

    import numpy as np
    import pandas as pd

    # Sample data: 20 rows spread over two e-mail groups, 'a' and 'b'.
    np.random.seed([3,1415])
    df = pd.DataFrame({'e-mail': np.random.choice(list('ab'), 20),
                       'session': np.random.randint(1, 10, 20)})

    # Rolling sum with a window of 3, computed within each e-mail group.
    df.groupby('e-mail').session.rolling(3).sum()

    e-mail
    a       0      NaN
            2      NaN
            4     11.0
            5      7.0
            7     10.0
            12    16.0
            15    16.0
            17    16.0
            18    17.0
            19    18.0
    b       1      NaN
            3      NaN
            6     18.0
            8     14.0
            9     16.0
            10    12.0
            11    13.0
            13    16.0
            14    20.0
            16    22.0
    Name: session, dtype: float64

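The grouped result above is indexed by (e-mail, original row label), so it cannot be assigned as a column directly. Here is a minimal sketch of one way to get it back onto the original dataframe as session_rolling_sum, with NaN replaced by 0 as in the question. The toy data and the window of 3 are assumptions picked so the numbers can be checked by hand; the column names follow the question.

```python
import pandas as pd

# Hypothetical toy data (not from the question), small enough to verify by hand.
sessions = pd.DataFrame({
    'e-mail': ['a', 'b', 'a', 'a', 'b', 'b', 'a'],
    'sessions': [1, 10, 2, 3, 20, 30, 4],
})
window = 3  # assumed window size

# Rolling sum within each e-mail group; the result carries a
# MultiIndex of (e-mail, original row label).
rolled = sessions.groupby('e-mail')['sessions'].rolling(window).sum()

# Drop the group level so the values align with the original row labels,
# then fill the incomplete windows with 0 as the question does.
sessions['session_rolling_sum'] = (
    rolled.reset_index(level=0, drop=True).fillna(0)
)

# Alternative spelling: transform returns a result already aligned
# to the original index, so no index juggling is needed.
alt = (
    sessions.groupby('e-mail')['sessions']
    .transform(lambda s: s.rolling(window).sum())
    .fillna(0)
)

print(sessions)
```

Both spellings keep the original row order, so unlike the loop-and-concat version the rows are not regrouped by e-mail. Whether the transform form or the groupby-rolling form is faster likely depends on the number of groups and the pandas version, so it is worth timing both on real data.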