python & pandas - How to calculate frequency under conditions in columns in DataFrame? -
i have series of data in dataframe called frames
:
nousager sens idvehiculeutilise noconducteur noadresse fait nodemande periods 0 000001 + 287véh 000087 000079 1 42196000013 matin 1 000001 - 287véh 000087 000079 1 42196000013 matin 2 000314 + 263véh 000077 006470 1 42196000002 matin 3 002372 + 287véh 000087 002932 1 42196000016 matin 4 000466 + 287véh 000087 002932 1 42196000015 matin 5 000314 - 263véh 000077 000456 1 42196000002 matin 6 000466 - 287véh 000087 004900 1 42196000015 matin 7 002372 - 287véh 000087 007072 1 42196000016 matin 8 002641 + 263véh 000077 007225 1 42196000004 soir 9 002641 - 263véh 000077 000889 1 42196000004 soir 10 000382 + 263véh 000077 002095 1 42196000006 soir 11 002641 + 287véh 000087 000889 1 42196000019 soir 12 000382 - 263véh 000077 006168 1 42196000006 soir 13 002641 - 287véh 000087 007225 1 42196000019 soir 14 001611 + 287véh 000087 004236 -1 42196000021 soir 15 002785 + 263véh 000077 007482 1 42196000007 soir 16 002372 + 287véh 000087 007072 1 42196000022 soir 17 002785 - 263véh 000077 007483 1 42196000007 soir 18 000466 + 287véh 000087 004900 1 42196000023 soir 19 000382 + 263véh 000077 006168 1 42196000008 soir
for each usager
, depending on sens
, periods
, can have more 1 related address. want know usager
, how many address
have , frequency of each address. used frames.set_index(['nousager','noadresse'])
make looks like:
edit
i don't want other columns new 1 result of frequency. in way can it? can use pivot()
it?
any appreciated!
i think need groupby
columns indexes
(nousager
,sens
,periods
) in output df. need add column (noadresse
) last item in list in groupby
, converted unstack
columns in output. , need aggregate size
.
df = df.groupby(['nousager','sens','periods', 'noadresse']).size().unstack(fill_value=0) print (df)
noadresse 79 456 889 2095 2932 4236 4900 6168 6470 \ nousager sens periods 1 + matin 1 0 0 0 0 0 0 0 0 - matin 1 0 0 0 0 0 0 0 0 314 + matin 0 0 0 0 0 0 0 0 1 - matin 0 1 0 0 0 0 0 0 0 382 + soir 0 0 0 1 0 0 0 1 0 - soir 0 0 0 0 0 0 0 1 0 466 + matin 0 0 0 0 1 0 0 0 0 soir 0 0 0 0 0 0 1 0 0 - matin 0 0 0 0 0 0 1 0 0 1611 + soir 0 0 0 0 0 1 0 0 0 2372 + matin 0 0 0 0 1 0 0 0 0 soir 0 0 0 0 0 0 0 0 0 - matin 0 0 0 0 0 0 0 0 0 2641 + soir 0 0 1 0 0 0 0 0 0 - soir 0 0 1 0 0 0 0 0 0 2785 + soir 0 0 0 0 0 0 0 0 0 - soir 0 0 0 0 0 0 0 0 0 noadresse 7072 7225 7482 7483 nousager sens periods 1 + matin 0 0 0 0 - matin 0 0 0 0 314 + matin 0 0 0 0 - matin 0 0 0 0 382 + soir 0 0 0 0 - soir 0 0 0 0 466 + matin 0 0 0 0 soir 0 0 0 0 - matin 0 0 0 0 1611 + soir 0 0 0 0 2372 + matin 0 0 0 0 soir 1 0 0 0 - matin 1 0 0 0 2641 + soir 0 1 0 0 - soir 0 1 0 0 2785 + soir 0 0 1 0 - soir 0 0 0 1
if need reset index:
df = df.groupby(['nousager','sens','periods', 'noadresse']) .size() .unstack(fill_value=0) .reset_index() .rename_axis(none, axis=1) print (df) nousager sens periods 79 456 889 2095 2932 4236 4900 6168 6470 \ 0 1 + matin 1 0 0 0 0 0 0 0 0 1 1 - matin 1 0 0 0 0 0 0 0 0 2 314 + matin 0 0 0 0 0 0 0 0 1 3 314 - matin 0 1 0 0 0 0 0 0 0 4 382 + soir 0 0 0 1 0 0 0 1 0 5 382 - soir 0 0 0 0 0 0 0 1 0 6 466 + matin 0 0 0 0 1 0 0 0 0 7 466 + soir 0 0 0 0 0 0 1 0 0 8 466 - matin 0 0 0 0 0 0 1 0 0 9 1611 + soir 0 0 0 0 0 1 0 0 0 10 2372 + matin 0 0 0 0 1 0 0 0 0 11 2372 + soir 0 0 0 0 0 0 0 0 0 12 2372 - matin 0 0 0 0 0 0 0 0 0 13 2641 + soir 0 0 1 0 0 0 0 0 0 14 2641 - soir 0 0 1 0 0 0 0 0 0 15 2785 + soir 0 0 0 0 0 0 0 0 0 16 2785 - soir 0 0 0 0 0 0 0 0 0 7072 7225 7482 7483 0 0 0 0 0 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 7 0 0 0 0 8 0 0 0 0 9 0 0 0 0 10 0 0 0 0 11 1 0 0 0 12 1 0 0 0 13 0 1 0 0 14 0 1 0 0 15 0 0 1 0 16 0 0 0 1
another solution crosstab
:
df = pd.crosstab([df.nousager,df.sens,df.periods], df.noadresse) .reset_index() .rename_axis(none, axis=1) print (df) nousager sens periods 79 456 889 2095 2932 4236 4900 6168 6470 \ 0 1 + matin 1 0 0 0 0 0 0 0 0 1 1 - matin 1 0 0 0 0 0 0 0 0 2 314 + matin 0 0 0 0 0 0 0 0 1 3 314 - matin 0 1 0 0 0 0 0 0 0 4 382 + soir 0 0 0 1 0 0 0 1 0 5 382 - soir 0 0 0 0 0 0 0 1 0 6 466 + matin 0 0 0 0 1 0 0 0 0 7 466 + soir 0 0 0 0 0 0 1 0 0 8 466 - matin 0 0 0 0 0 0 1 0 0 9 1611 + soir 0 0 0 0 0 1 0 0 0 10 2372 + matin 0 0 0 0 1 0 0 0 0 11 2372 + soir 0 0 0 0 0 0 0 0 0 12 2372 - matin 0 0 0 0 0 0 0 0 0 13 2641 + soir 0 0 1 0 0 0 0 0 0 14 2641 - soir 0 0 1 0 0 0 0 0 0 15 2785 + soir 0 0 0 0 0 0 0 0 0 16 2785 - soir 0 0 0 0 0 0 0 0 0 7072 7225 7482 7483 0 0 0 0 0 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 7 0 0 0 0 8 0 0 0 0 9 0 0 0 0 10 0 0 0 0 11 1 0 0 0 12 1 0 0 0 13 0 1 0 0 14 0 1 0 0 15 0 0 1 0 16 0 0 0 1
edit comment:
i think need aggregate size
:
df = df.groupby(['nousager','noadresse']).size().reset_index(name='count') print (df) nousager noadresse count 0 1 79 2 1 314 456 1 2 314 6470 1 3 382 2095 1 4 382 6168 2 5 466 2932 1 6 466 4900 2 7 1611 4236 1 8 2372 2932 1 9 2372 7072 2 10 2641 889 2 11 2641 7225 2 12 2785 7482 1 13 2785 7483 1
if need set indexes, can use solution - rename
series
name , call to_frame
:
df = df.groupby(['nousager','noadresse']).size().rename('count').to_frame() count nousager noadresse 1 79 2 314 456 1 6470 1 382 2095 1 6168 2 466 2932 1 4900 2 1611 4236 1 2372 2932 1 7072 2 2641 889 2 7225 2 2785 7482 1 7483 1
or add set_index
:
df = df.groupby(['nousager','noadresse']) .size() .reset_index(name='count') .set_index(['nousager','noadresse']) print (df) count nousager noadresse 1 79 2 314 456 1 6470 1 382 2095 1 6168 2 466 2932 1 4900 2 1611 4236 1 2372 2932 1 7072 2 2641 889 2 7225 2 2785 7482 1 7483 1
Comments
Post a Comment