python - SQL / Pandas equivalent -
what pandas equivalent sql query :
select column1, sum(column2) a, count(distinct column3) b, sum(column2) / count(distinct column3) c table1 group column1
thanks on that!!
i'm not sure sum(column2) / count(distinct column3) c
part can done in same single step, can in 2 steps:
demo:
in [47]: df = pd.dataframe(np.random.randint(0,5,size=(15, 3)), columns=['c1','c2','c3']) in [48]: df out[48]: c1 c2 c3 0 4 0 3 1 2 3 2 2 1 2 3 3 3 3 0 4 1 0 4 5 1 1 1 6 2 3 3 7 2 2 2 8 4 0 0 9 1 1 0 10 1 3 0 11 4 3 1 12 0 0 3 13 3 1 0 14 4 3 1 in [49]: x = df.groupby('c1').agg({'c2':'sum', 'c3': 'nunique'}).reset_index().rename(columns={'c2':'a', 'c3':'b'}) in [50]: x out[50]: c1 b 0 0 0 1 1 1 7 4 2 2 8 2 3 3 4 1 4 4 6 3 in [51]: x['c'] = x.a / x.b in [52]: x out[52]: c1 b c 0 0 0 1 0.00 1 1 7 4 1.75 2 2 8 2 4.00 3 3 4 1 4.00 4 4 6 3 2.00
Comments
Post a Comment