python - SQL / Pandas equivalent -


what pandas equivalent sql query :

select  column1,          sum(column2) a,          count(distinct column3) b,          sum(column2) / count(distinct column3) c     table1 group column1 

thanks on that!!

i'm not sure sum(column2) / count(distinct column3) c part can done in same single step, can in 2 steps:

demo:

in [47]: df = pd.dataframe(np.random.randint(0,5,size=(15, 3)), columns=['c1','c2','c3']) in [48]: df out[48]:     c1  c2  c3 0    4   0   3 1    2   3   2 2    1   2   3 3    3   3   0 4    1   0   4 5    1   1   1 6    2   3   3 7    2   2   2 8    4   0   0 9    1   1   0 10   1   3   0 11   4   3   1 12   0   0   3 13   3   1   0 14   4   3   1  in [49]: x = df.groupby('c1').agg({'c2':'sum', 'c3': 'nunique'}).reset_index().rename(columns={'c2':'a', 'c3':'b'})  in [50]: x out[50]:    c1   b 0   0  0  1 1   1  7  4 2   2  8  2 3   3  4  1 4   4  6  3  in [51]: x['c'] = x.a / x.b  in [52]: x out[52]:    c1   b     c 0   0  0  1  0.00 1   1  7  4  1.75 2   2  8  2  4.00 3   3  4  1  4.00 4   4  6  3  2.00 

Comments

Popular posts from this blog

java - Jasper subreport showing only one entry from the JSON data source when embedded in the Title band -

mapreduce - Resource manager does not transit to active state from standby -

serialization - Convert Any type in scala to Array[Byte] and back -