python - Doing calculations on Pandas DataFrame with groupby and then passing it back into a DataFrame?
I have a data frame that I want to group by 2 variables, and then perform a calculation within those variables. Is there an easy way to do this, and put the information back into a DataFrame when I'm done, i.e. like this:
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2, 30, 12, 122, 345],
                   'b': [1, 1, 1, 2, 3, 3, 3, 2, 3, 4],
                   'c': [101, 230, 12, 122, 345, 23, 943, 83, 923, 10]})

total = []
avg = []
aid = []
bid = []
# Loop over each (a, b) group and collect the statistics by hand
for name, group in df.groupby(['a', 'b']):
    total.append(group.c.sum())
    avg.append(group.c.sum() / group.c.nunique())
    aid.append(name[0])
    bid.append(name[1])

x = pd.DataFrame({'total': total, 'avg': avg, 'aid': aid, 'bid': bid})
but more efficiently?
You can use the pandas aggregate function after groupby:
import pandas as pd
import numpy as np

# Sum and mean of column c within each (a, b) group
df.groupby(['a', 'b'])['c'].agg({'total': np.sum, 'avg': np.mean}).reset_index()

#      a  b  total         avg
# 0    1  1    343  114.333333
# 1    2  2    122  122.000000
# 2    2  3    368  184.000000
# 3   12  2     83   83.000000
# 4   30  3    943  943.000000
# 5  122  3    923  923.000000
# 6  345  4     10   10.000000
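A note for anyone on a newer pandas: as far as I know, passing a renaming dict to agg on a single selected column was deprecated and then removed (pandas 1.0 and later raise a SpecificationError for it). The sketch below is not the original answer's code. It is a possible equivalent using named aggregation, which I believe needs pandas 0.25 or later. It rebuilds the question's df so it runs on its own, and the variable name result is just my choice.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2, 2, 30, 12, 122, 345],
    'b': [1, 1, 1, 2, 3, 3, 3, 2, 3, 4],
    'c': [101, 230, 12, 122, 345, 23, 943, 83, 923, 10],
})

# Named aggregation: each keyword becomes an output column name,
# and each value names the aggregation applied to column 'c'.
result = (
    df.groupby(['a', 'b'])['c']
      .agg(total='sum', avg='mean')
      .reset_index()
)

print(result)
```

One difference from the loop in the question: 'mean' divides by the number of rows in the group, while the loop divides by group.c.nunique(). The two agree on this data because no group repeats a value of c, but they would differ if a group contained duplicate values.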