Python pandas - remove group based on collective NaN count
I have a dataset from different weather stations with several variables (temperature, pressure, etc.):
stationid | time | temperature | pressure | ...
----------+------+-------------+----------+
123       | 1    | 30          | 1010.5   |
123       | 2    | 31          | 1009.0   |
202       | 1    | 24          | nan      |
202       | 2    | 24.3        | nan      |
202       | 3    | nan         | 1000.3   |
...
I want to remove the 'stationid' groups that have more than a certain number of NaNs, counting the NaNs across all variables.
If I try
df.loc[df.groupby('stationid')['temperature'].filter(lambda x: len(x[pd.isnull(x)]) < 30).index]
it works, as shown here: Python pandas - remove groups based on NaN count threshold
But the example above takes only 'temperature' into account. So how can I take into account the collective sum of NaNs over all available variables? I.e., remove a group unless the collective sum of NaNs in [variable1, variable2, variable3, ...] is less than a threshold.
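This should work:
df.groupby('stationid').filter(lambda g: g.isnull().sum().sum() < 4)
Here g.isnull().sum() counts the NaNs in each column of the group, and the second .sum() adds those counts into a single number per station. You can replace 4 with whatever your threshold should be.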
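If only some of the columns should count towards the total, the same idea can be restricted to a column list. The following is a minimal sketch, assuming the sample data from the question; the names value_cols, threshold and few_enough_nans are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "stationid": [123, 123, 202, 202, 202],
    "time": [1, 2, 1, 2, 3],
    "temperature": [30.0, 31.0, 24.0, 24.3, np.nan],
    "pressure": [1010.5, 1009.0, np.nan, np.nan, 1000.3],
})

# Hypothetical names: the columns whose NaNs should be counted,
# and the maximum collective NaN count (exclusive) a station may have.
value_cols = ["temperature", "pressure"]
threshold = 3


def few_enough_nans(group):
    """Return True if the group's NaN count over value_cols is below threshold."""
    return group[value_cols].isna().sum().sum() < threshold


# filter() keeps every row of the groups for which the function returns True.
kept = df.groupby("stationid").filter(few_enough_nans)
print(kept)
```

Station 202 has three NaNs in total across temperature and pressure, so with threshold = 3 it is dropped and only the rows of station 123 remain.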
df.groupby('stationid').filter(lambda g: g.isnull().sum().sum() < 4)

   stationid  time  temperature  pressure
0        123     1         30.0    1010.5
1        123     2         31.0    1009.0
2        202     1         24.0       NaN
3        202     2         24.3       NaN
4        202     3          NaN    1000.3

df.groupby('stationid').filter(lambda g: g.isnull().sum().sum() < 3)

   stationid  time  temperature  pressure
0        123     1         30.0    1010.5
1        123     2         31.0    1009.0
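On a large frame, calling a Python lambda once per group can be slow. A possible alternative, sketched here under the assumption that the data looks like the sample in the question, counts the NaNs per row first and then sums them per station with transform. The names value_cols and threshold are again illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "stationid": [123, 123, 202, 202, 202],
    "time": [1, 2, 1, 2, 3],
    "temperature": [30.0, 31.0, 24.0, 24.3, np.nan],
    "pressure": [1010.5, 1009.0, np.nan, np.nan, 1000.3],
})

value_cols = ["temperature", "pressure"]  # hypothetical: columns to count
threshold = 3                             # hypothetical: exclusive upper limit

# Number of NaNs in each row, over the selected columns only.
nan_per_row = df[value_cols].isna().sum(axis=1)

# Total NaNs per station, broadcast back to every row of that station.
nan_per_station = nan_per_row.groupby(df["stationid"]).transform("sum")

# Boolean mask: keep rows whose station is below the threshold.
kept_fast = df[nan_per_station < threshold]
print(kept_fast)
```

This should select the same rows as the filter call, and the intermediate nan_per_station series is handy for checking how many NaNs each station actually has before picking a threshold.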