Python pandas - remove group based on collective NaN count


I have a dataset from different weather stations, with several variables (temperature, pressure, etc.),

stationid | time | temperature | pressure |...
----------+------+-------------+----------+
123       |  1   |     30      |  1010.5  |
123       |  2   |     31      |  1009.0  |
202       |  1   |     24      |  NaN     |
202       |  2   |     24.3    |  NaN     |
202       |  3   |     NaN     |  1000.3  |
...

and I want to remove the 'stationid' groups that have more than a certain number of NaNs (counting the NaNs across all variables).

If I try,

df.loc[df.groupby('stationid')['temperature'].filter(lambda x: len(x[pd.isnull(x)]) < 30).index]

it works, as shown here: Python pandas - remove groups based on NaN count threshold

But the above example takes only 'temperature' into account. So, how can I take into account the collective sum of NaNs over all available variables? I.e., remove a group unless the collective sum of NaNs in [variable1, variable2, variable3,...] is less than a threshold.

This should work:

df.groupby('stationid').filter(lambda g: g.isnull().sum().sum() < 4) 

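You can replace 4 with whatever threshold number you want. A group is kept only if its total NaN count, summed over all columns, is strictly less than the threshold.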
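To pick a sensible threshold, it may help to look at the total NaN count per station first. The following is only a sketch: the DataFrame is rebuilt by hand from the sample rows in the question, and the name `nan_counts` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (only the rows shown there).
df = pd.DataFrame({
    "stationid": [123, 123, 202, 202, 202],
    "time": [1, 2, 1, 2, 3],
    "temperature": [30.0, 31.0, 24.0, 24.3, np.nan],
    "pressure": [1010.5, 1009.0, np.nan, np.nan, 1000.3],
})

# Boolean NaN mask -> per-station NaN count for each column
# -> one total per station (summed across columns).
nan_counts = df.isna().groupby(df["stationid"]).sum().sum(axis=1)
print(nan_counts)
```

On this sample, station 123 has no NaNs and station 202 has three, so a threshold of 4 keeps both stations while a threshold of 3 drops station 202.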

df.groupby('stationid').filter(lambda g: g.isnull().sum().sum() < 4)

   stationid  time  temperature  pressure
0        123     1         30.0    1010.5
1        123     2         31.0    1009.0
2        202     1         24.0       NaN
3        202     2         24.3       NaN
4        202     3          NaN    1000.3

df.groupby('stationid').filter(lambda g: g.isnull().sum().sum() < 3)

   stationid  time  temperature  pressure
0        123     1         30.0    1010.5
1        123     2         31.0    1009.0
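If only some of the columns should count toward the NaN total (the [variable1, variable2, ...] case from the question), one possible variation is to restrict the count to a column list. The sketch below swaps the lambda-based filter for a boolean mask built with `transform`, which is a different technique from the one in the answer but should select the same rows. The names `value_cols`, `threshold`, `row_nans`, `group_nans` and `kept` are assumptions for illustration, and the data is again just the sample from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (only the rows shown there).
df = pd.DataFrame({
    "stationid": [123, 123, 202, 202, 202],
    "time": [1, 2, 1, 2, 3],
    "temperature": [30.0, 31.0, 24.0, 24.3, np.nan],
    "pressure": [1010.5, 1009.0, np.nan, np.nan, 1000.3],
})

value_cols = ["temperature", "pressure"]  # assumed: columns whose NaNs count
threshold = 3                             # assumed: keep groups with fewer NaNs

# NaNs per row, counted over the selected columns only.
row_nans = df[value_cols].isna().sum(axis=1)

# Total NaNs of each row's station, broadcast back onto every row.
group_nans = row_nans.groupby(df["stationid"]).transform("sum")

# Keep only the rows whose station is below the threshold.
kept = df[group_nans < threshold]
print(kept)

# Same selection written in the style of the answer, for comparison.
same = df.groupby("stationid").filter(
    lambda g: g[value_cols].isna().sum().sum() < threshold
)
```

The mask version avoids calling a Python function once per group, which may matter when there are many stations; for a handful of groups the `filter` form is probably easier to read.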
