python - Merge two data frames while using boolean indices (to filter) -


i have may simple question related syntax, can't figure out.

i have 2 data frames, df1 , df2, i'd a) merge on specific columns, while b) simultaneously checking column in each data frame boolean relationship (>, <, or ==).

the crucial part need both , b simultaneously because data frames large. not work merge 2 data frames in 1 step, remove rows don't pass boolean logic in second step. because merged data frame very, large , cause me 2 run out of memory.

so, have:

df1:     col_1   col_2   test_value 0         b       1 1   b             3 2         b       2 3   b             5 4         b       2 5   b             1 

and

df2:     col_1   col_2   test_value 0         b       1 1   b             3 2         b       2 3   b             5 4         b       2 5   b             1 

(for simplicity, 2 data frames identical)

and i'd merge them, so:

df3 = pd.merge(df1, df2, left_on=['col_1'], right_on=['col_2']) 

while simultaneously filtering row df1['test value'] less df2['test value'], so:

df3.loc[df3['test_value_x'] < df3['test_value_y']] 

the result be:

    col_1_x col_2_x test_value_x    col_1_y col_2_y test_value_y 0         b       1               b             3 1         b       1               b             5 3         b       2               b             3 4         b       2               b             5 6         b       2               b             3 7         b       2               b             5 16  b             1                     b       2 17  b             1                     b       2 

again, can in 2 steps, code above, creates memory problem me because intermediate data frame large.

so there syntax combine this,

df3 = pd.merge(df1, df2, left_on=['col_1'], right_on=['col_2']) 

with this,

df3.loc[df3['test_value_x'] < df3['test_value_y']] 

try this:

import pandas pd  df1_col1 = pd.series(['a', 'b', 'a', 'b', 'a', 'b'], index=[0, 1, 2, 3, 4, 5 ]) df1_col2 = pd.series(['b', 'a', 'b', 'a', 'b', 'a'], index=[0, 1, 2, 3, 4, 5]) df1_col3 = pd.series([1, 3, 2, 5, 2, 1], index=[0, 1, 2, 3, 4, 5]) df1 = pd.concat([df1_col1, df1_col2, df1_col3], axis=1)  df1 = df1.rename(columns={0: 'col_1', 1: 'col_2', 2: 'test_value'}) df2 = df1.copy(deep=true) 

to answer question above:

df3 = pd.merge(df1, df2, left_on=['col_1'], right_on=['col_2'])[pd.merge(df1, df2, left_on=['col_1'], right_on=['col_2'])['test_value_x']       <pd.merge(df1, df2, left_on=['col_1'], right_on=['col_2'])['test_value_y']] 

Comments

Popular posts from this blog

jOOQ update returning clause with Oracle -

java - Warning equals/hashCode on @Data annotation lombok with inheritance -

java - BasicPathUsageException: Cannot join to attribute of basic type -