python - Merge two data frames while using boolean indices (to filter) -
i have may simple question related syntax, can't figure out.
i have 2 data frames, df1 , df2, i'd a) merge on specific columns, while b) simultaneously checking column in each data frame boolean relationship (>, <, or ==).
the crucial part need both , b simultaneously because data frames large. not work merge 2 data frames in 1 step, remove rows don't pass boolean logic in second step. because merged data frame very, large , cause me 2 run out of memory.
so, have:
df1: col_1 col_2 test_value 0 b 1 1 b 3 2 b 2 3 b 5 4 b 2 5 b 1
and
df2: col_1 col_2 test_value 0 b 1 1 b 3 2 b 2 3 b 5 4 b 2 5 b 1
(for simplicity, 2 data frames identical)
and i'd merge them, so:
df3 = pd.merge(df1, df2, left_on=['col_1'], right_on=['col_2'])
while simultaneously filtering row df1['test value'] less df2['test value'], so:
df3.loc[df3['test_value_x'] < df3['test_value_y']]
the result be:
col_1_x col_2_x test_value_x col_1_y col_2_y test_value_y 0 b 1 b 3 1 b 1 b 5 3 b 2 b 3 4 b 2 b 5 6 b 2 b 3 7 b 2 b 5 16 b 1 b 2 17 b 1 b 2
again, can in 2 steps, code above, creates memory problem me because intermediate data frame large.
so there syntax combine this,
df3 = pd.merge(df1, df2, left_on=['col_1'], right_on=['col_2'])
with this,
df3.loc[df3['test_value_x'] < df3['test_value_y']]
try this:
import pandas pd df1_col1 = pd.series(['a', 'b', 'a', 'b', 'a', 'b'], index=[0, 1, 2, 3, 4, 5 ]) df1_col2 = pd.series(['b', 'a', 'b', 'a', 'b', 'a'], index=[0, 1, 2, 3, 4, 5]) df1_col3 = pd.series([1, 3, 2, 5, 2, 1], index=[0, 1, 2, 3, 4, 5]) df1 = pd.concat([df1_col1, df1_col2, df1_col3], axis=1) df1 = df1.rename(columns={0: 'col_1', 1: 'col_2', 2: 'test_value'}) df2 = df1.copy(deep=true)
to answer question above:
df3 = pd.merge(df1, df2, left_on=['col_1'], right_on=['col_2'])[pd.merge(df1, df2, left_on=['col_1'], right_on=['col_2'])['test_value_x'] <pd.merge(df1, df2, left_on=['col_1'], right_on=['col_2'])['test_value_y']]
Comments
Post a Comment