Python pandas make new column from data in existing column and from another dataframe -
i have dataframe called 'mydata', , if do
len(mydata.loc['2015-9-2'])
it counts number of rows in mydata have date, , returns number like
1067
i have dataframe called 'yourdata' looks like
timestamp 51 2015-06-22 52 2015-06-23 53 2015-06-24 54 2015-06-25 43 2015-07-13
now want use each date in yourdata instead of typing in each date
len(mydata.loc['2015-9-2'])
i can iterate through 'yourdata' using them like
len(mydata.loc[yourdata['timestamp']])
and produce new dataframe results or add new column yourdata result each date, i'm lost how this?
the following not work
yourdata['result'] = len(mydata.loc[yourdata['timestamp']])
neither this
yourdata['result'] = len(mydata.loc[yourdata.iloc[:,-3]])
this work
yourdata['result'] = len(mydata.loc['2015-9-2'])
buts no want use date in each row not fixed date.
edit: first few rows of mydata
timestamp bpm 0 2015-08-30 16:48:00 65 1 2015-08-30 16:48:10 65 2 2015-08-30 16:48:15 66 3 2015-08-30 16:48:20 67 4 2015-08-30 16:48:30 70
import numpy np import pandas pd mydata = pd.dataframe({'timestamp': ['2015-06-22 16:48:00']*3 + ['2015-06-23 16:48:00']*2 + ['2015-06-24 16:48:00'] + ['2015-06-25 16:48:00']*4 + ['2015-07-13 16:48:00', '2015-08-13 16:48:00'], 'bpm': [65]*8 + [70]*4}) mydata['timestamp'] = pd.to_datetime(mydata['timestamp']) print(mydata) # bpm timestamp # 0 65 2015-06-22 16:48:00 # 1 65 2015-06-22 16:48:00 # 2 65 2015-06-22 16:48:00 # 3 65 2015-06-23 16:48:00 # 4 65 2015-06-23 16:48:00 # 5 65 2015-06-24 16:48:00 # 6 65 2015-06-25 16:48:00 # 7 65 2015-06-25 16:48:00 # 8 70 2015-06-25 16:48:00 # 9 70 2015-06-25 16:48:00 # 10 70 2015-07-13 16:48:00 # 11 70 2015-08-13 16:48:00 yourdata = pd.series(['2015-06-22', '2015-06-23', '2015-06-24', '2015-06-25', '2015-07-13'], name='timestamp') yourdata = pd.to_datetime(yourdata).to_frame() print(yourdata) # 0 2015-06-22 # 1 2015-06-23 # 2 2015-06-24 # 3 2015-06-25 # 4 2015-07-13 result = (mydata.set_index('timestamp').resample('d') .size().loc[yourdata['timestamp']] .reset_index()) result.columns = ['timestamp', 'result'] print(result) # timestamp result # 0 2015-06-22 3 # 1 2015-06-23 2 # 2 2015-06-24 1 # 3 2015-06-25 4 # 4 2015-07-13 1
Comments
Post a Comment