python - Collapsing a list to unique IDs with a range of dates
I have a large list of IDs that repeat with different ranges of dates. I need to create a unique list of IDs, each with a single range of dates that runs from the earliest start date to the latest end date for that ID in the uncollapsed list.
This is an example of what I have:
id  start_date  end_date
1   9/25/2015   10/12/2015
1   9/16/2015   11/1/2015
1   8/25/2015   9/21/2015
2   9/2/2015    10/29/2015
3   9/18/2015   10/15/2015
3   9/19/2015   9/30/2015
4   8/27/2015   9/15/2015

And this is what I need:

id  start_date  end_date
1   8/25/2015   11/1/2015
2   9/2/2015    10/29/2015
3   9/18/2015   10/15/2015
4   8/27/2015   9/15/2015

I'm trying to do this in Python but I'm not having any luck. Thanks!
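The collapse described above amounts to keeping, for each id, a running minimum of start dates and a running maximum of end dates. For comparison with the pandas answer, here is a minimal plain-Python sketch that does not use pandas. It assumes the rows are available as (id, start_date, end_date) tuples of M/D/YYYY strings; the names `rows`, `parse`, and `collapse` are hypothetical.

```python
from datetime import datetime

# Hypothetical transcription of the example rows as (id, start_date, end_date) strings.
rows = [
    (1, "9/25/2015", "10/12/2015"),
    (1, "9/16/2015", "11/1/2015"),
    (1, "8/25/2015", "9/21/2015"),
    (2, "9/2/2015", "10/29/2015"),
    (3, "9/18/2015", "10/15/2015"),
    (3, "9/19/2015", "9/30/2015"),
    (4, "8/27/2015", "9/15/2015"),
]


def parse(s):
    # Parse an M/D/YYYY string into a date so comparisons are chronological.
    return datetime.strptime(s, "%m/%d/%Y").date()


def collapse(rows):
    """Return {id: (earliest start, latest end)} over all rows for each id."""
    ranges = {}
    for id_, start, end in rows:
        start, end = parse(start), parse(end)
        if id_ in ranges:
            lo, hi = ranges[id_]
            ranges[id_] = (min(lo, start), max(hi, end))
        else:
            ranges[id_] = (start, end)
    return ranges


for id_, (start, end) in sorted(collapse(rows).items()):
    print(id_, start, end)
```

The dates are parsed before comparing for the same reason the answer below gives: comparing the raw M/D/YYYY strings would not follow calendar order.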
Use groupby/aggregate:
In [12]: df.groupby('id').agg({'start_date': 'min', 'end_date': 'max'})
Out[12]:
   start_date   end_date
id
1  2015-08-25 2015-11-01
2  2015-09-02 2015-10-29
3  2015-09-18 2015-10-15
4  2015-08-27 2015-09-15

Note that it is important for start_date and end_date to be parsed as dates, so that min and max return the minimum and maximum dates for each id. If the values are merely string representations of dates, then min and max give the string min or max, which depends on string lexicographic order. If the date-strings were in zero-padded YYYY/MM/DD format, lexicographic order would correspond to parsed-date order, but date-strings in MM/DD/YYYY format do not have this property.
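To see why parsing matters, the following sketch aggregates the same data twice: once with the dates still stored as strings and once after parsing them. It assumes pandas is installed; the frame is a hand transcription of the question's example, and the names `as_strings`, `parsed`, and `as_dates` are hypothetical.

```python
import pandas as pd

# Example data from the question, with dates still stored as M/D/YYYY strings.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 3, 4],
    "start_date": ["9/25/2015", "9/16/2015", "8/25/2015", "9/2/2015",
                   "9/18/2015", "9/19/2015", "8/27/2015"],
    "end_date": ["10/12/2015", "11/1/2015", "9/21/2015", "10/29/2015",
                 "10/15/2015", "9/30/2015", "9/15/2015"],
})

# Aggregating the raw strings compares them character by character.
as_strings = df.groupby("id").agg({"start_date": "min", "end_date": "max"})
print(as_strings)

# Aggregating parsed dates compares them chronologically.
parsed = df.assign(
    start_date=pd.to_datetime(df["start_date"], format="%m/%d/%Y"),
    end_date=pd.to_datetime(df["end_date"], format="%m/%d/%Y"),
)
as_dates = parsed.groupby("id").agg({"start_date": "min", "end_date": "max"})
print(as_dates)
```

With the strings, the latest end date for id 1 comes out as 9/21/2015 rather than 11/1/2015, because "9" sorts after "1" character by character; the parsed version returns 2015-11-01.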
If start_date and end_date have string values, then

    import pandas as pd

    for col in ['start_date', 'end_date']:
        df[col] = pd.to_datetime(df[col])

would convert the strings to dates.
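A minimal sketch of that conversion loop, assuming the strings are M/D/YYYY as in the question. Passing an explicit format to pd.to_datetime is an addition to the loop above: it avoids relying on format inference, and with the default errors="raise" any value that does not match raises an exception. The small frame is a hypothetical subset of the example data.

```python
import pandas as pd

# Hypothetical subset of the example, with dates stored as M/D/YYYY strings.
df = pd.DataFrame({
    "id": [1, 1, 3],
    "start_date": ["9/25/2015", "9/16/2015", "9/18/2015"],
    "end_date": ["10/12/2015", "11/1/2015", "10/15/2015"],
})
print(df.dtypes)  # start_date and end_date hold string values

# Convert each date column in place; with the default errors="raise",
# a value that does not match the format raises an exception.
for col in ["start_date", "end_date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

print(df.dtypes)  # both columns are now datetime64
```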
If you are loading the DataFrame from a file using pd.read_table (or pd.read_csv), then

    df = pd.read_table(filename, ..., parse_dates=[1, 2])

would parse the strings in the second and third columns of the file as dates. [1, 2] corresponds to the second and third columns since Python uses 0-based indexing.
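A self-contained sketch of the file-loading route, assuming the file is whitespace-delimited with a header row like the example in the question. An in-memory io.StringIO buffer stands in for the real file, and sep=r"\s+" is an assumption about the delimiter; a tab- or comma-separated file would need a different sep.

```python
import io

import pandas as pd

# Stand-in for a whitespace-delimited file with the question's layout.
data = io.StringIO(
    """id start_date end_date
1 9/25/2015 10/12/2015
1 9/16/2015 11/1/2015
1 8/25/2015 9/21/2015
2 9/2/2015 10/29/2015
3 9/18/2015 10/15/2015
3 9/19/2015 9/30/2015
4 8/27/2015 9/15/2015
"""
)

# parse_dates=[1, 2] parses the second and third columns (0-based positions) as dates.
df = pd.read_csv(data, sep=r"\s+", parse_dates=[1, 2])
print(df.dtypes)

# Collapse to one row per id: earliest start date, latest end date.
collapsed = df.groupby("id").agg({"start_date": "min", "end_date": "max"})
print(collapsed)
```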