python - Collapsing list to unique IDs with a range of dates
I have a large list of IDs that repeat with different ranges of dates. I need to create a unique list of IDs with one range of dates per ID that includes the earliest start date and the latest end date from the uncollapsed list.
This is an example of what I have:

    id  start_date  end_date
    1   9/25/2015   10/12/2015
    1   9/16/2015   11/1/2015
    1   8/25/2015   9/21/2015
    2   9/2/2015    10/29/2015
    3   9/18/2015   10/15/2015
    3   9/19/2015   9/30/2015
    4   8/27/2015   9/15/2015
And this is what I need:

    id  start_date  end_date
    1   8/25/2015   11/1/2015
    2   9/2/2015    10/29/2015
    3   9/18/2015   10/15/2015
    4   8/27/2015   9/15/2015
I'm trying to do this in Python, but I'm not having any luck. Thanks!
Use groupby/aggregate:
    In [12]: df.groupby('id').agg({'start_date':min, 'end_date':max})
    Out[12]:
       start_date   end_date
    id
    1  2015-08-25 2015-11-01
    2  2015-09-02 2015-10-29
    3  2015-09-18 2015-10-15
    4  2015-08-27 2015-09-15
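For anyone who wants to reproduce this end to end, here is a minimal sketch built from the sample rows in the question. The variable name `collapsed` and the explicit `format="%m/%d/%Y"` are assumptions for illustration; the aggregation functions are passed as the strings "min" and "max", which pandas accepts in place of the builtins.

```python
import pandas as pd

# Sample rows copied from the question (dates are month/day/year strings).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 3, 4],
    "start_date": ["9/25/2015", "9/16/2015", "8/25/2015", "9/2/2015",
                   "9/18/2015", "9/19/2015", "8/27/2015"],
    "end_date": ["10/12/2015", "11/1/2015", "9/21/2015", "10/29/2015",
                 "10/15/2015", "9/30/2015", "9/15/2015"],
})

# Convert the strings to real datetimes before aggregating.
# The format string is an assumption about how the data is written.
for col in ["start_date", "end_date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

# One row per id: earliest start_date, latest end_date.
collapsed = df.groupby("id").agg({"start_date": "min", "end_date": "max"})
print(collapsed)
```

If `id` should be a regular column rather than the index, calling `collapsed.reset_index()` afterwards should do it.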
Note that it is important that start_date and end_date be parsed as dates, so that min and max return the minimum and maximum dates for each id. If the values are merely string representations of dates, then min and max would give the string min or max, which depends on lexicographic string order. If the date-strings were in (zero-padded) YYYY/MM/DD format, then lexicographic order would correspond to parsed-date order, but date-strings in MM/DD/YYYY format do not have this property.
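To make the difference concrete, here is a small sketch using the three end dates for id 1 from the question. The names `ends`, `string_max` and `date_max` are only illustrative.

```python
import pandas as pd

# End dates for id 1, still as month/day/year strings.
ends = pd.Series(["10/12/2015", "11/1/2015", "9/21/2015"])

# Comparing as strings goes character by character, so "9..." sorts after "1...".
string_max = ends.max()

# Comparing as parsed dates gives the chronologically latest value.
date_max = pd.to_datetime(ends, format="%m/%d/%Y").max()

print(string_max)  # 9/21/2015
print(date_max)    # 2015-11-01 00:00:00
```

The string maximum is the September date only because "9" sorts after "1", which is why the columns need to be converted first.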
If start_date and end_date have string values, then

    for col in ['start_date', 'end_date']:
        df[col] = pd.to_datetime(df[col])

would convert the strings to dates.
If you are loading the DataFrame from a file using pd.read_table (or pd.read_csv), then

    df = pd.read_table(filename, ..., parse_dates=[1, 2])

would parse the strings in the second and third columns of the file as dates. [1, 2] corresponds to the second and third columns since Python uses 0-based indexing.
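As a rough, self-contained illustration of parse_dates, the sketch below reads the question's rows from an in-memory string instead of a real file. The comma-separated layout and the `raw` and `collapsed` names are assumptions made for the example, not something the original data is known to use.

```python
import io
import pandas as pd

# Stand-in for a file on disk; the comma-separated layout is assumed.
raw = io.StringIO(
    "id,start_date,end_date\n"
    "1,9/25/2015,10/12/2015\n"
    "1,9/16/2015,11/1/2015\n"
    "1,8/25/2015,9/21/2015\n"
    "2,9/2/2015,10/29/2015\n"
    "3,9/18/2015,10/15/2015\n"
    "3,9/19/2015,9/30/2015\n"
    "4,8/27/2015,9/15/2015\n"
)

# Columns 1 and 2 (0-based) are start_date and end_date.
df = pd.read_csv(raw, parse_dates=[1, 2])

# The date columns are now datetimes, so min/max compare chronologically.
collapsed = df.groupby("id").agg({"start_date": "min", "end_date": "max"})
print(collapsed)
```

Depending on the pandas version, parsing dates that are not zero-padded may emit a warning about inferring the format; converting explicitly with pd.to_datetime and a format string is an alternative if that becomes a problem.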