python - Repeating strings in pandas DF -- want to return list of unique strings
I have a bunch of rows of data in a pandas DataFrame that contain inconsistently alternating string values. For each game id (another column), there are two strings (team names) unique to that game id, but they do not alternate in a predictable pattern. Regardless, I'm trying to write a helper function that takes each unique game id and gets the two team names associated with it.
For example:
index      game_id  team
0        400827888   atl
1        400827888   det
2        400827888   atl
3        400827888   det
4        400827888   atl
...
555622   400829117   por
555623   400829117   den
555624   400829117   por
555625   400829117   por
Here is my woeful attempt, which is not working:
def get_teams(df):
    for i in df['gameid']:
        both_teams = [df['team'].astype(str)]
    return(both_teams)
I'd like it to return ['atl', 'det'] for game id 400827888 and ['por', 'den'] for game id 400829117. Instead, it is just returning the team name associated with each index.
You can use SeriesGroupBy.unique:
print (df.groupby('game_id')['team'].unique())
game_id
400827888    [atl, det]
400829117    [por, den]
Name: team, dtype: object
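To tie this back to the helper function asked for in the question, here is a minimal sketch that wraps the groupby in `get_teams` and returns a plain dict of lists. The sample DataFrame is made up for illustration (it only mirrors the question's `game_id` and `team` columns), and returning a dict is an assumption about what is convenient, not something the question requires.

```python
import pandas as pd

# Made-up sample data that mirrors the question's two columns.
df = pd.DataFrame({
    "game_id": [400827888, 400827888, 400827888, 400829117, 400829117, 400829117],
    "team": ["atl", "det", "atl", "por", "den", "por"],
})

def get_teams(df):
    """Map each game_id to the list of distinct team names seen for it."""
    unique_teams = df.groupby("game_id")["team"].unique()
    # unique() keeps the order of first appearance; convert each result to a plain list.
    return {game_id: list(teams) for game_id, teams in unique_teams.items()}

teams_by_game = get_teams(df)
print(teams_by_game[400827888])  # ['atl', 'det']
print(teams_by_game[400829117])  # ['por', 'den']
```

A dict makes single-game lookups cheap; if you prefer to stay in pandas, the grouped Series itself can be indexed by game id in the same way.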
For looping, use iterrows:
for i, g in df.groupby('game_id')['team'].unique().reset_index().iterrows():
    print (g.game_id)
    print (g.team)
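As an alternative to `reset_index().iterrows()`, the grouped result is a Series indexed by `game_id`, so `Series.items()` can walk it directly. This is only a sketch on made-up sample data; the `pairs` list is a hypothetical name used to collect the results.

```python
import pandas as pd

# Made-up sample data that mirrors the question's two columns.
df = pd.DataFrame({
    "game_id": [400827888, 400827888, 400829117, 400829117, 400829117],
    "team": ["atl", "det", "por", "den", "por"],
})

teams_per_game = df.groupby("game_id")["team"].unique()

pairs = []
# items() yields (index label, value) pairs, here (game_id, array of teams).
for game_id, teams in teams_per_game.items():
    pairs.append((int(game_id), list(teams)))
    print(game_id, list(teams))
```

Because groupby sorts the group keys by default, the games come out in ascending `game_id` order.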
Edit:
If you need to find the game_id for a given string (e.g. det), use boolean indexing:
s = df.groupby('game_id')['team'].unique()
print (s[s.apply(lambda x: 'det' in x)].index.tolist())
[400827888]
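If only the reverse lookup is needed, the same boolean-indexing idea can be applied to the original rows without grouping first. The sketch below assumes made-up sample data, and `games_with_team` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Made-up sample data that mirrors the question's two columns.
df = pd.DataFrame({
    "game_id": [400827888, 400827888, 400827888, 400829117, 400829117],
    "team": ["atl", "det", "atl", "por", "den"],
})

def games_with_team(df, team):
    """Return the distinct game ids in which the given team appears."""
    # The boolean mask keeps only rows for this team; unique() drops repeated ids.
    return df.loc[df["team"] == team, "game_id"].unique().tolist()

det_games = games_with_team(df, "det")
print(det_games)  # [400827888]
```

A team name that never occurs simply yields an empty list.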