Scraping a Table With Python/BS4
I'm trying to scrape the "Team Stats" table from http://www.pro-football-reference.com/boxscores/201602070den.htm using BS4 and Python 2.7. However, I'm unable to get anywhere close to it:
    import requests
    from bs4 import BeautifulSoup

    url = 'http://www.pro-football-reference.com/boxscores/201602070den.htm'
    page = requests.get(url)
    soup = BeautifulSoup(page.text, "html5lib")
    table = soup.findAll('table', {'id': "team_stats", "class": "stats_table"})
    print table
I thought the above code would work, but no luck.
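The problem in this case is that the "Team Stats" table is located inside a comment in the HTML source that you download with requests, so searching the parsed document for a table element does not find it. Locate the comment and reparse it with BeautifulSoup into a "soup" object: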
    import requests
    from bs4 import BeautifulSoup, NavigableString

    url = 'http://www.pro-football-reference.com/boxscores/201602070den.htm'
    # Send a browser-like User-Agent header with the request.
    page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'})
    soup = BeautifulSoup(page.content, "html5lib")

    # The table is inside an HTML comment: find the text node that mentions its id.
    comment = soup.find(text=lambda x: isinstance(x, NavigableString) and "team_stats" in x)

    # Parse the comment's contents as HTML and pull out the table.
    soup = BeautifulSoup(comment, "html5lib")
    table = soup.find("table", id="team_stats")
    print(table)
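To try the comment-reparsing idea without a network request or third-party packages, here is a minimal sketch that swaps BeautifulSoup for the standard library's html.parser and runs under Python 3. The SAMPLE_HTML string is a made-up miniature of the page (only two rows, with values taken from the real table), and the class and function names (CommentCollector, CellCollector, find_commented_table, parse_rows) are hypothetical helpers introduced for illustration, not part of any library.

```python
from html.parser import HTMLParser

# Hypothetical miniature of the page: the table is wrapped in an HTML comment.
SAMPLE_HTML = """
<html><body>
<div>
<!--
<table id="team_stats" class="stats_table">
<tr><th></th><th>DEN</th><th>CAR</th></tr>
<tr><th>First Downs</th><td>11</td><td>21</td></tr>
<tr><th>Total Yards</th><td>194</td><td>315</td></tr>
</table>
-->
</div>
</body></html>
"""


class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class CellCollector(HTMLParser):
    """Collect table rows as lists of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def find_commented_table(html, table_id):
    """Return the first comment whose text mentions table_id, or None."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    for comment in collector.comments:
        if table_id in comment:
            return comment
    return None


def parse_rows(table_html):
    """Parse the comment's contents as HTML and return the table rows."""
    parser = CellCollector()
    parser.feed(table_html)
    parser.close()
    return parser.rows


comment = find_commented_table(SAMPLE_HTML, "team_stats")
rows = parse_rows(comment)
for row in rows:
    print(row)
```

The two passes mirror the answer's approach: the first parse only sees the table as comment text, and the second parse treats that text as a document of its own, so the table elements become visible.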
You can also load the commented-out table into, for example, a pandas DataFrame, which is convenient to work with:
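    import pandas as pd
    import requests
    from bs4 import BeautifulSoup
    from bs4 import NavigableString

    url = 'http://www.pro-football-reference.com/boxscores/201602070den.htm'
    # Send a browser-like User-Agent header with the request.
    page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'})
    soup = BeautifulSoup(page.content, "html5lib")

    # Find the comment that contains the table markup.
    comment = soup.find(text=lambda x: isinstance(x, NavigableString) and "team_stats" in x)

    # read_html returns a list of DataFrames; the comment holds a single table.
    # Recent pandas releases expect a file-like object here, e.g. io.StringIO(comment).
    df = pd.read_html(comment)[0]
    print(df)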
Prints:

                Unnamed: 0            DEN            CAR
    0          First Downs             11             21
    1         Rush-Yds-TDs        28-90-1       27-118-1
    2    Cmp-Att-Yd-TD-INT  13-23-141-0-1  18-41-265-0-1
    3         Sacked-Yards           5-37           7-68
    4       Net Pass Yards            104            197
    5          Total Yards            194            315
    6         Fumbles-Lost            3-1            4-3
    7            Turnovers              2              4
    8      Penalties-Yards           6-51         12-102
    9     Third Down Conv.           1-14           3-15
    10   Fourth Down Conv.            0-0            0-0
    11  Time of Possession          27:13          32:47