python - How to fix "unexpected keyword argument 'useChardet'" in html5lib
I'm using html5lib, and after updating to the latest version I keep getting this error:
Traceback (most recent call last):
  File "/home/travis/build/freelawproject/juriscraper/tests/test_everything.py", line 119, in test_scrape_all_example_files
    site.parse()
  File "/home/travis/build/freelawproject/juriscraper/juriscraper/abstractsite.py", line 95, in parse
    self.html = self._download()
  File "/home/travis/build/freelawproject/juriscraper/juriscraper/abstractsite.py", line 384, in _download
    html_tree = self._make_html_tree(text)
  File "/home/travis/build/freelawproject/juriscraper/juriscraper/opinions/united_states/federal_appellate/ca11_u.py", line 26, in _make_html_tree
    e = html5parser.document_fromstring(text)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
    return parser.parse(html, useChardet=guess_charset).getroot()
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/html5lib/html5parser.py", line 235, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/html5lib/html5parser.py", line 85, in _parse
    self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/html5lib/_tokenizer.py", line 36, in __init__
    self.stream = HTMLInputStream(stream, **kwargs)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
    return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet'
The code I'm using is simple:
from lxml.html import html5parser
html5parser.document_fromstring(u'<html></html')
Any ideas?
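It turns out that if you feed a unicode object to the document_fromstring
method, it barfs. As the traceback shows, lxml passes useChardet on to html5lib, and the unicode input stream no longer accepts that argument. It didn't used to do this; it only started happening when I updated my dependencies.
Anyway, the fix is easy: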
html5parser.document_fromstring(u'<html></html'.encode('utf-8'))
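If the input may be either text or bytes, the same workaround can be wrapped in a small helper. This is only a sketch: `ensure_bytes` and `parse_html5` are hypothetical names, not part of lxml or html5lib, and UTF-8 is assumed as the encoding.

```python
def ensure_bytes(markup, encoding="utf-8"):
    """Return markup as bytes (hypothetical helper, not an lxml API).

    Bytes pass through unchanged; text (unicode) is encoded.
    """
    if isinstance(markup, bytes):
        return markup
    return markup.encode(encoding)


def parse_html5(markup):
    """Parse markup with lxml's html5parser, always handing it bytes."""
    # Imported lazily so ensure_bytes can be used without lxml installed.
    from lxml.html import html5parser
    return html5parser.document_fromstring(ensure_bytes(markup))
```

Note that with bytes input the parser has to work out the encoding on its own, so it may be worth checking the result on documents that contain non-ASCII text.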