python - How can I jump to the next page in Scrapy?
I'm trying to scrape the results here using Scrapy. The problem is that not all of the classes appear on the page until the 'load more results' tab is clicked.
The problem can be seen here:
My code looks like this:
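```python
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

from myproject.items import ClassCentralItem  # hypothetical module path for the item class


class ClassCentralSpider(CrawlSpider):
    name = "class_central"
    allowed_domains = ["www.class-central.com"]
    start_urls = (
        'https://www.class-central.com/courses/recentlyadded',
    )
    rules = (
        Rule(
            LinkExtractor(
                # allow=("index\d00\.html",),
                restrict_xpaths=('//div[@id="show-more-courses"]',)
            ),
            # note: the Scrapy docs warn against using parse as a CrawlSpider callback
            callback='parse',
            follow=True,
        ),
    )

    def parse(self, response):
        x = response.xpath('//span[@class="course-name-text"]/text()').extract()
        item = ClassCentralItem()
        for y in x:
            item['name'] = y
            print(item['name'])
```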
The second page of the website seems to be generated via an AJAX call. If you look at the network tab of your browser's inspection tool, you'll see something like:
In this case it seems to be retrieving a JSON file from https://www.class-central.com/maestro/courses/recentlyadded?page=2&_=1469471093134
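Now, the URL parameter _=1469471093134 seems to do nothing (it looks like a millisecond timestamp, probably added only to keep the browser from caching the request), so you can trim the URL down to: https://www.class-central.com/maestro/courses/recentlyadded?page=2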
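If you'd rather build these page URLs with only the standard library (the answer below uses w3lib's `add_or_replace_parameter` for this), a minimal sketch could look like this. `json_page_url` is a hypothetical helper name, and it assumes `page` is the only parameter the endpoint actually needs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def json_page_url(url, page):
    """Return `url` with the `_` parameter dropped and `page` set to the given number."""
    parts = urlsplit(url)
    # keep every query parameter except `_` and any existing `page`
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k not in ('_', 'page')]
    query.append(('page', str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


raw = 'https://www.class-central.com/maestro/courses/recentlyadded?page=2&_=1469471093134'
print(json_page_url(raw, 2))  # https://www.class-central.com/maestro/courses/recentlyadded?page=2
print(json_page_url(raw, 3))  # https://www.class-central.com/maestro/courses/recentlyadded?page=3
```

Dropping any existing `page` before appending the new one means the helper works both on the raw URL from the network tab and on the bare endpoint without a query string.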
The returned JSON contains the HTML code for the next page:
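```python
import json

from scrapy.selector import Selector

# inside your callback, where `response` is the JSON response:
# first you need to load the JSON
data = json.loads(response.body)
# then convert the HTML it contains into a Scrapy Selector
sel = Selector(text=data['table'])
```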
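To see what that conversion does without running Scrapy, here is a stdlib-only sketch that pulls the course names out of a JSON body whose `table` key holds HTML. The sample body is invented for illustration (only the `table` key and the `course-name-text` class come from the answer), and `CourseNameParser` / `course_names_from_json` are hypothetical names. Inside a spider, the `Selector` approach above is simpler.

```python
import json
from html.parser import HTMLParser


class CourseNameParser(HTMLParser):
    """Collect the text of every <span class="course-name-text"> element.

    Simplification: assumes no other <span> is nested inside a course-name span.
    """

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False

    def handle_starttag(self, tag, attrs):
        if tag == 'span' and dict(attrs).get('class') == 'course-name-text':
            self._in_name = True
            self.names.append('')

    def handle_endtag(self, tag):
        if tag == 'span':
            self._in_name = False

    def handle_data(self, data):
        if self._in_name:
            self.names[-1] += data


def course_names_from_json(body):
    """Load a JSON body and extract course names from the HTML in its 'table' key."""
    data = json.loads(body)
    parser = CourseNameParser()
    parser.feed(data['table'])
    parser.close()
    return [name.strip() for name in parser.names]


# invented sample body, shaped like the endpoint's response
sample = json.dumps({'table': '<tr><td><span class="course-name-text">Intro to Python</span></td></tr>'
                              '<tr><td><span class="course-name-text">Machine Learning</span></td></tr>'})
print(course_names_from_json(sample))  # ['Intro to Python', 'Machine Learning']
```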
To replicate this in your code, try something like:
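```python
import json

import scrapy
from scrapy.selector import Selector
from w3lib.url import add_or_replace_parameter

from myproject.items import ClassCentralItem  # hypothetical path: the item class from the question

# JSON endpoint behind the 'load more results' button, with the _ parameter trimmed
JSON_URL = 'https://www.class-central.com/maestro/courses/recentlyadded'


def parse(self, response):  # a method of your spider class
    # check if the response is JSON; if so, convert the HTML inside it to a selector
    if response.meta.get('is_json', False):
        sel = Selector(text=json.loads(response.body)['table'])
    else:
        sel = Selector(response)

    # parse the items on this page
    for name in sel.xpath('//span[@class="course-name-text"]/text()').extract():
        item = ClassCentralItem()
        item['name'] = name
        yield item

    # request the next page if the 'show more' block is present
    # (assumes the loaded fragment still contains it while more pages exist)
    if sel.xpath("//div[@id='show-more-courses']"):
        next_page = response.meta.get('page', 1) + 1
        # build the JSON URL for the next page
        url = add_or_replace_parameter(JSON_URL, 'page', str(next_page))
        yield scrapy.Request(url, self.parse,
                             meta={'page': next_page, 'is_json': True})
```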
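The stopping logic of that `parse()` method can be checked offline. The sketch below replays the order of requests: start from the normal HTML page, then request JSON pages 2, 3 and so on while the 'show more' block is still present. `crawl_order` and `has_show_more` are hypothetical names; `has_show_more(page)` stands in for the `//div[@id='show-more-courses']` check on the downloaded content, and the `max_pages` cap is an added safety net (not in the answer) against looping forever if that block never disappears.

```python
from urllib.parse import urlencode

START_URL = 'https://www.class-central.com/courses/recentlyadded'
JSON_URL = 'https://www.class-central.com/maestro/courses/recentlyadded'


def crawl_order(has_show_more, max_pages=100):
    """Return the URLs the spider would request, in order.

    has_show_more(page) -> bool is a hypothetical stand-in for checking the
    downloaded page for the 'show more' block.
    """
    urls = [START_URL]  # page 1 is the normal HTML page
    page = 1
    while has_show_more(page) and page < max_pages:
        page += 1  # same as response.meta.get('page', 1) + 1
        urls.append(JSON_URL + '?' + urlencode({'page': page}))
    return urls


# a fake site with 3 pages: pages 1 and 2 still show the "load more" block
for url in crawl_order(lambda page: page < 3):
    print(url)
```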