python - How can I jump to the next page in Scrapy?


I'm trying to scrape the results here using Scrapy. The problem is that not all of the classes appear on the page until the 'Load more results' tab is clicked.

The problem can be seen here:

[Screenshot of the page]

My code looks like this:

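    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor

    from myproject.items import ClassCentralItem  # "myproject" is a placeholder package name


    class ClassCentralSpider(CrawlSpider):
        name = "class_central"
        allowed_domains = ["www.class-central.com"]
        start_urls = (
            'https://www.class-central.com/courses/recentlyadded',
        )
        rules = (
            Rule(
                LinkExtractor(
                    # allow=("index\d00\.html",),
                    restrict_xpaths=('//div[@id="show-more-courses"]',)
                ),
                # note: the Scrapy docs advise against using 'parse' as a
                # CrawlSpider rule callback, since CrawlSpider uses parse() itself
                callback='parse',
                follow=True
            ),
        )

        def parse(self, response):
            x = response.xpath('//span[@class="course-name-text"]/text()').extract()
            item = ClassCentralItem()
            for y in x:
                item['name'] = y
                print(item['name'])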

The second page of the website seems to be generated via an AJAX call. If you look at the Network tab of your browser's inspection tool, you'll see something like this:

[Screenshot: Firebug network tab]

In this case it seems to be retrieving a JSON file from https://www.class-central.com/maestro/courses/recentlyadded?page=2&_=1469471093134

Now, it seems the URL parameter _=1469471093134 does nothing (it looks like a cache-busting timestamp), so you can trim it away to: https://www.class-central.com/maestro/courses/recentlyadded?page=2
The returned JSON contains the HTML code of the next page:
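If you want to build these page URLs without w3lib, the standard library can do the same trimming. This is a minimal sketch: page_url is a hypothetical helper name, and it assumes the server really ignores the _ parameter, as observed above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def page_url(ajax_url, page):
    """Return ajax_url with the '_' parameter dropped and 'page' set to `page`."""
    parts = urlsplit(ajax_url)
    # keep every query parameter except the '_' timestamp and the old page number
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in ("_", "page")]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(page_url(
    "https://www.class-central.com/maestro/courses/recentlyadded?page=2&_=1469471093134", 3))
# https://www.class-central.com/maestro/courses/recentlyadded?page=3
```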

    import json
    from scrapy.selector import Selector

    # you need to load it
    data = json.loads(response.text)
    # and then convert it to a Scrapy Selector
    sel = Selector(text=data['table'])
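Outside Scrapy, the same two steps (decode the JSON, then parse the HTML under 'table') can be tried with the standard library alone, which is handy for checking a saved payload offline. This is a minimal sketch: the sample payload is made up, only the 'table' key and the course-name-text class come from the page as described above, and html.parser stands in for Scrapy's Selector.

```python
import json
from html.parser import HTMLParser


class CourseNameParser(HTMLParser):
    """Collect the text of every <span class="course-name-text"> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False

    def handle_starttag(self, tag, attrs):
        # exact class match, like the XPath @class="course-name-text"
        if tag == "span" and ("class", "course-name-text") in attrs:
            self._in_name = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_name = False

    def handle_data(self, data):
        if self._in_name:
            self.names[-1] += data


def course_names(json_text):
    # the HTML for the next page sits under the 'table' key of the JSON
    html = json.loads(json_text)["table"]
    parser = CourseNameParser()
    parser.feed(html)
    parser.close()
    return [name.strip() for name in parser.names]


# hypothetical payload shaped like the AJAX response
sample = json.dumps({"table": '<tr><td><span class="course-name-text">Course A</span></td></tr>'
                              '<tr><td><span class="course-name-text">Course B</span></td></tr>'})
print(course_names(sample))  # ['Course A', 'Course B']
```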

To replicate this in your code, try something like:

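    import json

    import scrapy
    from scrapy.selector import Selector
    from w3lib.url import add_or_replace_parameter

    # JSON endpoint found in the browser's network tab
    JSON_URL = 'https://www.class-central.com/maestro/courses/recentlyadded'

    # parse() of a plain scrapy.Spider (a CrawlSpider should not override parse)
    def parse(self, response):
        # check if the response is JSON; if so, convert it to a Selector
        if response.meta.get('is_json', False):
            # convert the HTML inside the JSON into a Selector for parsing
            sel = Selector(text=json.loads(response.text)['table'])
        else:
            sel = Selector(response)

        # parse the page for items (ClassCentralItem is the question's item class)
        for name in sel.xpath('//span[@class="course-name-text"]/text()').extract():
            item = ClassCentralItem()
            item['name'] = name
            yield item

        # next page; assumes the returned HTML also contains this element
        # while more pages remain
        next_page_el = sel.xpath("//div[@id='show-more-courses']")
        if next_page_el:  # there is a next page
            next_page = response.meta.get('page', 1) + 1
            # build the next page URL on the JSON endpoint
            url = add_or_replace_parameter(JSON_URL, 'page', str(next_page))
            yield scrapy.Request(url, self.parse, meta={'page': next_page, 'is_json': True})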
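The stop condition above relies on the 'show-more-courses' element. A common alternative is to keep requesting pages until one comes back with no courses. Below is a framework-free sketch of that loop, assuming the endpoint keeps returning JSON with a 'table' key and that an empty page marks the end (both unverified). The names crawl, fetch and max_pages are hypothetical, and the regex is a shortcut for the demo only.

```python
import json
import re
from urllib.parse import urlencode

# JSON endpoint observed in the browser's network tab
JSON_URL = "https://www.class-central.com/maestro/courses/recentlyadded"


def crawl(fetch, first_page=2, max_pages=50):
    """Yield course names page by page until a page comes back empty.

    `fetch` is any callable that takes a URL and returns the response body
    as a string. Page 1 is the normal HTML page, so the JSON pages start at 2.
    """
    for page in range(first_page, first_page + max_pages):
        url = JSON_URL + "?" + urlencode({"page": page})
        html = json.loads(fetch(url))["table"]
        # crude extraction for the sketch; use a real HTML parser in practice
        names = re.findall(r'<span class="course-name-text">([^<]*)</span>', html)
        if not names:  # assumed end-of-results signal: no courses on the page
            return
        yield from names
```

Passing fetch in keeps the loop testable offline with a fake function; in a real spider, Scrapy's scheduler issues the requests, so this only models the control flow.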
