node.js - Unable to complete promises due to out of memory
I have a script that scrapes ~1000 webpages. I'm using Promise.all to fire them all together, and it returns when all pages are done:
```javascript
Promise.all(urls.map(url => scrap(url)))
  .then(results => console.log('all done!', results));
```
This is sweet and correct, except for one thing - the machine goes out of memory because of the concurrent requests. I'm using jsdom for scraping, and it quickly takes up a few GB of memory, which is understandable considering it instantiates hundreds of window objects.
I have an idea for a fix, but I don't like it. That is, change the control flow to not use Promise.all, but chain my promises instead:
```javascript
let results = {};
urls.reduce((prev, cur) => prev
  .then(() => scrap(cur))
  .then(result => results[cur] = result)
  // ^ not nice.
, Promise.resolve())
.then(() => console.log('all done!', results));
```
This is not as good as Promise.all... It's not performant because the requests are chained, and the returned values have to be stored for later processing.
Any suggestions? Should I improve the control flow, should I improve memory usage in scrap(), or is there a way to let Node throttle memory allocation?
You are trying to run 1000 web scrapes in parallel. You need to pick some number significantly less than 1000 and run only N at a time so you consume less memory while doing so. You can still use a promise to keep track of when they are all done.
Bluebird's Promise.map() can do that for you by just passing a concurrency value as an option. Or, you could write it yourself.
> I have an idea for a fix, but I don't like it. That is, change the control flow to not use Promise.all, but chain my promises instead:
What you want is N operations in flight at the same time. Sequencing is the special case where N = 1, and that would often be much slower than doing some of them in parallel (perhaps with N = 10).
> This is not as good as Promise.all... It's not performant because the requests are chained, and the returned values have to be stored for later processing.
If the stored values are part of your memory problem, then you may have to store them out of memory somewhere either way. You will have to analyze how much memory the stored results are using.
> Any suggestions? Should I improve the control flow, should I improve memory usage in scrap(), or is there a way to let Node throttle memory allocation?
Use Bluebird's Promise.map() or write something similar yourself. Writing something that runs N operations in parallel and keeps the results in order is not rocket science, but it is a bit of work to get right. I've presented it before in another answer, but can't seem to find it right now. I will keep looking.
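A minimal sketch of such a hand-rolled helper, assuming the same scrap(url) function as in the question. The name mapConcurrent is illustrative; Bluebird's equivalent would be Promise.map(urls, scrap, { concurrency: limit }). It launches at most `limit` tasks at once and keeps results in input order:

```javascript
// Run fn over items with at most `limit` promises in flight at a time.
// Resolves with the results in the same order as the input array.
function mapConcurrent(items, limit, fn) {
  return new Promise((resolve, reject) => {
    const results = new Array(items.length);
    let nextIndex = 0;   // next item to launch
    let inFlight = 0;    // currently running tasks
    let failed = false;

    if (items.length === 0) return resolve([]);

    function launchNext() {
      while (inFlight < limit && nextIndex < items.length) {
        const i = nextIndex++;
        inFlight++;
        Promise.resolve(fn(items[i])).then(result => {
          results[i] = result;
          inFlight--;
          if (nextIndex >= items.length && inFlight === 0) {
            resolve(results);       // everything launched and finished
          } else {
            launchNext();           // a slot freed up; launch more
          }
        }, err => {
          if (!failed) { failed = true; reject(err); }
        });
      }
    }
    launchNext();
  });
}
```

With this, the question's flow becomes mapConcurrent(urls, 10, scrap).then(results => ...), which bounds the number of simultaneous jsdom windows without giving up parallelism.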
Found a prior related answer here: Make several requests to an API that can only handle 20 requests a minute