python - What is the proper way to import large amounts of data into a Firebase database?
I'm working with a dataset of political campaign contributions that ends up being an approximately 500MB JSON file (originally a 124MB CSV). It's far too big to import through the Firebase web interface (trying to do so crashed my tab in Google Chrome). I attempted to manually upload the objects made from the CSV (using a csvtojson converter, each row becomes a JSON object, and I upload each object to Firebase as it comes in).
Here's the code I used:
var firebase = require('firebase');
var Converter = require("csvtojson").Converter;
firebase.initializeApp({
    serviceAccount: "./credentials.json",
    databaseURL: "url went here"
});
var converter = new Converter({
    constructResult: false,
    workerNum: 4
});
var db = firebase.database();
var ref = db.ref("/");
var lastindex = 0;
var count = 0;
var section = 0;
var sectionRef;

converter.on("record_parsed", function (resultRow, rawRow, rowIndex) {
    if (rowIndex >= 0) {
        sectionRef = ref.child("reports" + section);
        var reportRef = sectionRef.child(resultRow.report_id);
        reportRef.set(resultRow);
        console.log("Report uploaded, count at " + count + ", section at " + section);
        count += 1;
        lastindex = rowIndex;
        if (count >= 1000) {
            count = 0;
            section += 1;
        }
        if (section >= 100) {
            console.log("Last completed index: " + lastindex);
            process.exit();
        }
    } else {
        console.log("We're out of indices");
        process.exit();
    }
});

var readStream = require("fs").createReadStream("./vupload_master.csv");
readStream.pipe(converter);
However, I ran into memory issues and wasn't able to complete the dataset. Trying to do it in chunks wasn't viable either, because Firebase wasn't showing all the data as uploaded and I wasn't sure where I had left off. (When leaving the Firebase database open in Chrome, I could see data coming in, but eventually the tab would crash, and upon reloading a lot of the later data was missing.)
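As an aside (this is not from the original script), one way to reduce both the per-row request overhead and the number of in-flight writes in a loader like this is Firebase's multi-path update(), which commits many children in a single request. A minimal sketch, assuming the same root ref and row shape as above; the batch size of 500 is an arbitrary illustrative choice:

// Buffer parsed rows and flush them in one multi-path update() call.
// Assumes `ref` points at the database root, as in the script above.
var batch = {};
var batchSize = 0;

function addRow(row) {
    // Each key in the fan-out object is a path relative to `ref`.
    batch["reports/" + row.report_id] = row;
    batchSize += 1;
    if (batchSize >= 500) {
        flush();
    }
}

function flush() {
    var toSend = batch;
    batch = {};
    batchSize = 0;
    // update() applies all paths in a single request.
    ref.update(toSend).catch(function (err) {
        console.error("Batch write failed:", err);
    });
}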
I also tried using Firebase Streaming Import, but it throws this error:
Started at 1469471482.77
Traceback (most recent call last):
  File "import.py", line 90, in <module>
    main(argparser.parse_args())
  File "import.py", line 20, in main
    for prefix, event, value in parser:
  File "R:\Python27\lib\site-packages\ijson\common.py", line 65, in parse
    for event, value in basic_events:
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 185, in basic_parse
    for value in parse_value(lexer):
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 127, in parse_value
    raise UnexpectedSymbol(symbol, pos)
ijson.backends.python.UnexpectedSymbol: Unexpected symbol u'\ufeff' at 0
Looking up that last line (the error from ijson), I found this thread, but I'm still not sure how I'm supposed to get Firebase Streaming Import working.
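(For reference, a byte order mark can also be stripped programmatically rather than in an editor. A minimal Node sketch; the file names are placeholders, and readFileSync pulls the whole file into memory, so this assumes the machine has RAM to spare:)

var fs = require('fs');

// The UTF-8 byte order mark is the three bytes EF BB BF at offset 0.
var buf = fs.readFileSync('input.json');       // placeholder file name
if (buf[0] === 0xEF && buf[1] === 0xBB && buf[2] === 0xBF) {
    buf = buf.slice(3);                        // drop the BOM
}
fs.writeFileSync('input_nobom.json', buf);     // placeholder file name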
I removed the byte order mark from the JSON file using vim, and now I get this error after a minute or so of running the importer:
Traceback (most recent call last):
  File "import.py", line 90, in <module>
    main(argparser.parse_args())
  File "import.py", line 20, in main
    for prefix, event, value in parser:
  File "R:\Python27\lib\site-packages\ijson\common.py", line 65, in parse
    for event, value in basic_events:
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 185, in basic_parse
    for value in parse_value(lexer):
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 116, in parse_value
    for event in parse_array(lexer):
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 138, in parse_array
    for event in parse_value(lexer, symbol, pos):
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 119, in parse_value
    for event in parse_object(lexer):
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 170, in parse_object
    pos, symbol = next(lexer)
  File "R:\Python27\lib\site-packages\ijson\backends\python.py", line 51, in Lexer
    buf += data
MemoryError
The Firebase streaming importer is supposed to be able to handle files upwards of 250MB, and I'm fairly sure I have more than enough RAM for this file. Any ideas why this error is appearing?
If seeing the actual JSON file I'm trying to upload with Firebase Streaming Import would help, here it is.
I got around the problem by giving up on Firebase Streaming Import and writing my own tool, which used csvtojson to convert the CSV and the Firebase Node API to upload each object one at a time.
Here's the script:
var firebase = require("firebase"); firebase.initializeapp({ serviceaccount: "./credentials.json", databaseurl: "https://necir-hackathon.firebaseio.com/" }); var db = firebase.database(); var ref = db.ref("/reports"); var fs = require('fs'); var converter = require("csvtojson").converter; var header = "report_id,status,cpf_id,filing_id,report_type_id,report_type_description,amendment,amendment_reason,amendment_to_report_id,amended_by_report_id,filing_date,reporting_period,report_year,beginning_date,ending_date,beginning_balance,receipts,subtotal,expenditures,ending_balance,inkinds,receipts_unitemized,receipts_itemized,expenditures_unitemized,expenditures_itemized,inkinds_unitemized,inkinds_itemized,liabilities,savings_total,report_month,ui,reimbursee,candidate_first_name,candidate_last_name,full_name,full_name_reverse,bank_name,district_code,office,district,comm_name,report_candidate_first_name,report_candidate_last_name,report_office_district,report_comm_name,report_bank_name,report_candidate_address,report_candidate_city,report_candidate_state,report_candidate_zip,report_treasurer_first_name,report_treasurer_last_name,report_comm_address,report_comm_city,report_comm_state,report_comm_zip,category,candidate_clarification,rec_count,exp_count,inkind_count,liab_count,r1_count,cpf9_count,sv1_count,asset_count,savings_account_count,r1_item_count,cpf9_item_count,sv1_item_count,filing_mechanism,also_dissolution,segregated_account_type,municipality_code,current_report_id,location,individual_or_organization,notable_contributor,currently_accessed" var queue = []; var count = 0; var upload_lock = false; var linereader = require('readline').createinterface({ input: fs.createreadstream('test.csv') }); linereader.on('line', function (line) { var line = line.replace(/'/g, "\\'"); var csvstring = header + '\n' + line; var converter = new converter({}); converter.fromstring(csvstring, function(err,result){ if (err) { var errstring = err + "\n"; fs.appendfile('converter_error_log.txt', errstring, function(err){ if (err) { console.log("converter: append log file error below:"); console.error(err); process.exit(1); } else { console.log("converter error saved"); } }); } else { result[0].location = ""; result[0].individual_or_organization = ""; result[0].notable_contributor = ""; result[0].currently_accessed = ""; var reportref = ref.child(result[0].report_id); count += 1; reportref.set(result[0]); console.log("sent #" + count); } }); });
The caveat is that although the script can send out all the objects quickly, Firebase apparently needs the connection to remain open while it saves them: closing the script right after all the objects were sent resulted in a lot of objects not appearing in the database. (I waited 20 minutes to be sure, but it might be shorter.)
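One way to avoid guessing how long to keep the process alive (this is not part of the original script) is to collect the promise each set() call returns and exit only once they have all resolved, since the promise settles when the server confirms the write. A sketch, assuming the same ref as the script above; sendReport and finish are hypothetical helpers:

// Exit only after Firebase acknowledges every write.
var pending = [];

function sendReport(report) {
    // set() returns a promise that resolves once the write is
    // committed on the server, not merely queued locally.
    pending.push(ref.child(report.report_id).set(report));
}

function finish() {
    Promise.all(pending).then(function () {
        console.log("All " + pending.length + " writes confirmed; safe to exit.");
        process.exit(0);
    });
}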