Filter or clean CSV data while loading to PostgreSQL
I am loading CSV files into PostgreSQL tables using the bulk-load COPY command. Some fields have bad characters in them (like "|", """, ";", and so on), and I keep getting different errors while loading. I have tried tab-delimited, comma-delimited, and other options, too, with no luck.
Is there a way to clean the CSV data before loading it into PostgreSQL with the COPY command, or is there COPY syntax that can replace bad characters with a default?
These are some of the variants I have tried:
copy tblsf from '/filelocation/test.csv' csv header delimiter ',' null '?';
copy tblsf from '/filelocation/test.csv' csv header delimiter '|' null '?';
copy tblsf from '/filelocation/test.csv' csv header delimiter e'\t' null '?';
copy tblsf from '/filelocation/test.csv' csv header delimiter '<>' null '?';
Thanks in advance.
Sometimes the file is not encoded in UTF-8. Try this:
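# -c tells iconv to silently drop any byte sequences that are not valid UTF-8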
iconv -f utf-8 -t utf-8 -c /filelocation/test.csv > /filelocation/test_clean.csv
and then try the PostgreSQL COPY again (the command below assumes fields are separated by commas):
copy tblsf from '/filelocation/test_clean.csv' csv header delimiter ',';
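If you are scripting the load anyway, you can also stream the cleaned file through the client instead of reading it server-side. Below is a minimal sketch using psycopg2; the connection parameters are placeholders, and tblsf is the table from this example:

import psycopg2

# Placeholder connection settings; adjust for your environment.
conn = psycopg2.connect(dbname="mydb", user="postgres")
cur = conn.cursor()

# COPY ... FROM STDIN reads the file through the client connection,
# so the file does not need to live on the database server.
with open("/filelocation/test_clean.csv") as f:
    cur.copy_expert("COPY tblsf FROM STDIN WITH CSV HEADER", f)

conn.commit()
conn.close()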
If you have a malformed file, for example:
company,owner
john's pizza, llc,john smith
burger co,jones, mike
you need to re-save the data in a corrected, properly quoted format. For example:
"company","owner" "john's pizza, llc","john smith" "burger co","jones, mike"
Once you have a clean file, you can edit it and re-save it using a different delimiter (for example in Excel, or using the csv module in Python). Before saving with the new delimiter, you will want to scrub that delimiter out of the data first. For example, in the case of pipes (|):
sed -i 's/|//g' test_clean.csv
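The sed call above strips pipes from the whole file; if you prefer to do the scrub and the re-save in one pass, here is a minimal sketch with Python's csv module (file names are placeholders):

import csv

# Read the comma-delimited clean file, drop any stray pipes from the
# data, and re-save using "|" as the delimiter so that
# copy ... delimiter '|' can load it without conflicts.
with open("/filelocation/test_clean.csv", newline="") as src, \
     open("/filelocation/test_pipe.csv", "w", newline="") as dst:
    writer = csv.writer(dst, delimiter="|")
    for row in csv.reader(src):
        writer.writerow([field.replace("|", "") for field in row])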