Is it generally better to transform semi-structured data into structured data on Hadoop if the possibility exists?
I have large, growing datasets of semi-structured data in JSON files on a Hadoop cluster. The data is benign, except that one of the keys holds a list of maps whose size can change heavily: it can vary between 0 and a few thousand maps, each with a few dozen keys of its own.
However, the data can be transformed into 2 separate tables of structured data linked by foreign keys. Both are narrow tables, and one of them is about ten times as long as the other.
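To make the transformation concrete, here is a minimal in-memory sketch of splitting one nested cart document into two narrow tables. It assumes one JSON document per line and uses hypothetical field names (`cart_id`, `customer`, `items`, `sku`, `qty`) that are not from the real data. On a cluster the same flattening would be done by a batch job (for example Spark or Hive), but the logic is identical: each nested map becomes one row that carries the parent key as a foreign key.

```python
import json

# Hypothetical input: one JSON document per cart. Field names are assumptions.
RAW = [
    '{"cart_id": 1, "customer": "alice", "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}',
    '{"cart_id": 2, "customer": "bob", "items": []}',
]

def normalize(lines):
    """Split nested cart documents into a cart table and an item table.

    Each item row carries the parent cart_id as a foreign key, so a cart with
    zero items still yields one cart row and simply no item rows.
    """
    carts, items = [], []
    for line in lines:
        doc = json.loads(line)
        carts.append({"cart_id": doc["cart_id"], "customer": doc["customer"]})
        for pos, item in enumerate(doc.get("items", [])):
            # Copy every key of the nested map, then attach the foreign key
            # and the item's position inside the original list.
            row = dict(item)
            row["cart_id"] = doc["cart_id"]
            row["position"] = pos
            items.append(row)
    return carts, items

carts, items = normalize(RAW)
print(carts)
print(items)
```

The variable-length list no longer matters in this layout: a cart with thousands of items just contributes thousands of short rows to the longer table.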
I can either keep the data in its semi-structured format and store it in a wide-column store like HBase, or alternatively use a columnar storage format like Parquet to store the data in 2 large relational tables.
It is unlikely that the data format will change, but it can't be ruled out.
I'm new to Hadoop and big data, so which of the 2 possibilities is preferable? Should semi-structured data be changed into structured data if the possibility exists and the data format is constant?
Edit: additional info as requested by Rahul Sharma.
The data consists of shopping carts from shopping software; the variable length comes from the variable number of items in the carts. The data was originally in XML format and was transformed into JSON, not by me, and I have no control over that step.
No real-time analytics are planned, only batch analytics.
As for the relationship between the two tables: one table holds the customer/purchase info, while the other holds the purchased items. Both are linked by a fitting key.
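For batch analytics, the one-to-many relationship is recovered with a join on that key. The sketch below is a small in-memory stand-in for what a batch engine would do: a left join of purchases to purchased items followed by a per-purchase sum. The table contents and column names (`cart_id`, `customer`, `sku`, `qty`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical, already-normalized tables linked by the assumed key "cart_id".
purchases = [
    {"cart_id": 1, "customer": "alice"},
    {"cart_id": 2, "customer": "bob"},
]
purchased_items = [
    {"cart_id": 1, "sku": "A1", "qty": 2},
    {"cart_id": 1, "sku": "B7", "qty": 1},
]

def items_per_purchase(purchases, purchased_items):
    """Left-join purchases to their items and sum quantities per purchase.

    Purchases without any items keep a total of 0, mirroring a SQL LEFT JOIN
    followed by GROUP BY with COALESCE(SUM(qty), 0).
    """
    totals = defaultdict(int)
    for item in purchased_items:
        totals[item["cart_id"]] += item["qty"]
    # .get() on a defaultdict does not insert missing keys; it returns the default given here.
    return {p["cart_id"]: (p["customer"], totals.get(p["cart_id"], 0)) for p in purchases}

print(items_per_purchase(purchases, purchased_items))
```

Using a left join rather than an inner join matters here because empty carts (0 items) would otherwise disappear from the result.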
I hope this helps.