hadoop - Apache Spark on EMR 10 node cluster for 150TB of data not completing


I have an S3 bucket from which I take data and save it to HDFS on a different EMR cluster. I then read these files stored on HDFS with Apache Spark, perform joins and data filtering, and save the resultant dataset of around 150 TB in CSV format back to HDFS. The operation is taking forever. I am using 64 executors, executor memory of 120 GB, and driver memory of 100 GB. I am using the Databricks spark-csv package to save the data as CSV.
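Roughly, the job looks like the sketch below; the HDFS paths, column names, and join key are placeholders, and the real job has more joins and filters:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object JoinAndSave {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("JoinFilterSaveCsv")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)

        // Read the input files previously copied from S3 to HDFS.
        // "hdfs:///data/left" and "hdfs:///data/right" are placeholder paths.
        val left = sqlContext.read.format("com.databricks.spark.csv")
          .option("header", "true")
          .load("hdfs:///data/left")
        val right = sqlContext.read.format("com.databricks.spark.csv")
          .option("header", "true")
          .load("hdfs:///data/right")

        // Join and filter; "id" and "status" are placeholder column names.
        val result = left.join(right, left("id") === right("id"))
          .filter(left("status") === "ACTIVE")

        // Save the ~150 TB result back to HDFS as CSV via spark-csv.
        result.write.format("com.databricks.spark.csv")
          .option("header", "true")
          .save("hdfs:///output/result")
      }
    }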

The rest of the Hadoop settings of the EMR cluster running Spark are left at their defaults.
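For reference, the submit command is along these lines (the class name, jar name, and spark-csv version are placeholders; everything else is left at EMR defaults):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 64 \
      --executor-memory 120G \
      --driver-memory 100G \
      --packages com.databricks:spark-csv_2.10:1.5.0 \
      --class com.example.JoinAndSave \
      my-job.jar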

While running spark-submit, I get the following error:

ERROR YarnScheduler: Lost executor 5 on ip-xx-xx-xx.ec2.internal: Container marked as failed: container_14687884542720_0157_01_000006 on host: ip-xx-xx-xx.ec2.internal. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

Kindly point me in the right direction to fix this.

