hadoop - Apache Spark on EMR 10-node cluster for 150 TB of data not completing
I have an S3 bucket from which I take data and save it to HDFS on a separate EMR cluster. I then read these stored files from HDFS with Apache Spark, perform joins and data filtering, and save the resulting dataset of around 150 TB in CSV format back to HDFS. The operation takes forever. I am using 64 executors, executor memory of 120 GB, and driver memory of 100 GB. I am using the Databricks spark-csv package to save the data as CSV.
The rest of the Hadoop settings on the EMR cluster running Spark are at their defaults.
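For reference, a spark-submit invocation matching the settings described above might look like the sketch below. Only the executor count and memory figures come from the question; the class name, jar name, paths, and spark-csv package version are placeholders I have assumed for illustration:

```shell
# Sketch only: a spark-submit matching the configuration described above.
# Assumptions (not from the original post): the job class, jar, I/O paths,
# and the spark-csv package coordinates/version.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 64 \
  --executor-memory 120g \
  --driver-memory 100g \
  --packages com.databricks:spark-csv_2.11:1.5.0 \
  --class com.example.JoinAndFilterJob \
  my-job.jar \
  hdfs:///input/path \
  hdfs:///output/path
```

Note that with 120 GB per executor, the executor memory plus YARN's per-container memory overhead must still fit within the node's YARN container limit, or YARN will kill the container.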
While running spark-submit, I get the following error:
ERROR YarnScheduler: Lost executor 5 on ip-xx-xx-xx.ec2.internal: Container marked as failed: container_14687884542720_0157_01_000006 on host: ip-xx-xx-xx.ec2.internal. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143. Container exited with a non-zero exit code 143. Killed by external signal
Kindly point me in the right direction to fix this.