mapreduce - Resource Manager does not transition to active state from standby
One Spark job had been running for more than 23 days and caused the Resource Manager to crash. After restarting the Resource Manager instances (there are 2 of them in our cluster), both of them stayed in standby state, and we are getting this error:
ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to load/recover state
org.apache.hadoop.yarn.exceptions.YarnException: Application with id application_1470300000724_40101 is already present! Cannot add a duplicate!
We cannot kill 'application_1470300000724_40101' because the YARN Resource Manager is not working. Killing the Resource Manager instances at the Unix level on the nodes didn't work, and we have tried rebooting the nodes, but the result is still the same.
Somewhere an entry for that job is still stored, and it is preventing either Resource Manager from being elected active. We are using Cloudera 5.3.0, and I can see the issue has been addressed and resolved in Cloudera 5.3.3, but at the moment we need a workaround to get past it.
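To confirm what each instance thinks its state is, you can query them with yarn rmadmin (a quick check, assuming the HA ids are rm1 and rm2; the actual ids come from yarn.resourcemanager.ha.rm-ids in yarn-site.xml):

yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

In the situation described above, both commands report standby.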
To resolve this issue you can format the RMStateStore by executing the command below:
yarn resourcemanager -format-state-store
But be careful: it will clear the history of all applications that were executed before this command is run.
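For reference, a rough sequence of steps (a sketch only, assuming shell access to the Resource Manager hosts; on a Cloudera-managed cluster the stop/start would normally be done through Cloudera Manager, and the init-script name below is an assumption that varies by installation type):

1. Stop both Resource Manager instances (through Cloudera Manager, or e.g. 'service hadoop-yarn-resourcemanager stop' on each RM host for a package-based install).
2. Format the state store once:
   yarn resourcemanager -format-state-store
3. Start both Resource Managers again and confirm one of them becomes active:
   yarn rmadmin -getServiceState rm1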