mapreduce - Resource manager does not transition to active state from standby -


One Spark job that had been running for more than 23 days caused the ResourceManager to crash. After restarting the ResourceManager instances (there are 2 of them in our cluster), both of them stayed in standby state.

We are getting this error:

ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to load/recover state
org.apache.hadoop.yarn.exceptions.YarnException: Application with id application_1470300000724_40101 is already present! Cannot add a duplicate!

We cannot kill 'application_1470300000724_40101' because the YARN ResourceManager is not working. Killing the instances at the Unix level on the nodes didn't work. We have tried rebooting the nodes, and it is still the same.

Somewhere an entry for that job is still there, preventing either ResourceManager from being elected active. We are using Cloudera 5.3.0, and we can see this issue has been addressed and resolved in Cloudera 5.3.3. At the moment we need a workaround to get past this.
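For what it's worth, one way to confirm the stale entry, assuming the cluster uses the ZooKeeper-based RMStateStore and that yarn.resourcemanager.zk-state-store.parent-path is left at its default of /rmstore (neither is stated in the question), is to look for the application's znode directly:

# Sketch only: assumes a ZKRMStateStore at the default parent path /rmstore;
# replace zkhost:2181 with an actual ZooKeeper quorum member.
zkCli.sh -server zkhost:2181
ls /rmstore/ZKRMStateRoot/RMAppRoot
# A stale application_1470300000724_40101 entry listed here would explain
# the "Cannot add a duplicate!" failure on recovery.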

To resolve the issue, you can format the RMStateStore by executing the command below:

yarn resourcemanager -format-state-store 

But be careful: it will clear the history of every application executed before running this command.
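As a rough sketch of the whole sequence, assuming shell access to the ResourceManager hosts (on a Cloudera-managed cluster the stop/start steps would normally be done through Cloudera Manager instead, and the 'yarn' service user is an assumption):

# 1. Stop both ResourceManager instances so nothing is writing to the state store.
# 2. Format the RMStateStore, running as the YARN service user:
sudo -u yarn yarn resourcemanager -format-state-store
# 3. Start both ResourceManagers again; with the duplicate
#    application_1470300000724_40101 entry gone, one of them
#    should transition to active.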

