What feature you would like to be added?
Add failure_reason as a dimension (label) to spark_application_failed_submission_count:
spark_application_failed_submission_count{failure_reason="<reason>"}
Why is this needed?
Some submission failures have obvious causes that have nothing to do with the operator, especially cluster capacity issues. When Spark applications are submitted at a high request rate (RPS), this metric shoots up, so we cannot use it as a signal for alerting.
The idea is to add a dimension so that the metric gives the correct signal for each failure reason.
Describe the solution you would like
ErrorMessage is already in ApplicationState.
What if we assign a code to each failure and add it to the failure result?
When incrementing the failed counter, we would derive the reason from that code.
We would only need to update the m.incFailedSubmissionCount(newApp) function and maybe add some constants.
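A minimal sketch of the idea above, using only the Go standard library. Everything here is an assumption for illustration: the reason constants, the substrings matched against ErrorMessage, and the failedSubmissionCounter type (a stand-in for a labelled counter) are hypothetical and not part of the operator today. The real change would live inside m.incFailedSubmissionCount(newApp).

```go
package main

import (
	"strings"
	"sync"
)

// Hypothetical failure_reason label values (assumption, not existing operator constants).
// Keeping the set small and fixed avoids unbounded label cardinality.
const (
	ReasonQuotaExceeded = "quota_exceeded"
	ReasonAlreadyExists = "already_exists"
	ReasonTimeout       = "timeout"
	ReasonUnknown       = "unknown"
)

// classifyFailureReason maps a free-form error message (such as the
// ErrorMessage in ApplicationState) to one of the fixed reason codes.
// The matched substrings are illustrative guesses, not an exhaustive list.
func classifyFailureReason(errorMessage string) string {
	msg := strings.ToLower(errorMessage)
	switch {
	case strings.Contains(msg, "exceeded quota"):
		return ReasonQuotaExceeded
	case strings.Contains(msg, "already exists"):
		return ReasonAlreadyExists
	case strings.Contains(msg, "timed out"), strings.Contains(msg, "timeout"):
		return ReasonTimeout
	default:
		return ReasonUnknown
	}
}

// failedSubmissionCounter is a hypothetical stand-in for a counter
// with a single failure_reason label.
type failedSubmissionCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newFailedSubmissionCounter() *failedSubmissionCounter {
	return &failedSubmissionCounter{counts: make(map[string]int)}
}

// inc classifies the error message, increments the counter for that
// reason, and returns the reason that was used as the label value.
func (c *failedSubmissionCounter) inc(errorMessage string) string {
	reason := classifyFailureReason(errorMessage)
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[reason]++
	return reason
}

// get returns the current count for a given reason.
func (c *failedSubmissionCounter) get(reason string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[reason]
}
```

With something like this, alerts could exclude capacity-related reasons such as quota_exceeded and fire only on the remaining ones.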
Describe alternatives you have considered
No response
Additional context
Enrich the failure reason reported for spark-submit failures.
Love this feature?
Give it a 👍 We prioritize the features with most 👍