
Failure reason as dimension in spark_application_failed_submission_count #2904

@dineshkumar181094

Description


What feature you would like to be added?

Add failure_reason as a dimension (label) to spark_application_failed_submission_count:
spark_application_failed_submission_count{failure_reason=}

Why is this needed?

Some submission failures have obvious causes that have nothing to do with the operator, especially cluster capacity issues. If we submit Spark applications at a high request rate (RPS), this metric shoots up, so we cannot use it as an alerting signal.

For now, the idea is to add a dimension so that the metric gives the correct signal for each failure reason.

Describe the solution you would like

ErrorMessage is already in ApplicationState.
What if we assign a code to each kind of failure and include it in the failure result?
When incrementing the failed counter, we would look up the reason from that code.
We would only need to update the m.incFailedSubmissionCount(newApp) function and maybe add some constants.
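
A minimal sketch of the idea in Go, under loud assumptions: the reason constants, the substring patterns, and the names classifyFailure and failedSubmissionCounter are all hypothetical and do not come from the operator's code. A mutex-guarded map stands in for a labeled counter so the example has no dependencies; in the operator the count would presumably be kept in a metric with a failure_reason label instead. The point is only to show how a free-form ErrorMessage could be reduced to a small, fixed set of label values.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Hypothetical reason codes. The issue only says "some constants";
// these names are illustrative, not defined by the operator.
const (
	ReasonInsufficientCapacity = "insufficient_capacity"
	ReasonQuotaExceeded        = "quota_exceeded"
	ReasonInvalidSpec          = "invalid_spec"
	ReasonUnknown              = "unknown"
)

// reasonPatterns maps lowercase substrings of an error message to a
// reason code. The substrings are guesses at typical messages and are
// checked in order, so more specific patterns should come first.
var reasonPatterns = []struct{ substr, reason string }{
	{"exceeded quota", ReasonQuotaExceeded},
	{"insufficient", ReasonInsufficientCapacity},
	{"invalid", ReasonInvalidSpec},
}

// classifyFailure reduces a free-form error message to one of a fixed
// set of reason codes, keeping the label's cardinality bounded.
func classifyFailure(errorMessage string) string {
	msg := strings.ToLower(errorMessage)
	for _, p := range reasonPatterns {
		if strings.Contains(msg, p.substr) {
			return p.reason
		}
	}
	return ReasonUnknown
}

// failedSubmissionCounter is a stand-in for a counter with a
// failure_reason label: one count per reason code.
type failedSubmissionCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newFailedSubmissionCounter() *failedSubmissionCounter {
	return &failedSubmissionCounter{counts: make(map[string]int)}
}

// inc classifies the error message and increments the matching count.
func (c *failedSubmissionCounter) inc(errorMessage string) {
	reason := classifyFailure(errorMessage)
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[reason]++
}

func main() {
	c := newFailedSubmissionCounter()
	c.inc("pods \"driver\" is forbidden: exceeded quota: compute")
	c.inc("0/3 nodes are available: Insufficient cpu")
	c.inc("some unexpected error")

	for _, r := range []string{ReasonQuotaExceeded, ReasonInsufficientCapacity, ReasonInvalidSpec, ReasonUnknown} {
		fmt.Printf("failure_reason=%q count=%d\n", r, c.counts[r])
	}
}
```

Mapping to a fixed set of codes, with an explicit unknown fallback, keeps the number of label values small; using the raw ErrorMessage as the label value would create a new time series for every distinct message.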

Describe alternatives you have considered

No response

Additional context

Enrich the failure reason reported for spark-submit failures.

Love this feature?

Give it a 👍 We prioritize the features with most 👍
