Add spark_application_submit_latency_seconds metric to measure operator submission performance #2911

### What feature you would like to be added?

Add a new Prometheus metric, `spark_application_submit_latency_seconds`, that tracks SparkApplication submission latency: the time from application creation until the application reaches the submitted state. This gives visibility into operator submission performance and helps identify bottlenecks in the job submission pipeline.

### Why is this needed?

The existing `spark_application_start_latency_seconds` metric includes factors outside the operator's control (Kubernetes scheduler, resource availability, YuniKorn queues, image pulls, pod initialization).

**Problem:** When start latency is high, we can't tell whether the operator is slow or cluster resources are constrained.

**Solution:** Compare both metrics:
- `submit_latency` = 2s, `start_latency` = 5min → Infrastructure issue (scale cluster)
- `submit_latency` = 4min, `start_latency` = 5min → Operator issue (tune operator)

This enables:
- Accurate operator SLA monitoring (separate from infrastructure)
- Root cause analysis (operator vs. K8s vs. queue saturation)
- Better capacity planning

### Describe the solution you would like

Add `spark_application_submit_latency_seconds` metric:
- Measures: Creation → Submitted state (operator work only)
- Includes: Summary (percentiles) + Histogram (distribution)
- Default buckets: `[0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256]` seconds (exponential, typical range 0.5-8s)
- Flag: `--metrics-job-submit-latency-buckets` (makes the histogram buckets configurable)
- Recorded on first submission only (`SubmissionAttempts == 1`)

### Describe alternatives you have considered

- Use existing metrics only → Can't isolate operator performance
- Parse operator logs for timestamps → Not suitable for dashboards/alerts

### Additional context

_No response_

### Love this feature?

Give it a 👍 We prioritize the features with most 👍
