Merged
Changes from 1 commit
2 changes: 1 addition & 1 deletion enterprise/analytics.mdx
@@ -55,9 +55,9 @@

Once the deployment status shows **Ready**, navigate to `https://analytics.app.<your-base-domain>`.

Click the **Continue with Keycloak** button:

![Laminar Keycloak Auth](./images/laminar-keycloak-auth.png)

## Create a Laminar project

@@ -71,7 +71,7 @@

Important: Always use ingest API keys when deploying.

- Create a key with ther right permissions. Ingest only keys are recommended as they only have write access to write traces. They cannot be used to read data.
+ Create a key with the right permissions. Ingest only keys are recommended as they only have write access to write traces. They cannot be used to read data.

![Configure Laminar Ingest Only Key](./images/laminar-ingest-only-key.png)

2 changes: 1 addition & 1 deletion openhands/usage/use-cases/spark-migrations.mdx
@@ -17,7 +17,7 @@

Spark version upgrades are deceptively difficult. The [Spark 3.0 migration guide](https://spark.apache.org/docs/latest/migration-guide.html) alone documents hundreds of behavioral changes, deprecated APIs, and removed features, and many of these changes are _semantic_. That means the same code compiles and runs but produces different results across different Spark versions: for example, a date parsing expression that worked correctly in Spark 2.4 may silently return different values in Spark 3.x due to the switch from the Julian calendar to the Gregorian calendar.
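
To make the calendar example concrete, here is a minimal pure-Python sketch (no Spark required) of why the same pre-1582 date label can refer to a different day. It assumes the legacy behavior interprets dates before 1582-10-15 on the Julian calendar, while Python's `datetime.date`, like Spark 3.x, is proleptic Gregorian. The helper name `julian_to_proleptic_gregorian` is hypothetical and exists only for illustration.

```python
from datetime import date

# Offset between Julian Day Numbers and Python's proleptic Gregorian ordinals:
# date(1, 1, 1).toordinal() == 1 corresponds to JDN 1721426.
_JDN_TO_ORDINAL = 1721425


def julian_to_proleptic_gregorian(year: int, month: int, day: int) -> date:
    """Reinterpret a Julian calendar date as a proleptic Gregorian date."""
    # Standard integer conversion from a Julian calendar date to a Julian Day Number.
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    return date.fromordinal(jdn - _JDN_TO_ORDINAL)


if __name__ == "__main__":
    # The same day carries two different labels depending on the calendar.
    print(julian_to_proleptic_gregorian(1582, 10, 5))
    print(julian_to_proleptic_gregorian(1000, 1, 1))
```

Dates after the 1582 cutover are unaffected in this sketch, which is why regressions of this kind tend to surface only in historical or sentinel date values.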

Version upgrades are also made difficult due to the scale of typical enterprise Spark codebases. When you have dozens of jobs across ETL, reporting, and ML pipelines, each with its own combination of DataFrame operations, UDFs, and configuration, manual migration stops scaling well and becomes prone to subtle regressions.

Spark migration requires careful analysis, targeted code changes, and thorough validation to ensure that migrated pipelines produce identical results. The migration needs to be driven by an experienced data engineering team, but even that isn't sufficient to ensure the job is done quickly or without regressions. This is where OpenHands comes in.

@@ -31,7 +31,7 @@

## Understanding

- Before changin any code, it helps to build a clear picture of what is affected and where the risk is concentrated. Spark migrations touch a large surface area, between API deprecations, behavioral changes, configuration defaults, and dependency versions, and the interactions between them are hard to reason about manually.
+ Before changing any code, it helps to build a clear picture of what is affected and where the risk is concentrated. Spark migrations touch a large surface area, between API deprecations, behavioral changes, configuration defaults, and dependency versions, and the interactions between them are hard to reason about manually.

Apache releases detailed lists of changes between each major and minor version of Spark. OpenHands can utilize this list of changes while scanning your codebase to produce a structured inventory of everything that needs attention. This inventory becomes the foundation for the migration itself, helping you prioritize work and track progress.
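
As a rough illustration of what such an inventory scan could look like, the sketch below walks a source file line by line and records deprecated API calls in a shape similar to the `deprecated_apis` entries shown in this document. It is an assumption-laden toy, not how OpenHands works internally: the `DEPRECATED_APIS` table, the `scan_source` function, and the `file` key are all hypothetical, and only the `registerTempTable` rule comes from the example inventory.

```python
import re

# Hypothetical rule table. Only the registerTempTable rule is taken from the
# example inventory; a real scan would be driven by the full migration guide.
DEPRECATED_APIS = [
    (re.compile(r"\.registerTempTable\("), ".createOrReplaceTempView("),
]


def scan_source(path: str, source: str) -> dict:
    """Return an inventory entry listing deprecated API calls found in source."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for pattern, replacement in DEPRECATED_APIS:
            if pattern.search(line):
                findings.append({
                    "line": line_no,
                    "current": line.strip(),
                    "replacement": pattern.sub(replacement, line).strip(),
                })
    # "file" is an assumed key name, used here only to label the entry.
    return {"file": path, "deprecated_apis": findings}


if __name__ == "__main__":
    example = 'df = spark.read.parquet("in")\ndf.registerTempTable("temp")\n'
    print(scan_source("jobs/etl.py", example))
```

A pattern table like this only catches syntactic deprecations; behavioral changes such as date parsing need review against test data rather than a regex match.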

@@ -57,10 +57,10 @@
"deprecated_apis": [
{"line": 42, "current": "df.registerTempTable(\"temp\")", "replacement": "df.createOrReplaceTempView(\"temp\")"}
],
"behavioral_changes": [
{"line": 78, "description": "to_date() uses proleptic Gregorian calendar in Spark 3.x; verify date handling with test data"}
],
"config_changes": [],
"risk": "medium"
},
...
@@ -91,7 +91,7 @@
2. For behavioral changes (especially date handling and CSV parsing), add explicit configuration to preserve Spark 2.4 behavior where needed (e.g., spark.sql.legacy.timeParserPolicy=LEGACY)
3. Update build.sbt / pom.xml dependencies to Spark 3.0 compatible versions
4. Replace RDD-based operations with DataFrame/Dataset equivalents where practical
5. Replace UDFs with built-in Spark SQL functions where a direct equivalent exists
6. Update import statements for any relocated classes
7. Preserve all existing business logic and output schemas
```
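
Step 2 of the prompt above asks for explicit configuration that preserves Spark 2.4 behavior. One way this might be wired into a job launcher is sketched below, assuming jobs are started through `spark-submit` and its `--conf key=value` option. The `LEGACY_CONF` table and the `spark_submit_command` helper are hypothetical names; the only setting shown is the one quoted in the prompt.

```python
# Settings that keep Spark 2.4 semantics; the single entry is the one named
# in the migration prompt. Add others only after checking the migration guide.
LEGACY_CONF = {
    "spark.sql.legacy.timeParserPolicy": "LEGACY",
}


def spark_submit_command(app: str, conf: dict) -> list:
    """Build a spark-submit argument list with one --conf flag per setting."""
    cmd = ["spark-submit"]
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    print(" ".join(spark_submit_command("daily_etl.py", LEGACY_CONF)))
```

Keeping the legacy settings in one table makes it easier to remove them job by job once validation shows the new behavior is acceptable.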
@@ -125,10 +125,10 @@
"name": "daily_etl",
"data_match": true,
"row_count": {"v2": 1000000, "v3": 1000000},
"column_diffs": [],
"performance": {
"duration_seconds": {"v2": 340, "v3": 285},
"shuffle_bytes": {"v2": "2.1GB", "v3": "1.8GB"}
}
},
...
@@ -138,13 +138,13 @@

Note this prompt relies on existing data in `/test_data`. This can be generated by standard fuzzing tools, but in a pinch OpenHands can also help construct synthetic data that stresses the potential corner cases in the relevant systems.

Every migration is unique, and developer experience is crucial to ensure the testing strategy covers your organization's requirements. Pay particular attention to jobs that involve date arithmetic, decimal precision in financial calculations, or custom UDFs that may depend on Spark internals. A solid validation suite not only ensures the migrated code works as expected, but also builds the organizational confidence needed to deploy the new version to production.
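
The comparison behind a report entry like the one above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the validation harness itself: it assumes both outputs fit in memory as lists of dict rows and that row order is not significant. The `compare_outputs` function is a hypothetical name, and the result keys mirror the `name`, `data_match`, `row_count`, and `column_diffs` fields from the example report.

```python
from collections import Counter


def compare_outputs(name: str, v2_rows: list, v3_rows: list) -> dict:
    """Compare two job outputs (lists of dict rows), ignoring row order."""
    columns = sorted({col for row in v2_rows + v3_rows for col in row})

    def row_key(row: dict) -> tuple:
        # repr() keeps unhashable values (lists, dicts) usable as Counter keys.
        return tuple(repr(row.get(col)) for col in columns)

    # Whole-row multiset comparison decides whether the outputs match.
    rows_match = Counter(map(row_key, v2_rows)) == Counter(map(row_key, v3_rows))

    # Per-column multiset comparison points at where the differences are.
    column_diffs = [
        col
        for col in columns
        if Counter(repr(r.get(col)) for r in v2_rows)
        != Counter(repr(r.get(col)) for r in v3_rows)
    ]
    return {
        "name": name,
        "data_match": rows_match,
        "row_count": {"v2": len(v2_rows), "v3": len(v3_rows)},
        "column_diffs": column_diffs,
    }
```

For example, calling `compare_outputs("daily_etl", v2_rows, v3_rows)` on two outputs that differ only in a date column would report `data_match` as false and list that column under `column_diffs`. At production scale the same idea would more likely be expressed as a DataFrame-level diff than as in-memory lists.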

## Beyond Version Upgrades

While this document focuses on Spark version upgrades, the same Understanding → Migration → Validation workflow applies to other Spark migration scenarios:

- **Cloud platform migrations** (e.g., EMR to Databricks, on-premises to Dataproc): The "understanding" step inventories platform-specific code (S3 paths, IAM roles, EMR bootstrap scripts), the migration step converts them to the target platform's equivalents, and validation confirms that jobs produce identical output in the new environment.

- **Framework migrations** (MapReduce, Hive, or Pig to Spark): The "understanding" step maps the existing framework's operations to Spark equivalents, the migration step performs the conversion, and validation compares outputs between the old and new frameworks.

In each case, the key principle is the same: build a structured inventory of what needs to change, apply targeted transformations, and validate rigorously before deploying.