Skip to content

Update Salesforce Connector To Create Source Models#9491

Open
cwarden wants to merge 4 commits into
rilldata:mainfrom
cwarden:update-salesforce-connector
Open

Update Salesforce Connector To Create Source Models#9491
cwarden wants to merge 4 commits into
rilldata:mainfrom
cwarden:update-salesforce-connector

Conversation

@cwarden

@cwarden cwarden commented May 22, 2026

Copy link
Copy Markdown
Contributor

Update the Salesforce driver to create salesforce connectors and source models rather than the deprecated sources. Re-enable it in the Add Data modal.

Update the Salesforce connector to use OAuth Username/Password flow because SOAP login is being decommissioned. Client Credentials and JWT flows are also supported.

Update Salesforce models to use Bulk API 2.0, which doesn't require the sobject property, allowing it to work with the New Model modal.

cwarden added 2 commits May 22, 2026 09:17
The Salesforce connector now uses Salesforce's OAuth 2.0 endpoints
(SOAP login is being decommissioned) and fits the warehouse-style
connector + model flow used by other warehouses.

- Authentication: drop SOAP login; pick between OAuth password flow,
  client credentials flow, and JWT bearer based on which credentials
  are populated. Adds a required `client_secret` for username/password
  authentication.
- Schema: mark Salesforce as a warehouse so credentials are written to
  `connectors/<name>.yaml` and the SOQL query lives in
  `models/<name>.yaml`. The `key` field is uploaded as a file and
  base64-encoded before being stored in `.env` so embedded newlines
  do not break dotenv parsing; the driver accepts raw PEM as well.
- Driver: accepts `sql:` as an alias for `soql:` to match the model
  shape produced by the explorer.
- Bulk API 2.0: replace the v1 query path with `Bulk2QueryJob`. v2
  manages chunking server-side and derives the SObject from the query
  itself, so `sobject:` is no longer a required model property.
- SELECT * rewrite: SOQL does not accept `SELECT *`, but the connector
  explorer's "Table" mode produces it. The driver detects `SELECT * FROM
  <SObject>` (with optional WHERE / ORDER BY / LIMIT), calls
  DescribeSObject, and rewrites the query into an explicit field list.
  Compound `address` / `location` types are skipped — their atomic
  components remain because Salesforce exposes them as their own fields.
- Information schema: implement `ListDatabaseSchemas` / `ListTables` /
  `GetTable` via `force.ListSobjects` and `force.DescribeSObject`, so
  the connector explorer can browse SObjects as tables and clear the
  "does not implement information schema" error.
- Shared auth helpers: extract `connection.authOptions()` and
  `sourceProperties.applyOverrides()` so InformationSchema, Ping, and
  QueryAsFiles all build their auth options the same way.
Comment thread runtime/drivers/salesforce/bulk2.go Outdated
}
}

page, err := j.session.GetBulk2QueryResultsWithContext(ctx, j.jobID, j.locator, 0)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetBulk2QueryResultsWithContext is called with maxRecords=0, which omits the maxRecords parameter from the request and leaves the page size up to Salesforce. The library reads the entire response body into memory (resp.ReadResponseBody is a []byte), so a single page — potentially the whole result set — is buffered in RAM before being written to a temp file. The previous Bulk 1.0 implementation streamed result bodies to disk and split large jobs with PK chunking at 100,000 records per batch, so this is a memory regression for exactly the large extracts the Bulk API is intended for.

Consider passing an explicit maxRecords (e.g. 100000) to bound per-page memory usage.


Developed in collaboration with Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 1c31408. Set maxRecords to 100,000, and also switched to copying directly from the response body to the temp file.

@k-anshul

Copy link
Copy Markdown
Member

Since you maintain ForceCLI/force, would it be feasible to break the libbeeep dependency chain upstream? lib/force.go imports ForceCLI/force/desktop only for the desktop.Open call in the interactive OAuth flow, but desktop/notify.go in the same package imports gen2brain/beeep, which pulls nine desktop notification/GUI modules (systray, go-toast, dbus, icns, image codecs, etc.) into Rill's module graph. Splitting the notification helpers into a separate package would let a future version bump drop them from go.mod entirely. Not blocking for this PR — the linker eliminates most of the unreachable code (though some dbus init code survives in Linux builds) — but it would keep the notification stack out of dependency and vulnerability scans.


Developed in collaboration with Claude Code

Update github.com/ForceCLI/force to v1.11.0 to drop dependencies on
notification packages.
@cwarden

cwarden commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

Since you maintain ForceCLI/force, would it be feasible to break the libbeeep dependency chain upstream? lib/force.go imports ForceCLI/force/desktop only for the desktop.Open call in the interactive OAuth flow, but desktop/notify.go in the same package imports gen2brain/beeep, which pulls nine desktop notification/GUI modules (systray, go-toast, dbus, icns, image codecs, etc.) into Rill's module graph. Splitting the notification helpers into a separate package would let a future version bump drop them from go.mod entirely. Not blocking for this PR — the linker eliminates most of the unreachable code (though some dbus init code survives in Linux builds) — but it would keep the notification stack out of dependency and vulnerability scans.

Good idea. Addressed in e0f8ea3.

Use the new GetBulk2QueryResultsWithCallback streaming API in force
v1.12.0 to copy each results page straight to a temp file instead of
buffering the whole page in memory. Also pass an explicit maxRecords of
100,000 so Salesforce bounds the per-page size rather than potentially
returning the entire result set in one page.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants