Update Salesforce Connector To Create Source Models#9491
Conversation
The Salesforce connector now uses Salesforce's OAuth 2.0 endpoints (SOAP login is being decommissioned) and fits the warehouse-style connector + model flow used by other warehouses. - Authentication: drop SOAP login; pick between OAuth password flow, client credentials flow, and JWT bearer based on which credentials are populated. Adds a required `client_secret` for username/password authentication. - Schema: mark Salesforce as a warehouse so credentials are written to `connectors/<name>.yaml` and the SOQL query lives in `models/<name>.yaml`. The `key` field is uploaded as a file and base64-encoded before being stored in `.env` so embedded newlines do not break dotenv parsing; the driver accepts raw PEM as well. - Driver: accepts `sql:` as an alias for `soql:` to match the model shape produced by the explorer.
- Bulk API 2.0: replace the v1 query path with `Bulk2QueryJob`. v2 manages chunking server-side and derives the SObject from the query itself, so `sobject:` is no longer a required model property. - SELECT * rewrite: SOQL does not accept `SELECT *`, but the connector explorer's "Table" mode produces it. The driver detects `SELECT * FROM <SObject>` (with optional WHERE / ORDER BY / LIMIT), calls DescribeSObject, and rewrites the query into an explicit field list. Compound `address` / `location` types are skipped — their atomic components remain because Salesforce exposes them as their own fields. - Information schema: implement `ListDatabaseSchemas` / `ListTables` / `GetTable` via `force.ListSobjects` and `force.DescribeSObject`, so the connector explorer can browse SObjects as tables and clear the "does not implement information schema" error. - Shared auth helpers: extract `connection.authOptions()` and `sourceProperties.applyOverrides()` so InformationSchema, Ping, and QueryAsFiles all build their auth options the same way.
| } | ||
| } | ||
|
|
||
| page, err := j.session.GetBulk2QueryResultsWithContext(ctx, j.jobID, j.locator, 0) |
There was a problem hiding this comment.
GetBulk2QueryResultsWithContext is called with maxRecords=0, which omits the maxRecords parameter from the request and leaves the page size up to Salesforce. The library reads the entire response body into memory (resp.ReadResponseBody is a []byte), so a single page — potentially the whole result set — is buffered in RAM before being written to a temp file. The previous Bulk 1.0 implementation streamed result bodies to disk and split large jobs with PK chunking at 100,000 records per batch, so this is a memory regression for exactly the large extracts the Bulk API is intended for.
Consider passing an explicit maxRecords (e.g. 100000) to bound per-page memory usage.
Developed in collaboration with Claude Code
There was a problem hiding this comment.
Addressed in 1c31408. Set maxRecords to 100,000, and also switched to copying directly from the response body to the temp file.
|
Since you maintain Developed in collaboration with Claude Code |
Update github.com/ForceCLI/force to v1.11.0 to drop dependencies on notification packages.
Good idea. Addressed in e0f8ea3. |
Use the new GetBulk2QueryResultsWithCallback streaming API in force v1.12.0 to copy each results page straight to a temp file instead of buffering the whole page in memory. Also pass an explicit maxRecords of 100,000 so Salesforce bounds the per-page size rather than potentially returning the entire result set in one page.
Update the Salesforce driver to create salesforce connectors and source models rather than the deprecated sources. Re-enable it in the Add Data modal.
Update the Salesforce connector to use OAuth Username/Password flow because SOAP login is being decommissioned. Client Credentials and JWT flows are also supported.
Update Salesforce models to use Bulk API 2.0, which doesn't require the sobject property, allowing it to work with the New Model modal.