Backfill users.name from oauth_github.login#13547
Conversation
`users.name` is currently set only when the GitHub profile has a display name, so a sizable fraction of accounts have it as NULL. That makes the column unusable as a guaranteed identifier and means any feature that wants a non-NULL name has to fall back to gh_login on its own. Fill in the NULLs with `oauth_github.login` (the GitHub username) so every account has a name. The DO-block walks `users.id` in 5000-row windows and COMMITs between batches so row locks release between iterations and concurrent session-login upserts aren't blocked. lock_timeout / statement_timeout cap each batch in case of unexpected contention. Idempotent: filtered on `users.name IS NULL`, so re-running is a no-op once every account has a name. Follows the same out-of-band pattern as data_oauth_github.sql -- run with `psql -f <file>` and do NOT wrap in BEGIN/COMMIT, since the COMMIT inside the DO block requires the block to be at the top level. The session-login upsert path is unchanged: future logins will still write the GitHub display name when one is set, overwriting any backfilled value with the user's actual chosen name.
|
I'm confused... I don't remember seeing backfill of In the username RFC's unresolved questions, I have been considering that we should get rid of |
|
Honestly I don't agree. Username/display name aren't standard across SSO providers (Github provides a username from Github, but Bitbucket/Google/etc is centered around email addresses). To keep current users in the position they would initially be in after signing up from Github, we need to backfill a users username/display name on the user table (and probably drop it from SSO providers to centralize around a core "Crates Username" identity for a single source of truth when you can potentially have a 1-N map of SSO providers to a single Crates username). Further, keeping usernames as a SSO provided field makes lookups tougher on the database (it's an extra JOIN that doesn't need to be added). Finally, tying username to account is easier on support if a user changes preferred SSO due to reasons outside Crate's scope. Say user John was a Github sign-up but wants toove everything over to GitLab because he is sick of Github's recent downtime statistics. Changing a single SSO preferred account but maintaining the Crate's username is a lot easier than needing to open a support ticket for manual database intervention. |
Yes, this is what the entire rust-lang/rfcs#3946 is about. Have you read it? |
Yes I have read it. Which is exactly what the point of this PR is for (backfilling username based on current Github credentials already in the Database). If the point of the RFC is to instead make usernames wholly unique and configurable via Crates, that would be addressed later in other work. For now, a backfill of current identity to the core user is what's needed to not break existing functionality and brand identity of users who have already signed up through Github. Unless I'm missing something entirely. |
Yes, I don't think my explanation is working, so let's try an approach where you go explore in crates.io to experience what I'm trying to explain for yourself:
When you have those explanations, and if you think we still need this PR after doing that, you may reopen! |
Most of the username groundwork was tangled up in #13477. Splitting it out so it can land on its own, per @carols10cents.
users.nameis only set when a GitHub profile has a display name, so plenty of accounts have it as NULL. This backfills the NULLs fromoauth_github.login. Walksusers.idin 5000-row batches with a commit between each so we're not holding locks while logins are happening. Idempotent — re-running is a no-op once everyone has a name. Run it the same way asdata_oauth_github.sql:psql -f <file>.