Skip to content

Backfill users.name from oauth_github.login#13547

Closed
quinnjr wants to merge 1 commit into
rust-lang:mainfrom
quinnjr:feature/backfill-user-name
Closed

Backfill users.name from oauth_github.login#13547
quinnjr wants to merge 1 commit into
rust-lang:mainfrom
quinnjr:feature/backfill-user-name

Conversation

@quinnjr
Copy link
Copy Markdown

@quinnjr quinnjr commented May 1, 2026

Most of the username groundwork was tangled up in #13477. Splitting it out so it can land on its own, per @carols10cents.

users.name is only set when a GitHub profile has a display name, so plenty of accounts have it as NULL. This backfills the NULLs from oauth_github.login. Walks users.id in 5000-row batches with a commit between each so we're not holding locks while logins are happening. Idempotent — re-running is a no-op once everyone has a name. Run it the same way as data_oauth_github.sql: psql -f <file>.

`users.name` is currently set only when the GitHub profile has a
display name, so a sizable fraction of accounts have it as NULL.
That makes the column unusable as a guaranteed identifier and means
any feature that wants a non-NULL name has to fall back to gh_login
on its own.

Fill in the NULLs with `oauth_github.login` (the GitHub username)
so every account has a name. The DO-block walks `users.id` in
5000-row windows and COMMITs between batches so row locks release
between iterations and concurrent session-login upserts aren't
blocked. lock_timeout / statement_timeout cap each batch in case
of unexpected contention.

Idempotent: filtered on `users.name IS NULL`, so re-running is a
no-op once every account has a name. Follows the same out-of-band
pattern as data_oauth_github.sql -- run with `psql -f <file>` and
do NOT wrap in BEGIN/COMMIT, since the COMMIT inside the DO block
requires the block to be at the top level.

The session-login upsert path is unchanged: future logins will
still write the GitHub display name when one is set, overwriting
any backfilled value with the user's actual chosen name.
@carols10cents
Copy link
Copy Markdown
Member

I'm confused... I don't remember seeing backfill of users.name in #13477. What you had that I commented on was attempting to backfill oauth_github, which wasn't needed? Why is this PR described as being pulled out of #13477 when it never existed there as far as i remember?

In the username RFC's unresolved questions, I have been considering that we should get rid of users.name completely, rather than backfill it. Currently it's a display name, not the username. So this change may not be needed at all.

@quinnjr
Copy link
Copy Markdown
Author

quinnjr commented May 6, 2026

Honestly I don't agree. Username/display name aren't standard across SSO providers (Github provides a username from Github, but Bitbucket/Google/etc is centered around email addresses). To keep current users in the position they would initially be in after signing up from Github, we need to backfill a users username/display name on the user table (and probably drop it from SSO providers to centralize around a core "Crates Username" identity for a single source of truth when you can potentially have a 1-N map of SSO providers to a single Crates username).

Further, keeping usernames as a SSO provided field makes lookups tougher on the database (it's an extra JOIN that doesn't need to be added).

Finally, tying username to account is easier on support if a user changes preferred SSO due to reasons outside Crate's scope. Say user John was a Github sign-up but wants toove everything over to GitLab because he is sick of Github's recent downtime statistics. Changing a single SSO preferred account but maintaining the Crate's username is a lot easier than needing to open a support ticket for manual database intervention.

@carols10cents
Copy link
Copy Markdown
Member

(and probably drop it from SSO providers to centralize around a core "Crates Username" identity for a single source of truth when you can potentially have a 1-N map of SSO providers to a single Crates username).

Yes, this is what the entire rust-lang/rfcs#3946 is about. Have you read it?

@quinnjr
Copy link
Copy Markdown
Author

quinnjr commented May 6, 2026

(and probably drop it from SSO providers to centralize around a core "Crates Username" identity for a single source of truth when you can potentially have a 1-N map of SSO providers to a single Crates username).

Yes, this is what the entire rust-lang/rfcs#3946 is about. Have you read it?

Yes I have read it. Which is exactly what the point of this PR is for (backfilling username based on current Github credentials already in the Database).

If the point of the RFC is to instead make usernames wholly unique and configurable via Crates, that would be addressed later in other work. For now, a backfill of current identity to the core user is what's needed to not break existing functionality and brand identity of users who have already signed up through Github.

Unless I'm missing something entirely.

@carols10cents
Copy link
Copy Markdown
Member

Unless I'm missing something entirely.

Yes, I don't think my explanation is working, so let's try an approach where you go explore in crates.io to experience what I'm trying to explain for yourself:

  • Find all the places in crates.io's codebase that use users.name, and collect links in GitHub to those lines of code. Explain what happens currently when users.name is not set.
  • Explain what would change in the user experience, exactly, if we were to merge this PR.
  • Explain how a user would change the value of users.name if we add other oauth providers.
  • Explain why we need to make this change right now.

When you have those explanations, and if you think we still need this PR after doing that, you may reopen!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants