Remove unicode diacritics in default filter function #386

@davbrito

Suggestion

Currently, the search string normalization in cmdk/src/command-score.ts only performs lowercasing and space character replacement:

function formatInput(string) {
  // convert all valid space characters to space so they match each other
  return string.toLowerCase().replace(COUNT_SPACE_REGEXP, ' ')
}

This approach does not handle unicode diacritics (e.g., accents in café, naïve, etc.). As a result, searches for "cafe" will not match "café".
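To make the mismatch concrete, here is a minimal sketch of the current behavior. The name `currentFormatInput` and the `/[\s-]/g` pattern standing in for `COUNT_SPACE_REGEXP` are assumptions for illustration; the real constant lives in the library source and may differ.

```typescript
// Hypothetical stand-in for the library's COUNT_SPACE_REGEXP (assumed pattern).
const ASSUMED_COUNT_SPACE_REGEXP = /[\s-]/g

// Mirrors the current normalization: lowercase, then unify space characters.
function currentFormatInput(input: string): string {
  return input.toLowerCase().replace(ASSUMED_COUNT_SPACE_REGEXP, ' ')
}

// The accented "é" (U+00E9) survives normalization, so the two
// strings still differ and a plain comparison or indexOf lookup fails.
const item = currentFormatInput('Café')
const query = currentFormatInput('cafe')
console.log(item === query) // false
console.log(item.indexOf(query)) // -1
```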

Proposal:

Extend search string normalization to remove unicode diacritics using String.prototype.normalize('NFD') and a regex to strip combining marks:

function formatInput(string) {
  return string
    .toLowerCase()
    .normalize('NFD') // Decompose unicode characters
    .replace(/[\u0300-\u036f]/g, '') // Remove diacritical marks
    .replace(COUNT_SPACE_REGEXP, ' ')
}

With this change, accented and unaccented spellings normalize to the same string, so a search for "cafe" matches "café". This makes matching more robust for international users and improves results for text containing diacritics.
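The proposal can be checked in isolation with a small sketch like the one below. The name `proposedFormatInput` and the `/[\s-]/g` pattern standing in for `COUNT_SPACE_REGEXP` are assumptions, not the library's actual code.

```typescript
// Hypothetical stand-in for the library's COUNT_SPACE_REGEXP (assumed pattern).
const SPACE_REGEXP_STAND_IN = /[\s-]/g

// Proposed normalization: lowercase, decompose to NFD, strip combining
// marks in U+0300..U+036F, then unify space characters.
function proposedFormatInput(input: string): string {
  return input
    .toLowerCase()
    .normalize('NFD') // "é" becomes "e" followed by U+0301
    .replace(/[\u0300-\u036f]/g, '') // drop the combining mark
    .replace(SPACE_REGEXP_STAND_IN, ' ')
}

console.log(proposedFormatInput('Café') === proposedFormatInput('cafe')) // true
console.log(proposedFormatInput('Naïve')) // "naive"
console.log(proposedFormatInput('São Paulo')) // "sao paulo"
```

Note that the range U+0300 to U+036F covers only the Combining Diacritical Marks block, so combining marks from other Unicode blocks would be left in place by this regex.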

Location:
