Refactor dict restoring abstraction #3561

Draft
JimB123 wants to merge 1 commit into valkey-io:unstable from JimB123:refactor-dict

Member

@JimB123 JimB123 commented Apr 24, 2026

The recent update to dict (#3366) improves performance by making dict a thin wrapper on top of the hashtable implementation.

As part of that refactoring, we lost some of our dict abstraction. The dictEntry became public (again). The defrag code was diving into the entry directly.

This update:

  • Hardens the dict abstraction by making dictEntry opaque (again)
  • Moves some defrag capability back to the dict (out of defrag)
  • Uses a .c file rather than the .h file (allowing for opaqueness in the data structure and code)
  • Eliminates the requirement to configure dictEntryGetKey on every dict (it's essentially a required constant)

Link-time optimization (LTO) will result in the same inlining of functions; however, it can now use configurable options to tune the level of inlining. This potentially reduces L1 cache bloat.
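
As a rough illustration of the opaque-entry idea described above (not the code from this PR), a minimal single-file sketch might look like the following. All `demo*` names are hypothetical. The first part stands in for what a header would expose, the second for what the matching .c file would keep private.

```c
#include <stdlib.h>

/* ---- What a header would expose (hypothetical API) ---- */

/* Opaque type: callers see the name but not the layout, so they must go
 * through the accessor functions below. */
typedef struct demoEntry demoEntry;

demoEntry *demoEntryCreate(void *key, void *val);
void *demoEntryGetKey(const demoEntry *e);
void *demoEntryGetVal(const demoEntry *e);
void demoEntrySetVal(demoEntry *e, void *val);
void demoEntryFree(demoEntry *e);

/* ---- What the matching .c file would contain ---- */

/* The layout lives only here and can change without touching callers. */
struct demoEntry {
    void *key;
    void *val;
};

demoEntry *demoEntryCreate(void *key, void *val) {
    demoEntry *e = malloc(sizeof(*e));
    if (e == NULL) return NULL;
    e->key = key;
    e->val = val;
    return e;
}

void *demoEntryGetKey(const demoEntry *e) {
    return e->key;
}

void *demoEntryGetVal(const demoEntry *e) {
    return e->val;
}

void demoEntrySetVal(demoEntry *e, void *val) {
    e->val = val;
}

/* Frees only the entry itself; the key and value belong to the caller. */
void demoEntryFree(demoEntry *e) {
    free(e);
}
```

With the struct body in a separate .c file, the accessors are ordinary function calls unless the build inlines them across translation units, which is where LTO comes in.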

Comment thread src/dict.h
* the "htdict" prefix is used to avoid colliding with the "dict" in libvalkey */
#define dictCreate(type) htdictCreate(type)
#define dictExpand(d, size) htdictExpand(d, size)
#define dictSetKey(d, de, key) htdictSetKey(d, de, key)
Member Author

I want to eliminate this function. "set key" is an anti-pattern for a dict structure. The only place this is being used is for one use-case in expire.c. We could have fixed that with the key dup callback, but that callback was eliminated in the transition to hashtable.

Might just need a better API which addresses the use case in expire.

Contributor

Or would it be feasible to refactor that use case to use hashtable directly?

I got slightly distracted - turns out it was easy. I had a PR ready (#3566) before I remembered I was reviewing here 😓

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 83.76068% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.21%. Comparing base (d2db0c2) to head (013ac74).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/dict_ht.c | 83.62% | 19 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3561      +/-   ##
============================================
- Coverage     76.35%   76.21%   -0.15%     
============================================
  Files           159      159              
  Lines         80054    80075      +21     
============================================
- Hits          61125    61028      -97     
- Misses        18929    19047     +118     

| Files with missing lines | Coverage Δ |
|---|---|
| src/cluster_legacy.c | 88.16% <ø> (-0.08%) ⬇️ |
| src/config.c | 78.09% <ø> (ø) |
| src/defrag.c | 81.96% <100.00%> (-0.30%) ⬇️ |
| src/eval.c | 91.50% <ø> (ø) |
| src/expire.c | 98.12% <ø> (ø) |
| src/functions.c | 96.61% <ø> (-0.03%) ⬇️ |
| src/fuzzer_command_generator.c | 76.82% <ø> (ø) |
| src/latency.c | 83.33% <ø> (ø) |
| src/module.c | 25.31% <ø> (ø) |
| src/rdb.c | 76.84% <ø> (-0.36%) ⬇️ |

... and 6 more

... and 20 files with indirect coverage changes

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
Contributor

@rainsupreme rainsupreme left a comment

Looks good overall, though I think there are a couple of things to fix. I do think keeping the opaqueness of dict is the right move, and we don't have to give up the performance gain either!

I'm a little disappointed with the dict naming conflict with libvalkey, and all the #define boilerplate in dict.h. We didn't have to do this before this change (apparently?), so my (unresearched) hunch is that there should be some way to avoid the boilerplate. Could we remove dict.h from server.h and only include it where dict is used, or something like that? 🤔

Comment thread src/dict.h
} dictEntry;

#define UNUSED(V) ((void)V)
typedef struct dictEntry dictEntry; // opaque
Contributor

nit: /* opaque */ (Valkey style doesn't use // comments)

Comment thread src/dict_ht.c
* This file implements the dict API as a thin wrapper of the newer hashtable
* API. The dictEntry struct is used as the entry type in underlying hashtable.
*
* Copyright (c) 2006-2012, Redis Ltd.
Contributor

This is a new file, so we should be giving Valkey credit, not Redis. Put this at the top instead and get rid of that redistribution/disclaimer cruft too:

/*
 * Copyright Valkey Contributors.
 * All rights reserved.
 * SPDX-License-Identifier: BSD 3-Clause
 */

Comment thread src/dict_ht.c

double htdictIncrDoubleVal(dictEntry *de, double val) {
de->v.d += val;
return de->v.d;
Contributor

nit: imo these ultra short alphabet soup names are hard to read as a reviewer. I assume this is for compatibility with old dict-using code though? otherwise I'd kinda prefer entry->val.doubleval over de->v.d for example

Comment thread src/dict_ht.c
* Always returns 1 to indicate the key was consumed (either added or used
* to replace). The caller should not free the key after calling this. */
int htdictReplace(dict *d, void *key, void *val) {
dictEntry *entry = (dictEntry *)zmalloc(sizeof(*entry));
Contributor

You can avoid the temporary allocation when the key already exists if you use the hashtableFindPositionForInsert pattern here. (Good comment in hashtable.c) You can pass just the key and find whether it exists, then only allocate and build the entry if actually needed, otherwise you replace the existing entry's value.
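
A minimal, self-contained sketch of the find-position-then-insert pattern suggested above. It deliberately does not use Valkey's hashtable API: the `toy*` table, its functions, and their signatures are hypothetical stand-ins, so only the control flow carries over. The point is that the lookup takes just the key, and an entry is allocated only when the key is absent.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_SLOTS 16 /* fixed capacity; a real table would resize */

typedef struct toyEntry {
    const char *key;
    void *val;
} toyEntry;

typedef struct toyTable {
    toyEntry *slots[TOY_SLOTS];
    int allocations; /* counts entry allocations, for demonstration only */
} toyTable;

unsigned toyHash(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Looks up `key` using linear probing.
 * Returns 1 and sets *pos to a free slot if the key is absent.
 * Returns 0 and sets *existing if the key is already present. */
int toyFindPositionForInsert(toyTable *t, const char *key, int *pos, toyEntry **existing) {
    unsigned start = toyHash(key) % TOY_SLOTS;
    for (unsigned i = 0; i < TOY_SLOTS; i++) {
        unsigned idx = (start + i) % TOY_SLOTS;
        toyEntry *e = t->slots[idx];
        if (e == NULL) {
            *pos = (int)idx;
            return 1;
        }
        if (strcmp(e->key, key) == 0) {
            *existing = e;
            return 0;
        }
    }
    abort(); /* the toy table is full; a real table would have grown */
}

/* Returns 1 if a new entry was added, 0 if an existing value was replaced.
 * (This differs from the htdictReplace quoted above, which always returns 1.) */
int toyReplace(toyTable *t, const char *key, void *val) {
    int pos;
    toyEntry *existing = NULL;
    if (!toyFindPositionForInsert(t, key, &pos, &existing)) {
        existing->val = val; /* key present: overwrite in place, no allocation */
        return 0;
    }
    /* Key absent: only now pay for the allocation. */
    toyEntry *e = malloc(sizeof(*e));
    if (e == NULL) abort();
    e->key = key;
    e->val = val;
    t->slots[pos] = e;
    t->allocations++;
    return 1;
}
```

Calling `toyReplace` twice with the same key allocates once; the second call only overwrites the stored value. The toy never deletes entries, which is what keeps the stop-at-first-empty-slot probing correct.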

2 participants