Draft: Forkless, background iteration utility (forkless) by JimB123 · Pull Request #3553 · valkey-io/valkey

JimB123 · 2026-04-23T22:35:58Z

//     #####:                          :####                      
//     #######                         #####     ##               
//     ##  :##:                        ##        ##               
//     ##   :##   ##.####   :####    #######   #######            
//     ##   .##   #######   ######   #######   #######            
//     ##    ##   ###.      #:  :##    ##        ##               
//     ##    ##   ##         :#####    ##        ##               
//     ##   .##   ##       .#######    ##        ##               
//     ##   :##   ##       ## .  ##    ##        ##               
//     ##  :##:   ##       ##:  ###    ##        ##.              
//     #######    ##       ########    ##        #####            
//     #####:     ##         ###.##    ##        .####
//
//----------------------------------------------------------------------------------------
// Yes, it's really draft.  The code is reviewable - but the unit tests are incomplete.
//----------------------------------------------------------------------------------------

The heart of forkless save (also forkless sync and forkless slot migration) is the bgIteration utility.

bgIteration is responsible for presenting a view of the database, possibly with replication, to achieve the requested consistency.

Important concepts:

bgIteration is a utility which performs iteration over the database with the intent to present this iteration to a "bgIteration client". It is analogous to any other iterator. It has no specific knowledge of the reason for iteration. It can be used for any purpose.
A "bgIteration client" receives iteration data from bgIteration. It is expected that the bgIteration client works in the background (on another thread). The bgIteration client is not permitted to modify the data during iteration. The bgIteration client understands the reason for iteration. (Threadsave is the bgIteration client responsible for forkless save - and is not included in this PR.)
A "consistent" iteration offers point-in-time consistency. This is the type of consistency needed for snapshot (bgsave). In a consistent iteration, all key/value data must be presented (to the bgIteration client) - as it existed at the point of consistency - before any modifications may be permitted.
A "non-consistent" iteration doesn't maintain consistency from the start of iteration. Instead, it presents key/value pairs as they exist currently in the database. The guarantees are similar to other iterators. If a value is created during the iteration, it may or may not be presented as part of the iteration.
An iteration may include replication. In that case, bgIteration presents a filtered view of the replication stream: only the replication data related to key/values which have already been iterated (and to new values being created). When replication is combined with a non-consistent iteration, it's possible to achieve consistency at the END of the iteration.

The current use case, forkless save, will use a consistent iteration without replication.
Full sync will use a non-consistent iteration, with replication. This allows consistency to be achieved by the end of the synchronization.
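
To make the "consistent at the END" property concrete, here is a minimal toy model. It is not the bgIteration API: the fixed-size array, the `toy*` names, and the cursor are all assumptions made for illustration. The sketch iterates the live data slot by slot and forwards a write to the client only when the written slot has already been iterated. Creation of new keys is left out of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SLOTS 8

/* Toy model (hypothetical, not the bgIteration API): the "database" is a
 * fixed array, and the iterator has already presented slots [0, cursor). */
typedef struct {
    int db[SKETCH_SLOTS];     /* live data, modified by writes */
    int client[SKETCH_SLOTS]; /* what the background client has received */
    size_t cursor;            /* next slot to iterate */
} toyIteration;

/* Non-consistent iteration: present the next slot as it exists right now.
 * Returns false once every slot has been presented. */
static bool toyIterateNext(toyIteration *it) {
    if (it->cursor >= SKETCH_SLOTS) return false;
    it->client[it->cursor] = it->db[it->cursor];
    it->cursor++;
    return true;
}

/* Apply a write to the live data. The write is forwarded to the client only
 * if the slot was already iterated; otherwise the iterator will pick up the
 * new value when it reaches the slot. Returns true if forwarded. */
static bool toyWrite(toyIteration *it, size_t slot, int value) {
    assert(slot < SKETCH_SLOTS);
    it->db[slot] = value;
    if (slot < it->cursor) {
        it->client[slot] = value; /* filtered replication */
        return true;
    }
    return false;
}

/* True when the client's copy matches the live data in every slot. */
static bool toyClientMatchesDb(const toyIteration *it) {
    for (size_t i = 0; i < SKETCH_SLOTS; i++) {
        if (it->client[i] != it->db[i]) return false;
    }
    return true;
}
```

In this model, a write to an already-iterated slot is forwarded and a write to a later slot is dropped, yet once `toyIterateNext` returns false the client's copy matches the live data.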

There is currently no use case for a consistent iteration with replication or a non-consistent iteration without replication. However, such use cases may be developed in the future. Both combinations are fully supported and tested.

Read the documentation in bgiteration.h for more details.

Signed-off-by: Jim Brunner <brunnerj@amazon.com>

JimB123 · 2026-04-23T22:37:18Z

This replaces the previous draft version (which was against a different branch): #3432

codecov · 2026-04-23T22:58:34Z

Codecov Report

❌ Patch coverage is 72.58065% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.73%. Comparing base (1dd4f7f) to head (206635b).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/server.c | 22.22% | 14 Missing ⚠️ |
| src/db.c | 90.00% | 1 Missing ⚠️ |
| src/module.c | 0.00% | 1 Missing ⚠️ |
| src/replication.c | 50.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@             Coverage Diff              @@
##           forkless    #3553      +/-   ##
============================================
+ Coverage     76.57%   76.73%   +0.15%     
============================================
  Files           159      161       +2     
  Lines         81206    81422     +216     
============================================
+ Hits          62187    62481     +294     
+ Misses        19019    18941      -78

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/bgiteration.c | `69.47% <ø> (ø)` | |
| src/hashtable.c | `97.96% <100.00%> (+0.15%)` | ⬆️ |
| src/kvstore.c | `96.65% <100.00%> (-0.08%)` | ⬇️ |
| src/object.c | `90.02% <100.00%> (-2.01%)` | ⬇️ |
| src/rdb.c | `77.04% <100.00%> (-0.09%)` | ⬇️ |
| src/server.h | `100.00% <ø> (ø)` | |
| src/unit/test_bgiteration.cpp | `98.22% <ø> (ø)` | |
| src/db.c | `94.47% <90.00%> (-0.09%)` | ⬇️ |
| src/module.c | `26.29% <0.00%> (-0.24%)` | ⬇️ |
| src/replication.c | `86.08% <50.00%> (-0.34%)` | ⬇️ |
... and 1 more

... and 118 files with indirect coverage changes


JimB123 · 2026-04-24T22:21:54Z

+/* This is an internal function - not part of the standard API.  It must be explicitly declared
+ * where used.  It shouldn't be included in any .h (API) file.  Use of this interface is discouraged
+ * as it depends on the internal structure, which may change.
+ *
+ * For a given key, return:
+ *   table_idx - the index of the internal table (0 or 1)
+ *   bucket_idx - the bucket index within the table (0..n)
+ *
+ * Returns TRUE if the key exists in the table.
+ * Returns FALSE if the key doesn't exist (and table/index are undefined)
+ */
+bool hashtableInternalFindBucketIdx(hashtable *ht, void *key, int *table_idx, size_t *bucket_idx) {
+    uint64_t hash = hashKey(ht, key);
+    int pos_in_bucket;
+    int table;
+    bucket *b = findBucket(ht, hash, key, &pos_in_bucket, &table);
+    if (!b) return false;
+
+    *table_idx = table;
+    *bucket_idx = hash & expToMask(ht->bucket_exp[table]);
+    return true;
+}
+
+/* This is an internal function - not part of the standard API.  It must be explicitly declared
+ * where used.  It shouldn't be included in any .h (API) file.  Use of this interface is discouraged
+ * as it depends on the internal structure, which may change.
+ *
+ * For a given iterator, return:
+ *   table_idx - the index of the internal table (0 or 1)
+ *   bucket_idx - the bucket index within the table (0..n)
+ *
+ * NOTE: hashtableIterator position is based on the LAST item returned.
+ */
+void hashtableInternalIteratorGetBucketIdx(hashtableIterator *iterator, int *table_idx, size_t *bucket_idx) {
+    iter *it = iteratorFromOpaque(iterator);
+    *table_idx = it->table;
+    *bucket_idx = it->index;
+}
+
+/* This is an internal function - not part of the standard API.  It must be explicitly declared
+ * where used.  It shouldn't be included in any .h (API) file.  Use of this interface is discouraged
+ * as it depends on the internal structure, which may change.
+ *
+ * Returns TRUE if the iterator is ready to move to the next bucket index (if it has completed the
+ * current bucket index).  Note: hashtableIterator bucket_idx is the bucket index of the last item
+ * returned by hashtableNext.
+ *
+ * Note: If this function returns true, the iterator commits to move onto the next bucket index,
+ * even if something new is added to the end of the current bucket before hashtableNext is called.
+ */
+bool hashtableInternalIteratorIsBucketIdxComplete(hashtableIterator *iterator) {
+    iter *it = iteratorFromOpaque(iterator);
+
+    if (it->bucket->chained) return false;
+
+    if (!(it->bucket->presence >> (it->pos_in_bucket + 1))) {
+        /* There's CURRENTLY nothing else to return at this bucket index.  Mark pos_in_bucket so
+         * that hashtableNext will move to the next bucket index, regardless of items which may
+         * be added in the future. */
+        it->pos_in_bucket = ITERATOR_DONE_WITH_BUCKET_IDX;
+        return true;
+    }
+    return false;
+}


Note: Per @rainsupreme this isn't the best approach. Calling code to be refactored to use "scan". See: #3432 (comment)
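
The completion check in hashtableInternalIteratorIsBucketIdxComplete relies on a shift of the bucket's presence bits. The sketch below isolates that test so it can be tried on its own. The function name `sketchNothingAfter` and the 8-bit width are assumptions for illustration and are not taken from hashtable.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a bucket's presence bits: bit i set means slot i
 * is occupied. The 8-bit width is an assumption for this sketch. */
static bool sketchNothingAfter(uint8_t presence, int pos_in_bucket) {
    assert(pos_in_bucket >= 0 && pos_in_bucket < 8);
    /* Shifting right by pos_in_bucket + 1 discards the bits for slots
     * 0..pos_in_bucket. A zero result means no later slot is occupied. */
    return (presence >> (pos_in_bucket + 1)) == 0;
}
```

With presence bits `0x05` (slots 0 and 2 occupied), the check is false at position 0 because slot 2 is still to come, and true at position 2.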

JimB123 · 2026-04-24T22:23:10Z

+        while (it->iter_dbi == NULL) {
+            if (++it->iter_db >= server.dbnum) {
+                fifoRelease(dbEntryFifo);
+                return NULL; // Iteration complete
+            }
+            serverDb *db = server.db[it->orig_to_cur_db[it->iter_db]];
+            if (db != NULL) {
+                it->kvs = db->keys;
+                it->iter_dbi = kvstoreIteratorInit(it->kvs, HASHTABLE_ITER_SAFE);
+            }
+        }
+
+        hashtableIterator *ht_it = NULL;
+        do {
+            dbEntry *de;
+            if (!kvstoreIteratorNext(it->iter_dbi, (void **)&de)) {
+                kvstoreIteratorRelease(it->iter_dbi);
+                it->kvs = NULL, it->iter_dbi = NULL;
+                break;
+            }
+
+            ht_it = kvstoreInternalIteratorGetCurrentHashtableIterator(it->iter_dbi);
+            if (ignoreKeyForSave(objectGetKey(de))) continue; // slot migration: keys being purged
+            fifoPush(dbEntryFifo, de);
+        } while (!hashtableInternalIteratorIsBucketIdxComplete(ht_it));


This code to be refactored to use scan.
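
The do/while loop above keeps pushing entries until the iterator has finished the current bucket index, so that everything at one bucket index is handed over as a single batch. A toy version of that batching, with a plain array standing in for the iterator and the FIFO, could look like the following. The `toyEntry` type and `toyDrainBucket` function are hypothetical, and the sketch assumes entries arrive grouped by bucket index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry: a value tagged with the bucket index it lives at. */
typedef struct {
    size_t bucket_idx;
    int value;
} toyEntry;

/* Copies into out[] every entry starting at entries[*pos] that shares the
 * bucket index of entries[*pos], and advances *pos past them. Returns the
 * number of values copied (0 when the input is exhausted). If out_cap is
 * too small, the batch is cut short and continues on the next call. */
static size_t toyDrainBucket(const toyEntry *entries, size_t n, size_t *pos,
                             int *out, size_t out_cap) {
    assert(entries != NULL && pos != NULL && out != NULL);
    if (*pos >= n) return 0;
    size_t bucket = entries[*pos].bucket_idx;
    size_t count = 0;
    while (*pos < n && entries[*pos].bucket_idx == bucket && count < out_cap) {
        out[count++] = entries[*pos].value;
        (*pos)++;
    }
    return count;
}
```

Each call returns one bucket index worth of values, which mirrors how the real loop stops only when hashtableInternalIteratorIsBucketIdxComplete reports true.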

JimB123 · 2026-04-24T22:24:45Z

+    hashtable *iter_current_ht = kvstoreGetHashtable(it->kvs, keySlot);
+    int table; // 0 or 1 (supporting rehashing)
+    size_t index; // bucket number within the hashtable
+    // If key doesn't exist, we consider it passed - we MIGHT have iterated over it had it existed.
+    if (!hashtableInternalFindBucketIdx(iter_current_ht, (void *)key, &table, &index)) return true;
+
+    hashtableIterator *htIter = kvstoreInternalIteratorGetCurrentHashtableIterator(it->iter_dbi);
+    int iter_table;
+    size_t iter_index;
+    hashtableInternalIteratorGetBucketIdx(htIter, &iter_table, &iter_index);
+    if (table < iter_table) return true;  // iteration in table 1, but item is in table 0
+    if (table > iter_table) return false; // iteration in table 0, but item is in table 1
+    // if index <= iterator index, it has been passed. bgIterator
+    // processes buckets atomically. hashtableIterator points to the
+    // last returned position. It means bucket at iter_index has
+    // already been processed.
+    if (index <= iter_index) return true;


This will be changed when "scan" is used. Need to base this on the scan cursor. (A new hashtable function will be needed.)
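
The position comparison in the hunk above can be isolated into a small pure function for reasoning and testing. This is a sketch only: `sketchPositionPassed` is a hypothetical name, and the final `false` for `index > iter_index` is an assumption, since the excerpt ends before showing what follows the last comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if an item at (table, index) has already been passed by an
 * iterator whose last returned item was at (iter_table, iter_index).
 * Table 0 is iterated before table 1 (table 1 exists while rehashing). */
static bool sketchPositionPassed(int table, size_t index,
                                 int iter_table, size_t iter_index) {
    assert(table == 0 || table == 1);
    assert(iter_table == 0 || iter_table == 1);
    if (table < iter_table) return true;  /* iterator is already in table 1 */
    if (table > iter_table) return false; /* iterator is still in table 0 */
    /* Same table: buckets are processed atomically, so the bucket at
     * iter_index counts as passed. Treating a larger index as not passed
     * is an assumption of this sketch. */
    return index <= iter_index;
}
```

For example, an item in table 0 counts as passed whenever the iterator is in table 1, regardless of bucket index, while an item in the same table and bucket as the iterator counts as passed because that bucket has been fully processed.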

Forkless Save

206635b

Signed-off-by: Jim Brunner <brunnerj@amazon.com>

JimB123 force-pushed the forkless-bgiterator branch from ca38ccc to 206635b Compare April 23, 2026 22:36

github-actions Bot assigned JimB123 Apr 23, 2026

JimB123 requested a review from murphyjacob4 April 23, 2026 22:41

JimB123 wants to merge 1 commit into valkey-io:forkless from JimB123:forkless-bgiterator
