Draft: Forkless, background iteration utility (forkless)#3553

Draft
JimB123 wants to merge 1 commit into valkey-io:forkless from JimB123:forkless-bgiterator

Conversation

JimB123 (Member) commented Apr 23, 2026

//     #####:                          :####                      
//     #######                         #####     ##               
//     ##  :##:                        ##        ##               
//     ##   :##   ##.####   :####    #######   #######            
//     ##   .##   #######   ######   #######   #######            
//     ##    ##   ###.      #:  :##    ##        ##               
//     ##    ##   ##         :#####    ##        ##               
//     ##   .##   ##       .#######    ##        ##               
//     ##   :##   ##       ## .  ##    ##        ##               
//     ##  :##:   ##       ##:  ###    ##        ##.              
//     #######    ##       ########    ##        #####            
//     #####:     ##         ###.##    ##        .####
//
//----------------------------------------------------------------------------------------
// Yes, it's really draft.  The code is reviewable - but the unit tests are incomplete.
//----------------------------------------------------------------------------------------

The heart of forkless save (also forkless sync and forkless slot migration) is the bgIteration utility.

bgIteration is responsible for presenting a view of the database, possibly with replication, to achieve the requested consistency.

Important concepts:

  • bgIteration is a utility which performs iteration over the database with the intent to present this iteration to a "bgIteration client". It is analogous to any other iterator. It has no specific knowledge of the reason for iteration. It can be used for any purpose.
  • A "bgIteration client" receives iteration data from bgIteration. It is expected that the bgIteration client works in the background (on another thread). The bgIteration client is not permitted to modify the data during iteration. The bgIteration client understands the reason for iteration. (Threadsave is the bgIteration client responsible for forkless save - and is not included in this PR.)
  • A "consistent" iteration offers point-in-time consistency. This is the type of consistency needed for snapshot (bgsave). In a consistent iteration, all key/value data must be presented (to the bgIteration client) - as it existed at the point of consistency - before any modifications may be permitted.
  • A "non-consistent" iteration doesn't maintain consistency from the start of iteration. Instead, it presents key/value pairs as they exist currently in the database. The guarantees are similar to other iterators. If a value is created during the iteration, it may or may not be presented as part of the iteration.
  • An iteration may include replication. This is a filtered view of the replication stream: it presents replication data for key/values which have already been iterated (and for new values being created). When replication is combined with a non-consistent iteration, it's possible to achieve consistency at the END of the iteration.
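
To make the "consistent" bullet concrete, here is a toy model of one way the rule "present the old value before any modification" can play out. This is only a sketch under assumptions: the fixed-size array database, the `toy*` names, and the present-on-write hook are all hypothetical and are not taken from bgiteration.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_KEYS 4

/* Hypothetical toy state; the real utility iterates a kvstore, not an array. */
typedef struct {
    int db[TOY_KEYS];         /* live values, indexed by key */
    bool presented[TOY_KEYS]; /* has this key been handed to the client? */
    int snapshot[TOY_KEYS];   /* what the client received */
    size_t cursor;            /* next key the iterator will visit */
} toyConsistentIter;

/* Hand the current value of `key` to the client, at most once. */
static void toyPresent(toyConsistentIter *it, size_t key) {
    if (it->presented[key]) return;
    it->snapshot[key] = it->db[key];
    it->presented[key] = true;
}

/* One iterator step: present the next key. Returns false when iteration is done. */
static bool toyStep(toyConsistentIter *it) {
    if (it->cursor >= TOY_KEYS) return false;
    toyPresent(it, it->cursor++);
    return true;
}

/* Write path: the old value is presented before the modification is applied,
 * so the client always sees the value as of the start of iteration. */
static void toyWrite(toyConsistentIter *it, size_t key, int value) {
    assert(key < TOY_KEYS);
    toyPresent(it, key);
    it->db[key] = value;
}
```

In this model, a write to a key the iterator has not reached yet forces that key to be presented first, so the client's snapshot is unaffected by writes that arrive mid-iteration.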

The current use case, forkless save, will use a consistent iteration without replication.
Full sync will use a non-consistent iteration, with replication. This allows consistency to be achieved by the end of the synchronization.

There is currently no use case for consistent with replication or non-consistent without replication. However, such use cases may be developed in the future. They are fully supported and tested.
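
The four combinations of consistency and replication can be summarized in a short sketch. The struct and function names are hypothetical (the real interface is documented in bgiteration.h), and the result for "consistent with replication" reflects only the point-in-time guarantee stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mode descriptor, for illustration only. */
typedef struct {
    bool consistent;  /* point-in-time view from the start of iteration */
    bool replication; /* filtered replication is presented during iteration */
} bgIterModeSketch;

/* Returns when the client's view is consistent, per the PR description. */
static const char *consistencyPointSketch(bgIterModeSketch m) {
    if (m.consistent) return "start"; /* e.g. forkless save (no replication) */
    if (m.replication) return "end";  /* e.g. full sync */
    return "none";                    /* ordinary iterator guarantees only */
}
```

For example, forkless save corresponds to `{true, false}` and full sync to `{false, true}`.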

Read the documentation in bgiteration.h for more details.

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
JimB123 force-pushed the forkless-bgiterator branch from ca38ccc to 206635b on April 23, 2026 22:36
JimB123 (Member, Author) commented Apr 23, 2026

This replaces the previous draft version (which was against a different branch): #3432

codecov (Bot) commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 72.58065% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.73%. Comparing base (1dd4f7f) to head (206635b).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/server.c | 22.22% | 14 Missing ⚠️ |
| src/db.c | 90.00% | 1 Missing ⚠️ |
| src/module.c | 0.00% | 1 Missing ⚠️ |
| src/replication.c | 50.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##           forkless    #3553      +/-   ##
============================================
+ Coverage     76.57%   76.73%   +0.15%     
============================================
  Files           159      161       +2     
  Lines         81206    81422     +216     
============================================
+ Hits          62187    62481     +294     
+ Misses        19019    18941      -78     

| Files with missing lines | Coverage Δ |
|---|---|
| src/bgiteration.c | 69.47% <ø> (ø) |
| src/hashtable.c | 97.96% <100.00%> (+0.15%) ⬆️ |
| src/kvstore.c | 96.65% <100.00%> (-0.08%) ⬇️ |
| src/object.c | 90.02% <100.00%> (-2.01%) ⬇️ |
| src/rdb.c | 77.04% <100.00%> (-0.09%) ⬇️ |
| src/server.h | 100.00% <ø> (ø) |
| src/unit/test_bgiteration.cpp | 98.22% <ø> (ø) |
| src/db.c | 94.47% <90.00%> (-0.09%) ⬇️ |
| src/module.c | 26.29% <0.00%> (-0.24%) ⬇️ |
| src/replication.c | 86.08% <50.00%> (-0.34%) ⬇️ |

... and 1 more

... and 118 files with indirect coverage changes

Comment thread src/hashtable.c
Comment on lines +2571 to +2634
/* This is an internal function - not part of the standard API. It must be explicitly declared
 * where used. It shouldn't be included in any .h (API) file. Use of this interface is discouraged
 * as it depends on the internal structure, which may change.
 *
 * For a given key, return:
 *   table_idx  - the index of the internal table (0 or 1)
 *   bucket_idx - the bucket index within the table (0..n)
 *
 * Returns TRUE if the key exists in the table.
 * Returns FALSE if the key doesn't exist (and table/index are undefined)
 */
bool hashtableInternalFindBucketIdx(hashtable *ht, void *key, int *table_idx, size_t *bucket_idx) {
    uint64_t hash = hashKey(ht, key);
    int pos_in_bucket;
    int table;
    bucket *b = findBucket(ht, hash, key, &pos_in_bucket, &table);
    if (!b) return false;

    *table_idx = table;
    *bucket_idx = hash & expToMask(ht->bucket_exp[table]);
    return true;
}

/* This is an internal function - not part of the standard API. It must be explicitly declared
 * where used. It shouldn't be included in any .h (API) file. Use of this interface is discouraged
 * as it depends on the internal structure, which may change.
 *
 * For a given iterator, return:
 *   table_idx  - the index of the internal table (0 or 1)
 *   bucket_idx - the bucket index within the table (0..n)
 *
 * NOTE: hashtableIterator position is based on the LAST item returned.
 */
void hashtableInternalIteratorGetBucketIdx(hashtableIterator *iterator, int *table_idx, size_t *bucket_idx) {
    iter *it = iteratorFromOpaque(iterator);
    *table_idx = it->table;
    *bucket_idx = it->index;
}

/* This is an internal function - not part of the standard API. It must be explicitly declared
 * where used. It shouldn't be included in any .h (API) file. Use of this interface is discouraged
 * as it depends on the internal structure, which may change.
 *
 * Returns TRUE if the iterator is ready to move to the next bucket index (if it has completed the
 * current bucket index). Note: hashtableIterator bucket_idx is the bucket index of the last item
 * returned by hashtableNext.
 *
 * Note: If this function returns true, the iterator commits to move onto the next bucket index,
 * even if something new is added to the end of the current bucket before hashtableNext is called.
 */
bool hashtableInternalIteratorIsBucketIdxComplete(hashtableIterator *iterator) {
    iter *it = iteratorFromOpaque(iterator);

    if (it->bucket->chained) return false;

    if (!(it->bucket->presence >> (it->pos_in_bucket + 1))) {
        /* There's CURRENTLY nothing else to return at this bucket index. Mark pos_in_bucket
         * so that hashtableNext will move to the next bucket index, regardless of items which may
         * be added in the future. */
        it->pos_in_bucket = ITERATOR_DONE_WITH_BUCKET_IDX;
        return true;
    }
    return false;
}
Member Author

Note: Per @rainsupreme, this isn't the best approach. The calling code will be refactored to use "scan". See: #3432 (comment)
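
For readers unfamiliar with the two bit tricks in the hunk above, here is a standalone sketch of each. It assumes that `expToMask(exp)` yields `2^exp - 1` (and 0 for an unused table with `exp == -1`), and that `presence` holds one bit per bucket slot; the 8-bit width and the `*Sketch` names are illustrative only and do not come from hashtable.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bucket index = low `exp` bits of the hash (assumed meaning of expToMask). */
static size_t bucketIdxSketch(uint64_t hash, int exp) {
    assert(exp >= -1 && exp < 63);
    size_t mask = (exp == -1) ? 0 : (((size_t)1 << exp) - 1);
    return (size_t)(hash & mask);
}

/* True if no slot after position `pos` is occupied, i.e. every presence bit
 * above `pos` is zero. This mirrors `!(presence >> (pos_in_bucket + 1))`. */
static bool noOccupiedSlotsAfterSketch(uint8_t presence, int pos) {
    assert(pos >= 0 && pos < 8);
    return (presence >> (pos + 1)) == 0;
}
```

With `presence = 0x05` (slots 0 and 2 occupied), the check is false at positions 0 and 1 and true at position 2, which is the point at which the iterator commits to leaving the bucket index.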

Comment thread src/bgiteration.c
Comment on lines +496 to +520
    while (it->iter_dbi == NULL) {
        if (++it->iter_db >= server.dbnum) {
            fifoRelease(dbEntryFifo);
            return NULL; // Iteration complete
        }
        serverDb *db = server.db[it->orig_to_cur_db[it->iter_db]];
        if (db != NULL) {
            it->kvs = db->keys;
            it->iter_dbi = kvstoreIteratorInit(it->kvs, HASHTABLE_ITER_SAFE);
        }
    }

    hashtableIterator *ht_it = NULL;
    do {
        dbEntry *de;
        if (!kvstoreIteratorNext(it->iter_dbi, (void **)&de)) {
            kvstoreIteratorRelease(it->iter_dbi);
            it->kvs = NULL, it->iter_dbi = NULL;
            break;
        }

        ht_it = kvstoreInternalIteratorGetCurrentHashtableIterator(it->iter_dbi);
        if (ignoreKeyForSave(objectGetKey(de))) continue; // slot migration: keys being purged
        fifoPush(dbEntryFifo, de);
    } while (!hashtableInternalIteratorIsBucketIdxComplete(ht_it));
Member Author

This code will be refactored to use scan.
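
The do/while loop in the hunk gathers every entry at one bucket index into a single batch, skipping ignored keys, before moving on. A toy model of that batching is sketched below. It replaces the kvstore iterator and FIFO with a flat array and an output buffer; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy entry: a value tagged with the bucket index it lives at. */
typedef struct {
    size_t bucket_idx;
    int value;
} toyEntry;

/* Example skip predicate (stands in for ignoreKeyForSave in this toy). */
static int skipNegative(int value) {
    return value < 0;
}

/* Copies into `out` the run of entries starting at *pos that share one bucket
 * index, omitting entries for which `skip` returns nonzero. Advances *pos past
 * the whole run and returns the number of values copied. A return of 0 means
 * either end of input (*pos >= n) or a bucket whose entries were all skipped. */
static size_t nextBucketBatch(const toyEntry *entries, size_t n, size_t *pos,
                              int (*skip)(int), int *out, size_t out_cap) {
    size_t count = 0;
    if (*pos >= n) return 0;
    size_t bucket = entries[*pos].bucket_idx;
    while (*pos < n && entries[*pos].bucket_idx == bucket) {
        int v = entries[*pos].value;
        (*pos)++;
        if (skip != NULL && skip(v)) continue;
        assert(count < out_cap);
        out[count++] = v;
    }
    return count;
}
```

Batching by bucket index is what lets the "has the iterator passed this key?" test compare whole bucket indexes rather than positions within a bucket.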

Comment thread src/bgiteration.c
Comment on lines +566 to +582
    hashtable *iter_current_ht = kvstoreGetHashtable(it->kvs, keySlot);
    int table;    // 0 or 1 (supporting rehashing)
    size_t index; // bucket number within the hashtable
    // If key doesn't exist, we consider it passed - we MIGHT have iterated over it had it existed.
    if (!hashtableInternalFindBucketIdx(iter_current_ht, (void *)key, &table, &index)) return true;

    hashtableIterator *htIter = kvstoreInternalIteratorGetCurrentHashtableIterator(it->iter_dbi);
    int iter_table;
    size_t iter_index;
    hashtableInternalIteratorGetBucketIdx(htIter, &iter_table, &iter_index);
    if (table < iter_table) return true;  // iteration in table 1, but item is in table 0
    if (table > iter_table) return false; // iteration in table 0, but item is in table 1
    // if index <= iterator index, it has been passed. bgIterator
    // processes buckets atomically. hashtableIterator points to the
    // last returned position. It means bucket at iter_index has
    // already been processed.
    if (index <= iter_index) return true;
Member Author

This will be changed when "scan" is used. Need to base this on the scan cursor. (A new hashtable function will be needed.)
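
Until that change lands, the position comparison in the hunk above can be read as an ordering on (table, bucket index) pairs. The sketch below isolates that comparison. The `posSketch` type and function name are hypothetical, and the final `false` for a key ahead of the iterator is an assumption, since the quoted hunk ends before that line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A position inside a two-table (rehashing) hashtable. Hypothetical type. */
typedef struct {
    int table;     /* 0 or 1 */
    size_t bucket; /* bucket index within that table */
} posSketch;

/* Returns true if the iterator has already passed the key's position. */
static bool hasPassedSketch(bool key_found, posSketch key, posSketch iter) {
    /* A missing key counts as passed: it MIGHT have been iterated had it existed. */
    if (!key_found) return true;
    if (key.table < iter.table) return true;  /* iterator is already in table 1 */
    if (key.table > iter.table) return false; /* iterator is still in table 0 */
    /* Same table: buckets are processed atomically, so the bucket at the
     * iterator's index has already been handled. */
    return key.bucket <= iter.bucket; /* assumed: false when the key is ahead */
}
```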
