Draft: Forkless, background iteration utility (forkless)#3553
Draft
JimB123 wants to merge 1 commit intovalkey-io:forklessfrom
Draft
Draft: Forkless, background iteration utility (forkless)#3553JimB123 wants to merge 1 commit intovalkey-io:forklessfrom
JimB123 wants to merge 1 commit intovalkey-io:forklessfrom
Conversation
Signed-off-by: Jim Brunner <brunnerj@amazon.com>
ca38ccc to
206635b
Compare
Member
Author
|
This replaces previous draft version (which was against a different branch). #3432 |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## forkless #3553 +/- ##
============================================
+ Coverage 76.57% 76.73% +0.15%
============================================
Files 159 161 +2
Lines 81206 81422 +216
============================================
+ Hits 62187 62481 +294
+ Misses 19019 18941 -78
🚀 New features to boost your workflow:
|
JimB123
commented
Apr 24, 2026
Comment on lines
+2571
to
+2634
| /* This is an internal function - not part of the standard API. It must be explicitly declared | ||
| * where used. It shouldn't be included in any .h (API) file. Use of this interface is discouraged | ||
| * as it depends on the internal structure, which may change. | ||
| * | ||
| * For a given key, return: | ||
| * table_idx - the index of the internal table (0 or 1) | ||
| * bucket_idx - the bucket index within the table (0..n) | ||
| * | ||
| * Returns TRUE if the the key exists in the table. | ||
| * Returns FALSE if the key doesn't exist (and table/index are undefined) | ||
| */ | ||
| bool hashtableInternalFindBucketIdx(hashtable *ht, void *key, int *table_idx, size_t *bucket_idx) { | ||
| uint64_t hash = hashKey(ht, key); | ||
| int pos_in_bucket; | ||
| int table; | ||
| bucket *b = findBucket(ht, hash, key, &pos_in_bucket, &table); | ||
| if (!b) return false; | ||
|
|
||
| *table_idx = table; | ||
| *bucket_idx = hash & expToMask(ht->bucket_exp[table]); | ||
| return true; | ||
| } | ||
|
|
||
| /* This is an internal function - not part of the standard API. It must be explicitly declared | ||
| * where used. It shouldn't be included in any .h (API) file. Use of this interface is discouraged | ||
| * as it depends on the internal structure, which may change. | ||
| * | ||
| * For a given iterator, return: | ||
| * table_idx - the index of the internal table (0 or 1) | ||
| * bucket_idx - the bucket index within the table (0..n) | ||
| * | ||
| * NOTE: hashtableIterator position is based on the LAST item returned. | ||
| */ | ||
| void hashtableInternalIteratorGetBucketIdx(hashtableIterator *iterator, int *table_idx, size_t *bucket_idx) { | ||
| iter *it = iteratorFromOpaque(iterator); | ||
| *table_idx = it->table; | ||
| *bucket_idx = it->index; | ||
| } | ||
|
|
||
| /* This is an internal function - not part of the standard API. It must be explicitly declared | ||
| * where used. It shouldn't be included in any .h (API) file. Use of this interface is discouraged | ||
| * as it depends on the internal structure, which may change. | ||
| * | ||
| * Returns TRUE if the iterator is ready to move to the next bucket index (if it has completed the | ||
| * current bucket index). Note: hashtableIterator bucket_idx is the bucket index of the last item | ||
| * returned by hashtableNext. | ||
| * | ||
| * Note: If this function returns true, the iterator commits to move onto the next bucket index, | ||
| * even if something new is added to the end of the current bucket before hashtableNext is called. | ||
| */ | ||
| bool hashtableInternalIteratorIsBucketIdxComplete(hashtableIterator *iterator) { | ||
| iter *it = iteratorFromOpaque(iterator); | ||
|
|
||
| if (it->bucket->chained) return false; | ||
|
|
||
| if (!(it->bucket->presence >> (it->pos_in_bucket + 1))) { | ||
| /* There's CURRENTLY nothing else to return at this bucket index. Mark pos_in_bucket so | ||
| * so that hashtableNext will move to the next bucket index, regardless of items which may | ||
| * be added in the future. */ | ||
| it->pos_in_bucket = ITERATOR_DONE_WITH_BUCKET_IDX; | ||
| return true; | ||
| } | ||
| return false; | ||
| } |
Member
Author
There was a problem hiding this comment.
Note: Per @rainsupreme this isn't the best approach. Calling code to be refactored to use "scan". See: #3432 (comment)
Comment on lines
+496
to
+520
| while (it->iter_dbi == NULL) { | ||
| if (++it->iter_db >= server.dbnum) { | ||
| fifoRelease(dbEntryFifo); | ||
| return NULL; // Iteration complete | ||
| } | ||
| serverDb *db = server.db[it->orig_to_cur_db[it->iter_db]]; | ||
| if (db != NULL) { | ||
| it->kvs = db->keys; | ||
| it->iter_dbi = kvstoreIteratorInit(it->kvs, HASHTABLE_ITER_SAFE); | ||
| } | ||
| } | ||
|
|
||
| hashtableIterator *ht_it = NULL; | ||
| do { | ||
| dbEntry *de; | ||
| if (!kvstoreIteratorNext(it->iter_dbi, (void **)&de)) { | ||
| kvstoreIteratorRelease(it->iter_dbi); | ||
| it->kvs = NULL, it->iter_dbi = NULL; | ||
| break; | ||
| } | ||
|
|
||
| ht_it = kvstoreInternalIteratorGetCurrentHashtableIterator(it->iter_dbi); | ||
| if (ignoreKeyForSave(objectGetKey(de))) continue; // slot migration: keys being purged | ||
| fifoPush(dbEntryFifo, de); | ||
| } while (!hashtableInternalIteratorIsBucketIdxComplete(ht_it)); |
Member
Author
There was a problem hiding this comment.
This code to be refactored to use scan.
Comment on lines
+566
to
+582
| hashtable *iter_current_ht = kvstoreGetHashtable(it->kvs, keySlot); | ||
| int table; // 0 or 1 (supporting rehashing) | ||
| size_t index; // bucket number within the hashtable | ||
| // If key doesn't exist, we consider it passed - we MIGHT have iterated over it had it existed. | ||
| if (!hashtableInternalFindBucketIdx(iter_current_ht, (void *)key, &table, &index)) return true; | ||
|
|
||
| hashtableIterator *htIter = kvstoreInternalIteratorGetCurrentHashtableIterator(it->iter_dbi); | ||
| int iter_table; | ||
| size_t iter_index; | ||
| hashtableInternalIteratorGetBucketIdx(htIter, &iter_table, &iter_index); | ||
| if (table < iter_table) return true; // iteration in table 1, but item is in table 0 | ||
| if (table > iter_table) return false; // iteration in table 0, but item is in table 1 | ||
| // if index <= iterator index, it has been passed. bgIterator | ||
| // processes buckets atomically. hashtableIterator points to the | ||
| // last returned position. It means bucket at iter_index has | ||
| // already been processed. | ||
| if (index <= iter_index) return true; |
Member
Author
There was a problem hiding this comment.
This will be changed when "scan" is used. Need to base this on the scan cursor. (A new hashtable function will be needed.)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The heart of forkless save (also forkless sync and forkless slot migration) is the bgIteration utility.
bgIteration is responsible for presenting a view of the database, possibly with replication, to achieve the requested consistency.
Important concepts:
The current use case, forkless save, will use a consistent iteration without replication.
Full sync will use a non-consistent iteration, with replication. This allows consistency to be achieved by the end of the syncronization.
There is currently no use case for consistent with replication or non-consistent without replication. However such use cases may be developed in the future. They are fully supported and tested.
Read the documentation in
bgiteration.hfor more details.