
feat(replica): support querying replica status via RESTful API #2377

Merged
empiredan merged 7 commits into apache:master from empiredan:restful-api-get-replica-status on Mar 18, 2026


@empiredan (Contributor) commented Mar 4, 2026

Sometimes we need to know the current status of a replica. For example,
during offline partition split, after new partitions are generated locally,
we need to start the replica server to load the new partitions. Only after
confirming that all partition data has been successfully loaded can we
rebuild the metadata and recover the Pegasus cluster. However, there is
currently no reliable way to verify that all partition data has finished loading.

There are two possible workarounds, and neither is reliable:

  1. Check the replica server logs.
    For example, if we find "load replica successfully", we assume the partition
    has been loaded successfully; if we find "load replica failed", we assume the
    loading failed.
    However, log files are automatically cleaned up once their size or count
    exceeds certain thresholds. When there are a large number of partitions, the
    relevant logs may already have been removed before we even start checking
    whether the partitions were loaded successfully.

  2. Wait for a fixed period of time.
    This approach is also impractical because we know neither when a partition
    starts loading nor how long loading will take. At the same time, we cannot
    wait indefinitely.

If we could query the current status of a replica directly (for example, whether
it is still loading or already serving), this problem would be much easier to
solve. Therefore, this PR introduces a RESTful API to query the current status of
a replica.

Since the HTTP service starts before partition data loading begins, the replica
status can be queried from the replica server while partitions are still being
loaded.

Example usage of the RESTful API:

GET http://1.2.3.4:34801/replica/status?app_id=1&partition_index=2

If the partition is currently loading, the replica server will return the following
response in JSON format:

{"status": "LOADING"}

The currently supported statuses include:

  • LOADING: the replica is being loaded;
  • NOT_FOUND: the replica does not exist;
  • CREATING: the replica is being created;
  • SERVING: the replica is serving;
  • CLOSING: the replica is being closed;
  • CLOSED: the replica has been closed;
  • UNKNOWN: the replica is in an unknown status.
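One way to represent these statuses on the server side is an enum class plus a lookup into a static table of `std::string_view`, similar to the pattern the reviewer notes for `get_replica_status`. This is a hypothetical C++17 sketch: `replica_status`, `replica_status_to_string` and `make_status_body` are illustrative names, and the enum order is an assumption rather than something taken from the PR.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical enum mirroring the statuses listed above; the real code may
// name or order them differently.
enum class replica_status : std::size_t
{
    LOADING = 0,
    NOT_FOUND,
    CREATING,
    SERVING,
    CLOSING,
    CLOSED,
    UNKNOWN,
    COUNT, // number of valid statuses, not a status itself
};

// The names live in a function-local static array, so the returned
// string_view always points at storage that outlives the caller.
std::string_view replica_status_to_string(replica_status status)
{
    static constexpr std::array<std::string_view,
                                static_cast<std::size_t>(replica_status::COUNT)>
        kNames = {"LOADING", "NOT_FOUND", "CREATING", "SERVING",
                  "CLOSING", "CLOSED", "UNKNOWN"};

    const auto idx = static_cast<std::size_t>(status);
    // Out-of-range values fall back to the last entry, "UNKNOWN".
    return idx < kNames.size() ? kNames[idx] : kNames.back();
}

// Builds a body shaped like the example response, e.g. {"status": "LOADING"}.
std::string make_status_body(replica_status status)
{
    std::string body = R"({"status": ")";
    body += replica_status_to_string(status);
    body += R"("})";
    return body;
}
```

Returning a view into static storage avoids allocating a new string per request; a production handler would more likely serialize through a JSON library than concatenate strings by hand.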

@github-actions github-actions Bot added the cpp label Mar 4, 2026
@empiredan empiredan marked this pull request as ready for review March 6, 2026 10:18
@acelyc111-bot left a comment

Review: RESTful API for querying replica status

Summary: Adds a new replica/status?app_id=X&partition_index=Y HTTP endpoint that returns the current status of a replica. Also includes significant code modernization across replica_stub.

What's good:

  • Clean HTTP handler with proper input validation (app_id >= 0, partition_index >= 0, valid int parsing)
  • Nice http_response helper methods (as_bad_request, as_missing_query_arg, as_ok_json) — makes handlers much cleaner
  • Solid modernization: replica_life_cycle → enum class, structured bindings in loops, const correctness, _is_running → std::atomic_bool, [[nodiscard]]
  • Good separation: unlocked internal method (get_replica_life_cycle_unlocked) + public locked wrapper
  • Destructor properly deregisters the new endpoint
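The input validation described in the review (non-negative `app_id` and `partition_index`, valid integer parsing) could look roughly like the C++17 sketch below. It is an assumption-based illustration, not the PR's handler: `status_query`, `parse_non_negative_int32` and `validate_status_query` are hypothetical, the query arguments are assumed to be already decoded into a `std::map`, and the returned error string is what a handler might send back as a 400 response (the PR's own `http_response` helpers are not reproduced here).

```cpp
#include <charconv>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical result of validating the query string of /replica/status.
struct status_query
{
    int32_t app_id = -1;
    int32_t partition_index = -1;
    std::string error; // empty on success; otherwise text for a 400 response
};

// Parses a non-negative int32; rejects empty input, trailing junk and overflow.
std::optional<int32_t> parse_non_negative_int32(const std::string &s)
{
    int32_t value = 0;
    const char *first = s.data();
    const char *last = s.data() + s.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last || value < 0) {
        return std::nullopt;
    }
    return value;
}

status_query validate_status_query(const std::map<std::string, std::string> &args)
{
    status_query q;

    // Looks up one argument and parses it; records an error message on failure.
    const auto read_arg = [&](const std::string &name, int32_t &out) {
        const auto it = args.find(name);
        if (it == args.end()) {
            q.error = "missing query arg: " + name;
            return false;
        }
        const auto parsed = parse_non_negative_int32(it->second);
        if (!parsed) {
            q.error = "invalid " + name + ": " + it->second;
            return false;
        }
        out = *parsed;
        return true;
    };

    // Stop at the first failing argument so only one error is reported.
    if (read_arg("app_id", q.app_id)) {
        read_arg("partition_index", q.partition_index);
    }
    return q;
}
```

`std::from_chars` rejects empty strings, a leading `+`, trailing characters such as `2x`, and values that overflow `int32_t`, so only plain non-negative decimal integers pass the check.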

Minor notes:

  • get_replica_status returns a std::string_view pointing to a static array, which is safe and efficient; good pattern
  • One include, <nlohmann/detail/json_ref.hpp>, in replica_http_service.cpp seems unnecessary (only json.hpp and json_fwd.hpp are needed); it might be an IDE auto-include

Verdict: ✅ Approve — Clean feature addition with bonus modernization. Ready to merge.

@empiredan empiredan merged commit 6e04cdf into apache:master Mar 18, 2026
172 of 173 checks passed

4 participants