
feat(argilla): add Argilla MCP Server with 13 tools#13

Merged
OdinHoang03 merged 1 commit into main from feat/argilla-mcp-server
Feb 23, 2026

Conversation

@OdinHoang03
Contributor

Dataset Tools (8)

  • create_dataset: create a dataset with fields & questions
  • push_records: push records from a pipeline into Argilla
  • get_records: fetch annotated records
  • query_records: query records by filter (status/annotator/label)
  • export_dataset: export to JSONL or Parquet
  • delete_records: delete records by filter
  • annotation_progress: annotation progress statistics
  • agreement_score: compute IAA (Cohen/Fleiss Kappa, Krippendorff Alpha, Overlap)

User/Annotator Management Tools (5)

  • create_user: create a new user (admin | annotator)
  • list_users: list users by role
  • delete_user: delete a user by username
  • manage_workspace: add/remove users from a workspace
  • annotator_stats: annotation statistics per annotator

Factory Modules (SOLID)

  • connection.py: ArgillaConnectionFactory (Singleton)
  • dataset.py: DatasetFactory
  • record.py: RecordFactory
  • export.py: ExportFactory (JSONL/Parquet/HF Hub)
  • agreement.py: AgreementFactory (IAA algorithms)
  • user.py: UserFactory

Dependencies

  • argilla>=2.0.0, scikit-learn>=1.3.0, krippendorff>=0.6.0

@OdinHoang03 OdinHoang03 requested a review from a team February 23, 2026 07:40
@OdinHoang03 OdinHoang03 merged commit ac58d5f into main Feb 23, 2026
1 check passed
@OdinHoang03 OdinHoang03 deleted the feat/argilla-mcp-server branch February 23, 2026 07:41
@github-actions
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 85
🧪 No relevant tests
🔒 Security concerns

Sensitive information exposure:
The user's API key and password are passed as MCP tool parameters. If the MCP server's logging captures the full tool input/output, these sensitive values will leak into log files. Ensure that loguru or ZemServer has a mechanism to mask sensitive fields.
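One way to mitigate this (a sketch, not part of this PR) is to redact known-sensitive parameter names before any tool input reaches the logger; the key list below is an assumption about this server's tools:

```python
# Hypothetical helper: redact sensitive MCP tool parameters before logging.
SENSITIVE_KEYS = {"password", "api_key", "token"}

def mask_sensitive(params: dict) -> dict:
    """Return a copy of tool params with sensitive values replaced by '***'."""
    return {
        k: "***" if k.lower() in SENSITIVE_KEYS else v
        for k, v in params.items()
    }
```

A logging wrapper would then log `mask_sensitive(tool_input)` instead of the raw arguments.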

⏳ Contribution time estimate (best, average, worst case): 2h | 5h | 10h
⚡ Recommended focus areas for review

Logic Error

In the fleiss_kappa function, n_annotators is computed as the mean of the row sums. If the number of annotators is not uniform across records, the standard Fleiss' Kappa formula is no longer valid, and the division by (n_annotators - 1) can produce incorrect results.

    n_annotators = int(matrix.sum(axis=1).mean())
    n_categories = matrix.shape[1]
else:
    raise ValueError("ratings_matrix must be 2D: [n_records × n_categories]")

# Fleiss' Kappa formula
p_j = matrix.sum(axis=0) / (n_records * n_annotators)  # proportion of each category
P_e = float(np.sum(p_j ** 2))

P_i = (np.sum(matrix ** 2, axis=1) - n_annotators) / (n_annotators * (n_annotators - 1))
P_bar = float(np.mean(P_i))
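A defensive variant (a sketch, not the PR's implementation) would reject non-uniform annotator counts up front instead of averaging them, and treat P_e == 1.0 as perfect agreement:

```python
import numpy as np

def fleiss_kappa(matrix) -> float:
    """Fleiss' Kappa for an [n_records x n_categories] count matrix.

    Raises if records have differing annotator counts, the failure
    mode flagged above, instead of silently averaging row sums.
    """
    matrix = np.asarray(matrix, dtype=float)
    row_sums = matrix.sum(axis=1)
    if not np.all(row_sums == row_sums[0]):
        raise ValueError("Every record must be rated by the same number of annotators")
    n_annotators = int(row_sums[0])
    if n_annotators <= 1:
        raise ValueError("Fleiss' Kappa requires at least 2 annotators")
    n_records = matrix.shape[0]
    p_j = matrix.sum(axis=0) / (n_records * n_annotators)  # per-category proportion
    P_e = float(np.sum(p_j ** 2))                          # chance agreement
    P_i = (np.sum(matrix ** 2, axis=1) - n_annotators) / (n_annotators * (n_annotators - 1))
    P_bar = float(np.mean(P_i))                            # observed agreement
    if P_e >= 1.0:
        return 1.0  # all ratings in a single category: perfect agreement
    return (P_bar - P_e) / (1.0 - P_e)
```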
Data Integrity

The from_list function coerces every field value to str. This can destroy the structure of complex values (such as JSON objects or lists) that Argilla 2.0 may support through other field types.

argilla_key = field_map.get(key, key)
fields[argilla_key] = str(value) if value is not None else ""
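A structure-preserving alternative (hypothetical, not the PR's code) is to JSON-encode complex values so they survive the round trip into a text field:

```python
import json

def coerce_field(value):
    """Coerce a record field value for a text field without flattening
    structure: dicts and lists are JSON-encoded so they remain
    recoverable; other values fall back to str()."""
    if value is None:
        return ""
    if isinstance(value, (dict, list)):
        return json.dumps(value, ensure_ascii=False)
    return str(value)
```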
Performance

The to_parquet and to_huggingface functions convert all records to a list of dicts in memory before building the DataFrame/Dataset. With a large number of records (e.g. >100k), this can exhaust memory (OOM).

base = {
    "id": str(rec.id) if rec.id else None,
    **{f"field_{k}": v for k, v in (rec.fields or {}).items()},
    **{f"meta_{k}": v for k, v in (rec.metadata or {}).items()},
}
if rec.responses:
    for resp in rec.responses:
        row = dict(base)
        row["annotator"] = str(resp.user_id) if resp.user_id else None
        row["status"] = resp.status.value if resp.status else None
        for q, v in (resp.values or {}).items():
            row[f"answer_{q}"] = v.value
        rows.append(row)
else:
    rows.append(base)
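For the JSONL path, one way to keep memory flat (a sketch under an assumed plain-dict record shape, not the PR's code) is to flatten and write one row at a time instead of materializing a list:

```python
import json
from typing import Iterable, Iterator

def iter_rows(records: Iterable[dict]) -> Iterator[dict]:
    """Yield flattened rows lazily; `records` here are plain dicts with
    hypothetical 'id' and 'fields' keys, not Argilla Record objects."""
    for rec in records:
        yield {
            "id": rec.get("id"),
            **{f"field_{k}": v for k, v in rec.get("fields", {}).items()},
        }

def export_jsonl_streaming(records: Iterable[dict], path: str) -> int:
    """Write rows incrementally so memory use does not grow with record count."""
    n = 0
    with open(path, "w", encoding="utf-8") as f:
        for row in iter_rows(records):
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
            n += 1
    return n
```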
Security

The create_user function receives a plaintext password. Ensure logs and the MCP system do not inadvertently record this parameter. There is also no client-side password strength validation before it is sent to the server.

def create_user(
    client,
    username: str,
    password: str,
    role: str = "annotator",
    first_name: str = "",
    last_name: str = "",
):
    """
    Tạo user mới.

    Args:
        client: Argilla client
        username: Tên đăng nhập (unique)
        password: Mật khẩu (min 8 ký tự)
        role: "annotator" | "admin"
        first_name: Tên
        last_name: Họ

    Returns:
        rg.User instance
    """
    try:
        import argilla as rg
    except ImportError:
        raise ImportError("Install with: pip install 'xfmr-zem[argilla]'")

    if role not in VALID_ROLES:
        raise ValueError(f"Invalid role: '{role}'. Use one of: {VALID_ROLES}")

    user = rg.User(
        username=username,
        password=password,
        role=role,
        first_name=first_name or username,
        last_name=last_name,
        client=client,
    )
    user.create()
    logger.info(f"Created user '{username}' (role={role})")
    return user
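A minimal client-side strength check could run before the API call; the specific rules below are an assumption for illustration, not Argilla's actual policy:

```python
import re

MIN_PASSWORD_LEN = 8  # matches the docstring's "min 8 characters"

def validate_password(password: str) -> None:
    """Reject obviously weak passwords before they reach the server.
    Raises ValueError; the letter/digit rule is a hypothetical policy."""
    if len(password) < MIN_PASSWORD_LEN:
        raise ValueError(f"Password must be at least {MIN_PASSWORD_LEN} characters")
    if not re.search(r"[A-Za-z]", password) or not re.search(r"\d", password):
        raise ValueError("Password must contain both letters and digits")
```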

@github-actions
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue
Fix division by zero in Kappa

The current Fleiss' Kappa implementation does not handle cases where all annotators
agree perfectly on a single category across all records, which leads to a division
by zero when P_e is 1.0. While the code attempts to catch P_e == 1.0, it returns
0.0, which is mathematically incorrect for perfect agreement (should be 1.0).
Additionally, it should handle the case where n_annotators is 1 to avoid division by
zero in P_i.

src/xfmr_zem/servers/argilla/factory/agreement.py [107-116]

-p_j = matrix.sum(axis=0) / (n_records * n_annotators)  # proportion mỗi category
+if n_annotators <= 1:
+    return 0.0
+p_j = matrix.sum(axis=0) / (n_records * n_annotators)
 P_e = float(np.sum(p_j ** 2))
-
 P_i = (np.sum(matrix ** 2, axis=1) - n_annotators) / (n_annotators * (n_annotators - 1))
 P_bar = float(np.mean(P_i))
+if P_e >= 1.0:
+    return 1.0 if P_bar >= 1.0 else 0.0
+kappa = (P_bar - P_e) / (1 - P_e)
 
-if P_e == 1.0:
-    kappa = 0.0
-else:
-    kappa = (P_bar - P_e) / (1 - P_e)
-
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies a mathematical error in the Fleiss' Kappa implementation. If there is perfect agreement (P_e == 1.0), the score should be 1.0, not 0.0. It also correctly identifies a potential division by zero if n_annotators is 1.

Medium
General
Ensure consistent schema for Parquet export

When exporting to Parquet, if the rows list contains dictionaries with inconsistent
keys (which happens if different records have different metadata or fields), pandas
will create a DataFrame with many NaNs. For large datasets, this can cause schema
inference issues or excessive memory usage. Ensure the data is flattened
consistently or use a fixed schema before calling to_parquet.

src/xfmr_zem/servers/argilla/factory/export.py [96-97]

 df = pd.DataFrame(rows)
-df.to_parquet(path, index=False)
+# Ensure all columns are present and handle potential object-type mixed columns
+for col in df.columns:
+    if df[col].dtype == object:
+        df[col] = df[col].astype(str).replace("None", None)
+df.to_parquet(path, index=False, engine='pyarrow')
Suggestion importance[1-10]: 5


Why: The suggestion addresses a valid concern regarding schema consistency in Parquet files when dealing with dynamic metadata. The proposed fix of casting object columns to strings is a reasonable safeguard to prevent schema inference crashes in downstream tools.

Low
Improve existing dataset detection logic

The client.datasets call in Argilla 2.x returns a list-like object. If the dataset
exists but the provided fields or questions configuration in the PR differs from the
remote state, subsequent operations might fail or lead to data inconsistency. You
should verify if the existing dataset's settings match the requested configuration
or provide a warning if they differ.

src/xfmr_zem/servers/argilla/factory/dataset.py [119-126]

 try:
     existing = client.datasets(name=name, workspace=workspace)
     if existing:
-        logger.info(f"Dataset '{name}' (workspace='{workspace}') already exists.")
-        return existing[0]
-except Exception:
-    pass
+        ds = existing[0]
+        logger.info(f"Dataset '{name}' already exists. Note: settings are not updated if they differ.")
+        return ds
+except Exception as e:
+    logger.debug(f"Dataset lookup failed: {e}")
Suggestion importance[1-10]: 4


Why: The suggestion correctly points out that settings are not updated if a dataset already exists. However, the 'improved_code' doesn't actually implement a check for configuration differences; it just changes the log message and adds a debug log for the exception.

Low
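A concrete way to surface the mismatch the suggestion describes (a sketch; the flat settings-dict shape is assumed for illustration) is to diff the requested configuration against the remote dataset's settings and warn on any difference:

```python
def settings_diff(requested: dict, remote: dict) -> dict:
    """Return keys whose requested value differs from the remote value,
    so a caller can log a warning instead of silently reusing the dataset."""
    return {
        k: {"requested": requested.get(k), "remote": remote.get(k)}
        for k in set(requested) | set(remote)
        if requested.get(k) != remote.get(k)
    }
```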
