
feat(argilla): add Argilla MCP Server with 13 tools#13

Merged
OdinHoang03 merged 1 commit into main from feat/argilla-mcp-server
Feb 23, 2026

Conversation

@OdinHoang03
Contributor

Dataset Tools (8)

  • create_dataset: create a dataset with fields & questions
  • push_records: push records from a pipeline into Argilla
  • get_records: fetch annotated records
  • query_records: query records by filter (status/annotator/label)
  • export_dataset: export to JSONL or Parquet
  • delete_records: delete records by filter
  • annotation_progress: annotation progress statistics
  • agreement_score: compute IAA (Cohen/Fleiss Kappa, Krippendorff Alpha, Overlap)

User/Annotator Management Tools (5)

  • create_user: create a new user (admin | annotator)
  • list_users: list users by role
  • delete_user: delete a user by username
  • manage_workspace: add/remove users from a workspace
  • annotator_stats: annotation statistics per annotator

Factory Modules (SOLID)

  • connection.py: ArgillaConnectionFactory (Singleton)
  • dataset.py: DatasetFactory
  • record.py: RecordFactory
  • export.py: ExportFactory (JSONL/Parquet/HF Hub)
  • agreement.py: AgreementFactory (IAA algorithms)
  • user.py: UserFactory

Dependencies

  • argilla>=2.0.0, scikit-learn>=1.3.0, krippendorff>=0.6.0

@OdinHoang03 OdinHoang03 requested a review from a team February 23, 2026 07:40
@OdinHoang03 OdinHoang03 merged commit ac58d5f into main Feb 23, 2026
1 check passed
@OdinHoang03 OdinHoang03 deleted the feat/argilla-mcp-server branch February 23, 2026 07:41
@github-actions
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 85
🧪 No relevant tests
🔒 Security concerns

Sensitive information exposure:
The user's API key and password are passed as MCP tool parameters. If the MCP server's logging captures the full tool input/output, these sensitive values will leak into log files. Ensure that loguru or ZemServer has a mechanism to mask sensitive fields.
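One way to mitigate this (a sketch, not part of this PR) is to redact known-sensitive parameter names before any tool input reaches the logger; the key list below is an assumption about this server's tools:

```python
# Hypothetical helper: redact sensitive MCP tool parameters before logging.
SENSITIVE_KEYS = {"password", "api_key", "token"}

def mask_sensitive(params: dict) -> dict:
    """Return a copy of tool params with sensitive values replaced by '***'."""
    return {
        k: "***" if k.lower() in SENSITIVE_KEYS else v
        for k, v in params.items()
    }
```

A logging wrapper would then log `mask_sensitive(tool_input)` instead of the raw arguments.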

⏳ Contribution time estimate (best, average, worst case): 2h | 5h | 10h
⚡ Recommended focus areas for review

Logic Error

In the fleiss_kappa function, n_annotators is computed as the mean of the row sums. If the number of annotators is not uniform across records, the standard Fleiss' Kappa formula is no longer valid, and the division by (n_annotators - 1) can produce incorrect results.

    n_annotators = int(matrix.sum(axis=1).mean())
    n_categories = matrix.shape[1]
else:
    raise ValueError("ratings_matrix must be 2D: [n_records × n_categories]")

# Fleiss' Kappa formula
p_j = matrix.sum(axis=0) / (n_records * n_annotators)  # proportion of each category
P_e = float(np.sum(p_j ** 2))

P_i = (np.sum(matrix ** 2, axis=1) - n_annotators) / (n_annotators * (n_annotators - 1))
P_bar = float(np.mean(P_i))
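A defensive variant (a sketch, not the PR's implementation) would reject non-uniform annotator counts up front instead of averaging them, and treat P_e == 1.0 as perfect agreement:

```python
import numpy as np

def fleiss_kappa(matrix) -> float:
    """Fleiss' Kappa for an [n_records x n_categories] count matrix.

    Raises if records have differing annotator counts, the failure
    mode flagged above, instead of silently averaging row sums.
    """
    matrix = np.asarray(matrix, dtype=float)
    row_sums = matrix.sum(axis=1)
    if not np.all(row_sums == row_sums[0]):
        raise ValueError("Every record must be rated by the same number of annotators")
    n_annotators = int(row_sums[0])
    if n_annotators <= 1:
        raise ValueError("Fleiss' Kappa requires at least 2 annotators")
    n_records = matrix.shape[0]
    p_j = matrix.sum(axis=0) / (n_records * n_annotators)  # per-category proportion
    P_e = float(np.sum(p_j ** 2))                          # chance agreement
    P_i = (np.sum(matrix ** 2, axis=1) - n_annotators) / (n_annotators * (n_annotators - 1))
    P_bar = float(np.mean(P_i))                            # observed agreement
    if P_e >= 1.0:
        return 1.0  # all ratings in a single category: perfect agreement
    return (P_bar - P_e) / (1.0 - P_e)
```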
Data Integrity

The from_list function coerces every field value to str. This can destroy the structure of complex values (such as JSON objects or lists) that Argilla 2.0 may support through other field types.

argilla_key = field_map.get(key, key)
fields[argilla_key] = str(value) if value is not None else ""
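A structure-preserving alternative (hypothetical, not the PR's code) is to JSON-encode complex values so they survive the round trip into a text field:

```python
import json

def coerce_field(value):
    """Coerce a record field value for a text field without flattening
    structure: dicts and lists are JSON-encoded so they remain
    recoverable; other values fall back to str()."""
    if value is None:
        return ""
    if isinstance(value, (dict, list)):
        return json.dumps(value, ensure_ascii=False)
    return str(value)
```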
Performance

The to_parquet and to_huggingface functions convert all records to a list of dicts in memory before building the DataFrame/Dataset. With a large number of records (e.g. >100k), this can exhaust memory (OOM).

base = {
    "id": str(rec.id) if rec.id else None,
    **{f"field_{k}": v for k, v in (rec.fields or {}).items()},
    **{f"meta_{k}": v for k, v in (rec.metadata or {}).items()},
}
if rec.responses:
    for resp in rec.responses:
        row = dict(base)
        row["annotator"] = str(resp.user_id) if resp.user_id else None
        row["status"] = resp.status.value if resp.status else None
        for q, v in (resp.values or {}).items():
            row[f"answer_{q}"] = v.value
        rows.append(row)
else:
    rows.append(base)
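For the JSONL path, one way to keep memory flat (a sketch under an assumed plain-dict record shape, not the PR's code) is to flatten and write one row at a time instead of materializing a list:

```python
import json
from typing import Iterable, Iterator

def iter_rows(records: Iterable[dict]) -> Iterator[dict]:
    """Yield flattened rows lazily; `records` here are plain dicts with
    hypothetical 'id' and 'fields' keys, not Argilla Record objects."""
    for rec in records:
        yield {
            "id": rec.get("id"),
            **{f"field_{k}": v for k, v in rec.get("fields", {}).items()},
        }

def export_jsonl_streaming(records: Iterable[dict], path: str) -> int:
    """Write rows incrementally so memory use does not grow with record count."""
    n = 0
    with open(path, "w", encoding="utf-8") as f:
        for row in iter_rows(records):
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
            n += 1
    return n
```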
Security

The create_user function receives a plaintext password. Ensure logs and the MCP system do not inadvertently record this parameter. There is also no client-side password strength validation before it is sent to the server.

def create_user(
    client,
    username: str,
    password: str,
    role: str = "annotator",
    first_name: str = "",
    last_name: str = "",
):
    """
    Tạo user mới.

    Args:
        client: Argilla client
        username: Tên đăng nhập (unique)
        password: Mật khẩu (min 8 ký tự)
        role: "annotator" | "admin"
        first_name: Tên
        last_name: Họ

    Returns:
        rg.User instance
    """
    try:
        import argilla as rg
    except ImportError:
        raise ImportError("Install with: pip install 'xfmr-zem[argilla]'")

    if role not in VALID_ROLES:
        raise ValueError(f"Invalid role: '{role}'. Use one of: {VALID_ROLES}")

    user = rg.User(
        username=username,
        password=password,
        role=role,
        first_name=first_name or username,
        last_name=last_name,
        client=client,
    )
    user.create()
    logger.info(f"Created user '{username}' (role={role})")
    return user
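A minimal client-side strength check could run before the API call; the specific rules below are an assumption for illustration, not Argilla's actual policy:

```python
import re

MIN_PASSWORD_LEN = 8  # matches the docstring's "min 8 characters"

def validate_password(password: str) -> None:
    """Reject obviously weak passwords before they reach the server.
    Raises ValueError; the letter/digit rule is a hypothetical policy."""
    if len(password) < MIN_PASSWORD_LEN:
        raise ValueError(f"Password must be at least {MIN_PASSWORD_LEN} characters")
    if not re.search(r"[A-Za-z]", password) or not re.search(r"\d", password):
        raise ValueError("Password must contain both letters and digits")
```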

@github-actions
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue
Fix division by zero in Kappa

The current Fleiss' Kappa implementation does not handle cases where all annotators
agree perfectly on a single category across all records, which leads to a division
by zero when P_e is 1.0. While the code attempts to catch P_e == 1.0, it returns
0.0, which is mathematically incorrect for perfect agreement (should be 1.0).
Additionally, it should handle the case where n_annotators is 1 to avoid division by
zero in P_i.

src/xfmr_zem/servers/argilla/factory/agreement.py [107-116]

-p_j = matrix.sum(axis=0) / (n_records * n_annotators)  # proportion mỗi category
+if n_annotators <= 1:
+    return 0.0
+p_j = matrix.sum(axis=0) / (n_records * n_annotators)
 P_e = float(np.sum(p_j ** 2))
-
 P_i = (np.sum(matrix ** 2, axis=1) - n_annotators) / (n_annotators * (n_annotators - 1))
 P_bar = float(np.mean(P_i))
+if P_e >= 1.0:
+    return 1.0 if P_bar >= 1.0 else 0.0
+kappa = (P_bar - P_e) / (1 - P_e)
 
-if P_e == 1.0:
-    kappa = 0.0
-else:
-    kappa = (P_bar - P_e) / (1 - P_e)
-
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies a mathematical error in the Fleiss' Kappa implementation. If there is perfect agreement (P_e == 1.0), the score should be 1.0, not 0.0. It also correctly identifies a potential division by zero if n_annotators is 1.

Medium
General
Ensure consistent schema for Parquet export

When exporting to Parquet, if the rows list contains dictionaries with inconsistent
keys (which happens if different records have different metadata or fields), pandas
will create a DataFrame with many NaNs. For large datasets, this can cause schema
inference issues or excessive memory usage. Ensure the data is flattened
consistently or use a fixed schema before calling to_parquet.

src/xfmr_zem/servers/argilla/factory/export.py [96-97]

 df = pd.DataFrame(rows)
-df.to_parquet(path, index=False)
+# Ensure all columns are present and handle potential object-type mixed columns
+for col in df.columns:
+    if df[col].dtype == object:
+        df[col] = df[col].astype(str).replace("None", None)
+df.to_parquet(path, index=False, engine='pyarrow')
Suggestion importance[1-10]: 5


Why: The suggestion addresses a valid concern regarding schema consistency in Parquet files when dealing with dynamic metadata. The proposed fix of casting object columns to strings is a reasonable safeguard to prevent schema inference crashes in downstream tools.

Low
Improve existing dataset detection logic

The client.datasets call in Argilla 2.x returns a list-like object. If the dataset
exists but the provided fields or questions configuration in the PR differs from the
remote state, subsequent operations might fail or lead to data inconsistency. You
should verify if the existing dataset's settings match the requested configuration
or provide a warning if they differ.

src/xfmr_zem/servers/argilla/factory/dataset.py [119-126]

 try:
     existing = client.datasets(name=name, workspace=workspace)
     if existing:
-        logger.info(f"Dataset '{name}' (workspace='{workspace}') already exists.")
-        return existing[0]
-except Exception:
-    pass
+        ds = existing[0]
+        logger.info(f"Dataset '{name}' already exists. Note: settings are not updated if they differ.")
+        return ds
+except Exception as e:
+    logger.debug(f"Dataset lookup failed: {e}")
Suggestion importance[1-10]: 4


Why: The suggestion correctly points out that settings are not updated if a dataset already exists. However, the 'improved_code' doesn't actually implement a check for configuration differences; it just changes the log message and adds a debug log for the exception.

Low
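A concrete way to surface the mismatch the suggestion describes (a sketch; the flat settings-dict shape is assumed for illustration) is to diff the requested configuration against the remote dataset's settings and warn on any difference:

```python
def settings_diff(requested: dict, remote: dict) -> dict:
    """Return keys whose requested value differs from the remote value,
    so a caller can log a warning instead of silently reusing the dataset."""
    return {
        k: {"requested": requested.get(k), "remote": remote.get(k)}
        for k in set(requested) | set(remote)
        if requested.get(k) != remote.get(k)
    }
```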
