KG-FakeBench Dataset

This folder contains the KG-FakeBench dataset, a knowledge graph (KG)-grounded benchmark for evaluating large language models (LLMs) and detection frameworks on AI-generated misinformation.

📦 Dataset File

- kg_fakebench_v1.rar: compressed archive containing the full dataset in JSON format.
- kg_fakebench_Real.json: JSON file containing the real samples.

📊 Dataset Overview

KG-FakeBench is constructed using a KG-guided generation pipeline that enables controlled factual distortions.

Total synthetic samples: 28,900
- High-plausibility: 14,450
- Low-plausibility: 14,450
Real samples: 1,239

Each synthetic sample is generated by modifying a structured KG triple (subject, predicate, object) and then using an LLM to render the modified triple as natural-language text.
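The high/low split reported above can be checked after loading the synthetic entries. The following is a minimal sketch, assuming the entries are available as a list of dictionaries with a `plausibility` field; the helper name `plausibility_counts` and the sample values are hypothetical and only illustrate the idea.

```python
from collections import Counter


def plausibility_counts(entries):
    """Count entries per plausibility label, skipping entries without one."""
    return Counter(e["plausibility"] for e in entries if "plausibility" in e)


# Tiny in-memory stand-in for the loaded dataset (illustrative values only).
sample = [
    {"subject": "A", "plausibility": "high"},
    {"subject": "B", "plausibility": "low"},
    {"subject": "C", "plausibility": "high"},
    {"subject": "D"},  # real entries carry no plausibility field
]

counts = plausibility_counts(sample)
print(counts["high"], counts["low"])
```

On the full synthetic subset, both counts should come out to 14,450 if the file matches the figures listed above.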

🧾 Data Format

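Each entry in the KG-FakeBench dataset is a JSON object. Synthetic entries carry the fake_object, plausibility, and fake_news fields (first schema below); real entries carry real_object and real_news instead (second schema below):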

{
  "subject": "...",
  "description": "...",
  "predicate": "...",
  "fake_object": "...",
  "plausibility": "high | low",
  "fake_news": "..."
}

{
  "subject": "...",
  "description": "...",
  "predicate": "...",
  "real_object": "...",
  "real_news": "..."
}
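A short sketch of how entries in these two formats might be loaded and told apart by their fields. It assumes each JSON file holds a top-level array of objects, which this README does not confirm; `load_entries` and `classify_entry` are hypothetical helper names, not part of the dataset.

```python
import json

# Field sets taken from the two schemas above.
SHARED = {"subject", "description", "predicate"}
SYNTHETIC_FIELDS = SHARED | {"fake_object", "plausibility", "fake_news"}
REAL_FIELDS = SHARED | {"real_object", "real_news"}


def load_entries(path):
    """Load a JSON file, assuming (unconfirmed) a top-level list of entries."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of entries")
    return data


def classify_entry(entry):
    """Return 'synthetic' or 'real' based on which schema the entry satisfies."""
    keys = set(entry)
    if keys >= SYNTHETIC_FIELDS:
        if entry["plausibility"] not in ("high", "low"):
            raise ValueError(f"unexpected plausibility: {entry['plausibility']!r}")
        return "synthetic"
    if keys >= REAL_FIELDS:
        return "real"
    raise ValueError(f"entry matches neither schema: {sorted(keys)}")


# Illustrative entries with placeholder values, not taken from the dataset.
fake = {"subject": "s", "description": "d", "predicate": "p",
        "fake_object": "o", "plausibility": "high", "fake_news": "text"}
real = {"subject": "s", "description": "d", "predicate": "p",
        "real_object": "o", "real_news": "text"}
print(classify_entry(fake), classify_entry(real))
```

For example, `load_entries("kg_fakebench_Real.json")` followed by `classify_entry` on each element would flag any entry that deviates from the documented fields, provided the file is laid out as assumed.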
