This folder contains the KG-FakeBench dataset, a knowledge graph (KG)-grounded benchmark for evaluating large language models (LLMs) and detection frameworks on AI-generated misinformation.
- kg_fakebench_v1.rar: Compressed archive containing the full dataset in JSON format.
- kg_fakebench_Real.json: JSON file containing the real samples.
KG-FakeBench is constructed using a KG-guided generation pipeline that enables controlled factual distortions.
- Total synthetic samples: 28,900
  - High-plausibility: 14,450
  - Low-plausibility: 14,450
- Real samples: 1,239
Each synthetic sample is generated by modifying a structured KG triple and synthesizing natural-language text from it using LLMs.
Each entry in the KG-FakeBench dataset follows one of two structured JSON formats. Synthetic entries use the first format below and real entries use the second; both share the subject, description, and predicate fields:
{
"subject": "...",
"description": "...",
"predicate": "...",
"fake_object": "...",
"plausibility": "high | low",
"fake_news": "..."
}
{
"subject": "...",
"description": "...",
"predicate": "...",
"real_object": "...",
"real_news": "..."
}
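
The two entry formats above can be told apart by their keys. The following is a minimal Python sketch of how a file might be loaded and its entries counted by type. It assumes each JSON file holds a top-level array of entry objects, which this README does not state; the function names (`load_entries`, `classify_entry`, `summarize`) are hypothetical helpers, not part of the dataset release, and the demo values are placeholders rather than dataset content.

```python
import json
from collections import Counter

# Field sets taken from the two entry formats described above.
SYNTHETIC_KEYS = {"subject", "description", "predicate",
                  "fake_object", "plausibility", "fake_news"}
REAL_KEYS = {"subject", "description", "predicate",
             "real_object", "real_news"}


def load_entries(path):
    """Load a JSON file, assuming it holds a top-level array of entries."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of entries")
    return data


def classify_entry(entry):
    """Return 'synthetic', 'real', or 'unknown' based on the entry's keys."""
    keys = set(entry)
    if SYNTHETIC_KEYS <= keys:
        return "synthetic"
    if REAL_KEYS <= keys:
        return "real"
    return "unknown"


def summarize(entries):
    """Count entries by type, splitting synthetic ones by plausibility."""
    counts = Counter()
    for entry in entries:
        kind = classify_entry(entry)
        if kind == "synthetic":
            counts["synthetic/" + str(entry["plausibility"])] += 1
        else:
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    # Placeholder entries for illustration only; not taken from the dataset.
    demo = [
        {"subject": "...", "description": "...", "predicate": "...",
         "fake_object": "...", "plausibility": "high", "fake_news": "..."},
        {"subject": "...", "description": "...", "predicate": "...",
         "real_object": "...", "real_news": "..."},
    ]
    print(dict(summarize(demo)))
```

If the top-level-array assumption holds, calling `summarize(load_entries("kg_fakebench_Real.json"))` should report only real entries, which gives a quick sanity check against the sample counts listed above.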