KG-FakeBench Dataset

This folder contains the KG-FakeBench dataset, a knowledge graph (KG)-grounded benchmark for evaluating large language models (LLMs) and detection frameworks on AI-generated misinformation.

📦 Dataset Files

  • kg_fakebench_v1.rar
    Compressed archive containing the full dataset in JSON format.

  • kg_fakebench_Real.json
    JSON file containing the real samples.


📊 Dataset Overview

KG-FakeBench is constructed using a KG-guided generation pipeline that enables controlled factual distortions.

  • Total synthetic samples: 28,900
    • High-plausibility: 14,450
    • Low-plausibility: 14,450
  • Real samples: 1,239

Each sample is generated by modifying a structured KG triple and synthesizing natural language using LLMs.
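The counts above can be checked once the data is loaded. The following is a minimal sketch, assuming the entries are available as a list of Python dicts carrying the `plausibility` field described under Data Format; the helper names `plausibility_counts` and `matches_readme` are hypothetical and not part of the dataset release.

```python
from collections import Counter

# Per-level synthetic counts as stated in this README.
EXPECTED_SYNTHETIC = {"high": 14_450, "low": 14_450}


def plausibility_counts(entries):
    """Count entries per plausibility level, skipping entries without the field."""
    return Counter(e["plausibility"] for e in entries if "plausibility" in e)


def matches_readme(entries):
    """Return True if the per-level counts equal the figures listed in this README."""
    return dict(plausibility_counts(entries)) == EXPECTED_SYNTHETIC


# Tiny hand-made sample (not real data), only to show the call shape.
sample = [
    {"plausibility": "high"},
    {"plausibility": "low"},
    {"plausibility": "low"},
]
print(plausibility_counts(sample))
print(matches_readme(sample))
```

Real samples carry no `plausibility` field, so they are ignored by this count and would need a separate length check against the 1,239 figure.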


🧾 Data Format

Each entry in the KG-FakeBench dataset is a JSON object. Synthetic entries use the first schema below, and real entries use the second:

{
  "subject": "...",
  "description": "...",
  "predicate": "...",
  "fake_object": "...",
  "plausibility": "high | low",
  "fake_news": "..."
}

{
  "subject": "...",
  "description": "...",
  "predicate": "...",
  "real_object": "...",
  "real_news": "..."
}
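A loader can tell the two schemas apart by their keys. The sketch below is one possible way to do that, assuming each entry has already been parsed into a Python dict (for example with `json.load`; the top-level layout of the files is not specified here). The function name `entry_kind` is hypothetical, and the check tolerates extra keys in case the files contain fields not listed above.

```python
# Field names taken from the two schemas shown above.
SYNTHETIC_KEYS = {
    "subject", "description", "predicate",
    "fake_object", "plausibility", "fake_news",
}
REAL_KEYS = {"subject", "description", "predicate", "real_object", "real_news"}


def entry_kind(entry):
    """Return "synthetic" or "real" based on which schema the entry satisfies."""
    keys = set(entry)
    if SYNTHETIC_KEYS <= keys:
        # The schema lists exactly two plausibility levels.
        if entry["plausibility"] not in ("high", "low"):
            raise ValueError(f"unexpected plausibility: {entry['plausibility']!r}")
        return "synthetic"
    if REAL_KEYS <= keys:
        return "real"
    raise ValueError(f"entry matches neither schema: {sorted(keys)}")


# Placeholder values, not taken from the dataset.
fake_entry = {
    "subject": "s", "description": "d", "predicate": "p",
    "fake_object": "o", "plausibility": "high", "fake_news": "text",
}
real_entry = {
    "subject": "s", "description": "d", "predicate": "p",
    "real_object": "o", "real_news": "text",
}
print(entry_kind(fake_entry), entry_kind(real_entry))
```

Raising on unknown shapes, rather than guessing a label, keeps malformed entries from silently entering an evaluation set.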