Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System

1Department of Interactive Media, Hong Kong Baptist University
2Department of Computer Science, The University of Hong Kong
3Microsoft Corporation
*Correspondence: ivanypli@gmail.com

Fact2Fiction is the first poisoning attack framework targeting agentic fact-checking systems. It mirrors their decomposition strategy and exploits system-generated justifications to craft tailored malicious evidence that compromises sub-claim verification.

Fact2Fiction Attack Framework Overview.

Abstract

State-of-the-art fact-checking systems combat misinformation at scale by employing autonomous LLM-based agents to decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results to produce verdicts with justifications (explanatory rationales for the verdicts). The security of these systems is crucial because a compromised fact-checker can amplify misinformation, yet it remains underexplored.

This work introduces Fact2Fiction, the first poisoning attack framework targeting such agentic fact-checking systems. Fact2Fiction mirrors the decomposition strategy and exploits system-generated justifications to craft tailored malicious evidence that compromises sub-claim verification. Extensive experiments demonstrate that Fact2Fiction achieves 8.9% to 21.2% higher attack success rates than state-of-the-art attacks across various poisoning budgets. Fact2Fiction exposes security weaknesses in current fact-checking systems and highlights the need for defensive countermeasures.
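The decompose, verify, and aggregate pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation of DEFAME, InFact, or any other system: the names `fact_check`, `toy_decompose`, `toy_verify`, and `KNOWN_FACTS` are hypothetical, the LLM agents are replaced by trivial string operations, and the aggregation rule (a claim is supported only if every sub-claim is supported) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SubVerdict:
    sub_claim: str
    supported: bool
    rationale: str


# Hypothetical stand-in for a retrieval corpus; a real system would search the web
# or a knowledge base, which is the surface a poisoning attack targets.
KNOWN_FACTS = {"water boils at 100 C at sea level"}


def toy_decompose(claim: str) -> List[str]:
    # Assumption: sub-claims are joined by " and "; a real agent would use an LLM.
    return [part.strip() for part in claim.split(" and ")]


def toy_verify(sub_claim: str) -> SubVerdict:
    # Assumption: a sub-claim is supported only if it matches a known fact verbatim.
    ok = sub_claim in KNOWN_FACTS
    status = "supported" if ok else "not supported"
    return SubVerdict(sub_claim, ok, f"'{sub_claim}' is {status} by the evidence.")


def fact_check(
    claim: str,
    decompose: Callable[[str], List[str]],
    verify: Callable[[str], SubVerdict],
) -> Tuple[str, str]:
    """Return (verdict, justification) for a claim."""
    sub_verdicts = [verify(s) for s in decompose(claim)]
    # Assumed aggregation rule: every sub-claim must be supported.
    verdict = "supported" if all(v.supported for v in sub_verdicts) else "refuted"
    # The justification concatenates the per-sub-claim rationales.
    justification = " ".join(v.rationale for v in sub_verdicts)
    return verdict, justification


if __name__ == "__main__":
    claim = "water boils at 100 C at sea level and the moon is made of cheese"
    print(fact_check(claim, toy_decompose, toy_verify))
```

Even in this toy form, the justification names each sub-claim and the evidence status behind the verdict, which is the kind of information the attack is described as exploiting.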

Key Contributions

  • Threat Model: We propose a novel threat model against fact-checking systems that exploits their justifications for targeted poisoning attacks.
  • Attack Method: We introduce Fact2Fiction, the first attack framework that targets state-of-the-art agentic fact-checking systems and crafts targeted malicious evidence.
  • Evaluations and Findings: Extensive experiments show that Fact2Fiction outperforms prior attacks across diverse settings and reveal several critical insights.

Experimental Results

Fact2Fiction outperforms every baseline attack at every poisoning rate on all three victim systems (DEFAME, InFact, and Simple). Notably, at a 1% poisoning rate, Fact2Fiction achieves an attack success rate (ASR) of 42.4% on DEFAME and 46.0% on InFact, surpassing PoisonedRAG by 8.9% and 9.2%, respectively.
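For readers unfamiliar with the metric, ASR (attack success rate) can be sketched as follows. The definition used here is an assumption, since this page does not spell it out: an attack on a claim counts as successful when the victim system's verdict after poisoning equals the attacker's target verdict. The function name `attack_success_rate` and the verdict labels are hypothetical, and the example inputs are toy values, not results from the paper.

```python
from typing import Sequence


def attack_success_rate(
    post_attack_verdicts: Sequence[str],
    target_verdicts: Sequence[str],
) -> float:
    """Fraction of attacked claims whose verdict matches the attacker's target.

    Assumed definition: success means the post-attack verdict equals the target.
    """
    if len(post_attack_verdicts) != len(target_verdicts):
        raise ValueError("verdict lists must have the same length")
    if not target_verdicts:
        raise ValueError("at least one attacked claim is required")
    hits = sum(
        1 for got, want in zip(post_attack_verdicts, target_verdicts) if got == want
    )
    return hits / len(target_verdicts)


if __name__ == "__main__":
    # Toy example: the attacker wants all five claims to be judged "supported".
    observed = ["supported", "refuted", "refuted", "supported", "refuted"]
    targets = ["supported"] * 5
    print(attack_success_rate(observed, targets))  # 2 of 5 succeed
```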

Our evaluations reveal these critical insights:

  1. Justifications introduce a transparency-security trade-off: exploiting them improves ASR by up to 12.4% under constrained poisoning budgets.
  2. Evidence quality matters beyond retrievability. Malicious evidence crafted by Fact2Fiction achieves an 8.9% higher ASR than PoisonedRAG at the same level of retrievability.
  3. Attack effectiveness saturates at a poisoning budget that varies with the attack and the victim system; beyond this saturation point, additional budget yields minimal improvement in ASR.
  4. Current defenses are ineffective against Fact2Fiction, which highlights the urgent need for novel countermeasures.

BibTeX

@inproceedings{fact2fiction2026,
  title={Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System},
  author={He, Haorui and Li, Yupeng and Zhu, Bin Benjamin and Wen, Dacheng and Cheng, Reynold and Lau, Francis C. M.},
  booktitle={Proc.~of AAAI},
  year={2026},
}