State-of-the-art fact-checking systems combat misinformation at scale by employing autonomous LLM-based agents to decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results into a final verdict with a justification (an explanatory rationale for the verdict). The security of these systems is crucial yet largely underexplored: a compromised fact-checker can amplify the very misinformation it is meant to combat.
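A minimal sketch of this decompose-verify-aggregate loop is shown below. All names are illustrative stand-ins, not the API of any specific system such as DEFAME or InFact:

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the LLM backend and evidence retriever; real
# systems wire these to actual models and search indexes.
def llm_decompose(claim: str) -> list[str]:
    return [claim]  # stub: a real LLM splits the claim into sub-claims

def retrieve_evidence(sub_claim: str) -> list[str]:
    return []       # stub: a real retriever queries an evidence corpus

def llm_verify(sub_claim: str, evidence: list[str]) -> tuple[str, str]:
    # stub: a real LLM judges the sub-claim against the evidence and
    # returns a label plus a natural-language justification
    return "supported", f"No evidence contradicts: {sub_claim}"

@dataclass
class SubVerdict:
    sub_claim: str
    label: str          # e.g. "supported" / "refuted"
    justification: str  # explanatory rationale for the label

def fact_check(claim: str) -> tuple[str, list[SubVerdict]]:
    # 1. Decompose the complex claim into smaller sub-claims.
    sub_claims = llm_decompose(claim)
    # 2. Verify each sub-claim individually against retrieved evidence.
    verdicts = [SubVerdict(sc, *llm_verify(sc, retrieve_evidence(sc)))
                for sc in sub_claims]
    # 3. Aggregate the partial results into an overall verdict.
    overall = ("supported" if all(v.label == "supported" for v in verdicts)
               else "refuted")
    return overall, verdicts
```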
This work introduces Fact2Fiction, the first poisoning attack framework targeting such agentic fact-checking systems. Fact2Fiction mirrors the victim system's decomposition strategy and exploits the system's own justifications to craft tailored malicious evidence that compromises the verification of individual sub-claims. Extensive experiments demonstrate that Fact2Fiction achieves 8.9--21.2 percentage points higher attack success rates than state-of-the-art attacks across various poisoning budgets. Fact2Fiction exposes security weaknesses in current fact-checking systems and highlights the need for defensive countermeasures.
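The structure of the attack can be sketched as follows, reusing the stubs above. This is only an illustration of the idea; the actual framework's prompts, budget allocation across sub-claims, and injection mechanics differ:

```python
def craft_poison(sub_claim: str, justification: str, target: str) -> str:
    # Hypothetical generator: a real attacker would prompt an LLM to write
    # a passage that rebuts the cited rationale and pushes the target label.
    return (f"Contrary to the reasoning '{justification}', authoritative "
            f"sources establish that '{sub_claim}' is {target}.")

def poison_corpus(claim: str, target: str, budget: int) -> list[str]:
    # Mirror the victim's decomposition by querying it once, which also
    # yields its per-sub-claim justifications.
    _, verdicts = fact_check(claim)
    # Craft one tailored malicious passage per sub-claim, capped by the
    # poisoning budget, for injection into the evidence corpus.
    return [craft_poison(v.sub_claim, v.justification, target)
            for v in verdicts][:budget]
```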
Fact2Fiction consistently outperforms all baseline attacks across all poisoning rates on all victim systems (DEFAME, InFact, and Simple). Notably, at a 1% poisoning rate, Fact2Fiction achieves an attack success rate (ASR) of 42.4% on DEFAME and 46.0% on InFact, surpassing PoisonedRAG by 8.9 and 9.2 percentage points, respectively.
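For reference, ASR here presumably follows the standard definition in the poisoning literature: the fraction of targeted claims whose final verdict matches the attacker's desired label after poisoning. A minimal computation:

```python
def attack_success_rate(final_verdicts: list[str], targets: list[str]) -> float:
    # Assumed standard definition: fraction of targeted claims whose final
    # verdict matches the attacker's desired label after poisoning.
    return sum(v == t for v, t in zip(final_verdicts, targets)) / len(targets)
```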
@inproceedings{fact2fiction2026,
  title     = {Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System},
  author    = {He, Haorui and Li, Yupeng and Zhu, Bin Benjamin and Wen, Dacheng and Cheng, Reynold and Lau, Francis C. M.},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2026}
}