Awesome papers on answer validation
Introduction
Answers generated by LLMs are prone to hallucination. To mitigate this, it is helpful to introduce an additional component that improves answer quality, namely an answer-enhancement component. The papers below are organized into three groups: answer verification (attribution detection and claim verification), reasoning-based (CoT) filtering, and datasets.
1. Answer Verification
1.1 Attribution Detection
Date | Title | Authors | Organization | Abs |
---|---|---|---|---|
2023/10/07 | Automatic Evaluation of Attribution by Large Language Models [code: | Xiang Yue, Boshi Wang, Ziru Chen, et al. | The Ohio State University | <details><summary>This paper presents evaluation…</summary>This work evaluates the attribution ability (3 types: attributable, extrapolatory, contradictory) of existing LLMs by introducing two benchmarks (i.e., AttrEval-Simulation and AttrEval-GenSearch). It also introduces two types of automatic evaluation methods: 1) prompting LLMs, and 2) fine-tuning LMs on repurposed data. A minimal sketch of the prompting approach follows this table.</details> |
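
A minimal sketch of the "prompting LLMs" evaluation style described above. The three-way label set comes from the paper; `complete` is a hypothetical stand-in for any LLM completion function, not an API from the paper's released code.

```python
# Hedged sketch: `complete(prompt) -> str` is a hypothetical LLM wrapper.
LABELS = ("attributable", "extrapolatory", "contradictory")

PROMPT = """Claim: {claim}
Reference: {reference}

Does the reference support the claim? Answer with exactly one word:
attributable, extrapolatory, or contradictory."""

def classify_attribution(claim: str, reference: str, complete) -> str:
    """Ask the LLM for one of the paper's three attribution labels."""
    answer = complete(PROMPT.format(claim=claim, reference=reference)).strip().lower()
    for label in LABELS:
        if label in answer:
            return label
    # Neither supported nor contradicted is the safest fallback label.
    return "extrapolatory"
```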
1.2 Claim Verification
Date | Title | Authors | Organization | Abs |
---|---|---|---|---|
2024/12/16 | Attention with Dependency Parsing Augmentation for Fine-Grained Attribution | Qiang Ding, Lvzhou Luo, Yixuan Cao, Ping Luo | ICT | <details><summary>This paper presents fine-grained attribution …</summary>This work proposes two techniques to improve model-internals-based methods for fine-grained attribution. First, it aggregates token-wise evidence (i.e., attention weights) through set-union operations, preserving the granularity of representations (see the first sketch below this table). Second, it enhances the attributor by integrating dependency parsing to enrich the semantic completeness of target spans.</details> |
2024/07/02 | Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification | Pritish Sahu, Karan Sikka, Ajay Divakaran | SRI International, Princeton | <details><summary>This paper presents Pelican …</summary>Pelican 1) decomposes the visual claim into a chain of sub-claims based on first-order predicates, and 2) uses Program-of-Thought prompting to generate Python code that answers these sub-questions through flexible composition of external tools (see the second sketch below this table).</details> |
2024/02/23 | Merging Facts, Crafting Fallacies: Evaluating the Contradictory Nature of Aggregated Factual Claims in Long-Form Generations | Cheng-Han Chiang, Hung-yi Lee | National Taiwan University | <details><summary>This paper presents D-FActScore …</summary>This work finds that combining individually factual claims can result in a non-factual paragraph due to entity ambiguity, and that current fact-verification metrics fail to properly evaluate such passages. The authors propose D-FActScore, based on FActScore, and report both human and automatic evaluation results.</details> |
2023/10/20 | Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models [code: | Haoran Wang, Kai Shu | Illinois Institute of Technology, Chicago | <details><summary>This paper presents FOLK …</summary>This work introduces First-Order-Logic-Guided Knowledge-Grounded (FOLK) reasoning (see the third sketch below this table). 1) FOLK translates the input claim into a FOL clause and uses it to guide LLMs to generate a set of question-answer pairs; 2) FOLK then retrieves knowledge-grounded answers from external knowledge sources; 3) FOLK performs FOL-guided reasoning over the knowledge-grounded answers to make a veracity prediction and generate explanations.</details> |
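
For the dependency-parsing attribution paper above, a minimal sketch of the set-union evidence aggregation it describes: per-token evidence sets (top-attended source tokens) are merged with set union over a target span, so no token-level granularity is lost. The function names and the top-k selection rule are illustrative assumptions, not the paper's code.

```python
# Illustrative sketch of set-union evidence aggregation; the top-k selection
# is an assumed way to turn attention weights into per-token evidence sets.

def token_evidence(attention_row, k=2):
    """Indices of the k source tokens with the highest attention weight."""
    ranked = sorted(range(len(attention_row)),
                    key=lambda i: attention_row[i], reverse=True)
    return set(ranked[:k])

def span_evidence(attention, span, k=2):
    """Union of per-token evidence sets over all target positions in `span`."""
    evidence = set()
    for t in span:
        evidence |= token_evidence(attention[t], k)
    return evidence

# Toy example: 3 target tokens attending over 4 source tokens.
attention = [[0.1, 0.7, 0.1, 0.1],
             [0.6, 0.1, 0.2, 0.1],
             [0.1, 0.1, 0.1, 0.7]]
print(span_evidence(attention, span=[0, 1, 2]))  # {0, 1, 2, 3}
```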
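For Pelican, a compact sketch of the decompose-then-verify control flow: a claim is accepted only if every sub-claim survives verification. `decompose` and `check` are hypothetical callables (e.g., an LLM decomposer and a Program-of-Thought executor over external tools), not the paper's actual interfaces.

```python
def verify_claim(claim: str, decompose, check) -> bool:
    """Pelican-style gate: the claim holds only if every sub-claim does.

    `decompose(claim) -> list[str]` splits the claim into predicate-level
    sub-claims; `check(sub_claim) -> bool` runs the (here abstracted)
    Program-of-Thought verification for one sub-claim.
    """
    return all(check(sub) for sub in decompose(claim))
```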
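And for FOLK, a skeleton of its three stages as summarized in the table; `llm` and `retrieve` are placeholder callables, and the prompt strings are assumptions, not the authors' implementation.

```python
def folk_verify(claim: str, llm, retrieve) -> str:
    # 1) Guide the LLM to turn the claim into verification questions
    #    (standing in for the FOL-clause translation step).
    raw = llm(f"Decompose into verification questions: {claim}")
    questions = [q for q in raw.splitlines() if q.strip()]
    # 2) Ground each question in externally retrieved knowledge.
    grounded = [(q, retrieve(q)) for q in questions]
    # 3) Reason over the grounded answers to predict veracity and explain.
    evidence = "\n".join(f"Q: {q}\nEvidence: {e}" for q, e in grounded)
    return llm(f"Claim: {claim}\n{evidence}\nIs the claim supported? Explain.")
```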
2. Reasoning-based (CoT) Filtering
Date | Title | Authors | Organization | Abs |
---|---|---|---|---|
2023/12/31 | Rethinking with Retrieval: Faithful Large Language Model Inference [code: | Hangfeng He, Hongming Zhang, Dan Roth | University of Rochester, Tencent AI Lab Seattle, University of Pennsylvania | <details><summary>This paper presents RR …</summary>This work proposes a novel post-processing approach, rethinking with retrieval (RR), which uses the decomposed reasoning steps obtained from CoT prompting to retrieve relevant documents for LLMs. Four steps (sketched below this table): 1) CoT prompting to generate an explanation E and a prediction P for query Q; 2) sampling diverse reasoning paths R (i.e., E + P); 3) retrieving knowledge K for each path; 4) faithful inference (via an NLI model) over each R + K.</details> |
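
A minimal sketch of RR's four steps, assuming three hypothetical callables: `cot(query) -> (explanation, prediction)`, `retrieve(step) -> str`, and `entails(knowledge, step) -> float` (an NLI support score). The sentence-splitting heuristic is also an assumption.

```python
def rethinking_with_retrieval(query, cot, retrieve, entails, n_paths=5):
    """Return the prediction from the best-supported reasoning path."""
    best_support, best_prediction = float("-inf"), None
    for _ in range(n_paths):
        explanation, prediction = cot(query)   # steps 1-2: sample R = E + P
        steps = [s for s in explanation.split(". ") if s]
        # Steps 3-4: retrieve knowledge per step, score faithfulness via NLI.
        support = sum(entails(retrieve(s), s) for s in steps)
        if support > best_support:
            best_support, best_prediction = support, prediction
    return best_prediction
```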
3. Datasets
Date | Title | Authors | Organization | Abs | Dataset |
---|---|---|---|---|---|
2024/04/16 | Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers | Yuxia Wang, Revanth G. Reddy, Zain M. Mujahid, et al. | MBZUAI, Abu Dhabi, UAE | <details><summary>This paper presents Factcheck-Bench …</summary>Factcheck-Bench is an open-domain, document-level factuality benchmark with three levels of granularity: claim, sentence, and document. It frames the automated detection and correction of factual errors in LLM outputs as eight subtasks (sketched below this table): 1) decomposition; 2) decontextualisation; 3) checkworthiness identification; 4) evidence retrieval and collection; 5) stance detection; 6) correction determination; 7) claim correction; and 8) final response revision.</details> | Factcheck-bench |
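
To make the data flow of the eight subtasks concrete, a hedged sketch that chains them as placeholder callables in a `stages` dict; none of these function names come from the benchmark's code.

```python
def fact_check(response: str, stages: dict) -> str:
    claims = []
    for sentence in stages["decompose"](response):        # 1) decomposition
        claim = stages["decontextualise"](sentence)       # 2) decontextualisation
        if stages["checkworthy"](claim):                  # 3) checkworthiness identification
            claims.append(claim)
    revised = []
    for claim in claims:
        evidence = stages["retrieve"](claim)              # 4) evidence retrieval and collection
        stance = stages["stance"](claim, evidence)        # 5) stance detection
        if stages["needs_correction"](stance):            # 6) correction determination
            claim = stages["correct"](claim, evidence)    # 7) claim correction
        revised.append(claim)
    return stages["revise"](response, revised)            # 8) final response revision
```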