Awesome papers about compressors
Introduction
Retrieved documents typically form a list of passages ranked by their relevance score to the question. Feeding all of these passages directly into the LLM is costly. On one hand, the relevance score of a passage does not necessarily indicate its usefulness for answer generation, so low-utility passages introduce noise into the LLM. On the other hand, the total length of the list can exceed the LLM's context limit. Thus, a mediation (also called post-retrieval) component is introduced to select or compress the retrieved content.
1. Survey papers
Date | Title | Authors | Organization | Abs |
---|---|---|---|---|
2024/10/02 (⭐) | Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey | Sourav Verma | IBM Watsonx Client Engineering, India | <details><summary>Survey on contextual compression.</summary>Surveys contextual compression for large language models, covering semantic compression, pre-trained language models, and retrievers.</details> |
2. Compression-based Adapter
2.1 Selective Methods
Selective methods aim to select a subset of tokens (or sentences) from the original context in order to filter out noise. A minimal sketch of this style of compression follows the table.
Date | Title | Authors | Organization | Abs | Dataset |
---|---|---|---|---|---|
2025/01/27 (⭐⭐⭐) | Provence: efficient and robust context pruning for retrieval-augmented generation [Huggingface Model] | Nadezhda Chirkova, Thibault Formal, Vassilina Nikoulina, Stéphane Clinchant | NAVER LABS Europe | <details><summary>Provence.</summary>Provence (Pruning and Reranking Of retrieVEd relevaNt ContExts) poses context pruning as a sequence labeling task. It fine-tunes a DeBERTa model to encode the query-context pair and output binary masks. The training labels are generated by Llama-3-8B-Instruct.</details> | NQ, TyDi QA, PopQA, HotpotQA, BioASQ, SyllabusQA, RGB |
2024/12/18 (⭐⭐⭐⭐) | EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation [code: | Taeho Hwang, Sukmin Cho, Soyeong Jeong, et al. | Korea Advanced Institute of Science and Technology | <details><summary>EXIT.</summary>EXIT (EXtractIve ContexT compression) operates in three stages: 1) splitting retrieved documents into sentences (rule-based), 2) performing parallelizable binary classification ("YES" or "NO") on each sentence (Gemma-2B-it), 3) recombining the selected sentences while preserving their original order (LLaMA3.1-8B).</details> | NQ, TriviaQA, HotpotQA, 2WikiMultiHopQA |
2024/03/21 (⭐⭐) | FIT-RAG: Black-Box RAG with Factual Information and Token Reduction | Yuren Mao, Xuemei Dong, Wenyi Xu, et al. | Zhejiang University | <details><summary>This paper presents FIT-RAG…</summary>FIT-RAG utilizes the factual information in the retrieved documents and reduces the number of tokens used for augmentation. It consists of five components: a similarity-based retriever, a bi-label document scorer, a bi-faceted self-knowledge recognizer, a sub-document-level token reducer, and a prompt construction module.</details> | TriviaQA, NQ, PopQA |
2024/03/19 (⭐⭐⭐) | LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression [code: | Zhuoshi Pan, Qianhui Wu, et al. | Microsoft Corporation | <details><summary>LLMLingua-2.</summary>LLMLingua-2 formulates prompt compression as a token classification problem to guarantee the faithfulness of the compressed prompt to the original one, and uses a Transformer encoder as the base architecture to capture all essential information for prompt compression from the full bidirectional context.</details> | MeetingBank, LongBench, ZeroScrolls, GSM8K, BBH |
2023/11/14 (⭐⭐⭐⭐) | Learning to Filter Context for Retrieval-Augmented Generation [code: | Zhiruo Wang, Jun Araki, et al. | Carnegie Mellon University | <details><summary>FILCO.</summary>FILCO improves the quality of the context provided to the generator by (1) identifying useful context based on lexical and information-theoretic approaches, and (2) training context filtering models that can filter retrieved contexts at test time.</details> | NQ, TriviaQA, ELI5, FEVER, WoW |
2023/10/10 (⭐⭐⭐) | LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression [code: | Huiqiang Jiang, Qianhui Wu, et al. | Microsoft Corporation | <details><summary>LongLLMLingua.</summary>LongLLMLingua performs prompt compression to improve LLMs' perception of the key information, addressing three challenges: higher computational/financial cost, longer latency, and inferior performance.</details> | LongBench, ZeroSCROLLS, MuSiQue, LooGLE |
2023/10/09 (⭐⭐⭐⭐) | LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models [code: | Huiqiang Jiang, Qianhui Wu, et al. | Microsoft Corporation | <details><summary>LLMLingua: LLaMA-7B to identify and remove non-essential tokens.</summary>LLMLingua is a coarse-to-fine prompt compression method that involves a budget controller to maintain semantic integrity under high compression ratios, a token-level iterative compression algorithm to better model the interdependence between compressed contents, and an instruction-tuning-based method for distribution alignment between language models.</details> | GSM8K, BBH, ShareGPT, Arxiv-March23 |
2023/09/02 (⭐⭐) | LeanContext: Cost-Efficient Domain-Specific Question Answering Using LLMs | Md Adnan Arefeen, Biplob Debnath, Srimat Chakradhar | NEC Laboratories America | <details><summary>LeanContext: ranking sentences by cosine similarity.</summary>LeanContext extracts k key sentences from the context that are closely aligned with the query, and introduces a reinforcement learning technique that dynamically determines k based on the query and context. The remaining, less important sentences are reduced with a free open-source text reduction method.</details> | Arxiv, BBC News |
2023/04/24 (⭐⭐⭐⭐) | Compressing Context to Enhance Inference Efficiency of Large Language Models [code: | Yucheng Li, Bo Dong, Frank Guerin, Chenghua Lin | University of Surrey, University of Manchester, University of Sheffield, UK | <details><summary>Selective_Context: LLaMA-7B token probabilities.</summary>Selective_Context evaluates the informativeness of lexical units (i.e., tokens, phrases, or sentences) with self-information computed by a base causal language model, and selectively retains the content with higher self-information.</details> | arXiv papers, BBC News, ShareGPT.com |
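Most selective methods above share the same skeleton: score lexical units (tokens or sentences), keep the highest-scoring ones, and preserve their original order. Below is a minimal sketch of the self-information idea behind Selective_Context; the `gpt2` checkpoint, the naive sentence splitter, and the keep ratio are illustrative assumptions rather than the paper's exact configuration.

```python
# A minimal, illustrative sketch of self-information-based selective compression
# (in the spirit of Selective_Context). Model, splitter, and keep ratio are
# assumptions for illustration, not the paper's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # assumed small causal LM
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_self_information(sentence: str) -> float:
    """Average self-information (-log p) of the sentence's tokens under the LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    if ids.shape[1] < 2:                                     # too short to score
        return 0.0
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)    # predict token t from tokens < t
    token_logp = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return (-token_logp).mean().item()

def compress(context: str, keep_ratio: float = 0.5) -> str:
    """Keep the most informative sentences, preserving their original order."""
    sentences = [s.strip() for s in context.split(". ") if s.strip()]  # naive splitter
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sentence_self_information(sentences[i]),
                    reverse=True)
    keep = set(ranked[: max(1, int(len(sentences) * keep_ratio))])
    return ". ".join(s for i, s in enumerate(sentences) if i in keep)
```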
2.2 Abstractive Methods
Abstractive methods usually compress contexts by generating summaries that filter out noise. A minimal sketch of query-focused abstractive compression follows the table.
Date | Title | Authors | Organization | Abs | Dataset |
---|---|---|---|---|---|
2025/05/21 (⭐⭐⭐) | Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention [code: | Huanxuan Liao, Wen Hu, Yao Xu, et al. | Institute of Automation, CAS | <details><summary>HyCo2: finetune, LLaMA3.1-8B.</summary>HyCo2 integrates global and local perspectives to guide context compression. It uses a hybrid adapter to refine global semantics, and incorporates a classification layer that assigns a retention probability to each token for the local view.</details> | NQ, TriviaQA, WebQuestions, PopQA, ComplexWebQuestions, HotpotQA, 2WikiMultihopQA |
2024/10/14 (⭐⭐⭐) | COMPACT: Compressing Retrieved Documents Actively for Question Answering [code: | Chanwoong Yoon, Taewhoo Lee, Hyeon Hwang, et al. | Korea University | <details><summary>CompAct: instruction-tuned Mistral-7B.</summary>CompAct groups documents into several segments, then sequentially compresses the segments into a compacted context. It uses a subset of the HotpotQA training set for data collection and utilizes the GPT-4o API to construct the dataset.</details> | NQ, TriviaQA, HotpotQA, 2WikiMultiHopQA, MuSiQue |
2024/07/04 (⭐⭐⭐) | Attribute First, then Generate: Locally-attributable Grounded Text Generation [code: | Aviv Slobodkin, Eran Hirsch, et al. | Bar-Ilan University | <details><summary>AttrFirst.</summary>AttrFirst proposes a locally-attributable text generation approach with three prompt-based steps: 1) content selection (choosing relevant spans from source texts), 2) sentence-level planning (organizing and grouping content), 3) sentence-by-sentence generation (based on the selected and structured output).</details> | DUC, TAC, MultiNews |
2024/02/15 (⭐⭐⭐) | Grounding Language Model with Chunking-Free In-Context Retrieval | Hongjin Qian, et al. | Gaoling School of Artificial Intelligence, Renmin University of China | <details><summary>CFIC.</summary>This paper presents a Chunking-Free In-Context (CFIC) retrieval approach tailored for Retrieval-Augmented Generation (RAG) systems. CFIC circumvents the conventional chunking process: it utilizes the encoded hidden states of documents for in-context retrieval, employing auto-regressive decoding to accurately identify the specific evidence text required for user queries, eliminating the need for chunking. CFIC is further enhanced by two decoding strategies, Constrained Sentence Prefix Decoding and Skip Decoding, which improve the efficiency of the retrieval process while maintaining the fidelity of the generated grounding text evidence.</details> | NarrativeQA, Qasper, MultiFieldQA, HotpotQA, MuSiQue |
2023/10/25 (⭐⭐) | TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction | Junyi Liu, Liangzhi Li, et al. | Meetyou AI Lab | <details><summary>This paper presents TCRA…</summary>TCRA proposes a token compression scheme with two methods: summarization compression and semantic compression. The first applies a T5-based model, fine-tuned on datasets generated via self-instruct with samples of varying lengths, to reduce token counts by summarization. The second further compresses the input by removing words with lower impact on the semantics.</details> | FRDB |
2023/04/12 (⭐⭐⭐) | RECOMP: Improving Retrieval-Augmented LMs With Compression and Selective Augmentation [code: | Fangyuan Xu, Weijia Shi, Eunsol Choi | The University of Texas at Austin, University of Washington | <details><summary>RECOMP.</summary>RECOMP introduces two types of compressors: an extractive compressor that selects pertinent sentences from retrieved documents, and an abstractive compressor that produces concise summaries by amalgamating information from multiple documents.</details> | WikiText-103, NQ, TriviaQA, HotpotQA |
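Abstractive compressors typically condition a summarizer on both the question and the retrieved passages. The sketch below is a minimal query-focused version in the spirit of RECOMP's abstractive compressor; the `google/flan-t5-base` checkpoint and the prompt format are illustrative assumptions (the papers above train dedicated compressors).

```python
# A minimal sketch of query-focused abstractive compression. The checkpoint and
# prompt are illustrative assumptions, not a specific paper's trained compressor.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")   # assumed summarizer
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

def abstractive_compress(question: str, passages: list[str], max_new_tokens: int = 128) -> str:
    """Summarize the retrieved passages into a short, question-focused context."""
    prompt = (
        f"Question: {question}\n"
        "Documents:\n" + "\n".join(passages) + "\n"
        "Summarize the information in the documents that is useful for answering the question:"
    )
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# The compressed summary then replaces the raw passages in the generator's prompt.
```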
3. Thoughts-based Methods
Date | Title | Authors | Organization | Abs | Dataset |
---|---|---|---|---|---|
2024/03/28 (⭐⭐⭐) | ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents [code: | Zhipeng Xu, Zhenghao Liu, Yukun Yan, et al. | Northeastern University, China | <details><summary>ActiveRAG.</summary>The ActiveRAG workflow follows three steps: 1) the Self-Inquiry agent produces a chain of thought (P) to answer the question Q using the LLM; 2) the Knowledge Assimilation agent generates an assimilation rationale (T) based on Q and the retrieved documents D; 3) the Thought Accommodation agent generates the response based on (Q, T, P).</details> | PopQA, TriviaQA, NQ, 2WikiMHQA, ASQA |
2023/10/17 (⭐⭐⭐⭐) | SELF-RAG: Learning To Retrieve, Generate, and Critique Through SELF-Reflection [code: | Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi | University of Washington | <details><summary>Self-RAG.</summary>This work introduces Self-Reflective Retrieval-Augmented Generation, a framework that enhances an LM's quality and factuality through retrieval and self-reflection. It trains a single LM that adaptively retrieves passages on demand, and generates and reflects on the retrieved passages and its own generations using special tokens, called reflection tokens.</details> | PubHealth, ARC-Challenge, PopQA, TriviaQA, ALCE-ASQA |
2023/10/08 (⭐⭐) | Self-Knowledge Guided Retrieval Augmentation for Large Language Models [code: | Yile Wang, Peng Li, Maosong Sun, Yang Liu | Tsinghua University | <details><summary>SKR.</summary>This work elicits the model's ability to recognize what it knows and does not know (its self-knowledge) and proposes Self-Knowledge guided Retrieval augmentation (SKR), a simple yet effective method that lets LLMs refer to questions they have previously encountered and adaptively call external resources when dealing with new questions (see the adaptive-retrieval sketch below the table).</details> | TemporalQA, CommonsenseQA, TabularQA, StrategyQA, TruthfulQA |
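A recurring idea in this section is deciding, per question, whether retrieval is needed at all. The sketch below illustrates the nearest-neighbor flavor of SKR: vote over previously seen questions labeled as answerable without retrieval or not. The encoder choice, the tiny labeled pool, and the `llm`/`retriever` callables are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of SKR-style adaptive retrieval: decide per question whether
# to retrieve, via a nearest-neighbor vote over previously seen questions labeled
# "needs retrieval" or not. Encoder, k, and the labeled pool are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # assumed sentence encoder

# Hypothetical labeled pool collected offline (e.g., by checking whether the LLM
# answered each training question correctly without retrieval).
pool_questions = ["Who wrote Hamlet?", "What was the 2023 GDP of Iceland?"]
pool_needs_retrieval = np.array([False, True])
pool_embeddings = encoder.encode(pool_questions, normalize_embeddings=True)

def needs_retrieval(question: str, k: int = 1) -> bool:
    """Majority vote among the k most similar previously seen questions."""
    q = encoder.encode([question], normalize_embeddings=True)[0]
    sims = pool_embeddings @ q
    top = np.argsort(-sims)[:k]
    return bool(pool_needs_retrieval[top].mean() >= 0.5)

def answer(question: str, llm, retriever) -> str:
    # `llm` and `retriever` are hypothetical callables (completion and search).
    if needs_retrieval(question):
        context = retriever(question)
        return llm(f"Context: {context}\nQuestion: {question}\nAnswer:")
    return llm(f"Question: {question}\nAnswer:")
```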
4. Preference (Dual) Alignment Methods
4.1. Finetuning-based Alignment
Fine-tuning both the retriever and the generator to align them with each other for better retrieval and generation. A minimal sketch of REPLUG-style probability ensembling follows the table.
Date | Title | Authors | Organization | Abs | Dataset |
---|---|---|---|---|---|
2024/07/18 | Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation [code: | Guanting Dong, Yutao Zhu, Chenghao Zhang, Zechen Wang, Zhicheng Dou, Ji-Rong Wen | Renmin University of China | <details><summary>This paper presents DPA-RAG…</summary>DPA-RAG consists of three key components: (1) Preference Knowledge Construction: it first extracts the specific knowledge that significantly affects the LLM's reasoning preferences, then introduces five query augmentation strategies and a quality filtering process to synthesize high-quality preference knowledge. (2) Reranker-LLM Alignment: it designs multi-grained alignment tasks for fine-tuning a preference-aligned reranker. (3) LLM Self-Alignment: it introduces a pre-aligned stage prior to the vanilla SFT stage.</details> | NQ, TriviaQA, HotpotQA, WebQSP |
2023/05/24 | REPLUG: Retrieval-Augmented Black-Box Language Models [code: | Weijia Shi, Sewon Min, Michihiro Yasunaga, et al. | University of Washington, Stanford University, KAIST, Meta AI | <details><summary>This paper presents REPLUG…</summary>REPLUG prepends each retrieved document to the question separately, runs the frozen LLM on each pass, and ensembles the output probabilities across passes. Besides, it uses the LM to score documents in order to supervise the dense retriever's training.</details> | Pile, NQ, TriviaQA |
2022/11/16 | Atlas: Few-shot Learning with Retrieval Augmented Language Models [code: | Gautier Izacard, Patrick Lewis, Maria Lomeli, et al. | Meta AI, ENS, PSL University, Inria, UCL | <details><summary>This paper presents Atlas…</summary>Atlas is a retrieval-augmented language model (Contriever retriever, T5 reader) with carefully designed training: 1) the retriever and the LLM are jointly pre-trained with unsupervised objectives, 2) efficient retriever fine-tuning (including full index updates, reranking, and query-side fine-tuning).</details> | KILT, MMLU, NQ, TriviaQA |
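REPLUG's ensembling can be written as mixing the LM's next-token distributions across per-document passes, weighted by softmax-normalized retrieval scores. The sketch below is a simplified reconstruction with a small open model standing in for the black-box LM; the checkpoint, prompt format, and single-step decoding are assumptions.

```python
# A simplified sketch of REPLUG-style ensembling: run the frozen LM once per
# retrieved document (document prepended to the query) and mix the next-token
# distributions, weighted by softmax-normalized retrieval scores.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # assumed stand-in for the black-box LM
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def replug_next_token_distribution(query: str, docs: list[str], scores: list[float]) -> torch.Tensor:
    """Mixture of next-token distributions over per-document passes."""
    weights = torch.softmax(torch.tensor(scores), dim=0)     # retrieval scores -> ensemble weights
    mixture = torch.zeros(model.config.vocab_size)
    for doc, w in zip(docs, weights):
        ids = tokenizer(f"{doc}\n\n{query}", return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[0, -1]                 # next-token logits for this pass
        mixture += w * torch.softmax(logits, dim=-1)
    return mixture

# Example: pick the most likely next token under the ensembled distribution.
# dist = replug_next_token_distribution("Q: Who wrote Hamlet? A:",
#                                       ["Hamlet is a play by William Shakespeare."], [1.0])
# print(tokenizer.decode(int(dist.argmax())))
```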
4.2. Iteration-based Alignment
Iterating between the retriever and the generator to align them for better retrieval and generation. A minimal sketch of interleaved retrieval and reasoning follows the table.
Date | Title | Authors | Organization | Abs | Dataset |
---|---|---|---|---|---|
2025/01/24 | Chain-of-Retrieval Augmented Generation | Liang Wang, Haonan Chen, Nan Yang, et al. | Microsoft Corporation, Renmin University of China | <details><summary>This paper presents CoRAG…</summary>CoRAG simulates an iterative, step-by-step reasoning process to solve complex questions, allowing the model to refine its query, gather new insights, and synthesize information in a more structured way. Rejection sampling is used to generate intermediate retrieval chains (chains of sub-queries and sub-answers), which are then used to fine-tune an LLM with standard next-token prediction.</details> | HotpotQA, 2WikiMHQA, Bamboogle, MuSiQue, and the KILT benchmark |
2024/11/29 | Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models [code: | Yu Tian, Shaolei Zhang, Yang Feng | ICT, CAS | <details><summary>This paper presents Auto-RAG…</summary>Auto-RAG autonomously synthesizes reasoning-based decision-making instructions for iterative retrieval and fine-tunes open-source LLMs on them.</details> | NQ, HotpotQA, 2WikiMHQA, TriviaQA, PopQA, WebQuestions |
2023/10/23 | Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy | Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, et al. | Tsinghua University, Microsoft Research Asia | <details><summary>This paper presents ITER-RETGEN…</summary>ITER-RETGEN iterates between retrieval-augmented generation and generation-augmented retrieval. The authors also find that exact match can significantly underestimate the performance of LLMs, and that LLM-based evaluation is more reliable.</details> | HotpotQA, 2WikiMHQA, MuSiQue, Feverous, StrategyQA |
2023/10/08 | Retrieval-Generation Synergy Augmented Large Language Models | Zhangyin Feng, Xiaocheng Feng, Dezhi Zhao, Maojin Yang, Bing Qin | Harbin Institute of Technology | <details><summary>This paper presents ITRG…</summary>ITRG iterates over two steps: 1) generation-augmented retrieval (GAR), which expands the query based on the previous iteration's output to help retrieval, and 2) retrieval-augmented generation (RAG), which generates a new document to answer the question based on the retrieved documents.</details> | NQ, TriviaQA, 2WikiMHQA, HotpotQA |
2023/06/22 | Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions [code: | Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal | Stony Brook University, Allen Institute for AI | <details><summary>This paper presents IRCoT…</summary>IRCoT interleaves CoT generation and retrieval steps so that the CoT guides retrieval and vice versa. It alternates two steps: 1) the reason step generates the next CoT sentence based on the question, the retrieved passages, and the CoT so far; 2) the retrieval step retrieves K more passages based on the last CoT sentence.</details> | HotpotQA, 2WikiMHQA, MuSiQue, and IIRC |
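The interleaving pattern of IRCoT (and, loosely, the other iterative methods above) fits in a short loop: one reasoning step, then one retrieval step keyed on the newest reasoning sentence, until the model commits to an answer. In the sketch below, `llm` and `retrieve` are hypothetical callables, and the prompt format and stopping rule are assumptions rather than the paper's exact prompts.

```python
# A minimal sketch of IRCoT-style interleaved retrieval and chain-of-thought.
# `llm` and `retrieve` are hypothetical callables (LLM completion and top-K
# passage retrieval); the prompt format and stopping rule are assumptions.
from typing import Callable

def ircot(question: str,
          llm: Callable[[str], str],
          retrieve: Callable[[str, int], list[str]],
          k: int = 3,
          max_steps: int = 6) -> str:
    passages = retrieve(question, k)          # initial retrieval with the question
    cot: list[str] = []
    for _ in range(max_steps):
        prompt = (
            "Passages:\n" + "\n".join(passages) + "\n\n"
            f"Question: {question}\n"
            "Reasoning so far: " + " ".join(cot) + "\n"
            "Write the next reasoning sentence (or 'So the answer is: ...'):"
        )
        sentence = llm(prompt).strip()        # reason step: one new CoT sentence
        cot.append(sentence)
        if "answer is" in sentence.lower():   # stop once the model commits to an answer
            return sentence
        passages += retrieve(sentence, k)     # retrieve step: query with the last CoT sentence
    return cot[-1]
```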