Introduction

The retrieved documents often form a list of passages ranked by their relevance score to the question. Feeding all of these passages directly into the LLM is costly. On the one hand, the relevance score of a passage does not necessarily indicate its usefulness for answer generation, so irrelevant passages can introduce noise into the LLM. On the other hand, the total length of the list can exceed the context length limit of the LLM. Thus, a mediation (also called post-retrieval) component is introduced to select or compress the retrieved content.

1. Survey papers

Date Title Authors Organization Abs
2024/10/02
(🌟)
Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey Sourav Verma IBM Watsonx Client Engineering, India
<summary>Survey on contextual compression.</summary>A survey of contextual compression for large language models, covering semantic compression, pre-trained language models, and retrievers.

2. Compression-based Adapter

2.1 Selective Methods

Selective methods aim to select a subset of tokens from the original context in order to filter out noise.

Date Title Authors Organization Abs Dataset
2025/01/27
(🌟🌟🌟)
Provence: efficient and robust context pruning for retrieval-augmented generation
[Huggingface Model]
Nadezhda Chirkova, Thibault Formal, Vassilina Nikoulina, Stéphane Clinchant NAVER LABS Europe
<summary>Provence.</summary>Provence (Pruning and Reranking Of retrieVEd relevaNt ContExts) casts context pruning as a sequence labeling task. It fine-tunes a DeBERTa model to encode the query-context pair and output a binary mask over the context tokens. The training labels are generated by Llama-3-8B-Instruct.
NQ, TyDi QA, PopQA, HotpotQA, BioASQ, SyllabusQA, RGB
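A minimal sketch of the sequence-labeling pruning idea behind Provence. The DeBERTa checkpoint below is a generic base model with an untrained two-way token head, used only to illustrate the interface; the released Provence model on Hugging Face provides the fine-tuned weights.

```python
# Sketch of sequence-labeling context pruning in the spirit of Provence.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

name = "microsoft/deberta-v3-base"  # placeholder; swap in the released Provence checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=2)

def prune_context(query: str, context: str, threshold: float = 0.5) -> str:
    # Encode the query-context pair; the model predicts a keep/drop label per token.
    enc = tokenizer(query, context, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**enc).logits.softmax(-1)[0, :, 1]   # P(keep) per token
    seq_ids = enc.sequence_ids(0)                          # None = special, 0 = query, 1 = context
    kept = [tok_id for tok_id, sid, p in zip(enc["input_ids"][0].tolist(), seq_ids, probs.tolist())
            if sid == 1 and p > threshold]                 # keep only context tokens above threshold
    return tokenizer.decode(kept, skip_special_tokens=True)
```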
2024/12/18
(🌟🌟🌟🌟)
EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation
[code: ]
Taeho Hwang, Sukmin Cho, Soyeong Jeong, et al. Korea Advanced Institute of Science and Technology
<summary>EXIT.</summary>EXIT (EXtractIve ContexT compression) operates in three stages: 1) splitting retrieved documents into sentences (rule-based), 2) performing parallelizable binary classification ("YES" or "NO") on each sentence (Gemma-2B-it), 3) recombining the selected sentences while preserving their original order before generation (LLaMA3.1-8B).
NQ, TriviaQA, HotpotQA, 2WikiMultiHopQA
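A sketch of the EXIT-style pipeline: split a retrieved document into sentences, ask a small instruction-tuned LM a YES/NO relevance question about each one (these calls are independent, hence parallelizable), and recombine the kept sentences in their original order. The prompt wording and splitter are illustrative, not the paper's exact recipe.

```python
import re
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("google/gemma-2b-it")
lm = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", torch_dtype=torch.bfloat16)

def keep_sentence(query: str, sentence: str) -> bool:
    prompt = (f"Query: {query}\nSentence: {sentence}\n"
              "Is this sentence helpful for answering the query? Answer YES or NO: ")
    ids = tok(prompt, return_tensors="pt").input_ids
    out = lm.generate(ids, max_new_tokens=2, do_sample=False)
    return "YES" in tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True).upper()

def compress(query: str, document: str) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", document)        # simple rule-based splitter
    return " ".join(s for s in sentences if keep_sentence(query, s))
```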
2024/03/21
(🌟🌟)
FIT-RAG: Black-Box RAG with Factual Information and Token Reduction Yuren Mao, Xuemei Dong, Wenyi Xu, et al. Zhejiang University
<summary>This paper presents FIT-RAG …</summary>FIT-RAG utilizes the factual information in the retrieved documents and reduces the number of tokens used for augmentation. It consists of five components: a similarity-based retriever, a bi-label document scorer, a bi-faceted self-knowledge recognizer, a sub-document-level token reducer, and a prompt construction module.
TriviaQA, NQ, PopQA
2024/03/19
(🌟🌟🌟)
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
[code: ]
Zhuoshi Pan, Qianhui Wu, et al. Microsoft Corporation
<summary>LLMLingua-2.</summary>LLMLingua-2 formulates prompt compression as a token classification problem to guarantee the faithfulness of the compressed prompt to the original one, and uses a Transformer encoder as the base architecture to capture all essential information for prompt compression from the full bidirectional context.
MeetingBank, LongBench, ZeroScrolls, GSM8K, BBH
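A hedged usage sketch of the open-source `llmlingua` package that accompanies the LLMLingua papers. The constructor/method names and arguments below follow the project README as best recalled and may differ between versions; treat them as assumptions rather than a verified API reference.

```python
from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",  # LLMLingua-2 token classifier
    use_llmlingua2=True,
)

passages = ["First retrieved passage ...", "Second retrieved passage ..."]
result = compressor.compress_prompt(passages, rate=0.33)   # keep roughly one third of the tokens
print(result["compressed_prompt"])
```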
2023/11/14
(🌟🌟🌟🌟)
Learning to Filter Context for Retrieval-Augmented Generation
[code: ]
Zhiruo Wang, Jun Araki, et al. Carnegie Mellon University
<summary>FILCO.</summary>FILCO improves the quality of the context provided to the generator by (1) identifying useful context based on lexical and information-theoretic approaches, and (2) training context filtering models that can filter retrieved contexts at test time.
NQ, TriviaQA, ELI5, FEVER, WoW
2023/10/10
(🌟🌟🌟)
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
[code: ]
Huiqiang Jiang, Qianhui Wu, et al. Microsoft Corporation
<summary>LongLLMLingua.</summary>LongLLMLingua compresses prompts to improve LLMs' perception of the key information, addressing three challenges of long-context scenarios: higher computational/financial cost, longer latency, and inferior performance.
LongBench, ZeroSCROLLS, MuSiQue, LooGLE
2023/10/09
(🌟🌟🌟🌟)
LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
[code: ]
Huiqiang Jiang, Qianhui Wu, et al. Microsoft Corporation
<summary>LLMLingua: LLaMA-7B to identify and remove non-essential tokens.</summary>LLMLingua is a coarse-to-fine prompt compression method that involves a budget controller to maintain semantic integrity under high compression ratios, a token-level iterative compression algorithm to better model the interdependence between compressed contents, and an instruction tuning based method for distribution alignment between language models.
GSM8K, BBH, ShareGPT, and Arxiv-March23
2023/09/02
(🌟🌟)
LeanContext: Cost-Efficient Domain-Specific Question Answering Using LLMs Md Adnan Arefeen, Biplob Debnath, Srimat Chakradhar NEC Laboratories America
<summary>LeanContext: ranking sentences with cosine similarity.</summary>LeanContext extracts the k key sentences from the context that are most closely aligned with the query, and introduces a reinforcement learning technique that dynamically determines k based on the query and context. The remaining, less important sentences are reduced using a free open-source text reduction method.
Arxiv, BBC News
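A minimal sketch of the cosine-ranking idea behind LeanContext: embed the query and each context sentence, then keep the top-k most similar sentences in their original order. LeanContext learns k with reinforcement learning; a fixed k and a generic sentence-transformers model are used here purely for illustration.

```python
import re
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def top_k_sentences(query: str, context: str, k: int = 5) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", context)
    q_emb = embedder.encode(query, convert_to_tensor=True)
    s_emb = embedder.encode(sentences, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, s_emb)[0]                               # cosine similarity per sentence
    keep = sorted(scores.topk(min(k, len(sentences))).indices.tolist())  # restore original order
    return " ".join(sentences[i] for i in keep)
```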
2023/04/24
(🌟🌟🌟🌟)
Compressing Context to Enhance Inference Efficiency of Large Language Models
[code: ]
Yucheng Li, Bo Dong, Frank Guerin, Chenghua Lin University of Surrey, University of Manchester, University of Sheffield, UK
<summary>Selective_Context: LLaMA-7B token probabilities.</summary>Selective_Context evaluates the informativeness of lexical units (i.e., tokens, phrases, or sentences) with self-information computed by a base causal language model, and selectively retains the content with higher self-information.
arXiv papers, BBC News, ShareGPT.com
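A sketch of the self-information scoring used by Selective Context: score each sentence by the summed negative log-probability of its tokens under a causal LM and keep the most informative ones. GPT-2 stands in here for the paper's LLaMA-7B scorer, and the sentence-level granularity and keep ratio are illustrative choices.

```python
import re
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def self_information(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logp = lm(ids).logits[:, :-1].log_softmax(-1)                 # predict token t+1 from its prefix
    token_logp = logp.gather(-1, ids[:, 1:, None]).squeeze(-1)
    return -token_logp.sum().item()                                   # total self-information (nats)

def keep_informative(context: str, keep_ratio: float = 0.5) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", context)
    ranked = sorted(range(len(sentences)), key=lambda i: self_information(sentences[i]), reverse=True)
    keep = set(ranked[: max(1, int(len(sentences) * keep_ratio))])
    return " ".join(s for i, s in enumerate(sentences) if i in keep)  # original order preserved
```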

2.2 Abstractive Methods

Abstractive methods usually compress contexts by generating summaries, filtering out noise in the context.

Date Title Authors Organization Abs Dataset
2025/05/21
(🌟🌟🌟)
Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention
[code: ]
Huanxuan Liao, Wen Hu, Yao Xu, et al. Institute of Automation, CAS
<summary>HyCo2: fine-tuning, LLaMA3.1-8B.</summary>HyCo2 integrates global and local perspectives to guide context compression. It uses a hybrid adapter to refine global semantics, and incorporates a classification layer that assigns a retention probability to each token in the local view.
NQ, TriviaQA, WebQuestions, PopQA, ComplexWebQuestions, HotpotQA, 2WikiMultihopQA
2024/10/14
(🌟🌟🌟)
COMPACT: Compressing Retrieved Documents Actively for Question Answering
[code: ]
Chanwoong Yoon, Taewhoo Lee, Hyeon Hwang, et al. Korea University
<summary>CompAct: instruction-tuned Mistral-7B.</summary>CompAct groups documents into several segments, then sequentially compresses the segments into a compacted context. It uses a subset of the HotpotQA training set for data collection and uses the GPT-4o API to collect the dataset.
NQ, TriviaQA, HotpotQA, 2WikiMultiHopQA, MuSiQue
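A sketch of CompAct-style active compression: walk over the retrieved documents in segments and ask an LM to fold each segment into the running compressed context. The OpenAI client and prompt are stand-ins for illustration; CompAct itself uses an instruction-tuned Mistral-7B compressor with its own prompt and termination format.

```python
from openai import OpenAI

client = OpenAI()

def compact(question: str, documents: list[str], segment_size: int = 5) -> str:
    compressed = ""
    for i in range(0, len(documents), segment_size):
        segment = "\n\n".join(documents[i:i + segment_size])
        prompt = (f"Question: {question}\n"
                  f"Compressed context so far: {compressed or '(empty)'}\n"
                  f"New documents:\n{segment}\n\n"
                  "Rewrite the compressed context so it keeps only the information needed "
                  "to answer the question.")
        compressed = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
    return compressed
```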
2024/07/04
(🌟🌟🌟)
Attribute First, then Generate: Locally-attributable Grounded Text Generation
[code: ]
Aviv Slobodkin, Eran Hirsch et al. Bar-Ilan University
<summary>AttrFirst.</summary>AttrFirst proposes a locally-attributable text generation approach with three prompt-based steps: 1) content selection (choosing relevant spans from the source texts), 2) sentence-level planning (organizing and grouping the content), 3) sentence-by-sentence generation (based on the selected and structured content).
DUC, TAC, MultiNews
2024/02/15
(🌟🌟🌟)
Grounding Language Model with Chunking-Free In-Context Retrieval Hongjin Qian, et al. Gaoling School of Artificial Intelligence, Renmin University of China
<summary>CFIC.</summary>This paper presents a Chunking-Free In-Context (CFIC) retrieval approach tailored for Retrieval-Augmented Generation (RAG) systems. CFIC circumvents the conventional chunking process: it utilizes the encoded hidden states of documents for in-context retrieval and employs auto-regressive decoding to accurately identify the specific evidence text required for a user query, eliminating the need for chunking. CFIC is further enhanced by two decoding strategies, Constrained Sentence Prefix Decoding and Skip Decoding, which improve the efficiency of the retrieval process while maintaining the fidelity of the generated grounding evidence.
NarrativeQA, Qasper, MultiFieldQA, HotpotQA, MuSiQue
2023/10/25
(🌟🌟)
TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction Junyi Liu, Liangzhi Li, et al. Meetyou AI Lab
<summary>This paper presents TCRA…</summary>TCRA proposes a token compression scheme with two methods: summarization compression and semantic compression. The first applies a T5-based model, fine-tuned on self-instruct-generated datasets containing samples of varying lengths, to reduce the token count via summarization. The second further compresses the token count by removing words with lower impact on the semantics.
FRDB
2023/04/12
(🌟🌟🌟)
RECOMP: Improving Retrieval-Augmented LMs With Compression and Selective Augmentation
[code: ]
Fangyuan Xu, Weijia Shi, Eunsol Choi The University of Texas at Austin, University of Washington
<summary>RECOMP.</summary>RECOMP introduces two types of compressors: an extractive compressor that selects pertinent sentences from the retrieved documents, and an abstractive compressor that produces concise summaries by amalgamating information from multiple documents.
WikiText-103, NQ, TriviaQA, HotpotQA
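A sketch of the abstractive-compressor idea in RECOMP: a seq2seq model writes a short, query-focused summary of the retrieved documents, which then replaces them in the prompt. A generic Flan-T5 checkpoint and prompt are used for illustration; RECOMP trains dedicated extractive and abstractive compressors, and only the abstractive side is sketched here.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

def abstractive_compress(query: str, documents: list[str], max_tokens: int = 128) -> str:
    prompt = ("Summarize the documents so that the summary helps answer the question.\n"
              f"Question: {query}\nDocuments: " + " ".join(documents))
    ids = tok(prompt, return_tensors="pt", truncation=True, max_length=1024).input_ids
    out = model.generate(ids, max_new_tokens=max_tokens)
    return tok.decode(out[0], skip_special_tokens=True)
```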

3. Thoughts-based Methods

Date Title Authors Organization Abs Dataset
2024/03/28
(🌟🌟🌟)
ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents [code: ] Zhipeng Xu, Zhenghao Liu, Yukun Yan, et al. Northeastern University, China
<summary>ActiveRAG.</summary>The ActiveRAG workflow follows three steps: 1) the Self-Inquiry agent produces a chain of thought P to answer the question Q using the LLM; 2) the Knowledge Assimilation agent generates an assimilation rationale T based on Q and the retrieved documents D; 3) the Thought Accommodation agent generates the response based on (Q, T, P).
PopQA, TriviaQA, NQ, 2WikiMHQA, ASQA
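A sketch of the ActiveRAG three-agent chain as three sequential LLM calls. The prompts are loose paraphrases of the three roles, not the paper's templates, and the OpenAI client is just a convenient stand-in for any chat LLM.

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(model="gpt-4o-mini",
                                          messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

def active_rag(question: str, documents: list[str]) -> str:
    docs = "\n".join(documents)
    # 1) Self-Inquiry: chain of thought P from the question Q alone.
    p = ask(f"Think step by step and answer the question.\nQuestion: {question}")
    # 2) Knowledge Assimilation: rationale T grounded in the retrieved documents D.
    t = ask(f"Question: {question}\nDocuments:\n{docs}\n"
            "Organize the knowledge from these documents that is needed to answer the question.")
    # 3) Thought Accommodation: final response from (Q, T, P).
    return ask(f"Question: {question}\nGrounded knowledge:\n{t}\nInitial reasoning:\n{p}\n"
               "Reconcile the reasoning with the grounded knowledge and give the final answer.")
```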
2023/10/17
(🌟🌟🌟🌟)
SELF-RAG: Learning To Retrieve, Generate, and Critique Through SELF-Reflection [code: ] Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi University of Washington
<summary>Self-RAG.</summary>This work introduces a framework called Self-Reflective Retrieval-Augmented Generation that enhances an LM's quality and factuality through retrieval and self-reflection. It trains a single LM that adaptively retrieves passages on demand, and generates and reflects on the retrieved passages and its own generations using special tokens, called reflection tokens.
PubHealth, ARC-Challenge, PopQA, TriviaQA, ALCE-ASQA
2023/10/08
(🌟🌟)
Self-Knowledge Guided Retrieval Augmentation for Large Language Models [code: ] Yile Wang, Peng Li, Maosong Sun, Yang Liu Tsinghua University
<summary>SKR.</summary>This work investigates eliciting the model's ability to recognize what it knows and does not know (also called self-knowledge) and proposes Self-Knowledge guided Retrieval augmentation (SKR), a simple yet effective method that lets LLMs refer to the questions they have previously encountered and adaptively call for external resources when dealing with new questions.
TemporalQA, CommonsenseQA, TabularQA, StrategyQA, TruthfulQA
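A sketch of the self-knowledge gate behind SKR-style adaptive retrieval: first ask the LLM whether it can answer from parametric knowledge alone, and call the retriever only when it says it cannot. SKR additionally matches new questions against previously seen ones; that part is omitted here, and `retrieve` is an assumed top-k retriever callable.

```python
from openai import OpenAI

client = OpenAI()

def chat(prompt: str) -> str:
    resp = client.chat.completions.create(model="gpt-4o-mini",
                                          messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

def skr_answer(question: str, retrieve) -> str:
    probe = chat("Can you answer the following question from your own knowledge, without any "
                 f"external documents? Reply YES or NO only.\nQuestion: {question}")
    if "YES" in probe.upper():
        return chat(f"Question: {question}\nAnswer:")                      # rely on self-knowledge
    context = "\n".join(retrieve(question))                                # fall back to retrieval
    return chat(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```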

4. Preference (Dual) Alignment Methods

4.1 Finetuning-based Alignment

Fine-tuning both the retriever and the generator to align them for better retrieval and generation, respectively.

Date Title Authors Organization Abs Dataset
2024/07/18 Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation [code: ] Guanting Dong, Yutao Zhu, Chenghao Zhang, Zechen Wang, Zhicheng Dou, Ji-Rong Wen Renmin University of China
<summary>This paper presents DPA-RAG …</summary>DPA-RAG consists of three key components: (1) Preference Knowledge Construction: it first extracts the specific knowledge that significantly affects the LLM's reasoning preferences, then introduces five query augmentation strategies and a quality filtering process to synthesize high-quality preference knowledge. (2) Reranker-LLM Alignment: it designs multi-grained alignment tasks for fine-tuning a preference-aligned reranker. (3) LLM Self-Alignment: it introduces a pre-aligned phase prior to the vanilla SFT stage.
NQ, TriviaQA, HotpotQA, WebQSP
2023/05/24 REPLUG: Retrieval-Augmented Black-Box Language Models [code: ] Weijia Shi, Sewon Min, Michihiro Yasunaga, et al. University of Washington, Stanford University, KAIST, Meta AI
<summary>This paper presents REPLUG …</summary>This work introduces REPLUG, which prepends each retrieved document to the question separately, runs the LLM once per document, and ensembles the output probabilities from the different passes. In addition, it uses the LM to score documents and supervise the training of the dense retriever.
Pile, NQ, TriviaQA
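A sketch of REPLUG-style ensembling: run the frozen LM once per retrieved document with the document prepended to the query, then mix the next-token distributions weighted by the softmax-normalized retrieval scores. GPT-2 stands in here for the black-box LM, purely for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def replug_next_token_dist(query: str, documents: list[str], scores: list[float]) -> torch.Tensor:
    weights = torch.tensor(scores).softmax(0)                 # normalize retrieval scores
    mixture = None
    for w, doc in zip(weights, documents):
        ids = tok(doc + "\n\n" + query, return_tensors="pt").input_ids
        with torch.no_grad():
            probs = lm(ids).logits[0, -1].softmax(-1)         # next-token distribution for this pass
        mixture = w * probs if mixture is None else mixture + w * probs
    return mixture                                            # ensembled distribution over the vocabulary
```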
2022/11/16 Atlas: Few-shot Learning with Retrieval Augmented Language Models
[code: ]
Gautier Izacard, Patrick Lewis, Maria Lomeli, et al. Meta AI, ENS, PSL University, Inria, UCL
<summary>This paper presents Atlas …</summary>This work presents Atlas, a retrieval-augmented language model (a Contriever retriever paired with a T5 reader) built on carefully designed training: 1) jointly pre-training the retriever and the LM with an unsupervised objective, 2) efficient retriever fine-tuning (including full index updates, reranking, and query-side fine-tuning).
KILT, MMLU, NQ, TriviaQA

4.2 Iteration-based Alignment

Iterating between the retriever and the generator to align them for better retrieval and generation, respectively.

Date Title Authors Organization Abs Dataset
2025/01/24 Chain-of-Retrieval Augmented Generation Liang Wang, Haonan Chen, Nan Yang, et al. Microsoft Corporation, Renmin University of China
<summary>This paper presents CoRAG …</summary>This work proposes CoRAG, which simulates an iterative, step-by-step reasoning process to solve complex questions, allowing the model to refine its query, gather new insights, and synthesize information in a more structured way. It uses rejection sampling to generate intermediate retrieval chains (chains of sub-queries and sub-answers) and fine-tunes an LLM on them with standard next-token prediction.
HotpotQA, 2WikiMHQA, Bamboogle, MuSiQue, and the KILT benchmark
2024/11/29 Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models
[code: ]
Yu Tian, Shaolei Zhang, Yang Feng ICT, CAS
<summary>This paper presents Auto-RAG …</summary>This work proposes Auto-RAG, which autonomously synthesizes reasoning-based decision-making instructions for iterative retrieval and fine-tunes open-source LLMs on them.
NQ, HotpotQA, 2WikiMHQA, TriviaQA, PopQA, WebQuestions
2023/10/23 Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, et al. Tsinghua University, Microsoft Research Asia
<summary>This paper presents ITER-RETGEN …</summary>This work proposes ITER-RETGEN, which alternates retrieval-augmented generation and generation-augmented retrieval. The authors also find that exact match can significantly underestimate the performance of LLMs, and that evaluation with LLMs is more reliable.
HotpotQA, 2WikiMHQA, MuSiQue, Feverous, StrategyQA
2023/10/08 Retrieval-Generation Synergy Augmented Large Language Models Zhangyin Feng, Xiaocheng Feng, Dezhi Zhao, Maojin Yang, Bing Qin Harbin Institute of Technology
<summary>This paper presents ITRG …</summary>This work proposes ITRG, which iterates two steps: 1) generation-augmented retrieval (GAR), which expands the query with the previous iteration's generation to aid retrieval, and 2) retrieval-augmented generation (RAG), which generates a new document to answer the question based on the retrieved documents.
NQ, TriviaQA, 2WikiMHQA, HotpotQA
2023/6/22 Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
[code: ]
Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal Stony Brook University, Allen Institute for AI
<summary>This paper presents IRCoT …</summary>This work proposes IRCoT, which interleaves CoT generation and retrieval steps so that the CoT guides retrieval and vice versa. It alternates two steps: 1) the reason step generates the next CoT sentence based on the question, the retrieved passages, and the CoT sentences so far; 2) the retrieve step retrieves K more passages based on the last CoT sentence.
HotpotQA, 2WikiMHQA, MuSiQue, and IIRC
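A sketch of the IRCoT interleaving loop: generate one CoT sentence, use it as the next retrieval query, append the newly retrieved passages, and repeat until the model states an answer or the step budget runs out. `retrieve` is an assumed top-k retriever callable, and the prompt is an illustrative paraphrase of the paper's setup.

```python
from openai import OpenAI

client = OpenAI()

def ircot(question: str, retrieve, max_steps: int = 4, k: int = 3) -> str:
    passages = list(retrieve(question, k))
    cot: list[str] = []
    for _ in range(max_steps):
        prompt = ("Passages:\n" + "\n".join(passages) +
                  f"\n\nQuestion: {question}\nReasoning so far: {' '.join(cot)}\n"
                  "Write the next single reasoning sentence. "
                  "Start with 'So the answer is' once you can answer.")
        sentence = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content.strip()
        cot.append(sentence)                                   # reason step
        if "so the answer is" in sentence.lower():
            break
        passages += list(retrieve(sentence, k))                # retrieve step: query with last CoT sentence
    return " ".join(cot)
```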